This is the third in our series of blog posts about real-time ray tracing. If you haven't read the previous two yet, you can read them here and here.
So far, we've discussed how we used sparse voxel reflections and hybrid ray tracing in our client projects and tech demos. In this blog, we're going to outline how we've built on these advancements using hardware-accelerated techniques.
Hardware-accelerated ray tracing was a major step forward for this technology. Microsoft's integration of DirectX Raytracing (DXR) into DirectX 12 (DX12) enabled graphics engineers to cast rays against the geometry already present in their rendering engines. Nvidia's RTX GPUs added dedicated ray tracing cores that accelerate ray casting in hardware, much as GPUs have long accelerated rasterisation.
Integration
We first upgraded our core rendering technology to support DX12. We have many custom rendering techniques and effects, including lighting, shadowing and post-processing, and each of these had to be refactored to run optimally under DX12. Once DX12 was up and running, we developed DXR passes based on the techniques we had previously built for our compute path tracer.
Once each ray-traced pass was running as DXR ray tracing shaders, we could turn to optimisation: getting each effect to run in real time while maintaining the quality we had previously achieved.
Our development team used Nvidia Quadro RTX 6000 GPUs to profile and optimise these render passes for real time. The RTX 6000 has 72 ray tracing cores, enabling it to cast millions of rays within the roughly 16 ms available for each frame at 60 FPS.
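To put "millions of rays per frame" in context, the back-of-the-envelope calculation below works out the frame budget and ray count at 4K and 60 FPS. The figure of two rays per pixel is a hypothetical assumption for illustration (for example, one reflection ray and one shadow ray); the post does not state a per-pixel ray count.

```python
# Back-of-the-envelope ray budget for real-time ray tracing at 4K.
# RAYS_PER_PIXEL is a hypothetical figure chosen for illustration.

WIDTH, HEIGHT = 3840, 2160   # 4K UHD
TARGET_FPS = 60
RAYS_PER_PIXEL = 2           # assumption: e.g. one reflection ray and one shadow ray


def frame_budget_ms(fps):
    """Milliseconds available to render each frame at the given frame rate."""
    return 1000.0 / fps


def rays_per_frame(width, height, rays_per_pixel):
    """Rays cast in one frame if every pixel casts the same number of rays."""
    return width * height * rays_per_pixel


budget = frame_budget_ms(TARGET_FPS)                       # ~16.67 ms
per_frame = rays_per_frame(WIDTH, HEIGHT, RAYS_PER_PIXEL)  # 16,588,800 rays
per_second = per_frame * TARGET_FPS                        # 995,328,000 rays/s
print(f"{budget:.2f} ms/frame, {per_frame:,} rays/frame, {per_second:,} rays/s")
```

Even this modest assumption means roughly a billion rays per second, which is why the optimisations below focus on reducing both the per-ray cost and the number of rays cast.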
Performance
With DXR, we can use the geometry and material data that our application has already loaded. An acceleration structure is built for fast ray-to-geometry intersection tests, and a hit shader is provided for each material. The same data is used both during rasterisation and when shading the point each ray hits.
When we first tried this technique, performance was poor. This was due to the complexity of the BVH (bounding volume hierarchy) in the acceleration structure and the large number of different materials in the models. To improve performance, we had to minimise this overhead by reducing the data, simplifying both the BVH and the hit shader, while preserving quality in three areas, outlined below.
The first was achieving high-quality reflections on every detail. This requires comprehensive materials that match the look, detail, and quality of the main render.
The second was ensuring that the shadows were detailed. To get a perfect shadow cast from every detail, the BVH must contain the full polygon data. A single vehicle configuration typically ranges from 5 million to 20 million polygons.
The third was making sure that all configurations and animations were dynamic. This meant we had to enable mesh and material switching, as well as allow individual parts of the digital twin model to animate.
To achieve this, we created a proxy model to reduce the number of draw calls and materials. The essential material information is stored in a block of look-up data that can persist on the GPU. The hit shader uses indices to quickly find the parameters required for each triangle in these models, which removes the need to split meshes per material. It is important to find the right balance between features and performance: reducing draw calls limits how dynamic the configurations can be, and the mesh detail required for shadows and reflections limits how far the polygon count can be reduced.
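A minimal sketch of the per-triangle look-up idea, assuming a proxy built by concatenating the selected parts into one vertex and triangle list, with a per-triangle index into a shared, deduplicated material table. `Material`, `Part`, `build_proxy` and `closest_hit` are hypothetical names, and the material fields are an illustrative subset; in a real DXR hit shader the triangle index would come from the HLSL `PrimitiveIndex()` intrinsic, and the tables would live in GPU buffers rather than Python lists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Material:
    """Hypothetical subset of the parameters a hit shader needs (hashable)."""
    base_color: tuple[float, float, float]
    roughness: float
    metallic: float


@dataclass
class Part:
    """One configurable part of the model, e.g. a body panel or a wheel."""
    vertices: list[tuple[float, float, float]]
    triangles: list[tuple[int, int, int]]  # indices into this part's vertices
    triangle_materials: list[int]          # per triangle: index into self.materials
    materials: list[Material]


@dataclass
class Proxy:
    vertices: list = field(default_factory=list)
    triangles: list = field(default_factory=list)
    material_ids: list = field(default_factory=list)    # per triangle: index into material_table
    material_table: list = field(default_factory=list)  # look-up data kept resident


def build_proxy(parts):
    """Merge parts into one mesh without splitting it per material."""
    proxy = Proxy()
    slot_of = {}  # Material -> index in material_table (deduplicates shared materials)
    for part in parts:
        base = len(proxy.vertices)  # offset for this part's vertex indices
        proxy.vertices.extend(part.vertices)
        for tri, local in zip(part.triangles, part.triangle_materials):
            proxy.triangles.append(tuple(base + i for i in tri))
            material = part.materials[local]
            if material not in slot_of:
                slot_of[material] = len(proxy.material_table)
                proxy.material_table.append(material)
            proxy.material_ids.append(slot_of[material])
    return proxy


def closest_hit(proxy, primitive_index):
    """What the hit shader does: triangle index -> material parameters."""
    return proxy.material_table[proxy.material_ids[primitive_index]]


# Usage: a two-material body panel and a chrome wheel share one proxy.
paint = Material((0.8, 0.1, 0.1), roughness=0.3, metallic=0.0)
chrome = Material((0.9, 0.9, 0.9), roughness=0.05, metallic=1.0)
body = Part([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)],
            [(0, 1, 2), (1, 3, 2)], [0, 1], [paint, chrome])
wheel = Part([(0, 0, 1), (1, 0, 1), (0, 1, 1)], [(0, 1, 2)], [0], [chrome])
proxy = build_proxy([body, wheel])
print(proxy.triangles)     # [(0, 1, 2), (1, 3, 2), (4, 5, 6)]
print(proxy.material_ids)  # [0, 1, 1]
```

Because triangles with different materials stay in a single mesh, the number of separate geometries no longer grows with the number of materials; the shared chrome material appears only once in the table even though two parts use it.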
Rather than accept a larger per-frame performance cost, we made proxy generation as efficient as possible. This way, we could have the highest-quality mesh and material data for every detail on the vehicle while keeping the model dynamic. The updated proxy and acceleration structure can be generated in parallel whenever a new configuration or vehicle is loaded into the experience.
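The "build in parallel, then swap" behaviour can be sketched as follows, assuming a hypothetical `build_fn` that produces the proxy (and, in a real engine, its acceleration structure) for a given configuration. The current proxy keeps being used every frame until the new one is fully built. `ProxyManager` and `fake_build` are illustrative names, and Python threads only demonstrate the control flow; they say nothing about how GPU acceleration structures are actually built.

```python
import concurrent.futures
import threading
import time


class ProxyManager:
    """Keeps rendering with the current proxy while a new one builds in the background."""

    def __init__(self, build_fn, initial_config):
        self._build_fn = build_fn
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._active = build_fn(initial_config)  # first proxy is built up front
        self._pending = None

    def request_config(self, config):
        """Start building the proxy for a new configuration without blocking."""
        # A newer request supersedes an older one: the old build may still
        # finish on the worker, but its result is never swapped in.
        self._pending = self._executor.submit(self._build_fn, config)

    def active_proxy(self):
        """Called once per frame: returns the newest fully built proxy."""
        if self._pending is not None and self._pending.done():
            self._active = self._pending.result()
            self._pending = None
        return self._active

    def close(self):
        self._executor.shutdown(wait=True)


# Usage: a stand-in build function that blocks until "loading" finishes.
loaded = threading.Event()


def fake_build(config):
    if config != "base":
        loaded.wait(timeout=5)  # simulate slow mesh and material processing
    return f"proxy[{config}]"


manager = ProxyManager(fake_build, "base")
manager.request_config("sport")
during_build = manager.active_proxy()  # still the old proxy
loaded.set()
deadline = time.monotonic() + 5
while manager.active_proxy() == "proxy[base]" and time.monotonic() < deadline:
    time.sleep(0.01)                   # the render loop keeps producing frames
after_build = manager.active_proxy()
manager.close()
```

The key design choice is that a frame never waits on the build: it either sees the old proxy or the complete new one, never a half-built state.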
Number of Rays
A commonly used technique in gaming is to cast fewer rays per frame and reuse data from previous frames, then clean up the result with AI-based denoisers. The results are very good, but because the technique is temporal, it produces ghosting artefacts, and noise whenever the camera switches quickly and the previous frames' data has to be discarded.
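For context, the core of simple temporal accumulation is a per-pixel exponential moving average. The sketch below is a generic illustration, not the approach used in our projects, and shows both failure modes with a single pixel: when the content changes, the old value lingers (ghosting), and when the history is discarded after a camera cut, only one noisy frame remains. The blend weight of 0.1 and the alternating noise pattern are assumptions; production denoisers add reprojection, neighbourhood clamping and learned filtering on top of this.

```python
def accumulate(history, sample, alpha=0.1):
    """Exponential moving average used by simple temporal accumulation.

    alpha is the weight of the new frame: a small alpha averages away noise
    over many frames but reacts slowly when the image content changes.
    """
    if history is None:  # history was discarded (e.g. after a camera cut)
        return sample    # only one noisy frame is available
    return (1.0 - alpha) * history + alpha * sample


def run(samples, history=None, alpha=0.1):
    """Feed a sequence of per-frame samples for one pixel; return the final value."""
    for s in samples:
        history = accumulate(history, s, alpha)
    return history


# Noise: alternate +/-0.2 around a true value of 0.5 to mimic a low ray count.
noisy = [0.7 if i % 2 == 0 else 0.3 for i in range(100)]
converged = run(noisy)      # close to 0.5 after many frames
after_cut = run(noisy[:1])  # history reset: the full 0.2 error is visible

# Ghosting: the pixel showed background (0.0) and now shows an object (1.0).
ghosted = run([1.0] * 5, history=0.0)  # ~0.41 after 5 frames: old background lingers
```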
However, this method does not work for us. Due to the nature of our projects, our clients expect perfection, and the end users (their customers) won't understand, or care about, the reasons behind any graphical artefacts or errors. We therefore do not have the luxury of using the methods common in gaming, because we must minimise these imperfections.
To bring the ray count down to a level suitable for real-time visualisation, we experimented with reduced-resolution ray-traced passes. However, this introduced many more aliasing artefacts, because at detailed edges the low-resolution results landed on the wrong pixels, causing gaps to appear.
To avoid this, we developed a variable rate shading (VRS) solution. A pre-pass first highlights the edges of both ray-traced and rasterised geometry. Using this data, rays are cast for every pixel on an edge, while pixels further away are shaded in progressively larger blocks, up to 4×4 pixels.
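A minimal sketch of the idea, assuming an object-ID buffer stands in for the edge pre-pass and that shading rates are chosen per 4×4 tile: tiles containing an edge pixel get one ray per pixel, tiles next to an edge tile get one ray per 2×2 block, and all other tiles one ray per 4×4 block. The tile size, the one-tile falloff and the function names are hypothetical choices for illustration, not the production heuristic.

```python
def edge_mask(ids):
    """Pre-pass: mark pixels whose object/material id differs from a 4-neighbour."""
    h, w = len(ids), len(ids[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and ids[ny][nx] != ids[y][x]:
                    mask[y][x] = True
                    break
    return mask


def shading_rates(mask, tile=4):
    """Per tile: 1 (ray per pixel) on edges, 2 next to an edge tile, else 4."""
    h, w = len(mask), len(mask[0])
    rows, cols = (h + tile - 1) // tile, (w + tile - 1) // tile
    has_edge = [[any(mask[y][x]
                     for y in range(ty * tile, min((ty + 1) * tile, h))
                     for x in range(tx * tile, min((tx + 1) * tile, w)))
                 for tx in range(cols)] for ty in range(rows)]
    rates = [[4] * cols for _ in range(rows)]
    for ty in range(rows):
        for tx in range(cols):
            if has_edge[ty][tx]:
                rates[ty][tx] = 1
            elif any(has_edge[ny][nx]
                     for ny in range(max(ty - 1, 0), min(ty + 2, rows))
                     for nx in range(max(tx - 1, 0), min(tx + 2, cols))):
                rates[ty][tx] = 2
    return rates


def ray_count(rates, tile=4):
    """Rays per frame: a rate-r tile casts one ray per r x r block.

    Assumes the image dimensions are multiples of the tile size.
    """
    return sum((tile // r) ** 2 for row in rates for r in row)


# Usage: a 16x16 frame with a 4x4 object in the top-left corner.
ids = [[1 if x < 4 and y < 4 else 0 for x in range(16)] for y in range(16)]
rates = shading_rates(edge_mask(ids))
print(rates)             # [[1, 1, 2, 4], [1, 2, 2, 4], [2, 2, 4, 4], [4, 4, 4, 4]]
print(ray_count(rates))  # 76, versus 256 at one ray per pixel
```

On this toy frame the map casts 76 rays instead of 256, while every pixel on the object's silhouette still receives its own ray, which is what avoids the gaps seen with a uniformly reduced resolution.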
Results
Rendering ray-traced content at 4K in real time at a steady 60 FPS greatly improves the fine details that enhance the realism of our digital twin models. The improvements over our previous ray tracing developments are clearly visible in this latest integration. To start with, rays are now traced against the full polygon detail rather than against voxels, which enables every part of the car to reflect and cast shadows perfectly. In addition, the hardware acceleration available on Nvidia RTX GPUs has raised the quality of the effects by enabling many thousands of extra rays to be cast, which adds further flexibility to the effects and improves the overall results. Sharing resources between rasterisation and ray tracing has removed the large BVH overhead we saw with the compute ray tracer and enabled the visualisation of much more dynamic content, such as complex configurations and animations.
The integration of DXR real-time ray tracing into the ZeroLight product library has raised graphical quality across a range of services, which we are beginning to roll out to multiple clients worldwide.