This project demonstrates that variable-rate compressed textures are viable on modern GPUs, achieving over 1500 FPS when rendering the Sponza scene with JPEG textures on an RTX 4090. The results of this work are a first step towards adopting more advanced formats such as AVIF and JPEG XL, which offer improved compression efficiency and could ultimately enable much larger texture datasets to fit within GPU memory.
The implementation uses OpenGL to first create a G-Buffer, and CUDA to decode the required JPEG blocks (Minimum Coded Units, or MCUs; 16 x 16 pixels each). The rendering pipeline comprises the following passes:
- Geometry Pass: Create a G-Buffer with uv-coordinates, texture-id, and mip map level.
- Mark Pass: Identify the MCUs of the JPEGs that need decoding, and put them in a queue. A hash-map-based texture block cache ensures that only MCUs that were not visible in the previous frame are queued for decoding.
- Decode Pass: Decodes the queued MCUs and stores the results in the texture block cache.
- Resolve Pass: Replace the G-Buffer components with the decoded texels.
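
The mark pass described above can be sketched on the CPU as follows. This is a minimal illustration, not the project's CUDA kernel: the `GBufferSample` struct, the key packing in `makeMcuKey`, and the use of `std::unordered_set` as a stand-in for the texture block cache are all assumptions made for this example. It also assumes uv-coordinates in [0, 1), mip levels below 32, a single base texture size for all textures, and it never evicts cache entries.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical G-Buffer sample; field names and layout are assumptions.
struct GBufferSample {
    float u, v;               // texture coordinates, assumed to be in [0, 1)
    std::uint32_t textureId;  // texture referenced by the fragment
    std::uint32_t mipLevel;   // mip level selected for the fragment
};

constexpr std::uint32_t kMcuSize = 16;  // MCU edge length in pixels

// Pack (texture, mip, MCU x, MCU y) into one 64-bit cache key.
// Bit layout (an assumption): 16 bits texture, 8 bits mip, 20 bits x, 20 bits y.
std::uint64_t makeMcuKey(std::uint32_t textureId, std::uint32_t mipLevel,
                         std::uint32_t mcuX, std::uint32_t mcuY) {
    return (std::uint64_t(textureId & 0xFFFF) << 48) |
           (std::uint64_t(mipLevel & 0xFF) << 40) |
           (std::uint64_t(mcuX & 0xFFFFF) << 20) |
           std::uint64_t(mcuY & 0xFFFFF);
}

// Map a G-Buffer sample to the key of the MCU that contains its texel.
std::uint64_t mcuKeyForSample(const GBufferSample& s,
                              std::uint32_t baseWidth, std::uint32_t baseHeight) {
    // Size of the selected mip level, clamped to at least one pixel.
    std::uint32_t w = std::max(1u, baseWidth >> s.mipLevel);
    std::uint32_t h = std::max(1u, baseHeight >> s.mipLevel);
    // Texel position, clamped to the last valid texel.
    std::uint32_t px = std::min(w - 1, std::uint32_t(s.u * float(w)));
    std::uint32_t py = std::min(h - 1, std::uint32_t(s.v * float(h)));
    return makeMcuKey(s.textureId, s.mipLevel, px / kMcuSize, py / kMcuSize);
}

// Mark pass: queue every MCU that is not yet in the cache.
// Inserting the key right away also deduplicates MCUs within one frame.
std::vector<std::uint64_t> markPass(const std::vector<GBufferSample>& gbuffer,
                                    std::uint32_t baseWidth, std::uint32_t baseHeight,
                                    std::unordered_set<std::uint64_t>& cache) {
    std::vector<std::uint64_t> decodeQueue;
    for (const GBufferSample& s : gbuffer) {
        std::uint64_t key = mcuKeyForSample(s, baseWidth, baseHeight);
        if (cache.insert(key).second) {
            decodeQueue.push_back(key);
        }
    }
    return decodeQueue;
}
```

Calling `markPass` twice with the same G-Buffer and the same cache yields an empty queue the second time, which mirrors how cached MCUs are skipped in subsequent frames.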
Takeaways:
- Texturing with JPEG needs less than 0.3ms per frame, showing that concepts of JPEG - and in the future AVIF and JPEG XL - could be adopted for efficient GPU-friendly, variable-rate compressed texture formats that require less than 1 bit per texel.
- Deferred rendering is needed because it lets us first identify what needs decoding and then have workgroups decode it collaboratively.
- Mip Mapping provides a massive performance boost for JPEG-based textures, as it reduces the number of visible MCUs that need decoding to a fraction, e.g., from 81k to 12k in certain viewpoints in Sponza, or from 224k to 22k in another test scene (Graffiti).
- Caching: A texture block cache then further halves the time spent on the JPEG rendering pipeline in each frame.
- VR: The approach scales well to VR since the left and the right eye share most of the visible MCUs. In VR, 0.65ms per frame is spent on the JPEG rendering pipeline, mainly due to the much higher framebuffer size (2MP on desktop vs. 6.8MP per eye in VR).
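
To give a rough feel for why mip mapping cuts the number of MCUs so sharply, the sketch below counts how many 16 x 16 MCUs cover each mip level of a texture. It assumes that every mip level is stored as its own JPEG with 16 x 16 MCUs, which the text above does not spell out; the function names are hypothetical and not taken from the code base.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kMcuSize = 16;  // MCU edge length in pixels

// Number of MCUs needed to cover a width x height image.
// Partially covered MCUs at the right and bottom edges count as whole MCUs.
std::uint64_t mcuCount(std::uint32_t width, std::uint32_t height) {
    std::uint64_t cols = (width + kMcuSize - 1) / kMcuSize;
    std::uint64_t rows = (height + kMcuSize - 1) / kMcuSize;
    return cols * rows;
}

// MCU count for each of the first `levels` mip levels (levels assumed < 32).
std::vector<std::uint64_t> mcuCountsPerMip(std::uint32_t baseWidth,
                                           std::uint32_t baseHeight,
                                           std::uint32_t levels) {
    std::vector<std::uint64_t> counts;
    for (std::uint32_t level = 0; level < levels; ++level) {
        // Each level halves the resolution, clamped to at least one pixel.
        std::uint32_t w = std::max(1u, baseWidth >> level);
        std::uint32_t h = std::max(1u, baseHeight >> level);
        counts.push_back(mcuCount(w, h));
    }
    return counts;
}
```

For a 4096 x 4096 texture, `mcuCountsPerMip(4096, 4096, 3)` gives 65536, 16384 and 4096 MCUs for levels 0, 1 and 2. Each coarser level holds a quarter of the MCUs of the previous one, so distant surfaces that sample coarse levels touch far fewer MCUs.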
Dependencies:
- CUDA 12.4
- Visual Studio 2022 (version 17.10.3)
- A GPU with compute capability 8.6
Create Visual Studio solution files in a build folder via cmake:
TODO
- To load your own glb model with JPEG textures, modify loadGraffiti() in main.cpp.
- To view the Sponza scene, replace loadGraffiti() with loadSponza().
- To create a glb with JPEG textures, load your glb in https://www.gltfeditor.com/, then save with Textures = JPEG and the desired level of quality.
The most relevant files for modification are as follows:

| File | Description |
| --- | --- |
| main.cpp | |
| SplatEditor_render.h | Most of the relevant host-side OpenGL and CUDA calls. |
| jpeg.cu | Contains the kernels of the JPEG rendering pipeline: mark, decode and resolve. |
