Why treat textures differently?
For most content, texture access is one of the main consumers of memory bandwidth in a system. Textures are usually represented as 2D image data. As texture resolution and texture count increase, so does memory bandwidth usage. This bandwidth overhead can slow down GPU performance if data cannot be loaded fast enough to keep the shader cores busy. In addition, DRAM access is energy intensive, so high bandwidth also increases power consumption and thermal load.
To reduce the impact on performance, we can use specialized compression schemes to reduce the size of texture resources. These real-time compression schemes are significantly different from the more general types of compression, like JPEG or PNG, that you are probably familiar with.
Traditional compression schemes like JPEG and PNG are designed to compress or decompress an entire image. They can achieve very good compression ratios and image quality. However, they are not designed to let you access smaller portions of the full image without decompressing the entire image.
When mapping 2D textures onto a model, individual texels might be extracted from the full texture image in an unpredictable order:
- Not all texels might be needed; for example, parts of the texture may face away from the camera due to the orientation of the model, or be obscured by other objects in the scene.
- Texels that are rendered next to each other in the final image may originate from different parts of the texture.
The following image shows the arrangement of different texture elements within a texture image, and a model with that texture applied. Notice that adjacent texels in the rendered image are not necessarily adjacent in the texture image.
It is computationally expensive for a GPU to decompress the entire image when it only needs a small portion of the whole. As a result, real-time compression schemes are designed to provide efficient random access to individual texels when textures are sampled in shaders.
There are various techniques to achieve this result, but most algorithms do the following:
- Compress a fixed-size N×M texel input block.
- Write the compressed block out as a fixed number of bits.
This allows simple address calculation in the GPU. Because all input and output sizes are fixed, computing the address of any sample is a simple pointer offset calculation. We therefore only need to access the data from one N×M block to decompress any single texel.
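The pointer offset calculation above can be sketched as follows. This is a minimal illustration, not any particular GPU's addressing scheme: it assumes the compressed texture is stored as a row-major grid of fixed-size blocks, and the block dimensions and byte size are placeholder values (for example, a 4×4 texel block stored in 8 bytes, as in some real formats).

```python
def block_byte_offset(x, y, tex_width,
                      block_w=4, block_h=4, block_bytes=8):
    """Return the byte offset of the compressed block that
    contains texel (x, y).

    Assumes blocks are laid out row-major across the texture.
    block_w, block_h, and block_bytes are illustrative values,
    not tied to a specific compression format.
    """
    # Number of blocks per row, rounding up for textures whose
    # width is not a multiple of the block width.
    blocks_per_row = (tex_width + block_w - 1) // block_w

    # Which block the texel falls in.
    block_x = x // block_w
    block_y = y // block_h

    # Fixed block size makes this a single multiply-add.
    return (block_y * blocks_per_row + block_x) * block_bytes


# Example: a 256-texel-wide texture has 64 blocks per row.
# Texel (5, 0) is in the second block of the first row.
print(block_byte_offset(5, 0, 256))   # offset of block (1, 0)
print(block_byte_offset(0, 4, 256))   # offset of block (0, 1)
```

Because every term in the calculation is a constant or a cheap integer operation, the GPU can locate and fetch exactly one compressed block per sample, which is what makes random access practical.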