Mali Bifrost Usage Recommendations for Texture and Sampler Descriptors
By Peter Harris
Vulkan is a low-level rendering API which exposes the hardware more directly to the application than earlier APIs such as OpenGL ES. This enables a much lighter driver, reducing CPU load and improving energy efficiency. In return, it places responsibility on the application to make the best use of the underlying hardware, because the driver has less visibility and fewer behavioral guarantees that would allow it to transparently optimize the application's hardware usage.
This blog documents the recommended application usage of texture and sampler descriptors to get the best performance out of the current Mali Bifrost GPUs.
Hardware behavior
The current Bifrost GPUs use variable sized caches to store texture and sampler descriptors in the texturing unit. Each descriptor is classified as either “compact” or “full” depending on the settings it contains, and the hardware cache can contain 16 compact entries and 8 full sized entries. Application usage which maps to full sized entries will have fewer cache entries available, and will therefore be more prone to losing performance due to cache pressure.
In OpenGL ES the API specifies the defaults for many parameters which will map to “compact” entries unless overridden by the application. However, for Vulkan the application specifies all of the descriptor settings and it is important that it uses values which map to compact settings to get access to the maximum capacity of the descriptor cache.
Impacted GPU releases
Due to the potential impact of the small full descriptor cache, in particular on Vulkan content, new IP releases of the impacted GPUs provide a 24 entry cache, irrespective of the content of the descriptors.
| GPU Product | Impacted Releases | Patched Release |
|-------------|-------------------|-----------------|
| Mali-G71    | r0p0              | r0p1            |
| Mali-G51    | r0p0-r0p1, r1p0   | r1p1            |
| Mali-G72    | r0p0-r0p2         | r0p3            |
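The table above can be encoded as a small lookup, which may be handy in tooling that flags devices worth profiling for descriptor cache pressure. This is only a sketch: the names `ImpactedRow`, `kImpactedTable` and `release_is_impacted` are hypothetical, the range "r0p0-r0p2" is assumed to be inclusive and contiguous, and how an application obtains the product and release strings is outside the scope of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the table: a product and its impacted releases.
 * The list is NULL-terminated unless all four slots are used. */
typedef struct {
    const char *product;
    const char *impacted[4];
} ImpactedRow;

/* Ranges from the table are expanded by hand (assumed inclusive). */
static const ImpactedRow kImpactedTable[] = {
    { "Mali-G71", { "r0p0", NULL } },
    { "Mali-G51", { "r0p0", "r0p1", "r1p0", NULL } },
    { "Mali-G72", { "r0p0", "r0p1", "r0p2", NULL } },
};

/* Returns true if (product, release) is listed as impacted.
 * A false result only means "not listed"; it makes no claim about
 * products or releases that the table does not mention. */
bool release_is_impacted(const char *product, const char *release)
{
    assert(product != NULL && release != NULL);
    const size_t rows = sizeof kImpactedTable / sizeof kImpactedTable[0];
    for (size_t i = 0; i < rows; ++i) {
        if (strcmp(kImpactedTable[i].product, product) != 0)
            continue;
        for (size_t j = 0; j < 4 && kImpactedTable[i].impacted[j] != NULL; ++j) {
            if (strcmp(kImpactedTable[i].impacted[j], release) == 0)
                return true;
        }
    }
    return false;
}
```

For example, `release_is_impacted("Mali-G51", "r1p0")` is true, while the patched `"r1p1"` is not listed.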
Application best practice for Vulkan
Applications using Vulkan are responsible for supplying all parameters for the texture and sampler descriptors themselves - there are no safe API-specified defaults - which means that applications need to supply parameters which map to the Mali compact samplers as often as possible.
Sampler descriptor settings
To qualify for the compact sampler descriptor optimization all of the following constraints should be followed by the application when populating the VkSamplerCreateInfo structure:
- Set sampler addressMode(U|V|W) so they are all the same
  - Note that addressModeW must be set to be the same as U and V even when sampling a 2D texture
- Set sampler mipLodBias to 0.0
- Set sampler minLod to 0.0
- Set sampler maxLod to 1000.0
- Set sampler anisotropyEnable to VK_FALSE
- Set sampler maxAnisotropy to 1.0
- Set sampler borderColor to VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK
- Set sampler unnormalizedCoordinates to VK_FALSE
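The checklist above can be expressed as a validation helper, for example in a debug build that audits sampler creation. The sketch below is an illustration rather than driver logic: `SamplerSettings` is a hypothetical stand-in that mirrors only the VkSamplerCreateInfo fields named in the list (so the example compiles without the Vulkan headers), and `sampler_is_compact` and `compact_sampler_settings` are made-up names. In a real application the same comparisons would be made against the VkSamplerCreateInfo fields directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VkSamplerCreateInfo fields in the list.
 * Field names match Vulkan; enum-typed fields are stored as int. */
typedef struct {
    int   addressModeU, addressModeV, addressModeW;
    float mipLodBias;
    float minLod, maxLod;
    bool  anisotropyEnable;          /* VK_FALSE corresponds to false */
    float maxAnisotropy;
    int   borderColor;
    bool  unnormalizedCoordinates;   /* VK_FALSE corresponds to false */
} SamplerSettings;

/* Numeric value of VK_BORDER_COLOR_FLOAT_TRANSPARENT_BLACK. */
enum { BORDER_COLOR_FLOAT_TRANSPARENT_BLACK = 0 };

/* Returns true only if every rule in the list is satisfied. */
bool sampler_is_compact(const SamplerSettings *s)
{
    assert(s != NULL);
    return s->addressModeU == s->addressModeV
        && s->addressModeV == s->addressModeW
        && s->mipLodBias == 0.0f
        && s->minLod == 0.0f
        && s->maxLod == 1000.0f
        && !s->anisotropyEnable
        && s->maxAnisotropy == 1.0f
        && s->borderColor == BORDER_COLOR_FLOAT_TRANSPARENT_BLACK
        && !s->unnormalizedCoordinates;
}

/* Builds settings that follow the list, using one address mode for U, V and W. */
SamplerSettings compact_sampler_settings(int addressMode)
{
    SamplerSettings s = {
        .addressModeU = addressMode,
        .addressModeV = addressMode,
        .addressModeW = addressMode,   /* must match even for 2D textures */
        .mipLodBias = 0.0f,
        .minLod = 0.0f,
        .maxLod = 1000.0f,
        .anisotropyEnable = false,
        .maxAnisotropy = 1.0f,
        .borderColor = BORDER_COLOR_FLOAT_TRANSPARENT_BLACK,
        .unnormalizedCoordinates = false,
    };
    return s;
}
```

Filter and mipmap modes are deliberately absent from the check, because the list places no constraint on them.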
Note that the requirements for compact samplers conflict with the Vulkan specification's recommended approach for emulating GL_NEAREST (no filtering on samples read from mip 0) and GL_LINEAR (bilinear filtering on samples from mip 0) sampling for mipmapped textures. The specification states:
"There are no Vulkan filter modes that directly correspond to OpenGL minification filters of GL_LINEAR or GL_NEAREST, but they can be emulated using VK_SAMPLER_MIPMAP_MODE_NEAREST, minLod = 0, and maxLod = 0.25, and using minFilter = VK_FILTER_LINEAR or minFilter = VK_FILTER_NEAREST, respectively." The conflict is the maxLod value: the specification's approach uses 0.25, while a compact sampler requires 1000.0.
To emulate these two texture filtering modes for a texture with multiple mipmaps levels, while also being compatible with the requirements for compact samplers, use the following recommendation.
- Use a VkImageView instance which references only the level 0 mipmap by setting baseMipLevel to 0 and levelCount to 1.
- Use a VkSampler with pCreateInfo.maxLod set to 1000.0 in accordance with the compact sampler restrictions.
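To see why the two steps above still restrict sampling to mip 0, it can help to model the LOD clamp. The function below is a deliberately simplified model, not the exact formula from the Vulkan specification: it clamps a requested LOD to the sampler's [minLod, maxLod] range and then to the levels exposed by the image view. The name `effective_lod` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of LOD selection:
 *   1. clamp to the sampler's [minLod, maxLod] range;
 *   2. clamp to the last mip level visible through the image view. */
float effective_lod(float requestedLod, float minLod, float maxLod,
                    uint32_t levelCount)
{
    assert(levelCount >= 1);
    float lod = requestedLod;
    if (lod < minLod) lod = minLod;
    if (lod > maxLod) lod = maxLod;
    float lastLevel = (float)(levelCount - 1);
    if (lod > lastLevel) lod = lastLevel;
    return lod;
}
```

In this model, a view with levelCount set to 1 pins the result to 0 even with maxLod at 1000.0, so the sampler no longer needs the 0.25 clamp and can stay within the compact restrictions.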
Note: Direct access to textures through imageLoad() and imageStore() in shader programs (or the equivalent in SPIR-V) is not impacted by this issue.
Texture descriptor settings
To qualify for the compact texture descriptor optimization the following constraints should be followed by the application when populating the VkImageViewCreateInfo structure:
- Set all fields in view components to either VK_COMPONENT_SWIZZLE_IDENTITY or the explicit per-channel identity mapping equivalent
- Set view subresourceRange.baseMipLevel to 0
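The two image view rules can be checked in the same way. In this sketch, `ViewSettings` is a hypothetical stand-in for the relevant VkImageViewCreateInfo fields (components and subresourceRange.baseMipLevel), and the enum values mirror VkComponentSwizzle so that the example builds without the Vulkan headers. The helper name `view_is_compact` is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror VkComponentSwizzle in the Vulkan headers. */
enum {
    SWIZZLE_IDENTITY = 0, SWIZZLE_ZERO = 1, SWIZZLE_ONE = 2,
    SWIZZLE_R = 3, SWIZZLE_G = 4, SWIZZLE_B = 5, SWIZZLE_A = 6
};

/* Hypothetical stand-in for the VkImageViewCreateInfo fields in the list. */
typedef struct {
    int      r, g, b, a;     /* components.r / .g / .b / .a */
    uint32_t baseMipLevel;   /* subresourceRange.baseMipLevel */
} ViewSettings;

/* A channel qualifies if it uses IDENTITY or names itself explicitly. */
static bool channel_is_identity(int swizzle, int self)
{
    return swizzle == SWIZZLE_IDENTITY || swizzle == self;
}

/* Returns true if the view follows both rules in the list. */
bool view_is_compact(const ViewSettings *v)
{
    assert(v != NULL);
    return channel_is_identity(v->r, SWIZZLE_R)
        && channel_is_identity(v->g, SWIZZLE_G)
        && channel_is_identity(v->b, SWIZZLE_B)
        && channel_is_identity(v->a, SWIZZLE_A)
        && v->baseMipLevel == 0;
}
```

A view that replicates red into every channel, or one that starts at mip level 1, fails this check and would be expected to need a full descriptor.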
Application best practice for OpenGL ES
Applications using the OpenGL ES API will use descriptors which are populated with default values defined in the API specification. These defaults will map to the compact descriptor entry types, unless settings are explicitly overridden by the application to values which are not compatible with the compact descriptor requirements. Also, as the pairing of texture and sampler is known at draw time, the driver can specialize sampler descriptor settings given to the GPU based on the actual texture in use which allows compact samplers to be used more often.
The following settings of texture and/or sampler objects should be used to ensure use of compact samplers:
- Set GL_TEXTURE_WRAP_(S|T|R) to identical values
  - Note that the GL driver can specialize the sampler state based on the current texture so, unlike Vulkan, there is no need to set GL_TEXTURE_WRAP_R for 2D textures
- Do not use GL_CLAMP_TO_BORDER
- Set GL_TEXTURE_MIN_LOD to -1000.0 (default)
- Set GL_TEXTURE_MAX_LOD to +1000.0 (default)
- Set GL_TEXTURE_BASE_LEVEL to 0 (default)
- Set GL_TEXTURE_SWIZZLE_R to GL_RED (default)
- Set GL_TEXTURE_SWIZZLE_G to GL_GREEN (default)
- Set GL_TEXTURE_SWIZZLE_B to GL_BLUE (default)
- Set GL_TEXTURE_SWIZZLE_A to GL_ALPHA (default)
- Set GL_TEXTURE_MAX_ANISOTROPY_EXT to 1.0 if the EXT_texture_filter_anisotropic extension is available
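The OpenGL ES list above can be audited with a similar helper. This is a sketch under stated assumptions: `GlTextureState` is a hypothetical snapshot of the parameters an application has set, the `ES_` constants carry the numeric values of the corresponding GL enums so that no GL headers are needed, and `usesWrapR` is an invented flag that models the note about GL_TEXTURE_WRAP_R being irrelevant for 2D textures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Numeric values of the corresponding GL_* enums. */
enum {
    ES_CLAMP_TO_BORDER = 0x812D,
    ES_RED = 0x1903, ES_GREEN = 0x1904, ES_BLUE = 0x1905, ES_ALPHA = 0x1906
};

/* Hypothetical snapshot of the texture/sampler parameters in the list. */
typedef struct {
    int   wrapS, wrapT, wrapR;
    bool  usesWrapR;          /* false for 2D textures */
    float minLod, maxLod;
    int   baseLevel;
    int   swizzleR, swizzleG, swizzleB, swizzleA;
    float maxAnisotropy;      /* leave at 1.0 if the extension is absent */
} GlTextureState;

/* Returns true if the state follows every rule in the list. */
bool gl_state_is_compact(const GlTextureState *t)
{
    assert(t != NULL);
    if (t->wrapS != t->wrapT || t->wrapS == ES_CLAMP_TO_BORDER)
        return false;
    if (t->usesWrapR && t->wrapR != t->wrapS)
        return false;
    return t->minLod == -1000.0f
        && t->maxLod == 1000.0f
        && t->baseLevel == 0
        && t->swizzleR == ES_RED
        && t->swizzleG == ES_GREEN
        && t->swizzleB == ES_BLUE
        && t->swizzleA == ES_ALPHA
        && t->maxAnisotropy == 1.0f;
}
```

Because S and T must match, checking one of them against GL_CLAMP_TO_BORDER covers both, and R is only compared when the texture actually uses it.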
Note: Direct access to textures through imageLoad() and imageStore() in shader programs is not impacted by this issue.
Reasonable usage of full descriptors
It should be noted that this best practice aims to get the best use out of a hardware cache. Applications are able to use small numbers of textures and samplers which use full sized cache entries without any performance impact – the cache is smaller, not non-existent – so if using swizzles and LOD clamps is needed to correctly implement a rendering algorithm then don’t be afraid to use them. However, where it is possible to transparently substitute the recommended compact settings, such as substituting 1000.0 rather than using the maximum mipmap level present in a texture for maxLod in the sampler descriptor, it is highly recommended that you do so.
Re-use is permitted for informational and non-commercial or personal use only.
