High load/store
To improve the performance of applications that are GPU-limited, and that have high shader loads dominated by load/store operations, you should improve memory access efficiency and vectorization in your shader programs.
To reduce a high load/store load:
- Improve access density, by using vector loads in compute shaders, and access patterns that touch adjacent data from adjacent threads in each warp. This will enable a single cache line access to return data for multiple threads.
- Reduce cache pressure, by using lower-precision data types where possible and improving the spatial locality of accesses.
- Avoid using imageLoad() calls for read-only texture accesses. Use
- Avoid using atomic calls, because they have a high per-thread cost.
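
The access density advice above can be illustrated with a small host-side model. This is a sketch, not a measurement of any particular GPU. The warp width (16 threads), the cache line size (64 bytes), and the helper name cache_lines_touched are assumptions chosen for illustration; check the values for your target device. The model counts how many distinct cache lines one warp touches when each thread issues a single load.

```python
# Illustrative model only. WARP_SIZE and CACHE_LINE_BYTES are assumed values,
# not figures taken from the guidance above.
WARP_SIZE = 16
CACHE_LINE_BYTES = 64


def cache_lines_touched(stride_bytes, access_bytes,
                        warp_size=WARP_SIZE, line_bytes=CACHE_LINE_BYTES):
    """Count the distinct cache lines one warp touches for a single load.

    Thread t reads access_bytes bytes starting at byte offset t * stride_bytes.
    """
    lines = set()
    for thread in range(warp_size):
        first = thread * stride_bytes
        last = first + access_bytes - 1
        # An access can straddle a line boundary, so record every line it covers.
        lines.update(range(first // line_bytes, last // line_bytes + 1))
    return len(lines)


# Adjacent threads read adjacent 4-byte scalars: the whole warp shares one line.
dense_scalar = cache_lines_touched(stride_bytes=4, access_bytes=4)

# Adjacent threads read adjacent 16-byte vectors: four lines, every byte used.
dense_vector = cache_lines_touched(stride_bytes=16, access_bytes=16)

# Each thread reads one 4-byte field from its own 64-byte record:
# sixteen lines fetched, but only 4 bytes of each are used.
strided = cache_lines_touched(stride_bytes=64, access_bytes=4)

print(dense_scalar, dense_vector, strided)  # prints: 1 4 16
```

Under these assumptions, the two dense patterns use every byte of each line they fetch, while the strided pattern fetches sixteen lines to use 64 bytes in total. Restructuring data so that adjacent threads read adjacent elements is what moves a shader from the third case toward the first two.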