Lowering Bandwidth Use

Tile-based renderers enable lower-bandwidth approaches in which intermediate per-pixel data is kept in tile memory and read directly from it, so the GPU writes only the final lit pixels back to main memory.

For a deferred shading G-buffer that uses four 1080p 32bpp intermediate textures, this approach can save up to 4 GB/s of bandwidth at 60 FPS.

The following extensions expose this functionality in OpenGL ES:

  • ARM_shader_framebuffer_fetch
  • ARM_shader_framebuffer_fetch_depth_stencil
  • EXT_shader_pixel_local_storage
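As an illustration, a fragment shader using the first of these extensions can read the current framebuffer colour from tile memory through the built-in `gl_LastFragColorARM`, avoiding a round trip through main memory. This is a minimal sketch for an ES 2.0 shader; the uniform and varying names are illustrative:

```glsl
#extension GL_ARM_shader_framebuffer_fetch : require
precision mediump float;

uniform float uMixFactor;  // illustrative blend control
varying vec4 vColor;       // illustrative per-fragment input

void main()
{
    // gl_LastFragColorARM is the built-in defined by the extension:
    // the colour already in the framebuffer, read from tile memory.
    gl_FragColor = mix(gl_LastFragColorARM, vColor, uMixFactor);
}
```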

In Vulkan, this functionality is accessed by using mergeable subpasses within a single render pass.
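A sketch of the relevant render pass pieces, assuming a single G-buffer colour attachment at index 0 that is written by subpass 0 and consumed by subpass 1 as an input attachment; the surrounding render pass and pipeline setup is omitted:

```c
#include <vulkan/vulkan.h>

/* STORE_OP_DONT_CARE tells the driver the G-buffer never needs to
 * reach main memory, so it can live entirely in tile memory. */
static const VkAttachmentDescription gbuffer = {
    .format        = VK_FORMAT_R8G8B8A8_UNORM,
    .samples       = VK_SAMPLE_COUNT_1_BIT,
    .loadOp        = VK_ATTACHMENT_LOAD_OP_CLEAR,
    .storeOp       = VK_ATTACHMENT_STORE_OP_DONT_CARE, /* never written out */
    .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    .finalLayout   = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
};

/* Subpass 0 writes attachment 0; subpass 1 reads it back. */
static const VkAttachmentReference gbufferWrite = {
    .attachment = 0, .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };
static const VkAttachmentReference gbufferRead = {
    .attachment = 0, .layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL };

/* A BY_REGION dependency permits the hand-over to happen per tile,
 * which is what lets the data stay on-chip. */
static const VkSubpassDependency dependency = {
    .srcSubpass      = 0,
    .dstSubpass      = 1,
    .srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
    .srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,
    .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
};
```

In the second subpass's fragment shader, the attachment is then read with `subpassLoad`, which on a tile-based GPU resolves to a tile-memory read.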


Tile-based rendering carries a number of advantages; in particular, it significantly reduces the bandwidth associated with framebuffer data and provides low-cost anti-aliasing. However, there is an important downside to consider.

The principal additional overhead of any tile-based rendering scheme occurs at the point of hand-over from the geometry pass to the fragment pass.

The GPU must write the output of the geometry pass -- the per-vertex varying data and tiler intermediate state -- to main memory, which the fragment pass subsequently reads. There is therefore a balance to be struck between the extra bandwidth cost of this geometry data and the bandwidth saved on framebuffer data.

It is also important to consider that some rendering operations, such as tessellation, are disproportionately expensive on a tile-based architecture. These operations are designed to suit the strengths of an immediate-mode architecture, where the resulting explosion in geometry data can be buffered inside the on-chip FIFO rather than being written back to main memory.
