Midgard GPU Architecture
The Midgard family of Mali GPUs uses a unified shader core architecture. This means that the design contains only a single type of hardware shader processor, which can execute all types of shader code, such as vertex shaders, fragment shaders, and compute shaders.
The exact number of shader cores in a silicon chip can vary. We license configurable designs to our silicon partners, who can then choose how to configure the GPU in their specific chipset, based on their performance needs and silicon area constraints.
For example, the Mali-T880 GPU can scale from a single core, for low-end devices, up to 16 cores for the highest performance designs.
The following diagram provides a top-level overview of the Control Bus and Data Bus of a typical Mali Midgard GPU:
To improve performance, and to reduce memory bandwidth wasted on repeated data fetches, the shader cores in the system all share access to a level 2 cache. The size of the L2 cache, while configurable by our silicon partners, is typically in the range of 32-64KB per shader core in the GPU.
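As a rough illustration of the sizing guidance above, the following sketch multiplies the typical 32-64KB per-core range by a core count. The function and constant names are hypothetical, and the result is only an estimate of what is typical; the actual L2 size in any chipset is chosen by the silicon partner.

```python
# Typical per-core L2 range quoted in the text above, in KB.
L2_KB_PER_CORE_MIN = 32
L2_KB_PER_CORE_MAX = 64


def typical_l2_range_kb(shader_cores: int) -> tuple:
    """Return the (low, high) typical total L2 size in KB for a core count."""
    if shader_cores < 1:
        raise ValueError("shader_cores must be at least 1")
    return (shader_cores * L2_KB_PER_CORE_MIN,
            shader_cores * L2_KB_PER_CORE_MAX)


# Illustrative core counts only; real configurations vary by chipset.
for cores in (1, 4, 8, 16):
    low, high = typical_l2_range_kb(cores)
    print(f"{cores:2d} cores: {low}-{high} KB of L2 cache")
```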
Our silicon partners can also configure the number and bus width of the memory ports that connect the L2 cache to external memory.
The Midgard architecture aims to write one 32-bit pixel, per core, per clock. Therefore, it is reasonable to expect an eight-core design to have a total of 256 bits (8 × 32 bits) of memory bandwidth, for both read and write, per clock cycle. This can vary between chipset implementations.
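The arithmetic behind that expectation can be sketched as follows. This is a back-of-the-envelope estimate, not a description of any particular chipset: the function names are hypothetical, and the 600 MHz clock in the usage lines is an assumed figure chosen only to make the numbers concrete.

```python
BITS_PER_PIXEL = 32  # one 32-bit pixel per core per clock, as stated above


def bits_per_clock(shader_cores: int) -> int:
    """Expected memory bandwidth per clock cycle, in bits, per direction."""
    return shader_cores * BITS_PER_PIXEL


def gbytes_per_second(shader_cores: int, clock_mhz: float) -> float:
    """Convert the per-clock figure to GB/s for an assumed GPU clock."""
    bits_per_second = bits_per_clock(shader_cores) * clock_mhz * 1e6
    return bits_per_second / 8 / 1e9


# Assumed example: an eight-core design running at a hypothetical 600 MHz.
cores, clock_mhz = 8, 600.0
print(f"{cores} cores: {bits_per_clock(cores)} bits per clock")
print(f"at {clock_mhz:.0f} MHz: about {gbytes_per_second(cores, clock_mhz):.1f} GB/s")
```

Because the memory port count and bus width are configurable, a real implementation may provide more or less bandwidth than this estimate.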
Once the application has completed defining the render pass, the Mali driver submits a pair of independent workloads for each render pass.
The first workload handles all geometry and compute related processing. The second workload handles the fragment-related processing. Because Mali GPUs are tile-based renderers, all geometry processing for a render pass must be complete before the fragment shading can begin.
Fragment processing needs a finalized tile rendering list, because this list provides the per-tile primitive coverage information that fragment shading depends on.
Midgard GPUs can support two parallel issue queues that the driver can use, one for each workload type. Geometry and fragment workloads from both queues can be processed in parallel by the GPU. This arrangement allows the workload to be distributed across all available shader cores in the GPU.
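The benefit of the two queues can be illustrated with a small, idealized timeline model. This is a sketch under simplifying assumptions, not a model of the real hardware scheduler: the RenderPass type, the schedule function, and all durations are hypothetical, and the model treats the two queues as if they never compete for shader cores, which real workloads do.

```python
from dataclasses import dataclass


@dataclass
class RenderPass:
    name: str
    geometry_ms: float  # assumed duration of the geometry/compute workload
    fragment_ms: float  # assumed duration of the fragment workload


def schedule(passes):
    """Return (name, geom_start, geom_end, frag_start, frag_end) per pass.

    Each queue runs its workloads in submission order. A fragment workload
    starts only when the geometry workload of the same render pass has
    finished and the fragment queue is free.
    """
    timeline = []
    geom_free = 0.0  # time at which the geometry queue is next idle
    frag_free = 0.0  # time at which the fragment queue is next idle
    for p in passes:
        geom_start = geom_free
        geom_end = geom_start + p.geometry_ms
        frag_start = max(geom_end, frag_free)
        frag_end = frag_start + p.fragment_ms
        timeline.append((p.name, geom_start, geom_end, frag_start, frag_end))
        geom_free = geom_end
        frag_free = frag_end
    return timeline


# Hypothetical render passes with made-up durations.
passes = [
    RenderPass("A", geometry_ms=2.0, fragment_ms=5.0),
    RenderPass("B", geometry_ms=3.0, fragment_ms=4.0),
    RenderPass("C", geometry_ms=1.0, fragment_ms=6.0),
]
timeline = schedule(passes)
for name, gs, ge, fs, fe in timeline:
    print(f"pass {name}: geometry {gs}-{ge} ms, fragment {fs}-{fe} ms")

overlapped_ms = timeline[-1][4]
serial_ms = sum(p.geometry_ms + p.fragment_ms for p in passes)
print(f"overlapped: {overlapped_ms} ms, fully serial: {serial_ms} ms")
```

In this model, the geometry workloads for passes B and C run while the fragment workload for pass A is still in progress, so the total time is shorter than running every workload back to back.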