Performance Expectations

Set a realistic performance goal for your application before you start optimizing. A Mali Utgard GPU can perform the following work in a single clock cycle:

  • Issue one new thread per shader core per clock.
  • Retire and blend one fragment per shader core per clock.
  • Write one pixel per shader core per clock.
  • Issue one instruction per shader core per clock.
  • Process 14 FP16 operations per clock.
  • Read 64 bits of uniform data per clock.
  • Interpolate 64 bits of varying data per clock.
  • Sample one bilinear filtered texel per clock.
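The per-clock rates above can be turned into a rough per-pixel cycle budget. The following is a minimal sketch, not an Arm tool: the function name `cycles_per_pixel` and the example configuration (4 shader cores at 500 MHz, 1080p at 60 FPS) are assumptions chosen only for illustration.

```python
def cycles_per_pixel(shader_cores, clock_hz, width, height, target_fps):
    """Average shader core cycles available per on-screen pixel per frame.

    This is an upper bound: it assumes every core is busy on every clock
    and ignores vertex work, overdraw, and memory stalls.
    """
    cycles_per_second = shader_cores * clock_hz
    pixels_per_second = width * height * target_fps
    return cycles_per_second / pixels_per_second


if __name__ == "__main__":
    # Hypothetical configuration; substitute the values for your device.
    budget = cycles_per_pixel(
        shader_cores=4,
        clock_hz=500_000_000,
        width=1920,
        height=1080,
        target_fps=60,
    )
    print(f"Budget: {budget:.1f} cycles per pixel")  # roughly 16 cycles
```

Because each shader core issues at most one instruction per clock, this budget also bounds the average number of fragment shader instructions per pixel, and every layer of overdraw divides it further.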

Platform Dependencies

The performance of a Mali GPU in any specific chipset depends on both the configuration choices made in the silicon implementation and the form factor of the final device that the chipset is used in.

Some characteristics, including the number of shader cores and the size of the GPU L2 cache, are set by the logical GPU configuration that the silicon partner has built.

Other characteristics depend on the memory system’s logical configuration, such as the memory latency, bandwidth, DDR memory type, and how memory is shared between multiple users.

Some characteristics depend on analogue silicon implementation choices, such as which silicon process was used, the target top frequency, and the DVFS voltage and frequency choices available at runtime.

Finally, some characteristics depend on the physical form factor of the device, because this determines the available power budget. Therefore, an identical chipset can deliver very different peak performance in devices with different form factors.

For example:

  • A small smartphone has a sustainable GPU power budget of 1-2 Watts.
  • A large smartphone has a sustainable GPU power budget of 2-3 Watts.
  • A large tablet has a sustainable GPU power budget of 4-5 Watts.
  • An embedded device with a heat sink may have a GPU power budget of up to 10 Watts.

When these factors are combined, it can be difficult to predict the performance of any GPU implementation from the GPU product name, core count, and top frequency alone. If you are unsure, write some test scenarios that behave like your own real use cases, and then run them to see how well they perform on your target devices.
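Once a test scenario has logged its frame times on a target device, a short script can summarize how well it met the goal. The following is a minimal sketch under stated assumptions: the function name `summarize_frame_times`, the 60 FPS target, and the sample data are hypothetical, and how the frame times are captured on the device is left open.

```python
import statistics


def summarize_frame_times(frame_times_ms, target_fps=60):
    """Summarize per-frame times (in milliseconds) against a frame rate target."""
    if not frame_times_ms:
        raise ValueError("frame_times_ms must not be empty")

    budget_ms = 1000.0 / target_fps
    ordered = sorted(frame_times_ms)

    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    p95_ms = ordered[rank - 1]

    # Frames that took longer than the per-frame time budget.
    missed = sum(1 for t in ordered if t > budget_ms)

    return {
        "budget_ms": budget_ms,
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": p95_ms,
        "missed_fraction": missed / len(ordered),
    }


if __name__ == "__main__":
    # Hypothetical measurements; replace with times logged on your device.
    sample = [16.0] * 18 + [20.0, 33.0]
    for name, value in summarize_frame_times(sample).items():
        print(f"{name}: {value:.3f}")
```

Comparing the 95th percentile and the missed fraction across devices, rather than only the mean, makes thermal throttling and occasional slow frames visible.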
