Principles of High Performance

Knowing what to aim for

 

The smartphone industry has seen huge improvements in the performance of system-on-a-chip (SoC) designs in the last decade, with gains of up to 500% spanning the CPU, GPU, and memory system. These improvements finally make it possible to start treating high-end smartphones as a target for the types of graphics algorithms which would have previously only been possible on games consoles and desktop PCs. However, the mobile form factor still places some obstacles in the path of developers which must be intelligently avoided.

Effectiveness vs efficiency

One of our favorite quotes, from Peter Drucker's book on business management "The Effective Executive", is:

Efficiency is doing things right; effectiveness is doing the right things.

The underlying argument behind this quote is that businesses often spend too much time incrementally improving an existing way of working, rather than taking a step back and reviewing whether there is a more radical change which could give substantially better results. Although this was originally aimed at business managers, we find it a useful thing to bear in mind when developing software and optimizing graphics applications.

It's all about energy

The most significant performance limitation of smartphones is the form factor. Passively cooling a chip inside a sealed case is never an easy task, and the heat dissipation rate determines how much power can be sustainably drawn during gameplay. For most devices the SoC power budget will be between 2.5 and 3.5 watts, which often only leaves 1 to 1.5 watts for the GPU. The challenge for you, when trying to get the best performance and rendering quality out of these platforms, is to get as much useful work as possible out of that power budget.

Principle 1: Be effective

What does "effectiveness" – doing the right things – mean for high performance rendering, given the energy and thermal constraints of a smartphone? We'd argue that it means spending energy on CPU cycles, GPU cycles, and memory accesses which result in a useful improvement in the visible output on the screen. Any cycle or byte we spend on something which is not visible, or which doesn't justify its cost, is energy wasted and quality lost.

Step one for any optimization activity is therefore not to make the current rendering faster by fine tuning, but to ensure that the overall rendering pipeline and choice of algorithms are suitable for the performance capability and power budget of the device. In addition, it is always worth looking for workloads which can be removed completely because they do not contribute enough useful output to the current frame.

This might include activities such as:

  • Reviewing overall algorithm choices.
  • Reviewing the output resolution and color format of each render pass.
  • Ensuring the CPU is culling draw calls which are off-screen, or which are known to be occluded, before they are sent through the graphics API.
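
As a minimal sketch of the CPU-side culling mentioned in the last bullet, the following Python tests the bounding sphere of each draw call against a set of frustum planes and keeps only the draws which could be visible. The `Plane` and `Draw` types and the function names are hypothetical, chosen for illustration. A real engine would normally do this in native code against its own scene structures, and would need a separate technique to detect occluded objects.

```python
from dataclasses import dataclass

@dataclass
class Plane:
    # Plane equation nx*x + ny*y + nz*z + d = 0, with a unit-length
    # normal pointing towards the inside of the view frustum.
    nx: float
    ny: float
    nz: float
    d: float

@dataclass
class Draw:
    name: str
    center: tuple   # bounding-sphere centre (x, y, z) in world space
    radius: float   # bounding-sphere radius

def is_potentially_visible(draw, planes):
    """Return False only if the sphere is entirely outside some plane."""
    cx, cy, cz = draw.center
    for p in planes:
        signed_distance = p.nx * cx + p.ny * cy + p.nz * cz + p.d
        if signed_distance < -draw.radius:
            return False  # fully behind this plane, so off-screen
    return True

def cull(draws, planes):
    """Keep only the draws worth submitting through the graphics API."""
    return [d for d in draws if is_potentially_visible(d, planes)]
```

The test is conservative: a sphere near a frustum corner can pass even though it is just off-screen, but a visible object is never rejected, so the cost of a false positive is only some wasted work.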

Principle 2: Be efficient

What does "efficiency" – doing things right – mean for high performance rendering? Hopefully by this point in the development cycle you have already optimized for principle one to ensure that only useful work is sent to the GPU, so you now need to focus on optimizing the remaining workload which is contributing to the final render.

When working with gaming partners we often find efficiency is still generally about minimizing redundancy, but focuses more on refining the finer details of the draw calls which are used, rather than the macroscale algorithm choices.

This stage might include activities such as:

  • Ensuring API usage is efficient, with minimal state changes.
  • Ensuring models are well structured, with good locality and minimal precision in data buffers.
  • Ensuring textures are using appropriate data formats, texture compression, and filtering modes.
  • Reviewing shader programs and their execution cost.
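
One common way to reduce state changes is to sort draw calls so that draws sharing the same state are submitted together. The sketch below is an illustration under simplifying assumptions: each draw is reduced to a hypothetical `(program, texture)` pair, and every rebind counts as one state change. Reordering like this is only safe for opaque geometry, because transparent objects usually depend on draw order.

```python
def count_state_changes(draws):
    """Count program and texture rebinds needed to submit draws in order."""
    changes = 0
    bound_program = None
    bound_texture = None
    for program, texture in draws:
        if program != bound_program:
            changes += 1
            bound_program = program
        if texture != bound_texture:
            changes += 1
            bound_texture = texture
    return changes

def sort_by_state(draws):
    """Group draws by program first, then by texture within each program."""
    return sorted(draws, key=lambda d: (d[0], d[1]))

# Hypothetical frame which alternates between two materials.
frame = [("A", "t1"), ("B", "t2"), ("A", "t1"), ("B", "t2")]
unsorted_cost = count_state_changes(frame)
sorted_cost = count_state_changes(sort_by_state(frame))
```

The sort key puts the program first on the assumption that program changes are the more expensive switch; the right ordering for a given application is something to measure rather than assume.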

Principle 3: Data matters

Developers are used to writing and reviewing code, and so usually spend most of their optimization time looking at API calls and shader code, without really looking at the data they are passing to the GPU. This is nearly always a serious oversight.

GPUs are data-plane processors, so graphics rendering performance can be strongly influenced by data efficiency problems. Accessing external DDR memory is very energy intensive, so poorly sized or inefficiently packed data resources can rapidly consume a large share of the power budget.

Info 

A useful rule of thumb is that 1GB/s of memory access uses around 100mW of power. If you spend 600mW of your 2.5W power budget on memory access, that buys roughly 6GB/s of bandwidth, so an application running at 60 FPS only gets 100MB per frame to play with.
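
The rule of thumb above can be turned into a small budget calculator. This is a rough sketch: the 100mW per GB/s figure is the approximation quoted in the text, not a measured value for any particular device, and the function name is our own.

```python
MW_PER_GB_PER_SECOND = 100.0  # rule-of-thumb figure from the text

def bytes_per_frame(memory_power_mw, fps):
    """Estimate the memory traffic affordable per frame, in bytes."""
    gb_per_second = memory_power_mw / MW_PER_GB_PER_SECOND
    return gb_per_second * 1e9 / fps

# 600mW at 60 FPS: 6GB/s spread over 60 frames, or 100MB per frame.
budget_60 = bytes_per_frame(600, 60)

# The same power at 30 FPS doubles the per-frame allowance.
budget_30 = bytes_per_frame(600, 30)
```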

This stage might include activities such as:

  • Reviewing total bandwidth needs of the scene render pass graph.
  • Reviewing resolution of textures and framebuffers.
  • Reviewing use of texture compression and mipmapping.
  • Reviewing models for good data locality and minimal data precision.
  • Reviewing shader programs and their execution cost.
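
To illustrate what minimal data precision can mean for a model, the sketch below compares a vertex stored entirely as 32-bit floats with a packed layout using 16-bit floats for position and texture coordinates and signed 8-bit normalized values for the normal. The layouts and helper names are assumptions made for this example. Whether 16-bit positions are accurate enough depends on the size and detail of the mesh, so treat this as a starting point for experiments rather than a recommended format.

```python
import struct

# Full precision: position (3 x fp32), normal (3 x fp32), uv (2 x fp32).
FULL = struct.Struct("<3f3f2f")

# Packed: position (3 x fp16 + 1 pad), normal (3 x snorm8 + 1 pad), uv (2 x fp16).
PACKED = struct.Struct("<4e4b2e")

def to_snorm8(value):
    """Map a float in [-1, 1] to a signed 8-bit normalized integer."""
    return max(-127, min(127, round(value * 127.0)))

def pack_vertex(position, normal, uv):
    """Encode one vertex using the packed layout."""
    return PACKED.pack(
        position[0], position[1], position[2], 0.0,
        to_snorm8(normal[0]), to_snorm8(normal[1]), to_snorm8(normal[2]), 0,
        uv[0], uv[1],
    )

vertex = pack_vertex((1.0, 2.0, 3.0), (0.0, 1.0, 0.0), (0.5, 0.5))
full_size = FULL.size      # bytes per vertex at full precision
packed_size = len(vertex)  # bytes per vertex in the packed layout
```

In this example the packed layout halves the bytes read per vertex, which reduces both memory footprint and the bandwidth needed every time the mesh is drawn.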

Principle 4: It's art

Finally, it is important to remember that graphics is an art, not a science. In most cases there is no single right answer, and we are just trying to render something which looks good. If an optimized version of something is not bit-exact it is unlikely anyone will actually notice, so don't be afraid to play with the algorithms and use faster approximations if it helps streamline performance.
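
As one hedged illustration of trading exactness for speed, the sketch below compares the piecewise sRGB-to-linear conversion with two cheaper approximations and measures how far each one drifts from the reference. The choice of sRGB decoding is ours, picked only because it is a familiar case. Whether the resulting error is acceptable is an artistic decision to make by looking at the output, not something the numbers alone can settle.

```python
def srgb_to_linear_exact(c):
    """Reference sRGB decode using the standard piecewise definition."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def srgb_to_linear_gamma22(c):
    """Cheaper approximation: a single power function."""
    return c ** 2.2

def srgb_to_linear_square(c):
    """Cheapest approximation: one multiply."""
    return c * c

def max_abs_error(approx, steps=256):
    """Largest absolute difference from the reference over [0, 1]."""
    samples = [i / (steps - 1) for i in range(steps)]
    return max(abs(approx(c) - srgb_to_linear_exact(c)) for c in samples)

error_gamma22 = max_abs_error(srgb_to_linear_gamma22)
error_square = max_abs_error(srgb_to_linear_square)
```

The cheaper the approximation, the larger the error, so the useful question is always whether the difference is visible in the final image.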

Conclusions

High quality real-time rendering in a mobile form factor is a reality on modern smartphones, but it requires the developer to write effective and efficient applications which make the best use of the available system power budget.

There are many details to get right, ranging from high-level application algorithm choices, all the way down to the fine detail of object mesh data encoding. Other articles on this site will explore key topics related to efficient use of Mali GPUs in more detail.