GPU optimizations

This section of the guide shows how you can use various graphics optimizations to reduce the work that your CPU and GPU must do to run your game.


Use static batching

Static batching is a common optimization technique that reduces the number of draw calls, and therefore application processor use.

Unity performs dynamic batching transparently, but cannot apply it to objects that are made of many vertices. This is because the computational overhead becomes too large.

Static batching can work on objects that are made of many vertices, but the batched objects must not move, rotate, or scale during rendering.

To enable Unity to group objects for static batching, mark them as static in the Inspector window.

The following image shows static batching settings:

Static batching settings
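
Alternatively, you can combine objects from a script at runtime with the built-in StaticBatchingUtility. The following is a minimal sketch, assuming all the meshes to batch are children of one root object and share materials:

using UnityEngine;

public class CombineAtRuntime : MonoBehaviour
{
    void Start()
    {
        // Statically batch all child meshes under this root. The combined
        // objects must not move, rotate, or scale afterwards.
        StaticBatchingUtility.Combine(gameObject);
    }
}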


Use 4x MSAA

Arm Mali GPUs can do 4x Multi-Sample Anti-Aliasing (MSAA) with a low computational overhead.

You can enable 4x MSAA in:

  • The Universal Render Pipeline (URP) settings if you are using the URP. The following image shows the setting:

MSAA settings

  • The Unity Quality Settings if you are not using URP. Select Edit > Project Settings > Quality.
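
If you need to change this setting from a script, for example per quality tier, you can use the built-in QualitySettings property when you are not using URP. The following is a minimal sketch:

using UnityEngine;

public class EnableMsaa : MonoBehaviour
{
    void Awake()
    {
        // Request 4x MSAA (valid values are 0, 2, 4, and 8).
        // With URP, set MSAA on the pipeline asset instead.
        QualitySettings.antiAliasing = 4;
    }
}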


Use Level of Detail

Level of Detail (LOD) is a technique in which the Unity engine renders different meshes for the same object at different distances from the camera. Geometry is more detailed when the object is close to the camera. The LOD is reduced as the object moves away from the camera. At the furthest distance, you can use a planar-aligned billboard.

To set up LOD groups to manage the meshes that you use and the associated distance ranges, select Add Component > Rendering > LOD Group.

From Unity 5, you can set a Fade Mode for each LOD level to blend adjacent LODs. Fade Mode smooths the transition between LODs. Unity calculates a blend factor from the screen size of the object and passes it to your shader; you must implement the geometry blending in the shader yourself. The following image shows the LOD group settings:

Level of Detail group settings
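
You can also configure a LOD Group from a script. The following is a minimal sketch, assuming three child renderers named HighDetail, MediumDetail, and LowDetail (illustrative names):

using UnityEngine;

public class SetUpLods : MonoBehaviour
{
    void Start()
    {
        LODGroup group = gameObject.AddComponent<LODGroup>();

        // Thresholds are relative screen heights below which each level
        // stops being used; 60%, 30%, and 10% are illustrative values.
        LOD[] lods = new LOD[3];
        lods[0] = new LOD(0.60f, new[] { transform.Find("HighDetail").GetComponent<Renderer>() });
        lods[1] = new LOD(0.30f, new[] { transform.Find("MediumDetail").GetComponent<Renderer>() });
        lods[2] = new LOD(0.10f, new[] { transform.Find("LowDetail").GetComponent<Renderer>() });
        group.SetLODs(lods);

        // Fade Mode, as described above; the blend factor is passed to your shader.
        group.fadeMode = LODFadeMode.CrossFade;
        group.RecalculateBounds();
    }
}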


Avoid mathematical functions in custom shaders

When writing custom shaders, minimize the use of expensive built-in mathematical functions, because they increase the time the shader takes to run. Expensive functions include:

  • pow()
  • exp()
  • log()
  • cos()
  • sin()
  • tan()
  • sqrt()
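
Where an exact result is not needed, you can often replace an expensive call with cheaper arithmetic. The following is a minimal HLSL sketch, assuming ndoth holds the clamped dot product of the normal and half vectors (the specular calculation is illustrative):

// Expensive: pow() with a small integer exponent.
float specular = pow(ndoth, 4.0);

// Cheaper: expand the power into multiplications.
float ndoth2 = ndoth * ndoth;
float specularFast = ndoth2 * ndoth2;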


Use lightmaps and light probes

Runtime lighting calculations are computationally expensive. A popular technique to reduce the computational cost is called lightmapping. Lightmapping pre-computes the lighting calculations and bakes them into a texture called a lightmap. With a lightmap, you lose the flexibility of a fully dynamically lit environment. However, you get high-quality images without affecting performance.

To bake the resulting lighting in a static lightmap, in the Inspector window for the geometry that is receiving the lighting:

  1. Set the geometry to Static.
  2. Under Mesh Renderer > Lighting:
    1. Check the Contribute Global Illumination option.

    2. Set Receive Global Illumination to Lightmaps.

    The settings are shown in the following image:

    Settings for geometry receiving global illumination

  3. In the Window > Rendering > Lighting Settings > Scene tab, select the Baked Global Illumination option.
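
You can script the same setup in the Editor. The following is a minimal sketch that runs on the currently selected objects; the menu name is illustrative, and the ContributeGI flag name assumes a recent Unity version:

using UnityEditor;
using UnityEngine;

public static class BakeHelper
{
    [MenuItem("Tools/Bake Selected Lighting")]
    static void Bake()
    {
        // Mark each selected object to contribute to global illumination,
        // then run a synchronous lightmap bake.
        foreach (GameObject go in Selection.gameObjects)
        {
            GameObjectUtility.SetStaticEditorFlags(go, StaticEditorFlags.ContributeGI);
        }
        Lightmapping.Bake();
    }
}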

To see the resulting lightmap, select the geometry and do either of the following:

  • Open the Lighting window and view the Baked Lightmaps.
  • Open the Inspector window for the object and select Baked Lightmap > Open Preview.

If the Continuous Baking option is selected, Unity bakes the lightmap and updates the scene in the Editor window in seconds. If the Continuous Baking option is not selected, select Generate Lighting to update the scene.

A quick way to check that the lightmap is set up correctly is to run the game in the Editor window and disable the light. If the lighting is still there, the lightmap has been created correctly and is in use.

The following image shows a baked lightmap as opened from either the Inspector window’s Lightmapping section or the Lighting window:

The baked lightmap

The following image shows the Editor window displaying lighting from a green light at the end of a cave. The lighting is generated with a static lightmap:

Adding a light to bake a static lightmap

The following image shows the result of the static lightmap in the Ice Cave demo:

Lightmapped cave

Setting up lightmapping

To prepare an object for lightmapping, you need:

  • A model in your scene with lightmap UVs. You can generate these UVs by selecting Generate Lightmap UVs when importing the mesh.
  • The model set as Lightmap Static. Objects that are not marked as static are not placed in the lightmap.
  • A light within range of the model, with its Baking type set to Baked.

These settings are unlikely to be perfect the first time, so experiment to see what works best for your game.

To set up lightmapping:

  1. From the main menu, select Window > Rendering > Lighting Settings.
  2. Select one of the following tabs:
    • Scene
    • Baked Lightmaps
    • Real-time Lightmaps (if you are not using the URP)

Scene-level lighting options

The options in the Lighting > Scene tab can have a big impact on performance. Here, we review a few of the important ones:

  • In the Environment section, the Environment Reflections > Bounces option is the most important from the performance point of view. Reflection Bounces defines the number of inter-reflections between reflective objects, that is, the number of times the probes that see those objects are baked. If the reflection probes are updated at runtime, extra bounces can have a large, negative impact on performance. Only set the number of bounces higher than one if the reflective objects are visible in the probes.
  • Environment settings

  • In the Mixed Lighting section:
    • Select a Lighting Mode. The different modes have significant performance variations; you can review them on Unity's documentation site.
    • In URP, there is no option for Real-time Global Illumination. However, you can select Baked Global Illumination to create lightmaps, as shown in the following image:
    • Mixed Lighting settings

  • In the Lightmapping Settings:
    • Select the Lightmapper > Progressive GPU option for fast updates while setting up your scene. Select Progressive CPU for the final version.
    • Set the lightmap texture to be compressed by selecting the Compress Lightmaps option. Compressing lightmap textures requires less storage space and less bandwidth at runtime. However, the compression process can add artifacts to the texture.
    • Be careful with the Directional Mode option. When Directional Mode is set to Directional instead of Non-Directional, an extra lightmap is created to store the dominant direction of incoming light. As a result, Directional mode requires about twice as much storage space and video memory as Non-Directional. If you cannot use deferred lighting with dual lightmaps, directional lightmaps are another technique you may want to use: they enable normal mapping and specular lighting without real-time lights. Use directional lightmaps if normal mapping must be preserved but dual lightmaps are not available, as is typical on mobile devices.
    • Reduce Lightmap Resolution, Lightmap Padding, and Lightmap Size to the lowest values that do not cause artifacts.

The following image shows lightmapping settings in line with the recommendations in this section:

Lightmapping Settings

Review the baked lightmaps

Use the Baked Lightmaps tab to preview the impact of the lightmap settings you selected in the Scene tab.

The following image shows lightmaps in the Lighting tab:

Lightmaps in the Lighting tab

Object-level lighting options

You can modify the object settings that impact the lightmapping process in the Inspector window.

You can change the following options for your object:

  • If you select a light, set Mode to Baked, Real-time, or Mixed. Setting most lights to Baked, or at least Mixed, ensures that the number of calculations at runtime is relatively low. The mode is a top-level lighting setting and has a huge impact on performance.

    The following image shows the Light settings with a mixed mode:

    Light settings with Mixed Mode

  • Shadow options also need careful consideration. To learn more about shadows, see the section Limit shadow complexity.
  • If you select an object with geometry, the two most important options for performance are:
    • Lightmapping > Stitch seams, which you will often want to use for getting correct lightmapping on complex objects.
    • Lighting > Contribute Global Illumination and Receive Global Illumination set to Lightmaps for a static geometry, as shown in the following image:

      Global Illumination settings

Use light probes for dynamic objects in your game

Light probes add dynamic lighting to lightmapped scenes. Light probes take a sample, or probe, of the lighting in an area. If the probes form a volume, or cell, the lighting is interpolated between these probes, depending on their position within the cell.

The more probes there are, the more accurate the lighting is. You do not typically require many light probes because there is interpolation between probes. This means the lighting at any position can be approximated by interpolating between the samples that are taken by the nearest probes. You require more light probes in areas where there are large changes in light color or intensity.

Be careful when you are placing the light probes. For the meshes that you want to be influenced by the probes, select Receive Global Illumination and Light Probes. The following image shows light probe settings for an object with a mesh (in Object Settings):

Probes settings

The following image shows multiple light probes:

Multiple light probes in a scene
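
From a script, the Inspector's Light Probes setting corresponds to the renderer's lightProbeUsage property. The following is a minimal sketch for a dynamic object:

using UnityEngine;
using UnityEngine.Rendering;

public class UseLightProbes : MonoBehaviour
{
    void Start()
    {
        // Blend between the baked probes nearest to this object, which is
        // the same as selecting Light Probes > Blend Probes in the Inspector.
        GetComponent<MeshRenderer>().lightProbeUsage = LightProbeUsage.BlendProbes;
    }
}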


Use ASTC texture compression

ASTC texture compression is an official extension to the OpenGL and OpenGL ES graphics APIs, and a core feature in the Vulkan API. ASTC can reduce your application’s memory requirements and the GPU’s memory bandwidth use.

ASTC offers texture compression with high quality and low bitrate, and has many options. ASTC includes the following features:

  • Bit rates range from 8 bits per pixel (bpp) to less than 1 bpp. This enables you to fine-tune the trade-off of file size against quality.
  • Support for 1-4 color channels.
  • Support for both Low Dynamic Range (LDR) and High Dynamic Range (HDR) images.
  • Support for 2D and 3D images.
  • Support for selecting different combinations of features.

You can turn on ASTC for all textures in File > Build Settings. However, this compression setting is global for all textures, and the Don’t override option means that the default format is used. We suggest leaving Texture Compression set to Don’t override and applying texture-specific settings instead.

The following image shows a setting of Don't override, rather than ASTC:

Texture Compression set to Don’t override

The following image shows the section in Inspector > Texture Settings that deals with platform-specific formatting:

Platform-specific texture settings

If you select Best for Compressor Quality, the compressor tries many different ASTC block options to find the best result for quality and size. If you select Fast, the compressor tries only the most promising options, which are often not the best ones. Best therefore provides better quality than Fast, at the cost of slower compression. You may want to select Fast while you are experimenting with your scene and want to iterate quickly, and Best for your final version.

There are several block sizes available in the ASTC settings window. Larger block sizes provide higher compression. Select large block sizes for textures that are not shown in detail, for example, objects far away from the camera. Select smaller block sizes for textures that show more detail, for example, those closer to the camera.

The following image shows the block sizes for different texture compression formats:

Texture compression block sizes

Note:

  • OpenGL ES 2 does not support ASTC. Therefore, if you want to target OpenGL ES 2 devices and use texture compression, you may want to build a separate Android Application Package (APK) for those devices and use ETC encoding.
  • Most Android devices now support ASTC, so normally you should use ASTC to compress the textures in your 3D content. If you are targeting devices that do not support ASTC, try using ETC2.
  • You must differentiate textures that are used in 3D content from textures that are used in Graphical User Interface (GUI) elements. In some cases, you might want to leave the GUI textures uncompressed to avoid unwanted artifacts.

Each ASTC texture type needs a different compression format to achieve the best possible results.

Texture compression algorithms have different channel formats, typically RGB and RGBA. ASTC supports several other formats, but these formats are not exposed within Unity.

Each texture type is typically used for a different purpose, such as:

  • Standard texturing.
  • Normal mapping.
  • Specular.
  • HDR.
  • Alpha.
  • Look-up textures.

The following image shows the texture types in the Inspector window:

Texture Type in Import Settings

Unity typically imports your texture as the Default type, which is suitable for most images, but use the more specialist types where appropriate. For example, it is important to specify when a texture is a normal map, because ASTC must compress normal maps differently: their RGB components do not correlate with color.

Selecting individual settings for all your textures improves the visual quality of your project and avoids unnecessary texture data at compression time.

For example, the following image shows settings for a GUI texture with some transparency. Because the texture is for a GUI, sRGB and Mip Maps are disabled. To include transparency, you need the alpha channel. To improve performance with an alpha channel, select Override for Android and choose an appropriate ASTC block size and compression format.

Default texture settings
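
You can apply such per-platform overrides automatically at import time with an AssetPostprocessor. The following is a minimal Editor-script sketch, assuming a recent Unity version; the 6x6 block size is an illustrative middle ground:

using UnityEditor;

public class AstcImportSettings : AssetPostprocessor
{
    void OnPreprocessTexture()
    {
        // Override the Android target to use ASTC with a 6x6 block size.
        TextureImporter importer = (TextureImporter)assetImporter;
        TextureImporterPlatformSettings android = importer.GetPlatformTextureSettings("Android");
        android.overridden = true;
        android.format = TextureImporterFormat.ASTC_6x6;
        importer.SetPlatformTextureSettings(android);
    }
}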

The following table shows the compression ratio for the available ASTC block sizes in Unity for an RGBA 8 bits per channel texture with a 1024x1024 pixel resolution at 4 MB in size:

ASTC block size   Size     Compression ratio   Bits per texel
4x4               1 MB     4.00                8.00
5x5               655 KB   6.25                5.12
6x6               455 KB   9.00                3.56
8x8               256 KB   16.00               2.00
10x10             164 KB   24.97               1.28
12x12             144 KB   35.93               0.89


Mipmapping

Mipmapping is a texturing technique that can enhance both the visual quality and performance of your game.

Mipmaps are pre-calculated versions of a texture at different sizes. Each texture that is generated is called a level, and it is half as wide and half as high as the preceding level. Unity can automatically generate the complete set of levels from the first level at the original size down to a 1x1 pixel version.

If a texture does not have mipmap levels, and is bigger than the area (in pixels) that it covers, the GPU scales the texture down to fit the smaller area. Scaling the texture down increases the load on the GPU, and leads to inaccuracies that damage the quality. If a texture does have mipmap levels, the GPU fetches pixel data from the level that is closest to the object size to render the texture. Fetching the correct level ensures a higher quality image, compared to not using mipmaps. It also reduces the GPU workload, because rather than scale down while displaying the game, the GPU fetches a level that was produced earlier.

To generate the mipmaps:

  1. In the Project window, select a texture and for Texture Type, select Advanced.
  2. In the Inspector window, select Generate Mip Maps.

The following image shows mipmap settings:

Mipmap settings

The disadvantage of mipmapping is that it requires 33% more memory to store the texture data.

Note: Textures that are used in a 2D UI do not usually need mipmapping. UI textures are typically rendered on screen without scaling, so that they only use the first level in the mipmap chain. To change this setting, in the texture’s Inspector window, either:

  • For Texture Type select Editor GUI and Legacy GUI.
  • For Texture Type select Default, and clear the Generate Mip Maps checkbox.
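
The following is a minimal Editor-script sketch that applies both rules at import time, assuming UI textures live in a folder named UI (an illustrative convention):

using UnityEditor;

public class MipmapImportSettings : AssetPostprocessor
{
    void OnPreprocessTexture()
    {
        // Generate mipmaps for 3D-content textures; skip them for UI
        // textures, which are rendered on screen without scaling.
        TextureImporter importer = (TextureImporter)assetImporter;
        importer.mipmapEnabled = !assetPath.Contains("/UI/");
    }
}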


Use cubemaps for skyboxes

Games and other applications often use skyboxes to generate backgrounds. There are several methods to implement them. One method is drawing the skybox by rendering the background of the camera using a single cubemap.

Drawing a skybox with a cubemap requires one cubemap texture and one draw call. Compared to other methods, a cubemap uses less memory, memory bandwidth, and draw calls.
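
The following is a minimal sketch of assigning a cubemap skybox from a script, assuming a material that uses the built-in Skybox/Cubemap shader:

using UnityEngine;

public class SetSkybox : MonoBehaviour
{
    // A material using the built-in Skybox/Cubemap shader, assigned in the Inspector.
    public Material cubemapSkybox;

    void Start()
    {
        RenderSettings.skybox = cubemapSkybox;
    }
}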


Limit shadow complexity

Shadows add perspective and realism to your scenes. Without shadows, it can sometimes be difficult to understand the depth of objects, especially if they look like other surrounding objects.

Shadow algorithms can be complex, especially when rendering accurate, high-resolution shadows. Ensure you use the simplest shadowing possible to enhance performance.

For example, the Ice Cave demo implements custom shadows: shadows based on local cubemaps are combined with shadows that are rendered at runtime.

Unity has several options for shadows in the URP object that can impact the performance of your game. The URP options are shown in the following image:

URP shadow options

Note: If you are not using URP, the shadow options are at Edit > Project Settings > Quality.

Two of the URP shadow options that impact performance are soft shadows and shadow distance:

  • Soft shadows look more realistic but take longer to calculate than hard shadows.
  • The Shadow Distance option defines the distance from the camera within which shadows appear. Increasing the shadow distance increases the number of visible shadows, which increases the computational load. Reducing the shadow distance concentrates the shadow map's texels over a smaller area, which effectively increases the resolution of your shadows.

You can use hard shadows with a small shadow distance and a high resolution. This combination produces reasonable-quality shadows near the camera without too much computational cost.
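
When you are not using URP, you can also trim the shadow distance from a script. The following is a minimal sketch with an illustrative value:

using UnityEngine;

public class TrimShadowDistance : MonoBehaviour
{
    void Awake()
    {
        // Keep shadows within 40 m of the camera. With URP, set
        // Shadow Distance on the pipeline asset instead.
        QualitySettings.shadowDistance = 40.0f;
    }
}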

Lightmapped objects do not produce real-time shadows, so the more static shadows you can bake into the scene, the fewer real-time calculations the GPU does.

The following image shows an alien character with a shadow:

Alien casting shadow

Use real-time shadows sparingly

Real-time shadows can dramatically enhance the realism of a scene, but they are computationally expensive, so use them sparingly.

On mobile devices in particular, try to limit the number of lights that include real-time shadows, and use lightmapping instead. Usually, moving characters should cast real-time shadows but not receive them; instead, they should receive the baked shadows from static objects. Static objects, on the other hand, should receive real-time shadows from the characters, but cast shadows only into the baked lightmaps, not in real time.
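The following is a minimal sketch of this character setup, applied to every renderer under a character root:

using UnityEngine;
using UnityEngine.Rendering;

public class CharacterShadowSetup : MonoBehaviour
{
    void Start()
    {
        // Cast real-time shadows, but do not receive them; lighting and
        // shadows on the character come from the baked data instead.
        foreach (Renderer r in GetComponentsInChildren<Renderer>())
        {
            r.shadowCastingMode = ShadowCastingMode.On;
            r.receiveShadows = false;
        }
    }
}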

Consider whether, in the URP object, additional lights need to cast shadows. You can set this option for each individual light.

For the URP object, consider the following settings:

  • Use Shadow Resolution to balance quality and processing time. There is a range of options between 256 and 4,096 pixels, and your choice will depend on the number of objects and lights in the scene.
  • Use Shadow Cascades to balance quality and processing time. You can set the shadow cascades to zero, two, or four. Cascaded Shadow Maps are used for directional lights to achieve good shadow quality, especially for long viewing distances. A higher number of cascades produces better quality but increases processing overhead.

    The following image shows lighting and shadows settings for good performance:

    Lighting and shadow settings


Set up occlusion culling

Occlusion culling disables the rendering of objects when they are obscured from the view of the camera. This process saves GPU processing time by rendering fewer objects.

Unity automatically performs frustum culling when objects exit the camera frustum completely. However, depending on the style of your application, there might be other objects that cannot be seen by the camera and therefore do not need to be rendered.

Unity includes an occlusion culling system with most settings at Window > Rendering > Occlusion Culling.

The settings that you use for occlusion culling depend on the style of your game. You must be careful picking settings, because incorrect settings can degrade performance.
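
You can also trigger the occlusion bake from an Editor script. The following is a minimal sketch; the menu name is illustrative:

using UnityEditor;

public static class OcclusionBake
{
    [MenuItem("Tools/Bake Occlusion Culling")]
    static void Bake()
    {
        // Equivalent to pressing Bake in Window > Rendering > Occlusion Culling.
        StaticOcclusionCulling.Compute();
    }
}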


Use OnBecameVisible() and OnBecameInvisible() callbacks

If you use the callbacks MonoBehaviour.OnBecameVisible() and MonoBehaviour.OnBecameInvisible(), Unity notifies your scripts when their associated game objects move in or out of a camera frustum.

You can use these callbacks to optimize the rendering process. For example, rendering reflections on a pool with a second camera and render targets involves rendering geometry and combining textures off screen before rendering to the final screen surface. This technique is relatively expensive, so only use it when necessary. You are only required to render a reflection when it is visible. For example, you do not need a reflection if:

  • The reflection surface is not in the camera frustum.
  • An opaque object is in front of the surface.

The following code checks conditions with the OnBecameVisible() and OnBecameInvisible() callbacks from the reflective surface:

void OnBecameVisible()
{
    enabled = true;
}

void OnBecameInvisible()
{
    enabled = false;
}

Even with these checks in place, sometimes a reflection is rendered off screen even though it is not visible onscreen. To avoid this situation, you can add another condition, for example, that the camera must be inside the room of the reflective surface. The following code shows another condition being added to avoid a reflection rendering offscreen:

// The inside flag tracks whether the camera is within the trigger volume
// around the reflective surface.
bool inside;

void OnBecameVisible()
{
    if (inside == false)
    {
        return;
    }
    enabled = true;
}

void OnBecameInvisible()
{
    if (inside == false)
    {
        return;
    }
    enabled = false;
}

void OnTriggerEnter(Collider other)
{
    inside = true;
}

void OnTriggerExit(Collider other)
{
    inside = false;
}

The preceding conditions restrict the rendering of reflections to specific areas of the game. This means that you can add effects in other, less compute intensive areas of the game.


Specify the rendering order

In a scene, the object rendering order is very important for performance. If objects are rendered in random order, an object might be rendered and then be occluded by another object in front of it. This means that all the computations to render the occluded object were wasted.

Various software and hardware techniques exist to reduce the amount of computation wasted on occluded objects. However, you can manually improve this process because you know how a player explores your scene.

Early-Z is one of the hardware techniques for reducing wasted computation, and is available on Arm Mali GPUs from the Mali-T600 series onwards. Early-Z performs a depth test before the fragment shader runs: it checks whether the pixel being processed is already covered by a nearer pixel and, if it is, does not execute the fragment shader. Without Early-Z, the depth test is executed after the fragment shader, which can be computationally expensive, and the computations are wasted if the fragment is occluded. Early-Z provides performance benefits, but it is sometimes automatically disabled, for example, if the fragment shader modifies the depth by writing to the gl_FragDepth variable, or if the fragment shader calls discard.

To assist this system in achieving maximum efficiency, ensure that opaque objects are rendered from front to back. This rendering helps to reduce the overdraw factor in scenes with only opaque objects.

Ordering the rendering of each frame front to back can be expensive, and it is incorrect if you render transparent objects in the same pass. However, Arm Mali GPUs from the Mali-T620 onwards provide a mechanism called Forward Pixel Kill (FPK). Mali GPUs are pipelined, so multiple fragment threads can be executing concurrently for the same pixel position. If a newer opaque fragment covers that position, the FPK system stops the older threads that are still in flight for it. The effect is a reduction in wasted computation.

Unity provides queue options inside the shader to specify the order of rendering, so that rendering does not rely entirely on hardware inference. If you set the rendering queue in a shader, objects whose materials use that shader are rendered together. Within such a rendering group, the order of rendering is generally unspecified, although there are exceptions, for example, the transparent queue is sorted back to front.

You can override a rendering group for a material in the material script. By default, Unity provides some standard groups that are rendered from first to last in the following order:

Name          Value   Notes
Background    1000    -
Geometry      2000    Default, used for opaque geometry.
AlphaTest     3000    Alpha-tested geometry, for example foliage, drawn after all opaque objects.
Transparent   4000    This group is rendered in back-to-front order to provide the correct results.
Overlay       5000    Overlay effects, for example, user interface, lens flares, dirty lens.

The integer values can be used instead of their string names. These are not the only values available. You can specify other queues using an integer value between those that are shown. The higher the number, the later it is rendered. For example, you can use one of the following instructions to render a shader after the Geometry queue, but before the AlphaTest queue:

Tags { "Queue" = "Geometry+1" } 
Tags { "Queue" = "2001" } 

In the Ice Cave demo, the cave covers large parts of the screen and its shaders are expensive. Where possible, parts were not rendered, because rendering them could degrade performance.

Rendering order optimization was included in the Ice Cave demo after looking at the composition of the framebuffers with tools such as the Unity Frame Debugger and the Graphics Analyzer, which show the rendering order.

Use the Unity Frame Debugger to debug in play mode and step through the sequence of draw calls that Unity executes.

In the Ice Cave demo, scrolling down the draw calls shows that the cave is rendered first. This means that the other objects are rendered into the scene afterwards, overdrawing parts of the cave that are already rendered. Another example is the reflective crystals, which in some scenes are occluded by the cave. In these cases, setting a later rendering order (a higher queue value) for the crystals means that their fragment shaders are not executed where they are occluded, resulting in a reduction in computations.


Consider using depth prepass

Setting the rendering order for objects to avoid overdraw is useful, but it is not always possible to specify the rendering order for each object. For example, if you have a set of objects that the camera can rotate around freely, you cannot specify their order ahead of time, because objects that were previously at the back can now appear at the front. In this case, if there is a static rendering order set for these objects, some objects might be drawn last, even if they are occluded. This can also happen if an object can cover parts of itself. In cases like this, a depth prepass can sometimes reduce overdraw.

Usually, a depth prepass is an optimization that reduces performance rather than improving it. However, it can be worth evaluating whether a depth prepass will help if your game:

  • Is fragment bound.
  • Has spare capacity on the CPU and vertex shader.
  • Has scenes with significant transparent parts.

Make sure to check the effect on performance and remember that Forward Pixel Kill (FPK) will already reduce a lot of the wasted computation.

In Unity, to perform a depth prepass for objects with custom shaders, add an extra pass to your shaders.

An extra pass that renders to the depth buffer only is shown in the following code:

// Extra pass that renders to the depth buffer only
Pass
{
    ZWrite On
    ColorMask 0
}

After adding this pass, the frame debugger shows that the objects are rendered twice. The first time that they are rendered there are no changes in the color buffer, because the depth prepass renders the geometry without writing colors in the frame buffer.

This process initializes the depth buffer for each pixel with the depth of the nearest visible object. After this prepass, the geometry is rendered as usual. However, using the Early-Z technique, only the objects that contribute to the final scene are rendered.

Extra vertex shader computations are required for this technique, because the vertex shader runs twice for each object: once to fill the depth buffer and once for the actual rendering.

Tip: You can see the depth buffer by choosing it in the top-left menu of the frame debugger.
