June 16, 2026

Mori to Nanite: Billions of triangles on mobile

Discover how Nanite scales to mobile, delivering richer geometry, improved visuals, and balanced performance in the Unreal Engine Mori demo

By Powen Yang


Unreal Nanite is a virtualized geometry system for creating and rendering highly detailed 3D content. Nanite uses a highly compressed mesh format and a cluster-based streaming architecture. It displays only the pixel-scale detail visible to the camera, eliminating traditional polygon budgets and the need for manual Level of Detail (LOD) creation. This enables developers to import and use film-quality assets, such as ZBrush sculpts and photogrammetry scans, while sustaining exceptional performance. Nanite removes much of the work required for mesh optimization and LOD generation. It helps artists work faster and improves visual fidelity. Nanite can handle scenes composed of billions of polygons, which makes it useful for testing next-generation mobile GPUs in real-time graphics workloads.

Mori is an in-house Unreal Engine demo project developed to showcase the capabilities of the latest Arm mobile GPUs, delivering console-quality visuals on portable devices. Whilst it already demonstrates high rendering fidelity, Mori also provides an ideal platform for evaluating Nanite integration. Evaluating Nanite in this context helps assess its ability to increase geometric detail and improve rendering efficiency on mobile hardware. 

We conducted our testing on a Vivo X200 Pro device equipped with an Arm Immortalis-G925 MC12 GPU. The project runs on a modified version of Unreal Engine 5.5.2, using the desktop renderer on mobile devices with Vulkan Shader Model 5 support. For more information, see the Arm Mobile, Graphics, and Gaming blogs. After we enabled Nanite support, the game ran as expected; however, the initial performance was suboptimal. Without optimization, the device achieved an average frame rate of about 15 FPS, which is below an acceptable threshold for most real-time applications.

Dense forest scene in the Mori demo showcasing Nanite on mobile. Frame rate analysis of a Nanite-enabled scene averaging 15.75 FPS.

Profiling and identifying bottlenecks

The first step is to collect performance data and identify bottlenecks. You can use several tools to achieve this, including:

Arm Streamline capture showing Mali GPU performance metrics during Nanite rendering.

  • RenderDoc for Arm GPU: A graphics debugger that provides detailed frame introspection and supports API features and extensions available on the latest Arm GPUs.

Arm Graphics Analyzer capture of a Nanite-rendered forest scene on mobile hardware.

  • Unreal Insights: A standalone profiling and performance analysis tool for Unreal Engine that enables developers to record, collect, analyze, and visualize performance data from applications and games. Developers can view the live trace or save the trace file and transfer it to a PC for further investigation.

Unreal Insights trace showing CPU and GPU timing data for a Nanite-enabled scene.

  • GPU Visualizer: An Unreal tool for quickly profiling GPU performance, available via the console command ProfileGPU or the shortcut Ctrl + Shift + , (comma). Although it is an editor tool, it still provides accurate profiling data for identifying performance bottlenecks.

GPU Visualizer capture highlighting rendering hotspots in Nanite materials.

  • DumpGPU: DumpGPU is a console command available on multiple platforms that enables developers to export intermediate RDG textures and buffers to disk for analysis and debugging of rendering issues.

DumpGPU analysis of Nanite rendering resources in Unreal Engine.

Unreal Engine also provides a Nanite Visualization Mode to help developers identify common Nanite-related issues. You can access this mode by entering the command r.Nanite.Visualize [mode] in the console or by using the viewport menu options. An advanced visualization mode, enabled with the r.Nanite.Visualize.Advanced 1 console command, provides additional information for performance analysis and debugging.

Unreal Engine Nanite visualization overlays used to analyze rendering performance.

Tip

Nanite Cluster visualization showing cluster distribution on a tree model.

If the visualization mode flickers, disable anti-aliasing with the console command ShowFlag.AntiAliasing 0, which can improve stability.

Investigating WPO and masked materials

We used Unreal Insights, the GPU Visualizer, and the r.Nanite.ShowMeshDrawEvents 1 console command to investigate performance. This command provides additional visibility into Nanite's operation, particularly during the rasterization phase. We found that certain materials consumed a disproportionate amount of rendering time. Further investigation revealed that these materials used World Position Offset (WPO) and masked blending, both of which introduce significant challenges for Nanite. When WPO displacement is applied, Nanite divides meshes into smaller clusters, each with its own bounds that must be culled individually on the GPU. Excessive or unbounded WPO increases the number of clusters and, in turn, the culling overhead. Masked materials are also more expensive than opaque ones, as masked-out pixels have a similar rendering cost to fully rendered pixels. On Mali GPUs, WPO can be slow because it requires more complex shaders, leading to register spilling.

Pixel Programmable materials cannot use the hardware rasterizer on Mali, which further compounds performance issues. These limitations are specific to the Mali architecture. We have implemented a fix, but it requires Shader Model 6 (SM6). Together, these factors made WPO and masked blending the primary contributors to the performance bottlenecks observed.

Nanite draw event log showing material-level rendering costs during rasterization.

Nanite Overdraw visualization highlighting costly foliage and vegetation in a forest scene.

Visualization mode: Evaluate WPO (green: with WPO; red: no WPO)

Nanite Raster Bins visualization showing software and hardware rasterization across a forest scene.

Visualization mode: Pixel Programmable (red: pixel programmable material; purple: non-pixel-programmable material)

Note: The following features are considered pixel programmable:

  • Alpha masking
  • Two-sided
  • Pixel depth offset
  • World position offset (vertex animation)
  • Custom UVs

After the investigation, we performed 2 targeted tests to verify the underlying causes of the performance issue.

  • Disabling WPO: When WPO was disabled, we observed no noticeable change in visual quality, while performance improved to about 25 FPS.

Nanite-enabled forest scene in the Mori demo running on Arm mobile hardware. Performance graph showing frame rate improvement to 27.33 FPS with WPO disabled.

  • Disabling masked materials: This adjustment had a substantial impact on visual appearance. However, performance improved significantly, with frame rates increasing to almost 40 FPS.

Forest scene with masked materials disabled, showing simplified foliage and improved performance. Performance graph showing frame rates increasing to 35.75 FPS with WPO and masked materials disabled.

The test results show that WPO and masked materials are the main causes of the observed performance degradation. We investigated this issue and developed engine patches to improve the performance of these materials. Pixel Programmable materials could not originally use the Software Rasterizer on Mali; however, an engine patch resolves this limitation. We recommend avoiding reliance on these features where possible to ensure optimal performance.
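Frame rates are easier to compare as frame times, because savings in milliseconds add up while FPS gains do not. The following is a minimal sketch that converts the average frame rates shown in the performance graphs above (15.75, 27.33, and 35.75 FPS) into frame times and reports how much each change saved. It assumes that the reported average FPS is representative of the average frame time; the function and variable names are illustrative only.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate in frames per second to a frame time in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Average frame rates taken from the performance graphs in this article.
measurements = [
    ("baseline (WPO + masked)", 15.75),
    ("WPO disabled", 27.33),
    ("WPO and masked disabled", 35.75),
]


def savings_ms(steps):
    """Return (label, frame_time_ms, saved_vs_previous_ms) for each step in order."""
    rows = []
    previous = None
    for label, fps in steps:
        current = frame_time_ms(fps)
        saved = 0.0 if previous is None else previous - current
        rows.append((label, current, saved))
        previous = current
    return rows


for label, ms, saved in savings_ms(measurements):
    print(f"{label:26s} {ms:6.2f} ms  (saved {saved:5.2f} ms)")
```

Under that assumption, disabling WPO accounts for the larger share of the frame-time reduction, even though disabling masked materials produces the higher final frame rate.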

Tips for WPO

When WPO cannot be fully disabled, for example when it drives material-based rotation, it can be optimized through selective control. Developers can toggle WPO with the Evaluate World Position Offset option or set a distance threshold using World Position Offset Disable Distance. This enables WPO only when necessary to maintain visuals and reduce performance overhead.

Unreal Engine editor showing the Evaluate World Position Offset setting used to disable WPO.

Unreal Engine editor showing the World Position Offset Disable Distance setting for performance optimization.
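The interaction between the two settings can be pictured with a small model. This is a conceptual sketch only, not Unreal Engine code: the class, the function, and the rule that a distance of 0 means "never disable" are assumptions made for illustration, and the engine applies the real logic per instance internally.

```python
from dataclasses import dataclass


@dataclass
class WpoSettings:
    # Models the "Evaluate World Position Offset" toggle (hypothetical field name).
    evaluate_wpo: bool = True
    # Models "World Position Offset Disable Distance" in world units.
    # Assumption: 0 means WPO is never disabled by distance.
    disable_distance: float = 0.0


def wpo_is_evaluated(settings: WpoSettings, distance_to_camera: float) -> bool:
    """Return True if this simplified model would evaluate WPO for an instance."""
    if not settings.evaluate_wpo:
        return False
    if settings.disable_distance <= 0:
        return True
    return distance_to_camera < settings.disable_distance


near_only = WpoSettings(evaluate_wpo=True, disable_distance=5000.0)
for distance in (1200.0, 4999.0, 8000.0):
    print(distance, wpo_is_evaluated(near_only, distance))
```

With a threshold in place, only instances close to the camera pay the WPO cost, which is where the animation is most visible.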

Evaluating Nanite with high-polygon assets

To further evaluate Nanite, we developed a prototype to test its capabilities under more demanding conditions. In this experiment, we replaced the original trees in the scene with 2 high-polygon models containing about 76,000 and 634,000 triangles respectively. The goal was to evaluate how well Nanite manages and renders assets with much higher geometric complexity compared to traditional scenes. By increasing the polygon count, we evaluated whether Nanite could maintain performance on mobile hardware while rendering dense, film-quality geometry.

Tree models: ~30k faces (original tree), ~76k faces, ~634k faces

High-polygon Nanite tree model showing increased foliage density and geometric detail.

Scenes: 650k, 3M, 30M

Nanite performance statistics showing the effect of increasing r.Nanite.MaxPixelsPerEdge.

Nanite performance statistics showing the effect of tuning r.Nanite.MinPixelsPerEdgeHW.

Nanite performance statistics showing the effect of using a Packed Level Actor for tree instancing.

Average frame rate by tree model, with Nanite disabled and enabled:

Tree model     Nanite disabled   Nanite enabled
~30k faces     33.71 FPS         41.96 FPS
~76k faces     26.74 FPS         43.96 FPS
~634k faces     5.02 FPS         38.23 FPS

The performance results from the Nanite and non-Nanite configurations were encouraging, strongly indicating that Nanite can be effectively leveraged in this scenario. Based on these results, we moved to the next stage of the evaluation, focusing on enhancing visual quality.
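To put the comparison in numbers, the following sketch computes the Nanite speedup for each tree model and how much of the 30k-face frame rate each configuration retains at 634k faces. The figures are the averages reported above; the helper names are illustrative.

```python
# Average FPS reported above, as (nanite_disabled, nanite_enabled) per tree model.
results = {
    "~30k faces": (33.71, 41.96),
    "~76k faces": (26.74, 43.96),
    "~634k faces": (5.02, 38.23),
}


def speedup(disabled_fps: float, enabled_fps: float) -> float:
    """Ratio of the Nanite-enabled frame rate to the Nanite-disabled frame rate."""
    return enabled_fps / disabled_fps


def retained_fraction(fps_values) -> float:
    """Fraction of the first frame rate that the last frame rate still achieves."""
    return fps_values[-1] / fps_values[0]


for model, (off, on) in results.items():
    print(f"{model:12s} speedup with Nanite: {speedup(off, on):.2f}x")

disabled = [off for off, _ in results.values()]
enabled = [on for _, on in results.values()]
print(f"retained at 634k, Nanite disabled: {retained_fraction(disabled):.1%}")
print(f"retained at 634k, Nanite enabled:  {retained_fraction(enabled):.1%}")
```

The retained fraction is the more telling figure here: it shows how gently the Nanite path degrades as triangle counts grow by more than an order of magnitude.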

Building Nanite-ready content

Based on the outcomes of the previous experiments and prototype evaluations, we defined a set of specifications for modifying and improving assets for Nanite. These specifications aim to maintain visual fidelity and stable performance on mobile hardware:

  • High polygon counts: High-polygon objects capture richer details. Because Nanite can handle very high polygon counts, we focused on representing as much detail as possible in elements such as bark and leaves. We created a tree model with about 8.66 million triangles: 3.75 million for the trunk and 4.91 million for the foliage.


Branches and foliage
Lit mode: High-detail Nanite tree branches and foliage rendered in Unreal Engine.
Lighting only: Tree asset rendered without textures to highlight Nanite geometry and foliage density.
Wireframe: Nanite Triangles visualization showing dense foliage geometry in a high-detail tree asset.

Trunk
Lit mode: Close-up of a high-detail Nanite tree trunk showing bark texture and geometric detail.
Lighting only: Geometry-only view of a Nanite tree trunk highlighting bark detail.
Wireframe: Nanite Triangles visualization showing dense bark geometry on a high-polygon tree trunk.

Pine branch
Lit mode: Close-up of a high-detail Nanite pine branch showing individual needle geometry.
Lighting only: Untextured pine needle geometry showing detailed foliage structure in a Nanite asset.
Wireframe: Nanite Triangles visualization showing dense pine needle geometry on a high-detail branch.
  • Material restrictions: To maintain efficiency, materials should avoid WPO and masked blending, as both increase performance overhead. We applied a standard PBR material setup with base color, roughness, metallic, and occlusion maps. We also packed the occlusion, roughness, and metallic maps into the RGB channels of a single ORM texture to reduce texture usage.

Texture maps shown: Base Color, Roughness, Normal, Occlusion, ORM
Bark texture map used for a high-detail Nanite tree trunk. Bark normal map used to add surface detail to a Nanite tree trunk.
  • Consistency across assets: The same guidelines apply to all scene elements, including grass, twigs, rocks, and other environmental details. This ensures consistent rendering behavior across the project.
  • Terrain optimization: In the original Mori game, the terrain used the landscape system. However, the mesh detail remained insufficient, even after increasing the landscape resolution. To improve visual quality, we developed 2 sets of terrain meshes. The first mesh covers the playable area and contains about 4 million polygons to capture fine details. The second mesh represents distant regions beyond the playable space. It contains about 400,000 polygons because less detail is required. Both meshes share a set of three 8K textures: base color, normal, and an ORM map. This approach balances resource efficiency and visual quality.

Terrain comparison showing visual artifacts caused by low mesh density and their resolution with a high-detail terrain mesh.

High-detail terrain mesh showcasing rocky ground and landscape detail in the Mori Nanite demo.

Nanite Cluster visualization showing terrain mesh subdivision across a high-detail landscape.
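The ORM packing described in the material restrictions above can be sketched as follows. This is a minimal illustration in plain Python, assuming each map is a same-sized grid of 8-bit grayscale values; the function name is hypothetical, and a real pipeline would do this in a texture tool or image library rather than with nested lists.

```python
def pack_orm(occlusion, roughness, metallic):
    """Pack three same-sized grayscale images into one RGB image.

    Each input is a list of rows of 0-255 integers. The result stores
    occlusion in R, roughness in G, and metallic in B for every pixel.
    """
    if not (len(occlusion) == len(roughness) == len(metallic)):
        raise ValueError("all maps must have the same height")
    packed = []
    for o_row, r_row, m_row in zip(occlusion, roughness, metallic):
        if not (len(o_row) == len(r_row) == len(m_row)):
            raise ValueError("all maps must have the same width")
        packed.append(list(zip(o_row, r_row, m_row)))
    return packed


# A tiny 2x2 example: one texture sample now carries all three values.
occlusion = [[255, 200], [180, 255]]
roughness = [[128, 130], [140, 90]]
metallic = [[0, 0], [0, 255]]

orm = pack_orm(occlusion, roughness, metallic)
print(orm[1][1])  # the (occlusion, roughness, metallic) triple for the bottom-right pixel
```

Packing this way replaces three single-channel textures with one three-channel texture, so the material samples one texture instead of three.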

Optimizing the final scene

After upgrading the assets to improve visual fidelity, the frame rate fell below the 30 FPS performance goal. To address this gap, we optimized the system to improve performance while maintaining visual quality. We explored optimization methods, including:

  • r.Nanite.MaxPixelsPerEdge: This console variable controls the maximum screen-space edge length, in pixels, of Nanite triangles, and so directly affects triangle density. Lower values produce more triangles for higher visual detail, while higher values simplify the mesh to improve performance. The results show that increasing the value improves performance by reducing the total number of rendered triangles. However, values above a certain threshold can introduce visible artifacts, so we tuned this parameter to balance visual quality and performance.

r.Nanite.MaxPixelsPerEdge: 1

Nanite Cluster visualization showing terrain and vegetation cluster distribution in the Mori scene.

r.Nanite.MaxPixelsPerEdge: 50

Nanite Cluster visualization showing optimized cluster distribution across terrain and vegetation.

Final Nanite-enabled Mori forest scene rendered with r.Nanite.MaxPixelsPerEdge set to 1, 3, and 10.

r.Nanite.MaxPixelsPerEdge   Average frame rate
1                           21.87 FPS
3                           33.74 FPS
10                          41.06 FPS
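A rough geometric estimate helps explain why r.Nanite.MaxPixelsPerEdge has such a strong effect on triangle count. The sketch below is not how Nanite selects detail (the engine uses its own cluster hierarchy and error metrics); it only assumes that a screen region is tiled by equilateral triangles of a given edge length, so the count falls with the square of that length. The function name and the 1920x1080 coverage figure are illustrative assumptions.

```python
import math


def estimated_triangles(covered_pixels: float, pixels_per_edge: float) -> float:
    """Estimate how many equilateral triangles of a given edge length tile an area.

    An equilateral triangle with edge e covers (sqrt(3) / 4) * e**2 pixels.
    """
    if pixels_per_edge <= 0:
        raise ValueError("pixels_per_edge must be positive")
    area_per_triangle = (math.sqrt(3) / 4.0) * pixels_per_edge ** 2
    return covered_pixels / area_per_triangle


# Assumption: geometry covers a full 1920x1080 frame with no overdraw.
covered = 1920 * 1080
for edge in (1, 3, 10):
    print(f"edge {edge:2d} px -> ~{estimated_triangles(covered, edge):,.0f} triangles")
```

In this model, going from 1 to 10 pixels per edge cuts the triangle count by a factor of 100. The measured frame rates improve far less than that, which is consistent with triangle count being only one part of the frame cost.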

  • r.Nanite.MinPixelsPerEdgeHW: This console variable sets the triangle edge length, in pixels, at which Nanite switches from software to hardware rasterization. Triangles above this threshold use the GPU hardware rasterizer for efficiency, while smaller, pixel-scale triangles use the Nanite software rasterizer. This value controls the balance between hardware and software rasterization, so developers can tune it for their performance and visual quality targets. The results show that setting r.Nanite.MinPixelsPerEdgeHW=64 improves performance by about 1 ms. However, further adjustments do not consistently improve performance. The effect of this parameter depends on the underlying hardware, as the balance between software and hardware rasterization can vary between devices. Developers should test different MinPixelsPerEdgeHW values to identify the optimal configuration for their target platforms.

Nanite statistics captured at different r.Nanite.MinPixelsPerEdgeHW values show how clusters are distributed between the software and hardware rasterizers.

r.Nanite.MinPixelsPerEdgeHW   Average frame rate
8                             25.15 FPS
16                            29.50 FPS
32                            32.41 FPS
64                            34.25 FPS
128                           23.44 FPS
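Because the best r.Nanite.MinPixelsPerEdgeHW value is device-specific and the results are not monotonic, a sweep is the practical way to choose it. The sketch below picks the best value from the averages measured above. It assumes you have already collected one average frame rate per setting on the target device; the dictionary and function names are illustrative.

```python
# Average FPS per r.Nanite.MinPixelsPerEdgeHW value, as measured above on the test device.
sweep = {8: 25.15, 16: 29.50, 32: 32.41, 64: 34.25, 128: 23.44}


def best_setting(results):
    """Return the (value, fps) pair with the highest average frame rate."""
    if not results:
        raise ValueError("results must not be empty")
    return max(results.items(), key=lambda item: item[1])


value, fps = best_setting(sweep)
print(f"r.Nanite.MinPixelsPerEdgeHW={value} ({fps:.2f} FPS on this device)")
```

Note that performance drops sharply between 64 and 128 on this device, so it is worth sampling values on both sides of the apparent optimum rather than assuming that a larger value is always better.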

  • Packed Level Actor (PLA): Unreal Engine’s Packed Level Actor combines multiple static meshes into a single optimized actor, converting them into Instanced Static Meshes (ISMs) or Hierarchical Instanced Static Meshes (HISMs). This process reduces the actor count and enhances GPU efficiency, making it particularly useful for large-scale environments and set dressing. In our scene, all trees were static meshes, so we grouped them into a PLA using instancing. This improved rendering efficiency and reduced rendering time by about 1 to 2 ms.

Packed level actor of the forest

Configuration   Average frame rate
Without PLA     30.98 FPS
With PLA        32.66 FPS
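The idea behind the Packed Level Actor step can be illustrated with a small grouping sketch. This is a conceptual model, not the Unreal Engine API: it assumes each placed tree is a (mesh name, transform) pair and collapses all actors that share a mesh into one instanced batch. The mesh names and transforms are made up for the example.

```python
from collections import defaultdict


def group_into_instances(actors):
    """Group (mesh_name, transform) pairs by mesh.

    Each key in the result stands for one instanced batch, and its value
    is the list of per-instance transforms for that mesh.
    """
    batches = defaultdict(list)
    for mesh_name, transform in actors:
        batches[mesh_name].append(transform)
    return dict(batches)


# Hypothetical scene: five tree actors that use two distinct meshes.
# Transforms are simplified to (x, y, z) positions.
actors = [
    ("SM_Pine_A", (0.0, 0.0, 0.0)),
    ("SM_Pine_B", (350.0, 120.0, 0.0)),
    ("SM_Pine_A", (800.0, -40.0, 12.0)),
    ("SM_Pine_A", (1210.0, 300.0, 5.0)),
    ("SM_Pine_B", (1500.0, -220.0, 0.0)),
]

batches = group_into_instances(actors)
print(f"{len(actors)} actors -> {len(batches)} instanced batches")
```

The scene content is unchanged, but the engine has far fewer separate objects to manage, which is where the saving comes from.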

These performance optimizations helped us to balance visual fidelity and performance. The goal was to achieve the frame rate required for a smooth and responsive mobile experience while maintaining high visual fidelity. The following images show a comparison between the original Mori project and the Nanite-enabled version, highlighting the visual enhancements realized through this approach.

Comparison pairs: original Mori first, then Mori with Nanite.
Original tree asset used as a baseline for comparing Nanite visual quality improvements. High-detail Nanite tree asset showcasing dense foliage, branches, and bark geometry.
Original tree canopy rendered without Nanite, used as a visual quality baseline. Upward view of a high-detail Nanite tree canopy showing dense branch and foliage geometry.
Original terrain rendered without Nanite, used as a visual quality baseline. Nanite-enhanced terrain showcasing detailed ground geometry in the Mori forest scene.
Original terrain rendered without Nanite, used as a visual quality baseline. Nanite-enhanced forest terrain showing detailed grass, rocks, and ground geometry.
Original Mori forest scene rendered without Nanite, used as a visual quality baseline. Nanite-enhanced Mori forest scene showcasing detailed vegetation and terrain on mobile hardware.
Original pine tree asset rendered without Nanite, used as a visual quality baseline. Upward view of a Nanite-enhanced pine tree showing detailed bark, branches, and foliage.

Nanite improved visual quality in the Mori project. It enabled the use of highly detailed assets with very high polygon counts while maintaining a balanced level of performance on mobile hardware. The project demonstrates that Nanite can improve visual quality while maintaining reasonable performance on mobile devices. However, there is still more work to do. For example, the scene feels static due to the absence of wind and foliage animation, such as leaf and grass movement. The limited variety of materials also reduces environmental detail and variation. These areas provide opportunities for further refinement and experimentation in future iterations.
