Performance profiling on Android

Arm Streamline supports data capture on non-rooted Android devices. It collects CPU and Mali GPU performance data, so you can profile your game or app without modifying the device. To configure a capture, use a built-in template to select the most appropriate set of counters for your target device, or use a template as the starting point for a customized data visualization.

About Streamline

Arm Streamline helps you optimize software for devices that use Arm® processors. Capture a performance profile of your application running on a target device to see where the software in your system spends most of its time. Use the interactive charts and comprehensive data visualizations to quickly determine whether a performance bottleneck relates to CPU processing or GPU rendering.

With Arm Streamline, you can:

  • Find hotspots in your code to target for software optimization.
  • Identify the processor that is the major bottleneck in the performance of your application.
  • Use CPU performance counters to provide insights into L1 and L2 cache efficiency, enabling cache-aware profiling.
  • Identify the source of heavy rendering loads that lead to poor GPU performance.
  • Use GPU performance counters to identify workload inefficiencies.
  • Reduce device power consumption and improve energy efficiency by optimizing workloads
    using performance counters from the CPU, GPU, and memory system.
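
As a small illustration of the cache efficiency item above, a miss ratio can be derived from a pair of counter totals. The sketch below is not part of Streamline. It assumes you already have counter totals for one capture window, and every value in it is made up. L1D_CACHE, L1D_CACHE_REFILL, L2D_CACHE, and L2D_CACHE_REFILL are Arm PMU event names, but which events are available depends on your target CPU.

```python
def miss_ratio(accesses: int, refills: int) -> float:
    """Return refills / accesses, or 0.0 when there were no accesses."""
    if accesses <= 0:
        return 0.0
    return refills / accesses


# Hypothetical counter totals for one capture window (not real measurements).
window = {
    "L1D_CACHE": 2_000_000,        # L1 data cache accesses
    "L1D_CACHE_REFILL": 100_000,   # L1 data cache refills (misses)
    "L2D_CACHE": 120_000,          # L2 cache accesses
    "L2D_CACHE_REFILL": 30_000,    # L2 cache refills (misses)
}

l1_ratio = miss_ratio(window["L1D_CACHE"], window["L1D_CACHE_REFILL"])
l2_ratio = miss_ratio(window["L2D_CACHE"], window["L2D_CACHE_REFILL"])

print(f"L1D miss ratio: {l1_ratio:.1%}")
print(f"L2 miss ratio: {l2_ratio:.1%}")
```

A ratio like this is most useful when you compare it before and after a change to the same workload, rather than reading it as an absolute score.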

For CPU bottlenecks, use the native profiling functionality to locate specific problem areas in your application code. Investigate how processes, threads, and functions behave, from high-level views down to line-by-line source code analysis. The basic profile regularly samples the program counter (PC) of the running threads, which identifies the hotspots in the running application. Hardware performance counters provided by the target processors can supplement this analysis, so that hotspot analysis also takes account of hardware events such as cache misses and branch mispredictions.
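
To make the sampling idea concrete, here is a minimal sketch of how a list of PC samples can be turned into a per-function hotspot ranking. It is not how Streamline is implemented. The symbol table, the addresses, and the function names (update_physics, render_scene, mix_audio) are all hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "update_physics"),
    (0x1400, "render_scene"),
    (0x1C00, "mix_audio"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbolize(pc: int) -> str:
    """Map a sampled PC to the function with the nearest start address at or below it."""
    index = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1] if index >= 0 else "<unknown>"


def hotspots(samples):
    """Count samples per function and return (name, share) pairs, hottest first."""
    counts = Counter(symbolize(pc) for pc in samples)
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common()]


# Made-up PC samples, as if taken at a regular interval.
samples = [0x1010, 0x1404, 0x1408, 0x1020, 0x1410, 0x1C04, 0x1500, 0x1418]

for name, share in hotspots(samples):
    print(f"{name}: {share:.1%} of samples")
```

Because the samples are taken at a regular interval, a function's share of the samples approximates its share of CPU time, which is why more samples give a more reliable ranking.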

For GPU bottlenecks, use performance data from the Mali™ GPU driver and hardware performance counters to explore the rendering workload efficiency. Visualize the workload breakdown, pipeline loading, and execution characteristics to quickly identify where to apply rendering optimizations.

Optimize for 64-bit architecture

Most of today’s mobile devices are based on Arm’s 64-bit architecture, and games and applications that take advantage of it can see significant performance gains. Streamline supports 64-bit by default in all versions, which makes it easier to tune applications for the best performance.


Annotate your code to link notable sections with performance data

Streamline enables you to annotate code in your application or game, so that you can relate parts of your code to the performance data that is generated. Annotations are customizable. For example, you can see how long your physics simulation takes per frame, or whether your job manager is scheduling work effectively.
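
The idea of marking up named regions can be sketched as follows. This is a generic stand-in and does not use the Streamline annotation API. The region helper, the region_totals table, and the placeholder workloads are all hypothetical, and exist only to show how a per-frame physics timing could be collected.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical stand-in for an annotation API: accumulated seconds per named region.
region_totals = defaultdict(float)


@contextmanager
def region(name: str):
    """Time the enclosed block and add its duration to region_totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        region_totals[name] += time.perf_counter() - start


def run_frame():
    with region("physics"):
        sum(i * i for i in range(10_000))    # placeholder physics work
    with region("render"):
        sorted(range(5_000), reverse=True)   # placeholder render work


FRAMES = 3
for _ in range(FRAMES):
    run_frame()

for name, total in region_totals.items():
    print(f"{name}: {total / FRAMES * 1000:.3f} ms per frame")
```

In a real capture, the annotated regions appear alongside the counter data, so a spike in a counter can be matched to the part of the frame that was running at the time.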