Arm Streamline helps you optimize software for devices that use Arm® processors. Evaluate where the software in your system spends most of its time by capturing a performance profile of your application running on a target device. Use interactive charts and comprehensive data visualizations to quickly determine whether your performance bottleneck relates to CPU processing or GPU rendering.
With Arm Streamline, you can:
- Find hotspots in your code to target for software optimization.
- Identify the processor that is the major bottleneck in the performance of your application.
- Use CPU performance counters to provide insights into L1 and L2 cache efficiency, enabling cache-aware profiling.
- Identify the source of heavy rendering loads that lead to poor GPU performance.
- Use GPU performance counters to identify workload inefficiencies.
- Reduce device power consumption and improve energy efficiency by optimizing workloads using performance counters from the CPU, GPU, and memory system.
For CPU bottlenecks, use the native profiling functionality to locate specific problem areas in your application code. Investigate how processes, threads, and functions behave, from high-level views down to line-by-line source code analysis. The basic profile is built by sampling the program counter (PC) of each running thread at regular intervals, which identifies the hotspots in the running application. Hardware performance counters provided by the target processors can supplement this analysis by adding hardware events, such as cache misses and branch mispredictions, to the hotspot data.
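To make the idea of sample-based hotspot analysis concrete, the following Python sketch shows how periodic PC samples could be attributed to functions and ranked. It is an illustration of the general technique only, not how Arm Streamline is implemented: the symbol table, address ranges, sample values, and the helper names `function_for_pc` and `hotspots` are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1200, "decode_frame"),
    (0x1800, "blend_pixels"),
]
END_OF_TEXT = 0x2000  # Hypothetical end of the code region.


def function_for_pc(pc):
    """Map a sampled PC to the function whose address range contains it."""
    if pc < SYMBOLS[0][0] or pc >= END_OF_TEXT:
        return "<unknown>"
    starts = [start for start, _ in SYMBOLS]
    # The containing function is the last one that starts at or before pc.
    index = bisect_right(starts, pc) - 1
    return SYMBOLS[index][1]


def hotspots(pc_samples):
    """Return (function, share of samples) pairs, hottest first."""
    counts = Counter(function_for_pc(pc) for pc in pc_samples)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common()]


# Hypothetical PC values, as if captured at a fixed sampling interval.
SAMPLES = [0x1804, 0x1810, 0x1210, 0x1850, 0x1008, 0x18F0, 0x1220, 0x1900]

for name, share in hotspots(SAMPLES):
    print(f"{name:14s} {share:.1%}")
```

Because the samples are taken at regular intervals, a function's share of the samples approximates its share of CPU time. The estimate becomes more reliable as the number of samples grows, so very short captures can misrepresent functions that run only briefly.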
For GPU bottlenecks, use performance data from the Mali™ GPU driver and hardware performance counters to explore the rendering workload efficiency. Visualize the workload breakdown, pipeline loading, and execution characteristics to quickly identify where to apply rendering optimizations.