An In-Depth Look at Streamline

Make your software as efficient as the hardware it's running on

Streamline performance analyzer gives you better insight into how your software executes than ever before, making it easy to optimize for Arm. Whether you're making your games more immersive by pushing the envelope of what Mali GPUs can deliver, or simply finding hotspots in source code to make an SoC as efficient as possible, Streamline is the ideal tool.

The following features are available in DS-5 Ultimate and Professional Editions. DS-5 Community Edition provides a limited subset. For common questions, error messages and tips, read the Streamline FAQ.



Your dashboard for performance overview

The Timeline view shows at a glance how your system is performing. Charts in the top half show the values of each counter, and the bottom half shows details of process execution. This is where your optimization process starts.

By default, Streamline will collect a predefined set of hardware performance counters, but you're free to add or remove counters to suit your needs.



Filter by processes and threads

Simply click an individual process to expand it and show its worker threads. The Timeline view changes accordingly, filtering out activity that is not related to the selected thread or process.

Hold Ctrl to select multiple processes or threads and show an aggregate view. When a process has a large number of threads, Streamline also lets you filter them using the funnel icon.

Kernel activity, which is shown in red, appears by default in the timeline view. You can select the kernel process to view it in isolation, just as you would with any other process.


Multicore, multicluster and big.LITTLE

Charts will initially appear stacked, giving you an overall impression of system performance. Streamline makes it super simple to look at per-core or per-cluster performance. Clicking the arrows on the charts will expand them, allowing you to spot classic errors like accidentally running a single core at 100% whilst the others idle.
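Expanding a chart per core is what makes this kind of imbalance visible. As a rough illustration of the check you are doing by eye, the Python sketch below flags sample periods in which exactly one core is saturated while every other core is idle. The per-core utilization lists and the 95% / 5% thresholds are assumptions made for the example; they are not values Streamline computes or exports.

```python
def find_single_core_saturation(samples, busy=95.0, idle=5.0):
    """Return indices of sample periods where exactly one core is at or above
    `busy` percent utilization and every other core is at or below `idle`.

    `samples` is hypothetical data: one list of per-core utilization
    percentages for each sample period."""
    flagged = []
    for i, cores in enumerate(samples):
        if len(cores) < 2:
            continue  # an imbalance needs at least two cores
        saturated = [u for u in cores if u >= busy]
        others_idle = all(u <= idle for u in cores if u < busy)
        if len(saturated) == 1 and others_idle:
            flagged.append(i)
    return flagged


# Four cores, three sample periods (made-up numbers).
timeline = [
    [100.0, 2.0, 0.0, 1.0],    # one core pegged, the rest idle: flagged
    [60.0, 55.0, 58.0, 61.0],  # balanced load
    [98.0, 97.0, 3.0, 0.0],    # two busy cores: not the single-core pattern
]
print(find_single_core_saturation(timeline))  # prints [0]
```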

Processes and threads can also be viewed per core or cluster, or as an overall heat map. Blue contention dashes show where too many processes are trying to access a core at once, and red dashes show where I/O access is slowing the system down.



Add custom annotations

Much as you might add printf statements when debugging, you can add annotations to your code to give a Streamline report extra context.

Simply include the streamline_annotate.h header file and add the necessary macros to begin annotating your code.

Visual annotations are just as easy to add, enabling you to capture images and add them to your Streamline report.



Overlay charts and customize expressions

For metrics not covered by individual counters, you can use the settings for each chart to configure customized expressions. This can be useful for quick calculations within Streamline (for an example of this, read our tutorial on Mali GPU optimization).
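As an example of the kind of derived metric an expression can produce, instructions per cycle (IPC) divides the Instructions executed counter by the Clock cycles counter over the same interval. The Python sketch below only shows the arithmetic; it does not use Streamline's own expression syntax, and the per-interval counter values are invented.

```python
def instructions_per_cycle(instructions, cycles):
    """Compute IPC for each sample interval, returning None for intervals
    where no cycles were counted (to avoid dividing by zero)."""
    return [
        (i / c) if c else None
        for i, c in zip(instructions, cycles)
    ]


# Hypothetical per-interval counter values for one core.
instructions = [1_200_000, 450_000, 0]
cycles = [1_000_000, 900_000, 0]
print(instructions_per_cycle(instructions, cycles))  # prints [1.2, 0.5, None]
```

Returning None rather than 0 for empty intervals keeps "no data" distinguishable from "no instructions retired".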

Whether it's for like-for-like comparison, or trend-spotting, adding extra counters to a Streamline chart is easy. Chart type, color and y-axis limits can all be customized, along with the units. Custom charts can be saved for later use, giving you quick access in other Streamline reports.


Mali GPU Optimization

Use a wide range of Mali counters to identify problems that might cause lag or unnecessary GPU bottlenecks, helping your games and multimedia applications run smoothly. Whether you're a small development team or a large games studio, Streamline can save you valuable time.


Measure real-world energy usage

You can use an Arm Energy Probe with your development board to monitor energy usage. Its measurements can then be displayed alongside the usual Streamline charts, giving a more complete picture of performance. If the power chart is offset from the other charts, you can adjust its alignment manually to compensate.
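Energy is power integrated over time, and correcting an alignment offset amounts to adding a constant to every power sample's timestamp. The Python sketch below illustrates both steps using the trapezoidal rule. The sample format (time in seconds, power in watts) and the offset value are assumptions for illustration, not the Energy Probe's actual data format.

```python
def align(samples, offset_s):
    """Shift each (time_s, power_w) sample by a constant offset in seconds."""
    return [(t + offset_s, p) for t, p in samples]


def energy_joules(samples, start_s, end_s):
    """Integrate power over [start_s, end_s] with the trapezoidal rule, using
    only consecutive pairs of samples that both fall inside the window."""
    inside = [(t, p) for t, p in sorted(samples) if start_s <= t <= end_s]
    total = 0.0
    for (t0, p0), (t1, p1) in zip(inside, inside[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical power trace that lags the other charts by 0.5 s.
trace = [(0.0, 2.0), (0.5, 2.0), (1.0, 4.0), (1.5, 4.0)]
print(energy_joules(trace, 0.0, 1.0))               # prints 2.5
print(energy_joules(align(trace, -0.5), 0.0, 1.0))  # prints 3.5
```

The same one-second window yields a different energy figure once the trace is shifted, which is why the alignment matters when you attribute energy to a particular phase of execution.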




OpenCL visualizer

OpenCL calls can also be visualized in Streamline, showing how a thread initiates tasks within OpenCL queues, including dependencies, synchronization, memory access and execution diagrams for each queue.


Streamline Documentation & Configuration

DS-5 comes with a detailed Streamline user guide that helps you get started and explains the settings you can choose from in the configuration dialog. The Streamline interface is designed to be easy to use, and the manual explains each of its elements. If you can't find an answer in the manual, ask the experts in the Arm Connected Community, or contact technical support.

Export and collaborate across your team

If you want to share data from Streamline with colleagues who aren't using DS-5, you can easily export it as comma-, tab- or space-separated values.
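Because the three export formats differ only in their delimiter, colleagues can read any of them with the same code. The Python sketch below uses the standard csv module; the column names in the example ("Time", "Cycles") are hypothetical, so check the header row of your own export.

```python
import csv
import io

# Delimiter for each of the three export formats.
DELIMITERS = {"csv": ",", "tsv": "\t", "space": " "}


def read_export(text, fmt="csv"):
    """Parse exported data held in `text` into a list of dicts, one per row,
    keyed by the header row. Values are returned as strings."""
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[fmt],
                            skipinitialspace=True)
    return list(reader)


sample = "Time,Cycles\n0.001,120000\n0.002,98000\n"
rows = read_export(sample, "csv")
print(rows[1]["Cycles"])  # prints 98000
```

To read a file instead of a string, open it with `open(path, newline="")` and pass the file object to `csv.DictReader` in the same way.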

Streamline reports can also be shared for analysis in other DS-5 installations. There is no need to re-run the capture; simply re-associate the path to the source code.
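Re-associating source paths is essentially a prefix substitution: the path recorded on the capturing machine is rewritten to point into the local checkout. In Streamline you do this through the user interface; the Python sketch below only illustrates the idea, and the directory names are hypothetical.

```python
from pathlib import PurePosixPath


def remap_source_path(recorded, old_root, new_root):
    """Rewrite a source path recorded on another machine so that it points
    under a local checkout. Paths outside `old_root` are returned unchanged."""
    try:
        relative = PurePosixPath(recorded).relative_to(old_root)
    except ValueError:
        return recorded  # not under old_root, leave it alone
    return str(PurePosixPath(new_root) / relative)


print(remap_source_path("/home/alice/game/src/render.c",
                        "/home/alice/game", "/work/game"))
# prints /work/game/src/render.c
```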



Table views

Break performance down by function

To understand exactly which functions are causing hotspots in your software, use the Functions view, which lists them in a table sorted by usage. You can use the calipers on the timeline to isolate a particular period of interest, and the Functions view will then reflect only that period.
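Conceptually, the Functions view is a histogram of samples by function, optionally restricted to the caliper selection. The Python sketch below reproduces that idea under stated assumptions: each sample is a hypothetical (timestamp, function name) pair, and the calipers are modelled as an inclusive time window.

```python
from collections import Counter


def function_table(samples, start=None, end=None):
    """Count samples per function inside an optional [start, end] window and
    return (function, count, percent) rows sorted by count, highest first."""
    selected = [
        fn for t, fn in samples
        if (start is None or t >= start) and (end is None or t <= end)
    ]
    total = len(selected)
    return [
        (fn, n, 100.0 * n / total)
        for fn, n in Counter(selected).most_common()
    ]


samples = [(0.1, "decode"), (0.2, "decode"), (0.3, "render"),
           (0.4, "decode"), (0.9, "render"), (1.1, "io_wait")]
for row in function_table(samples, start=0.0, end=1.0):
    print(row)
# prints ('decode', 3, 60.0) then ('render', 2, 40.0)
```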



Drill down to the Source Code

The devil is in the detail

The Code view shows the percentage of the function's total samples that each source line or disassembly instruction accounts for. View it alongside the disassembly to get the finest level of detail available in Streamline.
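Each percentage in the Code view is a line's share of the samples collected for its function. As a worked illustration with invented numbers, the Python sketch below computes those shares, assuming the per-line sample counts are available as a mapping from line number to count.

```python
def line_percentages(line_samples):
    """Map each source line to its percentage of the function's samples."""
    total = sum(line_samples.values())
    if total == 0:
        return {line: 0.0 for line in line_samples}
    return {line: 100.0 * n / total for line, n in line_samples.items()}


# Hypothetical sample counts for four lines of one function (200 in total).
counts = {41: 5, 42: 120, 43: 60, 44: 15}
print(line_percentages(counts))  # prints {41: 2.5, 42: 60.0, 43: 30.0, 44: 7.5}
```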

Clicking through from the Stack, Call Paths or Functions view highlights the corresponding block of code, giving you a great way of tracking problems from the highest level all the way down to the root cause.





Automatically list event counters

Streamline automatically reports which counters are available for collection, based on the core you are connected to. The full list of counters is extensive and is documented in the Technical Reference Manual for each processor.

Event-based Sampling

Turn on event-based sampling to see exactly which lines of code are causing excessive numbers of CPU performance events, such as cache misses and branch mispredictions, for an even greater degree of clarity.

Set a threshold for an individual counter, so that a sample is taken each time that many events have occurred, and then inspect the results in the Functions and Code views to focus on problem areas.
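With event-based sampling, code that generates more events is sampled more often, because a sample is taken whenever the chosen counter accumulates another threshold's worth of events. The Python sketch below simulates that attribution. The event stream, in which each entry is a hypothetical (source line, events generated) pair in execution order, and the threshold value are assumptions; on real hardware the sampling is typically triggered by a performance counter overflow interrupt rather than by software like this.

```python
from collections import Counter


def event_based_samples(event_stream, threshold):
    """Attribute one sample to the current source line each time the running
    event count reaches another multiple of `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    samples = Counter()
    pending = 0  # events counted since the last sample was taken
    for line, events in event_stream:
        pending += events
        while pending >= threshold:
            samples[line] += 1
            pending -= threshold
    return samples


# Hypothetical cache-miss counts per executed line.
stream = [(10, 3), (11, 250), (12, 4), (11, 260), (13, 90)]
print(event_based_samples(stream, threshold=100))
```

Line 11 collects most of the samples because it generates most of the events, while lines 10 and 12 never cross the threshold. A lower threshold gives finer resolution at the cost of more sampling overhead.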

User Space Only Streamline

Profile without rebuilding the kernel

To access many of the counters in Streamline, you no longer need to rebuild the kernel with the gator.ko module included. Streamline now works in user space mode, provided your Linux kernel is version 3.4 or later. Note that some of the information available through the gator.ko module is not available in user space mode.

This is very useful if your team doesn't have access to the kernel source needed to rebuild it, and makes it easier to profile an application or overall system performance. Streamline always looks for the gator module first, and falls back to user space mode if the module is not detected.
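The fallback described above can be sketched as a simple decision. The Python below is not the gator daemon's actual logic; it assumes that a loaded module appears as a line beginning with the name `gator` in /proc/modules, and that the kernel release string begins with `major.minor`. The mode labels it returns are hypothetical.

```python
import re


def kernel_version(release):
    """Return (major, minor) parsed from a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def gator_module_loaded(proc_modules_text):
    """True if a module named 'gator' appears in /proc/modules-style text."""
    return any(line.split()[0] == "gator"
               for line in proc_modules_text.splitlines() if line.strip())


def choose_capture_mode(release, proc_modules_text):
    """Prefer the gator kernel module; otherwise fall back to user space
    mode, which needs Linux 3.4 or later."""
    if gator_module_loaded(proc_modules_text):
        return "kernel module"
    if kernel_version(release) >= (3, 4):
        return "user space"
    return "unsupported"


print(choose_capture_mode("3.10.0", ""))  # prints user space
```

On a target you would pass `platform.release()` and the contents of /proc/modules. Comparing (major, minor) tuples avoids the classic mistake of comparing version strings lexically, where "3.10" would sort before "3.4".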

DS-5 also contains a prebuilt gator daemon binary, so in most cases you do not need to rebuild the daemon.


Community Edition Features

The essentials for native Android app development

DS-5 Community Edition allows you to use a limited set of hardware and software counters, giving you a flavor of the features available in DS-5. This makes it an ideal tool for Android native app development, assisting you in profiling your software.

DS-5 Community Edition Counters

Hardware counters: Clock cycles, Instructions executed, Cache coherency hit/miss, Branch mispredicted/PC change, Fault unaligned access
Linux software counters: Interrupts, Soft interrupts, Disk I/O, Memory usage, Network traffic
Mali counters: All counters available