An In-Depth Look at Streamline
Make your software as efficient as the hardware it's running on
Giving you better insight into how your software executes than ever before, Streamline performance analyzer makes it easy to optimize for Arm. Whether it's making your games more immersive and pushing the envelope of what Mali GPUs can deliver, or simply finding hot spots in source code to make a SoC as efficient as possible, Streamline is the ideal tool.
The following features are available in DS-5 Ultimate and Professional Editions; DS-5 Community Edition provides a limited subset. For common questions, error messages and tips, read the Streamline FAQ.
Your dashboard for performance overview
The Timeline view shows at a glance how your system is performing. Charts in the top half plot the values of each counter, and the bottom half shows details of process execution. This is where your optimization process starts.
By default, Streamline will collect a predefined set of hardware performance counters, but you're free to add or remove counters to suit your needs.
Filter by processes and threads
Simply click an individual process to expand it and show its worker threads. The Timeline view updates accordingly, hiding activity that is not related to that thread or process.
Hold Ctrl to select multiple processes or threads and show an aggregate view. When a process has a large number of threads, you can also filter them using the filter control, indicated by the funnel icon.
Kernel activity, shown in red, appears in the Timeline view by default. You can select the kernel process to view it in isolation, just as you would with any other process.
Multicore, multicluster and big.LITTLE
Charts will initially appear stacked, giving you an overall impression of system performance. Streamline makes it super simple to look at per-core or per-cluster performance. Clicking the arrows on the charts will expand them, allowing you to spot classic errors like accidentally running a single core at 100% whilst the others idle.
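As a rough illustration of the "one core at 100% while the others idle" pattern, the sketch below takes hypothetical per-core utilization percentages, such as values you might read off the expanded per-core charts, and flags the case where exactly one core is saturated while every other core is close to idle. Streamline does not export data in this form; the 95% and 10% thresholds are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thresholds: tune to taste. */
#define BUSY_THRESHOLD 95.0  /* a core at or above this is "saturated" */
#define IDLE_THRESHOLD 10.0  /* a core at or below this is "idle" */

/* Returns the index of the single saturated core if exactly one core is
 * saturated and every other core is idle; otherwise returns -1. */
int find_lone_busy_core(const double *util, size_t ncores)
{
    int busy = -1;

    assert(util != NULL || ncores == 0);
    for (size_t i = 0; i < ncores; i++) {
        if (util[i] >= BUSY_THRESHOLD) {
            if (busy != -1)
                return -1;      /* more than one busy core: not this pattern */
            busy = (int)i;
        } else if (util[i] > IDLE_THRESHOLD) {
            return -1;          /* a core doing moderate work: load is shared */
        }
    }
    return busy;
}
```

A return value of -1 means the load looks shared (or everything is idle); any other value is the index of the core to investigate.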
Processes and threads can also be viewed per core or cluster, or as an overall heat map. Blue dashes indicate contention, showing whether too many processes are competing for a core at once, and red dashes show whether I/O access is slowing the system down.
Add custom annotations
In the same way that you might add printf statements for debugging, you can add annotations to your code to give a Streamline report extra context.
Simply include the streamline_annotate.h header file and add the necessary macros to begin annotating your code.
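A minimal sketch of annotating a loop is shown below. It assumes the ANNOTATE_SETUP, ANNOTATE and ANNOTATE_END macros from the streamline_annotate.h header in the gator annotate sources; verify the macro names against the header shipped with your DS-5 version. So that the sketch also builds without the header, it defines no-op stand-ins unless the hypothetical USE_STREAMLINE macro is set. The annotations_init and process_frames functions are illustrative names only.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_STREAMLINE
/* Assumed: header from the gator annotate sources; streamline_annotate.c
 * must also be compiled and linked into the application. */
#include "streamline_annotate.h"
#else
/* No-op stand-ins (assumption: same names as the real macros) so the
 * sketch compiles when Streamline is not available. */
#define ANNOTATE_SETUP ((void)0)
#define ANNOTATE(str) ((void)(str))
#define ANNOTATE_END() ((void)0)
#endif

/* Hypothetical helper: call once, e.g. at the start of main(). */
void annotations_init(void)
{
    ANNOTATE_SETUP;
}

/* Hypothetical workload: annotates each "frame" so that frame boundaries
 * appear as labelled regions in the Streamline timeline. */
int process_frames(int n)
{
    char label[32];
    int done = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        snprintf(label, sizeof label, "frame %d", i);
        ANNOTATE(label);   /* start an annotated region */
        done++;            /* real per-frame work would go here */
        ANNOTATE_END();    /* end the annotated region */
    }
    return done;
}
```

Build with -DUSE_STREAMLINE (and the annotate sources) when capturing; without it, the macros compile away.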
Visual annotations are just as easy to add, enabling you to capture images and add them to your Streamline report.
Overlay charts and customize expressions
For metrics not covered by individual counters, you can use the settings for each chart to configure customized expressions. This can be useful for quick calculations within Streamline (for an example of this, read our tutorial on Mali GPU optimization).
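A miss ratio is a typical example of a metric that no single counter reports directly. The sketch below shows the arithmetic such an expression performs, using hypothetical cache-access and cache-miss counts from one sample interval. It illustrates the calculation only and does not reproduce Streamline's own expression syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Derived metric: misses as a percentage of accesses over one interval.
 * Returns 0.0 when there were no accesses, to avoid dividing by zero. */
double miss_ratio_percent(uint64_t accesses, uint64_t misses)
{
    assert(misses <= accesses);
    if (accesses == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)accesses;
}
```

For example, 50 misses out of 2000 accesses gives a miss ratio of 2.5%.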
Whether it's for like-for-like comparison, or trend-spotting, adding extra counters to a Streamline chart is easy. Chart type, color and y-axis limits can all be customized, along with the units. Custom charts can be saved for later use, giving you quick access in other Streamline reports.
Mali GPU Optimization
Identify problems that might cause lag or unnecessary GPU bottlenecks with a wide range of Mali counters, helping to make your games and multimedia applications run smoothly. Whether you're a small development team or a large games studio, Streamline can save you valuable time.
Measure real-world energy usage
Arm Energy Probe can be used with your development board to monitor energy usage. This data can then be displayed alongside the usual Streamline charts to give a more complete picture of performance. The alignment of the power chart can be adjusted manually to compensate for any timing offset relative to the other charts.
OpenCL calls can also be visualized in Streamline, showing how a thread initiates tasks within OpenCL queues, including dependencies, synchronization, memory access and execution diagrams for each queue.
Streamline Documentation & Configuration
DS-5 comes with a detailed Streamline user guide that helps you get started and explains the settings you can choose from in the configuration dialog. The Streamline interface is designed to be easy to use, and the manual explains each of its elements. If you can't find an answer in the manual, ask the experts in the Arm Connected Community, or contact technical support.
Export and collaborate across your team
If you want to share data from Streamline with colleagues who aren't using DS-5, you can easily export it as comma-, tab- or space-separated values.
Streamline reports can also be shared for analysis in other DS-5 installations. You don't need to re-run the capture; simply re-associate the path to the source code.
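Exported values are easy to post-process. The sketch below sums one numeric column of delimiter-separated text, with the delimiter passed in so the same function handles comma, tab or space separators. The column layout used in the example is hypothetical, so check the header row of your own export before relying on a column index; quoted fields are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sums the numeric values in column `col` (0-based) of delimiter-separated
 * text, skipping the first (header) line. Rows that are too short or have
 * an empty field in that column are ignored. Quoted fields are not handled. */
double sum_column(const char *text, char sep, int col)
{
    double total = 0.0;
    const char *p = strchr(text, '\n');       /* skip the header row */

    assert(col >= 0);
    while (p != NULL && *++p != '\0') {       /* p now points at a row start */
        const char *end = strchr(p, '\n');    /* end of this row, or NULL */
        const char *field = p;
        int i = 0;

        /* advance to the requested column without leaving this row */
        while (i < col) {
            const char *next = strchr(field, sep);
            if (next == NULL || (end != NULL && next > end))
                break;
            field = next + 1;
            i++;
        }
        if (i == col && field != end)
            total += strtod(field, NULL);
        p = end;
    }
    return total;
}
```

For example, given the text "Time,Cycles\n1,100\n2,250\n", sum_column(text, ',', 1) adds up the Cycles column.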
Break performance down by function
To understand exactly which functions are causing hotspots in your software, use the Functions view, which presents a table of functions sorted by usage. You can use the calipers on the timeline to isolate a particular period of interest, and the Functions view will reflect that selection.
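Conceptually, a functions table restricted to a caliper range amounts to counting the samples that fall inside the selected time window, grouped by function, and sorting by that count. The sketch below shows that idea on a hypothetical array of (timestamp, function id) samples. It illustrates the concept and is not a description of how Streamline itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FUNCS 16  /* hypothetical: function ids are 0..MAX_FUNCS-1 */

struct sample { double time; int func; };
struct func_count { int func; unsigned count; };

/* qsort comparator: highest count first, ties broken by function id. */
static int by_count_desc(const void *a, const void *b)
{
    const struct func_count *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return x->func - y->func;
}

/* Counts samples with start <= time < end per function, writes the
 * non-zero entries to out sorted by count (highest first) and returns
 * how many entries were written. out must hold MAX_FUNCS entries. */
size_t functions_in_range(const struct sample *s, size_t n,
                          double start, double end, struct func_count *out)
{
    unsigned counts[MAX_FUNCS] = {0};
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        assert(s[i].func >= 0 && s[i].func < MAX_FUNCS);
        if (s[i].time >= start && s[i].time < end)
            counts[s[i].func]++;
    }
    for (int f = 0; f < MAX_FUNCS; f++)
        if (counts[f] > 0)
            out[used++] = (struct func_count){ f, counts[f] };
    qsort(out, used, sizeof *out, by_count_desc);
    return used;
}
```

Moving the calipers corresponds to changing start and end; the same samples are simply re-counted over the new window.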
Drill down to the Source Code
The devil is in the detail
The Code view presents the percentage that each source line or disassembly entry contributed to the total samples collected for the function. View it alongside the disassembly to get the finest level of detail available in Streamline.
Clicking through from the Stack, Call Paths or Functions view highlights the corresponding block of code, giving you a great way to track a problem from the highest level all the way down to its root cause.
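The percentage shown for each line is that line's sample count divided by the function's total sample count. The worked sketch below applies this to hypothetical sample counts for a four-line function.

```c
#include <assert.h>
#include <stddef.h>

/* Converts per-line sample counts into each line's share of the
 * function's total samples, as a percentage, and returns the total.
 * If the function has no samples, every percentage is 0. */
unsigned line_percentages(const unsigned *samples, size_t nlines, double *pct)
{
    unsigned total = 0;

    assert(pct != NULL || nlines == 0);
    for (size_t i = 0; i < nlines; i++)
        total += samples[i];
    for (size_t i = 0; i < nlines; i++)
        pct[i] = total ? 100.0 * samples[i] / total : 0.0;
    return total;
}
```

With counts of 10, 0, 30 and 60 samples, the function has 100 samples in total, so the lines contribute 10%, 0%, 30% and 60% respectively.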
Automatically list event counters
Streamline automatically reports which counters are available for collection, based on the core you are connected to. The full list of counters is extensive and can be found in each processor's Technical Reference Manual.
Turn on event-based sampling to see exactly which individual lines of code are causing excessive numbers of CPU performance events, such as cache misses and branch mispredictions, giving you an even greater degree of clarity.
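In general, event-based sampling works by letting a hardware counter overflow after a chosen number of events and recording where the program was each time it does, so lines that generate more events tend to collect more samples. The toy simulation below illustrates that principle using a hypothetical trace of which source line caused each event and a hypothetical sampling period. Real sampling is performed by the processor's counters and the kernel, not by code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of event-based sampling. events[i] is the source line that
 * caused the i-th event (e.g. a cache miss). Every `period` events the
 * "counter overflows" and the current line is credited with one sample. */
void sample_events(const int *events, size_t nevents, unsigned period,
                   unsigned *samples_per_line, size_t nlines)
{
    unsigned countdown = period;

    assert(period > 0);
    for (size_t i = 0; i < nlines; i++)
        samples_per_line[i] = 0;
    for (size_t i = 0; i < nevents; i++) {
        assert(events[i] >= 0 && (size_t)events[i] < nlines);
        if (--countdown == 0) {          /* overflow: take a sample */
            samples_per_line[events[i]]++;
            countdown = period;          /* re-arm the counter */
        }
    }
}
```

Because only every period-th event is recorded, the total number of samples is the event count divided by the period, and a line that rarely triggers events may receive no samples at all.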
Set threshold limits for individual counters and then inspect them in the function and source code view, helping you to focus on problem areas.
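The idea behind a threshold is a simple filter. The sketch below takes hypothetical per-function counter totals and reports the functions whose value exceeds a limit you choose; the function names and values are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct func_stat { const char *name; uint64_t value; };

/* Writes the indices of functions whose counter value exceeds `limit`
 * into out (which must hold n entries) and returns how many there are. */
size_t over_threshold(const struct func_stat *stats, size_t n,
                      uint64_t limit, size_t *out)
{
    size_t k = 0;

    assert(stats != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (stats[i].value > limit)
            out[k++] = i;
    return k;
}
```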
User Space Only Streamline
Profile without rebuilding the kernel
To access many of the counters in Streamline, you no longer need to rebuild the kernel with the gator.ko module included. Streamline now works in user space mode, provided the target runs Linux kernel version 3.4 or later. Note that not all of the information available through the gator.ko module is available in user space mode.
This can be very useful if your team doesn't have access to the kernel source needed to rebuild it, making it easier to profile an application or overall system performance. Streamline always looks for the gator module first, falling back to user space mode if the module is not detected.
DS-5 also includes a prebuilt gator daemon binary, so in most cases you don't need to rebuild the daemon yourself.
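The selection described above can be approximated on the target with a few lines of C. This is an illustration under stated assumptions, not the gator daemon's own logic: it looks for a module named gator in /proc/modules (the name is assumed from gator.ko), and otherwise uses uname(2) to check that the kernel release is 3.4 or later. The function names and return codes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "4.19.0-generic" is 3.4 or later. */
int kernel_at_least_3_4(const char *release)
{
    int major = 0, minor = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > 3 || (major == 3 && minor >= 4);
}

/* Returns 1 if /proc/modules lists a module named "gator" (assumed name). */
int gator_module_loaded(void)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/modules", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = strncmp(line, "gator ", 6) == 0;
    fclose(f);
    return found;
}

/* Hypothetical result codes: 1 = kernel module mode,
 * 0 = user space mode, -1 = kernel too old for user space mode. */
int choose_capture_mode(void)
{
    struct utsname u;

    if (gator_module_loaded())
        return 1;
    if (uname(&u) == 0 && kernel_at_least_3_4(u.release))
        return 0;
    return -1;
}
```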
Community Edition Features
The essentials for native Android app development
DS-5 Community Edition lets you use a limited set of hardware and software counters, giving you a flavor of the features available in DS-5. This makes it an ideal tool for native Android app development, helping you profile your software.
DS-5 Community Edition Counters

| Counter type | Counters available |
|---|---|
| Hardware counters | Clock cycles |
| | Cache coherency hit/miss |
| | Branch mispredicted/pc change |
| | Fault unaligned access |
| Linux software counters | Interrupts |
| Mali counters | All counters available |