Add Streamline annotations and rebuild on Pi

First, add a few annotations to the application so that Streamline shows the phases where it spends its time. To do this, edit the file examples/graph_alexnet.cpp, or use the version of the file in this download.

Add the include file streamline_annotate.h and call the ANNOTATE_SETUP macro at the start of the program.

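#include "streamline_annotate.h"

int main(int argc, char **argv)
{
    // Set up the annotation connection to gator before any other annotation call.
    ANNOTATE_SETUP;
    int st = arm_compute::utils::run_example<GraphAlexnetExample>(argc, argv);
    // Drop a marker on the timeline once the example has finished.
    ANNOTATE_MARKER_STR("main complete");
    return st;
}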

Next, annotate do_setup() and do_run() with ANNOTATE_CHANNEL_COLOR() so that the time spent in each is clearly visible in the capture. The modified graph_alexnet.cpp is attached as a reference.
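The channel annotations described above might look roughly like the following sketch. It is not the attached graph_alexnet.cpp: ScopedChannel, phase_log(), do_setup_phase() and do_run_phase() are hypothetical names introduced here for illustration, and the no-op fallback macros (with placeholder color values) exist only so that the sketch compiles when streamline_annotate.h is not on the include path. It assumes ANNOTATE_SETUP has already been called in main(), and that ANNOTATE_CHANNEL_END(channel) is used to close the annotation opened by ANNOTATE_CHANNEL_COLOR(channel, color, str).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Use the real gator header when it is on the include path.
#if defined(__has_include)
#  if __has_include("streamline_annotate.h")
#    include "streamline_annotate.h"
#  endif
#endif

// Otherwise fall back to no-op stand-ins so that the sketch still compiles.
// The color values here are placeholders, not the real Streamline constants.
#ifndef ANNOTATE_CHANNEL_COLOR
#  define ANNOTATE_BLUE 0u
#  define ANNOTATE_GREEN 1u
#  define ANNOTATE_CHANNEL_COLOR(channel, color, str) \
      ((void)(channel), (void)(color), (void)(str))
#  define ANNOTATE_CHANNEL_END(channel) ((void)(channel))
#endif

// Hypothetical helper: a local record of phase boundaries, kept only so the
// sketch can be checked without taking a Streamline capture.
std::vector<std::string> &phase_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical RAII guard: opens a colored channel annotation on construction
// and closes it when the enclosing scope exits.
class ScopedChannel
{
public:
    ScopedChannel(std::uint32_t channel, std::uint32_t color, const char *label)
        : channel_(channel)
    {
        assert(label != nullptr);
        label_ = label;
        ANNOTATE_CHANNEL_COLOR(channel_, color, label);
        phase_log().push_back("begin " + label_);
    }

    ~ScopedChannel()
    {
        ANNOTATE_CHANNEL_END(channel_);
        phase_log().push_back("end " + label_);
    }

    ScopedChannel(const ScopedChannel &) = delete;
    ScopedChannel &operator=(const ScopedChannel &) = delete;

private:
    std::uint32_t channel_;
    std::string   label_;
};

// Stand-in for the example's do_setup(); the real body builds the graph.
void do_setup_phase()
{
    ScopedChannel phase(0, ANNOTATE_BLUE, "do_setup");
    // ... load the weights and configure the graph ...
}

// Stand-in for the example's do_run(); the real body executes the graph.
void do_run_phase()
{
    ScopedChannel phase(0, ANNOTATE_GREEN, "do_run");
    // ... run the graph ...
}
```

In the real example, the same effect can be had by placing the ANNOTATE_CHANNEL_COLOR() call at the top of do_setup() and do_run() and the matching ANNOTATE_CHANNEL_END() call before each function returns. The guard object is just one way to make sure the channel is closed on every exit path.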

Note: The download also contains an updated SConscript file that shows you how to include Streamline annotations in the application. 

Now rebuild using the scons command. Only the example needs to be recompiled:

$ scons Werror=1 debug=1 asserts=0 neon=1 opencl=1 build=native -j2

Next, connect Streamline, start a capture, run the example, and stop the capture. For instructions on the connection process, read this article, starting from the sentence "Click the eye-ball, browse for a target" in the section Gator driver and daemon.


Some immediate observations on the capture:

  • Setup (do_setup) accounts for most of the application's run time, lasting around 14 seconds. Most of that time is spent performing more than 200 MB of disk I/O.
  • Once the run (do_run) starts, four threads fully utilize the four Cortex-A53 cores.
  • As expected, each thread spends much of its time doing matrix multiply operations.

Now that you have profiled an application on the Raspberry Pi, the remainder of this guide does the same on the HiKey 960 platform running Android.
