Profile with Streamline on HiKey 960

Before profiling with Streamline, time the application to see how long it takes compared to the Pi, and verify that the results are the same as on the Pi:

$ /bin/time ./graph_alexnet 0 ./assets/ ./assets/go_kart.ppm ./assets/labels.txt 

---------- Top 5 predictions ---------- 

0.9736 - [id = 573], n03444034 go-kart 
0.0118 - [id = 518], n03127747 crash helmet 
0.0108 - [id = 751], n04037443 racer, race car, racing car 
0.0022 - [id = 817], n04285008 sports car, sport car 
0.0006 - [id = 670], n03791053 motor scooter, scooter 

Test passed 
real 1.315090 
user 4.900000 
sys 0.188000

As expected, the application runs much faster on the HiKey 960 board: it completes the entire run over a second faster than the Pi. This is because the application takes advantage of multiple cores, as shown by the user time (4.9 s) being longer than the real, or wall-clock, time (1.3 s).
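To put a number on "user time longer than real time", you can divide the total CPU time (user + sys) by the wall-clock time. The result roughly approximates how many cores were busy on average over the whole run, including setup. The sketch below is a minimal, hypothetical helper (the function name `parallelism` is introduced here for illustration). It assumes the timing lines use the two-column `real`/`user`/`sys` format shown above, which is also what `/bin/time -p` prints.

```shell
#!/bin/sh
# Hypothetical helper: estimate average cores in use from time output.
# Assumes lines of the form "real <sec>", "user <sec>", "sys <sec>"
# (the format shown above; `/bin/time -p` produces the same layout).
parallelism() {
  awk '$1 == "real" { real = $2 }
       $1 == "user" { user = $2 }
       $1 == "sys"  { sys  = $2 }
       END {
         # (user + sys) / real = average number of cores doing work
         printf "%.2f\n", (user + sys) / real
       }'
}

# Feed in the numbers from the HiKey 960 run above.
printf 'real 1.315090\nuser 4.900000\nsys 0.188000\n' | parallelism
```

For the run above this gives about 3.87. Because `/bin/time` writes to stderr, a live run could be piped in with something like `/bin/time -p ./graph_alexnet 0 ./assets/ ./assets/go_kart.ppm ./assets/labels.txt 2>&1 >/dev/null | parallelism`; awk ignores any other lines.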

Some observations when compared with the Raspberry Pi are:

  • Setup is very fast: reading the model files takes less than ¼ of a second.
  • Once the run starts, 8 threads are used, which fully utilize the dual-cluster design of 4x Cortex-A73 and 4x Cortex-A53 cores.
  • As on the Raspberry Pi, each thread spends a large amount of its time doing matrix multiply operations.