Profile with Streamline on HiKey 960
Before profiling with Streamline, time the application to see how long it takes compared with the Pi, and verify that the results are the same as on the Pi:
$ /bin/time ./graph_alexnet 0 ./assets/ ./assets/go_kart.ppm ./assets/labels.txt
---------- Top 5 predictions ----------
0.9736 - [id = 573], n03444034 go-kart
0.0118 - [id = 518], n03127747 crash helmet
0.0108 - [id = 751], n04037443 racer, race car, racing car
0.0022 - [id = 817], n04285008 sports car, sport car
0.0006 - [id = 670], n03791053 motor scooter, scooter
Test passed
real 1.315090
user 4.900000
sys 0.188000
As expected, the application runs much faster on the HiKey 960 board, completing the entire run over a second faster than the Pi. This is because the application takes advantage of the multiple cores, as shown by the user time (4.90 s) exceeding the real time (1.32 s).
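To make the comparison of user and real time concrete, the ratio of CPU time to wall-clock time gives a rough estimate of how many cores were busy on average. The following is a minimal sketch, not part of the original workflow: the helper name `effective_parallelism` is hypothetical, and the numbers are copied from the `/bin/time` output of the run.

```python
def effective_parallelism(real: float, user: float, sys: float) -> float:
    """Average number of busy cores: total CPU time divided by wall-clock time.

    Hypothetical helper for illustration; all arguments are in seconds.
    """
    if real <= 0:
        raise ValueError("real time must be positive")
    return (user + sys) / real


# Values taken from the /bin/time output of the graph_alexnet run.
ratio = effective_parallelism(real=1.315090, user=4.900000, sys=0.188000)
print(f"average busy cores: {ratio:.2f}")
```

For this run the ratio is a little under 4. It is an average over the whole run, including setup and any other phase that does not use every core, so it can be well below the number of threads used during the compute phase.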
Some observations when compared with the Raspberry Pi are:
- Setup is very fast: reading the model files takes less than ¼ of a second.
- Once the run starts, 8 threads are used, which fully utilize the dual-cluster design (4x Cortex-A53 and 4x Cortex-A73).
- Each thread spends a large amount of its time in matrix multiply operations, the same as on the Raspberry Pi.