Profiling AlexNet on Raspberry Pi and HiKey 960 with the Arm Compute Library
Profile with Streamline on HiKey 960
Before profiling with Streamline, time the application to see how long it takes compared to the Pi, and verify that the results are the same as on the Pi:
    $ /bin/time ./graph_alexnet 0 ./assets/ ./assets/go_kart.ppm ./assets/labels.txt
    ---------- Top 5 predictions ----------
    0.9736 - [id = 573], n03444034 go-kart
    0.0118 - [id = 518], n03127747 crash helmet
    0.0108 - [id = 751], n04037443 racer, race car, racing car
    0.0022 - [id = 817], n04285008 sports car, sport car
    0.0006 - [id = 670], n03791053 motor scooter, scooter
    Test passed
    real 1.315090 user 4.900000 sys 0.188000
As expected, the application runs much faster on the HiKey 960 board: the entire run completes over a second sooner than on the Pi. This is because the application takes advantage of multicore processing, as shown by the user time (4.90 s) being far longer than the real time (1.32 s).
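As a rough check on the multicore claim, dividing the CPU time (user plus sys) by the real time gives the average number of cores kept busy over the run. The following is a minimal sketch, assuming `awk` is available on the board; `avg_busy_cores` is a hypothetical helper name used only for illustration and is not part of the Arm Compute Library or Streamline.

```shell
# avg_busy_cores REAL USER SYS
# Prints (user + sys) / real, i.e. the average number of cores kept busy.
# Hypothetical helper for illustration only.
avg_busy_cores() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", (user + sys) / real }'
}

# Values copied from the timing output above.
avg_busy_cores 1.315090 4.900000 0.188000   # prints 3.87
```

A value near 3.87 is an average over the whole run, setup included, so it does not by itself show how many cores are busy at any given moment. That finer-grained view is what Streamline provides.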
Some observations when compared with the Raspberry Pi are:
- Setup is very fast: reading the model files takes less than a quarter of a second.
- Once the run starts, 8 threads are used, which fully utilize the dual-cluster design (4x Cortex-A53 and 4x Cortex-A73).
- Each thread spends a large amount of its time doing matrix multiply operations, as on the Raspberry Pi.
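To confirm the dual-cluster layout mentioned above from the board itself, one option is to count cores by their "CPU part" ID. The following is a sketch under two assumptions: that the kernel on the board exposes `CPU part` lines in `/proc/cpuinfo`, as Arm Linux kernels typically do, and that `awk` and `sort` are installed. `count_cores_by_part` is a hypothetical helper name. In Arm's part-number scheme, `0xd03` identifies Cortex-A53 and `0xd09` identifies Cortex-A73.

```shell
# count_cores_by_part [FILE]
# Counts cores per "CPU part" ID in a cpuinfo-style file
# (defaults to /proc/cpuinfo). Hypothetical helper for illustration only.
# 0xd03 = Cortex-A53, 0xd09 = Cortex-A73.
count_cores_by_part() {
    awk -F': *' '/^CPU part/ { n[$2]++ }
                 END { for (p in n) print p, n[p] }' "${1:-/proc/cpuinfo}" | sort
}
```

Running `count_cores_by_part` with no argument on the board should print one line per part ID followed by its core count. Four cores for each of the two IDs would match the 4+4 design and the 8 threads seen during the run.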