Profiling AlexNet on Raspberry Pi and HiKey 960 with the Arm Compute Library
Run the graph_alexnet application on Pi
First, download the ZIP file with the AlexNet model, input images, and labels onto the Pi. Create a new directory and unzip the file, as shown here:
$ cd ; mkdir assets_alexnet
$ unzip compute_library_alexnet.zip -d assets_alexnet
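A quick check after unzipping can catch a failed or partial extraction before you run the application. The following is a minimal sketch, not part of the Arm Compute Library: `check_assets` is a hypothetical helper, and it only looks for the two file names that the run command in this section uses (`go_kart.ppm` and `labels.txt`). It does not verify the model weights.

```shell
# check_assets DIR: hypothetical helper that confirms the input image and
# label file used later in this section exist inside DIR.
check_assets() {
    for f in go_kart.ppm labels.txt; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f"
            return 1
        fi
    done
    echo "assets ok"
}
```

Call it as `check_assets "$HOME/assets_alexnet"`. It prints the first missing file and returns a non-zero status if either file is absent.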
The AlexNet example performs a simple image classification task.
On Linux, the libraries are compiled as .so files, so the LD_LIBRARY_PATH environment variable is used to point to the .so files from the Arm Compute Library.
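If LD_LIBRARY_PATH already holds other directories, assigning it directly discards them. The sketch below shows one way to add the library directory while keeping existing entries. `prepend_lib_dir` is a hypothetical helper name, and the directory is the build location assumed elsewhere in this section.

```shell
# prepend_lib_dir DIR: hypothetical helper that puts DIR at the front of
# LD_LIBRARY_PATH and keeps any directories that were already listed.
prepend_lib_dir() {
    # ${VAR:+:$VAR} expands to ":<old value>" only when VAR is set and non-empty,
    # which avoids a trailing colon when the variable starts out empty.
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Assumed build location of the Arm Compute Library .so files.
prepend_lib_dir "$HOME/ComputeLibrary/build"

# Print the resulting search path, one directory per line.
printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n'
```

The dynamic loader searches the directories in order, so placing the build directory first means that its .so files are found before any other copies on the path.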
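Next, run the application to confirm that the test passes. The relative path ./build/examples/graph_alexnet assumes that you run the commands from the ComputeLibrary directory. Enter the following commands: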
$ export LD_LIBRARY_PATH=$HOME/ComputeLibrary/build/
$ export PATH_ASSETS=$HOME/assets_alexnet
$ time ./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
Can't load libOpenCL.so: libOpenCL.so: cannot open shared object file: No such file or directory
Can't load libGLES_mali.so: libGLES_mali.so: cannot open shared object file: No such file or directory
Can't load libmali.so: libmali.so: cannot open shared object file: No such file or directory
Couldn't find any OpenCL library.
./build/examples/graph_alexnet
[GRAPH][08-02-2018 06:39:39][INFO] Instantiating NEConvolutionLayer
[GRAPH][08-02-2018 06:39:40][INFO] Data Type: F32 Input Shape: 227x227x3 Weights shape: 11x11x3x96 Biases Shape: 96 Output Shape: 55x55x96 PadStrideInfo: 4,4;0,0,0,0 Groups: 1 WeightsInfo: 0;0;0,0
[GRAPH][08-02-2018 06:39:53][INFO] Instantiating NESoftmaxLayer Data Type: F32 Input shape: 1000 Output shape: 1000
---------- Top 5 predictions ----------
0.9736 - [id = 573], n03444034 go-kart
0.0118 - [id = 518], n03127747 crash helmet
0.0108 - [id = 751], n04037443 racer, race car, racing car
0.0022 - [id = 817], n04285008 sports car, sport car
0.0006 - [id = 670], n03791053 motor scooter, scooter
Test passed

real 0m20.017s
user 0m21.930s
sys 0m1.460s
The application prints messages about missing OpenCL and OpenGL libraries, then debug messages that appear only in debug builds, and finally the predictions. The user time (about 21.9 seconds) is only slightly higher than the real time (about 20.0 seconds), which means that very little of the computation runs in parallel. Streamline will show more about what is happening.
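To put a number on that observation, you can divide the CPU time (user plus sys) by the real time. The sketch below does the arithmetic with awk for the times reported in the run above. `cpu_ratio` is a hypothetical helper name, and the ratio is only a rough indicator, not a measurement of per-core activity.

```shell
# cpu_ratio REAL USER SYS: hypothetical helper that prints
# (user + sys) / real, rounded to two decimal places.
# A value near 1 suggests mostly single-threaded work. A value close to
# the number of CPU cores suggests the cores were kept busy.
cpu_ratio() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", (user + sys) / real }'
}

# Times from the run above, in seconds: real 20.017, user 21.930, sys 1.460.
cpu_ratio 20.017 21.930 1.460   # prints 1.17
```

A ratio of about 1.17 means that, averaged over the run, only a little more than one core's worth of CPU time was in use.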