Transform pre-trained Caffe model
To produce a version of the NN that can be deployed on an Arm Cortex-M microcontroller, follow the steps below.
Quantize the model
Once you have a trained model, you will need to optimize it for your microcontroller. To do this, you can use Arm's quantization script to convert the Caffe model weights and activations from 32-bit floating point to an 8-bit fixed-point format. This not only reduces the size of the network, but also avoids floating-point computations, which are more computationally expensive.
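To make the idea concrete, here is a minimal Python sketch of dynamic fixed-point quantization (this is an illustration, not the actual nn_quantizer.py logic): choose how many of the 8 bits are needed for the integer part of the value range, use the remaining bits for the fraction, then round to signed 8-bit.

```python
import numpy as np

def choose_q_format(values, bits=8):
    """Quantize an array to signed fixed point Qm.n, where m + n = bits - 1.

    The integer bits m are chosen to cover the largest magnitude in the
    array; the remaining n bits hold the fraction."""
    max_abs = np.max(np.abs(values))
    int_bits = int(np.ceil(np.log2(max_abs))) if max_abs >= 1 else 0
    frac_bits = bits - 1 - int_bits          # bits left for the fraction
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(values * scale), -2**(bits - 1), 2**(bits - 1) - 1)
    return q.astype(np.int8), frac_bits

weights = np.array([0.75, -1.5, 0.02, 1.1], dtype=np.float32)
q, n = choose_q_format(weights)              # n = 6, i.e. Q1.6 format
restored = q.astype(np.float32) / 2.0 ** n   # dequantize to inspect the error
```

The real script additionally sweeps these formats per layer against the test dataset to pick the split with the smallest accuracy loss.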
In the previously cloned ML-examples repository, you will find the NN quantizer script. It works by testing the network and finding the best format for the dynamic 8-bit fixed-point representation, ensuring minimal loss of accuracy on the test dataset. The output of this script is a serialized Python pickle (.pkl) file that includes the network's model, the quantized weights and activations, and the quantization format of each layer. Running the following command generates the quantized model:
# Run command in the ML-examples/cmsisnn-cifar10 directory
cd ~/CMSISNN_Webinar/ML-examples/cmsisnn-cifar10

# Note: To enable GPU for quantization sweeps, use the '--gpu' argument
python nn_quantizer.py --model models/cifar10_m7_train_test.prototxt \
                       --weights models/cifar10_m7_iter_300000.caffemodel.h5 \
                       --save models/cifar10_m7.pkl
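If you want to inspect the result, the .pkl file can be loaded back with Python's pickle module. The helper below is illustrative; the exact structure of the stored object depends on the version of the quantizer script, so examine it interactively after loading.

```python
import pickle

def load_quantized_model(path):
    """Load the serialized output of nn_quantizer.py.

    The pickle holds the network model, the quantized weights and
    activations, and the per-layer quantization formats. The attribute
    names vary with the script version, so inspect the loaded object
    (e.g. with vars() or dir()) rather than assuming a layout."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Example (path from the command above):
# model = load_quantized_model('models/cifar10_m7.pkl')
```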
Convert the model
Now that you have an optimized network, the next step is to convert it into C++ files that you can include in your camera application.
# Run command in the ML-examples/cmsisnn-cifar10 directory
python code_gen.py --model models/cifar10_m7.pkl --out_dir code/m7
This script takes the quantization parameters and the network graph connectivity from the previous step and generates code consisting of NN function calls. Note that at this stage the supported layers are convolution, innerproduct, pooling (max/average) and relu. The output is a series of files:
- nn.cpp and nn.h are the files to be included to run the NN on the device
- weights.h and parameter.h contain the quantized weights and the quantization ranges
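To illustrate how those quantization parameters are used at inference time, here is a simplified Python model of a q7 fully connected layer in the style of CMSIS-NN. The bias_shift and out_shift names follow the CMSIS-NN convention; this is a sketch of the arithmetic only, and real kernels additionally apply rounding before the output shift.

```python
import numpy as np

def fully_connected_q7(x, weights, bias, bias_shift, out_shift):
    """Simplified q7 fully connected layer.

    Accumulate in 32 bits, left-shift the bias to align it with the
    accumulator's fixed-point format, right-shift the result back to
    the output Q format, then saturate to signed 8-bit."""
    acc = weights.astype(np.int32) @ x.astype(np.int32)
    acc += bias.astype(np.int32) << bias_shift   # align bias to accumulator
    out = acc >> out_shift                       # rescale to output format
    return np.clip(out, -128, 127).astype(np.int8)

# Hypothetical small layer: 3 inputs, 2 outputs
x = np.array([10, -5, 3], dtype=np.int8)
w = np.array([[2, 4, -1],
              [7, 0, 5]], dtype=np.int8)
b = np.array([1, -2], dtype=np.int8)
y = fully_connected_q7(x, w, b, bias_shift=2, out_shift=1)
```

The per-layer shift amounts are exactly the kind of quantization-range information that parameter.h carries for the generated network.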