Quantize the model

Once we have a trained model, we need to shrink it to a reasonable size. To do this, we use the quantization script from Arm to convert the Caffe model weights and activations from 32-bit floating point to an 8-bit fixed-point format. This not only reduces the storage needed for the weights by roughly a factor of four, but also avoids floating-point computations.

The NN quantizer script works by testing the network and working out the best dynamic fixed-point format for each layer. The output of this script is a serialized Python pickle (.pkl) file, which includes the network's model, the quantized weights and activations, and the quantization format of each layer. Running the following command generates the quantized model:

python2 repos/ML-examples/cmsisnn-cifar10/nn_quantizer.py --model repos/openmv/ml/cmsisnn/models/smile/smile_train_test.prototxt --weights repos/openmv/ml/cmsisnn/models/smile/smile_iter_*.caffemodel --save repos/openmv/ml/cmsisnn/models/smile/smile.pkl
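
To give a feel for what a dynamic fixed-point format means, here is a minimal sketch of quantizing one layer's values to 8 bits. It is an illustration only and is not taken from nn_quantizer.py: the function names (choose_frac_bits, quantize, dequantize) and the sample weights are hypothetical, it picks the format from the largest magnitude alone rather than by testing the network, and it is written for Python 3 even though the script above runs under python2.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick how many fractional bits to use so the largest magnitude still fits.

    Assumption for this sketch: one bit is the sign bit, and the remaining
    bits are split between the integer and fractional parts.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part (can be negative for small values,
    # which leaves more bits for the fraction).
    int_bits = math.ceil(math.log2(max_abs))
    return total_bits - 1 - int_bits

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    return [max(qmin, min(qmax, int(round(v * scale)))) for v in values]

def dequantize(q_values, frac_bits):
    """Map the stored integers back to approximate real values."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]

# Hypothetical layer weights, all with magnitude below 1.
weights = [0.5, -0.25, 0.9, -0.75]
frac_bits = choose_frac_bits(weights)
q_weights = quantize(weights, frac_bits)
restored = dequantize(q_weights, frac_bits)
print(frac_bits, q_weights)  # 7 [64, -32, 115, -96]
```

Because each layer can get its own number of fractional bits, a layer with small weights keeps more precision than it would under a single network-wide format. That per-layer choice is the "quantization format of each layer" that the saved file records.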

If you want to learn more about quantization, read this blog.
