Quantization aware training

This section describes how we applied quantization aware training to the model.

When using quantization aware training, the quantization of weights and activations is simulated during training. Simulating quantized weights and activations allows the model to adjust its weights and adapt, as well as it can, to the quantization that is enforced at inference time.

To do this, we used TensorFlow’s Model Optimization Toolkit, which includes tools to prepare tf.keras models for Quantization Aware Training (QAT). We used the toolkit to make our trained floating-point model quantization aware and then trained it for several more iterations. Training it for these extra iterations allowed us to claw back most of the accuracy we had lost when using post-training quantization alone. We then used a ROC curve to evaluate the performance of our classifier at various threshold settings. The curve in the following graph shows this difference, which is especially clear at low false acceptance rates:
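
The fine-tuning step can be sketched with the toolkit’s quantize_model API. This is a minimal sketch, not our exact training script: the file name, loss, optimizer settings, epoch count, and the train_dataset/val_dataset pipelines are placeholders.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical file name for the trained floating-point model described above.
float_model = tf.keras.models.load_model("face_model_float.h5")

# Wrap the model so that weight and activation quantization is simulated
# with fake-quantization operations during training.
qat_model = tfmot.quantization.keras.quantize_model(float_model)

# Recompile and fine-tune for a few iterations to recover the accuracy lost
# to quantization; optimizer, loss, and epoch count are illustrative only.
qat_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_dataset / val_dataset: tf.data pipelines of face images and labels (not shown).
qat_model.fit(train_dataset, validation_data=val_dataset, epochs=5)
```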

Here are some tips for better accuracy when applying quantization aware training to the model:

  • It is generally better to fine-tune with quantization aware training than to train from scratch.
  • Try quantizing the later layers instead of the first layers (a sketch of this follows the list). For quantization aware training, we found the best results were obtained by introducing quantization at the end of normal training and using it as a final fine-tuning step. When we tried training from scratch with quantization enabled, we found it difficult for the training to even begin to converge.
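
To illustrate quantizing only the later layers, the toolkit lets you annotate selected layers before applying quantization. The sketch below uses a small stand-in model rather than our face model; the layer types, shapes, and the choice of which layers to annotate are illustrative assumptions.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A small stand-in model; in practice this would be the trained face model.
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def annotate_later_layers(layer):
    # Mark only the later layers (here just the final Dense layer) for
    # quantization; the earlier convolutional layers stay in floating point.
    if isinstance(layer, tf.keras.layers.Dense):
        return tfmot.quantization.keras.quantize_annotate_layer(layer)
    return layer

annotated_model = tf.keras.models.clone_model(
    base_model, clone_function=annotate_later_layers
)
partial_qat_model = tfmot.quantization.keras.quantize_apply(annotated_model)
partial_qat_model.summary()
```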

After training to a suitable accuracy, we removed the final classification layer of the model and fine-tuned it, preserving the already-trained weights. Because we used MobileNet v2 with a width multiplier of 0.5 in our training, the new last layer of the model produces an output of 640 values. This is the feature vector that encodes the distinguishing characteristics of different faces.
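
A minimal sketch of extracting this feature vector is shown below. It assumes the fine-tuned quantization aware model was saved to disk; the file name, the input image size, and the assumption that the classification head is the model’s last layer are all illustrative.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical file name for the quantization aware model fine-tuned above.
# quantize_scope() registers the QAT custom objects needed to load it.
with tfmot.quantization.keras.quantize_scope():
    qat_model = tf.keras.models.load_model("face_model_qat.h5")

# Drop the final classification layer; the output of the remaining last
# layer is the feature vector described above (640 values for our model).
embedding_model = tf.keras.Model(
    inputs=qat_model.input,
    outputs=qat_model.layers[-2].output,
)

# Illustrative batch of preprocessed face crops (the input size is an assumption).
faces = np.random.rand(4, 96, 96, 3).astype(np.float32)
embeddings = embedding_model.predict(faces)
print(embeddings.shape)  # (4, 640) for our MobileNet v2 0.5-width model
```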
