Post-training quantization

This section shows how we applied post-training quantization to our model.

During post-training quantization, the trained model's weights are quantized to the required bit width. In addition, a small calibration dataset is used to model the expected inputs and outputs of the different layers, so that the network's activations can also be quantized. The calibration dataset consists of samples of what your model expects to see when you deploy it; you can use some data from your training set for this purpose. This is shown in the following code:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing
        yield [input_data]

converter.representative_dataset = representative_dataset_gen

# Restrict the converter to integer-only operations with int8 inputs and outputs
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()
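
The snippet above leaves the details of the representative dataset open. As a rough sketch, and assuming x_train is a hypothetical float32 NumPy array holding your training inputs, the generator could draw a few hundred calibration samples from it like this:

import numpy as np

# x_train is a hypothetical float32 numpy array of training inputs
calibration_data = x_train[:100]

def representative_dataset_gen():
    for sample in calibration_data:
        # Each yielded element is a list with one array per model input,
        # with a leading batch dimension of 1
        yield [sample[np.newaxis, ...].astype(np.float32)]

converter.representative_dataset = representative_dataset_gen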

Post-training quantization was easy to apply, and we quickly obtained a working model from it. However, the drop in accuracy was larger than we would have liked. On reflection, we believe this large drop is linked to our choice of a small model architecture, and we expect that the drop would have been smaller had we used a bigger network.
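
To quantify the drop, the quantized model can be evaluated directly with the TensorFlow Lite interpreter and compared against the float model's test accuracy. The sketch below assumes x_test and y_test are hypothetical NumPy arrays of float test inputs and integer labels; because the converter's input type is int8, each input must be quantized with the scale and zero point reported by the interpreter:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Scale and zero point used to map float inputs to int8
scale, zero_point = input_details['quantization']

correct = 0
for x, y in zip(x_test, y_test):  # x_test, y_test are hypothetical test data
    x_int8 = np.round(x / scale + zero_point).astype(np.int8)
    interpreter.set_tensor(input_details['index'], x_int8[np.newaxis, ...])
    interpreter.invoke()
    prediction = interpreter.get_tensor(output_details['index'])
    correct += int(np.argmax(prediction) == y)

quantized_accuracy = correct / len(x_test)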
