Quantize the Graph

Use the TensorFlow Lite Converter tflite_convert to optimize the TensorFlow graph and convert it to the TensorFlow Lite format for 8-bit inference. The tool is installed on your path as standard with TensorFlow 1.9 or later.

To use the TensorFlow Lite Converter:

  1. Run the tflite_convert command-line program with the following command:

    tflite_convert --graph_def_file=<your_frozen_graph> \
    --output_file=<your_chosen_output_location> \
    --input_format=TENSORFLOW_GRAPHDEF \
    --output_format=TFLITE \
    --inference_type=QUANTIZED_UINT8 \
    --output_arrays=<your_output_arrays> \
    --input_arrays=<your_input_arrays> \
    --mean_values=<mean of input training data> \
    --std_dev_values=<standard deviation of input training data>

    The mean and standard deviation values tell the converter how uint8 input values map back to the real values the graph was trained on: real_value = (uint8_value - mean_value) / std_dev_value. For CifarNet, this command is:

    tflite_convert --graph_def_file=/tmp/frozen_cifarnet.pb \
    --output_file=/tmp/quantized_cifarnet.tflite \
    --input_format=TENSORFLOW_GRAPHDEF \
    --output_format=TFLITE \
    --inference_type=QUANTIZED_UINT8 \
    --output_arrays=CifarNet/Predictions/Softmax \
    --input_arrays=input \
    --mean_values=121 \
    --std_dev_values=64

    This command creates an output file that is approximately one quarter of the size of the 32-bit frozen input file, because each 32-bit floating-point weight is stored as an 8-bit value.

    For more information on using the TensorFlow Lite Converter, see the TensorFlow GitHub repository. If you prefer to drive the conversion from the Python API instead of the command line, see the sketch that follows this list.

  2. Check the accuracy of your result to ensure that it is close to the accuracy of the original 32-bit graph. A sketch of running the quantized model with the TensorFlow Lite Python interpreter also follows this list.
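
If you prefer the Python API to the command-line tool, the following minimal sketch performs the same conversion, assuming TensorFlow 1.13 or later, where the converter is exposed as tf.lite.TFLiteConverter. The paths, array names, and input statistics are the CifarNet values from the command above.

    # Minimal sketch: convert a frozen, quantization-aware-trained graph to a
    # fully quantized TensorFlow Lite model with the TF 1.x Python API.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file="/tmp/frozen_cifarnet.pb",
        input_arrays=["input"],
        output_arrays=["CifarNet/Predictions/Softmax"])

    # Request fully quantized 8-bit inference. This assumes the frozen graph
    # contains FakeQuant nodes from quantization-aware training.
    converter.inference_type = tf.uint8

    # (mean, std_dev) of the training data for each input array:
    # real_value = (uint8_value - mean) / std_dev.
    converter.quantized_input_stats = {"input": (121.0, 64.0)}

    tflite_model = converter.convert()
    with open("/tmp/quantized_cifarnet.tflite", "wb") as f:
        f.write(tflite_model)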
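
To check accuracy, you can run the quantized model with the TensorFlow Lite Python interpreter and compare its top-1 accuracy on your test set with that of the 32-bit graph. The sketch below runs a single inference, again assuming TensorFlow 1.13 or later; the zero-filled image is a placeholder for your own CIFAR-10 evaluation data.

    # Minimal sketch: run one inference with the TensorFlow Lite interpreter.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="/tmp/quantized_cifarnet.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    # One uint8 image with the shape the model expects, for example (1, 32, 32, 3).
    image = np.zeros(input_details["shape"], dtype=np.uint8)  # placeholder input

    interpreter.set_tensor(input_details["index"], image)
    interpreter.invoke()

    # Softmax scores for the 10 CIFAR-10 classes.
    scores = interpreter.get_tensor(output_details["index"])
    print("Predicted class:", np.argmax(scores))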
