Quantize the Graph
Use the TensorFlow Lite Converter, tflite_convert, to optimize the TensorFlow graph and convert it to the TensorFlow Lite format for 8-bit inference. The tool is installed on your path as standard with TensorFlow 1.9 or later.
To use the TensorFlow Lite Converter:
- Use the tflite_convert command-line program with the following command (a sketch for estimating the mean and standard deviation values follows this list):

      tflite_convert --graph_def_file=<your_frozen_graph> \
          --output_file=<your_chosen_output_location> \
          --input_format=TENSORFLOW_GRAPHDEF \
          --output_format=TFLITE \
          --inference_type=QUANTIZED_UINT8 \
          --output_arrays=<your_output_arrays> \
          --input_arrays=<your_input_arrays> \
          --mean_values=<mean of input training data> \
          --std_dev_values=<standard deviation of input training data>
  For CifarNet, this command is:

      tflite_convert --graph_def_file=/tmp/frozen_cifarnet.pb \
          --output_file=/tmp/quantized_cifarnet.tflite \
          --input_format=TENSORFLOW_GRAPHDEF \
          --output_format=TFLITE \
          --inference_type=QUANTIZED_UINT8 \
          --output_arrays=CifarNet/Predictions/Softmax \
          --input_arrays=input \
          --mean_values=121 \
          --std_dev_values=64
  This command creates an output file that is roughly one quarter of the size of the 32-bit frozen input file, because the 32-bit floating-point weights are stored as 8-bit integers.
  For more information on using the TensorFlow Lite Converter, see the TensorFlow GitHub repository.
- Check the accuracy of the quantized model to ensure that it is comparable to that of the original 32-bit graph (see the sketch after this list).
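The --mean_values and --std_dev_values flags describe how the quantized uint8 input values map back to the real values the model was trained on: real_value = (quantized_value - mean_value) / std_dev_value. If you do not already know these statistics, the following is a minimal sketch for estimating them, assuming your training images are available as a NumPy array with values in the [0, 255] range (the file name here is hypothetical):

    import numpy as np

    # Hypothetical file: the training images as an array of shape (N, H, W, C)
    # with values in [0, 255].
    train_images = np.load("cifar_train_images.npy").astype(np.float32)

    # A single mean and standard deviation over the whole training set, which
    # the converter uses as: real_value = (quantized_value - mean) / std_dev.
    mean_value = train_images.mean()
    std_dev_value = train_images.std()

    print("--mean_values %d --std_dev_values %d"
          % (round(mean_value), round(std_dev_value)))

For CifarNet, this procedure yields the values 121 and 64 used in the command above.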
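To compare accuracy, you can run the quantized model with the TensorFlow Lite Python interpreter and measure its top-1 accuracy on your test set. The following is a minimal sketch, assuming a TensorFlow version that exposes tf.lite.Interpreter (1.14 or later) and hypothetical NumPy files holding uint8 test images and integer labels:

    import numpy as np
    import tensorflow as tf

    # Hypothetical files: uint8 test images of shape (N, 32, 32, 3) and labels.
    test_images = np.load("cifar_test_images.npy")
    test_labels = np.load("cifar_test_labels.npy")

    interpreter = tf.lite.Interpreter(model_path="/tmp/quantized_cifarnet.tflite")
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    correct = 0
    for image, label in zip(test_images, test_labels):
        # The quantized model takes uint8 input directly; add a batch dimension.
        interpreter.set_tensor(input_index, np.expand_dims(image, axis=0))
        interpreter.invoke()
        prediction = interpreter.get_tensor(output_index)
        correct += int(np.argmax(prediction[0]) == label)

    print("Top-1 accuracy: %.4f" % (float(correct) / len(test_images)))

Run the same measurement on the original 32-bit graph and confirm that the two accuracy figures are close.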