Before you can use the TensorFlow Lite quantization tools, you must:
- Install TensorFlow 1.9 or later. Arm tested with TensorFlow version 1.10.1.
To follow the CifarNet examples in this article, clone the tensorflow/models repository from GitHub using the command:
git clone https://github.com/tensorflow/models.git
Use the master branch. Arm tested commit d4e1f97fd8b929deab5b65f8fd2d0523f89d5b44.
- Prepare your network for quantization:
Remove operations that the TensorFlow quantization toolchain does not yet support. The set of supported operations changes over time. See the TensorFlow documentation for details.
To remove unsupported operations from CifarNet, lines 68 and 71 must be removed from
Add fake quantization layers to the model graph, before you initialize the optimizer. Call this function on the finished graph to add these layers:
For a CifarNet example, you modify
tf.contrib.quantize.create_training_graph(quant_delay=90000) before the procedure on line 514 that configures the optimization. The
quant_delay parameter specifies how many steps the network trains in 32-bit floating-point (FP32) before quantization effects are introduced. The value 90000 means that the model trains for 90000 steps with floating-point weights and activations before the quantization process begins. You can also load the weights of an existing trained model and fine-tune it for quantization. In this case, add the code that loads the weights and set the
quant_delay value to 0 so that quantization begins immediately.
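
To make the effect of quant_delay concrete, the following is a minimal pure-Python sketch of what a fake quantization layer does to a single value. It is an illustration only, not TensorFlow's implementation: the helper names fake_quantize and maybe_quantize, the fixed range of -1.0 to 1.0, the 8-bit width, and the exact step at which quantization switches on are all assumptions made for this example.

```python
def fake_quantize(x, x_min, x_max, num_bits=8):
    """Snap x to the nearest of 2**num_bits evenly spaced levels in [x_min, x_max].

    The result is still a float, which is why this is called "fake" quantization:
    the value only carries the rounding error that real quantization would add.
    """
    steps = 2 ** num_bits - 1              # number of intervals between levels
    scale = (x_max - x_min) / steps        # width of one quantization interval
    clamped = min(max(x, x_min), x_max)    # values outside the range saturate
    return x_min + round((clamped - x_min) / scale) * scale


def maybe_quantize(x, step, quant_delay, x_min=-1.0, x_max=1.0):
    """Hypothetical helper: pass x through unchanged until quant_delay steps have run."""
    if step < quant_delay:                 # assumed boundary; still training in FP32
        return x
    return fake_quantize(x, x_min, x_max)  # quantization effects are now simulated


weight = 0.1234
early = maybe_quantize(weight, step=100, quant_delay=90000)     # unchanged FP32 value
late = maybe_quantize(weight, step=90000, quant_delay=90000)    # snapped to an 8-bit level
fine_tuned = maybe_quantize(weight, step=0, quant_delay=0)      # quantized from the first step
```

With quant_delay set to 90000, the value passes through unchanged early in training and is rounded to an 8-bit level afterwards. With quant_delay set to 0, as in the fine-tuning case, rounding applies from the first step.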