Write your application
There are specific steps that you carry out to deploy and use a TensorFlow Lite quantized model with the Arm NN SDK.
You must do the following:
- Load the model output labels.
- Load and pre-process an input image for the quantized model.
- Prepare the output tensor.
- Import a graph.
- Optimize the model and load it onto a compute device.
- Run a graph on a device.
- Interpret and report the output.
Using the example code, this guide walks you through each step.
Load the model output labels
You must use the model output labels to interpret the outputs of the model. These labels are usually in a text file that the model creator or distributor provides. In this file, each line contains the label or labels corresponding to each output node. You can use the utility function LoadModelOutputLabels that the model_output_labels_loader.hpp file defines to load the labels. The following example code loads the labels using the LoadModelOutputLabels function:
const std::vector<CategoryNames> modelOutputLabels = LoadModelOutputLabels(programOptions.modelOutputLabelsPath);
Load and pre-process an input image for the quantized model
You must pre-process images before the model can use them as inputs. The pre-processing method depends on the framework, model, and model data type that you use.
For the purposes of this guide, you must do the following to pre-process the image:
- Resize the input images to match the dimensions of the input tensor of the model. In this example, the MobileNet V1 model accepts 224x224 input images.
- For floating-point models, you must scale the input image values to a range of -1 to 1. For example, if the input image values are between 0 to 255, you must divide the image values by 127.5 and subtract 1. For integer quantized models, the image values must be within the 0 to 255 range. Given that these image values are already within the correct range, you do not need to scale the input images of integer quantized models.
- Use the C++ static_cast operator to convert the input image values from floating point to 8-bit unsigned integer type, as the sketch after this list shows.
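The following minimal sketch illustrates both conventions. It is not part of the guide's example application and the helper names are hypothetical; in practice, the PrepareImageTensor utility function performs this work for you.

#include <cstdint>

// Hypothetical helpers, for illustration only.

// Floating-point model: map a 0 to 255 pixel value into the -1 to 1 range.
float scalePixelForFloatModel(float pixel)
{
    return (pixel / 127.5f) - 1.0f;
}

// Integer quantized model: the value is already in the 0 to 255 range,
// so only a conversion to 8-bit unsigned integer type is needed.
uint8_t convertPixelForQuantizedModel(float pixel)
{
    return static_cast<uint8_t>(pixel);
}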
You can pre-process images offline with your own tools. However, Arm NN comes with the PrepareImageTensor utility function that handles pre-processing.
Note
You can also use the ImageTensorGenerator tool offline to convert images to input tensors using static_cast. Refer to the README in the folder of the tool for more information.
The following example code loads and pre-processes an image that the imagePath command-line option specifies:
// Load and preprocess input image
const std::vector<TContainer> inputDataContainers =
{ PrepareImageTensor<uint8_t>(programOptions.imagePath,
                              inputTensorWidth,
                              inputTensorHeight,
                              normParams,
                              inputTensorBatchSize,
                              inputTensorDataLayout) };
As they are specific to the MobileNet V1 model, you must specify the following in your code:
- inputTensorWidth
- inputTensorHeight
- inputTensorBatchSize
- inputTensorDataLayout
- inputName
- outputName
Note
The inputName and outputName of your specific model can differ from the names in the example code. Ensure that you specify the correct inputName and outputName.
The following is the example code:
const std::string inputName = "input";
const std::string outputName = "MobilenetV1/Predictions/Reshape_1";
const unsigned int inputTensorWidth = 224;
const unsigned int inputTensorHeight = 224;
const unsigned int inputTensorBatchSize = 1;
const armnn::DataLayout inputTensorDataLayout = armnn::DataLayout::NHWC;
The normParams variable determines how the input image is normalized. The following pseudocode shows how the image pre-processor within the PrepareImageTensor utility function calculates normalized image values:
out = ((in / scale) - mean) / stddev
Therefore, you specify the normParams variable as the following example code shows:
// Prepare image normalization parameters
normParams.scale = 1.0;
normParams.mean = { 0.0, 0.0, 0.0 };
normParams.stddev = { 1.0, 1.0, 1.0 };
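With these values, the formula reduces to out = ((in / 1.0) - 0.0) / 1.0 = in, so the pixel values pass through unchanged and stay within the 0 to 255 range that the quantized model expects. The following standalone sketch, which is not part of the example application, demonstrates the calculation:

// Minimal illustration of the normalization formula that PrepareImageTensor applies.
float normalize(float in, float scale, float mean, float stddev)
{
    return ((in / scale) - mean) / stddev;
}

// With scale = 1.0, mean = 0.0 and stddev = 1.0, a pixel value of 200 remains 200:
// normalize(200.0f, 1.0f, 0.0f, 1.0f) == 200.0f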
Prepare the output tensor
You must prepare a container to receive the output of the model.
The following example code prepares this container:
// Output tensor size is equal to the number of model output labels
const unsigned int outputNumElements = modelOutputLabels.size();
std::vector<TContainer> outputDataContainers = { std::vector<uint8_t>(outputNumElements) };
Import a graph
You must import the TensorFlow Lite graph that you use. The Arm NN SDK provides parsers for reading graphs from TensorFlow Lite.
The SDK supports TensorFlow Lite graphs in the binary FlatBuffers format. To import the graph, you must:
- Load the model.
- Bind the input and output points of its graph.
The following example code imports the graph:
// Import the TensorFlowLite model.
using IParser = armnnTfLiteParser::ITfLiteParser;
auto armnnparser(IParser::Create());
armnn::INetworkPtr network = armnnparser->CreateNetworkFromBinaryFile(programOptions.modelPath.c_str());
After this step, the code is common regardless of the framework that you started with.
The following example code binds the input and output tensors to the data and selects the loaded network identifier:
// Find the binding points for the input and output nodes
using BindingPointInfo = armnnTfLiteParser::BindingPointInfo;
const std::vector<BindingPointInfo> inputBindings = { armnnparser->GetNetworkInputBindingInfo(0, inputName) };
const std::vector<BindingPointInfo> outputBindings = { armnnparser->GetNetworkOutputBindingInfo(0, outputName) };
Note
You define the inputName and outputName strings at the beginning of the code.
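If you are not sure which names your model uses, one possible check, shown in the following sketch rather than taken from the original example, is to query the parser for the input and output tensor names of the subgraph:

// List the input and output tensor names of subgraph 0 so that you can
// confirm the values to use for inputName and outputName.
for (const std::string& name : armnnparser->GetSubgraphInputTensorNames(0))
{
    std::cout << "Input tensor: " << name << std::endl;
}
for (const std::string& name : armnnparser->GetSubgraphOutputTensorNames(0))
{
    std::cout << "Output tensor: " << name << std::endl;
}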
Optimize the model and load it onto a compute device
You must optimize your network and load it onto a compute device. The Arm NN SDK supports optimized execution on multiple CPU and GPU devices. Before you start executing a graph, you must select the appropriate device context and optimize the graph for that device.
To select an Arm Mali GPU for use, you must specify -c GpuAcc in the command line.
The following example code optimizes and loads your network onto a compute device:
// Create a runtime and optimize the network for a specific compute device,
// e.g. CpuAcc, GpuAcc
armnn::IRuntime::CreationOptions options;
armnn::IRuntimePtr runtime(armnn::IRuntime::Create(options));
armnn::IOptimizedNetworkPtr optimizedNet = armnn::Optimize(*network, programOptions.computeDevice, runtime->GetDeviceSpec());

// Load the optimized network onto the device
armnn::NetworkId networkId;
runtime->LoadNetwork(networkId, std::move(optimizedNet));
Run a graph on a compute device
A compute device performs inference using the EnqueueWorkload() function of the context.
The following example code runs a single inference on the test image:
runtime->EnqueueWorkload(networkId,
                         armnnUtils::MakeInputTensors(inputBindings, inputDataContainers),
                         armnnUtils::MakeOutputTensors(outputBindings, outputDataContainers));
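EnqueueWorkload() returns an armnn::Status value. As a possible extension that is not part of the original example, you can capture and check this value before reading the output container:

// Check that the inference ran successfully before interpreting the output.
armnn::Status ret = runtime->EnqueueWorkload(networkId,
                                             armnnUtils::MakeInputTensors(inputBindings, inputDataContainers),
                                             armnnUtils::MakeOutputTensors(outputBindings, outputDataContainers));
if (ret != armnn::Status::Success)
{
    std::cerr << "Error: inference failed" << std::endl;
    return 1;
}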
Interpret and report the output
The output of the model is a tensor with the same number of elements as there are output labels. For the ImageNet labels that MobileNet V1 uses, this size is 1001. You must interpret each value as the probability of the input image being classified as the corresponding label. To find the label that the model predicts most confidently, find the label of the output node with the highest output value.
The std::distance() function in the following example code finds the index of the largest element in the output. Used together with std::max_element(), it is equivalent to the argmax() function from the NumPy library.
std::vector<uint8_t> output = boost::get<std::vector<uint8_t>>(outputDataContainers[0]);
size_t labelInd = std::distance(output.begin(), std::max_element(output.begin(), output.end()));
std::cout << "Prediction: ";
for (const auto& label : modelOutputLabels[labelInd])
{
    std::cout << label << ", ";
}
std::cout << std::endl;
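Because the model is quantized, the raw output values are 8-bit integers. As a possible extension that is not part of the original example, and assuming the quantization parameters are stored in the output binding's TensorInfo, you can dequantize the winning score to report it as a floating-point probability:

// Dequantize the top score: real value = scale * (quantized value - offset).
const armnn::TensorInfo& outputInfo = outputBindings[0].second;
const float confidence = outputInfo.GetQuantizationScale() *
    (static_cast<int32_t>(output[labelInd]) - outputInfo.GetQuantizationOffset());
std::cout << "Confidence: " << confidence << std::endl;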