Write your application

There are specific steps that you carry out to deploy and use a TensorFlow Lite quantized model with the Arm NN SDK.

You must do the following:

  • Load the model output labels.
  • Load and pre-process an input image for the quantized model.
  • Prepare the output tensor.
  • Import a graph.
  • Optimize the model and load it onto a compute device.
  • Run a graph on a device.
  • Interpret and report the output.

Using the example code, this guide walks you through each step.

Load the model output labels

You must use the model output labels to interpret the outputs of the model. These labels are usually in a text file the model creator or distributor provides. In this file, each line contains the label or labels corresponding to each output node. You can use the utility function LoadModelOutputLabels that the model_output_labels_loader.hpp file defines to load the labels. The following example code loads the labels using the LoadModelOutputLabels function:

const std::vector<CategoryNames> modelOutputLabels =
    LoadModelOutputLabels(programOptions.modelOutputLabelsPath);

Load and pre-process an input image for the quantized model

You must pre-process images before the model can use them as inputs. The pre-processing method that you use depends on the framework, model, or model data type you use.

For the purposes of this guide, you must do the following to pre-process the image:

  1. Resize the input images to match the dimensions of the input tensor of the model. In this example, the MobileNet V1 model accepts 224x224 input images.
  2. For floating-point models, you must scale the input image values to a range of -1 to 1. For example, if the input image values are between 0 to 255, you must divide the image values by 127.5 and subtract 1. For integer quantized models, the image values must be within the 0 to 255 range. Given that these image values are already within the correct range, you do not need to scale the input images of integer quantized models.
  3. Use the C++ operation static_cast to convert the input image values from floating point to 8-bit unsigned integer type.

You can pre-process images offline with your own tools. However, Arm NN comes with the PrepareImageTensor utility function that handles pre-processing.


You can also use the ImageTensorGenerator as an offline tool to use static_cast to convert images to input tensors. Refer to the README in the folder of the tool for more information.

The following example code loads and pre-processes an image the command-line option imagePath specifies:

// Load and pre-process the input image
const std::vector<TContainer> inputDataContainers =
{ PrepareImageTensor<uint8_t>(programOptions.imagePath,
                              inputTensorWidth, inputTensorHeight,
                              normParams,
                              inputTensorBatchSize,
                              inputTensorDataLayout) };

As they are specific to the MobileNet V1 model, you must specify the following in your code:

  • inputTensorWidth
  • inputTensorHeight
  • inputTensorBatchSize
  • inputTensorDataLayout
  • inputName
  • outputName


The inputName and outputName of your specific model can differ from the names in the example code. Ensure that you specify the correct inputName and outputName.

The following is the example code:

const std::string inputName = "input";
const std::string outputName = "MobilenetV1/Predictions/Reshape_1";
const unsigned int inputTensorWidth = 224;
const unsigned int inputTensorHeight = 224;
const unsigned int inputTensorBatchSize = 1;
const armnn::DataLayout inputTensorDataLayout = armnn::DataLayout::NHWC;

The normParams variable determines how the input image is normalized. The following pseudocode shows how the image pre-processor within the PrepareImageTensor utility function calculates normalized image values:

out = ((in / scale) - mean) / stddev

Therefore, you specify the normParams variable as the following example code shows:

// Prepare image normalization parameters
normParams.scale = 1.0;
normParams.mean = { 0.0, 0.0, 0.0 };
normParams.stddev = { 1.0, 1.0, 1.0 };

Prepare the output tensor

You must prepare a container to receive the output of the model.

The following example code prepares a container to receive the output of the model:

// Output tensor size is equal to the number of model output labels
const unsigned int outputNumElements = modelOutputLabels.size();
std::vector<TContainer> outputDataContainers = { std::vector<uint8_t>(outputNumElements) };

Import a graph

You must import the TensorFlow Lite graph that you use. The Arm NN SDK provides parsers for reading graphs from TensorFlow Lite.

The SDK supports TensorFlow Lite graphs in text and binary ProtoBuf formats. To import the graph, you must:

  1. Load the model.
  2. Bind the input and output points of its graph.

The following example code imports the graph:

// Import the TensorFlow Lite model
using IParser = armnnTfLiteParser::ITfLiteParser;
auto armnnparser(IParser::Create());
armnn::INetworkPtr network =
    armnnparser->CreateNetworkFromBinaryFile(programOptions.modelPath.c_str());

After this step, the code is common regardless of the framework that you started with.

The following example code binds the input and output tensors to the data and selects the loaded network identifier:

// Find the binding points for the input and output nodes
using BindingPointInfo = armnnTfLiteParser::BindingPointInfo;
const std::vector<BindingPointInfo> inputBindings = { armnnparser->GetNetworkInputBindingInfo(0, inputName) };
const std::vector<BindingPointInfo> outputBindings = { armnnparser->GetNetworkOutputBindingInfo(0, outputName) };


You define the inputName and outputName strings at the beginning of the code.

Optimize the model and load it onto a compute device

You must optimize your network and load it onto a compute device. The Arm NN SDK supports optimized execution on multiple CPU and GPU devices. Before you start executing a graph, you must select the appropriate device context and optimize the graph for that device.

To select an Arm Mali GPU for use, you must specify -c GpuAcc in the command line.

The following example code optimizes and loads your network onto a compute device:

// Create a runtime and optimize the network for a specific compute device,
// e.g. CpuAcc, GpuAcc
armnn::IRuntime::CreationOptions options;
armnn::IRuntimePtr runtime(armnn::IRuntime::Create(options));
armnn::IOptimizedNetworkPtr optimizedNet = armnn::Optimize(*network, programOptions.computeDevice, runtime->GetDeviceSpec());

// Load the optimized network onto the device

armnn::NetworkId networkId;
runtime->LoadNetwork(networkId, std::move(optimizedNet));

Run a graph on a compute device

A compute device performs inference using the EnqueueWorkload() function of the context.

The following example code runs a single inference on the test image:

runtime->EnqueueWorkload(networkId,
    armnnUtils::MakeInputTensors(inputBindings, inputDataContainers),
    armnnUtils::MakeOutputTensors(outputBindings, outputDataContainers));

Interpret and report the output

The output of the model is a tensor with one element per output label. For ImageNet models, this tensor has 1001 elements. Each value is the probability that the input image belongs to the corresponding label. To find the label that the model predicts most confidently, you must find the label of the output node with the highest output value.

The following example code uses the std::distance() function to find the index of the largest element in the output. This is equivalent to the argmax() function from the NumPy library.

std::vector<uint8_t> output = boost::get<std::vector<uint8_t>>(outputDataContainers[0]);

size_t labelInd = std::distance(output.begin(), std::max_element(output.begin(), output.end()));
std::cout << "Prediction: ";
for (const auto& label : modelOutputLabels[labelInd])
{
    std::cout << label << ", ";
}
std::cout << std::endl;