Overview

This guide shows you how to run a TensorFlow model with the open-source Arm NN SDK, using an example application. You can use the knowledge that you gain from this guide to run your own models on Arm Cortex CPUs and Arm Mali GPUs.

This guide uses Arm NN to run a model, following these steps:

  1. Load and parse the MNIST test set.
  2. Import graph.
  3. Optimize and load onto a compute device.
  4. Run graph on device.

The guide explains each step of the example code to help you understand each stage of the process.

Before you begin

We assume that you are familiar with neural networks and the MNIST dataset. If you are new to either of these concepts, read the TensorFlow tutorials and the MNIST database of handwritten digits paper.

The complete example with source code, data, and model is available on GitHub.

Load and parse the MNIST test set

To begin building your own TensorFlow model, load and parse the MNIST test set.

The following sample code loads and parses the MNIST test set:

// Load a test image and its correct label
std::string dataDir = "data/";
int testImageIndex = 0;
std::unique_ptr<MnistImage> input = loadMnistImage(dataDir, testImageIndex);

The loadMnistImage helper function is not covered here. In simple terms, it parses the MNIST data files and returns an MnistImage struct for the requested image with a label and a 28*28=784 element array containing the data:

// Helper struct for loading MNIST data
struct MnistImage
{
    unsigned int label;
    float image[g_kMnistImageByteSize];
};
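
For reference, the following is a minimal sketch of what such a helper could look like. It is not the code from the example project: the MNIST file names (t10k-images-idx3-ubyte and t10k-labels-idx1-ubyte), the IDX header offsets, and the pixel scaling are assumptions about the standard MNIST test-set layout and the model's expected preprocessing. The actual implementation is part of the complete example on GitHub.

// A possible loadMnistImage implementation (a sketch, not the example project's code).
// Assumes the standard MNIST test-set files are in dataDir and that
// g_kMnistImageByteSize is a compile-time constant equal to 28 * 28 = 784.
#include <fstream>
#include <memory>
#include <string>

std::unique_ptr<MnistImage> loadMnistImage(const std::string& dataDir, int imageIndex)
{
    std::ifstream images(dataDir + "t10k-images-idx3-ubyte", std::ios::binary);
    std::ifstream labels(dataDir + "t10k-labels-idx1-ubyte", std::ios::binary);
    if (!images.is_open() || !labels.is_open())
    {
        return nullptr;
    }

    // Skip the IDX headers (16 bytes for images, 8 bytes for labels) and
    // seek to the requested image and label
    images.seekg(16 + static_cast<std::streamoff>(imageIndex) * g_kMnistImageByteSize);
    labels.seekg(8 + static_cast<std::streamoff>(imageIndex));

    unsigned char label = 0;
    labels.read(reinterpret_cast<char*>(&label), 1);

    unsigned char pixels[g_kMnistImageByteSize];
    images.read(reinterpret_cast<char*>(pixels), g_kMnistImageByteSize);

    // Convert the 0-255 pixel values to floats; here they are scaled to [0, 1],
    // but the exact preprocessing depends on how the model was trained
    std::unique_ptr<MnistImage> result(new MnistImage);
    result->label = label;
    for (unsigned int i = 0; i < g_kMnistImageByteSize; ++i)
    {
        result->image[i] = static_cast<float>(pixels[i]) / 255.0f;
    }
    return result;
}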

Import graph

Arm NN provides parsers for reading model files from neural network frameworks. Importing a graph typically involves two steps:

  1. Load the model.
  2. Bind the input and output points of its graph.

The following sample code loads the model; the binding step follows it:

// Import the TensorFlow model. Note: use CreateNetworkFromBinaryFile for .pb files.
armnnTfParser::ITfParserPtr parser = armnnTfParser::ITfParser::Create();
armnn::TensorInfo inputTensorInfo({1, 784, 1, 1}, armnn::DataType::Float32);
armnn::INetworkPtr network = parser->CreateNetworkFromTextFile("model/simple_mnist_tf.prototxt",
                                                                   { {"Placeholder", {1, 784, 1, 1}} },
                                                                   { "Softmax" });

TensorFlow graphs in both text and binary ProtoBuf formats are supported. For more details on freezing TensorFlow graphs to include their weights, see the Customization basics: tensors and operations guide in the Related information section.

Note: After this step, the code is common regardless of the framework that you started with. This is because the INetwork and two BindingPointInfo objects provide everything that is needed.

Optimize and load onto a compute device

Arm NN supports optimized execution on multiple devices, including CPU and GPU. Before you start executing a graph, you must select the appropriate device context and optimize the graph for that device.

Using an Arm Mali GPU is as simple as including Compute::GpuAcc in the list of preferred compute devices that is passed to Optimize(). No other changes are required.

The following sample code creates a runtime context, optimizes the network for the preferred compute devices, and loads it onto the device:

// Create a runtime context and optimize the network for one or more compute devices, in order of preference
// For example, GpuAcc, CpuAcc: run on an Arm Mali GPU if available, otherwise on an Arm v7 or v8 CPU
armnn::IRuntime::CreationOptions options;
armnn::IRuntimePtr context = armnn::IRuntime::Create(options);
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
                                                     {armnn::Compute::GpuAcc, armnn::Compute::CpuAcc},
                                                     context->GetDeviceSpec());

// Load the optimized network onto the device
armnn::NetworkId networkIdentifier;
context->LoadNetwork(networkIdentifier, std::move(optNet));
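
The call above discards the return value. LoadNetwork returns an armnn::Status, so you can check that loading succeeded before running inference; a minimal sketch:

// Check that the optimized network was loaded onto the compute device successfully
armnn::Status loadStatus = context->LoadNetwork(networkIdentifier, std::move(optNet));
if (loadStatus != armnn::Status::Success)
{
    std::cerr << "Failed to load the network onto the compute device" << std::endl;
    return 1;
}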

Run graph on device

Inference on a compute device is performed using the EnqueueWorkload() function of the context.

This example code runs a single inference on the test image:

// Run a single inference on the test image
std::array<float, 10> output;
armnn::Status ret = context->EnqueueWorkload(networkIdentifier,
                                             MakeInputTensors(inputBindingInfo, &input->image[0]),
                                             MakeOutputTensors(outputBindingInfo, &output[0]));
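
MakeInputTensors and MakeOutputTensors are small helpers in the example application, not part of the Arm NN API. They package a data pointer and its binding information into the armnn::InputTensors and armnn::OutputTensors containers that EnqueueWorkload() expects. A sketch of what they might look like:

// Package a data pointer and its binding information into the containers that
// EnqueueWorkload() expects. BindingPointInfo is the pair of layer binding ID
// and tensor info returned by the parser.
armnn::InputTensors MakeInputTensors(const armnnTfParser::BindingPointInfo& input,
                                     const void* inputTensorData)
{
    return { { input.first, armnn::ConstTensor(input.second, inputTensorData) } };
}

armnn::OutputTensors MakeOutputTensors(const armnnTfParser::BindingPointInfo& output,
                                       void* outputTensorData)
{
    return { { output.first, armnn::Tensor(output.second, outputTensorData) } };
}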

Here, the input and output tensors are bound to the image data and the output array, and the loaded network is selected by its identifier. The result of the inference can be read directly from the output array and compared to the MnistImage label that we read from the data file:

// Convert the softmax output to an integer label and print it
int label = std::distance(output.begin(), std::max_element(output.begin(), output.end()));
std::cout << "Predicted: " << label << std::endl;
std::cout << "   Actual: " << input->label << std::endl;

In this case, the std::distance function is used with std::max_element to find the index of the largest element in the output. This is equivalent to NumPy's argmax() function.

Deploy an application using Arm NN

You must link your application with the Arm NN library and the TensorFlow parsing library. The following command compiles the example and links it against those libraries:

g++ -std=c++11 -I$(ARMNN_INC) mnist.cpp -o mnist -L$(ARMNN_LIB) -larmnn -larmnnTfParser

For convenience, Arm NN can run with a reference implementation on x86 for development and testing, but the optimized kernels are only available for Arm CPUs and GPUs. This means that you must run your profiling and optimization steps on a real Arm-powered device, because x86 performance is not representative.
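
If you do want to run a functional check on an x86 machine, one option is to append the reference backend, Compute::CpuRef, to the preference list when optimizing, so that execution falls back to the unoptimized reference kernels when no accelerated backend is available. For example:

// Fall back to the reference implementation (CpuRef) when neither the GPU
// backend nor the accelerated CPU backend is available, for example on an x86 host
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
                                                     {armnn::Compute::GpuAcc, armnn::Compute::CpuAcc, armnn::Compute::CpuRef},
                                                     context->GetDeviceSpec());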

Related information

Here are some resources related to material in this guide:

Next steps

This guide describes the steps that are required to run a TensorFlow model using the Arm NN SDK. Using the information in this guide, you can build and run your own models with Arm NN.