Building your Caffe model using the Arm NN SDK

To run a Caffe model using the Arm NN SDK:

  1. Load and parse the MNIST test set.
  2. Import a graph.
  3. Optimize and load onto a compute device.
  4. Run a graph on a compute device.

Using the example code, this guide walks you through each step.

Load and parse the MNIST test set

To begin building your own Caffe model, load and parse the MNIST test set.

This example code loads and parses the MNIST test set:

// Load a test image and its correct label
std::string dataDir = "data/"; 
int testImageIndex = 0;
std::unique_ptr<MnistImage> input = loadMnistImage(dataDir, testImageIndex);
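The loadMnistImage() helper and the MnistImage type it returns belong to the sample's support code rather than to the SDK itself. A plausible sketch of what they involve is shown below; the field names and the ReadBigEndian32() helper are assumptions for illustration, not part of the Arm NN API:

```cpp
#include <cstdint>

constexpr unsigned g_kMnistImageByteSize = 28 * 28; // MNIST images are 28x28 pixels

// Assumed shape of the record loadMnistImage() returns: the correct
// digit plus the pixel data, converted to floats for the network input.
struct MnistImage
{
    unsigned label;                     // correct digit, 0-9
    float image[g_kMnistImageByteSize]; // pixel values
};

// The MNIST idx files store their headers as big-endian 32-bit integers,
// so a loader has to reassemble each value byte by byte.
uint32_t ReadBigEndian32(const uint8_t* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}
```

For example, the MNIST images file opens with the bytes 0x00 0x00 0x08 0x03, which decode to the magic number 2051.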

Import a graph

The Arm NN SDK provides parsers for reading model files from Caffe.

The SDK supports Caffe graphs in text and binary ProtoBuf formats. To import the graph:

  1. Load the model.
  2. Bind the input and output points of its graph.

This example code imports the graph:

// Import the Caffe model. Note: use CreateNetworkFromTextFile for text files.
armnnCaffeParser::ICaffeParserPtr parser = armnnCaffeParser::ICaffeParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(
        "model/lenet.caffemodel", // example path: substitute your trained model file
        { },                      // input taken from file if empty
        { "prob" });              // output node

After this step, the code is common regardless of the framework that you started with.

This example code binds the input and output tensors to the data and selects the loaded network identifier:

// Find the binding points for the input and output nodes
armnnCaffeParser::BindingPointInfo inputBindingInfo =
        parser->GetNetworkInputBindingInfo("data"); // "data" is the usual LeNet input layer name
armnnCaffeParser::BindingPointInfo outputBindingInfo =
        parser->GetNetworkOutputBindingInfo("prob");

You can read the result of the inference from the output array and compare it to the MnistImage label from the data file.

Optimize and load onto a compute device

You must optimize your network and load it onto a compute device. The Arm NN SDK supports optimized execution on multiple CPU and GPU devices. Before you start executing a graph, you must select the appropriate device context and optimize the graph for that device.

You specify a preferential order of compute devices when you call the Optimize() function. The optimizer attempts to schedule each workload on the first device in the list and falls back to the next device if the first is unavailable or does not support the workload. You can specify the following compute devices:

  • The Arm Mali GPU, which you specify with GpuAcc.
  • The Armv7 or Armv8 CPU, which you specify with CpuAcc.

This example code optimizes and loads your network onto a compute device:

// Create a context and optimize the network for one or more compute devices, in order of preference
// e.g. GpuAcc, CpuAcc = if available, run on the Arm Mali GPU, else try the Armv7 or Armv8 CPU
armnn::IRuntime::CreationOptions options;
armnn::IRuntimePtr context = armnn::IRuntime::Create(options);
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
        {armnn::Compute::GpuAcc, armnn::Compute::CpuAcc},
        context->GetDeviceSpec());

// Load the optimized network onto the device
armnn::NetworkId networkIdentifier;
context->LoadNetwork(networkIdentifier, std::move(optNet));
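The fallback behaviour described above can be illustrated in plain C++. SelectBackend() below is a hypothetical helper, not an Arm NN function, and falling back to the reference backend (CpuRef) as a last resort is an assumption for illustration; Optimize() performs the equivalent selection internally, per workload:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical illustration of backend fallback: walk the preference
// list and return the first backend the device actually supports.
std::string SelectBackend(const std::vector<std::string>& preferred,
                          const std::vector<std::string>& supported)
{
    for (const std::string& backend : preferred)
    {
        if (std::find(supported.begin(), supported.end(), backend) != supported.end())
        {
            return backend; // first available preference wins
        }
    }
    return "CpuRef"; // assumed last resort: the reference backend
}
```

With the preference list {GpuAcc, CpuAcc}, a device without a Mali GPU would therefore run the workload on CpuAcc.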

Run a graph on a compute device

Inference on a compute device is performed using the EnqueueWorkload() function of the context.

This example code runs a single inference on the test image:

// Run a single inference on the test image 
std::array<float, 10> output;
armnn::Status ret = context->EnqueueWorkload(networkIdentifier,
        MakeInputTensors(inputBindingInfo, &input->image[0]),
        MakeOutputTensors(outputBindingInfo, &output[0]));
In the following example code, std::distance() converts the iterator returned by std::max_element() into the index of the largest element in the output. Together they are equivalent to NumPy's argmax() function.

// Convert the one-hot output to an integer label and print
int label = std::distance(output.begin(),
        std::max_element(output.begin(), output.end()));
std::cout << "Predicted: " << label << std::endl;
std::cout << "   Actual: " << input->label << std::endl;
return 0;
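The argmax idiom above can be exercised on its own. A minimal standalone version, using only the C++ standard library:

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Standalone version of the idiom above: std::max_element finds the
// largest score, and std::distance turns the returned iterator into
// an index, matching NumPy's argmax().
int ArgMax(const std::array<float, 10>& scores)
{
    return static_cast<int>(std::distance(scores.begin(),
            std::max_element(scores.begin(), scores.end())));
}
```

For a one-hot output with the 1 at index 7, ArgMax() returns 7.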