Arm NN is an inference engine for CPUs, GPUs, and NPUs. It bridges the gap between existing neural network frameworks and the underlying hardware IP, and enables efficient translation of models from existing frameworks, such as TensorFlow and Caffe. This guide explains how to create Linux applications that load TensorFlow-trained neural network models, run them on Arm Cortex-A CPUs and Mali GPUs, and profile application performance with Arm Streamline.

The HiKey 960 board with a HiSilicon Hi3660 SoC is a great development board with good performance and recent Arm IP. The board runs the latest Android Pie, and contains the Cortex-A73, Cortex-A53, and Mali-G71 IP. The HiKey 960 is fully supported by the Android Open Source Project (AOSP).

The HiKey 960 is also great for Linux development, but installing Linux is more involved than installing Android. The information below explains one way to install Ubuntu Linux on the HiKey 960.

This guide shows you how to set up a HiKey 960 board with Linux so that you can use it to run applications using Arm NN. It also shows you how to use Arm Streamline for ML application profiling. The process is:

  • Set up Linux on the HiKey 960 board.
  • Build the Arm NN software stack.
  • Build and run the example ML applications.
  • Use Streamline for performance analysis.

Before you begin

In addition to your HiKey 960 board, make sure that you also have an HDMI monitor and cable, USB keyboard, USB type-C to USB cable (to connect the HiKey to your computer), and a power supply. You also need a small nail or paper clip to move the small switches on the board.

The scripts and examples for the project are available on GitHub.

To get started, use git to clone the repository and enter the directory with the content.

$ git clone https://github.com/ARM-software/Tool-Solutions.git
$ cd Tool-Solutions/ml-tool-examples

In this guide, we assume that you are using an Ubuntu 18.04 host machine, but the instructions should work on most Linux distributions. Windows is not recommended as a host machine.

Duration: We estimate that you will need about two hours to complete the instructions in this guide.

Run Ubuntu Linux on the HiKey 960

Let's get started with installing Ubuntu Linux on the HiKey 960. The general steps are:

  • Run a recovery to make sure the partition table is correct.
  • Flash the firmware and Linux operating system.
  • Boot and run Linux.

Everything related to setting up Ubuntu on the HiKey 960 is in the hikey960-ubuntu/ directory:

$ cd hikey960-ubuntu

Build an Ubuntu filesystem

Before doing any board flashing, you need an Ubuntu root filesystem. The root filesystem for a Linux distribution is large and not easy to distribute as a binary file, but it can be created from scratch. This walk-through builds the Linux distribution from scratch using the build.sh script in the build-ubuntu/ directory.

To build the root filesystem, change directory to the build-ubuntu/ directory and run the build.sh script. This script builds the filesystem and adds some extra artifacts that need to be placed into the filesystem. For customization, study the files build.sh and config-fs.sh.

The script asks for the sudo password on the host machine once or twice, so enter the password when prompted. To build the filesystem, move to the build directory and run the build script:

$ cd build-ubuntu/
$ ./build.sh
$ cd ..

It takes some time to download the packages for the Armv8-A version of Ubuntu. If the download goes well, the result is the output file ubuntu/system.simg inside the build-ubuntu/ directory. This file represents the root filesystem that can be flashed to the HiKey 960 system partition, and it is used in a later step.

Flash the base firmware and OS

The flash process builds on an existing project used to flash the base firmware and operating system. The flash process also ensures that the partition table is reset to the default state before installing the operating system. The flow is to clone the project from GitHub, copy in some new files, and run the flash procedure. Download the project using:

$ git clone https://github.com/96boards-hikey/tools-images-hikey960

Fastboot, the Android flashing and booting utility, is used to flash the images. It can be obtained either from the Android SDK platform tools or installed using the following command:

$ sudo apt-get install fastboot

Flashing the base firmware may or may not be required, depending on the state of your board, but we recommend doing it for best results. To flash the base firmware, follow the procedure from 96boards.org, which is outlined below.

Perform the flashing process in two parts, recovery mode and fastboot mode, as described below. First, you need to become familiar with the switches that control the modes on the board.

The board has three distinct modes: normal, fastboot, and recovery, as you can see in the following table. Recovery mode is used first, then fastboot, and finally normal mode to boot and run.

Switch name   Switch     Normal mode   Fastboot mode   Recovery mode
Ext boot      Switch 3   OFF           ON              OFF
Boot mode     Switch 2   OFF           OFF             ON
Auto power    Switch 1   ON            ON              ON
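
The switch table above can also be kept as a small lookup on the host machine. The following is a minimal sketch. The function name switch_settings is hypothetical and is not part of the Tool-Solutions repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the switch
# positions from the table above for a given board mode.
switch_settings() {
    case "$1" in
        normal)   echo "Auto power (1): ON, Boot mode (2): OFF, Ext boot (3): OFF" ;;
        fastboot) echo "Auto power (1): ON, Boot mode (2): OFF, Ext boot (3): ON" ;;
        recovery) echo "Auto power (1): ON, Boot mode (2): ON, Ext boot (3): OFF" ;;
        *)
            echo "unknown mode: $1 (expected normal, fastboot, or recovery)" >&2
            return 1
            ;;
    esac
}
```

For example, switch_settings recovery prints the positions to set before running the recovery procedure.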


Recovery mode

First, put the board in recovery mode by setting the switches as shown in the previous table. Connect the USB-C port on the HiKey 960 board to a USB port on your host machine, and power on the board. On the host machine, look in /dev/ for ttyUSB0 or ttyUSB1. The device node that appears after powering on the board is the one to use in the recovery procedure.
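
One way to identify the new device node is to list /dev/ttyUSB* before and after powering on the board and compare the two lists. The following is a sketch of that comparison. The function name new_nodes is hypothetical, and the commented usage assumes that the board enumerates as a ttyUSB device, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print entries that appear in the second list but
# not in the first. Each list is a newline-separated string of paths.
new_nodes() {
    old_list="$1"
    new_list="$2"
    printf '%s\n' "$new_list" | while IFS= read -r node; do
        [ -n "$node" ] || continue
        # -F: fixed string, -x: match the whole line, -q: no output
        printf '%s\n' "$old_list" | grep -Fxq -- "$node" || printf '%s\n' "$node"
    done
}

# Usage on the host machine (commented out because it needs the board):
#   before=$(ls /dev/ttyUSB* 2>/dev/null)
#   ... power on the board in recovery mode ...
#   after=$(ls /dev/ttyUSB* 2>/dev/null)
#   new_nodes "$before" "$after"
```

If other USB serial adapters are attached to the host machine, this comparison avoids guessing between ttyUSB0 and ttyUSB1.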

Run the recovery procedure using the recovery.sh script, which runs the hikey_idt Linux binary. First, copy all the files in flash-images/ to the tools-images-hikey960/ directory. Then run the recovery.sh script with the device node of the HiKey 960 board as the argument, as shown here:

$ cp flash-images/* tools-images-hikey960/
$ cd tools-images-hikey960
$ ./recovery.sh /dev/ttyUSB0
Config name: config
Port name: /dev/ttyUSB0
0: Image: ./hisi-sec_usb_xloader.img Downalod Address: 0x20000
1: Image: ./hisi-sec_uce_boot.img Downalod Address: 0x6a908000
2: Image: ./hisi-sec_fastboot.img Downalod Address: 0x1ac00000
Serial port open successfully!
Start downloading ./hisi-sec_usb_xloader.img@0x20000...
file total size 99584
downlaod address 0x20000
Finish downloading
Start downloading ./hisi-sec_uce_boot.img@0x6a908000...
file total size 23680
downlaod address 0x6a908000
Finish downloading
Start downloading ./hisi-sec_fastboot.img@0x1ac00000...
file total size 3430400
downlaod address 0x1ac00000
Finish downloading

Fastboot mode

Now that the HiKey 960 board is flashed with the basic boot code, the next step is to flash the remaining images including the bootloader, the Linux kernel and the file system. To do this, power off the board, move the switches to Fastboot mode, as shown in the preceding table, and power the board back on.

On the host machine, confirm that the board is visible to fastboot, as shown here:

$ sudo fastboot devices
447786182000000        fastboot
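
If you script the flashing steps, the same check can be automated by parsing the output of fastboot devices. The following is a minimal sketch that assumes the two-column output format shown above (serial number, then the word fastboot). The function name has_fastboot_device is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin, in the format
# printed by "fastboot devices", lists at least one device whose second
# column is "fastboot".
has_fastboot_device() {
    awk '$2 == "fastboot" { found = 1 } END { exit !found }'
}

# Usage on the host machine (commented out because it needs the board):
#   sudo fastboot devices | has_fastboot_device || echo "no board in fastboot mode" >&2
```

An empty listing usually means that the switches are not in fastboot mode or that the USB cable is not connected.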

Before the HiKey board can be flashed, the correct images and setup scripts must be in the same directory. In the previous step, the script fastboot.sh and two Linux images, boot.img and dts.img, were copied from flash-images/ to the tools-images-hikey960/ directory. The system.simg file, generated in the first step, also needs to be copied into the tools-images-hikey960/ directory. Copy the file system image using:

$ cp ../build-ubuntu/ubuntu/system.simg .

In summary, the Linux-related images in the tools-images-hikey960 directory are:

  • boot.img
  • dts.img
  • system.simg
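
Before flashing, it can help to confirm that these three images are really present. The following sketch checks only the three Linux-related images listed above. The function name missing_images is hypothetical, and the firmware images that fastboot.sh also flashes are not checked here.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each Linux-related image that
# is missing from the given directory (default: current directory).
# Returns non-zero if any image is missing.
missing_images() {
    dir="${1:-.}"
    rc=0
    for img in boot.img dts.img system.simg; do
        if [ ! -f "$dir/$img" ]; then
            echo "$img"
            rc=1
        fi
    done
    return "$rc"
}

# Usage from inside tools-images-hikey960/:
#   missing_images . && echo "all Linux images present"
```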

In the tools-images-hikey960/ directory, run the fastboot.sh script:

$ ./fastboot.sh

The contents of fastboot.sh are shown here. The script should flash all of the images.

#!/bin/bash -e


# partition table
sudo fastboot flash ptable ${IMG_FOLDER}/hisi-ptable.img

# bootloader
sudo fastboot flash xloader ${IMG_FOLDER}/hisi-sec_xloader.img
sudo fastboot flash fastboot ${IMG_FOLDER}/hisi-fastboot.img

# extra images
sudo fastboot flash nvme   ${IMG_FOLDER}/hisi-nvme.img
sudo fastboot flash fw_lpm3   ${IMG_FOLDER}/hisi-lpm3.img
sudo fastboot flash trustfirmware   ${IMG_FOLDER}/hisi-bl31.bin

# linux kernel and file system
sudo fastboot flash boot boot.img
sudo fastboot flash dts dts.img
sudo fastboot flash system system.simg

Boot Linux

Now that the flashing is done, the HiKey 960 board is ready to boot Linux. To do this, turn the board off, put the switches in normal mode, disconnect the USB type-C cable, connect an HDMI monitor via the HDMI cable, and plug in a USB keyboard.

Note: It is possible to view the HiKey through a console on the host computer instead of an HDMI monitor, but this requires the 96boards UART adapter and is not discussed in this guide.

With the HiKey 960 board connected to a monitor and USB keyboard, power on the board. Linux should boot and show a login prompt on the HDMI monitor. Sign in using the following credentials. The password includes the letters m and l, not the number 1:

username: arm01
password: armml2018

After the credentials are accepted, the HiKey 960 board is running Linux. To connect and control the board from another machine, connect the HiKey 960 to a wireless network and use a secure shell client, such as ssh, to log in to the board.

After login, run the utility nmtui and connect to a local wireless network using the menu item Activate a Connection. Select the router and enter the key to join the network.

Run the ifconfig command to get the IP address of the board.
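
As an alternative to reading the ifconfig output by eye, the address can be extracted from the one-line output of the ip command. This is a sketch that assumes the ip tool from iproute2 is installed on the board, which this guide does not state. The function name global_ipv4 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ip -4 -o addr show" style lines on stdin
# and print each globally scoped IPv4 address without its prefix length.
global_ipv4() {
    awk '$3 == "inet" && /scope global/ { sub(/\/.*/, "", $4); print $4 }'
}

# Usage on the board:
#   ip -4 -o addr show | global_ipv4
```

The loopback address is skipped because its scope is host, not global.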

Now try to ssh to the HiKey 960 IP address from the host machine that you used to prepare the board. Use the same login credentials that you used previously. Run the ssh command with the username, substituting the IP address of the board, as shown here:

$ ssh arm01@<ip-address>

Add more disk space

The partition with the root filesystem is only 4 GB. More disk space is required, so another filesystem must be created on the user partition and mounted. We also recommend adding 4 GB of swap space. A script that creates the extra filesystem and the swap space is in the HiKey 960 Linux home directory at $HOME/filesys.sh. Read this script so that you understand it, and then run it with sudo to add the filesystem and swap space.

Answer yes to create a new filesystem and the swap space. The result is a new filesystem mounted on $HOME/armnn-devenv. This filesystem has more space and a swap file at $HOME/armnn-devenv/swapfile.

For the rest of the project, do the work in $HOME/armnn-devenv to take advantage of this new space.
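
After the script finishes, you may want to confirm that the swap space is active. The following sketch reads the SwapTotal line in the format used by /proc/meminfo and prints the size in MiB. The function name swap_total_mib is hypothetical, and the df command in the usage comment only assumes the mount point named above.

```shell
#!/bin/sh
# Hypothetical helper: print total swap in MiB from /proc/meminfo-style
# text on stdin (the SwapTotal value is reported in kB).
swap_total_mib() {
    awk '$1 == "SwapTotal:" { print int($2 / 1024) }'
}

# Usage on the board:
#   swap_total_mib < /proc/meminfo
#   df -h "$HOME/armnn-devenv"
```

A result of 0 means that no swap is active.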

In the following sections of this guide, we will try a couple of applications on the HiKey 960 board: MNIST Draw and MNIST Demo.

The easiest way to build Arm NN and the software examples is on the HiKey 960 board. Download the repository and run the build script on the HiKey 960 board:

$ cd $HOME
$ git clone https://github.com/ARM-software/Tool-Solutions.git 
$ cd Tool-Solutions/ml-tool-examples/build-armnn/
$ ./build-armnn.sh

The build script will download and compile all of the software that is required for Arm NN. This download and compilation will take about 90 minutes to complete.


MNIST Draw is a fun, single page website that enables users to hand-draw and classify digits between 0 and 9 using machine learning. A machine learning model trained against the MNIST dataset is used for classification.

The project is a modified version of mnist-draw, which uses the Arm NN SDK to perform inferences on an Arm Cortex-A CPU. The application runs on the HiKey 960 board and can be accessed over a network using a browser.



Navigate to the MNIST Draw code example, build the application, and start the web server, as shown here:

# Go into the repository
$ cd $HOME/Tool-Solutions/ml-tool-examples/mnist-draw

# Build the armnn-draw application
$ make -C armnn-draw

# Set LD_LIBRARY_PATH for Arm NN (if not already done)
# This is also helpful to put in $HOME/.bashrc for future use
$ export LD_LIBRARY_PATH=$HOME/armnn-devenv/armnn/build

# Start Python server
$ python3 -m http.server --cgi 8000

Then open a browser on any machine which can access the HiKey 960 board, and go to http://ip-address:8000

An example of the website's interface is shown in the following image.

Using the mouse, draw a digit between 0 and 9 on the empty canvas, and then click the Predict button to process your drawing. Any errors during processing are indicated with a warning icon and printed to the console. Common errors include not compiling the application in armnn-draw/ and not using python3.

Results are displayed as a bar graph, in which each classification label receives a score between 0.0 and 1.0 from the machine learning model. Clear the canvas with the Clear button to draw and process other digits.
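
The comment in the commands above suggests putting the LD_LIBRARY_PATH setting in $HOME/.bashrc. One way to do that without adding the line twice is sketched below. The function name append_once is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on the board (single quotes keep $HOME unexpanded in the file):
#   append_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH=$HOME/armnn-devenv/armnn/build'
```

The setting takes effect in new shells, or in the current shell after sourcing $HOME/.bashrc.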

MNIST Draw - Machine Learning model

For more information about how the web application translates the drawn digit into an image file that Arm NN processes, refer to the Python script at cgi-bin/mnist.py.

A convolutional neural network (CNN) is defined within the model/ directory, and is used by the program in armnn-draw/ which incorporates the Arm NN SDK. This model is configured for MNIST data inputs. The default model is optimized_mnist_tf.pb.

Using this model, let’s try something that includes more than one image. In this example application, we will use a larger set of MNIST images, and show inference on both the CPUs and the GPU.


MNIST demo application

The MNIST demo application uses a TensorFlow neural network that is trained for MNIST, and uses Arm NN for inference on Arm Cortex-A or Mali. There are two example applications, one for each network. Look at the C++ files for each version: mnist_tf_simple.cpp uses the simple single-layer network, and mnist_tf_convol.cpp uses the better neural network.

Both applications read a TensorFlow model. Models are stored in the model/ directory in protobuf binary format. The MNIST data is stored in the data/ directory in a simple format that is designed for storing vectors. This directory contains the MNIST test data and labels.

Build the applications using make, as shown here:

$ cd $HOME/Tool-Solutions/ml-tool-examples/mnist-demo
$ make

The make command builds both applications.

The purpose of these examples is to demonstrate how to use Arm NN to load and execute TensorFlow models in a C++ application.


Streamline is a performance analyzer for software running on Arm processors. It can be downloaded and installed from Arm Developer. There are thirty-day trials available for those needing a license.

Before profiling the application with Streamline, let’s look at the code in the mnist-demo application. Open either mnist_tf_convol.cpp or mnist_tf_simple.cpp.

The code follows these steps:

  1. Load and parse the MNIST data

    The helper function in mnist_loader.hpp scans the file in the dataset and returns a MnistImage struct with two fields: the label and an array of pixel values.

  2. Import the TensorFlow graph

    You can import a TensorFlow graph from both text and binary Protobuf formats.

    Importing a graph consists of binding the input and output points of the model graph.

    You can find these points by visualizing the model in TensorBoard.

    Note: After this step, the code is common regardless of the framework that you started with.

  3. Optimize for a specific compute device

    Arm NN supports optimization for both CPU and GPU devices.

    You specify the device when creating the execution runtime context in the code.

  4. Run the graph

    Running the inference on the chosen compute device is performed through the EnqueueWorkload() function of the context object.

    The result of the inference can be read directly from the output array and compared to the MnistImage label that was read from the data file.


Run the MNIST inference

To run the application, use the command line to specify the hardware to use (CPU or GPU) and the number of images to process:

# Optimisation modes: 0 for CpuRef, 1 for CpuAcc, 2 for GpuAcc
# Input size: 1 to 2000 (number of images to predict)
$ ./mnist_tf_convol 1 10

Try different program configurations and compare the execution times:

$ time ./mnist_tf_convol 0 10 # 10 images on unoptimised CPU device
$ time ./mnist_tf_convol 1 100 # 100 images on optimised CPU device
$ time ./mnist_tf_convol 2 1000 # 1000 images on GPU device
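
The commands above use a different image count for each mode. For a like-for-like comparison, you could run all three modes with the same count. The following sketch does that and reports the elapsed wall-clock time in whole seconds. The function name time_all_modes is hypothetical, and the sketch assumes that date supports the +%s format, as it does on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per optimisation mode
# (0 CpuRef, 1 CpuAcc, 2 GpuAcc) with the same image count and print
# the elapsed wall-clock time of each run in whole seconds.
time_all_modes() {
    app="$1"
    count="$2"
    for mode in 0 1 2; do
        start=$(date +%s)
        # Discard the per-image predictions so only the timings are shown.
        "$app" "$mode" "$count" > /dev/null
        end=$(date +%s)
        echo "mode $mode: $((end - start))s for $count images"
    done
}

# Usage in the mnist-demo/ directory on the board:
#   time_all_modes ./mnist_tf_convol 100
```

Whole-second resolution is coarse, so use an image count large enough for the differences between modes to show.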


Use Streamline to connect and profile the application

Here are the steps to profile with Streamline:

  1. Start Streamline gatord on the HiKey 960 board

    After building all the software, the gator daemon is available in the home directory.

    Run it using sudo to get access to all hardware counters. Use the -a option to enable running a command during a capture (useful in a later step to automate the capture).

    $ sudo $HOME/gatord -a

  2. Ensure that the application to profile is running correctly

    The application binary is compiled as covered in previous steps. Run it to confirm that it is working.

    You should see output that looks similar to this:

    $ ./mnist_tf_convol 1 10
    Optimisation mode: CpuAcc
    #1 | Predicted: 7 Actual: 7
    #10 | Predicted: 9 Actual: 9
    Prediction accuracy: 100%

  3. Connect Streamline to the target board

    Run Streamline and use the eyeball icon on the Target tab to connect to the HiKey 960 board.

    $ streamline &

  4. Start a capture in Streamline, run the application, and stop the capture in Streamline

    Use the red button to start a capture and run the application to be profiled.

    $ ./mnist_tf_convol 1 10000

    Use the Stop sign button to stop the capture.

  5. Observe the results with Streamline

    Review the CPU workload, the correlation between the phases of execution and the functions called, and the correlation between the functions and the kernels.

    Make sure to click on the Heat Map along the bottom and change it to Compute Library to see the ML information.


Automate the launch and capture

To automate the launch and capture, use the Streamline Capture & Analysis Options dialog to specify a working directory and a command to run.

There is a run.sh script in the mnist-demo/ directory, which can be the command to launch.

Also, click the Stop Capture checkbox to stop the capture when the run is over.

Study the trade-offs between the CPU and GPU acceleration.

The following screenshot shows how to set up an automated capture:


Here is an example capture obtained from Streamline showing the layers of the neural network and the functions in the Arm Compute Library which were executed:

Next steps

This guide has described how to set up Linux on the HiKey 960 board, build the Arm NN software stack, run ML examples, and use Streamline for performance analysis. You can use what you have learned in this guide to begin writing your own applications using Arm NN.

To explore related areas, look at our how-to guides on other ML frameworks such as Caffe, TensorFlow Lite, and ONNX. To ask questions and get help, visit the Machine Learning forum on Arm Community.