Before profiling the application with Streamline, let’s look at the code in the mnist-demo application. Open either mnist_tf_convol.cpp or mnist_tf_simple.cpp.
Here are the steps to follow:
- Load and parse the MNIST data
- Import the TensorFlow graph
- Optimize for a specific compute device
- Run the graph
The helper function in mnist_loader.hpp reads the dataset file and returns an MnistImage struct with two fields: the label and an array of pixel values.
You can import a TensorFlow graph from both text and binary Protobuf formats.
Importing a graph involves binding its input and output points. You can find these points by visualizing the model in TensorBoard.
Note: After this step, the code is common regardless of the framework that you started with.
Arm NN supports optimization for both CPU and GPU devices.
You specify the device when creating the execution runtime context in the code.
Run the inference on the chosen compute device by calling the EnqueueWorkload() function of the runtime context.
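Putting the import, optimize, load, and run steps together, a sketch of this flow with the Arm NN TensorFlow parser could look like the following. The model path, the node names ("Placeholder", "Softmax"), the input shape, and the MakeInputTensors()/MakeOutputTensors() helpers are assumptions about the demo sources and the exported graph, so check them against your own model:

```cpp
#include <array>

#include "armnn/ArmNN.hpp"
#include "armnnTfParser/ITfParser.hpp"

// Sketch only: MnistImage, MakeInputTensors(), and MakeOutputTensors()
// are assumed to be defined elsewhere in the demo sources.
void RunInference(const MnistImage& input)
{
    // Import the graph from a text Protobuf file; for a binary .pb file
    // use CreateNetworkFromBinaryFile() instead.
    armnnTfParser::ITfParserPtr parser = armnnTfParser::ITfParser::Create();
    armnn::INetworkPtr network = parser->CreateNetworkFromTextFile(
        "model/simple_mnist_tf.prototxt",      // assumed model path
        { {"Placeholder", {1, 784, 1, 1}} },   // assumed input node and shape
        { "Softmax" });                        // assumed output node

    // Bind the input and output points of the graph.
    auto inputBinding  = parser->GetNetworkInputBindingInfo("Placeholder");
    auto outputBinding = parser->GetNetworkOutputBindingInfo("Softmax");

    // Create the runtime context and optimize for a compute device:
    // CpuAcc here; GpuAcc selects the GPU backend instead.
    armnn::IRuntime::CreationOptions options;
    armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);
    armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(
        *network, {armnn::Compute::CpuAcc}, runtime->GetDeviceSpec());

    // Load the optimized network onto the device and run the inference.
    armnn::NetworkId networkId;
    runtime->LoadNetwork(networkId, std::move(optNet));

    std::array<float, 10> output;  // one score per digit class
    runtime->EnqueueWorkload(networkId,
                             MakeInputTensors(inputBinding, &input.image[0]),
                             MakeOutputTensors(outputBinding, &output[0]));
}
```

Switching the inference to the GPU is a one-line change: pass armnn::Compute::GpuAcc to Optimize() instead of CpuAcc.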
The result of the inference can be read directly from the output array and compared to the MnistImage label that we read from the data file.