Streamline is a performance analyzer for software running on Arm processors. You can download and install it from Arm Developer. If you need a license, thirty-day trials are available.

Before profiling the application with Streamline, let’s look at the code in the mnist-demo application. Open either mnist_tf_convol.cpp or mnist_tf_simple.cpp.

The code performs the following steps:

  1. Load and parse the MNIST data

    The helper function in mnist_loader.hpp scans the file in the dataset and returns a MnistImage struct with two fields: the label and an array of pixel values.

  2. Import the TensorFlow graph

    You can import a TensorFlow graph from either the text or the binary Protobuf format.

    Importing a graph consists of binding the input and output points of the model graph.

    You can find these points by visualizing the model in TensorBoard.

    Note: After this step, the code is common regardless of the framework that you started with.

  3. Optimize for a specific compute device

    Arm NN supports optimization for both CPU and GPU devices.

    You specify the device when creating the execution runtime context in the code.

  4. Run the graph

    The EnqueueWorkload() function of the context object runs the inference on the chosen compute device.

    You can read the result of the inference directly from the output array and compare it to the MnistImage label that was read from the data file.
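
The load-and-parse step and the final comparison against the label do not depend on Arm NN, so they can be sketched in plain C++. The block below is a minimal illustration, not the code from mnist_loader.hpp: the field names and types of MnistImage, the helper names ParseMnistImage, ArgMax, and IsCorrectPrediction, and the scaling of pixels to the range 0 to 1 are all assumptions made for this example. It relies only on the public IDX layout of the MNIST files (big-endian headers, magic number 2051 for images and 2049 for labels) and reads from in-memory buffers rather than from disk.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

constexpr std::size_t kImageSize = 28 * 28;

// Hypothetical stand-in for the struct described above: a label and an
// array of pixel values. The real definition may use different names/types.
struct MnistImage {
    unsigned int label = 0;
    std::array<float, kImageSize> pixels{};
};

// IDX headers store 32-bit integers in big-endian byte order.
std::uint32_t ReadBigEndian32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Extracts image number `index` from in-memory copies of an IDX image file
// and an IDX label file. Returns false if the data is malformed or too short.
bool ParseMnistImage(const std::vector<std::uint8_t>& imageFile,
                     const std::vector<std::uint8_t>& labelFile,
                     std::size_t index,
                     MnistImage& out) {
    constexpr std::size_t kImageHeader = 16;  // magic, count, rows, cols
    constexpr std::size_t kLabelHeader = 8;   // magic, count
    if (imageFile.size() < kImageHeader || labelFile.size() < kLabelHeader) {
        return false;
    }
    if (ReadBigEndian32(imageFile.data()) != 2051 ||
        ReadBigEndian32(labelFile.data()) != 2049) {
        return false;
    }
    const std::uint32_t count = ReadBigEndian32(imageFile.data() + 4);
    const std::uint32_t rows = ReadBigEndian32(imageFile.data() + 8);
    const std::uint32_t cols = ReadBigEndian32(imageFile.data() + 12);
    if (rows != 28 || cols != 28 || index >= count) {
        return false;
    }
    const std::size_t pixelOffset = kImageHeader + index * kImageSize;
    const std::size_t labelOffset = kLabelHeader + index;
    if (pixelOffset + kImageSize > imageFile.size() ||
        labelOffset >= labelFile.size()) {
        return false;
    }
    out.label = labelFile[labelOffset];
    for (std::size_t i = 0; i < kImageSize; ++i) {
        // Assumption: scale each byte to [0, 1]; the model may expect otherwise.
        out.pixels[i] = imageFile[pixelOffset + i] / 255.0f;
    }
    return true;
}

// Index of the largest score in the output array, i.e. the predicted digit.
std::size_t ArgMax(const std::vector<float>& scores) {
    assert(!scores.empty());
    return static_cast<std::size_t>(std::distance(
        scores.begin(), std::max_element(scores.begin(), scores.end())));
}

// Compares the predicted digit with the label read from the data file.
bool IsCorrectPrediction(const MnistImage& image,
                         const std::vector<float>& output) {
    return ArgMax(output) == image.label;
}
```

In the demo application, the output array passed to a check like IsCorrectPrediction would be the buffer that the inference fills; here it is just a vector of ten scores, one per digit.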
