Run ML Inference with PyArmNN

This section of the guide shows you how to set up a Machine Learning (ML) model on your Raspberry Pi. You will perform the following steps:

  • Import the PyArmNN module
  • Load an input image
  • Create a parser and load the network
  • Choose backends, create the runtime, and optimize the model
  • Perform inference
  • Interpret and report the output

The example code that performs these steps is predict_pyarmnn.py. To add predict_pyarmnn.py to your Raspberry Pi, you can clone it from the GitHub repository, or you can copy the following code into a Python file on your Raspberry Pi.

The example script

The complete predict_pyarmnn.py code is as follows:

import pyarmnn as ann
import numpy as np
import cv2
import argparse

print('Working with Arm NN version ' + ann.ARMNN_VERSION)

# Load an image.
parser = argparse.ArgumentParser(
      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument(
      '--image', help='File path of image file', required=True)
args = parser.parse_args()

image = cv2.imread(args.image)
image = cv2.resize(image, (128, 128))
image = np.array(image, dtype=np.float32) / 255.0
print(image.shape)

# ONNX, Caffe and TF parsers also exist.
parser = ann.ITfLiteParser()  
network = parser.CreateNetworkFromBinaryFile('./fire_detection.tflite')

graph_id = 0
input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])
input_tensor_id = input_binding_info[0]
input_tensor_info = input_binding_info[1]
print('tensor id: ' + str(input_tensor_id))
print('tensor info: ' + str(input_tensor_info))
# Create a runtime object that will perform inference.
options = ann.CreationOptions()
runtime = ann.IRuntime(options)

# Backend choices earlier in the list have higher preference.
preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())

# Load the optimized network into the runtime.
net_id, _ = runtime.LoadNetwork(opt_network)
print("Loaded network, id={net_id}")
# Create an inputTensor for inference.
input_tensors = ann.make_input_tensors([input_binding_info], [image])

# Get output binding information for an output layer by using the layer name.
output_names = parser.GetSubgraphOutputTensorNames(graph_id)
output_binding_info = parser.GetNetworkOutputBindingInfo(graph_id, output_names[0])
output_tensors = ann.make_output_tensors([output_binding_info])

runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
results = ann.workload_tensors_to_ndarray(output_tensors)
print(results[0])
j = np.argmax(results[0])
if j == 0:
    print("Non-Fire")
else:
    print("Fire")

Running the example script

Run the Python script from the command line, as shown in the following code:

$ python3 predict_pyarmnn.py --image ./images/opencountry_land663.jpg 

In the preceding command, the image opencountry_land663.jpg was used as the input.

You should get the following output:

Working with Arm NN version 21.0.0
(128, 128, 3)

tensor id: 15616
tensor info: TensorInfo{DataType: 1, IsQuantized: 0, QuantizationScale: 0.000000, QuantizationOffset: 0, NumDimensions: 4, NumElements: 49152}
[0.9967675, 0.00323252]
Non-Fire

In our example, the image has a 0.9967675 probability of being class 0, and a 0.00323252 probability of being class 1. Class 0 means non-fire, and class 1 means fire. The example did not detect fire in the image.

Explaining the example script

The following steps break down what is happening and why in predict_pyarmnn.py:

  1. Import the PyArmNN module, together with the NumPy, OpenCV, and argparse modules that the script depends on, and print the Arm NN version. This is shown in the following code:

    import pyarmnn as ann
    import numpy as np
    import cv2
    import argparse
    
    print('Working with Arm NN version ' + ann.ARMNN_VERSION)
  2. Load and pre-process an input image. To do this, load the image specified on the command line, then resize it to the model's input dimensions. In predict_pyarmnn.py, the model accepts 128x128 input images. Later in the script, the image is wrapped in a const tensor and bound to the network input. This is shown in the following code:

    parser = argparse.ArgumentParser(
          formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
          '--image', help='File path of image file', required=True)
    args = parser.parse_args()
    
    # Load an image.
    image = cv2.imread(args.image)
    image = cv2.resize(image, (128, 128))
    image = np.array(image, dtype=np.float32) / 255.0
    print(image.shape)

    The model is a floating-point model. Therefore, we must scale the input image values to a range of 0 to 1, which is why the code divides by 255.
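
    Note also that cv2.imread() loads images in BGR channel order. Whether fire_detection.tflite expects RGB input is an assumption, but if your model was trained on RGB images, convert the image before scaling, as in this sketch:

    # Convert from OpenCV's BGR channel order to RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)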

  3. Create a parser object that will be used to load the network file. Arm NN has parsers for various model file types, including TFLite, ONNX, and Caffe. Parsers create the underlying Arm NN graph, so that you do not need to construct your model graph by hand.

    The following code creates a TfLite parser to load our TensorFlow Lite model from the specified path:

    parser = ann.ITfLiteParser()
    network = parser.CreateNetworkFromBinaryFile('./fire_detection.tflite')
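
    If your model were in ONNX format instead, the corresponding parser could be used in the same way. This is a sketch, not part of the example; it assumes that your PyArmNN build includes the ONNX parser and that a model.onnx file exists:

    # Hypothetical ONNX equivalent of the TfLite parser above.
    parser = ann.IOnnxParser()
    network = parser.CreateNetworkFromBinaryFile('./model.onnx')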
  4. Use the parser to extract the input information for the network.

    You can extract all the input names by calling GetSubgraphInputTensorNames() and use those input names to get the input binding information. For this example, the model only has one input layer. This means that you use input_names[0] to obtain the input tensor name, then use this string to retrieve the input binding information.

    The input binding information is a tuple consisting of the following:

    • An integer identifier for the bindable layer, for example, an input or an output layer
    • The tensor information, for example, data type, quantization information, number of dimensions, and total number of elements.

    The following code extracts the input binding information:

    graph_id = 0
    input_names = parser.GetSubgraphInputTensorNames(graph_id)
    input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])
    input_tensor_id = input_binding_info[0]
    input_tensor_info = input_binding_info[1]
    print('tensor id: ' + str(input_tensor_id))
    print('tensor info: ' + str(input_tensor_info))
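
    Before going further, it can be useful to check that the pre-processed image matches what the network expects. A minimal sketch, assuming that the TensorInfo object exposes the GetNumElements() accessor:

    # Compare element counts between the image and the model input.
    assert image.size == input_tensor_info.GetNumElements(), \
        'Image does not match the model input size'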
  5. Create the runtime object, and specify the backend list so that you can optimize the network. Backends earlier in the list have higher preference. This is shown in the following code:

    options = ann.CreationOptions()
    runtime = ann.IRuntime(options)
    
    # Backend choices earlier in the list have higher preference.
    preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
    opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())

    If your device has an Arm CPU and a Mali GPU, you could define the backend list as follows:

    preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('GpuAcc'), ann.BackendId('CpuRef')]
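
    Optimize() also returns a list of messages. Printing them is a simple way to spot layers that could not be placed on your preferred backend, as in this sketch:

    # Report any warnings raised while optimizing the network.
    for msg in messages:
        print('Optimizer message: ' + str(msg))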
  6. Load the optimized network into the run-time context. LoadNetwork() creates the backend-specific workloads for the layers. After loading, the pre-processed image is wrapped in a const tensor and bound to the network input with make_input_tensors(). This is shown in the following example:

    # Load the optimized network into the runtime.
    net_id, _ = runtime.LoadNetwork(opt_network)
    print("Loaded network, id={net_id}")
    input_tensors = ann.make_input_tensors([input_binding_info], [image])
  7. Get the output binding information and make the output tensor. This is done in a similar way to the input binding information. We can use the parser to retrieve the output tensor names and get the binding information.

    The following code assumes that an image classification model has only one output. Therefore, the code only uses the first name from the list returned. You could extend this code to process multiple outputs by iterating over the output_names array, as the sketch after this code block shows:

    # Get output binding information for an output layer by using the layer name.
    output_names = parser.GetSubgraphOutputTensorNames(graph_id)
    output_binding_info = parser.GetNetworkOutputBindingInfo(graph_id, output_names[0])
    output_tensors = ann.make_output_tensors([output_binding_info])
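
    A minimal sketch of the multiple-output case, assuming that each name in output_names identifies one output layer:

    # Collect binding information for every output layer.
    output_binding_infos = [parser.GetNetworkOutputBindingInfo(graph_id, name)
                            for name in output_names]
    output_tensors = ann.make_output_tensors(output_binding_infos)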
  8. Perform inference. The EnqueueWorkload() function of the run-time context executes the inference for the loaded network. This is shown in the following example:

    runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
    results = ann.workload_tensors_to_ndarray(output_tensors)
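
  9. Interpret and report the output. The output tensor holds one probability per class. np.argmax() returns the index of the largest probability, which the script maps to the non-fire (0) or fire (1) label:

    # Report the class with the highest probability.
    j = np.argmax(results[0])
    if j == 0:
        print("Non-Fire")
    else:
        print("Fire")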