Performance comparison: PyArmNN and TensorFlow Lite

In this section of the guide, we compare the performance of PyArmNN and the TensorFlow Lite Python API on a Raspberry Pi.

TensorFlow Lite uses an interpreter to perform inference. The interpreter uses static graph ordering and a custom (less-dynamic) memory allocator. For more information on how to load and run a model with the Python API, see the TensorFlow Lite documentation.

For the performance comparison, inference was carried out with our fire detection model. In this example, we run inference only once. You can also run the model multiple times and take the average inference time.

The following script compares the performance of the two runtimes. You can clone it from the GitHub repository, or you can copy the code into a Python file on your Raspberry Pi:

import argparse

import cv2
import numpy as np
import tensorflow as tf
from timeit import default_timer as timer

# Parse the command-line arguments.
parser = argparse.ArgumentParser()
parser.add_argument('--image', help='File path of image file', required=True)
args = parser.parse_args()

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="./fire_detection.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
_, height, width, _ = input_details[0]['shape']
floating_model = input_details[0]['dtype'] == np.float32

# Read the image and resize it to the model input size.
image = cv2.imread(args.image)
image = cv2.resize(image, (width, height))
image = np.expand_dims(image, axis=0)
if floating_model:
    image = np.array(image, dtype=np.float32) / 255.0

# Test the model on the image and time the inference.
interpreter.set_tensor(input_details[0]['index'], image)
start = timer()
interpreter.invoke()
end = timer()
print('Elapsed time is ', (end - start) * 1000, 'ms')

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
j = np.argmax(output_data)
# Label order assumed: index 0 = non-fire, index 1 = fire.
print('Non-Fire' if j == 0 else 'Fire')
You can run the script by using the following command, replacing your_script.py with the name of your Python file:

$ python3 your_script.py --image ./images/746.jpg

In the preceding command, the following picture was used:

When the script has run, you see output similar to the following:

2020-01-01 11:32:33.609188: E 
Elapsed time is  38.02500700112432 ms
[[9.9076563e-04 9.9900925e-01]]
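As noted earlier, a single run can be noisy, so you can average the elapsed time over several runs. The following sketch uses a small helper with a stand-in workload; in the benchmarking script you would pass interpreter.invoke instead (the helper name and warm-up count are our own choices, not part of the original script):

```python
from timeit import default_timer as timer

def average_inference_ms(run_once, runs=10, warmup=2):
    # Warm-up runs are excluded from the average: the first few
    # invocations often pay one-off costs such as lazy allocation.
    for _ in range(warmup):
        run_once()
    total = 0.0
    for _ in range(runs):
        start = timer()
        run_once()
        end = timer()
        total += (end - start) * 1000.0
    return total / runs

# Stand-in workload for illustration; in the real script,
# pass interpreter.invoke here.
mean_ms = average_inference_ms(lambda: sum(range(10000)))
print('Average elapsed time is', mean_ms, 'ms')
```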

We can extend the PyArmNN script from the previous section with the same timing code for inference benchmarking. To extend the script, add the following code before the line runtime.EnqueueWorkload(0, input_tensors, output_tensors):

start = timer()

After the line runtime.EnqueueWorkload(0, input_tensors, output_tensors) you must add the following code:

end = timer()
print('Elapsed time is ', (end - start) * 1000, 'ms')
As a result, the script contains the following code:

start = timer()
runtime.EnqueueWorkload(0, input_tensors, output_tensors)
end = timer()
print('Elapsed time is ', (end - start) * 1000, 'ms')

Run the PyArmNN script again, replacing your_script.py with the name of your PyArmNN Python file:

$ python3 your_script.py --image ./images/746.jpg

In the preceding command, the same picture was used:

You will get the following output:

Working with Arm NN version 21.0.0 
(128, 128, 3)

tensor id: 15616, 
tensor info: TensorInfo{DataType: 1, IsQuantized: 0, QuantizationScale: 0.000000, QuantizationOffset: 0, NumDimensions: 4, NumElements: 49152}

Elapsed time is  21.224445023108274 ms
[0.0009907632, 0.99900925]
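The PyArmNN result is the same pair of class probabilities that the TensorFlow Lite script produces, so it can be interpreted the same way. A minimal sketch, using the probabilities from the output above and assuming the label order index 0 = non-fire, index 1 = fire:

```python
import numpy as np

# Class probabilities copied from the PyArmNN output above.
# Label order assumed: index 0 = non-fire, index 1 = fire.
output_data = np.array([0.0009907632, 0.99900925])
j = np.argmax(output_data)
print('Fire' if j == 1 else 'Non-Fire')
```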

From the preceding output, you can see that PyArmNN with the Arm NN Neon-optimized computational backend performs inference noticeably faster: roughly 21 ms, compared with roughly 38 ms for the TensorFlow Lite interpreter.
