Performance comparison: PyArmNN and TensorFlow Lite

In this section of the guide, we compare the performance of PyArmNN and the TensorFlow Lite Python API on a Raspberry Pi.

TensorFlow Lite uses an interpreter to perform inference. The interpreter uses static graph ordering and a custom (less-dynamic) memory allocator. For more information on how to load and run a model with the Python API, see the TensorFlow Lite documentation.
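As a minimal sketch of that flow, loading and running a TensorFlow Lite model from Python looks like the following. The dummy input is an illustrative assumption; the full fire detection script appears later in this section:

import numpy as np
import tensorflow as tf

# Load the model and allocate its input and output tensors.
interpreter = tf.lite.Interpreter(model_path="./fire_detection.tflite")
interpreter.allocate_tensors()

# Query tensor metadata so we can build a correctly shaped input.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run inference on a dummy input of the expected shape and dtype.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))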

For the performance comparison, we run inference with our fire detection model. In this example, we run inference only once. For a more stable measurement, you can run the model multiple times and take the average inference time.
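For example, a steadier measurement warms up the interpreter once and then averages the invocation time over several runs. The following sketch is illustrative and assumes an interpreter and input tensor already prepared, as in the predict_tflite.py script below:

from timeit import default_timer as timer

# Warm-up run: the first invocation can include one-off setup costs.
interpreter.invoke()

# Average the inference time over several runs.
runs = 10
start = timer()
for _ in range(runs):
    interpreter.invoke()
end = timer()
print('Average elapsed time is ', ((end - start) / runs) * 1000, 'ms')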

The predict_tflite.py script compares performance. To add predict_tflite.py to your Raspberry Pi, you can clone it from the GitHub repository or copy the following code into a Python file on the device.

The following code shows predict_tflite.py:

import tensorflow as tf
import argparse
import cv2
import numpy as np
from timeit import default_timer as timer

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="./fire_detection.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
_, height, width, _ = input_details[0]['shape']
floating_model = (input_details[0]['dtype'] == np.float32)

parser = argparse.ArgumentParser(
      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument(
      '--image', help='File path of image file', required=True)
args = parser.parse_args()

image = cv2.imread(args.image)
image = cv2.resize(image, (width, height))
image = np.expand_dims(image, axis=0)
if floating_model:
    # Normalize pixel values to [0, 1] for floating-point models.
    image = np.array(image, dtype=np.float32) / 255.0

# Test model on image.
interpreter.set_tensor(input_details[0]['index'], image)
start = timer()
interpreter.invoke()
end = timer()
print('Elapsed time is ', (end-start)*1000, 'ms')

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
# The class with the highest score is the prediction.
j = np.argmax(output_data)
if j == 0:
    print("Non-Fire")
else:
    print("Fire")

You can run predict_tflite.py by using the following command:

$ python3 predict_tflite.py --image ./images/746.jpg

When you have run predict_tflite.py, you will see output similar to the following:

2020-01-01 11:32:33.609188: E 
Elapsed time is  38.02500700112432 ms
[[9.9076563e-04 9.9900925e-01]]
Fire

We can extend predict_pyarmnn.py with the same inference benchmarking code. To extend predict_pyarmnn.py, add the following code before the line runtime.EnqueueWorkload(0, input_tensors, output_tensors):

start = timer()

After the line runtime.EnqueueWorkload(0, input_tensors, output_tensors), add the following code:

end = timer()
print('Elapsed time is ', (end - start) * 1000, 'ms')
Therefore, predict_pyarmnn.py will contain the following code:

start = timer()
runtime.EnqueueWorkload(0, input_tensors, output_tensors)
end = timer()
print('Elapsed time is ', (end - start) * 1000, 'ms')

Run the predict_pyarmnn.py script again:

$ python3 predict_pyarmnn.py --image ./images/746.jpg

You will get the following output:

Working with Arm NN version 21.0.0 
(128, 128, 3)

tensor id: 15616, 
tensor info: TensorInfo{DataType: 1, IsQuantized: 0, QuantizationScale: 0.000000, QuantizationOffset: 0, NumDimensions: 4, NumElements: 49152}

Elapsed time is  21.224445023108274 ms
[0.0009907632, 0.99900925]
Fire

From the preceding output, you can see that inference with the Arm NN Neon optimized compute backend took about 21 ms, compared with about 38 ms for the TensorFlow Lite interpreter: roughly a 1.8x speedup in this single-run measurement.
