Model choice

In this section of the guide, we explain the reasoning behind our model choice.

In this guide, we are working in the embedded space. Therefore, it is important to be aware of the limitations of your target device, for example, the amount of available flash memory or RAM. These limitations should be considered when deciding which neural network architecture to use: a model with a large memory footprint may simply be too big to fit in the available memory on your device.

In this guide, we use the MobileNetV2 model architecture from Google. This family of models offers good accuracy, a small memory footprint, and efficient execution on performance-constrained devices. Because MobileNetV2 is a family of models rather than a single network, you can select the variant that fits the restrictions of your device, which makes it an ideal choice for an embedded platform.

The trade-off between accuracy, speed, and size must always be considered when choosing which model to use. In our case, the requirements are:

  • Inference time under 500ms
  • A model size of under 1MB
  • To be as accurate as possible

To meet those requirements, we first focused on the model size, which is determined by the number of parameters in the model. As mentioned in What happens in an MCU face recognition model?, we remove the final classification Conv2D layer of the model after training. This is because we only need the tensors that feed into this layer, and these act as the fingerprint for our face recognition.
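To make the truncation step concrete, here is a minimal Keras sketch. This is our own illustration rather than the guide's exact pipeline (the checkpoints in the table below come from Google's pretrained MobileNetV2 releases): it builds a MobileNetV2 without the classification head, so the pooled tensor that would have fed the removed Conv2D layer becomes the model output.

```python
import numpy as np
import tensorflow as tf

# alpha is the width multiplier (the 1.4/1.0/0.75/0.5/0.35 part of the names in the
# table below) and input_shape is the chosen input resolution.
model = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3),
    alpha=0.5,
    include_top=False,   # drop the final classification head
    pooling="avg",       # output the pooled feature vector that fed the removed layer
    weights=None,        # in practice you would load your trained weights here
)

# The pooled feature vector is the face "fingerprint" that we compare between images.
dummy = np.random.rand(1, 128, 128, 3).astype(np.float32)
fingerprint = model.predict(dummy)
print(fingerprint.shape)   # (1, 1280) for alpha <= 1.0
```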

The following image shows the structure of the last few layers of MobileNetV2.

This last Conv2D layer accounts for (1001 x 1 x 1 x 1280) + 1001 = 1,282,281, or roughly 1.28M, parameters. In our quantized model, each parameter uses 1 byte of memory, so the memory size of the truncated model can be estimated by subtracting 1.28M from the total parameter count. Our size requirement is for the model to use under 1MB of memory. The memory size column in the table below shows that all of the models listed below float_v2_0.75_96, that is, the 0.5 and 0.35 variants, fit this requirement.
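The arithmetic behind the memory size column can be checked with a few lines of plain Python; the helper below is only a worked example of the subtraction described above.

```python
# Parameters in the final classification Conv2D layer: weights + biases.
classes, in_channels = 1001, 1280
final_layer_params = classes * 1 * 1 * in_channels + classes
print(final_layer_params)                       # 1,282,281 ~= 1.28M

def memory_mb(total_params_millions):
    """Approximate size in MB after removing the final layer (1 byte per uint8 parameter)."""
    return total_params_millions - final_layer_params / 1e6

print(round(memory_mb(1.95), 2))                # float_v2_0.5_* models: ~0.67 MB, under the 1MB budget
print(round(memory_mb(2.61), 2))                # float_v2_0.75_* models: ~1.33 MB, over the 1MB budget
```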

We then must decide on an input resolution. Higher resolutions can give us better accuracy, but at the expense of slower inference times. Inference times for the different input resolutions were measured on an Arm microcontroller-based device, such as a Cortex-M4 or a Cortex-M7. The highest resolution that satisfied our 500ms inference time requirement was selected, which was the 128x128 input size.

Note: Both model size and input resolution affect inference time, so it can be worth checking different combinations to find the sweet spot. A smaller model with higher resolution might give us faster inference times with similar accuracy.
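As a rough way to compare candidate combinations before deploying, the sketch below times quantized .tflite models with the TensorFlow Lite Python interpreter on a host machine. This is our own illustration and only a proxy: the numbers that matter are the ones measured on the Cortex-M device itself, and the file names are hypothetical.

```python
import time
import numpy as np
import tensorflow as tf

# Hypothetical file names for quantized candidates from the table below.
candidates = ["v2_0.5_128_quant.tflite", "v2_0.35_160_quant.tflite"]

for path in candidates:
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]

    # Random input of the expected shape and dtype; we only care about timing here.
    data = np.random.randint(0, 256, size=inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)

    start = time.perf_counter()
    for _ in range(10):
        interpreter.invoke()
    avg_ms = (time.perf_counter() - start) / 10 * 1000
    print(f"{path}: ~{avg_ms:.1f} ms per inference (host CPU, not the MCU)")
```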

The following table shows the results that were obtained when we altered the model version and image resolution:

Model version and resolution | Quantization | MACs (M) | Parameters (M) | Memory size (MB) | Top 1 accuracy (%) | Top 5 accuracy (%)
float_v2_1.4_224 | uint8 | 582 | 6.06 | 4.78 | 75.0 | 92.5
float_v2_1.3_224 | uint8 | 509 | 5.34 | 4.06 | 74.4 | 92.1
float_v2_1.0_224 | uint8 | 300 | 3.47 | 2.19 | 71.8 | 91.0
float_v2_1.0_192 | uint8 | 221 | 3.47 | 2.19 | 70.7 | 90.1
float_v2_1.0_160 | uint8 | 154 | 3.47 | 2.19 | 68.8 | 89.0
float_v2_1.0_128 | uint8 | 99 | 3.47 | 2.19 | 65.3 | 86.9
float_v2_1.0_96 | uint8 | 56 | 3.47 | 2.19 | 60.3 | 83.2
float_v2_0.75_224 | uint8 | 209 | 2.61 | 1.33 | 69.8 | 89.6
float_v2_0.75_192 | uint8 | 153 | 2.61 | 1.33 | 68.7 | 88.9
float_v2_0.75_160 | uint8 | 107 | 2.61 | 1.33 | 66.4 | 87.3
float_v2_0.75_128 | uint8 | 69 | 2.61 | 1.33 | 63.2 | 85.3
float_v2_0.75_96 | uint8 | 39 | 2.61 | 1.33 | 58.8 | 81.6
float_v2_0.5_224 | uint8 | 97 | 1.95 | 0.67 | 65.4 | 86.4
float_v2_0.5_192 | uint8 | 71 | 1.95 | 0.67 | 63.9 | 85.4
float_v2_0.5_160 | uint8 | 50 | 1.95 | 0.67 | 61.0 | 83.2
float_v2_0.5_128 | uint8 | 32 | 1.95 | 0.67 | 57.7 | 80.8
float_v2_0.5_96 | uint8 | 18 | 1.95 | 0.67 | 51.2 | 75.8
float_v2_0.35_224 | uint8 | 59 | 1.66 | 0.38 | 60.3 | 82.9
float_v2_0.35_192 | uint8 | 43 | 1.66 | 0.38 | 58.2 | 81.2
float_v2_0.35_160 | uint8 | 30 | 1.66 | 0.38 | 55.7 | 79.1
float_v2_0.35_128 | uint8 | 20 | 1.66 | 0.38 | 50.8 | 75.0
float_v2_0.35_96 | uint8 | 11 | 1.66 | 0.38 | 45.5 | 70.4