Optimize the final implementation
CMSIS-NN provides several versions of each kernel, and which versions apply depends on the layer dimensions. For instance, the convolutional layer functions have a version specialized for square inputs.
There are also _opt versions of the fully connected layers, which require the weights to be reordered beforehand. The paper "CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs" describes this reordering.
The final implementation should use the most efficient version of each layer.
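As a rough illustration, the choice of convolution version can be written as a small decision function over the layer dimensions. The sketch below is hypothetical helper code, not part of CMSIS-NN. It assumes the legacy q7 HWC convolution family (basic, fast, RGB, and the non-square basic and fast versions) and assumes that the fast versions need an input channel count that is a multiple of 4 and an output channel count that is a multiple of 2, and that the RGB version needs exactly 3 input channels. Check these constraints against the documentation of the CMSIS-NN release that you use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the legacy q7 HWC convolution versions. */
typedef enum {
    CONV_BASIC,            /* e.g. arm_convolve_HWC_q7_basic */
    CONV_FAST,             /* e.g. arm_convolve_HWC_q7_fast */
    CONV_RGB,              /* e.g. arm_convolve_HWC_q7_RGB */
    CONV_BASIC_NONSQUARE,  /* e.g. arm_convolve_HWC_q7_basic_nonsquare */
    CONV_FAST_NONSQUARE    /* e.g. arm_convolve_HWC_q7_fast_nonsquare */
} conv_variant_t;

/* Hypothetical description of one convolutional layer. */
typedef struct {
    uint16_t dim_im_in_x, dim_im_in_y;   /* input width and height */
    uint16_t ch_im_in, ch_im_out;        /* input and output channels */
    uint16_t dim_kernel_x, dim_kernel_y; /* kernel width and height */
} conv_shape_t;

/* Pick the most specialized version whose assumed constraints hold.
 * Simplification: stride and padding are taken to be equal in x and y. */
conv_variant_t pick_conv_variant(const conv_shape_t *s)
{
    assert(s != NULL && s->ch_im_in > 0 && s->ch_im_out > 0);

    bool square = (s->dim_im_in_x == s->dim_im_in_y) &&
                  (s->dim_kernel_x == s->dim_kernel_y);
    bool fast_ok = (s->ch_im_in % 4 == 0) && (s->ch_im_out % 2 == 0);

    if (square) {
        if (s->ch_im_in == 3) {
            return CONV_RGB;
        }
        return fast_ok ? CONV_FAST : CONV_BASIC;
    }
    return fast_ok ? CONV_FAST_NONSQUARE : CONV_BASIC_NONSQUARE;
}
```

With this sketch, a 32x32 input with 3 channels maps to the RGB version, while a 16x16 input with 32 input channels and 16 output channels maps to the fast version.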
Several buffers in the network often have the same size. When two such buffers are never in use at the same time, they can share the same memory, which reduces the number of buffers that the full network requires.
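One common way to reuse memory is to alternate between two buffers, so that each layer reads from one and writes to the other. The sketch below compares the memory needed by that scheme with the memory needed when every activation tensor has its own buffer. The function names and the tensor sizes in the usage note are illustrative assumptions. The sketch also assumes a purely sequential network where each layer reads only the output of the previous layer, and it ignores any scratch buffers that individual kernels need.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed when every activation tensor has its own buffer.
 * sizes[i] is the size in bytes of tensor i (sizes[0] is the network input). */
size_t naive_bytes(const size_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);

    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += sizes[i];
    }
    return total;
}

/* Bytes needed when two buffers are used alternately: tensor i is stored
 * in buffer (i % 2), so each buffer must hold the largest tensor mapped to it. */
size_t ping_pong_bytes(const size_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);

    size_t max_even = 0;
    size_t max_odd = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0) {
            if (sizes[i] > max_even) {
                max_even = sizes[i];
            }
        } else {
            if (sizes[i] > max_odd) {
                max_odd = sizes[i];
            }
        }
    }
    return max_even + max_odd;
}
```

For a hypothetical chain of tensors of 3072, 32768, 8192, 4096, 1024, and 10 bytes, separate buffers need 49162 bytes in total, while the two alternating buffers need 8192 + 32768 = 40960 bytes.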