Next steps

Now that you have implemented real-time Machine Learning (ML) on a Cortex-M device, what other ML applications can you deploy using this approach with CMSIS-NN?

To recap, your models need to be small, pre-trained, and optimized for your input data.

CMSIS-NN is optimized for Convolutional Neural Networks (CNNs) and makes use of SIMD instructions, which are available on Arm Cortex-M4, Cortex-M7, Cortex-M33, and Cortex-M35P processors. Although it is possible to run CMSIS-NN on earlier processors, such as the Cortex-M0, you won't see the same performance benefits that these SIMD-capable devices offer.

To keep learning about ML with Arm, here are some resources related to this guide:

GitHub Repositories

You can find more code examples and resources for performing ML with Arm technologies on our GitHub pages:

This guide shows that you don't need a high-spec machine or cloud-based engine to perform real-time Machine Learning (ML) tasks. Thanks to recent advancements in Arm hardware and software, you can now run machine learning quickly and efficiently on embedded devices, taking advantage of all the benefits of processing data at the edge.

Given the exponential growth in the number of IoT devices being deployed, and considering that the majority of them have an Arm Cortex-M processor at their heart, it is only natural to ensure that these devices run ML workloads efficiently. You now know the steps required to deploy a trained Caffe model on an Arm Cortex-M based microcontroller and how to optimize your neural network functions using the CMSIS-NN library.