



### Getting started with Cortex-M software development and Arm Development Studio

Pareena Verma & Ronan Synnott, Arm

August 11, 2020

#### AI Virtual Tech Talks Series

| Date       | Title                                                                             | Host         |
|------------|-----------------------------------------------------------------------------------|--------------|
| Today      | Getting started with Arm Cortex-M software development and Arm Development Studio | Arm          |
| August 25  | Efficient ML across Arm from Cortex-M to Web Assembly                             | Edge Impulse |
| Sept 8     | Running accelerated ML applications on mobile and embedded devices using Arm NN   | Arm          |
| Sept 22    | How to reduce AI bias with synthetic data for edge applications                   | Dori.AI      |

Visit: developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks



#### Agenda

- Introduction to Arm Cortex-M processor family
  - Highest performance Cortex-M7 processor
  - First Helium capable Cortex-M55 processor
- CMSIS-NN and TensorFlow Lite Micro
  - Build and deploy an ML application with TensorFlow Lite and CMSIS-NN kernels
- Q & A 1
- Arm Development Studio and demonstration
  - Develop and debug your ML application on Arm Cortex-M7 and Cortex-M55 FVPs
- Q & A 2





#### Cortex-M processor portfolio

[Figure: Cortex-M processor portfolio, arranged by architecture and ranging from lowest power and area to highest performance. Armv6-M: Cortex-M0, Cortex-M0+. Armv7-M: Cortex-M3, Cortex-M4, Cortex-M7. Armv8-M: Cortex-M23, Cortex-M33, Cortex-M35P, Cortex-M55.]



#### Uplift in DSP and ML performance for Cortex-M



[Figure: relative ML and signal processing performance of Cortex-M processors.]

\*Existing processors with DSP extensions



#### Cortex-M7 key features

https://developer.arm.com/ip-products/processors/cortex-m/cortex-m7

- High performance core with DSP capabilities
  - Powerful DSP instructions
  - SP/DP Floating Point Unit
  - Six stage dual-issue pipeline
  - 5.01 CoreMark/MHz
- Flexible memory systems
  - Up to 16MB tightly-coupled memories for real-time determinism
  - Memory Protection Unit (MPU) and up to 64kB caches
  - 64-bit AXI4 memory interface



#### Cortex-M55 key features

https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55

- High performance core with DSP and vector processing capabilities
  - Helium vector processing technology
    - Configurable as integer-only or integer + floating-point
  - Powerful DSP instructions
  - HP/SP/DP Floating Point Unit
    - Vector HP/SP
  - Four stage pipeline
  - 4.2 CoreMark/MHz
- Flexible memory systems
  - Up to 16MB tightly-coupled memories for real-time determinism
  - Memory Protection Unit (MPU) and up to 64kB caches
  - 64-bit AXI5 memory interface



#### Cortex-M55 arithmetic configuration options

Helium (MVE) configuration runs across the columns; FPU configuration runs down the rows.

| FPU configuration | MVE=0 (No MVE)               | MVE=1 (Integer MVE)          | MVE=2 (Integer and floating-point MVE) |
|-------------------|------------------------------|------------------------------|----------------------------------------|
| FPU=0 (No FPU)    | Scalar integer               | Vector integer               | N/A                                    |
| FPU=1 (With FPU)  | Scalar integer, scalar float | Vector integer, scalar float | Vector integer, vector float           |
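The configuration table can be read as a small lookup. The sketch below is illustrative only: `Capabilities`, `IsValidConfig` and `CapabilitiesFor` are hypothetical names, not part of any Arm tool or header. It also assumes that each table cell names the most capable features of a configuration, so that scalar integer support is always present and scalar float is present whenever FPU=1.

```cpp
#include <cassert>

// Hypothetical helper types (not an Arm API): arithmetic capabilities
// implied by the MVE and FPU configuration values in the table above.
struct Capabilities {
    bool scalar_integer;
    bool vector_integer;
    bool scalar_float;
    bool vector_float;
};

// mve: 0 = no MVE, 1 = integer MVE, 2 = integer and floating-point MVE.
// fpu: 0 = no FPU, 1 = with FPU.
// The table marks MVE=2 without an FPU as N/A, so it is rejected here.
bool IsValidConfig(int mve, int fpu) {
    const bool in_range = mve >= 0 && mve <= 2 && (fpu == 0 || fpu == 1);
    const bool float_mve_without_fpu = (mve == 2 && fpu == 0);
    return in_range && !float_mve_without_fpu;
}

Capabilities CapabilitiesFor(int mve, int fpu) {
    assert(IsValidConfig(mve, fpu));
    Capabilities caps{};
    caps.scalar_integer = true;       // assumption: always available on the core
    caps.vector_integer = (mve >= 1); // any MVE option adds vector integer
    caps.scalar_float = (fpu == 1);   // scalar float requires the FPU
    caps.vector_float = (mve == 2);   // vector float requires MVE=2 (and FPU=1)
    return caps;
}
```

For example, `CapabilitiesFor(1, 1)` reports vector integer and scalar float but no vector float, matching the middle cell of the FPU=1 row.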







# CMSIS-NN and TensorFlow Lite Micro

#### Meet the TensorFlow Family



**TensorFlow**

- Model creation
- Model training
- Inference in cloud
- Computationally large
- Google TPU



© 2020 Arm Limited (or its affiliates)



**TensorFlow Lite**

- TensorFlow models made suitable for edge devices
- Inference on device only
- No reliance on network connectivity
- E.g. speech recognition and NLP in 80 MB
- Smartphones, Cortex-A class IoT



**TensorFlow Lite Micro**

- Bare metal TensorFlow Lite runtime for Arm MCUs
- Inference on device only
- No reliance on network connectivity
- Runs on MCUs in tens of KB
- Ultra-low power, always on\*
- Cortex-M



#### TensorFlow Lite Micro



- A version of TensorFlow Lite designed to run on microcontrollers:
  - Less than 16 KB binary footprint + inference model (typically < 100 KB)
  - No dynamic memory allocation
  - Maintains the TensorFlow Lite APIs
  - Examples to get started on Arm Cortex-M
    - https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/
- Several Arm Fixed Virtual Platforms (FVPs) available today for software development
  - Start porting your ML workloads to Arm Cortex-M systems without the need for a physical target
  - Programmer's view, which gives you a comprehensive model on which to build and test your software
  - https://developer.arm.com/tools-and-software/simulation-models/fixed-virtual-platforms
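To illustrate what running without heap allocation can look like, here is a minimal sketch of a bump allocator that hands out memory from a fixed, caller-provided buffer. `StaticArena` is a hypothetical class written for this example and is not the TensorFlow Lite Micro implementation; it only shows the general idea of serving all working memory from one statically sized region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a fixed buffer (illustration only).
// All memory comes from the buffer passed in; nothing is taken from the heap.
class StaticArena {
public:
    StaticArena(std::uint8_t* buffer, std::size_t size)
        : buffer_(buffer), size_(size), used_(0) {}

    // Returns a pointer to `bytes` bytes aligned to `alignment` (a power of
    // two), or nullptr when the remaining space is too small.
    void* Allocate(std::size_t bytes, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (base + used_ + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > size_ || bytes > size_ - offset) {
            return nullptr;  // out of arena space; state is left unchanged
        }
        used_ = offset + bytes;
        return buffer_ + offset;
    }

    std::size_t used() const { return used_; }

private:
    std::uint8_t* buffer_;
    std::size_t size_;
    std::size_t used_;
};
```

Because the buffer size is fixed at build time, the worst-case memory use of the application is known before it runs, which suits small MCU targets.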



#### Software Architecture - CMSIS-NN



- Open source: https://github.com/ARM-software/CMSIS_5
- An optimized kernel library for Cortex-M
  - Supports TFLite operators
  - Broadly equivalent to the Arm Compute Library for Cortex-A CPUs
- Offline flow creates a binary for Cortex-M based platforms
- Targets the Cortex-M architectures
  - Armv6-M/Armv7-M/Armv8-M and Armv8.1-M with MVE support
  - Runs on earlier versions of the architecture



#### CMSIS-NN and TensorFlow Lite for Microcontrollers

Access to optimized kernels through TensorFlow Lite Micro



- Support for optimized, bit-exact int8 kernels
- Fallback on reference kernels when optimization is not available
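The two points above can be sketched as a kernel pair plus a selector. Everything here is hypothetical and written for illustration: `DotReference`, `DotUnrolled` and `SelectKernel` are not CMSIS-NN or TensorFlow Lite Micro functions, and the "optimized" kernel is just a manually unrolled loop standing in for a tuned implementation. The selection is made at run time only to keep the example small. Since both kernels accumulate the same int8 products in 32-bit integers, their results are identical bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Int8DotFn = std::int32_t (*)(const std::int8_t*, const std::int8_t*,
                                   std::size_t);

// Reference kernel: one multiply-accumulate per loop iteration.
std::int32_t DotReference(const std::int8_t* a, const std::int8_t* b,
                          std::size_t n) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Stand-in for an optimized kernel: four products per iteration, then a
// scalar tail loop. Integer accumulation keeps it bit-exact with the reference.
std::int32_t DotUnrolled(const std::int8_t* a, const std::int8_t* b,
                         std::size_t n) {
    std::int32_t acc = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += static_cast<std::int32_t>(a[i]) * b[i];
        acc += static_cast<std::int32_t>(a[i + 1]) * b[i + 1];
        acc += static_cast<std::int32_t>(a[i + 2]) * b[i + 2];
        acc += static_cast<std::int32_t>(a[i + 3]) * b[i + 3];
    }
    for (; i < n; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Picks the optimized kernel when one exists, otherwise the reference kernel.
Int8DotFn SelectKernel(bool optimized_available) {
    return optimized_available ? DotUnrolled : DotReference;
}
```

A caller uses the returned function pointer without needing to know which implementation it received, which is what makes the fallback transparent to the application.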



#### Build TensorFlow Lite Micro examples for Cortex-M7 and Cortex-M55 FVPs



- Start with the examples in the TensorFlow repo: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/
- Add support for your target platform (Cortex-M7 and Cortex-M55 FVP)
- Generate the project/binary with `TAGS="cmsis-nn armclang"`
- Build the binary for a simple application (micro\_speech\_test) that runs an inference on a Cortex-M FVP

A Docker project is available in the Arm Tool-Solutions repo on GitHub: https://github.com/ARM-software/Tool-Solutions/tree/master/docker/tensorflow-lite-micro-fvp



#### TensorFlow Lite Micro examples – micro\_speech

Keyword detection example that recognizes the spoken words "yes" and "no" in audio
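As a rough illustration of the last step in such an application, the sketch below picks the highest-scoring category from a set of int8 model outputs and checks whether it is one of the two keywords. The label list and its order, along with the names `kCategoryLabels`, `TopCategory` and `IsKeyword`, are assumptions made for this example and are not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed label set and order, for illustration only.
constexpr std::size_t kCategoryCount = 4;
constexpr const char* kCategoryLabels[kCategoryCount] = {"silence", "unknown",
                                                         "yes", "no"};

// Returns the index of the largest score (the first one wins on ties).
std::size_t TopCategory(const std::int8_t* scores, std::size_t count) {
    assert(scores != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}

// True when the category at `index` is one of the two detected words.
bool IsKeyword(std::size_t index) {
    assert(index < kCategoryCount);
    return std::strcmp(kCategoryLabels[index], "yes") == 0 ||
           std::strcmp(kCategoryLabels[index], "no") == 0;
}
```

A real application would typically also smooth the scores over several audio windows before reporting a detection; that step is omitted here.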









#### **Arm Development Studio**

The most comprehensive embedded C/C++ dedicated software development solution



#### Compiler

Best-in-class, safety-certified code compilation designed alongside Arm IP



#### Virtual prototypes



Architectural exploration and early software development



#### Multi-Core Debugger

Family of debug tools for silicon bring-up and software development for complex SoCs



#### Streamline



Visibility of software performance on Linux or bare metal



#### Arm Fast Models and Fixed Virtual Platforms (FVPs)



Earliest architecture support



Accelerate time to market



- Software development platforms for leading edge Arm IP
  - Available as pre-configured FVPs and model portfolio to build custom platforms
  - Earliest access to simulation models for Arm CPU and System IP
  - Ideal for software validation and continuous integration environments
- Performance, fidelity and flexibility
  - High-performance models for software developers
  - Accuracy proven against IP validation suites, used in IP validation flows







#### Get started

- Get a free 30-day evaluation of Arm Development Studio
  - https://developer.arm.com/tools-and-software/embedded/arm-development-studio/evaluate
- Download the examples used in this webinar
  - git clone https://github.com/ARM-software/Tool-Solutions/
- Find more information on Helium technology and access the programmer's guides
  - https://developer.arm.com/architectures/instruction-sets/simd-isas/helium
- Contact us
  - Arm-Tool-Solutions@arm.com









# Join our next virtual tech talk: Efficient ML across Arm from Cortex-M to Web Assembly by Edge Impulse

Tuesday 25 August

Register here:

developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks



Thank you



The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks