Arm Machine Learning Processor 

Industry-leading performance and efficiency for inference at the edge.

Based on a new, class-leading architecture, the Arm ML processor's optimized design enables new features, enhances user experience and delivers innovative applications for a wide array of market segments including mobile, IoT, embedded, automotive, and infrastructure. It provides a massive uplift in efficiency compared to CPUs, GPUs and DSPs through efficient convolution, sparsity and compression. 

Key Features

  • Specially designed to provide outstanding performance for mobile with 4 TOP/s and efficiency of 5 TOP/W; additional optimizations provide a further increase in real-world use cases.
  • Programmable layer engines for future-proofing.
  • Incorporates a variety of compression technologies to minimize system memory bandwidth.
  • Highly tuned for advanced geometry implementations.
  • Supports secure operating mode to protect DNN IP and data.
  • High responsiveness reduces latency to improve user experience.
  • Supports TrustZone system security for the secure operating mode, with configurable secure queues for multiple users and flexible processing in the TEE or SEE for secure use cases such as biometric payment and content protection for high-value media streams.
 
 

Key Benefits

  • Enables ML processing on the edge, saving power, reducing data consumption and enhancing user privacy.
  • Flexible design supports a variety of popular neural networks, including CNNs and RNNs, for classification, object detection, image enhancements, speech recognition and natural language understanding.
  • Winograd accelerates common filters by 225% compared to other NPUs, allowing more performance in less area.
  • Minimizes system memory bandwidth by 1.5-3x through a variety of compression technologies, targeting both weight and activation feature maps.
  • Tight system integration through ACE-Lite master port and optional SMMU integration allows for support and protection of memory and easy handling of multiple users.
  • The Arm Machine Learning processor is compatible with Arm NN, an inference engine for CPUs, GPUs and NPUs that bridges the gap between existing NN frameworks and the underlying IP.
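
The Winograd figure above can be sanity-checked with a little arithmetic. The sketch below is not Arm's implementation; it assumes the widely used Winograd F(2x2, 3x3) variant, which the page does not name, and shows that it needs 16 multiplications per 2x2 output tile where direct convolution needs 36, a 2.25x reduction that is consistent with the quoted 225%. The function names are hypothetical and chosen for illustration only. The 1-D case F(2,3) is included because it is small enough to verify by hand.

```python
def direct_f23(d, g):
    """Direct 1-D filtering: 2 outputs from 4 inputs and a 3-tap filter (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 data multiplies."""
    # Filter transform; in practice this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # The four element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


def multiply_reduction_2d(m, r):
    """Ratio of multiplies, direct vs. Winograd F(m x m, r x r), per output tile."""
    direct = m * m * r * r
    winograd = (m + r - 1) ** 2
    return direct / winograd


data = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
print(direct_f23(data, taps), winograd_f23(data, taps))
print(multiply_reduction_2d(2, 3))  # 36 / 16 = 2.25
```

The saving comes from trading multiplications for additions, which is why it translates into more performance per unit of multiplier area rather than a higher peak TOP/s figure.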

Figure: Machine Learning processor block diagram.

To learn more about Machine Learning on Arm, visit our ML Developer community.

 


Specifications

Key Features

  • Performance (at 1 GHz): 4 TOP/s
  • Data Types: Int-8 and Int-16
  • Network Support: CNN and RNN
  • Efficient Convolution: Winograd support
  • Sparsity: Yes
  • Secure Mode: TEE or SEE
  • Multi-Core Capability: 8 NPUs in a cluster, 64 NPUs in a mesh

Memory System

  • Embedded SRAM: 1MB
  • Bandwidth Reduction: Extended compression technology, layer/operator fusion
  • Main Interface: 1xAXI4 (128-bit), ACE-5 Lite

Development Platform

  • Neural Frameworks: TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, ONNX
  • Neural Operator API: ArmNN, AndroidNN
  • Software Components: ArmNN, neural compiler, driver and support library
  • Debug and Profile: Layer-by-layer visibility
  • Evaluation and Early Prototyping: Arm Juno FPGA systems and cycle models
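
To put the headline numbers in perspective, here is a minimal back-of-the-envelope sketch using only the figures quoted on this page (4 TOP/s at 1 GHz, 5 TOP/W, up to 8 NPUs in a cluster and 64 in a mesh). It rests on three assumptions that the page does not state: one multiply-accumulate counts as two operations, the efficiency figure applies at peak throughput, and multi-core throughput scales linearly. The results are therefore idealized upper bounds, not measured values, and the names below are illustrative.

```python
PEAK_TOPS = 4               # peak throughput quoted on the page, in TOP/s
CLOCK_GHZ = 1               # clock frequency the peak figure is quoted at
EFFICIENCY_TOPS_PER_W = 5   # efficiency quoted on the page, in TOP/W

# Operations the NPU must complete every clock cycle to reach the peak.
ops_per_cycle = PEAK_TOPS * 10**12 // (CLOCK_GHZ * 10**9)

# Assumption: 1 MAC = 2 operations (one multiply plus one add).
macs_per_cycle = ops_per_cycle // 2

# Assumption: the efficiency figure holds at peak throughput.
power_at_peak_w = PEAK_TOPS / EFFICIENCY_TOPS_PER_W


def ideal_peak_tops(num_npus):
    """Upper bound on aggregate TOP/s, assuming perfectly linear scaling."""
    return num_npus * PEAK_TOPS


print(ops_per_cycle)         # 4000
print(macs_per_cycle)        # 2000
print(power_at_peak_w)       # 0.8
print(ideal_peak_tops(8))    # 32
print(ideal_peak_tops(64))   # 256
```

Real multi-core throughput depends on memory bandwidth and how well a network partitions across NPUs, so the cluster and mesh numbers should be read as ceilings.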

 


Applications

Mobile

AR/VR

IoT

Smart camera

Healthcare

Medical

Logistics

STB/DTV

Robotics

Home

Consumer 

Drones

Infrastructure

Get Support

Arm Support

Arm training courses and on-site system-design advisory services enable licensees to efficiently integrate the Arm ML processor into their design to realize maximum system performance with lowest risk and fastest time-to-market.
