Arm Machine Learning Processor 

Industry-leading performance and efficiency for inference at the edge.

Figure: Machine Learning Processor block diagram.

The Arm Machine Learning processor is a ground-up design optimized for machine learning acceleration, targeting mobile and adjacent markets. The solution combines state-of-the-art, fixed-function compute engines that provide best-in-class performance within a constrained power envelope.

Additional programmable layer engines handle the execution of non-convolution layers and the implementation of selected primitives and operators, while leaving headroom for future innovation and new algorithms. A network control unit manages the overall execution and traversal of the network, and a DMA engine moves data to and from main memory.

Onboard memory provides central storage for weights and feature maps, reducing external memory traffic and, in turn, power consumption.

Key Features

  • Specially designed to deliver outstanding mobile performance, with up to 4.1 TOPs; additional optimizations provide further uplift in real-world use cases.
  • Best-in-class efficiency at >3 TOPs/W.
  • Programmable layer engines for future-proofing.
  • Highly tuned for implementation on advanced process geometries.
  • Scalable onboard memory reduces external memory traffic.
  • Arm NN works with Android NNAPI to provide a translation layer between major neural network frameworks, such as TensorFlow and Caffe, and the Arm Machine Learning processor, as well as other Arm IP.
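As a rough sanity check (our inference, not an Arm-published figure), the headline numbers above bound the power envelope: at an efficiency above 3 TOPs/W, delivering 4.1 TOPs implies a budget of under about 1.4 W. A minimal back-of-envelope sketch:

```python
# Back-of-envelope check using the figures from the feature list above.
# The ~1.4 W result is an implied upper bound, not a published spec.
peak_throughput_tops = 4.1   # tera-operations per second (TOPs)
efficiency_tops_per_w = 3.0  # lower bound on efficiency (">3 TOPs/W")

# power (W) = throughput (TOPs) / efficiency (TOPs/W)
implied_power_w = peak_throughput_tops / efficiency_tops_per_w
print(f"Implied power envelope: under ~{implied_power_w:.2f} W")
```

Because 3 TOPs/W is a lower bound on efficiency, the computed value is an upper bound on the power drawn at peak throughput.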



Find out more

To learn more about Machine Learning on Arm, visit our ML Developer community.

Key Benefits

  • Most efficient solution to run neural networks.
  • Designed for the mobile and adjacent markets.
  • Optimized, ground-up design for machine learning acceleration.
  • Best-in-class performance with state-of-the-art, fixed-function engines.
  • Programmable engines for future innovation and algorithms.
  • Massive efficiency uplift over CPUs, GPUs, DSPs, and accelerators.
  • Completes Arm’s heterogeneous Machine Learning platform solution.
  • Enabled by open-source software.
  • Industry-leading performance in thermally- and cost-constrained environments.
  • When combined with the Arm Object Detection processor, provides highly efficient and optimized people detection.





Webinar - Project Trillium: Optimizing ML Performance for any Application

Project Trillium is a suite of Arm IP designed to deliver scalable ML and neural network functionality at any point on the performance curve, from sensors, to mobile, and beyond. 

