
Accelerate Mobile AI on Arm With SME2

Build faster, more efficient mobile AI apps with Arm Scalable Matrix Extension 2 (SME2). This guide shows you how to run and optimize on-device Large Language Models (LLMs), voice, vision, and GenAI workloads using SME2-enabled hardware, supported frameworks, and tools for Android and iOS.


What is SME2 and Why Should Developers Use It?


SME2 is Arm’s latest CPU extension for accelerating matrix-oriented compute workloads directly on-device. It is designed to improve performance for AI and ML models – particularly those relying on operations like matrix multiplication, common in Transformers, CNNs, and LLMs.
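At the core of those workloads is matrix multiplication. As a point of reference, the scalar C kernel below is a sketch of the operation itself, not of SME2: on SME2 hardware, a framework would route this loop to tuned micro-kernels (for example via KleidiAI) that use the CPU's outer-product instructions instead.

```c
#include <stddef.h>

/* Naive single-precision matrix multiply: C = A (m x k) * B (k x n),
 * all matrices in row-major order. This triple loop is the operation
 * class SME2 accelerates in hardware; it is shown only to illustrate
 * the workload, not as an optimized implementation. */
void matmul_f32(const float *a, const float *b, float *c,
                size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}
```

In a Transformer or CNN, almost all inference time is spent in kernels of exactly this shape, which is why a matrix extension in the CPU translates directly into faster on-device inference.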


  • Up to 6× faster inference on models like Google’s Gemma 3
  • Supported natively in PyTorch, LiteRT, and ONNX Runtime
  • Available on iPhone 16 and Apple M4-series chips; Android support coming soon

Get Started

Supported Frameworks

SME2 support is integrated into the following frameworks through the Arm Kleidi Libraries, which automatically accelerate compute-intensive workloads with minimal changes to your existing codebase.

  • LiteRT
  • ExecuTorch
  • ONNX Runtime
  • MNN
  • LLaMA
  • Angel
  • MediaPipe
  • OpenCV

Get Started - What Are You Developing?

  • GENERATIVE AI
  • VOICE AND VISION
  • LIBRARIES AND FRAMEWORKS
  • Arm SIMD Instructions

Designed for application developers, this section showcases real-world examples – including LLMs, audio generation, and multimodal LLMs – running directly on Arm CPUs using KleidiAI, ExecuTorch, ONNX Runtime, and MediaPipe. It assumes a foundational understanding of Android development and familiarity with Android Studio.



  • Generate Audio with Stable Audio on LiteRT (LiteRT): Learn how to deploy the Stable Audio Open Small text-to-audio model using LiteRT on Android and macOS.
  • Vision LLM Inference on Android with KleidiAI + MNN (MNN): Run Vision Transformers (ViT) efficiently on Android with KleidiAI and MNN in this beginner-friendly path.
  • Build an Android Chat App with ExecuTorch + Llama 3 – Learning Path + Docs (PyTorch / ExecuTorch): Step-by-step guide to building a lightweight, real-time Llama 3 chat app on Arm-based Android devices.
  • Build a Chatbot on Android with ONNX Runtime (ONNX): Learn to build a powerful Android chat app using ONNX Runtime and the Generate() API for efficient inference.
  • Multimodal AI on Android with MediaPipe + KleidiAI – Build a Selfie App + Run LLM Inference (MediaPipe): Develop high-performance multimodal apps using MediaPipe, KleidiAI, and XNNPACK – from selfie filters to LLM integration.
  • Neural Network Quantization for Mobile AI: Explore key quantization techniques to reduce model size and improve performance for on-device AI.
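To make the quantization entry above concrete, here is a minimal C sketch of symmetric per-tensor int8 quantization, assuming the simplest scheme (scale = max|w| / 127, no zero point). Production frameworks add per-channel scales, zero points, and calibration on real activation data.

```c
#include <stdint.h>
#include <stddef.h>

/* Symmetric per-tensor int8 quantization sketch: maps the
 * largest-magnitude float weight to +/-127. Returns the scale, so
 * values can be recovered approximately as w[i] ~= q[i] * scale.
 * This halves-to-quarters model size versus float32 and lets int8
 * matrix kernels (e.g. SME2-accelerated ones) do the heavy lifting. */
float quantize_int8(const float *w, int8_t *q, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = w[i] < 0.0f ? -w[i] : w[i];
        if (a > max_abs) max_abs = a;
    }
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++) {
        /* round-half-away-from-zero without libm */
        float r = w[i] / scale + (w[i] >= 0.0f ? 0.5f : -0.5f);
        q[i] = (int8_t)r;
    }
    return scale;
}
```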

Designed for application developers, this section walks you through accelerating voice assistants, enhancing camera pipelines, and optimizing computer vision apps using frameworks like KleidiAI, PyTorch, and OpenCV. It assumes foundational knowledge of Android development, including experience with Android Studio.



  • Accelerate Voice Assistants with KleidiAI + SME2 (Whisper.cpp and Llama.cpp): Learn how to optimize voice assistant performance on Android using KleidiAI and SME2.
  • Enhance Camera Effects with AI Optimization (LiteRT with XNNPack): Discover how KleidiAI and KleidiCV can optimize camera pipelines for real-time visual effects on Android.
  • Train a Digit Classifier with PyTorch for Android (PyTorch): Learn to train a digit classification model with PyTorch and optimize it for Android deployment.
  • Accelerate OpenCV on Android with KleidiCV – Camera App with OpenCV + Face Detection on Android (OpenCV): Three Learning Paths on using KleidiCV and SME2 to accelerate OpenCV apps on Android – from basics to face detection.

Designed for library and framework developers, this section introduces the lightweight, open-source Arm Kleidi Libraries for accelerating AI/ML frameworks, tools, and libraries.



  • Arm Kleidi Libraries: Lightweight, open-source libraries for accelerating AI and ML workloads – an alternative to the Arm Compute Library (ACL) with lower overhead.
  • Accelerate Generative AI Workloads Using KleidiAI: Learn how to accelerate GenAI workloads with KleidiAI, including a step-by-step guide to running key functions like Gemma LLM inference. Read the launch blog.
  • Arm KleidiCV GitLab Repo: A high-performance computer vision library that integrates easily with any CV framework to accelerate image processing on Arm-based devices. Read the launch blog.

Designed for developers directly using the Arm SIMD Instruction Set, this section provides practical SME2 examples, compiler toolchain insights, and low-level programming techniques in C/C++ and assembly.



  • Introduction to SME2 Blogs (Part 1 – SME2 Overview; Part 2 – SME2 Architecture Deep Dive; Part 3 – Matrix Multiplication): A three-part blog series introducing Arm SME2, covering the architecture, programming model, and comparisons with NEON and SVE.
  • SME2 Semantics, Toolchains & Code Examples: A programmer’s guide to Arm SME2, including architecture, semantics, and how to accelerate matrix workloads on Armv9-A CPUs.
  • SME2 Glossary & Intrinsics Reference: Technical reference for SME and SME2 intrinsics in C/C++, including descriptions, syntax, and usage examples.
  • Accelerate Matrix Multiplication with SME2: Advanced Learning Path on applying SME2 to optimize matrix multiplication on Arm-based platforms.
  • Function Multiversioning for SME2, NEON & SVE2: Learn to optimize C/C++ apps across SIMD instruction sets using function multiversioning for performance portability.
  • Arm SIMD Extensions Best Practice: Optimize your AI/ML workloads with Arm SIMD code, in assembly or with Arm intrinsics in C/C++, for significant performance gains.
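Function multiversioning lets the compiler emit several versions of one function and pick the best at load time. The C sketch below hand-rolls the same idea as explicit runtime dispatch; `cpu_supports_sme2()` is a placeholder for a real feature check (e.g. reading HWCAP bits on Linux/Android), and `dot_sme2` stands in for an SME2-optimized kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder feature check: a real build would query the OS, e.g.
 * via getauxval() HWCAP bits on Linux/Android. Hard-wired to false
 * here so the sketch runs anywhere. */
bool cpu_supports_sme2(void) { return false; }

/* Portable fallback version. */
int dot_generic(const int *a, const int *b, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Stand-in for an SME2 intrinsics/assembly version; with compiler
 * multiversioning this would be the same function name tagged for
 * the sme2 target instead of a separate symbol. */
int dot_sme2(const int *a, const int *b, size_t n)
{
    return dot_generic(a, b, n); /* placeholder body */
}

typedef int (*dot_fn)(const int *, const int *, size_t);

/* Resolver: chooses the implementation once, based on CPU features.
 * Multiversioning generates this selection logic for you. */
dot_fn select_dot(void)
{
    return cpu_supports_sme2() ? dot_sme2 : dot_generic;
}
```

In practice you would call `select_dot()` once at startup, cache the pointer, and call through it on the hot path; compiler-driven multiversioning does the equivalent with no manual plumbing.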

Tools and Libraries

Use these tools to profile, tune, and deploy your AI workloads after model selection and initial integration. These tools support low-level ML optimization by targeting Arm-specific features and analyzing system performance across compute and memory.


Arm Performance Studio

For Android app developers – profile performance for AI/ML workloads on mobile.

Access Learning Path

Arm KleidiAI

For framework and library developers, this Learning Path covers accelerating GenAI workloads with KleidiAI, featuring examples with Gemma LLM.

Access Learning Path

Matrix Multiplication with SME2

Learn how to accelerate matrix multiplication on Apple M4-series devices and iPhone 16 with SME2.

Access Learning Path

What's Next?

  • CODE-ALONGS
  • DEVELOPER PROGRAM
  • COURSES and LABS
  • DEVELOPER RESEARCH
  • MORE RESOURCES

Generate AI Audio on Arm Devices with Stability AI and LiteRT

Code-Along and Expert Q&A

August 28 | 9 a.m. PT | 6 p.m. BST

Join this session to code along with experts and deploy Stability AI’s audio model on Android using LiteRT.

Register Now

Arm Developer Program

Have a technical question about AI applications on Arm?

Join the Arm Developer Program and connect with a global community of developers and Arm engineers to build better apps on Arm. Get early access to tools, technical content, workshops, and support to help you debug, optimize, and ship your projects.

Explore Program

Arm Developer Council

Join the Arm Developer Council to share feedback, help shape the tools and platforms you use, and receive a voucher for your time.

Learn More