Accelerate Mobile AI on Arm With SME2
Build faster, more efficient mobile AI apps with Arm Scalable Matrix Extension 2 (SME2). This guide shows you how to run and optimize on-device Large Language Models (LLMs), voice, vision, and GenAI workloads using SME2-enabled hardware, supported frameworks, and tools for Android and iOS.
What is SME2 and Why Should Developers Use It?
SME2 is Arm’s latest CPU extension for accelerating matrix-oriented compute workloads directly on-device. It is designed to improve performance for AI and ML models – particularly those relying on operations like matrix multiplication, common in Transformers, CNNs, and LLMs.
- Up to 6× faster inference on models like Google’s Gemma 3
- Supported natively in PyTorch, LiteRT, and ONNX Runtime
- Available on iPhone 16 and Apple M4-series chips; Android support coming soon
Supported Frameworks
SME2 is integrated with the following frameworks, where the Arm Kleidi Libraries automatically accelerate compute-intensive workloads with minimal changes to your existing codebase.


Get Started - What Are You Developing?
- GENERATIVE AI
- VOICE AND VISION
- LIBRARIES AND FRAMEWORKS
- Arm SIMD Instructions
Designed for application developers, this section showcases real-world examples – including LLMs, audio generation, and multimodal LLMs – running directly on Arm CPUs using KleidiAI, ExecuTorch, ONNX Runtime, and MediaPipe. It assumes a foundational understanding of Android development and familiarity with Android Studio.
| Resources | Framework | Description |
|---|---|---|
| Generate Audio with Stable Audio on LiteRT | LiteRT | Learn how to deploy the Stable Audio Open Small text-to-audio model using LiteRT on Android and macOS. |
| Vision LLM Inference on Android with KleidiAI + MNN | MNN | Run Vision Transformers (ViT) efficiently on Android with KleidiAI and MNN in this beginner-friendly path. |
| Build an Android Chat App with ExecuTorch + Llama 3 (Learning Path + Docs) | PyTorch / ExecuTorch | Step-by-step guide to building a lightweight, real-time Llama 3 chat app on Arm-based Android devices. |
| Build a Chatbot on Android with ONNX Runtime | ONNX Runtime | Learn to build a powerful Android chat app using ONNX Runtime and the Generate() API for efficient inference. |
| Multimodal AI on Android with MediaPipe + KleidiAI (selfie app + LLM inference) | MediaPipe | Develop high-performance multimodal apps using MediaPipe, KleidiAI, and XNNPACK, from selfie filters to LLM integration. |
| Neural Network Quantization for Mobile AI | N/A | Explore key quantization techniques to reduce model size and improve performance for on-device AI. |
Designed for application developers, this section walks you through accelerating voice assistants, enhancing camera pipelines, and optimizing computer vision apps using frameworks like KleidiAI, PyTorch, and OpenCV. It assumes foundational knowledge of Android development, including experience with Android Studio.
| Resources | Framework | Description |
|---|---|---|
| Accelerate Voice Assistants with KleidiAI + SME2 | Whisper.cpp and Llama.cpp | Learn how to optimize voice assistant performance on Android using KleidiAI and SME2. |
| Enhance Camera Effects with AI Optimization | LiteRT with XNNPack | Discover how KleidiAI and KleidiCV can optimize camera pipelines for real-time visual effects on Android. |
| Train a Digit Classifier with PyTorch for Android | PyTorch | Learn to train a digit classification model with PyTorch and optimize it for Android deployment. |
| Accelerate OpenCV on Android with KleidiCV (CV camera app + face detection) | OpenCV | Three Learning Paths on using KleidiCV and SME2 to accelerate OpenCV apps on Android, from basics to face detection. |
Designed for library and framework developers, this section introduces the Arm Kleidi Libraries: lightweight, open-source building blocks for accelerating AI/ML frameworks and tools.
| Resources | Description |
|---|---|
| Arm Kleidi Libraries | Lightweight, open-source libraries for accelerating AI and ML workloads; an alternative to Arm Compute Library (ACL) with lower overhead. |
| Accelerate Generative AI Workloads Using KleidiAI | Learn how to accelerate GenAI workloads with KleidiAI, including a step-by-step guide to running key functions such as Gemma LLM inference. Read the launch blog. |
| Arm KleidiCV GitLab Repo | A high-performance library for computer vision that integrates easily with any CV framework to accelerate image processing on Arm-based devices. Read the launch blog. |
Designed for developers directly using the Arm SIMD Instruction Set, this section provides practical SME2 examples, compiler toolchain insights, and low-level programming techniques in C/C++ and assembly.
| Resources | Description |
|---|---|
| Introduction to SME2 blogs: Part 1 – SME2 Overview, Part 2 – SME2 Architecture Deep Dive, Part 3 – Matrix Multiplication | A three-part blog series introducing Arm SME2, covering its architecture, programming model, and comparisons with NEON and SVE. |
| SME2 Semantics, Toolchains & Code Examples | A programmer’s guide to Arm SME2, including architecture, semantics, and how to accelerate matrix workloads on Armv9-A CPUs. |
| SME2 Glossary & Intrinsics Reference | Technical reference for SME and SME2 intrinsics in C/C++, including descriptions, syntax, and usage examples. |
| Accelerate Matrix Multiplication with SME2 | Advanced Learning Path for applying SME2 to optimize matrix multiplication on Arm-based platforms. |
| Function Multiversioning for SME2, NEON & SVE2 | Learn to optimize C/C++ apps across SIMD instruction sets using function multiversioning for performance portability. |
| Arm SIMD Extensions Best Practice | Optimize your AI/ML workloads with Arm SIMD code, in assembly or with Arm intrinsics in C/C++, to unlock large performance gains. |
Tools and Libraries
Use these tools to profile, tune, and deploy your AI workloads after model selection and initial integration. These tools support low-level ML optimization by targeting Arm-specific features and analyzing system performance across compute and memory.
Arm Performance Studio
For Android app developers – profile performance for AI/ML workloads on mobile.
Arm KleidiAI
For framework and library developers, this Learning Path covers accelerating GenAI workloads with KleidiAI, featuring examples with Gemma LLM.
Matrix Multiplication with SME2
Learn how to accelerate matrix multiplication on Apple M4 devices and iPhone 16 with SME2.
What's Next?
- CODE-ALONGS
- DEVELOPER PROGRAM
- COURSES and LABS
- DEVELOPER RESEARCH
- MORE RESOURCES

Generate AI Audio on Arm Devices with Stability AI and LiteRT
August 28 | 9 a.m. PT | 6 p.m. BST
Join this session to code along with experts and deploy Stability AI’s audio model on Android using LiteRT.

Arm Developer Program
Have a technical question about AI applications on Arm?
Join the Arm Developer Program and connect with a global community of developers and Arm engineers to build better apps on Arm. Get early access to tools, technical content, workshops, and support to help you debug, optimize, and ship your projects.

Arm Developer Council
Join the Arm Developer Council to share feedback, help shape the tools and platforms you use — and receive a voucher for your time.