Run Local LLMs Efficiently with Llama.CPP on Arm
This page is your technical guide to running open source large language models (LLMs) like LLaMA, Mistral, and Gemma using Llama.CPP on Arm-based CPUs. Learn how to install, optimize, and deploy local LLMs across platforms like AWS Graviton, Microsoft Cobalt, Google Axion, NVIDIA Grace, and Raspberry Pi.
Benefits of Running LLMs on Arm with Llama.CPP
Llama.CPP is a lightweight, high-performance inference engine optimized for Arm CPUs, enabling fast, private LLM execution without accelerators. It supports quantization, multi-threading, and NEON/SVE acceleration for efficient, low-latency deployment across cloud environments. Designed for real-world developer workflows, it integrates seamlessly with CI/CD pipelines and container-based systems. Using PyTorch? Go to the PyTorch Developer Launchpad.
Get Started
Setup
To start developing with Llama.CPP on Arm-based systems, you’ll first need to set up your environment. This includes installing the core C++ library, exploring the high-level Python API, and validating your setup by running a local LLM inference.
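The full sequence typically looks like the sketch below. The GGUF file name is a placeholder, not a specific download; substitute any quantized model you have on disk.

```shell
# Clone and build llama.cpp from source (CMake detects
# Arm NEON/SVE support automatically on AArch64 targets).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(nproc)

# Optional: install the high-level Python bindings.
pip install llama-cpp-python

# Validate the setup with a short local inference.
# "model.gguf" is a placeholder for a quantized model file.
./build/bin/llama-cli -m model.gguf -p "Hello from Arm!" -n 32
```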
Before starting, make sure you have:
- An Arm-based machine (a cloud instance or a local device such as a Raspberry Pi)
- A C++ toolchain and CMake for building the core library
- Python, if you plan to use the high-level Python API
Learn and Code
This section guides you through deploying quantized LLMs, building multi-agent systems, and creating RAG chatbots, all optimized for CPU execution in the cloud.
Build a Local LLM Chatbot
Deploy a quantized LLaMA model using Llama.CPP and run it natively on Arm CPUs. Learn how to load the model, handle prompts, and optimize runtime – no GPU needed.
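A condensed version of that workflow, assuming you have built llama.cpp as in the setup step; the checkpoint directory and file names are placeholders:

```shell
# Convert a Hugging Face LLaMA checkpoint to GGUF format.
# "./llama-model" is a placeholder for your local checkpoint.
python convert_hf_to_gguf.py ./llama-model --outfile model-f16.gguf

# Quantize to Q4_0 (4-bit), cutting the memory footprint roughly 4x.
./build/bin/llama-quantize model-f16.gguf model-q4_0.gguf Q4_0

# Chat interactively on CPU, one thread per core.
./build/bin/llama-cli -m model-q4_0.gguf -t $(nproc) -cnv \
  -p "You are a helpful assistant."
```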
Run Multi-Agent AI Locally
Spin up multiple LLM agents with Llama.CPP to build parallel, task-driven AI workflows. Perfect for experimentation with multi-agent orchestration on Arm systems.
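One way to structure such a workflow is to give each agent its own model handle and fan tasks out in parallel. In the sketch below the `Agent` class and task text are illustrative, and the model call is stubbed so the orchestration pattern runs on its own; in a real setup `backend` would wrap a `llama_cpp.Llama` instance.

```python
from concurrent.futures import ThreadPoolExecutor

class Agent:
    """A task-specialized agent. `backend` stands in for a real
    llama_cpp.Llama call and is stubbed here so the orchestration
    pattern is runnable without model weights."""
    def __init__(self, name, system_prompt, backend=None):
        self.name = name
        self.system_prompt = system_prompt
        # Stub backend: returns a canned completion instead of
        # invoking a real model.
        self.backend = backend or (lambda prompt: f"[{name}] done: {prompt}")

    def run(self, task):
        prompt = f"{self.system_prompt}\nTask: {task}"
        return self.backend(prompt)

agents = [
    Agent("researcher", "You gather facts."),
    Agent("writer", "You draft prose."),
    Agent("critic", "You review drafts."),
]

# Fan the same task out to all agents concurrently. With real
# models, one process per agent can avoid memory contention.
with ThreadPoolExecutor(max_workers=len(agents)) as pool:
    results = list(pool.map(lambda a: a.run("summarize Arm SVE"), agents))

for r in results:
    print(r)
```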
Build a RAG Chatbot
Use retrieval-augmented generation (RAG) to create a smarter chatbot with external context — powered by Llama.CPP on Arm. Prefer to follow along step-by-step with a video? Go to the code-along.
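The core RAG loop is retrieve-then-generate. The sketch below uses word-overlap scoring as a dependency-free stand-in for vector similarity search (a real pipeline would use embeddings with an index such as FAISS), and stubs the generation step where a `llama_cpp.Llama` call would go:

```python
import re

def tokens(text):
    # Lowercase word tokens; real pipelines compare embedding vectors.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query, a stand-in
    for vector similarity search over embeddings."""
    q = tokens(query)
    return sorted(documents,
                  key=lambda d: len(q & tokens(d)),
                  reverse=True)[:k]

def answer(query, documents, llm=None):
    """Stuff the top-k passages into the prompt, then generate.
    `llm` stands in for a llama_cpp.Llama call; stubbed here."""
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    llm = llm or (lambda p: f"(model would answer from {len(context)} chars of context)")
    return llm(prompt)

docs = [
    "Graviton is an Arm-based CPU family from AWS.",
    "FAISS builds vector indexes for similarity search.",
    "SVE is Arm's scalable vector extension.",
]
print(answer("What is Graviton?", docs))
```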
Run Hugging Face NLP Models on Arm
Deploy Hugging Face Transformers efficiently on Arm-based cloud platforms. Great for comparing performance between frameworks and workloads.
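A minimal CPU-only example, assuming `transformers` and PyTorch are installed; the checkpoint name is one commonly used public model, and any text-classification model from the Hugging Face Hub can be swapped in:

```python
from transformers import pipeline

# Run a small sentiment model entirely on the CPU. On AArch64,
# PyTorch's CPU backend can use NEON/SVE-optimized kernels.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 selects CPU
)

result = classifier("Arm-based instances made this workload cheap to run.")[0]
print(result["label"], round(result["score"], 3))
```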
Arm Ecosystem Dashboard
The Arm Ecosystem Dashboard is your go-to resource for discovering cloud services, tools, and software stacks optimized for Arm. Whether you’re deploying on AWS, Azure, or GCP, this page helps you find the right partners, platforms, and verified solutions to accelerate development on Arm-based infrastructure.
Performance Tools
This section gives you access to tools that help you profile performance, migrate existing apps, automate cloud deployment, and benchmark workloads on Arm-based platforms.
| Resource | Description |
| --- | --- |
| Streamline CLI | Collect and analyze performance data from Arm-based systems. Automate profiling workflows and integrate them into CI pipelines. |
| Migrate Ease | Identify and adapt workloads for Arm-based cloud environments. Automates analysis and optimization for a smoother migration. |
| Runbooks | Step-by-step automation guides for configuring, running, and benchmarking workloads on Arm platforms. |
| AWS Q CLI | Quickly launch and benchmark Arm-based instances on AWS using a streamlined command-line interface. |
| AWS Perf (APerf) | Access low-level performance counters on Arm CPUs to analyze core behavior, frequency, and workload efficiency. |
What's Next?
Build and Run a RAG Pipeline on Arm – On Demand
Watch the step-by-step code-along and expert Q&A to build a complete RAG app with Hugging Face, LangChain, and FAISS – optimized for Arm-based infrastructure.

Arm Developer Program
Have a technical question about Llama.CPP on Arm cloud or Arm cloud migration?
Join the Arm Developer Program and connect with a global community of developers and Arm engineers to build better apps on Arm. Get early access to tools, technical content, workshops, and support to help you debug, optimize, and ship your projects.
Course: Optimizing Gen AI on Arm Processors
Learn to optimize generative AI workloads on Arm for mobile, edge, and cloud through hands-on labs and lectures.
Arm Developer Labs
Tackle real-world Arm-based cloud challenges with hands-on projects — perfect for building, learning, and prototyping.

Arm Developer Council
Join the Arm Developer Council to share feedback and help shape the tools and platforms you use, and receive a voucher for your time.