
Run Local LLMs Efficiently with Llama.CPP on Arm

This page is your technical guide to running open source large language models (LLMs) like LLaMA, Mistral, and Gemma using Llama.CPP on Arm-based CPUs. Learn how to install, optimize, and deploy local LLMs across platforms like AWS Graviton, Microsoft Cobalt, Google Axion, NVIDIA Grace, and Raspberry Pi.

 

Benefits of Running LLMs on Arm with Llama.CPP

 

Llama.CPP is a lightweight, high-performance inference engine optimized for Arm CPUs, enabling fast, private LLM execution without accelerators. It supports quantization, multi-threading, and NEON/SVE acceleration for efficient, low-latency deployment across cloud environments. Designed for real-world developer workflows, it integrates seamlessly with CI/CD pipelines and container-based systems. Using PyTorch? Go to the PyTorch Developer Launchpad.
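To get a rough feel for why quantization helps, the sketch below mimics the core idea – storing weights as small integers plus a scale factor – in plain Python. The function names here are invented for illustration; llama.cpp's actual GGUF formats (Q4_K_M, Q8_0, and so on) are considerably more elaborate.

```python
# Toy sketch of symmetric 8-bit weight quantization: map float values to
# integers in [-127, 127] plus one scale, then reconstruct approximately.
# Illustrative only -- real llama.cpp quantization uses per-block scales
# and tighter packing.

def quantize_q8(values):
    """Quantize floats to int8-range integers plus a shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_q8(q, scale):
    """Reconstruct approximate floats from integers and scale."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# Reconstruction error stays within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The memory saving is the point: each weight shrinks from 4 bytes (float32) to 1 byte plus a small shared scale, which is what lets multi-billion-parameter models fit in CPU memory.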

Get Started

  • Setup
  • Learn and Code
  • Tools
  • Ecosystem
  • Next Steps

Setup


To start developing with Llama.CPP on Arm-based systems, you’ll first need to set up your environment. This includes installing the core C++ library, exploring the high-level Python API, and validating your setup by running a local LLM inference.
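The core build-and-validate flow can be sketched as follows. Treat this as a rough outline rather than exact commands for your platform: the repository URL follows the upstream project, the GGUF model path is a placeholder you must supply, and the `llama-cli` binary name may differ in older releases.

```shell
# Clone and build llama.cpp from source (Release build, all CPU cores).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Validate the build with a quick local inference.
# The model path below is a placeholder -- download any quantized
# GGUF model (for example from Hugging Face) and point -m at it.
./build/bin/llama-cli -m ./models/your-model.gguf -p "Hello from Arm" -n 32
```

On Arm, the Release build picks up NEON (and SVE, where available) automatically through compiler feature detection, so no extra flags are usually needed.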

Before starting, make sure you have access to an Arm-based system – for example an AWS Graviton instance or a Raspberry Pi – along with a C/C++ toolchain, CMake, and Python if you plan to use the Python API.

Learn and Code

This section guides you through deploying quantized LLMs, building multi-agent systems, and creating RAG chatbots – all optimized for CPU execution on Arm-based cloud platforms.

Build a Local LLM Chatbot

Deploy a quantized LLaMA model using Llama.CPP and run it natively on Arm CPUs. Learn how to load the model, handle prompts, and optimize runtime – no GPU needed.

Start the Learning Path
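A minimal sketch of what such a chatbot loop can look like, assuming the llama-cpp-python binding is installed. The model path is a placeholder, and `build_prompt` is a hypothetical helper showing one simple way to fold chat history into a prompt.

```python
# Hypothetical sketch of a minimal local chatbot using llama-cpp-python.
# Download any GGUF-quantized model and point MODEL_PATH at it.
import os

def build_prompt(history, user_msg):
    """Fold (user, assistant) chat history into an instruction-style prompt."""
    lines = [f"User: {u}\nAssistant: {a}" for u, a in history]
    lines.append(f"User: {user_msg}\nAssistant:")
    return "\n".join(lines)

MODEL_PATH = "./models/your-model.gguf"  # placeholder path

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama
    # n_threads controls CPU parallelism -- a key knob on Arm servers.
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048, n_threads=os.cpu_count())
    prompt = build_prompt([], "What is NEON on Arm?")
    out = llm(prompt, max_tokens=128, stop=["User:"])
    print(out["choices"][0]["text"].strip())
```

Real chat models usually ship with their own prompt template, so in practice you would use the model's template rather than this generic one.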

Run Multi-Agent AI Locally

Spin up multiple LLM agents with Llama.CPP to build parallel, task-driven AI workflows. Perfect for experimentation with multi-agent orchestration on Arm systems.

Start the Learning Path
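The fan-out orchestration pattern can be sketched with stubbed agents – each stub below stands in for a call to a llama.cpp server or a locally loaded model instance, and the agent names and tasks are invented for illustration.

```python
# Illustrative sketch of parallel multi-agent orchestration.
from concurrent.futures import ThreadPoolExecutor

def run_agent(name, task):
    """Stand-in for an agent's LLM call; returns a tagged result."""
    return f"[{name}] handled: {task}"

tasks = {
    "researcher": "gather background on Arm SVE",
    "summarizer": "condense findings into three bullets",
    "reviewer": "check the summary for accuracy",
}

# Fan the tasks out across agents in parallel, then collect results.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {pool.submit(run_agent, n, t): n for n, t in tasks.items()}
    results = {futures[f]: f.result() for f in futures}

for name in tasks:
    print(results[name])
```

With real models, each agent would typically hold its own context and system prompt; the thread pool simply lets independent agents run concurrently on separate CPU cores.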

Build a RAG Chatbot

Use retrieval-augmented generation (RAG) to create a smarter chatbot with external context — powered by Llama.CPP on Arm. Prefer to follow along step-by-step with a video? Go to code-along.

Start the Learning Path
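Conceptually, the retrieval step looks like the toy sketch below, which substitutes bag-of-words vectors and cosine similarity for a real embedding model and FAISS index; the documents and helper names are invented for illustration.

```python
# Toy sketch of the retrieval step in a RAG pipeline: score documents
# against the query, then prepend the best match to the prompt so the
# LLM answers with external context.
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Arm NEON is a SIMD extension for Arm processors.",
    "Graviton is an Arm-based processor family from AWS.",
    "FAISS is a library for efficient similarity search.",
]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

question = "What is NEON?"
context = retrieve(question)[0]
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(prompt)
```

In the real pipeline, a sentence-embedding model replaces `embed` and a FAISS index replaces the sorted scan, but the augment-the-prompt step works the same way.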

Run Hugging Face NLP Models on Arm

Deploy Hugging Face Transformers efficiently on Arm-based cloud platforms. Great for comparing performance between frameworks and workloads.

Start the Learning Path

Arm Ecosystem Dashboard

The Arm Ecosystem Dashboard is your go-to resource for discovering cloud services, tools, and software stacks optimized for Arm. Whether you’re deploying on AWS, Azure, or GCP, this page helps you find the right partners, platforms, and verified solutions to accelerate development on Arm-based infrastructure.

Explore Dashboard

Performance Tools

This section gives you access to tools that help you profile performance, migrate existing apps, automate cloud deployment, and benchmark workloads on Arm-based platforms.



  • Streamline CLI: Collect and analyze performance data from Arm-based systems. Automate profiling workflows and integrate them into CI pipelines.
  • Migrate Ease: Identifies and adapts workloads for Arm-based cloud environments, automating analysis and optimization for a smoother migration.
  • Runbooks: Step-by-step automation guides for configuring, running, and benchmarking workloads on Arm platforms.
  • AWS Q CLI: Quickly launch and benchmark Arm-based instances on AWS using a streamlined command-line interface.
  • AWS Perf (APerf): Access low-level performance counters on Arm CPUs to analyze core behavior, frequency, and workload efficiency.

What's Next?

  • CODE-ALONGS
  • ARM DEVELOPER PROGRAM
  • COURSES AND LABS
  • DEVELOPER RESEARCH
  • MORE RESOURCES

Build and Run a RAG Pipeline on Arm – On Demand

Code-Along and Expert Q&A

Watch the step-by-step code-along and expert Q&A to build a complete RAG app with Hugging Face, LangChain, and FAISS – optimized for Arm-based infrastructure.

Sign Up to Watch

Arm Developer Program

Have a technical question about Llama.CPP on Arm cloud or Arm cloud migration?

Join the Arm Developer Program and connect with a global community of developers and Arm engineers to build better apps on Arm. Get early access to tools, technical content, workshops, and support to help you debug, optimize, and ship your projects.

Explore Program

Course: Optimizing Gen AI on Arm Processors

 

Learn to optimize generative AI workloads on Arm for mobile, edge, and cloud through hands-on labs and lectures.

Explore course on GitHub

Arm Developer Labs

 

Tackle real-world Arm-based cloud challenges with hands-on projects — perfect for building, learning, and prototyping.

Explore Labs

Arm Developer Council

Join the Arm Developer Council to share feedback, help shape the tools and platforms you use — and receive a voucher for your time.

Learn More