
Run Local LLMs Efficiently with Llama.CPP on Arm

This page is your technical guide to running open source large language models (LLMs) like LLaMA, Mistral, and Gemma using Llama.CPP on Arm-based CPUs. Learn how to install, optimize, and deploy local LLMs across platforms like AWS Graviton, Microsoft Cobalt, Google Axion, NVIDIA Grace, and Raspberry Pi.

 

Benefits of Running LLMs on Arm with Llama.CPP

 

Llama.CPP is a lightweight, high-performance inference engine optimized for Arm CPUs, enabling fast, private LLM execution without accelerators. It supports quantization, multi-threading, and NEON/SVE acceleration for efficient, low-latency deployment across cloud environments. Designed for real-world developer workflows, it integrates seamlessly with CI/CD pipelines and container-based systems. Using PyTorch? Go to the PyTorch Developer Launchpad.
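To get a rough feel for why quantization helps, the sketch below mimics the core idea – storing weights as small integers plus a scale factor – in plain Python. The function names here are invented for illustration; llama.cpp's actual GGUF formats (Q4_K_M, Q8_0, and so on) are considerably more elaborate.

```python
# Toy sketch of symmetric 8-bit weight quantization: map float values to
# integers in [-127, 127] plus one scale, then reconstruct approximately.
# Illustrative only -- real llama.cpp quantization uses per-block scales
# and tighter packing.

def quantize_q8(values):
    """Quantize floats to int8-range integers plus a shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_q8(q, scale):
    """Reconstruct approximate floats from integers and scale."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# Reconstruction error stays within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The memory saving is the point: each weight shrinks from 4 bytes (float32) to 1 byte plus a small shared scale, which is what lets multi-billion-parameter models fit in CPU memory.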

Get Started

  • Setup
  • Learn and Code
  • Tools
  • Ecosystem
  • Next Steps

Setup


To start developing with Llama.CPP on Arm-based systems, you’ll first need to set up your environment. This includes installing the core C++ library, exploring the high-level Python API, and validating your setup by running a local LLM inference.
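The core build-and-validate flow can be sketched as follows. Treat this as a rough outline rather than exact commands for your platform: the repository URL follows the upstream project, the GGUF model path is a placeholder you must supply, and the `llama-cli` binary name may differ in older releases.

```shell
# Clone and build llama.cpp from source (Release build, all CPU cores).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Validate the build with a quick local inference.
# The model path below is a placeholder -- download any quantized
# GGUF model (for example from Hugging Face) and point -m at it.
./build/bin/llama-cli -m ./models/your-model.gguf -p "Hello from Arm" -n 32
```

On Arm, the Release build picks up NEON (and SVE, where available) automatically through compiler feature detection, so no extra flags are usually needed.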

Before starting, make sure you have access to an Arm-based system – for example an AWS Graviton instance or a Raspberry Pi – along with a C/C++ toolchain, CMake, and Python if you plan to use the Python API.

Learn and Code

This section guides you through deploying quantized LLMs, building multi-agent systems, and creating RAG chatbots – all optimized for CPU execution on Arm-based cloud platforms.

Build a Local LLM Chatbot

Deploy a quantized LLaMA model using Llama.CPP and run it natively on Arm CPUs. Learn how to load the model, handle prompts, and optimize runtime – no GPU needed.

Start the Learning Path
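A minimal sketch of what such a chatbot loop can look like, assuming the llama-cpp-python binding is installed. The model path is a placeholder, and `build_prompt` is a hypothetical helper showing one simple way to fold chat history into a prompt.

```python
# Hypothetical sketch of a minimal local chatbot using llama-cpp-python.
# Download any GGUF-quantized model and point MODEL_PATH at it.
import os

def build_prompt(history, user_msg):
    """Fold (user, assistant) chat history into an instruction-style prompt."""
    lines = [f"User: {u}\nAssistant: {a}" for u, a in history]
    lines.append(f"User: {user_msg}\nAssistant:")
    return "\n".join(lines)

MODEL_PATH = "./models/your-model.gguf"  # placeholder path

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama
    # n_threads controls CPU parallelism -- a key knob on Arm servers.
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048, n_threads=os.cpu_count())
    prompt = build_prompt([], "What is NEON on Arm?")
    out = llm(prompt, max_tokens=128, stop=["User:"])
    print(out["choices"][0]["text"].strip())
```

Real chat models usually ship with their own prompt template, so in practice you would use the model's template rather than this generic one.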

Run Multi-Agent AI Locally

Spin up multiple LLM agents with Llama.CPP to build parallel, task-driven AI workflows. Perfect for experimentation with multi-agent orchestration on Arm systems.

Start the Learning Path
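The fan-out orchestration pattern can be sketched with stubbed agents – each stub below stands in for a call to a llama.cpp server or a locally loaded model instance, and the agent names and tasks are invented for illustration.

```python
# Illustrative sketch of parallel multi-agent orchestration.
from concurrent.futures import ThreadPoolExecutor

def run_agent(name, task):
    """Stand-in for an agent's LLM call; returns a tagged result."""
    return f"[{name}] handled: {task}"

tasks = {
    "researcher": "gather background on Arm SVE",
    "summarizer": "condense findings into three bullets",
    "reviewer": "check the summary for accuracy",
}

# Fan the tasks out across agents in parallel, then collect results.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {pool.submit(run_agent, n, t): n for n, t in tasks.items()}
    results = {futures[f]: f.result() for f in futures}

for name in tasks:
    print(results[name])
```

With real models, each agent would typically hold its own context and system prompt; the thread pool simply lets independent agents run concurrently on separate CPU cores.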

Build a RAG Chatbot

Use retrieval-augmented generation (RAG) to create a smarter chatbot with external context — powered by Llama.CPP on Arm. Prefer to follow along step-by-step with a video? Go to code-along.

Start the Learning Path
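Conceptually, the retrieval step looks like the toy sketch below, which substitutes bag-of-words vectors and cosine similarity for a real embedding model and FAISS index; the documents and helper names are invented for illustration.

```python
# Toy sketch of the retrieval step in a RAG pipeline: score documents
# against the query, then prepend the best match to the prompt so the
# LLM answers with external context.
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Arm NEON is a SIMD extension for Arm processors.",
    "Graviton is an Arm-based processor family from AWS.",
    "FAISS is a library for efficient similarity search.",
]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

question = "What is NEON?"
context = retrieve(question)[0]
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(prompt)
```

In the real pipeline, a sentence-embedding model replaces `embed` and a FAISS index replaces the sorted scan, but the augment-the-prompt step works the same way.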

Run Hugging Face NLP Models on Arm

Deploy Hugging Face Transformers efficiently on Arm-based cloud platforms. Great for comparing performance between frameworks and workloads.

Start the Learning Path

Arm Ecosystem Dashboard

The Arm Ecosystem Dashboard is your go-to resource for discovering cloud services, tools, and software stacks optimized for Arm. Whether you’re deploying on AWS, Azure, or GCP, this page helps you find the right partners, platforms, and verified solutions to accelerate development on Arm-based infrastructure.

Explore Dashboard

Performance Tools

This section gives you access to tools that help you profile performance, migrate existing apps, automate cloud deployment, and benchmark workloads on Arm-based platforms.



  • Streamline CLI: Collect and analyze performance data from Arm-based systems. Automate profiling workflows and integrate them into CI pipelines.
  • Migrate Ease: Identifies and adapts workloads for Arm-based cloud environments, automating analysis and optimization for a smoother migration.
  • Runbooks: Step-by-step automation guides for configuring, running, and benchmarking workloads on Arm platforms.
  • AWS Q CLI: Quickly launch and benchmark Arm-based instances on AWS using a streamlined command-line interface.
  • AWS Perf (APerf): Access low-level performance counters on Arm CPUs to analyze core behavior, frequency, and workload efficiency.

What's Next?

  • CODE-ALONGS
  • ARM DEVELOPER PROGRAM
  • COURSES AND LABS
  • DEVELOPER RESEARCH
  • MORE RESOURCES

Build and Run a RAG Pipeline on Arm – On Demand

Code-Along and Expert Q&A

Watch the step-by-step code-along and expert Q&A to build a complete RAG app with Hugging Face, LangChain, and FAISS – optimized for Arm-based infrastructure.

Sign Up to Watch

Arm Developer Program

Have a technical question about Llama.CPP on Arm cloud or Arm cloud migration?

Join the Arm Developer Program and connect with a global community of developers and Arm engineers to build better apps on Arm. Get early access to tools, technical content, workshops, and support to help you debug, optimize, and ship your projects.

Explore Program

Course: Optimizing Gen AI on Arm Processors

 

Learn to optimize generative AI workloads on Arm for mobile, edge, and cloud through hands-on labs and lectures.

Explore course on GitHub

Arm Developer Labs

 

Tackle real-world Arm-based cloud challenges with hands-on projects — perfect for building, learning, and prototyping.

Explore Labs

Arm Developer Council

Join the Arm Developer Council to share feedback, help shape the tools and platforms you use — and receive a voucher for your time.

Learn More