Analyze and Optimize Workloads on Arm Neoverse
Arm Performix is a performance analysis toolkit for server and cloud developers building on Arm-based infrastructure. It helps teams understand and optimize software performance on Arm with function-level insights, visualizations, and guided recommendations derived from hardware-level execution data.
As compute evolves toward platforms such as the Arm AGI CPU, these capabilities become more important for scaling efficiently across concurrent agents and distributed workloads.
Performix makes performance analysis accessible, repeatable, and automatable for cloud-native software and emerging agentic workflows.
Download Arm Performix Toolkit
Download the full toolkit (UI + CLI), or choose the CLI only.
Download Arm Performix CLI
Arm Community Forum
Need help with Arm Performix or want to share feedback with Arm?
Arm Performix Code-Along
Learn how to surface bottlenecks fast and optimize your code with actionable performance data, live on 26 May.
Get Started with Guides and Tutorials
Get started with Arm Performix using Learning Paths and the user guide.
Arm Performix Install Guide.
Find code hotspots with Arm Performix.
Tune application performance with Arm Performix CPU microarchitecture analysis.
Learn how to capture and analyze performance data with the Arm Performix User Guide.
Purpose-Built Performance Optimization
Built for the Arm Infrastructure Ecosystem
Arm Performix works on Arm Neoverse platforms such as AWS Graviton, Microsoft Cobalt, and Google Axion, and also supports future platforms using the Arm AGI CPU. It is designed for:
- Cloud backend and application developers
- Platform and system software teams
- Core library, runtime, and framework developers
- Performance and infrastructure engineers
Why Use Arm Performix
Quickly move from identifying bottlenecks to making effective optimizations without needing deep expertise in CPU microarchitecture:
- Find hotspots in complex applications
- Understand why performance is limited
- Focus optimization effort where it matters
- Work efficiently with CI/CD integrations
Tool Benefits and Features
Accelerated Insights
Identify performance bottlenecks in large server and cloud applications within minutes, so you can focus on fixing the issues that matter. Arm Performix combines clear visualizations, such as code hotspots and CPU microarchitecture behavior, with guided analysis to reduce trial and error and help you move quickly from observation to action.
Optimization, Simplified
Performance analysis on Arm should be accessible, repeatable, and effective for every developer. Arm Performix combines detailed profiling with Arm’s architectural expertise to deliver clear, actionable insights, whether you’re investigating your first bottleneck or performing deep, low-level optimization on critical code paths.
Seamless Integration
Arm Performix is designed to fit into modern cloud development workflows, from local investigation to automated regression testing. You can run analysis from the command line, integrate it into CI/CD pipelines, and access insights directly within development environments, supporting increasingly automated and agent-driven workflows on Arm-based infrastructure.
Favourite editor via Arm MCP Server
FAQs
Arm Performix is Arm’s performance profiling and analysis toolkit for developers building and running performance-critical workloads on Arm-based systems. It profiles running applications, collects system- and hardware-level metrics, and translates them into guided insights that help identify CPU, memory, and system bottlenecks.
Arm Performix supports large-scale cloud, infrastructure, and AI workloads on Arm platforms including AWS Graviton, Microsoft Cobalt, Google Axion, and next-generation Arm solutions.
Arm Performix is built for engineers responsible for validating, profiling, and optimizing workloads on Arm across development, deployment, and production:
- System and Platform Engineers – platform bring-up, firmware, kernel, hardware validation
- Core Library Developers – C/C++, SIMD/vectorization, performance-critical primitives
- Runtime and Compiler Engineers – JITs, toolchains, AI frameworks, distributed runtimes
- Cloud and Backend Developers – hyperscale services, SaaS platforms, CI/CD regression monitoring
Arm Performix reduces root cause analysis time from hours to minutes and lowers the barrier to production-grade performance optimization on Arm.
- Multi-core, NUMA, and heterogeneous compute architectures
- Performance interactions across CPU, memory, networking, and storage
- OS-level, hardware-level, and application-level behavior
With a single command, Arm Performix performs application and system profiling on Arm-based servers, connecting to a local, on-prem, or cloud-based Arm system. It also:
- Profiles a running workload
- Collects system-level and application-level performance counter data
- Identifies bottlenecks across compute, memory bandwidth, and core utilization
- Provides guided next steps for optimization
Rather than exposing raw counters alone, Arm Performix explains what metrics mean and why they matter, enabling faster, more confident optimization decisions.
Arm Performix supports Arm Neoverse-based systems running Linux, including cloud environments such as AWS Graviton, Microsoft Cobalt, Google Axion, and next-generation Arm solutions. It is designed for both on-premises data center deployments and large-scale public cloud infrastructure.
Arm Performix is free to download and use, so you can start analyzing and optimizing performance right away.
Yes. Arm Performix produces machine-readable output suitable for automation and regression tracking. Engineering teams can integrate it into CI/CD pipelines to:
- Detect performance regressions early
- Track optimization impact over time
- Compare performance across architectures
- Validate scaling characteristics before production rollout
This makes continuous performance validation a standard part of cloud-native development workflows.
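As a rough illustration of the regression-detection idea above, the sketch below compares two metric summaries and flags any metric that worsened by more than a chosen tolerance. It does not use the Arm Performix output format or CLI, neither of which is described here: the metric names (`runtime_s`, `cycles`), the flat JSON layout, and the `find_regressions` helper are all hypothetical and assume that lower values are better for every metric.

```python
import json


def find_regressions(baseline, current, tolerance=0.05):
    """Return {metric: relative_change} for metrics that grew beyond `tolerance`.

    Assumes every metric is "lower is better" (for example runtime or cycles).
    Metrics missing from `current`, or with a non-positive baseline, are skipped.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in current or base_value <= 0:
            continue
        change = (current[name] - base_value) / base_value
        if change > tolerance:
            regressions[name] = change
    return regressions


# Hypothetical metric summaries; in a real pipeline these would be derived
# from whatever machine-readable output the profiling step produces.
baseline = json.loads('{"runtime_s": 10.0, "cycles": 2.0e9}')
current = json.loads('{"runtime_s": 11.0, "cycles": 2.02e9}')

regressions = find_regressions(baseline, current)
for name, change in sorted(regressions.items()):
    print(f"{name}: {change:+.1%} vs baseline")
```

A CI job could fail the build whenever the returned mapping is non-empty. The 5% tolerance here is an arbitrary starting point and would normally be tuned to the run-to-run noise of the workload.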
Arm Performix is designed to be accessible to all engineers without requiring deep Arm microarchitecture knowledge. It provides guided, architecture-aware insights that contextualize performance data and highlight root causes directly.
Traditional CPU profiling and performance analysis tools often expose raw hardware counters without guidance. Arm Performix differs by providing:
- A unified data collection and analysis workflow
- Cross-layer visibility (system and application)
- Bottleneck identification across compute and memory subsystems
- Actionable, architecture-aware recommendations
- Automation-ready outputs
It eliminates tool-hopping and manual counter interpretation, delivering production-ready performance insight tailored specifically for Arm environments.
For AI inference services, distributed runtimes, and large-scale backend systems, performance efficiency directly impacts infrastructure cost, fleet sizing, and power consumption.
Arm Performix helps teams profile AI inference workloads and cloud services running on Arm infrastructure by:
- Validating workload scaling characteristics
- Optimizing memory bandwidth utilization
- Identifying CPU bottlenecks in inference pipelines
- Increasing confidence in workloads migrated to Arm from x86
This ensures performance transparency across modern AI and cloud deployments.
