Collaboration across the full stack: How Arm, Canonical and Google Cloud help shift infrastructure toward Agentic AI

From silicon features to AI runtimes, the Arm ecosystem is getting ready for a new generation of AI compute


The software ecosystem around Arm has evolved significantly over the last several years. Key operating system vendors (OSVs), cloud providers, and independent software vendors (ISVs) now provide broad support for Arm-based platforms.

Canonical has been part of that journey from the beginning. Ubuntu was among the first enterprise Linux distributions to support Arm servers during the early hyperscale transition, helping establish the software foundation that enabled Arm adoption across cloud infrastructure. Beyond the Ubuntu operating system (OS), Canonical has spent years building an Arm-native software infrastructure ecosystem spanning MAAS, OpenStack, MicroCloud, Ceph, LXD and Canonical Kubernetes, all with highly optimized Arm64 support.

Today, Ubuntu continues to power a significant portion of Arm deployments across public cloud, enterprise infrastructure, and AI environments. This includes agentic workloads.

The rise of agentic AI is beginning to reshape infrastructure requirements across the industry. Unlike traditional prompt-response workloads, agentic systems operate continuously. They orchestrate reasoning loops, interact with external tools and services, manage memory and state, and coordinate multiple models and runtimes simultaneously. This creates a very different infrastructure profile, where sustained throughput, memory behavior, latency consistency, and system-level efficiency become increasingly important.
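The loop described above can be sketched in a few lines of Python. This is a toy illustration, not any particular agent framework: `fake_model`, the `TOOLS` table, and `AgentState` are hypothetical stand-ins for a real model call, real external services, and real memory management.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """State and memory carried across iterations of the loop."""
    goal: str
    history: List[Tuple[str, str]] = field(default_factory=list)


def fake_model(state: AgentState) -> Tuple[str, str]:
    """Hypothetical stand-in for a model call that picks the next action.

    Toy policy: search for the goal, summarize the result, then finish.
    """
    done = [action for action, _ in state.history]
    if "search" not in done:
        return "search", state.goal
    if "summarize" not in done:
        return "summarize", state.history[-1][1]
    return "finish", state.history[-1][1]


# Hypothetical tools; a real system would call external services here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query}",
    "summarize": lambda text: f"summary of {text}",
}


def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    """Run the reason-act-observe loop until the model says it is done."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action, argument = fake_model(state)
        if action == "finish":
            state.history.append((action, argument))
            break
        observation = TOOLS[action](argument)
        state.history.append((action, observation))
    return state


if __name__ == "__main__":
    for step in run_agent("arm server tuning").history:
        print(step)
```

Even in this toy form, every iteration mixes a model call, a tool call, and a state update, which is why sustained throughput and latency consistency matter more here than in a single prompt-response exchange.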

As the industry transitions toward agentic AI workflows, the underlying software stack plays a critical role, underscoring the relevance of the long-standing collaboration between Arm and Canonical.

Building the foundation for Arm AGI CPU

The introduction of Arm AGI CPU represents a broader shift in how infrastructure is being designed for modern AI workloads. As agentic systems become more persistent, memory intensive, and orchestration heavy, infrastructure must balance compute density, efficiency, scalability, and observability simultaneously.

But unlocking the full potential of platforms like Arm AGI CPU requires more than advanced silicon. It requires a mature software ecosystem capable of exposing hardware capabilities all the way up the stack.

Arm's use of Ubuntu during internal enablement and validation of Arm AGI CPU further reinforced how closely kernels, drivers, runtimes, profiling tools, and orchestration layers must evolve alongside modern compute architectures.

Ubuntu and Canonical’s broader infrastructure stack help bridge that gap between hardware and application deployment by enabling developers to efficiently provision infrastructure, orchestrate distributed environments, optimize workloads, and gain visibility into system behavior across Arm-based platforms.

For developers, the operating system is no longer just a deployment layer. It forms the foundation for:

Compilers and developer environments
Virtualization
Distributed storage
Container orchestration
Observability tooling
AI runtimes

All of these components must work together efficiently to support large-scale AI infrastructure.

In many cases, enabling new infrastructure capabilities requires deep coordination across the full software stack long before hardware reaches the deployment phase. That coordination starts with the OS, which is why the operating system layer matters.

Bringing Arm infrastructure closer to developers with Google Cloud

As new Arm platforms continue to evolve, developer access to the latest infrastructure becomes equally important. Google Cloud’s C4A metal instances let developers build and deploy directly on Arm-based bare metal environments, with much greater visibility into physical hardware behavior and performance characteristics. This provides a practical platform for profiling, optimizing, and validating software on modern Arm infrastructure that is available today.

More importantly, instances like C4A metal help bridge the gap between cloud-native software development and hardware-aware optimization. Developers can move beyond abstract virtualized environments and begin understanding how their workloads behave at the silicon, memory, and system levels.

This becomes increasingly valuable as modern AI infrastructure grows to be even more performance sensitive.

Visibility and optimization for modern AI infrastructure

As AI systems scale, visibility into workload behavior across the hardware and software stack becomes critical.

Even in accelerated AI environments, the CPU remains deeply involved in the execution path. Modern AI workloads still rely on the CPU for runtime scheduling, data movement, networking, orchestration, request handling, memory management, and coordination across distributed services.

This is especially true for agentic AI systems, where multiple frameworks, runtimes, orchestration layers, and external services interact continuously throughout workload execution. Understanding how these systems behave under real workloads is becoming just as important as model optimization itself.

This is where tools like Arm Performix become valuable.

Arm Performix helps developers and infrastructure teams better understand CPU, memory, and system-level behavior when optimizing AI and cloud-native workloads on Arm platforms. Rather than relying solely on synthetic benchmarks or raw hardware counters, developers increasingly need visibility into how real applications behave across the full software stack.

Combined with Arm-based server platforms and accessible bare metal environments like Google Cloud C4A metal, modern tools such as Arm Performix and the Arm MCP server provide a practical path for profiling, analysis, and optimization of modern Arm infrastructure, from high-level application behavior down to silicon-aware performance analysis.

Join the conversation at Ubuntu Summit

To learn first-hand about optimizing modern workloads on Arm-based Google Cloud infrastructure using Ubuntu, join the session at Ubuntu Summit 26.04.

In this session presented by David Haikney, Arm will demonstrate a practical workflow with the following profiling and optimization capabilities:

Running workloads on Arm-based systems
Capturing low-level performance telemetry
Analyzing workload bottlenecks
Visualizing silicon-level behavior
Identifying optimization opportunities

This will be performed against a realistic speech-to-text inference workload featuring vLLM and the Whisper model.
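The same run, capture, and analyze steps can be tried on a much smaller scale with Python's standard `cProfile` and `pstats` modules. This is only an illustrative sketch: it does not use Arm Performix, and the `workload`, `tokenize`, and `score` functions are hypothetical placeholders for a real inference workload.

```python
import cProfile
import io
import pstats


def tokenize(text: str) -> list:
    """Split the input into tokens (cheap step)."""
    return text.split()


def score(tokens: list) -> int:
    """Deliberately heavy inner loop so it shows up as a hot function."""
    total = 0
    for token in tokens:
        for ch in token:
            total += ord(ch) % 7
    return total


def workload(repeats: int = 200) -> int:
    """Placeholder workload: tokenize and score the same text repeatedly."""
    text = "running workloads on arm based systems " * 50
    result = 0
    for _ in range(repeats):
        result += score(tokenize(text))
    return result


def profile_workload(top: int = 5) -> str:
    """Run the workload under cProfile and return a report of the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_workload())
```

A function-level report like this only covers the Python layer. The silicon-level view described in the session requires hardware-aware tooling on top of it.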


Figure 1: Performix dashboard showing flame graph of hot functions

Figure 2: Analyzing CPU core behavior on Ubuntu with Performix

The demo offers practical techniques and necessary tooling that help developers build and optimize next-generation applications on Arm Neoverse platforms.

Whether you are developing cloud-native services, AI applications, infrastructure software, or running performance-sensitive workloads, now is an exciting time to explore what is possible with Arm-enabled infrastructure.

Watch Level Up Your Code on Arm and Ubuntu

By Yan Fisher


Re-use is permitted for informational and non-commercial or personal use only.