Designing Differentiation into ARM-based SoCs
At ARM, we are focused on growing the largest ecosystem on the planet for efficient computing. This guide explores the different degrees of freedom ARM SoC developers have in building SoCs based on the ARM Cortex CPUs. Even with a "standard" Cortex CPU, there are many ways that you can differentiate your SoC implementation.
Partners who license ARM CPUs can choose the cache size (L1 and L2), bus interface (e.g. AMBA4 or AMBA5), number of cores in a cluster (1 to 4), and how many CPU clusters to use in the design (2 clusters in a big.LITTLE. design for example). We have seen that partners have built 2+4 big.LITTLE configurations with 2 high performance cores and 4 max efficiency cores for midrange and premium smartphone markets, and 4+4 topologies for higher end smartphone and tablet markets. Similarly, we have seen partners build 2 clusters of 4 LITTLE cores to deliver Octacore capabilities at low to mid-range price points.
L2 size is an important factor in performance on many benchmarks, so high-end designs often push L2 sizes to 2MB for the high performance CPU cluster; low-end and mid-range designs can sometimes play this trade-off differently, with a 1MB L2 for the high performance cluster, or 512kB L2 cache size for a high efficiency CPU cluster in a big.LITTLE SoC, trading off performance for cost savings. This range of configurability allows ARM partners to tailor the CPU capacity in their SoC to their target markets, while retaining full compatibility with the ARM architecture and full access to the benefits of the ARM ecosystem.
Cortex-A CPU IP comes with optional power domains around each CPU core, the L2 subsystem, and other areas of the design. Partners can choose how to implement these voltage domains, and can choose to share or group some domains. Further to this, ARM introduced state retention modes for CPU cores and for the Advanced SIMD units in some of our more recent CPUs that partners can optionally use to offer finer grained power management in the SoC.
There are of course numerous peripherals and interfaces beyond the CPU, GPU, and other processing subsystems that can differentiate an SoC. By taking standard Cortex-A CPUs, some partners choose to devote more of their engineering resources to optimizing and tuning specific peripherals and interfaces to differentiate their SoCs.
Memory System Performance
Although every Cortex-A CPU is equivalent to every other Cortex-A CPU of the same revision in terms of performance within the CPU, often CPU performance depends quite heavily on memory system performance, and we can observe two Cortex-A CPUs of the same type delivering significantly different performance as a result of this. As one example, the latency to L2 memory depends on the number of slices a partner uses to meet timing for their target frequency; a partner with lower latency to the L2 will have an advantage in performance benchmarks that spill outside the L1 instruction or data caches. As another example, the latency to main memory can differ a lot from one SoC to another - if one SoC has a memory latency of 100 cycles and the other 140 cycles, the 100 cycle latency memory system will be a big advantage in many (but not all) of the key benchmarks, and is often an observable advantage in terms of delivered performance on real-world workloads.
Often partners seek to differentiate on memory system performance, recognizing the large impact this has on overall performance even against other SoCs with the same Cortex-A CPU. One last point on the topic of memory system performance; CPU performance is very sensitive to latency to main memory, and GPU performance is more sensitive to bandwidth to main memory, so ARM partners will optimize and balance between latency and bandwidth in the design of the memory system for their target applications.
SoC Power Level Management
The way in which a given SoC manages power incorporates several different mechanisms to slow down or shut down components when under light or zero demand during different phases of use. With so many different components in an SoC design, ARM partners have a lot of ways in which they can manage power, and some partners differentiate on the power management mechanisms in the SoC, the big.LITTLE tuning and power management framework, or the software that organizes the management of component shutdown and presents it to the OS or middleware.
There are several system and implementation choices which further offer ways for ARM partners to differentiate when using Cortex-A standard CPUs:
ARM IP is shipped as synthesizable RTL that can be implemented on several different process nodes. Today (early 2015) partners at the highest premium end of the market are building with ARM IP on 16nm and 14nm, while many premium designs are being built and currently shipping on 20nm, with a range of designs targeting 28nm for lower-cost premium SoC platforms for the mid-range and entry level. The frequency and power characteristics can vary significantly for the same ARM CPU implemented on different process nodes, so the choice of process remains one of main ways (and most obvious) that partners differentiate on ARM IP.
The time and effort spend on physical placement, routing, and optimization of the logic and RAM arrays in a design can significantly differentiate one Cortex-A CPU from another. For example, investment in physical design can produce higher maximum frequency for the same design, lower power at the same maximum frequency, or some combination of the two. Also, partners sometimes iterate on the physical design of a CPU, such that the 2nd or 3rd generation of a product can be significantly improved in power, performance, and area (cost) characteristics due to improvements in the physical implementation of the same Cortex-A CPU, providing further differentiation for the partner. ARM POP IP has been a factor in improving the quality of results that can be achieved in physical design by partners, and also improves the next differentiation factor in this list… time to market.
Time to Market
Release windows are critical in markets like high-end smartphones and premium tablets, where a delay of one month can mean missing a whole year design cycle for devices with an annual refresh. Some partners differentiate on being very fast to market based on designs with Cortex-A CPUs. Often in those fast markets, the initial SoC product will be followed with a revised version that improves on the original.
GPU, ISP, video and audio subsystems
In a modern mobile SoC, the performance of the chip is often influenced even more strongly by the performance of the graphics processor, the image processing, the video and audio subsystem, and of course the way these components all work together. ARM provides industry leading IP in the Mali GPU and video subsystem, but we allow our partners to mix and match between our IP, their own IP, and that of 3rd parties. This allows the ARM partnership to experiment with different combinations of IP, iterate rapidly, and compete for the best combination in each device generation. This competitive iteration has led to rapid innovation in smartphones and tablets and is a key benefit of the ARM ecosystem, a benefit that is now well established in networking markets, and making inroads into server markets, for example.
The way in which the CPU, GPU, ISP, video subsystem, coherent interconnect, and memory system work together as a combined system is an increasingly important factor in modern SoCs, and a key way for partners to differentiate their chips. Examples of differentiation in the system design include the use and configuration of cache coherent interconnect, next level cache memories, dynamic memory controllers, and the software that configures the system and optimizes things like power down modes and operating points at run-time.
Beyond the hardware IP and custom components in an SoC, there is of course the software that configures and operates the SoC. The key attribute we have been discussing is the compatibility of all ARM-based designs, so that the Linux kernel, application software, and middleware all run the same on ARM-based CPUs. ARM partners can differentiate along all of the dimensions listed above, and still maintain full software compatibility that allows them to tap in to the vast wealth of software written for the ARM architecture. The chip support and board support packages with a given SoC can be a point of differentiation for ARM partners that invest there.
As a result of all of these opportunities for differentiation, any 2 Cortex-A57, Cortex-A72, or Cortex-A53-based processors can be quite different in their system, power, and performance characteristics, while still being identical from a software perspective.
A quick listing of ways the performance can differ
- Max frequency (and max sustainable frequency - influences by power)
- Power (affects sustained frequency in a thermally constrained environment)
- Latency to the L2
- Latency to main memory
- Bandwidth to main memory
- L2 size (and L1 size for some ARM CPUs)
- big.LITTLE topology - number of cores
- big.LITTLE tuning and scheduling policy
- Coherent interconnect
Beyond all these of course, SoC developers can innovate around the core with their own IP blocks and design techniques.