Overview

ARM big.LITTLE technology is a heterogeneous processing architecture which uses two types of processor. "LITTLE" processors are designed for maximum power efficiency while "big" processors are designed to provide maximum compute performance. Both types of processor are coherent and share the same instruction set architecture (ISA).

Using big.LITTLE technology, each task can be dynamically allocated to a big or LITTLE core depending on the instantaneous performance requirement of that task. Through this combination, big.LITTLE technology delivers the high peak performance demanded by the latest mobile devices, within the thermal bounds of the system, with maximum energy efficiency.

Background

The performance demanded by users of current smartphones and tablets is increasing at a much faster rate than battery capacity or the power savings from advances in semiconductor process technology. At the same time, users are demanding longer battery life within roughly the same form factor. This conflicting set of demands requires innovations in mobile system-on-chip (SoC) design beyond what process technology and traditional power management techniques can deliver.

The usage pattern for smartphones and tablets is quite dynamic. Periods of high processing intensity, such as those seen in mobile gaming and web browsing, alternate with typically longer periods of low-intensity tasks such as texting, e-mail and audio, and with quiescent periods during complex apps. ARM big.LITTLE processing takes advantage of this variation in required performance by combining two very different processors in a single SoC. The big processor is designed for maximum performance within the mobile power budget. The smaller processor is designed for maximum efficiency and is capable of addressing all but the most intense periods of work.

Hardware

In a big.LITTLE system, the CPU subsystem is fully cache coherent, and the big and LITTLE CPU cores are fully architecturally identical; they run all the same instructions and support the same extensions such as virtualization, large physical addressing and so on.
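Because both core types run the same instructions, software cannot tell them apart by ISA and has to rely on some other hint. The following is a minimal sketch of one way to do that, assuming a Linux system that exposes a per-CPU cpu_capacity file under /sys/devices/system/cpu (not every kernel or platform does). The function names are hypothetical, and the capacity values in the usage note are illustrative only.

```python
from pathlib import Path


def read_capacities(root="/sys/devices/system/cpu"):
    """Return {cpu_index: capacity} for CPUs that expose a cpu_capacity file.

    Assumption: a Linux kernel that publishes cpu_capacity per CPU.
    Returns an empty dict when no such files are found.
    """
    capacities = {}
    for path in Path(root).glob("cpu[0-9]*/cpu_capacity"):
        index = int(path.parent.name[3:])  # "cpu12" -> 12
        capacities[index] = int(path.read_text().strip())
    return capacities


def split_clusters(capacities):
    """Split CPUs into (big, little) lists of CPU indices.

    CPUs at the highest reported capacity are treated as big; all others
    are treated as LITTLE. On a homogeneous system every CPU lands in the
    first list and the second list is empty.
    """
    if not capacities:
        return [], []
    top = max(capacities.values())
    big = sorted(cpu for cpu, cap in capacities.items() if cap == top)
    little = sorted(cpu for cpu, cap in capacities.items() if cap < top)
    return big, little


if __name__ == "__main__":
    big, little = split_clusters(read_capacities())
    print("big cores:", big)
    print("LITTLE cores:", little)
```

For example, calling split_clusters with the made-up capacities {0: 446, 1: 446, 2: 1024, 3: 1024} places CPUs 2 and 3 in the big list and CPUs 0 and 1 in the LITTLE list.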

Typical Processor Combinations

ARM Cortex-A series processor combinations that meet big.LITTLE requirements are shown in the table below:

                              1st Generation: ARMv7         2nd Generation: ARMv8
                              (32-bit, 40-bit physical)     (32-bit/64-bit)
High-performance CPU (big)    Cortex-A15, Cortex-A17        Cortex-A57, Cortex-A72
High-efficiency CPU (LITTLE)  Cortex-A7                     Cortex-A53

A Typical big.LITTLE SoC

An SoC that implements big.LITTLE processing is built with a cache coherent interconnect, a global interrupt distributor, and typically other system IP components, as shown below:

big.LITTLE system diagram

Software

big.LITTLE software automatically handles the allocation of tasks to the appropriate CPU cores. One such solution is the Global Task Scheduling (GTS) software model. In this model, the operating system is directly aware of the high-performance and high-efficiency cores in the system, and can dynamically allocate each task to an appropriate core based on the performance required. The mechanics are described in detail in white papers (see the Resources section below) from ARM and ARM partners. The big.LITTLE MP software is freely available as open source.
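The allocation idea described above can be illustrated with a minimal sketch. This is not the big.LITTLE MP implementation. It assumes a simple scheme in which each task carries a smoothed load estimate and moves between clusters when that estimate crosses an upper or lower threshold. The threshold values, the decay factor, and the names Task and update are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tuning values on a 0.0-1.0 load scale.
UP_THRESHOLD = 0.8    # move to a big core at or above this smoothed load
DOWN_THRESHOLD = 0.3  # move back to a LITTLE core at or below this load
DECAY = 0.5           # weight given to history in the running average


@dataclass
class Task:
    name: str
    cluster: str = "LITTLE"  # in this sketch, new tasks start on the efficient cluster
    load: float = 0.0        # smoothed load estimate


def update(task: Task, sample: float) -> str:
    """Fold one load sample (0.0-1.0) into the task's average and pick a cluster.

    Using two separate thresholds gives hysteresis, so a task whose load
    hovers around a single value does not bounce between clusters.
    """
    task.load = DECAY * task.load + (1.0 - DECAY) * sample
    if task.cluster == "LITTLE" and task.load >= UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load <= DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster


if __name__ == "__main__":
    game = Task("game")
    # A burst of heavy work followed by a quiet period.
    for sample in [1.0, 1.0, 1.0, 0.1, 0.1, 0.1]:
        cluster = update(game, sample)
        print(f"sample={sample:.1f} load={game.load:.3f} cluster={cluster}")
```

In this sketch a task needs several consecutive busy samples before it is moved to a big core, and several quiet samples before it is moved back, which mirrors the alternating intense and quiet periods described in the Background section.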

Tools

ARM DS-5 Development Studio provides an end-to-end suite of tools for embedded C/C++ software development on a big.LITTLE SoC. 

DS-5 gives you the debug and trace tools necessary to make sure that your hardware is behaving as expected, along with a simple means of configuring the CoreSight™ elements of your SoC design. For a big.LITTLE SoC, the DS-5 Debugger perspective shows cores arranged into multicore and multicluster groups, allowing you to see at a glance how big and LITTLE cores are performing.

In the Streamline performance analyzer, you can analyze performance of clusters, core groups, individual cores, applications, threads and lines of source code to quickly spot bottlenecks that might be slowing your system down. The screen capture below illustrates the different core and cluster views available in Streamline. This capture was made using a Cortex®-A15 cluster and Cortex-A7 cluster in big.LITTLE configuration. As you can see, executing the Xaos example program requires only the LITTLE cores in this case.

Streamline showing clusters and cores of a big.LITTLE system