Running Linux Kernel on a Minimal Arm Cortex-A15 System
Initial Challenges
The first step is to identify the minimum hardware needed to run Linux on an Arm Cortex-A15 processor: the core, memory, and a UART. Use the latest kernel from kernel.org and change as little of the Linux source code as possible, since this makes it easier to update to new versions of Linux as they are released.
Methodology
Simulation speed is more important than hardware accuracy when experimenting with Linux configurations to confirm a working kernel. This is an ideal situation in which to generate a Fast Model design using the SoC Designer canvas. After the kernel is working with the Fast Model design, it is easy to run the cycle accurate simulator, sdsim, for benchmarking and utilize Swap & Play to confirm a fully working virtual prototype that is both fast and accurate.
Hardware Design
The hardware design, referred to as 'A15mini' (indicating a Cortex-A15 system with minimal hardware), requires specifying the memory map and interrupt connections. The memory for the Cortex-A series Versatile Express is from 0x80000000 to 0xffffffff. The first PL011 UART is located at 0x1c090000 and uses interrupt 5 on the Cortex-A15 IRQS[n:0] input request lines connected to the internal Generic Interrupt Controller (GIC). The Cortex-A15 needs to have the base address for the internal memory mapped peripherals (PERIPHBASE) set to 0x2c000000. It should be noted that the UART runs from a 24 MHz reference clock.
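The address map above can be captured in a few shell helpers, which is handy when checking an address from a disassembly window or a device tree against the design. This is only a minimal sketch: the function names are hypothetical, and the sizes given for the UART and PERIPHBASE regions are assumptions made for illustration, not values taken from the design. The interrupt helper relies on shared peripheral interrupts starting at GIC interrupt ID 32, so IRQS[5] corresponds to ID 37.

```shell
#!/bin/sh
# A15mini address map, taken from the text. The two *_SIZE values are
# assumptions made for this sketch, not figures from the design.
DRAM_BASE=0x80000000
DRAM_END=0xffffffff
UART0_BASE=0x1c090000
UART0_SIZE=0x1000      # assumed: 4 KB register block
PERIPHBASE=0x2c000000
PERIPH_SIZE=0x8000     # assumed: 32 KB private peripheral region

# region_of ADDR: print the name of the region that contains ADDR.
# (Hypothetical helper; ADDR may be decimal or 0x-prefixed hex.)
region_of() {
    addr=$(($1))
    if [ "$addr" -ge "$(($DRAM_BASE))" ] && [ "$addr" -le "$(($DRAM_END))" ]; then
        echo dram
    elif [ "$addr" -ge "$(($UART0_BASE))" ] &&
         [ "$addr" -lt "$(($UART0_BASE + $UART0_SIZE))" ]; then
        echo uart0
    elif [ "$addr" -ge "$(($PERIPHBASE))" ] &&
         [ "$addr" -lt "$(($PERIPHBASE + $PERIPH_SIZE))" ]; then
        echo periph
    else
        echo unmapped
    fi
}

# spi_to_gic_id N: GIC interrupt ID for the IRQS[N] input line.
# Shared peripheral interrupts start at ID 32.
spi_to_gic_id() {
    echo $((32 + $1))
}
```

For example, region_of 0x1c090000 prints uart0, and spi_to_gic_id 5 prints 37 for the UART interrupt.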
Creating the A15mini with SoC Designer requires instantiating the models and connecting them on the canvas using sdcanvas. Using cycle accurate models means more detail is needed to create the design. In addition to the CPU, simple address decoder, and memory, use the Arm CCI-400 Cache Coherent Interconnect and the NIC-301 interconnect. These models, along with the Cortex-A15, are built using Arm IP Exchange, the web-based portal that builds models directly from Arm RTL code. To use the portal, answer a few questions or submit an XML file from AMBA Designer, and it returns an easy-to-use model in the form of a .so file.
The design is shown below:
Linux Preparation
To create a Linux image for the A15mini, the following items are needed:
- Boot Loader
- Kernel Image
- Device Tree Blob
- File System
The most efficient approach is to package all of the items above in a single executable (ELF) file. Note that the file must be regenerated whenever any of the items changes.
Kernel Configuration
Download the latest Linux from kernel.org and begin configuration. The default configuration for the Versatile Express is found in the Linux source tree at: arch/arm/configs/vexpress_defconfig
Use the Versatile Express configuration as the baseline by running:
$ make ARCH=arm vexpress_defconfig
File System
In this example, the file new-buildroot-rootfs.cpio.gz has been copied from the Cortex-A9 Linux CPAK and renamed to fs.cpio.gz, but any other file system image can be used instead.
Customizing Kernel Configuration
To create the single executable file with all of the needed artifacts, embed the file system image in the kernel, and append the Device Tree Blob at the end of the kernel image.
To embed the file system image in the kernel, you can use any of the Linux configuration interfaces, e.g. menuconfig:
$ make ARCH=arm menuconfig
Navigate to the General Setup menu (see the image below), scroll down to “Initramfs source file(s)” and add the name of the file system image, fs.cpio.gz.
To append the Device Tree Blob at the end of the kernel image, open the Boot options menu and find the item “Use appended device tree blob to zImage (EXPERIMENTAL).” Enable it so that a .dtb file appended to the end of the zImage file is used (shown in the image below).
In the Boot options menu, add root=/dev/ram and earlyprintk to the Default kernel command string. The root=/dev/ram argument tells the kernel to use a RAM-based root file system, and earlyprintk enables console output early in the boot process.
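The three menu settings above are stored in the .config file as CONFIG_INITRAMFS_SOURCE, CONFIG_ARM_APPENDED_DTB, and CONFIG_CMDLINE, so they can be checked from the shell before starting a build. The following is a minimal sketch; the function name check_kconfig is hypothetical, and it assumes the file system image is named fs.cpio.gz as in this example.

```shell
#!/bin/sh
# check_kconfig FILE: report which of the settings described in the text
# are missing from a kernel .config file. Returns 0 when all are present.
# (Hypothetical helper; fs.cpio.gz is the image name used in this article.)
check_kconfig() {
    cfg=$1
    status=0
    # These two options must match a whole line exactly.
    for want in \
        'CONFIG_ARM_APPENDED_DTB=y' \
        'CONFIG_INITRAMFS_SOURCE="fs.cpio.gz"'
    do
        if ! grep -qxF "$want" "$cfg"; then
            echo "missing: $want"
            status=1
        fi
    done
    # The command line may carry other arguments, so match substrings.
    cmdline=$(grep '^CONFIG_CMDLINE=' "$cfg" || true)
    for arg in root=/dev/ram earlyprintk; do
        case $cmdline in
            *"$arg"*) ;;
            *) echo "missing in CONFIG_CMDLINE: $arg"; status=1 ;;
        esac
    done
    return $status
}
```

Running check_kconfig .config in the top of the kernel source tree prints one line per missing setting and prints nothing when the configuration is complete.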
Source Code Changes
To run the 3.13.1 kernel on the A15mini, edit the file arch/arm/mach-vexpress/v2m.c to remove the line that configures the kernel scheduler clock to read a time value from the Versatile Express System Registers (this peripheral has a register which provides a time value).
The line removed is in the function v2m_dt_init_early() and is line number 423, the call to versatile_sched_clock_init().
Compilation
Now the source tree can be compiled. In this example using Ubuntu 13.10 and the cross-compiler with the GNU prefix arm-linux-gnueabi, the compile command is:
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j 4
The most important compilation result is the kernel image file arch/arm/boot/zImage.
Device Tree Blob
The Versatile Express device trees for Fast Models are available from linux-arm.org. The easiest way to get the files is to use git:
$ git clone git://linux-arm.org/arm-dts.git
The file that is the closest match for the A15mini is rtsm_ve-cortex_a15x1.dts in the fast_models/ directory.
This file also includes a .dtsi file named rtsm_ve-motherboard.dtsi. To support the A15mini hardware system the two device tree files must be modified so they match the hardware available by removing all of the hardware that doesn’t exist.
Edit rtsm_ve-motherboard.dtsi to remove all of the peripherals that are absent from the A15mini, such as flash, Ethernet, keyboard/mouse controller, three extra UARTs, watchdog timer, and more.
Architected Timer Support
Linux requires a timer to run. Using the internal timer in the Cortex-A15, sometimes called the Architected timer, keeps the hardware design as small as possible, because the SP804 timers in the Versatile Express and the System Register block that provides the time value for the scheduler clock can then be removed.
Support for the Arm Architected timer is present in rtsm_ve-cortex_a15x1.dts, but the clock-frequency property must be added to the timer node in the file.
Here is the timer entry with the clock-frequency added:
Using the Device Tree Compiler
After the two device tree files have been shrunk down to support only the hardware available on the A15mini, they can be compiled into the device tree blob using the device tree compiler, dtc, which is available in the scripts/dtc directory of the kernel source tree. To do this, reach over into the kernel source tree and run:
$ ../linux-3.13/scripts/dtc/dtc -O dtb -o rtsm_ve-cortex_a15x1.dtb fast_models/rtsm_ve-cortex_a15x1.dts
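Once the blob is built, it can be worth confirming that the clock-frequency property added to the timer node is actually present. The sketch below is a simple line-based heuristic rather than a real device tree parser, and the function name is hypothetical. It looks for a node written as "timer {" and reports whether a clock-frequency line appears before the next closing brace.

```shell
#!/bin/sh
# timer_has_clock_frequency [FILE]: succeed if a "timer {" node in the
# device tree source contains a clock-frequency property. Reads standard
# input when no file is given. (Hypothetical helper, line-based heuristic:
# it assumes the timer node has no nested child nodes.)
timer_has_clock_frequency() {
    awk '
        /^[ \t]*timer[ \t]*\{/        { in_timer = 1; next }
        in_timer && /clock-frequency/ { found = 1 }
        in_timer && /\}/              { in_timer = 0 }
        END { exit (found ? 0 : 1) }
    ' "$@"
}
```

The function can be run on the edited .dts file directly, or on the compiled blob by decompiling it first, for example: dtc -I dtb -O dts rtsm_ve-cortex_a15x1.dtb | timer_has_clock_frequency && echo ok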
Adding Device Tree to Kernel
To use the feature that appends the device tree to the Linux kernel image, copy the zImage from the kernel tree:
$ cp linux-3.13.1/arch/arm/boot/zImage .
Then concatenate the device tree blob to the end of the kernel image:
$ cat arm-dts/rtsm_ve-cortex_a15x1.dtb >> zImage
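Because the append operator adds another copy of the blob each time the command is repeated, it can help to wrap the copy and concatenate steps in a function that always writes a fresh output file and then checks the result. The following is a minimal sketch with hypothetical function names. It relies on the fact that a device tree blob starts with the big-endian magic number 0xd00dfeed.

```shell
#!/bin/sh
# is_dtb FILE: succeed if FILE starts with the device tree blob magic
# number 0xd00dfeed. (Hypothetical helper.)
is_dtb() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    [ "$magic" = "d00dfeed" ]
}

# append_dtb ZIMAGE DTB OUT: write OUT as ZIMAGE followed by DTB, then
# verify that the tail of OUT is byte-identical to DTB.
# (Hypothetical helper; OUT is overwritten on every call.)
append_dtb() {
    zimage=$1
    dtb=$2
    out=$3
    is_dtb "$dtb" || { echo "not a device tree blob: $dtb" >&2; return 1; }
    cat "$zimage" "$dtb" > "$out" || return 1
    dtb_size=$(($(wc -c < "$dtb")))
    tail -c "$dtb_size" "$out" | cmp -s - "$dtb"
}
```

With the paths used in this example, the call would be: append_dtb linux-3.13.1/arch/arm/boot/zImage arm-dts/rtsm_ve-cortex_a15x1.dtb zImage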
Boot Loader
To create a single, all-inclusive executable file, a small assembly boot loader is preferable. An example of this is available in the Arm Fast Models ThirdParty IP package. This is an add-on to Arm Fast Models which contains all of the open source software that can be run on Fast Models examples. The majority of the package is the Linux source trees and file system images for different examples.
Two useful files from the RTSM_Linux source code that come with the ThirdParty IP package include:
- boot.S - The assembly file that serves as the boot loader. It’s small and easy to understand, and much easier to work with in cases where a full boot loader like u-boot is not needed.
- model.lds - A linker script that specifies how to link boot.o (compiled boot.S) and zImage into a single executable file.
The addition of these two files, combined with the modified zImage (including embedded file system and concatenated device tree) means that everything is now present in the single executable file.
Minor adjustments to the Makefile are needed: insert your paths, compiler name, and file names so that it generates the file A15-linux.axf, the final output that will be used in simulation.
The updated Makefile is shown below:
Running Simulation
There are two ways to confirm the design is correct:
- Run Linux on a fast version of the design
- Run a small software program in cycle accurate mode
Both methods are recommended to make sure the design is functioning properly and there are no errors in connecting the interrupt, setting CPU model parameters, or other common mistakes made during design construction. One of the most common mistakes is forgetting to set the PERIPHBASE address of the A15 to 0x2c000000.
- To run a fast version of the design, start by selecting FastModel System in the Tools menu in sdcanvas:
- The Fast Model System Creator dialog will appear as shown below. Clicking the Create button will generate a Fast Model equivalent system automatically from the cycle accurate system.
- This Fast Model system can be run in sdsim using the Simulation -> Simulate System … menu item (or F5). When the simulator starts, specify the A15-linux.axf file that was created in the last section and watch Linux boot in a matter of seconds. The terminal below shows 3.13.1 Linux with the machine type reported as Versatile Express.
Swap & Play
Working with Linux on a cycle accurate simulation is a great way to study all of the details of the hardware and software combination, such as bus utilization, cache metrics, cache snooping, barrier transactions, and much more. However, booting Linux takes about 300M instructions and well over a billion clock cycles.
One alternative to waiting for a full cycle accurate simulation is to create Swap & Play checkpoints at various points of interest and then load the checkpoints into the cycle accurate simulation.
- To create a checkpoint run the Fast Model simulation and then stop the simulator using either a breakpoint or just hitting the Stop button at the place you want to stop, such as the Linux prompt. Use the File -> Save As menu in sdsim and select Swap & Play Checkpoint (*.mxc) as shown below:
- The next dialog will ask for a name of the checkpoint and a location for the file. Enter any name and hit OK to save the checkpoint.
- To load the checkpoint into the cycle accurate simulation load the design into sdsim and then use the File -> Restore checkpoint view.
- Select the checkpoint saved in the Fast Model simulation and it will load into the cycle accurate simulation. There is no need to load an image file since it will be restored by the checkpoint. The disassembly window and the register window will show the same location that was saved from the fast model simulation.
A debugger such as RealView or DS-5 can also be connected to start source level debugging.
Alternatively use the addr2line utility to find out where in the code the Disassembly or Register Window is showing. This is useful to take a peek at the current location without starting the full debugger. For the disassembly window above use:
$ arm-linux-gnueabi-addr2line -e vmlinux 0x80014524
/home/cds/jasona/kernel.org/linux-x1/linux-3.13.1/arch/arm/mm/proc-v7.S:73
This allows you to see the source code for the current location as a quick check. Here is the code:
Swap & Play can be used to load checkpoints and do cycle accurate debugging as well as performance analysis for benchmarks. Cycle Model users typically run benchmarks such as Dhrystone, CoreMark, and LMbench as Linux applications.
The example shown here for porting Linux and setting up a system which can be used with Swap & Play uses a CPAK on Arm IP Exchange. You can use this system to port your own version of Linux or customize the hardware configuration to match that of your own design.
This article was originally written as a blog by Jason Andrews. Read the original post on Connected Community.