Getting Up and Running with the Arm Mali-450 MP GPU

This article describes how to create a Cycle Model from the Mali RTL for the Mali-450 MP GPU using Cycle Model Studio, and introduces a bare-metal system built on SoC Designer that contains the GPU alongside several other components. It also discusses a Linux-based CPAK containing a Cortex-A15 based system with a Mali-450 GPU, which lets users integrate the GPU drivers into the Linux kernel and register the Mali driver while booting Linux in the virtual prototype environment.

Generating the Cycle Model

The Cycle Model for the Arm Mali-450 in this example is configured with 4 of the 8 pixel processors enabled, a 128 KB cache, two 128-bit AXI master ports, and the Power Management Unit (PMU) disabled.

These options are configured in the RTL using `define directives, which can be set through the Verilog-specific options in the Compiler Properties provided by Cycle Model Studio. Use the +define+ option to specify the desired configuration (M450_4_128K_128B_NOPMU in this case), and the +incdir+ option to specify the directories the compiler searches for included Verilog files. Once the compiler properties are configured, only the top-level Verilog file needs to be specified in the Project Explorer. After this initial setup, a 100% cycle-accurate Cycle Model can be generated.
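As a rough illustration of the options described above, the short Python sketch below assembles the +define+ and +incdir+ strings in the form they would be entered in the Compiler Properties. The helper name verilog_options and the include directory paths are hypothetical placeholders chosen for this example; they are not part of Cycle Model Studio or the Mali RTL deliverable.

```python
def verilog_options(defines, include_dirs):
    """Build +define+ and +incdir+ option strings in the usual Verilog form."""
    opts = [f"+define+{name}" for name in defines]
    opts += [f"+incdir+{path}" for path in include_dirs]
    return opts


# The configuration macro is the one named in the article.
# The include paths are hypothetical and only illustrate the format.
options = verilog_options(
    defines=["M450_4_128K_128B_NOPMU"],
    include_dirs=["rtl/include", "rtl/mali450/include"],
)
print(" ".join(options))
```

Keeping the configuration macro and the include paths in one place makes it easier to regenerate the model for a different GPU configuration by changing a single name.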



Generating the SoC Designer Component

The next step is to take the compiled model and integrate it into SoC Designer. Using Cycle Model Studio’s SoC Designer Component Generation wizard, add the transactors that correspond to two 128-bit AMBA AXI Master interfaces, and one APB slave transactor to the Cycle Model before building the final SoC Designer component. The tool automatically suggests potential matches to the related RTL signals once the transactor to be added is selected by the user.


The figure below shows the generated SoC Designer component for the GPU with 4 pixel processors enabled, the 128 KB cache, and two 128-bit AXI Master ports, ready to be integrated into a system on SoC Designer.


Arm Mali-450 GPU Bare-Metal System


In the Mali-450 bare-metal system, the GPU uses its two AXI master ports to access shared memory through the interconnect. The Arm Cortex-A15 processor programs the GPU by configuring the internal GPU registers through the interconnect subsystem and the APB3 interface. The GIC-400 is configured to generate interrupts corresponding to each processor internal to the Mali-450 GPU. The UART is used to display characters; in this case, it displays the contents of internal GPU registers and the test results.

The Mali-450 GPU model has been configured to have four Pixel Processors (PPs), two L2 Caches, one Geometry Processor (GP), and no Power Management Unit (PMU). 


The Test Infrastructure

Along with the RTL, Arm provides an Integration Kit containing tests that confirm the GPU is correctly integrated into the system. These tests check that the internal APB registers of the various GPU components (such as the PPs, the GP, the L2 caches, the DMA unit, the Broadcast unit, and the Dynamic Load Balancing Unit) are readable and writable. The test results are displayed on the UART. The test infrastructure also includes tests that confirm the IRQ pins are connected correctly and the GIC-400 is properly configured. The boot code for the Arm processor and the driver code for the Mali-450 GPU, the GIC-400, and the UART are provided with the Integration Kit.
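The article does not show the Integration Kit test code, so the following Python sketch only illustrates the general idea of a register read/write check: write a set of bit patterns to a register and confirm that each one reads back unchanged. The MockRegisterBlock class, the check_read_write function, and the chosen patterns are all assumptions made for this example, and the mock stands in for real APB accesses.

```python
class MockRegisterBlock:
    """Hypothetical stand-in for a block of 32-bit APB registers."""

    def __init__(self, size_words):
        self._regs = [0] * size_words

    def write(self, index, value):
        # Registers are modelled as 32 bits wide.
        self._regs[index] = value & 0xFFFFFFFF

    def read(self, index):
        return self._regs[index]


def check_read_write(block, index,
                     patterns=(0x00000000, 0xFFFFFFFF, 0xA5A5A5A5, 0x5A5A5A5A)):
    """Return True if every pattern reads back exactly as written."""
    for pattern in patterns:
        block.write(index, pattern)
        if block.read(index) != pattern:
            return False
    return True


block = MockRegisterBlock(size_words=4)
results = {i: check_read_write(block, i) for i in range(4)}
print("PASS" if all(results.values()) else "FAIL")
```

Alternating patterns such as 0xA5A5A5A5 and 0x5A5A5A5A are a common choice because together they toggle every bit, which helps expose stuck or shorted data lines.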



Begin building the system on SoC Designer by configuring the PL301 interconnect using AMBA Designer, with address ranges defined for the UART, the GIC-400 Interrupt Controller, memory, and the GPU (these details are provided in the Integration Manual). The GPU requires an address range of 192 KB and each APB3 slot on the PL301 provides up to 4 KB, so 48 APB3 slots are defined and then merged using the APBMerger component.
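The slot count above follows from simple division: 192 KB / 4 KB = 48. The Python sketch below works through that arithmetic and lists the address range each slot would cover. The base address 0x40000000 is a hypothetical value used only to make the example concrete; the real address map comes from the Integration Manual.

```python
KB = 1024
GPU_RANGE_BYTES = 192 * KB   # GPU address range, from the article
APB_SLOT_BYTES = 4 * KB      # size of one APB3 slot, from the article
GPU_BASE = 0x4000_0000       # hypothetical base address, for illustration only


def apb_slots(base, total_bytes, slot_bytes):
    """Split an address range into equal slots as (start, end) pairs, end inclusive."""
    if total_bytes % slot_bytes:
        raise ValueError("range is not a whole number of slots")
    count = total_bytes // slot_bytes
    return [(base + i * slot_bytes, base + (i + 1) * slot_bytes - 1)
            for i in range(count)]


slots = apb_slots(GPU_BASE, GPU_RANGE_BYTES, APB_SLOT_BYTES)
print(f"{len(slots)} slots")
print(f"first: {slots[0][0]:#010x}-{slots[0][1]:#010x}")
print(f"last:  {slots[-1][0]:#010x}-{slots[-1][1]:#010x}")
```

Because the slots are contiguous, merging them presents the GPU with a single 192 KB register window.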

After configuring the PL301 interconnect, assemble the whole system on the SoC Designer Canvas and run the test application. Providing the test name in the Makefile builds the application file (axf file) to be loaded into the SoC Designer Plus simulator. The axf files are then loaded and the test applications are executed immediately, without any manual intervention.

This bare-metal system is now available as a CPAK, and the integration tests described above are included.


Partitioning the System

The diagram below shows the Mali-450 CPAK introduced in this article. The system consists entirely of 100% accurate models, which makes it possible to see the impact of the IP configuration settings and the bare-metal software. Moving to higher-level tasks such as driver development requires mapping portions of the system to more abstract models in order to execute at faster speeds. SoC Designer automatically understands the relationship between Cycle Models of Arm IP and their Fast Model equivalents; the Fast Model Generator tool in SoC Designer can be used to choose which models need to be executed accurately and which should be represented as Fast Models.



The dotted line indicates the portion of the virtual prototype to be executed as Fast Models. The rest of the system will continue to execute as cycle accurate models. Once the models for each domain are identified, SoC Designer automatically builds a new virtual prototype representation containing both Fast Models and Cycle Models. Figure 2 shows a block diagram that describes such a system. All of the necessary transactor logic to map between the abstract models and the accurate models is inserted automatically.


Using the CPAK with Arm Fast Models

Since the virtual prototype runs at Arm Fast Model speeds when the accurate models are not being used, this platform can be used to boot Linux and integrate the Mali device driver. This is done by integrating the Mali Linux drivers into the Linux kernel that is used as part of the Cortex-A15 Linux CPAK. The Linux drivers are part of the Mali-450 GPU Linux Driver Development Kit (DDK), which contains the Mali Linux device drivers and the base drivers. The DDK also provides links to download the OpenGL drivers specific to the Mali-450 GPU. The Integration Guide can be used, along with minor modifications specific to the virtual prototyping environment, to integrate the software into the Linux kernel. The integration procedure involves the following steps:

  1. Integrate and build the Mali Linux device driver: The user specifies the macros corresponding to the GPU configuration parameters, along with configuring the Mali GPU memory, the framebuffer memory and power management options. The device driver is built as part of the kernel image, but it can also be built as a kernel module.
  2. Build the OpenGL and GLES libraries and add them to the root file system.
  3. Build the GPU benchmarks and add them to the root file system.
  4. Build the kernel image that contains the Mali device drivers, base drivers, the OpenGL drivers and the GPU benchmarks.

This CPAK can be useful to device driver developers, who can step through the process of loading the Mali device drivers during boot-up. Figure 3 shows the messages printed on the console while loading the Mali Linux drivers with debug messages turned on. It shows the whole procedure of registering the drivers, which starts with initializing the Mali memory system and ends with initializing the platform device, provided all intermediate steps pass. The intermediate steps define the settings for the framebuffer and for the dedicated and shared memory. Each of the internal components of the Mali-450, including the L2 cache, the Geometry Processor, the Pixel Processors, and the Dynamic Load Balancing unit, is then created and its base address is defined.



The Results

With the driver ported to Linux, it is now possible to quickly boot the OS in just over two minutes and then start processing frames with 100% accuracy. The benchmark that uses the Mali base drivers runs test suites on:

  1. GPU memory allocation
  2. Initialization of the Geometry Processor
  3. Initialization of the Pixel Processors
  4. Running vertex shader jobs
  5. Running rendering jobs that draw a simple triangle

The Mali-450 CPAK can be found on Arm System Exchange, enabling users to duplicate these results within minutes of downloading the tool.



This article was originally written as a blog in three parts by Varun Subramanian. The third part of the original post is available to read on Connected Community.