Using Arm Cycle Models to Explore the Cortex-R8

Arm Cycle Models are used to perform a variety of tasks, including IP evaluation, software development and performance optimization. This tutorial describes how Arm silicon partners have used them to understand the Cortex-R8. The Cortex-R8 Cycle Model is available for use in SoC Designer and, for the first time, in Accellera SystemC simulation.

Introduction

Earlier availability of cycle-accurate models has led partners to use Cycle Models to understand new processors. Previously, partners would use RTL simulation or FPGA boards. RTL simulation can be cumbersome, especially for software engineers doing benchmarking tasks, and it lacks software debugging and performance analysis features. FPGA boards are familiar to software engineers, but lack the ability to change CPU build-time parameters such as cache and TCM sizes.

With this in mind, Cycle Models have become an attractive alternative for learning about new technologies. The examples below provide more insight into how the models are being used.


Benchmarking

A typical benchmarking flow on a new processor uses the following tools and steps:

  1. SoC Designer can be used to run benchmarks and measure how many cycles a C function requires on a new processor. The tool provides an integrated disassembly view which can be used to set breakpoints, run from point A to point B, and measure cycle counts.


  2. DS-5 can also be connected to the Cortex-R8 for a full source code view of the software.


  3. The cycle count is always visible on the toolbar of SoC Designer. Often a simple subtraction is all that is needed to measure the cycle count between breakpoints.

  4. After the first round of benchmarking is done, the code can be moved from external memory to TCM and execution repeated. The Cortex-R8 Cycle Model will boot from ITCM when the INITRAM parameters are set to true. Right-clicking on the Cortex-R8 model and setting parameters makes it easy to switch between external memory and TCM.


  5. Beyond counting cycles, SoC Designer provides additional analysis features. One useful feature is the transaction view. The transaction monitor can be used to make sure the expected transactions are occurring on the bus. For example, when running out of TCM, little or no bus activity is expected on the AXI interface, and any activity usually indicates an incorrect configuration. The transaction view shows the activity on the AXI interface when running from external memory. Each transaction has a start and end time to indicate how long it takes.


  6. All PMU events are instrumented and can be automatically captured in Cycle Models. These are viewed by enabling the profiling feature and looking at the results using the analyzer view. The hex values to the left of each event correspond to the event codes in the Technical Reference Manual. In addition to raw values, graphs of events over time can be created to identify hotspots.



  7. The analysis tools also provide information about bus utilization, latency, transaction counts, retired instructions, branch prediction, and cache metrics as shown below. Custom reports can also be generated.



  8. After observing a benchmark in external memory and TCM, it’s common to change TCM sizes and cache sizes. Models with different cache and TCM sizes can easily be configured and created using Arm IP Exchange, and the impact on the benchmark observed. The IP configuration page is shown below. Generating a new model is as simple as selecting new values on the web page and pressing the build button. After compilation finishes, the new model is ready for download and can replace the current Cortex-R8 model.
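
The breakpoint-to-breakpoint measurement in the steps above reduces to subtracting two toolbar cycle counts, and the external-memory versus TCM comparison to a ratio of the two results. The following is a minimal Python sketch of that arithmetic only. The function names and the cycle numbers are hypothetical, chosen for illustration; they are not measured Cortex-R8 results, and nothing here calls a SoC Designer API.

```python
def cycles_between(start_count: int, end_count: int) -> int:
    """Cycles elapsed between two breakpoints, given the cycle count
    read at each one (for example, from the simulator toolbar)."""
    if end_count < start_count:
        raise ValueError("end count must not precede start count")
    return end_count - start_count


def speedup(baseline_cycles: int, new_cycles: int) -> float:
    """How many times faster the new run is than the baseline run."""
    if new_cycles <= 0:
        raise ValueError("new_cycles must be positive")
    return baseline_cycles / new_cycles


# Hypothetical readings, not real measurements.
external = cycles_between(120_000, 168_000)  # function run from external memory
tcm = cycles_between(120_000, 144_000)       # same function run from TCM

print(f"external memory: {external} cycles")
print(f"TCM:             {tcm} cycles")
print(f"speedup from TCM: {speedup(external, tcm):.2f}x")
```

Recording the two counts for every configuration (external memory, TCM, each cache or TCM size) in the same way keeps the comparisons consistent as new models are generated.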




Cache and Memory Latency

The Cortex-R8 Cycle Model can be used to analyze the performance impact of adding the PL310 L2 cache controller. There is a Cycle Model of the PL310 available from Arm IP Exchange. It can be added into a system and enabled by programming the registers of the cache controller. The register view is shown below.


Register view for Cache Controller


SoC Designer provides ideal memory models which can be configured for various wait states and delays. The performance of memory accesses using these memory models can be compared with the performance after adding the PL310 to the system. The same analysis tools can be used to determine latency values from the L2 cache and the overall performance impact of adding it. Right-clicking on the PL310 and enabling the profiling features will generate latency and throughput information for the analysis view.
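
To make the latency and throughput comparison concrete, the sketch below computes both figures from a list of transactions, each with a start and end time as in the transaction view. It is an illustration under stated assumptions: the `Transaction` class, its field names, and the sample numbers are hypothetical, and the code does not read any file format or API of SoC Designer.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transaction:
    start_cycle: int  # cycle on which the transaction begins
    end_cycle: int    # cycle on which the transaction completes
    num_bytes: int    # payload size in bytes

    @property
    def latency(self) -> int:
        return self.end_cycle - self.start_cycle


def average_latency(transactions):
    """Mean latency in cycles across all transactions."""
    return mean(t.latency for t in transactions)


def bytes_per_cycle(transactions):
    """Payload bytes moved per cycle over the observed window."""
    first = min(t.start_cycle for t in transactions)
    last = max(t.end_cycle for t in transactions)
    return sum(t.num_bytes for t in transactions) / (last - first)


# Hypothetical traces, not real measurements: three 32-byte reads
# against a slow memory model, then the same reads where the second
# and third are served faster (as L2 hits might be).
without_l2 = [Transaction(0, 20, 32), Transaction(20, 40, 32), Transaction(40, 60, 32)]
with_l2 = [Transaction(0, 20, 32), Transaction(20, 28, 32), Transaction(28, 36, 32)]

for name, trace in (("without L2", without_l2), ("with L2", with_l2)):
    print(f"{name}: avg latency {average_latency(trace)} cycles, "
          f"{bytes_per_cycle(trace):.2f} bytes/cycle")
```

Comparing the two summaries for the same software run gives a quick view of whether the L2 cache pays off for a given memory wait-state setting.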

Example systems using the Cortex-R8, along with software to configure the system and run various programs, are available from Arm System Exchange. The systems serve as a quick start by providing cycle-accurate IP models, fully configured and initialized systems, and software source code. Most users take an example system as a starting point and then modify and customize it to meet particular design tasks.


Summary

Arm Cycle Models have become the new standard for IP evaluation, early benchmarking, and performance analysis. The Cortex-R8 Cycle Model is available for use in SoC Designer and SystemC simulation. Example systems and software are available, models of different configurations can be easily generated using Arm IP Exchange, and the software debugging and performance analysis features make Cycle Models an easy-to-use environment for evaluating IP and making informed selection decisions.

This article was originally written as a blog by Jason Andrews. Read the original post on Connected Community.