Memory Subsystem Optimization

This article answers common questions about the memory subsystem, for example how memory controllers affect system performance and whether the memory datapath adds latency in a processor. It also discusses how Cycle Model products can be used to address these concerns.

Complexity of Memory Controllers and Their Impact on Overall System Performance

Multicore processors have increased the need for high-performance memory subsystems, and as a result the performance of the memory controller has become a critical factor in overall system performance. To obtain good application performance, it is vital to understand the performance metrics of the memory subsystem.

Advances in DRAM technology make it harder to use a DRAM device optimally. Most modern memory controllers implement complex arbitration and optimization techniques, support multiple memory protocols and are highly configurable via mode register settings. Memory controller efficiency depends on several factors, such as queue length, read/write distribution, burst length and address distribution. DDRx memory controllers track a wide range of timing parameters while scheduling and reordering multiple read and write requests. These timing parameters, including read latency, write latency and read-write turnaround, are vital to the performance of the memory controller, and changing any one of them can significantly affect overall system performance. Configuration choices matter as well: a longer queue or enabling high-priority requests in the memory controller has a measurable impact on latency.
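To make the read-write turnaround and queue-length effects concrete, here is a minimal Python sketch of a toy scheduler. It assumes a single data bus, a fixed cost per burst and fixed turnaround penalties (the cycle counts are hypothetical, not DMC-400 or JEDEC values), and a greedy policy that prefers the oldest request in the same direction as the last one issued, looking at most `queue_len` requests deep. This is not how the DMC-400 schedules requests; it only illustrates why a longer queue gives a controller more room to group reads and writes and avoid turnaround cycles.

```python
import random

# Hypothetical cycle costs, chosen only for illustration (not DMC-400 or JEDEC values).
BURST_CYCLES = 4        # data-bus cycles per burst
WR_TO_RD_PENALTY = 8    # idle bus cycles when switching from a write to a read
RD_TO_WR_PENALTY = 4    # idle bus cycles when switching from a read to a write


def bus_cycles(ops):
    """Total data-bus cycles needed to serve a sequence of 'R'/'W' bursts in order."""
    cycles = 0
    prev = None
    for op in ops:
        if prev == "W" and op == "R":
            cycles += WR_TO_RD_PENALTY
        elif prev == "R" and op == "W":
            cycles += RD_TO_WR_PENALTY
        cycles += BURST_CYCLES
        prev = op
    return cycles


def reorder(ops, queue_len):
    """Toy greedy scheduler: among the oldest `queue_len` pending requests, issue the
    oldest one in the same direction as the last issued request; otherwise issue the
    oldest pending request. queue_len=1 degenerates to in-order service."""
    pending = list(ops)
    issued = []
    while pending:
        window = pending[:queue_len]
        last = issued[-1] if issued else None
        idx = window.index(last) if last in window else 0
        issued.append(pending.pop(idx))
    return issued


def efficiency(ops):
    """Fraction of data-bus cycles spent actually transferring data."""
    return len(ops) * BURST_CYCLES / bus_cycles(ops)


if __name__ == "__main__":
    rng = random.Random(1)
    traffic = [rng.choice("RW") for _ in range(1000)]  # roughly 50/50 read/write mix
    for q in (1, 4, 16):
        print(f"queue length {q:2d}: bus efficiency {efficiency(reorder(traffic, q)):.2f}")
```

With in-order service, a random 50/50 mix pays a turnaround penalty on about half of the bursts. A real controller also limits how long a request may be bypassed (this greedy policy can starve the opposite direction) and must respect bank, row and refresh timing that the sketch ignores.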



With memory controllers this complex, it has become necessary to measure, profile and validate memory controller performance in the context of the whole system. In particular, users need to know whether their memory controller will meet the desired system-level performance when multiple masters (e.g., a CPU and a GPU) contend for the same shared memory resource.
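A first-order check that can be done before any simulation is a bandwidth budget: compare the summed average demand of all masters with the usable fraction of peak DRAM bandwidth, where peak bandwidth is the data rate times the bus width in bytes. The sketch below assumes a DDR3-1600 interface with a 32-bit data bus, a 70% achievable efficiency and per-master demands; all of these numbers are hypothetical.

```python
# All numbers below are hypothetical, chosen only to illustrate the arithmetic.
DATA_RATE_MTPS = 1600   # DDR3-1600: 1600 mega-transfers per second
BUS_WIDTH_BITS = 32     # assumed DRAM data-bus width
EFFICIENCY_PCT = 70     # assumed achievable fraction of peak bandwidth, in percent

# Assumed average bandwidth demand of each master, in MB/s.
DEMANDS_MBPS = {"CPU": 1200, "GPU": 2800}


def peak_bandwidth_mbps(data_rate_mtps, bus_width_bits):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return data_rate_mtps * bus_width_bits // 8


def check_budget(demands_mbps, data_rate_mtps, bus_width_bits, efficiency_pct):
    """Return (total demand, usable bandwidth, fits) with bandwidths in MB/s."""
    peak = peak_bandwidth_mbps(data_rate_mtps, bus_width_bits)
    usable = peak * efficiency_pct // 100
    total = sum(demands_mbps.values())
    return total, usable, total <= usable


if __name__ == "__main__":
    total, usable, fits = check_budget(
        DEMANDS_MBPS, DATA_RATE_MTPS, BUS_WIDTH_BITS, EFFICIENCY_PCT
    )
    print(f"demand {total} MB/s, usable {usable} MB/s, fits: {fits}")
```

Passing this check only shows that the average demand fits. It says nothing about the latency each master sees when their requests collide, which depends on arbitration, queue depth and DRAM timing; that is the question a cycle-accurate model of the memory subsystem can answer.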


Why a Cycle-Accurate Memory Controller Model?

To accurately measure the performance of the memory controller, designers can use a cycle-accurate memory controller model and leverage the profiling capabilities of SoC Designer to calculate and visualize key performance metrics such as latency, bandwidth and throughput under different traffic loads. These capabilities let them run a wide range of complex test scenarios and identify bottlenecks in their system. Numerous designers have used this approach to optimize the performance of their memory subsystems.
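As a rough illustration of how these metrics relate to the start and end times a transaction monitor records, the following Python sketch computes average and maximum latency, bandwidth and throughput from a list of completed transactions. The `Txn` record and the metric definitions are assumptions made for this example, not SoC Designer's profiling API or its exact definitions; times are in clock cycles.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Txn:
    """Hypothetical monitor record for one completed transaction."""
    start: int   # cycle at which the request was issued
    end: int     # cycle at which the last data beat completed
    nbytes: int  # bytes transferred


def summarize(txns):
    """Compute simple latency, bandwidth and throughput figures for a traffic run."""
    latencies = [t.end - t.start for t in txns]
    # Measurement window: from the first request issued to the last completion.
    window = max(t.end for t in txns) - min(t.start for t in txns)
    total_bytes = sum(t.nbytes for t in txns)
    return {
        "avg_latency": mean(latencies),
        "max_latency": max(latencies),
        "bytes_per_cycle": total_bytes / window,
        "txns_per_kcycle": 1000 * len(txns) / window,
    }


if __name__ == "__main__":
    run = [Txn(0, 20, 64), Txn(4, 30, 64), Txn(8, 40, 32)]
    print(summarize(run))
```

To convert bytes per cycle into bytes per second, multiply by the frequency of the clock the monitor timestamps are taken from.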

The following process outlines how to generate a custom-configured model of the DMC-400 using Cycle Model Studio and how to build a reference virtual platform that runs system-level simulations in SoC Designer.


Building the DMC-400 SoCD Component

The DMC-400 SoCD component integrates the memory controller, the physical interface (PHY) and the DDR3 memory itself. Cycle Models provide highly configurable and synthesizable DDRx memory models. A wide range of memory configurations is supported through compile-time parameters, and each model’s interface timing can be adjusted to simplify the connection to the controller’s PHY. The Cycle DDR3 Memory Model connects to the memory controller’s PHY at the pin level, as defined by the JEDEC standard.
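Memory controller timing registers are normally programmed in clock cycles, while DRAM datasheets specify many timing parameters in nanoseconds, sometimes with a minimum number of clock cycles as well. A minimal sketch of the conversion, rounding up so that no timing constraint is violated, is shown below. The parameter values are assumed DDR3-style examples for illustration only; real values must come from the datasheet of the DDR3 device being modeled.

```python
# Example timing parameters: (time in picoseconds, minimum in clock cycles).
# These are assumed, DDR3-style values for illustration only.
TIMINGS = {
    "tRCD": (13750, 0),
    "tRP": (13750, 0),
    "tWR": (15000, 0),
    "tWTR": (7500, 4),  # assumed: the larger of 7.5 ns and 4 clock cycles
}


def to_cycles(t_ps, tck_ps, min_cycles=0):
    """Convert a time in ps to whole clock cycles, rounding up, with a floor."""
    cycles = -(-t_ps // tck_ps)  # integer ceiling division avoids float rounding errors
    return max(cycles, min_cycles)


def timing_table(tck_ps):
    """Cycle counts for every example parameter at the given clock period."""
    return {name: to_cycles(t, tck_ps, mn) for name, (t, mn) in TIMINGS.items()}


if __name__ == "__main__":
    # DDR3-800 runs a 2.5 ns clock; DDR3-1600 runs a 1.25 ns clock.
    for label, tck_ps in (("DDR3-800", 2500), ("DDR3-1600", 1250)):
        print(label, timing_table(tck_ps))
```

Working in integer picoseconds keeps the ceiling exact; the minimum-cycle floor matters at lower clock rates, where a short absolute time can round to fewer cycles than the device allows.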



  1. To build the DMC-400 model, create a top-level RTL module that instantiates the DMC-400 memory controller, the PHY and the DDR3 memory.
  2. Compile the top-level module with Cycle Model Studio (CMS) to create the memory subsystem model. Tie off the ECC and QVN signals, which are not required, and add register visibility for the critical timing and control registers.
  3. Finally, use the Cycle Component Generator in CMS to add AXI transactors on the system interfaces and an APB transactor on the programming interface of the DMC-400, and generate the SoCD component.



Building a Virtual Platform with the DMC-400 Component

  1. After building the DMC-400 SoCD component, add it to the SoC Designer component library. Connect the programming slave port of the DMC-400 to an APB stub and the AXI system interface slave ports to AXI stubs.
  2. Create a simple stub script to program and initialize the memory controller.
  3. Use the component's register view and internal waveform dumping capability to configure the critical timing and mode control registers of the memory controller.
  4. With the memory controller configured, use the AXI stubs to drive random or deterministic read/write traffic patterns with mixed burst lengths into the DMC-400 through the system interface slave ports. Add monitors on all four transaction interfaces to record the start and end time of every transaction.

    Figure: DMC-400 DDR3 write transaction

  5. With this simple unit test, the user can verify the programming sequence of the DMC-400 and also run mixed traffic loads through it.
  6. To create a more elaborate reference platform with other system IP, take an existing Cortex-A9 CPAK and add the DMC-400 to it. Because the DMC-400 is already configured, the user can then run system-level simulations on the reference platform.
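The traffic described in step 4 can be sketched in Python as a generator of read/write transactions with mixed burst lengths. This is not the SoC Designer AXI stub scripting interface: the `AxiTxn` record, the 16-byte beat size, the base address and the burst-length mix are all hypothetical. The sketch does follow two AXI rules: the AxLEN field encodes the number of beats minus one, and an incrementing burst must not cross a 4 KB address boundary. Incrementing bursts of up to 16 beats are valid in both AXI3 and AXI4.

```python
import random
from dataclasses import dataclass


@dataclass
class AxiTxn:
    """Hypothetical description of one AXI incrementing burst."""
    is_write: bool
    addr: int
    burst_len: int   # number of beats (1..16 here)
    beat_bytes: int  # bytes per beat (assumed 128-bit data bus -> 16)

    @property
    def axlen(self):
        """Value of the AxLEN field: number of beats minus one."""
        return self.burst_len - 1


def crosses_4kb(addr, nbytes):
    """True if a burst of `nbytes` starting at `addr` crosses a 4 KB boundary."""
    return addr // 4096 != (addr + nbytes - 1) // 4096


def random_traffic(n, seed, write_ratio=0.5, base=0x8000_0000, span=1 << 20,
                   beat_bytes=16, lengths=(1, 4, 8, 16)):
    """Random mix of reads and writes with mixed burst lengths (hypothetical defaults)."""
    rng = random.Random(seed)
    txns = []
    while len(txns) < n:
        burst_len = rng.choice(lengths)
        addr = base + rng.randrange(0, span, beat_bytes)  # beat-aligned address
        if crosses_4kb(addr, burst_len * beat_bytes):
            continue  # AXI bursts must not cross a 4 KB boundary; draw again
        txns.append(AxiTxn(rng.random() < write_ratio, addr, burst_len, beat_bytes))
    return txns


def deterministic_traffic(n, base=0x8000_0000, beat_bytes=16, burst_len=8):
    """Sequential addresses, alternating read then write."""
    stride = burst_len * beat_bytes
    return [AxiTxn(i % 2 == 1, base + i * stride, burst_len, beat_bytes) for i in range(n)]


if __name__ == "__main__":
    for t in random_traffic(4, seed=7):
        print("WR" if t.is_write else "RD", hex(t.addr), "AxLEN =", t.axlen)
```

Each generated transaction, paired with the start and end times a monitor records, provides the raw data for the latency and bandwidth figures discussed earlier.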

Figure: DMC-400 reference virtual platform and component register view



This article was originally written as a blog post by Pareena Verma.