How is PPA implementation analysis performed?
To perform PPA analysis on a piece of IP, the IP must be taken to the point at which a semiconductor fabrication plant, also called a fab or a foundry, can realize the design. For the team running a PPA analysis, this involves making decisions as though the IP were going to be manufactured, and recording those decisions to provide context for the analysis. To interpret PPA analysis data effectively, you must understand the context in which the analysis was performed.
To simplify things, we refer to the IP as a processor for the rest of this section and explore the steps that are taken to get to the realization point. Processors are defined at the Register Transfer Level (RTL). RTL is a design abstraction and is usually written in either the Verilog or VHDL hardware description language. Before PPA analysis can be performed, this abstraction must be taken to the point at which it can be physically made in a fab. In other words, a complete trial physical implementation is required for the processor.
From RTL to fab process ready
Configuring the processor to a specification
Decisions are made about which processor options to include and what the sizes of the caches will be. The design decisions taken here will have a noticeable effect on the area that the final design takes up, and the level of performance that the final design can attain.
Note: If a small design is important to you, you can reduce area by removing non-essential options and reducing the size of the processor memories. There is a tradeoff here: smaller caches reduce the area of the processor but also reduce cache throughput performance.
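The effect of configuration choices on area can be sketched with a toy additive model. This is only an illustration: the option names (`fpu`, `debug`) and every area figure below are made-up placeholders, not data for any real processor or process, and the model deliberately ignores the performance side of the tradeoff.

```python
from dataclasses import dataclass

# Placeholder figures for illustration only. They do not describe any
# real processor, library, or fab process.
BASE_LOGIC_MM2 = 0.10                              # core logic without options
OPTION_AREA_MM2 = {"fpu": 0.03, "debug": 0.01}     # hypothetical options
SRAM_MM2_PER_KB = 0.002                            # hypothetical cache density


@dataclass(frozen=True)
class ProcessorConfig:
    options: frozenset   # names of the optional features that are included
    icache_kb: int       # instruction cache size in KB
    dcache_kb: int       # data cache size in KB


def estimate_area_mm2(cfg: ProcessorConfig) -> float:
    """Sum the base logic, the included options, and the cache area."""
    option_area = sum(OPTION_AREA_MM2[name] for name in cfg.options)
    cache_area = (cfg.icache_kb + cfg.dcache_kb) * SRAM_MM2_PER_KB
    return BASE_LOGIC_MM2 + option_area + cache_area


full = ProcessorConfig(frozenset({"fpu", "debug"}), icache_kb=32, dcache_kb=32)
minimal = ProcessorConfig(frozenset(), icache_kb=8, dcache_kb=8)

if __name__ == "__main__":
    print(f"full:    {estimate_area_mm2(full):.3f} mm^2")
    print(f"minimal: {estimate_area_mm2(minimal):.3f} mm^2")
```

In this sketch, the minimal configuration is smaller only because it includes fewer options and smaller caches. A real analysis measures area from the trial physical implementation rather than estimating it.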
Choosing the fab process
In this step, a fab that can potentially realize the IP is chosen. Fabs run a process node, or process, to manufacture the IP, and a fab typically offers several processes, from older budget processes to cutting-edge processes. Ultimately, the choice of fab process fixes many design decisions, including the definition of some critical dimensions, which affect the power, performance, and area of any IP that is realized by the process.
Note: Choosing the fab is as important as choosing the process. Fabs use the same nomenclature for processes, for example 40nm or 28nm, but the original meaning has ceased to apply to newer processes. Even older processes, pre-40nm, can differ greatly from fab to fab. It is important to be aware of this if you are comparing two PPA analyses. As a rule, the Arm PPA analysis for the IP within the Arm Flexible Access program uses Taiwan Semiconductor Manufacturing Company (TSMC) processes.
Fabs also offer process options. These options are designed by the fab to optimize for specific properties, for example performance or power. They represent an important starting point, to which further adjustments can be made to suit individual SoC designs.
Applying Electronic Design Automation to fully synthesize the processor
This step maps the high-level RTL definition of the processor to a set of logical gates, for example NAND and NOR gates. The gates, also called cells, come from a physical IP library, for example those contained within the Arm Standard Cell Libraries.
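As a rough illustration of what mapping to gates means, the following sketch expresses a 2:1 multiplexer using only NAND cells and checks that the mapped version behaves like the original description for every input. The function names are hypothetical, and the one-cell "library" is an assumption made for brevity. Real synthesis tools work from RTL and full cell libraries and are far more sophisticated.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    """The only cell in this hypothetical library."""
    return not (a and b)


def not_gate(a: bool) -> bool:
    """An inverter built from a NAND cell with its inputs tied together."""
    return nand(a, a)


def rtl_behaviour(a: bool, b: bool, sel: bool) -> bool:
    """Behavioural description of a 2:1 mux: select a when sel is set."""
    return a if sel else b


def mapped_netlist(a: bool, b: bool, sel: bool) -> bool:
    """The same mux mapped to four NAND cells.

    (a AND sel) OR (b AND NOT sel) equals
    NAND(NAND(a, sel), NAND(b, NOT sel)).
    """
    return nand(nand(a, sel), nand(b, not_gate(sel)))


def equivalent() -> bool:
    """Exhaustively compare both descriptions over all 8 input combinations."""
    return all(
        rtl_behaviour(*values) == mapped_netlist(*values)
        for values in product([False, True], repeat=3)
    )


if __name__ == "__main__":
    print("mapped netlist matches behaviour:", equivalent())
```

The exhaustive comparison is practical here only because the example has three inputs.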
Note: Physical IP libraries are designed for a specific fab process. Process design rules define what options the libraries can provide. The choice of library can affect power, performance, and area through variables, for example channel length and track size.
A target frequency for the IP must be set before synthesis is run. This value is used throughout the implementation flow. The Electronic Design Automation (EDA) tools select cells from the available libraries to meet the target frequency requirement.
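To see why the target frequency drives cell selection, consider the following simplified sketch. It converts a target frequency into a clock period, then picks the smallest cell variant that lets a path of identical stages fit within that period. The cell names, delays, and areas are hypothetical placeholders, and treating a path as a chain of identical stages is an assumption made for brevity. Real EDA tools optimize every path individually and account for many more effects.

```python
from typing import NamedTuple, Optional, Sequence


class CellVariant(NamedTuple):
    name: str         # hypothetical cell name
    delay_ns: float   # placeholder delay for one stage
    area_um2: float   # placeholder cell area


# Placeholder library: faster variants cost more area.
LIBRARY = [
    CellVariant("nand2_small", 0.12, 1.0),
    CellVariant("nand2_medium", 0.08, 1.6),
    CellVariant("nand2_large", 0.05, 2.5),
]


def clock_period_ns(target_mhz: float) -> float:
    """Clock period in nanoseconds for a target frequency in MHz."""
    return 1000.0 / target_mhz


def smallest_variant_meeting_timing(
    variants: Sequence[CellVariant], stages: int, target_mhz: float
) -> Optional[CellVariant]:
    """Return the lowest-area variant whose path delay fits in one period.

    Returns None when no variant in the library can meet the target.
    """
    period = clock_period_ns(target_mhz)
    for variant in sorted(variants, key=lambda v: v.area_um2):
        if stages * variant.delay_ns <= period:
            return variant
    return None


if __name__ == "__main__":
    for mhz in (400, 500, 800, 2000):
        choice = smallest_variant_meeting_timing(LIBRARY, stages=20, target_mhz=mhz)
        print(mhz, "MHz ->", choice.name if choice else "target not achievable")
```

In this sketch, raising the target frequency forces the choice of larger, faster cells until the library can no longer meet the target. This is one way in which the target frequency influences area as well as performance.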
At the end of this step, the EDA tools will have generated what is referred to as the netlist.
In addition, memory compilers are used to create memory models. These memory models are also required by the EDA tools when they carry out synthesis and, in the next step, placement and routing. The final netlist shows how the memory is connected to the rest of the design.
Applying Electronic Design Automation for placement and routing
This step takes the gates that form the netlist and ensures that:
- The gates are legally placed in a core boundary.
- The clock reaches all registers in the design in a balanced manner.
- The design is fully routed while adhering to any design rules that are stipulated for the fab process.
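The first two checks in the list above can be sketched as follows. This is a minimal illustration that assumes cells are axis-aligned rectangles and that clock balance can be summarized as skew, the difference between the latest and earliest clock arrival times. The type and function names are hypothetical, and routing and design-rule checking are not modeled at all.

```python
from itertools import combinations
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    x: int   # lower-left corner, in arbitrary placement units
    y: int
    w: int   # width
    h: int   # height


def inside(cell: Rect, core: Rect) -> bool:
    """True if the cell lies entirely within the core boundary."""
    return (
        cell.x >= core.x
        and cell.y >= core.y
        and cell.x + cell.w <= core.x + core.w
        and cell.y + cell.h <= core.y + core.h
    )


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two cells share area. Abutting cells do not overlap."""
    return (
        a.x < b.x + b.w
        and b.x < a.x + a.w
        and a.y < b.y + b.h
        and b.y < a.y + a.h
    )


def placement_is_legal(cells: Sequence[Rect], core: Rect) -> bool:
    """Every cell is inside the core and no two cells overlap."""
    if not all(inside(cell, core) for cell in cells):
        return False
    return not any(overlaps(a, b) for a, b in combinations(cells, 2))


def clock_skew_ns(arrival_times_ns: Sequence[float]) -> float:
    """Difference between the latest and earliest clock arrival at registers."""
    return max(arrival_times_ns) - min(arrival_times_ns)


if __name__ == "__main__":
    core = Rect(0, 0, 10, 10)
    cells = [Rect(0, 0, 2, 1), Rect(2, 0, 2, 1)]
    print("legal placement:", placement_is_legal(cells, core))
    print("skew (ns):", clock_skew_ns([0.50, 0.52, 0.49]))
```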
Running the analysis
The output of the previous steps is a complete trial physical implementation, on which the analysis can be run. Even on powerful server farms, each run of these analyses can take hours or, for more complex designs, days to complete.
For some pieces of IP, more than one analysis is performed. For example, for a single processor, PPA analyses might be run for the targets shown in the following table:
Table 3‑1 Example PPA analysis targets
| Target | Description |
|---|---|
| Minimum | A configuration for a minimum viable processor targeting the lowest possible power and area. |
| As Fast As Possible (AFAP) | A configuration for a processor capable of obtaining the highest possible clock speed. |
| | A typical processor configuration. |
For each analysis, choices are made regarding the IP configuration, process, and physical IP libraries with the aim of either targeting one requirement or taking a general approach, which covers all requirements to some extent. Rather like the process options themselves, these analyses can provide you with a starting point. For example, if performance is most important to you, you could start with the As Fast As Possible (AFAP) PPA analysis and then think about possible adjustments to it. Alternatively, if power consumption and the size of the IP are important factors to you, you could start with the Minimum analysis.
For processors, another reason why more than one analysis is performed is that some processors can be configured to be multi-core. In these cases, it is common to supply a PPA analysis for a dual-core or quad-core configuration for comparison with the uniprocessor configuration.
The goal of this documentation is not to teach you how to run a PPA analysis on a piece of IP. Instead, you will gain an understanding of how the decisions that are made when running an analysis lead to the results that you see. This will:
- Help to make sure that you are comparing like with like when you compare two pieces of IP. If you spot a difference in a variable, for example the track size of a physical IP library, you can potentially account for any distortion that this difference is causing.
- Give you ideas about how to tailor your chosen IP so that it is exactly what you need. For example, imagine that you chose a processor because, from the PPA data, it was the best fit for your SoC project. However, perhaps you need a little more performance. In this case, you might consider swapping out the physical IP library used for a 12-track high performance library. You can think of this as working backwards from the PPA analysis data.
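One way to make such comparisons systematic is to record the context of each analysis and list the variables that differ. The following is a minimal sketch: the `PpaContext` record and its field names are hypothetical, and the example values are illustrative rather than taken from any published analysis.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PpaContext:
    """Hypothetical record of the decisions behind one PPA analysis."""
    ip_configuration: str   # for example, which options and cache sizes
    fab: str
    process: str
    library_tracks: int     # track size of the physical IP library
    target_mhz: float       # target frequency set before synthesis


def context_differences(a: PpaContext, b: PpaContext) -> dict:
    """Map each differing field name to the pair of values (a, b)."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


# Illustrative values only.
analysis_a = PpaContext("typical", "TSMC", "28nm", library_tracks=9, target_mhz=500.0)
analysis_b = PpaContext("typical", "TSMC", "28nm", library_tracks=12, target_mhz=500.0)

if __name__ == "__main__":
    for name, (left, right) in context_differences(analysis_a, analysis_b).items():
        print(f"{name}: {left} vs {right}")
```

An empty result suggests that the two analyses were run in a comparable context. Any field that is reported is a variable whose effect you need to account for before drawing conclusions from the PPA data.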
The following sections explore in greater detail the variants or choices that can affect the PPA results for a piece of IP:
- Choosing the IP configuration
- Choosing the fab process
- Choosing the physical IP library
The sections are written from the perspective of a team running a PPA analysis and consider the choices that the team faces. You will not be making these choices yourself in the context of generating PPA data. However, they mirror the choices that you will make before the manufacture of your SoC designs. In that sense, these sections will help you to make those decisions.