Choosing the physical IP libraries
Once the process has been chosen, the analysis team must choose which physical IP library to use. Physical IP libraries are designed for specific processes and offer options in the following areas:
- Channel length
- Track height
- Voltage threshold
- High performance kits
Ultimately, the choice of physical IP library enables further adjustment of the manufacturing outcome beyond the choice of process option. For the remainder of this section, we refer to the Arm Standard Cell Libraries, which are production-quality physical IP libraries. We assume that the analysis team selected exclusively from these libraries for their implementation.
Note: Unless specified otherwise, Arm Standard Cell Libraries were used to produce Arm PPA analysis data.
It is also possible to hybridize within the different options that the Arm Standard Cell Libraries provide. For example, it is possible to have 35% low voltage threshold cells and 65% high voltage threshold cells in an implementation. This is discussed in Hybridizing the options given by Arm Standard Cell Libraries.
Memory models explores how memory on a piece of IP, for example a cache, is compiled into memory models.
How the target frequency determines physical IP library usage
The EDA tools decide which cells to use from the cell libraries that are available. The tools work to meet the specified target frequency and use hybridization as required to achieve this. Appropriate cell libraries, containing cells that can provide performance for critical paths on the silicon, must be available for the tools to use.
The target frequency must be chosen with care. Choosing a target frequency that is too high results in higher area and power usage than a realistic target would produce, and the implementation still does not achieve the overly high value.
Channel length
Channel length was discussed in Introduction to fab process nomenclature. Although the minimum value for this critical dimension originally defined the name of a process, the option to vary this dimension exists in Arm Physical IP libraries for the newer processes, that is, 40nm processes and below. The minimum channel length remains a process rule to which libraries designed for a specific process must adhere.
Choosing a channel length that is longer than the minimum allowed does not usually increase the area of the IP. This statement might seem counterintuitive because channel length was, for many years, used to name processes. With every new advancement, more transistors could be fitted into a smaller area. However, each advancement represented an overall shift forward in the fundamental size of the process technology. This drive towards a reduction in fundamental size continues into the newer processes, for example 16nm, even though the minimum channel length for such processes is now larger than the naming size.
In fact, library cells for a process are often 100% footprint compatible. This means that the length of the cell is standardized, and the channel length is contained within the overall length of the cell. You can think of channel lengths that are greater than the minimum value as being absorbed by the cell. For example, TSMC's 28nm process supports channel lengths of 31nm and 38nm. However, when footprint compatible cells are used, neither choice results in an increase in the manufactured IP's area. Although the analysis team could, for some processes, choose very large channel lengths, this does not make economic sense. Arm Standard Cell Libraries provide a limited range of realistic channel lengths for a process, which ensures that the implemented cells have a footprint compatible size.
Why would the analysis team choose a channel length that is greater than the minimum value allowed? In the previous example, using the shorter channel length of 31nm creates faster transistors but at a cost to static power. The 38nm channel length is slower, but the transistors in the cell have lower static power usage.
Note: Arm PPA analysis data records the channel length used, for example C38 or C31. When reviewing PPA analysis data, consider increasing the channel length in situations where the IP appears capable of providing excess performance for your requirement, but the static power consumption is too high.
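The guidance in the preceding note can be sketched as a small selection rule: among the channel lengths that still meet your frequency requirement, prefer the one with the lowest static power. The following Python sketch is illustrative only. The `ChannelOption` type, the `pick_channel` function, and all frequency and power figures are hypothetical and are not taken from Arm PPA analysis data. Only the C31 and C38 labels come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelOption:
    name: str          # label as recorded in PPA data, for example "C31"
    fmax_mhz: float    # hypothetical achievable frequency
    static_mw: float   # hypothetical static power


def pick_channel(options, required_mhz):
    """Return the lowest static power option that meets required_mhz, or None."""
    feasible = [o for o in options if o.fmax_mhz >= required_mhz]
    return min(feasible, key=lambda o: o.static_mw, default=None)


# Hypothetical figures: the shorter channel is faster but leaks more.
OPTIONS = [
    ChannelOption("C31", fmax_mhz=1000.0, static_mw=20.0),
    ChannelOption("C38", fmax_mhz=900.0, static_mw=12.0),
]

print(pick_channel(OPTIONS, 850.0).name)  # C38: excess performance traded for lower leakage
print(pick_channel(OPTIONS, 950.0).name)  # C31: only the shorter channel is fast enough
```

In a real flow the EDA tools make this choice per cell rather than per design, so treat the sketch as a way to reason about published PPA data, not as an implementation method.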
Because the footprint of the cells is not affected by channel length, it is also possible to have cells of varying channel length in your implementation. Hybridizing the options given by Arm Standard Cell Libraries discusses further why varying channel lengths might be used.
Track height
Cells are sequentially placed along tracks, which are stacked one on top of the other on the silicon die. It is important to remember that track height in fact runs perpendicular to the channel length dimension and should not be thought of as a z dimension. Track height is itself a constant defined by the process design rules. One track constitutes the minimum allowable spacing between the Metal 1 routes in a process.
Cell heights are expressed in tracks. The height of the cells in an IP implementation can be varied if they adhere to the design rules of a process. A higher cell allows a larger transistor width to be used. Note that the transistor width dimension also runs perpendicular to the channel length dimension. If you think of channel length running along the x dimension, and the transistors spanning the channel, then cell height and transistor width run along the y dimension. The following figure shows a comparison between the layout of 12-track footprint compatible cells and the layout of 7-track footprint compatible cells on a die:
Wider transistor widths equate to higher performance but also increase the area of the implemented IP. When higher track heights are used, the increase in area is felt along the x and y dimensions, and the die remains a square.
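Because cell height is a whole number of tracks and the track height is a process constant, the relative height of two cell families can be worked out from the track counts alone. The following Python sketch compares 12-track and 7-track cells. It assumes, for illustration only, that the two cells being compared have the same width. Real cells and placement density vary, so the figure is a rough guide rather than a predicted die area. The function name is hypothetical.

```python
from fractions import Fraction


def relative_cell_height(tracks, baseline_tracks):
    """Ratio of cell heights; the constant track height cancels out."""
    return Fraction(tracks, baseline_tracks)


ratio = relative_cell_height(12, 7)
print(ratio)                    # 12/7
print(round(float(ratio), 2))   # 1.71
```

Under the equal-width assumption, a 12-track cell occupies about 1.71 times the area of a 7-track cell.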
Regarding track height, Arm Standard Cell Libraries can be split into the three main categories that are shown in the following table:
Table 6‑1 Arm Standard Cell Library track height options
|7 or 8-track
||Ultra-high density and low power
||For cost critical applications
|9 or 10-track
||For mainstream applications
||For speed critical designs
Note: Arm PPA analysis data records the track height of the cells that are used: SC7, SC8, SC9, SC10, or SC12. When reviewing PPA analysis data, consider the effect of the track height on the results. Making changes to the track size could bring an IP implementation closer to your exact SoC requirements.
Voltage threshold
The voltage threshold of the cells in an IP implementation can be varied if they adhere to the design rules of a process. The voltage threshold has an impact on the switching speed of the transistors inside a cell, and therefore has a direct impact on performance. In addition, the voltage threshold also has an impact on static power consumption. High voltage threshold cells are slower, because they require more voltage to switch on, but consume less static power. Low voltage threshold cells are faster but consume more static power. Ultra-high and ultra-low variants, and a standard option, are available for cell voltage thresholds.
Note: Arm PPA analysis data records the voltage threshold of the cells used: ULVt, LVt, SVt, HVt, and UHVt. The tradeoff between power and performance that sometimes needs to be made when designing an SoC is exemplified by the voltage threshold options that are available. If you need to make this tradeoff, keep the voltage thresholds of the cells in mind when reviewing PPA analysis data.
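When comparing several sets of PPA analysis data, it can help to pull the channel length, track height, and voltage threshold labels into a structured form. The following Python sketch parses a space-separated string of such labels. The string format, the `parse_labels` and `is_faster` functions, and the field names are assumptions made for this example; the actual layout of Arm PPA analysis data might differ. The label values (C31, SC12, LVt, and so on) and the speed ordering of the voltage thresholds come from the text above.

```python
import re

# Voltage threshold labels ordered from fastest (highest static power)
# to slowest (lowest static power), as described in the text.
VT_FASTEST_FIRST = ["ULVt", "LVt", "SVt", "HVt", "UHVt"]


def parse_labels(text):
    """Parse labels such as 'SC12 C31 LVt' into a dictionary."""
    fields = {}
    for token in text.split():
        track = re.fullmatch(r"SC(\d+)", token)
        channel = re.fullmatch(r"C(\d+)", token)
        if token in VT_FASTEST_FIRST:
            fields["vt"] = token
        elif track:
            fields["tracks"] = int(track.group(1))
        elif channel:
            fields["channel_nm"] = int(channel.group(1))
        else:
            raise ValueError(f"unrecognized label: {token!r}")
    return fields


def is_faster(vt_a, vt_b):
    """True if cells with threshold vt_a switch faster than cells with vt_b."""
    return VT_FASTEST_FIRST.index(vt_a) < VT_FASTEST_FIRST.index(vt_b)


print(parse_labels("SC12 C31 LVt"))
print(is_faster("LVt", "HVt"))  # True
```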
Add-on kits for the Arm Standard Cell Libraries
Depending on the process, Arm offers add-ons to the Arm Standard Cell Libraries, which contain additional selected cells and can be chosen instead of base cells. The add-on kits are tailored to a specific purpose, for example:
- Arm High Performance Kit (HPK)
- Arm Power Management Kit (PMK)
- Arm Low Power Kit (LPK)
The LPK and PMK implement low-power techniques like power gating and dynamic voltage scaling. The HPK employs advantageous circuit tuning practices inside the logic gates to achieve higher performance with a minimal impact on area and power.
Note: Arm PPA analysis data records whether any of the add-on kits were used. The tradeoff between power and performance that sometimes needs to be made when designing an SoC is exemplified by the add-on kits available. If you need to make this tradeoff, keep in mind which add-on kits were used when you review PPA analysis data.
Hybridizing the options given by Arm Standard Cell Libraries
Each individual cell library represents one choice for each of the four options available for a cell: channel length, track height, voltage threshold, and add-on kit. Cells from an add-on kit are offered across the other three options in the same way as base cells. Every combination of the options for a cell is available. This means that you can choose different combinations for different percentages of cells in the implementation.
In the example shown in the following table, a team carrying out an AFAP analysis on a processor chose to use four different libraries in the implementation. Each row in the table corresponds to an individual cell library. Two base libraries and two libraries from the Arm High Performance Kit were used:
Table 6‑2 Libraries used in an example AFAP analysis
|Percentage usage||Track size||Kit||Voltage threshold
The above choice is for the TSMC 28nm process. Because this is an AFAP implementation, a maximum track size and minimum channel length were chosen for all cells. This is expected, because both options maximize performance.
The other two performance boosts are used more sparingly. Only 10% of the cells are completely maximized for performance, because they come from the HPK and have a low voltage threshold. The HPK provides another 5% of the cells, but these cells have a standard voltage threshold. Another 25% of the cells have a low voltage threshold, but are selected from the base kit. In fact, it would waste power and area to use the HPK and a low voltage threshold for all the cells. The analysis team have used the EDA tools to save on power and area. The tools choose smaller and less power-hungry cells on paths that are not critical for timing. Even when an AFAP analysis runs, this additional area and power recovery step is performed in the implementation flow, specifically to improve the power and area results.
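The percentages quoted in the preceding paragraph can be tallied to see how sparingly the two performance boosts are applied. The following Python sketch uses only the three libraries whose usage the text states. The share of the fourth library, a base library, is derived as the remainder; its voltage threshold is not given in the text, so it is left out of the per-threshold totals. The data layout and the `share_by` function are hypothetical.

```python
from collections import defaultdict

# (percentage usage, kit, voltage threshold) as stated in the text.
STATED = [
    (10, "HPK", "LVt"),
    (5, "HPK", "SVt"),
    (25, "base", "LVt"),
]


def share_by(rows, index):
    """Sum percentage usage grouped by the column at the given index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[0]
    return dict(totals)


stated_total = sum(row[0] for row in STATED)
remainder = 100 - stated_total  # share of the fourth (base) library

print(share_by(STATED, 1))  # {'HPK': 15, 'base': 25}
print(share_by(STATED, 2))  # {'LVt': 35, 'SVt': 5}
print(remainder)            # 60
```

Only 15% of the cells come from the HPK and 35% use a low voltage threshold, while the remaining 60% of the cells come from the second base library.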
Note: Arm PPA analysis data records cell library hybridization. Some analyses, for example those targeting minimum power and area usage, may use a single cell library. However, it is very likely that you would use hybridization, through the EDA tools, to save on power and area whatever the performance requirements of your SoC design. Although hybridization provides the ultimate flexibility in the tradeoff between power and area, cost, and performance, the design rules of some processes place limitations on this. For example, a process design rule may limit the number of different voltage thresholds that can be combined on any given SoC.
Memory models
Arm PPA analysis uses production quality memory models with real timing. Memory compilers are used to create the memory models. For processors that include memory as part of the processor, for example level 1 and level 2 caches, or TCMs, the size and type of the memory included can have a major impact on PPA analysis.
Regarding caches, the size of the memories alone has a major impact on area and power consumption. Doubling the memory size can more than double the area and power usage. Two 32KB caches will have a significantly higher area and power usage than two 16KB caches. However, it is important that a PPA analysis is realistic. Completely omitting caches from an analysis where caches are likely to be used in the real world is not helpful. Even when Arm runs AFAP or Minimum analyses for a processor, the size of the level 1 data and instruction caches is the same as for the Featured analysis. This enables these analyses to be more meaningfully compared.
Different compilers create memory with differing characteristics:
Table 6‑3 Memory compilers
|Arm Artisan RF High-Density SP
||Lowest area and power usage
|Arm Artisan RF High-Speed SP
||Higher speed but higher area and power
|Arm Artisan Fast Cache Instances
||Optimized for L1 memories
Note: Arm PPA analysis data records the memory compiler that is used for the SoC memory. The speed of the memories is often critical to the timing closure of the processor and has a major impact on the frequency that the processor can be clocked at. However, faster memories take up more area and consume more power. You need to be aware of the tradeoff between processing frequency and power usage, and you need to use the smallest memories possible to achieve the performance that you require.