Further choices that affect PPA implementation analysis
The preceding three sections covered the decisions that are made before a PPA analysis is performed, and showed how an analysis team defines a physical implementation to analyze. This section explores further choices that the team must make when running the analysis. Keep these decisions in mind when you interpret PPA data. Also, be aware that some of the constraints applied at this stage are hypothetical. For example, the team must choose a temperature at which the IP is theoretically running. However:
- The value chosen might represent an extreme, undesirable situation that is unlikely to occur in the real world.
- PPA tools are used to simulate the environment, and the IP is never realized to test the findings.
Silicon speed, voltage, and temperature ranges
This section looks at the outlying values, and the median value, for three factors that affect the power and performance figures in PPA analysis data. The PPA tools take these factors into account when generating the data. However, the analysis team must decide which value on each range to use when generating the power and performance figures. The IP is not expected to operate outside the outliers.
The following table describes the three ranges:
Silicon speed: During integrated circuit semiconductor fabrication, variation from the nominal doping concentrations in the transistors on a silicon wafer can occur. This affects the carrier mobility (both electron and hole mobility) in the transistors. This variation can cause significant changes in the speed at which the digital signals transition, from high to low and from low to high.
Silicon speed is described by two-letter designators, with the first letter for the n-channel MOSFET speed and the second letter for the p-channel MOSFET speed.
For example, if the ss outlier is selected, an assumption has been made that the analysis is for silicon which, because of its chemical makeup, has the slowest performance possible.
Because of reduced process variability, more modern processes offer two new outliers: ssg (global slow) and ffg (global fast). These outliers represent high-yielding silicon, which can provide a 10-15% performance boost over more conservative silicon represented by ss and ff.
Voltage: The transistors in a piece of IP are expected to operate over an acceptable variance in voltage. Transistors operate faster at higher voltages and slower at lower voltages. The worst-case slow outlier is assumed to be -10% from the typical voltage, and the best-case fast outlier is assumed to be +10% from the typical voltage.
For example, if the typical, median voltage is 0.9V, then the best-case, high-voltage outlier is 0.99V, and the worst-case, low-voltage outlier is 0.81V.
Temperature: The transistors in a piece of IP are expected to operate over an acceptable variance in temperature, which is typically -40°C to 125°C. It is very important to note that the worst-case outlier for a process of 40nm or above is 125°C, but this switches to 0°C for processes under 40nm.
For power and performance analysis, it is very important to know where on the process, voltage, and temperature range the analysis was carried out.
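The designator and voltage rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the "t" (typical) letter is an assumption based on common industry usage rather than something stated in this guide, and the three-letter ssg and ffg designators are deliberately not handled.

```python
# Designator letters: the first letter is the n-channel MOSFET speed,
# the second letter is the p-channel MOSFET speed.
# "t" (typical) is an assumption; the text only names ss and ff.
SPEED_NAMES = {"s": "slow", "t": "typical", "f": "fast"}


def decode_silicon_speed(designator: str) -> tuple[str, str]:
    """Return (nmos_speed, pmos_speed) for a two-letter designator such as 'ss'."""
    if len(designator) != 2 or any(c not in SPEED_NAMES for c in designator):
        raise ValueError(f"unsupported designator: {designator!r}")
    return SPEED_NAMES[designator[0]], SPEED_NAMES[designator[1]]


def voltage_outliers(typical_v: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) voltage outliers at -/+ tolerance around typical."""
    low = round(typical_v * (1 - tolerance), 4)
    high = round(typical_v * (1 + tolerance), 4)
    return low, high


if __name__ == "__main__":
    print(decode_silicon_speed("ss"))  # ('slow', 'slow')
    print(voltage_outliers(0.9))       # (0.81, 0.99)
```

The default tolerance of 10% matches the assumption stated in the voltage description. A real analysis would take the tolerance from the foundry documentation for the chosen process.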
Worst-case corner
Next we will look at the concept of a worst-case corner, which assumes the most pessimistic scenario for all the ranges. The following figure shows a diagrammatic representation of worst-case corners for processes of 40nm or above, and for processes under 40nm:
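As a minimal sketch of how a worst-case corner could be selected, the following Python function combines the pessimistic value from each range. The `Corner` class and the `worst_case_corner` function are hypothetical names used for illustration only. The sketch assumes the ss silicon speed, the -10% voltage outlier, and the temperature switch at 40nm described earlier in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    silicon_speed: str    # two-letter designator, for example "ss"
    voltage_v: float      # supply voltage in volts
    temperature_c: float  # temperature in degrees Celsius


def worst_case_corner(process_nm: float, typical_v: float) -> Corner:
    """Pick the most pessimistic value on each range for a given process node."""
    # 125 C is the worst case at 40nm and above, 0 C below 40nm.
    temperature_c = 125.0 if process_nm >= 40 else 0.0
    # The slow voltage outlier is assumed to be -10% from typical.
    voltage_v = round(typical_v * 0.90, 4)
    return Corner("ss", voltage_v, temperature_c)


if __name__ == "__main__":
    print(worst_case_corner(40, 0.9))
    print(worst_case_corner(28, 0.9))
```

For a process that offers the ssg outlier, an analysis team might select that designator instead of ss. The sketch does not model that choice.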
How power figures are affected
The power usage for a piece of IP is always calculated using typical values for silicon speed, voltage, and temperature as opposed to corner cases. Corner cases use extreme values that are unlikely to occur in a real-world situation and are not used when calculating power usage. This means that in a real-world situation, particularly if you take steps to ensure normal conditions, the power usage should be close to the calculated values if a similar IP setup is used. However, if you require a power usage below what a piece of IP can give under typical conditions, you should:
- Decrease the power usage of the IP by adjusting the physical IP library variants or by choosing a different process or process option.
- Select a piece of IP from the Arm Flexible Access program which has lower power usage.
How performance figures are affected
The frequency obtainable for a piece of IP is always calculated using the worst-case corner for silicon speed, voltage, and temperature. This frequency value has a direct effect on any benchmark values that are calculated during the analysis. Using an IP setup similar to the one in the PPA analysis may give you better performance in a real situation, particularly if you take steps to ensure a consistent voltage or temperature. However, if you consistently require more performance than a piece of IP can give when the worst-case corners are used, you should:
- Increase the performance of the IP by adjusting the physical IP library variants or by choosing a different process or process option.
- Select a piece of IP from the Arm Flexible Access program which has better performance.
Margins and OCV derates
As we discussed previously, various factors affect the frequency, and therefore the achievable performance, of a piece of IP, for example:
- The options chosen for the IP
- The fab process and process options that are selected
- The physical IP libraries used in the implementation
- The worst-case corner for silicon speed, voltage, and temperature
Some other modifiers are discussed in the following subsections.
Fabs recommend setup and hold margins that provide a safety margin against potential failure in manufactured IP. When Arm produces physical implementations for the purpose of PPA analysis, these safety margins are added to all timing paths in the design. Incorporating the safety margins effectively reduces the performance while ensuring consistent operation. In addition to adhering to the foundry recommendations, the margins that Arm analysis teams add also consider clock period jitter. This is because the PLLs on which IP clocks rely might contain imperfections.
On-chip variation derates
An On-chip Variation (OCV) derate is a modification in performance which accounts for the effect of timing variation in the silicon that arises as part of the manufacturing process. As more advanced processes make the feasible size of manufactured IP smaller and smaller, the significance of on-chip variation increases. An OCV derate attempts to model natural variation in the manufacturing process so that the resulting PPA figures are more realistic.
Conclusions on the effect of margins and OCV derates
Taking margins and OCV derates into account gives an even more realistic view of the frequency obtainable by the IP than a figure derived from the worst-case corner alone. The final frequency figures generated for Arm PPA analysis include the applied margins and OCV derates. In some cases, a slightly more favorable frequency is also provided, which is the worst-case corner figure with no further adjustment.
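To illustrate why the adjusted frequency is lower than the corner-only frequency, here is a deliberately simplified first-order model in Python. It is an assumption made for illustration, not the method used by any timing tool: it treats the OCV derate as a single multiplier on the critical path delay and adds the setup margin and clock jitter directly to the required clock period. The function name and all of the numbers are hypothetical.

```python
def max_frequency_mhz(
    path_delay_ps: float,
    ocv_derate: float = 0.0,
    setup_margin_ps: float = 0.0,
    jitter_ps: float = 0.0,
) -> float:
    """Estimate the maximum clock frequency for one critical path.

    Simplified model (an assumption, not a tool algorithm):
    period = delay * (1 + derate) + setup margin + jitter
    """
    if path_delay_ps <= 0:
        raise ValueError("path_delay_ps must be positive")
    period_ps = path_delay_ps * (1.0 + ocv_derate) + setup_margin_ps + jitter_ps
    return 1e6 / period_ps  # a period of 1 ps corresponds to 1e6 MHz


if __name__ == "__main__":
    # Hypothetical worst-case corner path delay of 1000 ps.
    corner_only = max_frequency_mhz(1000.0)
    adjusted = max_frequency_mhz(
        1000.0, ocv_derate=0.05, setup_margin_ps=30.0, jitter_ps=20.0
    )
    print(f"corner only: {corner_only:.1f} MHz")
    print(f"with margins and derate: {adjusted:.1f} MHz")
```

In this model, any positive margin, jitter, or derate lengthens the required period, so the adjusted figure is always the more conservative of the two.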
Using the fab-recommended margins on all timing paths makes the physical implementations generated by Arm representative of IP on a realized SoC. However, the margins also have an impact on area. For example, in the case of a 28nm fab process, up to 10% of the total area can be a result of:
- Hold fixing on functional and scan paths, because of large fab-recommended hold margins
- Application of high OCV derates to timing paths
When you compare pieces of IP, you need to check whether these precautionary measures have been applied to each PPA analysis. If these measures are neglected, the result might be a favorable analysis, which leads to an unacceptable final product when the implementation is realized.
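One way to make this check systematic is to record the analysis conditions next to each set of PPA figures and compare them before comparing the figures themselves. The following Python sketch assumes a hypothetical `AnalysisConditions` record. The field names are illustrative and do not correspond to any Arm report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisConditions:
    corner: str                # free-form label, for example "ss/0.81V/0C"
    fab_margins_applied: bool  # fab-recommended setup and hold margins
    ocv_derates_applied: bool  # on-chip variation derates


def comparability_issues(a: AnalysisConditions, b: AnalysisConditions) -> list[str]:
    """List the differences that make two PPA analyses unsafe to compare directly."""
    issues = []
    if a.corner != b.corner:
        issues.append(f"corner differs: {a.corner} vs {b.corner}")
    if a.fab_margins_applied != b.fab_margins_applied:
        issues.append("fab-recommended margins applied to only one analysis")
    if a.ocv_derates_applied != b.ocv_derates_applied:
        issues.append("OCV derates applied to only one analysis")
    return issues


if __name__ == "__main__":
    ip_a = AnalysisConditions("ss/0.81V/0C", True, True)
    ip_b = AnalysisConditions("ss/0.81V/0C", False, True)
    for issue in comparability_issues(ip_a, ip_b):
        print(issue)
```

An empty list means the recorded conditions match. It does not guarantee that the two analyses are otherwise equivalent, for example in process or library choice.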