How can higher performance increase the area of implemented IP?

This example examines how the area of a processor implementation is affected by the target frequency setting, the IP options selected, and the choice of physical IP libraries. Three implementations are compared: Minimum, Featured, and As Fast As Possible (AFAP). The comparison is useful because all three implementations use the same fab process, so differences in the results can be attributed to the implementation choices rather than to the silicon process.

The first three sections present the data for each implementation. Examine the data in these three sections before reading the breakdown of the results in the final section.

Minimum implementation example

The following tables contain PPA data that was obtained from a Minimum implementation, including data on the implementation decisions.

Power, Performance, Area
  Power                             Dynamic: 77.0 mW/GHz
                                    Static: 2.39 mW
  Performance                       Maximum frequency: 484.5 MHz
  Area                              0.330 mm² (Post-shrink)

Silicon process
  Fab                               TSMC
  Process                           28nm
  Process option                    HPM
  Post-shrink scale                 0.81 (0.9 x 0.9)

Physical IP
  Cell libraries                    Arm Standard Cell Libraries

  %       Kit     Channel length    Track size    Voltage threshold
  100     Base    C38               SC9           SVt

  Memory libraries                  Arm Artisan RF High-Density SP

IP config
  CPU                               Single core
  Level 1 caches                    Yes
  I-cache size                      32K
  D-cache size                      32K
  TCMs                              No
  I-TCM size                        N/A
  D-TCM size                        N/A
  Integrated interrupt controller   No
  IRQs                              N/A
  ACP                               No
  ECC                               No
  ETM                               No
  FPU                               No
  LLPP                              No
  MPU                               No
  Neon                              N/A
  SCU                               No

Process conditions
                    For power figures    For performance figures
  Silicon speed     Typical (tt)         Slow (ss)
  Voltage           0.9V                 0.81V
  Temperature       85°C                 0°C

  Margin: 50ps
  OCV: 8%
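
The post-shrink scale in the silicon process table can be read as a 0.9 linear shrink applied in each of two dimensions, which multiplies area by 0.81. The short Python sketch below reproduces that arithmetic. The helper `pre_shrink_area` and the pre-shrink figure it prints are illustrative assumptions and do not appear in the source tables.

```python
# Post-shrink scale: a 0.9 linear shrink in each of two dimensions.
LINEAR_SHRINK = 0.9
AREA_SCALE = LINEAR_SHRINK * LINEAR_SHRINK  # about 0.81, as listed in the table


def pre_shrink_area(post_shrink_mm2: float, scale: float = AREA_SCALE) -> float:
    """Hypothetical helper: estimate the pre-shrink area from a post-shrink figure.

    Assumes the reported area is simply the pre-shrink area multiplied by the scale.
    """
    return post_shrink_mm2 / scale


minimum_post_shrink_mm2 = 0.330  # from the Minimum PPA table

print(f"area scale: {AREA_SCALE:.2f}")
print(f"estimated pre-shrink area: {pre_shrink_area(minimum_post_shrink_mm2):.3f} mm²")
```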

Featured implementation example

The following tables contain PPA data that was obtained from a Featured implementation, including data on the implementation decisions.

Power, Performance, Area
  Power                             Dynamic: 123.2 mW/GHz
                                    Static: 13.051 mW
  Performance                       Maximum frequency: 1000 MHz
  Area                              0.837 mm² (Post-shrink)

Silicon process
  Fab                               TSMC
  Process                           28nm
  Process option                    HPM
  Post-shrink scale                 0.81 (0.9 x 0.9)

Physical IP
  Cell libraries                    Arm Standard Cell Libraries, Arm High Performance Kit

  %       Kit     Channel length    Track size    Voltage threshold
  16.5    Base    C31               SC9           SVt
  8.8     Base    C35               SC9           SVt
  28.5    Base    C38               SC9           SVt
  33.7    Base    C35               SC9           HVt
  3.4     HPK     C31               SC9           SVt
  1.7     HPK     C35               SC9           SVt
  4.2     HPK     C38               SC9           SVt
  3.1     HPK     C35               SC9           HVt

  Memory libraries                  Arm Artisan RF High-Speed SP

IP config
  CPU                               Single core
  Level 1 caches                    Yes
  I-cache size                      32K
  D-cache size                      32K
  TCMs                              Yes
  I-TCM size                        32K
  D-TCM size                        32K
  Integrated interrupt controller   Yes
  IRQs                              32
  ACP                               Yes
  ECC                               Yes
  ETM                               Yes
  FPU                               Yes
  LLPP                              No
  MPU                               No
  Neon                              N/A
  SCU                               No

Process conditions
                    For power figures    For performance figures
  Silicon speed     Typical (tt)         Slow (ss)
  Voltage           0.9V                 0.81V
  Temperature       85°C                 0°C

  Margin: 50ps
  OCV: 8%

As Fast As Possible implementation example

The following tables contain PPA data that was obtained from an AFAP implementation, including data on the implementation decisions.

Power, Performance, Area
  Power                             Dynamic: 143.0 mW/GHz
                                    Static: 92.6 mW
  Performance                       Maximum frequency: 1538 MHz
  Area                              0.628 mm² (Post-shrink)

Silicon process
  Fab                               TSMC
  Process                           28nm
  Process option                    HPM
  Post-shrink scale                 0.81 (0.9 x 0.9)

Physical IP
  Cell libraries                    Arm Standard Cell Libraries, Arm High Performance Kit

  %       Kit     Channel length    Track size    Voltage threshold
  60.5    Base    C31               SC12          SVt
  23.5    Base    C31               SC12          LVt
  5.2     HPK     C31               SC12          SVt
  10.8    HPK     C31               SC12          LVt

  Memory libraries                  Arm Artisan Fast Cache Instances

IP config
  CPU                               Single core
  Level 1 caches                    Yes
  I-cache size                      32K
  D-cache size                      32K
  TCMs                              No
  I-TCM size                        N/A
  D-TCM size                        N/A
  Integrated interrupt controller   No
  IRQs                              N/A
  ACP                               No
  ECC                               No
  ETM                               No
  FPU                               No
  LLPP                              No
  MPU                               No
  Neon                              N/A
  SCU                               No

Process conditions
                    For power figures    For performance figures
  Silicon speed     Typical (tt)         Slow (ss)
  Voltage           0.9V                 0.81V
  Temperature       85°C                 0°C

  Margin: 50ps
  OCV: 8%

Breakdown of results

As expected, the Featured implementation has the largest area (0.837 mm²), because the additional IP options increase the number of cells that are required. However, the fact that the AFAP implementation (0.628 mm²) is larger than the Minimum implementation (0.330 mm²) may seem unusual, especially because both implementations use minimal IP options.
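
To put numbers on this comparison, the following sketch normalizes the post-shrink area and maximum frequency of each implementation to the Minimum implementation. The values are copied from the PPA tables in the earlier sections. The dictionary layout and the `relative_to` helper are hypothetical names used only for illustration.

```python
# Post-shrink areas (mm²) and maximum frequencies (MHz) from the PPA tables.
PPA = {
    "Minimum": {"area_mm2": 0.330, "fmax_mhz": 484.5},
    "Featured": {"area_mm2": 0.837, "fmax_mhz": 1000.0},
    "AFAP": {"area_mm2": 0.628, "fmax_mhz": 1538.0},
}


def relative_to(baseline: str, key: str) -> dict:
    """Hypothetical helper: express one metric as a multiple of the baseline's value."""
    base = PPA[baseline][key]
    return {name: row[key] / base for name, row in PPA.items()}


area_ratio = relative_to("Minimum", "area_mm2")
fmax_ratio = relative_to("Minimum", "fmax_mhz")

for name in PPA:
    print(f"{name:9s} area x{area_ratio[name]:.2f}  fmax x{fmax_ratio[name]:.2f}")
```

With these figures, the AFAP implementation comes out at roughly 1.9 times the area of the Minimum implementation for roughly 3.2 times the maximum frequency.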

Notice that the Featured implementation takes some steps to achieve a level of performance above that of the Minimum implementation. For example, although not to the level of the AFAP implementation, the Featured implementation employs hybridization when selecting the cell libraries. By doing so, the Featured implementation allows high-performing cells to be selected for critical paths. In this sense, the Featured implementation aims for a configuration that could have a practical use.

The AFAP implementation has a larger area because high-performance cells require more area. Channel length is not the cause. The Minimum implementation uses a longer channel length (C38) than the AFAP implementation, which exclusively uses C31 cells, and the longer channel length lowers the power consumption of the Minimum implementation. However, channel length does not change the size of footprint-compatible cells, so the shorter channel length that the AFAP implementation uses for increased performance does not change its area.

The Minimum implementation uses a single cell library across the board. This is because, if minimum power usage is a priority, there is no need to speed up any specific groups of cells. In contrast, the Featured and AFAP implementations use higher performing cells for critical paths. Regarding performance boosting, the AFAP goes further by:

  • Exclusively using C31 cells
  • Using cells with a track size of 12 (SC12). Cells with a higher track count are larger and support higher performance.
  • Using cells with a low voltage threshold. Low voltage threshold cells are faster but consume more static power.
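
The differences listed above can be checked against the cell library tables. The sketch below, with values transcribed from those tables, totals the percentage of cells by kit, channel length, track size, or voltage threshold. The `CELL_MIX` structure and the `share_by` helper are hypothetical names introduced for this illustration.

```python
from collections import defaultdict

# Rows are (percent, kit, channel length, track size, voltage threshold),
# transcribed from the cell library tables of each implementation.
CELL_MIX = {
    "Minimum": [(100.0, "Base", "C38", "SC9", "SVt")],
    "Featured": [
        (16.5, "Base", "C31", "SC9", "SVt"),
        (8.8, "Base", "C35", "SC9", "SVt"),
        (28.5, "Base", "C38", "SC9", "SVt"),
        (33.7, "Base", "C35", "SC9", "HVt"),
        (3.4, "HPK", "C31", "SC9", "SVt"),
        (1.7, "HPK", "C35", "SC9", "SVt"),
        (4.2, "HPK", "C38", "SC9", "SVt"),
        (3.1, "HPK", "C35", "SC9", "HVt"),
    ],
    "AFAP": [
        (60.5, "Base", "C31", "SC12", "SVt"),
        (23.5, "Base", "C31", "SC12", "LVt"),
        (5.2, "HPK", "C31", "SC12", "SVt"),
        (10.8, "HPK", "C31", "SC12", "LVt"),
    ],
}
FIELDS = ("kit", "channel", "track", "vt")


def share_by(impl: str, field: str) -> dict:
    """Hypothetical helper: sum the cell percentages per value of one attribute."""
    column = FIELDS.index(field) + 1  # column 0 holds the percentage
    totals = defaultdict(float)
    for row in CELL_MIX[impl]:
        totals[row[column]] += row[0]
    return dict(totals)


for impl in CELL_MIX:
    for field in FIELDS:
        shares = {k: round(v, 1) for k, v in share_by(impl, field).items()}
        print(f"{impl:9s} {field:8s} {shares}")
```

For the AFAP implementation this gives 100% C31 and SC12 cells, with about 34.3% of cells using LVt.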

Two things are responsible for pushing up the area of the AFAP implementation:

  • The high track size
  • The ambitious target frequency for the processor, which means that the EDA tools use larger, higher drive strength cells from the libraries

The Minimum implementation is best in terms of power usage. Its longer channel length, smaller track size, and avoidance of low voltage threshold cells all contribute to lower power usage. Also, when the demands for performance are lower, higher density cells can be used, thereby reducing the area.
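
As a rough illustration of the power differences, the sketch below combines the dynamic and static figures from the PPA tables into a single estimate at each implementation's maximum frequency. This is an assumption made for illustration only: the power figures are quoted for typical silicon at 0.9V and 85°C, while the maximum frequencies are quoted for slow silicon at 0.81V and 0°C, so the result is an indicative number and not a figure from the source. The `total_power_mw` helper is a hypothetical name.

```python
# (dynamic mW/GHz, static mW, maximum frequency MHz) from the PPA tables.
PPA_POWER = {
    "Minimum": (77.0, 2.39, 484.5),
    "Featured": (123.2, 13.051, 1000.0),
    "AFAP": (143.0, 92.6, 1538.0),
}


def total_power_mw(dynamic_mw_per_ghz: float, static_mw: float, freq_mhz: float) -> float:
    """Hypothetical helper: dynamic power scaled by frequency, plus static power.

    Assumes dynamic power scales linearly with clock frequency.
    """
    return dynamic_mw_per_ghz * (freq_mhz / 1000.0) + static_mw


estimates = {name: total_power_mw(*row) for name, row in PPA_POWER.items()}

for name, power in estimates.items():
    print(f"{name:9s} about {power:.1f} mW at maximum frequency")
```

Under these assumptions, the estimate is about 40 mW for the Minimum implementation, 136 mW for the Featured implementation, and 313 mW for the AFAP implementation.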

