Impact of Parameter Variations on Multi-core Chips
E. Humenay, D. Tarjan, K. Skadron
2004, Kevin Skadron
Department of Computer Science, University of Virginia

Motivation
- Process variations are projected to severely impact the yield of high-performance semiconductors.
- Multi-core architectures have become the prevailing trend for future high-performance chips.
- Understanding how process variations interact with chip multiprocessors (CMPs) is therefore required.

Variation Types
PVT variations:
- Process
- Voltage
- Temperature

This work primarily focuses on process variations.

Process Variations
Process (P) variations stem from a variety of sources:
- Within-Die (WID)
- Die-to-Die (D2D)
- Wafer-to-Wafer (W2W)
- Core-to-Core (C2C)

WID Variations
WID variations can be further subdivided into:
- Systematic (WIDsys)
- Random (WIDrand)
Threshold voltage (Vth) and effective channel length (Leff) are the two parameters most susceptible to random variations.
Systematic variations cause parameter values to be spatially correlated; they can be modeled as either deterministic or random.
WID variations cause C2C variations.

Drain-Induced Barrier Lowering (DIBL)
- Ideally, Vth and Leff are independent of each other.
- The DIBL effect introduces a dependency:
$$V_{th} = V_{th0} - V_{DD}\, e^{-\mathrm{DIBL} \cdot L_{\mathrm{eff}}}$$
- Because sub-threshold leakage grows exponentially as Vth falls, DIBL creates an exponential dependency between Leff and sub-threshold leakage.
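To make the DIBL dependency concrete, the sketch below plugs the Vth(Leff) relation into the textbook sub-threshold relation I_sub ∝ exp(−Vth / (n·vT)). It is a minimal illustration under loudly stated assumptions: Vth0, VDD, n and vT are hypothetical values, Leff is taken in nm with the DIBL coefficient in 1/nm, and the W/L prefactor of the leakage current is ignored. None of these numbers come from this work.

```python
import math

# Hypothetical constants, chosen only for illustration (not from this work).
V_TH0 = 0.30     # long-channel threshold voltage, V
V_DD = 1.0       # supply voltage, V
N_SUB = 1.5      # sub-threshold slope factor n
V_T = 0.026      # thermal voltage kT/q near 300 K, V

def vth(leff_nm: float, dibl: float) -> float:
    """Threshold voltage with DIBL roll-off: Vth = Vth0 - VDD * exp(-DIBL * Leff).

    Assumes Leff in nm and the DIBL coefficient in 1/nm.
    """
    return V_TH0 - V_DD * math.exp(-dibl * leff_nm)

def rel_leakage(leff_nm: float, dibl: float) -> float:
    """Sub-threshold leakage up to a constant factor: I_sub ~ exp(-Vth / (n * vT)).

    The W/L prefactor is deliberately ignored to isolate the Vth effect.
    """
    return math.exp(-vth(leff_nm, dibl) / (N_SUB * V_T))

def leakage_spread(leff_min: float, leff_max: float, dibl: float) -> float:
    """Leakage at the shortest Leff divided by leakage at the longest Leff."""
    return rel_leakage(leff_min, dibl) / rel_leakage(leff_max, dibl)

if __name__ == "__main__":
    for dibl in (0.15, 0.14, 0.13):
        ratio = leakage_spread(25, 28, dibl)
        print(f"DIBL={dibl:.2f}: leakage(Leff=25) / leakage(Leff=28) = {ratio:.2f}")
```

Under these assumptions a shorter Leff lowers Vth and raises leakage, and a smaller DIBL coefficient in the exponent widens the leakage spread for the same Leff range.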

Modeling Methodology
- Estimating the impact of P variations on delay requires a critical path (CP) model.
- Prior CP models vary the inputs of an RC delay equation in Monte Carlo analyses; this simplicity comes at the expense of accuracy.

CP Modeling: Prior Work
- Fmax GCP model (Bowman, JSSC 02)
  - Ncp ~ number of critical paths
  - Lcp ~ number of gates in a critical path (logic depth)
- Marculescu (DAC 05)
  - Ncp ~ the stage's device count

Importance of Ncp
As Ncp increases, mean delay increases and delay variation decreases.
[Figure: distribution of normalized delay (x-axis) as count/samples (y-axis) for Ncp = 1, 2, 4, 16, and 128]
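The Ncp trend can be reproduced with a small Monte Carlo experiment in the spirit of Bowman-style critical-path models: each path averages Lcp independent Gaussian gate delays, and a stage is as slow as the slowest of its Ncp independent paths. This is a minimal sketch, not the model used in this work; the gate sigma, Lcp, sample count, seed, and function names are all hypothetical.

```python
import random
import statistics

def path_delay(lcp: int, sigma_gate: float, rng: random.Random) -> float:
    """Normalized delay of one critical path: the average of Lcp independent
    Gaussian gate delays with nominal value 1.0, so random variation partially
    averages out along the path (sigma_path = sigma_gate / sqrt(Lcp))."""
    return sum(rng.gauss(1.0, sigma_gate) for _ in range(lcp)) / lcp

def stage_delay(ncp: int, lcp: int, sigma_gate: float, rng: random.Random) -> float:
    """A stage is only as fast as the slowest of its Ncp independent critical paths."""
    return max(path_delay(lcp, sigma_gate, rng) for _ in range(ncp))

def delay_stats(ncp: int, lcp: int = 12, sigma_gate: float = 0.05,
                samples: int = 1000, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo mean and standard deviation of the normalized stage delay.

    lcp, sigma_gate, samples and seed are hypothetical illustration values.
    """
    rng = random.Random(seed)
    delays = [stage_delay(ncp, lcp, sigma_gate, rng) for _ in range(samples)]
    return statistics.mean(delays), statistics.stdev(delays)

if __name__ == "__main__":
    for ncp in (1, 2, 4, 16, 128):
        mean, std = delay_stats(ncp)
        print(f"Ncp={ncp:4d}  mean={mean:.4f}  std={std:.4f}")
```

Taking the maximum over more paths pushes the expected delay up while compressing its spread, which is the behavior the Ncp slide describes.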

Modified CP Model
Goal: describe each functional unit's delay distribution more accurately, in order to determine which functional units will affect the final frequency distribution.
Improvements:
- Considering wire delay when determining Lcp
- Better Ncp assignments
- Accounting for Weff, since random Vth variation scales as
$$\sigma_{V_{th}} \propto \frac{1}{\sqrt{W_{\mathrm{eff}}\, L_{\mathrm{eff}}}}$$

Modified CP Model
Each stage is categorized as either SRAM or combinational logic:
- SRAM: L1s, TLBs, Register File, Rename Map, Issue Queue
- Logic: Execution Units, Decode Stage, Issue Select

Type     Ncp   Lcp   Weff
SRAM     Hi    Lo    Lo/Hi
LOGIC    Lo    Hi    Hi
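The Weff dependence in the Modified CP Model is the area-scaling law for random Vth variation, σ(Vth) ∝ 1/√(Weff·Leff), often called Pelgrom's law. The sketch below shows why widening a device helps: a device five times the minimum width has its σ(Vth) reduced by a factor of √5 ≈ 2.24. The matching coefficient A_VT and the geometry are hypothetical values chosen only for illustration.

```python
import math

def sigma_vth_mv(a_vt_mv_um: float, w_um: float, l_um: float) -> float:
    """Pelgrom-style random Vth variation: sigma(Vth) = A_Vt / sqrt(W * L), in mV."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

# Hypothetical matching coefficient and device geometry (illustration only).
A_VT = 2.5       # mV * um
W_MIN = 0.1      # minimum device width, um
L_EFF = 0.025    # effective channel length, um

if __name__ == "__main__":
    s_min = sigma_vth_mv(A_VT, W_MIN, L_EFF)
    s_5x = sigma_vth_mv(A_VT, 5 * W_MIN, L_EFF)
    print(f"min width: {s_min:.1f} mV, 5x width: {s_5x:.1f} mV, ratio {s_5x / s_min:.3f}")
```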

SRAM Model
- A modified version of CACTI 4.0 is used to estimate the fraction of access time susceptible to device variations.
- Ncp ~ number of read ports.
- Weff depends on the unit type:
  - L1 caches are assumed to be optimized for area (minimum-sized Weff).
  - Time-critical SRAM units have larger widths (assumed 5x the minimum).
- Only variation in SRAM access time is considered.

Combinational Logic Model
- The logic model is based on a Sklansky adder.

- Delay is modeled with the Horowitz delay equation.
- The critical path is the carry circuitry.
- Weff is chosen to alleviate fan-out delay.
Each prefix cell merges the generate (G) and propagate (P) signals of bit groups i:k and k-1:j into group i:j:
$$G_{i:j} = G_{i:k} + P_{i:k}\, G_{k-1:j}, \qquad P_{i:j} = P_{i:k}\, P_{k-1:j}$$

WIDrand: SRAM Delay
- Because of its large Ncp, the L1 is likely to be the slowest SRAM unit.
- Nominal frequency is 3 GHz.
[Figure: distribution (count/samples) of % frequency slowdown due to random process variations for a 64KB L1, a 120-entry RF, and an 8KB TLB]

WIDrand: SRAM vs. Logic
- The L1 will also be slower than logic.
[Figure: distribution (count/samples) of % frequency slowdown due to random process variations for the 64b adder critical path and the 64KB L1 cache]

WIDsys Pattern
- The WIDsys model is derived from actual measurements (Friedberg, ISQED 05).
[Figure: Leff map of a 14mm x 14mm die with POWER4-like cores scaled to 45nm; Leff color scale from 25 to 28, with regions labeled "Fast, High-leakage" and "Slow, Low-leakage"]

Impact of WIDsys on Delay
- WIDsys can cause core-to-core frequency to differ by as much as 5%.

- Large Lcp values cause combinational logic units to be more affected by WIDsys variation.
[Figure: % frequency slowdown vs. % WID systematic variation in Leff for the 64KB L1 and logic]
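The prefix-cell recurrence from the Combinational Logic Model can be checked functionally with a short bit-level sketch of a Sklansky adder. It models logic only, not delay, and assumes a carry-in of 0; the function name is hypothetical. The number of prefix levels, ceil(log2 n), is what sets the carry path's logic depth.

```python
def sklansky_add(a: int, b: int, n: int = 64) -> tuple[int, int]:
    """Add two n-bit unsigned integers using a Sklansky parallel-prefix carry tree.

    Returns (sum mod 2**n, carry_out); carry-in is assumed to be 0.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # P_{i:i}, kept for the sum bits
    G = [x & y for x, y in zip(a_bits, b_bits)]   # G_{i:i}, becomes G_{i:0}
    P = p[:]                                      # P_{i:i}, becomes P_{i:0}
    span = 1
    while span < n:                               # ceil(log2 n) prefix levels
        for i in range(n):
            block = i // span
            if block % 2 == 1:
                # Bit i holds group i:k (k = block * span); it merges with the
                # top bit of the lower half-block, which holds group k-1:j.
                km1 = block * span - 1
                G[i] = G[i] | (P[i] & G[km1])     # G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}
                P[i] = P[i] & P[km1]              # P_{i:j} = P_{i:k} P_{k-1:j}
        span *= 2
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0       # carry into bit i is G_{i-1:0}
        total |= (p[i] ^ carry_in) << i
    return total, G[n - 1]

if __name__ == "__main__":
    print(sklansky_add(9, 9, n=4))
```

Bits in the lower half of each block are left untouched at a given level, so updating the lists in place never reads a value produced earlier in the same level.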

Random Leakage Variation
- WIDrand will not have an impact on leakage at the architectural level, since total leakage is an aggregate sum.
[Figure: distribution (count/samples) of normalized aggregate leakage for 1, 2, and 4 transistors]

C2C Leakage Variation

- The figure shows core leakage when considering all possible core locations on a die.
- Three different magnitudes of DIBL are considered (0.13, 0.14, and 0.15); BSIM suggests 0.15 (best case).
[Figure: number of core positions on the chip vs. normalized core leakage, for DIBL = 0.15, 0.14, and 0.13]
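The aggregation argument on the Random Leakage Variation slide can be checked with a small Monte Carlo sketch: each device gets an independent Gaussian Vth shift, so its leakage is lognormal, yet the relative spread of the summed leakage shrinks roughly as 1/√N. The per-device σ(Vth), the n·vT value, the sample count, and the function names are hypothetical illustration choices, not values from this work.

```python
import math
import random
import statistics

# Hypothetical parameters, for illustration only (not values from this work).
SIGMA_VTH = 0.030   # random Vth standard deviation per device, V
N_VT = 0.039        # sub-threshold slope factor n times thermal voltage vT, V

def normalized_aggregate_leakage(n_transistors: int, samples: int = 2000,
                                 seed: int = 7) -> list[float]:
    """Monte Carlo samples of the summed leakage of n_transistors devices.

    Each device leaks exp(-dVth / (n*vT)) with an independent Gaussian dVth,
    i.e. a lognormal leakage; every total is divided by the sample mean.
    """
    rng = random.Random(seed)
    totals = [
        sum(math.exp(-rng.gauss(0.0, SIGMA_VTH) / N_VT) for _ in range(n_transistors))
        for _ in range(samples)
    ]
    mean_total = statistics.mean(totals)
    return [t / mean_total for t in totals]

def relative_spread(n_transistors: int) -> float:
    """Standard deviation of the normalized aggregate leakage."""
    return statistics.stdev(normalized_aggregate_leakage(n_transistors))

if __name__ == "__main__":
    for n in (1, 2, 4, 256):
        print(f"{n:4d} transistors: relative spread = {relative_spread(n):.3f}")
```

With millions of transistors per core, the same averaging leaves random variation with little effect on total leakage, so the C2C leakage differences come from the systematic component instead.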

Conclusions
- L1 caches will determine the WID mean frequency; variations in other units will not directly affect the frequency distribution.
- Considering wire delay in the CP model causes device variations to have less impact on the frequency distribution.
- WID variations do not result in significant C2C frequency differences.
- At 45nm, C2C sub-threshold leakage variation may be as much as 45%.