Low Power Design Methodologies and Flows
Jerry Frenkil
Jan M. Rabaey
Slide 12.1
Low Power Design Methodology Motivations
Minimize power
Reduce power in various modes of device operation
Dynamic power, leakage power, or total power
Minimize time
Reduce power quickly
Complete the design in as little time as possible
Prevent downstream issues caused by low-power design (LPD) techniques
Avoid complicating timing and functional verification
Minimize effort
Reduce power efficiently
Complete the design with as few resources as possible
Prevent downstream issues caused by LPD techniques
Avoid complicating timing and functional verification
Slide 12.2
Methodology Issues
Power Characterization and Modeling
How to generate macro-model power data?
Model accuracy
Power Analysis
When to analyze?
Which modes to analyze?
How to use the data?
Power Reduction
Logical modes of operation
For which modes should power be reduced?
Dynamic power versus leakage power
Physical design implications
Functional and timing verification
Return on Investment
How much power is reduced for the extra effort? Extra logic? Extra area?
Power Integrity
Peak instantaneous power
Electromigration
Impact on timing
Slide 12.3
Some Methodology Reflections
Slide 12.4
Power Characterization and Modeling
Slide 12.5
Power Characterization and Modeling
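[Figure: power characterization flow. The load-switching current (IL into CL), short-circuit current (Isc), and leakage current (Ileakage) of each cell are characterized into a database of raw power data, which a power modeler converts into power models.]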
Slide 12.6
Generalized Low-Power Design Flow
[Table: design phases and their low-power design activities]
Slide 12.7
Power-Analysis Methodology
Motivation
Determine ASAP if the design will meet the power spec
Identify opportunities for power reduction, if needed
Method
Set up regular, automatic power analysis runs (nightly, weekly)
Run regular power analysis regressions as soon as a simulation environment is ready
Initially can re-use functional verification tests
Add targeted mode- and module-specific tests to increase coverage
Compare analysis results against design spec
Check against spec for different operational modes (idle, xmit, rcv)
Compare analysis results against previous analysis results
Identify power mistakes: changes or fixes that result in increased power
Identify opportunities for power reduction
Slide 12.8
Power Analysis Methodology Issues
Development phases
System
Description available early in the design cycle
Least accurate but fastest turnaround times (if synthesizing ESL to RTL)
Design
Most common design representation
Easy to identify power-saving opportunities
Power results can be associated with specific lines of code
Implementation
Gate-level design available late in the design cycle
Slowest turnaround times (due to lengthy gate-level simulations) but most accurate results
Difficult to interpret results for identifying power-saving opportunities
Can't see the forest for the trees
Availability of data
When are simulation traces available?
When is parasitic data available?
Slide 12.9
System-Phase Analysis Methodology
[Figure: system-phase analysis flow. Inputs: ESL simulation stimulus, ESL code, IP power models, environment data, and technology data. Outputs: RTL code and power reports.]
Slide 12.10
Design-Phase Analysis Methodology
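[Figure: design-phase analysis flow. RTL stimulus for each mode (mode 1 to mode n) drives RTL simulation of the RTL design data, producing activity data for each mode; RTL power analysis combines the activity data with IP power models, environment data, and technology data to produce power reports for each mode.]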
Slide 12.11
Implementation-Phase Analysis
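[Figure: implementation-phase analysis flow. RTL stimulus for each mode (mode 1 to mode n) drives gate-level simulation of the gate-level netlist, producing activity data for each mode; gate-level power analysis combines the activity data with design data, IP power models, environment data, and technology data to produce power reports for each mode.]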
Slide 12.12
Power Analysis over Project Duration
Slide 12.13
System-Phase Low-Power Design
Primary objectives: minimize feff and VDD
Modes
Modes enable power to track workload
Software programmable; set/controlled by OS
Hardware component needed to facilitate control
Software timers and protocols needed to determine when to change modes and how long to stay in a mode
Challenges
Evaluating different alternatives
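To make the "how long to stay in a mode" question concrete, here is a first-order break-even estimate. It is a common simplification rather than something stated on these slides: it ignores transition latency and assumes the energy E_tr needed to enter and later exit the low-power mode is known (E_tr, P_idle, and P_sleep are hypothetical symbols).

```latex
\begin{aligned}
E_{\text{stay}}  &= P_{\text{idle}}\,T \\
E_{\text{sleep}} &= E_{\text{tr}} + P_{\text{sleep}}\,T \\
E_{\text{sleep}} < E_{\text{stay}}
  &\iff T > T_{\text{be}} = \frac{E_{\text{tr}}}{P_{\text{idle}} - P_{\text{sleep}}}
\end{aligned}
```

A software timer that predicts an idle interval shorter than T_be should leave the device in its current mode, since switching would cost more energy than it saves.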
Slide 12.14
Power-Down Modes Example
Slide 12.15
Parallelism and Pipelining Example
Concept: maintain performance with reduced VDD
Total area increases but each data path works less in each cycle
VDD can be reduced such that the work requires the full cycle time
Cycle time remains the same, but with reduced VDD
Pipelining a data path
Power can be reduced by 50% or more
Modest area overhead due to additional registers
Paralleling a data path
Power can be reduced by 50% or more
Significant area overhead due to paralleled logic
Multiple CPU cores
Enables multi-threaded performance gains with a constrained VDD
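A first-order sketch of the parallel case, using the dynamic-power expression C·VDD²·f that the slides use elsewhere. It assumes the duplicated data path exactly doubles the switched capacitance (no mux or routing overhead) and that halving the per-path rate lets the supply drop to VDD′ = 0.7·VDD; the 0.7 factor is an illustrative assumption, not a value from the slides.

```latex
\begin{aligned}
P_{\text{ref}} &= C\,V_{DD}^{2}\,f \\
P_{\text{par}} &\approx (2C)\,\bigl(V_{DD}'\bigr)^{2}\,\frac{f}{2} = C\,\bigl(V_{DD}'\bigr)^{2}\,f \\
\frac{P_{\text{par}}}{P_{\text{ref}}} &\approx \left(\frac{V_{DD}'}{V_{DD}}\right)^{2} = 0.7^{2} = 0.49
\end{aligned}
```

That is roughly a 51% reduction, in line with "50% or more"; mux and routing overhead eat into it. Pipelining follows the same (VDD′/VDD)² scaling with f unchanged, at the cost of the added register capacitance.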
Slide 12.16
System-Phase Low-Power Design Flow
Slide 12.17
Design-Phase Low-Power Design
Clock gating
Reduces / inhibits unnecessary clocking
Registers need not be clocked if data input hasn't changed
Data gating
Prevents nets from toggling when results won't be used
Reduces wasted operations
Slide 12.18
Clock Gating
Power is reduced by two mechanisms
Clock net toggles less frequently, reducing feff
Registers' internal clock buffering switches less often
[Figure: clock-gating example: a register (din to dout) feeding a memory, with control logic generating the enables en and enM that gate clk.]
Slide 12.19
Clock-Gating Insertion
Slide 12.20
Clock Gating Verilog Code
Conventional RTL Code
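// Conventional: the register is clocked every cycle; when enable is low,
// synthesis inserts a feedback mux so q simply reloads its own value
always @(posedge clk) begin  // form the flip-flop
  if (enable) q <= din;       // nonblocking assignment for sequential logic
end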
Slide 12.21
Clock Gating: Glitchfree Verilog
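[Figure: latch-based glitch-free clock gate. The enable drives the d input of a latch that is transparent while clk is low (gn); the latch output en_out is ANDed with clk in gate G1 to produce the gated clock gclk.]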
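A behavioral sketch of the latch-plus-AND clock gate on this slide, followed by the gated counterpart of the conventional register code. The module and port names are hypothetical; in practice synthesis usually maps this structure to a library integrated clock-gating cell rather than building it from a discrete latch and AND gate.

```verilog
// Latch-based glitch-free clock gate (behavioral sketch).
module clock_gate_latch (
  input  wire clk,
  input  wire enable,
  output wire gclk
);
  reg en_out;
  // Latch is transparent while clk is low (the gn input), so enable may
  // change freely then; en_out is frozen for the whole high phase of clk
  always @(clk or enable)
    if (!clk) en_out <= enable;
  // G1: AND gate; gclk either follows a complete high phase of clk or stays low
  assign gclk = clk & en_out;
endmodule

// Gated register: clocked only when enable is high, so no feedback mux
// is needed and the flip-flop's internal clock buffers stay idle otherwise
module gated_reg #(parameter W = 8) (
  input  wire         clk,
  input  wire         enable,
  input  wire [W-1:0] din,
  output reg  [W-1:0] q
);
  wire gclk;
  clock_gate_latch cg (.clk(clk), .enable(enable), .gclk(gclk));
  always @(posedge gclk) q <= din;
endmodule
```

The latch is what makes the gate glitch-free: a plain AND of clk and enable would pass any enable transition that occurs while clk is high straight onto gclk.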
Slide 12.22
Data Gating
Objective
Reduce wasted operations to reduce feff
Example
Multiplier whose inputs change every cycle, whose output conditionally feeds an ALU
Low-Power Version
Inputs are prevented from rippling through the multiplier if the multiplier output is not selected
Slide 12.23
Data-Gating Insertion
Issues
Extra logic in data path slows timing
Additional area due to gating cells
Slide 12.24
Data-Gating Verilog Code: Operand Isolation
Conventional Code
assign muxout = sel ? A : A*B ; // build mux
[Figure: sel selects between A and the multiplier output A × B to produce muxout.]
Low-Power Code
assign multinA = {W{~sel}} & A ; // build AND gate: zero the operand when sel = 1 (product unused); W = operand width
assign multinB = {W{~sel}} & B ; // build AND gate
assign muxout = sel ? A : multinA*multinB ;
[Figure: the same circuit with AND gates, controlled by sel, on the multiplier inputs.]
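A self-contained sketch of the operand-isolation pattern, assuming unsigned W-bit operands (W and the module name are hypothetical). Because the mux passes A when sel = 1, the product is unused in that case, so the AND gates are driven by ~sel. The 1-bit control must also be replicated across the operand width: ANDing a 1-bit signal with a W-bit vector zero-extends it and would mask only bit 0.

```verilog
// Operand isolation (data gating): when sel = 1 the multiplier inputs are
// forced to zero, so they stop toggling and the multiplier stays quiet.
module mult_mux_isolated #(parameter W = 8) (
  input  wire [W-1:0]   A,
  input  wire [W-1:0]   B,
  input  wire           sel,
  output wire [2*W-1:0] muxout
);
  // AND gates: replicate ~sel to the full operand width
  wire [W-1:0] multinA = {W{~sel}} & A;
  wire [W-1:0] multinB = {W{~sel}} & B;

  // The product is only observed when sel = 0, where multinA == A and
  // multinB == B, so the function matches the conventional code
  assign muxout = sel ? {{W{1'b0}}, A} : multinA * multinB;
endmodule
```

The isolation gates sit in front of the multiplier, which is why the slide on data-gating insertion lists extra data-path delay and area among its costs.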
Slide 12.25
Memory System Design
Slide 12.26
Split Memory Access
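[Figure: split memory access. A 32K × 32 memory is built from two 16K × 32 RAMs that share din, write, noe (output enable), and addr[14:1]; addr[0] selects which RAM is accessed, and a flip-flop clocked by clock registers addr[0] as pre_addr, which steers the 32-bit dout.]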
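A behavioral sketch of the split access on this slide, assuming synchronous RAMs with one-cycle read latency and a per-bank enable (the bank_en, ram0, and ram1 names are hypothetical). Only the bank selected by addr[0] is enabled on each access, so roughly half of the array stays idle; the registered pre_addr then selects the output because read data arrives one cycle after the address.

```verilog
// 32K x 32 memory as two 16K x 32 banks: addr[0] picks the bank,
// addr[14:1] indexes within it, and only the selected bank is active.
module split_mem (
  input  wire        clock,
  input  wire        write,
  input  wire [14:0] addr,
  input  wire [31:0] din,
  output wire [31:0] dout
);
  reg [31:0] ram0 [0:16383];  // even addresses (addr[0] = 0)
  reg [31:0] ram1 [0:16383];  // odd addresses  (addr[0] = 1)
  reg [31:0] q0, q1;          // registered read data of each bank
  reg        pre_addr;        // addr[0] delayed to match the read latency

  wire bank_en0 = ~addr[0];
  wire bank_en1 =  addr[0];

  always @(posedge clock) begin
    if (bank_en0) begin       // bank 0 only switches when it is addressed
      if (write) ram0[addr[14:1]] <= din;
      else       q0 <= ram0[addr[14:1]];
    end
    if (bank_en1) begin       // bank 1 only switches when it is addressed
      if (write) ram1[addr[14:1]] <= din;
      else       q1 <= ram1[addr[14:1]];
    end
    pre_addr <= addr[0];
  end

  // Read data appears one cycle after the address, so select with pre_addr
  assign dout = pre_addr ? q1 : q0;
endmodule
```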
Slide 12.27
Implementation-Phase Low-Power Design
Low-power synthesis
Dynamic power reduction via local clock gating insertion, pin-swapping
Slack redistribution
Reduces dynamic and/or leakage power
Power gating
Largest reductions in leakage power
Slide 12.28
Slack Redistribution
Objective
Reduce dynamic power, leakage power, or both by trading off positive timing slack
Physical-level optimization
Best optimized post-route
Must be noise-aware
[Figure: histogram of number of paths versus slack, before and after (post-optimized) slack redistribution]
Slide 12.29
Dynamic Power Optimization: Cell Resizing
Positive-Slack Trade-Off for Reduced Dynamic Power
Objective: reduce dynamic power where speed is not needed
Optimization performed post-route for optimum results
Cells along paths with positive slack replaced with lower-drive cells
Switching currents, input capacitances, and area are all reduced
Incremental re-route required: new cells may have footprints different from the previous cells
[Figure: cells on a positive-slack path downsized from 2x to 1x drive strength; cells on the remaining paths keep 2x drive.]
Slide 12.30
Leakage Power Optimization: Multi-VTH
Trade Off Positive Slack for Reduced Leakage Power
Objective: reduce leakage power where speed is not needed
Optimization performed post-route for optimum results
Cells along paths with positive slack replaced with High-VTH cells
Leakage currents reduced where timing margins permit
Re-routing not required: new cells have the same footprint as the previous cells
[Figure: low-VTH (L) cells on a positive-slack path replaced by high-VTH (H) cells; cells on the remaining paths stay low-VTH.]
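Why the swap pays off: subthreshold leakage depends exponentially on VTH. In the relation below S is the subthreshold swing (mV per decade of current); the 100 mV values are illustrative assumptions, not numbers from the slides, and gate leakage is ignored.

```latex
\begin{aligned}
I_{\text{sub}} &\propto 10^{-V_{TH}/S} \\
\frac{I_{\text{sub}}(V_{TH} + \Delta V_{TH})}{I_{\text{sub}}(V_{TH})} &= 10^{-\Delta V_{TH}/S} \\
\Delta V_{TH} = 100\ \text{mV},\ S = 100\ \text{mV/decade}
  &\;\Rightarrow\; 10^{-1}
\end{aligned}
```

A roughly 10× leakage reduction per swapped cell comes with a slower cell, which is why only cells on positive-slack paths are replaced.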
Slide 12.31
Slack Redistribution Flows
[Figure: slack redistribution flows: each power-reduction step is followed by a timing- and noise-aware check, looping back until timing and noise are OK.]
Slide 12.32
Slack Redistribution: Trade-Offs and Issues
Yield
Slack redistribution effectively turns non-critical paths into critical or semi-critical paths
Increased sensitivity to process variation and speed faults
Libraries
Cell resizing needs a fine granularity of drive strengths for best optimization results, which means more cells in the library
Multi-VTH requires an additional library for each additional VTH
Iterative loops
Timing and noise must be re-verified after each optimization
Both optimizations increase noise and glitch sensitivities
Done late in the design process
Difficult to predict in advance how much power will be saved
Very much dependent upon design characteristics
Slide 12.33
Power Gating
Objective
Reduce leakage currents by inserting a switch transistor (usually high-VTH) into the logic stack (usually low-VTH)
Switch transistors change the bias points (VSB) of the logic transistors
Most effective for systems with standby operational modes
1 to 3 orders of magnitude leakage reduction possible
But switches add many complications
[Figure: logic cells connected between VDD and a virtual ground; a switch cell controlled by sleep connects the virtual ground to ground.]
Slide 12.34
Power Gating: Physical Design
Switch placement
In each cell?
Very large area overhead, but placement and routing are easy
Grid of switches?
Area-efficient, but a third global rail must be routed
Ring of switches?
Useful for hard layout blocks, but area overhead can be significant
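[Figure: switch placement options: switches integrated within each cell; a grid of switch cells connecting the global supply to virtual grounds; and a ring of switches connecting the global supply to a virtual module supply.]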
Slide 12.35
Power Gating: Switch Sizing
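[Figure: switch sizing trade-offs: switch cell area (µm²) plotted against leakage current ILKG, delay tD, maximum virtual-ground voltage Vvg_max (mV), and maximum virtual-ground length Lvg_max (µm).]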
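A first-order view of the sizing trade-off. The symbols are assumptions for illustration: W is the total switch width, R_0 the on-resistance of a unit-width switch, and I_active the peak current drawn by the gated block.

```latex
\begin{aligned}
R_{\text{sw}} &\approx \frac{R_0}{W} \\
V_{vg} &\approx I_{\text{active}}\,R_{\text{sw}} \le V_{vg,\max}
  \quad\Longrightarrow\quad W \ge \frac{I_{\text{active}}\,R_0}{V_{vg,\max}} \\
I_{\text{LKG}} &\propto W, \qquad \text{Area} \propto W
\end{aligned}
```

A wider switch lowers the virtual-ground rise and therefore the delay penalty tD, but it increases area and the leakage that flows through the switch in sleep mode; long virtual-ground rails add their own resistance, which is why Lvg_max also appears among the sizing parameters.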
Slide 12.36
Power Gating: Additional Issues
Library design: special cells are needed
Switches, isolation cells, state retention flip-flops (SRFFs)
Headers or Footers?
Headers are better for gate-leakage reduction but are ~2X larger
Which modules, and how many, to be power gated?
Sleep control signal must be available, or must be created
State retention: which registers must retain state?
Large area overhead for using SRFFs
Floating signal prevention
Power-gate outputs that drive always-on blocks must not float
Rush currents and wake-up time
Rush currents must settle quickly and not disrupt circuit operation
Delay effects and timing verification
Switches affect source voltages which affect delays
Power-up & power-down sequencing
Controller must be designed and sequencing verified
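A minimal sketch of a power-down and power-up sequencer that ties together the isolation, state-retention, switch, and rush-current issues listed above. The ordering (isolate, save, switch off; then switch on, wait, restore, release isolation) is a common convention; the port names, the one-cycle save/restore pulses, and the fixed SETTLE wait standing in for rush-current settling are hypothetical assumptions.

```verilog
// Power-gating controller FSM (sketch).
module pg_controller #(parameter SETTLE = 4) (
  input  wire clk,
  input  wire rst_n,
  input  wire sleep_req,   // request to power down the gated block
  input  wire wake_req,    // request to power it back up
  output reg  iso_en,      // clamp gated-block outputs so always-on logic never sees floating nets
  output reg  save,        // one-cycle pulse: copy state into retention flip-flops
  output reg  restore,     // one-cycle pulse: copy state back after wake-up
  output reg  sleep,       // 1 = switch transistors off
  output wire active       // gated block is powered and un-isolated
);
  localparam ON = 3'd0, ISO = 3'd1, SAVE = 3'd2, OFF = 3'd3,
             WAKE = 3'd4, RESTORE = 3'd5, RELEASE = 3'd6;
  reg [2:0] state;
  reg [7:0] count;

  assign active = (state == ON);

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state <= ON; iso_en <= 1'b0; save <= 1'b0; restore <= 1'b0;
      sleep <= 1'b0; count <= 8'd0;
    end else begin
      save <= 1'b0; restore <= 1'b0;   // default: pulses last one cycle
      case (state)
        ON:      if (sleep_req) begin iso_en <= 1'b1; state <= ISO; end
        ISO:     begin save <= 1'b1; state <= SAVE; end     // outputs clamped first
        SAVE:    begin sleep <= 1'b1; state <= OFF; end     // then cut power
        OFF:     if (wake_req) begin
                   sleep <= 1'b0; count <= 8'd0; state <= WAKE;
                 end
        WAKE:    if (count == SETTLE - 1) state <= RESTORE; // hold SETTLE cycles for rush current
                 else count <= count + 8'd1;
        RESTORE: begin restore <= 1'b1; state <= RELEASE; end
        RELEASE: begin iso_en <= 1'b0; state <= ON; end     // un-clamp last
        default: state <= ON;
      endcase
    end
  end
endmodule
```

Keeping isolation asserted until after the restore pulse means the always-on logic never observes the gated block's outputs while its supply is ramping or its state is incomplete.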
Slide 12.37
Power Gating Flow
[Figure: power-gating flow: design power-gating library cells, determine floorplan, determine state-retention mechanism, design power-gating controller, clock tree synthesis, and verify virtual-rail electrical characteristics.]
Slide 12.38
Multi-VDD
Objective
Reduce dynamic power by reducing the VDD² term
Higher supply voltage used for speed-critical logic
Lower supply voltage used for non-speed-critical logic
Example
Memory VDD = 1.2 V
Logic VDD = 1.0 V
Logic dynamic power savings = 30%
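The arithmetic behind the 30% figure, assuming the baseline runs the logic at the memory's 1.2 V and that only VDD changes (switched capacitance and frequency of the logic held constant):

```latex
\frac{P_{\text{logic}}(1.0\ \text{V})}{P_{\text{logic}}(1.2\ \text{V})}
  = \left(\frac{1.0}{1.2}\right)^{2} = \frac{1}{1.44} \approx 0.694
\quad\Longrightarrow\quad
\text{savings} \approx 1 - 0.694 \approx 31\%
```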
Slide 12.39
Multi-VDD Issues
Partitioning
Which blocks and modules should use which voltages?
Physical and logical hierarchies should match as much as possible
Voltages
Voltages should be as low as possible to minimize C·VDD²·f
Voltages must be high enough to meet timing specs
Level shifters
Needed (generally) to buffer signals crossing islands
May be omitted if voltage differences are small (~100 mV)
Added delays must be considered
Physical design
Multiple VDD rails must be considered during floorplanning
Timing verification
Sign-off timing verification must be performed for all corner cases across voltage islands
For example, with two voltage islands (Vhi and Vlo), the number of timing-verification corners doubles
Slide 12.40
Multi-VDD Flow
Determine which blocks run at which VDD
Multi-voltage synthesis
Determine floorplan
Multi-voltage placement
Route
Verify timing
Slide 12.41
Power Integrity Methodologies
Motivation
Ensure that the power delivery network will not
adversely affect the intended performance of the IC
Functional operation
Performance: speed and power
Reliability
Method
Analyze specific voltage drop parameters
Effective grid resistances
Static voltage drop
Dynamic voltage drop
Electromigration
Analyze impact of voltage drop upon timing and noise
Slide 12.42
Power Integrity Verification Flow
[Figure: power-integrity verification flow. Placement, power routing, and routing produce instance currents for dynamic voltage-drop and EM analysis. Dynamic voltage-drop optimization follows (spread peak currents; insert and optimize decaps, using decap models), then voltage-aware timing and SI analysis (voltage-aware STA/SI computes voltage-drop effects on timing and SI), and finally power-grid sign-off.]
Slide 12.43
Power Integrity: Effective Resistance Check
Motivation
Verify connectivity of all circuit elements to the power grid
Are all elements connected?
Are all elements connected to the grid with a low resistance?
Method
Extract power grid to obtain R
Isolate and analyze R in the equation
V(t) = I(t)*R + C*dv/dt*R + L*di/dt
[Figure: resistance histogram. A well-formed distribution of resistances indicates well-connected instances; unexpected outliers indicate poorly connected (high-R) instances.]
Slide 12.44
Power Integrity: Stimulus Selection
Slide 12.45
Power Integrity: Static Voltage Drop
Motivation
Verify first-order voltage drop
Is the grid sufficient to handle average current flows?
Static voltage drop should only be a few % of the supply voltage
Method
Select stimulus
Compute time-averaged power consumption for a typical operation to obtain I
Compute V = IR (not time-varying)
[Figure: typical static voltage-drop bull's-eye of an appropriately constructed power grid. A 10% static voltage drop, however, is very high.]
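A lumped first-order version of the static check, assuming the whole grid can be represented by a single effective resistance R_eff (real tools solve the full extracted grid per instance):

```latex
\begin{aligned}
I_{\text{avg}} &= \frac{P_{\text{avg}}}{V_{DD}} \\
V_{\text{drop}} &= I_{\text{avg}}\,R_{\text{eff}} \\
\frac{V_{\text{drop}}}{V_{DD}} &= \frac{P_{\text{avg}}\,R_{\text{eff}}}{V_{DD}^{2}}
\end{aligned}
```

Because VDD² appears in the denominator, at fixed power and grid resistance the relative drop grows quickly as the supply is lowered, which makes the "few % of VDD" budget harder to meet at low voltages.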
Slide 12.46
Power Integrity: Dynamic Voltage Drop
Motivation
Verify dynamic voltage drop
Are current and voltage transients within spec?
Can chip function as expected in external RLC environment?
Method
Extract power grid to obtain on-chip R and C
Include RLC model of the package and bond wires
Select stimulus
Compute time-varying power for specific operation to obtain I(t)
Compute V(t) = I(t)*R + C*dv/dt*R + L*di/dt
Slide 12.47
Voltage Drop Mitigation with Decoupling Caps
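[Figure: voltage-drop mitigation model: package and bond-wire parasitics feed the on-chip grid, which includes decoupling capacitance (Cdecap with series Rdecap), n-well and p-well capacitances (Cn-well, Cp-well), cell capacitance (Ccell), device on-resistance (Ron), and signal parasitics (Rsignal, Csignal, Ccoupling, Kmutual).]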
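A charge-balance estimate for how much decoupling capacitance a local current burst needs. It assumes the decap alone supplies a burst of current I_peak for a time Δt, before the package and bond-wire inductance allow supply current to catch up, and that the droop must stay within ΔV_max; all three symbols are hypothetical.

```latex
C_{\text{decap}}\,\Delta V_{\max} \ge I_{\text{peak}}\,\Delta t
\quad\Longrightarrow\quad
C_{\text{decap}} \ge \frac{I_{\text{peak}}\,\Delta t}{\Delta V_{\max}}
```

The series Rdecap and Ron in the model limit how fast a decap can deliver its charge, so this bound is optimistic; placing decaps close to the switching cells matters as much as the total amount.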
Slide 12.48
Decoupling Cap Effectiveness
[Figure: histogram of number of instances (×1000) versus effective voltage (0.7 V to 1.0 V), comparing decap placement based upon available space with optimized decap placement based upon dynamic voltage drop; decap placement optimization yields a 47 mV improvement.]
Slide 12.49
Dynamic Voltage Drop Impact
Timing analysis without voltage drop finds no negative slack paths
Timing analysis with voltage drop uncovers numerous timing violations
[Figure: histogram of number of paths versus slack (ns), with an inset magnifying slacks from about -2 ns to 0.5 ns.]
Slide 12.50
Summary Low Power Methodology Review
Power analysis
Run early and often, during all design phases
Power reduction
Multiple techniques and opportunities during all phases
Most effective opportunities occur during the early design phases
Power integrity
Voltage drop analysis is a critical verification step
Consider the impact of voltage drop upon timing and noise
Slide 12.51