LPV_06
• System Design
Hardware/software trade-offs and optimization
• Technology
Transistor scaling, Voltage scaling
• SoC Design
Low power memories and macros
• Logic
Dynamic and leakage power optimization with EDA tools to meet power budgets
• Power Analysis
Analyze power with EDA tools to identify power problems as early in the
design flow as is realistic.
Reducing active power
• Downsizing transistors (CL)
▫ Slows down logic
• Lowering the supply voltage (VDD)
▫ Slows down logic
▫ Reducing swing slows down the succeeding stage
▫ $P_{dyn} \sim C_L V_{swing} V_{DD} f$
• Reducing frequency (f)
▫ Does not reduce energy: $E \sim C_L V_{swing} V_{DD}$
• Reducing switching activity (a)
▫ Logic restructuring
• Reducing glitching
▫ Balancing logic
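The relations on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the operating point (10 fF, 1.0 V, 1 GHz) are hypothetical, and the activity factor is added as an explicit multiplier on the assumption that it scales power linearly, as the "switching activity (a)" bullet suggests.

```python
def dynamic_power(c_load, v_swing, v_dd, freq, activity=1.0):
    """Dynamic power estimate: P ~ a * C_L * V_swing * V_DD * f."""
    return activity * c_load * v_swing * v_dd * freq


def energy_per_transition(c_load, v_swing, v_dd):
    """Energy per switching event: E ~ C_L * V_swing * V_DD (no f term)."""
    return c_load * v_swing * v_dd


# Hypothetical operating point, chosen only for illustration.
C_LOAD = 10e-15   # farads
V_DD = 1.0        # volts, full-swing logic so V_swing = V_DD
FREQ = 1e9        # hertz

p_nominal = dynamic_power(C_LOAD, V_DD, V_DD, FREQ)
p_half_freq = dynamic_power(C_LOAD, V_DD, V_DD, FREQ / 2)
e_nominal = energy_per_transition(C_LOAD, V_DD, V_DD)

# Lowering the supply to 0.7 V (still full swing) scales energy by 0.7 * 0.7.
e_low_vdd = energy_per_transition(C_LOAD, 0.7, 0.7)

print(f"power at f:      {p_nominal:.3e} W")
print(f"power at f/2:    {p_half_freq:.3e} W")
print(f"energy at 1.0 V: {e_nominal:.3e} J")
print(f"energy at 0.7 V: {e_low_vdd:.3e} J")
```

Halving the frequency halves the power, but the energy per transition is unchanged because it has no frequency term; only capacitance, swing, or supply changes reduce it.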
Reducing Active Power
[Figure: histogram of path count versus delay, showing the original delay distribution and the target delay]
Reducing Leakage
• Using higher thresholds
▫ Channel doping
▫ Body biasing
▫ Reduces drive current
• Using stack effect
▫ Stacked devices
▫ Sleep transistors
• Using longer transistors
▫ Limited benefit
▫ Increase in active current
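The slide does not give a leakage model. As a rough illustration of why higher thresholds help, the sketch below assumes the common subthreshold approximation in which leakage falls by one decade for every subthreshold swing S of added threshold voltage. The value S = 100 mV/decade and the function name are assumptions made for this example only.

```python
def leakage_ratio(delta_vth, swing_per_decade=0.100):
    """Relative subthreshold leakage after raising V_th by delta_vth volts.

    Assumes I_leak ~ 10 ** (-V_th / S), with S in volts per decade.
    """
    return 10.0 ** (-delta_vth / swing_per_decade)


for delta in (0.0, 0.05, 0.10, 0.20):
    print(f"dVth = {delta * 1000:5.0f} mV -> leakage x {leakage_ratio(delta):.3f}")
```

Under this assumed model, the exponential dependence is what makes threshold adjustment (doping or body bias) effective, at the cost of reduced drive current noted above.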
Power-Performance Optimization
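[Figure: energy/op versus delay, with axis marks Emin, Emax, Dmin, Dmax and a point labeled "Unoptimized design"]
[Figure: energy/op versus delay; one curve is labeled "topology B". Optimization variables shown:]
• Sizing, supply, threshold
• Logic style (std. cells, custom, …)
• Block topology (adder: CLA, CSA, …)
• Micro-architecture (parallel, pipelined)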
Design for Low-Power Techniques
Adaptive filtering
• The basic principle is to adjust the filter order (length)
according to the noise characteristics of the input signal.
[Figure: FIR filter]
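One way to picture the principle is a filter that uses fewer taps, and so fewer multiply-accumulate operations, when the input is clean. The sketch below is only an illustration under stated assumptions: the tap-selection rule, its thresholds, and the moving-average coefficients are hypothetical and do not come from the slides.

```python
def choose_num_taps(noise_variance, min_taps=4, max_taps=32, full_scale=0.1):
    """Pick more taps for noisier input (hypothetical linear rule)."""
    if noise_variance <= 0:
        return min_taps
    scale = min(noise_variance / full_scale, 1.0)
    return min_taps + round(scale * (max_taps - min_taps))


def fir_filter(samples, taps):
    """Direct-form FIR, y[n] = sum_k taps[k] * x[n-k]; also counts multiplies."""
    output = []
    multiplies = 0
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
                multiplies += 1
        output.append(acc)
    return output, multiplies


def adaptive_fir(samples, noise_variance):
    """Filter with a tap count selected from the estimated noise variance."""
    num_taps = choose_num_taps(noise_variance)
    taps = [1.0 / num_taps] * num_taps  # moving-average taps, for illustration
    output, multiplies = fir_filter(samples, taps)
    return output, multiplies, num_taps


signal = [float(i % 8) for i in range(64)]
_, mults_clean, taps_clean = adaptive_fir(signal, noise_variance=0.0)
_, mults_noisy, taps_noisy = adaptive_fir(signal, noise_variance=1.0)
print(taps_clean, mults_clean)
print(taps_noisy, mults_noisy)
```

The multiply count is used here as a stand-in for switched capacitance: the shorter filter does less work per sample, which is where the power saving would come from.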
Switching activity reduction
[Figure: reference datapath (Input, Processor, Output); parallel datapath with two processors clocked at f/2; pipelined datapath with registers between processor stages clocked at f]
• Reference system:
▫ Capacitance = C
▫ Voltage = V
▫ Frequency = f
▫ Power = $CV^2 f$
• The power dissipation of the parallel system is:
▫ Capacitance = 2.2C
▫ Voltage = 0.6V
▫ Frequency = 0.5f
▫ Power = $0.396\,CV^2 f$
• The power dissipation of the pipelined system is:
▫ Voltage = 0.6V
▫ Frequency = f
▫ Power = $0.432\,CV^2 f$
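The slide's figures follow directly from $P = CV^2 f$ with each quantity scaled relative to the reference design. The sketch below reproduces them. The pipelined capacitance is not stated on the slide; the value 1.2C used here is an inference from the quoted 0.432 result, and the function name is hypothetical.

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Power relative to the reference design, from P = C * V^2 * f."""
    return cap_scale * volt_scale ** 2 * freq_scale


reference = relative_power(1.0, 1.0, 1.0)

# Parallel: duplicated hardware plus routing overhead, each unit at half rate.
parallel = relative_power(cap_scale=2.2, volt_scale=0.6, freq_scale=0.5)

# Pipelined: full clock rate; 1.2C is inferred (0.432 / 0.6**2), not from the slide.
pipelined = relative_power(cap_scale=1.2, volt_scale=0.6, freq_scale=1.0)

print(f"reference: {reference:.3f} CV^2f")
print(f"parallel:  {parallel:.3f} CV^2f")
print(f"pipelined: {pipelined:.3f} CV^2f")
```

In both cases the saving comes from the quadratic voltage term: the extra hardware lets each stage run slower, so the supply can drop to 0.6V while throughput is preserved.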
• Control nodes change the flow of the data that passes through them.
• Examples: multiplexers, condition selectors, etc.
$x_2 = x_1 \sin\theta + y_1 \cos\theta = (x_1 - y_1)\sin\theta + y_1(\cos\theta + \sin\theta)$
$y_2 = x_1(1 - \sin\theta) + y_1 \sin\theta = x_1 - (x_1 - y_1)\sin\theta$
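The point of the rewriting is that both outputs can share one product. The sketch below assumes the equations read $x_2 = x_1\sin\theta + y_1\cos\theta$ and $y_2 = x_1(1-\sin\theta) + y_1\sin\theta$; the operators and the angle symbol are an assumed reading of the slide, and the function names are hypothetical. Under that reading the direct form needs four multiplications and the reduced form two.

```python
import math


def direct_form(x1, y1, theta):
    """Evaluate both outputs term by term: four multiplications."""
    s, c = math.sin(theta), math.cos(theta)
    x2 = x1 * s + y1 * c
    y2 = x1 * (1 - s) + y1 * s
    return x2, y2


def reduced_form(x1, y1, theta):
    """Share (x1 - y1) * sin(theta) between outputs: two multiplications."""
    s, c = math.sin(theta), math.cos(theta)
    shared = (x1 - y1) * s          # computed once, used twice
    x2 = shared + y1 * (c + s)      # (c + s) is a constant for a fixed angle
    y2 = x1 - shared
    return x2, y2


print(direct_form(3.0, 4.0, 0.5))
print(reduced_form(3.0, 4.0, 0.5))
```

Multipliers usually switch far more capacitance than adders, so trading multiplications for additions in this way tends to lower energy per result.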
Operator Reduction (Cont..)
Architecture and System
Loop Unrolling
• The technique of loop unrolling replicates the body of a loop some
number of times (unrolling factor u) and then iterates by step u
instead of step 1. This transformation reduces the loop overhead,
increases the instruction parallelism and improves register, data
cache or TLB locality.
Original loop:
for i = 2 to N - 1
    A(i) = A(i) + A(i - 1) * A(i + 1)

Unrolled loop (u = 2):
for i = 2 to N - 2 step 2
    A(i) = A(i) + A(i - 1) * A(i + 1)
    A(i + 1) = A(i + 1) + A(i) * A(i + 2)
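A runnable version of the transformation makes the equivalence easy to check. This is a minimal sketch assuming the loop body multiplies the two neighbours, using 0-based Python indices in place of the slide's 1-based ones. The cleanup step for an odd number of iterations is not shown on the slide and is added here so the two versions agree for every array length.

```python
def original_loop(values):
    """A(i) = A(i) + A(i-1) * A(i+1) over all interior elements."""
    a = list(values)
    n = len(a)
    for i in range(1, n - 1):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled_loop(values):
    """Same computation unrolled by a factor of u = 2."""
    a = list(values)
    n = len(a)
    for i in range(1, n - 2, 2):
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    # Cleanup: one interior element is left over when the trip count is odd.
    if n >= 3 and (n - 2) % 2 == 1:
        a[n - 2] = a[n - 2] + a[n - 3] * a[n - 1]
    return a


data = [1, 2, 3, 4, 5, 6, 7]
print(original_loop(data))
print(unrolled_loop(data))
```

The unrolled version evaluates the loop condition and increments the index about half as often, which is the overhead reduction the slide refers to.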
Loop Unrolling for Low Power
Effective Resource Utilization
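[Figure: data-flow graph before and after retiming, with operations numbered 1 to 8 and delay elements D]

Operations scheduled per cycle:

CYCLE | Before retiming       | After retiming
      | Multipliers | Adder   | Multipliers | Adder
1     | 1, 3        | -       | 2           | 8
2     | 2, 4        | 5       | 1           | 6
3     | -           | 6, 8    | 3           | 7
4     | -           | 7       | 4           | 5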
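The effect of retiming on hardware requirements can be read off the schedule by counting the most operations of each kind that fall in a single cycle. The sketch below is an illustration only: it assumes the four table rows are consecutive cycles and that each column lists operation numbers, and the helper names are hypothetical.

```python
# Each entry is one cycle: operation numbers issued to multipliers and adders.
BEFORE = [
    {"mult": [1, 3], "add": []},
    {"mult": [2, 4], "add": [5]},
    {"mult": [], "add": [6, 8]},
    {"mult": [], "add": [7]},
]
AFTER = [
    {"mult": [2], "add": [8]},
    {"mult": [1], "add": [6]},
    {"mult": [3], "add": [7]},
    {"mult": [4], "add": [5]},
]


def peak_units(schedule):
    """Hardware units needed = most operations of a kind in any one cycle."""
    return {kind: max(len(cycle[kind]) for cycle in schedule)
            for kind in ("mult", "add")}


def utilization(schedule):
    """Fraction of unit-cycles doing useful work, per unit kind."""
    peaks = peak_units(schedule)
    cycles = len(schedule)
    return {kind: sum(len(cycle[kind]) for cycle in schedule) / (peaks[kind] * cycles)
            for kind in peaks}


print("before:", peak_units(BEFORE), utilization(BEFORE))
print("after: ", peak_units(AFTER), utilization(AFTER))
```

With these assumptions the original schedule needs two multipliers and two adders that sit idle half the time, while the retimed schedule keeps one multiplier and one adder busy in every cycle.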