LPV_06

The document discusses various techniques for low power design in architecture and systems, focusing on power and performance management, including methods for reducing active power and leakage. It highlights the importance of hardware/software trade-offs, power analysis using EDA tools, and design techniques such as parallelism, pipelining, and loop unrolling. Additionally, it covers specific power-saving modes in microprocessors and strategies for managing power consumption effectively.

Low Power Design

Low power architecture & systems: power & performance management, switching activity reduction, parallel architecture with voltage reduction, flow graph transformation, low power arithmetic components, low power memory design.
Managing the Power Problem

• System Design
▫ Hardware/software trade-offs and optimization
• Technology
▫ Transistor scaling, voltage scaling
• SoC Design
▫ Low power memories and macros
▫ Logic: dynamic and leakage power optimization with EDA tools to meet power budgets
• Power Analysis
▫ Analyze power using EDA tools to identify power problems as early in the design flow as is realistic.
Reducing active power
• Downsizing transistors ($C_L$)
▫ Slows down logic
• Lowering the supply voltage ($V_{DD}$)
▫ Slows down logic
▫ Reducing swing slows down the succeeding stage
• Reducing frequency ($f$)
▫ Does not reduce energy
• Reducing switching activity ($\alpha$)
▫ Logic restructuring
• Reducing glitching
▫ Balancing logic

$P_{dyn} \sim \alpha\, C_L\, V_{swing}\, V_{DD}\, f$
$E \sim \alpha\, C_L\, V_{swing}\, V_{DD}$
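The two relations on this slide can be checked numerically. The sketch below is only an illustration: it treats the proportionality as an equality with constant 1, and the operating point (activity factor, capacitance, supply) is hypothetical rather than taken from the slides. It shows that halving f halves power but leaves the energy per cycle unchanged, while scaling a full-swing supply to 0.6 V_DD cuts the energy to 0.36 of its nominal value.

```python
def dynamic_power(alpha, c_load, v_swing, v_dd, freq):
    """Dynamic power, taking the proportionality constant as 1 (an assumption)."""
    return alpha * c_load * v_swing * v_dd * freq


def energy_per_cycle(alpha, c_load, v_swing, v_dd):
    """Switching energy per clock cycle; note that f does not appear."""
    return alpha * c_load * v_swing * v_dd


# Hypothetical operating point, chosen only for illustration.
ALPHA, C_LOAD, V_DD = 0.1, 1e-9, 1.2  # activity factor, farads, volts

p_full = dynamic_power(ALPHA, C_LOAD, V_DD, V_DD, 100e6)  # full swing, 100 MHz
p_half = dynamic_power(ALPHA, C_LOAD, V_DD, V_DD, 50e6)   # same design at 50 MHz

e_nominal = energy_per_cycle(ALPHA, C_LOAD, V_DD, V_DD)
e_scaled = energy_per_cycle(ALPHA, C_LOAD, 0.6 * V_DD, 0.6 * V_DD)

print(f"power at 100 MHz: {p_full:.4f} W, at 50 MHz: {p_half:.4f} W")
print(f"energy ratio after scaling the supply to 0.6 V_DD: {e_scaled / e_nominal:.2f}")
```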
Reducing Active Power

• Downsizing or lowering the supply on the critical path will lower the operating frequency
▫ Downsize non-critical paths
▫ Narrows down the path delay distribution
▫ Increases impact of variations

[Figure: path count versus delay, showing the original delay distribution and the target delay]
Reducing Leakage
• Using higher thresholds
▫ Channel doping
▫ Body biasing
▫ Reduces drive current
• Using stack effect
▫ Stacked devices
▫ Sleep transistors
• Using longer transistors
▫ Limited benefit
▫ Increase in active current
Power-Performance Optimization
[Figure: energy/op versus delay, marking the unoptimized design and the limits Emin, Emax, Dmin and Dmax]

Maximize throughput for given energy, or minimize energy for given throughput.
Power-Performance Optimization
There are many sets of parameters to adjust.
Tuning variables:
• Devices
• Circuit (sizing, supply, threshold)
• Logic style (std. cells, custom, …)
• Block topology (adder: CLA, CSA, …)
• Micro-architecture (parallel, pipelined)

[Figure: energy/op versus delay curves for topology A and topology B]
Design for Low-Power Techniques

• Reduced supply voltage
▫ Charging power varies as $V_{DD}^2$
▫ Reduce transistor threshold voltages to maintain noise margins
▫ But reduced thresholds increase leakage currents exponentially
• Change your CMOS logic family – use a low-power one
• Transistor resizing to speed-up circuit and reduce power
• Use parallelism and pipelining in system architecture – use more,
but slower, hardware
• Standby modes – clock disabling and power-down of selected
logic blocks
• Adiabatic computing – avoid gain/loss of heat during computing
• Software redesign to lower power dissipation
Low Power Design Techniques
• Low power applications
▫Remote systems (e.g., satellite)
▫Portable systems (e.g., mobile phone)
• Methods of low power design
▫Reduced supply voltage
▫Adiabatic switching and charge recovery
▫Clock suppression
▫Logic design for reduced activity
▫Reduce Hazards & Glitches (40% in arithmetic logic)
▫Transistor sizing
▫Pass-transistor logic
▫Pseudo-NMOS logic
▫Multi Threshold gates
▫Software techniques
• Reference: Chandrakasan and Brodersen
Performance Maximization Techniques
• Use parallelism and pipelining in system architecture –
use more, but slower, hardware
• High throughput
• High router utilization
Power & performance management
• Microprocessor sleep modes
• Performance Management
• Adaptive filtering
Power & performance management
• It refers to a general class of low power techniques that carefully manage the performance and throughput of a system.
• Do not waste power by designing hardware
that has more performance than necessary.
• Throughput : the amount of work that a
system can do in a given time period
Microprocessor sleep modes
• Deactivate some functional units when no
computation is required.
• Applied at different levels of the design: subsystems, modules, buses, functional units, state machines, etc.
• Example: Motorola PowerPC 603 processor
Motorola PowerPC 603 processor
• The CPU has three primary power-saving modes, controlled by software:
▫ DOZE
▫ NAP
▫ SLEEP
• In DOZE mode, most functional units of the processor are stopped, except the on-chip cache memory, which stays active to maintain cache coherency.
• In NAP mode, the cache is also turned off to conserve power, and the processor wakes up after a fixed amount of time or upon an external interrupt.
• In SLEEP mode, the entire processor clock may be halted, and an external reset or interrupt can resume its operation.
MODE                          66 MHz    80 MHz
No power management           2.18 W    2.54 W
Dynamic power management      1.89 W    2.20 W
DOZE                          307 mW    366 mW
NAP                           113 mW    135 mW
SLEEP                         89 mW     105 mW
SLEEP without PLL             18 mW     19 mW
SLEEP without system clock    2 mW      2 mW
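To see what these modes are worth at system level, the 66 MHz figures can be weighted by the fraction of time spent in each mode. The sketch below assumes a hypothetical workload (10% of the time active under dynamic power management, 90% in NAP) and ignores the time and energy cost of entering and leaving a mode; neither assumption comes from the slides.

```python
# Power at 66 MHz in watts, taken from the PowerPC 603 figures listed above.
POWER_66MHZ_W = {
    "no_pm": 2.18,
    "dynamic_pm": 1.89,
    "doze": 0.307,
    "nap": 0.113,
    "sleep": 0.089,
    "sleep_no_pll": 0.018,
    "sleep_no_sysclk": 0.002,
}


def average_power(time_fractions, power_table=POWER_66MHZ_W):
    """Time-weighted average power; the fractions must sum to 1."""
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(power_table[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical duty cycle, for illustration only.
workload = {"dynamic_pm": 0.10, "nap": 0.90}
avg_w = average_power(workload)
print(f"average power: {avg_w * 1000:.1f} mW")
```

Under these assumptions the average comes to about 291 mW, compared with 1.89 W when the processor never leaves the active state.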
Performance Management:

Adaptive filtering
• The basic principle is to adjust the filter’s order
length depending on the noise characteristics of
the input signal.

• The quality or order length requirement of a digital filter depends on:
▫ The desired signal to noise ratio of the output.
▫ The noise energy level of the input signal.
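A minimal sketch of the idea, under loudly stated assumptions: the filter is a plain moving average, the noise estimate is the mean squared first difference of the input, and the thresholds and tap counts are hypothetical. None of these choices come from the slides; the point is only that a quieter input lets the filter run with fewer taps, and therefore fewer operations per output sample.

```python
def noise_energy(samples):
    """Mean squared first difference, a crude proxy for input noise energy."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    return sum(d * d for d in diffs) / len(diffs)


def choose_taps(energy, thresholds=((0.01, 2), (0.1, 4)), max_taps=8):
    """Pick the shortest filter whose (hypothetical) noise limit is not exceeded."""
    for limit, taps in thresholds:
        if energy < limit:
            return taps
    return max_taps


def moving_average(samples, taps):
    """FIR moving average; early outputs use the samples available so far."""
    out = []
    for n in range(len(samples)):
        window = samples[max(0, n - taps + 1): n + 1]
        out.append(sum(window) / len(window))
    return out


clean = [0.01 * i for i in range(16)]   # slow ramp, little noise
noisy = [0.0, 1.0] * 8                  # alternating samples, heavy noise

taps_clean = choose_taps(noise_energy(clean))
taps_noisy = choose_taps(noise_energy(noisy))
filtered = moving_average(noisy, taps_noisy)
print(taps_clean, taps_noisy)
```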
Adaptive Filtering:

[Figure: FIR filter]
Switching activity reduction

• Some of the techniques available for reducing switching activity are:
• Guarded Evaluation.
• Bus Multiplexing
• Glitch reduction by pipelining
Guarded Evaluation:
• It is a technique to reduce switching activity by adding latches or blocking gates at the inputs of a combinational module, so that the inputs are held whenever the module's outputs are not used.
Example: Guarded Evaluation
• For example, consider a multiplier whose outputs
are used only under certain conditions.
• In this case, the inputs to the multiplier can be stopped from toggling whenever its outputs are not used. This keeps unnecessary switching activity from propagating into the multiplier.
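The effect can be illustrated with a small behavioral model that counts bit toggles on the multiplier's input ports. This is a sketch, not a gate-level power estimate: the operand sequence, the 4-bit width and the enable pattern are all hypothetical, and the guard is modeled as a latch that simply holds its last value while the result is not needed.

```python
def bit_toggles(prev, curr):
    """Number of bit positions that change between two integer values."""
    return bin(prev ^ curr).count("1")


def input_toggles(operands, result_used, guarded):
    """Total toggles seen at the multiplier inputs over a run of cycles.

    operands:    list of (a, b) pairs presented on the bus each cycle
    result_used: list of booleans, True when the product is consumed
    guarded:     if True, guard latches hold the inputs on unused cycles
    """
    held_a, held_b = 0, 0
    toggles = 0
    for (a, b), used in zip(operands, result_used):
        if guarded and not used:
            continue  # latch is closed: the multiplier inputs do not move
        toggles += bit_toggles(held_a, a) + bit_toggles(held_b, b)
        held_a, held_b = a, b
    return toggles


# Hypothetical 4-bit traffic; the product is needed only on cycles 1 and 4.
operands = [(0b1111, 0b0000), (0b0000, 0b1111), (0b1111, 0b0000), (0b0000, 0b1111)]
result_used = [True, False, False, True]

unguarded = input_toggles(operands, result_used, guarded=False)
guarded = input_toggles(operands, result_used, guarded=True)
print(unguarded, guarded)
```

In this example the guard removes more than half of the input transitions, at the cost of the extra latches and the logic that generates the enable signal.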
Bus Multiplexing
• Highly congested designs tend to consume more
power due to longer wire lengths.
• Placement has to be porous and more spread out
to route the design, resulting in longer wire
lengths and more switches per wire.
• All of this contributes to bad timing results, as
well as increased power consumption.
Bus Multiplexing (cont..)
• Reducing such busses helps both timing and
power.
• The busses carrying correlated data should be
multiplexed together to further reduce switching
into the MUX/DEMUX logic.
Glitch Reduction by Pipelining
• Glitches are unwanted switching activities that
occur before a signal settles to its intended value.

• Pipelining is another technique; it introduces registers in the middle of long combinational paths.
• This adds latency but increases the speed and reduces the levels of logic.
• The extra registers consume power but reduce glitches drastically.
Parallel architecture with voltage reduction
• Used to improve the computation throughput of
high performance digital system.
• Uniprocessing system
• Parallel system
Uniprocessing system

[Figure: a single processor between input and output]
• Capacitance = C
• Voltage = V
• Frequency = f
• Power = $C V^2 f$
• In a uniprocessing system, the power dissipation is given by: $P_{uni} = C V^2 f$
Parallel Architecture

[Figure: two processors in parallel between input and output, each clocked at f/2, with the output running at f]
• Capacitance = 2.2C
• Voltage = 0.6V
• Frequency = 0.5f
• Power = $0.396\,C V^2 f$
• The power dissipation of the parallel system is: $P_{par} = 0.396\,C V^2 f = 0.396\,P_{uni}$
• About 60% reduction in power obtainable.
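The 0.396 factor follows directly from $P = C V^2 f$ once the three scaling factors on this slide are applied. A short check, where the helper name `relative_power` is introduced here only for illustration:

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to P_uni = C V^2 f after scaling C, V and f."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Scaling factors from the slide: 2.2C, 0.6V, 0.5f.
p_par = relative_power(2.2, 0.6, 0.5)
saving = 1.0 - p_par
print(f"P_par / P_uni = {p_par:.3f}, saving = {saving:.1%}")
```

The capacitance factor is larger than 2 because the duplicated datapath also needs extra routing and output multiplexing; the saving comes entirely from the quadratic dependence on the reduced supply voltage.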

For N-stage parallelism:
$P_{par} = (nC)\left(\frac{V}{n}\right)^2 \left(\frac{f}{n}\right) = \frac{P_{uni}}{n^2}$
Pipelined Architecture
[Figure: input feeding two processor stages (Proc.) separated by registers, clocked at f]
• Capacitance = 1.2C
• Voltage = 0.6V
• Frequency = f
• Power = $0.432\,C V^2 f$
• The power dissipation of the pipelined system is: $P_{pip} = 0.432\,C V^2 f = 0.432\,P_{uni}$
• About 60% reduction in power obtainable.

For N-stage pipelining:
$P_{pip} = C \left(\frac{V}{n}\right)^2 f = \frac{P_{uni}}{n^2}$
Flow graph transformation
• This is a system level technique for the design of
special purpose DSP systems, which are
characterized by computation intensive data path
operations with simple control structures.
• The flow graph is also known as a control data flow graph (CDFG).
Control Data Flow Graph

• The graph consists of control nodes and data nodes connected by edges.
• Control nodes change the flow of data that passes through them.
• Examples: multiplexers, condition selectors, etc.
• Data nodes provide computation operators for the input data streams, such as addition, multiplication, shift, etc.
• The graph edges represent the data streams of the system.
Control Data Flow Graph (Cont..)

• A control data flow graph expresses the conceptual algorithm of the system.
• The control data flow graph is often the starting point to derive the actual hardware architecture of a system by mapping the operators and edges to actual hardware modules and busses respectively.
Control Data Flow Graph (Cont..)
• Draw a control data flow graph of a system that computes the equation
$y_n = a_n b_n + 3 a_{n-1}$
Hardware architecture
Operator Reduction
• Draw a control flow graph of a system that
computes the equation
• Y=AB+AC
Operator Reduction(Cont..)
• Draw a control flow graph of a system that
computes the equation
• Y=A(B+C)
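The saving from this transformation can be counted mechanically. The sketch below represents each flow graph as a nested tuple `(operator, left, right)`, a hypothetical encoding chosen for brevity, then counts the operator nodes and checks that both graphs compute the same value.

```python
def count_ops(expr):
    """Count multiplier and adder nodes in an expression tree."""
    if isinstance(expr, str):          # a leaf is an input signal
        return {"mul": 0, "add": 0}
    op, left, right = expr
    counts = {"mul": 0, "add": 0}
    for side in (left, right):
        for kind, n in count_ops(side).items():
            counts[kind] += n
    counts[op] += 1
    return counts


def evaluate(expr, env):
    """Evaluate an expression tree for the input values in env."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = evaluate(left, env), evaluate(right, env)
    return a * b if op == "mul" else a + b


before = ("add", ("mul", "A", "B"), ("mul", "A", "C"))   # Y = AB + AC
after = ("mul", "A", ("add", "B", "C"))                  # Y = A(B + C)

inputs = {"A": 3, "B": 4, "C": 5}
print(count_ops(before), count_ops(after))
print(evaluate(before, inputs), evaluate(after, inputs))
```

The factored form needs one multiplier instead of two for the same result, which matters because a multiplier typically switches far more capacitance than an adder.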
Operator Reduction (Cont..)
$x_2 = x_1 \sin\theta + y_1 \cos\theta = (x_1 + y_1)\sin\theta + y_1(\cos\theta - \sin\theta)$
$y_2 = x_1(1 - \sin\theta) - y_1 \sin\theta = x_1 - (x_1 + y_1)\sin\theta$
Loop Unrolling
• The technique of loop unrolling replicates the body of a loop some
number of times (unrolling factor u) and then iterates by step u
instead of step 1. This transformation reduces the loop overhead,
increases the instruction parallelism and improves register, data
cache or TLB locality.
Original loop:
    for i = 2 to N - 1
        A(i) = A(i) + A(i - 1) * A(i + 1)

Unrolled loop (u = 2):
    for i = 2 to N - 2 step 2
        A(i) = A(i) + A(i - 1) * A(i + 1)
        A(i + 1) = A(i + 1) + A(i) * A(i + 2)

Loop overhead is cut in half because two iterations of the original loop are performed in each iteration of the unrolled loop.
If array elements are assigned to registers, register locality is improved
because A(i) and A(i +1) are used twice in the loop body.
Instruction parallelism is increased because the second assignment
can be performed while the results of the first are being stored and the
loop variables are being updated.
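A runnable version of the two loops makes the equivalence easy to check. This is a sketch in Python rather than the slide's pseudocode: the arrays are kept 1-indexed by leaving element 0 unused, and the clean-up step for an odd trip count is an addition that the slide does not show.

```python
def rolled(values):
    """Original loop: i = 2 .. N-1, one update per iteration."""
    a = list(values)
    n = len(a) - 1                     # a[0] is unused so indices match A(1..N)
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled(values):
    """Unrolled by u = 2: i = 2 .. N-2 step 2, two updates per iteration."""
    a = list(values)
    n = len(a) - 1
    i = 2
    while i <= n - 2:
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
        i += 2
    if i == n - 1:                     # clean-up when the trip count N-2 is odd
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


data = [0] + list(range(1, 10))        # A(1..9) = 1..9, so N = 9
print(rolled(data) == unrolled(data))
```

The unrolled body performs the same updates in the same order, so the results match exactly; only the number of loop-control operations changes.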
Loop Unrolling (IIR filter example)
Loop unrolling: localizes the data to reduce the activity at the inputs of the functional units, or computes two output samples in parallel from two input samples.
$Y_{n-1} = X_{n-1} + A\,Y_{n-2}$
$Y_n = X_n + A\,Y_{n-1} = X_n + A\,(X_{n-1} + A\,Y_{n-2})$

Neither the capacitance switched nor the voltage is altered. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation,
$Y_{n-1} = X_{n-1} + A\,Y_{n-2}$
$Y_n = X_n + A\,X_{n-1} + A^2\,Y_{n-2}$
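The transformed recurrence can be checked against the original first-order filter. In this sketch the function names, the zero initial state and the even-length restriction are assumptions made for brevity. The key property is that, after the transformation, $Y_n$ no longer depends on $Y_{n-1}$, so the two outputs of each iteration can be computed side by side.

```python
def iir_direct(x, a, y_init=0.0):
    """Reference filter: Y[n] = X[n] + A * Y[n-1], one sample per step."""
    out, prev = [], y_init
    for sample in x:
        prev = sample + a * prev
        out.append(prev)
    return out


def iir_unrolled(x, a, y_init=0.0):
    """Unrolled filter: two outputs per step, both derived from Y[n-2]."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    a_squared = a * a                  # constant propagation: A*A computed once
    out, y_nm2 = [], y_init
    for k in range(0, len(x), 2):
        x_nm1, x_n = x[k], x[k + 1]
        y_nm1 = x_nm1 + a * y_nm2
        y_n = x_n + a * x_nm1 + a_squared * y_nm2   # independent of y_nm1
        out.extend([y_nm1, y_n])
        y_nm2 = y_n
    return out


samples = [1.0, 2.0, 3.0, 4.0]
direct = iir_direct(samples, 0.5)
unrolled_out = iir_unrolled(samples, 0.5)
print(direct, unrolled_out)
```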
IIR Filter
Loop Unrolling is a method to apply parallelism to the
computation.

Loop Unrolling for Low Power
Effective Resource Utilization
[Figure: flow graph before and after retiming]

           Before                  After
CYCLE   Multipliers   Adder    Multipliers   Adder
1       1, 3          -        2             8
2       2, 4          5        1             6
3       -             6, 8     3             7
4       -             7        4             5

Can reduce interconnect capacitance.
