4. CMOS Power Consumption & Low Power Techniques
Motivation
CMOS Power Consumption Sources
Low power techniques
UPF
References
Prepared by:
ICpedia PnR Team
Contents
Motivation
CMOS Power Consumption Sources
UPF
References
The Transistor Revolution
Advances in IC Technologies
[Figure: Intel 4004 microprocessor; Intel Pentium (IV) microprocessor; Intel Xeon E7-8893 v2]
Electronic Technology Today: CMOS Convergence
Moore’s Law
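1965: Gordon Moore observed that the number of transistors in an IC was doubling roughly every year; in 1975 he revised the rate to a doubling every two years. The often-quoted 18-month figure is a later popularization.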
Die Size Growth & Frequency
• Die size grows by 14% every year
• Clock frequency doubles every 2 years
Power Dissipation & Power Density
• Power increases about tenfold every 3 years
• Power density doubles every year
Power Consumption of Integrated Circuits
Power Affected Problems
Application:
• Wireless
• Mobile
• Embedded systems
• Microprocessors
• Graphics/multimedia
• Networking/telecom
Technology:
• All designs below 90 nm
Sources of Power Dissipation in CMOS
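Total power is the sum of three components: $P_{total} = P_{dyn} + P_{sc} + P_{leak}$
– $P_{dyn}$ represents the charging and discharging of load capacitances
– $P_{sc}$ is due to the direct-path short-circuit current, which arises when both NMOS and PMOS transistors are simultaneously active, conducting current directly from supply to ground
– $P_{leak}$ is due to leakage current, which arises from reverse-bias diode currents, sub-threshold currents, and gate tunneling
[Figure: CMOS gate driving a load capacitance Cload, annotated with the switching current Isw, the internal current IInt, and the leakage current ILeakage]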
Dynamic Power
Dynamic Power: Charging and discharging load capacitances
$P_{dyn} = \alpha \, C_L \, V_{DD}^2 \, f$
where $\alpha$ is the switching activity factor, $C_L$ the load capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency.
Minimizing Dynamic Power:
1. Voltage and frequency scaling (lower $V_{DD}$, $f$)
- Run circuits at the lowest voltage that still meets the performance constraint
- Reduce clock frequency by using parallel functional units
- Relax critical path constraints by pipelining
- Reducing the voltage gives a quadratic reduction in dynamic switching power, but the achievable switching frequency is also reduced
- Note: voltage and frequency are usually fixed by the project, so the most controllable factors are switching activity and load capacitance
2. Reduce capacitive load (lower $C_L$)
- Minimum device sizes if performance allows
- $C_L = C_{gate} + C_{wire}$ (device capacitance plus interconnect capacitance)
- Compact and custom layout
- Shorten or eliminate long wires, including the clock net
3. Reduce activity factor (lower $\alpha$)
- Clock gating
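The three levers listed here can be compared numerically. The following is a minimal sketch, assuming the standard switching-power expression P = alpha * C_L * VDD^2 * f; the function name and all numeric values are illustrative assumptions, not figures from the course.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Average switching power in watts: P = alpha * C_L * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq

# Illustrative baseline (assumed values): 10% activity, 1 nF switched
# capacitance, 1.2 V supply, 500 MHz clock.
base = dynamic_power(alpha=0.1, c_load=1e-9, vdd=1.2, freq=500e6)

scenarios = {
    "baseline": base,
    "VDD 1.2 V -> 0.9 V": dynamic_power(0.1, 1e-9, 0.9, 500e6),
    "frequency halved": dynamic_power(0.1, 1e-9, 1.2, 250e6),
    "capacitance -20%": dynamic_power(0.1, 0.8e-9, 1.2, 500e6),
    "activity halved (clock gating)": dynamic_power(0.05, 1e-9, 1.2, 500e6),
}

for name, p in scenarios.items():
    print(f"{name:32s} {p * 1e3:7.2f} mW  ({p / base:5.1%} of baseline)")
```

Only the supply voltage enters quadratically, which is why a 25% voltage reduction saves more than a 25% reduction in capacitance or activity would.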
Short Circuit Power
Short-circuit power occurs when the PMOS and NMOS devices are on simultaneously, creating a direct path from $V_{DD}$ to ground.
1. Short-circuit power increases with the rise and fall times of the input.
2. Put another way, minimizing short-circuit power requires that the gate output transition be no faster than the input transition.
3. When input and output rise/fall times are equalized, most power is associated with dynamic power.
Technology Scaling Impact on Short Circuit Power
1-16% short-circuit power at 0.18 micron.
Static Power
Static (leakage) power is consumed when the device is in steady state, with no switching activity [transistor is off].
pn-junction leakage
Even when the p-n junctions are reverse-biased, a small current flows through them due to minority carriers. As the temperature increases, the minority carrier concentration also rises, which increases the leakage current.
Static Power
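Sub-threshold leakage
$$I_{sub} = I_0 \exp\!\left(\frac{q\,(V_{gs} - V_{th})}{n k T}\right) \cdot \left(1 - \exp\!\left(\frac{-q\,V_{ds}}{k T}\right)\right)$$
$$I_0 \approx \beta \left(\frac{kT}{q}\right)^2 e^{1.8}, \qquad \beta = \mu C_{ox} \frac{W}{L}$$
Reverse Biased Junction: Band-To-Band-Tunneling (BTBT) Leakage
$$I_{reverse} = A \cdot J_S \left(e^{q V_{app} / kT} - 1\right)$$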
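To get a feel for how strongly sub-threshold leakage depends on the threshold voltage, the sub-threshold expression from this slide can be evaluated directly. This is only a sketch: the values chosen for beta and n are hypothetical, and the decrease of Vth with temperature that real devices show is not modelled.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
Q = 1.602176634e-19   # elementary charge, C

def subthreshold_current(vgs, vth, vds, temp_k, beta=1e-4, n=1.5):
    """I_sub = I0 * exp(q(Vgs - Vth)/(n k T)) * (1 - exp(-q Vds/(k T))).

    beta (A/V^2) and n are assumed, illustrative device parameters.
    """
    vt = K_B * temp_k / Q                  # thermal voltage kT/q
    i0 = beta * vt ** 2 * math.exp(1.8)    # I0 ~ beta * (kT/q)^2 * e^1.8
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))

# Off transistor (Vgs = 0) with Vds = 1.0 V at 300 K, for two thresholds.
low_vt = subthreshold_current(0.0, 0.30, 1.0, 300.0)
high_vt = subthreshold_current(0.0, 0.40, 1.0, 300.0)
hot = subthreshold_current(0.0, 0.30, 1.0, 400.0)

print(f"Vth = 0.30 V, 300 K: {low_vt:.3e} A")
print(f"Vth = 0.40 V, 300 K: {high_vt:.3e} A")
print(f"Vth = 0.30 V, 400 K: {hot:.3e} A")
print(f"leakage ratio for 100 mV lower Vth: {low_vt / high_vt:.1f}x")
```

Under these assumptions a 100 mV change in Vth changes the leakage by roughly an order of magnitude, which is the effect that multi-threshold libraries exploit.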
Static Power
Gate oxide leakage increases with:
- Increase in VOX
- Reduction in tOX
Technology Scaling Impact on Leakage Power
Exercise 1
• What really matters for IR drop analysis: overall power consumption or power density?
• For FPGAs, what is the major component of power: static or dynamic?
• For the same technology node, when the delays are small, power consumption will be ………. [ High – Low – Cannot be determined – Doesn’t change ]
Importance of Power Awareness
Power has an impact on:
• System and Software Architecture: choose appropriate power intent, design styles, etc.
• RTL Implementation: implement power intent in the appropriate format
• Logic Simulation: power-aware simulation and analysis
• Logic Synthesis: automate synthesis of Low Power Design (LPD) techniques
• Timing Analysis
Power management should be considered at the earliest design stages. Almost every step of the design flow needs to be modified for LPD.
Importance of Power Awareness
• Clock gating
• Multi-Threshold Logic
• Power Gating
Dynamic Power Reduction Techniques
Clock Gating:
• Clock gating is used in synchronous designs to disable clock switching for parts of the circuit while they are not in use.
• A significant fraction of the dynamic power in a chip is consumed in the clock distribution network.
• Up to 50% or even more of the dynamic power can be spent in the clock buffers, because:
1. Clock buffers have the highest toggle rate in the system, and there are many of them in a design
2. Clock buffers often have a high drive strength to minimize clock delay
• The most common technique to reduce this power is to turn clocks off when they are not required. This approach is known as clock gating.
• RTL compilation with clock gating insertion:
• Today, most libraries include specific clock gating cells that are recognized by the synthesis tool. The combination of explicit clock gating cells and automatic insertion makes clock gating a simple and reliable way of reducing power.
Dynamic Power Reduction Techniques
Clock Gating Distribution :
• Can insert clock gating at multiple levels in clock tree
• Can shut off entire subtree
Dynamic Power Reduction Techniques
Integrated Clock Gating (ICG) Cell
• An Integrated Clock Gating (ICG) cell is a specially designed cell used for clock gating.
• An ICG cell stops the clock from propagating through it when its clock enable signal is low.
• As shown in the waveform figure, it provides a glitch-free gated clock output: it passes the clock signal only when the enable signal is high and stops clock propagation when the enable signal is low.
Dynamic Power Reduction Techniques
Why not only AND gate as a clock gating?
• The issue with using a plain AND gate for clock gating is that it cannot guarantee a glitch-free output: if the enable signal changes while the clock is high, a glitch appears on the gated clock.
Note that:
• There is a trade-off between power and area, due to the number of ICG cells that are added.
• The enable signal must toggle at a much lower frequency than the clock itself; otherwise the power saving is lost.
• ICG cells are best inserted near the leaf cells of the clock tree.
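The difference between a bare AND gate and a latch-based ICG cell can be illustrated with a small sampled-waveform model. This is a behavioural sketch only: it assumes the common ICG structure of a latch that is transparent while the clock is low followed by an AND gate, and all function names are hypothetical.

```python
def and_gate_clock(clk, en):
    """Naive gating: the output follows the enable immediately, even mid-pulse."""
    return [c & e for c, e in zip(clk, en)]

def icg_clock(clk, en):
    """Latch-based gating: the enable is captured only while the clock is low."""
    out, latched = [], 0
    for c, e in zip(clk, en):
        if c == 0:              # latch is transparent during the low phase
            latched = e
        out.append(c & latched)
    return out

def pulse_widths(wave):
    """Length, in samples, of every run of 1s in the waveform."""
    widths, run = [], 0
    for v in wave + [0]:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

# Clock: 4 samples high, 4 samples low, repeated 4 times (32 samples).
clk = ([1] * 4 + [0] * 4) * 4
# Enable rises in the middle of one high phase and falls in the middle of another.
en = [0] * 10 + [1] * 16 + [0] * 6

print("AND gate pulse widths:", pulse_widths(and_gate_clock(clk, en)))
print("ICG pulse widths:     ", pulse_widths(icg_clock(clk, en)))
```

In this model the AND gate produces truncated pulses whenever the enable changes during the high phase, while the latch-based version only ever emits full-width clock pulses.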
Dynamic Power Reduction Techniques
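Dynamic Voltage/Frequency Scaling (DVFS)
• DVFS is a method through which a variable amount of energy is allocated to perform a task
• It adjusts performance and energy consumption levels while the device is active
[Figure: CPU power (Pcpu) versus time for the same task. At f1 the task finishes in time t with energy E1 = C·V1²·f1·t, after which the CPU is OFF; at f2 = f1/2 it finishes in 2t with E2 = E1/4; at f3 = f1/3 it finishes in 3t with E3 = E1/9.]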
Dynamic Power Reduction Techniques
Voltage Scaling
$$Delay = \frac{K \, V_{DD}}{(V_{DD} - V_T)^{\alpha}}$$
• Reducing the VDD supply voltage reduces the power consumption.
• At lower voltages, the delay increase is very significant.
• Transistor sizing or parallel processing can help reduce the overall delay.

Voltage (V)   Average Power (uW)   Delay (ns)
1.8           106.32               1.76
1.5           66.43                1.934

Change relative to 1.8 V:
Voltage reduction   Power reduction   Delay increase
16.7%               37.5%             10%
33.3%               65.6%             33%
44.4%               79.5%             127%
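The first row of percentages can be checked against the two absolute data points listed in this section (1.8 V and 1.5 V), and the delay expression can be evaluated to show the trend. This is a sketch under stated assumptions: the values VT = 0.4 V and alpha = 1.3 are hypothetical and are not fitted to the measured data.

```python
def percent_change(old, new):
    """Signed change from old to new, in percent."""
    return (new - old) / old * 100.0

def alpha_power_delay(vdd, vt=0.4, alpha=1.3, k=1.0):
    """Delay = K * VDD / (VDD - VT)^alpha; vt, alpha and k are assumed values."""
    if vdd <= vt:
        raise ValueError("VDD must exceed VT")
    return k * vdd / (vdd - vt) ** alpha

# The two absolute rows given in this section.
v0, p0, d0 = 1.8, 106.32, 1.76
v1, p1, d1 = 1.5, 66.43, 1.934
print(f"voltage {percent_change(v0, v1):+.1f}%")
print(f"power   {percent_change(p0, p1):+.1f}%")
print(f"delay   {percent_change(d0, d1):+.1f}%")

# Trend of the delay model, normalized to 1.8 V.
ref = alpha_power_delay(1.8)
for vdd in (1.8, 1.5, 1.2, 1.0):
    print(f"VDD {vdd:.1f} V: relative delay {alpha_power_delay(vdd) / ref:.2f}")
```

The model reproduces the qualitative behaviour, a delay that grows faster and faster as VDD approaches VT, but its exact numbers depend entirely on the assumed parameters.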
Dynamic Power Reduction Techniques
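DVFS Example:
$P_1 \propto V_1^2 f_1$
With $(V_2, F_2) = \left(\frac{V_1}{2}, \frac{f_1}{2}\right)$:
$P_2 \propto \frac{V_1^2}{4} \cdot \frac{f_1}{2} = \frac{P_1}{8}$
The workload W then takes twice as long, so the energy is $E_2 = P_2 \cdot 2T_1 = E_1 / 4$: 75% energy saving.
[Figure: power versus time for workload W, run at power P1 finishing at T1, and at power P2 finishing at the deadline D]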
DVFS Summary:
• DVFS technique has been proven to be a highly effective technique for power minimization subject to a performance constraint.
• DVFS should consider not only the CPU power but also the total system power dissipation.
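The energy figures in the DVFS example follow from treating the task as a fixed number of clock cycles. The sketch below assumes the idealized case in which the voltage can be scaled in proportion to the frequency; the constants are illustrative, not measured values.

```python
def power(c, v, f):
    """Dynamic power, proportional to C * V^2 * f."""
    return c * v ** 2 * f

def energy(c, v, f, cycles):
    """Energy to run a fixed workload of `cycles` clock cycles."""
    run_time = cycles / f
    return power(c, v, f) * run_time   # reduces to c * v^2 * cycles

# Illustrative values only (assumptions).
C, V1, F1, CYCLES = 1e-9, 1.2, 1e9, 2e9

e1 = energy(C, V1, F1, CYCLES)
p1 = power(C, V1, F1)
for k in (1, 2, 3):
    v, f = V1 / k, F1 / k
    print(f"V1/{k}, f1/{k}: time {CYCLES / f:.0f} s, "
          f"power {power(C, v, f) / p1:.3f} x P1, "
          f"energy {energy(C, v, f, CYCLES) / e1:.3f} x E1")

# Frequency scaling alone does not change the energy per task.
e_freq_only = energy(C, V1, F1 / 2, CYCLES)
print(f"f1/2 at unchanged V1: energy {e_freq_only / e1:.3f} x E1")
```

The last line shows why the voltage has to scale with the frequency: halving the clock alone halves the power but doubles the run time, leaving the energy per task unchanged.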
Dynamic Power Reduction Techniques
Multi-Voltage Design Technique:
• Two VDDs are becoming common. Many chips already have two supplies (one for core and one for I/O).
• The power reduction benefit diminishes as the VDDs are scaled down.
◦ Signals running between blocks need level shifters. Level Shifters (LS) are special standard cells used in multi-voltage designs to convert one voltage level to another.
◦ The best solution is to make sure each domain gets the voltage swings (and rise and fall times) that it expects. Level shifters are needed between any domains that use different voltages. This approach limits any voltage swing and timing characterization issues to the boundary of voltage domains and leaves the internal timing of the domain unaffected.
• When driving signals between power domains with radically different power rails, the need for level shifters is clear.
• Two types of level shifters are used: High-to-Low and Low-to-High.
[Figure: two voltage domains, VDD1 and VDD2 (labelled 0.9 and 0.7), with level shifters (LS) on the signals that cross between them]
Dynamic Power Reduction Techniques
A High-to-Low level shifter is basically two cascaded inverters powered from VDDL whose input is driven from the VDDH domain. It is placed in the lower voltage domain.
• If the distance between the 1.2V domain and the 0.9V domain is small enough, and the library has a strong enough buffer, then the driving buffer can be placed in the 1.2V domain. No additional buffering is required.
• If the distance between the 1.2V domain and the 0.9V domain is large, additional buffers are required. They are placed in the 1.1V domain that the signal passes through, as shown.
• The additional buffer uses the power rail of the 1.2V domain, which means that the 1.2V rail will probably have to be routed in the 1.1V domain. This kind of complex power routing is one of the key challenges in automating the implementation of multi-voltage designs.
[Figure: High-to-Low level shifter powered from VDDL and VSS, taking its input from a flip-flop in the VDDH domain and driving OUTL]
Dynamic Power Reduction Techniques
A Low-to-High level shifter takes the lower-voltage signal and uses a cross-coupled amplifier transistor structure running at the higher voltage.
• Since the output driver requires more current than the input stage, the level shifter is placed in the 1.2V domain.
• However, power routing will be a challenge no matter where the level shifter is placed: because it requires both rails, at least one of the rails will have to be routed from another domain.
• If the distance between the 1.2V domain and the 0.9V domain is small enough, and the library has a strong enough buffer, then the driving buffer can be placed in the 0.9V domain. No additional buffering is required. Otherwise, additional buffers need to be placed in the 1.1V domain, causing the power routing problems mentioned above.
[Figure: Low-to-High level shifter taking input INL and driving OUTH, with a 1.1V domain between the source and destination domains]
Dynamic Power Reduction Techniques
Recommendation When using Level Shifters:
• Place the level shifters in the receiving domain – in the lower domain for High-to-Low shifters, in the higher domain for Low-to-
High shifters.
• Low-to-High level shifters have significant delays that need to be understood and thoughtfully factored into RTL design
partitioning for timing critical blocks.
• Ensure there is a defined relationship between different voltage domains such that the operating conditions make it clear whether
an up- or down-shifter is required.
• Routing clocks across different power domains means that they have to go through level shifters.
• This clearly complicates automation – the clock tree synthesis tools need to understand level shifters and automatically insert them
in the appropriate places.
• The clock buffers in the multi-level domain will sometimes be powered at 0.9V and sometimes at 1.2V. Optimization and timing analysis must therefore be done simultaneously for both situations, to ensure that timing is met under both conditions.
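The recommendation to keep a defined voltage relationship between domains can be checked mechanically. The sketch below is hypothetical throughout (function name, mode names, and voltages are assumptions): it classifies a crossing from the per-mode supply voltages of the driving and receiving domains and flags crossings whose direction changes between modes.

```python
def shifter_for_crossing(src_volts, dst_volts, tolerance=0.0):
    """Classify a domain crossing from per-mode supply voltages.

    src_volts / dst_volts: dict mapping operating mode -> supply voltage (V)
    for the driving and the receiving domain respectively.
    """
    kinds = set()
    for mode in src_volts:
        diff = dst_volts[mode] - src_volts[mode]
        if diff > tolerance:
            kinds.add("low-to-high")
        elif diff < -tolerance:
            kinds.add("high-to-low")
    if not kinds:
        return "none"
    if len(kinds) == 1:
        return kinds.pop()
    return "both directions (relationship not fixed across modes)"

# Assumed example domains and operating modes.
always_on = {"fast": 1.2, "slow": 1.2}
scaled = {"fast": 1.2, "slow": 0.9}    # a domain whose supply is scaled
fixed_mid = {"fast": 1.1, "slow": 1.1}

print(shifter_for_crossing(always_on, fixed_mid))   # always a down-shift
print(shifter_for_crossing(scaled, always_on))      # up-shift in slow mode only
print(shifter_for_crossing(scaled, fixed_mid))      # direction flips between modes
```

A crossing reported as needing both directions is exactly the case the recommendation warns about, since no single up- or down-shifter covers every operating condition.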
Static Power Reduction Techniques
Multi-Threshold Logic:
• As geometries have shrunk to 130nm, 90nm, 65nm, 45nm, 32nm, 22nm, 14nm and below, using libraries
with multiple VT has become a common way to reduce leakage current.
• Many libraries today offer two or three versions of their cells: Low VT, Standard VT, and High VT.
• HVT cells: standard cells made up of transistors with a high Vth. These cells consume less power but are slow. They can be used in paths where timing is not critical, where we can afford extra delay in exchange for saving static power.
• LVT cells: standard cells made up of transistors with a low Vth. These cells are fast but consume more power. They are used in timing-critical paths.
• SVT cells: standard cells made up of transistors with a medium Vth. They offer a trade-off between HVT and LVT: they consume less power than LVT cells but are faster than HVT cells. They can be used when timing is missed by only a small margin.
• The implementation tools can take advantage of these libraries to optimize timing and power
simultaneously.
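The idea of trading leakage against timing with multiple VT flavors can be shown with a toy greedy assignment on a single path. This is not how any particular implementation tool works; the delay and leakage factors below are hypothetical, and the heuristic (best delay gain per unit of added leakage) is just one simple choice.

```python
# Hypothetical factors relative to an SVT cell: (delay factor, leakage factor).
FLAVORS = {"HVT": (1.20, 0.2), "SVT": (1.00, 1.0), "LVT": (0.85, 5.0)}
ORDER = ["HVT", "SVT", "LVT"]   # slowest and least leaky -> fastest and leakiest

def assign_vt(svt_delays, required_time):
    """Start every cell at HVT, then speed cells up until the path meets timing."""
    choice = ["HVT"] * len(svt_delays)

    def path_delay():
        return sum(d * FLAVORS[vt][0] for d, vt in zip(svt_delays, choice))

    while path_delay() > required_time:
        best, best_score = None, 0.0
        for i, vt in enumerate(choice):
            if vt == "LVT":
                continue                      # already the fastest flavor
            nxt = ORDER[ORDER.index(vt) + 1]
            gain = svt_delays[i] * (FLAVORS[vt][0] - FLAVORS[nxt][0])
            cost = FLAVORS[nxt][1] - FLAVORS[vt][1]
            if gain / cost > best_score:      # delay saved per unit of leakage
                best, best_score = (i, nxt), gain / cost
        if best is None:
            break                             # even all-LVT cannot meet timing
        choice[best[0]] = best[1]
    leakage = sum(FLAVORS[vt][1] for vt in choice)
    return choice, path_delay(), leakage

# Four cells on one path, delays in ns at SVT (assumed values).
choice, delay, leakage = assign_vt([0.10, 0.30, 0.20, 0.15], required_time=0.82)
print(choice, f"delay={delay:.3f} ns", f"leakage={leakage:.1f}")
```

In this example only the two slowest cells leave HVT, which keeps the relative leakage well below what an all-SVT or all-LVT path would consume.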
Static Power Reduction Techniques
Power Gating:
• A critical decision in power gating is how to switch power. In general, there are two approaches: fine grain power gating and coarse grain power gating.
• In fine grain power gating, the switch is placed locally inside each standard cell in the library. Since this switch must supply the worst-case current required by the cell, it must be quite large so as not to impact performance. The area overhead of each cell is significant (often 2x-4x the size of the original cell).
• In coarse grain power gating, a block of gates has its power switched by a collection of switch cells. The sizing of a coarse grain switch network is more difficult than a fine grain switch, as the exact switching activity of the logic it supplies is not known and can only be estimated. But coarse grain gating designs have significantly less area penalty than fine grain.
[Figure: power-gated design with 0.9V blocks, one of them switched OFF; switches controlled by Sleep connect VDD to the virtual rail VVDD, with VSS common]
Static Power Reduction Techniques
Comparison between two methods as the following table:
• Today, virtually all power gated designs use coarse grain power gating.
Static Power Reduction Techniques
Power Gating Challenges :
• Minimizing the impact of power gating on timing and area.
• Interface isolation.
• Performing power state transition verification to ensure all legal state entry and exit arcs are simulated and verified.
Static Power Reduction Techniques
Dynamic and Leakage Power Profiles
An example activity profile for a sub-system using clock gating to reduce power
• SLEEP events initiate entry to the low power mode & WAKE events initiate return to active mode.
• The figure shows a realistic profile with power gating. The leakage power savings are not perfect and instantaneous; the full leakage power savings take some time to reach target levels. This is due partly to the hotter thermal profile of the preceding activity and partly to the non-ideal nature of the power-gating technology. Therefore, the achievable savings are compromised to some extent.
Static Power Reduction Techniques
A simplified view of an SoC that uses internal power gating is shown.
• Unlike a block that is always powered on, the power-gated block receives its power
through a power-switching network. This network switches either VDD or VSS to the
power gated block. In this example, VDD is switched; VSS is provided directly to the
entire chip. The switching fabric typically consists of a large number of CMOS switches
distributed around or within the power gated block.
• One challenge for power gating designs is that the outputs of the power gated block
may ramp off very slowly. The result could be that these outputs spend a significant
amount of time at threshold voltage, causing large crowbar currents in the always
powered on block. To prevent these crowbar currents, isolation cells (the “Isol” block
in the figure) are placed between the outputs of the power gated block and the inputs
of the always on block. These isolation cells are designed so that they do not
experience crowbar current when one of the inputs is at threshold, as long as the
control input is off. The power gating controller provides this isolation control signal.
• Isolation Cells
• Header Cells
• Retention Registers
Static Power Reduction Techniques
Isolation Cells:
• The outputs of the power gated block are the primary concern, since they can cause electrical or functional problems in other blocks.
• The inputs to the power gated blocks usually are not an issue: they can be driven to valid logic values by powered up blocks without creating electrical (or functional) problems in the powered down block.
• There are three basic types of isolation cell.
• Clamp library cells are designed to avoid crowbar currents and leakage paths when the signal input floats, if the control input is in the appropriate (“isolate”) state. In addition, they carry extra attributes to ensure these cells never get optimized away, buffered incorrectly or inverted as part of logic optimization.
[Figure: isolation cell with pins IN, EN and OUT, supplied from VDD/VDD2 and VSS, with its simplified logic model]
Simplified logic model (two clamp variants):

D   ISO   Q   Mode
0   0     0   Bypass
1   0     1   Bypass
X   1     1   Output clamped

D   ISO   Q   Mode
0   1     0   Bypass
1   1     1   Bypass
X   0     0   Output clamped
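The clamp behaviour of an isolation cell, and why the ordering of the isolation control relative to power-down matters, can be sketched with a small behavioural model. The names are hypothetical, and `None` is used here to stand for an unknown ('X') value on a powered-down output.

```python
def isolate(d, iso, clamp):
    """Behavioural model of a clamp-type isolation cell.

    d:     value from the power gated block (0, 1, or None for unknown)
    iso:   True while isolation is active
    clamp: value driven onto the output while isolated (0 or 1)
    """
    if iso:
        return clamp
    return d            # bypass mode: the output follows the input

def run_sequence(steps, clamp=0, data=1):
    """steps: list of (power_on, iso) pairs; returns what the always-on block sees."""
    seen = []
    for power_on, iso in steps:
        d = data if power_on else None     # a powered-down output is unknown
        seen.append(isolate(d, iso, clamp))
    return seen

# Isolation asserted before power-off and released after power-on.
good = run_sequence([(True, False), (True, True), (False, True),
                     (True, True), (True, False)])
# Isolation asserted only after the block has already lost power.
bad = run_sequence([(True, False), (False, False), (False, True)])

print("isolate first:", good)
print("isolate late: ", bad)
```

In the second sequence an unknown value reaches the always-on block for one step, which is the situation the isolation control ordering is meant to prevent.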
Static Power Reduction Techniques
Recommendation When using Isolation Cells:
• Isolation clamps on clocks can considerably complicate clock tree synthesis and timing closure. Clock tree balancing can become
difficult. If possible, avoid clocks that are generated in a power gated block and used externally to the block.
• Since the power gating controller manages all the power gating inside a design, it asserts the isolation enable signal before power is gated, to make sure no ‘X’ value propagates to the always-on domain after power is cut off.
• Ensure that stuck-at-0 and stuck-at-1 faults can be detected during test on the isolation control signals. This facilitates verifying
during manufacturing test that isolation works.
Static Power Reduction Techniques
Retention Registers: preserve state while the power-gated block is turned off
• Another approach to providing state retention during power gating is to replace a standard register with a retention register.
• A retention register contains a “shadow” register that can preserve the register's state during power down and restore it at power up.
With separate SAVE and RESTORE controls:
• When SAVE is asserted, the state of the main register is loaded into the shadow register.
• When RESTORE is asserted, the state of the shadow register is loaded back into the main register.
• SAVE and RESTORE are level-sensitive signals.
With a single RETAIN control:
• When RETAIN goes high, the state of the main register is loaded into the shadow register.
• When RETAIN goes low, the state of the shadow register is loaded back into the main register.
• RETAIN is an edge-sensitive signal.
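The save and restore sequence can be sketched as a small behavioural model. The class and method names are hypothetical, the shadow storage is assumed to sit on an always-on supply, and signal-level details such as level versus edge sensitivity are not modelled.

```python
class RetentionRegister:
    """Behavioural sketch of a register with an always-on shadow copy."""

    def __init__(self):
        self.main = 0          # lost when the block is powered down
        self.shadow = 0        # assumed to stay powered at all times
        self.powered = True

    def write(self, value):
        if self.powered:
            self.main = value

    def save(self):
        """Copy the main register into the shadow register."""
        if self.powered:
            self.shadow = self.main

    def power_down(self):
        self.powered = False
        self.main = None       # contents are unknown while unpowered

    def power_up(self):
        self.powered = True    # main still holds an unknown value

    def restore(self):
        """Copy the shadow register back into the main register."""
        if self.powered:
            self.main = self.shadow


reg = RetentionRegister()
reg.write(0xA5)
reg.save()
reg.power_down()
state_while_off = reg.main
reg.power_up()
reg.restore()
print(hex(reg.main))
```

Skipping `save()` before `power_down()` would restore whatever stale value the shadow register last held, so the controller must sequence save, power-down, power-up, and restore in that order.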
UPF
UPF stands for Unified Power Format. It is a power format specification for implementing low power techniques in a design flow. UPF is designed to reflect the power intent of a design at a relatively high level.
• UPF 1.0 was defined by Accellera
• UPF 2.0 was defined by IEEE (IEEE Std 1801-2009)
• UPF 2.1 – UPF 3.1
- Added new capabilities
- New commands
At each design stage, the UPF is used along with the RTL/Verilog.
At RTL-level simulation, simulators understand the power intent and perform power-aware verification.
References
1. Synopsys University Courseware
3. https://www.linkedin.com/company/learnvlsi/
4. https://www.cs.utexas.edu/users/hunt/FMCAD/2007/presentations/Tutorial_Najm.pdf
5. https://www.ece.ucdavis.edu/~ramirtha/EEC216/W08/lecture1_updated.pdf
6. https://vlsi.pro/power-dissipation-leakage-power
7. https://teamvlsi.com/2021/08/integrated-clock-gating-icg-cell-in-vlsi.html/
8. https://vlsitutorials.com/isolation-cells-level-shifter-cells-low-power-vlsi/
9. https://media-exp1.licdn.com/dms/document/C561FAQGpoRcAJIQVmg/feedshare-document-pdf-analyzed/0/1649130365274?e=2147483647&v=beta&t=kBQgCDk86uJPWRcCbC3CJuexMG5WH_vk1AEN8VYImmU
10. https://www.cnblogs.com/guolongnv/articles/6252690.html