Power Reduction Techniques for an 8-core Xeon® Processor
Tanmay Rao M
MS VLSI CAD
101002014
Under the Guidance of
Mr Shridhar Nayak
12/8/21
Content
What is Nehalem Architecture?
What is a Xeon Processor?
Power Reduction Techniques
Core and Cache Recovery
Power Management Techniques
Idle Power Reduction
Conclusion
Nehalem Architecture
Nehalem is a family of next-generation Intel x86-64
processors.
New system architecture and a significantly enhanced
core microarchitecture, with innovations in power
management and modular design.
Intel QuickPath Interconnect (QPI) replaces the legacy
front-side bus in high-end models.
Intel® Hyper-Threading Technology.
Intel Turbo Boost Technology.
The following caches:
32 KB L1 instruction and 32 KB L1 data cache per
core
256 KB L2 cache per core
4–12 MB L3 cache shared by all cores
30% lower power usage for the same performance than
the previous generation
Xeon Processor
Xeon is a brand of multiprocessing- or multi-socket-capable
x86 microprocessors targeted at the non-consumer server
and workstation markets.
The Xeon brand has been maintained over several
generations of x86 and x86-64 processors.
The Xeon CPUs generally have more cache than
their desktop counterparts in addition to
multiprocessing capabilities.
Power Dissipation Trends
Overview of Xeon Nehalem-Ex
The shared L3 cache is split into eight slices. Even
though each cache slice is aligned with a processor
core, the entire L3 cache is seen as one large,
shared cache by all cores.
The top side of the floorplan includes four point-to-point
QuickPath Interconnect (QPI) links running at 6.4 GT/s.
The bottom side houses the memory interface.
Processor Die Photo
Power Reduction Techniques
The processor has four voltage domains.
To reduce both switching and leakage power, we strive to
operate each domain at the lowest possible voltage level.
To minimize leakage power, long channel devices are
used in non-critical circuit paths. These devices trade off
speed for lower leakage: about 10% lower speed provides
a 3x leakage reduction.
The processor cores include 58% long channel devices,
while the uncore logic uses 85% low-leakage transistors.
All circuits use static CMOS logic to minimize the active
switching power.
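The figures above lend themselves to a rough estimate. The following is a minimal sketch, not a model from the source: it assumes every device leaks equally before substitution, that the uncore's low-leakage transistors see the same 3x reduction as the long channel devices, and that switching power follows the usual first-order CMOS relation (proportional to C * V^2 * f). The function names and the 10% voltage step are hypothetical.

```python
def relative_leakage(long_channel_fraction, reduction=3.0):
    """Leakage relative to a design built only from nominal devices.

    Assumes all devices leak equally before substitution, so the
    long-channel share of the leakage is divided by `reduction`.
    """
    if not 0.0 <= long_channel_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    nominal_share = 1.0 - long_channel_fraction
    return nominal_share + long_channel_fraction / reduction


def relative_switching_power(v_new, v_old):
    """First-order switching power ratio at fixed capacitance and frequency."""
    return (v_new / v_old) ** 2


core = relative_leakage(0.58)    # cores: 58% long channel devices
uncore = relative_leakage(0.85)  # uncore: 85% low-leakage devices
print(f"core leakage   ~ {core:.2f} of nominal")
print(f"uncore leakage ~ {uncore:.2f} of nominal")

# Hypothetical example: a domain that can run at 10% lower voltage.
print(f"switching power ~ {relative_switching_power(0.9, 1.0):.2f} of original")
```

Under these assumptions the uncore, with its higher share of low-leakage devices, benefits more than the cores, and even a modest voltage reduction gives a larger-than-proportional saving in switching power.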
Processor Voltage Domains
L3 Cache
The L3 cache implements both sleep and shut-off
modes, which reduce leakage by 35% and 83%,
respectively.
In the active state, the large pull-up is turned on and
connects the virtual array supply to the real VCC.
In sleep mode, the large pull-up is turned off and
the small multi-leg programmable pull-up acts as a
resistor that drops the virtual supply to the sleep
voltage.
In shut-off mode all pull-up devices are turned off and the
virtual VCC drops to a low voltage determined by the residual
leakage of the array.
In shut-off state the array is functionally disabled and the logic
state of the array bits is lost.
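The three cache modes can be summarized as a small state table. This is an illustrative sketch only: the 35% and 83% leakage reductions come from the slides, while the per-slice leakage of 1.0 W, the mode assignment in the example, and all names are hypothetical. Data retention in sleep mode is assumed here; the slides only state that shut-off loses the array contents.

```python
from enum import Enum


class SliceMode(Enum):
    ACTIVE = "active"
    SLEEP = "sleep"
    SHUT_OFF = "shut-off"


# Fraction of leakage removed in each mode (figures quoted in the slides).
LEAKAGE_REDUCTION = {
    SliceMode.ACTIVE: 0.0,
    SliceMode.SLEEP: 0.35,
    SliceMode.SHUT_OFF: 0.83,
}


def slice_leakage(active_leakage_w, mode):
    """Leakage of one cache slice in the given mode, in watts."""
    return active_leakage_w * (1.0 - LEAKAGE_REDUCTION[mode])


def retains_data(mode):
    """Shut-off loses the array state; retention in sleep is an assumption."""
    return mode is not SliceMode.SHUT_OFF


# Hypothetical example: eight slices at 1.0 W active leakage each.
modes = [SliceMode.ACTIVE] * 4 + [SliceMode.SLEEP] * 2 + [SliceMode.SHUT_OFF] * 2
total = sum(slice_leakage(1.0, m) for m in modes)
print(f"total L3 leakage ~ {total:.2f} W (8.00 W with all slices active)")
```

Modeling the modes as an enumeration keeps the trade-off explicit: sleep saves less leakage, while shut-off saves more at the cost of losing the stored bits.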
Cache and Core Recovery
If one of the cores has a manufacturing defect, we
disable the core and recover that part.
The same applies for a massive defect in the cache
which cannot be repaired using the built-in cache
redundancy.
On-die electrically programmable one-time fuses
are used for the core and cache recovery.
The disabled cores are clock and power gated to
reduce power consumption.
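As an illustration of how fuse-based recovery might look to software, the sketch below decodes a core-disable bitmask. The slides do not describe the fuse layout, so the one-bit-per-core mask, the bit ordering, and the function name are all hypothetical.

```python
NUM_CORES = 8


def core_states(disable_fuse_mask, num_cores=NUM_CORES):
    """Map each core index to its state from a hypothetical fuse mask.

    Bit i set means core i was disabled at manufacturing; such cores
    are both clock gated and power gated.
    """
    states = {}
    for core in range(num_cores):
        disabled = bool((disable_fuse_mask >> core) & 1)
        states[core] = "clock+power gated" if disabled else "enabled"
    return states


# Hypothetical part with cores 2 and 5 fused off (a six-core recovery).
states = core_states(0b00100100)
enabled = [core for core, state in states.items() if state == "enabled"]
print(f"{len(enabled)} cores enabled: {enabled}")
```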
Cache And Core Recovery
Infrared Image of the Die
Brighter shades of gray indicate higher switching
and leakage power.
The two disabled cores appear dark in the
picture, since there is no infrared emission coming
from them.
The one bright spot in each core is due to the
thermal sensor, which is powered by the clean PLL
voltage domain and is not affected by the core-level
power gates.
Power Management
The power control unit (PCU) contains a microcontroller.
The PCU receives the output of core-level voltage
and temperature sensors, as well as the desired
power state for each core.
The microcontroller computes the voltage ID bits
that program the external voltage regulator and the
multiplier ratio for the core PLLs.
The PCU controls the core voltage regulator output and
the core power gates, and manages the transitions
between the different power states.
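The control flow described above can be sketched as a single decision step. This is a toy model, not Intel's firmware algorithm: the ratio limits, the temperature limit, the one-step-at-a-time policy, and the linear ratio-to-VID mapping are all hypothetical, and the voltage sensor inputs are omitted for brevity.

```python
from dataclasses import dataclass

# All numeric limits below are hypothetical placeholders.
MIN_RATIO = 8
MAX_RATIO = 17
TEMP_LIMIT_C = 90.0


@dataclass
class CoreStatus:
    requested_active: bool  # desired power state for this core
    temperature_c: float    # reading from the core-level thermal sensor


@dataclass
class PcuDecision:
    pll_ratio: int     # multiplier ratio for the core PLLs
    vid: int           # voltage ID code for the external regulator
    power_gated: list  # per-core flag: True means the core is gated off


def pcu_step(cores, current_ratio):
    """One decision step: gate idle cores, then pick a ratio and VID."""
    gated = [not c.requested_active for c in cores]
    active = [c for c in cores if c.requested_active]
    if not active:
        ratio = MIN_RATIO
    elif max(c.temperature_c for c in active) > TEMP_LIMIT_C:
        ratio = max(MIN_RATIO, current_ratio - 1)  # throttle when too hot
    else:
        ratio = min(MAX_RATIO, current_ratio + 1)  # headroom: speed up
    vid = ratio - MIN_RATIO  # hypothetical: one VID step per ratio step
    return PcuDecision(ratio, vid, gated)


cores = [CoreStatus(True, 70.0), CoreStatus(False, 40.0)]
print(pcu_step(cores, current_ratio=12))
```

Keeping the decision in one function mirrors the slide's description: sensor readings and requested power states go in, and a PLL ratio, a VID code, and power gate settings come out.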
Block Diagram of PCU
Idle Power Reduction
The goal is to minimize power consumption in unused blocks,
in particular by cutting the power dissipation of unused I/O links.
Multiple platform configurations leave I/O links
unterminated at the other end.
A link detect circuit is implemented to identify these links.
The Link Detect signal shuts off the link PLL and the
analog bias circuit, so that no active power is
dissipated in the unterminated link.
This saves about 2 W per disabled link.
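The link detect behaviour and the resulting savings can be expressed in a few lines. This is a minimal sketch under stated assumptions: the 2 W figure is the approximate value quoted above, while the function names and the four-link example configuration are hypothetical.

```python
WATTS_SAVED_PER_LINK = 2.0  # approximate figure quoted in the slides


def link_controls(terminated):
    """Derive PLL and bias enables from per-link detect results.

    `terminated` holds one boolean per link: True when the far end
    of the link is terminated, False when it is left open.
    """
    return [{"pll_on": t, "bias_on": t} for t in terminated]


def idle_savings_w(terminated):
    """Approximate power saved by shutting off unterminated links."""
    return WATTS_SAVED_PER_LINK * sum(1 for t in terminated if not t)


# Hypothetical platform: four links, two of them left unterminated.
links = [True, True, False, False]
print(link_controls(links))
print(f"approximate savings: {idle_savings_w(links):.1f} W")
```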
SUMMARY
Multiple voltage and clock domains are used to
minimize the power consumption for each domain.
Power gating is implemented at both the core and
cache level to control leakage.
An on-die microcontroller manages voltage and
frequency operating points, as well as power and
thermal events.
Idle power is reduced by shutting off the
unterminated I/O links.
REFERENCES
Stefan Rusu, et al., "Power Reduction Techniques for an 8-core
Xeon® Processor", Intel Corporation, Santa Clara, CA, USA.
K. Mistry, et al., "A 45nm Logic Technology with High-k+Metal
Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers,
193nm Dry Patterning, and 100% Pb-free Packaging", IEDM Tech.
Digest, December 2007.
S. Rusu, et al., "A 45nm 8-Core Enterprise Xeon®
Processor", ISSCC Tech. Digest, February 2009.
THANK YOU