Perez Et Al - 2019 - Distributed Network of LDO Microregulators Providing Submicrosecond DVFS and IR

This document summarizes a distributed network of LDO microregulators that was integrated into a 24-core microprocessor fabricated in 14nm SOI CMOS. The network uses multiple sensing points to reduce IR drops across the voltage supply grid and improve voltage regulation. It employs a master controller that can operate in single-sense, multi-sense, or multi-sector modes. Each microregulator contains a charge pump with a switched-capacitor accelerator that can speed up output voltage transitions by up to 17x. Experimental results showed line and load regulations of 6.7mV/V and 1.5mV/A, respectively, with a power efficiency of 89.9% and current efficiency of 98.5%.

Distributed Network of LDO Microregulators Providing Submicrosecond DVFS and IR Drop Compensation for a 24-Core Microprocessor in 14nm SOI CMOS

Miguel E. Perez, Michael A. Sperling, and Timothy E. Diemoz
IBM Systems
IBM
Poughkeepsie, NY, USA

John F. Bulzacchelli and Zeynep Toprak-Deniz
IBM Research
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA

Abstract—A distributed network of LDO microregulators (uREGs) senses and corrects the voltage at multiple points on a power supply grid in a multi-core microprocessor to reduce IR drops and associated performance loss. Adding a switched-capacitor (SC) accelerator to the charge pump of each uREG speeds up output voltage transitions by 17X for greater DVFS savings. Line and load regulations are 6.7mV/V and 1.5mV/A, respectively. The regulator also achieves a power efficiency of 89.9% and a current efficiency of 98.5%, while reaching a peak power density of 91.1W/mm².

Keywords—Distributed regulator; LDO; multi-sense; multi-sector; IR drop; charge pump.

I. INTRODUCTION
The use of dynamic voltage and frequency scaling (DVFS) to alleviate the energy constraints of multi-core processors has motivated recent work on integrated voltage regulator modules (iVRMs) [1]. Distributed regulators [2,3] offer key benefits when the regulated domain extends beyond 1 mm, as placement of individual microregulators (uREGs) closer to their loads helps reduce response time and power grid IR drops due to current redistribution. IR drop errors can be further reduced with active sensing and correction of voltage at multiple points on a grid [3], but comparator offsets can lead to extreme load sharing imbalances among uREGs. This paper describes a control mechanism for striking a balance between IR drop suppression and load sharing imbalance which is insensitive to comparator offsets. It also presents a switched-capacitor (SC) accelerator in the charge pump of each uREG that speeds up output voltage transitions by up to 17X for greater DVFS savings.

II. REGULATOR DESIGN
A. Distributed Sensing and Control Schemes
The iVRM is implemented as a distributed system with a master regulator controller (VREGC) sensing the voltages on the grid and sending corrective signals to various groups of uREGs across the core. Fig. 1 compares the distributed architecture employed in this work with that presented in [2]. With the single sense point scheme of [2], the VREGC only ensures that the regulated voltage is correct at one point on the grid, and IR drops can lead to significant voltage errors across the grid if the load is not uniformly distributed. With the multi-sector scheme adopted here, the VREGC monitors four sense points (one in each quadrant) and independently adjusts the uREGs located in those respective areas, thereby minimizing IR drop errors. An alternative multi-sense control scheme (not shown in Fig. 1) is also implemented, in which feedback from the sense point with the lowest voltage reading is selected as the correction signal delivered to all uREGs in the core. While multi-sense (unlike multi-sector) control does not suppress IR drop errors, it does guarantee that even the lowest sense point on the grid is properly regulated.

Fig. 1. Single-sense (left) versus multi-sector (right) schemes.

* 978-1-5386-9395-7/19/$31.00 ©2019 IEEE


B. Master Regulator Controller (VREGC)
Fig. 2 shows a schematic of VREGC, which controls the uREGs by sending them digital up/down (UP/DN) codes that drive a charge pump whose voltage (Vcp) serves as a local reference for the high-speed uREG comparator (Fig. 3). This obviates the need to send precise reference voltages to all the uREGs. The voltage on the power grid is sensed differentially with a sample/hold (S/H) and compared to a programmable reference (VIDout). The error voltage is amplified and then fed to a latched comparator, whose output is processed by a digital FIR filter that produces the 2b UP/DN code sent to the uREGs. The truth table in Fig. 2 specifies how the VREGC is configured for the three possible control modes. In single-sense mode, a single sense point is chosen, and the output bit of a single comparator is used to generate the same UP/DN codes for all the uREGs. Sending the same UP/DN codes to all the uREGs ensures that they all operate with the same duty cycle (proportional to uREG output current) and therefore guarantees balanced load sharing, even in the face of uREG comparator offsets [2]. In multi-sense mode, all 4 comparator outputs are fed through an OR gate. A comparator (CMP) output of 1 indicates that the voltage at the sense point is below target, so the lowest sense point always dominates. In this mode, VDD is regulated so that it is equal to or higher than the target at every sense point. In multi-sector mode, each comparison from every sense point is used to generate unique UP/DN codes which are delivered to the group of uREGs in that sector, so every sense point is actively regulated.

Fig. 2. Voltage regulator controller (VREGC) schematic.
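The three VREGC control modes described in Section II-B can be illustrated with a minimal Python sketch. It models only the comparator-selection logic (pick one bit, OR all bits, or pass each bit to its own sector) and omits the sample/hold, error amplifier, and FIR filter. The names `compare`, `up_requests`, and `selected`, and the example voltages, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List

def compare(v_sense: float, v_target: float) -> int:
    """Comparator bit: 1 means the sensed voltage is below target."""
    return 1 if v_sense < v_target else 0

def up_requests(cmp_bits: List[int], mode: str, selected: int = 0) -> List[int]:
    """Return one raise-VDD request bit per sector for the given mode.

    Assumption: one comparator bit per sense point, one sense point per sector.
    """
    n = len(cmp_bits)
    if mode == "single-sense":
        # One chosen comparator drives every sector identically.
        return [cmp_bits[selected]] * n
    if mode == "multi-sense":
        # OR of all comparators: any low sense point raises all sectors.
        return [int(any(cmp_bits))] * n
    if mode == "multi-sector":
        # Each sector follows its own comparator.
        return list(cmp_bits)
    raise ValueError(f"unknown mode: {mode}")

# Example: only the second sense point is below a 0.90 V target.
bits = [compare(v, 0.90) for v in (0.91, 0.88, 0.92, 0.90)]
```

With these inputs, single-sense mode using the first sense point requests no correction anywhere, multi-sense mode raises all four sectors, and multi-sector mode raises only the second sector, which mirrors the trade-off between load-sharing balance and IR drop suppression discussed in the text.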
C. Micro-Regulator (uREG)
Fig. 3 shows a simplified schematic of the uREG, which features a common gate amplifier [2] that compares the local VDD to Vcp. Its output is amplified to rail-to-rail levels and level-shifted (LVL) so that it can control the passgate devices. The use of fast-switching PFETs with gates driven rail-to-rail provides sub-ns response time. Self-generated ripple is reduced using slow PFETs with RC-filtered gate voltages and PMOS strength (active width) calibration. The UP/DN codes from the VREGC are gated with signal LSTG2 to generate the charge pump control signals (providing negative feedback from the VREGC). The charge pump voltage Vcp reaches a steady-state value when the uREG duty cycle (D) satisfies the condition <UP>/<DN> = D/(1-D), where <UP> and <DN> represent the time-averaged values of the UP and DN codes. Whenever a fast VDD transition is desired, the SC pump circuit (accelerator) in the uREG is enabled. When going to a higher VDD, Cpmp switches between Vcp and VDD, dumping a constant charge on Cp with every cycle. When transitioning to a lower VDD, Cpmp switches between GND and Vcp, discharging Vcp. On the way up, pumping stops after every sense point has crossed the desired threshold. During a downward transition, the pump signal is gated with LSTG2 to only allow a pump when VDD is below the instantaneous target set by Vcp, thus ensuring that VDD can track Vcp closely even in the presence of light loads. To avoid drooping too low, the pump is halted as soon as any sense point reports that the desired voltage has been reached.

Fig. 3. Micro-regulator (uREG) schematic. Passgates are binarily sized.

D. Minimum uREG Duty-Cycle Enforcement
In multi-sector mode, groups of uREGs in each quadrant will have different duty cycles (output currents) whenever their local loads differ, which helps minimize power grid gradients. However, this load sharing imbalance may become excessive (to the point where some uREG groups may be driven to zero duty cycle and shut down completely) if the comparators of VREGC have different voltage offsets. With no switching activity in those uREGs, feedback will not be delivered to the charge pump voltage Vcp. This condition seriously hinders the ability of the uREGs to respond quickly to a sudden rise in local load currents. To avoid this undesired effect and also strike a balance between IR drop suppression and load sharing imbalance, the VREGC includes a minimum UP pulse generator that counts the number of consecutive DN codes sent to any given sector. If this number surpasses a critical threshold (NCRIT), an automatic UP code is sent to the uREGs even if the local value of VDD is above target at the time. This mechanism guarantees a minimum duty cycle for each sector and limits the degree of load sharing imbalance. NCRIT is set to a value such that the total output current of all four sectors operating at minimum duty cycle is smaller than the lowest possible load current across process variation.
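The minimum UP pulse generator of Section II-D, combined with the steady-state condition <UP>/<DN> = D/(1-D) of Section II-C, can be sketched in a few lines of Python. This is one possible reading of the text, not the authors' implementation. It assumes that each code is either UP or DN (any hold state of the 2b code is ignored), that the time averages <UP> and <DN> are the fractions of UP and DN codes, and that a DN which would exceed `n_crit` consecutive DN codes is replaced by a forced UP. The names `enforce_min_up`, `steady_state_duty`, and `n_crit` are hypothetical.

```python
from typing import Iterable, List

UP, DN = "UP", "DN"

def enforce_min_up(requested: Iterable[str], n_crit: int) -> List[str]:
    """Replace a DN with a forced UP once n_crit DN codes were sent in a row."""
    sent: List[str] = []
    consecutive_dn = 0
    for code in requested:
        if code == DN and consecutive_dn >= n_crit:
            code = UP  # forced UP, even though VDD may be above target
        consecutive_dn = consecutive_dn + 1 if code == DN else 0
        sent.append(code)
    return sent

def steady_state_duty(codes: List[str]) -> float:
    """Solve <UP>/<DN> = D/(1-D) for D, giving D = <UP>/(<UP>+<DN>)."""
    ups, dns = codes.count(UP), codes.count(DN)
    if ups + dns == 0:
        raise ValueError("need at least one UP or DN code")
    return ups / (ups + dns)

# A sector whose comparator always reads "above target" requests only DN.
sent = enforce_min_up([DN] * 10, n_crit=4)
d_min = steady_state_duty(sent)
```

Under these assumptions the sector never sees more than `n_crit` DN codes in a row, so its duty cycle cannot fall below 1/(`n_crit` + 1); in the example, `d_min` evaluates to 0.2. A smaller `n_crit` raises this floor and limits load-sharing imbalance, at the cost of weaker IR drop suppression.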
III. EXPERIMENTAL RESULTS
The iVRMs were integrated into a 24-core processor fabricated in 14nm SOI CMOS. The processor consists of 6 chiplets, each having 4 cores and a shared L2/L3 cache. Within each chiplet, there are 5 independently regulated power grids (30 across the entire processor), one for each core (VDDCORE) and one for the L2/L3 cache (VDDCACHE). Each iVRM powering one core employs 20.5nF of input decoupling capacitance (DCAP) and 481nF of output DCAP. Fig. 4 shows a micrograph of a single core. The yellow rows represent the power headers while the red blocks represent the location of the distributed uREGs within the headers (16 per core). The pink dots indicate where the four sense points are located throughout the core. The VREGC is placed nearby in the cache for greater ease of communication with other parts of the processor controlling the iVRM. Fig. 5 shows the load and line regulation achieved by the iVRM. High currents were obtained by running intensive workloads at their respective Fmax for a given VDD. Low currents were achieved by gating off clocks, leaving the load to consist of mere leakage. Tight regulation is achieved, as load regulation is 1.5mV/A (across 30X load current variation with VDIN=1.1V), and line regulation is 6.7mV/V across an input range of 0.64-1.1V and an output range of 0.6-1.06V. DC measurements were made of the regulator's current and power efficiencies at various VDD levels under high load conditions, as shown in Fig. 6. At VDIN=1V and VDD=0.92V, the regulator achieves a power efficiency of 89.9% and a current efficiency of 98.5% while supplying 11.9A.

Fig. 4. Single core micrograph showing uREGs within power headers, VREGC and sense points.

Fig. 5. Load (top) and line (bottom) regulation.

Fig. 6. Power and current efficiencies across VDD with high loading.

Rise and fall times were measured for 100mV VDD transitions with and without the use of the SC pump circuit. Fig. 7 shows an up to 17X speed-up when reducing VDD to 850mV (helping to increase DVFS power savings), and up to 5X faster transitions when raising VDD back to 950mV (allowing the core to respond faster to more demanding workloads). Fig. 8 shows responses to a 10ns load step from 4.8 to 9.7A. Compared to iVRM bypass mode, the voltage droop in regulated mode is 6X and 2.7X smaller under moderate (238mV Vds) and low (79mV Vds) headroom conditions, respectively.

Fig. 9 shows a plot of processor Fmax (normalized to highest Fmax) versus workload for the same VDD target using different sense modes. Different workloads are used to emphasize certain parts of the core more heavily than others to create load current imbalances. Multi-sector and multi-sense achieve higher Fmax consistently, regardless of the type of workload. Using single sense point Sense1, located in a less active region of the core for some workloads, causes up to 3% performance degradation. Direct measurements show at least 30mV reduction of the grid voltage in this mode. While use of single sense point Sense2 achieves high Fmax, this would change if workloads with less activity around Sense2 were encountered (not found among workloads tested), so multi-sector and multi-sense are more robust. In addition, multi-sector is on average slightly (1.5%) more power-efficient than multi-sense while maintaining virtually equal performance levels, making it an optimal compromise between power efficiency and performance.
Fig. 7. Output voltage transitions at light loads with and without the charge pump acceleration.

Fig. 8. Load step responses under low (left) and moderate (right) headroom conditions.

Fig. 9. Fmax using various sense schemes versus workload.

IV. CONCLUSION
In this paper, we describe an asynchronous and distributed LDO regulator that features key innovations for improving its performance in the high-power and highly dynamic processor application space. In particular, we introduce a set of new sensing and control schemes that compensate for voltage errors due to IR drops on a power grid, which allows a processor to achieve greater performance (higher Fmax) without wasteful guardbanding. We also describe a control mechanism for striking a balance between suppression of IR drop errors and extreme imbalances in load sharing, which is an important but often underappreciated tradeoff in the design of distributed regulator systems. Finally, we show how to significantly increase the speed of output voltage transitions with the use of SC pump circuits that assist the baseline feedback loop transition to a different value of VDD. This new capability maximizes DVFS power savings and improves how quickly a regulated core can respond to a surge in computing demand. Table I compares the performance achieved by our design with that of recent state-of-the-art LDOs found in the literature.

TABLE I. LDO PERFORMANCE COMPARISON

Design | [1] | [2] | [3] | [4] | This Work
Process | 14nm | 22nm | 65nm | 65nm | 14nm
Area [mm²] | 0.041* | 0.355 | 0.776 | 0.03 | 0.14
VIN [V] | 0.6-1.1 | 0.68-1.1 | 0.6-1.2 | 0.5-1 | 0.64-1.1
VOUT [V] | 0.4-1 | 0.61-1.03 | 0.55-1.15 | 0.45-0.95 | 0.6-1.06
IMAX [A] | 4 | 11.9 | 0.5 | 0.012 | 12
CL [nF] | 4000 | 750 | 0.9 | 0*** | 481
Peak current eff. [%] | 99.7 | 96.7 | 99.9 | 99.97 | 98.5
Load regulation [mV/A] | N/A | <0.5 | 4237 | 500 | 1.5
Line regulation [mV/V] | <2 | 20 | N/A | N/A | 6.7
Dropout voltage [mV] | 80 | 70 | 50 | 50 | 40
∆VOUT [mV] @ ∆ILOAD/TEDGE | 50 @ 3A/1.5ns | N/A | 125 @ 0.45A/20ns | 105 @ 0.01A/1ns | 20 @ 4.9A/10ns
Peak power density [W/mm²] | 77.1** | 34.5 | 0.74 | 0.2 | 91.1

*Does not include power gate area
**Area used does not include power gate
***Uses 0.1nF AC coupling capacitor to compensate for lack of CL

ACKNOWLEDGMENT
The authors thank Alper Buyuktosunoglu and Ramon Monfort for providing many of the workloads (created with MicroProbe) used for testing.

REFERENCES
[1] K. Luria, J. Shor, M. Zelikson, and A. Lyakhov, “Dual-use low-drop-out regulator / power gate with linear and on-off conduction modes for microprocessor on-die supply voltages in 14nm,” ISSCC Dig. Tech. Papers, pp. 156-157, Feb. 2015.
[2] Z. Toprak-Deniz, et al., “Distributed system of digitally controlled microregulators enabling per-core DVFS for the POWER8™ microprocessor,” ISSCC Dig. Tech. Papers, pp. 98-99, Feb. 2014.
[3] Y. Lu, F. Yang, F. Chen, and P. K. T. Mok, “A 500mA analog-assisted digital-LDO-based on-chip distributed power delivery grid with cooperative regulation and IR-drop reduction in 65nm CMOS,” ISSCC Dig. Tech. Papers, pp. 310-311, Feb. 2018.
[4] M. Huang, Y. Lu, S.-P. U, and R. P. Martins, “An output-capacitor-free analog-assisted digital low-dropout regulator with tri-loop control,” ISSCC Dig. Tech. Papers, pp. 342-343, Feb. 2017.