A 100Gb/s Transmitter With Digital Pre-Distortion and MUX-Merged Voltage-Mode Driver Achieving 3-Times INLPP Improvement in 28nm CMOS

This document presents a 100Gb/s transmitter utilizing a digital pre-distortion (DPD) technique to enhance linearity in voltage-mode drivers, achieving a threefold improvement in integral non-linearity (INLPP). The proposed DPD calibration method reduces resource consumption by optimizing the look-up table (LUT) size and employs a 4:1 multiplexer to extend output bandwidth. The results demonstrate significant improvements in output signal quality and energy efficiency, making it suitable for high-speed wireline applications.


IEEE CICC 2025 25-3

A 100Gb/s Transmitter with Digital Pre-Distortion and MUX-Merged Voltage-Mode Driver Achieving 3-times INLPP Improvement in 28nm CMOS

Chenxi Han, Xiaoteng Zhao, Qi Zhang, Yuan Liu, Yuhao Zhang, Hongzhi Liang, Yukui Yu, Shubin Liu, Zhangming Zhu

Xidian University, Xi’an, China

2025 IEEE Custom Integrated Circuits Conference (CICC) | 979-8-3315-1745-8/25/$31.00 ©2025 IEEE | DOI: 10.1109/CICC63670.2025.10983867

In high-speed wireline transmitters (TXs), output drivers fall mainly into two types: current-mode logic (CML) drivers and voltage-mode (VM) drivers. For TXs with data rates exceeding 100Gb/s, most existing work resorts to the CML driver for its higher bandwidth [1-6]. Although the VM driver consumes less power for the same output swing and scales well with technology, its output impedance is significantly affected by the source-drain voltage (VDS) of the MOSFETs, resulting in poor linearity, especially for PAM-4 signaling. Additionally, the timing margin between the retiming clock and the data in the TX may fail to meet the constraint due to PVT variations [7].

To address these issues, this paper proposes a digital pre-distortion (DPD) linearity calibration technique that is implemented in the digital domain without additional high-speed analog overhead. Furthermore, by merging a 4:1 MUX into the output VM driver, the internal full-rate nodes are eliminated, and the output bandwidth is extended by a cascaded output peaking network. In addition, an improved adaptive retiming clock optimization technique is proposed for data rates >100Gb/s [7]. Finally, a DAC-DSP TX prototype with the proposed techniques achieves a 3-times improvement in peak-to-peak integral non-linearity (INLPP) with competitive energy efficiency.

Figure 1 illustrates the principle of the proposed technique and compares different VM driver schemes. The P-over-N driver structure is straightforward, but the conduction impedance of the transistors varies with the output voltage, reducing the linearity of the output signal. Source-series termination (SST) is an effective way to improve the linearity of VM outputs [8]. However, the passive resistor occupies a large layout area, and the transistors must be enlarged to ensure high linearity, which also increases the pre-driver power consumption. The PN-over-NP driver compensates for the non-linearity by adding an N-over-P pair instead of the passive resistor [10]. Yet it doubles the number of transistors and delivers a rather low swing. The proposed DPD linearity calibration technique enhances the output linearity of the P-over-N driver by changing the weights of the DAC control codes. Only a DPD encoder is needed to pre-distort the original DAC control weights based on the characteristics of the output non-linearity, with no additional analog overhead. Furthermore, the DPD encoder can be easily integrated into a DAC-DSP-based TX at the cost of digital complexity.

To verify the proposed linearity calibration technique, a DAC-DSP-based TX with the DPD P-over-N driver is designed, as illustrated in Fig. 2. The DSP is implemented by a digital synthesis flow and includes a PRBS generator, a 6-tap look-up table (LUT) based feed-forward equalizer (FFE), and the pre-distortion and thermometer encoders. The DSP outputs eight 32-bit parallel signals to eight data path slices, each of which consists of a 32:4 serializer, a retimer and single-to-differential (S2D) converter, and a DAC. To leave enough timing margin for the 1-UI pulse generators, the retimer is divided into two stages. The first stage uses CK4_0 as the sampling clock, and CK4_90/180/270 are used in the second stage. Each DAC is realized by replicating the unit slice, the number of which is related to the weights. An output multiplexing topology that merges the 4:1 MUX and the VM driver is implemented to eliminate internal full-rate nodes. Finally, the cascaded peaking networks extend the output bandwidth.

In the clock path, the differential input clocks are divided by a current-mode logic (CML) divider to generate the quadrature-phase clocks CK4_DIV. Two differential clocks from CK4_DIV are sent to the subsequent dividers to produce sub-rate clocks for serialization and the DSP. The phase order of CK4_DIV is automatically adjusted by a phase rotator (PR), and the result is sent to the retimer to optimize the timing margin. Specifically, the divided clock of CK8_RP is available as an optional phase detection (PD) clock, which increases the timing margin of phase detection and makes the scheme suitable for TXs with higher data rates than that in [7].

The DPD architecture and simulation results are shown in Fig. 3. For a 7-bit DAC control code, a DPD encoder based on the complete LUT requires 896 bits (7b x 128), consuming a lot of resources. Because the output non-linearity of the P-over-N driver is odd-symmetric about the center code, only half of the calibration code range needs to be stored. For instance, in Fig. 3 bottom-left, the output value of Code 0100001 is 0.2, and its inverse, -0.2, corresponds to Code 1011110. Hence, the LUT address can be mapped as Code[6] XOR Code[5:0], which halves the depth of the LUT. To further minimize resource consumption, only the difference (Δ) between the pre-distortion code and the original code is stored in the LUT (as a 4-bit signed number), and an adder is subsequently used to compute the final pre-distortion code. The polarity of the sign bit Code[6] determines whether the original value of the LUT or its 2s-complement is used. In this way, the storage bit width of the LUT is reduced from 7 bits to 4 bits, with more than a 30% reduction in resource consumption according to the synthesis results. The algorithmic flow of the DPD calibration scheme is illustrated in the top-right of Fig. 3. The first step is to measure the output ramp curve. Then an ideal ramp curve is fitted, and the difference code Δ between the original code and the fitted code is calculated and stored in the LUT. For example, in Fig. 3 bottom-right, the blue and red curves represent the original ramp and the fitted ramp, respectively. Code A (0001101) deviates from the fitted curve and needs to be mapped to the nearest code corresponding to the fitted value. The distances of codes 0001111, 0010000, and 0010001 (decimal 15, 16, 17) from the fitted value are r1, r2, and r3, respectively. Code B (0010000), which corresponds to r2, is chosen as the pre-distortion code because r2 < r3 < r1. The Δ code (0011) is stored at address 0001101 of the LUT. Finally, if the output linearity after pre-distortion meets the requirements, the DPD flow ends. Otherwise, a new round of pre-distortion data calculation is performed. The simulation results at the bottom of Fig. 3 show that the proposed DPD calibration technique can handle the output non-linearity regardless of its polarity, improving INLPP from 10 LSB to 1 LSB.

Because the proposed DPD P-over-N driver reduces the transistor size and extends the output bandwidth limit compared with conventional SST drivers, the output driver can be merged with the 4:1 multiplexer to eliminate high-rate nodes inside the transmitter. This idea is widely implemented in CML drivers at present but is still uncommon in VM drivers. The output multiplexing topology and the 1-UI pulse generators are illustrated in Fig. 4. Each driver slice consists of four P-over-N differential pairs sharing two impedance tuning transistors. The P-over-N pairs are controlled by two types of 1-UI pulse generator. Generator A holds its output high when idle, while generator B holds its output low. As shown in the timing diagram in the bottom left of Fig. 4, generators A and B alternately generate the pull-down pulses PDA0~3 and the pull-up pulses PDB0~3, respectively, to turn on the corresponding P-over-N pairs. To further extend the output bandwidth, a cascaded series peaking inductor network is adopted to neutralize the capacitors at different nodes (Fig. 4 bottom right). The post-layout simulation results show that the bandwidth with the cascaded peaking inductors is improved by 221% and 140% compared to no peaking inductor and single series peaking, respectively.

The prototype TX is fabricated in 28nm CMOS with 0.278 mm2 active area (see Fig. 7). The TX output eye diagrams are measured with a real-time oscilloscope through the wire-bonding package, PCB traces, connectors, cables, and DC blocks. Measured 50 GBaud and 40 GBaud eye diagrams are shown in Fig. 5 top and middle, respectively. The 50 GBaud eye heights (widths) of NRZ and PAM4 are 145 mV (0.65 UI) and 25/27/28 mV (0.32 UI) with FFE coefficients [0.0169, -0.0735, 0.6468, -0.1320, -0.0711, 0.0597]. The entire TX consumes 204 mW (2.04 pJ/b) at 100 Gb/s PAM-4 signaling, of which the proposed DPD calibration consumes less than 10%. Without DPD calibration, the upper and lower eyes are almost closed at 80 Gb/s PAM-4 because of the non-linearity of the conventional P-over-N driver. After the DPD calibration, the eyes open further and the RLM improves from 93.8% to 99%. Figure 5 bottom demonstrates the improvement from DPD calibration with a ramp code: the maximum deviation from the ideal curve is reduced by 3x after calibration.

The measured INLs of the output ramp curves for non-linearities of different polarity are shown in Fig. 6 top. Before DPD linearity calibration, the original INLs are -6.5/+5.5 LSB and -7.5/+5.6 LSB, respectively. After calibration, the INLs are significantly reduced to -1.6/+1.9 LSB and -1.5/+1.9 LSB, achieving a 3-times INLPP improvement. The performance summary and comparison with prior art are shown in Fig. 6 bottom.

The proposed DPD technique effectively eliminates the need for series resistors, reducing the driver size while maintaining high linearity with an RLM of 98%. This technique makes it possible to merge the 4:1 MUX into the final stage, which is a pioneering attempt in TXs using a VM driver. At 100+Gb/s data rates with a VM driver, this paper demonstrates best-in-class energy efficiency using a DSP-based architecture in planar CMOS technology. Furthermore, by adding an optional reference (CK8) to the PD in the clock path, the adaptive retiming scheme is achieved at 100Gb/s with <16 UI convergence time, which is significantly shorter than prior works [6,8].
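The half-depth, Δ-only look-up described for the DPD encoder can be sketched in a few lines of Python. This is a minimal illustration rather than the synthesized design. The function names are hypothetical. Two behaviors are assumptions chosen to agree with the Code A example (0001101 mapped to 0010000 with Δ = 0011): codes whose MSB is 0 use the stored Δ directly while codes whose MSB is 1 use its 2s-complement, and the result is clamped to the 7-bit range.

```python
FULL_LUT_BITS = 7 * 128   # complete LUT: one 7-bit code per 7-bit address
HALF_LUT_BITS = 4 * 64    # half-depth LUT holding 4-bit signed deltas


def lut_address(code: int) -> int:
    """Fold a 7-bit code onto a 6-bit address: Code[5:0] XOR Code[6]."""
    sign = (code >> 6) & 1
    return (code & 0x3F) ^ (0x3F if sign else 0x00)


def signed4(nibble: int) -> int:
    """Interpret a stored 4-bit value as a 2s-complement number (-8..7)."""
    return nibble - 16 if nibble & 0x8 else nibble


def dpd_encode(code: int, lut: list) -> int:
    """Return the pre-distorted 7-bit code for `code`."""
    delta = signed4(lut[lut_address(code)])
    if (code >> 6) & 1:
        delta = -delta                      # assumption: upper half negates delta
    return max(0, min(127, code + delta))   # assumption: saturate at the ends


lut = [0] * 64
lut[0b001101] = 0b0011                      # delta from the Code A example
print(format(dpd_encode(0b0001101, lut), "07b"))   # 0010000 (Code B)
print(format(dpd_encode(0b1110010, lut), "07b"))   # 1101111 (mirrored code)
```

With these sizes the stored table shrinks from 896 bits to 256 bits; the adder and the conditional negation are the extra logic paid for that reduction.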


[Figure 1: principle of the digital pre-distorted (DPD) P-over-N driver. Top-panel annotations: the INL shape (convex or concave) is odd-symmetric about the center code; half-LUT based on odd symmetry; coder resource reduction >50%; size and parasitic reduction.]

Structure    P-over-N driver  SST driver  PN-over-NP driver  DPD P-over-N driver
Linearity    Bad              Good        Good               Good
Scalability  Good             Bad         Good               Good
Swing        Large            Large       Small              Large
Area         Small            Large       Small              Small
Bandwidth    Large            Small       Medium             Large

Fig. 1. The principle of the DPD P-over-N driver (top) and structure comparison among different output drivers (bottom).

[Figure 2: block diagram. DSP (synthesized): PRBS generator, 6-tap LUT FFE, digital pre-distortion and thermometer encoder. Data path (x8): 32:4 serializer, retimer and S2D, DAC slices with merged 4:1 MUX and 1-UI pulse generators, output network (OUTP/OUTN). DAC weights from the MSB slice to the LSB slice: x8, x8, x8, x4, x2, x1, x0.5, x0.25. Clock path: CML DIV2, I/Q generation, DCC/QEC, PR, PD, dividers (DIV 16/8/4) producing CK8, CK16, CK32 and CK8_RP. Adaptive retiming clock: convergence within 8/16 UI.]

Fig. 2. Overall architecture of the proposed TX.
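The slice weights listed for Fig. 2 (x8, x8, x8, x4, x2, x1, x0.5, x0.25) are consistent with a segmented 7-bit DAC, and the short Python sketch below illustrates that reading. It is an assumption, not a description taken from the paper: the two MSBs are taken to be thermometer-coded onto the three x8 slices and the five LSBs to drive the binary-weighted slices directly. The function names are hypothetical.

```python
# Slice weights as listed for Fig. 2, MSB side first.
WEIGHTS = [8, 8, 8, 4, 2, 1, 0.5, 0.25]


def encode_slices(code: int) -> list:
    """Map a 7-bit code to eight slice enables (assumed segmentation)."""
    msb = (code >> 5) & 0b11                        # 0..3 of the x8 slices on
    thermometer = [1 if i < msb else 0 for i in range(3)]
    binary = [(code >> bit) & 1 for bit in (4, 3, 2, 1, 0)]
    return thermometer + binary


def dac_level(code: int) -> float:
    """Sum of enabled slice weights, in units of the x1 slice."""
    return sum(w * e for w, e in zip(WEIGHTS, encode_slices(code)))


print(encode_slices(0b1011110), dac_level(0b1011110))
```

Under this assumption every code maps to a level of 0.25 x code, so the eight weights cover exactly the 128 levels of a 7-bit control word.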
[Figure 3: DPD architecture optimization based on non-linearity odd symmetry. Top left: Code[6:0] is split into Code[6] and Code[5:0]; Code[5:0] XOR Code[6] addresses a half-depth 4-bit DPD LUT (addr0 to addr63, 64:1 MUX); the 4-bit Δ is XORed with Code[6] and added to the 7-bit code to form the DPD code. Top right, DPD flow: Start, measure output ramp curve, fit an ideal ramp curve, calculate pre-distorted code (Δ), update pre-distorted LUT data, meet linearity? (No: recalculate; Yes: End). Bottom: output voltage (V) and INL (LSB) versus DAC code (0 to 128) for the origin, fitting and calibrated curves, for both non-linearity polarities. Example annotations: Origin Code A (7b, 0001101), DPD Code B (7b, 0010000), Δ code (4b, 0011), r1 > r3 > r2.]

Fig. 3. Architecture and flow chart of the proposed DPD (top), simulation results for different polarity non-linearities, including ramp and INL curves.

[Figure 4: 4:1 MUX and driver schematic with TUNEP<3:0> and TUNEN<3:0> impedance tuning, 1-UI pulse generators A (PDA0) and B (PDB0) clocked by CK0/CK90/CK180/CK270, inputs DIN0 to DIN3, outputs OUTP/OUTN through the output network. Timing diagram: DIN0 to DIN3 carry D0/D4/D8, D1/D5/D9, D2/D6 and D3/D7; the output DOP carries D0 to D9 in sequence. Output networks (RT, CT, CESD, CPAD, L1, L2): (a) no inductor, (b) one inductor, BW extend 128%, (c) cascaded inductors, BW extend 221%. Frequency response plot: S21 (dB) versus frequency (GHz), 0 to 60 GHz.]

Fig. 4. Schematics of the proposed P-over-N driver with merged 4:1 MUX and 1-UI pulse generators (top). Timing diagram (bottom-left) and bandwidth comparison among different output networks (bottom-right).
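The calibration flow of Fig. 3 (measure the ramp, fit an ideal ramp, pick the nearest code, store Δ) can be sketched as follows. This is a single-pass illustration under stated assumptions, not the authors' implementation: the "measured" ramp is a synthetic odd-symmetric cubic non-linearity, the ideal ramp is an end-point fit, and the search window is limited to the range of a 4-bit signed Δ. All function names and the parameter `a` are hypothetical.

```python
def synthetic_ramp(n_codes=128, a=0.2):
    """Hypothetical odd-symmetric non-linear ramp spanning -0.5 V to +0.5 V."""
    half = (n_codes - 1) / 2
    ramp = []
    for code in range(n_codes):
        x = (code - half) / half
        ramp.append(0.5 * (x + a * (x - x ** 3)))
    return ramp


def fit_ideal_ramp(measured):
    """End-point fit: straight line through the first and last outputs."""
    step = (measured[-1] - measured[0]) / (len(measured) - 1)
    return [measured[0] + step * code for code in range(len(measured))]


def build_delta_table(measured, ideal, d_min=-8, d_max=7):
    """For each code, the offset to the code whose output is nearest the ideal."""
    deltas = []
    for code, target in enumerate(ideal):
        lo = max(0, code + d_min)
        hi = min(len(measured) - 1, code + d_max)
        best = min(range(lo, hi + 1), key=lambda c: abs(measured[c] - target))
        deltas.append(best - code)
    return deltas


def inl_pp(curve, ideal):
    """Peak-to-peak INL in LSB relative to the ideal ramp."""
    lsb = (ideal[-1] - ideal[0]) / (len(ideal) - 1)
    inl = [(v - i) / lsb for v, i in zip(curve, ideal)]
    return max(inl) - min(inl)


measured = synthetic_ramp()
ideal = fit_ideal_ramp(measured)
deltas = build_delta_table(measured, ideal)
calibrated = [measured[c + d] for c, d in enumerate(deltas)]
print(inl_pp(measured, ideal), inl_pp(calibrated, ideal))
```

The paper's flow repeats the calculation until the linearity target is met; a single pass suffices here only because the synthetic ramp is noiseless and monotonic.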
[Figure 5: measured eye diagrams and ramp outputs.
50 Gb/s NRZ (25 mV, 4 ps per division): EH = 145 mV, EW = 0.65 UI.
100 Gb/s PAM4 (25 mV, 4 ps per division): levels 119 / 41 / -40 / -120 mV, EH = 25/27/28 mV, EW = 0.23 UI, RLM: 97.9%.
80 Gb/s PAM4 without calibration (80 mV, 5 ps per division): levels 394 / 146 / -149 / -396 mV, EH = 20/90/33 mV, EW = 0.16/0.28/0.16 UI, RLM: 93.8%.
80 Gb/s PAM4 with calibration (80 mV, 5 ps per division): levels 400 / 131 / -134 / -403 mV, EH = 77/80/80 mV, EW = 0.25 UI, RLM: 99%.
Ramp output without and with calibration: origin or calibrated curve against the ideal curve.]

Fig. 5. Measured 50 GBaud NRZ and PAM4 output eye diagrams (top), 80Gb/s PAM4 eye diagrams and ramp output curves before and after DPD calibration (middle and bottom).

[Figure 6, top: measured ramp curve INL (LSB) versus DAC code (0 to 128) for two non-linearity polarities. Origin INL: -6.5/+5.5 and -7.5/+5.6; calibrated INL: -1.6/+1.9 and -1.5/+1.9.]

Work          Arch.    Tech.  Automatic  Retiming clock  Data rate  FFE   Modulation  Power    Efficiency  RLM           Output driver
                       (nm)   retiming   conv. time      (Gb/s)     taps              (mW)     (pJ/bit)
JSSC’24 [9]   Analog   28     No         N/A             80         4     SE-PAM4     248      3.1         0.99          VM
CICC’24 [1]   Analog   28     No         N/A             128        5+1   PAM4        192      1.5         0.99          CML (merge 4:1 MUX)
ISSCC’23 [2]  Analog   40     No         N/A             100        3+1   PAM-8       335      3.35        -             CML
JSSC’22 [3]   Analog   28     No         N/A             100/200    5     NRZ/PAM4    926/926  9.26/4.63   0.98          CML (merge 4:1 MUX)
JSSC’20 [4]   Analog   65     No         N/A             112        2     NRZ/PAM4    243      2.17        0.99          CML
JSSC’18 [5]   DAC-DSP  10     No         N/A             112        3     NRZ/PAM4    232      2.07        0.985         CML (merge 4:1 MUX)
JSSC’22 [6]   DAC-DSP  10     PD+PR      *512 UI (Max)   224        8     PAM2/4/8    390      1.74        0.99          CML
ISSCC’19 [8]  Analog   40     PD+PI      *768 UI (Max)   112        4     PAM4/NRZ    436      **3.89      0.976         SST
This Work     DAC-DSP  28     PD+PR      8/16 UI         100        6     PAM4/NRZ    204      2.04        0.938#/0.99#  DPD VM (merge 4:1 MUX)

Area (mm2), in extraction order: 0.18, 0.045, 0.197, 0.432, 0.694, 0.03, 0.0875, 0.56, 0.278 (this work).
Packaging, in extraction order: wire bonding, wire bonding, bare die, flip-chip, bare die, bare die, LGA, flip-chip, wire bonding (this work).
* Estimated from the paper. ** With on-chip clock. # According to Fig. 5.

Fig. 6. Measured ramp-curve INLs for non-linearities of different polarity before and after DPD calibration (top) and comparison with prior art (bottom).
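The headline figures can be cross-checked from the values annotated in Fig. 5 and Fig. 6 with a few lines of Python. The helper names are hypothetical. The RLM definition used here (three times the smallest adjacent level spacing divided by the outer level separation) is an assumption, adopted because it reproduces the RLM values annotated on the eye diagrams; other RLM definitions would give different numbers.

```python
def rlm(levels_mv):
    """Assumed definition: 3 x smallest adjacent spacing / outer separation."""
    lv = sorted(levels_mv)
    spacings = [hi - lo for lo, hi in zip(lv, lv[1:])]
    return 3 * min(spacings) / (lv[-1] - lv[0])


def inl_pp(inl_min_lsb, inl_max_lsb):
    """Peak-to-peak INL from its reported extremes."""
    return inl_max_lsb - inl_min_lsb


def energy_pj_per_bit(power_mw, rate_gbps):
    """mW divided by Gb/s equals pJ/bit."""
    return power_mw / rate_gbps


print(round(rlm([394, 146, -149, -396]), 3))   # 80 Gb/s PAM4, no calibration
print(round(rlm([400, 131, -134, -403]), 3))   # 80 Gb/s PAM4, calibrated
print(round(rlm([119, 41, -40, -120]), 3))     # 100 Gb/s PAM4
print(inl_pp(-6.5, 5.5) / inl_pp(-1.6, 1.9))   # INLPP improvement, first polarity
print(inl_pp(-7.5, 5.6) / inl_pp(-1.5, 1.9))   # INLPP improvement, second polarity
print(energy_pj_per_bit(204, 100))             # energy efficiency at 100 Gb/s
```

Both INLPP ratios come out above 3, which matches the reported 3-times improvement.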

Acknowledgements
This work was supported by the National Key Research and Development Program under Grant 2022YFB4401904 and the National Natural Science Foundation of China under Grants 92473201, 62374126, 62021004, and 62227816.

[Figure 7: die photo. Labeled blocks: DSP, Data Serial, CLK, DAC, PD, Peaking Network. Labeled dimensions: 675 um and 660 um.]

Fig. 7. Die photo of the proposed TX in 28-nm CMOS technology.

References
[1] H. Wu et al., “A 128Gb/s PAM-4 Transmitter with Edge-Boosting Pulse Generator and Pre-Emphasis Asymmetric Fractional-Spaced FFE in 28nm CMOS,” CICC, 2024. https://doi.org/10.1109/cicc60959.2024.10529064
[2] J. Yang et al., “A 100Gb/s 1.6Vppd PAM-8 Transmitter with High Swing 3+1 Hybrid FFE Taps in 40nm,” ISSCC, 2023. https://doi.org/10.1109/isscc42615.2023.10067452
[3] Z. Wang et al., “An output bandwidth optimized 200-Gb/s PAM-4 100-Gb/s NRZ transmitter with 5-tap FFE in 28-nm CMOS,” JSSC, vol. 57, issue 1, 2021. https://doi.org/10.1109/jssc.2021.3109562
[4] X. Zheng et al., “A 50–112-Gb/s PAM-4 transmitter with a fractional-spaced FFE in 65-nm CMOS,” JSSC, vol. 55, issue 7, 2020. https://doi.org/10.1109/jssc.2020.2987712
[5] J. Kim et al., “A 112 Gb/s PAM-4 56 Gb/s NRZ reconfigurable transmitter with three-tap FFE in 10-nm FinFET,” JSSC, vol. 54, issue 1, 2019. https://doi.org/10.1109/jssc.2018.2874040
[6] J. Kim et al., “A 224-Gb/s DAC-Based PAM-4 quarter-rate transmitter with 8-tap FFE in 10-nm FinFET,” JSSC, vol. 57, issue 1, 2022. https://doi.org/10.1109/jssc.2021.3108969
[7] S. Liu et al., “A 56 Gb/s DAC-DSP-based transmitter with adaptive retiming clock optimization using inverse-PR-based PD achieving 8-UI converge time in 28-nm CMOS,” SCIS, vol. 67, issue 8, 2024. https://doi.org/10.1007/s11432-024-4072-9
[8] P.-J. Peng et al., “A 112Gb/s PAM-4 Voltage-Mode Transmitter with 4-Tap Two Step FFE and Automatic Phase Alignment Techniques in 40nm CMOS,” ISSCC, vol. 56, issue 7, 2019. https://doi.org/10.1109/isscc.2019.8662361
[9] J. K. Park et al., “An 80-Gb/s/pin Single-Ended Voltage-Mode PAM-4 Transmitter With a Pulsewidth Pre-Emphasis and a 4-Tap FFE in 28-nm CMOS,” JSSC, 2024. https://doi.org/10.1109/jssc.2024.3431288
[10] J.-H. Park et al., “A 32Gb/s/pin 0.51pJ/b Single-Ended Resistor-less Impedance-Matched Transmitter with a T-Coil-Based Edge-Boosting Equalizer in 40nm CMOS,” ISSCC, 2023. https://doi.org/10.1109/ISSCC42615.2023.10067552
