Physical Design For 3D ICs

This document discusses the physical design aspects of 3D integrated circuits (ICs), focusing on through-silicon vias (TSVs) and their impact on design methodologies. It covers the manufacturing processes, placement strategies, low-power design techniques, and the need for advanced electronic design automation (EDA) tools to address the challenges associated with TSVs. The chapter emphasizes the importance of optimizing power, performance, and area in 3D IC designs while also highlighting the current state of available tools as of early 2015.

9 Physical Design for 3D ICs

Sung-Kyu Lim

CONTENTS
9.1 Introduction
9.2 What Is a Through-Silicon Via?
9.2.1 Through-Silicon Via (TSV) Manufacturing
9.2.2 3D IC Tool Availability as of Early 2015
9.3 TSV Placement in Block-Level 3D ICs
9.3.1 TSV Placement Styles
9.3.2 A Block-Level 3D IC Design Flow
9.3.3 A 3D IC Design Evaluation Methodology
9.3.4 Simulation Results
9.3.4.1 Wirelength and Timing Results
9.3.4.2 Power Consumption, Thermal, and Stress Results
9.3.4.3 Summary
9.3.5 Needs for EDA Tool Development
9.4 Low-Power 3D IC Design with Block Folding
9.4.1 Target Benchmark and 3D IC Design Flow
9.4.2 A Block Folding Methodology
9.4.2.1 Block Folding Criteria
9.4.2.2 Folding Examples
9.4.3 Full-Chip Assembly and Simulation Results
[Link] Summary
9.4.4 Needs for EDA Tool Development
9.5 To Probe Further
9.5.1 3D IC Floorplanning
[Link] 3D IC Placement
[Link] 3D IC Routing and Buffering
[Link] 3D IC Power Distribution Network Routing
[Link] 3D IC Clock Routing
[Link] Physical Design for Monolithic 3D IC
[Link] 3D IC Mechanical Reliability Analysis and Design Optimization
[Link] 3D IC Low-Power Physical Design Methodologies
References

© 2016 by Taylor & Francis Group, LLC

9.1 INTRODUCTION

A major focus of the electronics industry today is to miniaturize ICs by exploiting advanced
lithography technologies. This trend is expected to continue to the 7 nm node and, perhaps, beyond.
However, due to the increasing power, performance, and financial bottlenecks beyond 7 nm, the
semiconductor industry and academic researchers have begun to actively look for alternative
solutions. This search has led to the current focus on thinned and stacked 3D ICs, connected initially
by wire bonds, later by flip chip and package on package (PoP), most recently by through-silicon vias
(TSVs) [16,40,41,48,82], and in the near future by monolithic 3D integration [7,67,68,78].* A 3D IC
provides the possibility of arranging and interconnecting digital and analog functional blocks across
multiple dies at a very fine level of granularity, as illustrated in Figure 9.1. This shortens the
interconnect, which naturally translates into reduced delay and power consumption [6,32,48,63,77].
Historically, 3D IC technology was first adopted in real-time image sensing [42] and more recently in
DRAM [33,37] and flash memory [25]. Advances in 3D IC integration and packaging are gaining momentum and
have become of critical interest to the semiconductor community. However, the lack of physical
design tools that can handle TSVs and 3D die stacking—in addition to cost and yield—delays the
mainstream acceptance of this technology.
This chapter focuses on TSV-based 3D ICs, which differ from 2.5D ICs based on interposer
technologies [71,75,76]. These 3D ICs are built by stacking bare dies and connecting them
with TSVs. TSVs are deployed in dies with active devices, not just passive components as in
2.5D ICs. The design methods discussed in this chapter are chip oriented as opposed to package
oriented, because our focus is on the quality of the chip, not its package. In a 3D IC built with
TSVs, the individual dies are fabricated separately and later bonded and connected with TSVs.
Another emerging technology for die stacking is monolithic 3D integration, where the dies are
grown on top of each other in a sequential fashion. This chapter focuses on 3D ICs built with
TSVs, not monolithic 3D integration, although we provide an overview of recent EDA work on the
latter topic in Section 9.5. We review design challenges, algorithms, design methodologies, and
technical details for key physical design steps targeting 3D ICs built with TSVs. We also examine
related simulation results to evaluate the effectiveness of these techniques on real designs.

* Monolithic 3D IC technology was commercialized in late 2014 by Samsung for flash memory [25]. However, the
application of this technology to logic is far behind: only a single gate or a small number of gates has
been demonstrated [7,67,68,78].

[Figure 9.1 labels, top to bottom: heat sink; wafer 1 (bulk silicon, device layer 1, metal layers);
face-to-face bonding; wafer-to-wafer via; wafer 2 (metal layers, device layer 2, thinned silicon);
backside IO via; IO bump.]

FIGURE 9.1 A two-tier 3D IC with face-to-face bonding. (Image from Black, B. et al., Die stacking
(3D) microarchitecture, in Proceedings of Annual International Symposium Microarchitecture, 2006.)

The history of physical design tool development for TSV-based 3D ICs is short, and new research is
reported at an increasing pace, as described in Section 9.5. Rather than survey that work here, we focus
on the following practical topics that 3D IC designers face during 3D IC adoption:

◾◾ TSV placement in block-level 3D ICs: In 3D ICs, block-level designs offer various
advantages over designs integrated at other levels of granularity, such as gate level,
because they promote the reuse of IP blocks. One fundamental problem in this design
style is the placement of TSVs, which have a profound impact not only on traditional
physical design objectives such as area, wirelength, timing, and power [36,81] but also
on reliability metrics including thermal and mechanical stress [30]. We study trade-offs
among the various ways to place TSVs in block-level designs, where TSVs are placed
between the blocks or form arrays at strategic locations.
◾◾ Low-power 3D IC design with block folding: Low power is a potential key benefit of
3D ICs, yet few thorough design studies have explored it. We study several physical design
methodologies to reduce power consumption in 3D ICs. We use a large-scale
commercial-grade microprocessor (OpenSPARC T2) as a benchmark. Our specific focus is
on functional module partitioning schemes that further reduce the wirelength and buffer
usage of individual modules used in a 3D IC design. We also study the needs for and the
development of several new physical design tools to serve this purpose.

In this chapter, we present self-contained algorithms and methodologies that EDA engineers
can implement from scratch, as pre- and postprocessing steps, or as plug-ins to an existing
tool (either a 2D or a 3D IC tool) to reproduce the related results. In addition, we describe how 3D
IC designers can use various commercial tools built for 2D ICs to handle 3D IC physical design
with power, performance, and area (PPA) optimization, as well as multi-physics simulation and
reliability analysis. Our choice of particular tools in this chapter does not imply a lack of
features in other tools, and it should not discourage the use of other tools. On the
contrary, we hope to inspire the development of new EDA algorithms, software, and
methodologies. We conclude each section by discussing related issues and implications that EDA
engineers may face during tool development for 3D IC physical design.

9.2 WHAT IS A THROUGH-SILICON VIA?

A through-silicon via (TSV) is a micron-scale via that vertically penetrates either the silicon
substrate or the entire IC stack. The TSV is currently the de facto standard for die-to-die interconnect
in 3D ICs. A TSV is typically made up of a copper pillar surrounded by a silicon dioxide liner.
In addition, tungsten pillars and benzocyclobutene (BCB) liners are used to mitigate TSV-induced
mechanical stress issues such as delamination and cracks. As of early 2015, TSVs are typically a
few microns in diameter and tens of microns in height [16,40,41,82]. The RC parasitics vary
even more widely depending on the material, process, and geometric parameters. The TSV
resistance is roughly an order of magnitude smaller than that of a back-end-of-line (BEOL) via, while
the capacitance is one or two orders of magnitude larger. The keep-out zone (KOZ) is defined as a
rectangular region surrounding a TSV, where the placement of devices is strictly forbidden. A KOZ
is required to minimize adverse issues such as cracks and timing variations of nearby devices.
Depending on when the TSVs are fabricated, two major TSV types exist: via first and via last,
as illustrated in Figure 9.2a. In the via-first case, TSVs are fabricated before CMOS or BEOL
(back end of line) metallization. The dimensions of via-first TSVs are typically smaller (1–10 μm
diameter), with aspect ratios (height:diameter) of 5:1 to 10:1. Via-last TSVs, on the other hand, are
created after BEOL or bonding, essentially when the wafer is finished. In this case, the processing
can be done at the foundry or the packaging house. The via-last TSV diameter is typically wider
(10–50 μm), with aspect ratios of [Link].
An important benefit of TSV-based 3D ICs is wirelength reduction, which leads to power and
delay savings [6,32,48,63,77]. However, the sheer size of TSVs is identified as a major impediment
to their wider use. According to the 2013 ITRS [26], the TSV diameter is projected to
range from 1.5 to 1.0 μm between 2009 and 2015. Over the same period, the area of a four-transistor
NAND gate is projected to range from 0.82 to 0.20 μm². Taking the TSV area as the square of its
diameter, the area ratio between TSVs and logic gates is projected to increase from
2.74 (= 2.25/0.82) to 5 (= 1.0/0.20). This TSV-to-gate-size ratio becomes even larger if the KOZ is
considered. This area overhead, which physical design must work to minimize, directly influences
the achievable performance-power trade-off curves of 3D ICs. Moreover, it was demonstrated
in [37] that TSVs at the full-chip level can easily occupy 20%–30% of the die area even after
careful partitioning. These concerns call for EDA tools that carefully consider the impact of
TSV size during the partitioning, placement, routing, and clock tree synthesis stages of physical
implementation. Tools are also needed to capture the RC parasitics of these large TSVs and their
impact on circuit power and performance.
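
The area-ratio arithmetic above can be reproduced with a short sketch. It assumes, as the quoted figures imply, that the TSV footprint is the square of its diameter. The function name and the `koz_um` parameter are illustrative, and the 0.5 μm KOZ in the last call is a hypothetical value chosen only to show the trend, not a number from the ITRS.

```python
def tsv_to_gate_area_ratio(tsv_diameter_um, gate_area_um2, koz_um=0.0):
    """Ratio of TSV footprint to gate area, modelling the TSV as a square.

    koz_um is an assumed keep-out margin added on every side of the TSV.
    """
    side = tsv_diameter_um + 2.0 * koz_um
    return (side * side) / gate_area_um2


# Diameters and NAND gate areas quoted in the text for 2009 and 2015.
ratio_2009 = tsv_to_gate_area_ratio(1.5, 0.82)
ratio_2015 = tsv_to_gate_area_ratio(1.0, 0.20)

# Hypothetical 0.5 um keep-out margin, for illustration only.
ratio_2015_koz = tsv_to_gate_area_ratio(1.0, 0.20, koz_um=0.5)

print(f"2009: {ratio_2009:.2f}  2015: {ratio_2015:.2f}  2015 with KOZ: {ratio_2015_koz:.2f}")
```

Even a modest keep-out margin multiplies the ratio, which is why the KOZ matters as much as the TSV diameter itself when estimating area overhead.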

9.2.1 THROUGH-SILICON VIA (TSV) MANUFACTURING

There are two main technologies for manufacturing TSVs: dry etching (Bosch etching) and laser
drilling [48]. Laser drilling is faster and cheaper but cannot produce TSVs with small diameter
or high aspect ratio. Thus, laser drilling is preferred for via-last TSVs, while the Bosch etching
is more frequently used for via-first TSVs [48]. Polysilicon, copper, and tungsten are the most
popular materials for TSV fill. Silicon dioxide is a popular material for the liner that sits between
the TSV and the silicon substrate for insulation purposes.

[Figure 9.2 labels: face, back, metal, device, substrate, via-first TSV, via-last TSV, TSV liner,
landing pad.]

FIGURE 9.2 (a) TSV type and landing pads (shown with face-to-back bonding), (b) die bonding
styles (shown with via-first TSV). (Image from Lim, S.K., TSV-aware 3D physical design tool needs for
faster mainstream acceptance of 3D ICs, in ACM DAC Knowledge Center, 2010.)

After the formation of the metal-filled holes, the chips are thinned and bonded together.
Thinning is done by grinding, chemical mechanical planarization (CMP), or a wet chemical
process. A silicon or glass carrier is typically used here, where the wafer is turned upside down
and temporarily bonded to the carrier. Depending on which sides of the dies are bonded together,
there are three bonding styles, namely, face-to-face, face-to-back, and back-to-back, as
illustrated in Figure 9.2b. Some of the popular bonding technologies [41] include oxide fusion,
metal-to-metal, copper-to-copper, micro-bumping, and polymer adhesive bonding. Note that
face-to-face bonding does not need TSVs for the die-to-die connections because the interconnect
between the dies is established by using metal layers only.
From the perspective of physical design, via-first TSVs are less intrusive because they interfere
only with the device, M1, and top layers, whereas via-last TSVs interfere with all layers in the
die, as illustrated in Figure 9.2a. Via-first TSVs have their landing pads on M1 and the top metal
layers, whereas via-last TSVs have their landing pads only on the top metal layers. A landing pad
includes keep-out zones around it to reduce electrical coupling and mechanical damage to nearby
devices and interconnects. The connections between via-first TSVs are made using local
interconnect and vias between adjacent dies, whereas via-last TSVs are stacked on top of each other,
as illustrated in Figure 9.2a. Therefore, via-first TSVs are normally used for signal and clock
delivery, whereas power delivery networks utilize via-last TSVs.

9.2.2 3D IC TOOL AVAILABILITY AS OF EARLY 2015

Ansys [2] offers the following modeling and simulation tools for 3D IC:

◾◾ RedHawk: Simulator for simultaneous switching noise, decoupling capacitance, and
on-chip and off-chip inductance for 3D IC.
◾◾ Sentinel-TI: Thermal simulation and mechanical stress integrity analysis platform for
stacked die/3D IC designs.

Atrenta [5] offers SpyGlass Physical 2.5D/3D that provides early estimates of area, timing,
and routability for RTL designers without the need for physical design expertise or tools for
2.5D/3D IC. It provides valuable physical reports and rules to identify area, congestion, and
timing issues at the early stages of the 3D IC design.
Cadence [10] offers

◾◾ Encounter: 3D IC physical design tool (placement, optimization, routing) for custom and
digital designs
◾◾ QRC Extraction: 3D IC verification and analysis tool
◾◾ Encounter DFT Architect: Design-for-test tool for 3D ICs
◾◾ Design IP for wide-I/O controller
◾◾ SiP (System-in-Package) Co-design: IC/Package co-design tool

Mentor Graphics [50] offers

◾◾ Calibre: 3D IC physical verification, extraction, LVS, and DFM for 3D IC products: SiP,
silicon interposers, or stacked die with TSVs
◾◾ Tessent: Deterministic scan testing, embedded pattern compression, built-in self test,
specialized embedded memory test and repair, and boundary scan tool for 2.5D and 3D ICs

Synopsys [72] offers

◾◾ DFTMAX Test Automation: Design-for-Test tool for stacked die and TSV
◾◾ DesignWare STAR Memory System IP: Integrated memory test, diagnostic and repair
solution
◾◾ IC Compiler: 3D IC place-and-route tool, including TSV, microbump, silicon interposer
redistribution layer (RDL) and signal routing, power mesh creation, and interconnect checks

◾◾ StarRC Ultra: Parasitic extraction tool with a support for TSV, microbump, interposer
RDL, and signal routing metal
◾◾ HSPICE and CustomSim: Multi-die interconnect simulation and analysis tool
◾◾ PrimeRail: IR drop and EM analysis tool for 3D IC
◾◾ IC Validator: DRC for microbumps and TSVs, LVS connectivity checking between
stacked die
◾◾ Galaxy Custom Designer: Custom editor for silicon interposer RDL, signal routing, and
power mesh
◾◾ Sentaurus Interconnect: Thermo-mechanical stress analyzer to evaluate the impact of
TSVs and microbumps used in multi-die stacks

Xilinx offers 3D IC EDA tools to the customers of its 3D FPGA devices [81].
Among the other vendors, 3D IC layout editors are offered by Micro Magic [51] and R3
Logic [64], while 3DInCites [1] offers an up-to-date list of 3D EDA activities.

9.3 TSV PLACEMENT IN BLOCK-LEVEL 3D ICs


In general, block-level designs offer various advantages over designs done at lower levels of
granularity, such as gate level, because they promote the reuse of existing hard IP blocks. The same
philosophy applies to 3D ICs, where the IPs can be assembled into multiple tiers and connected
with intra- and/or inter-tier vias and interconnects. In addition, chip-scale IPs such as a
multi-core design or an entire L2 cache can be easily stacked and assembled in 3D ICs. A major physical
design challenge in block-level 3D IC designs is TSV placement. In this section, we review
trade-offs among various ways to place TSVs in block-level designs, where TSVs are placed between the
blocks or form small and/or large arrays at strategic locations. We discuss three practical options
studied in [4], namely, TSV farm, TSV distributed, and TSV whitespace. Other options published
in the literature include [39,74]. Depending on the location of the TSVs in the
bottom die, a redistribution layer (RDL) may become necessary on the backside of the bottom tier
to connect the two dies, as shown in Figure 9.3.
Among several possible configurations, we focus on a two-tier 3D IC, where the bottom die
has a larger footprint. A typical example of such a stacking includes a hybrid memory cube [73]
(= 3D DRAM) stacked on top of a multi-core processor. Both dies are facing down so that the heat
sink is located above the backside (= bulk) of the top die, and C4 bumps are below the frontside
(= top metal layer) of the bottom die. This stacking allows for better power delivery and
potentially better cooling if the top die consumes more power. We further assume that the design of
the top die is fixed so that we focus on the block-level design of the bottom die.

9.3.1 TSV PLACEMENT STYLES

When face-to-back bonding is utilized between two dies with different die sizes, redistribution-layer
(RDL) routing on the backside of the bottom die is required in some cases. If some TSVs in
the bottom die are outside the footprint area of the top die, RDL routing is necessary to connect
the TSVs to the bonding pads of the top die, as illustrated in Figure 9.3a. But if all TSVs inserted in
the bottom die are inside the footprint area of the top die as shown in Figure 9.3b, the TSVs in the
bottom die can be directly bonded to the bonding pads in the top die, without any RDL routing.

FIGURE 9.3 A side view of a 3D IC. (a) With a redistribution layer (RDL), (b) without an RDL.
(Image from Athikulwongse, K. et al., Block-level designs of die-to-wafer bonded 3D ICs and their
design quality tradeoffs, in Proceedings of Asia and South Pacific Design Automation Conference, 2013.)

Although the RDL allows connections between TSV landing pads on the backside of the
bottom die and the bonding pads in the top die, it causes several negative effects. First of all,
typical wires on the RDL are wide, possibly as wide as the wires on the topmost metal layers.
Thus, their parasitic capacitance is much higher than that of local metal wires, which causes timing
degradation and dynamic power overhead. In addition, the large minimum pitch between adjacent
wires in the RDL limits the minimum TSV pitch in a TSV array. For example, if four TSVs are
placed in a 2 × 2 array, every TSV lies on the array boundary, so they can be placed as close to each
other as possible. However, if 25 TSVs are placed in a 5 × 5 array, the TSV in the center cannot be
reached by escape routing unless the TSV pitch is several times larger than the minimum pitch,
because its RDL wire must pass between the surrounding TSVs.
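
The escape-routing constraint can be illustrated with a simple counting model. This is a minimal sketch under stated assumptions, not the method used in [4]: a single RDL routing layer, an n × n array, and wires from inner TSVs spread perfectly evenly over the gaps of the outer ring, which makes the result an optimistic lower bound. The function names and the 5 μm pad, wire, and spacing values are hypothetical.

```python
import math


def tracks_per_gap(n):
    """Lower bound on RDL wires passing through each gap of the outer ring."""
    if n <= 2:
        return 0  # every TSV already sits on the array boundary
    inner = (n - 2) ** 2   # TSVs that are not on the outer ring
    gaps = 4 * (n - 1)     # gaps between neighbouring TSVs on the outer ring
    return math.ceil(inner / gaps)


def min_array_pitch(n, pad_um, wire_um, space_um):
    """Smallest TSV pitch that leaves room for the escaping wires.

    The gap between two pads must hold t wires and t + 1 spacings.
    """
    t = tracks_per_gap(n)
    return pad_um + space_um + t * (wire_um + space_um)


# Hypothetical RDL rules: 5 um pads, 5 um wires, 5 um spacing.
for n in (2, 5, 10):
    print(n, tracks_per_gap(n), min_array_pitch(n, 5.0, 5.0, 5.0))
```

With these assumed rules, the 2 × 2 array keeps the minimum pitch, whereas larger arrays need progressively wider pitches as more wires must cross each gap. Wide RDL wires make this penalty grow quickly.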
Our discussion distinguishes two options that are available for the design of 3D ICs with
different die sizes: (1) insert all TSVs inside the footprint area of the top die so that RDL routing is
not required or (2) insert TSVs wherever they are needed and perform RDL routing. The former
limits TSV locations but does not require RDL wires. The latter provides a higher degree of
freedom on TSV locations but requires RDL wires. In addition, different TSV insertion styles lead to
very different layout qualities. We study three different design styles: TSV farm (without RDLs),
TSV distributed (with RDLs and regularly placed TSVs), and TSV whitespace (with RDLs and
irregularly placed TSVs), as shown in Figure 9.4.

FIGURE 9.4 TSV placement styles in the block-level 3D IC. (a) TSV-farm, (b) TSV-distributed,
(c) TSV-whitespace styles. TSVs are shown in white. (Image from Athikulwongse, K. et al., Block-level
designs of die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of Asia and
South Pacific Design Automation Conference, 2013.)

9.3.2 A BLOCK-LEVEL 3D IC DESIGN FLOW

In [4], TSV insertion and floorplanning are performed in the bottom die as follows: in the
TSV-farm and the TSV-distributed styles, TSVs are pre-placed in arrays and treated as obstacles
during floorplanning. In the TSV-farm style, an array of TSVs is placed in the middle of the
bottom die. In the TSV-distributed style, on the other hand, TSVs are placed all over the bottom
die. Therefore, some of the TSVs are placed outside the footprint area of the top die. After
the TSVs are pre-placed, the blocks are floorplanned either manually or automatically with an
obstacle-aware floorplanner. The following factors are considered for each style:

◾◾ TSV farm: Since functional blocks and TSVs should not overlap, the blocks are placed
around the TSV farm. Since the TSV farm area is usually large and occupies a prime
location in the chip footprint, this style may cause significant wirelength overhead if the
blocks are highly connected. On the other hand, if the inter-block connectivity is not
high, the TSV farm in the center does not cause a significant wirelength overhead.
◾◾ TSV distributed: In this style, TSVs may not cause a significant wirelength overhead.
This is because TSVs are grouped in small arrays unlike the one large array in the
TSV-farm style. However, some large blocks may have very few locations available for
their placement because they cannot be placed in the space between adjacent TSV
arrays. This design restriction may degrade wirelength, timing, and power. However,
the TSV-distributed style promotes low operating IC temperature and low TSV stress
because of the even distribution of TSVs.
◾◾ TSV whitespace: In this style, a 3D floorplanner is used first to obtain TSV-whitespace
style layouts. After floorplanning, TSVs are manually inserted into the whitespace
between blocks. Therefore, TSVs are placed in irregular positions, unlike the other two
styles. When there is not enough whitespace, the floorplan is perturbed by shifting
blocks to create or expand whitespace. Since a 3D floorplanner is invoked without any
restrictions imposed by TSVs, this style is expected to optimize the traditional objectives
such as power, performance, and area (PPA) better than other design styles.
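
To make the two pre-placed styles concrete, the sketch below generates TSV center coordinates for a single central array (TSV farm) and for small arrays spread evenly over the die (TSV distributed). It is only an illustration: the 1660 μm square die, the 13 × 6 grid of 2 × 2 arrays, and all function names are assumptions, while the TSV count of 312 and the 30 μm pitch are borrowed from the experiments reported later in this section.

```python
import math


def tsv_array(cx, cy, rows, cols, pitch):
    """Return TSV centers of a rows x cols array centered at (cx, cy)."""
    x0 = cx - (cols - 1) * pitch / 2.0
    y0 = cy - (rows - 1) * pitch / 2.0
    return [(x0 + c * pitch, y0 + r * pitch)
            for r in range(rows) for c in range(cols)]


def tsv_farm(die_w, die_h, n_tsv, pitch):
    """One near-square array in the middle of the die (TSV-farm style)."""
    cols = math.ceil(math.sqrt(n_tsv))
    rows = math.ceil(n_tsv / cols)
    return tsv_array(die_w / 2.0, die_h / 2.0, rows, cols, pitch)[:n_tsv]


def tsv_distributed(die_w, die_h, groups_x, groups_y, sub, pitch):
    """groups_x x groups_y small sub x sub arrays spread evenly over the die."""
    points = []
    for gy in range(groups_y):
        for gx in range(groups_x):
            cx = (gx + 0.5) * die_w / groups_x
            cy = (gy + 0.5) * die_h / groups_y
            points.extend(tsv_array(cx, cy, sub, sub, pitch))
    return points


def x_span(points):
    """Horizontal extent covered by a set of TSV centers."""
    xs = [x for x, _ in points]
    return max(xs) - min(xs)


DIE = 1660.0  # um, hypothetical square bottom die
farm = tsv_farm(DIE, DIE, 312, 30.0)
dist = tsv_distributed(DIE, DIE, 13, 6, 2, 30.0)  # 13 * 6 * 4 = 312 TSVs

print(len(farm), x_span(farm))
print(len(dist), x_span(dist))
```

The farm confines all TSVs to a region roughly 0.5 mm wide in the die center, while the distributed style spreads the same number of TSVs across nearly the full die width. Either list can be handed to a floorplanner as a set of obstacles.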

Another noteworthy work in TSV management for block-level 3D ICs is by [39]. The authors
propose two styles, namely, legacy 2D and TSV islands. In the legacy 2D style, functional blocks
are first floorplanned, and then TSVs are inserted in the whitespace for inter-block connection.
This style resembles the TSV-whitespace case explained earlier, where the location of TSVs is
irregular and TSVs do not tend to form groups. In the TSV island style, TSVs are grouped to form
small islands. Unlike the TSV-distributed case earlier, however, the location of these islands is
irregular. The authors presented a net clustering approach to group TSVs into islands while not
degrading the initial 3D floorplan quality.

9.3.3 A 3D IC DESIGN EVALUATION METHODOLOGY

A timing and power analysis flow for 3D IC [34] is shown in Figure 9.5. First, parasitic resistance
and capacitance of each die are extracted using, for example, Cadence QRC Extraction. Since the
face-to-back die bonding style is assumed, the capacitive coupling between the bottom and the
top dies is negligible.* The parasitic resistance and capacitance of the redistribution layer (RDL)
are extracted next.
For 3D static timing analysis, the top and the bottom dies are represented as modules in
a top-level Verilog file. A top-level SPEF file is also created. It includes not only the parasitic
resistance and capacitance of both dies but also resistance and capacitance of TSVs and the RDL
wires. For an accurate power analysis, the switching activity of all logic cells is obtained by a
functional simulation of the whole chip. Synopsys PrimeTime is used to perform static timing and
power analysis, using the combined SPEF file that contains parasitics in both dies.

* If we use face-to-face bonding, this inter-die coupling must be extracted. Currently, such a tool does not exist.

[Figure 9.5 flow labels: Top-die DEF/GDSII and Bot-die DEF/GDSII (inputs); SoC Encounter; Top-level
RTL, Top-die RTL, Bot-die RTL; Top-die RC, Bot-die RC, Top-level TSV RC; design switching activity;
PrimeTime PX; timing and power of 3D ICs (output).]

FIGURE 9.5 A timing and power analysis flow for 3D ICs. (Image from Athikulwongse, K. et al.,
Block-level designs of die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of
Asia and South Pacific Design Automation Conference, 2013.)

The thermal analysis flow for 3D ICs is shown in Figure 9.6. This flow is built on a commercial
tool, namely, Ansys FLUENT, and enhanced with custom plug-ins. First, a meshed
structure is created, where each thermal tile contains material composition information, such
as copper and dielectric density in the tile. This information is extracted from GDSII layout files,
which include logic cells as well as TSVs. These files together with the power dissipation of each
logic cell are supplied to the layout analyzer. The layout information of a tile consists of the total
power dissipated in the tile and the thermal conductivity computed from the materials inside, such
as polysilicon used for transistor gates, tungsten used for vias, copper used for TSVs, and dielectric
material. With a sufficiently small thermal tile size, the equivalent thermal conductivity can
be computed based on a thermal resistive model [80]. Once the thermal equations are built,
FLUENT solves them to obtain temperature values at all thermal tiles.
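
As an illustration of how a per-tile equivalent conductivity can follow from material composition, the sketch below uses the two elementary resistive combinations: materials side by side along the heat path (parallel) and materials stacked across it (series). This is one common form of a thermal resistive model and is not necessarily the formulation of [80]. The conductivities are approximate room-temperature textbook values, and the volume fractions are hypothetical.

```python
def k_parallel(parts):
    """Materials side by side along the heat path: fraction-weighted mean."""
    return sum(frac * k for frac, k in parts)


def k_series(parts):
    """Materials stacked across the heat path: fraction-weighted harmonic mean."""
    return 1.0 / sum(frac / k for frac, k in parts)


# Approximate conductivities in W/(m*K); ballpark values, not from the chapter.
K_CU, K_SI, K_SIO2 = 400.0, 150.0, 1.4

# Vertical conduction through a tile that is 10% copper TSV and 90% silicon.
k_vertical_tsv_tile = k_parallel([(0.10, K_CU), (0.90, K_SI)])

# Vertical conduction through 90% silicon stacked on 10% dielectric.
k_vertical_layered = k_series([(0.90, K_SI), (0.10, K_SIO2)])

print(f"{k_vertical_tsv_tile:.1f} W/(m*K), {k_vertical_layered:.1f} W/(m*K)")
```

The contrast between the two results shows why TSVs act as vertical heat paths and why even a thin dielectric layer in series dominates the thermal resistance of a tile.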

[Figure 9.6 flow labels: Top-die DEF/GDSII and Bot-die DEF/GDSII, TSV position, and logic cell power
(inputs); layout and material property analyzer; meshed structure, thermal conductivity, and volumetric
heat source; user-defined functions and boundary conditions; Ansys FLUENT; 3D IC thermal maps (output).]

FIGURE 9.6 A GDSII layout-level thermal analysis flow for 3D ICs. (Image from Athikulwongse, K. et al.,
Block-level designs of die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of
Asia and South Pacific Design Automation Conference, 2013.)

The mechanical stress of a 3D IC layout can be analyzed using the stress analyzer obtained
from [29]. The inputs to the analyzer are die size, TSV diameter, TSV locations, simulation grid
density, and pre-computed data of TSV stress tensor. The analyzer outputs a von Mises stress
map [18], which is a widely used mechanical reliability diagnostic. The computation of stress at a
point affected by multiple TSVs is based on the principle of linear superposition of stress tensors.
With stress tensors obtained from finite element analysis (FEA) using a commercial tool such as
ABAQUS FEA, we can perform a full-chip stress analysis.
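
The superposition step and the von Mises metric can be sketched as follows. The von Mises expression is the standard one for a symmetric stress tensor. The tensor values are made-up placeholders standing in for the pre-computed FEA data, and the function names are illustrative rather than taken from the analyzer of [29]. All contributions must be expressed in the same Cartesian frame before they are added.

```python
import math


def superpose(tensors):
    """Linear superposition: component-wise sum of symmetric stress tensors.

    Each tensor is (xx, yy, zz, xy, yz, zx) in one shared Cartesian frame.
    """
    return tuple(sum(t[i] for t in tensors) for i in range(6))


def von_mises(stress):
    """Von Mises stress of a symmetric tensor (xx, yy, zz, xy, yz, zx)."""
    xx, yy, zz, xy, yz, zx = stress
    normal = (xx - yy) ** 2 + (yy - zz) ** 2 + (zz - xx) ** 2
    shear = xy ** 2 + yz ** 2 + zx ** 2
    return math.sqrt(0.5 * normal + 3.0 * shear)


# Placeholder contributions (MPa) of two nearby TSVs at one grid point.
from_tsv_a = (100.0, 0.0, 0.0, 0.0, 0.0, 0.0)
from_tsv_b = (0.0, 100.0, 0.0, 0.0, 0.0, 0.0)

total = superpose([from_tsv_a, from_tsv_b])
print(total, von_mises(total))
```

Repeating this at every point of the simulation grid, with each TSV contributing only within its range of influence, yields the full-chip von Mises map.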

9.3.4 SIMULATION RESULTS

In this simulation, we use a 45 nm technology [55]. An open-source hardware IP core [56]
is synthesized using an open cell library [53]. We assume a high thermal conductivity molding
compound [24]. The total number of gates in the benchmark design is 1,363,536. The total
numbers of functional blocks, inter-block nets, and TSVs used are 69, 1853, and 312, respectively.
The same partitioning and thus the same number of TSVs are used in all three TSV placement
styles for a fair comparison. The TSV size is 10 μm, and the TSV pitch is 30 μm. The TSV parasitic
capacitance and resistance are 50 fF and 50 mΩ, respectively. An RDL wire width and spacing of
0.4 μm are used in the experiments.
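
A back-of-the-envelope sketch helps put the TSV parasitics into perspective. It uses the 50 fF, 50 mΩ, and 312-TSV figures above together with the standard dynamic power expression P = α·C·V²·f. The 1.1 V supply and the 0.1 switching activity are assumptions made for illustration, and the 800 MHz clock is simply the reciprocal of the 1.25 ns target delay used later in this section. Only the TSVs' own capacitance is counted, not the wires or gates attached to them.

```python
TSV_R_OHM = 50e-3   # TSV resistance from the text
TSV_C_F = 50e-15    # TSV capacitance from the text
N_TSV = 312         # number of TSVs from the text


def tsv_switching_power(c_f, vdd_v, freq_hz, activity, n_tsv):
    """Dynamic power spent charging the TSV capacitance alone, in watts."""
    return activity * c_f * vdd_v ** 2 * freq_hz * n_tsv


# Intrinsic RC time constant of a single TSV, in seconds.
rc_s = TSV_R_OHM * TSV_C_F

# Assumed operating point: 1.1 V, 800 MHz, 10% switching activity.
p_tsv_w = tsv_switching_power(TSV_C_F, 1.1, 800e6, 0.1, N_TSV)

print(f"RC = {rc_s * 1e15:.1f} fs, TSV switching power = {p_tsv_w * 1e3:.2f} mW")
```

Under these assumptions the intrinsic RC of a TSV is a few femtoseconds, so the TSV matters mainly as a capacitive load on its driver rather than as a resistive delay element.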

9.3.4.1 WIRELENGTH AND TIMING RESULTS

The silicon area, footprint, and block-to-block (B2B) and RDL wirelength of the three styles
are shown in Table 9.1. The same area and footprint are used for all three styles: 3.979 and
2.766 mm², respectively. The TSV-farm style shows the shortest wirelength because all the
TSVs occupy only one area in the middle of the die, confining the obstruction of an optimal
block placement to a small area. The TSV-distributed style shows the longest block-to-block
wirelength (27% longer than the TSV-farm style) because the TSV arrays distributed all over
the die obstruct an optimal block placement. The TSV-whitespace style shows a slightly longer
wirelength (2%) than the TSV-farm style because we start from an optimal block placement
and move blocks only when it is necessary to insert TSVs in some positions. Most TSVs are
inserted in the original whitespace and do not interfere with the placement of the blocks very
much. In addition, the TSV-distributed and the TSV-whitespace styles require RDL routing,
as shown in Figure 9.7.
The longest path delay (LPD), without and with timing optimization, is also shown in
Table 9.1. The timing optimization proposed in [44] is used with a target delay of 1.25 ns.
Without timing optimization, none of the designs meets the target delay; however, the TSV-farm
style shows the shortest delay. With timing optimization, all designs come close to the target delay,
and the delay of the TSV-farm style is still the shortest. The delay of the TSV-distributed and the
TSV-whitespace styles is longer than that of the TSV-farm style by 10% and 15%, respectively.
Because of their long wirelength, both the TSV-distributed and the TSV-whitespace styles are hard
to optimize. In addition, no buffer can be added along the RDL routing because the routing
is on the backside of the bottom die.

TABLE 9.1 Comparison of Wirelength and Longest Path Delay (LPD) with or without
Timing Optimization

                   Wirelength (m)                 LPD (ns)
Design Style       Block-to-Block     RDL         w/o opt.    w/ opt.
TSV farm           1.447 (100.00%)    -           3.136       1.293 (100.00%)
TSV distributed    1.842 (+27.30%)    0.170       4.252       1.425 (+10.20%)
TSV whitespace     1.483 (+2.46%)     0.176       4.568       1.492 (+15.38%)

FIGURE 9.7 A redistribution layer (RDL) routing for the TSV-whitespace style, where the two dies
are bonded using a wide-I/O interface. (Image from Athikulwongse, K. et al., Block-level designs of
die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of Asia and South Pacific
Design Automation Conference, 2013.)

POWER CONSUMPTION, THERMAL, AND STRESS RESULTS

We now compare power consumption, temperature, and mechanical stress among the three TSV
placement styles when they operate at their maximum frequency, as shown in Table 9.1.*
First, the total power consumption is shown in Table 9.2. We observe that the TSV-distributed
and the TSV-whitespace styles consume 6% and 10% less power than the TSV-farm style, respectively.
The maximum, minimum, and average temperatures are also shown in Table 9.2. Although the
minimum and average temperatures of the three designs are close, their maximum
temperatures differ. The TSV-distributed style shows the lowest maximum temperature,
not because it consumes less power (a result of its relatively low speed) but primarily
because TSVs distributed all over the die help conduct heat. The TSV-farm style shows
a high maximum temperature because TSVs in the center of the die cannot help
conduct heat from high-power blocks far from them. For the same reason, the TSV-whitespace
style also shows a high maximum temperature even though it consumes the least power. The thermal
profiles of the TSV-farm, TSV-distributed, and TSV-whitespace styles, computed at the
maximum operating speed, are shown in Figure 9.8. We see that TSVs help reduce temperature,
and the local cool spots correspond to TSV array locations. The TSV-distributed style shows the
lowest maximum temperature because TSVs are distributed across the die. The TSV-whitespace
style exhibits the highest temperature because high-power blocks can be far from TSVs.

TABLE 9.2 Comparison of Power Consumption and Temperature

Design Style       Ptotal (mW)        Tmax (°C)   Tmin (°C)   Tave (°C)
TSV farm           1183 (100.00%)     76.87       38.04       47.56
TSV distributed    1107 (−6.40%)      62.43       39.15       46.28
TSV whitespace     1065 (−9.99%)      77.04       38.65       46.19

* Note that it is also possible to conduct simulations under the same clock frequency and compare power and thermal
qualities. This iso-performance comparison—although meaningful—is beyond the scope of this chapter.
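Because each style is evaluated at its own maximum frequency, lower total power partly reflects a slower clock. The sketch below makes that visible by combining the optimized LPD from Table 9.1 with the total power from Table 9.2. It assumes, for illustration only, that the clock period equals the LPD (ignoring setup margin and clock skew); the function names are hypothetical, and this is not the iso-performance simulation mentioned in the footnote.

```python
# Optimized LPD (ns) from Table 9.1 and total power (mW) from Table 9.2.
designs = {
    "TSV farm": {"lpd_ns": 1.293, "power_mw": 1183.0},
    "TSV distributed": {"lpd_ns": 1.425, "power_mw": 1107.0},
    "TSV whitespace": {"lpd_ns": 1.492, "power_mw": 1065.0},
}


def max_frequency_mhz(lpd_ns):
    """Assumed maximum clock frequency: the reciprocal of the longest path delay."""
    return 1000.0 / lpd_ns


def energy_per_cycle_pj(power_mw, lpd_ns):
    """Energy spent in one clock period; mW times ns gives pJ."""
    return power_mw * lpd_ns


summary = {
    style: (max_frequency_mhz(d["lpd_ns"]), energy_per_cycle_pj(d["power_mw"], d["lpd_ns"]))
    for style, d in designs.items()
}

for style, (f_mhz, e_pj) in summary.items():
    print(f"{style:16s} f_max = {f_mhz:6.1f} MHz   energy/cycle = {e_pj:7.1f} pJ")
```

Under this simplified view, the TSV-farm style has the highest clock frequency and the lowest energy per cycle of the three, which is consistent with the remark that the lower power of the other two styles results from their lower speed.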


FIGURE 9.8 Temperature maps for (a) TSV-farm, (b) TSV-distributed, and (c) TSV-whitespace styles.
The actual values are reported in Table 9.2. (Image from Athikulwongse, K. et al., Block-level designs of
die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of Asia and South Pacific
Design Automation Conference, 2013.)

The maximum and average stress values are shown in Table 9.3. The area with stress higher
than 10 MPa (megapascals) is also shown in the table. Despite its high TSV density, the TSV-farm
style shows the lowest maximum stress among the designs. This is primarily due to a phenomenon
called destructive interference of stress, where some vertical and horizontal stress vectors
cancel each other in a TSV array [28]. The maximum von Mises stress values decrease
because of this interference. The average stress above the 10-MPa threshold on the die, on the
other hand, shows the opposite trend: the TSV-farm style shows the highest average stress. When
many TSVs are packed into a confined area, the impact of interference from non-neighboring
TSVs becomes noticeable. This effect may overwhelm the destructive interference of
stress and accumulate high levels of overall stress. Therefore, the TSV-farm style shows a higher
average stress than the others. Last, the TSV-whitespace style shows the largest area
of stress above the threshold and the TSV-farm style the smallest, mainly because
the area occupied by the TSV arrays and their keep-out zones is the smallest in the TSV-farm
case. The stress profiles of the three styles are shown in Figure 9.9.
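The three columns of Table 9.3 can be read as reductions over a stress map, and destructive interference can be illustrated by adding stress components before taking the von Mises value. The sketch below uses the standard von Mises formula; the grid, the cell area, the per-TSV stress components, and the use of plain linear superposition are all hypothetical inputs chosen for illustration, not data or methods confirmed by the study.

```python
import math


def von_mises(sxx, syy, szz, sxy=0.0, syz=0.0, szx=0.0):
    """Von Mises stress from the six independent stress-tensor components."""
    normal = (sxx - syy) ** 2 + (syy - szz) ** 2 + (szz - sxx) ** 2
    shear = sxy ** 2 + syz ** 2 + szx ** 2
    return math.sqrt(0.5 * normal + 3.0 * shear)


def stress_metrics(stress_map, cell_area_mm2, threshold=10.0):
    """Return (max stress, mean of values above threshold, area above threshold)."""
    values = [v for row in stress_map for v in row]
    above = [v for v in values if v > threshold]
    mean_above = sum(above) / len(above) if above else 0.0
    return max(values), mean_above, len(above) * cell_area_mm2


# Hypothetical normal-stress components (MPa) that two nearby TSVs induce at one point.
tsv_a = (120.0, -80.0, 0.0)
tsv_b = (-90.0, 60.0, 0.0)
combined = tuple(a + b for a, b in zip(tsv_a, tsv_b))  # assumed linear superposition

print(von_mises(*tsv_a), von_mises(*tsv_b), von_mises(*combined))

# Hypothetical 2 x 2 stress map (MPa) with 0.01 mm^2 grid cells.
sigma_max, sigma_ave, area = stress_metrics([[5.0, 12.0], [30.0, 8.0]], 0.01)
```

In this toy case the components of the two TSVs have opposite signs, so the combined von Mises stress is lower than either contribution alone.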

SUMMARY

We explored design trade-offs among three practical TSV placement styles. Because of the
absence of RDL wiring, the TSV-farm style showed the best timing. The design in this style shows
the highest average stress, but the area impacted by stress is the smallest. This means that a high
level of stress is confined to a small area, and thus, the overall reliability could be worse. The TSV-
distributed style showed the worst wirelength because TSV arrays interfere with block placement.
However, it showed the lowest temperature because TSVs distributed across the die help reduce
temperature.
Simulation results shown in this chapter are heavily design and technology dependent.
The wirelength, timing, power, thermal, and stress results can vary significantly from one
design to another based on the following factors: the total number of blocks and their intra-
and inter-die connectivity (= TSV and RDL requirements), TSV and RDL dimension and
keep-out-zone (KOZ) requirements, the device and interconnect technologies, material properties
and their coefficient-of-thermal-expansion (CTE) mismatch, etc. Thus, the TSV-farm versus
TSV-distributed versus TSV-whitespace comparisons discussed in this chapter are to be treated
as case studies. These comparisons illustrate that the EDA tools that handle the design, analysis,
and optimization of 3D ICs under these requirements are the key in choosing the best possible
options for a target application to be implemented in a 3D IC.

TABLE 9.3 Comparison of TSV Mechanical Stress

Design Style       σmax (MPa)         σave, σ>10 (MPa)    Area σ>10 (mm²)
TSV farm           676.78 (100.0%)    150.20 (100.0%)     0.353 (100.00%)
TSV distributed    691.29 (+2.1%)     97.73 (−34.9%)      0.598 (+69.7%)
TSV whitespace     688.99 (+1.8%)     88.95 (−40.8%)      0.695 (+97.2%)

FIGURE 9.9 Stress maps for (a) TSV-farm, (b) TSV-distributed, and (c) TSV-whitespace styles.
The actual values are reported in Table 9.3. (Image from Athikulwongse, K. et al., Block-level
designs of die-to-wafer bonded 3D ICs and their design quality tradeoffs, in Proceedings of Asia and
South Pacific Design Automation Conference, 2013.)

9.3.5 NEEDS FOR EDA TOOL DEVELOPMENT

We suggest the following requirements for developers of TSV placement algorithms and
tools for block-level 3D IC designs. First, it is crucial to understand the power, performance, area
(PPA), and multi-physics (= electro-thermo-mechanical) reliability trade-offs among different
design styles. While this chapter explores three styles, namely, TSV-farm, TSV-distributed,
and TSV-whitespace, additional design styles are possible [39]: TSVs can be placed along the
periphery or at some custom locations specified by the designers. The PPA and multi-physics
reliability qualities can differ significantly among these options, and the tool needs to offer
accurate assessment of these TSV placement solutions.
Second, block-level design for 3D ICs, that is, 3D floorplanning, will still be performed manually
for small- and medium-size designs. In this case, PPA and reliability analysis will be the only tool
required, where accuracy and runtime will be the key objectives. In case an automatic floorplanner
is desired to handle very large floorplanning problems, the tool must offer either (1) comparable
quality solutions at a fraction of the runtime or (2) better quality solutions without requiring a
prohibitive runtime, both compared with manual floorplanning. In either case, the key challenge
is that the optimization engine used in the floorplanner, for example, analytical methods [21,87],
simulated annealing [14,17], genetic algorithms, or machine learning, needs to effectively and
efficiently search the solution space and evaluate the PPA and reliability of each candidate
solution quickly but accurately.
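To make the role of the optimization engine concrete, here is a minimal simulated-annealing sketch on a toy tier-assignment problem. It is not the algorithm of [14] or [17]: the block areas, nets, cost weights, and cooling schedule are all hypothetical, and the cost function merely stands in for the much more expensive PPA and reliability evaluation described above.

```python
import math
import random


def anneal(initial, cost, neighbor, t_start=1.0, t_end=1e-3, alpha=0.95,
           moves_per_t=50, seed=0):
    """Generic simulated-annealing loop; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept uphill moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling
    return best, best_cost


# Toy problem: assign six blocks to two tiers (0 or 1). All numbers are hypothetical.
AREAS = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]
NETS = [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5), (2, 3)]  # two-pin inter-block nets


def cost(tiers):
    tsvs = sum(1 for a, b in NETS if tiers[a] != tiers[b])  # each cut net needs a TSV
    area = [sum(AREAS[i] for i, t in enumerate(tiers) if t == k) for k in (0, 1)]
    return tsvs + 0.5 * abs(area[0] - area[1])  # hypothetical weighting


def neighbor(tiers, rng):
    """Move one randomly chosen block to the other tier."""
    i = rng.randrange(len(tiers))
    return tiers[:i] + (1 - tiers[i],) + tiers[i + 1:]


initial = (0,) * len(AREAS)
best, best_cost = anneal(initial, cost, neighbor)
print(best, best_cost)
```

In a real floorplanner the call to `cost` dominates runtime, which is why the text stresses fast yet accurate evaluation of each candidate.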
Third, the floorplanner must be able to handle various technological options available in
3D IC, including the TSV geometries (size, keep-out-zone [KOZ] requirement, etc.), bonding
styles (face-to-face, face-to-back, back-to-back), and multi-physics properties of the chip
elements. These parameters may be fixed in the early design stage, or the choice may be left
to the designers. In the latter case, the additional optimization dimensions that must be
explored (for example, 1 μm versus 2 μm KOZ, face-to-face versus face-to-back bonding, and
silicon dioxide versus benzocyclobutene liner for TSVs) will further complicate
the overall floorplanning process. Algorithms and tools developed for this purpose will
need to accurately capture the impact of these choices on full-chip PPA and reliability while
searching for optimum solutions.
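The growth of the search space with each technology option can be seen by enumerating the combinations named above. This is a small sketch; the dictionary keys are illustrative labels rather than parameters of any particular tool.

```python
from itertools import product

# Option values named in the text; the key names are hypothetical.
options = {
    "koz_um": [1, 2],
    "bonding": ["face-to-face", "face-to-back", "back-to-back"],
    "tsv_liner": ["silicon dioxide", "benzocyclobutene"],
}

# One dictionary per technology configuration.
configs = [dict(zip(options, combo)) for combo in product(*options.values())]

print(len(configs))  # 2 * 3 * 2 = 12 configurations
for cfg in configs[:3]:
    print(cfg)
```

Each configuration would require its own floorplanning and analysis run, so every added option multiplies the work.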


9.4 LOW-POWER 3D IC DESIGN WITH BLOCK FOLDING

We review the 3D block folding methods described in [32] that are developed to reduce power
consumption in 3D ICs on top of the traditional 3D floorplanning. This study is based on the
OpenSPARC T2 (an 8-core 64-bit SPARC SoC) design database and Synopsys 28 nm process
design kit (PDK) with nine metal layers that are both available to the academic community.*
We first discuss how to build, analyze, and optimize GDSII-level 2D and two-tier 3D layouts
using industry EDA tools and enhancements. Based on this design environment, we study how
to rearrange blocks into 3D to reduce power. Next, we explore block folding methods, that
is, partitioning a block into two subblocks and bonding them to achieve power savings in the 3D
design. We employ a mixed-size 3D placer for block folding. Last, we demonstrate system-level
3D power benefits by assembling folded blocks.

9.4.1 TARGET BENCHMARK AND 3D IC DESIGN FLOW

The OpenSPARC T2, an open-source commercial microprocessor from Sun Microsystems with
500 million transistors, consists of 53 blocks including eight SPARC cores (SPC), eight L2-cache
data banks (L2D), eight L2-cache tags (L2T), eight L2-cache miss buffers (L2B), and a cache
crossbar (CCX). Each block is synthesized with 28 nm cell and memory macro libraries. For the
2D design, we follow the original T2 floorplan [54] as much as possible, as shown in Figure 9.10.
In addition, special care is taken to optimize both connectivity and data flow between blocks to
reduce inter-block wirelength.

FIGURE 9.10 GDSII layouts of OpenSPARC T2 (full chip): 2D IC design (9 × 7.9 mm2). (Image from
Jung, M. et al., On enhancing power benefits in 3D ICs: Block folding and bonding styles perspective,
in Proceedings of ACM Design Automation Conference, 2014.)

* Synopsys has developed a 32/28 nm Interoperable PDK for its University Program members to use specifics of modern
technologies. This PDK enables students to master design of digital, analog, and mixed-signal ICs, using the latest
Synopsys Custom Implementation tools and utilizing IP-free technology, with parameters and peculiarities close to
real processes.

FIGURE 9.11 3D IC GDSII layouts of OpenSPARC T2 (full chip): core/cache stacking (6 × 6.4 mm2,
# TSV = 3263). (Image from Jung, M. et al., On enhancing power benefits in 3D ICs: Block folding and
bonding styles perspective, in Proceedings of ACM Design Automation Conference, 2014.)

The RTL-to-GDSII tool chain for 3D IC design used here is based on commercial tools
and enhanced with in-house tools to handle TSVs and 3D stacking. With the initial
design constraints, the entire 3D netlist is synthesized. The layout of each die is done separately
based on the 3D floorplanning result. With a given target timing constraint, cells and
memory macros are placed in each block. Note that we only utilize regular-Vt (RVT) cells as
a baseline. The netlists and the extracted parasitic files are used for 3D static timing analysis,
using Synopsys PrimeTime to obtain new timing constraints for each block’s I/O pins as well
as die boundaries (= TSVs). In this section, we assume two-tier, face-to-back bonded 3D ICs.
We use the following parameters for TSV: diameter 3 μm, height 18 μm, pitch 6 μm, resistance
0.043 Ω, and capacitance 8.4 fF.
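As a sanity check on these TSV parameters, the resistance of a solid cylindrical conductor can be estimated from its geometry. The sketch below assumes a copper-filled TSV with bulk copper resistivity; the fill material is an assumption, since the text does not state it, and the variable names are illustrative. It also derives the intrinsic RC product and the number of TSVs that fit per square millimeter at the quoted pitch.

```python
import math

RHO_CU = 1.68e-8  # ohm*m, bulk copper resistivity (assumes a copper-filled TSV)


def tsv_resistance(diameter_m, height_m, resistivity=RHO_CU):
    """DC resistance of a solid cylinder: R = rho * h / (pi * r^2)."""
    area = math.pi * (diameter_m / 2.0) ** 2
    return resistivity * height_m / area


r_est = tsv_resistance(diameter_m=3e-6, height_m=18e-6)  # geometry from the text
rc_seconds = 0.043 * 8.4e-15                             # R and C quoted in the text
tsvs_per_mm2 = (1e-3 / 6e-6) ** 2                        # square grid at 6 um pitch

print(f"estimated R = {r_est:.4f} ohm")
print(f"intrinsic RC = {rc_seconds:.3e} s")
print(f"TSVs per mm^2 at 6 um pitch = {tsvs_per_mm2:.0f}")
```

The estimate lands close to the quoted 0.043 Ω, and the intrinsic RC product is well below a femtosecond, which suggests that the delay impact of a TSV comes mainly from its capacitive load on the driving gate rather than from its own resistance.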
With these new timing constraints, we perform block-level and chip-level timing optimizations
(buffer insertion and gate sizing) as well as power optimizations (gate sizing) using Cadence
Encounter. We improve the design quality through iterative optimization steps such as pre-CTS
(clock tree synthesis), post-CTS, and post-route optimizations. We utilize all nine metal layers for
the SPC design, which requires the most routing resources among all blocks. We use seven layers
for all other blocks. Thus, the top two metal layers can be utilized for over-the-block routing in the
chip-level design. Figure 9.11 shows a 3D IC design, where all cores are partitioned into one tier
and all L2 caches into another. This is one of the most popular approaches to die partitioning. We use
this design as another baseline, in addition to the 2D design shown in Figure 9.10, for comparison
with a new partitioning scheme named block folding, described in the next section.

9.4.2 A BLOCK FOLDING METHODOLOGY

So far in this chapter, a block-level design style is used in both 2D and 3D ICs. In this case, each
block in a 3D IC design occupies a single tier, and TSVs are placed outside the blocks to connect
them. In this section, we study block folding, where we take the tier-partitioning approach to
a finer-grained level: we partition a single block into multiple tiers under the same footprint and
connect them with TSVs that are placed inside the folded block.

BLOCK FOLDING CRITERIA

For the block folding to provide power saving, the following criteria need to be met in the target
block to be folded:

◾ The target block must consume a high enough portion of the total system power.
Otherwise, the power saving from the block folding could be negligible at the system
level. Blocks that consume over 1% of the total system power are listed in Table 9.4.
Note that the total power portion of SPC, L2D, and L2T is the average of the eight
corresponding blocks. Thus, SPC, L2D, and L2T are outstanding target blocks. In addition,
RTX and CCX consume high power as a single block and hence could provide
nonnegligible power benefit if folded.
◾ The net power portion of the target block needs to be high. If the block is cell power
dominated,* the wirelength reduction of the folded block may not reduce the total power
noticeably. Therefore, SPC and CCX are attractive blocks to fold. L2D shows a relatively
low net power portion compared with other blocks, as it is a memory-dominated block
that contains 512 kB of memory (32 × 16 kB memory macros in our implementation).
◾ The target block must contain many long wires so that the wirelength decreases, and
hence, the net power reduction in the folded block can be maximized. In this study, we
define long wires as wires longer than 100× the standard cell height. We observe that
SPC, RTX, and CCX have a large number of long wires.

TABLE 9.4 2D IC Design Characteristics Used for Block Folding Candidate Selection

Block   Total Power Portion (%)   Net Power Portion (%)   # Long Wires (K)   Remark
SPC     5.8                       55.1                    27.7               CPU clock, 8×
RTX     3.6                       44.4                    27.5               I/O clock
CCX     2.8                       57.6                    12.4               CPU clock
L2D     2.1                       29.2                    6.5                8×
L2T     1.8                       48.5                    6.0                8×
RDP     1.7                       48.9                    5.2                I/O clock
TDS     1.3                       43.1                    4.8                I/O clock
DMU     1.1                       40.7                    5.4                I/O clock

Long wires are defined as the wires longer than 100× the standard cell height. The CPU clock runs at 500 MHz and the I/O clock
at 250 MHz.
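The three criteria can be applied mechanically to the Table 9.4 data. In the sketch below, the numeric cutoffs are hypothetical (the study does not state thresholds), and multiplying the per-instance power portion by the instance count for the 8× blocks is one reading of the table, adopted here for illustration.

```python
# (total power %, net power %, long wires in thousands, instances) from Table 9.4.
TABLE_9_4 = {
    "SPC": (5.8, 55.1, 27.7, 8),
    "RTX": (3.6, 44.4, 27.5, 1),
    "CCX": (2.8, 57.6, 12.4, 1),
    "L2D": (2.1, 29.2, 6.5, 8),
    "L2T": (1.8, 48.5, 6.0, 8),
    "RDP": (1.7, 48.9, 5.2, 1),
    "TDS": (1.3, 43.1, 4.8, 1),
    "DMU": (1.1, 40.7, 5.4, 1),
}

# Hypothetical cutoffs chosen for illustration; they are not taken from the study.
MIN_SYSTEM_POWER_PCT = 2.5
MIN_NET_POWER_PCT = 40.0
MIN_LONG_WIRES_K = 10.0


def criteria(total_pct, net_pct, long_wires_k, instances):
    """Evaluate the three folding criteria for one block type."""
    return {
        "power": total_pct * instances >= MIN_SYSTEM_POWER_PCT,
        "net": net_pct >= MIN_NET_POWER_PCT,
        "long_wires": long_wires_k >= MIN_LONG_WIRES_K,
    }


results = {name: criteria(*row) for name, row in TABLE_9_4.items()}
meets_power = {name for name, r in results.items() if r["power"]}
meets_all = {name for name, r in results.items() if all(r.values())}

print("power criterion:", sorted(meets_power))
print("all criteria:   ", sorted(meets_all))
```

With these particular cutoffs, the power criterion alone selects SPC, RTX, CCX, L2D, and L2T, while only SPC, RTX, and CCX satisfy all three criteria, which matches the qualitative observations in the list above.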

In our study, we fold five blocks: SPC, CCX, L2D, L2T, and RTX. In the following sections,
we discuss block folding methodologies for SPC, CCX, and L2D. Each block shows distinctive
folding characteristics. Before this, we briefly explain the mixed-size 3D placer that is employed
for block folding.

FOLDING EXAMPLES

In T2, eight cores use the cache crossbar (CCX) to exchange data stored in eight L2-cache banks.
This crossbar is divided into two separate modules, the processor-to-cache crossbar (PCX) and
the cache-to-processor crossbar (CPX). There are no signal connections between these two
blocks except for the clock and a few test signals. The PCX occupies 48% of the block area and
utilizes 48% of the CCX I/O pins, and the CPX uses the rest of them. Thus, the natural way to
fold this crossbar is by placing the entire PCX block in one die and the CPX in another die, along
with related I/O pins.
Sample 2D and 3D crossbar layouts are shown in Figure 9.12. Interestingly, in the 2D design,
we see that the PCX and CPX blocks are separated into several groups. The PCX has eight sources
(SPCs) and nine targets (eight L2-cache banks and the I/O bridge). The PCX I/O pin locations
are determined based on the target core and L2-cache bank locations in the chip-level floorplan,
which in turn attracts connected cells. Because of this, the cells of the PCX block tend to be
placed far apart, which degrades cell-to-cell wirelength significantly. However, folding the crossbar
eliminates this problem and hence cell-to-cell wirelength decreases by 31.7% compared with
the 2D. The folded crossbar leads to 54.6% reduced footprint, 28.8% shorter wirelength, 62.5% less
buffer count, and 32.8% power reduction over the 2D counterpart.

FIGURE 9.12 Cache crossbar (CCX) module 2D IC and 3D IC layouts. (a) A 2D IC design
(2060 µm × 490 µm). The Cache-to-Processor (CPX) sub-module is highlighted with white color.
(b) A 3D IC design (680 µm × 680 µm; top die: CPX, bottom die: PCX; # TSV = 4). (Image
from Jung, M. et al., On enhancing power benefits in 3D ICs: Block folding and bonding styles
perspective, in Proceedings of ACM Design Automation Conference, 2014.)

* Cell power, also known as the internal power, is the power consumed within the boundary of a cell, including intra-cell
switching power and short-circuit power.

Note that only four signal TSVs are used in this folded design, and this is due to the unique
characteristics of the CCX module itself. However, we must consider the connections in and
out of CCX to cores and cache blocks so that the overall TSV count is minimized in the full-
chip 3D IC layout. Last, we examine whether different 3D partitions with more 3D connections
can provide better power savings. However, as we increase the TSV count up to 6393,
the 3D power benefit drops to 23.4%, largely due to the area overhead of TSVs (13.3%).
A single L2-cache data bank (L2D) contains a 512 kB memory array. This L2D is further divided
into four logical sub-banks. In our implementation, each sub-bank group is partitioned into eight
blocks of size 16 kB each. The L2D is a memory macro dominated design, and hence, there are
not many 3D partitioning options to balance area after folding. Thus, two sub-banks are placed in
each die along with related logic cells. Although the buffer count and wirelength reduce by 33.5%
and 6.4%, respectively, in the folded L2D, their impact on the total power saving is not significant
(5.1% reduction over 2D), as shown in Table 9.5. This is because both cell and leakage power are
dominated by memory macros, which 3D folding cannot improve unless these memory macros
themselves are folded. Additionally, the net power portion is only about 29% of the total power
in 2D, and hence, the small net power reduction in 3D does not lead to a noticeable total power
reduction. Still, the footprint area reduction of 48.4% is nonnegligible, and this might affect chip-
level design quality.
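The L2D argument can be checked with the power breakdown in Table 9.5: each component's contribution to the total saving is its own reduction weighted by its share of the 2D total. The sketch below works through that arithmetic; only the table values come from the source, and the variable names are illustrative.

```python
# Power components (mW) of the L2D block, transcribed from Table 9.5.
power_2d = {"cell": 25.8, "net": 50.5, "leakage": 96.6}
power_3d = {"cell": 24.6, "net": 44.5, "leakage": 94.9}

total_2d = sum(power_2d.values())
total_3d = sum(power_3d.values())

breakdown = {}
for name in power_2d:
    saved = power_2d[name] - power_3d[name]
    breakdown[name] = {
        "share_of_2d_total_pct": 100.0 * power_2d[name] / total_2d,
        "own_reduction_pct": 100.0 * saved / power_2d[name],
        "contribution_to_total_pct": 100.0 * saved / total_2d,
    }

total_saving_pct = 100.0 * (total_2d - total_3d) / total_2d

for name, row in breakdown.items():
    print(name, {key: round(value, 1) for key, value in row.items()})
print("total saving (%):", round(total_saving_pct, 1))
```

Net power falls by roughly 12%, but because it accounts for only about 29% of the 2D total, it contributes about 3.5 percentage points to the roughly 5% overall saving; leakage, the largest component, barely changes.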
In the case of the SPARC core (SPC), we take the block folding strategy one step further:
we fold the blocks inside the SPC, which contains 14 blocks including two integer execution units
(EXU), a floating point and graphics unit (FGU), five instruction fetch units (IFU), and a load/
store unit (LSU). The SPC is the highest power consuming block in T2. We apply the same block
folding criteria discussed in Section 9.4.2 and fold six blocks as shown in Figure 9.13. We call this
second-level folding. With this second-level folding, we obtain 9.2% shorter wirelength, 10.8% fewer
buffers, and 5.1% lower power consumption than the SPC without second-level folding, that is,
a block-level 3D design of the SPC. Additionally, this 3D SPC achieves 21.2% power saving over
the 2D SPC.


TABLE 9.5 Comparison between 2D IC and 3D IC Level-2 Cache Data (L2D) Module Designs

L2D                     2D       3D       Diff (%)
Footprint (mm²)         2.54     1.31     −48.4
Wirelength (m)          3.41     3.19     −6.4
# cells (×10⁶)          53.1     42.2     −20.5
# buffers (×10⁶)        38.1     25.3     −33.5
Total power (mW)        172.9    164.0    −5.1
Cell power (mW)         25.8     24.6     −4.7
Net power (mW)          50.5     44.5     −11.9
Leakage power (mW)      96.6     94.9     −1.8

[Figure 9.13 block labels. Folded blocks (top die / bottom die): exu0_top / exu0_bot,
exu1_top / exu1_bot, tlu_top / tlu_bot, fgu_top / fgu_bot, lsu_top / lsu_bot,
ifu_ftu_top / ifu_ftu_bot. Other labels: gkt, pku, pmu, dec, ifu, ifu_ibu, mmu, cmu.]

FIGURE 9.13 Second-level folding of a SPARC core. Six blocks inside the core shown in black text
are folded. (Image from Jung, M. et al., On enhancing power benefits in 3D ICs: Block folding and
bonding styles perspective, in Proceedings of ACM Design Automation Conference, 2014.)

9.4.3 FULL-CHIP ASSEMBLY AND SIMULATION RESULTS

Based on the criteria for block folding discussed in Section 9.4.2, SPC, CCX, L2D, L2T, and RTX
have been folded in the full-chip T2 design. Unlike the other four blocks, RTX runs at I/O clock
frequency (250 MHz). In addition, almost all signals to/from RTX are connected with MAC,
TDS, and RDP that form a network interface unit (NIU) with RTX. Thus, the impact of RTX
folding is limited to the RTX block and NIU. In this study, we use a T2 design with all five types
of blocks folded.
For the F2B (face-to-back) bonding, the bottom die of folded blocks uses up to M7 (TSV
landing pad at M1) as in unfolded blocks, while the top die utilizes up to M9 (TSV landing pad
at M9). Thus, M8 and M9 can be used for over-the-block routing, including over folded blocks in
the bottom die. The only exception is the SPC, which uses up to M9 in both dies because it requires
the most routing resources. This is why SPCs are placed at the top and the bottom of the chip, as
shown in Figure 9.14. Otherwise, these SPC blocks would act as inter-block routing blockages.
We place CCX in the center. There are about 300 wires between CCX and each SPC (or L2T).
Thus, in this implementation, wires between CCX and L2T are much shorter than those between
CCX and SPC. All other control units (SIU, NCU, DMU, and MCU) are placed in the center row
as well. Finally, NIU blocks are placed at the bottom-most part of the chip, as most connections
are confined to the NIU.
Up to this point, both 2D and 3D designs utilize only regular-Vth (RVT) cells. However,
the semiconductor industry has been using multi-Vth cells to further optimize power, especially
for leakage power, at the cost of more complex power distribution network design. We employ
high-Vth (HVT) cells to examine their impact on power consumption in 2D and 3D designs.
Each HVT cell is around 30% slower, yet has 50% lower leakage and 5% lower cell power
consumption than the RVT counterpart.

FIGURE 9.14 3D IC GDSII layouts of OpenSPARC T2 (full chip): block folding with TSVs
(6 × 6.6 mm2, #TSV = 69,091). (Image from Jung, M. et al., On enhancing power benefits in 3D
ICs: Block folding and bonding styles perspective, in Proceedings of ACM Design Automation
Conference, 2014.)

We now compare three full-chip T2 designs: 2D IC, 3D IC without folding (core/cache
stacking), and 3D IC with block folding (five types of blocks folded), all with a dual-Vth (DVT)
cell library. Detailed comparisons are shown in Table 9.6. We first observe higher HVT cell usage
in 3D designs, especially for the 3D with folding case (94.0% of cells are HVT). This is largely due
to better timing in 3D designs and helps reduce power in 3D ICs further. We observe that the 3D
with folding case reduces the total power by 20.3% compared with the 2D and by 10.0% compared
with the 3D without folding case. This clearly demonstrates the effectiveness of block folding in
large-scale commercial-grade 3D designs for power reduction.
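The link between better timing and higher HVT usage can be illustrated with a deliberately simplified swap rule: a cell is moved to HVT only if its timing slack absorbs the 30% delay penalty. The sketch below treats each cell's slack as independent, which real flows do not (slack is shared along paths), and the cell delays, slacks, and leakage values are hypothetical. Only the 30% and 50% figures come from the text.

```python
HVT_DELAY_PENALTY = 0.30   # HVT cells are about 30% slower (from the text)
HVT_LEAKAGE_FACTOR = 0.50  # and leak about 50% less (from the text)


def assign_hvt(cells):
    """Swap a cell to HVT when its slack covers the added delay.

    cells: list of dicts with 'delay_ps', 'slack_ps', and 'leak_nw' keys.
    Returns (list of swap flags, total leakage after swapping).
    """
    flags, leakage = [], 0.0
    for cell in cells:
        swap = cell["delay_ps"] * HVT_DELAY_PENALTY <= cell["slack_ps"]
        flags.append(swap)
        leakage += cell["leak_nw"] * (HVT_LEAKAGE_FACTOR if swap else 1.0)
    return flags, leakage


# Hypothetical cells: identical except for slack; the second set has 20 ps more slack each.
tight = [{"delay_ps": 50.0, "slack_ps": s, "leak_nw": 10.0} for s in (0.0, 5.0, 20.0, 40.0)]
relaxed = [dict(cell, slack_ps=cell["slack_ps"] + 20.0) for cell in tight]

flags_tight, leak_tight = assign_hvt(tight)
flags_relaxed, leak_relaxed = assign_hvt(relaxed)
print(sum(flags_tight), leak_tight)
print(sum(flags_relaxed), leak_relaxed)
```

With more slack available, more cells qualify for the swap and total leakage drops, which mirrors the trend reported for the 3D designs.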

TABLE 9.6 Full-Chip T2 Comparison among 2D IC, 3D IC without Block Folding (Core/Cache Stacking),
and 3D IC with Block Folding (Five Types of Blocks Folded) Designs

                        2D              3D w/o Folding      3D w/ Folding
Footprint (mm²)         71.1            38.4 (−46.0%)       40.8 (−42.6%)
Wirelength (m)          339.7           321.3 (−5.5%)       309.6 (−8.9%)
# Cells (×10⁶)          7.41            7.09 (−4.3%)        6.83 (−7.8%)
# Buffers (×10⁶)        2.89            2.37 (−17.9%)       2.23 (−22.8%)
# HVT cells (×10⁶)      6.50 (87.8%)    6.38 (90.0%)        6.42 (94.0%)
# TSV                   0               3263                69,091
Total power (W)         8.240           7.113 (−13.7%)      6.570 (−20.3%)
Cell power (W)          1.770           1.394 (−21.2%)      1.175 (−33.6%)
Net power (W)           4.467           3.966 (−11.2%)      3.806 (−14.8%)
Leakage power (W)       2.003           1.753 (−12.4%)      1.589 (−24.2%)

The same dual-Vth design technique is applied to all cases. The numbers in parentheses indicate the difference against the
2D, except for the high-Vth (HVT) cell count reported as a % of the total cell count.


SUMMARY

We studied the power benefit of 3D ICs with an OpenSPARC T2 chip. To further enhance the
3D power benefit on top of the conventional 3D floorplanning method, the impact of block
folding methodologies was explored. With the aforementioned methods, a total power saving
of 20.3% was achieved against the 2D counterpart. Note that the 3D power benefit will increase
with a faster clock. With better timing in 3D, the discrepancy in terms of cell size and HVT
cell usage between 2D and 3D designs will increase, which in turn will enhance the 3D power
savings. Ongoing efforts address thermal issues in various 3D design styles with different bonding
styles, the impact of parasitics such as TSV-to-wire coupling capacitance on 3D power, and other
sources of 3D power benefit loss. Interested readers are referred to [32] for more details.

9.4.4 NEEDS FOR EDA TOOL DEVELOPMENT

The first physical design tool that is needed to support block folding is a mixed-size 3D placer
that can handle macros, gates, and TSVs, together for all dies in the stack. For a given block to be
folded, the tool must partition the macros and the gates into multiple dies, while optimizing the
number of connections (= TSVs) across the dies so that the PPA overhead of TSVs is minimized.
The next step is to place the objects into multiple dies while optimizing the overall PPA and
reliability. A TSV-based 3D placer based on a system of supply/demand of placement space was
presented in [36] but lacks the capability to handle hard macros. This capability can be added by
treating a hard macro as a large cell that demands some placement space. However, this leads to
large whitespace regions, called halos, in the vicinity of hard macros. Spindler et al. [70] solved
this issue by reducing the declared size of the hard macros. However, we observe that this tactic
is insufficient for extremely large hard macros such as memory banks in the L2 cache, for which
halos still exist.
An I/O pin partitioner for folded blocks is also needed because the inter-block routing
quality is largely affected by block I/O pins. In the extreme case, when all I/O pins of folded
blocks are placed in the bottom die, routing congestion and detours will be severe in this
die. This in turn increases coupling capacitance and thus net power consumption.
Such poor inter-block design quality can degrade intra-block design metrics as well.
Therefore, I/O pins of folded blocks need to be partitioned so that the inter-block wirelength
in both dies is balanced.
A 3D static timing analysis (STA) tool is a must for any 3D IC designer. The tool needs to
handle multiple dies simultaneously and perform fast and accurate timing calculations. In addition,
all parasitics, including those related to TSVs and micro-bumps as well as cross-die elements,
must be extracted for a correct timing calculation. A true 3D buffer insertion tool is also required
for effective timing closure in 3D IC. In the current tool flow, we first calculate timing constraints
at die boundaries (= TSV connections), using a 3D STA tool. We then perform buffer insertion
and gate sizing for each die separately, using those timing constraints. We chose this suboptimal
approach simply because of the lack of a 3D IC buffer inserter. In an all-dies-together or true
3D approach, buffers are inserted and gates are sized while processing all dies at the same time
instead of individual dies separately.
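The die-by-die flow described above hinges on timing constraints at each TSV. The following sketch shows one simple way such boundary constraints and per-die delay budgets could be derived for a single cross-die path. It is not the flow's actual tooling: setup time and clock skew are ignored, the proportional budgeting rule is an assumption, and the example delays are hypothetical (the 2.0 ns period corresponds to the 500 MHz CPU clock mentioned earlier).

```python
def boundary_constraints(clock_ns, delay_a_ns, tsv_ns, delay_b_ns):
    """Timing at the TSV for a path launched in die A and captured in die B.

    Returns (arrival time at the TSV, required time at the TSV, slack).
    """
    arrival = delay_a_ns
    required = clock_ns - tsv_ns - delay_b_ns
    return arrival, required, required - arrival


def per_die_budgets(clock_ns, delay_a_ns, tsv_ns, delay_b_ns):
    """Distribute the path slack to the two dies in proportion to their current delays."""
    _, _, slack = boundary_constraints(clock_ns, delay_a_ns, tsv_ns, delay_b_ns)
    total = delay_a_ns + delay_b_ns
    budget_a = delay_a_ns + slack * delay_a_ns / total
    budget_b = delay_b_ns + slack * delay_b_ns / total
    return budget_a, budget_b


# Hypothetical path: 1.2 ns in die A, 0.05 ns through the TSV, 0.9 ns in die B.
arrival, required, slack = boundary_constraints(2.0, 1.2, 0.05, 0.9)
budget_a, budget_b = per_die_budgets(2.0, 1.2, 0.05, 0.9)
print(arrival, required, slack)
print(budget_a, budget_b)
```

Here the path misses timing, so each die receives a delay budget smaller than its current delay and must be optimized separately to meet it. A true 3D buffer inserter could instead trade delay between the dies while optimizing them together.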

9.5 TO PROBE FURTHER

9.5.1 3D IC FLOORPLANNING

Cong et al. [14] use a combined bucket and 2D array (CBA) representation to better explore the
solution space of the multi-tier module packing problem. They employ a fast but less accurate
hybrid resistive model and another accurate but relatively slow resistive model selectively within
their floorplanning to incorporate thermal awareness. Healy et al. [21] present a multi-objective
micro-architectural floorplanning algorithm for high-performance processors implemented using
3D ICs. The floorplanner determines the dimension and placement locations of the functional
modules, taking into consideration thermal reliability, area, wirelength, vertical overlap, and
bonding-aware layer partitioning. This hybrid floorplanning approach combines linear programming
and simulated annealing.
Zhou et al. [89] use a three-stage force-directed optimization flow combined with legalization
techniques that eliminate block overlaps during multi-layer floorplanning. A temperature-
dependent leakage model is used to permit optimization based on the feedback loop connecting
thermal profile and leakage power consumption. Falkenstern et al. [17] focus on 3D floorplan and
power/ground (P/G) co-synthesis, which builds the floorplan and the P/G network concurrently.
Their tool integrates a 3D B*-tree floorplan representation, a resistive P/G mesh, and a simulated
annealing (SA) engine to explore the 3D floorplan and P/G network.
Tsai et al. [74] propose a two-stage 3D fixed-outline floorplan algorithm, where stage one
simultaneously plans hard macros and TSV blocks for wirelength reduction, while the next stage
improves the wirelength by reassigning signal TSVs. In [39], the authors show how to integrate 2D
IP blocks into 3D chips, without altering their layout. Their main idea is to optimize whitespace
for TSV insertion. Experiments indicate that the overhead of the proposed integration is small,
which can help accelerate the industry adoption of 3D integration.

3D IC PLACEMENT

In [19], thermal vias are assigned to specific areas of a 3D IC and used to adjust their effective
thermal conductivities. The method uses finite element analysis (FEA) to calculate temperatures
quickly during each iteration, while making iterative adjustments to these thermal conductivities in
order to achieve a desired thermal objective. Cong et al. [13] propose several techniques to obtain
3D IC layouts from 2D IC placement solutions through transformations. They present different
types of folding transformations, where cutlines are drawn and the chip is folded, to obtain 3D
IC placements. They provide a conflict-net graph based technique to reassign the cell layers to
reduce wirelength further.
The work of Kim et al. [37] is the first to reveal the significant area overhead of TSVs
in 3D IC placement. They present a force-directed 3D gate-level placement that efficiently handles
TSVs. The authors developed a technique for irregular TSV placement, where cells and TSVs are
placed together. In the case of regular TSV arrays, the cells are placed first, and TSVs are then
assigned to nets to complete routing. Athikulwongse et al. [3] present techniques to exploit TSV
stress to improve the timing of the design. They incorporate stress-aware mobility models into a
force-directed placement framework to improve the overall timing of the design. They also provide
techniques to handle (1) regular TSV arrays, where only the standard cells are moved, and
(2) irregular TSV placements, where both standard cells and TSVs are placed together.
Hsu et al. [22] propose an algorithm that first performs 3D analytical global placement with a
density optimization step. This reserves whitespace for TSVs that will be inserted later. Next, the
authors perform TSV insertion into the layout and perform TSV-aware standard cell legalization.
Finally, they perform tier-by-tier detailed placement to optimize the wirelength further. Cong
et al. [12] propose a thermal-aware 3D placement method that considers both the thermal effect
and the area impact of TSVs. They demonstrate that the minimum temperature can be achieved
by making the TSV area in each placement bin proportional to the lumped power consumption
in that bin, together with the bins in all the tiers directly above it.
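
The proportionality rule stated above can be sketched as a simple allocation routine. This is a hedged illustration rather than the placement method of Cong et al.: the function name, the convention that tier 0 is the bottom tier (so "above" means a higher tier index), and the normalization to a fixed `total_tsv_area` budget are assumptions.

```python
def tsv_area_allocation(power, total_tsv_area):
    """power[t][b]: power of bin b on tier t (tier 0 is assumed to be the bottom).

    Returns area[t][b], proportional to the power of bin (t, b) plus the
    power of the bins directly above it, scaled to sum to total_tsv_area.
    """
    num_tiers = len(power)
    num_bins = len(power[0])
    lumped = [[0.0] * num_bins for _ in range(num_tiers)]
    for b in range(num_bins):
        running = 0.0
        for t in reversed(range(num_tiers)):   # accumulate from the top tier down
            running += power[t][b]
            lumped[t][b] = running
    total = sum(sum(row) for row in lumped)
    if total == 0:
        raise ValueError("total power must be positive")
    return [[total_tsv_area * v / total for v in row] for row in lumped]

power = [[1.0, 3.0],   # tier 0 (bottom)
         [1.0, 1.0]]   # tier 1 (top)
area = tsv_area_allocation(power, total_tsv_area=80.0)
```

Here the bottom-tier bins receive more TSV area than their own power alone would suggest, because heat from the bins stacked above them must also pass through.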

[Link] 3D IC ROUTING AND BUFFERING

Cong and Zhang [15] compare multiple methods for thermal via insertion. They propose a heuristic
algorithm that plans thermal vias in the vertical and horizontal directions. The authors claim
a 200× speed-up compared with a nonlinear programming formulation with a thermal
resistive model, with a 1% difference in solution quality. Pathak and Lim [61] present an algorithm
for 3D Steiner tree construction and TSV relocation. A constructive Steiner routing algorithm is
used for the initial tree construction, while a linear programming method is subsequently used
to refine TSV locations for thermal optimization. The proposed algorithm reduces delay and
maximum temperature with comparable performance versus greedy methods.

© 2016 by Taylor & Francis Group, LLC


Two buffering algorithms are presented in [43]: van Ginneken-based and slew-aware buffer
insertion. The authors develop an efficient way to propagate slew information during buffer
insertion for nets that contain TSVs. They demonstrate significant savings in buffer count and negative
slack (both the worst and the total), compared with van Ginneken and another baseline
that is based on Cadence Encounter. The authors of [57,62] present an algorithm for signal routing
that considers electro-migration (EM). Higher priority is given to the EM-critical nets, and a higher
mean time to failure (MTTF) is achieved. The total number of nets with EM violations can also be
reduced. Hsu et al. [23] use signal TSVs to improve heat dissipation in 3D ICs. The authors compare
their results to [15] and claim that 23% fewer TSVs are used for thermal reduction. TSVs are initially placed
and later relocated to reduce temperature.

[Link] 3D IC POWER DISTRIBUTION NETWORK ROUTING

Yu et al. [83] discuss how to place power/ground (P/G) vias, considering power and thermal issues
simultaneously. The optimization is performed by calculating an RLC matrix for power distribution
and a thermal resistance matrix for thermal optimization. The authors demonstrate up to
45.5% non-signal via count reduction by simultaneous optimization using dynamic programming.
Zhou et al. [88] discuss decoupling-capacitance insertion with MIM (metal-insulator-metal)
and CMOS capacitors. A sequence-of-linear-programs method is used, and the authors show that
denser power grids help reduce voltage droops. A congestion model is built for evaluating area
congestion in each design cell.
Healy and Lim [20] compare two different P/G TSV placement strategies: TSV-distributed
versus TSV-clustered. The TSV-distributed topology spreads the TSV locations evenly
across the design, while the TSV-clustered topology groups the TSVs within a small area. The authors
demonstrate a 50% lower IR drop and 42% lower noise on the power distribution network (PDN)
with distributed TSVs, compared with clustered TSVs. Savidis et al. [66] present a test chip for
3D PDN noise measurement. The impact of TSV density and decoupling capacitance is characterized
experimentally using power grid noise measured by a source-follower sense circuit.
A design without decoupling capacitance is measured to have a much lower resonance frequency
than those with decoupling capacitance. Song et al. [69] present a technique that utilizes TSVs
as decoupling capacitors to reduce PDN noise. Using an extra stacked chip with TSVs as decoupling
capacitors is shown to be more effective than on-chip or off-chip decoupling capacitors.
They show that, with the extra tier, the parallel resonance frequency point is eliminated more
effectively than with off-chip decoupling capacitors.
In [58], electro-migration for 3D IC PDN is modeled with a focus on multi-scale via structure,
that is, TSVs and local vias used together for vertical power delivery. The work investigates the interplay
of TSVs and conventional local vias in 3D ICs. The authors also study the impact of structure,
material, and initial void size on EM-related lifetime of multi-scale via structures. In [86],
a transient modeling of EM in TSV and TSV-to-wire interfaces in the power delivery network
of 3D ICs is carried out. Atomic depletion and accumulation, effective resistance degradation,
and full-chip-scale PDN lifetime degradation due to EM are captured in their model. The results
show that voids and hillocks grow at various TSV-to-wire interfaces and degrade the effective
resistance of TSVs significantly.

[Link] 3D IC CLOCK ROUTING

Minz et al. [52] and Zhao et al. [85] are the first publications that show that multiple TSVs used
in a clock tree for 3D ICs reduce wirelength and thus power consumption, compared with the
single-TSV case. They present the 3D MMM (MMM = method of means and medians) algorithm
that builds a 3D abstract tree for a given TSV bound under wirelength versus TSV congestion
trade-off. This abstract tree is then embedded, buffered, and refined under the given nonuniform
thermal profiles so that the temperature-dependent skews are minimized and balanced. Zhao
et al. [84] provide a technique to construct pre-bond testable clock trees. If a 3D clock tree is built
using multiple clock TSVs, each die—except for the die that contains the clock source—contains
multiple subtrees that are not connected. These dies require multiple clock probe pads during
pre-bond testing. The authors present a routing method that builds a temporary tree connecting
these subtrees and uses a single clock probe pad for low-cost and low-power testing. This
work is extended by Kim and Kim [38] to consider co-optimization of TSV usage and power
consumption.
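
The method of means and medians that the 3D MMM algorithm builds on can be sketched in its classic 2D form. This sketch is an assumption-laden illustration, not the algorithm of [52] or [85]: it omits the tier assignment, the TSV bound, embedding, buffering, and thermal refinement, and the function name and tree representation are hypothetical.

```python
def mmm_tree(sinks, split_on_x=True):
    """Classic 2D method of means and medians (top-down abstract clock tree).

    sinks: non-empty list of (x, y) sink locations.
    Returns a nested dict: {"tap": (x, y), "children": [...]}.
    """
    # Mean: the tapping point is the center of mass of the sinks.
    tap = (sum(x for x, _ in sinks) / len(sinks),
           sum(y for _, y in sinks) / len(sinks))
    if len(sinks) == 1:
        return {"tap": tap, "children": []}
    # Median: split the sinks into two halves along the current axis,
    # then alternate the axis at the next level.
    axis = 0 if split_on_x else 1
    ordered = sorted(sinks, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "tap": tap,
        "children": [mmm_tree(ordered[:mid], not split_on_x),
                     mmm_tree(ordered[mid:], not split_on_x)],
    }

tree = mmm_tree([(0, 0), (0, 4), (4, 0), (4, 4)])
```

For four sinks at the corners of a square, the root tap lands at the center and each child taps the midpoint of one vertical edge. A 3D variant would additionally have to decide, at each split, whether the two halves sit on different dies and therefore consume a TSV.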
In [35], the authors present techniques to build a clock tree for a 3D IC that consists of one
GPU and a four-layer stacked DRAM connected on a 2.5D interposer. The clock generator is
on the GPU chip and is fed through the interposer to clock pins on the 3D DRAM module. The
authors provide an improved clock tree in the 3D DRAM module that reduces skew and jitter.
The authors of [49] propose a fault-tolerant 3D clock tree that significantly reduces the overhead
compared with using redundant TSVs for clock TSVs. Their fault-tolerant TSV unit
makes use of the existing 2D redundant trees designed for pre-bond testing and thus has
minimal area overhead while maintaining the same yield. Chae et al. [11] present a post-silicon
tuning methodology, called tier adaptive body biasing (TABB), to reduce skew and data path
variability in 3D clock trees. The proposed TABB uses specialized on-die sensors to independently
detect the process corners of NMOS and PMOS devices and, accordingly, tune the body biases of
NMOS/PMOS devices to reduce the clock skew variability.

[Link] PHYSICAL DESIGN FOR MONOLITHIC 3D IC

Monolithic 3D IC is a vertical integration technology that builds two or more tiers of devices
sequentially rather than bonding two independently fabricated dies together using bumps and/or
TSVs [7]. Compared with other existing 3D integration technologies (wire-bonding, interposer,
through-silicon-via, etc.), monolithic 3D integration allows ultra fine-grained vertical integration
of devices and interconnects, thanks to the extremely small size of inter-tier vias, typically local-
via-sized (70 nm in diameter).
Bobba et al. [9] propose two different strategies of stacking standard cells in monolithic 3D
that utilize 2D tools: intra-cell stacking and cell-on-cell stacking. Intra-cell stacking requires
the modification of standard cell design but permits a direct reuse of 2D IC tools. In the case of
cell-on-cell stacking, the authors propose a placement tool based on commercial tools and an
LP formulation for tier assignment. In [46], the authors present physical design techniques for
transistor-level monolithic 3D ICs. They first build a cell library that consists of 3D gates and
then model their timing and power characteristics. They perform iso-performance comparisons
and demonstrate a significant power benefit over 2D. They also demonstrate that this benefit
increases at future technology nodes.
Samal et al. [65] present a comprehensive study of thermal effects in monolithic 3D ICs. They
develop a fast and accurate block-level thermal model using a nonlinear regression technique.
They then incorporate this model into a simulated-annealing based floorplanner to reduce the
maximum temperature of the monolithic 3D IC with minimal wirelength overhead. In [59], the
authors present a power-performance study of block-level monolithic 3D ICs, under inter-tier
performance differences that are caused by the manufacturing process. They first present an
RTL-to-GDSII floorplanning framework that can handle soft blocks. Next, they model the inter-
tier performance differences and present a floorplanning framework that can generate circuits
immune to these differences. Panth et al. [60] present physical design techniques for gate-level
monolithic 3D ICs. These techniques place the gates into half the footprint area of a 2D IC, using
a commercial 2D engine. Next, the gates are partitioned into multiple tiers to give high-quality
3D solutions. The authors also present techniques to utilize the commercial tool for timing optimization,
clock tree synthesis, and inter-tier via insertion.
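
The tier-partitioning step that follows placement in the halved footprint can be sketched with a simple area-balancing heuristic. This is a hypothetical stand-in and not the partitioner of Panth et al.: the bin grid, the greedy largest-first rule, the two-tier limit, and all names are assumptions made for illustration.

```python
from collections import defaultdict

def partition_into_tiers(cells, bin_size):
    """cells: dict name -> (x, y, area), already placed in the halved footprint.

    Within each bin_size x bin_size bin, assign cells greedily (largest
    first) to whichever of two tiers currently holds less area in that bin.
    Returns dict name -> tier (0 or 1).
    """
    bins = defaultdict(list)
    for name, (x, y, area) in cells.items():
        bins[(int(x // bin_size), int(y // bin_size))].append((area, name))
    tier_of = {}
    for members in bins.values():
        used = [0.0, 0.0]                       # area already assigned per tier
        for area, name in sorted(members, reverse=True):
            tier = 0 if used[0] <= used[1] else 1
            tier_of[name] = tier
            used[tier] += area
    return tier_of

cells = {"u1": (0.5, 0.5, 4.0), "u2": (0.7, 0.2, 3.0),
         "u3": (0.1, 0.9, 1.0), "u4": (2.5, 0.5, 2.0)}
tiers = partition_into_tiers(cells, bin_size=1.0)
```

Balancing area bin by bin keeps each cell close to its (x, y) location from the 2D engine, so the overlaps created by the halved footprint are resolved mostly by the move to a second tier rather than by lateral displacement.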

[Link] 3D IC MECHANICAL RELIABILITY ANALYSIS AND DESIGN OPTIMIZATION

Yang et al. [81] propose a TSV stress-aware timing analyzer and show how to optimize the layout for
better performance. They show that TSV stress-induced timing variations can be as much as
10% for an individual cell and full-chip designs. Jung et al. [28] present a full-chip TSV interfacial
crack analysis flow and a design optimization methodology to alleviate TSV interfacial crack
problems in 3D ICs. First, they analyze TSV interfacial cracks at the TSV/dielectric liner interface
caused by TSV-induced thermo-mechanical stress. Then, they explore the impact of TSV
placement, in conjunction with associated structures such as the landing pad and dielectric
liner, on TSV interfacial cracks.
In [29], the authors show how TSV structures affect stress field and mechanical reliability in
3D ICs. They also present an accurate and fast full-chip stress and mechanical reliability analysis
flow, which is applicable to placement optimization for 3D ICs. Results show that KOZ size,
TSV size, liner material/thickness, and TSV placement are the key design parameters to reduce
the mechanical reliability problems in TSV-based 3D ICs. This work is extended in [31], where the
authors show how package elements affect the stress field and the mechanical reliability on top
of the TSV-induced stress in 3D ICs. The extended work shows that the mechanical reliability of TSVs
in the bottom-most die in the stack is highly affected by packaging elements and that this effect
decreases toward the upper dies.

[Link] 3D IC LOW-POWER PHYSICAL DESIGN METHODOLOGIES

In [27], the power benefit of 3D ICs is demonstrated with an OpenSPARC T2 core. Four design
techniques are explored to optimize power in 3D IC designs: 3D floorplanning, intra-block level
metal layer usage control, dual-Vth design, and functional module folding. With these methods,
total power savings of 21.2% were achieved.* Lee and Lim [45] demonstrated how the power consumption
of the buses in GPUs can be reduced with 3D IC technologies. To maximize the power
benefit of 3D ICs, the authors claim that finding a good partition and floorplan solution is critical.
To further enhance the 3D power benefit versus the conventional 3D floorplanning method for
GPU, block folding methodologies and bonding style impact were explored [32]. The authors also
developed an efficient method to find face-to-face via locations for two-tier 3D ICs and showed
more 3D power reduction in F2F bonding than in F2B.

REFERENCES
1. 3DInCites. [Link] November, 2015.
2. Ansys 3D IC Tools. [Link] November, 2015.
3. K. Athikulwongse, A. Chakraborty, J.-S. Yang, D. Pan, and S. K. Lim. Stress-driven 3D-IC placement
with TSV keep-out zone and regularity study. In Proceedings of IEEE International Conference on
Computer-Aided Design, San Jose, CA, 2010.
4. K. Athikulwongse, D. H. Kim, M. Jung, and S. K. Lim. Block-level designs of die-to-wafer bonded 3D
ICs and their design quality tradeoffs. In Proceedings of Asia and South Pacific Design Automation
Conference, Yokohama, Japan, 2013.
5. Atrenta 3D IC Tools. [Link] November, 2015.
6. K. Banerjee, S. Souri, P. Kapur, and K. Saraswat. 3-D ICs: A novel chip design for improving deep-
submicrometer interconnect performance and systems-on-chip integration. Proceedings of the IEEE,
89(5):602–633, 2001.
7. P. Batude et al. Advances in 3D CMOS sequential integration. In Proceedings of IEEE International
Electron Devices Meeting, Baltimore, MD, 2009.
8. B. Black et al. Die stacking (3D) microarchitecture. In Proceedings of Annual International Symposium
Microarchitecture, Orlando, FL, 2006.
9. S. Bobba, A. Chakraborty, O. Thomas, P. Batude, T. Ernst, O. Faynot, D. Pan, and G. De Micheli.
CELONCEL: Effective design technique for 3-D monolithic integration targeting high performance
integrated circuits. In Proceedings of Asia and South Pacific Design Automation Conference,
Yokohama, Japan, 2011.
10. Cadence 3D IC Tools. [Link] November, 2015.
11. K. Chae, X. Zhao, S. K. Lim, and S. Mukhopadhyay. Tier adaptive body biasing: A post-silicon tuning
method to minimize clock skew variations in 3-D ICs. IEEE Transactions on Components, Packaging
and Manufacturing Technology, 3(10):1720–1730, 2013.
12. J. Cong, G. Luo, and Y. Shi. Thermal-aware cell and through-silicon-via co-placement for 3D ICs. In
Proceedings of ACM Design Automation Conference, San Diego, CA, 2011.
13. J. Cong, G. Luo, J. Wei, and Y. Zhang. Thermal-aware 3D IC placement via transformation. In
Proceedings of Asia and South Pacific Design Automation Conference, Yokohama, Japan, 2007.

* This work is presented in Section 9.4.

14. J. Cong, J. Wei, and Y. Zhang. A thermal-driven floorplanning algorithm for 3D ICs. In Proceedings of
IEEE International Conference on Computer-Aided Design, San Jose, CA, 2004.
15. J. Cong and Y. Zhang. Thermal via planning for 3-D ICs. In Proceedings of IEEE International
Conference on Computer-Aided Design, San Jose, CA, 2005.
16. G. V. der Plas et al. Design issues and considerations for low-cost 3D TSV IC technology. In IEEE
International Solid-State Circuits Conference, San Francisco, CA, 2010, pp. 148–149.
17. P. Falkenstern, Y. Xie, Y.-W. Chang, and Y. Wang. Three-dimensional integrated circuits (3D IC)
floorplan and power/ground network co-synthesis. In Proceedings of Asia and South Pacific Design
Automation Conference, Taipei, Taiwan, 2010.
18. S. Franssila. Introduction to Microfabrication. John Wiley & Sons, Chichester, U.K., 2004.
19. B. Goplen and S. Sapatnekar. Placement of thermal vias in 3-D ICs using various thermal objectives.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(4):692–709,
2006.
20. M. Healy and S. K. Lim. Distributed TSV topology for 3-D power-supply networks. IEEE Transactions
on VLSI Systems, 20(11):2066–2079, 2012.
21. M. Healy, M. Vittes, M. Ekpanyapong, C. Ballapuram, S. K. Lim, H.-H. Lee, and G. Loh. Multiobjective
microarchitectural floorplanning for 2-D and 3-D ICs. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 26(1):38–52, 2007.
22. M.-K. Hsu, Y.-W. Chang, and V. Balabanov. TSV-aware analytical placement for 3D IC designs.
In Proceedings of ACM Design Automation Conference, June, 2011.
23. P.-Y. Hsu, H.-T. Chen, and T. Hwang. Stacking signal TSV for thermal dissipation in global routing
for 3D IC. In Proceedings of Asia and South Pacific Design Automation Conference, Yokohama, Japan,
2013.
24. X. Hu et al. High thermal conductivity molding compound for flip-chip packages. US Patent
2009/0004317 A1, 2009.
25. J.-W. Im et al. A 128Gb 3b/cell V-NAND flash memory with 1Gb/s I/O rate. In IEEE International
Solid-State Circuits Conference, 2015, pp. 1–3.
26. ITRS. The International Technology Roadmap for Semiconductors. [Link]
27. M. Jung et al. How to reduce power in 3D IC designs: A case study with OpenSPARC T2 core.
In Proceedings of IEEE Custom Integrated Circuits Conference, September, 2013.
28. M. Jung, X. Liu, S. Sitaraman, D. Z. Pan, and S. K. Lim. Full-chip through-silicon-via interfacial
crack analysis and optimization for 3D IC. In Proceedings of IEEE International Conference on
Computer-Aided Design, San Jose, CA, 2011.
29. M. Jung, J. Mitra, D. Pan, and S. K. Lim. TSV stress-aware full-chip mechanical reliability analysis
and optimization for 3D IC. In Proceedings of ACM Design Automation Conference, San Diego, CA,
2011.
30. M. Jung, J. Mitra, D. Pan, and S. K. Lim. TSV Stress-aware full-chip mechanical reliability analysis
and optimization for 3D IC. Communications of the ACM, 57(1):107–115, 2014.
31. M. Jung, D. Pan, and S. K. Lim. Chip/package co-analysis of thermo-mechanical stress and reliability
in TSV-based 3D ICs. In Proceedings of ACM Design Automation Conference, San Francisco, CA,
2012.
32. M. Jung, T. Song, Y. Wan, Y. Peng, and S. K. Lim. On enhancing power benefits in 3D ICs: Block
folding and bonding styles perspective. In Proceedings of ACM Design Automation Conference, San
Francisco, CA, 2014.
33. U. Kang et al. 8Gb 3D DDR3 DRAM using through-silicon-via technology. In IEEE International
Solid-State Circuits Conference, San Francisco, CA, USA, 2009, pp. 130–131, 131a.
34. D. Kim et al. 3D-MAPS: 3D massively parallel processor with stacked memory. In IEEE International
Solid-State Circuits Conference, San Francisco, CA, 2010.
35. D. Kim, J. Kim, J. Cho, J. S. Pak, J. Kim, H. Lee, J. Lee, and K. Park. Distributed multi TSV 3D clock
distribution network in TSV-based 3D IC. In Proceedings of IEEE Electrical Performance of Electronic
Packaging, San Jose, CA, 2011.
36. D. H. Kim, K. Athikulwongse, and S. K. Lim. A study of through-silicon-via impact on the 3D stacked
IC layout. In Proceedings of IEEE International Conference on Computer-Aided Design, San Jose, CA,
2009.
37. J.-S. Kim et al. A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4x128 I/Os using TSV-based stacking.
In IEEE International Solid-State Circuits Conference, San Francisco, CA, 2011, pp. 496–498.
38. T.-Y. Kim and T. Kim. Clock tree synthesis with pre-bond testability for 3D stacked IC designs.
In Proceedings of ACM Design Automation Conference, Anaheim, CA, 2010.
39. J. Knechtel, I. Markov, and J. Lienig. Assembling 2-D blocks into 3-D chips. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 31(2):228–241, 2012.
40. J. Knickerbocker et al. Three-dimensional silicon integration. IBM Journal of Research and
Development, 52(6):553–569, November 2008.

41. M. Koyanagi, T. Fukushima, and T. Tanaka. High-density through silicon vias for 3-D LSIs. Proceedings
of the IEEE, 97(1):49–59, 2009.
42. M. Koyanagi, Y. Nakagawa, K.-W. Lee, T. Nakamura, Y. Yamada, K. Inamura, K.-T. Park, and
H. Kurino. Neuromorphic vision chip fabricated using three-dimensional integration technology.
In IEEE International Solid-State Circuits Conference, San Francisco, CA, pp. 270–271, 2001.
43. Y.-J. Lee, I. Hong, and S. K. Lim. Slew-aware buffer insertion for through-silicon-via-based 3D ICs.
In Proceedings of IEEE Custom Integrated Circuits Conference, San Jose, CA, 2012.
44. Y.-J. Lee and S. K. Lim. Timing analysis and optimization for 3D stacked multi-core microprocessors.
In Proceedings of IEEE International 3D Systems Integration Conference, Munich, Germany, 2010.
45. Y.-J. Lee and S. K. Lim. On GPU bus power reduction with 3D IC technologies. In Proceedings of
Design, Automation and Test in Europe, Dresden, Germany, 2014.
46. Y.-J. Lee, D. Limbrick, and S. K. Lim. Power benefit study for ultra-high density transistor-level
monolithic 3D ICs. In Proceedings of ACM Design Automation Conference, Austin, TX, 2013.
47. S. K. Lim. TSV-aware 3D physical design tool needs for faster mainstream acceptance of 3D ICs.
In ACM DAC Knowledge Center, 2010.
48. J.-Q. Lu. 3-D Hyperintegration and packaging technologies for micro-nano systems. Proceedings of
the IEEE, 97(1):18–30, 2009.
49. C.-L. Lung, Y.-S. Su, S.-H. Huang, Y. Shi, and S.-C. Chang. Fault-tolerant 3D clock network.
In Proceedings of ACM Design Automation Conference, San Diego, CA, 2011.
50. Mentor Graphics 3D IC Tools. [Link] November, 2015.
51. Micro Magic 3D IC Tools. [Link] November, 2015.
52. J. Minz, X. Zhao, and S. K. Lim. Buffered clock tree synthesis for 3D ICs under thermal variations.
In Proceedings of Asia and South Pacific Design Automation Conference, Seoul, Korea, 2008.
53. NanGate Inc. NanGate 45nm Open Cell Library, 2009.
54. U. Nawathe et al. An 8-core 64-thread 64b power-efficient SPARC SoC. In IEEE International
Solid-State Circuits Conference, 2007.
55. North Carolina State University. FreePDK45, 2009.
56. [Link]. OpenCores. [Link] 2009.
57. J. Pak, S. K. Lim, and D. Pan. Electromigration-aware routing for 3D ICs with stress-aware EM
modeling. In Proceedings of IEEE International Conference on Computer-Aided Design, San Jose, CA,
2012.
58. J. Pak, S. K. Lim, and D. Pan. Electromigration study for multi-scale power/ground vias in TSV-based
3D ICs. In Proceedings of IEEE International Conference on Computer-Aided Design, San Jose, CA,
2013.
59. S. Panth, K. Samadi, Y. Du, and S. K. Lim. Power-performance study of block-level monolithic 3D-ICs
considering inter-tier performance variations. In Proceedings of ACM Design Automation Conference,
San Francisco, CA, 2014.
60. S. A. Panth, K. Samadi, Y. Du, and S. K. Lim. Design and CAD methodologies for low power gate-
level monolithic 3D ICs. In Proceedings of International Symposium on Low Power Electronics and
Design, La Jolla, CA, 2014.
61. M. Pathak and S. K. Lim. Performance and thermal-aware steiner routing for 3-D stacked ICs. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(9):1373–1386, 2009.
62. M. Pathak, J. Pak, D. Pan, and S. K. Lim. Electromigration modeling and full-chip reliability analysis
for BEOL interconnect in TSV-based 3D ICs. In Proceedings of IEEE International Conference on
Computer-Aided Design, San Jose, CA, 2011.
63. R. Patti. Three-dimensional integrated circuits and the future of system-on-chip designs. Proceedings
of the IEEE, 94(6):1214–1224, 2006.
64. R3 Logic 3D IC Tools. [Link] November, 2015.
65. S. Samal, S. Panth, K. Samadi, M. Saedi, Y. Du, and S. K. Lim. Fast and accurate thermal modeling
and optimization for monolithic 3D ICs. In Proceedings of ACM Design Automation Conference, San
Francisco, CA, 2014.
66. I. Savidis, S. Kose, and E. Friedman. Power noise in TSV-based 3-D integrated circuits. IEEE Journal
of Solid-State Circuits, 48(2):587–597, 2013.
67. C.-H. Shen et al. Heterogeneously integrated sub-40 nm low-power epi-like Ge/Si monolithic 3D-IC
with stacked SiGeC ambient light harvester. In Proceedings of IEEE International Electron Devices
Meeting, San Francisco, CA, 2014.
68. M. Shulaker, T. Wu, A. Pal, L. Zhao, Y. Nishi, K. Saraswat, H.-S. Wong, and S. Mitra. Monolithic 3D integration
of logic and memory: Carbon nanotube FETs, resistive RAM, and silicon FETs. In Proceedings
of IEEE International Electron Devices Meeting, San Francisco, CA, 2014, pp. 27.4.1–27.4.4.
69. E. Song, K. Koo, J. S. Pak, and J. Kim. Through-silicon-via-based decoupling capacitor stacked
chip in 3-D-ICs. IEEE Transactions on Components, Packaging and Manufacturing Technology,
3(9):1467–1480, 2013.

70. P. Spindler, U. Schlichtmann, and F. M. Johannes. Kraftwerk2—A fast force-directed quadratic
placement approach using an accurate net model. In IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 2008.
71. V. Sundaram et al. Low cost, high performance, and high reliability 2.5D silicon interposer. In IEEE
Electronic Components and Technology Conference, Las Vegas, NV, 2013, pp. 342–347.
72. Synopsys 3D IC Tools. [Link] November, 2015.
73. The Hybrid Memory Cube. [Link]
74. M.-C. Tsai, T.-C. Wang, and T. Hwang. Through-silicon via planning in 3-D floorplanning. IEEE
Transactions on VLSI Systems, 19(8):1448–1457, 2011.
75. R. Tummala. Moore’s law meets its match (system-on-package). IEEE Spectrum, 43(6):44–49, 2006.
76. R. Tummala and J. Laskar. Gigabit wireless: System-on-a-package technology. Proceedings of the
IEEE, 92(2):376–387, February 2004.
77. Various Authors. Special issue on 3-D integration technologies. Proceedings of the IEEE, 97(1):1–175,
2009.
78. S. Wong, A. El-Gamal, P. Griffin, Y. Nishi, F. Pease, and J. Plummer. Monolithic 3D integrated circuits.
In Proceedings of International Symposium on VLSI Technology, Systems and Applications, Hsinchu,
Taiwan, 2007, pp. 1–4.
79. Xilinx 3D IC Tools. [Link] November, 2015.
80. C. Xu et al. Fast 3-D thermal analysis of complex interconnect structures using electrical modeling
and simulation methodologies. In Proceedings of IEEE International Conference on Computer-Aided
Design, San Jose, CA, 2009.
81. J.-S. Yang, K. Athikulwongse, Y.-J. Lee, S. K. Lim, and D. Pan. TSV stress aware timing analysis with
applications to 3D-IC layout optimization. In Proceedings of ACM Design Automation Conference,
Anaheim, CA, 2010.
82. K. B. Yeap et al. A critical review on multiscale material database requirement for accurate three-
dimensional IC simulation input. IEEE Transactions on Device and Materials Reliability,
12(2):217–224, 2012.
83. H. Yu, J. Ho, and L. He. Allocating power ground vias in 3D ICs for simultaneous power and thermal
integrity. ACM Transactions on Design Automation of Electronics Systems, 14(3):1–31, 2009.
84. X. Zhao, D. Lewis, H.-H. Lee, and S. K. Lim. Low-power clock tree design for pre-bond testing of
3-D stacked ICs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
30(5):732–745, 2011.
85. X. Zhao, J. Minz, and S. K. Lim. Low-power and reliable clock network design for through silicon
via based 3D ICs. IEEE Transactions on Components, Packaging and Manufacturing Technology,
1(2):247–259, 2011.
86. X. Zhao, Y. Wan, M. Scheuermann, and S. K. Lim. Transient modeling of TSV-wire electromigration
and lifetime analysis of power distribution network for 3D ICs. In Proceedings of IEEE International
Conference on Computer-Aided Design, San Jose, CA, 2013.
87. P. Zhou, Y. Ma, Z. Li, R. Dick, L. Shang, H. Zhou, X. Hong, and Q. Zhou. 3D-STAF: Scalable temperature
and leakage-aware floorplanning for 3D ICs. In Proceedings of IEEE International Conference on
Computer-Aided Design, San Jose, CA, 2007.
88. P. Zhou, K. Sridharan, and S. Sapatnekar. Optimizing decoupling capacitors in 3D circuits for power
grid integrity. IEEE Design and Test of Computers, 26(5):15–25, 2009.
