Design for Manufacturability with Advanced Lithography
Bei Yu • David Z. Pan
ECE Department, The University of Texas, Austin, TX, USA
2016
Preface

Shrinking the feature size of very large-scale integrated (VLSI) circuits with advanced lithography has been a holy grail for the semiconductor industry. However, the gap between manufacturing capability and expected design performance has become a critical challenge at sub-28 nm technology nodes. To bridge this gap, design for manufacturability (DFM), which co-optimizes the design and the lithography process, is a must.
In this book, we have aimed to present state-of-the-art research results on DFM with multiple patterning lithography (MPL) and electron beam lithography (EBL). Note that we have made no attempt to include everything, or even everything that is important in DFM. For example, design challenges for extreme ultraviolet (EUV) lithography, directed self-assembly (DSA), and nanoimprint lithography (NIL) are not covered in this book. We hope this book will serve as a concise introduction to and demonstration of design and technology co-optimization.
DFM for advanced lithography could be defined very differently under different
circumstances. In general, progress in advanced lithography happens along three
different directions:
• New patterning technique (e.g., layout decomposition for different patterning
techniques)
• New design methodology (e.g., lithography aware standard cell design and
physical design)
• New illumination system (e.g., layout fracturing for EBL system, stencil planning
for EBL system)
For the research direction of new patterning techniques, we study the layout decomposition problems for different patterning techniques and explore four important topics. We present the proof that triple patterning layout decomposition is NP-hard. We propose a number of CAD optimization and integration techniques to solve different decomposition problems. For the research direction of new design methodology, we show the limitation of the traditional design flow. That is, ignoring advanced lithography constraints in early design stages may limit the
potential to resolve all the lithography process conflicts. To overcome the limitation,
we propose a coherent framework, including standard cell compliance and detailed
placement, to enable lithography-friendly design. For the EBL illumination system, we focus on two topics to improve the throughput of the whole EBL system. With
simulations and experiments, we demonstrate the critical role and effectiveness of DFM techniques for advanced lithography as the semiconductor industry marches into the deeper submicron domain.
We are particularly grateful to Dr. Bei Yu’s PhD dissertation committee members,
as the major material of this book is based on his dissertation. In particular, we
would like to thank Prof. Ross Baldick for his technical suggestions and comments
on the optimization formulations. We would like to thank Prof. Ray Chen for his
comments on future applications. We would like to thank Prof. Michael Orshansky
for his technical suggestions during the development of this book. We would like to
thank Prof. Nur A. Touba for his kindness and support. We would like to thank Dr. Kevin Lucas for his great insights and helpful comments on patterning techniques over the years.
We are grateful to our colleagues, Dr. Charles Alpert (Cadence), Prof. Yao-Wen
Chang (National Taiwan University), Dr. Salim Chowdhury (Oracle), Prof. Chris
Chu (ISU), Dr. Brian Cline (ARM), Dr. Gilda Garreton (Oracle), Prof. Shiyan Hu
(MTU), Prof. Ru Huang (Peking University), Dr. Zhuo Li (Cadence), Dr. Lars
Liebmann (IBM), Dr. Gerard Luk-Pat (Synopsys), Dr. Rajendran Panda (Oracle),
Dr. Jing Su (ASML), Prof. Martin Wong (UIUC), Dr. Greg Yeric (ARM), Prof.
Xuan Zeng (Fudan University), and Dr. Yi Zou (ASML), for their valuable help, suggestions, and discussions on early drafts of this book.
We would like to express our gratitude to the colleagues and alumni of the
UTDA group at the University of Texas who gave us detailed expert feedback (e.g.,
Yongchan Ban, Ashutosh Chakraborty, Minsik Cho, Duo Ding, Jhih-Rong Gao,
Derong Liu, Yen-Hung Lin, Yibo Lin, Che-Lun Hsu, Tetsuaki Matsunawa, Jiaojiao
Ou, Jiwoo Pak, Subhendu Roy, Biying Xu, Xiaoqing Xu, Jae-Seok Yang, Wei Ye,
Kun Yuan, Boyang Zhang, Yilin Zhang). Only through those inspiring discussions and productive collaborations could this book have been developed and polished.
We thank Xuming Zeng and Aaron Zou for editing and proofreading many
chapters of the book. We also thank the EDAA and the Springer Press publication
team for their help and support in the development of this text.
Last but not least, we would like to thank our families for their encouragement
and support, as they endured the time demands that writing a book has imposed
on us.
Contents

1 Introduction 1
   1.1 Advanced Lithography Challenges 4
   1.2 Overview of This Book 4
   References 5
2 Layout Decomposition for Triple Patterning 7
   2.1 Introduction 7
   2.2 Layout Decomposition for Triple Patterning 8
      2.2.1 Preliminaries and Problem Formulation 8
      2.2.2 Algorithms 13
      2.2.3 Experimental Results 25
   2.3 Density Balanced Layout Decomposition for Triple Patterning 31
      2.3.1 Preliminaries and Problem Formulation 33
      2.3.2 Algorithms 35
      2.3.3 Experimental Results 44
   2.4 Summary 49
   References 50
3 Layout Decomposition for Other Patterning Techniques 53
   3.1 Introduction 53
   3.2 Layout Decomposition for Triple Patterning with End-Cutting 53
      3.2.1 Preliminaries and Problem Formulation 55
      3.2.2 Algorithms 57
      3.2.3 Experimental Results 63
   3.3 Layout Decomposition for Quadruple Patterning and Beyond 67
      3.3.1 Problem Formulation 68
      3.3.2 Algorithms 68
      3.3.3 Experimental Results 76
   3.4 Summary 80
   References 80
Index 163
Acronyms
CD Critical dimension
CP Character projection
DFM Design for manufacturability
DPL Double patterning lithography
DSA Directed self-assembly
EBL Electron beam lithography
EUV Extreme ultraviolet
ICC Independent component computation
ILP Integer linear programming
IVR Iterative vertex removal
LELE Litho-etch-litho-etch process
MCC Multi-column cell
MPL Multiple patterning lithography
MPLD Multiple patterning layout decomposition
NIL Nanoimprint lithography
OPC Optical proximity correction
OSP Overlapping aware stencil planning
QPL Quadruple patterning lithography
QPLD Quadruple patterning layout decomposition
RET Reticle enhancement technique
SADP Self-aligned double patterning
SDP Semidefinite programming
TPL Triple patterning lithography
TPLD Triple patterning layout decomposition
VLSI Very large-scale integrated circuits
VSB Variable shaped beam
Chapter 1
Introduction
Shrinking the feature size of very large-scale integrated (VLSI) circuits with advanced lithography has been a holy grail for the semiconductor industry. However, the gap between manufacturing capability and expected design performance has become a critical challenge at sub-32 nm technology nodes [1, 2]. Before addressing these challenges, we introduce some preliminaries of the current mainstream lithography system.
As illustrated in Fig. 1.1, a conventional lithography system consists of four basic components: light source, mask, projection lens, and wafer. The high-energy laser source sheds light on the mask and exposes the wafer through an extremely complex combination of projection lenses. In the conventional lithography system, the resolution (R) is represented as follows [3]:

    R = k1 · λ / NA,    (1.1)

where λ is the wavelength of the light source (currently 193 nm), k1 is a process-related parameter, and NA is the numerical aperture. For smaller feature sizes (smaller R), we need a smaller k1 and a larger NA. The theoretical limit of k1 is 0.25 with intensive optical proximity correction (OPC) [4]. The NA can be enhanced from 0.93 to 1.35 using a technique called immersion lithography, where water is used as the medium between the lens and the wafer. However, it is unlikely that a new liquid material will push the NA beyond 1.35 in the near future [5]. Therefore, the current optical lithography system is reaching its fundamental limit, and severe variations are observed on the wafer at sub-32 nm technology nodes. Due to these severe variations, conventional lithography is no longer adequate for emerging technology nodes, and a set of advanced lithography techniques has been called upon to help.
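As a quick back-of-the-envelope illustration of Eq. (1.1), the sketch below evaluates the resolution for the numbers quoted above (193 nm ArF source, k1 at its 0.25 limit, dry NA of 0.93 versus immersion NA of 1.35); the function name is ours, not from the text.

```python
# Sketch: evaluating the resolution formula R = k1 * lambda / NA (Eq. 1.1)
# with the values quoted in the text.

def resolution_nm(k1: float, wavelength_nm: float, na: float) -> float:
    """Rayleigh resolution criterion: R = k1 * lambda / NA."""
    return k1 * wavelength_nm / na

# Dry lithography (NA = 0.93) vs. immersion lithography (NA = 1.35):
dry = resolution_nm(0.25, 193.0, 0.93)
wet = resolution_nm(0.25, 193.0, 1.35)
print(f"dry: {dry:.1f} nm, immersion: {wet:.1f} nm")
```

Immersion lithography brings the half-pitch from roughly 52 nm down to roughly 36 nm, which is still far from sub-32 nm feature sizes, motivating the techniques that follow.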
For emerging technology nodes and the near future, multiple patterning lithography (MPL) has become the most viable lithography technique. As shown in Figs. 1.2 and 1.3, in MPL the original layout design is divided into several masks. Then
each mask is implemented through one exposure-etch step, through which the
layout can be produced. Generally speaking, MPL consists of double patterning
lithography (DPL) (Fig. 1.2), triple patterning lithography (TPL) (see Fig. 1.3),
or even quadruple patterning lithography (QPL) [6–8]. There are two main types
of DPL with different manufacturing processes: litho-etch-litho-etch (LELE) [6]
and self-aligned double patterning (SADP) [9]. The advantage of MPL is that the effective pitch is improved, and thus the lithography resolution can be further enhanced [10]. DPL has been heavily developed by industry for the 22 nm technology node, while triple or quadruple patterning has been explored in industrial test-chip designs [11].
In the longer term (for logic nodes beyond 14 nm), electron beam lithography (EBL) is a promising advanced lithography technique, along with other candidates, e.g., extreme ultraviolet (EUV) lithography, directed self-assembly (DSA), and nanoimprint lithography (NIL) [1]. As shown in Fig. 1.4, EBL is a maskless technology that shoots the desired patterns directly onto the silicon wafer using a charged particle beam [12]. EBL has been widely deployed in mask manufacturing, which is a significant step affecting the fidelity of the printed image on the wafer and
Fig. 1.3 Multiple patterning lithography (MPL) process: the initial layout is divided into several
masks, and then each mask is implemented through one exposure-etch step
critical dimension (CD) control. In addition, due to its capability of accurate pattern generation, the EBL system has been under development for several decades [13]. Compared with the traditional lithographic system, EBL has several advantages. (1) An electron beam can easily be focused into a nanometer-scale diameter, which avoids the diffraction limit of light. (2) The price of a photomask set is becoming unaffordable, especially with the emerging MPL techniques; as a maskless technology, EBL can reduce the manufacturing cost. (3) EBL allows flexibility for fast turnaround times and even late design modifications to correct or adapt a given chip layout. Because of all these advantages, EBL is being used in mask making, small-volume LSI production, and R&D to develop technology nodes ahead of mass production.
Challenges for New Patterning Techniques The key challenge of MPL is a new design problem, called layout decomposition, where the input layout is divided into multiple masks. When the distance between two input features is less than the minimum coloring distance min_s, they need to be assigned to different masks to avoid a coloring conflict. Sometimes a coloring conflict can also be resolved by splitting a pattern into two touching parts. However, this introduces stitches, which lead to yield loss because of overlay error. Therefore, two of the main objectives in layout decomposition are conflict minimization and stitch minimization. An example of triple patterning layout decomposition is shown in Fig. 1.3, where all features in the input layout are divided into three masks (colors).
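The conflict-minimization objective can be illustrated with a small sketch: layout decomposition is viewed as coloring a conflict graph with three colors (masks). The 4-feature graph below is hypothetical, and brute-force enumeration stands in for the practical algorithms of Chap. 2.

```python
# Minimal sketch (hypothetical layout, not the book's algorithms):
# triple patterning layout decomposition seen as 3-coloring a conflict
# graph, minimizing the number of conflicts.
from itertools import product

conflict_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # pairs closer than min_s
num_features = 4

def count_conflicts(coloring):
    """A conflict arises when two features within min_s share a mask."""
    return sum(1 for i, j in conflict_edges if coloring[i] == coloring[j])

# Brute-force over all 3-mask assignments; practical decomposers use
# ILP or SDP instead (Sect. 2.2.2).
best = min(product(range(3), repeat=num_features), key=count_conflicts)
print(best, count_conflicts(best))
```

For this graph a conflict-free assignment exists with three masks, while two masks (double patterning) would leave a native conflict in the triangle {0, 1, 2}.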
Challenges for New Design Methodology With the widening manufacturing gap, even the most advanced resolution enhancement techniques cannot guarantee lithography-friendly design. Therefore, increased cooperation from physical design is a must.
Challenges for the EBL Illumination System The conventional type of EBL system is variable shaped beam (VSB). As illustrated in Fig. 1.4, in VSB mode the layout is decomposed into a set of rectangles, and each rectangle is shot into the resist sequentially by a dose of electrons. The total processing time of an EBL system increases with the number of beam shots. Even with decades of development, the key limitation of the EBL system has been, and still is, its low throughput [14]. Therefore, how to improve the throughput of the EBL system remains an open question.
Note that although other advanced lithography techniques are not discussed in this book, all of them suffer from different technical barriers. For instance, EUV is challenged by issues such as the lack of power sources, resists, and defect-free masks [15, 16]. Directed self-assembly (DSA) uses the phase separation of block copolymers to construct nanostructures. However, so far this technique can only be used to generate contact or via layer patterns [17].
In this book, we present our research results on design for manufacturability (DFM) for MPL and EBL [2, 7, 8, 18–22]. Figure 1.5 shows the typical design flow and the proposed research in the corresponding design stages. The goal of this book is to address three DFM challenges in advanced lithography: new patterning techniques, new design methodology, and the new EBL system.
Fig. 1.5 The proposed DFM techniques in their corresponding design stages
References
1. Pan, D.Z., Yu, B., Gao, J.-R.: Design for manufacturing with emerging nanolithography. IEEE
Trans. Comput. Aided Des. Integr. Circuits Syst. 32(10), 1453–1472 (2013)
2. Yu, B., Gao, J.-R., Ding, D., Ban, Y., Yang, J.-S., Yuan, K., Cho, M., Pan, D.Z.: Dealing with IC
manufacturability in extreme scaling. In: IEEE/ACM International Conference on Computer-
Aided Design (ICCAD), pp. 240–242 (2012)
3. Mack, C.: Fundamental Principles of Optical Lithography: The Science of Microfabrication.
Wiley, Chichester (2008)
4. Wong, A.K.-K.: Resolution Enhancement Techniques in Optical Lithography, vol. 47. SPIE
Press (2001). DOI:10.1117/3.401208
5. Lin, B.J.: Successors of ArF water-immersion lithography: EUV lithography, multi-e-beam
maskless lithography, or nanoimprint? J. Micro/Nanolithog. MEMS MOEMS 7(4), 040101
(2008)
6. Kahng, A.B., Park, C.-H., Xu, X., Yao, H.: Layout decomposition for double patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 465–472 (2008)
7. Yu, B., Yuan, K., Zhang, B., Ding, D., Pan, D.Z.: Layout decomposition for triple patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 1–8 (2011)
8. Yu, B., Pan, D.Z.: Layout decomposition for quadruple patterning lithography and beyond. In:
ACM/IEEE Design Automation Conference (DAC), pp. 53:1–53:6 (2014)
9. Zhang, H., Du, Y., Wong, M.D., Topaloglu, R.: Self-aligned double patterning decomposition
for overlay minimization and hot spot detection. In: ACM/IEEE Design Automation Confer-
ence (DAC), pp. 71–76 (2011)
10. Lucas, K., Cork, C., Yu, B., Luk-Pat, G., Painter, B., Pan, D.Z.: Implications of triple patterning
for 14 nm node design and patterning. In: Proceedings of SPIE, vol. 8327 (2012)
11. Bakshi, V.: EUV Lithography, vol. 178. SPIE Press (2009). DOI:10.1117/3.769214
12. Pain, L., Jurdit, M., Todeschini, J., Manakli, S., Icard, B., Minghetti, B., Bervin, G., Beverina,
A., Leverd, F., Broekaart, M., Gouraud, P., Jonghe, V.D., Brun, P., Denorme, S., Boeuf, F.,
Wang, V., Henry, D.: Electron beam direct write lithography flexibility for ASIC manufacturing
an opportunity for cost reduction. In: Proceedings of SPIE, vol. 5751 (2005)
13. Pfeiffer, H.C.: New prospects for electron beams as tools for semiconductor lithography. In:
Proceedings of SPIE, vol. 7378 (2009)
14. Yuan, K., Yu, B., Pan, D.Z.: E-beam lithography stencil planning and optimization with
overlapped characters. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 31(2), 167–179
(2012)
15. Arisawa, Y., Aoyama, H., Uno, T., Tanaka, T.: EUV flare correction for the half-pitch 22nm
node. In: Proceedings of SPIE, vol. 7636 (2010)
16. Wagner, C., Harned, N.: EUV lithography: lithography gets extreme. Nat. Photon. 4(1), 24–26
(2010)
17. Chang, L.-W., Bao, X., Bencher, C., Wong, H.-S.P.: Experimental demonstration of aperiodic
patterns of directed self-assembly by block copolymer lithography for random logic circuit
layout. In: IEEE International Electron Devices Meeting (IEDM), pp. 33.2.1–33.2.4 (2010)
18. Yu, B., Lin, Y.-H., Luk-Pat, G., Ding, D., Lucas, K., Pan, D.Z.: A high-performance triple
patterning layout decomposer with balanced density. In: IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), pp. 163–169 (2013)
19. Yu, B., Gao, J.-R., Pan, D.Z.: Triple patterning lithography (TPL) layout decomposition using
end-cutting. In: Proceedings of SPIE, vol. 8684 (2013)
20. Yu, B., Xu, X., Gao, J.-R., Pan, D.Z.: Methodology for standard cell compliance and detailed
placement for triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 349–356 (2013)
21. Yu, B., Yuan, K., Gao, J.-R., Pan, D.Z.: E-BLOW: e-beam lithography overlapping aware
stencil planning for MCC system. In: ACM/IEEE Design Automation Conference (DAC),
pp. 70:1–70:7 (2013)
22. Yu, B., Gao, J.-R., Pan, D.Z.: L-shape based layout fracturing for e-beam lithography. In:
IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), pp. 249–254
(2013)
Chapter 2
Layout Decomposition for Triple Patterning
2.1 Introduction
Fig. 2.2 (a) In double patterning, even stitch insertion cannot avoid the native conflicts. (b) Native
conflict in double patterning might be resolved by triple patterning
Zhang et al. [18], and Chen et al. [20] proposed different heuristic methods for
the TPLD problem. For row-based layout design, [15, 17] presented polynomial-time decomposition algorithms.
Fig. 2.3 Layout graph construction and decomposition graph construction. (a) Layout graph for the given input, where all edges are conflict edges; (b) the vertex projection; (c) the corresponding decomposition graph, where dashed edges are stitch edges
Based on the projection result, all the legal splitting locations are computed. Then a
decomposition graph [22] is constructed by Definition 2.2.
Definition 2.2 (Decomposition Graph). A decomposition graph (DG) is an undirected graph with a single set of vertices V and two sets of edges, CE and SE, which contain the conflict edges and stitch edges, respectively. V has one or more vertices for each polygonal shape, and each vertex is associated with a polygonal shape. An edge is in CE iff its two vertices are within the minimum coloring distance min_s. An edge is in SE iff its two vertices are associated with the same polygonal shape, necessitating a stitch.
An example of a decomposition graph (DG) is shown in Fig. 2.3c. Note that
the conflict edges are marked with black lines, while stitch edges are marked with
dashed lines. Here each stitch edge is a stitch candidate.
Stitch candidate generation is one of the most important steps in parsing a layout,
as it not only determines the number of vertices in the decomposition graph, but
also affects the decomposition result. We use DP candidates to represent the stitch
candidates generated by all previous double patterning research. Kahng et al. [1] and
Xu and Chu [3] propose different methodologies for generating the DP candidates. In this section, we show that DP candidates may be redundant or may miss some useful candidates, and that they cannot be directly applied to the TPLD problem. We therefore provide a procedure to generate appropriate stitch candidates for triple patterning lithography.
Here, we provide two examples demonstrating that DP candidates are not
appropriate for triple patterning. First, because of an extra color choice, some DP
candidates may be redundant. As shown in Fig. 2.4a, the stitch can be removed
because no matter what color is assigned to features b and c, feature a can always
be assigned a legal color. We denote this kind of stitch as a redundant stitch. After
removing these redundant stitches, some extra vertices in the decomposition graph
can be merged. In this way, we can reduce the problem size. Besides, DP candidates
may cause the stitch loss problem, i.e., some useful stitch candidates cannot be
detected and inserted in layout decomposition. In DPL, stitch candidates have just
one requirement: they cannot intersect any projection. For example, as shown in
Fig. 2.4b, because this stitch intersects with the projection of feature b, it cannot be
a DP candidate. However, if features b, c, and d are assigned three different colors, introducing the stitch is required to resolve the conflict. In other words, the requirement for stitches in DPL limits the ability of stitches to resolve triple patterning conflicts and may result in unnoticed conflicts. We denote the useful stitches forbidden by the DPL requirement as lost stitches.
Given the projection results, we propose a new process for stitch candidate
generation. Compared with the DP candidates, our methodology can remove
some redundant stitches and systematically solve the stitch loss problem. We
define the projection sequence as follows:
Definition 2.3 (Projection Sequence). After the projection, the feature is divided
into several segments, each of which is labeled with a number representing how
many other features are projected onto it. The sequence of numbers on these
segments is the projection sequence.
Instead of analyzing each feature and all of its respective neighboring features,
we can directly carry out stitch candidate generation based on the projection
sequence. For convenience, we provide a terminal zero rule, i.e., the beginning and
the end of the projection sequence must be 0. To maintain this rule, sometimes a
default 0 needs to be added. An example of a projection sequence is shown in Fig. 2.5, where the middle feature has five conflicting features, b, c, d, e, and f. Based on the projection results, the feature is divided into ten segments. After labeling each segment, we get its projection sequence: 01212101010. Here, a default 0 is added at the end of the feature.
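The labeling procedure can be sketched in a few lines: cut the feature at every projected interval endpoint, count how many intervals cover each segment, and apply the terminal-zero rule. The feature span and neighbor intervals below are hypothetical, not the geometry of Fig. 2.5.

```python
# Sketch (hypothetical geometry): computing a projection sequence.
# The feature spans [lo, hi); each conflicting neighbor projects an
# interval onto it. Segments are delimited by interval endpoints and
# labeled with the number of intervals covering them.

def projection_sequence(feature, intervals):
    lo, hi = feature
    pts = {lo, hi}
    pts.update(p for a, b in intervals for p in (a, b) if lo <= p <= hi)
    cuts = sorted(pts)
    labels = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        labels.append(sum(1 for a, b in intervals if a <= mid < b))
    # Terminal-zero rule: the sequence must begin and end with 0,
    # adding a default 0 where necessary.
    if labels[0] != 0:
        labels.insert(0, 0)
    if labels[-1] != 0:
        labels.append(0)
    return "".join(map(str, labels))

print(projection_sequence((0, 10), [(1, 4), (3, 6), (8, 9)]))  # -> "0121010"
```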
Based on the definition of the projection sequence, we summarize the rules for redundant stitches and lost stitches. First, motivated by the case in Fig. 2.4a, we characterize the redundant stitches as follows: if the projection sequence begins with “01010,” then the first stitch in the DP candidates is redundant. Since the projection of a feature can be symmetric, if the projection sequence ends with “01010,” then the last stitch candidate is also redundant. Second, the rule for lost stitches is as follows: if a projection sequence contains a sub-sequence “xyz,” where x, y, z > 0, x > y, and z > y, then there is one lost stitch at the segment labeled y. For example, the stitch candidate in Fig. 2.4b is contained in the sub-sequence “212,” so it is a lost stitch.
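Both rules operate purely on the projection sequence, so they can be sketched directly over a digit string (single-digit labels assumed for simplicity; the example sequences are ours).

```python
# Sketch of the two rules above over a projection sequence string.

def redundant_stitches(seq: str):
    """Rule 1: if the sequence begins (ends) with "01010", the first
    (last) DP stitch candidate is redundant. Returns (first, last)."""
    return seq.startswith("01010"), seq.endswith("01010")

def lost_stitch_positions(seq: str):
    """Rule 2: a sub-sequence x y z with x, y, z > 0, x > y, and z > y
    hides a lost stitch at the segment labeled y."""
    pos = []
    for i in range(len(seq) - 2):
        x, y, z = int(seq[i]), int(seq[i + 1]), int(seq[i + 2])
        if y > 0 and x > y and z > y:
            pos.append(i + 1)  # index of the middle segment
    return pos

print(redundant_stitches("01212101010"), lost_stitch_positions("01212101010"))
```

On the sequence 01212101010 from Fig. 2.5, this flags the trailing 01010 (last DP candidate redundant) and the 212 sub-sequence (one lost stitch), consistent with the rules stated above.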
The details of stitch candidate generation for TPL are shown in Algorithm 1. If necessary, each multiple-pin feature is first decomposed into several two-pin features. For each feature, we then calculate its projection sequence. We remove the redundant stitches by checking whether the projection sequence begins or ends with “01010”. Next, we search for and insert stitches, including the lost stitches. Here we define a sequence bunch: a sequence bunch is a sub-sequence of a projection sequence that contains at least three non-zero segments.
An example of stitch candidate generation is shown in Fig. 2.6. In DP candidate generation, two stitch candidates are generated (stitch 2 and stitch 3). Through our stitch candidate generation, stitch 3 is labeled as a redundant stitch. In addition, stitch 1 is identified as a lost stitch candidate because it is located in a sub-sequence “212”. Therefore, stitch 1 and stitch 2 are chosen as the stitch candidates for TPL.
Problem Formulation
Fig. 2.7 Reducing PG3C to TPLD. (a) An instance of PG3C; (b) the transferred orthogonal
drawing; (c) the corresponding TPLD instance
can be constructed in polynomial time [25], the whole reduction can be finished in
polynomial time. Minimizing the number of conflicts in the original PG3C instance is thus equivalent to minimizing the number of conflicts in the constructed TPLD instance, which completes the proof.
2.2.2 Algorithms
The overall decomposition flow is illustrated in Fig. 2.8. First, we construct the layout graph to translate the original layout into a graph representation. Two graph division techniques are applied to the layout graph: independent component computation (ICC) and iterative vertex removal (IVR). Second, after vertex projection, we transform the layout graph into a decomposition graph and apply two further graph division methods: bridge edge detection/removal and bridge vertex detection/duplication. Third, after these graph-based techniques, the decomposition graph is divided into a set of components. To solve the color assignment for each DG component, we propose two approaches. One is based on ILP, which can solve the problem exactly but may suffer from runtime overhead. The other is a semidefinite programming (SDP) based algorithm: instead of using ILP, we formulate the problem as a vector programming problem, whose relaxed version can be solved through SDP. After a mapping stage, the SDP solution is translated into a color assignment solution. Finally, we merge all DG components together to obtain the final TPLD result.
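The ICC step amounts to finding the connected components of the layout graph, since features with no conflict path between them can be colored independently. The union-find sketch below is one possible realization (not necessarily the book's implementation), on a hypothetical edge list.

```python
# Sketch of independent component computation (ICC): split the layout
# graph into connected components that can be colored independently.

def components(n, edges):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two components
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Two independent pieces {0,1,2} and {3,4}; vertex 5 is isolated.
print(components(6, [(0, 1), (1, 2), (3, 4)]))  # -> [[0, 1, 2], [3, 4], [5]]
```

Each component is then handed to the color assignment step on its own, which keeps the ILP/SDP problem sizes small.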
[Fig. 2.8 The overall decomposition flow: input layout → LG construction and division (ICC, IVR) → stitch candidate generation → DG construction and division (bridge edge detection, bridge vertex detection) → ILP- or SDP-based color assignment on each DG component → output masks]
    min  Σ_{e_ij ∈ CE} c_ij + α Σ_{e_ij ∈ SE} s_ij    (2.1)
In Eq. (2.1), x_i is a variable representing the color of rectangle r_i, c_ij is a binary variable for each conflict edge e_ij ∈ CE, and s_ij is a binary variable for each stitch edge e_ij ∈ SE. Constraint (2.1a) is used to evaluate the number of conflicts: a conflict arises when two vertices r_i and r_j are assigned the same color (mask). Constraint (2.1b) is used to calculate the number of stitches: if vertices r_i and r_j are assigned different colors (masks), stitch s_ij is introduced.
We now show how to implement (2.1) with ILP. Note that Eqs. (2.1a) and (2.1b) can be linearized only when x_i is a 0–1 variable [1], which makes it difficult to represent three different colors. To handle this problem, we represent the color of each vertex using two 1-bit 0–1 variables, x_i1 and x_i2. In order to limit the number of colors for each vertex to three, for each pair (x_i1, x_i2) the value (1, 1) is not permitted. In other words, only the values (0, 0), (0, 1), and (1, 0) are allowed. Thus, (2.1) can be formulated as (2.2).
    min  Σ_{e_ij ∈ CE} c_ij + α Σ_{e_ij ∈ SE} s_ij    (2.2)
The objective function is the same as in (2.1): both minimize the weighted sum of the number of conflicts and stitches. Constraint (2.2a) limits the number of colors for each vertex to three; in other words, only the three bit-pairs (0, 0), (0, 1), and (1, 0) are legal.
Constraints (2.2b)–(2.2f) are equivalent to constraint (2.1a), where the 0–1 variable c_ij1 indicates whether x_i1 equals x_j1, and c_ij2 indicates whether x_i2 equals x_j2. The 0–1 variable c_ij is true only if the two vertices connected by conflict edge e_ij have the same color, i.e., both c_ij1 and c_ij2 are true.
Constraints (2.2g)–(2.2k) are equivalent to constraint (2.1b). The 0–1 variable s_ij1 indicates whether x_i1 differs from x_j1, and s_ij2 indicates whether x_i2 differs from x_j2. Stitch s_ij is true if either s_ij1 or s_ij2 is true.
Although the ILP formulation (2.2) can solve the color assignment problem optimally in theory, for practical designs it may suffer from runtime overhead. In this section, we show that instead of expensive ILP, the color assignment can also be formulated as vector programming, with three unit vectors representing the three different colors. The vector programming problem is then relaxed and solved through SDP. Given the SDP solutions, we develop a mapping process to obtain the final color assignment. Note that both the SDP formulation and the mapping process can be finished in polynomial time.
There are three possible colors in the color assignment. We set a unit vector v⃗_i for every vertex v_i. If e_ij is a conflict edge, we want the vectors v⃗_i and v⃗_j to be far apart. If e_ij is a stitch edge, we want v⃗_i and v⃗_j to be the same. As shown in Fig. 2.9, we associate all the vertices with three different unit vectors: (1, 0), (−1/2, √3/2), and (−1/2, −√3/2). Note that the angle between any two vectors of the same color is 0, while the angle between vectors of different colors is 2π/3.
Additionally, we define the inner product of two $m$-dimensional vectors $\vec{v}_i$ and $\vec{v}_j$
as follows:

$$\vec{v}_i \cdot \vec{v}_j = \sum_{k=1}^{m} v_{ik} v_{jk}$$
Based on the above property, we can formulate the color assignment as the
following vector program [26]:

$$\min \sum_{e_{ij} \in CE} \frac{2}{3}\left(\vec{v}_i \cdot \vec{v}_j + \frac{1}{2}\right) + \sum_{e_{ij} \in SE} \frac{2\alpha}{3}\left(1 - \vec{v}_i \cdot \vec{v}_j\right) \qquad (2.3)$$
$$\text{s.t.}\quad \vec{v}_i \in \left\{(1, 0),\ \left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right),\ \left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)\right\} \qquad (2.3a)$$
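The two cost terms of this vector program act as indicator functions, which can be checked numerically. The names `VECS` and `dot` below are illustrative, not from the chapter; the check verifies that the conflict term equals 1 exactly for same-color pairs and that the stitch term (before scaling by $\alpha$) equals 1 exactly for different-color pairs.

```python
import math

# The three unit vectors of (2.3a), one per color.
VECS = [(1.0, 0.0),
        (-0.5, math.sqrt(3) / 2),
        (-0.5, -math.sqrt(3) / 2)]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Same color: inner product 1; different colors: inner product -1/2.
for u in VECS:
    for v in VECS:
        conflict_term = (2 / 3) * (dot(u, v) + 0.5)  # 1 if same, 0 if different
        stitch_term = (2 / 3) * (1 - dot(u, v))      # 0 if same, 1 if different
        same = (u == v)
        assert abs(conflict_term - (1.0 if same else 0.0)) < 1e-9
        assert abs(stitch_term - (0.0 if same else 1.0)) < 1e-9
print("cost terms behave as indicator functions")
```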
Formula (2.3) is equivalent to formula (2.1): the left part is the total cost
of the conflicts, and the right part is the total cost of the stitches. Since the TPLD
problem is NP-hard, this vector program is also NP-hard. In the next part, we
will relax (2.3) to an SDP, which can be solved in polynomial time.
Constraint (2.3a) requires the solutions of (2.3) to be discrete. After removing this
constraint, we obtain formula (2.4):
$$\min \sum_{e_{ij} \in CE} \frac{2}{3}\left(\vec{y}_i \cdot \vec{y}_j + \frac{1}{2}\right) + \sum_{e_{ij} \in SE} \frac{2\alpha}{3}\left(1 - \vec{y}_i \cdot \vec{y}_j\right) \qquad (2.4)$$
$$\text{s.t.}\quad \vec{y}_i \cdot \vec{y}_i = 1, \quad \forall v_i \in V \qquad (2.4a)$$
$$\vec{y}_i \cdot \vec{y}_j \ge -\frac{1}{2}, \quad \forall v_i, v_j \in V \qquad (2.4b)$$
This formula is a relaxation of (2.3), since any feasible solution $\vec{v}_i =
(v_{i1}, v_{i2})$ of (2.3) yields a feasible solution of (2.4) by setting $\vec{y}_i = (v_{i1}, v_{i2}, 0, 0, \ldots, 0)$;
in this solution, $\vec{y}_i \cdot \vec{y}_i = 1$ and $\vec{y}_i \cdot \vec{y}_j = \vec{v}_i \cdot \vec{v}_j$. Here, the dimension of vector $\vec{y}_i$ is
$|V|$, the number of vertices in the current DG component. If $Z_R$ is the value of
an optimal solution of formula (2.4), and OPT is the optimal value of formula (2.3),
then $Z_R \le \mathrm{OPT}$. In other words, the solution of (2.4) provides a lower
bound on that of (2.3). After removing the constant terms from the
objective function, we derive the following vector program:
$$\min \sum_{e_{ij} \in CE} (\vec{y}_i \cdot \vec{y}_j) - \alpha \sum_{e_{ij} \in SE} (\vec{y}_i \cdot \vec{y}_j) \qquad (2.5)$$
$$\text{s.t.}\quad (2.4a)\text{–}(2.4b)$$
Without the discrete constraint (2.3a), programs (2.4) and (2.5) are no longer
NP-hard. To solve (2.5) in polynomial time, we will show that it is equivalent
to an SDP problem. SDP is similar to LP in that both have a linear objective
function and linear constraints; in addition, SDP can constrain a square symmetric matrix
of variables to be positive semidefinite. Although semidefinite programs are
more general than linear programs, both can be solved in polynomial time.
Relaxations based on SDP actually have better theoretical results than those based
on LP [27].
Consider the following standard SDP:

$$\min \quad A \bullet X \qquad (2.6)$$
$$\text{s.t.}\quad x_{ii} = 1, \quad \forall i \in V \qquad (2.6a)$$
$$x_{ij} \ge -\frac{1}{2}, \quad \forall i \ne j \qquad (2.6b)$$
$$X \succeq 0 \qquad (2.6c)$$

where $A \bullet X = \sum_{i,j} A_{ij} x_{ij}$. Constraint (2.6c) means matrix $X$ should be positive semidefinite, and $x_{ij}$ is the
entry in the $i$th row and $j$th column of $X$. Note that the solution of the SDP is represented as
a positive semidefinite matrix $X$, while the solutions of the relaxed vector program
are stored in a list of vectors. However, we can show that they are equivalent.
Lemma 2.2. A symmetric matrix $X$ is positive semidefinite if and only if $X = VV^T$
for some matrix $V$.
Given a positive semidefinite matrix $X$, we can use the Cholesky decomposition
to find the corresponding matrix $V$ in $O(n^3)$ time.
Theorem 2.2. The semidefinite program (2.6) and the vector program (2.5) are
equivalent.
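The factorization in Lemma 2.2 can be sketched with a minimal pure-Python Cholesky routine. This is an assumption-laden illustration: it handles only strictly positive definite matrices, while the semidefinite case would need pivoting or an eigendecomposition.

```python
def cholesky(X):
    """Return a lower-triangular V with X = V V^T, for a symmetric,
    strictly positive definite X (a sketch of the decomposition used
    in Lemma 2.2; not robust for the general semidefinite case)."""
    n = len(X)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(V[i][k] * V[j][k] for k in range(j))
            if i == j:
                V[i][j] = (X[i][i] - s) ** 0.5
            else:
                V[i][j] = (X[i][j] - s) / V[j][j]
    return V

X = [[4.0, 2.0], [2.0, 3.0]]
V = cholesky(X)
# Verify the reconstruction X == V V^T entrywise.
for i in range(2):
    for j in range(2):
        assert abs(sum(V[i][k] * V[j][k] for k in range(2)) - X[i][j]) < 1e-9
```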
Proof. Given solutions $\{\vec{y}_1, \vec{y}_2, \ldots, \vec{y}_m\}$ of (2.5), the corresponding matrix $X$ is
defined by $x_{ij} = \vec{y}_i \cdot \vec{y}_j$. In the other direction, based on Lemma 2.2, given a matrix
$X$ from (2.6), we can find a matrix $V$ satisfying $X = VV^T$ by the Cholesky
decomposition. The rows of $V$ are vectors $\{\vec{v}_i\}$ that form the solutions of (2.5).
After solving the SDP formulation (2.6), we get a set of continuous solutions in
matrix $X$. Since each value $x_{ij}$ in matrix $X$ corresponds to $\vec{y}_i \cdot \vec{y}_j$, and $\vec{y}_i \cdot \vec{y}_j$ is an
approximation of $\vec{v}_i \cdot \vec{v}_j$ in (2.3), we can conclude that $x_{ij}$ is
an approximation of $\vec{v}_i \cdot \vec{v}_j$. Instead of trying to calculate all $\vec{v}_i$ through the Cholesky
decomposition, we pay attention to the $x_{ij}$ values themselves. Basically, if $x_{ij}$ is close to 1,
vertices $i$ and $j$ tend to be in the same color; if $x_{ij}$ is close to $-0.5$, vertices $i$ and
$j$ tend to be in different colors.
For most cases, SDP provides reasonable solutions in which each $x_{ij}$ is
either close to 1 or close to $-0.5$. A decomposition graph example is illustrated in
Fig. 2.10. It contains seven conflict edges and one stitch edge. Moreover, the graph
is not 2-colorable, since it contains several odd cycles. To solve the corresponding
color assignment through the SDP formulation, we construct matrix $A$ as in Eq. (2.7):

$$A = \begin{pmatrix}
0 & 1 & 1 & -0.1 & 1 \\
1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 \\
-0.1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0
\end{pmatrix} \qquad (2.7)$$
Note that here we set $\alpha$ to 0.1. After solving the SDP (2.6), we get a matrix $X$
as follows:

$$X = \begin{pmatrix}
1.0 & -0.5 & -0.5 & 1.0 & -0.5 \\
 & 1.0 & -0.5 & -0.5 & -0.5 \\
 & & 1.0 & -0.5 & 1.0 \\
 & & & 1.0 & -0.5 \\
 & & & & 1.0
\end{pmatrix}$$
Here we only list the upper part of the matrix $X$. Because $x_{14}$ is 1.0, vertices 1 and
4 should be in the same color. Similarly, vertices 3 and 5 should also be in the same
color. In addition, because all the other values are $-0.5$, we know that no more vertices
can share a color. Thus the final color assignment result for this example is
shown in Fig. 2.10b.
The matrix $X$ generated from Fig. 2.10 is an ideal case, that is, all values are
either 1 or $-0.5$, so from $X$ we can derive the final color assignment easily.
Our preliminary results show that with reasonable thresholds such as $0.9 < x_{ij} \le 1$
for the same mask, and $-0.5 \le x_{ij} < -0.4$ for different masks, more than 80% of
vertices can be decided by the global SDP optimization. However, for practical
layouts, especially those containing conflicts and stitches, some values in the matrix
$X$ may not be so clear.
We use Fig. 2.11 for illustration. The decomposition graph in Fig. 2.11a contains
a four-clique structure $\{1, 3, 4, 5\}$, so at least one conflict will be reported.
By solving the SDP formulation (2.6), we obtain matrix $X$ as in (2.8):

$$X = \begin{pmatrix}
1.0 & -0.5 & -0.13 & -0.5 & -0.13 \\
 & 1.0 & -0.5 & 1.0 & -0.5 \\
 & & 1.0 & -0.5 & -0.13 \\
 & & & 1.0 & -0.5 \\
 & & & & 1.0
\end{pmatrix} \qquad (2.8)$$
From $X$, we can see that $x_{24} = 1.0$; vertices 2 and 4 should therefore share the
same color. It is not as clear-cut for $x_{13}$, $x_{15}$, and $x_{35}$, which are all $-0.13$. For those
vague values, we propose a mapping process to determine the final color assignment.
The mapping algorithm is as follows. All $x_{ij}$ values in matrix $X$ are divided into
two types: clear and vague. If $x_{ij}$ is close to 1 or $-0.5$, it is denoted a clear
value; otherwise, it is a vague value. The mapping uses all the $x_{ij}$ values as a
guideline to generate the final decomposition results, even when some $x_{ij}$s
are vague.
The details of the mapping are shown in Algorithm 2. Given the solutions from
program (2.6), triplets are constructed from the matrix values and sorted by $x_{ij}$ (lines
1–2). Our mapping can then be divided into two steps. In the first stage (lines 3–8),
if $x_{ij}$ is close to 1 or $-0.5$, the relationship between vertices $r_i$ and $r_j$ can be directly
determined. Here $th_{un}$ and $th_{sp}$ are user-defined threshold values, where $th_{un}$ should
be close to 1, and $th_{sp}$ should be close to $-0.5$. If $x_{ij} > th_{un}$, implying that $x_{ij}$ is
close to 1, then we apply operation Union(i, j) to merge vertices $r_i$ and $r_j$ into a
larger vertex (i.e., they are in the same color). Similarly, if $x_{ij} < th_{sp}$, implying that
$x_{ij}$ is close to $-0.5$, then operation Separate(i, j) is used to label vertices $r_i$ and $r_j$
incompatible. If $r_i$ and $r_j$ are incompatible, or $r_i$ is incompatible with any vertex
in $r_j$'s group, vertices $r_i$ and $r_j$ cannot be assigned the same color, and function
Compatible(i, j) will return false. In the second step (lines 9–12), we continue to
union the vertices $i$ and $j$ with the largest remaining $x_{ij}$, until all vertices are assigned
to three colors.
We use the disjoint-set data structure to group vertices into three colors.
Implemented with union by rank and path compression, the running time per
disjoint-set operation is amortized (nearly) constant [28]. Let $n$ be the number
of vertices; then the number of triplets is $O(n^2)$. Sorting all the triplets requires
$O(n^2 \log n)$ time. Since all triplets are sorted, each of them is visited at most once,
and each operation finishes in almost constant time, so
the complexity of Algorithm 2 is $O(n^2 \log n)$. Applying Algorithm 2 to the matrix
in (2.8), we get the final color assignment (see Fig. 2.11b), where one conflict
between vertices 3 and 5 is reported.
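A minimal sketch of the disjoint-set structure used by the mapping, with union by rank and path compression (only the Union side of Algorithm 2; the incompatibility bookkeeping behind Separate/Compatible is omitted, and the vertex indices in the usage example are illustrative):

```python
class DisjointSet:
    """Union-find with union by rank and path compression; each
    operation runs in near-constant amortized time."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path compression
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.rank[ri] < self.rank[rj]:  # union by rank
            ri, rj = rj, ri
        self.parent[rj] = ri
        if self.rank[ri] == self.rank[rj]:
            self.rank[ri] += 1

ds = DisjointSet(5)
ds.union(0, 3)   # x_14 close to 1: vertices 1 and 4 share a mask
ds.union(2, 4)   # x_35 close to 1: vertices 3 and 5 share a mask
assert ds.find(0) == ds.find(3) and ds.find(2) == ds.find(4)
assert ds.find(0) != ds.find(2)
```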
Graph Division
Since the color assignments of different connected components are independent
of each other, we partition the initial large layout
graph into several small ones. After solving the TPLD problem for each isolated
component, the overall solution can be taken as the union of the solutions of all
components without affecting global optimality. It should be noted that ICC is a well-known
technique that has been applied in many previous studies.
We can further simplify the layout graph by iteratively removing all vertices with
degree less than or equal to two. This second technique, iterative vertex removal (IVR),
is described in Algorithm 3. Initially, all vertices with degree no more than two are
detected and temporarily removed from the layout graph. After each vertex removal, we
update the degrees of the other vertices. This removal process continues until
every remaining vertex has degree at least three. All the vertices that are temporarily
removed are stored in stack S. The decomposition graph is then constructed for the
remaining vertices. After solving the color assignment for each DG component, the
removed vertices are recovered one by one.
If all the vertices in one layout graph can be temporarily removed (pushed onto
the stack S), the TPLD problem can be solved optimally in linear time. An example
is illustrated in Fig. 2.12, where all the vertices can finally be pushed onto the stack.
Even though some vertices still remain, our IVR technique can shrink the problem
size dramatically. We observe that this technique can also further partition the layout
graph into several independent components.
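The peel-and-recover idea of IVR can be sketched as follows. This is a simplified stand-in for Algorithm 3, under an assumption stated in the comments: every vertex eventually reaches degree ≤ 2 (as in Fig. 2.12), so no ILP/SDP call is needed for a residual graph; function and variable names are ours.

```python
from collections import defaultdict

def ivr_three_color(edges, vertices):
    """IVR sketch: peel vertices of degree <= 2 onto a stack, then pop
    them back and give each a color unused by its already-colored
    neighbors. With three colors, a vertex of degree <= 2 at removal
    time can always be colored on recovery."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    active = set(vertices)
    stack = []
    changed = True
    while changed:
        changed = False
        for v in list(active):
            if sum(1 for u in adj[v] if u in active) <= 2:
                active.remove(v)
                stack.append(v)
                changed = True
    # Assumption: `active` is now empty; otherwise the remaining
    # vertices would be colored by the ILP/SDP engine first.
    color = {}
    while stack:                      # recovery in reverse removal order
        v = stack.pop()
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in (0, 1, 2) if c not in used)
    return color

# A 6-cycle: every vertex has degree 2, so IVR solves it in linear time.
cycle = [(i, (i + 1) % 6) for i in range(6)]
col = ivr_three_color(cycle, range(6))
assert all(col[u] != col[v] for u, v in cycle)
```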
On the layout graph simplified by ICC and IVR, projections are carried out to
calculate all the potential stitch positions. We then construct the decomposition
graph, which includes both the conflict edges from the layout graph and the stitch
edges. Here, the stitch edges are based on the projection result. Note that ICC can
still be applied here to partition a decomposition graph into several smaller ones.
We propose two new techniques to reduce the size of each decomposition graph. The
first is bridge edge detection and removal, and the second is bridge vertex detection
and duplication.
Fig. 2.12 An example of iterative vertex removal (IVR), where the TPLD problem can be solved
in linear time: (a) layout graph; (b–e) iteratively remove vertices whose degree is less than three
and push them onto the stack; (f–h) after assigning colors to the remaining vertices, iteratively pop
and recover the vertices, assigning any legal color; (i) the TPLD is finished after the iterative vertex
recovery
A bridge edge of a graph is an edge whose removal disconnects the graph into
two components. Removing the bridge edge can divide the whole problem into two
independent sub-problems.
An example of bridge edge detection is shown in Fig. 2.13. Conflict edge $e_{ab}$
is found to be a bridge edge. Removing the bridge divides the decomposition graph
into two sides. After layout decomposition of each component, if vertices $a$ and $b$
are assigned the same color, we can, without loss of generality, rotate the colors of all
vertices on the lower side. A similar method can be adopted when the bridge is a stitch
edge. We adopt an $O(|V| + |E|)$ algorithm [29] to detect all bridge edges in the
decomposition graph.
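A standard $O(|V| + |E|)$ bridge-finding routine based on DFS low-link values is sketched below. The chapter cites [29] for the algorithm it actually uses, so treat this as one possible realization (recursive for brevity; an iterative DFS is preferable for very large layouts, and the example edge list is hypothetical).

```python
def find_bridges(n, edges):
    """Return the bridge edges of an undirected graph: an edge (u, v) is
    a bridge iff no back edge from v's DFS subtree reaches u or above,
    i.e. low[v] > disc[u]."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * n
    low = [0] * n
    bridges = []
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] != -1:                  # back edge
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:           # no back edge over (u, v)
                    bridges.append(edges[idx])

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return bridges

# Two triangles joined by a single edge: that edge is the only bridge.
E = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
assert find_bridges(6, E) == [(2, 3)]
```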
A bridge vertex of a graph is a vertex whose removal disconnects the graph into
two or more components. Similar to bridge edge detection, we can further simplify
the decomposition graph by removing all the bridge vertices. An example of bridge
vertex computation is illustrated in Fig. 2.14. This simplification method is effective
Fig. 2.13 Bridge edge detection and removal. (a) Initial decomposition graph. (b) After bridge
edge detection, remove edge $e_{ab}$. (c) Carry out layout decomposition in the two components. (d)
Rotate the colors in the lower component to add the bridge edge back
Fig. 2.14 Bridge vertex detection and duplication. (a) Initial decomposition graph. (b) After
bridge vertex detection, duplicate vertex $a$. (c) Rotate the colors in the lower sub-graph to merge
vertices $a_1$ and $a_2$
because for standard cell layouts we can usually choose the power and ground lines
as the bridge vertices. In this way we can effectively partition the layout by rows.
All bridge vertices can be detected using an $O(|V| + |E|)$ search algorithm.
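Analogously, bridge vertices (articulation points) can be found with the same DFS low-link machinery. Again, this is a sketch of one standard realization, not necessarily the exact search the chapter uses; the test graph is hypothetical.

```python
def find_bridge_vertices(n, edges):
    """Return the articulation points ("bridge vertices") of an
    undirected graph via DFS low-link values: a non-root u is a cut
    vertex iff some child v has low[v] >= disc[u]; the root is one
    iff it has more than one DFS child."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [-1] * n
    low = [0] * n
    cut = set()
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if disc[v] != -1:                  # back edge
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= disc[u]:
                    cut.add(u)
        if parent == -1 and children > 1:
            cut.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return cut

# Two triangles sharing vertex 2: removing vertex 2 disconnects the graph.
E = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
assert find_bridge_vertices(5, E) == {2}
```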
Post Refinement
Although the graph division techniques can dramatically reduce the computational
time of the TPLD problem, Kuang and Young [16] pointed out that for some
cases IVR may lose optimality. One example is illustrated in Fig. 2.15. The
simplified layout graph (Fig. 2.15a) can be given stitch candidates and assigned
legal colors (see Fig. 2.15b). However, when we recover the removed vertices, the
degree of vertex $a$ increases to 3, and no color is available for it (see Fig. 2.15c). The
reason for this conflict is that vertex $a$ is not considered during stitch candidate
generation.
Fig. 2.15 Iterative vertex removal may introduce additional conflicts. (a) Layout graph after
iterative vertex removal; (b) stitch generation and color assignment on the graph; (c) after adding
back the simplified vertices, one additional conflict is introduced to vertex a
Fig. 2.16 An example of post-refinement. (a) Extend layout graph to include a; (b) stitch
generation and color assignment on the new graph; (c) no additional conflict at final solution
We implemented our algorithm in C++ and tested it on an Intel Core 2.9 GHz
Linux machine. We chose GUROBI [30] as the ILP solver and CSDP [31] as the
SDP solver. ISCAS benchmarks from [2, 4] were scaled down and modified for use
as our test cases. The metal-1 layer was used for the experiments, because
it is one of the most complex layers in terms of layout decomposition.
The minimum coloring distance $min_s$ was set to 120 for the first ten cases and 100 for
the last five, as in [8, 13, 14]. Parameter $\alpha$ was set to 0.1, so the decomposition
cost equals $cn\# + 0.1 \times st\#$, where $cn\#$ and $st\#$ denote the conflict number
and the stitch number, respectively.
Figure 2.17 illustrates part of the decomposition result for case S1488, which can
be decomposed in 0.1 s.
First, we demonstrate the effectiveness of our stitch candidate generation.
Table 2.2 compares the performance and runtime of ILP for two different stitch
candidates, i.e., DP stitch and TP stitch. “ILP w/o. TP stitch” and “ILP w. TP
stitch” apply DP stitch and TP stitch, respectively. Note that all graph division
techniques are applied here. The columns “st#” and “cn#” denote the number
of stitches and conflicts, respectively. Column “CPU(s)” is computational time
in seconds. As discussed in Sect. 2.2.1, applying TP stitch generates more stitch
candidates; we can see from Table 2.2 that this introduces about 30% more runtime.
However, TP stitch overcomes the lost-stitch problem present in DP
stitch, so the decomposition cost is reduced by 40%. In other words, compared
with DP stitch, TP stitch yields better results in terms of the numbers of
stitches and conflicts.
Second, we show the effectiveness of the graph division, which consists of a
set of techniques: ICC, IVR, and bridge detection. Through applying these division
techniques, the decomposition graph size can be reduced. Generally speaking, ILP
requires less runtime for a smaller decomposition graph. Table 2.3 compares the
performance and runtime of ILP on two different decomposition graphs. Here “ILP
w. ICC” means the decomposition graphs are only simplified by the ICC, while
“ILP w. 4SPD” means all the division techniques are used. Columns “TSE#” and
“TCE#” denote the total number of stitch edges and conflict edges, respectively.
From Table 2.3, we can see that, compared with only using the ICC technique, also
applying IVR and bridge detection is more effective: the number of stitch edges can
be reduced by 92 %, while the number of conflict edges can be reduced by 93 %.
The columns “st#” and “cn#” show the number of stitches and conflicts in the final
decomposition results. “CPU(s)” is computational time in seconds. Compared with
the “ILP w. ICC,” the “ILP w. 4SPD” can achieve the same results with much less
runtime for some smaller cases. For some large circuits, the runtime of “ILP w. ICC”
is unacceptable, i.e., longer than 2 h. Note that if no ICC technique is used, even for
small circuits like C432, the runtime of ILP is unacceptable.
Here, we discuss the SDP solutions in greater detail. As discussed before, if
the value $X_{ij}$ is close to 1 or $-0.5$, it can be directly rounded to a discrete decision;
otherwise, we have to rely on the mapping methods. Figure 2.18 illustrates the
$X_{ij}$ value distributions for circuits C499 and C6288. As we can see, for C499 all
the values are either in the range $[0.9, 1.0]$ or in $[-0.5, -0.4]$; in other words, here
SDP is effective and its results can be directly used as the final decomposition results.
For C6288, since its result contains several stitches and conflicts, some
$X_{ij}$ values are vague, but most of the values are still distinguishable.
We further demonstrate the effectiveness of this post-refinement. Table 2.4 lists
the decomposition results of two SDP-based algorithms, where the columns indicate
whether refinement was used. As shown in Table 2.4, by adding an additional post-
refinement stage, the decomposition costs can be reduced by 14 % at the expense of
only 6 % additional computational time.
Finally, we compare our decomposition algorithms with state-of-the-art layout
decomposers [13, 14], as shown in Table 2.5. Columns "ILP w. All" and
"SDP w. All" denote the ILP-based and SDP-based algorithms with all speed-up
techniques applied.
Fig. 2.18 Value distribution in matrix X for cases C499 and C6288
Compared to ILP, our SDP-based algorithm provides a much better tradeoff between
runtime and performance, achieving very comparable results (1% difference in
conflicts) with more than a 3$\times$ speed-up (Table 2.6).
In order to further evaluate the scalability of all the decomposers, we created six
additional benchmarks (“c5_total”–“c10_total”) to compare different algorithms on
very dense layouts. Tables 2.7 and 2.8 compare the results. As we can see, although
ILP can achieve the best decomposition results, its high runtime complexity makes
it unable to solve one of the large dense layouts, even when all the graph division
techniques are adopted. On the other hand, although the decomposers [13, 14] are
faster, finishing all the cases within 1 s, they introduce 92% and 87%
more decomposition cost, respectively. In some cases, they introduce hundreds more
conflicts than the SDP-based algorithm. Each conflict may require manual
layout modification or costly ECO efforts, which are very time consuming. Therefore,
we can see that for these dense layouts, the SDP-based algorithm can achieve a good
tradeoff in terms of runtime and performance.
Table 2.7 Performance comparisons with other decomposers on very dense layouts
DAC’12 [13] TCAD [14] ILP w. All SDP w. All
Circuit st# cn# Cost st# cn# Cost st# cn# Cost st# cn# Cost
c5_total 433 248 291.3 374 248 285.4 504 21 71.4 497 30 79.7
c6_total 954 522 617.4 869 521 607.9 N/A N/A N/A 1056 179 284.6
c7_total 1111 623 734.1 938 624 717.8 1113 283 394.3 1103 304 414.3
c8_total 1910 727 918 1671 727 894.1 1676 409 576.6 1636 433 596.6
c9_total 834 525 608.4 635 525 588.5 865 256 342.5 853 267 352.3
c10_total 2463 1144 1390.3 2120 1146 1358 2510 355 606 2472 403 650.2
Avg. 1284.2 631.5 759.9 1101.2 631.8 742.0 N/A N/A N/A 1269.5 269.3 396.3
Ratio – – 1.92 – – 1.87 – – – – – 1.0
2.3 Density Balanced Layout Decomposition for Triple Patterning 33
In layout decomposition, especially for TPL, density balance should be considered
along with conflict and stitch minimization. A good pattern density balance
helps mask CD and registration control [32], while unbalanced density causes
lithography hotspots as well as lowered CD uniformity due to irregular pitches [4].
However, from the algorithmic perspective, achieving a balanced density in TPL is
harder than in DPL. (1) In DPL, the two colors can be balanced more implicitly;
by contrast, in TPL, existing strategies often do DPL first and then "patch" with
the third mask, which makes it challenging to consider density balance explicitly.
(2) Due to the additional color, the solution space is much larger [12]. (3) To
reduce potential hotspots, local density balance should be considered instead of
global density balance, since
neighboring patterns are one of the main sources of hotspots. As shown in Fig. 2.19a,
b, when only the global density balance is considered, feature $a$ is assigned the
white color. Since two black features are then close to each other, a hotspot may be
introduced. To consider the local density balance, the layout is partitioned into four
bins $\{b_1, b_2, b_3, b_4\}$ (see Fig. 2.19c). Feature $a$ intersects bins $b_1$ and $b_2$; therefore
it is colored blue to maintain the local density balance of both bins (see Fig. 2.19d).

Fig. 2.19 Decomposed layout with (a, b) globally balanced density and (c, d) locally balanced
density in all bins
Problem Formulation
Given an input layout, the layout and decomposition graphs are constructed. Our
goal is to assign all vertices in the decomposition graph three colors (masks) to
minimize the number of stitches and conflicts, while keeping the density of every
bin as balanced as possible (measured by the density uniformity $DU_k$ of each bin $b_k$).
2.3.2 Algorithms
The overall flow of our density balanced TPL decomposer is illustrated in Fig. 2.20.
It consists of two stages: graph construction and simplification, and color assign-
ment. Given the input layout, layout graphs and decomposition graphs are con-
structed. Then, graph simplifications [8, 13] are applied to reduce the problem
size. Two additional graph simplification techniques are introduced. During stitch
candidate generation, the methods described in [16] are applied to search all stitch
candidates for TPL. In the second stage, for each decomposition graph, each vertex
is assigned one color. Before calling the SDP formulation, a fast color assignment
trial is proposed to achieve better speed-up (see Sect. 2.3.2).
Stitch candidate generation is an important step in parsing a layout. Fang
et al. [13] and Ghaida et al. [21] pointed out that the stitch candidates generated
by previous DPL works cannot be directly applied to TPL layout decomposition.
Therefore, we provide a procedure to generate appropriate stitch candidates for TPL.
The main idea is that after projection, each feature is divided into several segments,
each labeled with the number of other features projected onto it. If fewer than two
features are projected onto a segment, a stitch can be introduced there. Note that to
reduce the problem size, we restrict the maximum number of stitch candidates on
each feature.
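The segment-labeling idea can be sketched in one dimension. This is an assumed simplification: real features are polygons with two-dimensional projections, and the function name and interval inputs below are purely illustrative.

```python
def stitch_candidate_segments(feature, neighbors):
    """1-D sketch of TPL stitch-candidate generation: `feature` is an
    x-interval and `neighbors` are the x-projections of nearby
    conflicting features. The feature is cut at projection endpoints;
    a segment overlapped by fewer than two projections is a legal
    stitch region."""
    lo, hi = feature
    cuts = {lo, hi}
    for a, b in neighbors:
        for x in (a, b):
            if lo < x < hi:
                cuts.add(x)
    xs = sorted(cuts)
    segments = []
    for a, b in zip(xs, xs[1:]):
        mid = (a + b) / 2
        count = sum(1 for p, q in neighbors if p < mid < q)
        segments.append(((a, b), count, count < 2))
    return segments  # (segment, #projections, stitch allowed?)

# One feature overlapped by three neighbors; only the segments covered
# by fewer than two projections admit a stitch.
segs = stitch_candidate_segments((0, 10), [(0, 4), (6, 10), (3, 7)])
for (a, b), cnt, ok in segs:
    print((a, b), cnt, ok)
```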
Figure 2.21 illustrates the decomposition process step by step. Given the input
layout in Fig. 2.21a, we partition it into a set of bins $\{b_1, b_2, b_3, b_4\}$ (see
Fig. 2.21b). Then the layout graph is constructed (see Fig. 2.21c): each vertex
represents a polygonal feature (shape) in the input layout, and there is an edge
(conflict edge) between two vertices if and only if the two corresponding features
are within the minimum coloring distance $min_s$. During layout graph simplification,
the vertices whose degrees are less than or equal to two are iteratively removed. The
simplified layout graph, shown in Fig. 2.21d, contains only vertices $a$, $b$, $c$, and $d$;
Fig. 2.21d also shows the results of the projection. After stitch candidate
generation [16], there are two stitch candidates for TPL (see Fig. 2.21e). Based
on the two stitch candidates, vertices $a$ and $d$ are each divided into two vertices.
The constructed decomposition graph is given in Fig. 2.21f. It maintains all the
information about conflict edges and stitch
candidates, where the solid edges are conflict edges and the dashed edges are stitch
edges (the stitch candidates). In each decomposition graph, color assignment, which
consists of the SDP formulation and partition-based mapping, is carried out. During
color assignment, the six vertices in the decomposition graph are divided into three
groups: $\{a_1, c\}$, $\{b\}$, and $\{a_2, d_1, d_2\}$ (see Fig. 2.21g, h). Here one stitch on
feature $a$ is introduced. Figure 2.21i shows the final decomposed layout after
iteratively recovering the removed vertices. Our last step merges the decomposed
graphs; since this example has only one decomposition graph, this step is skipped.
Density balance, especially local density balance, is seamlessly integrated into
each step of our decomposition flow. In this section, we first elaborate how to
integrate the density balance into the mathematical formulation and corresponding
SDP formulation. Then, we discuss density balance in all other steps.
So $d_1 d_2 = \frac{2}{3}\sum_{i}\sum_{j} len_i\, len_j\,(1 - \vec{v}_i \cdot \vec{v}_j)$, where $\vec{v}_i = (1, 0)$ and $\vec{v}_j = \left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$.
We can calculate $d_1 d_3$ and $d_2 d_3$ using similar methods. Therefore,

$$DU_2 = d_1 d_2 + d_1 d_3 + d_2 d_3 = \frac{2}{3}\sum_{i,j \in V} len_i\, len_j\,(1 - \vec{v}_i \cdot \vec{v}_j)$$

Because of Lemma 2.3, $DU_k$ can be represented with vector inner products. Thus,
we have the following theorem:
Theorem 2.3. Maximizing $DU_k$ achieves better density balance in bin $b_k$.
Note that we can remove the constant term $\sum_{i,j \in V} den_{ki}\, den_{kj} \cdot 1$ from the $DU_k$ expression.
Similarly, we can eliminate the constants in the calculation of the conflict and stitch
numbers. The simplified vector expression is as follows:

$$\min \sum_{e_{ij} \in CE} (\vec{v}_i \cdot \vec{v}_j) - \alpha \sum_{e_{ij} \in SE} (\vec{v}_i \cdot \vec{v}_j) - \beta \sum_{b_k \in B} DU_k \qquad (2.10)$$
$$\text{s.t.}\quad DU_k = -\sum_{i,j \in V} den_{ki}\, den_{kj}\, (\vec{v}_i \cdot \vec{v}_j), \quad \forall b_k \in B \qquad (2.10a)$$
$$\vec{v}_i \in \left\{(1, 0),\ \left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right),\ \left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)\right\} \qquad (2.10b)$$
where $A_{ij}$ is the entry in the $i$th row and $j$th column of matrix $A$:

$$A_{ij} = \begin{cases}
1 + \beta \sum_{b_k \in B} den_{ki}\, den_{kj}, & e_{ij} \in CE \\
-\alpha + \beta \sum_{b_k \in B} den_{ki}\, den_{kj}, & e_{ij} \in SE \\
\beta \sum_{b_k \in B} den_{ki}\, den_{kj}, & \text{otherwise}
\end{cases}$$
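Assembling $A$ from the edge sets and per-bin densities is mechanical; the sketch below follows the case analysis above. The $\alpha$, $\beta$ values, the tiny `den` table, and the function name are placeholders, not the chapter's implementation.

```python
def build_sdp_matrix(n, CE, SE, den, alpha=0.1, beta=0.5):
    """Assemble the coefficient matrix A of the density-aware objective:
    den[k][i] is feature i's density contribution in bin k; conflict
    edges get 1, stitch edges -alpha, and every pair additionally gets
    beta * sum_k den[k][i] * den[k][j]."""
    A = [[0.0] * n for _ in range(n)]
    conf = {frozenset(e) for e in CE}
    stitch = {frozenset(e) for e in SE}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            base = beta * sum(dk[i] * dk[j] for dk in den)
            if frozenset((i, j)) in conf:
                A[i][j] = 1.0 + base
            elif frozenset((i, j)) in stitch:
                A[i][j] = -alpha + base
            else:
                A[i][j] = base
    return A

den = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]   # two bins, three features
A = build_sdp_matrix(3, CE=[(0, 1)], SE=[(1, 2)], den=den)
assert A[0][1] == 1.0 + 0.5 * (1.0 * 0.5 + 0.0 * 0.5)    # conflict entry
assert A[1][2] == -0.1 + 0.5 * (0.5 * 0.0 + 0.5 * 1.0)   # stitch entry
```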
Due to space limitations, the detailed proof is omitted. The solution of (2.11)
is continuous instead of discrete and provides a lower bound for the vector expression
(2.10); in other words, (2.11) provides an approximate solution to (2.10).
Each $X_{ij}$ in the solution of (2.11) corresponds to a feature pair $(r_i, r_j)$. The value of $X_{ij}$
provides a guideline on whether the two features $r_i$ and $r_j$ should receive the same color. If
$X_{ij}$ is close to 1, features $r_i$ and $r_j$ tend to belong to the same color (mask), whereas
if it is close to $-0.5$, $r_i$ and $r_j$ tend to be in different colors (masks). With these
guidelines, we can adopt a mapping procedure to finally assign all input features
into three colors (masks).
In [8], a greedy approach was applied for the final color assignment. The idea is
straightforward: all $X_{ij}$ values are sorted, and vertices $r_i$ and $r_j$ with larger $X_{ij}$ values
tend to get the same color. The $X_{ij}$ values can be classified into two types: clear
and vague. If most of the $X_{ij}$s in matrix $X$ are clear (i.e., close to 1 or $-0.5$), this
greedy method may achieve good results. However, if the decomposition graph is
not 3-colorable, some values in matrix $X$ are vague, and for such vague $X_{ij}$s the
greedy method may not be so effective.
In contrast to the previous greedy approach, we propose a partition-based mapping,
which handles the vague $X_{ij}$s more effectively. The new mapping is based on
three-way maximum-cut partitioning. The main idea is as follows: if an $X_{ij}$ is vague,
instead of relying only on the SDP solution, we also take advantage of the information
in the decomposition graph. This information is captured by constructing a graph,
denoted $G_M$. By formulating the mapping as a three-way partition of the graph $G_M$,
our mapping gains a global view in the search for better solutions.
Algorithm 4 shows our partition-based mapping procedure. Given the solutions
from program (2.11), all non-zero $X_{ij}$ values are used to form triplets, which are
then sorted (lines 1–2). The mapping incorporates two stages to deal with the two
types of $X_{ij}$: clear and vague. The first stage (lines 3–8) is similar to that
in [8]: if $X_{ij}$ is clear, the relationship between vertices $r_i$ and $r_j$ can be directly
determined. Here $th_{un}$ and $th_{sp}$ are user-defined threshold values. For example, if
$X_{ij} > th_{un}$, which means that $r_i$ and $r_j$ should be in the same color, then Union(i, j) is
applied to merge them into a larger vertex. Similarly, if $X_{ij} < th_{sp}$, then Separate(i, j)
is used to label $r_i$ and $r_j$ as incompatible. In the second stage (lines 9–16), we deal
with the vague $X_{ij}$ values. Because some vertices have been merged in the previous
stage, the total number of vertices is not large. Here we construct a graph $G_M$ to
Fig. 2.22 Density balanced mapping. (a) Decomposition graph. (b) Constructed graph $G_M$. (c)
Mapping result with cut value 8.1 and density uniformity 24. (d) A better mapping with cut value
8.1 and density uniformity 23
represent the relationships among all the remaining vertices (line 9). Each edge $e_{ij}$ in
this graph has a weight representing the cost of assigning vertices $i$ and $j$ the same
color. Therefore, the color assignment problem can be restated as a maximum-cut
partitioning problem on $G_M$ (lines 10–16).
By assigning each vertex a weight representing its density, graph $G_M$ is
able to balance densities among different bins. Based on $G_M$, a partitioning is
performed to simultaneously achieve a maximum cut and balanced weights among
the different parts. Note that we need to modify the gain function so that each move
tries to achieve a more balanced and larger-cut partition.
An example of density balanced mapping is shown in Fig. 2.22. Based on
the decomposition graph (see Fig. 2.22a), the SDP is formulated. Given the SDP
solutions, after the first stage of mapping, vertices $a_2$ and $d_2$ are merged into a larger
vertex. As shown in Fig. 2.22b, the graph $G_M$ is constructed, where each vertex is
associated with a weight. There are two partition results with the same cut value 8.1
(see Fig. 2.22c, d), but their density uniformities are 24 and 23, respectively.
To keep the resulting density more balanced, the second partitioning, in Fig. 2.22d,
is adopted as the color assignment result.
It is well known that the maximum-cut problem, even for a two-way partition, is NP-hard. However, we observe that in many cases, after the global SDP optimization, the graph GM can be quite small, e.g., fewer than seven vertices. For these small cases, we develop a backtracking based method to search the entire solution space. Note that backtracking can quickly find the optimal solution here even though three-way partitioning is NP-hard. If the graph is larger, we propose a heuristic method motivated by the classic FM partitioning algorithm [34, 35]. We make the following modifications to the classic FM algorithm: (1) in the first stage of mapping, some vertices are labeled as incompatible, so before moving a vertex from one partition to another, we must check whether the move is legal; (2) the classical FM algorithm targets the min-cut problem, so we modify the gain function of each move to achieve a maximum cut.
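For graphs this small, the entire solution space can be enumerated directly. The following sketch (plain exhaustive enumeration standing in for the backtracking search; all names are ours) returns an optimal three-way partition:

```python
from itertools import product

def max_cut_3way(n, edges, incompatible=()):
    """Enumerate all 3^n color assignments (feasible for n < 7) and
    return one maximizing total cut weight. `incompatible` holds vertex
    pairs labeled in the first mapping stage that must not share a color."""
    best, best_cut = None, -1
    for colors in product(range(3), repeat=n):
        if any(colors[i] == colors[j] for i, j in incompatible):
            continue  # illegal assignment: skip
        cut = sum(w for i, j, w in edges if colors[i] != colors[j])
        if cut > best_cut:
            best_cut, best = cut, colors
    return best, best_cut
```

A true backtracking search additionally prunes partial assignments by an upper bound on the remaining cut, but for fewer than seven vertices the difference is negligible.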
The runtime complexity of graph construction is O(m), where m is the number of vertices in GM. The runtime of the three-way maximum-cut partitioning algorithm is O(m log m). Note that the first stage of mapping needs O(n^2 log n) time [8]. Since m is much smaller than n, the overall complexity of density balanced mapping is O(n^2 log n).
Here, we show that the layout graph simplification proposed in [8] can also take local density balance into account. During layout graph simplification, we iteratively remove (and push onto a stack) all vertices with degree less than or equal to two. After the color assignment on the remaining vertices, we iteratively recover the removed vertices and assign them legal colors. Instead of assigning colors randomly, we search for legal colors that also improve the density uniformity.
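The peel-and-recover procedure can be sketched as follows (an illustrative reimplementation; the hypothetical `color_core` callback stands in for the SDP-based assignment of the remaining core):

```python
def simplify_and_recover(adj, color_core):
    """Iteratively remove vertices of live degree <= 2 onto a stack,
    color the remaining core via `color_core`, then pop vertices back,
    giving each a color in {0, 1, 2} unused by its colored neighbors.
    A removed vertex has at most two colored neighbors at recovery,
    so a legal third color always exists."""
    adj = {v: set(ns) for v, ns in adj.items()}
    stack, live = [], set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(live):
            if sum(1 for u in adj[v] if u in live) <= 2:
                stack.append(v)
                live.discard(v)
                changed = True
    core = {v: {u for u in adj[v] if u in live} for v in live}
    colors = dict(color_core(core))
    while stack:
        v = stack.pop()
        used = {colors[u] for u in adj[v] if u in colors}
        colors[v] = min(c for c in range(3) if c not in used)
        # a density-aware variant would pick, among all legal colors,
        # the one that best improves density uniformity
    return colors
```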
Speed-up Techniques
[Fig. 2.23: cut vertex a in the layout graph and the resulting decomposition graphs, panels (a)–(d)]
[Fig. 2.24: clustering vertices a and d1 in the decomposition graph, panels (a) and (b)]
Our first technique is layout graph cut vertex stitch forbiddance. As shown in Fig. 2.23a, vertex a is a cut vertex whose removal can partition the layout graph into two parts: {b, c, d} and {e, f, g}. If stitch candidates are introduced within a, the corresponding decomposition graph is illustrated in Fig. 2.23b, which is hard to simplify further. If we instead prohibit a from being a stitch candidate, the corresponding decomposition graph is shown in Fig. 2.23c, where a is still a cut vertex in the decomposition graph. We can then apply 2-connected component computation [13] to reduce the problem size and apply color assignment to each component separately (see Fig. 2.23d).
Our second technique, decomposition graph vertex clustering, further reduces the decomposition graph size. As shown in Fig. 2.24a, vertices a and d1 share the same conflict relationships against b and c, and there is no conflict edge between a and d1. Since assigning a and d1 the same color introduces no conflict, we can cluster them together, as shown in Fig. 2.24b. Note that the stitch and conflict relationships are merged as well. Applying vertex clustering to the decomposition graph can further reduce the problem size.
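A minimal sketch of this clustering (our own illustration; `conflict` maps each vertex to its conflict neighborhood):

```python
def cluster_same_neighbors(vertices, conflict):
    """Group vertices that share an identical conflict neighborhood;
    such vertices have no conflict edge between each other (otherwise
    their neighborhoods would differ) and can safely take one color,
    so each group collapses into a single clustered vertex."""
    groups = {}
    for v in vertices:
        key = frozenset(conflict.get(v, ()))
        groups.setdefault(key, []).append(v)
    clusters = []
    for key, vs in groups.items():
        if len(vs) > 1 and all(v not in key for v in vs):
            clusters.append(vs)          # e.g., {a, d1} in Fig. 2.24
        else:
            clusters.extend([v] for v in vs)
    return clusters
```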
Our third technique is called fast color assignment trial. Although the SDP and the partition-based mapping provide high-quality color assignment, applying them to every decomposition graph is still expensive. We therefore run a fast color assignment trial before calling our SDP-based method. If no conflicts or stitches are introduced, the trial solves the color assignment problem in linear time. Note that the SDP is skipped only when the decomposition graph can be colored without stitches or conflicts, so the fast trial does not degrade solution quality. Moreover, our preliminary results show that more than half of the decomposition graphs can be decomposed using this fast method, so the runtime can be reduced dramatically.
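One possible linear-time realization of the trial is sketched below (stitch handling is omitted and the greedy visiting order is our assumption):

```python
def fast_trial(vertices, conflict):
    """Greedy one-pass coloring: give each vertex a color in {0, 1, 2}
    unused by its already-colored conflict neighbors; return None the
    moment no such color exists, signaling fallback to the SDP-based
    flow. A returned assignment is conflict-free by construction, so
    skipping the SDP in that case cannot degrade quality."""
    colors = {}
    for v in vertices:
        used = {colors[u] for u in conflict.get(v, ()) if u in colors}
        free = [c for c in range(3) if c not in used]
        if not free:
            return None          # shortcut failed; run SDP + mapping
        colors[v] = free[0]
    return colors
```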
We implemented our decomposer in C++ and tested it on an Intel Xeon 3.0 GHz Linux machine with 32 GB RAM. The ISCAS 85&89 benchmarks from [8] were used, where the minimum coloring spacing dism was set the same as in previous studies [8, 13]. Additionally, to perform a comprehensive comparison, we also tested two other benchmark suites. The first suite consists of six dense benchmarks ("c9_total"–"s5_total"), while the second suite consists of two synthesized OpenSPARC T1 designs, "mul_top" and "exu_ecc", with the Nangate 45 nm standard cell library [36]. When processing these two benchmark suites, we set the minimum coloring distance dism = 2 × wmin + 3 × smin, where wmin and smin denote the minimum wire width and the minimum spacing, respectively. The parameter α was set to 0.1. The size of each bin was set to 10 dism × 10 dism. We used CSDP [31] as the SDP solver.
In the first experiment, we compared our decomposer with state-of-the-art layout decomposers that are not density-balance aware [8, 13, 16]. We obtained the binary files from [8, 13]. Since we could not obtain the binary for the decomposer in [16], we directly use the results listed in [16]. Our decomposer is denoted "SDP + PM," where "PM" denotes the partition-based mapping. Here β is set to 0; in other words, SDP + PM only optimizes the number of stitches and conflicts. Table 2.10 shows the comparison of the various decomposers in terms
Table 2.10 Comparison of runtime and performance

Circuit |  ICCAD'11 [8]            |  DAC'12 [13]             |  DAC'13 [16]*            |  SDP + PM
        |  cn#  st#  cost  CPU(s)  |  cn#  st#  cost  CPU(s)  |  cn#  st#  cost  CPU(s)  |  cn#  st#  cost  CPU(s)
C432    |    3    1   3.1    0.09  |    0    6   0.6    0.03  |    0    4   0.4    0.01  |    0    4   0.4    0.2
C499    |    0    0   0      0.07  |    0    0   0      0.04  |    0    0   0      0.01  |    0    0   0      0.2
C880    |    1    6   1.6    0.15  |    1   15   2.5    0.05  |    0    7   0.7    0.01  |    0    7   0.7    0.3
C1355   |    1    2   1.2    0.07  |    1    7   1.7    0.07  |    0    3   0.3    0.01  |    0    3   0.3    0.3
C1908   |    0    1   0.1    0.07  |    1    0   1      0.1   |    0    1   0.1    0.01  |    0    1   0.1    0.3
C2670   |    2    4   2.4    0.17  |    2   14   3.4    0.16  |    0    6   0.6    0.04  |    0    6   0.6    0.4
C3540   |    5    6   5.6    0.27  |    2   15   3.5    0.2   |    1    8   1.8    0.05  |    1    8   1.8    0.5
C5315   |    7    7   7.7    0.3   |    3   11   4.1    0.27  |    0    9   0.9    0.05  |    0    9   0.9    0.7
C6288   |   82  131  95.1    3.81  |   19  341  53.1    0.3   |   14  191  33.1    0.25  |    1  213  22.3    2.7
C7552   |   12   15  13.5    0.77  |    3   46   7.6    0.42  |    1   21   3.1    0.1   |    0   22   2.2    1.1
S1488   |    1    1   1.1    0.16  |    0    4   0.4    0.08  |    0    2   0.2    0.01  |    0    2   0.2    0.3
S38417  |   44   55  49.5   18.8   |   20  122  32.2    1.25  |   19   55  24.5    0.42  |   19   55  24.5    7.9
S35932  |   93   18  94.8   89.7   |   46  103  56.3    4.3   |   44   41  48.1    0.82  |   44   48  48.8   21.4
S38584  |   63  122  75.2   92.1   |   36   28  38.8    3.7   |   36  116  47.6    0.77  |   37  118  48.8   22.2
S15850  |   73   91  82.1   79.8   |   36  201  56.1    3.7   |   36   97  45.7    0.76  |   34  101  44.1   20.0
Avg.    | 25.8 30.7  28.9   19.1   | 11.3 60.87 17.42   0.978 | 10.1 37.4  13.8    0.22  | 9.07 39.8  13.0    5.23
Ratio   |             2.2    3.65  |            1.34    0.19  |            1.06    0.04  |            1.0     1.0

* The results of the DAC'13 decomposer are from [16]
of runtime and performance. For each decomposer we list the number of stitches and conflicts it generates, as well as its cost and runtime. The columns "cn#" and "st#" denote the number of conflicts and the number of stitches, respectively. "Cost" is the cost function, which is set to cn# + 0.1 × st#. "CPU(s)" is the computation time in seconds.
First, we compare SDP + PM with the decomposer in [8], which is also based on an SDP formulation. From Table 2.10 we can see that the new stitch candidate generation (see [16] for more details) and partition-based mapping achieve better performance, reducing the cost by around 55 %. Besides, SDP + PM runs nearly 4× faster due to several proposed speed-up techniques, including 2-vertex-connected component computation, layout graph cut vertex stitch forbiddance, decomposition graph vertex clustering, and the fast color assignment trial. Second, we compare SDP + PM with the decomposer in [13], which applies several graph-based simplifications and a maximum independent set (MIS) based heuristic. From Table 2.10 we can see that although the decomposer in [13] is faster, its MIS based heuristic produces solutions that are on average 33 % more costly than those produced by SDP + PM. Although SDP + PM is slower than the decomposer in [16], it reduces the solution cost by around 6 % compared with [16].
In addition, we compare SDP + PM with two other decomposers [8, 13] on some very dense layouts, as shown in Table 2.11. We can see that for some cases the decomposer in [8] cannot finish within 10,000 s. Compared with the work in [13], SDP + PM reduces the solution cost by 65 %. We observe that, compared with the other decomposers, SDP + PM demonstrates much better performance when the input layout is dense: in that case each independent problem may still be quite large after graph simplification, so our SDP-based approximation achieves better results than the heuristics. It can also be observed that for the last three cases our decomposer eliminates thousands of conflicts. Each conflict may require manual layout modification or expensive ECO efforts, which are very time consuming. Furthermore, even though our runtime is longer than that of [13], it is still acceptable, not exceeding 6 min for the largest benchmark.
In the second experiment, we test our decomposer for density balance. We analyze edge placement error (EPE) using Calibre WORKbench [37] on an industry-strength setup. For analyzing the EPE in our test cases, we applied systematic lithography process variations, using a focus range of ±50 nm and a dose range of ±5 %. In Table 2.12, we compare SDP + PM with "SDP + PM + DB," which is our density balanced decomposer. Here β is set to 0.04; testing found that a larger β does not help, and we want to give greater weight to the conflict and stitch cost. Column "cost" again lists the weighted cost of conflicts and stitches, i.e., cost = cn# + 0.1 × st#.
From Table 2.12 we can see that by integrating density balance into our decomposition flow, our decomposer (SDP + PM + DB) reduces the number of EPE hotspots by 14 %. The density balanced SDP based algorithm maintains performance similar to the baseline SDP implementation: only 7 % more conflict and stitch cost, and only 8 % more runtime. In other words, our decomposer achieves a good density balance while keeping comparable conflict and stitch cost.
Table 2.11 Comparison on very dense layouts

Circuit   |  ICCAD 2011 [8]              |  DAC 2012 [13]                  |  SDP + PM
          |   cn#   st#    cost   CPU(s) |    cn#     st#    cost   CPU(s) |   cn#     st#    cost  CPU(s)
mul_top   |   836    44   840.4    236   |    457       0     457     0.8  |   118     271   145.1   57.6
exu_ecc   |   119     1   119.1    11.1  |     53       0      53     0.7  |    22      64    28.4    4.3
c9_total  |   886   228   908.8    47.4  |    603     641   667.1     0.52 |   117    1009   217.9    7.7
c10_total |  2088   554  2143.4    52    |   1756    1776  1933.6     1.1  |   248    1876   435.6   19
s2_total  |  2182   390  2221     936.8  |   1652    5976  2249.6     4    |   703    5226  1225.6   70.7
s3_total  |  6844    72  6851.2  7510.1  |   4731  13,853  6116.3    13.1  |   958  10,572  2015.2  254.5
s4_total  |    NA    NA      NA >10,000  |   3868  13,632  5231.2    13    |  1151  11,091  2260.1  306
s5_total  |    NA    NA      NA >10,000  |   4650  16,152  6265.2    12.9  |  1391  13,683  2759.3  350.4
Avg.      |    NA    NA      NA  >3600   | 2221.3  6503.8  2871.6     5.8  | 588.5    5474  1135.9  134
[Figure: SDP runtime in seconds versus number of nodes]
The figure plots graph (problem) size against SDP runtime: the X axis denotes the number of nodes (i.e., the problem size), and the Y axis shows the runtime. We can see that the runtime complexity of SDP is less than O(n^2.2).
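Such an empirical exponent can be estimated by a least-squares fit of log runtime against log problem size (an illustrative helper; the function name is ours):

```python
import math

def complexity_exponent(sizes, times):
    """Slope of the least-squares line through (log n, log t): if the
    runtime scales as c * n^k, the fitted slope approximates k."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A fitted slope below 2.2 over the measured instances corresponds to the observed sub-O(n^2.2) growth.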
2.4 Summary
In this chapter we have proposed a set of algorithms to solve the TPLD problem.
We have shown that this problem is NP-hard, thus the runtime required to solve it
exactly increases dramatically with the problem size. Then we presented a set of
graph division techniques to reduce the problem size. We proposed a general ILP
formulation to simultaneously minimize the number of conflicts and stitches. We
also proposed a novel vector program and its SDP relaxation to improve scalability
for very dense layouts. In addition, density balancing was integrated into all the
key steps of our decomposition flow. Our decomposer performs better than current
state-of-the-art frameworks in minimizing the cost of conflicts and stitches. As TPL
may be adopted by industry for 14 nm/11 nm nodes, we believe more research will
be needed to enable TPL-friendly design and mask synthesis.
References
1. Kahng, A.B., Park, C.-H., Xu, X., Yao, H.: Layout decomposition approaches for double
patterning lithography. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 29, 939–952
(2010)
2. Yuan, K., Yang, J.-S., Pan, D.Z.: Double patterning layout decomposition for simultaneous
conflict and stitch minimization. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 29(2),
185–196 (2010)
3. Xu, Y., Chu, C.: GREMA: graph reduction based efficient mask assignment for double
patterning technology. In: IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pp. 601–606 (2009)
4. Yang, J.-S., Lu, K., Cho, M., Yuan, K., Pan, D.Z.: A new graph-theoretic, multi-objective layout
decomposition framework for double patterning lithography. In: IEEE/ACM Asia and South
Pacific Design Automation Conference (ASPDAC), pp. 637–644 (2010)
5. Xu, Y., Chu, C.: A matching based decomposer for double patterning lithography. In: ACM
International Symposium on Physical Design (ISPD), pp. 121–126 (2010)
6. Tang, X., Cho, M.: Optimal layout decomposition for double patterning technology. In:
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 9–13 (2011)
7. Van Oosten, A., Nikolsky, P., Huckabay, J., Goossens, R., Naber, R.: Pattern split rules! a feasibility study of rule based pitch decomposition for double patterning. In: Proceedings of SPIE, vol. 6730 (2007)
8. Yu, B., Yuan, K., Zhang, B., Ding, D., Pan, D.Z.: Layout decomposition for triple patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 1–8 (2011)
9. Yu, B., Xu, X., Gao, J.-R., Pan, D.Z.: Methodology for standard cell compliance and detailed
placement for triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 349–356 (2013)
10. Ma, Q., Zhang, H., Wong, M.D.F.: Triple patterning aware routing and its comparison with
double patterning aware routing in 14nm technology. In: ACM/IEEE Design Automation
Conference (DAC), pp. 591–596 (2012)
11. Lin, Y.-H., Yu, B., Pan, D.Z., Li, Y.-L.: TRIAD: a triple patterning lithography aware
detailed router. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 123–129 (2012)
12. Cork, C., Madre, J.-C., Barnes, L.: Comparison of triple-patterning decomposition algorithms
using aperiodic tiling patterns. In: Proceedings of SPIE, vol. 7028 (2008)
13. Fang, S.-Y., Chen, W.-Y., Chang, Y.-W.: A novel layout decomposition algorithm for triple
patterning lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 1185–1190
(2012)
14. Fang, S.-Y., Chang, Y.-W., Chen, W.-Y.: A novel layout decomposition algorithm for triple
patterning lithography. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 33(3), 397–408
(2014)
15. Tian, H., Zhang, H., Ma, Q., Xiao, Z., Wong, M.D.F.: A polynomial time triple patterning
algorithm for cell based row-structure layout. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 57–64 (2012)
16. Kuang, J., Young, E.F.: An efficient layout decomposition approach for triple patterning
lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 69:1–69:6 (2013)
17. Tian, H., Du, Y., Zhang, H., Xiao, Z., Wong, M.D.F.: Constrained pattern assignment for
standard cell based triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 178–185 (2013)
18. Zhang, Y., Luk, W.-S., Zhou, H., Yan, C., Zeng, X.: Layout decomposition with pairwise
coloring for multiple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 170–177 (2013)
19. Yu, B., Lin, Y.-H., Luk-Pat, G., Ding, D., Lucas, K., Pan, D.Z.: A high-performance triple
patterning layout decomposer with balanced density. In: IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), pp. 163–169 (2013)
20. Chen, Z., Yao, H., Cai, Y.: SUALD: spacing uniformity-aware layout decomposition in
triple patterning lithography. In: IEEE International Symposium on Quality Electronic Design
(ISQED), pp. 566–571 (2013)
21. Ghaida, R.S., Agarwal, K.B., Liebmann, L.W., Nassif, S.R., Gupta, P.: A novel methodology
for triple/multiple-patterning layout decomposition. In: Proceedings of SPIE, vol. 8327 (2012)
22. Yuan, K., Pan, D.Z.: WISDOM: wire spreading enhanced decomposition of masks in double
patterning lithography. In: IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pp. 32–38 (2010)
23. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York (1979)
24. Kaufmann, M., Wagner, D.: Drawing Graphs: Methods and Models, vol. 2025. Springer, Berlin
(2001)
25. Tamassia, R., Di Battista, G., Batini, C.: Automatic graph drawing and readability of diagrams.
IEEE Trans. Syst. Man Cybern. Syst. 18(1), 61–79 (1988)
26. Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001)
27. Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)
28. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge
(1990)
29. Tarjan, R.E.: A note on finding the bridges of a graph. Inf. Process. Lett. 2, 160–161 (1974)
30. Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2014)
31. Borchers, B.: CSDP, a C library for semidefinite programming. Optim. Methods Softw. 11,
613–623 (1999)
32. Lucas, K., Cork, C., Yu, B., Luk-Pat, G., Painter, B., Pan, D.Z.: Implications of triple patterning
for 14 nm node design and patterning. In: Proceedings of SPIE, vol. 8327 (2012)
33. Chen, P., Kuh, E.S.: Floorplan sizing by linear programming approximation. In: ACM/IEEE
Design Automation Conference (DAC), pp. 468–471 (2000)
34. Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions.
In: ACM/IEEE Design Automation Conference (DAC), pp. 175–181 (1982)
35. Sanchis, L.A.: Multiple-way network partitioning. IEEE Trans. Comput. 38, 62–81 (1989)
36. NanGate FreePDK45 Generic Open Cell Library. http://www.si2.org/openeda.si2.org/projects/nangatelib (2008)
37. Mentor Graphics: Calibre verification user's manual (2008)
Chapter 3
Layout Decomposition for Other Patterning
Techniques
3.1 Introduction
So far we have discussed the conventional process of TPL, called LELELE, which
follows the same principle as litho-etch-litho-etch (LELE) type double patterning
lithography (DPL). Here each “L” and “E” represents one lithography process
and one etch process, respectively. Although LELELE has been widely studied
by industry and academia, it still has two major issues. First, even with stitch
insertion, there are some native conflicts in LELELE, like four-clique conflict [1].
For example, Fig. 3.1 illustrates a four-clique conflict among features a, b, c, and
d. No matter how we assign the colors, there will be at least one conflict. Since
this four-clique structure is common in advanced standard cell design, LELELE
type TPL suffers from the native conflict problem. Second, compared to LELE type
double patterning, there are more serious overlapping problems in LELELE [2].
To overcome the limitations of LELELE, Lin [3] recently proposed a new TPL manufacturing process called LELE-end-cutting (LELE-EC). This TPL process contains three mask steps: a first mask, a second mask, and a trim mask. Figure 3.2 illustrates an example of the LELE-EC process. To generate the target features in Fig. 3.2a, the first and second masks are used for pitch splitting, which is similar to the LELE type DPL process. These two masks are shown in Fig. 3.2b. Finally, a trim mask is applied to trim out the desired region, as in Fig. 3.2c. In other words, the trim mask generates end-cuts that further split feature patterns. Although the target features are not LELELE-friendly, they are well suited to the LELE-EC process and can be decomposed without introducing conflicts. In addition, if all cuts are properly designed and distributed, the LELE-EC process can achieve better printability than the conventional LELELE process [3].
Fig. 3.1 Process of LELELE type triple patterning lithography. (a) Target features; (b) layout decomposition with one conflict introduced
Fig. 3.2 Process of LELE-EC type triple patterning lithography. (a) Target features; (b) first and second mask patterns; (c) trim mask and final decomposition without conflict
Fig. 3.3 LELELE process example. (a) Decomposed result; (b) simulated images for different masks; (c) combined simulated image as the final printed patterns
Figures 3.3 and 3.4 present simulated images of a design with four short features passed through the LELELE and LELE-EC processes, respectively. The lithography simulations are computed based on a partially coherent imaging system, where the 193 nm illumination source is modeled as a kernel matrix given by Banerjee et al. [4]. To model the photoresist response to the exposed light intensity, we use the constant threshold model with threshold 0.225. We can make several observations from these simulated images. First, there is some rounding around the line ends (see Fig. 3.3c). Second, to reduce the rounding issues, as illustrated in Fig. 3.4b, in the LELE-EC process the short lines can be merged into longer lines, after which the trim mask is used to cut off some spaces. There may be some corner
Fig. 3.4 LELE-EC process example. (a) Decomposed result; (b) simulated images for different masks, where the orange pattern is the trim mask; (c) combined simulated image as the final printed patterns
rounding due to the edge shortening of trim mask patterns; however, since line shortening or rounding is a strong function of the line width [5], and we observe that trim mask patterns are usually much longer than the line-end width, we assume the rounding caused by the trim mask is insignificant. This assumption is illustrated in Fig. 3.4c.
Extensive research has been carried out to solve the corresponding design problems for LELELE type TPL [1, 6–12]. Additionally, the related constraints have been considered in early physical design stages, such as routing [13, 14], standard cell design [15, 16], and detailed placement [16]. However, only a few attempts have been made to address the LELE-EC layout decomposition problem. Although introducing a trim mask improves printability, it introduces significant design challenges, especially in the layout decomposition stage. In this section, we present a comprehensive study of LELE-EC layout decomposition. Given a layout specified by features in polygonal shapes, we extract the geometric relationships between the shapes and construct conflict graphs. The conflict graphs also model the compatibility of all end-cut candidates. Based on the conflict graphs, we use integer linear programming (ILP) to assign each vertex to one layer. Our goal in the layout decomposition is to minimize the number of conflicts while also minimizing the amount of overlapping error.
Given a layout specified by features in polygonal shapes, the layout graph [1] is constructed. As shown in Fig. 3.5, the layout graph is an undirected graph with a set of vertices V and a set of conflict edges CE. Each vertex in V represents one input feature. An edge is in CE if and only if the two corresponding features are within the minimum coloring distance dism of each other. In other words, each edge in CE is a
Fig. 3.5 Layout graph construction. (a) Input layout; (b) layout graph with conflict edges
Fig. 3.6 End-cut graph construction. (a) Input layout; (b) generated end-cut candidates; (c) end-cut graph
conflict candidate. Figure 3.5a shows one input layout, and the corresponding layout graph is in Fig. 3.5b. Here the vertex set is V = {1, 2, 3, 4, 5, 6, 7}, while the conflict edge set is CE = {(1, 2), (1, 3), (1, 4), (2, 4), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)}. For each conflict candidate edge, we check whether there is an end-cut candidate. If an end-cut candidate i–j is applied, features i and j are merged into one feature, and the corresponding conflict edge can be removed. If stitches are considered during layout decomposition, some vertices in the layout graph can be split into several segments. The segments within one layout graph vertex are connected through stitch edges; all stitch edges are collected in a set SE. Please refer to [17] for the details of stitch candidate generation.
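Construction of the conflict edge set can be sketched as follows. Here rectangles stand in for general polygonal features, and the brute-force O(n^2) pair scan is for illustration only; a production decomposer would use sweep lines or spatial indexing:

```python
def rect_distance(r1, r2):
    """Euclidean gap between axis-aligned rectangles (x1, y1, x2, y2)."""
    dx = max(r1[0] - r2[2], r2[0] - r1[2], 0)
    dy = max(r1[1] - r2[3], r2[1] - r1[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def build_layout_graph(features, dis_m):
    """Conflict edge (i, j) for every pair of features lying within the
    minimum coloring distance dis_m of each other."""
    CE = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if rect_distance(features[i], features[j]) < dis_m:
                CE.append((i, j))
    return CE
```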
End-Cut Graph
Since all the end-cuts are manufactured through a single exposure process, they should be distributed far away from each other. That is, two end-cuts conflict if the distance between them is smaller than the minimum end-cut distance disc. These conflict relationships among end-cuts are not available in the layout graph, so we construct an end-cut graph to store them. Figure 3.6a gives an example input layout, with all end-cut candidates pointed out in Fig. 3.6b. The corresponding end-cut graph is shown in Fig. 3.6c. Each vertex in the end-cut graph represents one end-cut. There is a solid edge if and only if the two end-cuts conflict with each other, and a dashed edge if and only if they are close enough to be merged into one larger end-cut.
Problem Formulation
Here, we give the problem formulation of layout decomposition for triple patterning
with End-Cutting (LELE-EC).
Problem 3.1 (LELE-EC Layout Decomposition). Given a layout specified by features in polygonal shapes, construct the layout graph and the end-cut graph. The LELE-EC layout decomposition assigns each vertex in the layout graph one of two colors and selects a set of end-cuts in the end-cut graph. The objective is to minimize the number of conflicts and stitches.
With the end-cut candidate set, LELE-EC layout decomposition is more complicated due to the additional constraints. Even when there are no end-cut candidates, LELE-EC layout decomposition is similar to LELE type DPL layout decomposition. Sun et al. [18] showed that minimizing the number of conflicts and stitches in LELE layout decomposition is NP-hard, so it is not hard to see that LELE-EC layout decomposition is NP-hard as well. An NP-hard problem is one to which every problem in NP reduces in polynomial time [19]; under the assumption that P ≠ NP, no NP-hard problem can be solved in polynomial time in the worst case.
3.2.2 Algorithms
The overall flow of our layout decomposer is illustrated in Fig. 3.7. First we generate
all end-cut candidates to find all possible end-cuts. Then we construct the layout
graph and the end-cut graph to transform the original geometric problem into a
graph problem, modeling the LELE-EC layout decomposition as a coloring problem
in the layout graph and an end-cut selection problem in the end-cut graph. Both the
[Fig. 3.7: overall flow of the decomposer: the layout and decomposition rules feed end-cut candidate generation, then decomposition on the graphs via the ILP formulation, then output masks]
coloring problem and the end-cut selection problem can be solved using one ILP
formulation. Since the ILP formulation may suffer from excessive runtime in some
cases, we propose several graph simplification techniques. To further reduce the
problem size, we propose a step in which some end-cut candidates are pre-selected
before ILP formulation. All the steps in the flow are detailed in the following
sections.
In this section, we explain the details of our algorithm for generating all end-cut candidates. An end-cut candidate is generated between two conflicting polygonal shapes. It should be stressed that, compared to the end-cut generation in [20], our methodology differs in two ways:
• An end-cut can be a collection of multiple end-cut boxes, depending on the
corresponding shapes. For instance, two end-cut boxes (ecb1 and ecb2 ) need to
be generated between shapes S1 and S2 as shown in Fig. 3.8. We propose a shape-
edge dependent algorithm to generate the end-cuts with multiple end-cut boxes.
• We consider the overlaps and variations caused by end-cuts. Two lithography simulations are illustrated in Figs. 3.9 and 3.10, respectively. In Fig. 3.9 we find some bad patterns (hotspots) due to cuts between two long edges, while in Fig. 3.10 the final patterns are in better shape. Therefore, to reduce the number of manufacturing hotspots from the trim mask, we avoid generating cuts along two long edges during end-cut candidate generation.
Fig. 3.8 An end-cut composed of multiple end-cut boxes (ecb1 and ecb2) between shapes S1 and S2
Fig. 3.9 (a) Decomposition example where cuts are along long edges; (b) simulated images for different masks; (c) combined simulated image with some hotspots
Fig. 3.10 (a) Decomposition example where cuts are between line ends; (b) simulated images for different masks; (c) combined simulated image with good printability
Fig. 3.11 End-cut box generation cases: (a) edge-edge end-cut box; (b), (c) corner-corner end-cut boxes; (d) no end-cut box
Fig. 3.12 type1 and type2 overlaps between end-cut boxes
In Fig. 3.11a, the two shape-edges overlap in the same direction, and the generated box is called an edge-edge end-cut box. In Fig. 3.11b, the shape-edges are in the same direction but do not overlap, and in Fig. 3.11c the shape-edges are oriented in different directions; the end-cut boxes generated in these two cases are called corner-corner end-cut boxes. No end-cut box is generated in the case of Fig. 3.11d. In addition, end-cut boxes are not generated in the following cases: (1) the end-cut box overlaps with an existing polygonal shape in the layout; (2) the height h or width w of the box is not within the specified range, i.e., the constraints hlow ≤ h ≤ hhigh and wlow ≤ w ≤ whigh are violated.
All generated end-cut boxes between two shapes are then divided into independent components IC (line 10) by finding the connected components of a graph G = (V, E), where V = {vi} is the set of all end-cut boxes and (vi, vj) ∈ E if vi overlaps with vj. The overlap between two end-cut boxes is classified as type1 or type2: when two boxes share only an edge or a point but no area, we call this a type1 overlap, whereas an overlap in area is termed a type2 overlap, as shown in Fig. 3.12. Each ic ∈ IC may contain multiple end-cut boxes. If the total number of end-cut boxes (|V|) is equal to |IC|, there is no overlap between the end-cut boxes and we generate all of them (lines 11–14).
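The independent components IC can be computed with a standard union-find pass over the detected overlaps (a sketch with hypothetical names; overlap detection itself is assumed already done):

```python
def group_end_cut_boxes(num_boxes, overlaps):
    """Union-find grouping of end-cut boxes into the independent
    components IC: boxes joined by any overlap (type1 or type2) fall
    into the same component."""
    parent = list(range(num_boxes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, j in overlaps:
        parent[find(i)] = find(j)           # union the two components
    comps = {}
    for i in range(num_boxes):
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())
```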
For multiple boxes in one ic, if there is an overlap between a corner-corner and an edge-edge end-cut box, the corner-corner end-cut box is removed (line 16). After this step, each ic contains either a set of type1 overlaps or a set of type2 overlaps. In the case of type2 overlaps, the end-cut box with the minimum area is chosen, as shown in Fig. 3.13. For type1 overlaps in each ic, all end-cut boxes are generated (line 20).
Fig. 3.13 The minimum area end-cut box is chosen for type2 overlaps
ILP Formulations
After the construction of the layout and end-cut graphs, the LELE-EC layout
decomposition problem can be reformulated as a coloring problem on the layout
graph and a selection problem on the end-cut graph. At first glance, the coloring
problem is similar to that in LELE layout decomposition. However, since the conflict
graph cannot be guaranteed to be planar, the face graph-based methodology [21]
cannot be applied here. Therefore, we use an ILP formulation to solve both the
coloring and selection problems simultaneously. For convenience, some notations
in the ILP formulation are listed in Table 3.1.
First, we discuss the ILP formulation when no stitch candidates are generated
in the layout graph. Given a set of input layout features $\{r_1, \dots, r_n\}$, we construct the
layout and end-cut graphs. Every conflict edge $e_{ij}$ is in $CE$, while every end-cut
candidate $ec_{ij}$ is in $EE$. $x_i$ is a binary variable representing the color of $r_i$, and $c_{ij}$ is a
binary variable for conflict edge $e_{ij} \in CE$. To minimize the number of conflicts, our
objective function is to minimize $\sum_{e_{ij} \in CE} c_{ij}$.
To evaluate the number of conflicts, we provide the following constraints:
$$
\begin{cases}
x_i + x_j \le 1 + c_{ij} + ec_{ij} & \text{if } \exists\, ec_{ij} \in EE \\
(1 - x_i) + (1 - x_j) \le 1 + c_{ij} + ec_{ij} & \text{if } \exists\, ec_{ij} \in EE \\
x_i + x_j \le 1 + c_{ij} & \text{if } \nexists\, ec_{ij} \in EE \\
(1 - x_i) + (1 - x_j) \le 1 + c_{ij} & \text{if } \nexists\, ec_{ij} \in EE
\end{cases}
\tag{3.1}
$$
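The role of the end-cut variable $ec_{ij}$ in these constraints can be seen with a toy brute-force checker (a hedged sketch, not the GUROBI-based ILP used in the experiments; conflicts between end-cuts themselves are ignored): a same-color pair on a conflict edge costs one conflict unless its end-cut candidate is selected.

```python
from itertools import product

def min_conflicts(n, conflict_edges, endcut_edges):
    """Brute-force the LELE-EC coloring model of (3.1): x_i in {0, 1} is
    the mask of feature i; selecting end-cut ec_ij excuses a same-color
    pair on conflict edge (i, j), otherwise c_ij = 1 is paid."""
    best = None
    for xs in product((0, 1), repeat=n):
        # independently choose which available end-cuts to select
        for sel in product((0, 1), repeat=len(endcut_edges)):
            chosen = {e for e, s in zip(endcut_edges, sel) if s}
            conflicts = sum(1 for (i, j) in conflict_edges
                            if xs[i] == xs[j] and (i, j) not in chosen)
            if best is None or conflicts < best:
                best = conflicts
    return best
```

For a two-colorable model a conflict triangle costs one conflict, but an end-cut candidate on one of its edges (merging the two features) brings the cost back to zero.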
In other words, with these speed-up techniques, the ILP formulation can achieve
results comparable to formulations without them.
The first speed-up technique is called independent component computation. By
breaking the whole layout graph into several independent components, we
partition the initial layout graph into several small ones. Each component can then
be solved through the ILP formulation independently. Finally, the overall solution
can be taken as the union of the solutions of all components without affecting global
optimality. Note that this is a well-known technique that has been applied in many
previous studies (e.g., [17, 22, 23]).
Our second technique is called bridge computation. A bridge of a graph is
an edge whose removal disconnects the graph into two components. If the two
components are independent, removing the bridge divides the whole problem
into two independent sub-problems. We search for all bridge edges in the layout
graph, then divide the whole layout graph through these bridges. Note that all
bridges can be found in $O(|V| + |E|)$ time, where $|V|$ is the number of vertices and
$|E|$ is the number of edges in the layout graph.
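Bridge finding in O(|V| + |E|) is classical; a compact sketch using one DFS with discovery times and low-links (Tarjan's technique):

```python
def find_bridges(n, edges):
    """Return all bridge edges of an undirected graph on vertices 0..n-1.
    One DFS pass: tree edge (u, v) is a bridge iff low[v] > disc[u], i.e.
    no back edge from v's subtree reaches u or above."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * n          # DFS discovery times
    low = [0] * n            # lowest discovery time reachable
    bridges = []
    timer = [0]

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v, idx in adj[u]:
            if idx == parent_edge:   # don't reuse the edge we came along
                continue
            if disc[v] == -1:        # tree edge
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((u, v))
            else:                    # back edge
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return bridges
```

For two triangles joined by a single edge, only the joining edge is reported as a bridge.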
Our third technique is called End-Cut Pre-Selection. Some end-cut candidates
have no conflict end-cuts. Each end-cut candidate ecij that has no conflict end-cuts
would be pre-selected for the final decomposition results. That is, the features ri
and rj are merged into one feature. This further reduces the problem size of ILP
formulation. End-cut pre-selection can thus be finished in linear time.
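End-cut pre-selection can be sketched with a single union-find pass (the data layout here is hypothetical: candidate ids mapped to feature pairs, plus a list of conflicting candidate pairs):

```python
def preselect_end_cuts(n, end_cuts, conflict_pairs):
    """Pre-select every end-cut candidate that conflicts with no other
    end-cut, merging its two features.  end_cuts maps candidate id ->
    feature pair (i, j); conflict_pairs lists conflicting candidate ids.
    Returns (pre-selected candidate ids, merged feature groups)."""
    in_conflict = set()
    for a, b in conflict_pairs:
        in_conflict.update((a, b))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merged = []
    for ec_id, (i, j) in end_cuts.items():
        if ec_id not in in_conflict:      # no conflicting end-cut: pre-select
            parent[find(i)] = find(j)     # merge the two features
            merged.append(ec_id)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return merged, sorted(groups.values())
```

Each candidate is examined once and union-find operations are near-constant time, so the pass is effectively linear, matching the claim above.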
We implemented our algorithms in C++ and tested them on an Intel Xeon 3.0 GHz Linux
machine with 32 GB RAM. Fifteen benchmark circuits from [1] are used. GUROBI
[24] is chosen as the ILP solver. The minimum coloring spacing $min_s$ is set as 120
for the first ten cases and as 100 for the last five cases, as in [1, 7]. The width
threshold $w_{th}$, which is used in end-cut candidate generation, is set as $dis_m$.
In the first experiment, we show the decomposition results of the ILP formulation.
Table 3.2 compares two ILP formulations: “ILP w/o. stitch” and “ILP w.
stitch.” Here, “ILP w/o. stitch” is the ILP formulation based on the graph without
stitch edges, while “ILP w. stitch” considers stitch insertion in the ILP. Note
that all speed-up techniques are applied to both. Columns “Wire#” and “Comp#”
report the total feature number and the divided component number, respectively.
For each method, we report the number of conflicts, the number of stitches, and
the computational time in seconds (“CPU(s)”). “Cost” is the cost function, set to
conflict# + 0.1 × stitch#.
From Table 3.2, we can see that, compared with “ILP w/o. stitch,” when stitch
candidates are considered in the ILP formulation the cost can be reduced by 2 %,
while the runtime increases by 5 %. It should be noted that stitch insertion has
been shown to be an effective method to reduce the costs for both LELE and
LELELE layout decompositions. However, for LELE-EC layout
decomposition, stitch insertion is not very effective. When the
overlap variability due to the stitches is also considered, stitch insertion for
LELE-EC may not be an effective method.
In the second experiment, we analyze the effectiveness of the proposed speed-up
techniques. Figure 3.14 compares two ILP formulations, “w/o. stitch w/o. speedup”
and “w/o. stitch w. speedup,” where “w/o. stitch w/o. speedup” only applies
independent component computation, while “w. speedup” involves all three speed-up
techniques. Neither considers stitches in the layout graph. From
Fig. 3.14 we can see that with the additional speed-up techniques (bridge computation
and end-cut pre-selection), the runtime can be reduced by around 60 %.
Figure 3.15 demonstrates the similar effectiveness of the speed-up techniques
between “w. stitch w/o. speedup” and “w. stitch w. speedup.” Here stitch candidates
are introduced in the layout graph. We can see that for these two ILP formulations,
bridge computation and end-cut pre-selection reduce the runtime by around
56 %.
Figure 3.16 shows four conflict examples in decomposed layouts, where conflict
pairs are labeled with red arrows. We can observe that some conflicts (see
Fig. 3.16a, c) are introduced by existing neighboring end-cuts. For these
two cases, the possible reason is that the patterns are irregular, so some end-cuts
that are close to each other cannot be merged into a larger one. We can also observe
that some conflicts (see Fig. 3.16b, d) come from via shapes. For these two cases, one
possible reason is that it is harder to find end-cut candidates around vias than
along long wires.
Fig. 3.16 Conflict examples in decomposed results. (a), (c) Conflicts because no additional end-cut can be inserted due to the existing neighboring end-cuts. (b), (d) Conflicts because no end-cut candidates exist between irregular vias
3.3 Layout Decomposition for Quadruple Patterning and Beyond
Fig. 3.17 (a) A common native conflict from triple patterning lithography; (b) the conflict can be
resolved through quadruple patterning lithography
3.3.2 Algorithms
The overall flow of our layout decomposition is summarized in Fig. 3.18: graph
division, followed by color assignment (either the SDP-based algorithm or linear
color assignment), producing the output masks. We first construct a decomposition
graph to transform the original geometric patterns into a graph model. In this way,
the QPLD problem can be formulated as four-coloring on the decomposition graph.
To reduce the problem size, graph division techniques (see Sect. 3.3.2) are applied
to partition the graph into a set of components. Then the color assignment problem
can be solved independently for each component, to minimize both the number of
conflicts and the number of stitches. In the following section, we propose two color
assignment algorithms: a semidefinite programming (SDP) based algorithm, and
linear color assignment.
SDP has been successfully applied to the triple patterning layout decomposition
problem [1, 10]. We will now show that the SDP formulation can be extended to
solve the QPLD problem. To represent four different colors (masks), as illustrated
in Fig. 3.19, we use four unit vectors [25]: $(0, 0, 1)$, $\left(0, \frac{2\sqrt{2}}{3}, -\frac{1}{3}\right)$, $\left(\frac{\sqrt{6}}{3}, -\frac{\sqrt{2}}{3}, -\frac{1}{3}\right)$, and
$\left(-\frac{\sqrt{6}}{3}, -\frac{\sqrt{2}}{3}, -\frac{1}{3}\right)$. We construct the vectors in such a way that the inner product
of any two vectors $\vec{v}_i, \vec{v}_j$ satisfies the following invariant: $\vec{v}_i \cdot \vec{v}_j = 1$ if $\vec{v}_i = \vec{v}_j$;
$\vec{v}_i \cdot \vec{v}_j = -\frac{1}{3}$ if $\vec{v}_i \ne \vec{v}_j$.
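The claimed invariant can be verified numerically (a quick check of the four vectors above):

```python
import math

def qpld_vectors():
    """The four unit vectors encoding four masks: pairwise inner product
    is 1 for identical vectors and -1/3 for distinct ones."""
    s2, s6 = math.sqrt(2), math.sqrt(6)
    return [(0.0, 0.0, 1.0),
            (0.0, 2 * s2 / 3, -1 / 3),
            (s6 / 3, -s2 / 3, -1 / 3),
            (-s6 / 3, -s2 / 3, -1 / 3)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

These are the four corners of a regular tetrahedron inscribed in the unit sphere, which is what makes the inner product take only the two values 1 and −1/3.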
Using the above vector definitions, the QPLD problem can be reformulated in
the following manner:

$$
\min \sum_{e_{ij} \in CE} \left( \frac{3}{4}\,\vec{v}_i \cdot \vec{v}_j + \frac{1}{4} \right) + \frac{3\alpha}{4} \sum_{e_{ij} \in SE} \left( 1 - \vec{v}_i \cdot \vec{v}_j \right)
\tag{3.6}
$$

$$
\text{s.t. } \vec{v}_i \in \left\{ (0, 0, 1),\ \left(0, \tfrac{2\sqrt{2}}{3}, -\tfrac{1}{3}\right),\ \left(\tfrac{\sqrt{6}}{3}, -\tfrac{\sqrt{2}}{3}, -\tfrac{1}{3}\right),\ \left(-\tfrac{\sqrt{6}}{3}, -\tfrac{\sqrt{2}}{3}, -\tfrac{1}{3}\right) \right\}
$$
where the objective function minimizes the number of conflicts and stitches. $\alpha$ is a
user-defined parameter, which is set as 0.1 in this work. After relaxing the discrete
constraints in (3.6) and removing the constants from the objective function, we
obtain the following SDP formulation.
7: function BACKTRACK(t, G′)
8:   if t > size[G′] then
9:     if a better color assignment is found then
10:      Store the current color assignment;
11:    end if
12:  else
13:    for all legal colors c do
14:      G′[t] ← c;
15:      BACKTRACK(t + 1, G′);
16:      G′[t] ← −1;
17:    end for
18:  end if
19: end function
$$
\min \sum_{e_{ij} \in CE} \vec{v}_i \cdot \vec{v}_j \; - \; \alpha \sum_{e_{ij} \in SE} \vec{v}_i \cdot \vec{v}_j
\tag{3.7}
$$
After solving the SDP, we get a set of continuous solutions in a matrix $X$, where
each value $x_{ij}$ in matrix $X$ corresponds to $\vec{v}_i \cdot \vec{v}_j$. If $x_{ij}$ is close to 1, vertices $v_i, v_j$ tend
to be in the same mask (color). A greedy mapping algorithm [1] can then be used to
produce a color assignment, but its performance may not be ideal.

To overcome the limitations of the greedy method, we propose a backtrack-based
algorithm (see Algorithm 7) that considers both the SDP results and graph information
for use in our framework. The backtrack-based method accepts two arguments: the
SDP solution $\{x_{ij}\}$ and a threshold value $t_{th}$. In our work $t_{th}$ is set to 0.9. As stated
before, if $x_{ij}$ is close to 1, the two vertices $v_i$ and $v_j$ tend to be in the same color
(mask). Therefore, we scan all pairs and combine some vertices into one larger
vertex (lines 1–3). This reduces the number of vertices and thus simplifies the graph
(line 4). The simplified graph is called a merged graph [10]. On the merged graph,
the BACKTRACK algorithm searches for an optimal color assignment (lines 7–19).
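The spirit of the BACKTRACK search can be sketched as an exhaustive, pruned 4-coloring that keeps the best (fewest-conflict) assignment. This stand-alone version omits the SDP-guided vertex merging and the cost bookkeeping of the actual Algorithm 7:

```python
def backtrack_coloring(n, conflict_edges, k=4):
    """Exhaustive backtracking k-coloring minimizing the conflict count.
    Branches whose partial cost already reaches the best found are pruned;
    since the partial cost can only grow, pruning never loses the optimum."""
    adj = [[] for _ in range(n)]
    for u, v in conflict_edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    best = {"cost": None, "colors": None}

    def cost_so_far(t):
        # conflicts among the first t (already colored) vertices
        return sum(1 for u in range(t) for v in adj[u]
                   if v < u and color[v] == color[u])

    def backtrack(t):
        if t == n:
            c = cost_so_far(n)
            if best["cost"] is None or c < best["cost"]:
                best["cost"], best["colors"] = c, color[:]
            return
        for c in range(k):
            color[t] = c
            if best["cost"] is None or cost_so_far(t + 1) < best["cost"]:
                backtrack(t + 1)
            color[t] = -1

    backtrack(0)
    return best["cost"], best["colors"]
```

On $K_5$, which has no conflict-free 4-coloring, the search returns the optimal cost of one conflict; on $K_4$ it finds a proper coloring.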
The backtrack-based method may still incur runtime overhead, especially for
complex cases where the SDP solution cannot provide enough merging candidates.
Therefore, a more efficient color assignment method is required. One might think
that the color assignment for quadruple patterning can be solved through the four
color map theorem [26], which states that every planar graph is four-colorable.
However, in emerging technology nodes, the designs are so complex that we observe
many $K_5$ or $K_{3,3}$ structures, where $K_5$ is the complete graph on five vertices and
$K_{3,3}$ is the complete bipartite graph on six vertices. By Kuratowski's theorem [27],
a graph containing such structures is not planar, so it is difficult to apply the
classical four-coloring technique [28].
Here we propose an efficient color assignment algorithm that targets not only
planar graphs, but also general graphs. Our algorithm also runs in linear time, an
improvement over the classical four coloring method [28], which runs in quadratic
time.
Our linear color assignment algorithm, summarized in Algorithm 8, involves
three stages. The first stage is iterative vertex removal. For each vertex $v_i$, we denote
its conflict degree (the number of incident conflict edges) as $d_{conf}(v_i)$ and its stitch
degree (the number of incident stitch edges) as $d_{stit}(v_i)$. The main idea is to identify the
vertices with conflict degree less than 4 and stitch degree less than 2 as non-critical.
Thus, they can be temporarily removed and pushed onto a stack $S$ (lines 1–4). After
coloring the remaining (critical) vertices, each vertex in the stack $S$ is popped one
at a time and assigned a legal color (lines 11–15). This strategy is safe in terms of
the number of conflicts: when a vertex is popped from $S$, there will
always be an available color that does not introduce any conflicts.
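Stage one can be sketched as follows (a simplified stand-alone version; the real algorithm interleaves this with the later coloring stages):

```python
def iterative_vertex_removal(n, conflict_edges, stitch_edges):
    """Repeatedly remove non-critical vertices (conflict degree < 4 and
    stitch degree < 2 among still-alive vertices) and push them onto a
    stack; the remaining 'alive' vertices are the critical ones."""
    conf = [set() for _ in range(n)]
    stit = [set() for _ in range(n)]
    for u, v in conflict_edges:
        conf[u].add(v); conf[v].add(u)
    for u, v in stitch_edges:
        stit[u].add(v); stit[v].add(u)
    alive = set(range(n))
    stack = []
    changed = True
    while changed:
        changed = False
        for v in sorted(alive):
            if len(conf[v] & alive) < 4 and len(stit[v] & alive) < 2:
                alive.remove(v)
                stack.append(v)
                changed = True
    return stack, alive

def pop_and_color(stack, colors, conflict_edges, k=4):
    """Pop removed vertices and give each a color unused by its already
    colored conflict neighbors; by the removal criterion (< k conflict
    neighbors) such a color always exists."""
    adj = {}
    for u, v in conflict_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while stack:
        v = stack.pop()
        used = {colors[u] for u in adj.get(v, ()) if u in colors}
        colors[v] = min(set(range(k)) - used)
    return colors
```

On $K_4$ every vertex has conflict degree 3 < 4, so all vertices are stacked and later receive four distinct colors; on $K_5$ no vertex can be removed, so all five remain critical.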
Fig. 3.20 A decomposition graph with vertices a–e (half pitch labeled)

In the second stage (lines 5–9), all remaining (critical) vertices are
assigned colors one by one. However, color assignment in one specific order may be
stuck at a local optimum, which stems from the greedy nature. For example, given
the decomposition graph in Fig. 3.20a, if the coloring order is a-b-c-d-e and vertex
d greedily selects the grey color, the following vertex e cannot be assigned any
color without a conflict (see Fig. 3.20b). In other words, vertex ordering significantly
impacts the coloring result.
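The order sensitivity is easy to reproduce. The tiny demonstration below uses two colors on a hypothetical six-vertex graph (not the exact Fig. 3.20 layout): one vertex order yields a conflict-free greedy coloring, another yields two conflicts.

```python
def greedy_coloring(order, adj, k):
    """Color vertices in the given order; each vertex takes the color that
    conflicts with the fewest already colored neighbors (ties broken by
    the lowest color index).  Returns (colors, number of conflicts)."""
    colors = {}
    for v in order:
        neighbor_colors = [colors[u] for u in adj[v] if u in colors]
        colors[v] = min(range(k), key=lambda c: neighbor_colors.count(c))
    conflicts = sum(1 for v in colors for u in adj[v]
                    if u in colors and u < v and colors[u] == colors[v])
    return colors, conflicts

# A bipartite "crown" graph: parts {0, 1, 2} and {3, 4, 5},
# vertex i adjacent to every vertex of the other part except its partner.
crown = {0: {4, 5}, 1: {3, 5}, 2: {3, 4},
         3: {1, 2}, 4: {0, 2}, 5: {0, 1}}
```

Coloring part by part (0, 1, 2, 3, 4, 5) two-colors the graph perfectly, while the interleaved order (0, 3, 1, 4, 2, 5) traps the greedy rule into two conflicts, mirroring the a-b-c-d-e example above.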
To alleviate the impact of vertex ordering, we propose two strategies. The first
strategy is called color-friendly rules, as in Definition 3.1. In Fig. 3.20c, all conflict
neighbors of pattern d are labeled inside a grey box. Since the distance between a
and d is within the range $(min_s, min_s + hp)$, a is color-friendly to d. Interestingly,
we discover that for a complex/dense layout, color-friendly patterns tend
to be of the same color. Based on these rules, during linear color assignment, to
determine one vertex's color we consider not only its conflict/stitch neighbors but
also the colors of its color-friendly vertices. Detecting color-friendly vertices is
similar to conflict neighbor detection; thus, it can be finished during decomposition
graph construction without much additional effort.
Definition 3.1 (Color-Friendly). A pattern a is color-friendly to pattern b iff their
distance is larger than $min_s$ but smaller than $min_s + hp$. Here $hp$ is the half pitch,
and $min_s$ is the minimum coloring distance.
Our second strategy is called peer selection, where three different vertex orders
are processed simultaneously, with the best one selected as the final coloring
solution (lines 6–8). Although the color assignment is solved thrice, the total
computational time is still linear since, for each order, the coloring runs in linear
time.
In the third stage (line 10), post-refinement greedily checks each vertex to see
whether the solution can be further improved.
1. SEQUENCE-COLORING: Vertices are assigned colors based on the initial
order.
Algorithm 9 3ROUND-COLORING(vec)
Require: Vector vec containing all vertices;
1: Label each $v_i \in$ vec as UNSOLVED;
2: for all vertices $v_i \in$ vec do ▷ 1st round
3:   $c(v_i) \leftarrow$ a minimum cost color;
4:   Label $v_i$ as SOLVED;
5:   if all four colors have been applied at least once then
6:     break; ▷ End 1st round
7:   end if
8: end for
9: for all UNSOLVED vertices $v_i \in$ vec do ▷ 2nd round
10:  if $v_i$ has only one legal color c then
11:    $c(v_i) \leftarrow c$;
12:    Label $v_i$ as SOLVED;
13:  end if
14: end for
15: for all UNSOLVED vertices $v_i \in$ vec do ▷ 3rd round
16:  $c(v_i) \leftarrow$ a minimum cost color;
17:  Label $v_i$ as SOLVED;
18: end for
19: return $c(v_i), \forall v_i \in$ vec;
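A direct transcription of Algorithm 9 in Python (with a simple cost function counting conflicting colored neighbors; the book's cost also accounts for stitches and color-friendly vertices):

```python
def three_round_coloring(vec, adj, k=4):
    """Algorithm 9 sketch: round 1 greedily colors vertices until all k
    colors have been used once; round 2 fixes vertices that have exactly
    one conflict-free color left; round 3 colors the rest greedily."""
    colors = {}

    def legal(v):
        used = {colors[u] for u in adj.get(v, ()) if u in colors}
        return [c for c in range(k) if c not in used]

    def min_cost_color(v):
        nbr = [colors[u] for u in adj.get(v, ()) if u in colors]
        return min(range(k), key=lambda c: nbr.count(c))

    # 1st round: greedy until all k colors appear at least once
    for v in vec:
        colors[v] = min_cost_color(v)
        if len(set(colors.values())) == k:
            break
    # 2nd round: vertices with a single legal color are forced
    for v in vec:
        if v not in colors and len(legal(v)) == 1:
            colors[v] = legal(v)[0]
    # 3rd round: remaining vertices take a minimum cost color
    for v in vec:
        if v not in colors:
            colors[v] = min_cost_color(v)
    return colors
```

Each vertex is touched a constant number of times, so the whole procedure runs in linear time in the graph size.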
The key thing to note is that the vertices in vec are in nearly descending order of
conflict degree, yet constructing vec takes only linear time.
The details of the method 3ROUND-COLORING are shown in Algorithm 9,
where vertices with fewer coloring choices tend to be resolved first. In other words,
we prefer to first assign a color to a vertex with “less flexibility.” At the beginning,
all vertices are labeled as UNSOLVED (line 1), then the vector vec is scanned three
times in succession. In the first round of scanning, for each vertex vi 2 vec, a greedy
color with minimum cost is assigned (lines 2–4). When all four colors have been
applied at least once, the first round is stopped. In the second round of scanning,
the vertices with only one legal color would be assigned colors. When a vertex
Graph division is a technique that partitions the whole decomposition graph into
a set of components, so that color assignment can be solved independently for
each component. In our framework, the techniques extended from previous work
are summarized as follows: (1) Independent Component Computation [1, 7–10, 21,
22, 29]; (2) Vertex with Degree Less than 3 Removal [1, 7, 9, 10];¹ (3) 2-Vertex-Connected
Component Computation [7, 9, 10].
Another technique, cut removal, has been proven powerful in double patterning
layout decomposition [1, 7, 29]. A cut of a graph is an edge whose removal partitions
the graph into two components. The definition of a cut can be extended to a 2-cut
(3-cut), which is a pair (triplet) of edges whose removal would disconnect the graph.
However, while 1-cut and 2-cut detection can be finished in linear time [7], 3-cut
detection is much more complicated. Here we propose an effective 3-cut detection
method, which can be easily extended to detect any K-cut ($K \ge 3$).
Figure 3.21a shows a graph with a 3-cut $(a\text{–}d, b\text{–}e, c\text{–}f)$; two
components can be derived by removing this 3-cut. After color assignment of the two
components, for each cut edge, if the colors of the two endpoints are different, the
two components can be merged directly. Otherwise, a color rotation operation is
required for one component. For a vertex v in the graph, we denote its color as $c(v)$,
where $c(v) \in \{0, 1, 2, 3\}$. Vertex v is said to be rotated by i if $c(v)$ is changed to
$(c(v) + i) \bmod 4$. It is easy to see that all vertices in one component should be rotated
by the same value, so that no additional conflict is introduced within the component
itself. An example of such a color rotation operation is illustrated in Fig. 3.21b, c,
where the conflict between vertices c and f needs to be removed to interconnect the two
components. Here all the vertices in component 2 are rotated by 1 (see
Fig. 3.21c). We have the following lemma:
Lemma 3.1. In the QPLD problem, color rotation after interconnecting 3-cut does
not increase the number of conflicts.
Proof. We denote the three edges in a 3-cut as $(a_1 b_1, a_2 b_2, a_3 b_3)$. Without loss
of generality, we rotate the colors of $b_1, b_2, b_3$ to remove any conflict derived from
these edges. Since all vertices are rotated by the same value, there are four rotation
options for the whole component. For one edge $\{a_1, b_1\}$, one option is infeasible
¹ In the QPLD problem, the vertices with degree less than 4 are detected and removed temporarily.
Fig. 3.21 (a) A graph with a 3-cut; (b), (c) color rotation: all the vertices of one component are rotated by 1 (colors 0–3)
because it would cause one conflict. At most three options are infeasible on the 3-cut.
Therefore, there will always remain an option that does not introduce any conflict
on the 3-cut.
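Color rotation when stitching two components back together can be sketched as follows (rotation taken mod 4, an assumption consistent with four masks):

```python
def merge_with_rotation(colors1, colors2, cut_edges, k=4):
    """Merge two components joined by cut edges (u in component 1, v in
    component 2).  Try all k rotations of component 2 and keep one that
    leaves every cut edge conflict-free; Lemma 3.1 guarantees such a
    rotation exists whenever the cut has at most k - 1 edges."""
    for r in range(k):
        rotated = {v: (c + r) % k for v, c in colors2.items()}
        if all(colors1[u] != rotated[v] for u, v in cut_edges):
            merged = dict(colors1)
            merged.update(rotated)
            return merged, r
    raise ValueError("no conflict-free rotation (cut larger than k - 1?)")
```

For the Fig. 3.21 situation, where every cut edge initially joins equal colors, rotating component 2 by 1 resolves all three cut edges at once.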
To detect all 3-cuts, we have the following lemma:

Lemma 3.2. If the minimum cut between two vertices $v_i$ and $v_j$ is less than 4, then
$v_i$ and $v_j$ belong to different components when divided by a 3-cut.

Based on Lemma 3.2, we can see that if the cut between vertices $v_i$ and $v_j$ has 4 or
more edges, $v_i$ and $v_j$ should belong to the same component. One straightforward
3-cut detection method is to compute the minimum cuts for all $\{s, t\}$ pairs.
However, for a decomposition graph with n vertices, there are $n(n-1)/2$ pairs of
vertices. Computing all the cut pairs may take prohibitively long and be impractical
for complex layout designs.
Gomory and Hu [30] showed that the cut values between all pairs of vertices can
be computed by solving only $n - 1$ network flow problems on graph G. Furthermore,
they showed that the flow values can be represented by a weighted tree T on the n
vertices, where for any pair of vertices $(v_i, v_j)$, if e is the minimum weight edge on
the path from $v_i$ to $v_j$ in T, then the minimum cut value from $v_i$ to $v_j$ in G is exactly
the weight of e. Such a weighted tree T is called a Gomory–Hu tree (GH-tree). For
example, given the decomposition graph in Fig. 3.22a, the corresponding GH-tree is
shown in Fig. 3.22b, where the value on edge $e_{ij}$ is the minimum cut value between
vertices $v_i$ and $v_j$. Because of Lemma 3.2, to divide the graph through 3-cut removal,
all the edges with value less than 4 are removed. The final three components
are shown in Fig. 3.22c.
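Gusfield's simplification of the Gomory–Hu construction needs only n − 1 max-flow computations and no vertex contraction; a self-contained sketch with an Edmonds–Karp max-flow on a capacity matrix:

```python
from collections import deque

def max_flow(cap, s, t, n):
    """Edmonds-Karp max flow on an n x n capacity matrix.
    Returns (flow value, set of vertices on the source side of a min cut)."""
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path left
            break
        bottleneck = float("inf")    # push flow along the BFS path
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    source_side = {v for v in range(n) if parent[v] != -1}
    return flow, source_side

def gusfield_gh_tree(n, edges):
    """Gusfield's Gomory-Hu tree via n - 1 max-flows: edge
    (i, tree_parent[i]) of the tree has weight tree_weight[i], the
    minimum cut between i and tree_parent[i] in the original graph."""
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    tree_parent = [0] * n
    tree_weight = [0] * n
    for i in range(1, n):
        f, s_side = max_flow(cap, i, tree_parent[i], n)
        tree_weight[i] = f
        for j in range(i + 1, n):
            if j in s_side and tree_parent[j] == tree_parent[i]:
                tree_parent[j] = i
    return tree_parent, tree_weight
```

For a triangle {0, 1, 2} with a pendant vertex 3 attached to 0, the tree edges to 0 carry weight 2 for the triangle vertices and weight 1 for the pendant, so 3-cut removal (here, any edge with weight below the threshold) would split off exactly the weakly connected part.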
Fig. 3.22 (a) Decomposition graph; (b) corresponding GH-tree; (c) components after 3-cut
removal
The procedure of the 3-cut removal is shown in Algorithm 10. We first construct
a GH-tree based on the algorithm by Gusfield [31] (line 1). Dinic’s blocking flow
algorithm [32] is applied to aid in GH-tree construction. All edges in the GH-tree
with weights less than four are then removed (line 2). After solving the connected
components problem (line 3), we can assign colors to each component separately
(lines 4–5). Finally, color rotation is applied to interconnect all 3-cuts back (line 6).
(Sect. 3.3.2), SDP followed by greedy mapping, and linear color assignment
(Sect. 3.3.2), respectively. Here we implement an ILP formulation extended from
the triple patterning work [1]. In SDP+Greedy, the greedy mapping from [1] is
applied. All the graph division techniques, including GH-tree based division, are
applied. The columns “cn#” and “st#” denote the number of conflicts and stitches,
respectively. Column “CPU(s)” is the color assignment time in seconds.
From Table 3.3 we can see that for small cases the ILP formulation achieves the
best performance in terms of conflict number and stitch number. However, for
large cases (S38417, S35932, S38584, S15850) the runtime of ILP increases
dramatically, such that none of them can be finished in 1 h. Compared with ILP,
SDP+Backtrack can achieve near-optimal solutions, i.e., in every case the conflict
number is optimal, while two additional stitches are introduced in only one case.
The SDP+Greedy method achieves a 2× speedup over SDP+Backtrack, but its
performance worsens for complex designs: hundreds of additional
conflicts are reported. The linear color assignment achieves around a 200× speed-up
over SDP+Backtrack, while only 15 % more conflicts and 8 % more stitches
are reported.
We further compare the algorithms for pentuple patterning, that is, K = 5.
To the best of our knowledge there is no exact ILP formulation for pentuple
patterning in the literature. Therefore we consider three baselines, i.e., SDP+Backtrack,
SDP+Greedy, and Linear. All the graph division techniques are applied. Table 3.4
evaluates the six densest cases. We can see that, compared with SDP+Backtrack,
SDP+Greedy achieves around an 8× speed-up, but 15 % more conflicts are
reported. In terms of runtime, linear color assignment achieves 500× and
60× speed-ups over SDP+Backtrack and SDP+Greedy, respectively. In terms of
performance, linear color assignment reports the best conflict number minimization,
but more stitches may be introduced.
Interestingly, we observe that when a layout is multiple patterning friendly,
the color-friendly rules provide a good guideline; thus, linear color assignment
can achieve high performance in terms of the number of conflicts. However,
when a layout is very complex or involves many native conflicts, linear color
assignment reports more conflicts than SDP+Backtrack. One possible reason is that
the color-friendly rules do not model global conflict minimization well, while
both SDP and backtracking provide a global view.
Lastly, Figs. 3.24 and 3.25 compare the performance of different vertex orders,
in terms of the number of conflicts and stitches. Here SEQUENCE-COLORING,
DEGREE-COLORING, and 3ROUND-COLORING denote the coloring through the
three different respective orders. Peer selection is the method proposed in our linear
color assignment. From Fig. 3.24 we can clearly see that peer selection can achieve
fewer conflicts than any single vertex order. This is because, for each test case, the
whole decomposition graph is divided into several components. For each component
one specific order may dominate and then be selected by peer selection. Therefore,
for the whole layout, peer selection would be better than any single vertex ordering
rule.
3.4 Summary
References
1. Yu, B., Yuan, K., Zhang, B., Ding, D., Pan, D.Z.: Layout decomposition for triple patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 1–8 (2011)
2. Ausschnitt, C., Dasari, P.: Multi-patterning overlay control. In: Proceedings of SPIE, vol. 6924
(2008)
3. Lin, B.J.: Lithography till the end of Moore’s law. In: ACM International Symposium on
Physical Design (ISPD), pp. 1–2 (2012)
4. Banerjee, S., Li, Z., Nassif, S.R.: ICCAD-2013 CAD contest in mask optimization and bench-
mark suite. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 271–274 (2013)
5. Mack, C.: Fundamental Principles of Optical Lithography: The Science of Microfabrication.
Wiley, New York (2008)
6. Cork, C., Madre, J.-C., Barnes, L.: Comparison of triple-patterning decomposition algorithms
using aperiodic tiling patterns. In: Proceedings of SPIE, vol. 7028 (2008)
7. Fang, S.-Y., Chen, W.-Y., Chang, Y.-W.: A novel layout decomposition algorithm for triple
patterning lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 1185–1190
(2012)
8. Tian, H., Zhang, H., Ma, Q., Xiao, Z., Wong, M.D.F.: A polynomial time triple patterning
algorithm for cell based row-structure layout. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 57–64 (2012)
9. Kuang, J., Young, E.F.: An efficient layout decomposition approach for triple patterning
lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 69:1–69:6 (2013)
10. Yu, B., Lin, Y.-H., Luk-Pat, G., Ding, D., Lucas, K., Pan, D.Z.: A high-performance triple
patterning layout decomposer with balanced density. In: IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), pp. 163–169 (2013)
11. Zhang, Y., Luk, W.-S., Zhou, H., Yan, C., Zeng, X.: Layout decomposition with pairwise
coloring for multiple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 170–177 (2013)
12. Yu, B., Pan, D.Z.: Layout decomposition for quadruple patterning lithography and beyond. In:
ACM/IEEE Design Automation Conference (DAC), pp. 53:1–53:6 (2014)
13. Ma, Q., Zhang, H., Wong, M.D.F.: Triple patterning aware routing and its comparison with
double patterning aware routing in 14nm technology. In: ACM/IEEE Design Automation
Conference (DAC), pp. 591–596 (2012)
14. Lin, Y.-H., Yu, B., Pan, D.Z., Li, Y.-L.: TRIAD: a triple patterning lithography aware
detailed router. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 123–129 (2012)
15. Tian, H., Du, Y., Zhang, H., Xiao, Z., Wong, M.D.F.: Constrained pattern assignment for
standard cell based triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 178–185 (2013)
16. Yu, B., Xu, X., Gao, J.-R., Pan, D.Z.: Methodology for standard cell compliance and detailed
placement for triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 349–356 (2013)
17. Kahng, A.B., Park, C.-H., Xu, X., Yao, H.: Layout decomposition for double patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 465–472 (2008)
18. Sun, J., Lu, Y., Zhou, H., Zeng, X.: Post-routing layer assignment for double patterning. In:
IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), pp. 793–798
(2011)
19. Cormen, T.T., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge
(1990)
20. Yu, B., Gao, J.-R., Pan, D.Z.: Triple patterning lithography (TPL) layout decomposition using
end-cutting. In: Proceedings of SPIE, vol. 8684 (2013)
21. Xu, Y., Chu, C.: A matching based decomposer for double patterning lithography. In: ACM
International Symposium on Physical Design (ISPD), pp. 121–126 (2010)
22. Yuan, K., Yang, J.-S., Pan, D.Z.: Double patterning layout decomposition for simultaneous
conflict and stitch minimization. In: ACM International Symposium on Physical Design
(ISPD), pp. 107–114 (2009)
23. Yang, J.-S., Lu, K., Cho, M., Yuan, K., Pan, D.Z.: A new graph-theoretic, multi-objective layout
decomposition framework for double patterning lithography. In: IEEE/ACM Asia and South
Pacific Design Automation Conference (ASPDAC), pp. 637–644 (2010)
24. Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2014)
25. Karger, D., Motwani, R., Sudan, M.: Approximate graph coloring by semidefinite program-
ming. J. ACM 45, 246–265 (1998)
26. Appel, K., Haken, W.: Every planar map is four colorable. Part I: discharging. Ill. J. Math.
21(3), 429–490 (1977)
27. Kuratowski, C.: Sur le probleme des courbes gauches en topologie. Fundam. Math. 15(1),
271–283 (1930)
28. Robertson, N., Sanders, D.P., Seymour, P., Thomas, R.: Efficiently four-coloring planar graphs.
In: ACM Symposium on Theory of computing (STOC), pp. 571–575 (1996)
29. Tang, X., Cho, M.: Optimal layout decomposition for double patterning technology. In:
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 9–13 (2011)
30. Gomory, R.E., Hu, T.C.: Multi-terminal network flows. J. Soc. Ind. Appl. Math. 9(4), 551–570
(1961)
31. Gusfield, D.: Very simple methods for all pairs network flow analysis. SIAM J. Comput. 19(1),
143–155 (1990)
32. Dinic, E.A.: Algorithm for solution of a problem of maximum flow in networks with power
estimation. Sov. Math. Dokl 11(5), 1277–1280 (1970)
33. Borchers, B.: CSDP, a C library for semidefinite programming. Optim. Methods Softw. 11,
613–623 (1999)
Chapter 4
Standard Cell Compliance and Placement
Co-Optimization
4.1 Introduction
The TPL layout decomposition problem with conflict and stitch minimization has
been studied extensively in the past few years [1–9], including the early work
presented in Chaps. 2 and 3. However, most existing work suffers from one or
more of the following drawbacks. (1) Because the TPL layout decomposition
problem is NP-hard [3], most of the decomposers are based on approximation
or heuristic methods, possibly leading to extra conflicts being reported. (2) For
each design, since the library only contains a fixed number of standard cells,
layout decomposition involves much redundant work. For example, if
one cell is used hundreds of times in a single design, it would be decomposed
hundreds of times during layout decomposition. (3) Successfully carrying out these
decomposition techniques requires the input layouts to be TPL-friendly. However,
since all these decomposition techniques are applied at a post-place/route stage,
where all the design patterns are already fixed, they lack the ability to resolve some
native TPL conflict patterns, e.g., four-clique conflicts.
It has been observed that most hard-to-decompose patterns originate from the
contact and M1 layers. Figure 4.1 shows two common native TPL conflicts in
the contact and M1 layers, respectively. As shown in Fig. 4.1a, contact layout
within the standard cell may generate some four-clique patterns, which cannot
be decomposed. Meanwhile, if placement techniques are not TPL-friendly, some
boundary metals may introduce native conflicts (see Fig. 4.1b). Since redesigning
indecomposable patterns in the final layout requires high ECO efforts, generating
TPL-friendly layouts, especially in the early design stage, becomes especially
urgent. Through these two examples, we can see that TPL constraints should be
considered in both the standard cell design and placement stages, so that we can
avoid indecomposable patterns in the final layout.
Fig. 4.1 Two native conflicts (in red boxes) from (a) contact layer within a standard cell; (b) M1
layer between adjacent standard cells
4.2 Preliminaries
Our framework assumes a row-structure layout where cells in each row are at
the same height, and power and ground rails go from the very left to the very
right (see Fig. 4.2a). A similar assumption was made in the row-based TPL
layout decomposition in [5] as well. The minimum metal feature width and the
minimum spacing between neighboring metal features are denoted as wmin and
smin , respectively. We define the minimum spacing between metal features among
different rows as drow . If we further analyze the layout patterns in the library, we
observe that the width of a power or ground rail is twice the width of a metal wire
within standard cells [18]. Under the row-structure layout, we have the following
lemma.
Lemma 4.1. There is no coloring conflict between two M1 wires or contacts from
different rows.
Proof. For TPL, the layout will be decomposed into three masks, which means
layout features within the minimum coloring distance are assigned three colors to
Fig. 4.2 (a) Minimum spacing between M1 wires among different rows. (b) Minimum spacing
between M1 wires with the same color
86 4 Standard Cell Compliance and Placement Co-Optimization
increase the pitch between neighboring features. Then, we can see from Fig. 4.2 that
the minimum spacing between M1 features with the same color in TPL is dmin =
2wmin + 3smin. We assume the worst case for drow, which means the standard cell
rows are placed as mirrored cells and allow for no routing channel. Thus, drow =
4wmin + 2smin. We should have drow > dmin, which is equivalent to 2wmin > smin.
This condition can easily be satisfied for the M1 layer. For the same reason, we can
achieve a similar conclusion for the contact layer.
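The arithmetic behind this proof can be checked directly. The sketch below encodes the condition drow > dmin; the wmin/smin values in the usage line are illustrative, not taken from any real design rule deck:

```python
def row_separation_is_safe(w_min: float, s_min: float) -> bool:
    """Check Lemma 4.1's condition d_row > d_min, i.e., 2*w_min > s_min."""
    d_min = 2 * w_min + 3 * s_min   # same-color spacing under TPL (three masks)
    d_row = 4 * w_min + 2 * s_min   # worst case: mirrored rows, no routing channel
    return d_row > d_min            # algebraically equivalent to 2*w_min > s_min

# Typical M1 rules have s_min < 2*w_min, so the lemma holds:
print(row_separation_is_safe(w_min=2.0, s_min=3.0))  # True
```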
Based on the row-structure assumption, the whole layout can be divided into
rows, and layout decomposition or coloring assignment can be carried out for each
row separately. Without loss of generality, for each row, the power/ground rails are
assigned the color 1 (default color). The decomposed results for each row will then
not induce coloring conflicts among different rows. In other words, the coloring
assignment results of the individual rows can be merged without losing optimality.
The overall flow of our proposed framework is illustrated in Fig. 4.3, which consists
of two stages: (1) methodologies for standard cell compliance and (2) TPL-aware
detailed placement. In the first stage, standard cell compliance, we carry out
Fig. 4.3 Overall flow of the methodologies for standard cell compliance and detailed placement
4.3 Standard Cell Compliance 87
standard cell conflict removal, timing analysis, standard cell pre-coloring, and
lookup table generation. After the first stage we can ensure that, for each cell, a
TPL-friendly cell layout and a set of pre-coloring solutions will be provided. In the
second stage, TPL-aware detailed placement, we will discuss how to consider TPL
constraints in the single row placement problem (see Sects. 4.4.1 and 4.4.2), and
global moving (see Sect. 4.4.3).
Note that, since triple patterning lithography constraints are seamlessly inte-
grated into our coherent design flow, we do not need a separate additional step of
layout decomposition. In other words, the output of our framework is a decomposed
layout that resolves cell placement and color assignment simultaneously.
Without considering TPL in standard cell design, the cell library may involve several
cells with native TPL conflicts (see Fig. 4.1a for one example). The inner native
TPL conflict cannot be resolved through either cell shift or layout decomposition.
Additionally, one cell may be applied many times in a single design, implying
that each inner native conflict may cause hundreds of coloring conflicts in the
final layout. To achieve a TPL-friendly layout after the physical design flow, we
should first ensure the standard cell layout compliance for TPL. Specifically, we will
manually remove all four-clique conflicts through standard cell modification. Then,
parasitic extraction and SPICE simulation are applied to analyze the timing impact
for the cell modification.
An example of native TPL conflict is illustrated in Fig. 4.4, where four contacts
introduce an indecomposable four-clique conflict structure. For such cases, we
modify the contact layout into hexagonal close packing, which allows for the
most aggressive cell area shrinkage for a TPL-friendly layout [19]. Note that
after modification, the layout still needs to satisfy the design rules. From the
layout analysis of different cells, we have various ways to remove such four-clique
conflicts. As shown in Fig. 4.4, with slight modification to the original layout,
we can choose either to move contacts connected with power/ground rails or to
shift contacts on the signal paths of the cell. We call these two options case 1
and case 2, respectively, both of which will lead to a TPL-friendly standard cell
layout. Note that although conventional cell migration techniques [20–22] might
be able to automatically shift layout patterns to avoid four-clique patterns, it is
hard to guarantee that the modified layout can maintain good timing performance.
Therefore, in this work, we manually modify the standard cell layout and verify
timing after each shift operation.
Generally, the cell layout design flexibility is beneficial for resolving conflicts
between cells when they are placed next to each other. However, from a circuit
designer’s perspective, we want to achieve little timing variation among various
layout styles of a single cell. Therefore, we need simulation results to demonstrate
negligible timing impact due to layout modification.
The Nangate 45 nm Open Cell Library [18] has been scaled to the 16 nm technology
node. After native TPL conflict detection and layout modification, we carry out the
standard cell level timing analysis. Calibre xRC [23] is used to extract parasitic
information of the cell layout. For each cell, we have original and modified layouts
with case 1 and case 2 options. From the extraction results, we can see that the
source/drain parasitic resistance of transistors varies with the position of contacts,
which directly results from layout modification. We use SPICE simulation, based
on the 16 nm PTM model [24], to characterize different types of gates.
Then, we can get the propagation delay of each gate, which is the average of the
rising and falling delays. We pick the six most commonly used cells to measure the
relative change in the propagation delay due to layout modification (see Fig. 4.5).
It is clear that, for both case 1 and case 2, the timing impact will be within 0.5 % of
the original propagation delay of the gates, which can be regarded as insignificant
timing variation. Based on the case 1 or case 2 options, we will remove all conflicts
among cells of the library with negligible timing impact. Then, we can ensure the
standard cell compliance for triple patterning lithography.
(Figure data: delay degradation in percent, between −2 % and 2 %, for INV_X1, INV_X2, AND2_X1, NAND2_X1, OR_X1, and NOR2_X1, under case 1 and case 2.)
Fig. 4.5 The timing impact from layout modification for different types of gates, including case 1
and case 2
For each type of standard cell, after removing the native TPL conflicts, we seek a set
of pre-coloring solutions. These cell solutions are prepared as a supplement to the
library. In this section, we first describe the cell pre-coloring problem formulation,
then we introduce our algorithms to solve this problem.
Problem Formulation
Given the input standard cell layout, all the stitch candidates are captured through
wire projection [6]. One feature in the layout is divided into two adjacent parts if
one stitch candidate is introduced. Then a constraint graph (CG) is constructed to
represent all input features and all of the stitch candidates. A CG is an undirected
graph where each vertex is associated with one input layout feature. In a CG, there
is a stitch edge iff the two corresponding touching vertices are connected through
one stitch candidate. There is a conflict edge iff two untouched vertices are within
the minimum coloring distance dmin . For example, given an input layout shown in
Fig. 4.6a, five stitch candidates are generated through wire projection. The constraint
graph is illustrated in Fig. 4.6b, where the conflict edges and the stitch edges are
shown as solid and dashed edges, respectively. Note that we forbid stitches
on small features, e.g., contacts, due to printability issues. Unlike previous
stitch candidate generation, we also forbid stitches on boundary metal wires, due to the
observation that boundary stitches tend to cause indecomposable patterns between
two cells.
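As an illustration only, the construction above can be sketched as follows. A real flow compares polygon-to-polygon spacings obtained from wire projection; this sketch approximates each feature (or split half of a feature) by its center point, and all names and the data layout are ours:

```python
from itertools import combinations
from math import hypot

def build_cg(features, stitch_pairs, d_min):
    """Sketch of constraint graph (CG) construction.

    `features` maps a vertex id to the (x, y) center of a layout feature
    (or of one half of a feature split by a stitch candidate);
    `stitch_pairs` lists the vertex pairs created by stitch insertion.
    Stitch edges connect the two touching halves of a split feature;
    conflict edges connect untouched features closer than d_min."""
    stitch_edges = {frozenset(p) for p in stitch_pairs}
    conflict_edges = set()
    for u, v in combinations(features, 2):
        if frozenset((u, v)) in stitch_edges:
            continue  # touching halves of one feature: stitch edge, not conflict
        (x1, y1), (x2, y2) = features[u], features[v]
        if hypot(x1 - x2, y1 - y2) < d_min:
            conflict_edges.add(frozenset((u, v)))
    return conflict_edges, stitch_edges
```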
Based on the constraint graph, the standard cell pre-coloring problem is to
search all possible coloring solutions. At first glance, this problem is similar to cell
level layout decomposition. However, unlike in conventional layout decomposition,
pre-coloring could have more than one solution for each cell. For some complex
cell structures, if we exhaustively enumerate all possible colorings, there may be
thousands of solutions. A large solution size would impact the performance of our
Fig. 4.6 Constraint graph construction and simplification. (a) Input layout and all stitch candi-
dates. (b) Constraint graph (CG) where solid edges are conflict edges and dashed edges are stitch
edges. (c) The simplified constraint graph (SCG) after removing immune features
3: function BACKTRACK(t, G)
4:   if t ≥ size[G] then
5:     Store current color solution;
6:   else
7:     for all legal colors c do
8:       G[t] ← c;
9:       BACKTRACK(t + 1, G);
10:      G[t] ← −1;
11:    end for
12:  end if
13: end function
In the first step, given a SCG, we enumerate all possible coloring solutions. Our
enumeration is based on a backtracking algorithm [25], which usually explores
implicit directed graphs to carry out a systematic search of all solutions.
The details of SCG solution enumeration are shown in Algorithm 11. Given a
SCG, G, a backtracking function, BACKTRACK(0, G) is called to search the whole
graph (line 1). The backtracking is a modified depth-first search of the solution
space (lines 3–13). In line 7, a color c is denoted as legal if, when vertex G[t] is
assigned a color c, no conflicts are introduced, and the total stitch number does not
exceed maxS. It should be mentioned that, since all power/ground rails are assigned
a default color, the colors of the corresponding vertices are assigned before the
backtracking process. For example, given the SCG shown in Fig. 4.6c, if no stitch is
allowed, there are eight solutions (see Fig. 4.7).
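A minimal Python sketch of Algorithm 11, under the simplifying assumption that the SCG is given as explicit conflict and stitch edge lists; the function names and data layout are ours, not the book's:

```python
def enumerate_colorings(n, conflict_edges, fixed=None, max_stitch=0,
                        stitch_edges=()):
    """Enumerate all 3-colorings of a simplified constraint graph (SCG).

    `fixed` pre-assigns colors (e.g., the default color for power/ground
    rails); a stitch edge whose endpoints get different colors costs one
    stitch, and partial solutions exceeding `max_stitch` are pruned."""
    colors = [None] * n
    for v, c in (fixed or {}).items():
        colors[v] = c
    solutions = []

    def stitches():
        return sum(1 for u, v in stitch_edges
                   if colors[u] is not None and colors[v] is not None
                   and colors[u] != colors[v])

    def backtrack(t):
        if t == n:
            solutions.append(tuple(colors))
            return
        if colors[t] is not None:        # pre-assigned (default color)
            backtrack(t + 1)
            return
        for c in range(3):
            # a color is "legal" if no conflict neighbor already uses it
            if any(colors[u] == c for u, v in conflict_edges if v == t):
                continue
            if any(colors[v] == c for u, v in conflict_edges if u == t):
                continue
            colors[t] = c
            if stitches() <= max_stitch:
                backtrack(t + 1)
            colors[t] = None             # undo and try the next color

    backtrack(0)
    return solutions
```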
9: function BRANCH-AND-BOUND(t, si)
10:   if t ≥ size[si] then
11:     if GET-COST() < minCost then
12:       minCost ← GET-COST();
13:     end if
14:   else if LOWER-BOUND() > minCost then
15:     Return;
16:   else if si[t] ≠ −1 then
17:     BRANCH-AND-BOUND(t + 1, si);
18:   else                                ▷ si[t] = −1
19:     for each available color c do
20:       si[t] ← c;
21:       BRANCH-AND-BOUND(t + 1, si);
22:       si[t] ← −1;
23:     end for
24:   end if
25: end function
CG Solution Verification
Until now, we have enumerated all coloring solutions for a SCG. However, not all of
the SCG solutions can achieve a legal layout decomposition in the initial constraint
graph (CG). Therefore, in the second step, CG solution verification is performed
on each generated solution. Since a SCG is a subset of a CG, the verification can
be viewed as layout decomposition with pre-colored features on SCG. If a coloring
solution with stitch number less than maxS for the whole CG can be found, it will
be stored as one pre-coloring solution. The CG solution verification is based on the
branch-and-bound algorithm [25], which is very similar to backtracking, in that a
state space tree is used to solve a problem. However, the differences are twofold. (1)
The branch-and-bound method is used only for optimization problems, i.e., only one
solution is generated. (2) The branch-and-bound algorithm introduces a bounding
function to prune sub-optimal nodes in the search space. That is, at each node of
search space, we calculate a bound on the possible solution. If the bound is worse
than the best solution we have found so far, then we do not need to go to the sub-
space.
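The bounding idea can be sketched as follows, again over explicit edge lists. Here the lower bound is simply the stitch cost of the colored portion (which can only grow as more vertices are colored); this is a weaker bound than a dedicated LOWER-BOUND() routine might compute, and all names are ours:

```python
def verify_on_cg(n, conflict_edges, stitch_edges, precolored, max_stitch):
    """Branch-and-bound sketch of CG solution verification: extend a
    pre-colored SCG solution to the full CG, minimizing the stitch count.
    Returns the best stitch count, or None if it exceeds max_stitch."""
    colors = [precolored.get(v) for v in range(n)]
    best = [max_stitch + 1]              # current upper bound on stitch count

    def stitch_cost():
        return sum(1 for u, v in stitch_edges
                   if colors[u] is not None and colors[v] is not None
                   and colors[u] != colors[v])

    def bb(t):
        if t == n:
            best[0] = min(best[0], stitch_cost())
            return
        if stitch_cost() >= best[0]:     # bound: partial cost only grows
            return
        if colors[t] is not None:        # pre-colored vertex: branch past it
            bb(t + 1)
            return
        for c in range(3):
            if any(colors[w] == c for e in conflict_edges if t in e
                   for w in e if w != t):
                continue
            colors[t] = c
            bb(t + 1)
            colors[t] = None

    bb(0)
    return best[0] if best[0] <= max_stitch else None
```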
The details of the CG solution verification are shown in Algorithm 12, which
processes a given set of SCG coloring solutions S′ = {s′1, s′2, …, s′n} one by one.
For each cell ci in the library, we have generated a set of pre-coloring solutions
Si = {si1, si2, …, siv}. We further pre-compute the decomposability of each cell
pair and store them in a look-up table. For example, assume that two cells, ci and
cj, are assigned the pth and qth coloring solutions, respectively. Then we store into
the look-up table a value LUT(i, p, j, q), which is the minimum distance required
when ci is to the left of cj . If two colored cells can legally be abutted to each other,
the corresponding value would be 0. Otherwise, the value would be the site number
required to keep the two cells decomposable. Meanwhile, for each cell, the stitch
numbers in the different coloring solutions are also stored. Note that during the
look-up table construction, we consider cell flipping and store related values as well.
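A toy illustration of such a look-up table; the cell names, coloring indices, and spacing values here are hypothetical, not taken from the book's library:

```python
# Hypothetical look-up table: maps (left cell, its coloring solution,
# right cell, its coloring solution) -> extra placement sites required
# between the two cells (0 means they may legally abut). Flipped cell
# variants would be stored under additional keys in the same way.
LUT = {
    ("INV_X1", 0, "NAND2_X1", 0): 0,   # compatible colorings: legal abutment
    ("INV_X1", 0, "NAND2_X1", 1): 1,   # one spare site resolves the conflict
}

def min_spacing(ci, p, cj, q):
    """Sites required between cell ci (solution p) and cj (solution q)."""
    return LUT.get((ci, p, cj, q), 0)
```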
We first solve a single row placement, where we determine the orders of all cells on
the row. When the TPL process is not considered, this row-based design problem is
known as the ordered single row (OSR) problem, which has been well studied [26–29]. Here we revisit
the OSR problem with the TPL process consideration. For convenience, Table 4.1
lists the notations used in this section.
Problem Formulation
where LUT(i, p, j, q) is the minimum distance required between (i, p) and (j, q). Based
on these notations, we define the TPL-OSR problem as follows.
Problem 4.2 (TPL Aware Ordered Single Row Problem). Given a single row
placement, we seek a legal placement and cell color assignment such that the half-
perimeter wire-length (HPWL) of all nets and the total stitch number are minimized.
Compared with the traditional OSR problem, the TPL-OSR problem faces two
special challenges: (1) TPL-OSR not only needs to solve the cell placement, but it
also needs to assign appropriate coloring solutions to cells to minimize the stitch
number. In other words, cell placement and color assignment should be solved
simultaneously. (2) In conventional OSR problems, if the sum of all cell widths is
less than the row capacity, it is guaranteed that there would be one legal placement
solution. However, for TPL-OSR problems, since some extra sites may be spared to
resolve coloring conflicts, we cannot calculate the required site number before the
coloring assignment.
In addition, note that compared with the conventional color assignment problem,
in TPL-OSR, the solution space is much larger. That is, to resolve the coloring
conflict between two abutted cells, ci and cj , apart from picking up compatible
coloring solutions, TPL-OSR can seek to flip cells (see Fig. 4.9a) or shift cells (see
Fig. 4.9b).
We propose a graph model that correctly captures the cost of HPWL and the stitch
number. Furthermore, we will show that performing a shortest path algorithm on the
graph model can optimally solve the TPL-OSR problem.
To consider cell placement and cell color assignment simultaneously, we con-
struct a directed acyclic graph G = (V, E) with vertex set V and edge set E, where
V = {0, …, m} × {0, …, N} ∪ {t} and N = v1 + v2 + ⋯ + vn. The vertex in the first
row and first column is defined as vertex s. We can see that each column corresponds
to one site’s start point, and each row is related to one specified color assignment
for one cell. Without loss of generality, we label each row as r.i; p/, if it is related
to cell ci with the pth coloring solution. The edge set E is composed of three sets of
edges: horizontal edges Eh , ending edges Ee , and diagonal edges Ed .
Fig. 4.9 Two techniques for removing conflicts during placement. (a) Flip the cell; (b) shift
the cell
We denote each edge by its start and end point. A legal TPL-OSR solution
corresponds to finding a directed path from vertex s to vertex t. Sometimes one
row cannot handle insertion of all the cells, so we introduce ending edges. With
these ending edges, the graph model is guaranteed to contain a path from s to t.
To simultaneously minimize the HPWL and stitch number, we define the cost
on edges as follows. (1) All horizontal edges have zero cost. (2) Each ending edge
{(r(i, p), m) → t} is labeled with cost (n − i) × M, where M is a large number. (3)
Each diagonal edge {(r(i, p), k) → (r(j, q), k + w(cj) + LUT(i, p, j, q))} is labelled
with the cost ΔWL + α × st(j, q),
where ΔWL is the HPWL increment of placing cj at position k + LUT(i, p, j, q) and
st(j, q) is the stitch number of the qth coloring solution of cj. Here
α is a user-defined parameter for assigning relative importance between the HPWL
and the stitch number. In our framework, α is set to 10. The general structure of G
is shown in Fig. 4.10. Note that for clarity, we do not show the diagonal edges.
One example of the graph model is illustrated in Fig. 4.11, where two cells, c1
and c2 , are to be placed in a row with 5 sites. Each cell has two different coloring
solutions and the corresponding required stitch number. For example, the label (2,1)-0
means c2 is assigned its first coloring solution with no stitch. The graph model
is shown in Fig. 4.11b–d, where each figure shows a different part of the diagonal
edges. Cells c1 and c2 are connected with pin 1 and pin 2, respectively. Therefore,
c1 tends to be on the left side of the row, while c2 tends to be on the right side.
Figure 4.12 gives two shortest path solutions with the same HPWL. Because the
second has a lower stitch number, it would be selected as the solution for the TPL-
OSR problem.
Fig. 4.11 Example for the TPL-OSR problem. (a) Two cells with different coloring solutions to be
placed into a 5-site row; graph models with diagonal edges (b) from the s vertex to the first cell;
(c) from c1_1 to the second cell; (d) from c1_2 to the second cell
Fig. 4.12 Shortest path solutions on the graph model with (a) 1 stitch and (b) 0 stitch
Since G is a directed acyclic graph, the shortest path can be calculated using
a topological traversal of G in O(mnK) steps, where K is the maximal number of pre-
coloring solutions per cell. To apply a topological traversal, a dynamic
programming algorithm is proposed to find the shortest path from the s vertex to the
t vertex.
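A memoized sketch of this shortest-path computation, phrased as a recursion over (cell, previous coloring, site) states rather than an explicit edge list. The wl_cost callback stands in for the ΔWL (wirelength-increment) term and is supplied by the caller, since the HPWL bookkeeping is not spelled out here; all names and the data layout are our assumptions:

```python
import math
from functools import lru_cache

def tpl_osr(widths, stitches, lut, m, wl_cost, alpha=10):
    """DP over the unified graph for a single row of m sites.

    `widths[i]` is the site width of cell i; `stitches[i][q]` the stitch
    count of its qth coloring solution; `lut(i, p, j, q)` the extra sites
    required between consecutive cells; `wl_cost(j, x)` a caller-supplied
    HPWL increment of placing cell j at site x. Returns the minimum of
    alpha * stitches + wirelength cost, or inf if the row cannot fit."""
    n = len(widths)

    @lru_cache(maxsize=None)
    def best(i, p, k):
        if i == n:
            return 0.0
        cost = math.inf
        for q in range(len(stitches[i])):
            gap = 0 if i == 0 else lut(i - 1, p, i, q)
            x = k + gap
            if x + widths[i] <= m:
                # place cell i at site x with coloring q
                cost = min(cost, alpha * stitches[i][q] + wl_cost(i, x)
                           + best(i + 1, q, x + widths[i]))
        if k + 1 <= m:                      # or leave one site empty here
            cost = min(cost, best(i, p, k + 1))
        return cost

    return best(0, 0, 0)
```

With widths [1, 1] in a 3-site row, a coloring pair that needs one spare site still fits, so the DP can prefer the zero-stitch colorings over abutting ones with stitches.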
Fig. 4.13 (a) The first stage to solve color assignment. In this example edge cost only considers
the stitch number minimization. (b) One shortest path solution corresponds to a color assignment
solution
Although the unified graph model can be solved optimally through a shortest path
method in O(mnK), for a practical design where each cell could allow many pre-
coloring solutions, the proposed graph model may still suffer from a long runtime.
Here we present a new two-stage graph model for the TPL-OSR problem. The key
improvement is to decompose the previous unified graph model into two smaller
graphs, one for color assignment and another for cell placement. Solving the new
model will thus provide a fast solution to the TPL-OSR problem.
To solve the example in Fig. 4.11, the first stage graph model is illustrated in
Fig. 4.13a, where the cost of each edge corresponds to the stitch number required
for each cell-color pair .i; p/. Note that in our framework, relative positions among
cells are also considered in the edge cost. A shortest path on the graph corresponds
to a color assignment with minimum stitch number.
Our second stage handles cell placement, and we consider the previous color
assignment solutions here. That is, if in the previous color assignment cells ci−1 and ci
are assigned their pth and qth coloring solutions, then the width of cell ci is changed
from w(i) to w(i) + LUT(i − 1, p, i, q). This way, the extra sites to resolve coloring
conflicts are prepared for cell placement. Based on the updated cell widths, the
graph model in [29] can be directly applied here. The second stage graph model
for the example in Fig. 4.11 is illustrated in Fig. 4.14. Note that since all cells
have been assigned a coloring solution, the graph size is much smaller than that
in Fig. 4.11. As shown in Fig. 4.14b, the shortest path on the graph corresponds to a
cell placement.
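The width-update step just described can be sketched as follows (the names are ours):

```python
def inflate_widths(widths, coloring, lut):
    """After color assignment, grow each cell's width by the spare sites
    needed between it and its (already colored) left neighbor, so that a
    standard OSR placer can then be run unchanged."""
    out = list(widths)
    for i in range(1, len(widths)):
        out[i] += lut(i - 1, coloring[i - 1], i, coloring[i])
    return out
```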
The first graph model can be solved in O(nK), while the second graph model can
be solved in O(mn). Therefore, although the speed-up technique cannot achieve
an optimal solution of the TPL-OSR problem, applying the two-stage graph model
can reduce the complexity from O(mnK) to O(nK + mn).
4.4 TPL Aware Detailed Placement 99
Fig. 4.14 The second stage graph model for the example in Fig. 4.11
Here we consider another single row placement problem similar to the TPL-
OSR, where we determine the initial cell orders. Unlike in TPL-OSR, each cell
is forbidden from moving more than a distance M from its original location. This
new problem is called TPL-OSR with Maximum Displacement. The motivation
to study this new problem is twofold. First, in the previous TPL-OSR problem,
although the two-stage graph model can provide fast solutions by solving
the color assignment and cell placement separately, its solution quality may
not be good enough. For the new problem, we are able to propose a fast and
high performance optimization algorithm. Second, from a design perspective, a
detailed placement technique with maximum displacement constraints is robust
and important in practical situations. For example, if the initial placement is
optimized toward other design metrics, e.g., pin density or routability, limiting cell
displacements can help to maintain these metrics.
Problem Formulation
The problem we solve is finding new locations for all cells that preserve their relative
order. Meanwhile, each cell has a maximum moving distance, M, from its original
location. In other words, each cell ci has 2M + 1 possible new locations within
[x(i) − M, x(i) + M]. Here x(i) is the original position of cell ci, while M is a
user-defined parameter. Based on these notations, the TPL-OSR with maximum
displacement problem is defined as follows:
Inspired by Taghavi et al. [15], our algorithm is based on linear dynamic program-
ming, which means the optimal solution can be found in linear time. The general
idea is to process cells starting from c1 and explore cell pair locations for (c1 , c2 ),
followed by (c2 , c3 ), etc. Once the optimal placements and color assignments
for c1, …, ci−1 are computed, we search for the optimal placement and color
assignment simultaneously for ci . For convenience, Table 4.2 lists some additional
notations used in this linear dynamic programming.
The details of the linear dynamic programming are shown in Algorithm 13. Line
1 initializes the solution costs. The main algorithmic computation takes place in
the loops (lines 2–17). We iteratively explore all cell pairs (ci−1, ci) with different
displacement values and color assignment solutions (lines 2–4). For cell pair
(ci−1, ci) and different combinations of (d1, a1, d2, a2), the best cost is stored in
t[i][d2][a2], while d1 and a1 are stored in d[i][d2][a2] and a[i][d2][a2], respectively
(lines 9–13). Fi−1(d1, a1, d2, a2) is the corresponding cost, which accounts for both
the wirelength impact and the stitch number.
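A sketch of this dynamic program; it runs in time linear in the cell count n (though quadratic in the (2M + 1) × K states per cell), and the wl_cost callback again stands in for the wirelength term, which is an assumption of ours:

```python
import math

def tpl_osr_mdp(x0, widths, stitches, lut, M, wl_cost, alpha=10):
    """Linear DP sketch for TPL-OSR with maximum displacement: cell i may
    move to x0[i] + d for d in [-M, M], keeping the given cell order.
    States are (displacement, coloring) pairs per cell; `wl_cost(i, x)`
    is a caller-supplied wirelength penalty for placing cell i at x."""
    n = len(widths)
    # prev[(d, a)]: best cost with the current cell at x0[i] + d, coloring a
    prev = {(d, a): alpha * stitches[0][a] + wl_cost(0, x0[0] + d)
            for d in range(-M, M + 1) for a in range(len(stitches[0]))}
    for i in range(1, n):
        cur = {}
        for d2 in range(-M, M + 1):
            for a2 in range(len(stitches[i])):
                best = math.inf
                for (d1, a1), c in prev.items():
                    # legality: keep cell order and leave the LUT spacing
                    left_end = x0[i - 1] + d1 + widths[i - 1]
                    if x0[i] + d2 >= left_end + lut(i - 1, a1, i, a2):
                        best = min(best, c)
                cur[(d2, a2)] = (best + alpha * stitches[i][a2]
                                 + wl_cost(i, x0[i] + d2))
        prev = cur
    return min(prev.values())       # inf if no legal placement exists
```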
In this section, we present our overall scheme for the design level TPL-aware
detailed placement. Algorithm 14 summarizes the overall flow. Initially, all rows
are labeled as FREE, which means additional cells can be inserted (line 3). In each
main loop, rows are sorted such that rows with more occupied cells are solved
earlier. For each row rowi, we carry out single row TPL-aware detailed placement
as introduced in Sects. 4.4.1 and 4.4.2, to solve color assignment and cell placement
simultaneously. Note that it may be the case that in one row, we cannot assign all
cells legal positions, due to the extra sites required to resolve coloring conflicts.
If the single row problem ends with unsolved cells, Global Moving is applied to
move some cells to other rows (line 7). The basic idea behind the Global Moving
is to find the “optimal row and site” for a cell in the placement region and remove
some local triple patterning conflicts. For each cell we define its “optimal region”
as the placement site where the HPWL is optimal [30]. Note that one cell can only
be moved to FREE rows. Since some cells in the middle of a row may be moved,
we need to solve the OSR problem to rearrange the cell positions [29]. Note that
since all cells on the row have been assigned colors, cell widths should be updated
to preserve extra space for coloring conflicts (lines 8–9). After solving a row rowi, it
is labeled as BUSY (line 10).
Since the rows are placed and colored one by one sequentially, the solution
obtained within one single row may not be good enough. Therefore, our scheme
is able to repeatedly call the main loop until no significant improvement is achieved
(line 13).
4.5 Experimental Results 103
size for each single row placement, columns “max cell # per row” and “max m per
row” are used. These columns represent the maximum cell module number and the
maximum site number in one row, respectively.
In the first experiment, we demonstrate the effectiveness of our overall TPL-
aware design compared to conventional TPL-aware design. Conventional TPL-
aware design flow consists of standard cell synthesis, placement, and TPL layout
decomposition at post-stage. Our TPL-aware design flow integrates TPL constraints
into standard cell synthesis and detailed placement, and no layout decomposition
is required on the whole chip layout. Table 4.4 compares both flows for the M1
layer of all the benchmarks. Column “Conventional flow” is the conventional
TPL design flow. Encounter is chosen as the placer, and an academic decomposer
[4] is used as our layout decomposer. Column “Our flow” is our proposed TPL-
aware detailed placement. Layout modification and pre-coloring are carried out for
each standard cell, and the optimal graph model is utilized to solve cell placement
and color assignment simultaneously. Note that for each flow, the standard cell
inner native conflicts have been removed through use of our compliance techniques
(see Sect. 4.3). In other words, the conflicts can only theoretically happen on the
boundaries between standard cells.
On the one hand, we can see that in the conventional design flow, even when
each standard cell is TPL-friendly, more than 1000 conflicts are reported on average
in the final decomposed layout. Meanwhile, over 20,000 stitches are introduced on
average for each case. Due to the large number of conflicts and stitches, substantial
effort may be required to manually modify or migrate the layout to resolve the
conflicts. On the other hand, through considering TPL constraints in the early design
stages, our proposed TPL-aware design flow can guarantee zero conflicts. Since
stitch number optimization is considered in both cell pre-coloring and TPL-aware
detailed placement, the stitch number can be reduced by 92.7 % compared with the
traditional flow.
In Sects. 4.4.1 and 4.4.2, we proposed several algorithms for solving TPL-aware
single row detailed placement. In the second experiment, we analyze the perfor-
mance of the proposed algorithms and related speed-up techniques in Table 4.5.
Column “GREEDY” is a greedy detailed placement algorithm [14] whose imple-
mentation is used as our baseline. Although the work in [14] is targeting the
self-aligned double patterning (SADP), the proposed detailed placement algorithm
can be modified to be integrated into our framework. Columns “TPLPlacer” and
“TPLPlacer-2Stage” are detailed placement algorithms with different TPL-OSR
engines. TPLPlacer utilizes the optimal unified graph model, while TPLPlacer-
2Stage uses fast two-stage graph models to solve color assignment and cell
placement iteratively. Column “TPLPlacer-MDP” applies the linear dynamic
programming method (see Sect. 4.4.2) to solve the TPL-OSR with maximum
displacement problem. Here the maximum displacement value, M, is set to 8. For
each algorithm we list several metrics: “ST#”, “WL”, and “CPU(s)”. “ST#” is
the stitch number on the final decomposed layout. “WL” is the total wirelength
difference taken before and after our TPL aware placement, where HPWL is applied
to calculate the total wirelength. Column “CPU(s)” gives the detailed placement
process runtime in seconds.
From column “GREEDY,” we can see that the greedy method is very fast, i.e.,
if a legal solution is found it can be finished in less than 0.01 s. However, in 9 out
of 21 cases it cannot find legal placement solutions. For each illegal result “N/A”
is labeled in the table. These illegal solutions result because GREEDY only shifts
the cells right. Therefore, due to the greedy nature, for a benchmark case with high
cell utilization, it may cause final placement violation. Meanwhile, since the color
assignment is solved through a greedy method as well, it loses the global view
necessary to minimize the stitch number. We can observe that more stitches are
reported for those cases where it finds legal results.
Next, we compare two TPL-OSR algorithms: “TPLPlacer” and “TPLPlacer-
2Stage.” We can see that both of them can yield very similar wire-length improve-
ment (around 1 % wire-length reduction). In “TPLPlacer-2Stage,” the unified graph
is divided into two independent graphs, so the graph size can be reduced. Due
to the smaller graph size, “TPLPlacer-2Stage” can get a 100× speed-up against
“TPLPlacer.” However, “TPLPlacer-2Stage” has a 19 % higher stitch number. This
may be because under the two-stage graph model, placement and color assignment
are optimized separately, so this speed-up technique may lose some optimality in
terms of stitch number.
From column “TPLPlacer-MDP,” we can see that the linear dynamic program-
ming technique has a better trade-off in terms of optimizing wirelength and stitch
number together. That is, “TPLPlacer-MDP” achieves nearly the same wire-length
and stitch number results, compared with “TPLPlacer.” Meanwhile, “TPLPlacer-
MDP” is 14× faster than the unified graph model in “TPLPlacer.” This is because
“TPLPlacer-MDP” is a linear runtime algorithm, while “TPLPlacer” has nearly a
quadratic runtime complexity.
For test case ctl-70, Fig. 4.15 demonstrates three stitch density maps through dif-
ferent detailed placement algorithms: TPLPlacer, TPLPlacer-MDP, and TPLPlacer-
2Stage. The final stitch numbers for these three detailed placement techniques are
335, 330, and 491, respectively. We can see that the density maps in Figs. 4.15a, b
are very similar, which means that speed-up technique TPLPlacer-MDP can achieve
a very comparable result with TPLPlacer. However, another speed-up technique,
TPLPlacer-2Stage, may involve more final stitches (see Fig. 4.15c).
The “TPLPlacer-MDP” is implemented with M = 8. In other words, each cell
ci has 2M + 1 possible new positions within [x(i) − M, x(i) + M]. Since the M
value determines the placement solution space, it greatly impacts the performance
of detailed placement. Therefore, to demonstrate the robustness of “TPLPlacer-
MDP,” it would be interesting to analyze the performance with different M settings.
Figure 4.16 gives such an analysis for test cases alu_70, alu_80 and alu_85.
Fig. 4.15 For test case ctl-70, three different stitch density maps through different detailed
placement algorithms. (a) 335 stitches through TPLPlacer; (b) 330 stitches through TPLPlacer-
MDP; (c) 491 stitches through TPLPlacer-2Stage
Fig. 4.16 TPLPlacer-MDP performance analyses with different M values for alu design cases. (a)
Impact on stitch numbers. (b) Impact on wire-length improvements (WL). (c) Impact on runtimes
From Figs. 4.16a, b we can see that with different M values, “TPLPlacer-MDP”
can achieve similar stitch number and wire-length improvement. It is not hard to
see from Fig. 4.16c that the runtime is related to the M value; i.e., for each test
case, the runtime is nearly a linear function of M. Therefore, we can conclude
that “TPLPlacer-MDP” is very robust and insensitive to the M value. In our implementation, M is set to the small value of 8 to maintain both good speed-up and good performance.
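The windowed dynamic program behind "TPLPlacer-MDP" can be sketched as follows. This is a hedged illustration rather than the book's actual implementation: `site_cost` is a hypothetical placeholder for the combined stitch/wirelength cost of placing cell i at a candidate site, and cells are assumed to keep their left-to-right order in the row.

```python
def row_dp(x, w, M, site_cost):
    """x: original left edges (sorted); w: cell widths.
    Returns the minimum total cost over non-overlapping placements where
    each cell moves at most M sites from its original position, i.e., it
    takes one of 2M+1 candidate positions in [x[i]-M, x[i]+M].
    Runtime is O(n * (2M+1)^2): linear in the cell count for fixed M."""
    offsets = range(-M, M + 1)
    INF = float("inf")
    prev_pos = [x[0] + d for d in offsets]
    dp = [site_cost(0, p) for p in prev_pos]
    for i in range(1, len(x)):
        cur_pos = [x[i] + d for d in offsets]
        ndp = []
        for p in cur_pos:
            # cell i-1 must end (left edge + width) at or before cell i's start
            best = min((dp[j] for j, q in enumerate(prev_pos)
                        if q + w[i - 1] <= p), default=INF)
            ndp.append(best + site_cost(i, p) if best < INF else INF)
        dp, prev_pos = ndp, cur_pos
    return min(dp)
```

With `site_cost` set to the absolute displacement, the DP simply finds the cheapest legalization within the ±M window; in the real placer the cost would also encode stitch counts.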
4.6 Summary
References
1. Cork, C., Madre, J.-C., Barnes, L.: Comparison of triple-patterning decomposition algorithms
using aperiodic tiling patterns. In: Proceedings of SPIE, vol. 7028 (2008)
2. Ghaida, R.S., Agarwal, K.B., Liebmann, L.W., Nassif, S.R., Gupta, P.: A novel methodology
for triple/multiple-patterning layout decomposition. In: Proceedings of SPIE, vol. 8327 (2012)
3. Yu, B., Yuan, K., Zhang, B., Ding, D., Pan, D.Z.: Layout decomposition for triple patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 1–8 (2011)
4. Fang, S.-Y., Chen, W.-Y., Chang, Y.-W.: A novel layout decomposition algorithm for triple
patterning lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 1185–1190
(2012)
5. Tian, H., Zhang, H., Ma, Q., Xiao, Z., Wong, M.D.F.: A polynomial time triple patterning
algorithm for cell based row-structure layout. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 57–64 (2012)
6. Kuang, J., Young, E.F.: An efficient layout decomposition approach for triple patterning
lithography. In: ACM/IEEE Design Automation Conference (DAC), pp. 69:1–69:6 (2013)
7. Yu, B., Gao, J.-R., Pan, D.Z.: Triple patterning lithography (TPL) layout decomposition using
end-cutting. In: Proceedings of SPIE, vol. 8684 (2013)
8. Yu, B., Lin, Y.-H., Luk-Pat, G., Ding, D., Lucas, K., Pan, D.Z.: A high-performance triple
patterning layout decomposer with balanced density. In: IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), pp. 163–169 (2013)
9. Zhang, Y., Luk, W.-S., Zhou, H., Yan, C., Zeng, X.: Layout decomposition with pairwise
coloring for multiple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 170–177 (2013)
10. Liebmann, L., Pietromonaco, D., Graf, M.: Decomposition-aware standard cell design flows to
enable double-patterning technology. In: Proceedings of SPIE, vol. 7974 (2011)
11. Chen, T.-C., Cho, M., Pan, D.Z., Chang, Y.-W.: Metal-density-driven placement for CMP
variation and routability. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 27(12),
2145–2155 (2008)
12. Hu, S., Shah, P., Hu, J.: Pattern sensitive placement perturbation for manufacturability. IEEE
Trans. Very Large Scale Integr. VLSI Syst. 18(6), 1002–1006 (2010)
13. Gupta, M., Jeong, K., Kahng, A.B.: Timing yield-aware color reassignment and detailed
placement perturbation for bimodal cd distribution in double patterning lithography. IEEE
Trans. Comput. Aided Des. Integr. Circuits Syst. 29(8), 1229–1242 (2010)
14. Gao, J.-R., Yu, B., Huang, R., Pan, D.Z.: Self-aligned double patterning friendly configuration
for standard cell library considering placement. In: Proceedings of SPIE, vol. 8684 (2013)
15. Taghavi, T., Alpert, C., Huber, A., Li, Z., Nam, G.-J., Ramji, S.: New placement prediction and
mitigation techniques for local routing congestion. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 621–624 (2010)
16. Ma, Q., Zhang, H., Wong, M.D.F.: Triple patterning aware routing and its comparison with
double patterning aware routing in 14nm technology. In: ACM/IEEE Design Automation
Conference (DAC), pp. 591–596 (2012)
17. Lin, Y.-H., Yu, B., Pan, D.Z., Li, Y.-L.: TRIAD: a triple patterning lithography aware
detailed router. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 123–129 (2012)
18. NanGate FreePDK45 Generic Open Cell Library. http://www.si2.org/openeda.si2.org/projects/
nangatelib (2008)
19. Lucas, K., Cork, C., Yu, B., Luk-Pat, G., Painter, B., Pan, D.Z.: Implications of triple patterning
for 14 nm node design and patterning. In: Proceedings of SPIE, vol. 8327 (2012)
20. Yuan, K., Pan, D.Z.: WISDOM: wire spreading enhanced decomposition of masks in double
patterning lithography. In: IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pp. 32–38 (2010)
21. Fang, S.-Y., Chen, S.-Y., Chang, Y.-W.: Native-conflict and stitch-aware wire perturbation for
double patterning technology. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 31(5),
703–716 (2012)
22. Ghaida, R.S., Agarwal, K.B., Nassif, S.R., Yuan, X., Liebmann, L.W., Gupta, P.: Layout
decomposition and legalization for double-patterning technology. IEEE Trans. Comput. Aided
Des. Integr. Circuits Syst. 32(2), 202–215 (2013)
23. Mentor Graphics.: Calibre verification user’s manual (2008)
24. Predictive Technology Model ver. 2.1. http://ptm.asu.edu (2008)
25. Neapolitan, R., Naimipour, K.: Foundations of Algorithms. Jones & Bartlett Publishers, New
Delhi (2010)
26. Vygen, J.: Algorithms for detailed placement of standard cells. In: IEEE/ACM Proceedings
Design, Automation and Test in Europe (DATE), pp. 321–324 (1998)
27. Kahng, A.B., Tucker, P., Zelikovsky, A.: Optimization of linear placements for wirelength
minimization with free sites. In: IEEE/ACM Asia and South Pacific Design Automation
Conference (ASPDAC), pp. 241–244 (1999)
28. Brenner, U., Vygen, J.: Faster optimal single-row placement with fixed ordering. In:
IEEE/ACM Proceedings Design, Automation and Test in Europe (DATE), pp. 117–121 (2000)
29. Kahng, A.B., Reda, S., Wang, Q.: Architecture and details of a high quality, large-scale ana-
lytical placer. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 891–898 (2005)
30. Goto, S.: An efficient algorithm for the two-dimensional placement problem in electrical circuit
layout. IEEE Trans. Circuits Syst. 28(1), 12–18 (1981)
31. Synopsys IC Compiler. http://www.synopsys.com (2013)
32. Cadence SOC Encounter. http://www.cadence.com (2013)
Chapter 5
Design for Manufacturability with E-Beam
Lithography
5.1 Introduction
As the minimum feature size continues to scale to sub-22 nm, conventional 193 nm
optical photolithography technology is reaching its printability limit. For the near term, multiple patterning lithography (MPL) has become one of the viable lithography techniques for the 22 nm and 14 nm logic nodes [1–4]. In the long run, i.e., for the logic nodes beyond 14 nm, extreme ultraviolet (EUV) lithography, directed self-assembly (DSA), and electron beam lithography (EBL) are promising candidates as next-generation lithography technologies [5]. Currently, both EUV and DSA suffer from technical barriers: the EUV technique has been delayed by tremendous technical issues such as the lack of power sources, resists, and defect-free masks [6, 7]; DSA has the potential only to generate contact or via layers [8].
The EBL system, on the other hand, has been developed for several decades [9]. Compared with traditional lithographic methodologies, EBL has several advantages. (1) An electron beam of charged particles can easily be focused to a nanometer-scale diameter, and thus avoids the diffraction limit of light. (2) The price of a photomask set is becoming unaffordable, especially with the emerging MPL techniques; as a maskless technology, EBL can reduce the manufacturing cost. (3) EBL allows great flexibility for fast turnaround times and late design modifications to correct or adapt a given chip layout. Because of these advantages, EBL is being used in mask making, small-volume LSI production, and R&D to develop technology nodes ahead of mass production.
The conventional EBL system applies the variable shaped beam (VSB) technique [10].
In the VSB mode, the entire layout is decomposed into a set of rectangles, through
a process called layout fracturing. Each rectangle is shot into resist by one electron
beam. The printing process of VSB mode is illustrated in Fig. 5.1a. At first an
electrical gun generates the initial beam, which becomes uniform through a shaping
aperture. Then a second aperture finalizes the target shape with a limited maximum size. Since each pattern needs to be fractured into pieces of rectangles and printed one by one, the VSB mode suffers from a serious throughput problem.
Fig. 5.1 Printing process of conventional EBL system: (a) VSB mode; (b) CP mode
One improved technique in EBL system is called character projection (CP) [10],
where the second aperture is replaced by a stencil. As shown in Fig. 5.1b, in CP
mode some complex shapes, called characters, are prepared on the stencil. The
key idea is that if a pattern is pre-designed on the stencil, it can be printed in one electronic shot; otherwise it needs to be fractured into a set of rectangles and printed one by one through the VSB mode. In this way the CP mode can improve the
throughput significantly. In addition, CP exposure has a good critical dimension
control stability compared with VSB [11]. However, the area constraint of stencil
is the bottleneck. For modern design, due to the numerous distinct circuit patterns,
only limited number of patterns can be employed on the stencil. Those patterns not
contained by stencil are still required to be written by VSB [12]. Thus one emerging
challenge in CP mode is how to pack the characters into the stencil to effectively
improve the throughput.
Many previous works dealt with design optimization for the EBL system. For the VSB mode, [13–16] considered EBL as a complementary lithography technique to print via/cut patterns or additional patterns after multiple patterning mask processes. Fang et al. [17] integrated EBL constraints into an early routing stage to avoid stitching-line-induced bad patterns for a parallel EBL system, and [18, 19] solved the subfield scheduling problem to reduce critical dimension distortion. Kahng et al. [20], Ma et al. [21], Yu et al. [22], and Chan et al. [23] proposed a set of layout/mask fracturing approaches to reduce the VSB shot number. In addition, several works solved the design challenges under the CP technique. Before stencil manufacturing, all design steps should consider character projection, e.g., character-aware library building [24], technology mapping [25], and the character sizing problem [26]. Besides, [27, 28] proposed several character design methods for both via layers and interconnect layers to achieve stencil area efficiency. After stencil
5.2 L-Shape Based Layout Fracturing
For EBL writing, a fundamental step is layout fracturing, where the layout pattern
is decomposed into numerous non-overlapping rectangles. Subsequently, the layout
is prepared and exposed by an EBL writing machine onto the mask or the wafer,
where each fractured rectangle is shot by one VSB.
As the minimum feature size decreases, the number of rectangles in the layout
steadily increases. Highly complex OPC causes both the writing time and data
volume to increase as well. The cost, which scales with both writing time and data
volume, increases as a result. Low throughput thus remains the bottleneck for EBL
writing.
To overcome this manufacturing problem, several optimization methods have
been proposed to reduce the EBL writing time to a reasonable level [32–34]. These optimization methods include both hardware improvements and software speed-ups, e.g., jog alignment, multi-resolution writing, character projection, and the L-shape shot.
One of these, the L-shape shot strategy, is a very simple yet effective approach
to reduce the e-beam mask writing time, thus reducing the mask manufacturing
cost and improving the throughput [32, 33]. This technique can also be applied to
reduce the cost of the lithographic process. Conventional EBL writing is based on
rectangular VSB shots. The electrical gun generates an initial beam, which becomes
uniform through the shaping aperture. Then the second aperture finalizes the target
shape with a limited maximum size. The printing process of the L-shape shot, an improved technique, is illustrated in Fig. 5.2: one additional aperture, the third aperture, is employed to create L-shape shots. To take advantage of this new printing process, a new fracturing methodology is needed to provide L-shapes in the fractured layout. This strategy can potentially reduce the EBL writing time or cost by 50 % if all rectangles are combined into L-shapes. For example, in Fig. 5.3, instead of four rectangles, L-shape fracturing requires only two L-shape shots.
Fig. 5.4 (a) Fracturing with one sliver. (b) Fracturing without any slivers
Note that the layout fracturing problem is different from the general polygon decomposition problem in geometrical science. Taking into account yield and CD control, the minimum width of each shot should be above a certain threshold value δ. A shot whose minimum width is below δ is called a sliver. In layout fracturing, sliver minimization is an important objective [35]. As shown in Fig. 5.4, the two fractured layouts achieve the same shot number, 2. However, because of the sliver, the fractured result shown in Fig. 5.4a is worse than that shown in Fig. 5.4b. It should also be noted that the layout in Fig. 5.4 can be written in one L-shaped shot without a sliver.
Several papers have studied the layout fracturing problem for traditional rectan-
gular layouts [20, 21, 35–37]. Kahng et al. proposed an integer linear programming
(ILP) formulation and some speed-up techniques based on matching [20, 35].
Recently, Ma et al. [21] presented a heuristic algorithm that generates rectangular
shots and further reduces the number of slivers. The L-shape fracturing problem is newer than the rectangular fracturing problem, so only limited prior work exists, mostly describing methodology; no systematic algorithm has been proposed. Sahouria and Bowhill [32] reported initial results showing that L-shape
fracturing can save about an additional 38 % of the shot count, but no algorithmic
details were provided. For the general decomposition problem of polygons into
L-shapes, several heuristic methods have been proposed [38, 39]. However, since
these heuristic methods only consider horizontal decomposition, which would result
in numerous slivers, they cannot be applied to the layout fracturing problem.
This section presents the first systematic study of EBL L-shape fracturing that
considers sliver minimization. We propose two algorithms for the L-shape fracturing
problem. The first, called RM, takes rectangles generated by any previous fracturing
framework and merges them into L-shapes. We then use a maximum-weighted
matching algorithm to find the optimal merging solution, simultaneously minimiz-
ing the shot count and the number of slivers. To further overcome the intrinsic
limitations of rectangular merging, we propose a second fracturing algorithm called
DLF. Through effectively detecting and taking advantage of some special cuts,
DLF can directly fracture the layout into a set of L-shapes in O(n² log n) time.
Experimental results show that our algorithms are very promising for both shot
count reduction and sliver minimization. DLF can even achieve a significant speed-
up compared with previous state-of-the-art rectangular fracturing algorithms [21].
We first introduce some notations and definitions to facilitate the problem formula-
tion. For convenience, we use the term polygon to refer to rectilinear polygons in
the rest of this chapter.
Let P be an input polygon with n vertices. We define the concave vertices as
follows:
Definition 5.1 (Concave Vertex). The concave vertex of a polygon is one at which
the internal angle is 270°.
Let c be the number of concave vertices in P; [40] gave the relationship between n and c: n = 2c + 4. If the number of concave vertices c is odd, polygon P is called an odd polygon; otherwise, P is an even polygon.
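The relation n = 2c + 4 is easy to check numerically. Below is a minimal sketch (not from the book) that counts concave vertices of a simple rectilinear polygon given counter-clockwise: a vertex is concave exactly when the walk turns clockwise there, i.e., the cross product of the incident edge vectors is negative.

```python
def concave_count(poly):
    """poly: list of (x, y) vertices of a simple rectilinear polygon,
    in counter-clockwise order. Returns the number of concave vertices
    (internal angle 270°), detected by a negative cross product."""
    n = len(poly)
    c = 0
    for i in range(n):
        ax, ay = poly[i - 1]
        bx, by = poly[i]
        cx, cy = poly[(i + 1) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross < 0:
            c += 1
    return c

# An L-shape: 6 vertices and exactly 1 concave vertex, so 6 == 2*1 + 4.
L = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
```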
Definition 5.2 (Cut). A cut of a polygon P is a horizontal or vertical line segment
that has at least one endpoint incident on a concave vertex. The other endpoint is
obtained by extending the line segment inside P until it first encounters the boundary
of P.
If both endpoints of a cut are concave vertices in the original polygon, then the cut is called a chord. If a cut has an odd number of concave vertices lying on one side or the other, then the cut is called an odd-cut. If a cut is both an odd-cut and a chord, it is called an odd-chord. These concepts are illustrated in Fig. 5.5, where vertices b, e, h are concave vertices, edges bh and ej are odd-cuts, and edge bh is a chord. Note that bh is also an odd-chord.
Definition 5.3 (L-Shape). An L-shape is a polygon whose shape is in the form of
the letter “L”.
An L-shape can also be viewed as a combination of two rectangles sharing a common coordinate. There are two easy ways to check whether a polygon is an L-shape: we can check whether the number of vertices equals 6, i.e., n = 6, or equivalently whether there is exactly one concave vertex, i.e., c = 1.
Definition 5.4 (Sliver Length). For an L-shape or a rectangle, if the width of its bounding box B is above δ, its sliver length is 0. Otherwise, the sliver length is the length of B.
Problem 5.1 (L-Shape Based Layout Fracturing). Given an input layout speci-
fied by polygons, our goal is to fracture it into a set of L-shapes and/or rectangles.
We also wish to minimize both the number of shots and the sliver length of the fractured shots.
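Definition 5.4 translates directly into a few lines of code. This sketch assumes δ is the minimum printable width threshold from the sliver definition, and that the "width" and "length" of a bounding box are its shorter and longer sides, respectively.

```python
def sliver_length(bbox_w, bbox_h, delta):
    """Sliver length of a shot per Definition 5.4 (a sketch).
    The bounding box width is its shorter side; if that width is at
    least delta the shot is not a sliver (length 0), otherwise the
    sliver length is the longer side of the bounding box."""
    width, length = sorted((bbox_w, bbox_h))
    return 0 if width >= delta else length
```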
Fig. 5.6 Example of RM algorithm. (a) Graph construction. (b) Maximum matching
result. (c) Corresponding rectangular merging
We first construct a merging graph G to represent the relationships among all the
input rectangles. Each vertex in G represents a rectangle. There is an edge between
two vertices if and only if those two rectangles can be merged into an L-shape.
Figure 5.6 shows an example where four rectangles are generated after rectangular
fracturing. The constructed merging graph G is illustrated in Fig. 5.6a, where the
three edges show that there are three ways to generate L-shapes. L-shape merging
can thus be viewed as edge selection from the merging graph G. Note that one
rectangle can only be assigned to one selected edge; that is, no two selected edges
share a common endpoint. For example, rectangle 2 can only belong to one L-shape.
Only one edge can then be chosen between the edge joining rectangles 1 and 2 and the edge joining rectangles 2 and 3.
By utilizing the merging graph, the best edge selection can be solved by finding
a maximum matching. The rectangular merging can therefore be formulated as a
maximum matching problem. In the case of Fig. 5.6, the result of the maximum
matching is illustrated in Fig. 5.6b, and the corresponding L-shape fracturing result
is shown in Fig. 5.6c.
To take sliver minimization into account, we assign weights to the edges to
represent whether the merging would remove one sliver. For example, if there is
still one sliver even after two rectangles vi and vj are merged into one L-shape,
we assign a smaller weight to edge eij . Otherwise, a larger weight is assigned. The
rectangular merging can thus be formulated as maximum weighted matching. Even
in general graphs, maximum weighted matching can be solved in O(nm log n) time [41], where n is the number of vertices and m the number of edges in G.
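To illustrate how RM selects merges, the following sketch solves maximum-weighted matching by brute force on a toy merging graph (fine for a handful of rectangles; a real implementation would use the O(nm log n) algorithm of [41]). The edge weights are illustrative assumptions: a higher weight marks a merge that also removes a sliver.

```python
from itertools import combinations

def max_weight_matching(edges):
    """edges: list of (u, v, weight). Brute-force search over edge
    subsets; a subset is a matching when no two edges share a vertex.
    Returns (best_total_weight, best_matching)."""
    best = (0, [])
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            used = [x for u, v, _ in subset for x in (u, v)]
            if len(used) == len(set(used)):  # no rectangle used twice
                w = sum(wt for _, _, wt in subset)
                if w > best[0]:
                    best = (w, list(subset))
    return best

# Rectangles 1..4, as in the Fig. 5.6 example; edge (i, j, w) means
# merging rectangles i and j forms an L-shape, with a higher weight
# when the merge also removes a sliver (weights here are made up).
edges = [(1, 2, 2), (2, 3, 1), (3, 4, 2)]
```

Here the matching picks the merges 1+2 and 3+4, since rectangle 2 can belong to only one L-shape.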
Although the RM algorithm described above can provide the optimal merging solu-
tion for a given set of rectangles, it may suffer from several limitations. The polygon
is first fractured into rectangles and then merged. This strategy, however, has some
redundant operations. In Fig. 5.3, instead of a complex rectangle generation, one
cut is sufficient for the L-shape fracturing. The rectangular fracturing may also
ignore some internal features of L-shape fracturing, which could affect overall
Fig. 5.7 Five fracturing solutions with the same shot count
they may reduce the L-shape upper bound. We will then perform sliver-aware chord
selection to decompose the original polygon P into a set of sub-polygons. Then for
each sub-polygon, we will perform sliver-aware L-shape fracturing, where odd-cuts
are detected and selected to iteratively cut the polygon into a set of L-shapes. We
differentiate between chords and cuts during polygon fracturing because chords are
special cuts whose endpoints were both concave points in the original polygon. This
helps us to design more efficient algorithms for odd cut/chord detection.
The first step of the DLF algorithm is sliver-aware chord selection. Cutting through
chords decomposes the whole polygon P into a set of sub-polygons, reducing the
problem size. We can prove that cutting through a chord does not increase the L-
shape upper bound Nup .
Lemma 5.2. Decomposing a polygon by cutting through a chord does not increase
the L-shape upper bound number Nup .
Proof. Cut a polygon along a chord, and let c1 and c2 be the number of concave vertices in the two pieces produced. Since c = c1 + c2 + 2, using Lemma 5.1 we have ⌊c1/2⌋ + 1 + ⌊c2/2⌋ + 1 ≤ ⌊c/2⌋ + 1.
Chord selection has been proposed in rectangular fracturing [20, 21]. For L-shape
fracturing, we will specifically select odd-chords, since they can reduce the number
of L-shapes.
Lemma 5.3. Decomposing an even polygon along an odd-chord can reduce the
L-shape upper bound number Nup by 1.
The proof is similar to that for Lemma 5.2. The only difference is that since c is even and c1, c2 are odd, ⌊c1/2⌋ + 1 + ⌊c2/2⌋ + 1 < ⌊c/2⌋ + 1. Note that for an odd polygon, all chords are odd. For an even polygon, Lemma 5.3 provides a guideline for selecting chords. An example is illustrated in Fig. 5.9, which contains two chords bh and hk. Since the number of concave vertices on each side of chord bh is odd (1), bh is an odd-chord. Cutting along bh, as shown in Fig. 5.9a, achieves two L-shape shots. However, a cut along the other chord hk, which is not an odd-chord, would create three shots. Note that, in an odd polygon, although all chords are odd, cutting along them may not reduce Nup; however, Nup is guaranteed not to increase.
For any even polygon P, we propose the following odd-chord search procedure. Each vertex vi is assigned one Boolean parity pi. Starting from an arbitrary vertex with any parity assignment, we proceed clockwise around the polygon. If the next vertex vj is concave, then pj = ¬pi, where pi is the parity of the current vertex vi.
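The parity walk itself is a one-pass loop. This is a minimal sketch of only the stated procedure: the `is_concave` flags (in clockwise vertex order) are assumed to come from the polygon geometry, and how the parities are then used to test odd-cuts (Theorem 5.2) is not reproduced here.

```python
def assign_parity(is_concave):
    """is_concave: list of Booleans, one per vertex in clockwise order.
    Returns the parity p_i of each vertex: starting from p_0 = False,
    the parity flips whenever the next vertex is concave."""
    parity = [False] * len(is_concave)
    for i in range(1, len(is_concave)):
        parity[i] = (not parity[i - 1]) if is_concave[i] else parity[i - 1]
    return parity
```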
which can be performed in polynomial time. It should be noted that if the input
polygon is an even polygon, because of Theorem 5.1, we preferentially choose
odd-chords. The bipartite graph is thus modified by assigning weights to the edges.
Sliver minimization is also integrated into the chord selection. When an odd-chord candidate is detected, we calculate the distance between it and the boundary of the polygon. If the distance is less than δ, cutting along this odd-chord would cause a sliver, so we discard this candidate.
Algorithm 15 LShapeFracturing(P)
Require: Polygon P.
1: if P is L-shape or rectangle then
2: Output P as one of results; return
3: end if
4: Find all odd-cuts;
5: Choose cut cc considering the sliver minimization;
6: if Cannot find legal odd-cut then
7: Generate an auxiliary cut cc;
8: end if
9: Cut P through cc into two polygons P1 and P2;
10: Update one vertex and four edges;
11: LShapeFracturing(P1);
12: LShapeFracturing(P2);
Fig. 5.12 Only one vertex and four edges need to be updated during polygon decomposition
Note that during the polygon decomposition, we do not need to recalculate the
order number and parity for each vertex. Instead, when a polygon is divided into two
parts, we only update one vertex and four edges, maintaining all other information.
If polygon P is cut using odd-cut (a, bc), a new vertex d is generated. For the new vertex d, its order number od = ob and its parity pd = pb. Edge bc is replaced by edges bd and dc. Two edges ad and da are then inserted. The update method is
simple and easy to implement. An example of such an update is shown in Fig. 5.12.
Sliver minimization is integrated into our L-shape fracturing algorithm. In
Algorithm 15, when picking one cut from all detected odd-cuts, we try to avoid
Fig. 5.13 Auxiliary cut generation. (a) Every odd-cut would cause a sliver. (b) Decomposing through an auxiliary cut avoids slivers
creating any slivers. For example, as illustrated in Fig. 5.13a, there are three odd-
cuts, but all of them would create a sliver. Instead of selecting one of these, we
generate an auxiliary cut in the middle (see Fig. 5.13b). The resulting polygon can
be fractured without introducing a sliver. In general, if there are several odd-cuts
that do not cause slivers, we pick the cut using the following rules: (1) We prefer the
cut that partitions the polygon into two balanced sub-polygons; (2) If the polygon is
more horizontal than vertical, we prefer a vertical cut, and vice versa.
Given a polygon with n vertices, finding all concave vertices takes O(n) time. For each concave vertex vi, searching for the cuts that start there requires O(log n) time. Due to Theorem 5.2, checking whether a cut is an odd-cut can be performed in O(1) time; thus finding all odd-cuts can be finished in O(n log n) time. Note that given a polygon with c concave vertices, if no auxiliary cut is generated, the L-shape fracturing can be completed using ⌊c/2⌋ odd-cuts. When auxiliary cuts are applied, at most c − 1 cuts are made to fracture the input polygon. This leads to the following theorem.
Theorem 5.3. The sliver-aware L-shape generation can find a set of L-shapes in O(n² log n) time.
Note that if our objective is only to minimize the shot number, no auxiliary cuts would be introduced. Thus, at most ⌊c/2⌋ + 1 L-shapes would be generated. In other words, the shot number is bounded by the theoretical upper bound Nup.
Speed-Up Technique
We observe that in practice during the execution of Algorithm 15, many odd-cuts
do not intersect. This implies that multiple odd-cuts are compatible, and could be
used to decompose the polygon at the same time. Instead of picking only one odd-cut at a time, we can achieve further speed-up by selecting multiple odd-cuts simultaneously.
If the polygon is an odd polygon, this speed-up is easily implemented. In the odd
polygon, there is only one type of odd-cut: a cut that has an odd number of concave
vertices on each side. Partitioning the polygon along such an odd-cut leaves all other
remaining odd-cuts as odd-cuts. For example, Fig. 5.14a shows an odd polygon,
where all three odd-cuts are compatible and can be chosen simultaneously. By fracturing the polygon along the three odd-cuts, the L-shape fracturing problem is resolved in one step.
However, this speed-up technique cannot be directly applied to an even polygon,
since it may actually increase the shot number. This is because an even polygon
does not guarantee that odd-cuts will remain odd-cuts in the resulting sub-polygons.
For example, in Fig. 5.15a, all six cuts are odd-cuts and compatible. However,
if we use all these compatible cuts for fracturing, we would end up with seven
rectangular shots, which is obviously sub-optimal. To overcome this issue, we
introduce one artificial concave vertex for each even-polygon. This artificial concave
vertex converts the polygon into an odd polygon. As shown in Fig. 5.15b, in the
resulting odd polygon, all compatible odd-cuts can be used for fracturing without
increasing the shot number. Because of Lemma 5.4, this translation is guaranteed
not to increase the total shot number.
Lemma 5.4. Introducing one artificial concave vertex to an even polygon does not
increase the L-shape upper bound Nup .
When employing this speed-up technique, the odd-cut detection only needs to be performed once in most cases, so the DLF algorithm can generally be completed in O(n log n) time in practice.
5.3 OSP for MCC System
to achieve better throughput. In a modern MCC system, there are more than 1300 character projections (CPs) [46]. Since one CP is associated with one stencil, there are more than 1300 stencils in total. The manufacturing of a stencil is similar to mask manufacturing. If each stencil were different, the stencil preparation process could be very time-consuming and expensive. Due to design complexity and cost considerations, different CPs share one stencil design. One example of the MCC printing process is illustrated in Fig. 5.17, where four CPs are bundled to form an MCC system. In this example, the whole wafer is divided into four regions, w1, w2, w3, and w4, and each region is printed through one CP. The whole writing time of the MCC system is determined by the maximum writing time among the four regions. For modern designs, because of the numerous distinct circuit patterns, only a limited number of patterns can be employed on the stencil. Since the area constraint of the stencil is the bottleneck, the stencil should be carefully designed to contain the most repeated cells or patterns.
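The MCC writing-time objective can be stated in a few lines. In this hedged sketch, each pattern instance in a region costs one shot if its character is on the shared stencil (CP mode) and a_i VSB shots otherwise; the total writing time is set by the slowest region. The pattern counts are made up for illustration.

```python
def mcc_writing_time(regions, on_stencil, vsb_shots):
    """regions: list of lists of pattern ids, one list per CP region.
    on_stencil: set of pattern ids available as characters.
    vsb_shots: dict mapping pattern id -> VSB shot count a_i.
    Returns the MCC writing time = max over regions of the region's
    total shot count (1 per CP-mode pattern, a_i per VSB-mode pattern)."""
    def region_time(patterns):
        return sum(1 if p in on_stencil else vsb_shots[p] for p in patterns)
    return max(region_time(r) for r in regions)
```

For example, with pattern 0 on the stencil, a region writing [0, 0, 1] costs 1 + 1 + a_1 shots, and the slowest region dominates.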
Stencil planning is one of the most challenging problems in CP mode and has attracted more and more attention. When blank overlapping is not considered, this problem reduces to a character selection problem, which can be solved through an integer linear programming (ILP) formulation to maximize the system throughput [47]. When the characters can be overlapped to save stencil space, the corresponding stencil planning is referred to as overlapping-aware stencil planning (OSP). As suggested in [34], the OSP problem can be divided into two types: the one-dimensional (1D) problem and the two-dimensional (2D) problem.
In the one-dimensional OSP problem, standard cells with the same height are selected for the stencil. As shown in Fig. 5.18a, each character implements one standard cell, and the enclosed circuit patterns of all the characters have the same height. Note that here we only show the horizontal blanks; the vertical blanks are not represented because they are identical. Yuan et al. [34] proposed a set of heuristics, where the single-row reordering was formulated as a minimum-cost Hamiltonian path problem. Kuang and Young [48] proposed an integrated framework to solve all the sub-problems effectively: character selection, row distribution, single-row
Fig. 5.17 Printing process of MCC system, where four CPs are bundled
Fig. 5.18 Two types of OSP problem: (a) one dimensional problem; (b) two dimensional problem
ordering, and inter-row swapping. Guo et al. [49] proved that single-row reordering can be optimally solved in polynomial time. In addition, [50, 51] assumed that the pattern position in each character can be shifted, and integrated character redesign into the OSP problem.
In the two-dimensional OSP problem, the blank spaces of characters are non-uniform along both the horizontal and vertical directions. In this way, the stencil can contain both complex via patterns and regular wires (see Fig. 5.18b for an example). Yuan et al. [34] solved the problem through a modified floorplanning engine, while [52] further sped up the engine through a clustering technique. Kuang and Young [53] proposed a set of fast and effective deterministic methods to solve this problem.
Compared with the conventional EBL system, the MCC system introduces two main challenges into the OSP problem. (1) The objective is new: in an MCC system the wafer is divided into several regions, and each region is written by one CP, so the new OSP should minimize the maximal writing time over all regions, whereas in the conventional EBL system the objective is simply to minimize the wafer writing time. (2) The stencil for an MCC system can contain more than 4000 characters, so previous methodologies for the EBL system may suffer from a runtime penalty. Moreover, no existing stencil planning work has targeted the MCC system. This section presents a powerful tool, E-BLOW, to overcome both challenges. Our main contributions are summarized as follows:
• We show that both the 1D-OSP and 2D-OSP problems are NP-hard.
• We formulate an integer linear program (ILP) to co-optimize character selection
and physical placement on the stencil. To the best of our knowledge, this is the
first mathematical formulation for both 1D-OSP and 2D-OSP.
• We propose a simplified formulation for 1D-OSP, and prove its rounding lower
bound theoretically.
• We present a successive relaxation algorithm to find a near-optimal solution.
• We design a KD-Tree based clustering algorithm to speed up the 2D-OSP solution.
In this section, we give the problem formulation. For convenience, Table 5.2 lists
the notations used in this chapter. Note that in this section, we denote $[n]$ as the set of
integers $\{1, 2, \ldots, n\}$.
In an MCC system with K CPs, the whole wafer is divided into K regions, and
each region is written by one particular CP. We assume cell extraction [12] has been
resolved first. In other words, a set of n characters $C = \{c_1, \ldots, c_n\}$ has already been
given to the MCC system. For each character $c_i \in C$, its width is $w_i$. Meanwhile, the
writing times through VSB mode and CP mode are $a_i$ and 1, respectively. The stencil
is divided into m rows, and the width of each row is W. For each row j, $b_{ij}$ indicates
whether character $c_i$ is selected on row j:

$$b_{ij} = \begin{cases} 1, & \text{character } c_i \text{ is selected on row } j\\ 0, & \text{otherwise} \end{cases}$$
Different regions have different layout patterns; thus the throughputs would also
be different. For region k ($k \in [K]$), character $c_i \in C$ repeats $t_{ik}$ times. If $c_i$ is
prepared on the stencil, the total writing time of character $c_i$ on region k is $t_{ik} \times 1$.
Otherwise, $c_i$ should be printed through VSB, and its writing time would be $t_{ik} \cdot a_i$.
Therefore, for region k ($k \in [K]$) the writing time $T_k$ is as follows:

$$T_k = \sum_{i \in [n]} t_{ik} a_i - \sum_{i \in [n]} \sum_{j \in [m]} b_{ij} t_{ik} (a_i - 1) \qquad (5.1)$$

where the first term is the writing time using VSB mode, while the second term is
the writing time improvement through CP mode. Therefore, the total writing time
of the MCC system is formulated as follows:

$$T = \max_{k \in [K]} \{T_k\} \qquad (5.2)$$
Based on the notations above, we define the OSP for MCC system problem as
follows.

Problem 5.2 (OSP for MCC System). In an MCC system, we are given a stencil with m
rows, each of width W, together with a set of characters C. We select a
subset of characters from C and place them on the stencil. The objective is to minimize
the MCC system writing time T expressed by Eq. (5.2), while the placement of all
characters is bounded by the outline of the stencil.
For convenience, we use the term OSP to refer to OSP for MCC system in the rest
of this chapter. Note that when the region number K is 1, the MCC system reduces to the
conventional EBL system. Therefore, our proposed methodologies can also handle
the OSP problem for the conventional EBL system.
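As a concrete illustration of Eqs. (5.1) and (5.2), the following sketch computes the system writing time for a toy instance. The nested-list data layout for $t_{ik}$, $a_i$, and $b_{ij}$ is our own choice for illustration, not part of the chapter's implementation.

```python
# Sketch of the MCC writing-time objective in Eqs. (5.1)-(5.2).
# The variable names (t, a, b) mirror the chapter's notation.

def region_time(k, t, a, b):
    """T_k = sum_i t[i][k]*a[i] - sum_i sum_j b[i][j]*t[i][k]*(a[i]-1)."""
    n, m = len(a), len(b[0])
    vsb = sum(t[i][k] * a[i] for i in range(n))          # all-VSB time
    gain = sum(b[i][j] * t[i][k] * (a[i] - 1)            # CP-mode savings
               for i in range(n) for j in range(m))
    return vsb - gain

def system_time(t, a, b, K):
    """T = max over regions of T_k (Eq. (5.2))."""
    return max(region_time(k, t, a, b) for k in range(K))

# Two characters, one region, one row: c0 on the stencil, c1 via VSB.
t = [[5], [3]]   # t[i][k]: repetitions of c_i in region k
a = [4, 6]       # a[i]: VSB writing time of c_i (CP time is 1)
b = [[1], [0]]   # b[i][j]: c_i selected on row j
# T_0 = (5*4 + 3*6) - 5*(4-1) = 38 - 15 = 23
print(system_time(t, a, b, K=1))  # -> 23
```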
In this section we prove that both the 1D-OSP and 2D-OSP problems are NP-hard.
Guo et al. [49] proved that a sub-problem of 1D-OSP, single row ordering, can
be optimally solved in polynomial time. Mak and Chu [51] proved that when
an additional problem, character re-design, is integrated, the extended 1D-OSP
problem is NP-hard. Although a sub-problem of 1D-OSP is in P [49],
while an extended version of 1D-OSP is NP-hard [51], the complexity of 1D-OSP
itself is still an open question. This is the first work proving the complexity of the 1D-OSP
problem. In addition, we will show that even a simpler version of the 1D-OSP
problem, where there is only one row, is NP-hard as well.
To facilitate the proof, we first define a bounded subset sum (BSS) problem as
follows.

Problem 5.3 (Bounded Subset Sum). Given a list of n numbers $\{x_1, \ldots, x_n\}$ and
a number s, where $\forall i \in [n]: 2x_i > x_{max}$ ($= \max_{i \in [n]} |x_i|$), decide whether there is a subset of
the numbers that sums up to s.

For example, given three numbers 1100, 1200, 1413 and s = 2300, we can find
a subset $\{1100, 1200\}$ such that 1100 + 1200 = 2300. Additionally, we
assume that $s > c \cdot x_{max}$, where c is some constant; otherwise the problem can be solved in
$O(n^c)$ time. Besides, without the bounded constraint $\forall i \in [n]: 2x_i > x_{max}$, the
BSS problem becomes the subset sum problem, which is NP-complete [54]. For
simplicity of later explanation, let S denote the set of n numbers. Note that we can
assume that all the numbers are integers.
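Since the BSS instances used as examples here are tiny, the decision question can be checked directly by brute force; this sketch is only meant to make the definition concrete:

```python
from itertools import combinations

def has_subset_sum(nums, s):
    """Decide the (bounded) subset-sum question by exhaustive search;
    exponential, but fine for the small example instances here."""
    return any(sum(c) == s
               for r in range(1, len(nums) + 1)
               for c in combinations(nums, r))

# The example from the text: {1100, 1200} sums to 2300.
print(has_subset_sum([1100, 1200, 1413], 2300))  # -> True
print(has_subset_sum([1100, 1200, 1413], 2500))  # -> False
```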
Lemma 5.5. BSS problem is in NP.
Proof. Given a subset of integers $S' \subseteq S$, we can add them up and verify that
their sum is s in polynomial time.
We further prove that the BSS problem is NP-hard by showing that the 3SAT problem,
which is NP-complete [54], can be reduced to it.
Lemma 5.6. 3SAT $\le_p$ BSS.
Proof. In the 3SAT problem, we are given m clauses $\{C_1, C_2, \ldots, C_m\}$ over n variables
$\{y_1, y_2, \ldots, y_n\}$, where each clause is the OR of three literals. Equation (5.3) gives one
example of 3SAT, where n = 4 and m = 2. Given a 3SAT instance, we construct a BSS
instance in which all numbers in the set S and the target s are written in base 10. Besides,
every constructed number lies strictly between $10^{n+2m}$ and $2 \cdot 10^{n+2m}$, so that
the bounded constraints are satisfied. All the details regarding S and s are defined as
follows:
• In the set S, all integers have $n + 2m + 1$ digits, and the leading digit is
always 1.
• In the set S, we construct two integers $t_i$ and $f_i$ for each variable $y_i$. For both
values, the n digits after the leading "1" serve to indicate the corresponding
variable: the ith of these n digits is set to 1 and all others are 0.
For the next m digits, the jth digit is set to 1 if clause $C_j$ contains the respective
literal. The last m digits are always 0.
• In the set S, we also construct three integers $c_{j1}, c_{j2}$, and $c_{j3}$ for each
clause $C_j$. In $c_{jk}$, where $k \in \{1, 2, 3\}$, the first n digits after the leading "1" are 0, and
the next m digits are all 0 except the jth digit, which is set to k. The last m digits are
all 0 except the jth digit, which is set to 1.
• $s = (n + m) \cdot 10^{n+2m} + s_0$, where $s_0$ is an integer with $n + 2m$ digits:
the first n digits of $s_0$ are 1, the next m digits are all 4, and the last m digits
are all 1.
Based on the above rules, given the 3SAT instance in Eq. (5.3), the constructed
set S and target s are shown in Fig. 5.19. Note that the highest digit achievable is 9,
meaning that no digit will carry over and interfere with other digits.
Claim. The 3SAT instance has a satisfying truth assignment iff the constructed BSS
instance has a subset that adds up to s.
Proof of $\Rightarrow$ Part of Claim If the 3SAT instance has a satisfying assignment, we
can pick a subset containing $t_i$ for each $y_i$ set to true and $f_i$ for each $y_i$ set
to false. We can then reach s by picking the necessary $c_{jk}$ to obtain the 4's
in s. Due to the last m "1"s in s, for each $j \in [m]$ only one element is selected from
$\{c_{j1}, c_{j2}, c_{j3}\}$. Besides, we can see that in total $n + m$ numbers are selected from S.
Proof of $\Leftarrow$ Part of Claim If there is a subset $S' \subseteq S$ that adds up to s, we will
show that it corresponds to a satisfying assignment of the 3SAT instance. $S'$ must
include exactly one of $t_i$ and $f_i$; otherwise the ith digit of $s_0$ cannot be satisfied.
If $t_i \in S'$, in the 3SAT we set $y_i$ to true; otherwise we set it to false. Similarly, $S'$
must include exactly one of $c_{j1}, c_{j2}$, and $c_{j3}$; otherwise the last m digits of s cannot be
satisfied. Therefore, all clauses of the 3SAT instance are satisfied, and 3SAT has a satisfying
assignment.
For instance, given a satisfying assignment of Eq. (5.3), $\langle y_1 = 0, y_2 = 1, y_3 = 0, y_4 = 0 \rangle$,
the corresponding subset $S'$ is $\{f_1 = 110000100,\; t_2 = 101000100,\;
f_3 = 100101000,\; f_4 = 100011100,\; c_{12} = 100002010,\; c_{21} = 100000101\}$. We set
$s = (m + n) \cdot 10^{n+2m} + s_0$, where $s_0 = 11114411$, and then $s = 611114411$. We can
verify that $f_1 + t_2 + f_3 + f_4 + c_{12} + c_{21} = s$.
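The arithmetic of this worked example can be checked mechanically; the numbers below are exactly those listed above:

```python
# Verify the worked 3SAT -> BSS example (n = 4 variables, m = 2 clauses).
subset = [110000100,  # f1
          101000100,  # t2
          100101000,  # f3
          100011100,  # f4
          100002010,  # c12
          100000101]  # c21
n, m = 4, 2
s = (n + m) * 10 ** (n + 2 * m) + 11114411  # = 611114411

print(sum(subset) == s)   # -> True
print(len(subset) == n + m)  # exactly n + m numbers are selected
```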
Combining Lemmas 5.5 and 5.6, we obtain the following theorem.
Theorem 5.4. The BSS problem is NP-complete.
In the following, we will show that even a simpler version of the 1D-OSP problem
is NP-hard. In this simpler version, there is only one row in the stencil, and every
character $c_i \in C$ has the same width w. Besides, for each character, its left blank
and right blank are symmetric.
Definition 5.5 (Minimum Packing). Given a subset of characters $C' \subseteq C$, its
minimum packing is the packing with the minimum stencil length.
Lemma 5.7. Given a set of characters $C = \{c_1, c_2, \ldots, c_n\}$ placed on a single-row
stencil, if for each character $c_i \in C$ both its left and right blanks are $s_i$, then the
minimum packing has the following stencil length:

$$n \cdot w - \sum_{i \in [n]} s_i + \max_{i \in [n]} \{s_i\} \qquad (5.4)$$

Meanwhile, the minimum total writing time in the 1D-OSP is $\sum_{i \in [n]} x_i - s$.
Fig. 5.20 (a) 1D-OSP instance for the BSS instance S = {1100, 1200, 2000} and s = 2300.
(b) The minimum packing has stencil length M + s = 2000 + 2300 = 4300
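Lemma 5.7 can be sanity-checked by brute force: with symmetric blanks, adjacent characters $c_i$ and $c_j$ overlap by $\min(s_i, s_j)$, so the row length of any ordering is $n \cdot w$ minus the sum of adjacent overlaps. The blank values below are hypothetical:

```python
from itertools import permutations

def packing_length(order, w, s):
    """Row length for a given ordering; adjacent characters with
    symmetric blanks overlap by min(s_i, s_j)."""
    total = len(order) * w
    for x, y in zip(order, order[1:]):
        total -= min(s[x], s[y])
    return total

def lemma_5_7(w, s):
    """Closed form of Eq. (5.4): n*w - sum(s_i) + max(s_i)."""
    return len(s) * w - sum(s) + max(s)

w, s = 1000, [900, 900, 700, 400]   # hypothetical symmetric blanks
best = min(packing_length(p, w, s) for p in permutations(range(len(s))))
print(best == lemma_5_7(w, s))  # -> True (both are 2000)
```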
Lemma 5.8. The constructed 1D-OSP instance has a packing of stencil length at most
M + s if and only if the BSS instance has a subset that adds up to s.
Theorem 5.5. 1D-OSP is NP-hard.
Proof. Directly from Lemma 5.8 and Theorem 5.4.
The 1D-OSP problem is a special case of the 2D-OSP problem in which all characters
share the same height. Therefore, from Theorem 5.5 we can see that the 2D-OSP
problem is NP-hard as well.
When each character implements one standard cell, the enclosed circuit patterns of
all characters have the same height. The corresponding OSP problem is called
1D-OSP, which can be viewed as a combination of the character selection and single
row ordering problems [34]. Different from the two-step heuristic proposed in [34], we
show that these two problems can be solved simultaneously through a unified ILP
formulation (5.5).
$$\min \; T \qquad (5.5)$$
$$\text{s.t.} \quad T \ge \sum_{i \in [n]} t_{ik} a_i - \sum_{i \in [n]} \sum_{j \in [m]} b_{ij} t_{ik} (a_i - 1) \quad \forall k \in [K] \qquad (5.5a)$$
$$0 \le x_i \le W - w_i \quad \forall i \in [n] \qquad (5.5b)$$
$$\sum_{j \in [m]} b_{ij} \le 1 \quad \forall i \in [n] \qquad (5.5c)$$
$$x_i + w_{ii'} - x_{i'} \le W (2 + p_{ii'} - b_{ij} - b_{i'j}) \quad \forall i, i' \in [n], \; j \in [m] \qquad (5.5d)$$
$$x_{i'} + w_{i'i} - x_i \le W (3 - p_{ii'} - b_{ij} - b_{i'j}) \quad \forall i, i' \in [n], \; j \in [m] \qquad (5.5e)$$
$$b_{ij}, b_{i'j}, p_{ii'} \in \{0, 1\} \quad \forall i, i' \in [n], \; j \in [m] \qquad (5.5f)$$
In formulation (5.5), W is the stencil width and m is the number of rows. For each
character $c_i$, $w_i$ and $x_i$ are its width and x-position, respectively. If and only if
$c_i$ is assigned to row j, $b_{ij} = 1$; otherwise, $b_{ij} = 0$.
Constraint (5.5a) is derived from Eqs. (5.1) and (5.2). Constraint (5.5b) bounds the
x-position of each character. Constraint (5.5c) ensures that a character can be
inserted into at most one row. Constraints (5.5d) and (5.5e) check the positional
relationship between $c_i$ and $c_{i'}$. Here $w_{ii'} = w_i - o_{ii'}$ and $w_{i'i} = w_{i'} - o_{i'i}$, where
$o_{ii'}$ is the overlap when candidates $c_i$ and $c_{i'}$ are packed together. Only when
$b_{ij} = b_{i'j} = 1$, i.e., both character $c_i$ and character $c_{i'}$ are assigned to row j, is one of
the two constraints (5.5d), (5.5e) active. Besides, all the $p_{ii'}$ values are self-consistent.
For example, for any three characters $c_1, c_2, c_3$ assigned to row j,
i.e., $b_{1j} = b_{2j} = b_{3j} = 1$, if $c_1$ is on the left of $c_2$ ($p_{12} = 0$) and $c_2$ is on the left of
$c_3$ ($p_{23} = 0$), then $c_1$ should be on the left of $c_3$ ($p_{13} = 0$). Similarly, if $c_1$ is on the
right of $c_2$ ($p_{12} = 1$) and $c_2$ is on the right of $c_3$ ($p_{23} = 1$), then $c_1$ should be on the
right of $c_3$ ($p_{13} = 1$) as well.
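The role of W as a big-M constant in (5.5d)-(5.5e) can be illustrated with a small numeric check; all values here are hypothetical:

```python
# Illustrative check of constraints (5.5d)-(5.5e): W acts as a big-M
# constant, so a constraint only binds when b_ij = b_i'j = 1.
W = 100
w_ii = 30      # w_ii' = w_i - o_ii'  (hypothetical)
w_ii_rev = 25  # w_i'i = w_i' - o_i'i (hypothetical)

def feasible(xi, xj, p, bi, bj):
    c1 = xi + w_ii - xj <= W * (2 + p - bi - bj)      # (5.5d): c_i left of c_j
    c2 = xj + w_ii_rev - xi <= W * (3 - p - bi - bj)  # (5.5e): c_j left of c_i
    return c1 and c2

# Different rows (bj = 0): the big-M slack deactivates both constraints.
print(feasible(0, 0, 0, 1, 0))   # -> True
# Same row, p = 0: c_j must sit at least w_ii' to the right of c_i.
print(feasible(0, 40, 0, 1, 1))  # -> True  (40 >= 0 + 30)
print(feasible(0, 10, 0, 1, 1))  # -> False (characters would overlap)
```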
Since ILP is a well-known NP-hard problem, directly solving it may incur a
long runtime penalty. One straightforward speed-up method is to relax the ILP into
the corresponding linear program (LP) by replacing constraints (5.5f) with
the following:

$$0 \le b_{ij}, b_{i'j}, p_{ii'} \le 1 \quad \forall i, i' \in [n], \; j \in [m]$$
It is obvious that the LP solution provides a lower bound to the ILP solution.
However, we observe that the solution of the relaxed LP can look like this: for each
i, $\sum_j b_{ij} = 1$ and all the $p_{ii'}$ are assigned 0.5. Although the objective function is
minimized and all the constraints are satisfied, this LP relaxation provides no useful
information to guide future rounding, i.e., all the character candidates are selected
and no ordering relationship is determined.
To overcome the limitation of the above rounding, E-BLOW proposes a novel
successive rounding framework to search for a near-optimal solution in reasonable
runtime. The main idea is to modify the ILP formulation so that the corresponding
LP relaxation can provide a good lower bound theoretically.
As shown in Fig. 5.21, the overall flow includes five parts: simplified
ILP formulation, successive rounding, fast ILP convergence, refinement, and
post-insertion. In Sect. 5.3.3 the simplified formulation is discussed and its
LP rounding lower bound is proved; the details of successive rounding are then
introduced, followed by the refinement process. At last, to further improve the
performance, the post-swap and post-insertion techniques are discussed.
As discussed above, solving the ILP formulation (5.5) is very time-consuming, and
the related LP relaxation may perform poorly. To overcome the limitations
of (5.5), in this section we introduce a simplified ILP formulation whose LP
relaxation can provide a good lower bound. The simplified formulation is based
on a symmetrical blank (S-Blank) assumption: the blanks of each character are
symmetric, i.e., the left blank equals the right blank. $s_i$ is used to denote the blank of
character $c_i$. Note that for different characters $c_i$ and $c_{i'}$, their blanks $s_i$ and $s_{i'}$ can
be different.
Fig. 5.21 Overall flow of E-BLOW: given the regions and characters information, iterate until the stop criterion is met, then output the 1D stencil
At first glance the S-Blank assumption may lose optimality. However, it provides
several practical and theoretical benefits. (1) In [34] the single row ordering problem
was transformed into the Hamiltonian cycle problem, which is a well-known NP-hard
problem, and even a dedicated solver is quite expensive. In our work, instead of relying
on an expensive solver, under this assumption the problem can be optimally solved
in O(n). (2) Under the S-Blank assumption, the ILP formulation can be effectively
simplified to provide a reasonable rounding bound theoretically. Compared with the
previous heuristic framework [34], the proved rounding bound provides a better
guideline for a global-view search. (3) To compensate for the inaccuracy in the
asymmetrical blank cases, E-BLOW provides a refinement step (see Sect. 5.3.3).
The simplified ILP formulation is shown in Formula (5.6):

$$\max \; \sum_{i \in [n]} \sum_{j \in [m]} b_{ij} \cdot p_i \qquad (5.6)$$
$$\text{s.t.} \quad \sum_{i \in [n]} (w_i - s_i) \cdot b_{ij} \le W - B_j \quad \forall j \in [m] \qquad (5.6a)$$

together with constraints (5.6c)-(5.6d), where $s_{max}$ is the maximum horizontal blank length
over all characters, i.e.,

$$s_{max} = \max_{i \in [n]} \{s_i\}$$
Algorithm 16 SuccRounding( )
Require: ILP formulation (5.6)
1: Set all $b_{ij}$ as unsolved;
2: repeat
3:   Update $p_i$ for all unsolved $b_{ij}$;
4:   Solve the relaxed LP of (5.6);
5:   repeat
6:     $b_{pq} \leftarrow \max\{b_{ij}\}$;
7:     for all $b_{ij} \ge b_{pq} - \beta$ do
8:       if $c_i$ can be assigned to row $r_j$ then
9:         $b_{ij} \leftarrow 1$ and set it as solved;
10:        Update capacity of row j;
11:      end if
12:    end for
13:  until no such $b_{pq}$ can be found
14: until all $b_{ij}$ are solved
Successive Rounding
where $t_k$ is the current writing time of region k, and $t_{max} = \max_{k \in [K]} \{t_k\}$.
By applying the $p_i$, a region k with a longer writing time is weighted more heavily in the
LP formulation. During successive rounding, if character $c_i$ is not assigned to any
row, $p_i$ continues to be updated, so that the total writing time of the whole
MCC system can be minimized.
During successive rounding, in each LP iteration we select some characters into
rows and mark these characters as solved. In the next LP iteration, only unsolved
characters are considered in the formulation; thus the number of unsolved
characters keeps decreasing through the iterations. For four test cases (1M-1
to 1M-4), Fig. 5.22 illustrates the number of unsolved characters in each iteration.
We observe that in early iterations more characters are assigned to rows.
However, when the stencil is almost full, fewer of the $b_{ij}$ can be close to 1. Thus, in
late iterations only a few characters are assigned to the stencil, and the successive
rounding requires more iterations.
To overcome this limitation, so that the number of successive rounding iterations can
be reduced, we present a convergence technique based on a fast ILP formulation. The
basic idea is that when we observe only a few characters being assigned to rows in one
LP iteration, we stop successive rounding early and call fast ILP convergence
Fig. 5.22 Unsolved character number along the LP iterations for test cases 1M-1, 1M-2, 1M-3,
and 1M-4
Fig. 5.23 For test case 1M-1, the solution distribution in the last LP, where most values are close to 0
to assign all remaining characters. Note that in [48] an ILP formulation with a similar idea
was also applied. The details of the ILP convergence are shown in Algorithm 17. The
inputs are the solutions of the successive LP rounding. Besides, $\delta^-$ and $\delta^+$ are two
user-defined parameters. First we check each $b_{ij}$ (lines 1-9). If $b_{ij} < \delta^-$, we assume
that character $c_i$ will not be assigned to row j, and set $b_{ij}$ as solved. Similarly, if
$b_{ij} > \delta^+$, we assign $c_i$ to row j and set $b_{ij}$ as solved. For the unsolved $b_{ij}$ we build
an ILP formulation (5.6) to assign the final rows (lines 10-13).
At first glance the ILP formulation may be expensive to solve. However, we
observe that in our convergence Algorithm 17 the variable number is typically
small. Figure 5.23 illustrates the solution distribution in the last LP formulation; we can
see that most of the values are close to 0. In our implementation $\delta^-$ and $\delta^+$ are set to
0.1 and 0.9, respectively. For this case, although the LP formulation contains more
than 2500 variables, our fast ILP formulation contains only 101 binary variables.
Fig. 5.24 Greedy based single row ordering. (a) At first all candidates are sorted by blank space.
(c) One possible ordering solution where each candidate chooses the right end position. (e) Another
possible ordering solution
Refinement
Refinement is a stage that solves the single row ordering problem [34], which adjusts
the relative locations of the input characters to minimize the total width. Under the
S-Blank assumption, because of Lemma 5.7, this problem can be optimally solved
through the following two-step greedy approach:
1. Sort all characters in decreasing order of blank;
2. Insert the characters one by one, each at either the left end or the right end of
the row.
One example of the greedy approach is illustrated in Fig. 5.24, where four
character candidates A, B, C, and D are to be ordered. In Fig. 5.24a, they are sorted
in decreasing order of blank space. Then all candidates are inserted one by one. From
the second candidate on, each insertion has two options: the left side or the right side of the
already packed candidates. For example, if A is inserted to the right of D, B has two
insertion options: one at the right end (Fig. 5.24b), the other at the left
end (Fig. 5.24d). Given the different choices for candidate B, Fig. 5.24c, e show the
corresponding final solutions. Since from the second candidate on each one has two
choices, this greedy approach generates $2^{n-1}$ possible solutions for n candidates.
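A minimal sketch of this greedy ordering, assuming symmetric blanks and always choosing the right end (one of the $2^{n-1}$ valid choices); by Lemma 5.7 the resulting length matches the closed form of Eq. (5.4):

```python
def greedy_single_row(chars):
    """chars: list of (width, blank) with symmetric blanks.
    Sort by blank in decreasing order and insert each character at one
    end of the row (here always the right end); under the S-Blank
    assumption any end choice yields the optimal length (Lemma 5.7)."""
    order = sorted(chars, key=lambda c: -c[1])
    length = order[0][0]
    for (_, s_prev), (w_cur, s_cur) in zip(order, order[1:]):
        length += w_cur - min(s_prev, s_cur)  # adjacent blanks overlap
    return length

# Hypothetical candidates (width, blank):
chars = [(1000, 400), (1000, 900), (1000, 700), (1000, 900)]
n, w = len(chars), 1000
blanks = [s for _, s in chars]
# Eq. (5.4): n*w - sum(s_i) + max(s_i) = 4000 - 2900 + 900 = 2000
print(greedy_single_row(chars) == n * w - sum(blanks) + max(blanks))  # -> True
```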
For the asymmetrical cases, the optimality does not hold anymore. To compensate
for this loss, E-BLOW includes a refinement stage. For n characters
$\{c_1, \ldots, c_n\}$, single row ordering has $n!$ possible solutions. We avoid enumerating
this huge solution space and take advantage of the ordering from the symmetrical blank
assumption; that is, we pick the best solution among the $2^{n-1}$ possibilities. Note
that although considering $2^{n-1}$ instead of $n!$ options cannot guarantee optimal single
row packing, our preliminary results show that the solution quality loss is negligible
in practice.
The refinement is based on dynamic programming, and the details are shown
in Algorithm 18. Refine(k) generates all possible ordering solutions for the first k
characters $\{c_1, \ldots, c_k\}$. Each ordering solution is represented as a tuple $(w, l, r, O)$, where
Algorithm 18 Refine(k)
Require: k characters $\{c_1, \ldots, c_k\}$;
1: if k = 1 then
2:   Add $(w_1, s^l_1, s^r_1, \{c_1\})$ into S;
3: else
4:   Refine(k − 1);
5:   for each partial solution $(w, l, r, O)$ do
6:     Remove $(w, l, r, O)$ from S;
7:     Add $(w + w_k - \min(s^r_k, l),\; s^l_k,\; r,\; \{c_k, O\})$ into S;
8:     Add $(w + w_k - \min(s^l_k, r),\; l,\; s^r_k,\; \{O, c_k\})$ into S;
9:   end for
10:  if the size of S exceeds the threshold then
11:    Prune inferior solutions in S;
12:  end if
13: end if
w is the total length of the ordering, l is the left blank of the leftmost character, r is the
right blank of the rightmost character, and O is the character order. At the beginning,
an empty solution set S is initialized. If k = 1, an initial solution
$(w_1, s^l_1, s^r_1, \{c_1\})$ is generated (line 2), where $w_1$, $s^l_1$, and $s^r_1$ are the width,
left blank, and right blank of the first character $c_1$. If k > 1, Refine(k)
recursively calls Refine(k − 1) to generate all old partial solutions. All these partial
solutions are then updated by adding candidate $c_k$ (lines 5-9).
We propose pruning techniques to speed up the dynamic programming process.
Let us introduce the concept of inferior solutions. For any two solutions $S_A =
(w_a, l_a, r_a, O_a)$ and $S_B = (w_b, l_b, r_b, O_b)$, we say $S_B$ is inferior to $S_A$ if and only
if $w_a \le w_b$, $l_a \ge l_b$, and $r_a \ge r_b$. Inferior solutions are pruned during the
pruning step (lines 10-12). In our implementation, the threshold is set to 20.
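The following is a compact sketch of Algorithm 18 with the inferiority pruning; the tuple layout mirrors $(w, l, r, O)$, while the input data is hypothetical:

```python
def refine(chars):
    """Dynamic-programming refinement in the spirit of Algorithm 18.
    chars: list of (width, left_blank, right_blank); a partial solution
    is (length, left_blank, right_blank, order)."""
    w0, l0, r0 = chars[0]
    sols = [(w0, l0, r0, [0])]
    for k in range(1, len(chars)):
        wk, lk, rk = chars[k]
        nxt = []
        for w, l, r, order in sols:
            nxt.append((w + wk - min(rk, l), lk, r, [k] + order))  # left end
            nxt.append((w + wk - min(lk, r), l, rk, order + [k]))  # right end
        # Prune inferior solutions: B is inferior to A if A is no longer
        # and exposes end blanks that are no smaller.
        nxt.sort(key=lambda t: (t[0], -t[1], -t[2]))
        sols = []
        for cand in nxt:
            if not any(a[0] <= cand[0] and a[1] >= cand[1] and a[2] >= cand[2]
                       for a in sols):
                sols.append(cand)
    return min(sols)[0]  # best total length

# Hypothetical characters with asymmetric blanks (width, left, right):
print(refine([(1000, 300, 500), (1000, 600, 200), (1000, 400, 400)]))  # -> 2200
```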
Fig. 5.25 Example of maximum weighted matching based post character insertion. (a) Three
additional characters a, b, c and two rows. (b) Corresponding bipartite graph to represent the
relationships among characters and rows
A bipartite graph is constructed to represent the relationships among characters and
rows (see Fig. 5.25b). Each edge is associated with a cost equal to the character's
profit. Using this bipartite graph, the best character insertion can be found by
computing a maximum weighted matching.
Given n additional characters, we search the possible insertion positions in
each row. The total search takes O(nmC) time, where m is the total number of rows
and C is the maximum number of characters in a row. We propose two heuristics
to speed up the search process. First, to reduce n, we only consider additional
characters with high profits. Second, to reduce m, we skip rows with very little
empty space.
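Because the two heuristics keep the instance small, the maximum weighted matching can even be found exhaustively. This sketch assumes, purely for illustration, one insertion slot per row; the profit table is made up:

```python
def best_matching(profit):
    """profit[c][r]: profit of inserting character c into row r, or None
    if it does not fit. Exhaustive maximum weighted matching; fine for
    the small instances left after the two filtering heuristics."""
    rows = len(profit[0]) if profit else 0

    def go(c, used):
        if c == len(profit):
            return 0
        best = go(c + 1, used)  # option: leave character c out
        for r in range(rows):
            if r not in used and profit[c][r] is not None:
                best = max(best, profit[c][r] + go(c + 1, used | {r}))
        return best

    return go(0, frozenset())

# Fig. 5.25-style toy instance: characters a, b, c and two rows
# (edge costs are hypothetical).
print(best_matching([[5, 2], [4, None], [None, 3]]))  # -> 8 (a->row1, c->row2)
```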
Now we consider a more general case: the blank spaces of characters are non-uniform
along both the horizontal and vertical directions. This problem is referred to as the
2D-OSP problem. In [34] the 2D-OSP problem was transformed into a floorplanning
problem. However, several key differences between traditional floorplanning and
OSP were ignored. (1) In OSP there is no wirelength to consider, whereas in
floorplanning wirelength is a major optimization objective. (2) Unlike
complex IP cores, many characters may have similar sizes. (3) A traditional
floorplanner cannot handle the problem size of modern MCC designs.
ILP Formulation
Here we will show that 2D-OSP can be formulated as an integer linear program
(ILP) as well. Compared with 1D-OSP, 2D-OSP is more general: the blank
spaces of characters are non-uniform along both the horizontal and vertical directions.
The 2D-OSP problem can be formulated as ILP (5.9). For
convenience, Table 5.3 lists the notations used in the ILP formulation. The
formulation is motivated by Sutanthavibul et al. [58], but the difference is that
our formulation optimizes both placement constraints and character selection
simultaneously.
Here $a_i$ indicates whether candidate $c_i$ is on the stencil, and $p_{ij}$ and $q_{ij}$ represent the
location relationship between $c_i$ and $c_j$. The number of variables is $O(n^2)$, where
n is the number of characters. We can see that if $a_i = 0$, constraints (5.9b)-(5.9e)
are not active. Besides, it is easy to see that when $a_i = a_j = 1$, for each of the
four possible choices $(p_{ij}, q_{ij}) = (0, 0), (0, 1), (1, 0), (1, 1)$, only one of the four
inequalities (5.9b)-(5.9e) is active. For example, with $(a_i, a_j, p_{ij}, q_{ij}) = (1, 1, 1, 1)$,
only constraint (5.9e) applies, which allows character $c_i$ to be anywhere above
character $c_j$. The other three constraints (5.9b)-(5.9d) are always satisfied for any
permitted values of $(x_i, y_i)$ and $(x_j, y_j)$.
Program (5.9) can be relaxed to a linear program (LP) by replacing constraint (5.9g) with:

$$0 \le p_{ij}, q_{ij}, a_i \le 1$$
Fig. 5.26 Fast packing flow for 2D-OSP, ending with simulated annealing based packing that outputs the 2D stencil
To deal with these limitations of the ILP formulation, a fast packing framework is
proposed (see Fig. 5.26). Given the input character candidates, a pre-filter process
is first applied to remove characters with bad profit [defined in (5.8)]. The
second step is a clustering algorithm to effectively speed up the design process.
It is followed by the final floorplanner, which packs all candidates.
Clustering is a well-studied problem with many works and applications
in VLSI [59-61]. However, previous methodologies cannot be directly applied here.
First, traditional clustering is based on a netlist, which provides all the clustering
options. Generally speaking, a netlist is sparse, but in OSP the connection relationships
are so complex that any two characters can be clustered, giving $O(n^2)$ clustering
options in total. Second, given two candidates $c_i$ and $c_j$, there are several
clustering options; for example, horizontal clustering and vertical clustering may
have different overlapping space.
The details of our clustering procedure are shown in Algorithm 19. The clustering
is repeated until no characters can be further merged. Initially all candidates are
sorted by $profit_i$, so candidates with larger shot number reduction tend to be
clustered. Then clustering (lines 3-8) is carried out, where we iteratively search for
character pairs $(c_i, c_j)$ with similar blank spaces, profits, and sizes. Character $c_i$ is
said to be similar to $c_j$ if the following conditions are satisfied:

$$\begin{cases} \max\{|w_i - w_j|/w_j,\; |h_i - h_j|/h_j\} \le bound\\ \max\{|sh_i - sh_j|/sh_j,\; |sv_i - sv_j|/sv_j\} \le bound\\ |profit_i - profit_j|/profit_j \le bound \end{cases} \qquad (5.10)$$

where $w_i$ and $h_i$ are the width and height of $c_i$, and $sh_i$ and $sv_i$ are the horizontal space
and vertical space of $c_i$, respectively. In our implementation, bound is set to 0.2. We
can see that in clustering, the sizes, blanks, and profits are all considered.
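Condition (5.10) translates directly into a predicate; the dictionary-based character records are an assumption of this sketch, not the chapter's data structure:

```python
BOUND = 0.2  # as in the chapter's implementation

def similar(ci, cj, bound=BOUND):
    """Condition (5.10): c_i and c_j are cluster candidates when their
    sizes, blank spaces, and profits all lie within `bound` relative
    difference. Each character is a dict with keys w, h, sh, sv, profit
    (a hypothetical data layout)."""
    return (max(abs(ci["w"] - cj["w"]) / cj["w"],
                abs(ci["h"] - cj["h"]) / cj["h"]) <= bound and
            max(abs(ci["sh"] - cj["sh"]) / cj["sh"],
                abs(ci["sv"] - cj["sv"]) / cj["sv"]) <= bound and
            abs(ci["profit"] - cj["profit"]) / cj["profit"] <= bound)

c1 = dict(w=100, h=80, sh=20, sv=15, profit=50)
c2 = dict(w=110, h=82, sh=22, sv=14, profit=55)  # all within 20 %
c3 = dict(w=200, h=80, sh=20, sv=15, profit=50)  # width differs by 50 %
print(similar(c1, c2), similar(c1, c3))  # -> True False
```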
Fig. 5.27 (a) Nine character candidates {c1, ..., c9} distributed by horizontal and vertical blanks and organized in a KD-Tree. (b) Candidates scanned (c1-c5) during the region search
For each candidate $c_i$, finding an available $c_j$ may take O(n) time, and the
complexities of horizontal clustering and vertical clustering are both $O(n^2)$. The
complexity of the whole procedure is therefore $O(n^2)$, where n is the number of
candidates.
A KD-Tree [62] is used to speed up the process of finding available pairs $(c_i, c_j)$. It
provides fast O(log n) region searching operations while keeping the time for insertion
and deletion small: insertion, O(log n); deletion of the root, $O(n^{(k-1)/k})$; deletion
of a random node, O(log n). Using a KD-Tree, the complexity of Algorithm 19 can
be reduced to O(n log n). For instance, given nine character candidates $\{c_1, \ldots, c_9\}$,
the corresponding KD-Tree is shown in Fig. 5.27a. For convenience, the
characters are distributed only based on horizontal and vertical spaces, and the edges
of the KD-Tree are labeled as well. To search for candidates with space similar to $c_2$
(see the shaded region of Fig. 5.27a), scanning all candidates may need O(n) time,
where n is the total candidate number. Under the KD-Tree structure, however, this
search can be resolved in O(log n). In particular, all candidates scanned
($c_1$-$c_5$) are illustrated in Fig. 5.27b.
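A minimal 2-d tree with an axis-aligned range query is enough to reproduce the region search described above; this is a from-scratch sketch, not the KD-Tree implementation of [62]:

```python
def build(points, depth=0):
    """Build a minimal 2-d tree over (horizontal blank, vertical blank)
    pairs, alternating split axes by depth."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def range_search(node, lo, hi, out):
    """Report all points inside the axis-aligned box [lo, hi],
    descending only into subtrees that can intersect the box."""
    if node is None:
        return
    point, axis, left, right = node
    if all(lo[d] <= point[d] <= hi[d] for d in (0, 1)):
        out.append(point)
    if lo[axis] <= point[axis]:   # box may extend into the left subtree
        range_search(left, lo, hi, out)
    if point[axis] <= hi[axis]:   # box may extend into the right subtree
        range_search(right, lo, hi, out)

# Hypothetical blank-space coordinates:
pts = [(10, 40), (12, 42), (30, 15), (11, 39), (50, 50)]
tree = build(pts)
found = []
range_search(tree, (9, 38), (13, 43), found)
print(sorted(found))  # -> [(10, 40), (11, 39), (12, 42)]
```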
In [34], the 2D-OSP is transformed into a fixed-outline floorplanning problem:
if a character candidate lies outside the fixed outline, the character is not
selected onto the stencil.
Table 5.4 Comparison on 1D-OSP benchmarks; each "T / char#" column pair reports the system
writing time and the number of characters placed on the stencil for one of the four compared
approaches

Test   n     K    T          char#   T          char#   T          char#   T          char#
1D-1   1000  1    64,891     912     50,809     926     19,095     940     19,479     940
1D-2   1000  1    99,381     884     93,465     854     35,295     864     34,974     866
1D-3   1000  1    165,480    748     152,376    749     69,301     757     67,209     766
1D-4   1000  1    193,881    691     193,494    687     92,523     703     93,766     703
1M-1   1000  10   63,811     912     53,333     926     39,026     938     36,800     945
1M-2   1000  10   104,877    884     95,963     854     77,997     864     75,303     874
1M-3   1000  10   172,834    748     156,700    749     138,256    758     132,773    774
1M-4   1000  10   200,498    691     196,686    687     176,228    698     173,620    711
1M-5   4000  10   274,992    3604    255,208    3629    204,114    3660    201,492    3681
1M-6   4000  10   437,088    3341    417,456    3346    357,829    3382    348,007    3420
1M-7   4000  10   650,419    3000    644,288    2986    568,339    3016    559,655    3070
1M-8   4000  10   820,013    2756    809,721    2734    731,483    2760    721,149    2818
Avg.   –     –    270,680.4  1597.6  259,958.3  1594.0  209,123.8  1611.7  205,352.3  1630.7
Ratio  –     –    1.32       0.98    1.27       0.98    1.02       0.99    1.0        1.0
Fig. 5.28 The comparison of e-beam system writing times between E-BLOW-0 and E-BLOW-1
Although LP solvers are more expensive than the deterministic heuristics [48], the runtime of
E-BLOW is reasonable: each case can be finished in 20 s on average.
We further demonstrate the effectiveness of the fast ILP convergence and post-insertion.
We denote by E-BLOW-0 the version of E-BLOW without these two techniques and
by E-BLOW-1 the version with them. Figures 5.28 and 5.29
compare E-BLOW-0 and E-BLOW-1 in terms of system writing time and runtime,
respectively. From Fig. 5.28 we can see that applying fast ILP convergence and
post-insertion can effectively improve E-beam system throughput: on average a 9 %
system writing time reduction is achieved. In addition, Fig. 5.29 demonstrates
the performance of the fast ILP convergence. We can see that in 11 out of 12 test
cases, the fast ILP convergence effectively reduces E-BLOW CPU time. The
possible reason for the slowdown in case 1D-4 is that when fast convergence is
called, if there are still many unsolved $b_{ij}$ variables, the ILP solver may suffer from
runtime overhead. However, if more successive rounding iterations are
applied before ILP convergence, the runtime can be reduced.
For 2D-OSP, Table 5.6 gives a similar comparison. For each algorithm, we
also record "T", "char #," and "CPU(s)", with the same meanings as
in Table 5.4. Compared with E-BLOW, although the greedy algorithm is faster,
its design results introduce 41 % more system writing time. Furthermore,
compared with E-BLOW, although the framework in [34] puts 2 % more characters
onto the stencil, it incurs 15 % more system writing time. The possible reason is that
in E-BLOW the characters with similar writing time are clustered together. The
clustering method helps to speed up the packing, so E-BLOW is 28× faster
than [34]. In addition, after clustering the character number is reduced; with a
smaller solution space, the simulated annealing engine can more easily reach a better
solution in terms of system writing time.
From both tables we can see that, compared with [34], E-BLOW achieves a
better trade-off between runtime and system throughput.
We further compare E-BLOW with the ILP formulations (5.5) and (5.9).
Although for both OSP problems the ILP formulations can find optimal solutions
theoretically, they may suffer from runtime overhead. Therefore, we randomly
generate nine small benchmarks, five for 1D-OSP ("1T-x") and four for 2D-OSP
("2T-x"). The sizes of all character candidates are set to 40 × 40 μm. For the 1D-OSP
benchmarks, the row number is set to 1, and the row length is set to 200.
The comparisons are listed in Table 5.7, where column "candidate#" is the number
152
of character candidates. “ILP” and “E-BLOW” represent the ILP formulation and
our E-BLOW framework, respectively. In ILP formulation, column “binary#” gives
the binary variable number. For each mode, we report “T”, “char#,” and “CPU(s)”,
where “T” is E-Beam system writing time, “char#” is character number on final
stencil, and “CPU(s)” is the runtime. Note that in Table 5.7 the ILP solutions are
optimal.
Let us first compare E-BLOW with the ILP formulation for the 1D cases (1T-1, …, 1T-5).
E-BLOW achieves the same results as the ILP formulations, and it is very fast:
all cases finish within 0.2 s. Although the ILP formulation can achieve optimal
results, it is so slow that a case with 14 character candidates (1T-5) cannot be
solved within 1 h. Next, let us compare E-BLOW with the ILP formulation for the 2D cases (2T-
1, …, 2T-4). For the 2D cases the ILP formulations are so slow that a case with 12
character candidates cannot finish within 1 h. E-BLOW is fast, but with some penalty
in solution quality.
Although the number of integer variables in each case is not huge, we find that
in the ILP formulations, the solutions of the corresponding LP relaxations are far
from integral. Therefore, the expensive search methods of the ILP solver may incur
unacceptable runtimes. These cases show that the ILP formulations cannot be directly
applied to the OSP problem, since in an MCC system the character number may be as
large as 4000.
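The scalability issue can be made concrete with a toy exhaustive search. This is a stand-in for exact optimization, not the formulations (5.5) or (5.9) themselves: a hypothetical one-row stencil where we pick the subset of characters that fits a row capacity and maximizes saved writing time. Even this idealized optimal search enumerates 2^n candidate subsets, already 4096 at 12 characters and hopeless at 4000.

```python
from itertools import combinations

def brute_force_osp(times, capacity):
    """Exhaustive 'ILP-style' search for a toy one-row stencil:
    choose the subset of characters whose total size fits the row
    capacity and maximizes saved writing time. Optimal, but it
    enumerates all 2^n subsets.
    """
    n = len(times)
    best, nodes = 0.0, 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            nodes += 1
            total = sum(times[i] for i in subset)
            if total <= capacity:
                best = max(best, total)
    return best, nodes

times = [3, 5, 7, 2, 4, 6, 1, 8, 9, 2, 5, 3]   # 12 candidates
best, nodes = brute_force_osp(times, capacity=20)
print(nodes)        # 2^12 = 4096 subsets already at 12 characters
# At the MCC scale of 4000 characters the search space is 2^4000:
print(len(str(2 ** 4000)))  # a number with over 1200 decimal digits
```

This is why the heuristic stages of E-BLOW (relaxation, rounding, clustering, annealing) are needed instead of exact search at realistic instance sizes.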
5.4 Summary
In this chapter we have introduced two types of EBL systems, VSB mode and
CP mode, and proposed a set of optimization techniques to overcome their throughput
limitations.
For the VSB mode EBL system, we developed L-shape based layout fracturing for
VSB shot number and sliver minimization. The rectangular merging (RM)-based
algorithm is optimal for a given set of rectangular fractures. However, we show
that the direct L-shape fracturing (DLF) algorithm has superior performance by
directly decomposing the original layouts into a set of L-shapes. Compared to the
previous state-of-the-art rectangular fracturing with RM, DLF obtained the
best results in all metrics: shot count, sliver length, and runtime.
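The shot-reduction idea behind merging can be sketched in a toy form. This is a hypothetical simplification of RM: the real algorithms operate on full rectilinear layouts with sliver-aware cost functions, whereas here we only greedily pair axis-aligned rectangles that abut along a vertical edge into single L-shape shots.

```python
def merge_into_lshapes(rects):
    """Greedy sketch of rectangle merging: pair up axis-aligned
    rectangles that abut along a vertical edge into single L-shape
    (or larger-rectangle) shots, reducing the e-beam shot count.

    Each rect is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Returns the resulting shot count.
    """
    rects = list(rects)
    used = [False] * len(rects)
    shots = 0
    for i, a in enumerate(rects):
        if used[i]:
            continue
        used[i] = True
        for j in range(i + 1, len(rects)):
            b = rects[j]
            # mergeable if a's right edge coincides with b's left edge
            # and their vertical spans overlap, so one L-shape covers both
            if (not used[j] and a[2] == b[0]
                    and not (a[3] <= b[1] or b[3] <= a[1])):
                used[j] = True
                break
        shots += 1   # one shot per rectangle or merged L-shape
    return shots

# Two rectangles forming an L, plus one isolated square:
rects = [(0, 0, 2, 4), (2, 0, 4, 2), (6, 6, 7, 7)]
print(merge_into_lshapes(rects))  # 2 shots instead of 3
```

DLF goes further by never producing the intermediate rectangles at all, which is where its advantage in shot count and runtime comes from.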
For CP mode, we explored an extended MCC system and discussed the corre-
sponding OSP problem. For 1D-OSP, a successive relaxation
algorithm and a dynamic programming based refinement are proposed. For 2D-
OSP, a KD-Tree based clustering method is integrated into a simulated annealing
framework. Experimental results show that, compared with previous works, E-
BLOW achieves better performance in terms of shot number and runtime, for
both the MCC system and the traditional EBL system.
As EBL systems, including the MCC system, are widely used for mask making and are
also gaining momentum for direct wafer writing, we believe much more research can be
done, not only on stencil planning but also on EBL-aware design.
References
1. Kahng, A.B., Park, C.-H., Xu, X., Yao, H.: Layout decomposition for double patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 465–472 (2008)
2. Zhang, H., Du, Y., Wong, M.D., Topaloglu, R.: Self-aligned double patterning decomposition
for overlay minimization and hot spot detection. In: ACM/IEEE Design Automation Confer-
ence (DAC), pp. 71–76 (2011)
3. Yu, B., Yuan, K., Zhang, B., Ding, D., Pan, D.Z.: Layout decomposition for triple patterning
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 1–8 (2011)
4. Yu, B., Pan, D.Z.: Layout decomposition for quadruple patterning lithography and beyond. In:
ACM/IEEE Design Automation Conference (DAC), pp. 53:1–53:6 (2014)
5. Pan, D.Z., Yu, B., Gao, J.-R.: Design for manufacturing with emerging nanolithography. IEEE
Trans. Comput. Aided Des. Integr. Circuits Syst. 32(10), 1453–1472 (2013)
6. Arisawa, Y., Aoyama, H., Uno, T., Tanaka, T.: EUV flare correction for the half-pitch 22nm
node. In: Proceedings of SPIE, vol. 7636 (2010)
7. Zhang, H., Du, Y., Wong, M.D.F., Deng, Y., Mangat, P.: Layout small-angle rotation and
shift for EUV defect mitigation. In: IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), pp. 43–49 (2012)
8. Chang, L.-W., Bao, X., Bencher, C., Wong, H.-S.P.: Experimental demonstration of aperiodic
patterns of directed self-assembly by block copolymer lithography for random logic circuit
layout. In: IEEE International Electron Devices Meeting (IEDM), pp. 33.2.1–33.2.4 (2010)
9. Pfeiffer, H.C.: New prospects for electron beams as tools for semiconductor lithography. In:
Proceedings of SPIE, vol. 7378 (2009)
10. Fujimura, A.: Design for e-beam: design insights for direct-write maskless lithography. In:
Proceedings of SPIE, vol. 7823 (2010)
11. Maruyama, T., Takakuwa, M., Kojima, Y., Takahashi, Y., Yamada, K., Kon, J., Miyajima,
M., Shimizu, A., Machida, Y., Hoshino, H., Takita, H., Sugatani, S., Tsuchikawa, H.: EBDW
technology for EB shuttle at 65nm node and beyond. In: Proceedings of SPIE, vol. 6921 (2008)
12. Manakli, S., Komami, H., Takizawa, M., Mitsuhashi, T., Pain, L.: Cell projection use in mask-
less lithography for 45nm & 32nm logic nodes. In: Proceedings of SPIE, vol. 7271 (2009)
13. Du, Y., Zhang, H., Wong, M.D.F., Chao, K.-Y.: Hybrid lithography optimization with e-beam
and immersion processes for 16nm 1D gridded design. In: IEEE/ACM Asia and South Pacific
Design Automation Conference (ASPDAC), pp. 707–712 (2012)
14. Gao, J.-R., Yu, B., Pan, D.Z.: Self-aligned double patterning layout decomposition with
complementary e-beam lithography. In: IEEE/ACM Asia and South Pacific Design Automation
Conference (ASPDAC), pp. 143–148 (2014)
15. Ding, Y., Chu, C., Mak, W.-K.: Throughput optimization for SADP and e-beam based
manufacturing of 1D layout. In: ACM/IEEE Design Automation Conference (DAC),
pp. 51:1–51:6 (2014)
16. Yang, Y., Luk, W.-S., Zhou, H., Yan, C., Zeng, X., Zhou, D.: Layout decomposition co-
optimization for hybrid e-beam and multiple patterning lithography. In: IEEE/ACM Asia and
South Pacific Design Automation Conference (ASPDAC), pp. 652–657 (2015)
17. Fang, S.-Y., Liu, I.-J., Chang, Y.-W.: Stitch-aware routing for multiple e-beam lithography. In:
ACM/IEEE Design Automation Conference (DAC), pp. 25:1–25:6 (2013)
18. Babin, S., Kahng, A.B., Mandoiu, I.I., Muddu, S.: Resist heating dependence on subfield
scheduling in 50kV electron beam maskmaking. In: Proceedings of SPIE, vol. 5130 (2003)
19. Fang, S.-Y., Chen, W.-Y., Chang, Y.-W.: Graph-based subfield scheduling for electron-beam
photomask fabrication. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(2), 189–201
(2013)
20. Kahng, A.B., Xu, X., Zelikovsky, A.: Fast yield-driven fracture for variable shaped-beam mask
writing. In: Proceedings of SPIE, vol. 6283 (2006)
21. Ma, X., Jiang, S., Zakhor, A.: A cost-driven fracture heuristics to minimize sliver length. In:
Proceedings of SPIE, vol. 7973 (2011)
22. Yu, B., Gao, J.-R., Pan, D.Z.: L-shape based layout fracturing for e-beam lithography. In:
IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), pp. 249–254
(2013)
23. Chan, T.B., Gupta, P., Han, K., Kagalwalla, A.A., Kahng, A.B., Sahouria, E.: Benchmarking
of mask fracturing heuristics. In: IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), pp. 246–253 (2014)
24. Fujino, T., Kajiya, Y., Yoshikawa, M.: Character-build standard-cell layout technique for high-
throughput character-projection EB lithography. In: Proceedings of SPIE, vol. 5853 (2005)
25. Sugihara, M., Takata, T., Nakamura, K., Inanami, R., Hayashi, H., Kishimoto, K., Hasebe, T.,
Kawano, Y., Matsunaga, Y., Murakami, K., Okumurae, K.: Technology mapping technique for
throughput enhancement of character projection equipment. In: Proceedings of SPIE, vol. 6151
(2007)
26. Sugihara, M., Takata, T., Nakamura, K., Inanami, R., Hayashi, H., Kishimoto,
K., Hasebe, T., Kawano, Y., Matsunaga, Y., Murakami, K., Okumura, K.: A character size
optimization technique for throughput enhancement of character projection lithography. In:
IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2561–2564 (2006)
27. Du, P., Zhao, W., Weng, S.-H., Cheng, C.-K., Graham, R.: Character design and stamp
algorithms for character projection electron-beam lithography. In: IEEE/ACM Asia and South
Pacific Design Automation Conference (ASPDAC), pp. 725–730 (2012)
28. Ikeno, R., Maruyama, T., Iizuka, T., Komatsu, S., Ikeda, M., Asada, K.: High-throughput
electron beam direct writing of VIA layers by character projection using character sets based
on one-dimensional VIA arrays with area-efficient stencil design. In: IEEE/ACM Asia and
South Pacific Design Automation Conference (ASPDAC), pp. 255–260 (2013)
29. Minh, H.P.D., Iizuka, T., Ikeda, M., Asada, K.: Shot minimization for throughput improvement
of character projection electron beam direct writing. In: Proceedings of SPIE, vol. 6921 (2006)
30. Ikeno, R., Maruyama, T., Komatsu, S., Iizuka, T., Ikeda, M., Asada, K.: A structured routing
architecture and its design methodology suitable for high-throughput electron beam direct
writing with character projection. In: ACM International Symposium on Physical Design
(ISPD), pp. 69–76 (2013)
31. Lee, S.H., Choi, J., Kim, H.B., Kim, B.G., Cho, H.-K.: The requirements for the future e-beam
mask writer: statistical analysis of pattern accuracy. In: Proceedings of SPIE, vol. 8166 (2011)
32. Sahouria, E., Bowhill, A.: Generalization of shot definition for variable shaped e-beam
machines for write time reduction. In: Proceedings of SPIE, vol. 7823 (2010)
33. Elayat, A., Lin, T., Sahouria, E., Schulze, S.F.: Assessment and comparison of different
approaches for mask write time reduction. In: Proceedings of SPIE, vol. 8166 (2011)
34. Yuan, K., Yu, B., Pan, D.Z.: E-beam lithography stencil planning and optimization with
overlapped characters. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 31(2), 167–179
(2012)
35. Kahng, A.B., Xu, X., Zelikovsky, A.: Yield- and cost-driven fracturing for variable shaped-
beam mask writing. In: Proceedings of SPIE, vol. 5567 (2004)
36. Dillon, B., Norris, T.: Case study: the impact of VSB fracturing. In: Proceedings of SPIE,
vol. 7028 (2008)
37. Jiang, S., Ma, X., Zakhor, A.: A recursive cost-based approach to fracturing. In: Proceedings
of SPIE, vol. 7973 (2011)
38. Edelsbrunner, H., O’Rourke, J., Welzl, E.: Stationing guards in rectilinear art galleries.
Comput. Vis. Graph. Image Process. 28, 167–176 (1984)
39. Lopez, M.A., Mehta, D.P.: Efficient decomposition of polygons into L-shapes with application
to VLSI layouts. ACM Trans. Des. Autom. Electron. Syst. 1(3), 371–395 (1996)
40. O’Rourke, J.: An alternate proof of the rectilinear art gallery theorem. J. Geom. 21, 118–130
(1983)
41. Galil, Z.: Efficient algorithms for finding maximum matching in graphs. ACM Comput. Surv.
18(1), 23–38 (1986)
42. Mehlhorn, K., Naher, S.: LEDA: A Platform for Combinatorial and Geometric Computing.
Cambridge University Press, Cambridge (1999)
43. Guiney, M., Leavitt, E.: An introduction to OpenAccess: an open source data model and API for
IC design. In: IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC),
pp. 434–436 (2006)
44. Yasuda, H., Haraguchi, T., Yamada, A.: A proposal for an MCC (multi-column cell with lotus
root lens) system to be used as a mask-making e-beam tool. In: Proceedings of SPIE, vol. 5567
(2004)
45. Maruyama, T., Machida, Y., Sugatani, S., Takita, H., Hoshino, H., Hino, T., Ito, M., Yamada,
A., Iizuka, T., Komatsue, S., Ikeda, M., Asada, K.: CP element based design for 14nm node
EBDW high volume manufacturing. In: Proceedings of SPIE, vol. 8323 (2012)
46. Shoji, M., Inoue, T., Yamabe, M.: Extraction and utilization of the repeating patterns for CP
writing in mask making. In: Proceedings of SPIE, vol. 7748 (2010)
47. Sugihara, M., Takata, T., Nakamura, K., Inanami, R., Hayashi, H., Kishimoto, K., Hasebe,
T., Kawano, Y., Matsunaga, Y., Murakami, K., Okumura, K.: Cell library development
methodology for throughput enhancement of character projection equipment. IEICE Trans.
Electron. E89-C, 377–383 (2006)
48. Kuang, J., Young, E.F.: A highly-efficient row-structure stencil planning approach for e-beam
lithography with overlapped characters. In: ACM International Symposium on Physical Design
(ISPD), pp. 109–116 (2014)
49. Guo, D., Du, Y., Wong, M.D.: Polynomial time optimal algorithm for stencil row planning in
e-beam lithography. In: IEEE/ACM Asia and South Pacific Design Automation Conference
(ASPDAC), pp. 658–664 (2015)
50. Chu, C., Mak, W.-K.: Flexible packed stencil design with multiple shaping apertures for e-
beam lithography. In: IEEE/ACM Asia and South Pacific Design Automation Conference
(ASPDAC), pp. 137–142 (2014)
51. Mak, W.-K., Chu, C.: E-beam lithography character and stencil co-optimization. IEEE Trans.
Comput. Aided Des. Integr. Circuits Syst. 33(5), 741–751 (2014)
52. Yu, B., Yuan, K., Gao, J.-R., Pan, D.Z.: E-BLOW: e-beam lithography overlapping aware
stencil planning for MCC system. In: ACM/IEEE Design Automation Conference (DAC),
pp. 70:1–70:7 (2013)
53. Kuang, J., Young, E.F.: Overlapping-aware throughput-driven stencil planning for e-beam
lithography. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 254–261 (2014)
54. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge University
Press, Cambridge (2009)
55. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. Wiley,
New York (1990)
56. Dawande, M., Kalagnanam, J., Keskinocak, P., Salman, F., Ravi, R.: Approximation algorithms
for the multiple knapsack problem with assignment restrictions. J. Comb. Optim. 4, 171–186
(2000)
57. Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W.: Progress in linear programming-based
algorithms for integer programming: an exposition. INFORMS J. Comput. 12(1), 2–23 (2000)
58. Sutanthavibul, S., Shragowitz, E., Rosen, J.: An analytical approach to floorplan design and
optimization. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 10(6), 761–769 (1991)
59. Karypis, G., Aggarwal, R., Kumar, V., Shekhar, S.: Multilevel hypergraph partitioning:
application in VLSI domain. In: ACM/IEEE Design Automation Conference (DAC), pp. 526–
529 (1997)
60. Nam, G.-J., Reda, S., Alpert, C., Villarrubia, P., Kahng, A.: A fast hierarchical quadratic
placement algorithm. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 25(4), 678–691
(2006)
61. Yan, J.Z., Chu, C., Mak, W.-K.: SafeChoice: a novel clustering algorithm for wirelength-driven
placement. In: ACM International Symposium on Physical Design (ISPD), pp. 185–192 (2010)
62. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun.
ACM 18, 509–517 (1975)
63. Adya, S.N., Markov, I.L.: Fixed-outline floorplanning: enabling hierarchical design. IEEE
Trans. Very Large Scale Integr. Syst. 11(6), 1120–1135 (2003)
64. Murata, H., Fujiyoshi, K., Nakatake, S., Kajitani, Y.: VLSI module placement based on
rectangle-packing by the sequence-pair. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
15(12), 1518–1524 (1996)
65. Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2014)
Chapter 6
Conclusions and Future Works
References
1. Yu, B., Gao, J.-R., Pan, D.Z.: Triple patterning lithography (TPL) layout decomposition using
end-cutting. In: Proceedings of SPIE, vol. 8684 (2013)
2. Yu, B., Roy, S., Gao, J.-R., Pan, D.Z.: Triple patterning lithography layout decomposition using
end-cutting. J. Micro/Nanolithogr. MEMS MOEMS (JM3) 14(1), 011002 (2015)
3. Kohira, Y., Matsui, T., Yokoyama, Y., Kodama, C., Takahashi, A., Nojima, S., Tanaka, S.:
Fast mask assignment using positive semidefinite relaxation in LELECUT triple patterning
lithography. In: IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-
DAC), pp. 665–670 (2015)
4. Yu, B., Xu, X., Gao, J.-R., Pan, D.Z.: Methodology for standard cell compliance and detailed
placement for triple patterning lithography. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 349–356 (2013)
5. Yu, B., Xu, X., Gao, J.-R., Lin, Y., Li, Z., Alpert, C., Pan, D.Z.: Methodology for standard
cell compliance and detailed placement for triple patterning lithography. IEEE Trans. Comput.
Aided Des. Integr. Circuits Syst. (TCAD) 34(5), 726–739 (2015)
6. Kuang, J., Chow, W.-K., Young, E.F.Y.: Triple patterning lithography aware optimization
for standard cell based design. In: IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), pp. 108–115 (2014)
7. Chien, H.-A., Chen, Y.-H., Han, S.-Y., Lai, H.-Y., Wang, T.-C.: On refining row-based detailed
placement for triple patterning lithography. IEEE Trans. Comput. Aided Des. Integr. Circuits
Syst. (TCAD) 34(5), 778–793 (2015)
8. Lin, T., Chu, C.: TPL-aware displacement-driven detailed placement refinement with coloring
constraints. In: ACM International Symposium on Physical Design (ISPD), pp. 75–80 (2015)
9. Tian, H., Du, Y., Zhang, H., Xiao, Z., Wong, M.D.F.: Triple patterning aware detailed
placement with constrained pattern assignment. In: IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pp. 116–123 (2014)
10. Gao, J.-R., Yu, B., Huang, R., Pan, D.Z.: Self-aligned double patterning friendly configuration
for standard cell library considering placement. In: Proceedings of SPIE, vol. 8684 (2013)
11. Yi, H., Bao, X.-Y., Zhang, J., Tiberio, R., Conway, J., Chang, L.-W., Mitra, S., Wong, H.-S.P.:
Contact-hole patterning for random logic circuit using block copolymer directed self-assembly.
In: Proceedings of SPIE, vol. 8323 (2012)
12. Xiao, Z., Du, Y., Wong, M.D., Zhang, H.: DSA template mask determination and cut
redistribution for advanced 1D gridded design. In: Proceedings of SPIE, vol. 8880 (2013)
13. Du, Y., Guo, D., Wong, M.D.F., Yi, H., Wong, H.-S.P., Zhang, H., Ma, Q.: Block copolymer
directed self-assembly (DSA) aware contact layer optimization for 10 nm 1D standard
cell library. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
pp. 186–193 (2013)
14. Du, Y., Xiao, Z., Wong, M.D., Yi, H., Wong, H.-S.P.: DSA-aware detailed routing for via layer
optimization. In: Proceedings of SPIE, vol. 9049 (2014)
15. Ou, J., Yu, B., Gao, J.-R., Pan, D.Z., Preil, M., Latypov, A.: Directed self-assembly based
cut mask optimization for unidirectional design. In: ACM Great Lakes Symposium on VLSI
(GLSVLSI), pp. 83–86 (2015)
16. Zhang, H., Du, Y., Wong, M.D.F., Topaloglu, R.O.: Efficient pattern relocation for EUV blank
defect mitigation. In: IEEE/ACM Asia and South Pacific Design Automation Conference
(ASPDAC), pp. 719–724 (2012)
17. Zhang, H., Du, Y., Wong, M.D.F., Deng, Y., Mangat, P.: Layout small-angle rotation and
shift for EUV defect mitigation. In: IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), pp. 43–49 (2012)
18. Du, Y., Zhang, H., Ma, Q., Wong, M.D.F.: Linear time algorithm to find all relocation positions
for EUV defect mitigation. In: IEEE/ACM Asia and South Pacific Design Automation
Conference (ASPDAC), pp. 261–266 (2013)
19. Kagalwalla, A.A., Gupta, P., Hur, D.-H., Park, C.-H.: Defect-aware reticle floorplanning for
EUV masks. In: Proceedings of SPIE, vol. 7479 (2011)
20. Fang, S.-Y., Chang, Y.-W.: Simultaneous flare level and flare variation minimization with
dummification in EUVL. In: ACM/IEEE Design Automation Conference (DAC), pp. 1179–
1184 (2012)
21. Pan, D.Z., Yu, B., Gao, J.-R.: Design for manufacturing with emerging nanolithography. IEEE
Trans. Comput. Aided Des. Integr. Circuits Syst. (TCAD) 32(10), 1453–1472 (2013)
22. Jhaveri, T., Rovner, V., Liebmann, L., Pileggi, L., Strojwas, A.J., Hibbeler, J.D.:
Co-optimization of circuits, layout and lithography for predictive technology scaling beyond
gratings. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. (TCAD) 29(4), 509–527
(2010)
23. Du, Y., Zhang, H., Wong, M.D.F., Chao, K.-Y.: Hybrid lithography optimization with e-beam
and immersion processes for 16nm 1D gridded design. In: IEEE/ACM Asia and South Pacific
Design Automation Conference (ASPDAC), pp. 707–712 (2012)
24. Liebmann, L., Chu, A., Gutwin, P.: The daunting complexity of scaling to 7 nm without EUV:
pushing DTCO to the extreme. In: Proceedings of SPIE, vol. 9427 (2015)
25. Xu, X., Cline, B., Yeric, G., Yu, B., Pan, D.Z.: Self-aligned double patterning aware pin access
and standard cell layout co-optimization. In: ACM International Symposium on Physical
Design (ISPD), pp. 101–108 (2014)
26. Ye, W., Yu, B., Ban, Y.-C., Liebmann, L., Pan, D.Z.: Standard cell layout regularity and pin
access optimization considering middle-of-line. In: ACM Great Lakes Symposium on VLSI
(GLSVLSI), pp. 289–294 (2015)
27. Fang, S.-Y.: Cut mask optimization with wire planning in self-aligned multiple patterning
full-chip routing. In: IEEE/ACM Asia and South Pacific Design Automation Conference
(ASPDAC), pp. 396–401 (2015)
28. Xu, X., Yu, B., Gao, J.-R., Hsu, C.-L., Pan, D.Z.: PARR: pin access planning and regular
routing for self-aligned double patterning. In: ACM/IEEE Design Automation Conference
(DAC), pp. 28:1–28:6 (2015)
29. Su, Y.-H., Chang, Y.-W.: Nanowire-aware routing considering high cut mask complexity.
In: ACM/IEEE Design Automation Conference (DAC), pp. 138:1–138:6 (2015)
Index
B
Backtracking, 70, 91
Balanced density, 31
Bounded subset sum (BSS), 131
Branch-and-bound, 92
Bridge edge, 23, 63
Bridge vertex, 23–24

C
Chord, 115, 119
Clustering, 146
Concave vertex, 115
Conflict, 7
Constraint graph (CG), 89
Critical dimension (CD), 3, 33, 114

D
Decomposition graph, 9, 43
Density uniformity, 34
Design for manufacturability (DFM), 4, 159
Detailed placement, 94
Directed self-assembly (DSA), 2, 111, 160
Direct L-shape fracturing, 117
Disjoint-set, 21
Double patterning lithography (DPL), 2, 7
  LELE, 2, 53
  SADP, 2
Dynamic programming, 97, 100, 142

E
Electron beam lithography (EBL), 2, 111
  Character, 112
  Character projection (CP), 112, 125
  Multi-column cell (MCC), 125, 128
  Sliver, 114
  Stencil, 112
  VSB, 4, 111, 113
End-cut graph, 56
Extreme ultra violet (EUV), 2, 111, 160

G
GH-Tree, 74
Global moving, 102

H
HPWL, 95

I
Immersion lithography, 1
Independent component computation, 21, 63
Inner product, 18, 38
Integer linear programming (ILP), 7, 13, 61, 136, 145
Iterative vertex removal, 23

K
KD-Tree, 147
Knapsack, 138

L
Layout decomposition, 4, 8
  DPL decomposition, 12
  K-patterning decomposition, 68
  LELE-EC decomposition, 57
  QPL decomposition, 67
  TPL decomposition, 12, 33
Layout fracturing, 111
Layout graph, 8, 21, 42, 55
Linear programming (LP), 136, 145
Lithography resolution, 2
Look-up table, 93

M
Mapping, 20, 39, 70
Matching, 143, 144
Multiple patterning lithography (MPL), 1

N
Nanoimprint lithography (NIL), 2, 160
NP-hard, 12, 39, 57, 130, 135

O
Optical proximity correction (OPC), 113
Orthogonal drawing, 12
Overlapping-aware stencil planning (OSP), 127, 130

P
Partitioning, 40
PG3C, 13

Q
Quadruple patterning lithography (QPL), 2, 67

R
Rectangular merging, 116

S
Semidefinite, 18
Semidefinite programming (SDP), 18, 38, 39, 69
Shortest path, 96, 97
Simplified constraint graph (SCG), 90
Standard cell compliance, 86
Stitch, 4, 9
  Lost stitch, 9, 10
  Projection sequence, 10
  Redundant stitch, 9, 10
  Stitch candidate, 9
Stitch candidate, 62
Successive rounding, 139–140

T
3SAT, 131
Timing characterization, 88
TPL-OSR, 95
Triple patterning lithography (TPL), 2, 7
  LELE-EC, 53
  LELELE, 53

V
Vector programming, 17, 38