1-Cao_Chapter2_Thesis
Design Optimisation
Linan Cao
Doctor of Philosophy
University of York
Electronic Engineering
August 2021
Chapter 2
2.1 Overview
This chapter gives an overview of the modern digital IC design process, from transistor
scaling to the pre-fabrication design stage, and outlines the current challenges of
commercial EDA flows. The chapter is structured as follows: Section 2.2 introduces
transistor scaling, including its trends and challenges. Standard cell library design for
use in the digital IC flow is then explored in Section 2.3. Section 2.4 introduces the
industry-standard digital VLSI design flow and its challenges for modern circuit designs.
The related optimisation techniques for augmenting the standard flow to tackle these
challenges are discussed in Section 2.5. Section 2.6 summarises the chapter.
2.2 Transistor Evolution
The technology node (also process node, process technology or simply node) refers to a
specific semiconductor manufacturing process and its design rules. In the semiconductor
industry, the miniaturisation of devices (i.e., transistors) has continuously enabled
next-generation process nodes, circuits and system architectures over the last few
decades. Transistor scaling allows ICs to obtain greater device density following the
well-known Moore’s law (shown in Figure 2.1), which projects a doubling of the number of
transistors on a single chip about every two years [12]. Such scaling targets drive the
industry to push against the physical limits of semiconductors through many process
technology innovations, introducing new materials and new structures to fulfil Moore’s law.
Figure 2.1 Moore’s law: The number of transistors on microchips doubles every two
years [1].
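The doubling rule stated above can be written as a one-line projection. The following is only an illustrative sketch: the function name and the baseline figures are hypothetical and are not taken from Figure 2.1.

```python
def projected_transistors(n0: float, year0: int, year: int,
                          doubling_years: float = 2.0) -> float:
    """Project a transistor count from a baseline, doubling every `doubling_years`."""
    return n0 * 2 ** ((year - year0) / doubling_years)


if __name__ == "__main__":
    # Hypothetical baseline: 1 billion transistors in 2010.
    for y in (2010, 2014, 2020):
        print(y, f"{projected_transistors(1e9, 2010, y):.3g}")
```

Ten years at a two-year doubling period corresponds to five doublings, i.e., a factor of 32 over the baseline.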
Until the early 2000s, before the 130nm node, MOSFET scaling profitably followed the
Dennard scaling methodology [14], in which transistor dimensions are scaled by a
constant factor while delivering consistent improvements in transistor area, performance
(e.g., delay) and power [15]. However, this shrinking trajectory has broken down and can
no longer be followed in advanced technology nodes, because power cannot be reduced
without either decreasing transistor performance or increasing leakage current. Leakage
has been particularly significant beyond the 65nm node, where it accounts for a greater
proportion of overall power consumption and causes thermal issues on the chip [16].
Transistor architecture has therefore shifted from the planar FET to the FinFET and on to
the cutting-edge gate-all-around (GAA) FET, with the goal of increasing gate control over
the channel to reduce leakage and to operate at lower power with good performance.
Figure 2.2 presents the evolution of transistor architecture from the planar device to GAA.
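As a rough illustration of why Dennard scaling was so rewarding, the sketch below applies the textbook constant-field scaling relations for a scaling constant k. These relations are an assumption brought in from the general literature rather than from this chapter, and leakage is deliberately ignored, which is exactly the effect that breaks the model at advanced nodes.

```python
from typing import Dict


def dennard_scale(k: float) -> Dict[str, float]:
    """Relative change of device metrics when dimensions and voltage shrink by 1/k.

    Assumes ideal constant-field scaling with no leakage current.
    """
    dimension = 1.0 / k          # gate length, width, oxide thickness
    voltage = 1.0 / k            # supply and threshold voltage
    current = 1.0 / k            # drive current
    capacitance = 1.0 / k        # gate capacitance
    delay = capacitance * voltage / current   # ~ C*V/I
    power = voltage * current                  # per transistor
    area = dimension ** 2
    return {
        "delay": delay,
        "power": power,
        "area": area,
        "power_density": power / area,  # stays constant in the ideal case
    }


if __name__ == "__main__":
    print(dennard_scale(2.0))
```

Under these assumptions delay, per-transistor power and area all improve together while power density stays flat; once leakage stops the voltage from scaling, that balance is lost.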
The 2020 IEEE IRDS report further projects, beyond 2022, a transition to lateral GAA
devices for the next die shrink below 5nm, with the GAAFET becoming the mainstream
device after 2025 and taking the place of FinFETs [8]. A lateral GAAFET architecture
example, given in Figure 2.2 (c), shows how the gate material surrounds the
source-to-drain channel region (using the nanowire structure in this case) on all sides.
Most recently, Samsung has announced a plan to develop its own novel variant of the
GAAFET (i.e., using a lateral nanosheet structure), called MBCFET™ and shown in
Figure 2.3 (a), for the 3nm process node. From a long-term perspective, over the next
15 years, the projected evolution of device architectures is expected to include
vertically stacked fine-pitch 3D GAA devices in hybrid form with lateral GAAFETs, as
presented in Figure 2.3 (b) [8].
Figure 2.3 (a) A lateral GAAFET using nanosheet structure proposed by Samsung.
(b) Projected GAA device architecture for 3D VLSI beyond 2030.
A process technology is typically labelled with a node name indicating the device
dimension. However, the industry “Node Range” labelling scheme of modern process
technologies, particularly for FinFETs, is starting to lose its actual meaning. Node
names used to represent physical features of a transistor, such as the gate length or
metal half-pitch. More recently, because the transistor architecture has changed so
dramatically, the “Node Range” labels have simply become commercial names for a
generation of a certain size and its technology, and do not represent any geometry of
the transistor [17].
The next few projected process nodes in the 2020 IEEE IRDS roadmap are labelled “3”,
“2.1”, “1.5”, “1.0 eq” and “0.7 eq” from 2022 to 2034 [8]. These numbers appear to
shrink continuously, whereas the physical gate length of each corresponding process
node does not keep decreasing. The expected physical gate length starts at 18nm for the
“5nm” node, decreases to 12nm at the “1.5nm” node and stays constant for the following
nodes [8]. Figure 2.4 presents the CMOS scaling evolution, in which it can be observed
that the gate length or metal pitch is no longer shrinking significantly with each
process node generation [2].
Figure 2.4 CMOS technology scaling evolution: The gate length or maximum metal
pitch is hard to shrink in advanced technology nodes [2].
These trends demonstrate that transistor scaling is becoming extremely hard due to
physical constraints. A novel 3D stacked GAA device architecture is expected to feature
in the 2020 IEEE IRDS roadmap, as already shown in Figure 2.3 (b), which can further
increase the transistor density per die area. This makes it possible to continue
fulfilling Moore’s law in future chip evolution. However, the physical dimensions of a
single transistor (e.g., gate length, metal pitch) will not change (scale down)
significantly in the upcoming MOSFET process nodes.
2.3 Standard Cell in Digital Integrated Circuits
still exists a space in current process technology and EDA tools/flows for pushing
modern designs to obtain improved overall performance.
overall functionality. Figure 2.5 presents a few basic logic cells including schematics,
symbols and their behaviour.
However, circuits must not only satisfy behavioural functionality but also meet the
constraints or requirements derived from the physical level when chips are taped out.
Foundries therefore create each standard cell function with multiple drive strength
options, and provide libraries in multiple versions of routing track count, threshold
voltage (Vth) and supply voltage (Vdd).
The drive strength of a logic gate refers to its relative capability to charge or
discharge the capacitance presented at its output. A larger drive strength, obtained
with bigger transistors, speeds up a logic cell’s performance (transition time) but
consumes more power and die area, and vice versa. Thus, the multiple drive strength
options of a single cell are used to drive the different loads found on circuit paths.
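The trade-off can be illustrated with a first-order RC model, in which a cell of drive strength X has 1/X of the unit output resistance. This is a minimal sketch under that assumption; the unit resistance, the load and the available strengths are hypothetical values, not data from any real library.

```python
from typing import Iterable, Optional


def transition_delay_ps(drive: int, c_load_ff: float,
                        r_unit_kohm: float = 10.0) -> float:
    """First-order RC delay in ps (kOhm * fF = ps); 0.69 approximates ln 2."""
    return 0.69 * (r_unit_kohm / drive) * c_load_ff


def smallest_drive_meeting(target_ps: float, c_load_ff: float,
                           drives: Iterable[int] = (1, 2, 4, 8)) -> Optional[int]:
    """Return the weakest (cheapest) drive strength that meets the delay target."""
    for drive in sorted(drives):
        if transition_delay_ps(drive, c_load_ff) <= target_ps:
            return drive
    return None  # no available cell can drive this load fast enough


if __name__ == "__main__":
    print(smallest_drive_meeting(target_ps=50.0, c_load_ff=20.0))
```

Picking the weakest cell that still meets the target reflects the idea that stronger cells cost power and area, so they are only worth using on heavily loaded paths.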
For the whole library, the cell height (e.g., 9-track or 12-track) of the standard cell
layouts determines how many routing tracks can be used later at the circuit-level
physical routing stage. More tracks allow more routing space above the cells, which can
relax routing congestion and reduce potential design rule violations. Taller cells also
provide larger drive capabilities for better circuit performance. However, they also
consume more power and increase the die area significantly.
The threshold voltage (Vth) is the minimum voltage at the transistor gate (VG) required
to form an inversion layer (channel) between source and drain so that the transistor
turns on. Different threshold voltages can be achieved by tuning manufacturing
parameters of a transistor such as doping concentration. Foundries usually provide
standard Vth (SVT), high Vth (HVT) and low Vth (LVT) cell libraries to control the
leakage power in digital ICs effectively: a higher Vth reduces leakage, but the cells
have longer transition times, and vice versa.
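The sensitivity of leakage to Vth can be sketched with the standard first-order subthreshold model, in which the off-current falls by one decade for every subthreshold swing S of added threshold voltage. The model choice, the swing of 90 mV/decade and the Vth values below are illustrative assumptions, not foundry data.

```python
def relative_leakage(vth_v: float, vth_ref_v: float,
                     swing_mv_per_decade: float = 90.0) -> float:
    """Subthreshold leakage relative to a reference Vth: I ~ 10^(-Vth / S)."""
    delta_mv = (vth_v - vth_ref_v) * 1000.0
    return 10.0 ** (-delta_mv / swing_mv_per_decade)


if __name__ == "__main__":
    svt = 0.36  # hypothetical standard-Vth value in volts
    for name, vth in (("LVT", 0.27), ("SVT", svt), ("HVT", 0.45)):
        print(name, f"{relative_leakage(vth, svt):.2f}x SVT leakage")
```

With these assumed numbers, each 90 mV step in Vth changes leakage by roughly a factor of ten, which is why mixing HVT cells on non-critical paths is an effective leakage control.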
Multiple Vdd libraries are an important technique, typically used to save dynamic power
in digital ICs by allowing different power domains. Blocks with different supply
voltages can be integrated into a single system-on-chip (SoC). Thus, some blocks can use
lower voltages or even be completely shut off in a specific operation mode, so that
power-efficient systems can be obtained. This method increases power planning
complexity in terms of laying down the power rails and power grid structure. Level
shifter cells are necessary to interface between blocks in different domains.
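The benefit of a lower supply voltage follows from the usual switching-power expression P = a * C * Vdd^2 * f, where a is the switching activity. The sketch below uses that textbook relation with hypothetical block parameters to compare two power domains.

```python
def dynamic_power_w(activity: float, c_farads: float,
                    vdd_v: float, freq_hz: float) -> float:
    """Switching power of a block: activity * C * Vdd^2 * f."""
    return activity * c_farads * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical block: 1 nF switched capacitance, 10% activity, 500 MHz.
    nominal = dynamic_power_w(0.1, 1e-9, 1.0, 500e6)
    low_vdd = dynamic_power_w(0.1, 1e-9, 0.8, 500e6)
    print(f"saving at 0.8 V: {100 * (1 - low_vdd / nominal):.0f}%")
```

Because of the quadratic dependence, a 20% reduction in Vdd cuts dynamic power by about 36% at the same frequency, provided the block still meets timing at the lower voltage.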
Providing cells with different layout architectures and characteristics in the library
is an attempt to make the most of the physical features of transistors. The standard
cell library is the middle abstraction layer that bridges process technology and common
logic blocks. Well-optimised libraries have therefore become crucial, as they can
determine the overall quality of results (QoRs) of VLSI designs.
The commercial digital IC design flow requires pre-characterised cell libraries for circuit
analysis and physical implementation. Figure 2.6 demonstrates the overall standard
cell design flow.
[Figure 2.6: standard cell design flow. Labels recovered from the diagram:
Specifications; PDKs, design rules, SPICE models; Layout design; Cell layout;
Parasitic extraction.]
The pre-processed timing and power models, typically in Liberty (.lib) format, are
generated for each cell through simulation based on its parasitic extraction. These
characterised models speed up the evaluation of circuits. In addition, the full
physical layouts no longer need to retain all interconnects and transistor structures:
only the top metal layer of the cell, including the input/output (I/O) pin positions, is
required for the subsequent circuit-level place and route. An abstract view (.lef)
containing this geometry information (normally metal 1) is therefore produced for each
cell. The process of transforming a standard cell library into pre-processed formats
(i.e., timing and power models and layout abstracts) is referred to as library
characterisation. Once the overall design layout is complete, all standard cell
abstracts used are replaced by their full layouts for fabrication.
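To show how a characterised model speeds up evaluation, the sketch below performs the kind of table lookup commonly used with Liberty timing data: delay is stored on a grid of input transition versus output load, and intermediate points are obtained by bilinear interpolation instead of re-running circuit simulation. The table values, axis points and function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def _bracket(axis: Sequence[float], x: float) -> int:
    """Index i of the grid segment [axis[i], axis[i+1]] used for x (clamped to edges)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slews: Sequence[float], loads: Sequence[float],
                 table: Sequence[Sequence[float]],
                 slew: float, load: float) -> float:
    """Bilinear interpolation in a delay table indexed as table[slew][load].

    Points outside the grid are extrapolated linearly from the nearest segment.
    """
    i, j = _bracket(slews, slew), _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical characterisation data for one timing arc (ps, fF, ps).
SLEWS_PS = [10.0, 50.0, 200.0]
LOADS_FF = [1.0, 4.0, 16.0]
DELAY_PS = [[10.0, 20.0, 40.0],
            [15.0, 25.0, 45.0],
            [30.0, 40.0, 60.0]]

if __name__ == "__main__":
    print(lookup_delay(SLEWS_PS, LOADS_FF, DELAY_PS, slew=30.0, load=2.5))
```

A lookup like this costs a handful of arithmetic operations per timing arc, which is what makes full-chip timing analysis on characterised cells tractable.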
Creating standard cell libraries can take considerable human effort within a design
turnaround cycle, particularly for producing cell layouts (i.e., transistor placement
and interconnects). In the past two decades, automated layout generators for standard
cells (also called transistor synthesis tools) have been investigated to accelerate this
iterative process. In the early 2000s, EDA vendors started offering full standard cell
design flow kits (e.g., Prolific ProGenesis™, Synopsys Cadabra™ and NanGate Library
Creator™) for automating optimised CMOS gate creation, including cell circuit design,
physical layout and library characterisation. However, most of them are no longer
active or available, except NanGate Library Creator™ (acquired by Silvaco in 2018). Its
latest library platform, cello™ [18], supports advanced process technology down to the
7nm FinFET node for standard cell library creation, migration and optimisation. It now
excels in technology migration and in layout optimisation for further PPA gains,
generating optimised cell variants from legacy libraries.
In addition, a few standard cell creators were introduced for research in the early
2000s. An in-house tool from IBM, called C-cell, could generate optimised cell layouts
based on primitive cells and was adopted for high-performance microprocessor design
[19]. A layout generation system from Kyoto University called VARDS [20] could produce
a cell layout with variable drive strength. It has been successfully employed for
130nm, 180nm and 350nm library generation [21], on-demand library generation in the
full digital IC flow [22] and post-layout transistor sizing for chip power reduction
[23]. More recently, a dedicated layout generator for area-efficient standard cells was
proposed by the same research team [24]. However, these research-purpose cell creators
all operate on primitive cells and symbolic layouts, which means that a corresponding
layout template must be created manually for each logic function.
Automating the creation of standard cell libraries from scratch is an extremely
challenging task. In particular, custom cell design specifications, such as special
requirements on Vdd and Vth (typically near-threshold operation [25–28]) or special
drive strengths [29], still depend heavily on the effort of experienced designers.
2.4 Digital VLSI Design Flow
The process of designing a digital VLSI circuit is highly complex. It starts with a
system specification, follows a series of steps and eventually produces a packaged
chip. A typical design flow is represented by the flow chart shown in Figure 2.7. The
system specification defines the overall goals and high-level requirements of the system
[Figure 2.7: flow chart of a typical digital VLSI design flow. Labels recovered from
the diagram: System Specification; Architectural Design; Logic Design; Circuit Design;
Logic Synthesis; Physical Design; Verification and Signoff (DRC, LVS, ERC);
Fabrication; Chip.]
The work discussed in this thesis mainly concerns the steps from logic design to
physical design, i.e., the register-transfer level (RTL) to Graphic Design System II
(GDSII) flow, also called the digital flow in the EDA community.
Logic design is performed at the RTL using a hardware description language (HDL), which
defines the functional behaviour. Two widely used HDLs are Verilog and VHDL (i.e., VHSIC
(very high speed integrated circuit) hardware description language). All RTL modules
must be simulated and verified before they are used in subsequent design steps.
In addition, an IC includes not only logic designs but also critical macros such as
memory blocks, analogue circuits and I/O cells, which are normally designed manually at
the transistor level by engineers. These macros have to be complete before logic
synthesis is run. They must also be characterised in advance to produce timing and
power models, and their physical layout abstracts must be created.
Logic synthesis is a process that automatically converts HDL designs into a list of
signal nets and low-level circuit elements. In general, the synthesis process, shown in
Figure 2.8, has two main steps: 1) a given HDL functional description is first
transformed into a netlist composed of generic logic gates (e.g., and, or, not,
universal sequential elements). Modern EDA synthesisers provide a few optimisation
options that let designers control design hierarchy and logic structure transformations
of the RTL during this generic synthesis step; 2) the generic netlist is then mapped
onto logic gates from a given technology standard cell library. The library used in
this step is pre-characterised in terms of timing and power (.lib file) and layout
abstract (.lef file), as discussed in Section 2.3. The resulting technology-specific
gates, with defined drive strength and threshold voltage (each corresponding to a
physical view from the library), together with their interconnections, constitute the
gate-level netlist.
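The second step can be pictured with a toy mapper that replaces each generic gate by a library cell and picks a drive strength from the fanout of its output net. This is a deliberately naive sketch and not how a commercial synthesiser works: the cell name prefixes follow the names visible in Figure 2.8, while reading the `D0`/`D2` suffix as a drive strength, the fanout threshold and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

GenericGate = Tuple[str, Tuple[str, ...], str]          # (type, input nets, output net)
MappedGate = Tuple[str, str, Tuple[str, ...], str]      # (instance, cell, inputs, output)

# Hypothetical mapping from generic gate types to library cell name prefixes.
GENERIC_TO_CELL = {"nand2": "ND2", "or2": "OR2", "not": "INV", "and2": "AD2"}


def pick_drive(fanout: int) -> str:
    """Assumed rule: use a stronger cell when the output net drives more than two pins."""
    return "D2" if fanout > 2 else "D0"


def map_netlist(generic: List[GenericGate]) -> List[MappedGate]:
    """Map a generic netlist onto library cells, one gate at a time."""
    fanout = Counter(net for _, inputs, _ in generic for net in inputs)
    mapped = []
    for index, (gate_type, inputs, output) in enumerate(generic):
        cell = GENERIC_TO_CELL[gate_type] + pick_drive(fanout[output])
        mapped.append((f"g{index}", cell, inputs, output))
    return mapped


if __name__ == "__main__":
    netlist = [
        ("nand2", ("a", "b"), "n0"),
        ("not", ("n0",), "n1"),
        ("or2", ("n0", "c"), "n2"),
        ("and2", ("n0", "n1"), "y"),
    ]
    for gate in map_netlist(netlist):
        print(gate)
```

A real mapper also restructures logic and evaluates timing and power from the .lib data when choosing cells; the sketch only shows how a gate-level netlist ties each instance to one specific library cell.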
The synthesised design also needs to be checked against constraints such as timing and
power. If they are not met, the synthesis tool performs optimisation by remapping logic
or resizing gates in an iterative loop until the design metrics improve. This
incremental optimisation is carried out concurrently with synthesis. In addition, the
synthesis tool usually provides different optimisation levels (e.g., low, medium, high,
ultra), so engineers have to trade off runtime against QoRs.
[Figure 2.8: logic synthesis process. Labels recovered from the diagram: RTL (an HDL
entity "test" with input port a, output port b and b <= ~a); Generic Netlist;
Synthesis; Technology Mapping and Optimisation; Gate-level Netlist (a module "test"
instantiating the cells ND2D0, OR2D0, INVD2 and AD2D0).]
However, it is not unusual for the synthesised design to fail some design constraints,
particularly the critical one, timing, even when the “try hard” synthesis mode (ultra
optimisation effort) is enabled. The failing timing paths may then be best fixed
manually in the RTL design by engineers. This can cause iterations of the whole
synthesis flow and exacerbates the design effort challenge.
physical implementation tools can complete the whole process in an automated way.
Figure 2.9 presents each distinct step of physical design.
[Figure 2.9: physical design steps: Partitioning, Floorplanning, Placement, Routing.]
Floorplanning. After the circuit partitioning phase, each block has a known hard or
soft shape. A hard block has fixed dimensions and area, while a soft block has a fixed
area but a variable aspect ratio. Floorplanning determines the arrangement of all
blocks, including their shapes and positions, without any design rule violations (e.g.,
no overlap). The resulting topology of the circuit layout is necessary for the
subsequent placement and routing steps [31]. For routing in particular, a poorly
floorplanned layout can significantly degrade routing quality (e.g., heavy routing
congestion), which could adversely affect the overall performance of the design.
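The distinction between hard and soft blocks, and the no-overlap rule, can be expressed compactly. The sketch below is illustrative only: the class and function names are hypothetical, blocks are axis-aligned rectangles, and abutting blocks are treated as legal.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt
from typing import Iterable


@dataclass(frozen=True)
class Block:
    name: str
    x: float       # lower-left corner
    y: float
    width: float
    height: float


def soft_block(name: str, x: float, y: float,
               area: float, aspect_ratio: float) -> Block:
    """Shape a soft block: the area is fixed, aspect_ratio = width / height is free."""
    height = sqrt(area / aspect_ratio)
    return Block(name, x, y, aspect_ratio * height, height)


def overlaps(a: Block, b: Block) -> bool:
    """True if the two rectangles share interior area (touching edges are allowed)."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def is_legal(blocks: Iterable[Block]) -> bool:
    """A floorplan is legal here if no pair of blocks overlaps."""
    return not any(overlaps(a, b) for a, b in combinations(list(blocks), 2))


if __name__ == "__main__":
    macro = Block("sram", 0.0, 0.0, 4.0, 4.0)              # hard block
    logic = soft_block("alu", 4.0, 0.0, area=16.0, aspect_ratio=1.0)
    print(is_legal([macro, logic]))
```

Changing only the aspect ratio of the soft block, at constant area, can turn an illegal arrangement into a legal one, which is the degree of freedom a floorplanner exploits.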
Physical design directly impacts final circuit timing, area, power and reliability.
Meeting timing is of particular importance when completing the physical layout. Timing
evaluation is therefore performed at each step of the physical design flow, and any
timing violations must be resolved before carrying on to the next step. Modern physical
design EDA tools offer incremental optimisation techniques to fix these problems
automatically through gate resizing (drive strength remapping), buffer
insertion/deletion, logic refinement, instance movement, etc. These local optimisations
might not be able to consider the design globally, and are limited in how well they can
trade off design metrics.
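A timing check of this kind amounts to propagating arrival times through the gate network and comparing them with the clock period. The following minimal sketch assumes a purely combinational network with one lumped delay per gate and no wire delay; the netlist, the delays and the function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def arrival_times(delays: Dict[str, float],
                  fanin: Dict[str, List[str]]) -> Dict[str, float]:
    """Latest arrival time at each node output: own delay plus slowest predecessor."""
    arrival: Dict[str, float] = {}
    for node in TopologicalSorter(fanin).static_order():  # predecessors come first
        slowest_input = max((arrival[p] for p in fanin.get(node, ())), default=0.0)
        arrival[node] = delays[node] + slowest_input
    return arrival


def worst_slack(delays: Dict[str, float], fanin: Dict[str, List[str]],
                endpoints: Iterable[str], clock_period: float) -> float:
    """Slack = required time - arrival time; a negative value is a timing violation."""
    arrival = arrival_times(delays, fanin)
    return min(clock_period - arrival[e] for e in endpoints)


if __name__ == "__main__":
    fanin = {"g1": ["in"], "g2": ["in"], "g3": ["g1", "g2"]}
    delays = {"in": 0.0, "g1": 0.3, "g2": 0.5, "g3": 0.4}   # ns
    print(worst_slack(delays, fanin, ["g3"], clock_period=0.8))
    # Emulate gate resizing: a stronger g2 is assumed to be faster.
    resized = dict(delays, g2=0.3)
    print(worst_slack(resized, fanin, ["g3"], clock_period=0.8))
```

Resizing one gate on the critical path removes the violation in this toy case, but it is a local fix: the stronger cell also loads its driver and costs power and area, which the sketch does not model.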
The complete circuit layout must be fully verified to ensure behavioural and electrical
functionality before fabrication. A few changes to the layout may be required to solve
problems exposed at the physical verification step. This is normally achieved through
In addition, with the increasing complexity of VLSI design, modern EDA tools are
required to employ fast algorithms that deliver a feasible solution under
time-to-market pressure. Deterministic algorithms, which always deliver the same
solution for a given input, are therefore in demand and have been developed for most
sub-steps of the digital flow. These algorithms require only one execution to produce a
solution, but algorithm designers need to determine a mathematical function that maps
the specific problem domain for computation. Such methods may be limited in their
ability to obtain a well-balanced solution from a global point of view.
Therefore, the challenge of design optimisation is to find possible optimal trade-off
solutions with respect to multiple design requirements, using appropriate library
cells, while consuming less turnaround time.
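One common way to make "optimal trade-off solutions" concrete is Pareto dominance: a candidate design is kept only if no other candidate is at least as good in every metric and strictly better in one. The sketch below applies this idea to hypothetical (delay, power, area) triples, all to be minimised; it is an illustration of the concept rather than a method taken from this chapter.

```python
from typing import List, Sequence, Tuple

Design = Tuple[float, ...]  # e.g. (delay, power, area), all minimised


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every metric and better in at least one."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


if __name__ == "__main__":
    candidates = [
        (1.0, 5.0, 10.0),
        (1.2, 4.0, 10.0),
        (1.3, 5.5, 11.0),   # worse than the first candidate in every metric
        (0.9, 6.0, 12.0),
    ]
    print(pareto_front(candidates))
```

The surviving designs represent different compromises between speed, power and area, none of which can be improved in one metric without giving up another.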
2.5 Modified Digital Flow
Custom design methodologies, applied by experienced engineers, are effective in
improving the QoRs of designs. In the early 2000s, W. Dally and A. Chang demonstrated
the role of custom design in application specific integrated circuit (ASIC) chips [33].
They proposed selectively applying a number of custom design techniques in the digital
flow, including custom floorplanning and the placement and routing of critical signals,
to achieve the most compact layout structure. This manual design process reduces the
load on paths, improves density and ultimately achieves better PPA, but custom design
requires significant manual effort and is therefore not scalable to the complexity of
large designs.
Furthermore, D. Chinnery and K. Keutzer stated that there is a gap between full-custom
design and the standard digital flow regarding speed and power [34, 35]. Digital ICs
implemented using the standard design flow may significantly reduce design cycle time
but lose possible optimal trade-off solutions that full-custom design can achieve.
However, given the current extreme design complexity, as well as the time-to-market
pressure to continuously produce new generations of chips, designers in industry still
focus on a synthesis-centred methodology to preserve design efficiency and resource
budgets. Therefore, implementing extra custom design and optimisation techniques as
enhancements to the standard digital flow is a promising way to achieve higher QoRs [36].
To accelerate custom design in the digital flow, H. Onodera et al. introduced an ASIC
design methodology with on-demand library generation during the design loop. It can
produce cells with tailored drive strength from a set of symbolic layouts [22]. This
makes the drive strength of cells tunable, in contrast to the conventionally used set
of cells with fixed drive strengths. In [19], IBM also proposed a similar semi-custom
design flow for microprocessor design. The method continuously iterates the whole flow
using pre-defined parameterised cells, recovering design performance by automatically
generating compensated cells and adding them to a fixed cell library. In [23], a
post-layout transistor down-sizing method was proposed for power reduction without
interconnect modifications, thereby directly saving design turnaround time.
Most recently, EDA vendors have offered the latest full digital flow solutions, such as
Cadence iSpatial™ and Synopsys Fusion Compiler™, to unify the power of logic synthesis
and physical implementation tools. The key enhancement of these novel design
methodologies is migrating part of the physical implementation functions to logic
synthesis for accurate early-stage evaluation, reducing design margins so as to enable
faster throughput time and improved PPA metrics.
2.6 Summary
This chapter provides an overview of digital integrated circuit design in EDA flows,
from transistor scaling to physical layout implementation. Some current challenges in
modern digital IC design are explored from the perspective of both EDA tools and
designers. The process technology scaling trend is moving towards the investigation of
novel transistor structures (3D stacked) instead of further shrinking the absolute
physical size of transistors, because significant variability has been introduced in
small-scale transistors, so that simultaneous improvements in power and performance are
almost impossible to achieve. This implies that power, performance and area gains for
overall electronic system optimisation no longer depend heavily on transistor scaling.
Each design step of the digital flow introduces its own level of abstraction, so any
margin or error will accumulate and propagate. Hence, achieving a good solution in each
step is crucial for the success of subsequent design steps and the quality of the
overall solution. Increasing the correlation between front-end and back-end during the
IC design cycle is vital to reduce margins across different levels of abstraction.
The next chapter reviews the literature on multi-objective problems and commonly used
methodologies, as well as multi-objective optimisation techniques applied in VLSI
design processes.