Asic Design Cadence DR D Gracia Nirmala Rani
14EC770 : ASIC DESIGN
• Preamble
• 14EC270 : Digital Logic Circuit Design
• 14EC520 : Digital CMOS Systems
• Objective
This course gives students knowledge of
– Physical design flow
• Logic synthesis, floorplanning, placement, and routing
– Experiments explore the complete digital design flow of a
programmable ASIC using VLSI EDA tools.
– Students work from design entry in Verilog through to GDSII file
generation for an ASIC.
Concept MAP
4
Course Outcomes
CO1 (Understand): Describe the design flow, types, and programming
technologies of an ASIC and its construction.
CO2 (Apply): Describe the goals, objectives, measurements, and
algorithms of partitioning, then apply those algorithms to
partition the network to meet the objectives.
CO3 (Apply): Describe the goals, objectives, measurements, and
algorithms of floorplanning and placement, then apply those
algorithms to place the logic cells inside the flexible blocks
of an ASIC to meet the objectives.
CO4 (Analyze): Describe the goals, objectives, measurements, and
algorithms of routing, then apply those algorithms to route
the channels; describe various circuit extraction formats; and
investigate the issues and discover solutions in
each step of the physical design flow of an ASIC.
CO5 (Analyze): Design an ASIC for digital circuits using the ASIC design flow
steps of simulation, synthesis, floorplanning,
placement, routing, and circuit extraction; generate the GDSII
file for fabrication of the ASIC; then analyze the ASIC to
meet performance targets for area, speed, and power
using EDA tools.
Integrated Circuit
Wafer : A circular piece of pure silicon (10-15 cm in dia, but
wafers of 30 cm dia are expected soon)
Wafer Lot: 5 ~ 30 wafers, each containing hundreds of
chips(dies) depending upon size of the die
Die: A rectangular piece of silicon that contains one
IC design
Mask Layers: Each IC is manufactured with successive
mask layers(10 – 15 layers)
First half-dozen or so layers define transistors
Other half-dozen define Interconnect
6
Integrated Circuit (IC) in a package
7
Evolution of IC
• SSI (Small-Scale Integration)-(1962)
– Tens of Transistors
– NAND, NOR
10
ASIC and Non ASIC
• Examples of ICs that are not ASICs include standard parts such as:
– memory chips sold as a commodity item—ROMs, DRAM, and SRAM;
microprocessors;
– TTL or TTL-equivalent ICs at SSI, MSI, and LSI levels.
• Full-Custom ASICs: Possibly all logic cells and all mask layers customized
• Semi-Custom ASICs: all logic cells are pre-designed and some (possibly all)
mask layers customized
13
Types of ASICs – Cont’d
Full-Custom ASICs
Include some (possibly all) customized logic cells
Disadvantages
Increased design time
Increased Complexity
Higher design cost
Higher risk.
Some examples:
Microprocessors
High-voltage automobile control chips
Analog/digital communication chips
Sensors and actuators
15
Types of ASICs – Cont’d
Semi-Custom ASICs
Standard-Cell based ASICs (CBIC- “sea-bick”)
full-custom blocks
System-Level Macros(SLMs)
cores etc
16
Types of ASICs – Cont’d
Semi-Custom ASICs – Cont’d
Standard-Cell based ASICs
(CBIC- “sea-bick”) – Cont’d
17
Types of ASICs – Cont’d
Standard Cell in Flexible block of CBIC
– The rows stack vertically to form flexible blocks- reshape during design
– Flexible blocks connected with other std cell blocks or full custom block
18
Types of ASICs – Cont’d
Wiring cells in Standard Cell based ASICs
• Feedthrough cell:
• A piece of metal used to pass a signal through a cell, or a space in a cell waiting
to be used as a feedthrough.
• Spacer cells:
• The width of each row of standard cells is adjusted with spacer cells so that
the rows are aligned.
• Power cells:
• If the rows of standard cells are long, then vertical power rails can also be run in
metal2 through the cell rows using special power cells that just connect to VDD
and GND.
• Usually the designer manually controls the number and width of the vertical
power rails connected to the standard-cell blocks during physical design.
Types of ASICs – Cont’d
Advantages of CBIC
– Save time, money, reduce risk
Disadvantages of CBIC:
– Time to design standard cell library
– Time needed to fabricate all layers of the ASIC for new design
20
Types of ASICs – Cont’d
Semi-Custom ASICs – Cont’d
Gate Array based ASICs
Transistors are predefined on the silicon wafer.
The predefined pattern of transistors on a gate array is the base array.
The smallest element repeated to form the base array is the base cell.
Only the top few layers of metal, which define the interconnect between
transistors, are defined by the designer using custom masks. This is often called a
masked gate array (MGA).
Short turnaround time: a few days to a couple of weeks.
Similar to a CBIC, but here the space is fixed.
Types of ASICs – Cont’d
Semi-Custom ASICs – Cont’d
Channelless Gate Array ASIC
•An embedded gate array or structured gate array (also known as masterslice or
masterimage ) combines some of the features of CBICs and MGAs.
•One of the disadvantages of the MGA is the fixed gate-array base cell. This
makes the implementation of memory, for example, difficult and inefficient.
• In an embedded gate array we set aside some of the IC area and dedicate it to
a specific function.
• This embedded area either can contain a different base cell that is more
suitable for building memory cells, or it can contain a complete circuit block,
such as a microcontroller.
24
Channelled gate array
Adv: Specific space for interconnection
Disadv: compared to CBIC space is not adjustable
Channelless gate array
Adv :
• Logic density is higher for channelless gate array
• Contact layers are customized
Disadv:
• No specific area for routing
• Rows of transistors used for routing are not used for other purpose.
Structured Gate Array
Adv:
• An embedded gate array sets aside some of the IC area and dedicates it to a
specific, customized function.
• Area efficiency and performance of a CBIC
• Low cost and fast turnaround of an MGA
Disadv:
• Embedded function is fixed
Types of ASICs – Cont’d
Semi-Custom ASICs – Cont’d
Programmable ASICs
PLDs - PLDs are low-density devices
which contain 1k – 10 k gates and are
available both in bipolar and CMOS
technologies [PLA, PAL or GAL]
CPLDs or FPLDs or FPGAs -
FPGAs combine architecture of gate arrays
with programmability of PLDs.
User Configurable
Contain Regular Structures -
circuit elements such as AND, OR,
NAND/NOR gates, FFs, Mux, RAMs,
Allow Different Programming
Technologies
Allow both Matrix and Row-
based Architectures
26
Programmable Logic Devices
• Programmable logic devices ( PLDs ) are standard ICs
– Available in standard configurations
27
Examples of PLD
• The simplest type of programmable IC is a read-only memory ( ROM ).
• The most common types of ROM use a metal fuse that can be blown
permanently (a programmable ROM, or PROM).
• Erasable PROM (EPROM)
– An EPROM is erased either by applying another high voltage (an electrically erasable
PROM, or EEPROM)
– or by exposing the device to ultraviolet light (a UV-erasable PROM, or UVPROM).
29
• CMOS PLDs usually employ floating-gate
transistors
30
Types of ASICs – Cont’d
Semi-Custom ASICs – Cont’d
Programmable ASICs - Cont’d
Structure of a CPLD / FPGA
31
Essential characteristics of FPGA
32
Why FPGA-based ASIC Design?
The choice is based on many factors.
[Table: requirements, including speed and future modifications, compared across FPGA/FPLD, discrete logic, and custom logic]
33
Different Categorizations of FPGAs
Based on Functional Unit/Logic
Cell Structure
Transistor Pairs
Basic Logic Gates: NAND/NOR
MUX
Look –up Tables (LUT)
Wide-Fan-In AND-OR Gates
Programming Technology
Anti-Fuse Technology
SRAM Technology
EPROM Technology
Gate Density
Chip Architecture (Routing Style)
34
Different Types of Logic Cells
35
Different Types of Logic Cells – Cont’d
Actel Act Logic Module Structure
Use Antifuse Programming Tech.
Based on Channeled GA Architecture
Logic Cell is MUX which can be configured as multi-input logic gates
36
Different Types of Logic Cells – Cont’d
Xilinx XC4000 CLB Structure
37
Different Types of Logic Cells – Cont’d
Altera Flex / Max Logic
Element Structure
Flex 8k/10k Devices – SRAM Based LUTs, Logic
Elements (LEs) are similar to those used in XC5200
FPGA
38
Different Types of Logic Cells – Cont’d
To SUMMARIZE, FPGAs from various
vendors differ in their
Architecture (Row Based or Matrix
Based Routing Mechanism)
39
Programming Technologies
Three Programming Technologies
The Antifuse Technology
Static RAM Technology
EPROM and EEPROM Technology
40
Antifuse
Programming Technologies – Cont’d
The Antifuse Technology
Invented at Stanford and developed by Actel
Opposite to regular fuse technology
Normally an open circuit until a programming current (about 5 mA) is
forced through it
Two types:
Actel's PLICE [Programmable Low-Impedance Circuit Element]: a
high-resistance poly-diffusion antifuse
QuickLogic's low-resistance metal-metal antifuse [ViaLink] technology
Direct metal-to-metal connections
Higher programming currents reduce antifuse resistance
Disadvantages:
Unwanted long delay
OTP (one-time programmable) technology
[Figure: [a] Actel antifuse, [b] Actel antifuse resistance, [c] QuickLogic antifuse, [d] QuickLogic antifuse resistance]
42
Programming Technologies – Cont’d
Static RAM Technology
SRAM cells are used for
As Look-Up Tables (LUT) to
implement logic (as Truth Tables)
As embedded RAM blocks (for
buffer storage etc.)
As control to routing and
configuration switches
Advantages
Allows In-System Programming
(ISP)
Suitable for Reconfigurable HW
Disadvantages
Volatile – needs power all the time /
use PROM to download configuration
data
43
Programming Technologies – Cont’d
EPROM and EEPROM Technology-
44
Programming Technologies – Cont’d
Summary Sheet
45
ASIC Design Process
S-1 Design Entry: Schematic entry
or HDL description
S-2: Logic Synthesis: Using
Verilog HDL or VHDL and
Synthesis tool, produce a netlist-
logic cells and their interconnect
detail
S-3 System Partitioning: Divide a
large system into ASIC sized pieces
S-4 Pre-Layout Simulation: Check
design functionality
S-5 Floorplanning: Arrange netlist
blocks on the chip
S-6 Placement: Fix cell locations in
a block
S-7 Routing: Make the cell and
block interconnections
S-8 Extraction: Measure the
interconnect R/C cost
S-9 Post-Layout Simulation
46
ASIC Design Flow
1.Design entry. Enter the design into an ASIC design system, either
using a hardware description language ( HDL ) or
schematic entry .
2. Logic synthesis. Use an HDL (VHDL or Verilog) and a logic synthesis
tool to produce a netlist —a description of the logic
cells and their connections.
3. System partitioning. Divide a large system into ASIC-sized pieces.
4. Prelayout simulation. Check to see if the design functions correctly.
5. Floorplanning. Arrange the blocks of the netlist on the chip.
6. Placement. Decide the locations of cells in a block.
7. Routing. Make the connections between cells and blocks.
8. Extraction. Determine the resistance and capacitance of the
interconnect.
9. Postlayout simulation. Check to see the design still works with the added loads
of the interconnect.
Steps 1–4 are part of logical design, and steps 5–9 are part of physical design.
• There is some overlap. For example, system partitioning might be considered as either logical or
physical design. To put it another way, when we are performing system partitioning we have to
consider both logical and physical factors.
ASIC Design Process – Cont’d
Altera FPGA Design Flow – A Self-Contained System that does all
from Design Entry, Simulation, Synthesis, and Programming of Altera Devices
48
ASIC Design Process – Cont’d
Xilinx FPGA Design Flow – Allows Third-Party Design Entry SW,
Accepts their generated netlist file as an input
52
VLSI Design Flow
53
Physical Design Steps
• Part of an ASIC design flow showing
the system partitioning, floorplanning,
placement, and routing steps.
• Performed in a slightly different order,
iterated or omitted depending on the
type and size of the system and its
ASICs.
• Floorplanning assumes an increasingly
important role.
• Sequential-Each of the steps shown in
the figure must be performed and each
depends on the previous step.
• Parallel- However, the trend is
toward completing these steps in a
parallel fashion and iterating,
rather than in a sequential manner.
54
CAD Tools
Each physical design step must be converted into a problem
with well-defined goals and objectives.
The goals for each physical design step are the things we
must achieve.
55
CAD Tools
System partitioning:
• Goal. Partition a system into a number of ASICs.
• Objectives. Minimize the number of external connections between the ASICs. Keep
each ASIC smaller than a maximum size.
Floor planning:
• Goal. Calculate the sizes of all the blocks and assign them locations.
• Objective. Keep the highly connected blocks physically close to each other.
Placement:
• Goal. Assign the interconnect areas and the location of all the logic cells within the
flexible blocks.
• Objectives. Minimize the ASIC area and the interconnect density.
Global routing:
• Goal. Determine the location of all the interconnect.
• Objective. Minimize the total interconnect area used.
Detailed routing:
• Goal. Completely route all the interconnect on the chip.
• Objective. Minimize the total interconnect length used.
56
Methods and Algorithm
• A CAD tool needs methods or algorithms to generate a solution to each
problem using a reasonable amount of computer time.
• The complexity of an algorithm is O(f(n)) if
there are constants k and n0 such that the running time of the algorithm, T(n), is
less than k f(n) for all n > n0 [Sedgewick, 1988]. Here n is a measure of the
size of the problem (number of transistors, number of wires, and so on).
• NP-complete problem: it is unlikely that we can find an algorithm to solve the
problem exactly in polynomial time.
58
Methods and Algorithm (Contd.,)
Cost Function :
If we are minimizing the measurement function, it is a cost function.
Gain function :
If we are maximizing the measurement function, we call the function a
gain function (sometimes just gain).
59
• Estimating ASIC size
Estimate the die size of a 40 k-gate ASIC in a 0.35 µm gate-array,
three-level-metal process with 166 I/O pads.
1 µm (micron) = 0.0393701 mil
(1 mil = one thousandth of an inch)
60
Estimating ASIC size
• For this ASIC the minimum feature size is 0.35 µm.
• Number of I/O pads = 166.
• Pads per side = 166/4 = 41.5, rounded up to 42 I/O pads per side.
• If the I/O pad pitch is 5 mil, then one side of the die = 5 × 42 = 210 mil.
• Minimum die size needed to fit 166 I/O pads = 210 × 210 = $4.4 \times 10^4\ \text{mil}^2$.
• Fraction of the die area used by core logic = $(1.19 \times 10^4) / (4.4 \times 10^4) = 27\%$.
Estimating ASIC size
62
63
System Hierarchy
Levels of Partitioning
• System → (system-level partitioning) → PCBs → (board-level partitioning) → Chips
• Goal of partitioning
– Divide the system into a number of smaller systems.
• Objectives of Partitioning
We may need to take into account any or all of the following objectives:
– A maximum size for each ASIC
– A maximum number of ASICs
– A maximum number of connections for each ASIC
– A maximum number of total connections between all ASICs
Measuring Connectivity
• Figure (a) shows a circuit schematic, netlist, or network.
• The network consists of circuit modules A–F. Equivalent terms for a circuit
module are cell, logic cell, macro, or block.
• A cell or logic cell: a small logic gate (NAND, etc.) or a collection of other cells.
• Macro: gate-array cells.
• Block: a collection of gates or cells.
• Electrical connections run between the terminals (connectors or
pins) of the logic cells.
• The network can be represented as the mathematical graph shown in Figure (b).
• A graph is like a spider's web:
– it contains vertexes (or vertices) A–F (also called graph nodes or points) that are connected by
edges.
– A graph vertex corresponds to a logic cell.
– An electrical connection (a net or a signal) between two logic cells corresponds to a
graph edge.
Measuring Connectivity
• Net Cutset
– When we divide the network into two by drawing a line across connections, we make net cuts. The resulting
set of net cuts is the net cutset.
– Number of net cuts - the number of external connections between the two partitions in a
network.
• Edge cutset.
– When we divide the network graph into the same partitions we make edge cuts and we create
the edge cutset.
– Number of edge cuts – the number of external connections between the two partitions in a
graph
– Number of edge cuts in a graph is not necessarily equal to the number of net cuts in the
network.
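A small Python sketch can show why the two counts differ. It assumes, purely for illustration, that a multi-terminal net is modelled in the graph as one edge between every pair of its terminals; the net names, cell names, and partition below are hypothetical.

```python
from itertools import combinations

def net_cuts(nets, part_of):
    """Count nets whose terminals fall in more than one partition."""
    return sum(1 for cells in nets.values()
               if len({part_of[c] for c in cells}) > 1)

def edge_cuts(nets, part_of):
    """Count cut edges when every net is expanded into two-terminal edges."""
    edges = set()
    for cells in nets.values():
        edges.update(combinations(sorted(cells), 2))
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

# Hypothetical network: net x joins A and B; net y has three terminals.
nets = {"x": {"A", "B"}, "y": {"B", "C", "D"}}
part_of = {"A": 0, "B": 0, "C": 1, "D": 1}

n_net_cuts = net_cuts(nets, part_of)
n_edge_cuts = edge_cuts(nets, part_of)
print(n_net_cuts, n_edge_cuts)
```

Net y is a single external connection, but its edges B–C and B–D are both cut, so the edge count overstates the number of wires crossing the boundary.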
Partitioning
[Figure: (a) the network to be partitioned; (b) a partitioning of the network]
• (a) We wish to partition this network into three ASICs with no more than four logic cells per ASIC.
• (b) A partitioning with five external connections (nets 2, 4, 5, 6, and 8), the minimum number.
• Objectives are the following:
• - Use no more than three ASICs.
• - Each ASIC is to contain no more than four logic cells.
• - Use the minimum number of external connections.
•Types of Partitioning
75
Constructive Partitioning
• The most common constructive partitioning algorithms use seed growth or cluster
growth.
1. Start a new partition with a seed logic cell.
2. Consider all the logic cells that are not yet in a partition. Select each of
these logic cells in turn.
3. Calculate a gain function g(m) that measures the benefit of adding logic cell m
to the current partition.
4. Add the logic cell with the highest gain g(m) to the current partition.
5. Repeat the process from step 2. If you reach the limit of logic cells in a
partition, start again at step 1.
76
Cluster Growth
m : size of each cluster, V : set of nodes
n = |V| / m ;
for (i=1; i<=n; i++)
{
seed = vertex in V with maximum degree;
Vi = {seed};
V = V – {seed};
for (j=1; j<m; j++)
{
t = vertex in V maximally connected to Vi;
Vi = Vi U {t};
V = V – {t};
}
}
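The pseudocode above can be turned into a runnable Python sketch. Several details are assumptions for illustration: "degree" is taken as the number of neighbours in the whole graph, "maximally connected" as the largest total edge weight into the growing cluster, ties are broken alphabetically, and the loop runs until every vertex is assigned rather than exactly |V| / m times. The six-node graph is hypothetical.

```python
def build_adjacency(edges):
    """Turn (u, v, weight) triples into a symmetric adjacency dictionary."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def cluster_growth(adj, m):
    """Greedy seed growth: build clusters of at most m vertices each."""
    unassigned = set(adj)
    clusters = []
    while unassigned:
        # Seed: the unassigned vertex with maximum degree.
        seed = max(sorted(unassigned), key=lambda v: len(adj[v]))
        cluster = {seed}
        unassigned.remove(seed)
        while len(cluster) < m and unassigned:
            # Grow: add the vertex most strongly connected to the cluster.
            t = max(sorted(unassigned),
                    key=lambda v: sum(w for u, w in adj[v].items() if u in cluster))
            cluster.add(t)
            unassigned.remove(t)
        clusters.append(cluster)
    return clusters

# Hypothetical network: two triangles joined by the single edge c-d.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1), ("c", "d", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1)]
clusters = cluster_growth(build_adjacency(edges), m=3)
print(clusters)
```

On this graph the two triangles come out as the two clusters, leaving only edge c–d external.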
77
Constructive Partitioning
To get from the solution shown in Fig 2 to the solution of Fig 1, which has a
minimum number of external connections, requires a complicated swap.
The three pairs: D and F, J and K, C and L need to be swapped—all at the same
time. It would take a very long time to consider all possible swaps of this complexity.
79
Iterative Partitioning Improvement
Algorithm based on Interchange method and group migration method
81
The Kernighan–Lin Algorithm (contd.,)
• Total external cost, cut cost, cut weight
$W = \sum_{a \in A,\, b \in B} c_{ab}$
• Gain
$g = D_a + D_b - 2c_{ab}$
where
$D_a = E_a - I_a$
82
The Kernighan–Lin Algorithm (contd.,)
• The K–L algorithm finds a group of node pairs to swap that increases the gain even
though swapping individual node pairs from that group might decrease the gain.
1. Find two nodes, ai from A, and bi from B, so that the gain from swapping them is a
maximum. The gain gi is
$g_i = D_{a_i} + D_{b_i} - 2c_{a_i b_i}$
2. Next pretend swap ai and bi even if the gain gi is zero or negative, and do not consider ai
and bi eligible for being swapped again.
3. Repeat steps 1 and 2 a total of m times until all the nodes of A and B have been pretend
swapped. We are back where we started, but we have ordered pairs of nodes in A and B
according to the gain from interchanging those pairs.
83
The Kernighan–Lin Algorithm (contd.,)
4. Now we can choose which nodes we shall actually swap. Suppose we only swap the
first n pairs of nodes that we found in the preceding process. In other words we
swap nodes X = a1, a2, ..., an from A with nodes Y = b1, b2, ..., bn from B.
The total gain would be
$G_n = \sum_{i=1}^{n} g_i$
• Use this new partitioning to start the process again at the first step.
84
•Partitioning a graph using the
Kernighan–Lin algorithm.
86
Kernighan-Lin Algorithm (1)
• Given:
Initial weighted graph G with
V(G) = { a, b, c, d, e, f }
[Figure: weighted graph G on nodes a–f]
• Start with any partition of
V(G) into X and Y, say
X = { a, c, e }
Y = { b, d, f }
87
KL algorithm (2a)
[Figure: weighted graph G with X = { a, c, e }, Y = { b, d, f }]
• Compute the gain values of moving
node x to the other set:
• Gx = Ex – Ix
Ex = cost of edges connecting node x
with the other group (external)
Ix = cost of edges connecting node x
within its own group (internal)
• cut-size = 3 + 1 + 2 + 4 + 6 = 16
• Ga = Ea – Ia = –3 (= 3 – 4 – 2)
• Gc = Ec – Ic = 0 (= 1 + 2 + 4 – 4 – 3)
• Ge = Ee – Ie = +1 (= 6 – 2 – 3)
• Gb = Eb – Ib = +2 (= 3 + 1 – 2)
• Gd = Ed – Id = –1 (= 2 – 2 – 1)
• Gf = Ef – If = +9 (= 4 + 6 – 1)
KL algorithm (2b)
X = { a, c, e }
Y = { b, d, f }
• Cost saving when exchanging a and b is
essentially Ga + Gb
• However, the cost saving of 3 from the direct
edge (a, b) was counted twice, and this edge
still connects the two groups after the exchange.
• Hence, the real "gain" (i.e. cost saving)
of this exchange is gab = Ga + Gb – 2cab
• Ga = Ea – Ia = –3 (= 3 – 4 – 2)
• Gb = Eb – Ib = +2 (= 3 + 1 – 2)
• gab = Ga + Gb – 2cab = –7 (= –3 + 2 – 2·3)
89
KL algorithm (3)
• Ga = –3   Gb = +2
• Gc = 0    Gd = –1
• Ge = +1   Gf = +9
• cut-size = 16
• Compute all the gains
• gab = Ga + Gb – 2wab = –3 + 2 – 2·3 = –7
• gad = Ga + Gd – 2wad = –3 – 1 – 2·0 = –4
• gaf = Ga + Gf – 2waf = –3 + 9 – 2·0 = +6 (pair with maximum gain)
• gcb = Gc + Gb – 2wcb = 0 + 2 – 2·1 = 0
• gcd = Gc + Gd – 2wcd = 0 – 1 – 2·2 = –5
• gcf = Gc + Gf – 2wcf = 0 + 9 – 2·4 = +1
• geb = Ge + Gb – 2web = +1 + 2 – 2·0 = +3
• ged = Ge + Gd – 2wed = +1 – 1 – 2·0 = 0
• gef = Ge + Gf – 2wef = +1 + 9 – 2·6 = –2
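The G values and pair gains in this example can be re-derived with a few lines of Python. The edge weights in `EDGES` are an assumption: the figure is not reproduced here, so they are read back from the bracketed arithmetic of the G values (for example Ga = 3 – 4 – 2). The helper names are illustrative only.

```python
# Edge weights of the example graph (assumed, inferred from the G-value arithmetic).
EDGES = {("a", "b"): 3, ("a", "c"): 4, ("a", "e"): 2, ("b", "c"): 1, ("b", "d"): 2,
         ("c", "d"): 2, ("c", "e"): 3, ("c", "f"): 4, ("d", "f"): 1, ("e", "f"): 6}

def weight(u, v):
    """Weight of edge (u, v), or 0 if the nodes are not connected."""
    return EDGES.get((u, v), EDGES.get((v, u), 0))

def g_value(x, own, other):
    """G_x = E_x - I_x: external cost minus internal cost of node x."""
    external = sum(weight(x, y) for y in other)
    internal = sum(weight(x, y) for y in own if y != x)
    return external - internal

X, Y = ["a", "c", "e"], ["b", "d", "f"]
G = {x: g_value(x, X, Y) for x in X}
G.update({y: g_value(y, Y, X) for y in Y})

cut_size = sum(weight(x, y) for x in X for y in Y)
pair_gain = {(x, y): G[x] + G[y] - 2 * weight(x, y) for x in X for y in Y}
best_pair = max(pair_gain, key=pair_gain.get)

print(cut_size, G)
print(best_pair, pair_gain[best_pair])
```

With these weights the script reproduces the cut-size of 16 and selects (a, f) as the pair with maximum gain.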
KL algorithm (4)
[Figure: the graph before and after exchanging a and f]
• Before the exchange: cut-size = 16
• After exchanging a and f: cut-size = 16 – 6 = 10
• After the next tentative exchange (g' = –3): cut-size = 10 + 3 = 13
• Update the G values of the remaining free nodes
• G''e = G'e + 2cec – 2ced = –7 + 2(3 – 0) = –1
• G''b = G'b + 2cbd – 2cbc = –4 + 2(2 – 1) = –2
• Compute the gains
• g''eb = G''e + G''b – 2ceb = –1 – 2 – 2·0 = –3
• Pair with max. gain is (e, b)
95
KL algorithm (9)
• Summary of the Gains…
– g = +6
– g + g’ = +6 – 3 = +3
– g + g’ + g” = +6 – 3 – 3 = 0
• Maximum Gain = g = +6
• Exchange only nodes a and f.
• End of 1 pass.
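One complete pass, including the tentative swaps and the choice of the best prefix of the gain sequence, can be sketched in Python as below. The edge weights are again an assumption inferred from the worked arithmetic, and `kl_pass` is an illustrative name, not a reference implementation.

```python
def kl_pass(edges, part_a, part_b):
    """One Kernighan-Lin pass: return (swaps, gains, best_n, best_total)."""
    def w(u, v):
        return edges.get((u, v), edges.get((v, u), 0))

    # Initial D (G) values: external cost minus internal cost.
    d = {}
    for x in part_a:
        d[x] = sum(w(x, y) for y in part_b) - sum(w(x, y) for y in part_a if y != x)
    for y in part_b:
        d[y] = sum(w(y, x) for x in part_a) - sum(w(y, x) for x in part_b if x != y)

    free_a, free_b = list(part_a), list(part_b)
    swaps, gains = [], []
    while free_a and free_b:
        # Pick the free pair with maximum gain, even if that gain is negative.
        g, a, b = max(((d[a] + d[b] - 2 * w(a, b), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        swaps.append((a, b))
        gains.append(g)
        free_a.remove(a)
        free_b.remove(b)
        # Pretend-swap a and b: update D values of the remaining free nodes.
        for x in free_a:
            d[x] += 2 * w(x, a) - 2 * w(x, b)
        for y in free_b:
            d[y] += 2 * w(y, b) - 2 * w(y, a)

    # Keep the first n swaps, where n maximizes the partial sum G_n.
    best_n, best_total, total = 0, 0, 0
    for n, g in enumerate(gains, start=1):
        total += g
        if total > best_total:
            best_n, best_total = n, total
    return swaps, gains, best_n, best_total

# Edge weights assumed from the worked example's arithmetic.
EDGES = {("a", "b"): 3, ("a", "c"): 4, ("a", "e"): 2, ("b", "c"): 1, ("b", "d"): 2,
         ("c", "d"): 2, ("c", "e"): 3, ("c", "f"): 4, ("d", "f"): 1, ("e", "f"): 6}
swaps, gains, best_n, best_total = kl_pass(EDGES, ["a", "c", "e"], ["b", "d", "f"])
print(swaps, gains, best_n, best_total)
```

The gain sequence comes out as +6, –3, –3, so the partial sums are 6, 3, and 0 and only the first swap (a, f) is made permanent, matching the summary above.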
96
Demerits of Kernighan–Lin Algorithm
• Minimizes the number of edges cut, not the number of nets cut.
• Does not directly allow for more than two partitions.
• Does not allow logic cells to be different sizes.
• Does not allow partitions to be unequal or find the optimum partition size.
• Does not allow for selected logic cells to be fixed in place.
• K–L finds only a locally optimum solution, and the result depends on random choices:
– Random starting partition
– Choice of nodes to swap when several have equal gain
• Expensive in computation time.
– The computation time grows as $n^2 \log n$ for a graph with 2n nodes.
Solution:
To implement a net-cut partitioning rather than an edge-cut partitioning,
keep track of the nets rather than the edges – FM algorithm
97
Recap of Kernighan-Lin’s Algorithm
• Pair-wise exchange of nodes to reduce cut size
• Allow cut size to increase temporarily within a pass
• Compute the gain of a swap
• Repeat
• Perform a feasible swap of max gain (nodes u and v exchange sides)
• Mark swapped nodes "locked";
• Update swap gains;
• Until no feasible swap;
• Find max prefix partial sum in gain sequence g1, g2, …, gm
• Make corresponding swaps permanent.
99
Fiduccia-Mattheyses (F-M) Algorithm
• Addresses the difference between nets and edges.
• Reduce the computational time.
Key Features of F-M:
• Base logic cell - Only one logic cell moves at a time.
– Base logic cell is chosen to maintain balance between partitions in order to stop the
algorithm from moving all the logic cells to one large partition
– Balance - the ratio of total logic cell size in one partition to the total logic cell size in the
other. Altering the balance allows us to vary the sizes of the partitions.
• The logic cells that are free to move are stored in a doubly linked list. The lists are
sorted according to gain. This allows the logic cells with maximum gain to be found
quickly.
• Reduced computation time: it increases only slightly more than linearly with the
number of logic cells in the network.
Overcoming problems in K-L using the F-M algorithm
– Logic cells that must stay fixed in place are simply not considered as base logic
cells in the F-M algorithm.
Features of FM Algorithm
• Modification of KL Algorithm:
– Can handle non-uniform vertex weights (areas)
– Allow unbalanced partitions
– Extended to handle hypergraphs
– Clever way to select vertices to move, run
much faster.
Hypergraph
• Hypergraph: represents nets with multiple terminals in a network accurately.
• A hypergraph models each such net with a hyperedge and a star node.
• FIGURE: A hypergraph. (a) The network contains a net y with three terminals. (b) In the
network hypergraph we can model net y by a single hyperedge (B, C, D) and a star node. Now
there is a direct correspondence between wires or nets in the network and hyperedges in the graph.
103
Problem Formulation
• Input: A hypergraph with
– Set of vertices V. (|V| = n)
– Set of hyperedges E. (total # pins in netlist = p)
– Area au for each vertex u in V.
– Cost ce for each hyperedge e in E.
– An area ratio r.
• Output: 2 partitions X & Y such that
– Total cost of hyperedges cut is minimized.
– area(X) / (area(X) + area(Y)) is about r.
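The two quantities in this formulation, the total cost of cut hyperedges and the area ratio, can be evaluated for a candidate partition with a short Python sketch. The vertices, areas, hyperedges, and costs below are hypothetical values chosen only to exercise the definitions.

```python
def cut_cost(hyperedges, cost, side):
    """Total cost of hyperedges that have pins on both sides of the partition."""
    return sum(cost[e] for e, pins in hyperedges.items()
               if len({side[v] for v in pins}) == 2)

def area_ratio(area, side):
    """area(X) / (area(X) + area(Y))."""
    area_x = sum(a for v, a in area.items() if side[v] == "X")
    return area_x / sum(area.values())

# Hypothetical input: four vertices with areas, three hyperedges with costs.
area = {"A": 2, "B": 1, "C": 1, "D": 2}
hyperedges = {"x": {"A", "B"}, "y": {"B", "C", "D"}, "z": {"A", "D"}}
cost = {"x": 1, "y": 1, "z": 2}
side = {"A": "X", "B": "X", "C": "Y", "D": "Y"}

total_cut = cut_cost(hyperedges, cost, side)
ratio = area_ratio(area, side)
print(total_cut, ratio)
```

Here hyperedge y counts once even though it has three pins, which is the point of using hyperedges rather than two-terminal edges.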
[Figure: gain list structure, with labels -pmax and cells 1, 2, ..., n]
FM Partitioning:
Moves are made based on object gain.
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the larger of the two sides is selected and moved
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure sequence: step-by-step F-M moves on an example network, with the gain of each object updated after every move]
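The move rule listed for F-M (assign each object a gain, then move the highest-gain object from the larger side) can be sketched in Python for a single move. This sketch assumes unit net costs and unit cell sizes, so "larger side" means the side with more cells; the four-cell netlist is hypothetical, and the sorted gain lists are replaced by a plain `max` for brevity.

```python
def cut_size(nets, side):
    """Number of nets with pins on both sides."""
    return sum(1 for pins in nets.values() if len({side[p] for p in pins}) > 1)

def fm_gain(cell, nets, side):
    """Reduction in cut nets if `cell` alone moves to the other side."""
    gain = 0
    for pins in nets.values():
        if cell not in pins:
            continue
        same = sum(1 for p in pins if side[p] == side[cell])  # includes the cell itself
        other = len(pins) - same
        if same == 1 and other > 0:
            gain += 1   # cell is alone on its side: moving it uncuts the net
        elif other == 0 and same > 1:
            gain -= 1   # net lies wholly on this side: moving the cell cuts it
    return gain

def pick_base_cell(nets, side, locked=()):
    """Highest-gain unlocked cell on the larger side (ties broken by name)."""
    sizes = {}
    for s in side.values():
        sizes[s] = sizes.get(s, 0) + 1
    larger = max(sorted(sizes), key=sizes.get)
    candidates = sorted(c for c in side if side[c] == larger and c not in locked)
    return max(candidates, key=lambda c: fm_gain(c, nets, side))

# Hypothetical netlist with three cells on side 0 and one on side 1.
nets = {"n1": {"A", "B"}, "n2": {"B", "C", "D"}, "n3": {"A", "D"}, "n4": {"C", "D"}}
side = {"A": 0, "B": 0, "C": 0, "D": 1}

base = pick_base_cell(nets, side)
gain = fm_gain(base, nets, side)
before = cut_size(nets, side)
side[base] = 1 - side[base]          # move the base cell; it would now be locked
after = cut_size(nets, side)
print(base, gain, before, after)
```

A full pass would lock the moved cell, recompute the gains of the cells sharing a net with it, and repeat; keeping cells in lists sorted by gain is what lets the real algorithm find the next base cell quickly.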
Ratio-Cut Algorithm
• Removes the restriction of constant partition sizes.
• The cut weight W for a cut that divides a network into two partitions, A
and B, is given by
$W = \sum_{a \in A,\, b \in B} c_{ab}$
123
Ratio-Cut Algorithm (contd.,)
A network is partitioned into small, highly connected groups
using ratio cuts.
Algorithm
• The gain for the initial move is called as the first-level gain.
• Gains from subsequent moves are then second-level and higher gains.
• Define a gain vector that contains these gains.
• The nodes to be swapped are chosen using the gain vector.
• This reduces both the mean and variation in the number of cuts in the
resulting partitions.
125
Look-ahead Algorithm
126
127
Look-ahead Algorithm (contd.,)
• An example of network partitioning that shows the need to look ahead when
selecting logic cells to be moved between partitions.
• Partition I: partitionings (a), (b), and (c) show one sequence of moves.
• Partition II: the partitioning shown in (d) is the same as (a). We can move node 5 to B
with a gain of 1 as shown in (e), but now we can move node 4 to B with a gain.
Partitioning:
Simulated Annealing
129
State Space Search Problem
• Combinatorial optimization problems (like partitioning) can be thought
as a State Space Search Problem.
• A State is just a configuration of the combinatorial objects involved.
• The State Space is the set of all possible states (configurations).
• A Neighbourhood Structure is also defined (which states can one go in
one step).
• There is a cost corresponding to each state.
• Search for the min (or max) cost state.
130
Greedy Algorithm
• A very simple technique for State Space Search
Problem.
• Start from any state.
• Always move to a neighbor with the min cost
(assume minimization problem).
• Stop when all neighbors have a higher cost than
the current state.
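The greedy rule above can be sketched in Python on a toy state space. The one-dimensional `TERRAIN` list is hypothetical: each index is a state, its value is the cost, and the neighbours of a state are the adjacent indices. The sketch stops as soon as no neighbour is strictly cheaper, which also ends the search on equal-cost plateaus.

```python
def greedy_descent(start, neighbors, cost):
    """Repeatedly move to the cheapest neighbour until none is cheaper."""
    state = start
    while True:
        best = min(neighbors(state), key=cost, default=state)
        if cost(best) >= cost(state):
            return state            # local minimum: no neighbour improves the cost
        state = best

# Hypothetical terrain: index = state, value = cost of that state.
TERRAIN = [5, 3, 4, 6, 2, 1, 7]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(TERRAIN)]

def cost(i):
    return TERRAIN[i]

stuck_at = greedy_descent(0, neighbors, cost)
lucky_at = greedy_descent(3, neighbors, cost)
print(stuck_at, cost(stuck_at), lucky_at, cost(lucky_at))
```

Starting from state 0 the search stops at state 1 (cost 3) even though state 5 has cost 1; whether greedy search finds the global minimum depends entirely on where it starts.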
131
Simulated Annealing
• Very general search technique.
• Try to avoid being trapped in local minimum by
making probabilistic moves.
• Popularized as a heuristic for optimization by:
– Kirkpatrick, Gelatt and Vecchi, "Optimization by
Simulated Annealing", Science, 220(4598):671-680,
May 1983.
132
•Jigsaw puzzles – Intuitive usage of Simulated
Annealing
• Given a jigsaw puzzle such
that one has to obtain the
final shape using all pieces
together.
• Starting with a random
configuration, the human
brain unconditionally
chooses certain moves that
tend to the solution.
• However, certain moves that
may or may not lead to the
solution are accepted or
rejected with a certain small
probability.
• The final shape is obtained
as a result of a large number
of iterations.
133
Basic Idea of Simulated
Annealing
• Inspired by the Annealing Process:
– The process of carefully cooling molten metals in order
to obtain a good crystal structure.
– First, metal is heated to a very high temperature.
– Then slowly cooled.
– By cooling at a proper rate, atoms will have an
increased chance to regain proper crystal structure.
• Attaining a min cost state in simulated annealing
is analogous to attaining a good crystal structure in
annealing.
134
The Simulated Annealing Procedure
Let t be the initial temperature.
Repeat
Repeat
– Pick a neighbor of the current state randomly.
– Let c = cost of current state.
Let c’ = cost of the neighbour picked.
– If c’ < c, then move to the neighbour (downhill move).
– If c' > c, then move to the neighbour with probability $e^{-(c'-c)/t}$ (uphill move).
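The acceptance rule in this procedure can be sketched in Python. The slide does not show how the two Repeat loops end, so the stopping conditions here are assumptions: a fixed number of moves per temperature (`moves_per_temp`), a geometric temperature reduction by `alpha`, and a stop at `t_final`. Accepting equal-cost moves and remembering the best state seen are also choices made for this sketch, and the terrain is hypothetical.

```python
import math
import random

def simulated_annealing(start, neighbors, cost, t0=10.0, alpha=0.9,
                        t_final=0.01, moves_per_temp=20, seed=0):
    """Return the lowest-cost state seen during an annealing run."""
    rng = random.Random(seed)
    state = best = start
    t = t0
    while t > t_final:                       # outer loop: one pass per temperature
        for _ in range(moves_per_temp):      # inner loop: moves at fixed temperature
            candidate = rng.choice(neighbors(state))
            delta = cost(candidate) - cost(state)
            # Downhill (or equal-cost) moves are always taken; uphill moves
            # are taken with probability e^(-(c' - c) / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                state = candidate
                if cost(state) < cost(best):
                    best = state
        t *= alpha                           # assumed geometric cooling
    return best

# Hypothetical terrain: index = state, value = cost of that state.
TERRAIN = [5, 3, 4, 6, 2, 1, 7]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(TERRAIN)]

def cost(i):
    return TERRAIN[i]

best = simulated_annealing(0, neighbors, cost)
print(best, cost(best))
```

At high temperature almost every uphill move is accepted, so the search can leave a local minimum that would trap a purely downhill search; as t falls the same move becomes increasingly unlikely.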
135
• Convergence of simulated annealing
[Figure: convergence plot; x-axis: number of iterations; annotations: hill climbing, hill climbing at FINAL_TEMP]
136
• Ball on terrain example – SA vs greedy algorithms
[Figure: a ball on a hilly terrain, starting from its initial position]
Greedy algorithm: gets stuck at a locally optimum solution.
Simulated annealing: explores more; it chooses an uphill move with a
small probability (hill climbing).
138
Simulated Annealing
• A parameter α relates the temperatures, T_i and T_{i+1}, at the i-th and (i+1)-th
iterations:
$T_{i+1} = \alpha T_i$
• Finally, as the temperature approaches zero, we refuse to make any moves that
increase the energy of the system, and the system falls and comes to rest at the
nearest local minimum.
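The effect of the cooling rule can be seen by listing the temperatures and the acceptance probability of a fixed uphill move at each one. In this Python sketch the starting temperature, α = 0.5, the final temperature, and the cost increase of 1 are hypothetical values chosen so the numbers are easy to follow.

```python
import math

def cooling_schedule(t0, alpha, t_final):
    """Temperatures T_0, T_1, ... with T_{i+1} = alpha * T_i, down to t_final."""
    temps = []
    t = t0
    while t >= t_final:
        temps.append(t)
        t *= alpha
    return temps

temps = cooling_schedule(t0=100.0, alpha=0.5, t_final=1.0)

# Probability of accepting a move that raises the cost by 1 at each temperature.
uphill_prob = [math.exp(-1.0 / t) for t in temps]

for t, p in zip(temps, uphill_prob):
    print(f"T = {t:8.4f}   P(accept uphill) = {p:.3f}")
```

A value of α closer to 1 gives a longer schedule with more temperature steps, which costs more computer time but cools the system more gradually.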
140
Other Partitioning Objectives
Constraints or objectives, with their purpose and how they are implemented:
• Timing constraints
– Purpose: certain logic cells in a system may need to be located on
the same ASIC in order to avoid adding the delay of any external
interconnections.
– Implemented: adding weights to nets to make them more important than others.
• Power constraints
– Purpose: some logic cells may consume more power than others.
– Implemented: to assign more than rough estimates of power consumption for each logic
cell at the system planning stage, before any simulation has been completed.
• Technology constraints
– Purpose: to include memory on an ASIC.
– Implemented: keep logic cells that require similar technology together.
• Cost constraints
– Purpose: to use a low-cost package.
– Implemented: keep ASICs below a certain size.
142
•Asynchronous transfer mode (ATM) cell format
•FIGURE 15.4 The asynchronous transfer mode (ATM) cell format. The ATM protocol uses
53-byte cells or packets of information with a data payload and header information for
routing and error control.
143
•An asynchronous transfer mode (ATM) connection
simulator
144
FPGA Partitioning
• The simulator is partitioned into three major blocks:
• ATM traffic policer: regulates the input to the simulator.
• ATM cell delay generator: delays ATM cells, reorders
ATM cells, and inserts ATM cells with valid ATM cell headers.
• ATM cell error generator: produces bit errors and four
random variables that are needed by the other two blocks.
• The traffic policer performs the following operations:
• Performs header screening and remapping.
• Checks ATM cell conformance.
• Deletes selected ATM cells.
• The delay generator delays, misinserts, and reorders the target ATM cells.
• The error generator performs the following operations:
• Payload bit error ratio generation: The user specifies the Bernoulli
probability PBER of the payload bit error ratio.
• Random variable generation for ATM cell loss, misinsertion,
reordering and deletion.
145
Find the connectivity matrix for the ATM Connection
Simulator shown in Figure. Use the following scheme to
number the blocks and ordering of the matrix rows and
columns: 1 = Personal Computer, 2 = Intel 80186, 3 =
UTOPIA receiver, 4 = UTOPIA transmitter, 5 = Header
remapper and screener, 6 = Remapper SRAM, . . . 15 =
Random-number and bit error rate generator, 16 =
Random-variable generator. All buses are labeled with
their width except for two single connections (the arrows).
146
Automatic Partitioning with FPGAs
147
Power Dissipation
148
Switching current
• From charging and discharging of parasitic capacitance
• When the p -channel transistor in an inverter is charging a capacitance, C ,
at a frequency, f ,
– the current through the transistor is I=C (d V /d t ).
– The power dissipation is P=VI=CV (d V /d t ) for one-half the period of the input, t =
1/(2 f ).
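– The energy dissipated in the p-channel transistor during this half-period is thus
$$\int_{0}^{1/(2f)} P\,dt = \int_{0}^{V_{DD}} C V\,dV = \frac{1}{2} C V_{DD}^{2}$$
• When the n-channel transistor discharges the capacitor, the dissipation is equal (i.e., $\frac{1}{2} C V_{DD}^{2}$).
• Since both events occur f times per second, the total power dissipation is
$$P_1 = f C V_{DD}^{2}$$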
• Most of the power dissipation in a CMOS ASIC arises from this source—the
switching current.
• The best way to reduce power is to reduce V DD and to reduce C , the amount
of capacitance we have to switch.
149
Short circuit current
• Both n-channel and p-channel transistors are momentarily on at the same time.
• For a CMOS inverter, the power dissipation due to the crowbar current is
$$P_2 = \frac{\beta f t_{rf}}{12}\,(V_{DD} - 2V_{tn})^{3}$$
with the transistor gain factor
$$\beta = \frac{I_{DS}}{\left(V_{GS} - V_{tn} - \frac{1}{2}V_{DS}\right) V_{DS}}$$
– Where $\beta = (W/L)\,\mu C_{ox}$ is the same for both p- and n-channel transistors.
– The threshold voltages Vtn are assumed equal for both transistor types.
– trf is the rise and fall time (assumed equal) of the input signal [ Veendrick,
1984].
150
Problem on Power Dissipation
• Inference:
(β = 0.01 A V⁻¹; short-circuit dissipation P2 = 0.00133 W, about 1 mW; switching dissipation P1 = 0.01089 W, about 10 mW)
– Short-circuit dissipation is typically a small fraction of the switching dissipation (about 12 percent here, since 0.00133/0.01089 ≈ 0.12).
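The numbers in the inference can be reproduced with a few lines of arithmetic. The problem statement itself is not in the slide text, so the inputs below are assumptions chosen because they give the stated results: an output buffer sinking 12 mA at 0.5 V, VDD = 3.3 V, Vtn = 0.65 V, f = 100 MHz, a 2 ns input rise/fall time and a 10 pF load.

```python
def gain_factor(i_ds, v_gs, v_tn, v_ds):
    """beta = I_DS / [(V_GS - V_tn - V_DS/2) * V_DS]."""
    return i_ds / ((v_gs - v_tn - 0.5 * v_ds) * v_ds)

def switching_power(f, c, v_dd):
    """P1 = f * C * VDD^2."""
    return f * c * v_dd ** 2

def short_circuit_power(f, beta, t_rf, v_dd, v_tn):
    """P2 = beta * f * t_rf * (VDD - 2*Vtn)^3 / 12."""
    return f * beta * t_rf * (v_dd - 2.0 * v_tn) ** 3 / 12.0

# Assumed operating point (not given on the slide).
V_DD, V_TN = 3.3, 0.65
beta = gain_factor(i_ds=12e-3, v_gs=V_DD, v_tn=V_TN, v_ds=0.5)
p1 = switching_power(f=100e6, c=10e-12, v_dd=V_DD)
p2 = short_circuit_power(f=100e6, beta=beta, t_rf=2e-9, v_dd=V_DD, v_tn=V_TN)
ratio = p2 / p1  # short-circuit dissipation relative to switching dissipation
```

Because P1 scales with the square of VDD and P2 with the cube of (VDD − 2Vtn), lowering the supply voltage reduces both terms.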
151
Subthreshold current
• CMOS transistor is never completely off
• When the gate-to-source voltage, VGS , of an MOS transistor is less than the
threshold voltage, Vt , the transistor conducts a very small subthreshold current in the
subthreshold region
$$I_{DS} = I_0 \exp\left(\frac{q V_{GS}}{n k T} - 1\right)$$
– where I 0 is a constant, and the constant, n, is normally between 1 and 2.
152
Leakage Current
• Transistor leakage is caused by the fact that a reverse-biased diode conducts a very small leakage current.
– The leakage current depends strongly on temperature.
• The parasitic diodes have two components in parallel: an area diode and a perimeter diode.
• The leakage current due to the perimeter diode is larger than that due to the area diode.
• The ideal parasitic diode currents are given by the following equation:
$$I = I_s\left[\exp\left(\frac{q V_D}{n k T}\right) - 1\right]$$
153
Module III
Floorplanning
155
ASIC Design Process
156
Introduction
• The input to the floorplanning step - output of system partitioning and
design entry—a netlist.
• Netlist - describing circuit blocks, the logic cells within the blocks, and
their connections.
157
•The starting point of the floorplanning and placement steps for the Viterbi decoder
•-collection of standard cells with no room set aside yet for routing.
158
The starting point of the floorplanning and
placement steps for the Viterbi decoder
• Small boxes that look like bricks - outlines of the standard cells.
• Large box surrounding all the logic cells - estimated chip size.
159
The viterbi decoder after floorplanning and placement
160
The viterbi decoder after floorplanning
and placement
Objectives of Floorplanning –
To minimize the chip area
To minimize delay.
162
Measuring area is straightforward, but measuring delay is more difficult
Measurement of Delay in Floor planning
164
Measurement of Delay in Floor planning
(contd.,)
• A floorplanning tool can use predicted-capacitance tables (also
known as interconnect-load tables or wire-load tables ).
• Typically between 60 and 70 percent of nets have a FO = 1.
165
Measurement of Delay in Floor planning
(contd.,)
• We often see a twin-peaked distribution at the chip level also,
corresponding to separate distributions for intrablock routing (inside
blocks) and interblock routing (between blocks).
• The distributions for FO > 1 are more symmetrical and flatter than for
FO = 1.
• The wire-load tables can only contain one number, for example the
average net capacitance, for any one distribution.
• Many tools take a worst-case approach and use the 80- or 90-percentile
point instead of the average. Thus a tool may use a predicted
capacitance for which we know 90 percent of the nets will have less
than the estimated capacitance.
166
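The difference between an average-based and a percentile-based wire-load entry can be sketched as below. The capacitance values are invented sample data for a single fanout class, and the nearest-rank percentile is one common definition, assumed here for simplicity.

```python
import statistics

def wire_load_entry(capacitances_pf, percentile=90):
    """Return (mean, percentile value) for a list of net capacitances in pF."""
    ordered = sorted(capacitances_pf)
    # Nearest-rank percentile: the smallest sample with at least
    # `percentile` percent of the samples at or below it.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return statistics.mean(ordered), ordered[rank - 1]

# Hypothetical FO = 1 net capacitances (pF), skewed toward small values.
fo1_caps = [0.01, 0.01, 0.02, 0.02, 0.02, 0.03, 0.03, 0.04, 0.06, 0.12]
mean_cap, p90_cap = wire_load_entry(fo1_caps, percentile=90)
```

With a skewed distribution like this one, the 90-percentile entry is noticeably larger than the mean, which is why it gives a more pessimistic (safer) delay estimate.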
Measurement of Delay in Floor planning
(contd.,)
• Repeat the statistical analysis for blocks with different sizes.
167
168
Floorplanning - Optimization
Optimize Performance
• Chip area.
• Total wire length.
• Critical path delay.
• Routability.
• Others, e.g. noise, heat dissipation.
Cost = αA + βL,
Where
A = total area,
L = total wire length,
α and β constants.
169
Floorplanning
Area
•Deadspace
• Fixed blocks:
– The dimensions and connector locations of the other fixed blocks (perhaps RAM, ROM, compiled
cells, or megacells) can only be modified when they are created.
• Seeding:
– Force logic cells to be in selected flexible blocks by seeding . We choose seed cells by name.
– Seeding may be hard or soft.
• Hard seed - fixed and not allowed to move during the remaining floor
planning and placement steps.
• Soft seed - an initial suggestion only and can be altered if necessary by the
floor planner.
•No bounds on block shape: Block 1 to Block 4 may take extreme aspect ratios (NOT GOOD!!)
•With bounds: lower bound ≤ height/width ≤ upper bound
•Soft Blocks
• Flexible shape
• I/O positions not yet determined
•Hard Blocks
• Fixed shape
• Fixed I/O pin positions
172
Sizing example*
173
Floorplanning Tools
174
•Aspect ratio and Congestion
Analysis
•Defining the channel routing order for a slicing floorplan using a slicing tree.
•(a) Make a cut all the way across the chip between circuit blocks. Continue slicing until each
piece contains just one circuit block. Each cut divides a piece into two without cutting
through a circuit block.
•(b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip until only circuit blocks
are left.
•(c) The slicing tree corresponding to the sequence of cuts gives the order in which to route
the channels: 4, 3, 2, and finally 1.
177
Slicing Floorplan and General Floorplan
•Figure: a slicing floorplan of blocks 1 to 7 and its slicing tree, whose internal nodes are vertical (v) and horizontal (h) cuts, shown beside a non-slicing floorplan.
178
Area Utilization
• Area utilization
– Depends on how nicely the rigid modules’ shapes are
matched
– Soft modules can take different shapes to “fill in”
empty slots
– Floorplan sizing
•Figure: floorplans of modules m1 to m7 before and after sizing.
•Combining two pieces of size ai × bi and xj × yj:
– Left/right (L, R): width ai + xj, height max(bi, yj)
– Top/bottom (T, B): width max(ai, xj), height bi + yj
180
Slicing Floorplan Sizing
• Simple case: all modules are hard macros
– No rotation allowed, one shape only
•Figure: slicing floorplan and slicing tree for modules 1 to 7. Sizes shown in the figure: 8x8, 7x5, 3x6, 4x5, 4x11 (node 34) and 17x16 (root node 1234567).
•Slicing Floorplan Sizing
General case: all modules are soft macros
Stockmeyer’s work (1983) for optimal module orientation
Non-slicing = NP complete
Slicing = polynomial time solvable with dynamic programming
Phase 1: bottom-up
Input: floorplan tree, modules shapes
Start with sorted shapes lists of modules
Perform Vertical_Node_Sizing & Horizontal_Node_Sizing
When we reach the root node, we have a list of shapes. Select the one
that is best in terms of area
Phase 2: top-down
Traverse the floorplan tree and set module locations
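The bottom-up phase can be sketched as follows. This is a simplified illustration rather than Stockmeyer's algorithm itself: it pairs every shape of the left child with every shape of the right child and then prunes dominated shapes, instead of using the linear merge of sorted lists. The tree encoding, function names and module shapes are assumptions, and the top-down phase is not shown.

```python
def prune(shapes):
    """Drop shapes dominated by another shape (one that is no wider and no taller)."""
    best = []
    for w, h in sorted(set(shapes)):       # increasing width, then height
        if not best or h < best[-1][1]:    # keep only strictly shorter shapes
            best.append((w, h))
    return best

def combine(left, right, cut):
    """Combine two shape lists across a vertical ('v') or horizontal ('h') cut."""
    out = []
    for a, b in left:
        for x, y in right:
            if cut == "v":                 # side by side
                out.append((a + x, max(b, y)))
            else:                          # stacked
                out.append((max(a, x), b + y))
    return prune(out)

def size(node):
    """node is a list of (width, height) shapes (leaf) or a tuple (cut, left, right)."""
    if isinstance(node, list):
        return prune(node)
    cut, left, right = node
    return combine(size(left), size(right), cut)

# Hypothetical modules: A is soft (three shapes), B is hard (one shape).
A = [(4, 6), (5, 5), (6, 4)]
B = [(4, 2)]
root_shapes = size(("v", A, B))
best_shape = min(root_shapes, key=lambda s: s[0] * s[1])
```

Pruning keeps each list a staircase of non-dominated shapes, which is what keeps the lists short as they propagate toward the root.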
182
Sizing Example
•Figure: module A (shapes a1, a2, a3) combined with module B (shape b3). Sizes shown in the figure: 4x2, 8x6, 9x5, 10x4.
183
Cyclic Constraints
•Cyclic constraints.
•(a) A nonslicing floorplan with a cyclic constraint that prevents channel routing.
(b) In this case it is difficult to find a slicing floorplan without increasing the chip
area.
•(c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic
constraints, but it is inefficient in area use and will be very difficult to route.
184
Cyclic Constraints
•(a) We can eliminate the cyclic constraint by merging the blocks A and C.
•(b) A slicing structure.
185
I/O and Power Planning (contd.,)
• Every chip communicates with the outside world.
• Signals flow onto and off the chip and we need to supply
power.
•FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of
pads determines the die size. (b) A core-limited die: The core logic determines the die
size. (c) Using both pad-limited pads and core-limited pads for a square die.
188
I/O and Power Planning (contd.,)
• Special power pads are used for:
1. positive supply, or VDD, power buses (or power rails) and
2. ground or negative supply, VSS or GND.
– one set of VDD/VSS pads supplies power to the I/O pads only.
– Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.
• I/O pads also contain special circuits to protect against electrostatic discharge
( ESD ).
– These circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated
during human or machine handling.
189
I/O and Power Planning (contd.,)
• If we make an electrical connection between the substrate and a chip pad, or to a
package pin, it must be to VDD ( n -type substrate) or VSS ( p -type substrate). This
substrate connection (for the whole chip) employs a down bond (or drop bond) to the
carrier. We have several options:
We can dedicate one (or more) chip pad(s) to down bond to the chip carrier.
We can make a connection from a chip pad to the lead frame and down bond
from the chip pad to the chip carrier.
We can make a connection from a chip pad to the lead frame and down bond
from the lead frame.
We can down bond from the lead frame without using a chip pad.
• Depending on the package design, the type and positioning of down bonds may be fixed.
This means we need to fix the position of the chip pad for down bonding using a pad
seed
190
I/O and Power Planning (contd.,)
• A double bond connects two pads to one chip-carrier finger and one package
pin. We can do this to save package pins or reduce the series inductance of
bond wires (typically a few nanohenries) by parallel connection of the pads.
– The output pads can easily consume most of the power on a CMOS ASIC, because the load on
a pad (usually tens of picofarads) is much larger than typical on-chip capacitive loads.
– Depending on the technology it may be necessary to provide dedicated VDD and VSS pads for
every few simultaneously switching outputs (SSOs). Design rules set how many SSOs can be used
per VDD/VSS pad pair. These dedicated VDD/VSS pads must “follow” groups of output pads as they
are seeded or planned on the floorplan.
191
I/O and Power Planning (contd.,)
• Using a pad mapping, we translate the logical pad in a netlist to a physical
pad from a pad library. We might control pad seeding and mapping in the
floorplanner.
• In single-supply chips we have one VDD net and one VSS net, both
global power nets . It is also possible to use mixed power supplies
(for example, 3.3 V and 5 V) or multiple power supplies ( digital VDD,
analog VDD).
192
I/O and Power Planning (contd.,)
•FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid
corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is
turned upside down and solder bumps connect the pads to the lead frame.
193
I/O and Power Planning (contd.,)
• stagger-bond arrangement using two rows of I/O pads.
– In this case the design rules for bond wires (the spacing and the angle at which the
bond wires leave the pads) become very important.
– Even though the bonding pads are located in the center of the chip, the I/O circuits
are still often located at the edges of the chip because of difficulties in power
supply distribution and integrating I/O circuits together with logic in the center of
the die.
• Some automatic routers may require that metal lines parallel to a channel
spine use a preferred layer (either m1, m2, or m3). Alternatively we say that
a particular metal layer runs in a preferred direction .
196
I/O and Power Planning (contd.,)
•FIGURE 16.15 Power distribution. (a) Power distributed using m1 for VSS and m2 for VDD. This helps
minimize the number of vias and layer crossings needed but causes problems in the routing channels.
(b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine. This can
make automatic routing easier but may increase the number of vias and layer crossings. (c) An
expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers
along the spine of a channel, this forces signals to change layers. (d) A closeup of VDD and VSS buses as
they cross. Changing layers requires a large number of via contacts to reduce resistance.
197
Power distribution.
• (a) Power distributed using m1 for VSS and m2 for VDD.
– This helps minimize the number of vias and layer crossings needed
– but causes problems in the routing channels.
• (d) A closeup of VDD and VSS buses as they cross. Changing layers
requires a large number of via contacts to reduce resistance.
198
Clock Planning
• clock spine routing scheme with all clock pins driven directly from the clock
driver. MGAs and FPGAs often use this fish bone type of clock distribution
scheme
• clock skew and clock latency
•FIGURE 16.16 Clock distribution.
•(a) A clock spine for a gate array.
•(b) A clock spine for a cell-based ASIC
(typical chips have thousands of clock
nets).
•(c) A clock spine is usually driven from
one or more clock-driver cells. Delay in
the driver cell is a function of the
number of stages and the ratio of output
to input capacitance for each stage
(taper).
•(d) Clock latency and clock skew. We
would like to minimize both latency and
skew.
199
Clock Planning (cont.,)
• FIGURE 16.17 A clock tree. (a) Minimum delay is achieved when the taper of
successive stages is about 3. (b) Using a fanout of three at successive nodes.
(c) A clock tree for the cell-based ASIC of Figure 16.16 b. We have to balance
the clock arrival times at all of the leaf nodes to minimize clock skew.
200
•Module III
•Placement
•Dr.(Mrs).D.Gracia Nirmala Rani
•Assistant Professor
•ECE Department
•Thiagarajar College of Engineering
•Madurai-15
•201
Content
Placement Definitions
Placement Goals and Objectives
Measurement of placement Goals and Objectives
Placement Algorithms
Simple placement Example
Physical Design Flow
•202
Placement
The process of arranging circuit components on a layout
surface under certain constraints.
Inputs : Set of fixed modules, netlist
Output : Best position for each module based on various
cost functions
Cost functions include wirelength, wire routability,
hotspots, performance, I/O pads.
Placement is much more suited to automation than
floorplanning.
After we complete floorplanning and placement, we can
predict both intrablock and interblock capacitances
•203
Good placement vs Bad placement*
•FIGURE 16.18 INTERCONNECT STRUCTURE. (a) The two-level metal CBIC floorplan shown in Figure 16.11
b. (b) A channel from the flexible block A. This channel has a channel height equal to the maximum channel density of
7 (there is room for seven interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell) routing
in m2.
•205
•FIGURE 16.19 GATE-ARRAY INTERCONNECT. (a) A small two-level metal gate array (about 4.6 k-
gate). (b) Routing in a block. (c) Channel routing showing channel density and channel capacity. The channel
height on a gate array may only be increased in increments of a row. If the interconnect does not use up all of
the channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in the horizontal
direction with m2 in the vertical direction.
•206
Vertical interconnect uses feedthroughs to cross the logic cells. Here are some
commonly used terms with explanations (there are no generally accepted
definitions):
An unused vertical track (or just track ) in a logic cell is called an uncommitted
feedthrough (also built-in feedthrough , implicit feedthrough , or jumper ).
A vertical strip of metal that runs from the top to bottom of a cell (for double-entry
cells ), but has no connections inside the cell, is also called a feedthrough or
jumper.
Two connectors for the same physical net are electrically equivalent connectors
(or equipotential connectors). For double-entry cells these are usually at the top
and bottom of the logic cell.
A dedicated feedthrough cell (or crosser cell ) is an empty cell (with no logic) that
can hold one or more vertical interconnects. These are used if there are no other
feedthroughs available.
A spacer cell (usually the same as a feedthrough cell) is used to fill space in rows
so that the ends of all rows in a flexible block may be aligned to connect to power
buses, for example.
•207
There are also LOGICALLY EQUIVALENT CONNECTORS (or FUNCTIONALLY
EQUIVALENT CONNECTORS, sometimes also called just EQUIVALENT
CONNECTORS—which is very confusing).
Example: The two inputs of a two-input NAND gate may be logically equivalent
connectors. The placement tool can swap these without altering the logic (but the two
inputs may have different delay properties, so it is not always a good idea to swap
them).
•208
Interconnect Area for CBIC,MGA and FPGA
HORIZONTAL INTERCONNECT
In the case of channeled gate arrays and FPGAs, the horizontal interconnect
areas—the channels, usually on m1—have a fixed capacity.
VERTICAL INTERCONNECT
In the vertical interconnect direction, usually m2, FPGAs still have fixed
resources.
•209
Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip.
Ideally, the objectives of the placement step are to
Guarantee the router can complete the routing step
Minimize all the critical net delays
Make the chip as dense as possible
We may also have the following additional objectives:
Minimize power dissipation
Minimize cross talk between signals
Current placement tools use more specific and achievable criteria. The most
commonly used placement objectives are one or more of the following:
Minimize the total estimated interconnect length
Meet the timing requirements for critical nets
Minimize the interconnect congestion
•210
Measurement of Placement Goals and Objectives
The graph structures that correspond to making all the connections for a net
are known as trees on graphs (or just trees ).
The Manhattan distance (or rectangular distance) between two points is the
distance we would have to walk in New York.
•211
•FIGURE 16.20 Placement using trees on graphs. (a) The floorplan from Figure 16.11 b. (b) An expanded view
of the flexible block A showing four rows of standard cells for placement (typical blocks may contain
thousands or tens of thousands of logic cells). We want to find the length of the net shown with four terminals,
W through Z, given the placement of four logic cells (labeled: A.211, A.19, A.43, A.25). (c) The problem for
net (W, X, Y, Z) drawn as a graph. The shortest connection is the minimum Steiner tree. (d) The minimum
rectilinear Steiner tree using Manhattan routing. The rectangular (Manhattan) interconnect-length measures are
shown for each tree.
•212
Measurement of Placement (contd.,)
The minimum rectilinear Steiner tree (MRST) is the shortest interconnect using
a rectangular grid. The determination of the MRST is in general an NP-complete
problem, which means it is hard to solve.
The complete graph has connections from each terminal to every other terminal.
The complete-graph measure adds all the interconnect lengths of the complete-graph
connection together and then divides by n/2, where n is the number of terminals.
A complete graph on n terminals has n(n − 1)/2 connections, but only n − 1 are needed
to join the terminals, hence the division by n/2.
The bounding box is the smallest rectangle that encloses all the terminals. The
half-perimeter measure is one-half the perimeter of the bounding box.
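The Manhattan distance, the complete-graph measure and the half-perimeter (bounding-box) measure can be computed as in the sketch below. The terminal coordinates are hypothetical and do not correspond to the net in Figure 16.20; the half-perimeter measure is taken as one-half of the bounding-box perimeter.

```python
from itertools import combinations

def manhattan(p, q):
    """Rectangular distance between two points on a grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def half_perimeter(terminals):
    """Half the perimeter of the smallest rectangle enclosing all terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def complete_graph_measure(terminals):
    """Sum of all pairwise Manhattan lengths, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2)

# Hypothetical four-terminal net.
net = [(0, 0), (4, 0), (4, 3), (1, 3)]
pairs = len(list(combinations(net, 2)))   # n(n - 1)/2 connections
hp = half_perimeter(net)
cg = complete_graph_measure(net)
```

Both measures are cheap estimates; neither is the MRST length, which is why a correction such as the meander factor is applied afterwards.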
•213
FIGURE 16.21 Interconnect-length measures. (a) Complete-
graph measure. (b) Half-perimeter measure.
•214
Correlation between total length of chip interconnect and the half-
perimeter and complete-graph measures.
The meander factor specifies, on average, the ratio of the interconnect created by the
routing tool to the interconnect-length estimate used by the placement tool.
Another problem is that we have concentrated on finding estimates to the MRST, but the
MRST that minimizes total net length may not minimize net delay.
•215
Interconnect congestion
There is no point in minimizing the interconnect length if we create a placement that is
too congested to route.
Maximum cut line: Imagine a horizontal or vertical line drawn anywhere across a chip or
block. The number of interconnects that must cross this line is the cut size (the number of
interconnects we cut). The maximum cut line has the highest cut size.
•216
•FIGURE 16.23 Interconnect congestion for the cell-based
ASIC from Figure 16.11 (b). (a) Measurement of
congestion. (b) An expanded view of flexible block A
shows a maximum cut line.
•217
Interconnect Delay
The problem with minimizing total interconnect length is that a logic cell may be placed a long way from another
logic cell to which it has just one connection. This logic cell with one connection is less important
as far as the total wire length is concerned than other logic cells, to which there are many connections.
However, the one long connection may be critical as far as timing delay is concerned.
As technology is scaled, interconnection delays become larger relative to circuit delays and
this problem gets worse.
•218
Interconnect Delay
In timing-driven placement we must estimate delay for every net for every trial
placement, possibly for hundreds of thousands of gates.
Unfortunately, the minimum-length Steiner tree does not necessarily correspond to the
interconnect path that minimizes delay. To construct a minimum-delay path we may have to
route with non-Steiner trees.
In the placement phase typically we take a simple interconnect length approximation to this
minimum-delay path (typically the half-perimeter measure).
Even when we can estimate the length of the interconnect, we do not yet have information on
which layers and how many vias the interconnect will use or how wide it will be. Some tools
allow us to include estimates for these parameters.
Often we can specify metal usage , the percentage of routing on the different layers to expect
from the router. This allows the placement tool to estimate RC values and delays—and thus
minimize delay.
•219
Placement Algorithms
•220
•FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid. (b) Merge all connections to
the center of each bin. (c) Make a cut and swap logic cells between bins to minimize the cost of the cut.
(d) Take the cut pieces and throw out all the edges that are not inside the piece. (e) Repeat the process with a
new cut and continue until we reach the individual bins.
•221
Eigen Value Placement Algorithm
The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix (eigenvalue
methods are also known as spectral methods) [Hall, 1970]. The measure we use is a cost
function f that we shall minimize, given by
$$f = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\, d_{ij}^{2} \qquad (1)$$
where C = [ c ij ] is the (possibly weighted) connectivity matrix, and d ij is the Euclidean distance
between the centers of logic cell i and logic cell j . Since we are going to minimize a cost function that is
the square of the distance between logic cells, these methods are also known as quadratic placement
methods. This type of cost function leads to a simple mathematical solution. We can rewrite the cost
function f in matrix form:
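$$f = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\left[(x_i - x_j)^{2} + (y_i - y_j)^{2}\right] = x^{T} B x + y^{T} B y$$
where B = D − C and D is a diagonal matrix with
$$d_{ii} = \sum_{j=1}^{n} c_{ij}, \qquad d_{ij} = 0 \quad (i \neq j)$$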
We can simplify the problem by noticing that it is symmetric in the x - and y -coordinates.
Let us solve the simpler problem of minimizing the cost function for the placement of logic cells
along just the x – axis first. We can then apply this solution to the more general two-dimensional
placement problem.
Before we solve this simpler problem, we introduce a constraint that the coordinates of the logic
cells must correspond to valid positions (the cells do not overlap and they are placed on-grid). We
make another simplifying assumption that all logic cells are the same size and we must place
them in fixed positions. We can define a vector p consisting of the valid positions:
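$$p = [p_1, p_2, \ldots, p_n] \qquad (4)$$
For a valid placement the x-coordinates of the logic cells must be a permutation of the fixed
positions p. Of the constraint equations that this implies, we keep just one:
$$\sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} p_i^{2} \qquad (6)$$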
•223
Simplifying the problem in this way will lead to an approximate solution to the placement
problem. We can write this single constraint on the x-coordinates in matrix form:
$$x^{T} x = P, \qquad P = \sum_{i=1}^{n} p_i^{2}$$
where P is a constant.
•224
We can now summarize the formulation of the problem, with the simplifications that we have
made, for a one-dimensional solution. We must minimize a cost function, g, where
$$g = x^{T} B x \qquad (8)$$
subject to the constraint:
$$x^{T} x = P \qquad (9)$$
This is a standard problem that we can solve using a Lagrangian multiplier:
$$L = x^{T} B x - \lambda\,(x^{T} x - P) \qquad (10)$$
To find the value of x that minimizes g we differentiate L partially with respect to x and set the
result equal to zero. We get the following equation:
$$(B - \lambda I)\,x = 0 \qquad (11)$$
This last equation is called the characteristic equation for the disconnection matrix B and occurs
frequently in matrix algebra (this λ has nothing to do with scaling). The solutions to this
equation are the eigenvectors and eigenvalues of B. Multiplying Eq. (11) by $x^{T}$ we get:
$$\lambda\, x^{T} x = x^{T} B x$$
However, since we imposed the constraint $x^{T} x = P$ and $x^{T} B x = g$, then
$$\lambda = \frac{g}{P}$$
The eigenvectors of the disconnection matrix B are the solutions to our
placement problem.
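The relationship between the connectivity matrix C, the disconnection matrix B = D − C and the one-dimensional cost g can be checked numerically. The sketch below uses plain Python and a made-up four-cell connectivity matrix; it only evaluates the cost for trial placements and does not compute eigenvectors, which would normally be done with a numerical library.

```python
def disconnection_matrix(c):
    """B = D - C, where D is diagonal with d_ii equal to the sum of row i of C."""
    n = len(c)
    return [[(sum(c[i]) if i == j else 0) - c[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(b, x):
    """g = x^T B x."""
    n = len(x)
    return sum(x[i] * b[i][j] * x[j] for i in range(n) for j in range(n))

def pairwise_cost(c, x):
    """(1/2) * sum over i, j of c_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(c[i][j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))

# Hypothetical symmetric connectivity matrix for four logic cells.
C = [[0, 1, 0, 1],
     [1, 0, 2, 0],
     [0, 2, 0, 1],
     [1, 0, 1, 0]]
B = disconnection_matrix(C)

# Two trial placements; both are permutations of the valid positions 1..4.
x_a = [1, 2, 3, 4]
x_b = [1, 3, 4, 2]
g_a = quadratic_form(B, x_a)
g_b = quadratic_form(B, x_b)
```

Since both trial placements satisfy the same constraint x^T x = P, the one with the smaller g is the better one-dimensional placement under this cost.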
•225
•226
Iterative Placement Improvement
There are several interchange or iterative exchange methods that differ in their
selection and measurement criteria:
Pair wise interchange,
force-directed interchange,
force-directed relaxation, and
force-directed pair wise relaxation.
All of these methods usually consider only pairs of logic cells to be exchanged.
A source logic cell is picked for trial exchange with a destination logic cell
•227
Iterative Placement Improvement
(contd.,)
The pair wise-interchange algorithm is similar to the interchange algorithm
used for iterative improvement in the system partitioning step:
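One pass of pairwise interchange might look like the sketch below. The details are assumptions made for illustration: each cell is taken in turn as the source, every other cell is tried as the destination, the half-perimeter estimate is used as the measurement, and only the best improving swap is kept. The cell names, slot coordinates and nets are hypothetical.

```python
def hpwl(net, pos):
    """Half-perimeter length of one net, given cell positions."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_length(nets, pos):
    return sum(hpwl(net, pos) for net in nets)

def pairwise_interchange(nets, pos):
    """One pass: each cell in turn is the source; keep its best improving swap."""
    pos = dict(pos)
    for src in sorted(pos):
        best, best_cost = None, total_length(nets, pos)
        for dst in sorted(pos):
            if dst == src:
                continue
            pos[src], pos[dst] = pos[dst], pos[src]    # trial exchange
            cost = total_length(nets, pos)
            if cost < best_cost:
                best, best_cost = dst, cost
            pos[src], pos[dst] = pos[dst], pos[src]    # undo the trial
        if best is not None:
            pos[src], pos[best] = pos[best], pos[src]  # accept the best swap
    return pos

# Hypothetical example: four cells in one row, two two-terminal nets.
start = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
nets = [["A", "C"], ["B", "D"]]
improved = pairwise_interchange(nets, start)
```

Because only improving swaps are accepted, this greedy pass can stop at a local optimum; that limitation is what motivates simulated annealing.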
•229
•FIGURE 16.26 Interchange.
• (a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
•(b) Sometimes we have to swap more than two logic cells at a time to reach an optimum
placement, but this is expensive in computation time. Limiting the search to
neighborhoods reduces the search time. Logic cells within a distance ε of a logic cell
form an ε-neighborhood.
•(c) A one-neighborhood.
•(d) A two-neighborhood.
•230
Iterative Placement Improvement
(contd.,)
Force-directed placement methods:
Imagine identical springs connecting all the logic cells we wish to place. The
number of springs is equal to the number of connections between logic cells. The
effect of the springs is to pull connected logic cells together. The more highly connected the
logic cells, the stronger the pull of the springs. The force on a logic cell i due to logic
cell j is given by Hooke’s law , which says the force of a spring is proportional to its
extension:
F ij = – c ij x ij .
The vector component x ij is directed from the center of logic cell i to the center of logic
cell j .
The vector magnitude is calculated as either the Euclidean or
Manhattan distance between the logic cell centers.
The c ij form the connectivity or cost matrix (the matrix element c ij is the
number of connections between logic cell i and logic cell j ).
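The spring model can be sketched as follows. The sign convention is an assumption: the force on a cell is taken to point toward each connected cell, with magnitude c_ij times the separation, so the zero-force position is the connection-weighted average of the neighbours' positions. The coordinates and connectivity matrix are hypothetical.

```python
def net_force(i, positions, c):
    """Total spring force on cell i, pulling it toward each connected cell j."""
    xi, yi = positions[i]
    fx = sum(c[i][j] * (xj - xi) for j, (xj, yj) in enumerate(positions) if j != i)
    fy = sum(c[i][j] * (yj - yi) for j, (xj, yj) in enumerate(positions) if j != i)
    return fx, fy

def zero_force_target(i, positions, c):
    """Position at which the net force on cell i would vanish."""
    weights = [c[i][j] if j != i else 0 for j in range(len(positions))]
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return x, y

# Hypothetical placement: cell 0 is connected to cells 1, 2 and (twice) to cell 3.
positions = [(0, 0), (2, 0), (0, 2), (4, 4)]
C = [[0, 1, 1, 2],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [2, 0, 0, 0]]
force = net_force(0, positions, C)
target = zero_force_target(0, positions, C)
moved = [target] + positions[1:]
residual = net_force(0, moved, C)
```

The zero-force target is generally off-grid and may already be occupied, which is why the interchange and relaxation variants differ in how they resolve the move.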
•231
•FIGURE 16.27 Force-directed placement.
• (a) A network with nine logic cells.
• (b) We make a grid (one logic cell per bin).
• (c) Forces are calculated as if springs were attached to the
centers of each logic cell for each connection. The two nets
connecting logic cells A and I correspond to two springs.
•(d) The forces are proportional to the spring extensions.
•232
Iterative Placement Improvement
(contd.,)
Force-directed placement algorithms:
•233
•FIGURE 16.28 Force-directed iterative placement
improvement.
•(a) Force-directed interchange.
•(b) Force-directed relaxation.
•(c) Force-directed pairwise relaxation.
•234
Placement Using Simulated Annealing
Applying simulated annealing to placement, the algorithm is as follows:
•235
Timing-Driven Placement Methods
Minimizing delay is becoming more and more important as a placement
objective.
– net based
– path based
One method finds the n most critical paths (using a timing-analysis engine,
possibly in the synthesis tool).
The net weights might then be the number of times each net appears in this list.
Another method to find the net weights uses the zero-slack algorithm.
•236
Timing-Driven Placement Methods
Figure 16.29 (a) shows a circuit with primary inputs at which we know the
arrival times (actual times) of each signal.
We also know the required times for the primary outputs: the points in
time at which we want the signals to be valid.
We can work forward from the primary inputs and backward from the
primary outputs to determine arrival and required times at each input pin
for each net.
The difference between the required and arrival times at each input pin is
the slack time (the time we have to spare).
The zero-slack algorithm adds delay to each net until the slacks are zero, as
shown in Figure 16.29 (b).
The net delays can then be converted to weights or constraints in the
placement.
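The forward and backward passes that produce the slacks can be sketched as below. This covers only the slack computation, not the step that distributes delay to the nets, and it simplifies the model by attaching one delay to each gate and measuring times at gate outputs rather than at individual input pins. The circuit, delays and times are hypothetical.

```python
def compute_slacks(edges, delay, arrival_in, required_out, order):
    """edges: node -> list of successor nodes; order: a topological order."""
    arrival = dict(arrival_in)
    for n in order:                      # forward pass from the primary inputs
        for s in edges.get(n, []):
            arrival[s] = max(arrival.get(s, float("-inf")), arrival[n] + delay[s])
    required = dict(required_out)
    for n in reversed(order):            # backward pass from the primary outputs
        for s in edges.get(n, []):
            required[n] = min(required.get(n, float("inf")), required[s] - delay[s])
    return {n: required[n] - arrival[n] for n in order}

# Hypothetical circuit: inputs a and b drive gate g1; g1 and b drive gate g2.
edges = {"a": ["g1"], "b": ["g1", "g2"], "g1": ["g2"]}
delay = {"g1": 2, "g2": 3}               # gate delays
arrival_in = {"a": 0, "b": 1}            # known arrival times at primary inputs
required_out = {"g2": 8}                 # required time at the primary output
order = ["a", "b", "g1", "g2"]
slack = compute_slacks(edges, delay, arrival_in, required_out, order)
```

A positive slack is time to spare; the zero-slack algorithm would convert it into a delay budget for the nets on that path.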
•237
•FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays.
•238
Physical design flow
•239
Module IV
Routing
This is still a hard problem that is made easier by dividing it into smaller
problems.
•241
•The starting point of the floorplanning and placement steps for the Viterbi decoder
•-collection of standard cells with no room set aside yet for routing.
•242
The starting point of the floorplanning and
placement steps for the Viterbi decoder
• Small boxes that look like bricks - outlines of the standard cells.
• Large box surrounding all the logic cells - estimated chip size.
•243
The viterbi decoder after floorplanning
•244
•FIGURE 17.1 The core of the Viterbi decoder chip after placement (a screen shot from
Cadence Cell Ensemble)
•245
•FIGURE 17.2 The core of the Viterbi decoder chip after the completion of global and detailed
routing (a screen shot from Cadence Cell Ensemble). This chip uses two-level metal. Although you
cannot see the difference, m1 runs in the horizontal direction and m2 in the vertical direction.
•246
Global Routing
• The details of global routing differ slightly between
– cell-based ASICs, gate arrays, and FPGAs, but the principles are the
same.
• A global router does not make any connections, it just plans them.
• Global route the whole chip (or large pieces if it is a large chip) before detail
routing the whole chip (or the pieces).
•247
Goals and Objectives
• Input to routing
– Floorplan that includes the locations of all the fixed and flexible blocks;
– Placement information for flexible blocks;
• Locations of all the logic cells.
•248
Measurement of Interconnect Delay
• After placement, the logic cell positions are fixed and the global router can afford to use
better estimates of the interconnect delay.
• To illustrate one method, we shall use the Elmore constant to estimate the interconnect
delay for the circuit shown in Figure 17.3.
•FIGURE 17.3 Measuring the delay of a net. (a) A simple circuit with an inverter A driving a
net with a fanout of two. Voltages V1, V2, V3, and V4 are the voltages at intermediate
points along the net. (b) The layout showing the net segments (pieces of interconnect).
(c) The RC model with each segment replaced by a capacitance and resistance. The ideal
switch and pull-down resistance Rpd model the inverter A.
•249
The problem is to find the voltages at the inputs to logic cells B and C, taking
into account the parasitic resistance and capacitance of the metal interconnect.
Figure 17.3(c) models logic cell A as an ideal switch with a pull-down
resistance equal to Rpd and models the metal interconnect using resistors and
capacitors for each segment of the interconnect.
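•The Elmore constant for node 4 (labeled V4) in the network
shown in Figure 17.3(c) is
$$
\zeta_{D4} = \sum_{k=1}^{4} R_{k4} C_k
           = R_{14} C_1 + R_{24} C_2 + R_{34} C_3 + R_{44} C_4 , \tag{17.1}
$$
where
$$
\begin{aligned}
R_{14} &= R_{pd} + R_1 \\
R_{24} &= R_{pd} + R_1 \\
R_{34} &= R_{pd} + R_1 + R_3 \\
R_{44} &= R_{pd} + R_1 + R_3 + R_4
\end{aligned} \tag{17.2}
$$
•250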
In Eq. 17.2 notice that R24 = Rpd + R1 (and not Rpd + R1 + R2) because
Rpd + R1 is the resistance to V0 (ground) shared by the paths to node 2 and node 4.
Suppose we have the following parameters (from the generic 0.5 µm CMOS
process, G5) for the layout shown in Figure 17.3(b):
• m2 resistance is 50 mΩ/square.
• m2 capacitance (for a minimum-width line) is 0.2 pF/mm.
• 4X inverter delay is 0.02 ns + 0.5 CL ns (CL is in picofarads).
• Delay is measured using 0.35/0.65 output trip points.
• m2 minimum width is 3 λ = 0.9 µm.
• 1X inverter input capacitance is 0.02 pF (a standard load).
If we model the 4X inverter as a linear pull-down resistance Rpd driving a load CL,
the output reaches 63 percent of its final value when t = CL Rpd, because
1 – exp(–1) = 0.63. Then, because the delay is measured with a 0.65 trip point, the
constant 0.5 ns/pF = 0.5 kΩ is very close to the equivalent pull-down
resistance. Thus, Rpd ≈ 500 Ω.
•251
• R1 = R2 = 6 Ω
• R3 = 56 Ω
• R4 = 112 Ω
• C1 = 0.02 pF
• C2 = 0.04 pF
• C3 = 0.2 pF
• C4 = 0.42 pF
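As a rough cross-check, the segment values listed above can be reproduced from the process parameters. The sketch below assumes segment lengths of 0.1 mm, 0.1 mm, 1 mm, and 2 mm and assumes that segments 2 and 4 each end on one standard load; neither assumption is stated on the slide, they are simply the choices that reproduce the listed numbers. The helper name `segment_rc` is hypothetical, and resistances are rounded up to a whole ohm.

```python
import math

SHEET_RES_OHM_PER_SQ = 0.050  # m2 resistance: 50 milliohm per square
CAP_PF_PER_MM = 0.2           # m2 capacitance for a minimum-width line
WIDTH_UM = 0.9                # m2 minimum width
LOAD_PF = 0.02                # one standard load (1X inverter input)


def segment_rc(length_mm, loads=0):
    """Return (resistance in ohms, capacitance in pF) of one m2 segment.

    length_mm and loads are assumed values, not taken from the slide.
    """
    squares = (length_mm * 1000.0) / WIDTH_UM           # length / width
    resistance = math.ceil(squares * SHEET_RES_OHM_PER_SQ)
    capacitance = round(length_mm * CAP_PF_PER_MM + loads * LOAD_PF, 2)
    return resistance, capacitance


# Assumed (length in mm, number of standard loads) for segments 1 to 4.
assumed_segments = {1: (0.1, 0), 2: (0.1, 1), 3: (1.0, 0), 4: (2.0, 1)}
for index, (length, loads) in assumed_segments.items():
    r, c = segment_rc(length, loads)
    print(f"R{index} = {r} ohm, C{index} = {c} pF")
```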
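Now we can calculate the path resistance, Rki, values (notice that Rki = Rik):
$$
\begin{aligned}
R_{14} &= 500\,\Omega + 6\,\Omega = 506\,\Omega \\
R_{24} &= 500\,\Omega + 6\,\Omega = 506\,\Omega \\
R_{34} &= 500\,\Omega + 6\,\Omega + 56\,\Omega = 562\,\Omega \\
R_{44} &= 500\,\Omega + 6\,\Omega + 56\,\Omega + 112\,\Omega = 674\,\Omega
\end{aligned} \tag{17.5}
$$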
•253
Finally, we can calculate Elmore's constants for node 4 and node 2 as follows:
$$
\begin{aligned}
\zeta_{D4} &= R_{14} C_1 + R_{24} C_2 + R_{34} C_3 + R_{44} C_4 \\
           &= (506)(0.02) + (506)(0.04) + (562)(0.2) + (674)(0.42) \\
           &\approx 425\ \text{ps} .
\end{aligned} \tag{17.6}
$$
$$
\begin{aligned}
\zeta_{D2} &= R_{12} C_1 + R_{22} C_2 + R_{32} C_3 + R_{42} C_4 \\
           &= (R_{pd} + R_1)(C_1 + C_3 + C_4) + (R_{pd} + R_1 + R_2) C_2 \\
           &= (500 + 6 + 6)(0.04) + (500 + 6)(0.02 + 0.2 + 0.42) \\
           &\approx 344\ \text{ps} .
\end{aligned} \tag{17.7}
$$
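The calculation above can be checked with a short script. This is a minimal sketch, assuming the RC tree of Figure 17.3(c) is described by a parent map in which each node owns the resistor that connects it to its upstream node; the names `PARENT`, `RES`, `CAP`, and `elmore_delay` are hypothetical. The driver output is included as node "drv" so that Rpd is treated like any other series resistance.

```python
# Resistances in ohms and capacitances in pF, so every R*C product is in ps.
PARENT = {"drv": None, 1: "drv", 2: 1, 3: 1, 4: 3}
RES = {"drv": 500.0, 1: 6.0, 2: 6.0, 3: 56.0, 4: 112.0}
CAP = {"drv": 0.0, 1: 0.02, 2: 0.04, 3: 0.2, 4: 0.42}


def path_to_root(node, parent):
    """Nodes visited when walking from `node` back to the driver."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def elmore_delay(target, parent, res, cap):
    """Elmore constant at `target`: sum over k of R_k,target * C_k.

    R_k,target is the resistance shared by the driver-to-k path and the
    driver-to-target path.
    """
    target_path = set(path_to_root(target, parent))
    total = 0.0
    for k in cap:
        shared = sum(res[n] for n in path_to_root(k, parent) if n in target_path)
        total += shared * cap[k]
    return total


print(f"node 4: {elmore_delay(4, PARENT, RES, CAP):.2f} ps")
print(f"node 2: {elmore_delay(2, PARENT, RES, CAP):.2f} ps")
```

With these values the script gives about 425.8 ps for node 4 and about 344.3 ps for node 2, in line with the hand calculation.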
•255
Measurement of Interconnect Delay (contd.,)
• Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
• A step-function waveform drives the net.
• The delay is measured from when the gate input changes.
• The delay is equal to the time constant of an exponential waveform
that approximates the actual output waveform.
• The interconnect is modeled by discrete resistance and capacitance
elements.
• The global router could use more sophisticated estimates that remove some
of these assumptions, but there is a limit to the accuracy with which delay
can be estimated during global routing.
• The path that minimizes the delay between two terminals on a net is not
necessarily the same as the path that minimizes the total path length of
the net.
•256
Global Routing Methods
• Many of the methods used in global routing are based on the solutions to the
tree on graph problem.
• Sequential routing:
One approach to global routing takes each net in turn and calculates
the shortest path using tree on graph algorithms, with the added
restriction of using the available channels.
Disadvantage:
• As a sequential routing algorithm proceeds, some channels will
become more congested since they hold more interconnects than
others.
• In the case of FPGAs and channeled gate arrays, the channels have a
fixed channel capacity and can only hold a certain number of
interconnects.
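The sequential approach and its congestion problem can be sketched as follows. This is a simplified illustration, not a production global router: it assumes two-terminal nets, models the chip as a graph whose nodes are channels, and uses a breadth-first search for the shortest path. The channel names, capacities, and function names are hypothetical.

```python
from collections import deque


def route_net(graph, capacity, src, dst):
    """Shortest channel sequence from src to dst using only channels with spare capacity."""
    if capacity[src] == 0 or capacity[dst] == 0:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        channel = queue.popleft()
        if channel == dst:
            path = []
            while channel is not None:
                path.append(channel)
                channel = previous[channel]
            return path[::-1]
        for neighbour in graph[channel]:
            if neighbour not in previous and capacity[neighbour] > 0:
                previous[neighbour] = channel
                queue.append(neighbour)
    return None  # no route left within the fixed channel capacities


def sequential_route(graph, capacity, nets):
    """Route nets one at a time, consuming one track in every channel a net uses."""
    capacity = dict(capacity)  # work on a copy
    routes = {}
    for name, (src, dst) in nets:
        path = route_net(graph, capacity, src, dst)
        routes[name] = path
        if path is not None:
            for channel in path:
                capacity[channel] -= 1
    return routes


# Hypothetical channel graph: a short route A-B-D and a longer route A-C-E-D.
channels = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
            "D": ["B", "E"], "E": ["C", "D"]}
tracks = {"A": 2, "B": 1, "C": 2, "D": 2, "E": 2}
print(sequential_route(channels, tracks, [("n1", ("A", "D")), ("n2", ("A", "D"))]))
```

In this example channel B has room for only one interconnect, so the first net takes the short route and the second net is pushed onto the longer one.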
•257
Global Routing Methods (contd.,)
• There are two different ways that a global router normally handles this problem.
1. Order-independent routing
2. Order-dependent routing
• Order-independent routing: after all the interconnects are assigned to channels, the
global router returns to those channels that are the most crowded and reassigns some
interconnects to other, less crowded, channels.
• Order-dependent routing: a global router can consider the number of interconnects already
placed in various channels as it proceeds. In this case the global routing is order
dependent; the routing is still sequential, but now the order of processing the nets will
affect the results.
• Rather than handling all of the nets on the chip at the same time, the global-
routing problem is made more tractable by dividing the chip area into levels of
hierarchy.
• By considering only one level of hierarchy at a time the size of the problem is
reduced at each level.
•259
Global Routing
– between blocks
•260
Global Routing Between Blocks
•FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC showing a single net
with a fanout of four (five terminals). We have to order the numbered channels to complete
the interconnect path for terminals A1 through F1. (b) The terminals are projected to the
center of the nearest channel, forming a graph. A minimum-length tree for the net that uses
the channels and takes into account the channel capacities. (c) The minimum-length tree
does not necessarily correspond to minimum delay. If we wish to minimize the delay
from terminal A1 to D1, a different tree might be better.
•262
Global Routing Between Blocks (contd.,)
• Global routing is very similar for cell-based ASICs and gate arrays, but there
is a very important difference between the types of channels in these
ASICs.
• In channeled gate arrays and FPGAs the channel capacity is fixed. If the global
router needs more room, even in just one channel on the whole chip, the designer
has to repeat the placement-and-routing steps and try again (or use a bigger chip).
•263
Global Routing Inside Flexible Blocks
•FIGURE 17.6 Gate-array global routing. (a) A small gate array. (b) An enlarged view of the routing. The
top channel uses three rows of gate-array base cells; the other channels use only one. (c) A further
enlarged view showing how the routing in the channels connects to the logic cells. (d) One of the logic
cells, an inverter. (e) There are seven horizontal wiring tracks available in one row of gate-array base
cells; the channel capacity is thus 7.
•264
Global Routing Inside Flexible Blocks (contd.,)
FIGURE 17.8 Global routing a gate array. (a) A single global-routing cell (GRC or routing bin) containing 2-by-4
gate-array base cells. For this choice of routing bin the maximum horizontal track capacity is 14, the maximum
vertical track capacity is 12. The routing bin labeled C3 contains three logic cells, two of which have feedthroughs
marked 'f'. This results in the edge capacities shown. (b) A view of the top left-hand corner of the gate array
showing 28 routing bins. The global router uses the edge capacities to find a sequence of routing bins to connect
the nets.
•266
Timing-Driven Methods
• As in timing-driven placement, there are two main approaches to timing-driven routing:
net-based and path-based.
• Placement and global routing tools may or may not use the same algorithm to
estimate net delay. If these tools are from different companies, the algorithms are
probably different.
• Companies that produce floorplanning and placement tools make sure that the
output is compatible with different routing tools—often to the extent of using different
algorithms to target different routers.
•267
Back-annotation
• The global router can give not just an estimate of the total net
length (which was all we knew at the placement stage), but the
resistance and capacitance of each path in each net. This RC
information is used to calculate net delays.
•268
Detailed Routing
Goal:
• The goal of detailed routing is to complete all the connections between logic
cells.
Objectives:
•269
Measurement of Channel Density
Definition of Local and Global channel density
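The slide gives no formulas, so the sketch below uses the usual definitions as an assumption: the local density at a column is the number of net trunks that cross that column, and the global density is the maximum local density along the channel. The trunk spans and function names are hypothetical.

```python
def local_density(trunks, column):
    """Number of trunks whose horizontal span covers the given column."""
    return sum(1 for left, right in trunks.values() if left <= column <= right)


def global_density(trunks):
    """Maximum local density over every column the trunks touch."""
    first = min(left for left, _ in trunks.values())
    last = max(right for _, right in trunks.values())
    return max(local_density(trunks, col) for col in range(first, last + 1))


# Hypothetical trunks: net name -> (leftmost column, rightmost column).
example_trunks = {"n1": (1, 4), "n2": (2, 6), "n3": (5, 8), "n4": (7, 9), "n5": (3, 5)}
print(local_density(example_trunks, 1), global_density(example_trunks))
```

The global density is a lower bound on the number of horizontal tracks the channel needs.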
•270
Left-edge algorithm
The left-edge algorithm (LEA) is the basis for several routing algorithms
[Hashimoto and Stevens, 1971].
The LEA applies to two-layer channel routing, using one layer for the trunks and the
other layer for the branches.
For example, m1 may be used in the horizontal direction and m2 in the vertical
direction.
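A minimal sketch of the left-edge idea follows, assuming each net has a single trunk described by its leftmost and rightmost column and ignoring vertical constraints between nets. The function name `left_edge` and the example spans are hypothetical.

```python
def left_edge(trunks):
    """Assign trunks to tracks greedily, in order of their left edges.

    trunks -- dict mapping net name to (left column, right column)
    Returns a list of tracks; each track is a list of net names.
    """
    remaining = sorted(trunks, key=lambda net: trunks[net][0])
    tracks = []
    while remaining:
        track, last_right, leftover = [], None, []
        for net in remaining:
            left, right = trunks[net]
            # A trunk fits if it starts strictly to the right of the last one placed.
            if last_right is None or left > last_right:
                track.append(net)
                last_right = right
            else:
                leftover.append(net)
        tracks.append(track)
        remaining = leftover
    return tracks


example_trunks = {"n1": (1, 4), "n2": (2, 6), "n3": (5, 8), "n4": (7, 9), "n5": (3, 5)}
print(left_edge(example_trunks))
```

For this example the trunks fit in three tracks. A real channel router also has to respect the vertical constraints between nets that share a column, which this sketch does not model.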
•271
Constraints and Routing Graphs
• Two terminals that are in the same column in a channel create a
vertical constraint.
• An overlap between the trunks of two nets is called a horizontal constraint.
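Both kinds of constraint can be detected with simple checks. The sketch below assumes the channel is given as two lists of net numbers, one for the top edge and one for the bottom edge, with `None` marking a column that has no terminal; the function names and data are hypothetical.

```python
def vertical_constraints(top, bottom):
    """Return the set of (upper_net, lower_net) pairs.

    Where two different nets have terminals in the same column, the trunk of the
    net on the top edge must lie above the trunk of the net on the bottom edge.
    """
    return {
        (t, b)
        for t, b in zip(top, bottom)
        if t is not None and b is not None and t != b
    }


def horizontal_constraint(trunk_a, trunk_b):
    """True if two trunks, given as (left, right) columns, overlap."""
    return trunk_a[0] <= trunk_b[1] and trunk_b[0] <= trunk_a[1]


top_terminals = [1, None, 2, 3]
bottom_terminals = [2, 1, None, 3]
print(vertical_constraints(top_terminals, bottom_terminals))
print(horizontal_constraint((1, 4), (3, 6)))
```

Two nets with a horizontal constraint cannot share a track; a vertical constraint fixes which of the two trunks must be the upper one.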
•274
Dog-Leg router
• A dogleg router removes the restriction that each net can use only one
track or trunk.
•275
Area Routing Algorithm- Lee-Maze algorithm
[For general shaped areas]
• Finds a path from source (X) to target (Y) by emitting a wave from both
the source and the target at the same time.
• Once the target is reached, the path is found by backtracking (if there is a
choice of bins with equal labeled values, choose the bin that avoids changing
direction). (The original form of the Lee algorithm uses a single wave.)
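The single-wave form mentioned above can be sketched as a breadth-first search over routing bins. This is a simplified illustration, assuming a rectangular grid in which '#' marks a blocked bin; it does not implement the two-wave variant or the preference for avoiding changes of direction during backtracking. The function name `lee_route` and the grid are hypothetical.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def lee_route(grid, source, target):
    """Return a shortest list of (row, col) bins from source to target, or None."""
    rows, cols = len(grid), len(grid[0])
    label = {source: 0}          # wave number written into each reached bin
    queue = deque([source])
    while queue and target not in label:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in label:
                label[(nr, nc)] = label[(r, c)] + 1
                queue.append((nr, nc))
    if target not in label:
        return None
    # Backtrack from the target, stepping to any neighbour labeled one less.
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        for dr, dc in MOVES:
            candidate = (r + dr, c + dc)
            if label.get(candidate) == label[(r, c)] - 1:
                path.append(candidate)
                break
    return path[::-1]


example_grid = ["..#..",
                "..#..",
                "....."]
print(lee_route(example_grid, (0, 0), (0, 4)))
```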
•276
Hightower or line search-Area routing algorithm
[For general shaped areas]
• 1. Extend lines from both the source and target toward each
other.
• Cell-based ASICs may use either a clock spine, a clock tree, or a hybrid
approach.
• Figure shows how a clock router may minimize clock skew in a clock spine
by making the path lengths, and thus net delays, to every leaf node equal—
using jogs in the interconnect paths if necessary.
•278
Special routing- CLK routing
FIGURE: Clock routing. (a) A clock network for the cell-based ASIC.
(b) Equalizing the interconnect segments between CLK and all
destinations (by including jogs if necessary) minimizes clock
skew.
•279
Special routing- Power routing
• Power bus width
• Each of the power buses has to be sized according to the current it
will carry.
•281
Special routing- Power routing
• Cell-based ASIC
• Standard cells are constructed in a similar fashion to gate-array cells, with
power buses running horizontally in m1 at the top and bottom of each
cell.
• A row of standard cells uses end-cap cells that connect to the VDD and VSS
power buses placed by the power router.
Figure: The regular and reduced standard parasitic format (SPF) models for
interconnect. (a) An example of an interconnect network with fanout. The driving-point
admittance of the interconnect network is Y(s). (b) The SPF model of the interconnect.
(c) The lumped-capacitance interconnect model. (d) The lumped-RC interconnect
model. (e) The PI segment interconnect model.
The values of C, R, C1, and C2 are calculated so that Y1(s), Y2(s), and Y3(s) are the
first-, second-, and third-order Taylor-series approximations to Y(s).
•284
Circuit Extraction
The key features of regular and reduced SPF are as follows:
• The loading effect of a net as seen by the driving gate is represented by
choosing one of three different RC networks: lumped-C, lumped-RC, or PI
segment (selected when generating the SPF) [O’Brien and Savarino, 1989].
• The pin-to-pin delays of each path in the net are modeled by a simple RC
delay (one for each path). This can be the Elmore constant for each path, but it
need not be.
• The reduced SPF (RSPF) contains the same information as regular SPF,
but uses the SPICE format.
• Detailed SPF:
• The detailed SPF (DSPF) shows the resistance and capacitance of each
segment in a net, again in a SPICE format. There are no models or
assumptions on calculating the net delays in this format.
•285
Design-Rule Check ( DRC )
• ASIC designers perform two major checks before fabrication.
• DRC:
• The first check is a design-rule check (DRC) to ensure that nothing
has gone wrong in the process of assembling the logic cells and
routing.
• Normally the ASIC vendor will perform this check using its own
software as a type of incoming inspection.
•287
Design-Rule Check ( DRC )
• Layout versus schematic (LVS) check:
• The second check is a layout versus schematic (LVS) check to ensure
that what is about to be committed to silicon is what is really wanted.
• The first step is normally to match certain key nodes (such as the
power supplies, inputs, and outputs), but the process can very quickly
become bogged down in the thousands of mismatch errors that are
inevitably generated initially.
•289
Design-Rule Check ( DRC )
• Problems in LVS check:
• The second problem with an LVS check is creating a true reference.
•291