ASIC Floorplanning Essentials
16 FLOORPLANNING AND PLACEMENT
Key terms and concepts: The input to floorplanning is the output of system partitioning and
design entry—a netlist. The output of the placement step is a set of directions for the routing
tools.
The starting point for floorplanning and placement for the Viterbi decoder (standard cells).
2 SECTION 16 FLOORPLANNING AND PLACEMENT ASICS... THE COURSE
16.1 Floorplanning
Key terms and concepts: Interconnect and gate delay both decrease with feature size, but at
different rates • Interconnect capacitance bottoms out at 2 pF/cm for a minimum-width wire, but
gate delay continues to decrease • Floorplanning predicts interconnect delay by estimating
interconnect length • During floorplanning we know only the fanout (FO) of a net and the size of
the block • We estimate interconnect length from predicted-capacitance tables (wire-load tables)
[Figure: (a) histograms of the percentage of nets versus interconnect delay (ns) and capacitance (pF, also in standard loads) for fanouts FO=1 to FO=5 in a row-based ASIC flexible block (20k-gate); 1 standard load = 0.01 pF. (b) Wire-load table: predicted capacitance (standard loads) as a function of fanout (FO) and block size (k-gate); the row shown (block size 20) gives 0.9, 1.2, 1.9, 2.4, 3.0 for FO = 1 to 5, and 0.9 standard loads = 0.009 pF. (c) Nets A and B (FO=1) and net C (FO=4) among the logic cells of a block.]
Predicted capacitance.
(a) Interconnect lengths as a function of fanout (FO) and circuit-block size.
(b) Wire-load table.
There is only one capacitance value for each fanout (typically the average value).
(c) The wire-load table predicts the capacitance and delay of a net (with a considerable error).
Net A and net B both have a fanout of 1, both have the same predicted net delay, but net B
in fact has a much greater delay than net A in the actual layout (of course we shall not know
what the actual layout is until much later in the design process).
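A wire-load lookup of the kind described in (b) and (c) can be sketched as follows. The capacitance values and the 0.01 pF standard load are the ones that appear in the figure; treating them as the row for a single 20k-gate block, and the names `WIRE_LOAD_STANDARD_LOADS` and `predicted_capacitance_pf`, are assumptions made for illustration.

```python
# Predicted capacitance in standard loads, indexed by fanout (FO).
# Assumption: these five values are the wire-load row for one block size.
WIRE_LOAD_STANDARD_LOADS = {1: 0.9, 2: 1.2, 3: 1.9, 4: 2.4, 5: 3.0}

# One standard load expressed in pF (value taken from the figure).
STANDARD_LOAD_PF = 0.01


def predicted_capacitance_pf(fanout: int) -> float:
    """Return the predicted net capacitance in pF for a given fanout."""
    if fanout not in WIRE_LOAD_STANDARD_LOADS:
        raise ValueError(f"fanout {fanout} is outside the table")
    return WIRE_LOAD_STANDARD_LOADS[fanout] * STANDARD_LOAD_PF


# Net A and net B both have FO = 1, so the table gives them the same
# prediction even though their lengths in the final layout may differ.
net_a = predicted_capacitance_pf(1)
net_b = predicted_capacitance_pf(1)
```

Because the table holds only one value per fanout, the lookup cannot distinguish net A from net B; that is the source of the error noted in (c).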
Worst-case interconnect delay.
As we scale circuits, but avoid scaling the chip size, the worst-case interconnect delay increases.
[Figure: interconnect delay (ns, 0.1 to 1.0) versus feature size (1.0, 0.5, 0.25 µm); the worst case is increasing while the average from the wire-load table is decreasing, with a ±1 sigma spread.]
[Figure: four floorplans, (a) to (d), of blocks A to F, with side lengths of 2, 1.5, and 1.75 marked.]
Congestion analysis.
(a) The initial floorplan with a 2:1.5 die aspect ratio.
(b) Altering the floorplan to give a 1:1 chip aspect ratio.
(c) A trial floorplan with a congestion map.
Blocks A and C have been placed so that we know the terminal positions in the channels.
Shading indicates the ratio of channel density to the channel capacity.
Dark areas show regions that cannot be routed because the channel congestion exceeds
the estimated capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
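The shading rule in (c) amounts to a ratio test per channel. A minimal sketch follows, with hypothetical channel names and track counts (none of these numbers come from the figure):

```python
def congestion_ratio(density: int, capacity: int) -> float:
    """Ratio of channel density to channel capacity (both in tracks)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return density / capacity


# Hypothetical channels: name -> (estimated density, estimated capacity).
channels = {"A-B": (12, 10), "B-C": (6, 10), "D-E": (10, 10)}

# A ratio above 1.0 means the channel cannot be routed as floorplanned.
unroutable = sorted(
    name for name, (density, capacity) in channels.items()
    if congestion_ratio(density, capacity) > 1.0
)
```

Here only the hypothetical channel "A-B" exceeds its capacity, so it would be drawn dark in a congestion map.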
[Figure: floorplanning-tool views (a) to (d) of blocks A to F. Labels: fixed blocks; flight lines; bundle lines between block centers of gravity; terminal, pin, or port locations (E.in, D.out, B.in); a block-status box (Block name: A; Type: flexible; Contents: 200 cells; Seed file: A.seed); mirror about the x-axis; swap.]
[Figure: routing order at a T-junction between channel A and channel B (blocks 1, 2, and 3; pins in m1 and m2). (a) 1: Adjust channel A first. 2: Now we can adjust channel B. (b) 1: Adjust channel B first. 2: Now we cannot adjust channel A.]
[Figure: (a) a chip with circuit blocks A to E, routing channels, and a cut line defining a slice; (b) the cuts, numbered 1 to 4; (c) the slicing tree, annotated "route channels in this order".]
Defining the channel routing order for a slicing floorplan using a slicing tree.
(a) Make a cut all the way across the chip between circuit blocks.
Continue slicing until each piece contains just one circuit block.
Each cut divides a piece into two without cutting through a circuit block.
(b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip until only circuit
blocks are left.
(c) The slicing tree corresponding to the sequence of cuts gives the order in which to route
the channels: 4, 3, 2, and finally 1.
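The ordering in (c) corresponds to visiting the slicing tree children-first, so that a channel is routed only after the channels inside both of its pieces. A sketch under assumptions: the tree topology below is hypothetical (the exact shape of the tree in the figure is not recoverable here), and the names `Block`, `Cut`, and `routing_order` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Block:
    """A leaf of the slicing tree: one circuit block."""
    name: str


@dataclass
class Cut:
    """An internal node: cut `number` splits a piece into two pieces."""
    number: int
    left: "Node"
    right: "Node"


Node = Union[Block, Cut]


def routing_order(node: Node) -> List[int]:
    """Post-order traversal: route the channels of both pieces first."""
    if isinstance(node, Block):
        return []
    return routing_order(node.left) + routing_order(node.right) + [node.number]


# Hypothetical tree: each cut peels one block off the remaining piece.
tree = Cut(1, Block("D"),
           Cut(2, Block("A"),
               Cut(3, Block("C"),
                   Cut(4, Block("B"), Block("E")))))

order = routing_order(tree)  # [4, 3, 2, 1]
```

For this tree the traversal reproduces the order 4, 3, 2, 1: the last cut made is the first channel routed.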
[Figure: three floorplans, (a) to (c), of blocks A to E with numbered channels and cuts.]
Cyclic constraints.
(a) A nonslicing floorplan with a cyclic constraint that prevents channel routing.
(b) In this case it is difficult to find a slicing floorplan without increasing the chip area.
(c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic constraints, but it
is inefficient in area use and will be very difficult to route.
[Figure: channel definition and ordering for blocks A to F. (a) A floorplan with a cyclic constraint among channels 1, 2, 3, 4. (b) The floorplan after merging standard-cell areas A and C; channels are labeled with their channel number (in routing order).]
[Figure labels: pad ring and core; I/O power pads VDD(I/O) and VSS(I/O); core power pads VDD(core) and VSS(core); pad-limited and core-limited I/O pads, with m1 jumpers; I/O circuit; chip die, package pins, package-pin spacing, bond wires, lead frame, bond-wire angle, outline of chip core, off-grid pads, staggered lead-frame bond wires (not all shown), solder bumps (not shown on all pads), minimum lead-frame pitch.]
Bonding pads.
(a) This chip uses both pad-limited and core-limited pads.
(b) A hybrid corner pad.
(c) A chip with stagger-bonded pads.
(d) An area-bump bonded chip (or flip-chip). The chip is turned upside down and solder
bumps connect the pads to the lead frame.
[Figure: I/O pads. (a) Cell-based ASIC: custom I/O pads, each a bonding pad plus I/O circuits, for example one 4mA output driver cell or 2 × 4mA output driver cells in parallel; the I/O-cell pitch is marked. (b), (c) Gate array: pads are fixed pitch; each pad slot holds a bonding pad and an I/O circuit (not shown for all slots), for example a 4mA or an 8mA output pad cell, and some pad slots are empty.]
[Figure: power routing, panels (a) to (d), for blocks A, B, D, E, F and the standard-cell area; labels: horizontal and vertical channels, m1 and m2, layer crossings, m1/m2 vias, VDD and VSS rails, and signals that need to change layers. In (b) all power rails run in m1 parallel to the spine.]
Power distribution.
(a) Power distributed using m1 for VSS and m2 for VDD.
This helps minimize the number of vias and layer crossings needed but causes problems in
the routing channels.
(b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine.
This can make automatic routing easier but may increase the number of vias and layer
crossings.
(c) An expanded view of part of a channel (interconnect is shown as lines).
If power runs on different layers along the spine of a channel, this forces signals to change
layers.
(d) A closeup of VDD and VSS buses as they cross.
Changing layers requires a large number of via contacts to reduce resistance.
[Figure: panels (a) to (d): a clock spine driven from CLK through a clock-driver cell (a buffer chain of stages C1, C2, ..., Cn with taper, driving load CL) to clocked elements in blocks A to F, with clock latency and skew marked.]
Clock distribution.
(a) A clock spine for a gate array.
(b) A clock spine for a cell-based ASIC (typical chips have thousands of clock nets).
(c) A clock spine is usually driven from one or more clock-driver cells.
Delay in the driver cell is a function of the number of stages and the ratio of output to input
capacitance for each stage (taper).
(d) Clock latency and clock skew. We would like to minimize both latency and skew.
[Figure: (a) a tapered chain of clock-buffer cells (C1, C2) driving load CL; (b) a tree with numbered nodes; (c) a clock tree from the CLK I/O pad to flip-flops, with a clock spine inside block A and a clock tree inside block F.]
A clock tree.
(a) Minimum delay is achieved when the taper of successive stages is about 3.
(b) Using a fanout of three at successive nodes.
(c) A clock tree for a cell-based ASIC.
We have to balance the clock arrival times at all of the leaf nodes to minimize clock skew.
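The statement in (a) that the best taper is about 3 can be checked numerically with a deliberately simple model. The model is an assumption, not taken from the notes: every stage has the same taper, the delay of a stage is proportional to its taper, and intrinsic gate delay is ignored. The load ratio of 1000 is also hypothetical.

```python
def chain_delay(stages: int, load_ratio: float) -> float:
    """Relative delay of a buffer chain with equal taper per stage.

    Assumed model: taper = load_ratio ** (1 / stages), and each stage
    contributes a delay proportional to its taper.
    """
    taper = load_ratio ** (1.0 / stages)
    return stages * taper


def best_stage_count(load_ratio: float, max_stages: int = 20) -> int:
    """Number of stages that minimizes the modeled chain delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(n, load_ratio))


LOAD_RATIO = 1000.0                           # hypothetical CL / C1
best_n = best_stage_count(LOAD_RATIO)         # 7 stages for this ratio
best_taper = LOAD_RATIO ** (1.0 / best_n)     # about 2.7
```

Under this model the continuous optimum is a taper of e (about 2.72), which is consistent with the "about 3" quoted in the caption; a model that includes intrinsic delay would shift the optimum somewhat higher.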
16.2 Placement
Key terms and concepts: Placement is more suited to automation than floorplanning. Thus we
need measurement techniques and algorithms.
[Figure: panels (a) to (c). Labels: flexible block A; channel density = 7; channel height = 15; m1 and m2; feedthrough using a logic cell or a feedthrough cell (vertical capacity = 1); over-the-cell routing in m2.]
Interconnect structure.
(a) A two-level metal CBIC floorplan.
(b) A channel from the flexible block A. This channel has a channel height equal to the
maximum channel density of 7 (there is room for seven interconnects to run horizontally in
m1).
(c) A channel that uses OTC (over-the-cell) routing in m2.
[Figure: (a) gate-array base: 1 block = 128 sites, 36 blocks by 128 sites = 4608 sites (a site is a base cell); (b) a block with rows and routing channels; (c) channel A (density = 10) in a 2-row-high channel (horizontal capacity = 14) and channel B (density = 5), routed in m1 and m2.]
Gate-array interconnect.
(a) A small two-level metal gate array (about 4.6k-gate).
(b) Routing in a block.
(c) Channel routing showing channel density and channel capacity.
The channel height on a gate array may only be increased in increments of a row. If the
interconnect does not use up all of the channel, the rest of the space is wasted. The
interconnect in the channel runs in m1 in the horizontal direction with m2 in the vertical direction.
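The row-increment rule can be turned into a short calculation. The sketch assumes 7 horizontal tracks per row, inferred from the figure label that a 2-row-high channel has a horizontal capacity of 14; the function name `channel_rows` is illustrative.

```python
import math
from typing import Tuple

# Assumption: a 2-row-high channel has capacity 14, so one row gives 7 tracks.
TRACKS_PER_ROW = 7


def channel_rows(density: int, tracks_per_row: int = TRACKS_PER_ROW) -> Tuple[int, int]:
    """Return (rows needed, wasted tracks) for a channel of given density."""
    rows = max(1, math.ceil(density / tracks_per_row))
    capacity = rows * tracks_per_row
    return rows, capacity - density


rows_a, wasted_a = channel_rows(10)  # channel A, density = 10 -> (2, 4)
rows_b, wasted_b = channel_rows(5)   # channel B, density = 5  -> (1, 2)
```

With these numbers channel A needs two rows and leaves four tracks unused, while channel B fits in one row and leaves two unused.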
[Figure: (a), (b) rows of standard cells with cell instance names A.211, A.43, A.25 and a net with terminals W, X, Y, Z; (c) a tree on a 50 λ grid with length L=16; (d) the minimum rectilinear Steiner tree, using a Steiner point, with length L=15.]
Interconnect-length measures.
(a) Complete-graph measure: L = 44/2 = 22.
(b) Half-perimeter measure: L = 28/2 = 14.
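Both measures are easy to compute from terminal coordinates. In the sketch below the terminal positions are hypothetical, distances are Manhattan distances, and the complete-graph sum is scaled by 2/n (an assumption, chosen because a complete graph on n terminals has n(n-1)/2 edges while a tree has only n-1).

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int]


def half_perimeter(terminals: List[Point]) -> int:
    """Half the perimeter of the smallest rectangle enclosing the terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def complete_graph(terminals: List[Point]) -> float:
    """Sum of all pairwise Manhattan distances, scaled by 2/n (assumed)."""
    n = len(terminals)
    total = sum(abs(x1 - x2) + abs(y1 - y2)
                for (x1, y1), (x2, y2) in combinations(terminals, 2))
    return total * 2.0 / n


# Hypothetical four-terminal net on a unit grid.
net = [(0, 0), (3, 1), (1, 4), (5, 6)]
hp = half_perimeter(net)   # 5 + 6 = 11
cg = complete_graph(net)   # 38 * 2 / 4 = 19.0
```

The half-perimeter measure needs only the bounding box, which is why it is cheap enough to evaluate inside a placement loop.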
[Figure: panels (a) to (d); in (c) the axes run from -1 to 1 (horizontal) and from -0.6 to 0.6 (vertical); (d) labels the cell abutment box, cell connector, channel, m1, and m2.]
Eigenvalue placement.
(a) An example network.
(b) The one-dimensional placement.
The small black squares represent the centers of the logic cells.
(c) The two-dimensional placement.
The eigenvalue method takes no account of the logic cell sizes or actual location of logic
cell connectors.
(d) A complete layout.
We snap the logic cells to valid locations, leaving room for the routing in the channel.
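A one-dimensional eigenvalue placement, as in (b), can be sketched without external libraries. The formulation used here is the common one: build the matrix B = D - C (degree matrix minus connectivity matrix) and take the eigenvector with the smallest nonzero eigenvalue as the cell coordinates. The four-cell chain network and the use of a shifted power iteration to find that eigenvector are assumptions made to keep the example dependency-free.

```python
from typing import List, Tuple


def one_dimensional_placement(n: int, edges: List[Tuple[int, int]],
                              iterations: int = 500) -> List[float]:
    """Coordinates from the eigenvector of B = D - C with the smallest
    nonzero eigenvalue, found by power iteration on (shift * I - B)."""
    if n < 2 or not edges:
        raise ValueError("need at least two cells and one connection")
    b = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        b[i][i] += 1.0
        b[j][j] += 1.0
        b[i][j] -= 1.0
        b[j][i] -= 1.0
    # Gershgorin bound: every eigenvalue of B is at most 2 * max degree.
    shift = 2.0 * max(b[i][i] for i in range(n))
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the trivial constant solution
        x = [shift * x[i] - sum(b[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
    mean = sum(x) / n
    return [v - mean for v in x]


# Hypothetical network: four logic cells connected in a chain 0-1-2-3.
coords = one_dimensional_placement(4, [(0, 1), (1, 2), (2, 3)])
order = sorted(range(4), key=lambda i: coords[i])
```

For the chain, the cells come out in chain order along the axis. As the caption notes, the method knows nothing about cell sizes, so the coordinates still have to be snapped to legal locations.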
[Figure: four 4 × 4 grids of logic cells numbered 1 to 16, panels (a) to (d).]
Interchange.
(a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
(b) Sometimes we have to swap more than two logic cells at a time to reach an optimum
placement, but this is expensive in computation time.
Limiting the search to neighborhoods reduces the search time.
Logic cells within a distance ε of a logic cell form an ε-neighborhood.
(c) A one-neighborhood.
(d) A two-neighborhood.
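An ε-neighborhood on the 4 × 4 grid of cells numbered 1 to 16 can be listed as below. Measuring distance as Manhattan distance in grid steps is an assumption; the notes say only "within a distance ε".

```python
from typing import List


def epsilon_neighborhood(cell: int, epsilon: int,
                         columns: int = 4, rows: int = 4) -> List[int]:
    """Cells within Manhattan distance `epsilon` of `cell` (excluding it).

    Cells are numbered from 1, row by row, left to right.
    """
    row, col = divmod(cell - 1, columns)
    result = []
    for other in range(1, rows * columns + 1):
        r, c = divmod(other - 1, columns)
        distance = abs(r - row) + abs(c - col)
        if 0 < distance <= epsilon:
            result.append(other)
    return result


one_neighborhood = epsilon_neighborhood(6, 1)   # [2, 5, 7, 10]
two_neighborhood = epsilon_neighborhood(6, 2)   # ten candidate cells
```

Restricting trial swaps to these candidates, rather than to all 15 other cells, is what reduces the search time in neighborhood exchange.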
[Figure: panels (a) to (d): logic cells A to I in a 3 × 3 grid, with springs drawn between connected cells; the vectors (-2, 2), (-2, 2), (-1, 0), and (-5, 4) are marked.]
Force-directed placement.
(a) A network with nine logic cells.
(b) We make a grid (one logic cell per bin).
(c) Forces are calculated as if springs were attached to the centers of each logic cell for
each connection.
The two nets connecting logic cells A and I correspond to two springs.
(d) The forces are proportional to the spring extensions.
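The spring model in (c) and (d) reduces to summing displacement vectors. The grid coordinates below are assumptions (A at the top left, I at the bottom right of a 3 × 3 grid, y increasing upward), as is the single connection from H to I; with those assumptions the individual vectors match the (-2, 2) and (-1, 0) labels in the figure.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Assumed bin coordinates for the three cells involved.
positions: Dict[str, Point] = {"A": (0, 2), "H": (1, 0), "I": (2, 0)}

# One entry per spring attached to cell I: two nets to A, one to H (assumed).
springs_on_i: List[str] = ["A", "A", "H"]


def force_on(cell: str, connected: List[str],
             where: Dict[str, Point]) -> Point:
    """Sum of spring extension vectors, with a spring constant of 1."""
    x0, y0 = where[cell]
    fx = sum(where[other][0] - x0 for other in connected)
    fy = sum(where[other][1] - y0 for other in connected)
    return fx, fy


total_force = force_on("I", springs_on_i, positions)  # (-5, 4)
```

The total, (-5, 4), pulls I toward A and H; the zero-force location for I is the centroid of the cells it connects to, counting A twice.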
[Figure: force-directed improvement on a 4 × 4 grid of logic cells A to P, with the force vector on P shown.]
(a) Trial swap P with nearest neighbors in direction of force vector.
(b) Move P to location that minimizes force vector. Repeat process, forming a chain.
(c) Move P to location that minimizes force vector. Swap is accepted if destination module moves to ε-neighborhood of P.
[Figure: timing analysis of a small network with primary inputs A, B, C and primary outputs X, Y, Z; each node is annotated arrival/required/slack. (a) Gate delay only, with the critical path marked. (b) Gate delay + net delay.]
Placement example.
(a) An example network.
(b) In this placement, the bin size is equal to the logic cell size and all the logic cells are assumed equal size.
(c) An alternative placement with a lower total routing length.
(d) A layout that might result from the placement shown in b.
Notice that the logic cells are not all the same size (which means there are errors in the interconnect-length estimates we made during placement).
[Figure: logic cells A to I in a 3 × 3 arrangement of bins; labels include maximum capacity of each bin edge = 2, wire length = 1, cut line = 1, cut line = 2, cut line (y) = 4, maximum cut (x and y) = 2, routing length = 7, total routing length = 8, and, in (d), cell connectors, cell abutment boxes, m1, m2, and channel density = 2.]
16.3 Physical Design Flow
Because interconnect delay now dominates gate delay, the trend is to include placement
within a floorplanning tool and use a separate router.
1. Design entry. The input is a logical description with no physical information.
2. Initial synthesis. The initial synthesis contains little or no information on any interconnect
loading. The output of the synthesis tool (typically an EDIF netlist) is the input to the
floorplanner.
3. Initial floorplan. From the initial floorplan, interblock capacitances are input to the synthesis
tool as load constraints, and intrablock capacitances are input as wire-load tables.
4. Synthesis with load constraints. At this point the synthesis tool is able to resynthesize the
logic based on estimates of the interconnect capacitance each gate is driving. The synthesis
tool produces a forward annotation file to constrain path delays in the placement step.
5. Timing-driven placement. After placement using constraints from the synthesis tool, the
location of every logic cell on the chip is fixed and accurate estimates of interconnect delay
can be passed back to the synthesis tool.
6. Synthesis with in-place optimization (IPO). The synthesis tool changes the drive strength
of gates based on the accurate interconnect delay estimates from the floorplanner without
altering the netlist structure.
7. Detailed placement. The placement information is ready to be input to the routing step.
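[Figure: the design flow as a loop of numbered steps, from VHDL/Verilog design entry (1) and initial synthesis (2) through synthesis with in-place optimization (6) to detailed placement (7), with the error in the wire-load estimates falling toward 0 as their accuracy increases; example netlist instances A.nand1 and A.inv1 in block A.]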
16.4.2 PDEF
Key terms and concepts: physical design exchange format (PDEF)
(CLUSTERFILE
(PDEFVERSION "1.0")
(DESIGN "myDesign")
(DATE "THU AUG 6 12:00 1995")
(VENDOR "ASICS_R_US")
(PROGRAM "PDEF_GEN")
(VERSION "V2.2")
(DIVIDER .)
(CLUSTER (NAME "ROOT")
(WIRE_LOAD "10mm x 10mm")
(UTILIZATION 50.0)
(MAX_UTILIZATION 60.0)
(X_BOUNDS 100 1000)
(Y_BOUNDS 100 1000)
(CLUSTER (NAME "LEAF_1")
(WIRE_LOAD "50k gates")
(UTILIZATION 50.0)
(MAX_UTILIZATION 60.0)
(X_BOUNDS 100 500)
(Y_BOUNDS 100 200)
(CELL (NAME L1.RAM01))
(CELL (NAME L1.ALU01))
)
)
)
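Because the listing is a nested, parenthesized structure, a few lines of code are enough to read it into nested lists. The sketch below is not a PDEF parser in any official sense: it assumes only balanced parentheses, double-quoted strings, and whitespace-separated tokens, and the helper names `parse_sexpr` and `cell_names` are hypothetical. The sample is a shortened form of the listing above, with each CELL entry closed by its own parenthesis.

```python
import re
from typing import List


def parse_sexpr(text: str) -> list:
    """Parse parenthesized text into nested Python lists of strings."""
    tokens = re.findall(r'"[^"]*"|[()]|[^\s()]+', text)
    stack: List[list] = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token.strip('"'))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def cell_names(node: object) -> List[str]:
    """Collect the NAME of every CELL entry, at any nesting depth."""
    names: List[str] = []
    if isinstance(node, list):
        if len(node) >= 2 and node[0] == "CELL":
            for child in node[1:]:
                if isinstance(child, list) and child[:1] == ["NAME"]:
                    names.append(child[1])
        for child in node:
            names.extend(cell_names(child))
    return names


sample = """
(CLUSTERFILE
  (DESIGN "myDesign")
  (CLUSTER (NAME "ROOT")
    (UTILIZATION 50.0)
    (CLUSTER (NAME "LEAF_1")
      (CELL (NAME L1.RAM01))
      (CELL (NAME L1.ALU01))
    )
  )
)
"""

tree = parse_sexpr(sample)[0]
cells = cell_names(tree)  # ['L1.RAM01', 'L1.ALU01']
```

The explicit stack also reports unbalanced parentheses, which is a quick sanity check on a cluster file before handing it to another tool.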
16.5 Summary
Key terms and concepts: Interconnect delay now dominates gate delay • Floorplanning is a
mapping between logical and physical design • Floorplanning is the center of design operations
for all types of ASIC • Timing-driven floorplanning is an essential ASIC design tool • Placement
is an automated function