ICs Intended for a Specific Application (e.g., a Chip for a Toy Bear That Talks) - 3 Types
ASICs are ICs intended for a specific application, e.g., a chip for a toy bear that talks
3 types:
Full Custom ASIC
engineer designs some or all of the logic cells, circuits, or layout specifically for
one ASIC.
designer abandons the approach of using pretested and pre-characterized cells
for all or part of that design.
Most expensive to manufacture and design
Manufacturing lead time (the time required just to make an IC not including
design time) is typically eight weeks for a full-custom IC.
some (possibly all) logic cells are customized and all mask layers are
customized.
Offers highest performance and smallest die size for a given design
full-custom design is used if
the ASIC technology is new, or
there are no existing cell libraries, or the existing cell libraries are not fast enough, or the logic cells
are not small enough or consume too much power, or
some circuits must be custom designed.
Bipolar technology has historically been more widely used for full-custom analog
design because of its improved precision.
Semi Custom ASIC
all of the logic cells are predesigned and some (possibly all) of the mask
layers are customized
Using the predesigned cells from a cell library makes the design much
easier
two types of semicustom ASICs
(i) Standard-cell-based ASICs
(ii) Gate-array-based ASICs.
Standard-Cell-Based ASICs
A cell-based ASIC (CBIC) uses predesigned logic cells (AND gates, OR gates,
multiplexers, and flip-flops, for example) known as standard cells.
The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard
cells, like a wall built of bricks.
standard-cell areas may be used in combination with larger predesigned cells, perhaps
microcontrollers or even microprocessors, known as megacells.
ASIC designer defines only the placement of the standard cells and the interconnect in
a CBIC.
all the mask layers of a CBIC are customized and are unique to a particular customer.
advantage of CBICs is that designers save time, money, and reduce risk by using a
predesigned, pretested, and precharacterized standard-cell library
each standard cell can be optimized individually.
During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example.
disadvantages are the time or expense of designing or buying the standard-cell library
and the time needed to fabricate all layers of the ASIC for each new design.
important features of this type of ASIC are as follows:
All mask layers are customized: transistors and interconnect.
Custom blocks can be embedded.
Manufacturing lead time is about eight weeks.
Each standard cell in the library is constructed using full-custom design
methods, but predesigned and pre-characterized circuits can be used
without having to do any full-custom design yourself.
This design style gives you the same performance and flexibility advantages as a
full-custom ASIC but reduces design time and reduces risk.
Since all mask layers on a standard-cell design are customized, memory
design is more efficient and denser than for gate arrays.
Both cell-based and gate-array ASICs use predefined cells, but there is a
difference: we can change the transistor sizes in a standard cell to optimize
speed and performance, but the device sizes in a gate array are fixed.
This results in a trade-off between performance and area in a gate array at the silicon
level. The trade-off between area and performance is made at the library
level for a standard-cell ASIC.
Gate-Array-Based ASICs
Here transistors are predefined on the silicon wafer.
The predefined pattern of transistors on a gate array is the base array, and the
smallest element that is replicated to make the base array is the base cell.
Only the top few layers of metal, which define the interconnect between
transistors, are defined by the designer using custom masks.
This type of gate array is often called a masked gate array (MGA).
designer chooses from a gate-array library of predesigned and
precharacterized logic cells.
The logic cells in a gate-array library are often called macros since base-cell
layout is the same for each logic cell, and only the interconnect (inside cells
and between cells) is customized
also called a prediffused array
only the metal interconnections are unique to an MGA
costs for all the initial fabrication steps for an MGA are shared for each
customer and this reduces the cost of an MGA compared to a full-custom or
standard-cell ASIC design.
The time needed to make an MGA, the turnaround time, is a few days or at most a
couple of weeks.
different types of MGA or gate-array-based ASICs:
Channeled Gate Array
important features of this type of MGA are:
Only the interconnect is customized.
The interconnect uses predefined spaces between rows of base cells.
Manufacturing lead time is between two days and two weeks.
A channeled gate array is similar to a CBIC: both use rows of cells separated by
channels used for interconnect.
One difference is that the space for interconnect between rows of cells is
fixed in height in a channeled gate array, whereas the space between rows of
cells may be adjusted in a CBIC.
Channel-less Gate Array
also known as a channel-free gate array, sea-of-gates array, or SOG array
important features of this type of MGA are as follows:
Only some (the top few) mask layers are customized: the interconnect.
Manufacturing lead time is between two days and two weeks.
The key difference between a channel-less gate array and a channeled gate array
is that there are no predefined areas set aside for routing between cells on a
channel-less gate array; instead we route over the top of the gate-array
devices.
When we use an area of transistors for routing in a channel-less array, we make no
contacts to the devices lying underneath; the transistors are simply left
unused.
The logic density (the amount of logic that can be implemented in a given silicon
area) is higher for channel-less gate arrays than for channeled gate arrays.
The contact mask is customized in a channel-less gate array, but is not usually
customized in a channeled gate array. This leads to denser cells in the
channel-less architectures because cells can be routed over the top of unused
contact sites.
Structured Gate Array
also known as embedded gate array or as masterslice or masterimage
combines some of the features of CBICs and MGAs.
One of the disadvantages of the MGA is the fixed gate-array base cell.
This makes the implementation of memory, for example, difficult and inefficient.
In an embedded gate array we set aside some of the IC area and dedicate it to
a specific function.
This embedded area either can contain a different base cell that is more
suitable for building memory cells, or it can contain a complete circuit block,
such as a microcontroller.
important features of this type of MGA are the following:
Only the interconnect is customized.
Custom blocks (the same for each design) can be embedded.
Manufacturing lead time is between two days and two weeks.
An embedded gate array gives the improved area efficiency and increased performance of a CBIC but
with the lower cost and faster turnaround of an MGA.
One disadvantage of an embedded gate array is that the embedded function is
fixed.
For example, if an embedded gate array contains an area set aside for
Programmable Logic Devices
standard ICs that are available in standard configurations
PLDs may be configured or programmed to create a part customized to a
specific application, and so they also belong to the family of ASICs.
important features that all PLDs have in common:
No customized mask layers or logic cells
Fast design turnaround
A single large block of programmable interconnect
A matrix of logic macrocells that usually consist of programmable array logic
followed by a flip-flop or latch
The simplest type of programmable IC is a read-only memory.
A PLA has a programmable AND logic array, or AND plane, followed by a
programmable OR logic array, or OR plane.
A PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR
plane.
Depending on how the PLD is programmed, we can have an erasable PLD
(EPLD), or mask-programmed PLD (sometimes called a masked PLD but
usually just PLD).
The first PALs, PLAs, and PLDs were based on bipolar technology and used
programmable fuses or links. CMOS PLDs usually employ floating-gate
Field-Programmable Gate Arrays
FPGA is usually just larger and more complex than a PLD.
FPGAs are the newest member of the ASIC family and are rapidly growing in
importance, replacing TTL in microelectronic systems
essential characteristics of an FPGA:
None of the mask layers are customized.
A method for programming the basic logic cells and the interconnect.
The core is a regular array of programmable basic logic cells that can implement
combinational as well as sequential logic (flip-flops).
A matrix of programmable interconnect surrounds the basic logic cells.
Programmable I/O cells surround the core.
Design turnaround is a few hours.
Design Flow
1. Design entry: Enter the design into an ASIC design system, either using a
hardware description language (HDL) or schematic entry.
2. Logic synthesis: Use an HDL (VHDL or Verilog) and a logic synthesis tool to
produce a netlist, a description of the logic cells and their connections.
3. System partitioning: Divide a large system into ASIC-sized pieces.
4. Pre-layout simulation: Check to see if the design functions correctly.
5. Floorplanning: Arrange the blocks of the netlist on the chip.
6. Placement: Decide the locations of cells in a block.
7. Routing: Make the connections between cells and blocks.
8. Extraction: Determine the resistance and capacitance of the interconnect.
9. Post-layout simulation: Check to see the design still works with the added loads
of the interconnect.
Steps 1-4 are part of logical design, and steps 5-9 are part of physical design.
There is some overlap.
For example, system partitioning might be considered as either logical or
physical design.
when we are performing system partitioning we have to consider both logical
and physical factors.
Physical Design
physical design of ASICs is normally divided into system partitioning,
floorplanning, placement, and routing
depending on the size of the system, system partitioning may be performed
before doing any design entry or synthesis.
There may be some iteration between the different steps too.
first apply system partitioning to divide a microelectronics system into separate
ASICs
In floorplanning we estimate sizes and set the initial relative locations of the
various blocks in our ASIC (sometimes we also call this
chip planning).
At the same time we allocate space for clock and power wiring and decide on the
location of the I/O and power pads.
Placement defines the location of the logic cells within the flexible blocks and sets
aside space for the interconnect to each logic cell.
Routing makes the connections between logic cells.
Routing is a hard problem by itself and is normally split into two distinct steps,
called global and local routing.
Global routing determines where the interconnections between the placed logic
cells and blocks will be situated.
Local routing joins the logic cells with interconnections.
System Partitioning
The goal of partitioning is to divide the system so that each partition is a single ASIC.
To do this we may need to take into account any or all of the following objectives:
A maximum size for each ASIC
A maximum number of ASICs
A maximum number of connections for each ASIC
A maximum number of total connections between all ASICs
Simple Partitioning
The goal is to partition a simple network into ASICs.
objectives are the following:
Use limited no. of ASICs.
Each ASIC is to contain limited no. of logic cells.
Use the minimum number of external connections for each ASIC.
Use the minimum total number of external connections.
Constructive Partitioning
most common constructive partitioning algorithms use seed growth or cluster growth.
simple seed-growth algorithm for constructive partitioning consists of the following steps:
1. Start a new partition with a seed logic cell.
2. Consider all the logic cells that are not yet in a partition. Select each of these logic cells in turn.
3. Calculate a gain function, g(m), that measures the benefit of adding logic cell m to the current partition.
One measure of gain is the number of connections between logic cell m and the current partition.
4. Add the logic cell with the highest gain g(m) to the current partition.
5. Repeat the process from step 2. If you reach the limit of logic cells in a partition, start again at step 1.
may choose different gain functions according to the objectives
The algorithm starts with the choice of a seed logic cell (seed module, or just seed).
The logic cell with the most nets is a good choice as the seed logic cell.
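The seed-growth steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production partitioner: the function name `seed_growth`, the edge-list input format, and the alphabetical tie-breaking are assumptions made for the example. The gain g(m) is the number of connections between cell m and the current partition, and the seed is the unplaced cell with the most connections.

```python
from collections import defaultdict


def seed_growth(edges, max_cells):
    """Greedy seed-growth partitioning (illustrative sketch).

    edges:     iterable of (cell_a, cell_b) two-terminal connections
    max_cells: limit on the number of logic cells in one partition
    Returns a list of partitions, each a list of cell names.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    unplaced = set(neighbors)
    partitions = []
    while unplaced:
        # Step 1: start a new partition with the best-connected unplaced cell.
        seed = max(sorted(unplaced), key=lambda cell: len(neighbors[cell]))
        current = [seed]
        unplaced.remove(seed)

        while unplaced and len(current) < max_cells:
            # Steps 2-3: gain g(m) = connections between m and the partition.
            def gain(m):
                return len(neighbors[m] & set(current))

            # Step 4: add the cell with the highest gain (ties: lowest name).
            best = max(sorted(unplaced), key=gain)
            current.append(best)
            unplaced.remove(best)

        # Step 5: partition is full (or no cells remain); start again.
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    demo_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
                  ("D", "E"), ("D", "F"), ("E", "F")]
    print(seed_growth(demo_edges, max_cells=3))
```

A different gain function (for example, one that penalizes new external connections) can be substituted for `gain` to match other partitioning objectives.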
Iterative Partitioning Improvement
most common iterative improvement algorithms are based on interchange and
group migration
The process of interchanging (swapping) logic cells in an effort to improve the partition
is an interchange method.
If the swap improves the partition, we accept the trial interchange; otherwise we
select a new set of logic cells to swap.
There is a limit to what we can achieve with a partitioning algorithm based on simple
interchange.
Algorithms of this type are greedy algorithms in the sense that they will accept a
move only if it provides immediate benefit.
Group migration consists of swapping groups of logic cells between partitions.
group migration algorithms are better than simple interchange methods at
improving a solution but are more complex
all group migration methods are based on the powerful and general Kernighan-Lin
algorithm (K-L algorithm)
The Kernighan-Lin Algorithm
1. Find two nodes, $a_i$ from $A$ and $b_i$ from $B$, so that the gain from swapping them is a maximum. The gain is
$g_i = D_{a_i} + D_{b_i} - 2 c_{a_i b_i}$
where $D_x$ is the external edge cost of node $x$ minus its internal edge cost, and $c_{a_i b_i}$ is the weight of the edge between $a_i$ and $b_i$.
2. Next pretend swap $a_i$ and $b_i$ even if the gain $g_i$ is zero or negative, and do not consider $a_i$ and $b_i$ eligible for
being swapped again.
3. Repeat steps 1 and 2 a total of $m$ (no. of nodes in each partition) times until all the nodes of $A$ and $B$ have
been pretend swapped. We are back where we started, but we have ordered pairs of nodes in $A$ and $B$
according to the gain from interchanging those pairs.
4. Now we can choose which nodes we shall actually swap. Suppose we only swap the first $n$ pairs of nodes that
we found in the preceding process. In other words we swap nodes $X = a_1, a_2, \ldots, a_n$ from $A$ with nodes
$Y = b_1, b_2, \ldots, b_n$ from $B$. The total gain would be
$G_n = \sum_{i=1}^{n} g_i$
We now choose $n$ corresponding to the maximum value of $G_n$.
5. If the maximum value of $G_n > 0$, then we swap the sets of nodes $X$ and $Y$ and thus reduce the cut weight by
$G_n$. We use this new partitioning to start the process again at the first step. If the maximum value of $G_n = 0$,
then we cannot improve the current partitioning and we stop. We have found a locally optimum solution.
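One pass of the procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kl_pass` and the dict-of-dicts `cost` format are made up for the example, D for a node is taken as its external edge cost minus its internal edge cost, and the incremental D update after each pretend swap is the standard one rather than something stated in these notes.

```python
def kl_pass(cost, A, B):
    """One Kernighan-Lin pass (sketch).

    cost: symmetric dict of dicts, cost[u][v] = edge weight (missing = 0)
    A, B: lists of nodes in the two partitions
    Returns (new_A, new_B, best_gain).
    """
    A, B = list(A), list(B)
    side = {v: 0 for v in A}
    side.update({v: 1 for v in B})
    nodes = A + B

    def c(u, v):
        return cost.get(u, {}).get(v, 0)

    # D[v] = external cost - internal cost for every node.
    D = {}
    for v in nodes:
        external = sum(c(v, u) for u in nodes if side[u] != side[v])
        internal = sum(c(v, u) for u in nodes if side[u] == side[v] and u != v)
        D[v] = external - internal

    free_a, free_b = set(A), set(B)
    pairs, gains = [], []
    for _ in range(min(len(A), len(B))):
        # Step 1: pick the free pair with maximum gain g = Da + Db - 2*c(a, b).
        g, a, b = max(((D[a] + D[b] - 2 * c(a, b), a, b)
                       for a in sorted(free_a) for b in sorted(free_b)),
                      key=lambda t: t[0])
        # Step 2: pretend swap, even if g <= 0, and lock both nodes.
        pairs.append((a, b))
        gains.append(g)
        free_a.remove(a)
        free_b.remove(b)
        # Update D for the remaining free nodes as if a and b had moved.
        for x in free_a:
            D[x] += 2 * c(x, a) - 2 * c(x, b)
        for y in free_b:
            D[y] += 2 * c(y, b) - 2 * c(y, a)

    # Step 4: choose the prefix length n that maximizes G_n.
    best_n, best_gain, running = 0, 0, 0
    for n, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_gain, best_n = running, n

    # Step 5: actually swap the first best_n pairs (none if best_gain == 0).
    swap_a = {a for a, _ in pairs[:best_n]}
    swap_b = {b for _, b in pairs[:best_n]}
    new_A = [v for v in A if v not in swap_a] + sorted(swap_b)
    new_B = [v for v in B if v not in swap_b] + sorted(swap_a)
    return new_A, new_B, best_gain
```

Calling `kl_pass` repeatedly until it returns a gain of zero gives the full iterative improvement loop; the result is a locally optimum partition, not necessarily the global one.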
Problems with K-L algorithm
It minimizes the number of edges cut, not the number of nets cut.
It does not allow logic cells to be different sizes.
It is expensive in computation time.
It does not allow partitions to be unequal or find the optimum partition size.
It does not allow for selected logic cells to be fixed in place.
The results are random.
It does not directly allow for more than two partitions.
The K-L algorithm requires an amount of computer time that grows as $n^2 \log n$ for a graph with $2n$ nodes.
Fiduccia-Mattheyses algorithm
Features
Only one logic cell, the base logic cell, moves at a time.
In order to stop the algorithm from moving all the logic cells to one large
partition, the base logic cell is chosen to maintain balance between partitions.
The balance is the ratio of total logic cell size in one partition to the total logic
cell size in the other.
Altering the balance allows us to vary the sizes of the partitions.
Critical nets are used to simplify the gain calculations.
A net is a critical net if it has an attached logic cell that, when swapped,
changes the number of net cuts.
It is only necessary to recalculate the gains of logic cells on critical nets that
are attached to the base logic cell.
The logic cells that are free to move are stored in a doubly linked list.
The lists are sorted according to gain
This allows the logic cells with maximum gain to be found quickly.
The FM algorithm reduces the computation time so that it increases only slightly more than
linearly with the number of logic cells in the network.
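The gain and base-cell selection ideas above can be illustrated with a small sketch. It is deliberately simplified and rests on several assumptions: the names `fm_gains`, `cut_size`, and `pick_base_cell` are hypothetical, gains are recomputed from scratch instead of being kept in gain-sorted linked lists, and balance is expressed as the fraction of total cell size in partition 0 rather than as a ratio between the two partitions.

```python
def fm_gains(nets, side):
    """Gain of moving each cell = reduction in the number of cut nets.

    nets: list of sets of cell names (a net may connect more than two cells)
    side: dict mapping cell name -> 0 or 1
    """
    gains = {cell: 0 for cell in side}
    for net in nets:
        count = [0, 0]
        for cell in net:
            count[side[cell]] += 1
        for cell in net:
            s = side[cell]
            if count[s] == 1 and count[1 - s] > 0:
                gains[cell] += 1      # moving this cell uncuts the net
            elif count[1 - s] == 0 and len(net) > 1:
                gains[cell] -= 1      # moving this cell cuts the net
    return gains


def cut_size(nets, side):
    """Number of nets with cells in both partitions."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)


def pick_base_cell(nets, side, sizes, min_frac, max_frac):
    """Choose the highest-gain cell whose move keeps partition 0 within
    [min_frac, max_frac] of the total cell size. Returns (cell, gains);
    cell is None if no move preserves the balance."""
    gains = fm_gains(nets, side)
    total = sum(sizes.values())
    size0 = sum(sizes[c] for c in side if side[c] == 0)
    best = None
    for cell in sorted(side):
        delta = -sizes[cell] if side[cell] == 0 else sizes[cell]
        if not (min_frac <= (size0 + delta) / total <= max_frac):
            continue
        if best is None or gains[cell] > gains[best]:
            best = cell
    return best, gains
```

Because the gain counts nets rather than edges, a multi-terminal net contributes at most one to the cut, which is the distinction from the edge-based K-L gain.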
K-L & FM
Kernighan and Lin suggested simulating logic cells of different sizes by clumping s
logic cells together with highly weighted nets to simulate a logic cell
of size s. The FM algorithm takes logic-cell size into account as it
selects a logic cell to swap based on maintaining the balance
between the total logic-cell size of each of the partitions.
To generate unequal partitions using the K-L algorithm, we can
introduce dummy logic cells with no connections into one of the
partitions. The FM algorithm adjusts the partition size according to
the balance parameter.
The FM algorithm allows you to fix logic cells by removing them
from consideration as the base logic cells you move. Methods based
on the K-L algorithm find locally optimum solutions in a random
fashion.
The Ratio-Cut Algorithm
The ratio-cut algorithm removes the restriction of constant partition sizes.
The cut weight $W$ for a cut that divides a network into two partitions, $A$ and $B$, is
given by
$W = \sum_{a \in A,\, b \in B} c_{ab}$
The K-L algorithm minimizes $W$ while keeping partitions $A$ and $B$ the same size.
The ratio of a cut is defined as
$R = \frac{W}{|A|\,|B|}$
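A short sketch may help compare cuts of different sizes. It assumes the usual definitions: the cut weight is the sum of the weights of edges with one end in each partition, and the ratio is the cut weight divided by the product of the two partition sizes. The function names and the edge-weight dictionary format are chosen for the example only.

```python
def cut_weight(weights, A, B):
    """Sum of edge weights with one endpoint in A and the other in B.

    weights: dict mapping (u, v) -> weight, each undirected edge listed once
    A, B:    sets of nodes
    """
    return sum(w for (u, v), w in weights.items()
               if (u in A and v in B) or (u in B and v in A))


def ratio_cut(weights, A, B):
    """Cut weight divided by |A| * |B| (assumed ratio-cut definition)."""
    return cut_weight(weights, A, B) / (len(A) * len(B))


if __name__ == "__main__":
    w = {(1, 2): 3, (2, 3): 3, (1, 3): 3, (3, 4): 1}
    # An equal-size cut versus an unequal cut of the same network.
    print(ratio_cut(w, {1, 2}, {3, 4}))
    print(ratio_cut(w, {1, 2, 3}, {4}))
```

In this small network the unequal cut has the lower ratio, which is the kind of partition an equal-size constraint would rule out.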
Routing Constraints
The routing constraints can be classified into two major categories:
Design rule constraints
Performance constraints
Design-rule constraints
often related to the manufacturing details of fabrication
To improve the manufacturing yield, connections of nets have to follow the rules provided
by foundries.
For example, in the 65-nm technology, the physical limitations of an optical lithography
system would impose a constraint on a wire such that its width cannot be smaller than
65 nm.
Figure illustrates a typical set of design rules.
defines the minimum widths of wires and vias
minimum wire-to-wire spacing and minimum via-to-via spacing of a layer.
distance between two wires or routing tracks of the grid-based model is often called wire pitch.
Other design rules of the manufacturing process, such as resistance and capacitance of each
layer, are also included.
Figure: An example of design rules. Typical rules define wire width, wire spacing, wire pitch, via width, and via spacing on each layer.
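A simple width and spacing check for parallel wires on one layer might look like the sketch below. The function name, the one-dimensional wire representation (left and right edge in nm), and the 65 nm spacing value used in the demo are assumptions for illustration; real rule decks come from the foundry and contain many more rules.

```python
def check_design_rules(wires, min_width, min_spacing):
    """Check parallel wires on one layer against width and spacing rules.

    wires: list of (name, left_edge, right_edge), edges in nm
    Returns a list of human-readable violation strings.
    """
    violations = []
    ordered = sorted(wires, key=lambda w: w[1])

    # Minimum-width rule: each wire must be at least min_width wide.
    for name, left, right in ordered:
        width = right - left
        if width < min_width:
            violations.append(f"{name}: width {width} < {min_width}")

    # Minimum-spacing rule: gap between neighboring wires.
    for (n1, _, r1), (n2, l2, _) in zip(ordered, ordered[1:]):
        gap = l2 - r1
        if gap < min_spacing:
            violations.append(f"{n1}-{n2}: spacing {gap} < {min_spacing}")
    return violations


if __name__ == "__main__":
    demo = [("w1", 0, 65), ("w2", 130, 195), ("w3", 240, 300)]
    for line in check_design_rules(demo, min_width=65, min_spacing=65):
        print(line)
```

With a minimum width and minimum spacing of 65 nm each, wires placed at the tightest legal distance would sit on a 130 nm pitch.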
Performance constraints
objective is to make the connections meet the performance specifications
provided by chip designers
For example, the timing constraint is often the most important performance
constraint for high-speed designs
speed of a chip is limited by its critical nets, which have smaller timing
budgets (or timing slacks) than others.
To meet the performance constraint, it is desirable to carefully route these
critical nets by proper routing topologies
Global vs. Detailed Routing
Global routing
Input: detailed placement, with exact terminal locations
Determine the channel (routing region) for each net
Objectives: minimize area (congestion) and timing (approximate), maximize the probability
that the detailed router can complete the routing, and minimize the critical path delay
Detailed routing
Input: channels and approximate routing from the global routing phase
Determine the exact route and layers for each net
Objectives: produce a valid routing, minimize the total interconnect length and area,
and meet timing constraints
Additional objectives: minimize vias and power
Figs. [Sherwani]
Measurement of Channel Density
number of nets that cross a line drawn vertically anywhere in a
channel is the local density
maximum local density of the channel is the global density or
sometimes just channel density
Channel density is an important measure in routing: it tells a
router the absolute fewest number of horizontal interconnects
that it needs at the point where the local density is highest.
In two-level routing the channel density determines the minimum
height of the channel.
The channel capacity is the maximum number of interconnects
that a channel can hold.
If the channel density is greater than the channel capacity, that
channel definitely cannot be routed.
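The density definitions above translate directly into a small calculation. In this sketch each net is represented by the leftmost and rightmost column it spans in the channel, which is an assumption made for the example, as are the function names. A net is counted at column x if its span includes x.

```python
def local_density(nets, x):
    """Number of nets crossing a vertical line drawn at column x.

    nets: dict mapping net name -> (leftmost_column, rightmost_column)
    """
    return sum(1 for left, right in nets.values() if left <= x <= right)


def channel_density(nets):
    """Global (channel) density: the maximum local density over all columns."""
    first = min(left for left, _ in nets.values())
    last = max(right for _, right in nets.values())
    return max(local_density(nets, x) for x in range(first, last + 1))


def may_be_routable(nets, capacity):
    """Necessary (not sufficient) condition: density must not exceed capacity."""
    return channel_density(nets) <= capacity


if __name__ == "__main__":
    demo = {"n1": (0, 3), "n2": (2, 5), "n3": (4, 6), "n4": (2, 4)}
    print(channel_density(demo))
```

Passing the capacity check does not guarantee that a router will finish the channel; failing it guarantees that it cannot.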
Area-Routing Algorithms
The Lee maze-running algorithm
Hightower algorithm
The Lee maze-running algorithm
Goal is to find a path from X to Y, i.e., from the start (or source) to
the finish (or target), avoiding any obstacles.
Algorithm finds a path from source (X) to target (Y) by emitting a
wave from both the source and the target at the same time.
Successive outward moves are marked in each bin.
Once the target is reached, the path is found by backtracking (if
there is a choice of bins with equal labeled values, we choose the
bin that avoids changing direction).
algorithm is often called wave propagation because it sends out
waves
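The wave-propagation idea can be sketched with a breadth-first search on a grid of bins. This sketch is a simplification and makes some assumptions: it expands a single wave from the source only (rather than from both ends), the grid is a list of strings with '#' marking obstacles, and the function name `lee_route` is hypothetical. During backtracking it keeps the current direction whenever there is a choice, to avoid unnecessary bends.

```python
from collections import deque


def lee_route(grid, source, target):
    """Lee maze router (single-wave sketch).

    grid:   list of equal-length strings, '#' = obstacle, anything else = free
    source: (row, col) start bin
    target: (row, col) finish bin
    Returns the list of bins from source to target, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]

    # Wave expansion: label each reached bin with its distance from source.
    label = {source: 0}
    queue = deque([source])
    while queue and target not in label:
        r, c = queue.popleft()
        for dr, dc in moves:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != '#' and nxt not in label):
                label[nxt] = label[(r, c)] + 1
                queue.append(nxt)
    if target not in label:
        return None

    # Backtrack from target to source along decreasing labels.
    path = [target]
    direction = None
    while path[-1] != source:
        r, c = path[-1]
        candidates = [(dr, dc) for dr, dc in moves
                      if label.get((r + dr, c + dc)) == label[(r, c)] - 1]
        # Prefer to keep going straight when several bins qualify.
        step = direction if direction in candidates else candidates[0]
        direction = step
        path.append((r + step[0], c + step[1]))
    return path[::-1]


if __name__ == "__main__":
    demo = ["....",
            ".##.",
            "...."]
    print(lee_route(demo, (1, 0), (1, 3)))
```

The search always finds a shortest path if one exists, at the cost of labeling a large part of the grid.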
Hightower algorithm: a line-search algorithm (or line-probe
algorithm)
Extend lines from both the source and target toward each other.
When an extended line, known as an escape line, meets an
obstacle, choose a point on the escape line from which to project
another escape line at right angles to the old one.
This point is the escape point.
Place an escape point on the line so that the next escape line just
misses the edge of the obstacle.
Escape lines emanating from the source and target intersect to form
the path.
Multilevel Routing
two-layer routing: using one layer for the trunks and the other layer for the
branches
2.5-layer routing: possible to complete some routing in m2 using
over-the-cell (OTC) routing
three-layer routing:
Reserved-layer routing restricts all the interconnect on each layer to flow in one
direction (parallel or perpendicular) in a given routing area
Unreserved-layer routing moves in both horizontal and vertical directions on a
given layer.
Reserved three-level metal routing offers another choice:
Either use m1 and m3 for horizontal routing (parallel to the channel spine),
with m2 for vertical routing (HVH routing), or use VHV routing
Some processes have more than three levels of metal.
Sometimes the upper one or two metal layers have a coarser pitch than the
lower layers and are used in multilevel routing for power and clock lines
rather than for signal interconnect.
Special Routing
Clock Routing
A clock router may minimize clock skew in a clock spine by making the path lengths, and thus net
delays, to every leaf node equal, using jogs in the interconnect paths if necessary.
More sophisticated clock routers perform clock-tree synthesis (automatically choosing the depth
and structure of the clock tree) and clock-buffer insertion (equalizing the delay to the leaf nodes
by balancing interconnect delays and buffer delays).
The power buses supplying the buffers driving the clock spine carry direct current (unidirectional
current or DC), but the clock spine itself carries alternating current (bidirectional current or AC).
Power Routing
Each of the power buses has to be sized according to the current it will carry.
Too much current in a power bus can lead to a failure through a mechanism known as
electromigration
To determine the power-bus widths we need to determine the bus currents.
Power routing of cell-based ASICs may include the option to include vertical m2 straps at
specified intervals.
The power router forms an interdigitated comb structure, minimizing the number of times a VDD
or VSS power bus needs to change layers.
This is achieved by routing with a routing bias on preferred layers.
For example, VDD may be routed with a left-and-down bias on m1, with VSS routed using a
right-and-up bias on m2.
In a three-level metal process, power routing is similar to two-level metal ASICs.
Circuit Extraction and DRC
The parasitic capacitance and resistance associated with each interconnect, via,
and contact can be calculated by a circuit-extraction tool.
A design-rule check (DRC) ensures that nothing has gone wrong in the
process of assembling the logic cells and routing.
DRC may be performed at two levels.
Since the detailed router normally works with logic-cell phantoms, the first
level of DRC is a phantom-level DRC, which checks for shorts, spacing
violations, or other design-rule problems between logic cells.
This is principally a check of the detailed router.
If we have access to the real library-cell layouts (sometimes called hard
layout), we can instantiate the phantom cells and perform a second-level
DRC at the transistor level.
This is principally a check of the correctness of the library cells.
Stuck at fault model
The single stuck-at fault (SSF) model assumes that there is just one
fault in the logic we are testing.
A multiple stuck-at fault model that could handle several faults in
the logic at the same time is too complicated to implement.
In the SSF model we further assume that the effect of the physical
fault (whatever it may be) is to create only two kinds of logical
fault.
The two types of logical faults or stuck-at faults are:
a stuck-at-1 fault (abbreviated to SA1 or s@1)
a stuck-at-0 fault (SA0 or s@0).
Stuck-at faults attached to different points in a circuit may
produce identical fault effects; these are equivalent faults
(or indistinguishable faults).
Using fault collapsing we can group these equivalent faults into a
fault-equivalence class.
To save time we need only consider one fault, called the prime
fault or representative fault, from a fault-equivalence class.
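Fault equivalence can be seen on a single two-input AND gate by simulating every stuck-at fault over all input vectors and grouping faults that produce the same output response. This is a brute-force sketch for illustration only; the node names A, B, Y and the function names are assumptions, and practical fault collapsing uses structural rules rather than exhaustive simulation.

```python
from itertools import product


def and_gate(a, b, fault=None):
    """Two-input AND gate Y = A AND B with an optional stuck-at fault.

    fault: None for the fault-free gate, or (node, stuck_value) where
           node is "A", "B", or "Y" and stuck_value is 0 or 1.
    """
    def apply(node, value):
        return fault[1] if fault and fault[0] == node else value

    a = apply("A", a)
    b = apply("B", b)
    return apply("Y", a & b)


def fault_classes():
    """Group the six single stuck-at faults by their output response."""
    vectors = list(product((0, 1), repeat=2))
    faults = [(node, value) for node in ("A", "B", "Y") for value in (0, 1)]
    classes = {}
    for fault in faults:
        response = tuple(and_gate(a, b, fault) for a, b in vectors)
        classes.setdefault(response, []).append(fault)
    return list(classes.values())


if __name__ == "__main__":
    for members in fault_classes():
        print(members)
```

For the AND gate, a stuck-at-0 on either input forces the output low for every vector, just as a stuck-at-0 on the output does, so those three faults fall into one class and only one of them needs to be simulated.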
Nondeterministic Fault Simulation
Serial, parallel, and concurrent fault-simulation algorithms are
forms of deterministic fault simulation.
In nondeterministic fault simulation we give up trying to simulate every possible fault and instead,
using probabilistic fault simulation, we simulate a subset or
sample of the faults and extrapolate fault coverage from the
sample.
In statistical fault simulation we perform a fault-free
simulation and use the results to predict fault coverage. This is
done by computing measures of observability and
controllability at every node.
ATPG algorithm
detect a fault by first activating (or exciting) the fault.
To do this we must drive the faulty node to the opposite value of the fault.
work backward from the fault origin to the PIs (primary inputs) by recursively
justifying signals at the output of logic cells.
then work forward from the fault origin to a PO (primary output), setting inputs to
gates on a sensitized path to their enabling values.
We propagate the fault until the D-frontier reaches a PO
We then work backward from the PO to the PIs recursively justifying outputs to
generate the sensitized path
The PODEM Algorithm
1. Pick an objective to set a node to a value. Start with the fault origin as an objective
and all other nodes set to 'X'.
2. Backtrace to a PI and set it to a value that will help meet the objective.
3. Simulate the network to calculate the effect of fixing the value of the PI (this step is
called implication). If there is no possibility of sensitizing a path to a PO, then retry
by reversing the value of the PI that was set in step 2 and simulate again.
4. Update the D-frontier and return to step 1. Stop if the D-frontier reaches a PO.
Controllability and Observability
In order for an ATPG system to provide a test for a fault on a node it must be possible to
both control and observe the behavior of the node
Combinational controllability is defined separately from sequential controllability.
We also separate zero-controllability and one-controllability.
For example, the combinational zero-controllability for a two-input AND gate, $Y = \mathrm{AND}(X_1, X_2)$,
is recursively defined in terms of the input controllability values as follows:
$CC0(Y) = \min\{CC0(X_1), CC0(X_2)\} + 1$
We define the combinational one-controllability for a two-input AND gate as
$CC1(Y) = CC1(X_1) + CC1(X_2) + 1$
We define observability in terms of the controllability measures. The combinational
observability, $OC(X_1)$, of input $X_1$ of a two-input AND gate can be expressed in terms of
the controllability of the other input, $CC1(X_2)$, and the combinational observability of the
output, $OC(Y)$:
$OC(X_1) = CC1(X_2) + OC(Y) + 1$
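The three AND-gate formulas can be evaluated numerically with a few lines of Python. The starting values used in the demo (controllability 1 at primary inputs, observability 0 at primary outputs) are a common convention and are an assumption here, as are the function names.

```python
def and2_controllability(cc0_x1, cc1_x1, cc0_x2, cc1_x2):
    """Combinational controllability of Y = AND(X1, X2).

    Returns (CC0(Y), CC1(Y)).
    """
    cc0_y = min(cc0_x1, cc0_x2) + 1   # one input at 0 is enough to force Y = 0
    cc1_y = cc1_x1 + cc1_x2 + 1       # both inputs must be 1 to force Y = 1
    return cc0_y, cc1_y


def and2_observability(cc1_other_input, oc_y):
    """Combinational observability of one input of a two-input AND gate:
    the other input must be set to 1 and the output must be observable."""
    return cc1_other_input + oc_y + 1


if __name__ == "__main__":
    # Assumed convention: primary inputs have CC0 = CC1 = 1, primary outputs OC = 0.
    cc0_y, cc1_y = and2_controllability(1, 1, 1, 1)
    print("CC0(Y) =", cc0_y, "CC1(Y) =", cc1_y)
    print("OC(X1) =", and2_observability(1, 0))
```

Larger numbers mean a node is harder to control or observe, so the measures grow as signals pass through more gates.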