ASICs Kumar
Content
Chapter 1 Introduction
Chapter 2 FPGAs
1) Introduction to FPGAs
Chapter 3 ASICs
6) Comparisons of Technologies
7) Requirement
9) Properties
i) Extremely Fine-grained
12) SA Advantages
13) SA Disadvantages
i) Method 1
i) Method 2
i) Method 3
Conclusion
References
Structured ASICs
ABSTRACT
There is currently a huge gap between the two main technologies used to
implement custom digital integrated circuit (IC) designs. At one end of the
spectrum are Field Programmable Gate Arrays (FPGAs). These devices have
relatively low design costs and short design times, but they also have high
per-unit costs and are limited in terms of design size, complexity, and
performance. At the other end of the design continuum are Application Specific
Integrated Circuits (ASICs). These
components have exceedingly high design costs and take a long time to develop,
but they can support extremely large, complex, and high performance designs, and
they have low per-unit costs in large production runs.
CHAPTER 1
I. Introduction:
These full custom chip designs were the sole purview of the semiconductor
manufacturers who targeted only the largest markets, frequently providing
customers with multiple sources for most products. Since the 1980s, product
offerings of the typical semiconductor manufacturer have undergone dramatic
changes. Memories, microprocessors, and other traditional high-volume standard
products that are still available are now referred to as “commodity products”. In
addition, current standard product portfolios include a significant proportion of so-
called Application Specific Standard Products (ASSPs). These ASSPs differ from
commodity products through the incorporation of value-added features for specific
market segments (e.g., complete SCSI controllers targeted for workstations).
Furthermore, ASSPs from different vendors may have similar functionality but
they are seldom pin-for-pin compatible, resulting in intensive “design-in
competition” among vendors offering comparable solutions to the same problem.
Figure 1 illustrates this hierarchy in the present Integrated Circuit market. The
introduction of the ASSP designation is a natural result of semiconductor
manufacturers’ use of the ASIC design methodology for standard products.
However, this points to a problem of accuracy with the current use of the term
ASIC. ASIC chips are generally not just application specific; they are customer
specific. It can be argued that customer programmable logic devices are the only
true ASICs. To compound this problem, the term is used differently by different
people in different contexts. ASIC is defined variously as gate array, any custom,
semi-custom, or programmable technology, and as a design methodology.
An even grayer area is created by the previously discussed ASSPs. Although
they are built using traditional ASIC technology, they are sold like standard parts.
Nonetheless, the term ASIC is still quite useful. In addition to describing gate
array, standard cell, and PLD chip designs, ASIC describes a methodology (or
group of methodologies) for designing electronic systems, and it describes a
technology (or group of technologies) used to build electronic systems. Most of
today’s flexible and cost-effective electronic systems are designed using ASIC
methodologies and built using ASIC technology.
CHAPTER 2
FPGAs:
1) Introduction to FPGAs:
facility, both time to market and financial risk can be substantially reduced. The
two major classes of field programmable logic, Programmable Logic Devices
(PLDs) and Field Programmable Gate Arrays (FPGAs), have emerged as
cost-effective ASIC solutions because they provide low-cost prototypes with nearly
instant “manufacturing”. This class of device consists of an array of uncommitted
logic elements whose interconnect structure and/or logic structure can be
personalized on-site according to the user’s specification. Section III of this
chapter will discuss the design alternatives for this class of ASIC in more detail.
Field programmable logic devices are expected to grow from the 1993 worldwide
sales level of 11% to at least 17% by 1998.
Although early PLD and FPGA designs were generated largely by hand, access to
today’s complex programmable logic devices requires the use of an integrated
Computer-Aided Design (CAD) system. Figure (A) illustrates the typical sequence
of operations needed to go from concept to programmed chip. Both commercial
CAD tool vendors and FPGA companies offer appropriate tools. For example,
traditional Electronic Design Automation (EDA) vendors such as Cadence, Mentor
Graphics, Synopsys, and ViewLogic all offer tools to support FPGA design. These
tools are typically used for the front-end design entry and simulation operations
and provide the necessary interfaces to vendor-specific back-end tools for chip
placement and routing. Examples of vendor specific tools are the Xilinx XACT
system and the Altera MAX+PLUS II software. It is worth noting that Altera’s
MAX+PLUS II software supports the entire design flow illustrated in Figure (A)
on either PC or workstation platforms. A detailed discussion of available FPGA
CAD tools is outside the scope of this chapter. Rather, the following discussion is
meant to be indicative of the general operations and steps required in FPGA
design. Where appropriate, examples are taken from the Xilinx and Altera CAD
design flows to illustrate the generic operations.
The starting point in any logic or digital system design is a set of architectural or
behavioral specifications. Traditionally, a designer uses schematic capture tools for
graphical entry of a logic design which has been manually generated to meet the
architectural or behavioral specifications. The upper left hand arrow in Figure (A)
identifies some of the commercial CAD tools available for FPGA schematic
capture. One of the more significant recent innovations in the EDA industry is the
development of tools which allow the designer to move from the gate level to the
behavioral level for design entry. A behavioral design specification is created using
a Hardware Description Language (HDL), and then a synthesis tool automatically
compiles the gate level schematic or net list from the behavioral description. The
upper right hand arrow in Figure (A) indicates some of the HDLs currently being
used for FPGA behavioral modeling. Section VII will present a more detailed
discussion of behavioral modeling and logic synthesis.
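The difference between behavioral and gate-level design entry can be sketched in a few lines. The example below is only an illustration in Python, not HDL code and not the output of any synthesis tool: the function names are hypothetical, and the one-bit full adder is an assumed example circuit. The first function says what the block does; the second spells out a gate network of the kind a synthesis tool might produce; the third checks that the two agree for every input.

```python
from itertools import product


def full_adder_behavioral(a: int, b: int, cin: int) -> tuple[int, int]:
    """Behavioral view: describe the function (add three bits), not the gates."""
    total = a + b + cin
    return total & 1, total >> 1  # (sum bit, carry-out bit)


def full_adder_gate_level(a: int, b: int, cin: int) -> tuple[int, int]:
    """Gate-level view: an explicit network of XOR, AND, and OR gates."""
    p = a ^ b                     # first XOR gate
    s = p ^ cin                   # second XOR gate produces the sum
    cout = (a & b) | (p & cin)    # two AND gates feeding one OR gate
    return s, cout


def descriptions_match() -> bool:
    """Compare both descriptions exhaustively over all 8 input patterns."""
    return all(
        full_adder_behavioral(a, b, c) == full_adder_gate_level(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )


print(descriptions_match())
```

In a real flow the behavioral description would be written in an HDL and the gate-level net list would be generated automatically; the exhaustive comparison here stands in for the simulation step.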
CHAPTER 3
ASICs:
In the case of ASICs – of which the currently dominant form is that of standard
cell (SC) devices – these are extremely expensive and time-consuming to develop.
As IC implementation technologies move into the ultra-deep submicron (UDSM)
realm (specifically the 90 nanometer node and below), power, timing, and signal
integrity issues become ever more complex. Reaching closure on these issues takes
so much effort that the design team now spends more time addressing these aspects
of the design than they spend architecting, capturing, and verifying the logical
functionality of the device. In addition to protracted development times, the
photomasks associated with a new ASIC are becoming prohibitively expensive (on
the order of $1 million for a reasonably complex 90 nanometer device).
Furthermore, the manufacturing turnaround time to actually fabricate these devices
significantly impacts their time-to-market. The long development and
manufacturing times associated with standard cell ASICs pose particular problems
with regard to today’s short product life cycles and the need to address constantly
evolving standards and protocols. However, these devices do have the advantages
that they can be used to implement the largest, most complex, high-performance
designs. They also have a low per-unit cost when used in large production runs on
the order of 50,000 units or more.
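The trade-off between up-front cost and per-unit cost can be made concrete with a simple break-even calculation. The sketch below assumes a linear cost model (total cost = NRE + unit cost x volume). Apart from the roughly $1 million mask cost quoted above, every number in it is a hypothetical placeholder chosen for illustration, not vendor data.

```python
import math


def total_cost(nre: int, unit_cost: int, volume: int) -> int:
    """Linear cost model: one-time NRE plus a fixed cost per unit."""
    return nre + unit_cost * volume


def break_even_volume(nre_low: int, unit_high: int,
                      nre_high: int, unit_low: int) -> int:
    """Smallest volume at which the high-NRE, low-unit-cost option
    is no more expensive than the low-NRE, high-unit-cost option."""
    if unit_high <= unit_low:
        raise ValueError("the high-NRE option must have the lower unit cost")
    return max(0, math.ceil((nre_high - nre_low) / (unit_high - unit_low)))


# Hypothetical figures: the ASIC NRE includes the ~$1M mask set from the
# text plus an assumed $0.5M of engineering; unit costs are assumptions.
FPGA_NRE, FPGA_UNIT = 50_000, 40
ASIC_NRE, ASIC_UNIT = 1_500_000, 10

print(break_even_volume(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT))
```

With these assumed figures the crossover falls close to the 50,000-unit production runs mentioned above; different unit costs would move it considerably.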
In the classic full custom design style, each primitive logic function or transistor is
manually designed and optimized. This results in the most compact chip design
with the highest possible speed and lowest power dissipation. However, the initial
investment or Non-Recurring Engineering (NRE) cost is highest compared to all
other design styles. The designer must manipulate the individual geometric shapes
which represent the features of each transistor on the chip; hence the often
applied term for full custom design: “polygon pushing”. A
relatively simple 3000 gate design might require the handling of 300,000
rectangles per chip. Although this design style was used exclusively in early ICs,
engineers rarely use it for today’s ASICs due to the high engineering costs and low
designer productivity. Productivity for full custom logic designs is typically only 6
to 17 transistors per day.2 The exception is in high volume commodity products
such as memories which must be hand-crafted to meet density and performance
requirements. In addition, at least portions of high-end products such as
microprocessors are full custom designed for performance reasons. Worldwide
sales of full custom ASIC designs are predicted to grow only slightly from the
current level of $2.7 billion to $2.9 billion in 1998 (a declining market share from
23% to 16%).
ii) Standard Cell Design:
In the standard cell design methodology, pre-defined logic and function blocks are
made available to the designer in a cell library. Typical libraries begin with gate
level primitives such as AND, OR, NAND, NOR, XOR, Inverters, flip-flops,
registers, and the like. Libraries generally include more complex functions such as
adders, multiplexers, decoders, ALUs, shifters, and memory (RAM, ROM, FIFOs,
etc.). In some cases, the standard cell library may include complex functions such
as multipliers, dividers, microcontrollers, microprocessors, and microprocessor
support functions (parallel port, serial port, DMA controller, event timers, real-
time clock, etc.). Standard cell designs are created using schematic capture tools or
via synthesis from a Hardware Description Language (HDL). Section VII of this
chapter will discuss behavioral modeling and synthesis options in more detail.
Automated tools are then used to place the cells on a chip image and wire them
together. Standard cell layouts are easily identified by rows of equal height cells
separated by wiring channels. Large macro-cells such as multipliers or
microcontrollers may span multiple cell rows and block some of the wiring
channels. Standard cell designs operate at lower clock rates and are generally less
area efficient than a full custom design due to the fixed cell size constraints and
requirements for dedicated wiring channels. However, very high layout density is
achieved within the cells themselves, resulting in densities which can approach that
of full custom designs with substantially shorter design times. In the field of
CMOS ASICs, standard cells are the fastest growing market segment with
worldwide sales of $2.9 billion in 1993. This 26% market share is expected to
grow to $6.2 billion or 34% of the total ASIC market by 1998. The growing market
success of standard cell design is largely due to the increasing availability of
“mega-cell” and “core” functions in their libraries. These simplify ASIC design by
providing entire sub-system or chip level functional blocks (e.g. a RISC processor
core, a DMA or memory controller, or a complete I/O subsystem) from which the
designer can compose a complex ASIC.
corresponding to a predicted dollar sales growth from $3.5 billion to $5.1 billion.
Table 1 summarizes the current ASIC worldwide market and indicates the
forecast trends. Although field programmable logic represents only a small
percentage of total ASIC market sales, statistics indicate that approximately one
half of all chip design projects today are begun using FPGAs.
Table 1. ASIC Market Forecast (predicted worldwide sales in
millions of dollars)
estimates the differences between the various design style alternatives for a
particular ASIC, using relative numbers which are normalized to the full custom
design style.
typically lump ASIC designs into two groups, CMOS and “other” technologies,
this section attempts to take a broader view of the technology alternatives. Bipolar,
BiCMOS, and GaAs ASICs each have unique advantages for many high
performance applications. Figure 3 presents a taxonomy of available
semiconductor process technologies for ASICs. At the topmost level, the tree splits
into silicon and gallium arsenide (GaAs) technologies. GaAs has been slowly
expanding from its historical markets in the military and aerospace fields, and may
be ready to expand into the mainstream digital IC market. It has inherent
performance advantages over silicon, mainly due to its higher carrier mobility.
Electrons travel four to five times faster through GaAs than bulk silicon, which
means that GaAs logic will operate at much higher clock frequencies. Furthermore,
lower electric fields are necessary to achieve the maximum mobility when
compared to CMOS. Thus, as system operating voltages are reduced, this
performance advantage becomes even greater. Although significant progress has
been made to solve the historical materials and processing problems of GaAs, it
still has several fundamental disadvantages when compared to silicon technology.
For example, unlike silicon there is no native oxide to act as an insulator to
produce simple MOS style logic elements. Also, holes in GaAs move more slowly
than in silicon which makes complementary (CMOS) style circuit operation
inefficient.
As indicated on the right hand side of Figure 3, GaAs has many circuit topologies
and device types. The most dominant commercially available GaAs technologies
are Direct Coupled FET Logic (DCFL) and Source-Coupled FET Logic (SCFL).
DCFL is similar in design to NMOS and, because its circuits have a low transistor
count, it supports higher gate-count chips. Its speed is comparable to bipolar Emitter
Coupled Logic (ECL) with a 60% reduction in power dissipation.
SCFL has significantly higher speed than DCFL, with correspondingly higher
power dissipation. Other common GaAs technologies include Buffered FET Logic
(BFL) and Bipolar Integrated Schottky Logic (BSL). Looking at the left side of
Figure 3, it can be seen that silicon technologies split into two main categories:
bipolar and unipolar. This distinction is made because in bipolar devices both
majority and minority carriers participate in transistor operation. Bipolar processes
were dominant during the 1960s and 1970s, and offer the potential for very high
speed operation using Bipolar Junction Transistors (BJTs). However, the power
dissipation in bipolar circuits is quite high, and the device density is not as great as
in MOS designs. The still popular Transistor-Transistor Logic (TTL) family,
as well as Emitter Coupled Logic (ECL) and Integrated Injection Logic (IIL)
families fall in this category. An important point about TTL logic is that its
input/output voltage levels still constitute a de facto standard which is followed by
other logic families. The most important class of unipolar devices for ASICs is
the class of Metal-Oxide Semiconductor (MOS) devices used in the PMOS, NMOS, and
CMOS processes. While other unipolar technologies exist, such as the
Metal-Nitride-Oxide-Semiconductor (MNOS) process used in nonvolatile memories, they
do not represent a significant part of the ASIC market. Although universally used
today, the acronym MOS is an outdated term. Metal refers to the gate layer, Oxide
refers to the silicon dioxide insulator, and Semiconductor to the channel being
controlled by the gate. MOS processes today make almost exclusive use of
polysilicon rather than metal for the gate material. MOS Field Effect Transistors
(MOSFETs) are available in two basic flavors: P-channel MOSFETs and
N-channel MOSFETs. The term PMOS refers to an MOS process which exclusively
uses P-channel MOSFET transistors, and similarly NMOS refers to an MOS
process which exclusively uses N-channel transistors. Although used extensively
in early MOS designs, PMOS technology is not used today because of the poor
electrical characteristics of P-channel transistors. This is because the mobility of
holes (the majority carrier in PMOS) is considerably poorer than the mobility of
electrons (the majority carrier in NMOS). NMOS design offers excellent density
and reasonable performance, but it is seldom used today because of the difficulties
in designing ratioed NMOS logic and because it dissipates static power. However,
some designs such as Dynamic Random Access Memories (DRAMs) utilize
NMOS style circuits for the bulk of the chip array while providing CMOS support
logic and I/O circuits. The term CMOS (Complementary MOS) refers to an MOS
process that simultaneously provides both P- and N-channel transistors. Although
CMOS logic circuits have increased fabrication costs and increased area when
compared to PMOS or NMOS, their ease of design, potential for very low static
power dissipation, and high speed operation make CMOS the technology of choice
in the 1990s. Finally, BiCMOS is a relatively recent technology introduction which
incorporates both Bipolar and CMOS devices on the same chip. Typically, most of
the logic in a BiCMOS ASIC is CMOS, while the bipolar devices are used for on-
chip and off-chip drivers. The advantage of the bipolar drivers is that they are
capable of driving much higher loads without sacrificing speed. Compared to
CMOS, BiCMOS is significantly faster, but chip cost can be two or three times
higher.
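The low static power of CMOS noted above follows from the complementary structure of its gates. The sketch below is a simplified switch-level model of a two-input NAND gate, written only for illustration: transistors are treated as ideal switches, leakage is ignored, and all function names are hypothetical. It shows that the CMOS pull-up and pull-down networks never conduct at the same time, whereas a ratioed NMOS gate (whose load device is assumed to conduct at all times) draws static current whenever its output is low.

```python
from itertools import product


def nmos_pull_down_conducts(a: int, b: int) -> bool:
    """Two N-channel transistors in series: a path to ground exists
    only when both inputs are 1."""
    return a == 1 and b == 1


def pmos_pull_up_conducts(a: int, b: int) -> bool:
    """Two P-channel transistors in parallel: a path to the supply
    exists when either input is 0."""
    return a == 0 or b == 0


def cmos_nand(a: int, b: int) -> int:
    """Output is high exactly when the pull-up network conducts."""
    return 1 if pmos_pull_up_conducts(a, b) else 0


def cmos_static_path(a: int, b: int) -> bool:
    """True if supply and ground would be connected through both networks."""
    return pmos_pull_up_conducts(a, b) and nmos_pull_down_conducts(a, b)


def ratioed_nmos_static_path(a: int, b: int) -> bool:
    """The always-on load means current flows whenever the pull-down conducts."""
    return nmos_pull_down_conducts(a, b)


for a, b in product((0, 1), repeat=2):
    print(a, b, cmos_nand(a, b), cmos_static_path(a, b),
          ratioed_nmos_static_path(a, b))
```

The price of this complementary arrangement is the second network of transistors, which is the increased area mentioned in the comparison with PMOS and NMOS.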
6) Comparison of Technologies:
It is a truism in the electronics industry that each new generation of systems
must be of smaller size, lower weight, lower cost, higher speed, and higher
reliability than the previous one.
crossover point moves as process technology scales; and the CMOS crossover
point can be expected to move to higher clock frequencies for deep sub-micron
CMOS processing.
CHAPTER 4
STRUCTURED ASICs:
7) Requirement:
All of the above points serve to illustrate that there is a huge gap between the two
main technologies currently used to implement custom digital IC designs.
The rising cost of developing standard cell ASICs means that many companies can
no longer afford to use these devices. At the same time, FPGAs aren’t appropriate
for many of these designs due to capacity and performance issues and/or high per-
unit costs. What is required is a new implementation technology that overcomes
the design size, complexity, performance, and power consumption limitations of
FPGAs, but which also addresses the long development times, high development
costs, and long manufacturing lead times associated with standard cell ASICs. In
addition, this new technology should offer a reasonably low per-unit cost, thereby
making these components suitable for medium-size production runs. The solution
may well be a new class of devices known as structured ASICs (SAs). This paper
introduces the concept of structured ASICs along with some comparisons between
standard cell, structured ASIC, and FPGA implementations. Also provided is an
overview of some of the alternative structured ASIC architectures that are currently
being made available to the market. Finally, the paper discusses the challenges
these devices present to vendors of electronic design automation (EDA) tools.
An array (sea) of these tiles is then prefabricated across the face of the chip.
Structured ASICs also typically contain additional prefabricated elements, which
may include configurable general-purpose I/O, microprocessor cores, gigabit
transceivers, embedded (block) RAM, and so forth.
In many respects these devices are similar to modern, high-end gate array ASICs.
The key differentiator with regard to Structured ASICs is that the majority of the
metallization layers are also prefabricated. This means that the transistors forming
the core logical functions comprising each tile (gates, multiplexers, etc.) are already
wired together. Also, much of the local and global interconnect has also been
9) Properties:
Low NRE cost
– Implementation engineering effort
– Mask tooling charges
High performance
Low power consumption
Less complex
– Fewer layers to fabricate
Short time to market
– Pre-made cell blocks available for placing.
This is a somewhat “gray” area, because the majority of vendors with structured
ASIC offerings are still working in “stealth mode,” which means that detailed
descriptions of their internal architectures are not readily available. Thus, the
following architectural descriptions are “composites” that have been gleaned from
a variety of sources. In addition to its own unique version of a basic tile, each
vendor offers its own selection of hard, firm, and soft IP. Hard IP comes in the
form of configurable I/O blocks that can be modified (via the user-definable
metallization layers) to handle a variety of standard I/O interfaces. Other hard IP
blocks include standard interfaces like PCI, gigabit transceivers, microprocessor
cores, embedded RAM, and so forth. Each vendor may offer a family of devices
containing different combinations of hard IP blocks combined with various
quantities of basic tiles. Firm IP comes in the form of a library of high-level
functions that have been optimally mapped, placed, and routed for the vendor’s
particular architecture, while soft IP is presented as a source-level library of high-
level functions that can be included into the users’ designs. In many cases the hard,
firm, and soft IP from the various vendors are simply variations on a theme. The
real differentiator between devices comes in the contents and architecture of the
basic tile.
i) Extremely fine-grained:
Some vendors are evaluating an extremely fine-grained version of a basic tile that
comprises only unconnected components such as transistors and resistors.
These architectures are extremely close to those of modern high-end gate array
devices. The difference is that, in the case of the structured ASIC,
metallization has been added so as to almost connect these components in a variety
of pre-defined configurations. Thus, the user-definable metallization layers are used
to complete the appropriate connections, and to link the tiles into the local and
global routing architecture.
ii) Medium-grained tiles:
Other vendors have opted for a medium-grained architecture. In this
case, the tile might contain some generic logic in the form of gates and/or
multiplexers along with one or more flip-flops.
In both of these cases the polarity of the flip-flops’ clock inputs (i.e., whether each
register should be positive- or negative-edge-triggered) and the polarity of their set
and reset inputs can be determined by the customized metallization layers.
(An example “base tile”)
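The idea of fixing a register's clock polarity at customization time can be sketched as a small behavioral model. The class below is a hypothetical illustration, not any vendor's tile: the constructor argument stands in for the choice made in the customized metallization layers, and set and reset inputs are omitted for brevity.

```python
class ConfigurableFlipFlop:
    """D flip-flop whose active clock edge is fixed when it is created,
    standing in for a choice made in the custom metal layers."""

    def __init__(self, rising_edge: bool = True) -> None:
        self.rising_edge = rising_edge
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk: int, d: int) -> int:
        """Apply new clock and data values and return the stored bit."""
        rising = self._prev_clk == 0 and clk == 1
        falling = self._prev_clk == 1 and clk == 0
        if (rising and self.rising_edge) or (falling and not self.rising_edge):
            self.q = d  # capture data only on the configured edge
        self._prev_clk = clk
        return self.q


pos = ConfigurableFlipFlop(rising_edge=True)
neg = ConfigurableFlipFlop(rising_edge=False)
print(pos.tick(1, 1), neg.tick(1, 1))  # clock goes 0 -> 1 with d = 1
print(pos.tick(0, 0), neg.tick(0, 1))  # clock goes 1 -> 0
```

Both registers are built from the same class, just as both polarities would come from the same prefabricated tile; only the configuration differs.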
containing registers, memory elements, and other logic to form a master tile, then
an array (sea) of these master tiles is prefabricated across the face of the chip.
One consideration with regard to the granularity of the architecture is that fine-
grained implementations require a lot of connections into and out of each tile
compared to the amount of functionality that can be supported by the tile. By
comparison, as the granularity of the tile increases to medium-grained and higher,
the number of connections into the tile compared to the functionality it can support
decreases.
One final twist that can potentially be applied to all of the above architectures is
that some devices require the customization of a number of metallization layers
(this might be two tracking layers, or it could be two tracking layers and one or
more via layers). By comparison, at least one vendor is fielding an architecture that
requires the customization of only a single via layer. In addition to cutting photo-
mask and production costs to a minimum (and further reducing back-end
production times), this scheme means that the prefabricated track segments are
extremely well characterized in terms of parasitics, delays, and signal integrity
issues. The disadvantage is that you have less flexibility with regard to routing, but
this may be mitigated to some extent by the granularity of the architecture as
discussed in the previous point.
vi) SA versus SC versus FPGA:
Once again, our ability to provide definitive comparisons between structured
ASICs and other technologies is somewhat limited due to the lack of hard data
supplied by the device vendors. One important metric is the density of usable
equivalent gates per square millimeter (mm²). This can be confusing even when
it comes to comparing standard cell devices to FPGAs, because the former use
the concept of an equivalent gate (typically a 2-input NAND), while the latter
often base things on the concept of a “system gate.” The problem is that the
mapping of FPGA system gates to ASIC equivalent gates is design-dependent
and is a function of the mix between combinatorial and sequential logic.
Keeping this in mind, it is generally accepted that standard cell architectures
can support an equivalent gate density of approximately 100,000 gates/mm²,
while FPGAs can only offer around 1,000 gates/mm², which is a factor of
100:1. By comparison, some structured ASIC architectures are rumored to
support around 33,000 gates/mm², which is a factor of 3:1 compared to
standard cells. That is, a structured ASIC can support 0.33x the number of gates
as a standard cell device and 33x the number of gates in an FPGA component in
the same area. With regard to performance, if the same design is implemented
in standard cell and FPGA devices, it is typically the case that the FPGA can
only achieve 10% to 20% of the performance of the standard cell
implementation (in terms of clock frequency). By comparison, early results on
structured ASICs suggest that these devices can achieve 70% to 80% of the
performance of a standard cell implementation. In the case of power, FPGAs
typically dissipate 10x to 15x that of an equivalent standard cell
implementation. Once again, early results on structured ASICs suggest that
these devices consume only 2x to 3x the power of their standard cell
counterparts. Some additional metrics that are being quoted (although not
referenced) are that the development costs of a structured ASIC design are only
25% of those of a standard cell equivalent. Furthermore, the production unit price
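The density figures quoted in this comparison can be turned into a quick area estimate. The sketch below uses only the approximate gates-per-mm² values given in the text; the function names and the one-million-gate design size are assumptions made for illustration, and the results are no more reliable than the quoted (and partly rumored) densities.

```python
# Approximate usable equivalent-gate densities quoted in the text (gates/mm^2).
GATES_PER_MM2 = {
    "standard_cell": 100_000,
    "structured_asic": 33_000,
    "fpga": 1_000,
}


def density_ratio(tech_a: str, tech_b: str) -> float:
    """How many times more gates tech_a fits in the same area as tech_b."""
    return GATES_PER_MM2[tech_a] / GATES_PER_MM2[tech_b]


def area_mm2(gates: int, technology: str) -> float:
    """Core area needed for a design of the given equivalent-gate count."""
    return gates / GATES_PER_MM2[technology]


print(density_ratio("structured_asic", "standard_cell"))
print(density_ratio("structured_asic", "fpga"))

# Hypothetical one-million-gate design, chosen only as an example size.
for tech in GATES_PER_MM2:
    print(tech, round(area_mm2(1_000_000, tech), 1), "mm^2")
```

The same pattern extends to the performance and power ranges above, although those are quoted as ranges rather than single values.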
One by-product of the structured ASIC philosophy is that these devices are much
easier and faster to design than are their standard cell cousins. There are a variety
of reasons for this, such as the fact that multiple global and local clock domains are
typically prefabricated in the master fabric and are implemented in such a way that
there are no skew problems that need be addressed by the design engineers.
Similarly, design-for-test considerations are addressed by the fact that functions
such as boundary scan (JTAG), full internal scan, and BIST are all typically
embedded in the basic fabric. In order to mitigate UDSM timing and signal
integrity effects, the ASIC vendor works to pessimistic, highly guard-banded
specifications. This allows signal integrity issues and timing issues (in the form of
setup and hold violation times associated with internal registers) to be
automatically addressed by the architecture or the design tools. Due to the fact that
structured ASICs need only a limited number of metallization layers to complete
them, the costs associated with generating the photo-masks are dramatically
reduced. Furthermore, the fact that the device is largely prefabricated radically
shrinks the turnaround time to working silicon. This also means that structured
ASICs can undergo faster and cheaper modification cycles in order to
accommodate evolving standards and protocols. Overall, the capacity,
performance, and power consumption of a structured ASIC are much closer to those
of a standard cell realization of the design than to those of an FPGA implementation.
Additionally, the faster design time, lower mask costs, and quicker turnaround to
final silicon – along with the lower costs resulting from the fact that the majority of
the device is pre-fabricated – mean that the per-unit cost of structured ASICs is
extremely reasonable for medium-low to medium-high production runs.
13) SA Disadvantages:
One problem with structured ASICs is that the available design tools – which are
predominantly based on traditional ASIC offerings – are both expensive
and not well-suited to the task. Another is that the diverse architectures fielded by
the various vendors are so new that they have not yet been subject to any form of
formal evaluation and comparative analysis (unlike alternative FPGA architectures
– such as the tradeoffs between 3-, 4- , and 5-input LUTs – which have undergone
extensive research by the industry and academia).
CHAPTER 5
Prefabricated
– Standard Cells, Flip-Flops, DSP, Memory and other IP (Intellectual
Properties)
– Interconnects for modules, DFT circuit, and clocks
2 main clocks:
Accessible from anywhere.
8 local clocks:
Chip divided into 4 regions
4 local clocks can be assigned to each region
Region divided into 4 sub regions.
Each sub region assigned 2 local clocks
Some designs need more clock signals than this. To add more clock signals, use a
custom layer to implement an additional clock signal. Custom layer resources are
limited, so this may not be feasible. Try to avoid this as much as possible.
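The local clock budget described above can be checked mechanically. The sketch below is a hypothetical checker, not part of any vendor tool: the limits are taken from the list above, while the data layout (a mapping from region and sub-region to the set of local clocks used there) and the rule that a region's budget is counted over the union of its sub-regions are assumptions. The two main clocks are left out because they are accessible from anywhere.

```python
from collections import defaultdict

MAX_LOCAL_CLOCKS_CHIP = 8        # 8 local clocks on the chip
MAX_LOCAL_CLOCKS_REGION = 4      # 4 local clocks per region
MAX_LOCAL_CLOCKS_SUBREGION = 2   # 2 local clocks per sub-region


def check_clock_plan(plan: dict[tuple[int, int], set[str]]) -> list[str]:
    """Return a list of violations for a plan that maps
    (region, sub_region) to the local clock names used there."""
    violations = []
    per_region = defaultdict(set)
    all_clocks = set()
    for (region, sub), clocks in sorted(plan.items()):
        if len(clocks) > MAX_LOCAL_CLOCKS_SUBREGION:
            violations.append(
                f"sub-region {region}.{sub} uses {len(clocks)} local clocks")
        per_region[region] |= clocks
        all_clocks |= clocks
    for region, clocks in sorted(per_region.items()):
        if len(clocks) > MAX_LOCAL_CLOCKS_REGION:
            violations.append(
                f"region {region} uses {len(clocks)} local clocks")
    if len(all_clocks) > MAX_LOCAL_CLOCKS_CHIP:
        violations.append(f"chip uses {len(all_clocks)} local clocks")
    return violations


plan = {(0, 0): {"lclk0", "lclk1"}, (0, 1): {"lclk1", "lclk2", "lclk3"}}
print(check_clock_plan(plan))
```

A non-empty result marks the places where a design would otherwise have to fall back on a custom-layer clock.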
i) Method 1:
Perform placement
i) Method 2
Problems
i) Method 3:
In structured ASICs:
Partitioning
– Create clusters of cells and FFs based on clock, delay, and other
constraints
Floorplanning
– Assigning the clusters to each region
Incremental Floorplanning
– Move violating FFs to other regions
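The three steps above can be sketched as a toy flow. Everything in the example is a simplification made for illustration: clusters are formed by clock only (delay and other constraints are ignored), the preferred region of each cluster is a hypothetical input, whole clusters are moved rather than individual flip-flops, and the limit of four local clocks per region is the only constraint checked. It is not the method of any particular vendor or paper.

```python
from collections import defaultdict

NUM_REGIONS = 4
MAX_CLOCKS_PER_REGION = 4


def partition(flip_flops: dict[str, str]) -> dict[str, list[str]]:
    """Partitioning: cluster flip-flops (name -> clock) by their clock."""
    clusters = defaultdict(list)
    for name, clock in sorted(flip_flops.items()):
        clusters[clock].append(name)
    return dict(clusters)


def floorplan(clusters: dict[str, list[str]],
              preferred: dict[str, int]) -> dict[str, int]:
    """Floorplanning: place each cluster in its preferred region."""
    return {clock: preferred[clock] for clock in clusters}


def incremental_floorplan(assignment: dict[str, int],
                          clusters: dict[str, list[str]]) -> dict[str, int]:
    """Incremental floorplanning: move the smallest clusters out of
    regions that exceed the local clock budget."""
    result = dict(assignment)

    def clocks_in(region: int) -> list[str]:
        return [c for c, r in result.items() if r == region]

    for region in range(NUM_REGIONS):
        while len(clocks_in(region)) > MAX_CLOCKS_PER_REGION:
            victim = min(clocks_in(region),
                         key=lambda c: (len(clusters[c]), c))
            candidates = [r for r in range(NUM_REGIONS)
                          if r != region
                          and len(clocks_in(r)) < MAX_CLOCKS_PER_REGION]
            if not candidates:
                raise ValueError("no region has a free local clock slot")
            result[victim] = min(candidates, key=lambda r: len(clocks_in(r)))
    return result


ffs = {"ff0": "c0", "ff1": "c0", "ff2": "c1", "ff3": "c1",
       "ff4": "c2", "ff5": "c3", "ff6": "c4", "ff7": "c5"}
clusters = partition(ffs)
initial = floorplan(clusters, {clock: 0 for clock in clusters})
print(incremental_floorplan(initial, clusters))
```

Moving the smallest clusters first keeps the number of relocated flip-flops low, which is one plausible heuristic among many.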
Conclusions:
References:
T. Okamoto, T. Kimoto, N. Maeda, “Design Methodology and Tools for NEC Electronics
- Structured ASIC ISSP,” Proceedings of the 2004 International Symposium on
Physical Design, p. 90.
B. Zahiri, “Structured ASICs: Opportunities and Challenges,” Proceedings of the 21st
International Conference on Computer Design (ICCD’03).
K. Wu, Y. Tsai, “Structured ASIC, Evolution or Revolution?,” Faraday Technology
Corporation, Proceedings of the 2004 International Symposium on Physical Design.