Design of SRAM in Verilog
TABLE OF CONTENTS
List of figures
List of tables
Chapter 1 INTRODUCTION
1.1 Design Objectives
1.2 ACCOMPLISHMENTS
Chapter 2 LITERATURE REVIEW
Chapter 3
3.1 Design of SRAM
ABSTRACT
Low power and low area Static Random Access Memory (SRAM) is essential
for System on Chip (SoC) technology. Dual-Port (DP) SRAM greatly reduces the
power consumption by full current-mode techniques for read/write operation and the
area by using a Single-Port (SP) cell. An 8-bit DP-SRAM is proposed in this study.
Negative bit-line technique during write has been utilized for write-assist solutions.
Negative voltage is generated on-chip using capacitive coupling. The proposed circuit
design topology does not affect the read operation for bit interleaved architectures
enabling high-speed operation. The design was implemented in Xilinx ISE 14.4.
Simulation results and a comparative study of the present scheme against
state-of-the-art conventional schemes show that the proposed scheme is superior in
terms of process-variation impact, area overhead, timing and dynamic power
consumption. The proposed negative bit-line technique can be used to improve the
write ability of 6T Single-Port (SP) cells as well as 8T DP and other multi-port SRAM cells.
CHAPTER 1
INTRODUCTION
Tools Required:
Simulators: ModelSim 6.5b, Xilinx ISim 12.1i simulator
Synthesis: Xilinx 14.4 XST (Xilinx Synthesis Technology) synthesizer.
FPGA Family: Xilinx Spartan-3E XC3S500E.
1.2 ACCOMPLISHMENTS:
This section describes the work done in the project. The accomplishments are
categorized into four main phases of the project, in chronological order:
1. Literature Review
a. Surveyed the theory of memories.
b. Studied the concept of static RAM.
2. Preliminary Design
a. Worked out the design on paper and drew a top-level block diagram of the
SRAM.
3. Design Phase
a. Designed the top-level module of the SRAM in Verilog HDL.
4. Verification Phase
a. Wrote a self-checking test bench consisting of driver, monitor and
checker components in Verilog.
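The self-checking style described above can be sketched as follows. This is only an illustrative skeleton: the module name `sram` and its port names are assumptions for this sketch, not the project's actual interface, and the driver/monitor/checker roles are folded into tasks for brevity.

```verilog
`timescale 1ns/1ps
// Self-checking test bench sketch: the driver task applies stimulus,
// and the checker task compares the observed output with the expected
// value, counting mismatches.
module tb_sram;
    reg        clk = 0, we = 0;
    reg  [7:0] addr = 0, din = 0;
    wire [7:0] dout;
    integer    errors = 0;

    // Device under test (module and port names are assumed for this sketch).
    sram dut (.clk(clk), .we(we), .addr(addr), .din(din), .dout(dout));

    always #5 clk = ~clk;   // 100 MHz clock

    // Driver: write a value, then issue a read cycle at the same address.
    task drive(input [7:0] a, input [7:0] d);
        begin
            @(negedge clk); we = 1; addr = a; din = d;
            @(negedge clk); we = 0;
        end
    endtask

    // Checker: compare the monitored output against the expected value.
    task check(input [7:0] expected);
        begin
            @(posedge clk); #1;
            if (dout !== expected) begin
                errors = errors + 1;
                $display("FAIL: addr=%h expected=%h got=%h", addr, expected, dout);
            end
        end
    endtask

    initial begin
        drive(8'h0A, 8'h55); check(8'h55);
        drive(8'h3C, 8'hA5); check(8'hA5);
        if (errors == 0) $display("All tests passed");
        $finish;
    end
endmodule
```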
CHAPTER 2
LITERATURE REVIEW
Semiconductor memory is an electronic data storage device, often used as computer
memory, implemented on a semiconductor-based integrated circuit. It is made in
many different types and technologies.
Semiconductor memory has the property of random access, which means that it takes
the same amount of time to access any memory location, so data can be efficiently
accessed in any random order.[1] This contrasts with data storage media such as hard
disks and CDs which read and write data consecutively and therefore the data can
only be accessed in the same sequence it was written. Semiconductor memory also
has much faster access times than other types of data storage; a byte of data can be
written to or read from semiconductor memory within a few nanoseconds, while
access time for rotating storage such as hard disks is in the range of milliseconds. For
these reasons it is used for main computer memory (primary storage), to hold data the
computer is currently working on, among other uses.
Shift registers, processor registers, data buffers and other small digital registers that
have no memory address decoding mechanism are not considered memory,
although they also store digital data.
1.1 Description
In a semiconductor memory chip, each bit of binary data is stored in a tiny circuit
called a memory cell consisting of one to several transistors. The memory cells are
laid out in rectangular arrays on the surface of the chip. The 1-bit memory cells are
grouped in small units called words which are accessed together as a single memory
address. Memory is manufactured in word lengths that are usually a power of two,
typically N = 1, 2, 4 or 8 bits.
Data is accessed by means of a binary number called a memory address, applied to the
chip's address pins, which specifies which word in the chip is to be accessed. If the
memory address consists of M bits, the number of addresses on the chip is 2^M, each
containing an N-bit word. Consequently, the amount of data stored in each chip is N × 2^M
bits.[1] The data capacity is usually a power of two: 2, 4, 8, 16, 32, 64, 128, 256 or
512, and measured in kibibits, mebibits, gibibits or tebibits, etc. Currently (2014) the
largest semiconductor memory chips hold a few gibibits of data, but higher-capacity
memory is constantly being developed. By combining several integrated circuits,
memory can be arranged into a larger word length and/or address space than what is
offered by each chip, often but not necessarily a power of two.[1]
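The M-address-bit, N-bit-word organization above can be sketched as a parameterized Verilog memory. This is a behavioral illustration only (module and port names are chosen for this sketch, not taken from the project):

```verilog
// Behavioral memory with M address bits and N-bit words:
// total capacity = N * 2^M bits.
module generic_mem #(
    parameter M = 11,          // address width
    parameter N = 8            // word width
) (
    input  wire         clk,
    input  wire         we,    // write enable
    input  wire [M-1:0] addr,
    input  wire [N-1:0] din,
    output reg  [N-1:0] dout
);
    reg [N-1:0] mem [0:(1<<M)-1];  // 2^M words of N bits each

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;      // write replaces the stored word
        dout <= mem[addr];         // nondestructive read
    end
endmodule
```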
The two basic operations performed by a memory chip are "read", in which the data
contents of a memory word are read out (nondestructively), and "write", in which data
is stored in a memory word, replacing any data that was previously stored there. To
increase data rate, in some of the latest types of memory chips such as DDR SDRAM
multiple words are accessed with each read or write operation.
In addition to standalone memory chips, blocks of semiconductor memory are integral
parts of many computer and data processing integrated circuits. For example, the
microprocessor chips that run computers contain cache memory to store instructions
awaiting execution.
1.2 Types
RAM chips for computers usually come on removable memory modules. Additional
memory can be added to the computer by plugging in additional modules.
RAM (Random access memory) has become a generic term for any semiconductor
memory that can be written to, as well as read from, in contrast to ROM (below),
which can only be read. All semiconductor memory, not just RAM, has the property
of random access.
Volatile memory loses its stored data when the power to the memory chip is turned
off. However, it can be faster and less expensive than non-volatile memory. This type
is used for the main memory in most computers, since data can be kept on the hard
disk while the computer is off.[2][3]
Non-volatile memory preserves the data stored in it during periods when the power to
the chip is turned off. It is therefore used for the memory in portable devices, which
don't have disks, and for removable memory cards, among other uses.[2][3]
1.3.1 Characteristics
SRAM is more expensive and less dense than DRAM and is therefore not used
for high-capacity, low-cost applications such as the main memory in personal
computers.
1.3.1.1 Clock rate and power
1.3.1.2 Embedded use
SRAM is integrated on chip:
o as RAM or cache memory in micro-controllers (usually from around
32 bytes up to 128 kilobytes)
o as the primary caches in powerful microprocessors, such as the x86
family, and many others (from 8 KB up to many megabytes)
SRAM in its dual-ported form is sometimes used for real-time digital signal processing
circuits.
1.3.1.3 In computers
SRAM is also used in personal computers, workstations, routers and peripheral
equipment: CPU register files, internal CPU caches and external burst mode SRAM
caches, hard disk buffers, router buffers, etc. LCD screens and printers also normally
employ static RAM to hold the image displayed (or to be printed).
1.3.1.4 Hobbyists
Hobbyists, specifically homebuilt processor enthusiasts,[2] often prefer SRAM due to
the ease of interfacing. It is much easier to work with than DRAM as there are no
refresh cycles and the address and data buses are directly accessible rather than
multiplexed. In addition to buses and power connections, SRAM usually requires only
three controls: Chip Enable (CE), Write Enable (WE) and Output Enable (OE). In
synchronous SRAM, a Clock (CLK) input is also included.
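An asynchronous SRAM with these three controls can be modeled behaviorally in Verilog. The sketch below is illustrative only; it assumes active-low controls, as on typical commercial parts, and its names are not taken from the project:

```verilog
// Behavioral model of a small asynchronous SRAM with the three usual
// controls: Chip Enable, Write Enable, Output Enable (all active low).
module async_sram #(
    parameter ADDR_W = 11,
    parameter DATA_W = 8
) (
    input  wire              ce_n,   // chip enable
    input  wire              we_n,   // write enable
    input  wire              oe_n,   // output enable
    input  wire [ADDR_W-1:0] addr,
    inout  wire [DATA_W-1:0] data    // shared data bus
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    // Drive the bus only during a read; otherwise stay high-impedance.
    assign data = (!ce_n && we_n && !oe_n) ? mem[addr] : {DATA_W{1'bz}};

    // Level-sensitive write: store the bus value while CE and WE are low.
    always @* begin
        if (!ce_n && !we_n)
            mem[addr] = data;
    end
endmodule
```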
1.4 Types of SRAM
1.4.1 Non-volatile SRAM
Non-volatile SRAMs, or nvSRAMs, have standard SRAM functionality, but they save
the data when the power supply is lost, ensuring preservation of critical information.
nvSRAMs are used in a wide range of situations (networking, aerospace, and
medical, among many others[3]) where the preservation of data is critical and where
batteries are impractical.
1.4.2 Asynchronous SRAM
Asynchronous SRAMs are available in densities from 4 Kb to 64 Mb. The fast access time of
SRAM makes asynchronous SRAM appropriate as main memory for small cache-less
embedded processors used in everything from industrial electronics and measurement
systems to hard disks and networking equipment, among many other applications.
They are used in applications ranging from switches and routers, IP phones, IC testers
and DSLAM cards to automotive electronics.
1.4.3 By transistor type
Bipolar junction transistor (used in TTL and ECL): very fast, but consumes a
lot of power
1.4.4 By function
Synchronous: all timings are initiated by the clock edge(s). Address, data-in
and other control signals are associated with the clock signals.
1.4.5 By feature
ZBT (zero bus turnaround): the turnaround is the number of clock cycles it
takes to change access to the SRAM from write to read and vice versa. The
turnaround for ZBT SRAMs, or the latency between read and write cycles, is
zero.
DDR SRAM: synchronous, single read/write port, double data rate I/O.
Quad Data Rate SRAM: synchronous, separate read and write ports,
quadruple data rate I/O.
CHAPTER 3
DESIGN OF SRAM
A typical SRAM cell is made up of six MOSFETs. Each bit in an SRAM is stored on
four transistors (M1, M2, M3, M4) that form two cross-coupled inverters. This
storage cell has two stable states which are used to denote 0 and 1. Two additional
access transistors serve to control the access to a storage cell during read and write
operations. In addition to such six-transistor (6T) SRAM, other kinds of SRAM chips
use 4, 8, 10 (4T, 8T, 10T SRAM), or more transistors per bit. [4][5][6] Four-transistor
SRAM is quite common in stand-alone SRAM devices (as opposed to SRAM used for
CPU caches), implemented in special processes with an extra layer of polysilicon,
allowing for very high-resistance pull-up resistors. [7] The principal drawback of using
4T SRAM is increased static power due to the constant current flow through one of
the pull-down transistors.
During read accesses, the bit lines are actively driven high and low by the inverters in
the SRAM cell. This improves SRAM bandwidth compared to DRAMs: in a
DRAM, the bit line is connected to storage capacitors, and charge sharing causes the
bit line to swing upwards or downwards. The symmetric structure of SRAMs also
allows for differential signaling, which makes small voltage swings more easily
detectable. Another difference with DRAM that contributes to making SRAM faster is
that commercial chips accept all address bits at a time. By comparison, commodity
DRAMs have the address multiplexed in two halves, i.e. higher bits followed by
lower bits, over the same package pins in order to keep their size and cost down.
The size of an SRAM with m address lines and n data lines is 2^m words, or 2^m × n bits.
The most common word size is 8 bits, meaning that a single byte can be read or
written to each of the 2^m different words within the SRAM chip. Several common SRAM
chips have 11 address lines (thus a capacity of 2^m = 2^11 = 2,048 = 2k words) and an 8-bit
word, so they are referred to as "2k × 8 SRAM".
3.2 SRAM operation
An SRAM cell has three different states. It can be in: standby (the circuit is idle),
reading (the data has been requested) or writing (updating the contents). To operate in
read mode and write mode, the SRAM should have "readability" and "write stability"
respectively. The three different states work as follows:
3.2.1 Standby
If the word line is not asserted, the access transistors M5 and M6 disconnect the
cell from the bit lines. The two cross-coupled inverters formed by M1 through M4
will continue to reinforce each other as long as they are connected to the
supply.
3.2.2 Reading
Assume that the content of the memory is a 1, stored at Q. The read cycle is
started by precharging both bit lines to a logical 1, then asserting the word
line WL, enabling both access transistors. The values stored in Q and Q̄ are
then transferred to the bit lines by leaving BL at its precharged value and
discharging BL̄ through M1 and M5 to a logical 0.
3.3 Bus behavior
RAM with an access time of 70 ns will output valid data within 70 ns from the time
that the address lines are valid. The data will remain valid for a hold time as well (5
to 10 ns). Rise and fall times also influence valid timeslots by approximately 5 ns. By
reading the lower part of an address range in sequence (page cycle) one can read
with a significantly shorter access time (30 ns).
CHAPTER 4
INTRODUCTION TO VLSI & FPGA DESIGN FLOW
Introduction to VLSI
TECHNOLOGY                               DATE    COMPLEXITY (transistors)
Single transistor                        1959    less than 1
Unit logic (one gate)                    1960    1
Multi-function                           1962    2 - 4
Complex function                         1964    5 - 20
Medium Scale Integration (MSI)           1967    20 - 200
Large Scale Integration (LSI)            1972    200 - 2000
Very Large Scale Integration (VLSI)      1978    2000 - 20000
Ultra Large Scale Integration (ULSI)     1989    20000 - ?
Figure-3.2: Evolution of integration density and minimum feature size, as seen in the
early 1980s.
Therefore, the current trend of integration will also continue in the foreseeable
future. Advances in device manufacturing technology, and especially the steady
reduction of minimum feature size (minimum length of a transistor or an interconnect
realizable on chip) support this trend. Figure 3.2 shows the history and forecast of
chip complexity, and minimum feature size, over time, as seen in the early 1980s. At
that time, a minimum feature size of 0.3 microns was expected around the year 2000.
The actual development of the technology, however, has far exceeded these
expectations. A minimum size of 0.25 microns was readily achievable by the year
1995. As a direct result of this, the integration density has also exceeded previous
expectations - the first 64 Mbit DRAM, and the INTEL Pentium microprocessor chip
containing more than 3 million transistors were already available by 1994, pushing
the envelope of integration density.
When comparing the integration density of integrated circuits, a clear distinction must
be made between memory chips and logic chips. Figure 3.3 shows the level of
integration over time for memory and logic chips, starting in 1970. It can be observed
that, in terms of transistor count, logic chips contain significantly fewer transistors in
any given year, mainly due to the large consumption of chip area for complex
interconnects. Memory circuits are highly regular, and thus more cells can be
integrated with much less area for interconnects.
Figure-3.3: Level of integration over time, for memory chips and logic chips.
Generally speaking, logic chips such as microprocessor chips and digital
signal processing (DSP) chips contain not only large arrays of memory (SRAM) cells,
but also many different functional units. As a result, their design complexity is
considered much higher than that of memory chips, although advanced memory chips
contain some sophisticated logic functions. The design complexity of logic chips
increases almost exponentially with the number of transistors to be integrated. This is
translated into the increase in the design cycle time, which is the time period from the
start of the chip development until the mask-tape delivery time. However, in order to
make the best use of the current technology, the chip development time has to be short
enough to allow the maturing of chip manufacturing and timely delivery to customers.
As a result, the level of actual logic integration tends to fall short of the integration
level achievable with the current processing technology. Sophisticated computer-aided
design (CAD) tools and methodologies are developed and applied in order to manage
the rapidly increasing design complexity.
The design of a VLSI chip can be represented in three domains: the behavioral
domain, the structural domain, and the physical (geometrical layout) domain.
The design flow starts from the algorithm that describes the behavior of the target
chip. The corresponding architecture of the processor is first defined. It is mapped
onto the chip surface by floorplanning. The next design evolution in the behavioral
domain defines finite state machines (FSMs) which are structurally implemented with
functional modules such as registers and arithmetic logic units (ALUs).
These modules are then geometrically placed onto the chip surface using CAD
tools for automatic module placement followed by routing, with a goal of minimizing
the interconnect area and signal delays. The third evolution starts with a behavioral
module description. Individual modules are then implemented with leaf cells. At this
stage the chip is described in terms of logic gates (leaf cells), which can be placed and
interconnected by using a cell placement & routing program. The last evolution
involves a detailed Boolean description of leaf cells followed by a transistor level
implementation of leaf cells and mask generation. In standard-cell based design, leaf
cells are already pre-designed and stored in a library for logic design use.
To fit the architecture into the allowable chip area, some functions may have to be
removed and the design process repeated. Such changes may require
significant modification of the original requirements. Thus, it is very important to feed
forward low-level information to higher levels (bottom up) as early as possible.
In the following, we will examine design methodologies and structured
approaches which have been developed over the years to deal with both complex
hardware and software projects. Regardless of the actual size of the project, the basic
principles of structured design will improve the prospects of success. Some of the
classical techniques for reducing the complexity of IC design are: Hierarchy,
regularity, modularity and locality.
3.3 Design Hierarchy
The use of hierarchy, or the divide-and-conquer technique, involves dividing a
module into sub-modules and then repeating this operation on the sub-modules until
the complexity of the smaller parts becomes manageable. This approach is very
similar to the software case where large programs are split into smaller and smaller
sections until simple subroutines, with well-defined functions and interfaces, can be
written. In Section 1.2, we have seen that the design of a VLSI chip can be
represented in three domains. Correspondingly, a hierarchy structure can be described
in each domain separately. However, it is important for the simplicity of design that
the hierarchies in different domains can be mapped into each other easily.
As an example of structural hierarchy, Fig. 1.6 shows the structural
decomposition of a CMOS four-bit adder into its components. The adder can be
decomposed progressively into one-bit adders, separate carry and sum circuits, and
finally, into individual logic gates. At this lower level of the hierarchy, the design of a
simple circuit realizing a well-defined Boolean function is much easier to handle
than at the higher levels of the hierarchy.
In the physical domain, partitioning a complex system into its various
functional blocks will provide a valuable guidance for the actual realization of these
blocks on chip. Obviously, the approximate shape and size (area) of each sub-module
should be estimated in order to provide a useful floorplan. Figure 1.7 shows the
hierarchical decomposition of a four-bit adder in physical description (geometrical
layout) domain, resulting in a simple floorplan. This physical view describes the
external geometry of the adder, the locations of input and output pins, and how pin
locations allow some signals (in this case the carry signals) to be transferred from one
sub-block to the other without external routing. At lower levels of the physical
hierarchy, the internal mask layout of each cell defines the detailed geometry of its
transistors and wires.
Figure-3.7: Regular design of a 2-1 MUX, a DFF and an adder, using inverters and
tri-state buffers.
3.4 VLSI Design Styles
Several design styles can be considered for chip implementation of specified
algorithms or logic functions. Each design style has its own merits and shortcomings,
and thus a proper choice has to be made by designers in order to provide the
functionality at low cost.
3.4.1 Field Programmable Gate Array (FPGA)
Fully fabricated FPGA chips containing thousands of logic gates or even more,
with programmable interconnects, are available to users for their custom hardware
programming to realize desired functionality. This design style provides a means for
fast prototyping and also for cost-effective chip design, especially for low-volume
applications. A typical field programmable gate array (FPGA) chip consists of I/O
buffers, an array of configurable logic blocks (CLBs), and programmable interconnect
structures. The programming of the interconnects is implemented by programming of
RAM cells whose output terminals are connected to the gates of MOS pass transistors.
A general architecture of FPGA from XILINX is shown in Fig. 3.8. A more detailed
view showing the locations of switch matrices used for interconnect routing is given
in Fig. 3.9.
A simple CLB (model XC2000 from XILINX) is shown in Fig. 3.10. It
consists of four signal input terminals (A, B, C, D), a clock signal terminal,
user-programmable multiplexers, an SR latch, and a look-up table (LUT). The LUT is a
digital memory that stores the truth table of the Boolean function. Thus, it can
generate any function of up to four variables or any two functions of three variables.
The control terminals of the multiplexers are not shown explicitly in Fig. 3.10.
The CLB is configured such that many different logic functions can be
realized by programming its array. More sophisticated CLBs have also been
introduced to map complex functions. The typical design flow of an FPGA chip starts
with the behavioral description of its functionality, using a hardware description
language such as VHDL. The synthesized architecture is then technology-mapped (or
partitioned) into circuits or logic cells. At this stage, the chip design is completely
described in terms of available logic cells. Next, the placement and routing step
assigns individual logic cells to FPGA sites (CLBs) and determines the routing
patterns among the cells in accordance with the netlist. After routing is completed, the
performance of the design can be verified.
3.4.2 Gate Array Design
In the gate array design style, the chip is customized by patterning metal interconnects
over the pre-fabricated transistors of the array (Fig. 3.11). Since the patterning of
metallic interconnects is done at the end of the chip fabrication, the turn-around time
can still be short, a few
days to a few weeks. Figure 3.12 shows a corner of a gate array chip which contains
bonding pads on its left and bottom edges, diodes for I/O protection, nMOS
transistors and pMOS transistors for chip output driver circuits in the neighboring
areas of bonding pads, arrays of nMOS transistors and pMOS transistors, underpass
wire segments, and power and ground buses along with contact windows.
The availability of these routing channels simplifies the interconnections, even using only one
metal layer. The interconnection patterns to realize basic logic gates can be
stored in a library, which can then be used to customize rows of uncommitted
transistors according to the netlist. While most gate array platforms only contain rows
of uncommitted transistors separated by routing channels, some other platforms also
offer dedicated memory (RAM) arrays to allow a higher density where memory
functions are required. Figure 3.14 shows the layout views of a conventional gate
array and a gate array platform with two dedicated memory banks.
With the use of multiple interconnect layers, the routing can be achieved over
the active cell areas; thus, the routing channels can be removed as in Sea-of-Gates
(SOG) chips. Here, the entire chip surface is covered with uncommitted nMOS and
pMOS transistors. As in the gate array case, neighboring transistors can be customized
using a metal mask to form basic logic gates. For intercell routing, however, some of
the uncommitted transistors must be sacrificed. This approach results in more
flexibility for interconnections, and usually in a higher density. The basic platform of
a SOG chip is shown in Fig. 3.15. Figure 3.16 offers a brief comparison between the
channeled (GA) and the channelless (SOG) approaches.
Figure-3.14: Layout views of a conventional GA chip and a gate array with two
memory banks.
Figure-3.16: Comparison between the channeled (GA) vs. the channelless (SOG)
approaches.
3.4.3 Standard-Cells Based Design
The standard-cells based design is one of the most prevalent full custom design
styles; it requires the development of a full custom mask set. The standard cell is also
called the polycell. In this design style, all of the commonly used logic cells are
developed, characterized, and stored in a standard cell library. A typical library may
contain a few hundred cells including inverters, NAND gates, NOR gates, complex
AOI, OAI gates, D-latches, and flip-flops. Each gate type can have multiple
implementations to provide adequate driving capability for different fanouts. For
instance, the inverter gate can have standard size transistors, double size transistors,
and quadruple size transistors so that the chip designer can choose the proper size to
achieve high circuit speed and layout density. The characterization of each cell is done
for several different categories, such as delay time versus load capacitance,
simulation models, and mask data.
In a typical standard-cell layout, the
chip consists of two blocks, and power/ground routing must be provided from both
sides of the layout area. Standard-cell based designs may consist of several such
macro-blocks, each corresponding to a specific unit of the system architecture such as
ALU, control logic, etc.
The availability of dedicated memory blocks also reduces the area, since the
realization of memory elements using standard cells would occupy a larger area.
In digital CMOS VLSI, full-custom design is rarely used due to the high labor
cost. Exceptions to this include the design of high-volume products such as memory
chips, high-performance microprocessors and FPGA masters. Figure 3.21 shows the
full layout of the Intel 486 microprocessor chip, which is a good example of a hybrid
full-custom design. Here, one can identify four different design styles on one chip:
memory banks (RAM cache), data-path units consisting of bit-slice cells, control
circuitry mainly consisting of standard cells, and PLA blocks.
FPGAs, an alternative to custom ICs, can be used to implement an entire System on
Chip (SoC). The main advantage of an FPGA is its ability to be reprogrammed: the user
can reprogram an FPGA to implement a design after the FPGA is manufactured, hence
the name "field programmable".
Custom ICs are expensive and take a long time to design, so they are useful when produced in
bulk amounts. FPGAs, however, are easy to implement within a short time with the help of
Computer-Aided Design (CAD) tools (because there is no physical layout process, no
mask making, and no IC manufacturing).
Some disadvantages of FPGAs are that they are slow compared to custom ICs, they cannot
handle very complex designs, and they draw more power.
A Xilinx logic block consists of one Look-Up Table (LUT) and one flip-flop. A LUT is used to
implement a number of different functions. The input lines to the logic block go into the
LUT and enable it. The output of the LUT gives the result of the logic function that it
implements, and the output of the logic block is the registered or unregistered output of the LUT.
SRAM is used to implement a LUT. A k-input logic function is implemented using a
2^k × 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k).
The advantage of such an architecture is that it supports the implementation of very many
logic functions; however, the disadvantage is the unusually large number of memory cells
required to implement such a logic block when the number of inputs is large.
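The 2^k × 1 storage view of a LUT can be sketched in Verilog as follows. This is an illustrative model (the module name, parameters and default truth table are chosen for this sketch, not taken from Xilinx's actual CLB netlist):

```verilog
// A k-input LUT modeled as a 2^k x 1 SRAM: the k inputs form the
// address, and the stored bit at that address is the function's output.
module lut #(
    parameter K = 4,
    // INIT holds the 2^K-entry truth table. This default programs a
    // 4-input AND gate: only address 4'b1111 (bit 15) stores a 1.
    parameter [(1<<K)-1:0] INIT = 16'h8000
) (
    input  wire [K-1:0] in,
    output wire         out
);
    assign out = INIT[in];   // read the truth-table bit selected by the inputs
endmodule
```

Reprogramming the LUT to a different function is just a matter of loading a different INIT pattern, which is why one SRAM-based block can realize any of the 2^(2^k) functions.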
The figure below shows a 4-input LUT-based implementation of a logic block.
LUT-based design provides better logic block utilization. A k-input LUT-based logic block
can be implemented in a number of different ways, with trade-offs between performance and
logic density.
Interconnects
A wire segment can be described as two end points of an interconnect with no
programmable switch between them. A sequence of one or more wire segments in an
FPGA can be termed a track.
Typically an FPGA has logic blocks, interconnects, switch blocks and input/output
blocks. Switch blocks lie in the periphery of the logic blocks and interconnects; wire
segments are connected to logic blocks through switch blocks. Depending on the
required design, one logic block is connected to another, and so on.
FPGA DESIGN FLOW
In this part of the tutorial we give a short introduction to the FPGA design flow. A
simplified version of the design flow is given in the following diagram.
4.1.1 Design Entry
There are different techniques for design entry: schematic-based entry, hardware
description languages (HDLs), combinations of both, etc. The selection of a method
depends on the design and the designer. If the designer wants to deal more with the
hardware, then schematic entry is the better choice. When the design is complex, or
the designer thinks of the design in an algorithmic way, then an HDL is the better
choice. Language-based entry is faster, but may lag in performance and density.
HDLs represent a level of abstraction that can isolate the designers from the details of
the hardware implementation. Schematic based entry gives designers much more
visibility into the hardware. It is the better choice for those who are hardware
oriented. Another method, though rarely used, is state machines. It is the better choice for
designers who think of the design as a series of states. However, the tools for
state-machine entry are limited. In this documentation we are going to deal with
HDL-based design entry.
4.1.2 Synthesis
Synthesis is the process which translates VHDL or Verilog code into a device netlist
format, i.e. a complete circuit with logical elements (gates, flip-flops, etc.) for the
design. If the design contains more than one sub-design (for example, to implement a
processor we need a CPU as one design element and RAM as another, and so on),
then the synthesis process generates a netlist for each design element.
The synthesis process will check code syntax and analyze the hierarchy of the design,
which ensures that the design is optimized for the design architecture the designer has
selected. The resulting netlist(s) is saved to an NGC (Native Generic Circuit) file (for
Xilinx Synthesis Technology (XST)).
4.1.3 Implementation
This process consists of a sequence of three steps:
1. Translate
2. Map
3. Place and Route
4.1.3.1 Translate
This process combines all the input netlists and constraints into a logic design file. This
information is saved as an NGD (Native Generic Database) file. This can be done using
the NGDBuild program. Here, defining constraints means assigning the ports in
the design to the physical elements (e.g. pins, switches, buttons) of the targeted
device and specifying the timing requirements of the design. This information is stored
in a file called the UCF (User Constraints File). Tools used to create or modify the UCF
include PACE, the Constraints Editor, etc.
4.1.3.2 Map
This process divides the whole circuit of logical elements into sub-blocks such that
they can be fit into the FPGA logic blocks. That is, the map process fits the logic
defined by the NGD file into the targeted FPGA elements (Configurable Logic
Blocks (CLB), Input/Output Blocks (IOB)) and generates an NCD (Native Circuit
Description) file which physically represents the design mapped to the components of
the FPGA. The MAP program is used for this purpose.
4.1.3.3 Place and Route
The PAR program is used for this process. The place and route process
places the sub-blocks from the map process into logic blocks according to the
constraints and connects the logic blocks. For example, if a sub-block is placed in a
logic block which is very near to an I/O pin, this may save time but may affect some
other constraint; the trade-off between all the constraints is taken into account by the
place and route process. The PAR tool takes the mapped NCD file as input and
produces a completely routed NCD file as output. The output NCD file contains the
routing information.
CHAPTER 5
Introduction to Verilog
Verilog is a hardware description language (HDL) used to model electronic systems.
It is not to be confused with VHDL (a competing language), and is most commonly
used in the design, verification, and implementation of digital logic chips at the
register-transfer level of abstraction. It is also used in the verification of analog and
mixed-signal circuits.
1.5 Overview
Hardware description languages such as Verilog differ from conventional software
programming languages, although Verilog's syntax is similar to (but not identical to)
that of ANSI C/C++, with equivalent control-flow keywords (if/else, for, while, case,
etc.) and compatible operator precedence. Syntactic differences include variable
declaration (Verilog requires bit-widths on net/reg types) and the demarcation of
procedural blocks with begin/end instead of curly braces.
A Verilog design consists of a hierarchy of modules. Modules
encapsulate design hierarchy, and communicate with other modules through a set of
declared input, output, and bidirectional ports. Internally, a module can contain any
combination of the following: net/variable declarations (wire, reg, integer, etc.),
concurrent and sequential statement blocks, and instances of other modules (sub-hierarchies).
Sequential statements are placed inside a begin/end block and executed
in sequential order within the block. But the blocks themselves are executed
concurrently, qualifying Verilog as a dataflow language.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling
of shared signal lines, where multiple sources drive a common net. When a wire has
multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths.
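As a minimal sketch of this resolution (the signal name is illustrative), two conflicting continuous assignments of equal strength resolve to the undefined value x:
wire shared;
assign shared = 1'b0; // first driver: strong 0
assign shared = 1'b1; // second driver: strong 1
// any reader of shared sees 1'bx while the two drivers conflict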
A subset of statements in the Verilog language is synthesizable. Verilog modules
that conform to a synthesizable coding style, known as RTL (register-transfer level),
can be physically realized by synthesis software. Synthesis software algorithmically
transforms the (abstract) Verilog source into a netlist, a logically equivalent
description consisting only of elementary logic primitives (AND, OR, NOT,
flip-flops, etc.) that are available in a specific FPGA or VLSI technology. Further
manipulations to the netlist ultimately lead to a circuit fabrication blueprint (such as
a photomask set for an ASIC or a bitstream file for an FPGA).
1.6 History
1.6.1 Beginning
Verilog was the first modern hardware description language to be invented. It
was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at
Automated Integrated Design Systems (renamed Gateway Design Automation in
1985) as a hardware modeling language. Gateway Design Automation was purchased
by Cadence Design Systems in 1990. Cadence now has full proprietary rights to
Gateway's Verilog and to Verilog-XL, the HDL simulator that would become the
de facto standard (of Verilog logic simulators) for the next decade. Originally,
Verilog was intended only to describe and allow simulation; support for synthesis
was added later.
1.6.2 Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make
the language available for open standardization. Cadence transferred Verilog into the
public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to the IEEE and became IEEE Standard
1364-1995, commonly referred to as Verilog-95.
In the same time frame, Cadence initiated the creation of Verilog-A to put
standards support behind its analog simulator Spectre. Verilog-A was never intended
to be a standalone language and is a subset of Verilog-AMS, which encompasses
Verilog-95.
1.6.3 SystemVerilog
SystemVerilog is a superset of Verilog with extensions for design and verification;
among other things, it lets designers receive early error notifications and deploy
run-time checking and error analysis to simplify debugging.
6.3 Examples
Ex1: A hello world program looks like this:
module main;
initial
begin
$display("Hello world!");
$finish;
end
endmodule
Ex2: A simple example of two flip-flops follows:
module toplevel(clock,reset);
input clock;
input reset;
reg flop1;
reg flop2;
always @ (posedge reset or posedge clock)
if (reset)
begin
flop1 <= 0;
flop2 <= 1;
end
else
begin
flop1 <= flop2;
flop2 <= flop1;
end
endmodule
The "<=" operator in Verilog is another aspect of its being a hardware
description language as opposed to a normal procedural language. This is known as a
"non-blocking" assignment: its action is not registered until the end of the current
simulation time step. This means that the order of the assignments is irrelevant and
will produce the same result: flop1 and flop2 will swap values every clock.
The other assignment operator, "=", is referred to as a blocking assignment.
When "=" assignment is used, for the purposes of logic, the target variable is updated
immediately. In the above example, had the statements used the "=" blocking operator
instead of "<=", flop1 and flop2 would not have been swapped. Instead, as in
traditional programming, the compiler would understand to simply set flop1 equal to
flop2 (and subsequently ignore the redundant logic to set flop2 equal to flop1.)
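As a sketch, the blocking variant of the Ex2 body would therefore read:
begin
flop1 = flop2; // flop1 is updated immediately
flop2 = flop1; // uses the already-updated flop1, so both flops end up equal
end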
Ex3: An example counter circuit follows:
module Div20x (rst, clk, cet, cep, count, tc);
// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
// a counter using the Verilog language
parameter size = 5;
parameter length = 20;
input rst; // These inputs/outputs represent
input clk; // connections to the module.
input cet;
input cep;
output [size-1:0] count;
output tc;
reg [size-1:0] count; // Signals assigned
// within an always
// (or initial)block
// must be of type reg
wire tc; // Other signals are of type wire
// The always statement below is a parallel
// execution statement that
// executes any time the signals
// rst or clk transition from low to high
always @ (posedge clk or posedge rst)
if (rst) // This causes reset of the counter
count <= {size{1'b0}};
else
if (cet && cep) // Both enables are true
begin
if (count == length-1)
count <= {size{1'b0}};
else
count <= count + 1'b1;
end
// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));
endmodule
Ex4: An example of blocking assignments and delays follows:
always @ (b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
The always clause above illustrates the other type of method of use, i.e. the
always clause executes any time any of the entities in the list change, i.e. the b or e
change. When one of these changes, immediately a is assigned a new value, and due
to the blocking assignment b is assigned a new value afterward (taking into account
the new value of a.) After a delay of 5 time units, c is assigned the value of b and the
value of c ^ e is tucked away in an invisible store. Then after 6 more time units, d is
assigned the value that was tucked away.
Signals that are driven from within a process (an initial or always block) must be of
type reg. Signals that are driven from outside a process must be of type wire. The
keyword reg does not necessarily imply a hardware register.
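A minimal sketch of this rule (the module and signal names are illustrative):
module drive_demo (input a, input b, output y);
wire w; // driven outside any process, so it must be a net (wire)
reg r; // driven inside an always block, so it must be a reg
assign w = a & b; // continuous assignment drives the wire
always @ (a or b)
r = a | b; // procedural assignment drives the reg
assign y = w ^ r; // a reg may be read anywhere, e.g. in a continuous assignment
endmodule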
6.4 Constants
The definition of constants in Verilog supports the addition of a width parameter. The
basic syntax is:
<Width in bits>'<base letter><number>
Examples:
12'h123 - a 12-bit constant specified in hexadecimal
20'd44 - a 20-bit constant specified in decimal
4'b1010 - a 4-bit constant specified in binary
A basic rule of thumb is to use "<=" when there is a posedge or negedge statement
within the always clause. A variant of the D-flop with both an asynchronous reset
and an asynchronous set follows:
always @ (posedge clk or posedge reset or posedge set)
if (reset)
q <= 0;
else
if (set)
q <= 1;
else
q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation errors
can result. Consider the following test sequence of events. 1) reset goes high 2) clk
goes high 3) set goes high 4) clk goes high again 5) reset goes low followed by 6) set
going low. Assume no setup and hold violations.
In this example the always @ statement would first execute when the rising
edge of reset occurs which would place q to a value of 0. The next time the always
block executes would be the rising edge of clk which again would keep q at a value of
0. The always block then executes when set goes high which because reset is high
forces q to remain at 0. This condition may or may not be correct depending on the
actual flip flop. However, this is not the main problem with this model. Notice that
when reset goes low, that set is still high. In a real flip flop this will cause the output
to go to a 1. However, in this model it will not occur because the always block is
triggered by rising edges of set and reset - not levels. A different approach may be
necessary for set/reset flip flops.
Note that there are no "initial" blocks mentioned in this description. There is a
split between FPGA and ASIC synthesis tools on this structure. FPGA tools allow
initial blocks where reg values are established instead of using a "reset" signal. ASIC
synthesis tools don't support such a statement. The reason is that an FPGA's initial
state is something that is downloaded into the memory tables of the FPGA. An ASIC
is an actual hardware implementation.
It is a common misconception to believe that an initial block will execute before an
always block. In fact, it is better to think of the initial block as a special case of the
always block, one which terminates after it completes for the first time.
//Examples:
initial
begin
a = 1; // Assign a value to reg a at time 0
#1; // Wait 1 time unit
b = a; // Assign the value of reg a to reg b
end
always @(a or b) // Any time a or b CHANGE, run the process
begin
if (a)
c = b;
else
d = ~b;
end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a) // Run whenever reg a has a low to high change
a <= b;
These are the classic uses for these two keywords, but there are two significant
additional uses. The most common of these is an always keyword without
the @(...) sensitivity list. It is possible to use always as shown below:
always
begin // Always begins executing at time 0 and NEVER stops
clk = 0; // Set clk to 0
#1; // Wait for 1 time unit
clk = 1; // Set clk to 1
#1; // Wait 1 time unit
end // Keeps executing - so continue back at the top of the begin
The always keyword acts similarly to the C construct while(1) {...} in the sense
that it will execute forever.
The other interesting exception is the use of the initial keyword with the
addition of the forever keyword.
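As a sketch, the clock generator above can be written equivalently with initial followed by forever:
initial
forever // begins at time 0 and repeats the begin/end block indefinitely
begin
clk = 0; // set clk to 0
#1; // wait 1 time unit
clk = 1; // set clk to 1
#1; // wait 1 time unit
end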
6.6 Race Condition
The order of execution isn't always guaranteed within Verilog. This can best be
illustrated by a classic example. Consider the code snippet below:
initial
a = 0;
initial
b = a;
initial
begin
#1;
$display("Value a=%b Value of b=%b",a,b);
end
What will be printed out for the values of a and b? Depending on the order of
execution of the initial blocks, it could be zero and zero, or alternately zero and some
other arbitrary uninitialized value. The $display statement will always execute after
both assignment blocks have completed, due to the #1 delay.
6.7 Operators
Note: These operators are not shown in order of precedence.
Bitwise
Logical
Reduction
Arithmetic
Relational
Shift
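For illustration (x is a hypothetical 4-bit reg), the operator classes above behave as follows:
reg [3:0] x = 4'b1010; // hypothetical operand
// x & 4'b0110 -> 4'b0010 (bitwise AND, applied bit by bit)
// &x -> 1'b0 (reduction AND folds all bits of x into one bit)
// x && 4'b0110 -> 1'b1 (logical AND: both operands are nonzero, i.e. true)
// x + 4'd3 -> 4'b1101 (arithmetic)
// x > 4'd3 -> 1'b1 (relational)
// x >> 1 -> 4'b0101 (shift right)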
6.8 System Tasks
$monitor - Print out all the listed variables whenever any of them changes value.
$dumpfile - Declare the VCD (Value Change Dump) format output file name.
CHAPTER 5
2. ISE Project Navigator (M53 D) - the ISE Design Suite InfoCenter window will be
opened.
3. Select the location where you want to store the project using the Location field.
5. Select the HDL in the Top Level Source Type field at the bottom of the window.
8. In the project settings, select the details of the device used to dump the program.
11. After that, it returns to the ISE Project Navigator (M53 D) window.
20. Verification using a test bench.
Chapter 6
Results and discussions
VII. CONCLUSION
In this paper, we have presented the design and implementation of an SRAM. The
design has been simulated in ModelSim Altera 6.6d and synthesized using Xilinx
ISE 14.4 targeted to a Spartan 3E FPGA. The given input sequence has been stored
in the SRAM and returned as output.
REFERENCES