DESIGN OF DIFFERENT LOW POWER, HIGH PERFORMANCE CAM
CELL FOR 4 WORD 3 BITS CAM ARCHITECTURE
Project work report submitted in partial fulfillment of the requirements
for the degree of
Bachelor of Technology
in
Electronics and Communication Engineering
Submitted by
P. Harsha Vardhan (19B81A0476)
P. Naveen (19B81A0486)
K. Venkateswarlu (19B81A04B4)
2022-23
Cherabuddi Education Society’s
CERTIFICATE
This is to certify that the project work titled “DESIGN OF DIFFERENT LOW
POWER, HIGH PERFORMANCE CAM CELL FOR 4 WORD 3 BITS CAM
ARCHITECTURE” submitted to the CVR College of Engineering, affiliated to JNTUH, by P. Harshavardhan (19B81A0476), P. Naveen (19B81A0486), and K. Venkateswarlu (19B81A04B4), is a bona fide record of the work done by the students towards partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Electronics & Communication Engineering during the academic year 2022-2023.
Dr.P.Hema Sree
Associate Professor
Dept. of ECE
Place:
Date:
Cherabuddi Education Society’s
DECLARATION
I hereby declare that this project report titled “DESIGN OF DIFFERENT LOW POWER, HIGH
PERFORMANCE CAM CELL FOR 4 WORD 3 BITS CAM ARCHITECTURE” submitted to the
Department of Electronics and Communication Engineering, CVR College of Engineering, is a record of original work done by us under the guidance of K. A. Jyotsna, Associate Professor. The information and data given in the report are authentic to the best of our knowledge. This project report has not been submitted to any other university or institution for the award of any degree or diploma, nor has it been published at any time before.
P. Harshavardhan (19B81A0476)
P. Naveen (19B81A0486)
K. Venkateswarlu (19B81A04B4)
Date:
Place:
Acknowledgement
The satisfaction that accompanies the successful completion of any task would be incomplete without the mention of the people who made it possible and whose encouragement and guidance have been a source of inspiration throughout the course of the project.
It is a great pleasure to convey our profound sense of gratitude to our Principal, Dr. K. Ramamohan Reddy, and to Dr. K. Lalithendra, Head of the ECE Department, CVR College of Engineering, for kindly arranging the necessary facilities for executing the project in the college.
We would like to express our sincere gratitude to our supervisor, K. A. Jyotsna, Associate Professor, Dept. of ECE, CVR College of Engineering, whose guidance and valuable suggestions have been indispensable in bringing about the successful completion of our project.
We would also like to express our gratitude to all the staff members and lab faculty of the Department of Electronics and Communication Engineering, CVR College of Engineering, for their constant help and support.
We express a deep sense of gratitude and heartfelt thanks to the management for providing excellent lab facilities and tools. Finally, we thank all those whose guidance helped us in this regard.
Abstract
Contents
Acknowledgement
Abstract
Contents
Chapter 1 Overview
1.1 Introduction
1.3 Methodology
3.1 Introduction
3.2 Volatile Memory
3.2.1 DRAM
3.2.2 SRAM
3.3 Non-Volatile Memory
3.3.1 Read Only Memory (ROM)
Chapter 7 Results
8.1 Conclusion
REFERENCES
LIST OF FIGURES
1.1 Introduction
The requirements placed on storage devices have grown as the computing era has advanced. In particular, some applications call for a memory that is associative in nature; for practical purposes, this memory also needs to be fast and cheap. Most storage devices store data at specific memory locations and retrieve it by addressing those locations, so access time is constrained by the addressing path of the memory system. The time needed to find an item stored in memory can be decreased significantly if the memory can be accessed by its content instead of by its address. A memory that can be accessed in this way is called a content-addressable memory (CAM). Thus, content-addressable memory (CAM) is a memory that is both fast and intuitive to search.
1.2 Aim of the system
1.3 Methodology
Content-addressable memory (CAM) is a memory that is accessed by its contents instead of by the address of a storage location. When the CAM receives an input search word, it compares it against the table of data words stored in the CAM and returns the address at which the matching word is stored. The entire operation is performed by parallel comparison circuitry in a single clock cycle. The CAM is therefore very fast and is needed mainly in high-speed applications; however, it consumes a lot of power.
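As a rough behavioural illustration of this search-by-content operation (not the hardware implementation itself; the table contents below are hypothetical), the following Python sketch models a CAM as a table that returns the addresses of matching words:

# Minimal behavioural sketch of a CAM search (illustrative only).
# A real CAM compares all stored words in parallel in one clock cycle;
# here the comparison is simply modelled as a loop over the stored words.

def cam_search(table, search_word):
    """Return the list of addresses whose stored word matches search_word."""
    return [addr for addr, word in enumerate(table) if word == search_word]

# Hypothetical 4-word x 3-bit table, matching the architecture in this report.
table = ["101", "011", "101", "000"]
print(cam_search(table, "101"))   # -> [0, 2]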
1.4 Significance of the work
Conclusion
A new method called block-XOR CAM has been proposed to improve the efficiency of the low-power pre-computation-based CAM (PB-CAM). In this approach, a block-XOR parameter extractor is used for the low-power CAM. Theoretical analysis and simulation results verify that the block-XOR PB-CAM effectively reduces the number of comparison operations in the second stage of the comparison process, achieving a greater power reduction. The method is therefore more flexible and adaptable to general designs. In addition, the block-XOR parameter can be computed in parallel with a delay of only three XOR gates for any input bit length, giving a constant-delay search operation.
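The exact block partitioning used in the cited work is not reproduced here; the Python sketch below only illustrates the general idea of deriving a short parameter by XOR-folding fixed-size blocks of the data word, so that words whose parameters differ from the key's can skip the full comparison. The block size and word width are assumptions.

# Illustrative sketch of a block-XOR parameter extractor (assumed 3-bit blocks).
# Words whose parameter differs from the search key's parameter skip the full
# bit-by-bit comparison, which is where the power saving comes from.

def block_xor_parameter(word_bits, block_size=3):
    """XOR-fold a bit string block by block into a short parameter."""
    param = 0
    for i in range(0, len(word_bits), block_size):
        block = int(word_bits[i:i + block_size].ljust(block_size, "0"), 2)
        param ^= block
    return param

stored = ["101101", "010010", "111000"]
key = "101101"
key_param = block_xor_parameter(key)
candidates = [w for w in stored if block_xor_parameter(w) == key_param]
# Only the surviving candidates go through the full comparison stage.
print(candidates)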
A technique called match-line (ML) pipelining has also been proposed to reduce the power consumption of the CAM. In this technique, the search operation is performed by breaking the match lines into pipelined segments. Since most of the stored words fail to match in their first segment, the search operation on their subsequent segments is aborted, and the power consumption is accordingly reduced. The saving of the pipelined ML comes from activating only a small fraction of the match-line segments.
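As a behavioural illustration only (the segment width is an assumption, and the real saving comes from circuit-level gating of the match-line segments), the sketch below aborts the comparison of a stored word as soon as one of its segments mismatches:

# Behavioural sketch of a segmented (pipelined) match-line search.
# Each stored word is compared segment by segment; a mismatch in an early
# segment aborts the remaining comparisons for that word.

def segmented_match(stored_word, search_word, seg_width=2):
    for i in range(0, len(search_word), seg_width):
        if stored_word[i:i + seg_width] != search_word[i:i + seg_width]:
            return False          # later segments are never evaluated
    return True

words = ["101101", "100111", "101100"]
print([w for w in words if segmented_match(w, "101101")])  # -> ['101101']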
An earlier architecture, called the pre-computation-based content-addressable memory (PB-CAM), was proposed to achieve low power consumption, low cost, low operating voltage and high reliability. The design is based on a pre-computation scheme: to save CAM power, it reduces the number of comparisons carried out in the second stage of the comparison process. In that design, a ones-count method is used for the pre-computation. The ones-count parameter extractor is built from a chain of full adders, so its delay increases as the data bit length increases. Another technique available to reduce the power consumed on the CAM match lines is the selective pre-charge technique.
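A behavioural sketch of the ones-count pre-computation idea is given below (the word width and table contents are assumptions); only words whose ones count equals that of the search key proceed to the full comparison.

# Behavioural sketch of ones-count pre-computation (PB-CAM idea).
# The parameter is the number of '1' bits in the word; words whose parameter
# differs from the key's are filtered out before the full comparison.

def ones_count(word_bits):
    return word_bits.count("1")

stored = ["101", "011", "110", "000"]
key = "011"
survivors = [w for w in stored if ones_count(w) == ones_count(key)]
matches = [addr for addr, w in enumerate(stored)
           if ones_count(w) == ones_count(key) and w == key]
print(survivors, matches)   # -> ['101', '011', '110'] [1]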
Since a CAM compares all of its stored words concurrently, its search speed is high. However, this comes at the cost of high energy consumption, mainly due to the high switching activity of the match lines (MLs). Many designs have therefore been proposed to reduce the energy consumption of the CAM by reducing either the switching activity or the voltage swing of the MLs. Among these designs, the charge-injection and the pre-charge-low match-line sense amplifiers (MLSAs) are the most attractive because of their single-clock, high-speed and low-energy characteristics. However, all of these designs have their limitations. The charge-injection design is robust, but a capacitor is required for every ML, so the area of the MLSA increases considerably; the pre-charge-low MLSAs, including the current-race, the stability and the positive-feedback designs, are very sensitive to external environmental variations such as temperature and process corners. In view of this, we propose an improved MLSA design offering better performance in terms of energy, area and robustness.
After a thorough literature survey, it was decided to modify the CAM architecture and design so as to achieve low power consumption and high-speed operation by reducing the number of comparisons performed by the conventional CAM. To achieve this goal, we use an extra bit in each data word stored in the memory. This extra bit serves as a parity bit for the corresponding stored data word. In the existing CAM architecture, the contents of all CAM words (cells) are compared in parallel within a single clock cycle. The design is therefore fast, but it consumes a great deal of power. To reduce this power consumption, two techniques modify the structure of the word-level comparison circuitry; although these techniques save power, they increase latency and area. Therefore, a pre-computation method is used to reduce the number of comparisons and thus save power. One such method is the ones-count parameter extraction method, and the other is the block-XOR method.
However, these methods require considerable area for storing the parameter bits and for the parameter extraction circuitry. They must also extract and search the parameters through a parameter memory, so although they reduce the power consumed in the data comparison, they introduce additional power loss of their own. Based on this observation, we have decided to adopt a parity bit as the parameter: compared with previous PB-CAMs, it requires only a single bit of parameter storage, reduces the number of comparison operations to lower the power consumption, and improves the performance of the PB-CAM relative to the traditional CAM.
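The following Python sketch illustrates the parity-bit idea behaviourally (the word width and stored values are assumptions, and the real design computes parity and performs the comparison in hardware): a single parity bit is stored with each word, and only words whose parity matches that of the search key undergo the full comparison.

# Behavioural sketch of the parity-bit pre-computation idea.
# One extra bit (even parity here, as an assumption) is stored per word;
# on a search, words whose parity differs from the key's parity are skipped.

def parity(word_bits):
    return word_bits.count("1") % 2

stored = ["101", "011", "110", "100"]
table = [(w, parity(w)) for w in stored]        # word + its parity bit

key = "100"
key_parity = parity(key)
matches = [addr for addr, (w, p) in enumerate(table)
           if p == key_parity and w == key]     # full compare only if parity agrees
print(matches)   # -> [3]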
Chapter Three
Types Of Memory
3.1 INTRODUCTION
In computing, memory refers to the computer hardware devices used to store information for immediate use in a computer; it is synonymous with the term "primary storage". Computer memory operates at a high speed, for example Random-Access Memory (RAM), as a distinction from storage that provides slower access to programs and data but offers higher capacity. If needed, the contents of the computer memory can be transferred to secondary storage through a memory-management technique called "virtual memory". An archaic synonym for memory is store.
The term "memory", meaning "primary storage" or "main memory", is often associated with addressable semiconductor memory, i.e. integrated circuits consisting of silicon-based transistors, used for example as primary storage but also for other purposes in computers and other digital electronic devices. There are two main kinds of semiconductor memory: volatile and non-volatile. Some examples of non-volatile memory are ROM, PROM, EPROM and EEPROM (used for storing firmware such as the BIOS). Examples of volatile memory are primary storage, which is typically Dynamic Random-Access Memory (DRAM), and fast CPU cache memory, which is typically Static Random-Access Memory (SRAM); SRAM is fast but power-hungry, offering lower memory areal density than DRAM.
Flash memory organisations include both one bit per memory cell and multiple bits per cell (called MLC, Multi-Level Cell). The memory cells are grouped into words of fixed word length, for example 1, 2, 4, 8, 16, 32, 64 or 128 bits. Each word can be accessed by a binary address of N bits, making it possible to store 2^N words in the memory. This implies that processor registers are normally not considered memory, since they store only one word and do not include an addressing mechanism.
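As a small worked example of the 2^N relationship (the 4-bit address width chosen here is only an assumption for illustration):

# Address width vs. capacity: N address bits select one of 2**N words.
address_bits = 4
words = 2 ** address_bits          # 16 addressable words
print(words)                       # -> 16
# Conversely, a 4-word memory (as in this project) needs ceil(log2(4)) = 2 address bits.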
Typically secondary storage devices are hard disk drives and solid-state drives.
3.2 VOLATILE MEMORY
Volatile memory is computer memory that requires power to maintain the stored information. Most modern semiconductor volatile memory is either static RAM (SRAM) or dynamic RAM (DRAM). SRAM retains its contents as long as power is connected and is easy to interface to, but it uses six transistors per bit. Dynamic RAM is more complicated to interface to and control, and it needs regular refresh cycles to prevent its contents from being lost. However, DRAM uses only one transistor and one capacitor per bit, allowing it to reach much higher densities and, with more bits on a memory chip, to be much cheaper per bit. SRAM is not worthwhile for desktop system memory, where DRAM dominates, but it is used for cache memories. SRAM is commonplace in small embedded systems, which might only need tens of kilobytes or less. Forthcoming volatile memory technologies that hope to replace or compete with SRAM and DRAM include Z-RAM, TTRAM, A-RAM and ETA RAM.
3.2.1 DRAM
The term static differentiates SRAM from DRAM (dynamic random-access memory), which must be periodically refreshed. SRAM is faster and more expensive than DRAM; it is typically used for the CPU cache, while DRAM is used for a computer's main memory.
CAM is used in the opposite manner, to find the address from the data. A binary CAM requires an exact match, while a ternary CAM uses the X bit (don't-care bit) for a wildcard match. With a ternary CAM, the address that matches the most bits is the one selected. This is known as "longest-prefix matching", the routing-table lookup method of the Internet Protocol.
In RAM, data are stored at a particular location called an address, and a user supplies the address in order to retrieve the data. With CAM, the user supplies the data and gets the address back. The CAM searches through the memory in one clock cycle and returns the address where the data are found. An obvious question is how the data are stored in the first place. Data can be written to a CAM without knowing a memory address: binary data are automatically written to the next free location.
With CAMs, a longest-prefix-matching operation on a 32-bit IPv4 address can be performed using an exact-match search in 32 separate CAMs. The incoming IP address is given as input to all the CAMs, and their outputs, indicating the results of the match, are fed through a priority encoder that picks the CAM with the longest matching prefix. Such a solution is expensive in terms of both cost and complexity.
Hence, a more flexible type of CAM that enables comparisons of the input key with variable-length elements is desirable. Ternary CAMs were introduced to address this need. While a binary CAM stores one of two states, 0 and 1, for each bit in a memory location, a ternary CAM (TCAM) stores one of three states, 0, 1, and X (don't care), for each bit in a memory location. The don't-care state permits search operations beyond the exact match.
A TCAM stores an element as a pair consisting of a value and a mask, each of the same length. The value field stores the actual value of the prefix, and the mask field denotes the length of the element. If a prefix is Y bits long, the most significant Y bits of the value field are assigned the same value as the prefix and the remaining bits are set to 0, 1 or X; the most significant Y bits of the mask field are set to 1 and the remaining bits are set to 0. Thus, the mask bits indicate which bits in the value field are relevant. For example, a prefix of 110 will be stored as (110XXX, 111000), assuming the elements are 6 bits long. The prefixes are stored in the TCAM in descending order of their lengths. An incoming key matches a stored element if the bits of the value field for which the mask bit is 1 are identical to the corresponding bits of the incoming key.
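The value/mask convention described above can be illustrated with a short Python sketch (the longer 11010 prefix is an assumption added alongside the 110 example from the text); an entry matches when the key agrees with the value on every bit whose mask bit is 1, and because entries are kept in descending prefix-length order, the first match is the longest one.

# Behavioural sketch of TCAM longest-prefix matching with (value, mask) pairs.
# Entries are stored in descending order of prefix length, so the first
# matching entry corresponds to the longest prefix.

def tcam_match(value, mask, key):
    """True if key agrees with value on every bit position where mask is 1."""
    return (key & mask) == (value & mask)

# 6-bit entries: assumed prefix 11010*, then prefix 110*** from the text.
entries = [
    (0b110100, 0b111110),   # prefix 11010, length 5
    (0b110000, 0b111000),   # prefix 110,   length 3
]

key = 0b110101
for index, (value, mask) in enumerate(entries):
    if tcam_match(value, mask, key):
        print("longest-prefix match at entry", index)   # -> entry 0
        break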
The NAND BCAM cell is shown in Figure 4. Bits are stored in the NAND BCAM cell with the help of an SRAM structure. Comparison in the NAND BCAM cell is performed using three transistors, T1, TD1 and TD2. The NAND BCAM cell consumes less power because of the low load capacitance on the match line, but it suffers from high delay because of its long pull-down path. The main purpose of the CAM design is an architecture with low power consumption that does not degrade performance. Therefore, the NOR-type BCAM cell is normally preferred when designing cells for a CAM architecture because of its low delay; even in the worst case, the NOR cell has a better delay than the NAND cell.
A priority encoder is a circuit or algorithm that compresses multiple binary inputs into a smaller number of outputs. The output of a priority encoder is the binary representation of the index (counting from zero) of the most significant active input bit. A binary encoder converts one of 2^n inputs into an n-bit output, so it has fewer output bits than input lines. The priority encoder is another type of combinational circuit, similar to a binary encoder, except that it generates an output code based on the highest-priority active input.
The figure (8) shown below is the basic schematic of the priority encoder:
The priority encoder consists of the following components:
1. OR GATES
2. 4 INPUT NOR GATE
3. 3 INPUT NAND GATE
4. INVERTER
All these components together are used to design the schematic of the priority encoder; the figure below shows the waveforms obtained from its test bench. A behavioural sketch of the priority-encoding logic follows.
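As a behavioural illustration only (the gate-level schematic in the report is the actual design; the 4-input width here is an assumption), the sketch below returns the index of the highest-priority active input:

# Behavioural sketch of a 4-input priority encoder.
# Input 3 has the highest priority; the output is the index of the highest
# active input plus a 'valid' flag that is False when no input is active.

def priority_encode(inputs):
    """inputs: list of 4 bits, index 0 = least significant input."""
    for index in range(len(inputs) - 1, -1, -1):   # scan from highest priority
        if inputs[index]:
            return index, True
    return 0, False                                # no input asserted

print(priority_encode([1, 0, 1, 0]))   # -> (2, True): input 2 wins over input 0
print(priority_encode([0, 0, 0, 0]))   # -> (0, False)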
As with the AND function seen previously, the NAND function can also have any number of individual inputs, and commercially available NAND gate ICs come in standard 2-, 3- or 4-input types.
Fig 11 : Schematic Design of 3 Input NAND Gate
The Boolean expression for this 4-input NOR gate is therefore Q = (A + B + C + D)′, i.e. the complement of the OR of all four inputs. If fewer inputs are required than the gate provides, any unused inputs can be held LOW by connecting them directly to ground, using suitable pull-down resistors where needed.
The logic NOR function is sometimes known as the Pierce function and is denoted by the downwards arrow operator, as in A ↓ B.
Fig 12 : Schematic Design of 4 Input NOR Gate
Chapter Five
6-Bit Register
5.1 IMPLEMENTATION OF 6-BIT REGISTER
A flip-flop is a 1-bit memory cell that can be used for storing digital data. To increase the storage capacity in terms of the number of bits, we have to use a group of flip-flops; such a group of flip-flops is known as a register. The 6-bit register consists of six flip-flops and is capable of storing a 6-bit word. The binary data in a register can be moved within the register from one flip-flop to another.
A simple shift register can be made using only D-type flip-flops, one flip-flop for each data bit. The output of each flip-flop is connected to the D input of the flip-flop to its right. Shift registers hold data in their memory, and this data is moved or "shifted" to the required positions on each clock pulse.
The figure (13) shown below is the basic schematic of the 6-Bit Register:
The 6-Bit Register consists of the following components:
In a D flip-flop, the single input D is referred to as the data input. When the data input is 1 the flip-flop is set, and when it is 0 the flip-flop is reset. The CLOCK or ENABLE input isolates the data input from the flip-flop's latching circuitry, so unwanted changes on D are ignored: the D input is copied to the output Q only when the clock input is asserted.
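As a behavioural sketch of the 6-bit shift-register operation described above (not the transistor-level design, and the input sequence is an assumption), each clock pulse shifts the register contents by one position and loads the serial input into the first flip-flop:

# Behavioural sketch of a 6-bit serial-in shift register built from D flip-flops.
# On every clock pulse the stored bits move one position to the right and the
# serial input is captured by the left-most flip-flop.

class ShiftRegister6:
    def __init__(self):
        self.bits = [0] * 6          # Q outputs of the six D flip-flops

    def clock(self, serial_in):
        """Apply one rising clock edge with the given serial data input."""
        self.bits = [serial_in] + self.bits[:-1]
        return self.bits

reg = ShiftRegister6()
for bit in [1, 0, 1, 1, 0, 1]:       # assumed input sequence
    reg.clock(bit)
print(reg.bits)                      # -> [1, 0, 1, 1, 0, 1]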
The conventional search operation is performed in three steps: 1) the search lines (bit lines) b and bbar are reset to GND; 2) the match line (ML) is pre-charged to VDD; and 3) finally, the search key bit and its complementary value are placed on b and bbar respectively, with write disabled. If the search key bit is identical to the stored value, the ML-to-GND pull-down paths remain OFF and the ML remains at VDD, indicating a "match". Otherwise, if the search key bit differs from the stored value, one of the pull-down paths conducts and discharges the ML to GND, indicating a "mismatch".
Resetting b and bbar to GND before the ML pre-charge phase ensures that both pull-down paths are OFF and hence do not conflict with the ML pre-charging. The figure shows the search operation when 0 is stored in the cell: for b = '1' (bbar = '0'), the ML is discharged to '0', indicating a "mismatch".
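A behavioural sketch of this match-line evaluation for a single NOR-type cell is given below (purely illustrative; the actual behaviour is determined by the transistor-level schematic):

# Behavioural sketch of a NOR-type CAM cell search.
# The ML is pre-charged high; a pull-down path conducts (ML discharges)
# only when the search bit differs from the stored bit.

def nor_cell_search(stored_bit, search_bit):
    ml = 1                                    # ML pre-charged to VDD
    b, bbar = search_bit, 1 - search_bit      # search lines driven after reset
    pull_down = (stored_bit and bbar) or ((1 - stored_bit) and b)
    if pull_down:
        ml = 0                                # ML discharged: mismatch
    return "match" if ml else "mismatch"

print(nor_cell_search(stored_bit=0, search_bit=1))   # -> 'mismatch'
print(nor_cell_search(stored_bit=1, search_bit=1))   # -> 'match'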
In this project, we have surveyed CAM circuits and architectures, with an emphasis on high-capacity CAM. At the circuit level, we have reviewed the basic CMOS cells, namely the NOR-type BCAM and TCAM cells, the NAND cell and the XOR cell. Each CAM cell is analysed for power and delay using the Cadence Virtuoso tool at 90 nm. A supply voltage of 1.2 V and an operating frequency of 100 MHz are used for all the CAM cell structures.
The content-addressable memory (CAM) is designed and implemented with the existing CAM XOR array and a 90 nm based CAM XOR array. The average power consumption and the transistor count are expected to be reduced by the proposed CAM XNOR array and the 90 nm based CAM XNOR array, which should have much lower power than the other two designs.
REFERENCES
[2] S. Yang, W. Wang, N. Vijayakrishnan, and Y. Xie, “Low-leakage robust SRAM cell design for sub-100 nm technologies”, in Proc. ASPDAC, 2005, pp. 539-544.
[3] Mahendranath. B and Avireni Srinivasulu, “Performance analysis of a new CMOS output buffer”, in Proc. IEEE International Conference on Circuit, Power and Computing Technologies, Kumaracoil, India, Mar. 21-22, 2013, pp. 752-755. DOI: 10.1109/ICCPCT.2013.6529041.
[4] Madira Suma, V. Venkata Reddy and Avireni Srinivasulu, “Current mode Schmitt trigger based on ZC-current differencing transconductance amplifier”, in Proc. IEEE International Conference on Inventive Computation Technologies, pp. 439-443, Aug. 26-27, India.
[5] E. Komoto, T. Homma, and T. Nakamura, “A high-speed and compact-size JPEG Huffman decoder using CAM,” in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37–38.
[6] B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, “A single chip Lempel-Ziv data compressor,” in
Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 3, 1993, pp. 1953–1955.
[7] L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, “CAM-based VLSI architectures for dynamic
Huffman coding,” IEEE Trans. Consumer Electron., vol. 40, no. 3, pp. 282–289, Aug. 1994.
[8] M. Nakanishi and T. Ogura, “Real-time CAM-based Hough transform and its performance
evaluation,” Machine Vision Appl., vol. 12, no. 2, pp. 59–68, Aug. 2000.
[9] T. Srivyshnavi and A. Srinivasulu, “A current mode Schmitt trigger based on Current Differencing Transconductance Amplifier: without any passive components”, in Proc. IEEE International Conference on Signal Processing, Communications and Networking, Chennai, India, Mar. 26-28, 2015, pp. 1-4. DOI: 10.1109/ICSCN.2015.7219884.
[10] M. Meribout, T. Ogura, and M. Nakanishi, “On using the CAM concept for parametric curve extraction,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2126–2130, Dec. 2000.
[11] Shin, Y.C., Sridhar, R., Demjanenko, V., Palumbo, P.W. and Srihari, S.N., 1992. A special-
purpose content addressable memory chip for real-time image processing. IEEE Journal of Solid-
State Circuits, 27(5), pp.737- 744.
[12] Maurya, S.K. and Clark, L.T., 2011. A dynamic longest prefix matching content addressable
memory for IP routing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
19(6), pp.963-972.
[13] Jalaleddine, S.M., 1999. Associative memories and processors: The exact match paradigm. Journal of
King Saud University-Computer and Information Sciences, 11, pp.45-67.
[14] Bremler-Barr, A. and Hendler, D., 2012. Space-efficient TCAM-based classification using
gray coding. IEEE Transactions on Computers, 61(1), pp.18-30.
[15] Pagiamtzis, K. and Sheikholeslami, A., 2006. Content-addressable memory (CAM) circuits
and architectures: A tutorial and survey. IEEE Journal of Solid-State Circuits, 41(3), pp.712-727