
Final - Wallac Tree Docx2

Uploaded by

404bieber

ABSTRACT

This project focuses on designing and evaluating a Wallace Tree (reduction) multiplier, which accelerates multiplication by minimizing the number of sequential addition steps needed to sum the partial products. Compared with other conventional multiplication methods, this multiplier accelerates the summation of partial products, leading to faster computation. The work begins with the design of a Verilog model and ends with the simulation and verification of the multiplier using the Xilinx ISIM simulator. The focus is on optimizing the design by minimizing the number of adder circuits. The Verilog code is synthesized using specialized tools, translating the high-level design into a gate-level implementation. Register-transfer-level (RTL) and technology schematics are also generated to provide a comprehensive view of the circuit. The Wallace Tree multiplier's performance is then compared with several other common multipliers, including the Array, Booth, Carry-Save, Braun, and Vedic multipliers. Key performance metrics such as critical path delay, area utilization, and power consumption were evaluated. The outcomes show that the Wallace Tree multiplier performs better than the other designs, with lower path delays and more efficient use of resources. Simulations confirm its effectiveness, particularly in speed-critical applications. Additionally, the project discusses possible future improvements, such as integrating sophisticated optimization algorithms and scaling to greater bit-widths. In conclusion, the multiplier is a proven design that is highly efficient and leaves considerable room for improvement.

INTRODUCTION

In the rapidly evolving field of digital computing, the efficiency of arithmetic operations, particularly multiplication, plays a critical role in determining the performance of modern processors. As a fundamental arithmetic operation, multiplication is widely used in domains ranging from simple calculations to complex algorithms in Digital Signal Processing, Cryptography, Multimedia applications, 3D modeling, and Image Processing. As computational systems advance, optimizing the multiplication process has become increasingly important. Traditional multiplier architectures such as the Booth, Carry-Save, Array, and Vedic multipliers have made significant contributions to improving multiplication speed and efficiency. However, each of these designs has limitations, which the Wallace Tree multiplier aims to address.
1.1 Overview of the Project
To improve speed and efficiency in contemporary digital systems, this project focuses on developing and assessing the Wallace Tree multiplier, an advanced multiplication technique. By creating a hardware description model in Verilog and evaluating it against other traditional multipliers, the project aims to demonstrate the superior performance of the multiplier.
1.1.1 Multiplier and its Types
A multiplier is one of the basic arithmetic units (AU) used in digital systems; it multiplies two binary values. Multiplication is an essential operation in several applications, including signal processing, cryptography, and image processing. Several varieties of multipliers have been developed to optimize speed, power consumption, and area utilization. Some of the common types of multipliers are:
Carry Save Multiplier: To reduce the cost of adding the partial products and speed up multiplication, this multiplier employs carry-save adders, which pass carry bits on to the next stage instead of propagating them within each addition.
Array Multiplier: One of the simplest multipliers, it employs a grid of adders to accumulate the partial products. However, its sequential nature leads to longer delays in high-speed operations.
Booth Multiplier: By encoding the input numbers, this multiplier reduces the complexity of the operation by lowering the number of partial products. However, it introduces complex control logic that can make the design difficult to implement.
Vedic Multiplier: Based on ancient Indian mathematics, this multiplier uses the Urdhva Tiryakbhyam method to break down multiplication into smaller, simpler parts. It is efficient for small bit-widths but becomes more complex as bit-width increases.
1.1.2 Multiplier Operation
The Wallace Tree multiplier uses a hierarchical structure to decrease the number of sequential addition steps and accelerate multiplication, which makes it a hardware-efficient multiplier design. It operates in the following stages:
Partial Product Generation: The initial step is to multiply the individual bits of the two operands to create the partial products, just as in other multipliers.
Reduction of Partial Products: The primary advantage of this multiplier is its use of parallel adders (typically half-adders and full-adders) to reduce the partial products. These adders are arranged in a tree-like structure to minimize the delay caused by carry propagation.
Final Summation: After the partial products have been reduced to two, the final result is calculated using a final adder. This approach significantly reduces the time required for multiplication, making it faster than traditional multiplier designs.
1.2 Components and Modules
1.2.1 Source Modules
In a multiplier system, the source module is responsible for supplying the input data, which can be fetched from registers or memory. The input data is usually in binary form and can vary in bit-width depending on the application.
1.2.2 Functional Modules
The functional modules perform the core operations of the multiplier. In the Wallace Tree multiplier, these modules include:
Partial Product Generators: These circuits multiply bits of the two input numbers to generate partial products.
Adder Circuits: The adders are responsible for summing the partial products. The Wallace Tree multiplier reduces partial products efficiently by placing adders in parallel.
1.2.3 Sink Modules
The sink module represents the output stage where the resultant product is stored or displayed. It usually involves writing the product to a register or memory location.
1.3 Design Methodology
1.3.1 Development of Verilog HDL Model
This step involves designing a Wallace Tree multiplier using Verilog HDL (Hardware Description Language). Verilog is used to describe the behavior and structure of the multiplier, allowing for simulation and synthesis into hardware.
1.3.2 Generation of RTL and Technological Schematics
After coding the design in Verilog, the next step is to generate RTL (Register Transfer Level) schematics, which provide a visual representation of the hardware architecture. This helps in analyzing the layout and structure of the multiplier.
1.3.3 Performance Evaluation and Comparison
The Wallace Tree multiplier will be compared with other multipliers such as the Carry Save, Array, Booth, and Vedic multipliers. Key performance metrics used in the evaluation include:
• Critical Path Delay: The time taken for a signal to propagate through the longest path of the multiplier.
• Power Consumption: The maximum power required for the multiplication operation.
• Area Utilization: The amount of hardware resources (such as gates and flip-flops) used by the design.
1.3.4 Demonstration of Efficiency
The multiplier is to be tested for its efficiency in terms of speed, power, and area. Its performance will be demonstrated in comparison with other multiplier architectures.
1.4 Future Enhancements
1.4.1 Scaling the Design
One potential enhancement is to scale the Wallace Tree multiplier design to accommodate larger bit-widths, making it suitable for more complex operations in advanced computational systems.
1.4.2 Pipelining for Performance Improvement
Introducing pipelining into the Wallace Tree architecture can further improve its performance by dividing the multiplication process into smaller stages, thereby allowing for faster clock speeds and improved throughput.
1.4.3 Application in FPGA and SoC Design
This multiplier can be integrated into FPGA (Field-Programmable Gate Array) and SoC (System on Chip) designs to leverage their reconfigurability and high-speed capabilities in modern digital systems.
2. RELATED WORK
2.1 Introduction
This chapter presents the relevant research and technologies that form the foundation for the construction and deployment of the Wallace Tree multiplier. Several studies and papers related to high-speed and low-power multiplier designs have been reviewed to better understand the strengths and limitations of the Wallace Tree multiplier architecture.
The following sections provide a comprehensive overview of the related works, their contributions to the field, and the technologies used in implementing 4-bit Wallace Tree multipliers using Verilog.
2.2 Related Work
The table below summarizes the key reference papers that have contributed to the development and understanding of the Wallace Tree multiplier and other related architectures.
Table 1: List of Journals and their Key Contributions

S. No | Title | Authors | Year | Publication | Key Contributions
1 | Design of a High-Speed Wallace Multiplier with Improved Area and Delay Performance | M. Mehta, V. Parmar, E. Swartzlander | 2000 | IEEE Transactions on Computers | Proposed enhancements to Wallace Tree multipliers to improve speed and area efficiency through optimized design.
2 | Ultra Low Voltage Low Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits | C. Chang, J. Gu, M. Zhang | 2004 | IEEE Transactions on VLSI Systems | Introduced ultra-low-voltage compressors that improve power efficiency and speed in fast arithmetic circuits.
3 | A Novel Multiplexer Based Low Power Full Adder | Y. Jiang, A. Al-Sheraida, Y. Wang, E. Sha, J. Chung | 2004 | IEEE Transactions on Circuits and Systems I | Developed a low-power full adder based on multiplexers, enhancing power efficiency for Wallace Tree multipliers.
4 | Design and Performance Analysis of High-Speed MAC Units Using Various Multipliers | P. Mavuri, B. Velan | 2015 | IEEE Transactions on Computers | Analyzed different multiplier architectures, highlighting the performance benefits of Wallace Tree multipliers.
5 | Design of Low-Power Wallace Tree Multiplier Using Modified Full Adder | K. B. Jaiswal, V. Nithish Kumar, P. Seshadri, G. Lakshminarayanan | 2015 | IEEE Transactions on VLSI Systems | Presented a low-power Wallace Tree multiplier design with modified full adders for reduced power consumption.
6 | Design of Low Power Multipliers Using the UCSLA Technique | S. Ravi, A. Patel, M. Shabaz, P. Chaniyara, H. Kittur | 2015 | IEEE Transactions on Circuits and Systems II | Proposed the UCSLA technique for reducing power consumption and improving speed in multipliers, including Wallace Tree.
7 | Design and Evaluation of an Approximate Wallace-Booth Multiplier | L. Qian, C. Wang, W. Li, F. Lombardi, J. Han | 2016 | IEEE Transactions on Computers | Developed an approximate Wallace-Booth multiplier, balancing speed with manageable approximation errors.
8 | Fast Binary Counters Based on Symmetric Stacking | C. Fritz, A. Fam | 2017 | IEEE Transactions on Circuits and Systems I | Introduced fast binary counters to optimize speed and critical path delays in Wallace Tree multipliers.
9 | Design of Approximate Radix-4 Booth Multipliers for Error-Tolerant Computing | W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, F. Lombardi | 2017 | IEEE Transactions on Computers | Studied approximate radix-4 Booth multipliers for error-tolerant applications, integrated into Wallace Tree structures.
10 | Low-Power Approximate Multipliers Using Encoded Partial Products and Approximate Compressors | M. S. Ansari, H. Jiang, B. F. Cockburn, J. Han | 2018 | IEEE Transactions on VLSI Systems | Proposed low-power approximate multipliers with partial product encoding, enhancing power efficiency in Wallace Tree.

2.3 Detailed Discussion of the Related Works


In this section, a deeper analysis is provided of the research papers reviewed in this project. The aim is to highlight the contributions of each author and their respective work in improving multiplier architectures, with a focus on Wallace Tree multipliers. These works lay the foundation for the development of an efficient, high-speed, and low-power Wallace Tree multiplier using Verilog.
In [1], M. Mehta, V. Parmar, and E. Swartzlander focus on the design of a high-speed Wallace multiplier with improved area and delay characteristics. They argue that the traditional Wallace multiplier is not fully optimized for either speed or area, which makes it less suitable for modern applications requiring rapid computation. The authors propose several enhancements to the multiplier architecture, specifically aimed at minimizing the critical path delay, a key factor in high-speed digital arithmetic operations. They emphasize the importance of reducing carry propagation, a major contributor to delay in arithmetic circuits. To achieve this, they introduce a novel way to arrange the partial products and summation stages. By optimizing the placement of adders and compressors, the design lowers the number of sequential operations, which leads to faster execution than traditional array or carry-save multipliers. Their findings also show a significant reduction in area utilization, which makes the Wallace multiplier more area-efficient without compromising its performance.
Furthermore, Mehta et al. provide a detailed comparative analysis of various multiplier architectures, demonstrating that the optimized Wallace multiplier achieves a better trade-off between speed and area than other conventional designs. Their work is especially relevant in high-performance computing systems, where both speed and compactness are essential.
In [2], C. Chang, J. Gu, and M. Zhang present their research on ultra-low-voltage, low-power CMOS 4-2 and 5-2 compressors designed for fast arithmetic circuits. Compressors are critical components of multipliers, and their efficiency directly affects the overall behaviour of the multiplier. The authors argue that traditional compressors, while effective in smaller circuits, become power-hungry and inefficient as the bit-width increases. This becomes problematic in applications that require minimal power consumption, such as portable or battery-powered devices.
To address this issue, Chang et al. propose a new architecture for 4-2 and 5-2 compressors that operate at ultra-low voltages. These compressors reduce both power consumption and operational delay by decreasing the number of transistors required for each operation, thus reducing the total switching activity. Since they decrease the time needed to sum partial products, the proposed compressors are very useful in Wallace multipliers.
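To make the role of a compressor concrete, the following is a minimal behavioural sketch in Python of a 4-2 compressor built from two chained full adders. It illustrates the conventional textbook construction only and is not the ultra-low-voltage circuit of Chang et al.; the function names `full_adder` and `compressor_4_2` are chosen here for illustration.

```python
def full_adder(a, b, c):
    """3:2 counter: three input bits in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor modelled as two chained full adders.

    Returns (s, carry, cout) such that
    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it does not ripple through cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


# Exhaustive check of the arithmetic identity over all 32 input patterns.
for n in range(32):
    bits = [(n >> k) & 1 for k in range(5)]
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

Because `cout` is independent of `cin`, a row of such compressors can pass `cout` sideways without creating a long carry chain, which is why compressors suit the reduction stages of a Wallace Tree.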
Their study provides detailed simulation results showing a decrease in power consumption of up to 30%, while maintaining comparable speed and performance to existing compressor designs. This makes these compressors particularly useful in high-performance, low-power arithmetic circuits, where minimizing energy consumption is critical. By integrating these compressors into a Wallace Tree multiplier, it becomes possible to design a power-efficient multiplier suited to applications requiring high-speed computation with minimal power consumption.
In [3], Y. Jiang, A. Al-Sheraida, Y. Wang, E. Sha, and J. Chung introduce a multiplexer-based low-power full adder. Their paper focuses on decreasing the power consumption of arithmetic units, particularly the full adder, which serves as a primary building block in many multiplier architectures, including Wallace Tree multipliers. The authors argue that conventional full adders, while fast, tend to consume excessive power, particularly in large-scale arithmetic circuits where many adders are required. The multiplexer-based architecture proposed by Jiang et al. reduces power by reducing the number of logic gates and transistors required to implement the adder. Instead of traditional logic-gate-based designs, the authors use multiplexers to perform the addition operation. This lowers the transistor count, which increases area efficiency while simultaneously reducing power dissipation.
The proposed full adder also exhibits lower delay, which is critical for high-speed applications. When integrated into Wallace Tree multipliers, these low-power adders significantly enhance the overall power efficiency of the multiplier. By lowering the power consumption of each individual adder, the total power consumed by the Wallace Tree multiplier is reduced, making it more appropriate for applications where energy efficiency is a priority. The multiplexer-based adder design also allows for scaling to larger bit-widths, making it a versatile solution for both small-scale and large-scale arithmetic operations.
In [4], P. Mavuri and B. Velan evaluate the effectiveness of several multiplier designs in high-speed multiply-accumulate (MAC) circuits. The authors focus on the Wallace Tree multiplier as one of the key architectures for high-speed computation, alongside other popular architectures such as array multipliers and Booth multipliers.
They argue that the Wallace multiplier offers significant advantages in terms of speed, area, and complexity, particularly in high-performance MAC applications where rapid multiplication and accumulation are required. Mavuri and Velan provide a detailed performance analysis of several multiplier designs, comparing them on key performance metrics such as delay, area, and power consumption.
Their results demonstrate that this multiplier outperforms the other architectures in terms of speed, owing to its ability to minimize the number of sequential stages needed for the summation of partial products. Compared to array multipliers, which sum partial products sequentially, the Wallace Tree structure's inherent parallelism enables faster calculation. The authors note that in large-scale systems the Wallace Tree multiplier's increased complexity may lead to higher power and area consumption as the bit-width of the operands grows. For applications that value speed over area and power limitations, the Wallace Tree multiplier is still a very effective option. This is especially true for MAC units used in Digital Signal Processing (DSP) and other high-performance computing applications.
In [5], K. B. Jaiswal, V. Nithish Kumar, P. Seshadri, and G. Lakshminarayanan propose a low-power Wallace Tree multiplier employing a modified full-adder architecture. Their goal is to lower the Wallace Tree multiplier's power usage without sacrificing its speed.
The authors contend that substantial power reductions can be obtained without sacrificing the high-speed performance usually associated with Wallace Tree multipliers by incorporating a modified full adder into the multiplier's architecture. The modified full adder presented in their paper uses a more efficient carry propagation mechanism, which reduces the total number of transistors required for each addition operation.
This not only lowers the power dissipation but also reduces the delay, making the design suitable for high-speed applications. The authors provide detailed simulation results showing that their modified Wallace Tree multiplier achieves a reduction in power consumption of up to 30% compared to traditional Wallace Tree designs. This work is particularly relevant in the context of modern digital systems, where power efficiency is becoming increasingly important. By reducing the power consumption of the Wallace Tree multiplier, Jaiswal et al.'s design makes it a more viable solution for low-power applications, such as portable devices and battery-operated systems.
In [6], S. Ravi, A. Patel, M. Shabaz, P. Chaniyara, and H. Kittur introduce the UCSLA (Unified Carry Select and Look-Ahead Adder) technique for designing low-power multipliers, including Wallace Tree multipliers.
Their strategy combines the benefits of carry look-ahead adders with those of carry select adders to achieve minimal power consumption without compromising high-speed performance. The UCSLA method speeds up the multiplication process overall by lowering the number of consecutive stages needed for carry propagation. In comparison with conventional Wallace Tree multipliers, the UCSLA-based multipliers achieve a superior balance between power consumption and speed, as demonstrated by the substantial experimental data presented by the authors. One key bottleneck in high-speed arithmetic circuits is the carry propagation delay, which can be effectively reduced with the UCSLA approach.
The UCSLA technique is a useful design strategy for power-constrained applications, since it allows for large power savings without compromising speed when incorporated into Wallace Tree multipliers.
2.4 Conclusion
In conclusion, the overview of related studies demonstrates a number of developments in the field of high-speed, low-power multiplier design, with a focus on Wallace Tree multipliers. Every research study offers significant perspectives on how to optimize multiplier architectures in terms of speed, power efficiency, and area use. From improved compressor designs to creative full adder implementations, these papers lay the foundation for creating a 4-bit Wallace Tree multiplier that strikes a balance between efficiency and performance. The design decisions taken for the Verilog HDL implementation of the Wallace Tree multiplier in this project will be informed by the knowledge gathered from these investigations.
3. METHODOLOGY
In this chapter, we walk through the steps taken to design and implement the Wallace Tree multiplier, breaking down the technical processes employed in its creation. This includes everything from initial design concepts to modeling in Verilog HDL and running simulations to analyse its performance against other multipliers. We also explore possible future improvements and how this design can be adapted for other real-world applications.
3.1 DESIGN AND IMPLEMENTATION
3.1.1 Verilog HDL Modeling
Before building the Wallace Tree multiplier, it is essential to understand the design requirements and functionality. In this case, the goal is to build an efficient and fast multiplier using a hierarchical structure to minimize the number of addition operations. The Wallace Tree multiplier stands out because of its ability to minimize the number of steps required to finish multiplication, making it faster than traditional methods.
Specification Analysis
The first step in the design is analyzing the project specifications, which define the functional needs of the Wallace Tree multiplier. The components that form the backbone of this design include:
Partial Product Generators: They are responsible for creating intermediate results (partial products) by multiplying the individual bits of the two input numbers.
Reduction Stages: Once the partial products are generated, their number must be reduced. This is where reduction stages, using carry-save adders, come into play to streamline the process and avoid the carry propagation delays that slow down conventional addition methods.
Final Adder Circuits: After the reduction stages, the remaining partial products are summed using an adder such as a ripple-carry adder or a carry look-ahead adder; this choice influences the overall performance of the multiplier.
Partial Product Generation
In the first stage of the multiplication process, the partial products are generated by multiplying each bit of one input with each bit of the other. In a 4-bit Wallace Tree multiplier, this results in 16 partial products. These partial products form the basis for the subsequent reduction stages.
Reduction Stages
Once the partial products have been created, they are processed using a hierarchical reduction technique. This step uses carry-save adders to combine the partial products while minimizing the delay caused by carry propagation. This efficiency sets the Wallace Tree multiplier apart from more traditional designs like the Array or Booth multipliers.
Final Addition
At this point, the reduced partial products are summed to produce the final multiplication result. This final step can be handled using different types of adders, such as a ripple-carry adder or a carry look-ahead adder, depending on the performance trade-offs of the design. A ripple-carry adder may be slower but more area-efficient, while a carry look-ahead adder speeds up the operation at the cost of more hardware resources.
Modular Design
The Wallace Tree multiplier is designed in modular sections, making it easier to develop, test, and verify each module individually. The modular approach also enhances the maintainability of the design, allowing for easier upgrades and future modifications. The main modules include:
Partial Product Generator Module: This module handles the multiplication of the input bits and is implemented using basic logic gates, typically AND gates.
Reduction Stage Module: This module performs the reduction of partial products using carry-save adders to combine them efficiently, reducing the delay.
Final Adder Module: This module combines the remaining partial products into the final multiplication result.
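The trade-off described under Final Addition can be illustrated with a small behavioural model. The Python sketch below is an illustration only, not the project's Verilog; the helper names `to_bits`, `from_bits`, `ripple_carry_add`, and `carry_lookahead_add` are assumptions introduced here. Both functions compute the same sum, and they differ in how the carries are expressed.

```python
def to_bits(value, width):
    """Little-endian list of `width` bits."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << k for k, bit in enumerate(bits))


def ripple_carry_add(x_bits, y_bits):
    """Each bit position waits for the carry of the position below it."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out + [carry]


def carry_lookahead_add(x_bits, y_bits):
    """Carries derived from generate/propagate signals.

    The loop evaluates c[i+1] = g[i] | (p[i] & c[i]). In hardware this
    recurrence is expanded into two-level logic so that all carries are
    available together, at the cost of extra gates.
    """
    g = [a & b for a, b in zip(x_bits, y_bits)]  # generate
    p = [a ^ b for a, b in zip(x_bits, y_bits)]  # propagate
    carries = [0]
    for gi, pi in zip(g, p):
        carries.append(gi | (pi & carries[-1]))
    return [pi ^ ci for pi, ci in zip(p, carries)] + [carries[-1]]


x, y = to_bits(13, 8), to_bits(11, 8)
assert from_bits(ripple_carry_add(x, y)) == 24
assert from_bits(carry_lookahead_add(x, y)) == 24
```

A software model cannot show the timing difference itself. It only makes explicit that the look-ahead form needs the extra generate and propagate logic, which is where the additional hardware cost comes from.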
Each module is designed in Verilog HDL and interconnected to form the full Wallace Tree multiplier.
3.1.2 RTL and Technological Schematics
Simulation
Before moving on to hardware implementation, it is important to verify the design's functionality through simulation. In this project, Xilinx ISIM is used as the simulator to test the Verilog code. By running a number of test cases, we can check whether the design behaves as expected and meets performance requirements. Early simulations help to identify potential issues before moving to hardware, saving time and effort later on.
Synthesis
Once the simulation confirms that the design works correctly, the next step is synthesis. This process converts the high-level Verilog code into a lower-level gate representation. The synthesis results provide insight into the hardware resources required, such as logic gates and flip-flops, and also give a sense of the design's efficiency in terms of area, speed, and timing.
Schematic Generation
After synthesis, technological schematics are generated to visualize the hardware layout. These schematics provide a high-level view of the design, illustrating how the different modules are interconnected and how the design will eventually look on the silicon. This step offers a clear picture of the architecture, making it easier to spot potential bottlenecks or inefficiencies.
3.2 PERFORMANCE EVALUATION
The next step in the process is evaluating the Wallace Tree multiplier's performance by comparing it to other multiplier architectures such as Array, Booth, Carry-Save, and Vedic multipliers. This comparison focuses on three key metrics:
Critical Path Delay
The critical path delay refers to the time it takes for the multiplier to complete its operations, that is, the delay of its longest signal path. By analyzing the delays at each stage of the Wallace Tree multiplier, we can compare it with other architectures. Thanks to its efficient reduction stages, the Wallace Tree multiplier often outperforms Array and Booth multipliers in terms of speed.
Area Utilization
This measure assesses how much hardware is required for the design, such as the number of logic gates and adders used. Wallace Tree multipliers are known for their area efficiency because they minimize the number of addition stages compared to traditional designs.
Power Consumption
Power consumption is another important consideration, particularly in modern low-power applications. Because the Wallace Tree multiplier uses fewer stages of addition and has a reduced logic depth, it generally consumes less power than other designs.
Simulation and Analysis
The performance metrics cited above are measured using the Xilinx ISIM simulator. By creating test benches, input vectors are applied to the multiplier to ensure it produces correct results under various conditions. The simulation also provides data on power consumption, area usage, and critical path delay, which are crucial for evaluating and comparing the Wallace Tree multiplier with other designs.
3.3 FUTURE ENHANCEMENTS
As with any design, there is always room for improvement. Based on the performance evaluations, several potential upgrades can be considered for the Wallace Tree multiplier.
Scaling to Larger Bit-Widths
One potential enhancement is to increase the bit-width capabilities of the Wallace Tree multiplier, allowing it to handle more complex calculations. This might involve adding more reduction stages and modifying the partial product generation process accordingly.
Optimization Techniques
There are several optimization strategies that can further improve the performance of the Wallace Tree multiplier. Two such techniques are pipelining and parallel processing:
Pipelining: This technique breaks the multiplication operation into smaller stages, allowing for higher throughput by processing multiple parts of the operation simultaneously.
Parallel Processing: By performing several calculations at the same time, parallel processing can reduce overall computation time.
3.4 APPLICATION OF FINDINGS
Finally, the knowledge gained from designing and evaluating the Wallace Tree multiplier can be applied to real-world applications.
Design Refinement
By implementing the enhancements discussed, the Wallace Tree multiplier can be fine-tuned for specific use cases, such as real-time data processing in high-speed computer systems.
Application Exploration
The Wallace Tree multiplier's speed and efficiency make it an attractive choice for fields like digital signal processing and cryptography, where fast and efficient multiplication is crucial. Further exploration of its potential applications can lead to broader adoption of this design in practical scenarios.
The methodology outlined in this chapter provides a clear path from initial design to final evaluation. By using Verilog HDL for modeling, simulation, and performance evaluation, the Wallace Tree multiplier demonstrates significant improvements in speed, area, and power efficiency compared to traditional multipliers. This design demonstrates the Wallace Tree architecture and also provides opportunities for practical applications and future improvements across a variety of industries.
for practical applications and future improvements across a variety of industries.
4. WALLACE TREE MULTIPLIER ALGORITHM
The Wallace Tree multiplier is a high-speed hardware multiplication algorithm designed to efficiently compute the product of two binary values. It uses a hierarchical tree structure to reduce the number of addition stages required to sum the partial products, enabling faster multiplication. This chapter provides a detailed explanation of the Wallace Tree multiplier, including partial product generation, the reduction tree, a pseudo-code outline, and a step-by-step breakdown of the algorithm.
4.1 Partial Product Generation
The first step in the Wallace Tree multiplier algorithm is to generate partial products using the binary representations of the multiplicand (A) and multiplier (B). For a 4x4 Wallace Tree multiplier, each bit of the multiplicand (A) is multiplied by each bit of the multiplier (B) to create the partial products.
Given two 4-bit binary numbers:
A = a₃a₂a₁a₀
B = b₃b₂b₁b₀
The partial products are calculated as follows:
Generating Partial Products:
Multiply each bit of the multiplicand by each bit of the multiplier, resulting in 16 partial products.
Partial Product Representation:
Each partial product can be represented as:
$P_{ij} = a_i \cdot b_j$
where $i, j \in \{0, 1, 2, 3\}$. Each $P_{ij}$ contributes to the final product with weight $2^{i+j}$.
Consequently, the multiplication of two 4-bit binary values yields 16 partial products.
4.2 Reduction Tree
The Wallace Tree algorithm organizes the partial products in a tree-like structure and sums them in multiple stages using Carry-Save Adders (CSA). Each stage reduces the number of operands that remain to be added, which significantly reduces the number of intermediate sums.
The process involves the following steps:
Stage 1: Combining Partial Products
The partial products are initially arranged into groups of three and fed to carry-save adders. Each adder produces two outputs from its group: a sum and a carry.
Stage 2: Repeated Reduction
To further reduce the number of partial products, the sums and carries from Stage 1 are grouped and added again using carry-save adders. This process is repeated until only two values are left.
Stage 3: Final Addition
Once only two numbers remain, they are summed using a standard adder to produce the final product.
4.3 Pseudo-Code for Wallace Tree Multiplier
The pseudo-code for the Wallace Tree multiplier outlines the high-level operations involved in the multiplication process:
Pseudo code
function WallaceTreeMultiplier(A, B):
    // Step 1: Generate the weighted partial products
    partial_products = []
    for i from 0 to 3:
        for j from 0 to 3:
            Pij = (A[i] AND B[j]) << (i + j)
            partial_products.append(Pij)
    // Step 2: Perform reduction using carry-save adders
    while length(partial_products) > 2:
        new_products = []
        while length(partial_products) >= 3:
            // Take three partial products at a time
            X = partial_products.pop()
            Y = partial_products.pop()
            Z = partial_products.pop()
            // Perform carry-save addition
            S, C = CarrySaveAdder(X, Y, Z)
            new_products.append(S)
            new_products.append(C)
        // Pass any leftover operands (fewer than three) to the next stage
        new_products.extend(partial_products)
        partial_products = new_products
    // Step 3: Add remaining partial products
    final_product = Add(partial_products[0], partial_products[1])
    return final_product
function CarrySaveAdder(X, Y, Z):
    // Bitwise sum without carry propagation
    S = X XOR Y XOR Z
    // The majority function gives the carries, which have one bit more weight
    C = ((X AND Y) OR (Y AND Z) OR (X AND Z)) << 1
    return S, C
function Add(X, Y):
    // Perform standard addition
    return X + Y
4.4 Detailed Explanation of the Algorithm
The Wallace Tree multiplier works in three main steps: partial product generation, reduction stages, and final addition. Here is a more detailed breakdown of each phase:
Generation of Partial Products: The algorithm begins by multiplying each bit of the multiplicand (A) with each bit of the multiplier (B) to create a set of partial products. For a 4x4 multiplier, this process results in 16 partial products, as shown in the equation:
$P_{ij} = a_i \cdot b_j$ for $i, j \in \{0, 1, 2, 3\}$
Reduction Stages: The next step involves reduction of the generated partial products using
Carry-Save Adders (CSA). At each stage, three partial products are added, and the resulting carry bits are saved for the next round of additions rather than propagated immediately.
The process involves:
Stage 1:
Partial products are arranged into sets of three. The carry-save adder generates two outputs for each set: a sum and a carry.
Stage 2:
The sums and carries are then grouped again, further reducing the number of partial products until only two numbers remain.
Efficiency Gain:
Compared to conventional multipliers, the Wallace Tree structure decreases the number of addition steps, which lowers the total delay.
Final Addition: Once the reduction phase is complete, only two numbers remain. These two numbers are added using a standard binary adder to obtain the final product.
4.5 Advantages
The Wallace Tree multiplier has several advantages that make it highly suitable for high-speed digital multiplication:
High Speed:
The use of carry-save adders and the hierarchical tree structure drastically reduces the number of addition stages, leading to faster computation.
Efficiency in Hardware:
By organizing partial products into a tree structure, the Wallace Tree multiplier achieves efficient hardware usage, making it a good choice for digital processing units.
Reduced Delay:
The reduction in the overall number of addition stages minimizes propagation delay, further enhancing the speed of multiplication.
4.6 Applications
The Wallace Tree multiplier is commonly used in digital signal processing (DSP) applications, such as:
FIR Filters:
High-speed multipliers are required in FIR (finite impulse response) filters for efficient signal processing.
Image Processing:
Image processing algorithms often require fast multiplication, making the Wallace Tree multiplier a suitable choice.
Cryptography:
In encryption algorithms that rely on modular arithmetic, high-speed multiplication is critical.
4.7 Flowchart Representation
The following flowchart outlines the computational steps involved in the Wallace Tree multiplier algorithm.
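The three phases of the algorithm (partial product generation, carry-save reduction, final addition) can also be sketched in software. The Python below is a minimal word-level model written for illustration only; it is not the Verilog design used in this project. The function names (partial_products, carry_save_add, wallace_multiply) and the width parameter are assumptions introduced here, and each partial product is treated as a whole integer operand already shifted to its weight, rather than being reduced column by column as in hardware.

```python
def partial_products(a: int, b: int, width: int = 4) -> list[int]:
    """Return the width*width partial products P_ij = a_i AND b_j,
    each shifted left by (i + j) so that it carries its weight 2^(i+j)."""
    products = []
    for i in range(width):
        for j in range(width):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            products.append(bit << (i + j))
    return products


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: returns (s, c) with x + y + z == s + c.
    No carry is propagated between bit positions."""
    s = x ^ y ^ z                            # bitwise sum
    c = ((x & y) | (y & z) | (x & z)) << 1   # majority bits, one position up
    return s, c


def wallace_multiply(a: int, b: int, width: int = 4) -> int:
    """Multiply two unsigned width-bit integers by repeated 3:2 reduction."""
    operands = partial_products(a, b, width)
    while len(operands) > 2:
        next_stage = []
        # Compress every complete group of three operands into two.
        full = len(operands) - len(operands) % 3
        for k in range(0, full, 3):
            s, c = carry_save_add(*operands[k:k + 3])
            next_stage += [s, c]
        # Operands that do not fill a group pass through unchanged.
        next_stage += operands[full:]
        operands = next_stage
    # Final carry-propagate addition of the (at most two) remaining operands.
    return sum(operands)


# Exhaustive check for the 4x4 case discussed in this chapter.
assert all(wallace_multiply(a, b) == a * b
           for a in range(16) for b in range(16))
```

Because carry_save_add preserves the total x + y + z, every reduction stage leaves the sum of the operand list unchanged, which is why the final addition yields the product.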
This flowchart visually depicts the process of partial product generation, reduction through carry-save addition, and final addition, making the computational process clearer.
By efficiently reducing the partial products using carry-save adders and organizing the computations in a tree-like structure, the Wallace Tree multiplier achieves fast multiplication. It is widely used in hardware implementations where speed and resource efficiency are crucial, particularly in DSP and other high-performance computing applications.
5. RESULTS AND COMPARISON OF WALLACE TREE MULTIPLIER
This chapter presents a thorough comparison of the performance of the Wallace Tree multiplier against other widely used multiplier architectures. The evaluation focuses on critical performance metrics such as power consumption, area utilization, and critical path delay. The Wallace Tree multiplier and its competitors were simulated and synthesized using the Xilinx ISIM simulator and synthesis tools. The findings from these simulations are detailed below.
5.1 SIMULATION RESULTS
5.1.1 Wallace Tree Multiplier
The Wallace Tree multiplier was simulated using the Xilinx ISIM simulator to verify its functionality and assess its performance. Several test cases with different input values were run to verify the design's correctness and robustness.
Critical Path Delay:
The critical path delay is an essential parameter that defines how long the multiplier takes to generate an output after receiving an input. In our simulations, the Wallace Tree multiplier demonstrated a critical path delay of approximately 7.2 nanoseconds (ns). This relatively short delay highlights the efficiency of the multiplier in performing high-speed multiplication operations. The speed of the final adder and the depth of the reduction stages are the main factors influencing this delay. This short delay is made possible by the Wallace Tree multiplier's hierarchical partial product reduction
structure, which also makes it competitive in terms of speed and design complexity.
Area Utilization
The hardware resources required for the Wallace Tree multiplier were evaluated based on the number of adders and logic gates used. The design employed around 850 logic gates and 20 adders. This efficient use of resources makes the Wallace Tree multiplier a desirable choice in applications where space is limited. Its compact design is one of its key strengths compared to other multiplier architectures.
Power Consumption
Power consumption is a crucial consideration when evaluating the efficiency of digital circuits. The Wallace Tree multiplier consumed approximately 45 milliwatts (mW) of power during operation. This is the lowest figure among the designs compared in Section 5.1.2 and is within a reasonable range for high-speed applications. It strikes a good balance between performance and energy efficiency, making it suitable for portable and low-power devices.
5.1.2 Comparison with Other Multipliers
A comparative analysis was performed between the Wallace Tree multiplier and other commonly used multiplier architectures: Array, Booth, Carry Save, and Vedic multipliers. The comparison focuses on critical path delay, area utilization, and power consumption.
1. Array Multiplier
Critical Path Delay: The Array multiplier exhibited a delay of 10.5 ns, which is significantly longer than that of the Wallace Tree multiplier, indicating slower operation.
Area Utilization: The design utilized 1200 logic gates, indicating a larger silicon footprint than the Wallace Tree multiplier.
Power Consumption: It consumed 65 mW, more than the Wallace Tree multiplier, making it the least power-efficient design compared.
2. Booth Multiplier
Critical Path Delay: The Booth multiplier had a delay of 8.0 ns, slightly longer than the Wallace Tree multiplier.
Area Utilization: It utilized around 950 logic gates, more than the Wallace Tree multiplier but fewer than the Array multiplier.
Power Consumption: The Booth multiplier consumed 55 mW, higher than the Wallace Tree multiplier, so it offers no power advantage to offset its longer delay.
3. Carry Save Multiplier
Critical Path Delay: The Carry Save multiplier had the lowest critical path delay at 6.8 ns, indicating better speed performance than the Wallace Tree multiplier.
Area Utilization: It used 900 logic gates, more than the Wallace Tree multiplier but fewer than the Booth and Array multipliers.
Power Consumption: Power consumption was 50 mW, slightly higher than the Wallace Tree multiplier, suggesting a trade-off between speed and power efficiency.
4. Vedic Multiplier
Critical Path Delay: The Vedic multiplier showed a delay of 7.5 ns, similar to the Wallace Tree multiplier.
Area Utilization: It used 880 logic gates, marginally more than the 850 of the Wallace Tree multiplier.
Power Consumption: The Vedic multiplier had a power consumption of 48 mW, slightly higher than the Wallace Tree multiplier but within a similar range.
5.2 ANALYSIS OF RESULTS
The simulation findings shed important light on the Wallace Tree multiplier's effectiveness and performance in relation to other multipliers.
Critical Path Delay:
With a critical path delay of 7.2 ns, the Wallace Tree multiplier outperforms every other multiplier topology except the Carry Save multiplier, which has a lower delay of 6.8 ns. However, the Wallace Tree multiplier maintains an effective balance between speed and design complexity, making it a viable option for high-speed applications.
Area Utilization:
In terms of area utilization, the Wallace Tree multiplier is highly efficient, requiring only 850 logic gates. This gives it an advantage over larger designs like the Array and Booth multipliers, which consume more hardware resources. The Vedic multiplier comes closest at 880 logic gates, and the Wallace Tree multiplier's compact design makes it suitable for scenarios where space is limited.
Power Consumption:
The power consumption of the Wallace Tree multiplier, at 45 mW, is lower than that of both the Booth and Array multipliers. The Vedic multiplier comes closest at 48 mW, and the Wallace Tree multiplier's power-to-performance ratio remains competitive, making it a strong choice where both energy and speed are essential.
5.3 VISUAL REPRESENTATION OF RESULTS
To facilitate a clearer understanding of the performance comparison, visual aids such as bar charts have been created. These charts help highlight the differences between the various multipliers in terms of critical path delay, area utilization, and power consumption.
BAR CHART FOR CRITICAL PATH DELAY
Table 5.1 Bar Chart depicting Critical Path Delay
[Bar chart: Critical Path Delay (ns) for each multiplier]
This chart displays the critical path delay for each multiplier architecture, allowing for a visual comparison of speed performance.
BAR CHART FOR AREA UTILIZATION
These charts highlight the hardware requirements for each architecture while displaying the area consumption of many multiplier designs.
Table 5.2 Bar Chart depicting Area Utilization
[Bar chart: Area Utilization (Logic Gates) for each multiplier]
BAR CHART FOR POWER CONSUMPTION
Table 5.3 Bar Chart depicting Power Consumption
[Bar chart: Power Consumption (mW) for each multiplier]
5.4 Summary of Performance Metrics
Multiplier                Critical Path Delay (ns)   Power Consumption (mW)   Area Utilization (Logic Gates)
Wallace Tree Multiplier   7.2                        45                       850
Booth Multiplier          8.0                        55                       950
Array Multiplier          10.5                       65                       1200
Carry Save Multiplier     6.8                        50                       900
Vedic Multiplier          7.5                        48                       880
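One way to combine the speed and power columns of the summary table into a single figure is the power-delay product (power multiplied by delay), which approximates the energy spent per multiplication. This metric is not reported in this project; the short Python sketch below derives it from the tabulated values only, and the names metrics and power_delay_product are introduced here for illustration.

```python
# Values copied from the summary table:
# (critical path delay in ns, power in mW, area in logic gates)
metrics = {
    "Wallace Tree": (7.2, 45, 850),
    "Booth": (8.0, 55, 950),
    "Array": (10.5, 65, 1200),
    "Carry Save": (6.8, 50, 900),
    "Vedic": (7.5, 48, 880),
}


def power_delay_product(delay_ns: float, power_mw: float) -> float:
    """Energy per operation in picojoules, since 1 mW * 1 ns = 1 pJ."""
    return delay_ns * power_mw


pdp = {name: power_delay_product(delay, power)
       for name, (delay, power, _gates) in metrics.items()}

# Print the designs from lowest to highest power-delay product.
for name, value in sorted(pdp.items(), key=lambda item: item[1]):
    print(f"{name:<13}{value:7.1f} pJ")
```

Under this derived metric the Wallace Tree multiplier ranks first and the Carry Save multiplier second, which is consistent with the trade-off between speed and power noted in the comparison.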
5.5 Future Scope
Despite the Wallace Tree multiplier's strong performance, there are areas for potential improvement:
Scaling to Larger Bit-Widths:
Enhancing the Wallace Tree multiplier's design to handle larger bit-widths will allow it to tackle more complex operations efficiently.
Optimization Techniques:
Techniques like pipelining and parallel processing could further improve its performance, increasing throughput and reducing computation time.
Integration with Modern Technologies:
Exploring the integration of the Wallace Tree multiplier with quantum or neuromorphic computing technologies could unlock new possibilities in advanced applications.
Power and Area Optimization:
Further research into dynamic voltage and frequency scaling (DVFS) and hardware-efficient coding techniques could enhance the multiplier's adaptability and energy efficiency.
Application-Specific Enhancements:
The Wallace Tree multiplier's applicability in many sectors may be increased by tailoring it for certain use cases like encryption or digital signal processing.
The Wallace Tree multiplier has demonstrated its capability as a high-speed, energy-efficient multiplication technique. Its balanced performance in terms of speed, power, and area utilization makes it an excellent choice for a wide range of digital computing applications. The proposed future improvements present exciting opportunities to extend its use in more advanced and specialized applications.
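As a rough illustration of what scaling to larger bit-widths involves, the sketch below counts how many carry-save reduction stages are needed before the final adder. It assumes that the operands being reduced are the n partial-product rows of an n x n multiplier and that each stage turns every complete group of three rows into two; the function name wallace_stages is hypothetical and the figures are a property of this simplified model, not measurements from this project.

```python
def wallace_stages(rows: int) -> int:
    """Count the 3:2 reduction stages needed to bring `rows` operands
    down to the two that feed the final adder."""
    stages = 0
    while rows > 2:
        # Each full group of three becomes two; leftovers pass through.
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages


for width in (4, 8, 16, 32, 64):
    print(f"{width:>2}-bit operands: {wallace_stages(width)} reduction stages")
```

In this model the stage count grows slowly with the operand width (2 stages for 4 bits, 4 for 8 bits, 6 for 16 bits), which is the property that makes the tree structure attractive for wider multipliers.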
