UNIT - IV
Low-Voltage Low-Power Multipliers:
1. Introduction
2. Overview of Multiplication
3. Types of Multiplier Architectures
4. Braun Multiplier
5. Baugh-Wooley Multiplier
6. Booth Multiplier
7. Introduction to Wallace Tree Multiplier
Introduction
• Multiplication is an important fundamental function in arithmetic operations.
• In fact, multiplication-based operations such as Multiply and Accumulate (MAC)
and inner product are among some of the frequently used computation-intensive
arithmetic functions currently implemented in many Digital Signal-Processing
(DSP) applications (such as convolution, Fast Fourier Transform (FFT), filtering,
and others).
• They usually contribute significantly to the time delay and take up a great deal of
silicon area in the DSP system.
• Since multiplication dominates the execution time of most DSP algorithms, using
a high-speed multiplier is very desirable.
• Currently, multiplication time is still the dominant factor in determining the
instruction cycle time of a DSP chip.
• With an ever-increasing quest for greater computing power on battery-operated
mobile devices, design emphasis has shifted from optimizing conventional delay
time and area size to minimizing power dissipation while still maintaining high
performance.
Overview of Multiplication
• Multiplication can be considered as a series of repeated additions.
• The number to be added is called the multiplicand, the number of times it is added is called the
multiplier, and the result obtained is called the product.
• The basic operations involved in multiplication are generating the partial products and accumulating
(adding) them. Consequently, to speed up the entire multiplication process, these two major steps must
be optimized.
• The two main categories of binary multiplication are multiplication of unsigned numbers and
multiplication of signed numbers.
Unsigned multiplication
• Real time computer applications require fast multiplication.
• By utilizing AND gates and full adders, multiplication can be implemented on a processor in much the
same way as it is done by hand: each digit of the multiplier is multiplied by the multiplicand to
generate the partial products, and the partial products are then summed to generate the final result.
• Assume that X and Y are two n-bit unsigned numbers, where X is the multiplicand and Y is the
multiplier.
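• The multiplicand X and multiplier Y can be represented as follows:
  $X = \sum_{i=0}^{n-1} x_i 2^i, \qquad Y = \sum_{j=0}^{n-1} y_j 2^j$
• The product of X and Y is P, and it can be written in the following form:
  $P = X \cdot Y = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{i+j}$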
The process of multiplying two unsigned Binary Coded-Decimals (BCDs)
using the paper-and-pencil method
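The AND-gate view of unsigned multiplication can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `partial_products` and `multiply_unsigned` are hypothetical, and the final summation stands in for the adder hardware.

```python
def partial_products(x, y, n):
    """Return the n partial-product rows of two n-bit unsigned operands.

    Row j holds the bits x_i AND y_j (one AND gate per bit), weighted by 2**j.
    """
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n)
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        row = 0
        for i in range(n):
            x_i = (x >> i) & 1
            row |= (x_i & y_j) << i   # one AND gate produces one product bit
        rows.append(row << j)         # align the row to its weight 2**j
    return rows


def multiply_unsigned(x, y, n):
    """Sum the partial products, as the adder stage of a multiplier would."""
    return sum(partial_products(x, y, n))


print(partial_products(0b1011, 0b1101, 4))   # [11, 0, 44, 88]
print(multiply_unsigned(0b1011, 0b1101, 4))  # 143
```

Each row is either zero or a shifted copy of the multiplicand, which is why generation is cheap and accumulation dominates the cost.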
Shift-add multiplication algorithms
• Shift-and-add multiplication is similar to the multiplication performed by paper
and pencil.
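• Conceptually, the product is the multiplicand X added to itself Y times, where Y denotes the
multiplier; the shift-and-add method obtains it with one shifted addition per multiplier bit.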
• To multiply two numbers by paper and pencil, the algorithm is to take the digits of
the multiplier one at a time from right to left, multiplying the multiplicand by a
single digit of the multiplier and placing the intermediate product in the
appropriate positions to the left of the earlier results.
When multiplying with right shifts, the partial
product terms are
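A right-shift shift-and-add loop can be sketched as below. The recurrence used here, p(j+1) = (p(j) + y_j · X · 2^n) / 2 with p(0) = 0, is the usual textbook form and is an assumption, since the expression itself is missing from these notes. The function name is hypothetical.

```python
def shift_add_multiply(x, y, n):
    """Multiply two n-bit unsigned numbers with n add-and-right-shift steps.

    x is the multiplicand, y the multiplier. The multiplicand is added into
    the upper half of a 2n-bit accumulator, which is then shifted right.
    """
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n)
    p = 0
    for j in range(n):
        if (y >> j) & 1:      # examine multiplier bits from right to left
            p += x << n       # add the multiplicand to the upper half
        p >>= 1               # shift the partial product one bit right
    return p


print(shift_add_multiply(13, 11, 4))  # 143
```

After n steps the term added at step j has been shifted right n - j times, so it carries the weight 2^j expected of that multiplier bit.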
Multiplication of signed numbers
Types of Multiplier Architectures
1. Serial multipliers
2. Parallel multipliers
• Braun multiplier
• Baugh-Wooley multiplier
• Tree multipliers (Wallace multiplier)
3. Serial-parallel multipliers
1. Serial multipliers
• The serial multiplier uses a successive addition algorithm. It is simple
in structure because both operands are entered in a serial manner.
• Therefore, the physical circuit requires less hardware and a minimum
amount of chip area.
• However, the speed performance of the serial multiplier is poor because the
operands are entered sequentially.
2. Parallel multipliers
• Three important criteria to be considered in the design of multipliers are the chip
area, speed of computation and power dissipation.
• Most advanced digital systems incorporate a parallel multiplication unit to carry out
high-speed mathematical operations.
• A microprocessor requires multipliers in its arithmetic logic unit, and a digital signal
processing system requires multipliers to implement algorithms such as
convolution and filtering.
• Today, high-speed parallel multipliers with much larger areas and higher complexity
are used extensively in Reduced Instruction Set Computers (RISC), Digital Signal
Processing (DSP), and graphics accelerators.
• Some examples of the parallel multiplier are the array multipliers such as the Braun
multiplier and Baugh-Wooley multiplier, as well as the tree multipliers like the
Wallace multiplier.
• Array multipliers have a more regular layout, although tree multipliers are generally
faster.
• The major drawback of these multipliers is their relatively large chip area.
They offer high-speed performance, but they are expensive in terms of
silicon area and power consumption.
• This is because both operands are input to a parallel multiplier in a
parallel manner. As a result, the circuitry occupies a much larger area and is more
complex than that of serial multipliers.
3. Serial-parallel multipliers
• The serial-parallel multiplier serves as a good trade-off between the time-consuming serial
multiplier and the area-consuming parallel multipliers.
• These multipliers are used when there is a demand for both high speed and small area. In a
device using the serial-parallel multiplier, one operand is entered serially and the other is
stored in parallel with a fixed number of bits.
• The resultant enhancement in the processing speed and the chip area will become more
significant when a large number of independent operations are performed.
• Contemporary digital signal processing algorithms for image processing and
telecommunication applications are increasingly dependent on both matrix arithmetic and
vector-like arithmetic.
• In addition, given the exponentially rising processor performance requirement and the
arithmetic-intensive nature of many applications, such as speech and image processing,
waveform shaping, infinite impulse response digital filtering, channel equalization,
networking, multimedia and computer vision, high-speed multipliers will continue to be
in high demand.
Braun Multiplier
• Edward Louis Braun first proposed the Braun multiplier in 1963.
• It is a simple parallel multiplier that is commonly known as the
Carry Save Array Multiplier.
• Braun multiplier is a type of parallel array multiplier. The architecture
of Braun multiplier mainly consists of some Carry Save Adders, array of
AND gates and one Ripple Carry Adder.
• This multiplier is restricted to performing multiplication of two unsigned
numbers.
• It consists of an array of AND gates and adders arranged in an iterative
structure that does not require logic registers.
• This is also known as the non-additive multiplier since it does not add
an additional operand to the result of the multiplication.
• An n x n-bit Braun multiplier is constructed with
• n(n-1) adders,
• n^2 AND gates and
• (n-1) rows of Carry Save Adders.
• Each partial product can be generated in parallel with the AND gates.
Figure: Schematic diagram of a 4 x 4-bit Braun multiplier
X: 4-bit multiplicand, Y: 4-bit multiplier
P: 8-bit product of X and Y
Pn = XiYj is a product bit
Ripple Carry Adders are used at the final stage of the array to output the final result.
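The array structure can be checked with a small bit-level model. The sketch below is an illustration, not the schematic itself: it assumes the usual carry-save arrangement of (n-1) rows of (n-1) adders followed by one ripple-carry row of (n-1) adders, and the names `full_adder` and `braun_multiply` are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def braun_multiply(x, y, n):
    """Bit-level model of an n x n unsigned Braun array.

    Returns (product, number_of_adder_cells_used).
    """
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> j) & 1 for j in range(n)]
    adders = 0
    sums = [a[k] & b[0] for k in range(n)]   # row 0: AND gates only
    carries = [0] * (n - 1)
    product_bits = [sums[0]]                  # P0
    for j in range(1, n):                     # (n-1) carry-save rows
        new_sums, new_carries = [], []
        for k in range(n - 1):
            # carries move down to the next row instead of rippling sideways
            s, c = full_adder(a[k] & b[j], sums[k + 1], carries[k])
            adders += 1
            new_sums.append(s)
            new_carries.append(c)
        new_sums.append(a[n - 1] & b[j])      # leftmost product bit passes through
        sums, carries = new_sums, new_carries
        product_bits.append(sums[0])          # Pj
    carry = 0
    for k in range(n - 1):                    # final ripple-carry row
        s, carry = full_adder(sums[k + 1], carries[k], carry)
        adders += 1
        product_bits.append(s)
    product_bits.append(carry)                # P(2n-1)
    product = sum(bit << w for w, bit in enumerate(product_bits))
    return product, adders


print(braun_multiply(13, 11, 4))  # (143, 12)
```

For n = 4 the model uses 12 adder cells, which agrees with the n(n-1) count given above. The first carry-save row always receives a zero carry-in, so its cells never need a third input.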
• Performance of Braun multiplier
• The Braun multiplier performs well for unsigned operands that are less than 16 bits, in
terms of speed, power and area. Besides, it has a simple and regular structure as
compared to other multiplier schemes.
• The number of components required in building the Braun multiplier increases
quadratically with the number of bits.
• Another pitfall of the Braun multiplier is its potential susceptibility to glitching
problems at the last stage of the full adders, due to the use of Ripple Carry
Adders (RCA).
• Speed consideration
• The delay of the Braun multiplier is dependent on the delay of the adder
cell and also on the final adder in the last row.
• In the multiplier array, a full adder with balanced carry and sum delays is
desirable because the sum and carry signals are both in the critical path.
• The worst-case multiplication time of a Braun multiplier can be
• Enhanced Braun multiplier
• Replacing the full adders FA1, FA2, FA3, and FA12 in the figure above with half adders can enhance the performance of the Braun multiplier.
• Each replacement results in a saving of three logic gates.
• Beyond replacing full adders with half adders, optimizing the interconnections between the adders, so that the delay through each adder's path
is approximately the same, can further enhance the performance of the Braun multiplier.
• Figure: Modified Braun multiplier obtained by optimally interconnecting two full adders with fast input and fast output
Baugh-Wooley Multiplier
• The Baugh-Wooley multiplier is an enhanced version of the Braun multiplier.
• It is designed to cater to multiplication of both signed and unsigned operands, which are
represented in the 2's complement system.
• The partial products are adjusted so that the negative signs are moved
to the last steps, which in turn maximizes the regularity of the multiplication array.
• Architecture of Baugh-Wooley multiplier
• The architecture of the Baugh-Wooley multiplier is also based on the carry-save algorithm.
It inherits the regular and repeating structure of the array multiplier.
• The structure of a 4 x 4-bit 2's complement multiplier is shown in the figure, with the cell
number representing the type of basic cell.
The macroscopic views of these basic cells are shown in the figure.
Algorithm of Baugh-Wooley multiplier
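• The Baugh-Wooley multiplier operates on signed operands in 2's complement representation and
rearranges the partial products so that the signs of all of them are positive.
For n-bit 2's complement operands $X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$ and $Y = -y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j$, the product is
$P = x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - 2^{n-1}\sum_{i=0}^{n-2} x_i y_{n-1} 2^i - 2^{n-1}\sum_{j=0}^{n-2} x_{n-1} y_j 2^j$
It is observed that the last two terms of this equation are subtracted from the partial product.
To avoid subtractor cells and use only adders, these negative terms should be transformed.
Therefore, using a step-by-step approach, this 2's complement multiplication algorithm can be converted into an
equivalent parallel array expression, as adopted by the Baugh-Wooley multiplier.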
Each partial product bit is the output of one AND gate of the multiplier.
The signs of all partial products are positive.
• The multiplication result of a 4 x 4-bit multiplier is an 8-bit output, so there are eight
vertical columns in the table, and each product term is made up of one AND gate.
The variables with bars denote prior inversions. Inverters are connected before the inputs of the full
adders or the AND gates as required by the algorithm.
Each column represents the addition in accordance with the respective weight of the product term.
In this scheme, a total of n(n - 1) + 3 full adders are required.
Hence, for the case of n = 4, the array requires 15 adders.
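The starting point of the Baugh-Wooley derivation can be verified numerically. The sketch below assumes the standard four-term expansion of a 2's complement product (one sign-bit term, the positive cross terms, and two subtracted terms) and compares it with ordinary signed multiplication. The helper names are hypothetical.

```python
def to_bits(value, n):
    """Return the n-bit 2's complement bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def twos_complement_product(x, y, n):
    """Evaluate the four-term expansion of an n-bit signed product."""
    xb, yb = to_bits(x, n), to_bits(y, n)
    sign_term = (xb[n - 1] & yb[n - 1]) << (2 * n - 2)
    positive = sum((xb[i] & yb[j]) << (i + j)
                   for i in range(n - 1) for j in range(n - 1))
    # The two terms below are the ones the Baugh-Wooley scheme must
    # transform so that the array needs adders only.
    negative_x = sum((xb[i] & yb[n - 1]) << (i + n - 1) for i in range(n - 1))
    negative_y = sum((xb[n - 1] & yb[j]) << (j + n - 1) for j in range(n - 1))
    return sign_term + positive - negative_x - negative_y


def baugh_wooley_adder_count(n):
    """Full-adder count quoted in the notes: n(n - 1) + 3."""
    return n * (n - 1) + 3


print(twos_complement_product(-3, 5, 4))  # -15
print(baugh_wooley_adder_count(4))        # 15
```

Every bit product in the expansion is still a single AND gate; only the sign handling of the last two sums differs from the unsigned case.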
2's complement number system
Signed multiplicands must first be converted into their 2's complement representation
before multiplication. A 2's complement generator is shown in Fig.
When the control line Comp-Sig (Complementary Signal) goes high, the XOR gates invert the input bits and
a 1 gets added to the result. The generated result is the 2's complement of the input bits.
Conversely, when Comp-Sig goes low, the multiplicand inputs do not get inverted and a 0 gets
added to them.
Once the signed multiplicands get processed, the Most Significant Bit (MSB) of the result would then
indicate the sign of the result (1 for negative, 0 for positive).
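The behaviour of the 2's complement generator can be modelled as follows. This is a sketch of the described behaviour only, not of the circuit in the figure; the function name and the word-level treatment of the XOR gates are assumptions.

```python
def twos_complement_generator(value, comp_sig, n):
    """Model an n-bit 2's complement generator.

    Every input bit is XORed with Comp-Sig, and Comp-Sig is also added
    as the carry-in, so comp_sig = 1 yields the 2's complement of the
    input and comp_sig = 0 passes the input through unchanged.
    """
    assert comp_sig in (0, 1)
    mask = (1 << n) - 1
    inverted = (value & mask) ^ (mask if comp_sig else 0)  # bank of XOR gates
    return (inverted + comp_sig) & mask                    # add 1 or 0


print(format(twos_complement_generator(0b0101, 1, 4), "04b"))  # 1011
print(format(twos_complement_generator(0b0101, 0, 4), "04b"))  # 0101
```

In the first call the MSB of the result is 1, which marks the value as negative (-5 in 4-bit 2's complement).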
• Performance consideration
• The area and power consumption of multiplier structures vary with the number
of operand bits and the layout strategy. Increasing regularity and locality at the silicon level
reduces the power consumption in a standard-cell-based design flow.
• Since the Baugh-Wooley multiplier is an evolution of the Braun multiplier, its performance
can also be improved by using the optimized interconnections mentioned earlier, as shown in
Fig.
Booth Multiplier
• Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed
binary numbers in two's complement notation.
• The algorithm was invented by Andrew Donald Booth in 1950-1951.
• Area-efficient and fast multipliers are the essential blocks for high-performance computing.
• Therefore, multipliers should be small enough that a large number of them can be
integrated on a single chip.
• Conventional array multipliers, like the Braun multiplier and the Baugh-Wooley multiplier,
achieve comparatively good performance but require large areas of silicon, unlike the
add-shift algorithms, which require less hardware but exhibit poorer performance.
• The Booth multiplier makes use of the Booth encoding algorithm to reduce the
number of partial products by considering two bits of the multiplier at a time, thereby
achieving a speed advantage over other multiplier architectures.
• This algorithm is valid for both signed and unsigned operands.
• The Booth multiplier is a common arithmetic operator in DSP applications such as filtering and
Fourier transforms, where it is used to achieve high execution speed. Multipliers
tend to consume most of the power in DSP computation.
• Booth's algorithm
• In 1951, A.D. Booth proposed the Booth algorithm (also known as the radix-2 algorithm) for
multiplication, which accepts numbers in 2's complement form. It can
handle signed binary multiplication by using 2's complement representation.
• In order to start the process, an imaginary 0 is appended to the right of the multiplier.
Subsequently, the current bit x_i and the previous bit x_{i-1} of the multiplier x_{n-1} x_{n-2} ... x_1 x_0 are
examined in order to yield the ith bit y_i of the recoded multiplier y_{n-1} y_{n-2} ... y_1 y_0.
• At this point, the previous bit x_{i-1} serves only as a reference bit. In its turn, x_{i-1} will be recoded
to yield y_{i-1}, with x_{i-2} serving as the reference bit.
• For i = 0, the corresponding reference bit x_{-1} is defined to be zero.
Standard radix-2 Booth multiplication rules
• The rules for a standard radix-2 Booth recoding are as follows:
1. Append a zero to the right of the Least Significant Bit (LSB) of the multiplier.
2. Inspect groups of two adjacent bits of the multiplier, starting with the LSB and the
appended zero.
If the pair is 00 or 11, then shift the partial product 1 bit to the right.
If the pair is 01, then add the multiplicand to the partial product and shift the partial
product 1 bit to the right.
If the pair is 10, then subtract the multiplicand from the partial product and shift the new
partial product 1 bit to the right.
3. Proceed with overlapping pairs of bits such that the MSB of a pair becomes the LSB of
the next pair. In this manner, 1 bit of the multiplier is eliminated in each pass
through the algorithm.
4. When the last pair of bits is examined, the partial product is updated following the rules
above, except that no shift is performed.
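The four rules above can be traced with a short Python sketch. It is a behavioural model under stated assumptions: the multiplicand is aligned by a factor of 2^(n-1) before it is added or subtracted, so that the right shifts give each step its correct weight, and Python's arithmetic right shift on integers plays the role of the sign-preserving shift. The function name is hypothetical.

```python
def booth_radix2_multiply(multiplicand, multiplier, n):
    """Radix-2 Booth multiplication of an n-bit 2's complement multiplier."""
    assert -(1 << (n - 1)) <= multiplier < (1 << (n - 1))
    p = 0
    prev = 0                                  # rule 1: the appended zero
    for i in range(n):
        cur = (multiplier >> i) & 1           # rule 2: inspect the pair (cur, prev)
        if (cur, prev) == (0, 1):
            p += multiplicand << (n - 1)      # pair 01: add the multiplicand
        elif (cur, prev) == (1, 0):
            p -= multiplicand << (n - 1)      # pair 10: subtract the multiplicand
        # pairs 00 and 11: no arithmetic, shift only
        if i < n - 1:                         # rule 4: no shift after the last pair
            p >>= 1                           # arithmetic shift right by 1 bit
        prev = cur                            # rule 3: overlapping pairs
    return p


print(booth_radix2_multiply(5, -3, 4))   # -15
print(booth_radix2_multiply(-7, -8, 4))  # 56
```

For the multiplier -3 (1101), the pairs examined are 10, 01, 10 and 11, so the model performs one subtraction, one addition, one subtraction, and a final step with no arithmetic.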
Modified Booth algorithm
• Recoding the multiplier in a higher radix is a powerful way to speed up the
standard Booth multiplication algorithm.
• Since a greater number of bits can be inspected and eliminated in each cycle,
the total number of cycles required to obtain the product is
reduced.
• The number of bits inspected, n, in radix r is given by $n = 1 + \log_2 r$.
For example, in each cycle of the radix-4 algorithm, 3 bits are inspected and two are
eliminated.
The modified Booth's algorithm also commences by appending a zero to the right of
the LSB.
Radix-4 algorithm
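A radix-4 recoding can be sketched as below. The recoding table is not present in these notes, so the sketch assumes the standard rule digit = -2·x_{2k+1} + x_{2k} + x_{2k-1}, which inspects 3 bits and retires 2 per step. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier, n):
    """Recode an n-bit 2's complement multiplier into n/2 digits in {-2..2}."""
    assert n % 2 == 0
    assert -(1 << (n - 1)) <= multiplier < (1 << (n - 1))
    digits = []
    prev = 0                                   # appended zero right of the LSB
    for k in range(0, n, 2):
        b0 = (multiplier >> k) & 1
        b1 = (multiplier >> (k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)     # inspect 3 bits: b1, b0, prev
        prev = b1                              # groups overlap by one bit
    return digits


def booth_radix4_multiply(multiplicand, multiplier, n):
    """Sum the n/2 recoded partial products, each weighted by 4**k."""
    digits = booth_radix4_digits(multiplier, n)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))


print(booth_radix4_digits(9, 8))        # [1, -2, 1, 0]
print(booth_radix4_multiply(10, 9, 8))  # 90
```

An 8-bit multiplier yields four recoded digits, so only four partial products have to be accumulated instead of eight.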
Radix-8 algorithm
The radix-8 algorithm performs one of eight non-zero operations on the
multiplicand: +M, +2M, +3M, +4M, -4M, -3M, -2M and -M,
where M is the multiplicand.
The radix-8 Booth multiplier is widely used for high-performance
signed multiplication by means of an encoding process.
Radix encoding is widely used to reduce the number of partial
products, and thereby the number of additions; radix-4 encoding,
for instance, requires half the number of iterations.
The radix-8 method improves the system's speed by reducing
the number of partial products to N/3, where N is the number of multiplier
bits.
RADIX-8 BOOTH’S MULTIPLIER
Performance of the Radix-8 multiplier can be enhanced by introducing parallelism which
results in reducing the number of calculation stages.
Example: Input = 00001010 = 10, Coefficient = 00001001 = 9, filter output = 90

Radix-8 Booth Encoding process: recoding
Recoded multiplier digits: j+2 = (i+1) - (i+2), j+1 = i - (i+1), j = (i-1) - i
Operation on multiplicand (M) = (j+2)*2^2 + (j+1)*2^1 + j*2^0

Multiplier (Y) bits      Recoded multiplier     Operation on
i+2  i+1  i    i-1       j+2   j+1   j          multiplicand (M)
0    0    0    0         0     0     0          0*M
0    0    0    1         0     0     1          +1*M
0    0    1    0         0     1     -1         +1*M
0    0    1    1         0     1     0          +2*M
0    1    0    0         1     -1    0          +2*M
0    1    0    1         1     -1    1          +3*M
0    1    1    0         1     0     -1         +3*M
0    1    1    1         1     0     0          +4*M
1    0    0    0         -1    0     0          -4*M
1    0    0    1         -1    0     1          -3*M
1    0    1    0         -1    1     -1         -3*M
1    0    1    1         -1    1     0          -2*M
1    1    0    0         0     -1    0          -2*M
1    1    0    1         0     -1    1          -1*M
1    1    1    0         0     0     -1         -1*M
1    1    1    1         0     0     0          0*M
20-Jun-22 30
Radix-8 Booth Encoding process: operations on the multiplicand
0*M    multiplicand is multiplied by '0'
+1*M   the product remains the same as the multiplicand
+2*M   shift left the multiplicand by one place (+1M process + add 0 LSB)
+3*M   i.e. (1M+2M): addition of the multiplicand and the multiplicand shifted left by one digit
+4*M   i.e. (2M+2M): shift left the multiplicand by two places
-4*M   i.e. (-2M-2M): shift left the two's complement of the multiplicand by two places
-3*M   i.e. (-1M-2M): addition of the two's complement of the multiplicand and the two's complement of the multiplicand shifted left by one bit
-2*M   shift left the two's complement of the multiplicand by one bit (-1M process + add 0 LSB)
-1*M   two's complement of the multiplicand
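The radix-8 recoding can be reproduced with a short Python sketch. Expanding the recoded-digit definitions from the table gives the digit value -4·x_{i+2} + 2·x_{i+1} + x_i + x_{i-1}; the function names are hypothetical, and sign-extending the multiplier to a multiple of 3 bits is an assumption of this sketch.

```python
def radix8_digit(b2, b1, b0, prev):
    """Operation selected by the bits (i+2, i+1, i, i-1): a digit in {-4..4}."""
    return -4 * b2 + 2 * b1 + b0 + prev


def booth_radix8_digits(multiplier, n):
    """Recode an n-bit 2's complement multiplier into ceil(n/3) digits."""
    assert -(1 << (n - 1)) <= multiplier < (1 << (n - 1))
    groups = -(-n // 3)                 # ceil(n / 3)
    digits = []
    prev = 0                            # appended zero right of the LSB
    for g in range(groups):
        i = 3 * g
        # Python's >> sign-extends, so bits beyond n-1 repeat the sign bit
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        b2 = (multiplier >> (i + 2)) & 1
        digits.append(radix8_digit(b2, b1, b0, prev))
        prev = b2
    return digits


def booth_radix8_multiply(multiplicand, multiplier, n):
    """Sum the recoded partial products, each weighted by 8**k."""
    digits = booth_radix8_digits(multiplier, n)
    return sum(d * multiplicand * 8 ** k for k, d in enumerate(digits))


print(booth_radix8_digits(9, 8))        # [1, 1, 0]
print(booth_radix8_multiply(10, 9, 8))  # 90
```

With the input 10 as multiplicand and the coefficient 9 as multiplier, the 8-bit multiplier is recoded into three digits and the result matches the filter output of 90 quoted in the example. The +3M and -3M cases are the costly ones, because they need a real addition rather than a shift.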
Booth encoder
• The Booth encoder implements Booth encoding of the three multiplier bits
and also handles the sign extension logic.
• Each encoder is dedicated to one partial product in the array. Since there is
a circuit for each of the five possible generated partial product signals, one
and only one signal is high during steady-state operation.
• The carry propagation circuits are independent of the partial product
circuits, and they do not share any inputs.
Wallace Tree Multiplier
• Booth's algorithm effectively reduces the number of partial products by half.
• However, for large-operand multipliers such as 32-bit and above, the partial
products are longer than 16 bits and are considered unacceptably large.
• In this case, the performance of the modified Booth algorithm is degraded.
• The Wallace tree multiplication algorithm, however, can reduce the number of
partial products by employing multiple-input compressors capable of accumulating
several partial products concurrently.
• In 1964, C.S. Wallace proposed the Wallace Tree multiplier, which can handle the
multiplication process for large operands.
• This is achieved by minimizing the number of partial product bits in a fast and
efficient way by means of a CSA tree constructed from 1-bit full adders.
• A Wallace tree multiplier is a parallel multiplier that uses the carry-save addition
algorithm to reduce latency.
The Wallace tree has three steps:
1. Partial Product Generation Stage
2. Partial Product Reduction Stage
3. Partial Product Addition Stage
The Wallace tree approach can reduce the number of partial products, in parallel,
with a resulting overall delay proportional to $\log_{3/2} n$, where n is the number of rows.
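The three stages can be sketched at word level in Python. This is a simplification under stated assumptions: whole rows are grouped in threes and compressed with a row-wide 3:2 compressor, whereas a real Wallace tree groups bits column by column. The function names are hypothetical.

```python
def compress_3_2(a, b, c):
    """Row-wide 3:2 compressor: one full adder per bit column.

    Returns (sum_row, carry_row) with a + b + c == sum_row + carry_row.
    """
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def wallace_multiply(x, y, n):
    """Unsigned n-bit multiply; returns (product, reduction_levels)."""
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n)
    # Stage 1: partial product generation
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]
    # Stage 2: partial product reduction, three rows into two per compressor
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, grouped, 3):
            next_rows.extend(compress_3_2(*rows[k:k + 3]))
        next_rows.extend(rows[grouped:])   # leftover rows pass through
        rows = next_rows
        levels += 1
    # Stage 3: final carry-propagate addition of the last two rows
    return sum(rows), levels


print(wallace_multiply(13, 11, 4))    # (143, 2)
print(wallace_multiply(200, 99, 8))   # (19800, 4)
```

Each level multiplies the row count by roughly 2/3, which is where the logarithmic base 3/2 comes from: 8 rows need 4 levels and 32 rows need 8.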
The main disadvantage of the Wallace tree algorithm is that the architecture exhibits some
irregularities in the layout because it has a relatively complicated interconnection scheme.
• The conventional Wallace tree algorithm reduces the propagation stages by incorporating
3:2 compressors.
4:2 compressors
• In 1981, A. Weinberger of IBM originated the idea of 4:2 compressors.
• He proposed to reduce the number of propagation stages in tree multipliers by replacing the
3:2 compressors of the conventional Wallace tree algorithm with 4:2 compressors.
• The advantage of tree multipliers is that their delay increases only logarithmically
with the operand length, as opposed to iterative arrays, for which the delay
increases linearly with the size of the operands.
• The 4:2 compressor is able to yield a much more regular structure than the 3:2 counter
because it can reduce four inputs of the same weight to two.
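A 4:2 compressor can be modelled as two cascaded full adders. This construction is one common realization and is an assumption here, since the notes do not show the circuit; the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two full adders.

    Returns (sum, carry, cout), where carry and cout both have twice the
    weight of the inputs. cout depends only on x1..x3, not on cin, so
    carries do not ripple along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


# Exhaustive check of the defining identity over all 32 input patterns
for bits in product((0, 1), repeat=5):
    total, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == total + 2 * (carry + cout)
print("4:2 compressor identity holds for all inputs")
```

Because four rows become two in a single level, a tree of 4:2 compressors halves the row count per level and can be laid out as a regular binary structure.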