
Design and Implementation of Floating-Point Butterfly Architecture Based on Multi-Operand Adders


K. Lakshmi Narasamma
VLSI & ES, Department of ECE,
Pace Institute of Technology and Sciences,
NH-5, Near Valluramma Temple, Ongole, Prakasam District, A. P.

K. Sundeep
Associate Professor, Department of ECE - Communication Systems,
Pace Institute of Technology and Sciences,
NH-5, Near Valluramma Temple, Ongole, Prakasam District, A. P.

Abstract: This paper presents the design and implementation of a floating-point butterfly architecture for FFT processors. A fast Fourier transform (FFT) coprocessor has a noticeable impact on the performance of communication systems and has been a hot topic of research for many years. The FFT function consists of consecutive multiply-add operations over complex numbers, known as butterfly units. Applying floating-point (FP) arithmetic to FFT architectures, and to the butterfly units in particular, has become more popular recently; it offloads compute-intensive tasks from the general-purpose processor by dismissing FP concerns (e.g., scaling and overflow/underflow). The main difficulty of an FP butterfly, however, is that it is slow compared with its fixed-point counterpart, which motivates the development of a high-speed FP butterfly architecture. This brief proposes a fast FP butterfly unit built around a fused dot-product-add (FDPA) unit that computes AB ± CD ± E over operands represented in binary signed digit (BSD) form. A three-operand FP BSD adder and an FP BSD constant multiplier are the constituents of the proposed FDPA unit. A carry-limited BSD adder is proposed and used in the three-operand adder and in the multiplier so as to improve the speed of the FDPA unit. Moreover, modified Booth encoding is used to speed up the BSD multiplier. Synthesis results show that the proposed FP butterfly architecture is much faster than previous designs, although at the cost of more area.

Index Terms: Binary signed digit (BSD) representation, butterfly unit, fast Fourier transform (FFT), floating point (FP), redundant number system, three-operand addition.

INTRODUCTION:
The Fourier transform was introduced by Joseph Fourier in 1811 in a paper presented to the French Academy of Sciences. Soon after the paper was published, and largely unbeknownst to its author, it began to have a lasting impact on a wide range of problem domains. Fourier presented a method of transforming a finite set of equally spaced samples from their original domain (e.g., time) to the frequency domain. The Fourier transform applies this concept to an infinite set of complex values, whereas the discrete form does not involve integration; the continuous transform has many applications in physics and engineering, but it is the discrete Fourier transform that can easily be implemented on computer systems [53]. The discrete Fourier transform (DFT) and its applications in computing led James Cooley and John Tukey to the fast Fourier transform (FFT) in 1965 [18]. Using a divide-and-conquer method, the FFT reduces the computational complexity of the DFT from O(N^2) to O(N log N). With such a significant reduction in computational complexity, the FFT made many previously difficult Fourier transform problems convenient to solve. It turned the field of digital signal processing around, and it is considered one of the most influential algorithms of the 20th century [22].

The FFT is used in many applications and systems and is one of the most commonly used algorithms on modern computers [22]. Modern computers have become significantly faster over the past several decades, allowing us to solve larger and more complex problems. However, some large-scale problems remain difficult: even with the O(N log N) complexity of the FFT, some large-scale problems still tie up modern computers. Parallelization is a way to overcome these computational deficiencies. By using several computers, or several cores, in parallel to solve a problem, we can greatly reduce the computation time. With the growing prevalence of multi-processor and multi-core architectures, parallelization has become a commonplace strategy.

The divide-and-conquer nature of the FFT algorithm makes it an obvious candidate for parallelization, and FFT parallelization has been (and remains) a major subject of research since the algorithm's introduction in 1965 [18]. FFT circuits contain several stages of multipliers and adders over complex numbers, so choosing a suitable number representation for these units is an important design decision. FFT structures based on floating-point (FP) operations, rather than fixed-point arithmetic, have been growing in popularity recently [1], [2]. The main advantage of FP over fixed-point arithmetic is the wide dynamic range it introduces, but this comes at a higher cost. In addition, using the IEEE 754-2008 standard [3] for FP arithmetic allows an FFT coprocessor to collaborate with general-purpose processors; offloading compute-intensive tasks from the processor can lead to higher performance.

The main drawback of FP units is that they are slower than their fixed-point counterparts. One way to speed up FP arithmetic is to fuse several operations into a single FP unit, and hence save delay, area, and power consumption [2]. Another way to overcome the slowness of FP addition is to use redundant number systems, in which no word-wide carry propagation occurs in the intermediate operations.

LITERATURE SURVEY:
The Sequential Fast Fourier Transform Explained:
The discrete Fourier transform (DFT) is an operation performed on a series of elements to convert the underlying domain (e.g., time) to frequency, or vice versa. The result has many useful applications, and the transform is one of the most widely used algorithms of the 20th and 21st centuries [22]. The typical DFT operation performed on N elements x(0), x(1), ..., x(N - 1) is defined as

X(k) = sum_{n=0}^{N-1} x(n) * e^(-j*2*pi*k*n/N), for k = 0, 1, ..., N - 1.

Each element of the output array X is a sum that requires a contribution from every element of x. Thus, a DFT operation on an N-element input array costs O(N^2). For applications with large arrays, computing the DFT this way takes a prohibitively long time, so it is desirable to reduce the time complexity of the DFT algorithm.
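To make the O(N^2) cost concrete, the definition above can be evaluated directly. The following Python sketch is illustrative only (it is not part of the paper) and computes each output bin with a full pass over the input:

```python
import cmath

def dft(x):
    """Direct evaluation of the DFT definition: O(N^2) operations."""
    N = len(x)
    X = []
    for k in range(N):                     # one output bin per iteration
        acc = 0j
        for n, xn in enumerate(x):         # every input contributes to every bin
            acc += xn * cmath.exp(-2j * cmath.pi * k * n / N)
        X.append(acc)
    return X

# Example: an 8-point DFT of a real ramp signal.
print(dft([0, 1, 2, 3, 4, 5, 6, 7]))
```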
The fast Fourier transform algorithm, introduced by Cooley and Tukey in 1965 [18], significantly reduces the complexity of computing the discrete Fourier transform; the key is its divide-and-conquer partitioning of the problem. With the FFT method, we are able to compute the DFT of an N-element input in O(N log N) time. The original paper [18] gave only a very brief introduction to the FFT, but its effect was evident as the field quickly turned its attention to it, and several papers published shortly afterwards described and generalized the FFT algorithm in more detail [28]. For many years after its introduction, development of the technique stayed attached to the name "Cooley-Tukey algorithm", and research focused either on the theoretical aspects of the FFT algorithm or on the details of particular FFT proposals.

The Cooley-Tukey Algorithm:
The Cooley-Tukey algorithm, as introduced in [18], reduces the amount of computation (both additions and multiplications) needed to calculate the discrete Fourier transform by using a divide-and-conquer approach. The Cooley-Tukey algorithm outlined here is the radix-2 DIT (decimation-in-time) variant, and it serves as the base for a simple FFT algorithm.

The basic steps of the algorithm are as follows (a recursive sketch of these steps is given after the list):
1. Decimate - split the original input into two smaller DFTs (i.e., radix-2), one over the even-indexed samples and one over the odd-indexed samples.
2. Multiply - multiply each element of the odd-indexed DFT by the appropriate root of unity (called a twiddle factor [28]).
3. Butterfly - (see Figure 4.1) add each element of one small DFT to the corresponding element of the other.
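The three steps can be written as a short recursive routine. This Python sketch is again only an illustration of the radix-2 decimation-in-time recursion (assuming the input length is a power of two), not the hardware architecture discussed later:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT (len(x) must be a power of 2)."""
    N = len(x)
    if N == 1:
        return list(x)
    # Step 1: decimate into even- and odd-indexed halves and transform each.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    # Step 2: multiply the odd-half outputs by the twiddle factors W_N^k.
    twiddled = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    # Step 3: butterfly - combine the halves by addition and subtraction.
    return [even[k] + twiddled[k] for k in range(N // 2)] + \
           [even[k] - twiddled[k] for k in range(N // 2)]

# Example: matches the direct DFT of the same input up to floating-point rounding.
print(fft_radix2([0, 1, 2, 3, 4, 5, 6, 7]))
```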

Figure 4.1: A radix-2 DIF butterfly (a) and a radix-2 DIT butterfly (b), where W is the twiddle factor.

The basic building block of the FFT algorithm can be realized with a butterfly operation [23]. There are two types of butterfly operations, decimation in time (DIT) and decimation in frequency (DIF); both are shown in Figure 4.1. The difference between them is whether the multiplication by the twiddle factor takes place before the addition/subtraction (DIT) or after it (DIF), as the small numeric sketch below illustrates.
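As a small numeric illustration (not taken from the paper), the two butterfly flavours can be written directly on complex operands A and B with twiddle factor W:

```python
import cmath

def butterfly_dit(A, B, W):
    """Radix-2 DIT butterfly: multiply by the twiddle factor first, then add/subtract."""
    t = W * B
    return A + t, A - t

def butterfly_dif(A, B, W):
    """Radix-2 DIF butterfly: add/subtract first, then multiply the difference by W."""
    return A + B, (A - B) * W

# Example with an 8-point twiddle factor W = e^(-j*2*pi/8).
W = cmath.exp(-2j * cmath.pi / 8)
print(butterfly_dit(1 + 2j, 3 - 1j, W))
print(butterfly_dif(1 + 2j, 3 - 1j, W))
```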
Because the FFT is based on divide and conquer over the input range, it is most efficient when the input length is N = R^p, where R is the radix and p is a positive integer. An N-point FFT then computes its butterflies in p connected stages.

An example of a hardware-mapped N = 16-point radix-2 DIF FFT is shown in Figure 4.2. The input data x(n) enter in natural order; the output data X(k), however, can be observed to emerge in a scrambled order. The outputs are produced in bit-reversed order, and re-ordering the data is known as bit reversal (described in Section 5.3); a small sketch of the permutation is given below. The bit-reversed order of the FFT in Figure 4.2 means that the input and output must be reshaped if the natural order is required. In Figure 4.2 the parallel data paths can be thought of as moving vertically, while controlled cross connections reconfigure the results back into natural order. If all of the arrows in Figure 4.2 are reversed, the corresponding DIT FFT is performed instead.
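A minimal sketch of the bit-reversal permutation that restores natural order; the helper names here are illustrative and do not correspond to the Section 5.3 hardware:

```python
def bit_reverse_indices(n_bits):
    """Return the bit-reversed index permutation for a 2**n_bits-point FFT."""
    size = 1 << n_bits
    return [int(format(i, f'0{n_bits}b')[::-1], 2) for i in range(size)]

def reorder(data, n_bits):
    """Permute bit-reversed FFT output back into natural order."""
    perm = bit_reverse_indices(n_bits)
    return [data[perm[i]] for i in range(len(data))]

# Example: for N = 16 (4 bits), index 1 (0001) maps to 8 (1000), and so on.
print(bit_reverse_indices(4))
```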
Existing systems
FFT
An FFT processor chip was originally designed for this purpose: the FFT processor is the focal point of both the OFDM transmitter and the receiver. High FFT performance combined with low energy consumption and high throughput makes an ASIC implementation attractive for this computationally demanding operation.

Architecture
The FFT and IFFT of Equations 2.1 and 2.2 have the property that, if
FFT(Re(xi) + jIm(xi)) = Re(Xi) + jIm(Xi)
and
IFFT(Re(Xi) + jIm(Xi)) = Re(xi) + jIm(xi),
where xi and Xi are N-word-long sequences of complex-valued samples and sub-carriers respectively, then
1/N * FFT(Im(Xi) + jRe(Xi)) = Im(xi) + jRe(xi).
Therefore only the FFT, and not a separate IFFT, needs to be implemented in the equalizer. To calculate the inverse transform, the real and imaginary parts of the input and output are simply swapped. Since N is a power of two, the 1/N scaling is equal to shifting the binary word right by log2(N) bits; even simpler, one can just remember that the binary point has moved log2(N) bits to the left. If no bits are actually shifted, the scaling never shows up in the hardware, but depending on how the IFFT output is used, it may still need to be accounted for.
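This identity can be checked numerically. The sketch below is purely behavioural (it uses NumPy's FFT rather than the processor described here): it computes the inverse transform with a forward FFT by swapping real and imaginary parts and applying the 1/N scaling:

```python
import numpy as np

def ifft_via_fft(X):
    """Inverse transform using only a forward FFT:
    swap Re/Im on input and output, then scale by 1/N."""
    N = len(X)
    swapped = X.imag + 1j * X.real          # Im(Xi) + j*Re(Xi)
    Y = np.fft.fft(swapped) / N             # forward FFT plus 1/N scaling
    return Y.imag + 1j * Y.real             # swap back: Im(.) + j*Re(.)

x = np.array([0.5 - 1j, 2 + 0.25j, -1 + 3j, 0.75 - 2j])
X = np.fft.fft(x)
print(np.allclose(ifft_via_fft(X), x))      # True: matches np.fft.ifft(X)
```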

Figure 4.2: A hardware-mapped N = 16-point radix-2 DIF FFT algorithm.

Figure 4.3: A hardware-mapped N = 16-point radix-2^2 DIF FFT algorithm.

PROPOSED SYSTEMS:
Proposed Butterfly Architecture:
Efficient FFT algorithms [5] simplify the calculation of an N-input FFT into (N/2)-input FFT calculations. Continuing this decomposition down to the 2-input FFT block yields what is known as the butterfly unit. The proposed butterfly unit is in fact a complex FP multiply-add over its operands. Expanding the complex numbers, Fig. 1 shows the required modules. According to Fig. 1, the constituent operations of the butterfly unit are a dot-product (e.g., BreWim + BimWre) followed by an addition/subtraction, which leads to the proposed FDPA operation (e.g., BreWim + BimWre + Aim). Implementation details of the FDPA, over FP operands, are discussed below; a scalar sketch of the expansion follows.
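For illustration, the complex butterfly outputs A + WB and A - WB can be expanded into scalar dot-product-add terms of the FDPA form AB ± CD ± E. The sketch below is a plain floating-point model, not the BSD hardware of the proposal:

```python
def butterfly_via_fdpa(Are, Aim, Bre, Bim, Wre, Wim):
    """Expand X0 = A + W*B and X1 = A - W*B into dot-product-add terms
    such as Bre*Wim + Bim*Wre + Aim (the FDPA pattern)."""
    x0_re = Bre * Wre - Bim * Wim + Are
    x0_im = Bre * Wim + Bim * Wre + Aim
    x1_re = -(Bre * Wre - Bim * Wim) + Are
    x1_im = -(Bre * Wim + Bim * Wre) + Aim
    return complex(x0_re, x0_im), complex(x1_re, x1_im)

# Cross-check against direct complex arithmetic.
A, B, W = 1 + 2j, 3 - 1j, 0.7071 - 0.7071j
print(butterfly_via_fdpa(A.real, A.imag, B.real, B.imag, W.real, W.imag))
print(A + W * B, A - W * B)
```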
The significands of the operands Are, Aim, Bre, and Bim are represented in BSD, while the exponents of all inputs are assumed (after removing the bias) to be in two's complement representation. In this representation, each position holds a binary value from {-1, 0, 1}, encoded by one negatively weighted bit (negabit) and one positively weighted bit (posibit). The carry-limited BSD addition requires some additional wiring for these numbers, as shown in Fig. 2, where capital (small) letters refer to negabits (posibits). The critical path delay of this adder is three full-adders. The proposed FDPA is therefore composed of a redundant FP multiplier followed by a three-operand FP adder. A small software model of the BSD digit set and of a limited-carry addition rule is sketched below.
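As a software illustration of the digit set only: the sketch below encodes operands with BSD digits in {-1, 0, 1} and adds them with a textbook limited-carry signed-digit rule, in which the transfer out of each position depends only on that position and the next lower one. It is an assumption-laden model for intuition; the exact negabit/posibit encoding and the adder cell of the paper's Fig. 2 are not reproduced here:

```python
def to_bsd(value, width):
    """Encode an integer as BSD digits in {-1, 0, 1}, least significant digit first
    (here simply the ordinary binary digits; BSD allows many encodings)."""
    return [(value >> i) & 1 for i in range(width)]

def bsd_value(digits):
    """Value of a BSD digit vector (LSB first)."""
    return sum(d << i for i, d in enumerate(digits))

def bsd_add(x, y):
    """Limited-carry BSD addition: position i only needs p_i and p_{i-1}."""
    n = len(x)
    p = [x[i] + y[i] for i in range(n)]            # position sums in {-2..2}
    t = [0] * (n + 1)                              # transfer digits
    w = [0] * n                                    # interim sums
    for i in range(n):
        prev = p[i - 1] if i > 0 else 0
        if p[i] == 2:      t[i + 1], w[i] = 1, 0
        elif p[i] == 1:    t[i + 1], w[i] = (1, -1) if prev >= 0 else (0, 1)
        elif p[i] == 0:    t[i + 1], w[i] = 0, 0
        elif p[i] == -1:   t[i + 1], w[i] = (0, -1) if prev >= 0 else (-1, 1)
        else:              t[i + 1], w[i] = -1, 0  # p[i] == -2
    return [w[i] + t[i] for i in range(n)] + [t[n]]  # digits stay in {-1, 0, 1}

a, b = to_bsd(37, 8), to_bsd(90, 8)
s = bsd_add(a, b)
print(bsd_value(s), 37 + 90)                       # both print 127
```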
A. Proposed floating-point multiplier:
The proposed multiplier, like other parallel multipliers, consists of two major steps, namely, partial product generation (PPG) and PP reduction (PPR). However, in contrast to traditional multipliers, our multiplier keeps the product in redundant format and hence requires no final carry-propagating adder. The exponents of the input operands are taken care of in the same way as in traditional FP multipliers; normalization and rounding, however, are left to the next block of the butterfly architecture (i.e., the three-operand adder).

1) Partial Product Generation: The PPG step of the proposed multiplier is completely different from the traditional one because of the representation of the input operands (B and W). Moreover, given that Wre and Wim are constants [5], the multiplications of Fig. 1 (over the significands) are computed through a series of shifters and adders. With the intention of reducing the number of adders, we store the modified Booth encoding of the W significands [4]. Representing Wre and Wim with modified (radix-4) Booth encoding, the multiple of B supplied for every two binary positions of the coefficient W is chosen from {0, ±B, ±2B}, as shown in Table I. Fig. 3 shows the wiring needed for the generation of the PPs based on Table I, where each PP consists of (n + 1) digits (i.e., binary positions).
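A small sketch of radix-4 modified Booth recoding, which scans overlapping groups of three bits and produces one digit from {-2, -1, 0, 1, 2} per pair of bit positions, so each partial product is 0, ±B, or ±2B. This is a generic software model (the paper instead stores the recoded constant W); the function names are illustrative:

```python
def booth_radix4_digits(w, n_bits):
    """Radix-4 modified Booth recoding of an unsigned n_bits-wide multiplier.
    Returns digits (LSB group first), each selecting 0, +/-B or +/-2B."""
    digits = []
    prev = 0                                   # implicit bit to the right of bit 0
    for i in range(0, n_bits, 2):
        b0 = (w >> i) & 1
        b1 = (w >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)      # value of the 3-bit overlapping group
        prev = b1
    digits.append(prev)                        # final digit for an unsigned multiplier
    return digits

def booth_multiply(b, w, n_bits):
    """Multiply using the Booth digits: sum of digit_i * b * 4**i."""
    return sum(d * b * (4 ** i) for i, d in enumerate(booth_radix4_digits(w, n_bits)))

print(booth_radix4_digits(0b1101101, 8))                  # digits in {-2..2}
print(booth_multiply(23, 0b1101101, 8), 23 * 0b1101101)   # both 2507
```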
2) Partial Product Reduction: The major constituent of the PPR step is the proposed carry-limited addition over the operands represented in BSD format. This carry-limited addition circuitry is shown in Fig. 2 (two-digit slice). Since each PP (PPi) is (n + 1) digits (n, ..., 0), being either B (n - 1, ..., 0) or 2B (n, ..., 1), the length of the final product may be more than 2n.

Assume the significand of B (24 bits) is in BSD and W is stored in modified Booth encoding (25 bits); the last PP is then only 24 binary positions wide (instead of 25), given that the most significant bit of W is always 1. Given the 12 PPs, the reduction is performed by a tree of BSD full-adder levels. Since B is in [1, 2) and W is in [1, 2), the final product BW lies in [1, 4) and spans 48 binary positions (47 down to 0); consequently, positions 45 down to 0 are fractional. As in the standard binary representation, guard (G) and round (R) positions are sufficient for correct rounding. Therefore, only 23 + 2 fractional binary positions of the final product are required to guarantee an error < 2^-23. The lower positions (20 down to 0) are therefore not selected; the next step must, however, account for the carries that the addition produces out of the G and R positions. Because of the carry-limited BSD addition, in contrast to standard binary addition, only positions 20 and 19 can generate such carries. Overall, positions 18 down to 0 of the final product are not useful, and hence a smaller, more flexible PPR tree is possible. Fig. 4 shows the digits passed on to the three-operand adder, and Fig. 5 shows the proposed redundant FP multiplier.

B. Proposed three-operand floating-point adder:
The proposed three-operand FP adder handles the addition required by the FDPA. Performing this addition by concatenating two FP adders leads to more delay, power, and area; dedicated three-operand FP adders are therefore a better way to implement this operation [6], [7].
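As a behavioural note on why a dedicated three-operand adder differs from two chained FP adders: chaining rounds twice, whereas a fused unit can round the exact three-operand sum once. The paper's motivation is delay, power, and area, but the single-rounding behaviour is easy to see in software (math.fsum is used here only as a generic stand-in for a fused sum, not as a model of the proposed adder):

```python
import math

a, b, c = 1e16, 1.0, 1.0

chained = (a + b) + c          # two cascaded FP additions: two roundings
fused = math.fsum([a, b, c])   # exact sum rounded once, like a fused 3-operand add

print(chained)                 # 1e+16 (b and c are lost to intermediate rounding)
print(fused)                   # 1.0000000000000002e+16
```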

CONCLUSION:
In this paper we have presented a high-speed FP butterfly structure for FFT processors. It is faster than previous works, although at the cost of a larger area. The reason for the speedup is twofold: 1) the significands are represented in BSD, which eliminates word-wide carry propagation, and 2) the required operations are fused in the proposed FDPA unit. The multiplications and additions of the FP butterfly are merged into this unit; thus high speed is achieved because extra LZD, normalization, and rounding units can be removed. Future work may apply dual-path techniques to the proposed three-operand FP adder structure and may consider other redundant FP representations. Moreover, eliminating the remaining costly components of the design (i.e., LZD, normalization, and rounding) through improved techniques is expected to lead to even faster architectures.

REFERENCES:
[1] E. E. Swartzlander, Jr., and H. H. Saleh, “FFT implementation with fused floating-point operations,” IEEE Trans. Comput., vol. 61, no. 2, pp. 284–288, Feb. 2012.

[2] J. Sohn and E. E. Swartzlander, Jr., “Improved architectures for a floating-point fused dot product unit,” in Proc. IEEE 21st Symp. Comput. Arithmetic, Apr. 2013, pp. 41–48.

[3] IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008, Aug. 2008, pp. 1–58.

[4] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. New York, NY, USA: Oxford Univ. Press, 2010.

[5] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput., vol. 19, no. 90, pp. 297–301, Apr. 1965.

[6] A. F. Tenca, “Multi-operand floating-point addition,” in Proc. 19th IEEE Symp. Comput. Arithmetic, Jun. 2009, pp. 161–168.

[7] Y. Tao, G. Deyuan, F. Xiaoya, and R. Xianglong, “Three-operand floating-point adder,” in Proc. 12th IEEE Int. Conf. Comput. Inf. Technol., Oct. 2012, pp. 192–196.

[8] A. M. Nielsen, D. W. Matula, C. N. Lyu, and G. Even, “An IEEE compliant floating-point adder that conforms with the pipeline packet forwarding paradigm,” IEEE Trans. Comput., vol. 49, no. 1, pp. 33–47, Jan. 2000.

[9] P. Kornerup, “Correcting the normalization shift of redundant binary representations,” IEEE Trans. Comput., vol. 58, no. 10, pp. 1435–1439, Oct. 2009.

[10] 90 nm CMOS090 Design Platform, STMicroelectronics, Geneva, Switzerland, 2007.

[11] J. H. Min, S.-W. Kim, and E. E. Swartzlander, Jr., “A floating-point fused FFT butterfly arithmetic unit with merged multiple-constant multipliers,” in Proc. 45th Asilomar Conf. Signals, Syst. Comput., Nov. 2011, pp. 520–524.

