K. Lakshmi Narasamma, K. Sundeep
It is used in many applications and systems, and is one of the most commonly used algorithms on modern computers [22]. Modern computers, which have become significantly faster over the past several decades, allow us to solve larger and more complex problems. However, some large-scale problems still present difficulties: even with the O(N log N) complexity of the FFT, such problems can tax modern machines. Parallelization offers a way to overcome these computational limits. By using several processors in parallel to solve a problem, we can greatly reduce the computation time, and with multiple- and multi-core architectures now commonplace, parallelization is a natural strategy.

The structure of the FFT algorithm makes it an obvious candidate for parallelization. Since the introduction of the FFT algorithm in 1965 [18], its parallelization has been (and remains) a major subject of research.

Fast Fourier transform (FFT) circuits consist of several stages of multipliers and adders over complex numbers, together with a number representation of sufficient precision. The designer must choose between fixed-point and floating-point (FP) arithmetic, and FFT structures based on FP operations have been growing in popularity until recently [1], [2]. The main advantage of FP over fixed-point arithmetic is its wide dynamic range, but the price is reduced speed. In addition, using the IEEE 754-2008 standard [3] for FP arithmetic allows an FFT coprocessor to cooperate with general-purpose processors; offloading compute-intensive tasks from the processor in this way can lead to high performance.

The main drawback of FP operations is that they are slower than their fixed-point counterparts. One way to speed up FP arithmetic is to fuse several operations into a single FP unit, and hence save delay, area, and power consumption [2]. Another is to use redundant number systems to overcome the slow propagation of carries through intermediate results; to our knowledge, there is no known work combining both approaches.

LITERATURE SURVEY:
The Sequential Fast Fourier Transform Explained:
The Discrete Fourier Transform (DFT) is an operation performed on a series of elements to convert the underlying domain (e.g., time) to frequency, or vice versa. The result has many useful applications, and the transform is one of the most widely used algorithms of the 20th and 21st centuries [22]. The typical DFT operation performed on N elements x1, x2, ..., xN is defined as:

Xk = Σ (n = 1 to N) xn · e^(−2πi(n−1)(k−1)/N),  k = 1, ..., N.

As the equation shows, each element of the output array X is a sum that requires a contribution from every element of x. Thus, an N-element DFT operation over the input array is O(N²). For large arrays, computing the DFT this way takes a prohibitively long time, so it is desirable to reduce the time complexity of the DFT algorithm. Cooley and Tukey introduced the Fast Fourier Transform (FFT) algorithm in 1965 [18]; it significantly reduces the complexity of computing the discrete Fourier transform, the smarter approach being to partition the input in a divide-and-conquer fashion so that a DFT over N inputs can be computed in O(N log N) operations. The FFT received only a very brief introduction in the original paper [18], but its impact was evident soon afterwards. Several papers published shortly after described the FFT algorithm in more detail and generalized it [28]. For many years after its introduction, however, research on the "Cooley-Tukey algorithm" focused on the theoretical aspects of the FFT algorithm and on the details of FFT analysis.

The Cooley-Tukey Algorithm:
The Cooley-Tukey algorithm, as introduced in [18], uses a divide-and-conquer approach to reduce the computation (both additions and multiplications) needed to calculate the discrete Fourier transform. The Cooley-Tukey algorithm outlined here is the radix-2 DIT (decimation-in-time) variant, and it serves as our base FFT algorithm.
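To make the O(N²) cost of the direct DFT concrete, here is a minimal software sketch (plain Python, not the hardware discussed later; the function name `dft` is ours):

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: every output X[k] sums over all N inputs."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

# Example: the DFT of an impulse is flat (all ones).
X = dft([1, 0, 0, 0])
# Each X[k] is (approximately) 1+0j.
```

The nested sum makes the quadratic cost visible: N outputs, each touching all N inputs.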
The basic steps of the algorithm:
1. Decimate - split the original input into two smaller sets (each half the size of the source DFT), one holding the even-indexed elements and one the odd-indexed elements.
2. Multiply - multiply each element of the odd set by the corresponding root of unity (the so-called twiddle factors [28]).
3. Butterfly - add each element of each small DFT to the corresponding element of the other (see Fig. 1).

To calculate the inverse transform, the real and imaginary parts of the input and output are swapped. Since N is a power of two, the 1/N scaling amounts to a right shift of the binary word by log2(N) bits; even simpler, one can just remember that the binary point has been shifted left by log2(N) bits. Depending on how the output of the IFFT is used, the shift may not be required at all.
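The three steps above, together with the swap-and-scale trick for the inverse transform, can be sketched in software as follows (a plain-Python illustration, not the hardware architecture; it assumes N is a power of two, and the function names are ours):

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    # 1. Decimate: split into even- and odd-indexed halves.
    E = fft(x[0::2])
    O = fft(x[1::2])
    # 2. Multiply the odd half by the twiddle factors.
    T = [cmath.exp(-2j * cmath.pi * k / N) * O[k] for k in range(N // 2)]
    # 3. Butterfly: combine corresponding elements of the two halves.
    return ([E[k] + T[k] for k in range(N // 2)] +
            [E[k] - T[k] for k in range(N // 2)])

def ifft(x):
    """Inverse via swapping real/imaginary parts around a forward FFT,
    then scaling by 1/N (a right shift by log2(N) bits in fixed point)."""
    swap = lambda z: complex(z.imag, z.real)
    N = len(x)
    return [swap(z) / N for z in fft([swap(z) for z in x])]
```

The swap works because exchanging real and imaginary parts conjugates the twiddle factors, turning the forward transform into the (unscaled) inverse.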
Figure 4.3: A hardware-mapped N = 16-point radix-2² DIF FFT algorithm.

PROPOSED SYSTEMS:
Proposed Butterfly Architecture:
Efficient FFT algorithms [5] simplify the calculation of an N-input FFT into (N/2)-input FFT calculations; such recursive formulations cannot be mapped to hardware directly, and continuing the decomposition leads to the 2-input FFT block, also known as the butterfly unit. The proposed butterfly unit is, in fact, a complex FP multiply-add over its operands. Expanding the complex numbers, Fig. 1 shows the required modules. According to Fig. 1, the constituent operations of the butterfly unit are a dot-product (e.g., BreWim + BimWre) followed by an addition/subtraction, which leads to the proposed FDPA operation (e.g., BreWim + BimWre + Aim). Implementation details of the FDPA unit, over FP operands, are discussed below.

A. Proposed fast floating-point multiplier:
The proposed multiplier, like other parallel multipliers, consists of two major steps, namely partial product generation (PPG) and PP reduction (PPR). However, in contrast to traditional multipliers, our multiplier keeps the product in a redundant format and hence does not require the final carry-propagating adder. The exponents of the input operands are handled in the same way as in traditional FP multipliers; normalization and rounding, however, are left to the next block of the butterfly architecture (i.e., the three-operand adder).

1) Partial product generation: The PPG step of the proposed multiplier, because of the representation of the input operands (B, W), is completely different from the
traditional one. Moreover, Wre and Wim are constants [5], so the multiplications of Fig. 1 (over significands) can be calculated through a series of shifters and adders. With the intention of reducing the number of adders, we store the significand of W in modified Booth encoding [4]. Wre and Wim are thus represented as radix-4 Booth digits; each PP is a multiple of the multiplicand B chosen from {0, ±B, ±2B}, as shown in Table I, with each digit of W covering two binary positions. Fig. 3 shows the wiring needed for the production of PPi based on Table I, where each PP consists of (n + 1) digits (i.e., binary positions).

The input significands A and B (24 bits) are assumed to be in BSD representation, while the significand of W (25 bits) is represented in modified Booth encoding; the last PP is only 24 binary positions wide (instead of 25), given that the most significant bit of W is always 1. The PPs are reduced in four levels of BSD full adders. Given that B ∈ [1, 2) and W ∈ [1, 2), the final product lies in [1, 4) and occupies 48 binary positions (47 down to 0); consequently, the fraction occupies positions 45 down to 0. As in the standard binary representation, guard (G) and round (R) positions are sufficient for correct rounding. Therefore, only the most significant 23 + 2 fractional positions of the final product are required to ensure an error < 2^−23.
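The radix-4 (modified Booth) recoding of W can be sanity-checked in software. The sketch below recodes a multiplier into digits in {−2, −1, 0, 1, 2}, one digit per two binary positions, so that each partial product is 0, ±B, or ±2B, and confirms that the PPs sum to the full product. The digit-selection rule used here is the standard modified Booth one, which we assume Table I resembles; the function names are ours:

```python
def bit(w, i):
    """Bit i of the unsigned integer w (0 for negative indices)."""
    return (w >> i) & 1 if i >= 0 else 0

def booth_digits(w, bits):
    """Radix-4 modified Booth recoding of an unsigned 'bits'-bit w.
    Returns digits d_i in {-2,-1,0,1,2} with w == sum(d_i * 4**i)."""
    return [bit(w, 2 * i - 1) + bit(w, 2 * i) - 2 * bit(w, 2 * i + 1)
            for i in range(bits // 2 + 1)]

def booth_multiply(b, w, bits):
    """Each PP is one of {0, +/-B, +/-2B}, shifted two binary
    positions per digit -- the PPG scheme described in the text."""
    pps = [d * b << (2 * i) for i, d in enumerate(booth_digits(w, bits))]
    return sum(pps)
```

Note how halving the number of partial products (one per two bits of W) is exactly what reduces the adder count in the shifter/adder network.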
2) Partial Product Reduction: The major constituent of the PPR step is the proposed carry-limited addition over the operands represented in BSD format. This carry-limited addition circuitry is shown in Fig. 2 (two-digit slice). Since each PP (PPi) is (n + 1) digits wide (n, ..., 0), being either B (n − 1, ..., 0) or 2B (n, ..., 1), the length of the final product may be more than 2n. Positions 20 down to 0 of the product fall outside the chosen positions; however, the reduction step must still deliver the carries entering the G and R positions. Because the BSD addition is carry-limited, in contrast to standard binary addition, only positions 20 and 19 can generate such carries. Overall, positions 18 down to 0 of the final product are not needed, and hence a more flexible, pruned PPR tree is possible. Fig. 4 shows the three digits passed on to the three-operand adder. Fig. 5 shows the proposed redundant FP multiplier.

B. Proposed fast three-operand floating-point adder:
The straightforward way to handle the addition of three FP operands is to concatenate two FP adders, which leads to more delay, power, and area. Dedicated three-operand FP adders could be a better method [6], [7].
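The carry-limited BSD addition at the heart of the PPR step can be illustrated in software. In BSD each digit is in {−1, 0, 1}, and the transfer digit chosen at each position depends only on that position and the one below it, so a carry never propagates more than one position. The digit-selection rule below is one standard choice, not necessarily the exact circuit of Fig. 2, and the function names are ours:

```python
def bsd_add(a, b):
    """Carry-limited addition of two BSD numbers (lists of digits in
    {-1,0,1}, least significant first). Returns the BSD sum."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    t = [0] * (n + 1)  # transfer digits: t[i+1] is generated at position i
    w = [0] * n        # interim sums:   a[i] + b[i] == 2*t[i+1] + w[i]
    for i in range(n):
        d = a[i] + b[i]
        below = a[i - 1] + b[i - 1] if i > 0 else 0
        if d == 2:
            t[i + 1], w[i] = 1, 0
        elif d == 1:   # pick the transfer by peeking one position down
            t[i + 1], w[i] = (1, -1) if below >= 0 else (0, 1)
        elif d == -1:
            t[i + 1], w[i] = (0, -1) if below >= 0 else (-1, 1)
        elif d == -2:
            t[i + 1], w[i] = -1, 0
    # w[i] + t[i] is guaranteed to stay in {-1,0,1}: no further carries.
    return [w[i] + t[i] for i in range(n)] + [t[n]]

def bsd_value(digits):
    """Integer value of a BSD digit list (least significant first)."""
    return sum(d * 2**i for i, d in enumerate(digits))
```

Because the final digit sum never overflows, the addition time is independent of the word length, which is what removes the carry-propagating adder from the multiplier's critical path.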
CONCLUSION:
In this paper, we have proposed a high-speed FP butterfly structure for FFT processors. At the cost of a somewhat higher area, it is faster than the previous works. The reason for this speed improvement is twofold: 1) the elimination of carry propagation by representing significands in BSD, and 2) the proposed new FDPA unit, in which the multiplications and additions required by the FP butterfly are combined into a single unit; the extra LZD, normalization, and rounding units are thus removed, and high speed is achieved. Future research may apply the three-operand FP adder and the dual-path method to other structures, and may plan on using other redundant FP representations. Moreover, improved techniques for the remaining costly stages (i.e., LZD, normalization, and rounding) could lead to still faster architectures.

REFERENCES:
[1] E. E. Swartzlander, Jr., and H. H. Saleh, "FFT implementation with fused floating-point operations," IEEE Trans. Comput., vol. 61, no. 2, pp. 284–288, Feb. 2012.

[2] J. Sohn and E. E. Swartzlander, Jr., "Improved architectures for a floating-point fused dot product unit," in Proc. IEEE 21st Symp. Comput. Arithmetic, Apr. 2013, pp. 41–48.

[6] A. F. Tenca, "Multi-operand floating-point addition," in Proc. 19th IEEE Symp. Comput. Arithmetic, Jun. 2009, pp. 161–168.

[7] Y. Tao, G. Deyuan, F. Xiaoya, and R. Xianglong, "Three-operand floating-point adder," in Proc. 12th IEEE Int. Conf. Comput. Inf. Technol., Oct. 2012, pp. 192–196.

[8] A. M. Nielsen, D. W. Matula, C. N. Lyu, and G. Even, "An IEEE compliant floating-point adder that conforms with the pipeline packet-forwarding paradigm," IEEE Trans. Comput., vol. 49, no. 1, pp. 33–47, Jan. 2000.

[9] P. Kornerup, "Correcting the normalization shift of redundant binary representations," IEEE Trans. Comput., vol. 58, no. 10, pp. 1435–1439, Oct. 2009.

[10] 90 nm CMOS090 Design Platform, STMicroelectronics, Geneva, Switzerland, 2007.

[11] J. H. Min, S.-W. Kim, and E. E. Swartzlander, Jr., "A floating-point fused FFT butterfly arithmetic unit with merged multiple-constant multipliers," in Proc. 45th Asilomar Conf. Signals, Syst. Comput., Nov. 2011, pp. 520–524.