0% found this document useful (0 votes)
4 views

fft

This paper presents a novel design for a 1024-point pipelined FFT processor utilizing the Cordic algorithm, achieving a high frequency of 330MHz while reducing hardware resource usage by up to 50% compared to traditional designs. The architecture includes various modules such as serial-to-parallel, FFT computation, and a Cordic module, which collectively enhance performance and precision. Simulation results indicate improved accuracy and efficiency, making this design suitable for digital signal processing applications.

Uploaded by

anurag anand
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

fft

This paper presents a novel design for a 1024-point pipelined FFT processor utilizing the Cordic algorithm, achieving a high frequency of 330MHz while reducing hardware resource usage by up to 50% compared to traditional designs. The architecture includes various modules such as serial-to-parallel, FFT computation, and a Cordic module, which collectively enhance performance and precision. Simulation results indicate improved accuracy and efficiency, making this design suitable for digital signal processing applications.

Uploaded by

anurag anand
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

2011

2012
Second
International
International
Conference
Conference
on Intelligent
on Intelligent
Systems
System
Design
Design
andand
Engineering
Engineering
Application
Application

A Novel Design of 1024-point Pipelined FFT Processor Based on Cordic Algorithm

Shi Jiangyi, Tian Yinghui, Wang Mingxing ,Yang Zhe


Dept Microelectronics Xidian University, Xian, Shannxi, 710071, China
School of Technical Physics Xidian University, Xian, Shaanxi, 710071 China
[email protected]

Abstract—A novel ASIC design of a 1024-point pipelined FFT nk


X(k)= X (k) +W X (k)
1 N 2
(2)
Processor is introduced in this paper. This is a new FFT
nk
architecture based on the radix-2 algorithm and cordic X (k+N/2) = X (k) -W1X (k) N 2
(3)
algorithm. The solution adopted in the design is used to
achieve a high-frequency FFT processor of which the
Where k refers to 0, 1, 2…, N/2-1.
frequency could reach as high as 330MHz. It also shows an N-point DFT of x(n) could be ascertained from both
advantage in saving up to fifty percent hardware resources equation (2) and (3) with the values of X1(k) and X2 (k)
over traditional FFT processor. could be identified by the way mentioned above.
Keywords-FFT; pepeline; cordic

I. INTRODUCTION
DFT plays an important role in digital signal processing,
Figure 1. The basic element of butterfly algorithm
it is a basic operational unit in signal transformation from
time domain to frequency domain. FFT, a fast algorithm, is
Equation (2) and (3), also known as butterfly algorithm,
widely used in wireless communication, voice and image
are shown in figure-1.Figure-2 illustrates an 8-point FFT
processing, spectrum analysis and so on. With the number of
algorithm which is composed of basic butterfly algorithm.
point increasing, the need of hardware resources multiples
while the computational speed slows down. The common
solution used to enhance the performance of FFT processor
is to increase the number of pipeline-stage, but this will lead
to the increase in hardware resource and power, and the
speed will be limited as well. Therefore, we need to reduce
the scale of hardware resources on the basis of ensuring
high-performance. Cordic algorithm and a new structure are
adopted in the design in this paper to reduce the hardware
resource and achieve a high frequency and better
performance. Figure 2. 8-point FFT algorithm
An N-point DFT is defined as:
N −1

¦ x ( n )W
nk II. ARCHITECTURE
X (k) = N
k=0, 1, 2..., N-1 (1)
n=0 The 1024-point FFT pipelined processor presented in this
refers to exp (-j 2π nk ) and x (n) is the
nk paper is based on radix-2 algorithm. The system consists of a
Where W N
N serial-to-parallel module, ten FFT computational modules, a
input signal. parallel-to-serial module and a sequence reversion module.
FFT, a fast algorithm shown in equation (1), could be Figure-3 shows the architecture of the FFT processor.
divided into DIT and DIF in terms of decimation, and could
also be separated into radix-2, radix-4 etc. with respect to
radix. X (n) could be divided into odd part and even part
using radix-2 DIT in equation (1). N/2-point DFT is
computed in both part to identify the value of X1(k) and.

X (k)
2
Based on the periodicity and symmetry of W nk , the
N
following equations are obtained: Figure 3. New architecture of 1024-point pipelined FFT

978-0-7695-4608-7/12 $26.00 © 2012


978-0-7695-4608-7/11 2011 IEEE 80
DOI 10.1109/ISdea.2012.503
10.1109/ISdea.2011.126
Authorized licensed use limited to: Indian Institute of Technology Hyderabad. Downloaded on April 03,2024 at 05:05:32 UTC from IEEE Xplore. Restrictions apply.
A. Serial-to-parallel and stage one module while the result of the subtracter goes into SRAM2 and
In this paper, the serial to parallel module and state one SRAM3. When the number of data coming from the adder
module are connected together. They work as a whole. equals to 128, the result of the adder will go into the
Because of there is only one butterfly factor in stage one, butterfly module directly, simultaneously, the data stored in
one adder could achieve its function. In serial-to-parallel SRAM1 beforehand will go into butterfly module as well,
module, the frequency of clock controlling read is half of completing the butterfly computation. When the butterfly
that of write, ensuring that the number of input equals to that computation is running, the data from subtracter go into
of the output. Controlled by control module, the data from SRAM3, until the butterfly computation of the adder part is
serial-to-parallel to stage one module (just one adder) are the over. Meanwhile, data in SRAM2 and SRAM3 go into the
data that could be access in, thus the two result are going butterfly module synchronously, and data from the adder go
direct to the adder to accomplish the computation of state into SRAM1 consecutively. A control module is
one. The structure as shown in figure-4 comprises 4 SRAMs, incorporated to control the processing above, ensuring the
using ping-pong operation to make sure the continuous two states above work alternately.
output of data.

Figure 4. The architecture of Serial-to-parallel and stage one module

B. Parallel-to-serial module
This module is similar to the serial-to-parallel module
except that it is a 2-data-input and 1-data-output module,
leading to the save of a lot of resources with only four
registers are needed to implement its function. Every two
registers work together. Four registers combined comprise
the ping-pong operation, ensuring the continuous output of
Figure 5. The architecture of Serial-to-parallel and stage one module
data.
C. FFT compute module
The first and the second stage are simplified from the
third level. In stage one, there is only one butterfly factor,
thereforeˈwe can use an adder to accomplish the butterfly
computation. This way of design could spare the use of
cordic multiplier and rom used to store the butterfly factor.
The principle of stage two is the same as stage one. Figure-5
shows the structure of stage three of which the depth of the
memory is 128. When the stage increases by 1, the depth of
the memory will decrease by fifty percent accordingly. So
this structure could save a lot of resource compared to other Figure 6. Compute processing
designs. Figure-6 and figure-7 illustrates the computing
processing and the state diagram respectively.
In stage three, the depth of SRAM1, SRAM2 and
SRAM3 (SRAM1 and SRAM3 are both single-port rams
while SRAM2 is dual-port ram used for synchronous read
and write) are all 128. The output of the previous stage is
made of two parts, the result of the adder and the result of
the subtracter. The result of the adder goes into SRAM1

81

Authorized licensed use limited to: Indian Institute of Technology Hyderabad. Downloaded on April 03,2024 at 05:05:32 UTC from IEEE Xplore. Restrictions apply.
δ 0 δ 1 δ 2 δ 3 δ 4 δ 5 δ 6 δ 7

0 1 0 -1 -1 0 -1 0
δ 8 δ 9 δ 10 δ 11 δ 12 δ 13 δ 14 δ 15

0 -1 1 1 0 0 1 0

Figure 7. State diagram

In figure-7, s0, s1, s2 and s3 stand for the state while the
number 1,2,3,4 and 5 stand for the state shift conditions
which are described in detail in Table1.
TABLE1 Descriptions of The Shift Conditions
Number Descriptiion
1 ready=1
2 address=127, ready=1
3 address==127, ready=1
4 address=127, ready=1
5 address==127, ready=0, add_rom=3
Figure 8. State diagram

D. Cordic module E. Inversion sequence module


The multiplier which is widely used in complex In this paper, a 1024-point FFT processor is designed, so
multiplication module costs four multipliers and two adders a 10-bit-width address bus is needed. To accomplish the
in each butterfly computation .However, instead, it only sequence reversion, we have to reverse the address bus, as is
costs one cordic module in each butterfly computation, thus, shown in figure-9.
it will save a lot of resource. In this paper, a cordic module is
used to complete complex multiplication. The structure of
the whole module is shown in figure-8.
The cordic-pre module, cordic module, cordic-last
module has been introduced in many papers, but the cordic
adjust module may not be adopted frequently. Using this
module, the accuracy could be improved greatly. The cordic

is implemented by iteration which consists of shift and
Figure 9. Inversion sequence
addition operations. In the cordic module of this proposed
design, 18 times iteration is adopted to obtain a higher
precision. After each operation, the modulus will increase. III. PERFORMANCE ANALYSIS
According to the theoretical calculations, the value is a
constant generated by multiplying 0.60725 to the value 1024-point FFT pipelined processor is the core element
output from the cordic-last module. The iterative formula (4) on radar signal processing chip. The system has been
and (5) are used to achieve this goal. verified at RTL level in Modelsim6.5 SE simulation
environment and has passed the verification of DC
x = x -δ x 2
−i
i +1 i i i
(4) verification platform. From the DC report we know that the
frequency could reach as high as 330MHz and the area is
y = y -δ y 2
i +1 i i i
−i
(5) less than traditional designs. From the comparison by the
matlab, we know that the precision is 0.5%.
Where δ refers to 0, 1 or -1. The input data start from 1 to 1024. Figure-10 shows the
After the simulation by Matlab, the δ is chosen from real part and the imaginary part of the output. The axis-x
Table2. stands for the sequence of the output while the axis-y stands
The accuracy of the cordic is related to the numbers of for its value. Figure-11 shows the output from Matlab. It is
iteration and the width of the bits. Five protection-bits used quite clear that both graphs have the same trend from figure-
in the design in this paper could enhance the precision 11, indicating that the result is right. Due to the fact that a
greatly. complex multiplication module uses either a general
multiplier or a cordic multiplier, errors are inevitable.
TABLE2 Description of The Shift Conditions Figure-12 shows the relative error of our proposed FFT

82

Authorized licensed use limited to: Indian Institute of Technology Hyderabad. Downloaded on April 03,2024 at 05:05:32 UTC from IEEE Xplore. Restrictions apply.
processor while figure-13 shows the relative error of other The modelsim simulation diagram is shown in figure-14.
FFT processors that use general multiplier. For both figures, From the figure, it is quite clear that this proposed design is
axis-x stands for the sequence of the output and axis-y stands pipelined.
for the relative error. From figure-12, we can find that the
maximum value is less than 0.6%. However, from figure13,
we can see that the maximum value is less than 2.5%. So the
precision is improved a lot in our proposed design.


Figure 14. The modelsim simulation diagram

 IV. CONCLUSION
Figure 10. The output of FFT processor This paper shows the flow of our design, a 1024-point
pipelined processor. By using the new structure, the SRAM
is saved while the function of pipeline is secured. By using
cordic algorithm, the ROM is saved while the frequency and
the precision are enhanced greatly.
ACKNOWLEDGMENT
Support for this work was provided by Natural Science
Basic Research Plan in Shaanxi Province of China
(2010JM8015).
 REFERENCES
Figure 11. The output of Matlab [1] LENORE R. MULLIN and SHARON G. SMALL, "Four Easy Ways
to a Faster FFT”,Journal of Mathematical Modelling and
Algorithmsl:193-21 4; 2002.
[2] DAISUKE TAKAHASHI,YASUMASA KANADA,"High-
Performance adix-2,3 and 5 Parallel1-D Complex FFT Algorithms
for Distributed-Memory Parallel Computers”, Journal of
Supercomputing, 15, 207228(2000)
[3] Volder J E. The CORDIC trigonometric computing technique[J]. IRE
Transactions on Electronic Computers,1959,8(3):330-334
[4] Wang S,Piuri V, Wartzlander E. Hybrid CORDIC algorithms [J].
IEEE
Trans. on Computers, 1997, 46 (11):1202-1207
[5] Baas B M. A Low-Power, High-performance 1024-Point
FFTProcessor[J]. IEEE Journal of Solid-State Circuits 1999, 34 (3 ):
380-387
Figure 12. The comparison between FFT processor and Matlab [6] Cooley J W, Tukey J. Proc IEEE ASSP. IEEE Signal Processing
Society, 1992.
[7] Cooley J W IEEE Signal Processing Magazine, 1992, 9(1)
[8] J. Giacomantone, H. Villagarcia, O. Bria. A fast CORDIC Co-
Processor Architecture for Digital Signal Processing Applications [J].
VI Congreso Argentino de Ciencias de la Computacion (VI CACIC).
October 2000.
[9] Maharatna K, Grass E, Jagdhold U. A 64-Point Fourier Transform
Chip for High-Speed Wireless LAN Application Using OFDM. IEEE
Journal of Solid-State Circuit. 2004, Vol. 39 (3):484~493
[10] Sansaloni T, Pascual A P, Tomes V, et al. FFT Spectrum Analyzer
Project for Teaching Digital Signal Processing With FPGA Devices.
 IEEE Transactions on Education, 2007, Vol. 50 (3):229 ϔ 235
Figure 13. The comparison between FFT processor and Matlab

83

Authorized licensed use limited to: Indian Institute of Technology Hyderabad. Downloaded on April 03,2024 at 05:05:32 UTC from IEEE Xplore. Restrictions apply.

You might also like