VHDL Implementation of FFT/IFFT Design

Uploaded by

Nour Nour

Faculty of Engineering

Seminar Report

Department of biomedical

FFT and IFFT using VHDL code

Nour Hankir
Farah Nassar
Lara Kaiss
Ghina Jouny

Supervised By
Dr Mohamad Alwan
Table of contents

I. Abstract
II. Introduction
III. FFT/IFFT ALGORITHMS
A. Cooley-Tukey Algorithms
B. 16-Points FFT/IFFT Module
IV. 256-POINTS FFT/IFFT ARCHITECTURAL DESIGN
V. IFFT 256 point code
VI. IFFT 256 point code
VII. Conclusion
VIII. References
List of figures

Figure 0-1 16 points DIF FFT algorithm based on Radix-2
Figure 0-2 the 256 points FFT proposed design
Figure 0-3 Simplification of multiplication complexity inside the 16-points FFT module
I. Abstract

Fast Fourier Transform (FFT) and inverse FFT (IFFT) processors are key components of
OFDM-based wireless broadband communication systems. A high-speed, low-power FFT/IFFT
processor is essential to meet the real-time and low-cost requirements of such systems.
This paper details the development of an efficient 256-point FFT/IFFT architecture for
OFDM systems that satisfies the IEEE 802.16a specification. The 256-point FFT
architecture consists of an optimized pipelined implementation of a 16-point FFT core
based on a radix-2 butterfly processing element. The proposed design reduces the
multiplicative complexity compared to other efficient architectures. The FFT processor
has been implemented in VHDL, and simulation results show that the module achieves
better performance with low arithmetic complexity, and hence high speed and low power
consumption for OFDM-based wireless broadband communication systems.
II. Introduction

The Fast Fourier Transform (FFT) and its inverse (IFFT) are among the most commonly used
digital signal processing algorithms, and they are widely applied in wireless
communication systems. The FFT/IFFT is a key component of the physical (PHY) layer of
Orthogonal Frequency Division Multiplexing (OFDM) based wireless broadband communication
systems; it is one of the most complex and computation-intensive modules of the PHY layer
in various wireless standards (OFDM-802.11x, MIMO-OFDM 802.11n, OFDM-802.16x) [1].
The fast-growing demand for OFDM-based applications, including Wireless Metropolitan Area
Network (WMAN) applications, makes processing speed a key factor in FFT algorithm design.
Indeed, these applications need the FFT and IFFT processors to operate in real time for
the modulation and demodulation of OFDM signals. Hence, the study of FFT/IFFT algorithms
and high-performance VLSI FFT/IFFT architectures is becoming increasingly important [2].
The main constraints on FFT/IFFT processors used in the IEEE 802.11x and IEEE 802.16x
standards are execution time and power consumption [2], [3]. According to the 802.16a
specification, the 256-point FFT/IFFT has to complete its computation in 89.6 µs. To meet
this time constraint, the FFT design has to use either a highly in-depth architecture or
a very high operating frequency. Both solutions satisfy the timing constraint, but at the
cost of area and power consumption. In this paper, we concentrate on an optimal
implementation, on an Altera FPGA platform, of the 256-point FFT processor based on a
16-point FFT core. We propose a power-efficient 256-point FFT architecture that has low
arithmetic complexity and high architectural regularity and that at the same time
satisfies the timing of the IEEE 802.16a specification.
The report is structured as follows: Section III discusses the Cooley-Tukey algorithms
and complex multiplication inside a butterfly processing element. Section IV discusses
the mathematical formulation and architectural description of the proposed FFT processor.
Sections V and VI list the VHDL code of the implementation. Finally, a conclusion is
given in the last section.
III. FFT/IFFT ALGORITHMS
A. Cooley-Tukey Algorithms
The N-point Discrete Fourier Transform (DFT) of a complex data sequence x(n) is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi / N}$$

X(k) is the k-th harmonic and x(n) is the n-th input sample.
Direct DFT calculation has a computational complexity of $O(N^2)$. With the Cooley-Tukey
FFT algorithm, the complexity is reduced to $O(N \log_2 N)$ [4].
The Cooley-Tukey algorithm is the most universal of all FFT algorithms, because any
factorization of N is possible [5]. The most popular Cooley-Tukey algorithms are those
where the transform length is a power of a basis r, i.e., $N = r^S$. These algorithms are
referred to as radix-r algorithms. The most commonly used are those of basis r = 2
(radix-2), r = 4 (radix-4), r = 8 (radix-8) and r = 16 (radix-16). These algorithms and
others, such as radix-$2^2$, radix-$2^3$ and split-radix, have been developed from the
basic Cooley-Tukey algorithm to further reduce the computational complexity [6].

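For r = 2 and S stages, for instance, the Cooley-Tukey index mapping of one stage is:

$$n = n_1 + \frac{N}{2}\, n_2, \qquad k = 2 k_1 + k_2, \qquad n_1, k_1 \in \{0, \ldots, N/2 - 1\}, \quad n_2, k_2 \in \{0, 1\}$$

This algorithm is based on a divide-and-conquer approach in the frequency domain and is
therefore referred to as decimation-in-frequency (DIF) FFT. The FFT formula is split into
two summations:

$$X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{nk}$$

After decimation into even- and odd-indexed frequency samples, X(k) becomes:

$$X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W_{N/2}^{nk}$$

$$X(2k+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x(n + N/2) \right] W_N^{n}\, W_{N/2}^{nk}$$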

The same decomposition can be applied again to the N/2-point FFTs X(2k) and X(2k+1). The
entire algorithm involves $\log_2 N$ stages, where each stage involves N/2 operation
units (butterflies). Computing the N-point FFT via decimation-in-frequency (DIF), as with
decimation-in-time (DIT), requires $(N/2)\log_2 N$ complex multiplications and
$N \log_2 N$ complex additions/subtractions [7]. Based on the same approach, the other
fast algorithms (radix-4, radix-8, radix-16, radix-$2^2$ and split-radix) recursively
divide the FFT computation into odd and even halves and then obtain as many common
twiddle factors as possible. The number of real additions and multiplications needed is
generally used to compare the efficiency of different FFT algorithm variants.
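
The decimation-in-frequency recursion described above can be sketched in a few lines of Python and checked against the direct DFT definition. This is only an illustrative software model, not the report's hardware design; the function names `dft` and `fft_dif` are chosen here for the example.

```python
import cmath

def dft(x):
    """Direct N-point DFT from the definition, O(N^2) complex multiplications."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) must be a power of two)."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    half = n_pts // 2
    # Butterfly: the sum feeds the even outputs, the twiddled difference the odd outputs.
    upper = [x[n] + x[n + half] for n in range(half)]
    lower = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / n_pts)
             for n in range(half)]
    out = [0j] * n_pts
    out[0::2] = fft_dif(upper)   # X(2k), an N/2-point FFT
    out[1::2] = fft_dif(lower)   # X(2k+1), an N/2-point FFT
    return out
```

Calling `fft_dif` on a 16-sample list should reproduce `dft` on the same list up to floating-point rounding, with the outputs already in natural order because the recursion interleaves the even and odd halves.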
The split-radix algorithm offers the best computational performance, as indicated by the
multiplicative comparison in [8], because it has the most trivial multiplications by
twiddle factors $W_N$, i.e., ±1 and ±j. Nevertheless, split-radix is irregular by nature,
because it combines radix-2 and radix-4 stages, used respectively for the even-half and
the odd-half operations. This leads to L-shaped butterfly units, which affect the delay
of the pipeline path and make it unbalanced [6], [8]. A fixed-radix algorithm of low
degree, such as radix-2, is well suited to integrated circuit (IC) implementation because
of its regularity, low design complexity and simple control architecture.
Fig. 1 shows the flow graph of the complete decimation-in-frequency decomposition of a
16-point FFT computation based on radix-2. The basic operation of the signal flow graph
is the butterfly operation, which is a 2-point DFT computation.

B. 16-Points FFT/IFFT Module

Figure 0-1 16 points DIF FFT algorithm based on Radix-2

The FFT computation of the 16-point radix-2 architecture is carried out in four stages.
The variables x(0) to x(15) are the input values of the FFT computation and X(0) to X(15)
are the output values. In the butterfly process, the upward arrow performs an addition,
whereas the downward arrow performs a subtraction. The subtracted value is multiplied by
the twiddle factor $W_N$ before being passed to the next stage; these computations are
performed concurrently. The complex multiplication with the twiddle factor requires four
real multiplications and two add/subtract operations [4], [7].
Complex multiplication is one of the most essential arithmetic operations in FFT
computation. It is often the most expensive arithmetic operation and one of the dominant
factors determining the performance of an FFT processor in terms of power consumption,
speed and throughput [9].
As observed in [10], the complex multiplier may consume more than 70% of the power in an
FFT/IFFT processor. Therefore, an efficient design of the FFT processor is vital in
high-speed and low-power applications.
The aim here is to reduce the complexity of the twiddle factor multiplication inside the
butterfly processor by computing only three real multiplications and three
addition/subtraction operations [4], [11]. This method of complex multiplication
reduction is demonstrated in equation 8 and equation 9. We applied it for an efficient
design of the 16-point FFT module.
The complex twiddle factor multiplication is:

$$R + jI = (X + jY) \cdot W_N = (X + jY)(C + jS) \qquad (8)$$

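However, it can be simplified to three real multiplications:

$$R = (C - S)\,Y + Z \qquad (9)$$

$$I = (C + S)\,X - Z \qquad (10)$$

where $Z = C\,(X - Y)$.
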
The twiddle factor coefficients $W_N$ are known in advance and depend on the algorithm
adopted in the FFT implementation, i.e., C and S in equations 9 and 10 are pre-computed
and stored in a memory table. It is therefore necessary to store the following three
coefficients: C, C+S and C−S. Storing these constants simplifies the complex
multiplication. They can be saved as canonical signed digits (CSD) to implement the
complex multiplication with a carry-save tree [12]. Consequently, both area and power
consumption can be reduced.
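
As a quick check of the three-multiplication scheme built on the stored coefficients C, C+S and C−S, the sketch below computes (X + jY)(C + jS) with three real multiplications, one addition and two subtractions. The function name `twiddle_mult_3` and the intermediate name `z` are illustrative, and in hardware the two coefficient sums would come from the memory table rather than being computed on the fly.

```python
def twiddle_mult_3(x, y, c, s):
    """Return (R, I) with R + jI = (x + jy)(c + js), using three real multiplications."""
    # In a hardware design these two values would be read from the coefficient table.
    c_plus_s = c + s
    c_minus_s = c - s
    z = c * (x - y)          # multiplication 1, subtraction 1
    r = c_minus_s * y + z    # multiplication 2, addition
    i = c_plus_s * x - z     # multiplication 3, subtraction 2
    return r, i
```

For example, `twiddle_mult_3(3, 4, 2, 5)` gives `(-14, 23)`, which matches (3 + 4j)(2 + 5j) = −14 + 23j.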
The complex multiplication with $W_{16}^{2} = e^{-j\pi/4}$ requires only two real
multiplications rather than three. Moreover, the complex multiplication can be reduced
further with an efficient fixed-point number representation [13]. The 16-point radix-2
FFT algorithm is implemented by coding the FFT module in the VHDL hardware description
language. This module uses the complex multiplication reduction method, employing three
multiplications, one addition and two subtractions, at the cost of an additional memory
table. In the VHDL program, the twiddle factor multiplier was implemented using component
instantiations of three lpm_mult and three lpm_add_sub modules from the Altera library.
It is worth noting that LPM modules are supported by most EDA vendors; LPM provides an
architecture-independent library of logic functions or modules that are parameterized to
achieve scalability and adaptability [9].
The 16-point FFT processor employing the radix-2 algorithm performs trivial
multiplications with the factors $W_{16}^{4} = -j$ and $W_{16}^{0} = 1$. Multiplication
by $W_{16}^{4}$ can be done simply by swapping the real and imaginary parts and then
changing the sign [7]. The remaining complex multiplications are non-trivial. However,
$W_{16}^{2}$ can be implemented with two multiplications, and the other non-trivial
coefficients with three multiplications. Therefore, the total number of real
multiplications in the 16-point FFT scheme is 24, which corresponds to 10 complex
multiplications.
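
The figures of 10 complex and 24 real multiplications can be reproduced by walking through the twiddle exponents of a radix-2 DIF flow graph. The sketch below assumes that every odd multiple of π/4 (for N = 16, both $W_{16}^{2}$ and $W_{16}^{6}$) costs two real multiplications, that $W^{0}$ and $W^{N/4}$ are free, and that all other twiddles cost three; the function name is made up for the example.

```python
def count_twiddle_multiplications(n_points=16):
    """Count twiddle multiplications in a radix-2 DIF FFT (power of two, at least 8)."""
    stages = n_points.bit_length() - 1
    complex_mults = 0
    real_mults = 0
    for stage in range(stages):
        blocks = 1 << stage                     # independent sub-FFTs in this stage
        butterflies = n_points >> (stage + 1)   # butterflies per sub-FFT
        for j in range(butterflies):
            exponent = j << stage               # butterfly j multiplies by W_N^exponent
            if exponent % (n_points // 4) == 0:
                continue                        # W^0 = 1 or W^(N/4) = -j: trivial
            complex_mults += blocks
            if exponent % (n_points // 8) == 0:
                real_mults += 2 * blocks        # odd multiple of pi/4 (assumed 2 mults)
            else:
                real_mults += 3 * blocks        # general twiddle, 3-multiplier scheme
    return complex_mults, real_mults
```

Under these assumptions, `count_twiddle_multiplications(16)` returns `(10, 24)`: six of the ten non-trivial twiddles are of the π/4 type (12 real multiplications) and four are general (12 real multiplications).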

IV. 256-POINTS FFT/IFFT ARCHITECTURAL DESIGN

The radix-2 algorithm is appealing for its simplicity, but it is not well suited to large
FFT sizes such as the 256-point FFT because of its high multiplier requirement. To
increase the throughput and reduce the power consumption of the 256-point FFT processor,
the number of multiplications must be reduced. For this reason, we propose a design
methodology for an efficient 256-point FFT architecture that provides high speed while
keeping the area and power consumption as low as possible.
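The proposed 256-point FFT/IFFT architecture internally uses two 16-point FFT cores for
computation. The 256-point FFT with radix-16 decimation can be formulated in the
following way:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (12)$$

We suppose N = 16T, k = s + Tt and n = l + 16m, where s, l ∈ {0, 1, …, 15} and
m, t ∈ {0, 1, …, T−1}. For the 256-point FFT, T = 16.
Applying these values in equation (12), we obtain:

$$X(s + Tt) = \sum_{l=0}^{15} W_{16}^{lt} \left[ W_N^{sl} \sum_{m=0}^{T-1} x(l + 16m)\, W_T^{sm} \right] \qquad (14)$$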
Equation (14) shows that the 256-point FFT (i.e., $N = 16^2$) can be computed with two
16-point FFT passes, using a radix-16 algorithm as shown in Fig. 2. The first 16-point
FFT module computes 16-point FFTs on the appropriate data slots of the 256-point input
according to (14). A multiplier then multiplies the outputs by the 16×16
inter-dimensional constant coefficients, and the 16-point FFT of the resulting data is
computed again with the appropriate data reordering.
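
The two-pass structure can be modelled in software to confirm that it reproduces the direct 256-point DFT. The sketch below follows the index mapping n = l + 16m and k = s + 16t; the function names are illustrative, and a plain 16-point DFT stands in for the 16-point FFT core, so this shows only the data flow, not the hardware.

```python
import cmath

def dft(x):
    """Direct DFT from the definition, used here as the 16-point core and as reference."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def fft256_two_pass(x):
    """256-point DFT computed as two passes of 16-point transforms."""
    assert len(x) == 256
    # First pass: for each l, a 16-point transform over the samples x(l + 16m).
    first = [dft([x[l + 16 * m] for m in range(16)]) for l in range(16)]  # first[l][s]
    out = [0j] * 256
    for s in range(16):
        # Inter-dimensional twiddle W_256^(s*l), then the second pass over l.
        column = [first[l][s] * cmath.exp(-2j * cmath.pi * s * l / 256)
                  for l in range(16)]
        second = dft(column)               # second[t] equals X(s + 16t)
        for t in range(16):
            out[s + 16 * t] = second[t]    # output reordering
    return out
```

Comparing `fft256_two_pass(x)` with `dft(x)` for any 256-sample input should give agreement up to floating-point rounding, which is a convenient sanity check on the reordering before committing it to hardware addressing logic.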

Figure 0-2 the 256 points FFT proposed design

For the 16-point FFT module implementation, the radix-16 algorithm is attractive because
it requires fewer complex multiplications and additions than the radix-4 and radix-2
algorithms [8]. However, high-radix algorithms increase the complexity of integration in
an integrated circuit such as an FPGA [14]. Although the number of non-trivial
multiplications and additions/subtractions is a good indicator of the effectiveness of an
algorithm, hardware integration considers other performance criteria, such as algorithm
regularity, design complexity and control architecture. The gains achieved by reducing
multiplications or additions can sometimes be lost to the resulting control complexity
and interconnection overhead. Therefore, we focus on the radix-2 algorithm, which offers
greater regularity for hardware implementation than radix-16, radix-4 and split-radix.
Many communication systems require high throughput and continuous input/output data. The
MDC (Multipath Delay Commutator) pipeline architecture is well suited to these goals
[15], and it is an ideal choice for implementing high-speed, long-size FFTs because of
its regular structure and simple control [8]. The R2MDC (Radix-2 Multipath Delay
Commutator) structure was adopted as the pipeline approach in our design in order to
minimize memory resources and save silicon area in the FPGA [16]. Moreover, the
efficiency of the pipeline FFT processor can be improved by optimizing the structure and
saving hardware resources [17].
The proposed architecture of the 256-point FFT/IFFT module is illustrated in Fig. 2. Our
design is built around one essential unit, the R2MDC 16-point FFT unit, which is the
kernel of the 256-point FFT processor as expressed in (14). It has a four-stage pipelined
structure that carries out 24 real multiplications to process a 16-point FFT. This block
requires an input buffer of size 16 to store the serial input data as 16 parallel
vectors, so that the data are arranged for computation by the FFT processor according to
(14).
The 16-point FFT core uses the radix-2 algorithm for computation. The multiplications by
$W_{16}^{4} = -j$ and $W_{16}^{0}$ are trivial; multiplication by $W_{16}^{4}$ is done
simply by swapping the real and imaginary parts and then changing the sign [18]. The
implementation complexity of the non-trivial twiddle factors $W_{16}^{2}$ and
$W_{16}^{6}$ is reduced even further by replacing the complex multiplications with basic
operations: the multiplication by $W_{16}^{2}$ and $W_{16}^{6}$ was done by add and shift
operations [18], [19] (Fig. 3).
Using permutations and shift-and-add operations for these twiddle factors inside the
16-point FFT module, instead of complex multiplications, reduces the number of expensive
multiplication operations. Multiplication by the other non-trivial twiddle factors
$W_{16}^{n}$ was implemented with embedded 9-bit multipliers. These logic operations cut
down the number of complex multiplications in this optimized approach: the total was
reduced to 4 complex multiplications and, consequently, 12 real multiplications [19].
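
To illustrate how a multiplication by $W_{16}^{2} = (1 - j)\sqrt{2}/2$ can be done without a multiplier, the sketch below approximates the factor √2/2 by a sum of power-of-two shifts. The particular shift set (2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125) is an assumption made for this example and is not taken from the report or from [18], [19]; the function names are likewise illustrative.

```python
def mul_inv_sqrt2(v):
    """Approximate v * sqrt(2)/2 for an integer v using only shifts and additions."""
    # 0.70703125 = 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 (assumed shift set)
    return (v >> 1) + (v >> 3) + (v >> 4) + (v >> 6) + (v >> 8)

def mul_w16_2(x, y):
    """Approximate (x + jy) * W_16^2 for integer x, y, with W_16^2 = (1 - j) * sqrt(2)/2."""
    # (x + jy)(1 - j) = (x + y) + j(y - x), then scale both parts by sqrt(2)/2.
    return mul_inv_sqrt2(x + y), mul_inv_sqrt2(y - x)
```

With 16-bit operands, the truncation in each shift plus the rounding of the constant keeps the result within a few least significant bits of the exact product; a CSD recoding of the constant, as mentioned earlier for the stored coefficients, would be one way to reduce the number of adders further.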
In a previous approach, a similar 256-point FFT implementation was designed to achieve
optimum complexity using a 16-point FFT module. There, the total was reduced to 6 complex
multiplications, i.e., 16 real multiplications. The performance of this approach is
compared with that of other efficient pipelined FFT processors in [19].
Figure 0-3 Simplification of multiplication complexity inside the 16-points FFT module

V. IFFT 256 point code
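
The twiddle ROM constants in the listing of this section (co_11, co_21, co_31, si_11 and so on) appear to be 16-bit two's-complement values of cos(2π·p·k/256) and −sin(2π·p·k/256) scaled by 2^14, for k = 0 to 63 and p = 1, 2, 3. That reading is inferred from the numbers themselves, not stated in the report. A small Python sketch, with illustrative names, regenerates the first-stage tables under this assumption so that they can be checked entry by entry.

```python
import math

def twiddle_rom(multiple, depth=64, n_points=256, frac_bits=14):
    """Return (cos_rom, sin_rom) as lists of 16-bit two's-complement words."""
    scale = 1 << frac_bits
    cos_rom, sin_rom = [], []
    for k in range(depth):
        angle = 2.0 * math.pi * multiple * k / n_points
        # Round to nearest, then wrap negative values into 16-bit two's complement.
        cos_rom.append(round(scale * math.cos(angle)) & 0xFFFF)
        sin_rom.append(round(-scale * math.sin(angle)) & 0xFFFF)
    return cos_rom, sin_rom

co_11, si_11 = twiddle_rom(1)
print([format(word, "04x") for word in co_11[:4]])
print([format(word, "04x") for word in si_11[:4]])
```

The same function with `multiple=2` or `multiple=3` corresponds to the co_21/co_31 style tables; the tables whose entries repeat in groups of 4 or 16 look like the same values sampled at coarser angles for later stages, which this sketch does not model.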

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
------------------------------------- Entity Declaration ----------------------------------------
entity top_ifft is
    port( clk       : in  std_logic;                      -- Processing clock
          reset     : in  std_logic;                      -- Asynchronous reset signal
          real_in   : in  std_logic_vector(15 downto 0);  -- Real part of the input data
          imag_in   : in  std_logic_vector(15 downto 0);  -- Imaginary part of the input data
          valid_in  : in  std_logic;                      -- Input data valid signal
          start     : in  std_logic;                      -- Symbol start signal
          real_out  : out std_logic_vector(15 downto 0);  -- Real part of the output data
          imag_out  : out std_logic_vector(15 downto 0);  -- Imaginary part of the output data
          valid_out : out std_logic                       -- Output data valid signal
        );
end top_ifft;
----------------------------------- Architecture begins here -----------------------------------------
architecture Behavioral of top_ifft is
---------------------------------- Components declaration -------------------------------------------
COMPONENT radix4_butterfly_r
    port( clk   : in  std_logic;                      -- Processing clock
          reset : in  std_logic;                      -- Asynchronous reset signal
          ri0   : in  std_logic_vector(15 downto 0);  -- input1 real part
          ri1   : in  std_logic_vector(15 downto 0);  -- input2 real part
          ri2   : in  std_logic_vector(15 downto 0);  -- input3 real part
          ri3   : in  std_logic_vector(15 downto 0);  -- input4 real part
          ii0   : in  std_logic_vector(15 downto 0);  -- input1 imaginary part
          ii1   : in  std_logic_vector(15 downto 0);  -- input2 imaginary part
          ii2   : in  std_logic_vector(15 downto 0);  -- input3 imaginary part
          ii3   : in  std_logic_vector(15 downto 0);  -- input4 imaginary part
          co1   : in  std_logic_vector(15 downto 0);  -- Cos of the angle1
          co2   : in  std_logic_vector(15 downto 0);  -- Cos of the angle2
          co3   : in  std_logic_vector(15 downto 0);  -- Cos of the angle3
          si1   : in  std_logic_vector(15 downto 0);  -- Sin of the angle1
          si2   : in  std_logic_vector(15 downto 0);  -- Sin of the angle2
          si3   : in  std_logic_vector(15 downto 0);  -- Sin of the angle3
          ro0   : out std_logic_vector(15 downto 0);  -- real part of the output1
          ro1   : out std_logic_vector(15 downto 0);  -- real part of the output2
          ro2   : out std_logic_vector(15 downto 0);  -- real part of the output3
          ro3   : out std_logic_vector(15 downto 0);  -- real part of the output4
          io0   : out std_logic_vector(15 downto 0);  -- imaginary part of the output1
          io1   : out std_logic_vector(15 downto 0);  -- imaginary part of the output2
          io2   : out std_logic_vector(15 downto 0);  -- imaginary part of the output3
          io3   : out std_logic_vector(15 downto 0)   -- imaginary part of the output4
        );
END COMPONENT;

COMPONENT ram_stage1_r
    port( clk                 : in  std_logic;                      -- processing clock
          reset               : in  std_logic;                      -- Asynchronous reset
          wr                  : in  std_logic;                      -- write enable signal
          Mux_sel             : in  std_logic_vector(1 downto 0);   -- selection line for mux
          wadd                : in  std_logic_vector(7 downto 0);   -- write address
          radd                : in  std_logic_vector(5 downto 0);   -- read address
          data_in1            : in  std_logic_vector(31 downto 0);  -- input data1
          data_in2            : in  std_logic_vector(31 downto 0);  -- input data2
          data_in3            : in  std_logic_vector(31 downto 0);  -- input data3
          data_in4            : in  std_logic_vector(31 downto 0);  -- input data4
          data_out1           : out std_logic_vector(31 downto 0);  -- output data1
          data_out2           : out std_logic_vector(31 downto 0);  -- output data2
          data_out3           : out std_logic_vector(31 downto 0);  -- output data3
          data_out4           : out std_logic_vector(31 downto 0);  -- output data4
          data_out_final_real : out std_logic_vector(15 downto 0);  -- real part of the output data
          data_out_final_Imag : out std_logic_vector(15 downto 0);  -- imaginary part of the output data
          valid_out           : out std_logic                       -- output valid signal
        );
END COMPONENT;

COMPONENT ram_stage3_r
    port( clk      : in  std_logic;                      -- Processing clock
          wr       : in  std_logic;                      -- write enable
          wadd     : in  std_logic_vector(5 downto 0);   -- write address
          radd     : in  std_logic_vector(5 downto 0);   -- read address
          data_in  : in  std_logic_vector(31 downto 0);  -- input data to write into memory
          data_out : out std_logic_vector(31 downto 0)   -- output data from memory
        );
END COMPONENT;

--------------------------------------------- Signals Declaration -------------------------------------
signal Real_1st_in_0 : std_logic_vector(15 downto 0);
signal Real_1st_in_1 : std_logic_vector(15 downto 0);
signal Real_1st_in_2 : std_logic_vector(15 downto 0);
signal Real_1st_in_3 : std_logic_vector(15 downto 0);
signal Imag_1st_in_0 : std_logic_vector(15 downto 0);
signal Imag_1st_in_1 : std_logic_vector(15 downto 0);
signal Imag_1st_in_2 : std_logic_vector(15 downto 0);
signal Imag_1st_in_3 : std_logic_vector(15 downto 0);
signal Real_2st_in_0 : std_logic_vector(15 downto 0);
signal Real_2st_in_1 : std_logic_vector(15 downto 0);
signal Real_2st_in_2 : std_logic_vector(15 downto 0);
signal Real_2st_in_3 : std_logic_vector(15 downto 0);
signal Imag_2st_in_0 : std_logic_vector(15 downto 0);
signal Imag_2st_in_1 : std_logic_vector(15 downto 0);
signal Imag_2st_in_2 : std_logic_vector(15 downto 0);
signal Imag_2st_in_3 : std_logic_vector(15 downto 0);
signal wr_add_1st : std_logic_vector(8 downto 0);
signal wr_en1, wr_en2 : std_logic;
signal wr_en3 : std_logic;
signal enable_wr_count1 : std_logic;
signal Data_in_s : std_logic_vector(31 downto 0);
signal wr_add_s : std_logic_vector(5 downto 0);
signal rd_add_s : std_logic_vector(5 downto 0);
signal Rd_add_s_d : std_logic_vector(6 downto 0);
signal rd_add_ss : std_logic_vector(6 downto 0);
signal rd_add_ss2 : std_logic_vector(5 downto 0);
signal Data_out01_s : std_logic_vector(31 downto 0);
signal Data_out02_s : std_logic_vector(31 downto 0);
signal Data_out03_s : std_logic_vector(31 downto 0);
signal Data_out11_s : std_logic_vector(31 downto 0);
signal Data_out12_s : std_logic_vector(31 downto 0);
signal Data_out13_s : std_logic_vector(31 downto 0);
signal Data_out14_s : std_logic_vector(31 downto 0);
signal enable_rd_count1 : std_logic;
signal Data1 : std_logic_vector(31 downto 0);
signal Data2 : std_logic_vector(31 downto 0);
signal Data3 : std_logic_vector(31 downto 0);
signal Data4 : std_logic_vector(31 downto 0);
signal enable_rd_count2 : std_logic;
signal rd_add_s2 : std_logic_vector(8 downto 0);
signal rd_add_s2_d : std_logic_vector(8 downto 0);
signal wr_add_ss : std_logic_vector(8 downto 0);
signal wr_add_ss_d : std_logic_vector(8 downto 0);
signal wr_add_sss : std_logic_vector(8 downto 0);
signal wr_en_ram2 : std_logic;
signal wr_add_ss2 : std_logic_vector(7 downto 0);
SIGNAL co1, co_1_1, co_1 : std_logic_vector(15 downto 0);
SIGNAL co2, co_2 : std_logic_vector(15 downto 0);
SIGNAL co3, co_3 : std_logic_vector(15 downto 0);
SIGNAL so1, si_1 : std_logic_vector(15 downto 0);
SIGNAL so2, si_2 : std_logic_vector(15 downto 0);
SIGNAL so3, si_3 : std_logic_vector(15 downto 0);
SIGNAL co_2_1 : std_logic_vector(15 downto 0);
SIGNAL co_3_1 : std_logic_vector(15 downto 0);
SIGNAL si_1_1 : std_logic_vector(15 downto 0);
SIGNAL si_2_1 : std_logic_vector(15 downto 0);
SIGNAL si_3_1 : std_logic_vector(15 downto 0);
type state1 is (rst, s0, s1, s2, s3, s4);
signal ps1, ns1 : state1;

type state2 is (rst, s0, s1, s2, s3, s4);

signal ps2, ns2 : state2;

type state3 is (rst, s0, s1, s2, s3, s4);

signal ps3, ns3 : state3;

type state4 is (rst, s0, s1, s2, s3, s4);

signal ps4, ns4 : state4;

signal mux_sel_s : std_logic_vector(1 downto 0);

signal mux_sel_s1 : std_logic_vector(1 downto 0);
signal Real_out1 : std_logic_vector(15 downto 0);
signal Real_out2 : std_logic_vector(15 downto 0);
signal Real_out3 : std_logic_vector(15 downto 0);
signal Real_out4 : std_logic_vector(15 downto 0);
signal Imag_out1 : std_logic_vector(15 downto 0);
signal Imag_out2 : std_logic_vector(15 downto 0);
signal Imag_out3 : std_logic_vector(15 downto 0);
signal Imag_out4 : std_logic_vector(15 downto 0);
signal ri0_s : std_logic_vector(15 downto 0);
signal ri1_s : std_logic_vector(15 downto 0);
signal ri2_s : std_logic_vector(15 downto 0);
signal ri3_s : std_logic_vector(15 downto 0);
signal ii0_s : std_logic_vector(15 downto 0);
signal ii1_s : std_logic_vector(15 downto 0);
signal ii2_s : std_logic_vector(15 downto 0);
signal ii3_s : std_logic_vector(15 downto 0);
signal ri0_s1 : std_logic_vector(15 downto 0);
signal ri1_s1 : std_logic_vector(15 downto 0);
signal ri2_s1 : std_logic_vector(15 downto 0);
signal ri3_s1 : std_logic_vector(15 downto 0);
signal ii0_s1 : std_logic_vector(15 downto 0);
signal ii1_s1 : std_logic_vector(15 downto 0);
signal ii2_s1 : std_logic_vector(15 downto 0);
signal ii3_s1 : std_logic_vector(15 downto 0);
signal enable_wr_count1_d1 : std_logic;
signal enable_wr_count1_d2 : std_logic;
---------------------- ROM declaration and initialization -----------------------------------------
type rom is array(0 to 63) of std_logic_vector(15 downto 0);
constant co_11 : rom := ( x"4000",
x"3ffb",
x"3fec",
x"3fd4",
x"3fb1",
x"3f85",
x"3f4f",
x"3f0f",
x"3ec5",
x"3e72",
x"3e15",
x"3daf",
x"3d3f",
x"3cc5",
x"3c42",
x"3bb6",
x"3b21",
x"3a82",
x"39db",
x"392b",
x"3871",
x"37b0",
x"36e5",
x"3612",
x"3537",
x"3453",
x"3368",
x"3274",
x"3179",
x"3076",
x"2f6c",
x"2e5a",
x"2d41",
x"2c21",
x"2afb",
x"29ce",
x"289a",
x"2760",
x"2620",
x"24da",
x"238e",
x"223d",
x"20e7",
x"1f8c",
x"1e2b",
x"1cc6",
x"1b5d",
x"19ef",
x"187e",
x"1709",
x"1590",
x"1413",
x"1294",
x"1112",
x"0f8d",
x"0e06",
x"0c7c",
x"0af1",
x"0964",
x"07d6",
x"0646",
x"04b5",
x"0324",
x"0192"
);
constant co_12 : rom := ( x"4000",
x"4000",
x"4000",
x"4000",
x"3fb1",
x"3fb1",
x"3fb1",
x"3fb1",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3d3f",
x"3d3f",
x"3d3f",
x"3d3f",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3871",
x"3871",
x"3871",
x"3871",
x"3537",
x"3537",
x"3537",
x"3537",
x"3179",
x"3179",
x"3179",
x"3179",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"289a",
x"289a",
x"289a",
x"289a",
x"238e",
x"238e",
x"238e",
x"238e",
x"1e2b",
x"1e2b",
x"1e2b",
x"1e2b",
x"187e",
x"187e",
x"187e",
x"187e",
x"1294",
x"1294",
x"1294",
x"1294",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"0646",
x"0646",
x"0646",
x"0646"
);
constant co_13 : rom := ( x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e"
);

constant co_21 : rom := ( x"4000",

x"3fec",
x"3fb1",
x"3f4f",
x"3ec5",
x"3e15",
x"3d3f",
x"3c42",
x"3b21",
x"39db",
x"3871",
x"36e5",
x"3537",
x"3368",
x"3179",
x"2f6c",
x"2d41",
x"2afb",
x"289a",
x"2620",
x"238e",
x"20e7",
x"1e2b",
x"1b5d",
x"187e",
x"1590",
x"1294",
x"0f8d",
x"0c7c",
x"0964",
x"0646",
x"0324",
x"0000",
x"fcdc",
x"f9ba",
x"f69c",
x"f384",
x"f073",
x"ed6c",
x"ea70",
x"e782",
x"e4a3",
x"e1d5",
x"df19",
x"dc72",
x"d9e0",
x"d766",
x"d505",
x"d2bf",
x"d094",
x"ce87",
x"cc98",
x"cac9",
x"c91b",
x"c78f",
x"c625",
x"c4df",
x"c3be",
x"c2c1",
x"c1eb",
x"c13b",
x"c0b1",
x"c04f",
x"c014"
);

constant co_22 : rom := ( x"4000",

x"4000",
x"4000",
x"4000",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3b21",
x"3b21",
x"3b21",
x"3b21",
x"3537",
x"3537",
x"3537",
x"3537",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"238e",
x"238e",
x"238e",
x"238e",
x"187e",
x"187e",
x"187e",
x"187e",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"0000",
x"0000",
x"0000",
x"0000",
x"f384",
x"f384",
x"f384",
x"f384",
x"e782",
x"e782",
x"e782",
x"e782",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"cac9",
x"cac9",
x"cac9",
x"cac9",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c13b",
x"c13b",
x"c13b",
x"c13b"
);

constant co_23 : rom := ( x"4000",

x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf"
);

constant co_31 : rom := ( x"4000",

x"3fd4",
x"3f4f",
x"3e72",
x"3d3f",
x"3bb6",
x"39db",
x"37b0",
x"3537",
x"3274",
x"2f6c",
x"2c21",
x"289a",
x"24da",
x"20e7",
x"1cc6",
x"187e",
x"1413",
x"0f8d",
x"0af1",
x"0646",
x"0192",
x"fcdc",
x"f82a",
x"f384",
x"eeee",
x"ea70",
x"e611",
x"e1d5",
x"ddc3",
x"d9e0",
x"d632",
x"d2bf",
x"cf8a",
x"cc98",
x"c9ee",
x"c78f",
x"c57e",
x"c3be",
x"c251",
x"c13b",
x"c07b",
x"c014",
x"c005",
x"c04f",
x"c0f1",
x"c1eb",
x"c33b",
x"c4df",
x"c6d5",
x"c91b",
x"cbad",
x"ce87",
x"d1a6",
x"d505",
x"d8a0",
x"dc72",
x"e074",
x"e4a3",
x"e8f7",
x"ed6c",
x"f1fa",
x"f69c",
x"fb4b"
);

constant co_32 : rom := ( x"4000",

x"4000",
x"4000",
x"4000",
x"3d3f",
x"3d3f",
x"3d3f",
x"3d3f",
x"3537",
x"3537",
x"3537",
x"3537",
x"289a",
x"289a",
x"289a",
x"289a",
x"187e",
x"187e",
x"187e",
x"187e",
x"0646",
x"0646",
x"0646",
x"0646",
x"f384",
x"f384",
x"f384",
x"f384",
x"e1d5",
x"e1d5",
x"e1d5",
x"e1d5",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c78f",
x"c78f",
x"c78f",
x"c78f",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c04f",
x"c04f",
x"c04f",
x"c04f",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"ce87",
x"ce87",
x"ce87",
x"ce87",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"ed6c",
x"ed6c",
x"ed6c",
x"ed6c"
);

constant co_33 : rom := ( x"4000",

x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df"
);

constant si_11 : rom := ( x"0000",

x"fe6e",
x"fcdc",
x"fb4b",
x"f9ba",
x"f82a",
x"f69c",
x"f50f",
x"f384",
x"f1fa",
x"f073",
x"eeee",
x"ed6c",
x"ebed",
x"ea70",
x"e8f7",
x"e782",
x"e611",
x"e4a3",
x"e33a",
x"e1d5",
x"e074",
x"df19",
x"ddc3",
x"dc72",
x"db26",
x"d9e0",
x"d8a0",
x"d766",
x"d632",
x"d505",
x"d3df",
x"d2bf",
x"d1a6",
x"d094",
x"cf8a",
x"ce87",
x"cd8c",
x"cc98",
x"cbad",
x"cac9",
x"c9ee",
x"c91b",
x"c850",
x"c78f",
x"c6d5",
x"c625",
x"c57e",
x"c4df",
x"c44a",
x"c3be",
x"c33b",
x"c2c1",
x"c251",
x"c1eb",
x"c18e",
x"c13b",
x"c0f1",
x"c0b1",
x"c07b",
x"c04f",
x"c02c",
x"c014",
x"c005"
);
constant si_12 : rom := ( x"0000",
x"0000",
x"0000",
x"0000",
x"f9ba",
x"f9ba",
x"f9ba",
x"f9ba",
x"f384",
x"f384",
x"f384",
x"f384",
x"ed6c",
x"ed6c",
x"ed6c",
x"ed6c",
x"e782",
x"e782",
x"e782",
x"e782",
x"e1d5",
x"e1d5",
x"e1d5",
x"e1d5",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"d766",
x"d766",
x"d766",
x"d766",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"ce87",
x"ce87",
x"ce87",
x"ce87",
x"cac9",
x"cac9",
x"cac9",
x"cac9",
x"c78f",
x"c78f",
x"c78f",
x"c78f",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c2c1",
x"c2c1",
x"c2c1",
x"c2c1",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c04f",
x"c04f",
x"c04f",
x"c04f"
);

constant si_13 : rom := ( x"0000",

x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"e782",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df"
);

constant si_21 : rom := ( x"0000",

x"fcdc",
x"f9ba",
x"f69c",
x"f384",
x"f073",
x"ed6c",
x"ea70",
x"e782",
x"e4a3",
x"e1d5",
x"df19",
x"dc72",
x"d9e0",
x"d766",
x"d505",
x"d2bf",
x"d094",
x"ce87",
x"cc98",
x"cac9",
x"c91b",
x"c78f",
x"c625",
x"c4df",
x"c3be",
x"c2c1",
x"c1eb",
x"c13b",
x"c0b1",
x"c04f",
x"c014",
x"c000",
x"c014",
x"c04f",
x"c0b1",
x"c13b",
x"c1eb",
x"c2c1",
x"c3be",
x"c4df",
x"c625",
x"c78f",
x"c91b",
x"cac9",
x"cc98",
x"ce87",
x"d094",
x"d2bf",
x"d505",
x"d766",
x"d9e0",
x"dc72",
x"df19",
x"e1d5",
x"e4a3",
x"e782",
x"ea70",
x"ed6c",
x"f073",
x"f384",
x"f69c",
x"f9ba",
x"fcdc"
);

constant si_22 : rom := ( x"0000",

x"0000",
x"0000",
x"0000",
x"f384",
x"f384",
x"f384",
x"f384",
x"e782",
x"e782",
x"e782",
x"e782",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"cac9",
x"cac9",
x"cac9",
x"cac9",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c000",
x"c000",
x"c000",
x"c000",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"cac9",
x"cac9",
x"cac9",
x"cac9",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"e782",
x"e782",
x"e782",
x"e782",
x"f384",
x"f384",
x"f384",
x"f384"
);

constant si_23 : rom := ( x"0000",

x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"c000",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf"
);

constant si_31 : rom := (x"0000",

x"fb4b",
x"f69c",
x"f1fa",
x"ed6c",
x"e8f7",
x"e4a3",
x"e074",
x"dc72",
x"d8a0",
x"d505",
x"d1a6",
x"ce87",
x"cbad",
x"c91b",
x"c6d5",
x"c4df",
x"c33b",
x"c1eb",
x"c0f1",
x"c04f",
x"c005",
x"c014",
x"c07b",
x"c13b",
x"c251",
x"c3be",
x"c57e",
x"c78f",
x"c9ee",
x"cc98",
x"cf8a",
x"d2bf",
x"d632",
x"d9e0",
x"ddc3",
x"e1d5",
x"e611",
x"ea70",
x"eeee",
x"f384",
x"f82a",
x"fcdc",
x"0192",
x"0646",
x"0af1",
x"0f8d",
x"1413",
x"187e",
x"1cc6",
x"20e7",
x"24da",
x"289a",
x"2c21",
x"2f6c",
x"3274",
x"3537",
x"37b0",
x"39db",
x"3bb6",
x"3d3f",
x"3e72",
x"3f4f",
x"3fd4"
);

constant si_32 : rom := (x"0000",

x"0000",
x"0000",
x"0000",
x"ed6c",
x"ed6c",
x"ed6c",
x"ed6c",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"ce87",
x"ce87",
x"ce87",
x"ce87",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c04f",
x"c04f",
x"c04f",
x"c04f",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c78f",
x"c78f",
x"c78f",
x"c78f",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"e1d5",
x"e1d5",
x"e1d5",
x"e1d5",
x"f384",
x"f384",
x"f384",
x"f384",
x"0646",
x"0646",
x"0646",
x"0646",
x"187e",
x"187e",
x"187e",
x"187e",
x"289a",
x"289a",
x"289a",
x"289a",
x"3537",
x"3537",
x"3537",
x"3537",
x"3d3f",
x"3d3f",
x"3d3f",
x"3d3f"
);

constant si_33 : rom := (x"0000",

x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"c4df",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e",
x"187e"
);
------------------------------- Signals declaration
-------------------------------------------------
signal Real_out_sig, imag_out_sig : std_logic_vector(15 downto 0);
signal wr_en_ram3 : std_logic;
signal wr_en_ram4 : std_logic;
signal wr_en_ram5 : std_logic;
begin
------------- radix4_butterfly_r module instantiation
------------------------------------------------
Inst_radix4_butterfly : radix4_butterfly_r PORT MAP
( clk => clk,
reset => reset,
ri0 => ri0_s1,
ri1 => ri1_s1,
ri2 => ri2_s1,
ri3 => ri3_s1,
ii0 => ii0_s1,
ii1 => ii1_s1,
ii2 => ii2_s1,
ii3 => ii3_s1,
co1 => Co_1_1,
co2 => Co_2_1,
co3 => Co_3_1,
si1 => Si_1_1,
si2 => Si_2_1,
si3 => Si_3_1,
ro0 => Real_out1,
ro1 => Real_out2,
ro2 => Real_out3,
ro3 => Real_out4,
io0 => Imag_out1,
io1 => Imag_out2,
io2 => Imag_out3,
io3 => Imag_out4
);
Real_out <= Real_out_sig(15)& Real_out_sig(15)& Real_out_sig(15 downto 2);
imag_out <= imag_out_sig(15)& imag_out_sig(15)& imag_out_sig(15 downto 2);
Data1 <= Real_out1 & Imag_out1;
Data2 <= Real_out2 & Imag_out2;
Data3 <= Real_out3 & Imag_out3;
Data4 <= Real_out4 & Imag_out4;

process(clk, reset)
begin
if reset = '1' then
ri0_s1 <= (others => '0');
ri1_s1 <= (others => '0');
ri2_s1 <= (others => '0');
ri3_s1 <= (others => '0');
ii0_s1 <= (others => '0');
ii1_s1 <= (others => '0');
ii2_s1 <= (others => '0');
ii3_s1 <= (others => '0');
Co_1_1 <= (others => '0');
Co_2_1 <= (others => '0');
Co_3_1 <= (others => '0');
Si_1_1 <= (others => '0');
Si_2_1 <= (others => '0');
Si_3_1 <= (others => '0');
elsif clk = '1' and clk'event then
ri0_s1 <= ri0_s;
ri1_s1 <= ri1_s;
ri2_s1 <= ri2_s;
ri3_s1 <= ri3_s;
ii0_s1 <= ii0_s;
ii1_s1 <= ii1_s;
ii2_s1 <= ii2_s;
ii3_s1 <= ii3_s;
Co_1_1 <= Co_1;
Co_2_1 <= Co_2;
Co_3_1 <= Co_3;
Si_1_1 <= Si_1;
Si_2_1 <= Si_2;
Si_3_1 <= Si_3;
end if;
end process;

--- write address generation process to write input data

Process(Clk, Reset)
begin
if Reset = '1' then
wr_add_1st <= (others => '0');
elsif Clk'event and Clk = '1' then
if start = '1' then -- initialize the write address on the start signal
wr_add_1st <= ( others => '0');
elsif valid_in = '1' then
wr_add_1st <= wr_add_1st + '1'; -- incrementing the write address
end if;
end if;
end process;
-- The radix-4 butterfly needs four inputs, each offset by 64 samples, so the
-- first three blocks of 64 inputs are stored in separate memories and the
-- last 64 inputs are fed to the butterfly on the fly.
Wr_en1 <= '1' when wr_add_1st >= 0 and wr_add_1st <= 63 else -- write enable for the first 64 inputs
'0';

Wr_en2 <= '1' when wr_add_1st >= 64 and wr_add_1st <= 127 else -- write enable for the second 64 inputs
'0';

Wr_en3 <= '1' when wr_add_1st >= 128 and wr_add_1st <= 191 else -- write enable for the third 64 inputs
'0';

-- ram_stage3_r module instantiation

Inst_ram_stage10 : ram_stage3_r PORT MAP(
clk => Clk,
wr => Wr_en1,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out01_s
);
-- ram_stage3_r module instantiation

Inst_ram_stage11 : ram_stage3_r PORT MAP(

clk => Clk,
wr => Wr_en2,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out02_s
);
-- ram_stage3_r module instantiation
Inst_ram_stage12 : ram_stage3_r PORT MAP(
clk => Clk,
wr => Wr_en3,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out03_s
);

Data_in_s <= Real_in & Imag_in;

wr_add_s <= wr_add_1st(5 downto 0);

Real_1st_in_0 <= Data_out01_s (31 downto 16);
Imag_1st_in_0 <= Data_out01_s (15 downto 0);

Real_1st_in_1 <= Data_out02_s (31 downto 16);

Imag_1st_in_1 <= Data_out02_s (15 downto 0);

Real_1st_in_2 <= Data_out03_s (31 downto 16);

Imag_1st_in_2 <= Data_out03_s (15 downto 0);

Real_1st_in_3 <= Real_in;

Imag_1st_in_3 <= Imag_in;

-- process to assign next state to present state

Process(Clk, reset)
begin
if Reset = '1' then
ps1 <= Rst;
elsif Clk'event and Clk = '1' then
ps1 <= ns1;
end if;
end process;
-- state machine to generate the read enable for the input data memory
process(ps1, wr_add_1st, Rd_add_ss)
begin
case ps1 is
when Rst => if wr_add_1st = 189 then -- 253
ns1 <= s0;
else
ns1 <= Rst;
end if;

when s0 => if Rd_add_ss = 63 then

ns1 <= Rst;
else
ns1 <= s0;
end if;

when others => ns1 <= Rst;

end case;
end process;

process(ps1, wr_add_1st)
begin
case ps1 is
when Rst => enable_rd_count1 <= '0';

when s0 => enable_rd_count1 <= '1';

when others => enable_rd_count1 <= '0';

end case;
end process;

-- process to generate the read address for the input data memory
process(Clk, Reset, enable_rd_count1, wr_add_1st, start)
begin
if Reset = '1' then ----- add
Rd_add_ss <= ( others => '1');
elsif Clk'event and Clk = '1' then
if Rd_add_ss = 64 then
Rd_add_ss <= ( others => '1');
elsif enable_rd_count1 = '1' then
Rd_add_ss <= Rd_add_ss + '1';
end if;
end if;
end process;

Rd_add_s <= Rd_add_ss(5 downto 0);

process(Clk, reset, Rd_add_s, Rd_add_ss2, Rd_add_s2, wr_add_ss)

begin
if reset = '1' then
Rd_add_s_d <= ( others => '1');
Rd_add_s2_d <= ( others => '1');
wr_add_ss_d <= ( others => '1');
elsif Clk'event and Clk = '1' then
Rd_add_s_d <= Rd_add_ss;
Rd_add_s2_d <= Rd_add_s2;
wr_add_ss_d <= wr_add_ss;
end if;
end process;
-- ***************************************************************************
-- process to assign next state to present state
Process(Clk, reset)
begin
if Reset = '1' then
ps2 <= Rst;
elsif Clk'event and Clk = '1' then
ps2 <= ns2;
end if;
end process;
-- state machine for the generation of the write enable signal that writes the
-- butterfly output after the first stage of processing
process(ps2, wr_add_1st, wr_add_ss)
begin
case ps2 is
when Rst => if wr_add_1st = 189 then -- 253
ns2 <= s0;
else
ns2 <= Rst;
end if;

when s0 => if wr_add_ss = 254 and wr_add_1st = 189 then

ns2 <= s0;
elsif wr_add_ss = 254 then
ns2 <= rst;
else
ns2 <= s0;
end if;
when others => ns2 <= Rst;
end case;
end process;

process(ps2, wr_add_1st)
begin
case ps2 is
when Rst => enable_wr_count1 <= '0';

when s0 => enable_wr_count1 <= '1';

when others => enable_wr_count1 <= '0';

end case;
end process;
-- generating the write address for the second stage data
process(Clk, Reset, enable_wr_count1, start, wr_add_ss)
begin
if Reset = '1' then
wr_add_ss <= ( others => '1');
wr_en_ram2 <= '0';
wr_en_ram3 <= '0';
wr_en_ram4 <= '0';
wr_en_ram5 <= '0';
elsif Clk'event and Clk = '1' then

wr_en_ram2 <= enable_wr_count1_d2;

wr_en_ram3 <= wr_en_ram2;
wr_en_ram4 <= wr_en_ram3;
wr_en_ram5 <= wr_en_ram4;
if enable_wr_count1 = '1' then
if wr_add_ss = 255 then
wr_add_ss <= ( others => '0');
else
wr_add_ss <= wr_add_ss + '1';
end if;
end if;
end if;
end process;

Rd_add_ss2 <= Rd_add_s2(5 downto 0);

wr_add_ss2 <= wr_add_ss( 7 downto 0);
mux_sel_s1 <= mux_sel_s;

process(Clk, reset, wr_add_ss, enable_wr_count1)

begin
if reset = '1' then
wr_add_sss <= ( others => '1');
enable_wr_count1_d1 <= '0';
enable_wr_count1_d2 <= '0';
elsif Clk'event and Clk = '1' then
wr_add_sss <= wr_add_ss;
enable_wr_count1_d1 <= enable_wr_count1;
enable_wr_count1_d2 <= enable_wr_count1_d1;
end if;
end process;

--- ram_stage1_r module instantiation

Inst_ram_stage20 : ram_stage1_r PORT MAP(
clk => Clk,
reset => reset,
wr => wr_en_ram5,
Mux_sel => Mux_sel_s,
wadd => wr_add_ss2,
radd => Rd_add_ss2,
data_in1 => Data1,
data_in2 => Data2,
data_in3 => Data3,
data_in4 => Data4,
data_out1 => Data_out11_s,
data_out2 => Data_out12_s,
data_out3 => Data_out13_s,
data_out4 => Data_out14_s,
data_out_final_real => Real_out_sig,
data_out_final_Imag => Imag_out_sig,
Valid_out => Valid_out
);

Real_2st_in_0 <= Data_out11_s (31 downto 16);

Imag_2st_in_0 <= Data_out11_s (15 downto 0);
Real_2st_in_1 <= Data_out12_s (31 downto 16);
Imag_2st_in_1 <= Data_out12_s (15 downto 0);
Real_2st_in_2 <= Data_out13_s (31 downto 16);
Imag_2st_in_2 <= Data_out13_s (15 downto 0);
Real_2st_in_3 <= Data_out14_s (31 downto 16);
Imag_2st_in_3 <= Data_out14_s (15 downto 0);

-- process to generate the final read address for the output of the FFT
Process(Clk, reset)
begin
if Reset = '1' then
ps3 <= Rst;
elsif Clk'event and Clk = '1' then
ps3 <= ns3;
end if;
end process;

process(ps3, Rd_add_s2, Rd_add_ss)

begin
case ps3 is
when Rst => if Rd_add_ss = 62 or Rd_add_ss = 63 then -- 255 or Rd_add_ss = 63
ns3 <= s0;
else
ns3 <= Rst;
end if;

when s0 => ns3 <= s1;

when s1 => ns3 <= s2;
when s2 => ns3 <= s3;
when s3 => if Rd_add_ss = 62 then
ns3 <= s0;
elsif Rd_add_s2 = 63 then -- 255
ns3 <= Rst;
else
ns3 <= s0;
end if;

when others => ns3 <= Rst;

end case;
end process;

process(ps3)
begin
case ps3 is
when Rst => enable_rd_count2 <= '0';
mux_sel_s <= "00";

when s0 => enable_rd_count2 <= '1';

mux_sel_s <= "00";

when s1 => enable_rd_count2 <= '0';

mux_sel_s <= "01";

when s2 => enable_rd_count2 <= '0';

mux_sel_s <= "10";
when s3 => enable_rd_count2 <= '0';
mux_sel_s <= "11";

when others => enable_rd_count2 <= '0';

mux_sel_s <= "00";
end case;
end process;

process(Clk, Reset, enable_rd_count2, Rd_add_s2, start)

begin
if Reset = '1' then --- add
Rd_add_s2 <= ( others => '1');
elsif Clk'event and Clk = '1' then
if enable_rd_count2 = '1' then
if Rd_add_s2 = 63 then
Rd_add_s2 <= ( others => '0');
else
Rd_add_s2 <= Rd_add_s2 + '1';
end if;
end if;
end if;
end process;

-- ******** logic for final stage read ********

-- butterfly input assignment depending on the stage

ri0_s <= Real_1st_in_0 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_0 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ii0_s <= Imag_1st_in_0 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_0 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ri1_s <= Real_1st_in_1 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_1 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ii1_s <= Imag_1st_in_1 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_1 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ri2_s <= Real_1st_in_2 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_2 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ii2_s <= Imag_1st_in_2 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_2 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ri3_s <= Real_1st_in_3 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_3 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

ii3_s <= Imag_1st_in_3 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_3 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192
else
( others => '0');

-- twiddle factor assignment depending on the stage

co_1 <= co1(15)&co1(15)&co1(15)&co1(15)&co1(15)&co1(15)&co1(15 downto 6);

co_2 <= co2(15)&co2(15)&co2(15)&co2(15)&co2(15)&co2(15)&co2(15 downto 6);
co_3 <= co3(15)&co3(15)&co3(15)&co3(15)&co3(15)&co3(15)&co3(15 downto 6);
si_1 <= so1(15)&so1(15)&so1(15)&so1(15)&so1(15)&so1(15)&so1(15 downto 6);
si_2 <= so2(15)&so2(15)&so2(15)&so2(15)&so2(15)&so2(15)&so2(15 downto 6);
si_3 <= so3(15)&so3(15)&so3(15)&so3(15)&so3(15)&so3(15)&so3(15 downto 6);

Co1 <= co_11(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
co_12(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
co_13(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
x"4000" when wr_add_ss_d >= 192 and wr_add_ss_d <= 255 else
(others => '0');

Co2 <= co_21(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
co_22(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
co_23(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
x"4000" when wr_add_ss_d >= 192 and wr_add_ss_d <= 255 else
( others => '0');

Co3 <= co_31(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
co_32(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
co_33(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
x"4000" when wr_add_ss_d >= 192 and wr_add_ss_d <= 255 else
( others => '0');

So1 <= si_11(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
si_12(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
si_13(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
( others => '0');

So2 <= si_21(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
si_22(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
si_23(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
( others => '0');

So3 <= si_31(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

wr_add_ss_d <= 63 else
si_32(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 64 and
wr_add_ss_d <= 127 else
si_33(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 128
and wr_add_ss_d <= 191 else
( others => '0');

end Behavioral;
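
The twiddle ROMs in the listing above store cosine and sine samples as signed 16-bit words in which x"4000" stands for 1.0 (14 fractional bits). As a cross-check on hand-entered tables of this kind, the following is a minimal sketch of how one such 64-entry table could be computed at elaboration time. The package name twiddle_sketch_pkg, the function gen_twiddle and its parameters are hypothetical and are not part of the design. The sketch assumes round-to-nearest, so entries can differ by one LSB from a table produced by truncation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

-- Hypothetical helper package, for illustration only.
package twiddle_sketch_pkg is
  type rom is array (0 to 63) of std_logic_vector(15 downto 0);

  -- mult : twiddle multiplier (1, 2 or 3)
  -- hold : number of consecutive addresses that share one angle (1, 4 or 16)
  -- kind : 0 = cos, 1 = +sin, any other value = -sin
  function gen_twiddle(mult : positive; hold : positive; kind : natural)
    return rom;
end package twiddle_sketch_pkg;

package body twiddle_sketch_pkg is
  function gen_twiddle(mult : positive; hold : positive; kind : natural)
    return rom is
    variable table : rom;
    variable angle : real;
    variable value : real;
  begin
    for k in table'range loop
      -- Integer division k / hold keeps the angle constant over "hold" entries.
      angle := MATH_2_PI * real(mult * hold * (k / hold)) / 256.0;
      case kind is
        when 0      => value := cos(angle);
        when 1      => value := sin(angle);
        when others => value := -sin(angle);
      end case;
      -- Scale by 2**14 so that 1.0 maps to x"4000", then round to nearest.
      table(k) := std_logic_vector(
                    to_signed(integer(round(value * 16384.0)), 16));
    end loop;
    return table;
  end function gen_twiddle;
end package body twiddle_sketch_pkg;
```

Under these assumptions, a declaration such as constant co_11 : rom := gen_twiddle(1, 1, 0); would yield the first-stage cosine table, and gen_twiddle(3, 16, 2) a table laid out like si_33. Whether a given synthesis tool accepts ieee.math_real in constant initialisation should be checked before relying on this approach.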

VI. IFFT 256 point code

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---------------------------- Entity declaration

-------------------------------------
entity Top_fft_256_r is
PORT(
clk : IN std_logic; -- processing clock
reset : IN std_logic; -- asynchronous reset
real_in : IN std_logic_vector(15 downto 0); -- real part of the FFT input
imag_in : IN std_logic_vector(15 downto 0); -- imaginary part of the FFT input
valid_in : IN std_logic; -- input data valid
start : IN std_logic; -- start FFT
real_out : OUT std_logic_vector(15 downto 0); -- real part of the FFT output
imag_out : OUT std_logic_vector(15 downto 0); -- imaginary part of the FFT output
valid_out : OUT std_logic -- output data valid
);
end Top_fft_256_r;

---------------------------- Architecture begin here

-------------------------------
architecture Behavioral of Top_fft_256_r is

--------------------------- Components declaration

---------------------------------
COMPONENT radix4_butterfly_rx
PORT(
clk : in std_logic; -- Processing clock
reset : in std_logic; -- Asynchronous reset
ri0 : IN std_logic_vector(15 downto 0); -- input1 real part
ri1 : IN std_logic_vector(15 downto 0); -- input2 real part
ri2 : IN std_logic_vector(15 downto 0); -- input3 real part
ri3 : IN std_logic_vector(15 downto 0); -- input4 real part
ii0 : IN std_logic_vector(15 downto 0); -- input1 imaginary part
ii1 : IN std_logic_vector(15 downto 0); -- input2 imaginary part
ii2 : IN std_logic_vector(15 downto 0); -- input3 imaginary part
ii3 : IN std_logic_vector(15 downto 0); -- input4 imaginary part
co1 : IN std_logic_vector(15 downto 0); -- Cos of the angle1
co2 : IN std_logic_vector(15 downto 0); -- Cos of the angle2
co3 : IN std_logic_vector(15 downto 0); -- Cos of the angle3
si1 : IN std_logic_vector(15 downto 0); -- Sin of the angle1
si2 : IN std_logic_vector(15 downto 0); -- Sin of the angle2
si3 : IN std_logic_vector(15 downto 0); -- Sin of the angle3
ro0 : OUT std_logic_vector(15 downto 0); -- real part of the output1
ro1 : OUT std_logic_vector(15 downto 0); -- real part of the output2
ro2 : OUT std_logic_vector(15 downto 0); -- real part of the output3
ro3 : OUT std_logic_vector(15 downto 0); -- real part of the output4
io0 : OUT std_logic_vector(15 downto 0); -- imaginary part of the output1
io1 : OUT std_logic_vector(15 downto 0); -- imaginary part of the output2
io2 : OUT std_logic_vector(15 downto 0); -- imaginary part of the output3
io3 : OUT std_logic_vector(15 downto 0) -- imaginary part of the output4
);
END COMPONENT;

COMPONENT ram_stage1_rx
PORT(
clk : IN std_logic; -- processing clock
reset : in std_logic; -- Asynchronous reset
wr : IN std_logic; -- Write enable signal
Mux_sel : in std_logic_vector(1 downto 0); -- Mux selection line
wadd : IN std_logic_vector(7 downto 0); -- write address signal
radd : IN std_logic_vector(5 downto 0); -- read address signal
data_in1 : IN std_logic_vector(31 downto 0); -- input data1 to store into fifo
data_in2 : IN std_logic_vector(31 downto 0); -- input data2 to store into fifo
data_in3 : IN std_logic_vector(31 downto 0); -- input data3 to store into fifo
data_in4 : IN std_logic_vector(31 downto 0); -- input data4 to store into fifo
data_out1 : OUT std_logic_vector(31 downto 0); -- output data1 from fifo
data_out2 : OUT std_logic_vector(31 downto 0); -- output data2 from fifo
data_out3 : OUT std_logic_vector(31 downto 0); -- output data3 from fifo
data_out4 : OUT std_logic_vector(31 downto 0); -- output data4 from fifo
data_out_final_real : out std_logic_vector(15 downto 0); -- Real part of the output
data_out_final_Imag : out std_logic_vector(15 downto 0); -- Imaginary part of the output
Valid_out : out std_logic -- Output data valid signal
);
END COMPONENT;

COMPONENT ram_stage3_rx
PORT(
clk : IN std_logic; -- Processing clock
wr : IN std_logic;-- Write enable signal
wadd : IN std_logic_vector(5 downto 0); -- Write address
radd : IN std_logic_vector(5 downto 0); -- Read address
data_in : IN std_logic_vector(31 downto 0); -- Input data to store into ram
data_out : OUT std_logic_vector(31 downto 0)-- Output data from ram
);
END COMPONENT;

COMPONENT ram_stage2_rx
PORT(
clk : IN std_logic;
reset : in std_logic;
wr : IN std_logic;
wadd : IN std_logic_vector(5 downto 0);
radd : IN std_logic_vector(8 downto 0);
data_in1 : IN std_logic_vector(31 downto 0);
data_in2 : IN std_logic_vector(31 downto 0);
data_in3 : IN std_logic_vector(31 downto 0);
data_in4 : IN std_logic_vector(31 downto 0);
data_out : OUT std_logic_vector(31 downto 0)
);
END COMPONENT;

---------------- Signals declaration

---------------------------------------------
signal Real_1st_in_0 : std_logic_vector(15 downto 0);
signal Real_1st_in_1 : std_logic_vector(15 downto 0);
signal Real_1st_in_2 : std_logic_vector(15 downto 0);
signal Real_1st_in_3 : std_logic_vector(15 downto 0);
signal Imag_1st_in_0 : std_logic_vector(15 downto 0);
signal Imag_1st_in_1 : std_logic_vector(15 downto 0);
signal Imag_1st_in_2 : std_logic_vector(15 downto 0);
signal Imag_1st_in_3 : std_logic_vector(15 downto 0);
signal Real_2st_in_0 : std_logic_vector(15 downto 0);
signal Real_2st_in_1 : std_logic_vector(15 downto 0);
signal Real_2st_in_2 : std_logic_vector(15 downto 0);
signal Real_2st_in_3 : std_logic_vector(15 downto 0);
signal Imag_2st_in_0 : std_logic_vector(15 downto 0);
signal Imag_2st_in_1 : std_logic_vector(15 downto 0);
signal Imag_2st_in_2 : std_logic_vector(15 downto 0);
signal Imag_2st_in_3 : std_logic_vector(15 downto 0);
signal wr_add_1st : std_logic_vector(8 downto 0);
signal wr_en1 : std_logic;
signal wr_en2 : std_logic;
signal wr_en3 : std_logic;
signal enable_wr_count1 : std_logic;
signal Data_in_s : std_logic_vector(31 downto 0);
signal wr_add_s : std_logic_vector(5 downto 0);
signal rd_add_s : std_logic_vector(5 downto 0);
signal Rd_add_s_d : std_logic_vector(6 downto 0);
signal rd_add_ss : std_logic_vector(6 downto 0);
signal rd_add_ss2 : std_logic_vector(5 downto 0);
signal Data_out01_s : std_logic_vector(31 downto 0);
signal Data_out02_s : std_logic_vector(31 downto 0);
signal Data_out03_s : std_logic_vector(31 downto 0);
signal Data_out11_s : std_logic_vector(31 downto 0);
signal Data_out12_s : std_logic_vector(31 downto 0);
signal Data_out13_s : std_logic_vector(31 downto 0);
signal Data_out14_s : std_logic_vector(31 downto 0);
signal enable_rd_count1 : std_logic;
signal Data1 : std_logic_vector(31 downto 0);
signal Data2 : std_logic_vector(31 downto 0);
signal Data3 : std_logic_vector(31 downto 0);
signal Data4 : std_logic_vector(31 downto 0);
signal enable_rd_count2 : std_logic;
signal rd_add_s2 : std_logic_vector(8 downto 0);
signal rd_add_s2_d : std_logic_vector(8 downto 0);
signal wr_add_ss : std_logic_vector(8 downto 0);
signal wr_add_ss_d : std_logic_vector(8 downto 0);
signal wr_add_sss : std_logic_vector(8 downto 0);
signal wr_en_ram2 : std_logic;
signal wr_add_ss2 : std_logic_vector(7 downto 0);
signal Co1 : std_logic_vector(15 downto 0);
signal Co2 : std_logic_vector(15 downto 0);
signal Co3 : std_logic_vector(15 downto 0);
signal So1 : std_logic_vector(15 downto 0);
signal So2 : std_logic_vector(15 downto 0);
signal So3 : std_logic_vector(15 downto 0);
SIGNAL co_1_1 : std_logic_vector(15 downto 0);
SIGNAL co_2_1 : std_logic_vector(15 downto 0);
SIGNAL co_3_1 : std_logic_vector(15 downto 0);
SIGNAL si_1_1 : std_logic_vector(15 downto 0);
SIGNAL si_2_1 : std_logic_vector(15 downto 0);
SIGNAL si_3_1 : std_logic_vector(15 downto 0);
type state1 is (rst, s0, s1, s2, s3, s4);
signal ps1, ns1 : state1;

type state2 is (rst, s0, s1, s2, s3, s4);

signal ps2, ns2 : state2;

type state3 is (rst, s0, s1, s2, s3, s4);

signal ps3, ns3 : state3;

type state4 is (rst, s0, s1, s2, s3, s4);

signal ps4, ns4 : state4;

signal mux_sel_s : std_logic_vector(1 downto 0);

signal Real_out1 : std_logic_vector(15 downto 0);

signal Real_out2 : std_logic_vector(15 downto 0);
signal Real_out3 : std_logic_vector(15 downto 0);
signal Real_out4 : std_logic_vector(15 downto 0);

signal Imag_out1 : std_logic_vector(15 downto 0);

signal Imag_out2 : std_logic_vector(15 downto 0);
signal Imag_out3 : std_logic_vector(15 downto 0);
signal Imag_out4 : std_logic_vector(15 downto 0);

signal ri0_s : std_logic_vector(15 downto 0);

signal ri1_s : std_logic_vector(15 downto 0);
signal ri2_s : std_logic_vector(15 downto 0);
signal ri3_s : std_logic_vector(15 downto 0);

signal ii0_s : std_logic_vector(15 downto 0);

signal ii1_s : std_logic_vector(15 downto 0);
signal ii2_s : std_logic_vector(15 downto 0);
signal ii3_s : std_logic_vector(15 downto 0);

signal ri0_s1 : std_logic_vector(15 downto 0);

signal ri1_s1 : std_logic_vector(15 downto 0);
signal ri2_s1 : std_logic_vector(15 downto 0);
signal ri3_s1 : std_logic_vector(15 downto 0);

signal ii0_s1 : std_logic_vector(15 downto 0);

signal ii1_s1 : std_logic_vector(15 downto 0);
signal ii2_s1 : std_logic_vector(15 downto 0);
signal ii3_s1 : std_logic_vector(15 downto 0);

signal enable_wr_count1_d1 : std_logic;

signal enable_wr_count1_d2 : std_logic;

-------------------- ROM declaration and initialization

---------------------------------
type rom is array(0 to 63) of std_logic_vector(15 downto 0);

constant co_11:rom:=( x"4000",

x"3ffb",
x"3fec",
x"3fd3",
x"3fb1",
x"3f84",
x"3f4e",
x"3f0e",
x"3ec5",
x"3e71",
x"3e14",
x"3dae",
x"3d3e",
x"3cc5",
x"3c42",
x"3bb6",
x"3b20",
x"3a82",
x"39da",
x"392a",
x"3871",
x"37af",
x"36e5",
x"3612",
x"3536",
x"3453",
x"3367",
x"3274",
x"3179",
x"3076",
x"2f6b",
x"2e5a",
x"2d41",
x"2c21",
x"2afa",
x"29cd",
x"2899",
x"275f",
x"261f",
x"24da",
x"238e",
x"223d",
x"20e7",
x"1f8b",
x"1e2b",
x"1cc6",
x"1b5d",
x"19ef",
x"187d",
x"1708",
x"158f",
x"1413",
x"1294",
x"1111",
x"0f8c",
x"0e05",
x"0c7c",
x"0af1",
x"0964",
x"07d5",
x"0645",
x"04b5",
x"0323",
x"0192"
);
constant co_12:rom:=( x"4000",
x"4000",
x"4000",
x"4000",
x"3fb1",
x"3fb1",
x"3fb1",
x"3fb1",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3d3e",
x"3d3e",
x"3d3e",
x"3d3e",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3871",
x"3871",
x"3871",
x"3871",
x"3536",
x"3536",
x"3536",
x"3536",
x"3179",
x"3179",
x"3179",
x"3179",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2899",
x"2899",
x"2899",
x"2899",
x"238e",
x"238e",
x"238e",
x"238e",
x"1e2b",
x"1e2b",
x"1e2b",
x"1e2b",
x"187d",
x"187d",
x"187d",
x"187d",
x"1294",
x"1294",
x"1294",
x"1294",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"0645",
x"0645",
x"0645",
x"0645"
);
constant co_13:rom:=(x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d"
);

constant co_21:rom:=( x"4000",

x"3fec",
x"3fb1",
x"3f4e",
x"3ec5",
x"3e14",
x"3d3e",
x"3c42",
x"3b20",
x"39da",
x"3871",
x"36e5",
x"3536",
x"3367",
x"3179",
x"2f6b",
x"2d41",
x"2afa",
x"2899",
x"261f",
x"238e",
x"20e7",
x"1e2b",
x"1b5d",
x"187d",
x"158f",
x"1294",
x"0f8c",
x"0c7c",
x"0964",
x"0645",
x"0323",
x"0000",
x"fcdd",
x"f9bb",
x"f69c",
x"f384",
x"f074",
x"ed6c",
x"ea71",
x"e783",
x"e4a3",
x"e1d5",
x"df19",
x"dc72",
x"d9e1",
x"d767",
x"d506",
x"d2bf",
x"d095",
x"ce87",
x"cc99",
x"caca",
x"c91b",
x"c78f",
x"c626",
x"c4e0",
x"c3be",
x"c2c2",
x"c1ec",
x"c13b",
x"c0b2",
x"c04f",
x"c014"
);

constant co_22:rom:=( x"4000",

x"4000",
x"4000",
x"4000",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3536",
x"3536",
x"3536",
x"3536",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"238e",
x"238e",
x"238e",
x"238e",
x"187d",
x"187d",
x"187d",
x"187d",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"0000",
x"0000",
x"0000",
x"0000",
x"f384",
x"f384",
x"f384",
x"f384",
x"e783",
x"e783",
x"e783",
x"e783",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"caca",
x"caca",
x"caca",
x"caca",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c13b",
x"c13b",
x"c13b",
x"c13b"
);

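-- cos(j * 45 deg) for j = 0 to 3, each value held for 16 addresses
-- (same 16-entry grouping as co_13 and co_33)
constant co_23:rom:=( 0 to 15 => x"4000",
16 to 31 => x"2d41",
32 to 47 => x"0000",
48 to 63 => x"d2bf"
);
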
constant co_31:rom:=( x"4000",

x"3fd3",
x"3f4e",
x"3e71",
x"3d3e",
x"3bb6",
x"39da",
x"37af",
x"3536",
x"3274",
x"2f6b",
x"2c21",
x"2899",
x"24da",
x"20e7",
x"1cc6",
x"187d",
x"1413",
x"0f8c",
x"0af1",
x"0645",
x"0192",
x"fcdd",
x"f82b",
x"f384",
x"eeef",
x"ea71",
x"e611",
x"e1d5",
x"ddc3",
x"d9e1",
x"d633",
x"d2bf",
x"cf8a",
x"cc99",
x"c9ee",
x"c78f",
x"c57e",
x"c3be",
x"c252",
x"c13b",
x"c07c",
x"c014",
x"c005",
x"c04f",
x"c0f2",
x"c1ec",
x"c33b",
x"c4e0",
x"c6d6",
x"c91b",
x"cbad",
x"ce87",
x"d1a6",
x"d506",
x"d8a1",
x"dc72",
x"e075",
x"e4a3",
x"e8f8",
x"ed6c",
x"f1fb",
x"f69c",
x"fb4b"
);

constant co_32:rom:=( x"4000",

x"4000",
x"4000",
x"4000",
x"3d3e",
x"3d3e",
x"3d3e",
x"3d3e",
x"3536",
x"3536",
x"3536",
x"3536",
x"2899",
x"2899",
x"2899",
x"2899",
x"187d",
x"187d",
x"187d",
x"187d",
x"0645",
x"0645",
x"0645",
x"0645",
x"f384",
x"f384",
x"f384",
x"f384",
x"e1d5",
x"e1d5",
x"e1d5",
x"e1d5",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c78f",
x"c78f",
x"c78f",
x"c78f",
x"c13b",
x"c13b",
x"c13b",
x"c13b",
x"c04f",
x"c04f",
x"c04f",
x"c04f",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"ce87",
x"ce87",
x"ce87",
x"ce87",
x"dc72",
x"dc72",
x"dc72",
x"dc72",
x"ed6c",
x"ed6c",
x"ed6c",
x"ed6c"
);

constant co_33:rom:=( x"4000",

x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"4000",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"d2bf",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0",
x"c4e0"
);
constant si_11:rom:=( x"0000",
x"0192",
x"0323",
x"04b5",
x"0645",
x"07d5",
x"0964",
x"0af1",
x"0c7c",
x"0e05",
x"0f8c",
x"1111",
x"1294",
x"1413",
x"158f",
x"1708",
x"187d",
x"19ef",
x"1b5d",
x"1cc6",
x"1e2b",
x"1f8b",
x"20e7",
x"223d",
x"238e",
x"24da",
x"261f",
x"275f",
x"2899",
x"29cd",
x"2afa",
x"2c21",
x"2d41",
x"2e5a",
x"2f6b",
x"3076",
x"3179",
x"3274",
x"3367",
x"3453",
x"3536",
x"3612",
x"36e5",
x"37af",
x"3871",
x"392a",
x"39da",
x"3a82",
x"3b20",
x"3bb6",
x"3c42",
x"3cc5",
x"3d3e",
x"3dae",
x"3e14",
x"3e71",
x"3ec5",
x"3f0e",
x"3f4e",
x"3f84",
x"3fb1",
x"3fd3",
x"3fec",
x"3ffb"
);
constant si_12:rom:=( x"0000",
x"0000",
x"0000",
x"0000",
x"0645",
x"0645",
x"0645",
x"0645",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"1294",
x"1294",
x"1294",
x"1294",
x"187d",
x"187d",
x"187d",
x"187d",
x"1e2b",
x"1e2b",
x"1e2b",
x"1e2b",
x"238e",
x"238e",
x"238e",
x"238e",
x"2899",
x"2899",
x"2899",
x"2899",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"3179",
x"3179",
x"3179",
x"3179",
x"3536",
x"3536",
x"3536",
x"3536",
x"3871",
x"3871",
x"3871",
x"3871",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3d3e",
x"3d3e",
x"3d3e",
x"3d3e",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3fb1",
x"3fb1",
x"3fb1",
x"3fb1"
);

constant si_13:rom:=( x"0000",

x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"187d",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20"
);

constant si_21:rom:=( x"0000",

x"0323",
x"0645",
x"0964",
x"0c7c",
x"0f8c",
x"1294",
x"158f",
x"187d",
x"1b5d",
x"1e2b",
x"20e7",
x"238e",
x"261f",
x"2899",
x"2afa",
x"2d41",
x"2f6b",
x"3179",
x"3367",
x"3536",
x"36e5",
x"3871",
x"39da",
x"3b20",
x"3c42",
x"3d3e",
x"3e14",
x"3ec5",
x"3f4e",
x"3fb1",
x"3fec",
x"3fff",
x"3fec",
x"3fb1",
x"3f4e",
x"3ec5",
x"3e14",
x"3d3e",
x"3c42",
x"3b20",
x"39da",
x"3871",
x"36e5",
x"3536",
x"3367",
x"3179",
x"2f6b",
x"2d41",
x"2afa",
x"2899",
x"261f",
x"238e",
x"20e7",
x"1e2b",
x"1b5d",
x"187d",
x"158f",
x"1294",
x"0f8c",
x"0c7c",
x"0964",
x"0645",
x"0323"
);

constant si_22:rom:=( x"0000",

x"0000",
x"0000",
x"0000",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"187d",
x"187d",
x"187d",
x"187d",
x"238e",
x"238e",
x"238e",
x"238e",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"3536",
x"3536",
x"3536",
x"3536",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3536",
x"3536",
x"3536",
x"3536",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"238e",
x"238e",
x"238e",
x"238e",
x"187d",
x"187d",
x"187d",
x"187d",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c"
);

constant si_23:rom:=( x"0000",

x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"3fff",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41"
);

constant si_31:rom:=(x"0000",
x"04b5",
x"0964",
x"0e05",
x"1294",
x"1708",
x"1b5d",
x"1f8b",
x"238e",
x"275f",
x"2afa",
x"2e5a",
x"3179",
x"3453",
x"36e5",
x"392a",
x"3b20",
x"3cc5",
x"3e14",
x"3f0e",
x"3fb1",
x"3ffb",
x"3fec",
x"3f84",
x"3ec5",
x"3dae",
x"3c42",
x"3a82",
x"3871",
x"3612",
x"3367",
x"3076",
x"2d41",
x"29cd",
x"261f",
x"223d",
x"1e2b",
x"19ef",
x"158f",
x"1111",
x"0c7c",
x"07d5",
x"0323",
x"fe6e",
x"f9bb",
x"f50f",
x"f074",
x"ebed",
x"e783",
x"e33a",
x"df19",
x"db26",
x"d767",
x"d3df",
x"d095",
x"cd8c",
x"caca",
x"c851",
x"c626",
x"c44a",
x"c2c2",
x"c18f",
x"c0b2",
x"c02d"
);

constant si_32:rom:=(x"0000",
x"0000",
x"0000",
x"0000",
x"1294",
x"1294",
x"1294",
x"1294",
x"238e",
x"238e",
x"238e",
x"238e",
x"3179",
x"3179",
x"3179",
x"3179",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3fb1",
x"3fb1",
x"3fb1",
x"3fb1",
x"3ec5",
x"3ec5",
x"3ec5",
x"3ec5",
x"3871",
x"3871",
x"3871",
x"3871",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"1e2b",
x"1e2b",
x"1e2b",
x"1e2b",
x"0c7c",
x"0c7c",
x"0c7c",
x"0c7c",
x"f9bb",
x"f9bb",
x"f9bb",
x"f9bb",
x"e783",
x"e783",
x"e783",
x"e783",
x"d767",
x"d767",
x"d767",
x"d767",
x"caca",
x"caca",
x"caca",
x"caca",
x"c2c2",
x"c2c2",
x"c2c2",
x"c2c2"
);

constant si_33:rom:=(x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"0000",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"3b20",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"2d41",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783",
x"e783"
);
signal wr_en_ram3 : std_logic;
signal wr_en_ram4 : std_logic;
signal wr_en_ram5 : std_logic;

begin

------ radix4_butterfly_rx module instantiation

Inst_radix4_butterfly : radix4_butterfly_rx PORT MAP(
clk => clk,
reset => reset,
ri0 => ri0_s1,
ri1 => ri1_s1,
ri2 => ri2_s1,
ri3 => ri3_s1,
ii0 => ii0_s1,
ii1 => ii1_s1,
ii2 => ii2_s1,
ii3 => ii3_s1,
co1 => Co_1_1,
co2 => Co_2_1,
co3 => Co_3_1,
si1 => Si_1_1,
si2 => Si_2_1,
si3 => Si_3_1,
ro0 => Real_out1,
ro1 => Real_out2,
ro2 => Real_out3,
ro3 => Real_out4,
io0 => Imag_out1,
io1 => Imag_out2,
io2 => Imag_out3,
io3 => Imag_out4

);
process(clk,reset)
begin
if reset = '1' then
ri0_s1 <= (others => '0');
ri1_s1 <= (others => '0');
ri2_s1 <= (others => '0');
ri3_s1 <= (others => '0');
ii0_s1 <= (others => '0');
ii1_s1 <= (others => '0');
ii2_s1 <= (others => '0');
ii3_s1 <= (others => '0');
Co_1_1 <= (others => '0');
Co_2_1 <= (others => '0');
Co_3_1 <= (others => '0');
Si_1_1 <= (others => '0');
Si_2_1 <= (others => '0');
Si_3_1 <= (others => '0');
elsif clk = '1' and clk'event then
ri0_s1 <= ri0_s;
ri1_s1 <= ri1_s;
ri2_s1 <= ri2_s;
ri3_s1 <= ri3_s;
ii0_s1 <= ii0_s;
ii1_s1 <= ii1_s;
ii2_s1 <= ii2_s;
ii3_s1 <= ii3_s;
Co_1_1 <= Co1;
Co_2_1 <= Co2;
Co_3_1 <= Co3;
Si_1_1 <= So1;
Si_2_1 <= So2;
Si_3_1 <= So3;
end if;
end process;

Data1 <= Real_out1 & Imag_out1;

Data2 <= Real_out2 & Imag_out2;
Data3 <= Real_out3 & Imag_out3;
Data4 <= Real_out4 & Imag_out4;

Process(Clk,Reset)
begin
if Reset = '1' then
wr_add_1st <= (others => '0');
elsif Clk'event and Clk = '1' then
if start = '1' then
wr_add_1st <= ( others => '0');
elsif valid_in = '1' then
wr_add_1st <= wr_add_1st + '1';
end if;
end if;
end process;

Wr_en1 <= '1' when wr_add_1st >= 0 and wr_add_1st <= 63 else
'0';
Wr_en2 <= '1' when wr_add_1st >= 64 and wr_add_1st <= 127 else
'0';

Wr_en3 <= '1' when wr_add_1st >= 128 and wr_add_1st <= 191 else
'0';

Inst_ram_stage10 : ram_stage3_rx PORT MAP(

clk => Clk,
wr => Wr_en1,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out01_s
);

Inst_ram_stage11 : ram_stage3_rx PORT MAP(

clk => Clk,
wr => Wr_en2,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out02_s
);

Inst_ram_stage12 : ram_stage3_rx PORT MAP(

clk => Clk,
wr => Wr_en3,
wadd => wr_add_s,
radd => rd_add_s,
data_in => Data_in_s,
data_out => Data_out03_s
);

Data_in_s <= Real_in & Imag_in;

wr_add_s <= wr_add_1st(5 downto 0);

Real_1st_in_0 <= Data_out01_s (31 downto 16);
Imag_1st_in_0 <= Data_out01_s (15 downto 0);

Real_1st_in_1 <= Data_out02_s (31 downto 16);

Imag_1st_in_1 <= Data_out02_s (15 downto 0);

Real_1st_in_2 <= Data_out03_s (31 downto 16);

Imag_1st_in_2 <= Data_out03_s (15 downto 0);

Real_1st_in_3 <= Real_in;

Imag_1st_in_3 <= Imag_in;

Process(Clk,reset)
begin
if Reset = '1' then
ps1 <= Rst;
elsif Clk'event and Clk = '1' then
ps1 <= ns1;
end if;
end process;

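-- Stage-1 read FSM, next-state logic: leave Rst when the input write
-- address reaches 189, then stay in s0 until the read address reaches 63.
process(ps1,wr_add_1st,Rd_add_ss)
begin
    case ps1 is
        when Rst =>
            if wr_add_1st = 189 then -- 253
                ns1 <= s0;
            else
                ns1 <= Rst;
            end if;

        when s0 =>
            if Rd_add_ss = 63 then
                ns1 <= Rst;
            else
                ns1 <= s0;
            end if;

        when others =>
            ns1 <= Rst;
    end case;
end process;

-- Stage-1 read FSM, output decode: the read counter runs only in s0.
process(ps1,wr_add_1st)
begin
    case ps1 is
        when Rst    => enable_rd_count1 <= '0';
        when s0     => enable_rd_count1 <= '1';
        when others => enable_rd_count1 <= '0';
    end case;
end process;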

process(Clk,Reset,enable_rd_count1,wr_add_1st,start)
begin
if Reset = '1' then ----- add
Rd_add_ss <= ( others => '1');
elsif Clk'event and Clk = '1' then
if Rd_add_ss = 64 then
Rd_add_ss <= ( others => '1');
elsif enable_rd_count1 = '1' then
Rd_add_ss <= Rd_add_ss + '1';
end if;
end if;
end process;

Rd_add_s <= Rd_add_ss(5 downto 0);

process(Clk,reset,Rd_add_s,Rd_add_ss2,Rd_add_s2,wr_add_ss)
begin
if reset = '1' then
Rd_add_s_d <= ( others => '1');
-- Rd_add_ss2_d <= ( others => '1');
Rd_add_s2_d <= ( others => '1');
wr_add_ss_d <= ( others => '1');
elsif Clk'event and Clk = '1' then
Rd_add_s_d <= Rd_add_ss;
-- Rd_add_ss2_d <= '1' & Rd_add_s2;
Rd_add_s2_d <= Rd_add_s2;
wr_add_ss_d <= wr_add_ss;
end if;
end process;

-- ******************************************************************************

Process(Clk,reset)
begin
if Reset = '1' then
ps2 <= Rst;
elsif Clk'event and Clk = '1' then
ps2 <= ns2;
end if;
end process;

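-- Write FSM, next-state logic: leave Rst when the input write address
-- reaches 189; in s0, return to Rst when wr_add_ss reaches 254 unless
-- wr_add_1st is 189 again at that moment.
process(ps2,wr_add_1st,wr_add_ss)
begin
    case ps2 is
        when Rst =>
            if wr_add_1st = 189 then -- 253
                ns2 <= s0;
            else
                ns2 <= Rst;
            end if;

        when s0 =>
            if wr_add_ss = 254 and wr_add_1st = 189 then
                ns2 <= s0;
            elsif wr_add_ss = 254 then
                ns2 <= rst;
            else
                ns2 <= s0;
            end if;

        when others =>
            ns2 <= Rst;
    end case;
end process;

-- Write FSM, output decode: the write counter runs only in s0.
process(ps2,wr_add_1st)
begin
    case ps2 is
        when Rst    => enable_wr_count1 <= '0';
        when s0     => enable_wr_count1 <= '1';
        when others => enable_wr_count1 <= '0';
    end case;
end process;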
process(Clk,Reset,enable_wr_count1,start,wr_add_ss)
begin
if Reset = '1' then ----- add
wr_add_ss <= ( others => '1');
elsif Clk'event and Clk = '1' then
-- if wr_add_ss = 255 then
-- wr_add_ss <= ( others => '1');
if enable_wr_count1 = '1' then
if wr_add_ss = 255 then
wr_add_ss <= ( others => '0');
else
wr_add_ss <= wr_add_ss + '1';
end if;
end if;
end if;
end process;

wr_add_ss2 <= wr_add_ss( 7 downto 0);

process(Clk,reset,wr_add_ss,enable_wr_count1)
begin
if reset = '1' then
wr_add_sss <= ( others => '1');
enable_wr_count1_d1 <= '0';
enable_wr_count1_d2 <= '0';
wr_en_ram2 <= '0';
wr_en_ram3 <= '0';
wr_en_ram4 <='0';
wr_en_ram5 <='0';

elsif Clk'event and Clk = '1' then

wr_en_ram2 <= enable_wr_count1_d2;
wr_en_ram3 <= wr_en_ram2;
wr_en_ram4 <= wr_en_ram3;
wr_en_ram5 <= wr_en_ram4;

wr_add_sss <= wr_add_ss;

enable_wr_count1_d1 <= enable_wr_count1;
enable_wr_count1_d2 <= enable_wr_count1_d1;
end if;
end process;

-- wr_en_ram2 <= '1' when wr_add_sss >= 0 and wr_add_sss <= 255 else
-- '0';

Inst_ram_stage20: ram_stage1_rx PORT MAP(

clk => Clk,
reset => reset,
wr => wr_en_ram5,
Mux_sel => Mux_sel_s,
wadd => wr_add_ss2,
radd => Rd_add_ss2,
data_in1 => Data1,
data_in2 => Data2,
data_in3 => Data3,
data_in4 => Data4,
data_out1 => Data_out11_s,
data_out2 => Data_out12_s,
data_out3 => Data_out13_s,
data_out4 => Data_out14_s,
data_out_final_real => Real_out,
data_out_final_Imag => Imag_out,
Valid_out => Valid_out
);

Real_2st_in_0 <= Data_out11_s (31 downto 16);

Imag_2st_in_0 <= Data_out11_s (15 downto 0);

Real_2st_in_1 <= Data_out12_s (31 downto 16);

Imag_2st_in_1 <= Data_out12_s (15 downto 0);

Real_2st_in_2 <= Data_out13_s (31 downto 16);

Imag_2st_in_2 <= Data_out13_s (15 downto 0);

Real_2st_in_3 <= Data_out14_s (31 downto 16);

Imag_2st_in_3 <= Data_out14_s (15 downto 0);

Process(Clk,reset)
begin
if Reset = '1' then
ps3 <= Rst;
elsif Clk'event and Clk = '1' then
ps3 <= ns3;
end if;
end process;

-- Second read FSM, next-state logic: s0 to s3 form a four-state cycle that
-- starts when Rd_add_ss reaches 62 or 63.
process(ps3,Rd_add_s2,Rd_add_ss)
begin
    case ps3 is
        when Rst =>
            if Rd_add_ss = 62 or Rd_add_ss = 63 then
                -- 255 or Rd_add_ss = 63
                ns3 <= s0;
            else
                ns3 <= Rst;
            end if;

        when s0 => ns3 <= s1;
        when s1 => ns3 <= s2;
        when s2 => ns3 <= s3;

        when s3 =>
            if Rd_add_ss = 62 then
                ns3 <= s0;
            elsif Rd_add_s2 = 63 then -- 255
                ns3 <= Rst;
            else
                ns3 <= s0;
            end if;

        when others =>
            ns3 <= Rst;
    end case;
end process;

-- Second read FSM, output decode: the read counter is enabled only in s0,
-- and mux_sel_s steps through "00", "01", "10", "11" across s0 to s3.
process(ps3)
begin
    case ps3 is
        when Rst =>
            enable_rd_count2 <= '0';
            mux_sel_s <= "00";

        when s0 =>
            enable_rd_count2 <= '1';
            mux_sel_s <= "00";

        when s1 =>
            enable_rd_count2 <= '0';
            mux_sel_s <= "01";

        when s2 =>
            enable_rd_count2 <= '0';
            mux_sel_s <= "10";

        when s3 =>
            enable_rd_count2 <= '0';
            mux_sel_s <= "11";

        when others =>
            enable_rd_count2 <= '0';
            mux_sel_s <= "00";
    end case;
end process;

process(Clk,Reset,enable_rd_count2,Rd_add_s2,start)
begin
if Reset = '1' then --- add
Rd_add_s2 <= ( others => '1');
elsif Clk'event and Clk = '1' then
if enable_rd_count2 = '1' then
if Rd_add_s2 = 63 then
Rd_add_s2 <= ( others => '0');
else
Rd_add_s2 <= Rd_add_s2 + '1';
end if;
end if;
end if;
end process;

Rd_add_ss2 <= Rd_add_s2(5 downto 0);

-- ******************************************************************************

-- ******************************************************************************
-- ******** logic for final stage read ******************************************
-- ******************************************************************************

ri0_s <= Real_1st_in_0 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_0 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ii0_s <= Imag_1st_in_0 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_0 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ri1_s <= Real_1st_in_1 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_1 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ii1_s <= Imag_1st_in_1 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_1 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ri2_s <= Real_1st_in_2 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_2 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ii2_s <= Imag_1st_in_2 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_2 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ri3_s <= Real_1st_in_3 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Real_2st_in_3 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

ii3_s <= Imag_1st_in_3 when Rd_add_s_d >= 0 and Rd_add_s_d <= 63 else
Imag_2st_in_3 when rd_add_s2_d >= 0 and rd_add_s2_d <= 192 else
( others => '0');

Co1 <= co_11(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

Co2 <= co_21(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

Co3 <= co_31(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

So1 <= si_11(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

So2 <= si_21(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

So3 <= si_31(conv_integer(wr_add_ss_d(5 downto 0))) when wr_add_ss_d >= 0 and

end Behavioral;
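
The cosine and sine tables in the listing above appear to be samples of cos(2*pi*m*n/256) and sin(2*pi*m*n/256) stored as 16-bit two's-complement words in which x"4000" stands for 1.0. The co_x1/si_x1 tables change at every address, the x2 tables repeat each value 4 times, and the x3 tables repeat each value 16 times. Under that reading, the tables could be generated at elaboration time instead of being typed by hand. The following is a minimal sketch only: the package name twiddle_gen_pkg and the functions gen_cos, gen_sin, to_q14 and angle_of are hypothetical, the rom type is assumed to be 64 words of 16 bits, and truncation toward zero is assumed because entries such as x"0323" (about 803.9 before quantisation) look truncated rather than rounded. Individual entries may still differ from the listing by one LSB, for instance where the listing stores x"3fff".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

package twiddle_gen_pkg is
  -- Assumed to match the rom type used by the listing: 64 words of 16 bits.
  type rom is array (0 to 63) of std_logic_vector(15 downto 0);

  -- m    : twiddle multiple (1, 2 or 3)
  -- hold : number of consecutive addresses sharing one value (1, 4 or 16)
  function gen_cos(m : positive; hold : positive) return rom;
  function gen_sin(m : positive; hold : positive) return rom;
end package twiddle_gen_pkg;

package body twiddle_gen_pkg is

  -- Convert a real in [-1.0, 1.0] to a 16-bit word with 14 fractional bits
  -- (1.0 -> x"4000"), truncating toward zero.
  function to_q14(x : real) return std_logic_vector is
  begin
    return std_logic_vector(to_signed(integer(trunc(x * 16384.0)), 16));
  end function to_q14;

  -- Angle for address k: 2*pi * m * hold * floor(k / hold) / 256.
  function angle_of(m : positive; hold : positive; k : natural) return real is
  begin
    return MATH_2_PI * real(m * hold * (k / hold)) / 256.0;
  end function angle_of;

  function gen_cos(m : positive; hold : positive) return rom is
    variable r : rom;
  begin
    for k in rom'range loop
      r(k) := to_q14(cos(angle_of(m, hold, k)));
    end loop;
    return r;
  end function gen_cos;

  function gen_sin(m : positive; hold : positive) return rom is
    variable r : rom;
  begin
    for k in rom'range loop
      r(k) := to_q14(sin(angle_of(m, hold, k)));
    end loop;
    return r;
  end function gen_sin;

end package body twiddle_gen_pkg;
```

With such a package, a declaration like constant co_32 : rom := gen_cos(3, 4); would stand in for the corresponding literal table. Whether a given synthesis tool evaluates ieee.math_real calls in constant initialisers should be checked before relying on this approach.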

VII. Conclusion

In this work, the number of multiplications is used as the key metric for comparing FFT
performance, since it has a large impact on the throughput and power consumption of an FFT
processor. The 256-point FFT/IFFT architecture proposed in this paper reduces multiplicative
complexity by combining a pipelined method with a complex-multiplication reduction approach.
The simulation results show that the proposed architecture significantly reduces the number of
operations inside the processor compared with other efficient FFT processors. The proposed
processor can be integrated with other components and used as a standalone processor for
OFDM-based wireless broadband communication.
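
The listing does not show the inside of radix4_butterfly_rx, so the exact reduction used there is not known from this text. One well-known way to reduce the multiplier count is to form a complex product with three real multiplications instead of four. The sketch below illustrates that general identity and is not the authors' butterfly. The entity name cmul3 and its port names are hypothetical, and the twiddle inputs are assumed to use the same format as the tables (x"4000" = 1.0).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Computes (xr + j*xi) * (co + j*si) with three multipliers:
--   k1 = co * (xr + xi)
--   k2 = xr * (si - co)
--   k3 = xi * (co + si)
--   real part = k1 - k3 = xr*co - xi*si
--   imag part = k1 + k2 = xi*co + xr*si
entity cmul3 is
  port (
    xr, xi : in  signed(15 downto 0);  -- data sample, real and imaginary parts
    co, si : in  signed(15 downto 0);  -- twiddle cosine and sine, 14 fractional bits
    yr, yi : out signed(15 downto 0)   -- product, rescaled to the data format
  );
end entity cmul3;

architecture rtl of cmul3 is
  -- One extra bit so the pre-additions cannot overflow.
  signal sum_x : signed(16 downto 0);
  signal dif_w : signed(16 downto 0);
  signal sum_w : signed(16 downto 0);
  -- 17-bit by 16-bit products are 33 bits wide in numeric_std.
  signal k1, k2, k3 : signed(32 downto 0);
  signal re_full, im_full : signed(33 downto 0);
begin
  sum_x <= resize(xr, 17) + resize(xi, 17);
  dif_w <= resize(si, 17) - resize(co, 17);
  sum_w <= resize(co, 17) + resize(si, 17);

  k1 <= sum_x * co;
  k2 <= dif_w * xr;
  k3 <= sum_w * xi;

  re_full <= resize(k1, 34) - resize(k3, 34);
  im_full <= resize(k1, 34) + resize(k2, 34);

  -- Drop the 14 fractional bits; the upper bits are discarded without
  -- saturation, so large results wrap around.
  yr <= re_full(29 downto 14);
  yi <= im_full(29 downto 14);
end architecture rtl;
```

The three-multiplier form trades one multiplier for three extra adders. Because dif_w and sum_w depend only on the twiddle factor, they could also be precomputed and stored in ROM, at the cost of one additional table per twiddle family.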
VIII. References

1. L. Weidong, “Studies on implementation of lower power FFT

processors,” Linkö ping Studies in Science and Technology, Linkö ping,

Sweden, 2003.

2. A. Saeed, M. Elbably, G. Abdelfadeel, and M. I. Eladwy, “Efficient

FPGA implementation of FFT/IFFT processor,” Intern. Journal of

Circuit, Systems and Signal Processing, vol. 3, pp.103-110, 2009.

3. K. Maharatna, E. Grass, and U. Jagdhold, “A Lower-Power 64-point

FFT/IFFT architecture for wireless broadband communication,”

presented at 7

th Inter. Conf. on Mobile Multimedia Communication,

Tokyo, 2000.

4. W. Li and L. Wanhammar, “Complex multiplication reduction in FFT

processor,” presented at Swedish System-on-Chip Conference,

Falkenberg, Sweden, March, 2002.

5. U. M. Baese, Digital Signal Processing with Field Programmable

Gate Arrays, 3rd Ed. Springer, 2007.

6. W. H. Chang and T. Nguyen, “An OFDM-specified lossless FFT

architecture,” IEEE Transactions on Circuits and Systems, vol. 53,

issue 6, pp. 1235-1243, 2006.

7. M. Petrov and M. Glesner, “Optimal FFT architecture selection for

OFDM receivers on FPGA,” in Proc. IEEE Intern. Conf. on Field

Programmable Technology, Singapore, 2005, pp. 313-314.

8. C. Sahnine, “Architecture of reconfigurable integrated circuit, very

high speed and low consumption for digital processing of the advanced

OFDM,” Ph.D. dissertation, Polytechnic Institute of Grenoble, France,2009.

9. B. Wang, Q. Zhang, T. Ao, and M. Huang, “Design of pipelined FFT processor based
on FPGA,” in Proc. 2nd Intern. Conf. on Computer Modeling and Simulation, Hainan,
2010, pp. 432-435.
10. T. Widhe, “Efficient implementation of FFT processing elements,” Ph.D. dissertation,
Linkö ping Studies in Science and Technology, Linkö ping University, Sweden, 1997.
11. Linkö ping University, Sweden, 1997. [11] M. Arioua, S. Belkouch, M. M. Hassani, and
M. Agdad, “Complex multiplication reduction in pipeline FFT architecture,” in Proc.
of 20th International Conference on Computer Theory and Applications, Alexandria,
Egypt, 2010.
12. M. Kannan and S. K. Srivasta, “Low power hardware implementation of high speed
FFT core,” Journal of Computer Science, vol. 3, issue 6, pp. 376-382, 2007.
13. S. Simard, J. G. Mailloux, and R. Beguenane, “Optimized FPGA mapping of a bit-serial
square root operator with minimum output delay,” International Review on
Computers and Software, vol. 2, issue 6, pp. 661-665, 2007.
14. M. A. Jaber, D. Massicotte, and Y. Achouri, “A higher radix FFT FPGA implementation
suitable for OFDM systems,” in Proc. The 18th IEEE International Conference on
Electronics Circuits and Systems, Beirut, Dec. 2011, pp. 744-747.
15. S. R. Talebiyan and S. Hosseini-Khayat, “Delay analysis of pipeline FFT processors,”
International Review on Computers and Software, vol. 4, issue 3, pp. 422–425,
2009.
16. M. Arioua, S. Belkouch, M. M. Hassani, and M. Agdad, “VHDL implementation of an
optimized 8-points FFT/IFFT processor in pipeline architecture for OFDM systems,”
in Proc. IEEE Intern. Conf. on Multimedia Computing and Systems, Ouarzazate,
Morocco, 2011.
17. Z. Dong, Y. M. Zhang, Z. P. Huang, G. L. Tang, and C. W. Liu, “Simulation and
application of FFT based pipelined stream,” in Proc. International Conference on
Information Engineering and Computer Science, Wuhan, China, 2009, pp. 1-4.
18. Y. Jung, H. Yoon, and J. Kim, “New efficient FFT algorithm and pipeline
implementation results for OFDM/DMT applications,” IEEE Transactions on
Consumer Electronics, vol. 49, no.1, pp.14-20, 2003.
