Altera Crest Factor Reduction AN475
OFDMA Systems
Introduction
Crest factor reduction (CFR) is a technique for reducing the peak-to-
average ratio (PAR) of an orthogonal frequency division multiplexing
(OFDM) waveform. An OFDM signal is constructed in the frequency
domain from a set of orthogonal carriers, each modulated by a
constellation symbol. The main disadvantage of OFDM modulation is
that the time domain representation approximates a Gaussian
distribution and therefore exhibits large envelope variations. Power
amplifiers usually have only a limited linear region, making it necessary
to back off the amplifier so that the transmit signal never drives the
amplifier into the non-linear region, where it would cause spectral
regrowth. This back-off reduces the power efficiency of the amplifier.
By reducing the PAR of the input signal, you can achieve greater power
efficiency from the amplifier or use a less expensive power amplifier with
a more limited linear region. The CFR module reduces the PAR of the
signal and ultimately reduces the overall basestation cost or power
requirements.
Altera Corporation 1
AN-475-1.0 Preliminary
Crest Factor Reduction for OFDMA Systems
CFR Algorithm
There are many different techniques to achieve CFR. This document
implements a technique that is well suited to wireless systems that use
OFDM- or OFDMA-based modulation such as WiMAX and 3G LTE. The
algorithm used is based on a modified version of the algorithm in
Constrained Clipping for Crest Factor Reduction in Multiple-user OFDM [1].
The algorithm offers the following key advantages:
[Figure: CFR algorithm block diagram. Recoverable block labels: Reference, In/Outband Processor, IFFT (L × N points), CP Insertion]
The system accepts data at the input as OFDM(A) symbols (of length N
carriers) in the frequency domain. The first process is to upconvert the
data by a factor L = 4 using perfect frequency domain interpolation (zero
padding). The technique involves performing CFR at a higher sampling
frequency because there is peak growth associated with upconversion. In
addition, a higher sampling frequency spreads the non-linear distortions
introduced by subsequent blocks across a larger bandwidth.
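The frequency domain interpolation described above can be illustrated with a minimal Python sketch. It assumes the input spectrum is in natural (0 to N – 1) order, so the zeros are placed in the middle of the frame. The function names `idft` and `zero_pad_spectrum` are hypothetical, and the naive inverse DFT only stands in for an IFFT; this is not the reference design's implementation.

```python
import cmath

def idft(spectrum):
    """Naive inverse DFT, O(n^2); adequate for a small illustration."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def zero_pad_spectrum(spectrum, factor):
    """Interpolate by `factor` by inserting zeros in the middle of the frame.

    Assumes natural ordering: bins 0..N/2-1 stay at the bottom of the
    enlarged frame and bins N/2..N-1 move to the top.
    """
    n = len(spectrum)
    half = n // 2
    zeros = [0j] * ((factor - 1) * n)
    return spectrum[:half] + zeros + spectrum[half:]
```

With this layout, every L-th sample of the interpolated waveform equals the original time domain sample divided by L, because the longer inverse transform normalizes by L × N rather than N.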
Figure 2 shows the operation of the polar clipping block. All samples
that lie on or inside the circle of radius AMAX pass through the block
unchanged, and those outside the circle are scaled back onto it with
their phase preserved. Optimal operation of the CFR module requires an
appropriate value of AMAX.
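[Figure 2: Polar clipping. An input sample outside the circle of radius AMAX is scaled back to an output sample on the circle.]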
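As a rough illustration of the polar clipping rule, the following Python sketch limits the magnitude of a complex sample to AMAX while keeping its phase. The function name `polar_clip` is hypothetical and the sketch uses floating point arithmetic rather than the fixed point hardware described later.

```python
import cmath

def polar_clip(sample, amax):
    """Return the sample unchanged if |sample| <= amax, else scale it onto the circle."""
    if abs(sample) <= amax:
        return sample
    # Keep the phase, replace the magnitude with amax.
    return cmath.rect(amax, cmath.phase(sample))
```

For example, with AMAX = 1 the sample 3 + 4j (magnitude 5) is scaled back to 0.6 + 0.8j, while 0.3 + 0.4j passes through unchanged.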
The purpose of the inband processor is to ensure that the overall EVM
does not exceed a specified limit. The EVM is defined as the square root
of the mean error power divided by the square of the maximum
constellation magnitude (SMAX).
(1) EVM = \sqrt{\frac{\frac{1}{N} \sum_{k=0}^{N-1} |E_k|^2}{(S_{MAX})^2}}
Highest Order Modulation Scheme    SMAX
QPSK                               1
16QAM                              \sqrt{18/10}
64QAM                              \sqrt{98/42}
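[Figure 3: 64QAM constellation, with the I and Q axes marked at odd integers from -7 to 7, showing SMAX as the magnitude of the outermost constellation point]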
(2) S_{MAX} = \sqrt{\left(\frac{7}{\sqrt{42}}\right)^2 + \left(\frac{7}{\sqrt{42}}\right)^2} = \sqrt{\frac{98}{42}}
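The SMAX values and the EVM definition can be checked numerically with a short Python sketch. It assumes square QAM constellations on an odd-integer grid that are normalized to unit average power, which reproduces the 1, 18/10, and 98/42 figures; the function names are hypothetical.

```python
import math

def smax_square_qam(bits_per_symbol):
    """Corner-point magnitude of a unit-average-power square QAM constellation."""
    levels = 2 ** (bits_per_symbol // 2)      # amplitude levels per axis
    corner = levels - 1                       # largest odd-integer coordinate
    mean_power = 2 * (levels ** 2 - 1) / 3    # average power of the integer grid
    return math.sqrt(2 * corner ** 2 / mean_power)

def evm(errors, smax):
    """Square root of the mean error power divided by SMAX squared."""
    mean_error_power = sum(abs(e) ** 2 for e in errors) / len(errors)
    return math.sqrt(mean_error_power / smax ** 2)
```

Calling `smax_square_qam` with 2, 4, and 6 bits per symbol gives the QPSK, 16QAM, and 64QAM values respectively.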
If the power of the calculated error (Ek) for a sample is less than the
square of the product of the specified EVM threshold and SMAX, simply
output the clipped sample. If the error power for a sample is greater than
or equal to the square of the product of the specified EVM threshold and
SMAX, output the reference signal plus a small error signal. This error
signal has a magnitude of EVM threshold multiplied by SMAX and a phase
equal to the phase of the original error, namely:
(3) EVM_{threshold} \times S_{MAX} \times e^{j \angle E_k}
Figure 4 shows an example in which the magnitude of the error between the
input sample and the reference sample is greater than the specified
threshold, together with the resulting output sample.
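[Figure 4: Inband processing example showing the input, reference, and output samples, with the output at a distance of EVMthresh × SMAX from the reference]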
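The per-sample inband rule can be sketched in Python as follows. This is an illustrative floating point model under the assumption that the comparison is made on the error magnitude (equivalent to comparing powers); `constrain_inband` is a hypothetical name, not a function of the reference design.

```python
import cmath

def constrain_inband(clipped, reference, evm_threshold, smax):
    """Limit the error between a clipped sample and its clean reference."""
    error = clipped - reference
    limit = evm_threshold * smax
    if abs(error) < limit:
        return clipped                      # error is tolerable: pass through
    # Keep the error phase, but reduce its magnitude to the limit.
    return reference + cmath.rect(limit, cmath.phase(error))
```

For a reference of 1 + 1j, a threshold of 0.02, and SMAX = 1, an input of 1.3 + 1.4j (error magnitude 0.5) is pulled back to 1.012 + 1.016j.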
Hardware Architecture
This section summarizes the key processing blocks in the CFR design.
Zeropad
The zeropad block performs frequency domain interpolation. Figure 5
shows that the interpolation is an operation that pads the input FFT
frame with zeros.
After N/2 cycles, zeros are passed to the output of the zero pad block
while the incoming data continues to write to the memory. A sequence
generator determines when sufficient zeros have been passed to the
output and configures a multiplexer to redirect samples from the memory
to the output. Figure 6 summarizes these actions.
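[Figure 6: Zeropad block architecture. Recoverable block labels: write address generator, read address generator, memory, control, and a source selector that chooses between the memory output and zero]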
For details on the Altera FFT MegaCore function, refer to the FFT
MegaCore Function User Guide.
With CFR, restricting the natural bitgrowth allows for reduction in the
resource requirements. Mathematical analysis of the architecture shows
that the worst case bitgrowth for the algorithm is 2.5 bits per stage. For
example, for a 4K FFT with 6 stages, a 16 bit input results in 31 bits at the
output. Because the distribution of the input data and resulting output
data is known, the algorithm can determine the actual worst case
bitgrowth for OFDMA stimulus. When the internal bitgrowth has
reached this ceiling, you can restrict any further word length extension
through the arithmetic elements. The result is a savings in overall logic
and increased frequency of operation.
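The bitgrowth arithmetic above can be reproduced with a small Python sketch. The 2.5 bits per stage figure comes from the text; the ceiling value passed to `capped_output_width` is a purely hypothetical number used to show the effect of restricting word length, and both function names are illustrative.

```python
import math

def fft_output_width(input_bits, stages, growth_per_stage=2.5):
    """Worst-case output word length with unrestricted bitgrowth."""
    return input_bits + math.ceil(stages * growth_per_stage)

def capped_output_width(input_bits, stages, ceiling_bits, growth_per_stage=2.5):
    """Output word length when growth stops at a known ceiling (assumed value)."""
    return min(fft_output_width(input_bits, stages, growth_per_stage), ceiling_bits)
```

`fft_output_width(16, 6)` returns 31, matching the 4K FFT example, whereas a hypothetical ceiling of 24 bits would save 7 bits of data path width in the later stages.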
The third advantage is that the size of the twiddle memory look up tables
is reduced by using symmetry and redundancy in the reference signal.
However, there is a trade off between memory and logic when the size of
the twiddle table is small; sometimes it is more efficient to store all values
rather than use redundancy because of the quantization of the memory in
the device.
Polar Clipping
The polar clipping block is simple to implement in hardware, as shown in
Figure 8. This block takes advantage of the coordinate rotation digital
computer (CORDIC) algorithm [3][4]. CORDIC is an iterative algorithm
that performs complex and trigonometric operations. For example, in the
CFR design the CORDIC algorithm converts Cartesian coordinates to
polar coordinates and vice versa. The CORDIC algorithm works well in
FPGA implementations because it can be fully unrolled.
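[Figure 8: Polar clipping architecture. One CORDIC converts the real and imaginary inputs to magnitude R and phase θ, and a second CORDIC converts R and θ back to real and imaginary outputs.]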
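The Cartesian-to-polar conversion can be sketched with a vectoring-mode CORDIC in Python. This is a floating point illustration only: a hardware version would presumably use fixed point shifts and adds with a precomputed arctangent table and gain constant. The function name `cordic_to_polar` and the iteration count are assumptions.

```python
import math

def cordic_to_polar(x, y, iterations=16):
    """Vectoring-mode CORDIC: rotate (x, y) onto the positive x axis.

    Returns (magnitude, angle in radians).
    """
    angle = 0.0
    if x < 0:
        # Pre-rotate by 180 degrees so the iterations converge.
        angle = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    gain = 1.0
    for i in range(iterations):
        shift = 2.0 ** -i
        direction = -1.0 if y > 0 else 1.0          # rotate towards y = 0
        x, y = x - direction * y * shift, y + direction * x * shift
        angle -= direction * math.atan(shift)       # accumulate the undone rotation
        gain *= math.sqrt(1.0 + shift * shift)      # each micro-rotation scales the vector
    return x / gain, angle
```

Each iteration uses only a shift, an add, and a table lookup, which is why the loop can be fully unrolled into a pipeline. Sixteen iterations resolve the angle to roughly 2^-15 radians.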
Outband Processor
The outband processor is similar to the polar clipping block. Each
frequency sample in the out of band region is compared with the
magnitude of the spectral mask at that point. If the sample exceeds the
magnitude, the sample is reduced down to the threshold. Hence, the only
difference between the outband processor and the polar clipping block is
that the magnitude threshold is variable and implemented as a memory
that contains the spectral mask magnitude for each point. Figure 9 shows
the outband processor architecture.
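[Figure 9: Outband processor architecture. Two CORDIC blocks convert between real/imaginary and R/θ, and the magnitude R is compared (>) against the stored mask value.]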
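A minimal Python sketch of the outband rule follows. It assumes the mask is supplied as one linear magnitude per frequency bin, in the same order as the spectrum; `limit_to_mask` is a hypothetical name.

```python
import cmath

def limit_to_mask(spectrum, mask):
    """Reduce any frequency sample whose magnitude exceeds its mask value."""
    limited = []
    for sample, threshold in zip(spectrum, mask):
        if abs(sample) > threshold:
            # Same operation as polar clipping, but with a per-bin threshold.
            sample = cmath.rect(threshold, cmath.phase(sample))
        limited.append(sample)
    return limited
```

The only difference from the polar clipping sketch is that the threshold is looked up per bin instead of being a single constant.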
Inband Processor
The inband processor calculates the magnitude of the error between the
clipped signal and the clean reference signal. If this error is greater than
the EVM threshold times SMAX, the clean signal plus the maximum
tolerable error magnitude is passed to the output. Otherwise, the signal is
passed on unchanged. Figure 10 shows the inband processor architecture.
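[Figure 10: Inband processor architecture. The real/imaginary error is converted by a CORDIC to R and θ, the magnitude is compared (>) against the threshold, and a second CORDIC converts back to real/imaginary.]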
Reference Block
The reference block is designed to synchronize the clean and clipped
signals so that the combined inband/outband processor works properly.
The latency associated with the clipped signal path passed to the
inband/outband processor is 2 × L × N clocks plus additional pipeline
delays. As a result, you need a FIFO that can store the clean reference
signal. In addition, this block generates the spectral mask that is used by
the inband/outband processor. Because the spectral mask is symmetrical
about L × N / 2, only half of the mask is stored and is automatically
reflected and repeated by the control logic.
Further Latency Reduction Techniques
To further reduce latency, increase the level of parallelism in the CFR
design. While rearchitecting the CFR design to process in parallel requires
more resources, the scalability, high density, and DSP and memory
capability of FPGAs make rearchitecting viable.
buffer. This penalty is the minimum time required before you can fully
utilize the read ports of the memory for the configuration where the
oversampling factor L = 4.
You can modify the FFT blocks to process two samples per clock cycle.
The latency of the current FFT cores is N – 1 clock cycles plus pipelining.
You can decompose the required transform into two smaller transforms
of length N/2 that are combined by using the decimate in time (DIT) and
decimate in frequency (DIF) techniques. These modifications result in a
total FFT processing latency of approximately N/2 + pipelining for an N
point transform.
For example, for the FFTs that have natural order addressing inputs, use
the decimate in time algorithm: pass the even samples to one core, and
the odd samples to another core. One transform is performed on the even
samples and one on the odd samples, before a final recombination stage is
used, as summarized by the following decimation in time FFT simplification:
(5) X(k) = \sum_{n=0}^{N/2-1} x[2n] W_N^{2nk} + \sum_{n=0}^{N/2-1} x[2n+1] W_N^{(2n+1)k}
(6) X(k) = \sum_{n=0}^{N/2-1} x[2n] W_N^{2nk} + W_N^{k} \sum_{n=0}^{N/2-1} x[2n+1] W_N^{2nk}
(7) X(k) = EvenFFT + \left( e^{-j 2 \pi k / N} \times OddFFT \right)
Figure 11 shows the parallel FFT architecture for natural order inputs.
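The decimation in time recombination can be verified numerically with a short Python sketch. A naive DFT stands in for each of the two half-length FFT cores, and the names `dft` and `dit_combine` are hypothetical.

```python
import cmath

def dft(x):
    """Naive forward DFT, O(n^2); stands in for an FFT core."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dit_combine(x):
    """N-point transform built from two N/2-point transforms (N even)."""
    n = len(x)
    half = n // 2
    even_fft = dft(x[0::2])     # transform of the even-indexed samples
    odd_fft = dft(x[1::2])      # transform of the odd-indexed samples
    result = []
    for k in range(n):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        # The half-length transforms are periodic in k with period N/2.
        result.append(even_fft[k % half] + twiddle * odd_fft[k % half])
    return result
```

The result matches a direct N-point transform, which is the property the two-core architecture relies on.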
The output addressing is still bit reversed. One output conveys samples
in the range 0 ... (N/2) – 1 (the lower samples) and the other output
conveys the samples in the range N/2 ... N – 1 (the upper samples). The
subsequent processing blocks must consider this bit-reversed addressing.
Figure 12 shows a similar decomposition for the bit-reversed input case.
In the bit-reversed input case, the decimate in frequency technique is more
appropriate to allow compatibility between cascaded FFTs.
Finally, at the end of the processing chain, the guard interval insertion
block needs to merge the two streams together and pass them off to the
correct antenna. This block requires an element of buffering and is the
only block that needs a complete redesign.
System Integration
At the output of the CFR, the symbols are in the time domain and have
been interpolated by a factor of four. The CFR output is at an intermediate
frequency (IF) sampling rate of 61.44 Msps, so it requires no further
upconversion. If an IF sampling frequency of 122.88 Msps is required, only
a single interpolate-by-two filter stage is required. Because the CFR module
processes multiple antennas in a time-multiplexed fashion, the signals
need to be demultiplexed so they can be mixed with the appropriate carrier
frequency. This demultiplexing requires external buffering to align the
three symbols that are all associated with the same time instant.
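[Figure: CFR output demultiplexing. The CFR feeds a demux and align block that produces the I/Q pairs I1/Q1, I2/Q2, and I3/Q3, which are mixed with an NCO.]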
(8) Antennas = \frac{f_{clk}}{f_{sbb} \times L}
[Figure: OFDMA basestation PHY block diagram. Recoverable block labels: MAC/PHY interface, interleaving and deinterleaving, subchannelization and desubchannelization, pilot insertion and pilot extraction, channel estimation and equalization, OFDMA ranging, OFDMA symbol-level processing (IFFT, FFT, cyclic prefix, remove cyclic prefix), and digital IF processing (DUC, DDC, CFR, DPD), between the MAC and the DAC/ADC]
■ Windows XP SP2
■ MATLAB version R2006B
■ MATLAB Signal Processing Toolbox/Blockset
■ MATLAB Fixed Point Toolbox/Blockset
■ Quartus® II version 7.2
For a copy of the reference design, please contact your Altera sales
representative.
docs     Contains all documentation for the reference design
matlab   Contains MATLAB fixed point model
rtl      Contains VHDL and Quartus project
Getting Started
To use the MATLAB fixed point model, set the paths so that MATLAB can
find all of the components by performing the following steps:
Table 4 and Table 5 show the CFR module input and output variables.
For an example of some test data and an example of the CFR module
running, open the script cfr_example.m and step through the code. You
can type help <function name> for command line help.
Parameter Override
After validating the arguments, the function populates the configuration
structure for each parameter not defined at the input. To see the
structure’s fields and default values, run cfr_configuration without any
input arguments or type help cfr_configuration at the command
line.
Name Description
Config.debug_mode When set to zero, disables extra messages output to the console.
When set to one, enables extra messages output to the console.
Config.plot_figures When set to zero, disables figure plotting capability for each input symbol.
When set to one, enables figure plotting capability for each input symbol.
Config.sample_order (1) When set to zero, the frequency domain input vector is in natural (0 to N – 1)
order.
When set to one, the frequency domain input vector is in DC centered (–N/2 to
N/2 – 1) order.
Config.sim_id String that identifies a simulation run
Note to Table 6:
(1) Internally, the algorithm operates in natural order. Any figures plotted adopt natural ordering regardless of
the ordering at the input.
Name Description
Config.N_FFT The FFT size associated with the basestation deployment. For WiMAX, this
is typically 128, 512, 1024 or 2048.
Config.oversample_factor The value of L. Altera recommends always using a value of four.
Config.evm_threshold The per symbol EVM that the output symbol must not exceed. For WiMAX
64QAM, the specification gives a total basestation budget of 3% EVM.
Typically, the CFR is allowed between 25% and 75% of this total budget, so
this value needs to be in the range 0.25 × 0.03 to 0.75 × 0.03.
Config.Amax The magnitude at which the resulting signal is clipped, after conversion to the
time domain. For constellation symbols that are normalized according to the
WiMAX/LTE specifications, a sensible value for this parameter is in the range
1.0/(Config.N_FFT × Config.oversample_factor)^(1/2) to
1.4/(Config.N_FFT × Config.oversample_factor)^(1/2).
Config.Smax The maximum magnitude of the highest order modulation scheme. For more
information, see Figure 3.
Config.spectral_mask_dB A vector of magnitude values that define the y vertices of the spectral mask.
Note that the inband region is defined by the region where the spectral mask
is equal to 0dB.
Config.spectral_mask_f A vector of frequency values that define the x-axis vertices of the spectral
mask. The two spectral mask vectors need to be the same length and the
frequency vector should be in ascending order. The configuration script
interpolates the two vectors to determine the allowed radiation at each
frequency bin.
Config.bandwidth The bandwidth associated with all N_FFT frequency bins, used by the
spectral mask generation process to normalize the result.
Config.guard_interval The guard interval associated with the OFDM system. Set to zero to disable,
or a fraction such as 1/32, 1/16, 1/8 or 1/4.
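The recommended ranges for Config.Amax and Config.evm_threshold in the table can be computed with a short Python sketch. The helper names are hypothetical and the 3% total EVM budget is the WiMAX 64QAM figure quoted in the table.

```python
import math

def amax_range(n_fft, oversample_factor=4):
    """Sensible clipping range: 1.0/sqrt(N*L) to 1.4/sqrt(N*L)."""
    scale = math.sqrt(n_fft * oversample_factor)
    return 1.0 / scale, 1.4 / scale

def evm_threshold_range(total_budget=0.03):
    """CFR share of the EVM budget: 25% to 75% of the total."""
    return 0.25 * total_budget, 0.75 * total_budget
```

For a 1024-point FFT with L = 4, the scale factor is 64, so the range for Config.Amax is 0.015625 to 0.021875.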
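[Figure: Example spectral mask, with levels of -25, -32, and -50 dB marked on the y-axis and frequency points A, B, C, and D (MHz, relative to f0) on the x-axis]
Point            A     B     C     D
Frequency (MHz)  4.75  5.45  9.75  14.75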
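As an illustration of how two mask vectors might be interpolated to a per-bin limit, the following Python sketch applies piecewise linear interpolation in dB. The pairing of levels to points (0 dB up to A, then -25, -32, and -50 dB at B, C, and D) is an assumption made for this example, as are the function names; the configuration script's actual method is not shown in this document.

```python
def mask_db_at(freq_mhz, mask_f, mask_db):
    """Piecewise linear mask level (dB) at a frequency offset; symmetric about f0."""
    f = abs(freq_mhz)
    if f <= mask_f[0]:
        return mask_db[0]
    for i in range(1, len(mask_f)):
        if f <= mask_f[i]:
            f0, f1 = mask_f[i - 1], mask_f[i]
            d0, d1 = mask_db[i - 1], mask_db[i]
            return d0 + (d1 - d0) * (f - f0) / (f1 - f0)
    return mask_db[-1]          # beyond the last vertex, hold the final level

def db_to_magnitude(level_db):
    """Convert a dB level to a linear magnitude ratio."""
    return 10.0 ** (level_db / 20.0)

# Assumed vertex pairing for the example mask (not confirmed by the source).
MASK_F = [0.0, 4.75, 5.45, 9.75, 14.75]
MASK_DB = [0.0, 0.0, -25.0, -32.0, -50.0]
```

Bins where the interpolated level is 0 dB form the inband region; all other bins would be limited to `db_to_magnitude` of the interpolated level.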
Name Description
Config.rtl_testbench_capture When set to zero, disables testbench capture feature.
When set to one, enables testbench capture feature.
For the following source and sink block interfaces, zero indicates floating
point mode and one indicates fixed point mode:
■ Config.cfr_sink_mode
■ Config.ifft1_sink_mode
■ Config.ifft1_source_mode
■ Config.polarclip_sink_mode
■ Config.polarclip_source_mode
■ Config.fft_sink_mode
■ Config.fft_source_mode
■ Config.inoutproc_sink_mode
■ Config.inoutproc_source_mode
■ Config.ifft2_sink_mode
■ Config.ifft2_source_mode
■ Config.cfr_source_mode
■ Config.cfr_sink_type
■ Config.ifft1_sink_type
■ Config.ifft1_source_type
■ Config.polarclip_sink_type
■ Config.polarclip_source_type
■ Config.fft_sink_type
■ Config.fft_source_type
■ Config.inoutproc_sink_type
■ Config.inoutproc_source_type
■ Config.ifft2_sink_type
■ Config.ifft2_source_type
■ Config.cfr_source_type
Performance Measurement
Table 9 shows the main parameters that determine CFR algorithm
performance.
Name Description
AMAX The lower the value of AMAX, the more aggressive the clipping. As the intensity of the
clipping increases, so do the outband spectral regrowth and the distortion introduced to
the constellation points.
EVM threshold If a high EVM budget is assigned to the CFR algorithm, fewer of the distorted constellation
points need correcting after clipping. This minimizes the peak regrowth associated with the
correction applied by the inband processor and in turn increases the PAR reduction
capability of the algorithm.
Spectral mask An aggressive spectral mask requires a high level of correction in the out-of-band
region, resulting in greater peak regrowth.
Generally, the EVM threshold and spectral mask are related to the
specification of the system. You can determine the optimal value for AMAX
for a given operating mode and data dynamic range with Monte Carlo
simulation.
(usually 10^-4), you can compare the PAR of an OFDM symbol that has
been compressed by a CFR algorithm with the OFDM symbol that has not
been compressed.
Figure 18 shows an example CCDF curve for the WiMAX 1K mode where
the EVM threshold is equal to 75% of the EVM budget specified in the
specification. The black curve shows that the PAR of the input OFDM
signal exceeds 12.3 dB for only one out of ten thousand symbols. The other
curves on the graph show the output PAR for different values of AMAX. At
a probability of 10^-4, the output PAR is approximately 3.7 dB less than that
of the input OFDM signal for the optimal value of AMAX (for this case).
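[Figure 18: CCDF of the PAR for the WiMAX 1K mode. The x-axis shows PAR (dB) from 4 to 13; the y-axis shows probability on a logarithmic scale with ticks at 10^-2, 10^-3, and 10^-4.]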
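The PAR and CCDF measurements can be sketched in Python as follows. This assumes one PAR value is computed per time domain symbol and that the CCDF is the fraction of symbols whose PAR exceeds each threshold; `par_db` and `ccdf` are hypothetical helper names, not part of the reference design's MATLAB model.

```python
import math

def par_db(samples):
    """Peak-to-average power ratio of one symbol, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)

def ccdf(par_values_db, thresholds_db):
    """Fraction of symbols whose PAR exceeds each threshold."""
    total = len(par_values_db)
    return [
        sum(1 for p in par_values_db if p > threshold) / total
        for threshold in thresholds_db
    ]
```

Resolving a probability of 10^-4 with reasonable confidence needs many more than ten thousand symbols, which is why a Monte Carlo run of this kind is long.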
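% Candidate clipping thresholds, normalized by sqrt(N * L)
amax_values = (0.8:0.1:1.6) ./ sqrt(Config.N_FFT * Config.oversample_factor);
for a = 1:numel(amax_values)
    Config.Amax = amax_values(a);   % a loop index cannot be a struct field
    for i = 1:1e6                   % number of symbols
        % Get an OFDMA symbol somehow, and store it in the variable symbol
        [output, Results] = cfr(symbol, Config, 1);
        % Do some processing on the results
        % Store results for later so we can generate a CCDF
    end
end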
RTL Simulation
You can simulate the CFR module using ModelSim. The reference design
provides a script to compile all of the VHDL files and run a sample
testbench to demonstrate the functionality. To invoke the simulation, start
ModelSim and change the current directory to <installation
directory>\rtl\cfr_<x>k_tb, where x is the design you wish to simulate.
From the Tools menu, select Execute Macro, and then select the
Control.do script.
Synthesis Results
The <installation directory>\rtl directory contains a Quartus II project you
can use to synthesize the design. Table 10 shows the resources and speeds
generated for a 10 MHz bandwidth design using the push button flow.
The design was synthesized using Quartus II 7.2 software targeting the
Stratix III EP3SE50F780C3 device.
Design
Combinational Logic MATLAB Memory
Target M-ALUT M9K M144K 18x18
ALUTs Registers Bits Bits
(MHz)
280 16,637 28,561 18,382 976,269 972 133 0 114
[Figure: Frequency domain input interface]
Interface Specification
Data Throughput
At the sink interface, you must present an entire FFT frame of N samples
in N clock cycles. The frame represents frequency domain data ordered
naturally, that is, 0 to N – 1. Given this input frame size, the output
consists of L × N × (1 + CP) samples where L = 4 and CP is a fraction that
represents the guard interval size. As a result, you can present only one
frame of N samples to the sink of the block every L × N × (1 + CP) cycles.
If this rule is violated, corruption of the data occurs.
To achieve maximum throughput (that is, to fully utilize the output bus)
and maximum efficiency, observe the timing diagram shown in Figure 20.
If data is presented at the sink at intervals greater than L × N × (1 + CP)
cycles, the source interface is not fully utilized.
Finally, this system does not self flush its pipeline. The hardware relies on
subsequent data frames to continue processing the data that already
exists in the pipeline. As a result, once the final frame has been presented,
present an additional dummy frame of zeros to fully flush the pipeline.
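The throughput rule can be expressed as a small Python sketch. It computes the minimum spacing between input frames and the resulting utilization of the source interface; the function names are hypothetical and the model ignores pipeline latency.

```python
def output_samples_per_frame(n_fft, guard_interval, oversample_factor=4):
    """Output samples (and minimum input frame spacing in cycles): L * N * (1 + CP)."""
    return round(oversample_factor * n_fft * (1 + guard_interval))

def source_utilization(frame_spacing_cycles, n_fft, guard_interval, oversample_factor=4):
    """Fraction of cycles in which the source interface carries data."""
    needed = output_samples_per_frame(n_fft, guard_interval, oversample_factor)
    if frame_spacing_cycles < needed:
        raise ValueError("frames presented too often: data would be corrupted")
    return needed / frame_spacing_cycles
```

For N = 1024 and a guard interval of 1/8, each input frame produces 4608 output samples, so presenting a new frame every 4608 cycles keeps the output bus fully utilized, while a spacing of 9216 cycles uses only half of it.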
References
Document Revision History
Table 11 shows the revision history for this application note.
101 Innovation Drive
San Jose, CA 95134
www.altera.com
Technical Support:
www.altera.com/support/
Literature Services:
Copyright © 2007 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company,
the stylized Altera logo, specific device designations, and all other words and logos that are identified as
trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera
Corporation in the U.S. and other countries. All other product or service names are the property of their
respective holders. Altera products are protected under numerous U.S. and foreign patents and pending
applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products
to current specifications in accordance with Altera's standard warranty, but reserves the right to make
changes to any products and services at any time without notice. Altera assumes no responsibility or liability
arising out of the application or use of any information, product, or service described
herein except as expressly agreed to in writing by Altera Corporation. Altera customers
are advised to obtain the latest version of device specifications before relying on any
published information and before placing orders for products or services.