Texas Instruments Application Report
SPRA654 - March 2000

Autoscaling Radix-4 FFT for TMS320C6000™

Yao-Ting Cheng
Taiwan Semiconductor Sales & Marketing

ABSTRACT

Fixed-point digital signal processors (DSPs) have a limited dynamic range for representing digital data. This application report proposes a scheme that tests and scales the output of each Fast Fourier Transform (FFT) stage to prevent accumulation overflow. The radix-4 FFT algorithm is selected because it needs fewer stages than the radix-2 algorithm, so the number of scaling operations is minimized.

This application report is organized as follows:
• Basics of the FFT
• Multiplication and addition overflow
• Algorithm to test bit growth and scale the result
• Implementation in C and linear assembly on the C6000 DSP
• Code listings

Contents
1 FFT (Fast Fourier Transform)
2 Multiplication and Additions Overflow
3 Bit-Growth Detection and Scaling Algorithm
4 Example 1 - Main Program
5 Example 2 - Autoscaling Radix-4 FFT With C6000 C Intrinsics
6 Example 3 - Autoscaling Radix-4 FFT With C6000 Linear Assembly
7 References

List of Figures
Figure 1. Radix-2 FFT for N = 8
Figure 2. Radix-4 Butterfly

TMS320C6000 is a trademark of Texas Instruments.

1 FFT (Fast Fourier Transform)

Many applications need to process signals in the digital domain, that is, they use digital signal processing. Because a signal often has to be processed according to its frequency content, it must first be transformed into a frequency-domain representation. The Discrete Fourier Transform (DFT) is one way to convert a signal from the time domain to the frequency domain. The DFT is a discrete version of the Fourier Transform and is readily computed by modern microprocessors. The DFT equation is:

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad \text{where } W_N = e^{-j2\pi/N}$$

Many calculations are needed.
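The following C sketch makes the cost concrete by evaluating the DFT equation directly. It is not part of the report. The function name `dft_direct` and the use of `double` data are illustrative assumptions. The twiddle table is assumed to use the same sin/cos pair layout as the report's `w[]` array: `w[2m] = sin(2πm/N)` and `w[2m+1] = cos(2πm/N)`. The inner loop runs N times for each of the N outputs, so the function always performs $N^2$ complex multiplications. It also relies on the periodicity of $W_N$ by reducing each exponent modulo N.

```c
#include <assert.h>

/* Hypothetical helper (not from the report): direct DFT by definition.
   x, X : interleaved real/imaginary arrays of length 2*n (X must not alias x)
   w    : twiddle table, w[2*m] = sin(2*pi*m/n), w[2*m+1] = cos(2*pi*m/n)
   Returns the number of complex multiplications performed (n*n). */
long dft_direct(int n, const double x[], double X[], const double w[])
{
    long cmults = 0;
    assert(n > 0 && x != X);
    for (int k = 0; k < n; k++) {
        double re = 0.0, im = 0.0;
        for (int m = 0; m < n; m++) {
            /* periodicity: W_N^(k*m) == W_N^((k*m) mod N) */
            int q = (int)(((long)k * m) % n);
            double co = w[2 * q + 1], si = w[2 * q];
            /* (xr + j*xi) * (co - j*si), because W_N^q = cos - j*sin */
            re += x[2 * m] * co + x[2 * m + 1] * si;
            im += x[2 * m + 1] * co - x[2 * m] * si;
            cmults++;
        }
        X[2 * k] = re;
        X[2 * k + 1] = im;
    }
    return cmults;
}
```

For N = 4 and the real input 1, 2, 3, 4, the sketch produces X = 10, -2 + 2j, -2, -2 - 2j after 16 complex multiplications.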
An N-point DFT needs on the order of $N^2$ complex multiplications and $N^2$ complex additions. The radix-2 FFT is one algorithm that dramatically reduces the number of computations. It takes advantage of the periodicity of the twiddle factor $W_N$, for example $W_N^{n+N} = W_N^{n}$. The radix-2 FFT equation is:

$$X(k) = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k\, x\!\left(n + \tfrac{N}{2}\right) \right] W_N^{nk}$$

The radix-2 FFT equation divides the DFT into two smaller DFTs. Each smaller DFT is then divided again, and so on (see Figure 1). The radix-2 FFT has $\log_2 N$ stages, and each stage has N/2 butterflies. Each butterfly performs two additions on the input data (an addition and a subtraction) and one multiplication by the twiddle factor.

Figure 1. Radix-2 FFT for N = 8 (three stages, Stage 1 to Stage 3; inputs x(0) to x(7) in natural order, outputs in bit-reversed order X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7))

The other popular algorithm is the radix-4 FFT, which is even more efficient than the radix-2 FFT. The radix-4 FFT equation is:

$$X(k) = \sum_{n=0}^{N/4-1} \left[ x(n) + (-j)^k\, x\!\left(n + \tfrac{N}{4}\right) + (-1)^k\, x\!\left(n + \tfrac{N}{2}\right) + (j)^k\, x\!\left(n + \tfrac{3N}{4}\right) \right] W_N^{nk}$$

The radix-4 FFT equation essentially combines two stages of a radix-2 FFT into one, so only half as many stages are required (see Figure 2). Because the radix-4 FFT needs fewer stages and butterflies than the radix-2 FFT, the FFT computation can be made faster still. For example, a 16-point FFT takes $\log_2 16 = 4$ stages with radix-2 but only $\log_4 16 = 2$ stages with radix-4.

Next, we discuss the numerical issue that arises from finite word length. Many designers use a fixed-point DSP for these calculations in embedded systems, because fixed-point DSPs are highly programmable and cost-efficient. The drawback is their limited dynamic range. This limitation is made worse by the summation overflow that can occur at every stage of an FFT, so a scheme is needed to overcome it.

Figure 2. Radix-4 Butterfly
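The radix-4 equation can be checked without any fixed-point effects using the floating-point C sketch below. It is an illustrative assumption, not code from the report. `radix4_ref` mirrors the stage, twiddle-group and butterfly loops of the `radix4()` listing that follows, but uses `double` data and omits the `>> 15`. `digit_reverse4` is a hypothetical helper that returns the position of X(k) in the digit-reversed output. The twiddle table again stores sin/cos pairs.

```c
#include <assert.h>

/* Floating-point sketch of a radix-4 decimation-in-frequency FFT.
   x : interleaved real/imaginary data, length 2*n, transformed in place
   w : twiddle table, w[2*m] = sin(2*pi*m/n), w[2*m+1] = cos(2*pi*m/n)
   n must be a power of 4; the result is left in digit-reversed order. */
void radix4_ref(int n, double x[], const double w[])
{
    int n1, n2 = n, ie = 1;
    assert(n > 0 && (n & (n - 1)) == 0 && (n & 0x55555555) != 0);
    for (int k = n; k > 1; k >>= 2) {                  /* one pass per stage */
        n1 = n2;
        n2 >>= 2;
        for (int j = 0, ia1 = 0; j < n2; j++, ia1 += ie) {
            int ia2 = 2 * ia1, ia3 = 3 * ia1;
            double co1 = w[2 * ia1 + 1], si1 = w[2 * ia1];
            double co2 = w[2 * ia2 + 1], si2 = w[2 * ia2];
            double co3 = w[2 * ia3 + 1], si3 = w[2 * ia3];
            for (int i0 = j; i0 < n; i0 += n1) {       /* butterflies sharing these twiddles */
                int i1 = i0 + n2, i2 = i1 + n2, i3 = i2 + n2;
                double r1 = x[2*i0] + x[2*i2], r2 = x[2*i0] - x[2*i2];
                double s1 = x[2*i0+1] + x[2*i2+1], s2 = x[2*i0+1] - x[2*i2+1];
                double t = x[2*i1] + x[2*i3];
                x[2*i0] = r1 + t;                      /* a + b + c + d */
                r1 -= t;
                t = x[2*i1+1] + x[2*i3+1];
                x[2*i0+1] = s1 + t;
                s1 -= t;                               /* (r1, s1) = a - b + c - d */
                x[2*i2]   = r1 * co2 + s1 * si2;       /* times W^(2*ia1) */
                x[2*i2+1] = s1 * co2 - r1 * si2;
                t = x[2*i1+1] - x[2*i3+1];
                r1 = r2 + t;  r2 -= t;
                t = x[2*i1] - x[2*i3];
                s1 = s2 - t;  s2 += t;                 /* a - jb - c + jd and a + jb - c - jd */
                x[2*i1]   = r1 * co1 + s1 * si1;       /* times W^ia1 */
                x[2*i1+1] = s1 * co1 - r1 * si1;
                x[2*i3]   = r2 * co3 + s2 * si3;       /* times W^(3*ia1) */
                x[2*i3+1] = s2 * co3 - r2 * si3;
            }
        }
        ie <<= 2;
    }
}

/* Position of X(k) in the output of radix4_ref: base-4 digit reversal of k. */
int digit_reverse4(int k, int n)
{
    int r = 0;
    for (int m = n; m > 1; m >>= 2) {
        r = (r << 2) | (k & 3);
        k >>= 2;
    }
    return r;
}
```

For N = 16, `digit_reverse4` reproduces the 0, 4, 8, 12, 1, 5, 9, 13, ... ordering used to unscramble the output in the main program.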
2 Multiplication and Additions Overflow

The FFT is essentially a collection of multiplications and summations, and either kind of operation may overflow. This application report uses the radix-4 algorithm developed by C. S. Burrus and T. W. Parks to explain how to handle both kinds of overflow on a C6000 DSP. The equivalent C program for the radix-4 FFT is:

    void radix4(int n, short x[], short w[])
    {
        int   n1, n2, ie, ia1, ia2, ia3, i0, i1, i2, i3, j, k;
        short t, r1, r2, s1, s2, co1, co2, co3, si1, si2, si3;

        n2 = n;
        ie = 1;
        for (k = n; k > 1; k >>= 2) {          // stage loop: log4(n) stages
            n1 = n2;
            n2 >>= 2;
            ia1 = 0;
            for (j = 0; j < n2; j++) {         // twiddle-group loop
                ia2 = ia1 + ia1;
                ia3 = ia2 + ia1;
                co1 = w[ia1 * 2 + 1];
                si1 = w[ia1 * 2];
                co2 = w[ia2 * 2 + 1];
                si2 = w[ia2 * 2];
                co3 = w[ia3 * 2 + 1];
                si3 = w[ia3 * 2];
                ia1 = ia1 + ie;
                for (i0 = j; i0 < n; i0 += n1) {   // butterflies using these twiddles
                    i1 = i0 + n2;
                    i2 = i1 + n2;
                    i3 = i2 + n2;
                    r1 = x[2 * i0] + x[2 * i2];
                    r2 = x[2 * i0] - x[2 * i2];
                    t  = x[2 * i1] + x[2 * i3];
                    x[2 * i0] = r1 + t;
                    r1 = r1 - t;
                    s1 = x[2 * i0 + 1] + x[2 * i2 + 1];
                    s2 = x[2 * i0 + 1] - x[2 * i2 + 1];
                    t  = x[2 * i1 + 1] + x[2 * i3 + 1];
                    x[2 * i0 + 1] = s1 + t;
                    s1 = s1 - t;
                    x[2 * i2]     = (r1 * co2 + s1 * si2) >> 15;
                    x[2 * i2 + 1] = (s1 * co2 - r1 * si2) >> 15;
                    t  = x[2 * i1 + 1] - x[2 * i3 + 1];
                    r1 = r2 + t;
                    r2 = r2 - t;
                    t  = x[2 * i1] - x[2 * i3];
                    s1 = s2 - t;
                    s2 = s2 + t;
                    x[2 * i1]     = (r1 * co1 + s1 * si1) >> 15;
                    x[2 * i1 + 1] = (s1 * co1 - r1 * si1) >> 15;
                    x[2 * i3]     = (r2 * co3 + s2 * si3) >> 15;
                    x[2 * i3 + 1] = (s2 * co3 - r2 * si3) >> 15;
                }
            }
            ie <<= 2;
        }
    }

To deal with multiplication overflow, we interpret all input samples and twiddle factors $W_N$ as fractional numbers. The product of two fractional numbers is always less than or equal to one in magnitude. On the C6000 DSP, the largest 16-bit positive fractional binary number is 0.111 1111 1111 1111, which maps to 32767 in the integer domain (0x7FFF in hexadecimal). The smallest negative number is 1.000 0000 0000 0000, which maps to -32768 in the integer domain (0x8000 in hexadecimal). The only case in which multiplication can still overflow is -1 times -1, whose result should be +1. However, the largest representable positive number, 0.111 1111 1111 1111, is very close to 1 but not exactly 1. The C6000 DSP provides saturating multiplication instructions, such as SMPY, that fix this problem.
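The saturation rule can be modelled in portable C. This minimal sketch uses only the behaviour described for SMPY: Q15 operands, the product shifted left by one to Q31, and the single overflow case clamped. `smpy_model` and `q15_mul` are hypothetical names, not TI intrinsics. Without the special case, (-32768) × (-32768) doubled would wrap to 0x80000000, which is -1 in Q31 instead of almost +1.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of SMPY: multiply two 16-bit Q15 values, shift the product
   left by 1 (Q30 -> Q31) and saturate the one overflowing case,
   (-1) * (-1), to 0x7FFFFFFF. */
int32_t smpy_model(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;     /* |p| <= 2^30: cannot overflow 32 bits */
    if (p == 0x40000000)            /* only (-32768) * (-32768) */
        return INT32_MAX;           /* saturate instead of wrapping to 0x80000000 */
    return p * 2;
}

/* Q15 result: keep the upper 16 bits of the Q31 product
   (assumes arithmetic right shift of negative values). */
int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(smpy_model(a, b) >> 16);
}
```

With saturation, -1 × -1 yields 32767 (0.99997 in Q15), the closest representable value to +1.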
The 4 Autoscaling Radix-4 FFT for TMS320C6000" 9 Texas INSTRUMENTS SPRAG54 C6000 DSP provides Saturation Multiplication instructions such as SMPY that can fix this problem. The second overflow comes from additions. According to the algorithm listed above, up to five additions are needed to calculate the output. For example, one of the FFT output data is calculated as x(2 * i] = rl * col + s1* sil (22 +) * col + (82 - t) * sit = (e2 + (x[2"i141] = x[243411)) * col + (52 ~ (x(2*i2] ~ x(2*43])) * 812. Itcan contribute up to a 3-bit growth within the butterflies. The easiest way to fix itis to scale down the input samples 3 bits at each stage. Somehow, it costs a lot of dynamic range. The other way to fix itis to detect if the bit grows at the output of each stage. Then, scale down the result based on how many bits have grown before feeding the result into the next stage. 3 Bit-Growth Detection and Scaling Algorithm The C6000 DSP provides the instruction NORM that can help detect how many bits grow after each addition. For example, assume the content of the 32-bit register A1 is 0000 0000 0000 0000 0010 0010 1100 1111. After performing the NORM operation such as NORM.L1 Al, A2 The At will stay unchanged and A2 will be 17, which is simply the number of non-redundant sign bits as shown with double underscore. If the content of A1 grows 3 bits as 0000 0000 0000 0901 0010 0010 0011 0011, the result of NORM will be 14 because the non-redundant sign bit is decreased by 3 bits. Once we have the number of bit-growth, we can properly scale down the result by right-shifting the content of the register. One more issue to be considered is the input data format. Generally, the Q15 number is adopted for most of the system. It means that there is one sign bit in the most significant bit (MSB) for 16-bit data such as S.XXX XXXX XXXX XXXX, where S is the sign bit. To prevent addition overflow for radix-4 FFT, we need three guard bits; therefore, the data should be Q12, such as SSSS. 
The algorithm is summarized below:

Step 1: Input data should be in Q12 format to gain three guard bits. Set exp = 19, which is the number of redundant sign bits of full-scale Q12 data.

Step 2: At the end of each butterfly calculation, test for bit growth and record the largest growth (the smallest exponent):

    if ((exp_temp = _norm(x[k])) < 19)
        if (exp_temp < exp)
            exp = exp_temp;

Step 3: At the end of each stage, first check whether this is the last stage; the final output does not need to be scaled. Otherwise, check whether bits have grown. If they have, scale the output back down to Q12:

    if (!last_stage) {
        if (exp < 19) {
            for (i = 0; i < (2 * N); i++)
                x[i] >>= (19 - exp);
            scale += (19 - exp);
            exp = 19;
        }
    }

The variable scale accumulates the total right shift applied, so the true result equals the FFT output multiplied by $2^{\text{scale}}$.

The example programs are listed below. Example 1 is the main program, which provides the input samples and the twiddle factors for a 16-point FFT. Example 2 is the autoscaling radix-4 FFT implemented in C with C6000 intrinsics. Example 3 is the FFT subroutine implemented in C6000 linear assembly.
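Steps 2 and 3 can be sketched in portable C for the output of one stage. This is an illustrative assumption, not the report's code. `detect_and_scale` is a hypothetical helper that works on the short array of interleaved real and imaginary parts. It stops scanning once exp reaches 16, as the intrinsic version does, because a 16-bit value cannot have fewer than 16 redundant sign bits. Deciding whether the current stage is the last one is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Redundant sign bits of a 32-bit value (same idea as NORM). */
static int redundant_sign_bits(int32_t v)
{
    uint32_t u = (uint32_t)v, sign = u >> 31;
    int count = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; bit--)
        count++;
    return count;
}

/* Steps 2 and 3 for one stage output of 16-bit samples. Q12 data
   sign-extended to 32 bits has 19 redundant sign bits, so exp < 19 means
   the output has grown by (19 - exp) bits. Returns the right shift applied
   (0 to 3); the caller adds it to its running scale. */
int detect_and_scale(int16_t x[], int len)
{
    int exp = 19;
    assert(len >= 0);
    for (int i = 0; i < len && exp > 16; i++) {   /* 16 is the minimum for 16-bit data */
        int e = redundant_sign_bits(x[i]);        /* int16_t argument is sign-extended */
        if (e < exp)
            exp = e;
    }
    int shift = 19 - exp;
    for (int i = 0; shift > 0 && i < len; i++)
        x[i] = (int16_t)(x[i] >> shift);          /* assumes arithmetic right shift */
    return shift;
}
```

For example, an output containing 16000 (bit 13 set) has grown by 2 bits, so every sample is shifted right by 2 and 2 is added to the scale.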
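4 Example 1 - Main Program

    #define Q12_SCALE 8                 // scale the data from Q15 to Q12 (divide by 2^3)

    extern int r4_fft(short, short *, short *);

    short x[32] = {                     // input samples (real, imaginary)
                       0, 0,
        4617 / Q12_SCALE, 0,
        9118 / Q12_SCALE, 0,
       13389 / Q12_SCALE, 0,
       17324 / Q12_SCALE, 0,
       20825 / Q12_SCALE, 0,
       23804 / Q12_SCALE, 0,
       26187 / Q12_SCALE, 0,
       27914 / Q12_SCALE, 0,
       28941 / Q12_SCALE, 0,
       29242 / Q12_SCALE, 0,
       28811 / Q12_SCALE, 0,
       27658 / Q12_SCALE, 0,
       25811 / Q12_SCALE, 0,
       23318 / Q12_SCALE, 0,
       20241 / Q12_SCALE, 0
    };

    short w[32] = {                     // twiddle factors:
             0,  32767,                 // 32768*sin(2*PI*n/N), 32768*cos(2*PI*n/N)
         12540,  30274,
         23170,  23170,
         30274,  12540,
         32767,      0,
         30274, -12540,
         23170, -23170,
         12540, -30274,
             0, -32767,
        -12540, -30274,
        -23170, -23170,
        -30274, -12540,
        -32767,      0,
        -30274,  12540,
        -23170,  23170,
        -12540,  30274
    };

    short index[16] = {                 // index for 16-point digit reversal
        0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
    };

    short y[32];                        // outputs

    int main(void)
    {
        int n = 16;
        int i;
        int scale;

        scale = r4_fft(n, x, w);        // X(k) = y(k) * 2^scale
        for (i = 0; i < n; i++) {       // undo the digit-reversed output order
            y[2 * i]     = x[index[i] * 2];
            y[2 * i + 1] = x[index[i] * 2 + 1];
        }
        return 0;
    }

5 Example 2 - Autoscaling Radix-4 FFT With C6000 C Intrinsics

    int r4_fft(short n, int x[], const int w[])
    {
        int n1, n2, ie, ia1, ia2, ia3, i0, i1, i2, i3, j, k;
        int t0, t1, t2;
        int xtmph, xtmpl;
        int shift, exp = 19, scale = 0;

        n2 = n;
        ie = 1;
        for (k = n; k > 1; k >>= 2) {
            n1 = n2;
            n2 >>= 2;
            ia1 = 0;
            for (j = 0; j < n2; j++) {
                ia2 = ia1 + ia1;
                ia3 = ia2 + ia1;
                for (i0 = j; i0 < n; i0 += n1) {
                    i1 = i0 + n2;
                    i2 = i1 + n2;
                    i3 = i2 + n2;
                    // each int holds one complex sample; with little-endian data
                    // the imaginary part is in the upper 16 bits, the real part
                    // in the lower 16 bits
                    t0 = _add2(x[i1], x[i3]);
                    t1 = _add2(x[i0], x[i2]);
                    t2 = _sub2(x[i0], x[i2]);
                    x[i0] = _add2(t0, t1);
                    t1 = _sub2(t1, t0);
                    // x[i2] = t1 * twiddle w[ia2]
                    xtmph = (_smpyh(t1, w[ia2]) - _smpy(t1, w[ia2])) & 0xffff0000;
                    xtmpl = ((_smpylh(t1, w[ia2]) + _smpyhl(t1, w[ia2])) >> 16) & 0x0000ffff;
                    x[i2] = xtmph | xtmpl;
                    // t0 = -j * (x[i1] - x[i3]): swap halves, negate the new upper half
                    t0 = _sub2(x[i1], x[i3]);
                    t1 = -(t0 << 16);
                    t0 = t1 | ((t0 >> 16) & 0x0000ffff);
                    t1 = _add2(t2, t0);
                    t2 = _sub2(t2, t0);
                    // x[i1] = t1 * twiddle w[ia1]
                    xtmph = (_smpyh(t1, w[ia1]) - _smpy(t1, w[ia1])) & 0xffff0000;
                    xtmpl = ((_smpylh(t1, w[ia1]) + _smpyhl(t1, w[ia1])) >> 16) & 0x0000ffff;
                    x[i1] = xtmph | xtmpl;
                    // x[i3] = t2 * twiddle w[ia3]
                    xtmph = (_smpyh(t2, w[ia3]) - _smpy(t2, w[ia3])) & 0xffff0000;
                    xtmpl = ((_smpylh(t2, w[ia3]) + _smpyhl(t2, w[ia3])) >> 16) & 0x0000ffff;
                    x[i3] = xtmph | xtmpl;
                }
                ia1 = ia1 + ie;
            }
            ie <<= 2;
            if (k > 4) {                            // no scaling after the last stage
                j = 0;
                while ((exp > 16) && (j < n)) {     // growth is at most 3 bits
                    xtmph = _norm(x[j] >> 16);          // upper (imaginary) half
                    xtmpl = _norm(x[j] << 16 >> 16);    // lower (real) half
                    if (xtmph < exp) exp = xtmph;
                    if (xtmpl < exp) exp = xtmpl;
                    j++;
                }
                if (exp < 19) {
                    shift = 19 - exp;
                    exp = 19;
                    scale += shift;
                    _nassert(n >= 16);
                    for (j = 0; j < n; j++) {
                        xtmph = (x[j] >> shift) & 0xffff0000;
                        xtmpl = ((x[j] << 16) >> (16 + shift)) & 0x0000ffff;
                        x[j] = xtmph | xtmpl;
                    }
                }
            }
        }
        return scale;
    }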
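Examples 2 and 3 pack one complex sample into each 32-bit word and build every complex product from four fractional multiplies. The sketch below is a portable model of that sequence under these assumptions:

- The data word holds the imaginary part in its upper half and the real part in its lower half, which is consistent with the -j rotation in the butterfly code.
- The twiddle word holds cos in its upper half and sin in its lower half, matching `w[2n] = sin` and `w[2n+1] = cos`.

`pack`, `hi16`, `lo16`, `frac_mul` and `cmul_packed` are hypothetical names, not TI intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Upper/lower 16-bit halves of a packed word. */
static uint32_t pack(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
static int16_t hi16(uint32_t v) { return (int16_t)(v >> 16); }
static int16_t lo16(uint32_t v) { return (int16_t)(v & 0xFFFFu); }

/* Saturating fractional multiply of two selected halves, the operation
   performed by _smpy/_smpyh/_smpylh/_smpyhl: (a * b) << 1, with
   0x80000000 saturated to 0x7FFFFFFF. */
static int64_t frac_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (p == 0x40000000) ? INT32_MAX : (int64_t)p * 2;
}

/* (r + j*s) * (co - j*si), following the xtmph/xtmpl sequence. */
uint32_t cmul_packed(uint32_t x, uint32_t w)
{
    int16_t r = lo16(x), s = hi16(x);
    int16_t si = lo16(w), co = hi16(w);
    /* imaginary: _smpyh(x,w) - _smpy(x,w), keep the upper 16 bits
       (32-bit wraparound, like the non-saturating SUB) */
    uint32_t xtmph = (uint32_t)(frac_mul(s, co) - frac_mul(r, si)) & 0xFFFF0000u;
    /* real: (_smpylh(x,w) + _smpyhl(x,w)) >> 16, keep the lower 16 bits */
    uint32_t xtmpl = ((uint32_t)(frac_mul(r, co) + frac_mul(s, si)) >> 16) & 0x0000FFFFu;
    return xtmph | xtmpl;
}
```

For example, multiplying 1000 + 0j by the quarter-turn twiddle (cos = 0, sin = 32767) gives -1000j, the expected product with -j. Multiplying the Q12 value 4096 by the zero-angle twiddle gives 4095, because 32767 is slightly less than 1.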
6 Example 3 - Autoscaling Radix-4 FFT With C6000 Linear Assembly

            .title  "r4_fft.sa"
            .def    _r4_fft
            .text

    _r4_fft: .cproc n, p_x, p_w
            .reg    n1, n2, ie, ia1, ia2, ia3, i0, i1, i2, i3, j, k
            .reg    t0, t1, t2, w, x0, x1, x2, x3
            .reg    tmp, mskh, xtmph, xtmpl
            .reg    exp, scale

            add     n, 0, n2
            mvk     1, ie
            zero    mskh
            mvkh    0xffff0000, mskh    ; mskh = 0xffff0000
            zero    scale
            add     n, 0, k
    stage_loop:
            add     n2, 0, n1
            shr     n2, 2, n2
            zero    ia1
            zero    j
    group_loop:
            add     ia1, ia1, ia2
            add     ia2, ia1, ia3
            add     j, 0, i0
    butterfly_loop:
            add     i0, n2, i1
            add     i1, n2, i2
            add     i2, n2, i3
            ldw     *+p_x[i0], x0
            ldw     *+p_x[i1], x1
            ldw     *+p_x[i2], x2
            ldw     *+p_x[i3], x3
            add2    x1, x3, t0
            add2    x0, x2, t1
            sub2    x0, x2, t2
            add2    t0, t1, x0          ; x0
            sub2    t1, t0, t1
            ldw     *+p_w[ia2], w       ; load twiddle factor w2
            smpyh   t1, w, tmp
            smpy    t1, w, xtmph
            sub     tmp, xtmph, xtmph
            and     xtmph, mskh, xtmph
            smpylh  t1, w, tmp
            smpyhl  t1, w, xtmpl
            add     tmp, xtmpl, xtmpl
            shru    xtmpl, 16, xtmpl
            or      xtmph, xtmpl, x2    ; x2
            sub2    x1, x3, t0
            shl     t0, 16, t1
            neg     t1, t1
            extu    t0, 0, 16, t0
            or      t1, t0, t0
            add2    t2, t0, t1
            sub2    t2, t0, t2
            ldw     *+p_w[ia1], w       ; load twiddle factor w1
            smpyh   t1, w, tmp
            smpy    t1, w, xtmph
            sub     tmp, xtmph, xtmph
            and     xtmph, mskh, xtmph
            smpylh  t1, w, tmp
            smpyhl  t1, w, xtmpl
            add     tmp, xtmpl, xtmpl
            shru    xtmpl, 16, xtmpl
            or      xtmph, xtmpl, x1    ; x1
            ldw     *+p_w[ia3], w       ; load twiddle factor w3
            smpyh   t2, w, tmp
            smpy    t2, w, xtmph
            sub     tmp, xtmph, xtmph
            and     xtmph, mskh, xtmph
            smpylh  t2, w, tmp
            smpyhl  t2, w, xtmpl
            add     tmp, xtmpl, xtmpl
            shru    xtmpl, 16, xtmpl
            or      xtmph, xtmpl, x3    ; x3
            stw     x0, *+p_x[i0]
            stw     x1, *+p_x[i1]
            stw     x2, *+p_x[i2]
            stw     x3, *+p_x[i3]
            add     i0, n1, i0
            cmplt   i0, n, tmp
      [tmp] b       butterfly_loop      ; branch to butterfly loop
            add     ia1, ie, ia1
            add     j, 1, j
            cmplt   j, n2, tmp
      [tmp] b       group_loop          ; branch to group loop
            cmpeq   k, 4, tmp           ; test if last stage
      [tmp] b       end                 ; if true, branch to end
            mvk     3, exp              ; initialize exponent: Q12 has 3 redundant sign bits per half
            zero    j                   ; initialize index
            mvkl    0x0000ffff, t2      ; mask for masking xtmpl
            mvkh    0x0000ffff, t2
    test_bit_growth: .trip 16           ; scan all outputs before deciding on scaling
            ldw     *+p_x[j], tmp
            norm    tmp, xtmph          ; redundant sign bits of HI half
            shl     tmp, 16, xtmpl
            norm    xtmpl, xtmpl        ; redundant sign bits of LO half
            cmplt   xtmph, exp, tmp     ; test if bits grew
      [tmp] add     xtmph, 0, exp
            cmplt   xtmpl, exp, tmp     ; test if bits grew
      [tmp] add     xtmpl, 0, exp
            add     j, 1, j
            cmplt   j, n, tmp           ; all outputs tested?
      [tmp] b       test_bit_growth     ; if not, test next output
            cmpgt   exp, 2, tmp         ; exp > 2: no growth, no scaling
      [tmp] b       next_stage
            cmpeq   exp, 0, tmp         ; bits grew by 3?
      [tmp] sub     3, exp, t0          ; calculate shift
      [tmp] mvk     0x0213, t1          ; csta & cstb to ext xtmpl
      [tmp] add     scale, t0, scale    ; accumulate scale
      [tmp] b       scaling
            cmpeq   exp, 1, tmp         ; bits grew by 2?
      [tmp] sub     3, exp, t0
      [tmp] mvk     0x0212, t1
      [tmp] add     scale, t0, scale
      [tmp] b       scaling
            sub     3, exp, t0          ; bits grew by 1
            mvk     0x0211, t1          ; csta & cstb to ext xtmpl
            add     scale, t0, scale    ; accumulate scale
    scaling:
            zero    j
    scaling_loop: .trip 16
            ldw     *+p_x[j], tmp
            shr     tmp, t0, xtmph      ; scale HI half
            and     xtmph, mskh, xtmph  ; mask HI half
            ext     tmp, t1, xtmpl      ; scale LO half
            and     xtmpl, t2, xtmpl    ; mask LO half by 0x0000ffff
            or      xtmph, xtmpl, tmp   ; x[j] = (xtmph | xtmpl)
            stw     tmp, *+p_x[j]
            add     j, 1, j
            cmplt   j, n, tmp
      [tmp] b       scaling_loop
    next_stage:
            shl     ie, 2, ie
            shr     k, 2, k
            b       stage_loop          ; end of stage loop
    end:
            .return scale
            .endproc

7 References
1. C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms and Implementation, John Wiley & Sons, New York, 1985.

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE ("CRITICAL APPLICATIONS"). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT THE CUSTOMER'S RISK.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design.
TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty or endorsement thereof.

Copyright © 2000, Texas Instruments Incorporated
