DIRECTION OF ARRIVAL ESTIMATION USING EIGENANALYSIS OF THE PARAMETERIZED SPATIAL CORRELATION MATRIX
Jacek Dmochowski, Jacob Benesty, and Sofiene Affes
Universite du Quebec, INRS-EMT, 800 rue de la Gauchetiere Ouest, Montreal, Quebec, Canada, H5A 1K6
{dmochow, benesty, affes}@[Link]
ABSTRACT

The estimation of the direction-of-arrival (DOA) of one or more acoustic sources is an area that has generated much interest in recent years, with applications like automatic video camera steering and multi-party stereophonic teleconferencing entering the market. Time-difference-of-arrival (TDOA) based methods compute each relative delay using only two microphones, even though additional microphones are usually available, and thus suffer from the effects of background noise and reverberation. This paper deals with DOA estimation based on spatial spectral estimation, and proposes a novel DOA estimator based on the eigenvalues of the parameterized spatial correlation matrix. Simulation results confirm the ability of the proposed method to provide reliable estimates even in heavily reverberant environments.

Index Terms: Microphone arrays, DOA estimation, source localization, circular array, spatial correlation matrix.
1. INTRODUCTION

The location of a signal source is of much interest in many applications, and there exists a large and increasing need to locate and track sound sources. For example, a speech-enhancing beamformer [1], [2] must continuously monitor the position of the speaker in order to provide the desired directivity and interference suppression.

The two major classes of broadband DOA estimation techniques are those based on the time-differences-of-arrival (TDOA) and spatial spectral estimators. The latter terminology arises from the fact that spatial frequency corresponds to the wavenumber vector, whose direction is that of the propagating signal. Therefore, by looking for peaks in the spatial spectrum, one determines the DOAs of the dominant signal sources.

The TDOA approach is based on the relationship between DOA and relative delays across the array. The problem of estimating these relative delays is termed "time delay estimation" [3]. The generalized cross-correlation (GCC) approach of [4], [5] is the most popular time delay estimation technique. Each computed time delay is derived from only two microphones, and often contains significant levels of corrupting noise and interference. It is thus well known that current TDOA-based DOA estimation algorithms are plagued by the effects of noise and especially reverberation.

This paper focuses on DOA estimation based on spatial spectral estimation, proposing a DOA estimator based on the eigenanalysis of the parameterized spatial correlation matrix, which jointly utilizes all microphones. Its relation to minimum variance spectral estimation is also examined, and simulation results show that the proposed methods yield a substantially lower anomaly rate in reverberant environments than that of TDOA-based methods. It is also shown that adding more microphones helps to combat the effects of reverberation.

This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

2. SIGNAL MODEL

Assume a 2-dimensional array of $L + 1$ elements lying in a plane, shown in Fig. 1 (i.e., circular array), whose outputs are denoted by $x_l[n]$, $l = 0, 1, \ldots, L$, where $n$ is the time index. Denoting the azimuth angle of arrival by $\theta$, propagation of the signal from a far-field source to microphone $l$ is modeled as:

\[ x_l[n] = \alpha_l\, s[n - t - f_l(\theta)] + v_l[n], \tag{1} \]
where $\alpha_l$, $l = 0, 1, \ldots, L$, are the attenuation factors due to channel effects, $t$ is the propagation time, in samples, from the unknown source $s[n]$ to microphone 0, $v_l[n]$ is the additive noise signal at the $l$th microphone and includes reverberation, and $f_l(\theta)$, $l = 0, 1, \ldots, L$, is the relative delay between microphones 0 and $l$. The function $f_l$ relates the angle of arrival to the relative delays between microphone elements 0 and $l$, and is derived for the case of an equispaced circular array in the following manner. When operating in the far-field, the time delay between microphone $l$ and the centre of the array is given by [6]

\[ g_l(\theta) = \frac{r}{c} \cos(\theta - \psi_l), \tag{2} \]

where the azimuth angle (relative to the selected angle reference) of the $l$th microphone is denoted by $\psi_l$ (with $\psi_0 = 0$ and, for the equispaced array, $\psi_l = 2\pi l/(L+1)$), $r$ denotes the array radius, and $c$ is the speed of signal propagation. It easily follows that

\[ f_l(\theta) = g_0(\theta) - g_l(\theta) = \frac{r}{c} \left[ \cos\theta - \cos\left(\theta - \frac{2\pi l}{L+1}\right) \right]. \tag{3} \]
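As an illustration of (2) and (3), the following minimal Python sketch evaluates the relative delays for an equispaced circular array. The function name `relative_delays`, the default speed of sound of 343 m/s, and the conversion from seconds to samples through a sampling rate `fs` are assumptions made for this example; they are not specified in the model above.

```python
import numpy as np

def relative_delays(theta, num_mics, radius, c=343.0, fs=48000.0):
    """Relative delays f_l(theta) of (3), expressed in samples.

    theta    : azimuth of arrival in radians
    num_mics : number of microphones, L + 1
    radius   : array radius r in metres
    c, fs    : speed of sound (m/s) and sampling rate (Hz); assumed values
    """
    l = np.arange(num_mics)
    psi = 2.0 * np.pi * l / num_mics          # microphone azimuths, psi_0 = 0
    # f_l = g_0 - g_l with g_l = (r / c) * cos(theta - psi_l), see (2)
    return fs * (radius / c) * (np.cos(theta) - np.cos(theta - psi))


# Usage: four microphones, source on the angle reference (theta = 0).
# The radius is chosen so that r / c equals exactly one sample.
delays = relative_delays(0.0, num_mics=4, radius=343.0 / 48000.0)
```

With this sign convention a positive value means that the wavefront reaches microphone $l$ later than microphone 0, which is consistent with the delayed term in (1).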
3. PROPOSED APPROACH

3.1. Steered Response Power and the Parameterized Spatial Correlation Matrix

Using the model of Section 2, the output of a conventional delay-and-sum beamformer (DSB) steered to an angle of arrival of $\phi$ is given as

\[ z_\phi[n] = \sum_{l=0}^{L} w_{\phi,l}\, x_l[n + f_l(\phi)]. \tag{4} \]
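A minimal sketch of the delay-and-sum output in (4) is given below. It assumes that the steering delays have already been rounded to integer numbers of samples and that frame edges are simply discarded; the helper name `delay_and_sum` and the uniform default weights are hypothetical choices made for illustration.

```python
import numpy as np

def delay_and_sum(x, delays, weights=None):
    """Delay-and-sum output z[n] = sum_l w_l * x_l[n + d_l] for integer delays d_l.

    x       : array of shape (L + 1, N), one row per microphone
    delays  : integer steering delays f_l(phi) in samples (may be negative)
    weights : beamformer weights w_l; defaults to 1 / (L + 1) (an assumption)
    """
    x = np.asarray(x, dtype=float)
    delays = np.asarray(delays, dtype=int)
    num_mics, num_samples = x.shape
    if weights is None:
        weights = np.full(num_mics, 1.0 / num_mics)
    # Keep only the time indices n for which n + d_l is valid for every microphone.
    start = max(0, -int(delays.min()))
    stop = num_samples - max(0, int(delays.max()))
    aligned = np.stack([x[l, start + d:stop + d] for l, d in enumerate(delays)])
    return np.asarray(weights) @ aligned


# Usage: a noise-free source reaching three microphones 0, 2 and 5 samples late.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
true_delays = np.array([0, 2, 5])
x = np.stack([np.roll(s, d) for d in true_delays])  # x_l[n] = s[n - d_l] away from the edges
z = delay_and_sum(x, true_delays)                   # steering at the true delays recovers s
```

Steering at the true delays aligns all channels, so the average of the aligned rows reproduces the source; steering elsewhere sums misaligned copies and lowers the output power, which is the property the spatial spectrum exploits.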
1-4244-0728-1/07/$20.00 2007 IEEE
ICASSP 2007
The delays $f_l(\phi)$ steer the beamformer to the desired DOA, while the angle-dependent beamformer weights $w_{\phi,l}$ help shape the beam accordingly. The estimate of the spatial spectral power at azimuth angle $\phi$ is given by the power of the beamformer output when steered to azimuth $\phi$ [7]:

\[ S(\phi) = E\{ z_\phi^2[n] \}, \tag{5} \]

which using (4) may be written in matrix notation as

\[ S(\phi) = \mathbf{w}_\phi^T \mathbf{R}_\phi \mathbf{w}_\phi, \tag{6} \]

where

\[ \mathbf{w}_\phi = [\, w_{\phi,0} \;\; w_{\phi,1} \;\; \cdots \;\; w_{\phi,L} \,]^T \tag{7} \]

and

\[ \mathbf{R}_\phi = E\{ \mathbf{x}_\phi[n]\, \mathbf{x}_\phi^T[n] \} \tag{8} \]

is the parameterized spatial correlation matrix (proposed in a time delay estimation context in [8], [9]), which should not be confused with the array observation matrix that is commonly used in narrowband beamforming models. Here $\mathbf{x}_\phi[n] = [\, x_0[n] \;\; x_1[n + f_1(\phi)] \;\; \cdots \;\; x_L[n + f_L(\phi)] \,]^T$ is the vector of time-aligned microphone signals appearing in (4). Each off-diagonal entry of $\mathbf{R}_\phi$ is a single cross-correlation term and a function of the azimuth angle $\phi$. Notice that the various microphone pairs are combined in a joint fashion, in that altering the steering angle $\phi$ affects all off-diagonal entries of $\mathbf{R}_\phi$. The DOA estimate is thus given by

\[ \hat{\theta} = \arg\max_{\phi}\; \mathbf{w}_\phi^T \mathbf{R}_\phi \mathbf{w}_\phi. \tag{9} \]

The question of how to choose the weights $\mathbf{w}_\phi$ remains. Notice that from (6), this is an effectively "narrowband" weight selection, in that the pre-aligning of the microphones requires only the selection of a single weight per channel. Note, however, that this weight selection must be performed for all angles $\phi$. The simplest choice is the fixed weighting $\mathbf{w}_\phi = \mathbf{1}$ for all $\phi$ [10], [11], where $\mathbf{1}$ is a vector of $L + 1$ ones. In this paper, two adaptive schemes are proposed, as presented in the next two subsections.

Fig. 1. Circular array geometry (angle reference marked).

3.2. Eigenanalysis of the Parameterized Spatial Correlation Matrix

Using the signal model of Section 2, notice that when the steered azimuth $\phi$ matches the actual azimuth $\theta$, the parameterized spatial correlation matrix may be decomposed into signal and noise components in the following manner:

\[ \mathbf{R}_\theta = \sigma_s^2\, \mathbf{a}\mathbf{a}^T + E\{ \mathbf{v}_\theta[n]\, \mathbf{v}_\theta^T[n] \}, \tag{10} \]

where $\sigma_s^2$ is the signal power,

\[ \mathbf{a} = [\, \alpha_0 \;\; \alpha_1 \;\; \cdots \;\; \alpha_L \,]^T, \tag{11} \]

and

\[ \mathbf{v}_\theta[n] = [\, v_0[n] \;\; v_1[n + f_1(\theta)] \;\; \cdots \;\; v_L[n + f_L(\theta)] \,]^T. \tag{12} \]

Note that it has been implicitly assumed that the desired signal is wide-sense stationary, zero-mean, and temporally uncorrelated with the additive noise. It may be easily shown that the signal component of $\mathbf{R}_\theta$ has a single non-zero eigenvalue, $\sigma_s^2 \|\mathbf{a}\|^2$, whose eigenvector is $\mathbf{a}$. The vector of attenuation constants $\mathbf{a}$ is generally unknown; however, from the above discussion, it is apparent that the vector may be estimated from the eigenanalysis of $\mathbf{R}_\theta$. To that end, consider an adaptive weight selection method, which follows from the ideas of narrowband beamforming [12]. This weight selection attempts to non-trivially maximize the output energy of the steered beamformer for a given azimuth $\phi$:

\[ \mathbf{e}_{\max,\phi} = \arg\max_{\mathbf{w}_\phi}\; \mathbf{w}_\phi^T \mathbf{R}_\phi \mathbf{w}_\phi \tag{13} \]

subject to

\[ \mathbf{w}_\phi^T \mathbf{w}_\phi = 1. \tag{14} \]

It is well known that the solution to the above constrained optimization is the vector that maximizes the Rayleigh quotient [2]

\[ \frac{\mathbf{w}_\phi^T \mathbf{R}_\phi \mathbf{w}_\phi}{\mathbf{w}_\phi^T \mathbf{w}_\phi}, \]

which is in turn given by the eigenvector corresponding to the maximum eigenvalue of $\mathbf{R}_\phi$. The resulting spatial spectral estimate is given by

\[ S_{\mathrm{EIG}}(\phi) = \mathbf{e}_{\max,\phi}^T \mathbf{R}_\phi\, \mathbf{e}_{\max,\phi} = \lambda_{\max,\phi}, \tag{15} \]

where $\lambda_{\max,\phi}$ is the maximum eigenvalue of $\mathbf{R}_\phi$ and $\mathbf{e}_{\max,\phi}$ is the corresponding eigenvector. The DOA estimation involves searching for the angle that produces the largest maximum eigenvalue of $\mathbf{R}_\phi$:

\[ \hat{\theta}_{\mathrm{EIG}} = \arg\max_{\phi}\; \lambda_{\max,\phi}. \tag{16} \]

In addition to producing a spatial spectrum estimate, the above eigenanalysis allows one to estimate $\mathbf{a}$:

\[ \hat{\mathbf{a}} = \mathbf{e}_{\max,\hat{\theta}_{\mathrm{EIG}}}. \tag{17} \]

3.3. Minimum Variance Estimation

The application of Capon's MVDR weight-selection method [13] to spatial spectral estimation has been previously studied in [14]. The treatment therein is somewhat incomplete in that channel attenuation effects are not considered. With the estimate of $\mathbf{a}$ now available, it is possible to derive the optimal MVDR weights for the steered response power approach of (9). Taking into account the channel attenuation vector $\mathbf{a}$, the proposed unity gain constraint is

\[ \sum_{l=0}^{L} w_{\phi,l}\, \alpha_l\, s[n - t - f_l(\theta) + f_l(\phi)] = s[n - t], \tag{18} \]

which may be simplified and written in vector notation as

\[ \mathbf{w}_\phi^T \mathbf{a} = 1. \tag{19} \]

Using the method of Lagrange multipliers, the optimal weights become

\[ \mathbf{w}_{\mathrm{mv}',\phi} = \frac{\mathbf{R}_\phi^{-1} \mathbf{a}}{\mathbf{a}^T \mathbf{R}_\phi^{-1} \mathbf{a}}. \tag{20} \]

Substituting (20) into (6):

\[ S_{\mathrm{mv}'}(\phi) = \mathbf{w}_{\mathrm{mv}',\phi}^T \mathbf{R}_\phi\, \mathbf{w}_{\mathrm{mv}',\phi} = \left( \mathbf{a}^T \mathbf{R}_\phi^{-1} \mathbf{a} \right)^{-1}. \tag{21} \]

The proposed broadband minimum variance DOA estimator is thus given by:

\[ \hat{\theta}_{\mathrm{mv}'} = \arg\max_{\phi}\; \left( \mathbf{a}^T \mathbf{R}_\phi^{-1} \mathbf{a} \right)^{-1}. \tag{22} \]

Interestingly, from (20) and the definition of an eigenvector,

\[ \mathbf{e}_{\max,\theta} \propto \mathbf{w}_{\mathrm{mv}',\theta}, \tag{23} \]

and thus the EIG spectrum closely resembles the MV' spectrum at angles near the actual azimuth angle of arrival. Indeed, when $\mathbf{a}$ is an eigenvector of $\mathbf{R}_\theta$ with eigenvalue $\lambda_{\max,\theta}$, then $\mathbf{R}_\theta^{-1}\mathbf{a} = \mathbf{a}/\lambda_{\max,\theta}$, so that (20) reduces to $\mathbf{a}/\|\mathbf{a}\|^2$ and (21) to $\lambda_{\max,\theta}/\|\mathbf{a}\|^2$. It should also be mentioned that the classical minimum variance approach is a specific case of (21), where $\mathbf{a} \propto \mathbf{1}$ [14].

4. SIMULATION EVALUATION

4.1. Simulation Environment

The proposed estimators are evaluated in a computer simulation. An equispaced circular array of 3-10 omnidirectional microphones is employed as the spatial aperture. The array radius is made as large as possible without suffering from spatial aliasing [6]:

\[ r = \frac{c}{4 f_{\max} \sin\left( \frac{\pi}{L+1} \right)}, \tag{24} \]

where $f_{\max}$ denotes the highest frequency of interest, and is chosen to be 4 kHz in the simulations. The signal sources are omnidirectional point sources. A reverberant acoustic environment is simulated using the image model method [15]. The simulated room is rectangular with plane reflective boundaries (walls, ceiling and floor). Each boundary is characterized by a frequency- and angle-independent uniform reflection coefficient. The room dimensions in centimeters are (304.8, 457.2, 381). The centre of the array sits at (152.4, 228.6, 101.6).

Two distinct scenarios are simulated, as described below. The speaker is immobile and situated at (254, 406.4, 101.6) and (254, 406.4, 152.4) in the first and second simulation scenarios, respectively. The correct azimuth angle of arrival is 60 degrees. The SNR at the microphone elements is 0 dB. Here, SNR refers to spatially white sensor noise in the first scenario, and spherically isotropic (diffuse) noise in the second scenario (the noise is temporally white in both cases). Three reverberation levels are simulated for each scenario: anechoic, moderately reverberant ($T_{60}$ = 300 ms), and highly reverberant ($T_{60}$ = 600 ms). The reverberation times are measured using the reverse-time integrated impulse response method of [16]. In the first simulation scenario, the microphones are calibrated with unity gains. In the second simulation scenario, the presence of uncalibrated microphones is simulated by setting $\alpha_l$, $l = 0, 1, \ldots, L$, to a uniformly distributed random number over the range (0.2, 1).

The microphone outputs are filtered to the 300-4000 Hz range prior to processing. Two signal types are examined for each scenario: white Gaussian noise and female English speech. The DOA estimates are computed once per 128 ms frame. To achieve good angular resolution, the sampling rate is chosen to be 48 kHz, resulting in frames of $N = 6144$ samples each. A simulation run consists of $K = 890$ frames.

The estimated spatial spectra are plotted to observe mainlobe width and background values. For each frame, the spectrum is normalized such that the peak is 0 dB, and the spectra are then averaged over the $K$ frames. For each simulation, the algorithms are also evaluated from a DOA estimation standpoint using the percentage of anomalies (%a), where anomalies are classified as estimates that differ from the actual angle of arrival by more than 5 degrees, and the root-mean-square (RMS) error measure for the nonanomalous estimates:

\[ e_{\mathrm{RMS}} = \sqrt{ \frac{1}{K_{na}} \sum_{k \in X_{na}} \left( \hat{\theta}_k - \theta \right)^2 }, \tag{25} \]

where $X_{na}$ is the set of all nonanomalous estimates, and $K_{na}$ is the number of elements in $X_{na}$.

4.2. TDOA Comparison Algorithm

The DOA estimation performance of the proposed estimators is compared to that of a standard two-step TDOA algorithm. The algorithm computes the time delay between microphone 0 and microphone $l$ for $l = 1, \ldots, L$ as:

\[ \hat{\tau}_{0l} = \arg\max_{\tau}\; E\{ x_0[n]\, x_l[n + \tau] \}, \tag{26} \]

and then translates these relative delays to the azimuth angle of arrival using the least-squares criterion:

\[ \hat{\theta}_{\mathrm{TDOA}} = \arg\min_{\phi} \sum_{l=1}^{L} \left\{ \hat{\tau}_{0l} - \frac{r}{c} \left[ \cos\phi - \cos\left( \phi - \frac{2\pi l}{L+1} \right) \right] \right\}^2. \tag{27} \]

4.3. Results

Figure 2 depicts the broadband spatial spectra of the proposed estimators in simulation scenario 1. Panels (a) and (b) are computed with moderate reverberation and a speech signal, and show the effect of adding more microphones on the resulting spectra. Panels (c) and (d) show the effect of increased reverberation on the spectral estimates and are computed with 10 mics and a speech signal. It is evident that adding extra microphones improves the resolution of the spatial spectra. The main lobe is narrowed, and the background level, which corresponds to the power of the reverberant field, is lowered. A lower reverberant field level decreases the probability of anomalies. As the reverberation level increases, the background level is increased, making anomalies more likely. Note that even with a $T_{60}$ of 600 ms, the spectra clearly discriminate the source from the background.

Tables 1 and 2 pertain to the DOA estimation accuracy of the proposed and TDOA methods. The tables show that the TDOA-based method provides very poor performance in reverberant conditions: the proposed estimators greatly outperform the TDOA-based approach in all but the anechoic white signal case. This lends credence to the notion that jointly utilizing multiple microphone pairs combats reverberation, not just background noise. In the TDOA two-step method, a "hard decision" is made in the computation of each $\hat{\tau}_{0l}$, and thus if this decision is incorrect, the error is propagated to the least-squares stage. On the other hand, spatial spectral estimators do not make such hard decisions. Instead, the decision is deferred until after the contribution of all microphone pairs. As the eigenvalue and minimum variance spectra are equivalent at $\theta$ from (23), the DOA estimation performance of the two methods is identical.

Notice that the introduction of variable microphone gains in scenario 2 seriously degrades the TDOA method's performance, but causes only a slight degradation in performance of the proposed methods, pointing to the efficacy of the estimate of $\mathbf{a}$. Similarly, the elevation of the source does not pose problems to the proposed estimators.
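The estimators of Section 3 can be sketched end to end on a toy anechoic example. The following Python code is a minimal illustration under stated assumptions, not the simulation used in the paper: the steering delays of (3) are rounded to whole samples, the source is white Gaussian noise, the sensor noise level is arbitrary (it is not the 0 dB SNR of Section 4.1), the gains in `gains` are hypothetical, the angle grid has 5 degree spacing, and the speed of sound is taken as 343 m/s. All function names are invented for this example.

```python
import numpy as np

C = 343.0        # speed of sound in m/s (assumed)
FS = 48000.0     # sampling rate in Hz, as in Section 4.1
F_MAX = 4000.0   # highest frequency of interest in Hz, as in Section 4.1

def radius_from_eq24(num_mics):
    """Largest alias-free radius according to (24)."""
    return C / (4.0 * F_MAX * np.sin(np.pi / num_mics))

def int_delays(angle, num_mics, radius):
    """f_l(angle) from (3), converted to samples and rounded to integers."""
    psi = 2.0 * np.pi * np.arange(num_mics) / num_mics
    f = FS * (radius / C) * (np.cos(angle) - np.cos(angle - psi))
    return np.rint(f).astype(int)

def simulate(theta, gains, radius, num_samples, noise_std, margin, rng):
    """Anechoic version of (1): x_l[n] = alpha_l * s[n - f_l(theta)] + v_l[n]."""
    d = int_delays(theta, gains.size, radius)
    s = rng.standard_normal(num_samples + 2 * margin)
    x = np.stack([g * s[margin - dl:margin - dl + num_samples]
                  for g, dl in zip(gains, d)])
    return x + noise_std * rng.standard_normal(x.shape)

def pscm(x, delays, margin):
    """Sample estimate of R_phi in (8) from the aligned signals x_l[n + f_l(phi)]."""
    n = np.arange(margin, x.shape[1] - margin)
    aligned = np.stack([x[l, n + d] for l, d in enumerate(delays)])
    return aligned @ aligned.T / n.size

def eig_and_mv_spectra(x, angles, radius, margin):
    """Return S_EIG of (15), S_mv' of (21) and the estimate of a from (17)."""
    num_mics = x.shape[0]
    mats = [pscm(x, int_delays(phi, num_mics, radius), margin) for phi in angles]
    eigs = [np.linalg.eigh(R) for R in mats]           # eigenvalues in ascending order
    s_eig = np.array([w[-1] for w, _ in eigs])
    a_hat = eigs[int(np.argmax(s_eig))][1][:, -1]      # eigenvector at the EIG estimate
    a_hat = a_hat * np.sign(a_hat.sum())               # resolve the sign ambiguity
    s_mv = np.array([1.0 / (a_hat @ np.linalg.solve(R, a_hat)) for R in mats])
    return s_eig, s_mv, a_hat


rng = np.random.default_rng(0)
gains = np.array([1.0, 0.8, 0.6, 0.9, 0.5, 0.7])       # hypothetical attenuation factors
radius = radius_from_eq24(gains.size)
margin = int(np.ceil(2 * radius * FS / C)) + 1         # covers the largest possible delay
x = simulate(np.deg2rad(60.0), gains, radius, 20000, 0.3, margin, rng)

angles_deg = np.arange(0, 360, 5)
s_eig, s_mv, a_hat = eig_and_mv_spectra(x, np.deg2rad(angles_deg), radius, margin)
theta_eig = angles_deg[np.argmax(s_eig)]               # estimator (16)
theta_mv = angles_deg[np.argmax(s_mv)]                 # estimator (22) with a replaced by a_hat
```

In this sketch the minimum variance spectrum is evaluated with the unit-norm estimate from (17) in place of the unknown attenuation vector, so its value at the EIG estimate coincides with the maximum eigenvalue, in line with the relation between the two spectra discussed around (23). The estimated vector `a_hat` recovers the hypothetical gains up to scale.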
Table 1. DOA estimation results for simulation scenario 1 (%a: percentage of anomalies; e_RMS: RMS error of the nonanomalous estimates, in degrees).

Parameters                        TDOA           EIG            MV'
                                  %a    e_RMS    %a    e_RMS    %a    e_RMS
T60 = 0 ms, white signal           0    0.4       0    0.7       0    0.7
T60 = 300 ms, white signal        11    2.5       0    1.4       0    1.4
T60 = 600 ms, white signal        74    2.8       0    2.0       0    2.0
T60 = 0 ms, speech signal          5    2.1       0    1.2       0    1.2
T60 = 300 ms, speech signal       53    2.8       9    2.6       9    2.6
T60 = 600 ms, speech signal       82    3.0      29    2.8      29    2.8

Table 2. DOA estimation results for simulation scenario 2 (%a: percentage of anomalies; e_RMS: RMS error of the nonanomalous estimates, in degrees).

Parameters                        TDOA           EIG            MV'
                                  %a    e_RMS    %a    e_RMS    %a    e_RMS
T60 = 0 ms, white signal           4    2.5       0    2.0       0    2.0
T60 = 300 ms, white signal        29    3.1       0    3.5       0    3.5
T60 = 600 ms, white signal        66    2.8       8    4.5       9    4.5
T60 = 0 ms, speech signal         24    2.6       1    3.0       1    3.0
T60 = 300 ms, speech signal       77    3.1      10    4.4      10    4.4
T60 = 600 ms, speech signal       91    2.9      27    4.7      27    4.7
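The two figures of merit reported in Tables 1 and 2, the percentage of anomalies and the RMS error of (25), can be computed from a set of per-frame estimates as in the following minimal sketch. The function name `anomaly_stats` is hypothetical, and wrapping the angular error into the interval [-180, 180) degrees is an assumption of this example that the text does not discuss.

```python
import numpy as np

def anomaly_stats(estimates_deg, true_deg, threshold_deg=5.0):
    """Return (percentage of anomalies, RMS error of the nonanomalous estimates)."""
    est = np.asarray(estimates_deg, dtype=float)
    # Wrap the error so that, for example, 359 and 1 degree are 2 degrees apart (assumption).
    err = (est - true_deg + 180.0) % 360.0 - 180.0
    # An estimate is an anomaly when it is off by more than the threshold.
    ok = np.abs(err) <= threshold_deg
    pct_anomalies = 100.0 * (1.0 - ok.mean())
    # RMS error over the nonanomalous estimates only, as in (25).
    rms = float(np.sqrt(np.mean(err[ok] ** 2))) if ok.any() else float("nan")
    return pct_anomalies, rms


# Usage: four hypothetical frame estimates for a source at 60 degrees.
pct, rms = anomaly_stats([60.0, 62.0, 58.0, 120.0], true_deg=60.0)
```

Only the estimate at 120 degrees exceeds the 5 degree threshold here, so one frame in four is counted as an anomaly and the RMS error is taken over the remaining three.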
5. CONCLUSIONS
This paper has presented a novel approach to broadband DOA estimation based on the eigenanalysis of the parameterized spatial correlation matrix. It was shown that this estimate is equivalent to the MVDR spatial spectral estimate that takes into account channel attenuation effects. The addition of extra microphones increases the resolution of the proposed spectral estimate and helps shield the DOA estimate against the effects of reverberation. Simulation results showed that the proposed method provides reliable estimates in reverberant environments in which TDOA-based methods fail. This superior performance stems from the joint utilization of all microphones via the parameterized spatial correlation matrix.
6. REFERENCES

[1] B. D. Van Veen and K. M. Buckley, "Beamforming: a versatile approach to spatial filtering," IEEE ASSP Mag., vol. 5, pp. 4-24, Apr. 1988.
[2] D. E. Dudgeon and D. H. Johnson, Array Signal Processing, Prentice-Hall, NJ, 1993.
[3] J. Chen, J. Benesty, and Y. Huang, "Time delay estimation in room acoustic environments: an overview," EURASIP Journal on Applied Signal Processing, vol. 2006, article ID 26503, 19 pages, 2006.
[4] C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Processing, vol. 24, pp. 320-327, Aug. 1976.
[5] M. Omologo and P. Svaizer, "Use of the crosspower-spectrum phase in acoustic event location," IEEE Trans. Speech and Audio Processing, vol. 5, pp. 288-292, May 1997.
[6] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays, SciTech Publishing, Raleigh, NC, USA, 2004.
[7] D. H. Johnson, "The application of spectral estimation methods to bearing estimation problems," Proc. IEEE, vol. 70, pp. 1018-1028, Sept. 1982.
[8] J. Chen, J. Benesty, and Y. Huang, "Robust time delay estimation exploiting redundancy among multiple microphones," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 549-557, Nov. 2003.
[9] J. Benesty, J. Chen, and Y. Huang, "Time-delay estimation via linear interpolation and cross-correlation," IEEE Trans. Speech and Audio Processing, vol. 12, pp. 509-519, Sept. 2004.
[10] J. Dibiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Signal Processing Techniques and Applications (M. S. Brandstein and D. B. Ward, eds.), pp. 157-180, Springer-Verlag, Berlin, 2001.
[11] D. N. Zotkin and R. Duraiswami, "Accelerated speech source localization via a hierarchical search of steered response power," IEEE Trans. Speech and Audio Processing, vol. 12, pp. 499-508, Sept. 2004.
[12] H. Krim and M. Viberg, "Two decades of array signal processing research: the parametric approach," IEEE Signal Processing Mag., pp. 67-94, July 1996.
[13] J. Capon, "High-resolution frequency-wavenumber analysis," Proc. IEEE, vol. 57, pp. 1408-1418, Aug. 1969.
[14] N. Owsley, "Sonar array processing," in Array Signal Processing, S. Haykin, ed., Englewood Cliffs, NJ: Prentice Hall, 1984.
[15] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., vol. 65, pp. 943-950, Apr. 1979.
[16] M. R. Schroeder, "New method for measuring reverberation time," J. Acoust. Soc. Am., vol. 37, pp. 409-412, 1965.
Fig. 2. Spatial spectral estimates, plotted against angle of arrival (degrees): (a) $S_{\mathrm{EIG}}(\phi)$ as a function of the number of mics, (b) $S_{\mathrm{mv}'}(\phi)$ as a function of the number of mics, (c) $S_{\mathrm{EIG}}(\phi)$ as a function of $T_{60}$, and (d) $S_{\mathrm{mv}'}(\phi)$ as a function of $T_{60}$.