Final Applied Acoustics Paper
Applied Acoustics
journal homepage: www.elsevier.com/locate/apacoust
Technical Note
Article info
Article history: Received 11 September 2014; Received in revised form 26 February 2015; Accepted 8 April 2015
Keywords: Kurtosis; Negentropy; Adaptive filters; Acoustic echo cancellation; Independent component analysis

Abstract
In this paper a new wavelet based Independent Component Analysis (ICA) is proposed for Acoustic Echo Cancellation (AEC) in the presence of double talk. Conventional echo cancellation systems that use an adaptive filter for AEC fail in the double talk situation, which demands a double talk detector. In the proposed method, the near end speech is separated from the acoustic echo by maximising the Non-Gaussianity measures of ICA, kurtosis and Negentropy, without the need for a double talk detector. The simulations show that the proposed wavelet based ICA method provides higher cancellation of echo with less computation time.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

In the hands-free communication environment, Acoustic Echo Cancellation (AEC) plays a vital role. The acoustic coupling between the loudspeaker (far-end speech signal) and the near-end microphone produces an undesired acoustic echo, which degrades the quality of the speech signal. To overcome this problem, an adaptive filter is implemented whose output is a replica of the echo signal. The echo is suppressed by subtracting the output of the adaptive filter from the microphone signal. The coefficients of this adaptive filter are continuously updated by adaptive algorithms such as Least Mean Square (LMS), Normalised Least Mean Square (NLMS) and Recursive Least Squares (RLS) [1,2] in order to keep the mean square error at a minimum. The efficiency of the echo removal is estimated by measuring the Echo Return Loss Enhancement (ERLE) [3].

When two people talk simultaneously, known as the double talk situation, the adaptive filter fails to update the filter coefficients correctly. Once a double talk situation has been identified by a double talk detection algorithm, the updating of the filter coefficients is stopped, but the comparison continues, thus allowing some part of the echo to pass through. Extensive research has been carried out on double talk detection algorithms at various levels. Gansler identified double talk by measuring the similarity between far-end speech and near-end speech by means of the coherence function [4]. More robust and reliable double talk detection is achieved by measuring the cross-correlation coefficient between the far-end speech and the microphone signal [5]. Normalising the cross-correlation coefficients in double talk detection reduces the computational complexity and improves the efficiency of the echo cancellation [6,7]. For a full-band adaptive filter, the frequency domain adaptive algorithm combined with cross-correlation based double talk detection is used for fast convergence of the filter coefficients [8]. Frequency domain double talk detection based on Gaussian Mixture Modelling (GMM) results in a simple, heuristic decision rule for double talk detection [9].

In a half-duplex system, the adaptive filter technique is sufficient for echo cancellation. For a full-duplex system with echo cancellation, however, different algorithms are used to update the filter coefficients even during double talk. One recent algorithm emits a pseudo-random noise signal from the loudspeaker along with the far-end speech signal. An estimate of the room response is obtained by cross-correlating the pseudo-noise signal with the microphone signal [10]. Enzner and Vary adapted an echo canceller together with a post-cancellation residual echo suppression filter. They developed a signal model comprising both the acoustic echo of a far-end talker and the speech from a near-end talker. This technique removes the need for double talk detection in echo cancellation [11].

⇑ Corresponding author. Tel.: +91 9487526097.
E-mail addresses: [email protected] (K. Mohanaprasad), parulmozhi[email protected] (P. Arulmozhivarman).
http://dx.doi.org/10.1016/j.apacoust.2015.04.004
0003-682X/© 2015 Elsevier Ltd. All rights reserved.
38 K. Mohanaprasad, P. Arulmozhivarman / Applied Acoustics 97 (2015) 37–45
Another recent study solves this problem using blind source separation (BSS), often achieved via Independent Component Analysis (ICA). This work concentrates on estimating the echo path response during double talk. ICA does not require pseudo-noise sequences; instead it makes use of the statistics of the existing signals. An efficient echo cancellation technique has been developed with the help of ICA which can also be executed during double talk. Reviews of BSS/ICA techniques can be found in [12–14]. In these techniques, the source signals are assumed to be mutually statistically independent. The signal is recovered by finding a linear transformation of the mixtures that yields independent outputs. There are a number of ways to measure the independence of non-Gaussian stationary signals, such as kurtosis and other high-order statistics [15,16], information theoretic measures [17,18], and maximum likelihood-based measures [19,20].

Independent signals can also be separated by using their correlation structure. Diagonalisation of a time lagged correlation matrix, or joint diagonalisation of multiple time lagged correlation matrices of zero-mean independent signals, gives the separated signals [21,22]. Blind separation can be solved by utilising the difference between the power spectral densities of the correlated signals in the frequency domain [23]. Matsuoka et al. utilised time varying second order statistics to separate independent non-stationary signals [24]. Instead of second order statistics, cumulant based separation was introduced by Mei et al. [25]. Kim et al. [26] used BSS for cancelling the noise present in the acoustic mixture as a pre-processing step to the adaptive filter for acoustic echo cancellation. The conversion of the echo cancellation problem into signal separation was previously proposed by Schobben and Sommen [27]; they presented an expression for the source correlation function and developed an efficient adaptive algorithm in the frequency domain. Wada et al. [28,29] used Semi Blind Source Separation (SBSS) for stereophonic acoustic echo cancellation and made a deep analysis of the improvement of the convergence behaviour, ERLE, misalignment performance and the stability of the echo canceller. Gunther [30] made use of the non-stationarity of the sources by computing the source correlation function in the time domain for echo cancellation in a single phone-speaker system. Gupta et al. [31] used a normalised cross correlation based algorithm for detecting the double-talk signal and proposed a combined adaptive noise canceller with BSS (ANC-BSS) to remove background noise and the far-end echo signal in double talk situations. In the absence of double talk the far-end echo signal is cancelled by using adaptive filters.

In this paper, to reduce the complexity of using two systems to cancel the acoustic echo in a double talk situation, a Wavelet ICA based adaptive filter is implemented to cancel the acoustic echo equally well in both single-talk and double-talk situations without the need for a double talk detector. The pre-processing step is carried out using the Daubechies wavelet for the decomposition of speech signals, which improves the efficiency of the ICA adaptive filter. This paper is structured as follows: Wavelet based ICA for Acoustic Echo Cancellation is detailed in Section 2, the implementation of the proposed method is discussed in Section 3 and the conclusion is given in Section 4.

2. Wavelet based ICA for acoustic echo cancellation

2.1. Fundamentals of ICA

2.1.1. Basics of ICA model
The main theme behind BSS is to recover the unknown source signals from a group of mixed signals. The unknown signals are represented by $s_i(t),\ i = 1, 2, \ldots, n$ and the mixed signals by $x_i(t),\ i = 1, 2, \ldots, n$. The basic block diagram of ICA is shown in Fig. 1.
The mixtures are given by
$$x_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t), \quad i = 1, 2, \ldots, n \tag{1}$$
where $a_{ij}$ is the mixing coefficient of the $i$th measurement and the $j$th source. Eq. (1) in vector form is given as
$$x = As \tag{2}$$
where the unknown mixing matrix is $[A]_{(i,j)} = a_{ij}$, $x = [x_1(t), x_2(t), \ldots, x_n(t)]^T$ is the vector of mixtures, each a linear combination of the sources weighted by the mixing matrix, and $s = [s_1(t), s_2(t), \ldots, s_n(t)]^T$ is the vector of source signals, assumed to be mutually independent. The general approach to separating the source signals is to use the separation matrix $W = A^{-1}$:
$$y = A^{-1}x = Wx \tag{3}$$
where $W$ is the inverse of $A$. It cannot be determined directly because the matrix $A$ is unknown. Instead it is determined by adaptively calculating the $W$ vectors and setting up a cost function which is determined by the techniques of ICA.

2.1.2. Pre-processing of ICA
The pre-processing steps are vital for simplifying the Fast ICA technique. The observed mixture data $x(t)$ (microphone signal = echo + near end speech) is centred by subtracting the mean of the mixture from the original mixture. Centering makes the mean of the signal zero, which reduces the complexity of the algorithm. To make the centred data uncorrelated, whitening is required, because independent signals are uncorrelated. A signal is said to be white if its components are uncorrelated and have unit variance, i.e. its covariance matrix is the identity:
$$E[xx^T] = I \tag{4}$$
One popular method for whitening is to use the Eigen Value Decomposition (EVD), which gives the whitening matrix
$$v = E D^{-1/2} E^T \tag{5}$$
where $E$ is the orthogonal matrix of eigenvectors of
$$E[xx^T] \tag{6}$$
and $D = \mathrm{diag}\{d_1, d_2, d_3, \ldots, d_n\}$ is the diagonal matrix of its eigenvalues. To implement whitening, the linear transformation $v$ is estimated and applied to the original signal $x$ to give the whitened matrix $Z$:
$$Z = vx \tag{7}$$
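As an illustration of Eqs. (2) and (4)–(7), the following minimal sketch mixes two toy sources, centres the mixture and whitens it with a closed-form 2×2 eigen-decomposition. The sources, the mixing matrix `A` and all function names are assumptions chosen for illustration; they are not the speech signals or the implementation used in the paper.

```python
import math
import random

def center(channels):
    """Subtract each channel's mean so that every row has zero mean."""
    centered = []
    for ch in channels:
        mean = sum(ch) / len(ch)
        centered.append([v - mean for v in ch])
    return centered

def covariance(x):
    """Sample estimate of E[x x^T] for two zero-mean channels."""
    n = len(x[0])
    c01 = sum(p * q for p, q in zip(x[0], x[1])) / n
    return [[sum(p * p for p in x[0]) / n, c01],
            [c01, sum(q * q for q in x[1]) / n]]

def whitening_matrix(cov):
    """Return v = E D^(-1/2) E^T (Eq. 5) for a symmetric 2x2 covariance."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    theta = 0.5 * math.atan2(2 * b, a - c)     # rotation that diagonalises cov
    e1 = (math.cos(theta), math.sin(theta))    # eigenvector of the larger eigenvalue
    e2 = (-math.sin(theta), math.cos(theta))   # eigenvector of the smaller eigenvalue
    gap = math.hypot((a - c) / 2, b)
    d1, d2 = (a + c) / 2 + gap, (a + c) / 2 - gap
    s1, s2 = 1 / math.sqrt(d1), 1 / math.sqrt(d2)
    return [[s1 * e1[i] * e1[j] + s2 * e2[i] * e2[j] for j in range(2)]
            for i in range(2)]

def apply(matrix, x):
    """Multiply a 2x2 matrix with a two-channel signal, sample by sample."""
    return [[row[0] * p + row[1] * q for p, q in zip(x[0], x[1])]
            for row in matrix]

random.seed(0)
n = 2000
# Toy sources (assumed): a uniform signal with an offset and a Laplacian signal.
s = [[random.uniform(-1, 1) + 0.5 for _ in range(n)],
     [random.choice((-1.0, 1.0)) * random.expovariate(1.0) for _ in range(n)]]
A = [[1.0, 0.6], [0.4, 1.0]]        # hypothetical mixing matrix, Eq. (2)
x = apply(A, s)                     # x = A s
xc = center(x)                      # centering
v = whitening_matrix(covariance(xc))
Z = apply(v, xc)                    # Z = v x, Eq. (7)
cov_z = covariance(Z)               # close to the identity matrix, Eq. (4)
```

Because `v` is built from the sample covariance of the centred mixture, `cov_z` equals the identity up to rounding error, which is a convenient sanity check before running any ICA iteration.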
w
2.3.2. Negentropy 5. Normalise w kwk
.
Negentropy is also a measure of Non-Gaussianity of a random 6. If the sign of c is not known a priori, update. Dca½EfGðwT ZÞg
variable and it also includes higher order statistical information. EfGðv Þg c.
It is based on the information-theoretic quantity of entropy. 7. If not converged, go back to step 4.
Entropy gives a degree of information about the observation of
the variables. Gaussian random variable has largest entropy. Algorithm for Negentropy based on Fast ICA as follows:
Entropy (H) is defined as
X
HðyÞ ¼ Pðy ¼ ei Þ log Pðy ¼ ei Þ ð16Þ 1. Center the data to make its mean zero.
2. Whiten the data to give Z.
where ei is the possible value of y. The generalisation of entropy is 3. Choose an initial vector w of unit norm.
called differential entropy, which is defined with density function 4. Let, w EfZgðwT ZÞg Efg 0 ðwT ZÞgw where g is defined as
PðyÞ as g 1 ðyÞ ¼ tanhðyÞ
Z .
g 2 ðyÞ ¼ y expðy2 =2Þ
HðyÞ ¼ PðyÞ log PðyÞdx ð17Þ 5. Let w w
kwk
.
6. If not converged, go back to step 4.
The Normalised version of differential entropy is called as
Negentropy and it is defined as The performance of the ICA based adaptive filter using Kurtosis
JðyÞ ¼ Hðygauss Þ HðyÞ ð18Þ and Negentropy is further improved by using the proposed wavelet
based ICA.
where ygauss is a Gaussian random variable of the same covariance
matrix of y. Due to the above mentioned definition Negentropy is 2.4. Proposed method using wavelet based ICA
always non-negative and it is zero if and only if y has a Gaussian
distribution. In order to separate the independent component from 2.4.1. Basics of wavelet transform
a mixture of two or more sources, Gaussianity of the mixture can be Discrete Wavelet transform (DWT) is the discretisation of the
changed to non-Gaussian by bringing Negentropy to the maximum Continuous Wavelet Transform (CWT) through sampling particular
value. The estimation of Negentropy is difficult, so classical method wavelet coefficients. Sampling of CWT is achieved by letting a = 2l
of approximating Negentropy [17] by using higher order cumulants and b = m2l, in W(a, b). Where l is the discrete translation and m
is given as are the discrete dilations. DWT of a signal f(t) is given by
1 3 2 1 Z 1
JðyÞ ffi E y þ Kurt ðyÞ2 ð19Þ Wðl; mÞ ¼ 2l=2 uð2l t mÞf ðtÞdt ð24Þ
12 48 1
In the above equation square of the kurtosis leads to non-ro- DWT [32,33] has its own advantages like easy implementation
bustness which is encountered in the kurtosis, which can be and less computation time when compared to time domain. Here
approximated by generalising the higher order cumulants. This is the signal is decomposed into approximation and detailed coeffi-
achieved by using expectations of general non quadratic function cients, where approximation coefficients consist of low frequency
or non-polynomial moments. The polynomial function y3 and y4 information and detailed coefficients represent high frequency
are replaced by another non quadratic function G. Let us consider information. Approximation coefficients are obtained by passing
two non-quadratic functions G1 and G2. In which G1 is odd and the signal through low pass filter and a dyadic down sampler.
G2 is even. Then the Eq. (19) reduces to Detailed coefficients are obtained by passing the signal through
n o2 n o2 high pass filter and a dyadic down sampler as shown in Fig. 3.
JðyÞ ¼ K 1 E G1 ðyÞ þ K 2 EfG2 ðyÞg E G2 ðmÞ ð20Þ By using Multi Resolution Analysis (MRA), we can analyse the
signal at different frequencies with different resolutions. The
In case of one non-quadratic function of G, the approximation Fig. 4 shows multilevel decomposition by consecutive decomposi-
becomes tion of approximations. Where S is the original signal, SA and SD
represents approximation and detailed coefficients respectively
JðyÞ a EfGðyÞg EfGðmÞg2 ð21Þ
(S = SA3 + SD3 + SD2 + SD1).
In this generalisation of the moment based approximation, fol- The consecutive decompositions are limited to a level where the
lowing choices of G have proved very useful. standard deviation of the approximation component becomes less
1 than the standard deviation of the original signal [34]. This is given
G1 ðyÞ ¼ log cosh a1 y ð22Þ by the equation
a1
rAk
< 0:1 ð25Þ
G2 ðyÞ ¼ expðy2 =2Þ where 1 6 a1 6 2 ð23Þ rs
Maximisation of ICA is achieved by bringing the score function
of Negentropy to maximum value; this is validated using Gradient
and Fast ICA method.
Algorithm for Negentropy based on Gradient method is as
follows:
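The multilevel decomposition and the stopping rule of Eq. (25) can be sketched as follows. This is only an illustration under stated assumptions: it uses the unnormalised Haar filter pair (pairwise mean and half-difference, the simplest member of the Daubechies family) rather than the Daubechies wavelet order used by the authors, so that each approximation coefficient equals the value of the reconstructed approximation component. The test signal and the names `haar_step` and `decompose` are hypothetical.

```python
import math

def haar_step(signal):
    """One DWT level: low-pass (pairwise mean) and high-pass (pairwise
    half-difference), each followed by dyadic down-sampling."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def decompose(signal, threshold=0.1, max_level=8):
    """Decompose the approximation repeatedly until sigma_Ak / sigma_s < threshold."""
    sigma_s = std(signal)
    approx, details = list(signal), []
    for level in range(1, max_level + 1):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)              # SD1, SD2, ... in order
        if std(approx) / sigma_s < threshold:
            return level, approx, details   # stopping rule of Eq. (25) met
    return len(details), approx, details

# Toy input (assumed): a sinusoid with a period of 8 samples, 64 samples long.
signal = [math.sin(2 * math.pi * n / 8) for n in range(64)]
level, approx, details = decompose(signal)
print(level)  # 3: averaging over one full period removes the sinusoid
```

For this toy signal the ratio in Eq. (25) is roughly 0.92 after the first level and 0.85 after the second, and it drops to essentially zero at the third level, where each approximation coefficient averages one full period.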
$$\mathrm{ERLE} = 10 \log_{10} \frac{E\{p_s^2(n)\}}{E\{e_r^2(n)\}} \tag{26}$$
where $E\{p_s^2(n)\}$ is the power of the original echo and $E\{e_r^2(n)\}$ is the power of the residual echo. The residual echo is $e_r(n) = e(n) - s_2(n)$, where the error signal $e(n)$ is the difference between the microphone signal and the estimated echo signal, and $s_2(n)$ is the near-end speech signal. From the literature [3], optimum echo cancellation is achieved when the ERLE lies between 30 and 40 dB; a higher value of ERLE means higher cancellation of echo. According to ITU-T recommendation G.167 [3] the value of ERLE should be 25 dB for hands free telephones during double talk.
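As a worked instance of Eq. (26), the sketch below computes the ERLE for synthetic signals. The sinusoidal echo and near-end signals, and the canceller that recovers 90% of the echo amplitude, are assumptions made purely for illustration; the function name `erle_db` is hypothetical.

```python
import math

def erle_db(echo, residual_echo):
    """ERLE = 10 log10( E{p_s^2(n)} / E{e_r^2(n)} ), Eq. (26)."""
    echo_power = sum(v * v for v in echo) / len(echo)
    residual_power = sum(v * v for v in residual_echo) / len(residual_echo)
    return 10 * math.log10(echo_power / residual_power)

n_samples = 1000
echo = [math.sin(0.1 * n) for n in range(n_samples)]             # p_s(n), assumed
near_end = [0.5 * math.sin(0.37 * n) for n in range(n_samples)]  # s_2(n), assumed
microphone = [p + s for p, s in zip(echo, near_end)]

# Hypothetical canceller output that captures 90% of the echo amplitude.
estimated_echo = [0.9 * p for p in echo]

error = [m - y for m, y in zip(microphone, estimated_echo)]      # e(n)
residual_echo = [e - s for e, s in zip(error, near_end)]         # e_r(n) = e(n) - s_2(n)

erle = erle_db(echo, residual_echo)
print(round(erle, 2))  # 20.0, since the residual keeps 10% of the echo amplitude
```

A residual at 10% of the echo amplitude corresponds to 1% of the echo power, hence 20 dB; the 25 dB figure quoted from G.167 would require the residual amplitude to fall to about 5.6% of the echo.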
The correlation coefficient is defined as the ratio of covariance
of correlated data by the square root of their individual covariance
Fig. 5. Wavelet based ICA for Acoustic Echo Cancellation.
Fig. 6. Input and output signals for acoustic echo cancellation simulation.
the double talk conditions were present, as shown in Fig. 6. The simulated echo signal was mixed with the near-end signal and used as the microphone signal. The Kurtosis and Negentropy techniques for maximising Non-Gaussianity were used to separate the near-end speech signal from the echo signal. Wavelet transforms were used as a pre-processing stage for the ICA to improve the cancellation efficiency. Gradient and Fast ICA algorithms were used to allow a comparative study between kurtosis and Negentropy. The separated near-end signal, the microphone signals and the combined signal are displayed in Fig. 6.

3.2. Result analysis

Using the above simulation setup, the ERLE, correlation coefficient, execution time and number of iterations were calculated for conventional ICA (Kurtosis and Negentropy) and Wavelet based ICA using the gradient and Fast ICA algorithms. Table 1 presents the results for conventional ICA with Kurtosis, Negentropy using the tanh function and Negentropy using the exponential function.

From Table 1 and Fig. 7, the Negentropy technique shows a higher ERLE value (+4 dB) and lower correlation coefficients than Kurtosis for both the gradient and Fast ICA algorithms. This agrees with the standard view that Negentropy is usually preferable to kurtosis when performing ICA due to its robustness. The Fast ICA executes in fewer iterations and less computation time than the gradient method. Negentropy computed using the exponential function improves the performance as compared to Negentropy using the tanh function.

Table 2 and Fig. 8 present the results when wavelet based decomposition is incorporated in the ICA algorithm. Wavelet based kurtosis ICA provides higher ERLE (+2 dB), reduced correlation coefficient, quicker convergence and less computation time as compared with conventional ICA.
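To make the separation step compared in these results more concrete, the following minimal sketch runs the one-unit Fast ICA update with the $g_1(y) = \tanh(y)$ nonlinearity on a toy two-channel mixture. Everything in it is an assumption for illustration: the two synthetic sources stand in for the echo and the near-end speech, the mixing matrix is hypothetical, and the code is not the authors' simulation and does not reproduce the figures in the tables.

```python
import math
import random

def whiten(x):
    """Center two channels and whiten them with v = E D^(-1/2) E^T."""
    n = len(x[0])
    means = [sum(ch) / n for ch in x]
    xc = [[v - m for v in ch] for ch, m in zip(x, means)]
    a = sum(p * p for p in xc[0]) / n
    b = sum(p * q for p, q in zip(xc[0], xc[1])) / n
    c = sum(q * q for q in xc[1]) / n
    theta = 0.5 * math.atan2(2 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    gap = math.hypot((a - c) / 2, b)
    s1 = 1 / math.sqrt((a + c) / 2 + gap)
    s2 = 1 / math.sqrt((a + c) / 2 - gap)
    v00 = s1 * ct * ct + s2 * st * st
    v01 = (s1 - s2) * ct * st
    v11 = s1 * st * st + s2 * ct * ct
    return [[v00 * p + v01 * q for p, q in zip(xc[0], xc[1])],
            [v01 * p + v11 * q for p, q in zip(xc[0], xc[1])]]

def fastica_one_unit(z, max_iter=100, tol=1e-8):
    """Fixed-point update w <- E{Z g(w^T Z)} - E{g'(w^T Z)} w with g = tanh."""
    n = len(z[0])
    w = [1.0, 0.0]                              # initial vector of unit norm
    for iteration in range(1, max_iter + 1):
        y = [w[0] * p + w[1] * q for p, q in zip(z[0], z[1])]
        g = [math.tanh(v) for v in y]
        g_prime_mean = sum(1 - t * t for t in g) / n
        new = [sum(zk * gk for zk, gk in zip(z[k], g)) / n - g_prime_mean * w[k]
               for k in range(2)]
        norm = math.hypot(new[0], new[1])
        new = [new[0] / norm, new[1] / norm]    # normalise w
        # Converged when old and new w are parallel (the sign may flip).
        converged = abs(abs(new[0] * w[0] + new[1] * w[1]) - 1) < tol
        w = new
        if converged:
            return w, iteration
    return w, max_iter

def corr(u, v):
    """Correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    den = math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))
    return num / den

random.seed(1)
n = 2000
# Toy sources (assumed): uniform and Laplacian, both non-Gaussian.
s = [[random.uniform(-1, 1) for _ in range(n)],
     [random.choice((-1.0, 1.0)) * random.expovariate(1.0) for _ in range(n)]]
x = [[1.0 * p + 0.6 * q for p, q in zip(s[0], s[1])],   # hypothetical mixing
     [0.4 * p + 1.0 * q for p, q in zip(s[0], s[1])]]
z = whiten(x)
w, iterations = fastica_one_unit(z)
y = [w[0] * p + w[1] * q for p, q in zip(z[0], z[1])]   # one recovered component
best = max(abs(corr(y, s[0])), abs(corr(y, s[1])))
```

The recovered component `y` matches one of the two sources up to sign and scale, so `best` is expected to be close to 1; which source is found depends on the initial vector.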
Table 1
Comparison between Conventional ICA using Gradient and Fast ICA Algorithm.
Fig. 7. Comparison of ERLE for Kurtosis and Negentropy using Gradient and Fast ICA algorithm.
Table 2
Comparison of Kurtosis and Wavelet based Kurtosis using Gradient and Fast ICA algorithm.
Fig. 8. Comparison of ERLE for Kurtosis and Wavelet based Kurtosis using Gradient and Fast ICA algorithm.
Table 3
Comparison of Negentropy tanh function and Wavelet based Negentropy tanh function using Gradient and Fast ICA algorithm.
Fig. 9. Comparison of ERLE for Negentropy tanh function and Wavelet based Negentropy using Gradient and Fast ICA algorithm.
Table 3 and Fig. 9 compare Wavelet based Negentropy (tanh function based) with conventional Negentropy (tanh function based) using the gradient and Fast ICA algorithms. These results show that the wavelet based ICA parameterised by the tanh-based Negentropy measure provides a 2.5 dB increase in ERLE performance, lower correlation, reduced computation time and quicker convergence as compared with conventional ICA using the same cost function.

Table 4 and Fig. 10 demonstrate a similar comparison between the conventional ICA and wavelet based ICA using the Negentropy measure computed using the exponential function in the gradient and Fast ICA algorithms. The results show that wavelet based
Table 4
Comparison between Negentropy based exponential function with wavelet based Negentropy exponential function using Gradient and Fast ICA Algorithm.
Fig. 10. Comparison of ERLE for Negentropy Exponential function and Wavelet based Negentropy using Gradient and Fast ICA algorithm.
Negentropy increases the ERLE performance by nearly 3 dB and has the lowest correlation coefficients. The Fast ICA converges with fewer iterations and lower computation time. On the other hand, the Gradient algorithm shows a negligible increase in the ERLE and a lower correlation coefficient as compared with the Fast ICA algorithm. Overall the proposed wavelet based ICA algorithm parametrised by the exponential Negentropy measure yields the best performance as quantified by an ERLE performance of around 6 dB, with almost zero correlation coefficient, lowest computation time and quickest convergence in six iterations.

4. Conclusion

In this paper, to reduce the complexity of using two echo canceller systems for single talk and double talk separately, a Wavelet ICA based adaptive filter is implemented to cancel the acoustic echo equally well in both single-talk and double-talk situations without the need for a double talk detector. An ICA based adaptive filter using the kurtosis and Negentropy algorithms is implemented. The results show that the Negentropy based adaptive algorithm yields higher cancellation of echo during double talk than the kurtosis based adaptive algorithm. The performance of the ICA based adaptive filter is further improved by the inclusion of wavelet based ICA, with less computation time.

References

[1] Haykin S. Adaptive filter theory. 4th ed. Upper Saddle River, NJ: Prentice-Hall; 2002.
[2] Widrow B, Hoff Jr. ME. Adaptive switching circuits, in IRE Wescon Conv Rec; 1960. p. 96–104.
[3] Duttweiler DL. A twelve-channel digital echo canceller. IEEE Trans Commun 1978;26:647–53.
[4] Gansler T. A double-talk detector based on coherence. IEEE Trans Commun 1996;44(11):1421–7.
[5] Cho JH, Morgan DR, Benesty J. An objective technique for evaluating double talk detector in acoustic echo cancellation. IEEE Trans Audio, Speech, Lang Process 1999;7:718–24.
[6] Benesty J, Morgan DR, Cho JH. A new class of doubletalk detectors based on cross-correlation. IEEE Trans Speech Audio Process 2000;8(3):168–72.
[7] Gansler T, Benesty J. A frequency-domain double-talk detector based on a normalized cross-correlation vector. Signal Process 2001;81:1783–7.
[8] Buchner H, Benesty J, Gansler T, Kellermann W. Robust extended multidelay filter and double-talk detector for acoustic echo cancellation. IEEE Trans Audio, Speech, Lang Process 2006;14(5):1633–44.
[9] Lee KH, Chang JH, Kim NS, Kang S, Kim Y. Frequency domain double-talk detection based on the Gaussian mixture model. IEEE Signal Process 2010;17(5):453–6.
[10] Jenq JC, Hsieh SF. A double talk resistant echo cancellation based on the iterative maximal-length correlation. In: Proc Int Symp Circuits Syst R; 2000. p. 237–240.
[11] Enzner G, Vary P. Frequency-domain adaptive kalman filter for acoustic echo control in hands-free telephones. Signal Process 2006;86:1140–56.
[12] Hyvärinen A, Karhunen J, Oja E. Independent component analysis. New York: Wiley; 2001.
[13] Cardoso JF. Blind signal separation: statistical principles. Neural Comput Surveys 1998;85:2009–25.
[14] Comon P, Jutten C. Handbook of blind source separation. New York: Academics; 2010.
[15] Comon P. Independent component analysis a new concept? Signal Process 1994;36:287–314.
[16] Cardoso JF. Source separation using higher order moments. Proc IEEE Int Conf Acoust, Speech Signal Process 1984;4:2109–12.
[17] Hyvärinen A. Fast and robust fixed-point algorithm for independent component analysis. IEEE Trans Neural Network 1999;10(3):624–34.
[18] Bell AJ, Sejnowski TJ. An information-maximization approach to blind separation and blind deconvolution. Neural Comput 1995;7(6):1004–34.
[19] Cardoso JF. Infomax and maximum likelihood for source separation. IEEE Signal Process Lett 1997;4(4):112–4.
[20] Pham DT, Garat P, Jutten C. Separation of a mixture of independent sources through a maximum likelihood approach. In: Proc EUSIPCO; 1992. p. 771–774.
[21] Kawamoto M, Matsuoka K, Oya M. Blind separation of sources using temporal correlation of the observed signals. IEICE Trans Fundam Electron, Commun Comput Sci 1997;E80-A(4):695–704.
[22] Weinstein E, Feder M, Oppenheim AV. Multi-channel signal separation by decorrelation. IEEE Trans Audio, Speech, Lang Process 1993;1(4):405–13.
[23] Pham DT, Garat P. Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. IEEE Trans, Audio, Signal Process 1997;45(7):1712–25.
[24] Kawamoto M, Matsuoka K, Oya M. A neural net for blind separation of nonstationary signals. Neural Network 1995;8(3):411–9.
[25] Mei T, Yin F, Wang J. Blind source separation based on cumulants with time and frequency non-properties. IEEE Trans Audio, Speech, Lang Process 2009;17(8):1099–108.
[26] Kim D, Choi H, Bae H. Acoustic echo cancellation using blind source separation. In: IEEE workshop on signal processing systems; August 2003. p. 27–29.
[27] Schobben DWE, Sommen PW. A frequency domain blind signal separation method based on decorrelation. IEEE Trans, Audio, Signal Process 2002;50(8):1855–65.
[28] Wada TS, Miyabe S, Juang B-H. Use of decorrelation procedure for source and echo suppression. In: Proc Int Workshop for Acoust Echo Noise Control; Sep. 2008.
[29] Nesta F, Wada TS, Juang B-H. Batch-online semi-blind source separation applied to multi-channel acoustic echo cancellation. IEEE Trans Audio, Speech, Lang Process 2011;19(3):583–99.
[30] Gunther Jake. Learning echo paths during continuous double-talk using semi-blind source separation. IEEE Trans Audio, Speech, Lang Process 2012;20(2):646–60.
[31] Gupta VK, Chandra Mahesh, Sharan SN. Acoustic echo and noise cancellation system for hand free telecommunication using variable step size algorithms. Radio Eng 2013;22(1):200–7.
[32] Burrus CS et al. Introduction to wavelet and wavelet transforms. 1st ed. Prentice Hall Inc.; 1998.
[33] Soon IY, Koh SN, Yeo CK. Wavelet for speech denoising. Proc IEEE TENCON 1997:479–82.
[34] Rameshbabu N, Arulmozhivarman P. Improving forecast accuracy of wind speed using wavelet transform and neural networks. J Electric Eng Technol 2013;8(3):559–64.