SpeechComm2017c
Speech Communication
journal homepage: www.elsevier.com/locate/specom
Article info

Article history:
Received 18 June 2016
Revised 11 November 2016
Accepted 5 January 2017
Available online 6 January 2017

Keywords:
Inter-frame correlation
Multi-channel linear prediction (MCLP)
Speech dereverberation
Speech enhancement

Abstract

In this paper, we propose a new dereverberation approach based on the weighted prediction error (WPE) method implemented in the short-time Fourier transform (STFT) domain. Our main contribution is to model the temporal correlation of the STFT coefficients across analysis frames, referred to as inter-frame correlation (IFC), and exploit it in the dereverberation process. Since accurate modeling of the IFC is not tractable, we consider an approximate model wherein only a finite number of consecutive speech frames are considered correlated. It is shown that, given an estimate of the IFC matrix, the proposed approach results in a convex quadratic optimization problem with respect to the reverberation prediction weights, and a closed-form solution can be accordingly derived. Furthermore, an efficient method for the estimation of the underlying IFC matrix is developed based on the extension of a recently proposed speech variance estimator. We evaluate the performance of our approach incorporating the estimated IFC matrix and compare it to the original and several variants of the WPE method. The results reveal lower residual reverberation and higher overall quality of the enhanced speech when the proposed method is employed.
© 2017 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.specom.2017.01.004
0167-6393/© 2017 Elsevier B.V. All rights reserved.
50 M. Parchami et al. / Speech Communication 87 (2017) 49–57
derived that yields the enhanced speech. In a similar way, by using a time-varying statistical model for the speech and a multi-channel linear prediction (MCLP) model for the reverberation, an efficient dereverberation approach has been developed in Nakatani et al. (2008a) and Kinoshita et al. (2009). Since the implementation of such methods in the time domain is computationally expensive, it was proposed in Nakatani et al. (2008b, 2010) to employ the MCLP-based method in the short-time Fourier transform (STFT) domain. The resulting approach, referred to as the weighted prediction error (WPE) method, is an iterative algorithm that alternatively estimates the reverberation prediction coefficients and speech spectral variance, using batch processing of the speech utterance.

Basically, the WPE method and its variants consider temporally/spectrally independent speech components in the STFT domain. This assumption, despite greatly simplifying the derivation and application of the WPE method, is inaccurate and lacks the modeling of inherent dependencies across time frames and spectral components at each time frame. In Erkelens and Heusdens (2010), it was shown that the STFT coefficients of anechoic speech exhibit significant correlation in time, even with frame overlaps of less than 50%. This correlation, referred to as inter-frame correlation (IFC), is further pronounced in case of highly reverberant speech, due to the convolutive nature of the reverberation. In Habets et al. (2012), in the context of multi-microphone noise reduction in a reverberant environment, it was demonstrated that the achievable performance in terms of noise reduction and speech distortion can be further improved by exploiting IFC. The noise reduction problem using IFC has also been addressed partially in Esch (2012) where, in the propagation step of a noise reduction method based on Kalman filter, the complex-valued prediction weight is used to exploit the temporal correlation of successive speech and noise STFT coefficients. However, similar to Habets et al. (2012), this work assumes perfect knowledge of the theoretical IFC in the derivation of various enhancement algorithms. In summary, the IFC has not been fully explored in the context of STFT domain speech enhancement and the accurate modeling and applications of the speech IFC remains an attractive area for future research, especially in the context of dereverberation where the channel impulse responses are characterized by long memory (Vaseghi, 2006).

In this work, in order to take into account the considerable IFC present in the desired speech (due to the speech characteristics, STFT framing overlaps and heavy reverberation), we reformulate the WPE method through the introduction of an approximate model for the joint probability distribution of the desired speech STFT coefficients within finite segments, each consisting of consecutive frames. Following an ML approach similar to the original WPE method, it is shown that the resulting dereverberation problem leads to a convex optimization problem with a closed-form solution for the reverberation prediction weights, since it can be solved efficiently in a single attempt, unlike the original WPE method whose solution requires an iterative procedure. In addition, regarding the estimation of the underlying IFC matrix for the desired speech component, an extension of the method for speech spectral variance estimation in Parchami et al. (2016) is proposed. The proposed method can efficiently eliminate the reverberant component from the observed speech, prior to the estimation of the cross-spectral variance of the desired speech, that is performed by a first order smoothing scheme. Finally, we evaluate the performance of our approach incorporating the estimated IFC matrix and compare it to the original and several variants of the WPE method. The results reveal lower residual reverberation and higher overall quality of the enhanced speech when the proposed method is employed.

The remainder of this paper is organized as follows. In Section 2, a brief overview of the WPE method is presented. In Section 3, a closed-form solution for the optimum reverberation prediction weights in the WPE method with IFC is developed and a novel technique for the estimation of the IFC matrix is presented. The objective performance evaluation of the proposed approach using different types of reverberant speech signals is discussed in Section 4. Finally, a brief conclusion is given in Section 5.

2. A brief review of the WPE method

Suppose that a speech signal emanating from a single source is captured by $M$ microphones located in a reverberant enclosure. In the STFT domain, we denote the clean speech signal by $s_{n,k}$ with time frame index $n \in \{1, \ldots, N\}$ and frequency bin index $k \in \{1, \ldots, K\}$ where $N$ is the total number of frames and $K$ is the number of available frequency bins. Then, the reverberant speech signal observed at the $m$th microphone, $x_{n,k}^{(m)}$, can be represented in the STFT domain using a linear prediction model as Nakatani et al. (2010)

$$x_{n,k}^{(m)} = \sum_{l=0}^{L_h-1} h_{l,k}^{(m)*}\, s_{n-l,k} + e_{n,k}^{(m)} \tag{1}$$

where $h_{l,k}^{(m)}$ is an approximation of the acoustic transfer function (ATF) between the speech source and the $m$th microphone in the STFT domain, $L_h$ denotes the length of the ATF (measured in frames) and $*$ denotes the complex conjugate. The additive term $e_{n,k}^{(m)}$ models the linear prediction error and the additive noise and is neglected here as in Nakatani et al. (2010). Therefore, (1) can be rewritten as

$$x_{n,k}^{(m)} = d_{n,k}^{(m)} + \sum_{l=D}^{L_h-1} h_{l,k}^{(m)*}\, s_{n-l,k} \tag{2}$$

where $d_{n,k}^{(m)} = \sum_{l=0}^{D-1} h_{l,k}^{(m)*}\, s_{n-l,k}$ is the sum of anechoic (direct-path) speech and early reflections at the $m$th microphone and $D$ corresponds to the duration of the early reflections. Most dereverberation techniques, including the WPE method, aim at reconstructing the desired signal, say $d_{n,k} \equiv d_{n,k}^{(1)}$, or suppressing the late reverberant terms represented by the summation in (2). Replacing the convolutive model in (2) by an auto-regressive (AR) model results in the well-known multi-channel linear prediction (MCLP) form for the observation at the first microphone, i.e.,

$$d_{n,k} = x_{n,k}^{(1)} - \sum_{m=1}^{M} \mathbf{g}_k^{(m)H}\, \mathbf{x}_{n,k}^{(m)} = x_{n,k}^{(1)} - \mathbf{G}_k^{H} \mathbf{X}_{n,k} \tag{3}$$

with superscript $H$ as the Hermitian transpose and the vectors $\mathbf{x}_{n,k}^{(m)}$ and $\mathbf{g}_k^{(m)}$ are defined as

$$\mathbf{g}_k^{(m)} = \left[g_{0,k}^{(m)},\, g_{1,k}^{(m)},\, \ldots,\, g_{L_k-1,k}^{(m)}\right]^T$$
$$\mathbf{x}_{n,k}^{(m)} = \left[x_{n-D,k}^{(m)},\, x_{n-D-1,k}^{(m)},\, \ldots,\, x_{n-D-(L_k-1),k}^{(m)}\right]^T \tag{4}$$

where $\mathbf{g}_k^{(m)}$ is the regression vector (reverberation prediction weights) of order $L_k$ for the $m$th channel and the superscript $T$ denotes transpose. The right-hand side of (3) has been obtained by concatenating $\{\mathbf{x}_{n,k}^{(m)}\}$ and $\{\mathbf{g}_k^{(m)}\}$ over $m$ to respectively form $\mathbf{X}_{n,k}$ and $\mathbf{G}_k$. Estimation of the regression vector $\mathbf{G}_k$ and insertion of it in (3) can provide an estimate of the desired (dereverberated) speech. From a statistical viewpoint, estimation of $\mathbf{G}_k$ can be performed by applying the maximum likelihood (ML) criterion at each frequency bin. To this end, the conventional WPE method (Nakatani et al., 2008b, 2010) assumes a circular complex Gaussian distribution for the desired speech coefficients, $d_{n,k}$, with (unknown) time-varying spectral variance $\sigma_{d_{n,k}}^2 = E\{|d_{n,k}|^2\}$ and zero mean. Assuming that the desired speech STFT coefficients $d_{n,k}$ are independent across frames, i.e., using zero IFC, the joint distribution of the desired
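Table 1
Outline of the steps of the conventional WPE method with temporally independent speech STFT coefficients.
• At each frequency bin $k$, consider the speech observations $x_{n,k}^{(m)}$, for all $n$ and $m$, and the parameters $D$, $L_k$ and $\epsilon$.
• Initialize $\sigma_{d_{n,k}}^{2}$ by $\sigma_{d_{n,k}}^{2[1]} = |x_{n,k}^{(1)}|^2$.
• For $j = 1, 2, \ldots, J$ (with a fixed number of iterations, $J$), repeat the following:
    $\mathbf{A}_k^{[j]} = \sum_{n=1}^{N} \sigma_{d_{n,k}}^{-2[j]}\, \mathbf{X}_{n,k} \mathbf{X}_{n,k}^{H}$
    $\mathbf{a}_k^{[j]} = \sum_{n=1}^{N} \sigma_{d_{n,k}}^{-2[j]}\, \mathbf{X}_{n,k}\, x_{n,k}^{(1)*}$
    $\mathbf{G}_k^{[j]} = \mathbf{A}_k^{-1[j]}\, \mathbf{a}_k^{[j]}$
    $r_{n,k}^{[j]} = \mathbf{G}_k^{[j]H} \mathbf{X}_{n,k}$
    $d_{n,k}^{[j]} = x_{n,k}^{(1)} - r_{n,k}^{[j]}$
    $\sigma_{d_{n,k}}^{2[j+1]} = \max\{|d_{n,k}^{[j]}|^2,\, \epsilon\}$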
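The steps of Table 1 can be sketched for a single frequency bin as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function name `wpe_one_bin`, the default parameter values, the variance floor applied at initialization, and the small diagonal loading of $\mathbf{A}_k$ are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

def wpe_one_bin(x, D=3, L=10, n_iter=3, eps=1e-8):
    """Conventional WPE (Table 1) for one frequency bin.

    x : complex array of shape (M, N), STFT coefficients x^{(m)}_{n,k}.
    Returns (d, G): desired-speech estimate at microphone 1, shape (N,),
    and the stacked prediction weights G_k, shape (M * L,).
    """
    M, N = x.shape
    # Column n of X is X_{n,k}: delayed observations x^{(m)}_{n-D-l,k},
    # l = 0..L-1, stacked over microphones (zeros before the first frame).
    X = np.zeros((M * L, N), dtype=complex)
    for m in range(M):
        for l in range(L):
            shift = D + l
            if shift < N:
                X[m * L + l, shift:] = x[m, :N - shift]
    x1 = x[0]
    # Initialization of sigma^2_d; the floor eps is an assumption that
    # avoids division by zero for silent frames.
    var = np.maximum(np.abs(x1) ** 2, eps)
    for _ in range(n_iter):
        Xw = X / var                       # weight each frame by 1 / sigma^2_d
        A = Xw @ X.conj().T                # A_k = sum_n X X^H / sigma^2_d
        a = Xw @ x1.conj()                 # a_k = sum_n X x^{(1)*} / sigma^2_d
        # Assumption: tiny diagonal loading so that A_k stays invertible.
        load = 1e-12 * np.trace(A).real / A.shape[0]
        G = np.linalg.solve(A + load * np.eye(A.shape[0]), a)
        d = x1 - G.conj() @ X              # d = x^{(1)} - G^H X
        var = np.maximum(np.abs(d) ** 2, eps)
    return d, G
```

In practice the function would be called once per frequency bin $k$ on the $M \times N$ slice of the multi-channel STFT, and the returned `d` inverse-transformed to the time domain.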
$$p(\mathbf{d}_k) = p(d_{1,k}) \prod_{n=2}^{N} p\left(d_{n,k} \,\middle|\, \mathbf{d}_{n-1,k}\right) = p(d_{1,k}) \prod_{n=2}^{N} \frac{p\left(d_{n,k},\, \mathbf{d}_{n-1,k}\right)}{p\left(\mathbf{d}_{n-1,k}\right)} \tag{8}$$

where the conditioning term $\mathbf{D}_{n,k}$ in (7) has been replaced by the shorter segment $\mathbf{d}_{n-1,k} = [d_{n-1,k}, d_{n-2,k}, \cdots, d_{n-\tau_k,k}]^T$ with $\tau_k$ as the assumed IFC length in frames. Unfortunately, proceeding with the model in (8) to find an ML solution for the regression vector $\mathbf{G}_k$ does not lead to a convex optimization problem. Therefore, to overcome this limitation, we alternatively exploit an approximate model by considering only the correlations among the frames within each segment, $\mathbf{d}_{n,k} = [d_{n,k}, d_{n-1,k}, \cdots, d_{n-\tau_k+1,k}]^T$, and disregarding the correlations across the segments. This results in the following approximate model

$$p(\mathbf{d}_k) \approx \prod_{n=1}^{\lfloor N/\tau_k \rfloor} p\left(\mathbf{d}_{n,k}\right) = \prod_{n=1}^{\lfloor N/\tau_k \rfloor} \frac{1}{\pi^{\tau_k} \det \boldsymbol{\Phi}_{n,k}} \exp\left(-\mathbf{d}_{n,k}^{H} \boldsymbol{\Phi}_{n,k}^{-1} \mathbf{d}_{n,k}\right) \tag{9}$$

where $\boldsymbol{\Phi}_{n,k} = E\{\mathbf{d}_{n,k} \mathbf{d}_{n,k}^{H}\}$ represents the correlation matrix of $\mathbf{d}_{n,k}$, $\det$ denotes the determinant of a matrix and $\lfloor \cdot \rfloor$ is the floor function. Now, using (3), the desired speech segment $\mathbf{d}_{n,k}$ can be expressed as

$$\mathbf{d}_{n,k} = \mathbf{u}_{n,k} - \mathbf{U}_{n,k}^{H} \mathbf{h}_k \tag{10}$$

where

$$\mathbf{u}_{n,k} = \left[x_{n,k}^{(1)},\, x_{n-1,k}^{(1)},\, \cdots,\, x_{n-\tau_k+1,k}^{(1)}\right]^T$$
$$\mathbf{h}_k = \mathbf{G}_k^{*} \tag{11}$$

In the same manner as the original WPE method (Nakatani et al., 2010), by considering the negative of the logarithm of $p(\mathbf{d}_k | \mathbf{h}_k)$, an ML-based objective function for the regression weight vector $\mathbf{h}_k$ can be derived as follows,

$$J(\mathbf{h}_k) \triangleq -\log p(\mathbf{d}_k | \mathbf{h}_k) = \sum_{n=1}^{\lfloor N/\tau_k \rfloor} \left( \mathbf{d}_{n,k}^{H} \boldsymbol{\Phi}_{n,k}^{-1} \mathbf{d}_{n,k} + K_{n,k} \right) \tag{12}$$

with $K_{n,k}$ representing the terms independent of $\mathbf{h}_k$, which can be discarded. Inserting (10) into (12) and doing further manipulation result in

$$J(\mathbf{h}_k) = \sum_{n=1}^{\lfloor N/\tau_k \rfloor} \left( \mathbf{h}_k^{H} \mathbf{A}_{n,k} \mathbf{h}_k - \mathbf{b}_{n,k}^{H} \mathbf{h}_k - \mathbf{h}_k^{H} \mathbf{b}_{n,k} + c_{n,k} \right) \tag{13}$$

where we defined

It can be shown that the matrix $\mathbf{A}_k$ is positive semidefinite, and therefore, the quadratic objective function in (15) is real-valued and convex in terms of $\mathbf{h}_k$. Subsequently, to find the global minimum of $J(\mathbf{h}_k)$, we can express (15) in the following form

$$J(\mathbf{h}_k) = \left(\mathbf{h}_k - \hat{\mathbf{h}}_k\right)^{H} \mathbf{A}_k \left(\mathbf{h}_k - \hat{\mathbf{h}}_k\right) + c_k \tag{17}$$

where $c_k$ is an independent term and

$$\hat{\mathbf{h}}_k = \mathbf{A}_k^{-1} \mathbf{b}_k^{H} \tag{18}$$

It is evident that $\hat{\mathbf{h}}_k$ in the above is the global minimum of the objective function $J(\mathbf{h}_k)$ in (17), or equivalently, it is the estimate of the reverberation prediction weights by the proposed WPE method.

3.2. Estimation of the IFC matrix

To calculate the optimal reverberation prediction weights by (18), $\mathbf{A}_k$ and $\mathbf{b}_k$ in (16), and in turn, $\mathbf{A}_{n,k}$ and $\mathbf{b}_{n,k}$ given by (14) have to be calculated. To do so, as seen in (14), the IFC matrix of the desired speech terms, $\boldsymbol{\Phi}_{n,k}$, has to be estimated beforehand. In Parchami et al. (2016), a new variant of the WPE method has been suggested, that exploits the geometric spectral subtraction approach in Lu and Loizou (2008) along with the estimation of late reverberation spectral variance (LRSV), in order to estimate the spectral variance of the desired speech, $\sigma_{d_{n,k}}^2$, unlike the iterative scheme in the original WPE method, as in Table 1. We here develop an extension of the proposed method in Parchami et al. (2016) to estimate the spectral cross-variances of the desired speech terms, $\rho_{n_1,n_2,k} = E\{d_{n_1,k}\, d_{n_2,k}^{*}\}$, which in fact constitute the IFC matrix $\boldsymbol{\Phi}_{n,k}$. In this regard, by resorting to the dereverberation by spectral enhancement (gain function-based approach), the following estimate of $d_{n,k}$ can be obtained

$$\hat{d}_{n,k} = \sqrt{\frac{1 - \dfrac{(\gamma_{n,k} - \xi_{n,k} + 1)^2}{4\gamma_{n,k}}}{1 - \dfrac{(\gamma_{n,k} - \xi_{n,k} - 1)^2}{4\xi_{n,k}}}}\; x_{n,k}^{(1)} \tag{19}$$

where the two parameters $\xi_{n,k}$ and $\gamma_{n,k}$ are defined as

$$\xi_{n,k} = \frac{|d_{n,k}|^2}{|r_{n,k}|^2}, \qquad \gamma_{n,k} = \frac{|x_{n,k}^{(1)}|^2}{|r_{n,k}|^2} \tag{20}$$

with $r_{n,k} = x_{n,k}^{(1)} - d_{n,k}$ as the reverberant-only component. We exploit (19) to provide primary estimates of $d_{n_1,k}$ and $d_{n_2,k}$ and then use recursive smoothing of $d_{n_1,k}\, d_{n_2,k}^{*}$ to estimate the elements of the IFC matrix $\boldsymbol{\Phi}_{n,k}$. As explained in Parchami et al. (2016), due to the unavailability of $|d_{n,k}|^2$ and $|r_{n,k}|^2$, the two parameters defined in (20) are not known a priori and have to be substituted by their approximations. To this end, we use $|\hat{d}_{n-1,k}|^2$ given by (19) for $|d_{n,k}|^2$ and a short-term estimate of the spectral variance $\sigma_{r_{n,k}}^2$ for $|r_{n,k}|^2$. To determine the spectral variance $\sigma_{r_{n,k}}^2$, we re-
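As an illustration of the gain in (19)–(20) and of the recursive smoothing of $d_{n_1,k}\, d_{n_2,k}^{*}$, a minimal Python sketch is given below. It is only a sketch under stated assumptions: the function names, the lower bound `floor` used when the square-root argument is not positive, the smoothing constant `beta`, and the specific first-order recursion are hypothetical choices made here, not values or formulas taken from the paper.

```python
import math

def geometric_gain(xi, gamma, floor=1e-3):
    """Gain multiplying x^{(1)}_{n,k} in (19), for xi and gamma as in (20)."""
    num = 1.0 - (gamma - xi + 1.0) ** 2 / (4.0 * gamma)
    den = 1.0 - (gamma - xi - 1.0) ** 2 / (4.0 * xi)
    if num <= 0.0 or den <= 0.0:
        return floor  # assumption: fall back to a small gain when undefined
    return max(math.sqrt(num / den), floor)

def smooth_cross_variance(d_hat, lag, beta=0.8):
    """First-order recursive estimate of E{d_n d*_{n-lag}} for one bin.

    d_hat : sequence of (complex) primary estimates from (19).
    beta  : assumed smoothing constant in [0, 1).
    """
    rho, prev = [], 0j
    for n in range(len(d_hat)):
        # Instantaneous cross-product; zero until enough frames are available.
        inst = d_hat[n] * d_hat[n - lag].conjugate() if n >= lag else 0j
        prev = beta * prev + (1.0 - beta) * inst
        rho.append(prev)
    return rho
```

When the desired and reverberant components are orthogonal, so that $\gamma_{n,k} = \xi_{n,k} + 1$, the gain reduces to $\sqrt{\xi_{n,k}/(\xi_{n,k}+1)}$, which is a convenient sanity check of (19).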
Table 2
Performance comparison of different WPE-based dereverberation methods using the
recorded RIR of room 1 from REVERB Challenge with babble noise.
Table 3
Performance comparison of different WPE-based dereverberation methods using the
recorded RIR of room 2 from REVERB Challenge with white noise.
Table 4
Performance comparison of different WPE-based dereverberation methods using the
recorded RIR of room 3 from REVERB Challenge with pink noise.
D = 3 in our experiments. This is also consistent with the fact that the IFC is more strongly present in the lag values of around 5 or less, as inferred before from Fig. 3.

To evaluate the reverberation suppression performance of the proposed method, we compare it to the original WPE method (Nakatani et al., 2010), two recent developments of the same method based on the complex generalized Gaussian (CGG) family of distributions (Jukic et al., 2015) and the Laplacian distribution (Jukic and Doclo, 2014) for the desired speech, the WPE method using the estimation of speech spectral variance (Parchami et al., 2016), and finally, the proposed method by using the perfect knowledge of the desired speech component. The CGG-based method makes use of the same solution for the regression vector $\mathbf{G}_k$ as the original WPE method but with a different estimator of the speech spectral variance in its iterative procedure. The Laplacian-based method does not have a closed-form solution for the reverberation prediction weights, $\mathbf{G}_k$, and has to be implemented through numerical optimization, e.g. by using the CVX optimization toolbox (CVX Research, 2012). Next, the WPE method presented in Parchami et al. (2016), is actually a particular case of the presented method in this work by disregarding the IFC and estimating only the speech spectral variance at each frame independently. Finally, the proposed WPE method with IFC knowledge is obtained by exploiting only the early component of the speech (this can be obtained in the same manner as that for Fig. 1) as $\hat{d}_{n,k}$ in (23), and is considered as a reference for comparison. The comparative results obtained by using the recorded RIRs from REVERB Challenge with different noise types are presented in Tables 2–4 in terms of the aforementioned objective performance measures.

Fig. 6. Improvement in CD for different WPE-based dereverberation methods.

As observed, whereas the CGG and Laplacian-based methods achieve better scores w.r.t. the original WPE, the WPE with speech spectral variance estimation performs better than the former three methods, and finally, the proposed WPE method in this work achieves the best results as compared to the previous methods. It should be noted that the superior performance of the proposed WPE with knowledge of IFC shows the possibility of improving the proposed method through the availability of more accurate IFC matrix estimates. It is found that the relative performance of the considered methods in terms of the four investigated scores is consistent.

Next, to evaluate the performance of the considered dereverberation methods for different amounts of reverberation, the objective performance measures are obtained by using the synthesized RIRs with different $T_{60\mathrm{dB}}$. The results are presented in Figs. 5–8 for $T_{60\mathrm{dB}}$ in the range of 100–1000 ms. For better visualization, only the resulting improvements in the performance scores w.r.t. the unprocessed speech (denoted by $\Delta$PESQ and such) are illustrated. As seen in these figures, the proposed method in this work and the one in Parchami et al. (2016), which are both based on the estimation of the speech spectral variance by means of an LRSV estimator, perform significantly better than the previous versions of the WPE method, which estimate the speech spectral variance iteratively along with the reverberation prediction weights. Also, it is observed that the proposed method achieves the best scores in comparison with the others in almost the entire range of $T_{60\mathrm{dB}}$.
This advantage is more visible for the moderate values of $T_{60\mathrm{dB}}$. There is also a considerable gap between the results obtained by using the proposed approach with the suggested estimation of the IFC matrix and those by using the perfect knowledge of the early speech, which indicates an avenue for further research for the estimation of the IFC.

Fig. 7. Improvement in FW-SNR for different WPE-based dereverberation methods.

Fig. 8. Improvement in SRMR for different WPE-based dereverberation methods.

5. Conclusion

In this work, we proposed a novel WPE dereverberation method based on an approximate model for the correlation across desired speech frames, namely the IFC, in the STFT domain. It was shown that, given an estimate of the IFC matrix, the dereverberation problem of interest can be formulated as a convex quadratic optimization leading to a closed-form Wiener-like solution. Performance evaluations using both recorded and synthesized RIRs reveal that the proposed method considerably outperforms the previous variations of the WPE method.

It can be concluded that incorporating the statistical model-based estimation of the desired speech spectral variance (or correlation matrix in general) into the WPE dereverberation method can lead to a better reverberation suppression performance. Such an approach, unlike the original WPE method, results in a non-iterative estimator for the reverberation prediction weight vector, provided that proper estimates of the spectral auto- and cross-variance of the desired speech terms are available. According to

Acknowledgment

References

Attias, H., Platt, J.C., Acero, A., Deng, L., 2001. Speech denoising and dereverberation using probabilistic models. Adv. Neural Inf. Process. Syst. 13, 758–764.
Brookes, M., 2009. VoiceBOX: speech processing toolbox for MATLAB. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html, last accessed on May 2016.
Cohen, I., 2005. Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Trans. Speech Audio Process. 13 (5), 870–881.
CVX Research, I., 2012. CVX: matlab software for disciplined convex programming, version 2.0. Available at http://cvxr.com/cvx, last accessed on May 2016.
Erkelens, J., Heusdens, R., 2010. Correlation-based and model-based blind single-channel late-reverberation suppression in noisy time-varying acoustical environments. IEEE Trans. Audio Speech Language Process. 18 (7), 1746–1765.
Esch, T., 2012. Model-Based Speech Enhancement Exploiting Temporal and Spectral Dependencies. RWTH Aachen University Ph.D. thesis.
Falk, T.H., Zheng, C., Chan, W.Y., 2010. A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Trans. Audio Speech Language Process. 18 (7), 1766–1774.
Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., Dahlgren, N., Zue, V., 1993. TIMIT acoustic-phonetic continuous speech corpus LDC93S1. Philadelphia: Linguistic Data Consortium, last accessed on May 2016.
Habets, E.A.P., 2007. Single- and Multi-Microphone Speech Dereverberation using Spectral Enhancement. Technische Universiteit Eindhoven, Netherlands Ph.D. thesis.
Habets, E.A.P., Benesty, J., Chen, J., 2012. Multi-microphone noise reduction using interchannel and interframe correlations. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, pp. 305–308.
Habets, E.A.P., Gannot, S., Cohen, I., 2009. Late reverberant spectral variance estimation based on a statistical model. IEEE Signal Process. Lett. 16 (9), 770–773.
Hager, W.W., 1989. Updating the inverse of a matrix. SIAM Rev. 31 (2), 221–239.
Hu, Y., Loizou, P.C., 2008. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Language Process. 16 (1), 229–238.
Jukic, A., Doclo, S., 2014. Speech dereverberation using weighted prediction error with Laplacian model of the desired signal. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, pp. 5172–5176.
Jukic, A., van Waterschoot, T., Gerkmann, T., Doclo, S., 2015. Multi-channel linear prediction-based speech dereverberation with sparse priors. IEEE/ACM Trans. Audio Speech Language Process. 23 (9), 1509–1520.
Kinoshita, K., Delcroix, M., Nakatani, T., Miyoshi, M., 2009. Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction. IEEE Trans. Audio Speech Language Process. 17 (4), 534–545.
Kinoshita, K., Delcroix, M., Yoshioka, T., Nakatani, T., Sehr, A., Kellermann, W., Maas, R., 2013. The reverb challenge: a common evaluation framework for dereverberation and recognition of reverberant speech. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, pp. 1–4.
Lehmann, E.A. Image-source method: matlab code implementation. Available at http://www.eric-lehmann.com/, last accessed on May 2016.
Löllmann, H.W., Yilmaz, E., Jeub, M., Vary, P., 2010. An improved algorithm for blind reverberation time estimation. In: Proceedings of International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel Aviv, Israel, pp. 1–4.
Lu, Y., Loizou, P.C., 2008. A geometric approach to spectral subtraction. Speech Commun. 50 (6), 453–466.
Nakatani, T., Juang, B.H., Yoshioka, T., Kinoshita, K., Delcroix, M., Miyoshi, M., 2008a. Speech dereverberation based on maximum-likelihood estimation with time-varying gaussian source model. IEEE Trans. Audio Speech Language Process. 16 (8), 1512–1527.
Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., Juang, B.H., 2008b. Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, USA, pp. 85–88.
Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., Juang, B.H., 2010. Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans. Audio Speech Language Process. 18 (7), 1717–1731.
Naylor, P., Gaubitch, N. (Eds.), 2010. Speech Dereverberation. Springer-Verlag, London.
Parchami, M., Zhu, W.P., Champagne, B., 2016. Speech dereverberation using linear prediction with estimation of early speech spectral variance. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical recipes: the art of scientific computing (3rd ed.). New York: Cambridge University Press.
Recommendation P.56: Objective measurement of active speech level, 1993. ITU-T.
Recommendation P.862: Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, 2001. ITU-T.
Schmid, D., Malik, S., Enzner, G., 2012. An expectation-maximization algorithm for multichannel adaptive speech dereverberation in the frequency-domain. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, pp. 17–20.
SimData: dev and eval sets based on WSJCAM0, 2013. REVERB challenge. Available at http://reverb2014.dereverberation.com/download.html, last accessed on May 2016.
Togami, M., Kawaguchi, Y., 2013. Noise robust speech dereverberation with Kalman smoother. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, pp. 7447–7451.
Varga, A., Steeneken, H.J.M., 1993. Assessment for automatic speech recognition II: NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun. 12 (3), 247–251.
Vaseghi, S.V., 2006. Advanced Digital Signal Processing and Noise Reduction. John Wiley & Sons. Chapter 17.
Yoshioka, T., Nakatani, T., Miyoshi, M., 2009. Integrated speech enhancement method using noise suppression and dereverberation. IEEE Trans. Audio Speech Language Process. 17 (2), 231–246.
Yoshioka, T., Sehr, A., Delcroix, M., Kinoshita, K., Maas, R., Nakatani, T., Kellermann, W., 2012. Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition. IEEE Signal Process. Mag. 29 (6), 114–126.