Convolutive Blind Source Separation with Wiener Post-Filtering for Robust Speech Recognition

Leandro Di Persia, Diego Milone, and Masuzo Yanagida

Department of Knowledge Engineering, Doshisha University, Japan
[email protected], [email protected], [email protected]
1 Introduction
The objective of blind source separation (BSS) is, given a set of sound
field measurements obtained by microphones at specified locations,
to obtain a set of signals approximating the original sound sources that
produced the sound field. In the case of free-field propagation of sound (i.e. in
open spaces without enclosures), the sound wave originated at each source arrives
only once at each sensor, so the mixture can be considered
a linear additive mixture. On the contrary, when the mixture is produced inside
an enclosed environment, the sound waves are reflected by every solid surface in
the room, and so each microphone receives not only the direct sound wave but
also all the reflections, and moreover, the reflections of all orders, until their
energy dies away.
This work is supported by ANPCyT-UNER, under Project PICT N 11-12700, UNL-CAID 012-72, and CONICET.
AST2006. 2006.
\[ x_j(t) = \sum_{i=1}^{N} h_{ji}(t) * s_i(t) \]

where x_j is the j-th microphone signal, s_i is the i-th source, h_{ji} is the impulse
response of the room from source i to microphone j, and ∗ stands for convolution.
This equation can be written in compact form as

\[ \mathbf{x}(t) = \mathbf{H}(t) * \mathbf{s}(t) \]

where H is the matrix of impulse responses h_{ji}.
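As an illustration, the convolutive mixing model above can be simulated directly from a set of impulse responses; the function name and array shapes below are illustrative choices, not from the paper:

```python
import numpy as np

def convolutive_mix(sources, H):
    """Simulate the convolutive mixing model x_j = sum_i h_ji * s_i.

    sources: (n_src, T) array, one source signal per row.
    H:       (n_mic, n_src, L) array of room impulse responses h_ji.
    Returns the (n_mic, T + L - 1) array of microphone signals.
    """
    n_mic, n_src, L = H.shape
    T = sources.shape[1]
    x = np.zeros((n_mic, T + L - 1))
    for j in range(n_mic):
        for i in range(n_src):
            # each source reaches microphone j filtered by h_ji
            x[j] += np.convolve(H[j, i], sources[i])
    return x
```

With L = 1 (a single free-field path and no reflections) this reduces to the instantaneous linear mixture x = A s discussed above.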
Others have proposed the use of fixed-point algorithms derived from the FastICA
algorithm [9,10,11]. Some information-theoretic algorithms based on minimization
of mutual information [12], information maximization (InfoMax) [13] or the
Kullback-Leibler divergence [14], combined with the Natural Gradient [4], have
also been used successfully. More recently, algorithms combining several
techniques have been presented, as in [15], where FastICA is followed
by an InfoMax ICA with the Natural Gradient.
In this paper the separation is solved by a combination of two methods. Then
a Wiener time-frequency filter is estimated and applied in order to improve
the separation. In the following sections the algorithm is explained in
detail, followed by experiments to evaluate the quality of the separation,
the results and their analysis, and finally some conclusions.
L. Di Persia, D. H. Milone & Masuzo Yanagida; "Convolutive Blind Source Separation with Wiener post-filtering for robust Speech Recognition"
2 Separation Algorithms
For each frequency bin, two separation algorithms for complex-valued signals
are applied sequentially. In the first stage, the Joint Approximate Diagonalization
of Eigenmatrices (JADE) algorithm [16] is applied to obtain a first estimate
of the separation matrix W. This matrix is then refined by using it as the
initial condition for the FastICA algorithm [17].
After separation in each frequency bin there are two problems to solve. All
ICA algorithms estimate the sources only up to a scaling and permutation
indeterminacy, which means that for each frequency the resulting
sources will have a different scaling and a different ordering. Thus, before any
other step, one needs to order the sources and obtain a consistent scaling for
each frequency.
Following this, a Wiener time-frequency filter estimated from the separated
sources is applied. Finally, an inverse STFT is applied to yield the time-domain
sources. In the following subsections these aspects are discussed in detail.
The sum of contrasts

\[ \sum_{i=1}^{N} J_G(\mathbf{w}_i) = \sum_{i=1}^{N} E\left\{ G\!\left( \left| \mathbf{w}_i^H \mathbf{x} \right|^2 \right) \right\} \]

is maximized subject to

\[ E\left\{ \left( \mathbf{w}_i^H \mathbf{x} \right) \left( \mathbf{w}_j^H \mathbf{x} \right)^{*} \right\} = \delta_{ij} . \qquad (9) \]

In these equations, E{·} is the expectation operator, and G : R⁺ ∪ {0} → R is a
smooth even function. For this work we have used the function G(y) = log(α + y),
with α = 0.1.
To achieve this, the contrast function J_G(w_i) is maximized to obtain a
separation vector w_i; then a deflationary, Gram-Schmidt-like decorrelation is used
to eliminate the information of previously obtained sources, and this process is
iterated until all desired sources are extracted. It must be noted that the matrix W
will have w_i^H as its i-th row. An alternative to this sequential extraction is to
extract all sources at once, optimizing a matrix W and applying an orthonormalization
method to that matrix after each iteration. In this paper we have used the
deflationary approach.
This is a Newton-like fixed-point iteration with quadratic convergence and,
as such, it is very fast. Like all fixed-point methods, it depends on a good
estimate of the initial conditions, which we obtain from the output of the JADE
algorithm.
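The fixed-point update with deflation described above can be sketched as follows. This is a generic complex FastICA in the style of Bingham and Hyvärinen using the paper's nonlinearity G(y) = log(α + y), not the authors' exact implementation; it assumes the input has already been whitened, and the function name and defaults are illustrative:

```python
import numpy as np

def complex_fastica_deflation(x, n_sources, alpha=0.1, max_iter=200, tol=1e-6):
    """Deflationary complex FastICA with G(y) = log(alpha + y).

    x: (n_channels, n_samples) complex array, assumed pre-whitened.
    Returns W with w_i^H as its i-th row (rows orthonormal).
    """
    n_ch, _ = x.shape
    W = np.zeros((n_sources, n_ch), dtype=complex)
    rng = np.random.default_rng(0)
    for i in range(n_sources):
        w = rng.standard_normal(n_ch) + 1j * rng.standard_normal(n_ch)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w.conj() @ x                  # w^H x for every sample
            y2 = np.abs(y) ** 2
            g = 1.0 / (alpha + y2)            # g = G'
            dg = -1.0 / (alpha + y2) ** 2     # g'
            # Newton-like fixed-point update
            w_new = (x * (y.conj() * g)).mean(axis=1) - (g + y2 * dg).mean() * w
            # Gram-Schmidt deflation: remove already-extracted directions
            for j in range(i):
                w_new -= (W[j].conj() @ w_new) * W[j]
            w_new /= np.linalg.norm(w_new)
            converged = np.abs(np.abs(w_new.conj() @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W
```

The deflation step inside the loop is what makes the extraction sequential: each new vector is decorrelated from those already found before being renormalized.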
2.3 Indeterminacies
\[
\begin{aligned}
\mathbf{x} &= W^{-1} \mathbf{y} = W^{-1} W \mathbf{x} = W^{-1} I W \mathbf{x} \\
           &= W^{-1} \left( E_1 + \cdots + E_M \right) W \mathbf{x} \\
           &= W^{-1} E_1 \mathbf{y} + \cdots + W^{-1} E_M \mathbf{y} \\
           &= \mathbf{v}_1 + \cdots + \mathbf{v}_M . \qquad (10)
\end{aligned}
\]
where Ei is a matrix with a one in the i-th diagonal element and zeros elsewhere.
It is easy to prove that the representation of vi is independent of the scaling in
matrix W.
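A minimal sketch of the decomposition of Eq. (10); the function name and array shapes are assumptions for illustration:

```python
import numpy as np

def source_images(W, x):
    """Split the observations x (channels x frames, one frequency bin)
    into per-source images v_i = W^{-1} E_i W x (Eq. (10)).

    The v_i sum back to x and are invariant to any rescaling of the
    rows of W, which is what removes the scaling indeterminacy.
    """
    y = W @ x                       # separated sources, one per row
    Winv = np.linalg.inv(W)
    # W^{-1} E_i y keeps only the i-th separated source, mapped back
    # to the sensor space through the inverse separation matrix.
    return [Winv[:, [i]] @ y[[i], :] for i in range(W.shape[0])]
```

Rescaling W by any diagonal matrix D cancels out (W^{-1} D^{-1} E_i D W = W^{-1} E_i W), which is easy to verify numerically.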
Now for the permutation problem, the approach makes use of the fact that
the envelopes of different sound signals must be different and also that, if the
signals are independent, the correlation between the envelopes of the separated
sources must vanish. This holds within one frequency bin; however, one can expect
successive frequency bins to share the same or similar envelopes. This
is the information used to solve the permutation problem: starting from some
frequency band, an estimate of the envelope based on previously classified bands
is calculated. Then, for each separated signal in a new frequency bin, the correlation
between its envelope and the one estimated from the pre-classified bins is calculated,
and the signal is assigned to the class of maximum correlation value [7].
In the original paper, pre-classified envelopes are estimated as an average
of all the previously classified envelopes in that class. In this paper, instead of
using this approach, we assume that in the averaging process the last classified
envelopes must have more weight, since they will be more similar to the envelopes
that follow for classification. Therefore, instead of a simple average of envelopes,
we update that value as a recursive weighted average, where Ē(·)_j refers to the
locally averaged envelope for source class j, and E(·)_j to the last classified
envelope for this class.
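A sketch of the envelope-correlation classification and the weighted update just described. The particular weighting (a recursive forgetting factor `lam`) and the function names are assumptions for illustration:

```python
import numpy as np

def correlation(a, b):
    """Normalized correlation between two envelope vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def classify_bin(bin_envelopes, class_envelopes):
    """For each separated signal of the current frequency bin, return the
    index of the class whose averaged envelope correlates best with it."""
    return [int(np.argmax([correlation(e, c) for c in class_envelopes]))
            for e in bin_envelopes]

def update_class_envelope(avg_env, new_env, lam=0.5):
    """Recursive weighted average of class envelopes: larger lam weights
    the last classified envelope more heavily, as proposed in the text."""
    return (1.0 - lam) * avg_env + lam * new_env
```

After classifying a bin, `update_class_envelope` refreshes each class average so that recently classified bins dominate, matching the intuition that adjacent frequency bins have the most similar envelopes.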
After this process we obtain time-frequency representations for each of the
source components at each sensor. That is, for each source we obtain N time-
frequency representations, each one corresponding to the effect of that source on
one of the sensors, isolated from the effects of the other sources. Since usually only
one representation per source is needed, in this study we keep the one with the
largest energy for the following steps.
The separation result will not be perfect, mainly due to reverberation and
the simplified time-invariant modeling. When the reverberation time increases, the
performance of the algorithms tends to decrease. Therefore we propose the use of
a non-causal time-frequency Wiener filter as post-processing [18]. Without loss of
generality, this will be explained for the case of two sources and two microphones;
the generalization to more sources is straightforward.
The short-time Wiener filter H_W for a signal generated by the simple additive
noise model is

\[ H_W(\omega, \tau) = \frac{ \left| \tilde{z}(\omega,\tau) \right|^2 - \left| \tilde{n}(\omega,\tau) \right|^2 }{ \left| \tilde{z}(\omega,\tau) \right|^2 } \qquad (12) \]

where ñ represents the estimated additive noise. In our case we obtain two
signals, ṽ_1 and ṽ_2, and if the separation process was successful one can use them
as estimates of the clean sources.
So, in order to eliminate the residual information from source v_2 in source v_1, we
can use the short-time power spectrum of ṽ_1 as the numerator (the estimate of the
clean source) and the sum of the short-time power spectra of ṽ_1 and ṽ_2 as the
estimate of the noisy power spectrum in the denominator. Moreover, since we know
that both signals will carry some information from each other, and that this sharing
will not be uniform over the whole time-frequency plane, one can use time-frequency
weights to reduce the effect of the filter, as expressed in the following equation:
\[ H_{W,1}(\omega, \tau) = \frac{ \left| \tilde{v}_1(\omega,\tau) \right|^2 }{ \left| \tilde{v}_1(\omega,\tau) \right|^2 + C(\omega,\tau) \left| \tilde{v}_2(\omega,\tau) \right|^2 } \qquad (13) \]
If the time-frequency contents of ṽ_1 and ṽ_2 are very similar (meaning that for
that time-frequency coordinate the separation was not well done), the weights must
be close to zero; otherwise they must be close to one. There are several ways to set
these weights; one simple way is to use dot products to measure the time and
frequency similarity of the power spectra.
The short-time Wiener filter to improve source v2 , HW,2 (ω, τ ) is calculated
in a similar way to (13), with the roles of v1 and v2 interchanged.
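A sketch of applying the weighted mask of Eq. (13) to an STFT. The default choice of C and the regularizing `eps` are assumptions for illustration:

```python
import numpy as np

def wiener_postfilter(V1, V2, C=None, eps=1e-12):
    """Apply the weighted time-frequency Wiener mask of Eq. (13) to the
    STFT V1 of a separated source, with V2 as the estimate of the
    residual interference.

    C is the time-frequency weight map: C = 1 applies the full filter,
    C = 0 bypasses it where the separation is deemed unreliable.
    """
    if C is None:
        C = np.ones(V1.shape)
    P1 = np.abs(V1) ** 2
    P2 = np.abs(V2) ** 2
    mask = P1 / (P1 + C * P2 + eps)   # H_{W,1}(omega, tau)
    return mask * V1
```

The filter for the second source is obtained by calling the same function with the roles of V1 and V2 interchanged.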
To test the capabilities of this algorithm, a set of experiments was carried out.
Sentences for the experiments were extracted from the Albayzin Spanish speech
corpus [19]. From this large database, a subset of 605 sentences was selected
and divided into a training set of 585 sentences and a test set of 20 sentences. The
training set was used to train a recognizer with clean data. The test set has 5
sentences spoken by 4 speakers: sentences 3001, 3006, 3010, 3018 and
3022, from speakers aagp and algp (female) and mogp and nyge (male). These
20 sentences were recorded in a room according to Fig. 1.
[Fig. 1. Layout of the recording room, showing the positions of the source and noise loudspeakers and of microphones 1 and 2.]

Fig. 2. Mixed signals. Top: microphone 1; bottom: microphone 2. Signal-to-noise power ratio: 0 dB.
Two loudspeakers were used, one to reproduce the desired speech source and the
other to reproduce some kind of noise. The resulting sound field was recorded
with two Ono Sokki MI 1233 omnidirectional measurement microphones, with
flat frequency response from 20 Hz to 20 kHz, and with Ono Sokki MI 3110
preamplifiers.

Interfering sources were of two kinds: speech and white noise. For speech
noise, sentence 3110 and speakers aagp and nyge were selected. To contaminate
female utterances male speech (nyge) was used, and vice versa. White noise was
taken from the Noisex database [20]. For both noise kinds, two different signal-to-noise
output power ratios were used, 0 dB and 6 dB. A 0 dB output power ratio
means that both signals were replayed at the loudspeakers with equal power⁴.
It can be seen how the main structures of the original signal are present, in an
enhanced way, in the separated signal.
To test the performance of the algorithm, we have used a speech recognition
system to estimate the improvement in word recognition rate before and after
separation. For this test we used a continuous speech recognizer based
on tied Gaussian-mixture hidden Markov models (HMMs). This recognizer was
first trained with 585 sentences from the Minigeo subset of the Albayzin database
(the training set does not include any of the phrases used in the test).

After training, we tested the recognition system with the clean sentences
of our test set. To see how the mixing process degrades the recognition output, we
also evaluated recognition accuracy on the mixtures. We then applied
the separation algorithm without the time-frequency Wiener filter (this test is
denoted J+F because it includes only JADE and FastICA separation) and measured
the recognition accuracy. Finally, the algorithm for separation including the Wiener
filter was applied (this test is named J+F+W) and the recognition accuracy test
was performed on that data. Table 1 shows the results of these tests. In
the table, for each test the word recognition accuracy percentage (WACC %) is
⁴ In a similar way to the standard SNR.
[Fig. 3. Separated signals. Top: separated source 1; bottom: separated source 2.]
Fig. 4. Source signals. Top: aagp3002, desired source, female; bottom: nyge3110, noise,
male.
Fig. 5. Spectrograms of a) mixed signal; b) separated signal and c) source signal, for
a mixture of speech with speech interference emitted with equal power (power ratio of
0 dB).
calculated as

\[ \mathrm{WACC}\,\% = 100 \, \frac{N - D - S - I}{N} \qquad (14) \]
where N is the number of words in the reference transcription, D is the number of
deletion errors (words present in the reference transcription that are not present
in the system transcription), S is the number of substitution errors (words that
were substituted by others in the system transcription) and I is the number
of insertion errors (extra words that were in the system transcription but not
in the reference transcription). This measure is more representative of
recognizer performance than the standard word recognition rate [21]. As can be seen
from these results, the word accuracy improvements are on the order of 70%.
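Eq. (14) in code form, as a quick reference for how the scores in Table 1 are computed:

```python
def word_accuracy(N, D, S, I):
    """Word accuracy of Eq. (14).

    Unlike the plain word recognition rate, it also penalizes insertion
    errors, so it can even go negative when the recognizer inserts many
    spurious words.
    """
    return 100.0 * (N - D - S - I) / N
```

For example, 100 reference words with 5 deletions, 10 substitutions and 5 insertions give a word accuracy of 80%.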
Table 1. Word accuracy in robust speech recognition. For reference, with clean sources
WACC % = 91.50%. PR: power ratio in dB.

In this paper an algorithm for blind source separation of convolved sources has
been presented. The Wiener post-filtering used to improve the output of the
separation stage reduces the residual interference in the areas where the desired
source has low power. As shown by the examples in Figs. 2, 3 and 4, the quality
can be enhanced to a great extent even in a very bad mixture with equal noise power.
Also, the robust speech recognition rates show a very important improvement
in word accuracy, from almost zero percent for the mixtures to about 70% after
separation with the proposed algorithm. It must be noted that the use of the Wiener
filter has a large effect on the word accuracy improvement.
There are some issues that must be addressed in future work. First, we
need to explore the capabilities of this algorithm for shorter data. The tests
presented here were performed on data with an average duration of 2 seconds.
Some applications, such as remote control of home devices by voice commands
or real-time processing for hearing aids, require shorter data to be processed.
The algorithms used for separation in each frequency bin need a large amount
of data to estimate the statistical properties of the signals accurately, so we need
to check whether they will still work when less data is available.
Finally, some fine tuning of the algorithm parameters, such as the window type
and length used in the short-time Fourier transform or the parameters of the
Wiener filter, will be explored.
References
1. Kahrs, M., Brandenburg, K., eds.: Applications of Digital Signal Processing to Audio
and Acoustics. The Kluwer International Series in Engineering and Computer
Science. Kluwer Academic Publishers (2002)
2. Kingsbury, B., Morgan, N.: Recognizing reverberant speech with RASTA-PLP. In:
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing. (1997) 1259–1262
3. Benesty, J., Makino, S., Chen, J., eds.: Speech Enhancement. Signals and
Communication Technology. Springer (2005)
4. Cichocki, A., Amari, S.i.: Adaptive Blind Signal and Image Processing. Learning
Algorithms and applications. John Wiley & Sons (2002)
5. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John
Wiley & Sons, Inc. (2001)
6. Parra, L., Spence, C.: Convolutive blind separation of non-stationary sources.
IEEE Transactions on Speech and Audio Processing 8(3) (2000) 320–327
7. Murata, N., Ikeda, S., Ziehe, A.: An approach to blind source separation based on
temporal structure of speech signals. Neurocomputing 41(1-4) (2001) 1–24
8. Araki, S., Makino, S., Hinamoto, Y., Mukai, R., Nishikawa, T., Saruwatari, H.:
Equivalence between Frequency-Domain blind source separation and Frequency-
Domain adaptive beamforming for convolutive mixtures. EURASIP Journal on
Applied Signal Processing 2003(11) (2003) 1157–1166
9. Mitianoudis, N., Davies, M.: New fixed-point ICA algorithms for convolved mixtures.
In: Proceedings of the Third International Conference on Independent Component
Analysis and Source Separation. (2001) 633–638
10. Prasad, R., Saruwatari, H., Lee, A., Shikano, K.: A fixed-point ICA algorithm for
convoluted speech signal separation. In: Proceedings of the Fourth International
Symposium on Independent Component Analysis and Blind Signal Separation.
(2003) 579–584
11. Gotanda, H., Nobu, K., Koya, T., Kaneda, K., Ishibashi, T., Haratani, N.: Permutation
correction and speech extraction based on split spectrum through FastICA. In:
Proceedings of the Fourth International Symposium on Independent Component
Analysis and Blind Signal Separation. (2003) 379–384
12. Douglas, S.C., Sun, X.: Convolutive blind separation of speech mixtures using the
natural gradient. Speech Communication 39(1-2) (2003) 65–78
13. Sawada, H., Mukai, R., Araki, S., Makino, S.: A robust and precise method for
solving the permutation problem of frequency-domain blind source separation. IEEE
Transactions on Speech and Audio Processing 12(5) (2004) 530–538
14. Araki, S., Mukai, R., Makino, S., Nishikawa, T., Saruwatari, H.: The fundamental
limitation of frequency domain blind source separation for convolutive mixtures of
speech. IEEE Transactions on Speech and Audio Processing 11(2) (2003) 109–116
15. Makino, S., Sawada, H., Mukai, R., Araki, S.: Blind Source Separation of
Convolutive Mixtures of Speech in Frequency Domain. IEICE Trans. Fundamentals
E88-A(7) (2005) 1640–1655
16. Cardoso, J.F., Souloumiac, A.: Blind beamforming for non-Gaussian signals. IEE