BLIND SOURCE SEPARATION BASED ON FAST-CONVERGENCE ALGORITHM USING
ICA AND BEAMFORMING FOR REAL CONVOLUTIVE MIXTURE
Hiroshi SARUWATARI, Toshiya KAWAMURA,
Katsuyuki SAWAI, Atsunobu KAMINUMA†, and Masao SAKATA†
Graduate School of Information Science, Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN
†Nissan Research Center, NISSAN MOTOR CO., LTD.
1 Natsushima-cho, Yokosuka-shi, Kanagawa 237-8523, JAPAN
ABSTRACT
We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (1) frequency-domain ICA with direction-of-arrival (DOA) estimation, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The inverse of the mixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.
1. INTRODUCTION
Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. In recent works on BSS based on independent component analysis (ICA) [1], several methods, in which the inverse of the complex mixing matrix is calculated in the frequency domain, have been proposed to deal with the arrival lags among the elements of the microphone array system [2, 3, 4]. However, this ICA-based approach has the disadvantage of low convergence of the nonlinear optimization [5].

In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method. Also, the experiment in a real car environment shows that the separation performance of the proposed method is remarkably superior to that of the conventional delay-and-sum (DS) array.

0-7803-7402-9/02/$17.00 ©2002 IEEE

Fig. 1. Configuration of a microphone array and signals.
2. DATA MODEL AND CONVENTIONAL BSS METHOD

In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.

In the frequency domain, the observed signals in which multiple source signals are mixed are given by $X(f) = A(f) S(f)$, where $X(f) = [X_1(f), \ldots, X_K(f)]^{T}$ is the observed signal vector, and $S(f) = [S_1(f), \ldots, S_L(f)]^{T}$ is the source signal vector. $A(f)$ is the mixing matrix, which is assumed to be complex-valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.
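As a small illustration of why $A(f)$ is complex-valued, the sketch below builds a mixing matrix for the reverberation-free case only, in which each element is a pure phase term $\exp[j 2\pi f d_k \sin\theta_l / c]$. The function names, the sound velocity of 340 m/s, the example frequency and the source values are assumptions made for illustration; room reverberation, which the model in the text also covers, is deliberately left out.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s; assumed value, not given in the text

def anechoic_mixing_matrix(f, mic_pos, doas_deg, c=SOUND_SPEED):
    """Build a K x L mixing matrix A(f) that models arrival lags only.

    Entry (k, l) is exp[j 2 pi f d_k sin(theta_l) / c]; reverberation,
    which the paper's A(f) also covers, is deliberately left out.
    """
    return [
        [cmath.exp(2j * math.pi * f * d_k * math.sin(math.radians(th)) / c)
         for th in doas_deg]
        for d_k in mic_pos
    ]

def mix(A, S):
    """Observed vector X(f) = A(f) S(f) for a single frequency."""
    return [sum(a * s for a, s in zip(row, S)) for row in A]

# Hypothetical example: K = L = 2, microphones 4 cm apart, f = 1 kHz.
A = anechoic_mixing_matrix(1000.0, [-0.02, 0.02], [-30.0, 40.0])
X = mix(A, [1.0 + 0.0j, 0.5j])
```

Every entry of this simplified matrix has unit magnitude, so the sources differ between microphones only in phase; this is the arrival-lag information that a real-valued mixing matrix could not represent.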
In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as $X(f,t) = [X_1(f,t), \ldots, X_K(f,t)]^{T}$. Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^{T} = W(f) X(f,t)$ become mutually independent. We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f,t)$, we reconstruct the resultant source signals in the time domain.
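The per-bin separation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy numbers are assumptions, the DFT analysis and overlap-add synthesis are omitted, and the exact inverse of a known 2 x 2 mixing matrix stands in for a converged $W(f)$.

```python
def inverse_2x2(A):
    """Closed-form inverse of a 2 x 2 complex matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def separate_bin(W, X_frames):
    """Apply Y(f, t) = W(f) X(f, t) to every frame t of one frequency bin."""
    return [[sum(w * x for w, x in zip(row, x_t)) for row in W]
            for x_t in X_frames]

def separate_all_bins(W_bins, X_bins):
    """Run the per-bin separation independently over all frequency bins."""
    return [separate_bin(W, X) for W, X in zip(W_bins, X_bins)]

# Toy check for one bin: mix two source series with a known A(f), then
# recover them with the exact inverse standing in for a converged W(f).
A = [[1.0 + 0.0j, 0.6 - 0.3j], [0.4 + 0.5j, 1.0 + 0.0j]]
S_frames = [[1 + 2j, -1j], [0.5j, 2 + 0j], [-1 + 1j, 1 - 1j]]
X_frames = [[sum(a * s for a, s in zip(row, s_t)) for row in A]
            for s_t in S_frames]
Y_frames = separate_all_bins([inverse_2x2(A)], [X_frames])[0]
```

Because each bin is processed on its own, the loop over bins is trivially parallel; the price is that the output order and scale may differ from bin to bin, which is why a later ordering and scaling step is needed.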
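In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [2]:

$$W_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^{H}(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^{H}(f,t) \rangle_t \right] W_i(f) + W_i(f), \qquad (1)$$

where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$ th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\Phi(\cdot)$ as

$$\Phi(Y(f,t)) \equiv \left[ \Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t)) \right]^{T}, \qquad (2)$$

$$\Phi(Y_l(f,t)) \equiv \left[ 1 + \exp(-Y_l^{(R)}(f,t)) \right]^{-1} + j \cdot \left[ 1 + \exp(-Y_l^{(I)}(f,t)) \right]^{-1}, \qquad (3)$$

where $Y_l^{(R)}(f,t)$ and $Y_l^{(I)}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively.

3. PROPOSED ALGORITHM

The conventional ICA method inherently has a significant disadvantage, namely the low convergence of the nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the temporal alternation of learning between ICA and beamforming; the inverse of the mixing matrix, $W(f)$, obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

Fig. 2. Proposed algorithm combining frequency-domain ICA and beamforming.

[Step 1: Initialization] Set the initial $W_i(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

[Step 2: 1-time ICA iteration] Optimize $W_i(f)$ using the following 1-time ICA iteration:

$$W^{(\mathrm{ICA})}_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^{H}(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^{H}(f,t) \rangle_t \right] W_i(f) + W_i(f), \qquad (4)$$

where the superscript "(ICA)" is used to express that the inverse of the mixing matrix is obtained by ICA.

[Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f,\theta)$, which is given by

$$F_l(f,\theta) = \sum_{k=1}^{K} W^{(\mathrm{ICA})}_{lk}(f) \exp\left[ j 2\pi f d_k \sin\theta / c \right], \qquad (5)$$

where $W^{(\mathrm{ICA})}_{lk}(f)$ is the element of $W^{(\mathrm{ICA})}_{i+1}(f)$ and $c$ is the velocity of sound. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$ th sound source, $\hat{\theta}_l$, can be estimated as $\hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m)$, where $N$ is the total number of DFT points, and $\theta_l(f_m)$ represents the DOA of the $l$ th sound source at the $m$ th frequency bin. These are given by

$$\theta_1(f_m) = \min\left[ \arg\min_{\theta} |F_1(f_m,\theta)|,\ \arg\min_{\theta} |F_2(f_m,\theta)| \right], \qquad (6)$$

$$\theta_2(f_m) = \max\left[ \arg\min_{\theta} |F_1(f_m,\theta)|,\ \arg\min_{\theta} |F_2(f_m,\theta)| \right], \qquad (7)$$

where $\min[x, y]$ ($\max[x, y]$) is defined as a function in order to obtain the smaller (larger) value among $x$ and $y$.

[Step 4: Beamforming] Construct an alternative matrix for signal separation, $W^{(\mathrm{BF})}(f)$, based on the null-beamforming technique where the DOA results obtained in the previous step are used. In the case that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$, the elements of the matrix for signal separation are given as

$$W^{(\mathrm{BF})}_{11}(f_m) = -\exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_2 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2) / c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2) / c \right] \right\}^{-1}, \qquad (8)$$

$$W^{(\mathrm{BF})}_{12}(f_m) = \exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_2 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2) / c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2) / c \right] \right\}^{-1}. \qquad (9)$$

Also, in the case that the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$, the elements of the matrix are given as

$$W^{(\mathrm{BF})}_{21}(f_m) = \exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_1 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1) / c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1) / c \right] \right\}^{-1}, \qquad (10)$$

$$W^{(\mathrm{BF})}_{22}(f_m) = -\exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_1 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1) / c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1) / c \right] \right\}^{-1}. \qquad (11)$$

[Step 5: Diversity with cost function] Select the most suitable unmixing matrix in each frequency bin and each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function used to achieve the diversity, we calculate two kinds of cosine distances between the separated signals which are obtained by ICA and beamforming. These are given by

$$J^{(\mathrm{ICA})}(f) = \frac{\left| \langle Y_1^{(\mathrm{ICA})}(f,t)\, Y_2^{(\mathrm{ICA})}(f,t)^{*} \rangle_t \right|}{\sqrt{\langle |Y_1^{(\mathrm{ICA})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{ICA})}(f,t)|^2 \rangle_t}}, \qquad (12)$$

$$J^{(\mathrm{BF})}(f) = \frac{\left| \langle Y_1^{(\mathrm{BF})}(f,t)\, Y_2^{(\mathrm{BF})}(f,t)^{*} \rangle_t \right|}{\sqrt{\langle |Y_1^{(\mathrm{BF})}(f,t)|^2 \rangle_t \, \langle |Y_2^{(\mathrm{BF})}(f,t)|^2 \rangle_t}}, \qquad (13)$$

where $Y_l^{(\mathrm{ICA})}(f,t)$ is the separated signal by ICA, and $Y_l^{(\mathrm{BF})}(f,t)$ is the separated signal by beamforming. If the separation performance of beamforming is superior to that of ICA, we obtain the condition $J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f)$; otherwise $J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f)$. Thus, an observation of the conditions yields the following algorithm:

$$W(f) = \begin{cases} W^{(\mathrm{ICA})}_{i+1}(f), & J^{(\mathrm{ICA})}(f) \le J^{(\mathrm{BF})}(f), \\ W^{(\mathrm{BF})}(f), & J^{(\mathrm{ICA})}(f) > J^{(\mathrm{BF})}(f). \end{cases} \qquad (14)$$

If the $(i+1)$ th iteration was the final iteration, go to step 6; otherwise go back to step 2 and repeat the ICA iteration, inserting the $W(f)$ given by Eq. (14) into $W_i(f)$ in Eq. (4) with an increment of $i$.

[Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we detect and correct the source permutation and the gain inconsistency [6].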
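The core of Steps 2, 4 and 5 for a single frequency bin can be sketched as follows. This is only an illustrative sketch for K = L = 2, not the authors' implementation: the function names, the sound velocity of 340 m/s, the toy complex-exponential sources and the anechoic mixture are all assumptions, and the DOAs are taken as already known rather than estimated from the directional nulls as in Step 3.

```python
import cmath
import math

C = 340.0  # assumed sound velocity in m/s; the text does not give a value

def phi(y):
    """Eq. (3): sigmoid of the real and imaginary parts."""
    return 1 / (1 + math.exp(-y.real)) + 1j / (1 + math.exp(-y.imag))

def apply(W, X):
    """Y(f, t) = W(f) X(f, t) for every frame of one frequency bin."""
    return [[sum(w * x for w, x in zip(row, x_t)) for row in W] for x_t in X]

def ica_step(W, X, eta):
    """Step 2, Eq. (4): one ICA iteration for K = L = 2."""
    Y = apply(W, X)
    # R[a][b] is the time average of phi(Y_a) * conj(Y_b).
    R = [[sum(phi(y[a]) * y[b].conjugate() for y in Y) / len(Y)
          for b in range(2)] for a in range(2)]
    # diag(R) - R leaves zeros on the diagonal and -R off the diagonal.
    G = [[0j if a == b else -R[a][b] for b in range(2)] for a in range(2)]
    return [[W[a][b] + eta * sum(G[a][k] * W[k][b] for k in range(2))
             for b in range(2)] for a in range(2)]

def directivity(W, l, f, d, theta, c=C):
    """Eq. (5): response of output l toward direction theta (radians)."""
    return sum(W[l][k] * cmath.exp(2j * math.pi * f * d[k] * math.sin(theta) / c)
               for k in range(2))

def null_beamformer(f, d, th1, th2, c=C):
    """Step 4: row 0 looks at th1 and nulls th2; row 1 does the reverse."""
    def e(d_k, s):
        return cmath.exp(2j * math.pi * f * d_k * s / c)
    s1, s2 = math.sin(th1), math.sin(th2)
    den1 = -e(d[0], s1 - s2) + e(d[1], s1 - s2)
    den2 = e(d[0], s2 - s1) - e(d[1], s2 - s1)
    return [[-e(d[0], -s2) / den1, e(d[1], -s2) / den1],
            [e(d[0], -s1) / den2, -e(d[1], -s1) / den2]]

def cosine_cost(Y):
    """Step 5: cosine distance between the two separated outputs."""
    num = abs(sum(y[0] * y[1].conjugate() for y in Y))
    den = math.sqrt(sum(abs(y[0]) ** 2 for y in Y)
                    * sum(abs(y[1]) ** 2 for y in Y))
    return num / den

def select_matrix(W_ica, W_bf, X):
    """Eq. (14): keep ICA unless beamforming gives the lower cost."""
    j_ica, j_bf = cosine_cost(apply(W_ica, X)), cosine_cost(apply(W_bf, X))
    return W_ica if j_ica <= j_bf else W_bf

# Hypothetical one-bin example: anechoic mixture, DOAs assumed already known.
f, d = 1000.0, [-0.02, 0.02]
th1, th2 = math.radians(-30.0), math.radians(40.0)
A = [[cmath.exp(2j * math.pi * f * d_k * math.sin(th) / C) for th in (th1, th2)]
     for d_k in d]
S = [[cmath.exp(0.7j * t), cmath.exp(1.9j * t)] for t in range(200)]
X = apply(A, S)
W_ica = ica_step([[1 + 0j, 0j], [0j, 1 + 0j]], X, eta=0.01)
W_bf = null_beamformer(f, d, th1, th2)
W_next = select_matrix(W_ica, W_bf, X)
```

In this toy bin the beamformer built from the exact DOAs inverts the anechoic mixture, so its cosine distance is lower than that of a single ICA step started from the identity matrix, and `select_matrix` returns the beamforming matrix. Under reverberation the comparison can go either way, which is the situation the diversity step is meant to handle.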
4. EXPERIMENTS IN REVERBERANT ROOM

4.1. Conditions for experiments

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, -30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 3. The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 128 msec, the frame shift is 2 msec, and the step-size parameter $\eta$ is set to be $1.0 \times 10^{-5}$.
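For orientation, the analysis conditions translate into sample counts as in the short sketch below. The variable names are illustrative, and the sketch assumes that the DFT size equals the frame length and that no padding is applied at the signal edges; neither is stated in the text.

```python
FS_HZ = 8000     # sampling frequency (Sect. 4.1)
FRAME_MS = 128   # frame length
SHIFT_MS = 2     # frame shift

frame_len = FS_HZ * FRAME_MS // 1000     # samples per analysis frame
frame_shift = FS_HZ * SHIFT_MS // 1000   # samples between frame starts
bin_spacing_hz = FS_HZ / frame_len       # assumes DFT size == frame length

def num_frames(num_samples):
    """Count the full frames that fit in a signal (no padding assumed)."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // frame_shift
```

Under these assumptions a frame holds 1024 samples and consecutive frames start 16 samples apart, so neighbouring frames overlap heavily and each frequency bin yields a long time series for the time averages used in the ICA update.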
4.2. Objective evaluation of separated signals

In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus the input SNR in dB, is shown in Fig. 4. These values are averages over all of the combinations with respect to speakers and source directions. As for the proposed algorithm, we also plot the NRR rescaled by the computational cost (see dotted lines) because the proposed algorithm has a computational complexity of about 1.9 times that of the conventional ICA.
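The NRR definition can be written out as a short sketch. It assumes that the target and interference components are available separately at both the input and the output, as is possible in a simulation with known impulse responses; the function names and the toy signals are illustrative and not taken from the paper.

```python
import math

def mean_power(x):
    """Average power of a real or complex sequence."""
    return sum(abs(v) ** 2 for v in x) / len(x)

def snr_db(target, interference):
    """Signal-to-noise ratio in dB, treating the interferer as noise."""
    return 10.0 * math.log10(mean_power(target) / mean_power(interference))

def noise_reduction_rate(in_target, in_interf, out_target, out_interf):
    """NRR = output SNR in dB minus input SNR in dB."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)

# Toy example: equal-power input, interference amplitude cut to one tenth.
target = [1.0, -1.0, 1.0, -1.0]
interf = [1.0, 1.0, -1.0, -1.0]
attenuated = [0.1 * v for v in interf]
nrr = noise_reduction_rate(target, interf, target, attenuated)
```

In this toy case the input SNR is 0 dB and the interference power drops by a factor of 100 at the output, so the NRR comes out at about 20 dB.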
Fig. 3. Layout of reverberant room used in experiments.
In Fig. 4, it is evident that the separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method at every iteration point, even considering the additional computational cost of the proposed algorithm. For example, compared with the conventional method, the proposed method improves the NRR by about 4.6 dB at the 50-iteration point of the conventional ICA when the RT is 150 msec. Also, when the RT is 300 msec, the proposed method improves the NRR by about 1.5 dB.

Figure 5 shows a result of the alternation between ICA and null beamforming through iterative optimization by the proposed algorithm when the RT is 300 msec. In this figure, the symbol "-" represents that null beamforming is used at that iteration point and frequency bin. As shown in Fig. 5, the proposed algorithm works automatically as follows: (1) null beamforming is used for the acceleration of learning at early times in the iterations because $W^{(\mathrm{BF})}(f)$ is a rough approximation of the inverse of the mixing matrix $A(f)$, (2) ICA is used after the early part of the iterations because ICA can update the inverse of the mixing matrix more accurately, and (3) the inverse of the mixing matrix obtained by ICA is substituted by the matrix based on null beamforming through all iteration points at particular frequency bins where the independence between the sources is low. From these results, although null beamforming is not suitable for signal separation under the condition that the direct sounds and their reflections exist, we can confirm that the temporal utilization of null beamforming for algorithm diversity through ICA iterations is effective for improving the separation performance and convergence.
5. EXPERIMENTS IN CAR ENVIRONMENT

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, -50° for the driver and 50° for the speaker in the front passenger seat. The impulse responses are recorded in a real car environment as shown in Fig. 6, where we use three array positions. The analytical conditions in this experiment are the same as those of the previous section, except for the sampling frequency (which is 16 kHz).

Figure 7 shows NRR results of the proposed method, where we also plot the results of the conventional 16-element delay-and-sum (DS) array for comparison (a priori information on DOAs was given to the DS array). From this figure, it is evident that the separation performance of the proposed method is remarkably superior to that of the conventional DS array at every array position. This indicates that BSS is effective for speech enhancement in the car environment.

Fig. 4. Noise reduction rates for different iterations in ICA. Reverberation time is 150 msec (top) and 300 msec (bottom).
Fig. 5. The result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol "." represents that the null beamforming is used in the iteration point and frequency bin. The RT is 300 msec.

Fig. 6. Layout of array in car cabin used in experiment.

Fig. 7. Noise reduction rates for different array position.

6. CONCLUSION

In this paper, we described a fast- and high-convergence algorithm for BSS where null beamforming is used for temporal algorithm diversity through ICA iterations. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions. Also, the experiment in a real car environment shows that the separation performance of the proposed method is remarkably superior to that of the conventional DS array.

7. ACKNOWLEDGEMENT

The authors are grateful to Dr. Shoji Makino and Mr. Ryo Mukai of NTT CO., LTD., and Mr. Masaru Yamazaki of NISSAN MOTOR CO., LTD. for their discussions on this work. This work was partly supported by NISSAN MOTOR CO., LTD. and CREST (Core Research for Evolutional Science and Technology) in Japan.

8. REFERENCES

[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol.36, pp.287-314, 1994.
[2] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," Proceedings of 1998 International Symposium on Nonlinear Theory and Its Application (NOLTA '98), vol.3, pp.923-926, Sep. 1998.
[3] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol.22, pp.21-34, 1998.
[4] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech & Audio Process., vol.8, pp.320-327, 2000.
[5] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, and K. Shikano, "Blind source separation based on subband ICA and beamforming," Proc. ICSLP2000, vol.3, pp.94-97, Oct. 2000.
[6] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," Proc. ICASSP2000, vol.5, pp.3140-3143, June 2000.