
Separation Based On Fast-Convergence Algorithm Using ICA Beamforming For Real Convolutive

The document proposes a new algorithm for blind source separation that combines independent component analysis (ICA) and beamforming. The key steps are: 1) Perform frequency-domain ICA with direction-of-arrival estimation to obtain an initial unmixing matrix. 2) Estimate sound source directions of arrival from the ICA results and perform null beamforming to obtain an alternative unmixing matrix. 3) Iterate between using the ICA and beamforming matrices, selecting the matrix providing better separation at each step, to realize fast convergence compared to ICA alone. Experiments show the proposed method achieves superior separation performance even under reverberant conditions.

Uploaded by

Fernando Pereira

BLIND SOURCE SEPARATION BASED ON FAST-CONVERGENCE ALGORITHM USING ICA AND BEAMFORMING FOR REAL CONVOLUTIVE MIXTURE


Hiroshi SARUWATARI, Toshiya KAWAMURA, Katsuyuki SAWAI, Atsunobu KAMINUMA†, and Masao SAKATA†

Graduate School of Information Science, Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-shi, Nara, 630-0101, JAPAN
†Nissan Research Center, NISSAN MOTOR CO., LTD.
1 Natsushima-cho, Yokosuka-shi, Kanagawa 237-8523, JAPAN


ABSTRACT

We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (1) frequency-domain ICA with direction-of-arrival (DOA) estimation, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The inverse of the mixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

1. INTRODUCTION

Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. In recent works on BSS based on independent component analysis (ICA) [1], several methods, in which the inverse of the complex mixing matrices is calculated in the frequency domain, have been proposed to deal with the arrival lags among the elements of the microphone array system [2, 3, 4]. However, this ICA-based approach has the disadvantage that there is difficulty with the low convergence of nonlinear optimization [5].

In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following three parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, (2) null beamforming based on the estimated DOA, and (3) integration of (1) and (2) based on the algorithm diversity in both iteration and frequency domain. The temporal utilization of null beamforming through ICA iterations can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method. Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional DS array.

0-7803-7402-9/02/$17.00 ©2002 IEEE

Fig. 1. Configuration of a microphone array and signals.

2. DATA MODEL AND CONVENTIONAL BSS METHOD

In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.

In the frequency domain, the observed signals in which multiple source signals are mixed are given by $X(f) = A(f)S(f)$, where $X(f) = [X_1(f), \ldots, X_K(f)]^{T}$ is the observed signal vector, and $S(f) = [S_1(f), \ldots, S_L(f)]^{T}$ is the source signal vector. $A(f)$ is the mixing matrix, which is assumed to be complex-valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.

In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as $X(f,t) = [X_1(f,t), \ldots, X_K(f,t)]^{T}$. Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f,t) = [Y_1(f,t), \ldots, Y_L(f,t)]^{T} = W(f)X(f,t)$ become mutually independent. We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f,t)$, we reconstruct the resultant source signals in the time domain.

In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [2]:

$$W_{i+1}(f) = \eta\left[\mathrm{diag}\left(\langle \Phi(Y(f,t))\,Y^{H}(f,t)\rangle_t\right) - \langle \Phi(Y(f,t))\,Y^{H}(f,t)\rangle_t\right] W_i(f) + W_i(f), \qquad (1)$$

where $\langle\cdot\rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\Phi(\cdot)$ as

$$\Phi(Y(f,t)) \equiv [\Phi(Y_1(f,t)), \ldots, \Phi(Y_L(f,t))]^{T}, \qquad (2)$$

$$\Phi(Y_l(f,t)) \equiv [1 + \exp(-Y_l^{(R)}(f,t))]^{-1} + j\cdot[1 + \exp(-Y_l^{(I)}(f,t))]^{-1}, \qquad (3)$$

where $Y_l^{(R)}(f,t)$ and $Y_l^{(I)}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively.

3. PROPOSED ALGORITHM

The conventional ICA method inherently has a significant disadvantage which is due to low convergence through nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the temporal alternation of learning between ICA and beamforming; the inverse of the mixing matrix, $W(f)$, obtained through ICA is temporally substituted by the matrix based on null beamforming for a temporal initialization or acceleration of the iterative optimization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

Fig. 2. Proposed algorithm combining frequency-domain ICA and beamforming.

[Step 1: Initialization] Set the initial $W_i(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscript $i$ is set to be 0.

[Step 2: 1-time ICA iteration] Optimize $W_i(f)$ using the following 1-time ICA iteration:

$$W_{i+1}^{(ICA)}(f) = \eta\left[\mathrm{diag}\left(\langle \Phi(Y(f,t))\,Y^{H}(f,t)\rangle_t\right) - \langle \Phi(Y(f,t))\,Y^{H}(f,t)\rangle_t\right] W_i(f) + W_i(f), \qquad (4)$$

where the superscript "(ICA)" is used to express that the inverse of the mixing matrix is obtained by ICA.

[Step 3: DOA estimation] Estimate DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f,\theta)$, which is given by

$$F_l(f,\theta) = \sum_{k=1}^{K} W_{lk}^{(ICA)}(f)\exp[j2\pi f d_k \sin\theta / c], \qquad (5)$$

where $W_{lk}^{(ICA)}(f)$ is the element of $W_{i+1}^{(ICA)}(f)$. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$th sound source, $\hat\theta_l$, can be estimated as $\hat\theta_l = \frac{2}{N}\sum_{m=1}^{N/2}\theta_l(f_m)$, where $N$ is a total point of DFT, and $\theta_l(f_m)$ represents the DOA of the $l$th sound source at the $m$th frequency bin. These are given by

$$\theta_1(f_m) = \min\left[\arg\min_{\theta}|F_1(f_m,\theta)|,\ \arg\min_{\theta}|F_2(f_m,\theta)|\right], \qquad (6)$$

$$\theta_2(f_m) = \max\left[\arg\min_{\theta}|F_1(f_m,\theta)|,\ \arg\min_{\theta}|F_2(f_m,\theta)|\right], \qquad (7)$$

where $\min[x, y]$ ($\max[x, y]$) is defined as a function in order to obtain the smaller (larger) value among $x$ and $y$.

[Step 4: Beamforming] Construct an alternative matrix for signal separation, $W^{(BF)}(f)$, based on the null-beamforming technique where the DOA results obtained in the previous step are used. In the case that the look direction is $\hat\theta_1$ and the directional null is steered to $\hat\theta_2$, the elements of the matrix for signal separation are given as

$$W_{11}^{(BF)}(f_m) = \exp[-j2\pi f_m d_1 \sin\hat\theta_1/c] \times \left\{\exp[j2\pi f_m d_1(\sin\hat\theta_2 - \sin\hat\theta_1)/c] - \exp[j2\pi f_m d_2(\sin\hat\theta_2 - \sin\hat\theta_1)/c]\right\}^{-1}, \qquad (8)$$

$$W_{12}^{(BF)}(f_m) = -\exp[-j2\pi f_m d_2 \sin\hat\theta_1/c] \times \left\{\exp[j2\pi f_m d_1(\sin\hat\theta_2 - \sin\hat\theta_1)/c] - \exp[j2\pi f_m d_2(\sin\hat\theta_2 - \sin\hat\theta_1)/c]\right\}^{-1}. \qquad (9)$$

Also, in the case that the look direction is $\hat\theta_2$ and the directional null is steered to $\hat\theta_1$, the elements of the matrix are given as

$$W_{21}^{(BF)}(f_m) = -\exp[-j2\pi f_m d_1 \sin\hat\theta_2/c] \times \left\{-\exp[j2\pi f_m d_1(\sin\hat\theta_1 - \sin\hat\theta_2)/c] + \exp[j2\pi f_m d_2(\sin\hat\theta_1 - \sin\hat\theta_2)/c]\right\}^{-1}, \qquad (10)$$

$$W_{22}^{(BF)}(f_m) = \exp[-j2\pi f_m d_2 \sin\hat\theta_2/c] \times \left\{-\exp[j2\pi f_m d_1(\sin\hat\theta_1 - \sin\hat\theta_2)/c] + \exp[j2\pi f_m d_2(\sin\hat\theta_1 - \sin\hat\theta_2)/c]\right\}^{-1}. \qquad (11)$$

[Step 5: Diversity with cost function] Select the most suitable unmixing matrix in each frequency bin and each iteration point, i.e., algorithm diversity in both iteration and frequency domain. As a cost function used to achieve the diversity, we calculate two kinds of cosine distances between the separated signals which are obtained by ICA and beamforming. These are given by

$$J^{(ICA)}(f) = \frac{\left|\langle Y_1^{(ICA)}(f,t)\,Y_2^{(ICA)}(f,t)^{*}\rangle_t\right|}{\sqrt{\langle |Y_1^{(ICA)}(f,t)|^2\rangle_t \cdot \langle |Y_2^{(ICA)}(f,t)|^2\rangle_t}}, \qquad (12)$$

$$J^{(BF)}(f) = \frac{\left|\langle Y_1^{(BF)}(f,t)\,Y_2^{(BF)}(f,t)^{*}\rangle_t\right|}{\sqrt{\langle |Y_1^{(BF)}(f,t)|^2\rangle_t \cdot \langle |Y_2^{(BF)}(f,t)|^2\rangle_t}}, \qquad (13)$$

where $Y_l^{(ICA)}(f,t)$ is the separated signal by ICA, and $Y_l^{(BF)}(f,t)$ is the separated signal by beamforming. If the separation performance of beamforming is superior to that of ICA, we obtain the condition $J^{(ICA)}(f) > J^{(BF)}(f)$; otherwise $J^{(ICA)}(f) \le J^{(BF)}(f)$. Thus, an observation of the conditions yields the following algorithm:

$$W_{i+1}(f) = \begin{cases} W_{i+1}^{(ICA)}(f), & (J^{(ICA)}(f) \le J^{(BF)}(f)) \\ W^{(BF)}(f), & (J^{(ICA)}(f) > J^{(BF)}(f)) \end{cases} \qquad (14)$$

If the $(i+1)$th iteration was the final iteration, go to step 6; otherwise go back to step 2 and repeat the ICA iteration, inserting the $W_{i+1}(f)$ given by Eq. (14) into $W_i(f)$ in Eq. (4) with an increment of $i$.

[Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we detect and correct the source permutation and the gain inconsistency [6].
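To make the update of Step 2 concrete, the following is a minimal sketch of one pass of Eq. (4) at a single frequency bin for the two-channel case ($K = L = 2$). The function names `phi` and `ica_step` and the nested-list data layout are illustrative choices, not part of the original method description, and the sketch uses only the Python standard library.

```python
import math

def phi(y):
    """Eq. (3): logistic function applied to the real and imaginary parts separately."""
    return complex(1.0 / (1.0 + math.exp(-y.real)),
                   1.0 / (1.0 + math.exp(-y.imag)))

def ica_step(W, X, eta):
    """One pass of the Eq. (4) update at a single frequency bin (K = L = 2).

    W   : 2x2 nested list of complex numbers, the current W_i(f)
    X   : list of frames, each frame a pair [X_1(f, t), X_2(f, t)]
    eta : step-size parameter
    """
    n = len(X)
    # Y(f, t) = W(f) X(f, t) for every frame t
    Y = [[W[l][0] * x[0] + W[l][1] * x[1] for l in range(2)] for x in X]
    # Time average <Phi(Y) Y^H>_t, a 2x2 matrix
    R = [[sum(phi(y[a]) * y[b].conjugate() for y in Y) / n for b in range(2)]
         for a in range(2)]
    # diag(R) - R: the diagonal cancels, the off-diagonal terms change sign
    D = [[(R[a][b] if a == b else 0j) - R[a][b] for b in range(2)]
         for a in range(2)]
    # W_{i+1} = eta * (diag(R) - R) W_i + W_i
    return [[W[a][b] + eta * (D[a][0] * W[0][b] + D[a][1] * W[1][b])
             for b in range(2)] for a in range(2)]
```

Because the diagonal of $\mathrm{diag}(\cdot) - (\cdot)$ is zero, only the cross terms between the two outputs drive the update. The sketch does not guard against overflow in `math.exp` for spectral values of very large magnitude, so inputs are assumed to be moderately scaled.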
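The DOA estimation of Step 3 can be sketched as a grid search for the null of each directivity pattern in Eq. (5), followed by the min/max sorting and the average over frequency bins. This is only an illustration under stated assumptions: the 1-degree search grid, the sound velocity `c = 340.0` m/s, and the function names are hypothetical and are not specified in the text.

```python
import cmath
import math

def directivity(W_row, f, theta_deg, d, c=340.0):
    """Eq. (5): response of one row of the unmixing matrix toward theta (degrees)."""
    s = math.sin(math.radians(theta_deg))
    return sum(w * cmath.exp(2j * math.pi * f * dk * s / c)
               for w, dk in zip(W_row, d))

def null_direction(W_row, f, d, grid):
    """Grid direction (degrees) at which |F_l(f, theta)| is smallest."""
    return min(grid, key=lambda th: abs(directivity(W_row, f, th, d)))

def estimate_doas(W_by_bin, freqs, d, grid=range(-90, 91)):
    """Sort the two null directions in each bin, then average over the bins.

    W_by_bin : list of 2x2 unmixing matrices, one per analysed frequency bin
    freqs    : centre frequency of each bin in Hz
    d        : element coordinates [d_1, d_2] in metres
    """
    th1, th2 = [], []
    for W, f in zip(W_by_bin, freqs):
        n1 = null_direction(W[0], f, d, grid)
        n2 = null_direction(W[1], f, d, grid)
        th1.append(min(n1, n2))  # smaller null direction in this bin
        th2.append(max(n1, n2))  # larger null direction in this bin
    return sum(th1) / len(th1), sum(th2) / len(th2)
```

The average here runs over whichever bins are passed in, which plays the role of the sum over $f_m$ in the text. A finer grid or a closed-form null search could replace the 1-degree scan.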
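Steps 4 and 5 can be illustrated together: build a null-beamforming matrix from the estimated DOAs, then keep whichever of the two candidate matrices gives the smaller cosine distance between its outputs. The sketch below does not transcribe Eqs. (8)-(11); it solves the two constraints stated in the text directly (unit response in the look direction, null in the other direction), so its row ordering and scaling should be checked against those equations before reuse. The sound velocity `c = 340.0` m/s and all function names are assumptions made for illustration.

```python
import cmath
import math

def null_beamformer(f, theta1_deg, theta2_deg, d, c=340.0):
    """2x2 matrix whose row l has unit response toward theta_l and a null toward the other DOA."""
    def a(dk, th):
        # Steering term exp[j 2 pi f d_k sin(theta) / c], as in Eq. (5)
        return cmath.exp(2j * math.pi * f * dk * math.sin(math.radians(th)) / c)
    a11, a12 = a(d[0], theta1_deg), a(d[0], theta2_deg)
    a21, a22 = a(d[1], theta1_deg), a(d[1], theta2_deg)
    det = a11 * a22 - a12 * a21
    # Inverse of the 2x2 steering matrix [[a11, a12], [a21, a22]]
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]

def separate(W, X):
    """Y(f, t) = W(f) X(f, t) for every frame."""
    return [[W[l][0] * x[0] + W[l][1] * x[1] for l in range(2)] for x in X]

def cosine_distance(Y):
    """|<Y1 Y2*>_t| / sqrt(<|Y1|^2>_t <|Y2|^2>_t); smaller means less residual correlation."""
    num = abs(sum(y[0] * y[1].conjugate() for y in Y))
    den = math.sqrt(sum(abs(y[0]) ** 2 for y in Y) * sum(abs(y[1]) ** 2 for y in Y))
    return num / den

def select_unmixing(W_ica, W_bf, X):
    """Selection rule of Eq. (14): keep ICA unless beamforming gives a smaller cost."""
    j_ica = cosine_distance(separate(W_ica, X))
    j_bf = cosine_distance(separate(W_bf, X))
    return W_ica if j_ica <= j_bf else W_bf
```

In the full algorithm this selection would run independently in every frequency bin after every ICA iteration. The sketch assumes both outputs have nonzero power, since the cost is undefined otherwise.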

4. EXPERIMENTS IN REVERBERANT ROOM

4.1. Conditions for experiments

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 3. The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 128 msec, the frame shift is 2 msec, and the step-size parameter $\eta$ is set to be $1.0 \times 10^{-5}$.
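As a quick check of what these analysis conditions mean in samples, the short sketch below converts the frame length and frame shift from milliseconds to sample counts and derives the DFT bin spacing. The helper names are hypothetical, and taking the DFT size equal to the frame length is an assumption, since the text does not state it.

```python
def frame_params(fs_hz, frame_ms, shift_ms):
    """Frame length and frame shift in samples for a given sampling frequency."""
    frame_len = round(fs_hz * frame_ms / 1000)
    shift = round(fs_hz * shift_ms / 1000)
    return frame_len, shift

def num_frames(n_samples, frame_len, shift):
    """Number of complete frames that fit into a signal of n_samples samples."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // shift

def bin_spacing_hz(fs_hz, frame_len):
    """Frequency resolution when the DFT size equals the frame length (assumed)."""
    return fs_hz / frame_len
```

With the stated conditions, `frame_params(8000, 128, 2)` gives 1024-sample frames shifted by 16 samples, so consecutive frames overlap heavily and each frequency bin yields a long time series for the time averages in Eq. (4).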
Fig. 3. Layout of reverberant room used in experiments.

4.2. Objective evaluation of separated signals

In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus input SNR in dB, is shown in Fig. 4. These values were averages of all of the combinations with respect to speakers and source directions. As for the proposed algorithm, we also plot the NRR which is rescaled by the computational cost (see dotted lines) because the proposed algorithm has a computational complexity of about 1.9-fold compared with the conventional ICA.
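The NRR definition above can be written out as a small calculation. The sketch assumes that the target and interference components are available separately at the input and at the output, for example by passing each source alone through the mixing and unmixing filters in a simulation; that measurement procedure and the function names are assumptions for illustration, not details given in the text.

```python
import math

def power(x):
    """Mean power of a real-valued signal segment."""
    return sum(v * v for v in x) / len(x)

def snr_db(target, interference):
    """Signal-to-noise ratio in dB, treating the interfering source as noise."""
    return 10.0 * math.log10(power(target) / power(interference))

def nrr_db(in_target, in_interf, out_target, out_interf):
    """Noise reduction rate: output SNR in dB minus input SNR in dB."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)
```

For instance, if the interference amplitude is reduced by a factor of 10 while the target is unchanged, the NRR is about 20 dB.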

In Fig. 4, it is evident that the separation performances of the proposed algorithm are superior to those of the conventional ICA-based BSS method at every iteration point, even considering the additional computational cost of the proposed algorithm. For example, compared with the conventional method, the proposed method improves the NRR by about 4.6 dB at the 50-iteration point in the conventional ICA when the RT is 150 msec. Also, when the RT is 300 msec, the proposed method improves the NRR by about 1.5 dB.

Figure 5 shows a result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm when the RT is 300 msec. In this figure, the symbol "-" represents that the null beamforming is used in the iteration point and frequency bin. As shown in Fig. 5, the proposed algorithm can work automatically as follows: (1) null beamforming is used for the acceleration of learning at early times in the iterations because $W^{(BF)}(f)$ is a rough approximation of the inverse of the mixing matrix $A(f)$, (2) ICA is used after the early part of the iterations because ICA can update the inverse of the mixing matrix more accurately, and (3) the inverse of the mixing matrix obtained by ICA is substituted by the matrix based on null beamforming through whole iteration points at particular frequency bins where the independence between the sources is low. From these results, although null beamforming is not suitable for signal separation under the condition that the direct sounds and their reflections exist, we can confirm that the temporal utilization of null beamforming for algorithm diversity through ICA iterations is effective for improving the separation performance and convergence.
Fig. 4. Noise reduction rates for different iterations in ICA. Reverberation time is 150 msec (top) and 300 msec (bottom). [Curves: Conventional ICA; Proposed Method; Proposed Method (rescaled by computational cost). Horizontal axis: number of iterations.]

5. EXPERIMENTS IN CAR ENVIRONMENT

A two-element array with the interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −50° for the driver and 50° for the speaker in the assistant seat. The impulse responses are recorded in a real car environment as shown in Fig. 6, where we use 3 kinds of array position. The analytical conditions in this experiment are the same as those of the previous section, except for the sampling frequency (which is 16 kHz).

Figure 7 shows NRR results of the proposed method, where we also plot the results of the conventional delay-and-sum (DS) array with 16 elements for comparison (a priori information on DOAs was given in the DS array). From this figure, it is evident that the separation performances of the proposed method are remarkably superior to those of the conventional DS array at every array position. This indicates that the BSS is effective for speech enhancement in the car environment.
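For readers unfamiliar with the DS baseline, the following is a minimal sketch of frequency-domain delay-and-sum weights for a straight-line array steered toward a known DOA, using the same steering convention as Eq. (5). The element spacing of the 16-element array is not given in the text, so the 4 cm spacing in the usage note, the sound velocity `c = 340.0` m/s, and the function names are all assumptions.

```python
import cmath
import math

def ds_weights(f, look_deg, positions, c=340.0):
    """Delay-and-sum weights: compensate each element's phase toward the look direction and average."""
    s = math.sin(math.radians(look_deg))
    k = len(positions)
    return [cmath.exp(-2j * math.pi * f * dk * s / c) / k for dk in positions]

def response(weights, f, theta_deg, positions, c=340.0):
    """Array response toward theta (degrees) for the given weights."""
    s = math.sin(math.radians(theta_deg))
    return sum(w * cmath.exp(2j * math.pi * f * dk * s / c)
               for w, dk in zip(weights, positions))
```

With `positions = [0.04 * k for k in range(16)]` and a look direction of 50°, the response toward 50° has unit magnitude, while the response toward −50° is only attenuated rather than nulled. This is the qualitative difference from the null-based separation used in the proposed method.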

Fig. 5. The result of alternation between ICA and null beamforming through iterative optimization by the proposed algorithm. The symbol "." represents that the null beamforming is used in the iteration point and frequency bin. The RT is 300 msec.

Fig. 6. Layout of array in car cabin used in experiment.

Fig. 7. Noise reduction rates for different array position.

6. CONCLUSION

In this paper, we described a fast- and high-convergence algorithm for BSS where null beamforming is used for temporal algorithm diversity through ICA iterations. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions. Also, the experiment in a real car environment shows that the separation performances of the proposed method are remarkably superior to those of the conventional DS array.

7. ACKNOWLEDGEMENT

The authors are grateful to Dr. Shoji Makino, Mr. Ryo Mukai of NTT CO., LTD., and Mr. Masaru Yamazaki of NISSAN MOTOR CO., LTD. for their discussions on this work. This work was partly supported by NISSAN MOTOR CO., LTD. and CREST (Core Research for Evolutional Science and Technology) in Japan.

8. REFERENCES

[1] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol.36, pp.287-314, 1994.
[2] N. Murata and S. Ikeda, "An on-line algorithm for blind source separation on speech signals," Proceedings of 1998 International Symposium on Nonlinear Theory and Its Application (NOLTA '98), vol.3, pp.923-926, Sep. 1998.
[3] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol.22, pp.21-34, 1998.
[4] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech & Audio Process., vol.8, pp.320-327, 2000.
[5] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, and K. Shikano, "Blind source separation based on subband ICA and beamforming," Proc. ICSLP2000, vol.3, pp.94-97, Oct. 2000.
[6] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, "Evaluation of blind signal separation method using directivity pattern under reverberant conditions," Proc. ICASSP2000, vol.5, pp.3140-3143, June 2000.
