A Weighted Partial Domain Adaptation For Acoustic Scene Classification and Its Application in Fiber Optic Security System
January 6, 2021.
Digital Object Identifier 10.1109/ACCESS.2020.3044153
ABSTRACT Domain adaptation (DA) is a technique that transfers knowledge from a source domain to a target domain. General domain adaptation assumes that the source and target domains share the same label space. However, in practical tasks the target label space is often only a subset of the source label space. In this situation, partial domain adaptation is an effective way to transfer knowledge from a large labeled dataset to a small unlabeled one. In this article, a weighted partial domain adaptation method is proposed to solve the Acoustic Scene Classification (ASC) problem. Our method establishes a connection between the source and target domains to perform partial domain adaptation. Experiments on the TUT and ESC-50 datasets show that our method achieves state-of-the-art results. Furthermore, we apply the algorithm to an optical fiber perimeter security system, which provides early warning by identifying intrusion signals.
INDEX TERMS Partial domain adaptation, generative adversarial training, multi-weighting scheme, acoustic scene classification, optical fiber perimeter security system.
I. INTRODUCTION
Acoustic scene classification (ASC) is the task of assigning labels to audio signals in order to determine the environment in which the signals were collected [1], [2]. In recent years, domain adaptation (DA) based on deep learning (DL) has proved to be an effective way to solve classification tasks [3]. Existing domain adaptation methods generally assume that the source and target domains share the same label space. However, in real ASC tasks the signals to be classified usually cover only part of the categories in the training set, and standard DA has difficulty obtaining satisfactory classification results in this situation. Therefore, partial domain adaptation (PDA) [4] was proposed to transfer knowledge from a source dataset with sufficient labels to a target dataset with fewer labels.

Domain adaptation is now widely used in computer vision [5], image recognition [6], natural language processing [7] and other fields. However, to date only a few studies have applied DA methods to the ASC task. In 2018, the IEEE Audio and Acoustic Signal Processing (AASP) committee proposed an ASC task with mismatched recording devices [8]. In this task, the data collected by each recording device can be regarded as a separate data domain, so DA is naturally used to solve the problem. S. Gharib et al. used generative adversarial networks as feature extractors to obtain domain-invariant features for domain adaptation [9]. K. Drossos et al. replaced the adversarial adaptation process with Wasserstein Generative Adversarial Networks (WGAN) to improve the transfer effect [10]. However, these methods are still based on the assumption that the two domains have the same label space. In this article, we address the transfer problem in which the target labels are only part of the source labels.

We therefore propose a weighted partial domain adaptation method based on generative adversarial learning. We establish a connection between two generators to preserve the class-level structure during domain adaptation. A multi-weighting scheme is proposed not only to select the shared categories in the source domain, but also to help the network distinguish whether a sample belongs to the shared categories. Experiments are conducted on the widely used acoustic classification datasets TUT [9] and ESC-50 [11], [12]. The results show that our method improves classification accuracy by more than 20% on both datasets after domain adaptation.
What is more, we apply our method to a perimeter security system [13] and achieve a good alarm function.

The rest of the article is organized as follows. Section II gives a brief description of related work. In Section III, we introduce our method in detail. Section IV provides the experiments on the TUT and ESC-50 datasets; our algorithm is also applied to a fiber optic security system, which is likewise described in Section IV. Finally, we conclude the article in Section V.

II. RELATED WORK
A. DOMAIN ADAPTATION
Domain adaptation (DA) is a representative method in transfer learning, which uses sufficient source domain samples to improve the performance of the target domain model. Here we introduce the basic idea of domain adaptation. The source and target domains are denoted as D_S = ⟨X_S, Y_S⟩ and D_T = ⟨X_T, Y_T⟩, where X represents the data distribution and Y represents the labels. The key to a domain adaptation algorithm is to design a classifier H over x. The expected error of H over its input x can be expressed as follows:

$$\epsilon(H, Y) = \mathbb{E}_{x}\big[ L\big(H(x), Y(x)\big) \big]$$

where L is the loss function and ε(H, Y) indicates the difference between the output of the classifier H and the label Y. The goal of domain adaptation is to adjust H so that it obtains a small error ε_S(H, Y) in the source domain and adapts to the target domain with a low value of ε_T(H, Y).

A classifier H with low ε_S(H, Y) can be obtained by classical training on the source domain D_S. However, H cannot be optimized by retraining on the target domain D_T due to the lack of labels. Therefore, the focus of domain adaptation shifts to reducing the discrepancy between X_S and X_T.
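To make the notation concrete, the following is a minimal sketch (our own illustration, not the authors' code, assuming a generic PyTorch classifier and standard data loaders): the source error ε_S is driven down by ordinary supervised training, while the target error ε_T can only be measured after the fact, not optimized, because target labels are unavailable.

```python
# Minimal sketch (ours): minimizing eps_S by supervised source training; eps_T
# can only be evaluated, not trained on, since target labels are missing.
import torch
import torch.nn as nn

def train_on_source(H, source_loader, epochs=10, lr=1e-3):
    """Classical source-domain training that minimizes eps_S(H, Y)."""
    opt = torch.optim.Adam(H.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_s, y_s in source_loader:        # labeled source samples
            opt.zero_grad()
            loss = loss_fn(H(x_s), y_s)       # L(H(x), Y(x)) on the source
            loss.backward()
            opt.step()
    return H

@torch.no_grad()
def target_error(H, target_loader):
    """eps_T can be measured when held-out labels exist, but not used for training."""
    wrong, total = 0, 0
    for x_t, y_t in target_loader:
        wrong += (H(x_t).argmax(dim=1) != y_t).sum().item()
        total += y_t.numel()
    return wrong / total
```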
B. MAXIMUM MEAN DISCREPANCY
As mentioned above, the goal of domain adaptation is to reduce the discrepancy between the source domain and the target domain. Maximum mean discrepancy (MMD) [14], [15] reflects the similarity between two distributions, so it is often used in domain adaptation algorithms as an indicator of the discrepancy between the two domains.

Specifically, the statistical test based on MMD works as follows. For samples drawn from two distributions, we compute the mean value of a continuous function f on each sample set and take the difference of these means as the mean discrepancy of the two distributions. The MMD is then obtained by searching over the continuous functions f in the sample space for the one that maximizes this mean discrepancy. Assume that m and n are two datasets sampled from distributions p and q, respectively, and that F denotes a set of continuous functions on the sample space. Then MMD can be represented by the following formula:

$$\mathrm{MMD}[\mathcal{F}, p, q] = \sup_{f \in \mathcal{F}} \Big( \mathbb{E}_{m \sim p}\big[f(m)\big] - \mathbb{E}_{n \sim q}\big[f(n)\big] \Big)$$
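For concreteness, the sketch below (ours, not from the paper) shows the usual way this supremum is realized in practice: when F is the unit ball of a reproducing kernel Hilbert space with kernel k, as in [14], the squared MMD has a closed form that can be estimated directly from two sample batches.

```python
# Minimal sketch (ours, under the standard kernel-MMD formulation of [14]):
# MMD^2 = E[k(m, m')] + E[k(n, n')] - 2 E[k(m, n)], estimated from samples.
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) for all pairs."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(source: torch.Tensor, target: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased empirical estimate of MMD^2 between two sample batches."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Example: two batches of 64 feature vectors of dimension 128.
# xs, xt = torch.randn(64, 128), torch.randn(64, 128) + 0.5
# print(mmd2(xs, xt).item())
```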
C. ADVERSARIAL-TRAINING BASED DOMAIN ADAPTATION METHODS
Many domain adaptation algorithms achieve good adaptation through adversarial training, such as Adversarial Discriminative Domain Adaptation (ADDA) [16], Multi-Adversarial Domain Adaptation (MADA) [17] and Conditional Adversarial Domain Adaptation (CDAN) [18]. In these methods, a domain discriminator is trained so that the domain discrepancy is minimized, and the adversarial loss ensures that the learned mapping can transfer an individual source sample to the desired domain. However, these methods only focus on the global transform: while the discriminator reduces the domain discrepancy, it destroys the class semantic features within each category. Therefore, a partial domain adaptation algorithm based on adversarial training is proposed in this article to overcome this shortcoming.

III. PROPOSED METHOD
In this section, our weighted partial domain adaptation method is introduced in detail, and some mathematical notation is set up to describe the algorithm. The source and target domains are defined as D_S = ⟨X_S, Y_S⟩ and D_T = ⟨X_T, Y_T⟩, where X represents the data and Y represents the labels. In the acoustic scene classification (ASC) task of this article, data of the same category in the source and target domains have the same feature distribution (M_S = M_T), while the target label space is a subset of the source label space (Y_T ⊂ Y_S).

A. NETWORK FRAMEWORK
The weighted partial domain adaptation we propose is based on the theory of generative adversarial networks (GAN) [19]. The overall framework of our network is shown in Figure 1.

B. GENERATIVE TRAINING
As shown in Figure 1, two generators G_s and G_t are built in the source and target domain, respectively. G_s aims to generate similar data F_t from the source data X_s, and G_t does the same job to generate the fake data F_s. The training loss of the generator in the source domain is consistent with the GAN framework:

$$L_{GAN}^{s}(X_s) = L_{dis}^{s} + L_{cls}^{s}$$
$$L_{dis}^{s} = \mathbb{E}\big[\log C_s(X_s)\big] + \mathbb{E}\big[\log\big(1 - C_s(G_s(X_s))\big)\big]$$
$$L_{cls}^{s} = \mathbb{E}\big[\log C_s(X_s, Y_s)\big] + \mathbb{E}\big[\log C_s(G_s(X_s), Y_s)\big]$$

where L_dis^s is the discrimination loss and L_cls^s is the classification loss. After that, the classifier C_s uses both the real source data X_s and the generated fake data F_t as input for training. A similar training process takes place in the target domain:

$$L_{GAN}^{t}(X_s, X_t) = L_{dis}^{t} + L_{cls}^{t}$$
$$L_{dis}^{t} = \mathbb{E}\big[\log C_t(X_s)\big] + \mathbb{E}\big[\log\big(1 - C_t(G_t(X_t))\big)\big]$$
$$L_{cls}^{t} = \mathbb{E}\big[\log C_t(F_t, Y_s)\big] + \mathbb{E}\big[\log C_t(F_s, \hat{Y}_t)\big]$$

where Ŷ_t is the pseudo label obtained from C_0(X_t); here C_0 is a classifier pre-trained only on the source data X_s.
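As a rough illustration of how the source-domain terms L_dis^s and L_cls^s could be assembled, here is a minimal sketch (ours, not the authors' implementation). The assumption that the classifier C_s returns both a real/fake probability and class logits, and the use of standard cross-entropy losses in place of the log terms, are ours and made only to keep the example self-contained.

```python
# Rough sketch (ours) of the source-domain GAN loss L_GAN^s = L_dis^s + L_cls^s.
# Assumption for illustration: C_s(x) returns (real/fake probability, class logits).
import torch
import torch.nn.functional as F

def source_gan_loss(G_s, C_s, x_s, y_s):
    """Return L_GAN^s for one batch of labeled source data (written as a loss to minimize)."""
    f_t = G_s(x_s)                               # generated (fake) data F_t

    real_score, real_logits = C_s(x_s)           # C_s on real source data
    fake_score, fake_logits = C_s(f_t)           # C_s on generated data

    # Discrimination term, the negative of E[log C_s(X_s)] + E[log(1 - C_s(G_s(X_s)))].
    l_dis = F.binary_cross_entropy(real_score, torch.ones_like(real_score)) \
          + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score))

    # Classification term on both real and generated samples, using the source labels Y_s.
    l_cls = F.cross_entropy(real_logits, y_s) + F.cross_entropy(fake_logits, y_s)

    return l_dis + l_cls
```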
Similar to GAN training, we put the classifier and generator together for joint generative training. Maximum mean discrepancy (MMD) is selected as the indicator for generator training, since it reflects the distribution discrepancy between the two domains. In our network, two types of MMD loss are applied to describe the data distribution. One is the global MMD, which measures the distance between the source and target domain centers; the other is the class MMD, which measures the distance between the data of each class. The whole MMD loss in the network is defined as follows:

$$L_{MMD}^{s/t} = L_{gMMD}^{s/t} + \frac{1}{N} L_{cMMD}^{s/t}$$

where N is the number of data classes. We integrate the generator training loss and the MMD loss to complete the generative training of our network. To sum up, a generator and a classifier are established in the source domain and in the target domain. Each classifier takes the output of the corresponding generator as input and aims to obtain the best classification result, while the generators aim to minimize the joint loss shown below:

$$L = L_{GAN}^{s} + L_{GAN}^{t} + \lambda \left( L_{MMD}^{s} + L_{MMD}^{t} \right)$$

where λ controls the relative weight of the two losses.
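The sketch below (ours, reusing the mmd2 estimator from Section II-B) shows one way the global and per-class MMD terms and the joint objective could be combined. Grouping target features by pseudo labels for the class term is our own illustrative choice, not a detail quoted from this excerpt.

```python
# Sketch (ours) of L_MMD = L_gMMD + (1/N) * L_cMMD and the joint loss
# L = L_GAN^s + L_GAN^t + lambda * (L_MMD^s + L_MMD^t).
import torch

def combined_mmd(feat_s, y_s, feat_t, y_t_pseudo, num_classes, sigma=1.0):
    """Global MMD between the two domains plus averaged per-class MMD."""
    global_term = mmd2(feat_s, feat_t, sigma)            # from the earlier sketch
    class_term = feat_s.new_zeros(())
    for c in range(num_classes):
        s_c, t_c = feat_s[y_s == c], feat_t[y_t_pseudo == c]
        if len(s_c) > 1 and len(t_c) > 1:                 # skip (nearly) empty classes
            class_term = class_term + mmd2(s_c, t_c, sigma)
    return global_term + class_term / num_classes

# Joint generator objective for one step (lam plays the role of lambda):
# loss = l_gan_s + l_gan_t + lam * (combined_mmd(...) + combined_mmd(...))
```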
C. MULTI-WEIGHTING SCHEME
After training the generators and classifiers, a multi-weighting scheme is proposed to complete the subsequent partial domain adaptation. In our method, two weights are used. One is the shared-class weight [20], which selects the categories shared between the two domains; the other is the shared-sample weight. Our method considers the weight of the category and of the sample at the same time to achieve a better transfer effect.

We first introduce the shared-class weight. We treat the output of the last convolutional layer of the classifier trained in the previous subsection as a feature extractor and use it to obtain the data feature M(x). The general idea is to learn both a discriminator and domain-invariant features. Therefore, the discriminator loss in our network is similar to the GAN minimax loss:

$$\min_{M_s, M_t} \max_{D} L(D, M_s, M_t) = \mathbb{E}_{x \sim Z_s(x)}\big[\log D(M_s(x))\big] + \mathbb{E}_{x \sim Z_t(x)}\big[\log\big(1 - D(M_t(x))\big)\big]$$

where M_s and M_t are the features of the source and target data, and D is the domain discriminator that identifies whether a feature comes from the source or the target domain. The loss minimizes the data distribution divergence in the feature space M while producing a tighter bound for the discriminator D, so that it achieves more accurate identification results.

When training the discriminator D, for any M_s(x) and M_t(x), D is trained to maximize the loss:

$$\max_{D} L(D, M_s, M_t) = \int_{x} \Big( Z_s(x)\log D(M_s(x)) + Z_t(x)\log\big(1 - D(M_t(x))\big) \Big)\, dx = \int_{m} \Big( Z_s(m)\log D(m) + Z_t(m)\log\big(1 - D(m)\big) \Big)\, dm$$

where m = M(x) is the feature sample after extraction. The theoretical optimal solution of this optimization is then the standard GAN result D*(m) = Z_s(m) / (Z_s(m) + Z_t(m)).
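A compact sketch (ours) of how such a domain discriminator could be trained on the extracted features, and how its output could be turned into a per-class shared-class weight, is given below. Averaging the discriminator output over the source samples of each class follows the general importance-weighting idea of [20]; the exact weighting formula is not reproduced from this excerpt, so the code should be read as an assumption-laden illustration.

```python
# Sketch (ours): training the domain discriminator D on features M(x) and
# reading an illustrative shared-class weight off its output.
import torch
import torch.nn.functional as F

def discriminator_loss(D, m_s, m_t):
    """Maximize E[log D(M_s(x))] + E[log(1 - D(M_t(x)))], written as a BCE loss."""
    p_s, p_t = D(m_s), D(m_t)
    return F.binary_cross_entropy(p_s, torch.ones_like(p_s)) \
         + F.binary_cross_entropy(p_t, torch.zeros_like(p_t))

@torch.no_grad()
def shared_class_weights(D, m_s, y_s, num_classes):
    """Source classes whose features look target-like (low D output) get larger
    weights, marking them as likely shared categories (our illustrative choice)."""
    w = torch.zeros(num_classes)
    for c in range(num_classes):
        feats = m_s[y_s == c]
        if len(feats) > 0:
            w[c] = 1.0 - D(feats).mean()      # closer to the target => larger weight
    return w / (w.max() + 1e-8)               # normalize to [0, 1]
```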
B. APPLICATION IN PERIMETER SECURITY SYSTEM
Since our method has achieved good recognition accuracy on the acoustic classification task, we apply it to a perimeter security system and achieve satisfactory results.

With the development of society, intelligent perimeter security systems have been applied in various settings. We therefore design a perimeter security system [13], [21] based on optical fiber sensors. The system collects external intrusion signals through the optical fiber sensors and analyzes the types of intrusion signals in order to provide early warning. The overall framework of the perimeter security system we designed is shown in Figure 4.

FIGURE 4. The overall framework of the perimeter security system.

In our perimeter security system, the optical fiber sensor collects signals from vibrations along the optical path. The collected signal can be regarded as an audio signal, so the perimeter security system can be seen as an alternative acoustic scene classification task. Moreover, although the security system is trained on various types of intrusion signals, each intrusion signal is detected and identified individually while the system is running, which also coincides with the idea of partial adaptation. The training data contains a variety of intrusion signals, including vehicles passing by, man-made digging, and so on. However, each intrusion signal is finally assigned to one of two classes according to its label: harmful intrusion signals and harmless intrusion signals. Our algorithm is only used in the preceding intrusion signal classification task.
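As a small illustration of how the final alarm decision could sit on top of the classifier, here is a sketch (ours); the category names and their harmful/harmless grouping below are placeholders chosen for the example, not the system's actual label set.

```python
# Sketch (ours): turning the fine-grained intrusion-class prediction into the
# binary harmful/harmless alarm decision.  Category names and grouping are
# illustrative placeholders only.
from typing import Dict

HARMFUL: Dict[str, bool] = {
    "vehicle_passing": False,   # e.g. traffic near the fence: no alarm
    "manual_digging": True,     # e.g. digging at the perimeter: alarm
}

def alarm_decision(predicted_class: str) -> bool:
    """Raise an alarm only if the predicted intrusion class is harmful."""
    return HARMFUL.get(predicted_class, True)   # unknown classes: alarm, to be safe

# Example: alarm_decision("vehicle_passing") -> False (no alarm)
```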
TABLE 5. Alarm accuracy in different environments.

Traditional recognition algorithms such as Back Propagation Neural Networks (BPNN), Support Vector Machines (SVM) and Deep Neural Networks (DNN) are used in comparative experiments with our method, and the results are shown in Table 5. It is clear from Table 5 that our method achieves the best recognition accuracy in every environment. This proves that our algorithm achieves a good domain adaptation effect on different datasets, and that it is effective and universal.

V. CONCLUSION
In this article, a weighted partial domain adaptation method is proposed for the acoustic scene classification task. We expand on the ideas of Generative Adversarial Networks and establish a connection between the two generators in the source and target domains, so that the generators can preserve the class-level structure while generating data samples. Furthermore, a multi-weighting scheme is proposed to complete the partial domain adaptation. The shared-class weights obtained through discriminator training help us find the shared categories between domains, and the shared-sample weights serve as a good supplement that describes the association between samples and the shared classes. Experiments are carried out on the TUT and ESC-50 datasets comparing our method with state-of-the-art domain adaptation algorithms (DAN, JAN, ADDA, MADA, CDAN, SAN, ATDA, MIDN). The results show that our method outperforms the second-best method by more than 3% on both datasets. Moreover, our method is applied to the optical fiber security system and achieves good results, which shows that our algorithm has strong universality.

REFERENCES
[1] D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. D. Plumbley, "Detection and classification of acoustic scenes and events: An IEEE AASP challenge," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2013, pp. 1–4.
[2] A. Mesaros, T. Heittola, and T. Virtanen, "Acoustic scene classification: An overview of DCASE 2017 challenge entries," in Proc. 16th Int. Workshop Acoustic Signal Enhancement (IWAENC), Sep. 2018, pp. 411–415.
[3] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, "A theory of learning from different domains," Mach. Learn., vol. 79, nos. 1–2, pp. 151–175, May 2010.
[4] G. Yang, H. Xia, M. Ding, and Z. Ding, "Bi-directional generation for unsupervised domain adaptation," in Proc. AAAI, 2020, pp. 6615–6622.
[5] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Transferring visual category models to new domains," in Proc. ECCV, 2010, pp. 1–5.
[6] Q. Wang, J. Gao, and X. Li, "Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes," IEEE Trans. Image Process., vol. 28, no. 9, pp. 4376–4386, Sep. 2019.
[7] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proc. Conf. Empirical Methods Natural Lang. Process., 2006, pp. 22–23.
[8] A. Mesaros, T. Heittola, and T. Virtanen, "A multi-device dataset for urban acoustic scene classification," Tech. Rep., 2018.
[9] S. Gharib, K. Drossos, E. Çakır, D. Serdyuk, and T. Virtanen, "Unsupervised adversarial domain adaptation for acoustic scene classification," Tech. Rep., 2018.
[10] K. Drossos, P. Magron, and T. Virtanen, "Unsupervised adversarial domain adaptation based on the Wasserstein distance for acoustic scene classification," Tech. Rep., 2019.
[11] V. Boddapati, A. Petef, J. Rasmusson, and L. Lundberg, "Classifying environmental sounds using image recognition networks," Procedia Comput. Sci., vol. 112, pp. 2048–2056, Dec. 2017.
[12] Y. Tokozume, Y. Ushiku, and T. Harada, "Learning from between-class examples for deep sound recognition," Tech. Rep., 2017.
[13] N. He, J. Zhu, and L. Li, "An optic-fiber fence intrusion recognition system using the optimized curve," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Rio de Janeiro, Brazil, Jul. 2018.
[14] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," J. Mach. Learn. Res., vol. 13, pp. 723–773, Mar. 2012.
[15] F. Liu, W. Xu, J. Lu, G. Zhang, and D. J. Sutherland, "Learning deep kernels for non-parametric two-sample tests," in Proc. Int. Conf. Mach. Learn. (ICML), 2020, pp. 1–29.
[16] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7167–7176.
[17] Z. Pei, Z. Cao, M. Long, and J. Wang, "Multi-adversarial domain adaptation," 2018, arXiv:1809.02176. [Online]. Available: http://arxiv.org/abs/1809.02176
[18] M. Long, Z. Cao, J. Wang, and M. I. Jordan, "Conditional adversarial domain adaptation," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1640–1650.
[19] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," Comput. Sci., May 2015.
[20] J. Zhang, Z. Ding, W. Li, and P. Ogunbona, "Importance weighted adversarial nets for partial domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8156–8164.
[21] N. He and J. Zhu, "Deep learning approach for audio signal classification and its application in fiber optic sensor security system," in Proc. 9th Int. Conf. Inf. Sci. Technol. (ICIST), Aug. 2019, pp. 263–267.
[22] M. Long and J. Wang, "Learning transferable features with deep adaptation networks," in Proc. Int. Conf. Mach. Learn., 2015, pp. 97–105.
[23] M. Long, H. Zhu, J. Wang, and M. I. Jordan, "Deep transfer learning with joint adaptation networks," in Proc. Int. Conf. Mach. Learn., 2016, pp. 2208–2217.
[24] Z. Cao, M. Long, J. Wang, and M. I. Jordan, "Partial transfer learning with selective adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2724–2732.
[25] P. Tang, X. Wang, X. Bai, and W. Liu, "Multiple instance detection network with online instance classifier refinement," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2843–2851.

NINGYU HE received the B.S. degree from the School of Electronic Engineering, Xidian University, China, in 2017. He is currently pursuing the Ph.D. degree with Shanghai Jiao Tong University, China. He is also with the Department of Electronic Engineering, Shanghai Jiao Tong University. His research interests include audio signal processing, image processing, and deep learning.

JIE ZHU (Member, IEEE) received the Ph.D. degree in communications and information systems from Shanghai Jiao Tong University. In 1997, he went to Bell Labs, NJ, USA, for cooperative scientific research. In 2000, he was a Senior Visiting Scholar with the Dresden University of Technology, Germany. He is currently a Professor with the Department of Electronic Engineering and a Doctoral Supervisor in electronic science and technology. He has traveled to the USA, Europe, Japan, South Korea, and other countries many times to take part in international conferences and academic exchanges.