while cheap to deploy. Besides the rich information that sound contains, there are also obvious challenges such as noise interference and high attenuation. To apply sound in indoor applications, sound sensors also introduce some critical privacy concerns: people can feel stalked and spied on when microphones are around them. Although many research initiatives have focused on environmental sound recognition for general purposes, very few are specific to human activity detection. There is still a gap between general sound classification techniques and human-activity-specific techniques inside buildings, e.g. how privacy concerns can be addressed and how to make the model lightweight enough to fit in IoT devices. In this paper, we fill this gap by exploring the possibilities of accurately using sound sensors to recognize human activity while preserving privacy.

Our idea to preserve privacy is to strip the voice bands from the input audio stream, as the human voice largely falls into the range of 80 Hz to 3 kHz while environmental sound events have a much wider range. We achieve this by using a hardware band-pass acoustic sensor that omits the human voice frequencies at the device layer. As this is a challenging task, we conduct comparative experiments to find the most suitable model and features that provide presence information in the absence of the human voice bands. We make use of machine learning techniques to classify different human activity classes based on sound. Both the classification accuracy and the computational complexity results are given in the evaluation. We also identify the highest-rated model that can be used by sound-capable IoT devices to automatically detect ambient human behaviors in real time.

The remainder of the paper is organized as follows: Section 2 gives an overview of related work on environmental sound recognition. Section 3 describes the methodology used to define the characteristics, extract features and model the candidate sound event classifiers. Section 4 describes the experimental and evaluation phase of this work. We conclude this paper with our open discussion in Section 5.

2 RELATED WORK
Classification of human activities based on sound falls under the category of 'Environmental Sound Recognition' (ESR). ESR aims to automatically detect and identify audio events from a captured audio signal. Compared to other areas of audio analysis such as speech or music recognition, general environmental sound recognition has received less attention and is less effective, because the sound input is unstructured and arbitrary in size and pattern.

Eronen et al. [11] developed an environmental context classification system based on audio. With low-order hidden Markov models and MFCC (Mel Frequency Cepstral Coefficients) features, they achieved an accuracy of 82% for 6 classes, which dropped to 58% when the number of classes was increased to 24. Cai et al. [6] proposed a key audio effect detection framework for context inference, where movies are used as the dataset. The advantage of their model over others is that it first infers the context from carefully picked and distinguishable key sound events (for example, explosions and gunshots suggest an action movie). However, these 'key events' need to be chosen manually and carefully, which is hardly scalable. Chu et al. [7] proposed an environmental sound recognition model to classify context, with continuous audio recordings from 14 different scenarios. Their innovation was a matching pursuit algorithm to extract features from the time-frequency domain to complement MFCC. Compared to previous works, the rich features used helped improve the classification accuracy. Heittola et al. [17] developed a context-dependent sound event detection system, in which a context label provides prior probabilities to the HMM model to improve event detection accuracy. The system also adopted an iterative algorithm based on the original HMM to model overlapping sound events.

The aforementioned papers mainly focus on classifying the sound context rather than the events. Later works reused the features and models from these studies to solve another problem, where the focus is on shorter audio segments, the so-called events. From a data processing perspective, the algorithms presented in these papers normally cut the audio stream into fixed-length frames (at the millisecond level) and build models with feature arrays extracted from them. Because of the time-varying characteristic of audio signals, such short frame-based features can approximate time-invariant functions and represent the details well.

Cowling et al. [8] compared several feature extraction and classification techniques typically used in speech recognition for general environmental sound recognition. In particular, time-frequency features such as the STFT (Short Time Fourier Transform) were found to perform better than stationary MFCC features. Mitrović et al. [23] compiled a survey of audio feature extraction techniques for audio retrieval systems. They proposed a taxonomy of audio features together with the relevant application domains. Temporal-domain, frequency-domain, Mel-frequency and harmonic features were the most popular features in environmental sound recognition. Tran et al. [30] proposed a probabilistic distance SVM (Support Vector Machine) for online sound event classification. This method has comparably good performance with very short latency and low computation cost compared to the other mentioned approaches. STE (Short Time Energy) was the only feature used in their model; however, since in real online applications the sound power is highly related to the distance of the sound source, using this feature alone could be highly biased in a sound recognition task. Piczak et al. [24] applied a CNN (Convolutional Neural Network), a method usually applied to image processing, to classify environmental sound and achieved results comparable with traditional methods (64.5% accuracy for 10 classes). Adavanne et al. [3] used multi-channel microphones and RNNs (Recurrent Neural Networks) for sound event detection, which improved the accuracy of detecting overlapping sound events. Lopatka et al. [20] proposed an acoustic surveillance system to recognize acoustic events associated with possible threats like gunshots and explosions. The accuracy was not encouraging compared to the state of the art, but the advantages are the low-cost model and real-time processing capacity. Dong et al. [9] used sensor fusion on PIR, CO2 and temperature data combined with acoustic sensors to detect and predict user occupancy patterns. They used only the sound volume, without further exploration of other sound-based features. Zigel et al. [33] used a combination of sound and floor vibrations to automatically detect fall events to monitor elderly people living alone. Sound event length, energy, and MFCC features were extracted and used for event classification. Table 1 summarizes the related works.
In general, many models have provided good results for the sound event recognition problem; in particular, deep learning based models have shown great potential in this field [3][24]. However, to use sound in smart building applications, there are still several drawbacks. The biggest problem is that sound may expose even more private information about occupants than cameras. Another drawback of deep learning based models is the lack of huge training datasets: compared to the popular speech recognition problem, environmental sound recognition is less attractive to researchers, while its data sources are more diverse than speech. What is more, the models also need to be lightweight in order to operate on IoT devices. Our work fills these gaps and provides solutions to these problems. We first strip the voice bands from the audio stream, as human conversations are the most pressing privacy concern. Our training data is crawled from multiple free audio websites so that the data source is diverse enough for model training. Regarding the models, we mainly investigate low-cost classifiers and efficient feature combinations to improve performance at low computational complexity.

3 METHODOLOGY
Our overall aim is to build an algorithm capable of classifying human activities based on environmental sound in real time. The methodology is shown in Figure 1. The raw audio stream is first segmented into short events, from which important features are extracted, and finally each event is classified into an activity class using a classification model.

3.1 Preprocessing
3.1.1 Segmentation. In this step, sound events are extracted from the continuous audio stream and separated from background noise (or silence). The algorithm to extract these events is called segmentation. Although segmentation is as important as the other steps in our algorithm, it is not the focus of this paper, since numerous general approaches to tackle this problem already exist. Here we provide one simple segmentation approach, which works as follows:
(1) The audio stream is smoothed in the time domain and cut into fixed short frames (20 ms), and the power is calculated for each frame.
(2) Frames with power higher than a threshold are labeled as 'active'. The threshold can be a preset static value or adjusted dynamically.
(3) Adjacent 'active' frames are combined to form an event.
(4) Events shorter than a given duration are dropped, while long events are truncated, such that the resulting events are between 1 and 3 s.
The reason for choosing this duration range comes from practical experience, as humans can identify sound segments of such length quite well. Figure 2 shows an example of the segmentation results.
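A minimal sketch of this segmentation procedure is given below, assuming a mono signal in a NumPy array; the static power threshold and the exact frame and event lengths are illustrative values, and the initial smoothing step is omitted for brevity.

```python
import numpy as np

def segment_events(signal, sr, frame_ms=20, power_thresh=1e-3,
                   min_len_s=1.0, max_len_s=3.0):
    """Threshold-based segmentation: frame the stream, mark 'active' frames by
    power, merge adjacent active frames into events, then drop short events and
    truncate long ones so the results fall in the 1 to 3 s range."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)          # short-time power per frame
    active = power > power_thresh                 # static threshold; could be adaptive

    events, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, n_frames))

    out = []
    max_frames = int(max_len_s * 1000 / frame_ms)
    min_frames = int(min_len_s * 1000 / frame_ms)
    for s, e in events:
        if e - s < min_frames:                    # drop too-short events
            continue
        e = min(e, s + max_frames)                # truncate overly long events
        out.append(signal[s * frame_len:e * frame_len])
    return out
```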
3.1.2 Voice bands stripping. The purpose of stripping the human voice is to preserve privacy, as human conversations are one of the most critical privacy concerns in indoor environments. Here we implement a software band-stop filter to eliminate the human voice, while in a real-world implementation this can be done physically by the acoustic sensor, so that privacy is protected at the device layer. A typical band-stop filter achieves this function: it passes only the band from zero up to its lower cut-off frequency F_low and the band above its upper cut-off frequency F_high, resulting in a rejection of frequencies in the band F_low to F_high.
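As a software counterpart of the hardware filter described above, a standard band-stop design can be used. The sketch below is a minimal example with SciPy, assuming a 44.1 kHz stream and the 80 Hz to 3 kHz voice band mentioned in the introduction; the Butterworth design and filter order are illustrative assumptions, not the exact filter used in this work.

```python
from scipy.signal import butter, sosfiltfilt

def strip_voice_band(signal, sr=44100, f_low=80.0, f_high=3000.0, order=6):
    """Band-stop filter: keep 0..f_low and f_high..Nyquist, reject the voice
    band in between (second-order sections for numerical stability)."""
    nyq = sr / 2.0
    sos = butter(order, [f_low / nyq, f_high / nyq],
                 btype='bandstop', output='sos')
    return sosfiltfilt(sos, signal)
```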
[Figure: (a) original signal in the time domain; (b) truncated signal in the time domain; (c) original signal in the frequency domain; (d) truncated signal in the frequency domain.]
[Figure: example sound events (Announce, Chatter, Cheer, Clap, Footsteps, Laugh, elevator call, shopping cart, door slam, ...) shown as waveform, PSD spectrogram (frequency in kHz) and log (mel) filterbank energies (channel index) over time (s).]
Learning is therefore a process performed by searching through the representation space to find the best hypothesis that fits the solution.

Classification algorithms can be categorized into two types [5]: (i) stateless algorithms, in which the events to be classified are treated as unrelated to each other, and (ii) stateful algorithms, which treat the events as related to each other and put them into a context while updating an internal memory. A stateful model works best in scenarios where the output is decided not only by the current input, but also by previous inputs in the timeline (the state). Stateless models such as GMM, SVM, Random Forest and Neural Networks are widely used in sound event classification, while in context classification, stateful models such as Bayesian Networks, HMM and RNN are preferred. In our task, we think there is no need to remember past information, since it is enough for a human to tell what is happening when a sound event is heard, even without much context.

While there might be a plethora of machine learning algorithm variations with different types of representation space, their effectiveness for different scenarios can only be confirmed empirically. Hence, we select several commonly used algorithms to conduct an empirical comparison and find the best candidate for the problem. The chosen models in this paper are listed below, all of which are stateless models:
(1) Decision tree
(2) Random Forest
(3) Mixed Gaussian
(4) Naive Bayes
(5) SVM (Linear & RBF kernel)
(6) Artificial Neural Network
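For illustration, the six stateless models listed above could be instantiated with scikit-learn roughly as follows; all hyperparameters are assumptions for the sketch, except the single 50-neuron hidden layer for the ANN, which is discussed in Section 4.3.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

models = {
    "Decision tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Naive Bayes": GaussianNB(),
    "SVM (linear)": SVC(kernel="linear", probability=True),
    "SVM (RBF)": SVC(kernel="rbf", probability=True),
    "ANN": MLPClassifier(hidden_layer_sizes=(50,)),  # one hidden layer, 50 neurons
}
# The mixed (multi-class) Gaussian model can be built with
# sklearn.mixture.GaussianMixture: fit one mixture per class and predict
# the class whose mixture gives the highest log-likelihood.
```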
The collected data covers sound event classes including Speech, Crowd Chatting, Cheering, Walking, Applause, Door slamming, Chair moving, Elevator, and Trolley. However, after looking into the dataset, we merged some classes, resulting in 5 classes, mainly for two reasons: (I) some objects make very similar sounds, which are hard to differentiate even by humans; for example, some doors and chairs can make similarly crisp and sharp sounds; (II) some sounds always appear simultaneously or very close together in certain scenarios; for example, at a party, cheering is always accompanied by applause. Since these sounds are hard to separate, we merged them together into one class. After merging into 5 classes, the data sample distribution is shown in Figure 8; 150 samples from each class are chosen for training for balance, and the rest are used for testing.

Table 3: Data Source

Ref   Scenario / data source      Format           SR (kHz)   Length (hours)
[22]  residential house           wav              44.1       24
[12]  random sound recordings     wav, mp3, aiff   22, 44.1   20
[1]   random sound effects        wav, mp3         22, 44.1   5

4 EXPERIMENTAL EVALUATION
In this section, we present the experimental results over our labeled audio dataset. Discussions are given to gain instructive insights about how to build real-time indoor environmental sound recognition applications. The entire dataset is split into two parts: a training set and a test set. For the training set, 5-fold cross-validation is used to build the best fitting model, which is subsequently applied to the test set for validation.
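A sketch of this evaluation protocol, assuming feature matrices and labels (X_train, y_train, X_test, y_test) have already been extracted; the SVM parameter grid is an illustrative assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def evaluate(X_train, y_train, X_test, y_test):
    """5-fold cross-validation on the training set to pick the best-fitting
    model, followed by a single evaluation on the held-out test set."""
    param_grid = {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.001]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)
    return search.best_params_, search.score(X_test, y_test)
```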
(a) Training with non-stripped data (baseline); (b) Training with voice-stripped data
Figure 9. Performance and complexity of single features. Figure 10. Greedy search for feature combination.
MFCC is the best single feature before voice bands truncation, while LPCC is the best after truncation. These results show a significant performance drop of MFCC after voice bands truncation (from 82% to 77%), which is reasonable because MFCC is designed for human speech recognition and as such provides better resolution in the lower frequency bands. LPCC, in contrast, portrays the smoothed spectral envelope over the entire frequency band without discrimination, so it works nearly as well with or without the voice bands.

In machine learning, multiple features can be combined to reach better performance, as more features provide more information. However, with a fixed number of training samples, this does not mean that more features are always better: the predictive power normally first increases as the number of features goes up and then decreases [13], mainly because duplicated and irrelevant features only introduce noise to the model. In offline audio classification, this is normally addressed by techniques like PCA (Principal Component Analysis) that first trim redundant information. However, in real-time applications that need to be efficient, the selection of features should be decided proactively.

Since brute-forcing all combinations is too time-consuming, we use a heuristic greedy search to find a local optimum instead. It starts by selecting the best single feature; then, at each iteration, one more feature is added depending on the classification accuracy. Figure 10 shows the results of each iteration of the greedy algorithm, both with and without voice truncation, together with the all-features result for comparison. In both cases, combining features improves the performance significantly. Looking into the voice-bands-truncated case, the best combination contains 4 features: LPCC, spectral flux, STE and time-entropy. This result matches our expectations, as these features portray different perspectives of the audio signal. Not surprisingly, using all features as model input is not preferred, as high-dimensional data causes overfitting and is much less efficient. The computational complexity of a feature combination is smaller than the simple sum of the individual features' costs, since they share some steps, such as the FFT used by all frequency-domain features.
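A sketch of this greedy forward search, assuming per-feature matrices are kept in a dictionary and that cross-validated accuracy with an RBF-kernel SVM is used as the selection criterion (the classifier choice here is an assumption for illustration).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def greedy_feature_search(feature_blocks, y, cv=5):
    """Forward greedy search: start from the best single feature, then at each
    iteration add the feature that most improves cross-validated accuracy."""
    selected, best_score = [], 0.0
    remaining = set(feature_blocks)
    while remaining:
        scores = {}
        for name in remaining:
            X = np.hstack([feature_blocks[f] for f in selected + [name]])
            scores[name] = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv).mean()
        name, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best_score:        # stop when no candidate improves accuracy
            break
        selected.append(name)
        best_score = score
        remaining.remove(name)
    return selected, best_score
```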
In speech recognition, common techniques such as the STFT are used to extract the spectrum from overlapping frames, normally with half or 3/4 of the frame length overlapped. This is because audio signals, especially speech, are highly time-varying and need finer-grained analysis to reflect the details. However, using overlapping frames also doubles or quadruples the complexity. We compared the results for different overlapping frame lengths, using the best feature combination found above. Figure 11 shows the results with different overlapping frame lengths. The results show that it is best to use half frame length overlapping, which leads to approximately the same result as 3/4 overlapping, but is much more efficient.
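A minimal sketch of overlapping framing: with overlap = 0.5 the hop is half the frame length (roughly doubling the number of frames), and with overlap = 0.75 it is a quarter of the frame length (roughly quadrupling them).

```python
import numpy as np

def frame_signal(x, frame_len, overlap=0.5):
    """Cut a 1-D signal into overlapping frames of length frame_len."""
    hop = max(1, int(frame_len * (1.0 - overlap)))
    n = 1 + max(0, len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
```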
4.3 Comparison of classification models
We also compare multiple classification models with the same input features. In total we adopted 6 stateless models, using the LPCC feature as input. The performance of the classifiers is shown in Figure 12. The results show that SVM (RBF kernel) performs best, and the second best is the ANN. We used one hidden layer with 50 neurons for the ANN model, based on the criterion in [15]. It is likely that our training data was not large enough, which has led to slightly worse results compared to the SVM.

4.4 Confidence of classification results
In our application, knowing the classification results is not enough; we also need to score how reliable the results are. High-score classification results are kept, while low-score ones are discarded. This is because there is a lot of noise and many irrelevant events in the environment, which are hard to filter out beforehand. However, they are likely to get a low score in the classification model, since these sounds carry very different features. On the other hand, it is acceptable if some events of interest are discarded by mistake, as knowing part of the events is enough for detecting human behaviours. Classification algorithms such as Naive Bayes give both the classification result and a probability, i.e. a degree of certainty about the result, which corresponds exactly to the score we need.
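A sketch of this confidence-based filtering, assuming a fitted scikit-learn classifier that exposes class probabilities (e.g. Naive Bayes); the 0.6 threshold is an illustrative value, not one taken from the paper.

```python
import numpy as np

def filter_by_confidence(clf, X, threshold=0.6):
    """Keep only predictions whose top class probability exceeds the threshold;
    low-confidence events (likely noise or irrelevant sounds) are discarded."""
    proba = clf.predict_proba(X)               # shape (n_events, n_classes)
    confidence = proba.max(axis=1)
    keep = confidence >= threshold
    labels = clf.classes_[proba.argmax(axis=1)]
    return labels[keep], confidence[keep], keep
```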
Figure 11. Frame overlapping test. Figure 12. Performance of classifiers. Figure 13. Confidence of prediction (training set vs. testing set).
We could differentiate the two classes with the help of other methods, for example with a sound localization algorithm, since doors cannot move. Even though our model is not perfect, we think this accuracy makes it competent for a real indoor event recognition system. Also, our model is shown to be general, since the dataset comes from different contexts and from highly diverse sound sources. In real applications the accuracy could be even higher, since microphone-embedded devices are normally installed at fixed positions, where the sound sources should be more homogeneous and easier to classify.

6 ACKNOWLEDGEMENT
This work is a part of the COPAS (Cooperating Objects for Privacy Aware Smart public buildings) project.
REFERENCES
[1] [n. d.]. freeSFX.co.uk - FREESFX.CO.UK CONTENT PUBLISHER LICENCE AGREEMENT. ([n. d.]). https://www.freesfx.co.uk
[2] G. Acampora, D. J. Cook, P. Rashidi, and A. V. Vasilakos. [n. d.]. A Survey on Ambient Intelligence in Healthcare. 101, 12 ([n. d.]), 2470–2494. https://doi.org/10.1109/JPROC.2013.2262913
[3] Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, and Tuomas Virtanen. 2017. Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features. arXiv:1706.02293 [cs] (2017). http://arxiv.org/abs/1706.02293
[4] Mohammed Arif, Martha Katafygiotou, Ahmed Mazroei, Amit Kaushik, Esam Elsarrag, et al. 2016. Impact of indoor environmental quality on occupant well-being and comfort: A review of the literature. International Journal of Sustainable Built Environment 5, 1 (2016), 1–11.
[5] Christopher M. Bishop. 2006. Pattern recognition and machine learning. Springer, New York.
[6] R. Cai, Lie Lu, A. Hanjalic, Hong-Jiang Zhang, and Lian-Hong Cai. 2006. A flexible framework for key audio effects detection and auditory context inference. IEEE Transactions on Audio, Speech, and Language Processing 14, 3 (2006), 1026–1039. https://doi.org/10.1109/TSA.2005.857575
[7] S. Chu, S. Narayanan, and C. C. J. Kuo. 2009. Environmental Sound Recognition With Time–Frequency Audio Features. IEEE Transactions on Audio, Speech, and Language Processing 17, 6 (2009), 1142–1158. https://doi.org/10.1109/TASL.2009.2017438
[8] Michael Cowling and Renate Sitte. 2003. Comparison of techniques for environmental sound recognition. Pattern Recognition Letters 24, 15 (2003), 2895–2907. https://doi.org/10.1016/S0167-8655(03)00147-8
[9] Bing Dong and Burton Andrews. 2009. Sensor-based occupancy behavioral pattern recognition for energy and comfort management in intelligent buildings. In Proceedings of building simulation. 1444–1451. https://pdfs.semanticscholar.org/de95/b672e9e30b04749623c2d92c89f256eedda4.pdf
[10] Monica Drăgoicea, Laurenţiu Bucur, and Monica Pătraşcu. 2013. A service oriented simulation architecture for intelligent building management. In International Conference on Exploring Services Science. Springer, 14–28.
[11] Antti J Eronen, Vesa T Peltonen, Juha T Tuomi, Anssi P Klapuri, Seppo Fagerlund, Timo Sorsa, Gaëtan Lorho, and Jyri Huopaniemi. 2006. Audio-based context recognition. IEEE Transactions on Audio, Speech, and Language Processing 14, 1 (2006), 321–329.
[12] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound Technical Demo. In ACM International Conference on Multimedia (MM'13). ACM, Barcelona, Spain, 411–412. https://doi.org/10.1145/2502081.2502245
[13] Jerome H. Friedman. 1997. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery 1, 1 (01 Mar 1997), 55–77. https://doi.org/10.1023/A:1009778005914
[14] Timothy D Griffiths, Adrian Rees, Caroline Witton, A Shakir Ra'ad, G Bruce Henning, and Gary GR Green. 1996. Evidence for a sound movement area in the human cerebral cortex. Nature 383, 6599 (1996), 425.
[15] M.T. Hagan, H.B. Demuth, and M.H. Beale. 2014. Neural network design. Martin Hagan. https://books.google.nl/books?id=4EW9oQEACAAJ
[16] Ebenezer Hailemariam, Rhys Goldstein, Ramtin Attar, and Azam Khan. [n. d.]. Real-time Occupancy Detection Using Decision Trees with Multiple Sensor Types. In Proceedings of the 2011 Symposium on Simulation for Architecture and Urban Design (2011) (SimAUD '11). Society for Computer Simulation International, 141–148. http://dl.acm.org/citation.cfm?id=2048536.2048555
[17] Toni Heittola, Annamaria Mesaros, Antti Eronen, and Tuomas Virtanen. 2013. Context-dependent sound event detection. EURASIP Journal on Audio, Speech, and Music Processing 2013, 1 (2013), 1.
[18] Eun Jeon, Jong-Suk Choi, Ji Lee, Kwang Shin, Yeong Kim, Toan Le, and Kang Park. 2015. Human detection based on the generation of a background image by using a far-infrared light camera. Sensors 15, 3 (2015), 6763–6788.
[19] Benjamin Kedem. 1986. Spectral analysis and discrimination by zero-crossings. Proc. IEEE 74, 11 (1986), 1477–1493.
[20] K. Lopatka, J. Kotus, and A. Czyzewski. 2016. Detection, classification and localization of acoustic events in the presence of background noise for acoustic surveillance of hazardous situations. Multimedia Tools and Applications 75, 17 (01 Sep 2016), 10407–10439. https://doi.org/10.1007/s11042-015-3105-4
[21] Konstantinos Makantasis, Antonios Nikitakis, Anastasios D Doulamis, Nikolaos D Doulamis, and Ioannis Papaefstathiou. 2018. Data-driven background subtraction algorithm for in-camera acceleration in thermal imagery. IEEE Transactions on Circuits and Systems for Video Technology 28, 9 (2018), 2090–2104.
[22] A. Mesaros, T. Heittola, and T. Virtanen. 2016. TUT database for acoustic scene classification and sound event detection. In 2016 24th European Signal Processing Conference (EUSIPCO). 1128–1132. https://doi.org/10.1109/EUSIPCO.2016.7760424
[23] Dalibor Mitrović, Matthias Zeppelzauer, and Christian Breiteneder. 2010. Features for Content-Based Audio Retrieval. In Advances in Computers. Vol. 78. Elsevier, 71–150. http://linkinghub.elsevier.com/retrieve/pii/S0065245810780037
[24] K. J. Piczak. 2015. Environmental sound classification with convolutional neural networks. In 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). 1–6. https://doi.org/10.1109/MLSP.2015.7324337
[25] Rashmi Priyadarshini and RM Mehra. 2015. Quantitative review of occupancy detection technologies. Int. J. Radio Freq 1 (2015), 1–19.
[26] Fariba Sadri. [n. d.]. Ambient Intelligence: A Survey. 43, 4 ([n. d.]), 36:1–36:66. https://doi.org/10.1145/1978802.1978815
[27] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. 2014. A dataset and taxonomy for urban sound research. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, 1041–1044.
[28] Jérôme Sueur, Sandrine Pavoine, Olivier Hamerlynck, and Stéphanie Duvail. 2008. Rapid acoustic survey for biodiversity appraisal. PloS one 3, 12 (2008), e4065.
[29] Huy Dat Tran and Haizhou Li. 2011. Sound event recognition with probabilistic distance SVMs. IEEE Transactions on Audio, Speech, and Language Processing 19, 6 (2011), 1556–1568.
[30] Huy Dat Tran and Haizhou Li. 2011. Sound Event Recognition With Probabilistic Distance SVMs. IEEE Transactions on Audio, Speech, and Language Processing 19, 6 (2011), 1556–1568. https://doi.org/10.1109/TASL.2010.2093519
[31] Wikipedia contributors. 2018. Voice frequency — Wikipedia, The Free Encyclopedia. (2018). https://en.wikipedia.org/w/index.php?title=Voice_frequency&oldid=834458520 [Online; accessed 30-May-2018].
[32] Ting-Fan Wu, Chih-Jen Lin, and Ruby C Weng. 2004. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research 5, Aug (2004), 975–1005.
[33] Y. Zigel, D. Litvak, and I. Gannot. 2009. A Method for Automatic Fall Detection of Elderly People Using Floor Vibrations and Sound—Proof of Concept on Human Mimicking Doll Falls. IEEE Transactions on Biomedical Engineering 56, 12 (Dec. 2009), 2858–2867. https://doi.org/10.1109/TBME.2009.2030171