Towards Multimodal Surveillance for Smart Building Security †
Giuseppe Amato, Paolo Barsocchi, Fabrizio Falchi, Erina Ferro, Claudio Gennaro,
Giuseppe Riccardo Leone *, Davide Moroni, Ovidio Salvetti and Claudio Vairo
Institute of Information Science and Technologies, National Research Council of Italy, Via Moruzzi, 1-56124 Pisa,
Italy; [email protected] (G.A.); [email protected] (P.B.); [email protected] (F.F.);
[email protected] (E.F.); [email protected] (C.G.); [email protected] (D.M.);
[email protected] (O.S.); [email protected] (C.V.)
* Correspondence: [email protected]
† Presented at the International Workshop on Computational Intelligence for Multimedia Understanding
(IWCIM), Kos Island, Greece, 2 September 2017.
Abstract: The main goal of a surveillance system is to collect information in a sensed environment and to report unexpected behavior. The information provided by a single sensing and surveillance technology may not be sufficient to understand the whole context of the monitored environment. On the other hand, combining information coming from different sources can improve the overall performance of a surveillance system. In this paper, we present the Smart Building Suite, in which independent and heterogeneous technologies are developed in order to realize a multimodal surveillance system.
Keywords: smart building; sensor networks; smart cameras; surveillance; pervasive computing;
face recognition
1. Introduction
The concept of “smart building” is very broad, as it embraces all of a building’s components, from technology to services, in order to fully support the people living or working there [1]. Optimizing space utilization, building life costs, and the surveillance system is part of the process of making a building “smart”. Life costs are mostly related to operations and maintenance; as an example, the tendency to be unconcerned about energy saving is more prominent in offices than at home; thus, in a smart building it is necessary to provide automatic energy saving services. Surveillance is another important aspect, both to avoid the waste of money or the destruction of property due to theft and to ensure the safety of the people [2]. Unfortunately, the need for good surveillance is growing today due to the high level of criminality, especially in sensitive areas, government offices, public places, and even at home.
Security and intrusion control concerns are the motivating factors for the deployment on the market of many surveillance systems, such as closed-circuit surveillance cameras, which require a command and control center to monitor all the activities, or radio frequency identification (RFID), which uses radio waves to automatically identify persons or objects by means of RFID transponders and readers. In this context, the information provided by a single dedicated surveillance technology may not be sufficient to automatically identify intrusions in a monitored environment. Combining several sources of information, coming, for example, from both a wireless sensor network and cameras, improves the overall performance of a surveillance system. Indeed, ambient sensors (e.g., RFID, PIR, and noise sensors) may generate false positive readings, which can be reduced by fusing these measurements with data coming from surveillance cameras.
In this paper, we present our vision of an enhanced surveillance system for smart buildings, in which independent and heterogeneous technologies, currently implemented and deployed, are integrated and coordinated by a high-level central reasoner in order to provide a more robust and efficient surveillance system for a smart building. Such an enhanced surveillance system will be able both to monitor what happens outside the building and to detect whether a violation occurs. We also describe the different technologies, based on sensors and cameras, and the software solutions we implemented and aim to integrate in order to realize the surveillance scenario described above. The paper is organized as follows. Section 2 describes the system and its functionalities. Section 3 reports the technologies used in developing the system; Section 4 presents the methodologies adopted and describes the algorithms developed. Finally, Section 5 concludes the paper.
3. Involved Technologies
Raw sensor data and video streams are processed by two smart peripheral subsystems: the embedded smart cameras and the wireless sensor network, which are in charge of detecting the events of interest and communicating with the central server. In the following paragraphs, we briefly describe the technologies involved in these tasks.
The wireless sensor network (WSN) is composed of nodes equipped with several transducers, such as humidity, temperature, current, voltage, Passive Infrared (PIR), and noise detectors [4,5]. Each node of the WSN is connected to a ZigBee sink, which provides IoT connectivity through IPv6 addressing. The choice of ZigBee is driven by several characteristics of the technology, such as ultra-low power consumption, use of unlicensed radio bands, cheap and easy installation, flexible and extendable networks, and integrated intelligence for network set-up and message routing. In order to measure the energy consumption in a room, we evaluate the current and voltage waveforms at the same time instant; for this purpose, driven by the need to operate within existing buildings without the possibility of changing the existing electrical appliances, we used a current and voltage transformer. We also installed a PIR and a noise detector on a single node. All these measured values help in deciding whether or not someone is in the office, in order to apply appropriate decisions on energy saving in that specific room or on surveillance policies. As an example, in an office where nobody is present, lights and any other electric gear, apart from the computer, are automatically switched off if they are on. Sensor data collected by the deployed WSN are stored in a NoSQL document-oriented local database, namely MongoDB. In order to provide both a real-time view of the sensor data and a configuration interface, we implemented a web interface, named WebOffice, that runs on JBoss 7.1, a free and open-source application server implementing the Java EE specification (Figure 2).
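As an illustration of this data flow, the following Python sketch computes the active power of a room from simultaneously sampled current and voltage waveforms, applies a naive occupancy rule fusing the PIR, noise, and power measurements, and persists the reading in MongoDB via pymongo. All names, helper functions, and thresholds here are hypothetical, not those of the deployed WebOffice system.

```python
import time
import numpy as np
from pymongo import MongoClient

# Hypothetical local MongoDB instance and collection names.
client = MongoClient("mongodb://localhost:27017")
readings = client["smart_building"]["sensor_readings"]

def active_power(voltage, current):
    """Average active power P = mean(v(t) * i(t)) over one window.
    Both arrays must be sampled at the same time instants."""
    return float(np.mean(voltage * current))

def room_is_occupied(pir, noise, power_w, noise_thr=0.3, power_thr=50.0):
    """Naive fusion of PIR, noise, and power readings (placeholder thresholds)."""
    return bool(pir) or noise > noise_thr or power_w > power_thr

def store_reading(node_id, room, voltage, current, pir, noise):
    """Persist one multi-sensor reading as a MongoDB document."""
    power_w = active_power(voltage, current)
    readings.insert_one({
        "node": node_id,
        "room": room,
        "timestamp": time.time(),
        "power_w": power_w,
        "pir": bool(pir),
        "noise": float(noise),
        "occupied": room_is_occupied(pir, noise, power_w),
    })
```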
4. Methodologies
The algorithms that run on the hardware introduced in the previous section are briefly described in the following paragraphs.
The face recognition module is based on the VGG-Face convolutional neural network (CNN) [18], trained to recognize faces with a dataset composed of 2.6 million faces belonging to 2.6 thousand different persons. We take the output of the 7th, fully-connected layer of the CNN (fc7), which is a high-level representation of the input face, and we compute the L2 distance between the query face deep feature and all the deep features in the training set, in order to perform kNN classification and obtain the identity of the person to whom the query face belongs.
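A minimal sketch of this classification step, assuming the fc7 features have already been extracted as NumPy arrays (the function name, array layout, and value of k below are illustrative, not taken from the paper):

```python
import numpy as np
from collections import Counter

def knn_identity(query_feat, train_feats, train_labels, k=5):
    """kNN classification over fc7 deep features with L2 distance.

    query_feat:   (d,) fc7 feature of the query face
    train_feats:  (n, d) fc7 features of the enrolled training faces
    train_labels: list of n identity labels
    Returns the predicted identity and the nearest-neighbor distance.
    """
    # L2 distance between the query feature and every training feature.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest enrolled faces.
    identity = Counter(train_labels[i] for i in nearest).most_common(1)[0][0]
    return identity, float(dists[nearest[0]])
```

The nearest-neighbor distance is returned alongside the identity so that it can be compared against the acceptance threshold described below.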
The person re-identification system is composed of two hardware components: an embedded
camera connected to the network and a PC that processes the acquired images and performs the
person re-identification task. The camera is placed in front of the office entrance, in order to have a
clear view of the face of the entering person, and sends the acquired images to the PC that executes
the re-identification algorithm. In particular, for each captured image, a first phase of face detection is executed in order to locate and crop the face from the whole image. We used the OpenCV implementation of the Viola-Jones algorithm [19] for face detection. Each detected face is then processed by the VGG-Face network in order to extract the deep feature that will be used to perform the kNN similarity search. If the distance to the person returned by the kNN search is below a given threshold, then the entering person is considered to be among the allowed ones. Otherwise, the system raises a notification of an unauthorized person entering a monitored office.
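The following Python sketch outlines this per-frame pipeline on the PC side. The OpenCV Haar-cascade detector is the standard Viola-Jones implementation; the feature extractor, the kNN helper, and the threshold value are placeholders for the components described above, not the authors' actual code.

```python
import cv2

# Standard Viola-Jones face detector shipped with OpenCV (Haar cascade).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

DIST_THRESHOLD = 0.8  # placeholder; tuned on enrollment data in practice

def check_entering_person(frame, extract_fc7, knn_identity,
                          train_feats, train_labels):
    """Detect faces in a frame and flag unauthorized entries."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        crop = frame[y:y + h, x:x + w]       # crop the detected face
        feat = extract_fc7(crop)             # deep feature (e.g., VGG-Face fc7)
        identity, dist = knn_identity(feat, train_feats, train_labels)
        if dist < DIST_THRESHOLD:
            print("authorized:", identity)   # person among the allowed ones
        else:
            print("alert: unauthorized person entering")
```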
5. Conclusions
In this work, we presented the Smart Building Suite, in which independent and heterogeneous technologies are developed in order to realize a multimodal surveillance system. As future work, we plan to integrate all the technologies described in the paper and to enhance the current surveillance system with a high-level reasoner that cross-references all the selected information coming from the smart cameras, the sensors, and the face recognition system, in order to better determine whether or not an unauthorized access to the building has occurred and, if so, to raise a series of alarms according to the severity of the intrusion.
Acknowledgments: We gratefully acknowledge the support of NVIDIA Corporation for the donation of the Titan X Pascal GPU used in this research, and the “Renewed Energy” project of the DIITET Department of CNR for supporting the Smart Area activity, within whose framework this research is carried out.
References
1. Barsocchi, P.; Ferro, E.; Fortunati, L.; Mavilia, F.; Palumbo, F. EMS@CNR: An energy monitoring sensor
network infrastructure for in-building location-based services. In Proceedings of the 2014 International
Conference on High Performance Computing & Simulation (HPCS), Bologna, Italy, 21–25 July 2014;
pp. 857–862.
2. He, T.; Krishnamurthy, S.; Stankovic, J.A.; Abdelzaher, T.; Luo, L.; Stoleru, R.; Yan, T.; Gu, L.; Hui, J.;
Krogh, B. Energy-efficient surveillance system using wireless sensor networks. In Proceedings of the
2nd International Conference on Mobile Systems, Applications, and Services, ACM, Boston, MA, USA,
6–9 June 2004; pp. 270–283.
3. Magrini, M.; Moroni, D.; Palazzese, G.; Pieri, G.; Leone, G.; Salvetti, O. Computer vision on embedded
sensors for traffic flow monitoring. In Proceedings of the 2015 IEEE 18th International Conference on
Intelligent Transportation Systems (ITSC), Las Palmas, Spain, 15–18 September 2015; pp. 161–166.
4. Vairo, C.; Amato, G.; Chessa, S.; Valleri, P. Modeling detection and tracking of complex events in wireless
sensor networks. In Proceedings of the 2010 IEEE International Conference on Systems Man and Cybernetics
(SMC), Istanbul, Turkey, 10–13 October 2010; pp. 235–242.
5. Amato, G.; Chessa, S.; Gennaro, C.; Vairo, C. Efficient detection of composite events in wireless sensor
networks: Design and evaluation. In Proceedings of the 2011 IEEE Symposium on Computers and
Communications (ISCC), Kerkyra, Greece, 28 June–1 July 2011; pp. 821–823.
6. Kim, K.; Chalidabhongse, T.H.; Harwood, D.; Davis, L. Real-time foreground–background segmentation
using codebook model. Real-Time Imaging 2005, 11, 172–185.
7. Barnich, O.; Van Droogenbroeck, M. ViBe: A universal background subtraction algorithm for video sequences.
IEEE Trans. Image Process. 2011, 20, 1709–1724.
8. Brutzer, S.; Höferlin, B.; Heidemann, G. Evaluation of background subtraction techniques for video
surveillance. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 1937–1944.
9. Leone, G.; Magrini, M.; Moroni, D.; Pieri, G.; Salvetti, O.; Tampucci, M. A smart device for monitoring
railway tracks in remote areas. In Proceedings of the 2016 International Workshop on Computational
Intelligence for Multimedia Understanding (IWCIM), Reggio Calabria, Italy, 27–28 October 2016; pp. 1–5.
10. Bloisi, D.D.; Iocchi, L.; Leone, G.R.; Pigliacampo, R.; Tombolini, L.; Novelli, L. A distributed vision system
for boat traffic monitoring in the Venice Grand Canal. In Proceedings of the 2nd International Conference on
Computer Vision Theory and Applications (VISAPP), Barcelona, Spain, 8–11 March 2007; pp. 549–556.
11. Amato, G.; Falchi, F.; Gennaro, C. Fast image classification for monument recognition. J. Comput. Cult. Herit.
2015, 8, 18.
12. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
13. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks.
In Proceedings of the 25th International Conference on Advances in Neural Information Processing Systems,
Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
14. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object
detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158.
15. Amato, G.; Falchi, F.; Vadicamo, L. Visual recognition of ancient inscriptions using convolutional neural
network and Fisher vector. J. Comput. Cult. Herit. 2016, 9, 21.
16. Dahl, G.E.; Yu, D.; Deng, L.; Acero, A. Context-dependent pre-trained deep neural networks for
large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 2012, 20, 30–42.
17. Amato, G.; Carrara, F.; Falchi, F.; Gennaro, C.; Meghini, C.; Vairo, C. Deep learning for decentralized parking
lot occupancy detection. Exp. Syst. Appl. 2017, 72, 327–334.
18. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the British Machine Vision
Conference, Swansea, UK, 7–10 September 2015.
19. Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154.
20. Chen, D.; Barker, S.; Subbaswamy, A.; Irwin, D.; Shenoy, P. Non-intrusive occupancy monitoring using smart
meters. In Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings,
Roma, Italy, 11–15 November 2013; pp. 1–8.
21. Barsocchi, P.; Cimino, M.G.; Ferro, E.; Lazzeri, A.; Palumbo, F.; Vaglini, G. Monitoring elderly behavior via
indoor position-based stigmergy. Pervasive Mob. Comput. 2015, 23, 26–42.
22. Barsocchi, P.; Crivello, A.; Girolami, M.; Mavilia, F.; Ferro, E. Are you in or out? Monitoring the human
behavior through an occupancy strategy. In Proceedings of the 2016 IEEE Symposium on Computers and
Communication (ISCC), Messina, Italy, 27–30 June 2016; pp. 159–162.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).