INDUSTRIAL APPLICATION

A Deep-Learning-Based Computer Vision Solution for Construction Vehicle Detection
detection (Xue & Li, 2018), and pavement crack detection (Zhang et al., 2017) have been investigated by researchers. Image-based monitoring and evaluation have also received considerable attention in the construction engineering domain due to the dynamic nature and vastness of typical construction sites. For instance, Memarzadeh, Golparvar-Fard, and Niebles (2013) used handcrafted feature extraction, namely the histogram of oriented gradients (HOG) and colors, in order to detect construction vehicles and workers. Also, Chi and Caldas (2011) proposed a methodology for detecting construction vehicles and workers using background subtraction, morphological processing, and neural networks for classifying objects. Kim, Kim, and Kim (2016) proposed a framework for monitoring struck-by accidents using computer vision techniques and fuzzy inference. In the computer vision step, they used background subtraction, morphological operations, object classification, and tracking. Afterward, they employed proximity and crowdedness as contextual information about the construction site in fuzzy inference (Kim et al., 2016).

Unfortunately, the abovementioned traditional computer vision solutions suffer from an inherent lack of generalization and require extensive development effort and domain knowledge (LeCun et al., 2015). In contrast, a deep learning approach to computer vision problems introduces an alternative end-to-end solution capable of automatic feature extraction without explicit use of domain-knowledge-based feature selection.

Among safety-specific studies, W. Fang, Ding, Luo, and Love (2018) used the Faster R-CNN (Ren, He, Girshick, & Sun, 2015) structure to detect workers and their safety harnesses. Faster R-CNN was also used by Q. Fang et al. (2018) to detect workers with no hardhat. A good comparison between traditional computer vision techniques and deep learning solutions can be seen by comparing and contrasting Q. Fang et al. (2018) and Park, Elsafty, and Zhu (2015). Park et al. (2015) employed handcrafted feature extraction, that is, HOG, to detect human bodies and hardhats. They then used spatial information about hats and bodies to match detected hats and bodies (Park et al., 2015). Q. Fang et al. (2018), however, utilized Faster R-CNN to detect workers with no hardhats directly. The solution proposed by Q. Fang et al. (2018) is end to end, requiring no handcrafted feature extraction.

In the activity-understanding area, Ding et al. (2018) used a hybrid deep learning model to detect dangerous activities. They employed the Inception V3 (Szegedy et al., 2015) structure to extract features and then used long short-term memory (Greff, Srivastava, Koutník, Steunebrink, & Schmidhuber, 2016) in order to incorporate temporal effects and identify unsafe activities (Ding et al., 2018). Moreover, Luo et al. proposed a convolutional network-based solution for recognizing workers' activities. Their study consisted of four main steps. First, they tracked the workers using a single object-tracking algorithm. Second, they used optical flow estimation to extract temporal and spatial information about objects. Then, they classified the activities from both information streams. Finally, they merged the results of the temporal and spatial stream classifications to estimate the final activity (Luo, Li, Cao, Yu et al., 2018). Luo, Li, Cao, Dai et al.'s framework used still-image data to identify worker activities. They employed Faster R-CNN to detect image objects and spatial information about these objects in order to define activity patterns (Luo, Li, Cao, Dai et al., 2018).

Construction vehicle detection using deep learning has also been investigated by some researchers. Along with using transfer learning (Shao, Zhu, & Li, 2015), Kim, Kim, Hong, and Byun (2018) employed R-FCN (Dai, Li, He, & Sun, 2016) to detect construction vehicles. W. Fang, Ding, Zhong, Love, and Luo (2018) used Faster R-CNN to detect workers and excavators on construction sites. Son, Choi, Seong, and Kim (2019) used Faster R-CNN to detect construction workers in various poses against a variety of backgrounds. However, almost all of the abovementioned endeavors have focused on providing solutions that disregard the proposed solution's real-time on-the-construction-site performance, efficiency, and cost of deployment.

Extensive review of the literature finds that most studies focus on the development of improved techniques for image analytics, but very few look at the economics of final deployment or the inevitable trade-off between accuracy and cost of deployment. In the infrastructure management domain, we found only one study investigating inference at the edge of networks, applied to road damage detection (Maeda, Sekimoto, Seto, Kashiyama, & Omata, 2018). However, to the best of our knowledge, there is no study in construction engineering that focuses on inference at the edge using embedded devices. This paper aims at filling this gap and providing researchers and engineers a practical and comprehensive deep-learning-based solution that can detect construction vehicles, from the very first step, solution development, to the last step, solution deployment. Our solution includes not only the related software but also hardware.

This paper covers both phases necessary for a comprehensive deep learning solution for industrial applications. In the first phase, the development phase, data gathering and preparation, model selection, model training, and model validation are covered. The second phase describes model optimization, application-specific hardware selection, and solution evaluation. The main contributions of this paper can be summarized as follows:

• Improving the advanced infrastructure management (AIM) construction vehicle dataset by adding a new vehicle class and annotating the additional images associated with this class.
• Proposing an improved version of the SSD_MobileNet object detector that is suitable for embedded devices.
• Proposing several embedded devices for various scenarios in the construction engineering domain. These scenarios include but are not limited to (a) applications requiring real-time performance, such as safety and object tracking, and (b) applications needing semi-real-time performance, such as productivity analysis as well as management- and security-related analyses.
• Proposing two hardware setups to meet the needs of varying use cases for practical field implementation.

TABLE 1 Data-splitting details for training, validation, and testing image datasets

Image types    Total    Training    Validation    Testing
Loader         787      504         126           157
Excavator      361      231         58            72
Dump truck     760      486         122           152
Mixer truck    659      422         105           132
Roller         353      226         56            71
Grader         351      225         56            70
2 DEVELOPMENT PHASE

Our development phase involved (a) data gathering and preprocessing that resulted in prepared data, leading to (b) our proposed detection model and (c) training and validation of this model.

2.1 Data gathering and preprocessing

Our first step of solution development involved data gathering and preprocessing. Data can be gathered using three major processes. First, it can be gathered from available large-scale datasets such as ImageNet (Russakovsky et al., 2015), the Common Objects in Context (COCO) database (T.Y. Lin et al., 2014), and the Open Images dataset (Kuznetsova et al., 2018). Second, data can be gathered using web-crawling techniques (Olston & Najork, 2010). Finally, image data can be captured at the location of application by researchers/engineers. The first and second approaches were used in this study.

We used the AIM dataset (Kim et al., 2018), originally drawn from the ImageNet dataset (Russakovsky et al., 2015). This dataset is a subset of ImageNet and contains construction vehicle images, that is, excavators, loaders, rollers, concrete mixer trucks, and dump trucks. Moreover, to complete the data-gathering process, we employed the web-crawling technique to enhance the dataset. Web crawling refers to an automated process in which a crawler (bot) systematically browses the web to retrieve information (Olston & Najork, 2010). Web crawling was used to gather images related to the "Grader" object class. The resulting dataset can be shared upon request.
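A minimal sketch of such a crawling step is shown below. The target URL, parsing choices, and file naming are illustrative placeholders rather than the exact pipeline used in this study.

```python
# Illustrative image crawler for gathering "Grader" images; the page URL and
# selectors are hypothetical, and a production crawler should also respect
# robots.txt and rate limits (Olston & Najork, 2010).
import os
import requests
from bs4 import BeautifulSoup

def crawl_images(page_url, out_dir, limit=50):
    os.makedirs(out_dir, exist_ok=True)
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    saved = 0
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src or not src.startswith("http"):
            continue  # skip inline/relative sources in this simple sketch
        data = requests.get(src, timeout=10).content
        with open(os.path.join(out_dir, f"grader_{saved:04d}.jpg"), "wb") as f:
            f.write(data)
        saved += 1
        if saved >= limit:
            break

crawl_images("https://example.com/grader-gallery", "data/grader")  # hypothetical URL
```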
After both automated and human visual inspection of the "Grader" images and confirmation of their correctness and quality, these additional construction vehicle images were annotated by the authors. The dataset was then split into (a) a training/validation dataset and (b) a testing dataset. Only 80% of the original dataset was dedicated to training/validation; the remaining 20% was dedicated to testing. The training/validation dataset was similarly divided: 80% of it was used for training, and the other 20% was devoted to validation. Table 1 summarizes the data-splitting details.
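The two-stage 80/20 split described above can be expressed as a short sketch; the scikit-learn call is illustrative of the procedure rather than the exact tooling used in this study.

```python
# Two-stage 80/20 split: 20% held out for testing, then 20% of the
# remainder held out for validation (approximately the per-class counts
# reported in Table 1, up to rounding).
from sklearn.model_selection import train_test_split

def split_dataset(image_paths, seed=42):
    trainval, test = train_test_split(image_paths, test_size=0.2,
                                      random_state=seed)
    train, val = train_test_split(trainval, test_size=0.2,
                                  random_state=seed)
    return train, val, test
```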
Visual inspection of the resulting datasets reveals multiple challenges associated with the computer vision task, such as viewpoint variation, scale variation, occlusion, background clutter, and intraclass variation. Figure 1 shows examples. Contrary to large-scale image datasets, relatively fewer data are available for specialized applications such as construction vehicle detection. Consequently, training a model capable of generalization while not underfitted or overfitted may be unattainable. Moreover, in some applications, limitations in hardware resources further increase the complexity of developing a workable solution. This is not necessarily the case for other scenarios, such as computer vision object detection competitions, which differ greatly from real-life use cases.

FIGURE 1 Examples of challenges associated with the visual detection of construction equipment: (a) viewpoint variation, (b) scale variation, (c) background clutter, (d) occlusion, and (e) intraclass variation

Although the authors have tried their best to provide comprehensive information about each phase of the study, it was not possible to cover every technical detail within this paper. However, we have referenced sufficient studies covering these details to enable readers to fully understand and follow our process. The subsequent sections detail our proposed detection model, model training, and model validation.

2.2 Proposed detection model

There are two general types of object detectors, that is, one-stage and two-stage object detectors. Although two-stage object detectors are known for their accuracy, they are too computationally intensive to be used for embedded devices or for applications requiring real-time performance (Girshick, 2015; Girshick, Donahue, Darrell, & Malik, 2015; Ren et al., 2015). One-stage detectors, however, combine these two steps and perform classification and localization using one single network.

The detector model used in this study is the single shot detector (SSD; Liu et al., 2016). This model uses an auxiliary network for feature extraction, also known as a base network. We used MobileNet as the base network in this study. MobileNet and its variants are optimized primarily for speed (Howard et al., 2017).

The main building block of MobileNets is the depthwise separable convolution, which factorizes the standard convolution into two distinct operations, that is, depthwise and pointwise convolutions (Sifre, 2014). It can be shown that depthwise separable convolutions have fewer parameters and can reduce the computation eight to nine times (Howard et al., 2017). In this study, each convolution in the network was followed by batch normalization (Ioffe & Szegedy, 2015) and ReLU6 activation (Krizhevsky & Hinton, 2010). SSD uses different feature maps, some of them from the MobileNet base network, to perform classification and localization regression. These feature maps have different sizes in order to leverage high-level as well as low-level information.
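The parameter saving behind the "eight to nine times" figure can be verified with a simple count: a k × k standard convolution has k·k·Cin·Cout weights, whereas its depthwise separable factorization has k·k·Cin (depthwise) plus Cin·Cout (pointwise). The channel widths below are representative values, not a specific layer of the proposed model.

```python
# Parameter count of a 3x3 standard convolution vs. its depthwise
# separable factorization (depthwise followed by 1x1 pointwise).
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out  # depthwise + pointwise

k, c_in, c_out = 3, 256, 256
ratio = standard_conv_params(k, c_in, c_out) / separable_conv_params(k, c_in, c_out)
print(f"reduction: {ratio:.1f}x")  # ~8.7x, consistent with the 8-9x reported
```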
TABLE 2 Average precision (AP) for each object category derived using the trained model and test dataset

       Dump truck   Excavator   Grader    Loader    Mixer truck   Roller
AP     92.31%       83.70%      93.86%    93.77%    96.94%        86.65%    mAP = 91.20%
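For reference, the mAP in Table 2 is simply the unweighted mean of the six per-class APs:

```python
# mAP as the mean of the per-class average precisions from Table 2.
aps = {"Dump truck": 92.31, "Excavator": 83.70, "Grader": 93.86,
       "Loader": 93.77, "Mixer truck": 96.94, "Roller": 86.65}
print(f"mAP = {sum(aps.values()) / len(aps):.2f}%")  # ~91.20%
```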
FIGURE 4 Examples of failure in detection: (a) misclassified, (b) merged, (c and d) missed, and (e) wrong classification
proposed to address the needs of two distinct scenarios. The NVIDIA Jetson TX2 (NVIDIA Developer, 2019a), along with TensorRT optimizations (NVIDIA Developer, 2019b), has been introduced as a GPU-accelerated solution for applications that need real-time, yet accurate, performance. Moreover, the Jetson Nano (NVIDIA Developer, 2019a) with TensorRT optimizations is presented in this paper as a GPU-accelerated platform for low-demand applications. Additionally, the Raspberry Pi (R. Pi) 3B+ (Raspberrypi.org, 2019) along with the Intel Neural Compute Stick (NCS) (Software.intel.com, 2019) was investigated as an alternative for low-demand applications.

3.1 Inference at the edge

The availability of labeled data generated by various types of sensors and devices, together with recent progress in the artificial intelligence (AI) area, has introduced innovative applications such as connected autonomous vehicles, smart cities, and intelligent infrastructures.

There are two approaches to intelligent decision making, namely, cloud-computing-based decision making and edge-computing-based decision making. Cloud computing refers to a set of computing services such as servers, storage, analytics, databases, etc., which are delivered over the Internet. In this model, data acquisition is conducted at the edge of a network (via sensors). The data are then sent to the cloud for processing and decision making. Although this solution is relatively fast and easy to set up, it is associated with some inherent limitations such as latency and jitter, limited bandwidth, and personal data privacy and security (Ericsson.com, 2016).

On the other hand, with the edge-computing paradigm, the gathering, storing, processing, and decision making can all be done at the edge of a network. Although the main disadvantage of edge computing is deployment and maintenance costs, several benefits result (International Electrotechnical Commission, 2016):

• Efficient and fast intelligent decision making through deploying the machine learning algorithms at the edge of the network, thereby eliminating the roundtrip delay introduced by the cloud-computing paradigm.
• Securing data close to its origin and being able to follow local management and control policies.
• Fast recovery from network failure or maintenance.
• Decreasing data transfer costs by lowering communication over public networks. In this case, only alarms or decisions are sent to cloud servers.

Three edge computing platforms are introduced in this study, that is, the NVIDIA Jetson TX2, the NVIDIA Jetson Nano, and the R. Pi 3 with an Intel NCS.

3.2 Jetson TX2

The NVIDIA Jetson TX2 uses the Tegra System on Module (SoM) and is the size of a credit card, with input, output, and processing hardware similar to a typical computer. It takes advantage of NVIDIA GPUs, which enable it to accelerate deep-learning-related computations. The width and length of the TX2 module are 50 and 87 mm, respectively. Table 3 summarizes the technical specifications of the TX2. The TX2 module is called the Jetson TX2 development kit when it is mounted on its development carrier board, which is a 17.78 cm × 17.78 cm printed circuit board with typical input and output ports. A heatsink and a fan are mounted on the module to improve heat transfer.
TABLE 3 Detailed specifications of Jetson TX2, Jetson Nano, Raspberry Pi 3B+, and Intel NCS

              Jetson TX2                    Jetson Nano                 Raspberry Pi 3B+            Intel NCS
GPU           NVIDIA Pascal,                NVIDIA Maxwell,             Broadcom VideoCore IV       Intel Movidius Myriad 2
              256 CUDA cores                128 CUDA cores                                          Vision Processing Unit (VPU)
CPU           Dual-core Denver 2 64-bit     Quad-core ARM               4x ARM Cortex-A53,          N.A.
              + quad-core ARM               Cortex-A57 MPCore           1.2 GHz
              Cortex-A57 MPCore
Memory        8 GB 128-bit LPDDR4           4 GB 64-bit LPDDR4          1 GB LPDDR2                 N.A.
Display       2x DSI, 2x DP 1.2,            HDMI 2.0, DP 1.2,           HDMI, DSI                   N.A.
              HDMI 2.0, eDP 1.4             eDP 1.4, 2x DSI
Data storage  32 GB eMMC, SDIO, SATA        MicroSD card                MicroSD                     N.A.
USB           USB 3, USB 2                  USB 3, USB 2                USB 2                       N.A.
Connectivity  1 Gigabit Ethernet,           Gigabit Ethernet            100 Base Ethernet, 2.4 GHz  USB 3
              802.11ac WLAN, Bluetooth                                  802.11n wireless
Mechanical    50 mm × 87 mm                 45 mm × 70 mm               56.5 mm × 85.60 mm          72.5 mm × 27 mm
A neural network can be trained using a GPU-accelerated host machine or using a GPU-enabled cloud computing instance. The neural network can then be optimized and deployed on the TX2 module. NVIDIA JetPack should be used as the Software Development Kit (SDK). JetPack should be installed on the host machine to ensure the necessary toolkits and packages such as CUDA, cuDNN, and TensorRT will also be installed on the TX2. In this study, JetPack 3.2 was used to flash the TX2.

CUDA is a parallel computing platform that increases computing performance by utilizing GPUs. The CUDA deep neural network library (cuDNN) is a GPU-accelerated library that includes highly tuned implementations of operations such as convolution, pooling, and activation. TensorRT is a platform for high-performance deep learning inference that includes an optimizer and a runtime and enables building applications with low latency and high throughput.

TensorRT is a C++ library that improves the inference performance of NVIDIA GPUs. The input of the TensorRT optimizer is a trained neural network, and its output is an optimized inference engine. It is only the inference engine that needs to be deployed in the production environment. TensorRT enhances the latency, power efficiency, memory consumption, and throughput of a network by combining layers and optimizing kernel selection. It can further improve network performance by running it with lower precision. For instance, it eliminates layers whose outputs are not used, horizontally and vertically fuses convolution and activation operations, and adjusts the precision of weights from FP32 to FP16 or INT8.
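A sketch of this optimization step is given below, using the TF-TRT interface bundled with TensorFlow 1.x releases; the exact API differs across TensorFlow/JetPack versions, and the model path is a placeholder.

```python
# Offline TF-TRT conversion: layers are fused, kernels selected, and weight
# precision lowered to FP16; only the resulting engine ships to the TX2.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="ssd_mobilenet_saved_model",  # placeholder path
    max_batch_size=1,
    precision_mode="FP16")  # half precision, as evaluated in this study
converter.convert()
converter.save("ssd_mobilenet_trt")  # optimized model for deployment
```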
3.3 Jetson Nano

Similar to the Jetson TX2, the Jetson Nano is an SoM designed and optimized for AI applications. As an affordable alternative to the Jetson TX2, the Jetson Nano can be ideal for low-demand applications. The Jetson Nano module is mounted on a development carrier board of 100 mm × 80 mm. A heatsink is mounted on the module to improve heat transfer. The Jetson Nano module supports all of the required libraries for GPU-accelerated deep learning inference, such as TensorRT, CUDA, and cuDNN. Similar to the Jetson TX2, the Jetson Nano has all standard input and output ports, and it can be used as a Linux desktop. Table 3 summarizes the technical specifications of the Jetson Nano. JetPack 4.2 was used in this study to flash the Jetson Nano.

3.4 Raspberry Pi and Intel NCS

The R. Pi 3 Model B+ is the latest version of the R. Pi; it is the size of a credit card, uses a system on chip, and can function similarly to a standard high-performance computer for basic tasks. Its low cost and tiny size make it ideal for embedded devices in particular. Table 3 summarizes the main technical specifications of the R. Pi 3 Model B+ (R. Pi). Although the R. Pi may be suitable for basic computer tasks, it cannot deliver high performance for computationally intensive tasks like object detection. We therefore added an Intel Movidius NCS as a deep learning accelerator to the proposed system.

The NCS is a USB-drive-sized fanless deep learning device that can accelerate computationally intensive inference at the edge. It is powered by an Intel Movidius Vision Processing Unit that optimizes neural network operations. It is an ideal compact deep learning inference accelerator for resource-restricted platforms such as the R. Pi. It supports the TensorFlow (TensorFlow, 2019) and Caffe (Jia et al., 2014) deep learning frameworks. Detailed technical specifications for the NCS can be found in Table 3. The NCS consumes only 1 W of power and has proven that it can greatly accelerate inference over use of the R. Pi CPU alone (Software.intel.com, 2019). In order to make this setup functional, a trained neural network must first be converted into an intermediate representation (IR) using the OpenVINO toolkit provided by Intel. The optimized IR can then be used for inference.
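The deployment path described above can be sketched as follows; the IR file names are placeholders, and the Python API names vary somewhat across OpenVINO releases, so this should be read as an outline of the workflow rather than version-exact code.

```python
# Inference on the NCS: the frozen graph is first converted offline to an
# IR (.xml/.bin pair) with OpenVINO's model optimizer, and the IR is then
# loaded on the MYRIAD (Movidius VPU) device for inference.
import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="ssd.xml", weights="ssd.bin")  # IR placeholders
exec_net = ie.load_network(network=net, device_name="MYRIAD")  # the NCS

input_name = next(iter(net.input_info))
frame = cv2.resize(cv2.imread("site.jpg"), (300, 300))  # model input size
blob = frame.transpose(2, 0, 1)[np.newaxis].astype(np.float32)  # HWC -> NCHW
detections = exec_net.infer(inputs={input_name: blob})
```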
4 RESULTS AND DISCUSSION

4.1 Comparison of embedded devices

A trained neural network can be deployed on either the cloud or embedded devices. As construction is typically a long-term process, utilizing cloud services can be expensive. For example, Amazon Machine Learning will cost more than $90 for 20 hr of compute time and 890,000 batch predictions. Consequently, in the context of the current study, we mainly focus on embedded devices.

The inference speed, power efficiency, and normalized benefit of six options were investigated in this study: (a) a Jetson TX2, (b) a Jetson TX2 with TensorRT optimizations, (c) a Jetson Nano, (d) a Jetson Nano with TensorRT optimizations, (e) an R. Pi 3B+ with an Intel NCS, and (f) a desktop computer with a GTX 1080 Ti GPU and an Intel Core i7 CPU. It should be noted that all benchmarking was done by operating the Jetson TX2 and Nano at maximum performance.

In regard to the first embedded device, the Jetson TX2, review of the results shows that its inference speed is 25 frames per second (FPS). In order to compare the inference accuracy of this setup with the others, the AP for each category was calculated using the test dataset, and a mAP of 93.41% was achieved.
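A sketch of how the per-setup inference speed can be measured is shown below: the detector runs repeatedly on a frame and the wall-clock average is reported. The tensor names follow the TensorFlow Object Detection API convention and are assumptions, as are the graph path and run count.

```python
# Average-FPS measurement over repeated runs of a frozen detection graph.
import time
import numpy as np
import tensorflow as tf

def measure_fps(frozen_graph_path, n_runs=200):
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
        image = graph.get_tensor_by_name("image_tensor:0")    # assumed names
        boxes = graph.get_tensor_by_name("detection_boxes:0")
        frame = np.zeros((1, 300, 300, 3), dtype=np.uint8)    # dummy input
        with tf.compat.v1.Session(graph=graph) as sess:
            sess.run(boxes, {image: frame})  # warm-up run, excluded from timing
            start = time.time()
            for _ in range(n_runs):
                sess.run(boxes, {image: frame})
            return n_runs / (time.time() - start)
```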
TABLE 4 AP for each object category derived using the optimized models on embedded systems

                                        Dump truck  Excavator  Grader   Loader   Mixer truck  Roller
Jetson TX2 [$600]                       93.73%      87.74%     94.28%   96.43%   98.49%       89.78%   mAP = 93.41%
Jetson TX2 with TensorRT [$600]         92.29%      83.67%     93.86%   93.77%   96.95%       87.63%   mAP = 91.36%
Jetson Nano [$99]                       92.72%      84.54%     93.87%   93.79%   96.95%       89.32%   mAP = 91.86%
Jetson Nano with TensorRT [$99]         92.29%      83.36%     93.86%   93.75%   96.95%       86.69%   mAP = 91.15%
Raspberry Pi 3B+ with NCS [$150]        92.30%      83.73%     93.87%   93.79%   96.95%       86.70%   mAP = 91.22%
GTX 1080 Ti GPU with Intel Core i7
CPU [$1700]                             92.31%      83.70%     93.86%   93.77%   96.94%       86.65%   mAP = 91.20%
For the second embedded device option, the model was optimized using TensorRT and half-precision floating point (FP16) accuracy. It was then deployed on the Jetson TX2. Examination of the optimized model shows that it is able to obtain an inference speed of 47 FPS, which is well above the inference speed needed for real-time applications. Utilizing TensorRT greatly increased the inference speed (from 25 to 47 FPS) at the cost of only minimally reduced inference accuracy (from 93.41% to 91.36% mAP). This combination is especially ideal for safety as well as object-tracking applications, which require real-time processing.

Regarding the third and fourth embedded device options, for the Jetson Nano without TensorRT optimization, inference speed and accuracy were 13.9 FPS and 91.86% mAP, respectively. Once the model was optimized on the Jetson Nano using TensorRT, inference speed increased to 22 FPS with a mAP of 91.15%. This combination is likely to be especially beneficial for applications requiring semi-real-time performance. For example, this setup could be used as a video-recording trigger in certain situations to improve the management and security of a construction site. This would save substantial storage space as well as facilitate inspection processes.

For the fifth embedded device option investigated in this study, it should be pointed out that, as elaborated in Section 3.4, an intermediate representation (IR) is needed to conduct inference with an R. Pi and NCS setup. The NCS is not inherently necessary in this setup, because inference can be conducted on the R. Pi using OpenCV with an IR backend. However, the resulting inference speed (about 0.25 FPS in this study) is very low due to the R. Pi's limited computational performance. Adding the NCS to the R. Pi setup increased the inference speed 32 times, to 8 FPS. The inference accuracy of this setup is 91.22% mAP. (It should be noted that multiple NCSs can be used together to further enhance inference speed.) Similar to the standalone Jetson Nano, the R. Pi with NCS setup would also be suitable for applications needing semi-real-time performance. For example, semi-real-time tracking of construction vehicles might be of interest for productivity analytics applications.

The last option evaluated in this study is the use of a desktop PC. Though desktop PCs can be used for inference at the edge, this setup requires expensive fiber optic cables to transfer video feeds in real time. For construction sites already using fiber optic cables for other purposes, this method is workable. With the setup used in the current study, this allowed for one video stream to be processed with an inference speed of 166 FPS and a mAP of 91.20%. (As different TensorFlow versions were used to generate .pb frozen graphs for this setup compared to that of the Jetson TX2 and Jetson Nano, this slight inconsistency between mAPs was anticipated.)

A detailed inference accuracy comparison across the proposed embedded devices can be found in Table 4. A comparison of inference speed for each of the six options discussed is presented in Figure 5a.

Inference efficiency was also investigated. Inference efficiency is measured by dividing inference speed by power consumption, namely, FPS/watt. The Jetson TX2 at maximum performance consumed 15 W of power, the Jetson Nano 10 W, the R. Pi 3B+ and Intel NCS 6 W, and the desktop PC (with a GTX 1080 Ti GPU and Intel Core i7 CPU) is estimated to consume almost 850 W. Figure 5b summarizes the inference efficiency of the different setups investigated in this study. The Jetson TX2 with TensorRT optimization was the most efficient option investigated in this study.

Normalized inference benefit analysis was also conducted for the proposed embedded devices, considering the price of each development kit. At the time of writing of this paper, the prices of the Jetson TX2, Jetson Nano, R. Pi, and Intel NCS were $600, $99, $75, and $75, respectively. The desktop PC used in this study cost roughly $1,700. Figure 5c depicts the studied systems' normalized inference benefit. Based on the results of this analysis, the Jetson Nano with TensorRT optimization offers the highest inference benefit.
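The efficiency and normalized-benefit rankings reported above follow directly from the published numbers; a small sketch of the arithmetic:

```python
# FPS/W (efficiency) and FPS/$ (a simple normalized-benefit proxy) for the
# optimized setups, using the speeds, wattages, and prices reported above.
setups = {
    # name: (FPS, watts, development-kit price in USD)
    "Jetson TX2 + TensorRT":  (47.0,  15,  600),
    "Jetson Nano + TensorRT": (22.0,  10,   99),
    "R. Pi 3B+ + NCS":        ( 8.0,   6,  150),
    "GTX 1080 Ti desktop":    (166.0, 850, 1700),
}
for name, (fps, watts, price) in setups.items():
    print(f"{name:22s} {fps / watts:5.2f} FPS/W   {fps / price:6.3f} FPS/$")
# The TX2 + TensorRT leads in FPS/W (~3.1), while the Nano + TensorRT leads
# in FPS/$ (~0.22), matching the rankings shown in Figures 5b and 5c.
```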
4.2 Model selection and comparative study

In order to choose a suitable object detector, two main components of the model should be considered: the meta-architecture and the feature extractor. Usually, Faster-RCNN (Ren et al., 2015), SSD (Liu et al., 2016), or R-FCN (Dai et al., 2016) is used as the meta-architecture, and Inception (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2016), VGG (Simonyan & Zisserman, 2014), ResNet (He, Zhang, Ren, & Sun, 2016), MobileNet (Howard et al., 2017), etc., or their variants are used as the feature extractor (Huang et al., 2017). Depending on the use case, different combinations of meta-architecture and feature extractor can be used.

Any proposed model can be optimized over two main criteria, that is, inference speed or inference accuracy. Usually, using an accuracy-optimized feature extractor leads to an accurate object detector (Huang et al., 2017).
Inception-ResNet-V2 (Szegedy, Ioffe, Vanhoucke, & Alemi, 2017) has demonstrated an exceptional top-1 accuracy of 80.4% on the ImageNet dataset (Huang et al., 2017). A comparison of different combinations of this feature extractor and various meta-architectures has shown that Faster-RCNN_Inception-ResNet-V2 is the most accurate combination (Huang et al., 2017).

In contrast, speed-optimized feature extractors, having fewer parameters, increase inference speed and lower memory usage (Huang et al., 2017). MobileNet is a good example (Howard et al., 2017). It has been shown that MobileNet with SSD has the least inference time and memory usage and the highest overall mAP (Huang et al., 2017). (It should be noted these results were achieved using the large-scale COCO dataset; T.Y. Lin et al., 2014.)

FIGURE 5 Performance comparison of proposed embedded systems: (a) inference speed, (b) inference efficiency, and (c) inference normalized benefit

Therefore, we wanted to compare the model proposed in this study to feature extractor/meta-architecture combinations reported in the literature as particularly optimized for accuracy and speed, respectively. To compare the inference accuracy of the model presented in this study, we trained a Faster-RCNN_Inception-ResNet-V2 combination with our dataset. Faster-RCNN_Inception-ResNet-V2 was used in particular because it has been reported as the most accurate combination by Huang et al. (2017). To compare the inference speed of our SSD_MobileNet model, we trained an SSD_Inception-V2, because this has been reported as the fastest model after SSD_MobileNet.

To ensure a realistic comparison with our proposed model, we set the image input size for these models to 300 × 300 pixels. The training and validation loss as well as the mAP results can be seen in Figure A1. Review of the results shows that both the model reported in the literature to be particularly accuracy-optimized and that reported as particularly speed-optimized perform worse than our proposed model. The highest mAP achieved by the Faster-RCNN_Inception-ResNet-V2 combination was 75.71%, which is far less than the above-91% mAP of our proposed SSD_MobileNet model. This was expected due to the limited size of our dataset and the huge number of parameters needed to train Faster-RCNN_Inception-ResNet-V2 models. Other studies have reported similar outcomes. For example, Xue and Li (2018) could not accomplish the training of VGG16 due to the very high number of parameters in this model. The slow inference speed of the Faster-RCNN_Inception-ResNet-V2 combination was also anticipated and, indeed, even with the GTX 1080 Ti GPU, use of this model achieved an inference speed of only 3 FPS.

For the SSD_Inception-V2 combination on the GTX 1080 Ti GPU, an inference speed of 85 FPS was achieved. This lower inference speed than with our SSD_MobileNet model (i.e., 85 FPS vs. 166 FPS) was expected, as similar results have also been reported in other studies (Huang et al., 2017). It should be noted that 67.65% was the highest mAP achieved by this model.

4.3 Practical field implementation

At the production level, development carrier boards are not usually used due to the intended application's size and weight constraints. Alternatives include several third-party carrier boards. In this study, Connect Tech's Orbitty carrier board, designed for the Jetson TX2 module (Connect Tech Inc., 2019), was used. This board's size is identical to that of the TX2 module, that is, 87 mm × 50 mm, and it offers USB2, USB3, and Ethernet ports. The process required was detaching the
Jetson TX2 module from the development board, attaching it to the Orbitty carrier board, and reflashing it with the Orbitty carrier board in order to activate all the USB ports.

The input power of the computing unit is 9–14 V DC, which can be supplied via wall power using an AC power adapter or by a DC battery.

To provide examples of how our proposed solution might be implemented in the field, two sets of hardware were proposed and investigated in this study for two distinct use cases. The first is powered by wall power, using an AC-to-DC adapter. This is the preferred setup where wall power is available. A female power adapter was used to connect the AC-to-DC adapter to the computing unit. The second setup is powered by a DC battery and can be used where wall power is not available on the construction site. A 3S LiPo battery was used, which is capable of providing 11.1 V of power. This 5,000 mAh battery is capable of powering the system for almost 4 hr. Because battery voltage will drop as it is used, a voltage-regulator power supply board was used to ensure a consistent 12-V output to the carrier board (O'Kelly et al., 2019).

An actual application was considered to test the proposed solution. Figure A2a illustrates an arbitrary frame from a 2-min video of excavator operation using a fixed camera. The black circles are the centers of the detection bounding boxes in the entire video. As the data were limited, we extrapolated the detection results as if a 4-hr video were available (Figure A2b). Finally, the heatmap of vehicle operation was produced to determine its activity level. Figure A2c is the heatmap generated using the data presented in Figure A2b.

Using the hardware and software proposed in this section, the detection results can be saved and uploaded to the cloud. Then, the heatmaps can be generated using the detection results in order to identify the active or inactive vehicles as well as their level of activity.
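A minimal sketch of the heatmap step follows: the detection-box centers accumulated over the video are rendered as a 2D histogram. Binning and rendering choices here are illustrative.

```python
# Activity heatmap from detection-box centers collected over a video,
# e.g., centers = [((x1 + x2) / 2, (y1 + y2) / 2), ...] per detection.
import numpy as np
import matplotlib.pyplot as plt

def activity_heatmap(centers, frame_w, frame_h, bins=50):
    xs, ys = zip(*centers)
    heat, _, _ = np.histogram2d(ys, xs, bins=bins,
                                range=[[0, frame_h], [0, frame_w]])
    plt.imshow(heat, cmap="hot", extent=[0, frame_w, frame_h, 0])
    plt.colorbar(label="detections per cell")
    plt.savefig("excavator_heatmap.png")
```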
4.4 Hardware/software limitations of proposed solution

The proposed models in this study resize images to 300 × 300 pixels at the beginning of the network. For some applications, this may be a limitation of the current solution. Although resizing contributes to the model's high inference speed, it does make objects smaller. Generally, resizing images of construction vehicles to a smaller resolution is not problematic, because construction vehicles are large and their visual appearance diverse. However, the model should be used with caution for applications involving vehicles extremely far from the monitoring cameras. To ensure inference accuracy similar to that in our study, such images should be checked after resizing to verify that all construction vehicles in the images remain recognizable to the human eye. Moreover, to leverage TensorRT optimizations, the deep learning model must be carefully chosen, because not all neural network layers are supported by TensorRT optimizations by default.

Hardware limitations should also be considered when designing a solution for use in the field. Although in this study the Jetson Nano and the R. Pi with NCS demonstrated substantial normalized benefit, these devices should be used with caution because they have limited memory, in addition to being prone to overheating and, consequently, to system freezing. This should especially be considered in regard to high-temperature construction seasons. Moreover, as the hardware may not be easily accessible after installation on the construction site, its Internet connectivity should also be considered when designing a solution. Both the Jetson TX2 and the R. Pi have a built-in Wi-Fi adapter, whereas the Jetson Nano does not. A PCI Express network adapter or a wireless Internet dongle can be used to connect the Jetson Nano to the Internet.

5 CONCLUSION

A comprehensive deep-learning-based solution for real-time construction vehicle detection has been proposed in this study. Many deep-learning-based computer vision studies neglect solution deployment and, to the best of our knowledge, this is the first study dealing with construction vehicle detection to address this gap. Although this study has focused on solution deployment, it has also covered solution development.

This study's solution development phase involved initial data gathering using available, labeled datasets, followed by use of the web-crawling technique. This study proposes as its object detection model an improved SSD_MobileNet. The model was carefully selected to take into account the hardware-restricted nature of embedded devices. The trained model was then evaluated and its generalizability verified.

This study has proposed three embedded devices to address the needs associated with several scenarios. First, an NVIDIA Jetson TX2 with TensorRT optimizations was introduced as a GPU-accelerated solution for scenarios needing real-time, yet accurate, performance, such as safety- and construction-vehicle-tracking applications. For low-demand applications, both a Jetson Nano and an R. Pi 3B+/Intel NCS setup were proposed. These latter solutions are particularly suitable for scenarios requiring semi-real-time performance, such as productivity and other managerial applications. Among the proposed embedded systems, the Jetson TX2 with TensorRT optimizations had the highest inference speed and efficiency, whereas the Jetson Nano was associated with the highest normalized inference benefit.

Smartphones are sometimes used for the deployment of deep learning models. However, although smartphones can be useful for inspection purposes, they are not suitable to be employed as embedded devices because they cannot be kept in the field for long-term monitoring, especially during harsh
weather conditions. Moreover, smartphones are not optimized for AI applications, and their poor inference speed performance, due to lack of GPUs, has been reported in the literature (Maeda et al., 2018).

To compare the performance of our proposed model, we also conducted a comparative study. This study supported previous findings that the performance of models trained on a large-scale dataset cannot be generalized to models trained on a more limited dataset (Xue & Li, 2018). For example, the Faster R-CNN model that had the highest mAP with the COCO dataset performed much more poorly than our proposed model when applied to this study's smaller construction vehicle dataset. Therefore, it is of vital importance that dataset size be considered when designing deep-learning-based computer vision solutions.

A wall-power setup is preferable for deploying an on-site construction vehicle detection solution, because DC battery setups require adding a voltage-regulator circuit in order to ensure voltage consistency. Moreover, DC batteries offer only limited runtime, and the maintenance of DC batteries, especially LiPo batteries, requires extra caution.

In regard to future studies, construction vehicle detection applications such as safety monitoring, productivity assessment, and construction site management should be investigated. Potential safety monitoring applications to investigate include the use of multiple cameras along with homographic transformation in order to obtain distance information that can then be used to send an alarm signal to vehicles at risk of collision. Productivity assessment applications could include the utilization of tracking information in order to identify active and inactive vehicles. Management applications could include the activation of video-recording triggers in response to predetermined situations in order to improve the management and the security of a construction site. Clearly, there are many potential benefits to applying the real-time monitoring technologies demonstrated feasible in this study.

REFERENCES

Adeli, H. (2001). Neural networks in civil engineering: 1989–2000. Computer-Aided Civil and Infrastructure Engineering, 16(2), 126–142.
Amezquita-Sanchez, J. P., & Adeli, H. (2016). Signal processing techniques for vibration-based health monitoring of smart structures. Archives of Computational Methods in Engineering, 23(1), 1–15.
Amezquita-Sanchez, J. P., Valtierra-Rodriguez, M., & Adeli, H. (2018). Wireless smart sensors for monitoring the health condition of civil infrastructure. Scientia Iranica A, 25(6), 2913–2925. https://doi.org/10.24200/SCI.2018.21136
Arabi, S., & Shafei, B. (2019). Multi-stressor fatigue assessment of steel sign-support structures: A case study in Iowa. Engineering Structures, 200. https://doi.org/10.1016/j.engstruct.2019.109721
Arabi, S., Shafei, B., & Phares, B. M. (2018). Fatigue analysis of sign-support structures during transportation under road-induced excitations. Engineering Structures, 164(2), 305–315.
Arabi, S., Shafei, B., & Phares, B. M. (2019). Investigation of fatigue in steel sign-support structures under diurnal temperature changes. Journal of Constructional Steel Research, 153, 286–297.
Cha, Y.-J., Choi, W., & Büyüköztürk, O. (2017). Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering, 32(5), 361–378.
Cha, Y.-J., Choi, W., Suh, G., Mahmoudkhani, S., & Büyüköztürk, O. (2018). Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering, 33(9), 731–747.
Chakraborty, P., Adu-Gyamfi, Y. O., Poddar, S., Ahsani, V., Sharma, A., & Sarkar, S. (2018). Traffic congestion detection from camera images using deep convolution neural networks. Transportation Research Record: Journal of the Transportation Research Board, 2672(45), 222–231.
Chakraborty, P., Sharma, A., & Hegde, C. (2018). Freeway traffic incident detection from cameras: A semi-supervised learning approach. In 21st International Conference on Intelligent Transportation Systems (ITSC) (pp. 1840–1845). IEEE. https://doi.org/10.1109/ITSC.2018.8569426
Chi, S., & Caldas, C. H. (2011). Automated object identification using optical video cameras on construction sites. Computer-Aided Civil and Infrastructure Engineering, 26(5), 368–380.
Connect Tech Inc. (2019). Orbitty carrier for NVIDIA Jetson TX2/TX2i/TX1. Retrieved from http://connecttech.com/product/orbitty-carrier-for-NVIDIA-Jetson-tx2-tx1/
Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems (pp. 379–387). Curran Associates Inc.
Ding, L., Fang, W., Luo, H., Love, P. E. D., Zhong, B., & Ouyang, X. (2018). A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory. Automation in Construction, 86, 118–124.
Dung, C. V., & Le, A. D. (2019). Autonomous concrete crack detection using deep fully convolutional neural network. Automation in Construction, 99, 52–58.
Ericsson.com. (2016). Ericsson—Hyperscale cloud—Reimagining data centers from hardware to applications. Retrieved from https://www.ericsson.com/en/white-papers/hyperscale-cloud-reimagining-data-centers-from-hardware-to-applications
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88, 303–338.
Fang, Q., Li, H., Luo, X., Ding, L., Luo, H., Rose, T. M., & An, W. (2018). Detecting non-hardhat-use by a deep learning method from far-field surveillance videos. Automation in Construction, 85, 1–9.
Fang, W., Ding, L., Luo, H., & Love, P. E. D. (2018). Falls from heights: A computer vision-based approach for safety harness detection. Automation in Construction, 91, 53–61.
Fang, W., Ding, L., Zhong, B., Love, P. E. D., & Luo, H. (2018). Automated detection of workers and heavy equipment on construction sites: A convolutional neural network approach. Advanced Engineering Informatics, 37, 139–149.
Gao, Y., & Mosalam, K. M. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748–768.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2015). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158.
Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2016). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222–2232.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., … Murphy, K. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7310–7311).
International Electrotechnical Commission. (2016). Edge intelligence. Retrieved from https://www.iec.ch/whitepaper/pdf/IEC_WP_Edge_Intelligence.pdf
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448–456).
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., … Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (pp. 675–678). ACM.
Kim, H., Kim, H., Hong, Y. W., & Byun, H. (2018). Detecting construction equipment using a region-based fully convolutional network and transfer learning. Journal of Computing in Civil Engineering, 32(2). https://doi.org/10.1061/(ASCE)CP.1943-5487.0000731
Kim, H., Kim, K., & Kim, H. (2016). Vision-based object-centric safety assessment using fuzzy inference: Monitoring struck-by accidents with moving objects. Journal of Computing in Civil Engineering, 30(4). https://doi.org/10.1061/(ASCE)CP.1943-5487.0000562
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
Krizhevsky, A., & Hinton, G. (2010). Convolutional deep belief networks on CIFAR-10 (Technical report). University of Toronto.
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., … Ferrari, V. (2018). The Open Images dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, R., Yuan, Y., Zhang, W., & Yuan, Y. (2018). Unified vision-based methodology for simultaneous concrete defect detection and geolocalization. Computer-Aided Civil and Infrastructure Engineering, 33(7), 527–544.
Liang, X. (2019). Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Computer-Aided Civil and Infrastructure Engineering, 34(5), 415–430.
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740–755). Cham: Springer.
Lin, Y., Nie, Z., & Ma, H. (2017). Structural damage detection with automatic feature-extraction through deep learning. Computer-Aided Civil and Infrastructure Engineering, 32(12), 1025–1046.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21–37). Cham: Springer.
Luo, X., Li, H., Cao, D., Dai, F., Seo, J., & Lee, S. (2018). Recognizing diverse construction activities in site images via relevance networks of construction-related objects detected by convolutional neural networks. Journal of Computing in Civil Engineering, 32(3), 4018012.
Luo, X., Li, H., Cao, D., Yu, Y., Yang, X., & Huang, T. (2018). Towards efficient and objective work sampling: Recognizing workers' activities in site surveillance videos with two-stream convolutional networks. Automation in Construction, 94, 360–370.
Maeda, H., Sekimoto, Y., Seto, T., Kashiyama, T., & Omata, H. (2018). Road damage detection and classification using deep neural networks with smartphone images. Computer-Aided Civil and Infrastructure Engineering, 33(12), 1127–1141.
Memarzadeh, M., Golparvar-Fard, M., & Niebles, J. C. (2013). Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors. Automation in Construction, 32, 24–37.
Nabian, M. A., & Meidani, H. (2018). Deep learning for accelerated seismic reliability analysis of transportation networks. Computer-Aided Civil and Infrastructure Engineering, 33(6), 443–458.
NVIDIA Developer. (2019a). Autonomous machines. Retrieved from https://developer.NVIDIA.com/embedded-computing
NVIDIA Developer. (2019b). NVIDIA TensorRT. Retrieved from https://developer.NVIDIA.com/tensorrt
O'Kelly, M., Sukhil, V., Abbas, H., Harkins, J., Kao, C., Pant, Y. V., … Bertogna, M. (2019). F1/10: An open-source autonomous cyber-physical platform. arXiv:1901.08567.
Olston, C., & Najork, M. (2010). Web crawling. Foundations and Trends in Information Retrieval, 4(3), 175–246.
Park, M.-W., Elsafty, N., & Zhu, Z. (2015). Hardhat-wearing detection for enhancing on-site safety of construction workers. Journal of Construction Engineering and Management, 141(9), 4015024.
Raspberrypi.org. (2019). Teach, learn, and make with Raspberry Pi. Retrieved from https://www.raspberrypi.org/
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91–99). NIPS.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Shao, L., Zhu, F., & Li, X. (2015). Transfer learning for visual categorization: A survey. IEEE Transactions on Neural Networks and Learning Systems, 26(5), 1019–1034.
Sifre, L. (2014). Rigid-motion scattering for image classification (Ph.D. dissertation). Ecole Polytechnique, CMAP.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Software.intel.com. (2019). Intel Movidius Neural Compute Stick. Retrieved from https://software.intel.com/en-us/movidius-ncs
Son, H., Choi, H., Seong, H., & Kim, C. (2019). Detection of construction workers under varying poses and changing background in image sequences via very deep residual networks. Automation in Construction, 99, 27–38.
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA (pp. 1–9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV (pp. 2818–2826).
TensorFlow. (2019). TensorFlow. Retrieved from https://www.tensorflow.org/
Xue, Y., & Li, Y. (2018). A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure Engineering, 33(8), 638–654.
Yeum, C. M., & Dyke, S. J. (2015). Vision-based automated crack detection for bridge inspection. Computer-Aided Civil and Infrastructure Engineering, 30(10), 759–770.
Zhang, A., Wang, K. C. P., Li, B., Yang, E., Dai, X., Peng, Y., … Chen, C. (2017). Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10), 805–819.

How to cite this article: Arabi S, Haghighat A, Sharma A. A deep-learning-based computer vision solution for construction vehicle detection. Comput Aided Civ Inf. 2020;35:753–767. https://doi.org/10.1111/mice.12530
APPENDIX
FIGURE A1 Behavior of the detection models over training iterations: (a) training loss of Faster-RCNN_Inception-ResNet-V2, (b) validation loss and mAP of Faster-RCNN_Inception-ResNet-V2, (c) training loss of SSD_Inception-V2, and (d) validation loss and mAP of SSD_Inception-V2
FIGURE A2 Construction vehicle activity level identification: (a) an arbitrary frame from a 2-min video of excavator operation overlaid with centers of detection bounding boxes, (b) extrapolation of the results presented in (a) for a 4-hr video, and (c) excavator activity heatmap generated using data presented in (b)