
AI-Driven Structural Damage Detection

Structural Health Monitoring
XX(X):1–12
© The Author(s) 2021
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/ToBeAssigned
www.sagepub.com/
SAGE

Engineering deep learning methods on automatic detection of damage in infrastructure due to extreme events

Yongsheng Bai, Bing Zha, Halil Sezen and Alper Yilmaz

Abstract
This paper presents a few comprehensive experimental studies for automated Structural Damage Detection (SDD) in extreme events
using deep learning methods for processing 2D images. In the first study, a 152-layer Residual network (ResNet) is utilized to classify
multiple classes in eight SDD tasks, which include identification of scene levels, damage levels, material types, etc. The proposed
ResNet achieved high accuracy for each task while the positions of the damage are not identifiable. In the second study, the existing
ResNet and a segmentation network (U-Net) are combined into a new pipeline, cascaded networks, for categorizing and locating
structural damage. The results show that the accuracy of damage detection is significantly improved compared to only using a
segmentation network. In the third and fourth studies, end-to-end networks are developed and tested as a new solution to directly
detect cracks and spalling in the image collections of recent large earthquakes. One of the proposed networks can achieve an accuracy
above 67.6% for all tested images at various scales and resolutions, and shows its robustness for these human-free detection tasks.
As a preliminary field study, we applied the proposed method to detect damage in a concrete structure that was tested to study its
progressive collapse performance. The experiments indicate that these solutions for automatic detection of structural damage using
deep learning methods are feasible and promising. The training datasets and codes will be made available for the public upon the
publication of this paper.

Keywords
Deep learning, structural damage, crack detection, spalling detection, ResNet, U-Net, cascaded networks, Mask R-CNN

Introduction

Artificial Intelligence (AI) began as an academic research subject in the 1950s, and the commercialization and potential applications of AI are now being pushed in almost all industries. Machine learning, a subfield of AI, is a vital discipline that develops algorithms for learning from data, identifying patterns and making decisions without human intervention. As a subcategory of machine learning, deep learning provides state-of-the-art results on problems that were initially considered to be solved intuitively by humans. Deep learning models learn from experience and evolve through training and testing, and they are particularly effective at learning complicated concepts by themselves [1]. Thus, deep learning can capture and represent a knowledge base and reason like a real person [2].

Computer vision is the science of processing information and gaining high-level understanding from digital images and videos. Once developed and perfected, it can serve as a human vision system for AI agents. The recent breakthrough achieved in large-scale image classification on ImageNet [3] using Convolutional Neural Networks (CNNs) significantly accelerated the development of vision-based technologies, and deep learning became an essential tool for computer vision. Two common techniques, classification and segmentation, are used in practice for interpreting the scenes represented in the images or videos acquired from cameras. Categories of objects are predicted through image classification but, in image segmentation, the pixels are labeled by classes of the objects (i.e., semantic segmentation) or the objects are marked by masks (i.e., instance segmentation) [4].

Because structural damage may or may not be captured in good focus, the quality of the damage imagery from field investigations in extreme events varies. In addition to the variation of structural damage on different materials, decoration layers or covers on the structural components can also affect the appearance of the damage. Deep learning methods can effectively handle these types of uncertainties through data collection and training, which makes AI applications viable for SDD and Structural Health Monitoring (SHM) with vision-based technologies [5]. For example, wall-climbing robots and Unmanned Aerial Vehicles (UAVs) have been used in real projects with deep learning networks for collecting data and detecting the cracks in a tunnel [6]. Deep learning is a technique with great potential for measuring and assessing the damage observed in laboratory experiments, field investigations, and annual inspections of existing infrastructure [7].

Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, USA.
Corresponding author: Yongsheng Bai, Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, USA.
Email: [email protected]

Prepared using sagej.cls


Bai et al. 2
Figure 1. Three scene levels (scales) in our models: pixel level, object level and structural level.

The observability and detectability of structural damage are affected to a great extent by its scale in images. When structural damage is captured at varying scales, e.g., when cameras are closer to or farther away from it, the detected shapes and number of damaged regions may look quite different. The appearance of background structural elements will also change in images [8]. Figure 1 shows examples of cracking and spalling damage at three scales or scene levels: pixel, object and structural levels. At the pixel level, cracks and spalling are clearly captured, but structural components including columns, beams, walls and slabs are only partially captured and cannot be identified accurately. These components can be recognized in object-level images. At the structural level, entire structures, e.g., buildings or bridges, can be observed along with the damage. The cracks and spalling, which have various shapes and depths depending on the surroundings, are less visible or even invisible at larger scales. Therefore, it is necessary to include representative images at different scales before annotation. This is the best way to counteract the imbalance of training samples in practice [9].

Aiming at practical solutions for AI robots in extreme events (e.g., earthquakes), when human experts may not be readily available in the affected regions or it may be too dangerous for engineers to closely inspect the damaged infrastructure, this research attempts to reduce the reliance on human experts by developing and improving deep learning procedures to automatically detect and classify infrastructure damage.

Four consecutive SDD studies are conducted with deep learning methods in our research. These studies include: 1) classification of eight classes of damage by a ResNet, 2) a pipeline with two-step networks, called cascaded networks, to classify and locate damage such as cracks and spalling, and 3) a solution for detecting the damage (e.g., cracks, or spalling and cracks together) directly with state-of-the-art deep learning methods. The flow diagrams that show how these networks predict structural damage are given in Figures 3, 4 and 9. To our knowledge, very limited research has been conducted to address the difference between classification and detection of structural damage with deep neural networks [9], and no solution has been provided to unify them. Also, few image datasets collected from large earthquake events have been tested to automatically detect the damage and assess the feasibility of applying deep learning methods in these events. In addition, we applied the proposed end-to-end deep learning method to automatically detect damage that occurred in the field during the gradual collapse of a building.

Our objective is to find a generalized solution for SDD that classifies multiple types and levels of damage on reinforced concrete and masonry structures and localizes the damage at various scales using images collected from field investigations or laboratory experiments. Our studies also aim to perform real-time SDD with AI agents after finalizing all the parameters and achieving stable performance. Thus, a structural engineer can utilize UAVs or ground vehicles to quickly and safely access structures following an extreme event or during a periodic inspection. This will reduce the workload of structural engineers and improve the efficiency of damage assessment during field inspections. The data obtained and used in our research and the code will be made available to the public for reproducibility, general use, and continued work.

Related work

Our studies benefit from many prior works, which can be categorized into classification and segmentation techniques with various deep learning methods. We also discuss the datasets that were used for training and testing because they are critical for the successful application of these methods.

Datasets and damage classification with deep learning in SDD and SHM

There are several important classification datasets for SDD and SHM in the research community. Yeum et al. [10] collected a large image dataset for post-event building reconnaissance and used AlexNet to classify and identify post-event structural damage in buildings using large-scale images, for tasks such as collapse classification and identification of building components. A Regional Convolutional Neural Network (R-CNN) was employed to localize the spalling damage in some images. Furthermore, they also provided datasets about Global Positioning System (GPS) devices, structural drawings, timestamps, and measurements to automatically classify the context information when these images were documented during post-event field investigations [11]. Meanwhile, Gao and Mosalam [8] set up the PEER Hub ImageNet (Phi or φ-Net) Challenge to encourage researchers to test their methods on a collection of images of building structural failures. There are 36,413 pairs of images and labels at various scales in this benchmark dataset. The φ-Net dataset contains eight classification tasks: 1) pixel, object, and structural scene levels; 2) damaged or undamaged state of the structures; 3) spalling or non-spalling; 4) material types such as steel, concrete and others; 5) various types of collapse of the structures; 6) component types like beams, columns, walls and others; 7) damage levels or severity; and 8) damage types for cracking, including bending-related damage, shear-related damage and combined damage, or no damage (non-cracking). To overcome the insufficiency of training data, Visual Geometry Group (VGG) network and
transfer learning were employed to perform classification in their study [12]. In our studies, the images for Tasks 1, 3 and 8 (see Table 1) are used for training and testing the proposed methods.

Damage detection with deep learning in SDD and SHM

Classification networks do not provide information about where the damage is in an image. Therefore, structural engineering experts need to locate the positions and identify the type of damage by themselves, while it is impossible for non-professionals to do that. For example, the bottom right image in Figure 1 shows spalling and cracking damage on the columns in the first and second stories of the building. The classification models would associate this image with the corresponding classes of damage but cannot give the locations of the damage. Therefore, a segmentation network, which gives the class of each object and locates it with a bounding box or marks it with a mask, is needed for localizing such damage in SDD missions.

Some research focused on large-scale images or multiclass damage detection. Hoskere et al. conducted an experiment with 23-layer ResNet and nine-layer VGG networks to classify and segment seven classes of structural damage, including cracks, spalling, exposed reinforcement, corrosion, fatigue cracks, asphalt cracks, and no damage [13]. Ali et al. applied Faster R-CNN to defect detection in historical masonry buildings with high-resolution images [14]. Kong and Li described an application that detects and tracks the propagation of cracks in a steel girder with a video stream [15]. Atha et al. explained the different effects observed when they utilized two CNN algorithms for detecting metallic corrosion [16]. Mondal et al. used Faster R-CNN to automatically detect four common types of structural damage, including surface cracks, facade and concrete spalling, and severe damage with exposed rebars and buckled rebars. They used bounding boxes to identify the positions and boundaries of this damage [17].

Pixel-level damage detection is a popular task among researchers. Zhang et al. proposed an improved CNN for autonomous detection of pavement cracks at the pixel level [18]. Liu et al. demonstrated the application of U-Net to segment cracks on concrete structures [19], and their experiment shows that the proposed network outperforms the CNN that was used by Cha et al. [20]. Dung and Anh also used a Fully Convolutional Network for localizing cracks on the concrete surface [21]. Liu et al. implemented DeepCrack, which is made of an extended Fully Convolutional Network (FCN) and a Deeply-Supervised Net (DSN), to pinpoint pixel-wise cracks [22].

Recent research became more applicable. With a Holistically-Nested Edge Detection (HED) network and U-Net, Yang et al. detected cracks and spalling on concrete structures and then reconstructed 3D models through Simultaneous Localization and Mapping (SLAM) using drone images [6]. Cha et al. utilized Fast R-CNN for locating five types of structural damage, including concrete cracking, steel corrosion with two levels (medium and high), bolt corrosion, and steel delamination. There are a total of 2,366 labeled images with a size of 500×375 for training [23]. Kim and Cho automatically localized the cracks on a concrete wall with Mask R-CNN and employed an additional image processing procedure on each bounding box to quantitatively measure the width of these cracks. The training data included 376 images [24]. Based on 1,250 images with sizes varying from 344×296 to 1,024×796, Kalfarisi et al. employed structured random forest edge detection in the region of the bounding boxes of a Faster R-CNN to localize the cracks and compared it with Mask R-CNN. Photogrammetry software was used to reconstruct a 3D model so that the cracks can be visualized and quantified further [25].

The aforementioned research helped us to collect data and create training datasets when we began our studies, to understand how to use deep learning methods correctly and effectively, and to find the right solutions for the problems we are facing in practice.

Data preparation and methodologies

Data must be prepared for training and testing according to the objectives and expectations of each deep learning application in SDD. Each method has its own requirements on the size and composition of the visual data. In general, there are more available datasets for a classification network than for a segmentation network for detecting structural damage, since the latter needs more effort for labeling the image data [26].

Tools for data preparation

Since the SDD models are trained in a supervised way (i.e., damage is clearly defined in training datasets), data collection and preparation are vital to train and validate the deep learning models. In current research, these data must be labeled manually by structural engineers or by people with a civil engineering background. Many researchers [10, 23, 12, 9] have shown how to perform data preparation, such as associating the classes of damage with the images, annotating the location of the damage with a bounding box, or defining the boundaries and shape of the damage for structural damage classification or detection.

In our studies, COCO (Common Objects in Context) Annotator [27] is chosen as a tool to label the damage (i.e., cracks and spalling) on infrastructure, such as buildings, bridges and other structures except for steel structures, and even on some non-structural components, for training and validation. The boundaries of damage on an image are defined with polygons that form a closed region to represent them, so that the error of the damage shapes will be no greater than one pixel when labeled. Then, these regions are converted to labels as binary images and saved as a JSON (JavaScript Object Notation) file for training. Figure 2 shows some examples of original images and labels of cracks and spalling in our training datasets. Note that each type of damage belongs to its own class and is labeled independently of the others, although they look overlaid in some annotated images when they appear at the same location.
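The labeling step described above (polygons converted to per-class binary labels and saved as JSON) can be sketched as follows. This is only a minimal illustration in pure Python, not the COCO Annotator export format or the authors' code: the function names `polygon_to_mask` and `build_label`, the JSON layout, the file name `example.jpg`, and the toy coordinates are all assumptions made for the example.

```python
import json

def polygon_to_mask(polygon, width, height):
    """Rasterize one closed polygon into a binary mask (rows of 0/1).

    A pixel is marked 1 when its centre (x + 0.5, y + 0.5) lies inside the
    polygon according to the even-odd (ray casting) rule.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(polygon)
    for y in range(height):
        cy = y + 0.5
        for x in range(width):
            cx = x + 0.5
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # Only edges that straddle the horizontal line y = cy can be crossed.
                if (y1 > cy) != (y2 > cy):
                    # x-coordinate at which the edge crosses that line
                    x_cross = x1 + (cy - y1) * (x2 - x1) / (y2 - y1)
                    if cx < x_cross:
                        inside = not inside
            if inside:
                mask[y][x] = 1
    return mask

def build_label(image_name, width, height, annotations):
    """Build a hypothetical label record with one binary mask per damage class.

    `annotations` is a list of (class_name, polygon) pairs. Each class keeps
    its own mask, so overlapping cracks and spalling stay independent.
    """
    masks = {}
    for class_name, polygon in annotations:
        new = polygon_to_mask(polygon, width, height)
        if class_name in masks:
            # Several polygons of the same class are merged with a pixel-wise OR.
            old = masks[class_name]
            new = [[a | b for a, b in zip(r_old, r_new)]
                   for r_old, r_new in zip(old, new)]
        masks[class_name] = new
    return {"image": image_name, "width": width, "height": height, "masks": masks}

# Toy 8x6 image with one thin crack and one spalling region (made-up coordinates).
annotations = [
    ("crack", [(1, 1), (7, 1), (7, 2), (1, 2)]),
    ("spalling", [(2, 2), (6, 2), (6, 5), (2, 5)]),
]
label = build_label("example.jpg", 8, 6, annotations)
label_json = json.dumps(label)  # string that could be written to a .json file
```

Keeping one mask per class mirrors the statement that each damage type is labeled independently even when regions overlap. A production pipeline would more likely rely on an imaging library for rasterization than on the per-pixel loop used here.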


Figure 2. Some examples of training data, shown as pairs of original image and label (cracks and spalling are in yellow and green while background is in purple for each labeled image).

Structural damage classification with a ResNet

ResNet provides higher accuracy than networks like VGG and GoogleNet because of its unique framework. The residuals in each layer of this neural network can be set to zero, and the whole hierarchy of feature combinations can be optimized with skip connections. Therefore, the network can be designed to have a large number of layers for extracting high-level features [28]. A 152-layer ResNet with transfer learning and fine-tuning was used in our damage classification network for the PEER Hub ImageNet (PHI) Challenge [12], where our approach secured third place in the competition [29] (see results at https://apps.peer.berkeley.edu/phichallenge/winner/). The flowchart of our ResNet is shown in Figure 3.

Cascaded networks for structural damage classification and localization

After using the ResNet to categorize various structural damage, material types and even the severity of damage in these images, we still need to delineate the damage location during SDD. Without delineation, structural engineers would still have to mark the damage locations manually. Instead of requiring structural engineering experts to manually identify the damage, a new pipeline is provided [9], similar to the process of diagnosing an illness using a combination of doctors' personal experience, medical equipment, and available patient data. In this study, the classification network acts as a classifier that tells whether there are structural defects in the selected images, and a segmentation network (e.g., U-Net) serves as a detector to locate them. These two-step networks are named cascaded networks. Their flowchart for detection is shown in Figure 4.

In cascaded networks, the existing classification networks used by researchers on SDD can be kept without any change, but a segmentation network is added after the structural damage has been categorized. Among the available architectures, U-Net [30, 31] is chosen as the detector to locate the damage in our study. The U-Net has a symmetrical structure in its down-sampling and up-sampling processes, and each down-sampling layer is connected to the corresponding up-sampling layer. Thus, low-level features from the down-sampling path can be directly absorbed into high-level features during up-sampling. In our practice, the cascaded networks are utilized as a method to find the positions of cracks and spalling in images.

Structural damage detection of cracks and spalling with Mask R-CNNs as an end-to-end method

The latest Mask R-CNNs are tested and updated for detecting the damage directly because cascaded networks are not an end-to-end method but a two-step network, which may be time-consuming and complicated for a structural engineer. In addition, high-resolution images have to be resized to a low resolution in the cascaded pipeline (more details are discussed in the implementation section), which affects the visibility of structural damage in images. Mask R-CNN is a benchmark among regional convolutional neural networks for instance segmentation. It is based on Faster R-CNN [32], which uses a Region Proposal Network (RPN) to automatically produce the proposals for Regions of Interest (ROIs) on feature maps convolved from the original image and achieves higher speed and accuracy at low computational cost. This is also the first stage of the Mask R-CNN. In the second stage, Mask R-CNN continues to predict the damage, such as spalling, on an image by classifying it and regressing a bounding box for it. Then, in the third stage, a mask of the damage is created within its boundaries and shape for each ROI using ROI Align (see Figure 5). Furthermore, ResNet and a Feature Pyramid Network (FPN) are incorporated to obtain high-quality feature maps [33].

Three variants of Mask R-CNN are explored and developed in our studies. Meanwhile, with awareness of the scale problem in SDD, new datasets that include all three defined scales are prepared, and newly developed deep learning techniques are also employed to improve performance.

1) Mask R-CNN with Path Aggregation Network (PANet) and spatial attention mechanism: PANet differs from the original Mask R-CNN in its first and second stages [34]. New connections between low-level and high-level feature maps in an FPN have been introduced (see Figure 6a), which increase the efficiency of feature extraction. In addition, a technique called adaptive feature pooling is used to fuse all levels of features in a proposed ROI at stage 2 in Figure 6c. The other procedures and stages are the same as in a traditional Mask R-CNN. To improve the level of feature extraction, a spatial attention mechanism [35] is also introduced into this framework. We will refer to this modified version as APANet Mask R-CNN in this paper.

2) Mask R-CNN with High-resolution Network (HRNet): HRNet is a state-of-the-art backbone developed for feature extraction that applies multi-scale fusion across the convolutional blocks [36]. With a traditional FPN, the features of an instance are embedded from the image via down-sampling to obtain different levels of features, or in other words, rescale the image from original size to a smaller size using
convolutional operations at each level to obtain higher-level features (see Figure 6a). But HRNet has a parallel structure, as shown in Figure 7. After each convolutional operation block, a down-sampling process is utilized for high-level features as an FPN does. However, a new branch is also created to keep the size of the feature maps at this level. By introducing low-level features with a strided convolutional operation and combining high-level features with up-sampling operations, these new branches continue to convolve until the last step of the final stage. Since these inter-connections between branches are used, more useful features can be extracted through this new network for high-resolution images. After substituting the ResNet of the original Mask R-CNN with the HRNet, it forms a new network structure which we refer to as HRNet Mask R-CNN.

3) Cascade Mask R-CNN: Mask R-CNN has two prediction branches after the first stage (represented by the convolution blocks in Figure 8a), in which the feature maps are extracted from the input image and potential ROIs are proposed by an FPN. These two branches can generate the class C, bounding box B, and mask S for each instance in the image. They are also called the detection and segmentation branches. Cascade Mask R-CNN introduces another two prediction branches with different strategies to detect and segment instances (see Figures 8b to 8d). Because more pooling operations are applied to ROI proposals on the feature maps, it can overcome the overfitting problem that exists in the original Mask R-CNN [37].

These Mask R-CNNs are used to predict the class of the damage, limit the damage range with bounding boxes, and mark the shapes and boundaries of the damage with masks on a new image as the output. Figure 9 shows the flowchart of this end-to-end methodology for testing on new images. All the training parameters and techniques of these deep learning methods are described in the Implementation section.

Figure 3. Flowchart of our 152-layer ResNet for structural damage classification.

Figure 4. Flowchart for cascaded networks for testing. (Flow: an image as input → classifying spalling and cracking by ResNet → spalling or cracking → U-Net → output: a new image overlaid with crack or spalling masks over the original image; the other classifier outcome is non-spalling or non-cracking.)

Figure 5. Framework for Mask R-CNN in an SDD task (a mask is in purple for the spalling damage on the column, and a bounding box is represented with a dashed green line).

Figure 6. Framework for PANet in SDD for detecting cracks and spalling: (a) FPN backbone; (b) Bottom-up path augmentation; (c) Adaptive feature pooling; (d) Box branch; (e) Fully-connected fusion. Pi and Ni in the figures denote the ith of the original pyramid layers and new feature layers.

Implementation details and results

In this section, the implementation details of the aforementioned techniques for automatic classification and detection of structural damage are presented.

Automated classification for eight tasks of damage detection with the ResNet

ResNet outperformed other networks in many classification tasks, so we chose a 152-layer ResNet to identify material or damage types, structural components, spalling or non-spalling, and even the severity of structural collapse and damage in the


competition [12]. The image size of the φ-Net dataset is uniformly 224×224, but the images are at various scales. For training, hyper-parameters are set as follows: learning rate is 0.001 and momentum is 0.9; the loss function is cross-entropy; 40 mini-batches and 100 epochs are defined to maximize GPU usage. An NVIDIA GeForce RTX 2080 Super GPU is used for training and testing. Furthermore, the classification result of each task is evaluated by using the confusion matrix, where the diagonal elements denote true predictions. The accuracy metric therefore represents the percentage of correct predictions on each task and is defined as:

\[ \text{Accuracy} = \frac{\text{Number of True Predictions}}{N} \tag{1} \]

where N = total number of samples.

Testing results are shown in Table 1. The ResNet model can identify scene levels (scales) and material types with a very high accuracy, while the accuracy for the severity of collapse and damage and for the types of damage is not very high but acceptable considering the difficulties associated with such classification tasks.

Figure 7. Framework for HRNet in SDD.

Figure 8. Framework for the original Mask R-CNN (a) and three Cascade Mask R-CNNs (b)-(d). "Image" is the input, "Convolution" is the backbone for convolutional operation on the input image, "Pool" is region-wise feature extraction, "H" is the RPN head, "B" is bounding box, "C" is classification, and "S" denotes a segmentation branch.

Figure 9. Flowchart of Mask R-CNNs used for structural damage detection in the testing stage. (Flow: an image as input → Mask R-CNNs → class of the damage (cracks, spalling, and non-prediction for non-cracking and non-spalling); the damage range predicted with bounding boxes; boundaries and shapes of the damage marked by masks → output: a new image overlaid with category names, bounding boxes, and masks of cracks and spalling over the original image.)

Automated damage detection with cascaded networks

Cascaded networks were used to identify and localize cracks in the first session, and then to detect the spalling damage in the second session as new datasets were introduced in this study. In the pipeline, the ResNet has the same parameters and setup as in our prior study [29], while the learning rate and loss function of the U-Net are set to 0.0001 and binary cross-entropy. We used the same GPU as before. In the detection test with the U-Net, a prediction counts as correct when the model marks at least one piece of crack or spalling. Each pair of original image and prediction is checked and placed into the corresponding folder of correct or incorrect predictions; thus, the ratio of correct predictions to the total number of testing images is the accuracy of the U-Net.

1) Cascaded networks for crack detection [9]: On one hand, 1,000 images at pixel level and 853 images at object and structural levels were labeled with COCO Annotator and used for training the U-Net. All these images were resized to 256×256 in order to be compatible with the size of images in the φ-Net dataset [12]. The ResNet was trained with Task 8 of the φ-Net dataset (see Table 1). On the other hand, some images and another publicly available dataset called Concrete Surface Crack (CSC) [38] were tested. The latter has an image size of 227×227. Note that the images did not need to be resized for the U-Net; they were resized to 224×224 only for the ResNet in this study.

There are a total of 40,000 images in the CSC dataset, half cracking and half non-cracking, all at pixel level. We tested the proposed cascaded networks with this dataset. All the images are identified as pixel-level ones by the ResNet,
Bai et al. 7
Table 1. Classification results on testing data of φ-Net dataset with the 152-layer ResNet.
Detection tasks Number of Image statistics Testing accuracy of
classes Training Validation Testing the ResNet

1 Scene classification 3 13,939 3,485 4,356 93.8%


2 Damage check 2 4,730 1,183 1,479 81.9%
3 Spalling condition 2 2,635 659 824 79.6%
4 Material type 2 3,470 867 1,085 99.5%
5 Collapse check 3 2,105 527 658 63.1%
6 Component type 4 2,104 526 658 71.7%
7 Damage level 4 2,105 527 658 67.8%
8 Damage type 4 2,105 527 658 67.5%
and the average accuracy for cracking and non-cracking is 91.2%. For the images categorized as
cracking, the U-Net precisely marks the cracks and has no failure cases in the remaining 19,000
images after 1,000 images were labeled and used for training.

The implementation of cascaded networks on the φ-Net dataset is a dilation study. The test
includes the following procedures. First, new testing data were selected from the training and
validation images in Task 1 of the φ-Net dataset (see Table 1). Second, the U-Net was used to mark
the cracks directly and its accuracy was calculated. Third, the cascaded networks were applied to
these testing data, with the ResNet and the U-Net used to classify and locate cracks in the
images. Finally, the accuracy of the cascaded networks was computed. The result of this experiment
is shown in Table 2. It shows that the accuracy of the cascaded networks improves dramatically
because the ResNet acts as the first gate to filter out some images without cracks on structural
elements, so the U-Net can focus on less noisy images to mark the cracks.

2) Cascaded networks for spalling detection: For this test, the ResNet was trained in Task 3 of
the φ-Net dataset (see Table 1), and 1,178 images were prepared for training the U-Net to mark the
spalling. Training data for the latter are from the collection of Yang et al.6 and our own
work39. There are two datasets for testing. The first one is a spalling dataset by Yeum et al.10,
in which there are 1,000 images with a uniform size of 640×480. All these images show spalling
damage on the structural components. The U-Net achieved an accuracy of 99.0% in detecting the
spalling, but the accuracy of the ResNet is 85.6% for classifying these images after they were
resized to 224×224. On the other hand, a total of 1,692 images labeled as spalling in Task 3 of
the φ-Net dataset were directly tested by the U-Net, and its accuracy reached 97.6%. The ResNet
for the φ-Net spalling dataset has an accuracy of 79.6%, as shown in Table 1. We did not use the
U-Net to locate the spalling damage right after the ResNet because the former has such a high
accuracy on these two datasets. The accuracy of cascaded networks for this test is determined by
the ResNet rather than by the U-Net.

Some examples of good predictions from the cascaded networks are shown in Figure 10. It can be
seen that the proposed networks typically provide precise locations of these cracks and spalling,
so there is less reliance on human experts to manually find them. It should be pointed out that
it takes more time for the cascaded networks to classify and locate the damage than when the
ResNet or the U-Net is used alone.

An end-to-end method to automatically detect cracks and spalling by using Mask R-CNNs

Since two networks are involved in the cascaded networks and the training image data, especially
those for the ResNet, have a low resolution, we realized that there is a great need to simplify
the framework and make it applicable to images at different definitions. On one hand, when
high-resolution images are resized to smaller ones, tiny damage such as cracks may become less
visible or even invisible. The useful information about the damage is preserved if the size of
these images is unchanged, so the networks can extract more precise features from the original
images. On the other hand, the U-Net is not suitable for segmenting multiple types of damage
simultaneously. Therefore, our next goal is to find and assess end-to-end neural networks that
localize damage such as cracks and spalling when it is captured at different scales and varied
resolutions.

The accuracy of these predictions is redefined, as in Eq. (1), by whether the models correctly
detect one or two kinds of structural damage. To be more specific, a prediction is correct when
at least one crack or spalling region is marked by a bounding box or a mask in a testing image
where the damage is visibly captured. It is also correct when no prediction is given for an image
without any cracks or spalling on the structure. Each prediction from the models is compared with
the original image before being moved into the corresponding folder of correct or incorrect
detections. In addition, the parameters for training the Mask R-CNNs are set as follows: the
learning rate, momentum and weight decay rate are 0.002, 0.9 and 0.0001, respectively; the loss
function is cross-entropy for the mask and smooth L1 for the bounding boxes; the number of epochs
is 100, with the same GPU used in our previous studies. Our source code originated from
MMDetection40, with some modifications and updates made for different purposes during training
and testing.

1) Crack detection with APANet Mask R-CNN and HRNet Mask R-CNN26: This pipeline is an end-to-end
solution for damage detection, in which a total of 2,021 images with sizes from 168×300 to
4,600×3,070 and at three scales were labeled. The annotated data format is similar to the
examples shown in Figure 2, while the original images came from our own collection and internet
search. The performance of APANet Mask R-CNN on the φ-Net dataset is shown in Table 2. Compared
to the cascaded networks, this Mask R-CNN shows a dramatic improvement for images at larger
scales other

Table 2. The accuracy of the U-Net, cascaded networks and APANet Mask R-CNN (APAN Mask) on
detecting cracks in φ-Net dataset (TP means true predictions).
                   Number of   U-Net              Cascaded networks             APAN Mask
Scene levels       images      TP      Accuracy   ResNet   U-Net   Accuracy     TP      Accuracy
Pixel level        4,661       2,819   60.5%      2,988    2,810   94.0%        3,948   84.7%
Object level       5,713       1,490   26.2%      1,479    1,129   59.6%        4,407   77.1%
Structural level   5,832       500     8.6%       717      356     49.7%        4,774   81.9%

than those at pixel level. In addition, APANet and HRNet Mask R-CNNs were directly applied to the
2017 Pohang earthquake images dataset (PEI2017)41 and the 2017 Mexico City earthquake images
dataset (MCEI2017)42, which include 4,109 and 4,136 high-resolution images, respectively,
collected by structural experts after these two earthquakes of Richter magnitude 5.2 and 7.1.
Table 3 shows the accuracy of the two models; both Mask R-CNNs have similar accuracy on the two
testing datasets. This test indicates that it is possible for end-to-end deep learning methods
like the latest Mask R-CNNs to precisely detect cracks at various scales in large earthquake
events.

Table 3. Accuracy of APANet and HRNet Mask R-CNN for crack detection on two public datasets.
Methods               PEI2017   MCEI2017
APANet Mask R-CNN     74.1%     70.6%
HRNet Mask R-CNN      74.0%     73.0%

2) Spalling and crack detection with new variants of Mask R-CNN as an end-to-end method39: Since
APANet and HRNet Mask R-CNNs worked quite well for crack detection, we added the spalling damage
into the detection task to check whether this solution is more robust and applicable in field
investigation. Images from Yang's spalling dataset6 and our in-house generated dataset were
relabeled, resulting in a total of 2,229 curated images for training and validation. The size of
these images varies from 147×288 to 4600×3070. The examples are shown in Figure 2. In addition, a
new variant of Mask R-CNN was introduced for comparison, resulting in three different trained
Mask R-CNNs that are tested. Data augmentation43 was utilized for improving performance.

More diverse testing data are included for the purpose of detecting two major types of damage in
extreme events. On one hand, all of the spalling, non-spalling, cracking, and non-cracking images
of training and validation data in Tasks 3 and 8 of the φ-Net dataset were collected and combined
into a new testing dataset, which is called φ-Net CrSp dataset in Table 4. It consists of 5,853
images in total and is a low-resolution but comprehensive dataset. On the other hand,
high-resolution images from the PEI2017 and MCEI2017 datasets were still used for testing these
three networks. The performance of these three networks on these datasets is shown in Table 4.
APANet Mask R-CNN has a better performance than the other two on detecting cracks and spalling,
whether the images are at low or high resolution. Overall, these Mask R-CNNs can achieve an
accuracy of at least 66.0% for detecting the two major types of structural damage in
high-definition images, although the performance of APANet and HRNet Mask R-CNNs declines a
little for spalling and crack detection compared to their performance on crack detection for the
same two datasets.

Table 4. Accuracy of three Mask R-CNN for cracks and spalling detection on three public datasets.
Methods               φ-Net CrSp   PEI2017   MCEI2017
Cascade Mask R-CNN    78.9%        66.0%     69.4%
APANet Mask R-CNN     81.1%        67.6%     74.7%
HRNet Mask R-CNN      58.6%        68.1%     69.1%

Figure 11 illustrates the predictions made by these Mask R-CNNs on the testing datasets. Cracks
and spalling at various scales and different resolutions can be identified and located
automatically, which will reduce the need for human interaction in locating damage following an
extreme event. It should be noted that the limited training data at hand is the major reason for
incorrect predictions in our studies. In addition, the imbalance among the training datasets for
damage localization is still hard to overcome. As a result, the deep learning models are easily
distracted by objects that look like cracks and spalling in images, such as: 1) wires or cables;
2) trees; 3) fences; 4) shadows; 5) edges of windows, buildings or other artificial objects39. By
refining a few parameters in the testing step, we also observed that the exact shapes and numbers
of the damage can be precisely marked in some images. For some SDD tasks, new damage-related
images need to be introduced into the training datasets of this end-to-end method for better
performance.

Table 5. Accuracy of APANet Mask R-CNN for damage detection from images collected during
progressive collapse testing of a building in the field.
Image source   Total number   True predictions   Accuracy   Average accuracy
Cell phone     220            172                78.8%
Drones         303            172                56.8%      65.8%

An application of damage detection in a building during a collapse experiment in the field

Recently we conducted a field experiment to study the progressive collapse performance of a
six-story parking garage structure on the main campus of The Ohio State University in Columbus,
Ohio. The APANet Mask R-CNN was used to detect

Figure 10. Some examples of correct prediction for cracks and spalling by cascaded networks
(cracks and spalling are in red and white for the overlaid and predicted images, respectively,
and background of the predicted images is in black). Rows: 1) CSC dataset; 2) pixel-level φ-Net
dataset; 3) object-level φ-Net dataset; 4) structural-level φ-Net dataset. Columns: original,
predicted and overlaid images, in two sets.


Figure 11. Some examples of predictions from three Mask R-CNNs for three public datasets. (a) and
(e), (b) and (f), (c) and (g), and (d) and (h) denote original image, overlaid image of Cascade,
APANet and HRNet Mask R-CNN, respectively. Bounding boxes, cracks and spalling are in green,
yellow and purple, respectively. Rows: 1) φ-Net dataset; 2) MCEI2017 dataset; 3) PEI2017 dataset.

the damage from visual data collected from our cell phones (iPhone 12 Pro) and drones (Wingtra
and DJI). In this field experiment, the building was instrumented and portions of concrete slabs
and facades at each floor level and reinforced concrete columns in the top two stories were
removed from the building. Our stationary cameras inside and outside the building, cell phone
cameras and drone cameras well documented the entire process of slab, facade and column removals
and associated damage. Damage progression was captured from different angles at different scales
during the removal of the reinforced concrete building.


Figure 12. Some examples of predictions by APANet Mask R-CNN for images from a progressive
collapse experiment of a building in the field. Damage is indicated by green bounding boxes
(i.e., boundaries of detected damage), purple (i.e., spalling) and yellow (i.e., cracks).
Columns: three pairs of original and predicted images.
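
As a rough illustration of how such predictions can be scored, the sketch below applies the
image-level rule used for the Mask R-CNN experiments: an image with visible damage counts as
correct when at least one bounding box or mask is returned, and an image without damage counts as
correct when nothing is returned. The `ImageResult` record, its field names and the toy data are
assumptions made for this example; they are not taken from the authors' code or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ImageResult:
    """One test image (hypothetical record, not from the authors' code)."""
    source: str          # illustrative label, e.g. "cell phone" or "drone"
    has_damage: bool     # ground truth: cracks or spalling visible in the image
    num_detections: int  # number of boxes/masks returned by the detector


def is_correct(result: ImageResult) -> bool:
    """Image-level rule: damaged images need at least one detection,
    undamaged images need none."""
    if result.has_damage:
        return result.num_detections >= 1
    return result.num_detections == 0


def accuracy_by_source(results):
    """Fraction of correctly handled images for each image source."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r.source] += 1
        hits[r.source] += is_correct(r)
    return {source: hits[source] / totals[source] for source in totals}


def pooled_accuracy(results):
    """Fraction of correctly handled images over all sources together."""
    return sum(is_correct(r) for r in results) / len(results)


# Toy data for illustration only; these are not results from the paper.
toy_results = [
    ImageResult("cell phone", True, 2),   # damage found: correct
    ImageResult("cell phone", False, 0),  # no damage, nothing returned: correct
    ImageResult("cell phone", True, 0),   # damage missed: incorrect
    ImageResult("drone", True, 1),        # damage found: correct
    ImageResult("drone", False, 3),       # false alarm: incorrect
]

print(accuracy_by_source(toy_results))
print(pooled_accuracy(toy_results))
```

Pooled accuracy weights every image equally, so a source that contributes more images has more
influence on it than on a simple mean of the per-source values.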

For damage detection from images collected during this field experiment, we used a total of 523
images from cell phones and drones as the dataset for testing and evaluating the performance of
APANet Mask R-CNN, since these images capture different damage levels (cracking, spalling, and
mostly complete loss of concrete pieces) at various scales and different definitions. These
images come in three resolutions: 1920×1080, 4030×3020 and 5470×3640. Table 5 shows that the
APANet Mask R-CNN works well to locate the damage in cell phone images, but a weak detection rate
is observed for drone images. This is mainly because there are fewer images from drones in the
training data. Some examples of good predictions from these images are shown in Figure 12, in
which images from the cell phones and drones are in the first and second rows, respectively. In
this preliminary analysis, the APANet Mask R-CNN shows its robustness in locating the overall
damage in the building.

Conclusions and future work

Several deep learning pipelines have been proposed as solutions for the classification and
detection of different structural damage at various scales and resolutions in images collected
after extreme events, such as large earthquakes. This research aims to improve the understanding
of deep learning techniques and to make them practical and suitable for applications of automated
structural damage detection. Our conclusions are as follows:

1) Our research shows that a 152-layer ResNet classifier can perform well for multi-class damage
classification when transfer learning and parameter fine-tuning are utilized.

2) In addition to classifying damage, cascaded networks are used to localize the damage. In our
pipeline, we added a U-Net segmentation network after the existing classification networks to
achieve this. Our tests show that the cascaded networks outperform the U-Net as the only network
for detecting cracks and spalling.

3) An approach for damage detection with end-to-end networks is developed with the
state-of-the-art Mask R-CNNs. This approach is tested on public post-event image datasets after
two new training datasets were prepared for two separate studies. With a new feature pyramid
network34 and spatial attention mechanism35, APANet Mask R-CNN is shown to outperform the
cascaded networks and the U-Net for crack detection in φ-Net dataset12. In the test of spalling
and crack detection, it also achieves an accuracy of at least 67.6% for all images collected in
extreme events at various scales and different definitions, and reaches an accuracy of
approximately 81% for an image dataset at low resolution but with damage at various scales, which
is more challenging for structural damage detection.

4) We applied the APANet Mask R-CNN for damage detection from images collected during a field
experiment investigating progressive collapse of a building to study the effectiveness of this
method.

These solutions with deep learning methods can not only solve problems of classification and
localization of different structural damage in extreme events, but can also be used in structural
health monitoring for in-service structures and automatic quantification of the damage in
laboratory experiments, as Woods et al.7 did.

In the future, more images will be collected and labeled to counteract the distractions by other
objects in images and to overcome the imbalance problem observed in some training data. We plan
to test more image datasets collected from extreme events, quantify the damage with our methods
and photogrammetry skills, and conduct experiments for real-time SDD during field inspections.

Acknowledgements

This study is partially based upon work supported by the U.S. National Science Foundation under
grant no. 2036193.

References

1. Goodfellow I, Bengio Y and Courville A. Deep Learning. MIT Press, 2016.
   http://www.deeplearningbook.org.
2. Russell S and Norvig P. Artificial Intelligence: A Modern Approach. 2020. Prentice Hall, Upper
   Saddle River, NJ.

3. Russakovsky O, Deng J, Su H et al. Imagenet large scale visual recognition challenge.
   International journal of computer vision 2015; 115(3): 211–252.
4. Minaee S, Boykov Y, Porikli F et al. Image segmentation using deep learning: A survey. arXiv
   preprint arXiv:2001.05566, 2020.
5. Spencer Jr BF, Hoskere V and Narazaki Y. Advances in computer vision-based civil
   infrastructure inspection and monitoring. Engineering 2019; 5(2): 199–222.
6. Yang L, Li B, Li W et al. Semantic metric 3d reconstruction for concrete inspection. IEEE/CVF
   Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018: 1624–16248.
7. Woods JE, Yang YS, Chen PC et al. Automated crack detection and damage index calculation for
   rc structures using image analysis and fractal dimension. Journal of Structural Engineering
   2021; 147(4): 04021019.
8. Gao Y and Mosalam KM. Deep transfer learning for image-based structural damage recognition.
   Computer-Aided Civil and Infrastructure Engineering 2018; 33(9): 748–768.
9. Bai Y, Zha B, Sezen H et al. Deep cascaded neural networks for automatic detection of
   structural damage and cracks from images. ISPRS Annals of the Photogrammetry, Remote Sensing
   and Spatial Information Sciences 2020; 2: 411–417.
10. Yeum CM, Dyke SJ and Ramirez J. Visual data classification in post-event building
    reconnaissance. Engineering Structures 2018; 155: 16–24.
11. Yeum CM, Dyke SJ, Benes B et al. Postevent reconnaissance image documentation using automated
    classification. Journal of Performance of Constructed Facilities 2019; 33(1): 04018103.
12. Gao Y and Mosalam KM. Peer hub imagenet: A large-scale multiattribute benchmark data set of
    structural images. Journal of Structural Engineering 2020; 146(10): 04020198.
13. Hoskere V, Narazaki Y, Hoang T et al. Vision-based structural inspection using multiscale
    deep convolutional neural networks. arXiv preprint arXiv:1805.01055, 2018.
14. Ali L, Khan W and Chaiyasarn K. Damage detection and localization in masonry structure using
    faster region convolutional networks. International Journal 2019; 17(59): 98–105.
15. Kong X and Li J. Automated fatigue crack identification through motion tracking in a video
    stream. Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace
    Systems 2018; 10598: 212–219.
16. Atha DJ and Jahanshahi MR. Evaluation of deep learning approaches based on convolutional
    neural networks for corrosion detection. Structural Health Monitoring 2018; 17(5): 1110–1128.
17. Ghosh Mondal T, Jahanshahi MR, Wu RT et al. Deep learning-based multi-class damage detection
    for autonomous post-disaster reconnaissance. Structural Control and Health Monitoring 2020;
    27(4): e2507.
18. Zhang A, Wang KC, Fei Y et al. Deep learning–based fully automated pavement crack detection
    on 3d asphalt surfaces with an improved cracknet. Journal of Computing in Civil Engineering
    2018; 32(5): 04018041.
19. Liu Z, Cao Y, Wang Y et al. Computer vision-based concrete crack detection using u-net fully
    convolutional networks. Automation in Construction 2019; 104: 129–139.
20. Cha YJ, Choi W and Büyüköztürk O. Deep learning-based crack damage detection using
    convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering 2017;
    32(5): 361–378.
21. Dung CV et al. Autonomous concrete crack detection using deep fully convolutional neural
    network. Automation in Construction 2019; 99: 52–58.
22. Liu Y, Yao J, Lu X et al. Deepcrack: A deep hierarchical feature learning architecture for
    crack segmentation. Neurocomputing 2019; 338: 139–153.
23. Cha YJ, Choi W, Suh G et al. Autonomous structural visual inspection using region-based deep
    learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure
    Engineering 2018; 33(9): 731–747.
24. Kim B and Cho S. Image-based concrete crack assessment using mask and region-based
    convolutional neural network. Structural Control and Health Monitoring 2019; 26(8): e2381.
25. Kalfarisi R, Wu ZY and Soh K. Crack detection and segmentation using deep learning with 3d
    reality mesh model for quantitative assessment and integrated visualization. Journal of
    Computing in Civil Engineering 2020; 34(3): 04020010.
26. Bai Y, Sezen H and Yilmaz A. End-to-end deep learning methods for automated damage detection
    in extreme events at various scales. 25th International Conference on Pattern Recognition
    (ICPR) 2021: 6640–6647.
27. Brooks J. COCO Annotator. https://github.com/jsbroks/coco-annotator/, 2019.
28. He K, Zhang X, Ren S et al. Deep residual learning for image recognition. Proceedings of the
    IEEE conference on computer vision and pattern recognition 2016: 770–778.
29. Zha B, Bai Y, Yilmaz A et al. Deep convolutional neural networks for comprehensive structural
    health. International journal of computer vision 2019; 115(3): 211–252.
30. Ronneberger O, Fischer P and Brox T. U-net: Convolutional networks for biomedical image
    segmentation. International Conference on Medical image computing and computer-assisted
    intervention 2015: 234–241.
31. Ruiz RDB, Lordsleem Junior AC, Sousa Neto AFd et al. Processamento digital de imagens para
    detecção automática de fissuras em revestimentos cerâmicos de edifícios. Ambiente Construído
    2021; 21: 139–147.
32. Ren S, He K, Girshick R et al. Faster r-cnn: Towards real-time object detection with region
    proposal networks. Advances in neural information processing systems 2015: 91–99.
33. He K, Gkioxari G, Dollár P et al. Mask r-cnn. Proceedings of the IEEE international
    conference on computer vision 2017: 2961–2969.
34. Liu S, Qi L, Qin H et al. Path aggregation network for instance segmentation. Proceedings of
    the IEEE Conference on Computer Vision and Pattern Recognition 2018: 8759–8768.
35. Zhu X, Cheng D, Zhang Z et al. An empirical study of spatial attention mechanisms in deep
    networks. Proceedings of the IEEE International Conference on Computer Vision 2019:
    6688–6697.

36. Sun K, Zhao Y, Jiang B et al. High-resolution representations for labeling pixels and
    regions. arXiv preprint arXiv:1904.04514, 2019.
37. Cai Z and Vasconcelos N. Cascade r-cnn: High quality object detection and instance
    segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019: 1–1.
38. Özgenel ÇF. Concrete crack images for classification. Mendeley Data, v1
    http://dx.doi.org/10.17632/5y9wdsg2zt 2018; 1.
39. Bai Y, Sezen H and Yilmaz A. Detecting cracks and spalling automatically in extreme events by
    end-to-end deep learning frameworks. ISPRS Annals of the Photogrammetry, Remote Sensing and
    Spatial Information Sciences 2021; V-2-2021: 161–168.
40. Chen K, Wang J, Pang J et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv
    preprint arXiv:1906.07155, 2019.
41. Sim C, Laughery L, Chiou TC et al. 2017 pohang earthquake - reinforced concrete building
    damage survey, 2018. URL https://datacenterhub.org/resources/14728.
42. Purdue-University. Buildings surveyed after 2017 mexico city earthquakes, 2018. URL
    https://datacenterhub.org/resources/14746.
43. Buslaev A, Iglovikov VI, Khvedchenya E et al. Albumentations: Fast and flexible image
    augmentations. Information 2020; 11(2).
