
Automation in Construction 140 (2022) 104388

Contents lists available at ScienceDirect

Automation in Construction
journal homepage: www.elsevier.com/locate/autcon

Integrated pixel-level CNN-FCN crack detection via photogrammetric 3D texture mapping of concrete structures

Krisada Chaiyasarn a, Apichat Buatik a,*, Hisham Mohamad b, Mingliang Zhou c, Sirisilp Kongsilp d, Nakhorn Poovarodom a

a Thammasat Research Unit in Infrastructure Inspection and Monitoring, Repair and Strengthening (IIMRS), Faculty of Engineering, Thammasat School of Engineering, Thammasat University Rangsit, Klong Luang, Pathumthani 12120, Thailand
b Civil & Environmental Engineering Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
c Department of Geotechnical Engineering, College of Civil Engineering, Tongji University, Shanghai, China
d Department of Electrical and Computer Engineering, Thammasat School of Engineering, Thammasat University Rangsit, Klong Luang, Pathumthani 12120, Thailand

ARTICLE INFO

Keywords:
Crack detection
Convolutional Neural Network (CNN)
Fully Convolutional Network (FCN)
Image-based 3D modelling
Crack mapping

ABSTRACT

Although multiple learning-based crack detection systems show promising results in detecting cracks with pixel accuracy on individual images, few effectively enable inspection of larger structures. This paper thereby proposes an advanced inspection reporting system based on an integrated CNN-FCN crack detection system applied on the texture space of a footing, enabling crack inspection and display for larger structures. The system, a Convolutional Neural Network (CNN) and a Fully Convolutional Network (FCN), segments cracks at the pixel level on the texture space, acquired from a 3D model created with photogrammetry techniques. Firstly, the trained CNN is employed to detect crack patches, which are then imported into the trained FCN to segment cracks at the pixel level; a crack map is then generated and projected onto a 3D model. This system indicates promising results for footing textures, as represented by: Accuracy (99.88%), Precision (82.2%), Recall (90.2%), and F1 Score (86.01%).

1. Introduction

Periodic crack detection is vital in maintaining the integrity of structures, as these damages are a preliminary indication of future structural deterioration. Crack detection enables the mitigation of severe damage in aging structures which require regular maintenance [1,2]. Visual inspection is commonly relied on to assess the condition of structures non-destructively [3]. This method, however, is time-consuming and labour-intensive, and requires the skill set of experienced inspectors for manual, visual assessment. The strenuous reliance on human capital calls for an urgent need to automate and improve the visual inspection process.

Over the years, researchers have proposed algorithms to automatically detect cracks in concrete surfaces. With the emergence of deep learning in computer vision, Deep Neural Networks (DNNs) have demonstrated outstanding potential for classification, detection, and recognition problems [4]. Compared to traditional methods, DNNs have more layers and parameters than classical machine learning techniques [5]. DNNs can detect objects from the image level down to the pixel level, in which each pixel can be identified as belonging to a target object [6]. Current research is predominantly focused on improving the accuracy of DNNs to detect cracks solely on individual images [4–6]. The benefits of these systems aside, their displays are restricted to compact areas (via individual images). A solution which accurately and efficiently visualizes large-area crack detection has not yet been widely implemented in the field.

To overcome this challenge, researchers have attempted to create systems that detect and display damages in a larger area by relying on a Three-Dimensional (3D) image-based system. These systems rectify the large-scale problems at hand by increasing the field of view of an inspection report. Wu et al. [7] is an example of such a damage inspection report relying on an image-based 3D model, where cracks are displayed on a water tank in 3D. Unfortunately, difficulties have come to light when attempting to create crack maps from individual images, the most concerning of which is misalignment when individual images are collated into crack maps.

* Corresponding author.
E-mail addresses: [email protected] (K. Chaiyasarn), [email protected] (A. Buatik), [email protected] (H. Mohamad), zhoum@
tongji.edu.cn (M. Zhou), [email protected] (S. Kongsilp), [email protected] (N. Poovarodom).

https://doi.org/10.1016/j.autcon.2022.104388
Received 7 October 2021; Received in revised form 21 May 2022; Accepted 25 May 2022
Available online 2 June 2022
0926-5805/© 2022 Elsevier B.V. All rights reserved.
K. Chaiyasarn et al. Automation in Construction 140 (2022) 104388

Fig. 1. Example pictures of a concrete footing.

Fig. 2. The inspection site has a footing of size 10 × 5 m². The inspection area of 4 m² is highlighted in a yellow box.

In this paper, an integrated CNN-FCN crack detection system is applied to the texture space of footings acquired from their 3D models. It should be noted that problems may occur when applying a crack detection algorithm that is solely trained on authentic images to the texture space, owing to distortions such as blurring, misalignment, and colour mismatching which may result from an image-based 3D reconstruction process [8]. Therefore, the texture space referred to has been synthesized from realistic 3D models to generate image patches, which are used in the training stage to balance the training data and reduce false detections caused by any distortions on said texture space.

The contributions of this paper to the field are three-fold. Firstly, a crack detection system for a large area of structures with pixel accuracy is presented. Secondly, an integrated CNN-FCN crack detection system is examined, which shows promising results when applying a two-stage process where the CNN is used to filter only crack patches for the FCN crack segmentation system. Finally, a new visualization method is introduced, where a crack map is conveniently displayed on a 3D model, assisting inspectors in locating cracks.
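The two-stage process described above (a classifier that keeps only crack patches, followed by a segmenter applied to those patches alone) can be sketched in a few lines. The two `stub_*` functions below are hypothetical placeholders for the trained CNN and FCN, not the paper's models; only the control flow reflects the text.

```python
# Sketch of the two-stage CNN -> FCN flow: stage 1 filters patches,
# stage 2 segments only the patches that survived the filter.

def stub_patch_classifier(patch):
    """Placeholder for the trained CNN: True means 'crack patch'."""
    return max(max(row) for row in patch) > 0.5  # toy intensity rule

def stub_pixel_segmenter(patch):
    """Placeholder for the trained FCN: a per-pixel crack mask."""
    return [[1 if v > 0.5 else 0 for v in row] for row in patch]

def detect_cracks(patches):
    """Classify each patch; segment only patches flagged as cracks."""
    crack_maps = {}
    for idx, patch in enumerate(patches):
        if stub_patch_classifier(patch):                   # stage 1: filter
            crack_maps[idx] = stub_pixel_segmenter(patch)  # stage 2: segment
    return crack_maps

if __name__ == "__main__":
    patches = [
        [[0.1, 0.2], [0.3, 0.1]],   # non-crack patch: never segmented
        [[0.9, 0.1], [0.2, 0.8]],   # crack patch: segmented per pixel
    ]
    print(sorted(detect_cracks(patches)))  # only patch 1 reaches stage 2
```

The filtering stage is what keeps non-crack texture from ever reaching the segmenter, which is the mechanism the paper credits for the reduced false-positive rate.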

2. Related work

In this section, the available research on crack detection based on deep learning is reviewed, as this technique is more effective in crack detection than traditional methods. The section also contains a discussion of the works related to crack mapping on 3D models using an image-based 3D modelling method.

2.1. Deep learning-based crack detection

Deep learning has been successful in breaking the limitations of computer vision [9]. A multitude of research has applied deep learning for crack detection using Convolutional Neural Networks (CNNs), which, at the time of their introduction, were a state-of-the-art technique to detect damage in structures. Zhang et al. [10] proposed a simple neural network, consisting of four convolution layers and two fully connected layers, totalling six layers, to perform crack detection on image patches.

Table 1. Summary of 3D reconstruction parameters used in Agisoft Metashape software.

Workflow           Option                    Detail
Align Photos       Accuracy                  High
                   Generic preselection      Select
                   Key point limit           50000
                   Tie point limit           0
Build Dense Cloud  Quality                   High
                   Depth filtering           Mild
                   Calculate point colours   Select
Build Mesh         Source data               Dense cloud
                   Surface type              Arbitrary (3D)
                   Face count                High
Build Texture      Mapping mode              Generic
                   Blending mode             Mosaic (default)
                   Texture size/count        4096 × 10

Fig. 3. System overview and contribution of this paper.

Fig. 4. Image of data collection strategies. (top) Grid for the 2D map flight path. (bottom) Double grid for the 3D model flight path.


Fig. 5. The texture model and camera position of concrete footing structure.

Table 2. Summary of virtual camera parameters from the camera registration process in the concrete footing model.

Parameter        Matrix
Intrinsic (K)    f = 111450; cx = cy = 0; k1 = 0.08; b1 = −1220, b2 = 5.5; p1 = −0.0005, p2 = −0.0004
Extrinsic [R|t]  [  1.00   0.03  −0.09    38 ]
                 [  0.03  −1.00   0.03    −6 ]
                 [ −0.09  −0.03  −1.00   105 ]

Cha et al. [5] presented an eight-layer CNN and used a sliding window technique to scan a high-resolution image. Their proposed method was compared with traditional crack detection methods, and the proposed deep learning CNN was able to overcome multiple limitations that troubled the traditional methods. Additionally, CNNs have been combined with other methods to increase the efficiency of crack detection and crack segmentation. Chen et al. [11], for example, proposed an 11-layer CNN integrated with Naïve Bayes to detect multiple damages for the inspection of nuclear power plants, whilst Dorafshan et al. [12] compared the performance of common edge detectors (Roberts, Prewitt, Sobel, Laplacian of Gaussian, Butterworth, and Gaussian) with a deep Convolutional Neural Network based on the AlexNet architecture, used to reduce noise whilst segmenting a crack in an image. Zhang et al. [13] proposed an improved CNN architecture of CrackNet, known as CrackNet II, with enhanced learning capabilities and faster performance. The improved CrackNet II presented a deeper layer and feature generator, modified to improve its efficiency. CrackNet II is also able to

Fig. 6. Overview of texture space data reading from the texture model.


Fig. 7. Example of training datasets: (a) an open-source dataset; (b) raw concrete footing images; (c) texture spaces with multiple distortions; (d) annotated crack (ground truth) on image patches from the texture space with multiple distortions, such as blurring.

Table 3. Summary of concrete crack image datasets.

Task            No. of image patches   Crack   Non-crack   Training dataset   Validation dataset
Classification  30000 (100%)           15000   15000       20000 (67%)        10000 (33%)
Segmentation    4056 (100%)            4056    -           3245 (80%)         811 (20%)
Testing         3911                   308     3603        -                  -

Note: All image patches are 150 × 150 pixel². The testing dataset was synthesized from the texture space of the area of interest, as shown in Fig. 2.

segment cracks without the need to integrate with other traditional methods. Yet, it still produces inefficient results when applied to intricate surfaces.

The inclusion of pre-trained models has further expanded the capabilities of deep learning. Pre-trained models are recorded networks that have been previously trained on a large dataset. These pre-trained models are typically used in multi-classification tasks and are a familiar and functional approach to deep learning on small datasets; transfer learning and fine-tuning techniques increase their efficacy. Pre-trained models of prominent CNN architectures such as AlexNet [12], VGG16 [14], GoogLeNet [15], and ResNet50 [16] were typically trained on a large image dataset, such as ImageNet [17], which is widely regarded as reliable for general image classification. A benefit of training on these datasets is that the pre-trained model can transfer knowledge already learned from the ImageNet dataset, thereby eliminating certain tasks in the training and fine-tuning processes. In essence, these tasks are then performed without consuming the computer's resources. A disadvantage of this tool is the impact on detection accuracy should the training datasets differ from the original datasets. Transfer learning has been proposed to improve the efficiency and accuracy of a crack classifier in that regard [18,19].

The majority of researchers have proposed image-based classification methods for object detection, using patch-based bounding boxes. Yet, a pixel-based classification or image segmentation is still
Fig. 8. Overview of the crack detection system. The size of each patch is 150 × 150 pixels, with a stride of 75 pixels.


necessary, as it can provide more precise information about cracks such as width, length or thickness. CNNs have been adapted to semantic segmentation by adding a final upsampling/decoder/deconvolution layer to create a Fully Convolutional Network (FCN). The FCN has thereby introduced an end-to-end pipeline [20–22]. Yang et al. [6] proposed an FCN architecture trained on multiple types of crack datasets to semantically identify and segment pixel-size cracks at different scales. The produced crack segmentation is then represented by single-pixel-width skeletons to quantitatively measure crack length, maximum width, and mean width. Dung [23] proposed a crack detection method based on an FCN for semantic segmentation on a public concrete crack dataset of 40,000 images (227 × 227 pixel²). Here, the feature extraction layer uses three different pre-trained network architectures (VGG16, InceptionV3 and ResNet); the training results indicate that VGG16 offered the best performance. DeepCrack [24,25], a recently introduced FCN with a more complex decoder layer, consists of the extended Fully Convolutional Networks (FCN) and Deeply-Supervised Nets (DSN) [26], and provides integrated direct supervision for the features of each convolutional stage. The new integration has reached state-of-the-art performance milestones and outperforms the methods currently in use. Yang et al. [27] proposed a novel network architecture, known as the Feature Pyramid and Hierarchical Boosting Network (FPHBN), for pavement crack detection. FPHBN integrates local information into low-level features for crack detection in a feature pyramid format. By balancing the sample data via nested sample re-weighting in a hierarchy during training, results have shown that the FPHBN outperforms current methods in terms of its accuracy and its universal application.

In this paper, the integrated CNN-FCN architecture is proposed for automatic crack detection. The CNN and FCN were used to classify and segment cracks, respectively, to improve crack detection accuracy. It should be noted that the majority of research referred to above is focused on crack detection in individual images. As emphasized previously, these architectures and their subsequent improvements have yet to reduce the difficulty of detecting cracks over a larger area of structures. The following section is a review of previous research on reporting cracks through image-based 3D modelling. This method is advantageous in that it can report cracks in the context of larger structures whilst providing additional information in 3D.

Fig. 9. The framework of an FCN for image segmentation.

Table 4. The architecture of the FCN presented.

Layer name                 Filter/Kernel size (H, W, I, O)/(H, W)   Stride (H, W)   Output size (H, W, D)
Input image                -                                        -               (150, 150, 1)
block1_conv1               (3, 3, 3, 64)                            (1, 1)          (150, 150, 64)
block1_conv2               (3, 3, 3, 64)                            (1, 1)          (150, 150, 64)
block1_pool (Max Pooling)  (2, 2)                                   (2, 2)          (75, 75, 64)
block2_conv1               (3, 3, 64, 128)                          (1, 1)          (75, 75, 128)
block2_conv2               (3, 3, 128, 128)                         (1, 1)          (75, 75, 128)
block2_pool (Max Pooling)  (2, 2)                                   (2, 2)          (37, 37, 128)
block3_conv1               (3, 3, 128, 256)                         (1, 1)          (37, 37, 256)
block3_conv2               (3, 3, 256, 256)                         (1, 1)          (37, 37, 256)
block3_conv3               (3, 3, 256, 256)                         (1, 1)          (37, 37, 256)
block3_pool (Max Pooling)  (2, 2)                                   (2, 2)          (18, 18, 256)
block4_conv1               (3, 3, 256, 512)                         (1, 1)          (18, 18, 512)
block4_conv2               (3, 3, 512, 512)                         (1, 1)          (18, 18, 512)
block4_conv3               (3, 3, 512, 512)                         (1, 1)          (18, 18, 512)
block4_pool (Max Pooling)  (2, 2)                                   (2, 2)          (9, 9, 512)
fully_con1                 (9, 9, 512, 4096)                        (1, 1)          (9, 9, 4096)
dropout_1                  -                                        -               (9, 9, 4096)
fully_con2                 (1, 1, 4096, 4096)                       (1, 1)          (9, 9, 4096)
dropout_2                  -                                        -               (9, 9, 4096)
block5_conv                (1, 1, 4096, 2)                          (1, 1)          (9, 9, 2)
Deconv1                    (4, 4, 2, 2)                             (2, 2)          (18, 18, 512)
Deconv1 + block3_pool      -                                        -               (18, 18, 512)
Deconv2                    (4, 4, 2, 2)                             (2, 2)          (37, 37, 256)
Deconv2 + block2_pool      -                                        -               (37, 37, 256)
Deconv3                    (4, 4, 2, 2)                             (2, 2)          (75, 75, 128)
Deconv3 + block1_pool      -                                        -               (75, 75, 128)
Deconv4                    (4, 4, 2, 2)                             (2, 2)          (150, 150, 2)
Sigmoid                    -                                        -               (150, 150, 1)
Prediction                 -                                        -               (150, 150)

Note: H = height; W = width; I = input channels; O = output channels; D = depth.

Table 5. Summary of 3D modelling results of a concrete footing structure.

Parameter                                 Detail
Number of images                          414 images
Projections                               1777797 features
RMS reprojection error                    0.886 pixels
Sparse point cloud                        310517 points
Dense point cloud                         29199234 points
Surface mesh                              1946615 faces
Vertex                                    979158 vertices
Texture                                   4096 × 10 texture size/count
Texture space data (region of interest)   Array dimensions: [7500 × 6000 × 3]

2.2. Image-based 3D modelling for crack reporting

Geometric distortion presents another challenge for crack reporting [28], due to the motion model of image mosaicking, especially when individual crack image data is used for deep learning purposes. Generally, crack detection algorithms can create a crack map on an individual image, yet these individual maps cannot be combined into a mosaic for a large area due to misalignment. It is therefore essential to include training images from mosaic images, to balance the training datasets between original and mosaic images. Jahanshahi and Masri [29] proposed a crack detection and segmentation system which adjusts automatically using the depth parameters obtained in 3D scene reconstruction. The system then employs the customary edge-based approach with mathematical morphology to extract the crack from the background. Torok et al. [30] proposed a crack detection algorithm to visualize 3D mesh models. This crack detection algorithm can predict whether building elements (e.g., columns) are undamaged, which requires the surface mesh normal to be perpendicular to the axial direction of the elements. Liu et al. [31] proposed an advanced image-based crack assessment methodology for bridge piers using a UAV and 3D scene reconstruction. Their research, on the Digital Image Processing (DIP) method for crack detection and projection, has proven useful in segmenting cracks from individual images onto a meshed 3D surface model, thereby correcting both the perspective and geometric distortions of uneven structural surfaces. The system is not yet fully automated, as it requires inspectors to locate cracks in the structure. In addition, crack detection with pixel-level accuracy is not present. Kalfarisi et al. [8] presented deep learning and machine learning-based approaches for crack detection and segmentation. Their method deploys a Faster

6
K. Chaiyasarn et al. Automation in Construction 140 (2022) 104388

Fig. 10. Texture space data from a 3D model of concrete footing in the area of interest.

Region-Based Neural Network (FRCNN) and a Structured Random Forest Edge Detection (SRFED). The objective of this model is to detect cracks on large structures such as water tanks [7], which can be displayed on a 3D mesh model that enables a quantitative assessment by providing an overview of an inspected structure. However, this system cannot provide a crack map with pixel-level accuracy, since detecting cracks on individual images from multiple perspectives may result in misalignment issues when the individual pixels are projected back onto a 3D model. It should be noted that, as presented by their research [7,8], the classification process aids in reducing the inaccuracies of the segmentation process. Albeit the proposed systems were successful in detecting and displaying damages of a larger area on a 3D model, they fail to address the misalignment of detected cracks when mapping individual images onto the 3D model. To reiterate, a faultless system that can accurately and efficiently display cracks on a 3D model with pixel-level accuracy over a large area has not yet been established. The alternative to date has been manual investigation by inspectors to confirm the location of damages on 3D mesh models [31]. The proposed crack detection system in this paper, with integrated crack visualization in 3D models, therefore aims to provide a remedy for this inspection reporting nuisance.

In this paper, the CNN-FCN was used to classify and segment the cracks on the texture space, providing details of a wider area with higher resolution than that of an individual image. The texture space data of footings was generated using a photogrammetry technique to read data from a 3D model. Image patches generated from the texture space, as well as from original crack images, were used in the training stage to balance the training data, thereby reducing false detections caused by any distortions. These techniques propose an increased-coherency method for solving misalignment complications when attempting to create crack maps. In addition, as the CNN has been used with the FCN to detect cracks at the pixel level, the CNN further aids by filtering out non-crack areas to improve the crack segmentation accuracy of the FCN. The principal advantages of the proposed system are the expansion of the surface area which can be inspected, and a significant increase in the accuracy of crack detection at the pixel level using the CNN-FCN. Moreover, by projecting crack maps onto the 3D model, visualization for inspectors is streamlined.

3. Methodology

This study employed a CNN-FCN crack detection system with results displayed, at pixel accuracy, on a 3D concrete footing whose dimensions are approximately 5 × 10 m². Fig. 1 shows an example of the concrete footing structures studied in this research; a variety of cracks are visible on the surface of the structures. The target inspection area is approximately 4 m², as shown in Fig. 2. The system commences with the image data acquisition method used to create the 3D models and the texture space data. Then, the raw image dataset, obtained from another concrete footing structure, is prepared and used in the training process for the CNN and FCN. To reiterate, image patches from public crack datasets and texture spaces were included during training to prevent overfitting and to reduce false detections from any distortions on the texture space, respectively. Finally, the CNN-FCN can detect cracks in the texture space, the result of which is called a "crack map". For efficiency, the crack map can be projected onto the 3D surface model to be displayed in a 3D calibrated system. Fig. 3 illustrates the system overview and contribution of this paper.

3.1. Data collection and 3D reconstruction

Two methods are in practice for the collection of crack image data. Firstly, the 2D (Two-Dimensional) grid path method is employed to gather overview information of structures. Often, this practice relies on


Fig. 11. (top) Accuracy and (bottom) loss of training and validation processes of each architecture for image classification.

using drone-rendered images (dimensions: 5472 × 3648 pixel²) which are captured from higher altitudes with the camera angle perpendicular to the ground, as shown in Fig. 4 (top). Alternatively, the 3D grid path method is put into use when data collection is required for a 3D model. In this scenario, the camera's view is angled (θ) to the ground, vertically, as shown in Fig. 4 (bottom). In both practices, the DSLR camera strategically moves in a sweeping curved motion to collect data (dimensions: 4896 × 3264 pixel²) from the entire area. This study relied on a total of 414 images captured by drone and DSLR camera.

The image-based 3D reconstruction technique provides an undivided model of structures, in addition to disclosing their relative size, shape and texture. This technique encompasses several features from Image Processing, Computer Vision and Machine Learning to create a 3D model [32]. These features include the photogrammetry techniques: Structure From Motion (SFM) [33–36], Clustering Views for Multi-View Stereo (CMVS), Patch-based Multi-View Stereo (PMVS) [37], and Poisson Surface Reconstruction (PSR) [38].

For this paper, the Agisoft Metashape software [39], a program that utilizes the knowledge of image-based 3D reconstruction, was tasked with the 3D modelling process. Details of all parameters used by the Agisoft Metashape software are shown in Table 1. Note that these parameters are suggested to achieve the highest quality 3D model. The final result of the texture model is displayed in Fig. 5.

3.2. Texture data reading

The concept in this section uses a pinhole camera model [40]. In this model, a scene view is formed by projecting 3D points into the image plane using a perspective transformation, with the equations as follows:

$$m' = K\,[\,R \mid t\,]\,M' \tag{1}$$

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & f \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2}$$

where (X, Y, Z) are the coordinates of a 3D point in the world coordinate space and (u, v) are the coordinates of the projection point in pixels. K is the camera matrix, or matrix of intrinsic parameters; (cx, cy) is the principal point, usually at the image centre, and f is the focal length expressed in pixel units. The rotation-translation matrix [R | t] is called the extrinsic matrix; it describes the camera motion around a static scene. Therefore, the camera in 3D space can move and change perspective through the equation:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t \tag{3}$$
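Eqs. (1)–(3) can be exercised with a small dependency-free sketch: the world point is first moved into the camera frame by the extrinsic rotation and translation, then projected by perspective division and the intrinsics. The focal length, pose, and test point below are illustrative values, not those of Table 2.

```python
# Minimal pinhole projection: Eq. (3) for the extrinsic transform,
# then perspective division and the intrinsics (f, cx, cy).

def world_to_camera(R, t, M):
    """Eq. (3): [x, y, z]^T = R [X, Y, Z]^T + t (3x3 R, 3-vectors t, M)."""
    return [sum(R[i][j] * M[j] for j in range(3)) + t[i] for i in range(3)]

def project(f, cx, cy, R, t, M):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x, y, z = world_to_camera(R, t, M)
    return (f * x / z + cx, f * y / z + cy)  # perspective division by z

if __name__ == "__main__":
    I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # camera aligned with the world
    t = [0, 0, 0]
    # A point 1 m right and 2 m ahead, f = 100 px, principal point at origin:
    u, v = project(f=100, cx=0, cy=0, R=I, t=t, M=[1, 0, 2])
    print(u, v)  # 50.0 0.0
```

With the principal point at (0, 0), as in Table 2, the projection reduces to (f·x/z, f·y/z), which is the undistorted core that Eqs. (4)–(7) later extend with lens distortion terms.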


Fig. 12. Results of crack classification on the texture space using pre-trained networks.

Table 6. Summary results of image classification in the testing dataset.

Model            TP    TN     FP    FN   Accuracy   Precision   Recall   F1 Score
VGG19            291   3588   15    17   0.992      0.951       0.945    0.948
ResNet152        298   3581   22    10   0.992      0.931       0.968    0.949
DenseNet201      295   3556   47    13   0.985      0.863       0.958    0.908
InceptionResNet  279   3437   166   29   0.950      0.627       0.906    0.741
InceptionV3      271   3525   78    37   0.971      0.777       0.880    0.825
Xception         232   3481   122   76   0.949      0.655       0.753    0.701
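The scores in Table 6 follow directly from the confusion-matrix counts in its first four columns; a quick check of the VGG19 row (TP = 291, TN = 3588, FP = 15, FN = 17) reproduces the reported values.

```python
# Accuracy, precision, recall and F1 from confusion-matrix counts,
# verified against the VGG19 row of Table 6.

def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

if __name__ == "__main__":
    acc, p, r, f1 = classification_metrics(tp=291, tn=3588, fp=15, fn=17)
    print(round(acc, 3), round(p, 3), round(r, 3), round(f1, 3))
    # 0.992 0.951 0.945 0.948  -- matching the VGG19 row of Table 6
```

The same four formulas produce the pixel-level scores of Table 7 when applied to pixel counts instead of patch counts.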

In fact, real lenses exhibit some distortion, mainly radial distortion and slight tangential distortion, so the above model is extended as:

$$x'' = \frac{x}{z}\,\frac{1+\sum_{i=1}^{3}k_i r^{2i}}{1+\sum_{i=1}^{3}k_{i+3} r^{2i}} + 2p_1\frac{xy}{z^2} + p_2\left(r^2 + 2\left(\frac{x}{z}\right)^{2}\right) \tag{4}$$

$$y'' = \frac{y}{z}\,\frac{1+\sum_{i=1}^{3}k_i r^{2i}}{1+\sum_{i=1}^{3}k_{i+3} r^{2i}} + p_1\left(r^2 + 2\left(\frac{y}{z}\right)^{2}\right) + 2p_2\frac{xy}{z^2} \tag{5}$$

$$u = \frac{w}{2} + c_x + f x'' + b_1 x'' + b_2 y'' \tag{6}$$

$$v = \frac{h}{2} + f y'' + c_y \tag{7}$$

where r² = x′² + y′²; k1, k2, k3, k4, k5 and k6 are radial distortion coefficients; p1 and p2 are tangential distortion coefficients; b1 and b2 are affinity and non-orthogonality (skew) coefficients; and w and h are the image width and height in pixels, respectively. In this research, the camera registration process simulated a virtual camera on the 3D model based on the location of the first camera, and assumed that radial distortion errors have a small effect (only the error from k1). Details of the virtual camera parameters are shown in Table 2. Finally, when the camera registration process is complete, the virtual camera position will appear

Fig. 13. Recall-Precision curve and Average Precision (AP) of each architecture for image classification.
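Eqs. (4)–(7) can be sketched as a single dependency-free function, taking x′ = x/z and y′ = y/z as the normalized coordinates (the convention the definition of r² suggests). The coefficient values in the example are zeroed or illustrative, not the calibrated values of Table 2.

```python
# Eqs. (4)-(7): radial/tangential lens distortion, then pixel mapping.

def distort_and_map(x, y, z, f, cx, cy, w, h,
                    k=(0.0,) * 6, p=(0.0, 0.0), b=(0.0, 0.0)):
    """Map a camera-frame point (x, y, z) to distorted pixel coords (u, v)."""
    xp, yp = x / z, y / z                      # normalized coordinates x', y'
    r2 = xp * xp + yp * yp                     # r^2 = x'^2 + y'^2
    radial = ((1 + k[0] * r2 + k[1] * r2**2 + k[2] * r2**3) /
              (1 + k[3] * r2 + k[4] * r2**2 + k[5] * r2**3))
    xpp = xp * radial + 2 * p[0] * xp * yp + p[1] * (r2 + 2 * xp * xp)  # Eq. (4)
    ypp = yp * radial + p[0] * (r2 + 2 * yp * yp) + 2 * p[1] * xp * yp  # Eq. (5)
    u = w / 2 + cx + f * xpp + b[0] * xpp + b[1] * ypp                  # Eq. (6)
    v = h / 2 + f * ypp + cy                                            # Eq. (7)
    return u, v

if __name__ == "__main__":
    # With all distortion coefficients zero the model collapses to the
    # plain pinhole mapping offset by the image centre (w/2, h/2):
    u, v = distort_and_map(x=1, y=0, z=2, f=100, cx=0, cy=0, w=200, h=100)
    print(u, v)  # 150.0 50.0
```

Setting only k1 nonzero, as the paper's registration process does, leaves a pure low-order radial correction: each normalized radius is scaled by (1 + k1·r²).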


Fig. 14. A sample model and camera position.

Fig. 15. The results of crack segmentation from the proposed system in the misalignment test in (a) public crack images and (b) texture data.


Fig. 16. The results of the 3D mapping of a sample model (a) a map created from individual maps by the ray-casting method (misalignments are shown in red, blue,
and black) (b) a crack map created from texture data (proposed system).

on the 3D model. The virtual camera can read texture data from the 3D models through the equations described above. Fig. 6 shows an example of texture space data.

3.3. Training data preparation

The image dataset for the training process is a combination of 30000 patches (each 150 × 150 pixel²) from an open-source dataset, raw concrete footing images, and texture spaces: an open-source dataset of 15000 concrete crack image patches from various buildings on the Middle East Technical University campus [41]; raw concrete footing images of 7500 patches obtained from other concrete footing structures; and texture spaces of 7500 patches synthesized from other 3D models of concrete footings. Fig. 7(c) shows an example of training datasets where image patches are taken from synthesized texture data. Image patches with multiple distortions (Fig. 7(c)) are used in the training process to reduce false detections. The total of 30000 image patches contains crack and non-crack images, 15000 patches of each. For image segmentation, 4056 crack images were randomly selected from the 15000 crack-labelled images of the dataset and labelled by their crack or non-crack pixels for the ground truth dataset, as shown in Fig. 7(d). The training and validation datasets for image segmentation contain 3245 and 811 image patches, respectively. Table 3 displays a summary of all datasets in this research.

3.4. Overview of crack detection system

The crack detection system proposed in this research is a


combination of image classification and segmentation, as shown in Fig. 8. A hindrance to the accuracy of crack segmentation on the texture space is the texture complexity of concrete footing structures, which have various crack formations and a larger area. To address this, interoperability between classification and segmentation is proposed. In essence, the crack detection system starts by training the CNN and FCN architectures. Afterwards, the CNN model is used to detect probable areas of cracks through the sliding window method to filter out crack patches. Note that image separation can cause misclassification if cracks are at the edges of image spaces [5]. The framework is therefore designed so that the image is scanned twice during the classification process using the sliding window technique, as shown in Fig. 8 (middle box). Then, the crack areas detected by the CNN are passed to the FCN model to segment cracks. Finally, the results from the FCN are fused, with overlapping areas averaged at the pixel level. The image classification and segmentation procedures are explained in the sections to follow.

Table 7
Summary results of crack segmentation on a concrete footing, compared between different systems: (1) FCN, (2) DeepCrack, and (3) ResNet152+FCN.

Parameters               FCN        DeepCrack [24,50]   ResNet152 + FCN (proposed system)
TP (pixels)              171295     131820              169497
TN (pixels)              43644148   44554348            44775379
FP (pixels)              1167939    257739              36708
FN (pixels)              16618      56093               18416
Accuracy                 0.9737     0.9930              0.9988
Precision                0.1279     0.3384              0.8220
Recall                   0.9116     0.7015              0.9020
F1 Score                 0.2243     0.4565              0.8601
Processing time (s)      85.8       81.9                76.6

Note: Crack pixels total 187913; non-crack pixels total 44812087.

3.5. Image classification

In the classification process, the CNN passes the input through several convolutional layers, an activation layer, a pooling layer and a fully connected layer. Recently, "deeper" CNN architectures such as VGG19, ResNet, Inception, and Xception have improved data learning by increasing the number of weight layers and making the internal architecture more complex. Transfer learning has proven to improve the training efficiency and accuracy of crack classifiers in this regard. In this research, six different pre-trained CNN models, namely VGG19, ResNet, DenseNet, InceptionV3, InceptionResNetV2 and Xception, were evaluated for crack classification performance before the introduction of crack segmentation by the FCN. Firstly, the pre-trained model is loaded without its output layer (i.e., the top layer). Then, the output layer is customized to the classification purpose; in this research, it uses softmax activation, which classifies each image into the "crack" or "non-crack" class, as shown in Fig. 8 (middle box). Lastly, fine-tuning is applied to adjust the values of all parameters in the network to the new output layer and the training dataset. The result is a CNN model which can classify an image patch as either crack or non-crack.

3.6. Fully Convolutional Network (FCN)

The FCN is a pixel-to-pixel convolutional network tasked with semantic segmentation, as shown in Fig. 9. An FCN can be considered an expansion of a CNN, as the predictions are converted from several classes to a semantic segmentation image (known as dense prediction). FCNs consist of two components: down-sampling and up-sampling. Down-sampling components contain convolutional layers [5], pooling layers [5], and dropout layers [42], whilst up-sampling components contain deconvolutional layers [6]. The FCN architecture used in this research was designed based on the U-Net architecture [43] and Yang et al. [6]. The architecture of the FCN is displayed in Table 4.

3.6.1. Activation layer
The Rectified Linear Unit (ReLU) has remained the preferred way of introducing nonlinearity since its introduction [44] as a nonlinear activation function applied after each convolution. In the deconvolution layers, ReLU is replaced by the Exponential Linear Unit (ELU) [45]. Oskal et al. [46] have indicated that ELU provides improved generalization performance and faster learning in comparison with ReLU. In addition, training time is reduced through a batch normalization operation [47] after each convolution layer. A dropout operation is performed at each level of the encoding and decoding paths. The sigmoid activation [27] at the last layer provides an output value between 0 and 1; its appealing features are its simplicity and that it is non-linear, continuously differentiable, monotonic, and has a fixed output range.

3.6.2. Loss function and optimization
In this process, binary cross-entropy is applied to compute the loss in the training process. It is calculated through the function shown in Eq. (8). The loss is backpropagated through the network structure and each parameter is updated accordingly.

Hp(q) = −(1/N) ∑ᵢ₌₁ᴺ [yᵢ·log(p(yᵢ)) + (1 − yᵢ)·log(1 − p(yᵢ))]   (8)

The optimization process employs Adam [48], which applies bias correction to help the gradients converge in the right direction at an improved speed.

3.7. Confusion matrix

The performance metrics of precision, recall, accuracy and F1-score were used to validate the proposed method. Each of these metrics is calculated from the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) counts. TP is the number of pixels correctly identified as cracks (positives). TN is the number of pixels correctly identified as non-cracks (negatives). FP is the number of pixels wrongly identified as cracks. FN is the number of pixels wrongly identified as non-cracks. The performance metrics are defined as follows:

Recall = TP / (TP + FN)   (9)

Precision = TP / (TP + FP)   (10)

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (11)

F1 Score = 2·TP / (2·TP + FP + FN)   (12)

In addition, to evaluate the semantic segmentation, recall-precision curves are used to estimate the performance of the proposed system by calculating the Average Precision (AP).

4. Results and discussion

4.1. Image-based 3D modelling and texture space data

Table 5 indicates the results of the 3D modelling process of a concrete footing structure. Projection values indicate the number of feature matching points (totalling 1777797) with an RMS re-projection error of 0.886 pixels. The RMS re-projection error indicates the scanned errors of


Fig. 17. The result of crack segmentation on a concrete footing from various systems: (a) original texture image, (b) ground truth, (c) crack classification using only ResNet152, (d) FCN, (e) DeepCrack [24,50], (f) the proposed system (ResNet152 + FCN).
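The per-pixel counts reported in Table 7 convert directly into the metrics of Eqs. (9)–(12). A minimal sketch, fed with the proposed-system (ResNet152 + FCN) column of Table 7; the function name is illustrative:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute Eqs. (9)-(12) from confusion-matrix pixel counts."""
    recall = tp / (tp + fn)                       # Eq. (9)
    precision = tp / (tp + fp)                    # Eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # Eq. (11)
    f1 = 2 * tp / (2 * tp + fp + fn)              # Eq. (12)
    return recall, precision, accuracy, f1

# Proposed-system column of Table 7 (ResNet152 + FCN)
recall, precision, accuracy, f1 = confusion_metrics(
    tp=169497, tn=44775379, fp=36708, fn=18416)
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
# → 0.9988 0.822 0.902 0.8601
```

Rounding to four decimals reproduces the Accuracy/Precision/Recall/F1 row of Table 7 for the proposed system.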

13
K. Chaiyasarn et al. Automation in Construction 140 (2022) 104388

Fig. 18. The finalized crack map – the composition of crack pixels onto the texture space.
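The two-pass sliding-window classification and the pixel-averaged fusion of overlapping FCN outputs described in Section 3.4 can be sketched as below. This is a simplified sketch, not the authors' implementation: the 150-pixel window matches the patch size of the training data, but the half-window stride (approximating the two offset scans) and the `classify`/`segment` stand-ins are assumptions.

```python
import numpy as np

def detect_cracks(texture, classify, segment, win=150, stride=75):
    """Slide a win x win window over the texture in two offset passes
    (stride = win/2); run the patch segmenter only where the classifier
    fires, and average predictions where windows overlap."""
    h, w = texture.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float64)
    hits = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = texture[y:y + win, x:x + win]
            if classify(patch):                               # CNN: crack / non-crack
                prob_sum[y:y + win, x:x + win] += segment(patch)  # FCN probability map
                hits[y:y + win, x:x + win] += 1.0
    out = np.zeros_like(prob_sum)
    np.divide(prob_sum, hits, out=out, where=hits > 0)        # average the overlaps
    return out

# Toy stand-ins for the trained CNN and FCN, just to exercise the fusion logic.
crack_map = detect_cracks(np.zeros((300, 300)),
                          classify=lambda p: True,
                          segment=lambda p: np.full(p.shape[:2], 0.5))
```

With constant stand-in predictions the fused map is uniform, which confirms that the overlap averaging does not bias pixels covered by more than one window.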

Fig. 19. The final result of crack detection on a concrete footing model.
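The binary cross-entropy of Eq. (8) can be checked numerically in a few lines. A NumPy sketch; the clipping epsilon is an added assumption for numerical safety and is not part of Eq. (8):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (8): Hp(q) = -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y_true, dtype=np.float64)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# A confident, correct prediction gives a small loss.
loss = binary_cross_entropy([1.0, 0.0], [0.9, 0.1])  # ≈ 0.1054 (= −ln 0.9)
```

The loss goes to zero as the predicted probabilities approach the ground-truth labels, which is what the backpropagation step in Section 3.6.2 minimizes.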

the 3D modelling process: the 414 images have an average error of 0.886 pixels each, indicating that the process has few errors and that the model is highly accurate. A high re-projection error means that images are not well aligned, resulting in the final crack map having more misalignments. The 3D model possesses a surface mesh of 1946615 faces and a texture of 4096 × 10 (texture


size/count), boasting high-resolution quality. Fig. 10 illustrates the result of texture space data read from the virtual camera in Section 3.2, using the parameters from Table 2, and exemplifies what can be expected from texture space data. Foremost, the texture data is a matrix array with sufficient dimensions [7500 × 6000 × 3] to detect cracks.

In this study, images from drones and DSLR cameras were used to create a 3D model, with drones focused on collecting overview images of the structure and a DSLR camera collecting close-up images to document fine details of the structure. The process of collecting image data to create a 3D model is notably a daunting task. Effective texture data acquisition requires a high-quality 3D model, which relies on the expertise and skillset of experienced drone operators. An additional impediment to this task is the manual creation and selection of a virtual camera view. Future research studies are likely to develop an autonomous plane detection system for a 3D model, which would further reduce manual labour by reading texture data from the parameters obtained by automatic plane detection.

4.2. Crack classification using pre-trained networks

The classifiers are trained for 50 epochs, as shown in Fig. 11. During training, all pre-trained networks achieved nearly 0.99 accuracy and 0.01 loss. Fig. 12 exemplifies the results of crack classification from the ResNet152 model whilst specifying the locations of FPs and FNs. During testing, VGG19, ResNet152 and DenseNet201 achieved accuracy, precision and recall values upwards of 0.85, as shown in Table 6. Concurrently, InceptionResNet, InceptionV3 and Xception achieved lower precision and recall than the former classifiers. The hypothesis is therefore that VGG19, ResNet152 and DenseNet201 are suitable pre-trained networks to classify cracks. These pre-trained networks for image classification are thereafter applied to image segmentation. Fig. 13 details the recall-precision curve, which aids in determining and comparing the performance of the pre-trained networks based on the Average Precision (AP). The AP shows that ResNet152 is the optimal pre-trained network (AP = 0.977) for crack classification in concrete foundation structures.

To prepare the training dataset, an open-source dataset [41] was combined with our dataset of concrete footing structures to generalize and thereby avoid an overfitting problem. Training data was combined at random with the prepared datasets and public datasets to ensure the proposed method's generalizability. Additionally, this allows the data to be used with types of surfaces other than concrete: the public datasets contain various types of surfaces, including asphalt road surfaces. Future datasets (non-concrete surfaces) should be combined with our training dataset to enhance the learning capabilities of this study's proposed deep learning system.

4.3. Misalignment test

In this section, a publicly available dataset on crack detection [7,8,31] was used to conduct an experiment to address misalignment problems. The experiment compared the results of crack maps created by combining maps from a ray-casting method and the proposed method, which obtained a map from the texture data in a 3D space through a virtual camera. The experiment was conducted on simple concrete crack images (dimensions: 3900 × 3000 pixel²) from a public crack dataset to generate a sample model in the Agisoft Metashape software, as shown in Fig. 14, using the aforementioned 3D reconstruction parameters from Table 1. For the proposed method, the virtual camera was simulated in the 3D space through the camera registration process to read texture data from the 3D model (the texture data having the array dimensions [3787 × 3121 × 3]). Previously, the standard process for generating crack maps was to assemble individual images, which were then reprojected onto a 3D model through the ray-casting method, which resulted in misalignment problems. By ray-casting individual images, the colour values of pixels from different cameras can be combined in the final texture. To conduct this study, in reference to Table 1, the Agisoft Metashape software [39] was configured with the "Mosaic" blending mode, as it maintains controlled variables; it is the default setting for all experiments in this research. This preference is in part due to how it blends low-frequency components, as it overlaps images to avoid seamline problems. Additionally, the high-frequency components, which are responsible for picture details, are taken from a single image, one that presents a high resolution for the area of interest; the camera angle is nearly perpendicular to the reconstructed surface from this perspective.

In this experiment, the combined ResNet152 and FCN (Section 4.2) method was used to segment cracks on individual public crack images and texture data. Fig. 15(a) shows the results of crack segmentation on individual images and Fig. 15(b) shows the results of crack segmentation on the texture data. Following segmentation, crack maps were re-projected onto a 3D model to assess misalignment. Fig. 16(a) illustrates a crack map created by re-projecting individual maps onto the 3D model and Fig. 16(b) illustrates a crack map created by re-projecting crack pixels onto the texture data of the 3D model. Foreseeably, the crack map created from individual maps is prone to visible misalignments. This problem occurs due to the inability of multiple individual crack maps to align seamlessly given the ray-casting method from multiple perspective images, which, upon closer inspection, is further exacerbated by absent sections which have not been detected in some other views. Therefore, crack pixels cannot be mapped correctly. This limitation may be due to the detection system inconsistently differentiating crack pixels from multiple camera positions, which therefore cannot provide a complete, seamless crack map. However, the proposed system provides a better-segmented crack map with no misalignment, as the map was created from a single texture image and crack pixels were detected on the texture data of the 3D model directly (without the ray-casting method from multi-view perspectives). The result confirms the practicality of the proposed system as a solution for creating a crack map for larger areas at the pixel level, which can be seamlessly rendered for a simple 3D geometric structure.

For the proposed method, a virtual camera can be adjusted to read texture data with a wider field of view to accommodate an even larger area without losing resolution in the texture space. The proposed method can be useful for structures with the geometry of a developable surface, such as cylindrical underground tunnels and planar facades, as these geometric shapes can be unwrapped onto a plane without distortion [49]. The proposed system, however, may not be suitable for structures with more complex geometry, which requires further study.

4.4. Crack segmentation using Fully Convolutional Network (FCN) on a concrete footing

From the results shown in Sections 4.2 and 4.3, the combined ResNet152 and FCN were used to segment cracks on the texture data of a concrete footing model. Table 7 shows the results of crack segmentation on a footing compared between 3 systems: (1) the FCN only (running semantic segmentation on all patches directly), (2) pre-trained DeepCrack [24,50], and (3) the proposed system (ResNet152 + FCN). The results showed that, for the crack segmentation task on a footing, the proposed system surpassed the performance of the previously mentioned systems in all metrics, namely Accuracy, Precision, Recall, and F1 Score of 0.9988, 0.8220, 0.9020, and 0.8601, respectively. Moreover, the proposed CNN-FCN system benefits from a computational time of 76.6 s, which is faster than both FCN and DeepCrack, with speeds of 85.8 s and 81.9 s, respectively. Fig. 17 displays the final results from each crack detection system. Fig. 17(a) is the texture space data derived from the 3D model, Fig. 17(b) is the ground truth, Fig. 17(c) is the result of crack classification from ResNet152, Fig. 17(d) is the result of crack segmentation using only the FCN method, Fig. 17(e) is the result of crack segmentation using the pre-trained DeepCrack architecture [24,50], and

Fig. 17(f) is the final result of the proposed system (ResNet152+FCN). As observed in Fig. 17(f), the proposed method offers a clear crack map, whereas the map created by the FCN alone, Fig. 17(d), illustrates a map with a large amount of noise. Similarly, the final result from DeepCrack, Fig. 17(e), displays noise, albeit less. In conclusion, the results from the proposed system, as shown in Table 7 and Fig. 18, detail the efficacy of the system as an alternative method to inspect cracks on a large surface area.

It remains to be said that other factors may cause segmentation errors. The errors caused by the FCN architecture can be divided into two cases. Firstly, an error can occur when an image contains thin cracks or scratches, leaving a crack map with unwanted noise. Secondly, an error can result from incorrect crack segmentation itself. These errors can be mitigated or eliminated by increasing the amount of training data (e.g., including data with different classes, such as scratches, that may need to be considered separately), by increasing the complexity of the FCN architecture, or by making the FCN deeper. As shown in Table 7, using both ResNet152 and FCN reduces the likelihood of false-positive detection. At the same time, the number of true positives is also decreased, thereby reducing the chance of cracks being falsely detected. The overall results of the proposed system (based on its Accuracy/Precision/Recall/F1 Score) deliver the best performance.

4.5. Crack mapping

This section presents a method for displaying crack detection results in a 3D coordinate system, based on the camera registration and texture data reading discussed in Section 3.2. Fig. 19 exhibits the final result of the project, in which the crack map (Fig. 18) has been collated onto the 3D model using the projection technique. In this research, the projection technique through camera parameters is used to place crack map pixel data onto the 3D model, which does not impact the 3D model's parameters. Fig. 19 illustrates a comprehensive and engaging example of crack detection on the surface of a 3D model. Appearance aside, this technique may be seminal in perfecting crack maps from individual images, as misalignment in texture space crack maps is significantly reduced. An additional benefit is the universal export of the 3D model as OBJ, DAE and FBX files for cross-platform display in applications such as 3D Viewer (free software in Microsoft Windows), Unity and Unreal Engine (game engines for developers), and Autodesk Revit (3D software for civil engineers), etc.

Nevertheless, this process is subject to further development to increase the accuracy of reported crack detection results. The success of crack recognition results may also be dependent on the accuracy of the reconstructed 3D model, which requires further study. To further enhance the proposed system, it may include features which automatically identify crack width/length, and integrate with VR technology to assist inspectors with verifying crack detection results through virtual lenses and so forth. One must bear in mind that complex 3D models may result in overly complicated and distorted mappings, which creates more challenges for the proposed pipeline. For these cases, data acquisition becomes more challenging and requires intricate UAV flight patterns. These ambitious developments are under consideration in field trials and are currently in research. Optimistically, these targets may be reached in the near future.

5. Conclusion

To conclude the work in this research, the texture space derived from a 3D model through camera registration provides a large area for crack detection in a large structure. In addition to providing a larger inspection area, the proposed system offers effective texture space data (Fig. 10), which, when inspected with deep learning (Fig. 17), enables scrutiny at the pixel level. In essence, the research proposes a novel method of inspection reporting which ultimately improves the efficiency of inspectors while reducing time constraints. Furthermore, the image-based 3D reconstruction technique and texture data reading, applied to the crack detection system, enable the precise location of damage at the pixel level on a concrete footing structure to be plotted in 3D without misalignment concerns (Figs. 18 and 19). Nonetheless, in this research the inspection area from the texture space is only demonstrated on a selected 2 × 2 m² area, and future studies will need to cover all areas in large structures. Finally, the proposed system initiates further fields of study for the improvement and creation of a tool which may measure the width/length of cracks on a 3D model.

In conclusion, the integrated CNN-FCN crack detection system indicates promising results. This is supported by achieving coherent crack detection at the pixel level with accuracy, precision, recall, and F1 scores of 99.88%, 82.2%, 90.2%, and 86.01%, respectively. Culminating this research is the practical implication that inspectors, using the integrated CNN-FCN crack detection system, benefit from the use of a CNN, as it aids in reducing the inaccuracies of the segmentation process in the FCN, which in turn improves crack prediction. It remains to be said that, in the future, additional image data for FCN training should be included to continuously improve the accuracy of crack segmentation. Furthermore, performing end-to-end learning in a multi-task setup, with a shared encoder and two separate decoders for classification and segmentation, is a future plan. During inference, the image would be fed to the encoder to obtain an image embedding. This embedding could be used for classification and re-used (in circumstances where cracks are detected) for the segmentation decoder, so that the encoder would be run only once.

Declaration of Competing Interest

No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication. I would like to declare that the work described was original research that has not been published previously, and is not under consideration for publication elsewhere. We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work; there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in the manuscript.

Acknowledgements

This research was funded by Thammasat School of Engineering (TSE), Thammasat University. The authors would like to thank the Thammasat University Research Fund for providing the scholarship and support for the research project, and Mr. Noppanat Poovarodom for helping with preparing the datasets.

References

[1] N.S. Grigg, Infrastructure report card: purpose and results, J. Infrastruct. Syst. 21 (4) (2015) 02514001, https://doi.org/10.1061/(ASCE)IS.1943-555X.0000186.
[2] W. Cook, P.J. Barr, Observations and trends among collapsed bridges in New York state, J. Perform. Constr. Facil. 31 (4) (2017) 04017011, https://doi.org/10.1061/(ASCE)CF.1943-5509.0000996.
[3] K. Chaiyasarn, Damage Detection and Monitoring for Tunnel Inspection Based on Computer Vision, Doctoral Dissertation, University of Cambridge, 2014, https://doi.org/10.17863/CAM.14071.
[4] Y.Z. Lin, Z.H. Nie, H.W. Ma, Structural damage detection with automatic feature-extraction through deep learning, Comput. Aided Civ Infrastruct. Eng. 32 (12) (2017) 1025–1046, https://doi.org/10.1111/mice.12313.
[5] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using Convolutional Neural Networks, Comput. Aided Civ Infrastruct. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[6] X. Yang, H. Li, Y. Yu, X. Luo, T. Huang, X. Yang, Automatic pixel-level crack detection and measurement using Fully Convolutional Network, Comput. Aided Civ Infrastruct. Eng. 33 (12) (2018) 1090–1109, https://doi.org/10.1111/mice.12412.
[7] Z.Y. Wu, R. Kalfarisi, F. Kouyoumdjian, C. Taelman, Applying deep convolutional neural network with 3D reality mesh model for water tank crack detection and evaluation, Urban Water J. 17 (8) (2020) 682–695, https://doi.org/10.1080/1573062X.2020.1758166.


[8] R. Kalfarisi, Z.Y. Wu, K. Soh, Crack detection and segmentation using deep learning with 3D reality mesh model for quantitative assessment and integrated visualization, J. Comput. Civ. Eng. 34 (3) (2020) 04020010, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000890.
[9] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84–90, https://doi.org/10.1145/3065386.
[10] L. Zhang, F. Yang, Y.D. Zhang, Y.J. Zhu, Road crack detection using deep convolutional neural network, IEEE Int. Conf. Image Process. (ICIP) (2016) 3708–3712, https://doi.org/10.1109/ICIP.2016.7533052.
[11] F.C. Chen, M.R. Jahanshahi, NB-CNN: Deep learning-based crack detection using Convolutional Neural Network and Naïve Bayes data fusion, IEEE Trans. Ind. Electron. 65 (5) (2017) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.
[12] S. Dorafshan, R.J. Thomas, M. Maguire, Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete, Constr. Build. Mater. 186 (2018) 1031–1045, https://doi.org/10.1016/j.conbuildmat.2018.08.011.
[13] A. Zhang, K.C. Wang, Y. Fei, Y. Liu, S. Tao, C. Chen, J.Q. Li, B. Li, Deep learning-based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet, J. Comput. Civ. Eng. 32 (5) (2018) 04018041, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000775.
[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, http://doi.org/10.48550/arXiv.1409.1556.
[15] A. Mollahosseini, D. Chan, M.H. Mahoor, Going deeper in facial expression recognition using deep neural networks, IEEE Winter Conf. Appl. Comput. Vis. (WACV) (2016) 1–10, https://doi.org/10.1109/WACV.2016.7477450.
[16] S. Wu, S. Zhong, Y. Liu, Deep residual learning for image steganalysis, Multimed. Tools Appl. 77 (9) (2018) 10437–10453, https://doi.org/10.1007/s11042-017-4440-4.
[17] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, IEEE Conf. Comput. Vis. Pattern Recogn. (2009) 248–255, https://doi.org/10.1109/CVPR.2009.5206848.
[18] K. Gopalakrishnan, S.K. Khaitan, A. Choudhary, A. Agrawal, Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection, Constr. Build. Mater. 157 (2017) 322–330, https://doi.org/10.1016/j.conbuildmat.2017.09.110.
[19] K. Zhang, H.D. Cheng, B. Zhang, Unified approach to pavement crack and sealed crack detection using preclassification based on transfer learning, J. Comput. Civ. Eng. 32 (2) (2018) 04018001, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000736.
[20] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (12) (2017) 2481–2495, https://doi.org/10.1109/TPAMI.2016.2644615.
[21] W. Sun, R. Wang, Fully convolutional networks for semantic segmentation of very high resolution remotely sensed images combined with DSM, IEEE Geosci. Remote Sens. Lett. 15 (3) (2018) 474–478, https://doi.org/10.1109/LGRS.2018.2795531.
[22] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham (2015) 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
[23] C.V. Dung, Autonomous concrete crack detection using deep fully convolutional neural network, Autom. Constr. 99 (2019) 52–58, https://doi.org/10.1016/j.autcon.2018.11.028.
[24] Y. Liu, J. Yao, X. Lu, R. Xie, L. Li, DeepCrack: a deep hierarchical feature learning architecture for crack segmentation, Neurocomputing 338 (2019) 139–153, https://doi.org/10.1016/j.neucom.2019.01.036.
[25] Q. Zou, Z. Zhang, Q. Li, X. Qi, Q. Wang, S. Wang, DeepCrack: learning hierarchical convolutional features for crack detection, IEEE Trans. Image Process. 28 (3) (2018) 1498–1512, https://doi.org/10.1109/TIP.2018.2878966.
[26] Q. Zhu, B. Du, B. Turkbey, P.L. Choyke, P. Yan, Deeply-supervised CNN for prostate segmentation, International Joint Conference on Neural Networks (IJCNN), IEEE (2017) 178–184, https://doi.org/10.1109/IJCNN.2017.7965852.
[27] F. Yang, L. Zhang, S. Yu, D. Prokhorov, X. Mei, H. Ling, Feature pyramid and hierarchical boosting network for pavement crack detection, IEEE Trans. Intell. Transp. Syst. 21 (4) (2019) 1525–1535, https://doi.org/10.1109/TITS.2019.2910595.
[28] Z.H. Zhu, J.Y. Fu, J.S. Yang, X.M. Zhang, Panoramic image stitching for arbitrarily shaped tunnel lining inspection, Comput. Aided Civ Infrastruct. Eng. 31 (12) (2016) 936–953, https://doi.org/10.1111/mice.12230.
[29] M.R. Jahanshahi, S.F. Masri, Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures, Autom. Constr. 22 (2012) 567–576, https://doi.org/10.1016/j.autcon.2011.11.018.
[30] M.M. Torok, M. Golparvar-Fard, K.B. Kochersberger, Image-based automated 3D crack detection for post-disaster building assessment, J. Comput. Civ. Eng. 28 (5) (2014) 4014004, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000334.
[31] Y.F. Liu, X. Nie, J.S. Fan, X.G. Liu, Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction, Comput. Aided Civ Infrastruct. Eng. 35 (5) (2020) 511–529, https://doi.org/10.1111/mice.12501.
[32] H. Hoppe, Poisson surface reconstruction and its applications, Proc. 2008 ACM Symp. Solid Phys. Model. (2008) 1–10, https://doi.org/10.1145/1364901.1364904.
[33] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110, https://doi.org/10.1023/B:VISI.0000029664.99615.94.
[34] A.M. Andrew, Multiple View Geometry in Computer Vision, Kybernetes, 2001, https://doi.org/10.1108/k.2001.30.9_10.1333.2.
[35] D. Nistér, An efficient solution to the five-point relative pose problem, IEEE Trans. Pattern Anal. Mach. Intell. 26 (6) (2004) 756–770, https://doi.org/10.1109/TPAMI.2004.17.
[36] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395, https://doi.org/10.1145/358669.358692.
[37] Y. Furukawa, B. Curless, S.M. Seitz, R. Szeliski, Towards internet-scale multi-view stereo, IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn. (2010) 1434–1441, https://doi.org/10.1109/CVPR.2010.5539802.
[38] H. Hoppe, Poisson surface reconstruction and its applications, Proc. 2008 ACM Symp. Solid Phys. Model. (2008) 1–10, https://doi.org/10.1145/1364901.1364904.
[39] Agisoft Metashape, Website, Access Date: 2 February 2022, https://www.agisoft.com.
[40] R. Szeliski, Computer Vision: Algorithms and Applications, Springer Science & Business Media, 2010, https://doi.org/10.1007/978-1-84882-935-0.
[41] C.F. Özgenel, Concrete Crack Images for Classification, Mendeley Data, v1, 2018, https://doi.org/10.17632/5y9wdsg2zt.1.
[42] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958, https://doi.org/10.5555/2627435.2670313.
[43] Z. Liu, Y. Cao, Y. Wang, W. Wang, Computer vision-based concrete crack detection using U-net Fully Convolutional Networks, Autom. Constr. 104 (2019) 129–139, https://doi.org/10.1016/j.autcon.2019.04.005.
[44] A.F. Agarap, Deep Learning using Rectified Linear Units (ReLU), 2018, https://doi.org/10.48550/arXiv.1803.08375.
[45] D.A. Clevert, T. Unterthiner, S. Hochreiter, Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, https://doi.org/10.48550/arXiv.1511.07289.
[46] K.R. Oskal, M. Risdal, E.A. Janssen, E.S. Undersrud, T.O. Gulsrud, A U-net based approach to epidermal tissue segmentation in whole slide histopathological images, SN Appl. Sci. 1 (7) (2019) 1–12, https://doi.org/10.1007/s42452-019-0694-y.
[47] J. Bjorck, C. Gomes, B. Selman, K.Q. Weinberger, Understanding Batch Normalization, 2018, https://doi.org/10.48550/arXiv.1806.02375.
[48] D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2014, https://doi.org/10.48550/arXiv.1412.6980.
[49] K. Chaiyasarn, T.K. Kim, F. Viola, R. Cipolla, K. Soga, Distortion-free image mosaicing for tunnel inspection based on robust cylindrical surface estimation through structure from motion, J. Comput. Civ. Eng. 30 (3) (2016) 04015045, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000516.
[50] Y. Liu, J. Yao, X. Lu, M. Xia, X. Wang, Y. Liu, RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images, IEEE Trans. Geosci. Remote Sens. 57 (4) (2018) 2043–2056, https://doi.org/10.1109/TGRS.2018.2870871.
