0% found this document useful (0 votes)

41 views

Research Article: Concrete Cracks Detection Using Convolutional Neural Network Based On Transfer Learning

Uploaded by

Vihanga

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

41 views

Research Article: Concrete Cracks Detection Using Convolutional Neural Network Based On Transfer Learning

Uploaded by

Vihanga

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

Hindawi

Mathematical Problems in Engineering

Volume 2020, Article ID 7240129, 10 pages
https://doi.org/10.1155/2020/7240129

Research Article
Concrete Cracks Detection Using Convolutional Neural
Network Based on Transfer Learning

Chao Su and Wenjun Wang

College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098, China

Correspondence should be addressed to Wenjun Wang; [email protected]

Received 5 August 2020; Revised 21 September 2020; Accepted 26 September 2020; Published 17 October 2020

Academic Editor: Zheng-zheng Wang

Copyright © 2020 Chao Su and Wenjun Wang. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Crack plays a critical role in the field of evaluating the quality of concrete structures, which affects the safety, applicability, and
durability of the structure. Due to its excellent performance in image processing, the convolutional neural network is becoming
the mainstream choice to replace manual crack detection. In this paper, we improve the EfficientNetB0 to realize the detection of
concrete surface cracks using the transfer learning method. The model is designed by neural architecture search technology. The
weights are pretrained on the ImageNet. Supervised learning uses Adam optimizer to update network parameters. In the testing
process, crack images from different locations were used to further test the generalization capability of the model. By comparing
the detection results with the MobileNetV2, DenseNet201, and InceptionV3 models, the results show that our model greatly
reduces the number of parameters while achieving high accuracy (0.9911) and has good generalization capability. Our model is an
efficient detection model, which provides a new option for crack detection in areas with limited computing resources.

1. Introduction processing techniques, including thresholding [5, 6], edge

detection [7], wavelet transforms [8, 9], and machine
In the current infrastructure, the concrete structure accounts learning [10] and so on. Image thresholding divides the
for the largest proportion. For the concrete structure, cracks crack in the image at the pixel level according to the features
are a frequently encountered disease. With the increase in of different pixel values, which makes the image simple and
service time, the number and width of cracks show a gradual facilitates further processing. Differential operators used for
increasing trend, which seriously affects the safety, appli- edge detection mainly contain Roberts operator, Sobel op-
cability, and durability of the structure. Therefore, it is of erator, and Laplace operator [11]. The basic idea of wavelet
great significance to detect cracks regularly and takes cor- transform is using a set of wavelet functions or basis
responding maintenance measures for the safety of the functions to represent a function or signal, such as an image
concrete structures [1–3]. The traditional crack detection signal. Machine learning extracts feature vectors of crack
method is mainly based on the direct detection of profes- from the training dataset and combines certain algorithms to
sionals with related instruments. This detection method is make prediction. These methods solve the problem of crack
not only labor-consuming but also time wasting. In addition, detection in engineering effectively. However, due to the
it also brings great hidden dangers to the safety of people. unevenness of cracks, the diversity of surface textures, and
In order to find an efficient and safe crack detection the complexity of the background, this field of research is
method and overcome various shortcomings of manual still active.
detection, people turned their attention to image processing As a new technology in the research of machine learning
technologies (IPTs) [4]. In the past decades, the computer algorithms, deep learning is motivated by the establishment
vision community has been working on the automated and simulation of a neural network for analyzing problems
detection of images and proposed a series of image and learning like the human brain. Through typical feature
2 Mathematical Problems in Engineering

learning [12], each layer of the network takes the output of are 97.96%, 81.73%, 78.97%, and 79.95%, respectively. Zhu
the previous layer as its own input, learns a deep nonlinear and Song [22] used the transfer learning method to improve
network structure, and transforms the specific raw data into VGG16 and realized the accurate classification of surface
a more abstract expression model. The rapid expansion of defects on concrete bridges. The training of convolutional
effective datasets, the realization of high-performance neural networks usually requires a large number of data, but
computing hardware, and the continuous improvement of in many cases, it is more expensive to obtain large-scale data.
training methods are driving the rapid development of deep The pretrained model can be transferred to the task of crack
learning. In 2012, AlexNet [13] won the championship with detection by means of transfer learning. The results show
an overwhelming advantage in ImageNet Large-Scale Visual that the model can effectively extract defect features and
Recognition Challenge (ILSVRC). Convolutional neural provide a new idea for surface defect detection. Deng et al.
network (CNN) begins to attract the attention of many [23] added a region-based deformable module to Faster
researchers since then. CNN is a feed-forward neural net- R-CNN [24], R-FCN [25], and FPN-based Faster R-CNN
work, and the connection between neurons is inspired by the [26] to improve the evaluation accuracy of crack detection.
animal visual cortex. It has the characteristics of local In this paper, we use the transfer learning method to
connectivity and parameter sharing and has excellent per- build a model for concrete surface crack detection. Com-
formance in large-scale image processing [14]. In recent pared with existing models, our model achieves a good
years, researchers have begun to use convolutional neural balance among accuracy, model size, and training speed.
networks to detect road defects automatically. The following Due to the use of transfer learning, the model becomes easier
is a brief overview of the application of CNN in crack to train and faster to converge and has better generalization
detection. capability.
Zhang et al. [15] proposed a crack detection method The remaining of this paper is structured as follows:
based on deep learning, which seems to be one of the earliest Section 2 describes the dataset and image preprocessing
works applying CNN to road crack detection. The pavement method; Section 3 presents the overall model architecture
pictures are taken by smartphones, and the network model is and training details; Section 4 shows our experimental re-
built on the Caffe DeepLearning (DL) framework. By sults; and Section 5 delivers the conclusion of this paper.
comparing with traditional machine learning classifiers such
as support vector machine (SVM) and boosting methods, the 2. Dataset and Data Augmentation
author proved the effectiveness of deep learning methods.
Pauly et al. [16] studied the influence of CNN depth and the 2.1. Building Dataset. In this study, we use the dataset
position change between training dataset and test dataset on collected by Li and Zhao [27]. The photos in this dataset were
pavement crack detection accuracy. The results show that obtained by a smartphone with a resolution of 4160 × 3120
increasing the network depth can improve the network pixels from the surface of a pylon and anchor room of a
performance, but when the image position changes, the suspension bridge in Dalian, Liaoning, China. Then, images
detection accuracy will be greatly reduced. Maeda et al. [17] are cropped to 256 × 256 pixel resolutions. After cropping,
created a large-scale road damage dataset and marked the the images are manually divided into two categories: with
location and type of road damage in each picture. Finally, an cracks and without cracks. In this study, we only use 12,000
end-to-end object detection method based on deep learning photos of the dataset, and the number of crack and noncrack
was used to train the damage detection model. Maeda et al. images are set to equal. These images include crack features
also transplanted their model into a mobile phone appli- and background features under various conditions. The
cation to facilitate road damage detection in areas lacking 12,000 selected images are divided into the training set,
experts and financial resources. Xu et al. [18] established an validation set, and test set at the ratio of 6 : 2 : 2. The number
end-to-end bridge crack detection model to realize auto- of crack and noncrack images in the three datasets are set to
matic bridge crack detection. The use of depthwise separable equal. In addition, we also select 1,000 concrete bridge
convolution reduces the number of parameters effectively. images with cracks and 1,000 images without cracks from
The atrous spatial pyramid pooling (ASPP) module extracts the SDNET2018 [28] dataset. This will introduce various
information at multiple scales. The model achieves a de- changes, such as changes in lighting conditions and the
tection accuracy of 96.37% without pretraining. Li et al. [19] features of cracks and crack surface texture to further test the
proposed the YOLOv3-Lite method, which greatly improves generalization capability of the model and make a more
the crack detection speed without reducing the detection comprehensive evaluation of the model. Figure 1 shows
accuracy. Tong et al. [20] used convolutional neural net- several crack and noncrack images in the two datasets.
works (CNNs) to detect, locate, and measure ground pen-
etrating radar images automatically and finally reconstruct
concealed cracks in three dimensions, realizing a low-cost 2.2. Data Augmentation. The generalization capability of the
damage characterization method. Yang et al. [21] realized neural network model is closely related to the number of
the pixel-level detection of cracks based on a fully con- training datasets. But in reality, the amount of data is limited.
volutional neural network. The fully convolutional neural One way to solve this problem is to create fake data and add
network is composed of upsampling and downsampling and it to the training set—data augmentation [29]. By allowing
can detect objects at different scales. In terms of crack limited data to generate more equivalent data to artificially
segmentation, the accuracy, precision, recall, and F1-score expand the training dataset, data augmentation can also
Mathematical Problems in Engineering 3

(a)

(b)

Figure 1: (a) Sample noncrack and crack images from the original dataset; (b) sample noncrack and crack images from the SDNET2018
dataset.

effectively overcome the overfitting phenomenon. Cur- this article uses the Swish activation function. The definition
rently, it is widely used in various fields of deep learning. At of Swish is defined as follows:
present, the commonly used methods of data augmentation
f(x) � x · σ(βx) , (1)
in the field of computer vision mainly include data aug-
mentation based on image processing techniques and data where σ(t) � (1 + exp (−t)− 1 ) and β is either a constant or a
augmentation based on deep learning. trainable parameter.
In this paper, we use the built-in ImageDataGenerator Figure 3 shows the graph of Swish for different values of
interface of Tensorflow2.0 to enhance the input image data, β.
including image flip, rotation, shift, and other operations.
Figure 2 shows several crack images after data augmentation.
3.3. Architecture Description. EfficientNet [34] was proposed
3. Model Construction and Training by Google in 2019 which has great capability of feature
extraction. Compared with other classic convolutional
3.1. The CNN. Convolutional neural network is a special neural networks, it has fewer parameters and higher accu-
kind of neural network. Its main feature is convolution racy. The baseline network of EfficientNet is designed using
operation, which has excellent performance for large-scale multiobjective neural architecture search, and then, the
image processing. Generally speaking, a convolutional baseline network is scaled in terms of depth, width, and
neural network is a hierarchical model that extracts the resolution to achieve a balance among them. The compound
original data (such as RGB images) from the input layer scaling method is defined as follows:
through a series of operations such as convolution, pooling,
and nonlinear activation function mapping. Abstract layer depth � αϕ , (2)
by layer, extract feature information, and finally make
predictions. Deep convolutional neural networks have be-
come popular since 2012, and now, they have become a width � βϕ , (3)
pivotal research topic in the field of artificial intelligence.
Classical convolutional neural network includes AlexNet resolution � cϕ , (4)
[13], VGG [30], GoogLeNet [31], and ResNet [32]. A layer is
the basic calculation unit of a CNN. CNN is mainly com- α · β2 · c2 ≈ 2, (5)
posed of input layer, convolution layer, activation function,
pooling layer, fully connected layer, and Softmax layer. α ≫ 1,
β ≫ 1, (6)
3.2. Swish Activation Function. Ramachandran et al. [33] c ≫ 1,
proposed a Swish activation function using a combination of
exhaustive and reinforcement learning-based search. The where α, β, and c can be calculated by a small grid search.
effectiveness of this activation function has been verified in Firstly, EfficientNetB0 performs a 3 × 3 convolution
some large neural networks. The EfficientNet model used in operation on the input image, and then, the next 16 mobile
4 Mathematical Problems in Engineering

(a) (b)

Figure 2: Images after data augmentation: (a) original image; (b) horizontal ﬂip; (c) rotation; (d) horizontal shift.

can be obtained in the fully connected layer. After each

3
convolution operation in the network, batch normalization
is performed. The activation function used in this network is
2
Swish. The overall architecture of EfficientNetB0 is shown in
Figure 4.
1 The core component of the network is a mobile inverted
bottleneck convolution module (MBConv). Figure 5 shows
0 the framework of this module. The design of this module is
inspired by inverted residual and residual structure. Before
–1 performing on 3 × 3 or 5 × 5 convolution, the dimension of
images is increased via 1 × 1 convolution in order to extract
–2 more feature information. The Squeeze-and-Excitation (SE)
–5 –4 –3 –2 –1 0 1 2 3 [35] model is added after 3 × 3 or 5 × 5 convolution oper-
ation to further improve performance. Finally, 1 × 1 con-
β = 0.1 volution operation is used to reduce the dimension, and a
β = 1.0 residual connection is added.
β = 10.0 The Squeeze-and-Excitation block compresses the fea-
Figure 3: The Swish activation function. ture map, performs a global average pooling operation in the
direction of the channel dimension, and performs an ex-
citation operation on the global feature to learn the rela-
inverted bottleneck convolution modules are used to further tionship in each channel. Then, it obtains the weights of
extract image features. Finally, after 1 × 1 convolution and different channels through the sigmoid activation function.
global average pooling operations, the classification results Finally, the weights multiply the original feature map to get
Mathematical Problems in Engineering 5

Input: 224 × 224 × 3 Input

14 × 14 × 112
Conv, 3 × 3 MBConv6, 5 × 5
112 × 112 × 32 7 × 7 × 192
MBConv1, 3 × 3 Conv2d, 1 × 1
MBConv6, 5 × 5
112 × 112 × 16 Batch normalization
MBConv6, 3 × 3 7 × 7 × 192
56 × 56 × 24 MBConv6, 5 × 5 Swish
MBConv6, 3 × 3 7 × 7 × 192
56 × 56 × 24
MBConv6, 5 × 5
MBConv6, 5 × 5
7 × 7 × 192 DepthwiseConv2d, 3 × 3 or 5 × 5
28 × 28 × 40
MBConv6, 3 × 3
MBConv6, 5 × 5 Batch normalization
7 × 7 × 320
28 × 28 × 40 Swish
Conv, 1 × 1
MBConv6, 3 × 3
7 × 7 × 1280
28 × 28 × 80
MBConv6, 3 × 3 Global Average
Pooling2D
28 × 28 × 80
1 × 1280
MBConv6, 3 × 3 SENT
FC, 2
28 × 28 × 80
MBConv6, 5 × 5 1×2
14 × 14 × 112 Softmax

MBConv6, 5 × 5
Output Conv2d, 1 × 1
14 × 14 × 112
MBConv6, 5 × 5 Batch normalization

Figure 4: The EﬃcientNetB0 network architecture.

+
the ﬁnal feature. The block allows the model to pay more
attention to the channel features which has the most in-
formation, while suppressing those that are not important.
Figure 6 illustrates the detail of the Squeeze-and-Excitation
block. Output

Figure 5: Illustration of MBConv’s framework.

3.4. Loss Function and Adam
3.4.1. Loss Function. The loss function is mainly used to
evaluate the effect of the model. For a large amount of
information, the machine discovers the laws through au-
tonomous learning and makes predictions. The loss function
GlobalAveragePooling2D
is used to measure the degree of deviation between the
predicted result and the actual value. During the network Conv, 1 × 1
training process, the function is continuously updated until
the best fitting result is found to reduce the error. Swish
The cross-entropy loss function, also known as the
Conv, 1 × 1
Softmax function, is widely used to measure the gap between
the predicted value and the actual value when the con- Sigmoid
volutional neural network deals with classification problems.
The Softmax function is defined as follows:
1 N eak X
Lsoftmax loss � − 􏽘 log􏼠 􏼡, (7)
N i�1 􏽐j eaj

where N represents the number of neurons in the output

layer. ak is the input signal. Figure 6: Squeeze-and-excitation block.
6 Mathematical Problems in Engineering

3.4.2. Adam. When training the model effectively and do data augmentation via image flip, translation, ro-
making accurate predictions, the internal parameters of the tation, and other operations. It is worth noting that, due
model play a significant role. This is why we ought to choose to the use of TensorFlow’s built-in function, a large
a suitable optimizer to update network parameters to ap- number of pictures generated by data augmentation
proximate or reach the optimal value. The optimization will not be saved to the local computer. All pictures are
algorithm helps to minimize the loss function, update the online.
model parameters, and finally reach converge. In this article, Step 3: define the structure of the crack detection model
the Adam algorithm is utilized to update weights.
The pretrained model (such as EfficientNetB0) is
Kingma [36] utilizes exponentially moving averages to
loaded and fine-tuned. We remove the last fully con-
estimate the moments:
nected layer and replace it with a custom layer. In this
mt � β1 mt−1 + 1 − β1 􏼁 · gt , (8) article, the number of classifications in the custom layer
is set to 2. The value of weights in other layers will not
]t � β2 ]t−1 + 1 − β2 􏼁 · g2t , (9) change.
Step 4: compile the model and start training
where m and ] are moving averages, gt is gradient, and
Before training the model, it is necessary to specify the
β1 and β2 are the decay rates of moment estimate (setting to
hyperparameters related to the network structure and
0.9 and 0.999, respectively).
select the appropriate optimization strategy. In this
Then, we do bias correction:
experiment, a random batch training method is used to
mt train the neural network. The dataset is randomly
􏽣t �
m , (10)
1 − βt1 shuffled before each epoch of training to ensure that the
same batch of data in each epoch of model training is
]t different, which can increase the rate of model con-
]􏽢t � . (11)
1 − βt2 vergence. The learning rate plays a significant role in the
training of model. Choosing an appropriate learning
The final formula for weight update is rate can speed up the model’s convergence speed; on
􏽣
m the contrary, it may cause the loss value of objective
ωt+1 � ωt − lr · 􏽰�� t , (12) function to explode. Since the transfer learning method
]􏽢t + ϵ
is used, the network has converged on the original
where lr is the learning rate and ϵ is the hyperparameter dataset, so the initial learning rate in this experiment is
(setting to 1e − 3). set to 5e − 7. When the validation loss does not decrease
for two consecutive times, the model learning rate will
be reduced to half. We select the adaptive learning
3.5. Training. The convolutional neural network models rate optimization algorithm Adam and use the cross-
used in this article use transfer learning methods to detect entropy loss function to guide the model training. The
concrete surface crack. Specifically, the weight of the model initialization method of weights is as follows: initialize
is trained on the ImageNet and saved and then transferred to the weights of newly added fully connected layer
our model. Therefore, our model has a higher starting point, randomly and initialize the remaining weights with the
which greatly saves training time and obtains better pretrained weights. Figure 7 shows the process of
performance. model adjustment and weights initialization.
All the experiments in this paper are performed on
In order to solve the problem of how long to train the
TensorFlow in Windows system: hardware settings: CPU:
model, we adopt an early stopping strategy to avoid
Intel (R) Core (TM) [email protected] GHz, RAM: 16G, and
overtraining the network. During the training process,
GPU: NVIDIA GTX1080Ti.
we monitor the validation loss. Once the validation loss
Pretrained model is a model that has been trained and
drops less than 1e − 3 for 30 epochs, the model will stop
saved in advance on a large dataset. In order to realize the
detection of concrete surface cracks, we need to retrain the training.
pretrained convolutional neural network model. In addition, Step 5: test the performance of the model.
the last fully connected layer of the original model is replaced Test the performance of the model on the test dataset.
by a new fully connected layer. The specific experimental In addition, we also select 1000 crack and 1000 nocrack
steps are as follows: images of concrete bridge decks from the SDNET2018
Step 1: data loading dataset to construct a completely different dataset so
that we can further evaluate the detection performance
Importing concrete surface crack images. A batch of of the model.
data is randomly loaded from the training set (batch
size: 16) for subsequent data processing.
Step 2: image preprocessing
4. Experimental Results and Analyses
We use built-in function in TensorFlow to adjust the 4.1. Experimental Results and Evaluation Index. When doing
size of input image to the fixed size of the model. Then, image classification tasks, in order to evaluate the
Mathematical Problems in Engineering 7

New fully
connected
layer (random
initialization)

Layer 1 Layer 2 Layer n

(initialized by (initialized by (initialized by Softmax()
……
pretrained pretrained pretrained
weights) weights) weights)

Fully connected
layer (initialized
by pretrained
weights)

Figure 7: Transfer learning.

TP: p = 99.6% TP: p = 82.0% TP: p = 99.4% TP: p = 99.9%

(a) (b) (c) (d)

Figure 8: Detection of crack. TP denotes true positive.

performance of different algorithms, some evaluation same dataset. Figure 10 shows the change in loss and ac-
metrics need to be selected. Accuracy refers to the ratio of curacy during the training and validation process. The
number of correctly predicted crack and noncrack images to number of parameters of each model can be seen in Table 1.
the total number of input images. Precision can be un- The results of different methods are compared in Table 2.
derstood as the number of correctly predicted crack images Table 3 compares the training time of four models. Table 4
divided by the number of crack images predicted by the shows the size of four models. We can see that, in contrast to
classifier. Recall is the percentage of the number of correctly the other three models, MobileNetV2 [37] has the smallest
predicted crack images to the total number of cracked number of parameters in the task of detecting concrete
images. F1−score is the harmonic mean of precision and recall. surface cracks, but its test accuracy is obviously lower than
Accuracy, Precision, Recall, and F1−score are defined as the other three models. Although the accuracy of Effi-
follows: cientNetB0 on the test dataset is slightly lower than Den-
TP + TN seNet201 [38] (0.21%), its model size is 3.5x smaller and 2.5x
Accuracy � , (13) faster (average training time of each epoch). Its parameters
TP + FP + TN + FN
are reduced by 77.89% at the same time. EfficientNetB0
TP achieves a good balance among accuracy, model size, and
Precision � , (14) training speed. In terms of crack detection task, the model is
TP + FP
quite efficient. It can also be seen from Table 2 that when
TP tested on a dataset which is quite different from the training
Recall � , (15) dataset, the performance of the network drops a little. This
TP + FN
drop is mainly caused by the changes in background con-
Precision · Recall ditions of the images in the dataset, and some image features
F1−score � 2 · , (16) have not been well learned by the network.
Precision + Recall
It is worth noting that, in Figure 10, the accuracy of
where TP and TN mean images with crack and without validation during the training and the validation process is
crack, which are correctly classified. FP and FN mean images slightly higher than that of the training. Two reasons can
with crack and without crack which are wrongly classified. account for this phenomenon. On the one hand, we use the
Figures 8 and 9 show the images, together with the transfer learning method to train the model. The network
respective probability of correct classification. was initialized with pretrained weights (pretrained on
ImageNet). Therefore, the model has a better feature ex-
traction ability and features are more effective. On the other
4.2. Comparative Experiments. In order to verify the per- hand, this phenomenon results from the use of dropout
formance of the model, we compare the proposed model since its behavior during training and validation is different.
with other classic convolutional neural networks on the Dropout forces the neural network to become a very large set
8 Mathematical Problems in Engineering

TP: p = 97.5% TP: p = 99.7% TP: p = 98.2% TP: p = 97.9%

(a) (b) (c) (d)

Figure 9: Detection of noncrack. TN denotes true negative.

1.0 1.0

0.9 0.8

0.8 0.6
Accuracy

Loss

0.7 0.4

0.6 0.2

0.5 0.0
0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
Epochs Epochs

MobileNetV2 training MobileNetV2 validation MobileNetV2 training MobileNetV2 validation

EfficientNetB0 training EfficientNetB0 validation EfficientNetB0 training EfficientNetB0 validation
InceptionV3 training InceptionV3 validation InceptionV3 training InceptionV3 validation
DenseNet201 training DenseNet201 validation DenseNet201 training DenseNet201 validation
(a) (b)

Figure 10: Accuracy (a) and loss (b) during training and validation.

Table 1: Parameters of diﬀerent models.

Recognition model Number of parameters
MobileNetV2 2,260,546
EﬃcientNetB0 4,052,126
DenseNet201 18,325,826
InceptionV3 21,806,882

Table 2: Comparison of four models’ crack detection results.

Recognition model Accuracy Precision Recall F1 score Accuracy (tested on diﬀerent datasets)
MobileNetV2 0.9782 0.9821 0.9740 0.9781 0.9655
EﬃcientNetB0 0.9911 0.9878 0.9945 0.9912 0.9737
DenseNet201 0.9932 0.9892 0.9973 0.9932 0.9749
InceptionV3 0.9898 0.9891 0.9904 0.9898 0.9715

of weak classifiers, which means that a single classifier does During training, dropout cuts off the random set of these
not have too high classification accuracy, and only when we classifiers, so the training accuracy will be affected. During
connect them together will they become more powerful. validation, dropout will automatically turn off and allow all
Mathematical Problems in Engineering 9

Table 3: Comparison of four models’ training time.

Model Training epochs Total training time Average training time of each epoch (s)
MobileNetV2 79 2 h 1 m 47 s 92.49
EﬃcientNetB0 76 3 h 14 m 32 s 153.58
DenseNet201 65 9 h 47 m 8 s 541.97
InceptionV3 69 3 h 46 m 14 s 196.72

Table 4: Size of four models. [2] I.-H. Kim, H. Jeon, S.-C. Baek, W.-H. Hong, and H.-J. Jung,
“Application of crack identification techniques for an aging
Model Size (M) concrete bridge inspection using an unmanned aerial vehicle,”
MobileNetV2 8.8 Sensors, vol. 18, no. 6, p. 1881, 2018.
EfficientNetB0 15.7 [3] T. Liu, H. Huang, and Y. Yang, “Crack detection of reinforced
DenseNet201 70.6 concrete member using rayleigh-based distributed optic fiber
InceptionV3 83.5 strain sensing system,” Advances in Civil Engineering,
vol. 2020, Article ID 8312487, 11 pages, 2020.
[4] T. Yamaguchi, S. Nakamura, R. Saegusa, and S. Hashimoto,
weak classifiers in the neural network to be used, so the “Image-based crack detection for real concrete surfaces,” IEEJ
validation accuracy is improved. Transactions on Electrical and Electronic Engineering, vol. 3,
no. 1, pp. 128–135, 2008.
5. Conclusions [5] Y.-C. Tsai, V. Kaul, and R. M. Mersereau, “Critical assessment
of pavement distress segmentation methods,” Journal of
In this paper, a concrete surface crack detection model based on Transportation Engineering, vol. 136, no. 1, pp. 11–19, 2010.
transfer learning and convolutional neural network is proposed. [6] D. Zhang, Q. Li, Y. Chen, M. Cao, L. He, and B. Zhang, “An
EfficieneNetB0 is a highly effective convolutional neural net- efficient and reliable coarse-to-fine approach for asphalt
pavement crack detection,” Image and Vision Computing,
work. The last fully connected layer is replaced by a new fully
vol. 57, pp. 130–146, 2017.
connected layer with a classification number of 2. The newly [7] A. Ayenu-Prah and N. Attoh-Okine, “Evaluating pavement
added fully connected layer is initialized randomly, and the cracks with bidimensional empirical mode decomposition,”
remaining weights are initialized with pretrained weights. Fi- EURASIP Journal on Advances in Signal Processing, vol. 2008,
nally, by comparing with other models, the results show that our pp. 1–7, 2008.
model achieves a good balance among accuracy, model size, and [8] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, “Auto-
training speed. In addition, when tested on crack images taken mation of pavement surface crack detection using the con-
from other places, our model also shows good performance and tinuous wavelet transform,” in Proceedings of the
generalization capability. Our model is an efficient crack de- International Conference on Image Processing, pp. 3037–3040,
tection model, which is a good choice for areas with limited IEEE, Atlanta, GA, USA, October 2006.
computing resources. Traditional crack detection mostly pays [9] L. Ying and E. Salari, “Beamlet transform-based technique for
attention to how to identify the cracks in the image. In addition, pavement crack detection and classification,” Computer-
Aided Civil and Infrastructure Engineering, vol. 25, no. 8,
it is also important to characterize the severity of the cracks,
pp. 572–580, 2010.
which is an area that is often overlooked. We will be devoted to
[10] A. Hizukuri and T. Nagata, “Development of a classification
this work in future research. method for a crack on a pavement surface images using
machine learning,” in Proceedings of the Thirteenth Interna-
Data Availability tional Conference on Quality Control by Artificial Vision,
Tokyo, Japan, May 2017.
The codes used in this paper are available from the author [11] P. P. Acharjya, R. Das, and D. Ghoshal, “Study and com-
upon request. parison of different edge detectors for image segmentation,”
Global Journal of Computer Science and Technology, vol. 12,
Conflicts of Interest no. 13, 2012.
[12] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,
The authors declare that there are no conflicts of interest vol. 521, no. 7553, pp. 436–444, 2015.
regarding the publication of this paper. [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet
classification with deep convolutional neural networks,” in
Acknowledgments Proceedings of the Advances in Neural Information Processing
Systems, pp. 1097–1105, Lake Tahoe, NV, USA, December
This study was funded by the National Natural Science 2012.
Foundation of China (grant no. 51579089). [14] M. Valueva, N. Nagornov, P. Lyakhov, G. Valuev, and
N. Chervyakov, “Application of the residue number system to
References reduce hardware costs of the convolutional neural network
implementation,” Mathematics and Computers in Simulation,
[1] D. G. Aggelis, N. Alver, and H. K. Chai, “Health monitoring of vol. 177, 2020.
civil infrastructure and materials,” Scientific World Journal, [15] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack
vol. 2014, Article ID 435238, 2 pages, 2014. detection using deep convolutional neural network,” in
10 Mathematical Problems in Engineering

Proceedings of the IEEE International Conference on Image [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning
Processing (ICIP), pp. 3708–3712, IEEE, Phoenix, AZ, USA, for image recognition,” in Proceedings of the IEEE Conference
September 2016. on Computer Vision and Pattern Recognition, pp. 770–778,
[16] L. Pauly, D. Hogg, R. Fuentes, and H. Peel, “Deeper networks Boston, MA, USA, June 2016.
for pavement crack detection,” in Proceedings of the 34th [33] P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for
ISARC, IAARC, Taipei, Taiwan, pp. 479–485, June 2017. activation functions,” 2017, https://arxiv.org/abs/1710.05941.
[17] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, [34] M. Tan and Q. V. Le, “Efficientnet: rethinking model scaling
“Road damage detection using deep neural networks with for convolutional neural networks,” 2019, https://arxiv.org/
images captured through a smartphone,” 2018, https://arxiv. abs/1905.11946.
org/abs/1801.09454. [35] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation net-
[18] H. Xu, X. Su, Y. Wang, H. Cai, K. Cui, and X. Chen, “Au- works,” in Proceedings of the IEEE Conference on Computer
tomatic bridge crack detection using a convolutional neural Vision and Pattern Recognition, pp. 7132–7141, Salt Lake City,
network,” Applied Sciences, vol. 9, no. 14, p. 2867, 2019. UT, USA, June 2018.
[19] Y. Li, Z. Han, H. Xu, L. Liu, X. Li, and K. Zhang, “YOLOv3- [36] D. P. Kingma and J. Ba, “Adam: a method for stochastic
lite: a lightweight crack detection network for aircraft optimization,” 2014, https://arxiv.org/pdf/1412.6980.pdf.
structure based on depthwise separable convolutions,” Ap- [37] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and
plied Sciences, vol. 9, no. 18, p. 3781, 2019. L.-C. Chen, “Mobilenetv2: inverted residuals and linear
[20] Z. Tong, J. Gao, and H. Zhang, “Recognition, location, bottlenecks,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 4510–4520, Salt
measurement, and 3D reconstruction of concealed cracks
Lake City, UT, USA, June 2018.
using convolutional neural networks,” Construction and
[38] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger,
Building Materials, vol. 146, pp. 775–787, 2017.
“Densely connected convolutional networks,” in Proceedings
[21] X. Yang, H. Li, Y. Yu, X. Luo, T. Huang, and X. Yang,
of the IEEE Conference on Computer Vision and Pattern
“Automatic pixel-level crack detection and measurement
Recognition, pp. 4700–4708, Honolulu, HI, USA, July 2017.
using fully convolutional network,” Computer-Aided Civil
and Infrastructure Engineering, vol. 33, no. 12, pp. 1090–1109,
2018.
[22] J. Zhu and J. Song, “An intelligent classification model for
surface defects on cement concrete bridges,” Applied Sciences,
vol. 10, no. 3, p. 972, 2020.
[23] L. Deng, H.-H. Chu, P. Shi, W. Wang, and X. Kong, “Region-
based CNN method with deformable modules for visually
classifying concrete cracks,” Applied Sciences, vol. 10, no. 7,
p. 2528, 2020.
[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards
real-time object detection with region proposal networks,”
IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, vol. 39, pp. 91–99, 2015.
[25] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: object detection via
region-based fully convolutional networks,” in Proceedings of
the Advances in Neural Information Processing Systems,
pp. 379–387, Barcelona, Spain, December 2016.
[26] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and
S. Belongie, “Feature pyramid networks for object detection,”
in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2117–2125, Honolulu, HI, USA, July
2017.
[27] S. Li and X. Zhao, “Image-based concrete crack detection
using convolutional neural network and exhaustive search
technique,” Advances in Civil Engineering, vol. 2019, 12 pages,
2019.
[28] S. Dorafshan, R. J. Thomas, and M. Maguire, “SDNET2018: an
annotated image dataset for non-contact concrete crack de-
tection using deep convolutional neural networks,” Data in
Brief, vol. 21, pp. 1664–1668, 2018.
[29] L. Perez and J. Wang, “The effectiveness of data augmentation
in image classification using deep learning,” 2017, https://
arxiv.org/abs/1712.04621.
[30] K. Simonyan and A. Zisserman, “Very deep convolutional
networks for large-scale image recognition,” 2014, https://
arxiv.org/abs/1409.1556.
[31] C. Szegedy, “Going deeper with convolutions,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1–9, Boston, MA, USA, June 2015.