This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3279888

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2022.0122113

Adaptive Canny and Semantic Segmentation Networks Based on Feature Fusion for Road Crack Detection
JIE LUO¹, HUAZHI LIN², XIAOXU WEI³, and YONGSHENG WANG⁴
¹School of Automation, Wuhan University of Technology, Wuhan 430063, China (e-mail: luo_jie@[Link])
²School of Automation, Wuhan University of Technology, Wuhan 430063, China (e-mail: linhz@[Link])
³School of Automotive Engineering, Wuhan University of Technology, Wuhan 430063, China (e-mail: wxx2014@[Link])
⁴School of Information Engineering, Wuhan University of Technology, Wuhan 430063, China (e-mail: wysh@[Link])
Corresponding author: Yongsheng Wang (e-mail: wysh@[Link]).

ABSTRACT Road cracks, a common pavement hazard throughout the life cycle of a road, can degrade road performance, shorten its service life, and endanger vehicle safety. Traditional machine vision detection methods can detect road crack details but suffer from poor stability and generalization ability. Semantic segmentation detection is more stable but cannot capture fine road crack details. To combine the advantages of both methods and improve the accuracy of road crack detection, this study proposes a novel feature-fusion-based road crack detection method. First, a bilateral filter and a four-direction Sobel operator are introduced into the Canny algorithm to enhance noise reduction and extract edge features more effectively. Second, dynamic thresholds are generated adaptively from the gradient information after non-maximum suppression. Next, the detection map is morphologically processed, the connected areas are ranked, and the bilateral filter parameters are adjusted based on the detection results. Features of the Canny road crack detection map are then extracted by a convolutional feature extraction module and fused with the low-level feature layer of the DeepLabV3+ detection network. The result is finally concatenated with the high-level feature layer, and the output map is obtained after further convolutional feature extraction. The method was validated on CRACK500, a publicly available complex road crack dataset. By effectively combining crack features, the feature fusion method outperformed the adaptive Canny, DeepLabV3+, U-Net, PSPNet, and ICNet algorithms by more than 6.5% in Mean Intersection over Union (MIoU). It also outperformed them in Mean Absolute Error (MAE), improving detection accuracy.

INDEX TERMS adaptive Canny, feature fusion, road cracks, semantic segmentation

I. INTRODUCTION
Cracks are one of the most common road defects. Their appearance indicates that the road structure has been damaged to varying degrees, allowing corrosion to reach the internal reinforcement layer of the road. This accelerates internal damage to the road [1] and lowers its tensile strength, which not only affects the appearance of the road but also poses a potential threat to pedestrian safety [2][3]. Therefore, roads must be inspected for cracks regularly.

Previously, road crack detection was performed manually. However, manual inspection was time-consuming and laborious, and its results were susceptible to the subjective judgment of different inspectors. Computer vision detection algorithms subsequently developed rapidly and were gradually applied to this task. However, interfering factors such as complex road materials, excessive pollutants, and weather changes increase the probability of false and missed detections during inspection.

With the development of computer vision and artificial intelligence, the use of machines for road crack detection has become increasingly common. The use

VOLUME 10, 2022 1

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see [Link]

J. LUO et al.: Adaptive Canny and semantic segmentation networks based on feature fusion for road crack detection

of computers makes the road crack detection process less susceptible to complex and changing environments, and the detection results are more standardized and accurate. Currently, the two mainstream machine detection approaches are road crack detection based on traditional vision and road crack detection based on semantic segmentation.

In the traditional visual inspection method, image processing techniques are used to preprocess images and reduce noise interference. A feature operator then extracts the characteristic information of the road crack, enabling differentiation between cracked and non-cracked regions. However, traditional image processing has several limitations:
- the need to set multiple parameters;
- a lack of self-learning ability;
- vulnerability to environmental noise;
- poor adaptability to changes in the detection environment;
- limited generalization ability.

In the semantic segmentation detection method, a convolutional neural network is trained to identify road crack features and classify each image pixel. While this approach is more stable, issues arise in real road crack scenarios, such as the large disparity in pixel proportion between the road crack and the background. As a result, the semantic segmentation detection method may overlook crucial details of the road crack.

This paper addresses three problems: the excessive manual parameter adjustment and poor stability of traditional visual inspection methods, and the loss of detail associated with semantic segmentation detection. To do so, it proposes a road crack detection method based on the feature fusion of adaptive Canny and DeepLabv3+. In the traditional image processing part, a four-directional Sobel operator replaces the original two-directional Sobel operator, and bilateral filtering replaces the original Gaussian filtering to enhance feature extraction and noise reduction. Adaptive bilateral filtering is obtained through feedback from the detection results. Adaptive Canny thresholds are obtained from the image gradient information, which raises the level of automation in detecting complex road cracks. The adaptive Canny detection image is fused with the low-level features of the DeepLabV3+ network through a feature extraction module. This addresses the loss of details caused by combining high- and low-level features, thereby improving the accuracy of road crack detection.

This paper selects the Canny edge detection operator as the traditional image processing method. Many edge detection operators exist, and Wang et al. [4] compared five commonly used ones and reached the following conclusions:
- the Roberts operator has good localization but detects fewer edge details;
- the Sobel operator performs similarly to the Prewitt operator, producing clear contours but wider edges;
- the Kirsch operator detects blurry edges and has low localization accuracy.

The Canny operator uses a Gaussian function to smooth the image before detecting edges, so it is not easily disturbed by noise and can detect true weak edges. It uses two different thresholds to detect strong and weak edges, and weak edges are included in the output image only when they are connected to strong edges. The detected edges are therefore more complete, better connected, and more accurately localized. This excellent edge detection ability makes the Canny operator well suited to crack detection.

This paper selects DeepLabv3+ as the semantic segmentation network. DeepLabv3+ [5] is an extension of DeepLabv3 and is widely regarded as a state-of-the-art semantic segmentation model. It combines the advantages of the spatial pyramid pooling module and the encoder-decoder structure, adding a simple and effective decoder module that refines the segmentation results, especially along object boundaries. Its authors further apply depthwise separable convolution to the atrous spatial pyramid pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. With its advanced architecture and excellent segmentation accuracy, DeepLabv3+ has great potential for surface feature segmentation tasks in crack scenarios [6]. It is therefore chosen as the semantic segmentation part of the proposed method. Its use for crack detection by other researchers is discussed in Section II.

The primary contributions of this study are as follows:
• An adaptive Canny dual-threshold selection method. The method generates a gradient histogram by traversing the image pixels. It then identifies the gradient region most likely to correspond to road crack edges based on the gradient variance, and sets the high and low thresholds accordingly. Experimental results show that thresholds set in this way can effectively detect road crack edges, avoiding the inaccuracy and inefficiency of manual threshold setting.
• An adaptive bilateral filtering method. The method adjusts the filtering parameters through feedback from the detection results. First, the number and area of the connected regions in the generated road crack edge image are measured. Many small connected regions indicate that noise interference is too large and the filtering strength must be increased. Few, small connected regions indicate that some road crack information has been lost and the filtering strength must be reduced.
• A supervised learning method for fusing adaptive Canny detection results with the DeepLabV3+ network. The Canny result image is fused with the low-level feature layer of the DeepLabV3+ network, adding road crack position and detail information and compensating for the loss of details
when the high- and low-level feature layers are fused. These details enable the detection of small cracks. The proposed fusion method is tested on CRACK500, a complex road crack dataset widely used in the literature. The experimental results show that this method is superior to the pre-fusion methods in terms of MIoU and MAE and outperforms previous semantic segmentation networks.

The remainder of this paper is organized as follows:
- Section II introduces related research on road crack detection.
- Section III presents the method proposed in this study.
- Section IV presents the implementation details, experimental results, and analysis.
- Section V provides the conclusions.

II. RELATED WORK
This section reviews research on road crack detection and the advantages and disadvantages of each approach. Computer-vision-based road crack detection includes traditional vision-based detection and semantic segmentation-based detection.

A. ROAD CRACK DETECTION BASED ON TRADITIONAL VISION
Several studies have applied traditional vision to road crack detection. Huan et al. [7] proposed an improved Canny operator for pavement crack detection. They replaced the original Gaussian filtering with morphological filtering and used Otsu's maximum inter-class variance threshold segmentation algorithm to determine the dual thresholds adaptively. The experimental results show that the improved algorithm increases crack detection speed and accuracy. However, this holds only for relatively simple crack scenes with little noise interference, and the algorithm performs poorly in more complex crack environments.

Zhao et al. [8] further improved the Canny operator and applied it to crack detection. They replaced the original Gaussian filter with bilateral filtering, used the Otsu algorithm to obtain the dual thresholds adaptively, and used morphological filtering to eliminate small voids and fill cracks in the contour lines. The improved method enhances detection accuracy but remains ineffective in complex environments and has weak generalization capability. It therefore cannot meet the needs of simultaneous multi-scene crack detection.

Monicka et al. [9] proposed an improved Canny operator for noise removal that incorporates adaptive threshold selection, morphological operations, image segmentation, and image binarization. The experimental results show that the improved algorithm detects cracks more accurately than traditional edge detection algorithms. However, too many parameters still require manual adjustment, which reduces detection efficiency.

To improve the Canny operator, Meiling et al. [10] first used statistical filtering to solve the image edge blurring problem. They then calculated the gradient amplitude and direction of the crack image based on the operator's 3 × 3 gradient template. Finally, the natural iteration method was used to obtain the image threshold, and discontinuous edges were repaired using the morphological closing operation and thinning operators to obtain the final detected edges. Experiments showed that this improved algorithm maintains the integrity of the detected edges and can reduce the interference of granular and flaky noise.

Akagic et al. [11] proposed Otsu-thresholded image segmentation for pavement crack detection. The method first divides the input image into four independent sub-images of equal size. Cracks are then searched for based on the ratio between the Otsu threshold and the maximum histogram value of each sub-image, and all segmented sub-images are merged. This method performs better at low signal-to-noise ratios but is unsuitable for complex crack environments.

B. ROAD CRACK DETECTION BASED ON SEMANTIC SEGMENTATION
In the context of this study, semantic segmentation classifies each image pixel for road crack detection. Zhang et al. [12] developed CrackNet, based on a convolutional neural network, to detect cracks at the pixel level. CrackNet has no pooling layers, and its feature map size is constant across all layers. In 2018, Zhang et al. [13] proposed the improved CrackNet II, which, compared with CrackNet, sets trainable parameters in all hidden layers to further improve crack detection performance. Fei et al. [14] proposed CrackNet-V on top of CrackNet, which significantly improves the crack detection rate. This series of CrackNet models maintains spatial resolution and avoids the loss of spatial information during downsampling.

Encoder-decoder architectures have also been widely used for semantic segmentation, and among those applied to road crack detection, DeepLabv3+ is widely used. Using their own dataset, Ji et al. [15] were the first to apply DeepLabv3+ to road crack detection. The effectiveness of DeepLabv3+ was verified for single, multiple, intersecting, and crocodile cracks, and its performance was superior to that of FCN [16] and DeepCrack [17]. However, DeepLabv3+ performs poorly on crack details. This could be attributed to the suppression of some crucial information in the high feature layer by the low feature layer during fusion [18].

Some studies have used attention mechanisms to address this issue. Sun et al. [19] proposed a multi-scale attention module in the decoder of DeepLabv3+ that generates attention masks and dynamically allocates weights between high-level and low-level feature maps. Dynamic weights assign more rational weights to different feature maps than fixed weights do. The network exhibited significantly improved segmentation performance and performed well on the road crack dataset. Liu et al. [20] improved DeepLabv3+
by using ResNet50 as the backbone network to enhance its feature extraction capability. Although this improved method achieved higher detection accuracy than other common pixel-level segmentation algorithms, its additional network layers increased computation time. Cai et al. [21] used MobileNetV2 as the backbone feature extraction network, substantially reducing the model's computational load and increasing its speed. They also introduced an attention mechanism module into the backbone feature extraction network and the decoder, further improving the model's edge recognition and segmentation accuracy.

TABLE 1. SUMMARY TABLE OF WORK RELATED TO ROAD CRACK DETECTION
Ref. | Research methods | Advantages | Disadvantages
[7] | Adaptive selection of Canny double thresholds using the Otsu algorithm | Avoids tedious manual parameter adjustment | Does not solve the interference of road environmental noise
[8] | Uses the Otsu algorithm to obtain the threshold and introduces morphological filtering | Improved detection accuracy | Susceptible to environmental interference; poor detection of complex cracks
[9] | Adds adaptive threshold selection, morphological operations, and image segmentation | Better ability to remove noise | Many parameters must be set manually
[10] | Solves blurred edges with statistical filtering | Preserves edge integrity and reduces grain and flaky noise interference | -
[11] | Uses Otsu to segment images, detects cracks in the sub-images, and merges them | Better performance at low signal-to-noise ratios | Not suitable for complex crack environments
[12] | Semantic segmentation network based on a CNN | Maintains spatial resolution to avoid loss of spatial information during downsampling | Detection is slow and accuracy is not high
[13] | Trainable parameters are set in all hidden layers | Improved crack detection performance | -
[14] | Compared with the original CrackNet, CrackNet-V has a deeper architecture but fewer parameters | Improved detection accuracy and computational efficiency | -
[15] | Semantic segmentation network based on an encoder-decoder architecture | Clear boundary extraction | Loss of crack details during feature layer fusion
[19] | Introduces multi-scale attention modules into the decoder | Enhanced ability to detect details | Insufficient generalization ability
[20] | Selects ResNet50 as the backbone network to improve DeepLabv3+ | Enhanced feature extraction capability | Slower operation speed
[21] | Uses MobileNetV2 as the backbone feature extraction network and introduces an attention mechanism | Enhanced edge extraction capability | -

Table 1 summarizes related work. In environments with sufficient lighting and minimal noise interference, road crack detection based on traditional vision can detect the edges of road cracks, even minor details. In such conditions, its detection accuracy is superior to that of semantic segmentation. However, this approach is highly susceptible to environmental factors such as pollutants, lighting, and road signs, which can lead to inaccurate detection results. Manual parameter setting is also laborious, and the method has weak generalization ability and low efficiency. By contrast, road crack detection based on semantic segmentation is less prone to environmental disturbances and exhibits strong generalization ability and high robustness. However, its detection of crack details needs improvement.

III. METHODS
This section presents the proposed method, which comprises three parts: the adaptive Canny method, the DeepLabv3+ method, and the fusion of the two.
This study proposes fusing features from adaptive Canny and DeepLabV3+ for pixel-level crack detection. The flowchart is shown in Fig. 1. The method has two parts: (1) adaptive Canny detection of cracks in the original image, and (2) fusion of the adaptive Canny detection result with the DeepLabV3+ network for crack detection.

In the first part, the original input image is preprocessed by grayscale conversion and bilateral filtering to reduce ambient noise interference. Next, the gradient operator calculates the gradient magnitude and direction of every pixel in the image, and suitable thresholds are generated from the gradient map to complete edge detection. Finally, morphological operations refine the edge-detected image.

In the second part, the original image is fed into the DeepLabV3+ network, and the output of the deep convolutional neural network (DCNN) backbone is split into two paths:
- The low feature layer path. After the first part's detection result map passes through the convolutional feature extraction module, it is fused with this layer.
- The high feature layer path. This path passes through atrous spatial pyramid pooling (ASPP), an enhanced feature extraction network, to generate the high feature layer.

The high and low feature layers are then stacked, passed through two depthwise separable convolution blocks, and upsampled to the size of the input image to produce the final prediction.

A. ADAPTIVE CANNY DETECTION
1) Image pre-processing
The publicly available complex road crack dataset CRACK500 covers a wide range of cracks influenced by factors such as road materials and contaminants. Because the pavement images are complex, the crack images must
be preprocessed before they are fed into the inspection system, as outlined in the following steps:
(1) Grayscale conversion. The acquired dataset images are in color. To reduce memory consumption and increase processing speed, they are converted into 8-bit grayscale images, which reduces the amount of data to be processed for each road image [22].

FIGURE 1. Block diagram of the proposed crack detection method. The left half of the diagram shows the adaptive Canny detection process. The resulting image is then fused with the DeepLabV3+ network structure in the right half for further detection.

(2) Bilateral filtering [23]. The grayscale image still contains environmental distractors that affect subsequent edge feature extraction, so filtering is required. The original Canny algorithm applies Gaussian filtering, which computes a weighted average of neighboring pixels and removes high-frequency noise. However, edges are also high-frequency signals. Some crack edge information is therefore lost while the interference noise is removed, which hampers the subsequent extraction of fine crack features. This study replaces the Gaussian filter with bilateral filtering, a nonlinear filter that preserves edges while smoothing noise and is thus well suited to crack images of complex pavements.

Bilateral filtering uses a weighted average method [24], in which the intensity of a pixel is replaced by a weighted average of the intensities of the surrounding pixels, with Gaussian-distributed weights. Most importantly, the weights of the bilateral filter consider not only the Euclidean distance between pixels but also the radiometric differences in the pixel range domain. These include the degree of similarity between a pixel in the convolution kernel and the central pixel, color intensity, and depth distance. Both weights are used when computing the filtered value of the central pixel, as shown in (1) and (2):

$$I^{bf}_p = \frac{1}{W^{bf}_p} \sum_{q \in S} G_{\delta_s}\left(\lVert p - q \rVert\right) G_{\delta_r}\left(\lvert I_p - I_q \rvert\right) I_q \tag{1}$$

$$W^{bf}_p = \sum_{q \in S} G_{\delta_s}\left(\lVert p - q \rVert\right) G_{\delta_r}\left(\lvert I_p - I_q \rvert\right) \tag{2}$$

In (1) and (2):
- I_p and I_q are the intensities of the input image at pixels p and q;
- S is the neighborhood (window) centered on pixel p;
- I^{bf}_p is the filtered value at p;
- W^{bf}_p is the normalization factor;
- δ_s is the spatial domain standard deviation;
- δ_r is the range (intensity) domain standard deviation;
- G_{δs}(‖p − q‖) is the spatial weight;
- G_{δr}(|I_p − I_q|) is the range weight.

For each pixel to be filtered, the weight of a neighboring pixel depends on both the distance and the similarity between the two pixels. Combining spatial distance and degree of similarity, bilateral filtering can be written in continuous form as (3) and (4):

$$h(x) = k^{-1}(x) \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(\varepsilon)\, c(\varepsilon, x)\, s\big(f(\varepsilon), f(x)\big)\, d\varepsilon \tag{3}$$

$$k(x) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} c(\varepsilon, x)\, s\big(f(\varepsilon), f(x)\big)\, d\varepsilon \tag{4}$$
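To make the discrete form in (1) and (2) concrete, the following brute-force NumPy sketch computes the bilateral filter pixel by pixel. It is an illustration only. The function name `bilateral_filter`, the window radius, the reflective border padding, and the default values of δs and δr are assumptions chosen for demonstration, not the parameters used in this study.

```python
import numpy as np


def bilateral_filter(img: np.ndarray, radius: int = 2,
                     delta_s: float = 2.0, delta_r: float = 25.0) -> np.ndarray:
    """Brute-force bilateral filter following the discrete form (1)-(2).

    img     -- 2-D grayscale image
    radius  -- half-width of the square neighbourhood S (window = 2*radius + 1)
    delta_s -- spatial standard deviation (assumed default)
    delta_r -- range/intensity standard deviation (assumed default)
    """
    img = img.astype(np.float64)
    h, w = img.shape
    # Reflect-pad so every pixel p has a full neighbourhood S.
    padded = np.pad(img, radius, mode="reflect")

    # Spatial weights G_{delta_s}(||p - q||) are identical for every p.
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    spatial = np.exp(-(dx**2 + dy**2) / (2.0 * delta_s**2))

    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            # I_q for all q in S, centred on p = (y, x).
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range weights G_{delta_r}(|I_p - I_q|).
            rng = np.exp(-((window - img[y, x]) ** 2) / (2.0 * delta_r**2))
            weights = spatial * rng
            # Eq. (2) gives the normaliser W_p; eq. (1) is the weighted mean.
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out
```

On a step edge between intensities 0 and 200, the range weight across the edge is about exp(-32), so the edge is preserved almost exactly. On a flat noisy patch, by contrast, the filter behaves like a Gaussian blur. In practice, an optimized routine such as OpenCV's `cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)` would normally replace this loop.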
In (3) and (4):
- h(x) is the filtered gray value of point x;
- c is the spatial Gaussian weight;
- s is the Gaussian weight of the degree of similarity between pixels [25];
- f(x) is the pixel value at point x;
- k^{-1}(x) is the normalization factor.

To obtain a good result map, the parameters c and s and the filter kernel size must be set manually. Manual adjustment is inefficient because the dataset contains many images of different types. To address this problem, this study proposes an adaptive bilateral filtering method, presented in the following sections.

2) Calculating image gradients
The Canny algorithm uses the traditional Sobel operator, which considers gradient information in only two directions. It is therefore not robust and is easily affected by noise. Because it ignores other edge directions, it often loses edge details [26].
To reduce the influence of noise, enhance the detection of contour lines, and retain edge details, this study extends the first-order gradient template of the Sobel operator to four directions: horizontal, vertical, 45°, and 135° [27], as shown in Fig. 2. These templates are used to compute the gradient amplitude.

FIGURE 2. Gradient direction templates: (a) x-axis direction, (b) y-axis direction, (c) 45° direction, and (d) 135° direction.

The first-order gradient components G_x(x, y), G_y(x, y), G_{45°}(x, y), and G_{135°}(x, y) are obtained by convolving the filtered image with each of the four first-order gradient templates in Fig. 2. The gradient amplitude and angle are then computed from these components, as shown in (5) and (6):

$$M(x, y) = \sqrt{G_x^2 + G_y^2 + G_{45^\circ}^2 + G_{135^\circ}^2} \tag{5}$$

$$\theta(x, y) = \arctan\left(\frac{G_y}{G_x}\right) \tag{6}$$

3) Canny Adaptive Threshold Selection
The traditional Canny algorithm requires the high and low thresholds to be adjusted manually. To avoid this, this study proposes an adaptive dynamic thresholding method. After non-maximum suppression, the gradient values of the image are rescaled to the range 0 to 255. The gradient magnitudes of all pixels are then counted to generate a gradient histogram describing edge strength [28]. Pixels with a gradient of 0 indicate non-edge regions and are excluded so that they do not affect subsequent calculations. The remaining pixels contain all the detected edge information, which includes both correct edges and incorrect edges (that is, ambient noise), as shown in Fig. 3.

FIGURE 3. Gradient histogram of the crack image.

The gradient histogram shows no obvious bimodal peak, only a very high single peak in the low-gradient region. This indicates that ambient noise accounts for the majority of the crack image. The part of the histogram corresponding to true edges in the original image shows no obvious spikes. Instead, it decreases uniformly, approximately following a normal distribution. Because there is no obvious valley between the non-edge and edge regions, the gradient histogram cannot be split by locating a bimodal peak, as is done for grayscale histograms. Based on these properties of the gradient histogram, a method is proposed for adaptively separating the correct edge region from the incorrect edge region.
The images of the road crack dataset contain both crack and environmental information. It is known a priori that incorrect edges make up a much larger proportion of the images than correct edges. Therefore, the pixels in the single peak of the gradient histogram and its vicinity must
be the set of incorrect edges, that is, environmental interference noise. The gradient value with the highest number of pixels in the gradient histogram is called the pixel-maximum gradient $G_{\max}$. The variance of the gradients of all pixels within the sub-image relative to $G_{\max}$ is called the pixel-max gradient variance $\sigma_{\max}$:

$$\sigma_{\max} = \sqrt{\frac{\sum_{i=1}^{255} \left(i \times G_i - \max \times G_{\max}\right)^2}{\sum_{i=1}^{255} \left(i \times G_i\right)}} \tag{7}$$

where $i$ is the gradient value and $\max$ is the gradient value containing the maximum number of pixels.

When there is little ambient noise, the gradient histogram has only a single peak. The pixel gradient values are then concentrated around the pixel-maximum gradient $G_{\max}$, and $\sigma_{\max}$ is small. When the gradient histogram reflects not only crack edges but also a large proportion of false edges caused by ambient noise, the gradients of the correct edge pixels lie relatively far from $G_{\max}$; consequently, $\sigma_{\max}$ is large. Based on this analysis, the high and low thresholds can be set adaptively.

Analysis of the Canny algorithm shows that double-threshold edge linking discards weak edge points below the low threshold $\tau_l$ and retains strong edge points above the high threshold $\tau_h$. Edge points between $\tau_l$ and $\tau_h$ are retained only if there are strong edge points in their neighborhood. Therefore, $\tau_l$ must be set outside the incorrect edge region; otherwise, a large amount of false edge noise will be introduced into the final result. In this study, we set $\tau_l$ adaptively.

$G_{\max}$ reflects the center of the distribution of incorrect edges (that is, ambient noise) in the gradient histogram. In contrast, $\sigma_{\max}$ reflects the dispersion of the gradient distribution relative to $G_{\max}$, that is, relative to the incorrect edges. Therefore, $G_{\max}$ and $\sigma_{\max}$ can be used together to calculate the extent of the incorrect edge region.

The gradient values of the pixels in the gradient histogram follow a normal distribution. It is empirically known that crack edges have larger gradient values, so the comparatively few correct edge pixels lie in the region of larger gradient values. According to the 3σ rule of the normal distribution, when the low threshold $\tau_l$ exceeds the pixel-maximum gradient $G_{\max}$ by three times the pixel-max gradient variance $\sigma_{\max}$, $\tau_l$ can be considered to lie outside the incorrect edge region. Environmental noise edges are thus prevented from appearing in the contour map. The low threshold is therefore calculated as

$$\tau_l = G_{\max} + 3 \times \sigma_{\max} \tag{8}$$

Once $\tau_l$ has been determined, $\tau_h$ is generally set to three times the low threshold $\tau_l$.

4) MORPHOLOGICAL MANIPULATION AND ADAPTIVE BILATERAL FILTERING
The morphological closing operation [29] is dilation followed by erosion. Applied to crack images, it fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in the contour lines, thereby smoothing the contour edges.

After the closing operation, the edges are smoother and less noisy while retaining more image edge details. At this point, only the crack edge information has been obtained. The center of the crack is still undetected and forms a large hole that must be filled by morphological reconstruction of the image. After morphological reconstruction, the crack information is fully detected, but some environmental noise is also detected. The non-crack areas are then filtered out by labeling the connected areas and comparing the size of each connected area across the image.

As mentioned earlier, the parameters $c$ and $s$ and the filter kernel size must be set manually to obtain a good bilateral filtering effect. To solve this problem, this study proposes an adaptive method with feedback adjustment based on the detection results.

(1) The size of the filter kernel. The filter kernel size determines the number of pixels involved in the filtering calculations. A larger filter kernel denoises better but requires more computation and blurs the image. To reduce computation and increase filtering speed while maintaining a good filtering effect, the kernel size is generally chosen to be no larger than 15, depending on the situation.

(2) Gaussian spatial weight. The Gaussian spatial weight is linearly related to the size of the filter kernel [30]. More than 95% of the area under a Gaussian function lies in the interval $[-2\sigma, 2\sigma]$. To obtain a larger number of samples while ensuring image clarity, the spatial weight is therefore defined as

$$c = \frac{m r}{2} \tag{9}$$

where $c$ is the Gaussian spatial weight, $m$ is a constant, and $r$ is the filter kernel size. The best value of $m$ was experimentally determined to lie in the range [0.75, 0.85]. This range effectively prevents the spatial weight from becoming over-dispersed and blurring the image.

(3) Gaussian pixel weight. The effect of bilateral filtering is more sensitive to the Gaussian pixel weight than to the Gaussian spatial weight [31]. Feedback from the detection result map is therefore used to adjust the Gaussian pixel weight. When the largest connected area occupies too large a percentage of the image and the number of connected
areas is too large, it indicates that noise significantly affects the detection area and that the Gaussian pixel weight should be increased. The Gaussian pixel weight is adjusted according to two feedback indicators: the maximum connected area ratio and the number of connected areas. Numerous experiments showed that the detection region of a correct crack should be the largest connected region in the detection map, occupying 12% to 18% of the entire image area. They also showed that the number of connected regions should not exceed 5.

In the adaptive bilateral filtering method proposed in this paper, the filter kernel size was set to 15, the Gaussian spatial weight was set to 8 using formula (9), and the initial value of the Gaussian pixel weight was set to 15. The image was then predicted, and the detection results were checked in a loop.

If the largest connected area accounted for more than 18% of the entire image area, or the number of connected areas was greater than 5, the image was significantly disturbed by noise. In that case, the algorithm had misidentified some non-crack areas, and the Gaussian pixel weight should be increased to strengthen the filtering effect. The value of $s$ was therefore incremented by 1, and the image was filtered and detected again.

Conversely, if the largest connected region accounted for less than 12% of the entire image area, or the number of connected regions was less than 5, no cracks or only some of the cracks had been detected. In this case, the Gaussian pixel weight should be reduced to weaken the filtering effect. The value of $s$ was therefore decremented by 1, and the image was filtered and detected again.

B. SEMANTIC SEGMENTATION AND FEATURE FUSION
1) DeepLabV3+
DeepLabV3+ is a well-established semantic segmentation model that adopts an encoder-decoder structure to fuse multiscale information more effectively. The encoder-decoder architecture allows the resolution of the features extracted by the encoder to be controlled arbitrarily. It also balances accuracy and time consumption through atrous convolution. The DeepLabV3+ model is divided into two parts: the encoder and the decoder.

The encoder uses serial atrous convolution in the backbone network (DCNN). After deep convolutional feature extraction from the input image, the output is divided into two parts. One part is passed directly to the decoder as the low-level feature layer. The other part passes through parallel atrous convolutions at different rates, which extract features and capture contextual information at different scales. These outputs are merged and compressed by a 1×1 convolution to obtain the high-level feature layer for the decoder.

In the decoder, the high-level feature layer from the encoder is upsampled by bilinear interpolation with a magnification factor of four. It is then concatenated with the low-level feature layer, whose number of channels has first been compressed by a 1×1 convolution. Two depthwise separable convolution blocks are then used to extract features. Finally, these features are upsampled by a factor of four to obtain a prediction map of the same size as the original image.

2) Feature layer fusion
In image processing, fusing features at different scales is an important tool for improving results. In a convolutional network, lower-level features have higher resolution and contain more location and detail information, but they are less semantic and noisier. In contrast, higher-level features carry stronger semantic information at the expense of very low resolution and poor detail perception. Fusing features from different scales is therefore key to improving the results.

In the DeepLabV3+ model, the high- and low-level feature layers are stitched together by concatenation (concat) [32], which increases the number of features describing the image (the number of channels). However, the information under each feature does not increase, and the low-level feature layers carry little semantic information and play little role in the final prediction [33]. In addition, when feature layers of different levels are combined, misalignment of feature locations may lead to a loss of detail when detecting small cracks [34].

In this study, we propose fusing the adaptive Canny detection map with the low-level feature layer after feature extraction and then performing the subsequent operations. The structure of this method is shown in Fig. 4.

The proposed feature fusion method fuses the adaptive Canny result map with the low-level feature layer of DeepLabV3+. This approach was chosen for three main reasons.

First, the Canny detection map is rich in detail and provides accurate position information. The low-level feature layer also has finer features than the high-level feature layer, which improves its ability to locate object positions and borders. Selecting the low-level feature layer for pixel-level feature fusion can therefore improve the accuracy of semantic segmentation.

Second, low-level feature layers are more sensitive to pixel-level details, such as texture and edges, which must be considered when fusing pixel-level features. Fusing with low-level feature layers therefore better preserves these details and improves segmentation accuracy.

Finally, compared with fusing Canny detection maps with high-level feature layers, fusing with low-level feature layers requires fewer convolution kernels, making the algorithm run faster.

In the structure diagram, adaptive Canny detection turns the original image into a Canny detection map. A 3×3 convolutional block is applied to this map for feature extraction, producing a low-level feature layer that contains more semantic information. This feature layer is then fused with the low-level feature layer output by the DCNN, using additive fusion at this point. Compared to concat, instead of increasing the number
of features (number of channels) of the image, additive fusion superimposes the feature layers element-wise. The amount of information under each feature describing the image therefore increases. This is equivalent to performing the addition in advance, and it raises the amount of semantic information in the otherwise weakly semantic low-level feature layers, which is beneficial for the final classification.

FIGURE 4. Feature fusion structure diagram. The blue box is the encoding part, the red box is the decoding part, and the highlighted area is the innovative method proposed in this paper.

IV. EXPERIMENTAL DETAILS
This section presents the experiments conducted on the proposed method. It covers the experimental environment, parameter settings, dataset selection, ablation experiments, comparative experiments with other semantic segmentation networks, and the experimental analysis.

In the experimental study, the CRACK500 and GAPS384 datasets were used to evaluate the performance of the proposed method. The adaptive Canny, DeepLabV3+, UNet, PSPNet, and ICNet methods for pixel-level surface defect detection were evaluated, and their performance was compared with that of the proposed method.

The experiments were conducted on an Intel Xeon(R) W-2145 CPU with an NVIDIA GeForce RTX 2080Ti GPU with 11 GB of graphics memory. The software environment consisted of Ubuntu 18.04, Python 3.6, PyTorch 1.10, and the associated Python libraries for neural networks. Information on the parameter settings of the model, as well as the numerical and visualization results, is provided below.

A. PARAMETER SETTINGS
This study implemented the proposed method using Anaconda, Python, and PyTorch. The network model is trained by supervised learning, and the corresponding details and parameter selections are as follows:
• Bilateral filter kernel size of 15, Gaussian spatial weight c of 8, and Gaussian pixel weight s of 15 (which determines the degree of similarity between pixels),
• Minimum connected pixel area of 1500,
• Batch size of 12,
• Initial learning rate of 0.001 and minimum learning rate of 0.00001,
• Number of training iterations of 100,
• Backbone: the Xception network,
• Total loss function: the sum of the pixel-wise cross-entropy loss and an auxiliary loss,
• Input image size of 512×512,
• Downsampling factor of 16,
• Optimizer: SGD with a momentum of 0.9 and a weight decay of 0.0001,
• Learning rate decay: cosine,
• Pre-trained weight files are used for training.
In addition, intersection over union (IoU), MIoU, Precision, Recall, F1-Score, Pixel Accuracy (PA), and MAE were used to evaluate the
performance of the proposed method. The mathematical expressions for these performance metrics are as follows:

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{10}$$

$$\mathrm{MIoU} = \frac{\mathrm{IoU}_1 + \cdots + \mathrm{IoU}_n}{n} \tag{11}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{13}$$

$$\text{F1-Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Recall} + \mathrm{Precision}} \tag{14}$$

$$\mathrm{PA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{16}$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. In addition, $n$ is the number of classes, $N$ is the number of pixels, and $y_i$ and $\hat{y}_i$ are the true and predicted values of pixel $i$.

This paper selects MIoU and MAE as the final performance indicators for evaluating the algorithm. MIoU computes the intersection over union between the predicted and actual regions for each class and then averages the results. Because it accounts for the similarity and overlap between predicted and actual values, it comprehensively reflects model performance. MAE, on the other hand, evaluates the model's prediction accuracy as the average difference between the predicted and true values. In binary semantic segmentation, MAE assesses the model's pixel-level classification accuracy and directly reflects the difference between the model prediction and the actual result. MIoU and MAE were selected because they evaluate the performance of a binary semantic segmentation algorithm from different aspects.

B. DATASETS
1) CRACK500
This study used the publicly available CRACK500 dataset of complex road cracks to evaluate the performance of the proposed method and compare it with other semantic segmentation networks. Fig. 5 and Table 2 present the CRACK500 dataset and the sample sizes used in this study. The images were collected on the main campus of Temple University in the USA and are approximately 2000 × 1500 pixels in size. Each road crack image has a pixel-level annotated label, making this dataset the largest publicly accessible pavement crack dataset with pixel-level annotation.

TABLE 2. CRACK500 details
Data set | Training set | Test set | Validation set
CRACK500 | 400 | 300 | 50

FIGURE 5. Sample images in CRACK500. The left is the original image, and the right is the labeled image.

2) GAPS384
This study evaluates the generalization and limitations of the proposed method by testing the model trained on CRACK500 on the publicly available road crack dataset GAPS384.

TABLE 3. GAPS384 details
Data set | Training set | Test set | Validation set
GAPS384 | 180 | 102 | 102

Figure 6 and Table 3 illustrate the GAPS384 dataset and the sample sizes used in the experimental study. GAPS384 is a road crack detection dataset maintained by the University of Duisburg-Essen in Germany. It mainly consists of high-definition photographs of asphalt concrete roads. The original dataset provides only image files, without any information about the location and shape of road cracks; Yang et al. [36] cropped the images to 540×440 and annotated them. The dataset is diverse, with various kinds of noise such as dark environments, oil stains, and zebra crossings, making it well suited for testing the network's generalization ability.

C. ABLATION EXPERIMENT RESULTS AND ANALYSIS
This paper thoroughly tested and analyzed the components of the proposed adaptive Canny method. Throughout the experiment, each module used the same parameters and
displayed the Canny edge detection image and the image after the morphological operation. Finally, the results of each method were examined on the CRACK500 dataset. The visualization results and values obtained are presented in Figure 7 and Table 4, respectively.

TABLE 4. Performance comparison of different models.
Method | Bilateral Filter | Four-way Sobel | Adaptive Bilateral Filter | Adaptive Canny | CRACK500 MIoU (%) | CRACK500 MAE
Method1 | ✗ | ✗ | ✗ | ✗ | 19.39 | 26.44
Method2 | ✓ | ✗ | ✗ | ✗ | 37.61 | 9.76
Method3 | ✓ | ✓ | ✗ | ✗ | 52.33 | 9.51
Method4 | ✓ | ✓ | ✓ | ✗ | 57.17 | 9.02
Method5 | ✓ | ✓ | ✓ | ✓ | 65.28 | 6.24

FIGURE 6. Sample images in GAPS384. The left is the original image, and the right is the labeled image.

FIGURE 8. Histogram of the test results of the ablation experiment on CRACK500.

FIGURE 7. Visualization results of the ablation experiment.

We compared and analyzed the road crack detection capabilities of the different methods based on the detection results in Table 4 and the performance comparison in Figure 8.

Method 1 exhibited poor anti-interference and edge feature extraction ability, as shown in Figure 8. The Canny image it generated contained a significant amount of noise, and the road crack edges were not clear enough. Additionally, after the morphological operation, the resulting map differed from the ground truth.

Method 2 improved on Method 1 by using bilateral filtering instead of Gaussian filtering, which significantly improved denoising and edge preservation. Although it eliminated much of the environmental noise, it failed to extract all the road crack edge details.

Method 3 enriched the crack edge information by adding the four-way Sobel operator to Method 2, but it also mistakenly extracted some environmental noise.

Method 4 further reduced interference by adding adaptive bilateral filtering to Method 3 and continuously adjusting the filtering parameters through feedback from the result map. However, the detected road crack edges were still not accurate enough.

Method 5 added adaptive Canny detection to Method 4. This enabled the algorithm to select the Canny thresholds adaptively according to the image gradient information, achieving better detection results.

The MIoU and MAE values were used to evaluate the detection results of each method on the CRACK500 dataset, as shown in Table 4. The MIoU value represents the overlap between the detected and actual areas. In contrast, the MAE
value represents the average per-pixel difference between the detected image and the actual image. Table 4 lists the algorithm modules of each method and the MIoU and MAE each achieved on the CRACK500 dataset. Figure 8 visually demonstrates the performance improvement obtained by adding each algorithm module. From Method 1 to Method 5, the MIoU value increased steadily and the MAE value decreased steadily. Compared with Method 1, Method 5 improved MIoU by approximately 46 percentage points (from 19.39% to 65.28%) and reduced MAE by approximately 20 (from 26.44 to 6.24). This experiment therefore shows that each new module improves the road crack detection ability of the algorithm and that each algorithm module proposed in this paper is effective and necessary.

D. COMPARATIVE EXPERIMENT RESULTS AND ANALYSIS
The CRACK500 dataset includes small surface defects that resemble the background, as well as images taken under varying lighting conditions, making it a challenging dataset for surface defect detection.

Fig. 10 shows the IoU values and visual results of each method on several examples from the CRACK500 dataset. It covers road crack scenes on different materials, including examples with larger background disturbances (rows 1, 6, and 7) and images with minor surface defects (row 5).

Because ICNet is a lightweight network, its IoU values in Fig. 10 were lower than those of the other methods, especially for the images in row 5, which contain more road crack details. The DeepLabV3+ network detected the road cracks in the example images in Fig. 10 better than the other baseline methods; however, tiny details were lost in the example images in rows 1, 2, and 7. Adaptive Canny also detected the road cracks in the example images well. However, noise interference was observed in rows 1, 6, and 7, which have strong background interference, and excessive detail was lost in row 5. In addition, the adaptive Canny method did not smooth the detected road crack edges sufficiently.

The proposed method, which fuses adaptive Canny with DeepLabV3+, achieved higher IoU values on the example images than the other methods, as shown in Fig. 10. Detection in rows 1, 6, and 7, where the background is more disturbed, is satisfactory: the background interference is removed and the main body of the road crack is detected. Detection is also better than that of the other methods for row 5, which contains a large area of road cracks with many details. On these examples, the IoU scores of the proposed method are higher than those of DeepLabV3+ and adaptive Canny individually, and also higher than those of the other networks.

The proposed method and the other semantic segmentation networks were each tested three times on the CRACK500 dataset, and the averages of the evaluation indicators are presented in Table 5. Figure 9 visually compares the performance of each algorithm, allowing a more intuitive comparison of their strengths and weaknesses in detecting road cracks.

TABLE 5. Test results of the proposed method and other methods on CRACK500
Model | MIoU (%) | MAE | PA (%) | F1-Score (%)
PSPNet | 62.68 | 2.81 | 93.55 | 50
ICNet | 56.15 | 2.77 | 91.82 | 39
UNet | 67.76 | 3.01 | 95.36 | 53
DeepLabV3+ | 71.56 | 2.23 | 97.12 | 56
OpenCV | 65.28 | 6.24 | 94.07 | 49
Ours | 77.64 | 1.55 | 97.38 | 63

FIGURE 9. Histogram of comparative test results of each algorithm on CRACK500.

Because MIoU and MAE measure the performance of binary semantic segmentation algorithms from different aspects, this paper selects them as the final evaluation indices.

The fusion method proposed in this paper obtained the best scores, with an MIoU of 77.64% and an MAE of 1.55, by integrating the adaptive Canny and DeepLabV3+ methods. The adaptive Canny method accurately detects small road cracks and provides this information as a prior to the DeepLabV3+ network. The DeepLabV3+ method, in turn, smooths the road crack edges and makes up for the parts not detected by Canny.

The DeepLabV3+ method scored second highest, with an MIoU of 71.56% and an MAE of 2.23. Its success can be attributed to its encoder-decoder design, which allows the resolution of the encoder's extracted features to be controlled. It also balances accuracy and time consumption through atrous convolution and enlarges the receptive field without losing information, so that each convolution output contains a wide range of information.

Adaptive Canny (listed as OpenCV in Table 5) ranked fourth among all methods in MIoU and performed poorly on MAE. This is mainly because falsely detected pixels in non-crack areas increased its MAE.
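The MAE point above, that falsely detected non-crack pixels drive up adaptive Canny's MAE, can be made concrete with a small NumPy sketch of Eqs. (10), (11), and (16) on toy binary masks. Several details are illustrative assumptions rather than the paper's evaluation code: the masks themselves, the two-class MIoU, the convention for an empty class, and the 0 to 1 MAE scale. The paper does not state the scale at which its MAE values are reported, so the absolute numbers here are not comparable with Table 5. In the example, two predictions both find every crack pixel, but one also flags ten background pixels:

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    """Eq. (10) for one class: TP / (TP + FP + FN)."""
    p, g = pred == cls, gt == cls
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    denom = tp + fp + fn
    # Assumption: a class absent from both maps scores 1.0.
    return float(tp / denom) if denom else 1.0


def miou(pred: np.ndarray, gt: np.ndarray, n_classes: int = 2) -> float:
    """Eq. (11): mean IoU over classes (assumed 0 = background, 1 = crack)."""
    return float(np.mean([iou(pred, gt, c) for c in range(n_classes)]))


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Eq. (16): mean absolute per-pixel difference (assumed 0-1 scale)."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))


# Toy 10x10 label with a one-pixel-wide vertical crack in column 4.
gt = np.zeros((10, 10), dtype=np.uint8)
gt[:, 4] = 1

clean = gt.copy()      # finds every crack pixel and nothing else
noisy = gt.copy()      # also finds every crack pixel ...
noisy[0:5, 8] = 1      # ... plus 5 false edges in the background
noisy[5:10, 0] = 1     # ... and 5 more (10 non-crack pixels in total)

for name, pred in [("clean", clean), ("noisy", noisy)]:
    print(f"{name}: crack IoU={iou(pred, gt, 1):.3f}, "
          f"MIoU={miou(pred, gt):.3f}, MAE={mae(pred, gt):.3f}")
# clean: crack IoU=1.000, MIoU=1.000, MAE=0.000
# noisy: crack IoU=0.500, MIoU=0.694, MAE=0.100
```

Both predictions have perfect crack recall. Only the spurious background pixels differ, and they alone halve the crack IoU and raise MAE from 0 to 0.1.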
FIGURE 10. Visualisation results from the CRACK500 dataset.

Therefore, Figure 9 and Table 5 suggest that the proposed method achieves better detection results on the CRACK500 dataset than the adaptive Canny, DeepLabV3+, PSPNet, ICNet, and UNet networks.

With the adaptive Canny method alone, it is difficult to cope with complex and changing environments: the method is easily disturbed by noise, which leads to false detections in the final detection map. With DeepLabV3+ alone, some details are lost when the high-level and low-level feature layers are fused and concatenated, which is unfavorable to the final detection result.

The proposed method combines the advantages of both. The combined detection overcomes the interference of complex environmental noise and accurately detects complex cracks with many details, so its final results are better than those of the other methods compared.
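The combination credited here rests on the feature fusion step shown in Fig. 4. There, the adaptive Canny map passes through a 3×3 convolution block and is added to the DeepLabV3+ low-level features rather than concatenated with them. The PyTorch sketch below shows one way such a decoder could be wired. It is not the authors' code, and the class and helper names are hypothetical. The following details are assumptions about points the text leaves open:
- The channel widths (256 for the backbone's low-level output and for the ASPP output, 48 after the 1×1 projection) follow common DeepLabV3+ implementations rather than values stated in the paper.
- The Canny map is resized bilinearly to the low-level feature grid.
- The Canny features are added after the 1×1 projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(cin: int, cout: int, k: int) -> nn.Sequential:
    """k x k convolution (same padding) + BatchNorm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


def sep_conv(cin: int, cout: int) -> nn.Sequential:
    """Depthwise 3x3 conv, then pointwise 1x1 conv, BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )


class CannyAddFusionDecoder(nn.Module):
    """Hypothetical DeepLabV3+-style decoder with additive Canny fusion.

    Assumption: channel widths follow common DeepLabV3+ code, not the paper.
    """

    def __init__(self, low_ch=256, high_ch=256, proj_ch=48, n_classes=2):
        super().__init__()
        self.low_proj = conv_bn_relu(low_ch, proj_ch, 1)  # 1x1 channel compression
        self.canny_conv = conv_bn_relu(1, proj_ch, 3)     # 3x3 block on the Canny map
        self.refine = nn.Sequential(sep_conv(high_ch + proj_ch, 256), sep_conv(256, 256))
        self.classifier = nn.Conv2d(256, n_classes, 1)

    def forward(self, low, high, canny):
        size = low.shape[-2:]
        # Assumption: resize the full-resolution Canny map to the low-level grid.
        canny = F.interpolate(canny, size=size, mode="bilinear", align_corners=False)
        # Additive fusion keeps proj_ch channels (concat would double them).
        low = self.low_proj(low) + self.canny_conv(canny)
        # Upsample the encoder output (x4 when low is 1/4 and high is 1/16 scale).
        high = F.interpolate(high, size=size, mode="bilinear", align_corners=False)
        x = self.refine(torch.cat([high, low], dim=1))
        x = self.classifier(x)
        # Final x4 upsampling back to the input resolution.
        return F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)


decoder = CannyAddFusionDecoder().eval()
low = torch.randn(2, 256, 128, 128)   # backbone features at 1/4 of a 512x512 input
high = torch.randn(2, 256, 32, 32)    # ASPP output at 1/16 (downsampling factor 16)
canny = torch.rand(2, 1, 512, 512)    # adaptive Canny map at input resolution
with torch.no_grad():
    print(decoder(low, high, canny).shape)  # torch.Size([2, 2, 512, 512])
```

Because the fused tensor keeps only `proj_ch` channels, the Canny branch adds a single small 3×3 convolution. This matches the paper's argument that fusing at the low-level layer needs fewer convolution kernels than fusing at the high-level layer.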
We evaluated the generalization capability and limitations of the proposed method by training it on the CRACK500 dataset and then applying it to a different dataset, GAPS384. The results are presented in Figure 11 and Table 6, which include both visual results and numerical values.

FIGURE 11. The trained model tested on GAPS384.

TABLE 6. The trained model tested on GAPS384
Model | MIoU | MAE | PA (%) | F1-Score (%)
Ours | 51.29 | 3.58 | 88.52 | 46

Based on the results presented in Figure 11 and Table 6, the proposed method demonstrates a certain degree of generalization ability across datasets. It can reliably detect road cracks even after
scene changes. However, the method has some limitations, specifically on the road crack images in the GAPS384 dataset, which come from asphalt pavement. Several factors make the method unsuitable for detecting road cracks in such cases: the small contrast between the cracks and the road surface, low light, and interference from road markings such as zebra crossings. Its detection ability is correspondingly weak, as shown in rows 2 and 3 of Figure 11. In these images, the information the method extracts from the road cracks is insufficient, and the cracks cannot be detected successfully.

V. CONCLUSION
This study proposed a feature fusion-based adaptive Canny and semantic segmentation network for road crack detection. Building on improved Canny edge detection, the method first passes the Canny crack detection map through a convolutional feature extraction module. It then fuses the result with the low-level feature layer of the DeepLabV3+ detection network. Finally, it concatenates the fused features with the high-level feature layer to obtain the result map.

The experimental results on the CRACK500 dataset demonstrate the effectiveness of the proposed fusion method, which achieves an MIoU of 77.64% and an MAE of 1.55 and outperforms the other semantic segmentation algorithms compared. The method reliably eliminates interference and detects a more comprehensive range of crack details. However, it still has limitations: its detection performance deteriorates in dark environments or when interfering objects are present. Future research is therefore needed to optimize the algorithm further, enhance its detection accuracy in such scenarios, and improve its generalization ability.

REFERENCES
[1] Y. Pan, G. Zhang, and L. Zhang, "A spatial-channel hierarchical deep learning network for pixel-level automated crack detection," Autom. Constr., vol. 119, p. 103357, 2020. DOI:10.1016/[Link].2020.103357.
[2] N. Safaei, et al., "Gasoline prices and their relationship to the number of fatal crashes on US roads," Transp. Eng., vol. 4,
[7] H. Xu, et al., "Pavement crack detection based on OpenCV and improved Canny operator," Comput. Eng. Des., vol. 35, pp. 4254-4255, 2014.
[8] F. Zhao, et al., "Application of improved Canny operator in road crack detection," Electron. Meas. Technol., vol. 41, no. 20, pp. 107-111, 2018. DOI:10.19651/[Link].1801773.
[9] S. G. Monicka, D. Manimegalai, and M. Karthikeyan, "Detection of microcracks in silicon solar cells using Otsu-Canny edge detection algorithm," Renew. Energy Focus, vol. 43, pp. 183-190, 2022.
[10] M. Huang, Y. Liu, and Y. Yang, "Edge detection of ore and rock on the surface of explosion pile based on improved Canny operator," Alex. Eng. J., vol. 61, no. 12, pp. 10769-10777, 2022. DOI:10.1016/[Link].2022.04.019.
[11] A. Akagic, et al., "Pavement crack detection using Otsu thresholding for image segmentation," in Proc. 41st Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), IEEE, 2018, pp. 1092-1097.
[12] A. Zhang, et al., "Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network," Comput. Aid. Civ. Infrastruct. Eng., vol. 32, no. 10, pp. 805-819, 2017. DOI:10.1111/mice.12297.
[13] A. Zhang, et al., "Deep learning–based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet," J. Comput. Civ. Eng., vol. 32, no. 5, p. 04018041, 2018. DOI:10.1061/(ASCE)CP.1943-5487.0000775.
[14] Y. Fei, et al., "Pixel-level cracking detection on 3D asphalt pavement images through deep-learning-based CrackNet-V," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 1, pp. 273-284, 2019. DOI:10.1109/TITS.2019.2891167.
[15] A. Ji, X. Xue, Y. Wang, et al., "An integrated approach to automatic pixel-level crack detection and quantification of asphalt pavement," Autom. Constr., vol. 114, p. 103176, 2020.
[16] X. Yang, H. Li, Y. Yu, et al., "Automatic pixel-level crack detection and measurement using fully convolutional network," Comput. Aid. Civ. Infrastruct. Eng., vol. 33, no. 12, pp. 1090-1109, 2018.
[17] Y. Liu, J. Yao, X. Lu, et al., "DeepCrack: A deep hierarchical feature learning architecture for crack segmentation," Neurocomputing, vol. 338, pp. 139-153, 2019.
[18] H. Üzen, M. Turkoglu, M. Aslan, et al., "Depth-wise squeeze and excitation block-based Efficient-Unet model for surface defect detection," Vis. Comput., pp. 1-20, 2022.
[19] X. Sun, Y. Xie, L. Jiang, et al., "DMA-Net: DeepLab with multi-scale attention for pavement crack segmentation," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 18392-18403, 2022.
[20] Z. Liu, X. Li, J. Li, et al., "A new approach to automatically calibrate and detect building cracks," Buildings, vol. 12, no. 8, p. 1081, 2022.
[21] M Cai, X Yi, G Wang, et al., Image Segmentation Method
p. 100053, 2021. DOI:10.1016/[Link].2021.100053. for Sweetgum Leaf Spots Based on an Improved DeeplabV3+
[3] B. Safaei, et al., “Studying the risks and factors contributing Network[J]. Forests, 2022, 13(12): 2095.
to motorcycle crashes, and prioritizing strategies to reduce [22] C. Saravanan, “C. Saravanan color image to grayscale im-
fatalities, and improve community health,” no. Febr., 2021. age conversion, computer engineering and applications (IC-
[4] C Wang, Y Li, and Y Qi. Comparison Research of Capa- CEA)” Second International Conference on IEEE, 2010.
bility of Several Edge Detection Operators[C]//International DOI:10.1109/ICCEA.2010.192.
Research Association of Information and Computer Sci- [23] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, Bilateral
[Link] of International Conference on Industrial Filtering: Theory and Applications, Now, 2009.
Technology and Management Science(ITMS 2015).Atlantis [24] S. H. Li, X. Liu, W. Fang, D. Y. Zhang, H. Q. Fei, C. Li, and
Press,2015:796-799. Z. Qin, “Detection of tiny leakage points based on bilateral
[5] L C Chen, Y Zhu, G Papandreou, et al. Encoder-decoder filtering and frame difference variance method,” Sci. Technol.
with atrous separable convolution for semantic image seg- Eng., vol. 22, no. 25, pp. 11084-11090, 2022.
mentation[C]//Proceedings of the European conference on [25] Q. Wang, et al., “Anomaly detection in periodic motion
computer vision (ECCV). 2018: 801-818. scenes based on multi-scale feature Gaussian weighting anal-
[6] Z Zhu, P Zhu, J Zeng, et al. A Surface Fatal Defect De- ysis,” Meas. Sci. Technol., vol. 30, no. 5, pp. 2-6, 2019.
tection Method for Magnetic Tiles based on Semantic Seg- DOI:10.1088/1361-6501/ab0479.
mentation and Object Detection: IEEE ITAIC (ISSN: 2693- [26] H. Zhu, L. Lin, D. Q. Chen, and J. Chen, “A PCB image
2865)[C]//2022 IEEE 10th Joint International Information localization correction method based on multi-directional
Technology and Artificial Intelligence Conference (ITAIC). improved Sobel operator,” J. Electron. Meas. Instrum., vol.
IEEE, 2022, 10: 2580-2586. 33, no. 09, pp. 121-128, 2019. DOI:10.13382/[Link].B1902204.


[27] H. K. Xu, Y. Y. Qin, and H. R. Chen, “An improved Canny-based edge detection algorithm,” Infrared Technol., vol. 36, no. 03, pp. 210-214, 2014.
[28] Z. Wang and S. X. He, “An adaptive edge detection method based on Canny’s theory,” Chin. J. Graph., vol. 9, no. 08, pp. 65-70, 2004.
[29] B. Feng, J. Wang, and J. Sun, “Threshold segmentation of Otsu images based on quantum particle swarm algorithm,” Comput. Eng. Des., vol. 29, no. 13, pp. 3429-3431, 2008.
[30] S. Zhang, L. X. Yang, and L. Ding, “Research on V-shaped weld seam feature extraction based on adaptive bilateral filtering,” Manuf. Technol. Mach. Tool, vol. 07, pp. 125-129, 2021. DOI:10.19287/[Link].1005-2402.2021.07.024.
[31] Z. Y. Cheng, “An efficient adaptive bilateral filtering method,” Digit. Technol. Appl., vol. 37, no. 10, pp. 121-123, 2019. DOI:10.19695/[Link].cn12-1369.2019.10.68.
[32] S. Bell, et al., “Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2874-2883.
[33] J. Fan, et al., “Multi-scale feature fusion: Learning better semantic segmentation for road pothole detection,” in Proc. IEEE International Conference on Autonomous Systems (ICAS), IEEE, 2021, pp. 1-5.
[34] H. Üzen, et al., “Depth-wise Squeeze and Excitation Block-based Efficient-Unet model for surface defect detection,” Vis. Comput., pp. 1-20, 2022. DOI:10.1007/s00371-022-02442-0.
[35] M. Eisenbach, R. Stricker, D. Seichter, et al., “How to get pavement distress detection ready for deep learning? A systematic approach,” in Proc. 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 2039-2047.
[36] F. Yang, L. Zhang, S. Yu, et al., “Feature pyramid and hierarchical boosting network for pavement crack detection,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 4, pp. 1525-1535, 2019.

JIE LUO received the B.S. degree in automation in 2006 and the M.S. degree in control science and engineering in 2010 from the School of Automation, and the Ph.D. degree in information and communication engineering in 2013 from the School of Information Engineering, Wuhan University of Technology, Wuhan, China. He is currently an Associate Professor with the School of Automation, Wuhan University of Technology. His current research interests include machine learning and intelligent vehicles.

HUAZHI LIN received the B.E. degree in automation from Chang’an University, Xi’an, China, in 2021. He is a postgraduate student at the School of Automation, Wuhan University of Technology, Wuhan, China. His research interests include computer vision and image processing.

XIAOXU WEI received the B.S. degree in automation in 2011 and the M.S. degree in control science and engineering in 2014 from Wuhan University of Technology, Wuhan, China. She is currently pursuing the doctoral degree in automotive electronics at the School of Automotive Engineering, Wuhan University of Technology, Wuhan, China. Her research interests include robotic machining and machine learning.

YONGSHENG WANG received the B.E. degree in automation from Wuhan University of Technology, Wuhan, China, in 2011 and the M.S. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology, Wuhan, China, in 2014. He is currently a Lecturer with the School of Information Engineering, Wuhan University of Technology, Wuhan, China. His research interests include autonomous vehicles and perception.
