Feature Fusion for Road Crack Detection
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3279888
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2022.0122113
ABSTRACT Road cracks, which are a common hazard in pavements throughout the life cycle
of a road, can degrade the performance of the road, shorten its service life, and endanger the
safety of vehicles. Traditional machine vision detection methods can detect road crack details but
suffer from poor stability and generalization ability, whereas semantic segmentation detection,
although more stable, cannot capture fine road crack information. To combine the advantages of
both methods and improve the accuracy of road crack detection, a novel feature fusion road crack
detection method is proposed in this study. First, the bilateral filter and four-way Sobel operator
are introduced into the Canny algorithm to enhance the noise reduction effect and extract edge
features more effectively. Second, the dynamic threshold is generated adaptively by the gradient
information after non-maximum suppression. Subsequently, the detection map is morphologically
processed, the connected areas are ranked, and the bilateral filter parameters are adjusted based on
the detection results. The Canny road crack detection map is then extracted by the convolutional
feature extraction module, fused with the low feature layer in the DeepLabV3+ detection network,
and finally stitched with the high feature layer; the resulting map is obtained after convolutional
feature extraction. The method was validated on the publicly available complex road crack dataset
CRACK500; the experimental results showed that feature fusion outperformed the adaptive Canny,
DeepLabV3+, U-Net, PSPNet, and ICNet algorithms by more than 6.5% in Mean Intersection
over Union (MIoU) and also improved the Mean Absolute Error (MAE) by effectively combining
crack features and improving the detection accuracy.
INDEX TERMS adaptive Canny, feature fusion, road cracks, semantic segmentation
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see [Link]
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3279888
J. LUO et al.: Adaptive Canny and semantic segmentation networks based on feature fusion for road crack detection
of computers renders the road crack detection process less susceptible to complex and changing environments, and the detection results are more standardized and accurate. Currently, two mainstream machine detection methods are road crack detection based on traditional vision and road crack detection based on semantic segmentation.

In the traditional visual inspection method, image processing technology is employed to preprocess images and reduce noise interference. Subsequently, a feature operator is used to extract the characteristic information of the road crack, enabling differentiation between the cracked and non-cracked regions. However, traditional image processing suffers from several limitations, including the need to set multiple parameters, lack of self-learning ability, vulnerability to environmental noise, poor adaptability to changes in the detection environment, and limited generalization ability. In the semantic segmentation detection method, a convolutional neural network is trained to identify road crack features and classify each image pixel. While this approach is more stable, some issues arise in actual road crack scenarios, such as pixel ratio disparities between the road crack and background. As a result, the semantic segmentation detection method may overlook crucial details in the road crack.

To address the issues of excessive manual parameter adjustment and poor stability in traditional visual inspection methods, and the loss of detail associated with semantic segmentation detection, this paper proposes a road crack detection method based on feature fusion, adaptive Canny, and DeepLabv3+. In traditional image processing, the four-directional Sobel operator is used instead of the original two-directional Sobel operator. Bilateral filtering replaces the original Gaussian filtering to enhance feature extraction and noise reduction. Adaptive bilateral filtering is obtained through feedback from the results, and adaptive Canny thresholds are obtained from the image gradient information, improving the automation level of detecting complex road cracks. The adaptive Canny detection image is fused with the low-level features of the DeepLabV3+ network through a feature extraction module to solve the problem of losing some details caused by the combination of high and low-level features, thereby improving the accuracy of road crack detection.

This paper selects the Canny edge detection operator as the traditional image processing method. There are many edge detection operators, and Wang et al. [4] compare five commonly used operators and come to the following conclusions: the Roberts operator has good localization but detects fewer edge details; the edge detection effect of Sobel is similar to Prewitt, with clear contours, but wider edges; the Kirsch operator detects blurry edges and has low localization accuracy. The Canny operator uses a Gaussian function to smooth the image before detecting the edges. It is not easily disturbed by noise and can detect true weak edges. Two different thresholds are used to detect strong and weak edges. When weak edges are connected to strong edges, the weak edges are included in the output image. The detected edges are more complete, better connected, and have higher localization accuracy. Therefore, the Canny operator is chosen as the traditional image processing method for its excellent edge detection ability and suitability for crack detection.

This paper selects DeepLabv3+ as the semantic segmentation network. DeepLabv3+ [5] is an extension of DeepLabv3 and is considered a new pinnacle in semantic segmentation by the academic community. DeepLabv3+ combines the advantages of the spatial pyramid pooling model and the encoder-decoder structure. It adds a simple and effective decoder module to refine the segmentation results, especially along the object boundaries. The authors further explore the application of depthwise separable convolution to the atrous spatial pyramid pooling and decoder module, resulting in a faster and stronger encoder-decoder network. The DeepLabv3+ network has an advanced architecture, excellent segmentation accuracy, and enormous potential to handle surface feature segmentation tasks in crack scenarios [6]. Therefore, DeepLabv3+ is chosen as the semantic segmentation part; many researchers use DeepLabv3+ for crack detection, which will be discussed later in this paper.

The primary contributions of this article are as follows:
• An adaptive Canny dual-threshold selection method is proposed in this paper. The method generates a gradient histogram by traversing the image pixels. It calculates the gradient region most likely to be the road crack edge based on the gradient variance and sets high and low thresholds accordingly. Experimental results show that the adaptive Canny threshold set according to the gradient histogram can effectively detect road crack edges and solve the problems of inaccurate and inefficient manual threshold setting.
• An adaptive bilateral filtering method is proposed in this paper. The method adjusts the filtering parameters through feedback from the detection results. First, the number and area of connected regions in the generated road crack edge image are detected. If there are multiple small connected regions, it indicates that the noise interference is too large and the filtering strength needs to be increased. If the connected regions are few and small, it indicates that some road crack information is lost, and the filtering strength needs to be reduced.
• A supervised learning method for feature fusion of adaptive Canny detection results and DeepLabV3+ network is proposed in this paper. The Canny result image is fused with the low-level feature layer in the DeepLabV3+ network, increasing the road crack position and details and compensating for the loss of details
when the high and low-level feature layers are fused. These details help detect small cracks. The proposed fusion method is tested on the CRACK500 complex road crack dataset, which is widely used in the literature. The experimental results show that this method is superior to the pre-fusion method in terms of MIoU and MAE values and has better results than previous semantic segmentation networks.

In the following sections, Section II introduces related research on road crack detection; Section III presents the method proposed in this study; Section IV presents the implementation details, experimental results, and analysis; and Section V provides the conclusions.

II. RELATED WORK
This section introduces research on road crack detection and the advantages and disadvantages of each approach. Road crack detection based on computer vision includes traditional visual and semantic segmentation detection.

A. ROAD CRACK DETECTION BASED ON TRADITIONAL VISION
The following studies apply traditional vision to road crack detection. Huan et al. [7] proposed an improved Canny operator for pavement road crack detection. They used morphological filtering to replace the original Gaussian filtering and Otsu's maximum inter-class variance threshold segmentation algorithm to achieve adaptive determination of dual thresholds. The experimental results show that the improved algorithm demonstrates improved crack detection speed and accuracy, but only for relatively simple crack scenes with little noise interference, and performs poorly in more complex crack environments. Zhao et al. [8] further improved the Canny operator and applied it to crack detection. They replaced the original Gaussian filter with bilateral filtering, used the Otsu algorithm to adaptively obtain a double threshold, and used morphological filtering to eliminate small voids and fill cracks in the contour lines. The improved method enhances the accuracy of detection but is still ineffective in detecting cracks in complex environments and has weak generalization capability; it is thus unable to meet the needs of simultaneous multi-scene crack detection. Monicka et al. [9] proposed an improved Canny operator for noise removal that incorporates adaptive threshold selection, morphological operations, image segmentation, and image binarization. The experimental results show that the improved algorithm is more accurate than traditional edge detection algorithms in detecting cracks. However, the problem of too many manually adjusted parameters persists, which affects detection efficiency. To improve the Canny operator, Meiling et al. [10] first used statistical filtering to solve the image edge blurring problem and then calculated the gradient amplitude and direction of the crack image based on the 3 × 3 gradient template of the operator. Finally, the natural iteration method was used to obtain the image threshold, and discontinuous edges were repaired using the morphological closure operation and thinning operators to obtain the final detected edges. Experiments showed that this improved algorithm maintains the integrity of the detected edges and can reduce the interference of granular and flaky noise. Akagic et al. [11] proposed using Otsu-thresholded image segmentation for pavement crack detection. First, the method divides the input image into four independent sub-images of equal sizes. Then, cracks are searched based on the ratio between the Otsu threshold and the maximum histogram value of each sub-image, and all segmented sub-images are merged. This method performs better with low signal-to-noise ratios but is unsuitable for complex crack environments.

B. ROAD CRACK DETECTION BASED ON SEMANTIC SEGMENTATION
Semantic segmentation, in this study's context, involves classifying each image pixel for road crack detection. Zhang et al. [12] developed CrackNet based on a convolutional neural network to detect cracks at the pixel level; CrackNet does not have any pooling layers, and its feature map size is constant for all layers. Subsequently, in 2018, Zhang et al. [13] proposed the improved CrackNet II. Compared with CrackNet, CrackNet II sets trainable parameters in all hidden layers to further improve crack detection performance. Fei et al. [14] proposed CrackNet-V on top of CrackNet, which significantly improves the crack detection rate. The above series of CrackNet models can maintain spatial resolution and avoid the loss of spatial information during downsampling. In addition, encoder-decoder architectures have been widely used for semantic segmentation. Among the various encoder-decoder architectures for road crack detection, DeepLabv3+ is a widely used technique. Using their own dataset, Ji et al. [15] were the first to apply DeepLabv3+ to road crack detection. The effectiveness of DeepLabv3+ was verified for single, multiple, intersecting, and crocodile cracks, and its performance was superior to that of FCN [16] and DeepCrack [17]. However, DeepLabv3+ suffers from poor performance in crack details, which could be attributed to the suppression of some crucial information in the high feature layer by the low feature layer during fusion [18]. Attention mechanisms have been used in some studies to address this issue. Sun et al. [19] proposed a multi-scale attention module in the decoder of DeepLabv3+ to generate attention masks and dynamically allocate weights between high-level and low-level feature maps. Dynamic weighting can assign more appropriate weights to different feature maps than fixed weights. The network exhibited significantly improved segmentation performance and performed well on the road crack dataset. Liu et al. [20] improved DeepLabv3+
by using ResNet50 as the backbone network to enhance its feature extraction capability. While this improved method showed higher detection accuracy than other common pixel-level segmentation algorithms, it required more time because of the additional network layers. Cai et al. [21] utilized MobileNetV2 as the backbone feature extraction network, substantially reducing the model's computational load and increasing the calculation speed. Furthermore, they introduced the attention mechanism module into the backbone feature extraction network and decoder, further optimizing the model's edge recognition effect and segmentation accuracy.

TABLE 1. SUMMARY OF WORK RELATED TO ROAD CRACK DETECTION
Ref. | Research methods | Advantages | Disadvantages
[7] | Adaptive selection of Canny double thresholds using the Otsu algorithm | Avoids tedious manual parameter adjustment | Does not solve the interference of road environmental noise
[8] | Uses the Otsu algorithm to obtain the threshold and introduces morphological filtering | Improved detection accuracy | Susceptible to environmental interference; poor detection effect on complex cracks
[9] | Adds adaptive threshold selection, morphological operations, and image segmentation | Better ability to remove noise | Many parameters need to be set manually
[10] | Solves blurred edges with statistical filtering | Preserves edge integrity and reduces grain and flaky noise interference | -
[11] | Uses Otsu to segment images, detect them, and merge them | Better performance at low signal-to-noise ratios | Not suitable for complex crack environments
[12] | Semantic segmentation network developed based on CNN | Maintains spatial resolution to avoid loss of spatial information during downsampling | Detection speed is not fast and accuracy is not high
[13] | Trainable parameters are set in all hidden layers | Improved performance for detecting cracks | -
[14] | Compared to the original version of CrackNet, CrackNet-V has a deeper architecture but fewer parameters | Improved detection accuracy and computational efficiency | -
[15] | A semantic segmentation network based on encoder-decoder architecture | Clear boundary extraction | Loss of crack details during feature layer fusion
[19] | Introduces multi-scale attention modules into the decoder | Enhances the ability to detect details | Insufficient generalization ability
[20] | Selects ResNet50 as the backbone network for improving DeepLabv3+ | Enhanced feature extraction capability | Slows down operation speed
[21] | Uses MobileNetV2 as the backbone feature extraction network and introduces an attention mechanism | Enhances edge extraction capability | -

Table 1 provides a summary of related work. The road crack detection method based on traditional vision can detect the edges of road cracks and even minor details in environments with sufficient lighting and minimal noise interference. Its detection accuracy is superior to that of semantic segmentation. However, this approach is highly susceptible to environmental factors such as pollutants, lighting, and road signs, which can lead to inaccurate detection results. Moreover, the manual parameter setting is laborious, and the method has weak generalization ability and low efficiency. On the other hand, the road crack detection method based on semantic segmentation is less prone to environmental disturbances, and it exhibits strong generalization ability and high robustness. However, it needs improvement to detect crack details effectively.

III. METHODS
This section presents the proposed method, which comprises three parts: the adaptive Canny method, the DeepLabv3+ method, and the fusion of the two.

This study proposes feature fusion-based adaptive Canny and DeepLabV3+ for pixel-level crack detection. The flowchart is shown in Fig. 1. The proposed method is divided into two parts: (1) adaptive Canny for crack detection in the original image and (2) adaptive Canny detection with DeepLabV3+ network fusion for crack detection in the resulting image. In the first part, the original incoming image is first preprocessed by grayscaling and bilateral filtering to reduce ambient noise interference. Second, the gradient operator is used to calculate the gradient magnitude and direction of the pixel points of the entire image, whereby a suitable threshold is generated based on the gradient map to complete edge detection. Finally, morphological operations are used to refine the edge-detected image. In the second part, the original image is fed into the DeepLabV3+ network for detection and divided into two paths after the deep convolutional neural network (DCNN). One path consists of the low feature layer, which is to be fused with the detection result map of the first part after the convolutional feature extraction module; the other path is passed through the enhanced feature extraction network known as the atrous spatial pyramid pooling network for information extraction to generate the high feature layer. Subsequently, the high and low feature layers are stacked, followed by two depthwise separable convolution blocks, and then upsampled to the size of the input image to complete the final prediction.

A. ADAPTIVE CANNY DETECTION
1) Image pre-processing
The publicly available complex road dataset CRACK500 covers a wide range of cracks caused by factors such as road material and contaminants. To address the complexity of the pavement images, the crack images must
be preprocessed before they are fed into the inspection system, as outlined in the following steps:

FIGURE 1. Block diagram of the proposed crack detection method. The left half of the diagram shows the adaptive Canny detection process; the right half shows the fusion of the resulting image with the DeepLabV3+ network structure for further detection.

(1) Grayscale. The acquired dataset images are in color. To reduce memory consumption and increase the processing speed, they must be converted into 8-bit grayscale images, which reduces the amount of data for processing road images [22].

(2) Bilateral filtering [23]. The grayscale image still contains several environmental distractors affecting the subsequent edge feature extraction; therefore, filtering is required. The original Canny algorithm applies Gaussian filtering to the image. Gaussian filtering is a weighted average of neighboring pixels in the domain and can be used to remove high-frequency noise. However, the edges are also high-frequency signals, and some edge information of the cracks is lost while eliminating interference noise, which affects the subsequent extraction of fine crack features. This study used bilateral filtering to replace the original Gaussian filter. Bilateral filtering is a nonlinear filter that can maintain edges and reduce noise smoothly and is thus ideal for crack pictures of complex pavements. Bilateral filtering uses a weighted average method [24], in which the intensity of a pixel is represented by the weighted average of the luminance values of the surrounding pixels; this weighted average is based on a Gaussian distribution. Most importantly, the weights of the bilateral filter consider not only the Euclidean distance of the pixel but also the radiometric differences in the pixel range domain, including the degree of similarity between the pixel in the convolution kernel and the central pixel, color intensity, and depth distance, among others. Both weights are considered when calculating the central pixel, and the formulae are shown in (1) and (2).

$$I^{bf}_{p} = \frac{1}{W^{bf}_{p}} \sum_{q \in S} G_{\delta_s}(p - q)\, G_{\delta_r}(|I_p - I_q|)\, I_q \tag{1}$$

$$W^{bf}_{p} = \sum_{q \in S} G_{\delta_s}(p - q)\, G_{\delta_r}(|I_p - I_q|) \tag{2}$$

where $I_p$ and $I_q$ are the intensities of the input image at points $p$ and $q$; $p$ and $q$ are any two points on the image; $S$ is the neighborhood region centered on $p$; $I^{bf}_{p}$ is the filtered image; $W^{bf}_{p}$ is the normalization factor; $\delta_s$ is the spatial domain standard deviation; $\delta_r$ is the value domain standard deviation; $G_{\delta_s}(p - q)$ is the spatial weight and $G_{\delta_r}(|I_p - I_q|)$ is the pixel range weight. For the points to be filtered, the weight of a pixel point in its neighborhood is related to the distance and similarity between the two pixel points. Combining the spatial distance and degree of similarity, bilateral filtering can be achieved using the formulae shown in (3) and (4).

$$h(x) = k^{-1}(x) \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} f(\varepsilon)\, c(\varepsilon, x)\, s\big(f(\varepsilon), f(x)\big)\, d\varepsilon \tag{3}$$

$$k(x) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} c(\varepsilon, x)\, s\big(f(\varepsilon), f(x)\big)\, d\varepsilon \tag{4}$$
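As an illustration of Eqs. (1) and (2), the following is a minimal pure-Python sketch of a bilateral filter on a small grayscale image stored as a list of rows. It is not the authors' implementation: the function names, the square window of half-width `radius`, and the toy image are assumptions made for the example, with `sigma_s` and `sigma_r` playing the roles of δs and δr.

```python
import math


def gaussian(x, sigma):
    """Unnormalised Gaussian weight exp(-x^2 / (2 * sigma^2))."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def bilateral_filter(img, radius, sigma_s, sigma_r):
    """Filter a 2-D grayscale image (list of rows) following Eqs. (1)-(2).

    radius  : half-width of the square window S (assumed window shape)
    sigma_s : spatial-domain standard deviation (delta_s in the text)
    sigma_r : value-domain standard deviation (delta_r in the text)
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = 0.0   # weighted intensity sum of Eq. (1)
            norm = 0.0  # normalisation factor W_p of Eq. (2)
            # The window is clipped at the image border.
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    spatial = gaussian(math.hypot(i - x, j - y), sigma_s)
                    similarity = gaussian(img[j][i] - img[y][x], sigma_r)
                    weight = spatial * similarity
                    num += weight * img[j][i]
                    norm += weight
            # norm >= 1 because the centre pixel always has weight 1.
            out[y][x] = num / norm
    return out


# Toy image with a sharp vertical step between columns 2 and 3.
step = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
smoothed = bilateral_filter(step, radius=2, sigma_s=1.5, sigma_r=20.0)
```

On the toy step image, pixels on either side of the step keep essentially their original values, because the similarity weight for an intensity difference of 190 is negligible when `sigma_r` is 20. This is the edge-preserving behavior that motivates replacing the Gaussian filter.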
In (3) and (4), $h(x)$ is the filtered grey value of point $x$; $c$ is the Gaussian spatial weight; $s$ is the Gaussian weight of the degree of similarity between pixels [25]; $f(x)$ is the pixel value of the pixel point; $k^{-1}(x)$ is the normalization factor.

To obtain a better result map, the parameters $c$, $s$, and the filter kernel size must be set manually. Manual adjustment is inefficient because the dataset contains many images of different types. To address this problem, this study proposes an adaptive bilateral filtering method, which is presented in the subsequent sections.

2) Calculating image gradients
The Canny algorithm uses the traditional Sobel operator, which only considers gradient information in two directions, is not robust in analyzing images, and is easily affected by noise. It does not consider the image edge direction and often loses edge details [26].

To reduce the influence of noise on the image, enhance the detection of contour lines, and retain edge details, this study drew on the first-order gradient template of the Sobel operator to determine the gradient amplitude of the image, extending it to a first-order gradient template in four directions: horizontal, vertical, 45°, and 135° [27], as shown in Fig. 2.

3) Canny Adaptive Threshold Selection
In response to the problem of the traditional Canny algorithm requiring manual adjustment of high and low thresholds, this study proposes an adaptive dynamic thresholding method. Gradient conversion is performed on the image after non-maximum suppression, converting the gradient values to the range 0 to 255. The gradient magnitude of each pixel point is counted to generate a gradient histogram describing the edge strength [28]. The gradient histogram has information on image edge strength, where pixel points with gradient 0 indicate non-edge regions, which are excluded so that they do not affect the subsequent calculation. The remaining points contain all the detected edge information, which contains both correct and incorrect edges, that is, ambient noise, as shown in Fig. 3.
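The histogram-based threshold selection of this subsection can be sketched as below. This is a hedged illustration rather than the authors' code: it takes the low and high thresholds from Eq. (8) and the sentence that follows it (τl = Gmax + 3σmax, τh = 3τl), and it reads Eq. (7) loosely as the root-mean-square deviation of the non-zero gradients from the modal gradient, which is an assumption about the intended weighting. The function name and the toy gradient list are hypothetical.

```python
import math
from collections import Counter


def adaptive_thresholds(gradients):
    """Derive Canny thresholds from gradient magnitudes in 0..255.

    gradients: iterable of integer gradient magnitudes after
    non-maximum suppression; zeros (non-edge pixels) are ignored.
    Returns (g_max, sigma_max, tau_l, tau_h).
    """
    nonzero = [g for g in gradients if g > 0]
    if not nonzero:
        raise ValueError("no edge pixels: all gradients are zero")

    # Gradient histogram over the remaining (edge) pixels.
    hist = Counter(nonzero)

    # Pixel-maximum gradient: the gradient value with the most pixels.
    # Ties are broken toward the smaller gradient (an assumption).
    g_max = min(hist, key=lambda g: (-hist[g], g))

    # Assumed reading of Eq. (7): RMS deviation from g_max.
    squared = sum(n * (g - g_max) ** 2 for g, n in hist.items())
    sigma_max = math.sqrt(squared / len(nonzero))

    tau_l = g_max + 3.0 * sigma_max  # Eq. (8)
    tau_h = 3.0 * tau_l              # high threshold as 3 x low threshold
    return g_max, sigma_max, tau_l, tau_h


# Toy data: 50 non-edge pixels, a noise peak at 10, a few pixels at 14.
toy = [0] * 50 + [10] * 6 + [14] * 2
g_max, sigma_max, tau_l, tau_h = adaptive_thresholds(toy)
```

For the toy data, the modal gradient is 10 and the deviation is 2, giving τl = 16 and τh = 48. The sketch does not clip the thresholds to the 0 to 255 range; whether and how to clip is left open here because the text does not specify it.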
be the set of incorrect edges, that is, environmental interference noise. The gradient with the highest number of pixels in the gradient histogram is called the pixel-maximum gradient $G_{max}$, and the variance of the gradient of all pixels within the sub-image relative to $G_{max}$ is calculated and referred to as the pixel-maximum gradient variance $\sigma_{max}$.

$$\sigma_{max} = \sqrt{\sum_{i=1}^{255} \left(i \times G_i - \mathrm{max} \times G_{max}\right)^{2} \Big/ \sum_{i=1}^{255} \left(i \times G_i\right)} \tag{7}$$

where $i$ is the gradient value, and $\mathrm{max}$ is the gradient value containing the maximum number of pixels.

When there is little ambient noise, the gradient histogram has only a single peak, the pixel gradient values are concentrated around the pixel-maximum gradient $G_{max}$, and $\sigma_{max}$ is small. When the gradient histogram reflects not only cracked edges but also a large proportion of false edges caused by ambient noise, the correct edge pixel gradients are distributed relatively far from $G_{max}$; consequently, $\sigma_{max}$ is large. Therefore, based on this analysis, high and low thresholds can be adaptively set.

Analysis of the principle of the Canny algorithm shows that the double-threshold connection ignores weak edge points below the low threshold $\tau_l$ and retains strong edge points above the high threshold $\tau_h$. Edge points between $\tau_l$ and $\tau_h$ are retained depending on whether there are strong edge points in the neighborhood. Therefore, $\tau_l$ must be set outside the wrong edge region. Otherwise, a large amount of false edge noise will be introduced into the final result. In this study, we set $\tau_l$ adaptively. $G_{max}$ reflects the central position of the distribution of the error edge, that is, ambient noise, in the gradient histogram, whereas $\sigma_{max}$ reflects the dispersion of the gradient distribution in the gradient histogram relative to $G_{max}$, that is, the dispersion relative to the error edge. Therefore, $G_{max}$ and $\sigma_{max}$ can be used to calculate the extent of the erroneous edge region.

The distribution pattern of each gradient pixel in the gradient histogram conforms to a normal distribution, and it is empirically known that the gradient values of the crack edges are larger so that few correct edge regions are in regions with larger gradient values. According to the 3σ principle for the normal distribution, when the low threshold value $\tau_l$ is greater than the pixel-maximum gradient $G_{max}$ by three times the pixel-maximum gradient variance $\sigma_{max}$, $\tau_l$ is considered outside the wrong edge region; thus the environmental noise edge can be prevented from appearing in the contour map. The formula for calculating $\tau_l$ is

$$\tau_l = G_{max} + 3 \times \sigma_{max} \tag{8}$$

Once $\tau_l$ has been determined, $\tau_h$ is generally considered to be three times the low threshold $\tau_l$.

4) Morphological manipulation and adaptive bilateral filtering
The morphological closure operation [29] is image dilation followed by erosion. When applied to crack images, this process removes narrow interruptions and long thin divots, eliminates small holes, and replenishes cracks in the contour lines. This operation smooths the contour edges.

After the morphological closure operation, the edges demonstrate superior smoothing and noise reduction while retaining more image edge details. At this point, only the crack edge information is obtained, and the center of the crack is an undetected area, forming a large hole that must be eliminated by morphological reconstruction of the image. After morphological reconstruction, crack information is fully detected, but some environmental noise is also detected. The non-cracked areas are filtered out by determining the connected areas and comparing the size of each connected area across the image.

As mentioned earlier, the parameters $c$, $s$, and the filter kernel size must be set manually to obtain a better bilateral filtering effect. This study proposes an adaptive method with feedback adjustment based on the detection results to solve this problem.

(1) The size of the filter kernel. Filter kernel size determines the number of pixels involved in filtering calculations. A larger filter kernel leads to better denoising but requires greater computational effort and generates a blurred image. To reduce the amount of computation and increase the filtering speed while maintaining a good filtering effect, the kernel size is generally chosen to be no larger than 15, depending on the situation.

(2) Gaussian spatial weight. A linear relationship is observed between the Gaussian spatial weight and the size of the filter kernel [30]. Because more than 95% of the components of the Gaussian function are concentrated in the interval $[-2\sigma, 2\sigma]$, to obtain a larger number of samples while ensuring image clarity, the spatial weight is defined as

$$c = \frac{m r}{2} \tag{9}$$

where $c$ is the Gaussian spatial weight, $m$ is a constant, and $r$ is the filter kernel size. The best value of $m$ is experimentally determined to be in the range [0.75, 0.85], which can effectively avoid over-dispersing the spatial weight and the blurred images that result.

(3) Gaussian pixel weight. Compared with Gaussian spatial weight, the effect of bilateral filtering is more easily influenced by Gaussian pixel weight [31]. The feedback from the detection result map is used to adjust Gaussian pixel weight, which can be more influential in bilateral filtering. When the percentage of the maximum connected area is too large, and the number of connected
areas is too high, noise is significantly affecting the detection area, and the Gaussian pixel weight should be increased. The Gaussian pixel weight is adjusted according to two feedback indicators: the maximum connected area ratio and the number of connected areas. After conducting numerous experiments, it was concluded that the detection region of a correct crack should be the largest connected region in the detection map, comprising 12% to 18% of the entire image area, and that the number of connected regions should not exceed 5. In the adaptive bilateral filtering method proposed in this paper, the filter kernel size was set to 15, the Gaussian spatial weight was set to 8 using formula (9), and the initial value of the Gaussian pixel weight was set to 15. Next, detection was run on the image, and the detection result was checked in a loop. If the largest connected area accounted for more than 18% of the entire image area, or the number of connected areas was greater than 5, it indicated that the image was significantly disturbed by noise and that the algorithm had misidentified some non-crack areas. The Gaussian pixel weight should then be increased to improve the filtering effect. Thus, the Gaussian pixel weight s was incremented by 1, and the image was re-filtered and detected again. Conversely, if the largest connected region accounted for less than 12% of the entire image area, or the number of connected regions was less than 5, it indicated that cracks had been missed entirely or in part. In this case, the Gaussian pixel weight should be reduced to weaken the filtering effect. Therefore, the Gaussian pixel weight was decremented by 1, and the image was re-filtered and detected again.

B. SEMANTIC SEGMENTATION AND FEATURE FUSION
1) DeepLabV3+
DeepLabV3+ is a semantic segmentation model that introduces the encoder-decoder form to fuse multiscale information better. The encoder-decoder architecture introduces arbitrary control over the resolution of the features extracted by the encoder and balances accuracy and time consumption through atrous convolution. The DeepLabV3+ model is divided into two parts: the encoder and the decoder.

The encoder uses serial atrous convolution in the backbone network DCNN. After deep convolutional feature extraction is applied to the incoming image, the output is divided into two parts: one part is passed directly to the decoder as the lower feature layer, and the other part goes through parallel atrous convolutions at different rates to extract features and capture contextual information at different scales. These outputs are merged, and a 1×1 convolution compresses the features to obtain the high-level feature layer for the decoder.

In the decoder, the higher feature layer from the encoder is upsampled by bilinear interpolation using a magnification factor of four; it is then fused by concat with the lower feature layer after the latter has passed through a 1×1 convolution that compresses its number of channels. Two depthwise separable convolution blocks are then used to extract features, which are upsampled by a factor of four to obtain a prediction map of the same size as the original image.

2) Feature layer fusion
In image processing, feature fusion at different scales is an important tool for improving results. In convolution, lower-level features have higher resolution and contain more location and detail information but are less semantic and noisier. In contrast, higher-level features have stronger semantic information at the expense of very low resolution and poor detail perception. Therefore, fusing features from different scales is the key to improving the results. In the DeepLabV3+ model, the high and low feature layers are stitched together using concat [32], and the number of features describing the image (the number of channels) is increased; however, the information under each feature does not increase, and the low feature layers possess less semantic information and play little role in the final prediction [33]. When different levels of feature layers are combined, the misalignment of feature locations may lead to a loss of detail in detecting small cracks [34]. In this study, we propose fusing the adaptive Canny detection map with the lower feature layers after feature extraction and performing sequential operations. The structure of this method is shown in Fig. 4.

This paper proposes a feature fusion method that fuses the adaptive Canny result map with the low feature layer of DeepLabV3+. This approach was chosen for three main reasons. First, the Canny detection map is rich in detail and provides accurate position information. The low feature layer has finer features than the high feature layer, which enhances its ability to locate object positions and borders. Hence, selecting the low feature layers for pixel-level feature fusion can improve the accuracy of semantic segmentation. Second, low feature layers are more sensitive to pixel-level details, such as texture and edges, which must be considered when fusing pixel-level features. As a result, fusing with low feature layers can better preserve these details, improving the accuracy of semantic segmentation. Finally, compared to fusing Canny detection maps with high feature layers, fusing with low feature layers requires fewer convolution kernels, making the algorithm run faster.

In the structure diagram, after adaptive Canny detection, the original map becomes a Canny detection map, to which a 3×3 convolutional block is applied for feature extraction to obtain a low feature layer containing more semantic information. This feature layer is then fused with the lower feature layer after the DCNN, using add fusion at this point. Compared to concat, instead of increasing the number
of features (number of channels) of the image, the feature layers are superimposed directly, dimension by dimension, so that the amount of information under the features describing the image is increased. This is equivalent to performing the addition in advance and increases the amount of semantic information in the otherwise low feature layers, which is beneficial for the final classification.

FIGURE 4. Feature fusion structure diagram. The blue box is the encoding part, the red box is the decoding part, and the highlighted area is the innovative method proposed in this paper.

IV. EXPERIMENTAL DETAILS
This section presents the experiments conducted on the method proposed in this paper. It includes the description of the experimental environment, the parameter settings, the selection of the dataset, the ablation experiments, the comparative experiments with other semantic segmentation networks, and the experimental analysis.

In the experimental study, the CRACK500 dataset and the GAPS384 dataset were used to evaluate the performance of the proposed method. The adaptive Canny, DeepLabV3+, Unet, PSPNet, and ICnet methods for pixel-level surface defect detection were evaluated, and their performances were compared with that of the proposed method.

The hardware environment used to conduct the experiments in this study was an Intel Xeon(R) W-2145 CPU with an NVIDIA GeForce RTX 2080Ti GPU with 11 GB of graphics memory; the software environment consisted of Ubuntu 18.04, Python 3.6, PyTorch 1.10, and the associated Python libraries for neural networks. Information on the parameter settings of the model, as well as numerical and visualization results, is provided.

A. PARAMETER SETTINGS
This study developed the proposed method using Anaconda, Python, and PyTorch. A supervised learning method is utilized to train the network model, and the corresponding details and parameter selections are provided below.
• Bilateral filter kernel size of 15, Gaussian spatial weight c of 8, and Gaussian pixel weight s of 15, which determines the degree of similarity between pixels,
• Minimum connected pixel area of 1500,
• Batch size of 12,
• Initial learning rate of 0.001 and minimum learning rate of 0.00001,
• Number of training iterations of 100,
• Backbone is the Xception network,
• The total loss function is the sum of the pixel-wise cross entropy loss and the auxiliary loss,
• The input image size is 512×512,
• The downsampling factor is 16,
• The optimizer is SGD, the momentum is 0.9, and the weight decay is 0.0001,
• The learning rate drop method is cos,
• Pre-trained weight files are used for training.

In addition, in the experimental study, intersection over union (IoU), MIoU, Precision, Recall, F1-Score, Pixel Accuracy (PA), and MAE were used to evaluate the
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (12)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (13)$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Recall} + \text{Precision}} \qquad (14)$$

$$PA = \frac{TP + TN}{TP + TN + FP + FN} \qquad (15)$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \qquad (16)$$

where TP, TN, FP, and FN indicate a true positive, a true negative, a false positive, and a false negative, respectively.

This paper selects MIoU and MAE as the final performance indicators to evaluate the algorithm. MIoU measures the intersection and union relationship between predicted and actual regions and then calculates the average. It comprehensively reflects the model performance by considering the similarity and overlap between

FIGURE 5. Sample images in [Link] left is the original image, and the right is the labeled image.

pixels; each road crack image has a pixel-level annotated label, making this dataset the largest publicly accessible pavement crack dataset with pixel-level annotation.

2) GAPS384
This study evaluates the generalization and limitations of the proposed method based on the model trained on CRACK500 by utilizing the publicly available road crack dataset GAPS384.

TABLE 3. GAPS384 details
Method    Components (Bilateral Filter, Four-way Sobel, Adaptive Bilateral Filter, Adaptive Canny)    CRACK500 (MIoU%, MAE)
FIGURE 7. Visualization result map of ablation experiment.

We compared and analyzed the road crack detection capabilities of the different methods based on the detection results in Table 4 and the performance comparison in Figure 8. Method 1 exhibited poor anti-interference and edge feature extraction, as shown in Figure 8. The Canny image it generated contained a significant amount of noise, and the road crack edge was not clear enough. Additionally, after the morphological operation, the resulting map differed from the ground truth. Method 2 improved on method 1 by using bilateral filtering instead of Gaussian filtering, which significantly enhanced denoising and edge preservation. Although it eliminated much of the environmental noise, it failed to extract all the road crack edge details. Method 3 enriched the crack edge information by adding the four-way Sobel operator to method 2, but also mistakenly extracted some environmental noise. Method 4 further reduced interference by adding adaptive bilateral filtering to method 3 and continuously adjusting the filtering parameters through feedback from the resulting map. However, the detected road crack edge was still not accurate enough. Method 5 added adaptive Canny detection to method 4, which enabled the algorithm to select the Canny threshold adaptively according to the image gradient information, achieving better detection results.

The evaluation indicators used to measure the detection results of each method on the CRACK500 dataset were the MIoU value and the MAE value, as shown in Table 4. The MIoU value represents the overlap between the detected and actual areas. In contrast, the MAE
value represents the average absolute pixel difference between the detected image and the actual image. Table 4 lists the algorithm modules of each method together with the MIoU and MAE obtained on the CRACK500 dataset. Figure 8 visually demonstrates the performance improvement of each method after adding different algorithm modules. From method 1 to method 5, the MIoU value gradually increased, and the MAE value gradually decreased. Compared to method 1, method 5 improved MIoU by approximately 45% and MAE by approximately 20. Therefore, this experiment shows that each new module improves the road crack detection ability of the algorithm, and that each algorithm module proposed in this paper is reliable and necessary.

evaluation indicators are presented in Table 5. Figure 9 visually illustrates the performance of each algorithm, allowing for a more intuitive comparison of their pros and cons in detecting road cracks.

TABLE 5. Test results (%) of the proposed method and other methods on CRACK500

Model        MIoU(%)   MAE    PA(%)   F1-Score(%)
PSPNet       62.68     2.81   93.55   50
ICNet        56.15     2.77   91.82   39
UNet         67.76     3.01   95.36   53
DeepLabV3+   71.56     2.23   97.12   56
OpenCV       65.28     6.24   94.07   49
Ours         77.64     1.55   97.38   63
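As a rough illustration of how the MIoU, MAE, PA, and F1-Score values reported in Table 5 can be computed from a predicted mask and its label, the sketch below applies (12)-(16) to binary masks. It is a minimal sketch under stated assumptions, not the evaluation code used in this study: the function name `segmentation_metrics`, the IoU form TP/(TP + FP + FN), the averaging of MIoU over two classes (crack and background), and the 0/1 mask scale used for MAE are illustrative choices.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Pixel-level metrics for one binary crack mask (hypothetical helper).

    pred, target: array-likes of the same shape; nonzero means "crack".
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)

    # Confusion-matrix counts over all pixels.
    tp = int(np.sum(pred & target))
    tn = int(np.sum(~pred & ~target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))

    def safe_div(num, den):
        # Return 0.0 instead of dividing by zero (e.g. image without cracks).
        return num / den if den else 0.0

    recall = safe_div(tp, tp + fn)                              # Eq. (12)
    precision = safe_div(tp, tp + fp)                           # Eq. (13)
    f1 = safe_div(2 * precision * recall, recall + precision)   # Eq. (14)
    pa = safe_div(tp + tn, tp + tn + fp + fn)                   # Eq. (15)

    # Eq. (16), computed on masks scaled to {0, 1} (assumption).
    mae = float(np.mean(np.abs(pred.astype(float) - target.astype(float))))

    # Assumption: MIoU is the mean of the crack IoU and the background IoU.
    iou_crack = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    miou = (iou_crack + iou_background) / 2

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "pa": pa,
        "mae": mae,
        "miou": miou,
    }


if __name__ == "__main__":
    # Toy 2x2 example: one correct crack pixel and one false positive.
    print(segmentation_metrics([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

Dataset-level figures would typically be obtained by accumulating the counts over all test images before dividing; whether the reported values were averaged per image or per dataset is not stated here, so that choice is left open.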
scene changes. However, this method has some limitations, specifically regarding the road crack images in the GAPS384 dataset, which originate from asphalt pavement. Due to the small difference between the road crack and the road surface, the presence of low light, and interference from road markings, such as zebra crossings, this method is unsuitable for detecting road cracks in this case, resulting in a weak detection ability. The detection results are displayed in rows 2 and 3 of Figure 11. Furthermore, the information extracted from the road cracks by the proposed method is insufficient, and the road cracks cannot be successfully detected.

V. CONCLUSION
This study proposed a feature fusion-based adaptive Canny and semantic segmentation network for road crack detection. Based on improved Canny edge detection, the method passes the Canny crack detection map through the convolutional feature extraction module and fuses it with the low feature layer in the DeepLabV3+ detection network. Finally, the fused layer is spliced with the high feature layer, and the resulting map is obtained. The experimental results on the CRACK500 dataset demonstrate the effectiveness of the proposed fusion method, with an MIoU of 77.64 and an MAE of 1.55, outperforming other semantic segmentation detection algorithms. This method can reliably eliminate interference and detect a more comprehensive range of crack details. However, it still has limitations, with detection performance deteriorating in dark environments or when interfering objects are present. Thus, future research is necessary to optimize the algorithm further and enhance its detection accuracy in such scenarios while improving its generalization ability.

REFERENCES
[1] Y. Pan, G. Zhang, and L. Zhang, “A spatial-channel hierarchical deep learning network for pixel-level automated crack detection,” Autom. Constr., vol. 119, p. 103357, 2020. DOI:10.1016/[Link].2020.103357.
[2] N. Safaei, et al., “Gasoline prices and their relationship to the number of fatal crashes on US roads,” Transp. Eng., vol. 4, p. 100053, 2021. DOI:10.1016/[Link].2021.100053.
[3] B. Safaei, et al., “Studying the risks and factors contributing to motorcycle crashes, and prioritizing strategies to reduce fatalities, and improve community health,” no. Febr., 2021.
[4] C. Wang, Y. Li, and Y. Qi, “Comparison Research of Capability of Several Edge Detection Operators,” International Research Association of Information and Computer Sci-[Link] of International Conference on Industrial Technology and Management Science (ITMS 2015), Atlantis Press, 2015, pp. 796-799.
[5] L. C. Chen, Y. Zhu, G. Papandreou, et al., “Encoder-decoder with atrous separable convolution for semantic image segmentation,” Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801-818.
[6] Z. Zhu, P. Zhu, J. Zeng, et al., “A Surface Fatal Defect Detection Method for Magnetic Tiles based on Semantic Segmentation and Object Detection: IEEE ITAIC (ISSN: 2693-2865),” 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), IEEE, 2022, vol. 10, pp. 2580-2586.
[7] H. Xu, et al., “Pavement crack detection based on OpenCV and improved Canny operator,” Comput. Eng. Des., vol. 35, pp. 4254-4255, 2014.
[8] F. Zhao, et al., “Application of improved Canny operator in crack detection,” Electron. Meas. Technol., vol. 41, no. 20, pp. 107-111, 2018. DOI:10.19651/[Link].1801773.
[9] S. G. Monicka, D. Manimegalai, and M. Karthikeyan, “Detection of microcracks in silicon solar cells using Otsu-Canny edge detection algorithm,” Renew. Energy Focus, vol. 43, pp. 183-190, 2022.
[10] M. Huang, Y. Liu, and Y. Yang, “Edge detection of ore and rock on the surface of explosion pile based on improved Canny operator,” Alex. Eng. J., vol. 61, no. 12, pp. 10769-10777, 2022. DOI:10.1016/[Link].2022.04.019.
[11] A. Akagic, et al., “Pavement crack detection using Otsu thresholding for image segmentation,” 41st International Convention on Information and Communication Technology, Electronics, and Microelectronics (MIPRO), vol. 2018. IEEE, 2018, pp. 1092-1097.
[12] A. Zhang, et al., “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network,” Comput. Aid. Civ. Infrastruct. Eng., vol. 32, no. 10, pp. 805-819, 2017. DOI:10.1111/mice.12297.
[13] A. Zhang, et al., “Deep learning–based fully automated pavement crack detection on 3D asphalt surfaces with an improved CrackNet,” J. Comput. Civ. Eng., vol. 32, no. 5, p. 04018041, 2018. DOI:10.1061/(ASCE)CP.1943-5487.0000775.
[14] Y. Fei, et al., “Pixel-level cracking detection on 3D asphalt pavement images through deep-learning-based CrackNet-V,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 1, pp. 273-284, 2019. DOI:10.1109/TITS.2019.2891167.
[15] A. Ji, X. Xue, Y. Wang, et al., “An integrated approach to automatic pixel-level crack detection and quantification of asphalt pavement,” Automation in Construction, vol. 114, p. 103176, 2020.
[16] X. Yang, H. Li, Y. Yu, et al., “Automatic pixel-level crack detection and measurement using fully convolutional network,” Computer-Aided Civil and Infrastructure Engineering, vol. 33, no. 12, pp. 1090-1109, 2018.
[17] Y. Liu, J. Yao, X. Lu, et al., “DeepCrack: A deep hierarchical feature learning architecture for crack segmentation,” Neurocomputing, vol. 338, pp. 139-153, 2019.
[18] H. Üzen, M. Turkoglu, M. Aslan, et al., “Depth-wise Squeeze and Excitation Block-based Efficient-Unet model for surface defect detection,” The Visual Computer, pp. 1-20, 2022.
[19] X. Sun, Y. Xie, L. Jiang, et al., “Dma-net: Deeplab with multi-scale attention for pavement crack segmentation,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 10, pp. 18392-18403, 2022.
[20] Z. Liu, X. Li, J. Li, et al., “A New Approach to Automatically Calibrate and Detect Building Cracks,” Buildings, vol. 12, no. 8, p. 1081, 2022.
[21] M. Cai, X. Yi, G. Wang, et al., “Image Segmentation Method for Sweetgum Leaf Spots Based on an Improved DeeplabV3+ Network,” Forests, vol. 13, no. 12, p. 2095, 2022.
[22] C. Saravanan, “Color image to grayscale image conversion,” Second International Conference on Computer Engineering and Applications (ICCEA), IEEE, 2010. DOI:10.1109/ICCEA.2010.192.
[23] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, Bilateral Filtering: Theory and Applications, Now, 2009.
[24] S. H. Li, X. Liu, W. Fang, D. Y. Zhang, H. Q. Fei, C. Li, and Z. Qin, “Detection of tiny leakage points based on bilateral filtering and frame difference variance method,” Sci. Technol. Eng., vol. 22, no. 25, pp. 11084-11090, 2022.
[25] Q. Wang, et al., “Anomaly detection in periodic motion scenes based on multi-scale feature Gaussian weighting analysis,” Meas. Sci. Technol., vol. 30, no. 5, pp. 2-6, 2019. DOI:10.1088/1361-6501/ab0479.
[26] H. Zhu, L. Lin, D. Q. Chen, and J. Chen, “A PCB image localization correction method based on multi-directional improved Sobel operator,” J. Electron. Meas. Instrum., vol. 33, no. 09, pp. 121-128, 2019. DOI:10.13382/[Link].B1902204.
[27] H. K. Xu, Y. Y. Qin, and H. R. Chen, “An improved Canny-based edge detection algorithm,” Infrared Technol., vol. 36, no. 03, pp. 210-214, 2014.
[28] Z. Wang and S. X. He, “An adaptive edge detection method based on Canny’s theory,” Chin. J. Graph., vol. 9, no. 08, pp. 65-70, 2004.
[29] B. Feng, J. Wang, and J. Sun, “Threshold segmentation of Ostu images based on quantum particle swarm algorithm,” Comput. Eng. Des., vol. 29, no. 13, pp. 3429-3431, 2008.
[30] S. Zhang, L. X. Yang, and L. Ding, “Research on V-shaped weld seam feature extraction based on adaptive bilateral filtering,” Manuf. Technol. Mach. Tool., vol. 07, pp. 125-129, 2021. DOI:10.19287/[Link].1005-2402.2021.07.024.
[31] Z. Y. Cheng, “An efficient adaptive bilateral filtering method,” Digit. Technol. Appl., vol. 37, no. 10, pp. 121-123, 2019. DOI:10.19695/[Link].cn12-1369.2019.10.68.
[32] S. Bell, et al., “Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2874-2883.
[33] J. Fan, et al., “Multi-scale feature fusion: Learning better semantic segmentation for road pothole detection,” IEEE International Conference on Autonomous Systems (ICAS), vol. 2021. IEEE, 2021, pp. 1-5.
[34] H. Üzen, et al., “Depth-wise Squeeze and Excitation Block-based Efficient-Unet model for surface defect detection,” Vis. Comput., pp. 1-20, 2022. DOI:10.1007/s00371-022-02442-0.
[35] M. Eisenbach, R. Stricker, D. Seichter, et al., “How to get pavement distress detection ready for deep learning? A systematic approach,” 2017 joint international conference on neural networks (IJCNN), IEEE, 2017, pp. 2039-2047.
[36] F. Yang, L. Zhang, S. Yu, et al., “Feature pyramid and hierarchical boosting network for pavement crack detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 4, pp. 1525-1535, 2019.

XIAOXU WEI received the B.S. degree (2011) in automation and the M.S. degree (2014) in control science and engineering from Wuhan University of Technology, Wuhan, China. She is currently pursuing the doctor’s degree in Automotive electronics, the School of Automotive Engineering, Wuhan University of Technology, Wuhan, China. Her research interests include robotic machining and machine learning.