
Diyala Journal of Engineering Sciences Vol (18) No 3, 2025: 68-87

Diyala Journal of Engineering Sciences


Journal homepage: [Link]

ISSN: 1999-8716 (Print); 2616-6909 (Online)

Deep Learning-Based Detection, Segmentation, and Quantification of Asphalt Pavement Cracks
Shemeam T. Muhey*, Sinan A. Naji

Informatics Institute for Postgraduate Studies, University of Information Technology and Communications. Iraq

ARTICLE INFO

Article history:
Received March 25, 2025
Revised May 14, 2025
Accepted June 11, 2025
Available online September 01, 2025

Keywords:
YOLOv10
UNet 3+
Attention gate
Residual unit
Cracks identification

ABSTRACT

The primary factor influencing road performance is pavement deterioration. Pavement cracking, a prevalent form of road deterioration, is a significant challenge in road maintenance. This paper proposes a method utilizing deep convolutional neural network models for precise crack detection, segmentation, and geometric parameter calculation in pavement crack identification. The system operates through three primary stages. First, crack identification employs YOLOv10, a rapid and efficient object detection model. Secondly, crack segmentation employs a modified UNet 3+ variant known as Residual-Attention UNet 3+, which effectively distinguishes crack pixels from the background by utilizing attention mechanisms and residual connections to enhance accuracy. Finally, in crack quantification, the system computes the crack's geometric parameters, including width, length, angle, and orientation. We assessed performance using two datasets: SUT-Crack, a publicly accessible dataset, and IRD-Crack, a new real-world dataset compiled by the authors from roads in Diyala, Iraq, with diverse lighting conditions and surface complexities. The suggested technique attained an accuracy of 98.96% on the SUT-Crack dataset. It showed superior performance on the IRD-Crack dataset under actual situations, therefore validating its efficacy and generalization capability. This method offers a pragmatic and computationally efficient instrument for monitoring pavement cracks and can facilitate road repair choices.

1. Introduction

Pavement deterioration is the primary element influencing road performance. The prompt and precise identification of pavement deterioration is essential for pavement upkeep. Cracks are the primary indication of multiple forms of pavement deterioration. Pavement cracks would adversely impact both the aesthetic quality and driving comfort while also potentially escalating to induce structural damage and diminish the overall service performance and lifespan of the pavement [1]. Therefore, early detection of pavement cracks is crucial for preventing pavement degradation, safeguarding the underlying foundation layers, minimizing maintenance efforts and expenses, and ensuring safety for all road users. Early pavement identification and repair usually depend on manual detection, which is time-consuming and laborious, has weak detection accuracy, and has some associated dangers [2]. From crack classification, it is revealed that they exhibit diverse shapes, extensive coverage areas, varying extension lengths, and irregular widths.

* Corresponding author.
E-mail address: ms202220727@[Link]
DOI: 10.24237/djes.2025.18305
This work is licensed under a Creative Commons Attribution 4.0 International License.


In recent years, the fast development of computer technology and artificial intelligence (AI) has led to its integration across numerous fields. It has become extensively utilized for road crack identification, resulting in numerous automated detection techniques. Traditional automatic detection techniques encompass the Canny algorithm, which relies on threshold segmentation [3], and the Otsu approach [4]. Nevertheless, owing to the intricate characteristics of the road surface and pavement environment, along with the limited general applicability and robustness of the traditional Canny and Otsu algorithms, the accuracy of the detection findings is suboptimal. Subsequently, the minimal cost path search algorithm [4], the support vector machine (SVM) detection algorithm [5], the Crack Tree detection technique, and others emerged. These techniques improved precision; however, detection still takes a long time. Additionally, the complex architecture of the SVM-based and Crack Tree techniques prevents their practical application. Based on these difficulties and the prevalent machine learning technology [6], a deep learning and neural network-based automatic crack identification and detection technique is developed. Crack detection data can also be used to monitor pavement conditions and determine road maintenance strategies. Pavement crack identification would therefore considerably impact road monitoring and maintenance automation, so its precision and speed must be improved.

Currently, road pavement crack recognition research is divided into two directions. The first, the digital image processing method, uses hand-crafted feature identification, including frequency, edge, HOG, gray level, texture, and entropy, to construct feature recognition conditions for limited and total recognition. The second uses deep learning to create a Convolutional Neural Network (CNN) for automatic feature recognition. The network adjusts to meet or exceed label accuracy by following specific rules. This paper establishes a deep learning-based convolutional network to detect pavement cracks automatically.

Considering the aforementioned issues in pavement crack detection, this paper proposes a method for pavement crack identification utilizing a deep convolutional neural network fusion model, which effectively identifies cracks and ensures recognition accuracy through the YOLOv10 model. A detected crack can be segmented using Residual-Attention UNet 3+, and the resulting binary image can be utilized to compute the geometric characteristics of the crack. Consequently, the suggested model holds substantial importance for intelligent pavement detection and can concurrently perform detection and segmentation, thereby markedly enhancing model efficiency.

The main contributions of the suggested model include the following:

1. The suggested system employs YOLOv10 for object detection because it effectively resolves a significant obstacle: balancing accuracy and computational efficiency compared to earlier versions.
2. This study presents a novel technique for image segmentation. We introduced Residual-Attention UNet 3+, a composite neural network that amalgamates the advantages of UNet 3+, residual units, and attention mechanisms for the segmentation of crack images. This technique has improved predicted accuracy relative to earlier methods, hence differentiating our approach from prior methodologies. This results in attaining a high level of precision in the identification of pavement cracks.
3. Another dataset, built by capturing pavement crack images of local roads, is used to test the proposed system in order to reflect its applicability and generalization in the real environment.

This paper is organized into the following sections: Section 2 presents the related research, Section 3 outlines our methodology, and Section 4 details the experiments and analysis. Finally, Section 5 summarizes the work.


2. Related works

In recent years, the automated identification of pavement cracks has garnered heightened interest. Authors and maintenance specialists are exploring diverse strategies and methodologies to improve maintenance dependability and efficacy. This section summarizes the literature in this field.

Li et al. presented a road crack detection model called RDD-YOLO [7]. The model adds a simple attention mechanism (SimAM) to the backbone network to bring attention to significant details in the input image. The neck structure is enhanced by using GhostConv instead of traditional convolution modules. As a result, damage recognition becomes more lightweight and effective because there is less redundant data, fewer parameters, and less computing complexity. Lastly, the upsampling algorithm in the neck is improved by replacing nearest interpolation with more accurate bilinear interpolation. This finer interpolation method more successfully restores the image's delicate details and improves the accuracy of the detection results. The proposed model achieves an mAP50 and mAP50-95 of 62.5% and 36.4%, respectively, on the validation set of the RDD2022 dataset. This study is constrained by its dependence on a single dataset (RDD2022), perhaps limiting its applicability to diverse pavement or lighting situations.

Deng et al. suggested an integrated framework for automatic detection, segmentation, and measurement of road surface cracks [8]. In the proposed framework, three different computer vision algorithms are combined. First, to identify cracks, the real-time object detection algorithm YOLOv5 is employed; it achieves a mean average precision (mAP) of 91%. Secondly, a modified ResNet is created by adding an attention gate module to increase the accuracy of crack segmentation at the pixel level, which achieves 87% intersection over union (IoU) on crack pixel segmentation. Lastly, a surface feature quantification method is created to measure both the width and length of segmented road cracks, achieving a 95% identification accuracy. However, the framework presupposes optimal conditions and fails to include real-world environmental fluctuations, such as shadows, debris, or illumination discrepancies.

Shu et al. presented a pavement crack detection model that utilizes the YOLOv5 target detection network with street view images as the data source [9], which is a cost-effective method, with an mAP of more than 70%. However, the model has difficulties in identifying small or hairline fractures due to the constrained resolution and noise inherent in street view data.

An et al. suggested a system called the Crack Identification Network (CIN) [10] for identifying and calculating the size of concrete surface cracks by integrating deep learning convolutional neural networks, clustering segmentation, and morphological techniques. The accuracy rate reaches 99%. Although the approach demonstrates great accuracy, its computational complexity and absence of real-time performance may impede its implementation in practical settings.

Zhang Z. et al. introduced ResUnet, a semantic segmentation neural network that combines the strengths of residual learning and U-Net, applied to high-resolution remote sensing images [11]. The first benefit of this model is that residual units facilitate deep network training. Second, information propagates more easily due to the network's rich skip connections, making it possible to build networks with fewer parameters but better performance. The method's break-even point, defined as the point on the relaxed precision-recall curve where precision equals recall, was 0.9187. Nonetheless, it is computationally demanding and inappropriate for implementation on low-resource devices or in real-time applications.

Zhang Q. et al. introduced an improved U-Net network for crack detection and segmentation with a complicated background [12]. VGG16 and the novel Up_Conv module are added as the backbone network to increase the recognition accuracy of small cracks in the road surface. Moreover, U-Net's skip connection was enhanced using the CA (Channel Attention) mechanism to distinguish between cracks and background noise. In order to extract richer information through more convolutional layers in the network, the


DG_Conv (Depthwise GSConv Convolution) and UnetUp (Unet Upsampling) modules are introduced in the decoding stage. The suggested system's results show a precision of up to 87.4%. However, the model's efficacy is contingent upon hyperparameter configurations, and the research is deficient in detail on its resilience under diverse environmental circumstances.

He and Lau proposed a model called CrackHAM. This encoder-decoder network is based on the U-Net design and incorporates a novel module called the HASP module to address the problem of deteriorating spatial data [13]. Additionally, channel attention and spatial attention modules were used to capture abundant contextual information for high-level features and to extract rich edge information for low-level features, respectively. Through downsampling, the Multi-Fusion U-Net architecture is suggested as a way to aggregate contextual information from feature maps of different sizes. The accuracy of the system is 86.41%. Nonetheless, the model's attention methods introduce a considerable computational burden, rendering it less appropriate for real-time crack investigation.

Zhang et al. employ a technique that combines a Convolutional Block Attention Module (CBAM) with a ResNet model to identify multi-type cracks [14]. The suggested model achieves a precision of 92.9%. Nevertheless, the study is hindered by its geographically narrow dataset, which may not accurately reflect different road conditions worldwide. Table 1 summarizes the related works.
Table 1: Summary of related works

Study | Technique | Dataset | Performance Metrics
[7] | YOLOv8 and simple attention mechanism | RDD2022 | mAP50: 62.5% and mAP50-95: 36.4%
[8] | YOLOv5 and Attention ResNet | (RDD) dataset for training and validation, and Road-Crack-Images-Test from Hunan University | mAP: 91%, IoU: 87%, Accuracy: 95%
[9] | YOLOv5 | Street view images | mAP > 70%
[10] | CNN, clustering segmentation and morphology | 1000 collected original crack images | Accuracy: 99%
[11] | Residual learning and U-Net | Massachusetts roads dataset | Precision-recall: 91.87%
[12] | Improved U-Net and VGG16 | CFD and Deepcrack | Precision: 87.4%
[13] | Multi-Fusion U-Net | Deepcrack, Crack500, and FIND | Accuracy: 86.41%
[14] | Convolutional Block Attention Module and ResNet model | Crack images collected from China streets | Precision: 92.9%

3. Materials and methods

This study provides a fully automated procedure for the detection, segmentation, and measurement of asphalt pavement cracks that appear in different shapes and forms within the image, as illustrated in Figure 1. Two methodologies have been used in our system. First, object detection uses YOLOv10, a single-stage object detection method that offers rapid detection speed and effective identification of small targets. YOLOv10 enhances both precision and efficiency via a synthesis of training methodologies and architectural developments. This method has been utilized in numerous engineering applications and is particularly effective for crack-detecting tasks that include stringent time limitations and significant safety threats. Consequently, the YOLOv10-based method [15] is initially employed to detect road crack areas utilizing bounding boxes. Second, semantic segmentation uses the improved Residual-Attention UNet 3+ algorithm. The outcomes of Stage 1 are input into the improved Residual-Attention UNet 3+ algorithm as Stage 2. To enhance pixel-level crack segmentation, we have refined the UNet 3+ model by developing an integrated neural network that combines the advantages of UNet 3+, residual units, and attention gates (AG) for crack image


segmentation. In Stage 3, a novel approach for estimating surface cracks is introduced to measure the length, width, and orientation of segmented cracks. The primary benefit of the suggested model is the substantial enhancement in the precision and efficacy of road crack segmentation among intricate backgrounds. Simultaneously, a novel technique for surface feature estimation has been devised to examine the surface feature data, emphasizing crack morphology precisely [16]. The specifics of each phase of the planned architecture are explained in the subsequent subsections.

Figure 1. The general architecture of system
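
The data flow between the three stages can be illustrated with a minimal sketch. It only shows how Stage 1 boxes feed Stage 2 and how Stage 2 masks feed Stage 3; the function names and the stand-in stages are hypothetical and are not the authors' implementation, and no trained model is involved.

```python
def run_pipeline(image, detect, segment, quantify):
    """Chain the three stages: boxes -> per-box masks -> per-crack measurements."""
    results = []
    for box in detect(image):            # Stage 1: bounding boxes around cracks
        mask = segment(image, box)       # Stage 2: binary crack mask for that box
        measurements = quantify(mask)    # Stage 3: geometric parameters
        results.append({"box": box, **measurements})
    return results


# Hypothetical stand-in stages so the sketch runs without any model.
def fake_detect(image):
    return [(0, 0, 2, 2)]                # one box as (x0, y0, x1, y1)


def fake_segment(image, box):
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]   # crop stands in for a mask


def fake_quantify(mask):
    return {"crack_pixels": sum(map(sum, mask))}


image = [[1, 0, 0],
         [1, 1, 0],
         [0, 0, 0]]
results = run_pipeline(image, fake_detect, fake_segment, fake_quantify)
```

Passing the stages in as callables keeps the sketch independent of any particular detector or segmentation network.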

3.1 Image pre-processing

Image pre-processing is an essential stage in deep learning that increases the quantity and quality of dataset images required for system training and yields a more efficient learning model. Cropping, flipping, rotation, contrast improvement, colour-space transformation, noise reduction, and colour enhancement are examples of pre-processing methods [17]. The approaches applied to the SUT-Crack dataset are described in the following sections.

3.1.1 Image scaling

All of the images used in this study were resized to 640 x 640 x 3 and 320 x 320 x 3 in order to be compatible with the inputs utilized by the YOLOv10 and UNet 3+ models, respectively. In addition to ensuring computational efficiency, image scaling provides the model with a standard input size.

3.1.2 Image Augmentation

Large datasets are typically needed during the training phase of deep learning with CNN-based methodologies in order to improve the ability of the model to learn new image patterns and generate accurate predictions. The augmentation process improves the training dataset by using multiple image transformations. Rotation, shifting, shearing, zooming, flipping, and reflecting are a few examples of these transformations. Through the production of new images from the dataset of asphalt cracks, overfitting is mitigated, undesired feature acquisition is avoided, and overall performance is enhanced [17,18]. The various transformation types and their associated parameters are displayed in Table 2.


Table 2: Dataset augmentation with different transformations.

Transformation Type | Corresponding Values
Range of Rotation | 30 degrees
Range of Width-Shift | 10%
Range of Height-Shift | 10%
Range of Shear | 10%
Range of Zoom | [70% - 100%]
Horizontal-Flip | 'True'
Fill Mode | Reflection 'Nearest'
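
One way to read Table 2 is as a set of ranges from which every augmented copy draws its own random parameters. The sketch below only samples such parameters; it does not transform pixels. The dictionary keys are illustrative names, and treating the rotation, shift, and shear ranges as symmetric about zero is an assumption that the table does not state.

```python
import random

# Ranges transcribed from Table 2; key names are illustrative, not the authors'.
AUGMENTATION_RANGES = {
    "rotation_deg": 30.0,       # assumed to mean an angle in [-30, 30] degrees
    "width_shift": 0.10,        # fraction of image width (symmetry assumed)
    "height_shift": 0.10,       # fraction of image height (symmetry assumed)
    "shear": 0.10,              # shear intensity (symmetry assumed)
    "zoom": (0.70, 1.00),       # zoom factor range
    "horizontal_flip": True,    # flipping enabled
}


def sample_augmentation(rng):
    """Draw one random set of transformation parameters."""
    r = AUGMENTATION_RANGES
    return {
        "rotation_deg": rng.uniform(-r["rotation_deg"], r["rotation_deg"]),
        "width_shift": rng.uniform(-r["width_shift"], r["width_shift"]),
        "height_shift": rng.uniform(-r["height_shift"], r["height_shift"]),
        "shear": rng.uniform(-r["shear"], r["shear"]),
        "zoom": rng.uniform(*r["zoom"]),
        # When flipping is enabled, flip roughly half of the samples.
        "horizontal_flip": r["horizontal_flip"] and rng.random() < 0.5,
    }


rng = random.Random(0)          # fixed seed so runs are repeatable
params = sample_augmentation(rng)
```

Each call yields a new parameter set, so one source image can produce many distinct training samples.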

3.1.3 Splitting the dataset

A common technique in machine learning, data mining, pattern recognition, and other fields is splitting the dataset into smaller sub-datasets. The SUT-Crack dataset was split into three subsets for this study: 70% for training, 20% for validation, and 10% for testing. Comprehensive information is available in Table 3.

Table 3: Details of splitting dataset

Training (70%) | Validation (20%) | Testing (10%)
5756 | 1644 | 822
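
The counts in Table 3 can be reproduced with a short sketch. It assumes a total of 8222 images (the sum of the three counts in Table 3) and assumes that the validation and test sizes are rounded down, with the remainder assigned to training; the paper does not state its rounding rule, and the function name is hypothetical.

```python
import random


def split_dataset(items, val_pct=20, test_pct=10, seed=0):
    """Shuffle, then split into train/validation/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    n_val = len(items) * val_pct // 100         # rounded down
    n_test = len(items) * test_pct // 100       # rounded down
    n_train = len(items) - n_val - n_test       # remainder goes to training
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(8222))
sizes = (len(train), len(val), len(test))
```

Under these assumptions the subset sizes match the training, validation, and testing counts listed in Table 3.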

3.2 YOLOv10 for crack detection

The initial stage of the suggested model is crack detection (object detection) using YOLOv10, which is employed to identify road cracks in images. The purpose of object detection techniques is to locate and classify an object in an image by drawing a bounding box around the object region [11,19]. Popular models like YOLO (You Only Look Once) [20] or Faster R-CNN [21] are often adapted for this task. These models do not detect the exact pixel boundaries but rather identify the crack as an object within the image [22]. One of the leading families of real-time object detection algorithms is the YOLO series (from v1 to v10), a single-stage object detection technique developed in recent years. It has significantly and broadly affected computer vision research [23,24]. Numerous tests demonstrate that YOLOv10 is superior to other advanced detectors, achieving state-of-the-art performance and latency [25]. By fusing training techniques and architectural innovations, YOLOv10 improves accuracy and efficiency.

We next introduce some of the special features of YOLOv10. The road crack images are initially processed by the backbone to extract crack features. Subsequently, feature fusion is conducted in the neck utilizing an Enhanced Feature Pyramid Network (FPN) with spatial-channel decoupling and Partial Self-Attention (PSA) [15]. Ultimately, the outputs consist of predicted values for class probability, item level, and bounding box location of road cracks. YOLOv10's architecture comprises three components: backbone, neck, and head. Below is an explanation of the basic components:

a. The initial component is the backbone, primarily responsible for feature extraction from the input image. YOLOv10's backbone employs an advanced generation of CSPNet (Cross-Stage Partial Network) to boost gradient flow and minimize computational redundancy. CSPDarknet has numerous essential modules, such as convolutional layers, batch normalization, activation functions, and residual blocks. A


vital element of CSPDarknet is the Cross Stage Partial (CSP) connections, which partition feature maps into two sections and integrate them via a cross-stage hierarchy to enhance learning efficiency. Furthermore, spatial-channel decoupled downsampling is implemented to improve computing efficiency. Additionally, YOLOv10 integrates large-kernel convolutions and partial self-attention methods during the feature extraction phase, enhancing detection precision while preserving computing efficiency. The enhancements in the backbone architecture enable YOLOv10 to attain enhanced efficacy in object detection tasks.
b. The architecture's neck integrates a path aggregation network (PAN) module, optimized for efficiency, along with upsampling layers to improve feature map resolution; it comprises an FPN and a PSA positioned between the backbone and head layers. Utilizing an FPN architecture facilitates the transmission of substantial semantic attributes from the highest to the lowest feature maps. This design guarantees the accuracy of minor object details while enabling the abstract representation of big objects. The PSA architecture transmits precise localization data across feature maps of differing granularity. By integrating the FPN and PSA, YOLOv10 improves efficiency through the PSA module and the Compact Inverted Bottleneck (CIB) block, facilitating effective multi-scale feature processing and attention mechanisms. Consequently, the neck attains adequate power for feature fusion.
c. The prediction head eliminates the need for non-maximum suppression (NMS), a technique used by previous versions to eliminate duplicate predictions and select the most confident boxes. By introducing a dual-assignment strategy into its training process, it significantly reduces processing time. Finally, the predicted box is generated, and the object is classified and labelled. During post-processing, confidence thresholds, typically established at 0.25, and Intersection-over-Union (IoU) thresholds, set at 0.45, are employed to eliminate weak detections. The resulting bounding boxes are then transformed into image-scale coordinates for visual overlay and annotation.

3.2 Improved Residual-Attention UNet 3+ for Crack Segmentation

We have improved the UNet 3+ model by constructing an integrated neural network that combines the strengths of UNet 3+, residual units, and attention gates (AG) to carry out crack semantic segmentation. Semantic segmentation involves assigning every pixel in an image a class label from a pre-defined set of categories, such as road, building, or vehicle. Since FCN, U-Net, and their variations predict one segmentation map based on pixel-wise classification, they have been extensively used for semantic segmentation across a variety of applications [26,27]. The core network framework used in the suggested model is UNet 3+, which connects the encoder and decoder networks using deep supervision and full-scale skip connections [28]. Following each encoding step (E1 to E4), the encoder network's feature map was passed to the decoder network using dense convolution blocks and a residual block (Conv+Maxpooling+Dropout(0.2)). We inserted an attention gate between (E4-D4) to help the model focus on the most significant features and disregard the unimportant ones. It takes two inputs: g, the gating signal that comes from the next lowest layer of the network (decoder stage), which has the better features, and x, which comes from the skip connection at early layers (encoder stage). An element-wise sum is performed on the two vectors. Because of this process, aligned weights get bigger and unaligned weights get smaller. A ReLU activation function is applied to the resulting vector. The attention coefficients (weights) are produced by scaling this vector between [0,1] using a sigmoid layer; more relevant features are indicated by coefficients closer to 1. Trilinear interpolation is used to up-sample the attention coefficients to the x vector's original dimensions. The original x vector is scaled based on significance by


multiplying the attention coefficients element by element. The skip connection then transmits this as usual [29]. The general structure of the Improved Residual-Attention UNet 3+ is depicted in Figure 2.

Figure 2. Enhancement of UNet 3+ Model Architecture
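
The attention gate steps described above (element-wise sum, ReLU, sigmoid, element-wise multiplication) can be traced on a toy example. This is a simplified sketch, not the network's implementation: the learned 1×1 convolutions of a typical attention gate are replaced by hypothetical scalar weights (`w_x`, `w_g`, `psi_w`, `psi_b`), and `x` and `g` are assumed to already have the same size, so the resampling step is omitted.

```python
import math


def attention_gate(x, g, w_x=1.0, w_g=1.0, psi_w=1.0, psi_b=-2.0):
    """Return (attention coefficients, gated skip features) for 1-D inputs.

    x: skip-connection features from the encoder.
    g: gating signal from the decoder, same length as x (assumed).
    The scalar weights stand in for learned convolutions (assumption).
    """
    summed = [w_x * xi + w_g * gi for xi, gi in zip(x, g)]    # element-wise sum
    activated = [max(0.0, s) for s in summed]                 # ReLU
    alpha = [1.0 / (1.0 + math.exp(-(psi_w * a + psi_b)))     # sigmoid -> (0, 1)
             for a in activated]
    gated = [xi * ai for xi, ai in zip(x, alpha)]             # rescale x
    return alpha, gated


x = [2.0, 2.0, -1.0]     # encoder features
g = [3.0, -3.0, -1.0]    # decoder gating signal
alpha, gated = attention_gate(x, g)
```

In this toy input only the first position has `x` and `g` agreeing strongly, so it receives a coefficient near 1 while the other two positions are suppressed.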

3.2.1 UNet 3+

UNet 3+ benefits from full-scale skip connections and deep supervision. While the deep supervision learns hierarchical representations from the full-scale aggregated feature maps, the full-scale skip connections combine low-level details with high-level semantics from feature maps on different scales. The main advantage of UNet 3+ is its ability to be efficiently trained on small datasets [28]. The primary architecture of UNet 3+ consists of two main parts: encoder and decoder. The encoder is a chain of convolutional layers that capture high-level features. Each decoder layer in UNet 3+ includes both smaller- and same-scale feature maps from the encoder and larger-scale feature maps from the decoder, which capture fine-grained features and coarse-grained semantics in full scales. The basic architecture of UNet 3+ also contains skip connections. The basic idea of skip connections is that as the encoder lowers the spatial resolution, which can cause a loss of fine details, the skip connections assist in maintaining spatial details by directly transmitting them to the decoder. For example, Figure 3 shows how to construct the feature map of $X_{De}^{3}$. As in UNet, the decoder receives the feature map from the same-scale encoder layer $X_{En}^{3}$ directly. Unlike UNet, a set of inter encoder-decoder skip connections transfers the low-level detailed information from the smaller-scale encoder layers $X_{En}^{1}$ and $X_{En}^{2}$ by using a non-overlapping max pooling operation, while a chain of intra-decoder skip connections, applying bilinear interpolation, transfers the high-level semantic information from the larger-scale


decoder layers $X_{De}^{4}$ and $X_{De}^{5}$. As a result, five feature maps of the same resolution are formed. To eliminate unnecessary information and further standardize the number of channels, a convolution with 64 filters of size 3 × 3 could be a good option. Furthermore, a feature aggregation process, comprising 320 filters of size 3 × 3, batch normalization, and a ReLU activation function, has been applied on the concatenated feature map from five scales in order to smoothly combine the shallow exquisite information with deep semantic information [28].

Figure 3. Illustration of how to construct the full-scale aggregated feature map of the third decoder layer $X_{De}^{3}$ in the original UNet 3+
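
The bookkeeping behind the full-scale aggregation can be made concrete with a small sketch that lists, for a given decoder level, which feature map feeds it and how that map is resized. The function name is hypothetical, and the resizing factors assume that the spatial resolution halves at each level, which the text above does not state explicitly.

```python
def full_scale_sources(decoder_level, num_levels=5, channels_per_branch=64):
    """List the inputs of one decoder level and the fused channel count."""
    sources = []
    for level in range(1, num_levels + 1):
        if level < decoder_level:
            # Shallower encoder maps are larger: shrink with max pooling.
            factor = 2 ** (decoder_level - level)
            sources.append((f"X_En^{level}", f"max-pool x{factor}"))
        elif level == decoder_level:
            sources.append((f"X_En^{level}", "same scale"))
        else:
            # Deeper decoder maps are smaller: enlarge with bilinear upsampling.
            factor = 2 ** (level - decoder_level)
            sources.append((f"X_De^{level}", f"bilinear upsample x{factor}"))
    # Each branch is reduced to 64 channels before concatenation.
    fused_channels = channels_per_branch * len(sources)
    return sources, fused_channels


sources, fused = full_scale_sources(3)
```

For the third decoder level, five branches of 64 channels each concatenate to 320 channels, which matches the 320 filters used in the aggregation step.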

3.2.2 Residual Blocks or Units

A series of stacked layers make up residual blocks or units, in which inputs are added back to their outputs in order to generate identity mappings. In practice, identity mappings are implemented using what are known as skip or residual connections. However, there are several possible ways to apply these connections, depending on where they are inserted within the stacked layers that form a residual block [30]. According to learning theory, deeper neural networks should achieve lower training and test error, but in practice, the opposite occurs. Once the error rate reaches a minimum value, it starts increasing again. The exploding and vanishing gradient problem is the source of this, as it leads to overfitting of the model and an increase in error. Fortunately, Residual Networks have proved to be quite efficient in solving this problem because they employ a skip connection or a "shortcut" between every two layers along with using direct connections between all the layers [31].

3.2.3 Attention mechanisms

The main idea behind attention mechanisms is to recognize the most important elements of feature maps in convolutional neural networks (CNNs) so that redundancy is removed for machine vision applications. Attention mechanisms generate attention maps that help CNNs focus on important spatial or channel-wise features [32,33].

3.3 Crack Quantification

We utilized deep learning techniques using the improved UNet 3+ algorithm to extract the crack precisely. Nevertheless, the dimensions of the crack remain indeterminate. The pavement crack is often measured in terms of width, length, and depth, all of which are critical indications that assess the severity of the crack and inform the restoration plan. In most studies, crack quantification is performed on the predicted binary crack map using image


processing techniques and geometric calculations. However, the morphological characteristics of cracks are not thoroughly addressed, which reduces efficiency and accuracy.

At this stage, the suggested model is subjected to a case study to verify its robust and dependable performance in a real-world environment. We used a dataset comprising asphalt crack images from local Iraqi roadways. The segmentation model produces segmented images. Through this rigorous testing, the proposed model can be validated for its effectiveness in solving many safety problems, improving road performance, and reducing maintenance costs.

We provide a region-connected search method based on the linked component of cracks to make the visible cracks more comprehensive, distinct, and consistent with the real trend of cracks. Following the acquisition of the crack binary image's contour coordinates, the crack's length is computed using the coordinates that were obtained, and the average crack width is computed by dividing the area of the linked component by the crack's length. The contour is drawn on the image of the crack area and then analysed. Finally, the results of the crack length and width are displayed in the crack image, as shown in Figure 4, which shows the steps of crack quantification.

i. Image Pre-processing: As shown in Figure 5, a series of operations is applied to find and analyze the contour of the crack: the image is converted to gray-scale; a Gaussian filter [34] is applied to blur the image; adaptive thresholding is applied to convert the image pixels to a binary image; and morphological operations are applied. The most common morphological operations are erosion and dilation. Erosion removes pixels from object boundaries, whereas dilation adds pixels. Morphological processing removes tiny cracks and fills gaps in detected cracks, improving crack detection accuracy [35].

Figure 4. Crack quantification by real-world data steps

Figure 5. Image pre-processing steps

ii. Find Optimal Contour: We relied on geometric analysis of the crack contour to accurately detect real cracks, which helps in filtering real cracks from noise in the image. To achieve this, we applied the Canny edge filter to find the edges (connected components) and then found the contours using OpenCV, a library for the Python programming language that simplifies locating and drawing crack contours through two basic functions: findContours() and drawContours() [35].
The optimal contour was chosen based on the area. The contour area is calculated using the function contourArea(); small-area contours are neglected because they often represent noise or unimportant details, while large-area contours are retained because they mostly represent cracks.
The optimal crack is selected by assuming a minimum threshold (Min value): contours with areas smaller than this value are neglected, and contours with areas larger than this value are mostly considered crack contours.


After that, we verify whether the contour represents a crack by determining the smallest rectangle surrounding the contour through GetMinAreaRect(contour) and calculating the width and length of that rectangle. The aspect ratio is then computed as aspect-ratio = length/width. If the contour is a real crack, this ratio is greater than or equal to 3; if the ratio is less than 3, the contour is not considered a crack [36]. This ratio was adopted as a minimum because studies of crack analysis in road engineering and materials science show that actual cracks in substructures have a length-to-width ratio ranging from 3 to 20 or more. A ratio below 3 indicates a roughly square or circular shape, whereas a ratio of 3 or more indicates a long, thin shape. Cracks are long and thin because mechanical stress and thermal changes lead to linear cracking, which makes them much longer than they are wide.

iii. Crack analysis
Algorithm (1) analyzes cracks to determine their properties (width, length, angle, and orientation), which are then translated into actual measurements. The algorithm combines geometry, image analysis, and engineering data processing to provide an accurate and efficient method for analyzing cracks in infrastructure. It generates interpretable results, making it suitable for practical applications in road maintenance. We employ a fixed angle threshold (Angle Threshold = 30°) to consistently classify crack direction into horizontal, vertical, and diagonal orientations. Previous investigations corroborate this threshold, indicating that an angular tolerance of 25°–35° was efficient in categorizing cracks under diverse situations.

Algorithm (1): Analysis Crack
Input: Valid crack contour, Angl_Threshold = 30
Output: crack properties
Begin
  Step 1: Get rotated rectangle properties:
    rect = GetMinAreaRect(contour)
  Step 2: Calculate dimensions:
    width = Min([Link], [Link])
    length = Max([Link], [Link])
    angle = [Link]
  Step 3: Normalize angle
    area = CalculateArea(contour)
    IF width > length:
      angle = angle + 90
    END IF
  Step 4: Determine orientation
    IF (angle < Angl_Threshold) OR (angle > (180 - Angl_Threshold)) Then
      orientation = "horizontal"
    ELSE IF |angle - 90| < Angl_Threshold Then
      orientation = "vertical"
    ELSE IF angle < 90 Then
      orientation = "diagonal-right"
    ELSE
      orientation = "diagonal-left"
    END IF
  Step 5: return {contour: contour,
                  width: width,
                  length: length,
                  angle: angle,
                  center: CenterPoint,
                  orientation: orientation}
End
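Algorithm (1) and the aspect-ratio test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rotated rectangle is assumed to arrive in the `((cx, cy), (w, h), angle)` layout that OpenCV's `cv2.minAreaRect` returns, the angle is assumed to be in degrees, and the long side of the rectangle is taken as the crack direction. The function name `analyze_crack` and the dictionary keys are hypothetical.

```python
def analyze_crack(rect, angle_threshold=30.0, min_aspect_ratio=3.0):
    """Derive crack properties from a rotated bounding rectangle.

    rect is assumed to be ((cx, cy), (w, h), angle_deg), the layout
    returned by cv2.minAreaRect (an assumption of this sketch).
    """
    (cx, cy), (w, h), angle = rect
    width, length = min(w, h), max(w, h)

    # Express the direction of the LONG side in [0, 180): when the second
    # side of the rectangle is the longer one, its direction is 90 degrees
    # away from the reported angle.
    if h > w:
        angle += 90.0
    angle %= 180.0

    # Mutually exclusive orientation classes (an else-if chain).
    if angle < angle_threshold or angle > 180.0 - angle_threshold:
        orientation = "horizontal"
    elif abs(angle - 90.0) < angle_threshold:
        orientation = "vertical"
    elif angle < 90.0:
        orientation = "diagonal-right"
    else:
        orientation = "diagonal-left"

    # Aspect-ratio test from the text: real cracks satisfy length/width >= 3.
    is_crack = width > 0 and (length / width) >= min_aspect_ratio

    return {
        "width": width,
        "length": length,
        "angle": angle,
        "center": (cx, cy),
        "orientation": orientation,
        "is_crack": is_crack,
    }


example = analyze_crack(((50.0, 50.0), (100.0, 10.0), 0.0))
```

Normalizing on the raw side lengths before taking the minimum and maximum keeps the angle tied to the long side, which the orientation classes depend on.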
The 30° threshold value represents an effective balance between accuracy and tolerance in practical settings, particularly where cracks may display minor angular variations due to surface defects or perspective distortions.

iv. Convert to real measurements

It is necessary to translate the resultant measurements (in pixels) to millimetres (mm) for informed decision-making in road maintenance. We derive the conversion factor for applying it to the real measurements of


length, width, and area, as demonstrated in the subsequent equations [37]:

$$CF = \frac{\text{Reference width (mm)}}{\text{Reference width (pixel)}} \tag{1}$$

$$\text{length}_{mm} = CF \cdot \text{length}_{pixel} \tag{2}$$

$$\text{width}_{mm} = CF \cdot \text{width}_{pixel} \tag{3}$$

$$\text{Area}_{mm^2} = CF^2 \cdot \text{Area}_{pixel^2} \tag{4}$$

For illustration, with assumed values: if a reference object 100 mm wide spans 400 pixels, then CF = 0.25 mm/pixel, and a crack measuring 1200 × 16 pixels corresponds to 300 mm × 4 mm.

4. Experiments results and analysis
4.1 Implementation details and Dataset Collection

The training utilized the Adam optimizer with a learning rate of 0.0001 and a batch size of 32 for 100 epochs. All tests were conducted with TensorFlow on a Windows 10 computer with an Intel Core i7 running at 3.60 GHz and 16 GB of RAM. In this study, we conducted experiments using two datasets to get more accurate results:

• Set 1: The “SUT-Crack Dataset”, which contains 130 high-resolution images in JPG format, with dimensions of 3024 by 4032 pixels [38]. The images are organized in pairs, meaning that each original image is matched by its corresponding ground truth image, as shown in Figure 6. SUT-Crack is available at [Link]

• Set 2: The “IRD-Crack Dataset”, which represents our local dataset. It comprises asphalt crack images that were collected in cooperation with the directorate of highways and bridges in Diyala governorate. It includes various types of images that present various problems for crack detection, such as shadows and oil stains. The high-quality photos were captured from a fixed height of one meter, directly above the pavement, using a digital camera (Canon RP + 18-135mm) with a resolution of 6240 × 4160. All pictures were captured during morning hours to ensure clarity and similar lighting conditions. The images in this dataset were annotated using the Labelme application. This dataset was prepared specially to reflect the real-world environment on local asphalt roads. Figure 7 shows a sample of these images with their corresponding masks. The images in this dataset present various problems for crack detection, such as oil stains, shadows, and varying lighting conditions. These challenges simulate real-world circumstances and help improve the reliability of automated pavement crack detection methods.

The training, validation, and testing images are from the SUT-Crack dataset, whilst the real-time test images are sourced from the IRD-Crack dataset. During pre-processing, images designated for training, validation, and testing are downsized to 640 × 640 × 3 and 320 × 320 pixels. The limited size of the SUT-Crack dataset may not furnish sufficient training data to attain appropriate outcomes. To tackle this difficulty, diverse strategies are utilized to enhance the dataset and increase the quantity of photographs. Augmentation employs rotation, shifting, shearing, zooming, flipping, and reflection, as shown in Figure 8.



Figure 6. Sample of SUT-Crack dataset of real cracks; (a) Original image; (b) Ground truth image.


Figure 7. Samples of the IRD-Crack Dataset.

[Figure 8 panels: Original, Rotation, Horizontal Flip, Vertical Flip, Zoom, Horizontal shift, Vertical shift]

Figure 8. Results of the Augmentation Operations for SUT-Crack dataset.
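For segmentation data, each geometric augmentation has to be applied identically to the image and to its ground truth mask so that the two stay aligned. The sketch below illustrates this for a few of the operations in Figure 8 using NumPy only. It is an illustration under assumptions, not the authors' pipeline: the names `AUGMENTATIONS` and `augment_pair` and the shift of 10 pixels are hypothetical, the shift wraps around the image border, and shearing and zooming are omitted because they need interpolation.

```python
import numpy as np

# Each operation acts on the first two (row, column) axes, so it works for
# both H x W masks and H x W x C images. The shift amount is illustrative.
AUGMENTATIONS = {
    "horizontal_flip": lambda a: a[:, ::-1],
    "vertical_flip": lambda a: a[::-1, :],
    "rotate_90": lambda a: np.rot90(a, k=1, axes=(0, 1)),
    "horizontal_shift": lambda a: np.roll(a, 10, axis=1),  # wraps around
    "vertical_shift": lambda a: np.roll(a, 10, axis=0),    # wraps around
}


def augment_pair(image, mask, name):
    """Apply the same named augmentation to an image and its mask."""
    op = AUGMENTATIONS[name]
    return np.ascontiguousarray(op(image)), np.ascontiguousarray(op(mask))


# Small synthetic pair: one labelled pixel at row 1, column 2.
demo_image = np.arange(4 * 6 * 3).reshape(4, 6, 3)
demo_mask = np.zeros((4, 6), dtype=np.uint8)
demo_mask[1, 2] = 1
flipped_image, flipped_mask = augment_pair(demo_image, demo_mask, "horizontal_flip")
rotated_image, rotated_mask = augment_pair(demo_image, demo_mask, "rotate_90")
```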

4.2 Evaluation metrics

To statistically assess the experimental results, several performance metrics were analysed, including Accuracy (ACC), Precision (Pr), Recall (Re), Dice Coefficient (DC), mean Average Precision (mAP), and Intersection over Union (IoU). These metrics are defined in Equations (5), (6), (7), (8), (9), and (10), respectively [39].

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$Pr = \frac{TP}{TP + FP} \tag{6}$$

$$Re = \frac{TP}{TP + FN} \tag{7}$$

$$DC = \frac{2TP}{FP + 2TP + FN} \tag{8}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

The average precision (AP) denotes the area below the precision-recall curve, whereas mean average precision (mAP) is the average of the AP values over the different classes:

$$mAP = \frac{\sum AP}{N} = \frac{\sum_{1}^{N} \int_{0}^{1} p(r)\,dr}{N} \tag{9}$$

where N is the number of crack classes, p is the precision (the percentage of all predicted positive samples that are correctly detected), and r is the recall (the percentage of all actual positive samples that are correctly detected).

The Intersection over Union (IoU) is the ratio of the intersection to the union of the predicted mask and the ground truth mask, expressed as:

$$IoU = \frac{A \cap B}{A \cup B} \tag{10}$$

where A and B denote the predicted image mask and the ground truth image mask, respectively.

4.3 Crack detection results

We utilized validation data from the SUT-Crack dataset to assess the efficacy of the proposed crack detection algorithm in the object detection phase. The findings are displayed in Table 4, which reports Precision, Recall, mAP@0.5, and mAP@0.5:0.95. At an IoU threshold of 0.5, the proposed YOLOv10 model attained an


mAP@0.5 of 68.90%. When the IoU threshold was varied from 0.5 to 0.95, it produced an mAP@0.5:0.95 of 54.58%, as seen in Figure 9. Furthermore, Figure 10 and Figure 11 depict the precision-confidence and recall-confidence curves, affirming the model's resilience across different confidence levels. Figure 12 illustrates that YOLOv10 effectively detects cracks, even under adverse situations like noise, illumination fluctuations, and oil stains. YOLOv10 not only delivered high detection accuracy but also enhanced computational efficiency, rendering it particularly suitable for real-time asphalt crack detection systems. This results from its enhanced design, which diminishes processing time and resource use while preserving detection accuracy.

Table 4: Detection results compared with previous studies. "N/R" indicates that the result is not reported.

Author              Method                                 Dataset    Precision  Recall  mAP@0.5  mAP@0.5:0.95  mAP
Deng et al., 2023   YOLOv5 & Attention ResNet              RDD        N/R        N/R     N/R      N/R           91
Li et al., 2024     YOLOv8 & attention mechanism (SimAM)   RDD 2022   N/R        N/R     62.5     36.4          N/R
YOLOv10 (ours)      YOLOv10 & UNET 3+                      SUT-Crack  100        91      68.90    54.58         N/R
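The mAP columns of Table 4 follow from Equation (9). The sketch below approximates the area under a precision-recall curve with the trapezoidal rule and averages the per-class values. It is an illustration only: the function names are hypothetical, the sample points are invented, and detection toolkits often use interpolated variants of this integral, so the paper's exact procedure may differ. The threshold list reflects the common COCO-style convention for mAP@0.5:0.95, which is assumed here.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (trapezoidal rule)."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(ap_values):
    """Mean of per-class (or per-threshold) AP values, as in Equation (9)."""
    return sum(ap_values) / len(ap_values)


# Invented sample points for one class, for illustration only.
ap_example = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
map_example = mean_average_precision([ap_example, 0.625])

# mAP@0.5:0.95 is conventionally the mean over IoU thresholds
# 0.50, 0.55, ..., 0.95 (assumed convention).
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```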

Table 4 shows that the proposed YOLOv10 model attained 100% precision and 91% recall, two metrics that neither prior study reports. Li et al., 2024 [7] reported a lower mAP@0.5 (62.5 versus 68.90) and mAP@0.5:0.95 (36.4 versus 54.58), and Deng et al., 2023 [8] presented merely an aggregate mAP score without a detailed analysis. Our methodology delivers a more thorough and dependable assessment across various thresholds, substantiating the efficacy of YOLOv10 in practical detection contexts.

4.4 Crack Segmentation Results

The loss values indicate the disparity between the predicted outcomes and the actual ground truth values. Lower loss values indicate a higher concordance between the predicted and actual values, implying that the model successfully learned the fundamental patterns present in the training data. Figure 13 illustrates the training and validation loss over 100 epochs. The minimal loss attained during training and validation was 0.16.
These values indicate the model's excellent performance and efficacy and demonstrate its generalizability, as the loss is consistently low and equivalent in both training and validation.
Figure 14 depicts the model's training and validation performance over 100 epochs for the Accuracy, Precision, and Recall metrics. The maximum accuracy, precision, and recall scores achieved during training and validation were 0.9906, 0.974, and 0.999, respectively.


Figure 9. The network's performance during the validation process: (a) at an IoU threshold of 0.5, the computed mAP
(mAP@0.5), and (b) with the IoU threshold varying from 0.5 to 0.95, the computed mAP (mAP@0.5: 0.95).

Figure 10. Results of the Precision-Confidence Curve.

Figure 11. Results of the Recall-Confidence Curve

The results indicate that the model is efficient in making accurate predictions. Moreover, the balance between recall and precision indicates that the model is accurate in identifying cracks and thorough in encompassing every relevant feature, hence minimizing false negatives. This reflects effective performance and robust generalization capability.
The proposed system also assesses the model using the Intersection over Union (IoU) and the Dice coefficient. Our model attained an IoU of 0.956 and a Dice coefficient of 0.977, signifying a substantial correspondence between the predicted and actual segmentations.
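The metrics of Equations (5) to (8) and the IoU of Equation (10) can be computed from pixel-level confusion counts, as sketched below. The function names and the sample counts are hypothetical, and the counts are assumed to be non-degenerate (no zero denominators). For binary masks the IoU of Equation (10) equals TP / (TP + FP + FN), which also gives the identity IoU = DC / (2 - DC) used in the second helper.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Pixel-level metrics from confusion counts (Equations 5-8 and 10)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "dice": 2 * tp / (fp + 2 * tp + fn),
        # |A intersect B| / |A union B| expressed with confusion counts.
        "iou": tp / (tp + fp + fn),
    }


def iou_from_dice(dice):
    """IoU implied by a Dice coefficient: IoU = DC / (2 - DC)."""
    return dice / (2.0 - dice)


# Invented counts, for illustration only.
demo_metrics = segmentation_metrics(tp=90, tn=900, fp=5, fn=5)
```

Under this identity, the reported Dice coefficient of 0.9774 implies an IoU of about 0.956, so the two reported segmentation scores are mutually consistent.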


Figure 12. Results of the crack detection with YOLOv10

Figure 13. Training and validation loss


Figure 14. Training and Validation (a) Accuracy (b) Precision (c) Recall.

Table 5 provides a comparative evaluation of our model's segmentation efficacy relative to past studies. Compared with the study in [7], which attained an IoU of 87.00% and a Dice coefficient of 93.14%, our model markedly enhances both measures, achieving 95.6% IoU and 97.7% Dice. In comparison to [8], which obtained an IoU of 0.7644 (76.44%), our method exhibits a significant enhancement in segmentation accuracy. The results demonstrate the superiority of our model in accuracy and segmentation efficacy, attributable to the integration of the Residual-Attention UNet 3+ architecture, which improves feature extraction and segmentation precision. Integrating residual connections and attention mechanisms enhances crack identification by emphasizing the most relevant features, resulting in superior segmentation. Figure 15 provides samples of the crack segmentation results with Residual-Attention UNet 3+.

4.5 Crack Quantification Results:

Figure 16 shows the proposed system analyzing cracks and converting them into actual measurements (mm) with high accuracy, covering the image testing process and the crack analysis (direction, length, width, and angle). This accuracy makes the system suitable for the early detection of pavement cracks, where accurate and reliable image analysis plays a crucial role in preventing pavement deterioration. It is also a valuable tool for road and bridge maintenance officials, enabling them to make informed decisions based on the image analysis results.


Table 5: Segmentation results compared with previous studies (all values in %). "-" indicates that the result is not available.

Model    Accuracy  Precision  Recall  IoU    Dice
[7]      98.47     -          -       87.00  93.14
[8]      -         86.41      84.97   76.44  -
(Ours)   98.96     96.76      98.74   95.60  97.74

Figure 15. Results of the crack segmentation with Residual-Attention UNet 3+.

Figure 16. Results of the crack quantification.

4.6 Limitations and Challenges

Notwithstanding the promising outcomes attained by the proposed method, several limitations must be recognized. The restricted size of the SUT-Crack dataset may impede the model's generalizability, notwithstanding the augmentation procedures employed. In addition, fluctuations in illumination, shadows, and oil stains within the IRD-Crack dataset may hinder consistent identification, necessitating enhanced robustness strategies in future implementations.

5. Conclusion

In conclusion, we proposed a method for assessing road cracks with high accuracy under complex backgrounds. The integrated framework combines crack detection, segmentation, and quantification based on an image-processing approach with the help of deep learning techniques. Cracks are first detected using YOLOv10 and then fed into the improved Residual-Attention UNet 3+ model for crack segmentation. We proposed a new method for quantification by


introducing an algorithm to search for connected components of cracks, find the crack's optimal contour, and analyze it for real measurements. Consequently, we reached the following conclusions: The proposed technique can accurately detect cracks at the pixel scale. Our method's superiority was assessed using precision, recall, mAP@0.5, and mAP@0.5:0.95 measures, demonstrating greater object detection accuracy than prior studies. The crack detection method attained 100% precision, 91% recall, and 68.90% mAP@0.5. Incorporating an attention gate and residual connection significantly enhances the accuracy of Residual-Attention UNet 3+ for crack segmentation, resulting in an IoU of 95.60% and a Dice coefficient of 97.74% for the segmented cracks. The crack quantification technique can help mitigate pavement damage by analyzing cracks and translating them into precise measurements (mm). These data provide an accurate assessment and characterization of the cracks, hence aiding maintenance teams in executing appropriate maintenance strategies.
In future work, we aim to expand the local dataset to cover a greater diversity and severity of pavement defects, so that patching, erosion, and many other defects can be detected, not just cracks in road pavements. We also aspire to deploy the system through edge computing to detect cracks directly from edge devices such as drones and IoT devices. This can provide real-time crack detection and segmentation from photographs or videos.

References

[1] X. Feng et al., “Pavement crack detection and segmentation method based on improved deep learning fusion model,” Math. Probl. Eng., vol. 2020, no. 1, p. 8515213, 2020.
[2] Z. Li, H. Zhang, Z. Li, and Z. Ren, “Residual-attention UNet++: a nested residual-attention U-net for medical image segmentation,” Appl. Sci., vol. 12, no. 14, p. 7149, 2022.
[3] K. Malek, A. Mohammadkhorasani, and F. Moreu, “Methodology to integrate augmented reality and pattern recognition for crack detection,” Comput. Civ. Infrastruct. Eng., vol. 38, no. 8, pp. 1000–1019, 2023.
[4] M.-V. Pham, Y.-S. Ha, and Y.-T. Kim, “Automatic detection and measurement of ground crack propagation using deep learning networks and an image processing technique,” Measurement, vol. 215, p. 112832, 2023.
[5] K. Sarkar, A. Shiuly, and K. G. Dhal, “Revolutionizing concrete analysis: An in-depth survey of AI-powered insights with image-centric approaches on comprehensive quality control, advanced crack detection and concrete property exploration,” Constr. Build. Mater., vol. 411, p. 134212, 2024.
[6] K. C. Laxman, N. Tabassum, L. Ai, C. Cole, and P. Ziehl, “Automated crack detection and crack depth prediction for reinforced concrete structures using deep learning,” Constr. Build. Mater., vol. 370, p. 130709, 2023.
[7] Y. Li, C. Yin, Y. Lei, J. Zhang, and Y. Yan, “RDD-YOLO: Road Damage Detection Algorithm Based on Improved You Only Look Once Version 8,” Appl. Sci., vol. 14, no. 8, p. 3360, 2024.
[8] L. Deng, A. Zhang, J. Guo, and Y. Liu, “An integrated method for road crack segmentation and surface feature quantification under complex backgrounds,” Remote Sens., vol. 15, no. 6, p. 1530, 2023.
[9] Z. Shu, Z. Yan, and X. Xu, “Pavement Crack Detection Method of Street View Images Based on Deep Learning,” J. Phys. Conf. Ser., vol. 1952, no. 2, 2021, doi: 10.1088/1742-6596/1952/2/022043.
[10] Q. An, X. Chen, X. Du, J. Yang, S. Wu, and Y. Ban, “Semantic Recognition and Location of Cracks by Fusing Cracks Segmentation and Deep Learning,” Complexity, vol. 2021, 2021, doi: 10.1155/2021/3159968.
[11] Z. Zhang, Q. Liu, and Y. Wang, “Road Extraction by Deep Residual U-Net,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 749–753, 2018, doi: 10.1109/LGRS.2018.2802944.
[12] Q. Zhang et al., “Improved U-net network asphalt pavement crack detection method,” PLoS One, vol. 19, no. 5 May, pp. 1–21, 2024, doi: 10.1371/[Link].0300679.
[13] M. He and T. L. Lau, “CrackHAM: A Novel Automatic Crack Detection Network Based on U-Net for Asphalt Pavement,” IEEE Access, vol. 12, no. November 2023, pp. 12655–12666, 2024, doi: 10.1109/ACCESS.2024.3353729.
[14] Z. Zhang, K. Yan, X. Zhang, X. Rong, D. Feng, and S. Yang, “Automated highway pavement crack recognition under complex environment,” Heliyon, vol. 10, no. 4, p. e26142, 2024, doi: 10.1016/[Link].2024.e26142.
[15] M. Hussain, “Yolov5, yolov8 and yolov10: The go-to detectors for real-time vision,” arXiv Prepr. arXiv2407.02988, 2024.
[16] A. J. Yousif and M. H. Al-Jammas, “Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation,” Diyala J. Eng. Sci., pp. 84–93, 2024.
[17] M. M. Islam, M. B. Hossain, M. N. Akhtar, M. A. Moni, and K. F. Hasan, “CNN based on transfer learning models using data augmentation and transformation for detection of concrete crack,” Algorithms, vol. 15, no. 8, p. 287, 2022.
[18] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” J. big data, vol. 6, no. 1, pp. 1–48, 2019.
[19] R. J. Kolaib and J. Waleed, “Crime Activity Detection in Surveillance Videos Based on Developed Deep Learning Approach,” Diyala J. Eng. Sci., pp. 98–114, 2024.
[20] T. Diwan, G. Anirudh, and J. V Tembhurne, “Object detection using YOLO: Challenges, architectural successors, datasets and applications,” Multimed. Tools Appl., vol. 82, no. 6, pp. 9243–9275, 2023.
[21] B. Liu, W. Zhao, and Q. Sun, “Study of object detection based on Faster R-CNN,” in 2017 Chinese automation congress (CAC), IEEE, 2017, pp. 6233–6236.
[22] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding,” in 2009 17th European Signal Processing Conference, IEEE, 2009, pp. 622–626.
[23] C.-Y. Wang and H.-Y. M. Liao, “YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems,” APSIPA Trans. Signal Inf. Process., vol. 13, no. 1, 2024.
[24] M. F. Rashad and Q. I. Ali, “Deploying Android-Based Smart RSUs with YOLOv8 and SAHI for Enhanced Traffic Management,” Diyala J. Eng. Sci., pp. 70–90, 2025.
[25] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, and J. Han, “Yolov10: Real-time end-to-end object detection,” Adv. Neural Inf. Process. Syst., vol. 37, pp. 107984–108011, 2024.
[26] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
[27] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Neural Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention—MICCAI, 2015.
[28] H. Huang et al., “Unet 3+: A full-scale connected unet for medical image segmentation,” in ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2020, pp. 1055–1059.
[29] O. Oktay et al., “Attention u-net: Learning where to look for the pancreas,” arXiv Prepr. arXiv1804.03999, 2018.
[30] J. Naranjo-Alcazar, S. Perez-Castanos, I. Martin-Morato, P. Zuccarello, and M. Cobos, “On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification,” arXiv Prepr. arXiv1906.10891, 2019.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[32] A. M. Hafiz, S. A. Parah, and R. U. A. Bhat, “Attention mechanisms and deep learning for machine vision: A survey of the state of the art,” arXiv Prepr. arXiv2106.07550, 2021.
[33] A. J. Yousif and M. H. Al-Jammas, “A Lightweight Visual Understanding System for Enhanced Assistance to the Visually Impaired Using an Embedded Platform,” Diyala J. Eng. Sci., pp. 146–162, 2024.
[34] T. Yun et al., “Individual tree crown segmentation from airborne LiDAR data using a novel Gaussian filter and energy function minimization-based approach,” Remote Sens. Environ., vol. 256, p. 112307, 2021.
[35] Z. Azouz, B. Honarvar Shakibaei Asli, and M. Khan, “Evolution of crack analysis in structures using image processing technique: A review,” Electronics, vol. 12, no. 18, p. 3862, 2023.
[36] J. Toribio, J.-C. Matos, and B. González, “Aspect ratio evolution in embedded, surface, and corner cracks in finite-thickness plates under tensile fatigue loading,” Appl. Sci., vol. 7, no. 7, p. 746, 2017.
[37] D. Schlicke, E. M. Dorfmann, E. Fehling, and N. V. Tue, “Calculation of maximum crack width for practical design of reinforced concrete,” Civ. Eng. Des., vol. 3, no. 3, pp. 45–61, 2021.
[38] M. Sabouri and A. Sepidbar, “SUT-Crack: A comprehensive dataset for pavement crack detection across all methods,” Data Br., vol. 51, p. 109642, 2023.
[39] M. Lan, D. Yang, S. Zhou, and Y. Ding, “Crack detection based on attention mechanism with YOLOv5,” Eng. Reports, vol. 7, no. 1, p. e12899, 2025.
[40] L. Deng, A. Zhang, J. Guo, and Y. Liu, “An Integrated Method for Road Crack Segmentation and Surface Feature Quantification under Complex Backgrounds,” Remote Sens., vol. 15, no. 6, 2023, doi: 10.3390/rs15061530.
