Article

Advanced Machine Learning Techniques for Predicting Concrete Compressive Strength
Mohammad Saleh Nikoopayan Tak 1,2 , Yanxiao Feng 2, * and Mohamed Mahgoub 2

1 School of Architecture, New Jersey Institute of Technology, Newark, NJ 07102, USA; [email protected]
2 School of Applied Engineering and Technology, New Jersey Institute of Technology, Newark, NJ 07102, USA;
[email protected]
* Correspondence: [email protected]

Abstract: Accurate estimation of concrete compressive strength is very important for the
improvement of mix design, quality assurance, and compliance with engineering
specifications. Most empirical traditional models have failed to capture the complex
relationships inherent within varied constituents of concrete mixes. This paper develops a
machine learning model for compressive strength prediction using mix design variables and
curing age from a “Concrete Compressive Strength Dataset” obtained from the UCI Machine
Learning Repository. After comprehensive data preprocessing and feature engineering,
various regression and classification models were trained and evaluated, including gradient
boosting, random forest, AdaBoost, k-nearest neighbors, linear regression, and neural
networks. The gradient boosting regressor (GBR) achieved the highest predictive accuracy
with an R² value of 0.94. Feature importance analysis showed that the water–cement ratio
and age are the most crucial factors affecting compressive strength. Advanced methods such
as SHapley Additive exPlanations (SHAP) values and partial dependence plots were used
to attain deep insights about feature interaction with a view to enhancing interpretability
and fostering trust in models. Results highlight the potential of machine learning models
to improve concrete mix design with the aim of sustainable construction through the
optimization of material usage and waste reduction. It is recommended that future research
be undertaken with expanding datasets, more features, and richer feature engineering to
enhance predictive power.
Keywords: machine learning models; compressive strength prediction; feature importance
analysis; SHAP values; mix design optimization; sustainable construction

Academic Editors: Darius Bačinskas and Chris Goodier
Received: 23 November 2024
Revised: 16 January 2025
Accepted: 17 January 2025
Published: 21 January 2025

Citation: Nikoopayan Tak, M.S.; Feng, Y.; Mahgoub, M. Advanced Machine Learning Techniques for
Predicting Concrete Compressive Strength. Infrastructures 2025, 10, 26.
https://doi.org/10.3390/infrastructures10020026

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction


Concrete is the most frequently used construction material globally due to its versatility,
durability, and cost-effectiveness [1]. Its mechanical properties, particularly compressive
strength, are critical for ensuring the safety and longevity of structures. Accurate prediction of
concrete’s compressive strength is essential for mix design optimization, quality control, and
compliance with engineering standards [2]. Traditional empirical methods for estimating
compressive strength often involve extensive laboratory testing and simplistic models that may not
capture the complex interactions among the multitude of variables in concrete mixtures. This
complexity has led researchers to explore advanced computational techniques, particularly
machine learning (ML), to model and predict concrete behavior more accurately [3].
In recent years, ML algorithms have gained prominence in civil engineering applications
due to their ability to model nonlinear relationships and handle large datasets. These
algorithms learn patterns from historical data and can make accurate predictions based on
input features, which makes them suitable for predicting the properties of various concrete
types, including those modified with supplementary materials such as fly ash, nano-silica,
recycled aggregates, and other industrial by-products.
Several studies have applied ML models to predict concrete compressive strength with
notable success. Alghrairi et al. [4] developed nine ML models to estimate the compressive
strength of lightweight concrete modified with nanomaterials. Among these, the gradient-
boosted trees (GBT) model outperformed others by achieving a coefficient of determination
(R²) of 0.90 and a root mean square error (RMSE) of 5.286 MPa. The study highlighted that
water content was the most influential factor affecting compressive strength predictions and
emphasized the critical role of the water-to-cement ratio in concrete mix design. Similarly,
Ding et al. [5] investigated ML models to predict the compressive strength of alkali-activated
cementitious materials using solid waste components. They employed six ML algorithms,
including support vector machine (SVM), random forest (RF), radial basis function neural
network (RBF), and long short-term memory network (LSTM). The SVM model achieved the
highest performance with an R² of 0.9054 and a normalized root mean square error of 0.0997.
In addition to the evaluation of prediction accuracy, feature importance analysis using
SHapley Additive exPlanations (SHAP) revealed key influencing factors such as calcium
oxide content, water-to-binder ratio, silicon dioxide content, modulus of water glass, and
aluminum trioxide content. Ekanayake et al. [6] addressed the “black-box” nature of ML
models by employing SHAP to interpret predictions of concrete compressive strength.
Utilizing tree-based algorithms including XGBoost and light gradient boosting machine
(LGBM), they achieved high accuracy with an R-value of 0.98. The SHAP analysis provided
insights into feature importance and confirmed that age and cement content were the most
influential features. This approach demonstrated that ML models could capture complex
relationships among variables and lead to enhanced trust among domain experts.
Despite these advancements, a persistent limitation in the existing literature is the
inadequate exploration of feature interactions and their cumulative impact on model pre-
dictions. Most studies emphasize achieving high predictive accuracy without thoroughly
investigating how input variables interact within the models. For instance, Paudel et al. [7]
compared the performance of non-ensemble and ensemble ML models in predicting the
compressive strength of concrete containing fly ash. The study identified age, cement
content, and water content as the most influential features but lacked a comprehensive
analysis of feature interactions. Similarly, Song et al. [8] employed ML algorithms, includ-
ing gene expression programming (GEP), artificial neural network (ANN), decision tree
(DT), and bagging regressor, to predict the compressive strength of concrete with fly ash
admixture. While the study confirmed that the selection of input parameters and regressors
significantly affects the accuracy of predicted outcomes, it did not extensively explore
feature interactions. Tran et al. [9] evaluated the compressive strength of concrete made
with recycled concrete aggregates using six ML models. The GB_PSO model achieved the
highest prediction accuracy with an R² of 0.9356. Feature importance analysis revealed that
cement content and water content were the most important factors affecting compressive
strength. However, the study primarily focused on individual feature importance rather
than the interactions between variables. Ahmad et al. [10] compared supervised ML al-
gorithms, including ANN, AdaBoost, and boosting, to predict the compressive strength
of geopolymer concrete containing high-calcium fly ash. This study demonstrated the
potential of ensemble methods in capturing complex patterns in data, which can lead to
more accurate predictions. Nevertheless, it did not explore the interactions among input
features. Anjum et al. [11] applied ensemble ML methods, including gradient boosting,
RF, bagging regressor, and AdaBoost regressor, to estimate the compressive strength of
fiber-reinforced nano-silica modified concrete. SHAP analysis revealed that the coarse
aggregate to fine aggregate ratio had a stronger negative correlation with compressive
strength, while specimen age positively affected it. The study highlighted the importance
of considering the interaction and effects of input parameters but did not provide a detailed
feature interaction analysis. Ullah et al. [12] predicted the compressive strength of sustain-
able foam concrete using individual and ensemble ML approaches, including SVM, RF,
bagging, boosting, and a modified ensemble learner. The study suggested that ensemble
learners significantly enhance the performance and robustness of ML models but did not
explore feature interactions in depth. Moreover, Kumar and Pratap [13] investigated the use
of ML models to predict the compressive strength of high-strength concrete and focused on
the influence of superplasticizer, sand, and water content. The study acknowledged the sig-
nificant influence of superplasticizer on compressive strength but lacked a comprehensive
analysis of feature interactions. Nguyen et al. [14] proposed a machine learning approach
using multivariate polynomial regression and automated feature engineering to predict
the compressive strength of ultra-high-performance concrete (UHPC). While this study
provided insights into feature interactions, it was specific to UHPC and did not address
broader concrete types.
These studies collectively demonstrate that while ML models can achieve high accuracy
in predicting concrete compressive strength, they often lack interpretability due to insufficient
analysis of feature interactions. Most focus on individual feature importance without exploring
how variables interact within the model to influence predictions. This limitation hinders the
practical application of ML models in concrete mix design optimization, as understanding the
synergistic effects among key variables is crucial. To address this gap, there is a pressing need
for research that not only leverages advanced ML models for predicting concrete properties
but also provides a thorough analysis of feature interactions and their collective impact on
model predictions. Such an approach would enhance the interpretability of the models, allow
for more informed decision-making in mix design optimization, and promote the development
of high-performance, durable, and sustainable concrete materials.
Recent research has also begun integrating advanced predictive modeling with sus-
tainability considerations. For example, Ref. [15] developed an ANN-based approach
for recycled aggregate concrete, offering high-accuracy compressive strength predictions
and practical closed-form solutions. In a related study, Ref. [16] examined ultra-high-
performance lightweight concrete incorporating rice husk ash, applying life cycle assess-
ment (LCA) to evaluate the environmental performance alongside compressive strength.
Similarly, Ref. [17] employed multiple AI and optimization techniques to investigate inter-
actions between fly ash content, mechanical properties, and environmental impact, thereby
informing multi-objective optimization of sustainable concrete mixes. These contributions
underscore a growing emphasis on not only predicting performance but also considering
environmental implications. Nevertheless, even with these advancements, a persistent gap
remains in the literature: the need for a more thorough exploration of feature interactions
and their collective influence on model predictions. Addressing this gap is crucial for both
interpretability and practical utility in concrete mix design.
Unlike prior work that predominantly focuses on predictive accuracy, our approach not
only aims to achieve high accuracy but also provides in-depth interpretability by examining
feature interactions using SHAP and partial dependence plots. This dual focus on accuracy
and interpretability represents a key advancement over current methodologies to enable
more informed decision-making in concrete mix design. This study aims to fulfill this need
by developing machine learning models capable of predicting the compressive strength of
various concrete types, including diverse input variables related to mix composition. By
employing advanced feature importance analysis methods such as SHAP and interaction
effects such as partial dependence plots, we investigate the interactions among these input
variables and their collective impact on compressive strength predictions. Additionally, we
classify concrete samples into predefined strength categories closely aligned with industry
standards and thresholds defined by the American Concrete Institute (ACI) [18] to make
our models more applicable for industry uses that may require knowledge of the concrete
class rather than the exact strength value.
This study is guided by several key research questions. First, it aims to explore how
effectively machine learning models can predict concrete compressive strength using mix
design parameters and curing age, while also examining how input variables interact
within these models to influence the predictions. Additionally, the study investigates
whether advanced feature importance analysis techniques, such as SHAP values, can
enhance the interpretability of machine learning models in concrete strength prediction
by revealing feature interactions and their impact on model outputs. Finally, the research
seeks to determine how accurately machine learning models can classify concrete samples
into predefined strength categories.
To answer these questions, the research follows a multi-step process that includes compre-
hensive data preprocessing to address missing values, outliers, and inconsistencies, followed
by exploratory data analysis (EDA) to uncover patterns and relationships within the data.
Feature selection techniques are employed to identify the most relevant variables affecting
concrete strength to enhance model performance and interpretability. A range of machine
learning algorithms, including regression models and classification models for strength cate-
gorization, are trained and evaluated using performance metrics such as accuracy and mean
squared error. By integrating advanced feature interaction analysis into ML models for con-
crete strength prediction, this study contributes to the advancement of data-driven approaches
in concrete technology. The findings are expected to provide valuable insights for optimizing
mix designs and ensuring quality control in the construction industry.

2. Materials and Methods


This study employed a comprehensive methodology to analyze and predict the com-
pressive strength of concrete using various machine learning models. The research process,
as illustrated in Figure 1, involved data collection, data preprocessing, exploratory data
analysis, feature engineering, and the development and evaluation of multiple regression
and classification models. The aim was to identify the most effective predictive models and
understand the underlying factors influencing concrete strength through the application of
machine learning techniques and feature interaction analysis.

Figure 1. Framework for modeling analysis of concrete compressive strength.

2.1. Data Collection and Description


The study utilized the “Concrete Compressive Strength Dataset” from the UCI Ma-
chine Learning Repository, generously provided by Prof. I-Cheng Yeh [19]. The dataset
comprises 1030 observations (rows) and nine variables (columns), where each row represents
one concrete sample and its mix design. The eight input features are the quantities of the
concrete components, measured in kilograms per cubic meter (kg/m³), and the age of the
concrete in days. The target variable is the concrete compressive strength, measured in
megapascals (MPa). All nine variables are described in Table 1.

Table 1. Variable description in the dataset.

Variable               Type           Unit     Description
cement                 quantitative   kg/m³    Input
blast furnace slag     quantitative   kg/m³    Input
fly ash                quantitative   kg/m³    Input
water                  quantitative   kg/m³    Input
superplasticizer       quantitative   kg/m³    Input
coarse aggregate       quantitative   kg/m³    Input
fine aggregate         quantitative   kg/m³    Input
age                    quantitative   days     Input
compressive strength   quantitative   MPa      Output

2.2. Data Preprocessing


The data preprocessing phase was critical to ensure data quality and prepare the
dataset for modeling. It involved data cleaning, exploratory data analysis, handling of
outliers, feature engineering, and data scaling.

2.2.1. Data Cleaning


The dataset was initially inspected for missing values, duplicates, and inconsistencies.
Using Python 3.10.13 with the pandas library [20], it was confirmed that there were no miss-
ing values in any of the variables. Duplicate entries were identified using the duplicated()
function, which revealed 25 duplicate rows. These duplicates were removed to ensure
data quality, reducing the dataset to 1005 unique observations. Additionally, a preliminary
analysis, as shown in Figure 2, indicated the existence of outliers in the dataset. To mitigate
the potential impact of these outliers on model performance, they were identified and
removed using the interquartile range (IQR) method [21]. The IQR was calculated as the
difference between the 75th (Q3) and 25th (Q1) percentiles, and any data points lying
below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were considered outliers. Significant outliers
were found in variables such as age. After outlier removal, the final dataset
consisted of 911 observations.
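
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `remove_duplicates_and_iqr_outliers`, the toy data, and the choice to compute the IQR fences once for every numeric column after deduplication are all assumptions, since the text does not state which columns were filtered or in what order.

```python
import pandas as pd


def remove_duplicates_and_iqr_outliers(df, columns=None, k=1.5):
    """Drop exact duplicate rows, then drop rows outside the IQR fences."""
    # Step 1: keep the first copy of each duplicated row.
    deduped = df.loc[~df.duplicated()].reset_index(drop=True)

    # Assumption: fences are computed once per numeric column, after deduplication.
    cols = list(columns) if columns is not None else list(
        deduped.select_dtypes(include="number").columns
    )
    q1 = deduped[cols].quantile(0.25)
    q3 = deduped[cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # Step 2: keep a row only if every checked column lies within its fences.
    inside = ((deduped[cols] >= lower) & (deduped[cols] <= upper)).all(axis=1)
    return deduped.loc[inside].reset_index(drop=True)


# Toy data (not from the UCI dataset): one duplicated row and one extreme age.
toy = pd.DataFrame({
    "age": [7, 7, 14, 28, 28, 28, 56, 365],
    "strength": [20.0, 20.0, 25.0, 30.0, 31.0, 32.0, 35.0, 40.0],
})
cleaned = remove_duplicates_and_iqr_outliers(toy)
print(len(toy), "->", len(cleaned))
```

Filtering on all columns at once is only one possible reading; applying the fences column by column in sequence, or to selected variables only, would generally leave a different number of rows.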

2.2.2. Exploratory Data Analysis


Exploratory data analysis was performed to assess the distribution and characteristics
of the variables related to concrete strength. We followed the EDA practices outlined
by [22], which emphasize visualizing data distributions using histograms and frequency
plots to uncover potential skewness or anomalies. Histograms and frequency plots were
generated using Matplotlib [23] to visualize these distributions distinctly (see Figure 3). The
analysis revealed a wide range of cement content with a peak of around 160 kg/m³, which
suggests variability in mix designs used across different concrete samples. The distribution
of water content was mostly centralized around 190 kg/m³, which indicates a common
standard in water usage for these concrete mixtures. Most samples contained low amounts
of blast furnace slag, with a significant peak at 0 kg/m³, which highlights its optional use in
the mixtures. The majority of the data points were clustered at low superplasticizer content,
with a significant number of observations showing zero usage, emphasizing its selective
application depending on specific mix requirements. There was a significant spike in age
at 28 days, which is commonly recognized as a standard curing time for testing concrete
strength [24], although other ages were also represented to a lesser extent. The strength of
concrete showed a normal distribution with a mean of around 35 MPa and illustrates the
common range of strength encountered in typical concrete applications. This exploratory
analysis provided a foundation for understanding the key characteristics of the dataset,
which inform the subsequent predictive modeling efforts.

Figure 2. Distribution of concrete mix components and compressive strength.

Figure 3. Concrete mix design attributes and their relationship with compressive strength. Red lines
represent smoothed density curves for each histogram.

2.2.3. Correlation Analysis and Preparation of Predictor Variables


The Pearson correlation coefficient was calculated using Pandas [25] to identify the
relationships between the input features and the target variable, compressive strength. A
correlation matrix was visualized using the heatmap function from the Seaborn library [26]
to illustrate these relationships (see Figure 4). The correlation matrix reveals a moderate
positive correlation between cement content and compressive strength. This correlation
indicates that increases in cement content are associated with increases in compressive
strength, although the relationship is not exceptionally strong. Blast furnace slag and fly
ash show moderate negative correlations with cement content. These findings suggest
their use as partial cement replacements and imply that mixes with higher quantities of
blast furnace slag and fly ash tend to have lower cement content. The data also reveal
a strong negative correlation between water content and superplasticizer usage. This
correlation emphasizes the role of superplasticizers in reducing water demand to maintain
workability, thereby enhancing the concrete’s performance and durability. Moreover,
a moderate positive correlation exists between superplasticizer usage and compressive
strength. Interestingly, both coarse and fine aggregates display weak negative correlations
with compressive strength, with R-values of −0.15 and −0.18, respectively. Finally, concrete
age shows a moderate positive correlation with compressive strength, indicated by an
R-value of 0.52. This relationship highlights the importance of the curing process, as the
ongoing chemical reactions during this time enhance the concrete’s structural integrity and
compressive capabilities.
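
As a rough illustration of this step, the sketch below computes a Pearson correlation matrix with pandas and ranks the features by their correlation with the target. The helper names `correlation_with_target` and `plot_heatmap`, the column names, and the toy values are assumptions made for the example; the plotting options are one plausible choice, not the settings used for Figure 4.

```python
import pandas as pd


def correlation_with_target(df, target):
    """Return the Pearson correlation matrix and the target's correlations, sorted."""
    corr = df.corr(method="pearson")
    by_target = corr[target].drop(target).sort_values(ascending=False)
    return corr, by_target


def plot_heatmap(corr):
    """Draw the matrix with seaborn (imported lazily so the numbers work without it)."""
    import seaborn as sns
    return sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)


# Toy data with exact linear relationships, chosen only to make the signs obvious.
toy = pd.DataFrame({
    "cement": [100.0, 200.0, 300.0, 400.0],
    "water": [200.0, 190.0, 180.0, 170.0],
    "strength": [10.0, 20.0, 30.0, 40.0],
})
corr, by_target = correlation_with_target(toy, "strength")
print(by_target)
```

Pearson's coefficient captures only linear association, so a weak value here does not rule out a strong nonlinear effect.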

Figure 4. Correlation matrix between the input features and the target variable.

2.2.4. Feature Engineering and Multicollinearity Analysis


Multicollinearity among predictor variables can negatively impact the stability and
interpretability of regression models by inflating the variance of coefficient estimates [27].
To quantify the degree of multicollinearity among the predictor variables, the variance
inflation factor (VIF) was calculated using the variance_inflation_factor() function from
statsmodels.stats.outliers_influence in Python. The VIF for each feature is computed as
VIF = 1/(1 − R²), where R² is obtained by regressing that feature against all other features.
The initial VIF analysis, presented in Figure 5a, revealed significant multicollinearity issues.
Notably, the VIF values for water, coarse aggregate, fine aggregate, and cement were
exceptionally high, with water exhibiting a VIF of 95.27, coarse aggregate at 84.71, fine
aggregate at 76.82, and cement at 14.15. Such high VIF scores indicate that these variables
are highly correlated with other predictors, which can destabilize regression models and
obscure the true relationships between variables and the target outcome.
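
To make the formula VIF = 1/(1 − R²) concrete, the following sketch computes it from scratch with NumPy instead of calling statsmodels. It is an illustration under stated assumptions: the function name `vif` and the toy matrices are hypothetical, and each auxiliary regression here includes an intercept. The statsmodels function fits the auxiliary regression on the design matrix exactly as supplied, so its values can differ from these unless a constant column is included.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X, via an auxiliary regression."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Two uncorrelated columns versus two nearly collinear columns (toy values).
independent = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
collinear = np.array([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]])
print(vif(independent), vif(collinear))
```

A VIF of 1 means a feature is not linearly explained by the others at all; values grow without bound as R² approaches 1.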
To mitigate multicollinearity and enhance the predictive power of the models, feature
engineering was employed based on domain knowledge in concrete technology [28,29].
Two new features were created: the water–cement ratio (W/C ratio) and the coarse
aggregate–fine aggregate ratio (C/F ratio). The W/C ratio was calculated by dividing
the water content by the cement content. This ratio is a critical factor influencing concrete
strength, as it affects the hydration process and the microstructure of the hardened concrete.
A lower W/C ratio generally leads to higher strength and durability. The C/F ratio was
determined by dividing the coarse aggregate content by the fine aggregate content. This
ratio impacts the workability, compaction, and overall strength of concrete by influencing
the particle packing and void content within the mix [30].

$$\text{W/C Ratio} = \frac{\text{Water}}{\text{Cement}} \tag{1}$$

$$\text{C/F Ratio} = \frac{\text{Coarse Aggregate}}{\text{Fine Aggregate}} \tag{2}$$

By transforming the original highly correlated variables into ratios, the absolute
quantities, previously exhibiting high multicollinearity, were converted into relative
measures that capture the essential proportional relationships in the concrete mix. This
approach reduced redundancy among predictors while retaining the critical information
necessary for accurate strength prediction. After feature engineering, the VIF was
recalculated for the updated set of features. The results, shown in Figure 5b, indicated a
substantial reduction in multicollinearity across the dataset. The VIF values for the newly
engineered features were significantly lower, with the water–cement ratio at 10.24 and the
coarse aggregate–fine aggregate ratio at 7.98. While these values are still above the
commonly accepted threshold of 5, they represent a marked improvement from the initial VIF
scores. These features were retained due to their significant practical importance and
contribution to the predictive capability of the models. Other features also exhibited
acceptable VIF values, all below the threshold of 5.

Figure 5. VIF results for input feature selection: (a) all initial features, (b) revised feature set.
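
The effect of the ratio transformation on multicollinearity can be illustrated with a minimal sketch on synthetic data. The mix quantities, their correlations, and the helper name `vif` are assumptions made for illustration, not the study's data or code. The VIF is computed from its textbook definition, 1/(1 − R²), with an intercept, which may differ from the routine used in the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # intercept + remaining features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()                # R^2 of column j on the others
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
n = 500
# Synthetic, hypothetical mix quantities (kg/m^3); not the paper's dataset.
cement = rng.uniform(150, 500, n)
water = 0.45 * cement + rng.normal(0, 5, n)    # tied to cement -> collinear
coarse = rng.uniform(800, 1100, n)
fine = 0.8 * coarse + rng.normal(0, 10, n)     # tied to coarse -> collinear

raw = np.column_stack([cement, water, coarse, fine])
ratios = np.column_stack([water / cement, coarse / fine])  # Equations (1) and (2)

vif_raw = vif(raw)
vif_ratios = vif(ratios)
print("VIF, raw quantities:", vif_raw.round(1))
print("VIF, ratio features:", vif_ratios.round(2))
```

In this toy setting the raw quantities show very large VIF values while the two ratios do not, which mirrors the direction of the change reported in Figure 5, though not its magnitudes.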

2.2.5. Data Scaling


Machine learning algorithms, especially those involving gradient descent optimization,
can be sensitive to the scale of the input features. To ensure all features contribute equally
to the model training and to improve convergence, data scaling was performed using
min–max normalization [31]. The MinMaxScaler from scikit-learn’s preprocessing module
was applied to rescale all features to a range between 0 and 1.
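
A minimal sketch of the scaling step follows. The small arrays are made-up values, and fitting the scaler on the training rows only before transforming the test rows is an assumption; the text does not state at which point the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [water-cement ratio, age in days]
X_train = np.array([[0.30, 1.0], [0.77, 28.0], [1.88, 120.0]])
X_test = np.array([[0.50, 7.0]])

scaler = MinMaxScaler()                         # default feature_range=(0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learn per-feature min and max
X_test_scaled = scaler.transform(X_test)        # reuse the training min and max

print(X_train_scaled)
print(X_test_scaled)
```

Each feature is mapped as (x − min) / (max − min), so test values outside the training range would fall outside [0, 1].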

2.2.6. Discretization of the Target Variable for Classification


Before applying classification techniques, the continuous target variable (compressive
strength) was converted into categorical classes based on scales aligning closely with com-
mon practices in the construction industry [15]. The categories and their corresponding
count are shown in Table 2. By assigning each concrete sample to one of these categories,
the continuous numeric target values were transformed into discrete labels suitable for
classification algorithms. This approach ensured that classifiers could effectively dis-
tinguish among these defined strength categories rather than attempting to predict a
continuous value.

Table 2. Concrete compressive strength categories.

Strength Classification Threshold (MPa) Count


very high strength ≥60 62
high strength [41, 59.99] 215
normal strength [30, 40.99] 250
weak [20, 29.99] 190
very weak <20 194
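
The binning in Table 2 can be sketched as follows. The left-closed edges (20, 30, 41, 60) are an assumption read off the table; under this choice a value between 40.99 and 41 MPa would fall into "normal strength", a case the table leaves unspecified. The function name `strength_class` is hypothetical.

```python
import numpy as np

LABELS = ["very weak", "weak", "normal strength", "high strength", "very high strength"]
EDGES = [20.0, 30.0, 41.0, 60.0]  # assumed lower bounds of the upper four classes (MPa)

def strength_class(mpa):
    """Map compressive strength values (MPa) to the category labels of Table 2."""
    idx = np.digitize(mpa, EDGES)  # 0 for <20, 1 for [20, 30), ..., 4 for >=60
    return [LABELS[i] for i in np.atleast_1d(idx)]

classes = strength_class(np.array([12.5, 20.0, 35.2, 45.0, 60.0, 82.6]))
print(classes)
```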

2.3. Model Development and Evaluation


The core of the methodology involved developing and evaluating machine learning models
for two tasks: regression, to predict the compressive strength of concrete, and
classification, to assign concrete samples to the predefined strength categories. The
models considered are shown in Table 3. The dataset was split into training and testing
sets using an 80–20 split with the train_test_split function from the scikit-learn
library [32].

Table 3. Overview of machine learning models evaluated.

| Regression Models | Classification Models |
|---|---|
| linear regression | RF classifier |
| k-nearest neighbors (KNN) regression | logistic regression |
| decision tree regression | SVM |
| RF regression | k-nearest neighbors (KNN) classifier |
| gradient boosting regression | bagging classifier |
| AdaBoost regression | |
| neural network | |

An 80–20% training–testing split was selected to align with common machine learning
practices for robust evaluation [33]. To ensure that the training and testing subsets share
similar statistical characteristics, we first divided the target variable in the dataset into
ten quantile-based bins (num_bins = 10) and then performed a stratified split. After this
procedure, we computed descriptive statistics—record count, minimum, maximum, range,
mean, variance, and standard deviation—for each numeric feature. As presented in Table 4,
the training and testing sets exhibited very similar statistics. Additionally, Kolmogorov–
Smirnov tests [34] for each feature yielded high p-values (all > 0.05), which indicated no
statistically significant differences between the distributions of the two subsets. These
results suggest that the testing set is representative of the training set, which supports
the reliability of the performance metrics derived from the test set. The
models were then trained on the training set and evaluated on the testing set.
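
The stratified split and the distribution check described above can be sketched as follows. The data frame is synthetic, and the column names and the strength formula are hypothetical; only `num_bins = 10`, the 80–20 ratio, and the use of the Kolmogorov–Smirnov test come from the text.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 911  # size of the cleaned dataset reported in the paper
df = pd.DataFrame({
    "Age": rng.integers(1, 121, n),
    "Water_Cement_Ratio": rng.uniform(0.3, 1.9, n),
})
# Hypothetical target, used only to have something to stratify on.
df["strength"] = 60 - 25 * df["Water_Cement_Ratio"] + 0.1 * df["Age"] + rng.normal(0, 3, n)

num_bins = 10
bins = pd.qcut(df["strength"], q=num_bins, labels=False, duplicates="drop")

# Stratify on the quantile bin so both subsets cover the full strength range.
train, test = train_test_split(df, test_size=0.2, stratify=bins, random_state=42)

# Two-sample KS test per feature: a high p-value means no detectable difference.
ks_pvalues = {
    col: ks_2samp(train[col], test[col]).pvalue
    for col in ["Age", "Water_Cement_Ratio"]
}
print(len(train), len(test), ks_pvalues)
```

With 911 rows and `test_size=0.2`, scikit-learn assigns 183 rows to the test set and 728 to the training set, matching the counts in Table 4.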

Table 4. Descriptive statistics of features for training and testing sets.

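| Set | Feature | Number | Min | Max | Range | Mean | Variance | Std Dev |
|---|---|---|---|---|---|---|---|---|
| Training Set | Blast Furnace Slag | 728 | 0 | 342.1 | 342.1 | 71.75 | 7453.08 | 86.33 |
| Training Set | Fly Ash | 728 | 0 | 200.1 | 200.1 | 59.92 | 4102.09 | 64.05 |
| Training Set | Superplasticizer | 728 | 0 | 22 | 22 | 6.06 | 27.27 | 5.22 |
| Training Set | Age | 728 | 1 | 120 | 119 | 31.86 | 792.56 | 28.15 |
| Training Set | Water_Cement_Ratio | 728 | 0.3 | 1.88 | 1.58 | 0.77 | 0.1 | 0.31 |
| Training Set | Coarse_Fine_Ratio | 728 | 0.92 | 1.87 | 0.95 | 1.28 | 0.03 | 0.18 |
| Testing Set | Blast Furnace Slag | 183 | 0 | 305.3 | 305.3 | 70.21 | 7331.27 | 85.62 |
| Testing Set | Fly Ash | 183 | 0 | 195 | 195 | 59.97 | 4437.47 | 66.61 |
| Testing Set | Superplasticizer | 183 | 0 | 22.1 | 22.1 | 5.88 | 28.08 | 5.3 |
| Testing Set | Age | 183 | 3 | 120 | 117 | 33.15 | 871.41 | 29.52 |
| Testing Set | Water_Cement_Ratio | 183 | 0.28 | 1.66 | 1.38 | 0.76 | 0.09 | 0.31 |
| Testing Set | Coarse_Fine_Ratio | 183 | 0.94 | 1.84 | 0.89 | 1.26 | 0.03 | 0.17 |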

2.3.1. Regression and Classification Models


Multiple regression models were developed to predict concrete compressive strength,
using a range of techniques to capture both linear and non-linear relationships within the
data. These models are shown below.
• Linear regression: This serves as a baseline model to establish a benchmark and
assess the extent of linear relationships between the features and the target variable
(compressive strength).
• Decision tree regression: This model is employed to capture non-linear relationships
by partitioning the data based on feature thresholds, effectively creating a tree-like
structure of decisions to arrive at a prediction.
• RF regression: This ensemble method combines multiple decision trees to improve
predictive accuracy and mitigate overfitting, by leveraging the wisdom of the crowd
for a more robust prediction.
• Gradient boosting regression: This technique builds models sequentially, with each
subsequent model correcting errors made by previous ones. This iterative approach
enhances performance, particularly on complex datasets with intricate patterns.
• AdaBoost regression: Similar to gradient boosting, AdaBoost focuses on instances
where prior models struggled and adjusts weights accordingly to improve prediction
accuracy on challenging data points.
• KNN regression: This model predicts target values based on the average of the
nearest neighbors in the feature space and leverages the similarity between data points
for prediction.
• Neural network model: A neural network model was implemented using Tensor-
Flow [35] and Keras [36] to capture complex, non-linear relationships within the data.
The architecture comprises an input layer, hidden layers, and an output layer.
For the classification task, five classification models were developed and evaluated:
an RF classifier, logistic regression, SVM, a KNN classifier, and a bagging classifier with
decision trees. Each classifier underwent hyperparameter tuning to optimize performance.

Bayesian optimization (BayesSearchCV from skopt) was employed for all models; it
iteratively refines the hyperparameter search based on the results of previously evaluated
configurations [37,38].
The hyperparameters considered for the regression and classification models are
detailed in Table 5. Additional details on the default values, tuned values, and optimization
processes employed are provided in Section 3.

Table 5. Hyperparameters considered for regression and classification models in this study.

| Regression Model | Hyperparameters Considered | Classification Model | Hyperparameters Considered |
|---|---|---|---|
| Linear Regression | None (used ordinary least squares) | Logistic Regression | penalty (l1, l2), C (regularization strength), solver (saga) |
| K-Nearest Neighbors | n_neighbors, metric, weights | Support Vector Machine | C (regularization), gamma (kernel coefficient), kernel (linear, rbf, poly, sigmoid), degree (if kernel = poly) |
| Decision Tree Regressor | max_depth, min_samples_split, min_samples_leaf | k-Nearest Neighbors | n_neighbors, weights (uniform, distance), p (distance metric: 1 = Manhattan, 2 = Euclidean) |
| Random Forest Regressor | n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features | Random Forest Classifier | n_estimators, max_depth, min_samples_split, max_features |
| Gradient Boosting Regressor | n_estimators, learning_rate, max_depth, subsample, min_samples_split | Bagging Classifier (with DT) | n_estimators, max_samples, max_features, bootstrap, bootstrap_features, estimator__max_depth, estimator__criterion (for DecisionTreeClassifier) |
| AdaBoost Regressor | n_estimators, learning_rate, base_estimator (DT max_depth) | | |
| Neural Network (MLP) | Number of layers, units per layer, activation, dropout rate, batch size, epochs, optimizer, learning_rate, L2 regularization | | |

2.3.2. Model Evaluation Metrics


The performance of the developed models was assessed using appropriate evaluation
metrics for both regression and classification tasks. For regression models, the mean
squared error (MSE) and the coefficient of determination, known as R2 , were employed to
quantify the accuracy of the predictions. The MSE measures the average squared difference
between the predicted values (ŷi ) and the actual observed values (yi ). It is defined by
Equation (3). The R2 metric represents the proportion of variance in the dependent variable
that is predictable from the independent variables. It is calculated using Equation (4).

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{3}$$

where n is the number of observations. A lower MSE indicates that the model’s predictions
are closer to the actual values, which signifies better predictive accuracy.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}$$

where $\bar{y}$ is the mean of the observed data. An R² value closer to 1 indicates that a higher
proportion of variance is explained by the model, and it reflects a better fit.
Additionally, to provide a more comprehensive and intuitive visual comparison of
the regression models’ performance, a Taylor diagram was employed. The Taylor diagram
plots correlation (with the observed values), the ratio of the standard deviation of the
model predictions to that of the observations, and the centered RMS error, all on a single
polar coordinate plot [39]. This approach allows simultaneous evaluation of how well each
model’s variability and pattern of predictions match the observed data.
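
The regression metrics above, together with the three statistics a Taylor diagram displays, can be computed as in the following sketch. The function name `regression_summary` and the five sample values are made up for illustration.

```python
import numpy as np

def regression_summary(y_true, y_pred):
    """MSE, R^2, and the three Taylor-diagram statistics for one model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)                   # Equation (3)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                              # Equation (4)
    corr = np.corrcoef(y_true, y_pred)[0, 1]                # angular coordinate
    std_ratio = y_pred.std() / y_true.std()                 # radial coordinate
    # Centered RMS error: RMS difference after removing each series' mean.
    crms = np.sqrt(np.mean(((y_pred - y_pred.mean()) - (y_true - y_true.mean())) ** 2))
    return {"mse": mse, "r2": r2, "corr": corr, "std_ratio": std_ratio, "crms": crms}

y_true = [30.0, 42.5, 18.0, 55.0, 61.2]   # hypothetical observed strengths (MPa)
y_pred = [28.5, 44.0, 20.5, 52.0, 63.0]   # hypothetical predictions (MPa)
summary = regression_summary(y_true, y_pred)
print(summary)
```

The three Taylor statistics are linked by the law of cosines, crms² = σ_pred² + σ_obs² − 2 σ_pred σ_obs · corr, which is why they can share one polar plot.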
For classification models, accuracy was calculated to determine the overall effec-
tiveness of the model in correctly predicting the class labels. It is given by Equation (5).
However, in datasets with class imbalances, accuracy can be misleading because it may
be biased towards the majority class. To address this, balanced accuracy was used, which
adjusts for imbalanced classes by averaging the recall (sensitivity) obtained for each class.
It is defined by Equation (6).

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \tag{5}$$

$$\text{Balanced Accuracy} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FN_k} \tag{6}$$

where K is the number of classes, TPk is the number of true positives for class k, and FNk is
the number of false negatives for class k.
To gain deeper insights into the model’s performance on individual classes, precision,
recall, and F1-score [40] were calculated for each class. Precision measures the proportion of
correct positive predictions among all positive predictions, defined in Equation (7). Recall,
also known as sensitivity, assesses the model’s ability to correctly identify all positive
instances (see Equation (8)). The F1-score, as defined in Equation (9), is the harmonic mean
of precision and recall, which provides a single metric that balances both concerns.

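$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

where TP is the number of true positives, and FP is the number of false positives.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$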
In multiclass classification settings with imbalanced classes, evaluating overall model
performance requires aggregating these per-class metrics. To account for the varying
number of instances in each class, weighted average precision, weighted average recall,
and weighted average F1-score were calculated. These metrics are computed by weighting
the per-class metrics by the number of true instances in each class to ensure classes with
more samples have a proportionally greater impact on the overall score.
The weighted average precision is calculated as follows:

$$\text{Weighted Precision} = \frac{\sum_{k=1}^{K} n_k \times \text{Precision}_k}{\sum_{k=1}^{K} n_k} \tag{10}$$

where $n_k$ is the number of true instances in class k. Similarly, the weighted accuracy,
weighted average recall, and weighted average F1-score were calculated.

The use of balanced accuracy and weighted metrics is particularly important in the
presence of class imbalance, which was evident in our dataset (see Table 2). Certain
strength categories had significantly more samples than others, which could bias the
model’s performance towards those classes. The confusion matrix was also utilized to
visualize the performance of the classification models by displaying the counts of true
positive, true negative, false positive, and false negative predictions for each class. This
matrix allowed for a detailed error analysis by highlighting specific areas where the model
was misclassifying observations.
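
The classification metrics in Equations (5) to (10) can be obtained from scikit-learn as in the sketch below. The twelve labels are invented for illustration, with classes encoded as integers 0 to 4.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, precision_recall_fscore_support)

# Hypothetical labels: 0 = very weak, ..., 4 = very high strength
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4]
y_pred = [0, 0, 1, 1, 1, 2, 2, 1, 2, 3, 2, 4]

accuracy = accuracy_score(y_true, y_pred)                  # Equation (5)
balanced = balanced_accuracy_score(y_true, y_pred)         # Equation (6)
w_precision, w_recall, w_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)   # Equation (10) and analogues

cm = confusion_matrix(y_true, y_pred)                      # rows: true, columns: predicted
recall_per_class = cm.diagonal() / cm.sum(axis=1)          # TP_k / (TP_k + FN_k)

print(accuracy, balanced, w_precision, w_recall, w_f1)
print(cm)
```

Because the weights are the true class counts, the weighted average recall coincides with plain accuracy, whereas balanced accuracy gives every class the same weight regardless of its size.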
To optimize model performance and ensure robust hyperparameter selection, Bayesian
optimization was conducted using 5-fold cross-validation. This involved partitioning the
training dataset into five equal subsets, training the model on four subsets, and evaluating
its performance on the remaining subset. By averaging the performance across folds, this
approach provides a more reliable estimate of the model’s generalization ability and helps
mitigate the risk of overfitting during hyperparameter tuning.
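
The interplay of hyperparameter search and 5-fold cross-validation can be sketched as follows. Note the substitution: this sketch uses scikit-learn's `RandomizedSearchCV` as a stand-in for the `BayesSearchCV` used in the study, so it samples candidates at random rather than with a Bayesian surrogate model. The synthetic data, the candidate values, and the shuffled folds are likewise assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV

# Synthetic stand-in for the training set.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

# Hypothetical candidate values for a subset of the GBR hyperparameters in Table 5.
param_distributions = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.5, 0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                                             # five sampled configurations
    cv=KFold(n_splits=5, shuffle=True, random_state=0),   # train on 4 folds, score on 1
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mse = -search.best_score_   # score is negated MSE, averaged over the 5 folds
print(best_params, best_cv_mse)
```

Each candidate is scored by the mean of its five fold-level errors, so the selected configuration is the one that generalizes best across folds rather than the one that fits a single split.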

2.3.3. Minimum Dataset Size Analysis


To address the question of the optimal dataset size for stable and reliable predictions,
a subsampling analysis was conducted. The size of the training subset was incrementally
increased from 30 samples to 900 samples (in increments of 5), and the performance of the
best-performing regression model was evaluated on each subset. For each subset size, we
performed multiple runs with 30 different random seeds to obtain the mean and confidence
intervals for both R2 and MSE. This approach allowed us to identify the point at which
further increases in dataset size yield diminishing returns in terms of predictive accuracy.
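
A reduced version of the subsampling procedure is sketched below. It uses synthetic data, four subset sizes, and five seeds to keep the run short, whereas the study used steps of 5 samples and 30 seeds. Splitting each subset 80/20 follows the Figure 8 caption, and the normal-approximation confidence interval is an assumption, since the interval construction is not stated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=6, noise=15.0, random_state=0)

def subsample_curve(X, y, sizes, n_seeds):
    """Mean R^2 and 95% CI half-width for each subset size, over several seeds."""
    rows = []
    for size in sizes:
        scores = []
        for seed in range(n_seeds):
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(y), size=size, replace=False)   # draw the subset
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[idx], y[idx], test_size=0.2, random_state=seed)
            model = GradientBoostingRegressor(n_estimators=50, random_state=seed)
            model.fit(X_tr, y_tr)
            scores.append(r2_score(y_te, model.predict(X_te)))
        scores = np.array(scores)
        half_width = 1.96 * scores.std(ddof=1) / np.sqrt(n_seeds)
        rows.append((size, scores.mean(), half_width))
    return rows

curve = subsample_curve(X, y, sizes=[30, 100, 300, 600], n_seeds=5)
for size, mean_r2, hw in curve:
    print(f"n={size:4d}  R2={mean_r2:.3f} +/- {hw:.3f}")
```

The point of diminishing returns is then read off as the subset size beyond which the mean R² changes by less than its confidence interval.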

2.4. Feature Importance Analysis


Understanding the contribution of each feature to the predictions of the best-
performing regression model was essential for interpreting the model and gaining in-
sights into the factors influencing concrete compressive strength. Therefore, the best-
performing regression model was analyzed using two methods: mean decrease in impurity
(MDI) [41] and SHAP (SHapley Additive exPlanations) values [42]. These methods pro-
vided both global and local interpretability of the model and helped to identify the most
influential features.

2.4.1. Mean Decrease in Impurity


The mean decrease in impurity is a feature importance metric intrinsic to tree-based
models like the GBR. It quantifies the importance of a feature by measuring how much
each feature reduces the impurity in a tree, averaged over all trees in the ensemble. For
regression trees, impurity was measured using variance. The impurity I(m) at node m is
defined as follows:

$$I(m) = \frac{1}{N_m}\sum_{i \in N_m}\left(y_i - \bar{y}_{N_m}\right)^2 \tag{11}$$

where $N_m$ is the number of samples at node m; $y_i$ is the target value of sample i; and
$\bar{y}_{N_m}$ is the mean target value at node m. When a node m is split on feature j, the
decrease in impurity $\Delta I(j, m)$ due to that feature is calculated as follows:

$$\Delta I(j, m) = I(m) - \left(\frac{N_{\text{left}}}{N_m} I(\text{left}) + \frac{N_{\text{right}}}{N_m} I(\text{right})\right) \tag{12}$$

where $N_{\text{left}}$ and $N_{\text{right}}$ are the numbers of samples in the left and right child
nodes, and $I(\text{left})$ and $I(\text{right})$ are the impurities of the left and right child nodes.
The mean decrease in impurity for feature j across all trees T in the ensemble is then as follows:

$$\mathrm{MDI}_j = \frac{1}{|T|}\sum_{t \in T}\sum_{m \in M_t} \Delta I_t(j, m) \tag{13}$$

where Mt is the set of all nodes where feature j is used to split in tree t, and ∆It ( j, m) is the
decrease in impurity for feature j at node m in tree t. A higher MDI value indicates greater
importance of the feature in reducing the overall impurity of the model.
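
Equations (11) and (12) can be checked on a single hand-made split, and the ensemble-level ranking can be read from scikit-learn, as in the sketch below. The helper names and the synthetic data are assumptions. scikit-learn's `feature_importances_` is an impurity-based measure normalized to sum to 1; its internal weighting and its default split criterion for gradient boosting may differ in detail from Equation (13), so it is shown as an analogue rather than an exact implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def impurity(y):
    """Variance impurity I(m) of the targets at a node, Equation (11)."""
    y = np.asarray(y, dtype=float)
    return np.mean((y - y.mean()) ** 2)

def impurity_decrease(y, left_mask):
    """Decrease in impurity for one split, Equation (12)."""
    y = np.asarray(y, dtype=float)
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    return impurity(y) - (len(left) / n * impurity(left) + len(right) / n * impurity(right))

# Worked example: one node with four targets, split into two clean halves.
y_node = np.array([10.0, 12.0, 30.0, 32.0])
delta = impurity_decrease(y_node, np.array([True, True, False, False]))
print("impurity decrease:", delta)   # 101 at the parent minus 1 after the split

# Ensemble-level importances on synthetic data (column 2 is pure noise).
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))
y = 10 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 300)
gbr = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
mdi = gbr.feature_importances_
print("normalized importances:", mdi.round(3))
```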

2.4.2. SHAP Values


SHAP values provide a unified approach to interpreting model predictions by assign-
ing each feature an importance value for a particular prediction [42]. Based on cooperative
game theory, SHAP values consider all possible combinations of features to ensure a fair
allocation of the contribution of each feature. The SHAP value ϕj for feature j is calculated
as follows:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}\left[f_{S \cup \{j\}}\left(x_{S \cup \{j\}}\right) - f_S\left(x_S\right)\right] \tag{14}$$

where F is the set of all features, {j} denotes the set containing only feature j, S is a subset
of features not containing feature j, |S| is the number of features in subset S, $f_S(x_S)$ is the
model trained with features in subset S evaluated at $x_S$, and $f_{S \cup \{j\}}(x_{S \cup \{j\}})$ is the
model trained with features in subset S ∪ {j} evaluated at $x_{S \cup \{j\}}$.
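
Equation (14) can be evaluated exactly for a small number of features by enumerating every subset, as in the sketch below. Two simplifications are assumed: the value function replaces absent features with a background mean instead of retraining a model per subset, and the linear model with its weights is hypothetical. The background values are chosen near the training means in Table 4 purely for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values; value(S) maps a frozenset of feature indices to a number."""
    phi = []
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from Equation (14)
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi

# Hypothetical linear model over [water-cement ratio, age, superplasticizer].
weights = [-20.0, 0.3, 1.5]
x = [0.45, 28.0, 6.0]                # instance being explained
background = [0.77, 31.86, 6.06]     # illustrative background means

def value(S):
    # Features in S take the instance's value; the others take the background mean.
    return sum(w * (x[k] if k in S else background[k]) for k, w in enumerate(weights))

phi = shapley_values(value, 3)
print(phi)
```

For a linear model under this value function each Shapley value reduces to w_j (x_j − background_j), and the values sum to the difference between the prediction for the instance and the prediction at the background point. The enumeration grows as 2^|F|, which is why practical SHAP implementations rely on model-specific or sampling-based approximations.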

2.4.3. Ablation Study


An ablation study was also conducted to assess the impact of progressively removing
features on the model’s performance. Starting with all features, features were removed one
at a time in order of increasing importance based on the MDI ranking. After each removal,
the GBR was retrained, and its performance was evaluated using the R2 metric. The R2
values were then plotted against the number of features retained.
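
The ablation loop can be sketched as follows, on synthetic data with one pure-noise column. Fixing the removal order once from the full model's importances, and the helper name `fit_score`, are assumptions; the text specifies only that features were removed in order of increasing MDI importance and that the GBR was retrained after each removal.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 4))
y = 10 * X[:, 0] + 4 * X[:, 1] + 1 * X[:, 2] + rng.normal(0, 0.2, 400)  # column 3 is noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def fit_score(cols):
    """Retrain on the selected columns and return (test R^2, fitted model)."""
    model = GradientBoostingRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr[:, cols], y_tr)
    return r2_score(y_te, model.predict(X_te[:, cols])), model

# Rank features once using the full model, least important first.
_, full_model = fit_score(list(range(X.shape[1])))
order = [int(j) for j in np.argsort(full_model.feature_importances_)]

kept = list(range(X.shape[1]))
results = [(len(kept), fit_score(kept)[0])]
for j in order[:-1]:                 # always keep at least one feature
    kept.remove(j)
    results.append((len(kept), fit_score(kept)[0]))

for n_features, r2 in results:
    print(f"{n_features} features retained: R2 = {r2:.3f}")
```

A flat curve at the start indicates features that can be dropped at little cost, while a sharp drop marks the features the model depends on.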

2.4.4. Partial Dependence Plot


To further interpret the influence of key features on the predicted concrete compressive
strength, partial dependence plots (PDPs) [43] were employed. This method provides
insights into the relationship between the target variable and the features and helps to
understand whether the relationship is linear, monotonic, or more complex. The partial
dependence function for a feature xs is defined by Equation (15). For a pair of features xs1
and xs2 , the two-way partial dependence function is shown in Equation (16).

$$\hat{f}_{PD}(x_s) = \frac{1}{n}\sum_{i=1}^{n}\hat{f}\left(x_s, x_C^{(i)}\right) \tag{15}$$

where $\hat{f}$ is the trained predictive function (the best-performing regressor model), $x_s$ is the
feature (or set of features) for which the partial dependence is computed, $x_C^{(i)}$ represents
the values of all other features C (the complement of s) for instance i in the dataset, and n is
the number of instances in the dataset.

$$\hat{f}_{PD}(x_{s1}, x_{s2}) = \frac{1}{n}\sum_{i=1}^{n}\hat{f}\left(x_{s1}, x_{s2}, x_C^{(i)}\right) \tag{16}$$
In this study, PDPs were generated for the top two most influential features iden-
tified in the feature importance analysis. Additionally, a two-way PDP was created to
examine the interaction effect between these two features on the predicted compressive
strength. The partial dependence functions fˆPD ( xs ) and fˆPD ( xs1 , xs2 ) were calculated
using the PartialDependenceDisplay.from_estimator method from the scikit-learn library.
The method systematically varies the feature(s) of interest while averaging out the effects
of all other features.
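
Equation (15) can be computed directly, as in the sketch below. It uses a linear regression on synthetic data in place of the best-performing regressor so that the result is easy to verify; the function name `partial_dependence_1d` is hypothetical and is not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def partial_dependence_1d(model, X, feature, grid):
    """Average prediction with one feature fixed at each grid value, Equation (15)."""
    X = np.asarray(X, dtype=float)
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v                           # set x_s = v for every instance
        pd_values.append(model.predict(X_mod).mean())   # average over the other features
    return np.array(pd_values)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 5.0 - 8.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)
model = LinearRegression().fit(X, y)

grid = np.linspace(0.0, 1.0, 5)
pd_curve = partial_dependence_1d(model, X, feature=0, grid=grid)
print(pd_curve)
```

For a linear model the partial dependence curve is a straight line whose slope equals the fitted coefficient, which makes this a convenient sanity check before applying the same procedure to a non-linear model. The two-way version in Equation (16) fixes two columns at once over a grid of value pairs.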

2.5. Model Implementation and Validation


The final models were implemented using optimized hyperparameters, and their
validation involved evaluating performance metrics on the test set to assess how well
they generalized to new, unseen data. For the regression tasks, actual versus predicted
values were visualized using scatter plots to qualitatively assess predictive accuracy, while
residual analysis was conducted to identify potential patterns that might reveal model bias
or heteroscedasticity.

3. Results
3.1. Regression Analysis
The regression models were evaluated based on their MSE and R2 values, as summa-
rized in Table 6. This table provides a clear comparison of their effectiveness in predicting
concrete compressive strength. The GBR emerged as the top performer with an MSE of
15.79 and an R2 value of 0.94, which indicates its ability to explain 94% of the variance in
compressive strength. Following closely, the RF regressor captured a significant portion
of the target variable’s variance with an R2 value of 0.91 and an MSE of 21.61. Both the
neural network model and AdaBoost also showed strong results, each with R2 values of
0.90. The KNN model demonstrated a moderate fit with an R2 of 0.84 and an MSE of 39.88,
while the decision tree regressor posted an MSE of 42.67 and an R2 of 0.83. The linear
regression model, simpler and less robust, managed an R2 of 0.69 and an MSE of 71.25,
which highlights its limited capacity to capture complex patterns in the data.

Table 6. Performance of regression models.

Model MSE R2
gradient boosting regressor 15.79 0.94
RF regressor 21.61 0.91
neural network model 24.20 0.90
AdaBoost 24.27 0.90
k-nearest neighbors 39.88 0.84
decision tree regressor 42.67 0.83
linear regression 71.25 0.69

Our R² of 0.94 is slightly higher than the R² of 0.90 reported by Alghrairi et al. [4] for a
gradient-boosted trees model applied to nanomaterial lightweight concrete. This improvement
is possibly due to our ratio-based features (W/C and C/F) and enhanced hyperparameter tuning.
Similarly,
Ding et al. [5] found that ensemble methods like RF and SVM outperformed single models
in predicting the compressive strength of alkali-activated materials.
To complement the statistical summary in Table 6, Figure 6 presents a Taylor diagram
that visually compares the predictions of each model to the observed compressive strengths.
In this diagram, the distance from the origin corresponds to the models’ standard deviations,
and their angular position represents the correlation with the observed data. Additionally,
the annotations near each model’s marker show the centered RMS (CRMS) error, which
provides a measure of how closely the model predictions match the observed values after
removing any bias. From Figure 6, we see that the GBR and RF models not only rank
highly in terms of MSE and R2 but also cluster closer to the observed standard deviation
reference point, exhibit higher correlations, and have lower CRMS errors. These visual
insights confirm and reinforce the numerical findings presented in Table 6. Meanwhile,
the neural network and AdaBoost models maintain strong correlations and relatively low
CRMS errors, which align well with their high R² values. In contrast, the KNN and decision
tree models, while moderately correlated, display larger CRMS errors, consistent with
their higher MSE values. The linear regression model stands out as having the weakest
correlation and the highest CRMS error, mirroring its poor performance in terms of MSE
and R².

Figure 6. Taylor diagram for regression models.

The robustness of the GBR is further supported by Figure 7a,b. In Figure 7a, the residual
plot demonstrates that the residuals are randomly scattered around zero, which indicates
the absence of systematic patterns or biases. The residual variance is consistent across the
predicted values and suggests that the model performs reliably across the range of compressive
strengths. This uniformity reinforces the model’s superior fit. In Figure 7b, the “actual vs.
predicted values” plot shows points closely aligned with the ideal red dashed line, which
highlights the model’s accuracy in predicting the actual values. The tight clustering around
this line supports the model’s ability to make precise predictions.
In addition to evaluating model performance on the full dataset, we investigated how
model accuracy changes with different training set sizes. Figure 8 illustrates the relationship
between subset size and gradient-boosting regressor performance. Initially, as the subset
size increases from 30 samples upward, the R2 score improves dramatically, while the
MSE decreases significantly. Beyond approximately 400 samples, the improvement in
R2 and reduction in MSE become marginal, suggesting that the model has captured the
underlying data patterns sufficiently well. Hence, while larger datasets can still provide
benefits, a dataset size of around 400 observations appears to be a practical lower bound
for achieving near-optimal performance in this particular problem. This analysis suggests
that the current cleaned dataset size of 911 observations is more than sufficient for stable
and high-quality predictions, and smaller datasets (on the order of a few hundred samples)
could still achieve near-optimal results, given a similar data distribution and complexity.

Figure 7. Residual analysis and prediction accuracy of the GBR: (a) residual plot; (b) actual vs.
predicted values.

Figure 8. R² score and MSE vs. dataset size (80/20 split) for the GBR model.

3.2. Classification Analysis

For the classification analysis, the models were evaluated using metrics such as accuracy,
precision, recall, and F1-score. The classification models demonstrated varied performance,
as summarized below and detailed in Table 7. The SVM classifier achieved the highest
overall accuracy among the tested models at 0.80. The SVM classifier proved to be the
best-performing model, and its confusion matrix is shown in Figure 9. It balanced precision
and recall effectively across all classes and showed particular strength in correctly
classifying the “very weak” category. It also handled the nuances between “high strength”,
“normal strength”, and “weak” categories better than other models, which indicates its
ability to capture more complex patterns in the data. The bagging classifier, with consistent
scores of 0.76 and above across all metrics, also showed strong and balanced performance.
The RF model demonstrated acceptable precision at 0.76 but had slightly lower accuracy
and recall scores compared to SVM and bagging, which suggests effectiveness in correctly
identifying certain classes but with some limitations in achieving consistent accuracy
across all predictions. The logistic regression model and KNN model displayed lower
performance metrics, with balanced accuracies of 0.63 and 0.62, respectively.

Table 7. Performance of classification models.

| Model | Balanced Accuracy | Weighted Accuracy | Weighted Avg Precision | Weighted Avg Recall | Weighted Avg F1-Score |
|---|---|---|---|---|---|
| RF classifier | 0.74 | 0.73 | 0.76 | 0.75 | 0.75 |
| logistic regression | 0.63 | 0.62 | 0.63 | 0.64 | 0.63 |
| SVM classifier | 0.76 | 0.78 | 0.80 | 0.80 | 0.80 |
| KNN | 0.62 | 0.53 | 0.69 | 0.69 | 0.68 |
| bagging with decision trees | 0.77 | 0.78 | 0.77 | 0.76 | 0.76 |

Figure 9. Confusion matrix for SVM (heatmap colors darken as counts increase) and classification
matrix by class.

To provide further clarity on model reproducibility, Table 8 presents the final
hyperparameter configurations obtained through Bayesian optimization for the top-performing
regression model (GBR) and the top-performing classification model (SVM). Detailed
hyperparameters and tuning procedures for all other models are available in the
Supplementary Materials.

Table 8. Best hyperparameters for selected top-performing models.

| Model | Hyperparameters Considered | Initial/Default Values | Hyperparameter Tuning Method | Best/Tuned Values |
|---|---|---|---|---|
| GBR | n_estimators, learning_rate, max_depth, subsample, min_samples_split | n_estimators = 100, learning_rate = 0.1, max_depth = 3, subsample = 1.0, min_samples_split = 2 | Bayesian Optimization | n_estimators = 500, learning_rate = 0.2057, max_depth = 10, subsample = 0.5, min_samples_split = 0.242 |
| SVM | C (regularization), gamma (kernel coefficient), kernel (linear, rbf, poly, sigmoid), degree (if kernel = poly) | C = 1.0, kernel = 'rbf', gamma = 'scale', degree = 3 | Bayesian Optimization (BayesSearchCV) | C ≈ 5.68 × 10^5, gamma ≈ 0.1434, kernel = 'rbf', degree = 5 |
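The tuned GBR settings in Table 8 mix integer and fractional values. As a rough sketch, the dictionaries below restate the GBR rows of the table, and a small helper shows how a fractional `min_samples_split` translates into a sample count, following scikit-learn's documented rule of taking the ceiling of the fraction times the number of training samples. The training-set size of 800 is a hypothetical figure chosen for illustration, and the dictionary and function names are not from the study's code.

```python
import math

# Values copied from Table 8; the dictionary names are illustrative.
GBR_DEFAULT = {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3,
               "subsample": 1.0, "min_samples_split": 2}
GBR_TUNED = {"n_estimators": 500, "learning_rate": 0.2057, "max_depth": 10,
             "subsample": 0.5, "min_samples_split": 0.242}


def effective_min_samples_split(value, n_samples):
    """Translate min_samples_split into an absolute sample count.

    Integers are used as-is; floats are treated as a fraction of n_samples
    and rounded up, with a floor of 2 samples.
    """
    if isinstance(value, int):
        return value
    return max(2, math.ceil(value * n_samples))


def changed_settings(default, tuned):
    """Return {name: (default, tuned)} for every hyperparameter that moved."""
    return {k: (default[k], tuned[k]) for k in default if default[k] != tuned[k]}


if __name__ == "__main__":
    n_train = 800  # hypothetical training-set size, not taken from the paper
    print(effective_min_samples_split(GBR_TUNED["min_samples_split"], n_train))
    for name, (before, after) in changed_settings(GBR_DEFAULT, GBR_TUNED).items():
        print(f"{name}: {before} -> {after}")
```

Read this way, the tuned model grows deeper trees (`max_depth = 10`) but only splits nodes that hold a sizeable share of the training data, and each tree sees half of the samples (`subsample = 0.5`).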

3.3. Feature Importance Ranking and Feature Ablation

Understanding feature contributions is crucial for interpreting concrete strength prediction
models and identifying influential factors. Therefore, the feature importance values were
extracted from the GBR to provide a measure of each feature’s influence on the predictive
model. Figure 10a illustrates the feature importance ranking. The importance scores are
normalized to sum to 1 to allow for direct comparison among features. Analysis shows that the
water–cement ratio (0.425) and age (0.301) are the most significant predictors of concrete
compressive strength. Blast furnace slag (0.106) and superplasticizer (0.080) show moderate
influence, while coarse aggregate–fine aggregate ratio (0.059) and fly ash (0.029) have lesser
impacts.

Figure 10. (a) Feature importance ranking; (b) contribution of each feature to model performance.
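Because the importance scores are normalized, they can be read as shares of the total. A short sketch using the values reported above checks the normalization and accumulates the shares by rank; the variable and function names are illustrative.

```python
# Feature importances as reported in Section 3.3 (Figure 10a).
IMPORTANCES = {
    "Water_Cement_Ratio": 0.425,
    "Age": 0.301,
    "Blast Furnace Slag": 0.106,
    "Superplasticizer": 0.080,
    "Coarse_Fine_Ratio": 0.059,
    "Fly Ash": 0.029,
}


def cumulative_shares(importances):
    """Rank features by importance and return (name, running total) pairs."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    running, out = 0.0, []
    for name, score in ranked:
        running += score
        out.append((name, running))
    return out


if __name__ == "__main__":
    print(f"sum of importances: {sum(IMPORTANCES.values()):.3f}")
    for name, share in cumulative_shares(IMPORTANCES):
        print(f"{name:<20s} cumulative share = {share:.3f}")
```

On these figures, the water–cement ratio and age together account for roughly 73% of the total importance, and adding blast furnace slag brings the share to about 83%.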

To further assess the impact of each feature on the model’s performance, an ablation study was
conducted. In this study, features were progressively removed from the model in order of
increasing importance (starting with the least important feature), and the model was retrained
each time. The MSE and R² values were recorded at each step to evaluate how the removal of
features affected the model’s predictive accuracy.

The ablation study (Figure 10b) shows the impact of incrementally removing features on both R²
and MSE and clarifies each feature’s individual contribution to the model’s performance.
Starting with all six variables (Water_Cement_Ratio, Age, Blast Furnace Slag, Superplasticizer,
Coarse_Fine_Ratio, and Fly Ash), we obtained an R² of 0.9394 and an MSE of 15.7961. Removing
Fly Ash had a minimal effect on accuracy (R² = 0.9366, MSE = 16.5504), which indicates that
although it adds some predictive value, its contribution is relatively modest compared to the
top-ranked features. Further reducing the feature set led to more substantial declines: while
retaining only the top three predictors (Water_Cement_Ratio, Age, and Blast Furnace Slag) still
achieved a commendable R² of 0.9027, the MSE increased to 25.3888. Narrowing down to just two
features (Water_Cement_Ratio and Age) caused R² to drop to 0.7752 and MSE to rise to 58.6519,
and relying solely on Water_Cement_Ratio produced a drastic decline (R² = 0.1501,
MSE = 221.6960). These results emphasize the importance of multiple synergistic features in
achieving both high R² and low MSE, with Water_Cement_Ratio, Age, and Blast Furnace Slag being
particularly influential. Conversely, features like Fly Ash and Coarse_Fine_Ratio contribute
less to predictive accuracy, due to weaker direct correlations with compressive strength or
their effects being overshadowed by more dominant parameters. Fly Ash, for instance, may improve
strength and durability under certain conditions but exerts a more subtle or context-dependent
influence on early-age compressive strength, which makes its overall contribution less
pronounced in a broad dataset. Similarly, the Coarse_Fine_Ratio’s influence is secondary to
that of Water_Cement_Ratio and Age, which directly shape hydration kinetics and microstructural
development. Thus, while these lower-ranked features are not without value, their marginal
improvements are minimal relative to the top three predictors. Taken together, these findings
suggest that a simplified model using only a few key variables can still achieve near-optimal
accuracy, providing practical guidance for future model development and feature selection.
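The reported ablation figures are easier to compare when expressed as changes relative to the full six-feature model. The sketch below does that arithmetic on the numbers quoted above and also converts MSE to RMSE, which is in MPa like the strength values themselves. It does not retrain any model, and the names are illustrative.

```python
import math

# (feature set, R^2, MSE) exactly as reported in the ablation study.
ABLATION = [
    ("all six features", 0.9394, 15.7961),
    ("without Fly Ash", 0.9366, 16.5504),
    ("top three features", 0.9027, 25.3888),
    ("Water_Cement_Ratio + Age", 0.7752, 58.6519),
    ("Water_Cement_Ratio only", 0.1501, 221.6960),
]


def summarize(rows):
    """Add RMSE and the R^2 loss relative to the first (full) model."""
    full_r2 = rows[0][1]
    return [
        {"features": name, "r2": r2, "rmse": math.sqrt(mse), "r2_loss": full_r2 - r2}
        for name, r2, mse in rows
    ]


if __name__ == "__main__":
    for row in summarize(ABLATION):
        print(f"{row['features']:<26s} R2={row['r2']:.4f} "
              f"RMSE={row['rmse']:.2f} loss={row['r2_loss']:.4f}")
```

The largest single step in these numbers is the removal of Age, which takes R² from 0.7752 to 0.1501.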
3.4. Understanding Feature Contributions with SHAP Analysis

To gain a deeper understanding of the GBR’s predictive behavior and interpret its predictions,
we employed SHAP analysis. This method allows for both global and local interpretability and
reveals the contribution of each feature to the model’s output across the entire dataset and
for individual predictions. Figure 11a presents the SHAP summary plot, which displays the
global feature importance. Each point on the plot represents a SHAP value for a feature and an
instance. The features are ordered by their overall importance, with the most important feature
at the top. The color of the points indicates the feature value, with red representing high
values and blue representing low values. The SHAP summary plot confirms the findings from the
MDI analysis and highlights the water–cement ratio and age as the most influential features.
Higher values of water–cement ratio generally contribute negatively to the predicted strength,
while higher values of age have a positive impact. Blast furnace slag and superplasticizer also
show moderate influence, with higher values of blast furnace slag typically decreasing the
predicted strength and higher values of superplasticizer increasing it. Fly ash and coarse
aggregate–fine aggregate ratio have relatively smaller impacts on the predictions.

Figure 11. (a) Feature importance analysis: SHAP summary plot; (b) contribution analysis: SHAP
waterfall plot showing feature contributions for an actual concrete strength of 61.89 MPa.

Figure 11b shows a SHAP waterfall plot for a specific instance with an actual concrete strength
of 61.89 MPa. This plot provides a local explanation to illustrate how each feature contributes
to the model’s prediction for this particular instance. The base value, represented by E[f(X)],
is the average prediction of the model across the entire dataset (32.489 MPa). Each bar in the
plot represents a feature, and its length corresponds to the SHAP value to indicate the
magnitude and direction of the feature’s contribution to the final prediction. For this
instance, the water–cement ratio of 0.3 has the largest positive contribution (+30.82),
significantly increasing the prediction from the base value. The age of 28 days also
contributes positively (+4.37), further increasing the predicted strength. Conversely, the
absence of blast furnace slag (−3.85), fly ash (−1.75), and a moderate amount of
superplasticizer (−0.711) contribute negatively, slightly lowering the prediction. The coarse
aggregate–fine aggregate ratio has a small positive impact (+1.31). The final prediction (f(x))
of 62.667 MPa is the sum of the base value and all the individual feature contributions.
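The additivity described above can be checked directly from the quoted values. The sketch below sums the base value and the six contributions and compares the total with the reported prediction; the small gap is consistent with the contributions having been rounded for display. Variable names are illustrative.

```python
BASE_VALUE = 32.489           # E[f(X)], MPa
REPORTED_PREDICTION = 62.667  # f(x), MPa

# SHAP contributions for the instance in Figure 11b, as quoted in the text.
CONTRIBUTIONS = {
    "water-cement ratio": +30.82,
    "age": +4.37,
    "blast furnace slag": -3.85,
    "fly ash": -1.75,
    "superplasticizer": -0.711,
    "coarse-fine aggregate ratio": +1.31,
}


def reconstruct_prediction(base, contributions):
    """SHAP additivity: prediction = base value + sum of feature contributions."""
    return base + sum(contributions.values())


if __name__ == "__main__":
    total = reconstruct_prediction(BASE_VALUE, CONTRIBUTIONS)
    print(f"reconstructed: {total:.3f} MPa")
    print(f"reported:      {REPORTED_PREDICTION:.3f} MPa")
    print(f"gap:           {total - REPORTED_PREDICTION:+.3f} MPa")
```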
The partial dependence plots presented in Figure 12 highlight the influence of water–cement
ratio and age on the compressive strength of concrete, as predicted by the gradient boosting
model. Figure 12a displays a marked decrease in concrete compressive strength as the
water–cement ratio increases from around 0.3 to 1.25. Initially, the decline is substantial,
particularly between ratios of 0.3 and 0.75, which indicates that lower ratios significantly
enhance the concrete’s strength. Beyond a ratio of 0.75, the negative impact on strength
continues but becomes less pronounced, and it is less significant still beyond a ratio of 1.0.
This suggests that maintaining a water–cement ratio below 0.75 is critical for optimal concrete
strength. The age of concrete (Figure 12b) shows a robust positive correlation with its
compressive strength. From day 0 to approximately 50 days, there is a sharp increase in
strength, which reflects the critical curing phase, in which concrete gains most of its
compressive strength. Beyond 50 days, the rate of increase in strength diminishes, becoming
more gradual up to 100 days. The step increase in strength at around 100 days might indicate
specific curing or environmental conditions affecting the concrete’s long-term strength
characteristics. The interaction plot (Figure 12c) elucidates how combinations of water–cement
ratio and age impact concrete strength. At early ages (0–20 days) and lower water–cement ratios
(0.3–0.5), the concrete strength is highest, which emphasizes the importance of both proper
mixture ratios and sufficient curing time. As the age increases, even higher water–cement
ratios (up to 1.5) show a less detrimental effect on the strength, particularly in concrete
aged over 60 days. This interaction suggests a diminishing influence of the water–cement ratio
on strength as the concrete matures.

Figure 12. (a) Partial dependence plot for water–cement ratio; (b) partial dependence plot for
age; (c) partial dependence plot showing the combined influence of water–cement ratio and age
(cooler colors (purple zones) indicate lower partial dependence values, warmer colors (greenish
zones) indicate higher values).
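A one-dimensional partial dependence curve like those in Figure 12 is obtained by forcing one feature to each grid value for every sample and averaging the model's predictions. The sketch below shows that procedure in plain Python with a made-up stand-in for the fitted model. `toy_model`, its coefficients, and the sample rows are hypothetical and only serve to make the example runnable.

```python
import math


def toy_model(wc_ratio, age_days):
    """Hypothetical stand-in for a fitted regressor (not the study's GBR)."""
    return 70.0 - 40.0 * wc_ratio + 6.0 * math.log1p(age_days)


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            preds.append(model(*modified))
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical (water-cement ratio, age in days) samples.
SAMPLES = [(0.35, 7), (0.50, 28), (0.65, 90), (0.80, 3)]

if __name__ == "__main__":
    grid = [0.3, 0.5, 0.75, 1.0]
    for wc, pd_value in zip(grid, partial_dependence(toy_model, SAMPLES, 0, grid)):
        print(f"w/c = {wc:.2f} -> mean predicted strength = {pd_value:.2f}")
```

Because the toy model is additive, its curve is a straight line; a tree ensemble such as the GBR can produce the non-linear and step-like shapes described for Figure 12.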

4. Discussion
The findings of this study highlight the significant potential of machine learning models in
accurately predicting and classifying the compressive strength of concrete based on its mix
design parameters and curing age. The superior performance of the GBR underscores the
effectiveness of ensemble methods in capturing the complex, non-linear relationships inherent
in concrete materials. This discussion elaborates on the implications of these results, the
insights gained from feature importance analyses, the challenges encountered, and the broader
impact on the field of concrete technology.

4.1. Model Performance Insights


Gradient boosting regression emerged as the most effective model for predicting
concrete compressive strength, achieving an R² of 0.94 and an MSE of 15.79. This
model’s superior performance is attributed to its ability to capture complex non-linear
relationships between predictors and the target variable, which are inherent in concrete
behavior due to intricate chemical and physical interactions. While RF also demonstrated
strong performance with an R² value of 0.91, neural networks did not surpass gradient
boosting, likely due to the dataset size limitations. The comparatively lower performance of
linear regression underscores the inadequacy of linear models for capturing the non-linear
dynamics of concrete properties.
The classification task revealed that the SVM classifier achieved the highest accuracy,
correctly classifying compressive strength categories with a balanced accuracy of 0.76 and
a weighted F1-score of 0.80. The SVM’s ability to handle high-dimensional spaces and
its effectiveness with non-linear kernels likely contributed to its superior performance.
However, the challenge of classifying intermediate strength classes was evident across all
models. Misclassifications often occurred between “high strength” and “normal strength”
categories, possibly due to overlapping feature distributions and the inherent variability
in concrete mixes. The use of balanced accuracy and weighted metrics was crucial in this
context, as it accounted for class imbalances within the dataset. Some strength categories,
such as “very high strength”, had significantly fewer samples, which could bias the models
toward the majority classes. By employing these metrics, the evaluation provided a more
accurate reflection of the models’ capabilities across all categories.
Our results underscore the effectiveness of ensemble methods for compressive strength
prediction and align with prior studies that similarly reported boosted trees or hybrid
approaches outperforming conventional regressors [4,5,8,9]. For instance, Song et al. [8]
and Paudel et al. [7] each found that bagging- or boosting-based models attained R²
values exceeding 0.90, whereas simpler models such as linear or decision tree regressors
lagged. Notably, Tran et al. [9] and Ahmad et al. [10] showed that hybrid or ensemble
algorithms could achieve R² values above 0.93 for recycled and geopolymer concretes,
respectively, further evidencing that these advanced architectures generalize effectively
across various binder systems. Our GBR’s R² of 0.94 and the SVM classification accuracy of
0.80 for strength categories thus corroborate the conclusion that robust ensemble approaches
can accommodate the heterogeneous nature of concrete composites and yield superior
predictive accuracy.

4.2. Feature Importance and Practical Implications


Analysis using MDI and SHAP values revealed that water–cement ratio and age are
the most critical factors influencing compressive strength. A lower water–cement ratio
reduces porosity, enhancing strength, while increased age allows continued hydration and
microstructure densification, with strength gains leveling off after about 50 days. Blast
furnace slag and superplasticizer also contribute moderately. Blast furnace slag improves
long-term strength through latent hydraulic reactions, and superplasticizers enhance work-
ability, enabling lower water content without sacrificing performance. These insights aid
mix design optimization by highlighting key components. Optimizing the water–cement
ratio and curing time can boost compressive strength efficiently and
offer economic and environmental benefits by potentially reducing cement usage.
The use of SHAP values provided a nuanced understanding of how individual features
influenced model predictions at both global and local levels. The waterfall plot for a specific
instance illustrated how feature values contribute to a single prediction, which enhanced
model interpretability. This level of interpretability is crucial for gaining trust in machine
learning models within the construction industry, where decisions have significant safety
and financial implications. By demonstrating that the model’s behavior aligns with domain
knowledge, stakeholders are more likely to adopt these data-driven approaches. The ability
to predict strength without extensive laboratory testing accelerates the design process and
enhances project efficiency.
It is also important to note that the strong influence of the water–cement ratio and
curing age in our analysis concurs with numerous prior investigations. For instance, Ding
et al. [5] and Ekanayake et al. [6] both identified age (or curing duration) as a dominant
factor in concrete strength evolution, while Alghrairi et al. [4] and Anjum et al. [11] em-
phasized the significant role of water content. Our SHAP-based interpretability analysis
(Section 3.4) parallels these findings and demonstrates that small changes in W/C ratio
lead to sizable shifts in predicted strength. Moreover, partial dependence plots revealed
synergy between W/C ratio and curing time, aligning with earlier studies that used SHAP
or feature-importance techniques for clarity [6,11]. Our results thus substantiate
that data-driven ranking of variables (e.g., W/C ratio, age) agrees closely with
well-established concrete fundamentals.

4.3. Challenges, Limitations, and Future Research


Despite the promising results, several challenges were encountered during the study.
One notable challenge was multicollinearity among the original features, which was ad-
dressed through feature engineering by creating ratios, such as the water–cement ratio and
the coarse aggregate–fine aggregate ratio. While this approach reduced multicollinearity
and improved model performance, the engineered features still exhibited higher-than-
desirable VIF values. This suggests that further refinement or alternative methods, such as
regularization techniques, may be necessary to fully mitigate multicollinearity.
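For reference, the VIF of a feature is 1 / (1 − R²), where R² comes from regressing that feature on the remaining ones. In the special case of only two features, R² reduces to the squared Pearson correlation, which the sketch below uses to keep the example dependency-free. The data are made up, the function names are illustrative, and the study's own VIF values would involve all features.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_features(x, y):
    """VIF = 1 / (1 - R^2); with two features, R^2 equals r squared."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Made-up values for two correlated mix variables.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.0, 1.0, 4.0, 3.0, 5.0]

if __name__ == "__main__":
    print(f"r   = {pearson_r(feature_a, feature_b):.2f}")
    print(f"VIF = {vif_two_features(feature_a, feature_b):.2f}")
```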
Another limitation pertains to the dataset used in this study. The subsampling analysis
revealed that a more moderate sample size of around 400 instances would be sufficient to
achieve stable predictive performance. This is particularly valuable for future studies that
may face data availability constraints, suggesting that similar models can be developed
with smaller datasets without compromising accuracy. However, the dataset’s compre-
hensiveness presents challenges. While it covers a wide range of mix designs and curing
ages, it may not fully capture variations in raw materials, environmental conditions, and
construction practices across different regions. This could affect the model’s generalizability
to other contexts. Therefore, it is suggested that future research incorporate larger and more
diverse datasets, including various cement types, aggregate sources, and environmental
conditions, to enhance the model’s robustness and applicability in varied settings.
Additionally, the removal of outliers, while improving model performance, presents challenges.
Excluding outliers may omit valid but extreme cases, potentially limiting the model’s ability
to predict accurately in such extreme scenarios. Future studies should explore methods that balance
outlier removal with the retention of essential data points to maintain comprehensive
predictive capabilities.
To advance these findings, future research should focus on expanding datasets
by incorporating diverse sources and geographical locations. Advanced feature engineer-
ing techniques, such as polynomial features, interaction terms, and real-time monitoring
data such as temperature and humidity during curing, can capture more nuanced data
patterns and improve predictive accuracy. Exploring deep learning approaches may reveal
complex relationships not identified by traditional machine learning models, particularly
when combined with non-traditional data sources such as imaging or sensor data. Also,
enhancing model interpretability through methods such as layer-wise relevance propaga-
tion or integrated gradients is crucial for industry adoption, ensuring that complex models
remain transparent and trustworthy.
Furthermore, integrating predictive models into user-friendly decision support sys-
tems, such as software tools or mobile applications, and incorporating optimization al-
gorithms can facilitate practical use by practitioners, enabling automated mix design
suggestions tailored to specific project requirements. The adoption of machine learning
models in concrete technology also raises ethical and environmental considerations. Opti-
mizing mix designs for strength and cost must be balanced with sustainability goals, such
as reducing carbon emissions associated with cement production. Future models could
incorporate environmental impact metrics to support eco-friendly decision-making.

5. Conclusions
The present study demonstrated the effectiveness of machine learning algorithms,
especially ensemble techniques such as gradient boosting, in making precise predictions
and classifying the compressive strength of concrete based on mix design parameters and
curing duration. Using advanced feature importance analysis techniques, including SHAP
values and partial dependence plots, allowed us to delve into the details of how the input
variables interact with each other in these models to affect the predictions. These results
have shown the potential of machine learning models to enhance mix design optimization,
quality assurance, and fulfillment of engineering standards. SHAP analysis allowed a better
insight into feature contributions on both a global and local level, thus possibly increasing
model interpretability. The ability to predict strength without extensive laboratory testing
accelerates the design process, reduces costs, and promotes more efficient project timelines.
However, the study acknowledges certain limitations. While the dataset used is com-
prehensive, it may not capture all possible variations in raw materials, environmental
conditions, and construction practices across different regions. Exploring deep learning
approaches and integrating real-time monitoring data could uncover more complex relation-
ships and enhance the model’s robustness. Additionally, improving model interpretability
is essential for ensuring widespread adoption in the industry.

Supplementary Materials: The following supporting information, including the code used for
data preprocessing, feature engineering, model development, and analysis in this study, can be
downloaded at: https://github.com/mnikoopayan/Concrete-Compressive-Strength (accessed on 12
July 2024).

Author Contributions: Conceptualization, M.S.N.T. and Y.F.; methodology, M.S.N.T. and Y.F.; soft-
ware, M.S.N.T.; validation, M.S.N.T., Y.F. and M.M.; formal analysis, M.S.N.T. and Y.F.; investigation,
M.S.N.T.; resources, M.S.N.T. and M.M.; data curation, M.S.N.T. and M.M.; writing—original draft
preparation, M.S.N.T. and Y.F.; writing—review and editing, Y.F. and M.M.; visualization, M.S.N.T.
and Y.F.; supervision, Y.F.; project administration, Y.F.; funding acquisition, Y.F. All authors have read
and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The data presented in this study are available in the UC Irvine Machine
Learning Repository at 10.24432/C5PK67.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Griffiths, S.; Sovacool, B.K.; Furszyfer Del Rio, D.D.; Foley, A.M.; Bazilian, M.D.; Kim, J.; Uratani, J.M. Decarbonizing the Cement
and Concrete Industry: A Systematic Review of Socio-Technical Systems, Technological Innovations, and Policy Options. Renew.
Sustain. Energy Rev. 2023, 180, 113291. [CrossRef]
2. Young, B.A.; Hall, A.; Pilon, L.; Gupta, P.; Sant, G. Can the Compressive Strength of Concrete Be Estimated from Knowledge of
the Mixture Proportions?: New Insights from Statistical Analysis and Machine Learning Methods. Cem. Concr. Res. 2019, 115,
379–388. [CrossRef]
3. Li, Z.; Yoon, J.; Zhang, R.; Rajabipour, F.; Srubar, W.V., III; Dabo, I.; Radlińska, A. Machine Learning in Concrete Science:
Applications, Challenges, and Best Practices. npj Comput. Mater. 2022, 8, 127. [CrossRef]
4. Alghrairi, N.S.; Aziz, F.N.; Rashid, S.A.; Mohamed, M.Z.; Ibrahim, A.M. Machine Learning-Based Compressive Strength
Estimation in Nanomaterial-Modified Lightweight Concrete. Open Eng. 2024, 14, 20220604. [CrossRef]
5. Ding, Y.; Wei, W.; Wang, J.; Wang, Y.; Shi, Y.; Mei, Z. Prediction of Compressive Strength and Feature Importance Analysis of Solid
Waste Alkali-Activated Cementitious Materials Based on Machine Learning. Constr. Build. Mater. 2023, 407, 133545. [CrossRef]
6. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A Novel Approach to Explain the Black-Box Nature of Machine Learning in
Compressive Strength Predictions of Concrete Using Shapley Additive Explanations (SHAP). Case Stud. Constr. Mater. 2022, 16,
e01059. [CrossRef]
7. Paudel, S.; Pudasaini, A.; Shrestha, R.K.; Kharel, E. Compressive Strength of Concrete Material Using Machine Learning
Techniques. Clean. Eng. Technol. 2023, 15, 100661. [CrossRef]
8. Song, H.; Ahmad, A.; Farooq, F.; Ostrowski, K.A.; Maślak, M.; Czarnecki, S.; Aslam, F. Predicting the Compressive Strength of
Concrete with Fly Ash Admixture Using Machine Learning Algorithms. Constr. Build. Mater. 2021, 308, 125021. [CrossRef]
9. Quan Tran, V.; Quoc Dang, V.; Si Ho, L. Evaluating Compressive Strength of Concrete Made with Recycled Concrete Aggregates
Using Machine Learning Approach. Constr. Build. Mater. 2022, 323, 126578. [CrossRef]
10. Ahmad, A.; Ahmad, W.; Chaiyasarn, K.; Ostrowski, K.A.; Aslam, F.; Zajdel, P.; Joyklad, P. Prediction of Geopolymer Concrete
Compressive Strength Using Novel Machine Learning Algorithms. Polymers 2021, 13, 3389. [CrossRef] [PubMed]
11. Anjum, M.; Khan, K.; Ahmad, W.; Ahmad, A.; Amin, M.N.; Nafees, A. Application of Ensemble Machine Learning Methods to
Estimate the Compressive Strength of Fiber-Reinforced Nano-Silica Modified Concrete. Polymers 2022, 14, 3906. [CrossRef]
12. Ullah, H.S.; Khushnood, R.A.; Farooq, F.; Ahmad, J.; Vatin, N.I.; Ewais, D.Y.Z. Prediction of Compressive Strength of Sustainable
Foam Concrete Using Individual and Ensemble Machine Learning Approaches. Materials 2022, 15, 3166. [CrossRef]
13. Kumar, P.; Pratap, B. Feature Engineering for Predicting Compressive Strength of High-Strength Concrete with Machine Learning
Models. Asian J. Civ. Eng. 2024, 25, 723–736. [CrossRef]
14. Nguyen, N.-H.; Abellán-García, J.; Lee, S.; Vo, T.P. From Machine Learning to Semi-Empirical Formulas for Estimating Compressive
Strength of Ultra-High Performance Concrete. Expert Syst. Appl. 2024, 237, 121456. [CrossRef]
15. Onyelowe, K.C.; Gnananandarao, T.; Ebid, A.M.; Mahdi, H.A.; Ghadikolaee, M.R.; Al-Ajamee, M. Evaluating the Compressive
Strength of Recycled Aggregate Concrete Using Novel Artificial Neural Network. Civ. Eng. J. 2022, 8, 1679–1693. [CrossRef]
16. Onyelowe, K.C.; Ebid, A.M.; Mahdi, H.A.; Riofrio, A.; Eidgahee, D.R.; Baykara, H.; Soleymani, A.; Kontoni, D.-P.N.; Shakeri, J.;
Jahangir, H. Optimal Compressive Strength of RHA Ultra-High-Performance Lightweight Concrete (UHPLC) and Its
Environmental Performance Using Life Cycle Assessment. Civ. Eng. J. 2022, 8, 2391–2410. [CrossRef]
17. Onyelowe, K.C.; Kontoni, D.-P.N.; Ebid, A.M.; Dabbaghi, F.; Soleymani, A.; Jahangir, H.; Nehdi, M.L. Multi-Objective Optimization
of Sustainable Concrete Containing Fly Ash Based on Environmental and Mechanical Considerations. Buildings 2022, 12, 948.
[CrossRef]
18. ACI Committee 318; American Concrete Institute. Building Code Requirements for Structural Concrete (ACI 318-08) and Commentary;
American Concrete Institute: Farmington Hills, MI, USA, 2008; ISBN 978-0-87031-264-9.
19. Yeh, I.-C. Concrete Compressive Strength. UCI Machine Learning Repository, 2007. DOI: 10.24432/C5PK67.
20. McKinney, W. Pandas: A Foundational Python Library for Data Analysis and Statistics. Python High Perform. Sci. Comput. 2011,
14, 1–9.
21. Vinutha, H.P.; Poornima, B.; Sagar, B.M. Detection of Outliers Using Interquartile Range Technique from Intrusion Dataset. In
Information and Decision Sciences, Proceedings of the 6th International Conference on FICTA, Bhubaneswar, India, 14–16 October 2017;
Satapathy, S.C., Tavares, J.M.R.S., Bhateja, V., Mohanty, J.R., Eds.; Springer: Singapore, 2018; pp. 511–518.
22. Tukey, J.W. Exploratory Data Analysis; Addison-Wesley Pub. Co.: Reading, MA, USA, 1977; ISBN 978-0-201-07616-5.
23. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90–95. [CrossRef]
24. American Concrete Institute. Building Code Requirements for Structural Concrete (ACI 318-19) and Commentary; American Concrete
Institute: Farmington Hills, MI, USA, 2019.
25. McKinney, W. Data Structures for Statistical Computing in Python. Proc. Python Sci. Conf. 2010, 445, 56–61.
26. Waskom, M.L. Seaborn: Statistical Data Visualization. J. Open Source Softw. 2021, 6, 3021. [CrossRef]
27. O’Brien, R.M. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Qual. Quant. 2007, 41, 673–690. [CrossRef]
28. Hover, K.C. The Influence of Water on the Performance of Concrete. Constr. Build. Mater. 2011, 25, 3003–3013. [CrossRef]
29. Hashemi, M.; Shafigh, P.; Karim, M.R.B.; Atis, C.D. The Effect of Coarse to Fine Aggregate Ratio on the Fresh and Hardened
Properties of Roller-Compacted Concrete Pavement. Constr. Build. Mater. 2018, 169, 553–566. [CrossRef]
30. Iqbal Khan, M.; Abbass, W.; Alrubaidi, M.; Alqahtani, F.K. Optimization of the Fine to Coarse Aggregate Ratio for the Workability
and Mechanical Properties of High Strength Steel Fiber Reinforced Concretes. Materials 2020, 13, 5202. [CrossRef] [PubMed]
31. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
33. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY,
USA, 2009; ISBN 978-0-387-84857-0.
34. Massey, F.J., Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46, 68–78. [CrossRef]
35. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A
System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and
Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
36. Chollet, F. Fchollet/Keras-Resources. Available online: https://github.com/fchollet/keras-resources (accessed on 20 November 2024).
37. Bayesian Optimization in Action. Available online: https://www.manning.com/books/bayesian-optimization-in-action
(accessed on 15 January 2025).
38. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian Optimization of Machine Learning Algorithms. Adv. Neural Inf. Process.
Syst. 2012, 25. Available online: https://arxiv.org/abs/1206.2944 (accessed on 15 January 2025).
39. Taylor, K.E. Summarizing Multiple Aspects of Model Performance in a Single Diagram. J. Geophys. Res. Atmos. 2001, 106,
7183–7192. [CrossRef]
40. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv
2020, arXiv:2010.16061.
41. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
42. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
43. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
