Machine Learning Forecasting Models of Disc Cutters Life of Tunnel Boring Machine
Automation in Construction
journal homepage: www.elsevier.com/locate/autcon
ARTICLE INFO

Keywords: Tunneling; Tunnel boring machine (TBM); Machine learning (ML); TBM disc cutter life

ABSTRACT

This study proposes four machine learning methods, Gaussian process regression (GPR), support vector regression (SVR), decision trees (DT), and K-nearest neighbors (KNN), to predict the disc cutter life of a TBM. 200 datasets monitored during the Alborz service tunnel construction in Iran, including TBM operational parameters, geometry, and geological conditions, were applied in the models. The 5-fold cross-validation method was used to investigate the prediction performance of the models. The GPR model, with R2 = 0.8866 and RMSE = 107.3554, was the most accurate model for predicting TBM disc cutter life. The KNN model, with R2 = 0.1753 and RMSE = 288.9277, was the least accurate. To assess each parameter's contribution to the prediction problem, the backward selection method was used. The results showed that the TF, RPM, PR, and Qc parameters contribute significantly to TBM disc cutter life. Of these, RPM and PR were, respectively, the most and the least significant.
1. Introduction

The full-face rock tunnel boring machine (TBM) has found many applications owing to the rapid development of national construction and underground engineering technology. Since the disc cutter is considered one of the most significant parts of a TBM, it has been a subject of interest for several researchers [1–3]. One of the most common topics in TBM disc cutter research is disc cutter life and disc cutter wear, which has significant practical and economic value in the tunneling process. A disc cutter is generally used as a rolling rock-breaking tool on a hard rock TBM cutterhead. During the cutterhead working process, the disc cutter is in direct contact with the hard rock. The disc cutters roll and, under the cutterhead thrust and torque, grind the hard rock.

In the last two decades, the use of rock TBMs in tunneling projects has increased remarkably worldwide. Hence, the accurate prediction of TBM disc cutter life in different conditions can be essential. Several prediction models or adjustment factors for estimating TBM disc cutter life and disc cutter wear have been developed [3,4].

The studies performed on disc cutter wear and disc cutter life prediction can be roughly categorized into two groups. In some studies, the disc cutter wear is predicted from the mechanical computation of the interaction between the rocks and the cutters. Plinninger et al. [5] investigated the Cerchar Abrasivity Index (CAI) based on the experimental conditions and rock mass properties. Michalakopoulos et al. [6] worked on the CAI considering the effect of steel styli. Other studies have instead predicted the disc cutter life from statistical relationships between the rock condition, cutter life, and TBM behavior. Hassanpour [7] and Liu et al. [8] established mathematical equations for predicting TBM disc cutter life by performing single and multiple regression analyses. In their study, several rock properties, including UCS, CAI, quartz content, Vickers hardness number of rock
* Corresponding author.
E-mail addresses: [email protected] (A. Mahmoodzadeh), [email protected] (M. Mohammadi), [email protected] (H. Hashim
Ibrahim), [email protected] (S. Nariman Abdulhamid), [email protected] (H. Farid Hama Ali), [email protected] (A. Mohammed
Hasan), [email protected] (M. Khishe), [email protected] (H. Mahmud).
https://doi.org/10.1016/j.autcon.2021.103779
Received 18 January 2021; Received in revised form 18 April 2021; Accepted 19 May 2021
Available online 24 May 2021
0926-5805/© 2021 Elsevier B.V. All rights reserved.
A. Mahmoodzadeh et al. Automation in Construction 128 (2021) 103779
Despite the merits of various ML approaches, according to the No-Free-Lunch (NFL) theorem, no single ML model is the best method for all engineering problems. Therefore, researchers have tried to evaluate the efficiency of various ML approaches for solving various optimization problems. In line with the NFL theorem, we use four ML models with different features and capabilities: KNN, GPR, SVR, and DT. The key features of these models, which motivated us to use them, are as follows:

• Regression Analysis

Regression analysis is used to predict a continuous target variable from one or multiple independent variables. Typically, regression analysis is used with naturally occurring variables rather than variables that have been manipulated through experimentation. As stated above, there are many different types of regression, so once we have decided that regression analysis should be used, how do we choose which regression technique should be applied?

• We chose GPR because:
- GPR directly captures the model uncertainty. For example, in regression, GPR directly gives a distribution for the predicted value, rather than just one value as the prediction. This uncertainty is not directly captured in neural networks.
- When using GPR, we can add prior knowledge and specifications about the shape of the model by selecting different kernel functions. For example, based on the answers to the following questions, we may choose different priors. Is the model smooth? Is it sparse? Should it be able to change drastically? Should it be differentiable? This capability gives researchers flexible models, which can be fitted to various kinds of datasets.

• We chose SVR because:

Fig. 1. Overall procedure of TBM disc cutter life prediction using ML techniques. [Flowchart labels: thrust force, cutter rotation speed, penetration rate, screw rate, grouting pressure, soil pressure, disc cutter life; K-fold CV (K = 5); training set, testing set; data normalization; AI algorithms (GPR, SVR, DT, KNN); statistical evaluation indices (R2, MAE, RMSE, MAPE); results comparison; identify the best prediction model; feature selection; identify the most effective features.]
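As a rough illustration of the first GPR point above (a predictive distribution rather than a single value), the following is a minimal sketch of the standard Gaussian process posterior for one scalar input. It is not the MATLAB configuration used in the study: the zero prior mean, the squared-exponential kernel with unit hyperparameters, the small noise term added to the diagonal, and the toy data are all assumptions made for illustration.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior(x_train, y_train, x_new, noise_var=1e-6):
    """Posterior mean and variance at x_new, assuming a zero prior mean."""
    n = len(x_train)
    # Kernel matrix of the observed points, with noise on the diagonal.
    K = [[sq_exp_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(x_new, xi) for xi in x_train]
    alpha = solve(K, y_train)   # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

# Toy data (illustrative only, not from the paper).
x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 0.5]
mean_near, var_near = gp_posterior(x_train, y_train, 1.0)   # at an observed point
mean_far, var_far = gp_posterior(x_train, y_train, 10.0)    # far from all data
```

At an observed input the predictive variance is close to zero, while far from the data it returns to the prior variance and the mean returns to the prior mean. This is the sense in which the model reports its own uncertainty.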
KNN is a non-parametric method we used for prediction in this paper. It is one of the easiest ML approaches that has been recently used. It is a lazy learning model with local approximation. We use this model considering the following terms:

The key advantages:

- K value: how many neighbors participate in the KNN algorithm. k should be tuned based on the validation error.
- Distance function: Euclidean distance is the most used similarity function. Manhattan distance, Hamming distance, and Minkowski distance are alternatives.
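The two KNN settings named above, the number of neighbors and the distance function, can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the study; the function names, the candidate values of k, and the toy data are assumptions.

```python
import math

def minkowski(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Average the targets of the k training points closest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], query, p))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 2, 3), p=2):
    """Return the candidate k with the lowest RMSE on the validation set."""
    def rmse(k):
        errs = [(knn_predict(train_X, train_y, x, k, p) - y) ** 2
                for x, y in zip(val_X, val_y)]
        return math.sqrt(sum(errs) / len(errs))
    return min(candidates, key=rmse)

# Toy data (illustrative only, not from the paper): y = 2 * x.
train_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
train_y = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
val_X = [[1.5], [3.5]]
val_y = [3.0, 7.0]
best_k = tune_k(train_X, train_y, val_X, val_y)
```

Because every prediction sorts the whole training set by distance, the cost is paid at query time rather than at training time, which is what "lazy learning" refers to.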
Fig. 5. Open gripper hard rock TBM used for the excavation of Alborz service tunnel.

Fig. 6. Structure of a TBM disc cutter [31].

- The general difference between KNN and other models is the large real-time computation needed by KNN.

2. Case description

The Alborz service tunnel of the Tehran–Shomal motorway project in Iran is the source of the database used in this study. The Tehran–Shomal motorway project is a new motorway, 121 km long, that connects the capital Tehran to the city of Chalus at the Caspian Sea in the north. At present, traffic crosses the Alborz mountains on narrow highways, and the journey takes 5–6 h. Once the project is finished, the travel time, with a higher average traffic volume, can be shortened to under two hours. There are more than 30 twin tunnels with dual lanes in the motorway alignment. With a length of 6400 m and an altitude of 2400 m, the Alborz tunnel would be the longest. The location of the Alborz service tunnel is shown in Fig. 2.

A service tunnel is situated between the main tunnel tubes. Its primary roles are site investigation, drainage, and service access to the main tunnels. A schematic of the Alborz twin tunnel is shown in Fig. 3. Currently, the construction of the Alborz tunnel is completed. The tunnel entrance is considered the northern mouth (to the Shomal-S), and its outlet is considered the southern mouth (to the Tehran-T). The lithology of the tunnel route is mainly composed of tuffs, andesite, anhydrite, limestone, and sandstone. The compressive strength of the rocks along the tunnel route varies from 20 to 120 MPa. The longest fault is located at ST5339–5361, where the water inflow into the tunnel is high and provides conditions for squeezing of the rocks along the tunnel pathway. The longitudinal geological map of the Alborz service tunnel is shown in Fig. 4.

An open gripper hard rock TBM manufactured by Wirth, with a diameter of 5.2 m (Fig. 5), was used to excavate the Alborz service tunnel with a constant positive gradient (~1%). The maximum overburden occurs over a length of 850 m. The first excavation step took place on 06 Sep 2004 during the erection and commissioning of the TBM. Productive excavation started on 06 Feb 2005 at TM 122. The break-through into the S-portal heading was celebrated at TM 6073 on 03 Feb 2009, after 48 months. Excavation was carried out on 919 days (63%) out of 1459 days; thus, an average advance of 6.48 m per working day was observed. The maximum advance was 30.47 m per day, 110.96 m per week, and 389.43 m per month.
Fig. 7. Different forms of TBM disc cutter wear. (a) Normal wear, (b) edge curling, (c) cutter ring partial wear, (d) cutter ring fracture, (e) cutter ring crack, (f) seal
and bearing failure [30].
Fig. 8. (a) Number of disc cutters replacement for all disc cutter positions on the cutterhead, (b) Pie chart of each normal and abnormal wear form.
3. Analysis of TBM disc cutter wear and disc cutter life

A TBM disc cutter is made up of various components, each of which has a specific task. The structure of a TBM disc cutter is shown in Fig. 6. TBM disc cutters wear normally or abnormally. In normal wear, the entire disc cutter ring wears out almost evenly. In abnormal wear, on the other hand, there can be edge curling, partial ring wear, ring fracture, ring crack, and seal and bearing failure [28,30]. The different forms of TBM disc cutter wear are shown in Fig. 7. If abnormal wear is observed on a disc, the disc needs to be replaced immediately.

In the Alborz service tunnel, the disc cutter positions on the cutterhead were numbered so that the influence of disc cutter position on cutter life can be determined, as in Fig. 5. During the excavation of the Alborz service tunnel, 214 disc cutters were replaced. In Fig. 8(a), the overall number of disc cutter changes in the Alborz service tunnel is presented for all disc cutter positions. As shown in Fig. 8(b), 66.35% of disc cutter wear is reported as normal, and the rest (33.65%) as abnormal. According to Fig. 8(a), the further the position is from the center of the cutterhead toward the edges, the more the discs are worn and the more often they are replaced.

A disc cutter's life is the length of time it can be used before it needs to be replaced. Following previous publications [28], Eqs. (1)–(3) are employed as three methods to predict disc cutter life.

\[ H_m = \frac{L}{N_{TBM}} \tag{1} \]

\[ W_m = \frac{N_{TBM}}{L} \tag{2} \]

\[ H_f = \frac{H_m \, \pi \, d_{TBM}^2}{4} \tag{3} \]
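A minimal sketch of Eqs. (1)–(3) in Python follows. The function name and the example length and cutter count are hypothetical and chosen only for illustration; the 5.2 m diameter is the TBM diameter reported for this project.

```python
import math

def cutter_life(tunnel_length_m, cutters_changed, tbm_diameter_m):
    """Return (Hm, Wm, Hf) as defined in Eqs. (1)-(3)."""
    h_m = tunnel_length_m / cutters_changed            # Eq. (1), m/cutter
    w_m = cutters_changed / tunnel_length_m            # Eq. (2), cutter/m
    h_f = h_m * math.pi * tbm_diameter_m ** 2 / 4.0    # Eq. (3), m3/cutter
    return h_m, w_m, h_f

# Hypothetical example: 100 m bored with 2 cutter changes, 5.2 m diameter.
h_m, w_m, h_f = cutter_life(100.0, 2, 5.2)
```

Hm and Wm are reciprocals of each other, and Hf rescales Hm by the excavated face area, so the three measures carry the same information in different units.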
Fig. 9. Data distribution and correlation of the TF, RPM, SR, PR, GP, SP, SE, Qc, and H parameters to the output parameter of Hf.
Table 1
A brief review on the database used in this study.
TF [kN] RPM [rev/min] SR [rev/min] PR [mm/rev] GP [kPa] SP [kPa] SE [kWh/m3] Qc [%] H [m] Hf [m3/cutter]
count 200 200 200 200 200 200 200 200 200 200
mean 30,940 1.6040 10.2680 28.1970 327.450 190.900 3.9733 9.6102 395.404 1896.410
std 6461 0.1961 4.6916 7.2329 63.3784 39.1260 1.4052 6.7656 97.8027 308.4626
min 19,000 1.2000 2.6000 14.000 210.000 120.000 1.3700 0.0000 96.6000 790.0000
25% 25,950 1.5000 5.5000 22.000 270.000 160.000 2.9975 4.1750 325.450 1759.500
50% 30,700 1.6000 11.000 28.000 330.000 185.000 3.7200 7.2000 381.800 1934.000
75% 35,250 1.8000 14.200 33.000 370.000 220.000 4.8425 13.775 433.550 2054.750
max 45,600 1.9000 19.600 45.000 470.000 280.000 8.4600 26.300 811.900 2850.000
where Hm is the average length of tunnel bored in m/cutter, Hf is the volume of rock excavated for each cutter change in m3/cutter, Wm is the number of cutters changed per rolling distance of excavated soil in cutter/m, NTBM is the total number of disc cutters changed, L is the tunnel length excavated for each full dressing of the head, and d is the tunnel diameter.

Among the three above equations, Hf has been identified as the most appropriate parameter in several projects to estimate the TBM disc cutter life [7,28]. Therefore, Hf is considered in this study to predict the disc cutter life of the TBM used in the Alborz service tunnel.

4. Database

In this study, a database including 200 datasets obtained during the
and the output. This shows that the output value cannot be obtained with one of the parameters alone. In this case, it is necessary to consider all the parameters affecting the model's output simultaneously.

Table 1 provides a brief review of the data used. The observed data includes geology conditions and operational parameters of the TBM, including thrust force (TF), excavation depth (H), soil pressure (SP), cutter rotation speed (RPM), disc cutter life (Hf), grouting pressure (GP), quartz content (Qc), penetration rate (PR), specific energy (SE), and screw rate (SR).

Table 3
The optimized hyper-parameters of the SVR model.
Parameter         Value or type
Kernel Function   'Medium Gaussian'
Epsilon           21.9681
Solver            'SMO'
Bias              1835.1
In order to employ the database in the prediction models, the K-fold CV (K = 5) was used to categorize the datasets into two groups of training and testing. To obtain robust outcomes from the analysis, the datasets were randomly separated into two equal-sized portions (i.e., K and K1 sub-samples). To validate and test the models, the K sub-samples and the K1 sub-samples were employed, respectively.

5. Statistical evaluation indices

To evaluate the accuracy of the forecasting models, some statistical evaluation indices, including the coefficient of determination (R2), root mean square error (RMSE), and mean absolute percentage error (MAPE), are taken into account. The formulas for calculating these indices are presented below.

\[ R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}}, \qquad \mathrm{SSR} = \sum_{i=1}^{n} (y_i - y_i')^2, \quad \mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{4} \]

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i'}{y_i} \right| \times 100\% \tag{5} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_i')^2} \tag{6} \]

where $y_i$ is the actual value, $y_i'$ is the predicted value, $\bar{y}$ and $\bar{y}'$ are the means of the actual and predicted values, n is the number of samples, SSR is the sum of squared residuals, and SST is the total sum of squares.

6. Prediction models of disc cutter life and results

6.1. GPR

A Gaussian process (GP) is a set F of random variables $F_{x_1}, F_{x_2}, \ldots$ for which any finite subset of the variables has a joint multivariate Gaussian distribution. The variables are indexed by elements x of a set X. For any finite-length vector of indices $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, there is a corresponding vector $F_{\mathbf{x}} = [F_{x_1}, F_{x_2}, \ldots, F_{x_n}]^T$ of variables that has a multivariate Gaussian (or normal) distribution [21],

\[ F_{\mathbf{x}} \sim \mathcal{N}\{\mu(\mathbf{x}), k(\mathbf{x}, \mathbf{x})\} \tag{7} \]

where $\mu(\mathbf{x})$ is given by a mean function $\mu(x_i)$, and k is the kernel function. The kernel takes two indices $x_i$ and $x_j$ and gives the covariance between their corresponding variables $F_{x_i}$ and $F_{x_j}$. Given vectors of indices $\mathbf{x}_i$ and $\mathbf{x}_j$, k returns the matrix of covariances between all pairs of variables where the first in the pair comes from $F_{\mathbf{x}_i}$ and the second from $F_{\mathbf{x}_j}$. Note that each $F_{x_i}$ is marginally Gaussian, with mean $\mu(x_i)$ and variance $k(x_i, x_i)$ [22].

Assume there is a function f(x) to be optimized. Suppose further that f cannot be observed directly, but that a random variable $F_x$ can be observed that is indexed by the same domain as f and whose expected value is f, i.e., $\forall x \in X,\ E[F_x] = f(x)$. It is assumed that prior beliefs about the function f are described by a Gaussian process with prior mean $\mu$ and kernel k. Assume that $F_x$ is an observation of f(x) that has been corrupted by zero-mean, i.i.d. Gaussian noise, i.e., $F_x = f(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$. Then f(x) is a hidden variable whose posterior distribution can be inferred after observing samples of $F_x$ at various locations in the domain. The resulting inference is called Gaussian process regression [23].

Let $\mathbf{x}$ be the set of observation points and $F_{\mathbf{x}}$ be the resulting real-valued observations. The posterior distribution at some new point $\hat{x} \in X$ needs to be computed. This distribution will be Gaussian with mean and variance

\[ \mu(\hat{x} \mid \mathbf{x}) = \mu(\hat{x}) + k(\hat{x}, \mathbf{x})\, k(\mathbf{x}, \mathbf{x})^{-1} \left( F_{\mathbf{x}} - \mu(\mathbf{x}) \right) \tag{8} \]

\[ \sigma^2(\hat{x} \mid \mathbf{x}) = k(\hat{x}, \hat{x}) - k(\hat{x}, \mathbf{x})\, k(\mathbf{x}, \mathbf{x})^{-1}\, k(\mathbf{x}, \hat{x}) \tag{9} \]

Note that the posterior computation involves inverting the kernel matrix of the observed domain points, which can therefore be computed once and used to evaluate the posterior at many points in the domain. Since the problem is to find an optimum of the unknown function, the final step is to compute the optimum of the resulting posterior mean, $x_R = \arg\max_{\hat{x} \in X} \mu(\hat{x} \mid \mathbf{x})$. This cannot be computed in closed form and requires some method for function optimization. Although not ideal, another approach is to return the $x_i$ from $\mathbf{x}$ with the largest observed value $F_{x_i}$. This has the practical side effect that the method will not return a point that has never been evaluated, which provides some protection against an inaccurate prior.

In this work, the Regression Learner app of Matlab 2019 was employed to predict disc cutter life using the GPR method. In this prediction, four GPR models, namely exponential, squared exponential, rational quadratic, and Matérn 5/2, were tested separately in the Matlab app. At the end of the process, the most accurate model was selected. Each of the four models has a wide range of hyper-parameters, and the values of these hyper-parameters determine the models' performance. Therefore, the optimization mode of the Matlab app was enabled to obtain the results from the models. Table 2 shows the selected parameters of the most powerful GPR model adopted in this study.

Three primary hyperparameter tuning approaches have been proposed in literature, including random search, grid search, metaheuristic-
based search (smart search) [32,33], although there are also other, less popular approaches [34,35]. Contrary to the "dumb" alternatives of grid search and random search, metaheuristic-based hyperparameter tuning, including particle swarm optimization (PSO), is much less parallelizable. Instead of producing all the candidate points up front and investigating the batch in parallel, metaheuristic-based tuning approaches pick a few hyperparameter settings, investigate their performance, then determine where to sample next. These methods are intrinsically iterative and sequential, and therefore not parallelizable. On the one hand, making fewer evaluations and reducing the total time complexity are the primary goals of any computation algorithm. On the other hand, metaheuristic-based search algorithms require computation time to find out where to place the next set of samples. Besides, metaheuristic-based search algorithms also contain parameters of their own that need to be tuned. These shortcomings motivated us to choose random search, which has the least time and space complexity.

Considering the Bergstra theorem [32], which states that "if the close-to-optimal region of hyperparameters occupies at least 5% of the grid surface, then random search with 60 trials will find that region with high probability", the hyperparameters of the GPR and SVR methods, which are shown in Tables 2 and 3, were optimized with a random search approach as follows: the values of each hyperparameter are modeled with an exponential probability density function, which produces values for each hyperparameter to be investigated according to the models' performance in the validation set. After 60 iterations with differently

Fig. 10. Comparison of the disc cutters life predicted by the GPR model with the actual ones.

Fig. 11. Disc cutters life predicted by the GPR model vs. the actual mode.

Fig. 12. Comparison of the disc cutters life predicted by the SVR model with the actual ones.
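The random search described in this section can be sketched as below, together with the arithmetic behind the quoted 60-trial statement: if each independent trial lands in the close-to-optimal region with probability 0.05, the chance that at least one of 60 trials does so is 1 − 0.95^60, about 0.954. The code is a minimal illustration and not the tuning routine used in the study; the hyperparameter names, the exponential rates, and the toy score function are hypothetical.

```python
import random

def hit_probability(region_fraction=0.05, trials=60):
    """Chance that at least one independent uniform trial lands in the region."""
    return 1.0 - (1.0 - region_fraction) ** trials

def random_search(score_fn, rates, trials=60, seed=0):
    """Draw each hyperparameter from an exponential density and keep the best.

    rates maps a hyperparameter name to the rate of its exponential
    distribution (assumed values); score_fn returns a validation score
    where higher is better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.expovariate(rate) for name, rate in rates.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in for a validation score, peaked at epsilon=2, kernel_scale=1."""
    return -((params["epsilon"] - 2.0) ** 2) - (params["kernel_scale"] - 1.0) ** 2

best_params, best_score = random_search(
    toy_score, {"epsilon": 0.5, "kernel_scale": 1.0})
```

Each trial is independent of the others, so the 60 evaluations could be run in parallel, which is the property contrasted above with metaheuristic-based search.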
6.2. SVR

SVR preserves all of the critical features of the standard Support Vector Machine (SVM) algorithm. The model is derived from classification with SVM. Consequently, SVM concepts for classification are similarly used for SVR, but a few minor variations allow the algorithm to be used as an efficient tool for estimating real-valued functions. SVR provides the flexibility to specify how much error can be tolerated and finds an appropriate line or hyperplane for data-fitting in higher dimensions. It is also defined by regulating the number of support vectors and the margin using the sparse solution, kernels, and Vapnik-Chervonenkis (VC) theory. Although SVR is not as wide-ranging as SVM, it has nevertheless been applied in many research fields, including but not limited to control systems, bioinformatics, electric loads and consumption, customer demand, finance, tourism demand, air quality, prices in the market, and flood control.

Grid-search with cross-validation is probably the most common approach to tuning SVR models. However, we must pay attention to the time ordering of our data. For example, a sliding-window cross-validation accounts for data ordering, whereas standard cross-validation does not, which might be inappropriate for our data.

Coming back to SVR parameters, given that we typically need to tune

Table 4
The optimal hyper-parameters considered in the DT method.
Parameter            Value or type
PredictorSelection   'allsplits'
SplitCriterion       'mse'
Prune                'on'
MaxNumSplits         199
MinLeafSize          4
MinParentSize        10

DT is one of the classification and regression methods based on the non-parametric supervised learning technique. It consists of a set of if-then-else decision rules. The best prediction of the model occurs when the DT goes deeper and deeper to make the best fit with the actual data. There are several advantages of the DT. First, no assumption is required about the distribution of the explanatory variables. Second, strong relations among independent variables do not affect the DT outcomes. Third, various types of dependent variables, such as survival data, categorical data, and numerical data, can be covered by DT. Fourth, this technique retains the most powerful variables and eliminates the least powerful variables that describe the dependent variable. The DT can predict both small and large datasets well, even though this technique was initially developed for large data only [14].

The algorithm of DT can be explained as follows:

1. First, the variance of the target is calculated.
2. Based on the various attributes, the database is divided into distinct parts, and the variance of each part is deducted from the variance before the division. This is defined as variance reduction.
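The two steps above can be sketched for a single numerical attribute as follows. This is a minimal illustration rather than the MATLAB tree-fitting routine used in the study; weighting each part's variance by its share of the samples is an assumption (the usual convention for regression trees), and the toy data are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(xs, ys, threshold):
    """Target variance before the split minus the weighted variance after it."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty reduces nothing
    n = len(ys)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(ys) - weighted

def best_split(xs, ys):
    """Threshold on one attribute with the largest variance reduction."""
    candidates = sorted(set(xs))[:-1]  # the largest value cannot split the data
    return max(candidates, key=lambda t: variance_reduction(xs, ys, t))

# Toy data (illustrative only): the target jumps between x = 2 and x = 3.
xs = [1, 2, 3, 4]
ys = [10.0, 10.0, 20.0, 20.0]
split = best_split(xs, ys)
```

A full tree repeats this search over every attribute and then recurses into each part until a stopping rule, such as a minimum leaf size, is met.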
Fig. 14. Comparison of the disc cutters life predicted by the DT model with the actual ones.
Fig. 16. Comparison of the disc cutters life predicted by the KNN model with the actual ones.
analyzing and comparing the values of the obtained statistical evaluation indices for each model, it can be concluded that the highest accuracy and the lowest accuracy are provided by the GPR and KNN models, respectively. Therefore, the most acceptable results, which are not very different from the actual ones, are provided by the GPR model. After the GPR model, the highest prediction accuracy is generated by the SVR model.

7. Discussion

Three primary hyperparameter tuning approaches have been proposed in the literature: random search, grid search, and metaheuristic-based search (smart search), although there are also other, less popular approaches. Contrary to the "dumb" alternatives of grid search and random search, metaheuristic-based hyperparameter tuning, including PSO, is much less parallelizable. Instead of producing all the candidate points up front and investigating the batch in parallel, metaheuristic-based tuning approaches pick a few hyperparameter settings, investigate their performance, then determine where to sample next. These methods are intrinsically iterative and sequential, and therefore not parallelizable. On the one hand, making fewer evaluations and reducing the total time complexity are the primary goals of any computation algorithm. On the other hand, metaheuristic-based search algorithms require computation time to determine where to place the next set of samples. Besides, metaheuristic-based search algorithms also contain parameters of their own that need to be tuned. These shortcomings motivated us to choose random search, which has the least time and space complexity.

In this article, considering the Bergstra theorem, which states that "if the close-to-optimal region of hyperparameters occupies at least 5% of the grid surface, then random search with 60 trials will find that region with high probability", the hyperparameters of the GPR and SVR methods were optimized with a random search approach as follows: the values of each hyperparameter are modeled with an exponential probability density function, which produces values for each hyperparameter to be investigated according to the models' performance in the validation set. After 60 iterations with differently produced values from the probability functions, the hyperparameters that generated the highest accuracy are selected and stored. The optimal hyperparameters are chosen based on the median of the hyperparameters' values calculated for each fold. Finally, the 5-fold cross-validation was repeated to guarantee that all test instances are investigated with the optimal hyperparameters, which leads to more stable and practical results.

The performance of the proposed model was investigated with datasets of different sizes. The sizes of the datasets and the model's performances are tabulated in Table 6.

Table 6
R2 and RMSE of the simulated outputs.
Predictive model   The training datasets   Datasets for simulation   R2       RMSE
Model I            100                     40                        0.6944   183.9310
Model II           130                     40                        0.7583   146.7382
Model III          160                     40                        0.8866   107.3554

Both GPR and SVR are memory-based methods that store a part or all of the training data for testing. Therefore, their training is generally fast and they can improve the efficiency of the massive-training methodology. GPR approaches nonlinear regression from a Bayesian perspective. The Bayesian paradigm provides probabilistic modeling of nonlinear regression. The Bayesian approach to regression specifies an a priori probability of the parameters to be estimated and computes the maximum a posteriori probability given the observed data samples. Contrary to non-Bayesian schemes, where some criterion typically chooses a single parameter, the Bayesian probabilistic model produces both the optimal estimated function and the covariance associated with the estimation. Therefore, the Bayesian paradigm offers more information on the estimated parameters than does the non-Bayesian methodology. On the other hand, rooted in a maximum margin property, SVR offers excellent generalization ability and robustness to outliers. Both SVR and GPR are kernel-based nonlinear regression techniques. A kernel or a covariance function is used to implicitly transform the original data into a high-dimensional reproducing kernel Hilbert space. Therefore, both SVR and GPR, as state-of-the-art nonlinear regression models, can offer a performance comparable or potentially superior to other ML models.

Although DT does not require normalization and rescaling of data, a small change in the data can cause a large change in the decision tree structure, causing instability.

KNN can be very sensitive to the scale of data as it relies on computing distances. The calculated distances can be very high for features with a higher scale and might produce poor results. Herein, although the data have been normalized, due to the large gap between the scales of the features, KNN provides poor results.

The normalization technique was applied in this study. If one feature has a wide range of values, that feature will govern the computed distance. For this reason, the ranges of all features should be normalized (scaled) so that each feature has values in the same range.

In this study, the generalization of the suggested Gaussian process regression methodology is discussed. Generalization is a concept used to characterize a model's ability to interact with and adapt to new information. A model that generalizes well can ingest novel data, not used during training, and still predict accurately. The basis for a model's success and its practical performance is related to its capability to generalize. When a model is fitted too closely to the training data, it cannot generalize. When new data are given, such a model produces inaccurate predictions and becomes worthless, even if it can accurately predict the training data. The model starts 'memorizing' the training data instead of 'learning'; this is known as overfitting.

Feature selection can be used to avoid overfitting of the model. In this case, feature selection minimizes the number of features, which decreases the computational complexity of the model. A stepwise approach is used to choose an important subset of features from the full set of available features.
Table 7
First step of feature selection.
Estimate SE tStat p-value Significance code
Table 8
The second step of feature selection.
Term          Estimate     SE          tStat     p-value      Significance code
(Intercept)   3717         171.16      21.717    6.02E-54     ***
TF            −0.016222    0.0026838   −6.0442   7.50E-09     ***
RPM           −803.29      86.075      −9.3325   2.35E-17     ***
PR            2.7451       2.4229      −8.133    2.5861E-11   ***
Qc            −11.202      2.5344      −4.4199   1.64E-05     ***
Stepwise regression methods can be classified into three strategies:
• The first strategy (forward selection) begins without any predictors in the model and iteratively adds predictors. It stops when adding a predictor no longer yields a statistically significant improvement.
• The second strategy (backward selection) begins with all predictors in the model and iteratively eliminates the least contributive predictor. It stops once all predictors remaining in the model are statistically significant.
• The third strategy (stepwise selection) is a mixture of the forward and backward processes. It begins without any predictors and then sequentially adds the predictors that contribute most to the outcome (like forward selection). After each new variable is added, any variable that no longer improves the model's fit is removed (like backward selection).
In this research, the stepAIC function [MASS package] was used, which selects the best model by AIC. The function has an option called direction, which takes these values: i) forward (forward selection); ii) backward (backward elimination); iii) both (sequential replacement, combining forward selection and backward elimination). The best final model is returned. In R, stepAIC is among the most popular search methods for selecting features. It iteratively tries to reduce the AIC value in order to arrive at the final feature set. In the tables, three asterisks (***) mark a highly significant p-value. A small p-value for the intercept and the slope allows the null hypothesis to be rejected, which indicates a relationship between the two measured variables (the target and the predictor). A p-value of around 5% or less is a good cut-off point for most situations.
During the first stage, the model was fitted with all the predictors and the target. The lowest p-values (which must be less than 0.05) belong to TF, RPM, PR, and Qc, as shown in Table 7. These four parameters are therefore chosen, and step 2 is taken. The model now has only four predictors: TF, RPM, PR, and Qc. Table 8 shows that the smallest feature set is {TF, RPM, PR, Qc} and that the parameter with the highest impact on disc cutter life is RPM.
Table 9 provides the feature selection results obtained with the other ML models (SVR, DT, and KNN). As Table 9 shows, the smallest feature set selected by all the models is {TF, RPM, PR, Qc}, and the parameter with the highest impact on disc cutter life is again RPM. Tables 8 and 9 show that the feature selection results of all the models used in this paper are similar.
8. Conclusions
The process of disc cutter wear is complicated and has many influential factors. Cutter life is an important economic index for TBM excavation, and its prediction is of wide concern. This study introduced a method for predicting the life of each TBM cutter, based on regression analysis of the cutter changing records from the excavation of the Alborz service tunnel in Iran, using four ML methods. 200 datasets monitored during the Alborz tunnel construction, including geological conditions, geometry, and operational parameters of the TBM, were applied in the models. Two software packages, MATLAB 2019 and Python, were used to analyze the prediction models. To achieve more accurate predictions, the hyper-parameters of the ML models were optimized. The 5-fold CV method was applied to investigate the prediction efficiency of the models. The validity of the prediction models was examined by comparing their predicted results with the monitored ones, which showed reasonable agreement.
The results indicated that the GPR model, with R2 = 0.8866, RMSE = 107.3554, and MAPE = 4.018355%, was the most accurate model for predicting TBM disc cutter life. The KNN model, with R2 = 0.1753, RMSE = 288.9277, and MAPE = 11.46970%, was the least accurate.
The backward selection method was used to assess the contribution of each parameter to the prediction problem. The results showed that four parameters, TF, RPM, PR, and Qc, contribute significantly to TBM disc cutter life. However, the RPM and PR parameters were, respectively, more and less significant than the others.
It is suggested that in future works the models presented here be used to predict disc cutter life in other tunnels, using newer data with various input parameters, so that the most accurate algorithms can be identified and the parameters that most affect TBM disc cutter life in those tunnels can be specified. Also, given that a variety of geological and geotechnical issues can be very important to predict, it is suggested that the prediction models presented in this work be applied to them and that their ability to solve these problems be examined.
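The evaluation protocol summarised in the Conclusions (5-fold cross-validation scored with R2, RMSE, and MAPE) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' MATLAB or Python code: the data are synthetic, an ordinary least-squares model stands in for GPR, SVR, DT, and KNN, and the helper names `regression_metrics` and `k_fold_scores` are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAPE in percent) for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    mape = 100.0 * np.mean(np.abs(residual / y_true))  # assumes no zero targets
    return r2, rmse, mape


def k_fold_scores(X, y, k=5, seed=0):
    """Fit least squares on k-1 folds and score the held-out fold, k times."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Prepend a column of ones so the model has an intercept.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        scores.append(regression_metrics(y[test_idx], A_test @ coef))
    return np.array(scores)  # shape (k, 3): columns are R2, RMSE, MAPE


# Synthetic stand-in for 200 monitored records (NOT the Alborz tunnel data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 1000.0 + X @ np.array([50.0, -120.0, 30.0, -20.0]) + rng.normal(scale=25.0, size=200)

scores = k_fold_scores(X, y, k=5)
mean_r2, mean_rmse, mean_mape = scores.mean(axis=0)
print(f"R2={mean_r2:.3f}  RMSE={mean_rmse:.1f}  MAPE={mean_mape:.2f}%")
```

Averaging the per-fold scores, as done here, is one common way to report cross-validated metrics; the paper does not state whether its figures are fold averages or pooled over all held-out predictions.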
Table 9
The final step of feature selection is made by other models.
Method Estimate SE tStat p-value Significance code
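As a rough illustration of AIC-driven backward selection, the Python sketch below mimics what stepAIC with direction = "backward" does for a linear model: it starts from all predictors and keeps dropping the one whose removal lowers AIC the most, until no removal helps. It is not the authors' R code. The data are synthetic, the coefficients are arbitrary, `X5` and `X6` are hypothetical irrelevant predictors added for the demonstration, and `ols_aic` and `backward_select` are illustrative helper names.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of a least-squares fit with intercept (Gaussian, up to a constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    k = A.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + 2 * k


def backward_select(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    kept = list(range(X.shape[1]))
    best = ols_aic(X[:, kept], y)
    while len(kept) > 1:
        # AIC of every model obtained by removing exactly one remaining predictor.
        trials = [
            (ols_aic(X[:, [c for c in kept if c != drop]], y), drop)
            for drop in kept
        ]
        trial_aic, drop = min(trials)
        if trial_aic >= best:  # no single removal improves AIC: stop
            break
        best = trial_aic
        kept.remove(drop)
    return [names[c] for c in kept], best


# Synthetic data: four informative columns and two pure-noise columns.
names = ["TF", "RPM", "PR", "Qc", "X5", "X6"]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, -8.0, 5.0, -2.0, 0.0, 0.0])  # arbitrary values
y = 10.0 + X @ true_coef + rng.normal(scale=1.0, size=200)

selected, final_aic = backward_select(X, y, names)
print("selected predictors:", selected)
print("AIC of full model:  ", round(ols_aic(X, y), 2))
print("AIC of final model: ", round(final_aic, 2))
```

Because AIC penalises each extra coefficient only mildly, a noise column can occasionally survive the search; the informative columns are retained because removing any of them sharply increases the residual sum of squares.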
Declaration of Competing Interest
There is no conflict of interest.