Orthod Craniofacial Res - 2024 - Köktürk - Evaluation of Different Machine Learning Algorithms For Extraction Decision in Orthodontic Treatment
DOI: 10.1111/ocr.12811
RESEARCH ARTICLE
1 Department of Orthodontics, Faculty of Dentistry, Başkent University, Ankara, Turkey
2 Independent Researcher, Ankara, Turkey
Correspondence
Hande Pamukçu, Department of Orthodontics, Faculty of Dentistry, Başkent University, Yukarıbahçelievler Mah. 82. Sokak No: 26, 06490 Bahçelievler, Ankara, Turkey.
Email: [email protected]
Funding information
Başkent University
Abstract
Introduction: The extraction decision significantly affects the treatment process and outcome. Therefore, it is crucial to make this decision with a more objective and standardized method. The objectives of this study were (1) to identify the best-performing model among seven machine learning (ML) models, which will standardize the extraction decision and serve as a guide for inexperienced clinicians, and (2) to determine the important variables for the extraction decision.
Methods: This study included 1000 patients who received orthodontic treatment with or without extraction (500 extraction and 500 non-extraction). The success criteria of the study were the decisions made by the four experienced orthodontists. Seven ML models were trained using 36 variables, including demographic information and cephalometric and model measurements. First, the extraction decision was predicted, and then the extraction type was identified. Accuracy and area under the curve (AUC) of the receiver operating characteristic (ROC) curve were used to measure the success of the ML models.
Results: The Stacking Classifier model, which consists of Gradient Boosted Trees, Support Vector Machine, and Random Forest models, showed the highest performance in the extraction decision with 91.2% AUC. The most important features determining the extraction decision were maxillary and mandibular arch length discrepancy, Wits Appraisal, and ANS-Me length. Likewise, the Stacking Classifier showed the highest performance with 76.3% accuracy in extraction type decisions. The most important variables for the extraction type decision were mandibular arch length discrepancy, Class I molar relationship, cephalometric overbite, Wits Appraisal, and L1-NB distance.
Conclusion: The Stacking Classifier model exhibited the best performance for the extraction decision. While ML models showed a high performance in the extraction decision, they were not able to achieve the same level of performance in the extraction type decision.
KEYWORDS
ensemble models, extraction decision, machine learning, treatment plan
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in
any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2024 The Author(s). Orthodontics & Craniofacial Research published by John Wiley & Sons Ltd.
16016343, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/ocr.12811 by Cochrane Palestinian Territory, Wiley Online Library on [22/07/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
extraction decisions. The success criterion of this study was that the decisions of the ML models were consistent with the decisions of experienced orthodontists. A total of 36 variables, consisting of demographic information, cephalometric and model measurements, were chosen to train the ML models. Twenty-nine measurements were chosen for cephalometric evaluation (Table 1). Gender, initial age, and lip posture data were obtained from the digital anamnesis forms in the university's patient archive. Angle's molar classification was determined based on the patients' initial models and intraoral photographs. Additionally, maxillary-mandibular arch length discrepancy and the curve of Spee were measured with a digital calliper.
2.2 | Reliability analysis
All data were analysed and evaluated by the same investigator (BK). Pre-treatment lateral cephalometric radiographs were measured with the Dolphin Imaging® program (Vers 11.95, Patterson Dental). The intra-examiner reliability was tested by remeasuring 20% of the radiographs 2 weeks after the first measurement. The intra-examiner reliability level was assessed using intra-class correlation coefficients (ICCs).
2.3 | Machine learning models
The extraction decision of ML models was analysed in a two-step approach. First, extraction decisions were identified by using a binary classifier. Extraction cases were labelled as 1, and non-extraction cases were labelled as 0. Then, the extraction types were identified by a multiclass classifier. The outputs of this classifier are the probabilities for each extraction type. The extraction type with maximum probability is chosen as the final decision (Figure 1).
ML models were trained with a total of 36 variables (Table 2). Seven different types of ML models were applied: Logistic Regression, Support Vector Machines, Random Forests, Gradient Boosted Trees, Multi-layer Perceptron (ANN), Voting Classifier, and Stacking Classifier. These models were implemented with Python 3.9.12 (Python Software Foundation) using the ML packages Scikit-Learn26 and LightGBM.27 The models have been made accessible as open source at: https://github.com/ortho-research/extraction-decision.
In order to explain the models and identify how the variables were affecting the inference, the SHAP (Shapley Additive explanation)28 package was used.
[...] used as training data (800 patients) and one fold was used as test data (200 patients). The ML models were trained with the four folds, and their performance on the test data was recorded. This process was repeated five times, with each fold serving as the test data once. The average performance across all iterations was calculated. This process was repeated 10 times, shuffling the data and re-dividing the data into folds. As a final metric, the average of the repeated cross-validation results was calculated. Different regularization techniques specified for each algorithm were applied to prevent overfitting. Additionally, each algorithm was tuned with different hyper-parameters, maximizing the repeated cross-validation metric on test data, and the best hyper-parameter selections for each algorithm were found.
2.5 | ML models' performance evaluation
ML models were evaluated by comparing their predictions to the actual treatment plans determined by the orthodontists for each sample. As a metric of success, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, accuracy, precision and recall were measured. The ROC curve is a graphical representation of a model's classification performance at different thresholds, and the AUC is calculated by measuring the area under this curve. A higher AUC value indicates a more successful ML model, with a value closer to 1 signifying superior performance. The equations and explanations for accuracy, precision, and recall metrics were provided in Figure S1. The three algorithms that achieved the highest performance were selected and combined using ensemble models known as voting and stacking. The percentage of patients correctly assigned to the appropriate treatment plan, along with 95% confidence intervals (CIs), was calculated. The MLxtend library was used for statistical analysis. Cochran's Q test was utilized to assess the performance of the classifiers, and pair-wise McNemar tests were conducted to examine specific differences at the P < .05 level of significance (Table S1).
3 | RESULTS
3.1 | Reliability analysis
ICC values ranged from 0.91 to 0.98, representing excellent reliability.
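The repeated 5-fold cross-validation scheme described in the methods (800 training and 200 test patients per fold, reshuffled 10 times) can be sketched as follows. This is a minimal, standard-library illustration and not the authors' implementation; the function names `repeated_kfold_indices` and `repeated_cv_score` and the `fit_and_score` callback are hypothetical, and no stratification by class is assumed because the source does not state whether the folds were stratified.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Reshuffle before every repeat so the data are re-divided into folds.
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            # Every fold except the current one forms the training set.
            train_idx = [j for k, f in enumerate(folds) if k != fold for j in f]
            yield repeat, fold, train_idx, test_idx

def repeated_cv_score(n_samples, fit_and_score, **kwargs):
    """Average a user-supplied score over all folds and all repeats."""
    scores = [fit_and_score(train, test)
              for _, _, train, test in repeated_kfold_indices(n_samples, **kwargs)]
    return sum(scores) / len(scores)

# 1000 patients, 5 folds, 10 repeats: 50 train/test splits of 800/200.
splits = list(repeated_kfold_indices(1000))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

Because all folds have equal size here, averaging over all 50 splits gives the same number as averaging the five folds within each repeat and then averaging the 10 repeats.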
TABLE 1 Variable definitions.
II division 1 was the most common malocclusion with 61.2%. On the other hand, Class I malocclusion was the most common among the non-extraction cases with 54.4% (Table 2). The mean values of the variables for the extraction and non-extraction groups are given in Table 2.
3.3 | ML models' performance results
The null hypothesis was rejected because there were differences between the seven ML models, and the model with the highest performance was the Stacking Classifier for both decisions. Although the Stacking Classifier was the best model, the ML models' success was found to be close to each other. Accuracy, AUC, precision and recall of the ML models are given in Table 3. The performances of the five models, excluding the ensemble models, are given below in order of success based on AUC values.
The models' performance for the extraction/non-extraction decision was as follows:
Gradient Boosted Trees: 83.3% accuracy (95% CI of 82.8%–83.8%) and 0.899 AUC (95% CI of 0.897–0.902).
Support Vector Machine: 82.6% accuracy (95% CI of 82.4%–82.8%) and 0.898 AUC (95% CI of 0.897–0.900).
Random Forest: 81.7% accuracy (95% CI of 81.3%–82.1%) and 0.894 AUC (95% CI of 0.892–0.896).
Logistic Regression: 82.4% accuracy (95% CI of 82.1%–82.7%) and 0.888 AUC (95% CI of 0.889–0.897).
Multi-Layer Perceptron: 81.5% accuracy (95% CI of 81%–82%) and 0.881 AUC (95% CI of 0.875–0.887).
The models' performance for the extraction type decision was as follows:
Gradient Boosted Trees: 75.9% accuracy (95% CI of 75.1%–76.7%).
Support Vector Machine: 75.3% accuracy (95% CI of 75.1%–75.5%).
Random Forest: 75.1% accuracy (95% CI of 74.6%–75.6%).
Logistic Regression: 75.7% accuracy (95% CI of 75.3%–76.1%).
Multi-Layer Perceptron: 75% accuracy (95% CI of 74.3%–75.7%).
The performances of the ensemble models are presented below in order of success based on AUC values.
The Stacking Classifier model consists of the three models with the highest performance: Random Forest, Support Vector Machine, and Gradient Boosted Trees. This model showed 84.1% accuracy (95% CI of 83.8%–84.4%) and 0.912 AUC (95% CI of 0.908–0.916) for the extraction/non-extraction decision, and 76.3% accuracy (95% CI of 76.0%–76.6%) for the extraction type decision. The ROC curve graph of the Stacking Classifier is given in Figure S2.
The Voting Classifier model consists of the three models with the highest performance: Random Forest, Support Vector Machine, and Gradient Boosted Trees. This model showed 83% accuracy (95% CI of 82.8%–83.2%) and 0.905 AUC (95% CI of 0.903–0.907) for the extraction/non-extraction decision, and 76.1% accuracy (95% CI of 75.7%–76.5%) for the extraction type decision.
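To illustrate how a voting ensemble can combine three base models, the sketch below averages per-class probabilities (soft voting) and picks the extraction type with the highest averaged probability. It is an assumption-laden illustration: the source does not state whether hard or soft voting was used, the probability values are invented for one hypothetical patient, and the helper names `soft_vote` and `decide` are not from the paper. A stacking ensemble differs in that a further model is trained on the base models' outputs instead of averaging them.

```python
def soft_vote(probabilities):
    """Average the per-class probabilities produced by several models."""
    n_models = len(probabilities)
    classes = probabilities[0].keys()
    return {c: sum(p[c] for p in probabilities) / n_models for c in classes}

def decide(probabilities):
    """Return the class with the highest averaged probability."""
    averaged = soft_vote(probabilities)
    return max(averaged, key=averaged.get)

# Hypothetical outputs of three base models for a single patient;
# the class labels are the extraction types named in the paper.
model_outputs = [
    {"44-00": 0.50, "44-44": 0.30, "44-55": 0.20},
    {"44-00": 0.20, "44-44": 0.60, "44-55": 0.20},
    {"44-00": 0.30, "44-44": 0.50, "44-55": 0.20},
]
print(decide(model_outputs))  # prints 44-44
```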
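TABLE 2
Columns: Variable; Total (n = 1000); Treatment type: Extraction (n = 500); Treatment type: Non-extraction (n = 500); P value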
Mean age ± SD (years) 14.98 ± 4.87 15.00 ± 4.12 14.95 ± 5.53 .888
Sex (n) (%)
Female 618 314 (62.8%) 304 (60.8%) .515
Male 382 186 (37.2%) 196 (39.2%)
Angle's molar relationship (n) (%)
Class I 439 167 (33.4%) 272 (54.4%) <.001*
Class II Division 1 500 306 (61.2%) 194 (38.8%)
Class II Division 2 47 23 (4.6%) 24 (4.8%)
Class III 14 4 (0.8%) 10 (2%)
Molar key (n) (%)
Class III key 14 4 10
Super Class I key 101 15 86
Class I key 338 132 206 <.001*
End-on key 211 98 113
Class II key 336 231 105
Lip posture (n) (%)
Competent 947 457 (91.4%) 490 (98%) <.001*
Incompetent 53 43 (8.6%) 10 (2%)
Model measurements (mean ± SD)
Maxillary arch length discrepancy (mm) −3.91 ± 3.69 −5.8 ± 3.48 −2.02 ± 2.81 <.001*
Mandibular arch length discrepancy (mm) −2.45 ± 2.63 −3.5 ± 2.80 −1.4 ± 1.95 <.001*
Curve of Spee (mm) 2.68 ± 0.50 2.72 ± 0.52 2.64 ± 0.47 .012*
Cephalometric measurements (mean ± SD)
SNA (°) 80.43 ± 3.79 80.31 ± 3.91 80.55 ± 3.67 .312
SNB (°) 76.59 ± 3.77 75.97 ± 3.59 77.22 ± 3.84 <.001*
ANB (°) 3.84 ± 2.60 4.34 ± 2.57 3.33 ± 2.54 <.001*
Wits Appraisal (mm) 1.41 ± 3.93 2.31 ± 4.05 0.51 ± 3.59 <.001*
GoGnSN (°) 33.88 ± 5.89 34.87 ± 5.97 32.88 ± 5.64 <.001*
FMA (°) 27.56 ± 5.51 28.55 ± 5.70 26.57 ± 5.14 <.001*
Sum of the angles (°) 396.84 ± 5.83 397.8 ± 5.94 395.88 ± 5.57 <.001*
ANS-Me (mm) 62.90 ± 6.50 63.88 ± 5.92 61.91 ± 6.89 <.001*
U1-NA (mm) 4.46 ± 2.95 4.83 ± 3.07 4.09 ± 2.78 <.001*
U1-NA (°) 22.74 ± 8.80 23.69 ± 9.17 21.80 ± 8.32 .001*
U1-FH (°) 112.45 ± 8.92 113.25 ± 9.27 111.65 ± 8.49 .005*
U1-PP (°) 111.56 ± 8.51 112.21 ± 8.92 110.92 ± 8.03 .016*
L1-NB (mm) 5.23 ± 2.51 5.70 ± 2.55 4.76 ± 2.38 <.001*
L1-NB (°) 26.06 ± 6.83 26.87 ± 6.93 25.25 ± 6.65 <.001*
IMPA (°) 92.62 ± 7.71 93.10 ± 7.86 92.15 ± 7.54 .053
L1-A-Pg (mm) 2.23 ± 2.57 2.43 ± 2.66 2.03 ± 2.46 .015*
Interincisal angle (°) 127.36 ± 12.2 125.1 ± 12.34 129.62 ± 11.64 <.001*
Cephalometric overjet (mm) 4.5 ± 2.59 5.03 ± 2.77 3.98 ± 2.28 <.001*
Cephalometric overbite (mm) 2.46 ± 2.46 2.21 ± 2.55 2.71 ± 2.35 .001*
Upper lip to E line (mm) −2.53 ± 2.45 −2.20 ± 2.46 −2.86 ± 2.40 <.001*
Lower lip to E line (mm) −0.87 ± 2.64 −0.43 ± 2.69 −1.31 ± 2.51 <.001*
Nasolabial angle (°) 109.76 ± 10.38 109.4 ± 10.64 110.12 ± 10.11 .273
Facial convexity angle (°) 17.03 ± 6.27 18.15 ± 5.92 15.92 ± 6.43 <.001*
Z angle (°) 72.31 ± 7 70.88 ± 6.62 73.75 ± 7.08 <.001*
Maxillary sulcus depth (mm) 151.58 ± 12.18 152.83 ± 11.98 150.32 ± 12.26 .001*
Mandibular sulcus depth (mm) 128.70 ± 14.76 128.58 ± 15.30 128.81 ± 14.21 .812
Upper lip thickness (mm) 11.69 ± 2.20 11.39 ± 2.15 11.98 ± 2.21 <.001*
Lower lip thickness (mm) 10.71 ± 2.43 10.40 ± 2.41 11.02 ± 2.42 <.001*
Upper incisor exposure (mm) 2.90 ± 1.92 2.86 ± 2 2.94 ± 1.85 .513
Note: T-test to compare age, model and cephalometric measurements between extraction and non-extraction groups. Chi-square test to compare
sex, Angle's molar relationship and lip posture between extraction and non-extraction groups.
Abbreviation: SD, standard deviation.
*Statistically significant differences (P ≤ .05).
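As a worked check of the chi-square comparison mentioned in the table note, the sketch below recomputes the P value for sex from the counts in Table 2 (female 314 vs. 304, male 186 vs. 196). It assumes a Pearson chi-square test without continuity correction, which the source does not specify; the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: female, male. Columns: extraction, non-extraction (counts from Table 2).
chi2, p = chi_square_2x2(314, 304, 186, 196)
print(round(chi2, 3), round(p, 3))
```

Under this assumption the result is close to the reported P = .515 for sex.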
The performance assessment of the seven ML models is given in Table S1.
3.4 | Important variables of the best-performing model
The most important variables affecting the extraction/non-extraction decisions (classifier 1) are shown in Figure 3: maxillary (30.7%) and mandibular (10.56%) arch length discrepancies, Wits Appraisal (6.73%) and ANS-Me length (6.65%). There was a negative correlation between the probability of extraction and maxillary/mandibular arch length discrepancy. On the other hand, there was a positive correlation between the probability of extraction and both Wits Appraisal and ANS-Me length.
The most important variables affecting the extraction type decision were mandibular arch length discrepancy (14.56%), Class I molar relationship (12.2%), cephalometric overbite (7.69%), Wits Appraisal (7.25%) and L1-NB distance (7.19%).
For the 44-00 and 44-44 groups, Class I molar relationship and mandibular arch length discrepancy were found to be the most important variables. Patients with molar relationships other than Class I and patients with high values for mandibular arch length discrepancies were more likely to be in the 44-00 group, whereas the opposite was true for the 44-44 group (Figures S3 and S4). The important variables for the 44-55 group are given in Figure S5.
4 | DISCUSSION
ML is widely used in dentistry and is continuously advancing, especially in the fields of diagnosis and treatment planning. In this study, the success of seven ML models in extraction decisions was
[...] of 10 years of academic and clinical experience. The objective was to identify the most effective ML method that could provide support to clinicians with less clinical experience. To achieve this, seven different ML models were trained using data from 1000 patients, incorporating 36 parameters. The Stacking Classifier model, which combines the three most successful models' outputs, achieved the highest accuracy of 84.1% and the highest AUC of 0.912 for the extraction/non-extraction decision. The three best-performing models [...]
[...] account that Jung et al.17 and Li et al.18 did not apply any cross-validation techniques. In the literature, two studies measured AUC with the Multilayer Perceptron model as 98% and 82%, respectively.18,21 However, the possibility of overfitting should be considered in these studies. On the other hand, Support Vector Machine was found to be the most successful model with 92.5% AUC in the [...]
[...] the extraction type were also different in the study of Li et al.; the [...]
[...] better performances than single models. Each ML model has its own strengths and weaknesses. By appropriately merging their outputs, [...]
[TABLE 3: accuracy, AUC, precision and recall of the seven ML models (Stacking classifier, Voting classifier, Gradient Boosted Trees, Support vector machine, Random Forest, Logistic regression, Multi-Layer Perceptron) for the extraction/non-extraction and extraction type decisions. The accuracy and AUC entries repeat the values listed in Section 3.3. Remaining entries, whose row and column assignment is not recoverable: 83.2% (82.8%–83.6%), 84.4% (84.1%–84.7%), 83.8% (83.7%–83.9%), 82.6% (82.4%–82.8%), 84.1% (83.7%–84.5%), 82.8% (82.2%–83.4%), 81.3% (81.2%–81.4%), 81.5% (81.1%–81.9%), 80.9% (80.2%–81.6%), 82.6% (80%–83.2%).]
Abbreviation: AUC, area under the ROC curve.
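The four metrics reported in Table 3 can be computed from first principles as sketched below. The definitions are the standard ones (the paper's own equations are in Figure S1, which is not reproduced here), and AUC is computed with the rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The example data are hypothetical and reproduce the imbalanced scenario raised in the Discussion, where a model that always predicts the positive class reaches 90% accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def precision(y_true, y_pred):
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced sample: 90 positives, 10 negatives,
# scored by a model that always outputs the positive class.
y_true = [1] * 90 + [0] * 10
scores = [1.0] * 100
y_pred = [1] * 100
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))  # prints 0.9 0.5
```

Accuracy looks high in this example even though the model cannot separate the classes at all, which the AUC of 0.5 makes visible.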
FIGURE 3 SHAP summary plot of the top nine variables of the Stacking Classifier model in extraction decision. Positive (extraction) and
negative (non-extraction) values represent the classes for the extraction decision. Higher SHAP values indicate a higher impact on the model
decision. For each variable, a dot was created indicating the patient's value in that variable and the colour of this point is determined by the
value of the variable. Red represents higher values of that variable and blue represents lower values. The definitions of the variables are
given in Table 1.
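The idea behind the SHAP values plotted in Figure 3 can be illustrated with a brute-force Shapley computation on a toy model. This is a sketch only: it is not how the SHAP package computes its values (the package relies on far more efficient estimators), the linear `toy_score` model and its weights are invented, and the patient and reference values are hypothetical. Enumerating all feature orderings is feasible only for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    features = list(x)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)      # start from the reference patient
        previous = predict(current)
        for f in order:
            current[f] = x[f]         # reveal this feature's actual value
            new = predict(current)
            totals[f] += new - previous
            previous = new
    return {f: totals[f] / len(orderings) for f in features}

# Hypothetical linear score: more negative arch length discrepancy (ALD)
# and a larger Wits Appraisal push the score towards extraction.
def toy_score(v):
    return -0.3 * v["maxillary_ald"] - 0.1 * v["mandibular_ald"] + 0.05 * v["wits"]

patient = {"maxillary_ald": -6.0, "mandibular_ald": -3.5, "wits": 2.0}
reference = {"maxillary_ald": -4.0, "mandibular_ald": -2.5, "wits": 1.5}
print(shapley_values(toy_score, patient, reference))
```

For a linear model each value reduces to the weight times the difference between the patient's value and the reference value, and the values sum to the difference between the two predictions.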
[...] a large sample for ML studies. Another problem is the imbalanced sample distribution, which has negative consequences. The distribution of extraction and non-extraction groups ranged from around 40%–60% to 24.8%–75.2% in the previous studies.16–18,21,22 In this study, while the sample size was kept as high as possible, the ratio of extraction and non-extraction was kept equal (50%). However, due to the limited number of patients in each extraction group, this balance could not be achieved for the extraction types.
One of the major challenges in ML is overfitting. This phenomenon occurs when the model becomes unnecessarily complex, essentially memorizing the training data and failing to generalize to solve the problem. In such conditions, metrics on the training sets are high, but they are significantly lower on the unseen test data. In order to prevent overfitting, samples can be divided into train and test sets, regularization techniques specific to each ML model can be used, and cross-validation can be applied. In this study, all data were used during both the train and test phases (Figure 2). In traditional approaches where data are divided into train and test sets directly, the output scores cannot be generalized as they may depend on a specific split or on the order of the data. When the previous studies were investigated, three studies used cross-validation methods.21,22,29 The 5-fold cross-validation method was used and repeated 10 times to establish a stronger validation scheme in this research.
The accuracy and AUC were used as success metrics. AUC is a metric that identifies how separable two classes are and gives a better understanding of the performance of the model when data are imbalanced.31 In an ML problem where 90% of the data is positive and 10% is negative, accuracy can be a misleading metric, as a model that always classifies samples as positive would reach 90% accuracy. AUC was utilized as a success metric in three previous studies.18,21,29 Mason et al.29 calculated the CIs of the four ML models and found larger intervals for each model compared to this study. The reasons for the significantly smaller CIs in the current study are the larger dataset, the use of 5-fold cross-validation repeated 10 times, and the tuning of each model.
In this study, the decisions of four orthodontists who have been working together at the same university for at least 10 years were utilized. Even though these four orthodontists have received the same training and have been working together for years, they may have different tendencies in treatment decisions. In the literature, there is no consensus among ML studies on whether the extraction decision is made by a single expert, by a committee, or by multiple experts. Jung et al.17 and Li et al.18 achieved a more stable model by, respectively, using one and two experts, leading to high performance. Unfortunately, the generalizability of models trained with a single perspective is questionable. Leavitt et al.32 extensively discussed this fact in their study, in which extraction decisions were made by 30 experts. Treatment bias can be prevented in models trained with various perspectives and treatment philosophies. However, the inclusion of too many experts in treatment planning can also reduce consistency.
Parameters that affect the extraction decision of clinicians are one of the important topics for orthodontics. Numerous studies have demonstrated that one of the most important parameters is crowding.1,33–36 However, Evrard et al.34 found in their survey study that the soft tissue profile has a greater impact on the decision to extract teeth. This indicates that orthodontists are now placing more emphasis on facial aesthetics. Iard et al. revealed that the initial lip protrusion is a helpful parameter for the extraction decision in borderline cases.37 However, the extent of soft tissue changes caused by tooth and alveolar process movement is still uncertain.38,39 Another parameter for the extraction decision is the vertical pattern of the patient. Guo et al.33 found that the growth pattern was the most important parameter for the extraction decision. While extraction treatments tend to be performed in hyper-divergent patients, non-extraction treatments are mostly performed in mesio-divergent patients. However, some studies have argued that treatment with or without extraction does not cause a significant vertical change.8,40 Previous ML studies
concluded that the extraction decision of ML models is mostly based on maxillary/mandibular arch length discrepancy,18,21,22,29 U1-NA (mm),18,29 L1-NB (mm),21,29 molar relationship,22 incompetent lips and IMPA.16 The current study revealed that the arch length discrepancy is the most important variable in the extraction decision for ML models, which is in agreement with the findings of previous studies.18,21,22 This indicates that although tooth extraction decisions are made for various conditions, the primary reason is still the lack of space. Unlike previous research, this study found Wits Appraisal and ANS-Me length to be other important parameters for the extraction decision. These parameters also point to the sagittal and vertical problems of the patients.
4.1 | Clinical significance
Based on these results, the Stacking Classifier, which consists of Gradient Boosted Trees, Support Vector Machine, and Random Forest models, appears to be the best ML model. As algorithms continue to be developed, these models could greatly contribute to the planning phase of orthodontic treatment. It is important to note that no ML model can replace the judgement of an experienced orthodontist.
4.2 | Limitations
One of the limitations was that some parameters that could potentially influence the extraction decision were not included in this study, including the patient's cooperation, desire for the treatment plan, presence of dental asymmetry, maturation stage, gingival biotype, and photographic soft tissue analysis (gummy smile, etc.). The quality assessment of treatment outcomes could not be conducted as some of the patients' treatments are still ongoing. In future studies, including only patients who have successfully completed their treatment can increase the quality of the study. Another limitation was that a sufficient number of patients could be obtained only for some extraction groups, since the most common types of extractions were 44-00, 44-44, and 44-55 in Başkent University's orthodontic clinic. Other combinations of premolar extractions, molar, and lower incisor extractions could also be included in future studies. Due to the retrospective nature of this study, the inter-examiner and intra-examiner reliability of orthodontists' treatment decisions could not be analysed. Future research involving multiple orthodontists with different perspectives and backgrounds could contribute to the generalizability of the study. Finally, further investigations can enhance the reliability of the study by testing it with an external test set.
5 | CONCLUSIONS
1. This study compared seven machine learning models for extraction decision and extraction type. The highest performing model was the Stacking Classifier with 84.1% accuracy and 0.912 AUC value.
2. While ML models showed a high performance in extraction decisions, they could not show this performance in extraction type decisions.
3. The variables that play an important role in the decision of extraction for ML models were maxillary/mandibular arch length discrepancy, Wits Appraisal, and ANS-Me length. On the other hand, the most important variables in the extraction type decision were mandibular arch length discrepancy, Class I molar relationship, cephalometric overbite, Wits Appraisal, and L1-NB distance.
4. Future studies should focus not only on the development of algorithms, but also on matching clinical scenarios in order to evaluate the algorithms and support decision-making in extraction.
AUTHOR CONTRIBUTIONS
B.K. contributed to the concept of this study, determined the methodology, explored the sources, and wrote the original text. H.P. contributed to the concept design of this study, determined the methodology, reviewed and edited the original text, and supervised. Ö.G. determined the methodology, implemented the software, performed the data curation, and wrote the original text. All authors have read and agreed to the published version of the manuscript.
ACKNOWLEDGEMENTS
This study was supported by Başkent University Research Fund. All authors have nothing to disclose.
CONFLICT OF INTEREST STATEMENT
The authors have nothing to disclose.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
ETHICAL APPROVAL
Written informed consent had been collected from the patients at the beginning of the treatment as a standard procedure of Başkent University's orthodontic clinic.
ORCID
Hande Pamukçu https://orcid.org/0000-0003-4242-5114
REFERENCES
1. Jackson TH, Guez C, Lin FC, Proffit WR, Ko CC. Extraction frequencies at a university orthodontic clinic in the 21st century: demographic and diagnostic factors affecting the likelihood of extraction. Am J Orthod Dentofacial Orthop. 2017;151(3):456-462. doi:10.1016/j.ajodo.2016.08.021
2. Saghafi N, Heaton LJ, Bayirli B, Turpin DL, Khosravi R, Bollen AM. Influence of clinicians' experience and gender on extraction
decision in orthodontics. Angle Orthod. 2017;87(5):641-650. doi:10.2319/020117-80.1
3. Al-Ani MH, Mageet AO. Extraction planning in orthodontics. J Contemp Dent Pract. 2018;19(5):619-623. doi:10.5005/jp-journals-10024-2307
4. Konstantonis D, Vasileiou D, Papageorgiou SN, Eliades T. Soft tissue changes following extraction vs. nonextraction orthodontic fixed appliance treatment: a systematic review and meta-analysis. Eur J Oral Sci. 2018;126(3):167-179. doi:10.1111/eos.12409
5. Peck S. Extractions, retention and stability: the search for orthodontic truth. Eur J Orthod. 2017;39(2):109-115. doi:10.1093/ejo/cjx004
6. Baek ES, Hwang S, Kim KH, Chung CJ. Total intrusion and distalization of the maxillary arch to improve smile esthetics. Korean J Orthod. 2017;47(1):59-73.
7. Germec-Cakan D, Taner TU, Akan S. Arch-width and perimeter changes in patients with borderline class I malocclusion treated with extractions or without extractions with air-rotor stripping. Am J Orthod Dentofacial Orthop. 2010;137(6):734.e1-734.e7. doi:10.1016/j.ajodo.2009.12.023
8. Kirschneck C, Proff P, Reicheneder C, Lippold C. Short-term effects of systematic premolar extraction on lip profile, vertical dimension and cephalometric parameters in borderline patients for extraction therapy - a retrospective cohort study. Clin Oral Investig.
21. Etemad L, Wu TH, Heiner P, et al. Machine learning from clinical data sets of a contemporary decision for orthodontic tooth extraction. Orthod Craniofac Res. 2021;24:193-200. doi:10.1111/ocr.12502
22. Del Real A, Del Real O, Sardina S, Oyonarte R. Use of automated artificial intelligence to predict the need for orthodontic extractions. Korean J Orthod. 2022;52(2):102-111. doi:10.4041/kjod.2022.52.2.102
23. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18-22.
24. Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K. Auto-WEKA: automatic model selection and hyperparameter optimization in WEKA. Automated Machine Learning: Methods, Systems, Challenges. Cham, Switzerland AG: Springer Nature; 2019:81-95.
25. Lee R, MacFarlane T, O'Brien K. Consistency of orthodontic treatment planning decisions. Clin Orthod Res. 1999;2(2):79-84. doi:10.1111/ocr.1999.2.2.79
26. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825-2830.
27. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146-3154.
28. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4768-4777.
29. Mason T, Kelly KM, Eckert G, Dean JA, Dundar MM, Turkkahraman
2016;20(4):865-874. doi:10.1007/s00784-015-1574-5 H. A machine learning model for orthodontic extraction/
9. Herzog C, Konstantonis D, Konstantoni N, Eliades T. Arch-width non- extraction decision in a racially and ethnically diverse pa-
changes in extraction vs nonextraction treatments in matched tient population. Int Orthod. 2023;21(3):100759. doi:10.1016/j.
class I borderline malocclusions. Am J Orthod Dentofacial Orthop. ortho.2023.100759
2017;151(4):735-743. doi:10.1016/j.ajodo.2016.10.021 30. Osisanwo FY, Akinsola JET, Awodele O, Hinmikaiye JO, Olakanmi
10. Domingos P. A few useful things to know about machine learning. O, Akinjobi J. Supervised machine learning algorithms: classification
Commun ACM. 2012;55(10):78-87. https://dl.acm.org/citation.c fm? and comparison. Int J Comput Trends Technol. 2017;48(3):128-138.
id=2347755 31. Wardhani NWS, Rochayani MY, Iriany A, Sulistyono AD, Lestantyo
11. Tandon D, Rajawat J, Banerjee M. Present and future of artificial P. Cross-validation metrics for evaluating classification performance
intelligence in dentistry. J Oral Biol Craniofacial Res. 2020;10(4):391- on imbalanced data. 2019 International Conference on Computer,
396. doi:10.1016/j.jobcr.2020.07.015 Control, Informatics and Its Applications (IC3INA). 2019;IEEE:14-18.
12. Shan T, Tay FR, Gu L. Application of artificial intelligence in dentistry. doi:10.1109/IC3INA48034.2019.8949568
J Dent Res. 2021;100(3):232-244. doi:10.1177/0022034520969115 32. Leavitt L, Volovic J, Steinhauer L, et al. Can we predict orthodontic
13. Mohammad- Rahimi H, Nadimi M, Rohban MH, Shamsoddin E, extraction patterns by using machine learning? Orthod Craniofac
Lee VY, Motamedian SR. Machine learning and orthodontics, cur- Res. 2023;26:552-559.
rent trends and the future opportunities: a scoping review. Am J 33. Guo Y, Han X, Xu H, Ai D, Zeng H, Bai D. Morphological charac-
Orthod Dentofacial Orthop. 2021;160(2):170-192. doi:10.1016/j. teristics influencing the orthodontic extraction strategies for
ajodo.2021.02.013 Angle's class II division 1 malocclusions. Prog Orthod. 2014;15:1-7.
14. Bichu YM, Hansa I, Bichu AY, Premjani P, Flores-Mir C, Vaid NR. doi:10.1186/s40510-014-0 044-y
Applications of artificial intelligence and machine learning in ortho- 34. Evrard AS, Tepedino M, Cattaneo PM, Cornelis MA. Which factors
dontics: a scoping review. Prog Orthod. 2021;22(1):1-11. influence orthodontists in their decision to extract? A question-
15. Duran GS, Gökmen Ş, Topsakal KG, Görgülü S. Evaluation of naire survey. J Clin Exp Dent. 2019;11(5):e432-e438. doi:10.4317/
the accuracy of fully automatic cephalometric analysis soft- jced.55709
ware with artificial intelligence algorithm. Orthod Craniofac Res. 35. Bishara SE, Cummins DM, Jakobsen JR. The morphologic basis
2023;26:481-490. for the extraction decision in class II, division 1 malocclusions: a
16. Xie X, Wang L, Wang A. Artificial neural network modeling for de- comparative study. Am J Orthod Dentofacial Orthop. 1995;107(2):
ciding if extractions are necessary prior to orthodontic treatment. 129-135.
Angle Orthod. 2010;80(2):262-266. doi:10.2319/111608-588.1 36. Konstantonis D, Anthopoulou C, Makou M. Extraction decision and
17. Jung SK, Kim TW. New approach for the diagnosis of extractions identification of treatment predictors in class I malocclusions. Prog
with neural network machine learning. Am J Orthod Dentofacial Orthod. 2013;14:1-8. doi:10.1186/2196-1042-14-47
Orthop. 2016;149(1):127-133. doi:10.1016/j.ajodo.2015.07.030 37. Iared W, da Silva EMK, Iared W, Macedo CR. Esthetic perception
18. Li P, Kong D, Tang T, et al. Orthodontic treatment planning based of changes in facial profile resulting from orthodontic treatment
on artificial neural networks. Sci Rep. 2019;9(1):2037. doi:10.1038/ with extraction of premolars. J Am Dent Assoc. 2017;148(1):9-16.
s41598-018-38439-w doi:10.1016/j.adaj.2016.09.004
19. Takada K. Artificial intelligence expert systems with neural net- 38. Hodges A, Rossouw PE, Campbell PM, Boley JC, Alexander RA,
work machine learning may assist decision-making for extractions Buschang PH. Prediction of lip response to four first premolar
in orthodontic treatment planning. J Evid Based Dent Pract. extractions in white female adolescents and adults. Angle Orthod.
2016;16(3):190-192. doi:10.1016/j.jebdp.2016.07.002 2009;79(3):413-421. doi:10.2319/050208-247.1
20. Suhail Y, Upadhyay M, Chhibber A. Machine learning for the diag- 39. Erdinc AE, Nanda RS, Dandajena TC. Profile changes of pa-
nosis of orthodontic extractions: a computational analysis using tients treated with and without premolar extractions. Am J
ensemble learning. Bioengineering. 2020;7(2):55. doi:10.3390/ Orthod Dentofacial Orthop. 2007;132(3):324-331. doi:10.1016/j.
bioengineering7020055 ajodo.2005.08.045
|
16016343, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/ocr.12811 by Cochrane Palestinian Territory, Wiley Online Library on [22/07/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
12 KÖKTÜRK et al.