
OPABP: Optimizing Parameters to Improve Accuracy in Bug Prediction using Machine Learning

Sapna Arora
CSE, IILM University, Gurugram, India
[email protected]

Nidhi Srivastava
CSE, Banasthali Vidyapith, Rajasthan, India
[email protected]

Manisha Agarwal
CSE, Banasthali Vidyapith, Rajasthan, India
[email protected]

Tripti Lamba
CSE, Chandigarh University, Punjab, India
[email protected]
Abstract—Predicting a bug and attaining a successful application is critical in today's scenario during the development phase of a program. This can only be accomplished by foreseeing some of the shortcomings in the early stages of development, resulting in software that is dependable, efficient, and of high quality. A challenging aspect is to develop a sophisticated model capable of determining errors and producing effective software. A few ML methods are utilized to achieve this, and they produce accuracy figures on both training and test datasets. The novelty of this approach is to demonstrate the applicability of machine learning algorithms, namely Neural Network, SVM, Decision Tree, and Cubist, using different performance metrics, i.e., R, R square, Root Mean Square Error, and Accuracy, and to obtain the optimal algorithm for bug reports on diverse datasets from the PROMISE repository. Findings reveal that SVM gives significantly higher accuracy than the other algorithms on the ANT dataset. The work integrates existing research on detecting bugs in software by providing information about the aforementioned methods in bug prediction, and it highlights the accuracy obtained by current approaches, which is significant for research scholars and solution providers.

Keywords— Machine Learning, SVM, Software Performance Metrics, Accuracy, Bug Prediction.

I. INTRODUCTION

To address issues related to ever-larger and more complicated data sets, data science and machine learning approaches, i.e., supervised learning[1], unsupervised learning[2], and reinforcement learning[3], are now widely used throughout the science and engineering fields[4]–[6]. The optimization, analysis, control, and design of a proposed system or process that provides data collection are frequent issues posed to scientists and engineers[7].

Many machine learning tasks can be described by data sets with millions of samples, each with as many features as feasible. To select an efficient strategy and effectively simulate the system at hand, it is essential to comprehend the nature of the problem. Self-driving automobiles, speech recognition, and facial recognition are a few examples of complicated problems that call for numerous approaches to be solved[8].

Bug prediction can be accomplished with the aid of machine learning and predictive analysis. Developers can make improvements as they create code by integrating prediction models into their development environments. Even so, models can't be created in a manner that comes close to perfection, and some inaccurate predictions are unavoidable. There are two types of incorrect predictions: those that incorrectly label clean code as buggy and those that incorrectly label buggy code as clean. Obtaining an ideal model that balances the incorrect predictions is crucial to inspiring developers to trust the model. The models have been studied in terms of their level of accuracy and complexity despite the lack of common benchmarks for model comparison[9]. The accuracy of the model is greatly influenced by the selected metrics, and this becomes the most important step in bug prediction. The method becomes more difficult as the number of metrics in the model increases, and the inclusion of pointless measurements can significantly reduce accuracy.

Software development challenges represent a learning process that varies depending on the conditions and the stages of development in which we find and can easily detect the problem. Fig. 1 shows how the data development process is carried out on three levels, i.e., Level 1, Level 2, and Level 3. At Level 1, data filtration and extraction are performed; the extracted data is then dissected into Training, Testing, and Validation at Level 2; and Level 3 provides the actual notable data for an analyst to work on and compare to the entire system. It is always recommended to perform resource-intensive, time-consuming, and expensive selection activities[10].

Fig. 1: Notable data development process.

The study's objective is to determine the best bug detection algorithm using machine learning, evaluate the accuracy of each algorithm, and compare them. The optimal algorithm will make it simple for the user to evaluate the findings[11].

The paper is structured in sections: Section II addresses related literature, Section III elaborates on the OPABP model structure, Section IV covers the statistical analysis used for ML, Section V presents the analysis results evaluated using ML, and Section VI contains the conclusion and highlights the future work of the research.

II. RELATED WORK

For the software defect problem, a hybrid classifier has been suggested for five NASA datasets[12], and the suggested classifier's performance is contrasted with that of competing algorithms. Instead of trying to find a better classifier, more attention should be paid to data pretreatment, feature selection, and other data mining approaches[13]. Due to its long-term practical need, High Impact Bug Report prediction is an essential research issue. X. Wu et al.[14] discussed a high-impact bug predictor, an automated method for locating particular categories of bug reports in huge bug repositories; for data labelling, a computer-human interaction mode and active learning are used to reduce effort. The most statistically effective strategy frequently comes from one of the many "newly" produced combinations, indicating that state-of-the-art transfer learning and classification combinations are still far from being fully developed. The findings of Ke Li et al.[15] offer insightful information that practitioners in this particular research sector can use; they also discussed a sophisticated optimizer for Cross Project Defect Prediction that explores the parameter space of the transfer learning part. U. Ali et al.[16] suggested a classification framework for the identification of software modules that are likely to contain defects; the researchers' main efforts to boost performance were feature selection and variant-based ensemble classification, and the framework's findings are contrasted with those of other popular supervised classifiers from academic studies. A. Panichella et al.[17] observe that when present defect prediction techniques are trained on tasks unrelated to their intended use, they may not perform to their full potential: while the true goal is to rank artifacts and make affordable forecasts, current approaches based on statistical models are trained to find the best match to estimate the raw number of flaws in artifacts.

III. PROPOSED MODEL

Regression is one of the machine learning techniques for determining the relationship between relevant variables; specifically, regression allows for the selection of the curve that best fits the available data. Many regression techniques[18] are available for resolving engineering problems. The goal of regression is to reduce the total squared error (least squares)[9].
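The least-squares goal can be made concrete with a short sketch. This is illustrative Python on synthetic numbers, not the paper's setup; `fit_line` is a made-up helper that uses the closed-form solution for a straight line.

```python
# Minimal least-squares line fit: choose slope and intercept that
# minimize the sum of squared errors sum((y_i - (a*x_i + b))**2).
# Synthetic data for illustration only, not the paper's bug datasets.

def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form least-squares solution for a straight line.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
            / sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.0]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

Any of the regression learners discussed below ultimately minimizes a squared-error criterion of this kind, only with a richer model than a single line.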
In the OPABP model, the entire process is divided into four stages, shown in Fig. 2. In the first stage, the data is acquired from the reputed PROMISE repository; in the next stage, data preprocessing is performed, which can be achieved by applying a feature selection technique; the following stage is data modelling, achieved with the data metrics; and in the last stage, data visualization is performed.

Fig. 2 Working layout used in OPABP.

A. DATA ACQUISITION
For the purpose of this research, the Bug dataset[19] is used, where 20 metrics, i.e., WMC, MFA, DIT, CAM, NOC, IC, CBO, CBM, RFC, AMC, LCOM, Ca, LCOM3, Ce, NPM, Max_CC, DAM, Avg_CC, MOA, and LOC, are used as the features (i.e., independent variables) and the metric "bug" is used as the response (dependent) variable; a detailed illustration is given in TABLE 1. The feature variables are 20 metrics, and the response parameter is the number of bugs.

B. DATA PREPROCESSING
The generalization performance of an ML algorithm is frequently influenced by data preprocessing. One of the most challenging inductive ML problems is the removal of noise instances[20]. Another frequently addressed concern in data preprocessing is the handling of missing data. Well-known data preprocessing methods include data normalization, feature selection, and the training and testing of data. Feature selection forms the foundation for ML; it contributes a feature measure or assessment criterion to the data model[21]. Boruta[22] deals with the issue by increasing the system's randomization. The basic concept is pretty straightforward: simply duplicate the system using randomization, combine it with the original, and then develop a classifier for this expanded system[23]. The importance of each variable in the original system is then contrasted with that of the randomized variables, and a variable is considered important only if its importance exceeds that of the randomized variables[24].

After implementing Boruta in OPABP on the dataset metrics, the important variables are selected, as shown in Fig. 3 to Fig. 9 for all datasets used in the paper. The datasets retain different numbers of important variables: in the Ant dataset 13 metrics are selected, in the Camel 1.6 dataset 12, in the Lucene dataset 16, in the Poi3 dataset 14, in the Synapse dataset 10, in the Tomcat dataset only 7, and in the Velocity dataset 11. After the feature selection process, training and testing sets are formed by random sampling that keeps the ratio of bugged to not-bugged instances; training an ML algorithm to predict labels from characteristics, tweaking it for the business need, and verifying it on outlier data are all part of the modeling process. A training-to-testing ratio of 80:20 has been taken into consideration, and this helps in enhancing the learning procedure.
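The Boruta idea described above (duplicate the features as shuffled "shadow" copies and keep only the features whose importance beats the best shadow) can be sketched as follows. This is an illustrative re-implementation on synthetic data, not the actual Boruta package; `importance` here is a simple correlation-based stand-in for a classifier-derived importance score, and `boruta_like` is a made-up helper name.

```python
import math
import random

def importance(xs, ys):
    """Absolute Pearson correlation with the target, used as a
    stand-in for a classifier-derived importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy))

def boruta_like(features, ys, rng):
    """Keep a feature only if its importance beats every shuffled
    'shadow' copy, mirroring Boruta's shadow-feature test."""
    shadow_scores = []
    for xs in features.values():
        shadow = xs[:]          # duplicate the feature ...
        rng.shuffle(shadow)     # ... and randomize it
        shadow_scores.append(importance(shadow, ys))
    threshold = max(shadow_scores)
    return [name for name, xs in features.items()
            if importance(xs, ys) > threshold]

rng = random.Random(7)
n = 300
x1 = [rng.gauss(0, 1) for _ in range(n)]   # informative metric
x2 = [rng.gauss(0, 1) for _ in range(n)]   # pure-noise metric
ys = [2.0 * a + rng.gauss(0, 0.3) for a in x1]
print(boruta_like({"x1": x1, "x2": x2}, ys, rng))
```

With the strong signal in `x1`, its importance comfortably exceeds the best shadow score, while a pure-noise feature is indistinguishable from its own shadow and tends to be dropped.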

TABLE 1. The Java metrics in the bug prediction dataset.

S No.  Metrics  Description
1.     WMC      Weighted methods per class
2.     MFA      Measure of Functional Abstraction
3.     DIT      Depth of Inheritance Tree
4.     CAM      Cohesion Among Methods of a Class
5.     NOC      Number of Children
6.     IC       Inheritance Coupling
7.     CBO      Coupling between object classes
8.     CBM      Coupling Between Methods
9.     RFC      Response for a Class
10.    AMC      Average Method Complexity
11.    LCOM     Lack of cohesion in methods
12.    Ca       Afferent couplings
13.    LCOM3    Lack of cohesion in methods
14.    Ce       Efferent couplings
15.    NPM      Number of Public Methods
16.    Max_CC   Maximum of McCabe's cyclomatic complexity
17.    DAM      Data Access Metric
18.    Avg_CC   Average of McCabe's cyclomatic complexity
19.    MOA      Measure of Aggregation
20.    LOC      Lines of Code
21.    Bug      Number of bugs

Fig. 3 Showing Important Variables of Ant Dataset after implementing Boruta.

Fig. 4 Showing Important Variables of Camel 1.6 Dataset after implementing Boruta.

Fig. 5 Showing Important Variables of Lucene Dataset after implementing Boruta.

Fig. 8 Showing Important Variables of Tomcat Dataset after implementing Boruta.

Fig. 9 Showing Important Variables of Velocity 1.6 Dataset after implementing Boruta.
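The 80:20 training-to-testing split with a preserved bugged/not-bugged ratio, as described in Section III-B, can be sketched as a stratified random split. The records and the `stratified_split` helper are made up for illustration.

```python
import random

def stratified_split(rows, label_key, train_frac=0.8, seed=42):
    """Split rows 80:20 while preserving the bugged / not-bugged
    ratio, by sampling within each label group separately."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    train, test = [], []
    for group in groups.values():
        rng.shuffle(group)
        cut = int(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# 100 synthetic modules, 20 of them bugged.
rows = [{"loc": i, "bug": 1 if i < 20 else 0} for i in range(100)]
train, test = stratified_split(rows, "bug")
print(len(train), len(test), sum(r["bug"] for r in test))  # prints: 80 20 4
```

Because the split is done per label group, the 20% test set carries the same one-in-five bug rate as the whole collection, which keeps evaluation comparable across datasets.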

Fig. 6 Showing Important Variables of Poi3 Dataset after implementing Boruta.

Fig. 7 Showing Important Variables of Synapse Dataset after implementing Boruta.

C. DATA MODELING
Regression algorithms are used in this research to obtain the expected outcomes from the existing data. The regression algorithms used in this paper are Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT), and Cubist; the results are evaluated with the help of all of these algorithms.

• Neural Network (NN): A Neural Network can be characterized by the number of hidden nodes included in the model and the number of inputs and outputs present at each node[25].
• Support Vector Machine (SVM): Each data point in the SVM algorithm is plotted as a point in n-dimensional space (where n is the number of features in the dataset). The classification is then carried out by locating the hyperplane that best distinguishes the two classes.
• Decision Tree (DT): The Decision Tree algorithm begins at the root node and moves down the tree to predict the class of a given instance. The algorithm follows a branch and jumps to the next node based on a comparison of the values of the data.
• Cubist: Cubist is constructed using the predictors from earlier splits. Additionally, there are intermediate linear models at every stage of the tree, and by combining them one can create trees with various rates of growth.
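The Decision Tree idea above can be illustrated in its simplest regression form: a depth-1 "stump" that picks the single threshold minimizing the squared error when each side predicts its own mean. This is a toy sketch on made-up numbers, not the implementation used in the paper.

```python
def best_stump(xs, ys):
    """Find the split threshold that minimizes total squared error
    when each side predicts its own mean (a depth-1 regression tree)."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    points = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive x values.
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    _, t, left_mean, right_mean = best
    # Prediction follows the branch chosen by comparing x to the threshold.
    return lambda x: left_mean if x <= t else right_mean

predict = best_stump([1, 2, 3, 10, 11, 12], [1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
print(round(predict(2), 2), round(predict(11), 2))  # prints: 1.0 5.0
```

A full decision tree simply applies this split search recursively to each side; Cubist replaces the per-leaf constant mean with a linear model.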
D. MODEL EVALUATION USING PERFORMANCE METRICS

Different performance indicators are used for ML regression tasks. Among the numerous metrics available, the OPABP model's performance is determined by the Root Mean Square Error (RMSE, Eq. (2))[26], [27], which is used to calculate how close the regression line is to a set of points; the coefficient of determination, R squared (R2, Eq. (1)), which is used to assess the model's correctness during training and validation; and Accuracy (Eq. (3)), the proportion of correct predictions over total predictions. These are employed together to judge the performance of the models. Mathematical expressions of the measures are given below:

R2 = 1 − Σi (yi − ŷi)² / Σi (yi − ȳ)²        (1)

Where:
yi − ŷi = actual y value − predicted y value
ŷi = predicted value
ȳ = mean of the actual y values

RMSE = √( Σi=1..t (Pi − Ai)² / t )           (2)

Where:
t = number of items
A = actual observed value
P = predicted value

accuracy = correct_predictions / total_predictions        (3)

E. DATA VISUALIZATION

The process of selecting the best model from a group of effective models can be reduced to simply identifying the portions of the model that provide the highest accuracy or lowest loss while ensuring that the model doesn't overfit[28]. The OPABP model visualization generates actions by applying previously learned information to new input. In TABLE 2, the average number of bugs seen earlier helps in data visualization.

TABLE 2: Java datasets used in OPABP with average no. of bugs.

System #version   No. of classes   Avg no. of bugs
Ant               338              19.58
Camel             696              18.87
Lucene 2.4        235              24.92
Poi3              345              49.82
Synapse           212              23.60
Tomcat            858              8.97
Velocity          213              58.47

IV. ANALYSIS

There are seven datasets (Ant, Camel, Lucene 2.4, Poi3, Synapse, Tomcat, and Velocity) in OPABP; their detailed information has already been discussed. On every dataset, four different regression algorithms, i.e., Neural Network[29], SVM[30], Decision Tree[31], and Cubist[32], were implemented, and the Root Mean Square Error, R square, and accuracy were calculated for each dataset. The algorithm showing the best accuracy was then identified by judging these values.
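The three measures in Eqs. (1)–(3) can be computed directly; the following is a small sketch with made-up prediction vectors, not values from the paper's experiments.

```python
import math

def r_squared(actual, predicted):
    """Eq. (1): 1 - SSE/SST, the coefficient of determination."""
    mean = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean) ** 2 for y in actual)
    return 1 - sse / sst

def rmse(actual, predicted):
    """Eq. (2): square root of the mean squared difference."""
    t = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / t)

def accuracy(actual, predicted):
    """Eq. (3): correct predictions over total predictions."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 3.0, 3.5]
print(round(r_squared(actual, predicted), 3))   # closer to 1 is better
print(round(rmse(actual, predicted), 3))        # closer to 0 is better
print(accuracy([1, 0, 1, 1], [1, 1, 1, 1]))     # prints 0.75
```

Note that R2 and RMSE judge the regression fit on the predicted bug counts, while accuracy judges the discretized buggy/clean labels, which is why the tables report all three side by side.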
TABLE 3. Results of different Performance Measures.

Ant Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.02   0.00   0.76   88.20
Decision Tree    0.52   0.27   0.50   88.00
SVM              0.93   0.86   0.10   100.00
Cubist           -0.05  0.00   0.75   83.65

Camel 1.6 Dataset
                 R      R2     RMSE   Accuracy
Neural Network   -0.05  0.00   0.80   89.69
Decision Tree    0.02   0.00   0.79   85.57
SVM              0.38   0.14   0.61   90.70
Cubist           0.55   0.30   0.70   90.86

Lucene Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.11   0.01   1.80   33.33
Decision Tree    0.64   0.41   1.40   47.83
SVM              0.64   0.41   1.24   56.20
Cubist           0.66   0.44   1.15   58.48

Poi3 Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.10   0.01   0.91   75.28
Decision Tree    0.60   0.36   0.82   87.64
SVM              0.40   0.16   0.84   80.23
Cubist           0.51   0.26   0.82   81.53

Synapse Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.31   0.10   0.57   84.62
Decision Tree    0.51   0.26   0.63   82.69
SVM              0.38   0.14   0.53   87.38
Cubist           0.58   0.34   0.54   87.60

Tomcat Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.32   0.10   0.16   99.42
Decision Tree    0.43   0.18   0.23   93.60
SVM              0.41   0.17   0.18   96.51
Cubist           0.42   0.18   0.16   97.21

Velocity Dataset
                 R      R2     RMSE   Accuracy
Neural Network   0.25   0.06   1.00   58.70
Decision Tree    0.36   0.13   1.06   73.91
SVM              0.45   0.20   0.79   77.17
Cubist           0.45   0.20   0.72   79.13

As TABLE 3 shows, the different performance measures are evaluated on the seven datasets with the implementation of the four algorithms. Feature selection is performed on every dataset, and the performance measures are calculated on the important features only.

V. RESULTS

This paper uses different supervised ML algorithms, Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT), and Cubist, for the analysis and assessment of the data. The paper also includes a comparative examination of the ML algorithms and demonstrates their performance accuracy and capabilities in software bug prediction. SVM is found to be the most precise model, with the highest accuracy, whereas in some places Decision Tree and Neural Network also come up with better values, and for some datasets Cubist also produces significant values, as shown in TABLE 4. The accuracy values for the different ML algorithms are shown graphically in Fig. 10.

TABLE 4. Comparing the Data Models in respect of Accuracy.

        Ant     Camel 1.6  Lucene  Poi3   Synapse  Tomcat  Velocity
NN      88.20   89.69      33.33   75.28  84.62    99.42   58.70
DT      88.00   85.57      47.83   87.64  82.69    93.60   73.91
SVM     100.00  90.70      56.20   80.23  87.38    96.51   77.17
Cubist  83.65   90.86      58.48   81.53  87.60    97.21   79.13

Fig. 10 Comparing the Data Models in respect of Accuracy.

VI. CONCLUSION

Software bug prediction is a technique that uses existing data to develop a prediction approach to determine potential software defects in the future. In this research, various datasets, metrics, and performance measures are evaluated. The four machine learning methods used are NN, SVM, DT, and Cubist, and testing datasets are used to carry out the evaluation process. Based on the outcomes of the different performance metrics, i.e., accuracy, R square, and RMSE, the experimental findings are compiled. The findings indicate that:
• ML techniques are effective methods for anticipating software problems in the future.
• The comparison performed in the paper demonstrates that the SVM classifier outperforms the other classifiers and gives the highest accuracy.
• NN, DT, and Cubist also perform well in terms of the R square and RMSE results for the prediction model, as discussed in TABLE 4.
In future research, we might use other ML methods, provide a thorough comparison of them, and try to develop a new framework with the best method. Another way to improve the prediction model's accuracy is to include more software metrics in the learning process.

REFERENCES

[1] Y. M. Goh, C. U. Ubeynarayana, K. L. X. Wong, and B. H. W. Guo, "Factors influencing unsafe behaviors: A supervised learning approach," Accid. Anal. Prev., vol. 118, pp. 77–85, 2018.
[2] C. L. Philip Chen and S. R. LeClair, "Integration of design and manufacturing: solving setup generation and feature sequencing using an unsupervised-learning approach," Comput. Des., vol. 26, no. 1, pp. 59–75, 1994.
[3] P. Dayan and Y. Niv, "Reinforcement learning: The Good, The Bad and The Ugly," Curr. Opin. Neurobiol., vol. 18, no. 2, pp. 185–196, 2008.
[4] S. Arora, M. Agarwal, S. Mongia, and R. Kawatra, "PSRE Self-assessment Approach for Predicting the Educators' Performance Using Classification Techniques," in Communications in Computer and Information Science, 2022, vol. 1546 CCIS, pp. 405–423.
[5] S. Arora, M. Agarwal, and S. Mongia, "Comparative Analysis of Educational Job Performance Parameters for Organizational Success: A Review," in Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences, 2021, pp. 105–121.
[6] S. Arora, R. Kawatra, and M. Agarwal, "An Empirical Study - The Cardinal Factors towards Recruitment of Faculty in Higher Educational Institutions using Machine Learning," in Proceedings of the 8th International Conference on Signal Processing and Integrated Networks, SPIN 2021, 2021, pp. 491–497.
[7] J. P. C. Kleijnen and R. G. Sargent, "A methodology for fitting and validating metamodels in simulation," Eur. J. Oper. Res., vol. 120, no. 1, pp. 14–29, 2000.
[8] Z. T. Wilson and N. V. Sahinidis, "The ALAMO approach to machine learning," Comput. Chem. Eng., vol. 106, pp. 785–795, 2017.
[9] S. Puranik, P. Deshpande, and K. Chandrasekaran, "A Novel Machine Learning Approach for Bug Prediction," Procedia Comput. Sci., vol. 93, pp. 924–930, 2016.
[10] L. Kumar and A. Sureka, "Feature Selection Techniques to Counter Class Imbalance Problem for Aging Related Bug Prediction," in Proceedings of the 11th Innovations in Software Engineering Conference, 2018.
[11] A. Hammouri, M. Hammad, M. Alnabhan, and F. Alsarayrah, "Software Bug Prediction using machine learning approach," Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 2, pp. 78–83, 2018.
[12] G. J. Sabolish and J. R. Callahan, "NASA/WVU Software Research Laboratory," 1995.
[13] Ö. F. Arar and K. Ayan, "Software defect prediction using cost-sensitive neural network," Appl. Soft Comput. J., vol. 33, pp. 263–277, 2015.
[14] X. Wu, W. Zheng, X. Chen, Y. Zhao, T. Yu, and D. Mu, "Improving high-impact bug report prediction with combination of interactive machine learning and active learning," Inf. Softw. Technol., vol. 133, p. 106530, 2021.
[15] K. Li, Z. Xiang, T. Chen, S. Wang, and K. C. Tan, "Understanding the automated parameter optimization on transfer learning for cross-project defect prediction: An empirical study," Proc. - Int. Conf. Softw. Eng., pp. 566–577, 2020.
[16] U. Ali, S. Aftab, A. Iqbal, Z. Nawaz, M. S. Bashir, and M. A. Saeed, "Software defect prediction using variant based ensemble learning and feature selection techniques," Int. J. Mod. Educ. Comput. Sci., vol. 12, no. 5, pp. 29–40, 2020.
[17] A. Panichella, C. V. Alexandru, S. Panichella, A. Bacchelli, and H. C. Gall, "A search-based training algorithm for cost-aware defect prediction," GECCO 2016 - Proc. 2016 Genet. Evol. Comput. Conf., pp. 1077–1084, 2016.
[18] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme, "A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables," Neurocomputing, vol. 167, pp. 24–31, 2015.
[19] R. Ferenc, Z. Tóth, G. Ladányi, I. Siket, and T. Gyimóthy, "A public unified bug dataset for java and its assessment regarding metrics and bug prediction," Softw. Qual. J., vol. 28, no. 4, pp. 1447–1506, 2020.
[20] S. B. Kotsiantis and D. Kanellopoulos, "Data preprocessing for supervised learning," Int. J. …, vol. 1, no. 2, pp. 1–7, 2006.
[21] J. Cai, J. Luo, S. Wang, and S. Yang, "Feature selection in machine learning: A new perspective," Neurocomputing, vol. 300, pp. 70–79, 2018.
[22] M. B. Kursa and W. R. Rudnicki, "Feature selection with the Boruta package," J. Stat. Softw., vol. 36, no. 11, pp. 1–13, 2010.
[23] C. Selvaraj, N. Bhalaji, and K. B. Sundhara Kumar, "Empirical study of feature selection methods over classification algorithms," Int. J. Intell. Syst. Technol. Appl., vol. 17, no. 1/2, p. 98, 2018.
[24] M. B. Kursa, A. Jankowski, and W. R. Rudnicki, "Boruta - A system for feature selection," Fundam. Informaticae, vol. 101, no. 4, pp. 271–285, 2010.
[25] E. Chemali, P. J. Kollmeyer, M. Preindl, and A. Emadi, "State-of-charge estimation of Li-ion batteries using deep neural networks: A machine learning approach," J. Power Sources, vol. 400, pp. 242–255, 2018.
[26] S. Arora, M. Agarwal, and R. Kawatra, "Prediction of educationist's performance using regression model," in Proceedings of the 7th International Conference on Computing for Sustainable Global Development, INDIACom 2020, 2020, pp. 88–93.
[27] T. Chai and R. R. Draxler, "Root mean square error (RMSE) or mean absolute error (MAE)? - Arguments against avoiding RMSE in the literature," Geosci. Model Dev., vol. 7, no. 3, pp. 1247–1250, 2014.
[28] A. Vellido, "The importance of interpretability and visualization in machine learning for applications in medicine and health care," Neural Comput. Appl., vol. 32, no. 24, pp. 18069–18083, 2020.
[29] D. F. Cook, C. T. Ragsdale, and R. L. Major, "Combining a neural network with a genetic algorithm for process parameter optimization," Eng. Appl. Artif. Intell., vol. 13, no. 4, pp. 391–396, 2000.
[30] Q. Wen, Z. Yang, Y. Song, and P. Jia, "Automatic stock decision support system based on box theory and SVM algorithm," Expert Syst. Appl., vol. 37, no. 2, pp. 1015–1022, 2010.
[31] M. Batra and R. Agrawal, "Comparative analysis of decision tree algorithms," in Advances in Intelligent Systems and Computing, 2018, vol. 652, pp. 31–36.
[32] H. Nguyen, X. N. Bui, Q. H. Tran, and N. L. Mai, "A new soft computing model for estimating and controlling blast-produced ground vibration based on Hierarchical K-means clustering and Cubist algorithms," Appl. Soft Comput., vol. 77, pp. 376–386, 2019.
