Detectron2 Predictions Analysis
Abstract: Accurately estimating software effort is probably the biggest challenge facing software developers.
Estimates made at the proposal stage have a high degree of inaccuracy because the requirements for the scope are not
yet defined in full detail; as the project progresses and the requirements are elaborated, the accuracy of and
confidence in the estimate increase. It is therefore important to choose the right software effort estimation
technique. In this work, Artificial Neural Network (ANN) and Support Vector Machine (SVM) models are applied to the
China dataset to predict software effort. The performance indices Sum-Square-Error (SSE), Mean-Square-Error (MSE),
Root-Mean-Square-Error (RMSE), Mean-Magnitude-Relative-Error (MMRE), Relative-Absolute-Error (RAE),
Relative-Root-Square-Error (RRSE), Mean-Absolute-Error (MAE), Correlation Coefficient (CC), and PRED (25) are used
to compare the results obtained from these two methods.
Keywords: Software effort estimation, Machine Learning Techniques, Artificial Neural Network, and Support Vector
Machine.
1. Introduction
Software effort estimation methods are mainly categorized into algorithmic and non-algorithmic methods. The main
algorithmic methods are COCOMO, Function Points, and SLIM. These methods are also known as parametric methods
because they predict software development effort using a formula of fixed form that is parameterized from historical
data. The algorithmic methods require input attributes such as the experience of the development team, the required
reliability of the software, the programming language in which the software is to be written, an estimate of the final
number of delivered source lines of code (SLOC), complexity, and so on, which are difficult to obtain during the early
stages of the software development life cycle (SDLC). They also have difficulty modelling the inherently complex
relationships involved [9]. These limitations motivate the use of non-algorithmic methods, which are based on soft
computing. These methods have the following advantages:
1. Ability to learn from previous data.
2. Ability to model complex relationships between the dependent variable (effort) and the independent variables.
3. Ability to generalize from the training dataset, and thus to produce acceptable results on previously unseen
data.
2. Related Work
A lot of research has been done on predicting software effort using machine learning techniques such as Artificial
Neural Networks, Decision Trees, Linear Regression, Support Vector Machines, Fuzzy Logic, Genetic Algorithms,
empirical techniques, and theory-based techniques. Finnie and Wittig [4] examined the potential of two artificial
intelligence approaches, Artificial Neural Networks (ANNs) and Case-Based Reasoning (CBR), for creating development
effort estimation models using the Australian Software Metrics Association (ASMA) dataset. The same authors [3] also
examined the potential of ANNs and CBR as a basis for development effort estimation models in contrast to regression
models. The authors concluded that artificial intelligence models are capable of providing adequate estimation
models. Their performance depends to a large degree on the data on which they have been trained, and the extent to
which suitable project data is available will determine the extent to which adequate effort estimation models can be
developed.
Tosun et al. [1] proposed a novel method for assigning weights to features by taking their particular importance on
cost into consideration. Two weight assignment heuristics are implemented, both inspired by a widely used statistical
technique called Principal Component Analysis (PCA).
Elish [6] empirically evaluated the potential and accuracy of Multiple Additive Regression Trees (MART) as a novel
software effort estimation model, compared with recently published models: Radial Basis Function (RBF) neural
networks, linear regression, and Support Vector regression models with linear and RBF kernels. The comparison is
based on a well-known NASA software project dataset.
Martin et al. [10] described an enhanced Fuzzy Logic model for the estimation of software development effort and
proposed a new approach that applies Fuzzy Logic to software effort estimates.
3. Research Methodology
The Artificial Neural Network (ANN) and Support Vector Machine (SVM) learning techniques have been used to predict
software effort on the China dataset of software projects, in order to compare the performance of the two models.
(A) Empirical Data Collection
The data used is the China dataset, obtained from the PROMISE (PRedictOr Models In Software Engineering) Data
Repository [7]. The most widely used datasets for software effort prediction are China, Maxwell, NASA, Finnish,
Telecom, Kemerer, and Desharnais.
The China dataset consists of 19 features: 18 independent variables and 1 dependent variable. It has 499 instances,
corresponding to 499 projects. The descriptive statistics of the China dataset are given in Table 1.
Table 1 China Data Set Statistics
S N Variables Min Max Mean Standard Deviation
\[
E_j = \frac{\sum_{i=1}^{n} \left| P_{ij} - A_i \right|}{\sum_{i=1}^{n} \left| A_i - A_m \right|}
\]
Where Pij = Predicted value by the individual data set j for data point i;
Ai = Actual value for data point i;
n = Total number of data points;
Am = Mean of all Ai.
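The ratio E_j defined above (total absolute error divided by the total absolute deviation of the actual values from their mean) can be sketched in a few lines. This is only an illustration in Python, not the MATLAB code used in this work; the function name and the sample effort values are hypothetical.

```python
def relative_absolute_error(predicted, actual):
    """Total absolute error relative to the total absolute deviation from the mean."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)              # Am
    numerator = sum(abs(p - a) for p, a in zip(predicted, actual))
    denominator = sum(abs(a - mean_actual) for a in actual)
    if denominator == 0:
        raise ValueError("all actual values are equal; the ratio is undefined")
    return numerator / denominator


# Hypothetical effort values, chosen only to make the arithmetic easy to follow.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
rae = relative_absolute_error(predicted, actual)
print(rae)
```

A value below 1 means the model does better than always predicting the mean of the actual values, which by construction scores exactly 1.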
6. Root Relative Squared Error (RRSE)
The root relative squared error of individual data set j is defined as
\[
\mathrm{RRSE}_j = \sqrt{\frac{\sum_{i=1}^{n} \left( P_{ij} - A_i \right)^2}{\sum_{i=1}^{n} \left( A_i - A_m \right)^2}}
\]
Where Pij = Predicted value by the individual data set j for data point i;
Ai = Actual value for data point i;
n = Total number of data points;
Am = Mean of all Ai.
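A minimal sketch of the root relative squared error follows, again in Python rather than the MATLAB used in this work. The function name and sample values are hypothetical and serve only to illustrate the definition above.

```python
import math


def root_relative_squared_error(predicted, actual):
    """Square root of (sum of squared errors / sum of squared deviations from the mean)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)              # Am
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    squared_deviation = sum((a - mean_actual) ** 2 for a in actual)
    if squared_deviation == 0:
        raise ValueError("all actual values are equal; RRSE is undefined")
    return math.sqrt(squared_error / squared_deviation)


# Hypothetical effort values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
rrse = root_relative_squared_error(predicted, actual)
print(round(rrse, 4))
```

As with the relative absolute error, a predictor that always outputs the mean of the actual values scores exactly 1, so smaller values indicate a better model.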
7. Mean Absolute Error (MAE)
The mean absolute error measures how far the estimates are from the actual values. It can be applied to any two paired
sets of numbers, where one set is “actual” and the other is an estimate or prediction.
\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - A_i \right|
\]
Where Pi = Predicted value for data point i;
Ai = Actual value for data point i.
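The mean absolute error is the simplest of the indices to compute. The sketch below is in Python rather than the MATLAB used in this work, and the function name and sample values are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average of |Pi - Ai| over all data points."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical effort values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
mae = mean_absolute_error(predicted, actual)
print(mae)
```

Unlike the relative indices, MAE is expressed in the same unit as the effort values, so it is only comparable between models evaluated on the same (identically scaled) data.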
9. PRED (A)
It is calculated from the relative error. It is defined as the ratio of the number of data points with a relative
error less than or equal to A to the total number of data points. Thus, the higher the value of PRED (A), the better
the model is considered.
\[
\mathrm{PRED}(A) = \frac{d}{n}
\]
Where d = Number of data points whose MRE is less than or equal to A;
n = Total number of data points.
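The sketch below illustrates PRED (A) together with MMRE. It assumes the common definition of the magnitude of relative error, MRE = |P - A| / A, and it assumes that A in PRED (A) is a percentage, so PRED (25) counts points with MRE of at most 0.25. Both are assumptions of this example, as are the function names and sample values; the code is Python, not the MATLAB used in this work.

```python
def magnitude_relative_errors(predicted, actual):
    """MRE per data point, assuming MRE = |P - A| / A with non-zero actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def mmre(predicted, actual):
    """Mean of the per-point MRE values."""
    errors = magnitude_relative_errors(predicted, actual)
    return sum(errors) / len(errors)


def pred(predicted, actual, threshold_percent=25):
    """Fraction d / n of data points whose MRE is at most threshold_percent / 100."""
    errors = magnitude_relative_errors(predicted, actual)
    d = sum(1 for e in errors if e <= threshold_percent / 100)
    return d / len(errors)


# Hypothetical effort values; the first prediction is off by 40 percent.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [14.0, 18.0, 33.0, 37.0]
mmre_value = mmre(predicted, actual)
pred_25 = pred(predicted, actual, 25)
print(mmre_value, pred_25)
```

Here three of the four points fall within 25 percent of the actual value, so PRED (25) is 0.75, while the single large miss dominates the MMRE.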
4. Result Analysis
The Artificial Neural Network (ANN) and Support Vector Machine (SVM) machine learning techniques have been used
to predict software effort on the China dataset. Nine performance indices have been used to compare the results
obtained from these models: Sum-Square-Error (SSE), Mean-Square-Error (MSE), Root-Mean-Square-Error (RMSE),
Mean-Magnitude-Relative-Error (MMRE), Relative-Absolute-Error (RAE), Relative-Root-Square-Error (RRSE),
Mean-Absolute-Error (MAE), Correlation Coefficient (CC), and PRED(25). The model with the lowest values of SSE, MSE,
MMRE, RMSE, RAE, MAE, and RRSE and the highest values of the correlation coefficient and PRED (25) is considered
the best.
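For orientation, the sketch below computes four of the indices listed above: SSE, MSE, RMSE, and the correlation coefficient. It assumes the usual textbook definitions (sum of squared errors, its mean, the square root of that mean, and the Pearson correlation between predicted and actual values). The function names and sample values are hypothetical, and the code is Python rather than the MATLAB used in this work.

```python
import math


def squared_error_indices(predicted, actual):
    """Return (SSE, MSE, RMSE) under the usual definitions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    sse = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    mse = sse / len(actual)
    return sse, mse, math.sqrt(mse)


def correlation_coefficient(predicted, actual):
    """Pearson correlation between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    covariance = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    spread_a = sum((a - mean_a) ** 2 for a in actual)
    return covariance / math.sqrt(spread_p * spread_a)


# Hypothetical effort values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
sse, mse, rmse = squared_error_indices(predicted, actual)
cc = correlation_coefficient(predicted, actual)
print(sse, mse, round(rmse, 4), round(cc, 4))
```

Lower is better for the three error indices, whereas a correlation coefficient closer to 1 indicates that predictions track the actual values more closely.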
Table 2 Comparison of Performance indices with Artificial Neural Network and Support Vector Machine

                                              Artificial Neural Network (ANN)   Support Vector Machine (SVM)
SN  Performance Measures                      One Hidden       Two Hidden       Linear        ANOVA
                                              Layer            Layer            Kernel        Kernel
1   Sum Square Error (SSE)                    0.04490          0.06440          0.0183        0.0187
2   Mean Square Error (MSE)                   0.00045          0.00064          0.0002        0.0002
3   Root Mean Square Error (RMSE)             0.02120          0.02540          0.0135        0.0137
4   Mean Magnitude Relative Error (MMRE)      0.07630          0.11120          0.2023        0.1879
5   Relative Absolute Error (RAE)             0.05650          0.08710          0.0843        0.0842
6   Root Relative Squared Error (RRSE)        0.02180          0.03120          0.0089        0.0090
7   Mean Absolute Error (MAE)                 0.00460          0.00710          0.0069        0.0069
8   Pred(25)                                  0.92000          0.90000          0.7500        0.7800
9   Correlation Coefficient                   0.99350          0.99370          0.9960        0.9959
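The values in Table 2 can be compared index by index with a short script. The sketch below transcribes the table and reports, for each index, which model has the best value (lowest for the error indices, highest for Pred(25) and the correlation coefficient). The labels and helper names are hypothetical, and the code is Python rather than the MATLAB used in this work.

```python
# Column order of Table 2.
MODELS = ["ANN one hidden layer", "ANN two hidden layers",
          "SVM linear kernel", "SVM ANOVA kernel"]

# index name -> (values in MODELS order, True if a lower value is better)
TABLE_2 = {
    "SSE":      ([0.04490, 0.06440, 0.0183, 0.0187], True),
    "MSE":      ([0.00045, 0.00064, 0.0002, 0.0002], True),
    "RMSE":     ([0.02120, 0.02540, 0.0135, 0.0137], True),
    "MMRE":     ([0.07630, 0.11120, 0.2023, 0.1879], True),
    "RAE":      ([0.05650, 0.08710, 0.0843, 0.0842], True),
    "RRSE":     ([0.02180, 0.03120, 0.0089, 0.0090], True),
    "MAE":      ([0.00460, 0.00710, 0.0069, 0.0069], True),
    "Pred(25)": ([0.92000, 0.90000, 0.7500, 0.7800], False),
    "CC":       ([0.99350, 0.99370, 0.9960, 0.9959], False),
}


def best_models(values, lower_is_better):
    """Return every model that attains the best value (ties are all reported)."""
    best = min(values) if lower_is_better else max(values)
    return [model for model, value in zip(MODELS, values) if value == best]


winners = {name: best_models(values, lower) for name, (values, lower) in TABLE_2.items()}

# Count how many indices each model wins; a tie credits every tied model.
wins = {model: 0 for model in MODELS}
for names in winners.values():
    for model in names:
        wins[model] += 1

for name, names in winners.items():
    print(f"{name:9s} best: {', '.join(names)}")
print(wins)
```

Because the table reports rounded values, ties such as the two SVM kernels on MSE are credited to every tied model rather than broken arbitrarily.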
MATLAB programs were developed for training and testing the various models and for computing the performance
indices. The results are tabulated in Table 2 and plotted in Figures 4.1-4.2. In these plots, the blue curve
represents the actual values and the red curve represents the predicted values. The closer the curves for the actual
and predicted output values, the smaller the error and hence the better the model.
[Plots: each panel is titled "Comparison between Target and Predicted Values", with "Data Set" (0 to 100) on the x-axis
and "Target and Predicted Values" on the y-axis, and shows a Target Value curve and a Predicted Value curve. The y-axis
of the Support Vector Machine panels is scaled by 10^4.]
Figure 4.1(B): Target and Predicted values using Artificial Neural Networks with Two Hidden layers
Figure 4.2 (A): Target and Predicted values using Support Vector Machine with Linear Kernel
Figure 4.2(B): Target and Predicted values using Support Vector Machine with ANOVA Kernel
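As shown in Table 2, the Artificial Neural Network with one hidden layer and the Support Vector Machine with ANOVA
kernel are reported as showing the best results. Of the two, the ANN with one hidden layer gives better MMRE, RAE, MAE,
and PRED (25), while the SVM with ANOVA kernel gives lower SSE, MSE, RMSE, and RRSE and a slightly higher correlation
coefficient.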
5. Conclusion
The Artificial Neural Network (ANN) and Support Vector Machine (SVM) learning techniques have been used to predict
software development effort on the China dataset, and their results have been analyzed. A similar study can be carried
out using prediction models based on other machine learning algorithms, such as Genetic Algorithms (GA) and Random
Forest (RF) techniques. A cost-benefit analysis of the models may also be carried out to determine whether a given
effort prediction model would be economically viable.
References:
[1] A. Tosun, B. Turhan and A. B. Bener, “Feature Weighting Heuristics for Analogy-based Effort Estimation Models,”
Expert Systems with Applications, vol. 36, pp. 10325-10333, 2009.
[2] C. J. Burgess and M. Lefley, “Can Genetic Programming Improve Software Effort Estimation? A Comparative
Evaluation,” Information and Software Technology, vol. 43, pp. 863-873, 2001.
[4] G. R. Finnie and G. E. Wittig, “AI Tools for Software Development Effort Estimation,” Proceedings of the
International Conference on Software Engineering: Education and Practice (SEEP’96).
[5] K. Srinivasan and D. Fisher, “Machine Learning Approaches to Estimating Software Development Effort,” IEEE
Transactions on Software Engineering, vol. 21, Feb. 1995.
[6] M. O. Elish, “Improved Estimation of Software Project Effort using Multiple Additive Regression Tree,” Expert
Systems with Applications, vol. 36, pp. 10774-10778, 2009.
[7] G. Boetticher, T. Menzies and T. Ostrand, PROMISE Repository of Empirical Software Engineering data,
http://promisedata.org/repository, West Virginia University, Department of Computer Science, 2007.
[8] R. Malhotra and A. Jain, “Software Effort Prediction using Statistical and Machine Learning Methods,” International
Journal of Advanced Computer Science and Applications, vol. 2, no. 1, January 2011.
[9] I. Attarzadeh and Siew Hock Ow, “Software Development Effort Estimation Based on a New Fuzzy Logic Model,”
International Journal of Computer Theory and Engineering, vol. 1, no. 4, pp. 1793-8201, October 2009.
[10] C. L. Martin, J. L. Pasquier, Cornelio Y. M. and Agustin G. T., “Software Development Effort Estimation using
Fuzzy Logic: A Case Study,” Proceedings of the Sixth Mexican International Conference on Computer Science
(ENC’05), IEEE Software, 2005.
[11] C. Mair, G. Kadoda, M. Lefley, K. P. C. Schofield, M. Shepperd and Steve Webster, “An Investigation of Machine
Learning Based Prediction Systems,” Empirical Software Engineering Research Group, Bournemouth University, U.K.,
09 July, 1999.
[12] Bibi Stamatia and Stamelos Ioannis, “Selecting the Appropriate Machine Learning Techniques for Predicting of
Software Development Costs,” Artificial Intelligence Applications and Innovations, vol. 204, pp. 533-540, 2006.
[13] Parag C. Pendharkar, “Probabilistic estimation of software size and effort,” An International Journal of Expert
Systems with Applications, vol. 37, pp. 4435-4440, 2010.
[14] L. Radlinski and W. Hoffmann, “On Predicting Software Development Effort Using Machine Learning Techniques
and Local Data,” International Journal of Software Engineering and Computing, vol. 2, pp. 123-136, 2010.