
Dogo Rangsang Research Journal UGC Care Group I Journal

ISSN : 2347-7180 Vol-13, Issue-4, April 2023


SOFTWARE DEFECT ESTIMATION USING MACHINE LEARNING ALGORITHMS

Mrs. N. Srilekha, Assistant Professor, Department of CSE, Raghu Engineering College, Visakhapatnam.
E. Supraja, A. Rajendra, B. Sony, Students, Department of CSE, Raghu Engineering College,
Visakhapatnam.

Abstract : Software plays an important role in any project, so it must be free of defects, reliable,
and performant. This project deals with identifying software defects and helping developers fix them
before they cause serious damage. It proposes a method to reduce software defects using machine
learning algorithms, namely Gradient Boosting and Support Vector Machines (SVM). The algorithms are
tested on various data sets to separate the modules with defects from those without. The proposed
system is user friendly: developers provide an input and obtain a prediction of any defects present.
These predictions alert developers to problem areas, helping them make the software more accurate and
improve its quality. This project is a contribution to the software sector.

Keywords : Support Vector Machines, Gradient Boosting, Software, Machine Learning

1. INTRODUCTION:
Developing software is a very important part of today's industry, and defects in software are a very
common issue that developers face. Recognizing and resolving these defects at an early stage reduces
not only the work involved but also the time and cost.
Complex software often tends to have errors; therefore, it is better to check the software at regular
intervals. Machine learning algorithms are extensively used in the software industry to identify
patterns and predict outcomes, and in recent times they have shown accurate and promising results. The
objective of this project is to use these machine learning methods to develop code that identifies
software defects. With this we can improve the accuracy and efficiency of software. The models used by
these machine learning algorithms are called classifiers; they consist of attribute variables and a
single class variable. After training, the model predicts whether or not a module contains a defect. In
addition to detection, it helps us understand the software's features. We propose this scheme to help
evaluate, analyse and improve software quality.

2. LITERATURE SURVEY:
1. Agasta, A. Ramchandran, M. arishlom, E. lionel, C.B Magnus (2014) [1]
A coding flaw can cause software to behave incorrectly, resulting in errors and ultimately, software
failure - a significant loss. Such errors, known as software faults, arise due to programming mistakes
and may be recoverable. Hardware detects these faults and forwards them to the corresponding
software handler for resolution. By predicting faults in advance, the development team gains an
additional opportunity to retest modules or files. This study explores a software defect prediction
method using a genetic algorithm-based approach for classification.
2. Erik Arisholm, Lionel C. Briand, Eivind B. Johannessen (2010)[2]

Page | 74 DOI:10.36893.DRSR.2023.V13I04.074-082 Copyright @ 2023 Authors


This paper compares many data mining and machine learning techniques for building fault prediction
models. It assesses the impact of using different data sets, such as source code structural measures
and change history. It also compares several alternative ways of assessing the performance of the
models. The results of the study indicate that the choice of modeling technique has limited impact on
the resulting cost-effectiveness.
3. Naresh.E, Vijaya Kumar B.P, Sahana.P.Shankar (2017) [3]
Defect prediction activities tell us where the defects lie in a software product. Various data mining
techniques are extensively used for defect prediction. This paper mainly compares the available
techniques and gives an idea of which data mining technique to apply where, using the NASA MDP data
sets. The survey conducted in the experiment clearly shows that classification techniques have been
the main area of interest in recent years. Much of the research in software defect prediction focuses
on identifying techniques for classifying modules as either defective or correct.
4. Tingting Yu, Wei Wen, Xue Han, Jane Hayes (2018) [4]
Software quality assurance is expensive: it requires time and resources, and it delays a product's
delivery to the market. This cost is even more challenging for many of today's software systems
because of their complicated behaviors. Defect prediction techniques build models using software data
and use the models to predict whether instances of code regions, such as files, changes, and methods,
contain defects. Based on these prediction results, developers can allocate limited testing effort
more effectively to focus on the defect-prone modules and improve the efficiency and accuracy

3. METHODOLOGY:
A. Proposed Method : The proposed method employs a deep neural network to extract features from
the image, which are then passed through a hash function to generate a digital signature. The signature
is compared to the original image signature to detect any discrepancies, indicating image tampering.
This was evaluated on a data set of real-world images with various types of manipulations, such as
copy-move, splicing, and removal. Experimental results demonstrate that the proposed method
achieves high accuracy in detecting image forgery, outperforming existing methods in terms of
accuracy and computational efficiency. This has potential applications in various domains, such as
forensics, security, and media authentication, and presents a promising approach to addressing the
growing concern of image forgery.
B. Proposed Architecture

Fig 1.Proposed Architecture of the given model

C. TECHNIQUES USED:
1. NAIVE BAYES :
Naive Bayes is a statistical classification algorithm based on Bayes' theorem. It assumes that the
features of the data set are independent of one another given the class, which might not always be
true. It works by calculating the probability that a data point belongs to a certain class from the
probability of its features in that class. It is a fast and simple algorithm that works well with
larger data sets with multiple features.
It is commonly used to classify text and filter spam, and it can be applied to any classification
problem where the independence assumption holds.
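
The probability calculation described above can be sketched as follows. This is a minimal Gaussian
variant written for illustration, not the implementation used in this project; the two module
metrics (lines of code, cyclomatic complexity), the sample values, and the function names are all
assumptions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # Independence assumption: the per-feature log likelihoods simply add up.
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical module metrics: [lines of code, cyclomatic complexity].
X = [[120, 4], [90, 3], [150, 5], [800, 30], [950, 35], [700, 28]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defective, 0 = non-defective
model = fit_gaussian_nb(X, y)
```

Calling predict_gaussian_nb(model, [100, 4]) then classifies a small, simple module, and
predict_gaussian_nb(model, [850, 32]) a large, complex one.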

2. DECISION TREE :
A decision tree is a machine learning technique used for solving classification and regression
problems. It builds a graphical representation of all possible outcomes of a decision based on
features. The entire data set is represented by the root node. The algorithm then chooses the most
important feature or attribute and splits the data into two or more sub-nodes. The split is made on
the feature value that maximizes the information gain or minimizes the impurity of the data. Every
sub-node represents a subset of the data with specific values of the chosen feature. The process of
splitting is repeated recursively.
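
The split selection described above can be illustrated with a short sketch. It uses Gini impurity as
the impurity measure and searches a single level only; the sample metrics and the function names are
assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Find the (feature index, threshold, impurity) with the lowest weighted Gini impurity."""
    best = (None, None, gini(y))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sub-nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical module metrics: [lines of code, cyclomatic complexity].
X = [[120, 4], [90, 3], [150, 5], [800, 30], [950, 35], [700, 28]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defective, 0 = non-defective
feature, threshold, impurity = best_split(X, y)
```

On this toy data the search settles on feature 0 with threshold 150, which separates the two classes
completely. A full tree would apply best_split recursively to each sub-node.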

3. RANDOM FOREST :
Random forest is a machine learning algorithm used for classification and also for feature selection
tasks. It combines the predictions of multiple models to give an accurate and stable prediction.
The basic idea behind the random forest algorithm is to create a large number of decision trees, each
of which is trained on a random subset of the data and a random subset of the features. This introduces
randomness into the model and helps to reduce overfitting, which is a common problem in decision
tree models. An important advantage of random forest is that it can deal with many features, including
categorical and numerical variables. It gives a measure of feature importance, which can be used for
feature selection or to gain insight into the underlying data. Random forest is powerful and
versatile, and it can be used on a variety of problems because it often gives high accuracy with
little tuning.
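
A heavily simplified sketch of this idea is given below. To stay short, each "tree" is a one-split
stump trained on a bootstrap sample and a single randomly chosen feature, and the forest predicts by
majority vote. The data, the number of trees, and all names are assumptions for illustration; real
random forests grow much deeper trees.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature):
    """Fit a one-split tree on one feature; returns (feature, threshold, left label, right label)."""
    majority = Counter(y).most_common(1)[0][0]
    best = (feature, float("inf"), majority, majority)  # fallback: constant prediction
    best_score = gini(y)
    values = sorted({row[feature] for row in X})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
        left = [label for row, label in zip(X, y) if row[feature] <= t]
        right = [label for row, label in zip(X, y) if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score = score
            best = (feature, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return best

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # random subset of the data (bootstrap)
        feature = rng.randrange(len(X[0]))         # random subset of the features (one here)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical module metrics: [lines of code, cyclomatic complexity].
X = [[120, 4], [90, 3], [150, 5], [800, 30], [950, 35], [700, 28]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defective, 0 = non-defective
forest = fit_forest(X, y)
```

Because every tree sees a different sample and feature, individual trees can disagree; the vote in
predict_forest is what gives the stable combined prediction described above.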

4. LOGISTIC REGRESSION :
Logistic regression is a statistical method in which data with multiple variables are analyzed to
determine an outcome. It is used to model binary outcomes: it estimates the probability of the outcome
being a success as a function of the independent variables. The output of the logistic function ranges
from 0 to 1. The method estimates a coefficient for each independent variable, and these coefficients
determine the shape and location of the function. Logistic regression can be extended to handle
multi-class classification with the one-vs-all method, in which the class with the highest probability
is chosen as the predicted class. It often serves as a baseline for more complex models.
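
The following sketch shows the logistic function and one simple way to estimate the coefficients,
batch gradient descent on the log loss. It covers the binary case only. The single pre-scaled feature,
the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Estimate one coefficient per feature plus an intercept by gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, label in zip(X, y):
            # Prediction error for this sample drives the gradient of the log loss.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xi in enumerate(row):
                grad_w[j] += err * xi / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, row):
    """Estimated probability that the module is defective."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)

# Hypothetical, pre-scaled feature: cyclomatic complexity divided by 10.
X = [[0.3], [0.4], [0.5], [2.8], [3.0], [3.5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defective, 0 = non-defective
w, b = fit_logistic(X, y)
```

A module is predicted as defective when predict_proba returns a value above 0.5. For the one-vs-all
extension, one such model would be fitted per class and the class with the highest probability chosen.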

5. SUPPORT VECTOR MACHINES (SVM) :

Support Vector Machines (SVM) is a machine learning algorithm used for classification and regression.
The model finds a hyperplane that maximizes the margin between two classes in a given data set. Data
points are represented as vectors, and the support vectors are the data points closest to the
hyperplane. In comparison to other machine learning algorithms, SVM has several advantages, but it
also has limitations.

SVM can get computationally expensive on larger data sets, and the choice of kernel has a significant
effect on performance. SVM comes in multiple variations, such as linear, polynomial and RBF SVM.
Overall, we can conclude that SVM is a powerful machine learning algorithm that can be used in
multiple tasks.
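
To make the terms above concrete, the sketch below defines the three kernels mentioned and computes
the margin and support vectors for a given hyperplane. It does not train an SVM: the hyperplane, the
points, and the kernel parameters (degree, coef0, gamma) are hypothetical values picked for
illustration.

```python
import math

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    return (linear_kernel(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=0.5):
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def signed_distance(w, b, x, label):
    """Distance of x from the hyperplane w.x + b = 0, positive if x lies on its label's side."""
    return label * (linear_kernel(w, x) + b) / math.sqrt(linear_kernel(w, w))

# Hypothetical 2-D points labelled -1 / +1, and a hypothetical separating hyperplane x1 = 3.
points = [([1, 1], -1), ([2, 1], -1), ([4, 1], 1), ([5, 2], 1)]
w, b = [1, 0], -3

distances = [signed_distance(w, b, x, label) for x, label in points]
margin = min(distances)  # distance from the hyperplane to the closest point
support_vectors = [x for (x, _), d in zip(points, distances) if math.isclose(d, margin)]
```

Here the closest points on each side, [2, 1] and [4, 1], are the support vectors. Training an SVM
amounts to choosing w and b so that this margin is as large as possible; swapping linear_kernel for
polynomial_kernel or rbf_kernel is what yields the non-linear variants.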

6. GRADIENT BOOSTING :
Gradient boosting is a machine learning technique used to build predictive models. It combines
multiple weak predictive models into one strong predictive model. The models are trained sequentially,
and each model is trained to correct the errors of the previous one. The basic idea is to build a
series of decision trees, with each tree taking into account the errors of the previous tree. A loss
function measures the difference between the actual and predicted values, and the loss is reduced by
gradient descent: each new tree is fitted so that adding it to the model lowers the loss. Gradient
boosting can handle complex data and also captures non-linear relationships between variables. Many
implementations can also deal with missing values in the data. Regularization techniques like
shrinkage and early stopping can help reduce overfitting and thereby increase the overall performance
of the model.
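
The sequential error correction can be sketched as below. As a simplification, the sketch uses squared
loss on 0/1 labels with one-split regression stumps as the weak models, whereas a real classifier
would normally use a classification loss. The feature values, the number of rounds, and the learning
rate (the shrinkage factor) are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regressor for the residuals under squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    return best[1:]

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, residuals)
        stumps.append((t, lmean, rmean))
        # Shrinkage: add only a fraction of each new stump's correction.
        preds = [p + learning_rate * (lmean if x <= t else rmean)
                 for x, p in zip(xs, preds)]
    return base, stumps

def predict(base, stumps, x, learning_rate=0.1):
    """Sum of the initial prediction and all shrunken stump corrections."""
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)

# Hypothetical feature: cyclomatic complexity per module; 1 = defective, 0 = non-defective.
xs = [3, 4, 5, 28, 30, 35]
ys = [0, 0, 0, 1, 1, 1]
base, stumps = fit_gradient_boosting(xs, ys)
```

Thresholding the output of predict at 0.5 turns the score into a defect prediction. The learning_rate
passed to predict must match the one used during fitting.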

4. RESULTS :

Fig 2. Shows the output screen after running the code successfully

Fig 3. Upload the Data Set


Fig 4. Preprocess the uploaded data set

Fig 5. Select the features using Features Selection Algorithm


Fig 6. Run Machine Learning Algorithms

Fig 7. Depicts that Gradient boosting has highest accuracy


Fig 8. Graph shows that Gradient boosting got the highest accuracy compared to
other algorithms
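
For reference, accuracy comparisons of this kind reduce to the computation sketched below. The labels,
predictions, and model names are hypothetical placeholders and are not the results reported in the
figures.

```python
def accuracy(y_true, y_pred):
    """Fraction of modules whose predicted label matches the actual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels and predictions (1 = defective, 0 = non-defective).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 0, 0, 0, 1, 0],
}
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)  # name of the model with the highest accuracy
```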

5. CONCLUSION AND FUTURE SCOPE


We can conclude that this project proposes a software defect prediction system based on machine
learning algorithms. The system detects defects using these algorithms, which helps developers make
the required changes and improve the overall output.

This technique also saves much of the time, cost and work involved, because it identifies defects at
an early stage. In our results, Gradient Boosting achieved the highest accuracy among the algorithms
compared. The system is easy to understand, making it user-friendly.
Future work can add deep learning algorithms, which may improve the predictions further. We can also
develop more comprehensive software metrics, and compare different data sets to evaluate their effect
on the different algorithms.
Overall, we can say that this system will help improve the quality of software and reduce defects.

REFERENCES :
[1] Khoshgoftaar, T.M., Gao, K. and Szabo, R.M., “An Application of Zero-Inflated Poisson
Regression for Software Fault Prediction”, Software Reliability Engineering, 2001. ISSRE 2001.
Proceedings. 12th International Symposium on, 27-30 Nov. 2001, Page(s): 66 -73.
[2] Munson, J. and Khoshgoftaar, T., “Regression Modeling of Software Quality: An Empirical
Investigation”, Information and Software Technology, Volume: 32 Issue: 2, 1990, Page(s): 106 - 114.
[3] Khoshgoftaar, T.M. and Munson, J.C., “Predicting Software Development Errors using
Complexity Metrics”, Selected Areas in Communications, IEEE Journal on, Volume: 8 Issue: 2, Feb.
1990, Page(s): 253 -261.
[4] Wang, Y, “A New Approach for fitting Linear Models in high-dimensional Spaces”, PhD Thesis
(2000), Department of Computer Science, University of Waikato, New Zealand,
www.cs.waikato.ac.nz/~ml/publications/2000/thesis.pdf
[5] X. Yang and M. Duan, "Research of Software Defect Analysis Technology," Computer
Engineering & Software, 2018.
[6] J. Collofello and B. P. Gosalla, "An application of causal analysis to the software modification
process," Software: Practice and Experience, vol. 23, 1993.
[7] J. W. Horch, Practical Guide to Software Quality Management, Artech House, 2003.
[8] R. Chillarege, I. Bhandari, J. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray and M.-Y. Wong,
"Orthogonal Defect Classification - A Concept for In-Process Measurements," IEEE Transactions on
software Engineering, vol. 18, pp. 943-956, 1992.
[9] S. &. S. E. S. Committee, "IEEE 1044-1993 - IEEE Standard Classification for Software
Anomalies," IEEE, 1993.
