
Predicting the Fix Time of Bugs

Emanuel Giger, Department of Informatics, University of Zurich, giger@ifi.uzh.ch
Martin Pinzger, Department of Software Technology, Delft University of Technology, [email protected]
Harald Gall, Department of Informatics, University of Zurich, gall@ifi.uzh.ch

RSSE '10, May 4, 2010, Cape Town, South Africa. Copyright 2010 ACM 978-1-60558-974-9/10/05.

ABSTRACT

Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper we empirically investigate the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail whether attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute prediction models and 10-fold cross validation to test them. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.

1. INTRODUCTION

Several open source projects use issue tracking systems to enable an effective development and maintenance of their software systems. Typically, issue tracking systems collect information about system failures, feature requests, and system improvements. Based on this information and actual project planning, developers select the issues to be fixed. In this paper we investigate prediction models which support developers in the cost/benefit analysis by giving recommendations which bugs should be fixed first. We address the research question whether we can classify incoming bug reports into fast and slowly fixed. In particular, we investigate whether certain attributes of a newly reported bug have an effect on how long it takes to fix the bug and whether prediction models can be improved by including post-submission information within 1 to 30 days after a bug was reported. Intuitively one would expect that some of the attributes, e.g., priority, have a significant influence on the fix time of a bug. The two hypotheses of our empirical studies are: H1—incoming bug reports can be classified into fast and slowly fixed; and H2—post-submission data of bug reports, e.g., the number of comments made to a bug, improves prediction models.

We investigate these two hypotheses with bug report data of six software systems taken from the three open source projects Eclipse, Mozilla, and Gnome. Decision tree analysis with 10-fold cross validation is used to train and test prediction models. The predictive power of each model is evaluated with precision, recall, and a summary statistic.

2. ANALYSIS

In the first step we obtain bug report information from Bugzilla repositories of open source software projects (see Section 3). For each bug report the set of attributes listed in Table 1 is computed. Some attributes of a bug report, such as the reporter and the opening date, are entered once during the initial submission and remain constant. Other attributes, such as milestone and status, are changed or entered later on in the bug treating process. We highlight attributes that remain constant over time in Table 1 by an I and attributes that can change by a C. The change history of bug reports is stored in bug activities. We then use the change history of bug reports to compute the measures marked with C at specific points in time.

Table 1: Constant (I) and changing (C) bug report attributes.

Attribute                  Type  Short Description
monthOpened                I     month in which the bug was opened
yearOpened                 I     year in which the bug was opened
platform                   C     hardware platform, e.g., PC, Mac
os                         C     operating system, e.g., Windows XP
reporter                   I     email of the bug reporter
assignee                   C     email of the bug assignee
milestone                  C     identifier of the target milestone
nrPeopleCC                 C     #people in CC list
priority                   C     bug priority, e.g., P1, ..., P5
severity                   C     bug severity, e.g., trivial, critical
hOpenedBeforeNextRelease   I     hours opened before the next release
resolution                 C     current resolution, e.g., FIXED
status                     C     current status, e.g., NEW, RESOLVED
hToLastFix                 I     bug fix-time (from opened to last fix)
nrActivities               C     #changes of bug attributes
nrComments                 C     #comments made to a bug report
In addition to the initial values we obtain the attribute values at 24 hours (1 day), 72 hours (3 days), 168 hours (1 week), 336 hours (2 weeks), and 720 hours (∼1 month) after a bug report was opened. nrActivities simply refers to the number of these changes up to a given point in time. nrComments is similar but counts the number of comments entered by Bugzilla users up to the given point in time. The fix-time hToLastFix of each bug report is measured by the time between the opening date and the date of the last change of the bug resolution to FIXED.
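To make the snapshot computation concrete, the following is a minimal Python sketch that replays a bug's activity log up to each observation point; the (timestamp, attribute, new_value) record layout and the helper names are illustrative assumptions, not the tooling used in this paper.

```python
from datetime import datetime, timedelta

# Observation points from the paper: 1, 3, 7, 14, and 30 days after opening.
SNAPSHOT_OFFSETS = [timedelta(days=d) for d in (1, 3, 7, 14, 30)]

def snapshot(initial_values, activities, opened, offset):
    """Replay activity records (timestamp, attribute, new_value) up to
    opened + offset to recover the C-marked attribute values of Table 1."""
    cutoff = opened + offset
    seen = sorted(a for a in activities if a[0] <= cutoff)
    values = dict(initial_values)
    for _, attribute, new_value in seen:
        if attribute != "comment":
            values[attribute] = new_value
    values["nrActivities"] = sum(1 for _, attr, _ in seen if attr != "comment")
    values["nrComments"] = sum(1 for _, attr, _ in seen if attr == "comment")
    return values

# Example: a bug opened on Jan 1 whose milestone is set two days later.
opened = datetime(2007, 1, 1)
initial = {"priority": "P3", "milestone": None}
log = [(datetime(2007, 1, 3), "milestone", "3.3"),
       (datetime(2007, 1, 3), "comment", "stack trace attached")]
print(snapshot(initial, log, opened, SNAPSHOT_OFFSETS[0]))  # milestone still None
print(snapshot(initial, log, opened, SNAPSHOT_OFFSETS[1]))  # milestone == "3.3"
```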
In a second step we computed decision trees using the Exhaustive CHAID algorithm [6]. For each experiment we binned bug reports into Fast and Slow using the median of hToLastFix:

    bugClass = Fast if hToLastFix <= median(hToLastFix), Slow otherwise

bugClass is the dependent variable with Fast selected as the target category. The remaining bug measures are used as independent variables in all of our experiments. Because both bins are of equal size, the prior probability for each experiment is 0.5, which corresponds to random classification. We used the default settings of 100 for the minimum number of cases in parent nodes and 50 for the minimum number of cases in leaf nodes. The tree depth was set to 3 levels.
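As an illustration of this setup, the sketch below reproduces the median-based binning and maps the stopping rules onto scikit-learn's CART-based DecisionTreeClassifier; Exhaustive CHAID itself is an SPSS algorithm with no drop-in equivalent in common Python libraries, so CART here is an assumed stand-in, not the algorithm used in the paper.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def bin_by_median(h_to_last_fix: pd.Series) -> pd.Series:
    """bugClass: Fast if hToLastFix <= median, Slow otherwise."""
    return (h_to_last_fix <= h_to_last_fix.median()).map(
        {True: "Fast", False: "Slow"})

# CART stand-in for Exhaustive CHAID with the paper's stopping rules.
tree = DecisionTreeClassifier(max_depth=3,           # 3 tree levels
                              min_samples_split=100,  # min cases in parent nodes
                              min_samples_leaf=50)    # min cases in leaf nodes
```

Because the median split puts half of the bug reports in each class, a model is only informative to the extent that it beats the 0.5 prior.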
For the validation of each prediction model we used 10-fold cross validation [8]. The data set is broken into 10 sets of equal size. The model is trained with 9 data sets and tested with the remaining tenth data set. This process is repeated 10 times with each of the 10 data sets used exactly once as the validation data. The results of the 10 folds are then averaged to produce the performance measures.

We use precision (P), recall (R), and the area under the receiver operating characteristic curve (AUC) statistic for measuring the performance of prediction models. Precision denotes the proportion of correctly predicted Fast bugs: P = TP/(TP + FP). Recall denotes the proportion of true positives among all Fast bugs: R = TP/(TP + FN). AUC can be interpreted as the probability that, when randomly selecting a positive and a negative example, the model assigns a higher score to the positive example [4]. In our case the positive example is a bug classified Fast.
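In scikit-learn terms, this evaluation can be sketched as follows; note that pooling out-of-fold predictions, as done here for brevity, is close to but not identical with the paper's per-fold averaging, and Fast is assumed to be encoded as the positive class 1.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def evaluate(model, X, y):
    """10-fold cross validation; y holds 1 for Fast (positive) and 0 for Slow."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    # Each bug is scored by a model that never saw it during training.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    return {"precision": precision_score(y, pred),  # TP / (TP + FP)
            "recall": recall_score(y, pred),        # TP / (TP + FN)
            "auc": roc_auc_score(y, proba)}
```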
3. EXPERIMENTS

We investigated the relationships between the fix-time of bug reports and their attributes with six (sub-)systems taken from the three open source software projects Eclipse, Mozilla, and Gnome. Table 2 lists the number of bugs input to our experiments.

Table 2: Number of bugs and dates of first and last filed bug reports of subject systems.

Project            #Bugs   Observation Period
Eclipse JDT        10,813  Oct. 2001 – Oct. 2007
Eclipse Platform   11,492  Oct. 2001 – Aug. 2007
Mozilla Core       27,392  Mar. 1997 – June 2008
Mozilla Firefox     8,899  Apr. 2001 – July 2008
Gnome GStreamer     3,604  Apr. 2002 – Aug. 2008
Gnome Evolution    13,459  Jan. 1999 – July 2008

3.1 Classifying Bugs with Initial Bug Data

In this section we present the results of our investigation of hypothesis H1—incoming bug reports can be classified into fast and slowly fixed. Table 3 gives an overview of the performance measures obtained by the decision tree analysis. We used Fast as the target variable for our calculations.

Table 3: Performance measures of prediction models computed with initial attribute values.

Project            Median  Prec.  Rec.   AUC
Eclipse JDT        122     0.635  0.485  0.649
Eclipse Platform   258     0.654  0.692  0.743
Mozilla Core       727     0.639  0.641  0.701
Mozilla Firefox    359     0.608  0.732  0.701
Gnome GStreamer    128     0.646  0.694  0.724
Gnome Evolution    701     0.628  0.695  0.694

Eclipse.

Looking at Table 3 we see that the decision tree model obtained with Eclipse Platform bug reports outperforms the Eclipse JDT model. The most important attribute in the Eclipse Platform model is monthOpened. An investigation of the values, however, yielded no clear trend that bug reports are treated differently during the year. The second attribute attached to the tree is assignee. The model performance is significantly higher than random classification, which lets us accept hypothesis H1 for Eclipse Platform.

The Eclipse JDT model stands out with a low recall value of 0.485. A recall value lower than 0.5 indicates that the model misses more than half of the Fast bug reports. Furthermore, the Eclipse JDT model has the lowest AUC value of all examined projects. The topmost attribute of the Eclipse JDT decision tree is assignee. The overall structure of the tree affirms the moderate performance of the model. Most of the nodes in the decision tree show low power to distinguish between fast and slowly fixed bugs. We reject hypothesis H1 for Eclipse JDT.

Mozilla.

Decision tree models computed with bug reports of the two Mozilla projects show similar performance. The first attribute considered in the decision tree of the Mozilla Core project is yearOpened. Bug reports opened after the year 2003 were more likely to get fixed fast, with a probability of 0.632. In contrast, bug reports opened before 2001 tend to be classified Slow, with a probability of 0.639. Bug reports opened between 2001 and 2003 cannot be distinguished sufficiently by yearOpened. Additionally, the decision tree model contains the component of a bug as well as information about the assignee, the operating system (os), and monthOpened. Improvements over random classification are significant and we accept hypothesis H1 for Mozilla Core.

In contrast to Mozilla Core, the Firefox model contains component as the most significant predictor. There is one node predicting perfectly; however, it covers only 0.9% of bug reports. The second most important attribute is the assignee, and in contrast to the Mozilla Core model, the yearOpened attribute of Firefox bug reports is of only minor relevance. Precision, recall, and AUC values let us accept hypothesis H1 for Mozilla Firefox.

Gnome.

The prediction models of both Gnome projects improve on random classification. The topmost attribute of the Gnome GStreamer decision tree is yearOpened.
Similar to Mozilla Core, older bug reports (i.e., opened before 2005) were likely to take more time to fix than recently reported bugs. The affected component is the second most significant predictor. An investigation of corresponding tree nodes showed that bug reports which affected components related to the plugin architecture of Gnome GStreamer tend to be fixed faster. In particular, recent bug reports followed this trend. As in our previous experiments, prediction models were improved by including the attributes reporter and assignee. The values for precision, recall, and AUC let us accept hypothesis H1 for Gnome GStreamer.

The decision tree model of Gnome Evolution bug reports contains assignee as the first attribute. The attributes on the second level of the tree are hOpenedBeforeNextRelease, reporter, yearOpened, and severity. An investigation of the decision tree did not show any patterns or tendencies that enable a straightforward classification of bug reports into Slow and Fast. Concerning precision, recall, and AUC, the model performs significantly better than random classification. We accept hypothesis H1 for Gnome Evolution.

In summary, decision tree analysis with the initial bug attributes obtains prediction models that for five out of six systems perform 10 to 20% better than random classification. This is a sufficient indicator that we can compute prediction models to classify incoming bug reports into Fast and Slow, and we accept hypothesis H1.

3.2 Classifying Bugs with Post-Submission Data

This section presents the results of the evaluation of hypothesis H2—post-submission data of bug reports improves prediction models. For each bug report we obtained post-submission data at different points in time, namely 1 day, 3 days, 1 week, 2 weeks, and 1 month after the creation date of the bug report. For each observation period we computed decision tree models, which we validated with 10-fold cross validation. The following paragraphs present and discuss the results of the experiments and the performance measures of the prediction models.
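One way to tie these pieces together is an experiment driver along the following lines; snapshot_dataset is a hypothetical helper that assembles the feature matrix from the attribute snapshots of Section 2 and, presumably, keeps only the bugs still unresolved at the given observation point (which would explain the shrinking #Bugs columns in Tables 4 to 9), while tree and evaluate refer to the earlier sketches.

```python
# Hypothetical experiment driver; `snapshot_dataset` is an assumed helper,
# `tree` and `evaluate` come from the sketches in Section 2.
for days in (0, 1, 3, 7, 14, 30):
    X, y = snapshot_dataset(bugs, offset_days=days)  # bugs still open at `days`
    scores = evaluate(tree, X, y)
    print(f"{days:>2} days: precision={scores['precision']:.3f} "
          f"recall={scores['recall']:.3f} AUC={scores['auc']:.3f}")
```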
Eclipse.

Table 4 lists the median fix-time of bugs and the results of decision tree analysis with bug reports of the Eclipse JDT project.

Table 4: Median fix-time and performance measures of Eclipse JDT prediction models.

Days  Median  #Bugs   Prec.  Rec.   AUC
0     122     10,813  0.635  0.485  0.649
1     296     7,732   0.710  0.577  0.742
3     491     6,277   0.693  0.659  0.767
7     865     4,767   0.750  0.606  0.785
14    1,345   3,653   0.775  0.661  0.823
30    2,094   2,615   0.885  0.554  0.806

The inclusion of post-submission information improved the performance of prediction models, as indicated by increasing precision, recall, and AUC. In contrast to the initial decision tree, the models built with post-submission data obtained milestone as the topmost predictor. New bug reports rarely have a milestone specified; in the case of Eclipse JDT, only 36 out of 10,813 bug reports do. Within one week the ratio of pending bugs with milestones increased to 37% and afterwards remained constant. The inclusion of milestone led to improved performance of prediction models for the Eclipse JDT project. In addition to milestone, the assignee, the reporter, monthOpened, and yearOpened represent significant predictors in the computed decision tree models. The best performing model takes into account 14 days of post-submission data. Precision, recall, and AUC values of this model are higher than the corresponding values of the initial model. This lets us accept hypothesis H2 for Eclipse JDT.

Table 5 lists the performance measures for the Eclipse Platform bugs. Experiments showed results similar to those for Eclipse JDT. On average, bugs in the Eclipse Platform project tend to take longer to fix than in the Eclipse JDT project. This is indicated by a higher median fix-time for the different observation periods.

Table 5: Median fix-time and performance measures of Eclipse Platform prediction models.

Days  Median  #Bugs   Prec.  Rec.   AUC
0     258     11,492  0.654  0.692  0.743
1     560     9,003   0.682  0.586  0.734
3     840     7,803   0.691  0.631  0.749
7     1,309   6,457   0.691  0.587  0.738
14    1,912   5,307   0.743  0.669  0.798
30    2,908   4,135   0.748  0.617  0.788

The inclusion of post-submission data of Eclipse Platform bug reports only slightly improved prediction models. As in the decision tree computed with Eclipse JDT bug reports, the milestone attribute was selected as the first attribute in the tree. Also in the Platform data, milestones are added in the post-submission phase of bug reports. After one day, milestones were added to 27% of pending bugs. This ratio remained constant for the later observation points. Most of the undecidable bugs do not have any milestone specified. The attributes monthOpened, reporter, and assignee are the other significant predictors contained in the decision tree models. The model with 14 days of post-submission data performed best. Improvements over the initial model led to the acceptance of hypothesis H2 for Eclipse Platform.

Mozilla.

The results of the decision tree analysis with bug reports of the Mozilla Core project are depicted in Table 6. The median bug fix-times indicate longer fix times for Mozilla Core than for Eclipse bugs on average.

Table 6: Median fix-time and performance measures of Mozilla Core prediction models.

Days  Median  #Bugs   Prec.  Rec.   AUC
0     727     11,377  0.639  0.641  0.701
1     935     10,424  0.708  0.667  0.773
3     1,179   9,524   0.727  0.630  0.770
7     1,617   8,347   0.712  0.697  0.777
14    2,201   7,142   0.757  0.671  0.803
30    3,257   5,716   0.688  0.708  0.746

Mozilla Core models contained priority, milestone, assignee, and reporter as significant predictors. priority is the first attribute in decision tree models computed with 3 and 7 days of post-submission data. Bug reports with low priority take longer to fix than bugs with higher priority. For example, in the 3-days model 80.7% of 1,255 bug reports with priority P1 were fixed fast. milestone is the most significant predictor in the other models that consider post-submission data.
In Mozilla Core, few (1.6%) milestones were entered when the bug was reported. This ratio changed to 30% within one day, with most of the reports assigned to the "moz" milestone. The ratio steadily increased up to 47% within 30 days after bug report submission. Beyond the attributes found for Eclipse JDT and Platform, the models computed with Mozilla Core bug reports also contained severity, the affected component, nrComments, and nrActivities. Prediction models with post-submission data show improved performance; hence, we accept hypothesis H2 for Mozilla Core.

The median fix-time and performance measures of models computed with Mozilla Firefox bugs are listed in Table 7. The median fix-times indicate faster fixes of Mozilla Firefox bugs than Mozilla Core bugs.

Table 7: Median fix-time and performance measures of Mozilla Firefox prediction models.

Days  Median  #Bugs  Prec.  Rec.   AUC
0     359     8,899  0.609  0.732  0.701
1     587     7,478  0.728  0.584  0.748
3     801     6,539  0.697  0.633  0.742
7     1,176   5,485  0.729  0.610  0.759
14    1,778   4,553  0.680  0.683  0.757
30    2,784   3,440  0.751  0.748  0.834

The best model was computed with post-submission data of up to 30 days. This decision tree model has a precision of 0.751, a recall of 0.748, and an AUC of 0.834. While in previous prediction models milestone or priority were selected as the most significant predictors, nrActivities and yearOpened were selected in the Mozilla Firefox models. In the models computed with 1, 3, and 7 days of post-submission data we observed that bugs with zero or one activity were fixed more slowly than bugs with more than 7 activities. The ratio of bug reports with specified milestones follows a similar trend as in the previous case studies. Surprisingly, the model with the best performance (30 days) does not contain the milestone attribute. In this model, yearOpened is the most significant predictor. In particular, bugs that were reported before the year 2003 took longer to fix on average than bugs reported after the year 2006. The reporter and assignee were the other bug attributes contained in this decision tree. The good performance of the last model (30 days) lets us accept hypothesis H2 for Mozilla Firefox.

Gnome.

Table 8 lists the measures of the prediction models computed with the Gnome GStreamer bug reports. Similar to bug reports of the two Eclipse projects, many reports in Gnome GStreamer have a short fix-time on average, as indicated by the lower median fix-times.

Table 8: Median fix-time and performance measures of Gnome GStreamer prediction models.

Days  Median  #Bugs  Prec.  Rec.   AUC
0     128     3,604  0.646  0.694  0.724
1     406     2,553  0.581  0.810  0.666
3     708     2,052  0.606  0.704  0.667
7     1,084   1,650  0.613  0.652  0.669
14    1,517   1,351  0.658  0.561  0.680
30    2,268   1,018  0.538  0.811  0.586

In contrast to the previous experiments, the performance of models computed for Gnome GStreamer decreases with the inclusion of post-submission information. While the AUC value of the initial model is 0.724, the AUC value of the last model is only 0.586. One big difference is that in Gnome GStreamer the milestone attribute is specified for only a few bug reports and hence was not included in the prediction models. Although milestones were initially specified for 9% of bug reports, this ratio increased to only 18% within 30 days, which is lower than the ratio in the Eclipse or Mozilla projects. In the models with post-submission data, the assignee is the most significant predictor, followed by the reporter and nrComments. Also with post-submission data we could not obtain reasonable prediction models; hence, we reject hypothesis H2 for Gnome GStreamer.

The next series of experiments was with bug reports of the Gnome Evolution project. The results of the decision tree analysis are shown in Table 9. Bugs of this system tend to take longer to fix on average than in the other subject systems.

Table 9: Median fix-time and performance measures of Gnome Evolution prediction models.

Days  Median  #Bugs   Prec.  Rec.   AUC
0     701     13,459  0.628  0.695  0.694
1     1,136   11,548  0.649  0.659  0.727
3     1,476   10,496  0.693  0.611  0.746
7     1,962   9,335   0.636  0.798  0.752
14    2,566   8,228   0.665  0.760  0.766
30    3,625   6,695   0.690  0.682  0.771

Compared to Gnome GStreamer, the models computed with the Gnome Evolution bug reports show better performance regarding precision, recall, and AUC values. The performance of the decision tree models increases when including post-submission information. Similar to the decision trees computed with Eclipse and Mozilla bug reports, milestone is the most significant predictor, followed by assignee. Milestones were added for 21% of the bugs within one day. This ratio increased to 31% within 30 days. Slow bug reports are indicated by milestones such as "Later", "Future", or "reschedule", while fast bug reports mainly got concrete release numbers. Bug reports with no milestone are basically undecidable. Other significant predictor variables which appeared in the various Gnome Evolution models are the reporter, yearOpened, and monthOpened. Furthermore, severity and hOpenedBeforeNextRelease are significant. The good performance of the prediction models with 7 and 14 days of post-submission data lets us accept hypothesis H2 for Gnome Evolution.

In summary, the inclusion of post-submission data led to improved predictive power of the models in all systems but Gnome GStreamer. We therefore accept hypothesis H2.

4. RELATED WORK

Hooimeijer and Weimer [5] used linear regression analysis on bug report data to predict whether a bug report is triaged within a given amount of time. Similar to our approach, they take into account post-submission data and investigate how much of this data is needed to yield adequate predictive power. While they focus on reducing the bug triage time, which they denote as the time needed to inspect, understand, and make the initial decision regarding how to address the report, we concentrate on the fix-time of bugs. Furthermore, they aim at finding an optimal cut-off value to classify bug reports into "cheap" and "expensive" while we use fixed cut-off values for Fast and Slow.
Additionally, we use decision tree analysis instead of linear regression analysis.

Another important basis for our work was done by Panjer in [10]. He used several different data mining models to predict Eclipse bug lifetimes. We extend his work by looking at more systems. Furthermore, while he counted the CC list, dependent bugs, bug dependencies, and comments, we take into account that other attributes, e.g., assignee, might change as well. We rather create different profiles representing the state of a bug at a certain point of its lifetime.

An approach to assist in bug triage is presented by Anvik et al. in [1]. They give suggestions to which developer a new bug report should be assigned. To find suitable developers among all possible candidates they apply machine learning techniques to open bug repositories. It would be interesting to see whether support vector machines instead of decision trees can improve prediction in our sense.

Wang et al. recognize the mentioned problem of duplicates and its possible drawbacks on bug triage [11].

Kim and Whitehead argue that the time needed to fix a bug is a significant factor when measuring the quality of a software system [7]. Our approach is complementary in that it provides a prediction model for estimating whether a bug will be fixed fast or take more time for resolution.

Given a new bug report, Weiss et al. present a method to predict the effort, i.e., the person-hours spent on fixing that bug [12]. They apply text mining techniques to search reports that match a newly filed bug. They use effort measures from past bug reports as a predictor. We also use existing data from recorded bug reports to compute a prediction model, but we remain limited to non-textual features of a bug report.

Bettenburg et al. investigated which elements developers rely on when fixing a bug [2]. Similar to our approach, they claim that the information given in bug reports has an impact on the fix time.

Lessmann et al. compare different classification models for software defect prediction using AUC as a benchmark [9]. We use similar analysis techniques and performance evaluation criteria but, instead of failure-proneness, aim at providing models to predict the fix time of bugs.

Recently, Bird et al. have found evidence that there is a systematic bias in bug-fix datasets [3]. This might affect prediction models relying on such biased datasets.

5. CONCLUSIONS & FUTURE WORK

We computed prediction models in a series of experiments with initial bug report data as well as post-submission information from three active open source projects. Summarized, the results of our experiments are: Between 60% and 70% of incoming bug reports can be correctly classified into fast and slowly fixed. assignee, reporter, and monthOpened are the attributes that have the strongest influence on the fix-time of bugs. Post-submission data of bug reports improves the performance of prediction models by 5% to 10%. The best-performing prediction models were obtained with 14 days or 30 days of post-submission data. The addition of concrete milestone information was the main factor for the performance improvements (see Section 3.2). Decision tree models with initial and post-submission bug report data showed adequate performance when compared to random classification. However, the applicability of these models to develop fully automated recommender systems is questionable. We can think of recommender tools in a way that they provide valuable input to developers and aid them in deciding which bugs to address first—though their decision cannot be solely based on the output of the models. Our models could also be useful to new developers or bug reporters as they give an insight into how bugs are prioritized in a software project.

Ongoing and future work is mainly concerned with improving the performance of prediction models. For this we plan to extend the input data set and investigate other algorithms to compute prediction models. For example, detailed change information of bug report attributes, data about the affected components, and text analysis will be tested. Furthermore, we plan to evaluate whether Random Forests and Naive Bayes algorithms can improve prediction models.

6. REFERENCES

[1] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In Proc. of the Int'l Conf. on Softw. Eng., pages 361–370, New York, NY, USA, 2006. ACM.
[2] N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In Proc. of the Int'l Symp. on Foundations of Softw. Eng., pages 308–318, 2008.
[3] C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu. Fair and balanced? Bias in bug-fix datasets. In Proc. of the Joint Meeting of the European Softw. Eng. Conf. and the ACM SIGSOFT Symp. on the Foundations of Softw. Eng., pages 121–130, New York, NY, USA, 2009. ACM.
[4] D. M. Green and J. A. Swets. Signal Detection Theory and Psychophysics. John Wiley & Sons, Inc., New York, NY, 1966.
[5] P. Hooimeijer and W. Weimer. Modeling bug report quality. In Proc. of the Int'l Conf. on Autom. Softw. Eng., pages 34–43, New York, NY, USA, 2007. ACM.
[6] G. V. Kass. An exploratory technique for investigating large quantities of categorical data. Journal of Applied Statistics, 29(2):119–127, 1980.
[7] S. Kim and E. J. Whitehead, Jr. How long did it take to fix bugs? In Proc. of the Int'l Workshop on Mining Softw. Repositories, pages 173–174, New York, NY, USA, 2006. ACM.
[8] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. of the Int'l Joint Conf. on Artificial Intelligence, pages 1137–1143. Morgan Kaufmann, 1995.
[9] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. on Softw. Eng., 34(4):485–496, 2008.
[10] L. D. Panjer. Predicting Eclipse bug lifetimes. In Proc. of the Int'l Workshop on Mining Softw. Repositories, page 29, Washington, DC, USA, 2007. IEEE Computer Society.
[11] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun. An approach to detecting duplicate bug reports using natural language and execution information. In Proc. of the Int'l Conf. on Softw. Eng., pages 461–470, New York, NY, USA, 2008. ACM.
[12] C. Weiss, R. Premraj, T. Zimmermann, and A. Zeller. How long will it take to fix this bug? In Proc. of the Int'l Workshop on Mining Softw. Repositories, page 1, Washington, DC, USA, 2007. IEEE Computer Society.