Where Should the Bugs Be Fixed?
Zhou, J., H. Zhang, and D. Lo.
Presented by: XYZ
Outline
✘ Context
✘ Problem
✘ Solution
✘ Related work/Limitations
✘ Approach overview
✘ Validation
Context
summary
metadata
descriptions
Screenshot of Eclipse bug report #339286, taken from the Bugzilla website
https://bugs.eclipse.org/bugs/show_bug.cgi?id=339286
Problem
✘ More than 5,041 bug reports were created for the Eclipse project this year (from 01-01-2015 to 11-22-2015), an average of about 15 bug reports per day.
https://bugs.eclipse.org/bugs/
Figure 1. Android bug report from Bugzilla
Solution
Input: Android bug report from Bugzilla
Algorithm
Output: a ranked list of likely buggy code fragments
Related Work
✘ Similarity between bug
report and source files
✘ Use of Information
Retrieval to calculate the
similarity
Figure 2. A bug report and its relevant source code file
Related Work
✘ LDA: Latent Dirichlet Allocation
✘ SUM: Smoothed Unigram Model
✘ LSI: Latent Semantic Indexing
✘ VSM: Vector Space Model
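As an illustration of the simplest of these techniques, the sketch below ranks source files against a bug report with a Vector Space Model. It is a minimal example under stated assumptions, not the implementation used in any of the cited studies: the tokenizer, the log-scaled TF, the smoothed IDF, and the names `tokenize` and `rank_files` are all choices made for this sketch, and the two sample files are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def rank_files(report, files):
    """Rank source files by TF-IDF cosine similarity to a bug report.

    `files` maps a file name to its text. The result is a list of
    (file name, score) pairs, highest score first.
    """
    docs = {name: Counter(tokenize(text)) for name, text in files.items()}
    n = len(docs)
    # Document frequency: in how many files each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def vectorize(counts):
        # Log-scaled term frequency times a smoothed inverse document frequency
        # (both weighting choices are assumptions of this sketch).
        return {
            t: (1 + math.log(c)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    query = vectorize(Counter(tokenize(report)))
    scores = {name: cosine(query, vectorize(c)) for name, c in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data, for illustration only.
sample_files = {
    "Parser.java": "class Parser { void parseToken() {} }",
    "Button.java": "class Button { void paintBorder() {} }",
}
sample_report = "Parser crashes when it has to parse an empty token"
ranking = rank_files(sample_report, sample_files)
print(ranking)
```

The file sharing the most heavily weighted terms with the report comes first; a file with no shared terms gets a score of 0.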
Figure 3. LDA corpus process
Limitations of existing studies
✘ There is a significant difference between the natural language used in bug reports and the programming language used in source code.
✘ Information from previously fixed bug reports, which could point to candidate files when those reports are similar to the newly opened one, is not exploited.
✘ Existing techniques fail to locate bugs accurately.
Approach Overview
✘ Source code files: Source code files may contain words that are similar to those occurring in the bug reports.
✘ Software size: When two source files have similar scores, one of them must be ranked higher. The approach assigns higher scores to larger files.
✘ Similar bugs: Examine similar bugs that were reported and fixed before. They can help locate the relevant files for the new bug.
Figure 4.
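The three ingredients above can be combined into a single ranking, roughly as sketched below. This is a hedged illustration, not the paper's exact formulas: the logistic size multiplier, the min-max normalization, the weight `alpha`, and the names `min_max` and `combine_scores` are assumptions of this sketch, and the input numbers are invented.

```python
import math


def min_max(values):
    """Scale a {key: number} dict to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def combine_scores(text_sim, file_terms, similar_bug_score, alpha=0.2):
    """Return file names ranked by a weighted mix of two scores.

    text_sim          -- textual similarity of each file to the bug report
    file_terms        -- number of terms in each file (a proxy for file size)
    similar_bug_score -- score derived from similar, already fixed bugs
                         (files absent from this dict count as 0)
    alpha             -- weight of the similar-bug score (assumed parameter)
    """
    size = min_max(file_terms)
    # Larger files get a larger multiplier through a logistic function.
    revised = {f: text_sim[f] / (1 + math.exp(-size[f])) for f in text_sim}
    r = min_max(revised)
    s = min_max({f: similar_bug_score.get(f, 0.0) for f in text_sim})
    final = {f: (1 - alpha) * r[f] + alpha * s[f] for f in text_sim}
    return sorted(final, key=final.get, reverse=True)


# Invented inputs: A and B match the report equally well, but B is larger;
# C matches poorly but was fixed for a similar earlier bug.
text_sim = {"A.java": 0.5, "B.java": 0.5, "C.java": 0.1}
file_terms = {"A.java": 100, "B.java": 1000, "C.java": 500}
similar_bug_score = {"C.java": 1.0}

low_alpha = combine_scores(text_sim, file_terms, similar_bug_score, alpha=0.2)
high_alpha = combine_scores(text_sim, file_terms, similar_bug_score, alpha=0.9)
print(low_alpha, high_alpha)
```

With a small `alpha` the larger of the two equally similar files is ranked first; with a large `alpha` the file suggested by the similar bug moves to the top, which is the kind of effect a weight study would examine.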
Validation
✘ RQ1: How many bugs can be successfully located by BugLocator?
If a relevant file is ranked in the top 1, top 5 or top 10, the bug report is considered effectively localized.
✘ RQ2: Can BugLocator outperform other bug localization methods?
BugLocator is compared with bug localization methods implemented using the techniques from related work (LDA, SUM, LSI, VSM).
✘ RQ3: What is the impact of weighting the similarity scores when aggregating them?
The weight of each score is varied to observe its impact on the localization.
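The Top N criterion used for RQ1 can be sketched as follows. This is an illustrative reading of the criterion, assuming a bug counts as located when at least one of its fixed files appears among the first N ranked files; the function name `top_n_rate` and the sample rankings are made up for this example.

```python
def top_n_rate(results, n):
    """Fraction of bugs with at least one fixed file in the top n.

    `results` is a list of (ranked_files, fixed_files) pairs, one per bug:
    ranked_files is the ranked list returned for the bug report and
    fixed_files is the set of files actually changed to fix the bug.
    """
    hits = sum(
        1
        for ranked, fixed in results
        if any(name in fixed for name in ranked[:n])
    )
    return hits / len(results)


# Invented rankings for three bug reports.
sample_results = [
    (["a.java", "b.java", "c.java"], {"a.java"}),  # hit at rank 1
    (["a.java", "b.java", "c.java"], {"c.java"}),  # hit at rank 3
    (["a.java", "b.java", "c.java"], {"z.java"}),  # never found
]
print(top_n_rate(sample_results, 1), top_n_rate(sample_results, 5))
```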
Validation
Data Collection
✘ The projects have a complete bug and change history.
✘ The projects have different numbers of bugs and source code files.
Table 1. The studied projects
Comparison Metrics
✘ MRR (Mean Reciprocal Rank).
✘ MAP (Mean Average Precision).
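Both metrics can be computed from the ranked lists, as in the sketch below. It follows the standard definitions: reciprocal rank is 1 divided by the position of the first relevant file, and average precision averages the precision at each position where a relevant file appears. Dividing average precision by the total number of fixed files is an assumption of this sketch, and the sample data is invented.

```python
def reciprocal_rank(ranked, fixed):
    """1 / position of the first fixed file, or 0.0 if none is ranked."""
    for position, name in enumerate(ranked, start=1):
        if name in fixed:
            return 1.0 / position
    return 0.0


def average_precision(ranked, fixed):
    """Mean of precision@k over the positions k that hold a fixed file."""
    hits = 0
    total = 0.0
    for position, name in enumerate(ranked, start=1):
        if name in fixed:
            hits += 1
            total += hits / position
    return total / len(fixed) if fixed else 0.0


def mrr(results):
    """Mean Reciprocal Rank over (ranked_files, fixed_files) pairs."""
    return sum(reciprocal_rank(r, f) for r, f in results) / len(results)


def mean_average_precision(results):
    """Mean Average Precision over (ranked_files, fixed_files) pairs."""
    return sum(average_precision(r, f) for r, f in results) / len(results)


# Invented rankings for three bug reports.
metric_results = [
    (["a.java", "b.java", "c.java"], {"a.java"}),
    (["a.java", "b.java", "c.java"], {"c.java"}),
    (["a.java", "b.java", "c.java"], {"a.java", "c.java"}),
]
print(mrr(metric_results), mean_average_precision(metric_results))
```

MRR rewards finding the first relevant file early, while MAP also accounts for how well the remaining relevant files are ranked, which matters when a bug fix touches several files.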
Validation
✘ RQ1: How many bugs can be successfully located by BugLocator?
Table 2. The performance of BugLocator
Effectiveness of the proposed approach: 30%, 50% and 60%.
Figure 5. The comparison between the results of BugLocator and the SUM results given in [32] on the AspectJ dataset
[32] S. Rao and A. Kak. Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In Proceedings of the 8th Working Conference on Mining Software Repositories (MSR'11), ACM, Waikiki, Honolulu, Hawaii, pp. 43-52, May 2011.
Validation
✘ RQ2: Can BugLocator outperform other bug localization methods?
Figure 6. The comparisons between different bug localization methods
Validation
✘ RQ3: What is the impact of weighting the similarity scores when aggregating
them?