2012 IEEE 36th International Conference on Computer Software and Applications

A Source Code Recommender System to Support Newcomers

Yuri Malheiros∗§ , Alan Moraes† , Cleyton Trindade‡ and Silvio Meira§


∗ Departamento de Ciências Exatas
Universidade Federal da Paraíba, Rio Tinto, PB, Brazil
† Centro de Informática
Universidade Federal da Paraíba, João Pessoa, PB, Brazil
‡ Unidade Acadêmica de Serra Talhada
Universidade Federal Rural de Pernambuco, Serra Talhada, PE, Brazil
§ Centro de Informática
Universidade Federal de Pernambuco, Recife, PE, Brazil

Abstract—Newcomers in a software development project often need assistance to complete their first tasks. A mentor, an experienced member of the team, usually teaches the newcomers what they need to complete their tasks. However, allocating an experienced team member to teach a newcomer for a long time is neither always possible nor desirable, because the mentor could be more helpful doing more important tasks. During development the team interacts with a version control system, a bug tracking system and mailing lists, and all these tools record data that form the project memory. Recommender systems can use the project memory to help newcomers in some tasks by answering their questions, so in some cases the developers do not need a mentor. In this paper we present Mentor, a recommender system that helps newcomers solve change requests. Mentor uses the Prediction by Partial Matching (PPM) algorithm and some heuristics to analyze the change requests and the version control data, and recommends potentially relevant source code that will help the developer solve the change request. We ran three experiments to compare the PPM algorithm with Latent Semantic Indexing (LSI). Using PPM we achieved recall rates between 37% and 66.8%; using LSI the recall rates were between 20.3% and 51.6%.

Keywords-recommender systems; software engineering; software maintenance; information theory

I. INTRODUCTION

Newcomers in a software development project often need assistance to complete their first tasks, because they need to learn how the project works, its architecture, the development process, and how to use some tools before they become productive. A mentor, an experienced member of the team, usually teaches the newcomers what they need to complete their tasks [1]. To help, the mentor talks to the newcomer, gives him tips to solve problems, and usually shows source code examples to teach how to do something. However, the cost of taking an experienced developer away from his main tasks to teach a newcomer is high, so sometimes it is not possible to allocate someone as a mentor for a long period of time.

The team interacts with a version control system, a bug tracking system and mailing lists during development, and all of these tools record artifacts that form the project memory. Recommender systems can use the project memory to help newcomers in some tasks by answering their questions, so in some cases the developers do not need a mentor, since they can ask the computer.

In this paper we present Mentor, a recommender system that assists newcomers in solving change requests by recommending source code files. Mentor uses the Prediction by Partial Matching (PPM) [2] algorithm and some heuristics to analyze a change request and the data of version control systems, and then recommends potentially relevant source code that will help the developer solve the change request.

We begin the paper with an overview of related work. We then describe the Mentor recommender system in detail. We continue by presenting three experiments to evaluate the tool, each one using a different open source project, and their results. We conclude with a discussion of the results and future research directions.

II. RELATED WORK

Hipikat [3] assists newcomers in a software development project by recommending source code, change requests, mailing list messages, documentation and information about people. It creates relations between the artifacts; for example, it links the source code modified to solve a change request with that change request. Mentor uses this relation too, but the tools use different approaches to link artifacts. Hipikat uses the Latent Semantic Indexing (LSI) algorithm to find similarity among textual artifacts. According to the authors, the LSI algorithm is the bottleneck of the system, because LSI is slow in a system with a large number of artifacts. Mentor uses PPM to find similarity, with which we intend to obtain better recommendations than LSI with good performance, even with many change requests.

Codebook [4] is a framework for mining the data of project repositories. It uses a graph with relations between people and artifacts, an approach very similar to Hipikat.

0730-3157/12 $26.00 © 2012 IEEE 19


DOI 10.1109/COMPSAC.2012.11

Authorized licensed use limited to: UNIVERSITY OF NOTTINGHAM. Downloaded on August 18,2020 at 18:40:19 UTC from IEEE Xplore. Restrictions apply.
The Codebook paper presented two applications using the framework: Hoozizat, a tool to find experts, and Deep Intellisense, a Visual Studio add-in that shows chronologically ordered events related to a symbol. Codebook could be used to recommend source code related to a change request, if the graph has edges linking these kinds of artifacts.

The system proposed by Moin and Khansari [5] also recommends source code related to change requests. The tool uses the Support Vector Machine (SVM) classifier [6] to find similar change requests solved earlier and then recommends the source code modified to solve these similar change requests as the solution of an open change request. The system has a crucial difference from Mentor: it recommends whole directories with source code files, whereas Mentor recommends files directly. Recommending directories may not help the developers, because the directories may contain a lot of files.

III. MENTOR

Mentor is a tool that manages change requests and recommends source code related to a change request.

The tool makes recommendations assuming that similar change requests have similar solutions. Thus, to find the source code files related to an open change request, the tool looks for similar change requests that were solved in the past and recommends the files changed to solve them as the related files of the open change request. The tool ranks the solutions by the similarity of the change requests, so the files modified to solve the most similar change request appear first, the files of the second most similar change request appear next, and so on.

The Mentor recommendations are independent of the programming language and of the language used to describe the change requests. It does not matter whether the project is written in C, C++, Java or a mix of programming languages, or whether the developers use English or Portuguese to describe the change requests; the algorithms used in the tool work in all these cases. This is a strong point of Mentor, because different teams, developing different kinds of projects, can use the same tool and obtain useful results.

The tool is based on the MVC (Model-View-Controller) architecture, and two components were created to perform the recommendation tasks: the Matcher and the Similarity Assigner. They interact directly with the Model layer, reading and writing data in the database.

A. Similarity Assigner

The Similarity Assigner component creates the similarity relations among change requests. It analyzes each change request stored in the database and uses the PPM algorithm to calculate the similarity between them. The process of identifying similar change requests has two steps: model creation and classification.

In the first step, a PPM model is created for each change request stored in the system. It is built from the text of the change request summary, description and comments made by developers, concatenated into a single string. The PPM model is a statistical model, i.e., it consists of probabilities derived from the occurrences of the text symbols.

The model also considers the context of the symbols, i.e., the k symbols that precede the current symbol. With contexts, the probability of a symbol depends not only on its frequency, but also on the context in which it occurs. For example, the probability that the letter “h” appears in an English text is 5%. However, if the current symbol is the letter “t”, there is a greater probability that the next symbol is the letter “h”, about 30%, because in English the letters “th” often appear together [2].

There is a special symbol called escape in the PPM algorithm. It is added to the model in a context whenever a symbol appears for the first time in that context. This special symbol is important for calculating the entropy of a message, because it represents the probabilities of all the symbols that do not appear in the model.

Table I shows a PPM model using a maximum context k = 1 for the string “hocuspocus”. N is the number of times a symbol appears in a context and P is the estimated probability of this symbol.

Context k = 1
Context   Symbol   N   P
h         o        1   1/2
h         escape   1   1/2
o         c        2   2/3
o         escape   1   1/3
c         u        2   2/3
c         escape   1   1/3
u         s        2   2/3
u         escape   1   1/3
s         p        1   1/2
s         escape   1   1/2
p         o        1   1/2
p         escape   1   1/2

Context k = 0
Symbol    N   P
h         1   1/16
o         2   2/16
c         2   2/16
u         2   2/16
s         2   2/16
p         1   1/16
escape    6   6/16

Table I
PPM MODEL AFTER PROCESSING THE STRING “HOCUSPOCUS”.

We use the entropy to measure how similar a change request is to another. The entropy is calculated using the PPM
model. Thus, to calculate the similarity between a change request A and a change request B, we create a PPM model of A and then calculate the entropy of B using the model of A. The result tells us whether they are similar or not.
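The model creation and the entropy comparison described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mentor's implementation: the function names are hypothetical, the maximum context is fixed at k = 1, the escape count of a context is taken to be the number of distinct symbols seen in it (which matches the counts of Table I), contexts that were never seen are skipped at no cost, and a symbol unknown to the model falls back to a uniform distribution over an assumed alphabet of 256 symbols. The entropy is the mean of log2(1/P(x)) over the symbols of the text being scored.

```python
import math
from collections import Counter, defaultdict

ALPHABET_SIZE = 256  # assumption: size of the fallback alphabet for unseen symbols


def build_ppm_model(text):
    """Count symbol occurrences in order-1 contexts and in the order-0 context."""
    order1 = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        order1[prev][cur] += 1
    return dict(order1), Counter(text)


def symbol_probability(model, prev, symbol):
    """Estimate P(symbol | prev), escaping from order 1 to order 0 when needed."""
    order1, order0 = model
    p = 1.0
    for counts in (order1.get(prev), order0):
        if not counts:
            continue  # context never seen: move to the shorter context at no cost
        # Escape count = number of distinct symbols seen in this context.
        total = sum(counts.values()) + len(counts)
        if symbol in counts:
            return p * counts[symbol] / total
        p *= len(counts) / total  # pay the escape probability and fall back
    return p / ALPHABET_SIZE  # symbol unknown to the model: uniform fallback


def entropy(model, text):
    """Mean information, in bits per symbol, of `text` under `model`."""
    if not text:
        raise ValueError("text must not be empty")
    bits = 0.0
    prev = None
    for symbol in text:
        bits += math.log2(1.0 / symbol_probability(model, prev, symbol))
        prev = symbol
    return bits / len(text)


# Model of "change request A", then score two candidate texts against it.
model_a = build_ppm_model("hocuspocus")
print(symbol_probability(model_a, "o", "c"))  # 2/3, the "c" row of context "o" in Table I
print(entropy(model_a, "hocus") < entropy(model_a, "abracadabra"))  # True
```

A text that reuses the symbol sequences of the modeled text gets a lower entropy than an unrelated one, which is the property the similarity ranking relies on. A production implementation would likely add longer contexts and symbol exclusion; both are left out here to keep the sketch short.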
Mentor uses two equations to find the entropy. Equation (1) is the mathematical definition of the information I of a symbol x [7].

\[ I(x) = \log_2\left(\frac{1}{P(x)}\right) \tag{1} \]

The entropy H of a text is the mean of the information produced by the symbols of this text. Equation (2) averages the information of each symbol x_i of a message of size N. Note that the value of the entropy depends directly on the probabilities of the model used.

\[ H = \frac{1}{N}\sum_{i=1}^{N} I(x_i) \tag{2} \]

The tool calculates the information of each symbol of the concatenated summary, description and comments of change request B using the probabilities of the model of change request A, sums them all and divides by the message length.

A low entropy value means that the change requests are similar; a high value means the opposite, that the change requests are not similar.

B. Matcher

The Matcher component analyzes every change request stored in the database and the data of the version control system used in the project, and it uses a heuristic to discover the relations between revisions and change requests and store them in the database.

The heuristic used by the Matcher works as follows. Usually, in a software development project, the developers follow a convention when writing commit messages. When a developer commits the modifications that solve a change request, he may attach a message like “issue #1234 solved”, where #1234 is the ID of the solved change request. Thus, the Matcher uses regular expressions to scan the commit messages looking for patterns like this one. When it finds a pattern, a relation between a change request and a revision is stored in the database.

C. Usage

Figure 1 shows the Mentor initial screen. The tool displays a list of the change request IDs and their summaries ordered by modification date. This approach is very common in bug tracking systems, because the users can browse quickly among the change requests and read the short description to learn what each change request is about.

Figure 1. Mentor index

Clicking on a change request, Mentor changes its screen to the change request details, shown in Figure 2. On this screen the developer can analyze the change request details; in many cases the information in the summaries alone is not sufficient to tell the developer what he needs to start solving the problem.

Figure 2. Change request details

Below the change request summary there is a highlighted link with the text “recommend solutions”. Clicking on this link, Mentor recommends to the developer a list of similar change requests that were solved in the past. Figure 3 shows the change requests similar to request #7300 of the Hadoop Common project.

Figure 3. Similar change requests

The most similar change request according to the tool is
the change request #7001. Clicking on the change request in the list, Mentor shows the revisions related to the change request and the source code files changed in these revisions.

IV. EXPERIMENT

We compared PPM, the technique used in Mentor, with LSI, the technique used in Hipikat [3], to evaluate which technique is more effective at finding similarity among texts.

We created another version of the Similarity Assigner component that uses LSI instead of PPM. Thus, we changed only the algorithm used to find similarity; the rest of the system remained exactly the same. We used the LSI implementation of the Gensim library¹ to generate the text similarity relations.

¹ http://nlp.fi.muni.cz/projekty/gensim/

A. Metrics

We used the following three metrics in the experiment: precision [8], recall [8] and recall rate [9], because they are widely used in the literature in similar experiments.

B. Hypotheses

For the experiments, the null hypotheses are:
• Hn1 - PPM precision = LSI precision
• Hn2 - PPM recall = LSI recall
• Hn3 - PPM recall rate = LSI recall rate

And the alternative hypotheses are:
• Ha1 - PPM precision > LSI precision
• Ha2 - PPM recall > LSI recall
• Ha3 - PPM recall rate > LSI recall rate

C. Instrumentation

We ran three experiments, each one using data from a different open source project. The projects used are:
• GTK+: 503,161 source code lines in 1,348 files;
• GIMP: 737,835 source code lines in 3,293 files;
• Hadoop: 613,481 source code lines in 1,003 files.

We filtered the change requests used in the experiment. First, we selected only the minor and trivial change requests, because they are simple tasks that a newcomer could solve.

Some imported change requests do not have any version control revision associated with them, because no revision references them in the commit messages. These requests were removed from the experiment too.

Also, the solution of some change requests may not intersect with the solution of any other change request. This is a problem, because it is impossible to recommend a correct solution if no other change request changes the correct files. For example, if a change request (A) was solved by modifying a file (X), and none of the other change requests was solved by modifying the file (X), it is impossible to recommend the correct solution of change request (A). Requests like request (A) were removed from the experiment too.

Table II shows the number of change requests and revisions used in the experiment, and the period of time they cover.

Project   Data              Amount   Period
GTK+      change requests   374      06/2001 - 08/2008
          revisions         26805
GIMP      change requests   496      06/2001 - 09/2008
          revisions         29708
Hadoop    change requests   250      06/2009 - 06/2011
          revisions         3049

Table II
EXPERIMENT DATA

D. Validities

All the variables in the experiments are static; we change only the similarity algorithm to compare PPM and LSI. In other words, in the experiments we change the similarity algorithm applied to the static variables and observe the metrics.

To decide whether to reject the hypotheses we use two statistical tests: the Wilcoxon signed-rank test for matched pairs for precision and recall, and a proportion test for the recall rate [10].

We used three different projects to avoid mistakes in the conclusion of the experiment. If we had used only one project, the results might hold only in that specific case, and we would not be able to conclude whether PPM is better than LSI. By using three projects, written in different programming languages and with very distinct purposes, we try to avoid this threat.

E. Execution

For each experiment, we imported the project data using scripts that load the data from different formats directly into the database of the system, and ran the processes that generate the similarity relations and the change request-revision relations, once with PPM and once with LSI.

Each change request received from one to ten recommendations of similar change requests, and we calculated precision and recall for each number of recommendations. For example, for one recommendation, only the most similar change request and the files changed to solve it are recommended, and the metrics are calculated. After that, the system recommends another change request, the second most similar. Now the metrics are calculated using the files of the first and of the second change request. This process continues until ten change requests and their files have been recommended. In the end, we calculated the recall rate for each set of recommendations.

All change requests used in the experiment had previously been solved by the developers of the projects; however, we did not use this information during the recommendation process. Thus, to determine whether a recommendation is right or wrong, we only need to compare the files recommended by the system with the files of the real solution.
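The evaluation procedure of this section can be illustrated with a short Python sketch. It is written under assumptions rather than taken from the authors' scripts: the function names are hypothetical, the recommended files for k recommendations are the union of the files changed by the k most similar change requests, and the recall rate is read as the fraction of change requests for which that union contains at least one file of the real solution. The example data is the case of change request #7300 from the Discussion section; the paper names only Configuration.java among the seven files of #7001, so the other six file names are placeholders.

```python
def precision_recall(recommended_files, solution_files):
    """Precision and recall of a set of recommended files against the real solution."""
    recommended, solution = set(recommended_files), set(solution_files)
    hits = len(recommended & solution)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(solution) if solution else 0.0
    return precision, recall


def files_at_k(ranked_solutions, k):
    """Union of the files changed by the k most similar change requests."""
    files = set()
    for solution in ranked_solutions[:k]:
        files.update(solution)
    return files


def recall_rate(cases, k):
    """Fraction of change requests whose top-k files share a file with the real solution.

    `cases` is a list of (ranked_solutions, solution_files) pairs.
    """
    hits = sum(1 for ranked, solution in cases if files_at_k(ranked, k) & set(solution))
    return hits / len(cases) if cases else 0.0


# Change request #7300: the three files are named in the paper.
solution_7300 = {"Configuration.java", "TestConfiguration.java", "StringUtils.java"}
# Change request #7001 changed seven files including Configuration.java;
# the other six names below are hypothetical placeholders.
files_7001 = {"Configuration.java"} | {f"Other{i}.java" for i in range(1, 7)}

precision, recall = precision_recall(files_at_k([files_7001], 1), solution_7300)
print(round(precision, 4), round(recall, 4))  # 0.1429 0.3333
```

One correct file out of seven recommended gives a precision of 1/7, which the paper reports as 14.2%, while the same recommendation still counts as a hit for the recall rate.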

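The paper states that a proportion test is applied to the recall rate but does not give its formula. The sketch below assumes a pooled two-proportion z-test with equal sample sizes; the function name is hypothetical. It also assumes that the GTK+ recall rates for ten recommendations (53.7433% for PPM and 20.3209% for LSI) are fractions of the 374 GTK+ change requests listed in Table II, which corresponds to 201 and 76 change requests. Under these assumptions the computed statistic agrees with the z value reported for GTK+ in the Discussion section.

```python
import math


def two_proportion_z(successes_a, successes_b, n):
    """z statistic and two-sided p-value for two proportions over samples of equal size n."""
    p_a, p_b = successes_a / n, successes_b / n
    pooled = (successes_a + successes_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# GTK+, ten recommendations: 201 of 374 change requests (PPM) vs 76 of 374 (LSI).
# The counts are derived from the reported percentages (assumption, see above).
z, p_value = two_proportion_z(201, 76, 374)
print(round(z, 4), p_value < 0.05)  # 9.4648 True
```

Since |z| exceeds the critical value of 1.96 at the 0.05 significance level, the equality of the two recall rates is rejected for this project.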
Precision and recall are calculated for each change request. Thus, when recommending one similar change request, for example, we get many values of precision and recall, one for each change request that receives a recommendation. The same happens when recommending two, three, four similar change requests, and so on. The precision and recall values presented in the tables are the means of the results over all change requests. We use this approach to simplify the data presentation.

Tables III, IV and V show the results of the three experiments. The first column shows the number of similar change requests recommended, and the next columns show the mean precision, the mean recall, and the recall rate, respectively.

PPM
N    precision   recall     recall rate
1    17.8476%    18.6704%   22.1925%
5    7.7356%     37.7189%   44.1176%
10   4.9950%     46.5235%   53.7433%
LSI
N    precision   recall     recall rate
1    2.6744%     1.7342%    3.2086%
5    1.7887%     8.0765%    10.6952%
10   1.1780%     15.9488%   20.3209%

Table III
GTK+ EXPERIMENT RESULTS

PPM
N    precision   recall     recall rate
1    11.3363%    11.5758%   21.2%
5    5.987%      32.9433%   53.2%
10   4.2510%     45.9387%   66.8%
LSI
N    precision   recall     recall rate
1    6.9959%     7.9908%    15.6%
5    3.3429%     19.9785%   36%
10   2.5328%     32.7126%   51.6%

Table IV
HADOOP EXPERIMENT RESULTS

PPM
N    precision   recall     recall rate
1    7.5013%     7.1029%    13.1048%
5    3.7490%     16.87%     28.0242%
10   2.6758%     24.6009%   37.0968%
LSI
N    precision   recall     recall rate
1    5.4571%     5.6412%    8.871%
5    3.0886%     13.9663%   22.1774%
10   1.8808%     18.5938%   29.4355%

Table V
GIMP EXPERIMENT RESULTS

V. DISCUSSION

All results using PPM to find similar change requests were better than the results using LSI. We analyze the hypotheses using the case in which ten change requests are recommended.

The experiment using GTK+ had 4.9950% precision using PPM and 1.1780% using LSI. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 6640 and z = −4.9494, and the critical z is ±1.959962, so we can reject the null hypothesis Hn1. For recall in the GTK+ project, PPM had 46.5235% and LSI had 15.9488%. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 3096 and z = −5.9436, and the critical z is ±1.959962, so we can reject the null hypothesis Hn2. For the recall rate in the GTK+ project, PPM had 53.7433% and LSI had 20.3209%. The proportion test at the 0.05 significance level returns p-value = 0 and z = 9.4648, and the critical z is ±1.96, so we can reject the null hypothesis Hn3.

The experiment using Hadoop had 4.2510% precision using PPM and 2.5328% using LSI. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 11393 and z = 4.4732, and the critical z is ±1.959962, so we can reject the null hypothesis Hn1. For recall in the Hadoop project, PPM had 45.9387% and LSI had 32.7126%. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 11717 and z = 22.0381, and the critical z is ±1.959962, so we can reject the null hypothesis Hn2. For the recall rate in the Hadoop project, PPM had 66.8% and LSI had 51.6%. The proportion test at the 0.05 significance level returns p-value = 0.0005 and z = 3.4579, and the critical z is ±1.96, so we can reject the null hypothesis Hn3.

The experiment using GIMP had 2.6758% precision using PPM and 1.8808% using LSI. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 27454 and z = 15.9605, and the critical z is ±1.959962, so we can reject the null hypothesis Hn1. For recall in the GIMP project, PPM had 24.6009% and LSI had 18.5938%. The Wilcoxon signed-rank test for matched pairs at the 0.05 significance level returns T = 13375 and z = 14.4705, and the critical z is ±1.959962, so we can reject the null hypothesis Hn2. For the recall rate in the GIMP project, PPM had 37.0968% and LSI had 29.4355%. The proportion test at the 0.05 significance level returns p-value = 0.0104 and z = 2.5607, and the critical z is ±1.96, so we can reject
the null hypothesis Hn3.

The recall rate is a very important metric in these experiments, because Mentor may recommend change requests that were solved by modifying some correct files, but also other files. Such a recommendation is still useful for the developer; however, the precision and recall values tend to be low.

For example, change request #7300 of the Hadoop Common project was solved by changing three .java files (Configuration.java, TestConfiguration.java and StringUtils.java). Using PPM, the most similar change request recommended by Mentor was #7001, and it was solved by changing seven .java files, including Configuration.java. In this case, the change request counts as a correct answer in the recall rate calculation, but the precision was only 14.2%. Despite the low precision, the recommendation of the seven .java files may be very good, because a developer is likely to open the TestConfiguration.java file when he finds the Configuration.java file, since the former is the test file of the latter. Further, Configuration.java contains many uses of methods of the class StringUtils, so a developer reading the code will open the StringUtils file too. Using LSI the change request #7300 appeared as the 8th most similar change request.

The Hadoop project follows a rigid commit message guideline, so the relation between a revision and a change request is very clear. The GIMP project, which had low results, does not follow clear rules for commit messages. However, the GTK+ project, which had results as good as Hadoop's, does not follow any clear rule either. So, a rigid commit message pattern may not have much influence, because the Matcher does a good job of looking for the change request IDs.

After the three experiments we can conclude that using PPM to find similarity among texts is potentially better than using LSI.

VI. CONCLUSION

Moving an experienced developer away from his main tasks to act as a mentor may delay the project and increase its cost. In this paper we presented a tool called Mentor that tries to help developers solve change requests and to avoid moving experienced members away from their tasks.

We ran three experiments to evaluate the tool, comparing PPM with LSI for finding similarity, and all null hypotheses were rejected in the three experiments. Using PPM we achieved recall rates between 37% and 66.8%; using LSI the recall rates were between 20.3% and 51.6%.

There are many possibilities for future work. The first is expanding the recommendations to other artifacts, for example, messages of a mailing list or the documentation of the project. They consist of text, so it seems possible to use PPM to find similarity among all these artifacts.

The recommendation engine of Mentor could be integrated into existing project management systems. This way the project does not need to change the tool it uses to manage the project, and Mentor only adds some important recommendation features to the bug tracking system of the management tool.

Another important piece of work is to evaluate Mentor using a case study. Developers using the tool could show us whether the tool is really useful in real tasks of real software projects.

ACKNOWLEDGMENT

The authors would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for funding this research.

REFERENCES

[1] C. Hansman, V. Mott, A. Ellinger, and T. Guy, Critical Perspectives on Mentoring: Trends and Issues. Columbus, OH: Clearinghouse on Adult, Career and Vocational Education, 2002.

[2] D. Salomon and G. Motta, Handbook of Data Compression, 5th ed. Springer Publishing Company, Incorporated, 2009.

[3] D. Cubranic, G. C. Murphy, J. Singer, and K. S. Booth, “Hipikat: A project memory for software development,” IEEE Trans. Softw. Eng., vol. 31, no. 6, pp. 446–465, 2005.

[4] A. Begel, Y. P. Khoo, and T. Zimmermann, “Codebook: discovering and exploiting relationships in software repositories,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ser. ICSE ’10. New York, NY, USA: ACM, 2010, pp. 125–134.

[5] A. H. Moin and M. Khansari, “Bug localization using revision log analysis and open bug repository text categorization,” in OSS, 2010, pp. 188–199.

[6] V. Vapnik, Statistical learning theory. Wiley, 1998.

[7] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, pp. 379–423, Jul. 1948.

[8] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2005.

[9] P. Runeson, M. Alexandersson, and O. Nyholm, “Detection of duplicate defect reports using natural language processing,” in Proceedings of the 29th international conference on Software Engineering, ser. ICSE ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 499–510. [Online]. Available: http://dx.doi.org/10.1109/ICSE.2007.32

[10] M. F. Triola, Introdução à Estatística, 10a. edição. LTC, 2008.
