Research Article
Received 3 April 2006; received in revised form 5 September 2006; accepted 9 September 2006
Available online 18 October 2006
Abstract
In addition to degrading the quality of software products, software defects require additional effort to rewrite software and jeopardize the success of software projects. Software defects should be prevented to reduce the variance of projects and increase the stability of the software process. The factors causing defects vary with the attributes of a project, including the experience of the developers, the product complexity, the development tools and the schedule. The most significant challenge for a project manager is to identify actions that may incur defects before the actions are performed. Actions performed in different projects may yield different results, which are hard to predict in advance. To alleviate this problem, this study proposes an Action-Based Defect Prevention (ABDP) approach, which applies classification and Feature Subset Selection (FSS) techniques to project data during execution.
Accurately predicting which actions cause many defects by mining records of performed actions is a challenging task owing to the rarity of such actions. To address this problem, under-sampling is applied to the data set to increase the precision of predictions for subsequent actions. To demonstrate the efficiency of this approach, it was applied to a business project, revealing that under-sampling with FSS successfully predicts the problematic actions during project execution. The main advantage of utilizing ABDP is that the actions likely to produce defects can be predicted prior to their execution. The detected actions not only provide the information needed to avoid possible defects, but also facilitate software process improvement.
2006 Elsevier Inc. All rights reserved.
560 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570
disadvantage of applying the defect distribution analysis is that the reported defects may fall into different categories. The defect tendency is difficult to investigate when the root cause analysis schema is complicated and the sample size of defects is small (Leszak et al., 2002). To solve this problem, the historic data on multiple releases of products can be utilized to discover the defect patterns, and used to predict the possible defects. To reduce the effort involved in data gathering, the historic data are typically obtained from the Configuration Management System (Khoshgoftaar et al., 2000). The difficulty with utilizing multiple-release data to discover the defect patterns is that the attributes of the actions performed on different releases of products may differ owing to changes in resources in the project, and so cannot be applied to in-process prediction.

This study proposes an Action-Based Defect Prevention (ABDP) approach, which applies classification to the records of the performed actions to predict whether the subsequent actions will cause defects in the same project. An action is defined herein as an operation performed based on a task in the Work Breakdown Structure (WBS) of the project. Rather than focusing on the reported defects, ABDP mines the patterns of actions that may cause defects, and uses the analytical results to predict whether the subsequent actions are likely to generate defects. Once actions with a high probability of causing defects are identified, stakeholders can review these actions carefully and take appropriate corrective actions. The newly performed actions are continually appended to the historic data set to construct a new prediction model for subsequent actions. To address the imbalanced data set problem, in which the number of actions causing defects is fairly small, this study applies under-sampling techniques to the data set, and compares the results with those of over-sampling. The comparison results indicate that under-sampling achieves more precise predictions than over-sampling. ABDP also adopts the Feature Subset Selection (FSS) technique to select the important attributes and thus improve the prediction accuracy. The advantages of applying ABDP to measure the process are as follows:

• In-process prediction: The data used to construct the prediction model are obtained from the same project, which decreases the variance between different projects.
• Less effort to collect data: Action and defect reporting are common procedures for most software teams, and the required data can be collected from these reports.
• Less effort to identify problems in the process: The detected actions that are likely to cause defects can be further analyzed and reviewed in the causal analysis meeting, thus reducing the effort involved in identifying problematic actions.

To demonstrate the efficiency of ABDP, the approach was applied to a business project, whose results are presented in the results section. The data used in this study were collected from a project developing an Attendance Management System for the Customs Office of the Ministry of Finance of Taiwan (AMS-COMFT), where the information concerning defects and performed actions was recorded according to the proposed schema. The project started in 2000 and finished in 2001. The remainder of this study is organized as follows. Section 2 presents an overview of defect prevention and related work. Section 3 describes the architecture of the ABDP process, while Section 4 discusses the data set to be analyzed using ABDP. The analytical results are shown and discussed in Section 5. Finally, Section 6 draws conclusions.

2. Background

2.1. Software process improvement

The software process can be defined as a sequence of activities used to satisfy customers' requirements, and involves developing a new software product or maintaining software with available resources (Sommerville, 2001). To achieve the goal of the project, some activities are selected according to the software process model. These activities can be used to describe the software process. The selected activities contain many tasks, which can be further divided into several operations to be performed. The tasks can be represented using the WBS: the project is decomposed into work packages, each of which can be further divided into tasks, each of which is assigned to a particular person (Pressman, 2001). The tasks in the WBS can be performed in different ways, and produce different results, such as the effort used and the defects generated in the products, depending on who is assigned to perform them. Hence, the process needs to be managed to guarantee that it is conducted as expected (Florac and Carleton, 1999).

Selecting attributes that reflect the status of the current process, and applying methodologies to analyze the collected data, are the most important parts of measuring the process, but require significant effort. To reduce the effort of data collection, most software companies define a set of attributes (e.g., the number of defects, staff experience, earned value and effort) to collect data, where the attributes can be categorized into many issue areas (e.g., the schedule, quality and customer satisfaction) (Jones, 2003). The set of selected measures must not only reduce the data collection effort, but must also be flexible enough to support future analysis, to avoid the problem of the data required for analysis being unavailable. To address these problems, data collection tools are applied, and data such as daily work reports, change requests, modification records and defect records are collected from existing projects (Kilpi, 2001; Lawler and Kitchenham, 2003; Aversano et al., 2004).

The collected data can then be analyzed using analysis tools. Earned value management is a common methodology for evaluating the cost and schedule performance of the process (Fleming, 1998). The problem of using the earned value is that the index may not reflect the status of the project when the project changes rapidly (Boehm and Huang, 2003). Control charts are also common tools for determining whether the process is under control. The selected attributes are treated as random variables, and can be analyzed statistically (Weller, 2000). The control chart depicts a quantitative view of the project, where abnormal symptoms appear when problems occur. To analyze a problem in further detail, the project manager needs to discuss it with the stakeholders. Once the root causes of the problems are identified, corrective actions can be planned and implemented (Humphrey, 1989).

2.2. Causal analysis

Causal analysis is an approach used to identify the causes of defects. It is also an important step in the defect prevention process, which integrates several activities into the development process to prevent defects from occurring (Mays et al., 1990). The main procedures of causal analysis are item selection and analysis (CMMI Product Team, 2001). To select the defect items for analysis, a defect classification schema can be adopted to categorize the reported defects (Chillarege et al., 1992), which can be prioritized according to frequency of occurrence, defect severity, cost of impact and type of defect (Mohapatra and Mohanty, 2001).

The selected defects can then be analyzed in further detail in a causal analysis meeting, where brainstorming is a common approach. The efficiency of this approach depends on the experience of the analysts. The variance of the analytical results can be reduced by using a checklist, which needs to be tailored (NASA, 2000). The difficulty of using the elicitation approach is that a particular defect may have many possible causes, and the actual cause is not easy to identify. To reduce the effort of selecting and analyzing the defect items, automated support for software defect prediction is necessary for causal analysis. For instance, the reported defects can be categorized to analyze the causes of the defects (Podgurski et al., 2003), and the classification tree model can also be applied to data over multiple releases of software components to identify components with defects (Khoshgoftaar et al., 2000). However, these methods focus on the reported defects rather than measuring the actions in advance, while measurement of actions can provide practical predictions to prevent defects from occurring.

To define a schema of actions, the Multi-User Dimension (MUD) refines the process into tasks, transactions and actions that can be used to support the data collection stage of the software development process (Doppke et al., 1997).

2.3. The prediction model

Data mining techniques can be applied to build models describing the behaviors of the processes from the collected data, and to predict the possible results of the subsequent actions. Classification with a decision tree is one of the common approaches for analyzing the data (Han and Kamber, 2001). The C4.5 algorithm (Quinlan, 1993), which was extended from the ID3 algorithm (Quinlan, 1986), is a well-known algorithm for building the decision tree, and provides good accuracy and efficiency of prediction. Extensions of C4.5 include handling attributes with continuous ranges, estimating unknown values, pruning the decision tree and other useful extensions (Lim et al., 2000).

To predict the actions that are likely to cause defects, two major problems have to be solved before applying the classification tree model for defect prediction: the rarity problem and the irrelevant feature problem. The rarity problem occurs because the number of actions that cause defects (the minority class) is small compared to the number of actions that do not cause any defects (the majority class). The sampling technique is commonly used to solve the rarity problem. Under-sampling can be used to reduce the number of majority-class instances, while over-sampling is used to increase the number of minority-class instances (Weiss, 2004). Selected attributes for classification may be redundant or irrelevant, causing actions to be classified incorrectly. Feature subset selection can be applied to address the irrelevant feature problem, where only the relevant attributes are selected to construct the model (Dy and Brodley, 2000). The wrapper and the filter are two common approaches used for feature selection. The wrapper wraps the FSS and induction algorithm as a black box, where the feature subset space is searched to find a good subset of features, which is evaluated by the induction algorithm (Kohavi and John, 1996). The filter treats feature selection as a process independent of the induction algorithm, where the undesired features are eliminated before the induction algorithm runs. Correlation-based Feature Selection (CFS) is a popular filter algorithm, which evaluates and ranks the intercorrelation among the feature subset rather than individual correlations; both continuous and discrete attributes can be measured by CFS (Hall, 2000). To facilitate the feature selection process, a search strategy can be utilized to select a desired feature subset within a reasonable time, such as the sequential forward search, hill-climbing search and best-first search. Best-first search with forward search is a common method applied with CFS for feature selection, and achieves good results (Russell and Norvig, 1995).

3. The ABDP architecture

The execution of a software process can be treated as a sequence of actions executed in sequence or in parallel to achieve the objective of the project. The ABDP approach proposed herein treats the action as the basic element used to execute the tasks of the WBS. An action can be as small as an operation to correct a bug, or as large as coding a module. The execution of an action can be divided into three stages, namely planning, execution and reporting.
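The under-sampling treatment of the rarity problem described in Section 2.3 can be sketched as follows. This is a minimal illustration, not the study's implementation: the record layout, function names and the toy class ratio are assumptions, and the sketch assumes the defect-causing class is the minority.

```python
import random

def undersample(records, is_minority, seed=0):
    """Balance a binary-labelled data set by randomly discarding
    majority-class records until both classes have the same size."""
    minority = [r for r in records if is_minority(r)]
    majority = [r for r in records if not is_minority(r)]
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # discard the rest
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# Toy action log: 3 defect-causing actions out of 30, echoing the
# strong imbalance reported for the AMS-COMFT data.
actions = [{"id": i, "defects": 1 if i < 3 else 0} for i in range(30)]
balanced = undersample(actions, lambda r: r["defects"] > 0)
print(len(balanced))  # 6: all 3 minority records plus 3 sampled majority
```

Over-sampling would instead duplicate the three minority records; as noted in Section 2.3, that risks overfitting because no new rare-class data are generated.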
Table 3
The selected features used to describe the action

ID  Feature name             Possible values & description
1   Action_State             0: Scheduled, 1: Unscheduled
2   Action_Type              N: New, M: Modify, D: Delete, A: Add, –: None
3   Link_By                  R: R action, D: D action, –: root action
4   Action_Complexity        0: Low, 5: Median, 10: High
5   Object_Type              0: None, 1: Documentation, 2: Database, 3: Application, 4: System configuration
6   Effort_Expected          Integer value (the effort expected to be used)
7   Action_Originator        0: None, 1: Customer, 2: User, 3: Manager, 4: Programmer
8   Action_Target            0: None, 1: RD, 2: PD, 3: DD, 4: Coding, 5: Testing, 6: Maintenance, 7: Support
9   Num_of_action_objects    Integer value (the number of objects operated on by this action)
10  Task                     The task id that the action performs (i.e., 10, 14, 18, ...)
11  Task_Status              0: Within schedule, 1: After schedule, 2: After completion, 9: Unknown
12  Task_Effort_Estimated    Integer value (the estimated effort of the task)
13  Task_Actions             Integer value (the number of performed actions)
14  Task_Modification        Integer value (the number of performed actions with Action_Type = M)
15  Task_New                 Integer value (the number of performed actions with Action_Type = N)
16  Task_Reaction            Integer value (the number of performed actions with Link_By = R)
17  Task_D_action            Integer value (the number of performed actions with Link_By = D)
18  Task_severe_D            Integer value (the number of severe defects reported for the task)
19  Task_defect_effort       Integer value (the effort used to address the defects of the task)
20  Task_progress            Real value (the ratio of the used effort to the estimated effort of the task)
21  total_defect_num         L: the number of reported defects is less than 2, M: between 3 and 5, H: more than 5

RD: requirement development, PD: preliminary design, DD: detailed design.
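To make Table 3 concrete, the sketch below assembles a few of its features for a single hypothetical action. The Python field names and helper functions are illustrative assumptions; only the class thresholds of feature 21 and the Task_progress ratio (used effort divided by estimated effort, Section 4) come from the paper. Note that Table 3 as printed leaves a defect count of exactly 2 unassigned; this sketch folds it into L.

```python
def defect_class(num_defects):
    """Map a raw defect count to the L/M/H class of feature 21."""
    if num_defects <= 2:   # 'less than 2' in Table 3; 2 folded in here (assumption)
        return "L"
    if num_defects <= 5:   # 'between 3 and 5'
        return "M"
    return "H"             # 'more than 5'

def make_record(action, task, num_defects):
    """Assemble part of the 21-feature record for one action."""
    return {
        "Action_State": action["state"],            # 0: scheduled, 1: unscheduled
        "Action_Type": action["type"],              # N, M, D, A or '-'
        "Action_Complexity": action["complexity"],  # 0 low, 5 median, 10 high
        "Effort_Expected": action["effort"],
        "Task_Status": task["status"],              # 0 within schedule, 1 after, ...
        "Task_progress": task["effort_used"] / task["effort_estimated"],
        "total_defect_num": defect_class(num_defects),  # the class feature
    }

rec = make_record(
    {"state": 0, "type": "N", "complexity": 5, "effort": 8},
    {"status": 0, "effort_used": 12, "effort_estimated": 10},
    num_defects=4,
)
print(rec["Task_progress"], rec["total_defect_num"])  # 1.2 M
```

A Task_progress above 1, as here, signals that the task's effort has overrun, which Section 4 notes may affect the defects generated by the action.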
Fig. 6. The data processing of the ABDP (collected raw data pass through data transformation and validation, feature subset selection, filtering and sampling, producing Data Sets 1–4).

Third, the FSS technique is used to filter out unnecessary attributes from the data set. Data Set 2 can then be used to build the correlation matrix (using the whole data set as the training data set by default) and to find the best feature subset using the best-first search. The CFS is selected as the evaluator to evaluate the worth of the feature subset, and the best-first search strategy is used to reduce the search space of the feature subset selection. Fourth, the selected feature subset can then be used to filter the data, where the data of unselected features are removed.

Fifth, the data sampling step is performed to sample the major class using under-sampling (by default) and generates the final data set (Data Set 4) to be analyzed by the data analysis element. The proposed sampling step is applied to address the rarity problem, which may cause the decision tree to classify all submitted actions into the major class (predicted as low-defect actions). Over-sampling can be used to duplicate the rare classes, and thus address the imbalance problem. However, over-sampling may cause overfitting, since the duplication does not generate new rare-class data (Chawla et al., 2002). Rather than duplicating the rare-class data, the under-sampling applied in this study reduces the number of major-class data, and can be used effectively with C4.5 (Drummond and Holte, 2003).

3.2. The data analyzing

The data analysis element is used to analyze the data by using classification techniques, and builds the prediction model from the data set prepared by the preprocessing element [...] subsequent actions. The model is kept updated when the [...]. The C4.5 algorithm used to build the decision tree has [...] feature. The remaining nodes of the decision tree are constructed by the divide-and-conquer strategy, which selects the feature subset according to the evaluation of individual features. The CFS with the best-first search strategy can be used to improve the prediction accuracy of the C4.5 algorithm (Hall and Smith, 1999).

3.3. The prediction model construction

Instead of using the data collected from a previous project to build the prediction model, the ABDP approach builds the prediction model using the data collected from the current process to increase the prediction accuracy (since the actions used to build the model share many features with the submitted actions, such as the stakeholders, environments and work products).

Fig. 7 shows the data set following preprocessing as in Fig. 5, where the performed actions of the software process can be listed by action date (date performed). Actions submitted at the beginning of the project (actions 1–20) cannot be predicted, since no prediction model can be applied to them. After some actions (20 actions in this example) have been performed, the prediction model can be built using the performed actions (actions 1–20). The built model can then be used to predict the following submitted actions, in this case actions 21–30. The model is then updated after action 30 is performed, where the performed actions 1–30 are used as the training data set to build the new prediction model. The updated model can then be used to predict the following submitted actions (actions 31–40). The model continues to be updated after certain submitted actions are performed until the end of the project.

Fig. 7. The data set and the prediction of submitted actions (a table of action records with ID, action date, action state, task and defect num, e.g. action 1, 2000/5/10, task 20, 0 defects, feeding a decision-tree model with nodes such as Link_by = D, Action_State = 0, Task_severe_D <= 5 and Effort_Expected <= 6).

The submitted actions need to be preprocessed as described in Section 3.1 to generate the same format as the data set used to build the prediction model. The number of defects in a submitted action is the class feature that needs to be predicted, and is unknown prior to execution. However, the accuracy of a prediction cannot be evaluated until the end of the project, because some defects may not yet have been detected.

The interval used to update the prediction model can be based either on the number of submitted actions, or on time. In the first case, the prediction model is updated after a specific number of performed actions (ten in this study). In the second case, the model is updated after a specific time interval, such as one day or one week. For instance, the prediction model can be updated at midnight every day to ensure that new actions are not submitted while the model is being updated. However, the manager can evaluate the interval selection.

4. The experiment

The main purpose of ABDP is to capture the actions that cause high or middling numbers of defects, all of which need to be corrected. ABDP was applied to the data set obtained from the AMS-COMFT project according to the proposed schema to demonstrate its efficiency. Table 3 shows the defined features used for data collection. The first nine features can be retrieved directly from the actions, while the remaining features need to be determined from the tasks and defects of the action. The total_defect_num is the number of defects caused by the action, and is used to classify the action as low, median or high defect. The Task_Status indicates whether the task is within or over schedule, where the status is determined by comparing the action performed date against the scheduled date of the task. The status falls within the schedule when the action performed date is before the scheduled date of the task. The task_progress represents the progress of the task when the action is ready to be performed, and is calculated by dividing the effort used by all performed actions of the task by the expected effort of the task. When the value of task_progress is greater than 1, the effort of the task has overrun, possibly affecting the defects generated by the action. The following subsections explain the data set in detail, and show how the ABDP analysis can be applied to the software development process.

4.1. The data set

The AMS-COMFT project contains seven work packages and 22 tasks. The project contains 682 actions, sorted by the performed date. Only 26 actions caused middling defects, and 15 actions caused high defects. Most actions caused few or no defects, and a total of 413 defects were caused by these actions by the end of the project (not including the maintenance phase).

4.2. The iteration of software process

To demonstrate the efficiency of the ABDP approach applied to the software development process, all sorted actions are divided into many segments (10 actions for each segment), where the last action in each segment is the checking point. The performed actions before the check point are used to renew the prediction models. To evaluate the accuracy of the built model, the ten subsequent actions following the check point were selected as the testing data, to be included in the training data in the next iteration to renew the models. For example, the first iteration used actions 1–20 as the training data, and actions 21–30 to test the accuracy of the models. The second iteration used actions 1–30 as the training data, and actions 31–40 as the testing data. Hence, 66 iterations were generated, and applied to evaluate the efficiency of ABDP.

4.3. Accuracy evaluation

The accuracy, precision, recall and specificity are common ways to assess the prediction model. These evaluators are listed as Eqs. (1)–(4), where T, F, P and N represent true, false, positive and negative respectively. The accuracy is the percentage of correct predictions (including high-defect and low-defect predictions) among all predictions. The precision denotes the percentage of correct predictions of high-defect (or median-defect) actions (the positive part). The recall (sensitivity) denotes the percentage of high-defect actions that have been discovered. The specificity denotes the percentage of low-defect actions that have been classified correctly.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Precision = TP / (TP + FP)    (2)
Recall = TP / (TP + FN)    (3)
Specificity = TN / (FP + TN)    (4)

Since ABDP is utilized to predict the actions that cause high defects (including the median defects in this study), a high-defect prediction is treated as positive, and the recall can be treated as the percentage of high-defect actions that are predicted correctly. The false alarm rate is also defined as the percentage of low-defect actions misclassified as high-defect actions.

5. Results and discussion

To compare the accuracy of the sampling techniques, the data set was analyzed using under-sampling and over-sampling with FSS, represented as four categories in Table 4. The first two categories, Under without FSS and Over without FSS, show the results of under- and over-sampling without FSS. The results of applying FSS to under-sampling and over-sampling are shown as the categories Under with FSS and Over with FSS. For each category, the MH (high and median defects) class and L class (low
Table 4
The summary of applying the sampling and FSS to the testing data in each iteration
Iteration  Under without FSS     Over without FSS      Under with FSS        Over with FSS
           MH class   L class    MH class   L class    MH class   L class    MH class   L class
           MH  L      MH  L      MH  L      MH  L      MH  L      MH  L      MH  L      MH  L
1 0 0 10 0 0 0 0 10 0 0 0 10 0 0 0 10
2 1 0 1 8 1 0 0 9 1 0 0 9 1 0 0 9
3 1 0 2 7 1 0 0 9 0 1 2 7 1 0 0 9
4 0 2 1 7 0 2 1 7 0 2 1 7 0 2 1 7
5 0 0 3 7 0 0 2 8 0 0 7 3 0 0 1 9
6 0 0 1 9 0 0 1 9 0 0 1 9 0 0 0 10
7 0 0 2 8 0 0 4 6 0 0 2 8 0 0 0 10
8 0 0 4 6 0 0 3 7 0 0 4 6 0 0 3 7
9 1 1 2 6 0 2 0 8 1 1 2 6 1 1 1 7
10 0 0 2 8 0 0 1 9 0 0 2 8 0 0 0 10
11 2 0 1 7 2 0 0 8 2 0 1 7 2 0 0 8
12 0 1 0 9 1 0 0 9 1 0 0 9 0 1 0 9
13 0 0 2 8 0 0 2 8 0 0 2 8 0 0 0 10
14 3 0 3 4 0 3 0 7 3 0 3 4 0 3 0 7
15 0 0 2 8 0 0 2 8 0 0 1 9 0 0 2 8
16 1 0 0 9 1 0 0 9 1 0 0 9 0 1 0 9
17 0 1 0 9 0 1 1 8 0 1 1 8 0 1 1 8
18 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
19 0 0 0 10 0 0 0 10 0 0 1 9 0 0 1 9
20 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
21 0 0 1 9 0 0 0 10 0 0 3 7 0 0 2 8
22 0 0 6 4 0 0 0 10 0 0 1 9 0 0 0 10
23 0 0 4 6 0 0 0 10 0 0 4 6 0 0 0 10
24 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
25 0 0 2 8 0 0 1 9 0 0 2 8 0 0 1 9
26 0 2 0 8 1 1 0 8 2 0 0 8 0 2 0 8
27 0 0 2 8 0 0 1 9 0 0 1 9 0 0 0 10
28 2 0 0 8 2 0 0 8 2 0 0 8 0 2 0 8
29 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
30 0 0 2 8 0 0 2 8 0 0 2 8 0 0 2 8
31 1 0 1 8 0 1 0 9 1 0 1 8 1 0 1 8
32 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
33 1 0 0 9 0 1 0 9 1 0 0 9 1 0 0 9
34 1 0 2 7 1 0 0 9 1 0 2 7 1 0 2 7
35 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
36 0 0 1 9 0 0 0 10 0 0 1 9 0 0 1 9
37 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
38 0 0 4 6 0 0 2 8 0 0 4 6 0 0 0 10
39 0 1 1 8 0 1 0 9 0 1 1 8 0 1 0 9
40 0 0 1 9 0 0 0 10 0 0 1 9 0 0 1 9
41 1 0 2 7 0 1 0 9 1 0 2 7 0 1 0 9
42 2 0 2 6 1 1 0 8 2 0 2 6 1 1 0 8
43 1 0 1 8 1 0 1 8 1 0 1 8 1 0 0 9
44 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
45 2 0 0 8 1 1 0 8 2 0 0 8 1 1 0 8
46 1 0 0 9 0 1 0 9 1 0 0 9 1 0 0 9
47 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
48 0 0 1 9 0 0 1 9 0 0 1 9 0 0 1 9
49 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
50 0 0 3 7 0 0 1 9 0 0 3 7 0 0 2 8
51 0 0 2 8 0 0 0 10 0 0 2 8 0 0 0 10
52 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
53 1 0 0 9 0 1 0 9 1 0 0 9 0 1 0 9
54 2 2 0 6 0 4 0 6 2 2 0 6 0 4 0 6
55 1 0 1 8 1 0 3 6 1 0 1 8 1 0 4 5
56 0 0 10 0 0 0 2 8 0 0 10 0 0 0 3 7
57 0 0 6 4 0 0 0 10 0 0 1 9 0 0 2 8
58 0 0 2 8 0 0 2 8 0 0 2 8 0 0 1 9
59 0 1 1 8 0 1 1 8 0 1 0 9 0 1 0 9
60 0 0 1 9 0 0 6 4 0 0 0 10 0 0 0 10
(continued on next page)
568 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570
Table 4 (continued)
Under without FSS Over without FSS Under with FSS Over with FSS
MH class L class MH class L class MH class L class MH class L class
MH L MH L MH L MH L MH L MH L MH L MH L
61 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
62 0 0 1 9 0 0 0 10 0 0 0 10 0 0 0 10
63 0 0 4 6 0 0 0 10 0 0 1 9 0 0 0 10
64 0 0 3 7 0 0 0 10 0 0 3 7 0 0 0 10
65 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
66 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
defects) represent the actual number of testing cases which are classified as either MH or L. The numbers of testing data classified as MH or L are shown in the MH and L columns respectively. The total number of MH cases was 39 rather than 41, since two of the MH cases were part of the training data at the first iteration (transactions 2 and 3), and the first testing data set started from transaction 21.

Each row represents the results of prediction at the check point. For example, the results of iteration 9 using Under without FSS show that one MH case was categorized into the L class and two L cases were predicted as MH class. These results are shown in Fig. 8. The results of using the different approaches are shown in the following subsections.

Fig. 8. The results of predictions (MH class: 1 classified MH, 1 classified L; L class: 2 classified MH, 6 classified L; Accuracy = 70.00, Precision = 33.33, Recall = 50.00, Specificity = 75.00).

5.1. The sampling without FSS

Fig. 9 shows the results of applying over-sampling to the testing data by selecting all features (without FSS). The accuracy was 90%, but the recall was only 28%, indicating that many MH cases were missed (misclassified as L). Undetected high-defect actions may increase the effort involved in the process.

Fig. 9. The results of over-sampling without FSS.

Fig. 9 demonstrates that most cases (both MH and L) were classified as L, which not only raised the specificity, but also reduced the recall. However, capturing as many high-defect actions as possible is very important for defect prevention.

Under-sampling the majority classes, rather than increasing the number of rare classes, can improve the results. Fig. 10 shows the results of under-sampling without FSS, where the recall rose to 72% while the specificity did not fall significantly. That is, up to 72% of high-defect actions were captured, which is an acceptable result. However, the precision was 21%, meaning that only one prediction in every five was correct. The precision must be further improved to reduce the false alarm rate.

Fig. 10. The results of under-sampling without FSS (MH class: 28 classified MH, 11 classified L; L class: 110 classified MH, 511 classified L; Accuracy = 81.67, Precision = 20.29, Recall = 71.79, Specificity = 82.29).

5.2. Applying the FSS with sampling

To avoid misclassification of actions, the FSS was applied to the data set for feature selection. The selected subset of features may not be the same when using different training data. Table 5 lists the selected feature subsets using the training data set in all iterations.

Table 5
The selected feature subset by iteration

Iteration   Selected features         Iteration   Selected features
1           6, 15                     27–39       2, 3, 13, 20
2           6, 15, 20                 40–41       1, 3, 6, 13, 20
3           6, 15                     42          1, 3, 6, 7, 13, 20
4           6                         43–47       1, 3, 7, 13, 20
5           3, 6, 17, 20              48–49       1, 2, 3, 7, 13, 20
6–8         3, 6, 20                  50–54       1, 2, 3, 13, 20
9           1, 3, 6, 17, 20           55–60       1, 2, 3, 6, 13, 20
10–11       2, 3, 6, 17, 20           61–62       1, 2, 3, 6, 7, 13, 17, 20
12          2, 3, 6, 7, 17, 20        63–64       1, 2, 3, 6, 13, 20
13          2, 3, 6, 17, 20           65–66       1, 2, 3, 6, 7, 13, 17, 20
14–26       2, 3, 17, 20

Fig. 11. The results of over-sampling with FSS.

By using the selected feature subset, the desirable attributes can be filtered out to build the prediction model. Fig. 11 shows the results of applying over-sampling to
Fig. 11 shows the results of applying over-sampling to the filtered data set. Although the specificity rose to 94% and the precision also rose to 27%, the recall fell to 33%. Thus, the prediction rate of over-sampling was not better than that of under-sampling without FSS.

Fig. 11. The results of over-sampling with FSS.

Fig. 12 shows the results of applying under-sampling with FSS, where both recall and specificity increased to almost 80%. The precision was 25%, meaning that one true high-defect action was discovered for every four alarmed high-defect actions.

Fig. 12. The results of under-sampling with FSS.

Classified as    MH     L
MH class         30     9
L class          89    532
Accuracy = 85.15, Precision = 25.21, Recall = 76.92, Specificity = 85.67

The analytical results in Fig. 12 indicate that under-sampling with simple feature subset selection can significantly improve the efficiency of prediction. The results also reveal that high-defect actions can be found without causing too many false alarms.
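The rates in Fig. 12 follow directly from its confusion matrix, with MH treated as the positive class. A minimal sketch (the function name is ours, not the paper's) reproduces them:

```python
def rates(tp, fn, fp, tn):
    """Confusion-matrix rates with MH (high-defect) as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy":    100 * (tp + tn) / total,
        "precision":   100 * tp / (tp + fp),    # alarms that were real MH actions
        "recall":      100 * tp / (tp + fn),    # MH actions actually caught
        "specificity": 100 * tn / (tn + fp),    # L actions correctly left alone
    }

# Counts from Fig. 12: MH classified as (30 MH, 9 L); L classified as (89 MH, 532 L).
for name, value in rates(tp=30, fn=9, fp=89, tn=532).items():
    print(f"{name}: {value:.2f}")
# accuracy: 85.15, precision: 25.21, recall: 76.92, specificity: 85.67
```

The low precision alongside high recall is exactly the trade-off the text describes: one real high-defect action per four alarms, but few high-defect actions slip through.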
6. Conclusion

This study presents an action-based defect prevention approach that can be applied to the software development process to detect actions that may cause many defects. The ABDP approach presented in this study classifies data collected from the reports of operations and defects of the project. The FSS and sampling techniques can be applied to the data set to address the rarity problem. By detecting the suspect actions, necessary corrective actions can be taken to prevent the defects from occurring.

The main advantage of ABDP is its in-process prediction: the training data used to build the prediction model can be obtained from the project in execution. The in-process analysis can also reduce the variance between different projects. Second, the features utilized in ABDP to build the prediction model can be adapted from the existing process, so the effort involved in modifying the existing process for ABDP is reduced. Third, the latest models can accurately predict the submitted actions to obtain a quick response. The ABDP process can also be merged into the existing process with little additional effort. To facilitate the ABDP process, this study defines a set of features and applies them to the AMS-COMFT project to evaluate the performance of ABDP. The results can be summarized as follows.

(1) Actions that cause many defects are rare in the repositories of the software process; they cannot be classified directly and need to be preprocessed by sampling. The rarity of the high-defect class also raises the difficulty of detecting the actions that cause high defects.

(2) A comparison of under- and over-sampling reveals that under-sampling produces acceptable results for predicting the high-defect classes. Over-sampling may lower the error of predicting low-defect classes, which is not the main objective of defect prevention.

(3) The recall and specificity can be improved by applying the FSS technique. Applying FSS with under-sampling can achieve desirable results.

FSS with under-sampling can thus be applied to construct the proposed prediction model for predicting high-defect actions. Additionally, we conclude that patterns exist among actions causing many defects, and that these patterns can be modeled using data mining techniques. Future work to identify these patterns will include applying sequence pattern analysis techniques to increase the prediction performance.

Acknowledgements

This work is partially supported by the National Science Council of Taiwan, ROC, under Grant NSC-92-2213-E-309-005, and partially sponsored by the Ministry of Economic Affairs of Taiwan, under Grant 93-EC-17-A-02-S1-029.