0% found this document useful (0 votes)

23 views12 pages

Research Article 1

The document discusses the Action-Based Defect Prevention (ABDP) approach aimed at reducing software defects by predicting actions that may cause them before execution. It highlights the challenges of identifying defect-causing actions due to varying project attributes and proposes using classification and Feature Subset Selection (FSS) technologies to improve prediction accuracy. The study demonstrates the effectiveness of ABDP through its application in a business project, showing that it can facilitate defect avoidance and enhance software process improvement.

Uploaded by

Niaz Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

23 views12 pages

Research Article 1

Uploaded by

Niaz Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

The Journal of Systems and Software 80 (2007) 559–570

[Link]/locate/jss

Defect prevention in software processes: An action-based approach

Ching-Pao Chang *, Chih-Ping Chu
Department of Computer Science and Information Engineering, National Cheng-Kung University, Taiwan, No. 1, Ta-Hsueh Road, Tainan 701, Taiwan

Received 3 April 2006; received in revised form 5 September 2006; accepted 9 September 2006
Available online 18 October 2006

Abstract

In addition to degrading the quality of software products, software defects also require additional efforts in rewriting software and
jeopardize the success of software projects. Software defects should be prevented to reduce the variance of projects and increase the sta-
bility of the software process. Factors causing defects vary according to the different attributes of a project, including the experience of
the developers, the product complexity, the development tools and the schedule. The most significant challenge for a project manager is
to identify actions that may incur defects before the action is performed. Actions performed in different projects may yield different
results, which are hard to predict in advance. To alleviate this problem, this study proposes an Action-Based Defect Prevention (ABDP)
approach, which applies the classification and Feature Subset Selection (FSS) technologies to project data during execution.
Accurately predicting actions that cause many defects by mining records of performed actions is a challenging task due to the rarity of
such actions. To address this problem, the under-sampling is applied to the data set to increase the precision of predictions for subse-
quence actions. To demonstrate the efficiency of this approach, it is applied to a business project, revealing that under-sampling with FSS
successfully predicts the problematic actions during project execution. The main advantage utilizing ABDP is that the actions likely to
produce defects can be predicted prior to their execution. The detected actions not only provide the information to avoid possible defects,
but also facilitate the software process improvement.
2006 Elsevier Inc. All rights reserved.

Keywords: Software process improvement; Software repositories; Classiﬁcation; Mining rarity

1. Introduction and prevent defects or other process problems from occur-

ring in the future (Chrissis et al., 2003).
Software defects not only influence the quality of the Defect prevention was first proposed by IBM Corpora-
software products, but also can increase the effort involved tion to prevent future defects from occurring in its products
in a project. Furthermore, the undetected defects may need (Jones, 1985). The main steps of defect prevention are the
significant effort to detect and remove in subsequent soft- kickoff meeting, causal analysis and action item meeting
ware development stages (Pressman, 2001). Additionally, (Gale et al., 1990). The causal analysis is an important step
the high defect rate is also an important factor in the cost of the defect prevention process, and can be applied to
and schedule of the project overrun (Jones, 1994). Rather drive down the defect rates of software process where the
than detecting existing defects, the defect prevention can analysis meeting and interviewing with stakeholder are
be applied to prevent the defects from occurring, as in commonly used in this step (Card, 1993). The most signif-
the Causal Analysis and Resolution (CAR) in maturity icant challenge for causal analysis is to identify the causes
level 5 (ML 5) of the Capability Maturity Model Integra- of defects among large amounts of defect records where
tion (CMMI1). The objective of CAR is to detect, analyze the cause-effect diagram and control chart can be utilized
to support the analysis process (Card, 1998). To decrease
*
Corresponding author. Tel.: +886 7 6937633; fax: +886 7 6930435.
the effort involved in causal analysis, the defect distribution
E-mail address: chingpao@[Link] (C.-P. Chang). can be applied to shows the metrics of defects and classifies
1
CMMI is registered in the US Patent and Trademark Office. the defects in terms of their causes (Pooley et al., 2002). The

0164-1212/$ - see front matter 2006 Elsevier Inc. All rights reserved.
doi:10.1016/[Link].2006.09.009
560 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

disadvantage of applying the defect distribution analysis is collected from a project developing an Attendance Man-
that the reported defects may fall into different categories. agement System for the Customs Office of the Ministry
The defect tendency is difficult to investigate when the root of Finance of Taiwan (AMS-COMFT), where the informa-
cause analysis schema is complicated and the sample size of tion concerning defects and performed actions was
defect is small (Leszak et al., 2002). To solve this problem, recorded according to the proposed schema. The project
the historic data on multiple releases of products can be was started in 2000, and finished in 2001. The remainder
utilized to discover the defect patterns, and used to predict of this study is organized as follows. Section 2 presents
the possible defects. To reduce the effort involved in data an overview of the defect prevention and related works.
gathering, the historic data are typically obtained from Section 3 describes the architecture of the ABDP process,
the Configuration Management System (Khoshgoftaar while Section 4 discusses the data set to be analyzed using
et al., 2000). The difficulty with utilizing multiple release ABDP. The analytical results are shown and discussed in
data to discover the defect patterns is that the attributes Section 5. Finally, Section 6 draws conclusions.
of the actions performed on different releases of products
may be different owing to changes in resources in the pro- 2. Background
ject, and cannot be applied to in-process prediction.
This study proposes an action-based defect prevention 2.1. Software process improvement
(ABDP) approach, which applies classifies the records of
the performed actions to predict whether the subsequent The software process can be defined as a sequence of
actions cause defects in the same project. An action is activities used to satisfy customer’s requirements, and
defined herein as an operation performed based on the task involves developing a new software product or maintaining
in Work Breakdown Structure (WBS) of the project. software with available resources (Sommerville, 2001). To
Rather than focusing on the reported defects, ABDP mines achieve the goal of the project, some activities are selected
the patterns of actions that may cause defects, and uses the according to the software process model. These activities
analytical results to predict whether the subsequent actions can be used to describe the software process. The selected
are likely to generate defects. Once actions with high prob- activities contain many tasks, which can be further divided
ability of causing defects are identified, stakeholders can into several operations to be performed. The tasks can be
review these actions carefully and take appropriate correc- represented using the WBS. The project can be decom-
tive actions. The newly performed actions are continually posed into work packages, each of which can be further
appended to the historic data set to construct a new predic- divided into tasks, each of which is assigned to a particular
tion model for subsequent actions. To address the imbal- person (Pressman, 2001). The tasks in WBS can be per-
anced data set problem where the number of actions formed in different ways, and produce different results,
causing defects is fairly small, this study applies under-sam- such as the efforts used and the defects generated of the
pling techniques to the data set, and compares the results products, according to who is assigned to do them. Hence,
with those of over sampling. The comparison results indi- the process needs to be managed to guarantee that it is con-
cate that under-sampling achieves more precise predictions ducted as expected (Florac and Carleton, 1999).
than over-sampling. ABDP also adopts the Feature Subset Selecting attributes to reflect the status of the current
Selection (FSS) technique to filter out the important attri- process, and applying methodologies to analyze the col-
butes and thus improve the prediction accuracy. The lected data, are the most important parts of measuring
advantages of applying ABDP to measure the process are the process, but require significant effort. To reduce the
as follows: effort of data collection, most software companies define
a set of attributes (i.e., the number of defects, staff experi-
• In-process prediction: The data used to construct the pre- ence, earned value and effort) to collect data where the
diction model are obtained from the same project that attributes can be categorized into many issue areas (i.e.,
can decrease the variance between different projects. the schedule, quality and customer satisfaction) (Jones,
• Requires less effort to collect data: Actions and defect 2003). The set of selected measures must not only reduce
reporting are common procedures for most software the data collection effort, but must also be flexible to sup-
teams, and the required data can be collected from these port the future analysis to avoid the problem of data
reports. required for analysis being unavailable. To address these
• Reduces the effort in identifying the problem in the pro- problems, data collection tools are applied, and data, such
cess: The detected actions that are likely to cause defects as the daily work report, change request, modification
can be further analyzed and reviewed in the causal anal- records and defect records, are collected from existing pro-
ysis meeting, thus reducing the effort involved in identi- jects (Kilpi, 2001; Lawler and Kitchenham, 2003; Aversano
fying problematic actions. et al., 2004).
The collected data then can be analyzed using analysis
To demonstrate the efficiency of ABDP, this approach tools. The earned value management is a common method-
was applied to a business project, where the results are pre- ology for evaluating the cost and schedule performance of
sented in the result section. The data used in this study were the process (Fleming, 1998). The problem of using the
C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570 561

earned value is that the index may not reflect the status of data, and predict the possible results of the subsequent
the project when the project changes rapidly (Boehm and actions. The classification with decision tree is one of the
Huang, 2003). The control charts are also common tools common approaches for analyzing the data (Han and
for determining whether the process is under control. The Kamber, 2001). The C4.5 algorithm (Quinlan, 1993), which
selected attributes are treated as random variables, and was extended from the ID3 algorithm (Quinlan, 1986), is a
can be analyzed statistically (Weller, 2000). The control well-known algorithm for building the decision tree, and
chart depicts the quantitative view of the project, where provides well accuracy and efficiency of prediction. Exten-
the abnormal symptom shown when the problems occur. sions of C4.5 include handling attributes with continuous
To analyze the problem in further detail, the project man- ranges, estimating unknown values, pruning the decision
ager needs to discuss with the stakeholders. Once the root tree and other useful extensions (Lim et al., 2000).
causes of the problems are identified, corrective actions can To predict the action that is likely to cause defects, two
be planned and implemented (Humphrey, 1989). major problems have to be solved before applying the clas-
sification tree model for defect prediction, the rarity prob-
2.2. Causal analysis lem and irrelevant feature problem. The rarity problem
occurs because the number of actions that cause defects
Causal analysis is an approach used to identify the (the minority class) is small compared to the number of
causes of defects. It is also an important step in the defect actions that do not cause any defects (the majority class).
prevention process, which integrates several activities into The sampling technique is commonly used to solve the rar-
the development process to prevent defects from occurring ity problem. Under-sampling can be used to reduce the
(Mays et al., 1990). The main procedures of the causal number of the majority class, while over-sampling is used
analysis are item selection and analysis (CMMI Product to increase the number in the minority class (Weiss,
Team, 2001). To select the defect items for analyzing, the 2004). Selected attributes for classification may be redun-
defect classification schema can be adopted to categorize dant or irrelevant, causing actions to be classified incor-
the reported defects (Chillarege et al., 1992), which can rectly. The feature subset selection can be applied to
be prioritized according to frequency of occurrence, defect address the rarity problem, where only the relevant attri-
severity, cost of impact and type of defect (Mohapatra and butes are selected to construct the model (Dy and Brodley,
Mohanty, 2001). 2000). The wrapper and filter are two common approaches
The selected defects then can be further analyzed in used for feature selection. The wrapper wraps the FSS and
detail in a causal analysis meeting, where brainstorming induction algorithm as a black box, where the feature sub-
is a common approach in causal analysis. The efficiency set is searched to find a good subset of features, and is eval-
of this approach depends on the experience of the analysts. uated by the induction algorithm (Kohavi and John, 1996).
The variance of the analytical results of this approach can The filter treats the feature selection as an independent pro-
be reduced by using a checklist, which needs to be tailored cess from the induction algorithm, where the undesired fea-
(NASA, 2000). The difficulty of using the elicitation tures are eliminated before the induction algorithm. The
approach is that a particular defect may have many possi- Correlation-based Feature Selection (CFS) is a popular fil-
ble causes, and the actual cause is not easy to identify. To ter algorithm, which evaluates and ranks the intercorrela-
reduce the effort in selecting and analyzing the defect items, tion among the feature subset rather than individual
automated support for software defect prediction is neces- correlations, where both the continuous and discrete attri-
sary for causal analysis. For instance, the reported defects butes can be measured by the CFS (Hall, 2000). To facili-
can be categorized to analyze the cause of the defects (Pod- tate the feature selection process, the search strategy can be
gurski et al., 2003), and the classification tree model can utilized to select a desired feature subset within a reason-
also be applied to the data over multiple releases of soft- able time, such as the sequential forward search, hill climb-
ware components to identify components with defects ing search and best-first search. The best-first search with
(Khoshgoftaar et al., 2000). However, these methods focus forward search is a common method applied on CFS for
on the reported defects rather than measuring the actions feature selection, and achieves good results (Russell and
in advance, while measurement of actions can provide Norvig, 1995).
practical predictions to prevent defects from occurring.
To define an action schema of actions, the Multi-User 3. The ABDP architecture
Dimension (MUD) refines the process into tasks, transac-
tions and actions that can be used to support the data col- The execution of a software process can be treated as a
lection stage of the software development process (Doppke sequence of actions executed in sequence or parallel to
et al., 1997). achieve the objective of the project. The ABDP approach
proposed herein treats the action as the basic element used
2.3. The prediction model to execute the task of the WBS. The action can be as small
as an operation to correct a bug, or as large as coding a
Data mining techniques can be applied to build models module. The execution of an action can be divided into
describing the behaviors of the processes from the collected three stages, namely planning, execution and reporting.
562 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

The planning stage is performed to plan the execution of Table 2

the action, such as the description of the action, the The main features of defects
required resources of the action and the work products to Attributes Description
be performed. The stakeholders can then perform the Def_id The defect id
planned action. The results of the performed action, such Def_desc Description of the defect
as the actual efforts used to execute the action, and the Act_generated The action which results the defect
Act_detected The action which detects the defect
defects detected by the action, can be reported after execu- Act_removed The action used to remove the defect
tion. A set of features must be defined to collect the data Severity Severity of the defect
from the actions. Object_id The module which causes the defect
Table 1 lists the main features of the action. The
expected efforts and complexity of the action are evaluated
by the actor in advance. The originator denotes the stake-
holder who invokes the action. For instance, if the cus- Task 1 Task 2 Task 3

tomer sends a change request, which is approved by the a1 a2 a3 a4 a5 a6

D D R
project manager, then the originator is the customer. The R
a 24 a26
R a 21 a 22
originator may not be the same person as the actor who a 23 D a25
performs the action. Although the actions vary in size, to
reduce the complexity of individual factors, this study stip- Fig. 1. The relation between performed actions.
ulates that one action can only be performed by one person
in one task.
In practice, the execution of an action is not a single meaning that they are expected and can be scheduled to
event, and may cause other modules to be changed. For be performed according to the task, whereas unscheduled
example, changes to the DB API (Database Application actions are actions that can not be expected, such as a21
Interface) will lead to changes to all modules that use the (caused by defects) and a23 (caused by other action). Action
API, where these models may be developed by different a22 is performed to address the defect caused by a21, and
people. To represent the relation of actions, the reaction invokes two other actions a24 and a26 in two different tasks.
is used to indicate that the action is invoked by another Action a25 may be a reaction that is not performed imme-
action and denoted by R action which may produce a diately after a24 on time and result in a defect which needs
defect, and may be reported by a customer if it is not per- to be addressed by a25. The defects of an action may not be
formed on time (i.e., certain modules do not change when detected until the product is released and reported by the
the used API has been changed). The actions used to customers, and a reported defect may not be easy to trace
remove defects are defined as D action. An action that is to the action that caused it. The reported defects are then
neither R nor D action is defined as a root action. Table analyzed in a causal analysis meeting.
2 lists the main features of defects where the expected The predefined features can be divided into two groups
and actual used efforts to fix the defect can be retrieved according to the time that the data are available, the fea-
from the expected and used efforts of Act_removed (the tures that can be collected in planning stage denoted as
action used to remove the defect). Antecedent features, and the features that can only be
Fig. 1 shows the relationship between actions R and D. gathered after the execution stage denoted as Subsequent
Actions a1, a2, a3, a4, a5 and a6 are scheduled actions, features. For instance, the features shown in Table 1 are
Antecedent features, except the last feature – effort_used
(the actual efforts used by the action) whose value is
Table 1 unknown until the action is completed. The Subsequent
The main features of actions
features may include the number of defects generated by
Attributes Description this action, the total efforts used to correct these defects,
Act_id The action id and the number of severe defects generated by this action.
Task_id The task which is performed by the action Although the effort_used is known once the action is com-
Act_desc Description of the action
pleted, the number of defects generated by this action can
Act_date The date to perform the action
Act_state Indicate if the action is scheduled not be known until the end of the project. The main aim
Act_type The type of action, such as N (create new module), of the ABDP approach is to predict that the number of
M (modify, existing code), etc., defects generated by the action is greater than a specific
Complexity The complexity of the action (0:low, 10:high) threshold (such as three defects in this study) before the
Object_num The number of modules which are going to be changed
execution stage using the Antecedent features of the action.
or modified by this action
Originator The stakeholder who invokes the action, such as the Fig. 2 shows the procedure for executing an action.
programmer, manager or customer First, the action is planned to determine the values of Ante-
Reaction The action is invoked by another action cedent features, such as the action description, action type,
Effort_exp Estimate the efforts need to perform the action originator and state. The Antecedent features are then sub-
Effort_used The actual efforts used to perform the action
mitted to the prediction engine, and are predicted using an
C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570 563

Submit the Predict the Action_Type = N

Plan the Execute
Antecedent Subsequent Object_Type = 0: Low
Action
Features Features Predicted the Action
Object_Type = 1: High
as Low
Object_Type = 2: Medium
Object_Type = 3
Predicted as High Prediction Execution Action_Complexity = 0
Model Results
Effort_Expected >= 6: Medium
Effort_Expected < 6: Low
Fig. 2. The execution of an action. Action_Complexity = 5
Task_Status = 0: High
existing prediction model constructed from previously per- Task_Status = 1, 2 : Medium
Action_Complexity = 10: High
formed actions. The engine responds with predictions of Object_Type = 4: Medium
the submitted action. The submitted action may need to Action_Type = M, D, - : Low
be re-planned if a High-defect action is predicted, where Action_Type = A: Medium
the submitted action is defined as a High-defect action if
Fig. 3. An example of prediction model.
the number of defects generated by the action is greater
than 3, and as a Low-defect action otherwise.
By using the ARDP approach, the data set of performed • Action_Complexity = 0 (evaluated as Low complexity)
actions can be generated to build the prediction model. The • Object_Type = 3 (work on application)
number of generated defects of an action can be used to • Num_of_action_objects = 1 (one module will be worked
classify actions as low-defect (less than 3), medium-defects on)
(between 3 and 5) and high-defect (more than 5). The pre- • Originator = 4 (performed by programmer)
diction model can then be applied to predict the submitted • Link_By = -(this is a root action)
action will cause high defects or not. Fig. 3 shows a predic- • Effort_Expected = 6 (the efforts expected to execute the
tion model (decision tree) built from the performed actions, action)
where the names of the features are listed as in Table 3. To
perform a prediction, the Action_Type of a submitted The action submitted with above features is predicted as
action is compared with the model, and is predicted as High-defect according the values of Action_Type, Object_
Low-defect when the value of Action_Type is ‘M’, ‘D’ or Type, Action_Complexity and Effort_Expected shown in
‘’. Fig. 3. (The Medium and High are considered as the
For instance, an action to create a new module can be High-defect actions in this study.) To avoid High-defect
planned as follows (only some of the features are shown): actions, certain modifications on the submitted action can
be performed, such as decomposing the action into two
• Action_State = 0 (scheduled) or more actions to reduce the value of Effort_Expected
• Action_Type = N (create a new module) (such as below 6).

Table 3
The selected features used to describe the action
ID Feature name Possible values & Description
1 Action_State 0: Scheduled,1: Unscheduled
2 Action_Type N: New, M: Modify, D: Delete, A: Add, –: None
3 Link_By R: R action, D: D action, –: root action
4 Action_Complexity 0: Low, 5: Median, 10: High
5 Object_Type 0: none, 1. Documentation, 2: Database, 3: Application, 4: System configuration
6 Effort_Expected Integer value (the efforts expected to be used)
7 Action_Originator 0: None, 1: Customer, 2: User, 3: Manager, 4: Programmer
8 Action_Target 0: None, 1: RD, 2: PD, 3: DD, 4: Coding, 5: Testing, 6: Maintenance, 7: Support
9 Num_of_action_objects Integer value (the number of objects operated by this action)
10 Task the task id that the action to perform (i.e., 10, 14, 18, . . .)
11 Task_Status 0: Within schedule, 1: After schedule, 2: After completion, 9: Unknown
12 Task_Effort_Estimated Integer value (the estimated efforts of the task)
13 Task_Actions Integer value (the number of performed action)
14 Task_Modificaiton Integer value (the number of performed action with Action_Type = M)
15 Task_New Integer value (the number of performed action with Action_Type = N)
16 Task_Reaction Integer value (the number of performed action with Link_By = R)
17 Task_D_action Integer value (the number of performed action with Link_By = D)
18 Task_severe_D Integer value (the number of severe defects reported of the task)
19 Task_defect_effort Integer value (the efforts used to address the defects of the task)
20 Task_progress Real value (the ratio of the used effort to the estimated effort of the task)
21 total_defect_num L: the number of reported defect is less than 2, M: between 3 and 5, H: more than 5
RD: requirement development, PD: preliminary design, DD: detailed design.
564 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

The following subsection described in detail the data

Project Planning Feature
Stage Definition
WBS & Action collection, data preprocessing and data analysis compo-
Definitions
nents.
Design, Coding ,
Testing Stage Action Resubmit 3.1. The data collection and preprocessing
Submission the action

If model To build the prediction model, the data of performed

exists
Action Correct or execute actions and generated defects need to be collected accord-
Prediction the submitted action ing to the defined features. The data collection element is
If no model used to record the information of the actions that are ready
exists Report defects &
Results of actions to be preformed, the results of the performed actions (i.e.,
Data the used efforts) and the reported defects, which can also be
Collection
Software Process input in later stages. Besides the features listed in Tables 1
Repositories and 2, other data related to an action can be considered for
collection, such as the information concerning the actor
Data Data set for and environments. The data collection component can be
Preprocessing Data Analysis implemented with a user interface to reduce the data collec-
tion effort.
The new model can Since the collected data may reside at different locations
be used to predict Data Analysis
Prediction Model
subsequent actions (or databases), the collected data need to be transformed
into a format that can be recognized by data mining
Fig. 4. The architecture of the ABDP process. engine. Fig. 5 shows that the collected data located at dif-
ferent database tables are transformed into a table accord-
ing to the features presented in Table 3. The last feature
The main aim of ABDP is to predict the possible defects (defect num) of the transformed data set denotes the subse-
during the software process by using the data from the quent features to be predicted for submitted actions.
same project, where the prediction model becomes stable As well as the data transformation, the data preprocess-
when the number of collected data increases. Fig. 4 shows ing element also includes the data validation, feature selec-
the architecture of the ABDP process. tion and filtered data, which can be expressed as Fig. 6.
The feature definition is used to define a set of features First, the collected raw data (including the data of per-
(such as Tables 1 and 2) to describe the attributes of action, formed actions and reported defects) are transformed into
such as effort taken, action type/complexity, the task, work Data Set 1 according to the features listed in Table 3. Sec-
package and project. The feature definition can be con- ond, the Data Set 1 is validated by checking the values
ducted with the project planning. The main objective of against the valid range, which is defined by the feature def-
feature definition is to minimize the effort and maximize inition element. Each input datum is checked in the data
the application of existing processes for data collection. collection element. The problematic data entries are sub-
Although the ABDP can handle any feature sets in the data mitted to the manager for further analysis.
analysis element by using the FSS technique, defining a fea-
ture set used in the whole process of the project at the fea-
ture definition element can facilitate data collection and
analysis.
The second component is the action submission. The Action
act_idt ask_id Action Descriptiona ct_date .. effort_exp
ABDP process serves as an iterative process, where each 12 20 … 000/5/10 .. 2
action is submitted to the prediction element before execu- 13 22 Func. Req. Spec. 2000/5/11
Defect.. 4
tion. However, the first definite iteration is not sent, since : def_id act_generated .. severity
679 26 C/S AP Prototype 2000/11/118 .. 412
no action records can be utilized to build the prediction .. 1
Task 21 45 .. 3
model. The action submission can be achieved by using a ...
Project :
process management system where the actor can input 143 620 .. 3
the information of the action and obtain the prediction
results immediately.
The action prediction element is used to predict whether Data Set
Action Action Action Task defect
the subsequent action produces defects. The actions that ID Date State … Task Status … num
are predicted as likely to produce many defects are 12 2000/5/10 0 … 20 0 … 0
reported to the manager to take corrective action, while 13 2000/5/11 0 … 22 0 … 10
the actions that are predicted as causing no defects respond :
679 2000/11/1 0 … 26 0 … 2
to the actor and send the information of the action to the
data collection elements. Fig. 5. The collected data transformation.
C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570 565

Collected Data
feature. The remaining nodes of the decision tree are con-
Selected
Raw Data Features Subset Control structed by the divide and conquer strategy, which selects
the feature subset according to the evaluation of individual
Feature features. The CFS with best-ﬁrst search strategy can be
Data Data Filter out the Data
Transformation Validation
Subset
Data set Sampling used to improve the prediction accuracy of the C4.5 algo-
Selection
rithm (Hall and Smith, 1999).

Data Set 1 Data Set 2 Data Set 3 Data Set 4 3.3. The prediction model construction

Fig. 6. The data processing of the ABDP. Instead of using the data collected from the previous
project to build the prediction model, the ABDP approach
builds the prediction model using the data collected from
Third, the FSS technique is used to filter out unneces- the current process to increase the prediction accuracy
sary attributes from the data set. The Data Set 2 can then (since the actions used to build the model have many sim-
be used to build the correlation matrix (using the whole ilar features to the submitted action, such as the stakehold-
data set as the training data set by default) and find the best ers, environments and work products).
feature subset using the best-first search. The CFS is Fig. 7 shows the data set following preprocessing as in
selected as the evaluator to evaluate the worth of the fea- Fig. 5, where the performed actions of the software process
ture subset, and the best-first search strategy is used to can be listed by action date (date performed). Actions sub-
reduce the search space of the feature subset selection. mitted at the beginning of the project (the actions 1–20)
Fourth, the selected feature subset then can be used to filter cannot be predicted, since no prediction model can be
out the desirable data, where the data of unselected fea- applied to them. After some actions (20 actions in this
tures are removed. example) have been performed, the prediction model can
Fifth, the data sampling step is performed to sample the be built using the performed actions (the actions 1–20).
major class using under-sampling (by default) and gener- The built model then can be used to predict the following
ates the final data set (Data Set 4) to be analyzed by the submitted actions, in this case actions 21–30. The model
data analysis element. The proposed sampling step is is then updated after action 30 is performed, where the per-
applied to address the rarity problem, which may cause formed actions 1–30 are used as the training data set to
the decision tree to classify all submitted actions to the build the new prediction model. The updated model can
major class (predicted as Low-defect action). The over- then be used to predict the following submitted actions
sampling can be used to duplicate the rare classes, and thus (the actions 31–40). The model continues to be updated
address the imbalance problem. However, the over- after certain submitted actions are performed until the
sampling may cause overfitting, since the duplication does end of the project.
not generate new rate class data (Chawla et al., 2002). The submitted actions need to be preprocessed as
Rather than duplicating the rare class data, the under-sam- described in Section 3.1 to generate the format (the same
pling applied in this study reduces the number of major as the data set used to build the prediction model). The
class data, and can be effectively used with C4.5 (Drum- number of defects in the submitted actions is the class
mond and Holte, 2003).

Data Set
3.2. The data analyzing Action Action Action defect
ID Date State … Task … num
1 2000/5/10 0 … 20 … 0
The data analysis element is used to analyze the data by 2 2000/5/11 0 … 22 …
Link_by = D
1
using classiﬁcation techniques and build the prediction : Action_State = 0
Task_severe_D <= 5
20 2000/5/12 0 … 20 … Effort_Expected
0
model from the data set prepared by the preprocessing ele- ...
<= 6

Model Construction Task_severe_D > 5:H class

ment. The model then can be used to predict the subse- Action_State = 1: M class

quent actions. The model kept updated when the Submitted Action Prediction

performed actions and defects records are reporting during 21 2000/5/14 0 … 20 …

:
project execution. The updated model then can be used to Link_by = D
30 2000/5/15 1 … 22 …
predict several subsequent actions. The decision tree model Action_State = 0
Task_severe_D <= 5
Model Construction Effort_Expected <= 6
in ABDP is built using the C4.5 algorithm, which can han- ...
Task_severe_D > 5:H class
dle both discrete and continuous data. Action_State = 1:M class

The C4.5 algorithm used to build the decision tree has Submitted Action Prediction

been utilized in many research areas, and produces good 31 2000/5/16 1 … 26 …

:
prediction results (Lim et al., 2000). The decision tree is
40 2000/5/18 0 … 22 …
built starting from the root node, which is selected accord-
ing to the information gain ratio evaluation on every Fig. 7. The ABDP process.
566 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

feature that needs to be predicted, and is unknown prior to actions are divided into many segments (10 actions for each
execution. However, the accuracy of a prediction can not segment), where the last action in each segment is the
be evaluated until the end of the project, because some checking point. The performed actions before the check
defects may not have been detected at this time. point are used to renew the prediction models. To evaluate
The interval used to update the prediction model can be the accuracy of the built model, ten subsequent actions, fol-
based on either the number of submitted actions, or time. lowed by check point, were selected as the testing data to be
In the first case, the prediction model is updated after a spe- included in the training data on the next iteration to renew
cific number of performed actions (ten in this study). In the the models. For example, the first iteration used actions 1–
second case, the model is updated after a specific time inter- 20 as the training data, and action 21– 30 to test the accu-
val, such as one day or one week. For instance, the predic- racy of the models. The second iteration used actions 1–30
tion model can be updated at the midnight every day to as the training data, and action 31–40 as the testing data.
ensure that new actions are not submitted when updating Hence, 66 iterations were generated, and applied to evalu-
the prediction model. However, the manager can evaluate ate the efficiency of ABDP.
the interval selection.
4.3. Accuracy evaluation
4. The experiment
The accuracy, precision, recall and specificity are com-
The main purpose of ABDP is to capture the actions that mon ways to assess the prediction model. These evaluators
cause high or middling numbers of defects, all of which need are listed as Eqs. (1)–(4), where T, F, P and N represent
to be corrected. ABDP was applied to the data set obtained true, false, positive and negative respectively. The accuracy
from the AMS-COMFT project according to the proposed is the percentage of correct predictions (include high defect
schema to demonstrate the efficiency of ABDP. Table 3 and low defect predictions) among all predictions. The pre-
shows defined features used for data collection. The first cision denotes the percentage of the correct prediction of
nine features can be retrieved directly from the actions, high-defect (or median-defect) actions (the positive part).
while the remaining features need to be determined from The recall (sensitivity) denotes the percentage of high-
the tasks and defects of the action. The total_defect_num defect actions that have been discovered. The specificity
is the number of defects caused by the action, and is used denotes the percentage of low-defect actions that have been
to classify the action as low, median or high defect. The classified correctly.
Task_Status indicate if the status of the task is within or
over schedule where the status is determined by comparing TP þ TN
Accuracy ¼ ð1Þ
the action performed date against the scheduled date of the TP þ TN þ FP þ FN
task. The status falls within the schedule when the action TP
Precision ¼ ð2Þ
performed date is before the scheduled date of the task. TP þ FP
The task_progress represents the progress of the task TP
Recall ¼ ð3Þ
when the action is ready to be performed, where the task TP þ FN
progress is calculated by dividing the used efforts of all per- TN
formed actions of the task by expected efforts of the task. Specificity ¼ ð4Þ
FP þ TN
When the value of the task_progress is greater than 1, the
efforts of the task are overrun, possibly affecting the gener- Since the ABDP is utilized to predict the actions that
ated defects of the action. The following subsection explains cause high defects (include the median defects in this
the data set in detail, and shows how the ABDP analysis can study), the high-defect prediction is treated as positive,
be applied to the software development process. and the recall can be treated as the percentage of high
defect actions that can be predicted correctly. The false
4.1. The data set alarm rate is also defined as the percentage of low-defect
actions misclassified as high-defect actions.
The AMS-COMFT project contains seven work pack-
ages and 22 tasks. The project contains 682 actions, sorted 5. Results and discussion
by the performed date. Only 26 actions cause middling
defects, and 15 actions cause high defects. Most actions To compare the accuracy of sampling techniques, the
cause few or no defects, and a total of 413 defects were data set was analyzed using under-sampling and over-
caused by these actions at the end of the project (not samling with FSS, and represented as four categories in
including the maintenance phase). Table 4. The first two categories, Under without FSS and
Over without FSS, show the results of under and over sam-
4.2. The iteration of software process pling without FSS. The results of applying FSS to under
sampling and over sampling are shown as the categories
To demonstrate the efficiency of the ABDP approach Under with FSS and Over with FSS. For each category,
applied on the software development process, all sorted the MH (high and median defects) class and L class (low
C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570 567

Table 4
The summary of applying the sampling and FSS to the testing data in each iteration
Under without FSS Over without FSS Under with FSS Over with FSS
MH class L class MH class L class MH class L class MH class L class
MH L MH L MH L MH L MH L MH L MH L MH L
1 0 0 10 0 0 0 0 10 0 0 0 10 0 0 0 10
2 1 0 1 8 1 0 0 9 1 0 0 9 1 0 0 9
3 1 0 2 7 1 0 0 9 0 1 2 7 1 0 0 9
4 0 2 1 7 0 2 1 7 0 2 1 7 0 2 1 7
5 0 0 3 7 0 0 2 8 0 0 7 3 0 0 1 9
6 0 0 1 9 0 0 1 9 0 0 1 9 0 0 0 10
7 0 0 2 8 0 0 4 6 0 0 2 8 0 0 0 10
8 0 0 4 6 0 0 3 7 0 0 4 6 0 0 3 7
9 1 1 2 6 0 2 0 8 1 1 2 6 1 1 1 7
10 0 0 2 8 0 0 1 9 0 0 2 8 0 0 0 10
11 2 0 1 7 2 0 0 8 2 0 1 7 2 0 0 8
12 0 1 0 9 1 0 0 9 1 0 0 9 0 1 0 9
13 0 0 2 8 0 0 2 8 0 0 2 8 0 0 0 10
14 3 0 3 4 0 3 0 7 3 0 3 4 0 3 0 7
15 0 0 2 8 0 0 2 8 0 0 1 9 0 0 2 8
16 1 0 0 9 1 0 0 9 1 0 0 9 0 1 0 9
17 0 1 0 9 0 1 1 8 0 1 1 8 0 1 1 8
18 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
19 0 0 0 10 0 0 0 10 0 0 1 9 0 0 1 9
20 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
21 0 0 1 9 0 0 0 10 0 0 3 7 0 0 2 8
22 0 0 6 4 0 0 0 10 0 0 1 9 0 0 0 10
23 0 0 4 6 0 0 0 10 0 0 4 6 0 0 0 10
24 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
25 0 0 2 8 0 0 1 9 0 0 2 8 0 0 1 9
26 0 2 0 8 1 1 0 8 2 0 0 8 0 2 0 8
27 0 0 2 8 0 0 1 9 0 0 1 9 0 0 0 10
28 2 0 0 8 2 0 0 8 2 0 0 8 0 2 0 8
29 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
30 0 0 2 8 0 0 2 8 0 0 2 8 0 0 2 8
31 1 0 1 8 0 1 0 9 1 0 1 8 1 0 1 8
32 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
33 1 0 0 9 0 1 0 9 1 0 0 9 1 0 0 9
34 1 0 2 7 1 0 0 9 1 0 2 7 1 0 2 7
35 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
36 0 0 1 9 0 0 0 10 0 0 1 9 0 0 1 9
37 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
38 0 0 4 6 0 0 2 8 0 0 4 6 0 0 0 10
39 0 1 1 8 0 1 0 9 0 1 1 8 0 1 0 9
40 0 0 1 9 0 0 0 10 0 0 1 9 0 0 1 9
41 1 0 2 7 0 1 0 9 1 0 2 7 0 1 0 9
42 2 0 2 6 1 1 0 8 2 0 2 6 1 1 0 8
43 1 0 1 8 1 0 1 8 1 0 1 8 1 0 0 9
44 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
45 2 0 0 8 1 1 0 8 2 0 0 8 1 1 0 8
46 1 0 0 9 0 1 0 9 1 0 0 9 1 0 0 9
47 1 0 1 8 1 0 1 8 1 0 1 8 0 1 1 8
48 0 0 1 9 0 0 1 9 0 0 1 9 0 0 1 9
49 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
50 0 0 3 7 0 0 1 9 0 0 3 7 0 0 2 8
51 0 0 2 8 0 0 0 10 0 0 2 8 0 0 0 10
52 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10
53 1 0 0 9 0 1 0 9 1 0 0 9 0 1 0 9
54 2 2 0 6 0 4 0 6 2 2 0 6 0 4 0 6
55 1 0 1 8 1 0 3 6 1 0 1 8 1 0 4 5
56 0 0 10 0 0 0 2 8 0 0 10 0 0 0 3 7
57 0 0 6 4 0 0 0 10 0 0 1 9 0 0 2 8
58 0 0 2 8 0 0 2 8 0 0 2 8 0 0 1 9
59 0 1 1 8 0 1 1 8 0 1 0 9 0 1 0 9
60 0 0 1 9 0 0 6 4 0 0 0 10 0 0 0 10
(continued on next page)
568 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

Table 4 (continued)
Under without FSS Over without FSS Under with FSS Over with FSS
MH class L class MH class L class MH class L class MH class L class
MH L MH L MH L MH L MH L MH L MH L MH L
61 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
62 0 0 1 9 0 0 0 10 0 0 0 10 0 0 0 10
63 0 0 4 6 0 0 0 10 0 0 1 9 0 0 0 10
64 0 0 3 7 0 0 0 10 0 0 3 7 0 0 0 10
65 0 0 0 10 0 0 0 10 0 0 0 10 0 0 0 10
66 0 0 1 9 0 0 0 10 0 0 1 9 0 0 0 10

defects) represent the actual number of testing cases which Classified as MH L Accuracy = 81.67
are classified as either MH or L. The numbers of testing Precision = 20.29
MH class 28 11 Recall = 71.79
data classified as MH or L are shown in the MH and L col- L class 110 511 Specificity = 82.29
umns respectively. The total number of MH cases was 39
rather than 41, since two of the MH cases were part of Fig. 10. The results of under sampling without FSS.
the training data at the first iteration (transaction 2 and
3), and the first testing data set started from transaction 21.
Each row represents the results of prediction at the significantly. That is, up to 72% of high-defect actions were
check point. For example, the results of iteration 9 using captured, which is an acceptable result. However, the pre-
the Under without FSS shows that one MH case was cat- cision was 21%, meaning that only one prediction in every
egorized into the L class and two L cases predicted as five predictions was correct. The precision must be further
MH class. The results can be shown as Fig. 8. The results improved to reduce the false alarm rate.
of using different approaches are shown in the following
subsections.
5.2. Applied the FSS with sampling

5.1. The Sampling without FSS To avoid misclassification of actions, the FSS was
applied to the data set for feature selection. The selected
Fig. 9 shows the results of applying over-sampling to the subset of features may not be the same when using different
testing data by selecting all features (without FSS). The training data. Table 5 lists the selected feature subsets using
accuracy was 90%, but the recall was only 28%, indicating the training data set in all iterations.
that many MH classes were missing (misclassified as L). By using the selected feature subset, the desirable attri-
Undetected high defect actions may increase the effort butes can be filtered out to build the prediction model.
involved in the process. Fig. 11 shows the results of applying over-sampling to
Fig. 9 demonstrates that most cases (both MH and L)
were classified as L, which not only raised the specificity,
but also reduced the recall. However, capturing as many Table 5
high-defect actions as possible is very important for defect The selected feature subset by iteration
prevention. Iteration Selected features Iteration Selected features
Under-sampling the majority classes rather than increas-
1 6, 15 27–39 2, 3, 13, 20
ing the number of rare classes, can improve the results. 2 6, 15, 20 40–41 1, 3, 6, 13, 20
Fig. 10 shows the results of under-sampling without FSS, 3 6, 15 42 1, 3, 6, 7, 13, 20
where the recall rose to 72% while the specificity did not fall 4 6 43–47 1, 3, 7, 13, 20
5 3, 6, 17, 20 48–49 1, 2, 3, 7, 13, 20
6–8 3, 6, 20 50–54 1, 2, 3, 13, 20
9 1, 3, 6, 17, 20 55–60 1, 2, 3, 6, 13, 20
Classified as MH L Accuracy = 70.00 10–11 2, 3, 6, 17, 20 61–62 1, 2, 3, 6, 7, 13, 17, 20
Precision = 33.33
MH class 1 1 Recall = 50.00
12 2, 3, 6, 7, 17, 20 63–64 1, 2, 3, 6, 13, 20
Specificity = 75.00 13 2, 3, 6, 17, 20 65–66 1, 2, 3, 6, 7, 13, 17, 20
L class 2 6
14–26 2, 3, 17, 20
Fig. 8. The results of predictions.

Classified as MH L Accuracy = 90.15 Classified as MH L Accuracy = 90.61

Precision = 28.33 Precision = 26.53
MH class 17 22 Recall = 43.59 MH class 13 26 Recall = 33.33
L class 43 578 Specificity = 93.08 L class 36 585 Specificity = 94.20

Fig. 9. The results of over sampling without FSS. Fig. 11. The results of over sampling with FSS.
C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570 569

Classified as MH L Accuracy = 85.15 predict the high defect classes. Over-sampling may
Precision = 25.21 lower the error of predicting low-defect classes, which
MH class 30 9 Recall = 76.92
L class 89 532 Specificity =85.67 is not the main objective of defect prevention.
(3) The recall and speciﬁcity can be improved by apply-
Fig. 12. The results of under sampling with FSS. ing the FSS technique. Applying FSS with under-
sampling can achieve desirable results.

the filtered data set. Although the specificity rose to 94%, The FSS with under-sampling can be applied to con-
and the precision also rose to 27%, the recall fell to 33%. struct the proposed prediction model for predicting high-
Thus, the prediction rate of over-sampling was not better defect actions. Additionally, we conclude that patterns
than that of under-sampling without FSS. exist among actions causing many defects, and these pat-
Fig. 12 shows the results of applying under-sampling terns can be modeled using data mining techniques. Future
with FSS, where both recall and specificity increased up work to identify these patterns will include applying
to almost 80%. The precision was 25%, meaning that one sequence pattern analysis techniques to increase the predic-
high defect action could be discovered out of four alarmed tion performance.
high-defect actions.
The analytical results in Fig. 12 indicate that the under-
Acknowledgements
sampling with simple feature subset selection can signifi-
cantly improve the efficiency of prediction. The results also
This work is partially supported by the National Science
reveal that high-defect actions can be found without caus-
Council of Taiwan, ROC, under Grant NSC-92-2213-E-
ing too many false alarms.
309-005, and partially sponsored by the Ministry of Eco-
nomic Affairs of Taiwan, under Grant 93-EC-17-A-02-S1-
6. Conclusion
029.
This study presents an action-based defect prevention
approach that can be applied to the software development References
process to detect actions that may cause many defects. The
ABDP approach presented in this study classifies data col- Aversano, L., Lucia, A.D., Gaeta, M., Ritrovato, P., Stefanucci, S.,
Villani, M.L., 2004. Managing coordination and cooperation in
lected from the reports of operations and defects of the
distributed software processes: The GENESIS Environment. Software
project. The FSS and sampling techniques can be applied Process Improvement and Practice 9, 239–263.
the data set to address the rarity problem. By detecting Boehm, B., Huang, L.G., 2003. Value-based software engineering: A case
the suspect actions, necessary corrective actions can be study. IEEE Computer 36 (3), 33–41.
taken to prevent the defects from occurring. Card, D.N., 1993. Defect-causal analysis drives down error rates. IEEE
Software 15 (1), 88–89.
The main advantage of ABDP is the in-process prediction
Card, D.N., 1998. Learning from our mistakes with defect causal analysis.
where the training data can be obtained from the project in IEEE Software 15 (1), 56–63.
execution used to build the prediction model. The in-process Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002.
analysis also can reduce the variance between different pro- SMOTE: Synthetic minority over-sampling technique. Journal of
jects. Second, the features utilized in ABDP to build the pre- Artificial Intelligence Research 15, 321–357.
Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S.,
diction model can be adapted from the existing process
Ray, B.K., Man-Yuen Wong, M.-Y., 1992. Orthogonal defect classi-
where the effort involved in modifying the existing process fication – A concept for in-process measurements. IEEE Transactions
for ABDP can be reduced. Third, the latest models can accu- on Software Engineering 18 (11), 943–956.
rately predict the submitted actions to obtain a quick Chrissis, M.B., Konrad, M., Shrum, S., 2003. CMMI guidelines for
response. The ABDP process can also be merged into the process integration and product improvement. Addison-Wesley, MA,
pp. 143–155.
existing process with little additional effort. To facilitate
CMMI Product Team. 2001. Capability Maturity Model Integration V1.1,
the ABDP process, this study defines a set of features, and Stage Representation. Software Engineering Institute, Carnegie Mel-
applies to the AMS-COMFT project to evaluate the perfor- lon University, Pittsburgh, USA.
mance of ABDP, where the results can be concluded as Doppke, J.C., Heimbigener, D.H., Wolf, A.L., 1997. Software process
follow. modeling and execution within virtual environments. ACM Transac-
tions on Software Engineering and Methodology 7, 1–47.
Drummond, C., Holte, R.C., 2003. C4.5, Class Imbalance, and Cost
(1) Actions that cause many defects among repositories Sensitivity: Why Under-Sampling beats Over-Sampling. In: Proceed-
of the software process are rare class actions which ings of the Workshop on Learning from Imbalance Data Sets II,
can not be classified directly and need to be prepro- International Conf. on Machine Learning.
cessed by sampling. The rarity problem of the high- Dy, J.G., Brodley, C.E., 2000. Feature selection for unsupervised learning.
Journal of Machine Learning Research 5, 845–889.
defect class also raises the difficulty of detecting the
Fleming, Q.W., 1998. Cost/Schedule Control Systems Criteris: The
actions that cause high defects. management Guide to C/SCSC. Probus.
(2) A comparison of under and over-sampling reveals Florac, W.A., Carleton, A.D., 1999. Measuring the Software Process.
that under-sampling produces acceptable results to Addison-Wesley, MA.
570 C.-P. Chang, C.-P. Chu / The Journal of Systems and Software 80 (2007) 559–570

Gale, J.L., Tirso, J.R., Burchfield, C.A., 1990. Implementing the defect Lim, T.-S., Loh, W.Y., Shin, Y.S., 2000. A Comparison of prediction
prevention process in the MVS interactive programming organization. accuracy, complexity, and training time of thirty-three old and new
IBM System Journal 29 (1), 33–43. classification algorithms. Machine Learning 40 (3), 203–228.
Hall, M.A., 2000. Correlation-based Feature Selection for Discrete and Mays, R.G., Jones, C.L., Holloway, G.J., Studinski, D.P., 1990.
Numeric Class Machine Learning. In: Proceedings of the Seventeenth Experiences with defect prevention. IBM System Journal 29 (1),
International Conference on Machine Learning, CA, USA, pp. 359–366. 4–32.
Hall, M.A., Smith, L.A., 1999. Feature Selection for Machine Learning: Mohapatra, S., Mohanty, B., 2001. Defect Prevention through Defect
Comparing a Correlation-based Filter Approach to the Wrapper. In: Prediction: A Case Study at Infosys. In: proceedings of International
Proceedings of the Florida Artificial Intelligence Symposium, FLAIRS Conference on Software Maintenance (ICSM 2001), Florence, Italy,
Conference, pp. 235–239. pp. 260–272.
Han, J., Kamber, M., 2001. Data Mining Concepts and Techniques. NASA. 2000. NASA Procedures and Guidelines for Mishap Reporting,
Morgan Kaufmann Publishers, USA. Investigating, and Recordkeeping. Safety and Risk Management
Humphrey, W., 1989. Managing the Software Process. Addison-Wesley, Division, NASA Headquarters, USA.
MA. Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M. 2003.
Jones, C.L., 1985. A process-integrated approach to defect prevention. Automated Support for Classifying Software Failure Reports. In:
IBM System Journal 24 (2), 150–167. Proceedings of the 25th International Conference on Software Engi-
Jones, T.C., 1994. Assessment and Control of Software Risks. Prentice neering (ICSE 2003).
Hall, NY. Pooley, R., Senior, D., Christie, D., 2002. Collecting and analyzing web-
Jones, C., 2003. Practical Software and Systems Measurement: A based project metrics. IEEE Software 19 (1), 52–58.
Foundation for Objective Project Management. Version. 4.0c. Depart- Pressman, R.S., 2001. Software Engineering: A Practitioner’s Approach.
ment of Defense and US Army, USA. McGraw-Hill, NY.
Khoshgoftaar, T.M., Allen, E.B., Jones, W.D., Hudepohl, J.P., 2000. Quinlan, J.R., 1986. Induction of decision tree. Machine Learning 1 (1),
Classification-tree models of software-quality over multiple releases. 81–106.
IEEE Transactions on Reliability 49 (1), 4–11. Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan
Kilpi, T., 2001. Implementing a software metrics program at Nokia. IEEE Kaufmann Publishers, USA.
Software 18 (6), 72–77. Russell, S., Norvig, P., 1995. Artificial Intelligence: A Modern Approach.
Kohavi, R., John, G.H., 1996. Wrappers for feature subset selection. Prentice Hall, USA.
Artificial Intelligence 97 (12), 273–324. Sommerville, I., 2001. Software Engineering. Addison-Wesley, MA.
Lawler, J., Kitchenham, B., 2003. Measurement modeling technology. Weiss, G.M., 2004. Mining with rarity: a unifying framework. ACM
IEEE Software 12 (3), 68–75. SIGKDD Explorations Newsletter 6, 7–19.
Leszak, M., Perry, D.E., Stoll, D., 2002. Classification and evaluation of Weller, E.F., 2000. Practical applications of statistical process control.
defects in a project retrospective. The Journal of System and Software IEEE Software 17 (3), 48–55.
61, 173–187.

Predicting Software Defects Using ABDP
No ratings yet
Predicting Software Defects Using ABDP
11 pages
Software Defect Prevention Strategies
No ratings yet
Software Defect Prevention Strategies
6 pages
Effective Defect Prevention Approach in Software Process For Achieving Better Quality Levels
No ratings yet
Effective Defect Prevention Approach in Software Process For Achieving Better Quality Levels
5 pages
1001 3552 PDF
No ratings yet
1001 3552 PDF
5 pages
Software Defect Management Strategies
No ratings yet
Software Defect Management Strategies
11 pages
Art of Software Defect Association & Correction Using Association Rule Mining
No ratings yet
Art of Software Defect Association & Correction Using Association Rule Mining
11 pages
Defect Prevention Strategies in Software Quality
No ratings yet
Defect Prevention Strategies in Software Quality
6 pages
Defect Prevention and Causal Analysis
No ratings yet
Defect Prevention and Causal Analysis
9 pages
Software Defect Prediction Models Overview
No ratings yet
Software Defect Prediction Models Overview
9 pages
Software Defect Reduction Strategies
No ratings yet
Software Defect Reduction Strategies
3 pages
Understanding Defect Rate and Prevention
No ratings yet
Understanding Defect Rate and Prevention
16 pages
1990 Experiences With Defect Prevention
No ratings yet
1990 Experiences With Defect Prevention
30 pages
Defect Management Process Overview
No ratings yet
Defect Management Process Overview
41 pages
Defect Management Process Overview
No ratings yet
Defect Management Process Overview
67 pages
Quality Management in Information Systems
No ratings yet
Quality Management in Information Systems
44 pages
Defect Prevention Strategies in Software
No ratings yet
Defect Prevention Strategies in Software
52 pages
Unit 3
No ratings yet
Unit 3
52 pages
Defect Management Process Overview
No ratings yet
Defect Management Process Overview
23 pages
Defect Management Process Overview
No ratings yet
Defect Management Process Overview
12 pages
Software Quality Management Metrics Guide
No ratings yet
Software Quality Management Metrics Guide
8 pages
Defect Analysis and Reliability Factors
No ratings yet
Defect Analysis and Reliability Factors
12 pages
Enhancing Defect Classification Strategies
No ratings yet
Enhancing Defect Classification Strategies
21 pages
Defect Containment Effectiveness Explained
No ratings yet
Defect Containment Effectiveness Explained
23 pages
Real-Time Defect Data Analysis Guide
No ratings yet
Real-Time Defect Data Analysis Guide
6 pages
CS-736: Advanced Software Engineering: Software Quality Measurement: A Framework For Counting Problems and Defects
No ratings yet
CS-736: Advanced Software Engineering: Software Quality Measurement: A Framework For Counting Problems and Defects
21 pages
FAMI: Four-Step Defect Management Model
No ratings yet
FAMI: Four-Step Defect Management Model
14 pages
Inspection Strategies for Error Correction
No ratings yet
Inspection Strategies for Error Correction
34 pages
Defect Management System Overview
No ratings yet
Defect Management System Overview
5 pages
Quantitative Quality Management Through Defect Prediction and Statistical Process Control
No ratings yet
Quantitative Quality Management Through Defect Prediction and Statistical Process Control
6 pages
Quantitative Software Quality Management
No ratings yet
Quantitative Software Quality Management
11 pages
Defect Management in Software Testing
No ratings yet
Defect Management in Software Testing
27 pages
Defect Prevention Techniques in Software
No ratings yet
Defect Prevention Techniques in Software
2 pages
Effective Defect Prevention Strategies
No ratings yet
Effective Defect Prevention Strategies
12 pages
Defect Management Essentials
No ratings yet
Defect Management Essentials
20 pages
Defect Prevention in Software Quality
No ratings yet
Defect Prevention in Software Quality
40 pages
Defect Management in Software Testing
No ratings yet
Defect Management in Software Testing
7 pages
Biologically-Inspired Software Project Management
No ratings yet
Biologically-Inspired Software Project Management
8 pages
Root Causes of Software Defects Analysis
No ratings yet
Root Causes of Software Defects Analysis
4 pages
Dynamic vs Static Testing in QA
No ratings yet
Dynamic vs Static Testing in QA
17 pages
Software Quality Assurance Overview
No ratings yet
Software Quality Assurance Overview
49 pages
ML Techniques for Software Defect Detection
No ratings yet
ML Techniques for Software Defect Detection
84 pages
Process Improvement Strategies and Metrics
No ratings yet
Process Improvement Strategies and Metrics
52 pages
Management Spectrum in Software Projects
No ratings yet
Management Spectrum in Software Projects
20 pages
Chapter 5: Defect Analysis
No ratings yet
Chapter 5: Defect Analysis
64 pages
Root Cause Analysis in Software Quality
No ratings yet
Root Cause Analysis in Software Quality
8 pages
Statistical Quality Assurance in Software
No ratings yet
Statistical Quality Assurance in Software
3 pages
Software Process Improvement Strategies
No ratings yet
Software Process Improvement Strategies
73 pages
Challenges in Defect Management Process
No ratings yet
Challenges in Defect Management Process
9 pages
Key Software Quality Metrics Explained
No ratings yet
Key Software Quality Metrics Explained
26 pages
Process Improvement Strategies by Bruce Maxim
No ratings yet
Process Improvement Strategies by Bruce Maxim
52 pages
Defect Density and Quality Metrics Overview
No ratings yet
Defect Density and Quality Metrics Overview
6 pages
Defect Management Process Overview
No ratings yet
Defect Management Process Overview
25 pages
Software Risk and Configuration Management
No ratings yet
Software Risk and Configuration Management
35 pages
4th unit
No ratings yet
4th unit
23 pages
Process Improvement Strategies and Metrics
No ratings yet
Process Improvement Strategies and Metrics
52 pages
Predicting Software Quality with Time Series
No ratings yet
Predicting Software Quality with Time Series
23 pages
Image Convolution Assignment in C++
No ratings yet
Image Convolution Assignment in C++
3 pages
Securing Network and Web Applications
No ratings yet
Securing Network and Web Applications
80 pages
Abu Qasim Wadoodi Records List
No ratings yet
Abu Qasim Wadoodi Records List
6 pages
SSC-II Admission Forms 2021
No ratings yet
SSC-II Admission Forms 2021
566 pages
Getting Started with Sinatra Framework
No ratings yet
Getting Started with Sinatra Framework
16 pages
Polarion Install Guide Linux
No ratings yet
Polarion Install Guide Linux
28 pages
Python Essentials: Data Types & Functions
No ratings yet
Python Essentials: Data Types & Functions
13 pages
Excel Curriculum for Data Analysts
No ratings yet
Excel Curriculum for Data Analysts
5 pages
Teledyne DALSA Spyder3 Color GigE SG 34 Series Manual POcvh8a
No ratings yet
Teledyne DALSA Spyder3 Color GigE SG 34 Series Manual POcvh8a
103 pages
Computer Basics for Business Management
No ratings yet
Computer Basics for Business Management
38 pages
Turkcell VoLTE Optimization Guide
100% (1)
Turkcell VoLTE Optimization Guide
23 pages
CAN Bus Solutions for PIC Microcontrollers
No ratings yet
CAN Bus Solutions for PIC Microcontrollers
8 pages
C++ Queue Data Structures Overview
No ratings yet
C++ Queue Data Structures Overview
17 pages
Flash Button with Sound Tutorial
No ratings yet
Flash Button with Sound Tutorial
4 pages
Streaming Databases Overview
No ratings yet
Streaming Databases Overview
43 pages
Firewall Solutions and Applications Guide
No ratings yet
Firewall Solutions and Applications Guide
41 pages
Vacon NXP Line Sync APFIF131 Application Manual UD
No ratings yet
Vacon NXP Line Sync APFIF131 Application Manual UD
130 pages
Updated Short User Manual
No ratings yet
Updated Short User Manual
14 pages
Power BI Workshop by Office Master
No ratings yet
Power BI Workshop by Office Master
29 pages
GWT RPC Implementation Guide
No ratings yet
GWT RPC Implementation Guide
16 pages
Understanding CS101: Computing Basics
No ratings yet
Understanding CS101: Computing Basics
59 pages
Introduction to Microsoft Sway
No ratings yet
Introduction to Microsoft Sway
6 pages
PU Wifi Registration Application Form
No ratings yet
PU Wifi Registration Application Form
1 page
GenAI Interview Questions & Answers Guide
No ratings yet
GenAI Interview Questions & Answers Guide
2 pages
IoT-Based Water Quality Monitoring System
No ratings yet
IoT-Based Water Quality Monitoring System
23 pages
MCA Examination Scheme Overview
No ratings yet
MCA Examination Scheme Overview
27 pages
Comparing Quantum Machine Learning Frameworks
No ratings yet
Comparing Quantum Machine Learning Frameworks
10 pages
Java Temperature Converter Project
No ratings yet
Java Temperature Converter Project
7 pages
Remote Terminal Unit RTU560
No ratings yet
Remote Terminal Unit RTU560
116 pages
Advanced PI Vision for Energy Management
No ratings yet
Advanced PI Vision for Energy Management
45 pages
FramReinMaker Revit Csharp
No ratings yet
FramReinMaker Revit Csharp
6 pages
Computer Science Exam Question Paper
No ratings yet
Computer Science Exam Question Paper
9 pages
Power BI ETL Process for Products
No ratings yet
Power BI ETL Process for Products
41 pages
Strathmore University Penetration Exam
No ratings yet
Strathmore University Penetration Exam
2 pages

Research Article 1

Uploaded by

Research Article 1

Uploaded by

The Journal of Systems and Software 80 (2007) 559–570

Defect prevention in software processes: An action-based approach

Keywords: Software process improvement; Software repositories; Classiﬁcation; Mining rarity

1. Introduction and prevent defects or other process problems from occur-

The planning stage is performed to plan the execution of Table 2

tomer sends a change request, which is approved by the a1 a2 a3 a4 a5 a6

Submit the Predict the Action_Type = N

The following subsection described in detail the data

If model To build the prediction model, the data of performed

Model Construction Task_severe_D > 5:H class

performed actions and defects records are reporting during 21 2000/5/14 0 … 20 …

been utilized in many research areas, and produces good 31 2000/5/16 1 … 26 …

Classified as MH L Accuracy = 90.15 Classified as MH L Accuracy = 90.61

You might also like