CHAPTER 11
SOFTWARE RELIABILITY
AND QUALITY
MANAGEMENT
Reliability of a software product is an important concern for most users. Users not only want
the products they purchase to be highly reliable, but for certain categories of products they
may even require a quantitative guarantee on the reliability of the product before making
their buying decision. This may especially be true for safety-critical and embedded software
products. However, as we discuss in this chapter, it is very difficult to accurately measure the
reliability of any software product. One of the main problems encountered while quantitatively
measuring the reliability of a software product is the fact that reliability is observer-dependent.
That is, different groups of users may arrive at different reliability estimates for the same
product. Besides this, several other problems (such as frequently changing reliability values
due to bug corrections) make accurate measurement of the reliability of a software product
difficult. We investigate these issues in this chapter. Even though no metric to accurately
measure the reliability of a software product exists, we shall discuss some metrics that are
being used at present to quantify the reliability of a software product. We shall also address
the problem of reliability growth modelling and examine how to predict when (if at all) a
given level of reliability will be achieved. We then examine the statistical testing approach
to reliability estimation and discuss various issues associated with
Software Quality Assurance (SQA).
Software Quality Assurance (SQA) has emerged as one of the most talked about topics in
recent years in software industry circles. The major aim of SQA is to help an organization
develop high quality software products in a repeatable manner. A software development organi-
zation can be called repeatable when its software development process is person-independent.
That is, the success of a project does not depend on who exactly are the team members of
the project. Besides, the quality of the developed software and the cost of development are
important issues addressed by SQA. In this chapter, we first discuss a few important issues
concerning software reliability measurement and prediction before starting our discussion on
software quality assurance.
11.1 SOFTWARE RELIABILITY
Reliability of a software product essentially denotes its trustworthiness or dependability.
Alternatively, the reliability of a software product can also be defined as the probability
of the product working correctly over a given period of time.
Intuitively, it is obvious that a software product having a large number of defects is
unreliable. It is also very reasonable to assume that the reliability of a system improves if the
number of defects in it is reduced. It would have been very nice if we could mathematically
characterize this relationship between reliability and the number of bugs present in the system
using a simple closed form expression. Unfortunately, it is very difficult to characterize the
observed reliability of a system in terms of the number of latent defects in the system using
a simple mathematical expression. To get an insight into this issue, consider the following.
Removing errors from those parts of a software product that are very infrequently executed
makes little difference to the perceived reliability of the product. It has been experimentally
observed by analyzing the behaviour of a large number of programs that 90% of the execution
time of a typical program is spent in executing only 10% of the instructions in the program.
The most used 10% of the instructions are often called the core of a program. The remaining 90% of
the program statements are called non-core and are on the average executed only for 10%
of the total execution time. It therefore may not be very surprising to note that removing
60% of the product defects from the least used parts of a system would typically result in only 3%
improvement to the product reliability. It is clear that the quantity by which the overall
reliability of a program improves due to the correction of a single error depends on how frequently
the instruction having the error is executed. If an error is removed from an instruction that is
frequently executed (i.e. belonging to the core of the program), then this would show up as a
large improvement to the reliability figure. On the other hand, removing errors from parts of
the program that are rarely used may not cause any appreciable change to the reliability of
the product.
The perceived reliability of a product also depends on how it is used, since different
categories of users typically exercise different functions
of a software product. For example, for a library automation software, the library members
would use functionalities such as issue book, search book, etc. On the other hand, the librarian
would normally execute features such as create member, create book record, delete member
record, etc. So defects which show up for the librarian may not show up for the members.
Suppose the functions of a library automation software which the library members use are
error-free, and the functions used by the librarian have many bugs. Then, these two categories
of users would have very different opinions about the reliability of the software. Therefore,
the reliability figure of a software product is observer-dependent, and it is very difficult to
absolutely quantify the reliability of the product.
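To make the observer-dependence argument concrete, the following is a minimal sketch in Python. It assumes that each function of the product has a fixed per-invocation failure probability and that each user group has a known usage mix. All function names and numbers are hypothetical illustrations, not measured data.

```python
# Hypothetical per-invocation failure probabilities for each function of a
# library automation product (illustrative values, not measured data).
FAILURE_PROB = {
    "issue_book": 0.0,
    "search_book": 0.0,
    "create_member": 0.05,
    "create_book_record": 0.02,
    "delete_member_record": 0.10,
}

# Hypothetical usage mixes: the fraction of requests each user group sends
# to each function. Each mix sums to 1.
MEMBER_PROFILE = {"issue_book": 0.4, "search_book": 0.6}
LIBRARIAN_PROFILE = {
    "create_member": 0.5,
    "create_book_record": 0.3,
    "delete_member_record": 0.2,
}


def perceived_failure_probability(profile, failure_prob):
    """Probability that a randomly chosen request from this usage mix fails."""
    return sum(share * failure_prob[func] for func, share in profile.items())


member_view = perceived_failure_probability(MEMBER_PROFILE, FAILURE_PROB)
librarian_view = perceived_failure_probability(LIBRARIAN_PROFILE, FAILURE_PROB)
print(f"members:   {member_view:.3f}")
print(f"librarian: {librarian_view:.3f}")
```

With these assumed values, the members never observe a failure, while roughly one in twenty of the librarian's requests fails, even though both groups use the same product.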
The core of a program can be identified using a commonly available tool called a
profiler. On Unix platforms, a tool called prof is normally available for this purpose.
Based on the above discussions, we can summarize the main reasons that make software
reliability more difficult to measure than hardware reliability:
• The reliability improvement due to fixing a single bug depends on where the bug is
located in the code.
• The perceived reliability of a software product is observer-dependent.
• The reliability of a product keeps changing as errors are detected and fixed.
In the following subsection, we shall discuss why software reliability measurement is a
harder problem than hardware reliability measurement.
11.1.1. Hardware versus Software Reliability
An important characteristic feature that sets hardware and software reliability issues apart is
the difference between their failure patterns. Hardware components fail due to very different
reasons as compared to software components.
Hardware components fail mostly due to wear and tear, whereas software components fail due
to bugs.
A logic gate may be stuck at 1 or 0, or a resistor might short circuit. To fix a hardware
fault, one has to either replace or repair the failed part. In contrast, a software product would
continue to fail until the error is tracked down and either the design or the code is changed
to fix the bug. For this reason, when a hardware part is repaired, its reliability would be
maintained at the level that existed before the failure occurred; whereas when a software
failure is repaired, the reliability may either increase or decrease (reliability may decrease if a
bug fix introduces new errors). To put this fact in a different perspective, hardware reliability
study is concerned with stability (for example, the inter-failure times remain constant). On the
other hand, the aim of software reliability study would be reliability growth (that is, increase
in inter-failure times).
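The distinction between stability and growth can be illustrated with a rough sketch in Python. The check used here, comparing the mean inter-failure time of the later half of a failure log with that of the earlier half, is only a crude heuristic chosen for illustration and not an established reliability growth test. The failure logs are hypothetical.

```python
def inter_failure_times(failure_times):
    """Gaps between successive failure instants (any consistent time unit)."""
    return [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]


def shows_reliability_growth(failure_times):
    """Crude check: is the mean gap in the later half of the log larger
    than the mean gap in the earlier half?"""
    gaps = inter_failure_times(failure_times)
    if len(gaps) < 2:
        raise ValueError("need at least three failure instants")
    mid = len(gaps) // 2
    early, late = gaps[:mid], gaps[mid:]
    return sum(late) / len(late) > sum(early) / len(early)


# Hypothetical failure instants, in hours.
software_log = [5, 12, 25, 50, 100, 190]   # gaps keep widening as bugs are fixed
hardware_log = [100, 200, 300, 400, 500]   # gaps stay constant during useful life

print(shows_reliability_growth(software_log))
print(shows_reliability_growth(hardware_log))
```

Under these assumed logs, the widening gaps of the software log count as growth, while the constant gaps of the hardware log do not.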
A comparison of the changes in failure rate over the product lifetime for a typical hardware
product and a typical software product is sketched in Figure 11.1. Observe that the plot of
change of failure rate with time for a hardware component [Figure 11.1(a)] appears like a bath
tub. For a hardware component, the failure rate is initially high, but decreases as the faulty
components are identified and are either repaired or replaced. The system then enters its useful
life, where the rate of failure is almost constant. After some time (called product lifetime)
the major components wear out, and the failure rate increases. The initial failures are usually
covered through manufacturer's warranty. A corollary of this observation (though a digression
from our topic of discussion) is that it may be unwise to buy a product (even at a good discount
to its face value) towards the end of its lifetime. That is, one need not feel happy to buy a ten
year old car at one tenth of the price of a new car, since it would be near the rising edge of the
bath tub curve, and one would have to spend unduly large time, effort, and money on repairing
and end up as the loser. In contrast to hardware products, software products show
the highest failure rate just after purchase and installation [see the initial portion of the plot
in Figure 11.1(b)]. As the system is used, more and more errors are identified and removed,
resulting in reduced failure rate. This error removal continues at a slower pace during the
useful life of the product. As the software becomes obsolete, no more error correction occurs
and the failure rate remains unchanged.
(a) Hardware product (b) Software product
Figure 11.1: Change in failure rate of a product.
11.1.2. Reliability Metrics
The reliability requirements for different categories of software products may be different. For
this reason, it is necessary that the level of reliability required for a software product should
be specified in the SRS (software requirements specification) document. In order to be able to
do this, we need some metrics to quantitatively express the reliability of a software product.
A good reliability measure should be observer-independent, so that different people can agree
on the degree of reliability a system has. However, in practice, it is very difficult to formulate
a metric using which precise reliability measurement would be possible. In the absence of such
measures, we discuss six metrics that correlate with reliability as follows:
1. Rate of Occurrence Of Failure (ROCOF). ROCOF measures the frequency of
occurrence of failures. The ROCOF measure of a software product can be obtained by observing
the behaviour of the software product in operation over a specified time interval and then
calculating the ROCOF value as the ratio of the total number of failures observed and
the duration of observation. However, many software products do not run continuously
(unlike a car or a mixer), but deliver certain service when a demand is placed on them.
For example, a library software is idle until a book issue request is made. Therefore, for
a typical software product such as a payroll software, applicability of ROCOF is very
limited.
2. Mean Time To Failure (MTTF). MTTF is the time between two successive failures,
averaged over a large number of failures. To measure MTTF, we can record the failure
data for n failures. Let the failures occur at the time instants $t_1, t_2, \ldots, t_n$. Then, MTTF
can be calculated as $\frac{1}{n-1}\sum_{i=1}^{n-1}(t_{i+1} - t_i)$. It is important to note that only run time is considered
in the time measurements. That is, the time for which the system is down to fix the
error, the boot time, etc. are not taken into account in the time measurements and the
clock is stopped at these times.
3. Mean Time To Repair (MTTR). Once a failure occurs, some time is required to fix
the error. MTTR measures the average time it takes to track the errors causing the
failure and to fix them.
4. Mean Time Between Failure (MTBF). The MTTF and MTTR metrics can be
combined to get the MTBF metric: MTBF = MTTF + MTTR. Thus, an MTBF of 300 hours
indicates that once a failure occurs, the next failure is expected after 300 hours. In this
case, the time measurements are real time and not the execution time as in MTTF.
5. Probability Of Failure On Demand (POFOD). Unlike the other metrics discussed,
this metric does not explicitly involve time measurements. POFOD measures the
likelihood of the system failing when a service request is made. For example, a POFOD of
0.001 would mean that 1 out of every 1000 service requests would result in a failure. We
have already mentioned that the reliability of a software product should be determined
through specific service invocations, rather than making the software run continuously.
Thus, the POFOD metric is very appropriate for software products that are not required to
run continuously.
6. Availability. Availability of a system is a measure of how likely the system is to be
available for use over a given period of time. This metric not only considers the number
of failures occurring during a time interval, but also takes into account the repair time
(down time) of a system when a failure occurs. This metric is important for systems such
as telecommunication systems, operating systems, embedded controllers, etc.
which are supposed to be never down and where repair and restart time are significant
and loss of service during that time cannot be overlooked.
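The metrics above can be sketched in Python as follows. The function names and sample figures are hypothetical. The text gives no formula for availability, so the commonly used steady-state ratio MTTF / (MTTF + MTTR) is assumed here. The MTTF calculation averages the n - 1 gaps between n recorded failure instants.

```python
def rocof(num_failures, observation_time):
    """Rate of occurrence of failure: failures per unit of observation time."""
    return num_failures / observation_time


def mttf(failure_times):
    """Mean gap between successive failures, given n >= 2 failure instants
    measured in execution time (the clock is stopped during repair)."""
    n = len(failure_times)
    if n < 2:
        raise ValueError("need at least two failure instants")
    gaps = [failure_times[i + 1] - failure_times[i] for i in range(n - 1)]
    return sum(gaps) / (n - 1)


def mttr(repair_durations):
    """Average time taken to track down and fix the errors behind a failure."""
    return sum(repair_durations) / len(repair_durations)


def mtbf(mttf_value, mttr_value):
    """MTBF = MTTF + MTTR, as defined in the text."""
    return mttf_value + mttr_value


def pofod(failed_requests, total_requests):
    """Probability of failure on demand: fraction of service requests that fail."""
    return failed_requests / total_requests


def availability(mttf_value, mttr_value):
    """Assumed steady-state formula (not from the text): up time / total time."""
    return mttf_value / (mttf_value + mttr_value)


# Hypothetical failure log: failure instants and repair durations, in hours.
failure_instants = [100, 350, 700, 1000]
repairs = [2, 4, 6]

print("ROCOF:", rocof(len(failure_instants), 1000))
print("MTTF:", mttf(failure_instants))
print("MTTR:", mttr(repairs))
print("MTBF:", mtbf(mttf(failure_instants), mttr(repairs)))
print("POFOD:", pofod(1, 1000))
print("Availability:", availability(mttf(failure_instants), mttr(repairs)))
```

Keeping each metric as a separate function makes the difference in inputs visible: ROCOF and MTTF need timestamps, POFOD needs only request counts, and availability needs repair times as well.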
All the above reliability metrics suffer from several shortcomings as far as their use in
software reliability measurement is concerned. One of the reasons is that these metrics are
centred around the probability of occurrence of system failures, but take no account of the
consequences of failures. That is, these reliability metrics do not distinguish the relative
severity of different failures. Failures which are transient and whose consequences are not serious
are in practice of little concern in the operational use of a software product. These types of
failures are at most minor irritants. On the other hand, more severe types of failures may
render the system totally unusable. In order to estimate the reliability of a software product
more accurately, it is necessary to classify various types of failures. Please note that the
different classes of failures may not be mutually exclusive. The classification is based on a widely
different set of criteria. As a result, a failure type can at the same time belong to more than
one class. A scheme of classification of failures is as follows:
1. Transient: Transient failures occur only for certain input values while invoking a function
of the system.
2. Permanent: Permanent failures occur for all input values while invoking a function of the
system.
3. Recoverable: When a recoverable failure occurs, the system can recover without having
to shut down and restart the system (with or without operator intervention).
4. Unrecoverable: In unrecoverable failures, the system may need to be restarted.
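Since the failure classes are not mutually exclusive, one way to record them is as combinable flags. The following Python sketch uses the standard enum.Flag type for the four classes listed above. The failure reports and the helper function are hypothetical illustrations.

```python
from enum import Flag, auto


class FailureClass(Flag):
    """Failure classes from the scheme above; members can be combined with |."""
    TRANSIENT = auto()
    PERMANENT = auto()
    RECOVERABLE = auto()
    UNRECOVERABLE = auto()


# Hypothetical failure reports, each tagged with every class that applies.
reports = {
    "search fails on empty query": FailureClass.TRANSIENT | FailureClass.RECOVERABLE,
    "delete member always hangs": FailureClass.PERMANENT | FailureClass.UNRECOVERABLE,
}


def failures_in_class(reports, failure_class):
    """Names of the reported failures that belong to the given class."""
    return sorted(name for name, classes in reports.items()
                  if failure_class in classes)


print(failures_in_class(reports, FailureClass.RECOVERABLE))
print(failures_in_class(reports, FailureClass.PERMANENT))
```

A flag type is used rather than a plain enumeration so that a single failure can carry several classes at once, which mirrors the remark that a failure type can belong to more than one class.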
11.1.3. Reliability Growth Modelling
A reliability growth model is a mathematical model of how software reliability improves as
errors