Software Reliability
and
Maintenance
26/11/24
Organization
Introduction
Reliability metrics
Reliability growth modelling
Software Maintenance and Support
Summary
Introduction
Reliability of a software product:
Denotes its trustworthiness or
dependability
Can be defined as the probability of the
product working correctly over a given
period of time.
Users not only want highly reliable
products:
they also want a quantitative estimate of
reliability before making a buying
decision.
Introduction
Accurate measurement of software
reliability:
a very difficult problem
Several factors contribute to making
measurement of software reliability
difficult.
Major Problems in
Reliability Measurements
Errors do not cause failures
at the same frequency and
severity.
Measuring latent errors alone is
not enough.
The failure rate is observer-
dependent
Software Reliability
Intuitively:
a software product having a
large number of defects is
unreliable.
It is also clear:
reliability of a system
improves if the number of
defects is reduced.
Difficulties in Software
Reliability Measurement
(1)
No simple relationship between:
observed system reliability
and the number of latent
software defects.
Removing errors from parts of
software which are rarely used:
makes little difference to the
perceived reliability.
The 90-10 Rule
Experiments from analysis of
behavior of a large number of
programs:
90% of the total execution time is
spent in executing only 10% of
the instructions in the program.
The most used 10%
instructions:
called the core of the program.
Effect of 90-10 Rule on
Software Reliability
The least used 90% of statements,
called the non-core, are executed only
during 10% of the total execution
time.
It may not be very surprising then:
removing 60% of the defects from the least
used parts would lead to only about
a 3% improvement in product
reliability.
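The effect can be illustrated with a toy calculation. This is only a sketch under stated assumptions: defects are spread evenly over the code (so 10 of 100 hypothetical defects sit in the core), and each defect contributes to the failure rate in proportion to how often its code runs. The function names and numbers are illustrative, and the 3% figure above is not derived from this model.

```python
def relative_failure_rate(core_defects, noncore_defects,
                          core_time=0.9, core_size=0.1):
    """Toy failure rate: each defect is weighted by how often its code runs."""
    core_weight = core_time / core_size                  # execution share per unit of core code
    noncore_weight = (1 - core_time) / (1 - core_size)   # same for non-core code
    return core_defects * core_weight + noncore_defects * noncore_weight


def improvement(before, after):
    """Fractional reduction in the failure rate."""
    return (before - after) / before


baseline = relative_failure_rate(10, 90)
fix_noncore = relative_failure_rate(10, 36)  # 60% of the 90 non-core defects removed
fix_core = relative_failure_rate(4, 90)      # 60% of the 10 core defects removed

print(f"non-core fixes: {improvement(baseline, fix_noncore):.0%}")
print(f"core fixes:     {improvement(baseline, fix_core):.0%}")
```

Under these assumptions, removing 54 non-core defects improves the failure rate by only a few percent, while removing 6 core defects improves it by about half.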
Difficulty in Software
Reliability Measurement
The reliability improvement
from correcting a single
error
depends on whether the error
belongs to the core or the
non-core part of the program.
Difficulty in Software
Reliability Measurement
(2)
The perceived reliability
depends to a large extent
upon
how the product is used,
or in technical terms, on its
operational profile.
Effect of Operational Profile on
Software Reliability
Measurement
If we select input data such that
only “correctly”
implemented functions
are executed:
none of the errors will be
exposed, and the
perceived reliability of the
product will be high.
Effect of Operational Profile on
Software Reliability
Measurement
On the other hand, if we
select the input data:
such that only functions
containing errors are invoked,
perceived reliability of the
system will be low.
Software Reliability
Different users use a software
product in different ways.
Defects which show up for one user
may not show up for another.
Reliability of a software product:
is clearly observer-dependent and
cannot be determined absolutely.
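A small sketch of this observer dependence, assuming a hypothetical product with three functions, only one of which is faulty. The function names, per-request failure probabilities, and usage counts are invented for illustration.

```python
def perceived_failure_probability(profile, failure_prob):
    """Probability that a randomly chosen request fails under a usage profile.

    profile: function name -> how often the user invokes it (any weights)
    failure_prob: function name -> probability that one invocation fails
    """
    total = sum(profile.values())
    return sum(weight / total * failure_prob[name]
               for name, weight in profile.items())


# Hypothetical product: only "export" contains a defect.
failure_prob = {"search": 0.0, "report": 0.0, "export": 0.2}

user_a = {"search": 90, "report": 10, "export": 0}   # never touches the faulty function
user_b = {"search": 10, "report": 10, "export": 80}  # mostly uses the faulty function

print(perceived_failure_probability(user_a, failure_prob))
print(perceived_failure_probability(user_b, failure_prob))
```

The same product looks perfectly reliable to the first user and fails on roughly one request in six for the second.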
Difficulty in Software
Reliability Measurement
(3)
Software reliability keeps
changing throughout the
life of the product:
it changes each time an error is detected
and corrected.
Hardware vs. Software
Reliability
Hardware failures:
inherently different from software
failures.
Most hardware failures are due
to component wear and tear:
some component no longer
functions as specified.
Hardware vs. Software
Reliability
A logic gate can be stuck at
1 or 0,
or a resistor might short
circuit.
To fix hardware faults:
replace or repair the failed
part.
Hardware vs. Software
Reliability
Software faults are latent:
the system will continue to fail
unless changes are made to
the software design and
code.
Hardware vs. Software
Reliability
Because of the difference in the
effect of faults:
many metrics that are
appropriate for hardware
reliability measurement
are not good software reliability
metrics.
Hardware vs. Software
Reliability
When hardware is
repaired:
its reliability is maintained.
When software is repaired:
its reliability may increase or
decrease.
Hardware vs. Software
Reliability
Goal of hardware reliability
study:
stability (i.e. interfailure times
remain constant)
Goal of software reliability
study:
reliability growth (i.e.
interfailure times increase)
[Figure: bathtub curve for hardware reliability]
[Figure: bathtub curve for software reliability]
Reliability Metrics
Different categories of software
products have different
reliability requirements:
level of reliability required for a
software product should be
specified in the SRS document.
Reliability Metrics
A good reliability measure
should be observer-
independent,
so that different people can
agree on the reliability.
Rate of occurrence of
failure (ROCOF):
ROCOF measures:
the frequency of occurrence of
failures.
To obtain it, observe the behavior of a
software product in operation
over a specified time interval and
calculate the total number of
failures during the interval.
ROCOF is this count divided by the length of the interval.
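As a minimal sketch, ROCOF can be computed from a list of failure timestamps. The function name, the half-open interval convention, and the sample data are assumptions made for illustration.

```python
def rocof(failure_times, start, end):
    """Failures per unit time observed in the interval [start, end)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    count = sum(1 for t in failure_times if start <= t < end)
    return count / (end - start)


# Hypothetical failure log: hours of operation at which failures were seen.
failures = [12, 40, 95, 130, 180]
print(rocof(failures, 0, 200))  # 5 failures in 200 hours
```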
Mean Time To Failure
(MTTF)
Average time between two
successive failures:
observed over a large number
of failures.
Mean Time To Failure
(MTTF)
MTTF is not as appropriate for
software as for hardware:
Hardware fails due to a
component’s wear and tear,
so MTTF indicates how frequently the
component fails.
When a software error is detected
and repaired,
the same error never appears again.
Mean Time To Failure
(MTTF)
We can record failure data
for n failures:
let the failure times be t_1, t_2, …, t_n
calculate the gaps (t_{i+1} - t_i)
the average value is the MTTF:
MTTF = \frac{1}{n-1} \sum_{i=1}^{n-1} (t_{i+1} - t_i)
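The averaging step can be sketched as follows. The function name and the sample failure times are illustrative assumptions.

```python
def mttf(failure_times):
    """Average gap between successive failures: sum of (t[i+1] - t[i]) over n - 1 gaps."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failure times")
    times = sorted(failure_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical failure times in hours: gaps are 25, 40 and 55.
print(mttf([10, 35, 75, 130]))
```

Because the gaps telescope, the result also equals (t_n - t_1) / (n - 1).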
Mean Time to Repair
(MTTR)
Once failure occurs:
additional time is lost to fix
faults
MTTR:
measures average time it
takes to fix faults.
Mean Time Between
Failures (MTBF)
We can combine MTTF and
MTTR
to get the MTBF metric:
MTBF = MTTF + MTTR
An MTBF of 100 hours would indicate that
once a failure occurs, the next
failure is expected after 100 hours
of clock time (not running time).
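A sketch of how the three quantities relate, assuming a hypothetical outage log of (failure time, restoration time) pairs in clock hours, and assuming MTTF is measured as running time from one restoration to the next failure.

```python
def mttr(outages):
    """Mean time spent repairing: restoration time minus failure time."""
    return sum(restored - failed for failed, restored in outages) / len(outages)


def mttf_from_outages(outages):
    """Mean running time from each restoration to the next failure."""
    up_times = [nxt[0] - cur[1] for cur, nxt in zip(outages, outages[1:])]
    return sum(up_times) / len(up_times)


# Hypothetical log: (failed_at, restored_at) in clock hours.
outages = [(100, 104), (200, 202), (290, 296)]

repair = mttr(outages)                # (4 + 2 + 6) / 3
running = mttf_from_outages(outages)  # (96 + 88) / 2
print("MTTR:", repair, "MTTF:", running, "MTBF:", running + repair)
```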
Probability of Failure on
Demand (POFOD)
Unlike other metrics
This metric does not explicitly involve
time.
Measures the likelihood of the
system failing:
when a service request is made.
POFOD of 0.001 means:
1 out of 1000 service requests may result
in a failure.
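A minimal sketch of estimating POFOD from request counts. The function name and the counts are assumed for illustration.

```python
def pofod(failed_requests, total_requests):
    """Fraction of service requests that ended in failure."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


# Hypothetical counts: 3 failures observed in 3000 service requests.
print(pofod(3, 3000))
```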
Availability
Measures how likely the system
shall be available for use over a
period of time:
considers the number of failures
occurring during a time interval,
also takes into account the repair
time (down time) of a system.
Availability
This metric is important for
systems like:
telecommunication systems,
operating systems, etc. which are
supposed to be never down
where repair and restart time are
significant and loss of service
during that time is important.
Reliability metrics
Failures which are transient and
whose consequences are not
serious:
of little practical importance in
the use of a software product.
such failures can at best be minor
irritants.
Reliability Metrics – contd.
Mean Time to Failure (MTTF):
average time between observed
failures (loosely called MTBF when repair time is ignored)
Availability = MTTF /
(MTTF + MTTR), i.e. MTTF / MTBF with MTBF = MTTF + MTTR
MTBF = Mean Time Between Failures
MTTR = Mean Time to Repair
Reliability = MTBF / (1+MTBF)
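A short sketch of the availability calculation, writing MTTF for the mean operating time between failures (the quantity this slide loosely calls MTBF). The function names, the sample values, and the 8760-hour year are assumptions.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def availability(mttf, mttr):
    """Fraction of time the system is up: up time / (up time + repair time)."""
    return mttf / (mttf + mttr)


def expected_downtime_per_year(mttf, mttr):
    """Hours per year the system is expected to be down."""
    return (1 - availability(mttf, mttr)) * HOURS_PER_YEAR


# Hypothetical system: runs 999 hours on average, then takes 1 hour to repair.
print(availability(999, 1))
print(expected_downtime_per_year(999, 1))
```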
Reliability metrics
All reliability metrics we
discussed:
centered around the probability
of system failures:
take no account of the
consequences of failures.
severity of failures may be very
different.
Failure Classes
More severe types of failures:
may render the system totally
unusable.
To accurately estimate reliability of
a software product:
it is necessary to classify different
types of failures.
Failure Classes
Transient:
Transient failures occur only for
certain inputs.
Permanent:
Permanent failures occur for all input
values.
Recoverable:
When recoverable failures occur:
the system recovers with or without
operator intervention.
Failure Classes
Unrecoverable:
the system may have to be restarted.
Cosmetic:
These failures just cause minor
irritations,
do not lead to incorrect results.
An example of a cosmetic failure:
mouse button has to be clicked twice
instead of once to invoke a GUI function.
Examples
Failure Class               Example                               Metric
Permanent, non-corrupting   ATM fails to operate with any card;   ROCOF = 0.0001
                            must restart to correct               Time unit = days
Transient, non-corrupting   Magnetic stripe can't be read on      POFOD = 0.0001
                            undamaged card                        Time unit = transactions
Reliability Growth
Modelling
A reliability growth model:
a model of how software reliability
grows
as errors are detected and repaired.
A reliability growth model can be
used to predict:
when (or if at all) a particular level of
reliability is likely to be attained.
i.e. how long to test the system?
Reliability Growth
Modelling
There are two main types of
uncertainty
in modelling reliability growth, which
render any reliability measurement
inaccurate.
Type 1 uncertainty:
our lack of knowledge about how the
system will be used, i.e.
its operational profile.
Reliability Growth
Modelling
Type 2 uncertainty:
reflects our lack of knowledge about the
effect of fault removal.
When we fix a fault:
we are not sure that the corrections are
complete and successful and that no other faults
are introduced.
Even if the faults are fixed properly:
we do not know how much the
interfailure time will improve.
Step Function Model
The simplest reliability growth
model:
a step function model
The basic assumption:
reliability increases by a constant
amount each time an error is
detected and repaired.
Step Function Model
[Figure: ROCOF plotted against time for the step function model]
Step Function Model
Assumes:
all errors contribute equally to
reliability growth
highly unrealistic:
we already know that different
errors contribute differently to
reliability growth.
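The step function assumption can be sketched in terms of ROCOF: every repair lowers the failure rate by the same fixed amount. The function name, the units, and the numbers are illustrative assumptions.

```python
def step_function_rocof(initial_rocof, step, repairs):
    """ROCOF after 0, 1, ..., `repairs` repairs when each repair removes `step`."""
    return [max(initial_rocof - step * k, 0.0) for k in range(repairs + 1)]


# Hypothetical values: 50 failures per 1000 hours initially, 10 removed per repair.
print(step_function_rocof(50, 10, 5))
```

Every error is credited with the same improvement here, which is exactly the assumption criticized above.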
Jelinski and Moranda Model
Recognizes that each time an error is
repaired,
reliability does not increase by a
constant amount.
Reliability improvement due to fixing
an error:
assumed to be proportional to the
number of errors present in the system
at that time.
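In the usual formulation of the Jelinski and Moranda model, the failure rate at any moment is proportional to the number of faults still in the system, so the expected time to the next failure is the reciprocal of that rate. The sketch below assumes this form. The symbols N (initial fault count) and phi (per-fault failure rate) and the sample values are not from the slides.

```python
def jm_failure_rates(total_faults, phi):
    """Failure rate before the 1st, 2nd, ... failure: phi * (faults remaining)."""
    return [phi * (total_faults - repaired) for repaired in range(total_faults)]


def jm_expected_interfailure_times(total_faults, phi):
    """Expected waiting time before each failure (mean of an exponential)."""
    return [1.0 / rate for rate in jm_failure_rates(total_faults, phi)]


# Hypothetical system: N = 5 initial faults, phi = 0.02 failures/hour per fault.
print(jm_failure_rates(5, 0.02))
print(jm_expected_interfailure_times(5, 0.02))
```

With these numbers the expected interfailure times are roughly 10, 12.5, 16.7, 25 and 50 hours: the gain per repair is small at first and largest for the last few faults.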
Jelinski and Moranda Model
Realistic for many applications,
still suffers from several
shortcomings.
Most probable failures (failure
types which occur frequently):
discovered early during the testing
process.
Jelinski and Moranda Model
Repairing faults discovered early:
contributes the most to reliability
growth.
The rate of reliability growth should therefore be
large initially
and slow down later on,
contrary to the assumption of the model.
Littlewood and Verrall’s
Model
Allows for negative reliability
growth:
when software repair introduces
further errors.
Models the fact that as errors are
repaired:
average improvement in reliability
per repair decreases.
Littlewood and Verrall’s
Model
Treats a corrected bug’s contribution
to reliability improvement
as an independent random variable having
a Gamma distribution.
Removes bugs with large
contributions to reliability
earlier than bugs with smaller
contributions.
This represents diminishing returns as testing
continues.
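One common way to write this model down is to draw the failure rate after the i-th repair from a Gamma distribution whose rate parameter psi(i) grows with i, and then draw the time to the next failure from an exponential with that failure rate. The sketch below assumes this form. The linear psi, the parameter values, and all names are illustrative assumptions.

```python
import random


def psi(i, beta0=1.0, beta1=0.5):
    """Hypothetical growth function: larger values push failure rates down."""
    return beta0 + beta1 * i


def expected_failure_rate(i, alpha=2.0):
    """Mean of a Gamma distribution with shape alpha and rate psi(i)."""
    return alpha / psi(i)


def sample_interfailure_time(i, alpha=2.0, rng=random):
    """Draw a failure rate from the Gamma, then a waiting time with that rate."""
    rate = rng.gammavariate(alpha, 1.0 / psi(i))  # gammavariate takes shape and scale
    return rng.expovariate(rate)


rng = random.Random(42)
for i in (1, 5, 10):
    print(i, expected_failure_rate(i), sample_interfailure_time(i, rng=rng))
```

The expected failure rate falls with every repair, but a single sampled rate can be higher than the previous one, which is how the model admits negative reliability growth.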
Reliability growth models
There are more complex
reliability growth models
that give more accurate approximations to
the reliability growth.
These models are out of the scope of
our discussion.
Applicability of Reliability Growth
Models
There is no universally
applicable reliability growth
model.
Reliability growth is not
independent of application.
Applicability of Reliability Growth
Models
Fit observed data to several
growth models.
Take the one that best fits the
data.
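The selection procedure can be sketched with two stand-in trend curves. A straight line and an exponential are used here only to keep the example short; they are not the growth models discussed above, and all names and data are assumptions. Each curve is fitted to the observed interfailure times by least squares, and the one with the smaller squared error is chosen.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_linear(xs, ys):
    a, b = fit_line(xs, ys)
    return lambda x: a + b * x


def fit_exponential(xs, ys):
    # Fits log(y) = a + b * x, so every y must be positive.
    a, b = fit_line(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(a + b * x)


def sum_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


CANDIDATES = {"linear": fit_linear, "exponential": fit_exponential}


def best_fit(interfailure_times, candidates=CANDIDATES):
    """Return the name of the best-fitting candidate and all error scores."""
    xs = list(range(1, len(interfailure_times) + 1))
    errors = {name: sum_squared_error(fit(xs, interfailure_times),
                                      xs, interfailure_times)
              for name, fit in candidates.items()}
    return min(errors, key=errors.get), errors


# Hypothetical interfailure times in hours.
name, errors = best_fit([12, 15, 21, 30, 44, 61])
print(name, errors)
```

In practice the candidates would be actual reliability growth models, and a goodness-of-fit measure suited to failure data might replace the plain squared error.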
Software Maintenance
Types:
Corrective: necessary to
overcome the failures observed
while the system is in use.
Adaptive: needed for the system to run on a new
platform, OS, hardware interface, etc.
Perfective: changes functionality
as per customer demand or to support
new features.
Characteristics of Software
Evolution
Lehman and Belady studied the
characteristics of several software
products and developed the following laws:
1st law: A software product must change
continually or become progressively less
useful.
2nd law: The structure of a program tends to
degrade as more and more maintenance
is carried out.
3rd law: The rate at which code is written or
modified is approximately the same
during development and maintenance.
[Figure: software maintenance process model 1]
[Figure: software maintenance process model 2]
Empirical Estimation of Maintenance
Cost
Estimation of Maintenance
Cost
Annual Change Traffic (ACT)
[Boehm, 1981]:
Fraction of a software product’s source
instructions which undergo change during a
typical year, either through addition or
deletion.
ACT = (KLOC_added + KLOC_deleted) / KLOC_total
Maintenance cost = ACT ×
Development cost
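A worked sketch of the estimate. The function names, the KLOC figures, and the development cost are hypothetical.

```python
def annual_change_traffic(kloc_added, kloc_deleted, kloc_total):
    """Fraction of the source that changes in a typical year."""
    if kloc_total <= 0:
        raise ValueError("kloc_total must be positive")
    return (kloc_added + kloc_deleted) / kloc_total


def maintenance_cost(act, development_cost):
    """Maintenance cost estimate: ACT times the development cost."""
    return act * development_cost


# Hypothetical product: 100 KLOC, with 10 KLOC added and 5 KLOC deleted per year.
act = annual_change_traffic(10, 5, 100)
print(act)                             # fraction of code changed per year
print(maintenance_cost(act, 200_000))  # in the same currency as development cost
```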
Software Maintenance and
Support
Both follow ITIL (Information
Technology Infrastructure Library).
ITIL: a set of practices for
information technology service
management (ITSM).
It aligns IT services with the business
needs of the organization.
Software Maintenance and
Support
Maintenance is a part of the SDLC.
Maintenance originates from a Change
Request.
It is directly linked to changes to the application.
Support is directly linked to serving
(supporting) customers so they can do their jobs.
Level 1: interface points
Level 2: quick fixes / minor enhancements
Focus on corrective modifications and
changes
Reference
Software Engineering
Dr. R Mall