Software Reliability
Theory and Practice
Outline of the Chapter
What is Reliability?
Definitions of Software Reliability
Factors Influencing Software Reliability
Applications of Software Reliability
Operational Profiles
Reliability Models
Summary
What is Reliability?
Reliability is a broad concept.
It is applied whenever we expect something
to behave in a certain way.
Reliability is one of the metrics that are
used to measure quality.
It is a user-oriented quality factor relating
to system operation.
Intuitively, if the users of a system rarely
experience failure, the system is
considered to be more reliable than one
that fails more often.
A system without faults is considered to be
highly reliable.
Constructing a correct system is a difficult
task.
Even an incorrect system may be
considered to be reliable if the frequency of
failure is “acceptable.”
Key concepts in discussing reliability:
Fault
Failure
Time
Three kinds of time intervals: MTTR, MTTF,
MTBF
Failure
A failure is said to occur if
the observable outcome
of a program execution
is different from the
expected outcome.
Fault
The adjudged cause of
failure is called a fault.
Example: A failure may
be caused by a defective
block of code.
Time
Time is a key concept in
the formulation of
reliability. If the time gap
between two successive
failures is short, we say
that the system is less
reliable.
Two forms of time are
considered.
Execution time (τ)
Calendar time (t)
MTTF: Mean Time To Failure
MTTR: Mean Time To Repair
MTBF: Mean Time Between Failures (= MTTF + MTTR)
Mean Time To Failure (MTTF)
The average time between two
successive failures, observed over
a large number of failures.
Mean Time To Repair (MTTR)
Once a failure occurs, additional
time is lost to fix faults.
MTTR measures the average time
it takes to fix faults.
Mean Time Between Failures (MTBF)
We can combine MTTF and MTTR
to get an availability metric:
MTBF = MTTF + MTTR
An MTBF of 100 hours indicates that,
once a failure occurs, the next failure
is expected after 100 hours of clock
time.
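The three intervals can be computed from a simple failure log. The following is a minimal sketch, assuming hypothetical observations in which each failure is preceded by a period of operation and followed by a repair; the function name `summarize_intervals` and all numbers are illustrative and not taken from the slides.

```python
from statistics import mean

def summarize_intervals(uptimes_h, repairs_h):
    """Return (MTTF, MTTR, MTBF) in hours from observed intervals.

    uptimes_h: hours of operation preceding each failure
    repairs_h: hours spent repairing after each failure
    """
    mttf = mean(uptimes_h)   # Mean Time To Failure
    mttr = mean(repairs_h)   # Mean Time To Repair
    mtbf = mttf + mttr       # Mean Time Between Failures
    return mttf, mttr, mtbf

# Hypothetical data: four failures observed during operation.
uptimes_h = [92.0, 102.0, 95.0, 99.0]
repairs_h = [2.0, 4.0, 3.0, 3.0]

mttf, mttr, mtbf = summarize_intervals(uptimes_h, repairs_h)
print(f"MTTF={mttf} h, MTTR={mttr} h, MTBF={mtbf} h")
```

With these assumed numbers the MTBF works out to 100 hours, matching the interpretation given above.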
Two ways to measure
reliability
Counting failures in
periodic intervals
Observe the trend of the
cumulative failure
count, µ(τ).
Failure intensity
Observe the trend of the
number of failures
per unit time, λ(τ).
µ(τ)
This denotes the total
number of failures
observed until execution
time τ from the beginning
of system execution.
λ(τ)
This denotes the number
of failures observed per
unit time after τ time units
of executing the system
from the beginning. This is
also called the failure
intensity at time τ.
Relationship between λ(τ) and µ(τ): λ(τ) = dµ(τ)/dτ
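Both measures can be estimated directly from a failure log. The sketch below assumes a hypothetical, sorted list of execution times at which failures were observed; the helper names and the windowed estimate of failure intensity are illustrative choices, not something prescribed by the slides.

```python
from bisect import bisect_right

# Hypothetical execution times (e.g. CPU hours) at which failures occurred.
# The list must be sorted in increasing order.
failure_times = [1.0, 2.5, 3.0, 6.0, 10.0, 17.0, 30.0]

def cumulative_failures(times, tau):
    """mu(tau): number of failures observed up to execution time tau."""
    return bisect_right(times, tau)

def failure_intensity(times, tau, window):
    """Estimate lambda(tau) as failures per unit time over (tau - window, tau]."""
    count = cumulative_failures(times, tau) - cumulative_failures(times, tau - window)
    return count / window

print(cumulative_failures(failure_times, 10.0))       # failures seen by tau = 10
print(failure_intensity(failure_times, 10.0, 5.0))    # intensity around tau = 10
print(failure_intensity(failure_times, 30.0, 10.0))   # intensity around tau = 30
```

In this assumed log the cumulative count keeps rising while the estimated intensity falls, which is the trend one hopes to see as faults are fixed.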
Definitions of Software Reliability
First definition
Software reliability is defined
as the probability of failure-
free operation of a software
system for a specified time in
a specified environment.
Key elements of the
above definition
Probability of failure-
free operation
Length of time of
failure-free operation
A given execution
environment
Example
The probability that a
PC in a store is up
and running for eight
hours without a crash
is 0.99.
Second definition
Failure intensity is a measure
of the reliability of a software
system operating in a given
environment.
Example: An air traffic
control system fails once
in two years.
Comparing the two
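One common way to relate the two definitions is to assume that the failure intensity λ is constant over the interval of interest, in which case the probability of failure-free operation for a duration τ is R(τ) = e^(−λτ). This exponential assumption is not stated on the slides; the sketch below uses it only to show how the two example figures could be converted into each other's terms, and the function names are illustrative.

```python
import math

def reliability(failure_intensity, duration):
    """Probability of failure-free operation for `duration`,
    ASSUMING a constant failure intensity (exponential model)."""
    return math.exp(-failure_intensity * duration)

def intensity_from_reliability(reliability_value, duration):
    """Constant failure intensity implied by a given reliability,
    under the same exponential assumption."""
    return -math.log(reliability_value) / duration

# PC example: reliability 0.99 over 8 hours -> implied failures per hour.
pc_intensity = intensity_from_reliability(0.99, 8.0)
print(f"implied intensity: {pc_intensity:.6f} failures/hour")

# Air traffic control example: one failure in two years -> 0.5 failures/year.
atc_reliability_1y = reliability(0.5, 1.0)
print(f"failure-free probability over one year: {atc_reliability_1y:.4f}")
```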
Factors Influencing Software Reliability
A user's perception of
the reliability of a
software system depends upon
two categories of
information.
The number of faults
present in the
software.
The ways users
operate the system.
This is known as
the operational
profile.
The fault count in a
system is influenced by
the following.
Size and complexity
of code
Characteristics of
the development process
Applications of Software Reliability
Comparison of software
engineering technologies
What is the cost of
adopting a technology?
What is the return from the
technology -- in terms of
cost and quality?
Measuring the progress of
system testing
Key question: How much
testing has been done?
The failure intensity
measure tells us about the
present quality of the
system: high intensity
means more tests are to
be performed.
Controlling the system in
operation
The amount of change to a
software for maintenance
affects its reliability. Thus
the amount of change to
be effected in one go is
determined by how much
reliability we are ready to
potentially lose.
Operational Profiles
Developed at AT&T Bell Labs.
An OP describes how actual
users operate a system.
An OP is a quantitative
characterization of how a
system will be used.
Two ways to represent
operational profiles
Tabular
Graphical
Table 15.1: An example of
operational profile of a library
information system.
Figure 15.2: Graphical representation of
operational profile of a library information
system.
Use of operational profiles
For accurate
estimation of the
reliability of a system,
test the system in the
same way it will be
actually used in the
field.
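A tabular operational profile can drive test selection directly. The sketch below assumes a hypothetical profile for a library information system; the operation names and probabilities are invented for illustration and are not the values of Table 15.1.

```python
import random

# Hypothetical operational profile: operation -> probability of occurrence.
operational_profile = {
    "search_catalog": 0.50,
    "check_out_book": 0.25,
    "return_book": 0.20,
    "renew_book": 0.05,
}

def allocate_tests(profile, total_tests):
    """Split a test budget across operations in proportion to usage.

    Rounding means the parts may not sum exactly to total_tests
    for arbitrary profiles.
    """
    return {op: round(p * total_tests) for op, p in profile.items()}

def sample_operations(profile, n, seed=0):
    """Draw a random sequence of n operations that follows the profile."""
    rng = random.Random(seed)
    ops = list(profile)
    weights = [profile[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)

print(allocate_tests(operational_profile, 200))
print(sample_operations(operational_profile, 10))
```

Testing with such a usage-weighted sequence exercises the system roughly the way field users would, so the failure intensity observed in test is a more credible estimate of what users will experience.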
Other uses of operational
profiles
Use an OP as a
guiding document in
designing user
interfaces.
The more frequently
used operations
should be easy to
use.
Use an OP to design
an early version of a
software for release.
This contains the
more frequently
Reliability Models
Main idea
We develop
mathematical
models for λ(τ)
and µ(τ).
Basic assumptions in
developing a
reliability model
Faults in the
program are
independent.
Execution time
between failures is
large w.r.t.
instruction
execution time.
Potential test
space covers its
use space.
The set of inputs
Intuitive idea
As we observe another
system failure and the
corresponding fault is
fixed, fewer faults remain
in the system, and the
failure intensity becomes
smaller with each fault
fixed.
In other words, as the
cumulative failure count
increases, the failure
intensity decreases.
Two decrement processes
Decrement process 1
The decrease in failure
intensity after
observing a failure and
fixing the
corresponding fault is
constant.
This gives us the
Basic model.
Decrement process 2
The decrease in failure
intensity after
Figure 15.3: Failure intensity λ as a function of
cumulative failures µ.
Parameters of the models
λ₀: The initial failure intensity observed at the
beginning of system testing.
v₀: The total number of system failures that we
expect to observe over infinite time, starting from
the beginning of system testing.
θ: A parameter representing the non-linear drop in
failure intensity in the Logarithmic model.
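Figure 15.4: Failure intensity λ as a function of
execution time τ (λ₀ = 9 failures/unit time, v₀ =
500 failures, θ = 0.0075).
Basic model
Assumption: $\lambda(\mu) = \lambda_0 (1 - \mu / v_0)$
$d\mu(\tau)/d\tau = \lambda_0 (1 - \mu(\tau) / v_0)$
$\mu(\tau) = v_0 (1 - e^{-\lambda_0 \tau / v_0})$
$\lambda(\tau) = \lambda_0 e^{-\lambda_0 \tau / v_0}$
Logarithmic model
Assumption: $\lambda(\mu) = \lambda_0 e^{-\theta \mu}$
$d\mu(\tau)/d\tau = \lambda_0 e^{-\theta \mu(\tau)}$
$\mu(\tau) = \ln(\lambda_0 \theta \tau + 1) / \theta$
$\lambda(\tau) = \lambda_0 / (\lambda_0 \theta \tau + 1)$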
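The two models can be evaluated numerically. The sketch below uses the standard closed forms of the Basic model, µ(τ) = v₀(1 − e^(−λ₀τ/v₀)) and λ(τ) = λ₀e^(−λ₀τ/v₀), and of the Logarithmic model, µ(τ) = ln(λ₀θτ + 1)/θ and λ(τ) = λ₀/(λ₀θτ + 1), with the parameter values quoted in the Figure 15.4 caption. The function and constant names are illustrative.

```python
import math

LAMBDA0 = 9.0    # initial failure intensity (failures per unit time)
V0 = 500.0       # total expected failures over infinite time (Basic model)
THETA = 0.0075   # failure intensity decay parameter (Logarithmic model)

def basic_mu(tau, lam0=LAMBDA0, v0=V0):
    """Basic model: expected cumulative failures at execution time tau."""
    return v0 * (1.0 - math.exp(-lam0 * tau / v0))

def basic_lambda(tau, lam0=LAMBDA0, v0=V0):
    """Basic model: failure intensity at execution time tau."""
    return lam0 * math.exp(-lam0 * tau / v0)

def log_mu(tau, lam0=LAMBDA0, theta=THETA):
    """Logarithmic model: expected cumulative failures at execution time tau."""
    return math.log(lam0 * theta * tau + 1.0) / theta

def log_lambda(tau, lam0=LAMBDA0, theta=THETA):
    """Logarithmic model: failure intensity at execution time tau."""
    return lam0 / (lam0 * theta * tau + 1.0)

for tau in (0.0, 50.0, 100.0, 200.0):
    print(f"tau={tau:6.1f}  "
          f"basic: mu={basic_mu(tau):7.2f} lambda={basic_lambda(tau):5.2f}  "
          f"log: mu={log_mu(tau):7.2f} lambda={log_lambda(tau):5.2f}")
```

Both models start at λ₀. In the Basic model the cumulative failure count approaches v₀, whereas in the Logarithmic model it keeps growing, ever more slowly.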
Figure 15.4: Cumulative failure µ as a function of
execution time τ.
Summary
Reliability is a user-oriented quality factor relating to
system operation.
The chapter introduced the following.
Fault and failure
Execution and calendar time
Time interval between failures
Failures in periodic intervals
Failure intensity
Software reliability was defined in two ways.
The probability of failure-free operation of a system
for a specified time in a given environment.
Failure intensity is a measure of reliability.
User's perception of reliability:
The number of faults in a system.
How a user operates a system.
The number of faults in a system is influenced by the
following:
Size and complexity of code.
Development process.
Personnel quality.
Operational environment.
Operational profile
A quantitative characterization of how actual users
operate a system.
Tabular and graphical representation
Applications of reliability metric
Reliability models
Six assumptions
Two models
Basic
Logarithmic
Questions?