Unit4 Reliability Evaluation

Uploaded by

Hari Jaya

Reliability and Evaluation

 Reliability: a measure of the success with which a system conforms to
some authoritative specification of its behaviour
 As with reliability, to ensure that the safety requirements of an embedded
system are met, system safety analysis must be performed throughout all
stages of its development life cycle
 When the behaviour of a system deviates from that which is specified for
it, this is called a failure
 Failures result from unexpected problems internal to the system that
eventually manifest themselves in the system's external behaviour
 These problems are called errors and their mechanical or algorithmic
causes are termed faults
 Systems are composed of components which are themselves
systems; hence the failure of one component is a fault in the system that
contains it:
    failure -> fault -> error -> failure -> fault
 A transient (temporary) fault starts at a particular time, remains in the
system for some period and then disappears
 E.g. hardware components which have an adverse reaction to
radioactivity
 Many faults in communication systems are transient
 Permanent faults remain in the system until they are repaired; e.g., a
broken wire or a software design error
 Intermittent faults are transient faults that occur from time to time
 E.g. a hardware component that is heat sensitive: it works for a time,
stops working, cools down and then starts to work again
Approaches to Achieving Reliable Systems
 Fault prevention attempts to eliminate any possibility of faults
creeping into a system before it goes operational
 Fault tolerance enables a system to continue functioning even in the
presence of faults
 Both approaches attempt to produce systems which have well-defined
failure modes
Fault Prevention
 Two stages: fault avoidance and fault removal
 Fault avoidance attempts to limit the introduction of faults during
system construction by:
 use of the most reliable components within the given cost and
performance constraints
 use of thoroughly-refined techniques for interconnection of
components and assembly of subsystems
 use of proven design methodologies
 use of software engineering environments to help manipulate
software components and thereby manage complexity
Fault Removal
 In spite of fault avoidance, design errors (hardware and software) will exist
 Fault removal: procedures for finding and removing the causes of
errors;
 e.g. design reviews, program verification, code inspections and
system testing
 System testing can never be exhaustive and remove all potential
faults
 A test can only be used to show the presence of faults, not their
absence
 Most tests are done with the system in simulation mode and it
is difficult to guarantee that the simulation is accurate
 Requirements errors during the system's development may not
manifest themselves until the system goes operational
Failure of Fault Prevention Approach
 In spite of all the testing and verification techniques, hardware
components will fail; the fault prevention approach will therefore be
unsuccessful when
 either the frequency or the duration of repairs is
unacceptable, or
 the system is inaccessible for maintenance and repair activities
 The alternative is fault tolerance
Levels of Fault Tolerance
 Full Fault Tolerance — the system continues to operate in the
presence of faults, although for a limited period, with no significant
loss of functionality or performance
 Graceful Degradation (fail soft) — the system continues to operate in
the presence of errors, accepting a partial degradation of functionality
or performance during recovery or repair
 Fail Safe — the system maintains its integrity while accepting a
temporary halt in its operation
 The level required will depend on the application
 Most safety-critical systems require full fault tolerance; in practice,
however, many settle for graceful degradation

A fundamental way of improving the reliability of software systems depends
on the principle of design diversity, where different versions of the functions
are implemented. In order to prevent software failure caused by unpredicted
conditions, different programs (alternative programs) are developed separately,
preferably based on different programming logic, algorithms, computer
languages, etc. This diversity is normally applied in the form of recovery
blocks or N-version programming.
Fault-tolerant software assures system reliability by using protective
redundancy at the software level. There are two basic techniques for obtaining
fault-tolerant software: the recovery block (RB) scheme and N-version
programming (NVP). Both schemes are based on software redundancy.
1. Recovery Block Scheme
The recovery block scheme consists of three elements: a primary module,
acceptance tests, and alternate modules for a given task. The simplest scheme
of the recovery block is as follows:

    Ensure T
    By P
    Else by Q1
    Else by Q2
    ...
    Else by Qn-1
    Else Error

where T is an acceptance test condition that is expected to be met by successful
execution of either the primary module P or the alternate modules Q1, Q2, ...,
Qn-1.
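The control flow of a recovery block can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the names `recovery_block`, `RecoveryBlockError`, and the integer square root example modules are all hypothetical, and passing the same unmodified input to every module stands in for the state restoration a real recovery block would perform before trying an alternate.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RecoveryBlockError(Exception):
    """Raised when the primary and every alternate are rejected (hypothetical name)."""


def recovery_block(
    modules: Sequence[Callable[[T], R]],
    acceptance_test: Callable[[T, R], bool],
    state: T,
) -> R:
    """Try each module in order and return the first result that passes the test.

    modules[0] plays the role of the primary P; the rest are the alternates.
    """
    for module in modules:
        try:
            result = module(state)
        except Exception:
            continue  # a crash is treated like a rejected result
        if acceptance_test(state, result):
            return result
    raise RecoveryBlockError("all modules failed the acceptance test")


# Hypothetical example: integer square root with a deliberately faulty primary.
def faulty_isqrt(x: int) -> int:
    return x // 2  # wrong for most inputs


def slow_isqrt(x: int) -> int:
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r


def isqrt_ok(x: int, r: int) -> bool:
    # Acceptance test T: r is the largest integer whose square does not exceed x.
    return r >= 0 and r * r <= x < (r + 1) * (r + 1)


result = recovery_block([faulty_isqrt, slow_isqrt], isqrt_ok, 17)
```

Here the primary's answer is rejected by the acceptance test, so the alternate is tried and its answer is returned. The quality of the whole scheme depends on the acceptance test: a test that accepts wrong results lets faults through, while one that rejects correct results wastes the alternates.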

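The probability of failure of the RB scheme, $P_{RB}$, is as follows:

$$P_{RB} = \prod_{i=1}^{n} (e_i + t_{2i}) + \sum_{i=1}^{n} t_{1i}\, e_i \prod_{j=1}^{i-1} (e_j + t_{2j})$$

where

$e_i$ = probability of failure for version $P_i$
$t_{1i}$ = probability that acceptance test i judges an incorrect result as
correct
$t_{2i}$ = probability that acceptance test i judges a correct result as
incorrect.

The first term is the probability that every version is rejected by the
acceptance test. The second term is the probability that an incorrect result
is accepted at the i-th trial.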
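As a rough numerical illustration, the sketch below evaluates the RB failure probability under one commonly quoted approximation: every version is rejected, with rejection probability taken as e_i + t2_i, or an incorrect result is accepted at trial i after all earlier trials were rejected. The function name and the parameter names `e`, `t1`, and `t2` (the three probabilities defined above, per version) are assumptions made for this example, as are the input values.

```python
from math import prod
from typing import Sequence


def rb_failure_probability(
    e: Sequence[float], t1: Sequence[float], t2: Sequence[float]
) -> float:
    """Approximate failure probability of a recovery block with n versions.

    e[i]  : probability that version i produces an incorrect result
    t1[i] : probability that test i accepts an incorrect result
    t2[i] : probability that test i rejects a correct result
    """
    n = len(e)
    if not (len(t1) == len(t2) == n):
        raise ValueError("e, t1 and t2 must have the same length")

    # Approximate probability that trial i ends in a rejection.
    reject = [e[i] + t2[i] for i in range(n)]

    # Term 1: every trial is rejected, so no result is delivered.
    all_rejected = prod(reject)

    # Term 2: trials 0..i-1 are rejected, then a wrong result is accepted at i.
    wrong_accepted = sum(t1[i] * e[i] * prod(reject[:i]) for i in range(n))

    return all_rejected + wrong_accepted


# Assumed example values: a primary and one alternate.
p_rb = rb_failure_probability(e=[0.1, 0.2], t1=[0.01, 0.01], t2=[0.02, 0.02])
```

With these assumed inputs the first term is 0.12 × 0.22 = 0.0264 and the second is 0.001 + 0.00024 = 0.00124, giving about 0.0276.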
2. N-version Programming

NVP is used for providing fault tolerance in software. In concept, the NVP
scheme is similar to the N-modular redundancy scheme used to provide tolerance
against hardware faults.
NVP is defined as the independent generation of N ≥ 2 functionally
equivalent programs, called versions, from the same initial specification.
Independent generation of programs means that the programming efforts are
carried out by N individuals or groups that do not interact with respect to the
programming process.
The N alternative programs are usually executed simultaneously and their
results are sent to a decision mechanism which selects the final result.
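One possible decision mechanism is a strict majority vote, sketched below. The text does not say which decision algorithm is used, so majority voting is an assumption here, as are the names `nvp_execute` and `NoMajorityError`. For simplicity the versions run one after another rather than simultaneously, and results must be hashable so they can be counted.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


class NoMajorityError(Exception):
    """Raised when no result is shared by a strict majority (hypothetical name)."""


def nvp_execute(versions: Sequence[Callable[[T], Hashable]], x: T) -> Hashable:
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version contributes no vote

    if results:
        value, votes = Counter(results).most_common(1)[0]
        # Require agreement among more than half of all N versions.
        if votes > len(versions) / 2:
            return value
    raise NoMajorityError("versions did not reach a majority")


# Hypothetical versions of "square a number"; the third one is faulty.
versions = [lambda v: v * v, lambda v: v ** 2, lambda v: v + v]
answer = nvp_execute(versions, 3)
```

Two of the three versions agree on 9, so the faulty third version is outvoted. If every version returned a different value, the voter would raise `NoMajorityError` instead of guessing.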

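The probability of failure of the NVP scheme, $P_n$, can be expressed as

$$P_n = \prod_{i=1}^{n} e_i + \sum_{i=1}^{n} (1 - e_i) \prod_{j \ne i} e_j + d$$

where $e_i$ is the probability of failure for version i and the versions are
assumed to fail independently. The first term of this equation is the
probability that all versions fail. The second term is the probability that
only one version is correct. The third term, d, is the probability that there
are at least two correct results but the decision algorithm fails to deliver
the correct result.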
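The three terms can be evaluated numerically as in the sketch below. It assumes that the versions fail independently, with `e[i]` denoting the failure probability of version i; the function name, the parameter names, and the example values are all made up for illustration.

```python
from math import prod
from typing import Sequence


def nvp_failure_probability(e: Sequence[float], d: float = 0.0) -> float:
    """Failure probability of an N-version scheme with independent versions.

    e[i] : probability that version i produces an incorrect result
    d    : probability that at least two results are correct but the
           decision algorithm still fails to deliver the correct one
    """
    e = list(e)
    if len(e) < 2:
        raise ValueError("NVP needs at least two versions")

    # Term 1: every version fails.
    all_fail = prod(e)

    # Term 2: exactly one version (version i) is correct, all others fail.
    only_one_correct = sum(
        (1 - e[i]) * prod(e[:i] + e[i + 1:]) for i in range(len(e))
    )

    return all_fail + only_one_correct + d


# Assumed example values: three versions that each fail 10% of the time.
p_n = nvp_failure_probability([0.1, 0.1, 0.1], d=0.001)
```

For these assumed values the terms are 0.001, 3 × 0.9 × 0.01 = 0.027, and 0.001, so the scheme fails with probability of about 0.029, compared with 0.1 for a single version.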
