FAULTS IN RTOS
Real-time systems are systems in which there is a commitment to a timely response by the
computer to external stimuli. Real-time applications have to function correctly even in the
presence of faults. Fault tolerance can be achieved through hardware, software, or time
redundancy. Safety-critical applications have strict time and cost constraints, which means
that not only must faults be tolerated but the constraints must also be satisfied. Deadline
scheduling means that the task with the earliest required response time is processed first.
In hard real-time systems it is important that tasks complete within their deadlines even in
the presence of a failure. In soft real-time systems it is more important to detect a fault
economically and as soon as possible than to mask it. Fault tolerance is the ability of a
system to continue operating despite the failure of a limited subset of its hardware or
software. The goal of the system designer is therefore to ensure that the probability of
system failure is acceptably small. Either a hardware fault or a software fault can prevent a
real-time system from meeting its deadlines.
FAULT TYPES
There are three types of faults: permanent, intermittent, and transient. A permanent fault
does not die away with time, but remains until it is repaired or the affected unit is replaced.
An intermittent fault cycles between the fault-active and fault-benign states. A transient
fault dies away after some time.
(A permanent fault is one that continues to exist until the faulty component is repaired or
replaced. A transient fault occurs only once and cannot be traced later on; if the operation is
repeated, the fault does not reappear. An intermittent fault becomes apparent not
continuously but at irregular intervals.)
FAULT DETECTION
Fault detection can be done either online or offline. Online detection goes on in parallel
with normal system operation. Offline detection consists of running diagnostic tests.
ERROR DETECTION TECHNIQUES
In order to achieve fault tolerance, the first requirement is that transient faults be detected.
Several error-detection techniques exist for transient faults: watchdogs, duplication, and a
few others.
Watchdogs. With watchdogs, the program flow or transmitted data is periodically checked
for the presence of errors. The simplest watchdog scheme, the watchdog timer, monitors the
execution time of processes and checks whether it exceeds a certain limit.
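A minimal sketch of a software watchdog timer in C follows; the names kick_watchdog, watchdog_expired, and WDOG_LIMIT_MS, as well as the limit value, are illustrative assumptions and not part of any particular RTOS API.

#include <stdio.h>
#include <time.h>

#define WDOG_LIMIT_MS 100   /* assumed execution-time limit */

static struct timespec last_kick;

/* The monitored task calls this periodically to signal it is still alive. */
static void kick_watchdog(void)
{
    clock_gettime(CLOCK_MONOTONIC, &last_kick);
}

/* The watchdog checks whether the time since the last kick exceeds the limit. */
static int watchdog_expired(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long elapsed_ms = (now.tv_sec - last_kick.tv_sec) * 1000L +
                      (now.tv_nsec - last_kick.tv_nsec) / 1000000L;
    return elapsed_ms > WDOG_LIMIT_MS;
}

int main(void)
{
    kick_watchdog();                       /* task starts and kicks the watchdog */
    /* ... task work would happen here ... */
    if (watchdog_expired())
        printf("watchdog: execution time limit exceeded\n");
    return 0;
}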
Duplication. Duplication is an approach in which multiple processors are expected to
produce the same result; the results are compared, and a discrepancy indicates the
existence of a fault.
There are several other error-detection techniques, e.g. signatures and the widely used
parity-bit check.
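As an illustration of the parity-bit check, here is a minimal sketch in C of an even-parity check over one byte; the data value and the injected bit flip are only illustrative.

#include <stdint.h>
#include <stdio.h>

/* Even-parity check: the sender stores a parity bit so that the total number
 * of 1-bits is even; the receiver recomputes it, and a mismatch indicates a
 * (single) bit error. */
static int parity_bit(uint8_t data)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1;          /* count the 1-bits */
    return ones & 1;                      /* 1 if the count is odd */
}

int main(void)
{
    uint8_t data = 0x5A;                  /* illustrative data byte */
    int sent_parity = parity_bit(data);

    data ^= 0x04;                         /* simulate a single-bit transient fault */

    if (parity_bit(data) != sent_parity)
        printf("parity error detected\n");
    return 0;
}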
REDUNDANCY
If a fault-tolerant system is to keep running despite the failure of some of its parts, it must
have spare capacity to begin with.
There are two ways to make a system more resistant to faults.
Hardware: This technique relies on adding extra, redundant hardware to a system to make it
fault-tolerant.
Software: This technique relies on duplicating code, processes, or even messages, depending
on the context.
A typical example of where the above techniques are applied is the autopilot system on
board a large passenger aircraft.
A passenger aircraft typically has a central autopilot system with two backups. This is an
example of making a system fault tolerant by adding redundant hardware. The two extra
systems are not used unless the main system fails completely.
However, this is not sufficient: if the main system starts behaving erratically, the lives of
many people are in danger. The system is therefore also made resistant to faults using
software.
Generally, every process of the autopilot runs in more than two copies, distributed across
different computers. The system then votes on the results of these processes. To make the
system even more secure, some autopilots also employ the principle of design diversity:
not only is the software run in multiple copies, but each copy is written by a different
engineering team. The likelihood of the same mistake being made by different engineering
teams is very low.
However, such measures are applied only to highly critical systems. In general, hardware
redundancy is avoided as far as possible because of the limited resources available.
System weight, power consumption, and price constraints make it difficult to employ
heavy hardware redundancy to make a system fault tolerant. Software redundancy is
therefore more commonly used to increase the fault tolerance of systems.
There are a few factors that affect the diversity of the multiple versions. The first factor is
the requirements specification: a mistake in the specification causes a wrong output to be
delivered. A second factor is the programming language; the nature of the language greatly
affects the programming style.
A third factor is the numerical algorithms that are used. Algorithms implemented to a
finite precision can behave quite differently for certain sets of inputs than do theoretical
algorithms, which assume infinite precision.
A fourth factor is the nature of the tools that are being used: if the same tools are shared
across versions, the probability of common-mode failure might increase. A fifth factor is
the training and quality of the programmers and the management structure. A major
difficulty is that software development is labor-intensive.
FAULT TOLERANCE TECHNIQUES
1) TMR (Triple Modular Redundancy)
Multiple copies of a task are executed, and error checking is achieved by comparing the
results after completion. In this scheme, the overhead is always on the order of the number
of copies running simultaneously.
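A minimal sketch of the TMR idea in C follows; compute_copy and tmr_vote are illustrative names, and in a real system each copy would run on its own hardware module.

#include <stdio.h>

/* compute_copy() is a hypothetical stand-in for one redundant copy of the
 * computation. */
static int compute_copy(int copy_id, int input)
{
    (void)copy_id;
    return input + 1;                     /* placeholder computation */
}

/* Majority voter: returns the value agreed on by at least two copies and
 * flags any disagreement as a detected fault. */
static int tmr_vote(int a, int b, int c, int *fault)
{
    *fault = !(a == b && b == c);
    if (a == b || a == c)
        return a;                         /* a agrees with at least one copy */
    if (b == c)
        return b;
    return a;                             /* no majority: result is untrusted */
}

int main(void)
{
    int in = 41;
    int fault;
    int out = tmr_vote(compute_copy(0, in), compute_copy(1, in),
                       compute_copy(2, in), &fault);
    printf("voted result = %d, fault detected = %d\n", out, fault);
    return 0;
}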
2) PB (Primary/Backup)
The tasks are assumed to be periodic, and two instances of each task (a primary and a
backup) are scheduled on a uniprocessor system. One of the restrictions of this approach
is that the period of any task should be a multiple of the period of its preceding tasks. It
also assumes that the execution time of the backup is shorter than that of the primary.
PRIMARY BACKUP FAULT TOLERANCE
This is the traditional fault-tolerant approach, in which both time and space exclusion are
used. The main ideas behind this algorithm are that (a) the backup of a task need not
execute if its primary executes successfully, and (b) the time exclusion ensures that no
resource conflicts occur between the two versions of any task, which might improve
schedulability. The disadvantages of this scheme are that (a) there is no de-allocation of the
backup copy, (b) the algorithm assumes that the tasks are periodic (the times of the tasks are
predetermined), (c) the tasks must be compatible (the period of one process is an integral
multiple of the period of the other), and (d) the execution time of the backup must be
shorter than that of the primary process.
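A minimal sketch in C of idea (a) above, that the backup executes only when the primary fails; primary_task and backup_task are hypothetical placeholders.

#include <stdio.h>

/* Two versions of the same task; a return value of 0 means success. */
static int primary_task(void) { return 0; }
static int backup_task(void)  { return 0; }

int main(void)
{
    if (primary_task() != 0) {
        /* the primary failed, so the backup copy must execute */
        if (backup_task() != 0)
            printf("both copies of the task failed\n");
    }
    /* if the primary succeeds, the backup need not execute at all */
    return 0;
}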
FAULT TOLERANT DEADLINE SCHEDULING
I) Backup Overloading Scheduling Algorithm
The following steps form the procedure used to implement the backup overloading
algorithm.
a) Arriving task
A task has four properties when it arrives: arrival time (ai), ready time (ri), deadline (di),
and worst-case computation time (ci), represented as Ti = (ai, ri, di, ci).
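As an illustrative sketch, the task could be represented in C as follows; the type and field names are assumptions reused in the later sketches, not part of the original algorithm description.

typedef struct {
    double a;   /* arrival time ai */
    double r;   /* ready time ri: earliest time the task may start */
    double d;   /* deadline di */
    double c;   /* worst-case computation time ci */
} task_t;

/* example: task_t T1 = { 0.0, 1.0, 10.0, 3.0 }; */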
b) EDF schedulability
Check whether all the tasks can be scheduled successfully using the earliest deadline first
(EDF) algorithm. If the schedulability test fails, reject the task set as not schedulable.
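A minimal sketch of one possible EDF schedulability check, under the simplifying assumption that all tasks in the set are ready at the same time; task_t is the structure sketched in step a), and edf_schedulable is an illustrative name rather than the algorithm's actual test.

#include <stdlib.h>

static int by_deadline(const void *x, const void *y)
{
    const task_t *p = x, *q = y;
    return (p->d > q->d) - (p->d < q->d);   /* sort by increasing deadline */
}

/* Returns 1 if every task finishes by its deadline when the set is executed
 * in EDF (deadline) order, assuming all tasks are ready at time 0. */
int edf_schedulable(task_t *tasks, int n)
{
    qsort(tasks, n, sizeof(task_t), by_deadline);
    double t = 0.0;                          /* accumulated completion time */
    for (int i = 0; i < n; i++) {
        t += tasks[i].c;
        if (t > tasks[i].d)
            return 0;                        /* this task would miss its deadline */
    }
    return 1;
}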
c) Searching for timeslot
When task Ti arrives, check each processor to find if the primary copy (Pri) of the task can
be scheduled between ri and di. Say it is scheduled on processor Pi.
d) Try overloading
Try to overload the backup copy (Bki) onto an existing backup slot on any processor other
than Pi. Note: the backups of two primary tasks that are scheduled on the same processor
must not overlap. If that processor fails, it would not be possible to execute the two backups
simultaneously, since they would occupy the same time slot (overloaded).
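A minimal sketch of this overloading rule in C; slot_t, its fields, and may_overload are illustrative assumptions.

/* slot_t describes an already-scheduled backup slot. */
typedef struct {
    double start, end;   /* time interval reserved for the backup */
    int    primary_proc; /* processor on which the corresponding primary runs */
} slot_t;

static int intervals_overlap(double s1, double e1, double s2, double e2)
{
    return s1 < e2 && s2 < e1;
}

/* May a new backup, whose primary runs on primary_proc, be overloaded onto
 * existing_slot in the interval [start, end)?  Overlapping is allowed only
 * when the two primaries are on different processors, so that a single
 * processor failure never requires both backups at the same time. */
int may_overload(const slot_t *existing_slot, double start, double end,
                 int primary_proc)
{
    if (!intervals_overlap(existing_slot->start, existing_slot->end, start, end))
        return 1;        /* no time overlap, hence no conflict */
    return existing_slot->primary_proc != primary_proc;
}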
e) EDF algorithm
If there is no existing backup slot that can be overloaded, schedule the backup in the latest
possible free slot before the deadline of the task. The task with the earliest deadline is
scheduled first.
f) De-allocation of backups
If a schedule has been found for both the primary and the backup copy of a task, commit the
task; otherwise, reject it. If the primary copy executes successfully, the corresponding
backup copy is de-allocated.
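A minimal sketch of this de-allocation step in C; the structure and field names are illustrative assumptions.

typedef struct {
    int committed;       /* both primary and backup were scheduled */
    int backup_active;   /* backup slot is still reserved */
} sched_entry_t;

/* Called when the primary copy finishes; on success the backup slot is
 * released so that its time can be reused by other tasks. */
void on_primary_completion(sched_entry_t *e, int primary_ok)
{
    if (primary_ok && e->backup_active)
        e->backup_active = 0;   /* de-allocate the backup copy */
}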
g) Backup execution
If there is a permanent or transient fault in a processor, the processor crashes, and all the
backups of the tasks that were running on it are executed on the other processors.
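A minimal sketch of step g) in C; the structure and names are illustrative assumptions.

typedef struct {
    int primary_proc;    /* processor the primary copy was assigned to */
    int backup_proc;     /* (different) processor holding the backup slot */
    int backup_needed;   /* set when the backup must actually execute */
} ft_task_t;

/* When a processor fails, activate the backup of every task whose primary
 * was running on that processor; each backup runs on its own processor. */
void on_processor_failure(ft_task_t tasks[], int n, int failed_proc)
{
    for (int i = 0; i < n; i++)
        if (tasks[i].primary_proc == failed_proc)
            tasks[i].backup_needed = 1;
}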