CSC310
Fault Tolerance
SPRING 2022
Lecture 2 - Fault Models, Detection & Simulation
Instructor: Dr. Belal Badawy
Fault Simulation
Fault simulation is similar to the testing process:
it evaluates the fault detection capability of a set of test patterns
Fault coverage = percentage of faults detected by a set of test vectors
Fault simulators:
emulate faults
compare the output response to the known-good circuit output response
For a given set of input test patterns:
At least one mismatch ⇒ fault is detected
No mismatches ⇒ fault is not detected
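The detection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit y = (a AND b) OR c, the injected fault, and the names `good_circuit`, `faulty_circuit`, and `is_detected` are hypothetical and do not come from the lecture.

```python
from itertools import product

def good_circuit(a, b, c):
    """Fault-free reference circuit (assumed for illustration): y = (a AND b) OR c."""
    return (a & b) | c

def faulty_circuit(a, b, c):
    """Same circuit with the AND-gate output forced to 0 (hypothetical fault)."""
    return 0 | c

def is_detected(good, faulty, patterns):
    """A fault is detected if at least one pattern produces an output mismatch."""
    return any(good(*p) != faulty(*p) for p in patterns)

# Exhaustive pattern set: includes (1, 1, 0), which exposes the fault.
all_patterns = list(product((0, 1), repeat=3))
print(is_detected(good_circuit, faulty_circuit, all_patterns))            # True

# A weaker pattern set: both circuits agree on every vector, so no detection.
print(is_detected(good_circuit, faulty_circuit, [(0, 0, 0), (1, 1, 1)]))  # False
```

The second call shows why the result depends on the pattern set: the fault exists, but none of the supplied vectors produces a mismatch.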
Fault Detection
Fault detection requires:
Observation of an error (caused by the fault) at a primary output
Observability of the fault site
The ease with which we can observe the fault's behavior
Input stimuli that create an error as a result of the fault
Controllability of the fault site
Also considered part of observability
Testability of a fault = controllability & observability of the fault site
Circuit testability = overall circuit controllability & observability
Fault Coverage (Simple View)
Given a set of test vectors, each fault in the circuit's fault set is either:
D = detected faults
Includes targeted faults and faults “accidentally” detected
U = undetected faults
No vector in the set detects the fault, but a detecting vector might exist
outside the vector set.
T = total faults = D + U
Fault coverage = D / T
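As a small worked example of the formula, the sketch below computes coverage from D and U. The function name `fault_coverage` and the counts (45 detected, 5 undetected) are made up for illustration and are not results from the lecture.

```python
def fault_coverage(detected, undetected):
    """Fault coverage = D / T, where T = D + U."""
    total = detected + undetected
    if total == 0:
        raise ValueError("the fault set is empty")
    return detected / total

# Hypothetical counts: D = 45, U = 5, so T = 50.
print(f"{fault_coverage(45, 5):.0%}")  # 90%
```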
Fault Modeling
Faults at the physical level in chips cannot be tested for and detected directly
(many of them are too complex to analyze).
Faults need to be modeled at a higher abstraction level so that they
can be analyzed and test signals can be generated to detect them.
These models are generally referred to as fault models.
Fault Modeling
Fault models are analyzable approximations of defects and are
essential for a test methodology.
From the model, the designer or user can predict the consequences of
a particular fault.
Fault models can be used in almost all branches of engineering.
A fault model must be simple.
A fault model must lead to accurate conclusions.
Fault Modeling
Why use a fault model?
I/O function tests are inadequate for manufacturing.
Real defects (often mechanical) are too numerous and often not analyzable.
A fault model identifies targets for testing.
A fault model makes analysis possible.
Effectiveness is measurable by experiments.
Fault Modeling
Fault models at different levels
Lower-level fault models:
• Transistor level
• Gate level
Higher-level fault models:
• Function level (often error models)
• Behavior level (often timing failure models)
. . .
• System level (usually failure models)
10 Fault Tolerant Computing Dr. Tarek abdul Hamid
Fault Modeling
Fault models at lower levels
The lower-level fault models include those defined at the transistor and
gate levels.
The digital design is described as an interconnection of transistors and
gates, and faults are modeled as defects in these components.
The fault models commonly used at the transistor and gate levels are:
• Stuck-at fault model
• Transition delay fault model
• IDDQ fault model
• Bridging fault model
Fault Modeling
Fault models at lower levels
Stuck-at fault model: one of the most widely used fault models for
gate-level digital circuits.
Faults are modeled on the signal lines, or interconnects, between the gates.
Under the stuck-at fault model, two types of fault can be modeled for any
signal line in the gate-level digital circuit: the logic value of the
faulty signal line is permanently stuck at logic '0' or stuck at logic '1'.
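To make the "two faults per signal line" idea concrete, here is a minimal sketch that injects a single stuck-at fault into a tiny gate-level netlist. The circuit (n1 = a AND b, y = n1 OR c), the line names in `LINES`, and the `simulate` helper are assumptions chosen for illustration, not part of the lecture.

```python
from itertools import product

# Signal lines of the assumed circuit: three inputs, one internal net, one output.
LINES = ["a", "b", "c", "n1", "y"]

def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; fault is (line, stuck_value) or None."""
    def force(line, value):
        # Override the computed value if this is the faulty line.
        if fault is not None and fault[0] == line:
            return fault[1]
        return value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)

# Single stuck-at fault list: stuck-at-0 and stuck-at-1 on every line.
faults = [(line, value) for line in LINES for value in (0, 1)]
print(len(faults))  # 10

# Input patterns whose output differs when n1 is stuck at 0.
tests = [p for p in product((0, 1), repeat=3)
         if simulate(*p) != simulate(*p, fault=("n1", 0))]
print(tests)  # [(1, 1, 0)]
```

Only (1, 1, 0) detects this fault: it drives n1 to 1 (so the fault creates an error) and holds c at 0 (so the error reaches the output y).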
Fault Modeling
Fault models at lower levels
Advantages of the stuck-at fault model
• Simplicity
• Accuracy
• Can model most real faults
• Tractable model space: the number of possible faults can be counted
• Easy to use and easy to quantify (for quality metric)
• Substantial empirical evidence of its practical use
Fault Modeling
Fault models at lower levels
Disadvantages of the stuck-at fault model
• With increasing device density, the model is often questioned
and is losing many of its advantages
• Some real defects cannot be modeled by this model
• More powerful computers are making it possible to handle other
models, even at the fabrication level
Fault Modeling
Fault models at lower levels
Transition delay fault model: introduced to model delay defects in a digital
circuit.
Under this model, a faulty signal line behaves as a slow-to-rise or a
slow-to-fall signal.
A slow-to-rise signal line behaves as a temporary stuck-at logic '0'
for a time period that exceeds the maximum delay of the circuit.
A slow-to-fall transition delay fault behaves similarly, as a temporary
stuck-at logic '1'.
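The "temporary stuck-at" behavior can be illustrated with a very simple timing sketch. It assumes a transition is launched at t = 0 and the line is sampled one clock period later, with the clock period at least as long as the maximum circuit delay. The function `captured_value` and all delay values are hypothetical.

```python
def captured_value(old, new, delay_ns, clock_period_ns):
    """Value latched at the capture edge for a transition launched at t = 0.

    If the transition arrives after the capture edge, the old value is latched,
    so the line looks temporarily stuck at its previous logic value.
    """
    return new if delay_ns <= clock_period_ns else old

CLOCK_NS = 10.0  # assumed clock period

# Fault-free line: the 0 -> 1 transition arrives in time.
print(captured_value(0, 1, delay_ns=4.0, clock_period_ns=CLOCK_NS))   # 1

# Slow-to-rise fault: the transition is late, so 0 is captured.
print(captured_value(0, 1, delay_ns=13.0, clock_period_ns=CLOCK_NS))  # 0

# Slow-to-fall fault: the 1 -> 0 transition is late, so 1 is captured.
print(captured_value(1, 0, delay_ns=13.0, clock_period_ns=CLOCK_NS))  # 1
```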
Fault Modeling
Fault models at lower levels
Advantages of the transition delay fault model
• Performance-oriented modeling
• Quite general
Disadvantages of the transition delay fault model
• Difficult to use and intractable (path delay)
Fault Modeling
Fault models at lower levels
IDDQ fault model: based on IDDQ testing, a method for testing CMOS integrated
circuits for manufacturing faults. It relies on measuring the supply current
(Idd) in the quiescent state.
IDDQ faults model certain physical defects that occur in fabrication, such as
shorts between signal lines and transistors permanently in the ON state.
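The pass/fail decision in an IDDQ test can be sketched as a threshold check on the quiescent current measured after each test vector. The readings, the threshold, and the function name `iddq_failures` below are invented for illustration; real limits depend on the process and the device.

```python
def iddq_failures(measurements_ua, threshold_ua):
    """Return the indices of test vectors whose quiescent current exceeds the threshold."""
    return [i for i, current in enumerate(measurements_ua) if current > threshold_ua]

# Hypothetical quiescent-current readings in microamps, one per test vector.
readings = [0.8, 1.1, 250.0, 0.9]

# Vector 2 draws far more current than the others, which suggests a defect
# such as a short that is activated by that vector.
print(iddq_failures(readings, threshold_ua=10.0))  # [2]
```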
Fault Modeling
Fault models at lower levels
Bridging faults: caused by manufacturing defects in which a pair of lines in a
circuit (at the gate level) are shorted.
They occur when signals that should be separate are connected together.
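One common way to model a bridge at the gate level, not stated in these slides, is to assume the two shorted lines both take the AND (wired-AND) or the OR (wired-OR) of their fault-free values. The sketch below shows that assumption; the function name `bridge` is hypothetical.

```python
def bridge(x, y, model="wired-and"):
    """Return the values seen on two shorted lines under a simple bridging model."""
    if model == "wired-and":
        value = x & y   # a 0 on either line pulls both lines to 0
    elif model == "wired-or":
        value = x | y   # a 1 on either line pulls both lines to 1
    else:
        raise ValueError(f"unknown bridging model: {model}")
    return value, value

print(bridge(1, 0))              # (0, 0)
print(bridge(1, 0, "wired-or"))  # (1, 1)
print(bridge(1, 1))              # (1, 1)
```

The last call shows that the bridge has no visible effect when both lines already carry the same value, so a test must drive the two lines to opposite values.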
Fault Modeling
Fault models at lower levels
Advantages of the bridging fault model
• Simple
• Realistic
Disadvantages of the bridging fault model
• Large number of faults
• Difficult to relate to the quality metric
Fault Modeling
Fault models at higher levels
At higher levels of abstraction, faults can be modeled at the Register-Transfer
Level (RTL) or the behavioral level.
At the RTL, faults can be modeled in the registers and/or in the data
transfers between the registers.
At the behavioral level, the digital design is described in the form of an
algorithm or functional description and faults can be modeled in the various
operations that are defined and used in the description.
Fault Modeling
Fault models at higher levels
Fault models at lower levels correlate more closely with physical
defects and can therefore be characterized better than fault
models at higher levels.
Fault models at higher levels are less complex and are easier to analyze and
to use for test generation and test evaluation than those at lower
abstraction levels.
Fault Modeling
Error models
A means of classifying the effect of physical fault(s) in a system. Note that,
from a modeling point of view, an error model need not be deduced from a fault
model.
Goals
Extent of information corrupted
Extent of error(s) propagated
Latency issue
Fault Modeling
Error models
Error effects
Data
Control
State
Error Types (HW)
Bit errors (data, control, state): the single-bit error assumption is commonly
used in practice
Unidirectional errors (mostly in data)
Byte errors (data)
Other - intermediate logic level
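The hardware error types listed above can be illustrated on integer data words. This is a sketch under assumptions: the helper names are hypothetical, the unidirectional case is shown only in the 1 → 0 direction, and the example words are arbitrary.

```python
def single_bit_error(word, bit):
    """Flip exactly one bit of the word."""
    return word ^ (1 << bit)

def unidirectional_error(word, mask):
    """All affected bits fail in the same direction, here 1 -> 0."""
    return word & ~mask

def byte_error(word, byte_index, bad_byte):
    """Replace one whole byte of the word with a corrupted value."""
    shift = 8 * byte_index
    return (word & ~(0xFF << shift)) | ((bad_byte & 0xFF) << shift)

word = 0b1011_0110
print(bin(single_bit_error(word, 0)))                 # 0b10110111
print(bin(unidirectional_error(word, 0b0011_0000)))   # 0b10000110
print(hex(byte_error(0x12345678, 1, 0xFF)))           # 0x1234ff78
```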
Fault Modeling
Error models
Error Types (SW)
Branch error
Missing instruction error
Missing pointer errors
Fault Modeling
High-level failure models (process or system failure)
System model
Single or multiple processor system
Single - multiple processes executing
Key: interacting processes, such as message-passing systems,
distributed systems, ...
Fault Modeling
High-level failure models (process or system failure)
General Classification
Crash failure - a faulty processor or system stops permanently
Omission failure - a faulty process sometimes omits inputs/outputs, but
when it works, it works correctly
Timing failure - inputs/outputs are delayed or arrive too early
Byzantine failure or arbitrary failure - a faulty processor can exhibit
arbitrary behavior, including malicious behavior
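The four failure classes can be contrasted with a toy fault-injection sketch around an echo service. Everything here is an assumption made for illustration: the `FailureMode` enum, the `respond` function, the deadline, and the crash step are hypothetical, and the timing case models only late replies, not early ones.

```python
import random
from enum import Enum

class FailureMode(Enum):
    NONE = "none"
    CRASH = "crash"
    OMISSION = "omission"
    TIMING = "timing"
    BYZANTINE = "byzantine"

DEADLINE_S = 1.0  # hypothetical reply deadline
CRASH_STEP = 3    # hypothetical step at which a crashing process halts

def respond(request, step, mode, rng):
    """Echo service: return (value, latency_s), or None when no reply is sent."""
    correct = (request, 0.1)
    if mode is FailureMode.CRASH:
        # Correct until the crash, then silent permanently.
        return correct if step < CRASH_STEP else None
    if mode is FailureMode.OMISSION:
        # Drops some replies; the replies that are sent are correct.
        return None if step % 2 else correct
    if mode is FailureMode.TIMING:
        # Correct value, but delivered after the deadline.
        return (request, 2 * DEADLINE_S)
    if mode is FailureMode.BYZANTINE:
        # Arbitrary behavior: any value, any latency.
        return (rng.randint(0, 99), rng.uniform(0.0, 2 * DEADLINE_S))
    return correct

rng = random.Random(0)  # seeded so that runs are repeatable
for mode in FailureMode:
    print(mode.value, [respond("ping", step, mode, rng) for step in range(5)])
```

Comparing the printed traces shows the ordering of severity: a crash is the easiest to reason about because the process simply goes silent, while a Byzantine process gives an observer no guarantee at all about what it will see.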