VLSI Testing and Verification Challenges
Q.1
Define the following terms:
1. Equivalent Faults: Faults that produce identical faulty behavior for all input tests (can
be collapsed).
2. Fault: Deviation from intended behavior due to defect (stuck-at, bridging, delay etc.).
3. Reject Rate: Fraction of shipped (test-passing) parts that are actually defective and will be rejected in the field; synonymous with defect level.
4. Rule of Ten: Fixing a defect gets ~10× more expensive at each later stage (design →
verification → fab → field).
5. Fault Coverage: % of modeled faults that a test set detects.
6. Defect Level: Fraction of faulty devices that escape the test and are shipped as good; usually quoted in defective parts per million (DPM).
7. Fault Detection Efficiency: Ratio of faults detected by a test set to the total number of detectable faults (undetectable or redundant faults excluded).
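Defect level, yield, and fault coverage are commonly related by the Williams-Brown model DL = 1 - Y^(1-FC). A small illustrative computation (the numeric values are only examples):

```python
# Williams-Brown estimate of defect level (reject rate):
# DL = 1 - Y^(1 - FC), where Y is process yield and FC is fault coverage.
def defect_level(yield_y, fault_coverage):
    return 1.0 - yield_y ** (1.0 - fault_coverage)

# Illustrative numbers: 90% yield, 95% fault coverage.
dl = defect_level(0.90, 0.95)
dpm = dl * 1_000_000  # defects per million shipped parts
```

Note how DL drops to 0 at 100% fault coverage and rises to 1 - Y when no faults are tested, which is why high fault coverage is essential for low DPM.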
Q.2
(a) Explain bridging fault models. (3 marks)
Bridging fault: When two or more signal nets that should be isolated become electrically
connected (shorted) due to a manufacturing defect such as metal bridging, conductive particles,
or via bridging. Bridging faults are common in VLSI.
Types / behavior:
1. Wired-AND (dominant-0) bridging: One net dominates and forces the resultant node
behavior equivalent to a wired-AND (dominant-0). Example: If net X is strongly driven
to 0 and shorted to net Y, the shorted pair behaves as X AND Y (tends toward 0).
2. Wired-OR (dominant-1) bridging: One net dominates high and drives the short to
logic-1; equivalent to a wired-OR behavior.
3. Resistive bridging (R-bridging): The short is resistive; behavior depends on signal
strengths and can cause intermediate voltages, transient or crosstalk-like effects (difficult
to model with pure stuck-at).
4. Dominant driver model: Determine which driver (pull-up/pull-down) is stronger — that
decides the effect of the bridge.
Detection considerations:
1. Bridging faults are not always detected by single stuck-at tests, because a stuck-at model does not mimic the interaction of two actively driven nets.
2. ATPG must include explicit bridging fault models (dominant-0, dominant-1, resistive) or rely on structural/transition tests that can distinguish the combined behavior.
3. Test patterns must drive the two bridged nets to opposite values (one to 1, the other to 0, and vice versa) so the resulting conflict produces an observable mismatch.
4. Test points or scan can be added to improve control and observation of the suspect nets.
5. IDDQ testing is effective for resistive bridges, which typically cause elevated quiescent leakage current.
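The wired-AND and wired-OR behaviors above can be sketched with two small helpers (function names are illustrative, not from any standard library):

```python
# Illustrative two-net bridge models (names are assumptions):
def wired_and(x, y):   # dominant-0 bridge: a 0 on either net wins
    return x & y

def wired_or(x, y):    # dominant-1 bridge: a 1 on either net wins
    return x | y

# To expose a bridge, drive the two nets to opposite values and observe:
# fault-free nets keep (x, y) = (1, 0); a wired-AND bridge forces both
# to 0 and a wired-OR bridge forces both to 1 -- an observable mismatch.
and_bridge_value = wired_and(1, 0)   # both nets read 0
or_bridge_value = wired_or(1, 0)     # both nets read 1
```

This is exactly why bridge tests apply opposite values to the two nets in both polarities: a pattern with both nets at the same value can never distinguish the bridged circuit from the fault-free one.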
(b) Obtain Controllability and Observability for various signals of 5-input OR Gate using
SCOAP and Probability-based testability analysis. (4 marks)
Interpretation (SCOAP and probability-based analysis):
Assume random, independent primary inputs with P(1) = P(0) = 0.5, primary-input controllabilities CC0 = CC1 = 1, and the output Y taken as a primary output with CO(Y) = 0.
SCOAP controllability of Y (5-input OR): CC1(Y) = min CC1(inputs) + 1 = 2, since any single input at 1 sets Y = 1; CC0(Y) = Σ CC0(inputs) + 1 = 6, since all five inputs must be driven to 0.
SCOAP observability of each input: CO(Ai) = CO(Y) + Σ CC0(other inputs) + 1 = 0 + 4 + 1 = 5, since the other four inputs must be held at the non-controlling value 0.
Probability-based measures: P(Y = 1) = 1 - (0.5)^5 = 31/32 ≈ 0.97 and P(Y = 0) = (0.5)^5 = 1/32 ≈ 0.03.
SCOAP therefore highlights that forcing Y = 0 is expensive (cost 6) while forcing Y = 1 is cheap (cost 2); the probability measure shows the output is almost always 1, so faults requiring Y = 0 are rarely activated by random stimuli and need directed (SCOAP-guided) patterns that set all other inputs to the right polarity.
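Applying the standard SCOAP rules for an OR gate (inputs are primary inputs with CC0 = CC1 = 1, Y is a primary output with CO = 0), the measures can be computed directly; a short sketch:

```python
# SCOAP combinational measures for an n-input OR gate fed by primary inputs.
n = 5
cc0_in, cc1_in = [1] * n, [1] * n

cc1_y = min(cc1_in) + 1              # set Y=1: any one input at 1  -> 2
cc0_y = sum(cc0_in) + 1              # set Y=0: all inputs at 0     -> 6
co_y = 0                             # Y assumed to be a primary output
# Observing input i requires every other input at its non-controlling 0:
co_in = co_y + (sum(cc0_in) - 1) + 1  # = 0 + 4 + 1 = 5, same for each input

# Probability-based testability (independent inputs, P(1) = 0.5):
p1_y = 1 - 0.5 ** n                  # = 31/32 ~ 0.969
p0_y = 0.5 ** n                      # = 1/32  ~ 0.031
```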
(c) Explain testing methodology for transistor faults in two-input CMOS NAND Gate. (7
marks)
Pull-up (pMOS): two pMOS transistors in parallel from VDD to output Y (connected to
inputs A and B).
Pull-down (nMOS): two nMOS transistors in series from Y to GND (A in series with
B).
Output Y = NOT(A AND B).
Summary:
Transistor testing of CMOS NAND uses a mix of directed logic vectors (including two-vector
sequences), IDDQ checks for leakage/shorts, and timing/transition tests for
resistive/threshold issues. Good test plans combine gate-level ATPG to detect functional
manifestations and silicon-level current/timing tests for transistor defects.
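The two-vector sequences mentioned above are needed because a stuck-open transistor leaves the output floating, so it retains its previous value. A simplified switch-level sketch (function and parameter names are illustrative) shows why a pMOS-A stuck-open fault needs an initializing vector before the detecting vector:

```python
# Simplified switch-level model of a 2-input CMOS NAND, used to show why
# stuck-open pMOS faults need a two-vector test (initialize, then detect).
def nand_switch_level(a, b, prev_y, open_pmos_a=False):
    """Return output Y; with no conducting path, Y keeps its previous value."""
    pull_up = ((a == 0 and not open_pmos_a) or b == 0)   # pMOS in parallel
    pull_down = (a == 1 and b == 1)                      # nMOS in series
    if pull_up and not pull_down:
        return 1
    if pull_down and not pull_up:
        return 0
    return prev_y  # floating node: memory effect of the stuck-open fault

# Vector 1 (A=B=1) initializes Y to 0; vector 2 (A=0, B=1) should pull Y up
# only through the pMOS on A. If that pMOS is stuck-open, Y stays at 0.
y = nand_switch_level(1, 1, prev_y=1)                      # init: Y = 0
y_faulty = nand_switch_level(0, 1, y, open_pmos_a=True)    # Y retains 0
y_good = nand_switch_level(0, 1, y, open_pmos_a=False)     # Y = 1
```

A single static vector cannot detect this fault, because the retained value may accidentally match the expected one; the initializing vector removes that ambiguity.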
Explain scan design rules for the following:
1. Derived Clocks
2. Combinational feedback loops (7 marks)
Background:
Scan design (Design for Testability — DFT) converts flip-flops (FFs) into scan cells.
Scan chains allow shifting in test vectors and shifting out captured responses to make
sequential circuits testable like combinational circuits.
Scan design rules ensure that scan path and scan mode work correctly, do not introduce
test escapes, and avoid hazards during test.
1. Single clock domain per scan chain or controlled crossing: Avoid mixing derived
clocks into a scan chain; if needed, provide gating or safe clocks.
2. Avoid asynchronous set/reset during scan shifting: Asynchronous signals should be
disabled while shifting to avoid corruption.
3. Control scan-enable (SE) net carefully: SE should be synchronous or glitch-free; drive
SE from a dedicated, well-buffered net.
4. No combinational feedback loops without breakpoints: Combinational loops make
ATPG and scan shifting ambiguous (may require additional test points or special mode).
5. Safe handling of clocks derived from scan signals: Avoid clocks that change when SE
toggles; derived clocks must be static or disabled during shift.
6. Test mode isolation: Isolate scan signals from normal functional signals when needed;
ensure reliable timing.
Derived clocks are clocks generated inside the module using logic (e.g., gated clocks,
XOR/AND-based derived clocks).
Rule: Never use derived clocks to directly clock scan flip-flops unless the derived clock
is stable and well-defined in both functional and test modes. Prefer a single global test
clock for all scan FFs.
Problems if not followed:
o In scan mode, toggling SE can alter the logic that generates a derived clock, causing
glitches that corrupt the shifted data.
o Derived clocks can create timing uncertainty during shift and capture.
Mitigations / design guidelines:
o Use clock gating cells that are test-aware (with test-mode passthrough) so during
scan shifting the clock is predictable.
o Ensure derived clocks are disabled in scan-shift mode (use tester clock gating) and
only enabled during capture cycle if required.
o Ensure scan FFs are clocked from a stable, dedicated test clock (TCK) during scan
operations. If multiple clock domains exist, create separate scan chains per domain.
o Avoid gating signals that depend on scan enable or test signals.
Combinational feedback loops (a path of combinational logic that forms a loop without
storage elements) create a problem for ATPG because logic becomes state dependent;
combinational loops can cause oscillations or ambiguous values during scan shift and
capture.
Rules:
o Break loops: Insert scan FFs or test points to break the loop so the combinational
portion becomes acyclic for ATPG.
o Make feedback through registered element: Ensure any feedback path has a
register (scan cell) on the path so the loop is broken during shift/capture and
behavior is well-defined.
o If loop is intentional (e.g., latch-based state-holding), ensure it’s covered by
sequential ATPG or by designing additional control signals to isolate during
test.
Testing implications:
o If a combinational loop remains, ATPG may not be able to generate tests (or tests
may be expensive). Emulate or constrain loop behavior for ATPG or manually
provide control signals that break the loop during test.
o Breakpoints and pseudo-primary inputs might be added to allow test vectors to
propagate.
Practical recommendations:
During design, identify all loops and decide where to place scan FFs so the combinational
logic appears acyclic for ATPG.
Use the Design Rule Checker (DRC) and DFT tools to enforce these rules before tape-
out.
Coordinate clock gating and derived clock design with DFT team to ensure scan
reliability.
Q.3
(a) Calculate number of collapsed faults for two-input CMOS NOR Gate. (3 marks)
Signal list: A, B (inputs), Y (output). Uncollapsed stuck-at faults: each signal has stuck-at-0
and stuck-at-1 → 3 signals × 2 = 6 faults.
Equivalence collapsing: for a NOR gate, any input stuck-at-1 forces Y = 0, so A s-a-1, B s-a-1
and Y s-a-0 are equivalent and collapse into a single fault.
Collapsed fault set: {A s-a-0, B s-a-0, Y s-a-1, (A s-a-1 ≡ B s-a-1 ≡ Y s-a-0)} → 4 collapsed
faults (in general n + 2 for an n-input elementary gate).
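The equivalence classes can be confirmed by brute-force simulation: apply all four input patterns to the NOR gate under each of the six stuck-at faults and group faults with identical output sequences. A minimal sketch:

```python
from itertools import product

def nor(a, b):
    return int(not (a or b))

def faulty_response(fault):
    """Output sequence of the 2-input NOR under one stuck-at fault."""
    sig, val = fault                      # e.g. ('A', 0) means A stuck-at-0
    out = []
    for a, b in product([0, 1], repeat=2):
        if sig == 'A': a = val
        if sig == 'B': b = val
        y = nor(a, b)
        if sig == 'Y': y = val
        out.append(y)
    return tuple(out)

faults = [(s, v) for s in 'ABY' for v in (0, 1)]
classes = {}
for f in faults:
    classes.setdefault(faulty_response(f), []).append(f)
# len(classes) == 4: A/1, B/1 and Y/0 share one response and collapse together.
```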
(b) Explain input scanning method for logic element evaluation. (4 marks)
Input scanning method (sometimes called scan-in testing or input scanning) refers to
techniques used to apply test vectors to the internal nodes of a combinational or sequential logic
element by shifting them in through serial scan chains and to evaluate the responses by
scanning them out.
Objective: Convert sequential circuit testing into an effectively combinational testing problem
by providing full control/observation of flip-flops (scan cells).
1. Scan insertion: Convert each storage element (flip-flop) to a scan cell (a multiplexer-fed
DFF with a scan-in pin).
o The scan cell has two modes:
Functional mode (SE = 0): the D input is fed from the functional combinational logic
(normal operation).
Scan mode (SE = 1): the flip-flops are connected as a shift register chain for serial
shifting.
2. Creating scan chains: Connect scan-in of FF1 to external Scan-In (SI), scan-out of FF1
to scan-in of FF2, etc. Chains can be single or multiple (per clock domain).
3. Test sequence using input scanning method:
o Shift in a test vector (with SE=1) serially into the scan chain (using SHIFT clock).
This forces the internal flip-flop values to desired input pattern for the
combinational logic.
o Switch to capture mode (SE=0) and apply one functional clock tick (CAPT) so
the combinational logic processes the applied test inputs and the new responses are
captured in the FFs.
o Shift out the captured results by returning SE=1 and shifting the scan chain to the
tester or comparator for evaluation.
4. Advantages:
o Converts sequential testing problem into combinational ATPG problem: ATPG
tools can generate vectors assuming direct control of FFs.
o Reduces complexity of sequential ATPG (shift/launch-capture methodology).
o High controllability and observability on internal nodes (FF Q outputs).
5. Considerations:
o Scan shift overhead — test time increases because of serial shifting; mitigated by
multiple scan chains and compression techniques.
o Proper handling of clock domains, asynchronous resets, and clock gating.
o Careful management of power during shift (due to many toggling nodes).
o Need to secure SE signal and handle timing (scan clock skew, hold/setup margins).
Example (for single shift-capture-shift cycle):
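A minimal abstract model of one such cycle (all names are illustrative; the combinational logic is an arbitrary function of the flip-flop states):

```python
# Abstract single-chain scan cycle: shift in a vector (SE=1), capture one
# functional clock (SE=0), then shift the captured response out (SE=1).
def scan_cycle(chain, test_vector, comb_logic):
    n = len(chain)
    # SE=1: shift the test vector in serially, one bit per shift clock.
    for bit in test_vector:
        chain = [bit] + chain[:-1]
    # SE=0: one capture clock loads the combinational response into the FFs.
    chain = comb_logic(chain)
    # SE=1: n shift clocks move the captured response out to the tester.
    response = []
    for _ in range(n):
        response.append(chain[-1])
        chain = [0] + chain[:-1]
    return response

# Example combinational block: next state inverts each FF value.
resp = scan_cycle([0, 0, 0], [1, 0, 1], lambda s: [1 - b for b in s])
```

In practice the shift-out of one response is overlapped with the shift-in of the next test vector, so each vector costs roughly n shift clocks plus one capture clock.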
Conclusion: Input scanning is a practical and widely used DFT technique that provides
controllability/observability for logic element evaluation enabling efficient ATPG and high
fault coverage for sequential circuits.
(c) Draw and explain Clocked scan cell design with necessary waveforms. (7 marks)
A clocked scan cell in the strict sense uses two separate clocks (a data clock for functional
operation and a scan clock for shifting) to select between data and scan inputs. The widely
used muxed-D variant described here is a multiplexer-fed D-type flip-flop that supports two
modes:
Functional mode (SE=0): The D input of the FF receives data from the functional
combinational logic (normal operation). The clock (CLK) causes normal operation.
Scan mode (SE=1): The D input of the FF receives the serial scan-in value (SI) through a
scan multiplexer; the flip-flops form a chain to shift data in/out using the scan clock (may
be the same as functional clock but often separate).
1. Multiplexer: Selects between functional data input (D_func) and scan-in (SI) depending
on SE.
2. D-FF: Edge triggered flip-flop storing selected data on rising (or falling) edge of CLK.
3. Scan-out (SO): Q of the flip-flop routed to next scan cell’s SI (or to external SO if last
cell).
4. Test control (SE): Scan enable input controlling multiplexer select.
Operation modes (waveform): SE is held high for n shift clocks while the test vector shifts in,
pulled low for one capture clock edge, then raised again to shift the response out.
Timing/Design notes:
SE must be stable and glitch-free around clock edge to avoid corrupting shift/capture.
When using separate clock domains, ensure the clock for scan shifting is controlled and
that capture clocks are aligned to avoid metastability.
Optional features: scan clocks (shift vs capture), hold latch to prevent propagation of
glitches, test mode disabling of asynchronous resets.
It provides controllability (we can set any FF to 0/1 via shifting) and observability (we
can read FF states by shifting out). It’s the backbone of scan-based ATPG and widely
used DFT practice.
Q.3 — OR alternative parts (if answering the other branch)
The OR branch covers parallel fault simulation, toggle coverage and fault sampling, and the
scan design flow.
Parallel fault simulation (key properties):
High speed: Evaluate many faults in parallel using bit-parallel operations on machine
words (e.g., 32/64/128 faults simultaneously).
Efficient memory usage: One simulation of test vector per multiple faults reduces
repeated logic evaluation.
Scalable: Useful for large circuits and large fault lists.
Approaches (major):
1. Bit-parallel (bitwise) simulation: Each bit in a machine word represents the effect of a
different fault. Logic operations are bitwise on machine words — simulating multiple
faults at once.
2. Fault-parallel (pattern-parallel) simulation: Many test patterns are simulated in
parallel (less common historically).
3. Event-driven parallel fault simulation: Combines event-driven simulation with parallel
fault encoding to avoid evaluating whole circuit each time.
4. Dominant/Masking-aware parallelization: Use masking to avoid false interactions
among faults in same simulation word.
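The bit-parallel approach can be sketched with Python integers as machine words: bit i of every signal word carries the value seen by machine i, where machine 0 is the fault-free circuit and machines 1..k are faulty copies. The circuit y = (a AND b) OR c and the fault list are illustrative:

```python
# Bit-parallel stuck-at fault simulation of y = (a AND b) OR c.
# Bit position 0 is the good machine; positions 1..3 are faulty machines.
MASK = (1 << 4) - 1   # 4 machines packed into one word

def broadcast(v):                  # replicate a logic value to all machines
    return MASK if v else 0

def inject(word, bit, stuck):      # force one machine's copy of a signal
    return (word | (1 << bit)) if stuck else (word & ~(1 << bit) & MASK)

def simulate(a, b, c):
    A, B, C = broadcast(a), broadcast(b), broadcast(c)
    A = inject(A, 1, 1)            # machine 1: a stuck-at-1
    B = inject(B, 2, 0)            # machine 2: b stuck-at-0
    C = inject(C, 3, 1)            # machine 3: c stuck-at-1
    y = (A & B) | C                # one word-wide evaluation, all machines
    good = y & 1
    return {i for i in (1, 2, 3) if ((y >> i) & 1) != good}

# Pattern a=0, b=1, c=0: good y = 0; a/1 and c/1 flip it to 1 -> detected.
hits = simulate(0, 1, 0)
```

One bitwise AND/OR per gate thus evaluates the good circuit and every packed fault simultaneously, which is the source of the speedup on 32/64-bit words.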
Toggle coverage:
Measures how many signals (nets or registers) in the design have toggled (changed value)
during a simulation or at least once during a testbench run.
Important for power validation (estimating switching activity) and functional verification
completeness (un-exercised signals may hide bugs).
Typically collected as a percentage: Toggle Coverage (%) = (Number of toggled
items / Total items) × 100.
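The formula above amounts to checking, per signal, whether both a rising and a falling transition were observed during the run; a minimal collector (signal names are illustrative):

```python
# Toggle coverage: a signal counts as toggled only if it made both a
# rising (0->1) and a falling (1->0) transition during the run.
def toggle_coverage(traces):
    toggled = 0
    for name, values in traces.items():
        rises = any(a == 0 and b == 1 for a, b in zip(values, values[1:]))
        falls = any(a == 1 and b == 0 for a, b in zip(values, values[1:]))
        toggled += rises and falls
    return 100.0 * toggled / len(traces)

cov = toggle_coverage({
    "clk":   [0, 1, 0, 1],   # toggles both ways
    "req":   [0, 0, 1, 1],   # rises only -> not fully toggled
    "rst_n": [1, 1, 1, 1],   # never toggles
})
# cov == 100 * 1/3: only one of the three signals fully toggled.
```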
Fault sampling: Instead of simulating the full fault list, simulate a randomly selected sample
of faults and infer the overall fault coverage statistically; this greatly reduces fault-simulation
cost for large designs at the price of a small confidence interval on the coverage estimate.
Scan design flow:
1. Design RTL / Synthesis: Create functional design and synthesize to gate-level with DFT
constraints.
2. Scan insertion: Insert scan cells (convert FFs to scan FFs) and create scan chains; ensure
clock gating and asynchronous resets handled.
3. Design rule checks (DFT DRC): Verify no illegal scan connections (e.g., asynchronous
set/reset during shift); ensure SE is properly buffered.
4. ATPG (Test generation): Generate test vectors (stuck-at or other faults) using
combinational ATPG leveraging the scan chain controllability.
5. Test simulation & fault simulation: Simulate tests on gate-level netlist for fault
coverage and debug failures.
6. Compression / pattern optimization: Reduce number of patterns (test compaction),
apply compression if supported by ATE.
7. Scan chain integration & layout: Connect scan chains in layout, check routing, timing.
8. Scan-based manufacturing tests & BIST (if any).
9. Sign-off: Ensure test coverage targets met, finalize test set for production.
(Flowchart is typically drawn showing sequential boxes for the above steps — include in your
answer diagram if drawing.)
Q.4
(a) Explain logic optimization process for logic simulation. (3 marks)
Logic optimization for simulation focuses on simplifying the netlist to reduce simulation time
while preserving functional behavior under constraints (like not disturbing observability of
faults used for testing). Typical logic optimization steps:
1. Constant propagation and folding: Evaluate constant drives and replace complex
expressions with constants (e.g., simplify A AND 1 → A).
2. Dead-logic elimination: Remove logic that does not affect primary outputs (unreachable
or masked nodes).
3. Redundancy removal / Boolean minimization: Combine/merge gates where Boolean
algebra permits (e.g., A & A → A).
4. Structural simplification: Replace complex gate networks with simpler equivalents
(reduce gate depth, fanout).
5. Retiming / register transfer optimization: Adjust register positions to reduce
combinational depth (helps timing simulation).
6. Preserving testability vs. optimization trade-offs: For test simulation, avoid
optimizations that break fault equivalences or that remove nodes needed for test insertion;
DFT-aware optimization required.
Purpose: Reduce simulation cycles, memory footprint, and accelerate event-driven simulation
while ensuring correctness.
(b) Draw and explain two-pass nominal event driven strategy. (4 marks)
Event-driven simulation overview: Rather than repeatedly evaluating all gates every clock
cycle (inefficient), event-driven simulation updates only those gates whose inputs have changed
(events), leading to efficient performance for large circuits.
The two-pass approach eliminates glitches caused by intra-time-step ordering and guarantees
deterministic updates. The passes are:
1. Evaluation pass: Evaluate every gate activated by the events at the current time,
computing new output values into a temporary buffer (no net is updated yet).
2. Update pass: Commit all buffered values to their nets simultaneously and schedule the
resulting events at later times.
Benefits: Prevents intra-time-step races, ensures deterministic ordering when multiple events
cause cascading changes, and enables correct accounting of inertial or transport delays without
double-counting an input change.
Detailed steps:
Maintain an event queue sorted by simulation time.
At time t: pop all events scheduled for t. For each event:
o Evaluate affected gates and compute their new outputs (store in temporary buffer).
After evaluating all events at time t (first pass), apply all buffered outputs to nets (second
pass). If committing these outputs causes outputs to change at the same time instant due
to zero delay elements, they may be queued for time t (careful management necessary).
Continue with next earliest time from the event queue.
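The steps above can be sketched for a unit-delay netlist in one common formulation, where the event queue itself acts as the commit buffer: gate outputs computed at time t are not applied immediately but scheduled as events at t + 1 (all data structures are illustrative):

```python
import heapq

# Two-pass nominal event-driven simulation with unit gate delay.
# netlist: gate name -> (function, [input net names]); fanout: net -> gates.
def simulate(netlist, fanout, initial, events, t_end):
    values = dict(initial)
    queue = list(events)               # entries: (time, net, value)
    heapq.heapify(queue)
    while queue and queue[0][0] <= t_end:
        t = queue[0][0]
        changed = []
        while queue and queue[0][0] == t:          # drain this time step
            _, net, val = heapq.heappop(queue)
            if values.get(net) != val:
                changed.append((net, val))
        # Pass 1: apply all net changes for time t together, so no gate
        # ever sees a half-updated time step; collect activated gates.
        affected = set()
        for net, val in changed:
            values[net] = val
            affected.update(fanout.get(net, ()))
        # Pass 2: evaluate each activated gate exactly once; its new output
        # is buffered as a future event rather than applied immediately.
        for gate in affected:
            fn, ins = netlist[gate]
            new = fn(*(values[i] for i in ins))
            if new != values.get(gate):
                heapq.heappush(queue, (t + 1, gate, new))
    return values

netlist = {"y": (lambda a, b: a & b, ["a", "b"])}
fanout = {"a": ["y"], "b": ["y"]}
out = simulate(netlist, fanout, {"a": 0, "b": 1, "y": 0},
               [(0, "a", 1)], t_end=5)
# out["y"] == 1 after the event at t=0 propagates with one unit of delay.
```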
(c) What is the need of timing models in testing? List down various timing models and
explain any one in detail. (7 marks)
Need: Zero-delay logic simulation cannot expose defects that manifest only as timing failures
(slow paths, hazards, races). Timing models let simulators and ATPG predict when signals
change, generate and validate delay tests, and decide whether glitches propagate.
Timing models:
1. Unit-delay model: Assigns a uniform (unit) delay to every gate; simple but unrealistic.
2. Gate (inertial) delay model: Each gate has a single delay and an inertial behavior (short
pulses smaller than inertial threshold are filtered).
3. Transport delay model: Propagates all pulses (no filtering of short pulses).
4. Inertial delay model: Short pulses below a threshold at the input are suppressed
(modeling physical inertia of gates).
5. Path-delay model: Focuses on path delays (sum of gate and interconnect delays along a
path) — used for path-delay testing.
6. Timing annotated HDL/ SDF: Standard Delay Format (SDF) allows detailed timing
annotation (rise/fall delays, output load dependent).
7. Transition/path-delay fault timing (for dynamic tests): models the launch and capture
timing (launch-on-shift / launch-on-capture semantics).
Explain one in detail — Inertial delay model (commonly used):
Definition: Each gate has an associated propagation delay for rising/falling transitions
and an associated inertial threshold — i.e., the gate rejects (filters) input pulses that are
shorter than the inertial threshold, as real physical gates cannot reproduce very short
pulses due to internal capacitances and transistor inertia.
Key properties:
o A pulse at gate input shorter than the inertial threshold will not appear at the
output.
o Only pulses longer than or equal to the threshold will be propagated, delayed by
the gate’s propagation delay.
Why important: Inertial behavior captures glitch suppression — helps model how short-
duration hazards are filtered and determines whether transient pulses will cause
timing/fault coverage problems. Useful for detecting hazards and metastability issues.
Use in testing:
o When generating delay tests, inertial delay can change how faults are
excited/propagated; tests must account for pulse widths and gate filtering to ensure
a transition persists long enough to propagate to outputs.
o Transition fault tests (detect rising/falling transition faults) rely on ensuring launch
event produces a pulse long enough to pass through multiple gates — inertial
threshold matters.
Example: With an inertial threshold of 2 ns, a 1 ns input pulse is suppressed while a 3 ns
pulse propagates to the output after the gate's propagation delay.
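A simplified sketch of inertial filtering over a waveform given as (time, value) change events; the threshold and delay values are illustrative, and pulse width is approximated as the time until the next change:

```python
# Inertial delay (simplified): input changes that persist for less than
# `threshold` are filtered; surviving changes appear after `delay`.
def inertial(changes, threshold, delay, t_end):
    out = []
    last_kept = None
    for i, (t, v) in enumerate(changes):
        nxt = changes[i + 1][0] if i + 1 < len(changes) else t_end
        if nxt - t >= threshold:             # pulse wide enough to survive
            if last_kept is None or v != last_kept:
                out.append((t + delay, v))
                last_kept = v
    return out

# 1 ns pulse (5..6) is swallowed; 3 ns pulse (10..13) propagates, delayed 2.
wave = [(0, 0), (5, 1), (6, 0), (10, 1), (13, 0)]
filtered = inertial(wave, threshold=2, delay=2, t_end=20)
# filtered == [(2, 0), (12, 1), (15, 0)]
```

This is the filtering behavior a delay-test generator must account for: a launched transition has to stay wide enough at every gate along the path to survive each inertial threshold.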
Summary: Timing models (especially inertial/transport and path-delay models) are essential
for realistic test generation and to find defects that only show up as timing violations in silicon.
SDF annotations are typically used to feed timing-accurate simulation for ATPG and sign-off.
Q.4 — OR alternative
OR (alternative subquestions)
(b) Write a VHDL/Verilog code and test bench for 1 X 4 demux. (4 marks)
Verilog is used here (it is more commonly used in GTU labs): module and testbench.
module demux1x4 (
input wire d,
input wire [1:0] sel,
input wire en,
output reg [3:0] y
);
always @(*) begin
if (!en) begin
y = 4'b0000;
end else begin
case (sel)
2'b00: y = 4'b0001; // route d to y0
2'b01: y = 4'b0010; // route d to y1
2'b10: y = 4'b0100; // route d to y2
2'b11: y = 4'b1000; // route d to y3
default: y = 4'b0000;
endcase
// multiply by 'd' so if d=0 outputs all 0
y = y & {4{d}};
end
end
endmodule
module tb_demux1x4;
reg d, en;
reg [1:0] sel;
wire [3:0] y;
demux1x4 dut (.d(d), .sel(sel), .en(en), .y(y));
initial begin
$display("time\ten\tsel\td\ty");
$monitor("%0t\t%b\t%b\t%b\t%b",$time,en,sel,d,y);
// Initialize
en = 0; d = 0; sel = 2'b00;
#5 en = 1; d = 1; sel = 2'b00; // expect y = 0001
#10 sel = 2'b01; // expect y = 0010
#10 sel = 2'b10; // expect y = 0100
#10 sel = 2'b11; // expect y = 1000
#10 d = 0; sel = 2'b10; // expect y = 0000 (d=0)
#10 en = 0; // expect y = 0000 (disabled)
#10 $finish;
end
endmodule
(This runs in any Verilog simulator; a VHDL version would follow the same structure.)
(c) Draw and explain flowchart indicating steps to do concurrent fault simulation. (7
marks)
Concurrent fault simulation simulates the good circuit and many faulty circuits in a single pass
over the test vectors, and is widely used for fault grading. Each gate carries a concurrent fault
list: copies of the gate ("bad gates") whose inputs or output differ from the good circuit under
some fault. Only these divergent copies are re-evaluated, so effort is spent only where faulty
behavior differs from good behavior.
Typical flow (for the flowchart): build the fault list → initialize good-circuit simulation →
for each test vector: simulate the good circuit event-driven; at each event, update the
concurrent fault lists (insert newly diverging faults, delete converging ones); mark faults
whose effect reaches a primary output as detected → drop detected faults from further
simulation → repeat until vectors or faults are exhausted → report fault coverage.
Q.5
(a) What is meant by functional coverage? How is it useful in verification flow? (3 marks)
Functional coverage measures whether the functional behaviors and scenarios specified in
the design requirements have been exercised by verification tests (directed or random). Unlike
code coverage that measures exercised lines of source code, functional coverage measures
exercised features and requirements (e.g., states visited, combinations of control signals,
corner-case conditions).
Components:
Coverage items: Variables, cross coverage, bins for ranges, state transitions.
Coverpoints: Specify which functional aspects to monitor.
Cross coverage: Combines multiple coverpoints to ensure combinations are tested.
Usefulness: Directs test writing toward unexercised scenarios (coverage holes), quantifies
verification progress against the verification plan, and provides an objective criterion for
coverage-driven sign-off.
Example: For an ALU, coverpoints might include operation types (add, sub, shift), operand
sign combinations, overflow conditions; cross coverage could ensure operations combined with
operand ranges are tested.
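The ALU example can be mirrored by a tiny cross-coverage collector that bins observed (operation, operand-sign) pairs; all names and bins are illustrative:

```python
from itertools import product

# Functional cross coverage: which (op, operand-sign) combinations were hit?
ops = ["add", "sub", "shift"]
signs = ["pos_pos", "pos_neg", "neg_pos", "neg_neg"]
all_bins = set(product(ops, signs))

def cross_coverage(observed):
    hit = set(observed) & all_bins
    holes = all_bins - hit            # untested scenarios -> write new tests
    return 100.0 * len(hit) / len(all_bins), holes

pct, holes = cross_coverage([
    ("add", "pos_pos"), ("add", "neg_neg"),
    ("sub", "pos_neg"), ("add", "pos_pos"),   # duplicate hits count once
])
# pct == 25.0; the holes set drives the next round of directed tests.
```

In SystemVerilog the same idea is expressed with covergroup/coverpoint/cross constructs; the collector above only illustrates the bookkeeping.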
(b) What is code coverage? Explain its role in verification. (4 marks)
Code coverage measures which portions of the HDL code are executed during simulation.
Typical metrics:
1. Line coverage / Statement coverage: Percentage of lines/statements executed.
2. Toggle/Expression coverage: Whether expressions have evaluated to both 0 and 1.
3. Branch coverage / Decision coverage: Whether each branch of if/else, case items has
been executed.
4. Condition coverage: Whether each atomic condition in a compound expression
independently affects outcome.
5. FSM coverage / State coverage: Whether each state and transition in state machines was
exercised.
Role in verification:
Detect dead code / uncovered logic: Unused or unreachable code can indicate bugs or
design problems.
Help identify missing tests: Areas with low code coverage indicate tests that should be
added.
Quality metric for regression: Track improvement or degradation in test suites.
Complement functional coverage: While functional coverage ensures requirement
coverage, code coverage ensures source-level code paths have been executed.
Limitations:
High code coverage does not guarantee functional correctness (you can execute lines
without checking correct outputs).
Must be used with assertions and functional coverage for a complete verification strategy.
(c) Compare White box verification, Black box verification and Grey box verification. (7
marks)
Black box verification: The verifier knows only the specification and the interfaces; stimulus
is applied at the inputs and only the outputs are checked. It is independent of the
implementation, so tests survive RTL changes, but deep internal corner cases are hard to reach.
Example: driving bus transactions into the DUT and checking responses against a reference
model.
White box verification: The verifier has full knowledge of the internal structure; tests and
checks target internal signals, FSM states, and paths (e.g., assertions on internal registers). It
reaches deep corner cases quickly but is tightly coupled to the implementation. Example:
directly checking every FSM state transition and internal FIFO pointer.
Grey box verification: A mix of both; stimulus is mainly interface-driven (black box), but
internal knowledge is used for coverage monitors, assertions, or targeted scenarios. Example:
black-box transaction stimulus combined with functional coverage points on internal FIFO
occupancy.
Q.5 — OR alternative
If answering the OR set for Q.5:
(a) Differentiate between static hazard and dynamic hazard. (3 marks)
Static hazard:
The output momentarily glitches once (e.g., 1→0→1, a static-1 hazard, or 0→1→0, a static-0
hazard) when a single input change should have left the output constant; caused by unequal
delays along reconvergent paths. Mitigated by adding redundant consensus (cover) terms.
Dynamic hazard:
Involves multiple glitches: the output changes several times before settling (e.g., 1→0→1→0
when a single 1→0 transition was intended). Caused by three or more unequal-delay paths to
the output.
Dynamic hazard can be seen as repeated static hazards causing multiple transitions.
Mitigation: Add consensus terms (for combinational logic) or ensure hazard-free logic
design, or careful timing balancing.
Key difference: Static hazard is a single glitch for an input change that should not change
output; dynamic hazard is multiple glitches causing several transitions before final value.
Comparison: A static hazard produces one unwanted glitch when the output should stay
constant; a dynamic hazard produces multiple transitions when the output should change
exactly once; static hazards need at least two unequal-delay paths, dynamic hazards at least
three; both are removed by hazard-free (consensus-based) logic design or delay balancing.
Verification flow (typical steps):
1. Requirement / Spec capture: Formalize feature list and constraints; derive verification
goals & acceptance criteria.
2. Verification Plan: Define coverage metrics (functional & code), test strategy (directed,
constrained random), tools, schedule, responsibilities.
3. Testbench Architecture: Create modular, reusable testbench (UVM or self-built) with
stimulus generators, drivers, monitors, scoreboards, and checkers.
4. Environment Setup: Compile RTL, create harness, integrate testbench components,
enable automation (regression scripts).
5. Create Tests:
o Directed tests for critical scenarios.
o Constrained-random tests to explore large state space.
o Formal properties for exhaustive checks on small modules.
6. Run simulation & collect results: Functional outputs, logs, waveform captures.
7. Coverage Collection:
o Functional coverage to see if requirements/scenarios are exercised.
o Code coverage (line/branch/statement/condition).
o Toggle / Path coverage for power and timing insights.
8. Analyze coverage & debug failures: For coverage holes, write new tests or update
constraints; for mismatches, debug and fix RTL or testbench.
9. Regression & Sign-off: Automate nightly/continuous runs; reach coverage goals and
sign-off.
10. Emulation / FPGA prototyping: For system-level or performance tests, move to
hardware emulators or prototypes.
11. Final sign-off and tape-out: Proceed after verification closure and sign-off.
Explanation:
Verification is iterative: tests reveal bugs → fix RTL/testbench → re-run regressions and
re-check coverage.
The scoreboard is used to compare expected vs actual results.
Assertions (SVA/PSL) are used for run-time checking of protocol/temporal properties.
Regression is essential for tracking stability across code changes.