Delay Calculation-1

The document outlines the fundamental concepts of Static Timing Analysis (STA), focusing on delay calculation methods, timing arcs, and the impact of various factors such as cell and wire delays. It details the definitions and parameters associated with timing constraints, including setup and hold times, as well as the differences between valid, leakage, and don't care paths. Additionally, it introduces delay models like the Non-Linear Delay Model (NLDM) and Composite Current Source (CCS) model, emphasizing their applications in digital circuitry design.

Uploaded by

p93848155

STA Basic Concepts

{ Concepts } + { Technique }

Ahmed Abdelazeem

02 Delay Calculation

✓ Timing Graph/Arcs/Sense/Path
✓ Cell Delay Model
✓ Wire Delay Model
✓ RC tree delay algorithm
✓ Analysis Mode
✓ GBA & PBA
✓ Parasitic Scaling & Timing Derating

Path Delay: Basic Approach

This illustration shows the calculation of the path delays.

Simple Design Showing Timing Arcs and Timing Paths

[Figure: a path from the clock to a flop, annotated with cell timing arcs and net timing arcs; the arc delays along the path are 2, 1, 1, 3, 0, 4, 1, 4, 0.]
Path delay = 2 + 1 + 1 + 3 + 0 + 4 + 1 + 4 + 0 = 16 time units
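
The sum above can be reproduced with a few lines of Python. This is only a sketch: the list holds the arc delays from the slide in path order, and the function name `path_delay` is a hypothetical helper, not part of any tool.

```python
# Arc delays (cell and net arcs) along the path, copied from the slide.
arc_delays = [2, 1, 1, 3, 0, 4, 1, 4, 0]


def path_delay(delays):
    """Return the path delay as the plain sum of its arc delays."""
    return sum(delays)


print(path_delay(arc_delays))  # 16 time units
```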

Delay Dependencies

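Cell delay (td) depends on:

td ~ CL (output load capacitance), e.g. Cload from 10 fF to 50 fF
td ~ trise (input transition time), e.g. trise from 10 ps to 120 ps
td ~ Process variations: TT, FF, SS, etc.
td ~ Voltage: ±10%
td ~ Temperature: -40 °C to 125 °C

Example operating condition from the figure: P: SS, V: 0.9, T: -40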

Timing Graph

Schematic
The logic connectivity is established after the netlist is read in and the design is linked.

Timing Graph
Represent the design as a node graph. The ports and pins in the design become the nodes in the graph, and the timing arcs become the connections between the nodes.

What Are Timing Arcs?

A timing arc is an imaginary arc that represents a single causal relationship.

If a change on an input causes a change on the output, it is known as a causal relationship.
Timing arcs provide a simple understanding of the structure and functionality of a gate.

Rising and falling timing arc delays across a gate are not always symmetric and are listed separately in a library.

[Figure: timing arc of an inverter from Input to Output; a rising input gives a falling output, and a falling input gives a rising output.]

Timing Arcs

Combinational Gate
Combinational logic cells have timing arcs from each input to each output.

Sequential Cell
Sequential cells have timing arcs from the clock to the outputs and timing constraints for the data pins with respect to the clock.
This is because, for a simple flop, a change in the output can only be caused by a change at the clock pin.

Timing Arcs: Cell and Net Delay
Delays encountered in digital circuitry are composed of two main components: cell delay and net delay.

Each stage delay (cell delay + net delay) represents the time required to propagate a signal from the input of one gate to the input of the next.

Cell Delay
Transistors within a cell take a certain amount of time to switch. Therefore, a change to the input of a cell takes time to cause a change to the output.

Net Delay
Net delay is the delay between the time a signal is first applied to a net and the time it reaches the other devices connected to that net.

[Figure: transistor-level representation of a cell with input A and output Y between VDD and VSS; the cell delay from A to Y is followed by the net (interconnect) delay to the next input.]

Timing Group Names
N. Parameter (symbol, unit; Liberty group name): definition

1. Rise transition time (tR, ns; rise_transition): The time it takes a driving pin to make a transition from k·VDD to (1-k)·VDD. Usually k = 0.1, i.e. from 0.1VDD to 0.9VDD (k = 0.2, 0.3, etc. are also possible).

2. Fall transition time (tF, ns; fall_transition): The time it takes a driving pin to make a transition from (1-k)·VDD to k·VDD. Usually k = 0.1, i.e. from 0.9VDD to 0.1VDD (k = 0.2, 0.3, etc. are also possible).

3. Propagation delay low-to-high (rise) (tPLH or tPR, ns; cell_rise): The time difference between the input signal crossing 0.5VDD and the output signal crossing 0.5VDD when the output signal is changing from low to high.

4. Propagation delay high-to-low (fall) (tPHL or tPF, ns; cell_fall): The time difference between the input signal crossing 0.5VDD and the output signal crossing 0.5VDD when the output signal is changing from high to low.

Timing Constraints: Timing Types

Setup/Hold, Recovery/Removal Constraints


N. Parameter (symbol, unit; Liberty timing types; applicability): definition. All figures measure at the 0.5VDD crossings.

1. Setup time (tSU, ns; setup_rising, setup_falling; only for flip-flops or latches): The minimum period for which the input data to a flip-flop or a latch must be stable before the active edge of the clock occurs.

2. Hold time (tH, ns; hold_rising, hold_falling; only for flip-flops or latches): The minimum period for which the input data to a flip-flop or a latch must remain stable after the active edge of the clock has occurred.

3. Removal time (tREM, ns; removal_rising, removal_falling; only for asynchronous Set or Reset): The minimum time for which the asynchronous Set or Reset pin of a flip-flop or latch must remain enabled after the active edge of the clock has occurred.

4. Recovery time (tREC, ns; recovery_rising, recovery_falling; only for asynchronous Set or Reset): The minimum time for which Set or Reset must be held stable after being deasserted before the next active edge of the clock occurs.

Timing Sense (unateness)

Positive Unate Arc


A rising transition on an input causes the output to rise or not change.
A falling transition on an input causes the output to fall or not change.
E.g. AND/OR gate

Negative Unate Arc


A rising transition on an input causes the output to fall or not change.
A falling transition on an input causes the output to rise or not change.
E.g. NAND/NOR gate

Non-unate Arc (state-dependent)


Output transition cannot be determined solely from the direction
of change of an input but also depends upon the state of the other
inputs. E.g. XOR gate
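
The three definitions can be checked mechanically against a gate's truth table. The following is a minimal sketch, assuming the gate is given as a Python function over 0/1 inputs; `timing_sense` is a hypothetical helper name, and an input that never affects the output is reported as positive unate here for simplicity.

```python
from itertools import product


def timing_sense(func, n_inputs, pin):
    """Classify the arc from input number `pin` to the output of `func`."""
    rises_with_input = False  # some state where input 0->1 makes output 0->1
    falls_with_input = False  # some state where input 0->1 makes output 1->0
    for others in product([0, 1], repeat=n_inputs - 1):
        low = list(others[:pin]) + [0] + list(others[pin:])
        high = list(others[:pin]) + [1] + list(others[pin:])
        out_low, out_high = func(*low), func(*high)
        if out_low < out_high:
            rises_with_input = True
        elif out_low > out_high:
            falls_with_input = True
    if rises_with_input and falls_with_input:
        return "non_unate"
    if falls_with_input:
        return "negative_unate"
    return "positive_unate"


print(timing_sense(lambda a, b: a & b, 2, 0))        # AND  -> positive_unate
print(timing_sense(lambda a, b: 1 - (a & b), 2, 0))  # NAND -> negative_unate
print(timing_sense(lambda a, b: a ^ b, 2, 0))        # XOR  -> non_unate
```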

Timing Path: valid/leakage/don’t care?

Valid Path
Real functional path

Leakage Path
False path due to nature of STA

Don’t Care Path


Static path or unintended path

Late and Early Path

Max Timing Path (Long Path / Late Path)
The path with the largest delay between two end points. -> for setup check

Min Timing Path (Short Path / Early Path)
The path with the smallest delay between two end points. -> for hold check

Drive Strength

Lower drive strength: slower, smaller footprint, higher resistance. Draws less current and has a slower slew rate, but consumes less power.
Higher drive strength: faster, larger footprint, lower resistance. Draws more current and thus has a faster slew rate, but consumes more power.

Drive Strength
The inverse of the pull-up/pull-down resistance. In general, cells are designed to have similar drive strength for the pull-up and pull-down structures.
When a CMOS cell switches state, the speed of switching is governed by how fast the capacitance on the output net can be charged or discharged.
Thus the path resistance is a major factor in determining the speed of a CMOS cell.

Cell Delay

Transition & Propagation


Slew Delay

Cell Delay Model

Approximation to Real Physics


Trade-off between Accuracy and Speed

Linear Delay Model

$T_{total} = T_{slope} + T_{intrinsic} + T_{transition} + T_{connect}$

• Slope Delay ($T_{slope}$)
- Delay caused by the transition time of the previous gate
• Intrinsic Delay ($T_{intrinsic}$)
- Delay of an element
• Transition time ($T_{transition}$)
- Delay introduced by capacitive load on driving pin
- $T_{transition} = R_{drive} \cdot \left( \sum_{pins} C_{pin} + C_{wire} \right)$
• Connect Delay ($T_{connect}$)
- Delay from the transition of the driving pin to the change at the endpoint

[Figure: a driving gate and a receiving gate connected by a net, annotated with Tslope, Tintrinsic, Ttransition and Tconnect.]
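
As an illustration of the linear model, the sketch below evaluates the transition term and the total delay. All numeric values are made-up placeholders (units assumed to be kΩ, pF and ns), and the function names are hypothetical.

```python
def transition_delay(r_drive, pin_caps, c_wire):
    """T_transition = R_drive * (sum of load pin capacitances + wire capacitance)."""
    return r_drive * (sum(pin_caps) + c_wire)


def total_delay(t_slope, t_intrinsic, t_transition, t_connect):
    """T_total as the sum of the four linear-model components."""
    return t_slope + t_intrinsic + t_transition + t_connect


# Hypothetical example values, not taken from any library.
t_tr = transition_delay(r_drive=2.0, pin_caps=[0.01, 0.02], c_wire=0.03)
t_total = total_delay(t_slope=0.05, t_intrinsic=0.10,
                      t_transition=t_tr, t_connect=0.02)
print(round(t_tr, 3), round(t_total, 3))
```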

Non-Linear Delay Model
$T_{total} = T_{propagation} + T_{transition} + T_{connect}$

Transition time ($T_{transition}$)
Delay introduced by capacitive load on driving pin (measured, not calculated)
Propagation delay ($T_{propagation}$)
Time from the 50 percent input pin voltage until the gate output just begins to switch (10 percent output voltage) (measured, not calculated)
Connect Delay ($T_{connect}$)
Delay from transition of the driving pin (estimated interconnect delay)

The transition time and propagation delay of each cell are measured beforehand and stored in the form of lookup tables.

[Figure: the same driver and receiver arrangement as for the linear model, annotated with Tslope, Tintrinsic, Ttransition and Tconnect.]

Nonlinear Delay Calculation Example
NLDM is a voltage-based delay calculation model and one of the most widely used models for representing the response characteristics of cells in libraries. It is simple, and tools need little time to obtain the cell response from it. The model uses two-dimensional tables to represent cell delay, output slew and other timing checks. The driver cell is modeled as a voltage source with a resistance in series (Thevenin model). The receiver is modeled as a load capacitor.

The NLDM table is a two-dimensional table.

Notice that the NLDM table is characterized under the condition that the output wire resistance is zero, since the actual load is unknown when the library is created.

[Figure: gates G1, G2 and G3 connected through net n1 with total load Ctotal, annotated with rise and fall transitions.]

NLDM

2 NLDM (non-linear delay model, LUT)

Cload: single capacitance model; depends only on the rise/fall, min/max arc condition.

* An important assumption: only one input switches at a time. Simultaneous switching of multiple inputs is too complex for the STA engine.

NLDM
[Figure: cell_rise or cell_fall lookup table drawn as a surface. Propagation delay (z axis) is plotted against input transition time, i.e. input slew (x axis), and output capacitance (y axis).]

Delay/Power are measured as a function of input slew and output cap.

Linear Delay model:


Delay = Intrinsic Delay + Slope_factor * Load(Cap)

NLDM - library

NLDM - interpolation

Step #1 – Solve for coefficients

Use the four pre-characterized library data points that surround the operating point to calculate the coefficients A, B, C, D for this particular timing arc (substitute X, Y, Z into the interpolation equation Z = A + B * X + C * Y + D * X * Y).
0.067 = A + B*0.064 + C*0.5 + D*0.064*0.5
0.071 = A + B*0.064 + C*1.0 + D*0.064*1.0
0.082 = A + B*0.128 + C*0.5 + D*0.128*0.5
0.087 = A + B*0.128 + C*1.0 + D*0.128*1.0
Step #2 – Interpolate the cell delay
Use the solved coefficients and the same equation to calculate the cell delay for the actual input transition and output load.
Cell delay = A + B*0.09 + C*0.67 + D*0.09*0.67
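
The two steps can be carried out numerically. The sketch below solves the four equations in closed form (valid because the four points sit on a rectangular grid) and then evaluates the cell delay. The function name `solve_coefficients` is a hypothetical helper; treating X as input transition and Y as output load follows the order used in Step #2 and is otherwise an assumption.

```python
def solve_coefficients(x0, x1, y0, y1, z00, z01, z10, z11):
    """Solve Z = A + B*X + C*Y + D*X*Y through four grid points.

    zij is the table value at (xi, yj).
    """
    d = (z00 - z01 - z10 + z11) / ((x1 - x0) * (y1 - y0))
    b = (z10 - z00) / (x1 - x0) - d * y0
    c = (z01 - z00) / (y1 - y0) - d * x0
    a = z00 - b * x0 - c * y0 - d * x0 * y0
    return a, b, c, d


# Step 1: coefficients from the four library points on the slide.
a, b, c, d = solve_coefficients(0.064, 0.128, 0.5, 1.0,
                                0.067, 0.071, 0.082, 0.087)

# Step 2: evaluate at X = 0.09, Y = 0.67.
cell_delay = a + b * 0.09 + c * 0.67 + d * 0.09 * 0.67
print(round(cell_delay, 4))  # about 0.0746
```

With these inputs the coefficients come out as A = 0.049, B = 0.21875, C = 0.006 and D = 0.03125.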

NLDM (C effective)

Resistive Shielding
The output capacitance seen from the drive point is effectively less than the total capacitance of the wire. Near-end capacitance is charged more quickly than far-end capacitance because of the wire resistance.

Effective Capacitance
When wire resistance is no longer negligible, the effective capacitance has to be used for calculating the delay through the driving cell. The single capacitance model depends only on the rise/fall, min/max arc condition.

NLDM – path delay

Composite Current Source Model (CCS)

3 CCS (composite current source)

C1 C2
Network model

⚫ The driver model uses a time-varying current source.


⚫ The receiver model consists of 2 different capacitors.
The first one is used as load up to the input delay threshold. A second capacitance value is used when the input
waveform reaches the threshold value.

⚫ CCS models are frequently used in advanced technology nodes.

CCS model

2 CCS (composite current source)

Cload: voltage-dependent also depends on input slew/output capacitance

CCS Driver Model (conventional)

* Above is driver model only

CCS Driver Model (compact)

Conventional CCS Modeling
Current vs. time and voltage vs. time; too many sample points.

With Base Curves and Reconstruction
Models the I-V curve, which is smoother. Only 6 parameters are needed to model one transition. The waveform is reconstructed using pre-characterized "base curves".

CCS Driver Model (compact)

Base Curves Lookup Table


Template

CCS Driver Model (compact)
Compact Driver Model
*Application variable rc_driver_model_mode -> advanced means the tool is using the CCS model

Index 3: six essential data
initial current - 1.24e+01
peak current - 2.47e+01
peak voltage - 2.86e-01
peak time - 4.46e-02
left base curve ID - 799
right base curve ID - 6920

CCS Receiver Model

Compact Load Model

- Improved accuracy for both delay and slew calculation
- Considers non-linear effects such as the Miller effect

*Application variable rc_receiver_model_mode -> advanced means the tool is using the CCS model

Synopsys Liberty Format (.lib)
library (Digital_Std_Lib) {
  technology (cmos);
  delay_model : table_lookup;
  cell (AND2) {
    area : 2;
    pin (A) {
      direction : input;
    }
    pin (B) {
      direction : input;
    }
    pin (Z) {
      direction : output;
      function : "A*B";
      timing () {
        related_pin : "A";
        timing_type : "combinational";
        cell_rise (…) {
          index_1 ("0.016, 0.032, 0.064");
          index_2 ("2, 4");
          values ("1.0020, 1.1280, 3.547", \
                  "1.0080, 1.1310, 3.847");
        }
      }
    } /* end of pin */
  } /* end of cell */
} /* end of library */

[Figure: the cell_rise lookup table of td versus trise (0.016, 0.032, 0.064) and CL (2, 4), shown as a table and as a surface plot.]

Cell Timing Data
library () {
  lu_table_template ("del_1_7_7") {
    variable_1 : "input_net_transition";
    index_1 ("1, 2, 3, 4, 5, 6, 7");
    variable_2 : "total_output_net_capacitance";
    index_2 ("1, 2, 3, 4, 5, 6, 7");
  }

  cell (INVX1) {
    pin (Y) {
      timing () {
        related_pin : "A";
        timing_type : "combinational";
        timing_sense : "negative_unate";
        /* rows follow index_1 (input transition), columns follow index_2 (load) */
        cell_rise ("del_1_7_7") {
          index_1 ("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
          index_2 ("0.1, 0.25, 0.5, 1, 2, 4, 8");
          values ("0.016861, 0.0179019, 0.0195185, 0.0229259, 0.029658, 0.043145, 0.07712", \
                  "0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211", \
                  "0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339", \
                  "0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571", \
                  "0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640", \
                  "0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010", \
                  "0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452");
        }
      } /* end of timing */
    } /* end of pin */
  } /* end of cell */
} /* end of library */
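
To show how a tool might read such a table, the sketch below locates the grid cell that brackets a given input transition and load and interpolates bilinearly inside it. It assumes rows follow index_1 (input transition) and columns follow index_2 (load); the names `bracket` and `lookup` are hypothetical, and points outside the table are extrapolated from the nearest edge cell, which a real tool may handle differently.

```python
from bisect import bisect_right

SLEW = [0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024]  # index_1
LOAD = [0.1, 0.25, 0.5, 1, 2, 4, 8]                       # index_2
CELL_RISE = [  # values copied from the INVX1 cell_rise table
    [0.016861, 0.0179019, 0.0195185, 0.0229259, 0.029658, 0.043145, 0.07712],
    [0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211],
    [0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339],
    [0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571],
    [0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640],
    [0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010],
    [0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452],
]


def bracket(axis, value):
    """Return i so that axis[i] and axis[i + 1] bracket value (clamped to the table)."""
    i = bisect_right(axis, value) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(slew, load):
    """Bilinear interpolation of CELL_RISE at (slew, load)."""
    i, j = bracket(SLEW, slew), bracket(LOAD, load)
    tx = (slew - SLEW[i]) / (SLEW[i + 1] - SLEW[i])
    ty = (load - LOAD[j]) / (LOAD[j + 1] - LOAD[j])
    low = CELL_RISE[i][j] + ty * (CELL_RISE[i][j + 1] - CELL_RISE[i][j])
    high = CELL_RISE[i + 1][j] + ty * (CELL_RISE[i + 1][j + 1] - CELL_RISE[i + 1][j])
    return low + tx * (high - low)


print(lookup(0.032, 0.25))  # a grid point: returns the table entry itself
print(lookup(0.05, 0.75))   # a point inside the table
```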

Topic 7: State-Dependent Delay (conditional timing)

Here are the rules for conditional timing:

1. The when statement specifies the condition. If the condition expression evaluates to true, the timing values in that case are active and the default case is disabled.

2. If a state in the condition cannot be determined, so that the entire condition is undetermined (X), the condition is still evaluated as true.

3. The timing engine picks the worst active timing value for this particular timing arc.

4. To disable a particular conditional timing, its condition must be forced to a known false state.

E.g.

1. There are three cases in parallel, as shown on the left: B is true, B is not true, or don't care (the default).

2. If no case analysis is set for input B, all three cases are active and are taken into account by the timing engine, which picks the worst case for the calculation. Usually the worst case is the default case.

3. If case analysis makes B constant at one value, only one of the first two cases is picked and used during calculation. The default case is also disabled.
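
The selection behavior in the example can be sketched as follows. This is a simplified illustration, assuming each conditional arc depends only on input B and that "worst" means the largest delay (as in a max-delay analysis); the delay values and the names `active_delays` and `worst_delay` are hypothetical.

```python
def active_delays(conditional_arcs, default_delay, b_value):
    """Return the delays of all arcs that are active for the given state of B.

    conditional_arcs: list of (required value of B, delay)
    b_value: True or False when set by case analysis, None when unknown (X)
    """
    if b_value is None:
        # Unknown state: every condition counts as true, default stays active.
        return [delay for _, delay in conditional_arcs] + [default_delay]
    # Known state: only the matching condition is active, default is disabled.
    return [delay for required, delay in conditional_arcs if required == b_value]


def worst_delay(conditional_arcs, default_delay, b_value):
    """Pick the worst (largest) active delay for the arc."""
    return max(active_delays(conditional_arcs, default_delay, b_value))


arcs = [(True, 0.12), (False, 0.10)]  # when B, when !B (made-up delays)
print(worst_delay(arcs, 0.15, None))   # no case analysis: default wins here
print(worst_delay(arcs, 0.15, True))   # B tied to 1
print(worst_delay(arcs, 0.15, False))  # B tied to 0
```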

NET “Wire”
Net is a wire connecting pins of standard cells and blocks. It:
• Has only one driver
• Can drive a number of fanout cells or blocks
• Can travel on multiple metal layers of the chip
• Can be broken up into segments for equivalent electrical representation

Interconnect Parasitics
Interconnect resistance - resistance between the output pin of a cell and the input pins of the fanout cells
Interconnect capacitance - comprises capacitance to ground and capacitance between neighboring signal routes
Interconnect inductance - arises due to current loops; the effect of inductance can be ignored at low frequencies

Overview
[Figure: a block with inputs I1, I2 and outputs O1, O2 shown in three views: the cells, their input capacitances, and the net parasitics.]

RC Tree of Interconnect

An interconnect of length L is modeled as a distributed RC tree:

Rtotal = Runit * L
Ctotal = Cunit * L

T - model

Rtotal is broken into two parts
Ctotal is connected between them

𝜋- model
Ctotal is broken into two parts
Rtotal is connected between them


N Section Representation
Breaking Rtotal and Ctotal into multiple sections increases accuracy

T model:
Series resistors: Rtotal/(2N), Rtotal/N, ..., Rtotal/N, Rtotal/(2N)
Shunt capacitors (one between each pair of adjacent resistors): Ctotal/N each

π model:
Series resistors (one between each pair of adjacent capacitors): Rtotal/N each
Shunt capacitors: Ctotal/(2N), Ctotal/N, ..., Ctotal/N, Ctotal/(2N)
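
The element values of both N-section models can be generated and sanity-checked in a few lines. This is a sketch under stated assumptions: series resistors of Rtotal/N in the π model, an ideal source, and no receiver load; all function names are hypothetical. It also evaluates the Elmore delay to the far end, which comes out as Rtotal*Ctotal/2 for both models at any N. Elmore delay captures only the first moment of the response, so this check does not by itself show the accuracy gained from more sections.

```python
def t_model(r_total, c_total, n):
    """Series resistors and shunt capacitors of an N-section T model."""
    resistors = [r_total / (2 * n)] + [r_total / n] * (n - 1) + [r_total / (2 * n)]
    capacitors = [c_total / n] * n  # capacitor k hangs after resistor k
    return resistors, capacitors


def pi_model(r_total, c_total, n):
    """Series resistors and shunt capacitors of an N-section pi model."""
    resistors = [r_total / n] * n
    capacitors = [c_total / (2 * n)] + [c_total / n] * (n - 1) + [c_total / (2 * n)]
    return resistors, capacitors  # capacitor 0 hangs at the driver end


def elmore_t(r_total, c_total, n):
    """Elmore delay to the far end: each cap times the resistance upstream of it."""
    rs, cs = t_model(r_total, c_total, n)
    return sum(c * sum(rs[:k + 1]) for k, c in enumerate(cs))


def elmore_pi(r_total, c_total, n):
    rs, cs = pi_model(r_total, c_total, n)
    return sum(c * sum(rs[:k]) for k, c in enumerate(cs))


for n in (1, 4, 10):
    print(n, elmore_t(100.0, 2.0, n), elmore_pi(100.0, 2.0, n))
```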

Elmore Delay
Elmore delays are used to calculate delay of RC trees, which
Have single input node
Do not have resistive loops
Have capacitances only coupled to ground

Elmore Delay
Elmore delay equation is:

1 2 i-1 i N

For example,
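
For an RC ladder with nodes 1 to N, the commonly quoted form of the Elmore delay to node N is given below. The symbols are assumptions introduced for illustration: $R_j$ is the resistor feeding node $j$ and $C_i$ is the grounded capacitor at node $i$.

```latex
T_{D_N} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j
% Two-node case (N = 2):
T_{D_2} = R_1 C_1 + (R_1 + R_2)\, C_2
```

Each capacitor is weighted by the total resistance between the input node and that capacitor.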

Wire Load: Parasitic Effects
Parasitics are inevitable
Not known without layout
Delay and area will be incorrect/optimistic without estimation of parasitics

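[Figure: a gate with inputs a, b, c and output y. The wire segments have delays t2a, t2b, t2c and capacitances C1, C2, C3, which add to the load CL, so that ttotal = f(t2a, t2b, t2c, C1, C2, C3). With parasitics, Delay > 0 and Area > 0. A companion plot shows td versus trise and CL.]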

Estimating Parasitic Devices
Generalization

▪ All parasitics depend on interconnect length
▪ The larger the fanout (output connections), the greater the length
▪ The larger the chip (the more gates it has), the greater the length

Length components

R = length ∙ R_unit_length
C = length ∙ C_unit_length
Area = length ∙ Area_unit_length

length = f(gate count, fanout)

Wire Delay Model

Estimation? Extraction?
Parasitic values can be estimated or extracted, depending on whether the actual wires have been routed.

RC Tree

RC Tree Topology
Single input node
No resistive loops
All capacitances are grounded

Prelayout: Wire Delay Model v.s. Topographical

Wire Load Model Topographical

Prelayout: Wire Load Model

wire_load (name) {
area : float ;
capacitance : float ;
resistance : float ;
slope : float ;
{fanout_length (fanout, length) ;
...
}
}
wire_load_selection (name) {
{wire_load_from_area (min_area1,max_area1,wire_load_name1);
...}
}
default_wire_load : name ;
default_wire_load_selection : name ;
default_wire_load_area : float ;
default_wire_load_capacitance : float ;
default_wire_load_resistance : float ;
default_wire_load_mode : top | segmented | enclosed ;

Library: Wire Load Example

  wire_load ("8000") {
    capacitance : 0.024051;
    resistance : 1.860000e-03;
    area : 0.010000;
    slope : 30.285426;
    fanout_length ("1", "8.2750360");
    fanout_length ("2", "18.4914880");
    fanout_length ("3", "29.3531220");
  }

  default_wire_load : "8000";
  default_wire_load_selection : "predcaps";

  wire_load_selection (predcaps) {
    wire_load_from_area (0.000000, 200.000000, "4000");
    wire_load_from_area (200.000000, 8000.000000, "8000");
    wire_load_from_area (8000.000000, 16000.000000, "16000");
  }
} /* end of library */
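
The "8000" wire load model can be turned into estimated net parasitics as sketched below. The length comes from the fanout_length table, and fanouts beyond the last table entry are extrapolated with slope; resistance, capacitance and area are per-unit-length values multiplied by that length. The function names are hypothetical and the fanout is assumed to be a positive integer.

```python
# Values copied from the "8000" wire_load group.
FANOUT_LENGTH = {1: 8.2750360, 2: 18.4914880, 3: 29.3531220}
SLOPE = 30.285426            # extra length per fanout beyond the table
CAP_PER_LENGTH = 0.024051
RES_PER_LENGTH = 1.860000e-03
AREA_PER_LENGTH = 0.010000


def wire_length(fanout):
    """Estimated net length for a given fanout."""
    if fanout in FANOUT_LENGTH:
        return FANOUT_LENGTH[fanout]
    last = max(FANOUT_LENGTH)
    return FANOUT_LENGTH[last] + (fanout - last) * SLOPE


def net_parasitics(fanout):
    """Return (resistance, capacitance, area) estimated for the net."""
    length = wire_length(fanout)
    return (length * RES_PER_LENGTH,
            length * CAP_PER_LENGTH,
            length * AREA_PER_LENGTH)


print(wire_length(2))      # taken directly from the table
print(wire_length(5))      # 29.353122 + 2 * 30.285426
print(net_parasitics(5))
```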

Prelayout: Wire Load Model

Library: Wire Load Table

wire_load_table (name) {
fanout_length(fanout, length);
fanout_capacitance(fanout, capacitance);
fanout_resistance(fanout, resistance);
fanout_area(fanout, area);
...
}

• It is practical to set the default_wire_load_mode to enclosed or segmented instead of top.
• If the default_wire_load_mode is set to top, all nets in both the top design and its subblocks use the wire load model selected from the area of the top design.
• If the default_wire_load_mode is set to enclosed, each net uses the wire load model selected from the area of the smallest subdesign that fully encloses it.
• If the default_wire_load_mode is set to segmented, a net that crosses hierarchical boundaries is split into segments, and each segment uses the wire load model selected from the area of the subdesign that contains it.

Prelayout: Wire Load Model

Best-case Tree Balanced Tree Worst-case Tree

Interconnect Trees: Best-case Tree
Assumed that:
Load pin is physically adjacent to the driver
None of the wire resistance is in the path to the destination pin
Wire capacitance and pin capacitances act as load on the driver pin

[Figure: best-case tree with driver (Rdrive, Cout), wire (Rwire, Cwire) and three load pins of capacitance C1 each.]

Interconnect Trees: Balanced Tree
Assumed that:
Each destination pin is on a separate portion of the interconnect
Each path has equal portion of the total wire resistance and capacitance

[Figure: balanced tree with driver (Rdrive, Cout) and three branches, each ending in a load pin of capacitance C1.]

Interconnect Trees: Worst-case Tree
Assumed that:
All the destination pins are together at the far end of the wire
Each destination pin sees the total wire resistance and the total wire capacitance

[Figure: worst-case tree with driver (Rdrive, Cout), the full wire (Rwire, Cwire) and three load pins of capacitance C1 each at the far end.]
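
The three tree assumptions lead to different net delays for the same wire. The sketch below follows the slide wording literally: no wire resistance in the path for the best case, an equal share of R and C per branch for the balanced case, and the full R and C in front of all pins for the worst case. Lumping the wire capacitance after the resistance is an assumption of this sketch; a tool may distribute it differently (for example with a π model), which changes the constants. Function names and values are hypothetical.

```python
def best_case_delay(r_wire, c_wire, pin_caps):
    """No wire resistance lies between the driver and the load pins."""
    return 0.0


def balanced_delay(r_wire, c_wire, pin_caps, pin):
    """Each of the N branches carries 1/N of the wire resistance and capacitance."""
    n = len(pin_caps)
    return (r_wire / n) * (c_wire / n + pin_caps[pin])


def worst_case_delay(r_wire, c_wire, pin_caps):
    """Every pin sees the total wire resistance and the total capacitance."""
    return r_wire * (c_wire + sum(pin_caps))


pins = [0.1, 0.1, 0.1]  # three load pins, made-up values
print(best_case_delay(0.5, 0.2, pins))
print(balanced_delay(0.5, 0.2, pins, 0))
print(worst_case_delay(0.5, 0.2, pins))
```

In all three cases the driver still sees the total wire and pin capacitance as its load; only the net delay differs.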

Post-layout: Parasitic & SPEF

Topic 8: Parasitic Annotation Issue

Common reasons for partial annotation

• Floating metal pieces
• Dangling pin/port in RTL
• Open nets

Consequences of annotation issues

• The tool could assume an unrealistic delay value and create false violations.
• Timing results cannot be trusted on the problematic nets that have annotation issues.
• Other nets might also be impacted, so the accuracy of timing analysis degrades.

[Figures: a floating metal piece; a tiny open.]

Elmore Delay Calculation

Application of Elmore Delay calculation

• For pre-route databases that do not have parasitics extracted yet.
• When analysis time is a concern and fast turnaround time is wanted.
• Can switch to AWE calculation if better accuracy and correlation with post-route are needed, but at the cost of runtime.

Arnoldi Delay Calculation

Application of Arnoldi Delay calculation

• For post-route databases with parasitics fully extracted. Usually used in APR tools.
• Better RC delay calculation accuracy at the cost of runtime.
• Used where the driver resistance is much less than the impedance of the net to ground. (Where the net resistance is small compared to the drive resistance, Elmore delay provides enough accuracy.)

Topic 9: report_delay_calculation

Since this analysis uses a WLM, we set a timing derate to safeguard the margin.
In this report, all datapath cell delays are enlarged by a factor of 1.35.

The calculated fall cell delay is 0.0631. Multiplying it by the derating factor 1.35 gives 0.0852, or 0.085 after rounding, which matches the timing report.

Slew Degradation

Slew Degradation
Slowdown of a signal's slew, caused by wire resistance, as the signal travels along the wire

Topic 10: Zero Interconnect Mode (Design Compiler)

Zero Interconnect, No slew degradation


- All Resistance and capacitance are treated as zero, no slew degradation along the wire
- Driver resistance and receiver pin cap are still kept non-zero

Slew Merge

Max Path Delay Calculation
The worst slew must be chosen and propagated forward: choose the slowest slew to propagate downstream.

Min Path Delay Calculation
The worst slew must be chosen and propagated forward: choose the fastest slew to propagate downstream.
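
The merge rule can be written as a small function. This is a sketch: slews are expressed as transition times, so the slowest slew is the largest value, and the name `merge_slew` and the sample values are hypothetical.

```python
def merge_slew(slews, analysis):
    """Pick the slew to propagate downstream at a slew merge point.

    slews: transition times arriving at the merge point
    analysis: "max" for max path delay, "min" for min path delay
    """
    if analysis == "max":
        return max(slews)  # slowest transition is worst for late paths
    if analysis == "min":
        return min(slews)  # fastest transition is worst for early paths
    raise ValueError("analysis must be 'max' or 'min'")


arriving = [0.12, 0.30, 0.18]  # made-up transition times in ns
print(merge_slew(arriving, "max"))  # 0.3
print(merge_slew(arriving, "min"))  # 0.12
```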

Max delay & Min delay

Max Delay Arc
"Max" stimuli are used for cell arc delay calculation:
1) Maximum annotated lumped capacitive wire load is used. → max SPEF
2) Maximum pin capacitance or receiver model is used. → max library
3) Maximum slew propagation is performed at slew merge points. → max slew rate

Min Delay Arc
"Min" stimuli are used for cell arc delay calculation:
1) Minimum annotated lumped capacitive wire load is used. → min SPEF
2) Minimum pin capacitance or receiver model is used. → min library
3) Minimum slew propagation is performed at slew merge points. → min slew rate

Analysis Mode

Single Mode
Same delay across the entire design.
* Setup check:
> Launch path: slowest path through max-delay arc, single operating condition, no derating
> Capture path: fastest path through max-delay arc, single operating condition, no derating
* Hold check:
> Launch path: fastest path through max-delay arc, single operating condition, no derating
> Capture path: slowest path through max-delay arc, single operating condition, no derating
Best Case & Worst Case (bc_wc)
Two extreme PVT corners: one corner for setup checks and the other for hold checks.
* Setup check:
> Launch path: slowest path through max-delay arc, worst-case operating condition, late derating
> Capture path: fastest path through max-delay arc, worst-case operating condition, early derating
* Hold check:
> Launch path: fastest path through min-delay arc, best-case operating condition, early derating
> Capture path: slowest path through min-delay arc, best-case operating condition, late derating

Single Analysis Mode
The single mode is essentially the max delay-only mode.
Only the max corner library information is used.
Only constraints related to -max (or unspecified) are used.
Only a single delay calculation pass is made.
Only the slowest slews are propagated.

One library
One corner

[Figure: clock root driving FF1 and FF2.]

Best-Case Worst-Case Analysis Mode
Using both BC and WC libraries together is one way to model off-chip variation of delays due to external temperature and voltage variations.
Setup analysis uses Max delay information for both clock and data.
Hold analysis uses Min delay information for both clock and data.
When is BCWC used?
The BCWC mode has often been the default for early steps of the implementation flow: preCTS, preRoute. Lately, OCV is appearing earlier in the flow.

[Figure: clock root with the launch clock path to FF1 and the capture clock path to FF2. For hold, the early and late paths both use the BC (min delays) library; for setup, the late and early paths both use the WC (max delays) library.]

Analysis Mode (cont’d)

On-chip Variation (OCV)


The min/max delays and slews establish the ranges for possible delays and slews.
The actual delays and slews on the chip could be anywhere between these min/max bounds.
* Setup check:
> Launch path: slowest path through max-delay arc, worst-case operating condition, late derating
> Capture path: fastest path through min-delay arc, best-case operating condition, early derating
* Hold check:
> Launch path: fastest path through min-delay arc, best-case operating condition, early derating
> Capture path: slowest path through max-delay arc, worst-case operating condition, late derating
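
How the late and early derates enter the two checks can be sketched with a simplified slack calculation. The assumptions are loud ones: a single flop-to-flop path, one nominal delay per segment scaled by a flat late or early derate, and no clock reconvergence pessimism removal. Function names and numbers are hypothetical.

```python
def setup_slack(period, launch_clk, data_max, capture_clk, t_setup, late, early):
    """Setup: launch clock and data are late, capture clock is early."""
    arrival = (launch_clk + data_max) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


def hold_slack(launch_clk, data_min, capture_clk, t_hold, late, early):
    """Hold: launch clock and data are early, capture clock is late."""
    arrival = (launch_clk + data_min) * early
    required = capture_clk * late + t_hold
    return arrival - required


# Made-up delays in ns with flat derates of 1.1 (late) and 0.9 (early).
print(round(setup_slack(2.0, 0.5, 1.0, 0.5, 0.1, late=1.1, early=0.9), 3))
print(round(hold_slack(0.5, 0.2, 0.5, 0.05, late=1.1, early=0.9), 3))
```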

What Are On-Chip Variations and Their Sources?
On-chip variations are the intrinsic variability of semiconductors subjected to process variations.

• Changes in physical parameters affect electrical parameters, in turn causing delay variations.
- Variation in length, width, thickness.
- Doping variation.

OCV in STA:
Affects wire and cell delays.
Clock and data paths affected differently.
Increases pessimism in the design.
Location- and depth-based variations.

How to model on-chip variations!!!

OCV, AOCV and POCV !!!

What Is Advanced OCV (AOCV)?
AOCV reduces the level of derating for each stage on the basis that each successive stage will cancel out the variation.

The shortcoming of OCV is that timing closure gets difficult due to the extra pessimism added with fixed derates for the cells.

"Some stages will be faster, others slower. So the more stages you have, the more it averages out."

• Here, it shows the derate table for a buffer cell based on the depth along the timing graph.

Depth         1     3     5     7     10    15    20    30
Late derate   1.6   1.4   1.3   1.25  1.21  1.17  1.12  1.1
Early derate  0.5   0.6   0.7   0.8   0.88  0.89  0.91  0.95

[Figure: clock path and data path diverging at the clock branch point, with each cell labeled by its depth.]
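
A depth-based derate lookup over this table could look like the sketch below. Linear interpolation between listed depths and clamping outside the listed range are assumptions of the sketch, not a statement of how any particular tool treats intermediate depths; the name `derate` is hypothetical.

```python
DEPTH = [1, 3, 5, 7, 10, 15, 20, 30]
LATE = [1.6, 1.4, 1.3, 1.25, 1.21, 1.17, 1.12, 1.1]
EARLY = [0.5, 0.6, 0.7, 0.8, 0.88, 0.89, 0.91, 0.95]


def derate(depth, table):
    """Return the derate for a path depth, interpolating between table entries."""
    if depth <= DEPTH[0]:
        return table[0]
    if depth >= DEPTH[-1]:
        return table[-1]
    for i in range(len(DEPTH) - 1):
        d0, d1 = DEPTH[i], DEPTH[i + 1]
        if d0 <= depth <= d1:
            t = (depth - d0) / (d1 - d0)
            return table[i] + t * (table[i + 1] - table[i])


print(derate(1, LATE), derate(1, EARLY))    # shallow path: strongest derates
print(derate(4, LATE), derate(4, EARLY))    # between the entries for 3 and 5
print(derate(30, LATE), derate(30, EARLY))  # deep path: derates close to 1
```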

What Is Parametric OCV (POCV)?
POCV is a variation modeling technique that computes the impact of local process variations on the delay and
slew of each instance in the design at a given global variation corner. POCV propagates the sigma of arrival
and required times through the timing graph and then computes statistical characteristics of slack at all
timing pins.

● POCV solves shortcomings of the AOCV approach:
●Inefficiency at ultralow voltage operation (Voperation ~ Vthreshold)
●Cell timing dependency on slews and loads
● POCV analysis requires variation libraries in the form of a Synopsys POCV library or as a Liberty Variation Format (LVF) library.
● LVF includes arc-level absolute variations of cell delays, output transitions, and timing checks as functions of input slew and load.

[Figure: PrimeTime POCV.]

Liberty Variation Format (LVF)
cell (cell_name) {
  ocv_derate_distance_group: ocv_derate_group_name;
  ...
  pin | bus | bundle (name) {
    direction: input | output;

    timing() {
      ...
      ocv_sigma_cell_rise(delay_lu_template_name) {
        sigma_type: early | late | early_and_late;
        index_1 ("float, ..., float");
        index_2 ("float, ..., float");
        values ( "float, ..., float", \
                 ..., \
                 "float, ..., float");
      }

      ocv_sigma_cell_fall(delay_lu_template_name) {
        sigma_type: early | late | early_and_late;
        index_1 ("float, ..., float");
        index_2 ("float, ..., float");
        values ( "float, ..., float", \
                 ..., \
                 "float, ..., float");
      }
      ...
    } /* end of timing */
    ...
  } /* end of pin */
  ...
} /* end of cell */

(Statistical calculations happen during path tracing.)

Parametric OCV (POCV)
Parametric OCV uses a parameter as a delay sigma variation (not a derate factor):
● A function of input slew and output load
● Arc-level granularity
● Accuracy and correlation with SPICE (silicon)
● Close correlation of GBA and PBA
● Unique per-arc, per load/slew sigmas

(Statistical calculations happen during path tracing.)
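
To illustrate what a statistical calculation along a path can look like, here is a minimal Python sketch. It assumes that the per-arc variations are independent, so that the sigmas combine as a root-sum-square, and that the late and early bounds are taken at 3 sigma. The function name `pocv_path_delay`, the `n_sigma` parameter, and the numbers are hypothetical; the slides do not give the exact formula a particular tool uses.

```python
import math


def pocv_path_delay(stages, n_sigma=3.0):
    """Combine per-arc (mean, sigma) pairs into path-level statistics.

    Assumption: arc variations are independent, so the means add
    linearly and the sigmas add as a root-sum-square.
    Returns (mean, sigma, late_delay, early_delay).
    """
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean, sigma, mean + n_sigma * sigma, mean - n_sigma * sigma


if __name__ == "__main__":
    # Hypothetical two-stage path: (mean delay in ps, sigma in ps) per arc.
    path = [(100.0, 3.0), (100.0, 4.0)]
    mean, sigma, late, early = pocv_path_delay(path)
    print(mean, sigma, late, early)
```

Because the sigmas add in quadrature rather than linearly, the relative spread shrinks as more stages are added, which matches the intuition that variation averages out over deeper paths.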


Timing (Arrival) Window

Timing Window
The timing window is the interval between the earliest possible switching time and the latest possible switching time of a signal.

Timing Graph-based Analysis (GBA)
In static timing analysis (STA), Graph-Based Analysis is a pessimistic algorithm
for timing which is based on the worst slew propagation (slew merging). It is
the default mode of analysis in the implementation stage of the design.

● In GBA mode, PrimeTime (PT) considers both the worst arrival and the worst slew in a path during timing analysis, even if the worst slew corresponds to an input pin different from the relevant pin for the current path.
● This approach is used during the initial timing analysis before the final signoff.
   • This reduces the analysis runtime of the whole design.
   • But it gives a pessimistic result.

Example (AND gate with inputs A and B and output Z):
● Here, it is assumed that for any input slew, the output slew is 25% more than the input slew. If the slew at B is 500 ps, then the slew at Z is 625 ps.
● In GBA, for calculating delay and slew propagation through this AND gate, the worse input slew (through B) is always considered, irrespective of whether the path goes through pin A or pin B.
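
The AND-gate example above can be sketched in a few lines of Python. The 25% rule and the 500 ps slew at B come from the slide; the 200 ps slew at A and all function names are hypothetical and only serve to show the gap between the merged (worst) slew and the slew of the pin that is actually on the path.

```python
SLEW_GAIN = 1.25  # slide's assumption: output slew is 25% more than input slew


def output_slew(input_slew_ps):
    """Output slew of the gate for a given input slew (simplified model)."""
    return SLEW_GAIN * input_slew_ps


def gba_output_slew(input_slews_ps):
    """GBA: always propagate the worst input slew, whichever pin the path uses."""
    return output_slew(max(input_slews_ps.values()))


def path_specific_output_slew(input_slews_ps, pin):
    """For comparison: use only the slew of the pin actually on the path."""
    return output_slew(input_slews_ps[pin])


if __name__ == "__main__":
    # Slew at B is from the slide; slew at A is a hypothetical value.
    slews = {"A": 200.0, "B": 500.0}
    print("GBA slew at Z:", gba_output_slew(slews))
    print("Path through A only:", path_specific_output_slew(slews, "A"))
```

For a path through pin A, the difference between the two results is the pessimism introduced by slew merging.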

Timing Graph-based Analysis (GBA)

Graph-based Analysis
A timing arc can have only a single set of timing values, and these values are used for all graph paths through the timing
arc.
Timing window is propagated downstream for crosstalk calculation.
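
As a rough illustration of why a timing window matters for crosstalk, the Python sketch below builds a window from a set of arrival times and checks whether two windows overlap. The overlap test (an aggressor can only affect a victim if their switching windows can coincide) is a common simplification and an assumption here, not a description of a specific tool; the `Window` type, function names, and numbers are hypothetical.

```python
from typing import NamedTuple


class Window(NamedTuple):
    earliest: float  # earliest possible switching time
    latest: float    # latest possible switching time


def timing_window(arrivals):
    """Build a timing window from all arrival times seen at a pin."""
    return Window(min(arrivals), max(arrivals))


def overlaps(a, b):
    """True if the two windows share at least one point in time."""
    return a.earliest <= b.latest and b.earliest <= a.latest


if __name__ == "__main__":
    victim = timing_window([1.2, 1.5, 1.9])      # hypothetical arrivals in ns
    aggressor = timing_window([1.8, 2.4])
    far_aggressor = timing_window([3.0, 3.5])
    print(overlaps(victim, aggressor))       # windows can coincide
    print(overlaps(victim, far_aggressor))   # windows never coincide
```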

Timing Path-based Analysis (PBA)
Path-Based Analysis (PBA) involves re-timing the components of a timing path based on the actual slew that is propagated in this path.

• It considers path-based slews and actual arrival times for both base and SI delay calculations.
• It removes the pessimism that is introduced due to slew merging at various nodes in the design when the graph-based analysis is run.
• It is recommended to use during the final stages of timing closure to ensure accuracy.
   o This reduces the area, power dissipation, and ECO cycles of your design but imposes a huge runtime hit.

[Figure: the same path from Q to D through nodes v1 to v4 and arcs a1 to a7, shown once under GBA and once under PBA. Magnitude depicts the fast or slow slew.]

Timing Path-based Analysis (PBA)

Path-based Analysis
The same timing arc can have different delay and slew for every path through the
arc.
Single edge is propagated downstream for crosstalk calculation.

Topic 11: report timing paths
Find the difference between the two reports.
Both report the same timing path. Where do they differ?

Topic 11: report timing paths

report_timing -net -input -tran -cap -derate -voltage -crosstalk -max_paths -nworst
➢ -net shows the nets between pin nodes and also the number of fanouts of each net.
➢ -input shows the input pin through which the path passes. It is useful when tracing a report through multi-input cells. It also splits the delay associated with a cell into net delay and cell delay.
➢ -tran shows both the input and output transition times used for, or calculated by, the delay calculation.
➢ -cap shows the total capacitance on the net, including both the wire capacitance and the input pin capacitance of the next stage.
➢ -derate shows the derating factor used for each delay value.
➢ -voltage shows the operating voltage used for the analysis, which is useful when the design involves multiple power domains and rail voltages.
➢ -crosstalk shows the delta delay calculated during the signal integrity check or manually annotated.
➢ -max_paths sets the maximum total number of paths to be reported among all path groups. -nworst sets the maximum number of paths to be reported for a single endpoint.

report_timing -pba_mode [none | path | exhaustive]

In addition to all the switches above, -pba_mode enables path-based analysis reporting.
➢ none (the default) - Path-based analysis is not applied.
➢ path - Path-based analysis is applied to paths after they have been gathered.
➢ exhaustive - An exhaustive path-based analysis path search algorithm is applied to determine the worst path-based analysis path set in the design.

update_timing

If the timing constraints have been changed, the STA engine needs to recalculate the timing graph. Even though report_timing re-times the design implicitly (incrementally), it is always good practice to run an explicit update_timing before report_timing.
By default, the update_timing command uses an efficient timing analysis algorithm that requires minimal computation effort and updates existing timing analysis information only where needed. You can override this default behavior using the -full option, which causes the entire timing update to be performed from the beginning.

Chapter Summary

✓ Timing Graph/Arcs/Sense/Path
✓ Cell Delay Model
✓ Wire Delay Model
✓ RC tree delay algorithm
✓ Analysis Mode
✓ GBA & PBA
✓ Parasitic Scaling & Timing Derating

Thank You ☺
