Delay Calculation-1
{ Concepts } + { Technique }
Ahmed Abdelazeem
02 Delay Calculation
✓ Timing Graph/Arcs/Sense/Path
✓ Cell Delay Model
✓ Wire Delay Model
✓ RC tree delay algorithm
✓ Analysis Mode
✓ GBA & PBA
✓ Parasitic Scaling & Timing Derating
Path Delay: Basic Approach
Delay Dependencies
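td ~ CL (load capacitance)
td ~ trise (input transition time)
td ~ Process variations (Process: TT, FF, SS, etc.)
td ~ Voltage (Voltage: ±10%)
td ~ Temperature (Temperature: -40 to 125 °C)
(Figure: gate with input transition trise driving load CL; example corner P: SS, V: 0.9, T: -40.)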
Timing Graph
What Are Timing Arcs?
(Figure: timing arc, shown for rising and falling transitions.)
Timing Arcs
Timing Arcs: Cell and Net Delay
Delays encountered in digital circuitry are composed of two main components: cell delay and net delay.
Each stage delay (cell delay + net delay) represents the time required to propagate a signal from the input of one gate to the input of the next.
Cell Delay
Transistors within a cell take a certain amount of time to switch. Therefore, a change at the input of a cell takes time to cause a change at the output.
Net Delay
Net delay is the time between a signal first being applied to a net and its arrival at the other devices connected to that net.
(Figure: transistor-level representation of a cell with input A and output Y between VDD and VSS; a two-stage path annotated with cell delay and net (interconnect) delay.)
Timing Group Names
N | Parameter (timing group name) | Unit | Symbol | Definition
1 | Rise transition time (rise_transition) | ns | tR | The time it takes a driving pin to make a transition from k·VDD to (1-k)·VDD. Usually k = 0.1 (k = 0.2, 0.3, etc. are also possible).
2 | Fall transition time (fall_transition) | ns | tF | The time it takes a driving pin to make a transition from (1-k)·VDD to k·VDD. Usually k = 0.1 (k = 0.2, 0.3, etc. are also possible).
(Figures: rising and falling waveforms between VSS and VDD, with tR and tF measured between the 0.1·VDD and 0.9·VDD levels.)
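The k·VDD to (1-k)·VDD definition of rise transition time can be applied directly to a sampled waveform. The sketch below is only an illustration: the function names and the linear-ramp waveform are assumptions made for this example, not part of any library or tool.

```python
def crossing_time(times, volts, level):
    """Linearly interpolate the first time a rising waveform crosses `level`."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 <= level <= v1 and v1 != v0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_transition(times, volts, vdd, k=0.1):
    """Rise transition time: from k*VDD to (1-k)*VDD."""
    return crossing_time(times, volts, (1 - k) * vdd) - crossing_time(times, volts, k * vdd)


# Hypothetical waveform: a linear ramp from 0 V to 1 V over 1 ns.
times_ns = [0.0, 1.0]
volts = [0.0, 1.0]
t_r_10_90 = rise_transition(times_ns, volts, vdd=1.0)          # k = 0.1
t_r_20_80 = rise_transition(times_ns, volts, vdd=1.0, k=0.2)   # k = 0.2
```

With a different k the same waveform yields a different transition time, so thresholds have to match between characterization and analysis.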
Timing Constraints: Timing Types
Timing Sense (unateness)
Timing Path: valid/leakage/don’t care?
Valid Path
Real functional path
Leakage Path
False path due to nature of STA
Late and Early Path
Max Timing Path (Long Path / Late Path)
The path with the largest delay between two end points. -> for setup check
Min Timing Path (Short Path / Early Path)
The path with the smallest delay between two end points. -> for hold check
Drive Strength
Low drive strength: slower, smaller footprint, higher resistance. Draws less current and has a slower slew rate, but consumes less power.
High drive strength: faster, larger footprint, lower resistance. Draws more current and has a faster slew rate, but consumes more power.
Drive Strength
The inverse of the pull-up/pull-down resistance. In general, cells are designed to have similar drive strength for the pull-up and pull-down structures.
When a CMOS cell switches state, the switching speed is governed by how fast the capacitance on the output net can be charged or discharged.
Thus the path resistance is a major factor in determining the speed of a CMOS cell.
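To make the link between resistance and speed concrete, here is a minimal sketch that treats the driver as a single resistance charging the load capacitance (a first-order RC approximation, which is an assumption of this example and not how a library characterizes cells). The resistance and capacitance values are hypothetical.

```python
import math


def rc_transition_time(r_drive, c_load, k=0.1):
    """Time for a first-order RC output to move from k*VDD to (1-k)*VDD.

    For V(t) = VDD * (1 - exp(-t / RC)) this is RC * ln((1 - k) / k),
    i.e. about 2.2 * RC for the usual 10%-90% thresholds.
    """
    return r_drive * c_load * math.log((1 - k) / k)


# Hypothetical cells driving the same 10 fF load.
weak = rc_transition_time(r_drive=4000.0, c_load=10e-15)    # low drive strength
strong = rc_transition_time(r_drive=1000.0, c_load=10e-15)  # high drive strength
```

In this model a cell with one quarter of the resistance switches the same load four times faster.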
Cell Delay
Cell Delay Model
Linear Delay Model
Non-Linear Delay Model
Ttotal = Tpropagation + Ttransition + Tconnect
Transition time and propagation delay for each cell are measured beforehand and stored in the form of lookup tables.
(Figure: cells A, B, C and D annotated with Tslope, Tintrinsic and Tconnect.)
Nonlinear Delay Calculation Example
NLDM is a voltage-based delay calculation model that is widely used to represent the response characteristics of the cells in a library. It is very simple and …
(Figure: example circuit with gates G1 and G3.)
NLDM
NLDM
cell_rise or cell_fall table: propagation delay indexed by input slew and output cap.
Input slew index: 0.1, 0.2, 0.5, 0.7
Output cap index: .023, .047, .065, .078, .091
Example row of table values: 10.2, 30.8, 58.7, 99.9, 151.6
Delay/Power are measured as a function of input slew and output cap.
(Figure: 3-D surface with x = input transition time, y = output capacitance, z = fall propagation delay.)
NLDM - library
NLDM - interpolation
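When the input slew and output capacitance fall between the table indices, the delay has to be interpolated from the surrounding entries. The sketch below uses plain bilinear interpolation, which is one common choice; the exact scheme a given timing tool applies is not stated here and should be treated as an assumption. The function names are made up for this example, and the sample values are the first 2x2 corner of the INVX1 cell_rise table shown under "Cell Timing Data".

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i+1) of the axis segment used for x; clamped so the
    end segments are also used for extrapolation outside the table."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slews, caps, table, slew, cap):
    """Bilinear interpolation of table[slew_index][cap_index]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(caps, cap)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tc = (cap - caps[j0]) / (caps[j1] - caps[j0])
    low = table[i0][j0] * (1 - tc) + table[i0][j1] * tc
    high = table[i1][j0] * (1 - tc) + table[i1][j1] * tc
    return low * (1 - ts) + high * ts


slews = [0.016, 0.032]   # input_net_transition index
caps = [0.1, 0.25]       # total_output_net_capacitance index
table = [
    [0.016861, 0.0179019],
    [0.0239648, 0.0255491],
]
delay_mid = nldm_lookup(slews, caps, table, slew=0.024, cap=0.175)
```

At a grid point the function returns the table entry itself; at the centre of a cell it returns the average of the four corner values.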
NLDM (C effective)
NLDM – path delay
Composite Current Source Model (CCS)
(Figure: network model with capacitances C1 and C2.)
CCS model
CCS model
CCS Driver Model (conventional)
CCS Driver Model (compact)
CCS Driver Model (compact)
CCS Driver Model (compact)
Compact Driver Model
* When the application variable rc_driver_model_mode is set to advanced, the tool is using the CCS model.
CCS Receiver Model
Synopsys Liberty Format (.lib)
library (Digital_Std_Lib) {
  technology (cmos);
  delay_model : table_lookup;
  cell(AND2) {
    area : 2;
    pin(A) {
      direction : input;
    }
    pin(B) {
      direction : input;
    }
    pin(Z) {
      direction : output;
      function : "A*B";
      timing() {
        related_pin : "A";
        timing_type : "combinational";
        cell_rise(…) {
          index_1("0.016, 0.032, 0.064");
          index_2("2, 4");
          values("1.0020, 1.1280, 3.547", \
                 "1.0080, 1.1310, 3.847");
        }
      }
    } /* end of pin */
  } /* end of cell */
} /* end of library */
(Figure: the lookup table gives td as a function of trise and CL.)
Cell Timing Data
library(){
lu_table_template ("del_1_7_7") {
variable_1 : "input_net_transition";
index_1("1, 2, 3, 4, 5, 6, 7");
variable_2 : "total_output_net_capacitance";
index_2("1, 2, 3, 4, 5, 6, 7");
}
cell (INVX1) {
pin (Y) {
timing () {
related_pin : "A";
timing_type : "combinational";
timing_sense : "negative_unate";
cell_rise ("del_1_7_7") {
index_1("0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024");
index_2("0.1, 0.25, 0.5, 1, 2, 4, 8");
values("0.016861, 0.0179019, 0.0195185, 0.0229259, 0.029658, 0.043145, 0.07712", \
"0.0239648, 0.0255491, 0.0279298, 0.0319930, 0.0387540, 0.0520896, 0.0790211", \
"0.0342118, 0.0366966, 0.0402223, 0.0462823, 0.0558327, 0.0705154, 0.0967339", \
"0.0491695, 0.0524727, 0.0576512, 0.0665647, 0.0810999, 0.1027237, 0.1342571", \
"0.0721332, 0.0765389, 0.0836775, 0.0960890, 0.1171612, 0.1497265, 0.1957640", \
"0.1111560, 0.1164417, 0.1252609, 0.1422002, 0.1712097, 0.2171862, 0.2847010", \
"0.1841131, 0.1901881, 0.2010298, 0.2194395, 0.2555983, 0.3182710, 0.4139452");
}
Topic 7: State-Dependent Delay (conditional timing)
1. The when statement specifies the condition. If the condition expression evaluates to true, the timing values in that case are active and the default case is disabled.
2. If a state in the condition cannot be determined, so that the entire condition is undetermined (X), the condition is still treated as true.
3. The timing engine picks the worst active timing value for this particular timing arc.
4. To disable a particular conditional timing arc, its condition must be forced to a known false state.
E.g.
1. There are three cases in parallel, as shown on the left: B true, B not true, and don't care (the default).
2. If no case analysis is set for input B, all three cases are active and taken into account by the timing engine, which picks the worst case for the calculation. Usually the worst case is the default case.
3. If case analysis makes B constant at one value, only one of the first two cases is picked and used during calculation. The default case is also disabled.
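The selection rules for the three-case example can be sketched as follows. This is only an illustrative model of the behaviour described above, not the algorithm of any particular timing engine; the function names and delay values are hypothetical, and the unknown state X is represented by None.

```python
def eval_when(expected_b, b_value):
    """Three-valued evaluation of a condition on pin B.

    Returns True, False, or None when B is undetermined (X).
    """
    if b_value is None:
        return None
    return b_value == expected_b


def active_delay(conditional_arcs, default_delay, b_value):
    """Pick the worst delay among the arcs that are active.

    conditional_arcs: list of (expected_b, delay) pairs.
    A condition that is true or undetermined keeps its arc active.
    The default arc is disabled once some condition is definitely true.
    """
    results = [(eval_when(expected, b_value), delay) for expected, delay in conditional_arcs]
    active = [delay for result, delay in results if result is not False]
    if not any(result is True for result, _ in results):
        active.append(default_delay)
    return max(active)


# Hypothetical delays (ns): when B, when !B, and the default arc.
arcs = [(True, 0.12), (False, 0.15)]
no_case_analysis = active_delay(arcs, default_delay=0.18, b_value=None)
b_tied_high = active_delay(arcs, default_delay=0.18, b_value=True)
b_tied_low = active_delay(arcs, default_delay=0.18, b_value=False)
```

Without case analysis the pessimistic default value is used; tying B to a constant selects exactly one conditional arc.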
NET “Wire”
Net is a wire connecting pins of standard cells and blocks. It:
• Has only one driver
• Can drive a number of fanout cells or blocks
• Can travel on multiple metal layers of the chip
• Can be broken up into segments for equivalent electrical representation
Interconnect Parasitics
Interconnect resistance - the resistance between the output pin of a cell and the input pins of the fanout cells
Interconnect capacitance - comprises capacitance to ground and capacitance between neighboring signal routes
Interconnect inductance - arises due to current loops; the effect of inductance can be ignored at low frequencies
Overview
(Figure: cells between inputs I1, I2 and outputs O1, O2, shown three ways: the cells alone, with input capacitances, and with net parasitics.)
RC Tree of Interconnect
Distributed RC tree
(Figure: wire of length L modeled as a distributed RC tree.)
T - model
𝜋- model
Ctotal is broken into two parts
Rtotal is connected between them
N Section Representation
Breaking Rtotal and Ctotal into multiple sections increases accuracy
T model: the series resistances are Rtotal/(2N), Rtotal/N, Rtotal/N, Rtotal/N, Rtotal/(2N)
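The N-section T model can be generated programmatically. The sketch below assumes the usual construction in which each section is Rtotal/(2N), Ctotal/N, Rtotal/(2N), so that adjacent half-resistors merge into Rtotal/N; the capacitor values are part of that assumption, since only the resistors are listed above. The function name is hypothetical.

```python
def t_model_sections(r_total, c_total, n):
    """Return (resistors, capacitors) of an N-section T-model ladder.

    Resistors: R/(2N), then N-1 times R/N, then R/(2N).
    Capacitors: N grounded capacitors of C/N, one between each resistor pair.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    half = r_total / (2 * n)
    resistors = [half] + [r_total / n] * (n - 1) + [half]
    capacitors = [c_total / n] * n
    return resistors, capacitors


# Four sections of a hypothetical 800 ohm, 40 fF wire.
resistors, capacitors = t_model_sections(r_total=800.0, c_total=40e-15, n=4)
```

For N = 4 this gives five resistors (100, 200, 200, 200, 100 ohm) and four capacitors, and the totals are preserved.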
Elmore Delay
Elmore delays are used to calculate the delay of RC trees, which:
• Have a single input node
• Do not have resistive loops
• Have capacitances only coupled to ground
Elmore Delay
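Elmore delay equation for an RC ladder with nodes 1, 2, ..., i-1, i, ..., N, where R_j is the resistance of segment j and C_i is the capacitance at node i:
$$ t_{d,N} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j $$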
For example,
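A possible worked example for a simple RC ladder: each node capacitance is multiplied by the total resistance between the input and that node, and the products are summed. The function name and the three-segment values are made up for illustration, in arbitrary consistent units.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay from the input to the last node of an RC ladder.

    resistances[i] is the series resistance of segment i and
    capacitances[i] is the grounded capacitance at the node after it.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance shared between input and this node
        delay += upstream_r * c  # contribution of this node's capacitance
    return delay


# Hypothetical ladder: R = 1, 2, 3 and C = 4, 5, 6.
# Delay = 1*4 + (1+2)*5 + (1+2+3)*6 = 4 + 15 + 36 = 55
delay = elmore_delay_chain([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```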
Wire Load: Parasitic Effects
Parasitics are inevitable
Not known without layout
Delay and area will be incorrect/optimistic without estimation of parasitics
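(Figure: the same circuit with inputs a, b, c and output y driving load CL, drawn without and with wire parasitics. Each wire contributes Delay > 0 and Area > 0. With parasitics, t2 = f(t2d, t2c, c2) and ttotal = f(t2a, t2b, t2c, C1, C2, C3).)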
Estimating Parasitic Devices
Generalization Length components
Wire Delay Model
Estimation? Extraction?
Parasitic values can be estimated or extracted, depending on whether or not the actual wire has been routed.
RC Tree
RC Tree Topology
Single input node
No resistive loops
All capacitances are grounded
Prelayout: Wire Delay Model vs. Topographical
Prelayout: Wire Load Model
wire_load (name) {
area : float ;
capacitance : float ;
resistance : float ;
slope : float ;
{fanout_length (fanout, length) ;
...
}
}
wire_load_selection (name) {
{wire_load_from_area (min_area1,max_area1,wire_load_name1);
...}
}
default_wire_load : name ;
default_wire_load_selection : name ;
default_wire_load_area : float ;
default_wire_load_capacitance : float ;
default_wire_load_resistance : float ;
default_wire_load_mode : top | segmented | enclosed ;
Library: Wire Load Example
wire_load ("8000") {
capacitance : 0.024051;
resistance : 1.860000e-03;
area : 0.010000;
slope : 30.285426;
fanout_length("1", "8.2750360");
fanout_length("2", "18.4914880");
fanout_length("3", "29.3531220");
}
default_wire_load : "8000";
default_wire_load_selection : "predcaps";
wire_load_selection (predcaps) {
wire_load_from_area (0.000000,200.000000, "4000");
wire_load_from_area (200.000000,8000.000000, "8000");
wire_load_from_area (8000.000000,16000.000000, "16000");
}
}
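The way a wire load model turns fanout into estimated parasitics can be sketched with the "8000" model above. The sketch assumes the usual interpretation: the length is read from fanout_length, extrapolated with slope beyond the largest listed fanout, and then multiplied by the per-unit-length capacitance and resistance. The function names are hypothetical and the units are those of the library.

```python
def wire_length(fanout, fanout_length, slope):
    """Estimated wire length for a net with the given fanout.

    fanout_length maps fanout -> length. Beyond the largest listed
    fanout the length is extrapolated linearly using slope.
    """
    max_fanout = max(fanout_length)
    if fanout > max_fanout:
        return fanout_length[max_fanout] + slope * (fanout - max_fanout)
    return fanout_length[fanout]


def wire_parasitics(fanout, model):
    """Return (capacitance, resistance) estimated from a wire load model."""
    length = wire_length(fanout, model["fanout_length"], model["slope"])
    return length * model["capacitance"], length * model["resistance"]


# Values copied from the wire_load "8000" example.
WLM_8000 = {
    "capacitance": 0.024051,
    "resistance": 1.86e-03,
    "slope": 30.285426,
    "fanout_length": {1: 8.2750360, 2: 18.4914880, 3: 29.3531220},
}
length_fo5 = wire_length(5, WLM_8000["fanout_length"], WLM_8000["slope"])
cap_fo2, res_fo2 = wire_parasitics(2, WLM_8000)
```

A fanout of 5 is two beyond the table, so its length is 29.3531220 + 2 × 30.285426.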
Prelayout: Wire Load Model
Prelayout: Wire Load Model
Library: Wire Load Table
wire_load_table (name) {
fanout_length(fanout, length);
fanout_capacitance(fanout, capacitance);
fanout_resistance(fanout, resistance);
fanout_area(fanout, area);
...
}
Prelayout: Wire Load Model
Interconnect Trees: Best-case Tree
Assumed that:
Load pin is physically adjacent to the driver
None of the wire resistance is in the path to the destination pin
Wire capacitance and pin capacitances act as load on the driver pin
(Figure: Rdrive driving Cout, with Cwire and the pin capacitances C1 as load; Rwire is not in the path to any destination pin.)
Interconnect Trees: Balanced Tree
Assumed that:
Each destination pin is on a separate portion of the interconnect
Each path has equal portion of the total wire resistance and capacitance
(Figure: Rdrive and Cout driving separate branches, each ending in a pin capacitance C1.)
Interconnect Trees: Worst-case Tree
Assumed that:
All the destination pins are together at the far end of the wire
Each destination pin sees the total wire resistance and the total wire capacitance
(Figure: Rdrive and Cout followed by Rwire and Cwire, with all pin capacitances C1 at the far end.)
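The three tree assumptions lead to different first-order estimates of the wire delay to a destination pin. The sketch below uses simple lumped RC products that follow from the stated assumptions; the exact expressions a synthesis or timing tool uses may differ, and the function name and values are hypothetical.

```python
def net_delay(tree, r_wire, c_wire, pin_caps, pin_index=0):
    """First-order wire delay to one destination pin for each tree type."""
    n = len(pin_caps)
    if tree == "best_case":
        # No wire resistance in the path to the pin.
        return 0.0
    if tree == "balanced":
        # Each branch carries an equal share of the wire R and C.
        return (r_wire / n) * (c_wire / n + pin_caps[pin_index])
    if tree == "worst_case":
        # Every pin sees the full wire R and C plus all pin loads.
        return r_wire * (c_wire + sum(pin_caps))
    raise ValueError("unknown tree type: " + tree)


# Hypothetical net: Rwire = 3, Cwire = 6, three pins of capacitance 1 each.
pins = [1.0, 1.0, 1.0]
best = net_delay("best_case", 3.0, 6.0, pins)
balanced = net_delay("balanced", 3.0, 6.0, pins)
worst = net_delay("worst_case", 3.0, 6.0, pins)
```

For the same net the estimates are ordered best case, balanced, worst case.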
Post-layout: Parasitic & SPEF
Topic 8: Parasitic Annotation Issue
Tiny Open
Elmore Delay Calculation
Arnoldi Delay Calculation
Topic 9: report_delay_calculation
Since a WLM is being used, we set a timing derate to safeguard the margin.
In this report, all datapath cell delays are enlarged by a factor of 1.35.
Ahmed Abdelazeem
Slew Degradation
The slowdown of the slew rate, caused by resistance, as the signal travels along the wire.
Topic 10: Zero Interconnect Mode (Design Compiler)
Slew Merge
Max delay & Min delay
Analysis Mode
Single Mode
Same delay across the entire design.
* Setup check:
> Launch path: slowest path through max-delay arc, single operating condition, no derating
> Capture path: fastest path through max-delay arc, single operating condition, no derating
* Hold check:
> Launch path: fastest path through max-delay arc, single operating condition, no derating
> Capture path: slowest path through max-delay arc, single operating condition, no derating
Best Case & Worst Case (bc_wc)
Two extreme PVT corners: one corner for setup checks and the other for hold checks
* Setup check:
> Launch path: slowest path through max-delay arc, worst-case operating condition, late derating
> Capture path: fastest path through max-delay arc, worst-case operating condition, early derating
* Hold check:
> Launch path: fastest path through min-delay arc, best-case operating condition, early derating
> Capture path: slowest path through min-delay arc, best-case operating condition, late derating
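The launch/capture pairing can be illustrated with a simplified slack calculation. This is a sketch under several assumptions: ideal clock source, a single clock period between launch and capture for setup, the same edge for hold, and derates applied as plain multipliers. All names and numbers are hypothetical; in bc_wc mode the setup delays would come from the worst-case corner and the hold delays from the best-case corner.

```python
def setup_slack(launch_clk, data, capture_clk, period, t_setup, late=1.0, early=1.0):
    """Setup slack: late launch path against early capture path."""
    arrival = (launch_clk + data) * late
    required = capture_clk * early + period - t_setup
    return required - arrival


def hold_slack(launch_clk, data, capture_clk, t_hold, late=1.0, early=1.0):
    """Hold slack: early launch path against late capture path."""
    arrival = (launch_clk + data) * early
    required = capture_clk * late + t_hold
    return arrival - required


# Worst-case (max) delays for setup, in ns.
setup_no_derate = setup_slack(1.0, 3.0, 1.0, period=5.0, t_setup=0.2)
setup_derated = setup_slack(1.0, 3.0, 1.0, period=5.0, t_setup=0.2, late=1.1, early=0.9)

# Best-case (min) delays for hold, in ns.
hold_no_derate = hold_slack(0.5, 0.4, 0.5, t_hold=0.1)
hold_derated = hold_slack(0.5, 0.4, 0.5, t_hold=0.1, late=1.1, early=0.9)
```

Derating always moves the launch and capture paths apart in the pessimistic direction, so both slacks shrink once derates are applied.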
Single Analysis Mode
The single mode is essentially the max delay-only mode.
Only the max corner library information is used.
Only constraints related to -max (or unspecified) are used.
Only a single delay calculation pass is made.
Only the slowest slews are propagated.
One library
One corner
(Figure: clock root driving FF1 and FF2.)
Best-Case Worst-Case Analysis Mode
Using both BC and WC libraries together is one way to model off-chip variation of delays due to external
temperature and voltage variations.
Setup analysis uses Max delay information for both clock and data.
Hold analysis uses Min delay information for both clock and data.
When is BCWC used?
The BCWC mode has often been the default for early steps of the implementation flow: preCTS, preRoute. Lately, OCV is
appearing earlier in the flow.
(Figure: clock root with the launch clock to FF1, the capture clock to FF2 and the late data path between them. Hold uses the BC (min delays) library; setup uses the WC (max delays) library.)
Analysis Mode (cont’d)
What Are On-Chip Variations and Their Sources?
On-chip variations are the intrinsic variability of semiconductors subjected to process variations.
• Changes in physical parameters affect electrical parameters, in turn causing delay variations.
- Variation in length, width, thickness.
- Doping variation.
OCV in STA:
• Affects wire and cell delays.
• Clock and data paths are affected differently.
• Increases pessimism in the design.
• Location- and depth-based variations.
What Is Advanced OCV (AOCV)?
AOCV reduces the level of derating for each stage on the basis that each successive stage will cancel out the variation.
The shortcoming of OCV is that timing closure gets difficult due to the extra pessimism added by fixed derates for the cells.
"Some stages will be faster, others slower. So the more stages you have, the more it averages out."
• Here, it shows the derate table for a buffer cell based on the depth along the timing graph.
(Figure: data path and clock path leaving the clock branch point, with the depth of each stage numbered.)
Depth        | 1   | 3   | 5   | 7    | 10   | 15   | 20   | 30
Late derate  | 1.6 | 1.4 | 1.3 | 1.25 | 1.21 | 1.17 | 1.12 | 1.1
Early derate | 0.5 | 0.6 | 0.7 | 0.8  | 0.88 | 0.89 | 0.91 | 0.95
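A depth-based derate lookup on this table can be sketched as below. Linear interpolation between listed depths and clamping outside the table are assumptions of this example; an actual tool may treat intermediate depths differently. The function name is hypothetical.

```python
from bisect import bisect_right

DEPTHS = [1, 3, 5, 7, 10, 15, 20, 30]
LATE_DERATE = [1.6, 1.4, 1.3, 1.25, 1.21, 1.17, 1.12, 1.1]
EARLY_DERATE = [0.5, 0.6, 0.7, 0.8, 0.88, 0.89, 0.91, 0.95]


def aocv_derate(depth, depths, derates):
    """Derate for a path depth: clamp outside the table, interpolate inside."""
    if depth <= depths[0]:
        return derates[0]
    if depth >= depths[-1]:
        return derates[-1]
    i = bisect_right(depths, depth) - 1
    frac = (depth - depths[i]) / (depths[i + 1] - depths[i])
    return derates[i] + frac * (derates[i + 1] - derates[i])


late_depth_5 = aocv_derate(5, DEPTHS, LATE_DERATE)     # listed depth
late_depth_4 = aocv_derate(4, DEPTHS, LATE_DERATE)     # between depth 3 and 5
early_depth_40 = aocv_derate(40, DEPTHS, EARLY_DERATE)  # beyond the table
```

Both derates move toward 1.0 as the depth grows, which is how the averaging argument shows up in the numbers.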
What Is Parametric OCV (POCV)?
POCV is a variation modeling technique that computes the impact of local process variations on the delay and
slew of each instance in the design at a given global variation corner. POCV propagates the sigma of arrival
and required times through the timing graph and then computes statistical characteristics of slack at all
timing pins.
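The idea of propagating a sigma instead of applying a derate factor can be sketched for a single path. The example assumes that the stage variations are independent, so the sigmas combine as a root sum of squares, and that the late arrival is taken at mean + 3 sigma; both are assumptions made for illustration, and the stage values are hypothetical.

```python
import math


def pocv_path_delay(stages, n_sigma=3.0):
    """Combine per-stage (mean, sigma) delays along one path.

    Means add linearly; independent sigmas add in quadrature.
    Returns (mean, sigma, late_arrival).
    """
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean, sigma, mean + n_sigma * sigma


# Two hypothetical stages: 100 ps +/- 3 ps and 100 ps +/- 4 ps.
mean, sigma, late_arrival = pocv_path_delay([(100.0, 3.0), (100.0, 4.0)])
```

The combined sigma is 5 ps rather than 7 ps, which is the reduction in pessimism compared with adding a worst-case margin to every stage.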
Liberty Variation Format (LVF)
cell (cell_name) {
ocv_derate_distance_group: ocv_derate_group_name;
...
pin | bus | bundle (name) {
direction: input | output;
timing() {
...
ocv_sigma_cell_rise(delay_lu_template_name){
sigma_type: early | late | early_and_late;
index_1 ("float, ..., float");
index_2 ("float, ..., float");
values ( "float, ..., float", \
..., \
"float, ..., float");
}
ocv_sigma_cell_fall(delay_lu_template_name){
sigma_type: early | late | early_and_late;
index_1 ("float, ..., float");
index_2 ("float, ..., float");
values ( "float, ..., float", \
..., \
"float, ..., float");
}
...
} /* end of timing */
...
} /* end of pin */
...
(Statistical calculations happen during path tracing.)
Parametric OCV (POCV)
Parametric OCV uses a parameter as a delay sigma variation (not a derate factor):
• A function of input slew and output load
• Arc-level granularity
• Accuracy and correlation with SPICE (silicon)
• Close correlation of GBA and PBA
Timing Window
The timing window refers to the period of time between earliest
possible switching time and latest possible switching time.
Timing Graph-based Analysis (GBA)
In static timing analysis (STA), Graph-Based Analysis is a pessimistic timing algorithm based on worst-slew propagation (slew merging). It is the default mode of analysis in the implementation stage of the design.
● In GBA mode, PT considers both the worst arrival and the worst slew in a path during timing analysis, even if the worst slew corresponds to an input pin different from the relevant pin for the current path.
● This approach is used during the initial timing analysis before the final signoff.
• This reduces the analysis runtime of the whole design.
• But it gives a pessimistic result.
● Here, it is assumed that for any input slew, the output slew is 25% more than the input slew. If the slew at B is 500ps, then the slew at Z is 625ps.
● In GBA, for calculating delay and slew propagation through this AND gate, the worse input slew (through B) is always considered, irrespective of whether the path is through pin A or B.
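The AND-gate example can be written out as a small sketch. It uses the 25% rule and the 500 ps slew at B from the slide; the 200 ps slew at A and the function names are assumptions added for illustration.

```python
SLEW_GAIN = 1.25  # output slew is 25% more than the input slew (from the example)


def output_slew(input_slew):
    return SLEW_GAIN * input_slew


def gba_output_slew(input_slews, through_pin):
    """Graph-based: the worst input slew is used, whatever pin the path uses."""
    return output_slew(max(input_slews.values()))


def pba_output_slew(input_slews, through_pin):
    """Path-based: the slew of the pin actually on the path is used."""
    return output_slew(input_slews[through_pin])


slews_ps = {"A": 200.0, "B": 500.0}  # slew at A is a hypothetical value
gba_through_a = gba_output_slew(slews_ps, "A")
pba_through_a = pba_output_slew(slews_ps, "A")
gba_through_b = gba_output_slew(slews_ps, "B")
```

For a path through A, the graph-based slew at Z is 625 ps while the path-specific slew would be 250 ps; for a path through B both agree.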
Timing Graph-based Analysis (GBA)
Graph-based Analysis
A timing arc can have only a single set of timing values, and these values are used for all graph paths through the timing
arc.
Timing window is propagated downstream for crosstalk calculation.
Timing Path-based Analysis (PBA)
Path-Based Analysis (PBA) involves re-timing the components of a timing path based on the actual slew that is propagated in this path.
• It considers path-based slews and actual arrival times for both base and SI delay calculations.
• It removes the pessimism that is introduced due to slew merging at various nodes in the design when the graph-based analysis is run.
• It is recommended for use during the final stages of timing closure to ensure accuracy.
o This reduces the area, power dissipation, and ECO cycles of your design but imposes a huge runtime hit.
(Figure: the same circuit between Q and D, with arcs a1 to a7 and nodes v1 to v4, shown under GBA and under PBA.)
• Magnitude depicts the fast or slow slew.
Timing Path-based Analysis (PBA)
Path-based Analysis
The same timing arc can have different delay and slew for every path through the
arc.
Single edge is propagated downstream for crosstalk calculation.
Topic 11: report timing paths
Find the difference between the two
Both report the same timing path, so where do they differ?
Topic 11: report timing paths
report_timing -net -input -tran -cap -derate -voltage -crosstalk -max_paths -nworst
➢ -net shows the nets between pin nodes and also the number of fanouts for each net.
➢ -input shows the input pin through which the path passes. It is useful when tracing a report through multiple-input cells. It also splits the delay associated with a cell into net delay and cell delay.
➢ -tran shows both the input and output transition times used for, or calculated by, the delay calculation.
➢ -cap shows the total capacitance on the net, including both the wire capacitance and the input pin capacitance of the next stage.
➢ -derate shows the derating factor used for the delay value.
➢ -voltage shows the operating voltage used for the analysis, which is useful when the design involves multiple power domains and rail voltages.
➢ -crosstalk shows the delta delay calculated during the signal integrity check or manually annotated.
➢ -max_paths sets the maximum total number of paths to be reported among all path groups. -nworst sets the maximum number of paths to be reported for a single endpoint.
Chapter Summary
✓ Timing Graph/Arcs/Sense/Path
✓ Cell Delay Model
✓ Wire Delay Model
✓ RC tree delay algorithm
✓ Analysis Mode
✓ GBA & PBA
✓ Parasitic Scaling & Timing Derating
Thank You ☺