Clock Tree Synthesis in OpenROAD

December 12, 2024

© 2024 Precision Innovations, do not copy or distribute without authorization.


Outline

• Physical Design Flow


• CTS Basics
• Clock Latency and Skew
• Timing and Logical DRC Violations
• Design QoR and Benefits of CTS
• Ideal vs. Propagated Clock Modes
• CTS Deep Dive
• Clustering
• Tree Construction
• LDRC Repair
• Handling Macro Cells
• CTS in OpenROAD
• Commands and Examples
• Clock Tree Visualization
• Clock Power and Variability
OpenROAD Flow Scripts Design Flow

Floorplanning
• Define chip size, layout
• Place IO pads and macros
• Build power distribution network
Placement
• Place standard cells
• Optimize for timing and power
• Detailed placement (legalization)
Clock Tree Synthesis
• Build clock tree for synchronous logic
• Repair hold and logical DRC violations
• Detailed placement
Routing
• Global routing to minimize congestion
• Optimize for timing and power
• Detailed routing to repair all physical DRC violations
Chip Finishing
• Metal fill insertion
• Timing signoff with parasitic extraction
• Layout verification
• GDSII generation

What is Clock Tree Synthesis?

Clocks drive thousands of sequential elements
• Sequential elements include
  • flip-flops and latches
  • macros (memories, third-party intellectual properties, etc.)
Clock tree synthesis (CTS) distributes clock signals to all sequential elements such that the clock arrives at every destination as quickly and as simultaneously as possible
• This is done by inserting clock buffers
• Without CTS, this is almost impossible
• For example, the sky130hd/riscv32i design has one clock ‘clk’ that drives 1056 sequential elements

[Figure: sequential elements scattered all across the chip, driven from a single clock port. Flow position: Floorplan, Placement, Clock Tree Synthesis, Routing, Chip Finishing]

How do we measure “quickly” and “simultaneously”?

Clock latency measures how quickly signals arrive (CL1, CL2)
Clock skew measures how simultaneously signals arrive
• Skew is the worst difference in clock latency
• Skew = worst clock latency (CL2) - best clock latency (CL1)
CTS tries to
• minimize clock latency and clock skew
• improve design quality of results (QoR)
  • Setup QoR
  • Hold QoR
  • Logical design rule check (LDRC) QoR
    • max transition
    • max capacitance
    • max fanout
  • Power
    • Static power
    • Dynamic power

[Figure: clock latency CL1 from the clock port to ff1 and clock latency CL2 from the clock port to ff2]
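
The skew definition above can be checked with a few lines of Python. This is only a sketch: the sink names and latency values are made up for illustration, and the helper name clock_skew is not an OpenROAD command.

```python
def clock_skew(latencies_ns):
    """Skew = worst (largest) clock latency minus best (smallest) clock latency."""
    return max(latencies_ns.values()) - min(latencies_ns.values())


# Hypothetical clock latencies in ns at two sink clock pins.
latencies = {"ff1/CK": 0.72, "ff2/CK": 0.82}

skew = clock_skew(latencies)
print(f"best latency (CL1)  = {min(latencies.values()):.2f} ns")
print(f"worst latency (CL2) = {max(latencies.values()):.2f} ns")
print(f"skew                = {skew:.2f} ns")
```

With only one sink the skew is zero by definition; the metric becomes meaningful once latencies from many sinks are collected.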

Setup Violations

What is it? A setup violation occurs when the data signal arrives late relative to the clock signal
Why is it bad? An incorrect data value can be latched
How to fix? Speed up the data path (upsize, add/remove buffers) or slow down the clock. The clock is slowed down by increasing the clock period (decreasing the clock frequency)
Can waive? Non-critical path violations or small violations can be waived

[Figure: timing diagrams of ff2/D against ff2/CK for a path from ff1 to ff2, with clk reaching the flip-flop clock pins through buf1; one diagram shows a setup violation, the other shows setup slack]
Hold Violations

What is it? A hold violation occurs when the data signal changes too early relative to the clock signal
Why is it bad? This can cause metastability in storage elements (the stored value is neither 0 nor 1)
How to fix? Slow down the data path (downsize or add buffers) or speed up clock paths. Fixing hold violations can cause additional setup violations and vice versa
Can waive? Hold violations are harder to waive than setup violations

[Figure: timing diagrams of ff2/D against ff2/CK for a path from ff1 to ff2, with clk reaching the flip-flop clock pins through buf1; one diagram shows a hold violation, the other shows hold slack]
LDRC violations (max transition violations)

What is it? A max transition violation occurs when signal transition time degrades over a long wire or heavy capacitive loads
Why is it bad? This causes models in static timing analysis to become invalid because values are out of range
How to fix? Break up long wires by adding buffers, upsize drivers or split loads
Can waive? Small violations can be waived

cell ("sky130_fd_sc_hd__buf_1") {
  . . .
  pin ("X") {
    . . .
    direction : "output";
    function : "(A)";
    max_capacitance : 0.1300150000;
    max_transition : 1.5061030000;
    timing () {
      cell_fall ("del_1_7_7") {
        index_1("0.0100000000, 0.0230506000, 0.0531329000, 0.1224740000, 0.2823110000, 0.6507430000, 1.5000000000");
        index_2("0.0005000000, 0.0012632100, 0.0031913700, 0.0080627200, 0.0203697000, 0.0514623000, 0.1300150000");
        values("0.0593383000, 0.0643396000, 0.0749824000, 0.0973634000, 0.1492992000, 0.2787939000, 0.6061452000", \
        . . .

Non-linear delay model (NLDM) table for fall delay, indexed by
• Input transition: 0.01, 0.02, 0.05, 0.12, 0.28, 0.65, 1.5
• Output capacitance: 0.0005, 0.0013, 0.0032, 0.0081, 0.0204, 0.0515, 0.1300

[Figure: signal transition time (measured 10-90% or 20-80%) compared against the max transition limit]
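
To make the "out of range" point concrete, here is a minimal sketch of a one-dimensional NLDM lookup. It uses index_2 and the one values row shown in the Liberty excerpt, which corresponds to the first index_1 entry (input transition 0.01). The units (ns, pF) and the function name fall_delay are assumptions, and a real timer interpolates in both dimensions.

```python
from bisect import bisect_right

# index_2 (output capacitance) and the single cell_fall row from the excerpt.
# Units are assumed to be pF and ns; the excerpt does not state them.
LOAD_CAP = [0.0005, 0.00126321, 0.00319137, 0.00806272,
            0.0203697, 0.0514623, 0.130015]
FALL_DELAY = [0.0593383, 0.0643396, 0.0749824, 0.0973634,
              0.1492992, 0.2787939, 0.6061452]


def fall_delay(load_cap):
    """Linearly interpolate fall delay for input transition 0.01.

    Loads outside the characterized range are rejected, mirroring the idea
    that the model is not valid there (a max capacitance violation).
    """
    if not LOAD_CAP[0] <= load_cap <= LOAD_CAP[-1]:
        raise ValueError(f"load {load_cap} is outside the characterized range")
    # Index of the upper neighbor, clamped so the last table point works too.
    i = min(bisect_right(LOAD_CAP, load_cap), len(LOAD_CAP) - 1)
    x0, x1 = LOAD_CAP[i - 1], LOAD_CAP[i]
    y0, y1 = FALL_DELAY[i - 1], FALL_DELAY[i]
    return y0 + (y1 - y0) * (load_cap - x0) / (x1 - x0)


print(f"fall delay at load 0.01: {fall_delay(0.01):.4f}")
```

A load of 0.2 exceeds the last index_2 entry (which equals max_capacitance in the excerpt), so the sketch raises an error instead of extrapolating.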
Fixing max transition violations

Add buffers, upsize drivers or split loads

[Figure: before the fix, the transition time (slew) exceeds the limit (violation); after the fix, the transition time is below the limit (slack)]

How Do We Measure Design Quality of Results (QoR)?

Setup/hold/LDRC QoR involves
• Worst violator magnitude
• Total violator magnitude
• Number of violators
For example, a design has these setup violations with a 10 ns clock period
• 10 ns clock period => 1 / 1e-8 sec = 1e8 Hz = 100 MHz frequency
• ff1/D: -1 ns
• ff2/D: -2 ns
• ff3/D: -3 ns
• Setup QoR is as follows:
  • Worst negative slack (WNS) is -3 ns at ff3/D
    • This means the clock period needs to change from 10 ns to 10+3=13 ns (frequency = 77 MHz)
  • Total negative slack (TNS) is -1 -2 -3 = -6 ns
    • This indicates the amount of “fixing” that needs to happen
  • Number of violating endpoints (NVP) is 3: ff1/D, ff2/D, ff3/D
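
The worked example above can be reproduced in a short Python sketch. The function name setup_qor is only illustrative; the slack values and the 10 ns period are the ones from the example.

```python
def setup_qor(slacks_ns):
    """Return (WNS, TNS, NVP) from a mapping of endpoint -> slack in ns."""
    violators = {pin: s for pin, s in slacks_ns.items() if s < 0}
    wns = min(violators.values(), default=0.0)  # most negative slack
    tns = sum(violators.values())               # sum of negative slacks only
    return wns, tns, len(violators)


slacks = {"ff1/D": -1.0, "ff2/D": -2.0, "ff3/D": -3.0}
wns, tns, nvp = setup_qor(slacks)

period_ns = 10.0
required_period_ns = period_ns - wns     # 10 + 3 = 13 ns
freq_mhz = 1e3 / required_period_ns      # period in ns -> frequency in MHz

print(f"WNS = {wns} ns, TNS = {tns} ns, NVP = {nvp}")
print(f"period {required_period_ns} ns -> {freq_mhz:.0f} MHz")
```

Endpoints with non-negative slack are ignored, so adding passing endpoints to the dictionary does not change any of the three numbers.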

Benefits of CTS

QoR Metric                                   Without CTS              With CTS                Comment
Worst clock latency                          1.67                     0.82
Best clock latency                           0.24                     0.72
Clock skew                                   1.43                     0.10                    14X less skew
Setup WNS / TNS / NVP                        -2.19 / -206.43 / 382    -1.01 / -21.46 / 105    10X less setup TNS
Hold WNS / TNS / NVP                         0 / 0 / 0                0 / 0 / 0
Max trans / Max cap / Max fanout violations  1025 / 3 / 0             187 / 4 / 0             5X less max trans violations
Total power                                  17.7 mW                  24.1 mW                 1.4X more power

[Figures: layout without CTS (sky130hd / riscv32i); layout with CTS (sky130hd / riscv32i)]
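
The ratios in the Comment column follow directly from the table values. A small sketch of that arithmetic (the dictionary keys are chosen here for illustration):

```python
# Values copied from the table above (sky130hd / riscv32i).
before = {"skew": 1.43, "setup_tns": -206.43, "max_trans": 1025, "power_mw": 17.7}
after = {"skew": 0.10, "setup_tns": -21.46, "max_trans": 187, "power_mw": 24.1}


def ratio(a, b):
    """Magnitude ratio of two metrics, ignoring sign."""
    return abs(a) / abs(b)


skew_gain = ratio(before["skew"], after["skew"])
tns_gain = ratio(before["setup_tns"], after["setup_tns"])
trans_gain = ratio(before["max_trans"], after["max_trans"])
power_cost = ratio(after["power_mw"], before["power_mw"])

print(f"skew:      {skew_gain:.1f}X less")
print(f"setup TNS: {tns_gain:.1f}X less")
print(f"max trans: {trans_gain:.1f}X fewer violations")
print(f"power:     {power_cost:.1f}X more")
```

The table rounds these ratios to whole or one-decimal multiples, which is why 9.6X appears as 10X and 5.5X as 5X.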


Ideal vs. Propagated Clock Modes

Without clock trees, a high-fanout clock net causes significant delay and transition degradation
• For pre-CTS designs, we want to ignore such large delays and transitions because they cause too many false timing violations
• “Ideal clock mode” assumes clock nets have no delay degradation and no transition degradation
    set_ideal_network [get_clocks CLK1]
    set_ideal_network [all_clocks]
After clock trees are built, we can use “propagated clock mode” to model actual delay and transition degradations
    set_propagated_clock [get_clocks CLK1]
    set_propagated_clock [all_clocks]
CTS Deep Dive

Main steps involve
• Sink clustering
  • Sequential elements are grouped into a fixed number of clusters based on their locations
• Tree construction and balancing
  • Buffers are inserted based on some structure like a hierarchical tree (H-tree)
  • Tree lengths are balanced such that clock skews are minimized
• LDRC repair
  • LDRC violations are repaired during or after CTS
  • Max transition, max capacitance, max wire length, etc.

Sink Clustering

Group sequential elements based on their locations to produce the best results (for example, minimum wire length)
• These parameters can be specified manually or determined automatically
  • Cluster size
  • Cluster diameter
All elements in the cluster will be driven by the same buffer
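
As a rough illustration of location-based clustering under a size and a diameter limit, the sketch below recursively bisects the sinks along the longer side of their bounding box. This is not the algorithm OpenROAD uses; the function names, the bounding-box definition of "diameter" and the sink coordinates are all assumptions made for the example.

```python
def bbox_diameter(sinks):
    """Width + height of the bounding box around the sinks (assumed metric)."""
    xs = [x for _, x, _ in sinks]
    ys = [y for _, _, y in sinks]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def cluster_sinks(sinks, max_size, max_diameter):
    """Split (name, x, y) sinks until every cluster meets both limits."""
    if len(sinks) <= 1 or (
        len(sinks) <= max_size and bbox_diameter(sinks) <= max_diameter
    ):
        return [sinks]
    xs = [x for _, x, _ in sinks]
    ys = [y for _, _, y in sinks]
    # Cut across the longer dimension so clusters stay compact.
    split_on_x = (max(xs) - min(xs)) >= (max(ys) - min(ys))
    ordered = sorted(sinks, key=lambda s: s[1] if split_on_x else s[2])
    mid = len(ordered) // 2
    return (cluster_sinks(ordered[:mid], max_size, max_diameter)
            + cluster_sinks(ordered[mid:], max_size, max_diameter))


# Ten hypothetical sinks with deterministic pseudo-scattered locations.
sinks = [(f"ff{i}", (i * 37) % 100, (i * 61) % 100) for i in range(10)]
clusters = cluster_sinks(sinks, max_size=3, max_diameter=200)

for n, cluster in enumerate(clusters):
    print(n, [name for name, _, _ in cluster])
```

Each resulting cluster would then be driven by one leaf buffer, so the cluster size limit also bounds the fanout of that buffer.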

Clock Tree Balancing

Perform library cell and wire analysis to determine best buffer choices and buffering distance
Construct Htree by alternating vertical segments with horizontal segments
Ensure that all segments of clock trees are balanced to minimize skew

[Figure: H-tree with segments labeled L1 and L2]
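
The balancing property of an H-tree can be seen in a small geometric sketch: each level branches symmetrically, alternating horizontal and vertical segments, so every leaf ends up at the same wire distance from the root. The function name and dimensions are hypothetical, and buffer selection, obstacles and real sink locations are ignored.

```python
def build_htree(cx, cy, half_w, half_h, levels, horizontal=True, path_len=0.0):
    """Return leaf tap points as (x, y, wire length from the root)."""
    if levels == 0:
        return [(cx, cy, path_len)]
    if horizontal:
        children = [(cx - half_w, cy), (cx + half_w, cy)]
        step, next_w, next_h = half_w, half_w / 2, half_h
    else:
        children = [(cx, cy - half_h), (cx, cy + half_h)]
        step, next_w, next_h = half_h, half_w, half_h / 2
    leaves = []
    for x, y in children:
        # Both branches add the same segment length, which keeps paths equal.
        leaves += build_htree(x, y, next_w, next_h, levels - 1,
                              not horizontal, path_len + step)
    return leaves


leaves = build_htree(0.0, 0.0, half_w=40.0, half_h=40.0, levels=4)
lengths = {length for _, _, length in leaves}
print(f"{len(leaves)} leaves, root-to-leaf wire lengths: {sorted(lengths)}")
```

Equal wire length is only a first-order proxy for equal latency; loads and buffer delays on each branch also have to match.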

Obstruction-Aware CTS

Clock buffers should not be placed on top of macros, placement blockages or other clock buffers
• Detailed placement may displace “illegal” buffers and cause timing to change after CTS
• New “legal” buffer locations need to preserve a balanced clock tree
• Obstruction-aware CTS can reduce legalizer displacement by up to 4X

[Figures: sky130hd/microwatt without obstruction-aware CTS; sky130hd/microwatt with obstruction-aware CTS]


Load Balancing

Some “dummy” cells are inserted to balance tree loads

[Figure: a floating “dummy” inverter cell is added to balance capacitive loads between the left side and the right side]
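
A back-of-the-envelope version of this balancing step is sketched below. The function name, the capacitance values (in fF) and the rule of rounding to the nearest whole cell are assumptions for illustration, not a description of how OpenROAD decides where to insert dummy cells.

```python
def dummy_cells_needed(left_caps_ff, right_caps_ff, dummy_cap_ff):
    """Return (lighter side, number of dummy cells) to even out two branches."""
    diff = sum(left_caps_ff) - sum(right_caps_ff)
    if diff == 0:
        return None, 0
    lighter = "right" if diff > 0 else "left"
    # Nearest whole number of dummy cells that closes the capacitance gap.
    return lighter, round(abs(diff) / dummy_cap_ff)


# Hypothetical sink pin capacitances in fF: four sinks on the left, three on the right.
left = [2, 2, 2, 2]
right = [2, 2, 2]
side, count = dummy_cells_needed(left, right, dummy_cap_ff=2)
print(f"add {count} dummy cell(s) on the {side} branch")
```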
LDRC Repair

Max fanout can be fixed by adjusting cluster size
Max transition, max capacitance and max wire length violations can be fixed by adding buffers to existing clock trees
• Impact on clock latency and skew needs to be considered
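
For the max wire length case, a simple way to estimate how many buffers a long wire needs is to split it into segments no longer than the limit. This is a sketch of that arithmetic only; the function name and the 2000 um wire are hypothetical, and 693 um is the limit that appears in the example log elsewhere in these slides.

```python
import math


def buffers_for_wire(length_um, max_length_um):
    """Buffers needed so that no wire segment exceeds max_length_um."""
    segments = math.ceil(length_um / max_length_um)
    return max(segments - 1, 0)


print(buffers_for_wire(2000, 693))  # 3 segments -> 2 buffers
print(buffers_for_wire(600, 693))   # already legal -> 0 buffers
```

Every inserted buffer adds delay to that branch, which is why the impact on latency and skew has to be re-checked after the repair.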

Handling Macro Cells

Macro cells represent memories or third-party intellectual property cells with limited internal visibility
• Full timing paths to sequential elements in the macro are not available
• Insertion delays are used to model clock latency from clock source pin to sequential element clock pins
OpenSTA can extract insertion delays in liberty timing models
CTS needs to consider insertion delays in macro cells

cell(array_tile) {
  interface_timing : true;
  pin(clk) {
    direction : input;
    clock : true;
    timing() {
      timing_sense : positive_unate;
      timing_type : min_clock_tree_path;
      cell_rise(scalar) {
        values("0.002");
      }
      cell_fall(scalar) {
        values("0.003");
      }
    }
    timing() {
      timing_sense : positive_unate;
      timing_type : max_clock_tree_path;
      cell_rise(scalar) {
        values("0.003");
      }
      cell_fall(scalar) {
        values("0.004");
      }
    }
  }
}

[Figures: macro with its internal insertion delay; clock tree without insertion delay consideration; clock tree with insertion delay consideration]
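
The effect of insertion delay on balancing can be sketched numerically. The 0.003 value is the max_clock_tree_path cell_rise from the Liberty excerpt; the 0.820 target latency, the time unit (assumed ns) and the function name are hypothetical.

```python
def tree_latency_to_pin(target_sink_latency, insertion_delay=0.0):
    """Latency the clock tree should deliver at a cell's clock pin so that
    the internal sequential element sees target_sink_latency."""
    return target_sink_latency - insertion_delay


target = 0.820            # hypothetical balanced latency for all sinks
macro_insertion = 0.003   # max_clock_tree_path cell_rise from the excerpt

ff_pin = tree_latency_to_pin(target)                      # flip-flop: no internal tree
macro_pin = tree_latency_to_pin(target, macro_insertion)  # macro: arrive earlier

# If the insertion delay is ignored, both pins are balanced to the same value
# and the macro's internal sinks end up late by the insertion delay.
skew_if_ignored = (target + macro_insertion) - target

print(f"flip-flop pin target: {ff_pin:.3f}")
print(f"macro pin target:     {macro_pin:.3f}")
print(f"skew if ignored:      {skew_if_ignored:.3f}")
```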

OpenROAD CTS Commands

Command: clock_tree_synthesis
Description: Build a balanced Htree by choosing appropriate clock buffers
Example output:
    [INFO CTS-0050] Root buffer is BUF_X4.
    [INFO CTS-0051] Sink buffer is BUF_X4.
    [INFO CTS-0052] The following clock buffers will be used for CTS:
                    BUF_X4
    . . .
    [INFO CTS-0017] Max level of the clock tree: 5.
    [INFO CTS-0098] Clock net "clk"
    [INFO CTS-0099] Sinks 2537
    [INFO CTS-0100] Leaf buffers 96
    [INFO CTS-0101] Average sink wire length 9247.25 um
    [INFO CTS-0102] Path depth 18 - 19
    [INFO CTS-0207] Leaf load cells 62

Command: repair_clock_nets
Description: Fixes LDRC violations including max wire length
Example output:
    [INFO RSZ-0058] Using max wire length 693um.
    [INFO RSZ-0047] Found 33 long wires.
    [INFO RSZ-0048] Inserted 94 buffers in 33 nets.
    [INFO RSZ-0058] Using max wire length 2154um.

Command: report_clock_skew
Description: Report worst clock skew for each clock signal in the design
Example output:
    Clock clk
       1.26 source latency inst_7_12/clk ^
      -1.13 target latency inst_8_12/clk ^
       0.00 CRPR
    --------------
       0.13 setup skew

Command: report_checks -format full_clock_expanded
Description: Report timing violations including clock paths
Example output:
    Startpoint: dp.rf.rf[31][3]$_DFFE_PP_
                (rising edge-triggered flip-flop clocked by clk)
    Endpoint: aluout[0] (output port clocked by clk)
    Path Group: clk
    Path Type: max

    Fanout    Cap   Slew  Delay   Time   Description
    -----------------------------------------------------------------------------
                           0.00   0.00   clock clk (rise edge)
                           0.00   0.00   clock source latency
         1   0.09   0.00   0.00   0.00 ^ clk (in)
                                         clk (net)
                    0.00   0.00   0.00 ^ clkbuf_0_clk/A (sky130_fd_sc_hd__clkbuf_16)
         8   0.21   0.22   0.25   0.25 ^ clkbuf_0_clk/X (sky130_fd_sc_hd__clkbuf_16)
                                         clknet_0_clk (net)
                    0.22   0.00   0.25 ^ clkbuf_3_3__f_clk/A (sky130_fd_sc_hd__clkbuf_16)
        17   0.23   0.24   0.34   0.59 ^ clkbuf_3_3__f_clk/X (sky130_fd_sc_hd__clkbuf_16)
                                         clknet_3_3__leaf_clk (net)
                    0.24   0.00   0.59 ^ clkbuf_leaf_47_clk/A (sky130_fd_sc_hd__clkbuf_16)
        11   0.04   0.06   0.20   0.79 ^ clkbuf_leaf_47_clk/X (sky130_fd_sc_hd__clkbuf_16)
                                         clknet_leaf_47_clk (net)
                    0.06   0.00   0.79 ^ dp.rf.rf[31][3]$_DFFE_PP_/CLK (sky130_fd_sc_hd__dfxtp_2)
         3   0.01   0.03   0.32   1.11 v dp.rf.rf[31][3]$_DFFE_PP_/Q (sky130_fd_sc_hd__dfxtp_2)
                                         dp.rf.rf[31][3] (net)
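
The numbers in the two reports can be cross-checked by hand or with a short script. The sketch below only redoes the arithmetic on values copied from the example output; it does not parse OpenROAD reports, and the variable names are chosen for illustration.

```python
from itertools import accumulate

# report_clock_skew: the skew line is the sum of the rows above it.
skew_rows = [1.26, -1.13, 0.00]   # source latency, target latency, CRPR
setup_skew = round(sum(skew_rows), 2)

# report_checks: the Time column is the running sum of the Delay column.
# Delays at clkbuf_0_clk/X, clkbuf_3_3__f_clk/X, clkbuf_leaf_47_clk/X and
# the flip-flop CLK-to-Q arc; the zero-delay rows are omitted.
delays = [0.25, 0.34, 0.20, 0.32]
arrival_times = [round(t, 2) for t in accumulate(delays)]

clock_latency_at_ff = arrival_times[2]   # arrival at the flip-flop CLK pin

print(f"setup skew    = {setup_skew}")
print(f"arrival times = {arrival_times}")
print(f"clock latency at flip-flop CLK pin = {clock_latency_at_ff}")
```

The third running sum is the clock latency of this particular sink, which is the quantity CTS tries to keep small and equal across sinks.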

Clock Tree Viewer

Open GUI
• gui::show
Enable “Clock Tree Viewer” if not enabled
Clock tree viewer shows latencies at all sinks
• Red sinks represent FF/latches
• Green sinks represent macros
  • Insertion delays are added to macro sinks

Clock Power

Clock networks can consume more than 30% of total chip power due to frequent charging and discharging of lots of sequential elements
Total power = static power (leakage power) + dynamic power
Static power can be reduced by
• using low drive-strength or high threshold voltage (HVT) cells
• However, the clock network often needs high drive-strength or low threshold voltage (LVT) cells for timing and variability reasons
Dynamic power (~ load cap x supply voltage ^ 2 x clock frequency) can be reduced by
• powering down logic that is not needed (“clock gating”)
• downsizing sequential elements
• reducing clock wire lengths
  • Clock NDR (non-default rules) can make the router prioritize clock nets over data nets
  • For example, clock wire spacing can be 2X vs. data wire such that data wires take detours instead of clock wires
• reducing clock frequency
• reducing supply voltage
  • Dynamic voltage scaling can increase voltage only when needed

[Figure: clock gate cell with clock and enable inputs; the enable signal is active only when clock logic is needed]

# Clock NDR
NONDEFAULTRULES 1 ;
- CTS_NDR_0
+ LAYER metal1 WIDTH 140 SPACING 260
+ LAYER metal2 WIDTH 140 SPACING 280
+ LAYER metal3 WIDTH 140 SPACING 280
+ LAYER metal4 WIDTH 280 SPACING 560
+ LAYER metal5 WIDTH 280 SPACING 560
+ LAYER metal6 WIDTH 280 SPACING 560
+ LAYER metal7 WIDTH 800 SPACING 1600
+ LAYER metal8 WIDTH 800 SPACING 1600
+ LAYER metal9 WIDTH 1600 SPACING 3200
+ LAYER metal10 WIDTH 1600 SPACING 3200
;
NETS 4 ;
- clk ( PIN clk ) ( clkbuf_0_clk A ) + USE CLOCK
+ NONDEFAULTRULE CTS_NDR_0 ;
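
The proportionality for dynamic power can be turned into a quick what-if calculation. This sketch uses the relation from the slide as an equality with no constant factor, so only the ratios are meaningful; the function name, the 1 pF load and the 100 MHz clock are assumptions, and the two supply voltages are simply example values.

```python
def dynamic_power(load_cap_f, supply_v, clock_hz):
    """Relative dynamic power: load cap x supply voltage^2 x clock frequency."""
    return load_cap_f * supply_v ** 2 * clock_hz


baseline = dynamic_power(1e-12, 1.0, 100e6)     # hypothetical 1 pF, 1.0 V, 100 MHz
low_voltage = dynamic_power(1e-12, 0.75, 100e6)
half_clock = dynamic_power(1e-12, 1.0, 50e6)

voltage_ratio = low_voltage / baseline
frequency_ratio = half_clock / baseline

print(f"1.0 V -> 0.75 V:    {voltage_ratio:.4f}x power")
print(f"100 MHz -> 50 MHz:  {frequency_ratio:.2f}x power")
```

Because voltage enters squared, a 25% supply reduction saves more power than a 25% frequency reduction would.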

Variability Considerations

Timing and power vary depending on process, voltage, temperature (PVT) plus other factors
• Process: fast, slow
• Voltage: 0.75V, 1.0V, …
• Temperature: -40 deg C, 25 deg C, …
Even at the same PVT, some cells are more sensitive to small changes
• Cell delay or transition time changes significantly even though output wire length changes very little
• Low drive strength (0.5X or 1X) and high threshold voltage (HVT) cells tend to be sensitive
Avoid low drive strength and high threshold voltage cells for clock buffers
• Use LVT or SLVT cells
• Don’t mix different threshold voltage (VT) type cells (LVT and SLVT) because they have different variability relative to PVT and other factors
Summary

• CTS is essential in achieving desired design QoR
• Main CTS goals are to minimize latency and skew
• Balanced clock trees can be constructed using sink clustering and Htrees
• CTS needs to consider LDRC violations, macro cell insertion delays and placement obstructions
• OpenROAD can
  • build and visualize clock trees
  • report skews and clock path timing
• CTS needs to minimize power and provide robustness against process, temperature, voltage and other variability factors
