
SNUG San Jose 2001 Have Your Cake And Eat It Too:
How To Optimize For Area AND Timing
4.2. Do Not Over-Constrain
Another key methodology issue involves the practice of over-constraining a design.
Over-constraining consumes area by causing designs to be overbuilt, using higher than necessary drive
strengths on cells and forcing more area-hungry implementations of DesignWare components.
Realistic and successively refined constraints are much more area friendly. Default constraints,
which should be built into the scripts and methodology, should reflect reasonable input & output
delays, input drivers and output loading. For example, use the “Q” output pin of a 1X drive flop
as the default driving cell pin, and 4 times the “A” input pin of a 1X drive NAND gate as the
default output load for a block. For a default input delay, allow for the clock-to-Q of the driving
flop plus a small wire delay. For a default output delay, allow for the setup time of the receiving
flop plus a small wire delay. Use virtual clocks to constrain the I/Os of combinatorial blocks, or
preferably ungroup combinatorial blocks into their parent designs. Combinatorial and “snaking”
paths, and possibly some sequential paths, will be under-constrained at first. This would cause
great difficulty in a single-pass methodology. With a multi-pass methodology, however, the
constraints are refined in subsequent passes without over-constraining the entire design.
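
The default constraints described above might be captured in a script along the following lines.
This is a minimal sketch in Design Compiler Tcl-mode syntax, not the author's actual script. The
library name my_lib, the cell names DFFX1 and NAND2X1, the clock and port name clk, and every
numeric value are hypothetical placeholders to be replaced with values from the target library.

```tcl
# Hypothetical values: substitute numbers from your own library and clock plan.
set CLK_PERIOD 10.0   ;# clock period, in library time units
set CLK_TO_Q   0.5    ;# assumed clock-to-Q delay of a 1X drive flop
set SETUP      0.3    ;# assumed setup time of a 1X drive flop
set WIRE_DELAY 0.1    ;# assumed small wire delay

# Clock for a sequential block (port name "clk" is an assumption).
create_clock -name clk -period $CLK_PERIOD [get_ports clk]

# All input ports except the clock port.
set data_inputs [remove_from_collection [all_inputs] [get_ports clk]]

# Default driver: the Q output pin of a 1X drive flop.
set_driving_cell -lib_cell DFFX1 -pin Q $data_inputs

# Default load: 4 times the A input pin of a 1X drive NAND gate.
set_load [expr {4 * [load_of my_lib/NAND2X1/A]}] [all_outputs]

# Default input delay: clock-to-Q plus a small wire delay.
set_input_delay [expr {$CLK_TO_Q + $WIRE_DELAY}] -clock clk $data_inputs

# Default output delay: flop setup time plus a small wire delay.
set_output_delay [expr {$SETUP + $WIRE_DELAY}] -clock clk [all_outputs]

# For a purely combinatorial block (no clock port), a virtual clock
# can stand in for the real one:
#   create_clock -name vclk -period $CLK_PERIOD
#   set_input_delay  [expr {$CLK_TO_Q + $WIRE_DELAY}] -clock vclk [all_inputs]
#   set_output_delay [expr {$SETUP + $WIRE_DELAY}]    -clock vclk [all_outputs]
```

Keeping the numbers in variables at the top of the script makes it straightforward to refine them
from one compile pass to the next.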
4.3. Margin With Design Rules
Traditionally, over-constraining was done to minimize the timing losses incurred by layout tools that
were not timing-savvy. With the present widespread use of timing-driven layout tools, timing
accuracy (especially with respect to predicted clock insertion delay and skew) can be much more
important than timing margin. It is advisable, however, to margin the design with design rules
instead of timing constraints. One example is to tighten the max_transition value before layout,
and relax it to an acceptable value after layout. Another example is to use a max_fanout of 10 to
20 on compiled designs to reduce unexpected timing swings in layout. Another example is to set
a max_fanout on all the input ports of compiled blocks. The max_fanout constraints on inputs
and designs combine to prevent high fanout loading surprises further up the design hierarchy, and
limit the amount of buffering to be performed by the timing-driven layout tools. I have found an
input max_fanout value of one to be quite effective; this constraint can sometimes be
lifted on the last pass in a multi-pass compile strategy. Any resulting unnecessary buffering will
be optimized away by the core-level incremental optimization in synthesis. Once again, these
design-rule-based margining techniques should be built into the scripts and methodology.
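
As a rough illustration, the design rule margins described above could be scripted as follows. This
is a sketch in Design Compiler Tcl-mode syntax under stated assumptions: the transition values are
hypothetical and library dependent, and the fanout value of 16 is simply one point in the 10 to 20
range mentioned above.

```tcl
# Hypothetical transition limits, in library time units.
set MAX_TRAN_PRE_LAYOUT  0.8   ;# tightened value used before layout
set MAX_TRAN_POST_LAYOUT 1.2   ;# relaxed, acceptable value used after layout

# Before layout: margin with design rules rather than timing constraints.
set_max_transition $MAX_TRAN_PRE_LAYOUT [current_design]

# Limit fanout inside the compiled design (somewhere between 10 and 20).
set_max_fanout 16 [current_design]

# Limit fanout on every input port of the compiled block.
set_max_fanout 1 [all_inputs]

# After layout, the transition limit would be relaxed:
#   set_max_transition $MAX_TRAN_POST_LAYOUT [current_design]
#
# On the last compile pass, the input-port limit can sometimes be lifted:
#   remove_attribute [all_inputs] max_fanout
```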
4.4. Use Selective Ungrouping
Yet another key methodology issue involves the use of selective ungrouping and set_dont_touch
on compiled designs. Very small designs, especially purely combinational ones, should be
ungrouped into their parent blocks whenever possible. Compile-generated hierarchy, including
DesignWare and MUX_OP components, should also be ungrouped. Ungrouping these components
removes boundary conditions, allowing further area optimization. Ungrouping before the compile
can result in longer run times, since synthesis has to look at more of the design at once.
Compiling hierarchically can have a runtime advantage, but consumes more area and also
requires hierarchical saves, leading to multiple versions of the same leaf block throughout the
design. To obtain the best of both worlds, load the parent and leaf blocks and then compile