ECE 551
Digital Design And Synthesis
Fall 09
Synthesis Flow
Synthesis Optimizations
Administrative Matters
Project Demos will be held in B555:
Wednesday (12/16/09) from 1:00PM till evening.
Friday (12/18/09) from 1:00PM till evening.
Final Exam is:
Tuesday: 12/22/09 @ 10:05AM
Internal Synthesizer Flow
HDL Description
-> Parsing and Syntax & Semantic Error Checking
-> Synthesizer Policy Checking
-> Translation (Elaboration)
-> Structural Representation
-> Architectural Optimization
-> Multi-Level Logic Optimization
-> Technology Mapping
-> Technology-Based Implementation (netlist)
Technology Library (side input to the optimization and mapping steps)
Getting Lost in a Sea of Documentation
Add the following line to your .cshrc:
alias synopsys_doc acroread /afs/.engr.wisc.edu/apps/eda/synopsys/syn_Z-2007.03-SP3/doc/online/top.pdf
Use this to kick off the Synopsys On-Line
Documentation (SOLD)
Only look at Design Compiler Related Stuff
Command Line Interface Guide
Constraints and Timing
Optimization and Timing Analysis
Full command set (on 2nd page) (Synthesis Commands) (man2)
Initial Steps (Analyze Verilog File)
Parsing for Syntax and Semantics Checking
Gives error messages and warnings to user
User may modify the HDL description in response
Synthesizer Policy Checking
Check for adherence to allowable language constructs
Check for usage recommendations
This is where you find out you can't use certain Verilog
constructs
This is synthesizer-dependent
Example: Design Vision allows indexed part-select
(guess[i*2 +: 2]), but the Xilinx Foundation tool does not
Certain things common to MOST synthesizers
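For reference, an indexed part-select of the form guess[base +: width] picks width bits starting at bit base. The Python sketch below only models that bit arithmetic on an integer; the helper name part_select_up is made up for illustration and is not part of any tool.

```python
def part_select_up(value: int, base: int, width: int) -> int:
    """Model Verilog's value[base +: width] on a non-negative integer."""
    return (value >> base) & ((1 << width) - 1)

# Four 2-bit fields packed into one byte: 11, 10, 01, 00 (MSB first).
guess = 0b11100100
fields = [part_select_up(guess, i * 2, 2) for i in range(4)]
print(fields)  # [0, 1, 2, 3]
```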
Translation (Elaboration)
Builds a structural representation of the design
Like a netlist, but includes larger components
Not just gate-level, may include adders, etc.
Gives additional errors or warnings to the user
Issues in initial transformation to hardware.
Affects quality achieved by optimization steps
Structural representation depends on HDL quality
Poor HDL can prevent optimization
Optimization in Synthesis
None of these are guaranteed!
Most synthesizers will make at least some attempt
Detect and eliminate redundant logic
Detect combinational feedback loops
Exploit don't-care conditions
Try to detect unused states (logic states you can't get to)
Synthesize optimal, multilevel realizations subject to:
constraints on area and/or speed
available technology (library)
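As a small illustration of exploiting don't-care conditions, assume a 4-bit input that only ever carries BCD digits 0 to 9 (an assumption made up for this sketch). The test "digit >= 8" then collapses to the MSB alone, because codes 10 to 15 never occur.

```python
def ge8_full(d: int) -> bool:
    """Fully specified version: true only for 1000 and 1001."""
    b3, b2, b1 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1
    return bool(b3 and not b2 and not b1)

def ge8_dont_care(d: int) -> bool:
    """Version with codes 10..15 treated as don't-cares: just the MSB."""
    return bool((d >> 3) & 1)

# The two versions agree on every input that can actually occur...
agree_on_bcd = all(ge8_full(d) == ge8_dont_care(d) for d in range(10))
# ...and differ only on the don't-care codes.
differ_outside = any(ge8_full(d) != ge8_dont_care(d) for d in range(10, 16))
print(agree_on_bcd, differ_outside)  # True True
```

The simplified version needs no gates at all, but it is only valid if the don't-care inputs really never appear.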
Optimization Process
Optimization modifies the initial netlist resulting
from elaboration.
Architecture choices made first (CLA, RCA, ...)
Maps to cells from the technology library
Attempts to meet all specified constraints
The process is divided into major phases
All or some selection of the major phases may be
performed during optimization
Phase selection can be controlled by the user
Optimization Phases
Architectural optimization
High-level optimizations that occur before the design is
mapped to the logic-level
Based on constraints and high-level coding style
Level of parallelism?
Building block choices like adder architecture (DW
components)
After optimization, the circuit function is represented by a
generic, technology-independent netlist (GTECH)
Architectural Optimization
In Synopsys, types include:
Sharing common mathematical subexpressions
Sharing resources
Selecting DesignWare implementations
Reordering operators
Identifying arithmetic expressions for datapath synthesis
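A sketch of why resource sharing is legal: when two additions are mutually exclusive, the operands can be multiplexed into a single adder. The function names are hypothetical, and Python integers stand in for hardware buses.

```python
from itertools import product

def two_adders(sel, a, b, c, d):
    """Unshared: both sums exist in hardware, then one is selected."""
    return (a + b) if sel else (c + d)

def shared_adder(sel, a, b, c, d):
    """Shared: operands are muxed first, so only one adder is needed."""
    x = a if sel else c
    y = b if sel else d
    return x + y

# Exhaustive check over small operand values.
same = all(
    two_adders(s, a, b, c, d) == shared_adder(s, a, b, c, d)
    for s in (0, 1)
    for a, b, c, d in product(range(4), repeat=4)
)
print(same)  # True
```

The shared form trades one adder for two multiplexers, which is usually an area win but adds mux delay in front of the adder.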
Architectural Optimization
Examples:
Replace an adder used as a counter with incrementer
Replace adder and separate subtractor with adder/subtractor
if not used simultaneously
Performs selection of pre-designed components (Synopsys
DesignWare)
adders, multipliers, shifters, comparators, muxes, etc.
Need good code for synthesizer to do this
Designer still knows more about the project
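The adder/subtractor merge relies on the two's-complement identity a - b = a + ~b + 1. A minimal model, assuming an 8-bit datapath (the width and names are illustrative only):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def add_sub(a: int, b: int, sub: int) -> int:
    """One adder serves both operations: XOR b with the mode bit
    and feed the mode bit in as the carry-in."""
    b_eff = b ^ (MASK if sub else 0)
    return (a + b_eff + sub) & MASK

# Exhaustive check against plain modular add and subtract.
ok = all(
    add_sub(a, b, 0) == (a + b) & MASK and add_sub(a, b, 1) == (a - b) & MASK
    for a in range(256)
    for b in range(256)
)
print(ok)  # True
```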
Logic/Gate-Level Optimization
Works on the generic netlist created by logic
synthesis
Produces a technology-specific netlist.
In Synopsys, it consists of four stages:
Mapping
Delay optimization
Design rule fixing
Area optimization
Logic/Gate-Level Optimization
Mapping
Generates a gate level implementation
Tries to meet timing and area goals
Delay optimization
Tries to fix delay violations from mapping phase.
Does not fix design rule violations or meet area constraints.
Design rule fixing
Tries to correct design rule violations
Inserting buffers or resizing existing cells
If necessary, violates optimization constraints
Area optimization
Tries to meet area constraints, which have lowest priority
Combinational Optimization
Boolean equation manipulation
Boolean reduction
Factoring
Sharing common terms
Mapping Optimizations
Gate mapping based on CF
Applying De Morgan's laws
Sizing Gates & Buffering
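As one example of a mapping optimization, De Morgan's laws let an AND-OR structure be rewritten as NAND-NAND. The exhaustive check below is only an illustration of that identity, not a model of any particular mapper.

```python
from itertools import product

def nand(*xs):
    return int(not all(xs))

def and_or(a, b, c, d):
    """F = ab + cd built from AND and OR gates."""
    return int((a and b) or (c and d))

def nand_nand(a, b, c, d):
    """Same function built from three NAND gates."""
    return nand(nand(a, b), nand(c, d))

equivalent = all(and_or(*v) == nand_nand(*v) for v in product((0, 1), repeat=4))
print(equivalent)  # True
```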
Gate-Level Optimization
Logic-Level Optimizations
Verilog Description
-> TRANSLATION ENGINE
-> Two-level Logic Functions
-> OPTIMIZATION ENGINE
-> Optimized Multi-level Logic Functions
-> MAPPING ENGINE (with Technology Libraries)
-> Technology Implementation
Logic Optimizations
Area
Number of gates: fewer == smaller
Fan in of gates (# inputs): fewer == smaller
Drive Strength (transistor width): narrower == smaller
Delay
Number of logic levels: fewer == faster (usually)
Size of gates (# inputs): fewer == faster
Note that examples that follow ignore NOT gates for
gate count / levels of circuits
Logic Optimizations
Decomposition
Extraction
Factoring
Substitution
Elimination
You don't have to remember the names of these
But understand the concept and the motivation
Decomposition
Find common expressions
Reduce redundancy
Reduce area (number/size of gates)
May increase delay
More levels of logic
Decomposition Example
F = abc + abd + a'c'd' + b'c'd'
~7 gates, ~3 levels
F = ab(c + d) + c'd'(a' + b')
F = ab(c + d) + (c + d)'(ab)'
X = ab
1 gate, 1 level
Y = c + d
1 gate, 1 level
F = XY + X'Y'
5 gates, 3 levels (or what?)
Before: Gate Effort = 4*(3-input AND) + 4-input OR = 16 effort
After: Gate Effort = 2-input AND + 2-input OR + 2*(2-input AND)
+ 2-input OR = 10 effort
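The decomposition can be checked exhaustively. The sketch below reads the example with complemented terms, F = abc + abd + a'c'd' + b'c'd', which is the form that makes F = XY + X'Y' hold; gate effort is taken as the total number of gate inputs, with inverters ignored.

```python
from itertools import product

def f_flat(a, b, c, d):
    """F = abc + abd + a'c'd' + b'c'd' (two-level form)."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (a & b & c) | (a & b & d) | (na & nc & nd) | (nb & nc & nd)

def f_decomposed(a, b, c, d):
    """F = XY + X'Y' with X = ab and Y = c + d."""
    x = a & b
    y = c | d
    return (x & y) | ((1 - x) & (1 - y))

equivalent = all(f_flat(*v) == f_decomposed(*v) for v in product((0, 1), repeat=4))

# Gate effort = sum of gate fan-ins.
effort_flat = sum([3, 3, 3, 3, 4])        # four 3-input ANDs, one 4-input OR
effort_decomposed = sum([2, 2, 2, 2, 2])  # X, Y, XY, X'Y', final OR
print(equivalent, effort_flat, effort_decomposed)  # True 16 10
```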
Extraction
Find common sub-expressions in functions
Like decomposition, but across more than one
function
Reduce redundancy
Reduce area (number/size of gates)
May increase delay if more logic levels introduced
Extraction Example
F = (a + b)cd + e: 3 gates, 3 levels
G = (a + b)e: 2 gates, 2 levels
H = cde: 1 gate, 1 level
Define common terms: X = a + b, Y = cd: 1 gate, 1 level (each)
F = XY + e: 4 gates, 3 levels
G = Xe: 2 gates, 2 levels
H = Ye: 2 gates, 2 levels
Before:
(3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND
Gate Effort = 6 + 6 + 2 = 14
After:
(2) 2-input ORs, (4) 2-input ANDs
Gate Effort = 4 + 8 = 12
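A quick sanity check of the extraction example, using the same gate-effort convention (sum of gate inputs). The function names are illustrative.

```python
from itertools import product

def before(a, b, c, d, e):
    """Three independent functions, no sharing."""
    f = ((a | b) & c & d) | e
    g = (a | b) & e
    h = c & d & e
    return f, g, h

def after(a, b, c, d, e):
    """Same functions with X = a + b and Y = cd extracted and shared."""
    x = a | b
    y = c & d
    return (x & y) | e, x & e, y & e

equivalent = all(before(*v) == after(*v) for v in product((0, 1), repeat=5))

effort_before = 3 * 2 + 2 * 3 + 1 * 2  # 3 ORs(2), 2 ANDs(3), 1 AND(2)
effort_after = 2 * 2 + 4 * 2           # 2 ORs(2), 4 ANDs(2)
print(equivalent, effort_before, effort_after)  # True 14 12
```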
Factoring
Traditional two-level logic is sum-of-products
Sometimes better expressed by product-of-sums
Fewer literals => less area
May increase delay if logic equation not completely
factored (becomes multi-level)
Factoring Example
Definitely good:
F = ac + ad + bc + bd: Gate Effort = 8 + 4 = 12
F = (a + b)(c + d): Gate Effort = 4 + 2 = 6
Maybe good:
F = ac + ad + e: Gate Effort = 7
F = a(c + d) + e: Gate Effort = 6
Factoring may improve area...
But will likely increase delay (tradeoff)
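Both factoring cases can be verified by brute force. This sketch also recomputes the gate-effort figures as the sum of gate inputs.

```python
from itertools import product

# "Definitely good": same number of levels, half the effort.
def sop4(a, b, c, d):
    return (a & c) | (a & d) | (b & c) | (b & d)

def pos4(a, b, c, d):
    return (a | b) & (c | d)

# "Maybe good": slightly less effort, but one more logic level.
def sop3(a, c, d, e):
    return (a & c) | (a & d) | e

def factored3(a, c, d, e):
    return (a & (c | d)) | e

inputs = list(product((0, 1), repeat=4))
good = all(sop4(*v) == pos4(*v) for v in inputs)
maybe = all(sop3(*v) == factored3(*v) for v in inputs)

effort = {
    "sop4": 4 * 2 + 4,       # four 2-input ANDs, one 4-input OR
    "pos4": 2 * 2 + 2,       # two 2-input ORs, one 2-input AND
    "sop3": 2 * 2 + 3,       # two 2-input ANDs, one 3-input OR
    "factored3": 2 + 2 + 2,  # OR, AND, OR (three levels)
}
print(good, maybe, effort)
```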
Substitution
Similar to Extraction (in fact a sub-case of extraction)
When one function is subfunction of another
Reduce area
Fewer gates
Can increase delay if more logic levels
Substitution Example
G = a + b: 1 gate, 1 level
F = a + b + c: 1 gate, 1 level
F = G + c: 2 gates, 2 levels
Before:
(1) 2-input OR, (1) 3-input OR => Gate Effort = 5
After:
(2) 2-input ORs (but increased levels) => Gate Effort = 4
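The substitution example in executable form, as a sketch: F reuses G instead of rebuilding a + b, which saves one gate input at the cost of an extra level on the path to F.

```python
from itertools import product

def before(a, b, c):
    g = a | b
    f = a | b | c      # separate 3-input OR
    return g, f

def after(a, b, c):
    g = a | b
    f = g | c          # G substituted into F
    return g, f

equivalent = all(before(*v) == after(*v) for v in product((0, 1), repeat=3))
effort_before = 2 + 3  # one 2-input OR, one 3-input OR
effort_after = 2 + 2   # two 2-input ORs
print(equivalent, effort_before, effort_after)  # True 5 4
```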
Elimination (Flattening)
Opposite of previous optimizations
Goal is to reduce delay
Make signals travel through as few logic levels as possible
But will likely increase area
Gate replication / redundant logic
Elimination Example
G = c + d: 1 gate, 1 level
F = Ga + G'b: 3 gates, 3 levels
After eliminating G from F:
G = c + d: 1 gate, 1 level
F = ac + ad + bc'd': 4 gates, 2 levels
Before:
(2) 2-input ORs, (2) 2-input ANDs
After:
(1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs,
(1) 3-input AND (but fewer levels)
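The flattened form follows from G' = (c + d)' = c'd', so F = Ga + G'b = ac + ad + bc'd'. The sketch below checks that and computes gate effort with the same sum-of-inputs convention used in the earlier examples; the effort numbers are derived here, not taken from the slide.

```python
from itertools import product

def f_multilevel(a, b, c, d):
    """F = Ga + G'b with G = c + d (three levels)."""
    g = c | d
    return (g & a) | ((1 - g) & b)

def f_flattened(a, b, c, d):
    """F = ac + ad + bc'd' (two levels, G no longer on the path)."""
    return (a & c) | (a & d) | (b & (1 - c) & (1 - d))

equivalent = all(f_multilevel(*v) == f_flattened(*v) for v in product((0, 1), repeat=4))

effort_before = 2 * 2 + 2 * 2        # two 2-input ORs, two 2-input ANDs
effort_after = 2 + 3 + 2 * 2 + 3     # OR(2), OR(3), two AND(2), AND(3)
print(equivalent, effort_before, effort_after)  # True 8 12
```

Fewer levels, more area: the opposite trade from the previous optimizations.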
compile_ultra Optimizations
High effort, maximum optimization
Automatic hierarchical ungrouping
Ungroups small modules before mapping
Ungroups critical path based on delay
Automatic datapath extraction
E.g. carry-save adders
Boundary optimization
Propagates logic across hierarchical boundaries (constants, NC
inputs/outputs, NOT)
Sequential inversion
Sequential elements can have their outputs inverted
Register Retiming
At the HDL level, determining the optimal placement of registers
is difficult and tedious at best, or just plain impossible at worst
The register retiming tool moves registers through the synthesized
combinational logic network to improve timing and/or area
Equalize delay (i.e. reduce critical path delay by increasing delay in other
paths)
Reduce the number of flip-flops if timing criteria are met
Usually propagate registers forward
Can also automatically pipeline combinational logic modules
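A toy model of the delay-equalizing idea, assuming a linear chain of combinational blocks split by a single register. The block delays and the function name are invented for illustration; a real retiming tool works on the full netlist and must preserve circuit behavior.

```python
def best_register_position(delays):
    """Try every cut point in the chain and return (cut, period), where
    the register sits before block index `cut` and `period` is the
    delay of the slower of the two resulting stages."""
    best = None
    for cut in range(1, len(delays)):
        period = max(sum(delays[:cut]), sum(delays[cut:]))
        if best is None or period < best[1]:
            best = (cut, period)
    return best

delays = [2, 3, 1, 4, 2]  # hypothetical block delays in ns

# As coded in the HDL: register placed right after the first block.
as_coded = max(sum(delays[:1]), sum(delays[1:]))

cut, period = best_register_position(delays)
print(as_coded, cut, period)  # 10 3 6
```

Moving the register two blocks forward shortens the critical stage from 10 ns to 6 ns while lengthening the other stage from 2 ns to 6 ns.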
Register Retiming Example [1]
Register Retiming Example [2]