Intermediate Representations & Control-Flow Analysis
1. Static Single Assignment (SSA) form — concept & properties
Definition:
SSA (Static Single Assignment) form requires that each variable have exactly one
assignment in the program text, so every use of a variable refers to exactly one
definition. In practice this is achieved by renaming variables into versions
(a0, a1, ...) and inserting φ (phi) functions at control-flow merge points.
Key Points:
- Each value has a unique definition → simplifies analysis.
- φ functions at join points select the incoming value that matches the predecessor block control arrived from.
- Simplifies optimizations: constant propagation, dead code elimination, etc.
Example (simplified SSA):
Original:
a = 1
b = 2
c = a + b
if c > 2 goto L1 else goto L2
L1: a = c + 1; goto L3
L2: b = c - 1; goto L3
L3: d = a + b
SSA:
a0 = 1
b0 = 2
c0 = a0 + b0
if c0 > 2 goto L1 else goto L2
L1: a1 = c0 + 1; goto L3
L2: b1 = c0 - 1; goto L3
L3: a2 = φ(a1, a0)
    b2 = φ(b0, b1)
    d0 = a2 + b2
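To see how the φ functions pick a value according to the incoming edge, the SSA example above can be run through a tiny interpreter. This is only a sketch: the dictionary encoding of blocks and the run_ssa helper are hypothetical names chosen for illustration, not part of any compiler's API.

```python
# Minimal sketch: evaluate the SSA example, resolving each φ by the
# predecessor block that control actually arrived from.
# PROGRAM's encoding and run_ssa are illustrative assumptions.

PROGRAM = {
    "entry": {
        "phis": [],
        "body": [("a0", lambda e: 1),
                 ("b0", lambda e: 2),
                 ("c0", lambda e: e["a0"] + e["b0"])],
        "next": lambda e: "L1" if e["c0"] > 2 else "L2",
    },
    "L1": {"phis": [], "body": [("a1", lambda e: e["c0"] + 1)],
           "next": lambda e: "L3"},
    "L2": {"phis": [], "body": [("b1", lambda e: e["c0"] - 1)],
           "next": lambda e: "L3"},
    "L3": {
        # Each φ maps a predecessor label to the incoming SSA name.
        "phis": [("a2", {"L1": "a1", "L2": "a0"}),
                 ("b2", {"L1": "b0", "L2": "b1"})],
        "body": [("d0", lambda e: e["a2"] + e["b2"])],
        "next": lambda e: None,
    },
}

def run_ssa(program, start="entry"):
    env, prev, label = {}, None, start
    while label is not None:
        block = program[label]
        # φ functions are evaluated in parallel: read all sources, then write.
        incoming = {dst: env[srcs[prev]] for dst, srcs in block["phis"]}
        env.update(incoming)
        for dst, compute in block["body"]:
            env[dst] = compute(env)
        prev, label = label, block["next"](env)
    return env

env = run_ssa(PROGRAM)
print(env["a2"], env["b2"], env["d0"])  # 4 2 6
```

Because c0 is 3, control reaches L3 from L1, so a2 takes a1 and b2 takes b0, which matches what the original program computes for d.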
2. Control Flow Graph (CFG) construction
Definition:
A Control-Flow Graph (CFG) is a directed graph where nodes are basic blocks and edges
represent possible control transfers.
Steps to build CFG:
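1. Identify leaders: the first instruction, every jump target, and every instruction that immediately follows a jump or branch.
2. Form basic blocks: each block runs from a leader up to, but not including, the next leader.
3. Add an edge for each jump target and for each fallthrough into the following block.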
Notes:
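- Reducible CFGs: every loop has a single entry (its header), which dominates all blocks in the loop, so loops can be found from back edges whose target dominates their source.
- Irreducible CFGs: some loop has multiple entries, which defeats dominance-based loop detection and complicates many loop optimizations.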
Example (CFG from the SSA example; B0 = entry block, B1 = L1, B2 = L2, B3 = L3):
B0 → B1, B0 → B2
B1 → B3, B2 → B3
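The three construction steps can be sketched over a toy instruction encoding. The tuple format (label, op, targets) and the build_cfg function are assumptions made for this illustration; a conditional branch is modeled with two explicit targets, as in the example from section 1.

```python
# Minimal sketch of leader-based CFG construction over a toy TAC encoding.
# The instruction tuples and build_cfg are illustrative assumptions.

def build_cfg(instrs):
    """instrs: list of (label_or_None, op, targets), where op is
    'stmt', 'goto' (one target) or 'branch' (two targets)."""
    label_at = {lab: i for i, (lab, _, _) in enumerate(instrs) if lab}

    # Step 1: leaders = first instruction, jump targets, instruction after a jump.
    leaders = {0}
    for i, (_, op, targets) in enumerate(instrs):
        if op in ("goto", "branch"):
            leaders.update(label_at[t] for t in targets)
            if i + 1 < len(instrs):
                leaders.add(i + 1)
    starts = sorted(leaders)

    # Step 2: each block runs from its leader up to the next leader.
    bounds = list(zip(starts, starts[1:] + [len(instrs)]))
    block_of = {s: f"B{n}" for n, (s, _) in enumerate(bounds)}

    # Step 3: jump edges, plus a fallthrough edge when the block's last
    # instruction is not a goto/branch.
    edges = []
    for s, e in bounds:
        _, op, targets = instrs[e - 1]
        if op in ("goto", "branch"):
            edges += [(block_of[s], block_of[label_at[t]]) for t in targets]
        elif e < len(instrs):
            edges.append((block_of[s], block_of[e]))
    return bounds, edges

# The original (pre-SSA) example from section 1.
code = [
    (None, "stmt", ()),               # a = 1
    (None, "stmt", ()),               # b = 2
    (None, "stmt", ()),               # c = a + b
    (None, "branch", ("L1", "L2")),   # if c > 2 goto L1 else goto L2
    ("L1", "stmt", ()),               # a = c + 1
    (None, "goto", ("L3",)),
    ("L2", "stmt", ()),               # b = c - 1
    (None, "goto", ("L3",)),
    ("L3", "stmt", ()),               # d = a + b
]
blocks, edges = build_cfg(code)
print(edges)
# [('B0', 'B1'), ('B0', 'B2'), ('B1', 'B3'), ('B2', 'B3')]
```

Blocks are returned as half-open index ranges into the instruction list, which keeps the sketch short; a real IR would normally store the instructions inside block objects.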
3. Dominance relations & dominance frontiers
Dominance:
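- Node D dominates N if every path from the entry to N passes through D; every node dominates itself.
- Immediate dominator: the closest strict dominator of a node; every node except the entry has exactly one.
- Dominator tree: the tree in which each node's parent is its immediate dominator, with the entry as root.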
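These definitions translate directly into a simple fixed-point computation. The sketch below uses the straightforward iterative set-based method rather than a faster algorithm, and assumes every non-entry node is reachable from the entry; the function names and the successor-map encoding are hypothetical.

```python
# Minimal sketch: iterative dominator sets, then immediate dominators.
# This is the simple fixed-point method, chosen for clarity over speed.

def dominators(succ, entry):
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            preds[s].append(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # Assumes n is reachable, so it has at least one predecessor.
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def immediate_dominators(dom, entry):
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        strict = ds - {n}
        # Strict dominators form a chain; the closest one is dominated by
        # all the others, so it has the largest dominator set.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom

cfg = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
dom = dominators(cfg, "B0")
idom = immediate_dominators(dom, "B0")
print(idom)  # {'B1': 'B0', 'B2': 'B0', 'B3': 'B0'}
```

Neither B1 nor B2 dominates B3, because B3 can be reached through the other branch; only B0 lies on every path.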
Dominance Frontier (DF):
- DF(n) = { m | n dominates a predecessor of m but does not strictly dominate m }.
- Used for φ placement in SSA.
Algorithm:
- Compute dominators (e.g., Lengauer–Tarjan algorithm).
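- For each variable, place φ nodes at the blocks in the iterated dominance frontier of its definition sites: each inserted φ is itself a new definition, so the frontier is re-applied until no new blocks appear.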
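A sketch of both steps follows, taking immediate dominators as given. The frontier computation walks up the dominator tree from each predecessor of a join point, and φ placement uses a worklist so that frontiers of newly placed φ nodes are also visited. The function names and the dictionary encoding are assumptions for illustration, and the entry block is assumed to have no predecessors.

```python
# Minimal sketch: dominance frontiers from immediate dominators, then
# φ placement with a worklist. Assumes the entry has no predecessors.

def dominance_frontiers(preds, idom):
    df = {n: set() for n in preds}
    for n, ps in preds.items():
        if len(ps) < 2:
            continue  # under the assumption above, only join points qualify
        for p in ps:
            runner = p
            # Every node from p up to (but excluding) idom(n) dominates a
            # predecessor of n without strictly dominating n.
            while runner != idom[n]:
                df[runner].add(n)
                runner = idom[runner]
    return df

def place_phis(df, def_sites):
    """def_sites: variable -> set of blocks that assign it."""
    phis = {}
    for var, sites in def_sites.items():
        placed, work = set(), list(sites)
        while work:
            block = work.pop()
            for m in df[block]:
                if m not in placed:
                    placed.add(m)
                    work.append(m)  # the new φ is itself a definition of var
        phis[var] = placed
    return phis

# CFG of the example in section 1 (B0 = entry, B1 = L1, B2 = L2, B3 = L3).
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
idom = {"B1": "B0", "B2": "B0", "B3": "B0"}
df = dominance_frontiers(preds, idom)

def_sites = {"a": {"B0", "B1"}, "b": {"B0", "B2"}, "c": {"B0"}, "d": {"B3"}}
phis = place_phis(df, def_sites)
print(phis)
```

For this CFG only a and b receive a φ, both in B3, which agrees with the hand-written SSA form of the example.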
4. IR design for functional and imperative languages
Imperative IR:
- Three-Address Code (TAC), CFG-based.
- Explicit loads, stores, mutable variables.
- Examples: GCC GIMPLE, LLVM IR.
Functional IR:
- Based on immutability and expressions.
- Often represented in CPS (Continuation-Passing Style) or ANF (A-Normal Form).
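- Closely related to SSA: each name is bound exactly once, and a φ function at a join point corresponds to the parameters of a continuation or join-point function.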
Tradeoffs:
- Imperative IRs must model mutable memory and aliasing.
- Functional IRs emphasize closures, higher-order functions, and inlining.
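As a small illustration of a functional IR, the sketch below flattens a nested arithmetic expression into A-Normal Form, binding every intermediate result to a fresh name exactly once. The tuple encoding of expressions and the to_anf function are hypothetical; real ANF conversions also handle conditionals, calls, and lambdas.

```python
# Minimal sketch: flatten a nested arithmetic expression into A-Normal Form.
# The expression encoding and to_anf are illustrative assumptions.
import itertools

def to_anf(expr):
    """expr: an int, a variable name (str), or a tuple (op, left, right)."""
    counter = itertools.count()
    bindings = []

    def atomize(e):
        if not isinstance(e, tuple):
            return e  # constants and variables are already atomic
        op, left, right = e
        l, r = atomize(left), atomize(right)
        name = f"t{next(counter)}"          # fresh name, bound exactly once
        bindings.append((name, (op, l, r)))
        return name

    result = atomize(expr)
    return bindings, result

# (x * 2) + (y - 1)
bindings, result = to_anf(("+", ("*", "x", 2), ("-", "y", 1)))
for name, (op, l, r) in bindings:
    print(f"let {name} = {l} {op} {r}")
print(f"in {result}")
# let t0 = x * 2
# let t1 = y - 1
# let t2 = t0 + t1
# in t2
```

Each let-bound name plays the same role as an SSA version: it has one definition and operands that are only names or constants.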
5. SSA & def-use chains
Def-Use Chains:
- Map each definition to its uses (def→use).
- In SSA, every use maps to exactly one definition → simplifies analysis.
Benefits:
- Enables constant propagation, dead code elimination, global value numbering.
- Sparse Conditional Constant Propagation (SCCP) is efficient in SSA because it propagates values along def-use edges instead of through every program point.
Maintaining Def-Use Chains:
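- Update the chains whenever an instruction is inserted, removed, or replaced; stale chains lead to incorrect transformations.
- Replacing a value means rewriting each of its uses to the new definition, which the def→use chain makes possible without rescanning the whole function.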
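The sketch below builds def→use chains for a list of SSA instructions and uses them for a worklist-style dead code elimination, updating the chains as instructions are removed. It assumes every instruction is free of side effects, and the (dest, operands) encoding and both function names are hypothetical.

```python
# Minimal sketch: def→use chains over SSA instructions, used for dead code
# elimination. Assumes all instructions are side-effect free.
from collections import defaultdict

def build_def_use(instrs):
    """instrs: list of (dest, operands); operands are SSA names or constants."""
    defined = {dest for dest, _ in instrs}
    uses = defaultdict(set)
    for dest, operands in instrs:
        uses[dest]  # make sure unused definitions still get an entry
        for op in operands:
            if op in defined:
                uses[op].add(dest)
    return uses

def eliminate_dead(instrs, live_out):
    defs = dict(instrs)
    uses = build_def_use(instrs)
    work = [d for d in defs if not uses[d] and d not in live_out]
    while work:
        dead = work.pop()
        for op in set(defs.pop(dead)):
            if op in uses:
                uses[op].discard(dead)  # keep the chain in sync with the removal
                if not uses[op] and op not in live_out and op in defs:
                    work.append(op)     # op just lost its last use
    return [(d, ops) for d, ops in instrs if d in defs]

ssa = [
    ("a0", (1,)),
    ("b0", (2,)),
    ("c0", ("a0", "b0")),
    ("t0", ("c0", "c0")),   # only feeds t1
    ("t1", ("t0", 1)),      # never used
    ("d0", ("c0", "a0")),
]
kept = eliminate_dead(ssa, live_out={"d0"})
print([dest for dest, _ in kept])  # ['a0', 'b0', 'c0', 'd0']
```

Removing t1 leaves t0 without uses, so t0 is removed in turn; because each use maps to one definition, no data-flow analysis has to be rerun.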
Lowering SSA:
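- Translate φ functions into parallel copies on incoming edges; critical edges are split first so that each copy has a block to live in.
- Register allocation then maps virtual registers to physical ones, spilling to memory when too few are available.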
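The φ-lowering step can be sketched as follows. Each φ block contributes one parallel copy per incoming edge, appended to the predecessor's body ahead of its terminator. The sketch assumes there are no critical edges (every predecessor of a φ block has that block as its only successor) and leaves the copies in parallel form rather than sequencing them; the block encoding and lower_phis are hypothetical names.

```python
# Minimal sketch: lower φ functions to one parallel copy per incoming edge.
# Assumes no critical edges; sequencing the parallel copies is not shown.

def lower_phis(blocks):
    """blocks: label -> {"phis": [(dest, {pred: src})], "body": [str], "term": str}."""
    lowered = {lab: {"body": list(b["body"]), "term": b["term"]}
               for lab, b in blocks.items()}
    for lab, b in blocks.items():
        # Group the φ operands by the predecessor they arrive from.
        by_pred = {}
        for dest, incoming in b["phis"]:
            for pred, src in incoming.items():
                by_pred.setdefault(pred, []).append((dest, src))
        for pred, copies in by_pred.items():
            dests = ", ".join(d for d, _ in copies)
            srcs = ", ".join(s for _, s in copies)
            # All sources are read before any destination is written.
            lowered[pred]["body"].append(f"({dests}) = ({srcs})")
    return lowered

blocks = {
    "L1": {"phis": [], "body": ["a1 = c0 + 1"], "term": "goto L3"},
    "L2": {"phis": [], "body": ["b1 = c0 - 1"], "term": "goto L3"},
    "L3": {"phis": [("a2", {"L1": "a1", "L2": "a0"}),
                    ("b2", {"L1": "b0", "L2": "b1"})],
           "body": ["d0 = a2 + b2"], "term": "return"},
}
lowered = lower_phis(blocks)
for lab, b in lowered.items():
    print(lab, b["body"], b["term"])
```

Keeping the copies parallel matters when φ functions in the same block read each other's destinations (for example a swap in a loop); emitting them naively one after another could overwrite a value before it is read.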