Instruction Issue Optimization
From Instruction Execution/Commit To Instruction Issue
Instruction issue can also be the bottleneck
- To achieve CPI < 1, the processor must issue (and complete) multiple instructions per clock
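The CPI bound above follows from simple arithmetic. Here is a minimal sketch under idealized assumptions (no stalls, pipeline fill ignored, every issue slot used). It shows that a machine issuing one instruction per clock cannot get below CPI = 1, and that issuing w instructions per clock lowers the floor to 1/w.

```python
import math

def min_cycles(n_instructions: int, issue_width: int) -> int:
    # Idealized assumption: no stalls, no pipeline fill, every issue slot used.
    return math.ceil(n_instructions / issue_width)

def best_cpi(n_instructions: int, issue_width: int) -> float:
    # CPI = clock cycles / instructions executed
    return min_cycles(n_instructions, issue_width) / n_instructions

for width in (1, 2, 4):
    print(width, best_cpi(1000, width))
# 1 1.0
# 2 0.5
# 4 0.25
```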
Solution: Multiple Issue
- Statically scheduled superscalar processors
- VLIW (very long instruction word) processors
- Dynamically scheduled superscalar processors
Multiple Issue in VLIW Processors
Very Long Instruction Word (VLIW)
- Definition: package multiple operations into one instruction, rather than attempting to issue multiple independent instructions to the units
- Static issue & static scheduling
- All hazards are determined and indicated by the compiler
- There must be enough parallelism in the code to fill the available slots
- Achieved by unrolling loops and scheduling code
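Loop unrolling supplies the independent operations a VLIW compiler needs. The sketch below assumes a hypothetical loop body `x[i] = x[i] + s` (a load, a floating-point add, a store) and hypothetical register names. It unrolls the body k times and counts register dependences (RAW, WAR, WAW) between different copies. With a fresh set of temporaries per copy (register renaming), the copies share only read-only registers, so their operations can be packed into the same instructions. Memory dependences are deliberately ignored here, on the assumption that each copy touches a different array element.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Op:
    kind: str               # "load", "fadd" or "store"
    dest: Optional[str]     # register written; None for stores
    srcs: Tuple[str, ...]   # registers read

def unroll(k: int, rename: bool = True) -> List[Op]:
    """Emit k copies of the (hypothetical) body of x[i] = x[i] + s.
    With rename=True each copy gets its own temporaries x{j}, y{j}; copies then
    share only the read-only registers R1 (base pointer) and s (the scalar)."""
    ops = []
    for j in range(k):
        x, y = (f"x{j}", f"y{j}") if rename else ("x", "y")
        ops.append(Op("load", x, ("R1",)))        # x <- Mem[R1 + 8*j]
        ops.append(Op("fadd", y, (x, "s")))       # y <- x + s
        ops.append(Op("store", None, (y, "R1")))  # Mem[R1 + 8*j] <- y
    return ops

def depends(later: Op, earlier: Op) -> bool:
    """True if `later` must stay ordered after `earlier` (RAW, WAR or WAW on a register)."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = later.dest is not None and later.dest == earlier.dest
    return raw or war or waw

def cross_copy_dependences(ops: List[Op], ops_per_copy: int = 3) -> int:
    """Count register dependences between ops that belong to different unrolled copies."""
    count = 0
    for i in range(len(ops)):
        for e in range(i):
            if e // ops_per_copy != i // ops_per_copy and depends(ops[i], ops[e]):
                count += 1
    return count

print(cross_copy_dependences(unroll(7)))  # 0
print(cross_copy_dependences(unroll(7, rename=False)))  # nonzero: copies reuse x and y
```

Without renaming, every copy reuses the same temporaries, so the WAR and WAW dependences serialize the copies and leave VLIW slots empty.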
Disadvantages:
- Need to statically find parallelism
- Large code size
- All the function units must be kept synchronized
- A stall in any functional unit pipeline must cause the entire processor to stall
- Binary code compatibility problems
- Different numbers of functional units and unit latencies require different versions of the code
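One source of the large code size is that every VLIW instruction carries a fixed number of slots, and a slot with no useful operation still occupies space as a no-op. This is a minimal sketch assuming a 5-slot format (1 integer/branch, 2 FP, 2 memory, matching the example processor on the following slide) and a hypothetical dependent chain load, add, store that offers no parallelism. Stall cycles between dependent operations, which would add further all-empty instructions, are not modeled.

```python
from typing import List, Optional, Tuple

# Assumed slot layout: (int/branch, fp, fp, mem, mem)
SLOTS = 5
Bundle = List[Optional[str]]  # None marks an empty slot, encoded as a no-op

def slot_usage(bundles: List[Bundle]) -> Tuple[int, int, float]:
    """Return (filled slots, empty slots, utilization) for a sequence of VLIW instructions."""
    assert all(len(b) == SLOTS for b in bundles), "every VLIW instruction has a fixed slot count"
    total = SLOTS * len(bundles)
    used = sum(op is not None for b in bundles for op in b)
    return used, total - used, used / total

# Hypothetical dependent chain: each instruction holds one useful operation.
chain = [
    [None, None, None, "load x", None],
    [None, "add y = x + s", None, None, None],
    [None, None, None, "store y", None],
]
print(slot_usage(chain))  # (3, 12, 0.2)
```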
Example of VLIW Processor
Example VLIW processor:
- One integer instruction (or branch)
- Two independent floating-point operations
- Two independent memory references
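A compiler for this machine has to pack operations into instructions while respecting both the slot limits and the result latencies. The sketch below uses greedy list scheduling in program order. It is one simple technique, not necessarily the one used for the example on this slide. The latencies (a load result usable 2 cycles after issue, an FP add result usable 3 cycles after issue) and the op names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Slot limits of the example machine; branches share the integer slot.
SLOT_LIMITS = {"int": 1, "fp": 2, "mem": 2}
UNIT = {"load": "mem", "store": "mem", "fadd": "fp", "int": "int", "branch": "int"}
# Assumed latencies: cycles from a producer's issue until a dependent op may issue.
LATENCY = {"load": 2, "fadd": 3, "store": 1, "int": 1, "branch": 1}

def schedule(ops: List[Tuple[str, str, List[int]]]) -> List[List[str]]:
    """Greedy list scheduling into VLIW instructions.
    ops: (name, kind, indices of earlier ops it depends on), in program order.
    Returns the op names placed in each cycle; [] is an all-NOP instruction."""
    issue_cycle: Dict[int, int] = {}
    bundles: List[List[str]] = []
    cycle = 0
    while len(issue_cycle) < len(ops):
        used = {unit: 0 for unit in SLOT_LIMITS}
        bundle = []
        for i, (name, kind, deps) in enumerate(ops):
            if i in issue_cycle:
                continue
            unit = UNIT[kind]
            ready = all(d in issue_cycle and issue_cycle[d] + LATENCY[ops[d][1]] <= cycle
                        for d in deps)
            if ready and used[unit] < SLOT_LIMITS[unit]:
                issue_cycle[i] = cycle
                used[unit] += 1
                bundle.append(name)
        bundles.append(bundle)
        cycle += 1
    return bundles

# Two unrolled copies of load / add / store (hypothetical names).
ops = [
    ("ld0", "load", []), ("ld1", "load", []),
    ("add0", "fadd", [0]), ("add1", "fadd", [1]),
    ("st0", "store", [2]), ("st1", "store", [3]),
]
for c, bundle in enumerate(schedule(ops)):
    print(c, bundle)
```

With only two copies, three of the six instructions are empty. Unrolling further gives the scheduler more independent operations to place in those cycles.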
Loop unrolled into 7 copies, eliminating all stalls
- Seven results in 9 cycles, or 9/7 ≈ 1.29 cycles per result
- Much faster than the single-issue counterpart (3.5 cycles per result)
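The figures above can be checked directly. This short computation uses only the numbers on the slide (7 results in 9 cycles for VLIW, 3.5 cycles per result for single issue) and derives the implied speedup.

```python
def cycles_per_result(cycles: int, results: int) -> float:
    return cycles / results

vliw = cycles_per_result(9, 7)  # 7 unrolled copies complete in 9 cycles
single_issue = 3.5              # cycles per result for the single-issue version
print(f"VLIW: {vliw:.2f} cycles/result, speedup {single_issue / vliw:.2f}x")
# VLIW: 1.29 cycles/result, speedup 2.72x
```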
Multiple Issue in Dynamically Scheduled Superscalars
- Modern microarchitectures:
- Multiple issue + dynamic scheduling (+ speculation)
- Issue logic is the bottleneck in dynamically scheduled superscalars
- Two approaches to achieve multiple issue
- Pipeline the issue step:
- Assign reservation stations and update the pipeline control table in half a clock cycle
- Only supports 2 instructions/clock
- Widen the issue logic:
- Design logic to handle any possible dependences among the instructions issued in the same clock
- Hybrid approaches are used in modern superscalar processors that issue ≥ 4 instructions per clock
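The sketch below shows what "widening the issue logic" has to do, assuming Tomasulo-style renaming with reservation-station (RS) tags. When several instructions issue in the same clock, each source operand must first be compared against the destinations of the earlier instructions in that group. If one of them writes the register, the operand takes that instruction's RS tag instead of the register status table entry. The instruction format, the tag names `RS0`, `RS1`, … and the `"regfile"` marker are hypothetical. The `comparators` helper counts how many register comparisons this requires, assuming two source operands per instruction. That count grows quadratically with issue width, which is one reason the issue logic becomes the bottleneck.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal instruction format: (destination register, source registers)
Instr = Tuple[str, Tuple[str, ...]]

def issue_group(group: List[Instr], reg_status: Dict[str, str],
                first_tag: int) -> List[Tuple[str, List[str]]]:
    """Issue every instruction in `group` in one clock.
    reg_status maps a register to the RS tag that will produce it; a missing entry
    means the value is ready in the register file. reg_status is updated in place.
    Returns (assigned RS tag, producer of each source operand) per instruction."""
    issued = []
    for k, (_, srcs) in enumerate(group):
        operands = []
        for s in srcs:
            producer = None
            # Widened issue logic: compare s with the destinations of all earlier
            # instructions in this group; the nearest earlier writer wins.
            for j in range(k - 1, -1, -1):
                if group[j][0] == s:
                    producer = f"RS{first_tag + j}"
                    break
            if producer is None:
                # No writer inside the group: consult the register status table.
                producer = reg_status.get(s, "regfile")
            operands.append(producer)
        issued.append((f"RS{first_tag + k}", operands))
    # Update the register status table; a later writer in the group overwrites an earlier one.
    for k, (dest, _) in enumerate(group):
        reg_status[dest] = f"RS{first_tag + k}"
    return issued

def comparators(width: int, srcs_per_instr: int = 2) -> int:
    # Each source of instruction k is compared with the k earlier destinations.
    return srcs_per_instr * width * (width - 1) // 2

status = {"F2": "RS7"}  # F2 is still being produced by an older instruction
group = [("F0", ("R1",)), ("F4", ("F0", "F2"))]
print(issue_group(group, status, first_tag=0))
# [('RS0', ['regfile']), ('RS1', ['RS0', 'RS7'])]
print([comparators(w) for w in (2, 4, 8)])
# [2, 12, 56]
```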
Basic Strategy in Dynamically Scheduled Superscalar Processors
- Basic strategy for updating the issue logic and the RS table in a dynami