End02 Ca03 Noor

INTRODUCTION TO ADVANCED PIPELINE
Dr Noor Mahammad Sk
4 March 2025
Review: Summary of Pipelining Basics

• Hazards limit performance
  • Structural: need more HW resources
  • Data: need forwarding, compiler scheduling
  • Control: early evaluation & PC, delayed branch, prediction
• Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency
• Interrupts, instruction set, and FP make pipelining harder
• Compilers reduce the cost of data and control hazards
  • Load delay slots
  • Branch delay slots
  • Branch prediction
• Today: Longer pipelines (R4000) → more instruction-level parallelism → SW and HW loop unrolling


Case Study: MIPS R4000 (200MHz)

• 8 Stage Pipeline:
  • IF–first half of instruction fetch; PC selection happens here, as well as initiation of the instruction cache access.
  • IS–second half of access to instruction cache.
  • RF–instruction decode and register fetch, hazard checking, and instruction cache hit detection.
  • EX–execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  • DF–data fetch, first half of access to data cache.
  • DS–second half of access to data cache.
  • TC–tag check, determine whether the data cache access hit.
  • WB–write back for loads and register-register operations.
• 8 Stages: What is the impact on load delay? Branch delay? Why?


Case Study: MIPS R4000

TWO-cycle load latency:
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC
      IF IS RF EX DF DS
         IF IS RF EX DF
            IF IS RF EX
               IF IS RF
                  IF IS
                     IF

THREE-cycle branch latency (conditions evaluated during EX phase):
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC
      IF IS RF EX DF DS
         IF IS RF EX DF
            IF IS RF EX
               IF IS RF
                  IF IS
                     IF

• Delay slot plus two stalls
• Branch likely cancels delay slot if not taken
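
The two delays can be read off the stage order. The sketch below is only an illustration under two assumptions that the slides imply but do not spell out: load data is forwarded at the end of DS to a consumer's EX stage, and the branch outcome is known at the end of EX. The helper name `stage_gap` is hypothetical.

```python
# R4000 integer pipeline stages in order, as listed on the slide.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def stage_gap(later, earlier):
    """Number of pipeline stages separating two stage names."""
    return STAGES.index(later) - STAGES.index(earlier)

# Assumption: load data is available at the end of DS and is consumed in EX.
load_delay = stage_gap("DS", "EX")

# Assumption: the branch is resolved at the end of EX; the target fetch starts in IF.
branch_delay = stage_gap("EX", "IF")

print("load delay:", load_delay, "branch delay:", branch_delay)
```

Under these assumptions the gaps match the two-cycle load latency and three-cycle branch latency shown in the diagrams.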


MIPS R4000 Floating Point

• FP Adder, FP Multiplier, FP Divider
• Last step of FP Multiplier/Divider uses FP Adder HW
• 8 kinds of stages in FP units:

Stage  Functional unit  Description
A      FP adder         Mantissa ADD stage
D      FP divider       Divide pipeline stage
E      FP multiplier    Exception test stage
M      FP multiplier    First stage of multiplier
N      FP multiplier    Second stage of multiplier
R      FP adder         Rounding stage
S      FP adder         Operand shift stage
U                       Unpack FP numbers


MIPS FP Pipe Stages

FP Instr        1  2    3    4    5  6  7    8  …
Add, Subtract   U  S+A  A+R  R+S
Multiply        U  E+M  M    M    M  N  N+A  R
Divide          U  A    R    D^28  …  D+A  D+R, D+R, D+A, D+R, A, R
Square root     U  E    (A+R)^108  …  A  R
Negate          U  S
Absolute value  U  S
FP compare      U  A    R

Stages:
A  Mantissa ADD stage
D  Divide pipeline stage
E  Exception test stage
M  First stage of multiplier
N  Second stage of multiplier
R  Rounding stage
S  Operand shift stage
U  Unpack FP numbers


FP Loop: Where are the Hazards?

Loop: LD F0,0(R1) ;F0=vector element
ADDD F4,F0,F2 ;add scalar from F2
SD 0(R1),F4 ;store result
SUBI R1,R1,8 ;decrement pointer 8B (DW)
BNEZ R1,Loop ;branch R1!=zero
NOP ;delayed branch slot

Instruction producing result  Instruction using result  Latency in clock cycles
FP ALU op                     Another FP ALU op         3
FP ALU op                     Store double              2
Load double                   FP ALU op                 1
Load double                   Store double              0
Integer op                    Integer op                0

Where are the stalls?
FP Loop Showing Stalls

1 Loop: LD F0, 0(R1) ;F0=vector element
2 Stall
3 ADDD F4, F0, F2 ;add scalar from F2
4 Stall
5 Stall
6 SD 0(R1), F4 ;store result
7 SUBI R1, R1, 8 ;decrement pointer 8B (DW)
8 BNEZ R1, Loop ;branch R1 != zero
9 Stall ;delayed branch slot

Instruction producing result  Instruction using result  Latency in clock cycles
FP ALU op                     Another FP ALU op         3
FP ALU op                     Store double              2
Load double                   FP ALU op                 1

9 clocks: Rewrite code to minimize stalls?


Revised FP Loop Minimizing Stalls

1 Loop: LD F0, 0(R1)
2 Stall
3 ADDD F4, F0, F2
4 SUBI R1, R1, 8
5 BNEZ R1, Loop ;delayed branch
6 SD 8(R1), F4 ;offset altered because SD moved past SUBI

Replace the BNEZ delay-slot stall with SD by changing the address of SD.

Instruction producing result  Instruction using result  Latency in clock cycles
FP ALU op                     Another FP ALU op         3
FP ALU op                     Store double              2
Load double                   FP ALU op                 1

6 clocks per iteration instead of 9.
Unroll Loop Four Times
(straightforward way)

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F6,-8(R1)
5 ADDD F8,F6,F2
6 SD -8(R1),F8 ;drop SUBI & BNEZ
7 LD F10,-16(R1)
8 ADDD F12,F10,F2
9 SD -16(R1),F12 ;drop SUBI & BNEZ
10 LD F14,-24(R1)
11 ADDD F16,F14,F2
12 SD -24(R1),F16
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP

15 instructions + 4 x (1 load stall + 2 ADDD stalls) = 27 clock cycles

Rewrite loop to minimize stalls?



Unrolled Loop That Minimizes Stalls

1 Loop: LD F0,0(R1)
2 LD F6,-8(R1)
3 LD F10,-16(R1)
4 LD F14,-24(R1)
5 ADDD F4,F0,F2
6 ADDD F8,F6,F2
7 ADDD F12,F10,F2
8 ADDD F16,F14,F2
9 SD 0(R1),F4
10 SD -8(R1),F8
11 SD -16(R1),F12
12 SUBI R1,R1,#32
13 BNEZ R1,Loop
14 SD 8(R1),F16 ; 8-32 = -24

14 clock cycles (3.5 per original iteration)

• What assumptions were made when the code was moved?
  • OK to move the store past SUBI even though SUBI changes the register
  • OK to move loads before stores: do we get the right data?
  • When is it safe for the compiler to make such changes?

When is it safe to move instructions?
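
The clock counts quoted for the four versions of the loop (9, 6, 27 and 14) can be re-derived from the latency table. The following is a minimal sketch, assuming a single-issue in-order pipeline in which a consumer must issue at least `latency + 1` cycles after its producer; the instruction encoding, the helper names (`ld`, `addd`, `sd`, `clocks`) and the placeholder register names `x0`, `y0`, ... are hypothetical and stand in for the F registers on the slides.

```python
# Latency table from the slides: stall cycles required between producer and consumer.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
}

def clocks(program):
    """Cycle in which the last instruction issues (single in-order issue)."""
    produced = {}  # register -> (producer kind, issue cycle)
    cycle = 0
    for kind, dest, srcs in program:
        cycle += 1
        for reg in srcs:
            if reg in produced:
                pkind, pcycle = produced[reg]
                # Pairs not in the table (integer ops) are assumed to have latency 0.
                cycle = max(cycle, pcycle + 1 + LATENCY.get((pkind, kind), 0))
        if dest:
            produced[dest] = (kind, cycle)
    return cycle

# Hypothetical builders for the k-th copy of the loop body.
def ld(k):   return ("load", f"x{k}", ["R1"])
def addd(k): return ("fp_alu", f"y{k}", [f"x{k}", "F2"])
def sd(k):   return ("store", None, [f"y{k}", "R1"])

SUBI = ("int", "R1", ["R1"])
BNEZ = ("branch", None, ["R1"])
NOP = ("nop", None, [])

original = [ld(0), addd(0), sd(0), SUBI, BNEZ, NOP]
revised = [ld(0), addd(0), SUBI, BNEZ, sd(0)]
unrolled = [ins for k in range(4) for ins in (ld(k), addd(k), sd(k))] + [SUBI, BNEZ, NOP]
scheduled = ([ld(k) for k in range(4)] + [addd(k) for k in range(4)]
             + [sd(0), sd(1), sd(2), SUBI, BNEZ, sd(3)])

for name, prog in [("original", original), ("revised", revised),
                   ("unrolled", unrolled), ("scheduled", scheduled)]:
    print(name, clocks(prog))
```

The model only counts cycles; it does not check that a reordering is legal, which is the question the following slides take up.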

Compiler Perspectives on Code Movement

• Definitions: the compiler is concerned about dependencies in the program; whether or not a dependence becomes a HW hazard depends on the given pipeline
• Try to schedule to avoid hazards
• (True) Data dependencies (RAW if a hazard for HW)
  • Instruction i produces a result used by instruction j, or
  • Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
• If dependent, can’t execute in parallel
• Easy to determine for registers (fixed names)
• Hard for memory:
  • Does 100(R4) = 20(R6)?
  • From different loop iterations, does 20(R6) = 20(R6)?

Where are the data dependencies?

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SUBI R1,R1,8
4 BNEZ R1,Loop ;delayed branch
5 SD 8(R1),F4 ;altered when move past SUBI


Compiler Perspectives on Code Movement

• Another kind of dependence, called name dependence: two instructions use the same name (register or memory location) but don’t exchange data
• Antidependence (WAR if a hazard for HW)
  • Instruction j writes a register or memory location that instruction i reads from, and instruction i is executed first
• Output dependence (WAW if a hazard for HW)
  • Instruction i and instruction j write the same register or memory location; ordering between the instructions must be preserved.


Where are the name dependencies?

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F0,-8(R1)
5 ADDD F4,F0,F2
6 SD -8(R1),F4 ;drop SUBI & BNEZ
7 LD F0,-16(R1)
8 ADDD F4,F0,F2
9 SD -16(R1),F4 ;drop SUBI & BNEZ
10 LD F0,-24(R1)
11 ADDD F4,F0,F2
12 SD -24(R1),F4
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP

How can we remove them?

Where are the name dependencies?

Before (F0 and F4 reused in every copy):
1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4
4 LD F0,-8(R1)
5 ADDD F4,F0,F2
6 SD -8(R1),F4
7 LD F0,-16(R1)
8 ADDD F4,F0,F2
9 SD -16(R1),F4
10 LD F0,-24(R1)
11 ADDD F4,F0,F2
12 SD -24(R1),F4
13 SUBI R1,R1,#32
14 BNEZ R1,Loop
15 NOP

After (a distinct register pair per copy):
1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4
4 LD F6,-8(R1)
5 ADDD F8,F6,F2
6 SD -8(R1),F8
7 LD F10,-16(R1)
8 ADDD F12,F10,F2
9 SD -16(R1),F12
10 LD F14,-24(R1)
11 ADDD F16,F14,F2
12 SD -24(R1),F16
13 SUBI R1,R1,#32
14 BNEZ R1,Loop
15 NOP

How can we remove them? This is called “register renaming”.
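
The renaming step can be sketched as a small pass over the instruction list. This is an illustration only, not a description of any particular compiler: the tuple encoding, the `rename` function and the explicit free-register list are assumptions made for the example, and only FP destination registers are tracked.

```python
def rename(instrs, free):
    """Give each repeated write to a register a fresh name from `free`."""
    current = {}     # architectural name -> name currently holding its value
    written = set()  # names that have already been used as a destination
    out = []
    for op, dest, srcs in instrs:
        # Sources read whichever name the most recent writer was given.
        srcs = [current.get(s, s) for s in srcs]
        if dest is not None:
            current[dest] = free.pop(0) if dest in written else dest
            written.add(dest)
            dest = current[dest]
        out.append((op, dest, srcs))
    return out

# Four copies of the loop body, all reusing F0 and F4 (hypothetical encoding).
body = [("LD", "F0", []), ("ADDD", "F4", ["F0", "F2"]), ("SD", None, ["F4"])]
renamed = rename(body * 4, ["F6", "F8", "F10", "F12", "F14", "F16"])

for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

With this free list the pass produces the same register assignment as the renamed listing on the slide (F6/F8, F10/F12, F14/F16 for the second to fourth copies), which removes the WAR and WAW dependences between copies.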

Compiler Perspectives on Code Movement

• Again, name dependencies are hard for memory accesses
  • Does 100(R4) = 20(R6)?
  • From different loop iterations, does 20(R6) = 20(R6)?
• Our example required the compiler to know that if R1 doesn’t change then:

0(R1) ≠ -8(R1) ≠ -16(R1) ≠ -24(R1)

• There were no dependencies between some loads and stores, so they could be moved past each other


Compiler Perspectives on Code Movement

• Final kind of dependence, called control dependence
• Example
if p1 {S1;};
if p2 {S2;};
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.


Compiler Perspectives on Code Movement

• Two (obvious) constraints on control dependences:
  • An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch.
  • An instruction that is not control dependent on a branch cannot be moved to after the branch so that its execution is controlled by the branch.
• Control dependencies are relaxed to get parallelism; we get the same effect if we preserve the order of exceptions (address in register checked by branch before use) and data flow (value in register depends on branch)


Where are the control dependencies?

1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SD 0(R1),F4
4 SUBI R1,R1,8
5 BEQZ R1,exit
6 LD F0,0(R1)
7 ADDD F4,F0,F2
8 SD 0(R1),F4
9 SUBI R1,R1,8
10 BEQZ R1,exit
11 LD F0,0(R1)
12 ADDD F4,F0,F2
13 SD 0(R1),F4
14 SUBI R1,R1,8
15 BEQZ R1,exit
....

When Safe to Unroll Loop?

• Example: Where are data dependencies? (A, B, C distinct & nonoverlapping)
for (i=1; i<=100; i=i+1) {
A[i+1] = A[i] + C[i]; /* S1 */
B[i+1] = B[i] + A[i+1];} /* S2 */

1. S2 uses the value, A[i+1], computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1], which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1].
This is a “loop-carried dependence”: between iterations
• Implies that iterations are dependent, and can’t be executed in parallel
• Not the case for our prior example; each iteration was distinct
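
For array references of the form X[i + constant], the two cases above can be told apart by a dependence distance. A rough sketch follows; the function names are hypothetical and the analysis is assumed to cover only this simple single-index form.

```python
def dependence_distance(write_offset, read_offset):
    """Iterations between writing X[i + write_offset] and reading X[i + read_offset]."""
    return write_offset - read_offset

def classify(distance):
    if distance == 0:
        return "within one iteration"
    if distance > 0:
        return "loop-carried"
    # The read happens in an earlier iteration than the write to that element.
    return "loop-carried antidependence"

# S1: writes A[i+1], reads A[i]
print("S1 on A:", classify(dependence_distance(1, 0)))
# S2: reads A[i+1], which S1 writes in the same iteration
print("S1 -> S2 on A:", classify(dependence_distance(1, 1)))
# S2: writes B[i+1], reads B[i]
print("S2 on B:", classify(dependence_distance(1, 0)))
```

A distance of 1 for A and B is what ties each iteration to the previous one; the earlier FP loop has no such pair, so its iterations are independent.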

HW Schemes: Instruction Parallelism

• Why in HW at run time?
  • Works when real dependences can’t be known at compile time
  • Compiler simpler
  • Code for one machine runs well on another
• Key idea: Allow instructions behind a stall to proceed
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
• Enables out-of-order execution → out-of-order completion
• ID stage checked for both structural and data hazards
• Scoreboard dates to CDC 6600 in 1963


HW Schemes: Instruction Parallelism

• Out-of-order execution divides ID stage:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions
• CDC 6600: In-order issue, out-of-order execution, out-of-order commit (also called completion)


Scoreboard Implications

• Out-of-order completion → WAR, WAW hazards?
• Solutions for WAR
  • Queue both the operation and copies of its operands
  • Read registers only during Read Operands stage
• For WAW, must detect hazard: stall until other completes
• Need to have multiple instructions in execution phase → multiple execution units or pipelined execution units
• Scoreboard keeps track of dependencies, state of operations
• Scoreboard replaces ID, EX, WB with 4 stages


Four Stages of Scoreboard Control

1. Issue: decode instructions & check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read operands (ID2)
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.


Four Stages of Scoreboard Control

3. Execution: operate on operands (EX)
The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB)
Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If none, it writes results. If WAR, then it stalls the instruction.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands


Three Parts of the Scoreboard

1. Instruction status: which of 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit
Busy: Indicates whether the unit is busy or not
Op: Operation to perform in the unit (e.g., + or –)
Fi: Destination register
Fj, Fk: Source-register numbers
Qj, Qk: Functional units producing source registers Fj, Fk
Rj, Rk: Flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register


Detailed Scoreboard Pipeline Control

Issue
  Wait until: not Busy(FU) and not Result(D)
  Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2; Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU
Read operands
  Wait until: Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No
Execution complete
  Wait until: functional unit done
Write result
  Wait until: ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping: ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No


Scoreboard Example

FP Add latency = 2 clocks, Multiply = 10, Divide = 40

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
FU


Scoreboard Example Cycle 1

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer


Scoreboard Example Cycle 2

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer

• Issue 2nd LD?


Scoreboard Example Cycle 3

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer


Scoreboard Example Cycle 4

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Integer


Scoreboard Example Cycle 5

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer


Scoreboard Example Cycle 6

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6
MULTD F0 F2 F4 6
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 Integer


Scoreboard Example Cycle 7

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 Integer Add


Scoreboard Example Cycle 8a

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Integer Add Divide


Scoreboard Example Cycle 8b

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide


Scoreboard Example Cycle 9

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
10 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
2 Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 Add Divide

• Read operands for MULT & SUBD? Issue ADDD?


Scoreboard Example Cycle 11

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
8 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
0 Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 Add Divide


Scoreboard Example Cycle 12

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
7 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 Divide

• Read operands for DIVD?


Scoreboard Example Cycle 13

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
6 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 Add Divide


Scoreboard Example Cycle 14

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
5 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
2 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 Add Divide


Scoreboard Example Cycle 15

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
4 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
1 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 Add Divide


Scoreboard Example Cycle 16

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
3 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
0 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU Mult1 Add Divide


Scoreboard Example Cycle 17

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
2 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
17 FU Mult1 Add Divide

• Write result of ADDD?


Scoreboard Example Cycle 18

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
1 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
18 FU Mult1 Add Divide


Scoreboard Example Cycle 19

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
0 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
19 FU Mult1 Add Divide


Scoreboard Example Cycle 20

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Yes Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
20 FU Add Divide


Scoreboard Example Cycle 21

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Yes Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
21 FU Add Divide


Scoreboard Example Cycle 22

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16 22
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
40 Divide Yes Div F10 F0 F6 Yes Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
22 FU Divide


Scoreboard Example Cycle 61

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61
ADDD F6 F8 F2 13 14 16 22
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide Yes Div F10 F0 F6 Yes Yes
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
61 FU Divide


Scoreboard Example Cycle 62

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61 62
ADDD F6 F8 F2 13 14 16 22
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU


Review: Scoreboard Example Cycle 62

In-order issue; out-of-order execute & commit

Instruction status (columns: Issue, Read operands, Execution complete, Write result)
Instruction j k
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61 62
ADDD F6 F8 F2 13 14 16 22
Functional unit status dest S1 S2 FU for j FU for k Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU
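
The timings in the instruction-status table can be reproduced with a small cycle-by-cycle model of the four scoreboard stages. This is a sketch under stated assumptions, not the CDC 6600 logic itself: a load is assumed to take 1 cycle in the Integer unit (inferred from the table, where the first LD reads operands in cycle 2 and completes in cycle 3), every stage decision uses state from earlier cycles only, and the names `simulate`, `before`, `UNIT` and `CAPACITY` are hypothetical.

```python
# Execution latencies: FP Add = 2, Multiply = 10, Divide = 40 (from the slide);
# LD = 1 is an assumption inferred from the table.
LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}
UNIT = {"LD": "Integer", "MULTD": "Mult", "SUBD": "Add", "ADDD": "Add", "DIVD": "Divide"}
CAPACITY = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}

# (opcode, destination, source registers)
PROGRAM = [
    ("LD", "F6", ["R2"]),
    ("LD", "F2", ["R3"]),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]

def before(cycle, t):
    """True if the event happened in a cycle strictly earlier than t."""
    return cycle is not None and cycle < t

def simulate(program):
    n = len(program)
    issue, read, done, write = ([None] * n for _ in range(4))

    def active(i, t):
        # An instruction holds its unit and destination from issue until it has written.
        return issue[i] is not None and not before(write[i], t)

    t, pc = 0, 0
    while any(w is None for w in write):
        t += 1
        for i in range(n):  # Write result: wait while an older reader is pending (WAR)
            if write[i] is None and before(done[i], t):
                dest = program[i][1]
                war = any(dest in program[j][2] and not before(read[j], t)
                          for j in range(i))
                if not war:
                    write[i] = t
        for i in range(n):  # Read operands: wait for older producers (RAW)
            if read[i] is None and before(issue[i], t):
                producers = [k for k in range(i) if program[k][1] in program[i][2]]
                if all(before(write[k], t) for k in producers):
                    read[i] = t
                    done[i] = t + LATENCY[program[i][0]]
        if pc < n:  # Issue in order: need a free unit and no WAW hazard
            op, dest, _ = program[pc]
            busy = sum(active(i, t) and UNIT[program[i][0]] == UNIT[op]
                       for i in range(pc))
            waw = any(active(i, t) and program[i][1] == dest for i in range(pc))
            if busy < CAPACITY[UNIT[op]] and not waw:
                issue[pc] = t
                pc += 1
    return list(zip(issue, read, done, write))

for (op, dest, _), row in zip(PROGRAM, simulate(PROGRAM)):
    print(op, dest, row)
```

Each printed row is (issue, read operands, execution complete, write result). ADDD completes in cycle 16 but cannot write F6 until cycle 22, one cycle after DIVD has read F6 in cycle 21, which is the WAR stall the example is built to show.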


CDC 6600 Scoreboard

• Limitations of 6600 scoreboard:
  • No forwarding hardware
  • Limited to instructions in basic block (small window)
  • Small number of functional units (structural hazards), especially integer/load store units
  • Do not issue on structural hazards
  • Wait for WAR hazards
  • Prevent WAW hazards


ANOTHER CASE STUDY EXAMPLE

ILP Continues…

• Data Hazards
LOAD R1, [R2 + 10] // Loads into R1
ADD R3, R1, R2 // R3 = R1 + R2
• This is the “Read After Write (RAW)” data hazard for R1
LD R1, [R2 + 10]
ADD R3, R1, R12
LD R1, [R2 + 14]
ADD R12, R1, R2
• This shows the WAW for R1 and WAR for R12


ILP – Pipelining Advanced

Superscalar: CPI < 1
[Figure: Fetch + Inc. PC → Decode Instrn → Fetch Data → Execute Unit 1, Execute Unit 2, …, Execute Unit K → Store Data]
Success: different instructions take different cycle times (e.g., four FMULs while one FDIV)
Implies out-of-order execution


Difficulties in Superscalar Construction

• Ensuring there are no data hazards among the several instructions executing in the different execution units at the same time.
• If this is done by the compiler – then Static Instruction Scheduling – VLIW – Itanium
• If done by the hardware – then Dynamic Instruction Scheduling – Tomasulo – MIPS Embedded Processor


Static Instruction Scheduling

• The compiler makes bundles of “K” instructions that can be sent to the execution units at the same time, such that there are no data dependencies between them.
  • Very Long Instruction Word (VLIW) to accommodate “K” instructions at a time
  • Lots of NOPs if the bundle cannot be filled with relevant instructions
  • Increases the size of the executable
• Does not complicate the hardware
• Source code portability – if I make the next-gen processor with K+5 units (say) – then?
  • Solved by having a software/firmware emulator, which hurts performance.


Dynamic Instruction Scheduling

• The data hazards are handled by the hardware
  • RAW using Operand Forwarding Technique
  • WAR and WAW using Register Renaming Technique


Processor Overview

[Figure: Processor (ALU/Control, multiple function units, Register File) connected over a Bus to Memory]

RAW example:
LD [R1+20],R2
ADD R3,R2,R4

Why should the result of LD go to R2 in the register file and then be reloaded into the ALU? Forward it to the ALU on its way to the register file.

4 March 2025 Dr Noor Mahammad Sk


Register Renaming

Dependencies due to Reg R1

RAW: (1,2), (1,4), (1,5), (3,4), (3,5)
WAR: (2,3), (2,6), (4,6), (5,6)
WAW: (1,3), (1,6), (3,6)

1. ADD R1, R2, R3
2. ST R1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7,R1,R8
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10
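
The dependency pairs for this six-instruction sequence can be reproduced with a short Python sketch. It assumes the conservative pairwise convention the slide appears to use: every ordered pair (i, j) with i < j is checked, so (1,4) and (1,5) count as RAW even though instruction 3 redefines R1 in between. The dictionary encoding and the name `find_dependencies` are illustrative assumptions.

```python
from itertools import combinations

# instruction number -> (destination register or None, source registers)
program = {
    1: ("R1", ("R2", "R3")),    # ADD R1, R2, R3
    2: (None, ("R1", "R4")),    # ST  R1, [R4+50]
    3: ("R1", ("R5", "R6")),    # ADD R1, R5, R6
    4: ("R7", ("R1", "R8")),    # SUB R7, R1, R8
    5: (None, ("R1", "R4")),    # ST  R1, [R4+54]
    6: ("R1", ("R9", "R10")),   # ADD R1, R9, R10
}

def find_dependencies(prog):
    """Classify every pair (i, j), i before j, as RAW, WAR and/or WAW."""
    deps = {"RAW": [], "WAR": [], "WAW": []}
    for i, j in combinations(sorted(prog), 2):
        dst_i, src_i = prog[i]
        dst_j, src_j = prog[j]
        if dst_i is not None and dst_i in src_j:
            deps["RAW"].append((i, j))   # j reads what i writes
        if dst_j is not None and dst_j in src_i:
            deps["WAR"].append((i, j))   # j overwrites what i reads
        if dst_i is not None and dst_i == dst_j:
            deps["WAW"].append((i, j))   # both write the same register
    return deps

deps = find_dependencies(program)
for kind, pairs in deps.items():
    print(kind, pairs)
```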



Register Renaming: Static Scheduling

1. ADD R1, R2, R3
2. ST R1, [R4+50]
3. ADD R12, R5, R6
4. SUB R7,R12,R8
5. ST R12, [R4 + 54]
6. ADD R1, R9,R10

 Rename R1 to R12 from instruction 3 up to, but not including, instruction 6
 Dependencies need to be handled only within a window, not across the whole program
 The only remaining WAW and WAR dependencies are (1,6) and (2,6), which are far apart in program order
 Increases register pressure for the compiler


Dynamic Scheduling - Tomasulo

[Figure: block diagram with Instruction Fetch Unit, Register Status Indicator, Reservation Stations, Exec 1 to Exec 4, Common Data Bus (CDB), and an output labelled "To Reg file/Mem"]

Instructions are fetched one by one and decoded to find the type of operation and the source of operands.


Dynamic Scheduling - Tomasulo

[Figure: block diagram with Instruction Fetch Unit, Register Status Indicator, Reservation Stations, Exec 1 to Exec 4, Common Data Bus (CDB), and an output labelled "To Reg file/Mem"]

The Register Status Indicator indicates whether the latest value of a register is in the register file or is currently being computed by some execution unit; in the latter case it states the execution unit number.


Dynamic Scheduling - Tomasulo

[Figure: block diagram with Instruction Fetch Unit, Register Status Indicator, Reservation Stations, Exec 1 to Exec 4, Common Data Bus (CDB), and an output labelled "To Reg file/Mem"]

If all operands are available, the operation proceeds in the allotted execution unit; otherwise it waits in the reservation station of the allotted execution unit, monitoring the CDB.


Dynamic Scheduling - Tomasulo

[Figure: block diagram with Instruction Fetch Unit, Register Status Indicator, Reservation Stations, Exec 1 to Exec 4, Common Data Bus (CDB), and an output labelled "To Reg file/Mem"]

Every execution unit writes its result, along with its unit number, onto the CDB, which forwards it to all reservation stations, the register file and memory.


An Example:

1. ADD R1, R2, R3
2. ST R1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7,R1,R8
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: (empty)

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       0   0   0   0   0   0   0   0   0    0

Reservation stations (units 1 to 6): Empty | Empty | Empty | Empty | Empty | Empty


An Example:

1. ---
2. ST R1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7,R1,R8
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ADD R1, R2, R3

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       1   0   0   0   0   0   0   0   0    0

Reservation stations (units 1 to 6): Ins 1 | Empty | Empty | Empty | Empty | Empty


An Example:

1. ---
2. ---
3. ADD R1, R5, R6
4. SUB R7,R1,R8
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ST R1, [R4+50]

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       1   0   0   0   0   0   0   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | Empty | Empty | Empty | Empty
(I n = instruction n; E = executing; W k = waiting for the result of unit k)


An Example:

1. ---
2. ---
3. ---
4. SUB R7,R1,R8
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ADD R1, R5, R6

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       3   0   0   0   0   0   0   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | Empty | Empty | Empty

Note: The reservation station stores the number of the execution unit that shall yield the latest value of a register.

An Example:

1. ---
2. ---
3. ---
4. ---
5. ST R1, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: SUB R7,R1,R8

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       3   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | Empty | Empty


An Example:

1. ---
2. ---
3. ---
4. ---
5. ---
6. ADD R1, R9, R10

Instruction Fetch: ST R1, [R4 + 54]

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       3   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | Empty


An Example:

1. ---
2. ---
3. ---
4. ---
5. ---
6. ---

Instruction Fetch: ADD R1, R9, R10

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       6   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | I 6, E


An Example:

1. ADD R1, R2, R3
2. ST U1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7, U3, R8
5. ST U3, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ADD R1, R9, R10

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       6   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | I 6, E

Effectively, three instructions are executing and the others are waiting for the appropriate results. The whole program is converted as shown above.


An Example:

1. ADD R1, R2, R3
2. ST U1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7, U3, R8
5. ST U3, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ADD R1, R9, R10

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       6   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | I 6, E

See that operand forwarding and register renaming are done automatically.


An Example:

1. ADD R1, R2, R3
2. ST U1, [R4+50]
3. ADD R1, R5, R6
4. SUB R7, U3, R8
5. ST U3, [R4 + 54]
6. ADD R1, R9, R10

Instruction Fetch: ADD R1, R9, R10

Register Status Indicator

Reg Number  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
Status       6   0   0   0   0   0   4   0   0    0

Reservation stations (units 1 to 6): I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | I 6, E

Execution unit 6, on completion, will set the R1 entry in the Register Status Indicator to 0. Similarly, unit 4 will set the R7 entry to 0.
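
The issue-time bookkeeping walked through in this example can be sketched in Python. This is a simplified model under stated assumptions: instruction n is allotted to execution unit n, and no unit finishes while the six instructions are being issued (as in the slides). The function name `issue_all` and the tuple encoding are hypothetical.

```python
def issue_all(program, registers):
    """Issue every instruction in order; instruction n goes to unit n."""
    status = {reg: 0 for reg in registers}  # 0 means: value is in the register file
    stations, renamed = [], []
    for unit, (op, dst, srcs) in enumerate(program, start=1):
        # Units that still have to produce one of this instruction's sources
        pending = sorted({status[s] for s in srcs if status[s] != 0})
        if pending:
            state = "W " + ",".join(str(u) for u in pending)
        else:
            state = "E"
        stations.append(f"I {unit}, {state}")
        # Rewrite each pending source as the tag of its producing unit
        renamed.append(
            (op, dst, tuple(f"U{status[s]}" if status[s] else s for s in srcs))
        )
        if dst is not None:
            status[dst] = unit  # this unit now holds the latest value of dst
    return status, stations, renamed

registers = [f"R{n}" for n in range(1, 11)]
program = [
    ("ADD", "R1", ("R2", "R3")),
    ("ST",  None, ("R1", "R4")),
    ("ADD", "R1", ("R5", "R6")),
    ("SUB", "R7", ("R1", "R8")),
    ("ST",  None, ("R1", "R4")),
    ("ADD", "R1", ("R9", "R10")),
]
status, stations, renamed = issue_all(program, registers)
print(stations)
print({reg: unit for reg, unit in status.items() if unit})
```

The station list and the non-zero status entries (R1 owned by unit 6, R7 by unit 4) match the final table of the example, and `renamed` holds the U1/U3 operand tags shown in the converted program.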



Dynamic Scheduling

 Rearrange order of instructions to reduce stalls while maintaining data flow
 Advantages:
 Compiler doesn’t need to have knowledge of microarchitecture
 Handles cases where dependencies are unknown at compile time
 Disadvantages:
 Substantial increase in hardware complexity
 Complicates exceptions


Dynamic Scheduling

 Dynamic scheduling implies:
 Out-of-order execution
 Out-of-order completion
 Creates the possibility for WAR and WAW hazards
 Tomasulo’s Approach
 Tracks when operands are available
 Introduces register renaming in hardware
 Minimizes WAW and WAR hazards


Register Renaming

 Example:

DIV.D F0,F2,F4
ADD.D F6,F0,F8
S.D F6,0(R1)
SUB.D F8,F10,F14
MUL.D F6,F10,F8

 Antidependence on F8: ADD.D reads F8, SUB.D later writes it
 Antidependence on F6: S.D reads F6, MUL.D later writes it
 Plus a name dependence on F6: ADD.D and MUL.D both write it


Register Renaming

 Example:

DIV.D F0,F2,F4
ADD.D S,F0,F8
S.D S,0(R1)
SUB.D T,F10,F14
MUL.D F6,F10,T

 Now only RAW hazards remain, which can be strictly ordered
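
A renaming pass like the one above can be sketched in Python. Note the assumptions: this sketch gives every destination a fresh name (P0, P1, ...), whereas the slide renames only F6 to S and F8 to T; the names `rename` and `name_hazards` and the tuple encoding are hypothetical, and the base register R1 of the store is simply treated as a source.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name; sources follow the latest mapping."""
    fresh = (f"P{n}" for n in count())
    latest = {}   # architectural register -> its current renamed register
    out = []
    for op, dst, srcs in program:
        new_srcs = tuple(latest.get(s, s) for s in srcs)  # read before redefining
        new_dst = None
        if dst is not None:
            new_dst = next(fresh)
            latest[dst] = new_dst
        out.append((op, new_dst, new_srcs))
    return out

def name_hazards(program):
    """List WAR and WAW pairs (1-based instruction numbers)."""
    found = []
    for i, (_, dst_i, src_i) in enumerate(program, start=1):
        for j, (_, dst_j, _) in enumerate(program[i:], start=i + 1):
            if dst_j is None:
                continue
            if dst_j in src_i:
                found.append((i, j, "WAR"))
            if dst_j == dst_i:
                found.append((i, j, "WAW"))
    return found

program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D",   None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
renamed = rename(program)
print(name_hazards(program))   # hazards before renaming
print(name_hazards(renamed))   # hazards after renaming
```

Before renaming the checker reports the two antidependences and the output dependence on F6; after renaming the list is empty, while the true data flow (DIV.D to ADD.D, ADD.D to S.D, SUB.D to MUL.D) is preserved through the new names.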



Register Renaming

 Register renaming is provided by reservation stations (RS)
 Contains:
 The instruction
 Buffered operand values (when available)
 Reservation station number of instruction providing the operand values
 RS fetches and buffers an operand as soon as it becomes available (not necessarily involving register file)
 Pending instructions designate the RS to which they will send their output
 Result values broadcast on a result bus, called the common data bus (CDB)
 Only the last output updates the register file
 As instructions are issued, the register specifiers are renamed with the reservation station
 May be more reservation stations than registers


Tomasulo’s Algorithm

 Load and store buffers
 Contain data and addresses, act like reservation stations
 Top-level design:


Tomasulo’s Algorithm

 Three Steps:
 Issue
 Get next instruction from FIFO queue
 If an RS is available, issue the instruction to the RS with operand values if available
 If operand values are not available, the instruction waits in the RS until they are
 Execute
 When an operand becomes available, store it in any reservation stations waiting for it
 When all operands are ready, start executing the instruction
 Loads and stores are maintained in program order through effective address calculation
 No instruction is allowed to initiate execution until all branches that precede it in program order have completed
 Write result
 Write result on CDB into reservation stations and store buffers
 (Stores must wait until address and value are received)
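
The write-result step can be sketched as a broadcast of a (tag, value) pair. This is a simplified Python model, not a full implementation: tags are execution unit numbers as in the earlier example (the slides here also speak of reservation station numbers), and the `Station` class, the `broadcast` function and the numeric values are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Station:
    name: str
    waiting: dict = field(default_factory=dict)  # operand slot -> producer tag
    values: dict = field(default_factory=dict)   # operand slot -> captured value

    def ready(self):
        return not self.waiting

def broadcast(tag, value, stations, reg_status, reg_file):
    """Put (tag, value) on the CDB: waiting stations and registers capture it."""
    for rs in stations:
        for slot, producer in list(rs.waiting.items()):
            if producer == tag:
                rs.values[slot] = value
                del rs.waiting[slot]
    for reg, producer in reg_status.items():
        if producer == tag:        # only the latest producer updates the register
            reg_file[reg] = value
            reg_status[reg] = 0    # value now lives in the register file

# State at the end of the six-instruction example: R1 is owned by unit 6.
reg_status = {"R1": 6, "R7": 4}
reg_file = {"R1": 0, "R7": 0}
stations = [
    Station("I2", waiting={"src": 1}),
    Station("I4", waiting={"src": 3}),
    Station("I5", waiting={"src": 3}),
]

broadcast(1, 10, stations, reg_status, reg_file)  # unit 1 finishes
broadcast(3, 30, stations, reg_status, reg_file)  # unit 3 finishes
broadcast(6, 60, stations, reg_status, reg_file)  # unit 6 finishes
print([(rs.name, rs.values, rs.ready()) for rs in stations])
print(reg_file, reg_status)
```

The results of units 1 and 3 reach the waiting stations but never touch R1 in the register file, because the status entry already names unit 6; only the broadcast from unit 6 writes R1.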



Example



THANK YOU!!