CS530-Fall2015-Lecture9
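[Figure: sequential laundry timeline from 6 PM to midnight. Loads A, B, C and D are done one after another in task order; each load goes through three steps of 30, 40 and 20 minutes.]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?

[Figure: pipelined laundry timeline. Loads A to D overlap in task order; the time segments are 30, 40, 40, 40, 40 and 20 minutes.]
• Pipelined laundry takes 3.5 hours for 4 loads.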
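Both totals can be reproduced with a small scheduling sketch. It assumes that each load passes through the three steps in order (30, 40 and 20 minutes, as on the timeline), that a step can take a new load as soon as it has finished the previous one, and that a finished load may wait between steps. The function names are illustrative.

```python
# Step times in minutes, read off the laundry timeline; 4 loads (A to D).
STEP_MINUTES = [30, 40, 20]
LOADS = 4


def sequential_minutes(steps, loads):
    """Each load finishes every step before the next load starts."""
    return loads * sum(steps)


def pipelined_minutes(steps, loads):
    """Loads overlap; a step starts a load once the load has left the
    previous step and the step has finished its previous load."""
    finish = [0] * len(steps)  # finish[j]: when step j last finished a load
    for _ in range(loads):
        ready = 0  # when this load leaves the previous step
        for j, duration in enumerate(steps):
            start = max(ready, finish[j])
            finish[j] = start + duration
            ready = finish[j]
    return finish[-1]


print(sequential_minutes(STEP_MINUTES, LOADS) / 60)  # 6.0 hours
print(pipelined_minutes(STEP_MINUTES, LOADS) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes. The 40-minute step is the bottleneck, so a load leaves it only every 40 minutes.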
9/22/15
• Pipeline rate is limited by the slowest pipeline stage.
• Multiple tasks operate simultaneously.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce speedup.
• Time to “fill” the pipeline and time to “drain” it reduce speedup.
[Figure: the pipelined laundry timeline, tasks A to D in task order, time segments 30, 40, 40, 40, 40 and 20.]
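These lessons can be made concrete with a clocked-pipeline model. This sketch is an idealization and does not reproduce the laundry schedule. It assumes that every stage advances on a common clock whose period equals the slowest stage, ignores pipeline-register overhead, and uses made-up stage times in arbitrary units.

```python
def pipeline_speedup(stage_times, n_tasks):
    """Speedup of a clocked pipeline over running each task start to finish.

    Assumes clock period = slowest stage and no pipeline-register overhead.
    """
    k = len(stage_times)
    unpipelined = n_tasks * sum(stage_times)
    period = max(stage_times)               # rate limited by slowest stage
    pipelined = (k + n_tasks - 1) * period  # k - 1 extra cycles to fill/drain
    return unpipelined / pipelined


balanced = [10, 10, 10, 10, 10]    # hypothetical stage times
unbalanced = [5, 15, 10, 10, 10]   # same total, one slow stage
for n in (1, 4, 1000):
    print(n, round(pipeline_speedup(balanced, n), 2),
          round(pipeline_speedup(unbalanced, n), 2))
# 1 1.0 0.67
# 4 2.5 1.67
# 1000 4.98 3.32
```

For a single task, the unbalanced pipeline is slower than no pipeline at all, because every stage is clocked at the slowest stage's time. As the number of tasks grows, the fill and drain cycles are amortized and the speedup approaches sum/max. That limit equals the number of stages (5) when the stages are balanced, but is only about 3.3 when they are not.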
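[Figure: datapath without pipeline registers: Next PC and Next SEQ PC muxes, adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory, LMD and the WB Data mux.]
IR <= mem[PC]
PC <= PC + 4
Reg[IRrd] <= Reg[IRrs] opIRop Reg[IRrt]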
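The register transfers above describe one R-type instruction on the unpipelined datapath. The following minimal Python sketch performs those transfers. The tuple encoding (op, rd, rs, rt), the `Machine` class and the dictionary-based memory are hypothetical conveniences, and 32-bit overflow is ignored.

```python
from dataclasses import dataclass, field

# Hypothetical encoding: an R-type instruction is a tuple (op, rd, rs, rt).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


@dataclass
class Machine:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)  # address -> instruction tuple


def step(m: Machine) -> None:
    ir = m.mem[m.pc]   # IR <= mem[PC]
    m.pc = m.pc + 4    # PC <= PC + 4
    op, rd, rs, rt = ir
    # Reg[IRrd] <= Reg[IRrs] opIRop Reg[IRrt]
    m.regs[rd] = OPS[op](m.regs[rs], m.regs[rt])


m = Machine()
m.regs[2], m.regs[3] = 7, 5
m.mem[0] = ("add", 1, 2, 3)   # add r1,r2,r3
m.mem[4] = ("sub", 4, 1, 3)   # sub r4,r1,r3
step(m)
step(m)
print(m.regs[1], m.regs[4], m.pc)  # 12 7 8
```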
[Figure: pipelined datapath divided into five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. The pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB separate the steps; the remaining components are the Next PC and Next SEQ PC muxes, adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory and the WB Data mux.]

Instruction Fetch: IR <= mem[PC]; PC <= PC + 4
Instr. Decode / Reg. Fetch: A <= Reg[IRrs]; B <= Reg[IRrt]
Execute: rslt <= A opIRop B
Memory Access: WB <= rslt
Write Back: Reg[IRrd] <= WB

• Data stationary control
  – local decode for each instruction phase / pipeline stage
[Figure: Load followed by Instr 1 to Instr 4 in instruction order, each passing through Ifetch, Reg, ALU, DMem and Reg in successive clock cycles. With a single memory, the Load's DMem access falls in the same clock cycle as Instr 3's Ifetch.]

• Data hazard
  – two different instructions use the same storage
  – it must appear as if the instructions execute in the correct order
• Control hazard

Resolution:
• Pipeline interlock logic detects hazards and fixes them.
• Better solution: partial stall
  – some instructions stall, others proceed
  – better to stall early than late
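The conflict in the figure can be found mechanically. This sketch assumes a single memory that serves both Ifetch and DMem, no stalls, and that only the Load touches data memory; the slide does not say what Instr 1 to Instr 4 are. The name `memory_conflicts` is illustrative.

```python
def memory_conflicts(program):
    """program: list of (name, uses_dmem) in instruction order.

    Instruction i is fetched in cycle i + 1 and reaches DMem three cycles
    later (Ifetch, Reg, ALU, DMem, Reg; no stalls). Returns
    (cycle, users) for every cycle in which one shared memory would be
    needed by more than one instruction.
    """
    users = {}
    for i, (name, uses_dmem) in enumerate(program):
        fetch_cycle = i + 1
        users.setdefault(fetch_cycle, []).append(f"{name}:Ifetch")
        if uses_dmem:
            users.setdefault(fetch_cycle + 3, []).append(f"{name}:DMem")
    return [(c, u) for c, u in sorted(users.items()) if len(u) > 1]


program = [("Load", True), ("Instr 1", False), ("Instr 2", False),
           ("Instr 3", False), ("Instr 4", False)]
print(memory_conflicts(program))
# [(4, ['Load:DMem', 'Instr 3:Ifetch'])]
```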
How do you “bubble” the pipe?

• Stall
  – low cost, simple
  – increases CPI
  – use for the rare case, since stalling has a performance effect
• Pipeline hardware resource
  – useful for multi-cycle resources
  – good performance
  – sometimes complex, e.g., RAM
• Replicate resource
  – good performance
  – increases cost (+ maybe interconnect delay)
  – useful for cheap or divisible resources

• Always use the resource in the same pipeline stage.
• Use the resource for one cycle only.
Many RISC ISAs are designed with this in mind. This is sometimes very complex to do; for example, memory of necessity is used in both the IF and MEM stages.

Some common structural hazards:
• Memory: we’ve already mentioned this one.
• Floating point: since many floating-point instructions require many cycles, it’s easy for them to interfere with each other.
• Starting up more of one type of instruction than there are resources. For instance, the PA-8600 can support two ALU + two load/store instructions per cycle; that’s how much hardware it has available.
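Two of the resolutions above can be compared by counting cycles. This sketch assumes a five-stage pipeline (Ifetch, Reg, ALU, DMem, Reg) with DMem three cycles after Ifetch. Stalling is modeled by delaying any fetch that would collide with an earlier instruction's data access in one shared memory. Replicating the resource is modeled as separate instruction and data memories. Pipelining the resource is not modeled, and the function name is hypothetical.

```python
def total_cycles(uses_dmem, separate_memories):
    """uses_dmem: one boolean per instruction, in program order.

    With one shared memory, a fetch that collides with an earlier
    instruction's DMem access is delayed by a stall cycle (a bubble).
    Returns the cycle in which the last instruction writes back.
    """
    if not uses_dmem:
        return 0
    busy = set()  # cycles in which the shared memory does a data access
    fetch = 0     # cycle of the most recent fetch
    for needs_dmem in uses_dmem:
        fetch += 1
        while not separate_memories and fetch in busy:
            fetch += 1  # stall: no fetch can happen this cycle
        if needs_dmem:
            busy.add(fetch + 3)
    return fetch + 4  # Reg (write back) is four cycles after Ifetch


program = [True, False, False, False, False]  # a load, then four non-memory ops
print(total_cycles(program, separate_memories=False))  # 10 (one stall)
print(total_cycles(program, separate_memories=True))   # 9 (no stall)
```

The single stall cycle raises the cycle count from 9 to 10 for these five instructions. This is the CPI increase that stalling causes. The replicated memory avoids the stall at the cost of extra hardware.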
Data Hazard on R1 (Figure A.6, Page A-17)
[Figure: pipeline diagram, time (clock cycles) versus instruction order, beginning with I: add r1,r2,r3; the later instructions read r1.]

Three Generic Data Hazards
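The three generic data hazards are usually named RAW (read after write), WAR (write after read) and WAW (write after write). This sketch assumes those three and classifies the dependences between an earlier instruction I and a later instruction J by comparing the registers they write and read. Whether a dependence actually causes a hazard depends on the pipeline. In the classic five-stage integer pipeline, only RAW does. The `Instr` type is hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]     # register written (None if nothing is written)
    srcs: Tuple[str, ...]   # registers read


def hazards(i: Instr, j: Instr) -> set:
    """Name the dependences between I and a later instruction J."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")  # J reads what I writes
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")  # J writes what I reads
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")  # both write the same register
    return found


add = Instr("r1", ("r2", "r3"))      # I: add r1,r2,r3
sub = Instr("r4", ("r1", "r3"))      # J: sub r4,r1,r3
print(hazards(add, sub))             # {'RAW'}

sub2 = Instr("r4", ("r1", "r6"))     # sub r4,r1,r6
and_ = Instr("r6", ("r1", "r7"))     # and r6,r1,r7
print(hazards(sub2, and_))           # {'WAR'}
```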
[Figure: pipeline diagram of sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order, each reading r1 and passing through Ifetch, Reg, ALU, DMem and Reg.]
[Figure: datapath detail with the ID/EX, EX/MEM and MEM/WR pipeline registers, the register file, an immediate input, data memory, and muxes in front of the ALU.]
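The muxes in front of the ALU select where each operand comes from. This sketch assumes they are forwarding muxes and gives priority to the most recent result, which is in EX/MEM. As an assumed MIPS-style convention, it never forwards register 0. `Latch`, `writes` and `forward_select` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    reg_write: bool      # does the instruction in this latch write a register?
    rd: Optional[int]    # its destination register number


def writes(latch: Optional[Latch], src: int) -> bool:
    """True if this latch holds a pending result for register `src`."""
    # Register 0 is never forwarded (assumed hardwired zero, as in MIPS).
    return (latch is not None and latch.reg_write
            and latch.rd not in (None, 0) and latch.rd == src)


def forward_select(src: int, ex_mem: Optional[Latch],
                   mem_wb: Optional[Latch]) -> str:
    """Pick the input of one ALU operand mux for source register `src`."""
    if writes(ex_mem, src):   # the most recent result wins
        return "EX/MEM"
    if writes(mem_wb, src):
        return "MEM/WB"
    return "ID/EX"            # value read from the register file in ID


# sub r4,r1,r3 in EX while add r1,r2,r3 is in EX/MEM:
print(forward_select(1, Latch(True, 1), None))            # EX/MEM
# and r6,r1,r7 in EX: sub (rd = 4) in EX/MEM, add (rd = 1) in MEM/WB:
print(forward_select(1, Latch(True, 4), Latch(True, 1)))  # MEM/WB
print(forward_select(7, Latch(True, 4), Latch(True, 1)))  # ID/EX
```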
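Forwarding to Avoid LW-SW Data Hazard (Figure A.8, Page A-20)
[Figure: pipeline diagram in instruction order that includes sw r4,12(r1) and or r8,r6,r9, each passing through Ifetch, Reg, ALU, DMem and Reg.]

Data Hazard Even with Forwarding (Figure A.9, Page A-21)
[Figure: lw r1, 0(r2) followed by sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9, each passing through Ifetch, Reg, ALU, DMem and Reg.]
The loaded value of r1 is available only at the end of the lw's DMem stage. This is the same clock cycle in which sub r4,r1,r6 needs the value at its ALU input, so forwarding alone cannot remove this hazard.
[Figure: pipeline diagram in which a bubble is inserted into the sequence.]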
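A load-use check decides when a bubble is still required. This sketch assumes full forwarding, so only the instruction immediately after a load can be affected. It also assumes that the store-data register of sw can be forwarded into the MEM stage, as the LW-SW figure title suggests. The `Instr` type and the instruction `lw r4, 0(r1)` in the example are hypothetical; the load is paired with the slide's `sw r4,12(r1)`.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]               # register written, if any
    alu_srcs: Tuple[str, ...]         # registers needed at the start of EX
    store_data: Optional[str] = None  # sw only: needed later, in MEM


def bubbles_needed(prev: Instr, nxt: Instr) -> int:
    """Bubbles to insert between a load and the instruction right after it.

    A just-loaded value reaches the end of DMem too late for nxt's EX
    stage, so using it as an ALU operand costs one bubble. The store-data
    operand of sw is deliberately not checked: it is assumed to be
    forwarded into the MEM stage instead.
    """
    if prev.op == "lw" and prev.dest in nxt.alu_srcs:
        return 1
    return 0


lw = Instr("lw", "r1", ("r2",))          # lw r1, 0(r2)
sub = Instr("sub", "r4", ("r1", "r6"))   # sub r4,r1,r6
lw4 = Instr("lw", "r4", ("r1",))         # hypothetical lw r4, 0(r1)
sw = Instr("sw", None, ("r1",), "r4")    # sw r4,12(r1)
print(bubbles_needed(lw, sub), bubbles_needed(lw4, sw))  # 1 0
```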