
CS530-Fall2015-Lecture9

The document discusses instruction level parallelism (ILP) and pipelining, focusing on how pipelining can improve throughput but not latency of individual tasks. It outlines the structure of a MIPS datapath, the challenges of hazards that can occur during pipelining, and methods to resolve these issues. Key concepts include the types of hazards (structural, data, control) and techniques like forwarding to mitigate data hazards.

Uploaded by

oalqudi

9/22/15

Instruction Level Parallelism: Chapter 3 & Appendix C, Part 1
Gregory D. Peterson
[email protected]

Pipelining & ILP

• ILP is focus of Chapter 3
• Appendix C discusses basics

Datapath vs Control

[Figure: block diagram of a Datapath and a Controller; the datapath sends signals to the controller, and the controller drives the datapath's control points]

• Datapath: Storage, FU, interconnect sufficient to perform the desired functions
– Inputs are Control Points
– Outputs are signals
• Controller: State machine to orchestrate operation on the data path
– Based on desired function and signals

What Is Pipelining

• Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes

What Is Pipelining

[Figure: sequential laundry timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one load after another]

• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?

What Is Pipelining

Start work ASAP

[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order, overlapping; segment lengths 30, 40, 40, 40, 40, 20 minutes]

• Pipelined laundry takes 3.5 hours for 4 loads
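The two laundry totals can be checked with a short calculation. The sketch below is illustrative only: the function names are made up for this example, and it assumes each stage (washer, dryer, folder) handles one load at a time and that a finished load can wait between stages.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs start to finish before the next load begins."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load starts a stage as soon as both the load and the stage are free."""
    stage_free_at = [0] * len(stage_minutes)  # time each stage next becomes idle
    for _ in range(loads):
        ready = 0  # time this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + duration
            ready = stage_free_at[s]
    return stage_free_at[-1]


laundry = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(laundry, 4) / 60)  # 6.0 hours
print(pipelined_minutes(laundry, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: after the first wash, the 40-minute dryer sets the pace.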


Pipelining Lessons

[Figure: pipelined laundry timeline, 6 PM to 9:30 PM; loads A, B, C, D in task order; segment lengths 30, 40, 40, 40, 40, 20 minutes]

• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup

5 Steps of MIPS Datapath

Figure A.2, Page A-8

Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back

[Figure: unpipelined MIPS datapath showing the Next PC mux, the adder (PC + 4, Next SEQ PC), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), ALU input muxes, ALU with Zero? test, data memory, and the WB Data mux]

IR <= mem[PC]
PC <= PC + 4
Reg[IR_rd] <= Reg[IR_rs] op_IRop Reg[IR_rt]
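The “Pipelining Lessons” point about fill and drain time can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: stages are perfectly balanced at one cycle each, there are no hazards, and the function names are invented for this illustration.

```python
def pipelined_cycles(n_tasks, n_stages):
    """Cycles to finish n_tasks: n_stages to fill, then one task completes per cycle."""
    return n_stages + n_tasks - 1


def speedup(n_tasks, n_stages):
    """Unpipelined time (n_tasks * n_stages cycles) divided by pipelined time."""
    return (n_tasks * n_stages) / pipelined_cycles(n_tasks, n_stages)


print(speedup(4, 3))     # 2.0: few tasks, so fill/drain overhead dominates
print(speedup(1000, 5))  # close to, but below, the 5-stage limit
```

With many tasks the speedup approaches the number of pipe stages, which is why that number is only the potential speedup.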

5 Steps of MIPS Datapath

Figure A.3, Page A-9

Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back

[Figure: pipelined MIPS datapath; the same units as the unpipelined datapath, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; Next SEQ PC and RD are carried along through the pipeline registers]

IR <= mem[PC]; PC <= PC + 4
A <= Reg[IR_rs]; B <= Reg[IR_rt]
rslt <= A op_IRop B
WB <= rslt
Reg[IR_rd] <= WB

5 Steps of MIPS Datapath

Figure A.3, Page A-9

[Figure: the same pipelined MIPS datapath]

• Data stationary control
– local decode for each instruction phase / pipeline stage

Visualizing Pipelining

Figure A.2, Page A-8

[Figure: pipeline diagram with time in clock cycles (Cycle 1 to Cycle 7) across and instruction order down; four instructions, each passing through Ifetch, Reg, ALU, DMem, Reg, each starting one cycle after the previous one]

Pipelining is not quite that easy!

• Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
– Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).


Pipeline Hurdles

Definition
• conditions that lead to incorrect behavior if not fixed
• Structural hazard
– two different instructions use same h/w in same cycle
• Data hazard
– two different instructions use same storage
– must appear as if the instructions execute in correct order
• Control hazard
– one instruction affects which instruction is next

Resolution
• Pipeline interlock logic detects hazards and fixes them
• simple solution: stall
– increases CPI, decreases performance
• better solution: partial stall
– some instructions stall, others proceed
– better to stall early than late

One Memory Port/Structural Hazards

Figure A.4, Page A-14

[Figure: pipeline diagram with time in clock cycles (Cycle 1 to Cycle 7) across and instruction order down; Load, Instr 1, Instr 2, Instr 3, Instr 4 each pass through Ifetch, Reg, ALU, DMem, Reg, one cycle apart; with one memory port, the Load's DMem access in Cycle 4 collides with the Ifetch of Instr 3]

One Memory Port/Structural Hazards

(Similar to Figure A.5, Page A-15)

[Figure: pipeline diagram with time in clock cycles (Cycle 1 to Cycle 7) across and instruction order down; Load, Instr 1, Instr 2 proceed normally; a Stall row of bubbles is inserted in place of the next instruction; Instr 3 starts one cycle later]

How do you “bubble” the pipe?

Structural Hazards

This is another way to represent the stall we saw on the last few pages.

Structural Hazards

Dealing with Structural Hazards

Stall
• low cost, simple
• Increases CPI
• use for rare case since stalling has performance effect

Pipeline hardware resource
• useful for multi-cycle resources
• good performance
• sometimes complex, e.g., RAM

Replicate resource
• good performance
• increases cost (+ maybe interconnect delay)
• useful for cheap or divisible resources

Structural Hazards

Structural hazards are reduced with these rules:
• Each instruction uses a resource at most once
• Always use the resource in the same pipeline stage
• Use the resource for one cycle only

Many RISC ISAs are designed with this in mind.

Sometimes very complex to do this. For example, memory of necessity is used in the IF and MEM stages.

Some common Structural Hazards:
• Memory - we’ve already mentioned this one.
• Floating point - Since many floating point instructions require many cycles, it’s easy for them to interfere with each other.
• Starting up more of one type of instruction than there are resources. For instance, the PA-8600 can support two ALU + two load/store instructions per cycle - that’s how much hardware it has available.


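Speed Up Equation for Pipelining

\[ \mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction} \]

\[ \text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}} \]

For simple RISC pipeline, CPI = 1:

\[ \text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}} \]

Example: Dual-port vs. Single-port

• Machine A: Dual ported memory (Harvard Architecture)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

\[ \text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth} \]

\[ \text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth} \]

\[ \frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33 \]

• Machine A is 1.33 times faster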
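The dual-port vs. single-port comparison can be reproduced numerically. The sketch below is only an illustration: the function and parameter names are invented here, and the pipeline depth of 5 is an arbitrary assumption (it cancels in the ratio).

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup = (ideal CPI * depth) / (ideal CPI + stall CPI) * clock ratio.

    clock_ratio is unpipelined cycle time divided by pipelined cycle time.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


depth = 5  # assumed for illustration; any depth gives the same ratio

# Machine A: dual-ported memory, so no structural stalls.
speedup_a = pipeline_speedup(depth, stall_cpi=0.0)

# Machine B: loads are 40% of instructions, each costing one stall cycle,
# but the clock is 1.05 times faster.
speedup_b = pipeline_speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)

print(round(speedup_a / speedup_b, 2))  # 1.33
```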

Data Hazard on R1

Figure A.6, Page A-17

[Figure: pipeline diagram with time in clock cycles across (stages IF, ID/RF, EX, MEM, WB) and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg; instructions in order:]

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11

Three Generic Data Hazards

• Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it

I: add r1,r2,r3
J: sub r4,r1,r3

• Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three Generic Data Hazards

• Write After Read (WAR)
InstrJ writes operand before InstrI reads it

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5

Three Generic Data Hazards

• Write After Write (WAW)
InstrJ writes operand before InstrI writes it.

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• Will see WAR and WAW in more complicated pipes
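The three hazard types differ only in which register roles overlap between an earlier instruction I and a later instruction J. A minimal sketch of that classification follows; the tuple representation and the function name are assumptions made for this example, not part of the lecture.

```python
def classify_dependences(instr_i, instr_j):
    """Classify dependences between instr_i and a later instr_j.

    Each instruction is a (dest, sources) pair, e.g. ("r1", ("r2", "r3"))
    for add r1,r2,r3. Returns a set drawn from {"RAW", "WAR", "WAW"}.
    """
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    found = set()
    if dest_i is not None and dest_i in srcs_j:
        found.add("RAW")  # J reads what I writes: true dependence
    if dest_j is not None and dest_j in srcs_i:
        found.add("WAR")  # J writes what I reads: anti-dependence
    if dest_i is not None and dest_i == dest_j:
        found.add("WAW")  # both write the same name: output dependence
    return found


# The instruction pairs from the three slides:
print(classify_dependences(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # {'RAW'}
print(classify_dependences(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))  # {'WAR'}
print(classify_dependences(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))  # {'WAW'}
```

This only identifies the dependence; whether it becomes a hazard depends on the pipeline, as the slides note for the MIPS 5 stage case.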


Forwarding to Avoid Data Hazard

Figure A.7, Page A-19

[Figure: pipeline diagram with time in clock cycles across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg; instructions in order:]

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11

HW Change for Forwarding

Figure A.23, Page A-37

[Figure: forwarding hardware; NextPC, Registers, and Immediate feed muxes on the ALU inputs; pipeline registers ID/EX, EX/MEM, and MEM/WR; Data Memory followed by a mux]

What circuit detects and resolves this hazard?
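One common textbook answer to the question of what detects and resolves the hazard is a forwarding unit that compares the source registers of the instruction entering the ALU against the destination registers held in the later pipeline registers. The sketch below illustrates that comparison in software; it is not taken from the lecture's figure, and the dictionary fields (`reg_write`, `rd`) and the function name are hypothetical.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an ALU input for register number src_reg should come from.

    ex_mem and mem_wb describe the instructions one and two stages ahead:
    'reg_write' is True if that instruction writes a register, 'rd' is the
    destination register number. Register 0 is never forwarded because it
    always reads as zero in MIPS.
    """
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"   # newest value: ALU result from the previous instruction
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"   # value from two instructions earlier
    return "REGFILE"      # no hazard: use the value read in the Reg stage


# sub r4,r1,r3 in EX while add r1,r2,r3 sits in EX/MEM:
add_in_ex_mem = {"reg_write": True, "rd": 1}
nothing = {"reg_write": False, "rd": 0}
print(forward_select(1, add_in_ex_mem, nothing))  # EX/MEM
print(forward_select(3, add_in_ex_mem, nothing))  # REGFILE
```

The EX/MEM check comes first so that, when both later instructions write the same register, the most recent value wins.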

Forwarding to Avoid LW-SW Data Hazard

Figure A.8, Page A-20

[Figure: pipeline diagram with time in clock cycles across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg; instructions in order:]

add r1,r2,r3
lw r4, 0(r1)
sw r4,12(r1)
or r8,r6,r9
xor r10,r9,r11

Data Hazard Even with Forwarding

Figure A.9, Page A-21

[Figure: pipeline diagram with time in clock cycles across and instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg; instructions in order:]

lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9

Data Hazard Even with Forwarding

(Similar to Figure A.10, Page A-21)

[Figure: pipeline diagram with time in clock cycles across and instruction order down; a one-cycle Bubble is inserted into sub, and, and or so that they wait for the load; instructions in order:]

lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9

How is this detected?
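One common way to detect this load-use case is to check, in the decode stage, whether the instruction currently in EX is a load whose destination matches a source register of the instruction being decoded. The sketch below illustrates that check in software; the field names (`mem_read`, `rt`, `rs`) and the function name are hypothetical and not taken from the lecture.

```python
def must_stall(in_ex, in_decode):
    """Return True if the instruction in decode must wait one cycle.

    in_ex describes the instruction in the EX stage: 'mem_read' is True for
    a load and 'rt' is the register the load will write.
    in_decode holds the source register numbers 'rs' and 'rt' of the
    instruction currently being decoded.
    """
    return bool(
        in_ex["mem_read"]
        and in_ex["rt"] != 0  # register 0 always reads as zero in MIPS
        and in_ex["rt"] in (in_decode["rs"], in_decode["rt"])
    )


# lw r1, 0(r2) is in EX while sub r4,r1,r6 is being decoded:
lw = {"mem_read": True, "rt": 1}
sub = {"rs": 1, "rt": 6}
print(must_stall(lw, sub))  # True: insert one bubble, then forward the loaded value
```

After the one-cycle bubble, the loaded value is available from the end of the MEM stage and can be forwarded to the ALU input.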

