CS530-Fall2015-Lecture9

9/22/15

Instruction Level Parallelism: Chapter 3 & Appendix C, Part 1
Gregory D. Peterson
[email protected]

Pipelining & ILP
• ILP is focus of Chapter 3
• Appendix C discusses basics

Datapath vs Control
(Figure: a Datapath block and a Controller block; the controller drives the datapath's control points, and the datapath returns signals to the controller)
• Datapath: storage, functional units (FUs), and interconnect sufficient to perform the desired functions
  – Inputs are control points
  – Outputs are signals
• Controller: state machine to orchestrate operation on the datapath
  – Based on desired function and signals

What Is Pipelining
• Laundry example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes

What Is Pipelining
(Figure: sequential laundry, 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes before the next one starts)
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?

What Is Pipelining
Start work ASAP
(Figure: pipelined laundry, 6 PM to midnight; loads A through D overlap, with time intervals of 30, 40, 40, 40, 40, 20 minutes)
• Pipelined laundry takes 3.5 hours for 4 loads
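Both laundry times can be checked with a short Python sketch. The function names are hypothetical. `pipelined_minutes` models "start work ASAP": each load enters a stage as soon as both the load and that stage are free.

```python
# Laundry example: wash 30 min, dry 40 min, fold 20 min per load.
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, "folder"


def sequential_minutes(loads: int, stages: list[int]) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(loads: int, stages: list[int]) -> int:
    """Start work ASAP: a load enters a stage once the load and the stage are both free."""
    stage_free = [0] * len(stages)  # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        t = 0  # every load is ready at 6 PM (t = 0)
        for s, duration in enumerate(stages):
            start = max(t, stage_free[s])  # wait for the stage if it is still busy
            t = start + duration
            stage_free[s] = t
        finish = t
    return finish


print(sequential_minutes(4, STAGE_MINUTES) / 60, pipelined_minutes(4, STAGE_MINUTES) / 60)  # 6.0 3.5
```

The pipelined schedule is 30 + 4 × 40 + 20 = 210 minutes. The 40-minute dryer sets the pace once the pipeline is full.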

Pipelining Lessons
(Figure: the pipelined laundry timeline, 6 PM to 9 PM, intervals of 30, 40, 40, 40, 40, 20 minutes, tasks A through D)
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup

5 Steps of MIPS Datapath
Figure A.2, Page A-8
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
(Figure: unpipelined MIPS datapath with the PC adder (+4, Next SEQ PC), Next PC MUX, instruction memory, register file (RS1, RS2, RD), sign extension of the immediate, Zero? test, ALU with input MUXes, data memory with LMD, and the write-back MUX producing WB Data)
IR <= mem[PC]
PC <= PC + 4
Reg[IR_rd] <= Reg[IR_rs] op_IRop Reg[IR_rt]
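The lessons about the slowest stage, unbalanced stages, and fill/drain time can be quantified with a sketch. It assumes a clocked pipeline whose cycle equals the slowest stage, with no pipeline-register overhead. Under that model, n tasks through k stages take (k + n − 1) cycles. The function name is hypothetical. Because every stage advances in lockstep, this model is more pessimistic than the start-ASAP laundry schedule.

```python
def pipeline_speedup(stage_times: list[float], tasks: int) -> float:
    """Speedup of a clocked pipeline over running each task start to finish.

    Assumptions: the clock period equals the slowest stage, and pipeline
    registers add no delay.
    """
    k = len(stage_times)
    cycle = max(stage_times)              # rate limited by the slowest stage
    unpipelined = tasks * sum(stage_times)
    pipelined = (k + tasks - 1) * cycle   # k - 1 cycles to fill, then one task per cycle
    return unpipelined / pipelined


print(pipeline_speedup([10] * 5, 1))          # 1.0: one task gets no benefit
print(pipeline_speedup([10] * 5, 1000))       # close to 5 = number of stages
print(pipeline_speedup([30, 40, 20], 1000))   # below 3: unbalanced stages
```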

5 Steps of MIPS Datapath
Figure A.3, Page A-9
Stages: Instruction Fetch | Instr. Decode / Reg. Fetch | Execute / Addr. Calc | Memory Access | Write Back
(Figure: the same datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB)
IF:  IR <= mem[PC]; PC <= PC + 4
ID:  A <= Reg[IR_rs]; B <= Reg[IR_rt]
EX:  rslt <= A op_IRop B
MEM: WB <= rslt
WB:  Reg[IR_rd] <= WB

5 Steps of MIPS Datapath
Figure A.3, Page A-9
(Figure: the same pipelined datapath, with the destination register RD carried through each pipeline register)
• Data stationary control
  – local decode for each instruction phase / pipeline stage
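The stage-by-stage register transfers can be traced in a minimal Python sketch. It walks one ALU instruction through the latches one stage at a time, without overlap. The `Instr` record, the dict-based latches, and `run_alu_instruction` are hypothetical illustrations. The IR is copied into every latch, mirroring how the slide carries RD through each pipeline register for data-stationary control.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str  # "add" or "sub" in this sketch
    rd: int  # destination register
    rs: int  # first source register
    rt: int  # second source register


ALU_OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def run_alu_instruction(imem: dict[int, Instr], regs: list[int], pc: int) -> int:
    """Apply the slide's register transfers stage by stage; return the next PC."""
    # IF: IR <= mem[PC]; PC <= PC + 4  (held in the IF/ID latch)
    if_id = {"IR": imem[pc], "NPC": pc + 4}
    # ID: A <= Reg[IR_rs]; B <= Reg[IR_rt]  (IR travels on with the operands)
    ir = if_id["IR"]
    id_ex = {"IR": ir, "A": regs[ir.rs], "B": regs[ir.rt]}
    # EX: rslt <= A op_IRop B
    ex_mem = {"IR": id_ex["IR"], "rslt": ALU_OPS[id_ex["IR"].op](id_ex["A"], id_ex["B"])}
    # MEM: WB <= rslt  (an ALU instruction does not access data memory)
    mem_wb = {"IR": ex_mem["IR"], "WB": ex_mem["rslt"]}
    # WB: Reg[IR_rd] <= WB; MIPS register 0 is hard-wired to zero
    if mem_wb["IR"].rd != 0:
        regs[mem_wb["IR"].rd] = mem_wb["WB"]
    return if_id["NPC"]


regs = [0] * 32
regs[2], regs[3] = 5, 7
next_pc = run_alu_instruction({0: Instr("add", rd=1, rs=2, rt=3)}, regs, pc=0)
print(regs[1], next_pc)  # 12 4
```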

Visualizing Pipelining
Figure A.2, Page A-8
(Figure: time in clock cycles, Cycle 1 through Cycle 7, against instruction order; four instructions each flow through Ifetch, Reg, ALU, DMem, Reg, each starting one cycle after the one before)

Pipelining is not quite that easy!
• Limits to pipelining: Hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  – Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
  – Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
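The diagram's ideal, hazard-free case can be regenerated with a small sketch. Instruction i enters Ifetch one cycle after instruction i − 1, so four instructions need 4 + 5 − 1 = 8 cycles to complete. The stage labels come from the figure; `ideal_schedule` is a hypothetical helper.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def ideal_schedule(n_instrs: int) -> list[list[str]]:
    """Row i lists what instruction i does in each clock cycle ("" = not in the pipe).

    Ideal pipeline: no hazards, one new instruction fetched per cycle.
    """
    n_cycles = n_instrs + len(STAGES) - 1
    rows = []
    for i in range(n_instrs):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(ideal_schedule(4)):
    print(f"Instr {i + 1}: " + " ".join(f"{cell:<6}" for cell in row))
```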

Pipeline Hurdles
Definition
• conditions that lead to incorrect behavior if not fixed
• Structural hazard
  – two different instructions use the same h/w in the same cycle
• Data hazard
  – two different instructions use the same storage
  – must appear as if the instructions execute in correct order
• Control hazard
  – one instruction affects which instruction is next
Resolution
• Pipeline interlock logic detects hazards and fixes them
• simple solution: stall
  – increases CPI, decreases performance
• better solution: partial stall
  – some instructions stall, others proceed; better to stall early than late

One Memory Port/Structural Hazards
Figure A.4, Page A-14
(Figure: Load followed by Instr 1 through Instr 4 over Cycles 1 to 7, each using Ifetch, Reg, ALU, DMem, Reg; in cycle 4, Load's DMem access and Instr 3's Ifetch both need the single memory port)

One Memory Port/Structural Hazards
(Similar to Figure A.5, Page A-15)
(Figure: Load, Instr 1, and Instr 2 proceed normally over Cycles 1 to 7; a Stall row of bubbles then delays Instr 3 by one cycle)
This is another way to represent the stall we saw on the last few pages.
How do you "bubble" the pipe?

Structural Hazards
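The single-port stall can be sketched in Python. The sketch assumes the five-stage pipeline from the figures, so a memory instruction's DMem access comes 3 cycles after its Ifetch. It also assumes the data access keeps the port, so the conflicting fetch is delayed by a bubble. `fetch_cycles` and its boolean input format are hypothetical.

```python
def fetch_cycles(is_mem_op: list[bool]) -> list[int]:
    """Cycle in which each instruction is fetched when Ifetch and DMem share one memory port.

    Assumption: DMem is 3 cycles after Ifetch, and a data access already
    scheduled for a cycle takes priority, so the fetch waits (a bubble).
    """
    busy = set()  # cycles in which the port is reserved for a data access
    fetches = []
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in busy:  # structural hazard: port taken by an earlier DMem
            cycle += 1        # insert a bubble
        fetches.append(cycle)
        if mem_op:
            busy.add(cycle + 3)  # this instruction's own DMem stage
        cycle += 1
    return fetches


# Load, Instr 1, Instr 2, Instr 3, Instr 4 (only Load touches data memory)
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

Instr 3 moves from cycle 4 to cycle 5, which is the one-cycle stall drawn in the figure.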

Structural Hazards
Dealing with Structural Hazards
Stall
• low cost, simple
• Increases CPI
• use for rare case since stalling has performance effect
Pipeline hardware resource
• useful for multi-cycle resources
• good performance
• sometimes complex e.g., RAM
Replicate resource
• good performance
• increases cost (+ maybe interconnect delay)
• useful for cheap or divisible resources

Structural Hazards
Structural hazards are reduced with these rules:
• Each instruction uses a resource at most once
• Always use the resource in the same pipeline stage
• Use the resource for one cycle only
Many RISC ISAs are designed with this in mind. Sometimes this is very complex to do. For example, memory of necessity is used in the IF and MEM stages.
Some common Structural Hazards:
• Memory: we've already mentioned this one.
• Floating point: Since many floating point instructions require many cycles, it's easy for them to interfere with each other.
• Starting up more of one type of instruction than there are resources. For instance, the PA-8600 can support two ALU + two load/store instructions per cycle; that's how much hardware it has available.

Speed Up Equation for Pipelining
$$\mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$
$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
For simple RISC pipeline, Ideal CPI = 1:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

Example: Dual-port vs. Single-port
• Machine A: Dual ported memory (Harvard Architecture)
• Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed
$$\text{SpeedUp}_A = \frac{\text{Pipeline depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline depth}$$
$$\text{SpeedUp}_B = \frac{\text{Pipeline depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline depth}$$
$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline depth}}{0.75 \times \text{Pipeline depth}} \approx 1.33$$
• Machine A is 1.33 times faster
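The example can be reproduced numerically. The sketch makes three assumptions: the pipeline depth is 5 (hypothetical; the A/B ratio does not depend on it), the unpipelined cycle time is normalized to 1, and Machine A's pipelined cycle time equals it, as "SpeedUpA = Pipeline Depth" implies.

```python
def speedup_from_cpi(depth: float, stall_cpi: float, cycle_unpipelined: float,
                     cycle_pipelined: float, ideal_cpi: float = 1.0) -> float:
    """(Ideal CPI x depth) / (Ideal CPI + stall CPI) x (cycle_unpipelined / cycle_pipelined)."""
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * (cycle_unpipelined / cycle_pipelined)


DEPTH = 5           # hypothetical pipeline depth; it cancels in the ratio
CYCLE_UNPIPE = 1.0  # normalized unpipelined cycle time

# Machine A: dual-ported memory, no structural stalls, same cycle time.
speedup_a = speedup_from_cpi(DEPTH, 0.0, CYCLE_UNPIPE, CYCLE_UNPIPE)
# Machine B: 40% loads x 1 stall cycle each, but a clock 1.05 times faster.
speedup_b = speedup_from_cpi(DEPTH, 0.4 * 1, CYCLE_UNPIPE, CYCLE_UNPIPE / 1.05)

print(round(speedup_a / speedup_b, 2))  # 1.33
```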

Data Hazard on R1
Figure A.6, Page A-17
(Figure: time in clock cycles, stages IF, ID/RF, EX, MEM, WB; add r1,r2,r3 is followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, all of which read r1)

Three Generic Data Hazards
• Read After Write (RAW)
  InstrJ tries to read operand before InstrI writes it
  I: add r1,r2,r3
  J: sub r4,r1,r3
• Caused by a "Dependence" (in compiler nomenclature). This hazard results from an actual need for communication.
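The RAW hazards in Figure A.6 can be found with a sketch that assumes no forwarding. Instruction k (0-based) reads registers in ID during cycle k + 2 and writes in WB during cycle k + 5. The sketch also assumes a split-cycle register file (write in the first half, read in the second), so a read in the same cycle as the write is safe. Only three-register R-type syntax is parsed, and the helper names are hypothetical.

```python
def parse(instr: str) -> tuple[str, str, list[str]]:
    """Split 'add r1,r2,r3' into ('add', 'r1', ['r2', 'r3']) (R-type only)."""
    op, operands = instr.split(maxsplit=1)
    rd, *sources = [r.strip() for r in operands.split(",")]
    return op, rd, sources


def raw_hazards_without_forwarding(program: list[str]) -> list[tuple[int, int, str]]:
    """(producer, consumer, register) where the consumer's ID read precedes the producer's WB."""
    decoded = [parse(p) for p in program]
    hazards = []
    for i, (_, rd, _) in enumerate(decoded):
        write_cycle = i + 5  # WB of instruction i
        for j in range(i + 1, len(decoded)):
            _, j_rd, j_sources = decoded[j]
            if rd in j_sources and j + 2 < write_cycle:  # ID read happens too early
                hazards.append((i, j, rd))
            if j_rd == rd:  # a later write to rd supersedes instruction i's value
                break
    return hazards


prog = ["add r1,r2,r3", "sub r4,r1,r3", "and r6,r1,r7", "or r8,r1,r9", "xor r10,r1,r11"]
print(raw_hazards_without_forwarding(prog))  # [(0, 1, 'r1'), (0, 2, 'r1')]
```

Under these assumptions, only `sub` and `and` read r1 too early. `or` reads it in the same cycle that `add` writes it, and `xor` reads it later.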

Three Generic Data Hazards
• Write After Read (WAR)
  InstrJ writes operand before InstrI reads it
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
• Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
• Can't happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

Three Generic Data Hazards
• Write After Write (WAW)
  InstrJ writes operand before InstrI writes it.
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
• Called an "output dependence" by compiler writers. This also results from the reuse of name "r1".
• Can't happen in MIPS 5 stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5
• Will see WAR and WAW in more complicated pipes
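The three categories can be told apart by comparing the register names two instructions read and write. The sketch below names the dependence from an earlier instruction I to a later J. A dependence becomes a hazard only if the pipeline can reorder the accesses, which, as noted above, the MIPS 5-stage pipeline does not do for WAR and WAW. The three-register parser and `dependences` are hypothetical helpers.

```python
def parse(instr: str) -> tuple[str, str, list[str]]:
    """Split 'sub r4,r1,r3' into ('sub', 'r4', ['r1', 'r3']) (R-type only)."""
    op, operands = instr.split(maxsplit=1)
    dst, *srcs = [r.strip() for r in operands.split(",")]
    return op, dst, srcs


def dependences(earlier: str, later: str) -> set[str]:
    """Name the dependences from instruction I (earlier) to J (later)."""
    _, i_dst, i_srcs = parse(earlier)
    _, j_dst, j_srcs = parse(later)
    kinds = set()
    if i_dst in j_srcs:
        kinds.add("RAW")  # true dependence: J reads what I writes
    if j_dst in i_srcs:
        kinds.add("WAR")  # anti-dependence: J overwrites a name I reads
    if j_dst == i_dst:
        kinds.add("WAW")  # output dependence: both write the same name
    return kinds


print(dependences("sub r4,r1,r3", "add r1,r2,r3"))  # {'WAR'}
print(dependences("sub r1,r4,r3", "add r1,r2,r3"))  # {'WAW'}
print(dependences("add r1,r2,r3", "mul r6,r1,r7"))  # {'RAW'}
```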


Forwarding to Avoid Data Hazard
Figure A.7, Page A-19
(Figure: time in clock cycles; add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, with the result of add forwarded to the ALU inputs of the following instructions)
What circuit detects and resolves this hazard?

HW Change for Forwarding
Figure A.23, Page A-37
(Figure: NextPC, Registers, the pipeline registers ID/EX, EX/MEM, and MEM/WR, MUXes in front of the ALU inputs choosing among register values, Immediate, and forwarded results, and Data Memory)
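The detection circuit can be sketched as the selection logic for one ALU-input MUX. This follows the common textbook formulation and is an assumption, not necessarily this course's exact circuit. The rule is: forward from EX/MEM if the instruction one ahead writes the needed register; otherwise forward from MEM/WB if the instruction two ahead does; otherwise use the value read in ID. Register 0 is never forwarded. `Latch`, `forward_select`, and the returned labels are hypothetical (the figure labels the last latch MEM/WR).

```python
from dataclasses import dataclass


@dataclass
class Latch:
    reg_write: bool  # does the instruction held in this latch write a register?
    rd: int          # its destination register


def forward_select(src_reg: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Pick the source for one ALU operand of the instruction now in EX."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"  # most recent result wins: produced one instruction ahead
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"  # produced two instructions ahead
    return "ID/EX"       # no match: use the register-file value read in ID


# Figure A.7: sub r4,r1,r3 in EX, add r1,r2,r3 in EX/MEM
print(forward_select(1, ex_mem=Latch(True, 1), mem_wb=Latch(False, 0)))  # EX/MEM
# and r6,r1,r7 in EX, sub r4 in EX/MEM, add r1 in MEM/WB
print(forward_select(1, ex_mem=Latch(True, 4), mem_wb=Latch(True, 1)))   # MEM/WB
```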

Forwarding to Avoid LW-SW Data Hazard
Figure A.8, Page A-20
(Figure: time in clock cycles; add r1,r2,r3; lw r4, 0(r1); sw r4,12(r1); or r8,r6,r9; xor r10,r9,r11, each flowing through Ifetch, Reg, ALU, DMem, Reg)

Data Hazard Even with Forwarding
Figure A.9, Page A-21
(Figure: time in clock cycles; lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9, each flowing through Ifetch, Reg, ALU, DMem, Reg)

Data Hazard Even with Forwarding
(Similar to Figure A.10, Page A-21)
(Figure: time in clock cycles; lw r1, 0(r2) proceeds normally, a bubble is inserted into sub r4,r1,r6 after its Reg stage, and and r6,r1,r7 and or r8,r1,r9 are each delayed by the same one-cycle bubble)
How is this detected?
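One common form of load-use interlock, sketched under assumptions: full forwarding is present, and the check runs while the consumer is in ID (IF/ID) and the load is in EX (ID/EX). If the load's destination is a register the consumer needs as an ALU input, the pipeline stalls one cycle. An instruction two or more behind the load receives the value by forwarding. The `Instr` record with `ex_sources` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str                      # e.g. "lw", "sub"
    dest: int                    # register written (lw target or rd)
    ex_sources: tuple[int, ...]  # registers needed as ALU inputs in EX


def needs_load_use_stall(in_id_ex: Instr, in_if_id: Instr) -> bool:
    """Interlock test: a load in EX whose result the instruction in ID needs in EX."""
    return (in_id_ex.op == "lw" and in_id_ex.dest != 0
            and in_id_ex.dest in in_if_id.ex_sources)


def count_stall_cycles(program: list[Instr]) -> int:
    """With full forwarding, only the instruction right after a load can need a bubble."""
    return sum(needs_load_use_stall(prev, cur) for prev, cur in zip(program, program[1:]))


program = [
    Instr("lw", 1, (2,)),      # lw  r1, 0(r2)
    Instr("sub", 4, (1, 6)),   # sub r4,r1,r6  -> one bubble
    Instr("and", 6, (1, 7)),   # and r6,r1,r7  -> forwarded, no extra stall
    Instr("or", 8, (1, 9)),    # or  r8,r1,r9
]
print(count_stall_cycles(program))  # 1
```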
