Pipelined Processor Design

COE 233
Logic Design and Computer Organization
Dr. Muhamed Mudawar

King Fahd University of Petroleum and Minerals

Presentation Outline

• Serial versus Pipelined Execution
• Pipelined Datapath and Control
• Pipeline Hazards
• Data Hazards and Forwarding
• Load Delay, Hazard Detection, and Stall
• Control Hazards

Pipelined Processor Design COE 233 – Logic Design and Computer Organization © Muhamed Mudawar – slide 2
Laundry Example
• Laundry example: three stages
  1. Wash a dirty load of clothes
  2. Dry wet clothes
  3. Fold and put clothes into drawers
• Each stage takes 30 minutes to complete
• Four loads of clothes (A, B, C, D) to wash, dry, and fold

Sequential Laundry

[Figure: timeline from 6 PM to 12 AM in which loads A, B, C, and D each pass through the three 30-minute stages, one load after another]

• Sequential laundry takes 6 hours for 4 loads
• Intuitively, we can use pipelining to speed up laundry

Pipelined Laundry: Start Load ASAP

[Figure: timeline from 6 PM to 9 PM in which a new load starts every 30 minutes, so loads A, B, C, and D overlap in the wash, dry, and fold stages]

• Pipelined laundry takes 3 hours for 4 loads
• Speedup factor is 2 for 4 loads
• Time to wash, dry, and fold one load is still the same (90 minutes)

Serial versus Pipelined Execution
• Consider a task that can be divided into k subtasks
  - The k subtasks are executed on k different stages
  - Each subtask requires one time unit
  - The total execution time of the task is k time units
• Pipelining overlaps the execution
  - The k stages work in parallel on k different tasks
  - Tasks enter/leave the pipeline at the rate of one task per time unit

[Figure: Serial execution runs stages 1, 2, …, k of one task before the next task starts (one completion every k time units). Pipelined execution starts a new task every time unit (one completion every time unit)]

Synchronous Pipeline
• Uses clocked registers between stages
• Upon arrival of a clock edge …
  - All registers hold the results of previous stages simultaneously
• The pipeline stages are combinational logic circuits
• It is desirable to have balanced stages
  - Approximately equal delay in all stages
• Clock period is determined by the maximum stage delay

[Figure: Input → S1 → S2 → … → Sk → Output, with a clocked register between consecutive stages]

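To make the clock-edge behaviour concrete, here is a minimal Python sketch. The function `simulate_pipeline` is a hypothetical name, and it assumes every stage takes exactly one clock cycle. On each edge all stage registers are updated together from their previous contents, so a new task enters S1 while every earlier task moves one stage forward.

```python
from typing import List, Optional


def simulate_pipeline(k: int, n: int) -> List[int]:
    """Return the clock cycle in which each of n tasks leaves a k-stage pipeline.

    regs[i] holds the task currently in stage S(i+1); None means the stage is empty.
    """
    regs: List[Optional[int]] = [None] * k
    next_task = 0
    completions: List[int] = []
    cycle = 0
    while len(completions) < n:
        cycle += 1
        # Clock edge: every register loads the output of the stage before it.
        # The new state is built only from the old state, so all registers
        # change "simultaneously", as in a synchronous pipeline.
        incoming: Optional[int] = next_task if next_task < n else None
        if incoming is not None:
            next_task += 1
        regs = [incoming] + regs[:-1]
        # The task that has reached the last stage completes in this cycle.
        if regs[-1] is not None:
            completions.append(cycle)
    return completions
```

For example, `simulate_pipeline(5, 4)` returns `[5, 6, 7, 8]`: the first task leaves after k = 5 cycles, and one more task completes in every cycle after that.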
Pipeline Performance
• Let t_i = time delay in stage S_i
• Clock cycle t = max(t_i) is the maximum stage delay
• Clock frequency f = 1/t = 1/max(t_i)
• A pipeline can process n tasks in k + n – 1 cycles
  - k cycles are needed to complete the first task
  - n – 1 cycles are needed to complete the remaining n – 1 tasks
• Ideal speedup of a k-stage pipeline over serial execution:

$$S_k = \frac{\text{Serial execution in cycles}}{\text{Pipelined execution in cycles}} = \frac{nk}{k + n - 1}, \qquad S_k \to k \text{ for large } n$$

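The cycle count and the ideal speedup formula can be evaluated with a short Python sketch; the function names are hypothetical, and "ideal" means one cycle per stage and no stalls, exactly as in the formula.

```python
def pipelined_cycles(k: int, n: int) -> int:
    """Cycles for n tasks on an ideal k-stage pipeline (one cycle per stage, no stalls)."""
    # k cycles to complete the first task, then one completion per cycle
    return k + n - 1


def ideal_speedup(k: int, n: int) -> float:
    """S_k = (serial cycles) / (pipelined cycles) = n*k / (k + n - 1)."""
    return (n * k) / pipelined_cycles(k, n)
```

With the laundry numbers (k = 3 stages, n = 4 loads), `pipelined_cycles(3, 4)` is 6 and `ideal_speedup(3, 4)` is 2.0, i.e. 3 hours instead of 6 at 30 minutes per cycle. For very large n, `ideal_speedup(5, n)` gets arbitrarily close to 5.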
MIPS Processor Pipeline

• Five stages, one cycle per stage
  1. IF: Instruction Fetch from instruction memory
  2. ID: Instruction Decode, register read, and jump/branch (J/Br) address computation
  3. EX: Execute operation or calculate load/store address
  4. MEM: Memory access for load and store
  5. WB: Write Back result to register

Single-Cycle vs Pipelined Performance
• Consider a 5-stage instruction execution in which …
  - Instruction fetch = ALU operation = Data memory access = 200 ps
  - Register read = register write = 150 ps
• What is the clock cycle of the single-cycle processor?
• What is the clock cycle of the pipelined processor?
• What is the speedup factor of pipelined execution?
• Solution
  Single-cycle clock = 200 + 150 + 200 + 200 + 150 = 900 ps

[Figure: two instructions executed one after the other, each taking one 900 ps clock cycle to pass through IF, Reg, ALU, MEM, and Reg]

Single-Cycle versus Pipelined – cont’d
• Pipelined clock cycle = max(200, 150) = 200 ps

[Figure: three overlapped instructions, each passing through IF, Reg, ALU, MEM, and Reg in 200 ps stages; a new instruction starts every 200 ps]

• CPI for pipelined execution = 1
  - One instruction completes each cycle (ignoring pipeline fill)
• Speedup of pipelined execution = 900 ps / 200 ps = 4.5
  - Instruction count and CPI are equal in both cases
• Speedup factor is less than 5 (the number of pipeline stages)
  - Because the pipeline stages are not balanced

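The arithmetic of this example can be checked with a small Python sketch. The names are illustrative, and, like the slide, it ignores pipeline fill and assumes the same instruction count and CPI = 1 in both designs, so the speedup is simply the ratio of clock periods.

```python
# Stage delays from the example, in picoseconds: IF, register read, ALU, MEM, register write
STAGE_DELAYS_PS = [200, 150, 200, 200, 150]


def single_cycle_clock(delays):
    # One long cycle must cover every stage back to back
    return sum(delays)


def pipelined_clock(delays):
    # The slowest stage sets the clock period of the pipeline
    return max(delays)


def steady_state_speedup(delays):
    # Same instruction count and CPI = 1 in both designs: speedup = ratio of clock periods
    return single_cycle_clock(delays) / pipelined_clock(delays)
```

`steady_state_speedup(STAGE_DELAYS_PS)` gives 4.5. If the same 900 ps were split into five hypothetical, perfectly balanced 180 ps stages, `steady_state_speedup([180] * 5)` would reach the full 5.0, which is why balanced stages matter.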
Pipeline Performance Summary
• Pipelining doesn’t improve the latency of a single instruction
• However, it improves the throughput of the entire workload
  - Instructions are initiated and completed at a higher rate
• In a k-stage pipeline, k instructions operate in parallel
  - Overlapped execution using multiple hardware resources
  - Potential speedup = number of pipeline stages k
• The pipeline rate is limited by the slowest pipeline stage
  - Unbalanced lengths of pipeline stages reduce speedup
  - Also, the time to fill and drain the pipeline reduces speedup

Next . . .

• Pipelined Datapath and Control

Single-Cycle Datapath
• Shown below is the single-cycle datapath
• How to pipeline this single-cycle datapath?
  Answer: Introduce pipeline registers at the end of each stage

[Figure: single-cycle datapath divided into five stages: IF = Instruction Fetch, ID = Instruction Decode & Register Read, EX = Execute, MEM = Memory Access, WB = Write Back. It contains the PC, instruction memory, register file (RA, RB, RW, BusA, BusB, BusW), extender (Imm16, ExtOp), ALU (Zero, ALU result), data memory (Address, Data_in, Data_out), and next-PC logic (branch target address; jump target = PC[31:28] ‖ Imm26). Control signals: PCSrc, RegDst, RegWr, ALUSrc, ALUOp, MemRd, MemWr, WBdata]

Pipelined Datapath
• Pipeline registers are shown in green, including the PC
• The same clock edge updates all pipeline registers and the PC
  - In addition to updating the register file and data memory (for store)

[Figure: the single-cycle datapath with pipeline registers (including NPC, Inst, Imm, A, and B) inserted at the end of each stage: IF, ID & Register Read, EX, MEM, WB; control signals PCSrc, RegDst, RegWr, ALUSrc, ALUOp, MemRd, MemWr, WBdata]

Problem with Register Destination
• The instruction in the ID stage is different from the one in the WB stage
  - The WB stage is writing to a different destination register
  - But the register file would be written using the destination register of the instruction in the ID stage

[Figure: pipelined datapath in which the RegDst multiplexer of the ID stage (selecting Rt or Rd) drives the register file write address RW directly]

Pipelining the Destination Register
• The destination register should be pipelined from ID to WB
  - The WB stage writes back data knowing the destination register

[Figure: pipelined datapath in which the destination register selected in ID (Rd) is carried through the pipeline registers as Rd2, Rd3, and Rd4; Rd4 drives the register file write address RW in the WB stage]

Graphically Representing Pipelines
• Multiple instruction execution over multiple clock cycles
  - Instructions are listed in execution order from top to bottom
  - Clock cycles move from left to right
  - The figure shows the use of resources at each stage and each cycle

Program execution order (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw  $t6, 8($s5)       IM   Reg  ALU  DM   Reg
add $s1, $s2, $s3          IM   Reg  ALU  DM   Reg
ori $s4, $t3, 7                 IM   Reg  ALU  DM   Reg
sub $t5, $s2, $t3                    IM   Reg  ALU  DM   Reg
sw  $s2, 10($t3)                          IM   Reg  ALU  DM

Instruction-Time Diagram
• The Instruction-Time Diagram shows:
  - Which instruction occupies which stage at each clock cycle
  - Instruction flow is pipelined over the 5 stages
• Up to five instructions can be in the pipeline during the same cycle: Instruction Level Parallelism (ILP)
• ALU instructions skip the MEM stage. Store instructions skip the WB stage

Instruction order (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw  $t7, 8($s3)       IF   ID   EX   MEM  WB
lw  $t6, 8($s5)            IF   ID   EX   MEM  WB
ori $t4, $s3, 7                 IF   ID   EX   –    WB
sub $s5, $s2, $t3                    IF   ID   EX   –    WB
sw  $s2, 10($s3)                          IF   ID   EX   MEM  –

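An instruction-time diagram like the one above can be generated mechanically. This Python sketch uses hypothetical helper names and assumes an ideal pipeline with no stalls, one new instruction per cycle, and that every opcode other than lw and sw is an ALU instruction; "-" marks a stage the instruction passes through without doing work.

```python
from typing import Dict, List, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_label(op: str, stage: str) -> str:
    """Label shown for a stage; '-' marks a stage the instruction passes through idle."""
    if stage == "MEM" and op not in ("lw", "sw"):
        return "-"   # ALU instructions do nothing in MEM
    if stage == "WB" and op == "sw":
        return "-"   # stores do not write a register
    return stage


def time_diagram(instructions: List[str]) -> List[Tuple[str, Dict[int, str]]]:
    """Map each instruction to {clock cycle: stage}, one new instruction per cycle, no stalls."""
    rows = []
    for i, instr in enumerate(instructions):
        op = instr.split()[0]
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle
        cells = {i + 1 + s: stage_label(op, name) for s, name in enumerate(STAGES)}
        rows.append((instr, cells))
    return rows


def render(rows: List[Tuple[str, Dict[int, str]]], width: int = 5) -> str:
    """Format the rows as a text table with one column per clock cycle."""
    last = max(max(cells) for _, cells in rows)
    cycles = range(1, last + 1)
    lines = [" " * 20 + "".join(f"CC{c}".ljust(width) for c in cycles)]
    for instr, cells in rows:
        lines.append(instr.ljust(20) + "".join(cells.get(c, "").ljust(width) for c in cycles))
    return "\n".join(line.rstrip() for line in lines)
```

Calling `print(render(time_diagram([...])))` on the five instructions of the example prints the same staircase, with the ori and sub rows showing "-" in MEM and the sw row showing "-" in WB.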
Control Signals
[Figure: pipelined datapath (stages IF, ID, EX, MEM, WB) with pipeline registers and the destination register pipelined as Rd2, Rd3, and Rd4; control signals PCSrc, RegDst, RegWr, ALUSrc, ALUOp, MemRd, MemWr, WBdata]

The same control signals are used as in the single-cycle datapath

Pipelined Control
• Pipeline the control signals just like data

[Figure: pipelined datapath with the Main & ALU Control unit in the ID stage (inputs Op and func). RegDst and ExtOp are used in ID; the EX, MEM, and WB control signals travel with the instruction through the pipeline registers. PC control uses J, BEQ, BNE, and the ALU Zero flag to generate PCSrc]

Pipelined Control – Cont'd
• The ID stage generates all the control signals
• Pipeline the control signals as the instruction moves
  - Extend the pipeline registers to include the control signals
• Each stage uses some of the control signals
  - Instruction Decode and Register Read
    - Control signals are generated
    - RegDst and ExtOp are used in this stage; J (Jump) is used by the PC control
  - Execution stage: ALUSrc, ALUOp, BEQ, BNE
    - The ALU generates the Zero signal for the PC control logic (branch control)
  - Memory stage: MemRd, MemWr, and WBdata
  - Write Back stage: the RegWr control signal is used in the last stage

Control Signals Summary
Stage groups: Decode stage (RegDst, ExtOp) | Execute stage (ALUSrc, ALUOp) | Memory stage (MemRd, MemWr, WBdata) | Write Back stage (RegWr) | PC Control (PCSrc)

Op      RegDst  ExtOp   ALUSrc  ALUOp  MemRd  MemWr  WBdata  RegWr  PCSrc
R-Type  1=Rd    X       0=Reg   func   0      0      0       1      0 = next PC
ADDI    0=Rt    1=sign  1=Imm   ADD    0      0      0       1      0 = next PC
SLTI    0=Rt    1=sign  1=Imm   SLT    0      0      0       1      0 = next PC
ANDI    0=Rt    0=zero  1=Imm   AND    0      0      0       1      0 = next PC
ORI     0=Rt    0=zero  1=Imm   OR     0      0      0       1      0 = next PC
LW      0=Rt    1=sign  1=Imm   ADD    1      0      1       1      0 = next PC
SW      X       1=sign  1=Imm   ADD    0      1      X       0      0 = next PC
BEQ     X       X       0=Reg   SUB    0      0      X       0      0 or 2 = BTA
BNE     X       X       0=Reg   SUB    0      0      X       0      0 or 2 = BTA
J       X       X       X       X      0      0      X       0      1 = jump target

PCSrc = 0 or 2 (BTA) for BEQ and BNE, depending on the Zero flag

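The summary table can be encoded as a lookup structure, which also makes explicit which stage consumes which signal. This is an illustrative Python sketch: `CONTROL`, `STAGE_SIGNALS`, `signals_for_stage`, and `pc_src` are hypothetical names, and `None` stands for a don't-care (X) entry.

```python
from typing import Dict, Optional, Union

Signal = Optional[Union[int, str]]   # None stands for "X" (don't care)

# One row per instruction, copied from the control signals summary table
CONTROL: Dict[str, Dict[str, Signal]] = {
    "R-Type": dict(RegDst=1, ExtOp=None, ALUSrc=0, ALUOp="func", MemRd=0, MemWr=0, WBdata=0, RegWr=1),
    "ADDI": dict(RegDst=0, ExtOp=1, ALUSrc=1, ALUOp="ADD", MemRd=0, MemWr=0, WBdata=0, RegWr=1),
    "SLTI": dict(RegDst=0, ExtOp=1, ALUSrc=1, ALUOp="SLT", MemRd=0, MemWr=0, WBdata=0, RegWr=1),
    "ANDI": dict(RegDst=0, ExtOp=0, ALUSrc=1, ALUOp="AND", MemRd=0, MemWr=0, WBdata=0, RegWr=1),
    "ORI": dict(RegDst=0, ExtOp=0, ALUSrc=1, ALUOp="OR", MemRd=0, MemWr=0, WBdata=0, RegWr=1),
    "LW": dict(RegDst=0, ExtOp=1, ALUSrc=1, ALUOp="ADD", MemRd=1, MemWr=0, WBdata=1, RegWr=1),
    "SW": dict(RegDst=None, ExtOp=1, ALUSrc=1, ALUOp="ADD", MemRd=0, MemWr=1, WBdata=None, RegWr=0),
    "BEQ": dict(RegDst=None, ExtOp=None, ALUSrc=0, ALUOp="SUB", MemRd=0, MemWr=0, WBdata=None, RegWr=0),
    "BNE": dict(RegDst=None, ExtOp=None, ALUSrc=0, ALUOp="SUB", MemRd=0, MemWr=0, WBdata=None, RegWr=0),
    "J": dict(RegDst=None, ExtOp=None, ALUSrc=None, ALUOp=None, MemRd=0, MemWr=0, WBdata=None, RegWr=0),
}

# Which signals each stage consumes as the instruction moves down the pipeline
STAGE_SIGNALS = {
    "ID": ("RegDst", "ExtOp"),
    "EX": ("ALUSrc", "ALUOp"),
    "MEM": ("MemRd", "MemWr", "WBdata"),
    "WB": ("RegWr",),
}


def signals_for_stage(op: str, stage: str) -> Dict[str, Signal]:
    """Subset of the ID-generated control signals that a given stage uses."""
    return {name: CONTROL[op][name] for name in STAGE_SIGNALS[stage]}


def pc_src(op: str, zero: bool) -> int:
    """0 = next PC, 1 = jump target, 2 = branch target address (BTA)."""
    if op == "J":
        return 1
    if op == "BEQ":
        return 2 if zero else 0   # taken when the operands are equal (ALU result is zero)
    if op == "BNE":
        return 0 if zero else 2   # taken when the operands differ
    return 0
```

Keeping the table in one place and slicing it per stage mirrors the idea of generating every signal in ID and letting each later stage pick only its own columns.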
Next . . .

• Pipeline Hazards

Pipeline Hazards
• Hazards: situations that would cause incorrect execution if the next instruction were launched during its designated clock cycle
  1. Structural hazards
     - Caused by resource contention
     - Two instructions use the same resource during the same cycle
  2. Data hazards
     - An instruction may compute a result needed by the next instruction
     - Data hazards are caused by data dependencies between instructions
  3. Control hazards
     - Caused by instructions that change control flow (branches/jumps)
     - Delays in changing the flow of control
• Hazards complicate pipeline control and limit performance

Structural Hazards
• Problem
  - Attempt to use the same hardware resource by two different instructions during the same clock cycle
• Example structural hazard
  - Writing back the ALU result in stage 4
  - Conflicts with writing the load data in stage 5
  - Two instructions are attempting to write the register file during the same cycle

Instructions (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw  $t6, 8($s5)       IF   ID   EX   MEM  WB
ori $t4, $s3, 7            IF   ID   EX   WB
sub $t5, $s2, $s3               IF   ID   EX   WB
sw  $s2, 10($s3)                     IF   ID   EX   MEM

Resolving Structural Hazards
• Serious hazard: the hazard cannot be ignored
• Solution 1: Delay access to the resource
  - Must have a mechanism to delay an instruction's access to the resource
  - Delay all write backs to the register file to stage 5
  - ALU instructions bypass stage 4 (memory) without doing anything
• Solution 2: Add more hardware resources (more costly)
  - Add more hardware to eliminate the structural hazard
  - Redesign the register file to have two write ports
  - The first write port can be used to write back ALU results in stage 4
  - The second write port can be used to write back load data in stage 5

Data Hazards
• Dependency between instructions causes a data hazard
• The dependent instructions are close to each other
  - Pipelined execution might change the order of operand access
• Read After Write (RAW) hazard
  - Given two instructions I and J, where I comes before J
  - Instruction J should read an operand after it is written by I
  - Called a data dependence in compiler terminology
      I: add $s1, $s2, $s3   # $s1 is written
      J: sub $s4, $s1, $s3   # $s1 is read
  - The hazard occurs when J reads the operand before I writes it

Example of a RAW Data Hazard
Program execution order (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
value of $s2          10   10   10   10   10   20   20   20
sub $s2, $t1, $t3     IM   Reg  ALU  DM   Reg
add $s4, $s2, $t5          IM   Reg  ALU  DM   Reg
or  $s6, $t3, $s2               IM   Reg  ALU  DM   Reg
and $s7, $t4, $s2                    IM   Reg  ALU  DM   Reg
sw  $t8, 10($s2)                          IM   Reg  ALU  DM

• The result of sub is needed by the add, or, and, & sw instructions
• Instructions add & or will read the old value of $s2 from the register file
• During CC5, $s2 is written at the end of the cycle, so the and instruction also reads the old value

Solution 1: Stalling the Pipeline
Instruction order (time in cycles):
                      CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8    CC9
value of $s2          10     10     10     10     10     20     20     20     20
sub $s2, $t1, $t3     IM     Reg    ALU    DM     Reg
add $s4, $s2, $t5            IM     Reg    Reg    Reg    Reg    ALU    DM     Reg
                                    stall  stall  stall
or  $s6, $t3, $s2                                        IM     Reg    ALU    DM

• Three stall cycles during CC3 through CC5 (wasting 3 cycles)
  - The 3 stall cycles delay the execution of add and the fetching of or
• The 3 stall cycles insert 3 bubbles (no operations) into the ALU
  - The add instruction remains in the second stage until CC6
  - The or instruction is not fetched until CC6

Solution 2: Forwarding ALU Result
• The ALU result is forwarded (fed back) to the ALU input
  - No bubbles are inserted into the pipeline and no cycles are wasted
• The ALU result is forwarded from the ALU, MEM, and WB stages

Program execution order (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
value of $s2          10   10   10   10   10   20   20   20
sub $s2, $t1, $t3     IM   Reg  ALU  DM   Reg
add $s4, $s2, $t5          IM   Reg  ALU  DM   Reg
or  $s6, $t3, $s2               IM   Reg  ALU  DM   Reg
and $s7, $s6, $s2                    IM   Reg  ALU  DM   Reg
sw  $t8, 10($s2)                          IM   Reg  ALU  DM

Implementing Forwarding
• Two multiplexers are added at the inputs of the A & B registers
  - Data from the ALU stage, MEM stage, and WB stage is fed back
• Two signals, ForwardA and ForwardB, control forwarding

[Figure: 4-input multiplexers in front of the A and B pipeline registers. Input 0 comes from the register file (BusA or BusB); inputs 1, 2, and 3 are fed back from the ALU, MEM, and WB stages. ForwardA and ForwardB select the inputs; Rd2, Rd3, and Rd4 carry the destination registers down the pipeline]

Forwarding Control Signals
Signal          Explanation
ForwardA = 0    First ALU operand comes from the register file = value of (Rs)
ForwardA = 1    Forward result of the previous instruction to A (from the ALU stage)
ForwardA = 2    Forward result of the 2nd previous instruction to A (from the MEM stage)
ForwardA = 3    Forward result of the 3rd previous instruction to A (from the WB stage)
ForwardB = 0    Second ALU operand comes from the register file = value of (Rt)
ForwardB = 1    Forward result of the previous instruction to B (from the ALU stage)
ForwardB = 2    Forward result of the 2nd previous instruction to B (from the MEM stage)
ForwardB = 3    Forward result of the 3rd previous instruction to B (from the WB stage)

Forwarding Example
Instruction sequence:
    lw  $t4, 4($t0)
    ori $t7, $t1, 2
    sub $t3, $t4, $t7

When the sub instruction is in the ID stage:
• ori will be in the ALU stage
• lw will be in the MEM stage
• ForwardA = 2 (from the MEM stage): Rs = $t4 is the destination of lw
• ForwardB = 1 (from the ALU stage): Rt = $t7 is the destination of ori

[Figure: forwarding datapath with sub $t3,$t4,$t7 in ID, ori $t7,$t1,2 in EX, and lw $t4,4($t0) in MEM]

RAW Hazard Detection
• The current instruction is being decoded in the Decode stage
• The previous instruction is in the Execute stage
• The second previous instruction is in the Memory stage
• The third previous instruction is in the Write Back stage

    If ((Rs != 0) and (Rs == Rd2) and (EX.RegWr)) ForwardA = 1
    Else if ((Rs != 0) and (Rs == Rd3) and (MEM.RegWr)) ForwardA = 2
    Else if ((Rs != 0) and (Rs == Rd4) and (WB.RegWr)) ForwardA = 3
    Else ForwardA = 0

    If ((Rt != 0) and (Rt == Rd2) and (EX.RegWr)) ForwardB = 1
    Else if ((Rt != 0) and (Rt == Rd3) and (MEM.RegWr)) ForwardB = 2
    Else if ((Rt != 0) and (Rt == Rd4) and (WB.RegWr)) ForwardB = 3
    Else ForwardB = 0

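The priority order of these if/else chains (EX before MEM before WB) is what makes the most recent producer win. Below is a Python transcription, assuming registers are given as numbers with 0 = $zero; `StageInfo`, `forward_select`, and `forwarding_signals` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StageInfo:
    rd: int        # destination register carried by this stage (Rd2, Rd3, or Rd4)
    reg_wr: bool   # RegWr control signal carried by the same stage


def forward_select(src: int, ex: StageInfo, mem: StageInfo, wb: StageInfo) -> int:
    """Forwarding select for one source register: 0 = register file, 1 = EX, 2 = MEM, 3 = WB."""
    if src == 0:
        return 0            # register $zero is never forwarded
    if ex.reg_wr and src == ex.rd:
        return 1            # the previous instruction has priority: it holds the newest value
    if mem.reg_wr and src == mem.rd:
        return 2            # second previous instruction
    if wb.reg_wr and src == wb.rd:
        return 3            # third previous instruction
    return 0


def forwarding_signals(rs: int, rt: int, ex: StageInfo, mem: StageInfo,
                       wb: StageInfo) -> Tuple[int, int]:
    """(ForwardA, ForwardB) for the instruction in the ID stage."""
    return forward_select(rs, ex, mem, wb), forward_select(rt, ex, mem, wb)
```

For the forwarding example (lw $t4 in MEM, ori $t7 in EX, sub $t3,$t4,$t7 in ID), with $t4 = 12 and $t7 = 15 in the MIPS register numbering, `forwarding_signals(12, 15, StageInfo(15, True), StageInfo(12, True), StageInfo(0, False))` gives `(2, 1)`, matching ForwardA = 2 and ForwardB = 1.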
Hazard Detecting and Forwarding Logic
[Figure: the forwarding datapath with a Hazard Detect & Forward unit that compares Rs and Rt (from ID) with Rd2, Rd3, and Rd4, and uses the RegWr signals of the EX, MEM, and WB stages to drive ForwardA and ForwardB. The Main & ALU Control unit (inputs Op and func) generates RegDst, ExtOp, ALUSrc, ALUOp, MemRd, MemWr, WBdata, and RegWr]

Next . . .

• Load Delay, Hazard Detection, and Stall

Load Delay
• Unfortunately, not all data hazards can be forwarded
  - A load has a delay that cannot be eliminated by forwarding
• In the example shown below …
  - The LW instruction does not read the data until the end of CC4
  - Forwarding the data to ADD at the end of CC3 is NOT possible
  - However, a load can forward data to the 2nd next and later instructions

Program order (time in cycles):
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw  $s2, 20($t1)      IF   Reg  ALU  DM   Reg
add $s4, $s2, $t5          IF   Reg  ALU  DM   Reg
or  $t6, $t3, $s2               IF   Reg  ALU  DM   Reg
and $t7, $s2, $t4                    IF   Reg  ALU  DM   Reg

Detecting RAW Hazard after Load
• Detecting a RAW hazard after a Load instruction:
  - The load instruction will be in the EX stage
  - The instruction that depends on the load data is in the decode stage
• Condition for stalling the pipeline:

    if ((EX.MemRd == 1)                              // Detect Load in EX stage
        and (ForwardA == 1 or ForwardB == 1)) Stall  // RAW Hazard

• Insert a bubble into the EX stage after a load instruction
  - A bubble is a no-op that wastes one clock cycle
  - It delays the dependent instruction after the load by one cycle
    - Because of the RAW hazard

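Below is a minimal Python sketch of the stall condition and of what the pipeline registers capture when it fires. `Latches`, `load_use_stall`, and `clock_edge` are hypothetical names; only the PC, the instruction register, and the instruction entering EX are modeled, and the PC values in the test are arbitrary.

```python
from typing import NamedTuple, Optional

BUBBLE = None  # a bubble is a no-op: every control signal entering EX is 0


class Latches(NamedTuple):
    pc: int               # address of the instruction being fetched
    ir: Optional[str]     # instruction register: the instruction in ID
    ex: Optional[str]     # instruction (or BUBBLE) in EX


def load_use_stall(ex_mem_rd: bool, forward_a: int, forward_b: int) -> bool:
    """Stall when a load is in EX (MemRd = 1) and the instruction in ID needs its result from EX."""
    return ex_mem_rd and (forward_a == 1 or forward_b == 1)


def clock_edge(cur: Latches, fetched: str, next_pc: int, stall: bool) -> Latches:
    """Values captured by the PC, IR, and the register feeding EX at the next clock edge."""
    if stall:
        # Freeze PC and IR (the same instructions are fetched and decoded again)
        # and insert a bubble into EX; the load itself moves on to MEM.
        return Latches(cur.pc, cur.ir, BUBBLE)
    # Normal flow: the decoded instruction enters EX, the fetched one enters ID
    return Latches(next_pc, fetched, cur.ir)
```

After one stalled edge, the load has reached MEM, so the dependent instruction now sees ForwardA or ForwardB = 2 instead of 1, the stall condition becomes false, and the pipeline resumes.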
Stall the Pipeline for one Cycle
• ADD instruction depends on LW → stall at CC3
  - Allow the Load instruction in the ALU stage to proceed
  - Freeze the PC and Instruction registers (NO instruction is fetched)
  - Introduce a bubble into the ALU stage (a bubble is a NO-OP)
• The load can forward data to the next instruction after delaying it

Program order (time in cycles):
                      CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8
lw  $s2, 20($s1)      IM     Reg    ALU    DM     Reg
add $s4, $s2, $t5            IM     stall  bubble bubble bubble
                                           Reg    ALU    DM     Reg
or  $t6, $s3, $s2                          IM     Reg    ALU    DM     Reg

Showing Stall Cycles
• Stall cycles can be shown on the instruction-time diagram
• The hazard is detected in the Decode stage
• Stall indicates that the instruction is delayed
• Instruction fetching is also delayed after a stall
• Example (data forwarding is shown using green arrows):

                      CC1    CC2    CC3    CC4    CC5    CC6    CC7    CC8    CC9    CC10
lw  $s1, ($t5)        IF     ID     EX     MEM    WB
lw  $s2, 8($s1)              IF     Stall  ID     EX     MEM    WB
add $v0, $s2, $t3                          IF     Stall  ID     EX     -      WB
sub $v1, $s2, $v0                                        IF     ID     EX     -      WB

Hazard Detecting and Forwarding Logic
[Figure: the forwarding datapath extended with a Hazard Detect, Forward, and Stall unit. Inputs: Rs, Rt, Rd2, Rd3, Rd4, the RegWr signals, and MemRd of the EX stage. Outputs: ForwardA, ForwardB, and Stall. Stall disables the PC and the instruction register (IR) and, through a multiplexer, replaces the control signals entering the EX stage with zeros (Bubble = 0)]

Code Scheduling to Avoid Stalls
• Compilers reorder code in a way to avoid load stalls
• Consider the translation of the following statements:

    A = B + C; D = E – F;   // A through F are in memory

Slow code:
    lw  $t0, 4($s0)     # &B = 4($s0)
    lw  $t1, 8($s0)     # &C = 8($s0)
    add $t2, $t0, $t1   # stall cycle
    sw  $t2, 0($s0)     # &A = 0($s0)
    lw  $t3, 16($s0)    # &E = 16($s0)
    lw  $t4, 20($s0)    # &F = 20($s0)
    sub $t5, $t3, $t4   # stall cycle
    sw  $t5, 12($s0)    # &D = 12($s0)

Fast code: no stalls
    lw  $t0, 4($s0)
    lw  $t1, 8($s0)
    lw  $t3, 16($s0)
    lw  $t4, 20($s0)
    add $t2, $t0, $t1
    sw  $t2, 0($s0)
    sub $t5, $t3, $t4
    sw  $t5, 12($s0)

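A rough way to compare the two versions is to count load-use pairs. The Python sketch below assumes the forwarding pipeline of the previous slides, where only an instruction that immediately follows a load and reads its destination register stalls (for one cycle). `parse` and `count_load_stalls` are hypothetical helpers that understand just the lw, sw, add, and sub forms used here.

```python
import re
from typing import List, Optional, Set, Tuple


def parse(line: str) -> Tuple[str, Optional[str], Set[str]]:
    """Return (opcode, destination register or None, source registers) for lw/sw/add/sub."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":
        return op, None, set(regs)        # sw reads the data register and the base register
    return op, regs[0], set(regs[1:])     # otherwise the first register is the destination


def count_load_stalls(code: List[str]) -> int:
    """Count one stall per instruction that reads a register loaded by the instruction just before it."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        prev_op, prev_dst, _ = parse(prev)
        _, _, cur_src = parse(cur)
        if prev_op == "lw" and prev_dst not in (None, "$0", "$zero") and prev_dst in cur_src:
            stalls += 1
    return stalls
```

Applied to the slow code this counts 2 stalls and to the fast code 0, matching the comments above; the reordering works because each loaded value is now consumed at least two instructions after its load.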
Name Dependence: Write After Read
• Instruction J should write its result after it is read by I
• Called anti-dependence by compiler writers
    I: sub $t4, $t1, $t3   # $t1 is read
    J: add $t1, $t2, $t3   # $t1 is written
• Results from reuse of the name $t1
• NOT a data hazard in the 5-stage pipeline because:
  - Reads are always in stage 2
  - Writes are always in stage 5, and
  - Instructions are processed in order
• Anti-dependence can be eliminated by renaming
  - Use a different destination register for add (e.g., $t5)

Name Dependence: Write After Write
• The same destination register is written by two instructions
• Called output-dependence in compiler terminology
    I: sub $t1, $t4, $t3   # $t1 is written
    J: add $t1, $t2, $t3   # $t1 is written again
• Not a data hazard in the 5-stage pipeline because:
  - All writes are ordered and always take place in stage 5
• However, it can be a hazard in more complex pipelines
  - If instructions are allowed to complete out of order, and
  - Instruction J completes and writes $t1 before instruction I
• Output dependence can be eliminated by renaming $t1
• Read After Read is NOT a name dependence

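The three dependence kinds (RAW, WAR, WAW) can be told apart by comparing register sets. A small Python sketch, where `dependences` is a hypothetical helper that takes the destination and source registers of an earlier instruction I and a later instruction J:

```python
from typing import List, Set


def dependences(i_dst: str, i_src: Set[str], j_dst: str, j_src: Set[str]) -> List[str]:
    """Dependences of a later instruction J on an earlier instruction I (register names as strings)."""
    kinds: List[str] = []
    if i_dst in j_src:
        kinds.append("RAW")   # data dependence: J reads what I writes
    if j_dst in i_src:
        kinds.append("WAR")   # anti-dependence: J writes what I reads
    if j_dst == i_dst:
        kinds.append("WAW")   # output dependence: both write the same register
    # A register that both instructions only read (read after read) creates no dependence
    return kinds
```

For the anti-dependence example, `dependences("$t4", {"$t1", "$t3"}, "$t1", {"$t2", "$t3"})` returns `["WAR"]` (the shared read of $t3 is not reported); renaming the add destination to $t5 makes the result empty, which is exactly what renaming is meant to achieve.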
Next . . .

• Control Hazards

Control Hazards
• Jump and Branch can cause great performance loss
• A Jump instruction needs only the jump target address
• A Branch instruction needs two things:
  - Branch result: Taken or Not Taken
  - Branch target address
    - PC + 4 if the branch is NOT taken
    - PC + 4 + 4 × immediate if the branch is taken
• Jump and branch targets are computed in the ID stage
  - At which point a new instruction is already being fetched
  - Jump instruction: 1-cycle delay
  - Branch: 2-cycle delay for the branch result (taken or not taken)

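The two possible next addresses of a branch can be computed directly from the formulas above. A Python sketch, assuming a 32-bit byte-addressed PC and a 16-bit immediate that is sign-extended (as in MIPS); `sign_extend16` and `branch_next_pc` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_next_pc(pc: int, imm16: int, taken: bool) -> int:
    """Next PC after a branch at byte address pc."""
    if not taken:
        return (pc + 4) & MASK32                            # PC + 4
    return (pc + 4 + 4 * sign_extend16(imm16)) & MASK32     # PC + 4 + 4 * immediate
```

For a branch at 0x100, an immediate of 3 targets 0x110 when taken, while an immediate of 0xFFFF (that is, -1) branches back to the branch instruction itself.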
1-Cycle Jump Delay
• Control logic detects a Jump instruction in the 2nd stage
• The next instruction is fetched anyway
• Convert the next instruction into a bubble (Jump is always taken)

                        cc1     cc2     cc3     cc4     cc5     cc6     cc7
J L1                    IF      ID
Next instruction                IF      Bubble  Bubble  Bubble  Bubble
. . .
L1: Target instruction                  IF      Reg     ALU     DM      Reg

• The jump target address is used to fetch the target instruction in cc3

2-Cycle Branch Delay
• Control logic detects a Branch instruction in the 2nd stage
• The ALU computes the branch outcome in the 3rd stage
• Next1 and Next2 instructions will be fetched anyway
• Convert Next1 and Next2 into bubbles if the branch is taken

                        cc1     cc2     cc3     cc4     cc5     cc6     cc7
beq $t1,$t2,L1          IF      Reg     ALU
Next1                           IF      Reg     Bubble  Bubble  Bubble
Next2                                   IF      Bubble  Bubble  Bubble  Bubble
L1: target instruction                          IF      Reg     ALU     DM

• The branch target address is used to fetch the target instruction in cc4

If Branch is NOT Taken . . .
• Branches can be predicted to be NOT taken
• If the branch outcome is NOT taken, then
  - Next1 and Next2 instructions can be executed
  - Do not convert Next1 & Next2 into bubbles
  - No wasted cycles

                        cc1     cc2     cc3     cc4     cc5     cc6     cc7
beq $t1,$t2,L1          IF      Reg     ALU     (NOT taken)
Next1                           IF      Reg     ALU     DM      Reg
Next2                                   IF      Reg     ALU     DM      Reg

Pipelined Jump and Branch
[Figure: pipelined datapath with jump and branch support. PCSrc selects the next PC address, the jump target (PC[31:28] ‖ Imm26), or the branch target address (BTA). A jump kills the next instruction (Kill1), and a taken branch kills the next two (Kill1 and Kill2), by replacing them with bubbles (NOP). The Forward & Stall unit uses Rs, Rt, Rd2, Rd3, Rd4, RegWr, and MemRd; its Stall output disables the PC and IR and inserts a bubble (all control signals 0) into EX]

PC Control for Pipelined Jump and Branch
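    if ((BEQ && Zero) || (BNE && !Zero))
        { Jmp = 0; Br = 1; Kill1 = 1; Kill2 = 1; }
    else if (J)
        { Jmp = 1; Br = 0; Kill1 = 1; Kill2 = 0; }
    else
        { Jmp = 0; Br = 0; Kill1 = 0; Kill2 = 0; }

Equivalent logic equations:

$$\mathrm{Br} = (\mathrm{BEQ} \cdot \mathrm{Zero}) + (\mathrm{BNE} \cdot \overline{\mathrm{Zero}})$$
$$\mathrm{Jmp} = \mathrm{J} \cdot \overline{\mathrm{Br}}$$
$$\mathrm{Kill1} = \mathrm{J} + \mathrm{Br}, \qquad \mathrm{Kill2} = \mathrm{Br}$$
$$\mathrm{PCSrc} = \{\mathrm{Br}, \mathrm{Jmp}\} \quad (0, 1, \text{ or } 2)$$

[Figure: PC control block with inputs BEQ, BNE, J, and Zero, and outputs Br, Jmp, Kill1, Kill2, and PCSrc]
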
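The PC control equations translate directly into code. The Python sketch below is a hypothetical transcription (`PCControl` and `pc_control` are illustrative names); PCSrc is formed by concatenating Br (high bit) and Jmp (low bit).

```python
from typing import NamedTuple


class PCControl(NamedTuple):
    jmp: int
    br: int
    kill1: int
    kill2: int
    pc_src: int   # 0 = next PC, 1 = jump target, 2 = branch target address


def pc_control(beq: bool, bne: bool, j: bool, zero: bool) -> PCControl:
    # Br = BEQ·Zero + BNE·!Zero: the branch in the EX stage is taken
    br = int((beq and zero) or (bne and not zero))
    # Jmp = J·!Br: a taken branch wins over a jump that follows it (and kills it)
    jmp = int(j and not br)
    kill1 = int(j or br)   # a jump or a taken branch kills the next instruction
    kill2 = br             # a taken branch also kills a second instruction
    return PCControl(jmp, br, kill1, kill2, (br << 1) | jmp)
```

A taken BEQ gives `PCControl(jmp=0, br=1, kill1=1, kill2=1, pc_src=2)`, matching the first branch of the pseudocode; a lone jump gives PCSrc = 1 and kills only one instruction.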
Jump and Branch Impact on CPI
• Base CPI = 1 without counting jumps and branches
• Unconditional jump = 5%, conditional branch = 20%
• 90% of conditional branches are taken
• A jump kills the next instruction; a taken branch kills the next two
• What is the effect of jumps and branches on the CPI?
Solution:
• Jump adds 1 wasted cycle for 5% of instructions = 1 × 0.05 = 0.05
• Branch adds 2 wasted cycles for 20% × 90% of instructions = 2 × 0.2 × 0.9 = 0.36
• New CPI = 1 + 0.05 + 0.36 = 1.41 (due to wasted cycles)
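The same calculation as a reusable Python function; the name and the default penalties (1 cycle per jump, 2 per taken branch, branches predicted NOT taken) are taken from this example and are otherwise assumptions.

```python
def cpi_with_jumps_and_branches(base_cpi: float, jump_freq: float, branch_freq: float,
                                taken_rate: float, jump_penalty: int = 1,
                                taken_branch_penalty: int = 2) -> float:
    """Effective CPI when jumps and taken branches kill instructions (branches predicted NOT taken)."""
    jump_cycles = jump_freq * jump_penalty                            # each jump kills 1 instruction
    branch_cycles = branch_freq * taken_rate * taken_branch_penalty   # only taken branches cost cycles
    return base_cpi + jump_cycles + branch_cycles
```

`cpi_with_jumps_and_branches(1.0, 0.05, 0.20, 0.90)` reproduces 1.41 (up to floating-point rounding). With a hypothetical 50% taken rate, the same model gives 1 + 0.05 + 0.2 = 1.25, showing how strongly the taken rate drives the penalty.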

No ratings yet
Ambo University Waliso Campus: Dep:-Information Technology Group 8 It2 Year
10 pages
Lecture 13 Pipelining
No ratings yet
Lecture 13 Pipelining
12 pages
5.1-5.3 Pipelining and Parallel Processing
No ratings yet
5.1-5.3 Pipelining and Parallel Processing
56 pages
3-Pipelining 241110 203716
No ratings yet
3-Pipelining 241110 203716
59 pages
07 MIPS Pipelining CH4
No ratings yet
07 MIPS Pipelining CH4
73 pages
ACA Handwriten Notes Chat GPT
No ratings yet
ACA Handwriten Notes Chat GPT
52 pages
MIPS Pipeline Performance Guide
No ratings yet
MIPS Pipeline Performance Guide
20 pages
Enhancing CPU Performance with Pipelining
No ratings yet
Enhancing CPU Performance with Pipelining
17 pages
COA Module 3 PPT Part 2
No ratings yet
COA Module 3 PPT Part 2
62 pages
Basic Pipelining: CS2100 - Computer Organization
No ratings yet
Basic Pipelining: CS2100 - Computer Organization
83 pages
Understanding Pipelining Techniques
No ratings yet
Understanding Pipelining Techniques
21 pages
Week 11
No ratings yet
Week 11
33 pages
4 29 03 ImplementingMIPS 0429
No ratings yet
4 29 03 ImplementingMIPS 0429
45 pages
677adcc290db7CA Lab11 Fall2024
No ratings yet
677adcc290db7CA Lab11 Fall2024
8 pages
General Principles of Pipelining: Andrew Warfield CS313
No ratings yet
General Principles of Pipelining: Andrew Warfield CS313
25 pages
Pipelining in Computer Architecture
No ratings yet
Pipelining in Computer Architecture
77 pages
Parallel Processing and Pipelining
No ratings yet
Parallel Processing and Pipelining
53 pages
16.482 / 16.561 Computer Architecture and Design: Instructor: Dr. Michael Geiger Fall 2013
No ratings yet
16.482 / 16.561 Computer Architecture and Design: Instructor: Dr. Michael Geiger Fall 2013
42 pages
Lecture 9
No ratings yet
Lecture 9
18 pages
Lec4 - ILP Pipelining Intro
No ratings yet
Lec4 - ILP Pipelining Intro
24 pages
Pipe Lining
No ratings yet
Pipe Lining
66 pages
Understanding Pipelining in CPUs
No ratings yet
Understanding Pipelining in CPUs
14 pages
5 Pipelining
No ratings yet
5 Pipelining
38 pages
Pipelining in Microcontroller Design
No ratings yet
Pipelining in Microcontroller Design
149 pages
Coa Notes Unit 5
No ratings yet
Coa Notes Unit 5
55 pages
Comparison Between Pipelining
No ratings yet
Comparison Between Pipelining
9 pages
Pipeline Processing
No ratings yet
Pipeline Processing
28 pages
Pipelining in MIPS Architecture
No ratings yet
Pipelining in MIPS Architecture
32 pages
MIPS Processor Design and Pipelining
No ratings yet
MIPS Processor Design and Pipelining
95 pages
Lecture-4-08 01 2025
No ratings yet
Lecture-4-08 01 2025
35 pages
Receipt - 2024-06-17T160151.332
No ratings yet
Receipt - 2024-06-17T160151.332
1 page
Evaluation Tool - Choking
No ratings yet
Evaluation Tool - Choking
2 pages
Understanding Legal Retainers in Ethics
No ratings yet
Understanding Legal Retainers in Ethics
8 pages
Ngữ Văn 12 Kiểm Tra Cuối Kì I 2023
No ratings yet
Ngữ Văn 12 Kiểm Tra Cuối Kì I 2023
5 pages
AP US Unit 9 Topic 2 Noteguides
No ratings yet
AP US Unit 9 Topic 2 Noteguides
6 pages
Ceyda Akgun Celikel (@ceydakgunn) TikTok
No ratings yet
Ceyda Akgun Celikel (@ceydakgunn) TikTok
1 page
Introduction To The Philippine Constitution
No ratings yet
Introduction To The Philippine Constitution
50 pages
ICAO Aviation English Exercises
100% (1)
ICAO Aviation English Exercises
5 pages
Ground 124707
No ratings yet
Ground 124707
1 page
IFM Notes
No ratings yet
IFM Notes
2 pages
65b Evidence Act
No ratings yet
65b Evidence Act
6 pages
Application Note 323 - D
No ratings yet
Application Note 323 - D
16 pages
Issues and Challenges: Cloud Computing E-Government in Developing Countries
No ratings yet
Issues and Challenges: Cloud Computing E-Government in Developing Countries
6 pages
English Test for Grade 4 Students
No ratings yet
English Test for Grade 4 Students
10 pages
Informative Speech Outline
No ratings yet
Informative Speech Outline
5 pages
Congreso 2024 Program Academico
No ratings yet
Congreso 2024 Program Academico
91 pages
Bridge PDF
No ratings yet
Bridge PDF
57 pages
High-Speed Integrated Motor Spindle
No ratings yet
High-Speed Integrated Motor Spindle
5 pages
Courier-Journal Special Report: Coal Ash - A Big Unknown
100% (2)
Courier-Journal Special Report: Coal Ash - A Big Unknown
4 pages
1632
No ratings yet
1632
663 pages
Water Facts for Students
No ratings yet
Water Facts for Students
19 pages
Module 1 (Defining The Self) - Lesson 2
No ratings yet
Module 1 (Defining The Self) - Lesson 2
4 pages
金刚经
No ratings yet
金刚经
13 pages
Legal Impacts on Business Operations
100% (1)
Legal Impacts on Business Operations
27 pages
Msce Phy I
No ratings yet
Msce Phy I
14 pages
DOH AO No 2020 0029
No ratings yet
DOH AO No 2020 0029
13 pages
Understanding Medical Terminology Basics
No ratings yet
Understanding Medical Terminology Basics
14 pages
Employee Welfare Study at Idappadi College
No ratings yet
Employee Welfare Study at Idappadi College
4 pages
TRANSYT Brochure July 2021
No ratings yet
TRANSYT Brochure July 2021
5 pages
Ex - 2 - Full Wave Rectifier With Filter
No ratings yet
Ex - 2 - Full Wave Rectifier With Filter
6 pages