Introduction to Pipelining
Pipelining Works in Daily Life!
Laundry Example
• Four people (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• Folder takes 20 minutes
Sequential Laundry
[Figure: timing diagram, 6 PM to midnight — loads A, B, C, D run one after another; each load takes 30 (wash) + 40 (dry) + 20 (fold) = 90 minutes]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would the laundry take?
Pipelined Laundry
[Figure: timing diagram, 6 PM to midnight — load A starts washing at 6 PM; as soon as the washer frees, B starts, and so on; the 40-minute dryer sets the pace (30, 40, 40, 40, 40, 20)]
Pipelined laundry takes 3.5 hours for 4 loads.
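The two totals above can be checked with a short calculation (a sketch using the stage times from the example; the pipelined formula assumes the 40-minute dryer is the bottleneck stage):

```python
# Stage times from the laundry example (minutes).
WASH, DRY, FOLD = 30, 40, 20

def sequential_time(loads):
    # Each load runs start-to-finish before the next begins.
    return loads * (WASH + DRY + FOLD)

def pipelined_time(loads):
    # After the first wash, progress is limited by the slowest
    # stage (the 40-minute dryer); the last fold finishes the job.
    slowest = max(WASH, DRY, FOLD)
    return WASH + loads * slowest + FOLD

print(sequential_time(4) / 60)  # 6.0 hours
print(pipelined_time(4) / 60)   # 3.5 hours
```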
Pipelining Lessons
[Figure: pipelined laundry timing diagram repeated, with the pipeline's "filling" and "draining" phases marked]
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
What is pipelining?
• A pipeline is a series of stages, where some work is done at each stage.
• It is an implementation technique whereby multiple instructions are overlapped in execution.
• It is an implementation technique that exploits parallelism among the instructions in a sequential instruction stream.
• A pipelined processor consists of a sequence of processing circuits, called segments or stages, through which a stream of operands can be passed.
Why is pipelining desirable?
• It improves performance beyond what can be achieved with non-pipelined processing.
• It yields a reduction in the average execution time per instruction.
Pipeline Designer's Goal
• To balance the length of each pipeline stage
• If the stages are well balanced, then the time per instruction on the pipelined machine is

  T = (time per instruction on the unpipelined machine) / (number of pipe stages)

• The speedup from pipelining then equals the number of pipe stages.
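As a numeric sketch of the formula above (the 10 ns instruction time and 5 stages are illustrative values, not from the slides):

```python
def pipelined_time_per_instr(unpipelined_time, stages):
    # Ideal case from the slide: perfectly balanced stages, no overhead.
    return unpipelined_time / stages

# A 10 ns unpipelined instruction split into 5 balanced stages:
t = pipelined_time_per_instr(10.0, 5)
speedup = 10.0 / t
print(t, speedup)  # 2.0 ns per instruction, speedup 5 = number of stages
```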
Types of Pipelines
• Instruction Pipeline
  – The different stages of instruction fetch and execution are handled in a pipeline.
• Arithmetic Pipeline
  – The different stages of an arithmetic operation are handled along the stages of the pipeline.
Instruction Pipeline
Contd…
• The processing of an instruction need not be divided into only two steps. To gain further speedup, the pipeline must have more stages.
• Consider the following decomposition of instruction execution:
  – Fetch Instruction (FI): Read the next expected instruction into a buffer.
  – Decode Instruction (DI): Determine the opcode and the operand specifiers.
  – Calculate Operands (CO): Calculate the effective address of each source operand.
  – Fetch Operands (FO): Fetch each operand from memory.
  – Execute Instruction (EI): Perform the indicated operation.
  – Write Operand (WO): Store the result in memory.
The timing diagram …
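The timing diagram for the six-stage decomposition can be printed with a short sketch (an assumed ideal pipeline, one stage per clock cycle, no stalls):

```python
# Stage names from the six-stage decomposition above.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions):
    # Instruction i (0-based) occupies stage s in clock cycle i + s + 1,
    # so n instructions finish in (stages + n - 1) cycles.
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["  "] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append("I%d: %s" % (i + 1, " ".join(row)))
    return rows

for line in timing_diagram(4):
    print(line)
# With 6 stages, 4 instructions complete in 6 + 4 - 1 = 9 cycles.
```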
Implementing Pipelining Using DLX
5 Steps of DLX Instr. Execution: Step 1
Step 1: Instruction fetch cycle (IF)
– Read the instruction from memory and store it into IR
  • IR ← Mem[PC]
– Calculate the next instruction address
  • NPC ← PC + 4
  • Each instruction occupies 4 consecutive bytes
[Figure: IF-stage datapath — PC addresses instruction memory, whose output is latched into IR; an adder computes NPC = PC + 4]
5 Steps of DLX Instr. Execution: Step 2
Step 2: Instruction decode/register fetch cycle (ID)
– Read the source registers into A and B
  • A ← Regs[IR6..10]
  • B ← Regs[IR11..15]
– Sign-extend the 16-bit immediate field to a 32-bit value
  • Imm ← ((IR16)^16 ## IR16..31)
– Latch the destination register address: b ← Rd
– Decoding is done in parallel with the register read (fixed-field decoding)
[Figure: ID-stage datapath — the register file outputs A and B; a sign-extend unit widens the 16-bit immediate field to the 32-bit Imm; Rd is latched into b and the opcode into OP]
5 Steps of DLX Instr. Execution: Step 3
Step 3: Execution/effective address cycle (EX)
– Memory reference: effective address calculation
  » ALUOutput ← A + Imm
– Register-register ALU instruction: perform the ALU operation on the registers
  » ALUOutput ← A func B
– Register-immediate ALU instruction: perform the ALU operation with the immediate operand
  » ALUOutput ← A op Imm
– Branch: effective address calculation for the branch target address, and determine the condition
  » ALUOutput ← NPC + Imm; Cond ← (A op 0)
[Figure: EX-stage datapath — MUXes select NPC/A and B/Imm as ALU inputs under control of OP; the ALU produces ALUOut, and a zero-test unit produces Cond]
5 Steps of DLX Instr. Execution: Step 4
Step 4: Memory access/branch completion cycle (MEM)
– Memory reference: access memory
  • for a load: LMD ← Mem[ALUOutput]
  • for a store: Mem[ALUOutput] ← B
– Branch: test the condition
  • if (cond) PC ← ALUOutput; else PC ← NPC
[Figure: MEM-stage datapath — a Cond-controlled MUX selects ALUOut or NPC as the new PC; data memory is read into LMD for loads, or written from B for stores]
5 Steps of DLX Instr. Execution: Step 5
Step 5: Write-back cycle (WB)
– Register-register ALU: store the result into the destination register
  • Regs[IR16..20] ← ALUOutput
– Register-immediate ALU: store the result into the destination register
  • Regs[IR11..15] ← ALUOutput
– Load instruction: store the data read from memory into the destination register
  • Regs[IR11..15] ← LMD
[Figure: WB-stage datapath — a MUX controlled by OP selects LMD or ALUOut and writes it into the register file]
5 Steps of DLX Datapath
[Figure: complete DLX datapath with the IF, ID, EX, MEM, and WB stages laid out left to right — PC, instruction memory, and +4 adder (IF); register file and 16→32 sign-extend unit (ID); input MUXes, ALU, and zero tester (EX); data memory with LMD load output and SMD store input (MEM); write-back MUX into the register file (WB)]
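The five steps above can be sketched as straight-line code for a single register-register ADD (a minimal sketch: instructions are tuples rather than encoded 32-bit words, so the slides' IR field extractions such as IR6..10 become named tuple fields, and only ADD is handled):

```python
def step_instruction(state):
    # Step 1 (IF): fetch the instruction and compute NPC = PC + 4.
    op, rs1, rs2, rd = state["imem"][state["PC"] // 4]
    npc = state["PC"] + 4
    # Step 2 (ID): read the source registers into A and B.
    a, b = state["regs"][rs1], state["regs"][rs2]
    # Step 3 (EX): perform the ALU operation.
    assert op == "ADD"  # only reg-reg ADD in this sketch
    alu_output = a + b
    # Step 4 (MEM): no memory access for a reg-reg ALU instruction.
    # Step 5 (WB): write ALUOutput into the destination register.
    state["regs"][rd] = alu_output
    state["PC"] = npc

# ADD R3, R1, R2 with R1 = 5, R2 = 7:
state = {"PC": 0, "imem": [("ADD", 1, 2, 3)], "regs": [0, 5, 7, 0]}
step_instruction(state)
print(state["regs"][3], state["PC"])  # 12 4
```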
A Simple Implementation
• A multi-cycle implementation
  – needs temporary registers: NPC, IR, A, B, Imm, Cond, ALUOutput, LMD
  – allows CPI improvements
• A single-cycle implementation
  – one long clock cycle
  – very inefficient for most machines, which have a reasonable variation in the amount of work per instruction
  – requires duplication of functional units that could be shared in a multi-cycle implementation
Visualizing the Pipeline
[Figure: pipeline diagram over clock cycles CC1–CC9 — five instructions, in instruction order, each flowing through IM (instruction memory), Reg (register read), ALU, DM (data memory), and Reg (write-back); the leading diagonal shows the pipeline filling, the trailing diagonal shows it draining]
Saving Information Produced by Each Stage of the Pipeline
• Information needs to be stored at the end of each clock cycle; otherwise it is lost.
• Each pipeline stage produces information (data, address, and control) at the end of the clock cycle.
• Thus, we need storage (called an inter-stage buffer) at the end of each pipeline stage.
Inter-Stage Buffers in the DLX Pipeline
• F/D Buffer
  – IR, NPC
• D/EX Buffer
  – A, B, Imm, b (destination register address for storing the result), OP (opcode), cond
  – NPC
• EX/M Buffer
  – ALUout (arithmetic result or effective address)
  – NPC, cond, b, OP
• M/W Buffer
  – LMD (data for a load)
  – ALUout (arithmetic result), b, OP
Pipelined DLX Datapath – Multicycle
[Figure: the same five-stage DLX datapath, now with the F/D, D/EX, EX/M, and M/W inter-stage buffers inserted between the IF, ID, EX, MEM, and WB stages]
Basic Performance Issues
• Pipelining increases the CPU instruction throughput.
• It does not reduce the execution time of an individual instruction; rather, it slightly increases it, due to the overhead in the control of the pipeline.
• The increase in throughput means that a program runs faster and has a lower total execution time.
• Imbalance among the pipe stages reduces performance, since the clock can run no faster than the time needed for the slowest pipeline stage.
• Pipeline overhead arises from pipeline register delay and clock skew.
Contd…
• Buffering between stages marginally increases the cycle time.
• Hazards increase the CPI.
What is a Hazard?
– A hazard is a situation in which the pipeline operation stalls (stops) for one or more clock cycles.
– It prevents the next instruction from executing during its designated clock cycle.
Pipeline Hazards
• There are three classes of hazards:
  – Structural
    • Arise from simultaneous requests for the same resource by two or more instructions
      – E.g., IF and MEM both require a memory port.
  – Data
    • An instruction depends on the result of a prior instruction still in the pipeline.
  – Control
    • Arise from branch and jump instructions.
Structural Hazard
Data Hazard
Data Hazard Solutions
• Types:
  – Interlock: hardware detects the data dependency and stalls the dependent instructions.
Contd…
– Forwarding (or bypassing): forward the result to the EX stage as soon as it is available.
Contd…
• Instruction Scheduling: reorder instructions such that dependent instructions are 2–3 cycles apart.
  – Useful for covering load delays and branch delays
  – Useful for hiding delays due to long-latency FP operations
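The interlock-versus-forwarding tradeoff can be sketched by counting stall cycles (a simplified 5-stage model, with the common textbook assumption that the register file writes in the first half of WB and reads in the second half of ID):

```python
def stalls(producer_is_load, distance, forwarding):
    # distance = how many instructions apart the producer and consumer are.
    if forwarding:
        # ALU results are bypassed straight into EX; only a load followed
        # immediately by its consumer still needs one stall cycle.
        return 1 if (producer_is_load and distance == 1) else 0
    # Without forwarding the consumer's ID must wait for the producer's
    # WB: 2 stalls when adjacent, 1 at distance 2, none at distance >= 3.
    return max(0, 3 - distance)

# ADD R1,R2,R3 followed immediately by SUB R4,R1,R5:
print(stalls(False, 1, forwarding=False))  # 2
print(stalls(False, 1, forwarding=True))   # 0
# LW R1,0(R2) followed immediately by a use of R1:
print(stalls(True, 1, forwarding=True))    # 1
```

This also shows why scheduling dependent instructions 2–3 cycles apart helps: at distance 3 the stall count drops to zero even without forwarding.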
Data Hazard Classification
• True data dependency – read after write (RAW)
• Anti-dependency – write after read (WAR)
• Output dependency – write after write (WAW)
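The three classes can be sketched as a small classifier over register numbers (a sketch; instructions are represented simply as a destination register plus a set of source registers, with `instr_a` earlier in program order):

```python
def classify(instr_a, instr_b):
    # Each instruction is (destination_register, source_registers).
    dest_a, srcs_a = instr_a
    dest_b, srcs_b = instr_b
    kinds = []
    if dest_a in srcs_b:
        kinds.append("RAW")   # true dependency: b reads what a wrote
    if dest_b in srcs_a:
        kinds.append("WAR")   # anti-dependency: b overwrites a's source
    if dest_a == dest_b:
        kinds.append("WAW")   # output dependency: both write the same reg
    return kinds

# ADD R1,R2,R3 then SUB R4,R1,R5 -> b reads the R1 written by a:
print(classify((1, {2, 3}), (4, {1, 5})))  # ['RAW']
# ADD R1,R2,R3 then SUB R2,R4,R5 -> b overwrites a's source R2:
print(classify((1, {2, 3}), (2, {4, 5})))  # ['WAR']
```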
Control Hazard
Next class…