COMPUTER ORGANIZATION & ARCHITECTURE
(BCS-DS-402)
Unit 5: Pipelining and Parallel Processing
5.1 Pipelining: basic concepts of pipelining, throughput and speedup, pipeline hazards.
5.2 Parallel Processors: introduction to parallel processors.
5.3 Concurrent access to memory and cache coherency.

Dr. Meeta Singh, Professor
Department of Computer Science & Engineering
School of Engineering & Technology
Manav Rachna International Institute of Research and Studies (Deemed to be University), Faridabad
Parallel processing
A parallel processing system is able to perform concurrent data processing to achieve a faster execution time.
The system may have two or more ALUs and be able to execute two or more instructions at the same time.
The goal is to increase throughput: the amount of processing that can be accomplished during a given interval of time.
In parallel processing, throughput is the number of computing tasks that can be completed in a given unit of time; it is a standard metric for measuring performance.
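As a one-line worked example (the figures are invented for illustration, not taken from the slides):

    \text{throughput} = \frac{\text{tasks completed}}{\text{time interval}},
    \qquad \text{e.g. } \frac{180\ \text{tasks}}{60\ \text{s}} = 3\ \text{tasks per second}.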
Parallel processing classification: Flynn's taxonomy
Single instruction stream, single data stream – SISD
Single instruction stream, multiple data stream – SIMD
Multiple instruction stream, single data stream – MISD
Multiple instruction stream, multiple data stream – MIMD
Single instruction stream, single data stream – SISD
A single control unit, a single processor unit, and a memory unit.
Instructions are executed sequentially. Parallel processing may still be achieved by means of multiple functional units or by pipeline processing.
Single instruction stream, multiple data stream – SIMD
Represents an organization that includes many processing units under the supervision of a single, common control unit.
All processors receive the same instruction from the control unit but operate on different items of data.
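No code appears in the slides, but the SIMD idea can be illustrated in software with NumPy's vectorized operations (my own analogy; on many machines such array expressions are dispatched to hardware SIMD instructions):

    import numpy as np

    # One "instruction" (elementwise add) applied to many data elements
    # at once: same operation, different data.
    a = np.array([1.0, 2.0, 3.0, 4.0])
    b = np.array([10.0, 20.0, 30.0, 40.0])

    c = a + b        # -> [11. 22. 33. 44.]
    print(c)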
Multiple instruction stream, single data stream – MISD
Of theoretical interest only: the processors receive different instructions but operate on the same data.
Multiple instruction stream, multiple data stream – MIMD
A computer system capable of processing several programs at the same time.
Most multiprocessor and multicomputer systems can be classified in this category.
Flynn’s taxonomy
[Figure: the four Flynn classes — SISD, SIMD, MISD, MIMD — shown side by side.]
Pipelining: Laundry Example
A small laundry has one washer, one dryer, and one operator, and it takes 90 minutes to finish one load:
Washer takes 30 minutes
Dryer takes 40 minutes
Folding by the operator takes 20 minutes
There are four loads to wash: A, B, C, and D.
Sequential Laundry
[Timing chart, 6 PM to midnight: loads A, B, C, and D are processed one after another; each load occupies 30 + 40 + 20 = 90 minutes of wash, dry, and fold time before the next load begins.]
This operator schedules a load to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he is done with the previous one.
The process is sequential: four loads take 4 × 90 = 360 minutes, so sequential laundry takes 6 hours for 4 loads.
Efficiently scheduled laundry: Pipelined Laundry
The operator starts each new load as soon as possible, i.e., as soon as the washer is free.
[Timing chart, 6 PM to about 9:30 PM: load B enters the washer while load A is in the dryer, and so on; after the first wash finishes, one load leaves the dryer every 40 minutes.]
This operator asks for a new load to be delivered to the laundry every 40 minutes — the length of the slowest stage. The total time is 30 + 4 × 40 + 20 = 210 minutes: pipelined laundry takes 3.5 hours for 4 loads.
Pipelining Facts
[The pipelined timing chart is repeated here, annotated: multiple tasks operate simultaneously, and the washer sits idle for 10 minutes waiting for the dryer.]
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
The pipeline rate is limited by the slowest pipeline stage.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup (here the 30-minute wash must wait on the 40-minute dry).
Time to "fill" the pipeline and time to "drain" it also reduce speedup.
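None of this code is in the original slides; the following minimal sketch simply recomputes the sequential and pipelined laundry times above (stage lengths 30, 40, and 20 minutes are taken from the example):

    # Stage durations in minutes: washer, dryer, folding.
    stages = [30, 40, 20]
    loads = 4

    # Sequential: every load runs all stages before the next load starts.
    sequential = loads * sum(stages)          # 4 * 90 = 360 minutes

    # Pipelined: after the first load fills the pipe, the remaining loads
    # are paced by the slowest (bottleneck) stage. This matches the chart
    # for this example: 90 + 3 * 40 = 210 minutes.
    pipelined = sum(stages) + (loads - 1) * max(stages)

    print(f"sequential: {sequential} min, pipelined: {pipelined} min")
    print(f"speedup: {sequential / pipelined:.2f}x")   # about 1.71x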
Pipelining
• Decomposes a sequential process into segments.
• Divides the processor into segment processors, each one dedicated to a particular segment.
• Each segment is executed in a dedicated segment processor that operates concurrently with all the other segments.
• Information flows through these multiple hardware segments.
Pipelining
Instruction execution is divided into k segments or stages.
An instruction exits pipe stage k−1 and proceeds into pipe stage k.
All pipe stages take the same amount of time, called one processor cycle.
The length of the processor cycle is determined by the slowest pipe stage.
[Figure: an instruction stream flowing through k segments.]
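The syllabus lists throughput and speedup, but the slides never state the formula; the standard textbook result (added here for completeness) is that k stages of cycle time t_p complete n tasks in (k + n − 1) cycles, against n · t_n without pipelining:

    S = \frac{n\, t_n}{(k + n - 1)\, t_p},
    \qquad
    \lim_{n \to \infty} S = \frac{t_n}{t_p} = k \quad \text{when } t_n = k\, t_p.

So the potential speedup approaches the number of stages, matching the laundry slide's "Potential speedup = number of pipe stages".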
Pipelining
Suppose we want to perform the combined multiply and add operation with a stream of numbers:
Ai * Bi + Ci for i = 1, 2, 3, …, 7
Pipelining
The suboperations performed in each segment of the pipeline are as follows:
Segment 1: R1 ← Ai, R2 ← Bi
Segment 2: R3 ← R1 * R2, R4 ← Ci
Segment 3: R5 ← R3 + R4
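A minimal sketch of this three-segment pipeline in Python (register names follow the slide; the cycle-by-cycle loop and the sample input values are my own illustration):

    # Streams of input numbers (i = 1..7), as in the example.
    A = [1, 2, 3, 4, 5, 6, 7]
    B = [7, 6, 5, 4, 3, 2, 1]
    C = [1, 1, 1, 1, 1, 1, 1]

    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    results = []

    # Each loop iteration is one clock cycle; all three segments operate
    # concurrently, each on a different item of the stream. Updating the
    # segments in reverse order makes each read last cycle's registers.
    for cycle in range(n + 2):               # n items + 2 cycles to drain
        if R3 is not None:                    # Segment 3: R5 <- R3 + R4
            R5 = R3 + R4
            results.append(R5)
        if R1 is not None:                    # Segment 2: R3 <- R1*R2, R4 <- Ci
            R3, R4 = R1 * R2, C[cycle - 1]
        else:
            R3 = None
        if cycle < n:                         # Segment 1: R1 <- Ai, R2 <- Bi
            R1, R2 = A[cycle], B[cycle]
        else:
            R1 = None

    print(results)   # [A[i]*B[i] + C[i] for each i]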
Some definitions
Pipeline: an implementation technique whereby multiple instructions are overlapped in execution.
Pipeline stage: the computer pipeline divides instruction processing into stages. Each stage completes a part of an instruction and loads a new part in parallel. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
Some definitions
Throughput of the instruction pipeline is determined by how often an instruction exits the pipeline. Pipelining does not decrease the time for an individual instruction's execution; instead, it increases instruction throughput.
Machine cycle: the time required to move an instruction one step further in the pipeline. The length of the machine cycle is determined by the time required for the slowest pipe stage.
Instruction pipeline versus sequential processing
[Figures: the same instruction stream shown under sequential processing and under an instruction pipeline.]
Note that sequential processing is faster for a small number of instructions, because the pipeline must first be filled before it can deliver one result per cycle.
Two Stage Instruction Pipeline
Difficulties...
If a complicated memory access occurs in stage 1, stage 2 will be delayed and the rest of the pipe is stalled.
If there is a branch (a conditional or an unconditional jump), then some of the instructions that have already entered the pipeline should not be processed.
We need to deal with these difficulties.
Flow chart for a four-segment pipeline
5-Stage Pipelining
S1: Fetch Instruction (FI)
S2: Decode Instruction (DI)
S3: Fetch Operand (FO)
S4: Execute Instruction (EI)
S5: Write Operand (WO)
[Space-time chart: nine instructions stream through the five stages, instruction i entering S1 in cycle i and advancing one stage per cycle.]
The five-stage instruction pipeline thus performs, in order: fetch instruction, decode instruction, fetch operands, execute instruction, write result.
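The space-time chart above can be regenerated with a short script (entirely my own sketch; the stage names follow the slide):

    stages = ["FI", "DI", "FO", "EI", "WO"]    # S1..S5 from the slide
    n_instr = 9
    n_cycles = len(stages) + n_instr - 1       # 5 + 9 - 1 = 13 cycles total

    # Row s shows which instruction occupies stage s in each cycle:
    # instruction i is in stage s during cycle i + s.
    for s, name in enumerate(stages):
        row = []
        for c in range(n_cycles):
            i = c - s
            row.append(f"{i + 1:2d}" if 0 <= i < n_instr else "  ")
        print(f"S{s + 1} ({name}): " + " ".join(row))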
6-Stage Pipelining
S1: Fetch Instruction
S2: Decode Instruction
S3: Calculate Operand
S4: Fetch Operand
S5: Execute Instruction
S6: Write Operand
[Space-time chart: as above, but with each instruction advancing through six stages.]
The six-stage instruction pipeline thus performs, in order: fetch instruction, decode instruction, calculate operands (find the effective address), fetch operands, execute instruction, write result.
Two major difficulties
Branch difficulties
Data dependency
Branch Difficulties
Solutions:
Prefetch target instruction
Delayed Branch
Branch target buffer (BTB)
Branch Prediction
Data Dependency
Use a delayed load to solve it. Example:
LOAD:  R1 ← M[Addr1]
LOAD:  R2 ← M[Addr2]
ADD:   R3 ← R1 + R2
STORE: M[Addr3] ← R3
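A minimal sketch of the read-after-write check that motivates the delayed load; the one-cycle load latency and the scoreboard-style dictionary are my own assumptions for illustration, not from the slides:

    # Each instruction: (name, destination register, source registers, is_load).
    program = [
        ("LOAD R1",  "R1", [],           True),
        ("LOAD R2",  "R2", [],           True),
        ("ADD R3",   "R3", ["R1", "R2"], False),
        ("STORE R3", None, ["R3"],       False),
    ]

    LOAD_DELAY = 1       # assumed: a loaded value is usable one cycle late
    ready_at = {}        # register -> first cycle its value may be read
    cycle = 0

    for name, dest, srcs, is_load in program:
        # Interlock view: stall (issue a no-op) until all sources are ready.
        while any(cycle < ready_at.get(r, 0) for r in srcs):
            print(f"cycle {cycle}: NOP  (stall, waiting for operands of {name})")
            cycle += 1
        print(f"cycle {cycle}: {name}")
        if dest is not None:
            ready_at[dest] = cycle + 1 + (LOAD_DELAY if is_load else 0)
        cycle += 1

Running this shows exactly one stall cycle before the ADD, since R2 is not yet available when the ADD would otherwise issue.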
Delay Load
[Figures: pipeline timing for the sequence above, with and without the stall caused by the load.]
Example
Five instructions need to be carried out:
1. Load from memory to R1
2. Increment R2
3. Add R3 to R4
4. Subtract R5 from R6
5. Branch to address X
Delay Branch
Rearrange the instructions so that useful work fills the branch delay slots, as sketched below.
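The original figure cannot be recovered from this extraction, but a plausible reconstruction of the rearrangement (assuming two branch delay slots, as in the classic textbook treatment of this example) is:

    Original order:             Rearranged (delayed branch):
    1. Load from memory to R1   1. Load from memory to R1
    2. Increment R2             2. Increment R2
    3. Add R3 to R4             3. Branch to address X
    4. Subtract R5 from R6      4. Add R3 to R4        (delay slot)
    5. Branch to address X      5. Subtract R5 from R6 (delay slot)

The add and subtract do not affect the branch, so they do useful work while the branch target is being fetched.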
Delayed Branch
In this procedure, the compiler detects the branch instruction and rearranges the machine-language code sequence by inserting useful instructions that keep the pipeline operating without interruption.
Prefetch target instruction
Prefetch the target instruction in addition to the instruction following the branch.
If the branch condition is successful, the pipeline continues from the branch target instruction.
Branch target buffer (BTB)
The BTB is an associative memory.
Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch.
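A minimal software sketch of the BTB idea (a Python dict stands in for the associative memory, and it stores the target address rather than the target instruction itself, for simplicity; function and field names are my own):

    # Branch target buffer: maps a branch instruction's address to its
    # target, so the fetch stage can redirect without decoding the branch.
    btb = {}

    def fetch(pc):
        """Return the predicted next PC for the instruction at `pc`."""
        if pc in btb:          # associative lookup: hit -> predicted taken
            return btb[pc]
        return pc + 1          # miss -> fall through sequentially

    def resolve_branch(pc, taken, target):
        """Called when a branch actually executes; update the BTB."""
        if taken:
            btb[pc] = target   # remember the target for next time
        else:
            btb.pop(pc, None)  # drop entries for not-taken branches

    print(fetch(100))              # 101 (miss: fall through)
    resolve_branch(100, True, 400)
    print(fetch(100))              # 400 (hit: predicted target)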
Loop Buffer
A very fast memory maintained by the fetch stage of the pipeline; the buffer is checked before fetching from memory.
Very good for small loops or short jumps.
The loop buffer is similar in principle to a cache dedicated to instructions. The differences are that the loop buffer only retains instructions in sequence, and it is much smaller in size (and lower in cost).
Branch Prediction
A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed.
Branch Prediction
Various techniques can be used to predict whether a branch will be taken or not:
Predict never taken
Predict always taken
Predict by opcode
Branch history table
The first three approaches are static: they do not depend on the execution history up to the time of the conditional branch instruction. The last approach is dynamic: it depends on the execution history. A sketch of a dynamic predictor follows this list.
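The slides stop at the list above; as a hedged illustration of the dynamic approach, here is a classic 2-bit saturating-counter branch history table (the table size and the simple modulo indexing are my own assumptions):

    # 2-bit saturating counters: 0,1 -> predict not taken; 2,3 -> predict taken.
    TABLE_SIZE = 1024
    counters = [1] * TABLE_SIZE              # start weakly not-taken

    def predict(pc):
        return counters[pc % TABLE_SIZE] >= 2    # True -> predict taken

    def update(pc, taken):
        i = pc % TABLE_SIZE
        if taken:
            counters[i] = min(3, counters[i] + 1)
        else:
            counters[i] = max(0, counters[i] - 1)

    # A loop branch taken 9 times, then not taken at loop exit.
    outcomes = [True] * 9 + [False]
    hits = 0
    for taken in outcomes:
        hits += predict(0x400) == taken
        update(0x400, taken)
    print(f"{hits}/{len(outcomes)} correct")     # 8/10: one miss warming up,
                                                 # one miss at loop exit

The two-bit counter is the point of the design: a single surprising outcome (like the loop exit) does not immediately flip the prediction for the next execution of the loop.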