Pipeline and Vector Processing

The document discusses parallel processing and pipeline vector processing. It describes parallel processing as using simultaneous data processing tasks to increase computational speed. Pipeline processing breaks down sequential processes into sub-operations that are concurrently executed in dedicated segments. Vector processing performs the same operation on multiple data elements simultaneously. The document outlines various parallel processing techniques like pipelining arithmetic operations and instruction fetching to improve efficiency. It also discusses challenges like resource conflicts, data dependencies, and branch difficulties in pipeline implementations and how they can be addressed.

Uploaded by

Pavan Pulicherla


PIPELINE AND VECTOR PROCESSING

1. Parallel processing:
Parallel processing is a term used for a large class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system.

• The system may have two or more ALUs so that it can execute two or more instructions at the same time.

• The system may have two or more processors operating concurrently.

• It can be achieved by having multiple functional units that perform the same or different operations simultaneously.

Example of parallel processing:

– Multiple functional units:

Separate the execution unit into eight functional units operating in parallel.

Parallel processing can be classified in a variety of ways, according to the:

o Internal organization of the processors

o Interconnection structure between processors
o Flow of information through the system
Architectural Classification:

– Flynn's classification

» Based on the multiplicity of instruction streams and data streams

» Instruction stream

• Sequence of instructions read from memory

» Data stream

• Operations performed on the data in the processor

• SISD (single instruction stream, single data stream) represents an organization containing a single control unit, a processor unit and a memory unit. Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities.

• SIMD (single instruction stream, multiple data stream) represents an organization that includes many processing units under the supervision of a common control unit.

• MISD (multiple instruction stream, single data stream) is of only theoretical interest, since no practical system has been constructed using this organization.

• MIMD (multiple instruction stream, multiple data stream) refers to a computer system capable of processing several programs at the same time.
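
The four classes above follow mechanically from how many instruction streams and data streams a machine has. A minimal sketch of that mapping is shown here; the function name `flynn_class` and the stream counts in the loop are illustrative assumptions, not part of the notes.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return Flynn's class name for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    first = "S" if instruction_streams == 1 else "M"   # single or multiple instruction
    second = "S" if data_streams == 1 else "M"         # single or multiple data
    return f"{first}I{second}D"


# Illustrative stream counts only: 8 stands for "more than one".
for i_streams, d_streams in [(1, 1), (1, 8), (8, 1), (8, 8)]:
    print(i_streams, d_streams, flynn_class(i_streams, d_streams))
```

Only the distinction between one stream and more than one matters for the classification, which is why the sketch collapses every count above 1 to "M".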

The main difference between a multicomputer system and a multiprocessor system is that the multiprocessor system is controlled by one operating system, which provides interaction between the processors, and all the components of the system cooperate in the solution of a problem.

Parallel processing can be discussed under the following topics:

• Pipeline processing

• Vector processing

• Array processors
2. PIPELINING

• Pipelining is a technique of decomposing a sequential process into suboperations, with each suboperation being executed in a special dedicated segment that operates concurrently with all other segments.

• Each segment performs partial processing dictated by the way the task is partitioned.

• The result obtained from each segment is transferred to the next segment.

• The final result is obtained when the data have passed through all segments.

• Suppose we have to perform the following task:

• Each suboperation is to be performed in a segment within a pipeline. Each segment has one or two registers and a combinational circuit.
OPERATIONS IN EACH PIPELINE STAGE:

• General Structure of a 4-Segment Pipeline

• Space-Time Diagram

The following diagram shows 6 tasks, T1 through T6, executed in 4 segments.
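
A space-time diagram of this kind can be generated programmatically. The sketch below assumes that every task spends exactly one clock cycle in each segment and that a new task enters segment 1 on every cycle; the function name `space_time_diagram` is a hypothetical helper introduced for illustration.

```python
def space_time_diagram(k: int, n: int) -> list[list[str]]:
    """Row s (0-based) lists what segment s+1 holds in each clock cycle."""
    total_cycles = k + n - 1
    rows = []
    for segment in range(k):
        row = []
        for cycle in range(total_cycles):
            task = cycle - segment  # index of the task that reaches this segment now
            row.append(f"T{task + 1}" if 0 <= task < n else "--")
        rows.append(row)
    return rows


diagram = space_time_diagram(k=4, n=6)
for number, row in enumerate(diagram, start=1):
    print(f"Segment {number}: " + " ".join(row))
```

Each row is one segment and each column one clock cycle. With k = 4 and n = 6 the table has 9 columns, and T6 leaves the last segment in the ninth cycle.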

PIPELINE SPEEDUP:
Consider the case where a k-segment pipeline is used to execute n tasks.
• n = 6 in the previous example
• k = 4 in the previous example

• Pipelined machine (k stages, n tasks)

o The first task T1 requires k clock cycles to complete its operation, since there are k segments
o The remaining n - 1 tasks complete at a rate of one per clock cycle, so they require n - 1 clock cycles
o Total clock cycles for n tasks = k + (n - 1) (9 in the previous example)

• Conventional machine (non-pipelined)

o Cycles to complete each task = k

o Cycles required for n tasks = nk

• Speedup (S)

• S = non-pipelined time / pipelined time

• For n tasks: S = nk / (k + n - 1)

• As n becomes much larger than k - 1, k + n - 1 approaches n, so S approaches nk / n = k

PIPELINE AND MULTIPLE FUNCTION UNITS:

Example:

- 4-stage pipeline

- 100 tasks to be executed

- 1 task in the non-pipelined system takes 4 clock cycles

Pipelined system: k + n - 1 = 4 + 99 = 103 clock cycles

Non-pipelined system: n * k = 100 * 4 = 400 clock cycles

Speedup: Sk = 400 / 103 = 3.88
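
The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch that assumes, as the notes do, that a non-pipelined task takes k clock cycles and that both machines use the same clock period; the function names are illustrative.

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks on a k-segment pipeline."""
    return k + (n - 1)


def nonpipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks when each task takes k cycles on its own."""
    return n * k


def speedup(k: int, n: int) -> float:
    return nonpipeline_cycles(k, n) / pipeline_cycles(k, n)


print(pipeline_cycles(4, 100))      # 103
print(nonpipeline_cycles(4, 100))   # 400
print(round(speedup(4, 100), 2))    # 3.88

# The speedup approaches k = 4 as the number of tasks grows.
for n in (10, 1_000, 100_000):
    print(n, round(speedup(4, n), 3))
```

The loop shows why k is only a theoretical upper bound: the k - 1 cycles needed to fill the pipeline are never recovered, so S stays slightly below k for any finite n.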

• Arithmetic Pipeline

• Instruction Pipeline

ARITHMETIC PIPELINE:
• Pipelined arithmetic units are usually found in very high-speed computers.

• They are used to implement floating-point operations.

UNIT-V
• We will now discuss the pipeline unit for floating-point addition and subtraction.

• The inputs to the floating-point adder pipeline are two normalized floating-point numbers, where A and B are the mantissas and a and b are the exponents.

• Floating-point addition and subtraction can be performed in four segments.

Floating-point adder:

[1] Compare the exponents

[2] Align the mantissas

[3] Add or subtract the mantissas

[4] Normalize the result

1) Compare exponents:

3 - 2 = 1

2) Align mantissas:
X = 0.9504 x 10^3
Y = 0.08200 x 10^3
3) Add mantissas:
Z = 1.0324 x 10^3
4) Normalize result:
Z = 0.10324 x 10^4
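
The four segments can be traced in code. The sketch below is a sequential model, not a pipelined one: it runs the four steps one after another for a single pair of decimal operands. The operand values X = 0.9504 x 10^3 and Y = 0.8200 x 10^2 are an assumption inferred from the exponent difference and the aligned mantissa in the example above, and the function name `fp_add` and the (mantissa, exponent) tuple representation are illustrative choices.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two normalized decimal floating-point numbers given as (mantissa, exponent)."""
    (a_mant, a_exp), (b_mant, b_exp) = x, y

    # Segment 1: compare the exponents.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)

    # Segment 2: align the mantissa of the operand with the smaller exponent.
    if diff > 0:
        b_mant = b_mant.scaleb(-diff)
    elif diff < 0:
        a_mant = a_mant.scaleb(diff)

    # Segment 3: add the mantissas (a negative mantissa gives subtraction).
    total = a_mant + b_mant

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(total) >= 1:
        total = total.scaleb(-1)
        exp += 1
    while total != 0 and abs(total) < Decimal("0.1"):
        total = total.scaleb(1)
        exp -= 1
    return total, exp


# Assumed operands: X = 0.9504 x 10^3, Y = 0.8200 x 10^2.
mantissa, exponent = fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2))
print(mantissa, exponent)
```

`Decimal` is used instead of `float` so that the decimal mantissas are shifted exactly, as they are in the hand-worked example. In a real pipeline each of the four commented steps would be a separate segment working on a different pair of operands in the same clock cycle.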

Instruction Pipeline:

Pipeline processing can occur not only in the data stream but in the instruction stream as
well.

An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.

This causes the instruction fetch and execute segments to overlap and operate simultaneously.

Four-Segment CPU Pipeline:

• FI segment fetches the instruction.

• DA segment decodes the instruction and calculates the effective address.

• FO segment fetches the operand.
• EX segment executes the instruction.

INSTRUCTION CYCLE:

Six phases* in an instruction cycle:

[1] Fetch an instruction from memory

[2] Decode the instruction

[3] Calculate the effective address of the operand

[4] Fetch the operands from memory

[5] Execute the operation

[6] Store the result in the proper place

* Some instructions skip some phases

* Effective address calculation can be done as part of the decoding phase

* Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory

[2] DA: Decode the instruction and calculate the effective address of the operand

[3] FO: Fetch the operand

[4] EX: Execute the operation

Pipeline Conflicts:

An instruction pipeline faces three major difficulties:

1) Resource conflicts: caused by two segments accessing memory at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.

2) Data dependency: arises when an instruction depends on the result of a previous instruction, but this result is not yet available.

Example: an instruction with register indirect mode cannot proceed to fetch the operand if the previous instruction is loading the address into the register.

3) Branch difficulties: arise from branch and other instructions (interrupt, ret, ...) that change the value of the PC.

Handling Data Dependency:

This problem can be solved in the following ways:

• Hardware interlocks: A hardware interlock is a circuit that detects the conflict situation and delays the instruction by enough clock cycles to resolve the conflict.

• Operand forwarding: Special hardware detects the conflict and avoids it by routing the data through a special path between pipeline segments.

• Delayed load: The compiler detects the data conflict and reorders the instructions as necessary to delay the loading of the conflicting data, inserting no-operation instructions where needed.
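
The delayed-load idea can be illustrated with a toy compiler pass. This is only a sketch under stated assumptions: a loaded value becomes available one instruction late, the pass inserts a no-operation rather than reordering useful instructions, and the `Instr` tuple, the register names and the four-instruction program are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str] = None
    srcs: tuple = ()


NOP = Instr("NOP")


def insert_load_delays(program):
    """Insert a NOP after any LOAD whose result the very next instruction reads."""
    scheduled = []
    for instr in program:
        previous = scheduled[-1] if scheduled else None
        if previous is not None and previous.op == "LOAD" and previous.dest in instr.srcs:
            scheduled.append(NOP)  # give the load one extra cycle to complete
        scheduled.append(instr)
    return scheduled


# Hypothetical program: C = A + B using registers R1, R2, R3.
program = [
    Instr("LOAD", "R1", ("A",)),
    Instr("LOAD", "R2", ("B",)),
    Instr("ADD", "R3", ("R1", "R2")),
    Instr("STORE", "C", ("R3",)),
]
for instr in insert_load_delays(program):
    print(instr.op, instr.dest or "", *instr.srcs)
```

Here the ADD reads R2 immediately after it is loaded, so one NOP is placed between them. A real compiler would first try to move an independent instruction into that slot and fall back to a NOP only when none is available.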

Handling of Branch Instructions:

• Prefetch the target instruction.

• Include a branch target buffer (BTB) in the fetch segment of the pipeline.

• Branch prediction
• Delayed branch
RISC Pipeline:

The simplicity of the instruction set is used to implement an instruction pipeline with a small number of suboperations, each executed in a single clock cycle.

Since all operations are performed on registers, there is no need for effective address calculation.

Three-Segment Instruction Pipeline:

• I: Instruction fetch

• A: ALU operation

• E: Execute
Instruction Delayed Load:

Delayed Branch:

Let us consider the program having the following 5 instructions

Organization of Intel 8085 Micro-Processor:

The microprocessors that are available today come with a wide variety of capabilities and architectural features. All of them, regardless of their diversity, provide at least the following functional components, which form the central processing unit (CPU) of a classical computer.

1. Register Section: A set of registers for temporary storage of instructions, data and addresses of data.
2. Arithmetic and Logic Unit: Hardware for performing primitive arithmetic and logical operations.
3. Interface Section: Input and output lines through which the microprocessor communicates with the outside world.
4. Timing and Control Section: Hardware for coordinating and controlling the activities of the various sections within the microprocessor and other devices connected to the interface section.

The block diagram of the microprocessor along with the memory and Input/Output (I/O) devices is shown in Figure 11.1.

Figure 11.1: Block diagram of microprocessor with memory and I/O.

Intel Microprocessors:

The Intel 4004 is the first 4-bit microprocessor, introduced by Intel in 1971. After that, Intel introduced its first 8-bit microprocessor, the 8008, in 1972.

These microprocessors could not last long as general-purpose microprocessors due to their design and performance limitations.

In 1974, Intel introduced the first general-purpose 8-bit microprocessor, the 8080, which was Intel's first step towards the development of advanced microprocessors.

After the 8080, Intel launched the 8085 with a few more features added to its architecture, and it is considered to be the first functionally complete microprocessor.

The main limitations of the 8-bit microprocessors were their low speed, low memory capacity, limited number of general-purpose registers and a less powerful instruction set.

To overcome these limitations, Intel moved from 8-bit microprocessors to 16-bit microprocessors.

In the family of 16-bit microprocessors, Intel's 8086 was the first one, introduced in 1978.

The 8086 has a much more powerful instruction set, along with architectural developments that gave substantial programming flexibility and improvement over the 8-bit microprocessors.

Microprocessor Intel 8085 :

The Intel 8085 is the first popular microprocessor used by many vendors. Because of its simple architecture and organization, it makes the working principle of a microprocessor easy to understand.

Registers in the Intel 8085:

The programmable registers of the 8085 are as follows:

• One 8-bit accumulator, A.

• Six 8-bit general-purpose registers (GPRs): B, C, D, E, H and L.
• The GPRs are also accessible as three 16-bit register pairs: BC, DE and HL.
• There is one 16-bit program counter (PC), one 16-bit stack pointer (SP) and an 8-bit flag register. Of the 8 bits of the flag register, only 5 are in use.

The programmable registers of the 8085 are shown in Figure 11.2.

Figure 11.2: Register Organisation of 8085
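
The way two 8-bit registers form one 16-bit register pair can be sketched as follows. The first register of a pair (B, D or H) holds the high-order byte and the second (C, E or L) the low-order byte; the function names and the register contents 20H and 50H are illustrative assumptions.

```python
def pair_value(high: int, low: int) -> int:
    """Combine two 8-bit registers into the 16-bit value of the register pair."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_pair(value: int) -> tuple[int, int]:
    """Split a 16-bit value back into (high byte, low byte)."""
    return (value >> 8) & 0xFF, value & 0xFF


# Illustrative contents of H and L.
registers = {"H": 0x20, "L": 0x50}
hl = pair_value(registers["H"], registers["L"])
print(hex(hl))  # 0x2050
```

The same composition applies to BC and DE. The 16-bit PC and SP are dedicated registers and are not built from the general-purpose registers.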

Apart from these programmable registers, some other registers are available that are not accessible to the programmer. These registers include:

• Instruction register (IR).
• Memory address and data buffers (MAR and MDR).
o MAR: Memory Address Register.
o MDR: Memory Data Register.
• Temporary register for ALU use.

ALU of 8085 :

The 8-bit parallel ALU of the 8085 is capable of performing the following operations:

Arithmetic: Addition, Subtraction, Increment, Decrement, Compare.

Logical: AND, OR, EXOR, NOT, SHIFT / ROTATE, CLEAR.

Because of limited chip area, complex operations such as multiplication and division are not available in earlier processors like the 8085.

The operations are performed on binary 2's complement data.

The five flag bits give the status of the microprocessor after an ALU operation.

The Carry (C) flag bit indicates whether there is a carry out of the MSB.

The Parity (P) flag bit is set if the parity of the accumulator is even.

The Auxiliary Carry (AC) flag bit indicates a carry out of bit 3 (the lower nibble), in the same manner as the C flag indicates a carry out of bit 7.

The Zero (Z) flag bit is set if the content of the accumulator after an ALU operation is zero.

The Sign (S) flag bit takes the value of bit 7 of the accumulator, which gives the sign of its contents (0 for positive, 1 for negative).
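
How the five flags follow from a single 8-bit addition can be sketched as below. This is a simplified model for illustration, not a description of the 8085's internal circuitry; the function name `add_with_flags` and the operand values are assumptions.

```python
def add_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and derive the five 8085 flag bits from the result."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF  # value left in the 8-bit accumulator
    flags = {
        "C": int(total > 0xFF),                       # carry out of bit 7
        "AC": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry out of bit 3
        "Z": int(result == 0),                        # result is zero
        "S": (result >> 7) & 1,                       # copy of bit 7
        "P": int(bin(result).count("1") % 2 == 0),    # even number of 1 bits
    }
    return result, flags


# Illustrative operands.
for a, b in [(0x3A, 0xC6), (0x7F, 0x01)]:
    result, flags = add_with_flags(a, b)
    print(f"{a:02X} + {b:02X} = {result:02X}", flags)
```

The first pair sums to 100H, so the accumulator wraps to zero and both C and Z are set. The second pair produces 80H, which sets S because bit 7 is 1 even though no carry leaves bit 7.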

The Interface Section:

Microprocessor chips are equipped with a number of pins for communication with the outside world. This set of lines is known as the system bus.
The interface lines of the Intel 8085 microprocessor are shown in Figure 11.3.

Address and Data Bus

The AD0 - AD7 lines are used as the low-order 8 bits of the address bus and as the data bus, in a time-division multiplexed manner.

The A8 - A15 lines are used for the high-order 8 bits of the address bus.

There are seven memory and I/O control lines:

RD: indicates a READ operation when the signal is LOW.

WR: indicates a WRITE operation when the signal is LOW.

IO/M: indicates memory access when LOW and I/O access when HIGH.

ALE: the address latch enable signal. It is HIGH when address information is present on AD0 - AD7. The falling edge of ALE can be used to latch the address into an external buffer to demultiplex the address bus.
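
The demultiplexing role of ALE can be modelled in a few lines. This is a simplified sketch that assumes the bus is sampled once per step and that the external latch captures the low address byte seen while ALE was HIGH at the moment ALE falls; the function name `demultiplex` and the sample values are hypothetical.

```python
def demultiplex(samples):
    """Return the 16-bit addresses latched on each falling edge of ALE.

    Each sample is (ale, a15_a8, ad7_ad0) observed at successive points in time.
    """
    addresses = []
    previous_ale = 0
    previous_low = 0
    for ale, high, low in samples:
        if previous_ale == 1 and ale == 0:
            # Falling edge: keep the low byte that was on AD0-AD7 while ALE was HIGH.
            addresses.append(((high & 0xFF) << 8) | previous_low)
        previous_ale, previous_low = ale, low & 0xFF
    return addresses


# Hypothetical bus activity: two cycles, addresses 2050H and 2051H.
samples = [
    (1, 0x20, 0x50),  # ALE HIGH: AD0-AD7 carries the low address byte
    (0, 0x20, 0x3E),  # ALE LOW: AD0-AD7 now carries data
    (1, 0x20, 0x51),
    (0, 0x20, 0x06),
]
print([hex(address) for address in demultiplex(samples)])
```

After the falling edge the AD0 - AD7 lines are free to carry data, while the latched low byte together with A8 - A15 keeps the full address stable for the memory or I/O device.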

READY: used for communication with slow memory and I/O devices.

S0 and S1: the status of the system bus is defined by the S0 and S1 lines as follows:

S1  S0  Operation Specified
0   0   Halt
0   1   Memory or I/O WRITE
1   0   Memory or I/O READ
1   1   Instruction Fetch

There are ten lines associated with CPU and bus control:

• TRAP, RST7.5, RST6.5, RST5.5 and INTR: the interrupt lines.
• INTA: interrupt acknowledge line.
• RESET IN: the reset input signal to the 8085.
• RESET OUT: the 8085 generates the RESET OUT signal in response to the RESET IN signal; it can be used as a system reset signal.
• HOLD: used for a DMA request.
• HLDA: used for a DMA grant.

Clock and Utility Lines:

• X1 and X2: provided to connect a crystal or an RC network for generating the clock internal to the chip.
• SID: input line for serial data communication.
• SOD: output line for serial data communication.
• Vcc and Vss: power supply.

The block diagram of the Intel 8085 is shown in Figure 11.4.

Addressing Modes :

The 8085 has four different modes for addressing data stored in memory or in registers:

Direct: Bytes 2 and 3 of the instruction contain the exact memory address of the data item (the low-order bits of the address are in byte 2, the high-order bits in byte 3).

Register: The instruction specifies the register or register pair in which the data is located.

Register Indirect: The instruction specifies a register pair that contains the memory address where the data is located (the high-order bits of the address are in the first register of the pair and the low-order bits in the second).

Immediate: The instruction contains the data itself. This is either an 8-bit quantity or a 16-bit quantity (least significant byte first, most significant byte second).
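
The byte order used by the direct mode can be made concrete with a short decoder. This is a minimal sketch: the function name `direct_address` is hypothetical, 3AH is the 8085 opcode for LDA, and the address 2050H is an illustrative value.

```python
def direct_address(instruction: bytes) -> int:
    """Extract the 16-bit operand address from a 3-byte direct-mode instruction."""
    if len(instruction) != 3:
        raise ValueError("a direct-addressing instruction occupies 3 bytes")
    low, high = instruction[1], instruction[2]  # byte 2 = low order, byte 3 = high order
    return (high << 8) | low


# LDA 2050H is stored as: opcode, low address byte, high address byte.
lda = bytes([0x3A, 0x50, 0x20])
print(hex(direct_address(lda)))  # 0x2050
```

The same low-byte-first order applies to 16-bit immediate data, so `int.from_bytes(lda[1:], "little")` gives the same value.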

Unless directed by an interrupt or branch instruction, the execution of instructions proceeds through consecutively increasing memory locations.

A branch instruction can specify the address of the next instruction to be executed in one of two ways:

Direct: The branch instruction contains the address of the next instruction to be executed.

