
Pipeline and Vector Processing

The document discusses parallel processing and pipeline vector processing. It describes parallel processing as using simultaneous data processing tasks to increase computational speed. Pipeline processing breaks down sequential processes into sub-operations that are concurrently executed in dedicated segments. Vector processing performs the same operation on multiple data elements simultaneously. The document outlines various parallel processing techniques like pipelining arithmetic operations and instruction fetching to improve efficiency. It also discusses challenges like resource conflicts, data dependencies, and branch difficulties in pipeline implementations and how they can be addressed.

Uploaded by

Pavan Pulicherla
Copyright
© © All Rights Reserved

PIPELINE AND VECTOR PROCESSING

1. Parallel processing:
Parallel processing is a term used for a large class of techniques that
provide simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.

• The system may have two or more ALUs so that it can execute two or more instructions
at the same time.

• The system may have two or more processors operating concurrently.

• It can be achieved by having multiple functional units that perform the same or
different operations simultaneously.

Example of parallel Processing:

– Multiple Functional Unit:

Separate the execution unit into eight functional units operating in parallel.

Parallel processing can be classified in a variety of ways:

o Internal organization of the processor
o Interconnection structure between processors
o Flow of information through the system

Architectural Classification:

– Flynn's classification

» Based on the multiplicity of Instruction Streams and Data Streams

» Instruction Stream

• Sequence of Instructions read from memory

» Data Stream

• Operations performed on the data in the processor

• SISD (single instruction stream, single data stream) represents an organization containing
a single control unit, a processor unit and a memory unit. Instructions are executed
sequentially, and the system may or may not have internal parallel processing capabilities.

• SIMD (single instruction stream, multiple data stream) represents an organization that
includes many processing units under the supervision of a common control unit.

• MISD (multiple instruction stream, single data stream) is of only theoretical interest,
since no practical system has been constructed using this organization.

• MIMD (multiple instruction stream, multiple data stream) refers to a computer system
capable of processing several programs at the same time.

The main difference between a multicomputer system and a multiprocessor system is that the
multiprocessor system is controlled by one operating system, which provides interaction
between the processors, and all the components of the system cooperate in the solution of a
problem.

Parallel processing can be discussed under the following topics:

• Pipeline Processing

• Vector Processing

• Array Processors

2. PIPELINING

• Pipelining is a technique of decomposing a sequential process into suboperations, with
each suboperation being executed in a special dedicated segment that operates
concurrently with all other segments.

• Each segment performs partial processing dictated by the way the task is partitioned.

• The result obtained from each segment is transferred to the next segment.

• The final result is obtained when the data have passed through all segments.

• Suppose we have to perform the following task:

• Each suboperation is to be performed in a segment within a pipeline. Each segment
has one or two registers and a combinational circuit.

OPERATIONS IN EACH PIPELINE STAGE:

• General Structure of a 4-Segment Pipeline

• Space-Time Diagram

The following diagram shows 6 tasks T1 through T6 executed in 4 segments.
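The space-time diagram can be regenerated with a short script. This is only an illustrative sketch: the function name `space_time_diagram` and the `--` marker for an idle segment are assumptions made here, not part of the original notes. It relies on the fact that task Ti enters segment s at clock cycle i + s - 1.

```python
def space_time_diagram(k, n):
    """Return one row per segment; each row lists the task held in that
    segment at every clock cycle ("--" means the segment is idle)."""
    total_cycles = k + n - 1  # cycles needed for n tasks in k segments
    rows = []
    for segment in range(1, k + 1):
        row = []
        for clock in range(1, total_cycles + 1):
            # Task i is in segment s at clock i + s - 1, so i = clock - s + 1.
            task = clock - segment + 1
            row.append(f"T{task}" if 1 <= task <= n else "--")
        rows.append(row)
    return rows


# Hypothetical usage: 6 tasks through a 4-segment pipeline.
for segment, row in enumerate(space_time_diagram(k=4, n=6), start=1):
    print(f"Segment {segment}: " + " ".join(row))
```

Each row of the output is one segment and each column one clock cycle, so reading down a column shows which tasks are in flight at the same time.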

PIPELINE SPEEDUP:
Consider the case where a k-segment pipeline is used to execute n tasks.
• n = 6 in the previous example
• k = 4 in the previous example

• Pipelined Machine (k stages, n tasks)

o The first task T1 requires k clock cycles to complete its operation, since there
are k segments
o The remaining n-1 tasks require n-1 clock cycles, one task completing per cycle
o The n tasks therefore require k + (n-1) clock cycles (9 in the previous example)

• Conventional Machine (Non-Pipelined)

o Cycles to complete each task in a nonpipelined machine = k
o For n tasks, the number of cycles required is nk (24 in the previous example)

• Speedup (S)

• S = Nonpipeline time / Pipeline time

• For n tasks: S = nk/(k + n - 1)

• As n becomes much larger than k-1, k + n - 1 approaches n; therefore S approaches nk/n = k

PIPELINE AND MULTIPLE FUNCTION UNITS:

Example:

- 4-stage pipeline

- 100 tasks to be executed

- 1 task in the non-pipelined system takes 4 clock cycles

Pipelined System: k + n - 1 = 4 + 99 = 103 clock cycles

Non-Pipelined System: n*k = 100 * 4 = 400 clock cycles

Speedup: Sk = 400 / 103 = 3.88
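The arithmetic in this example can be checked with a small helper. The function name `pipeline_speedup` is an assumption introduced for illustration; the formula is the one given in these notes, and it assumes every segment takes one clock cycle.

```python
def pipeline_speedup(k, n):
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks,
    assuming each segment takes exactly one clock cycle."""
    pipelined_cycles = k + n - 1   # first task takes k cycles, the rest 1 each
    non_pipelined_cycles = n * k   # every task takes k cycles
    return non_pipelined_cycles / pipelined_cycles


print(round(pipeline_speedup(k=4, n=100), 2))   # the 100-task example
print(pipeline_speedup(k=4, n=1_000_000))       # approaches k as n grows
```

Trying larger values of n shows the speedup creeping towards k without ever reaching it.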

• Arithmetic Pipeline

• Instruction Pipeline

ARITHMETIC PIPELINE:
• Pipelined arithmetic units are usually found in very high speed computers.

• They are used to implement floating-point operations.

UNIT-V
• We will now discuss the pipeline unit for floating-point addition and
subtraction.

• The inputs to the floating-point adder pipeline are two normalized floating-point numbers.
• A and B are the mantissas, and a and b are the exponents.

• Floating-point addition and subtraction can be performed in four segments.

Floating-point adder:

[1] Compare the exponents

[2] Align the mantissas

[3] Add/subtract the mantissas

[4] Normalize the result

1) Compare exponents:

3 - 2 = 1

2) Align mantissas:
X = 0.9504 x 10^3
Y = 0.08200 x 10^3
3) Add mantissas:
Z = 1.0324 x 10^3
4) Normalize result:
Z = 0.10324 x 10^4
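The four segments can be traced in software. The sketch below is an assumption-laden illustration, not a model of real hardware: it uses decimal mantissas (as in the worked example) through Python's `decimal` module, and the function name `fp_add` is made up here. From the aligned values in the example, the inputs are taken to be X = 0.9504 x 10^3 and Y = 0.8200 x 10^2.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two decimal floating-point numbers (mantissa, exponent) in four steps."""
    # Segment 1: compare the exponents.
    diff = a_exp - b_exp

    # Segment 2: align the mantissa that belongs to the smaller exponent.
    if diff >= 0:
        b_mant = b_mant.scaleb(-diff)   # shift B right by diff digits
        exp = a_exp
    else:
        a_mant = a_mant.scaleb(diff)    # shift A right by -diff digits
        exp = b_exp

    # Segment 3: add the mantissas.
    z = a_mant + b_mant

    # Segment 4: normalize so that 0.1 <= |z| < 1 (or z == 0).
    while abs(z) >= 1:
        z = z.scaleb(-1)
        exp += 1
    while z != 0 and abs(z) < Decimal("0.1"):
        z = z.scaleb(1)
        exp -= 1
    return z, exp


mantissa, exponent = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(mantissa, exponent)
```

In a real pipeline each of the four steps would sit in its own segment, so a new pair of operands could enter segment 1 while earlier pairs are still being aligned, added and normalized.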

Instruction Pipeline:

Pipeline processing can occur not only in the data stream but in the instruction stream as
well.

An instruction pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments.

This causes the instruction fetch and execute segments to overlap and perform
simultaneous operations.

Four Segment CPU Pipeline:

• FI segment fetches the instruction.

• DA segment decodes the instruction and calculates the effective address.

• FO segment fetches the operand.

• EX segment executes the instruction.

INSTRUCTION CYCLE:

Pipeline processing can also occur in the instruction stream. An instruction pipeline reads
consecutive instructions from memory while previous instructions are being executed in
other segments.

Six Phases* in an Instruction Cycle

[1] Fetch an instruction from memory

[2] Decode the instruction

[3] Calculate the effective address of the operand

[4] Fetch the operands from memory

[5] Execute the operation

[6] Store the result in the proper place

* Some instructions skip some phases

* Effective address calculation can be done as part of the decoding phase

* Storage of the operation result into a register is done automatically in the execution
phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory

[2] DA: Decode the instruction and calculate the effective address of the operand

[3] FO: Fetch the operand

[4] EX: Execute the operation

Pipeline Conflicts: 3 major difficulties

1) Resource conflicts: two segments access memory at the same time. Most of
these conflicts can be resolved by using separate instruction and data memories.

2) Data dependency: an instruction depends on the result of a previous
instruction, but this result is not yet available.

Example: an instruction with register indirect mode cannot proceed to fetch the operand
if the previous instruction is still loading the address into the register.

3) Branch difficulties: branches and other instructions (interrupt, return, ...) that change
the value of the PC.

Handling Data Dependency:

This problem can be solved in the following ways:

• Hardware interlocks: A circuit that detects the conflict and delays the instruction
by enough clock cycles to resolve it.

• Operand forwarding: Special hardware detects the conflict and avoids it by routing
the data through a special path between pipeline segments.

• Delayed load: The compiler detects the data conflict and reorders the instructions
as necessary, inserting no-operation instructions to delay the loading of the
conflicting data.
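The delayed-load idea can be sketched in a few lines. This is a minimal illustration, not a real compiler pass: the tuple format (opcode, destination, sources), the function name `insert_delay_slots`, and the assumption that a loaded register becomes usable one instruction later are all made up for the example.

```python
NOP = ("NOP", None, ())


def insert_delay_slots(program):
    """Insert a NOP after every LOAD whose destination register is read by
    the very next instruction (assumes a single load delay slot)."""
    scheduled = []
    for index, instruction in enumerate(program):
        opcode, dest, _sources = instruction
        scheduled.append(instruction)
        if opcode == "LOAD" and index + 1 < len(program):
            next_sources = program[index + 1][2]
            if dest in next_sources:
                scheduled.append(NOP)  # delay the dependent instruction
    return scheduled


# Hypothetical four-instruction program: R3 <- M[x] + M[y], then store R3.
program = [
    ("LOAD", "R1", ()),
    ("LOAD", "R2", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", None, ("R3",)),
]
for instruction in insert_delay_slots(program):
    print(instruction)
```

A smarter scheduler would try to move an independent instruction into the delay slot instead of wasting the cycle on a no-operation; the sketch only shows the fallback case.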

Handling of Branch Instructions:

• Prefetch the target instruction.

• Branch target buffer (BTB) included in the fetch segment of the pipeline

• Branch prediction

• Delayed branch

RISC Pipeline:

The simplicity of the instruction set is utilized to implement an instruction pipeline using
a small number of suboperations, with each being executed in a single clock cycle.

Since all operations are performed in registers, there is no need for effective
address calculation.

Three Segment Instruction Pipeline:

• I: Instruction fetch

• A: ALU operation

• E: Execute instruction

Delayed Load:

Delayed Branch:

Let us consider the program having the following 5 instructions

Organization of Intel 8085 Micro-Processor:

The microprocessors available today come with a wide variety of capabilities and
architectural features. All of them, regardless of their diversity, are provided with at least the
following functional components, which form the central processing unit (CPU) of a classical
computer.

1. Register Section: A set of registers for temporary storage of instructions, data and
addresses of data.
2. Arithmetic and Logic Unit: Hardware for performing primitive arithmetic and logical
operations.
3. Interface Section: Input and output lines through which the microprocessor
communicates with the outside world.
4. Timing and Control Section: Hardware for coordinating and controlling the activities
of the various sections within the microprocessor and other devices connected to the
interface section.

The block diagram of the microprocessor along with the memory and Input/Output (I/O)
devices is shown in Figure 11.1.

Figure 11.1: Block diagram of Microprocessor with memory and I/O.

Intel Microprocessors:

The Intel 4004, introduced in 1971, was Intel's first 4-bit microprocessor. Intel then
introduced its first 8-bit microprocessor, the 8008, in 1972.

These microprocessors could not last long as general-purpose microprocessors due to their
design and performance limitations.

In 1974, Intel introduced the 8080, the first general-purpose 8-bit microprocessor, and this was
Intel's first step towards the development of advanced microprocessors.

After the 8080, Intel launched the 8085 with a few more features added to its
architecture, and it is considered to be the first functionally complete microprocessor.

The main limitations of the 8-bit microprocessors were their low speed, low memory
capacity, limited number of general-purpose registers and a less powerful instruction set.

To overcome these limitations, Intel moved from 8-bit microprocessors to 16-bit
microprocessors.

In the family of 16-bit microprocessors, Intel's 8086 was the first one, introduced in 1978.

The 8086 has a much more powerful instruction set along with architectural
developments that give substantial programming flexibility and improvement over the
8-bit microprocessors.

Microprocessor Intel 8085 :

The Intel 8085 is the first popular microprocessor used by many vendors. Due to its simple
architecture and organization, it makes the working principle of a microprocessor easy to
understand.

Registers in the Intel 8085:

The programmable registers of the 8085 are as follows -

• One 8-bit accumulator A.
• Six 8-bit general purpose registers (GPRs): B, C, D, E, H and L.
• The GPRs are also accessible as three 16-bit register pairs BC, DE and HL.
• There is a 16-bit program counter (PC), a 16-bit stack pointer (SP) and an 8-bit flag
register. Out of the 8 bits of the flag register, only 5 bits are in use.

The programmable registers of the 8085 are shown in Figure 11.2 -

Figure 11.2: Register Organisation of 8085

Apart from these programmable registers, some other registers are also available which are
not accessible to the programmer. These registers include -

• Instruction Register (IR).
• Memory address and data buffers (MAR & MDR).
o MAR: Memory Address Register.
o MDR: Memory Data Register.
• Temporary register for ALU use.

ALU of 8085 :

The 8-bit parallel ALU of the 8085 is capable of performing the following operations -

Arithmetic: Addition, Subtraction, Increment, Decrement, Compare.

Logical: AND, OR, EXOR, NOT, SHIFT / ROTATE, CLEAR.

Because of limited chip area, complex operations like multiplication and division are not
available in earlier processors like the 8085.

The operations are performed on binary 2's complement data.

The five flag bits give the status of the microprocessor after an ALU operation.

The Carry (C) flag bit indicates whether there is a carry out of the MSB (bit 7).

The Parity (P) flag bit is set if the parity of the accumulator is even.

The Auxiliary Carry (AC) flag bit indicates a carry out of bit 3 (the lower nibble), in the same
manner as the C flag indicates a carry out of bit 7.

The Zero (Z) flag bit is set if the content of the accumulator after any ALU operation is zero.

The Sign (S) flag bit is set to the value of bit 7 of the accumulator, which gives the sign of
its contents (0 for positive, 1 for negative).
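The flag descriptions can be made concrete with a small model of an 8-bit addition. This is a sketch that follows the definitions given here, not a cycle-accurate 8085 emulator; the function name `add8_flags` and the dictionary layout are assumptions made for the example.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) following the
    flag definitions in the text."""
    total = a + b
    result = total & 0xFF  # the accumulator keeps only the low 8 bits
    flags = {
        "C": total > 0xFF,                          # carry out of bit 7
        "AC": (a & 0x0F) + (b & 0x0F) > 0x0F,       # carry out of bit 3
        "Z": result == 0,                           # result is zero
        "S": bool(result & 0x80),                   # copy of bit 7
        "P": bin(result).count("1") % 2 == 0,       # even number of 1 bits
    }
    return result, flags


result, flags = add8_flags(0x3A, 0x48)
print(hex(result), flags)
```

Adding 0x9C and 0x64, for instance, wraps around to zero, which sets C, AC, Z and P together while leaving S clear.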

The Interface Section:

Microprocessor chips are equipped with a number of pins for communication with the outside
world. This is known as the system bus.
The interface lines of the Intel 8085 microprocessor are shown in Figure 11.3 -

Address and Data Bus

The AD0 - AD7 lines are used as the lower-order 8 bits of the address bus and as the data bus,
in a time-division multiplexed manner.

The A8 - A15 lines are used for the higher-order 8 bits of the address bus.

There are seven memory and I/O control lines -

RD: indicates a READ operation when the signal is LOW.

WR: indicates a WRITE operation when the signal is LOW.

IO/M: indicates memory access for LOW and I/O access for HIGH.

ALE: the address latch enable signal. This signal is HIGH when address information
is present on AD0-AD7. The falling edge of ALE can be used to latch the address into an
external buffer to demultiplex the address bus.

READY: the READY line is used for communication with slow memory and I/O devices.

S0 and S1: The status of the system bus is defined by the S0 and S1 lines as follows -

S1  S0  Operation Specified
0   0   Halt
0   1   Memory or I/O WRITE
1   0   Memory or I/O READ
1   1   Instruction Fetch
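The status table combines naturally with the IO/M line to name the current bus operation. The lookup below is only a sketch of that decoding; the function name `bus_operation` and the returned strings are assumptions introduced for illustration.

```python
# (S1, S0) -> operation, copied from the status table in the text.
STATUS = {
    (0, 0): "Halt",
    (0, 1): "WRITE",
    (1, 0): "READ",
    (1, 1): "Instruction Fetch",
}


def bus_operation(io_m, s1, s0):
    """Describe the bus operation from the IO/M, S1 and S0 lines.
    IO/M is LOW (0) for memory access and HIGH (1) for I/O access."""
    operation = STATUS[(s1, s0)]
    if operation in ("READ", "WRITE"):
        target = "I/O" if io_m else "Memory"
        return f"{target} {operation}"
    return operation


print(bus_operation(io_m=0, s1=1, s0=0))
print(bus_operation(io_m=1, s1=0, s1_unused=None) if False else bus_operation(1, 0, 1))
```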

There are ten lines associated with CPU and bus control -

• TRAP, RST7.5, RST6.5, RST5.5 and INTR: the interrupt lines.
• INTA: interrupt acknowledge line.
• RESET IN: the reset input signal to the 8085.
• RESET OUT: the 8085 generates the RESET OUT signal in response to the
RESET IN signal; it can be used as a system reset signal.
• HOLD: used for DMA request.
• HLDA: used for DMA grant.

Clock and Utility Lines:

• X1 and X2: provided to connect a crystal or an RC network for generating the clock
internal to the chip.
• SID: input line for serial data communication.
• SOD: output line for serial data communication.
• Vcc and Vss: power supply.

The block diagram of the Intel 8085 is shown in Figure 11.4 -

Addressing Modes :

The 8085 has four different modes for addressing data stored in memory or in registers -

Direct: Bytes 2 and 3 of the instruction contain the exact memory address of the data item
(the low-order bits of the address are in byte 2, the high-order bits in byte 3).

Register: The instruction specifies the register or register pair in which the data are located.

Register Indirect: The instruction specifies a register pair which contains the memory address
where the data are located (the high-order bits of the address are in the first register of the
pair and the low-order bits in the second).

Immediate: The instruction contains the data itself. This is either an 8-bit quantity or a
16-bit quantity (least significant byte first, most significant byte second).
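The byte ordering in the direct and register indirect modes can be illustrated as follows. The helper names are hypothetical, and the example bytes are chosen only to show the low-byte-first layout (0x3A is the 8085 LDA opcode, but the helper ignores the opcode entirely).

```python
def direct_address(instruction):
    """Address carried by a 3-byte direct-mode instruction:
    byte 2 holds the low-order bits, byte 3 the high-order bits."""
    _opcode, low, high = instruction
    return (high << 8) | low


def register_indirect_address(first, second):
    """Address held in a register pair such as HL: the first register
    holds the high-order bits, the second the low-order bits."""
    return (first << 8) | second


# Direct mode: bytes 3A 50 20 refer to memory location 2050H.
print(hex(direct_address(bytes([0x3A, 0x50, 0x20]))))

# Register indirect mode: H = 20H, L = 50H point at the same location.
print(hex(register_indirect_address(0x20, 0x50)))
```

A 16-bit immediate operand uses the same low-byte-first layout as the direct address, so the same shift-and-OR step recovers its value.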

Unless directed by an interrupt or branch instruction, the execution of instructions proceeds
through consecutively increasing memory locations.

A branch instruction can specify the address of the next instruction to be executed in one of
two ways -

Direct: The branch instruction contains the address of the next instruction to be executed.
