Pipeline and Vector Processing
1. Parallel processing:
Parallel processing is a term used for a large class of techniques that are used to
provide simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.
The system may have two or more ALUs so that it can execute two
or more instructions at the same time.
One possible approach is to separate the execution unit into eight functional units operating in parallel.
There are a variety of ways in which parallel processing can be classified
– Flynn's classification
» Instruction Stream
» Data Stream
SISD represents an organization containing a single control unit, a processor unit and a
memory unit. Instructions are executed sequentially, and the system may or may not have
internal parallel processing capabilities.
The main difference between a multicomputer system and a multiprocessor system is that the
multiprocessor system is controlled by one operating system that provides interaction
between processors, and all the components of the system cooperate in the solution of a
problem.
Pipeline Processing
Vector Processing
Array Processors
2. PIPELINING
• The final result is obtained when data have passed through all segments.
• Space-Time Diagram
PIPELINE SPEEDUP:
Consider the case where a k-segment pipeline is used to execute n tasks.
n = 6 in the previous example
k = 4 in the previous example
o The first task t1 requires k clock cycles to complete its operation since there
are k segments
o The remaining n-1 tasks require n-1 clock cycles
o The n tasks therefore require k+(n-1) clock cycles in total (9 in the previous example)
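The cycle count above can be checked with a short sketch. The function names are illustrative only. The speedup ratio uses the usual textbook definition (nonpipelined time n*tn divided by pipelined time (k+n-1)*tp), which is an assumption here because the notes do not spell the formula out.

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles needed for n tasks on a k-segment pipeline: k + (n - 1)."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    return k + (n - 1)


def speedup(k: int, n: int, t_n: float, t_p: float) -> float:
    """Assumed textbook ratio: nonpipelined time over pipelined time.

    t_n is the time per task without a pipeline, t_p is the pipeline clock period.
    """
    return (n * t_n) / (pipeline_cycles(k, n) * t_p)


def space_time_diagram(k: int, n: int) -> list[list[str]]:
    """One row per segment, one column per clock cycle.

    Task i (1-based) occupies segment s (1-based) during cycle i + s - 1.
    """
    total = pipeline_cycles(k, n)
    rows = []
    for seg in range(1, k + 1):
        row = []
        for cycle in range(1, total + 1):
            task = cycle - seg + 1
            row.append(f"T{task}" if 1 <= task <= n else "--")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(pipeline_cycles(4, 6))
    for row in space_time_diagram(4, 6):
        print(" ".join(row))
```

With k = 4 and n = 6 the diagram has 9 columns, matching the count given above.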
• Speedup (S)
Example:
- 4-stage pipeline
• Arithmetic Pipeline
• Instruction Pipeline
ARITHMETIC PIPELINE:
Pipeline arithmetic units are usually found in very high speed computers.
The inputs to the floating-point adder pipeline are two normalized floating-point numbers.
A and B are the mantissas and a and b are the exponents.
1) Compare exponents:
   3 - 2 = 1
2) Align mantissas:
   X = 0.9504 x 10^3
   Y = 0.08200 x 10^3
3) Add mantissas:
   Z = 1.0324 x 10^3
4) Normalize result:
   Z = 0.10324 x 10^4
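The four steps can be traced with a minimal sketch. It assumes decimal mantissas held as Python `Decimal` values and a hypothetical function name `fp_add`. The inputs 0.9504 x 10^3 and 0.8200 x 10^2 are inferred from the exponent difference and the aligned values in the example. Real hardware works on binary mantissas in separate pipeline segments rather than in one function.

```python
from decimal import Decimal


def fp_add(mant_a: Decimal, exp_a: int, mant_b: Decimal, exp_b: int):
    """Add two normalized decimal floating-point numbers in four steps."""
    # Segment 1: compare exponents by subtraction.
    diff = exp_a - exp_b

    # Segment 2: align the mantissa that belongs to the smaller exponent
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        mant_b = mant_b.scaleb(-diff)
        exp = exp_a
    else:
        mant_a = mant_a.scaleb(diff)
        exp = exp_b

    # Segment 3: add the aligned mantissas.
    mant = mant_a + mant_b

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


if __name__ == "__main__":
    m, e = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
    print(m, e)
```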
UNIT-V
Instruction Pipeline:
Pipeline processing can occur not only in the data stream but in the instruction stream as
well.
An instruction pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments.
This causes the instruction fetch and execute segments to overlap and perform
simultaneous operations.
INSTRUCTION CYCLE:
an Instruction Cycle
[3] Calculate the effective address of the operand
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
[2] DA: Decode the instruction and calculate the effective address of the operand
Pipeline Conflicts:
1) Resource conflicts: memory access by two segments at the same time. Most of
these conflicts can be resolved by using separate instruction and data memories.
Example: an instruction with register indirect mode cannot proceed to fetch the operand
if the previous instruction is loading the address into the register.
3) Branch difficulties: arise from branch and other instructions (interrupt, ret, ..) that
change the value of PC.
Hardware interlocks: A hardware interlock is a circuit that detects a conflict situation and
delays the instruction by enough cycles to resolve the conflict.
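The interlock idea can be illustrated in software with a rough sketch. It assumes a hypothetical instruction format of `(opcode, destination, sources)` tuples, interlocks only on a load followed immediately by a use of the loaded register, and takes the stall length as a parameter because the notes do not give a cycle count.

```python
def insert_interlock_stalls(program, stall_cycles=1):
    """Return the program with STALL entries inserted after load-use conflicts.

    program: list of (opcode, dest_register, source_registers) tuples.
    A conflict is assumed when an instruction reads the register that the
    immediately preceding LOAD is still writing.
    """
    scheduled = []
    pending_load_dest = None
    for opcode, dest, sources in program:
        if pending_load_dest is not None and pending_load_dest in sources:
            # Conflict detected: delay this instruction.
            scheduled.extend([("STALL", None, ())] * stall_cycles)
        scheduled.append((opcode, dest, sources))
        pending_load_dest = dest if opcode == "LOAD" else None
    return scheduled


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ("M1",)),
        ("ADD", "R3", ("R1", "R2")),  # reads R1 right after the load
        ("SUB", "R4", ("R3", "R2")),
    ]
    for entry in insert_interlock_stalls(program):
        print(entry[0])
```

A hardware interlock makes this decision with comparators on the register fields of adjacent pipeline segments; the list rewriting here only mirrors the resulting timing.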
Since all operations are performed in registers, there is no need for an effective
address calculation.
A: ALU Operation
E: Execute
Instruction Delayed Load:
Delayed Branch:
Organization of Intel 8085 Micro-Processor:
The microprocessors that are available today come with a wide variety of capabilities and
architectural features. All of them, regardless of their diversity, are provided with at least the
following functional components, which form the central processing unit (CPU) of a classical
computer.
1. Register Section: A set of registers for temporary storage of instructions, data and
addresses of data.
2. Arithmetic and Logic Unit: Hardware for performing primitive arithmetic and logical
operations.
3. Interface Section: Input and output lines through which the microprocessor
communicates with the outside world.
4. Timing and Control Section: Hardware for coordinating and controlling the activities
of the various sections within the microprocessor and other devices connected to the
interface section.
The block diagram of the microprocessor along with the memory and Input/Output (I/O)
devices is shown in Figure 11.1.
Intel Microprocessors:
The Intel 4004 was the first 4-bit microprocessor, introduced by Intel in 1971. After that, Intel
introduced its first 8-bit microprocessor, the 8008, in 1972.
These microprocessors could not last long as general-purpose microprocessors due to their
design and performance limitations.
In 1974, Intel introduced the first general-purpose 8-bit microprocessor, the 8080, and this was
Intel's first step towards the development of advanced microprocessors.
After the 8080, Intel launched the 8085 microprocessor with a few more features added to its
architecture, and it is considered to be the first functionally complete microprocessor.
The main limitations of the 8-bit microprocessors were their low speed, low memory
capacity, limited number of general-purpose registers and a less powerful instruction set.
In the family of 16-bit microprocessors, Intel's 8086 was the first one, introduced in 1978.
The 8086 microprocessor has a much more powerful instruction set along with architectural
developments, which imparted substantial programming flexibility and improvement over the
8-bit microprocessors.
The Intel 8085 was the first popular microprocessor used by many vendors. Because of its simple
architecture and organization, it makes the working principle of a microprocessor easy to
understand.
The programmable registers of the 8085 are shown in Figure 11.2.
Figure 11.2: Register Organisation of 8085
Apart from these programmable registers, some other registers are also available which are
not accessible to the programmer. These registers include:
Instruction Register (IR).
Memory address and data buffers (MAR & MDR).
o MAR: Memory Address Register.
o MDR: Memory Data Register.
Temporary register for ALU use.
ALU of 8085 :
The 8-bit parallel ALU of 8085 is capable of performing the following operations –
Because of limited chip area, complex operations like multiplication and division are not
available in earlier processors like the 8085.
The five flag bits give the status of the microprocessor after an ALU operation.
The Carry (C) flag bit indicates whether there is a carry out of the MSB.
The Parity (P) flag bit is set if the parity of the accumulator is even.
The Auxiliary Carry (AC) flag bit indicates a carry out of bit 3 (the lower nibble), in the same
manner as the C flag indicates a carry out of bit 7.
The Zero (Z) flag bit is set if the content of the accumulator after any ALU operation is zero.
The Sign (S) flag bit is set to the value of bit 7 of the accumulator, reflecting the sign of the
contents of the accumulator (positive or negative).
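As an illustration of the five flag definitions, the following sketch computes them for an 8-bit addition. The function name `add8_flags` and the dictionary layout are assumptions made for this example; only addition is modeled, not subtraction or logical operations.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and derive the five flags described above."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF  # value left in the accumulator
    flags = {
        "S": (result >> 7) & 1,                      # copy of bit 7
        "Z": int(result == 0),                       # result is zero
        "AC": int((a & 0x0F) + (b & 0x0F) > 0x0F),   # carry out of bit 3
        "P": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "C": int(total > 0xFF),                      # carry out of bit 7
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0x3C, 0xC4))
    print(add8_flags(0x7F, 0x01))
```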
Microprocessor chips are equipped with a number of pins for communication with the outside
world. This is known as the system bus.
The interface lines of the Intel 8085 microprocessor are shown in Figure 11.3.
The AD0 - AD7 lines are used as the lower-order 8-bit address bus and the data bus, in a time-division
multiplexed manner.
The A8 - A15 lines are used for the higher-order 8 bits of the address bus.
IO/M: indicates memory access when LOW and I/O access when HIGH.
ALE: ALE is the address latch enable signal. This signal is HIGH when address information
is present on AD0-AD7. The falling edge of ALE can be used to latch the address into an
external buffer to de-multiplex the address bus.
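The de-multiplexing step can be modeled with a small sketch. It assumes the external buffer behaves as a transparent latch (it follows AD0-AD7 while ALE is HIGH and holds the last value once ALE falls); the class and function names are hypothetical.

```python
class AddressLatch:
    """Transparent latch: follows AD0-AD7 while ALE is HIGH, holds when LOW."""

    def __init__(self) -> None:
        self.low_byte = 0

    def sample(self, ale: int, ad_bus: int) -> None:
        # While ALE is HIGH the bus carries address bits A0-A7.
        if ale:
            self.low_byte = ad_bus & 0xFF
        # When ALE is LOW the bus carries data, so the latch keeps its value.


def full_address(high_byte: int, latch: AddressLatch) -> int:
    """Combine A8-A15 with the latched A0-A7 into a 16-bit address."""
    return ((high_byte & 0xFF) << 8) | latch.low_byte


if __name__ == "__main__":
    latch = AddressLatch()
    latch.sample(ale=1, ad_bus=0x50)  # address phase
    latch.sample(ale=0, ad_bus=0x3E)  # data phase, latch holds 0x50
    print(hex(full_address(0x20, latch)))
```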
READY: The READY line is used for communication with slow memory and I/O devices.
S0 and S1: The status of the system bus is defined by the S0 and S1 lines as follows:
S1   S0   Operation Specified
0    0    Halt
0    1    Memory or I/O WRITE
1    0    Memory or I/O READ
1    1    Instruction Fetch
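The status table maps directly onto a lookup. This is a minimal sketch with a hypothetical helper name `bus_status`; the strings are taken from the table.

```python
# Keys are (S1, S0) pairs, values are the operations from the table.
BUS_STATUS = {
    (0, 0): "Halt",
    (0, 1): "Memory or I/O WRITE",
    (1, 0): "Memory or I/O READ",
    (1, 1): "Instruction Fetch",
}


def bus_status(s1: int, s0: int) -> str:
    """Return the bus operation indicated by the S1 and S0 lines."""
    return BUS_STATUS[(s1 & 1, s0 & 1)]


if __name__ == "__main__":
    for s1 in (0, 1):
        for s0 in (0, 1):
            print(s1, s0, bus_status(s1, s0))
```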
There are ten lines associated with CPU and bus control:
TRAP, RST7.5, RST6.5, RST5.5 and INTR are the interrupt lines.
INTA: Interrupt acknowledge line.
RESET IN: This is the reset input signal to the 8085.
RESET OUT: The 8085 generates the RESET OUT signal in response to the
RESET IN signal; it can be used as a system reset signal.
HOLD: The HOLD signal is used for DMA request.
HLDA: The HLDA signal is used for DMA grant.
Clock and Utility Lines:
The block diagram of the Intel 8085 is shown in Figure 11.4.
Addressing Modes:
The 8085 has four different modes for addressing data stored in memory or in registers:
Direct: Bytes 2 and 3 of the instruction contain the exact memory address of the data item
(the low-order bits of the address are in byte 2, the high-order bits in byte 3).
Register: The instruction specifies the register or register pair in which the data are located.
Register Indirect: The instruction specifies a register pair which contains the memory address
where the data are located (the high-order bits of the address are in the first register of the
pair and the low-order bits in the second).
Immediate: The instruction contains the data itself. This is either an 8-bit quantity or a
16-bit quantity (least significant byte first, most significant byte second).
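The byte ordering in these modes can be made concrete with a short sketch. The helper names are hypothetical; the example encodings (3A 50 20 for LDA 2050H and 21 50 20 for LXI H,2050H) are standard 8085 opcodes that do not appear in the notes themselves.

```python
def direct_address(instruction: bytes) -> int:
    """Direct mode: byte 2 holds the low-order bits, byte 3 the high-order bits."""
    return instruction[1] | (instruction[2] << 8)


def immediate16(instruction: bytes) -> int:
    """16-bit immediate: least significant byte first, most significant second."""
    return instruction[1] | (instruction[2] << 8)


def register_pair_address(first_reg: int, second_reg: int) -> int:
    """Register indirect: first register of the pair is high, second is low."""
    return ((first_reg & 0xFF) << 8) | (second_reg & 0xFF)


if __name__ == "__main__":
    lda = bytes([0x3A, 0x50, 0x20])    # LDA 2050H
    lxi_h = bytes([0x21, 0x50, 0x20])  # LXI H,2050H
    print(hex(direct_address(lda)))
    print(hex(immediate16(lxi_h)))
    print(hex(register_pair_address(0x20, 0x50)))  # H = 20H, L = 50H
```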
A branch instruction can specify the address of the next instruction to be executed in one of
two ways:
Direct: The branch instruction contains the address of the next instruction to be executed.