COA Unit - V Notes
UNIT- V
Pipelining: Basic concepts of pipelining, Arithmetic pipeline, Instruction pipeline, Instruction Hazards.
Parallel Processors: Introduction to parallel processors, Multiprocessor, Interconnection structures and
Cache coherency.
---------------------------------------------------------------------------------------------------------------------------------
Pipelining is the process of feeding instructions to the processor through a pipeline, which allows
instructions to be stored and executed in an orderly, overlapped fashion. It is also known as pipeline processing.
Pipelining is a technique in which the execution of multiple instructions is overlapped. The pipeline is divided
into stages, and these stages are connected to one another to form a pipe-like structure. Instructions enter
at one end and exit at the other.
In a pipeline system, each segment consists of an input register followed by a combinational circuit. The
register is used to hold data and the combinational circuit performs operations on it. The output of the
combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line set up in a factory. For example, in a car manufacturing
plant, huge assembly lines are set up with robotic arms at each point to perform a certain task, after which
the car moves on to the next arm.
Types of Pipeline
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are used to implement floating-point
operations, multiplication of fixed-point numbers, and similar computations encountered in scientific
problems.
To understand the concepts of arithmetic pipeline in a more convenient way, let us consider an example of
a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers defined
as:
X = A * 2^a = 0.9504 * 10^3
Y = B * 2^b = 0.8200 * 10^2
Where A and B are two fractions that represent the mantissa and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four segments. Each
segment contains the corresponding sub operation to be performed in the given pipeline. The sub
operations that are shown in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
We will discuss each sub-operation in more detail later in this section.
The following block diagram represents the sub operations performed in each segment of the pipeline.
Note: Registers are placed after each suboperation to store the intermediate results.
The exponents are compared by subtracting them to determine their difference. The larger exponent is
chosen as the exponent of the result.
The difference of the exponents, i.e., 3 - 2 = 1 determines how many times the mantissa associated with the
smaller exponent must be shifted to the right.
The mantissa associated with the smaller exponent is shifted according to the difference of exponents
determined in segment one.
X = 0.9504 * 10^3
Y = 0.08200 * 10^3
3. Add the mantissas:
Z = X + Y = 1.0324 * 10^3
4. Normalize the result:
Z = 0.10324 * 10^4
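The four sub-operations can also be sketched in code. The following Python fragment is only an illustration (it is not part of the original notes); the function names and the decimal mantissa/exponent representation are assumptions chosen to mirror the worked example above.

# Minimal sketch of the four sub-operations of a floating-point
# addition pipeline, using decimal mantissa/exponent pairs as in
# the worked example (0.9504 * 10^3 + 0.8200 * 10^2).

def compare_exponents(a_exp, b_exp):
    # Segment 1: the larger exponent becomes the result exponent;
    # the difference tells how far to shift the smaller mantissa.
    diff = a_exp - b_exp
    return max(a_exp, b_exp), diff

def align_mantissas(a_man, b_man, diff):
    # Segment 2: shift the mantissa of the smaller exponent to the right.
    if diff > 0:
        b_man = b_man / (10 ** diff)
    elif diff < 0:
        a_man = a_man / (10 ** (-diff))
    return a_man, b_man

def add_mantissas(a_man, b_man):
    # Segment 3: add (or subtract) the aligned mantissas.
    return a_man + b_man

def normalize(man, exp):
    # Segment 4: renormalize so the mantissa is a fraction below 1.
    while abs(man) >= 1.0:
        man /= 10.0
        exp += 1
    return man, exp

# X = 0.9504 * 10^3, Y = 0.8200 * 10^2
exp, diff = compare_exponents(3, 2)
a_man, b_man = align_mantissas(0.9504, 0.8200, diff)
man = add_mantissas(a_man, b_man)          # 1.0324
man, exp = normalize(man, exp)             # 0.10324 * 10^4
print(man, exp)                            # approx 0.10324 4

Running it reproduces the result of the example: the aligned mantissas 0.9504 and 0.0820 add to 1.0324, which normalizes to 0.10324 * 10^4.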
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as well.
Most digital computers with complex instructions require an instruction pipeline to carry out operations
such as fetching, decoding and executing instructions.
In general, the computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Each step is executed in a particular segment, and there are times when different segments may take
different times to operate on the incoming information. Moreover, there are times when two or more
segments may require memory access at the same time, causing one segment to wait until another is
finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is divided into
segments of equal duration. One of the most common examples of this type of organization is a Four-
segment instruction pipeline.
A four-segment instruction pipeline combines two or more of these steps into a single segment. For
instance, the decoding of the instruction can be combined with the calculation of the effective address into
one segment.
The following block diagram shows a typical example of a four-segment instruction pipeline. The
instruction cycle is completed in four segments.
Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and eventually, the effective
address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
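To make the overlap concrete, here is a small Python sketch (an illustration, not part of the notes) that prints which segment each instruction occupies in every clock cycle of an ideal four-segment pipeline, assuming no hazards; the segment names FI, DA, FO and EX follow the four segments described above.

# Ideal four-segment instruction pipeline: in cycle c, instruction i
# (0-based) occupies segment (c - i) if 0 <= c - i < 4.

SEGMENTS = ["FI", "DA", "FO", "EX"]   # fetch, decode/address, operand fetch, execute

def schedule(num_instructions):
    total_cycles = num_instructions + len(SEGMENTS) - 1
    for cycle in range(total_cycles):
        row = []
        for i in range(num_instructions):
            stage = cycle - i
            row.append(SEGMENTS[stage] if 0 <= stage < len(SEGMENTS) else "--")
        print(f"cycle {cycle + 1}: " + "  ".join(row))

schedule(6)   # six instructions finish in 6 + 4 - 1 = 9 cycles instead of 24

With k = 4 segments and n = 6 instructions, the pipeline completes in k + n - 1 = 9 cycles instead of the n * k = 24 cycles a non-pipelined unit would need.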
Pipeline Conflicts
There are some factors that cause the pipeline to deviate from its normal performance. Some of these
factors are given below:
1. Timing Variations
All stages cannot take the same amount of time. This problem generally occurs in instruction processing,
where different instructions have different operand requirements and thus different processing times.
2. Data Hazards
When several instructions are in partial execution and they reference the same data, a problem arises. We
must ensure that the next instruction does not attempt to read data before the current instruction has
written it, because this would lead to incorrect results.
3. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the present
instruction is a conditional branch, and its result will lead us to the next instruction, then the next
instruction may not be known until the current one is processed.
4. Interrupts
Interrupts inject unwanted instructions into the instruction stream and thus affect the execution of instructions.
5. Data Dependency
It arises when an instruction depends upon the result of a previous instruction but this result is not yet
available.
Advantages of Pipelining
The cycle time of the processor is reduced and instruction throughput increases. Several instructions are in
progress at the same time, so the ALU and other functional units are utilized more efficiently.
Disadvantages of Pipelining
The design of a pipelined processor is more complex and costly. The latency of an individual instruction may
increase, and hazards (structural, data and control) can stall the pipeline and reduce the ideal speed-up.
There are mainly three types of dependencies possible in a pipelined processor. These are :
1) Structural Dependency
2) Control Dependency
3) Data Dependency
Structural dependency
This dependency arises due to the resource conflict in the pipeline. A resource conflict is a situation when
more than one instruction tries to access the same resource in the same cycle. A resource can be a register,
memory, or ALU.
Example:
Instruction / Cycle   1         2         3         4         5
I1                    IF(Mem)   ID        EX        Mem
I2                              IF(Mem)   ID        EX
I3                                        IF(Mem)   ID        EX
I4                                                  IF(Mem)   ID
In the above scenario, in cycle 4, instructions I1 and I4 are trying to access the same resource (memory),
which introduces a resource conflict.
To avoid this problem, we have to make the instruction wait until the required resource (memory in our
case) becomes available. This wait introduces stalls in the pipeline, as shown below:
Cycle                 1         2         3         4        5        6        7         8
I1                    IF(Mem)   ID        EX        Mem      WB
I2                              IF(Mem)   ID        EX       Mem      WB
I3                                        IF(Mem)   ID       EX       Mem      WB
I4                                                  –        –        –        IF(Mem)
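The stall insertion shown in the two tables above can be modelled with a short Python sketch. This is only an illustration under the simplifying assumption that a single memory port is shared by the IF and MEM stages; the stage list and helper names are not from the notes.

# Sketch: single memory port shared by the IF and MEM stages.
# An instruction's fetch is delayed (stall) whenever an earlier
# instruction occupies memory in the same cycle.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n):
    start = [0] * n                      # cycle (0-based) in which each instruction's IF begins
    for i in range(1, n):
        start[i] = start[i - 1] + 1      # ideal case: one cycle after the previous fetch
        # memory-port conflict: IF of instruction i clashes with MEM of an earlier instruction j
        while any(start[j] + STAGES.index("MEM") == start[i] for j in range(i)):
            start[i] += 1                # insert a stall cycle
    for i, s in enumerate(start):
        print(f"I{i + 1}: " + " ".join(f"{stg}@{s + k + 1}" for k, stg in enumerate(STAGES)))

schedule(4)

For four instructions this prints I4's fetch starting at cycle 7, matching the stalled table above.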
Control dependency
This dependency arises because of control-transfer instructions such as BRANCH, CALL and JMP. The
processor does not know the target address of such an instruction until a later pipeline stage, so the
instructions that sequentially follow it are fetched in the meantime even though they may not be needed.
For example, let I2 be a jump whose target address is 250 and let BI1 be the instruction stored at that
address; the expected execution order is I1 → I2 → BI1, but the pipeline behaves as shown below.
NOTE: Generally, the target address of the JMP instruction is known only after the ID stage.
Instruction / Cycle   1    2    3             4     5     6
I1                    IF   ID   EX            MEM   WB
I2                         IF   ID (PC:250)   EX    Mem   WB
I3                              IF            ID    EX    Mem
BI1                                           IF    ID    EX
To correct the above problem, we need to stop instruction fetch until the target address of the branch
instruction is known. This can be implemented by introducing a delay slot until the target address is available.
Instruction / Cycle   1    2    3             4     5     6
I1                    IF   ID   EX            MEM   WB
I2                         IF   ID (PC:250)   EX    Mem   WB
Delay                 –    –    –             –     –     –
BI1                                           IF    ID    EX
As the delay slot performs no operation, this output sequence is equal to the expected output sequence. But
this slot introduces a stall in the pipeline.
Solution for control dependency: Branch prediction is the method through which stalls due to control
dependency can be eliminated. In this method, a prediction about whether the branch will be taken is made
in the first stage itself; when the prediction is correct, the branch penalty is zero.
Branch penalty : The number of stalls introduced during the branch operations in the pipelined processor
is known as branch penalty.
NOTE: As we have seen, the target address is available after the ID stage, so the number of stalls introduced
in the pipeline is 1. Had the branch target address been available only after the ALU stage, there would
have been 2 stalls. In general, if the target address becomes available after the k-th stage, then there will be
(k – 1) stalls in the pipeline.
Total number of stalls introduced in the pipeline due to branch instructions = Branch frequency * Branch
Penalty
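As a numerical illustration of this relation (the figures below are assumed, not from the notes), a short Python calculation:

# Sketch: stall cycles contributed by branches, using the relation
# stated above (stalls = branch frequency * branch penalty).

instructions   = 1_000_000
branch_freq    = 0.20        # assume 20% of instructions are branches
branch_penalty = 1           # target known after the ID stage -> 1 stall

branch_stalls = instructions * branch_freq * branch_penalty
ideal_cycles  = instructions          # 1 instruction per cycle once the pipe is full
effective_cpi = (ideal_cycles + branch_stalls) / instructions
print(branch_stalls, effective_cpi)   # 200000.0 stalls, CPI = 1.2

So with a 20% branch frequency and a 1-cycle penalty, the effective CPI grows from 1 to 1.2.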
Flow (data) dependence: O(S1) ∩ I(S2) ≠ ∅, S1 → S2, and S2 reads something previously written by S1.
Anti-dependence: I(S1) ∩ O(S2) ≠ ∅, S1 → S2, and S1 reads something before S2 overwrites it.
Output dependence: O(S1) ∩ O(S2) ≠ ∅, S1 → S2, and both write the same memory location.
Consider, for example, two instructions in which the second uses the result produced by the first, say
I1: ADD R1, R2, R3 followed by I2: SUB R4, R1, R2. When these instructions are executed in a pipelined
processor, a data-dependency condition occurs: I2 tries to read the data before I1 writes it, and therefore
I2 incorrectly gets the old value.
Instruction / Cycle   1    2    3               4
I1                    IF   ID   EX              DM
I2                         IF   ID (Old value)  EX
Operand Forwarding: In operand forwarding, we use the interface registers present between the stages to
hold the intermediate output, so that a dependent instruction can read the new value directly from the
interface register.
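A minimal Python sketch of the idea (an illustration only; the latch dictionary stands in for the EX/MEM interface register, and the instruction pair is assumed):

# Sketch of operand forwarding: the EX/MEM interface register of the
# producing instruction feeds the dependent instruction's EX stage
# directly, instead of waiting for the value to be written back.

regs = {"R1": 5, "R2": 7, "R3": 0, "R4": 0}
ex_mem_latch = {}                      # interface register between EX and MEM

def execute(dest, src1, src2):
    # read a source from the forwarding latch if a newer value is waiting there
    a = ex_mem_latch.get(src1, regs[src1])
    b = ex_mem_latch.get(src2, regs[src2])
    ex_mem_latch.clear()
    ex_mem_latch[dest] = a + b         # result is available for the next instruction
    return ex_mem_latch[dest]

def write_back(dest, value):
    regs[dest] = value                 # the register file is updated later

r3 = execute("R3", "R1", "R2")         # I1: R3 <- R1 + R2  (= 12)
r4 = execute("R4", "R3", "R1")         # I2: R4 <- R3 + R1, R3 forwarded (= 17)
write_back("R3", r3)
write_back("R4", r4)
print(regs)                            # {'R1': 5, 'R2': 7, 'R3': 12, 'R4': 17}

Because I2 reads R3 from the interface latch rather than from the register file, it sees the new value 12 even though the write-back of I1 has not happened yet.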
Data Hazards
Data hazards occur when instructions that exhibit data dependence modify data in different stages of a
pipeline. Hazards cause delays in the pipeline. There are mainly three types of data hazards:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]
RAW hazard occurs when instruction J tries to read data before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R4 <- R2 + R3
WAR hazard occurs when instruction J tries to write data before instruction I reads it.
Eg:
I: R2 <- R1 + R3
J: R3 <- R4 + R5
WAW hazard occurs when instruction J tries to write output before instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5
WAR and WAW hazards occur during the out-of-order execution of the instructions.
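The three hazard types can be summarized in a small classification routine. This Python sketch is only an illustration (the function and the register sets are assumptions); it applies the definitions above to the example instruction pairs:

# Sketch: classify the hazard between two instructions I (earlier)
# and J (later), given each instruction's destination and source registers.

def classify(i_dest, i_srcs, j_dest, j_srcs):
    hazards = []
    if i_dest in j_srcs:
        hazards.append("RAW")          # J reads what I writes (true/flow dependency)
    if j_dest in i_srcs:
        hazards.append("WAR")          # J writes what I reads (anti-dependency)
    if i_dest == j_dest:
        hazards.append("WAW")          # both write the same register (output dependency)
    return hazards or ["none"]

# I: R2 <- R1 + R3, J: R4 <- R2 + R3   -> RAW on R2
print(classify("R2", {"R1", "R3"}, "R4", {"R2", "R3"}))
# I: R2 <- R1 + R3, J: R3 <- R4 + R5   -> WAR on R3
print(classify("R2", {"R1", "R3"}, "R3", {"R4", "R5"}))
# I: R2 <- R1 + R3, J: R2 <- R4 + R5   -> WAW on R2
print(classify("R2", {"R1", "R3"}, "R2", {"R4", "R5"}))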
Parallel Processors:
1.Multiprocessor:
A multiprocessor is a computer system in which two or more central processing units (CPUs) share full access
to a common RAM. The main objective of using a multiprocessor is to boost the system's execution speed;
other objectives are fault tolerance and application matching.
There are two types of multiprocessors: shared-memory multiprocessors and distributed-memory
multiprocessors. In a shared-memory multiprocessor, all the CPUs share the common memory, whereas in a
distributed-memory multiprocessor, every CPU has its own private memory.
Applications of Multiprocessor –
Enhanced performance.
Multiple applications.
Multi-tasking inside an application.
High throughput and responsiveness.
Hardware sharing among CPUs.
M.J. Flynn proposed a classification for the organization of a computer system by the number of
instructions and data items that are manipulated simultaneously.
The sequence of instructions read from memory constitutes an instruction stream, and the operations
performed on the data in the processor constitute a data stream.
Flynn's classification divides computers into four major groups: SISD, SIMD, MISD and MIMD, described below.
Parallel computing is a form of computing in which jobs are broken into discrete parts that can be executed
concurrently. Each part is further broken down into a series of instructions. Instructions from each part
execute simultaneously on different CPUs. Parallel systems deal with the simultaneous use of multiple
computer resources that can include a single computer with multiple processors, a number of computers
connected by a network to form a parallel processing cluster or a combination of both.
Parallel systems are more difficult to program than computers with a single processor because the
architecture of parallel computers varies accordingly and the processes of multiple CPUs must be
coordinated and synchronized.
CPUs are at the crux of parallel processing. Based on the number of instruction and data streams that can
be processed simultaneously, computing systems are classified into four major categories:
Flynn’s classification –
1. Single-instruction, single-data (SISD) systems:
An SISD computing system is a uniprocessor machine which is capable of executing a single
instruction, operating on a single data stream. In SISD, machine instructions are processed in a
sequential manner and computers adopting this model are popularly called sequential computers.
Most conventional computers have SISD architecture. All the instructions and data to be processed
have to be stored in primary memory.
The speed of the processing element in the SISD model is limited (dependent) by the rate at which
the computer can transfer information internally. Dominant representative SISD systems are the IBM
PC and workstations.
2. Single-instruction, multiple-data (SIMD) systems:
An SIMD system is a multiprocessor machine capable of executing the same instruction on all the
CPUs but operating on different data streams. Machines based on the SIMD model are well suited to
scientific computing, since it involves lots of vector and matrix operations. The data elements of
vectors are organized into multiple sets (N sets for an N-PE system) so that the information can be
passed to all the processing elements (PEs), and each PE processes one data set. (A small
programming-model sketch contrasting SISD and SIMD is given after this list.)
3. Multiple-instruction, single-data (MISD) systems:
An MISD computing system is a multiprocessor machine capable of executing different instructions
on different PEs, but all of them operating on the same data set.
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same data set. Machines built using the MISD
model are not useful in most applications; a few machines have been built, but none of them are
available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems:
An MIMD system is a multiprocessor machine which is capable of executing multiple instructions
on multiple data sets. Each PE in the MIMD model has separate instruction and data streams;
therefore machines built using this model are suited to any kind of application. Unlike SIMD and
MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory
MIMD based on the way PEs are coupled to the main memory.
In the shared memory MIMD model (tightly coupled multiprocessor systems), all the PEs are
connected to a single global memory and they all have access to it. The communication between
PEs in this model takes place through the shared memory; any modification of the data stored in the
global memory by one PE is visible to all the other PEs.
Dominant representative shared memory MIMD systems are Silicon Graphics machines and
Sun/IBM’s SMP (Symmetric Multi-Processing).
In Distributed memory MIMD machines (loosely coupled multiprocessor systems) all PEs have a
local memory. The communication between PEs in this model takes place through the
interconnection network (the inter-process communication channel, or IPC). The network
connecting the PEs can be configured as a tree, mesh, or other topology in accordance with the requirement.
The shared-memory MIMD architecture is easier to program but is less tolerant to failures and
harder to extend compared with the distributed-memory MIMD model. Failures in a shared-memory
MIMD system affect the entire system, whereas this is not the case for the distributed model, in which each
of the PEs can be easily isolated. Moreover, shared memory MIMD architectures are less likely to
scale because the addition of more PEs leads to memory contention.
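As promised after the SIMD item above, here is a small programming-model sketch (an illustration only; NumPy's vectorized addition merely stands in for hardware vector units, it is not literally a SIMD machine):

import numpy as np

# SISD style: one instruction stream operating on one data element at a time.
x = list(range(8))
y = list(range(8))
z_sisd = []
for i in range(len(x)):          # each iteration handles a single pair of elements
    z_sisd.append(x[i] + y[i])

# SIMD style: a single operation is applied to whole vectors of data at once.
xv = np.arange(8)
yv = np.arange(8)
z_simd = xv + yv                  # one vector add over all 8 elements

print(z_sisd)                     # [0, 2, 4, ..., 14]
print(z_simd)                     # [ 0  2  4 ... 14]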
Interconnection Structures:
The interconnection between the components of a multiprocessor system can have different physical
configurations, depending on the number of transfer paths that are available between the processors and
memory in a shared-memory system, and among the processing elements in a loosely coupled system.
Some of the schemes are:
Time-Shared Common Bus –
Multiport Memory –
Crossbar Switch –
Multistage Switching Network –
Hypercube System
Time shared common Bus
All processors (and memory) are connected to a common bus or busses
- Memory access is fairly uniform, but not very scalable
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
- Operations of Bus
In the above figure, a number of local buses are connected, each to its own local memory and to one or
more processors. Each local bus may be connected to a CPU, an IOP, or any combination of processors. A system bus
controller links each local bus to a common system bus. The I/O devices connected to the local IOP, as
well as the local memory, are available to the local processor. The memory connected to the common
system bus is shared by all processors. If an IOP is connected directly to the system bus, the I/O devices
attached to it may be made available to all processors.
Disadvantages:
• Only one processor can communicate with the memory or another processor at any given time.
• As a consequence, the total overall transfer rate within the system is limited by the speed of the single
path.
A multiport memory system employs separate buses between each memory module and each CPU. A
processor bus comprises the address, data and control lines necessary to communicate with memory. Each
memory module connects to each processor bus. The memory module must have internal control logic to
determine which port will have access to memory at any given time.
A memory module can be said to have four ports, with each port accommodating one of the buses. Memory
access conflicts are resolved by assigning fixed priorities to each memory port: the priority for memory
access associated with each processor is established by the physical position of the port that its bus occupies
in each module. Thus CPU 1 has priority over CPU 2, CPU 2 has priority over CPU 3, and CPU 4 has the
lowest priority. (A small sketch of this arbitration scheme follows the advantages and disadvantages below.)
Advantages:
A high transfer rate can be achieved because of the multiple paths.
Disadvantages:
It requires expensive memory control logic and a large number of cables and connectors.
It is only good for systems with a small number of processors.
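The fixed-priority port arbitration described above can be sketched as follows (a Python illustration only; the function name and request encoding are assumptions):

# Sketch of fixed-priority arbitration for one memory module: the module
# grants access to the requesting CPU connected to its highest-priority
# port (CPU 1 highest, CPU 4 lowest).

def arbitrate(requests):
    # 'requests' is the set of CPU numbers requesting this module in a cycle
    for cpu in (1, 2, 3, 4):           # port 1 has the highest priority
        if cpu in requests:
            return cpu                 # this CPU is granted the module
    return None                        # no request this cycle

print(arbitrate({2, 4}))               # CPU 2 wins over CPU 4
print(arbitrate({1, 3, 4}))            # CPU 1 wins
print(arbitrate(set()))                # None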
Cache coherence :
In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of
the memory hierarchy.
In a shared memory multiprocessor with a separate cache memory for each processor, it is possible to have
many copies of any one instruction operand: one copy in the main memory and one in each cache memory.
When one copy of an operand is changed, the other copies of the operand must be changed also.
Example :
Cache and the main memory may have inconsistent copies of the same object.
Suppose there are three processors, each having a cache. Consider the following scenario:
Processor 1 reads X: obtains 24 from memory and caches it.
Processor 2 reads X: obtains 24 from memory and caches it.
Processor 1 then writes X = 64: its locally cached copy is updated. Now, when processor 3 reads X,
what value should it get?
Memory and processor 2 think it is 24, while processor 1 thinks it is 64.
As multiple processors operate in parallel and independently, multiple caches may possess different copies
of the same memory block; this creates the cache coherence problem.
Cache coherence is the discipline that ensures that changes in the values of shared operands are propagated
throughout the system in a timely fashion.
There are various cache coherence protocols in multiprocessor systems. These protocols maintain, for each
cached block, one of the following states:
Modified –
It means that the value in the cache is dirty, that is, the value in the current cache is different from
the value in main memory.
Exclusive –
It means that the value present in the cache is the same as that present in main memory, that is, the
value is clean.
Shared –
It means that the cache value holds the most recent data copy, and it is shared among all the
caches and main memory as well.
Owned –
It means that the current cache holds the block and is the owner of that block, that is, it has all
rights on that particular block.
Invalid –
This states that the current cache block itself is invalid and the data must be fetched from another
cache or from main memory.
Coherency mechanisms:
There are three types of coherence mechanisms:
1. Directory-based –
In a directory-based system, the data being shared is placed in a common directory that maintains
the coherence between caches. The directory acts as a filter through which the processor must ask
permission to load an entry from the primary memory to its cache. When an entry is changed, the
directory either updates or invalidates the other caches with that entry.
2. Snooping –
First introduced in 1983, snooping is a process where the individual caches monitor address lines
for accesses to memory locations that they have cached. It is typically implemented as a write-invalidate
protocol: when a write operation is observed to a location that a cache has a copy of, the cache controller
invalidates its own copy of the snooped memory location.
3. Snarfing –
It is a mechanism where a cache controller watches both address and data in an attempt to update its
own copy of a memory location when a second master modifies a location in main memory. When
a write operation is observed to a location that a cache has a copy of, the cache controller updates its
own copy of the snarfed memory location with the new data.
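To tie the X = 24 / 64 example to the snooping (write-invalidate) mechanism, here is a minimal Python sketch. It is an illustration only: the class names and the write-through simplification are assumptions, not part of the notes.

# Minimal sketch of a write-invalidate snooping scheme on a shared bus.
# Each cache snoops bus writes and invalidates its own copy of the block.

class Cache:
    def __init__(self, name):
        self.name = name                    # processor label, for identification only
        self.lines = {}                     # address -> cached value

    def read(self, addr, memory):
        if addr not in self.lines:          # miss: fetch the value from main memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value, memory, bus):
        self.lines[addr] = value
        memory[addr] = value                # write-through, for simplicity
        bus.broadcast_write(addr, self)     # other caches snoop this write

    def snoop_write(self, addr):
        self.lines.pop(addr, None)          # invalidate our stale copy, if any

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def broadcast_write(self, addr, writer):
        for c in self.caches:
            if c is not writer:
                c.snoop_write(addr)

memory = {"X": 24}
p1, p2, p3 = Cache("P1"), Cache("P2"), Cache("P3")
bus = Bus([p1, p2, p3])

p1.read("X", memory)                        # P1 caches 24
p2.read("X", memory)                        # P2 caches 24
p1.write("X", 64, memory, bus)              # P2's stale copy is invalidated
print(p2.read("X", memory))                 # 64, re-fetched after invalidation
print(p3.read("X", memory))                 # 64

Because processor 2's copy was invalidated by the snooped write, its next read misses and re-fetches the up-to-date value 64, so no processor continues to see the stale 24.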