COA Unit 3 Pipelining 31.5.23

Pipelining is a technique to enhance processor performance by executing instructions concurrently through multiple stages such as fetch, decode, execute, and write. Hazards such as data, control, and structural hazards can cause stalls in the pipeline, which can be mitigated using techniques like operand forwarding and branch prediction. Effective management of these hazards is crucial for optimizing the performance of pipelined processors.

UNIT 3 PIPELINING (2nd Half)

Pipelining means executing instructions concurrently by overlapping their processing steps.


Basic concepts
The speed of execution of programs is influenced by many factors. There are two main ways to improve performance:

1. Use faster circuit technology to build the processor and the main memory.

2. Arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

2-Stage Pipeline:

The processor executes a program by fetching and executing instructions one after the other. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. An execution of a program consists of a sequence of fetch and execute steps. The computer has two separate units, one for fetching instructions and another for executing them. The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The result of execution is deposited in the destination location specified by the instruction.

The computer is controlled by a clock whose period is such that the fetch and execute steps of any instruction can each be completed in one clock cycle. In the first clock cycle, the fetch unit fetches instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In the second clock cycle, the instruction fetch unit proceeds with the fetch operation for instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified by instruction I1, which is available to it in buffer B1 (step E1). By the end of the second clock cycle, the execution of instruction I1 is completed and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being fetched by the fetch unit.
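As a minimal illustration of this overlap (a Python sketch assumed here, not part of the original notes), the following prints which step each unit performs in each clock cycle:

# Sketch: fetch/execute overlap in a 2-stage pipeline (illustrative assumption:
# one instruction fetched and one executed per clock cycle).
n = 4  # number of instructions I1..I4
for cycle in range(1, n + 2):
    fetch = f"F{cycle}" if cycle <= n else "--"       # fetch unit works on I_cycle
    execute = f"E{cycle - 1}" if cycle > 1 else "--"  # execute unit works on the previous instruction
    print(f"cycle {cycle}: {fetch} {execute}")

From cycle 2 onward, both units are busy every cycle, which is exactly the overlap described above.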

4-Stage Pipeline:
A pipelined processor may process each instruction in 4 steps, as follows:
1. F – Fetch: read the instruction from the memory.
2. D – Decode: decode the instruction and fetch the source operands.
3. E – Execute: perform the operation specified by the instruction.
4. W – Write: store the result in the destination location.

Four instructions are in progress at any given time. This means that four distinct hardware units are needed. These units must be capable of performing their tasks simultaneously and without interfering with one another. Information is passed from one unit to the next through a storage buffer. As an instruction progresses through the pipeline, all the information needed by the stages downstream must be passed along. During clock cycle 4, the information in the buffers is as follows:
1. Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the instruction-decoding unit.
2. Buffer B2 holds both the source operands for instruction I2 and the specification of the operation to be performed. This is the information produced by the decoding hardware in cycle 3. The buffer also holds the information needed for the write step of instruction I2 (step W2). Even though it is not needed by stage E, this information must be passed on to stage W in the following clock cycle to enable that stage to perform the required Write operation.
3. Buffer B3 holds the results produced by the execution unit and the destination information for instruction I1.
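A short Python sketch (illustrative; the ideal stall-free schedule is assumed) reproduces this picture by printing the stage each instruction occupies in each cycle:

# Sketch: ideal 4-stage pipeline chart. Instruction Ii is in stage s during cycle i + s.
stages = ["F", "D", "E", "W"]
n = 4  # instructions I1..I4
for cycle in range(1, n + len(stages)):
    row = []
    for i in range(1, n + 1):
        s = cycle - i  # stage index occupied by Ii in this cycle
        row.append(f"I{i}:{stages[s]}" if 0 <= s < len(stages) else f"I{i}:-")
    print(f"cycle {cycle}: " + "  ".join(row))

In cycle 4 this prints I1 in W, I2 in E, I3 in D, and I4 in F, matching the buffer contents listed above.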

Role of cache memory:

The use of cache memories solves the memory access problem. In particular, when a cache is included on the same chip as the processor, access time to the cache is usually the same as the time needed to perform other basic operations inside the processor. This makes it possible to divide instruction fetching and processing into steps that are more or less equal in duration. Each of these steps is performed by a different pipeline stage, and the clock period is chosen to correspond to the longest one.
Pipeline Performance:
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
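The notes state this proportionality without a formula. A standard way to quantify it (a derivation assumed here, for an ideal k-stage pipeline with one-cycle stages and no stalls) is that executing n instructions takes n \cdot k cycles without pipelining but only k + (n - 1) cycles with pipelining, giving a speedup of

S = \frac{n k}{k + (n - 1)} \longrightarrow k \quad \text{as } n \to \infty.

For example, n = 100 instructions on a k = 4 stage pipeline take 103 cycles instead of 400, a speedup of about 3.9, close to the ideal factor of 4.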
Hazard: Any condition that causes the pipeline to stall is called a hazard.
Types of hazards:
1. Data hazard
2. Control hazard (instruction hazard)
3. Structural hazard

Data hazard: It is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.

Control hazards or instruction hazards: The pipeline may also be stalled because of a delay in the availability of an instruction. For example, this may be a result of a miss in the cache, requiring the instruction to be fetched from the main memory. Such hazards are called control or instruction hazards.

Structural hazard: This is the situation when two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is in access to memory. One instruction may need to access memory as part of the Execute or Write stage while another instruction is being fetched. If instructions and data reside in the same cache unit, only one instruction can proceed and the other is delayed. Many processors use separate instruction and data caches to avoid this delay.
Data Hazards
A data hazard is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.
A data dependency arises when the destination of one instruction is used as a source in the next instruction.
For example, the two instructions
Mul R2, R3, R4
Add R5, R4, R6
give rise to a data dependency. The result of the multiply instruction is placed into register R4, which in turn is one of the two source operands of the Add instruction. As the Decode unit decodes the Add instruction in cycle 3, it realizes that R4 is used as a source operand. Hence, the D step of that instruction cannot be completed until the W step of the multiply instruction has been completed. Completion of step D2 must be delayed to clock cycle 5. Instruction I3 is fetched in cycle 3, but its decoding must be delayed because step D3 cannot precede D2. Hence, pipelined execution is stalled for two cycles.
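The two-cycle stall can be checked with a tiny calculation (a Python sketch under the 4-stage timing assumed above; not part of the notes):

# Sketch: without forwarding, I2's Decode cannot finish until I1's Write has completed.
w1 = 4                              # Mul's Write step W1 occurs in cycle 4
d2_normal = 3                       # Add's Decode step D2 would normally occur in cycle 3
d2_actual = max(d2_normal, w1 + 1)  # D2 must wait until after W1
print(f"D2 completes in cycle {d2_actual}: stall of {d2_actual - d2_normal} cycles")

This prints a stall of two cycles, as stated above.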
Operand Forwarding:
The data hazard arises because one instruction, instruction I2, is waiting for data to be written in the register file. However, these data are available at the output of the ALU once the Execute stage completes step E1. Hence, the delay can be reduced, or possibly eliminated, if we arrange for the result of instruction I1 to be forwarded directly for use in step E2. The registers SRC1, SRC2, and RSLT constitute the interstage buffers needed for pipelined operation. SRC1 and SRC2 are part of buffer B2, and RSLT is part of B3. The two multiplexers connected at the inputs to the ALU allow the data on the destination bus to be selected instead of the contents of either the SRC1 or SRC2 register.
[Figure: datapath, showing the position of the source (SRC1, SRC2) and result (RSLT) registers in the processor pipeline]

After decoding instruction I2 and detecting the data dependency, a decision is made to use data forwarding. The operand not involved in the dependency, register R5, is read and loaded into register SRC1 in clock cycle 3. In the next clock cycle, the product produced by instruction I1 is available in register RSLT, and because of the forwarding connection, it can be used in step E2. Hence, execution of I2 proceeds without interruption.
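The multiplexer selection can be sketched as follows (illustrative Python; the function name and interface are assumptions, only the register names come from the text):

# Sketch: forwarding multiplexer at an ALU input. If the previous instruction's
# destination matches this source register, the value on the destination (RSLT)
# bus is selected instead of the stale contents of SRC1/SRC2.
def alu_operand(src_reg, src_value, prev_dest_reg, rslt_value):
    if src_reg == prev_dest_reg:  # dependency detected when I2 was decoded
        return rslt_value         # forwarded result of step E1
    return src_value              # normal path through the SRC register

# Example: Mul R2,R3,R4 followed by Add R5,R4,R6 -- R4 is forwarded from RSLT.
product = 42                                   # hypothetical result of the Mul
print(alu_operand("R4", None, "R4", product))  # -> 42, taken from the destination bus
print(alu_operand("R5", 7, "R4", product))     # -> 7, read normally into SRC1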
Handling Data hazards in software:
The control hardware delays reading register R4 until cycle 5, thus introducing a 2-cycle stall unless operand forwarding is used. An alternative approach is to leave the task of detecting data dependencies and dealing with them to the software. In this case, the compiler can introduce the two-cycle delay needed between instructions I1 and I2 by inserting NOP (No-operation) instructions, as follows:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
If the responsibility for detecting such dependencies is left entirely to the software, the compiler must insert the NOP instructions to obtain a correct result. This possibility illustrates the close link between the compiler and the hardware. Being aware of the need for a delay, the compiler can attempt to reorder instructions to perform useful tasks in the NOP slots, and thus achieve better performance. On the other hand, the insertion of NOP instructions leads to larger code size. Also, it is often the case that a given processor architecture has several hardware implementations offering different features; NOP padding that suits one implementation may be unnecessary, and hence wasteful, on another.
Side effects:
When a location other than the one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. For example, stack instructions such as push and pop have side effects because they implicitly use the autoincrement and autodecrement addressing modes, and hence modify the stack pointer.

Another possible side effect involves the condition code flags, which are used by instructions such as conditional branches and add-with-carry. Suppose that registers R1 and R2 hold a double-precision integer number that is to be added to another double-precision number held in registers R3 and R4. This may be accomplished as follows:
Add R1, R3
AddWithCarry R2, R4
An implicit dependency exists between these two instructions through the carry flag. This flag is set by the first instruction and used in the second instruction, which performs the operation
R4 ← [R2] + [R4] + carry
Instructions that have side effects give rise to multiple data dependencies, which lead to a
substantial increase in the complexity of the hardware or software needed to resolve them.
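To make the carry dependency concrete, here is a worked sketch in Python (assuming 32-bit registers and example values; none of this is in the notes):

# Sketch: double-precision (64-bit) addition from two 32-bit adds linked by the carry flag.
MASK = 0xFFFFFFFF
r1, r2 = 0xFFFFFFFF, 0x00000001  # first operand:  high word in R2, low word in R1
r3, r4 = 0x00000001, 0x00000002  # second operand: high word in R4, low word in R3

total = r1 + r3
r3, carry = total & MASK, total >> 32  # Add R1,R3: the low-word add sets the carry flag
r4 = (r2 + r4 + carry) & MASK          # AddWithCarry R2,R4: consumes that flag
print(hex(r4), hex(r3))                # -> 0x4 0x0, i.e. the 64-bit sum 0x00000004_00000000

If the second add did not see the carry produced by the first, the high word would be wrong, which is exactly the dependency the pipeline must respect.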

Instruction Hazards
∙ A pipeline stalled because of a delay in the availability of an instruction is said to suffer a control hazard.
∙ This type of hazard is also called an instruction hazard.
∙ For example, a cache miss requires the instruction to be fetched from the main memory, delaying its availability.

[Figure: instruction execution steps in successive clock cycles]

[Figure: functions performed by each processor stage in successive clock cycles]

Such idle periods are called stalls. They are also often referred to as bubbles in the pipeline. A branch instruction may also cause the pipeline to stall. It may be:
∙ a conditional branch
∙ an unconditional branch
Unconditional Branches:
Instructions I1 to I3 are stored at successive memory addresses, and I2 is a branch instruction. Let the branch target be instruction Ik. The time lost as a result of a branch instruction is often referred to as the branch penalty. In the four-stage pipeline described above, the penalty is two cycles if the target address is computed in the Execute stage; it can be reduced to one cycle by computing the address earlier, in the Decode stage.

Instruction Queue and Prefetching

∙ Either a cache miss or a branch instruction stalls the pipeline for one or more clock cycles.
∙ To reduce this stall period, many processors employ sophisticated fetch units that can fetch instructions before they are needed and put them in a queue.
∙ Typically, the instruction queue can store several instructions.
∙ A separate unit, called the dispatch unit, takes instructions from the front of the queue and sends them to the execution unit.
∙ Every fetch operation adds one instruction to the queue and every dispatch operation reduces the queue length by one. Hence, in the example the notes consider, the queue length remains the same for the first four clock cycles.
Instructions I1, I2, I3, I4, and Ik complete execution in successive clock cycles, because the instruction fetch unit has executed the branch instruction concurrently with the execution of other instructions. This technique is referred to as branch folding.

∙ Branch folding occurs only if, at the time a branch instruction is encountered, at least one instruction other than the branch is available in the queue.
∙ The effectiveness of this technique is enhanced when the instruction fetch unit is able to read more than one instruction at a time from the instruction cache. A sketch of the folding behavior follows.
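The sketch below (a Python model that is an assumption of this rewrite, simplified to one fetch and one dispatch per cycle) shows the branch consuming no dispatch slot:

# Sketch: the fetch unit handles an unconditional branch itself (branch folding)
# provided the queue holds at least one other instruction, so the dispatch unit
# never sees the branch and no execution cycle is lost.
from collections import deque

queue = deque(["I1", "I2"])             # assumed: fetching has run ahead of dispatch
incoming = ["Branch Ik", "Ik", "Ik+1"]  # stream seen by the fetch unit (redirected at the branch)
for cycle, instr in enumerate(incoming, start=1):
    if instr.startswith("Branch") and queue:
        print(f"cycle {cycle}: branch folded by the fetch unit, fetch redirected to Ik")
    else:
        queue.append(instr)
    print(f"cycle {cycle}: dispatched {queue.popleft()}")

Instructions I1, I2, and Ik are dispatched in successive cycles, with no bubble for the branch.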
Conditional Branches and Branch Prediction :

∙ A conditional branch instruction introduces the added hazard caused by the dependency of
the branch condition on the result of a preceding instruction.

∙ The decision to branch cannot be made until the execution of that instruction has been
completed.

∙ Branch instructions represent about 20 percent of the dynamic instruction count of most
programs.

∙ Because of the branch penalty, this large percentage would reduce the gain in
performance expected from pipelining.
∙ The different solutions for this are:
▪ Delayed branching
▪ Static branch prediction
▪ Dynamic branch prediction

Delayed branching:

- The location following a branch instruction is called a branch delay slot.
- In the example the notes refer to (figure not reproduced here), there is one delay slot.

- The instructions in the delay slots are always fetched and at least partially executed before the branch decision is made and the branch target address is computed.
- A technique called delayed branching can minimize the penalty incurred as a result of conditional branch instructions.
- The objective is to be able to place useful instructions in these slots.
- If no useful instructions can be placed in the delay slots, these slots must be filled with NOP instructions.
- After the code is reordered for delayed branching, as in the example below, a useful instruction fills the delay slot and no cycle is wasted.
- With this scheme, branching takes place one instruction later than where the branch instruction appears in the instruction sequence in the memory, hence the name "delayed branch."
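As an illustration in the style of the earlier examples (a hypothetical loop; the notes' own figure is not reproduced here):

Original sequence (delay slot wasted):
LOOP: Shift_left R1
Decrement R2
Branch_if>0 LOOP
NOP (branch delay slot)

Reordered for delayed branching:
LOOP: Decrement R2
Branch_if>0 LOOP
Shift_left R1 (fills the delay slot; executed on every iteration, whether or not the branch is taken)

The shift instruction is executed the same number of times in both versions, but the reordered version spends no cycles on NOPs.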
Branch Prediction
- It is a technique for reducing the branch penalty associated with conditional branches.
- It attempts to predict whether or not a particular branch will be taken.

- The simplest form of branch prediction is to assume that the branch will not take place and
to continue to fetch instructions in sequential address order.

- The other variety is speculative execution, which means that instructions are executed before the processor is certain that they are in the correct execution sequence.

- Thus, care must be taken that no processor registers or memory locations are updated until
it is confirmed that these instructions should indeed be executed.

- If the branch decision indicates otherwise, the instructions and all their associated data in
the execution units must be purged, and the correct instructions fetched and executed.
- The simplest approach is to assume that branches will not be taken.
- This would save the time lost to conditional branches roughly 50 percent of the time.
- However, better performance can be achieved if, for some branch instructions, the prediction is taken and, for others, not taken, depending on the expected program behavior.
- Any approach in which the prediction for a given branch instruction stays fixed is called static branch prediction.
Dynamic Branch Prediction

- It is an approach in which the prediction decision may change depending on execution history.
- The objective of branch prediction algorithms is to reduce the probability of making a wrong decision, and thus to avoid fetching instructions that eventually have to be discarded.
- The processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that instruction is executed.
- In its simplest form, the scheme uses the branch instruction's recent execution history.
- Different types of dynamic prediction exist, based on the amount of execution history that can be accumulated.
- Example schemes use 1 bit or 2 bits of history, as described below.
Dynamic prediction based on 1 bit
The two states are:
◆ LT: Branch is likely to be taken
◆ LNT: Branch is likely not to be taken

Assume that the algorithm is started in state LNT. The prediction follows the current state, and after each execution the state is set to LT if the branch was taken and to LNT if it was not; in other words, the predictor assumes the branch will do what it did the previous time.
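A minimal sketch of this scheme (illustrative Python; the encoding and function name are assumptions):

# Sketch: 1-bit dynamic predictor -- predict that the branch will repeat its last outcome.
state = "LNT"                          # start in LNT, as stated above
def predict_and_update(taken):
    global state
    prediction = (state == "LT")       # predict taken only in state LT
    state = "LT" if taken else "LNT"   # remember the latest outcome
    return prediction

for outcome in [True, True, False, True]:  # hypothetical branch outcomes
    print(f"predicted taken: {predict_and_update(outcome)}, actual: {outcome}")

A known weakness of the 1-bit scheme is that a loop branch is mispredicted twice per execution of the loop: once when the loop exits, and again when it is re-entered.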

Dynamic prediction based on 2 bits

- The four states are:
o ST: Strongly likely to be taken
o LT: Likely to be taken
o LNT: Likely not to be taken
o SNT: Strongly likely not to be taken
- Assume that the state of the algorithm is initially set to LNT. Each taken branch moves the state one step toward ST and each not-taken branch one step toward SNT, so the prediction is reversed only after two successive mispredictions.
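One common realization of the four states is a 2-bit saturating counter (a Python sketch; the numeric encoding is an assumption):

# Sketch: 2-bit predictor as a saturating counter. SNT=0, LNT=1, LT=2, ST=3.
state = 1                                  # start in LNT, as stated above
def predict_and_update(taken):
    global state
    prediction = state >= 2                # predict taken in states LT and ST
    state = min(state + 1, 3) if taken else max(state - 1, 0)
    return prediction

for outcome in [True, True, False, True]:  # hypothetical branch outcomes
    print(f"predicted taken: {predict_and_update(outcome)}, actual: {outcome}")

Because two successive mispredictions are needed to flip the prediction, a loop branch is now mispredicted only once per execution of the loop, improving on the 1-bit scheme.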
