UNIT – 5
Pipelining
The term Pipelining refers to a technique of decomposing a sequential process into
sub-operations, with each sub-operation being executed in a dedicated segment that
operates concurrently with all other segments.
The most important characteristic of a pipeline technique is that several computations can
be in progress in distinct segments at the same time. The overlapping of computation is
made possible by associating a register with each segment in the pipeline. The registers
provide isolation between each segment so that each can operate on distinct data
simultaneously.
The structure of a pipeline organization can be represented simply by including an input
register for each segment followed by a combinational circuit.
Let us consider an example of combined multiplication and addition operation to get a better
understanding of the pipeline organization.
The combined multiplication and addition operation is performed on a stream of numbers such as:
Ai * Bi + Ci    for i = 1, 2, 3, ..., 7
Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in a particular
segment.
The output generated by the combinational circuit in a given segment is applied to the input register
of the next segment. For instance, from the block diagram, we can see that register R3 is used as
one of the input registers for the combinational adder circuit.
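The register transfers above can be sketched in Python. This is a hedged model, not taken from the text; the segment/register assignment (R1 <- Ai, R2 <- Bi in segment 1; R3 <- R1*R2, R4 <- Ci in segment 2; the adder output in segment 3) is an assumption following the usual textbook layout:

```python
# Minimal model of a three-segment pipeline computing Ai*Bi + Ci.
# Register assignment (an assumption, following the usual textbook layout):
#   Segment 1: R1 <- Ai, R2 <- Bi
#   Segment 2: R3 <- R1 * R2, R4 <- Ci
#   Segment 3: result <- R3 + R4
def pipeline_multiply_add(a, b, c):
    n = len(a)
    R1 = R2 = R3 = R4 = None
    results = []
    for t in range(n + 2):                 # n items drain in n + 2 clock pulses
        # All transfers happen on the same clock edge, so next-state values
        # are computed from the current registers before any are overwritten.
        out = R3 + R4 if R3 is not None else None      # segment 3: add
        r3 = R1 * R2 if R1 is not None else None       # segment 2: multiply
        r4 = c[t - 1] if 1 <= t <= n else None         # Ci enters one clock later
        r1 = a[t] if t < n else None                   # segment 1: load inputs
        r2 = b[t] if t < n else None
        R1, R2, R3, R4 = r1, r2, r3, r4
        if out is not None:
            results.append(out)
    return results

print(pipeline_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # -> [11, 18, 27]
```

Note how each result emerges one clock after the previous one once the pipeline is full, even though a single multiply-add spans three segments.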
In general, the pipeline organization is applicable in two areas of computer design:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are used to implement floating-
point operations, multiplication of fixed-point numbers, and similar computations encountered in
scientific problems.
To understand the concepts of arithmetic pipeline in a more convenient way, let us consider an
example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers,
defined as:
X = A x 2^a
Y = B x 2^b
where A and B are two fractions that represent the mantissas, and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four segments.
Each segment performs the corresponding sub-operation of the given pipeline. The
sub-operations shown in the four segments are:
1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
We will discuss each sub-operation in more detail later in this section.
The following block diagram represents the sub operations performed in each segment of the
pipeline.
1. Compare the exponents by subtraction:
The exponents are compared by subtracting them to determine their difference. The larger exponent
is chosen as the exponent of the result.
For example, if the two exponents are 3 and 2, their difference, 3 - 2 = 1, determines how many times
the mantissa associated with the smaller exponent must be shifted to the right.
2. Align the mantissas:
The mantissa associated with the smaller exponent is shifted according to the difference of
exponents determined in segment one.
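The four sub-operations can be sketched end to end in Python. This is a hedged illustration using base-10 exponents for readability (the pipeline in the text operates on binary numbers, so a real unit would shift binary mantissas by powers of 2); the 0.9504 x 10^3 + 0.8200 x 10^2 values are the example commonly used with this four-segment pipeline:

```python
# Hedged sketch of the four floating-point-addition segments, in base 10
# for readability (a real unit shifts binary mantissas by powers of 2).
def fp_add(A, a, B, b):
    # Segment 1: compare the exponents by subtraction; keep the larger.
    diff = a - b
    exp = max(a, b)
    # Segment 2: align the mantissa of the smaller exponent by shifting right.
    if diff > 0:
        B = B / 10 ** diff
    elif diff < 0:
        A = A / 10 ** (-diff)
    # Segment 3: add the mantissas.
    M = A + B
    # Segment 4: normalize so the mantissa lies in [0.1, 1).
    while abs(M) >= 1:
        M /= 10
        exp += 1
    while M != 0 and abs(M) < 0.1:
        M *= 10
        exp -= 1
    return M, exp

# 0.9504 x 10^3 + 0.8200 x 10^2 = 1.0324 x 10^3 -> normalized 0.10324 x 10^4
M, e = fp_add(0.9504, 3, 0.8200, 2)
print(round(M, 5), e)   # -> 0.10324 4
```

In the pipelined unit each of these four blocks is a separate segment with its own register, so four different pairs of operands can be in flight at once.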
Instruction pipelining:
Pipeline processing can occur not only in the data stream but in the instruction stream as well.
Most digital computers with complex instructions require an instruction pipeline to carry out
operations such as fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence of steps.
1. Fetch instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Each step is executed in a particular segment, and there are times when different segments may take
different times to operate on the incoming information. Moreover, there are times when two or more
segments may require memory access at the same time, causing one segment to wait until another is
finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is divided
into segments of equal duration. One of the most common examples of this type of organization is a
Four-segment instruction pipeline.
A four-segment instruction pipeline combines two or more of these steps into a single segment. For
instance, the decoding of the instruction can be combined with the calculation of the effective
address into one segment.
The following block diagram shows a typical example of a four-segment instruction pipeline. The
instruction cycle is completed in four segments.
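A quick way to see the four-segment overlap is to print the space-time diagram: instruction i occupies segment s during clock cycle i + s (0-indexed). A small sketch (the `S1`..`S4` labels are just the four segments above):

```python
# Print a space-time (timing) diagram for an n_seg-segment pipeline:
# instruction i occupies segment s during clock cycle i + s.
def timing_diagram(n_instr, n_seg=4):
    rows = []
    for i in range(n_instr):
        row = ["  "] * (n_instr + n_seg - 1)   # one column per clock cycle
        for s in range(n_seg):
            row[i + s] = "S" + str(s + 1)
        rows.append(" ".join(row))
    return rows

for line in timing_diagram(3):
    print(line.rstrip())
# S1 S2 S3 S4
#    S1 S2 S3 S4
#       S1 S2 S3 S4
```

Each row is one instruction; reading a column top to bottom shows that in any given clock cycle every segment is working on a different instruction.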
Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and eventually, the effective
address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
As computer systems evolve, greater performance can be achieved by taking advantage of
improvements in technology, such as faster circuitry, use of multiple registers rather than a single
accumulator and the use of a cache memory. Another organizational approach is instruction
pipelining in which new inputs are accepted at one end before previously accepted inputs appear as
outputs at the other end.
Figure 3.1a depicts this approach. The pipeline has two independent stages. The first stage fetches
an instruction and buffers it. When the second stage is free, the first stage passes it the buffered
instruction. While the second stage is executing the instruction, the first stage takes advantage of
any unused memory cycles to fetch and buffer the next instruction. This is called instruction prefetch
or fetch overlap. This process will speed up instruction execution: if the fetch and execute
stages were of equal duration, the instruction cycle time would be halved. However, if we look
more closely at this pipeline
(Figure 3.1b), we will see that this doubling of the execution rate is unlikely, for three reasons:
1. The execution time will generally be longer than the fetch time. Thus, the fetch stage may have
to wait for some time before it can empty its buffer.
2. A conditional branch instruction makes the address of the next instruction to be fetched
unknown. Thus, the fetch stage must wait until it receives the next instruction address from the
execute stage. The execute stage may then have to wait while the next instruction is fetched.
3. When a conditional branch instruction is passed on from the fetch to the execute stage, the fetch
stage fetches the next instruction in memory after the branch instruction. Then, if the branch is not
taken, no time is lost. If the branch is taken, the fetched instruction must be discarded and a new
instruction fetched.
To gain further speedup, the pipeline must have more stages. Let us consider the following
decomposition of the instruction processing.
1. Fetch instruction (FI): Read the next expected instruction into a buffer.
2. Decode instruction (DI): Determine the opcode and the operand specifiers.
3. Calculate operands (CO): Calculate the effective address of each source operand. This may
involve displacement, register indirect, indirect, or other forms of address calculation.
4. Fetch operands (FO): Fetch each operand from memory.
5. Execute instruction (EI): Perform the indicated operation and store the result, if any, in the
specified destination operand location.
6. Write operand (WO): Store the result in memory.
Figure 3.2 shows that a six-stage pipeline can reduce the execution time for 9 instructions from 54
time units to 14 time units.
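The 54-versus-14 figure follows directly from the standard pipeline timing formulas: without pipelining, n instructions of k stages take n*k time units; with pipelining, they take k + (n - 1). A minimal check:

```python
# Standard pipeline timing: n instructions through a k-stage pipeline.
def pipeline_cycles(n_instructions, n_stages):
    unpipelined = n_instructions * n_stages        # n * k time units
    pipelined = n_stages + (n_instructions - 1)    # k cycles to fill, then 1 per instr
    return unpipelined, pipelined

print(pipeline_cycles(9, 6))   # -> (54, 14)
```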
3.2 Timing Diagram for Instruction Pipeline Operation
The FO and WO stages each involve a memory access. If the six stages are not of equal duration, there will be
some waiting involved at various pipeline stages. Another difficulty is the conditional branch
instruction, which can invalidate several instruction fetches. A similar unpredictable event is an
interrupt.
3.3 Timing Diagram for Instruction Pipeline Operation with interrupts
Figure 3.3 illustrates the effects of the conditional branch, using the same program as Figure 3.2.
Assume that instruction 3 is a conditional branch to instruction 15. Until the instruction is executed,
there is no way of knowing which instruction will come next. The pipeline, in this example, simply
loads the next instruction in sequence (instruction 4) and proceeds.
In Figure 3.2, the branch is not taken. In Figure 3.3, the branch is taken. This is not determined until
the end of time unit 7. At this point, the pipeline must be cleared of instructions that are not useful.
During time unit 8, instruction 15 enters the pipeline.
No instructions complete during time units 9 through 12; this is the performance penalty incurred
because we could not anticipate the branch. Figure 3.4 indicates the logic needed for pipelining to
account for branches and interrupts.
3.4 Six-stage CPU Instruction Pipeline
Throughput & Speedup
In computer organization, throughput and speedup are two important performance metrics that
describe how efficiently a system processes tasks.
Throughput is the number of tasks (e.g., instructions) completed per unit time. Once a pipeline
with clock period tp is full, it completes one result per clock, so its steady-state throughput
approaches 1/tp.
Speedup is the ratio of the execution time without pipelining to the execution time with
pipelining. For a k-segment pipeline executing n tasks with clock period tp, where a single task
takes time tn without pipelining:
S = (n * tn) / ((k + n - 1) * tp)
As n grows large, S approaches tn / tp; in the common case tn = k * tp, the maximum speedup
approaches k, the number of segments.
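As a small sketch, the standard speedup formula S = (n * tn) / ((k + n - 1) * tp) for a k-segment pipeline can be evaluated directly. The 4-segment / 20 ns / 100-task numbers below are illustrative, not from the text, and the default tn = k * tp is an assumption matching the usual textbook setup:

```python
# Evaluate the standard pipeline speedup formula S = n*tn / ((k + n - 1)*tp).
def speedup(n, k, tp, tn=None):
    # Assumption: a non-pipelined task takes the time of all k segments
    # when tn is not given explicitly.
    if tn is None:
        tn = k * tp
    return (n * tn) / ((k + n - 1) * tp)

# Illustrative numbers (not from the text): 4 segments, 20 ns clock, 100 tasks.
print(round(speedup(100, 4, 20), 2))   # -> 3.88
```

Note that the result (about 3.88) is already close to the theoretical maximum of k = 4, which it approaches as the number of tasks grows.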
PIPELINE HAZARDS
What are pipeline hazards?
Hazards are those situations that prevent the next instruction in the instruction stream from executing
during its designated clock cycle. They reduce the performance from the ideal speedup gained by
pipelining.
Classification of hazards:
Structural Hazards: arise from resource conflicts when the hardware can’t support all possible
combinations in simultaneous overlapped execution.
Data hazards: arise when an instruction depends upon the results of a previous instruction in a
way that is exposed by the overlapping of instructions in the pipeline.
Control Hazards: arise from the pipelining of branches and other instructions that change the PC.
Structural hazards:
For any system to be free from hazards, pipelining of functional units and duplication of
resources is necessary to allow all possible combinations of instructions in the pipeline.
Structural hazards arise due to the following reasons:
When a functional unit is not fully pipelined, the sequence of instructions using that unit
cannot proceed at the rate of one per clock cycle.
When a resource is not duplicated enough to allow all possible combinations of instructions.
For example, a machine may have only one register-file write port but may need to perform two
writes during the same clock cycle.
Another example is a machine with a single memory shared for data and instructions. An instruction
containing a data-memory reference will conflict with the instruction fetch of a later instruction.
This is resolved by stalling the pipeline for one clock cycle when the data-memory access occurs.
Data hazards:
Data hazards occur when an instruction depends on the result of a previous instruction and that
result has not yet been computed. Whenever two different instructions use the same storage
location, that location must appear to be accessed in sequential order.
There are four types of data dependencies: Read after Write (RAW), Write after Read (WAR),
Write after Write (WAW), and Read after Read (RAR). RAR causes no hazard, since neither
instruction modifies the location; the other three are explained below.
Read after Write (RAW) :
It is also known as True dependency or Flow dependency. It occurs when the value produced by
an instruction is required by a subsequent instruction. For example,
ADD R1, --, --;
SUB --, R1, --;
Write after Read (WAR) :
It is also known as anti-dependency. These hazards occur when an instruction writes a register
that a previous instruction still needs to read. For example,
ADD --, R1, --;
SUB R1, --, --;
Write after Write (WAW) :
It is also known as output dependency. These hazards occur when an instruction writes a register
that has already been written by a previous instruction, so the writes must complete in order. For example,
ADD R1, --, --;
SUB R1, --, --;
Data hazards occur when instructions in a pipeline depend on the results of previous
instructions. To ensure smooth execution, various hazard-handling techniques like
forwarding and stalling are used.
Hazards requiring stalls:
Consider the situation where a load and a sub instruction are consecutive, where the
destination register of load is the source register for sub.
This hazard cannot be removed by forwarding. Hence a pipeline interlock is introduced to detect the
hazard and stall the pipeline until it is cleared. The hazard is checked during the ID phase, and the
instruction that wants to use the data is stalled until the source instruction produces it.
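The interlock check described above can be sketched as follows. The instruction representation and the LW/SUB example are invented for illustration; real hardware compares register specifier fields in the ID stage:

```python
# Hedged sketch of the load-use interlock check performed in the ID stage.
def needs_stall(prev, curr):
    """True when curr reads the register a preceding load is still fetching."""
    return prev["op"] == "LW" and prev["dst"] in curr["src"]

# Invented example instructions (dict fields are illustrative, not a real ISA):
lw  = {"op": "LW",  "dst": "R1", "src": ["R2"]}         # R1 <- MEM[R2]
sub = {"op": "SUB", "dst": "R4", "src": ["R1", "R5"]}   # needs R1 immediately
add = {"op": "ADD", "dst": "R6", "src": ["R2", "R3"]}   # independent of R1

print(needs_stall(lw, sub))   # -> True   (one stall cycle must be inserted)
print(needs_stall(lw, add))   # -> False  (no dependence, no stall)
```

Forwarding handles most RAW hazards, but here the loaded value is simply not available yet, so a one-cycle bubble is the only option.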
Control hazards:
Control hazards cause a greater performance loss compared to the losses posed by data
hazards.
The simplest method of dealing with branches is to stall the pipeline as soon as the branch is
detected in the ID phase, until the MEM stage, where the new PC is finally determined.
Each branch then causes a three-cycle stall in the DLX pipeline, which is a significant loss given
that roughly 30% of the instructions executed are branches.
The number of stall cycles per branch is reduced by testing the branch condition in the ID stage
and computing the destination address there as well, using a separate adder.
With this scheme there is only a one-clock-cycle stall on branches.
With this pipelining scheme, then, two types of conflicts can arise: data conflicts and branch
conflicts.