PIPELINING
Dipl.-Ing. Elena Gatti (2004), slightly modified by Stefan Freinatis (2005), Uni Duisburg-Essen
What is Pipelining?
• Pipelining is an implementation technique whereby multiple instructions are
overlapped in execution
• Each step of the pipeline completes a part of an instruction. Different steps are
completing different parts of different instructions at the same time
• Each pipeline-step is called pipe stage or pipe segment
• The throughput of an instruction pipeline is determined by how often an
instruction exits the pipeline
• The time required to move an instruction one step down the pipeline is
a processor cycle
• Because all pipe stages proceed at the same time, the length of a processor
cycle is determined by the time required for the slowest pipe stage
• Ideal case: the pipe stages are perfectly balanced (i.e. every pipe stage takes
the same length of time)
• Real case:
- The pipe stages will not be perfectly balanced
- Pipelining involves some overhead
RISC (Reduced Instruction Set Computer)
• RISC architectures (Ex: MIPS) are characterized by a few key properties:
1) All operations on data apply to data in registers
2) The only operations that affect memory are LOAD and STORE
3) The instruction formats are few in number, with all instructions typically
being of the same size
• RISC architectures mostly have just 3 classes of instructions:
1) ALU
2) LOAD and STORE
3) BRANCHES and JUMPS
Implementation of a RISC Instruction Set (without pipelining)
Simple implementation: every instruction can be implemented in at most 5 clock
cycles. The 5 clock cycles are the following:
1) IF (Instruction Fetch): Fetch the current instruction from memory
2) ID (Instruction Decode): Decode the instruction and read the corresponding
source registers
3) EX (Execution): the ALU operates on the operands prepared in the prior cycle
4) MEM (Memory Access): if the instruction is a LOAD, memory does a READ.
If it is a STORE, memory does a WRITE
5) WB (Write Back): write the result into the register file, whether it comes from
memory (LOAD) or from the ALU
Hardware Implementation:
Figure 1: Pipeline stages (the instruction passes through IF, ID, EX, MEM and WB in
clock cycles CC1-CC5; IF and MEM access memory, ID and WB access the register file)
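To make the five steps concrete, here is a minimal Python sketch (my own illustration, not the lecture's hardware model; the instruction encoding, register file and memories are invented for this example). It executes two toy instructions, a LOAD and an ADD, by running the five steps one after the other:

regs = [0] * 32                      # register file R0..R31
data_mem = {0: 42}                   # toy data memory: address -> value
instr_mem = [("LOAD", 1, 0, None),   # R1 := MEM[R0 + 0]
             ("ADD",  2, 1, 1)]      # R2 := R1 + R1

def execute(pc):
    # IF: fetch the current instruction from (instruction) memory
    op, rd, rs, rt = instr_mem[pc]
    # ID: decode and read the source registers
    a = regs[rs]
    b = regs[rt] if rt is not None else None
    # EX: the ALU operates on the operands prepared in the prior step
    if op == "ADD":
        result = a + b
    elif op == "LOAD":
        result = a                   # effective address (base register only here)
    # MEM: only LOAD (and STORE) touch the data memory
    if op == "LOAD":
        result = data_mem[result]
    # WB: write the result back into the register file
    regs[rd] = result

for pc in range(len(instr_mem)):
    execute(pc)
print(regs[1], regs[2])              # prints: 42 84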
The 5-stage pipeline for RISC processors
• Simply ‘pipeline’ the execution described above by starting a new
instruction on each clock cycle
• During each clock cycle the hardware initiates a new instruction
and executes some part of each of the 5 different instructions currently in the pipeline
Instruction          Clock number
                     1     2     3     4     5     6     7     8
Instruction i        IF    ID    EX    MEM   WB
Instruction i+1            IF    ID    EX    MEM   WB
Instruction i+2                  IF    ID    EX    MEM   WB
Instruction i+3                        IF    ID    EX    MEM   WB
...
Figure 2: 'Pipelining' the instruction execution
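The overlap shown in the table can be reproduced with a short Python sketch (my own helper, assuming an ideal pipeline with one new instruction per clock cycle and no stalls): instruction i simply occupies stage number s in clock cycle i + s + 1.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    n_cycles = n_instructions + len(STAGES) - 1
    print("            " + "".join(f"CC{c + 1:<4}" for c in range(n_cycles)))
    for i in range(n_instructions):
        row = [" " * 6] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:<6}"      # instruction i is in stage s in cycle i+s+1
        print(f"instr {i:<4}  " + "".join(row))

pipeline_diagram(4)                         # reproduces the pattern of Figure 2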
It is hard to believe that pipelining is as simple as this: it is NOT!!
Pipelining introduces some problems!
• Some problems introduced by pipelining:
Problem 1: Make sure not to try to perform two or more different operations
with the same data path resource on the same clock cycle!
Fig. 3 shows the hardware implementation of the pipeline above.
Note that in CC4, Instruction i (in its MEM stage) and Instruction i+3 (in its IF stage) both try to access memory.
In CC5, Instruction i (in WB) and Instruction i+3 (in ID) both try to access the registers.
Figure 3: A pipeline problem (each instruction uses memory in IF and MEM and the
register file in ID and WB; with a single memory, the accesses of Instruction i and
Instruction i+3 collide in CC4, and their register accesses collide in CC5)
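The clash can also be found mechanically. The following Python sketch (my own check, modelling only the memory conflict and assuming a single memory used by both IF and MEM, with one instruction issued per clock cycle) reports every cycle in which the memory is requested more than once:

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
USES_MEMORY = {"IF", "MEM"}                  # both stages access the single memory

def memory_conflicts(n_instructions):
    requests = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            if stage in USES_MEMORY:
                cycle = i + s + 1            # instruction i enters IF in cycle i+1
                requests.setdefault(cycle, []).append((i, stage))
    return {cc: users for cc, users in requests.items() if len(users) > 1}

print(memory_conflicts(4))
# {4: [(0, 'MEM'), (3, 'IF')]} -> instruction i and i+3 clash in CC4, as in Figure 3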
This kind of problem can be solved WITHOUT introducing stalls (delays) in the
pipeline.
Possible solutions:
- Use separate instruction and data memories
- Perform the register WRITE in the first half of the clock cycle (for WB) and the
register READ in the second half (for ID)
The hardware implementation can then be drawn as in Fig. 4.
Figure 4: Modified pipeline (IF uses the instruction memory IM, ID and WB use the
register file REG, MEM uses the data memory DM)
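Extending the same toy check, the sketch below (again my own illustration) records which memory each stage uses once the memory is split into an instruction memory (IM) and a data memory (DM); no memory is then requested twice in the same cycle:

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
RESOURCE = {"IF": "IM", "MEM": "DM"}         # which memory each stage touches

def port_conflicts(n_instructions):
    usage = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            if stage in RESOURCE:
                key = (i + s + 1, RESOURCE[stage])   # (clock cycle, memory)
                usage.setdefault(key, []).append(i)
    return {key: users for key, users in usage.items() if len(users) > 1}

print(port_conflicts(4))                     # {} -> no memory is used twice in one cycle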
Problem 2: Make sure that instructions in different stages of the pipeline do not
interfere with each other.
To solve this problem, pipeline registers (also called latches) are introduced between the stages.
At the end of a clock cycle the results from a stage are stored into its pipeline register (Fig. 5)
Figure 5: Latches (pipeline registers) added between the pipeline stages
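A minimal sketch of the latch idea (my own simplification, using only the four registers IF/ID, ID/EX, EX/MEM and MEM/WB): whatever a stage produces in one cycle only becomes the next stage's input in the following cycle, because the new latch contents are built entirely from the old ones.

def clock_tick(latches, new_instruction):
    # advance the pipeline by one clock cycle: every value moves one latch
    # further down the pipe, nothing is overwritten within the same cycle
    return {
        "IF/ID":  new_instruction,
        "ID/EX":  latches["IF/ID"],
        "EX/MEM": latches["ID/EX"],
        "MEM/WB": latches["EX/MEM"],
    }

latches = {"IF/ID": None, "ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for instr in ["i", "i+1", "i+2", "i+3"]:
    latches = clock_tick(latches, instr)
    print(latches)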
Pipeline Hazards
• Hazards are situations that prevent the next instruction in the instruction
stream from executing during its designated clock cycle
• Hazards reduce performance of pipelines!
• There are 3 classes of hazards:
1) Structural hazards: arise from resource conflicts when the hardware
cannot support overlapped execution (Ex: Memory with just one port, see
Fig. 3)
2) Data hazards: arise when an instruction depends on the results of
previous instructions
Example:
ADD R1, R2, R3 (means R1 := R2 + R3)
SUB R4, R1, R5 (means R4 := R1 - R5)
AND R6, R1, R7 (means R6 := R1 AND R7)
OR R8, R1, R9 (means R8 := R1 OR R9)
XOR R10, R1, R11 (means R10 := R1 XOR R11)
All instructions after the ADD use the result of the ADD instruction (value of R1). But
which instructions are affected by hazards? See also Figure 6.
The ADD instruction writes the value of R1 in the WB pipe stage, but the SUB
instruction reads the value during its ID stage (data hazard!). Unless precautions are
taken to prevent it, the SUB instruction will read the wrong value.
The AND instruction is also affected by this hazard: the write of R1 does not
complete until the end of clock cycle 5. Thus, the AND instruction that reads the
registers during clock cycle 4 will receive the wrong result.
The OR instruction operates without incurring a hazard because the register READ is
performed in the second half of the cycle and the register WRITE in the first half.
The XOR instruction also operates properly because its register READ occurs in
clock cycle 6, after the WRITE of R1 in clock cycle 5.
A small sketch after the list of hazard classes below checks these cases automatically.
Figure 6: The example program 'passing' the pipeline (ADD writes R1 in its WB stage
in CC5; SUB reads its registers in CC3, AND in CC4, OR in CC5 and XOR in CC6)
3) Control hazards: arise from the pipelining of branch instructions: no new
instruction is fetched until the branch instruction has completed
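Here is the small check referred to above: a Python sketch (my own helper, assuming the split register-file cycle described earlier and NO forwarding). The instruction at position c reads its registers in cycle c + 2 (ID); the producer at position p writes its result in cycle p + 5 (WB).

program = [
    ("ADD", "R1",  ["R2", "R3"]),
    ("SUB", "R4",  ["R1", "R5"]),
    ("AND", "R6",  ["R1", "R7"]),
    ("OR",  "R8",  ["R1", "R9"]),
    ("XOR", "R10", ["R1", "R11"]),
]

def raw_hazards(program):
    hazards = []
    for p, (_, dest, _) in enumerate(program):
        wb_cycle = p + 5                     # result is written in this cycle
        for c in range(p + 1, len(program)):
            op, _, sources = program[c]
            id_cycle = c + 2                 # registers are read in this cycle
            # a write in the first half and a read in the second half of the
            # same cycle is safe, so only reads strictly before WB are hazards
            if dest in sources and id_cycle < wb_cycle:
                hazards.append((op, dest, id_cycle, wb_cycle))
    return hazards

print(raw_hazards(program))
# [('SUB', 'R1', 3, 5), ('AND', 'R1', 4, 5)] -> SUB and AND read R1 too early,
# OR and XOR are safe, exactly as argued above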
Hazards in pipelines can make it necessary to stall the pipeline (the pipeline waits
until the hazard has been cleared)
There exist some techniques (ex: forwarding) that make it possible to avoid stalls.
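A rough sketch of why forwarding helps (my own simplification, for ALU instructions only): the producer's result can be taken straight from the EX/MEM latch instead of waiting for the register write in the WB stage.

def needs_stall(p, c, forwarding):
    # p, c: positions of the producing / consuming ALU instructions (c > p)
    if forwarding:
        # the ALU result exists at the end of the producer's EX (cycle p+3) and
        # can be fed to the consumer's ALU input in its own EX (cycle c+3)
        return not (p + 3 < c + 3)           # never a stall for c > p
    # without forwarding the consumer must read the register file in ID
    # (cycle c+2) after the producer's write in WB (first half of cycle p+5)
    return not (p + 5 <= c + 2)

print(needs_stall(0, 1, forwarding=False))   # True: SUB would read a stale R1
print(needs_stall(0, 2, forwarding=False))   # True: so would AND
print(needs_stall(0, 1, forwarding=True))    # False: the ALU result is forwarded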
Some performance issues in pipelining
• Pipelining increases the instruction throughput (number of instructions
completed per unit of time), BUT it does NOT reduce the execution time of an
individual instruction (in fact, it usually slightly increases the execution time of
a single instruction due to the overhead introduced by pipelining)
• Limits in performance arise from (see the worked example below):
- imbalance among the pipe stages (the clock can run no faster than
the time needed for the slowest pipe stage)
- pipeline overhead (ex: latches, filling of the pipeline)
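A small worked example of both effects (the numbers are mine, chosen only for illustration): with k balanced stages the pipelined clock is one stage time plus the latch overhead, n instructions need k - 1 cycles to fill the pipeline plus one cycle each, and the speedup therefore approaches k but never reaches it.

def execution_times(n, k, stage_time_ns, latch_overhead_ns):
    unpipelined_cycle = k * stage_time_ns                 # one long cycle per instruction
    pipelined_cycle = stage_time_ns + latch_overhead_ns   # slowest stage + latch overhead
    t_unpipelined = n * unpipelined_cycle
    t_pipelined = (k + n - 1) * pipelined_cycle           # k-1 cycles to fill, then 1 per instruction
    return t_unpipelined, t_pipelined, t_unpipelined / t_pipelined

print(execution_times(n=1000, k=5, stage_time_ns=2.0, latch_overhead_ns=0.1))
# roughly (10000.0, 2108.4, 4.74) -> the speedup stays below the ideal factor of k = 5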