Lecture # Pipelining and Data Hazards

The document describes the pipelined datapath and control in computer organization, detailing the five stages of instruction execution and the two exceptions to left-to-right flow that lead to data and control hazards. It explains the role of pipeline registers in carrying instruction values across stages and covers forwarding and stalling as ways to handle data hazards. Additionally, it covers the hazard detection unit for load-use stalls and the handling of control hazards from branch instructions through flushing, earlier branch resolution, and branch prediction.


The Processor

Computer Organization and Assembly Language


Pipelined Datapath and Control
• There are five stages in the datapath, flowing from left to right.
• However, there are two exceptions:
1. For a branch instruction, the next PC is calculated in a later stage and written back to the PC.
2. The write-back operation, which sends data from memory back to the register file.
Pipelined Datapath and Control
(Figure Explanation)
• There are two exceptions to this left-to-right flow of instructions:
1. The write-back stage, which places the result back into the register file in the middle of the datapath. This leads to data hazards.
2. The selection of the next value of the PC, choosing between the incremented PC and the branch address from the MEM stage. This leads to control hazards.
• What is the effect of data flowing from right to left?
• It does not affect the current instruction
• These reverse data movements influence only later instructions in the pipeline
Datapath and Control
(Figure: the load instruction wants to write to register address 100, but by the time it reaches write-back the write-address field has been updated by later instructions to 200, then 300, 400, 500, and 600.)
Pipelined Datapath and Control
(Figure Explanation)
• IM → Instruction memory
• Reg → Register file
• Right half of Reg shaded → Read operation
• Left half of Reg shaded → Write operation
• DM → Data memory
Pipelined Datapath and Control
(Figure Explanation)
• Observation: The instruction memory is used during
only one of the five stages of an instruction, allowing it
to be shared by following/next instructions during the
other four stages
• How do we retain the value of an individual instruction for its other four stages?
• The value read from instruction memory must be saved in a register; this is a pipeline register.
• Thus, pipeline registers are placed wherever there are dividing lines between stages.
Pipelined Datapath and Control
Pipelined Datapath and Control
(Figure Explanation)
• Figure 4.35 shows the pipelined datapath with the
pipeline registers highlighted
• The pipeline registers are named for the two stages
separated by that register. For example,
• Pipeline register between the IF and ID stages is called IF/ID
• Pipeline register between the ID and EX stage is called ID/EX
• Pipeline register between the EX and MEM stages is called
EX/MEM
• Pipeline register between the MEM and WB stage is called
MEM/WB
Pipelined Datapath and Control
(Figure Explanation)
• Observation: Notice that there is no pipeline register at the end of the write-back stage
• All instructions must update some state in the processor (the register file, memory, or the PC), so a separate pipeline register after write-back would be redundant to the state that is updated
• In the write-back stage, the result is simply written into the register file; that register-file update is the state the instruction leaves behind
• For example, a load instruction will place its result in 1 of the
32 registers, and any later instruction that needs that data will
simply read the appropriate register
Pipelined Datapath and Control
(Figure Explanation)
• PC is incremented on every instruction
• Either PC+4
• Or PC is set to branch destination address if branch instruction
• In case of any exception (when there is need to discard
the content of pipeline register), the PC must be saved.
• Thus, PC can be thought of as a pipeline register and it’s the
one that feeds the IF stage of the pipeline
Pipelined Datapath and Control
• How does pipelining work?
• We show sequences of figures to demonstrate operation over
time
• Figures 4.36 through 4.38, our first sequence, show the
active portions of the datapath highlighted as a load
instruction goes through the five stages of pipelined
execution
• We show a load first because it is active in all five stages.
• Shaded right half of registers/memory → Read operation
• Shaded left half of registers/memory → Write operation
Pipelined Datapath and Control
lw instruction
• The five stages for lw instruction in pipeline are the
following:
1. Instruction fetch
2. Instruction decode and register file read
3. Execute or address calculation
4. Memory access
5. Write-back
Pipelined Datapath and Control
lw instruction
• The five stages for lw instruction are the following:
1. Instruction fetch:
1. Read from IM
2. Written in IF/ID register
3. PC+4 calculated
4. PC+4 is also saved in the IF/ID pipeline register in case needed later for an instruction, such as beq
• Because the computer cannot yet know which type of instruction is being fetched, it must prepare for any instruction

2. Instruction decode and register file read: All four values are stored in the ID/EX pipeline register
1. Reg Source 1
2. Reg Source 2
3. 16-bit Constant with sign extension to 32-bit
4. PC+4 (for late stages)
3. Execute or address calculation
1. Read the register 1 value and the sign-extended 32-bit constant from the ID/EX pipeline register
2. Adds them using the ALU
3. Sum is placed in the EX/MEM pipeline register
4. Memory access
1. Reading the data memory using the address from the EX/MEM pipeline register
2. Loading the data into the MEM/WB pipeline register
5. Write-back
1. Reading the data from the MEM/WB pipeline register and writing it into the register file
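• A minimal Python sketch of these five steps, with invented register contents and field names (IF_ID, ID_EX, EX_MEM, MEM_WB are named after the pipeline registers); each stage reads only the pipeline register behind it and writes the one in front of it. Carrying the destination register number ("rt") forward here anticipates the fix described a few slides below.

    # Hypothetical walkthrough of lw $8, 20($1) through the five pipeline stages.
    regs = {1: 0x1000, 8: 0}                      # $1 holds a base address, $8 is the destination
    data_mem = {0x1014: 42}                       # word stored at address 0x1000 + 20

    IF_ID  = {"instr": "lw $8, 20($1)", "pc_plus4": 0x0040 + 4}   # 1. instruction fetch
    ID_EX  = {"rs_val": regs[1], "imm": 20, "rt": 8,
              "pc_plus4": IF_ID["pc_plus4"]}                      # 2. decode / register file read
    EX_MEM = {"alu_out": ID_EX["rs_val"] + ID_EX["imm"],
              "rt": ID_EX["rt"]}                                  # 3. address calculation
    MEM_WB = {"mem_data": data_mem[EX_MEM["alu_out"]],
              "rt": EX_MEM["rt"]}                                 # 4. memory access
    regs[MEM_WB["rt"]] = MEM_WB["mem_data"]                       # 5. write-back
    print(regs[8])                                                # 42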
(Figure: in the IF stage the instruction is read from instruction memory, and the instruction and PC+4 are written into the IF/ID pipeline register. At this point we do not yet know which register, if any, will be written; if the next instruction is a store, there is no register write at all.)
Pipelined Datapath and Control
• Load and store illustrate a second key point:
• Each logical component of the datapath—such as instruction
memory, register read ports, ALU, data memory, and register
write port—can be used only within a single pipeline stage
• Otherwise, we would have a structural hazard
• Hence these components, and their control, can be associated
with a single pipeline stage
Pipelined Datapath and Control
• If we have store instruction right after the load instruction.
• lw $t0, 100($gp)
• sw $t0, 100($gp)
• The store instruction passes its register contents through the ID/EX → EX/MEM pipeline registers for use in the MEM stage
• The load instruction must pass its destination register number through the ID/EX → EX/MEM → MEM/WB pipeline registers for use in the WB stage
• Passing the register number along is how the instructions share the pipelined datapath
• Because we need to preserve the instruction read during the IF stage, each pipeline register contains the portion of the instruction needed for that stage and later stages
• There is a bug in this design of the load instruction
• Which register is changed in the final stage of the load?
• More specifically, which instruction supplies the write register number?
• The instruction in the IF/ID pipeline register supplies the write register number, yet this instruction occurs considerably after the load instruction!
• Hence, we need to preserve the destination register number of the load instruction, just as the store passed its register contents forward through the pipeline registers.
Pipelined Datapath and Control
(Figure Explanation)
• Figure 4.41 shows the correct version of the datapath
• Passing the write register number first to the ID/EX register, then to the EX/MEM register, and finally to the MEM/WB register
• The register number is used during the WB stage to specify
the register to be written.
• Figure 4.42 is a single drawing of the corrected datapath
• Highlighting the hardware used in all five stages of the load
word instruction
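• A minimal sketch of this fix, with illustrative field names: the write register number entered into ID/EX is copied forward one pipeline register per cycle, so the WB stage writes the correct register even though IF/ID now holds a much later instruction.

    # Hypothetical sketch: the load's destination register number ($8 here)
    # rides along through the pipeline registers instead of being taken from IF/ID.
    ID_EX  = {"write_reg": 8}                     # cycle 2: decode of the load
    EX_MEM = {"write_reg": ID_EX["write_reg"]}    # cycle 3: execute
    MEM_WB = {"write_reg": EX_MEM["write_reg"]}   # cycle 4: memory access
    print(MEM_WB["write_reg"])                    # cycle 5: WB writes register 8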
Graphically Representing Pipelines
• Pipelining can be difficult to understand, since many
instructions are simultaneously executing in a single
datapath in every clock cycle
• To aid understanding, there are two styles of pipeline
figures
1. Multiple-clock-cycle pipeline diagrams
2. Single-clock-cycle pipeline diagrams
• The multiple-clock-cycle diagrams are simpler but do
not contain all the details
Graphically Representing Pipelines
• For example, consider the following five-instruction
sequence:
• lw $10, 20($1)
• sub $11, $2, $3
• add $12, $3, $4
• lw $13, 24($1)
• add $14, $5, $6
• None of these instructions depends on data produced by a previous instruction (a sketch of the resulting multiple-clock-cycle diagram follows below)
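• A small Python sketch that prints a text version of the multiple-clock-cycle pipeline diagram for this sequence (mnemonics and formatting are illustrative):

    # Print which stage each instruction occupies in each clock cycle.
    instrs = ["lw  $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4",
              "lw  $13, 24($1)", "add $14, $5, $6"]
    stages = ["IF", "ID", "EX", "MEM", "WB"]

    for cycle in range(1, len(instrs) + len(stages)):     # clock cycles 1..9
        row = []
        for i in range(len(instrs)):
            s = cycle - 1 - i                              # instruction i enters IF in cycle i+1
            row.append(stages[s] if 0 <= s < len(stages) else "   ")
        print(f"CC{cycle}: " + " ".join(f"{x:>3}" for x in row))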
Pipelined Control
• We have added Pipeline Registers. Where is control?
• We need to add control for these pipeline registers in the main
control unit
• To specify control for the pipeline, we need only set the
control values during each pipeline stage
• Because each control line is associated with a component active
in only a single pipeline stage
• We can divide the control lines into five groups according
to the pipeline stage
• Figure 4.51 shows the full datapath with the extended
pipeline registers
Data Hazards: Forwarding vs
Stalling
• In the previous example, none of the instructions used the results calculated by any of the others
• Let’s look at a sequence with many dependences, shown in color:
• sub $2, $1,$3 # Register $2 written by sub
• and $12,$2,$5 # 1st operand($2) depends on sub
• or $13,$6,$2 # 2nd operand($2) depends on sub
• add $14,$2,$2 # 1st($2) & 2nd($2) depend on sub
• sw $15,100($2) # Base ($2) depends on sub
• The last four instructions are all dependent on the result in
register $2 of the first instruction
• Initial value of $2 = 10
• After subtraction $2 = -20
Data Hazards: Forwarding vs
Stalling
(Figure Explanation)
• Figure 4.52 illustrates the execution of these instructions using a multiple-
clock-cycle pipeline representation.
• Observation: The value of register $2 changes during the middle of clock cycle
5, when the sub instruction writes its result
• What happens when a register is read and written in the same clock cycle?
• Assume,
• the write is in the first half of the clock cycle
• the read is in the second half of the clock cycle
• Thus, a read delivers the value written during the first half of the same cycle
• It means that the values read for register $2 would not be the result of the sub
instruction unless the read occurred during clock cycle 5 or later
• For the and and or instructions → the incorrect value of $2 = 10 is read
• For the add and sw instructions → the correct value of $2 = -20 is read
• Using this style of drawing, such problems become apparent when a
dependence line goes backward in time
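• A tiny sketch of this register-file timing assumption (write in the first half of the cycle, read in the second half), so a read in the same clock cycle as a write returns the new value:

    # Model: within one clock cycle, the write happens before the read.
    class RegFile:
        def __init__(self):
            self.regs = {2: 10}                       # initial $2 = 10

        def clock(self, write=None, read=None):
            if write is not None:                     # first half: write port
                reg, value = write
                self.regs[reg] = value
            if read is not None:                      # second half: read port
                return self.regs[read]

    rf = RegFile()
    print(rf.clock(read=2))                           # cycles 3 and 4: and/or still see 10
    print(rf.clock(write=(2, -20), read=2))           # cycle 5: sub writes, add reads -20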
Data Hazards: Forwarding vs
Stalling
• Observation: The desired result, $2 = -20, is available at the end of the EX stage, that is, clock cycle 3
• When is data actually needed by the AND and OR
instructions?
• At the beginning of the EX stage,
• That is clock cycles 4 and 5, respectively
• Solution:
• Forward the data from the pipeline register to the unit that needs it as soon as it is available
• Thus, there will be no bubble or stall
• The write-back stage later writes the result into the register file, which is the copy the add and sw instructions read in this case
(Figure: with forwarding, the ALU result in the EX/MEM pipeline register can be sent directly as an operand value for the next instruction.)
Data Hazards: Forwarding vs
Stalling
Forwarding Multiplexer Control Value
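• The forwarding-multiplexer control values can be sketched as two tests per ALU operand. A minimal Python sketch of ForwardA using the textbook's encoding (00 = operand from the register file, 10 = forward from EX/MEM, 01 = forward from MEM/WB); the field names are illustrative:

    # Forwarding unit test for the ALU's first operand; ForwardB is the same
    # with ID_EX["RegisterRt"] in place of ID_EX["RegisterRs"].
    def forward_a(ID_EX, EX_MEM, MEM_WB):
        # EX hazard: the previous instruction will write the register we need
        if (EX_MEM["RegWrite"] and EX_MEM["RegisterRd"] != 0
                and EX_MEM["RegisterRd"] == ID_EX["RegisterRs"]):
            return "10"                               # forward the ALU result from EX/MEM
        # MEM hazard: the instruction before that will write it (and no EX hazard applies)
        if (MEM_WB["RegWrite"] and MEM_WB["RegisterRd"] != 0
                and MEM_WB["RegisterRd"] == ID_EX["RegisterRs"]):
            return "01"                               # forward the value from MEM/WB
        return "00"                                   # no hazard: use the register file value

    # sub $2,$1,$3 in EX/MEM, and $12,$2,$5 now in EX: forward $2 from EX/MEM
    print(forward_a({"RegisterRs": 2},
                    {"RegWrite": True,  "RegisterRd": 2},
                    {"RegWrite": False, "RegisterRd": 0}))     # prints "10"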
Data Hazards and Stalls
• Problem:
• One case where forwarding cannot save the day (avoid stall) is when an
instruction tries to read a register following a load instruction that writes the same
register
• lw $2, 100($1)
• add $4, $2, $5
• The data is still being read from memory in clock cycle 4
• While the ALU is performing the operation for the following instruction
• Solution:
• Stall the pipeline for the combination of load followed by an instruction that reads its result
• Requirements to provide solution:
• In addition to a forwarding unit, we need a hazard detection unit
• Why hazard detection unit?
• It operates during the ID stage so that it can insert the stall between the load instruction
and its use
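• The load-use test performed by the hazard detection unit during ID can be sketched as follows (field names illustrative): stall if the instruction in EX is a load whose destination matches either source register of the instruction now in ID.

    def must_stall(ID_EX, IF_ID):
        # ID_EX holds the load that is in EX; IF_ID holds the instruction in ID.
        return (ID_EX["MemRead"] and
                (ID_EX["RegisterRt"] == IF_ID["RegisterRs"] or
                 ID_EX["RegisterRt"] == IF_ID["RegisterRt"]))

    # lw $2, 100($1) in EX and add $4, $2, $5 in ID -> a one-cycle stall is needed
    print(must_stall({"MemRead": True, "RegisterRt": 2},
                     {"RegisterRs": 2, "RegisterRt": 5}))      # True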
Data Hazards and Stalls
• Figure 4.59 shows what really happens in the hardware
• The pipeline execution slot associated with the AND instruction is
turned into a nop and all instructions beginning with the AND
instruction are delayed one cycle
• Like an air bubble in a water pipe, a stall bubble delays everything
behind it and proceeds down the instruction pipe one stage each
cycle until it exits at the end.
• In this example, the hazard forces the AND and OR instructions to
repeat in clock cycle 4 what they did in clock cycle 3: AND reads
registers and decodes, and OR is refetched from instruction memory.
Such repeated work is what a stall looks like, but its effect is to
stretch the time of the AND and OR instructions and delay the fetch
of the add instruction.
Data Hazards and Stalls
• If the instruction in the ID stage is stalled, then the
instruction in the IF stage must also be stalled; otherwise,
we would lose the fetched instruction
• Preventing these two instructions from making progress is
accomplished simply by preventing the PC register and the IF/ID
pipeline register from changing
• Provided these registers are preserved, the instruction in the IF
stage will continue to be read using the same PC, and the
registers in the ID stage will continue to be read using the same
instruction fields in the IF/ID pipeline register
• What it is doing is executing instructions that have no
effect: nops.
Data Hazards and Stalls
• How can we insert these nops, which act like bubbles, into the pipeline?
• We see that deasserting all nine control signals (setting them to 0) in the EX,
MEM, and WB stages will create a “do nothing” or nop instruction.
• By identifying the hazard in the ID stage, we can insert a bubble into the
pipeline by changing the EX, MEM, and WB control fields of the ID/EX
pipeline register to 0.
• These benign control values are percolated forward at each clock cycle
with the proper effect: no registers or memories are written if the control
values are all 0.
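• A minimal sketch of inserting the bubble, assuming the usual control-signal names from the single-cycle datapath: zero the EX, MEM, and WB control fields in ID/EX and keep the PC and IF/ID registers from changing.

    # All-zero control values: the instruction becomes a nop as it moves on.
    # (ALUOp, really two bits, is shown as a single field here.)
    NOP_CONTROL = {"RegDst": 0, "ALUOp": 0, "ALUSrc": 0,       # EX controls
                   "Branch": 0, "MemRead": 0, "MemWrite": 0,   # MEM controls
                   "RegWrite": 0, "MemtoReg": 0}               # WB controls

    def insert_bubble(state):
        state["ID_EX"]["control"] = dict(NOP_CONTROL)  # bubble percolates forward
        state["PCWrite"] = False                       # PC keeps the same value
        state["IF_IDWrite"] = False                    # the same instruction is decoded again
        return state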
Data Hazards and Stalls
(Figure Explanation)
• Figure 4.60 highlights the pipeline connections for both the
hazard detection unit and the forwarding unit
• What is purpose of forwarding unit?
• It controls the ALU multiplexors to replace the value from a
general-purpose register with the value from the proper pipeline
register
• What is the purpose of hazard detection?
• It controls the writing of the PC and IF/ID registers plus the
multiplexor that chooses between the real control values and all 0s
• The hazard detection unit stalls and deasserts the control
fields if the load-use hazard test above is true
Control Hazards
• Thus far, we have limited our concern to hazards
involving arithmetic operations and data transfers
• However, as we saw in Section 4.5, there are also
pipeline hazards involving branches. Figure 4.61 shows
a sequence of instructions and indicates when the
branch would occur in this pipeline
• An instruction must be fetched at every clock cycle to
sustain the pipeline, yet in our design the decision
about whether to branch doesn’t occur until the MEM
pipeline stage.
Control Hazards
• What is control hazard?
• The delay in determining the proper instruction to fetch is
called a control hazard or branch hazard
• They occur less frequently than data hazards, and there is
nothing as effective against control hazards as forwarding is
against data hazards
• Hence, we use simpler schemes
• Two schemes for resolving control hazards
• One optimization to improve these schemes
Control Hazards
Prediction Assume Branch Not
Taken
• Problem with Brach
• Stalling make the execution process slow
• Solution
• Predict that the branch will not be taken and thus continue execution down the sequential instruction
stream
• However, if branch is being taken (condition becomes true), what happens next?
• If the branch is taken, the instructions that are being fetched and decoded must be discarded
• Execution continues at the branch target
• If branches are untaken half the time, and if it costs little to discard the instructions, this
optimization halves the cost of control hazards
• How do we discard an instruction?
• Discarding instructions means flushing the instructions in the IF, ID, and EX stages of the pipeline
• To discard instructions, change the original control values to 0s as we did to stall for a data hazard
• What is the difference between handling a data hazard and a control hazard, other than changing the control values to zero?
• The difference is that we must also change the three instructions in the IF, ID, and EX stages when the
branch reaches the MEM stage
• For load-use stalls, we just change control to 0 in the ID stage and let them percolate through the pipeline
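• A minimal sketch of discarding the three in-flight instructions when a branch resolved in MEM turns out to be taken (field names illustrative): zero the fetched instruction and the control fields, then redirect the PC to the branch target.

    def zero_control():
        return {"RegWrite": 0, "MemWrite": 0, "MemRead": 0, "Branch": 0}

    def flush_for_taken_branch(state, branch_target):
        state["IF_ID"]["instr"] = 0                   # the instruction in IF becomes a nop
        state["ID_EX"]["control"] = zero_control()    # the instruction that was in ID
        state["EX_MEM"]["control"] = zero_control()   # the instruction that was in EX
        state["PC"] = branch_target                   # fetching continues at the target
        return state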
Control Hazards
Reducing the Delay of Branches
• One way to improve branch performance is to reduce the cost of the
taken branch.
• Thus far, we have assumed the next PC for a branch is selected in the
MEM stage, but if we move the branch execution earlier in the pipeline,
then fewer instructions need be flushed
• The MIPS architecture was designed to support fast single-cycle
branches that could be pipelined with a small branch penalty
• The designers observed that many branches rely only on simple tests
(equality or sign, for example) and that such tests do not require a full
ALU operation but can be done with at most a few gates
• When a more complex branch decision is required, a separate instruction
that uses an ALU to perform a comparison is required—a situation that is
similar to the use of condition codes for branches (see Chapter 2).
Control Hazards
Reducing the Delay of Branches
• Moving the branch decision up requires two actions to
occur earlier
• Computing the branch target address
• Evaluating the branch decision
• The easy part of this change is to move up the branch
address calculation
• We already have the PC value and the immediate field in
the IF/ID pipeline register
• We just move the branch adder from the EX-stage to the ID stage
• The branch target address calculation will be performed for all
instructions, but only used when needed
Control Hazards
Reducing the Delay of Branches
• The harder part is the branch decision itself
• For branch equal, we would compare the two registers
read during the ID stage to see if they are equal
• Equality can be tested by first exclusive-ORing their respective bits and then ORing all the results (see the sketch below)
• Moving the branch test to the ID stage implies
additional forwarding and hazard detection hardware
• Since a branch dependent on a result still in the pipeline must
still work properly with this optimization
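• A sketch of that equality test on two 32-bit register values: XOR the bits, OR the results, and take the branch only if the OR is zero.

    def regs_equal(rs_val, rt_val):
        diff = (rs_val ^ rt_val) & 0xFFFFFFFF   # 32 XOR gates, one per bit
        return diff == 0                        # a big OR over all bits, then inverted

    print(regs_equal(7, 7))                     # True  -> branch taken
    print(regs_equal(7, 9))                     # False -> fall through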
Control Hazards
Reducing the Delay of Branches
• For example, to implement branch on equal (and its inverse), we will need to forward
results to the equality test logic that operates during ID. There are two complicating
factors:
1. During ID, we must decode the instruction, decide whether a bypass to the equality unit
is needed, and complete the equality comparison so that if the instruction is a branch,
we can set the PC to the branch target address. Forwarding for the operands of
branches was formerly handled by the ALU forwarding logic, but the introduction of the
equality test unit in ID will require new forwarding logic. Note that the bypassed source
operands of a branch can come from either the ALU/MEM or MEM/WB pipeline latches.
2. Because the values in a branch comparison are needed during ID but may be produced
later in time, it is possible that a data hazard can occur and a stall will be needed. For
example, if an ALU instruction immediately preceding a branch produces one of the
operands for the comparison in the branch, a stall will be required, since the EX stage
for the ALU instruction will occur after the ID cycle of the branch. By extension, if a load
is immediately followed by a conditional branch that is on the load result, two stall
cycles will be needed, as the result from the load appears at the end of the MEM cycle
but is needed at the beginning of ID for the branch
Control Hazards
Reducing the Delay of Branches
• Despite these difficulties, moving the branch execution to
the ID stage is an improvement, because it reduces the
penalty of a branch to only one instruction if the branch is
taken, namely, the one currently being fetched
• The exercises explore the details of implementing the
forwarding path and detecting the hazard.
• To flush instructions in the IF stage, we add a control line,
called IF.Flush, that zeros the instruction field of the IF/ID
pipeline register. Clearing the register transforms the
fetched instruction into a nop, an instruction that has no
action and changes no state.
Control Hazards
Dynamic Branch Prediction
• Assuming a branch is not taken is one simple form of branch prediction
• In that case, we predict that branches are untaken, flushing the pipeline when
we are wrong
• For the simple five-stage pipeline, such an approach, possibly coupled
with compiler-based prediction, is probably adequate
• With deeper pipelines, the branch penalty increases when measured in
clock cycles
• Similarly, with multiple issue, the branch penalty increases in terms of
instructions lost
• This combination means that in an aggressive pipeline, a simple static
prediction scheme will probably waste too much performance
• As we mentioned in Section 4.5, with more hardware it is possible to try
to predict branch behavior during program execution
Control Hazards
Dynamic Branch Prediction
• One approach is to look up the address of the instruction
to see if a branch was taken the last time this instruction
was executed
• If so, fetching begins from the same place as the last time
• This technique is called dynamic branch prediction.
• One implementation of that approach is a branch
prediction buffer or branch history table
• A branch prediction buffer is a small memory indexed by the
lower portion of the address of the branch instruction
• The memory contains a bit that says whether the branch was
recently taken or not
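• A minimal sketch of such a buffer, assuming 1024 one-bit entries indexed by low-order bits of the (word-aligned) branch address:

    ENTRIES = 1024
    bpb = [False] * ENTRIES                     # False = predict not taken

    def index(pc):
        return (pc >> 2) % ENTRIES              # low-order bits of the word address

    def predict(pc):
        return bpb[index(pc)]                   # only a hint; another branch may alias here

    def update(pc, taken):
        bpb[index(pc)] = taken                  # store the most recent outcome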
Control Hazards
Dynamic Branch Prediction
• This is the simplest sort of buffer
• We don't know, in fact, whether the prediction is the right one; it may have been put there by another branch that has the same low-order address bits.
• However, this doesn't affect correctness
• Prediction is just a hint that we hope is correct, so fetching begins in
the predicted direction
• If the hint turns out to be wrong, the incorrectly predicted
instructions are deleted, the prediction bit is inverted and stored
back, and the proper sequence is fetched and executed
• This simple 1-bit prediction scheme has a performance shortcoming:
• Even if a branch is almost always taken, we can predict incorrectly twice,
rather than once, when it is not taken
• The following example shows this dilemma
Control Hazards
Dynamic Branch Prediction
• Ideally, the accuracy of the predictor would match the taken branch
frequency for these highly regular branches
• To remedy this weakness, 2-bit prediction schemes are often used
• In a 2-bit scheme, a prediction must be wrong twice before it is changed
• Figure 4.63 shows the finite-state machine for a 2-bit prediction
scheme
• A branch prediction buffer can be implemented as a small, special buffer
accessed with the instruction address during the IF pipe stage
• If the instruction is predicted as taken, fetching begins from the target as
soon as the PC is known; it can be as early as the ID stage
• Otherwise, sequential fetching and executing continue. If the
prediction turns out to be wrong, the prediction bits are changed as
shown in Figure 4.63
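• The 2-bit scheme of Figure 4.63 behaves like a saturating counter. A minimal sketch: states 2 and 3 predict taken, states 0 and 1 predict not taken, and the prediction flips only after two consecutive mispredictions.

    class TwoBitPredictor:
        def __init__(self):
            self.counter = 0                    # 0 = strongly not taken ... 3 = strongly taken

        def predict(self):
            return self.counter >= 2            # predict taken in the two upper states

        def update(self, taken):
            self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)

    # A loop branch: taken nine times, then it falls out once.
    p, wrong = TwoBitPredictor(), 0
    for taken in [True] * 9 + [False]:
        wrong += p.predict() != taken
        p.update(taken)
    print(wrong)                                # 3 mispredictions: two warm-up, one at loop exit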
Control Hazards
Dynamic Branch Prediction
• Elaboration: In a five-stage pipeline, we can make the
control hazard a feature by redefining the branch. A
delayed branch always executes the following instruction,
but the second instruction following the branch will be
affected by the branch
• Compilers and assemblers try to place an instruction that
always executes after the branch in the branch delay slot
• The job of the software is to make the successor instructions
valid and useful
• Figure 4.64 shows the three ways in which the branch delay slot
can be scheduled
Control Hazards
Dynamic Branch Prediction
Control Hazards
Dynamic Branch Prediction
• The top box in each pair shows the code before scheduling; the bottom box shows the scheduled code
• In (a), the delay slot is scheduled with an independent instruction from before the
branch
• This is the best choice
• Strategies (b) and (c) are used when (a) is not possible
• In the code sequences for (b) and (c), the use of $s1 in the branch condition prevents
the add instruction (whose destination is $s1) from being moved into the branch delay
slot. In (b) the branch delay slot is scheduled from the target of the branch; usually the
target instruction will need to be copied because it can be reached by another path
• Strategy (b) is preferred when the branch is taken with high probability, such as a loop
branch. Finally, the branch may be scheduled from the not-taken fall-through as in (c)
• To make this optimization legal for (b) or (c), it must be OK to execute the sub
instruction when the branch goes in the unexpected direction. By “OK” we mean that
the work is wasted, but the program will still execute correctly
• This is the case, for example, if $t4 were an unused temporary register when the
branch goes in the unexpected direction.
