2.lecture 4,5,6 Basic Processing Unit

Computer Architecture and Organizations

Professor Dr. Rafiqul Islam

Department of CSE

Fundamental Concepts
o The processor fetches one instruction at a time and performs the operation
specified.
o Instructions are fetched from successive memory locations until a branch or
a jump instruction is encountered.
o The processor keeps track of the address of the memory location containing the
next instruction to be fetched using the Program Counter (PC).
o The Instruction Register (IR) holds the instruction that is currently being executed.

Instruction Subset

Instruction Format

Implementing MIPS
• Simplified to contain only
• arithmetic-logic instructions: add, sub, and, or, slt
• memory-reference instructions: lw, sw
• control-flow instructions: beq, j

Implementing MIPS: the Fetch/Execute Cycle
• High-level abstract view of fetch/execute implementation
• use the program counter (PC) to read instruction address
• fetch the instruction from memory and increment PC
• use fields of the instruction to select registers to read
• execute depending on the instruction
• repeat…

• Fetch the next instruction to be executed from memory.
• Decode the opcode.
• Read operand(s) from main memory, if any.
• Execute the instruction and store results.
• Go to the first step.

Processor Implementation Styles
• Single Cycle
• perform each instruction in 1 clock cycle
• clock cycle must be long enough for the slowest instruction
• disadvantage: every instruction runs only as fast as the slowest one
• Multi-Cycle
• break fetch/execute cycle into multiple steps
• perform 1 step in each clock cycle
• advantage: each instruction uses only as many cycles as it needs
• Pipelined
• execute each instruction in multiple steps
• perform 1 step / instruction in each clock cycle
• process multiple instructions in parallel – assembly line

Executing an Instruction
o Fetch the contents of the memory location pointed to by the PC. The
contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
o Assuming that the memory is byte addressable and each instruction is 4 bytes
long, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
o Carry out the actions specified by the instruction in the IR (execution
phase).
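
The two fetch-phase transfers can be sketched in a few lines of Python. This is a minimal illustration rather than a processor model: the dictionary `memory`, the function name `fetch`, and the sample addresses and instruction words are assumptions made for the example.

```python
def fetch(memory, pc):
    """One fetch phase: return the new (IR, PC) pair."""
    ir = memory[pc]   # IR <- [[PC]]: load the word that the PC points to
    pc = pc + 4       # PC <- [PC] + 4: byte-addressable memory, 4-byte instructions
    return ir, pc


# Hypothetical program: two instruction words at consecutive word addresses.
memory = {0x100: 0x01094820, 0x104: 0x8D49FFFC}
ir, pc = fetch(memory, 0x100)
print(hex(ir), hex(pc))
```

Calling `fetch` again with the returned PC reads the next sequential instruction, which is the behaviour described until a branch or jump changes the PC.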

Components of a Processor

High Level view of Micro-architecture

Processor Organization
[Figure 7.1. Single-bus organization of the datapath inside a processor. Labeled components: internal processor bus; MAR (address lines) and MDR (data lines) connected to the memory bus; IR; instruction decoder and control logic (control signals); registers R0 to R(n-1); Y; TEMP; Constant 4; Select MUX; ALU with inputs A and B, ALU control lines (Add, Sub, XOR) and Carry-in. Slide note: MDR has two inputs and two outputs.]

Executing an Instruction
o Transfer a word of data from one processor register to another or to the ALU.

o Perform an arithmetic or a logic operation and store the result in a processor register.

o Fetch the contents of a given memory location and load them into a processor register.

o Store a word of data from a processor register into a given memory location.

[Figure 7.2. Input and output gating for the registers in Figure 7.1. Labeled signals: Riout, Yin, Constant 4, Select MUX, ALU inputs A and B, Zin, Zout.]

Register Transfers
o All operations and data transfers are controlled by the processor clock.
[Figure 7.3. Input and output gating for one register bit. Labeled signals: bus, flip-flop inputs and outputs D and Q, Riin, Riout, Clock.]

Performing an Arithmetic or Logic Operation
o The ALU is a combinational circuit that has no internal storage.
o ALU gets the two operands from MUX and bus. The result is temporarily
stored in register Z.

Performing an Arithmetic or Logic Operation
o What is the sequence of operations to add the contents of register R1 to
those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
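
The three control steps can be traced in software. The sketch below is only an illustration: the function name, the dictionary of registers, and the sample values are assumptions, and overflow is ignored.

```python
def add_registers(regs, src1, src2, dst):
    """Trace the three control steps for dst <- [src1] + [src2] on a single bus."""
    # Step 1: src1out, Yin (first operand travels over the bus into Y)
    bus = regs[src1]
    y = bus
    # Step 2: src2out, SelectY, Add, Zin (MUX picks Y, second operand comes from the bus)
    bus = regs[src2]
    z = y + bus
    # Step 3: Zout, dstin (result travels over the bus into the destination register)
    bus = z
    regs[dst] = bus
    return regs


regs = add_registers({"R1": 5, "R2": 7, "R3": 0}, "R1", "R2", "R3")
print(regs["R3"])
```

Three steps are needed because the single bus can carry only one value per clock cycle, so Y and Z hold the intermediate values.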

Fetching a Word from Memory
o Address into MAR; issue Read operation; data into MDR.

[Figure 7.4. Connection and control signals for register MDR. Labeled signals: memory-bus data lines, internal processor bus, MDRin, MDRout, MDRinE, MDRoutE.]

Fetching a Word from Memory
o The response time of each memory access varies (cache miss, memory-mapped I/O, …).
o To accommodate this, the processor waits until it receives an indication
that the requested operation has been completed (Memory-Function-Completed, MFC).
o Example: Move (R1), R2
o MAR ← [R1]
o Start a Read operation on the memory bus
o Wait for the MFC response from the memory
o Load MDR from the memory bus
o R2 ← [MDR]

Timing
[Figure 7.5. Timing of a memory Read operation. Signals shown over steps 1 to 3: Clock, MARin, Address, Read, MDRinE, Data, MFC, MDRout. Annotations, in order: MAR ← [R1]; start a Read operation on the memory bus; wait for the MFC response from the memory; load MDR from the memory bus; R2 ← [MDR]. Slide note: assume MAR is always available on the address lines of the memory bus.]

Execution of a Complete Instruction
o Example: Add (R3), R1
o Fetch the instruction
o Fetch the first operand (the contents of the memory location pointed to by R3)
o Perform the addition
o Load the result into R1

Example: Instructions
• R-format Instruction. Execution of an R-format instruction (e.g., add $t1, $t0, $t1) using the datapath
• Fetch instruction from instruction memory and increment PC
• Input registers (e.g., $t0 and $t1) are read from the register file
• ALU operates on data from register file using the funct field of the MIPS instruction (Bits 5-0) to help select the ALU operation
• Result from ALU written into register file using bits 15-11 of instruction to select the destination register (e.g., $t1).

• Load/Store Instruction. Execution of a load/store instruction (e.g., lw $t1, offset($t2)) using the datapath
• Fetch instruction from instruction memory and increment PC
• Read register value (e.g., base address in $t2) from the register file
• ALU adds the base address from register $t2 to the sign-extended lower 16 bits of the instruction (i.e., offset)
• Result from ALU is applied as an address to the data memory
• Data retrieved from the memory unit is written into the register file, where the register index is given by $t1 (Bits 20-16 of the instruction).
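
The bit ranges named above can be checked by decoding two instruction words. This is a sketch under stated assumptions: the function name `decode` is hypothetical, and the two sample words are hand-assembled encodings of add $t1, $t0, $t1 and lw $t1, -4($t2) using the standard MIPS register numbers ($t0 = 8, $t1 = 9, $t2 = 10); the offset -4 is chosen only for illustration.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields used by the datapath."""
    imm16 = word & 0xFFFF
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21: first source / base register
        "rt": (word >> 16) & 0x1F,       # bits 20-16: second source, or lw destination
        "rd": (word >> 11) & 0x1F,       # bits 15-11: R-format destination
        "funct": word & 0x3F,            # bits 5-0: helps select the ALU operation
        # sign-extend the lower 16 bits (the offset of lw/sw/beq)
        "imm": imm16 - 0x10000 if imm16 & 0x8000 else imm16,
    }


add_fields = decode(0x01094820)   # add $t1, $t0, $t1
lw_fields = decode(0x8D49FFFC)    # lw $t1, -4($t2)
print(add_fields["rd"], add_fields["funct"], lw_fields["rt"], lw_fields["imm"])
```

All fields are extracted from every word; which ones are meaningful depends on the instruction format selected by the opcode.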

Example: Instructions
• Branch Instruction. Execution of a branch instruction (e.g., beq $t1,
$t2, offset) using the datapath
• Fetch instruction from instruction memory and increment PC
• Read registers (e.g., $t1 and $t2) from the register file. The adder sums PC + 4
and the sign-extended lower 16 bits of the instruction (the offset), shifted left by two bits, thereby
producing the branch target address (BTA).
• ALU subtracts the contents of $t2 from the contents of $t1. The Zero output of the
ALU directs which result (PC+4 or BTA) to write as the new PC.
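
The branch target calculation and the role of the Zero output can be written out as a short sketch. The function names and the sample PC value are assumptions; the offset 40 is taken from the beq examples used later in these slides.

```python
def branch_target(pc, offset16):
    """BTA = (PC + 4) + (sign-extended 16-bit offset shifted left by two bits)."""
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(pc, offset16, rs_value, rt_value):
    """Select the new PC for beq: BTA if the ALU subtraction gives zero, else PC + 4."""
    zero = (rs_value - rt_value) == 0
    return branch_target(pc, offset16) if zero else pc + 4


print(hex(next_pc(0x1000, 40, 3, 3)), hex(next_pc(0x1000, 40, 3, 4)))
```

The shift by two converts a word offset into a byte offset, which is why the 16-bit field can reach four times further than its raw value suggests.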

Architecture
[Figure 7.2, repeated. Input and output gating for the registers in Figure 7.1.]

Execution of a Complete Instruction
Instruction: Add (R3), R1

Step Action
1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
[Figure 7.1, repeated. Single-bus organization of the datapath inside a processor.]

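
The seven-step control sequence for Add (R3),R1 can be traced step by step. The sketch below is an illustration only: the function name, the dictionary-based register and memory model, and the sample addresses and values are assumptions, and memory is treated as responding by the time WMFC is reached.

```python
def run_add_indirect(regs, memory, pc):
    """Trace the seven control steps for an Add (R3),R1 instruction stored at address pc."""
    s = dict(regs, PC=pc, MAR=0, MDR=0, IR=None, Y=0, Z=0)

    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    bus = s["PC"]
    s["MAR"] = bus
    s["Z"] = 4 + bus                 # MUX selects the constant 4
    # Step 2: Zout, PCin, Yin, WMFC
    bus = s["Z"]
    s["PC"] = bus
    s["Y"] = bus
    s["MDR"] = memory[s["MAR"]]      # the Read started in step 1 completes (MFC)
    # Step 3: MDRout, IRin
    s["IR"] = s["MDR"]
    # Step 4: R3out, MARin, Read
    s["MAR"] = s["R3"]
    # Step 5: R1out, Yin, WMFC
    s["Y"] = s["R1"]
    s["MDR"] = memory[s["MAR"]]      # the memory operand arrives (MFC)
    # Step 6: MDRout, SelectY, Add, Zin
    s["Z"] = s["Y"] + s["MDR"]       # MUX selects Y
    # Step 7: Zout, R1in, End
    s["R1"] = s["Z"]
    return s


state = run_add_indirect({"R1": 10, "R3": 0x200},
                         {0x100: "Add (R3),R1", 0x200: 32}, 0x100)
print(state["R1"], hex(state["PC"]), state["IR"])
```

Steps 1 to 3 are the fetch phase and are the same for every instruction; steps 4 to 7 are specific to Add (R3),R1.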
Execution of Branch Instructions
o A branch instruction replaces the contents of PC with the branch
target address, which is usually obtained by adding an offset X, given
in the branch instruction, to the updated contents of the PC.
o The offset X is usually the difference between the branch target
address and the address immediately following the branch instruction.
o Conditional branch

Execution of Branch Instructions
Step Action
1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 Offset-field-of-IRout, Add, Zin
5 Zout, PCin, End

Figure 7.7. Control sequence for an unconditional branch instruction.

Multiple-Bus Organization
o Instruction: Add R4, R5, R6

Step Action
1 PCout, R=B, MARin, Read, IncPC
2 WMFC
3 MDRoutB, R=B, IRin
4 R4outA, R5outB, SelectA, Add, R6in, End

Figure 7.9. Control sequence for the instruction Add R4,R5,R6 for the three-bus organization in Figure 7.8.

Quiz
o What is the control sequence for execution of the instruction
Add R1, R2
including the instruction fetch phase? (Assume single bus architecture)
[Figure 7.1, repeated. Single-bus organization of the datapath inside a processor.]

A Complete Processor
[Figure 7.14. Block diagram of a complete processor. Blocks shown: instruction unit, integer unit, floating-point unit, instruction cache, data cache, and bus interface inside the processor; system bus connecting main memory and input/output.]

Datapath and Control

• Datapath is the hardware that performs all the required operations, for
example, ALU, registers, and internal buses.

• Control is the hardware that tells the datapath what to do, in terms of
switching, operation selection, data movement between ALU
components, etc.

Hardwired Control Unit Organization
[Figure 7.10. Control unit organization. Blocks shown: clock (CLK) driving the control step counter; decoder/encoder with inputs from the IR, external inputs, and condition codes; its outputs are the control signals.]

Detailed Block Description
[Figure 7.11. Separation of the decoding and encoding functions. Blocks shown: clock (CLK) and control step counter with Reset; step decoder producing T1, T2, …, Tn; instruction decoder on the IR producing INS1, INS2, …, INSm; encoder with external inputs and condition codes, producing the control signals together with Run and End.]

Overview
Step Action
1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.

Micro-programmed Control
o Control store
[Figure 7.16. Basic organization of a microprogrammed control unit. Blocks shown: IR, starting address generator, clock, microprogram counter (µPC), control store, control word (CW). Slide note: one function cannot be carried out by this simple organization.]

Micro-programmed Control
o The previous organization cannot handle the situation when the control unit is required to check the
status of the condition codes or external inputs to choose between alternative courses of action.

o Use conditional branch microinstruction.

Address Microinstruction
0 PCout, MARin, Read, Select4, Add, Zin
1 Zout, PCin, Yin, WMFC
2 MDRout, IRin
3 Branch to starting address of appropriate microroutine
...
25 If N=0, then branch to microinstruction 0
26 Offset-field-of-IRout, SelectY, Add, Zin
27 Zout, PCin, End

Figure 7.17. Microroutine for the instruction Branch<0.

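
The sequencing of this microroutine, including the conditional branch at address 25, can be sketched as a loop over a control store. This is an illustration under assumptions: the dictionary `CONTROL_STORE`, the function name, and the way the branch microinstructions are recognised by address are all choices made for the example, and the control signals themselves are not acted on.

```python
# Control store contents copied from Figure 7.17 (addresses 4 to 24 omitted).
CONTROL_STORE = {
    0: "PCout, MARin, Read, Select4, Add, Zin",
    1: "Zout, PCin, Yin, WMFC",
    2: "MDRout, IRin",
    3: "Branch to starting address of appropriate microroutine",
    25: "If N=0, then branch to microinstruction 0",
    26: "Offset-field-of-IRout, SelectY, Add, Zin",
    27: "Zout, PCin, End",
}
BRANCH_LT_ZERO_START = 25   # starting address of the Branch<0 microroutine


def run_branch_lt_zero(n_flag):
    """Return the microinstruction addresses visited for one Branch<0 instruction."""
    trace = []
    upc = 0
    while True:
        trace.append(upc)
        if upc == 3:
            upc = BRANCH_LT_ZERO_START      # starting address comes from the IR
        elif upc == 25 and n_flag == 0:
            break                           # condition false: go back to fetch at address 0
        elif upc == 27:
            break                           # End: go back to fetch at address 0
        else:
            upc += 1                        # default: next sequential microinstruction
    return trace


print(run_branch_lt_zero(1), run_branch_lt_zero(0))
```

When N = 1 the routine runs through addresses 26 and 27 and loads the branch target into the PC; when N = 0 it stops at address 25, so the PC keeps the incremented value from the fetch phase.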
Micro-programmed Control
[Figure 7.18. Organization of the control unit to allow conditional branching in the microprogram. Blocks shown: IR, external inputs, and condition codes feeding the starting and branch address generator; clock; µPC; control store; control word (CW).]

Microinstructions
o A straightforward way to structure microinstructions is to assign one
bit position to each control signal.
o However, this is very inefficient: the microinstructions become long, and
only a few bits are set to 1 in any one of them.
o The length can be reduced: most signals are not needed
simultaneously, and many signals are mutually exclusive.
o Mutually exclusive signals are placed in the same group, and each group is
binary coded.
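
The saving from grouping can be estimated with a small calculation. The grouping below is hypothetical: it uses signal names that appear in the control sequences of these slides (plus an assumed Write signal) and places signals in the same group only when they never appear in the same step of those sequences. Each group reserves one extra code for "no signal active".

```python
from math import ceil, log2

# Hypothetical groups of mutually exclusive control signals.
GROUPS = [
    ["PCout", "MDRout", "Zout", "R1out", "R3out", "Offset-field-of-IRout"],
    ["PCin", "IRin", "Zin", "R1in"],
    ["MARin", "Yin"],
    ["Add", "Sub", "XOR"],
    ["Select4", "SelectY"],
    ["Read", "Write"],       # Write is assumed; it is not used in the slides' sequences
    ["WMFC"],
    ["End"],
]


def one_bit_per_signal_width(groups):
    """Microinstruction length when every control signal gets its own bit."""
    return sum(len(group) for group in groups)


def grouped_width(groups):
    """Length when each group is binary coded, with one code meaning 'none'."""
    return sum(ceil(log2(len(group) + 1)) for group in groups)


print(one_bit_per_signal_width(GROUPS), grouped_width(GROUPS))
```

The price of the shorter word is a decoder per group between the control store and the datapath.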

Pipelining

Making the Execution of Programs Faster
o Use faster circuit technology to build the processor and the main
memory.
o Arrange the hardware so that more than one operation can be
performed at the same time.
o In the latter way, the number of operations performed per second is
increased even though the elapsed time needed to perform any one
operation is not changed.

Traditional Pipeline Concept
o Laundry Example
o Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
o Washer takes 30 minutes
o Dryer takes 40 minutes
o “Folder” takes 20 minutes

Traditional Pipeline Concept
o Sequential laundry takes 6 hours for 4 loads
o If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight. The four loads run one after another, each taking 30 + 40 + 20 minutes.]

Traditional Pipeline Concept
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline starting at 6 PM with loads A to D in task order. Stage times along the time axis: 30, 40, 40, 40, 40, 20 minutes.]

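
The 6-hour and 3.5-hour figures can be reproduced with a short calculation. This is a sketch with assumed function names; it also assumes that a finished load may wait between machines until the next machine is free.

```python
STAGES = [30, 40, 20]   # washer, dryer, "folder" times in minutes


def sequential_minutes(loads, stages=STAGES):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGES):
    """Each stage starts a load as soon as both the load and the stage are ready."""
    stage_free_at = [0] * len(stages)
    for _ in range(loads):
        ready = 0
        for i, duration in enumerate(stages):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
    return stage_free_at[-1]


print(sequential_minutes(4) / 60, pipelined_minutes(4) / 60)
```

The pipelined total equals 30 + 4 × 40 + 20 minutes: the 40-minute dryer is the slowest stage and sets the rate once the pipeline is full.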
Traditional Pipeline Concept
o Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
o Pipeline rate limited by slowest pipeline stage
o Multiple tasks operating simultaneously using different resources
o Potential speedup = Number of pipe stages
o Unbalanced lengths of pipe stages reduce speedup
o Time to “fill” pipeline and time to “drain” it reduce speedup
o Stall for Dependences

Use the Idea of Pipelining in a Computer
Fetch + Execution

(a) Sequential execution: F1 E1 F2 E2 F3 E3 for instructions I1, I2, I3

(b) Hardware organization: instruction fetch unit and execution unit, separated by interstage buffer B1

(c) Pipelined execution

Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3

Figure 8.1. Basic idea of instruction pipelining.

Use the Idea of Pipelining in a Computer
Fetch + Decode + Execution + Write

Clock cycle   1    2    3    4    5    6    7
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4

(a) Instruction execution divided into four steps

F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
Interstage buffers B1, B2, B3 separate the four stages.

(b) Hardware organization

Figure 8.2. A 4-stage pipeline. (Textbook page: 457)

Pipelined vs. Single-Cycle Instruction
Execution: the Plan
[Figure, single-cycle: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg, and a new instruction starts every 8 ns.]

Assume 2 ns for memory access and ALU operation, and 1 ns for register access;
therefore, the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns.

[Figure, pipelined: the same three lw instructions pass through the same five steps, and a new instruction starts every 2 ns.]

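
The timing assumptions can be turned into a small calculation. The 8 ns single-cycle clock is the sum of the five lw steps (2 + 1 + 2 + 2 + 1 ns), and the 2 ns pipelined clock is set by the slowest step. The function names below are assumptions, and the pipelined formula assumes no stalls.

```python
def single_cycle_ns(instructions, cycle_ns=8):
    """Total time when every instruction takes one long clock cycle."""
    return instructions * cycle_ns


def pipelined_ns(instructions, stages=5, cycle_ns=2):
    """Total time with no stalls: fill the pipeline, then finish one instruction per cycle."""
    return (stages + instructions - 1) * cycle_ns


for n in (3, 1000):
    print(n, single_cycle_ns(n), pipelined_ns(n))
```

For the three lw instructions the totals are 24 ns and 14 ns. As the instruction count grows, the ratio approaches 8 / 2 = 4 rather than the number of stages (5), because the stages are not equally long.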
Role of Cache Memory
o Each pipeline stage is expected to complete in one clock cycle.
o The clock period should be long enough to let the slowest pipeline stage
complete.
o Faster stages can only wait for the slowest one to complete.
o Since main memory is very slow compared to the execution, if each
instruction needs to be fetched from main memory, the pipeline is almost
useless.
o Fortunately, we have cache.

Pipeline Performance
o The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
o However, this increase would be achieved only if all pipeline stages
require the same time to complete, and there is no interruption
throughout program execution.
o Unfortunately, these conditions do not hold in practice.

Pipeline Performance
[Figure 8.3. Effect of an execution operation taking more than one clock cycle. Pipeline diagram for instructions I1 to I5 (steps F, D, E, W) over clock cycles 1 to 9.]

Pipeline Performance
o The previous pipeline is said to have been stalled for two clock cycles.
o Any condition that causes a pipeline to stall is called a hazard.
o Data hazard – any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. So some operation has to be
delayed, and the pipeline stalls.
o Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline
to stall.
o Structural hazard – the situation when two instructions require the use of a given hardware
resource at the same time.

Pipeline Performance
Instruction hazard

(a) Instruction execution steps in successive clock cycles: instructions I1, I2, I3 (steps F, D, E, W) over clock cycles 1 to 9

(b) Function performed by each processor stage in successive clock cycles

Clock cycle   1    2    3    4    5    6    7    8    9
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
W: Write                     W1   idle idle idle W2   W3

Idle periods are stalls (bubbles).

Figure 8.4. Pipeline stall caused by a cache miss in F2.

Pipeline Performance
Structural hazard: Load X(R1), R2

[Figure 8.5. Effect of a Load instruction on pipeline timing. Instructions I1 to I5 over clock cycles 1 to 7; I2 (Load) has an extra memory-access step M2 between E2 and W2.]

Pipeline Performance
o Again, pipelining does not result in individual instructions being executed
faster; rather, it is the throughput that increases.
o Throughput is measured by the rate at which instruction execution is
completed.
o Pipeline stall causes degradation in pipeline performance.
o We need to identify all hazards that may cause the pipeline to stall and to
find ways to minimize their impact.

Pipelined Implementation of Data path and
Control
o We now move to actually building a pipelined datapath
o First recall the 5 steps in instruction execution
o Instruction Fetch & PC Increment (IF)
o Instruction Decode and Register Read (ID)
o Execution or calculate address (EX)
o Memory access (MEM)
o Write result into register (WB)
o Review: single-cycle processor
• all 5 steps done in a single clock cycle
• dedicated hardware required for each step
o What happens if we break the execution into multiple cycles, but keep
the extra hardware?

Datapath Design
[Figure 7.8. Three-bus organization of the datapath. Components shown: Bus A, Bus B, Bus C; PC with incrementer; Constant 4 and MUX at ALU input A; ALU with result R; instruction decoder; MDR; MAR; memory bus data lines and address lines.]

Review - Single-Cycle Datapath “Steps”
[Figure: single-cycle datapath divided into five steps: IF (Instruction Fetch), ID (Instruction Decode), EX (Execute/Address Calc.), MEM (Memory Access), WB (Write Back). Components shown: PC, two adders (one adding 4, one fed through <<2), instruction memory (ADDR, RD), register file (RN1, RN2, WN, WD, RD1, RD2), 16-to-32-bit sign extension, ALU with Zero output, data memory (ADDR, WD, RD), multiplexers.]

Pipelined Design

- Separate instruction and data caches
- PC is connected to IMAR
- DMAR
- Separate MDR
- Buffers for ALU
- Instruction queue
- Instruction decoder output

Operations that can be performed independently:
- Reading an instruction from the instruction cache
- Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two regs
- Writing into one register in the reg file
- Performing an ALU operation

[Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU. Components shown: register file; Bus A, Bus B, Bus C; ALU with inputs A and B and result R; PC with incrementer; instruction decoder and control signal pipeline; instruction queue; IMAR (memory address for instruction fetches); DMAR (memory address for data access); MDR/Write and MDR/Read; instruction cache; data cache.]

[Figure 8.7. Operand forwarding in a pipelined processor. (a) Datapath: Source 1 and Source 2 feed registers SRC1 and SRC2 at the ALU inputs; the ALU output goes to RSLT and then to Destination. (b) Position of the source and result registers in the processor pipeline: SRC1, SRC2 at E: Execute (ALU); RSLT at W: Write (register file); a forwarding path is marked between them.]

Pipelined Example
o Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10

Alternative View –
Multiple-Clock-Cycle Diagram
Instruction           CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8
lw $t0, 10($t1)       IM    REG   ALU   DM    REG
sw $t3, 20($t4)             IM    REG   ALU   DM    REG
add $t5, $t6, $t7                 IM    REG   ALU   DM    REG
sub $t8, $t9, $t10                      IM    REG   ALU   DM    REG

Hazards

Data Hazards
o We must ensure that the results obtained when instructions are executed in a pipelined processor
are identical to those obtained when the same instructions are executed sequentially.
o Hazard occurs
A ← 3 + A
B ← 4 × A
o No hazard
A ← 5 × C
B ← 20 + C
o When two operations depend on each other, they must be executed sequentially in the correct order.
o Another example:
Mul R2, R3, R4
Add R5, R4, R6

Data Hazards
[Figure 8.6. Pipeline stalled by data dependency between D2 and W1. Instructions I1 (Mul), I2 (Add), I3, I4 over clock cycles 1 to 9; the decode step of I2 is shown as D2 followed by D2A.]

Handling Data Hazards in Software
o Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
o The compiler can reorder the instructions to perform some useful
work during the NOP slots.
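
A compiler pass of this kind can be sketched as follows. This is a simplified illustration, not how any particular compiler works: it assumes three-operand instructions with the destination written last (as in Mul R2, R3, R4), a 4-stage F/D/E/W pipeline in which operands are read in D and results written in W, no forwarding, and an input program that contains no NOPs. Under those assumptions a consumer must be at least three instruction slots behind its producer, which gives the two NOPs shown above.

```python
NOP = "NOP"
MIN_DISTANCE = 3   # assumed: consumer's D must come after producer's W


def parse(instr):
    """Split 'Op Rs1, Rs2, Rd' into (op, [sources], destination)."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[:-1], regs[-1]


def insert_nops(program):
    """Insert NOPs so that no instruction reads a register too soon after it is written."""
    out = []
    last_write = {}   # register name -> position in `out` of the instruction writing it
    for instr in program:
        _, sources, dest = parse(instr)
        needed = 0
        for src in sources:
            if src in last_write:
                distance = len(out) - last_write[src]
                needed = max(needed, MIN_DISTANCE - distance)
        out.extend([NOP] * needed)
        last_write[dest] = len(out)
        out.append(instr)
    return out


print(insert_nops(["Mul R2, R3, R4", "Add R5, R4, R6"]))
```

A scheduler that finds independent instructions to move into those slots would replace the NOPs with useful work, which is the reordering mentioned in the last bullet.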

Side Effects
o The previous example is explicit and easily detected.
o Sometimes an instruction changes the contents of a register other than the one named as the
destination.
o When a location other than one explicitly named in an instruction as a destination operand is
affected, the instruction is said to have a side effect. (Example?)
o Example: conditional code flags:
Add R1, R3
AddWithCarry R2, R4
o Instructions designed for execution on pipelined hardware should have few side effects.

Structural, Data and Control Hazards

Pipelining MIPS
o What makes it hard?
• structural hazards: different instructions, at different stages, in the pipeline want to use the same
hardware resource
• control hazards: succeeding instruction, to put into pipeline, depends on the outcome of a previous
branch instruction, already in pipeline
• data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still
in the pipeline

o Before actually building the pipelined datapath and control we first briefly examine these potential
hazards individually…

Structural Hazards

o Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline
in the same clock cycle
o E.g., suppose a single (not separate) instruction and data memory in the pipeline below, with one
read port
• then there is a structural hazard between the first and fourth lw instructions

[Figure, pipelined: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), each passing through instruction fetch, Reg, ALU, data access, Reg, with a new instruction starting every 2 ns. Marked on the figure: hazard if single memory.]

o MIPS was designed to be pipelined: structural hazards are easy to avoid!

Control Hazards
o Control hazard: need to make a decision based on the result of a previous instruction still
executing in pipeline
o Solution 1: Stall the pipeline

[Figure: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). beq starts 2 ns after add; a bubble follows beq, so lw starts 4 ns after beq (pipeline stall). Note that branch outcome is computed in ID stage with added hardware (later…).]

Control Hazards
• Solution 2: Predict branch outcome
• e.g., predict branch-not-taken:

[Figure, prediction success: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0) start 2 ns apart with no bubble.]

[Figure, prediction failure: add $4, $5, $6; beq $1, $2, 40; then a bubble; or $7, $8, $9 starts 4 ns after beq. Undo (= flush) lw.]

Control Hazards
o Solution 3: Delayed branch: always execute the sequentially next statement, with the branch
executing after a one-instruction delay. It is the compiler’s job to find a statement that can be put in the slot
and that is independent of branch outcome
• MIPS does this, but it is an option in SPIM (Simulator -> Settings)

[Figure: beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), each starting 2 ns after the previous one. Delayed branch: beq is followed by add that is independent of branch outcome.]

Data Hazards

o Data hazard: instruction needs data from the result of a previous instruction still executing in
pipeline
o Solution: Forward data if possible…

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 with stages IF ID EX MEM WB; shading indicates use (left = write, right = read).]

[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3. Without forwarding (blue line), data has to go back in time; with forwarding (red line), data is available in time.]

Data Hazards
• Forwarding may not be enough
• e.g., if an R-type instruction following a load uses the result of the load (called a load-use data hazard)

[Figure: lw $s0, 20($t1) followed by sub $t2, $s0, $t3. Without a stall it is impossible to provide input to the sub instruction in time.]

[Figure: the same pair with a bubble between them. With a one-stage stall, forwarding can get the data to the sub instruction in time.]

Reordering Code to Avoid Pipeline Stall
(Software Solution)
o Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)     (data hazard: $t2 is used right after it is loaded)
sw $t0, 4($t1)

o Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)     (interchanged)
sw $t2, 0($t1)     (interchanged)
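
The effect of the reordering can be checked by counting load-use stalls in both versions. This is a rough sketch under assumptions: the function names are hypothetical, only lw, sw, and three-register instructions are handled, and every use of a loaded register by the immediately following instruction is charged one stall cycle, as in the load-use example earlier.

```python
import re


def registers(instr):
    """Return the register names in an instruction, in textual order."""
    return re.findall(r"\$\w+", instr)


def load_use_stalls(program):
    """Count instructions that read a register loaded by the instruction just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.split()[0] != "lw":
            continue
        loaded = registers(prev)[0]
        regs = registers(cur)
        # sw reads every register it names; other instructions write the first and read the rest
        reads = regs if cur.split()[0] == "sw" else regs[1:]
        if loaded in reads:
            stalls += 1
    return stalls


original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
print(load_use_stalls(original), load_use_stalls(reordered))
```

The two stores write to different addresses, so interchanging them does not change the result of the program; it only moves the use of $t2 one slot further from its load.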

Exception Handling
