Computer Architecture and Organization
LECTURE TOPICS
Pipelining
Latency
Throughput
Instruction Pipelining in Detail
Pipeline Stages
Pipeline Hazards
Structural Hazards
Data Hazards
Control Hazards
Pipelining
Instruction Branch
Branch Alternatives
WHAT IS A PIPELINE?
Example: laundry
[Figure: sequential laundry. Loads A, B, C, D in task order on a time axis starting at 6 PM, sixteen 30-minute slots.]
[Figure: pipelined laundry. Loads A, B, C, D in task order, overlapped, seven 30-minute slots.]
Start the work ASAP!
Pipelined laundry takes 3.5 hours for 4 loads!
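The 3.5-hour figure can be checked with a small sketch. It assumes four stages per load (inferred from sixteen 30-minute slots for four loads in the sequential timeline); the function names are illustrative, not from the lecture.

```python
# Laundry example: every slot in the timelines is 30 minutes.
SLOT_MINUTES = 30
NUM_STAGES = 4   # assumption: 16 sequential slots / 4 loads = 4 stages per load
NUM_LOADS = 4    # loads A, B, C, D


def sequential_minutes(loads, stages, slot):
    """Each load runs through all stages before the next load starts."""
    return loads * stages * slot


def pipelined_minutes(loads, stages, slot):
    """The first load needs `stages` slots; each later load finishes one slot after the previous."""
    return (stages + loads - 1) * slot


seq_minutes = sequential_minutes(NUM_LOADS, NUM_STAGES, SLOT_MINUTES)
pipe_minutes = pipelined_minutes(NUM_LOADS, NUM_STAGES, SLOT_MINUTES)
print(f"sequential: {seq_minutes / 60} h, pipelined: {pipe_minutes / 60} h")
```

Under these assumptions, each load still takes the same four slots from start to finish; only the total time for all four loads shrinks.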
HARDWARE REQUIREMENTS
Pipelining doesn’t help latency of single task, it helps throughput of entire workload
[Figure: pipelined laundry timeline starting at 6 PM, seven 30-minute slots.]
[Figure: instruction pipeline. A Load and three following instructions in program order, each passing through Ifetch, Reg, ALU, DMem, Reg.]
TIMING OF PIPELINE
NON-PIPELINED
F D O E S F D O E S
time
PIPELINED
F D O E S
  F D O E S
    F D O E S
      F D O E S
time
F - Fetch
D - Decode
O - Operand Fetch
E - Execute
S - Result Store
THREE, FOUR & FIVE STAGE RISC PIPELINE
RISC II
Fetch Instruction | Decode Inst., Select regs. | Execute Inst., Store result
SPARC MB86900, IBM 801
Fetch Instruction | Decode Inst., Select regs. | Execute Inst. | Store result
MIPS, Intel 486
stage \ cc   1  2  3  4  5  6  7
1            a  b  c  d  e  f  g
2            -  a  b  c  d  e  f
3            -  -  a  b  c  d  e
4            -  -  -  a  b  c  d
5            -  -  -  -  a  b  c
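The stage/cycle table above follows one rule: in clock cycle cc, stage s holds the instruction that entered the pipeline s - 1 cycles earlier. A minimal sketch that regenerates the table (the function name is illustrative):

```python
def space_time_table(instructions, num_stages, num_cycles):
    """Return one row per stage; each row lists the instruction in that stage per clock cycle."""
    rows = []
    for stage in range(1, num_stages + 1):
        row = []
        for cc in range(1, num_cycles + 1):
            idx = cc - stage  # 0-based index of the instruction occupying this stage
            row.append(instructions[idx] if 0 <= idx < len(instructions) else "-")
        rows.append(row)
    return rows


table = space_time_table("abcdefg", num_stages=5, num_cycles=7)
for stage, row in enumerate(table, start=1):
    print(stage, " ".join(row))
```

Instruction a leaves stage 5 in cycle 5, and from then on one instruction completes every cycle.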
OTHER NUMBER OF STAGES
INTEL
Pentium I: 7 stages
Pentium II/III: 12 stages
Pentium 4: 22 stages
PIPELINE PERFORMANCE
Structural Hazards
Occur when the same resource is requested by more than one instruction in the same clock cycle
Solutions:
• Duplicate the contended resource to avoid the structural hazard.
• Extend the clashing cycle by stopping the whole pipeline until both memory accesses are finished (stalling).
EXAMPLE: ONE MEMORY PORT / STRUCTURAL HAZARD
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7), for Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order, each passing through Ifetch, Reg, ALU, DMem, Reg.]
Structural Hazard
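Where the clash falls can be found with a short sketch. It assumes a single memory port shared by Ifetch and DMem, one instruction fetched per cycle, and that only the Load accesses data memory (the slide does not say what Instr 1 to Instr 4 are); the function name is illustrative.

```python
def memory_port_conflicts(program):
    """program: list of (name, uses_data_memory) in program order.

    Instruction i (0-based) is fetched in cycle i + 1 and reaches DMem three
    cycles later (Ifetch, Reg, ALU, DMem). With a single memory port, a DMem
    access and an Ifetch in the same cycle collide.
    """
    fetch_cycle = {name: i + 1 for i, (name, _) in enumerate(program)}
    conflicts = []
    for i, (name, uses_data_memory) in enumerate(program):
        if not uses_data_memory:
            continue
        dmem_cycle = (i + 1) + 3
        for other, cycle in fetch_cycle.items():
            if cycle == dmem_cycle:
                conflicts.append((dmem_cycle, name, other))
    return conflicts


program = [
    ("Load", True),
    ("Instr 1", False),  # assumption: no data memory access
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]
conflicts = memory_port_conflicts(program)
print(conflicts)
```

With these assumptions the only collision is in cycle 4, between the Load's data access and the fetch of Instr 3.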
RESOLVING STRUCTURAL HAZARDS
Solution 1: Wait
• must detect the hazard
• must have mechanism to stall
Solution 2: Throw more hardware at the problem
DETECTING AND RESOLVING STRUCTURAL HAZARD
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7), for Load, Instr 1, Instr 2, a Stall row filled with bubbles, then Instr 3; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
ELIMINATING STRUCTURAL HAZARDS AT DESIGN TIME
[Figure: pipelined datapath with separate Instr Cache and Data Cache; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; Next PC mux and Next SEQ PC adder (+4), Reg File (RS1, RS2, RD), Sign Extend (Imm), Zero? test, ALU, muxes, WB Data. Data path and control path are marked.]
ROLE OF INSTRUCTION SET DESIGN IN STRUCTURAL HAZARD RESOLUTION
Data Hazards
Occur when an instruction depends on the result of a previous instruction that has not yet completed.
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for the following instructions in program order:]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
THREE GENERIC DATA HAZARDS
Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3
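RAW dependences can be listed mechanically. The sketch below assumes three-operand register instructions written as `op dest,src1,src2`, as in the example sequence; the helper names are illustrative. It reports dependences only. Whether a dependence turns into a hazard depends on how far apart the two instructions are in the pipeline.

```python
def parse(instr):
    """Split 'add r1,r2,r3' into (opcode, destination, [sources])."""
    opcode, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[0], regs[1:]


def raw_dependences(program):
    """Return (writer index, reader index, register) for every read of a previously written register."""
    last_writer = {}  # register -> index of the most recent instruction writing it
    deps = []
    for j, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for src in sources:
            if src in last_writer:
                deps.append((last_writer[src], j, src))
        last_writer[dest] = j
    return deps


program = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]
deps = raw_dependences(program)
for writer, reader, reg in deps:
    print(f"{program[reader]!r} reads {reg} written by {program[writer]!r}")
```

Every instruction after the add reads r1, so the sketch reports four dependences on instruction 0.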
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for five instructions in program order, the last four being sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
HW CHANGE FOR FORWARDING
[Figure: datapath with forwarding. NextPC, Registers, Immediate, ALU and Data Memory, with pipeline registers ID/EX, EX/MEM, MEM/WR and muxes on the ALU inputs.]
DATA HAZARD EVEN WITH FORWARDING
[Figure: pipeline diagram, time in clock cycles, for a load followed by sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
RESOLVING THIS LOAD HAZARD
Adding hardware? ... not
Detection?
Compilation techniques?
[Figure: pipeline diagram for lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9, with a bubble inserted in each of the three instructions after the load.]
How is this different from the instruction issue stall?
SOFTWARE SCHEDULING TO AVOID LOAD HAZARDS
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f in memory.
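One way to compare candidate schedules is to count load-use stalls. The sketch assumes a five-stage pipeline with forwarding, where an instruction that immediately follows a load and reads the loaded register costs one bubble. The register names (rb, rc, ...) and both instruction orders are illustrative, not taken from the slide.

```python
# Each instruction: (opcode, destination register or None, source registers).
def load_use_stalls(program):
    """Count instructions that read a register loaded by the instruction directly before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# Straightforward order: each statement is translated on its own.
naive = [
    ("lw", "rb", ()), ("lw", "rc", ()),
    ("add", "ra", ("rb", "rc")), ("sw", None, ("ra",)),
    ("lw", "re", ()), ("lw", "rf", ()),
    ("sub", "rd", ("re", "rf")), ("sw", None, ("rd",)),
]

# Rescheduled order: an independent instruction is placed after each critical load.
scheduled = [
    ("lw", "rb", ()), ("lw", "rc", ()), ("lw", "re", ()),
    ("add", "ra", ("rb", "rc")), ("lw", "rf", ()),
    ("sw", None, ("ra",)),
    ("sub", "rd", ("re", "rf")), ("sw", None, ("rd",)),
]

naive_stalls = load_use_stalls(naive)
scheduled_stalls = load_use_stalls(scheduled)
print(naive_stalls, scheduled_stalls)
```

Both orders execute the same eight instructions. Under the stated assumptions the rescheduled order avoids the two bubbles of the straightforward one.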
Control Hazards
Produced by branch instructions:
the decision made by a partially executed branch affects which instruction should currently be loading
CONTROL HAZARD ON BRANCHES: THREE STAGE STALL
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for five instructions in program order, the last four being:]
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
EXAMPLE: BRANCH STALL IMPACT
If 30% of instructions are branches and each stalls 3 cycles, the impact is significant
Two part solution:
• Determine branch taken or not sooner, AND
• Compute taken branch address earlier
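A rough estimate of the impact, assuming an ideal CPI of 1 when there are no branch stalls (the slide does not state the base CPI):

```latex
\mathrm{CPI} = \mathrm{CPI}_{\mathrm{ideal}} + f_{\mathrm{branch}} \times \text{stall cycles}
             = 1 + 0.30 \times 3 = 1.9
```

Under that assumption the machine would run at nearly half its ideal rate, which is why both parts of the solution aim to shrink the stall.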
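[Figure: pipelined datapath revised for early branch resolution. A second Adder and the Zero? test sit alongside the Reg File (RS1, RS2, RD); the figure also shows SEQ PC with its adder (+4), instruction and data Memory, Sign Extend (Imm), ALU, muxes, WB Data, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]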
Assume: Conditional & Unconditional branches = 14%, 65% change PC

Scheduling scheme    Branch penalty   CPI    speedup v. stall
Stall pipeline       3                1.42   1.0
Predict taken        1                1.14   1.26
Predict not taken    1                1.09   1.29
Delayed branch       0.5              1.07   1.31
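The CPI column can be reproduced with a short sketch. It assumes an ideal CPI of 1, and that predict-not-taken pays its 1-cycle penalty only for the 65% of branches that change the PC; the variable names are illustrative. Speedups recomputed from these unrounded CPIs can differ in the second decimal from the tabulated column.

```python
BRANCH_FREQ = 0.14     # conditional and unconditional branches
TAKEN_FRACTION = 0.65  # fraction of branches that change the PC
IDEAL_CPI = 1.0        # assumption: no other stalls

# Average penalty in cycles paid per branch under each scheme.
penalty_per_branch = {
    "Stall pipeline": 3,
    "Predict taken": 1,
    "Predict not taken": 1 * TAKEN_FRACTION,  # penalty only when the branch is taken
    "Delayed branch": 0.5,
}

cpi = {
    scheme: IDEAL_CPI + BRANCH_FREQ * penalty
    for scheme, penalty in penalty_per_branch.items()
}
speedup_vs_stall = {scheme: cpi["Stall pipeline"] / value for scheme, value in cpi.items()}

for scheme in cpi:
    print(f"{scheme:18s} CPI {cpi[scheme]:.2f}  speedup v. stall {speedup_vs_stall[scheme]:.2f}")
```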
END OF LECTURE…