Midterm Recap
  Sections 1.1 ~ 1.4, 1.8 ~ 1.10
  Sections 2.1, 2.4 ~ 2.8
  Appendices A & B

Performance Evaluation
  Basic concepts
    Response time
    Throughput
    Speedup
    CPI
    IPC
    Amdahl’s law
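The metrics listed under Performance Evaluation can be tied together with a small worked example. The sketch below assumes the standard textbook definitions (CPU time = instruction count × CPI / clock rate, speedup = old time / new time, and Amdahl's law for a fraction f of execution sped up by a factor s). The function names and the sample numbers are illustrative and do not come from the slides.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: instructions * cycles per instruction / cycles per second."""
    return instruction_count * cpi / clock_hz


def ipc(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


def speedup(time_old, time_new):
    """How many times faster the new configuration is than the old one."""
    return time_old / time_new


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time benefits from an enhancement."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # Hypothetical program: 1e6 instructions on a 1 GHz clock, CPI improved from 2.0 to 1.6.
    base = cpu_time(1_000_000, 2.0, 1e9)
    tuned = cpu_time(1_000_000, 1.6, 1e9)
    print("speedup from lower CPI:", speedup(base, tuned))
    # Hypothetical enhancement: 40% of the time runs 10x faster.
    print("Amdahl speedup:", amdahl_speedup(0.4, 10))
```

Under Amdahl's law the unenhanced fraction bounds the result: with half of the time enhanced, the overall speedup stays below 2 however large the local speedup becomes.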
How to Summarize Performance
  Arithmetic mean
  Weighted arithmetic mean
  Geometric mean
  Harmonic mean
    n / (1/s1 + 1/s2 + … + 1/sn)

Instruction Set Architecture
  ISA types
    Stack
    Accumulator
    General-purpose registers
      Register-memory
      Register-register
  Common memory addressing modes
    Register, immediate, displacement, …
  Byte ordering: big vs. little endian
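The four means listed under How to Summarize Performance can be compared on the same data. This is a minimal sketch using the usual definitions; the function names and sample values are assumptions made for illustration, and only the harmonic-mean formula n / (1/s1 + … + 1/sn) is taken directly from the slide.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def weighted_arithmetic_mean(xs, weights):
    """Each value contributes in proportion to its weight."""
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def geometric_mean(xs):
    """n-th root of the product, computed via logarithms (values must be positive)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """n / (1/s1 + 1/s2 + ... + 1/sn), the formula given on the slide."""
    return len(xs) / sum(1.0 / x for x in xs)


if __name__ == "__main__":
    rates = [10.0, 40.0]  # hypothetical rates, e.g. two benchmark throughputs
    print("arithmetic:", arithmetic_mean(rates))
    print("weighted  :", weighted_arithmetic_mean(rates, [3, 1]))
    print("geometric :", geometric_mean(rates))
    print("harmonic  :", harmonic_mean(rates))
```

For positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.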
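The byte-ordering item on the Instruction Set Architecture slide can be made concrete with Python's standard `struct` module. The 32-bit value below is an arbitrary example chosen for this sketch.

```python
import struct
import sys

value = 0x12345678  # arbitrary 32-bit example value

# ">I" packs an unsigned 32-bit integer big-endian: most significant byte first.
big = struct.pack(">I", value)
# "<I" packs the same integer little-endian: least significant byte first.
little = struct.pack("<I", value)

if __name__ == "__main__":
    print("big endian   :", big.hex())
    print("little endian:", little.hex())
    print("this machine :", sys.byteorder)
```

The two encodings hold the same bytes in reverse order, so data exchanged between machines of different endianness has to be converted explicitly.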
Basics of MIPS
  Instruction format
    I-type, R-type, J-type
  Instruction types
    ALU, load/store, control, FP
  5-stage pipeline
    IF, ID, EX, MEM, WB

Pipelined 5-Stage Data Path (figure)

MIPS FP Pipeline (figure)

Dependences and Hazards
  Dependences → possible hazards
  Dependences
    Data, name (anti, output), control
  Hazards
    RAW, WAR, WAW, branch
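The 5-stage pipeline listed under Basics of MIPS (IF, ID, EX, MEM, WB) can be visualized with a small timing chart. The sketch below assumes the ideal case: one instruction enters the pipeline per cycle and no hazard causes a stall. The helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def ideal_schedule(num_instructions):
    """Map each instruction index to {stage: cycle}, assuming no stalls."""
    return {
        i: {stage: i + s + 1 for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    }


def total_cycles(num_instructions):
    """The first instruction needs len(STAGES) cycles; each later one adds a single cycle."""
    return num_instructions + len(STAGES) - 1


if __name__ == "__main__":
    n = 3
    sched = ideal_schedule(n)
    cycles = range(1, total_cycles(n) + 1)
    print("inst  " + " ".join(f"{c:>3}" for c in cycles))
    for i in range(n):
        by_cycle = {cycle: stage for stage, cycle in sched[i].items()}
        cells = [f"{by_cycle.get(c, ''):>3}" for c in cycles]
        print(f"i{i:<4} " + " ".join(cells))
```

The hazards listed on the Dependences and Hazards slide show up in such a chart as stall cycles, which push later stages to the right.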
Dependences vs. Hazards
  Data: RAW (if j gets the “old” value of R3)
    i: ADD R3, R1, R2
    j: ADD R5, R3, R4
  Anti: WAR (if i gets the “new” value of R2)
    i: ADD R3, R2, R1
    j: ADD R2, R4, R5
  Output: WAW (if final result in R3 is produced by i)
    i: ADD R3, R2, R1
    j: SUB R3, R4, R5

MIPS Five-Stage Pipeline With/Without Data Forwarding
  Without data forwarding
    Result exchanges via register file
    Producer: WB → consumer: ID
  With data forwarding
    Result produced → result used
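The three instruction pairs on the Dependences vs. Hazards slide can be checked mechanically. The sketch below compares the destination and source registers of an earlier instruction i and a later instruction j and reports which hazards the dependence could cause. The `Inst` tuple and the function name are assumptions made for this example, and whether a hazard actually occurs still depends on the pipeline.

```python
from typing import List, NamedTuple, Tuple


class Inst(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def possible_hazards(i: Inst, j: Inst) -> List[str]:
    """Hazards that later instruction j may suffer with respect to earlier instruction i."""
    found = []
    if i.dest in j.srcs:
        found.append("RAW")  # data dependence: j reads what i writes
    if j.dest in i.srcs:
        found.append("WAR")  # antidependence: j overwrites what i reads
    if i.dest == j.dest:
        found.append("WAW")  # output dependence: both write the same register
    return found


if __name__ == "__main__":
    pairs = [
        (Inst("ADD", "R3", ("R1", "R2")), Inst("ADD", "R5", ("R3", "R4"))),
        (Inst("ADD", "R3", ("R2", "R1")), Inst("ADD", "R2", ("R4", "R5"))),
        (Inst("ADD", "R3", ("R2", "R1")), Inst("SUB", "R3", ("R4", "R5"))),
    ]
    for i, j in pairs:
        print(i, j, possible_hazards(i, j))
```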
Register Renaming
  • Eliminate WAR and WAW hazards by register renaming

    Before renaming        After renaming
    DIV.D F0,F2,F4         DIV.D F0,F2,F4
    ADD.D F6,F0,F8         ADD.D S,F0,F8
    S.D   F6,0(R1)         S.D   S,0(R1)
    SUB.D F8,F10,F14       SUB.D T,F10,F14
    MUL.D F6,F10,F8        MUL.D F6,F10,T

Dynamic Scheduling
  Split ID stage into two
    1. Issue: Decode inst, check for structural hazards
    2. Read operands: Wait until no data hazards, then read operands
  In-order instruction issue
  Out-of-order execution
    An inst begins execution as soon as its data operands are available
  Out-of-order completion causes complications in handling exceptions
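The effect of register renaming on the Register Renaming slide's code sequence can be reproduced with a simple map table. This sketch is more aggressive than the slide, which renames only F6 to S and F8 to T: here every destination receives a fresh name. The names P0, P1, … and both helper functions are hypothetical, and the hazard check is deliberately simplistic (it compares every ordered pair of instructions).

```python
import itertools

# (opcode, destination or None, source registers); a store has no register destination.
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]


def rename(program):
    """Give every destination a fresh name and redirect later reads to it."""
    mapping = {}
    fresh = itertools.count()
    renamed = []
    for op, dest, srcs in program:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)  # read the latest names first
        new_dest = None
        if dest is not None:
            new_dest = f"P{next(fresh)}"
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


def name_hazards(program):
    """List (kind, earlier index, later index) for WAR and WAW pairs."""
    hazards = []
    for a, (_, dest_a, srcs_a) in enumerate(program):
        for b in range(a + 1, len(program)):
            dest_b = program[b][1]
            if dest_b is None:
                continue
            if dest_b in srcs_a:
                hazards.append(("WAR", a, b))
            if dest_b == dest_a:
                hazards.append(("WAW", a, b))
    return hazards


if __name__ == "__main__":
    print("before:", name_hazards(PROGRAM))
    print("after :", name_hazards(rename(PROGRAM)))
```

Only the name dependences disappear. The true data dependences, such as ADD.D reading the result of DIV.D, are preserved because sources are redirected to the renamed producer.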
Tomasulo Components
  RS entry
    Op: Operation to perform in the unit
    Vj, Vk: Value of source operands
    Qj, Qk: Reservation stations producing source registers
      Qj, Qk = 0 → ready
    Busy: Indicates reservation station or FU is busy
  Register result status
    Indicates which RS will write each register
    Blank: no pending instructions writing the register

Three Stages of Tomasulo Algorithm
  1. Issue: get instruction from Inst Queue
     If reservation station free (no structural hazard), control issues inst & sends operands (renames registers).
  2. Execution: operate on operands (EX)
     When both operands ready then execute; if not ready, watch Common Data Bus for result
  3. Write result: finish execution (WB)
     Write on Common Data Bus to all awaiting units; mark reservation station available
  No speculation
  In-order issue, out-of-order execution, and out-of-order completion
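The reservation-station fields and the issue and write-result stages described on the two Tomasulo slides can be sketched as a small data structure. This is an illustration under simplifying assumptions, not a full implementation: there is no instruction queue or execution latency, `None` stands in for "Qj, Qk = 0", and the station names, register values, and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, valid once the matching Q field is None
    vk: Optional[float] = None
    qj: Optional[str] = None    # name of the station that will produce the operand
    qk: Optional[str] = None

    def ready(self) -> bool:
        """Execution may start once both operands have arrived."""
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src_j, src_k, regs, reg_status):
    """Issue stage: take a free station, copy ready operands, record tags for pending ones."""
    if rs.busy:
        raise RuntimeError("structural hazard: reservation station is busy")
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src_j)
    rs.vj = regs[src_j] if rs.qj is None else None
    rs.qk = reg_status.get(src_k)
    rs.vk = regs[src_k] if rs.qk is None else None
    reg_status[dest] = rs.name  # register result status: this station will write dest


def write_result(tag, value, stations, regs, reg_status):
    """Write-result stage: broadcast on the CDB and free the producing station."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]


if __name__ == "__main__":
    regs = {"F0": 0.0, "F2": 6.0, "F4": 2.0, "F6": 0.0, "F8": 1.0}
    reg_status = {}
    div1, add1 = ReservationStation("Div1"), ReservationStation("Add1")
    issue(div1, "DIV.D", "F0", "F2", "F4", regs, reg_status)
    issue(add1, "ADD.D", "F6", "F0", "F8", regs, reg_status)
    print("Add1 ready before broadcast:", add1.ready())
    write_result("Div1", div1.vj / div1.vk, [div1, add1], regs, reg_status)
    print("Add1 ready after broadcast :", add1.ready())
```

Recording the producing station's name instead of the register name is what renames registers at issue: a later write to F0 would not disturb the value Add1 is waiting for.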
Dynamic Scheduling, Single-Issue

Iteration  Inst               Issue  Exe  Mem  W CDB
1          LD R2, 0(R1)       1      2    3    4
1          ADD R2, R2, #1     2      5    -    6
1          SD R2, 0(R1)       3      4    7    -
1          ADD R1, R1, #4     4      6    -    7
1          BNE R2, R3, loop   5      7    -    -
2          LD R2, 0(R1)       6      8    9    10
2          ADD R2, R2, #1     7      11   -    12
2          SD R2, 0(R1)       8      9    13   -
2          ADD R1, R1, #4     9      10   -    11
2          BNE R2, R3, loop   10     13   -    -

Dynamic Scheduling, 2-Way Issue

Iteration  Inst               Issue  Exe  Mem  W CDB
1          LD R2, 0(R1)       1      2    3    4
1          ADD R2, R2, #1     1      5    -    6
1          SD R2, 0(R1)       2      3    7    -
1          ADD R1, R1, #4     2      3    -    4
1          BNE R2, R3, loop   3      7    -    -
2          LD R2, 0(R1)       4      8    9    10
2          ADD R2, R2, #1     4      11   -    12
2          SD R2, 0(R1)       5      9    13   -
2          ADD R1, R1, #4     5      8    -    9
2          BNE R2, R3, loop   6      13   -    -

2-way issue (branch single-issue), separate INT FUs for address, ALU, branch, two CDBs
Dynamic Scheduling vs. Speculative Execution
  Dynamic scheduling (w/o speculation)
    A branch must be resolved before actually executing any instructions in the successor basic block (those instructions can be issued)
    Issue, Exec, Memory (R/W), Write CDB
  Speculative execution (using dynamic scheduling)
    Allow the execution of later instructions before the branch is resolved (with the ability to undo the effect of an incorrectly speculated sequence)
    Issue, Exec, Read memory, Write CDB, Commit (Write memory)

Reorder Buffer
  Contains all in-flight instructions
  Reorders out-of-order insts into program order at the time of writing reg/memory (commit)
  Buffers results/supplies operands between execution complete and commit
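The reorder buffer's role, holding results from out-of-order completion until they can be committed in program order, can be sketched as a queue. This is a minimal illustration under stated assumptions: timing, memory writes, and operand forwarding are left out, and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    inst: str
    dest: Optional[str]
    value: Optional[int] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # head (left) is the oldest in-flight instruction

    def issue(self, inst, dest=None):
        """Allocate an entry at the tail, in program order."""
        entry = ROBEntry(inst, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None):
        """Execution finished (possibly out of order); buffer the result."""
        entry.value, entry.done = value, True

    def commit(self, regs):
        """Retire finished instructions from the head only, so registers update in order."""
        committed = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            if entry.dest is not None:
                regs[entry.dest] = entry.value
            committed.append(entry.inst)
        return committed

    def flush_after(self, branch_entry):
        """Undo speculation: drop every entry younger than a mispredicted branch.

        If branch_entry is no longer in the buffer, this empties the buffer.
        """
        while self.entries and self.entries[-1] is not branch_entry:
            self.entries.pop()


if __name__ == "__main__":
    rob, regs = ReorderBuffer(), {}
    load = rob.issue("LD R2, 0(R1)", "R2")
    add = rob.issue("ADD R1, R1, #4", "R1")
    rob.complete(add, 4)            # the younger instruction finishes first
    print(rob.commit(regs), regs)   # nothing commits: the head is not done yet
    rob.complete(load, 10)
    print(rob.commit(regs), regs)   # both commit, in program order
```

Because architectural state changes only at commit, discarding the entries behind a mispredicted branch is enough to undo their effects.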
Example

Iteration  Inst.              Issue @  Exec @  Read Mem @  Write CDB @  Commit @
1          LD R2, 0(R1)       1        2       3           4            5
1          ADD R2, R2, #1     1        5       -           6            7
1          SD R2, 0(R1)       2        3       -           -            7
1          ADD R1, R1, #4     2        3       -           4            8
1          BNE R2, R3, loop   3        7       -           -            8
2          LD R2, 0(R1)       4        5       6           7            9
2          ADD R2, R2, #1     4        8       -           9            10
2          SD R2, 0(R1)       5        6       -           -            10
2          ADD R1, R1, #4     5        6       -           7            11
2          BNE R2, R3, loop   6        10      -           -            11

Architectural Simulator
  Measurement
    Accurate
    Only working on existing systems, not flexible
    Constructing hardware prototype: slow, expensive, and complicated
  Analytical models
    Fast and with insights
    Hard to model the complexity of today’s processor
  Simulator
    Fast, cheap, flexible and relatively accurate
  What is an architectural simulator?
    A tool that reproduces the behavior of a computing device
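To make "a tool that reproduces the behavior of a computing device" concrete, the sketch below is a tiny functional simulator for the five-instruction loop used in the timing tables. It is an illustration under stated assumptions: it models only register and memory contents, not pipeline timing, the tuple encoding of instructions is invented for this example, and the initial register and memory values are arbitrary.

```python
# The loop from the examples: increment each word until the new value equals R3.
LOOP = [
    ("LD", "R2", 0, "R1"),    # R2 = mem[R1 + 0]
    ("ADD", "R2", "R2", 1),   # R2 = R2 + 1
    ("SD", "R2", 0, "R1"),    # mem[R1 + 0] = R2
    ("ADD", "R1", "R1", 4),   # R1 = R1 + 4
    ("BNE", "R2", "R3", 0),   # if R2 != R3, jump to instruction index 0 (loop)
]


def run(program, regs, mem, max_steps=10_000):
    """Execute the program and return the number of instructions executed."""
    pc = 0
    executed = 0
    while pc < len(program) and executed < max_steps:
        op, *args = program[pc]
        pc += 1
        executed += 1
        if op == "LD":
            rd, offset, base = args
            regs[rd] = mem[regs[base] + offset]
        elif op == "SD":
            rs, offset, base = args
            mem[regs[base] + offset] = regs[rs]
        elif op == "ADD":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "BNE":
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = target
        else:
            raise ValueError(f"unknown opcode {op}")
    return executed


if __name__ == "__main__":
    regs = {"R1": 0, "R2": 0, "R3": 10}
    mem = {0: 5, 4: 7, 8: 9}  # hypothetical word-addressed contents
    count = run(LOOP, regs, mem)
    print("instructions executed:", count)
    print("memory:", mem)
```

A timing simulator would add a cycle-level model on top of this functional core, for example the issue, execute, and commit stages tabulated in the example above.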