Lecture: Static ILP: Topics: Predication, Speculation (Sections C.5, 3.2)

The document discusses static instruction level parallelism (ILP) including topics like predication and speculation. It provides an example of a scheduled and unrolled loop that executes in 14 cycles or 3.5 cycles per original iteration. It also discusses problems involving looping code and how loop unrolling and software pipelining can help avoid stall cycles in the integer and floating point pipelines. Superscalar pipelines are able to issue multiple instructions per cycle to different pipelines to help hide latencies. Predication is discussed as a way to handle branches within loops to avoid control dependences and re-fetching of instructions.

Uploaded by

Purab Ranjan


Lecture: Static ILP

• Topics: predication, speculation (Sections C.5, 3.2)

Scheduled and Unrolled Loop

Stall cycles:
LD -> any : 1 stall
FPALU -> any : 3 stalls
FPALU -> ST : 2 stalls
IntALU -> BR : 1 stall

Loop: L.D F0, 0(R1)
      L.D F6, -8(R1)
      L.D F10, -16(R1)
      L.D F14, -24(R1)
      ADD.D F4, F0, F2
      ADD.D F8, F6, F2
      ADD.D F12, F10, F2
      ADD.D F16, F14, F2
      S.D F4, 0(R1)
      S.D F8, -8(R1)
      DADDUI R1, R1, #-32
      S.D F12, 16(R1)
      BNE R1, R2, Loop
      S.D F16, 8(R1)

• Execution time: 14 cycles or 3.5 cycles per original iteration
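The 14-cycle figure can be checked with a small timing model. The sketch below is only an illustration, not part of the lecture: it assumes a single-issue, in-order pipeline in which a dependent instruction must wait the listed number of stall cycles after its producer. The instruction encoding `(kind, destination, sources)` and the function names are hypothetical.

```python
# Illustrative single-issue, in-order timing model (an assumption of this sketch).
# Each instruction is (kind, destination register or None, list of source registers).

def stalls_between(producer, consumer):
    """Stall cycles required between a producer and a dependent consumer."""
    if producer == "LD":
        return 1                               # LD -> any : 1 stall
    if producer == "FPALU":
        return 2 if consumer == "ST" else 3    # FPALU -> ST : 2, FPALU -> any : 3
    if producer == "INT" and consumer == "BR":
        return 1                               # IntALU -> BR : 1 stall
    return 0


def count_cycles(program):
    """Return the cycle in which the last instruction issues (1-based)."""
    last_writer = {}                           # register -> (kind, issue cycle)
    cycle = 0
    for kind, dest, sources in program:
        issue = cycle + 1                      # at most one instruction per cycle
        for reg in sources:
            if reg in last_writer:
                p_kind, p_cycle = last_writer[reg]
                issue = max(issue, p_cycle + 1 + stalls_between(p_kind, kind))
        if dest is not None:
            last_writer[dest] = (kind, issue)
        cycle = issue
    return cycle


# The scheduled, unrolled loop from the slide, in issue order.
UNROLLED = [
    ("LD", "F0", ["R1"]), ("LD", "F6", ["R1"]),
    ("LD", "F10", ["R1"]), ("LD", "F14", ["R1"]),
    ("FPALU", "F4", ["F0", "F2"]), ("FPALU", "F8", ["F6", "F2"]),
    ("FPALU", "F12", ["F10", "F2"]), ("FPALU", "F16", ["F14", "F2"]),
    ("ST", None, ["F4", "R1"]), ("ST", None, ["F8", "R1"]),
    ("INT", "R1", ["R1"]),                     # DADDUI R1, R1, #-32
    ("ST", None, ["F12", "R1"]),
    ("BR", None, ["R1", "R2"]),                # BNE R1, R2, Loop
    ("ST", None, ["F16", "R1"]),               # branch delay slot
]

print(count_cycles(UNROLLED))
```

Under these assumptions every instruction issues in the cycle right after its predecessor, so the count equals the number of instructions in the loop body.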

Problem 2

Stall cycles:
LD -> any : 1 stall
FPMUL -> any : 5 stalls
FPMUL -> ST : 4 stalls
IntALU -> BR : 1 stall

Source code:
for (i=1000; i>0; i--)
    x[i] = y[i] * s;

Assembly code:
Loop: L.D F0, 0(R1)       ; F0 = array element
      MUL.D F4, F0, F2    ; multiply scalar
      S.D F4, 0(R2)       ; store result
      DADDUI R1, R1, #-8  ; decrement address pointer
      DADDUI R2, R2, #-8  ; decrement address pointer
      BNE R1, R3, Loop    ; branch if R1 != R3
      NOP

• How many unrolls does it take to avoid stall cycles? (DA = DADDUI)
Degree 2: LD LD MUL MUL DA DA stall SD BNE SD (1 stall remains)
Degree 3: LD LD LD MUL MUL MUL DA DA SD SD BNE SD (12 cycles for 3 iterations)
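At the source level, unrolling by three looks roughly like the sketch below. It is an illustration under stated assumptions, not the lecture's code: the function names are hypothetical, index 0 of each list is unused so that the indices match `x[i]` for `i = 1000 .. 1`, and the cleanup loop for leftover iterations (1000 is not a multiple of 3) is an addition that the slide does not discuss.

```python
def scale_rolled(y, s):
    """Reference loop: x[i] = y[i] * s for i = n, n-1, ..., 1 (index 0 unused)."""
    n = len(y) - 1
    x = [0.0] * (n + 1)
    for i in range(n, 0, -1):
        x[i] = y[i] * s
    return x


def scale_unrolled3(y, s):
    """Same computation, unrolled by 3 with loads, multiplies and stores grouped."""
    n = len(y) - 1
    x = [0.0] * (n + 1)
    i = n
    while i >= 3:
        # Three loads first, as in the LD LD LD part of the schedule.
        f0, f6, f10 = y[i], y[i - 1], y[i - 2]
        # Three independent multiplies.
        f4, f8, f12 = f0 * s, f6 * s, f10 * s
        # Three stores.
        x[i], x[i - 1], x[i - 2] = f4, f8, f12
        i -= 3                      # one pointer update per three elements
    while i > 0:                    # cleanup for leftover iterations (assumption)
        x[i] = y[i] * s
        i -= 1
    return x


Y = [0.0] + [float(k) for k in range(1, 1001)]
print(scale_rolled(Y, 2.5) == scale_unrolled3(Y, 2.5))
```

Grouping the three loads, then the three multiplies, then the three stores is what gives a scheduler independent work to place between each producer and its consumer.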

Superscalar Pipelines

Integer pipeline: handles L.D, S.D, DADDUI, BNE
FP pipeline: handles ADD.D

• What is the schedule with an unroll degree of 5?

      Integer pipeline        FP pipeline
Loop: L.D F0,0(R1)
      L.D F6,-8(R1)
      L.D F10,-16(R1)         ADD.D F4,F0,F2
      L.D F14,-24(R1)         ADD.D F8,F6,F2
      L.D F18,-32(R1)         ADD.D F12,F10,F2
      S.D F4,0(R1)            ADD.D F16,F14,F2
      S.D F8,-8(R1)           ADD.D F20,F18,F2
      S.D F12,-16(R1)
      DADDUI R1,R1,#-40
      S.D F16,16(R1)
      BNE R1,R2,Loop
      S.D F20,8(R1)

• Need unroll by degree 5 to eliminate stalls (fewer if we move DADDUI up)
• The compiler may specify instructions that can be issued as one packet
• The compiler may specify a fixed number of instructions in each packet: Very Large Instruction Word (VLIW)
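One way to convince yourself that the two-pipeline schedule is stall-free is to check every dependence against the stall rules used for the single-issue ADD.D loop (LD -> any: 1, FPALU -> any: 3, FPALU -> ST: 2, IntALU -> BR: 1). The checker below is a hedged sketch: the row format `(integer op, FP op)` and all function names are assumptions made for illustration, and only true (read-after-write) register dependences are modelled.

```python
def stalls_between(producer, consumer):
    """Stall cycles required between a producer and a dependent consumer."""
    if producer == "LD":
        return 1
    if producer == "FPALU":
        return 2 if consumer == "ST" else 3
    if producer == "INT" and consumer == "BR":
        return 1
    return 0


def violations(schedule):
    """Return (cycle, kind, register) for every dependence issued too early.

    schedule is a list of rows, one per cycle; each row is (int_op, fp_op),
    where an op is (kind, dest, sources) or None for an empty slot.
    """
    last_writer = {}                      # register -> (kind, cycle)
    problems = []
    for cycle, row in enumerate(schedule, start=1):
        for op in row:
            if op is None:
                continue
            kind, dest, sources = op
            for reg in sources:
                if reg in last_writer:
                    p_kind, p_cycle = last_writer[reg]
                    if cycle < p_cycle + 1 + stalls_between(p_kind, kind):
                        problems.append((cycle, kind, reg))
            if dest is not None:
                last_writer[dest] = (kind, cycle)
    return problems


def ld(d):
    return ("LD", d, ["R1"])


def add(d, s):
    return ("FPALU", d, [s, "F2"])


def st(s):
    return ("ST", None, [s, "R1"])


# The unroll-by-5 schedule from the slide, one row per cycle.
SCHEDULE = [
    (ld("F0"), None),
    (ld("F6"), None),
    (ld("F10"), add("F4", "F0")),
    (ld("F14"), add("F8", "F6")),
    (ld("F18"), add("F12", "F10")),
    (st("F4"), add("F16", "F14")),
    (st("F8"), add("F20", "F18")),
    (st("F12"), None),
    (("INT", "R1", ["R1"]), None),        # DADDUI R1,R1,#-40
    (st("F16"), None),
    (("BR", None, ["R1", "R2"]), None),   # BNE R1,R2,Loop
    (st("F20"), None),
]

print(len(SCHEDULE), violations(SCHEDULE))
```

The schedule occupies 12 cycles for 5 original iterations, and the checker reports no dependence that issues earlier than the stall rules allow.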
Problem 3

Stall cycles:
LD -> any : 1 stall
FPMUL -> any : 5 stalls
FPMUL -> ST : 4 stalls
IntALU -> BR : 1 stall

Source code:
for (i=1000; i>0; i--)
    x[i] = y[i] * s;

Assembly code:
Loop: L.D F0, 0(R1)       ; F0 = array element
      MUL.D F4, F0, F2    ; multiply scalar
      S.D F4, 0(R2)       ; store result
      DADDUI R1, R1, #-8  ; decrement address pointer
      DADDUI R2, R2, #-8  ; decrement address pointer
      BNE R1, R3, Loop    ; branch if R1 != R3
      NOP

• How many unrolls does it take to avoid stalls in the superscalar pipeline?

Integer pipeline   FP pipeline
LD
LD
LD                 MUL
LD                 MUL
LD                 MUL
LD                 MUL
LD                 MUL
SD                 MUL

7 unrolls. Could also make do with 5 if we moved up the DADDUIs.
Software Pipeline?!

Loop: L.D F0, 0(R1)
      ADD.D F4, F0, F2
      S.D F4, 0(R1)
      DADDUI R1, R1, #-8
      BNE R1, R2, Loop

[Figure: six successive iterations of this loop, each drawn as L.D, ADD.D, S.D followed by DADDUI, BNE; the last two are abbreviated to L.D ADD.D …]
Software Pipeline

Original iter 1   L.D    ADD.D  S.D
Original iter 2          L.D    ADD.D  S.D
Original iter 3                 L.D    ADD.D  S.D
Original iter 4                        L.D    ADD.D  S.D
                                              L.D    ADD.D  S.D
                                                     L.D    ADD.D  S.D
New iter                        1      2      3      4

Each new iteration is one column of the diagram: the S.D of one original iteration, the ADD.D of the next, and the L.D of the one after that.
Software Pipelining

Original loop:
Loop: L.D F0, 0(R1)
      ADD.D F4, F0, F2
      S.D F4, 0(R1)
      DADDUI R1, R1, #-8
      BNE R1, R2, Loop

Software-pipelined loop:
Loop: S.D F4, 16(R1)
      ADD.D F4, F0, F2
      L.D F0, 0(R1)
      DADDUI R1, R1, #-8
      BNE R1, R2, Loop

• Advantages: achieves nearly the same effect as loop unrolling, but without the code expansion. An unrolled loop may have inefficiencies at the start and end of each iteration, while a sw-pipelined loop is almost always in steady state. A sw-pipelined loop can also be unrolled to reduce loop overhead.

• Disadvantages: does not reduce loop overhead, may require more registers
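The software-pipelined loop body only describes the steady state. A complete version also needs start-up and wind-down code, which the slide omits. The sketch below fills those in as an assumption: the function names are hypothetical, index 0 is unused, and the prologue and epilogue are one plausible way to start and drain the pipeline, not the lecture's own code.

```python
def add_scalar_rolled(x, s):
    """Reference loop: x[i] = x[i] + s for i = n, ..., 1 (index 0 unused)."""
    out = list(x)
    for i in range(len(out) - 1, 0, -1):
        out[i] = out[i] + s
    return out


def add_scalar_swp(x, s):
    """Software-pipelined form: store, add and load from three different iterations."""
    out = list(x)
    n = len(out) - 1
    if n < 2:                       # too short to pipeline; fall back (assumption)
        return add_scalar_rolled(x, s)

    # Prologue (assumed): start the first two original iterations.
    f0 = out[n]                     # L.D   for original iteration 1
    f4 = f0 + s                     # ADD.D for original iteration 1
    f0 = out[n - 1]                 # L.D   for original iteration 2

    i = n - 2
    while i > 0:                    # steady state, mirrors the pipelined loop body
        out[i + 2] = f4             # S.D F4, 16(R1): two elements (16 bytes) back
        f4 = f0 + s                 # ADD.D F4, F0, F2
        f0 = out[i]                 # L.D F0, 0(R1)
        i -= 1                      # DADDUI R1, R1, #-8

    # Epilogue (assumed): drain the last two original iterations.
    out[2] = f4
    f4 = f0 + s
    out[1] = f4
    return out


X = [0.0] + [float(k) for k in range(1, 1001)]
print(add_scalar_swp(X, 1.5) == add_scalar_rolled(X, 1.5))
```

The 16-byte store offset in the pipelined loop corresponds to `out[i + 2]` here: the value being stored belongs to the original iteration two steps earlier than the one being loaded.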

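Problem 4

Stall cycles:
LD -> any : 1 stall
FPMUL -> any : 5 stalls
FPMUL -> ST : 4 stalls
IntALU -> BR : 1 stall

Source code:
for (i=1000; i>0; i--)
    x[i] = y[i] * s;

Assembly code:
Loop: L.D F0, 0(R1)       ; F0 = array element
      MUL.D F4, F0, F2    ; multiply scalar
      S.D F4, 0(R2)       ; store result
      DADDUI R1, R1, #-8  ; decrement address pointer
      DADDUI R2, R2, #-8  ; decrement address pointer
      BNE R1, R3, Loop    ; branch if R1 != R3
      NOP

Software-pipelined code:
Loop: S.D F4, 0(R2)
      MUL.D F4, F0, F2
      L.D F0, 0(R1)
      DADDUI R2, R2, #-8
      BNE R1, R3, Loop
      DADDUI R1, R1, #-8

There will be no stalls.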
Predication

• A branch within a loop can be problematic to schedule

• Control dependences are a problem because of the need to re-fetch on a mispredict

• For short loop bodies, control dependences can be converted to data dependences by using predicated/conditional instructions
Predicated or Conditional Instructions

• The instruction has an additional operand that determines whether the instruction completes or gets converted into a no-op

• Example: lwc R1, 0(R2), R3 (load-word-conditional) will load the word at address (R2) into R1 if R3 is non-zero; if R3 is zero, the instruction becomes a no-op

• Replaces a control dependence with a data dependence (branches disappear); may need register copies for the condition or for values used by both directions

Original code:
if (R1 == 0)
    R2 = R2 + R4
else
    R6 = R3 + R5
    R4 = R2 + R3

Predicated code:
R7 = !R1 ; R8 = R2 ;
R2 = R2 + R4 (predicated on R7)
R6 = R3 + R5 (predicated on R1)
R4 = R8 + R3 (predicated on R1)
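The behaviour of the load-word-conditional example can be modelled in a few lines. This is a sketch for illustration only: the register file and memory are plain dictionaries, and the function signature is an assumption rather than any real ISA's definition.

```python
def lwc(regs, memory, dest, base, offset, pred):
    """Load-word-conditional: regs[dest] = memory[regs[base] + offset] if regs[pred] != 0.

    When the predicate register is zero the instruction is a no-op:
    the destination keeps its old value and memory is never accessed.
    """
    if regs[pred] != 0:
        regs[dest] = memory[regs[base] + offset]


memory = {100: 42}

# Predicate non-zero: the load completes.
taken = {"R1": 0, "R2": 100, "R3": 1}
lwc(taken, memory, "R1", "R2", 0, "R3")

# Predicate zero: no-op, even though R2 holds an address that is not mapped.
skipped = {"R1": 7, "R2": 999, "R3": 0}
lwc(skipped, memory, "R1", "R2", 0, "R3")

print(taken["R1"], skipped["R1"])
```

The second call shows why nullification matters: with a zero predicate the unmapped address is never touched, so no fault can occur.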
Problem 1
• Use predication to remove control hazards in this code

Original code:
if (R1 == 0)
    R2 = R5 + R4
    R3 = R2 + R4
else
    R6 = R3 + R2

Predicated code:
R7 = !R1 ;
R6 = R3 + R2 (predicated on R1)
R2 = R5 + R4 (predicated on R7)
R3 = R2 + R4 (predicated on R7)
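A brute-force check over small register values is a quick way to confirm that the predicated sequence computes the same results as the branchy code. The sketch below is an illustration: `select` is a hypothetical helper that models a predicated register write, and the function names are assumptions.

```python
from itertools import product


def branchy(r1, r2, r3, r4, r5, r6):
    """Problem 1 as written with a branch; returns the final (R2, R3, R6)."""
    if r1 == 0:
        r2 = r5 + r4
        r3 = r2 + r4
    else:
        r6 = r3 + r2
    return r2, r3, r6


def select(pred, new, old):
    """Model a predicated write: keep the new value only if pred is non-zero."""
    return new if pred != 0 else old


def predicated(r1, r2, r3, r4, r5, r6):
    """The same computation with every instruction executed unconditionally."""
    r7 = 1 if r1 == 0 else 0              # R7 = !R1
    r6 = select(r1, r3 + r2, r6)          # R6 = R3 + R2 (predicated on R1)
    r2 = select(r7, r5 + r4, r2)          # R2 = R5 + R4 (predicated on R7)
    r3 = select(r7, r2 + r4, r3)          # R3 = R2 + R4 (predicated on R7)
    return r2, r3, r6


mismatches = [
    regs for regs in product(range(-1, 2), repeat=6)
    if branchy(*regs) != predicated(*regs)
]
print(len(mismatches))
```

Each predicated instruction still computes its result; `select` only decides whether that result is written back, which is where the extra functional-unit activity comes from.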

Complications

• Each instruction has one more input operand – more

• If the branch condition is not known, the instruction stalls (remember, these are in-order processors)

• Some implementations allow the instruction to continue without the branch condition and squash/complete later in the pipeline, which wastes work

• Increases register pressure, activity on functional units

• Does not help if the br-condition takes a while to evaluate
Support for Speculation

• In general, when we re-order instructions, register renaming can ensure we do not violate register data dependences

• However, we need hardware support
  - to ensure that an exception is raised at the correct point
  - to ensure that we do not violate memory dependences

[Figure: instruction sequence st, br, ld]
Detecting Exceptions

• Some exceptions require that the program be terminated (memory protection violation), while other exceptions require execution to resume (page faults)

• For a speculative instruction, in the latter case, servicing the exception only implies potential performance loss

• In the former case, you want to defer servicing the exception until you are sure the instruction is not speculative

• Note that a speculative instruction needs a special opcode to indicate that it is speculative
Program-Terminate Exceptions

• When a speculative instruction experiences an exception, instead of servicing it, it writes a special NotAThing value (NAT) in the destination register

• If a non-speculative instruction reads a NAT, it flags the exception and the program terminates (it may not be desirable that the error is caused by an array access, but the segfault happens two procedures later)

• Alternatively, an instruction (the sentinel) in the speculative instruction's original location checks the register value and initiates recovery
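The deferred-exception idea can be sketched in software as follows. Everything here is an assumption made for illustration: memory is a dictionary, a missing key stands in for a memory protection violation, and the function names (`speculative_load`, `nonspeculative_use`, `sentinel_check`) are hypothetical.

```python
NAT = object()                      # stand-in for the special NotAThing value


def speculative_load(memory, addr):
    """Speculative load: on a fault, return NAT instead of raising."""
    return memory.get(addr, NAT)    # a missing key models a protection violation


def nonspeculative_use(value):
    """A non-speculative instruction that reads a NAT flags the exception."""
    if value is NAT:
        raise MemoryError("deferred exception from a speculative load")
    return value


def sentinel_check(value, recover):
    """Sentinel at the load's original location: run recovery only if needed."""
    return recover() if value is NAT else value


memory = {8: 5}

good = speculative_load(memory, 8)          # no fault: ordinary value
bad = speculative_load(memory, 999)         # fault is deferred: NAT is written

recovered = sentinel_check(bad, lambda: 0)  # recovery code is a placeholder here

try:
    nonspeculative_use(bad)
    raised = False
except MemoryError:
    raised = True

print(nonspeculative_use(good), bad is NAT, recovered, raised)
```

If the speculative load turns out to be on a path that is never taken, the NAT is never read and the fault is never reported.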
Memory Dependence Detection

• If a load is moved before a preceding store, we must ensure that the store writes to a non-conflicting address, else, the load has to re-execute

• When the speculative load issues, it stores its address in a table (Advanced Load Address Table in the IA-64)

• If a store finds its address in the ALAT, it indicates that a violation occurred for that address

• A special instruction (the sentinel) in the load's original location checks to see if the address had a violation and re-executes the load if necessary
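The four steps above can be sketched as a small table keyed by load address. This is a simplified illustration, not a description of the IA-64 hardware: the class layout, the method names, and the choice to treat a missing entry as a violation are all assumptions.

```python
class ALAT:
    """Toy address table: remembers speculative loads and flags conflicting stores."""

    def __init__(self):
        self.violated = {}                      # load address -> hit by a later store?

    def speculative_load(self, regs, memory, dest, addr):
        """The hoisted load: read memory early and record the address."""
        regs[dest] = memory[addr]
        self.violated[addr] = False

    def store(self, memory, addr, value):
        """Every store checks the table and marks a matching address as violated."""
        memory[addr] = value
        if addr in self.violated:
            self.violated[addr] = True

    def check(self, regs, memory, dest, addr):
        """Sentinel at the load's original location; returns True if it re-executed."""
        if self.violated.pop(addr, True):       # missing entry treated as a violation
            regs[dest] = memory[addr]
            return True
        return False


def run(store_addr):
    """Hoist a load of address 0x20 above a store to store_addr, then check it."""
    memory = {0x10: 1, 0x20: 2}
    regs = {"R2": 7, "R8": None}
    alat = ALAT()
    alat.speculative_load(regs, memory, "R8", 0x20)
    alat.store(memory, store_addr, regs["R2"])
    reexecuted = alat.check(regs, memory, "R8", 0x20)
    return regs["R8"], reexecuted


print(run(0x10), run(0x20))
```

When the store goes to a different address the early value is kept. When it goes to the same address the check re-executes the load and picks up the stored value.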
Problem 2

• For the example code snippet below, show the code after the load is hoisted:

Original code:
Instr-A
Instr-B
ST R2 -> [R3]
Instr-C
BEZ R7, foo
Instr-D
LD R8 <- [R4]
Instr-E

Code after hoisting:
LD.S R8 <- [R4]
Instr-A
Instr-B
ST R2 -> [R3]
Instr-C
BEZ R7, foo
Instr-D
LD.C R8, rec-code
Instr-E

rec-code: LD R8 <- [R4]