Pipelining Became a Universal Technique in 1985

Pipelining became a universal technique in 1985 by overlapping instruction execution and exploiting instruction level parallelism. There are two main approaches to pipelining - hardware-based dynamic approaches used in servers and desktops, and compiler-based static approaches used for scientific applications. Exploiting instruction level parallelism aims to maximize instructions per cycle by minimizing pipeline stalls from structural hazards, data hazards, and control hazards. Loop unrolling, dynamic scheduling, register renaming, and branch prediction are some techniques used to reduce stalls and improve parallelism.


Introduction

• Pipelining became a universal technique in 1985


– Overlaps execution of instructions
– Exploits “Instruction Level Parallelism”

• There are two main approaches:


– Hardware-based dynamic approaches
• Used in server and desktop processors
• Not used as extensively in PMD (personal mobile device) processors
– Compiler-based static approaches
• Not as successful outside of scientific applications
Instruction-Level Parallelism
• When exploiting instruction-level parallelism, the goal is to minimize CPI (cycles per instruction):

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

• Ideal pipeline CPI is a measure of the maximum performance attainable by the implementation.
• The techniques that follow aim to decrease the overall pipeline CPI.

Techniques for Improving ILP

– Loop unrolling
– Basic pipeline scheduling
– Dynamic scheduling: scoreboarding, register renaming
– Dynamic memory disambiguation
– Dynamic branch prediction
– Multiple instruction issue per cycle
– Software and hardware techniques

Loop-Level Parallelism

• Basic block: straight-line code without branches
– Fraction of branches: 0.15 to 0.25

• ILP within a basic block is limited!
– Average basic-block size is 6–7 instructions
– These instructions may be dependent on each other
• LLP
– Easily unroll a loop statically or dynamically
– Can use SIMD (vector processors and GPUs)

ILP
• ILP is increased by exploiting parallelism among the iterations of a loop
Ex:
for (i = 0; i <= 999; i = i + 1)
    x[i] = x[i] + y[i];
• The loop iterations can execute in parallel
• Techniques are used to convert LLP into ILP

Hazards & Stalls
Structural Hazards
– Cause: resource contention
– Solution: add more resources and better scheduling

Data Hazards
– Cause: dependences
• True data dependence: a property of the program (RAW)
• Name dependence: reuse of registers (WAR & WAW)
– Solution: loop unrolling, dynamic scheduling, register renaming, hardware speculation

Control Hazards
– Cause: branch instructions, changes of program flow
– Solution: loop unrolling, branch prediction, hardware speculation
1. Data Dependence
• Loop-Level Parallelism
– Unroll loop statically or dynamically
– Use SIMD (vector processors and GPUs)

• Challenges:
– Data dependence
• Instruction j is data dependent on instruction i if
– instruction i produces a result that may be used by instruction j, or
– instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (a chain of dependences)

• Dependent instructions cannot be executed simultaneously
Data Dependence
• Dependences are a property of programs
• Stalls are a property of the pipeline
• The pipeline organization determines whether a dependence is detected and whether it causes a stall

• A data dependence conveys:

– the possibility of a hazard
– the order in which results must be calculated
– an upper bound on exploitable ILP

• Dependences that flow through memory locations are difficult to detect
• Data Hazards
– Read after write (RAW)
– Write after write (WAW)
– Write after read (WAR)

• Two possibilities:
- Maintain dependence, but avoid stalls
- Eliminate dependence by code transformation
Example
[Figure: a five-instruction sequence annotated with its data dependences, shown separately for floating-point data (instructions 1–3) and for integer data]
2. Name Dependence
• Two instructions use the same name (register or memory location), but there is no flow of information between them

– Antidependence (WAR): instruction j writes a register or memory location that instruction i reads
• The initial ordering (i before j) must be preserved
– Output dependence (WAW): instructions i and j write the same register or memory location
• Ordering must be preserved

• To resolve, use renaming techniques


3. Control Dependence

Determines the ordering of an instruction i with respect to a branch instruction

• An instruction that is control dependent on a branch cannot be moved before the branch, so that its execution is no longer controlled by the branch

• An instruction that is not control dependent on a branch cannot be moved after the branch, so that its execution is controlled by the branch
Control Dependence
An example:
T1;
if (p1) {
    S1;
}
if (p2) {
    S2;
}
● Statement S1 is control dependent on p1, but T1 is not
● Statement S2 is control dependent on p2, but not on p1

● What this means for execution:

– S1 cannot be moved before p1
– T1 cannot be moved inside p2
Examples

Example 1:
DADDU R2,R3,R4
BEQZ  R2,L1
LW    R1,0(R2)
L1:
• Moving the load instruction before the branch may cause a memory protection exception
• The control dependence must be preserved

Example 2:
DADDU R1,R2,R3
BEQZ  R4,L
DSUBU R1,R1,R6
L:    …
OR    R7,R1,R8
• The OR instruction is data dependent on DADDU and DSUBU
• The data flow must be preserved

Example 3:
DADDU R1,R2,R3
BEQZ  R12,skip
DSUBU R4,R5,R6
DADDU R5,R4,R9
skip: OR R7,R8,R9
• Violating the control dependence here affects neither data flow nor exceptions
• Assume R4 isn't used after skip: it is possible to move DSUBU before the branch
