2. Lecture 4, 5, 6: Basic Processing Unit
[Figure: overview of the processor datapath: PC, instruction memory, register file (register numbers and data), ALU, and data memory.]
[Figure: single-bus organization of the datapath inside a processor: PC, instruction decoder and control logic, MAR, MDR, IR, Y, constant 4, Select MUX, ALU (inputs A and B; control lines Add, Sub, XOR, carry-in), registers R0 to R(n-1), TEMP, with address and data lines to the memory bus. Note: MDR has two inputs and two outputs.]
o Perform an arithmetic or a logic operation and store the result in a processor register.
o Fetch the contents of a given memory location and load them into a processor register.
o Store a word of data from a processor register into a given memory location.
Figure 7.2. Input and output gating for the registers in Figure 7.1. (Signals shown: Ri out, Y in, Select, Z in, Z out.)
Professor Dr. Rafiqul Islam 13
Register Transfers
o All operations and data transfers are controlled by the processor clock.
Figure 7.3. Input and output gating for one register bit. (Signals shown: Ri in, Ri out, Clock; a D flip-flop connected to the bus.)
Performing an Arithmetic or Logic Operation
o The ALU is a combinational circuit that has no internal storage.
o The ALU gets its two operands from the MUX (input A) and the bus (input B). The result is temporarily stored in register Z.
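As a rough illustration of the two points above, the sketch below models a single-bus datapath and runs the three-step control sequence commonly given for R3 ← [R1] + [R2] (R1out, Yin; then R2out, SelectY, Add, Zin; then Zout, R3in). The class `SingleBus`, its `step` method, and the register values are hypothetical names and numbers chosen only for this example.

```python
class SingleBus:
    """Toy model of a single-bus datapath with Y and Z registers (hypothetical)."""

    def __init__(self, **regs):
        self.regs = dict(regs)  # general-purpose registers, e.g. R1, R2, R3
        self.y = 0              # first ALU operand, reaches input A through the MUX
        self.z = 0              # temporary storage for the ALU result
        self.bus = 0            # value currently driven onto the bus

    def step(self, out, y_in=False, add=False, z_in=False, reg_in=None):
        """One clock step: one source drives the bus, enabled destinations latch."""
        self.bus = self.z if out == "Z" else self.regs[out]
        if add and z_in:
            self.z = self.y + self.bus  # A = Y (SelectY), B = bus
        if y_in:
            self.y = self.bus
        if reg_in is not None:
            self.regs[reg_in] = self.bus


def add_r1_r2_into_r3(cpu):
    cpu.step(out="R1", y_in=True)            # step 1: R1out, Yin
    cpu.step(out="R2", add=True, z_in=True)  # step 2: R2out, SelectY, Add, Zin
    cpu.step(out="Z", reg_in="R3")           # step 3: Zout, R3in
    return cpu.regs["R3"]


cpu = SingleBus(R1=7, R2=5, R3=0)
print(add_r1_r2_into_r3(cpu))  # prints 12
```

Because the ALU has no internal storage, the sum has to be latched into Z in step 2 and can only reach R3 over the bus in a separate step.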
[Figure: connection and control signals for register MDR (clock, data, MDRinE, MDRout).]
Load MDR from the memory bus
R2 ← [MDR]
Execution of a Complete Instruction
Step   Action
3      MDRoutB, R=B, IRin
4      R4outA, R5outB, SelectA, Add, R6in, End
o What is the control sequence for execution of the instruction Add R1, R2?
[Figure: single-bus datapath: PC, instruction decoder and control logic, MAR, MDR, IR, Select MUX, ALU (inputs A and B; control lines Add, Sub, XOR, carry-in), registers up to R(n-1), TEMP, address and data lines.]
[Figure: processor block diagram with instruction cache, data cache, system bus, main memory, and input/output.]
• Datapath is the hardware that performs all the required operations, for
example, ALU, registers, and internal buses.
• Control is the hardware that tells the datapath what to do, in terms of
switching, operation selection, data movement between ALU
components, etc.
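[Figure: control unit organization: IR, decoder/encoder, external inputs, condition codes, control signals.]
[Figure: separation of the decoding and encoding functions: step decoder (T1, T2, ..., Tn), instruction decoder (INS1, INS2, ..., INSm), encoder with external inputs and condition codes, Run and End signals, control signals.]
[Figure: microprogrammed control unit: clock, microprogram counter (PC in the figure), control store, control word (CW).]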
Traditional Pipeline Concept
[Figure: four tasks A, B, C, D in task order, each with stage times 30, 40, and 20, on a time axis from 6 PM to 9 PM. Sequential: 30 40 20, repeated four times. Pipelined: 30 40 40 40 40 20.]
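The two timelines in the figure can be checked with a short calculation. This is a minimal sketch that assumes identical tasks and that the slowest stage is the only bottleneck; the function names are illustrative, and the stage times 30, 40, 20 are the ones shown on the slide (the slide does not state the unit).

```python
def sequential_time(stage_times, n_tasks):
    """Each task completes every stage before the next task starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """The first task takes the full latency; each later task adds one slowest-stage time."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)


stages = [30, 40, 20]
print(sequential_time(stages, 4))  # 4 * 90 = 360
print(pipelined_time(stages, 4))   # 30 + 4 * 40 + 20 = 210
```

The pipelined total matches the slide's sequence 30, 40, 40, 40, 40, 20: the 40-unit stage sets the pace for every task after the first.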
[Figure: basic idea of instruction pipelining with two stages, F (fetch) and E (execute), for instructions I1, I2, I3.]
(a) Sequential execution: F1 E1 F2 E2 F3 E3
(b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit
(c) Pipelined execution:
Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3

Four-stage pipeline: Fetch + Decode + Execution + Write, with interstage buffers B1, B2, B3
Instruction
I1   F1   D1   E1   W1
I2        F2   D2   E2   W2
I3             F3   D3   E3   W3
I4                  F4   D4   E4   W4
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
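The staggered chart for the four-stage pipeline follows a simple rule: in an ideal pipeline with no stalls, instruction i occupies stage s in clock cycle i + s (counting both from zero). The sketch below generates such a chart; `pipeline_diagram` is a hypothetical helper written for this example.

```python
def pipeline_diagram(n_instructions, stages=("F", "D", "E", "W")):
    """Return one row per instruction for an ideal (stall-free) pipeline chart."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = [""] * n_cycles
        for s, name in enumerate(stages):
            cells[i + s] = f"{name}{i + 1}"  # instruction i is in stage s in cycle i + s
        rows.append(cells)
    return rows


for i, row in enumerate(pipeline_diagram(4), start=1):
    print(f"I{i}  " + " ".join(f"{cell:<3}" for cell in row))
```

With four instructions and four stages the chart spans 4 + 4 - 1 = 7 clock cycles, and one instruction completes in every cycle once the pipeline is full.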
[Figure: timing of lw instructions such as lw $2, 200($0) and lw $3, 300($0). Without pipelining, each instruction takes 8 ns. Pipelined, a new instruction starts every 2 ns and passes through instruction fetch, Reg, ALU, data access, Reg, at 2 ns per stage.]
Role of Cache Memory
o Each pipeline stage is expected to complete in one clock cycle.
o The clock period should be long enough to let the slowest pipeline stage complete.
o Faster stages can only wait for the slowest one to complete.
o Since main memory is very slow compared to execution, the pipeline is almost useless if each instruction has to be fetched from main memory.
o Fortunately, we have cache.
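To see why a slow fetch stage matters so much, the sketch below sets the clock period to the slowest stage delay and computes the time to run a batch of instructions through an ideal pipeline. The stage delays (2 ns per stage with a cache, 20 ns for a fetch from main memory) are made-up numbers for illustration, not figures from the lecture.

```python
def clock_period(stage_delays):
    """The clock must be long enough for the slowest stage."""
    return max(stage_delays)


def total_time(stage_delays, n_instructions):
    """Ideal pipeline: fill time plus one clock cycle per remaining instruction."""
    cycles = len(stage_delays) + n_instructions - 1
    return cycles * clock_period(stage_delays)


with_cache = [2, 2, 2, 2]      # hypothetical: every stage takes 2 ns
without_cache = [20, 2, 2, 2]  # hypothetical: fetch goes to main memory (20 ns)

print(total_time(with_cache, 100))     # 103 cycles * 2 ns = 206
print(total_time(without_cache, 100))  # 103 cycles * 20 ns = 2060
```

In the second case the three fast stages spend most of every cycle waiting, which is the situation the cache is there to avoid.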
[Figure: four-stage pipeline (F, D, E, W) executing instructions I1 to I5.]
Instruction
I1   F1   D1   E1   W1
I2        F2   D2   E2   W2
I3             F3   D3   E3   W3
I4                  F4   D4   E4   W4
I5                       F5   D5   E5

[Figure: function performed by each stage when F2 takes four clock cycles. Idle periods are called stalls (bubbles).]
Clock cycle   1    2    3     4     5     6     7    8    9
F: Fetch      F1   F2   F2    F2    F2    F3
D: Decode          D1   idle  idle  idle  D2    D3
E: Execute              E1    idle  idle  idle  E2   E3
[Figure: effect of a Load instruction on pipeline timing over clock cycles 1 to 7; the Load has an extra memory access step, M2.]
Instruction
I1          F1 D1 E1 W1
I2 (Load)   F2 D2 E2 M2 W2
I3          F3 D3 E3 W3
I4          F4 D4 E4
I5          F5 D5
Datapath Design
[Figure: datapath with PC, incrementer, register file, constant 4, MUX, ALU (input A, result R), instruction decoder, IR, MDR, MAR, and memory bus data lines and address lines.]
[Figure: five-stage MIPS datapath: PC, adders (PC + 4 and branch target, with shift left by 2), instruction memory (ADDR, RD), register file (RN1, RN2, WN, WD, RD1, RD2), sign extension from 16 to 32 bits, ALU with Zero output, data memory (ADDR, RD, WD), and multiplexers.]
IF: Instruction Fetch
ID: Instruction Decode
EX: Execute / Address Calc.
MEM: Memory Access
WB: Write Back
Pipelined Design
Changes to the datapath:
- DMAR
- Separate MDR
- Buffers for ALU
- Instruction queue
- Instruction decoder output
Operations that can proceed independently:
- Reading an instruction from the instruction cache
- Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two regs
- Writing into one register in the reg file
- Performing an ALU operation
Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU. (Components shown: register file; buses A, B, C; ALU with A and R buffers; PC; incrementer; IMAR for instruction fetches; DMAR for data access; instruction decoder; instruction queue; control signal pipeline; data cache.)
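[Figure: operand forwarding in a pipelined processor.
(a) Datapath: register file, source registers SRC1 and SRC2 (Source 1, Source 2), ALU, result register RSLT (Destination).
(b) Position of the source and result registers in the processor pipeline: SRC1 and SRC2 feed E: Execute (ALU); RSLT feeds W: Write (register file); a forwarding path leads from RSLT back to the ALU inputs.]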
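[Figure: lw $t0, 10($t1) passing through IM, REG, ALU, DM, REG along the time axis.]
[Figure: pipeline timing in which I2 (Add) depends on I1 (Mul); I2 has an extra decode step, D2A.]
Instruction
I1 (Mul)   F1 D1 E1 W1
I2 (Add)   F2 D2 D2A E2 W2
I3         F3 D3 E3 W3
I4         F4 D4 E4 W4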
o Before actually building the pipelined datapath and control we first briefly examine these potential
hazards individually…
o Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline
in the same clock cycle
o E.g., suppose the pipeline below has a single (not separate) instruction and data memory with one read port
• then there is a structural hazard between the first and fourth lw instructions
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0) on a time axis from 2 to 14 ns. Each instruction starts 2 ns after the previous one and passes through instruction fetch, Reg, ALU, data access, Reg, at 2 ns per stage. Hazard if single memory.]
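The conflict between the first and fourth lw can be found mechanically: with one instruction issued per cycle, instruction i performs its data access in the same cycle in which instruction i + 3 is fetched. The sketch below assumes a five-stage pipeline in which every instruction is a load (so every instruction uses the data-access stage); the names `STAGES` and `memory_conflicts` are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(n_loads, single_memory=True):
    """Return (cycle, instruction in MEM, instruction in IF) for each memory clash.

    Instructions and cycles are numbered from 1; instruction i is in stage s
    (0-based index into STAGES) during cycle i + s.
    """
    if not single_memory:
        return []  # separate instruction and data memories never clash
    if_stage, mem_stage = STAGES.index("IF"), STAGES.index("MEM")
    conflicts = []
    for i in range(1, n_loads + 1):
        for j in range(1, n_loads + 1):
            if i + mem_stage == j + if_stage:
                conflicts.append((i + mem_stage, i, j))
    return conflicts


print(memory_conflicts(4))                       # [(4, 1, 4)]
print(memory_conflicts(4, single_memory=False))  # []
```

With a single one-port memory, the first lw reads its data in cycle 4, exactly when the fourth lw needs to be fetched; separate instruction and data memories remove the clash.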
[Figure: pipeline stall on a branch. Program execution order: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). The lw is held back by one bubble and starts 4 ns after the beq instead of 2 ns. Note that the branch outcome is computed in the ID stage with added hardware (later…).]
[Figure: prediction success. beq $1, $2, 40 is followed 2 ns later by lw $3, 300($0) with no stall.]
[Figure: prediction failure. Program execution order: add $4, $5, $6; beq $1, $2, 40; or $7, $8, $9. The lw fetched after the beq is undone (flushed) and becomes a bubble, so the or starts 4 ns after the beq.]
Control Hazards
o Solution 3, delayed branch: always execute the sequentially next statement, with the branch taking effect after a one-instruction delay. It is the compiler's job to find a statement that is independent of the branch outcome to put in the delay slot
• MIPS does this, but it is an option in SPIM (Simulator -> Settings)
[Figure: delayed branch. beq $1, $2, 40 is followed by an add that is independent of the branch outcome, starting 2 ns later.]
Data Hazards
o Data hazard: instruction needs data from the result of a previous instruction still executing in
pipeline
o Solution: forward data if possible…
[Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through IF, ID, EX, MEM, WB on a time axis from 2 to 10. Without forwarding (blue line) data has to go back in time; with forwarding (red line) data is available in time.]
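The add/sub pair can be checked for a read-after-write dependence with a few lines of code. This is a sketch under stated assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB) whose register file is written in the first half of a cycle and read in the second half, and each dependence counted in isolation. The tuple layout and function names are hypothetical.

```python
def raw_dependences(program, window=2):
    """Find (producer index, consumer index, register) pairs close enough to matter.

    `program` is a list of (text, destination register, source registers).
    """
    found = []
    for c, (_, _, sources) in enumerate(program):
        for p in range(max(0, c - window), c):
            dest = program[p][1]
            if dest is not None and dest in sources:
                found.append((p, c, dest))
    return found


def stalls_without_forwarding(distance):
    """Bubbles needed so the consumer's ID stage does not precede the producer's WB."""
    return max(0, 3 - distance)


program = [
    ("add $s0, $t0, $t1", "$s0", ("$t0", "$t1")),
    ("sub $t2, $s0, $t3", "$t2", ("$s0", "$t3")),
]

for p, c, reg in raw_dependences(program):
    print(f"{program[c][0]} needs {reg} from {program[p][0]}: "
          f"{stalls_without_forwarding(c - p)} stall(s) without forwarding")
```

With forwarding, the add result is passed from the output of EX straight to the ALU input of the sub, so this particular pair needs no stall at all.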
o Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)    # interchanged
sw $t2, 0($t1)    # interchanged
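The benefit of interchanging the two sw instructions can be shown by counting load-use pairs, that is, a lw immediately followed by an instruction that reads the loaded register. In the sketch below the `original` order is an assumption inferred from the "interchanged" annotation (the slide shows only the reordered code), the parser handles only lw and sw, and one stall per load-use pair is assumed.

```python
import re


def parse(line):
    """Return (opcode, destination register or None, set of source registers)."""
    op, rest = line.split(None, 1)
    rt, base = re.match(r"\s*(\$\w+),\s*-?\d+\((\$\w+)\)", rest).groups()
    if op == "lw":
        return op, rt, {base}
    if op == "sw":
        return op, None, {rt, base}
    raise ValueError(f"unsupported instruction: {line}")


def load_use_stalls(lines):
    """Count lw instructions whose result is read by the very next instruction."""
    stalls = 0
    prev = None
    for line in lines:
        cur = parse(line)
        if prev is not None and prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1
        prev = cur
    return stalls


# Assumed original order (not shown on the slide).
original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
# Reordered code from the slide.
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]

print(load_use_stalls(original))   # 1
print(load_use_stalls(reordered))  # 0
```

The two stores write to different addresses, so swapping them does not change the result; it only moves the use of $t2 one instruction further away from the lw that produces it.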