CA UNIT-2

The document discusses the basic implementation of the MIPS architecture, detailing instruction classes, the operation of the control unit, and the design of the datapath for various instruction types. It explains the roles of memory reference, arithmetic, and branch instructions, along with the use of multiplexors and adders in the processor. Additionally, it covers control implementation schemes and the concept of pipelining to enhance instruction execution efficiency.

Uploaded by

Geojini G.T
Copyright
© All Rights Reserved

UNIT-3

Processor and Control Unit

1. BASIC MIPS IMPLEMENTATION:


NOV/DEC 2015, APR/MAY 2015 (16 MARKS)
The basic MIPS implementation includes a subset of the core MIPS instruction set.

Every instruction belongs to one of three instruction classes.

Instruction classes:

1. Memory reference instructions [load word and store word]
2. Arithmetic and logical instructions [add, sub, mul, OR, AND, etc.]
3. Branch instructions [jump and branch equal]

Overview of the MIPS implementation:

For every instruction, the first two steps are identical:

1. Fetch instruction: fetch the instruction from memory.
2. Fetch operands: select the registers to read.

Operation:

The program counter: it supplies the instruction address to the instruction memory.

Instruction memory: after the instruction is fetched, the register operands required by the instruction are specified by fields of that instruction.

Register operands: once the register operands have been fetched, they are used according to which of the three instruction classes the instruction belongs to.

1. Memory reference instruction:
• It uses the ALU for an address calculation.
• After using the ALU, a memory reference instruction accesses the memory, either to read data for a load or to write data for a store.
2. Arithmetic and logical instruction:
• It uses the ALU to execute the operation.
• After execution completes, an arithmetic-logical instruction must write the ALU result back into a register (a load writes the data read from memory instead).
3. Branch instruction:
• It uses the ALU for comparison.
• After the comparison, the next instruction address may need to change based on the result; otherwise the PC is incremented by 4 to get the address of the next instruction.

Multiplexor: The data going to a particular unit may come from two different sources. These data lines cannot simply be wired together; we must add a device that selects one of the multiple sources and sends it to the destination. Such a device is called a multiplexor (many inputs, single output).

Adder: increments the PC to the address of the next instruction.

Basic implementation of MIPS with Control signals:


The multiplexor selects from several inputs based on the setting of its control lines.

The control lines are set based on information taken from the instruction being executed.

Control unit:

The control unit, which has the instruction as an input, is used to determine the control signals for the functional units and two of the multiplexors.

• The input to the control unit is the 6-bit opcode field from the instruction.
• The output of the control unit consists of three 1-bit signals that are used to control multiplexors.
• Three signals control reads and writes in the register file and data memory.
• A 1-bit control signal is used in determining whether to branch.
• A 2-bit control signal goes to the ALU control.

Logic Design Conventions:

A MIPS implementation consists of two different types of logic elements (datapath elements):

1. Combinational Element
2. State Element

Combinational Element:

• An element that operates on data values, such as an AND gate or an ALU; its outputs depend only on the current inputs.

State Element:

• A memory element, such as a register or a memory, is called a state element.
• An element contains state if it has internal storage.
• Logic components that contain state are called sequential, because their outputs depend on both their inputs and the contents of the internal state.

Clocking Methodology:

• It is used to determine when data is valid and stable relative to the clock.
• It specifies the timing of reads and writes.
• A clocking methodology is designed to ensure predictability.

Edge-triggered clocking methodology:

• Any values stored in a sequential logic element are updated only on a clock edge.
2. BUILDING DATA PATH:

A datapath element is a unit used to operate on or hold data within a processor.

In the MIPS implementation, the datapath elements include the instruction and data memories, the register file, the ALU and adders.

Instruction memory: a memory unit that stores the instructions of a program and supplies an instruction given an address.

Program counter: the PC is a register containing the address of the next instruction in the program.

Adder: increments the PC to the address of the next instruction.

Data segments:

There are three data segments:

1. Data segment for arithmetic and logical instructions.
2. Data segment for load word and store word instructions.
3. Data segment for branch instructions.

1. Data segment for arithmetic and logical instructions:

• Arithmetic and logical instructions read operands from two registers, perform an arithmetic or logical operation, and write the result to a register.
• These instructions are also called R-format instructions.

R-format instructions:

• R-format instructions have three register operands: two source operands and one destination operand.
• They include add, sub, AND, OR and slt.
• Example: or $t1, $t2, $t3

The register file:

• The MIPS processor stores its 32 general-purpose registers in a structure called the register file.
• It is a collection of registers.
• It contains the register state of the computer.

Read registers:

• To read data words, we need to specify the register numbers to the register file.

Write register: to write a data word, we need two inputs:

1. The register number to be written.
2. The data to be written into the register.

Register write:

• It is the write control signal.
• It controls the write operation.
• Writes are edge-triggered: the register is written on the clock edge only when the write control signal is asserted.
• A read in the same clock cycle returns the value written in an earlier clock cycle.

2. Data segment for load word and store word instructions:

The general forms of the load word and store word instructions in the MIPS processor are

lw $t1, offset_value($t2)
sw $t1, offset_value($t2)

These instructions compute a memory address by adding the base register to the offset value.

Memory address = base register + offset value.

Store value: read from the register file, written to the data memory.

Load value: read from the data memory, written to the register file.

Sign extend:

• Converts the 16-bit offset field in the instruction to a 32-bit signed value.
• It increases the size of the data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger destination data item.
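The sign extension and address calculation described above can be sketched in a few lines of Python. This is an illustrative model only; the function names `sign_extend_16` and `effective_address` are hypothetical and do not come from the notes.

```python
def sign_extend_16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:           # sign bit set: replicate it into the upper 16 bits
        value |= 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Memory address = base register + sign-extended offset, modulo 2^32."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


# A positive offset of 100 and a negative offset of -4 (0xFFFC as a 16-bit field)
print(hex(effective_address(0x1000, 100)))     # 0x1064
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

The mask with 0xFFFFFFFF models the 32-bit adder wrapping around, which is what makes adding the extended negative pattern behave like a subtraction.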

Data memory unit:

• It has read and write control signals, an address input, and an input for the data to be written into memory.

3. Data segment for branch instructions:

• The branch instruction has three operands: two registers and one offset.
• The two registers are compared for equality (zero ALU output).
• The 16-bit offset is used to compute the branch target address.
• The branch target address is the address specified in a branch, which becomes the new program counter if the branch is taken.
Example: beq $t1, $t2, offset

• If the condition is true, the branch target address becomes the new PC and the branch is taken.
• If the condition is false, the incremented PC replaces the current PC and the branch is not taken.

The branch datapath must perform two operations:

1. Compute the branch target address: the branch datapath includes a sign extension unit, a shifter and an adder.

2. Compare the register contents: uses the register file and the ALU.
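The two branch operations above can be modelled as follows. This is a minimal sketch, assuming 32-bit wraparound arithmetic; the names `branch_target` and `next_pc` are hypothetical helpers introduced for illustration.

```python
def branch_target(pc: int, offset16: int) -> int:
    """Branch target = (PC + 4) + (sign-extended 16-bit offset shifted left by 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # interpret the field as a signed 16-bit number
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """beq: the ALU subtracts the operands; a zero result means the branch is taken."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF


print(next_pc(40, 5, 5, 3))   # equal operands: 40 + 4 + 12 = 56
print(next_pc(40, 5, 6, 3))   # unequal operands: 44
```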

Jump operation involves:

Replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted
left by 2 bits.

Creating a single datapath:

• Take the datapath components for the individual instruction classes, combine them into a single datapath, and add the control to complete the implementation.
• The simplest datapath attempts to execute all instructions in one clock cycle (fetch, decode and execute each instruction in one clock cycle).
• No datapath resource can be used more than once per instruction.
• Multiplexors are needed at the inputs of shared elements, with control lines to do the selection.
• Write signals control writing to the register file and data memory.
• Cycle time is determined by the length of the longest path.

Problem:

How do we build a datapath for the operational portion of the memory reference and arithmetic-logical instructions that uses a single register file and a single ALU to handle both types of instructions, adding any necessary multiplexors?
Answer: to create a datapath with only a single register file and a single ALU, we use two multiplexors. One is placed at the ALU input and the other at the data input to the register file.

Show how to build a datapath for arithmetic-logical, memory reference and branch instructions.

We can combine all the pieces to make a simple datapath for the MIPS architecture by adding the datapath for instruction fetch, the datapath for R-format and memory instructions, and the datapath for branches.

3. CONTROL IMPLEMENTATION SCHEME:


A control implementation scheme is obtained by adding simple control functions to the existing datapath.

Different control implementation scheme:

1. The ALU control


2. Designing the main control unit
3. Operation of the datapath

The ALU control:

The MIPS ALU defines 6 combinations of its 4 control inputs, and the ALU will need to perform one of these functions.

Load word and store word instructions: use the ALU to compute the memory address by addition.

R-type instructions: use the ALU to perform one of five actions, depending on the function field.

Branch instruction: the ALU must perform a subtraction.

ALUOp control bits and function codes:

The 4-bit ALU control input is generated by a small control unit whose inputs are the function field of the instruction and a 2-bit control field (ALUOp).

Truth table: a representation of a logical operation that lists all the values of the inputs and the corresponding outputs.

The truth table shows how the 4-bit ALU control is set depending on these two input fields. A don't-care term (x) indicates that the output does not depend on the value of the input corresponding to that column.
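The mapping from ALUOp and the function field to the 4-bit ALU control can be sketched as a lookup. The bit encodings below are the values used in the standard textbook single-cycle MIPS design; they are not listed in these notes, so treat them as an assumption. The name `alu_control` is hypothetical.

```python
# funct field -> 4-bit ALU control (assumed standard encodings)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:       # lw / sw: add to form the address (funct is a don't care)
        return 0b0010
    if alu_op == 0b01:       # beq: subtract to compare (funct is a don't care)
        return 0b0110
    if alu_op == 0b10:       # R-type: the operation is chosen by the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALUOp value not used in this subset")


print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```

For ALUOp 00 and 01 the function returns before looking at `funct`, which is the code equivalent of the don't-care entries in the truth table.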

Designing the main control unit:

• The input to the control unit is the 6-bit opcode field from the instruction.
• The output of the control unit consists of three 1-bit signals that are used to control multiplexors.
• Three signals control reads and writes in the register file and data memory.
• A 1-bit control signal is used in determining whether to branch.
• A 2-bit control signal goes to the ALU control.

Simple datapath with the control unit:

• An AND gate is used to combine the Branch control signal and the Zero output from the ALU.
• The AND gate output controls the selection of the next PC.
• When a multiplexor's control is asserted, the multiplexor selects its 1 input.
• When the control is de-asserted, the multiplexor selects its 0 input.

Seven control signals:


Opcode fields of the instruction:

Operation of the datapath:

The formats of the three instruction classes help to understand how to connect the fields of an instruction to the datapath.

1. Instruction format for R-format instructions, which all have an opcode of 0.
2. Instruction format for load (opcode = 35 ten) and store (opcode = 43 ten) instructions.
3. Instruction format for branch equal (opcode = 4).
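As an illustration of how the main control unit maps these opcodes to control lines, the table below uses the signal settings of the standard textbook single-cycle design. Those settings are not given in these notes and are an assumption here; `None` marks a don't-care value, and the names `CONTROL`, `main_control` and `pc_src` are hypothetical.

```python
# opcode -> control signal settings (assumed textbook values; None = don't care)
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),      # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),      # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),      # sw
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),      # beq
}


def main_control(opcode: int) -> dict:
    """Look up the control lines for a 6-bit opcode."""
    return CONTROL[opcode]


def pc_src(branch: int, zero: int) -> int:
    """The AND gate: select the branch target only if Branch and Zero are both 1."""
    return branch & zero


print(main_control(35)["MemRead"], pc_src(main_control(4)["Branch"], 1))  # 1 1
```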

R-format instruction:

Field   0       rs      rt      rd      shamt   funct
Bits    31:26   25:21   20:16   15:11   10:6    5:0

Example: add $s1, $s2, $s3

Load or store instruction:

Field   35 or 43   rs (base)   rt      address
Bits    31:26      25:21       20:16   15:0

Examples: lw $s1, 100($s2) and sw $s1, 100($s2)

Important observations about these instruction formats:

Bits 31:26 of the instruction are the opcode (op) field.

Bits 25:21 and 20:16 of the instruction always specify rs and rt.

Bits 25:21 always give the base register (rs) for load and store.

Bits 15:0 give the 16-bit offset for branch equal, load and store.

The destination register is in one of two places: bits 20:16 (rt) for a load and bits 15:11 (rd) for an R-type instruction.
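The bit positions listed above translate directly into shifts and masks. The sketch below decodes a 32-bit instruction word into its fields; the function name `decode` is hypothetical, and the example words are built from the standard register numbers ($s1 = 17, $s2 = 18, $s3 = 19).

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31:26
        "rs":     (word >> 21) & 0x1F,   # bits 25:21
        "rt":     (word >> 16) & 0x1F,   # bits 20:16
        "rd":     (word >> 11) & 0x1F,   # bits 15:11 (R-format only)
        "shamt":  (word >> 6) & 0x1F,    # bits 10:6  (R-format only)
        "funct":  word & 0x3F,           # bits 5:0   (R-format only)
        "imm16":  word & 0xFFFF,         # bits 15:0  (load, store, branch)
    }


# add $s1, $s2, $s3 -> opcode 0, rs = 18, rt = 19, rd = 17, funct = 32
add_word = (0 << 26) | (18 << 21) | (19 << 16) | (17 << 11) | 32
# sw $s1, 100($s2)  -> opcode 43, rs = 18 (base), rt = 17, offset = 100
sw_word = (43 << 26) | (18 << 21) | (17 << 16) | 100

print(hex(add_word), decode(add_word)["rd"])
print(hex(sw_word), decode(sw_word)["imm16"])
```

Every field is extracted for every word; it is the opcode that tells the control unit which of the extracted fields are meaningful.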

This figure shows the additions plus the ALU control block, the write signal for state
elements, the read signal for the data memory and the control signals for the multiplexor.

1. Datapath for an operation in an R-type instruction:

Datapath for an R-type instruction such as

add $t1, $t2, $t3

Four steps are needed to execute the instruction:

1. The instruction is fetched and the PC is incremented.
2. Two registers, $t2 and $t3, are read from the register file, and the main control unit computes the setting of the control lines during this step.
3. The ALU operates on the data read from the register file, using the function code to generate the ALU function.
4. The result from the ALU is written into the register file, using bits 15:11 of the instruction to select the destination register ($t1).

2. Datapath for an operation in a load instruction:

Datapath for a load word instruction such as

lw $t1, offset_value($t2)

Five steps are needed to execute the instruction:

1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. A register ($t2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file; the destination register is given by bits 20:16 of the instruction ($t1).

3. Datapath for an operation in a branch-on-equal instruction:

Datapath for a beq instruction such as

beq $t1, $t2, offset

Four steps are needed to execute the instruction:

1. An instruction is fetched from the instruction memory and the PC is incremented.
2. Two registers, $t1 and $t2, are read from the register file.
3. The ALU performs a subtraction on the data values read from the register file. The value PC+4 is added to the sign-extended lower 16 bits of the instruction shifted left by two; the result is the branch target address.
4. The Zero result from the ALU is used to decide which adder result to store into the PC.

Implementing Jumps:

• The low-order 2 bits of a jump address are always 00 two.
• The next lower 26 bits of this 32-bit address come from the 26-bit immediate field in the instruction.
• The upper 4 bits of the address that replaces the PC come from the PC of the jump instruction plus 4.

A jump is implemented by storing into the PC the concatenation of:

1. The upper 4 bits of the current PC+4.
2. The 26-bit immediate field of the jump instruction.
3. The bits 00 two.

Field   000010   address
Bits    31:26    25:0
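The concatenation that forms the jump address can be written out with masks and shifts. This is a small sketch; the function name `jump_target` and the example addresses are illustrative only.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Concatenate the upper 4 bits of PC+4, the 26-bit immediate, and 00."""
    upper4 = (pc + 4) & 0xF0000000        # bits 31:28 of PC + 4
    imm26 = instruction & 0x03FFFFFF      # bits 25:0 of the jump instruction
    return upper4 | (imm26 << 2)          # the shift supplies the two low-order 0 bits


j_word = (0b000010 << 26) | 0x00100004    # opcode 2 (j) with an example immediate
print(hex(jump_target(0x10000008, j_word)))  # 0x10400010
```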
4. PIPELINING: MAY/JUNE 2016, NOV/DEC 2014 (16 MARKS)
WHAT IS PIPELINING? DISCUSS THE PIPELINED DATAPATH AND CONTROL.

Pipelining is an implementation technique in which multiple instructions are overlapped in execution.

Five steps in a MIPS instruction:

1. IF: Instruction fetch from memory

2. ID: Instruction decode & register read

3. EX: Execute operation or calculate address

4. MEM: Access memory operand

5. WB: Write result back to register

Example:

Assume the time for a stage is 100 ps for register read or write and 200 ps for the other stages. Compare the pipelined datapath with the single-cycle datapath.

Solution:

Instruction class   Instruction fetch   Register read   ALU operation   Data access   Register write   Total time
lw                  200 ps              100 ps          200 ps          200 ps        100 ps           800 ps
sw                  200 ps              100 ps          200 ps          200 ps                         700 ps
R-format            200 ps              100 ps          200 ps                        100 ps           600 ps
Branch              200 ps              100 ps          200 ps                                         500 ps

[Figure: single-cycle versus pipelined execution, time axis in ps. In both panels each instruction passes through Fetch instruction, Decode/read registers, Execute (ALU), Memory read/write, and Write register; in the pipelined panel the instructions overlap.]
Pipeline Speedup:

If all stages are balanced, i.e., all take the same time:

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$

If the stages are not balanced, the speedup is less. The speedup is due to increased throughput; latency (the time for each instruction) does not decrease.

PIPELINED DATAPATH AND CONTROL:

The datapath and control unit share similarities with both the single-cycle and multicycle
implementations that we already saw.

An example execution highlights important pipelining concepts.

PIPELINING DATAPATH:
The single-cycle datapath is shown with the pipeline stages identified. Dividing an instruction into five stages means a five-stage pipeline, and the five stages are as follows:

IF: Instruction fetch

ID: Instruction decode and register file read

EX: Execution or address calculation

MEM: Data memory access

WB: Write back

These five components correspond roughly to the way the datapath is drawn: instructions and data move generally from left to right through the five stages as they complete execution.

There are two exceptions to this left-to-right flow of instructions:

1. The write-back stage, which places the result back into the register file in the middle of the datapath.
2. The selection of the next value of the PC, which chooses between the incremented PC and the branch address from the MEM stage.

Pipeline version of datapath:

Pipelined datapath highlighting the pipeline registers:

The pipeline registers are named for the two stages they separate.

IF/ID: the register between the IF and ID stages separates instruction fetch and decode.

ID/EX: the register between the ID and EX stages separates decode and ALU execution.

EX/MEM: the register between the EX and MEM stages separates ALU execution and data memory access.

MEM/WB: the register between the MEM and WB stages separates data memory access and the write back to the register file.

The register file is split into two logical parts:

1. Registers read during register fetch (ID)
2. Registers written during write back (WB)

Instruction execution in a pipeline manner

[Figure: multiple-clock-cycle pipeline diagram, clock cycles 1 to 10, for the sequence below. Each instruction passes through IM, RF (read), ALU, DM and RF (write), starting one cycle after the previous instruction.]

lw  $s2, 40($0)
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw  $s6, 20($s1)
or  $s7, $t3, $t4

IM represents the instruction memory and the PC in the instruction fetch stage (IF).

Reg stands for the register file in the instruction decode/register file read stage (ID).
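The overlap of instructions in the pipeline can be reproduced with a short script. This is a sketch of an ideal pipeline with no stalls; `schedule` and `total_cycles` are hypothetical names, and the six instructions are the ones from the diagram.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(instructions):
    """Map each instruction to {clock cycle: stage}, assuming no stalls."""
    return [
        (instr, {start + s: stage for s, stage in enumerate(STAGES)})
        for start, instr in enumerate(instructions, start=1)
    ]


def total_cycles(n_instructions: int) -> int:
    """Five cycles for the first instruction, one more for each later one."""
    return len(STAGES) + n_instructions - 1


program = [
    "lw $s2, 40($0)", "add $s3, $t1, $t2", "sub $s4, $s1, $s5",
    "and $s5, $t5, $t6", "sw $s6, 20($s1)", "or $s7, $t3, $t4",
]

for instr, row in schedule(program):
    cells = [row.get(c, "").ljust(4) for c in range(1, total_cycles(len(program)) + 1)]
    print(instr.ljust(18), " ".join(cells))
```

Six instructions finish in 5 + 6 - 1 = 10 clock cycles, which matches the ten cycles on the time axis of the diagram.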

Execution of a load/store instruction in a five-stage pipeline:

Stage 1: Instruction fetch:

• The instruction is read from instruction memory using the address in the PC and then stored in the IF/ID pipeline register.
• The PC address is incremented by 4 and then written back into the PC to be ready for the next clock cycle.
• The incremented address is also saved in the IF/ID pipeline register.

Stage 2: Instruction decode and register file read:

• The instruction portion of the IF/ID pipeline register supplies the 16-bit immediate field, which is sign-extended to 32 bits, and the register numbers to read the two registers.
• All three values are stored in the ID/EX pipeline register.

Stage 3: Execute and address calculation:

• The load instruction reads the contents of register 1 and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU.
• The sum is placed in the EX/MEM pipeline register.

Stage 4: Memory access:

• The load instruction reads the data memory using the address from the EX/MEM pipeline register and loads the data into the MEM/WB pipeline register.

Stage 5: Write back:

• The data is read from the MEM/WB pipeline register and written into the register file.

PIPELINED CONTROL:

• For pipelined control, we simply add the control to the pipelined datapath.
• The pipelined datapath borrows the single-cycle control logic for the ALU source, the destination register number, and ALU control.

 The control signals are generated in the same way as in the single-cycle processor—after an
instruction is fetched, the processor decodes it and produces the appropriate control values.

 But just like before, some of the control signals will not be needed until some later stage
and clock cycle.
 These signals must be propagated through the pipeline until they reach the appropriate
stage. We can just pass them in the pipeline registers, along with the other data.

Control signals can be categorized by the pipeline stage that uses them:

Stage   Control signals needed
EX      ALUSrc, ALUOp, RegDst
MEM     MemRead, MemWrite, PCSrc
WB      RegWrite, MemToReg

Stage 1: Instruction fetch:

• The control signals to read instruction memory and to write the PC are always asserted.

Stage 2: Instruction decode/register file read:

• As in the previous stage, the same thing happens at every clock cycle, so there are no optional control lines to set.

Stage 3: Execution/address calculation:

• The signals to be set are RegDst, ALUOp and ALUSrc.
• These signals select the result register, the ALU operation, and either Read data 2 or a sign-extended immediate for the ALU.

(Write and draw the ALU control table from the control implementation scheme section.)

Stage 4: Memory access:

• The control lines set in this stage are Branch, MemRead and MemWrite.
• These signals are set by the branch equal, load and store instructions, respectively.
• PCSrc selects the next sequential address unless control asserts Branch and the ALU result was 0.

Stage 5: Write back:

• The two control lines are MemtoReg, which decides between sending the ALU result or the memory value to the register file, and RegWrite, which writes the chosen value.
5. PIPELINE HAZARDS:
MAY/JUNE 2016, APR/MAY 2015, NOV/DEC 2014 (16 MARKS)

A pipeline hazard is a situation that prevents an instruction from entering the next stage.

Three different types of hazard:

1. Structural hazards
2. Data hazards
3. Control hazards

Structural hazards: two or more instructions attempt to use the same resource at the same time.

Control hazards: an attempt is made to make a branching decision before the branch condition is evaluated.

Data hazards: either the source or the destination operands of an instruction are not available at the time expected in the pipeline, and as a result the pipeline is stalled.

Let's look at a sequence with many dependences:

sub $2, $1, $3     # Register $2 written by sub
and $12, $2, $5    # 1st operand ($2) depends on sub
or  $13, $6, $2    # 2nd operand ($2) depends on sub
add $14, $2, $2    # 1st ($2) & 2nd ($2) depend on sub
sw  $15, 100($2)   # Base ($2) depends on sub

The last four instructions are all dependent on the result in register $2 of the first instruction.
If register $2 had the value 10 before the subtract instruction and −20 afterwards, the
programmer intends that −20 will be used in the following instructions that refer to
register $2.

HANDLING DATA HAZARDS: Forwarding versus Stalling

Data hazards occur when the pipeline changes the order of read/write accesses to operands so that it differs from the normal sequential order.

Forwarding (bypassing) with two instructions:

• It is also known as bypassing.
• It is a method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from the registers or memory.

Graphical representation of forwarding:

It shows the connection that forwards the value in $s0 after the execution stage of the add instruction.
Forwarding paths are valid only if the destination stage is later in time than the source stage.
Forwarding cannot prevent all pipeline stalls. Stalls are also often referred to as bubbles in the pipeline.
Forwarding versus stalling:

Data hazard with a sequence of many dependences:

sub $2, $1, $3     # register $2 written by sub
and $12, $2, $5    # 1st operand ($2) depends on sub
or  $13, $6, $2    # 2nd operand ($2) depends on sub
add $14, $2, $2    # 1st ($2) & 2nd ($2) depend on sub
sw  $15, 100($2)   # base ($2) depends on sub

The last four instructions are all dependent on the result in register $2 of the first
instruction.

If the register $2 had the value 10 before the subtraction instruction and -20 afterwards, the
programmer intends -20 will be used in the following instruction that refer to register $2.

Pipelined dependences in a five-instruction sequence:

Time (in clock cycles)
                    CC1     CC2     CC3     CC4     CC5     CC6     CC7     CC8     CC9
Value of $2         10      10      10      10      10/-20  -20     -20     -20     -20

Program execution order (in instructions)
sub $2, $1, $3      IM      Reg     ALU     DM      Reg
and $12, $2, $5             IM      Reg     ALU     DM      Reg
or $13, $6, $2                      IM      Reg     ALU     DM      Reg
add $14, $2, $2                             IM      Reg     ALU     DM      Reg
sw $15, 100($2)                                     IM      Reg     ALU     DM      Reg

This diagram illustrates the execution of these instructions using a multiple clock cycle
pipeline representation.

Hazard conditions: HAZARD DETECTION UNIT

The two pairs of hazard conditions are

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

The first hazard in the sequence is on register $2, between the result of sub $2, $1, $3 and the first read operand of and $12, $2, $5. This hazard can be detected when the and instruction is in the EX stage and the prior instruction is in the MEM stage, so this is hazard 1a:

EX/MEM.RegisterRd = ID/EX.RegisterRs = $2.

The sub-or is a type 2b hazard:

MEM/WB.RegisterRd = ID/EX.RegisterRt = $2
■ The two dependences on sub-add are not hazards because the register file supplies the
proper data during the ID stage of add.
■ There is no data hazard between sub and sw because sw reads $2 the clock cycle after
sub writes $2.
We can forward only to the "or" and "add" instructions without stalling; $2 is still unavailable in EX/MEM for "and". When sub was the "writing" instruction, we forwarded from EX/MEM to the ALU for "and".

The multiplexors have been expanded to add the forwarding paths, and we show
the forwarding unit.

The hardware necessary to support forwarding for operations that use results during the EX stage.
Note that the EX/MEM.RegisterRd field is the register destination for either an ALU instruction (which comes from the Rd field of the instruction) or a load (which comes from the Rt field).
Forwarding can also help with hazards when store instructions are dependent on other
instructions. Since they use just one data value during the MEM stage, forwarding is easy.

The control values for the forwarding multiplexors:


Mux control     Source    Explanation
ForwardA = 00   ID/EX     The first ALU operand comes from the register file.
ForwardA = 10   EX/MEM    The first ALU operand is forwarded from the prior ALU result.
ForwardA = 01   MEM/WB    The first ALU operand is forwarded from data memory or an earlier ALU result.
ForwardB = 00   ID/EX     The second ALU operand comes from the register file.
ForwardB = 10   EX/MEM    The second ALU operand is forwarded from the prior ALU result.
ForwardB = 01   MEM/WB    The second ALU operand is forwarded from data memory or an earlier ALU result.

Let's now write both the conditions for detecting hazards and the control signals to resolve them:

EX hazard:

if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10

if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10

MEM hazard:

if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01

if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
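The EX and MEM hazard conditions can be collected into one small function that produces the ForwardA and ForwardB mux controls. This is a sketch, with hypothetical parameter names standing for the pipeline register fields. One assumption goes beyond the conditions as written above: when both the EX/MEM and the MEM/WB destination match, the sketch prefers the EX/MEM value because it is the more recent result.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) as 2-bit mux control values."""

    def select(source_reg):
        # EX hazard: the instruction one ahead writes the register we read.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source_reg:
            return 0b10
        # MEM hazard: the instruction two ahead writes the register we read.
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source_reg:
            return 0b01
        return 0b00  # no hazard: the operand comes from the register file

    return select(id_ex_rs), select(id_ex_rt)


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM: hazard 1a
print(forwarding_unit(2, 5, True, 2, False, 0))   # (2, 0)
# or $13, $6, $2 in EX, and in MEM (rd = 12), sub in WB (rd = 2): hazard 2b
print(forwarding_unit(6, 2, True, 12, True, 2))   # (0, 1)
```

The check against register 0 matters because $0 always reads as zero, so a result nominally destined for $0 must never be forwarded.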

Data hazards and stalls:

• So far, we have only addressed "potential" data hazards, where the forwarding unit was able to detect and resolve them without affecting the performance of the pipeline.
• There are also "unavoidable" data hazards, which the forwarding unit cannot resolve, and whose resolution does affect pipeline performance.

• We thus add an (unavoidable) hazard detection unit, which detects them and introduces stalls to resolve them.

Forwarding technique is used to minimize data hazard:


[Figure: pipeline diagram over clock cycles CC1 to CC10 for the sequence below. A bubble is inserted after the load, so and repeats its Reg stage and or repeats its IM stage.]

lw  $2, 20($1)
and $4, $2, $5
or  $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

For the load instruction, the data is read from memory in clock cycle 4, while the ALU is already performing the operation for the following instruction. The pipeline must therefore stall for the combination of a load followed by an instruction that reads its result.
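The stall decision for a load followed by a dependent instruction can be sketched as a single test. The condition used here is the one from the standard textbook hazard detection unit; it is not spelled out in these notes, so treat it as an assumption. The function name `must_stall` and its parameters are hypothetical.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int,
               if_id_rs: int, if_id_rt: int) -> bool:
    """Stall if the instruction in EX is a load whose destination (rt)
    is a source register of the instruction currently in ID."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: one bubble is needed
print(must_stall(True, 2, 2, 5))    # True
# the same registers, but the instruction in EX is not a load
print(must_stall(False, 2, 2, 5))   # False
```

After the one-cycle stall the loaded value is in the MEM/WB register, so ordinary forwarding can supply it to the ALU.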

HANDLING CONTROL HAZARDS:

Control hazard:

• It is also known as a branch hazard.
• It arises when the proper instruction cannot execute in the proper pipeline clock cycle.
• A control hazard occurs when we need to find the destination of a branch and cannot fetch any new instructions until we know that destination.

Reducing the delay of branches:

One way to improve branch performance is to reduce the cost of the taken branch.

There are two cases. A branch is either:

1. Taken: if a branch changes the PC to its target address, then it is a taken branch.

PC <= PC + 4 + (Immediate shifted left by 2)

2. Not taken: if a branch does not change the PC to its target address, then it is a not-taken branch.

PC <= PC + 4

The decision about whether to branch is made in the MEM stage, which is clock cycle 4 for the beq instruction.

The three sequential instructions that follow the branch will have been fetched and begun execution by then. If beq branches (to location 36 in the example), there is a delay before the proper instruction is fetched.

Flush: to discard instructions in a pipeline, usually due to an unexpected event.
Handling control hazards:

Control hazards can be handled using branch prediction.

Prediction means a statement of what will happen in the future.

Branch predictor:

• Branch prediction is used to handle branches.
• A branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if-then-else structure) will go before this is known for sure.
• If the prediction is correct, the delay of the branch hazard is avoided.
• If the prediction is incorrect, the wrong-path instructions are flushed from the pipeline and the right path is fetched.
• The purpose of the branch predictor is to improve the flow in the instruction pipeline.

Two types of branch prediction:

• The behavior of a branch can be predicted both statically at compile time and dynamically by the hardware at execution time.

1. Static branch prediction
2. Dynamic branch prediction

Static branch prediction:

Predicting a branch at compile time helps in scheduling around data hazards.

Dynamic branch prediction:

• Prediction determined at run time is known as dynamic branch prediction.
• The simplest dynamic branch-prediction scheme is a branch-prediction buffer or branch history table.

Branch-prediction buffer:

• A branch-prediction buffer is a small memory (cache) indexed by the lower portion of the address of the branch instruction.
• The memory contains a bit that says whether the branch was recently taken or not.

Two branch prediction schemes:

1. One-bit prediction scheme
2. Two-bit prediction scheme

1-bit prediction scheme:

• Even if a branch is almost always taken, a 1-bit predictor will predict incorrectly twice, rather than once, when it is not taken.

2-bitpredictionscheme:

 In a two bits prediction scheme must miss twice before it is changed

 In a two bits prediction scheme are used to encode the four states in the system.

 One bit that predicts the direction of the current branch if the previous branch
was not taken (PNT).

 One bit that predicts the direction of the current branch if the previous branch was
taken
(PT).

 It is general instance of a counter-based predictor.


 Counter-based predictor is incremented when the prediction is accurate
and decremented otherwise.
 The counters saturate at 00 or 11
 It has an n-bit saturating counter for each entry in the prediction buffer.
 With an n-bit counter, the counter can take on values between 0 and 2^n - 1.
 When the counter is greater than or equal to one half of its maximum value, that is
2^(n-1), the branch is predicted as taken; otherwise, it is predicted untaken.
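
The counter-based scheme can be sketched in Python as follows. This is a minimal illustration, not a description of any particular hardware: it assumes the common formulation in which a counter counts up on a taken branch and down on a not-taken branch, and it assumes the buffer is indexed by the low-order bits of the word-aligned branch address. The class name, the buffer size, the branch address and the helper `count_misses` are all hypothetical.

```python
class CounterPredictor:
    """Branch-prediction buffer of n-bit saturating counters (sketch)."""

    def __init__(self, n_bits=2, index_bits=4):
        self.max_value = (1 << n_bits) - 1        # 2^n - 1
        self.threshold = 1 << (n_bits - 1)        # 2^(n-1)
        self.index_mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)   # all start at "not taken"

    def _index(self, pc):
        # Assumption: instructions are 4 bytes, so drop the two low bits
        # and keep the next index_bits bits of the branch address.
        return (pc >> 2) & self.index_mask

    def predict(self, pc):
        # Predict taken when the counter is at or above 2^(n-1).
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_value)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


def count_misses(predictor, pc, outcomes):
    """Predict, compare and update for each outcome; return the miss count."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken on loop exit.
loop = [True] * 9 + [False]
predictor = CounterPredictor(n_bits=2)
print(count_misses(predictor, 0x00400100, loop))  # first run includes warm-up
print(count_misses(predictor, 0x00400100, loop))  # later runs
```

Once warmed up, the 2-bit counter stays on the "taken" side after a single loop exit, so each further run of the loop costs only one misprediction.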

Branch delay slot:


 The branch delay slot is the slot directly after a delayed branch instruction; in the
MIPS architecture it is filled by an instruction that does not affect the branch.
 Compilers and assemblers try to place an instruction that always executes after the
branch in the branch delay slot.
 The job of the software is to make the successor instructions valid and useful.

Three ways in which the branch delay slot can be scheduled:

The top box in each pair shows the code before scheduling; the bottom box shows the
scheduled code.
In (a), the delay slot is scheduled with an independent instruction from before the branch. This
is the best choice. Strategies (b) and (c) are used when (a) is not possible.
In the code sequences for (b) and (c), the use of $s1 in the branch condition prevents the add
instruction (whose destination is $s1) from being moved into the branch delay slot.
In (b) the branch delay slot is scheduled from the target of the branch; usually the target
instruction will need to be copied because it can be reached by another path.
Strategy (b) is preferred when the branch is taken with high probability, such as a loop
branch.
Finally, the branch may be scheduled from the not-taken fall-through as in (c).
To make this optimization legal for (b) or (c), it must be OK to execute the sub instruction
when the branch goes in the unexpected direction.
By "OK" we mean that the work is wasted, but the program will still execute correctly.
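
The register dependence that decides whether strategy (a) is possible can be sketched in Python. The check below covers only the condition mentioned above (the branch must not read the register that the moved instruction writes); a real compiler has to check more than this. The `Instr` type, the label `Target` and the exact instruction texts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]        # register written, or None
    sources: Tuple[str, ...]   # registers read


def independent_of_branch(instr, branch):
    """True if the branch does not read the register that instr writes,
    which is a necessary condition for moving instr into the delay slot."""
    return instr.dest is None or instr.dest not in branch.sources


add = Instr("add $s1, $s2, $s3", "$s1", ("$s2", "$s3"))
branch_on_s2 = Instr("beq $s2, $zero, Target", None, ("$s2", "$zero"))
branch_on_s1 = Instr("beq $s1, $zero, Target", None, ("$s1", "$zero"))

print(independent_of_branch(add, branch_on_s2))  # add may fill the slot
print(independent_of_branch(add, branch_on_s1))  # add must stay before the branch
```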

6. EXCEPTIONS:

 One of the difficult parts of control is to implement exceptions and interrupts.


 Exceptions are unscheduled events that disrupt program execution; they are used,
for example, to detect overflow.
 An exception is also called an interrupt in many architectures.
 In MIPS terminology, an interrupt is an exception that comes from outside the processor.

An interrupt is an exception that comes from outside of the processor:

Type of event                                    From where   MIPS terminology

I/O device request                               External     Interrupt
Invoke the operating system from user program    Internal     Exception
Arithmetic overflow                              Internal     Exception
Hardware malfunction                             Either       Exception or interrupt

Handling exceptions in the MIPS architecture:

Types of exception:

1. Execution of an undefined instruction.


2. Arithmetic overflow in the instruction add $1,$2,$1.

Response to an exception:

When an exception occurs, the processor saves the address of the offending instruction in
the Exception Program Counter (EPC) and transfers control to the OS at some specified
address.

The OS then takes the appropriate action, which may involve providing some service to the
user program, taking some predefined action in response to the exception, or stopping the
execution of the program and reporting an error.
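
The response described above can be sketched in Python for the overflow case. The `Cpu` class and the handler address 0x80000180 are assumptions made for illustration; the notes only say that control is transferred to the OS at some specified address. Register values are modelled as signed Python integers.

```python
HANDLER_ADDRESS = 0x80000180  # assumed OS entry point, not given in the notes


class Cpu:
    """Tiny model with a PC, an EPC and 32 registers (sketch)."""

    def __init__(self):
        self.pc = 0
        self.epc = None
        self.regs = [0] * 32

    def add(self, rd, rs, rt):
        """Execute `add rd, rs, rt` at self.pc with a signed 32-bit overflow check."""
        result = self.regs[rs] + self.regs[rt]
        if not -(1 << 31) <= result <= (1 << 31) - 1:
            self.epc = self.pc         # save the address of the offending instruction
            self.pc = HANDLER_ADDRESS  # transfer control to the OS
            return                     # the destination register is not written
        self.regs[rd] = result
        self.pc += 4


# add $1, $2, $1 with $2 holding the largest positive 32-bit value.
cpu = Cpu()
cpu.pc = 0x00400000
cpu.regs[1] = 1
cpu.regs[2] = (1 << 31) - 1
cpu.add(1, 2, 1)
print(hex(cpu.epc), hex(cpu.pc))
```

Leaving the destination register unchanged matters for `add $1,$2,$1`, because the destination is also a source: the OS can still see the original operand value.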

Methods used to communicate the reason for an exception:

To handle an exception, the OS must know the reason for it.

Two main methods used to communicate the reason for an exception:

1. Status register method


2. Vectored interrupts method

Status register method: The MIPS architecture uses a status register (called the Cause
register), which holds a field that indicates the reason for the exception.

Vectored interrupts method: In a vectored interrupt, the address to which control is
transferred is determined by the cause of the exception.
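
The difference between the two methods can be sketched in Python. All addresses and cause codes below are illustrative assumptions rather than values taken from the notes. In the status register method there is one entry point and the reason is recorded as a code; in the vectored method the entry address itself identifies the reason.

```python
UNDEFINED_INSTRUCTION = "undefined instruction"
ARITHMETIC_OVERFLOW = "arithmetic overflow"

# Status register method: a single entry point plus a cause code (assumed values).
SINGLE_ENTRY = 0x80000180
CAUSE_CODES = {UNDEFINED_INSTRUCTION: 10, ARITHMETIC_OVERFLOW: 12}

# Vectored method: one entry address per cause (assumed values).
VECTOR_TABLE = {
    UNDEFINED_INSTRUCTION: 0x80000000,
    ARITHMETIC_OVERFLOW: 0x80000180,
}


def status_register_dispatch(reason):
    """Return (handler address, cause code written to the status register)."""
    return SINGLE_ENTRY, CAUSE_CODES[reason]


def vectored_dispatch(reason):
    """Return the handler address selected by the cause; no code is needed."""
    return VECTOR_TABLE[reason]


print(status_register_dispatch(ARITHMETIC_OVERFLOW))
print(hex(vectored_dispatch(UNDEFINED_INSTRUCTION)))
```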

Exception in a pipelined implementation:

A pipelined implementation treats exceptions as another form of control hazard.

In pipelined computers, interrupts and exceptions are further classified as:

1. Imprecise interrupts or imprecise exceptions


2. Precise interrupts or precise exceptions

Imprecise interrupts or imprecise exceptions:

Interrupts or exceptions in pipelined computers that are not associated with the exact
instruction that was the cause of the interrupt or exception are called imprecise
interrupts or imprecise exceptions.

Precise interrupts or precise exceptions:


An interrupt or exception that is always associated with the correct instruction in a
pipelined computer is called a precise interrupt or precise exception.
