
MIPS Instruction Execution and Control

Unit IV covers processor architecture, focusing on instruction execution, data path construction, and control unit design. It discusses hardwired and microprogrammed control, pipelining, and handling data and control hazards. The content includes MIPS implementation details and the operation of various instruction classes, emphasizing the importance of the program counter and register file in executing instructions.

UNIT IV

7 Processor

Syllabus
Instruction Execution - Building a Data Path - Designing a Control Unit - Hardwired Control, Microprogrammed Control - Pipelining - Data Hazard - Control Hazards.

Contents
7.1 Instruction Execution
7.2 Basic MIPS Implementation    Dec.-15, 18, May-19, Marks 16
7.3 Building a Data Path    Dec.-14, May-15, Marks 8
7.4 Designing a Control Unit    Dec.-14, 18, Marks 8
7.5 Hardwired Control
7.6 Microprogrammed Control
7.7 Comparison Between Hardwired and Microprogrammed Control Units    Dec.-18, Marks 6
7.8 Pipelining    May-07, 12, 13, 15, 16, 17, 19, Dec.-06, 08, 09, 14, 15, 16, 18, June-09, Marks 6
7.9 Pipelined Datapath and Control    May-16, 19, Marks 12
7.10 Handling Data Hazards    Dec.-08, 11, 12, May-14, 17, 18, 19, Marks 2
7.11 Handling Control Hazards    May-08, 09, 10, 11, 13, 14, 18, Dec.-07, 08, 16, 18, Marks 6
7.12 Two Marks Questions and Answers

Digital Principles and Computer Organization 7-2
Processor
7.1 Instruction Execution
• Let us see how an instruction is executed. The complete instruction cycle involves three operations : instruction fetching, opcode decoding and instruction execution. Fig. 7.1.1 shows the basic instruction cycle.
• After each instruction cycle, the central processing unit checks for any valid interrupt request. If there is one, the central processing unit fetches the instructions of the interrupt service routine and, after completion of the interrupt service routine, starts the new instruction cycle from where it was interrupted. Fig. 7.1.2 shows the instruction cycle with the interrupt cycle.
[Fig. 7.1.1 Basic instruction cycle: START → fetch the next instruction (fetch cycle) → decode instruction (decode cycle) → execute instruction (execute cycle)]
[Fig. 7.1.2 Basic instruction cycle with interrupt: fetch cycle → decode cycle → execute cycle → check for interrupts (interrupt cycle); Yes: process interrupts; No: fetch the next instruction; the cycle ends at STOP]
Instruction fetch cycle : In this cycle, the instruction is fetched from the memory location whose address is in the PC. This instruction is placed in the Instruction Register (IR) in the processor.
Instruction decode cycle : In this cycle, the opcode of the instruction stored in the
instruction register is decoded/examined to determine which operation is to be
performed.
Instruction execution cycle : In this cycle, the specified operation is performed by the processor. This often involves fetching operands from the memory or from processor registers, performing an arithmetic or logical operation and storing the result in the destination location. During the instruction execution, PC contents are incremented to point to the next instruction. After completion of execution of the current instruction, the PC contains the address of the next instruction and a new instruction fetch cycle can begin.
TECHNICAL PUBLICATIONS - an up-thrust for knowledge
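As a rough illustration of the cycle just described, the toy interpreter below fetches the word addressed by the PC, increments the PC, decodes and executes the instruction, and then checks for an interrupt request. It is a minimal sketch, not a model of a real CPU: the accumulator, the LOAD/ADD/HALT opcodes, the (opcode, operand) tuple format and the `irq_after` trigger are all hypothetical.

```python
# Toy instruction cycle: fetch, decode, execute, then an interrupt check.
# Opcodes, instruction format and the interrupt trigger are hypothetical.

def execute(opcode, operand, acc):
    """Execute one decoded instruction and return the new accumulator value."""
    if opcode == "LOAD":
        return operand
    if opcode == "ADD":
        return acc + operand
    raise ValueError(f"unknown opcode {opcode}")


def run(program, irq_after=None, isr=()):
    """Run `program` (a list of (opcode, operand) tuples) until HALT.

    If `irq_after` is given, an interrupt request is raised after that many
    instructions; the service routine `isr` runs and the interrupted program
    then resumes where it stopped. Returns (accumulator, trace).
    """
    acc, pc, count, trace = 0, 0, 0, []
    while True:
        ir = program[pc]              # fetch cycle: IR <- M[PC]
        pc += 1                       # PC now holds the next instruction's address
        opcode, operand = ir          # decode cycle
        if opcode == "HALT":
            return acc, trace
        acc = execute(opcode, operand, acc)   # execute cycle
        trace.append(f"{opcode} {operand}")
        count += 1
        if count == irq_after:        # interrupt cycle: check for a request
            saved_pc, saved_acc = pc, acc     # save the interrupted context
            for isr_op, isr_arg in isr:
                acc = execute(isr_op, isr_arg, acc)
                trace.append(f"ISR {isr_op} {isr_arg}")
            pc, acc = saved_pc, saved_acc     # resume where it was interrupted


acc, trace = run([("LOAD", 5), ("ADD", 3), ("ADD", 2), ("HALT", 0)],
                 irq_after=1, isr=[("LOAD", 99)])
print(acc, trace)   # prints: 10 ['LOAD 5', 'ISR LOAD 99', 'ADD 3', 'ADD 2']
```

Saving and restoring the PC (and, in this sketch, the accumulator) around the service routine is what allows the program to continue from the point of interruption.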

Review Question

1. Write a short note on instruction execution.

7.2 Basic MIPS Implementation AU: Dec.-15, 18, May-19


• In this chapter we will see the implementation of a subset of the core MIPS instruction set. These instructions are divided into three classes :
1. The memory-reference instructions : load word (lw) and store word (sw)
2. The arithmetic-logical instructions : add, sub, AND, OR and slt
3. The branch instructions : branch equal (beq) and jump (j)
• The subset considered here does not include all the integer instructions (for example, shift, multiply and divide are missing), nor does it include any floating-point instructions. The key principles used in creating a datapath and designing the control are discussed here. The implementation of the remaining instructions is somewhat similar.
• For implementing every instruction, the first two steps are the same :
1. Fetch the instruction : Send the Program Counter (PC) contents (address of the instruction) to the memory that contains the program code and fetch the instruction from that memory.
2. Fetch operand(s) : Read one or two registers, using fields of the instruction to select the registers to read. For the load word instruction, we need to read only one register, but most other instructions require reading two registers.
• The remaining actions required to complete the instruction depend on the instruction class. For each of the three instruction classes (memory-reference, arithmetic-logical and branches), the actions are mostly the same, independent of the exact instruction. This shows that the simplicity and regularity of the MIPS instruction set simplifies the implementation by making the execution of many of the instruction classes similar.
• For example, all instruction classes, except jump, use the Arithmetic-Logical Unit (ALU) after reading the registers :
- Memory-reference instructions use the ALU for an address calculation.
- Arithmetic-logical instructions use the ALU for the operation execution.
- Branches use the ALU for comparison.
• After using the ALU, the actions required to complete the various instruction classes are not the same :
- A memory-reference instruction needs to access the memory either to read data for a load or to write data for a store.
- An arithmetic-logical or load instruction must write the data from the ALU or memory back into a register.
- A branch instruction may need to change the next instruction address based on the comparison; otherwise, the PC should be incremented by 4 to get the address of the next instruction.
• Fig. 7.2.1 shows the block diagram of a MIPS implementation, showing the functional units and the interconnections between them.

M
Program
COunter
X

A A
D D
D D

Data
Address

Register1
Instruction A Address
Register 2
Data
M
Instruction Register 3 memory
memorY X
Register file Data

Fig. 7.2.1 Major functional units and interconnections between them for implementation
of MIPS subset
Operation
• The program counter gives the instruction address to the instruction memory.
• After the instruction is fetched, the register operands required by the instruction are specified by fields of that instruction.
• Once the register operands have been fetched, they can be used to compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or to compare (for a branch).

• If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to a register.
• If the operation is a load or store, the ALU result is used as an address to either store a value from the registers or load a value from memory into the registers. The result from the ALU or memory is written back into the register file.
• Branches require the use of the ALU output to determine the next instruction address, which comes either from an adder (where the PC and branch offset are summed) or from an adder that increments the current PC by 4.
• Fig. 7.2.1 shows that data going to a particular unit can come from two different sources. For example, the value written into the PC can come from one of two adders, the data written into the register file can come from either the ALU or the data memory, and the second input to the ALU can come from a register or the immediate field of the instruction. The selection of the appropriate source is done using a multiplexer (data selector). The multiplexer selects from among several inputs based on the setting of its control lines. The control lines are set based primarily on information taken from the instruction being executed. This is illustrated in Fig. 7.2.2.
[Fig. 7.2.2 Basic implementation of MIPS with control signals: the control unit takes the instruction as input and drives Branch, ALU operation, Mem R, Mem W and Reg W; multiplexers select the PC source, the second ALU input and the register write data; the ALU Zero output and the Branch signal control the PC multiplexer]
• Fig. 7.2.2 also shows the control unit, which has the instruction as an input. It is used to determine the control signals for the functional units and for two of the multiplexers. The third multiplexer, which determines whether PC + 4 or the branch destination address is written into the PC, is set based on the Zero output of the ALU, which is used to perform the comparison of a beq instruction.
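The source selection shown in Fig. 7.2.2 can be sketched with plain conditional expressions. The helper names below are hypothetical, and the PC multiplexer is assumed to choose the branch target only when the control unit's Branch signal and the ALU Zero output are both 1, which is how the Zero output of a beq comparison is used in the text.

```python
# Multiplexer-based source selection (hypothetical helper names).
MASK32 = 0xFFFFFFFF


def mux2(sel, in0, in1):
    """2-to-1 multiplexer: output in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def next_pc(pc, branch, zero, branch_target):
    """PC source: the branch target is chosen only for a beq (Branch = 1)
    whose register operands compared equal (ALU Zero = 1)."""
    pc_plus_4 = (pc + 4) & MASK32
    return mux2(branch and zero, pc_plus_4, branch_target)


def alu_second_input(use_immediate, read_data_2, immediate):
    """Second ALU operand: a register value or the instruction's immediate field."""
    return mux2(use_immediate, read_data_2, immediate)


def register_write_data(from_memory, alu_result, memory_data):
    """Register file write data: ALU result (R-type) or memory data (load)."""
    return mux2(from_memory, alu_result, memory_data)
```

Each multiplexer is the same two-input selector; only the meaning of its control line differs.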
Review Questions
1. Draw and explain the functional block diagram for implementation of MIPS subset.
2. Draw and explain the function block diagram with control signals for basic implementation of MIPS subset.
3. Explain the basic MIPS implementation with necessary multiplexers and control lines. AU : Dec.-15, May-19, Marks 16
4. Write the two steps that are common to implement any type of instruction. AU : Dec.-18, Marks 2

7.3 Building a Data Path AU : Dec.-14, May-15
• As shown in Fig. 7.2.1, the MIPS implementation includes datapath elements (a unit used to operate on or hold data within a processor) such as the instruction and data memories, the register file, the ALU and adders.
• Fig. 7.3.1 shows the combination of three elements (instruction memory, program counter and adder) from Fig. 7.2.1 to form a datapath that fetches instructions and increments the PC to obtain the address of the next sequential instruction.
• The instruction memory stores the instructions of a program and gives the instruction as an output corresponding to the address specified by the program counter. The adder is used to increment the PC by 4 to get the address of the next instruction.
[Fig. 7.3.1 Data path to fetch instruction and increment PC: program counter, adder (PC + 4) and instruction memory]
• Since the instruction memory is only read, its output at any time reflects the contents of the location specified by the address input, and no read control signal is needed.
• The program counter is a 32-bit register that is written at the end of every clock cycle and thus does not need a write control signal.
• The adder always adds its two 32-bit inputs and places the sum on its output.
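A sketch of the fetch datapath of Fig. 7.3.1, assuming a byte-addressed instruction memory held in a Python dictionary; the class name and the placeholder instruction words are hypothetical. Neither the memory read nor the PC write takes a control input, mirroring the two points above.

```python
# Instruction-fetch datapath: PC -> instruction memory, PC + 4 -> PC.
MASK32 = 0xFFFFFFFF


class FetchDatapath:
    def __init__(self, instruction_memory, pc=0):
        # Byte-addressed, read-only instruction memory (address -> 32-bit word).
        self.imem = dict(instruction_memory)
        self.pc = pc

    def clock(self):
        """One clock cycle: read the word addressed by the PC (no read
        control signal) and write PC + 4 back into the PC (no write
        control signal, because the PC is written every cycle)."""
        instruction = self.imem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        return instruction


fetch = FetchDatapath({0: 0x11111111, 4: 0x22222222, 8: 0x33333333})
words = [fetch.clock() for _ in range(3)]
```

After three cycles the three placeholder words have been fetched in order and the PC holds 12.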

7.3.1 Datapath Segment for Arithmetic-Logic Instructions


• The arithmetic-logic instructions read operands from two registers, perform an ALU operation on the contents of the registers, and write the result to a register. We call these instructions R-type instructions. This instruction class includes add, sub, AND, OR and slt. For example, OR $t1, $t2, $t3 reads $t2 and $t3, performs the logical OR operation and saves the result in $t1.
• The processor's 32 general-purpose registers are stored in a structure called a register file. A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. The register file contains the register state of the computer.
• Fig. 7.3.2 shows the multiport register file (two read ports and one write port) and the ALU section of the datapath.
[Fig. 7.3.2 Multiport register file and the ALU: register numbers on the Read register 1, Read register 2 and Write register inputs; Write data input; Read data 1 and Read data 2 outputs; Reg W control; ALU with ALU operation control and Zero output]
• We know that R-format instructions have three register operands : two source operands and one destination operand. For each data word to be read from the register file, we need to specify the register number to the register file. On the other hand, to write a data word, we need two inputs : one to specify the register number to be written and one to supply the data to be written into the register.
• The register file always outputs the contents of whatever register numbers are on the Read register inputs. Write operations, however, are controlled by the write control (Reg W) signal. This signal is asserted for a write operation at the clock edge.
• Since writes to the register file are edge-triggered, it is possible to perform a read and a write operation for the same register within a clock cycle : the read operation gives the value written in an earlier clock cycle, while the value written will be available to a read in a subsequent clock cycle.
• As shown in Fig. 7.3.2, the register number inputs are 5 bits wide to specify one of 32 registers, whereas the data input and the two data output buses are each 32 bits wide.
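The register file and R-type ALU behaviour described above can be sketched as follows. The class and function names are hypothetical; register numbers 9, 10 and 11 are the standard MIPS numbers of $t1, $t2 and $t3, and keeping register 0 at zero is standard MIPS behaviour not mentioned in this text. For simplicity the sketch reads and writes within one call rather than modelling separate clock phases.

```python
# Register file with two read ports and one write port, plus an ALU
# for the R-type subset (add, sub, AND, OR, slt).
MASK32 = 0xFFFFFFFF


class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32          # 32 general-purpose 32-bit registers

    def read(self, read_reg_1, read_reg_2):
        # Reads are combinational: the outputs always reflect the registers
        # whose 5-bit numbers are on the Read register inputs.
        return self.regs[read_reg_1 & 0x1F], self.regs[read_reg_2 & 0x1F]

    def clock_edge(self, reg_w, write_reg, write_data):
        # Edge-triggered write, performed only when Reg W is asserted.
        # Register 0 stays 0, as in MIPS.
        n = write_reg & 0x1F
        if reg_w and n != 0:
            self.regs[n] = write_data & MASK32


def signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


ALU = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "slt": lambda a, b: int(signed(a) < signed(b)),
}


def r_type(rf, op, rd, rs, rt):
    """One R-type instruction: read rs and rt, run the ALU, write rd."""
    a, b = rf.read(rs, rt)
    result = ALU[op](a, b)
    rf.clock_edge(1, rd, result)
    return result, int(result == 0)   # (ALU result, Zero output)


rf = RegisterFile()
rf.regs[10], rf.regs[11] = 0b1100, 0b1010   # preload $t2 = 12, $t3 = 10
r_type(rf, "or", 9, 10, 11)                 # OR $t1, $t2, $t3
```

After the call, $t1 (register 9) holds 0b1110, the bitwise OR of the two source registers.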
7.3.2 Datapath Segment for Load Word and Store Word Instructions
• Now, consider the MIPS load word and store word instructions, which have the general form lw $t1, offset_value($t2) or sw $t1, offset_value($t2).
• In these instructions $t1 is a data register and $t2 is a base register. The memory address is computed by adding the base register ($t2) to the 16-bit signed offset value specified in the instruction.
• In case of a store instruction, the value from the data register ($t1) must be read and, in case of a load instruction, the value read from memory must be written into the data register ($t1). Thus, we will need both the register file and the ALU from Fig. 7.3.2.
• We know that the offset value is 16 bits and the base register contents are 32 bits. Thus, we need a sign-extend unit to convert the 16-bit offset field in the instruction to a 32-bit signed value so that it can be added to the base register.
• In addition to the sign-extend unit, we need a data memory unit to read from or write to. The data memory has read and write control signals (Mem R and Mem W) to control the read and write operations. It also has an address input and an input for the data to be written into memory. Fig. 7.3.3 shows these two elements.
[Fig. 7.3.3 Data memory unit and the sign extension unit: data memory with Address, Write data and Read data, controlled by Mem R and Mem W; sign-extend unit converting a 16-bit value to 32 bits]
• Sign extension is implemented by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
• Therefore, the two units needed to implement loads and stores, in addition to the register file and ALU of Fig. 7.3.2, are the data memory unit and the sign extension unit.
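The sign-extend unit and data memory of Fig. 7.3.3 can be sketched like this. It is a simplified model under stated assumptions: the memory is a dictionary, unwritten words read as 0, and misaligned addresses are rejected because MIPS lw and sw require word-aligned addresses. The function names and the base address are hypothetical.

```python
# Sign extension, load/store address calculation and a data memory
# with Mem R / Mem W controls.
MASK32 = 0xFFFFFFFF


def sign_extend_16(imm16):
    """Copy bit 15 into bits 31..16 to form a 32-bit signed value."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def effective_address(base, offset16):
    """lw/sw address = base register + sign-extended 16-bit offset."""
    return (base + sign_extend_16(offset16)) & MASK32


class DataMemory:
    def __init__(self):
        self.words = {}               # byte address -> 32-bit word

    def access(self, address, mem_r, mem_w, write_data=0):
        if address % 4:
            raise ValueError("word access must be 4-byte aligned")
        if mem_w:                     # store: write data into memory
            self.words[address] = write_data & MASK32
        if mem_r:                     # load: unwritten words read as 0 here
            return self.words.get(address, 0)
        return None


mem = DataMemory()
base = 0x10010000                                      # value held in $t2
mem.access(effective_address(base, 8), 0, 1, 42)       # sw $t1, 8($t2) with $t1 = 42
loaded = mem.access(effective_address(base, 8), 1, 0)  # lw $t1, 8($t2)
```

A negative offset such as 0xFFFC (that is, -4) sign-extends to 0xFFFFFFFC, so the addition wraps to base - 4 as intended.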

7.3.3 Datapath Segment for Branch Instruction


• The beq instruction has three operands : two registers that are compared for equality, and a 16-bit offset which is used to compute the branch target address relative to the branch instruction address. It has the general form : beq $t1, $t2, offset.
• To implement this instruction, it is necessary to compute the branch target address by adding the sign-extended offset field of the instruction to the PC. Two important points in the definition of branch instructions need careful attention :
1. The instruction set architecture specifies that the base for the branch address calculation is the address of the instruction following the branch (i.e., PC + 4, the address of the next instruction).
2. The architecture also states that the offset field is shifted left 2 bits so that it is a word offset; this shift increases the effective range of the offset field by a factor of 4.
• Therefore, the branch target address is given by
\[ \text{Branch target address} = (\text{PC} + 4) + (\text{sign-extended offset shifted left 2 bits}) \]
• In addition to computing the branch target address, we must also check whether the two operands are equal or not. If the two operands are not equal, the next instruction is the instruction that follows sequentially (PC = PC + 4); in this case, we say that the branch is not taken. On the other hand, if the two operands are equal (i.e., the branch condition is true), the branch target address becomes the new PC, and we say that the branch is taken.
• Thus, the branch datapath must perform two operations : compute the branch target address and compare the register contents.
[Fig. 7.3.4 Computation of branch target address: program counter, adder producing PC + 4, shift left by 2 bits, adder producing the branch target, instruction memory and multiplexer selecting the next PC]
• Fig. 7.3.5 shows the structure of the datapath segment that handles branches.
[Fig. 7.3.5 Structure of the datapath segment that handles branches: PC + 4 and the sign-extended offset shifted left by 2 bits are added to form the branch target address; the register file (Read register 1, Read register 2, Write register, Read data 1, Read data 2, Reg W) feeds the ALU, whose Zero output goes to the branch logic]
• To compute the branch target address, the branch datapath includes a sign extension unit, a shifter and an adder. To perform the compare, we need to use the register file and the ALU shown in Fig. 7.3.2.
• Since the ALU provides a Zero signal that indicates whether the result is 0, we can send the two register operands to the ALU with the control set to the subtract operation. If the Zero signal is asserted, we know that the two values are equal.
• For the jump instruction, the lower 28 bits of the PC are replaced by the lower 26 bits of the instruction shifted left by 2 bits. This can be implemented by simply concatenating 00 to the jump offset, i.e., by making the two LSBs 0.
• In the MIPS instruction set, branches are delayed, meaning that the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false. When the condition is false, the execution looks like a normal branch. When the condition is true, a delayed branch first executes the instruction immediately following the branch in sequential instruction order before jumping to the specified branch target address.
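The branch and jump address arithmetic described above is easy to check in code. In this sketch (hypothetical function names) `beq_next_pc` models the ALU subtract-and-Zero comparison. One detail goes beyond the text: MIPS takes the upper 4 bits of a jump target from PC + 4, which equal the upper 4 bits of the PC except when the jump is the last word of a 256 MB region. `delayed_branch_order` shows only the delay-slot ordering, not a full pipeline.

```python
# Branch (beq) and jump address formation, plus MIPS delayed-branch order.
MASK32 = 0xFFFFFFFF


def sign_extend_16(imm16):
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def branch_target(pc, offset16):
    """(PC + 4) + (sign-extended offset shifted left 2 bits)."""
    word_offset = (sign_extend_16(offset16) << 2) & MASK32
    return (pc + 4 + word_offset) & MASK32


def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Compare by subtraction; Zero = 1 means the operands are equal."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


def jump_target(pc, target26):
    """Keep the upper 4 bits, append the 26-bit field and two 0 bits."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def delayed_branch_order(branch_addr, taken, target):
    """Addresses executed from the branch onward: the delay-slot
    instruction at branch_addr + 4 always runs before the redirect."""
    slot = branch_addr + 4
    return [branch_addr, slot, target if taken else slot + 4]
```

For example, a beq at 0x00400000 with offset 3 whose operands are equal goes to 0x00400004 + 12 = 0x00400010; an offset of 0xFFFF (-1) branches back one word from PC + 4.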

7.3.4 Creating a Single Datapath


• We can combine the datapath components needed for the individual instruction classes into a single datapath and add the control to complete the implementation.
• This simplest datapath will attempt to execute all instructions in one clock cycle. This means that no datapath resource can be used more than once per instruction, so any element needed more than once must be duplicated. We therefore need a memory for instructions separate from one for data. Although some of the functional units need to be duplicated, many of the elements can be shared by different instruction flows.
• To share a datapath element between two different instruction classes, we may need to allow multiple connections to the input of an element and use a multiplexer and control signal to select among the multiple inputs.
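Combining the datapath segments leaves the control unit to set the multiplexers and the read/write signals for each instruction class. The table below is a sketch based on the standard single-cycle MIPS control described by Patterson and Hennessy, not on this text: the signal names RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch and ALUOp come from that source (RegWrite, MemRead and MemWrite correspond to Reg W, Mem R and Mem W here), jump is omitted, and None stands for a don't-care output.

```python
# Main control for the single-cycle MIPS subset (signal names follow
# Patterson & Hennessy). None marks a don't-care (X) output.
R_FORMAT, LW, SW, BEQ = 0x00, 0x23, 0x2B, 0x04   # 6-bit opcode values

CONTROL = {
    R_FORMAT: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    LW: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    SW: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    BEQ: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
              MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def decode_control(instruction):
    """Return the control signals selected by bits 31..26 (the opcode)."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode:#04x} is outside this subset")
    return CONTROL[opcode]


# add $t0, $t1, $t2 is encoded as 0x012A4020
print(decode_control(0x012A4020)["RegWrite"])   # prints 1
```

The don't-care entries for sw and beq reflect that neither writes a register, so the destination-register and write-back multiplexers may be set either way.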
7.7 Comparison Between Hardwired and Microprogrammed Control Units

Attribute | Hardwired control | Microprogrammed control
Speed | Fast | Slow
Control functions | Implemented in hardware | Implemented in software
Flexibility | Not flexible; to accommodate new system specifications or new instructions, redesign is required | More flexible; new system specifications or new instructions can be accommodated
Ability to handle large/complex instruction sets | Somewhat difficult | Easier
Ability to support operating systems and diagnostic features | Very difficult (unless anticipated during design) | Easy
Design process | Somewhat complicated | Orderly and systematic
Applications | Mostly RISC microprocessors | Mainframes, some microprocessors
Instruction set size | Usually under 100 instructions | Usually over 100 instructions
ROM size | - | 2K to 10K by 20-400 bit microinstructions
Chip area efficiency | Uses least area | Uses more area

Review Question

1. Compare hardwired control unit with microprogrammed control unit. AU : Dec.-18, Marks 6
7.8 Pipelining AU : May-07, 12, 13, 15, 16, 17, 19, Dec.-06, 08, 09, 14, 15, 16, 18, June-08, 09
• We have seen the various cycles involved in the instruction cycle. These fetch, decode and execute cycles for several instructions are performed simultaneously to reduce the overall processing time. This process is referred to as instruction pipelining.
• To apply the concept of instruction pipelining, we must subdivide instruction processing into a number of stages as given below.
S1 - Fetch (F) : Read the instruction from the memory.
S2 - Decode (D) : Decode the opcode and fetch the source operand(s) if necessary.
S3 - Execute (E) : Perform the operation specified by the instruction.
S4 - Store (S) : Store the result in the destination.
• Here, instruction processing is divided into four stages, hence it is known as a four-stage instruction pipeline. With this subdivision and assuming equal duration for each stage, we can reduce the execution time for 4 instructions from 16 time units to 7 time units : executed one after another, 4 instructions × 4 stages take 16 units, whereas in the pipeline the first instruction finishes after 4 units and each of the other three finishes one unit later (4 + 3 = 7). This is illustrated in Fig. 7.8.1.
[Fig. 7.8.1 Four stage instruction pipelining (stage and clock cycle):
I1 : F1 (1), D1 (2), E1 (3), S1 (4)
I2 : F2 (2), D2 (3), E2 (4), S2 (5)
I3 : F3 (3), D3 (4), E3 (5), S3 (6)
I4 : F4 (4), D4 (5), E4 (6), S4 (7)]
• In this instruction pipeline four instructions are in progress at any given time. This means that four distinct hardware units are needed, as shown in Fig. 7.8.2. These units are implemented such that they are capable of performing their tasks simultaneously and without interfering with one another. Information from a stage is passed to the next stage with the help of buffers.
[Fig. 7.8.2 Hardware organization for four-stage instruction pipeline: F (fetch instruction) → B1 → D (decode instruction and fetch operands) → B2 → E (execution operation) → B3 → S (store result); B1, B2 and B3 are interstage buffers]
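The 16-versus-7 comparison and the diagram of Fig. 7.8.1 follow from a simple rule: with one cycle per stage and no stalls, instruction i (counting from 1) occupies stage s (counting from 0) in cycle i + s. A minimal sketch, with hypothetical function names, that produces the schedule and the cycle counts:

```python
def pipeline_schedule(n_instructions, stages=("F", "D", "E", "S")):
    """Ideal pipeline (one cycle per stage, no stalls): instruction i
    (1-based) occupies stage s (0-based) in clock cycle i + s."""
    return {
        i: [(f"{name}{i}", i + s) for s, name in enumerate(stages)]
        for i in range(1, n_instructions + 1)
    }


def total_cycles(n_instructions, k_stages, pipelined=True):
    """Clock cycles to finish n instructions on a k-stage unit."""
    if pipelined:
        return k_stages + n_instructions - 1   # fill the pipe, then 1 per cycle
    return k_stages * n_instructions           # one instruction at a time


for i, row in pipeline_schedule(4).items():
    print(f"I{i}: " + ", ".join(f"{label} ({cycle})" for label, cycle in row))
```

The first printed line is `I1: F1 (1), D1 (2), E1 (3), S1 (4)`. Passing the five MIPS stage names (IF, ID, EX, MEM, WB) gives the corresponding five-stage schedule.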
7.8.1 Pipeline Stages in the MIPS Instructions
1. Fetch instruction from memory.
2. Read registers while decoding the instruction. The regular format of MIPS instructions allows reading and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.

7.8.2 Designing Instruction Sets for Pipelining


• From the following points we can see that the MIPS instruction set is designed for pipelined execution.
1. All MIPS instructions are the same length. This restriction makes it much easier to fetch instructions in the first pipeline stage and to decode them in the second stage.
2. MIPS has only a few instruction formats, with the source register fields being located in the same place in each instruction. Due to this symmetry, the second stage can begin reading the register file at the same time that the hardware is determining what type of instruction was fetched.
3. Memory operands only appear in loads or stores in MIPS. Due to this restriction, we can use the execute stage to calculate the memory address and then access memory in the following stage.
4. Since operands are aligned in memory, data can be transferred between processor and memory in a single pipeline stage.
7.8.3 Pipeline Hazards
• The timing diagram for instruction pipeline operation shown in Fig. 7.8.3 (b) completes the processing of one instruction in each clock cycle. This means that the rate of instruction processing is four times that of sequential operation.
• The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages. However, this increase would be achieved only if pipelined operation as shown in Fig. 7.8.3 (b) could be performed without any interruption throughout program execution. Unfortunately, this is not the case.
• For many reasons, one of the pipeline stages may not be able to complete its operation in the allotted time.
• Fig. 7.8.4 shows an example in which the operation specified in instruction 2 requires three cycles to complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the information in buffer B2 must remain intact until the execution stage has completed its operation. This means that stage 2 and, in turn, stage 1 are blocked from accepting new instructions because the information in B1 cannot be overwritten. Thus the decode step for instruction 4 and the fetch step for instruction 5 must be postponed, as shown in Fig. 7.8.4.
• The pipeline shown in Fig. 7.8.4 is said to have been stalled for two clock cycles (clock cycles 5 and 6), and normal pipeline operation resumes in clock cycle 7.
• Any reason that causes the pipeline to stall is called a hazard.

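[Fig. 7.8.4 Effect on pipeline of an execution operation taking more than one clock cycle (stage and clock cycle):
I1 : F1 (1), D1 (2), E1 (3), S1 (4)
I2 : F2 (2), D2 (3), E2 (4 to 6), S2 (7)
I3 : F3 (3), D3 (4), E3 (7), S3 (8)
I4 : F4 (4), D4 (7), E4 (8), S4 (9)
I5 : F5 (7), D5 (8), E5 (9)]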
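The stall in Fig. 7.8.4 can be reproduced with a small model of a pipeline whose interstage buffers each hold one instruction, so a stage that is still busy blocks the stages behind it. This is a sketch under that assumption only (no forwarding, no other hazards), and the function name is hypothetical. With instruction 2 spending 3 cycles in E, it reproduces the postponed D4 and F5 and the two-cycle stall.

```python
def blocking_pipeline(durations):
    """durations[i][s]: cycles instruction i needs in stage s.

    Returns start[i][s], the clock cycle (1-based) in which instruction i
    enters stage s. Each interstage buffer holds one instruction, so an
    instruction can enter a stage only when it has finished the previous
    stage and the instruction ahead of it has moved out of that stage.
    """
    n, k = len(durations), len(durations[0])
    start = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # earliest cycle after this instruction leaves the previous stage
            ready = 1 if s == 0 else start[i][s - 1] + durations[i][s - 1]
            if i > 0:
                ahead = start[i - 1]
                # the instruction ahead frees stage s when it enters s + 1,
                # or, for the last stage, when it completes
                if s + 1 < k:
                    freed = ahead[s + 1]
                else:
                    freed = ahead[s] + durations[i - 1][s]
                ready = max(ready, freed)
            start[i][s] = ready
    return start


# Fig. 7.8.4: five instructions, stages F D E S; instruction 2's E takes 3 cycles.
durations = [[1, 1, 1, 1] for _ in range(5)]
durations[1][2] = 3
start = blocking_pipeline(durations)
finish = start[-1][-1] + durations[-1][-1] - 1           # last busy cycle
stall_cycles = finish - (len(durations[0]) + len(durations) - 1)
```

Here the last instruction finishes in cycle 10 instead of the ideal 4 + 5 - 1 = 8, giving the 2 stall cycles described in the text.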
Types of Hazards
1. Structural hazards : These hazards arise from conflicts due to insufficient resources, when even with all possible combinations it may not be possible to overlap the operations.
2. Data or data dependent hazards : These result when an instruction in the pipeline depends on the result of previous instructions which are still in the pipeline and not completed.
3. Instruction or control hazards : They arise while pipelining branch and other instructions that change the contents of the program counter.
• The simplest way to handle these hazards is to stall the pipeline. Stalling the pipeline allows a few instructions to proceed to completion while stopping the execution of those which result in hazards.
7.8.4 Structural Hazards
• The performance of a pipelined processor depends on whether the functional units are pipelined and whether there are multiple execution units to allow all possible combinations of instructions in the pipeline. If, for some combination, the pipeline has to be stalled to avoid a resource conflict, then there is a structural hazard.
• In other words, when two instructions require the use of a given hardware resource at the same time, a structural hazard occurs.
• The most common case in which this hazard may arise is in access to memory. One instruction may need to access memory to store its result while another instruction or an operand is being fetched. If instructions and data reside in the same cache unit, only one instruction can proceed and the other instruction is delayed. To avoid this type of structural hazard, many processors use separate caches for instructions and data.
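A structural hazard on a single shared memory can be located mechanically. The sketch below assumes the ideal five-stage MIPS pipeline of Section 7.8.1 (IF, ID, EX, MEM, WB), one instruction issued per cycle and no stalls; it only reports conflicts and does not resolve them. The function name and the sample program are hypothetical.

```python
def memory_conflicts(program, separate_caches=False):
    """Find clock cycles in which a load/store's data-memory access and an
    instruction fetch both need one shared memory.

    Assumes the ideal five-stage pipeline (IF ID EX MEM WB) issuing one
    instruction per cycle with no stalls: instruction j (0-based) is fetched
    in cycle j + 1 and, if it is lw/sw, uses memory in cycle j + 4.
    Returns (cycle, load/store index, fetched index) tuples.
    """
    if separate_caches:
        return []                     # instruction and data caches are distinct
    conflicts = []
    for j, text in enumerate(program):
        if text.split()[0] in ("lw", "sw"):
            mem_cycle = j + 4
            fetched = mem_cycle - 1   # the instruction whose IF is in mem_cycle
            if fetched < len(program):
                conflicts.append((mem_cycle, j, fetched))
    return conflicts


program = ["lw $t0, 0($t1)", "add $t2, $t3, $t4",
           "sub $t5, $t6, $t7", "or $s0, $s1, $s2"]
```

With one shared memory, the lw's data access in cycle 4 collides with the fetch of the fourth instruction; separate instruction and data caches remove the conflict.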

7.8.5 Data Hazards


• When either the source or the destination operands of an instruction are not available at the time expected in the pipeline and, as a result, the pipeline is stalled, we say such a situation is a data hazard.
• Consider a program with two instructions, I1 followed by I2. When this program is executed in a pipeline, the execution of these two instructions can be performed concurrently. In such a case the result of I1 may not be available for the execution of I2. If the result of I2 is dependent on the result of I1, we may get an incorrect result if both are executed concurrently. For example, assume A = 10 in the following two operations :
I1 : A ← A + 5
I2 : B ← A × 2
• When these two operations are performed in the given order, one after the other, we get the result B = 30. But if they are performed concurrently, the value of A used in computing B would be the original value, 10, leading to an incorrect result (B = 20). In this case the data used in I2 depend on the result of I1. The hazard due to such a situation is called a data hazard or data dependent hazard. To avoid incorrect results, we have to execute dependent instructions one after the other (in-order).
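The A and B example can be replayed directly. In this sketch (hypothetical names) `overlapped` imitates I2 reading A before I1 has written it back, and `depends_on` flags the read-after-write (RAW) dependency, a standard name for this case that the text does not use.

```python
def sequential(a):
    """I1 then I2 in order: I2 sees the value written by I1."""
    a = a + 5          # I1: A <- A + 5
    b = a * 2          # I2: B <- A x 2
    return a, b


def overlapped(a):
    """I2 reads A before I1 has written it back (the hazard)."""
    a_read_by_i2 = a   # I2 fetches its operand too early
    a = a + 5          # I1 writes its result afterwards
    b = a_read_by_i2 * 2
    return a, b


def depends_on(producer, consumer):
    """Each instruction is (destination, sources); True if the consumer
    reads the producer's destination (a read-after-write dependency)."""
    destination, _ = producer
    _, sources = consumer
    return destination in sources


I1 = ("A", ("A",))
I2 = ("B", ("A",))
```

`sequential(10)` gives B = 30 and `overlapped(10)` gives B = 20, matching the correct and incorrect results in the text; `depends_on(I1, I2)` is true, so the pair must stay in order.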

7.8.6 Control (Instruction) Hazards
• The purpose of the instruction fetch unit is to supply the execution units with a steady stream of instructions. This stream is interrupted when a pipeline stall occurs, either due to a cache miss or due to a branch instruction. Such a situation is known as an instruction hazard.
• Instruction hazards can cause greater degradation in performance than data hazards.
7.8.6.1 Unconditional Branching
• Fig. 7.8.5 shows a sequence of instructions being executed in a two-stage pipeline. The instruction I2 is a branch instruction and its target instruction is Ik. In clock cycle 3, the instruction I3 is fetched and at the same time the branch instruction (I2) is decoded and the target address is computed. In clock cycle 4, the incorrectly fetched instruction I3 is discarded and instruction Ik is fetched. During this time the execution unit is idle and the pipeline is stalled for one clock cycle.
[Fig. 7.8.5 Effect of branching in two-stage pipelining (stage and clock cycle):
I1 : F1 (1), E1 (2)
I2 (Branch) : F2 (2), E2 (3)
I3 : F3 (3), discarded (X) in cycle 4, while the execution unit is idle
Ik : Fk (4), Ek (5)
Ik+1 : Fk+1 (5), Ek+1 (6)]
Branch penalty : The time lost as a result of a branch instruction is often referred to as the branch penalty.
Factors affecting branch penalty
1. It is more for complex instructions.
2. For a longer pipeline, the branch penalty is more.
• In case of longer pipelines, the branch penalty can be reduced by computing the branch address earlier in the pipeline.
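A common way to quantify branch penalty is its effect on the average number of clock cycles per instruction (CPI). The sketch assumes an otherwise ideal pipeline (CPI = 1) in which every branch pays the full penalty; the 20 % branch frequency and the 3-cycle penalty are hypothetical figures, while the 1-cycle penalty matches the two-stage case of Fig. 7.8.5. The function name is also hypothetical.

```python
def effective_cpi(branch_fraction, branch_penalty, base_cpi=1.0):
    """Average cycles per instruction when every branch adds
    `branch_penalty` stall cycles to an otherwise ideal pipeline."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie between 0 and 1")
    return base_cpi + branch_fraction * branch_penalty


# Hypothetical workload in which 20 % of the instructions are branches.
two_stage = effective_cpi(0.20, 1)      # penalty of Fig. 7.8.5: one cycle
late_resolve = effective_cpi(0.20, 3)   # branch resolved later in a longer pipe
```

The CPI rises from 1.2 to 1.6 when the branch is resolved two cycles later, which is why computing the branch address earlier in a long pipeline pays off.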
14. Explain the hazards caused by unconditional branching statements.

7.9 Pipelined Datapath and Control AU : May-16, 19
• Fig. 7.9.1 shows the general structure of a multistage pipeline. As shown in Fig. 7.9.1, the pipeline processor usually consists of a sequence of m data processing circuits, called stages or segments.
[Fig. 7.9.1 Structure of pipeline processor: Data in → stage S1 (R1, C1) → stage S2 (R2, C2) → ... → stage Sm (Rm, Cm) → Data out, with a control unit connected to all stages]
• These stages collectively perform a single operation on a stream of data operands passing through them. The processing is done part by part in each stage, but the final result is obtained only after an operand set has passed through the entire pipeline.
• Each stage consists of two major blocks : a multiword input register and a datapath circuit.
• The multiword input registers Ri hold partially processed results as they move through the pipeline, and they also serve as buffers that prevent neighbouring stages from interfering with one another. In each clock period, each stage processes its data and transfers its results to the next stage.
• The m-stage pipeline processor shown in Fig. 7.9.1 can simultaneously process up to m independent sets of data operands. Thus, when the pipeline is full, m separate operations are being executed concurrently, each in a different stage. This gives a new final result from the pipeline every clock cycle.
If the time required to perform a single suboperation in the pipeline is T seconds,
then for an m-stage pipeline the time required to complete a single operation is
mT seconds. This is called the delay or latency of the pipeline.
The maximum number of operations completed per second is given as 1/T.
This is called the throughput of the pipeline.
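The latency and throughput definitions above can be checked with a short calculation. This is a minimal sketch assuming every stage takes the same time T and no stalls occur; the total time of (m + n - 1)T for n operations (the first result after mT, then one every T) is the standard ideal-pipeline result rather than something stated in the text, and the function name `pipeline_metrics` is hypothetical.

```python
def pipeline_metrics(m: int, t: float, n: int) -> dict:
    """Ideal timing of an m-stage pipeline whose stages each take t seconds."""
    latency = m * t                  # one operation passes through all m stages
    throughput = 1.0 / t             # one result per clock once the pipe is full
    total_time = (m + n - 1) * t     # fill time, then one result every t seconds
    unpipelined = n * m * t          # same work done one operation at a time
    return {
        "latency": latency,
        "throughput": throughput,
        "total_time": total_time,
        "speedup": unpipelined / total_time,
    }


# Assumed example: 4 stages of 10 ns each processing 100 operand sets.
for key, value in pipeline_metrics(m=4, t=10e-9, n=100).items():
    print(f"{key:>10}: {value:.4g}")
```

As n grows, the speedup approaches m, because the m - 1 cycles needed to fill the pipeline are spread over more and more results.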
7.9.1 Implementation of Two Stage Instruction Pipelining
The simplest instruction pipelining breaks instruction processing into two parts :
a fetch stage S1 and an execute stage S2. When these two stages are overlapped, we
get two-stage pipelining with increased throughput.
Fig. 7.9.2 shows an implementation of a two-stage instruction pipeline. The fetch
stage S1 consists of the microprogram counter µPC, which is the source for
microinstruction addresses, and the control memory CM, which stores the
microinstructions.

Fig. 7.9.2 Two-stage pipelined microprogram control unit (stage S1 : µPC, loaded from an external address or a branch address, and control memory CM; stage S2 : microinstruction register µIR, decoders producing the control signals, and next address logic driven by external conditions)
The execution stage S2 consists of the microinstruction register µIR, the decoders
that extract control signals from the microinstructions in µIR, and the logic for
determining the next address or the branch address.
The microinstruction register acts as the buffer register that holds the fetched
microinstruction for stage S2. With these two stages it is possible that, while
instruction Ij with address Aj is being executed by stage S2, the instruction Ij+1
with the next consecutive address Aj+1 is fetched from memory by stage S1. If, on
executing Ij in S2, it is determined that a branch must be made to a nonconsecutive
address, then the prefetched instruction Ij+1 in S1 has to be discarded. In such
cases the branch address is obtained directly from µIR itself and fed back to S1.
The branch address is then loaded into µPC and the next instruction is fetched
from the branch address.
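The overlap of S1 and S2, and the discarding of the prefetched microinstruction on a branch, can be traced with a small cycle-by-cycle simulation. This is a sketch under simplifying assumptions: each microinstruction is a hypothetical `(label, branch_address)` pair, a non-None branch address is treated as an unconditional branch, and each stage takes exactly one clock cycle.

```python
from typing import List, Optional, Tuple

# Hypothetical microinstruction format: (label, branch_address).
# branch_address is None for sequential flow; otherwise this sketch treats it
# as an unconditional branch to that control-memory address.
Micro = Tuple[str, Optional[int]]


def run_two_stage(cm: List[Micro], start: int = 0,
                  max_cycles: int = 100) -> Tuple[List[str], int]:
    """Overlap fetch (stage S1) and execute (stage S2) of a microprogram.

    Returns the labels executed by S2 and the number of clock cycles used."""
    upc = start                  # S1: microprogram counter
    uir: Optional[Micro] = None  # microinstruction register feeding S2
    executed: List[str] = []
    cycles = 0
    while cycles < max_cycles:
        if uir is None and upc >= len(cm):
            break                # nothing left to fetch or execute
        cycles += 1
        # S1 fetches the microinstruction at uPC during this cycle ...
        fetched = cm[upc] if upc < len(cm) else None
        # ... while S2 executes the one latched into uIR in the previous cycle.
        branch = None
        if uir is not None:
            label, branch = uir
            executed.append(label)
        if branch is not None:
            upc = branch         # branch address from uIR is fed back to S1
            uir = None           # prefetched microinstruction is discarded
        else:
            uir = fetched
            if fetched is not None:
                upc += 1
    return executed, cycles


cm = [("A", None), ("B", 3), ("C", None), ("D", None)]
print(run_two_stage(cm))   # → (['A', 'B', 'D'], 5)
```

Microinstruction C is fetched while B executes but is discarded when B turns out to be a branch, so three useful microinstructions take five cycles instead of the four an unbroken sequence would need (one cycle to fill the pipeline plus one per microinstruction).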

7.9.2 Organization of CPU with Four Stage Instruction Pipelining


Fig. 7.9.3 shows the implementation of four-stage instruction pipelining. As
shown in Fig. 7.9.3, the CPU is directly connected to a cache memory, which is
split into instruction and data parts, called the I-cache and D-cache, respectively.
This splitting of the cache permits both an instruction word and a memory data
word to be accessed in the same clock cycle.
Fig. 7.9.3 Organization of CPU for four-stage instruction pipelining (main memory feeds the I-cache and D-cache; S1 : IF (Instruction fetch) uses the PC, fetch and decode logic and IR; S2 : OL (Operand load), S3 : EX (ALU operation) and S4 : OS (Operand store) share the register file (RF) and ALU logic)
The four stages S1 to S4 shown in Fig. 7.9.3 perform the following functions :
S1 : IF : Instruction fetching and decoding using the I-cache.
S2 : OL : Operand loading from the D-cache to the register file.
S3 : EX : Data processing using the ALU and register file.
S4 : OS : Operand storing to the D-cache from the register file.
Stages S2 and S4 implement memory load and store operations, respectively.
Stages S2, S3 and S4 share the CPU's local register file (RF). The registers in the
register file act as interstage buffer registers.
Stage S3 implements data transfer and data processing operations of the
register-to-register type using the ALU of the CPU.
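The overlap of the four stages is usually drawn as a space-time diagram. The sketch below generates one, assuming every stage takes one clock cycle and no instruction stalls (the split I-cache and D-cache are what allow IF and the operand load or store to proceed in the same cycle); `space_time` is a hypothetical helper name.

```python
from typing import List, Sequence


def space_time(n: int, stages: Sequence[str] = ("IF", "OL", "EX", "OS")) -> List[List[str]]:
    """Occupancy of an ideal pipeline: one row per clock cycle, one column per
    stage. Instruction Ii (1-based) is in stage s (0-based) during cycle i + s,
    so n instructions finish in n + m - 1 cycles for an m-stage pipeline."""
    m = len(stages)
    table = []
    for cycle in range(1, n + m):
        row = []
        for s in range(m):
            i = cycle - s      # which instruction sits in stage s this cycle
            row.append(f"I{i}" if 1 <= i <= n else "--")
        table.append(row)
    return table


stages = ("IF", "OL", "EX", "OS")
print("cycle " + " ".join(f"{s:>3}" for s in stages))
for cycle, row in enumerate(space_time(5, stages), start=1):
    print(f"{cycle:>5} " + " ".join(f"{x:>3}" for x in row))
```

Once the pipeline is full (from cycle 4 here), all four stages are busy and one instruction completes in every cycle; five instructions need 5 + 4 - 1 = 8 cycles.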
7.9.3 Implementation of MIPS Instruction Pipeline
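Fig. 7.9.4 shows the single-cycle datapath with the pipeline stages. Instruction
execution is divided into five stages, which gives a five-stage pipeline : IF
(instruction fetch), ID (instruction decode and register file read), EX (execution
or address calculation), MEM (data memory access) and WB (write back).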
Fig. 7.9.4 Single-cycle datapath with the pipeline stages (pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB separate the stages; the datapath contains the program counter and an adder that adds 4 to it, instruction memory, register file, a 16-to-32-bit sign extend unit, a shift left 2 unit with the branch adder, the ALU with its Zero output, data memory and multiplexers)
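Several blocks in this datapath (the PC + 4 adder, sign extend, shift left 2 and the branch adder) exist only to compute the target of a conditional branch such as beq. The sketch below reproduces that arithmetic for the standard MIPS I-format encoding (opcode 4 for beq, a 16-bit signed word offset relative to PC + 4); `encode_beq` and the example addresses are hypothetical and only used to build a test instruction.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(field: int) -> int:
    """Sign-extend a 16-bit immediate to a (Python) signed integer."""
    field &= 0xFFFF
    return field - 0x1_0000 if field & 0x8000 else field


def branch_target(pc: int, instruction: int) -> int:
    """Branch target of a MIPS I-format branch such as beq.

    Follows the datapath blocks: PC + 4 from the PC adder, the 16-bit offset
    widened by the sign-extend unit, 'shift left 2' turning a word offset
    into a byte offset, and the branch adder producing the target."""
    pc_plus_4 = (pc + 4) & MASK32
    offset_bytes = sign_extend16(instruction & 0xFFFF) << 2
    return (pc_plus_4 + offset_bytes) & MASK32


def encode_beq(rs: int, rt: int, offset_words: int) -> int:
    """beq rs, rt, offset : opcode 4 in bits 31-26, rs, rt, 16-bit offset."""
    return (4 << 26) | (rs << 21) | (rt << 16) | (offset_words & 0xFFFF)


print(hex(branch_target(0x0040_0010, encode_beq(8, 9, -3))))  # → 0x400008
```

Whether the branch is actually taken is decided separately by the ALU's Zero output; the target is computed in parallel so that it is ready if the comparison succeeds.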

The source and result registers constitute the interstage buffers needed for
pipelined operation, as shown in Fig. 7.10.1 (b).

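Fig. 7.10.1 Operand forwarding in a pipelined processor : (b) Forwarding path in the processor pipeline (interstage buffers B1 and B2 hold the source and result registers around the Execute (ALU) stage E, which is followed by the stage that stores the result in the register file; the forwarding path returns the result register to the ALU inputs)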
The data forwarding mechanism is indicated by dashed lines.
The two multiplexers select the data for the ALU either from the destination bus or
from the source 1 and source 2 registers.
When the forwarding logic detects a data dependency, it forwards the ALU output
available in the result register to the ALU, using the data forwarding path, for
the next operation. Hence the execution of the next (dependent) instruction
proceeds without interruption.
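Why the forwarding path is needed can be shown with a deliberately simplified model (an assumption, not the exact hardware of Fig. 7.10.1): each instruction's result waits in the result register for one cycle before it is written into the register file, and the next instruction executes in that same cycle without any stall. The `operand` function plays the role of the two ALU input multiplexers; the instruction tuple format and register names are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical three-address instruction: (dest, src1, src2, operation).
Instr = Tuple[str, str, str, Callable[[int, int], int]]


def execute(program: List[Instr], regs: Dict[str, int], forwarding: bool) -> Dict[str, int]:
    regs = dict(regs)                          # register file (copy, not mutated)
    result: Optional[Tuple[str, int]] = None   # result register: (dest, value)

    def operand(name: str) -> int:
        # ALU input multiplexer: take the result register when it holds the
        # value that this source register is about to receive.
        if forwarding and result is not None and result[0] == name:
            return result[1]
        return regs[name]                      # otherwise read the register file

    for dest, src1, src2, op in program:
        value = op(operand(src1), operand(src2))   # execute (ALU) stage
        if result is not None:
            regs[result[0]] = result[1]            # previous result written back
        result = (dest, value)
    if result is not None:
        regs[result[0]] = result[1]                # write back the last result
    return regs


program = [("R2", "R1", "R3", operator.add),   # R2 <- R1 + R3
           ("R4", "R2", "R5", operator.add)]   # R4 <- R2 + R5 (depends on R2)
start = {"R1": 1, "R2": 0, "R3": 2, "R4": 0, "R5": 10}
print(execute(program, start, forwarding=True)["R4"])   # → 13
print(execute(program, start, forwarding=False)["R4"])  # → 10 (stale R2 read)
```

Without forwarding, the dependent instruction reads the old contents of R2 from the register file; real hardware must either forward the value, as here, or stall until the write completes.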
Fig. 7.10.2 shows the hardware necessary to support forwarding for operations that
use results during the EX stage.
Fig. 7.10.2 Modified datapath to resolve hazards via forwarding (pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB, with control fields WB, M and EX; the register numbers Rs, Rt and Rd travel down the pipeline, and the forwarding unit receives them together with EX/MEM.RegisterRd and MEM/WB.RegisterRd)
Compared with the datapath shown in Fig. 7.10.7, multiplexers are added to provide
the inputs to the ALU, along with the forwarding path and the forwarding unit.
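The decision made by the forwarding unit can be written as a few comparisons. This sketch follows the widely used convention from Patterson and Hennessy's MIPS pipeline (an assumption relative to this text): each ALU input multiplexer gets a 2-bit select, 00 for the register file value carried in ID/EX, 10 to forward from EX/MEM and 01 to forward from MEM/WB. The `PipeRegWrite` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipeRegWrite:
    """Fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool   # RegWrite control signal carried in the WB field
    rd: int           # destination register number (RegisterRd)


def forward_select(src: int, ex_mem: PipeRegWrite, mem_wb: PipeRegWrite) -> int:
    """Multiplexer select for one ALU input, given an ID/EX source register.

    0b00: operand comes from the register file (via ID/EX)
    0b10: forward the ALU result held in EX/MEM (EX hazard)
    0b01: forward the value held in MEM/WB (MEM hazard)
    Register 0 is never forwarded because $zero always reads as 0.
    EX/MEM is checked first because it holds the most recent result."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01
    return 0b00


def forwarding_unit(rs: int, rt: int, ex_mem: PipeRegWrite, mem_wb: PipeRegWrite):
    """Return (ForwardA, ForwardB) for ID/EX.RegisterRs and ID/EX.RegisterRt."""
    return forward_select(rs, ex_mem, mem_wb), forward_select(rt, ex_mem, mem_wb)


# sub $2,$1,$3 is in MEM (EX/MEM.RegisterRd = 2), an older instruction that
# wrote $5 is in WB (MEM/WB.RegisterRd = 5), and and $12,$2,$5 is now in EX.
sel = forwarding_unit(2, 5, PipeRegWrite(True, 2), PipeRegWrite(True, 5))
print(tuple(f"{x:02b}" for x in sel))   # → ('10', '01')
```

Checking EX/MEM before MEM/WB matters when both hold the same destination register: the instruction closer to the ALU produced the newer value, so that is the one forwarded.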
