A BASIC MIPS EXECUTION
The core MIPS instruction set includes
The memory-reference instructions load word (lw) and store word (sw)
The arithmetic-logical instructions add, sub, AND, OR, and slt
The instructions branch equal (beq) and jump (j), which we add last
For every instruction, the first two steps are identical:
1. Send the program counter (PC) to the memory that contains the code and fetch
the instruction from that memory.
2. After the instruction is fetched, read one or two registers, using fields of the
instruction to select the registers to read.
3. After these two steps, the actions required to complete the instruction depend on the
instruction class. All instruction classes, except jump, use the arithmetic-logical
unit (ALU) after reading the registers.
The memory-reference instructions use the ALU for an address calculation, then access the
memory either to read data for a load or to write data for a store.
The arithmetic-logical instructions use the ALU for the operation execution, and must write
the data from the ALU back into a register; a load likewise writes the data from memory
back into a register.
Branches use the ALU for comparison. We may need to change the next instruction address
based on the comparison; otherwise, the PC should be incremented by 4 to get the address of
the next instruction.
4. Finally, the result from the ALU or the data memory is written back into the register
file, while a store writes a register value into the data memory. Branches require the use
of the ALU output to determine the next instruction address, which comes either from an
adder that sums the incremented PC and the branch offset or from an adder that increments
the current PC by 4.
The figure shows data going to a particular unit as coming from two different sources. For
example:
The value written into the PC can come from one of two adders.
The data written into the register file can come from either the ALU or the data memory.
The second input to the ALU can come from a register or the immediate field of the
instruction.
The selection is commonly done with a device called a multiplexor, although this device
might better be called a data selector.
A control unit, which has the instruction as an input, is used to determine how to set the
control lines for the functional units and two of the multiplexors.
The top multiplexor (“Mux”) controls what value replaces the PC (PC + 4 or the branch
destination address).
The middle multiplexor, whose output returns to the register file, is used to steer the
output of the ALU (in the case of an arithmetic-logical instruction) or the output of the data
memory (in the case of a load) for writing into the register file.
Finally, the bottommost multiplexor is used to determine whether the second ALU input
is from the registers (for an arithmetic-logical instruction or a branch) or from the offset
field of the instruction (for a load or store).
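The three selections described above can be modeled with a two-input data selector. The following is a minimal Python sketch; the function name `mux2` and all sample wire values are illustrative assumptions, not taken from the figure.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor (data selector): in0 when select is 0, in1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


# Hypothetical values on the datapath wires while a load executes.
pc_plus_4, branch_target = 0x0040_0004, 0x0040_0020
alu_result, memory_data = 0x1000_0014, 0xDEAD_BEEF
register_value, offset = 7, 20

next_pc = mux2(0, pc_plus_4, branch_target)     # top mux: no branch taken, keep PC + 4
write_data = mux2(1, alu_result, memory_data)   # middle mux: a load writes memory data
alu_input_b = mux2(1, register_value, offset)   # bottom mux: a load uses the offset field
```

Each select input here is set by hand; in the datapath it would come from the control unit or from the branch decision.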
Building a Datapath:
In the MIPS implementation, the datapath elements include the instruction and data
memories, the register file, the ALU, and adders.
1. Program counter:
The program counter (PC) is a 32-bit register. It holds the address of the current
instruction. It does not need a write control signal, because it is written at the end of
every clock cycle.
2. Instruction memory:
Instruction memory is a memory unit that is used to store the instructions of a program.
It gets an address as input and outputs the instruction stored at that address.
The instruction memory need only provide read access because the datapath does not write
instructions. Since the instruction memory is only read, no write control signal is needed.
3. Adder:
The adder adds its two 32-bit inputs and places the sum on its output.
The adder is also used to increment the PC to the address of the next instruction.
It is a combinational circuit that can be built from an ALU wired to always add.
To execute any instruction, we must start by fetching the instruction from memory.
To prepare for executing the next instruction, we must also increment the program counter
so that it points at the next instruction, 4 bytes later.
The figure shows how to combine the three elements to form a datapath that fetches
instructions and increments the PC to obtain the address of the next sequential instruction.
The fetched instruction is used by other parts of the datapath.
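The fetch portion of the datapath can be sketched in a few lines of Python. This is an illustrative model only: the class name `FetchUnit`, the dictionary-based memory, and the start address are assumptions, and the two instruction words are hand-assembled encodings of `lw $10, 20($1)` and `sub $11, $2, $3`.

```python
class FetchUnit:
    """PC register, instruction memory, and a +4 adder (hypothetical model)."""

    def __init__(self, instructions, start_pc=0):
        # Instruction memory: one 32-bit word every 4 bytes, read-only for the datapath.
        self.memory = {start_pc + 4 * i: word for i, word in enumerate(instructions)}
        self.pc = start_pc

    def fetch(self):
        instruction = self.memory[self.pc]      # read the word at the address held in the PC
        self.pc = (self.pc + 4) & 0xFFFFFFFF    # adder: address of the next sequential instruction
        return instruction


unit = FetchUnit([0x8C2A0014, 0x00435822], start_pc=0x00400000)
first = unit.fetch()
second = unit.fetch()
```

After two fetches the PC has advanced by 8 bytes, and each fetched word is available to the rest of the datapath.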
4. Register file:
The processor’s 32 general-purpose registers are stored in a structure called a register file.
Any register can be read or written by specifying the number of the register in the file.
Read
For each data word to be read from the registers, we need an input to the register file that
specifies the register number to be read and an output from the register file that will carry
the value that has been read from the registers.
Write
To write a data word, we will need two inputs: one to specify the register number to be
written and one to supply the data to be written into the register. The write operation needs
a write control signal.
We need a total of four inputs (three for register numbers and one for data) and two outputs
(both for data). The register number inputs are 5 bits wide to specify one of 32 registers
(32 = 2^5), whereas the data input and two data output buses are each 32 bits wide.
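A register file with two read ports and one write port can be modeled as follows. This is a minimal sketch: the class and parameter names are assumptions, and keeping register 0 at zero is an assumption about MIPS behavior that these notes do not state.

```python
class RegisterFile:
    """32 registers of 32 bits, two read ports, one write port (hypothetical model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Two 5-bit register numbers in, two 32-bit data words out.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, write_reg, write_data, reg_write):
        # The write happens only when the write control signal is asserted.
        # Assumption: register 0 always reads as zero, so writes to it are ignored.
        if reg_write and (write_reg & 0x1F) != 0:
            self.regs[write_reg & 0x1F] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(10, 0x1234, reg_write=1)   # control asserted: register 10 is updated
rf.write(11, 0x5678, reg_write=0)   # control deasserted: register 11 is unchanged
a, b = rf.read(10, 11)
```

Reads need no control signal in this model, which matches the description above: only the write port is gated.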
5. ALU:
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal
that indicates whether the result is 0.
The operation to be performed by the ALU is controlled with the ALU operation signal,
which will be 4 bits wide. We will use the Zero detection output of the ALU shortly to
implement branches.
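The ALU's behavior, including its Zero output, can be sketched as a Python function. The 4-bit operation codes used here (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than, 1100 NOR) are the encodings commonly given for this design; they are an assumption here because the notes do not reproduce the table.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(control, a, b):
    """Return (result, zero) for one 4-bit ALU operation code."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:        # AND
        result = a & b
    elif control == 0b0001:      # OR
        result = a | b
    elif control == 0b0010:      # add
        result = (a + b) & MASK32
    elif control == 0b0110:      # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:      # set on less than (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:      # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU control value: {control:04b}")
    zero = 1 if result == 0 else 0
    return result, zero
```

For example, `alu(0b0110, 9, 9)` subtracts two equal values, so the result is 0 and the Zero output is 1, which is the condition a branch-equal tests.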
6. Data memory:
The data memory unit is a state element with two inputs, an address and the write data, and
a single output called read data.
There are separate read (MemRead) and write (MemWrite) control signals.
7. Sign extension unit:
It is used to increase the size of a data item by replicating the high-order sign bit of the
original data item in the high-order bits of the larger, destination data item.
It is used to sign-extend the 16-bit offset field in the instruction to a 32-bit signed value.
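Sign extension of a 16-bit field can be sketched in Python as below. The function name `sign_extend16` is an assumption, and the result is returned as an unsigned 32-bit pattern.

```python
def sign_extend16(value):
    """Replicate bit 15 of a 16-bit field into bits 31:16 of a 32-bit word."""
    value &= 0xFFFF
    if value & 0x8000:           # sign bit set: fill the upper half with ones
        value |= 0xFFFF0000
    return value


positive = sign_extend16(0x0014)   # +20 stays 0x00000014
negative = sign_extend16(0xFFFC)   # -4 becomes 0xFFFFFFFC
```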
Data path elements for branches:
In the MIPS architecture, the branch target is given by the sum of the offset field of the
instruction and the address of the next instruction.
The branch target address is the address specified in a branch, which becomes the new
program counter (PC) if the branch is taken.
The data path for a branch uses the ALU to evaluate the branch condition and a separate
adder to compute the branch target as the sum of the incremented PC and the sign-extended,
lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.
Control logic is used to decide whether the incremented PC or branch target should
replace the PC, based on the Zero output of the ALU.
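The branch target calculation and the choice of the next PC can be sketched in Python as follows. The function names and the sample addresses are assumptions; `branch` stands for a control signal that is asserted for a branch instruction and `zero` for the ALU's Zero output.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def next_pc(pc, offset16, branch, zero):
    """Choose between PC + 4 and the branch target."""
    pc_plus_4 = (pc + 4) & MASK32
    # Separate adder: incremented PC plus the sign-extended offset shifted left 2 bits.
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return target if (branch and zero) else pc_plus_4


taken = next_pc(0x00400010, 0x0003, branch=1, zero=1)       # forward by 3 words
not_taken = next_pc(0x00400010, 0x0003, branch=1, zero=0)   # falls through to PC + 4
backward = next_pc(0x00400010, 0xFFFC, branch=1, zero=1)    # offset of -4 words
```

The offset counts words relative to the instruction after the branch, which is why it is shifted left by 2 bits before the addition.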
Branch taken
A branch where the branch condition is satisfied and the program counter (PC) becomes
the branch target. All unconditional jumps are taken branches.
Branch not taken (or untaken branch)
A branch where the branch condition is false and the program counter (PC) becomes
the address of the instruction that sequentially follows the branch.
Control implementation scheme:
ALU control:
The MIPS ALU defines the following six combinations of its four control inputs.
Depending on the instruction class, the ALU will need to perform one of the first five of
these functions.
For load word and store word instructions, the ALU is used to compute the memory
address by addition.
For the R-type instructions, the ALU needs to perform one of the five actions (AND,
OR, subtract, add, or set on less than), depending on the value of the 6-bit funct
(or function) field.
For branch equal, the ALU must perform a subtraction.
We can generate the 4-bit ALU control input using a small control unit that takes as inputs
the function field of the instruction and a 2-bit control field, called ALUOp.
The table above shows the ALU control inputs based on the 2-bit ALUOp control and the
6-bit function code. The opcode, listed in the first column, determines the setting of the
ALUOp bits.
ALUOp indicates whether the operation to be performed should be:
00- add for loads and stores,
01- subtract for beq,
10- determined by the operation encoded in the funct field.
The output of the ALU control unit is a 4-bit signal that directly controls the ALU by
generating one of the 4-bit combinations.
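The mapping from ALUOp and the funct field to the 4-bit ALU control value can be sketched as a small lookup. The funct codes (add 100000, sub 100010, AND 100100, OR 100101, slt 101010) and the 4-bit output codes are the standard MIPS encodings; they are stated here as assumptions because the table is not reproduced in these notes.

```python
# funct field (6 bits) -> 4-bit ALU control, used only when ALUOp is 10.
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:        # loads and stores: add for the address
        return 0b0010
    if alu_op == 0b01:        # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:        # R-type: decided by the funct field
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp value: {alu_op:02b}")
```

Note that the funct field is ignored unless ALUOp is 10, so a load or store always produces the add code regardless of its low six bits.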
Designing the Main Control Unit
To understand how to connect the fields of an instruction to the data path, it is useful to
review the formats of the three instruction classes: the R-type, branch, and load-store
instructions.
The figure shows these formats.
R-type instruction:
Instruction format for R-format instructions, which all have an opcode of 0.
These instructions have three register operands: rs and rt are the sources, and rd is the
destination.
The ALU function is in the funct field.
The R-type instructions that we implement are add, sub, AND, OR, and slt.
The shamt field is used only for shifts.
Load and store instruction:
Instruction format for load (opcode = 35 in decimal) and store (opcode = 43 in decimal) instructions.
The register rs is the base register that is added to the 16-bit address field to form the
memory address.
For loads, rt is the destination register for the loaded value.
For stores, rt is the source register whose value should be stored into memory.
Branch instruction:
Instruction format for branch equal (opcode = 4).
The registers rs and rt are the source registers that are compared for equality.
The 16-bit address field is sign-extended, shifted, and added to the PC + 4 to compute
the branch target address.
There are several major observations about this instruction format that we will rely on:
The op field is always contained in bits 31:26.
The two registers to be read are always specified by the rs and rt fields, at positions
25:21 and 20:16. This is true for the R-type instructions, branch equal, and store.
The base register for load and store instructions is always in bit positions 25:21 (rs).
The 16-bit offset for branch equal, load, and store is always in positions 15:0.
The destination register is in one of two places. For a load it is in bit positions 20:16
(rt), while for an R-type instruction it is in bit positions 15:11 (rd).
Thus, we will need to add a multiplexor to select which field of the instruction is used to
indicate the register number to be written.
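The field positions listed above can be extracted with shifts and masks. The sketch below is illustrative: the function names and the dictionary layout are assumptions, `reg_dst` stands for the select input of the added multiplexor, and the two sample words are hand-assembled encodings of `lw $10, 20($1)` and `sub $11, $2, $3`.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31:26
        "rs": (instruction >> 21) & 0x1F,     # bits 25:21
        "rt": (instruction >> 16) & 0x1F,     # bits 20:16
        "rd": (instruction >> 11) & 0x1F,     # bits 15:11
        "shamt": (instruction >> 6) & 0x1F,   # bits 10:6
        "funct": instruction & 0x3F,          # bits 5:0
        "imm": instruction & 0xFFFF,          # bits 15:0
    }


def write_register(fields, reg_dst):
    """Multiplexor on the write register number: rt when 0 (load), rd when 1 (R-type)."""
    return fields["rd"] if reg_dst else fields["rt"]


lw_fields = decode(0x8C2A0014)    # lw $10, 20($1)
sub_fields = decode(0x00435822)   # sub $11, $2, $3
lw_dest = write_register(lw_fields, reg_dst=0)
sub_dest = write_register(sub_fields, reg_dst=1)
```

All fields are extracted for every instruction; which of them are meaningful depends on the opcode.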
The following table shows the effect of each of the seven control signals when asserted and
deasserted.
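As a rough illustration of how the opcode determines the control signals, the sketch below uses the settings commonly given for this design. The table contents are an assumption, since the notes do not reproduce them. `None` marks a don't-care value, and the PC-source selection is represented by a `Branch` signal that would be combined with the ALU's Zero output.

```python
# opcode -> control settings (assumed values; None means "don't care").
CONTROL_TABLE = {
    0: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,           # R-type
            MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,          # lw
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,    # sw
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,     # beq
            MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control settings for one of the four supported opcodes."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"unsupported opcode: {opcode}")
    return CONTROL_TABLE[opcode]


lw_signals = main_control(35)
```

A store and a branch never write a register, so the two multiplexors that feed the register file are don't-cares for them.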
Pipelining:
Pipelining is an implementation technique in which multiple instructions are
overlapped in execution. It exploits parallelism among the instructions in a
sequential instruction stream. An important advantage is that pipelining is basically
invisible to the programmer.
Pipelining increases the number of simultaneously executing instructions and the
rate at which instructions are started and completed. Pipelining does not reduce the
time it takes to complete an individual instruction.
It improves the instruction throughput rather than individual instruction execution
time or latency.
MIPS instructions classically take five steps:
1. Fetch instruction from memory.
2. Read registers while decoding the instruction. The regular format of MIPS
instructions allows reading and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.
The above figure shows the execution of instructions without pipelining (top) and with
pipelining (bottom).
The division of an instruction into five stages means a five-stage pipeline, which in turn
means that up to five instructions will be in execution during any single clock cycle. Thus,
we must separate the data path into five pieces, with each piece named corresponding to a
stage of instruction execution:
1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
Instructions and data move generally from left to right through the five stages as they
complete execution.
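The difference between throughput and latency can be checked with a little arithmetic. The sketch below assumes an ideal pipeline (equal stage times, no stalls); the function names and the 200 ps stage time are hypothetical values chosen for illustration.

```python
def nonpipelined_time(instructions, stages, stage_time):
    """Each instruction runs all stages before the next one starts."""
    return instructions * stages * stage_time


def pipelined_time(instructions, stages, stage_time):
    """The first instruction fills the pipeline; then one completes per cycle."""
    return (instructions + stages - 1) * stage_time


STAGE_TIME_PS = 200  # hypothetical stage time in picoseconds

single_latency = pipelined_time(1, 5, STAGE_TIME_PS)         # one instruction alone
total_plain = nonpipelined_time(1000, 5, STAGE_TIME_PS)      # 1000 instructions, no overlap
total_pipelined = pipelined_time(1000, 5, STAGE_TIME_PS)     # 1000 instructions, overlapped
speedup = total_plain / total_pipelined
```

Under these assumptions a single instruction still takes five stage times, while the total for a long sequence approaches one stage time per instruction, so the speedup approaches the number of stages.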
Pipelined data path:
The above figure shows the pipelined data path with the pipeline registers highlighted. All
instructions advance during each clock cycle from one pipeline register to the next.
The registers are named for the two stages separated by that register. For example, the
pipeline register between the IF and ID stages is called IF/ID. There is no pipeline register
at the end of the write-back stage.
The pipeline registers separate each pipeline stage. The registers must be wide enough to
store all the data corresponding to the lines that go through them.
For example, the IF/ID register must be 64 bits wide, because it must hold both the 32-bit
instruction fetched from memory and the incremented 32-bit PC address.
Five stages of load instruction:
The five stages of load instruction are the following:
1. Instruction fetch:
The instruction is read from memory using the address in the PC and then placed in the
IF/ID pipeline register.
The PC address is incremented by 4 and then written back into the PC to be ready for the
next clock cycle. This incremented address is also saved in the IF/ID pipeline register
in case it is needed later for an instruction, such as beq.
2. Instruction decode and register file read:
The IF/ID pipeline register supplies the 16-bit immediate field, which is sign-extended
to 32 bits, and the register numbers to read the two registers.
All three values are stored in the ID/EX pipeline register, along with the incremented
PC address.
3. Execute or address calculation:
The load instruction reads the contents of register 1 and the sign-extended immediate
from the ID/EX pipeline register and adds them using the ALU.
That sum is placed in the EX/MEM pipeline register.
4. Memory access:
The load instruction reads the data memory using the address from the EX/MEM
pipeline register and loads the data into the MEM/WB pipeline register.
5. Write-back:
The final step is reading the data from the MEM/WB pipeline register and writing it into
the register file in the middle of the figure.
To aid understanding, pipelined execution can be drawn as a multiple-clock-cycle pipeline diagram.
For example, consider the following five-instruction sequence:
lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)
add $14, $5, $6
The figure shows the multiple-clock-cycle pipeline diagram for these instructions.
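A multiple-clock-cycle diagram for this sequence can also be generated as text. The sketch below assumes an ideal pipeline with no stalls, so each instruction starts one clock cycle after the previous one; the function name and the output layout are assumptions.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one row per instruction: (text, list of stage names per clock cycle)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for index, text in enumerate(instructions):
        cells = [""] * total_cycles
        for offset, stage in enumerate(STAGES):
            cells[index + offset] = stage   # instruction i enters IF in cycle i + 1
        rows.append((text, cells))
    return rows


program = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]

diagram = pipeline_diagram(program)
for text, cells in diagram:
    print(f"{text:<18}" + "".join(f"{cell:<5}" for cell in cells))
```

Reading one column of the printed diagram gives the state of the pipeline in that clock cycle; in the fifth cycle all five stages are busy.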
Pipelined Control
Just as we added control to the simple data path, we now add control to the pipelined data
path. We can divide the control lines into five groups according to the pipeline stages.
1. Instruction fetch: The control signals to read instruction memory and to write the PC
are always asserted, so there is nothing special to control in this pipeline stage.
2. Instruction decode/register file read: As in the previous stage, the same thing happens at
every clock cycle, so there are no optional control lines to set.
3. Execution/address calculation: The signals to be set are RegDst, ALUOp, and ALUSrc.
4. Memory access: The control lines set in this stage are Branch, MemRead, and
MemWrite. The branch equal, load, and store instructions set these signals, respectively.
5. Write-back: The two control lines are MemtoReg, which decides between sending the
ALU result or the memory value to the register file, and RegWrite, which writes the
chosen value.
The figure shows the control lines for the final three stages.
Four of the nine control lines are used in the EX phase, with the remaining five control
lines passed on to the EX/MEM pipeline register, which is extended to hold the control
lines.
Three are used during the MEM stage, and the last two are passed to MEM/WB for use in the
WB stage.
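The way the nine control lines thin out as they travel through the pipeline registers can be sketched as follows. The names `ALUOp1` and `ALUOp0` for the two bits of ALUOp, the function name, and the sample values for a load are assumptions made for illustration.

```python
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")   # used in the EX stage
MEM_LINES = ("Branch", "MemRead", "MemWrite")         # used in the MEM stage
WB_LINES = ("MemtoReg", "RegWrite")                   # used in the WB stage


def split_control(signals):
    """Return the control bundles held in ID/EX, EX/MEM, and MEM/WB."""
    id_ex = {name: signals[name] for name in EX_LINES + MEM_LINES + WB_LINES}
    ex_mem = {name: signals[name] for name in MEM_LINES + WB_LINES}  # EX lines consumed
    mem_wb = {name: signals[name] for name in WB_LINES}              # MEM lines consumed
    return id_ex, ex_mem, mem_wb


# Assumed control values for a load word instruction.
lw_control = dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1,
                  Branch=0, MemRead=1, MemWrite=0,
                  MemtoReg=1, RegWrite=1)

id_ex, ex_mem, mem_wb = split_control(lw_control)
```

The bundle sizes are 9, 5, and 2, matching the counts given above.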