Computer Organization With MIPS
7-24-2018
DOI: 10.31986/issn.2689-0690_rdw.oer.1008
Recommended Citation
Bergmann, Seth D., "Computer Organization with MIPS" (2018). Open Educational Resources. 9.
https://rdw.rowan.edu/oer/9
This Book is brought to you for free and open access by Rowan Digital Works. It has been accepted for inclusion in
Open Educational Resources by an authorized administrator of Rowan Digital Works.
Computer Organization with MIPS
Seth D. Bergmann
June 2, 2023
Preface
Secondary Authors
Contributors
Technical Consultant
Joshua Grochowski, Rowan University
Contents
Preface
2 Number Systems
  2.1 Base Two - Binary
    2.1.1 Binary Arithmetic
    2.1.2 Exercises
  2.2 Base 8 - Octal
    2.2.1 Exercises
  2.3 Base 16 - Hexadecimal
    2.3.1 Hexadecimal Values in the MIPS Architecture
    2.3.2 Exercises
  2.4 Twos Complement Representation
    2.4.1 Exercises
  2.5 Powers of Two
    2.5.1 Arithmetic With Powers of Two
    2.5.2 Exercises
Glossary
In driver education classes students are taught not only how to drive and the
rules of the road, but they are also taught some fundamentals of the inner
workings of the car - the four cycle engine, the distributor, the electrical system,
etc. Strictly speaking it is not necessary to know these things in order to drive
the car, but they are generally considered important enough for every driver to
have a rudimentary understanding. When something fails, a mechanic may not
be immediately available, and the driver who has some knowledge of what is
under the hood will be better prepared to deal with the problem than the driver
who is clueless.
Some computer scientists work with computer hardware (often in conjunc-
tion with software), but many work exclusively with software. Like automobile
drivers they will be better prepared to deal with failures if they have some
knowledge of what is ‘under the hood’. In addition software developers who are
hardware-savvy can produce more efficient software than those who are not.
For these reasons most computer science curricula include at least one hard-
ware course. This book includes topics such as CPU design, datapath, the
memory hierarchy, and assembly language. All of these are essential to a broad
understanding of computer organization and design. A discussion of the impor-
tance of this subject to software professionals can be found in the Kode Vicious
column by George Neville-Neil, in the March/April 2021 issue of ACM Queue.
As a prototypical example of a computer, we use the MIPS 1 architecture.
This architecture is complex enough that it is used in some real devices, yet it
is simple enough to be understood and programmed by novices.
The computer is but one example of a digital device. By this we mean that
at its most fundamental level it stores and works with binary values - zeros
and ones. There is no other value in a digital device; all other information
1 Microprocessor without Interlocked Pipeline Stages. MIPS has been used primarily in
2 CHAPTER 1. COMPUTERS AND COMPUTER PROGRAMS
Registers
The CPU registers are storage elements with fast access time. Accessing a
value in the computer’s memory can be over 1000 times slower than
accessing a CPU register. A register consists of a
fixed number of bits, usually 32 or 64. In the MIPS architecture which we will
be studying there are 32 bits in a register, and there are 32 general purpose
registers. These registers can store intermediate results from arithmetic and
logical computations, for which the operands must also be stored in registers.
They can also be used to move information from one location in memory to
another, and to store memory addresses (see the section on memory below).
In this book we will be diagramming registers as shown in Fig 1.1. Each
register is designated by a unique number: 0..31 and stores 32 bits. To save
space on the page only the first 8 of 32 general registers are shown. In the
MIPS architecture register 0 will always contain all zeros. The other registers
in Fig 1.1 contain randomly selected values with no particular intent or purpose.
We will be using these diagrams to explain the various operations which the CPU
is capable of performing, by showing the contents of registers before and after
the operation is performed.
1.1. HARDWARE COMPONENTS 3
0 00000000000000000000000000000000
1 11011000010101010100010101010100
2 00010101000111101010101010010101
3 11010000011100010010101010101010
4 00000000000000001111111111111111
5 11111111111111110000000000000000
6 00000000000000000000000000000001
7 00000000000000000000000000000000
Figure 1.1: Possible values for 8 of the 32 CPU registers in the MIPS architecture
Program Counter
The 32 general purpose registers described above may be referred to as pro-
grammable registers, i.e. the values which they contain can be explicitly altered
at the programmer’s discretion. There are other registers in the CPU which are
necessary for the correct sequence of operations to take place. One such reg-
ister is called the program counter register (PC). It contains the location (i.e.
memory address) of the next instruction to be executed.
Datapath
The datapath is the name which we give to the components in the CPU which
enable data to move between memory and the registers. The datapath also
contains hardware which can execute fundamental arithmetic and logical oper-
ations. The following components are included in the datapath: the registers,
the memory, the arithmetic/logic unit (ALU), the PC, the control unit, and the
connections necessary for these components to work together.
1.1.2 Memory
Closely associated with the CPU is the memory, also known as main memory
or random access memory (RAM). The memory stores data which is needed
for CPU operations. For example, if a program is working with an array of
numbers, those numbers would be stored in memory, where the CPU would
have immediate access to them. The instructions making up a program, coded
in binary, are also stored in memory. Each memory location has a unique
address, much like the houses on a street have unique, sequential, addresses.
When the CPU needs to access a particular memory value, it uses the address
of that value to access it.
The bits (binary digits) of memory are normally viewed in groups of 8 bits.
Each 8-bit group is called a byte. In the MIPS architecture which we study in
this book, each byte of memory has a unique address; we say the memory is
byte addressable. Recalling that registers are 32 bits, every 4 bytes constitute
a full word of memory; we say that the word size for the MIPS architecture is 4
bytes, or 32 bits. This means that calculations and memory access are normally
... 00000000 00000000 00000000 00000000 11011000 01010101 01000101 01010100 ...
4024 4028
Figure 1.2: A portion of the MIPS memory, showing word addresses
done with 32-bit values. Fig 1.2 is a diagram of the MIPS memory structure.
This diagram shows only two words of memory, each with its own address. The
32 bits in each word are shown with a space between bytes to show the 4 bytes
in a word (in an actual memory there is no such space). Since there are 4 bytes
in a word, and the memory is byte addressable, the word addresses increase by
4 (from 4024 to 4028 in this example).
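The relationship between byte addresses and word addresses can be sketched in a few lines of Python. Python is used here purely for illustration, and the function name bytes_of_word is our own; it is not part of MIPS or of any tool used in this book.

```python
WORD_SIZE_BYTES = 4  # MIPS word size: 4 bytes = 32 bits


def bytes_of_word(word_address):
    """Return the addresses of the 4 bytes making up the word at word_address."""
    return [word_address + offset for offset in range(WORD_SIZE_BYTES)]


print(bytes_of_word(4024))            # [4024, 4025, 4026, 4027]
print(bytes_of_word(4024)[-1] + 1)    # 4028, the address of the next word
```

Because every byte has its own address, the word that follows the one at 4024 begins 4 bytes later, at 4028.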
[Figure: the system unit, containing the CPU, memory, and non-volatile storage]
2 Technically, the display is used for both input and output, because the display can send
1.6 Exercises
1. What is a computer program?
2. (a) What does CPU stand for?
(b) What are the two primary purposes of the CPU?
3. In the CPU where are intermediate results of calculations stored?
4. In the MIPS architecture:
(a) How many registers are there?
(b) What is the size (in bits) of each register?
5. What is the purpose of the Program Counter (PC) register?
6. In the MIPS architecture:
(a) How many bits are in a byte?
(b) How many bytes are in a word?
(c) How many bits are in a word?
7. (a) If the address of a particular byte in memory is 4321, what is the
address of the next byte?
11. Use Wikipedia to find out what is meant by open source versus proprietary
software. What are their relative advantages and disadvantages?
Chapter 2
Number Systems
1 9 4 0 3
10^0 = 1
10^1 = 10
10^2 = 100
10^3 = 1,000
10^4 = 10,000
1 0 0 1 1
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
101_2 = 5
0101_2 = 5
10101_2 = 21
10000_2 = 16
1111_2 = 15
1000000000000_2 = 4096
1000000000001_2 = 4097
111111111111_2 = 4095
sum is the value of the binary number. Other examples of binary numbers are
shown in Fig 2.3.
                                    1 1 1
  0 0 1 0 1 0 1 1 = 43        0 0 1 0 1 0 1 1 = 43
+ 0 0 0 0 1 1 1 0 = 14      + 0 0 0 0 1 1 1 0 = 14
-----------------------     -----------------------
  0 0 1 1 1 0 0 1 = 57        0 0 1 1 1 0 0 1 = 57
          (a)                         (b)
Figure 2.4: (a) Addition of 43 + 14 in binary using 8-bit values and (b) The
same operation showing carry bits
0 1 0 0 0 1 1 0 = 70
- 0 0 0 0 1 1 0 1 = 13
-----------------------
0 0 1 1 1 0 0 1 = 57
      0  1  1 10     0 10
   0  1  0  0  0  1  1  0 = 70
-  0  0  0  0  1  1  0  1 = 13
--------------------------------
   0  0  1  1  1  0  0  1 = 57
(low-order) 1 is written and the (high-order) 1 is a carry into the next column.
Subtraction is similar to addition. When attempting to subtract 0_2 − 1_2 we
will need to borrow a 1 from its (high-order) neighbor. If that neighbor is a
0, it will become 1 by borrowing from its neighbor, and so on. An example of
a binary subtraction is shown in Fig 2.5 in which we subtract 70 - 13. Fig 2.6
shows the same operation, with the borrow digits at the top. In our example we
are subtracting a small number from a larger number, ensuring that we get a
positive result. If we were to subtract a large number from a smaller number, the
result would be negative. This implies that we need a way to represent negative
numbers, which is described in the section on Twos Complement Representation.
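The column-by-column addition described in this section can be sketched in Python as follows. This is only an illustration under our own naming (add_binary is hypothetical); it reproduces the sum and the carry bits of Fig 2.4(b) and discards any carry out of the leftmost column.

```python
def add_binary(a_bits, b_bits):
    """Add two equal-length bit strings; return (sum_bits, carries).

    carries[i] is the carry INTO column i (same left-to-right order as the
    bit strings); a carry out of the leftmost column is discarded.
    """
    width = len(a_bits)
    result = ["0"] * width
    carries = ["0"] * width
    carry = 0
    for i in range(width - 1, -1, -1):      # rightmost (low-order) column first
        carries[i] = str(carry)
        total = int(a_bits[i]) + int(b_bits[i]) + carry
        result[i] = str(total % 2)          # the low-order bit is written
        carry = total // 2                  # the high-order bit carries left
    return "".join(result), "".join(carries)


total, carries = add_binary("00101011", "00001110")   # 43 + 14
print(carries)          # 00011100
print(total)            # 00111001
print(int(total, 2))    # 57
```

Subtraction with borrows can be handled in the same column-by-column manner, borrowing from the high-order neighbor instead of carrying into it.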
2.1.2 Exercises
1. Show each of the following numbers as an 8-bit binary value: 15, 3, 0, 64,
63, 127
2. Show the following numbers in binary using only as many bits as are
needed: 15, 3, 0, 128, 255, 256
4. Show how to do the following operations in binary, using 8-bit words (show
the carry bits for additions as shown in Fig 2.4(b) and the borrows for
subtractions as shown in Fig 2.6): 12+3, 64+64, 64+63, 63+63, 12-4,
17-3, 128-127
5. Read parts (a) and (b) aloud so that they make sense.
(a) There are 10 kinds of people in the world: those who know binary
and those who do not.
1 0 3 7
8^0 = 1
8^1 = 8
8^2 = 64
8^3 = 512
23_8 = 19
205_8 = 133
1000_8 = 512
3012_8 = 1546
1001_8 = 513
777_8 = 511
(b) There are 10 kinds of people in the world: those who know base 3,
those who do not know base 3, and those who do not know what I’m
talking about.
(c) Make up a statement similar to the ones in parts (a) and (b) above,
using a base in the range [4..9].
6. Show how to count from 0 to 31 using only the fingers on one hand (try
not to offend anyone when you get to 4).
octal binary
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111
the number of bits provided is 16 (not a multiple of 3). This means we have
one bit ‘left over’. It must be the left-most bit, not the right-most bit, in order
that the octal result represents the same number as the given binary value.
Conversely, we have a more common situation: we are given the octal repre-
sentation of an n-bit field. If n is not a multiple of 3, the high order octal digit
does not represent 3 bits. For example, if we are describing a 10-bit field in octal
as 1234_8, then the 10-bit field is 1 010 011 100 = 1010011100. The high order
(leftmost) octal digit represents just one bit. As another example, if we describe
a 5-bit field as 32_8, the 5-bit field must be 11 010 = 11010. In this case the
high order octal digit represents just 2 bits. It will be important to remember
this when dealing with the MIPS architecture in which the word size is 32 bits
(not a multiple of 3), and many of the field widths in MIPS instructions are not
multiples of 3.
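A minimal Python sketch of these conversions is shown here. The function names are our own, chosen for illustration. The key point is that grouping always starts at the right, so any left-over bits end up in the high order octal digit.

```python
def bits_to_octal(bits):
    """Convert a bit string to octal, grouping 3 bits at a time from the right."""
    pad = (-len(bits)) % 3                  # left-over bits belong to the high order digit
    padded = "0" * pad + bits
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    return "".join(str(int(group, 2)) for group in groups)


def octal_to_bits(octal_digits, width):
    """Expand each octal digit to 3 bits, then keep only the low order `width` bits."""
    bits = "".join(format(int(digit, 8), "03b") for digit in octal_digits)
    return bits[-width:]


print(bits_to_octal("1010011100"))   # 1234
print(octal_to_bits("1234", 10))     # 1010011100
print(octal_to_bits("32", 5))        # 11010
```

The sketch assumes that width is at least 1 and no larger than three times the number of octal digits.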
2.2.1 Exercises
1. Show each of the following decimal numbers in base 8, using only as many
octal digits as are necessary: 7, 9, 23, 100, 511, 512
2. Show each of the following octal numbers in decimal (base 10): 12_8, 32_8,
77_8, 777_8, 1000_8, 1010_8
3. Show each of the following binary values in base 8, using only as many octal
digits as are necessary: 111_2, 110_2, 100000000_2, 100000001_2, 111111111_2,
10101011_2, 1111111_2
Hint: There is no need to convert to decimal.
Figure 2.11: Base 16: Each hexadecimal digit represents 4 bits. The 16 hex
numerals have values ranging from 0 to 15.
4. Show each of the following octal values in binary (base 2): 10_8, 37_8, 73_8,
234_8, 7150_8
Hint: There is no need to convert to decimal.
5. An 8-bit field is storing the value 10101011_2. Show the 8-bit field in octal,
using no more digits than are necessary.
6. What is the largest (decimal) value that can be represented with 4 octal
digits?
2 1 3
16^0 = 1
16^1 = 16
16^2 = 256
Figure 2.12: The hexadecimal (base 16) representation of 531 (531 = 512 + 16
+ 3)
a3_16 = 10 · 16 + 3 = 163
20d_16 = 2 · 16^2 + 0 · 16 + 13 = 512 + 0 + 13 = 525
1000_16 = 1 · 16^3 + 0 · 16^2 + 0 · 16 + 0 = 4096
3012_16 = 3 · 16^3 + 0 · 16^2 + 1 · 16 + 2 = 12288 + 16 + 2 = 12306
1001_16 = 1 · 16^3 + 0 · 16^2 + 0 · 16 + 1 = 4097
fff_16 = 15 · 16^2 + 15 · 16 + 15 = 4095
Note that each hexadecimal digit represents 4 bits, thus providing a some-
what more efficient representation for long bit strings. Fig 2.14 shows some
examples of bit strings which can be represented much more easily in hexadec-
imal. Note in the last line of Fig 2.14 that the number of bits (19) is not a
multiple of 4. We have 3 bits left over. As in the case with octal numbers, these
left over bits must be the high order (leftmost) bits, in order for the number
represented by the hex digits to be equal to the number represented by the given
binary digits.
We will often show hexadecimal values with a subscript of ’x’ instead of 16
to indicate base 16: 321_x = 321_16 = 801. When using the MARS software,
there are no subscripts, so base 16 constants will be designated with a prefix of
’0x’: 0x321 = 321_16 = 801. The student may often see numbers written without
any base indicated. These are usually intended to be base 10, but at times the
base is evident from the context. For example, the 6-bit opcode 101001 is
obviously binary, and the result 4a56bf0f in register 2 is obviously base 16.
To describe each field separately we would need two hex digits for each field,
whether it represents 5 or 6 bits (for a 6-bit field, the high order hex digit
represents two bits, and for a 5-bit field the high order digit represents just one
bit):
29 11 00 1f 08 07
To describe the entire word in hexadecimal, we need to regroup the 32 bits
into fields of 4 bits each:
1010.01 10.001 0.0000.1111.1 010.00 00.0111
1010 0110 0010 0000 1111 1010 0000 0111
This can now be shown in hexadecimal by substituting the correct hex digit
for each group of 4 bits:
a620 fa07
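The regrouping above can be checked with a short Python sketch. The field widths 6, 5, 5, 5, 5, 6 are an assumption read off the dotted grouping shown above, and the name pack_fields is ours.

```python
# Assumed widths of the six fields, in bits (6 + 5 + 5 + 5 + 5 + 6 = 32).
FIELD_WIDTHS = [6, 5, 5, 5, 5, 6]


def pack_fields(values):
    """Concatenate the six field values, left to right, into one 32-bit word."""
    word = 0
    for value, width in zip(values, FIELD_WIDTHS):
        word = (word << width) | value      # make room, then append the field
    return word


fields = [0x29, 0x11, 0x00, 0x1f, 0x08, 0x07]
word = pack_fields(fields)
print(format(word, "032b"))   # the 32 bits of the word
print(format(word, "08x"))    # a620fa07
```

Printing the word with 8 hex digits performs the same regrouping into 4-bit fields that was done by hand above.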
2.3.2 Exercises
1. Show the following decimal values in hexadecimal (base 16): 13, 25, 170,
4095, 4096
2. Show the following hexadecimal values in decimal (base 10): 12_16, 20_16,
2e_16, ff_16, 100_16, abc_16, fff_16, 1000_16
6. What is the largest value (in decimal) which can be represented with 4
hexadecimal digits?
7. A MIPS register contains the value 0xab3c401f. Show the 32 bits stored
in that register.
0110 = +6
+ 1111 = -1
-----------
0101 = +5
Students often ask, given a binary value, how do you know whether it is
intended to be twos complement representation? For example, does the binary
value 1100 represent 12 or -4? The answer is that given no other information
about this binary value, it is impossible to know what it is supposed to represent.
If you are told that it is twos complement representation, then you know it
represents -4. But if you are told that it is unsigned, then you know it represents
12. You will see this concept again when we look at the instructions in the
MIPS architecture. There are two add instructions, one of which is called add
unsigned. The first add instruction assumes twos complement representation,
and the add unsigned instruction assumes all values are non-negative.
How can we negate a value in twos complement representation? Here are
three fairly easy algorithms for negating (or complementing) a number in binary:
• Subtract from 0. 0 - x = -x
• 1. Change all zeros to ones, and change all ones to zeros (this is called
the ones complement).
2. Add 1
0000 = 0
- 0100 = +4
-----------
1100 = -4
1100
00
We then copy the first 1 digit:
1100
100
Finally we complement the remaining digits:
1100
0100 = +4
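The three negation algorithms can be sketched in Python as shown here. The function names are our own, and the third function is reconstructed from the worked example above (copy digits from the right up to and including the first 1, then complement the remaining digits), so it should be read as an illustration rather than as a definitive statement of the algorithm.

```python
def flip(bits):
    """Change all zeros to ones and all ones to zeros (the ones complement)."""
    return "".join("1" if b == "0" else "0" for b in bits)


def negate_subtract(bits):
    """Subtract from 0, keeping only as many bits as the operand has."""
    width = len(bits)
    return format((0 - int(bits, 2)) % (1 << width), f"0{width}b")


def negate_complement_add_one(bits):
    """Form the ones complement, then add 1 (discarding any carry out)."""
    width = len(bits)
    return format((int(flip(bits), 2) + 1) % (1 << width), f"0{width}b")


def negate_scan_from_right(bits):
    """Copy digits from the right through the first 1; complement the rest."""
    first_one = bits.rfind("1")
    if first_one == -1:              # all zeros: the negation of 0 is 0
        return bits
    return flip(bits[:first_one]) + bits[first_one:]


for negate in (negate_subtract, negate_complement_add_one, negate_scan_from_right):
    print(negate.__name__, negate("1100"), negate("0100"))
```

Each of the three functions maps 1100 (-4) to 0100 (+4) and 0100 back to 1100.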
In summary, we have seen that negative as well as positive numbers can be
represented using the twos complement representation. This scheme allows for
easy implementation of addition and subtraction, and it is used by virtually every
chip maker in the world.
2.4.1 Exercises
1. Show the following numbers using 8-bit twos complement representation:
+6, -1, -2, -6, +22, -15, +127, -127, -128
2. (a) What are the largest and smallest numbers which can be represented
using 8-bit twos complement representation?
(b) What are the largest and smallest numbers which can be represented
using an n-bit twos complement representation?
3. Show each of the following in twos complement representation, using only
as many bits as are necessary: 15, 23, -15, -23, 2, 1, 0, -1, -2, 511, 512,
-512
Hint: A twos complement number is negative if and only if the high order
bit is 1.
4. Show the decimal value of each of the following assuming (a) unsigned (b)
twos complement representation:
0111_2
1111_2
0101_2
1010_2
011_2
11_2
11111111_2
11111110_2
11010101_2
10_2
1_2
01_2
5. Use any of the three algorithms given to negate each of the following,
showing the solution in binary (binary numbers are twos complement rep-
resentation).
+75
-76
+15
1111_2
01000_2
10001_2
1111_2
Hint: See Fig 2.15. Use a Java compiler to check your solutions if you are
not sure.
letters for powers of ten. Thus 1K = 2^10 = 1024 but 1k = 10^3 = 1000.
n 2^n
0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
8 256
9 512
10 1024
2^13 = 8 · 2^10 = 8K
2^19 = 512 · 2^10 = 512K
You may recall the following properties of exponents from your math classes:
• x^y · x^z = x^(y+z)
• x^y / x^z = x^(y−z)
• (x^y)^z = x^(y·z)
These properties will make arithmetic with powers of two much easier:
• 2^2 · 2^3 = 2^(2+3) = 2^5 = 32
• 2^9 / 2^3 = 2^(9−3) = 2^6 = 64
• (2^2)^3 = 2^(2·3) = 2^6 = 64
We will be working extensively with powers of two in chapter 8 and will find it
much easier using what we have learned here.
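As a small illustration, the following Python sketch converts a quantity such as 512K into its exponent, so that multiplication and division become addition and subtraction of exponents. The function name exponent_of is ours, and the suffix table assumes the usual meanings M = 2^20, G = 2^30, and T = 2^40 alongside K = 2^10.

```python
# Capital-letter suffixes denote powers of two (assumed: K = 2^10, M = 2^20, ...).
SUFFIX_EXPONENT = {"": 0, "K": 10, "M": 20, "G": 30, "T": 40}


def exponent_of(quantity):
    """Return n such that quantity equals 2^n, for example '512K' -> 19."""
    suffix = quantity[-1] if quantity[-1] in SUFFIX_EXPONENT else ""
    number = int(quantity[:len(quantity) - len(suffix)])
    if number <= 0 or number & (number - 1):
        raise ValueError("not a power of two: " + quantity)
    return number.bit_length() - 1 + SUFFIX_EXPONENT[suffix]


print(exponent_of("8K"))      # 13
print(exponent_of("512K"))    # 19
# Multiplying powers of two adds the exponents: 8K * 512K = 2^(13+19) = 2^32 = 4G
print(exponent_of("8K") + exponent_of("512K") == exponent_of("4G"))   # True
```

Working with exponents in this way avoids writing out the very large numbers themselves.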
2.5.2 Exercises
n        2^n
3
7
11
15
20
24
36
48
         16
         512
         8K
         128K
         1M
         64M
         4G
         32G
         32T
         512T
2. Evaluate each of the following:
(a) 4K * 32K
(b) 16M * 16M
(c) 32M * 64G / 2T
(d) 16T * 32G * 128M / 4T / 8T
(e) (32K)^3
Hint: See the identities in this section, and use powers of two.
3. Use the definitions provided in this section.
(a) If there are 128K protozoa in a liter of pond water, and there are 4M
liters of water in the pond, how many protozoa are in the pond?
(b) If a ROM (read-only memory) consists of 4G bits, and there are 8
bits in a byte, how many bytes are in the ROM?
Chapter 3
Assembly Language for MIPS
3.1.1 Exercises
1. Briefly describe the purpose of each of the following general registers
(a) $a2
(b) $v1
(c) $ra
(d) $29
• The optional label will be discussed in the section on branches and jumps
(transfer of control)
• The mnemonic is the operation to be performed; examples are add (for
addition) and sub (for subtraction). The word ‘mnemonic’ means ‘to re-
member’; mnemonics are easier to remember than their machine language
equivalents (binary operation codes).
• There may be 0, 1, 2, or 3 operands separated by commas. At this point
the operands will simply be general registers. Later in this chapter we will
cover symbolic memory references as operands.
• The operands may be followed by an optional comment. Comments begin
with a # and extend to the end of the line. There are no multi-line com-
ments. Comments are strictly for the programmer’s use and are ignored by
the assembler. Comments are especially important in assembly language,
which is typically more difficult to read and understand than a high level
language. Comments may also appear on a line with no statements, but
must always begin with a # character.
In the example above the mnemonic is add, and the three operands are $t0,
$t1, and $zero. That statement has a comment, but no label.
3.2.1 Exercises
1. A comment
(a) must begin with what character?
(b) has what use in a statement?
(c) is always required: true or false.
(d) is terminated with what character?
2. How many operands may an assembly language statement have?
3. What is a mnemonic?
(a)
[label:] add $rd, $rs, $rt [# comment]
(b)
Reg[$rd] ← Reg[$rs] + Reg[$rt]
(c)
add $s0, $t3, $a0
(d)
add $t0, $t0, $t0
Figure 3.3: Add Statement: (a) Format (b) Meaning (c) Example, which puts
the sum of registers $t3 and $a0 into register $s0. (d) Example which doubles
the value stored in register $t0.
• (a) The general format of an add statement. It must have three operands,
all of which must be general registers. It may have an optional label and
an optional comment.
• (b) The meaning of the add statement. The first operand is the destination
register ($rd) and specifies which register is to receive the result of the
addition. The second and third operands ($rs and $rt) are the registers
which contain the values to be added. The notation Reg[$reg] means to
select the general register with name reg.
• (c) An example of an add statement which adds the values stored in reg-
isters $t3 and $a0, and puts the sum into register $s0.
• (d) An example of an add statement which adds the value stored in register
$t0 to itself, putting the sum back into register $t0, effectively doubling
the value in register $t0. This example shows that the operand registers
need not be different registers.
3.3. ARITHMETIC INSTRUCTIONS 29
(a)
[label:] sub $rd, $rs, $rt [# comment]
(b)
Reg[$rd] ← Reg[$rs] − Reg[$rt]
(c)
sub $s0, $t3, $a0
(d)
Figure 3.4: Subtract Statement: (a) Format (b) Meaning (c) Example, which
puts the value of register $a0 subtracted from register $t3 into register $s0. (d)
Example which puts the value 0 into register $t0.
$a0 = 00 00 00 05
$a1 = 00 00 00 11
$a2 = 00 00 00 1e
$v0 = ?? ?? ?? ??
Figure 3.5: Example to calculate 5 + 17 - 30, leaving the result in register $v0.
The contents of the relevant registers are shown in hexadecimal before and after
each instruction is executed.
• It is ok to store a new value in register $v0; the current value is not needed.
We do not assume that register $v0 contains 0. The add instruction will
overwrite any value that is in register $v0, so its initial contents are irrelevant
here.
In Fig 3.5 the add instruction will add the contents of register $a0 (5) to
the contents of register $a1 (17) and store the result (22) in register $v0. The
sub instruction will subtract the contents of register $a2 (30) from the value in
register $v0 (22) and store the result (-8) in register $v0.
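The calculation just described can be mimicked in Python as a rough model of 32-bit registers. The dictionary reg and the helper to_signed are our own illustrative names, and the sketch ignores the possibility of arithmetic overflow.

```python
MASK32 = 0xFFFFFFFF   # keep every result to 32 bits, like a MIPS register


def to_signed(value):
    """Interpret a 32-bit pattern as a twos complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


reg = {"$a0": 5, "$a1": 17, "$a2": 30}

reg["$v0"] = (reg["$a0"] + reg["$a1"]) & MASK32            # add $v0, $a0, $a1
print(format(reg["$v0"], "08x"))                           # 00000016 (22)

reg["$v0"] = (reg["$v0"] - reg["$a2"]) & MASK32            # sub $v0, $v0, $a2
print(format(reg["$v0"], "08x"), to_signed(reg["$v0"]))    # fffffff8 -8
```

The final register contents, fffffff8, are the 32-bit twos complement representation of -8.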
chapter 4
3 There is another instruction, sltu, which performs an unsigned comparison.
4 The li instruction loads a constant value into a register.
(a)
[label:] slt $rd, $rs, $rt [# comment]
(b)
Reg[$rd] ← 1 if Reg[$rs] < Reg[$rt]
Reg[$rd] ← 0 if Reg[$rs] ≥ Reg[$rt]
(c)
slt $s0, $t3, $a0
Figure 3.6: Set If Less Than Statement: (a) Format (b) Meaning (c) Example,
which stores 1 in register $s0 if register $t3 is less than register $a0, and which
clears register $s0 if register $t3 is not less than register $a0
li $t0, 5
li $t1, -7
slt $t2, $t0, $t1 # compare $t0 with $t1
slt $t3, $t1, $t0 # compare $t1 with $t0
slt $t4, $t0, $t0 # compare $t0 with itself
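The effect of this sequence can be modeled in Python, treating each register as a 32-bit pattern and comparing the patterns as twos complement numbers. The function names slt and to_signed are ours, used only for this sketch.

```python
def to_signed(value):
    """Interpret a 32-bit pattern as a twos complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def slt(rs, rt):
    """Return 1 if rs < rt as signed 32-bit values, else 0."""
    return 1 if to_signed(rs) < to_signed(rt) else 0


t0 = 5 & 0xFFFFFFFF       # li  $t0, 5
t1 = -7 & 0xFFFFFFFF      # li  $t1, -7   (pattern 0xfffffff9)
t2 = slt(t0, t1)          # slt $t2, $t0, $t1
t3 = slt(t1, t0)          # slt $t3, $t1, $t0
t4 = slt(t0, t0)          # slt $t4, $t0, $t0
print(t2, t3, t4)         # 0 1 0
```

Note that an unsigned comparison would give a different answer for the first two cases, since the pattern 0xfffffff9 is a very large unsigned number.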
3.3.5 Exercises
1. Show a diagram similar to Fig 3.5 for the following sequence of instruc-
tions:
Assume that register $a0 initially contains +37 and that register $a1 con-
tains -12. All other registers contain garbage.
$t0 = ?? ?? ?? ??
$t1 = ?? ?? ?? ??
$t2 = ?? ?? ?? ??
$t3 = ?? ?? ?? ??
li $t0,5
$t0 = 00 00 00 05
li $t1,-7
$t1 = ff ff ff f9
slt $t2,$t0,$t1
$t2 = 00 00 00 00
slt $t3,$t1,$t0
$t3 = 00 00 00 01
Figure 3.7: Trace of a program which uses slt to compare register values.
5. Show a program trace, similar to Fig 3.7, for the following sequence of
instructions:
li $t0, -13
slt $v0, $0, $t0
slt $v1, $t0, $v0
• The AND operation results in true only if both operands are true. We
use the notation ∧ for the AND operator (many logic textbooks use the ·
symbol).
• The OR operation results in false only if both operands are false. This
operation is sometimes called INCLUSIVE OR, to distinguish it from
EXCLUSIVE OR. We use the notation ∨ for the OR operator (many
logic textbooks use the + symbol).
• The NOT operation has only one operand (i.e. it is a unary operation).
Its result is the complement of its operand. We use the notation ∼ for
the NOT operator (many logic textbooks use x′ or x̄ to designate NOT
x). For example, ∼true is false and ∼false is true.
covered in chapter 4
The first two identities in the last row of the table are known as deMorgan’s
Laws. The identity y ⊕x⊕y = x is used extensively in private key cryptography.
Each of these identities can be proven with a simple truth table, in which
we show that the identity holds for every possible value of the variables. As an
example, Fig 3.11 shows a proof of deMorgan’s first law.
In computer architecture, or logic design, the binary value 0 corresponds
to false, and 1 corresponds to true. In what follows we will make use of this.
x y ∼ (x ∧ y) (∼ x) ∨ (∼ y)
false false true true
false true true true
true false true true
true true false false
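A truth-table proof of this kind is easy to automate. The Python sketch below, with the hypothetical helper holds_for_all, checks an identity for every possible combination of truth values, exactly as the table above does by hand.

```python
from itertools import product


def holds_for_all(identity, variables):
    """Check a Boolean identity for every combination of truth values."""
    return all(identity(*values)
               for values in product([False, True], repeat=variables))


# deMorgan's first law: ~(x AND y) equals (~x) OR (~y)
print(holds_for_all(lambda x, y: (not (x and y)) == ((not x) or (not y)), 2))   # True

# The identity y XOR x XOR y = x (on Booleans, != behaves as EXCLUSIVE OR)
print(holds_for_all(lambda x, y: ((y != x) != y) == x, 2))                      # True
```

A statement which is not an identity fails for at least one row of the table, and holds_for_all then returns False.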
$a0 = 00 00 00 05
$a1 = 00 00 00 0c
$v0 = ?? ?? ?? ??
and $v0,$a0,$a1
$v0 = 00 00 00 04
or $v0,$a0,$a1
$v0 = 00 00 00 0d
xor $v0,$a0,$a1
$v0 = 00 00 00 09
not $v0,$a0
$v0 = ff ff ff fa
Thus, for example, the and instruction will perform the logical AND operation
on corresponding bits of the operand registers, comprising a total of 32 AND
operations, with 32 results.
Some examples of logical instructions are shown in Fig 3.13, in which the
contents of registers are shown before and after an instruction is executed.
To understand Fig 3.13, we must view the values in binary. The and instruc-
tion will perform the logical AND operation on all 32 bits of those two registers,
putting the result into register $v0, as shown below (recalling that 0 represents
false, and 1 represents true):
The logical OR, XOR, and NOT operations from Fig 3.13 are shown below:
NOT 0000 0000 0000 0000 0000 0000 0000 0101 = 0x00000005
----------------------------------------------------
1111 1111 1111 1111 1111 1111 1111 1010 = 0xfffffffa
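The four results in Fig 3.13 can be reproduced with Python's bitwise operators, which also work on all bits of their operands at once. This is only a sketch: Python integers are not limited to 32 bits, so the result of the NOT operation is masked to 32 bits here.

```python
MASK32 = 0xFFFFFFFF   # Python integers are unbounded; keep 32 bits

a0, a1 = 0x00000005, 0x0000000c

print(format(a0 & a1, "08x"))        # and $v0,$a0,$a1 -> 00000004
print(format(a0 | a1, "08x"))        # or  $v0,$a0,$a1 -> 0000000d
print(format(a0 ^ a1, "08x"))        # xor $v0,$a0,$a1 -> 00000009
print(format(~a0 & MASK32, "08x"))   # not $v0,$a0     -> fffffffa
```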
Masks
We conclude this section with some useful examples of logical instructions. The
first such example is called a mask. A mask can be used to change or sense
individual bits of a register, while leaving other bits unchanged (or unsensed).
The first example of a mask will use the and instruction. Recall from Fig 3.10
that x ∧ 0 = 0 and that x ∧ 1 = x. Now suppose that we would like to force
the high order byte of a register to all zeros, leaving the low order 3 bytes
unchanged. We would use a mask of 0x00ffffff. This is shown in Fig 3.14, in
which it is assumed that the appropriate mask, 0x00ffffff, has been loaded into
register $a0 (we will see how this can be done in the next section), and the
high order byte of register $a1 is to be set to all zeros.
$a0 = 00 ff ff ff
$a1 = fe dc ba 98
and $a1,$a0,$a1
$a1 = 00 dc ba 98
Figure 3.14: Example of a mask (in register $a0), to clear the high order byte
of a register ($a1)
We can also use a mask with an or instruction to set certain bits to 1. In this
case we rely on two identities from Fig 3.10: x ∨ 0 = x and x ∨ 1 = 1.
Fig 3.15 shows an example where a negative number (-53) has been (somehow)
loaded into the low order two bytes of register $a1. In order for this to be a
valid 32-bit negative number, the high order two bytes must be set to all ones,
leaving the low order two bytes unchanged. This can be done with a mask of
0xffff0000 (we assume this value has been loaded into register $a0).
We can also use a mask with an xor instruction to complement certain bits.
In this case we rely on two identities from Fig 3.10: x ⊕ 0 = x and x ⊕ 1 =∼ x.
Fig 3.16 shows an example in which we are interested in complementing
alternate bits in a register. The value 0xff009876 has (somehow) been loaded
into register $a1. In binary this is
3.4. LOGICAL INSTRUCTIONS 37
$a0 = ff ff 00 00
$a1 = 00 00 ff cb
or $a1,$a0,$a1
$a1 = ff ff ff cb
Figure 3.15: Example of a mask (in register $a0), to set the high order 2 bytes
of a register ($a1)
$a0 = 55 55 55 55
$a1 = ff 00 98 76
xor $a1,$a0,$a1
$a1 = aa 55 cd 23
Figure 3.16: Example of a mask (in register $a0), to complement alternate bits
in a register ($a1)
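The three kinds of mask can be summarized with a short Python sketch. The function names are ours, chosen to describe what each mask does; the values are those of Figs 3.14, 3.15, and 3.16.

```python
def clear_bits(value, mask):
    """AND mask: bits where the mask is 0 are cleared, others are unchanged."""
    return value & mask


def set_bits(value, mask):
    """OR mask: bits where the mask is 1 are set, others are unchanged."""
    return value | mask


def complement_bits(value, mask):
    """XOR mask: bits where the mask is 1 are complemented, others are unchanged."""
    return value ^ mask


print(format(clear_bits(0xfedcba98, 0x00ffffff), "08x"))        # 00dcba98
print(format(set_bits(0x0000ffcb, 0xffff0000), "08x"))          # ffffffcb
print(format(complement_bits(0xff009876, 0x55555555), "08x"))   # aa55cd23
```

Note the difference in convention: with an AND mask the bits to be changed are marked with 0, whereas with OR and XOR masks they are marked with 1.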
3.4.3 Exercises
1. Find the value of each of the following expressions:
Bob wishes to send an 8-bit message to Alice so that she and only she
will be able to read it. He will encrypt the message by applying a bitwise
XOR operation with a secret 8-bit key.
(a) Show how Bob can encrypt the message m = 01101100 by applying
the XOR operation with the secret key, k = 11010001. Show the
ciphertext which he sends to Alice.
(b) Show how Alice can decrypt the ciphertext to obtain the original
message, m, using the same secret key, k.
6. Show a diagram similar to Fig 3.13 for the following sequence of instruc-
tions. Assume that register $a0 contains 0x0011abcd and that register
$a1 contains 0xffab0123.
7. Show how the contents of $a0 can be copied into $v0 using:
(a) An and instruction
(b) An or instruction
(c) An xor instruction
8. Show an instruction which will put the value -1 into register $v0, using
only one (R format) instruction covered in this section.
9. In each of the following show the value of the mask in hexadecimal, and
the instruction which will accomplish the given task. Assume the bits in a
register are numbered, with the low order bit as bit 0, and the high order
bit as bit 31.
(a) Clear bits 0,1, and 31 of register $t0 using a mask in register $a1.
(b) Set bits 6, 7, 9, 12 of register $a0 using a mask in register $a1.
(c) Complement bits 0,1,2,3,28,29,30,31 of register $t0 using a mask in
register $t1.
10. An iPod control system uses a 32-bit word in register $a0 to determine,
and change, the state of the iPod according to the following table:
bit number state
0 playing
1 paused
2 searching
3 stopped
4..31 [unused]
Only one of these bits should be set at any time.
For example, if bit 2 is set, the iPod is searching.
3.5. SHIFT INSTRUCTIONS 39
source ? 1 0 0 1 1 0 1
target 1 0 0 1 1 0 1 0 0
(a) Registers $t0, $t1, $t2, $t3 are to be used as masks to control the 4
states. Show the values of these registers.
(b) Using your response to part (a), show an instruction which will put a
0 into register $v0 if the iPod is not in the searching state, and some
non-zero value into register $v0 if it is in the searching state.
(c) Show instruction(s) which will put the iPod into the stopped state,
regardless of the state it is currently in.
(a) Show an instruction which will encrypt the value in register $a0,
putting the result into register $v0.
(b) Show an instruction which will decrypt the value in register $v0,
putting the result into register $v1.
Hint: A previous exercise in this section described how this can be done.
$a0 = 00 00 00 05
$v0 = ?? ?? ?? ??
sll $v0,$a0,1
$v0 = 00 00 00 0a
sll $v0,$a0,24
$v0 = 05 00 00 00
sll $v0,$a0,32
$v0 = 00 00 00 00
Figure 3.18: Example showing three left shift operations; shift 1 bit position,
24 positions, and 32 positions.
[Diagram: logical right shift by one bit position; a 0 is shifted in at the high order bit]
source  1 0 1 1 0 0 1 ?
target  0 1 0 1 1 0 0 1
$rd is the destination, or target, register. $rt is the source register, and shamt is
the shift amount, or number of bits to be shifted. Examples of shift instructions
are shown in Fig 3.18.
A logical right shift goes in the other direction, as shown in Fig 3.19. A zero
is shifted in at the high order bit, and the low order bit of the source register
(shown with a ?) is ignored. The format of a logical right shift instruction is
srl $rd, $rt, shamt
Examples of logical right shift instructions are shown in Fig 3.20, in which we
shift by 1, 8, and 32 bits, respectively.
$t3 = f3 00 00 05
$v0 = ?? ?? ?? ??
srl $v0,$t3,1
$v0 = 79 80 00 02
srl $v0,$t3,8
$v0 = 00 f3 00 00
srl $v0,$t3,32
$v0 = 00 00 00 00
Figure 3.20: Example showing three right shift instructions; shift 1 bit position,
8 positions, and 32 positions.
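[Diagrams: arithmetic right shift by one bit position; the sign bit is copied into the high order bit]
source  0 0 1 1 0 0 1 ?
target  0 0 0 1 1 0 0 1
source  1 0 1 1 0 0 1 ?
target  1 1 0 1 1 0 0 1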
1101 = 13
11010 = 2 * 13 = 26
110100 = 4 * 13 = 52
1101000 = 8 * 13 = 104
00011010 = 26
00001101 = 26 / 2 = 13
00000110 = 26 / 4 = 6
00000011 = 26 / 8 = 3
The result of the shift provides the quotient only (no remainder). When
shifting right we may wish to use an arithmetic shift to preserve the sign of the
number:7
11110100 = -12
11111010 = -12 / 2 = -6
11111101 = -12 / 4 = -3
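The arithmetic above can be carried out directly with shift instructions. The following is a minimal sketch, not taken from the text: the register choices are ours, and we assume the arithmetic right shift mnemonic sra and the li pseudo-op as accepted by MARS.

```mips
        .text
        li   $t0, 13           # $t0 = 13
        sll  $t1, $t0, 3       # $t1 = 13 * 8 = 104
        li   $t2, 26           # $t2 = 26
        srl  $t3, $t2, 2       # $t3 = 26 / 4 = 6 (the remainder is lost)
        li   $t4, -12          # $t4 = -12 = 0xfffffff4
        sra  $t5, $t4, 2       # $t5 = -12 / 4 = -3 (the sign is preserved)
        srl  $t6, $t4, 2       # $t6 = 0x3ffffffd (the sign is lost)
```

Comparing $t5 with $t6 shows why a logical right shift should not be used to divide a negative number.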
3.5.4 Exercises
1. Show a diagram similar to Fig 3.18 for the following sequence of
instructions. Assume that register $t2 initially contains 0x80a3001f and that
register $t3 initially contains 0xffffff0b.
7 Caution: When shifting a negative number right, we get a valid division by a power of 2
2. In each case, show an instruction which will accomplish the given task:
(a) Multiply the contents of register $a0 by 2, leaving the result in reg-
ister $t0.
(b) Multiply the contents of register $a3 by 128, leaving the result in
register $a3.
(c) Divide the (unsigned) contents of register $a3 by 1024, leaving the
result in register $v0.
3. (a) If register $t0 contains -1, what value is left in that register after the
following instruction has executed?
srl $t0, $t0, 31
Show your solution in hexadecimal.
(b) If register $t0 contains -1, what value is left in that register after the
following instruction has executed?
sll $t0, $t0, 31
Show your solution in hexadecimal.
4. Show a diagram similar to Fig 3.18 for the following sequence of
instructions. Assume that register $a0 initially contains 0x00000011 = 17 and
register $a1 initially contains 0xffffffef = -17.
5. Does an arithmetic right shift always yield a correct quotient for a division
by a power of 2?
(a)
[label:] addi $rt, $rs, constant [# comment]
(b)
Reg[$rt] ← Reg[$rs] + constant
(c)
addi $s0, $t3, 17 # $s0 = $t3 + 17
(d)
addi $t0, $0, -8 # $t0 = -8
Figure 3.23: Add Immediate Instruction: (a) Format (b) Meaning (c) Example,
which puts the sum of register $t3 and 17 into register $s0. (d) Example which
puts the value -8 in register $t0.
This works because the hardware extends the sign of the immediate field in the
addi instruction to a full 32 bits.
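As a small illustration of sign extension, the sketch below shows the 16-bit immediate field next to the 32-bit result for a negative and a positive constant. The constants and registers are chosen only for illustration.

```mips
        .text
        addi $t0, $0, -8       # immediate field is 0xfff8; $t0 = 0xfffffff8 = -8
        addi $t1, $0, 17       # immediate field is 0x0011; $t1 = 0x00000011 = 17
        addi $t2, $t1, -8      # $t2 = 17 + (-8) = 9
```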
Pseudo operations
Before introducing any more immediate format instructions, we digress briefly
to introduce pseudo operations. Strictly speaking, these are not part of the
MIPS instruction set architecture; however, they are permitted by the assem-
bler, which translates them into actual instructions. A simple example of a
pseudo-op is load immediate, for which the mnemonic is li. The purpose of
li is simply to load a constant value into a register. The word load generally
means to move data into a register, overwriting the data previously stored in
the register. The format and meaning of an li pseudo-op are shown in Fig 3.24.
(a)
[label:] li $rd, constant [# comment]
(b)
Reg[$rd] ← constant
(c)
li $v0, 1023 # $v0 = 1023
(d)
li $t0, -1 # $t0 = -1
(e)
addi $t0, $0, -1 # $t0 = -1
Figure 3.24: Load Immediate Instruction: (a) Format (b) Meaning (c) Example,
which puts 1023 into register $v0 (d) Example which puts -1 in register $t0 (e)
addi instruction which is equivalent to example (d)
The assembler will translate an li instruction into an equivalent addi
instruction, which makes use of the fact that register $0 always contains 0, as
shown in part (e) of Fig 3.24.
Another useful pseudo-op is the move instruction, which will copy the con-
tents of one register into another register. The format and meaning of the move
instruction are shown in Fig 3.25. Notice that the target, or destination, for
the move is the first operand, and the source, or origin, is the second operand.
Fig 3.25 also shows that the assembler will translate a move pseudo-op to an
actual MIPS instruction, such as an add instruction.
At this point we are able to write a somewhat meaningful program. Suppose
we wish to do the following calculation, leaving the result in register $v0:
(2456 + 723 - 412) * 64
li $v0, 2456 # $v0 = 2456
addi $v0, $v0, 723 # $v0 = 2456 + 723
addi $v0, $v0, -412 # $v0 = 2456 + 723 - 412
sll $v0, $v0, 6 # $v0 = (2456 + 723 - 412) * 64
Note that these instructions are executed sequentially, beginning with the li
instruction and ending with the sll instruction. Figure 3.26 shows a trace of
the execution of this short program. At this point it would be advisable to run
a small program such as this on a real computer. To do that you will need
the software package known as MARS, which is available free on the internet.
Instructions on downloading, installing, and using MARS are in the Appendix.
(a)
[label:] move $rd, $rs [# comment]
(b)
Reg[$rd] ← Reg[$rs]
(c)
move $v0, $a0 # $v0 = $a0
(d)
add $v0, $a0, $0 # $v0 = $a0 + 0
Figure 3.25: Move Instruction: (a) Format (b) Meaning (c) Example, which
copies the value from register $a0 into register $v0 (d) An add instruction which
is equivalent to example (c)
$v0 = ?? ?? ?? ??
li $v0,2456
$v0 = 00 00 09 98
addi $v0,$v0,723
$v0 = 00 00 0c 6b
addi $v0,$v0,-412
$v0 = 00 00 0a cf
sll $v0,$v0,6
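$v0 = 00 02 b3 c0
Figure 3.26: Trace of a program which calculates (2456 + 723 - 412) * 64,
leaving the result in register $v0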
$t0 = ?? ?? ?? ??
$v0 = ?? ?? ?? ??
$v1 = ?? ?? ?? ??
ori $t0,$0,23
$t0 = 00 00 00 17
andi $v0,$t0,42
$v0 = 00 00 00 02
xori $v1,$t0,42
$v1 = 00 00 00 3d
$a0 = 12 34 56 78
$t0 = ?? ?? ?? ??
addi $t0,$0,0xffff
$t0 = 00 00 ff ff
sll $t0,$t0,16
$t0 = ff ff 00 00
and $a0,$a0,$t0
$a0 = 12 34 00 00
xor $a0,$a0,$t0
$a0 = ed cb 00 00
Figure 3.29: Example to clear the low order 16 bits of register $a0, and com-
plement the high order 16 bits of register $a0.
Assuming that register $a0 initially contains 0x12345678, Fig 3.29 shows a trace
of this program.
In this example we have used the ori (or immediate) instruction to get the
desired result in register $t3. We will see extensive use of the lui instruction
in chapter 4.
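A common use of lui together with ori is to build a full 32-bit constant in two steps. The following is a minimal sketch; the constant 0x12345678 is our own choice, used only for illustration.

```mips
        .text
        lui  $t3, 0x1234       # $t3 = 0x12340000 (low order 16 bits are cleared)
        ori  $t3, $t3, 0x5678  # $t3 = 0x12345678 (low order 16 bits are filled in)
```

The ori instruction is used for the second step because its immediate field is not sign extended, so the high order 16 bits loaded by lui are left unchanged.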
8 Some assemblers, such as MARS, will permit a 32-bit operand as a pseudo-operation.
(a)
[label:] lui $rt, imm [# comment]
(b)
Reg[$rt] bits 16..31 ← imm
Reg[$rt] bits 0..15 ← 0
(c)
lui $s0, 0x3001 # $s0 = 0x30010000
Figure 3.30: Load Upper Immediate Statement: (a) Format. The imm field is 16
bits. (b) Meaning (c) Example, which loads the value 0x30010000 into register
$s0.
3.6.4 Exercises
1. Show a diagram similar to Fig 3.20 for the following sequence of instruc-
tions:
li $v0, 23
li $a0, 17
addi $v0, $a0, 9
move $v1, $v0
srl $v0, $a0, 3
li $t0, 75
li $t1, 0x23
addi $v0, $t0, 23
ori $v0, $t1, 0xf070
5. Show a program which will clear bits 0, 2, 3, and 31 of register $a0, leaving
the other bits unchanged. Bit 0 is the low order bit of the register. You
may use the t registers for temporary storage if necessary.
6. Show a program which will set bits 10, 11, 12, and 29 of register $a0,
leaving the other bits unchanged. Bit 0 is the low order bit of the register.
You may use the t registers for temporary storage if necessary.
7. Show a single MIPS statement which will complement bits 4, 5, and 7 of
register $a0, leaving the other bits unchanged. Bit 0 is the low order bit
of the register.
8. An iPod control system uses a 32-bit word in register $a0 to determine,
and change, the state of the iPod according to the following table:
bit number state
0 playing
1 paused
2 searching
3 stopped
4..31 [used for other purposes]
The iPod can be in one state only at any time. For example, if bit 2 is
set, the iPod is searching.
(a) Show the statement(s) which can be used to put the iPod into the
stopped state. Do not change bits 4 through 31 of register $a0.
(b) Show the statement(s) which will put 0 into register $v0 if the iPod
is currently in the searching state and some non-zero value into
register $v0 if the iPod is in some other state.
9. Show a sequence of instructions which will load the value 100,000 into
register $t0.
10. Show the contents of register $s1 and $s2, in hexadecimal, after the fol-
lowing instructions have executed:
lui $s1, 25
li $s2, 18
lui $s2, 0xfffb # -5
been made to access data in main memory (RAM), yet this is typically where
most data will reside during program execution.
There are two fundamental operations for memory reference:
• Transferring data from a full word in memory into a CPU register. This
is called a load operation.
• Transferring data from a CPU register into a full word of memory. This
is called a store operation.
The instructions which accomplish the load and store operations are, tech-
nically, Immediate format instructions. The reason for this will be more clear
in the section on explicit memory addresses.
0x10010000 00 00 00 07 ff ff ff ff 00 00 00 09 00 00 00 0a
0x10010010 00 00 00 ff 00 00 00 0b ?? ?? ?? ?? ?? ?? ?? ??
Figure 3.31: Six full words of data in contiguous memory locations, beginning
at memory address 0x10010000
10 In actual practice, MARS will initialize all such data memory to zeros, but we prefer not
to rely on this.
11 We use the word clobber to imply that the existing value is overwritten and no longer
available.
Figure 3.32: The load word (lw) instruction copies a full word from memory
into a register
Figure 3.33: The store word (sw) instruction copies a full word from a register
into memory
.text
lw $t0, x
lw $t1, y
add $v0, $t1, $t0 # $v0 = x + y
sw $v0, result # result = x + y
.data
x: .word 17
y: .word 3
result: .word 0 # store sum here
When this program executes, the value of x (17 = 0x11) is loaded from
memory into register $t0, then the value of y (3) is loaded from memory into
register $t1. The add instruction puts the sum of those values (20 = 0x14) into
register $v0, which is then stored into the memory location labeled result. A
trace of the execution of this program is shown in Fig 3.35, in which all data
values are shown in hexadecimal.
Some computer architectures permit arithmetic and/or logical operations
directly on memory locations. However, in the MIPS architecture all operands
must be loaded into registers.
We conclude this section with an example to increment the value in a mem-
ory word, modulo 256. This means that the value will be reset to 0 when
incrementing 255. We do this by using a mask to clear the high order 24 bits.
Figure 3.34: Format of the load word (lw) and store word (sw) instructions,
using symbolic memory addresses (i.e. labels)
0x10010000 00 00 00 11 00 00 00 03 00 00 00 00 ?? ?? ?? ??
$t0 = ?? ?? ?? ??
$t1 = ?? ?? ?? ??
$v0 = ?? ?? ?? ??
lw $t0,x
$t0 = 00 00 00 11
lw $t1,y
$t1 = 00 00 00 03
add $v0,$t1,$t0
$v0 = 00 00 00 14
sw $v0,result
0x10010000 00 00 00 11 00 00 00 03 00 00 00 14 ?? ?? ?? ??
Figure 3.35: Trace of a program which stores the sum of two values in memory
(x and y) into a third memory location (result)
.text
lw $t0, x # mod 256 counter
addi $t0, $t0, 1 # increment by 1
andi $t0, $t0, 0xff # clear high order 24 bits
sw $t0, x # store back to memory
.data
x: .word 33
In this program we load the value of x into register $t0, add 1, and then we use
a mask of 0x000000ff to clear the high order 24 bits, leaving the low order
8 bits unchanged. We then store the result, 34, back into the memory location
labeled x. Fig 3.36 shows a trace of the execution of this program.
If the value of x had been 255 instead of 33, the result would be 0, as shown
in Fig 3.37. A modulo 256 counter resets to 0 when incrementing 255.
We can also add or subtract a fixed number of bytes to a symbolic address
in assembly language. For example, if we have a label, x, on a data value, the
expression x+12 represents the address 12 bytes (or 3 words) larger; i.e. the
address of data 12 bytes away from x. In the example below, we store the
difference of the two words beginning at start into the location named diff.
.text
lw $t0, start
lw $t1, start+4 # next word after start
sub $t1, $t0, $t1 # difference
sw $t1, diff
.data
start: .word 23
.word 17
diff: .word 0 # should be 6 when done
0x10010000 00 00 00 21 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
$t0 = ?? ?? ?? ??
lw $t0,x
$t0 = 00 00 00 21
addi $t0,$t0,1
$t0 = 00 00 00 22
andi $t0,$t0,0xff
$t0 = 00 00 00 22
sw $t0, x
0x10010000 00 00 00 22 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
Figure 3.36: Trace of a program which increments a memory value, 33, modulo
256
0x10010000 00 00 00 ff ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
$t0 = ?? ?? ?? ??
lw $t0,x
$t0 = 00 00 00 ff
addi $t0,$t0,1
$t0 = 00 00 01 00
andi $t0,$t0,0xff
$t0 = 00 00 00 00
sw $t0, x
0x10010000 00 00 00 00 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
Figure 3.37: Trace of a program which increments a memory value, 255, modulo
256
Figure 3.38: Format of the load word (lw) and store word (sw) instructions,
using explicit memory addresses
.text
lw $t0, 0($a0) # load word whose address is in $a0
sub $t0, $0, $t0 # $t0 = 0 - $t0
sw $t0, 4($a0) # store into the next adjacent word
12 A memory address is called a pointer in C++ or a reference in Java.
0x10010020 00 00 00 23 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
$a0 = 10 01 00 20
$t0 = ?? ?? ?? ??
lw $t0,0($a0)
$t0 = 00 00 00 23
sub $t0,$0,$t0
$t0 = ff ff ff dd
sw $t0,4($a0)
0x10010020 00 00 00 23 ff ff ff dd ?? ?? ?? ?? ?? ?? ?? ??
Figure 3.39: Trace of a program which stores the negation of the memory word
whose address is in register $a0 in the next adjacent memory location
Figure 3.40: Format of the load address (la) instruction, with an example
Note that the memory word adjacent to the one whose address is in register $a0
is obtained with a displacement of 4, because there are 4 bytes in a word. If we
assume that memory location 0x10010020 contains the value 0x00000023 = 35,
and that register $a0 contains 0x10010020, then Fig 3.39 shows an execution
trace of this program.
Load Address
In the previous example we assumed that register $a0 contained the desired
memory address. We now show how a memory address can be placed in a
register. This is done with the load address (la) instruction. At this point we
consider only the symbolic form of this instruction (it also has an explicit form,
which is rarely used). The la instruction is actually a pseudo-operation, and in
chapter 4 we will see how it is translated to machine language. The format of
the la instruction is shown in Fig 3.40.
In this figure, compare the meaning of the load address instruction with the
meaning of the load word instruction (in Fig 3.34). Instead of putting the value
of the memory word into the destination register, it puts the address of the
memory word into the destination register.
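The contrast between the two instructions can be seen in a short sketch. The label x and the value 17 are chosen only for illustration, and the address shown in the comment assumes that x is the first word of the data segment, as in the figures of this section.

```mips
        .text
        lw   $t0, x            # $t0 = value stored at x = 17
        la   $t1, x            # $t1 = address of x (0x10010000 if x is the first data word)
        lw   $t2, 0($t1)       # $t2 = word at the address in $t1 = 17
        .data
x:      .word 17
```

After these instructions execute, $t0 and $t2 hold the same value, while $t1 holds the location of that value in memory.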
As an example we show below a program which will copy the value of the
memory word labeled source to the next three adjacent words of memory (which
have no labels).
0x10010000 00 00 00 4b 00 00 00 00 00 00 00 00 00 00 00 00
$t0 = ?? ?? ?? ??
$t1 = ?? ?? ?? ??
la $t0,source
$t0 = 10 01 00 00
lw $t1,0($t0)
$t1 = 00 00 00 4b
sw $t1,4($t0)
0x10010000 00 00 00 4b 00 00 00 4b 00 00 00 00 00 00 00 00
sw $t1,8($t0)
0x10010000 00 00 00 4b 00 00 00 4b 00 00 00 4b 00 00 00 00
sw $t1,12($t0)
0x10010000 00 00 00 4b 00 00 00 4b 00 00 00 4b 00 00 00 4b
Figure 3.41: Trace of a program which copies the memory word labeled source
to the next three adjacent memory words
.text
la $t0, source # address of source
lw $t1, 0($t0) # value of source
sw $t1, 4($t0) # store into source + 4
sw $t1, 8($t0) # store into source + 8
sw $t1, 12($t0) # store into source + 12
.data
source: .word 75
.word 0,0,0
An execution trace of this program is shown in Fig 3.41. Note that when
execution finishes, all four words of memory store the same value (0x4b = 75).13
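Footnote 13 observes that the same result can be obtained without the la instruction. A sketch of that alternative, using the symbolic form label+n described earlier in this section, might look as follows (the data section is unchanged):

```mips
        .text
        lw   $t1, source       # value of source
        sw   $t1, source+4     # store into source + 4
        sw   $t1, source+8     # store into source + 8
        sw   $t1, source+12    # store into source + 12
        .data
source: .word 75
        .word 0, 0, 0
```

This version needs no register to hold the address, but the address arithmetic is fixed when the program is assembled; the la version is the one that generalizes to addresses computed while the program runs.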
13 This program can be done without using the la instruction, by storing into label+4,
label+8, and label+12
3.7.3 Exercises
1. Show an execution trace of the following program:
.text
lw $t0, x
lw $t1, y
sub $v0, $t0, $t1 # v0 = x - y
sw $v0, x # store result in x
sw $t0, y
.data
x: .word 0xff
y: .word 127
.text
lw $t0, x
lw $t1, y
xor $v0, $t0, $t1 # v0 = x xor y
sw $v0, x # store result in x
sw $t0, y
.data
x: .word 127
y: .word 0xff
3. Show the contents of register $v0 after the program shown below has
executed:
.text
lw $v0, x
lw $v1, x+8
add $v0, $v1, $v0
.data
foo: .word 100, 50
.word 40
x: .word 12, 13
.word 4, 5
4. Given the data values shown below, write a MIPS program to store the
sum of the values labeled mary, jim, sue into the location labeled result.
.data
mary: .word 17
jim: .word -99
sue: .word 10
result: .word 0
5. Given the data values shown below, write a MIPS program to store the
sum of the 3 full-word values beginning at the location labeled junk into
the location labeled result (the result should be 23).
.data
junk: .word 17, 23
.word -17, 5
result: .word 0
6. Show an execution trace of the following program (when memory words are
changed, show the address, as well as the new value, both in hexadecimal):
.text
la $t0, first
la $t1, second
la $t2, first+8
lw $v0, 0($t0)
lw $v1, 0($t1)
lw $t3, 8($t0)
sw $t3, 8($t1)
.data
first: .word 7
second: .word 8
.word 9
.word 10
7. What value will be in register $v0 after the program shown below has
executed? What value will be stored in the full word of memory labeled
result?
.text
la $t0, x
addi $t0, $t0, 12
lw $t1, 0($t0)
lw $v0, 0($t0)
la $t2, result
sw $v0, 0($t2)
.data
x: .word 17
.word 1, 2, 3, 4
result: .word 0
8. Write a program which will add the 5 contiguous full words of memory
beginning with the word whose address is in register $a0, and leave the sum
in register $v0. (Assume register $a0 has been loaded with the appropriate
address.)
9. Write a program which will compute the number of data values in the
full words labeled array, leaving the result in register $v0. Use the data
section shown below. If the data values are changed, your program should
work without any changes to your code.
Hint: Find the difference between the addresses represented by the array
and end labels, then shift right to divide by 4.
.data
array: .word 12, 3, -9, 0, 0, 55, -44, 0, 99
end: .word 0
• We may wish execution to take one of two possible paths, depending on the
current state of the program (i.e. the values currently stored in registers).
This is called a selection structure and is usually implemented with an if
statement in high level programming languages.
• A conditional transfer is one in which the transfer may or may not take
place; it depends on the current state of the program. A conditional transfer
is implemented in assembly language with a branch instruction.
Figure 3.43: Format and meaning of the branch instructions, with examples
Figure 3.44: Format and meaning of the jump instruction, with example
in chapter 4.
Examples of programs with branch instructions will be given in the sections
on Selection Structures and Iteration Structures.
.text
move $v0, $a0 # assume $a0 is smaller
Program
===============
branch to label?
===============
===============
label: ========
===============
Figure 3.45: Diagram of a one-way selection structure
The logic is as follows: We move the value in register $a0 into register $v0,
whether it is the smaller or not. We then compare the registers $a0 and $a1;
if register $a0 is smaller, we do not wish to change register $v0, so we branch
around the second move instruction, to the sw instruction. In this program
we could have put the label done on the same line with the sw instruction;
normally we will put a label on a line by itself, for clarity, with the intention
that it labels the following instruction. An execution trace of this program is
shown in Fig 3.46. In that example, register $a0 initially contains -25, which is
the smaller value, and register $a1 initially contains 3.
0x10010000 00 00 00 00 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
$a0 = ff ff ff e7
$a1 = 00 00 00 03
$v0 = ?? ?? ?? ??
move $v0,$a0
$v0 = ff ff ff e7
sw $v0,result
0x10010000 ff ff ff e7 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
Figure 3.46: Trace of a program which loads the smaller of registers $a0 and $a1
into register $v0, and also stores the smaller into the memory location labeled
result
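The logic described above can be sketched as a complete program. This is a reconstruction under stated assumptions rather than the original listing: the two li instructions are added only to give $a0 and $a1 the values used in the trace, and the choice of ble (rather than blt) for the comparison is ours; either one leaves the smaller value in $v0.

```mips
        .text
        li   $a0, -25          # sample values, as in the trace (assumed setup)
        li   $a1, 3
        move $v0, $a0          # assume $a0 is smaller
        ble  $a0, $a1, done    # if $a0 <= $a1, branch around the second move
        move $v0, $a1          # otherwise $a1 is smaller
done:
        sw   $v0, result       # store the smaller value
        .data
result: .word 0
```

With these values the branch is taken, so the second move is skipped and -25 is stored in result.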
Program
===============
branch to else part?
if part
===============
end if part
jump to end
else part
===============
end else part
===============
Figure 3.47: Diagram of a two-way selection structure
taken and execution falls through into the if part; at the conclusion of the if
part there is an (unconditional) jump to avoid execution of the else part.
As an example, we wish to code the following Java statement in assembly
language:14
if ($a0 > 6)
{ $v0 = 0;
$a1++;
}
else
{ $v0 = $a0;
$a1 = 0;
}
The assembly language version of this statement uses register $t0 for tem-
porary storage:
li $t0, 6
ble $a0, $t0, else # branch if $a0 is NOT greater than 6
li $v0, 0 # $v0 = 0
addi $a1, $a1, 1 # $a1++
j done
else:
move $v0, $a0 # $v0 = $a0
li $a1, 0 # $a1 = 0
done:
We load the constant 6 into the temporary register $t0 because the condi-
tional branch compares the contents of two registers. Note that the condition
$a0 > 6 is implemented with the logical complement: branch if less or equal
(ble).
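The same idea applies to other comparisons: a test for equality is implemented by branching on bne, its logical complement. The sketch below codes the hypothetical statement if ($a0 == $a1) $v0 = 1; else $v0 = 0; the statement and the sample values loaded by the first two instructions are ours, chosen only for illustration.

```mips
        .text
        li   $a0, 5            # sample values (assumed)
        li   $a1, 5
        bne  $a0, $a1, else    # branch if $a0 is NOT equal to $a1
        li   $v0, 1            # if part: $v0 = 1
        j    done              # skip the else part
else:
        li   $v0, 0            # else part: $v0 = 0
done:
```

Since the two sample values are equal, the branch is not taken and $v0 is set to 1.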
We show two execution traces for this program. In both of these, register
$a1 initially contains the value 15. Fig 3.48 shows an execution trace in which
register $a0 initially contains 6 (the condition is false because $a0 is not strictly
greater than 6). Fig 3.49 shows an execution trace in which register $a0 is
initially 7 (the condition is true).
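$a0 = 00 00 00 06
$a1 = 00 00 00 0f
$v0 = ?? ?? ?? ??
li $t0,6
$t0 = 00 00 00 06
ble $a0,$t0,else
move $v0,$a0
$v0 = 00 00 00 06
li $a1,0
$a1 = 00 00 00 00
Figure 3.48: Trace of the two-way selection in which register $a0 initially
contains 6 (the condition is false)
$a0 = 00 00 00 07
$a1 = 00 00 00 0f
$v0 = ?? ?? ?? ??
li $t0,6
$t0 = 00 00 00 06
ble $a0,$t0,else
li $v0,0
$v0 = 00 00 00 00
addi $a1,$a1,1
$a1 = 00 00 00 10
j done
Figure 3.49: Trace of the two-way selection in which register $a0 initially
contains 7 (the condition is true)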
Program
===============
branch out of loop?
==== loop body ====
===============
=== end loop body ===
jump to test
===============
will show how to implement each of these loops in assembly language. In both
cases we refer to the sequence of statements to be repeated as the body of the
loop.
Note that the body of the loop is the same as for pretest loops, but no
(unconditional) jump is needed in the loop control for this program. When the
bne branch fails, control falls through to the next instruction, terminating the
loop.
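A minimal sketch of such a post-test loop follows. We assume the same task as the pretest example, summing the whole numbers from 100 down to 1, and the label lp; the original listing may differ in detail.

```mips
        .text
        li   $t0, 100          # counter = 100
        li   $v0, 0            # accumulator = 0
lp:
        add  $v0, $v0, $t0     # acc = acc + counter
        addi $t0, $t0, -1      # counter--
        bne  $t0, $0, lp       # repeat while counter is not 0
                               # falls through here with $v0 = 5050
```

Because the test comes after the body, the body always executes at least once; this loop would not give the intended result if the counter were initially 0.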
$\sum_{i=1}^{n} i = n \cdot (n + 1)/2$
$t0 = ?? ?? ?? ??
$v0 = ?? ?? ?? ??
li $t0,100
$t0 = 00 00 00 64
li $v0,0
$v0 = 00 00 00 00
beq $t0,$0,done
add $v0,$v0,$t0
$v0 = 00 00 00 64
addi $t0,$t0,-1
$t0 = 00 00 00 63
j lp
beq $t0,$0,done
add $v0,$v0,$t0
$v0 = 00 00 00 c7
addi $t0,$t0,-1
$t0 = 00 00 00 62
j lp
...
add $v0,$v0,$t0
$v0 = 00 00 13 ba
addi $t0,$t0,-1
$t0 = 00 00 00 00
j lp
beq $t0,$0,done
Figure 3.51: Partial trace of a program which uses a pretest loop to sum the
first 100 whole numbers, leaving the result in $v0
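The instructions appearing in this trace correspond to a pretest loop of the following form. This listing is reconstructed from the trace; the comments are ours.

```mips
        .text
        li   $t0, 100          # counter = 100
        li   $v0, 0            # accumulator = 0
lp:
        beq  $t0, $0, done     # exit loop when counter reaches 0
        add  $v0, $v0, $t0     # acc = acc + counter
        addi $t0, $t0, -1      # counter--
        j    lp                # repeat loop
done:
```

The final value in $v0 is 0x13ba = 5050, which agrees with the formula n(n + 1)/2 for n = 100.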
the correct number of iterations by just one. Be sure to check this by hand
simulating a simple example.
3.8.5 Exercises
1. Show an execution trace of the following program:
li $t0, 3
li $v0, 1
bgt $t0, $0, positive
li $v0, -1
positive:
li $t0, 3
blt $t0, $0, negative
li $v0, 1
j done
negative:
li $v0, -1
done:
if ($a0 == 17)
{ $v0 = 0;
$v1 = 3;
}
4. What value will be left in register $v0 after the program shown below has
executed?
li $t0, -7
li $v0, 6
blt $t0, $v0, skip
li $v0, 0
skip:
5. Write a program which will put the larger value (first or second) into
register $v0. Use the following data section. Your program should work
without change if the values in the data section are changed.
.data
first: .word 23
second: .word -23
6. Write a program which will compare the values in registers $a0, $a1, and
$a2 and put the smallest of the three values into register $v0.
.text
lw $t0, x
lw $t1, y
ble $t0, $t1, skip
li $v0, 0
li $v1, 1
j done
skip:
li $v0, 1
li $v1, 0
done:
.data
x: .word 15
y: .word -17
9. Write a program which will determine the sign of a full word in memory
labeled x. If x is negative, $v0 should be -1; if x is positive, $v0 should be
+1; if x is zero, $v0 should be 0.
10. What value will be left in register $v0 after the program shown below has
executed?
.text
li $v0, 3
lw $t0, count
lp:
ble $t0, $0, done
addi $t0, $t0, -1
add $v0, $v0, $v0
j lp
done:
.data
count: .word 4
11. Show an execution trace for the program in the preceding exercise.
12. What value will be left in register $v0 after the program shown below has
executed?
.text
li $v0, 3
lw $t0, count
lp:
addi $t0, $t0, -1
add $v0, $v0, $v0
ble $t0, $0, lp
.data
count: .word 4
13. Show an execution trace for the program in the preceding exercise.
14. Write a program which will find the sum of the whole numbers from 20
through 40, inclusive, leaving the result in register $v0. Use a loop.
15. Write a program which will multiply data values x and y, leaving the
product in register $v0. Assume x is not negative.
Hint: Use repeated addition in a loop. Use x as a counter, and add the
value of y into $v0, used as an accumulator.
numbers[7] = 14.5;
numbers[3] = numbers[7] + 1.0; // 15.5
.text
li $t0, 5 # counter = 5
la $t1, grades # pointer to array
li $v0, 0 # accumulator = 0
lp:
beq $t0, $0, done # exit loop?
lw $t2, 0($t1) # grades[i]
add $v0, $v0, $t2 # acc = acc + grades[i]
addi $t1, $t1, 4 # pointer = pointer + 4
addi $t0, $t0, -1 # counter--
j lp # repeat loop
done:
.data
grades: .word 25, 63, -45, 0, 12
0x10010000 00 00 00 19 00 00 00 3f ff ff ff d3 00 00 00 00
0x10010010 00 00 00 0c ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
$t0 = ?? ?? ?? ??
$t1 = ?? ?? ?? ??
$t2 = ?? ?? ?? ??
$v0 = ?? ?? ?? ??
li $t0,5
$t0 = 00 00 00 05
la $t1,grades
$t1 = 10 01 00 00
li $v0,0
$v0 = 00 00 00 00
beq $t0,$0,done
lw $t2,0($t1)
$t2 = 00 00 00 19
add $v0,$v0,$t2
$v0 = 00 00 00 19
addi $t1,$t1,4
$t1 = 10 01 00 04
addi $t0,$t0,-1
$t0 = 00 00 00 04
j lp
Figure 3.52: Partial trace of a program which sums the values in an array of
five whole numbers, beginning at grades.
17 A pointer in C/C++ is essentially the same as a reference in Java - a memory address.
The difference is that you can do arithmetic with a pointer but not with a reference.
3.9.1 Exercises
1. The following program should find the sum of the positive values in an
array of whole numbers (the length is 5). Show an execution trace of this
program.
.text
la $t0, numbers # pointer to array
li $t1, 5 # counter
li $v0, 0 # accumulator
loop:
ble $t1, $0, done
lw $t2, 0($t0)
ble $t2, $0, notPos
add $v0, $v0, $t2
notPos:
addi $t0, $t0, 4 # pointer to next word
addi $t1, $t1, -1 # decrement counter
j loop
done:
.data
numbers: .word -12, 32, 0, -3, 4
2. Modify the example in this section so that it will find the sum of the values
in an array of any length. Assume the array is followed by a label: end.
.data
array: .word 25, 63, -45, 0, 12, -25, 66, 99
end: .word 0 # Marks the end of the array.
3. Write a program which will scan the values in an array named array and
leave the smallest value in register $v0. Assume the length of the array is
not 0. Assume the end of the array is marked by the label end as in the
preceding exercise.
Hint: Use the first value in the array as a temporary result, then replace
it if you find a value which is smaller.
.data
array: .word 25, 63, -45, 0, 12, -25, 66, 99
end: .word 0 # Marks the end of the array.
.data
array: .word -25, 63, 450, 450, 512
end: .word 0 # Marks the end of the array.
5. Write a program which will find the vector sum of two arrays having the
same length, leaving the sum in an array named result. It should add
corresponding elements of the two arrays, producing an array of the same
length as the result.
.data
array1: .word 16, 17, 0, -2, 4, 5
end1: .word 0
array2: .word 0, 3, 5, 2, -7, 16
end2: .word 0
result: .word 0
3.10 Functions
In mathematics a function is a mapping from a set of values, called the domain,
to a set of values, called the range. Functions may have 0 or more arguments
(also known as parameters). The arguments may be thought of as the input(s)
to the function, which produces a single range value as its result. Examples of
mathematical functions are:
• $g(x, y, z) = x^2 + y \cdot z - 4 + f(0.5)$
• $\exp(x) = 1 + x + x^2/2 + x^3/6 + x^4/24$
1. It will load the address of the next instruction into the $ra register. This
is the return address, or the address to which the function should return
control when it terminates.
The format and meaning of the jal instruction are shown in Fig 3.53.
18 Some languages use the term procedure for a function which has no explicit return value,
Figure 3.53: Format and meaning of the jump and link instruction, with example
Figure 3.54: Format and meaning of the jump register instruction, with example
• The part shown as the main program is included simply to test the func-
tion. It contains the call to the function (jal order2). It is given the
name main, though this label is not used, except for documentation.
• When the main program calls the order2 function, the jal instruction
puts the address of the next instruction (addi $a0, $a0, 0) into reg-
ister $ra.
19 The main program is used simply to test the function for correctness. It is not part of
the function. Software which exists only for testing purposes is often called a driver.
• Since the order2 function is expecting the address of the memory words
to be swapped in register $a0, it is the responsibility of the main program
to ensure that this is the case. This is done with an la instruction. This
is known as a precondition; the function will work correctly only if the
precondition is satisfied.
• This function has no explicit result. However, it does have side effects: it
may change data in memory.
If you run the above program using MARS, you should single-step one instruc-
tion at a time and stop at the addi instruction in the main program. If you run
it at full speed, control will fall through into the function after the last line of
the main program, which is not desired. We will find a way to improve on this
below.
We emphasize the importance of separating the function itself from the code
that is used to test it - the main program. This testing code is often called a
driver; it serves no other purpose. Once we are sure the function is working
correctly it can be extracted and used in other programs, where needed.
Figure 3.56: A function which will arrange two contiguous words of memory in
ascending order
uses of syscall, which we shall examine later in this chapter. A syscall uses
the contents of register $v0 to determine the desired action. To terminate the
program, $v0 should contain 10. An example is shown below, in which we add
two values from memory and store the result back to memory:
.text
lw $t0, first
lw $t1, second
add $t0, $t0, $t1
sw $t0, result
li $v0, 10 # return code for syscall
syscall # terminate the program
.data
first: .word 5
second: .word 9
result: .word 0
Note that this program will terminate with no error messages when finished;
this is a much better way to terminate a program.
We could also make this change in the driver for our order2 function in
Fig. 3.56. However there is a potential problem with our order2 function,
which we examine next.
.text
main:
la $a2, words
jal order3 # invoke order3
li $v0, 10
syscall # terminate
.data
words: .word 9, 0, -12
.text
82 CHAPTER 3. ASSEMBLY LANGUAGE FOR MIPS
Note that we could have avoided moving the return address back to $ra by
changing the last instruction to jr $t2, but for reasons made clear later, this
is not a good practice. Also, we were careful not to use registers $t0 or $t1 for
this purpose, because they would be clobbered by the lw instructions in order2.
What would happen if the order2 function was later modified to use $t2 for
some other purpose? What would happen if the order2 function was modified
to call some other function? When working as a member of a team, how can
your functions call the functions of other team members without conflicting
usages of registers? These questions will be resolved in the section on register
conventions.
Stacks
Finally we address the question: Where in memory should the $ra register and
other registers be saved? In order for function calls to work in a general way
(no matter how deeply function calls may be nested) and to handle recursive
function calls, we will need a software stack. A stack is a last-in first-out (LIFO)
structure. When extracting an item from a stack, it must be the most recent
item that was added to the stack. The process of adding an item to a stack
is usually called a push operation, the process of determining the last item
added is called a peek operation, and the process of removing an item is called
a pop operation. Fig 3.58 depicts these operations on a stack, which is shown
vertically, with the last item added on the top.
23 The ’s’ stands for saved
24 The ’t’ stands for temporary
25 This is known as call by reference.
c                         d
b           b             b
a    pop    a   push(d)   a
Figure 3.58: Diagram of a stack containing the values a,b,c, showing the effects
of a pop operation followed by a push(d) operation
MIPS stack
We will implement a software stack using the $sp register. This register contains
the memory address of the top value on the stack. When we wish to push a
value onto the stack we will first decrement the $sp register, then store the value
to be pushed at the address specified by the $sp register. Thus if we wish to
push the value of the $ra register, we would use the following two instructions:
addi $sp, $sp, -4 # decrement the stack pointer
sw $ra, 0($sp) # push
To peek at the top value on the stack (i.e. put its value into a register), simply
use the $sp register to access that memory location:
lw $ra, 0($sp) # peek
To pop a value off the stack, simply increment the $sp register:
addi $sp, $sp, 4 # pop
Note that we decrement the stack pointer when doing a push operation, and
we increment the stack pointer when doing a pop operation. This means that the
stack grows toward a lower memory address, which may seem counter-intuitive.
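The three stack operations can be combined in a short test program. This is a sketch which assumes that MARS has initialized $sp to a valid stack address when the program starts; the values 17 and 42 are arbitrary.

```mips
        .text
main:
        li    $t0, 17
        li    $t1, 42

        addi  $sp, $sp, -4     # push $t0: decrement $sp, then store
        sw    $t0, 0($sp)
        addi  $sp, $sp, -4     # push $t1
        sw    $t1, 0($sp)

        lw    $t2, 0($sp)      # peek: $t2 = 42, stack is unchanged
        addi  $sp, $sp, 4      # pop: 42 is no longer on the stack

        lw    $t3, 0($sp)      # peek: $t3 = 17
        addi  $sp, $sp, 4      # pop: $sp is back to its original value

        li    $v0, 10          # terminate the program
        syscall
```

Note that the last value pushed (42) is the first value seen by a peek, as expected for a LIFO structure.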
...
done: # return to calling function
lw $s7, 12($sp) # pop 3 s regs
lw $s5, 8($sp)
lw $s2, 4($sp)
lw $ra, 0($sp) # pop return address
addi $sp, $sp, 16 # original stack pointer
jr $ra # return to calling function
################# End function
Figure 3.59: A function which modifies registers $s2, $s5, and $s7 needs to push
them on the stack on entry to the function, and pop them from the stack when
exiting the function
terminates. The $s registers are pushed onto the stack the same way the $ra
register is pushed - by using the stack pointer register, $sp.
For example, if a function modifies the $s2, $s5, and $s7 registers then it
would push them, along with the $ra register onto the stack when the function
is entered as shown in Fig 3.59. It would also pop them from the stack when
the function is to return to the calling function.
In Fig 3.59 note that we need to decrement the $sp register by 16 instead of
by 4 because we are pushing 4 registers onto the stack instead of 1, and there
are 4 bytes in a word. As usual, the stack grows toward low-address memory.
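Fig 3.59 shows the end of such a function. As a sketch of how the entry and exit sequences fit together, the following program uses a hypothetical function named work; the body of the function and the values in the main program are invented for illustration.

```mips
        .text
main:
        li    $s2, 2           # values which the caller expects to survive the call
        li    $s5, 5
        li    $s7, 7
        jal   work
        # at this point $s2, $s5, and $s7 still contain 2, 5, and 7
        li    $v0, 10
        syscall

################# Begin function work (hypothetical)
work:
        addi  $sp, $sp, -16    # make room for 4 words
        sw    $ra, 0($sp)      # push return address
        sw    $s2, 4($sp)      # push 3 s regs
        sw    $s5, 8($sp)
        sw    $s7, 12($sp)

        li    $s2, 100         # the function is now free to change these
        li    $s5, 200
        add   $s7, $s2, $s5

        lw    $s7, 12($sp)     # pop 3 s regs
        lw    $s5, 8($sp)
        lw    $s2, 4($sp)
        lw    $ra, 0($sp)      # pop return address
        addi  $sp, $sp, 16     # original stack pointer
        jr    $ra              # return to calling function
################# End function work
```

Each register is loaded from the same offset at which it was stored, so the order of the four lw instructions does not matter.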
For a more complete example, we return to our order2 function, which
arranges two contiguous words of memory in ascending order. However, this
time we will use registers $s0 and $s1 instead of $t0 and $t1 for temporary
storage. The example is shown in Fig 3.60.
It may seem that this version of order2 is unnecessarily complicated by the
fact that we are using $s registers. However, in more complex software systems,
it will be essential that we use the $s registers, especially if our functions are
(potentially) recursive.
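One way such a function might look is sketched below. This is not a copy of Fig 3.60; it assumes, as before, that $a0 contains the address of the first of the two words, and the data label pair in the driver is hypothetical.

```mips
        .text
main:
        la    $a0, pair        # precondition: $a0 = address of the first word
        jal   order2
        li    $v0, 10
        syscall

################# Begin function order2 (sketch)
### Pre:  $a0 contains the address of two contiguous words
### Post: the two words are in ascending order
order2:
        addi  $sp, $sp, -12
        sw    $ra, 0($sp)      # push return address
        sw    $s0, 4($sp)      # push s registers
        sw    $s1, 8($sp)

        lw    $s0, 0($a0)      # first word
        lw    $s1, 4($a0)      # second word
        ble   $s0, $s1, done   # already in ascending order?
        sw    $s1, 0($a0)      # no, swap them
        sw    $s0, 4($a0)
done:
        lw    $s1, 8($sp)      # pop s registers
        lw    $s0, 4($sp)
        lw    $ra, 0($sp)      # pop return address
        addi  $sp, $sp, 12
        jr    $ra
################# End function order2

        .data
pair:   .word 9, -12
```

After the call, the two words at pair are -12 and 9, and the caller's $s0 and $s1 are unchanged.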
We now rewrite our order3 function so that it also agrees with the MIPS
register conventions:
Figure 3.60: A function which will arrange two contiguous words of memory in
ascending order. The $s registers and return address are saved on the runtime
stack.
Name conflicts
Up to this point we have been using labels in our program text sections, without
regard to the possibility of duplicate labels. The assembler will not permit
duplicate labels; all labels in a source file must be unique.
Consider writing a program in which we wish to find the range of an array;
i.e., we want to find the difference between the largest and smallest values in
an array. We will do this with two separate functions, one to find the smallest
and one to find the largest, then subtract to get the range. The two functions
to find the smallest and largest values are shown below:
lp:
ble $a1, $s0, done
lw $s1, 0($s0)
bge $s1, $v0, ok
move $v0, $s1 # new candidate for smallest
ok:
addi $s0, $s0, 4
j lp
done:
lw $s1, 8($sp) # pop s registers
lw $s0, 4($sp)
lw $ra, 0($sp) # pop return address
addi $sp, $sp, 12
jr $ra
##################### End smallest function
Similar changes made to the largest function, though not strictly necessary,
would be consistent with our new convention on labels. Our goal is to be able to
copy a tested and trusted function, and paste it into any source file which may
have a need for it. We should not have to make any changes to the function (see footnote 27).
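As an illustration of this naming convention, here is a sketch of a largest function in which every label carries the suffix _largest. The API (start address in $a0, address of the word after the last word in $a1, result in $v0) is an assumption modeled on the exercises in this section, and the main program is a driver with made-up data.

```mips
        .text
main:
        la    $a0, array
        la    $a1, end
        jal   largest
        sw    $v0, result              # 16 for the data below
        li    $v0, 10
        syscall

##################### Begin largest function (sketch)
### Pre:  $a0 = address of the first word of the array
###       $a1 = address of the word after the last word
###       The array contains at least one word
### Post: $v0 = largest value in the array
largest:
        addi  $sp, $sp, -12
        sw    $ra, 0($sp)              # push return address
        sw    $s0, 4($sp)              # push s registers
        sw    $s1, 8($sp)
        move  $s0, $a0                 # $s0 = address of current word
        lw    $v0, 0($s0)              # first word is the first candidate
lp_largest:
        ble   $a1, $s0, done_largest   # past the end of the array?
        lw    $s1, 0($s0)
        ble   $s1, $v0, ok_largest
        move  $v0, $s1                 # new candidate for largest
ok_largest:
        addi  $s0, $s0, 4
        j     lp_largest
done_largest:
        lw    $s1, 8($sp)              # pop s registers
        lw    $s0, 4($sp)
        lw    $ra, 0($sp)              # pop return address
        addi  $sp, $sp, 12
        jr    $ra
##################### End largest function

        .data
array:  .word 0, 3, 5, 2, -7, 16
end:    .word 0
result: .word 0
```

Because no label in the function is a plain lp, ok, or done, the function can be pasted into a source file that already uses those names.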
lp_distr:
27 We will see later in Chapter 5 that MARS provides an .include directive which eliminates
the need for copying and pasting, thus eliminating the problem of duplicated code in multiple
source files.
la $s0, buffer_distr
add $s0, $s0, $a2 # add displacement
lw $s1, 0($s0)
addi $s1, $s1, 1
sw $s1, 0($s0)
lw $s1, 8($sp)
lw $s0, 4($sp)
lw $ra, 0($sp)
addi $sp, $sp, 12
jr $ra
####################### End incr function
Note that the local data is called buffer_distr because it is in the function
distr (to distinguish it from local buffers in other functions). Also, it might
have been a good idea to name our called function incr_distr instead of incr,
to indicate that it is used as a ‘local’ function by the distr function, and to
distinguish it from other functions which may have a similar purpose.
• There must be a base case which does not involve a recursive call.
• When the function calls itself, the input to the function (usually contained
in the parameters) must be reduced in some way.
If these two properties are satisfied, we will avoid ‘infinite recursion’ and it
should work as intended. In many cases a recursive function call can be replaced
by a loop, but not always. There are some tasks which cannot be completed
with loops alone; they require recursive calls or, equivalently, a loop which manages its own stack.
Note that it is possible for a function to be indirectly recursive. If function A
calls function B, and function B calls function A, both function A and function
B are (indirectly) recursive. Thus it is not always evident at first glance whether
a function is recursive, and that is one reason that we should save registers on
the stack.
As an example we choose the Fibonacci sequence, a well-known sequence of
numbers which is found in various natural phenomena. The sequence is:
1 1 2 3 5 8 13 ...
Note that after the first two numbers, each number in the sequence is the
sum of the previous two numbers, so the next number after 13 would be 8 + 13
= 21.
We can state this more formally with a recursive definition of the Fibonacci
sequence, where fib(n) is the nth number in the sequence:
fib(n) = 1 if n < 3
fib(n) = fib(n − 1) + fib(n − 2) if n > 2
We will now use this definition to implement the fib function with a recursive
call. Note that the first part of the definition is the base case (does not involve
a call to fib). The second part involves two calls to fib, but the input is reduced
for each of those recursive calls. The function is shown in Fig 3.61.
Note that we are using $s registers here to store the values of n-1 and n-2.
These are values which need to be pushed onto the stack, just as local variables
in a recursive Java method are pushed onto the runtime stack.
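A recursive function of this kind might be written as follows. This is a sketch rather than a reproduction of Fig 3.61: it assumes that n is passed in $a0 and the result is returned in $v0, and the label result in the driver is hypothetical.

```mips
        .text
main:
        li    $a0, 7
        jal   fib
        sw    $v0, result          # fib(7) = 13
        li    $v0, 10
        syscall

################# Begin function fib (sketch)
### Pre:  $a0 = n, n >= 1
### Post: $v0 = nth number in the Fibonacci sequence
fib:
        addi  $sp, $sp, -12
        sw    $ra, 0($sp)          # push return address
        sw    $s0, 4($sp)          # push s registers
        sw    $s1, 8($sp)

        li    $v0, 1               # result for the base case
        li    $s0, 3
        blt   $a0, $s0, done_fib   # base case: n < 3

        addi  $s0, $a0, -1         # $s0 = n-1
        addi  $s1, $a0, -2         # $s1 = n-2
        move  $a0, $s0
        jal   fib                  # $v0 = fib(n-1)
        move  $s0, $v0             # n-1 is no longer needed; keep fib(n-1)
        move  $a0, $s1
        jal   fib                  # $v0 = fib(n-2)
        add   $v0, $v0, $s0        # fib(n-1) + fib(n-2)
done_fib:
        lw    $s1, 8($sp)          # pop s registers
        lw    $s0, 4($sp)
        lw    $ra, 0($sp)          # pop return address
        addi  $sp, $sp, 12
        jr    $ra
################# End function fib

        .data
result: .word 0
```

The value in $s0 survives the second recursive call only because every activation of fib saves and restores the $s registers which it uses.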
Fig 3.61 shows a rather inefficient way to find the nth Fibonacci number.
For example, to find fib(5) we add fib(4) + fib(3). To find fib(4) we add fib(3) +
fib(2); thus in finding fib(5) there are at least two separate calls to fib(3). This
function can be programmed much more efficiently with a loop. We chose to
use recursion merely to demonstrate the correct usage of the stack and recursive
functions.
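For comparison, a loop version might look like the sketch below. The function name fibLoop and the choice of registers are assumptions; since this function calls no other function, $t registers are sufficient for its temporary values.

```mips
        .text
main:
        li    $a0, 7
        jal   fibLoop
        sw    $v0, result              # fib(7) = 13
        li    $v0, 10
        syscall

################# Begin function fibLoop (sketch)
### Pre:  $a0 = n, n >= 1
### Post: $v0 = nth number in the Fibonacci sequence
fibLoop:
        addi  $sp, $sp, -4
        sw    $ra, 0($sp)              # push return address
        li    $t0, 1                   # $t0 = fib(i-1)
        li    $v0, 1                   # $v0 = fib(i)
        li    $t1, 2                   # $t1 = i
lp_fibLoop:
        bge   $t1, $a0, done_fibLoop   # i >= n: $v0 is the result
        add   $t2, $v0, $t0            # fib(i+1) = fib(i) + fib(i-1)
        move  $t0, $v0
        move  $v0, $t2
        addi  $t1, $t1, 1              # i = i + 1
        j     lp_fibLoop
done_fibLoop:
        lw    $ra, 0($sp)              # pop return address
        addi  $sp, $sp, 4
        jr    $ra
################# End function fibLoop

        .data
result: .word 0
```

This version performs one addition for each number in the sequence, whereas the recursive version recomputes the same values many times.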
3.10.5 Exercises
1. The function shown below is supposed to return the sum of three contigu-
ous words of memory. Modify the function (including its API) so that it
correctly observes the MIPS register conventions. It should use registers
$s0 and $s1 for temporary storage.
2. Write and test a MIPS function named reverse3 to reverse the order of 3
contiguous words in memory. It can expect that the address of the first
word is in register $a0. Be sure to include an appropriate API. To test
your solution you will need a main program which calls reverse3.
Figure 3.61: Function to find the nth number in the Fibonacci sequence
3. The function named ’sum’, shown below, is supposed to return the sum of
its two arguments, registers $a0 and $a1 in register $v0. What is wrong
with this function?
.text
sum:
addi $sp, $sp, -4
sw $ra, 0($sp) # push return address
add $v0, $a0, $a1
lw $ra, 0($sp) # pop return address
addi $sp, $sp, 4
jr $ra # return
5. Write and test a MIPS function, named reverseArray, to place the words
of an array in reverse order. For example, if the array is 43, -12, 5, 6
then when the function terminates, the array should be 6, 5, -12, 43. The
arguments to the function should be the start address of the array, in
register $a0, and the ending address (the address of the word after the
last word in the array) in register $a1. Use at least one $s register for
temporary storage. Don’t forget to include the API and observe MIPS
register conventions.
6. (a) Write and test a MIPS function named addrSmallest which will re-
turn the address of the smallest word in a memory array. The ar-
guments to the function should be the start address of the array, in
register $a0, and the address of the word following the last word in
the array in register $a1. The address of the smallest should be re-
turned in register $v0. You may assume the length of the array is at
least 1.
(b) Write and test a MIPS function named ’sort’ to sort the words of
a memory array in ascending order. Use the following algorithm
(known as Selection Sort):
• For each position in the array, find the address of the smallest
value beginning at that position. (Call your addrSmallest func-
tion from part (a))
• Swap the word at that position with the smallest.
7. (a) Write and test a function named ’pal’ which will determine whether
a memory array is a palindrome: It reads the same backwards as it
does forward. Each of the following arrays is a palindrome:
4, 9, -3, -3, 9, 4
2, 0, 2
17
Return a one in register $v0 only if the array is a palindrome, and
zero if it is not a palindrome. Your function should use a loop. The
arguments to the function should be the start address of the array,
in register $a0, and the ending address in register $a1.
(b) Repeat part (a) but your function should use a recursive call rather
than a loop. Name the function palRecursive.
Hints:
• Base case: The length of the array is less than two. It must be
a palindrome, so put 1 into register $v0 and return.
• Base case: Compare the first and last words of the array. If they
are not equal, the array could not be a palindrome, so put zero
into register $v0 and return.
• Recursive case: Determine whether the rest of the array is a
palindrome by calling palRecursive with different arguments.
• The .ascii directive allows one to initialize the data memory with the
characters of a string:
.data
name: .ascii "harry"
• The .asciiz directive is the same as the .ascii directive but the string
is terminated with a null byte. This null byte is a character whose code is
0. (Not the character ’0’, but the binary value 0, which is 00000000_2):
.data
name: .asciiz "harry"
These two definitions may look alike, but the first occupies 5 bytes of mem-
ory, and the second occupies 6 bytes, because of the null byte at the end. One of
the issues to be addressed when working with strings is the need to determine
when we have reached the end of a string. Some high level languages, such
as Java, store a numeric length, along with the characters of a string. Other
languages, such as C, store a null byte at the end of the string (and calculate the
length if needed). The MIPS directive .asciiz is in agreement with C strings -
terminated by a null byte. With the .asciiz directive, the null byte serves as
a sentinel, or terminating character.
In what follows, we will generally use the .asciiz form and rely on the null
byte to terminate the string. When processing the characters of a string, the
logic will be very similar to the logic we used when processing the elements of
an array, with two main distinctions:
• Since each element of the string consists of one byte rather than one word,
we will increment the address register by 1 instead of 4, to advance to the
next character of a string.
• We will normally wish to load a single character into a register; thus the
lw and sw instructions are not what we want. Instead we will use byte
instructions, introduced in the next section.
loaded through the high order 24 bits of the register; this will not normally be used when
working with strings.
Figure 3.62: Format of the load byte unsigned (lbu) and store byte (sb) instructions,
using symbolic memory addresses (i.e. labels), and explicit addressing
Both the lbu and the sb instructions may be given in either symbolic or
explicit format, as shown in Fig 3.62. In that figure, note that we use subscripts
to designate specific bit locations within a register - 0 is the position of the low
order bit and 31 is the position of the high order bit (see footnote 30).
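Both formats can be seen in a small test program. This is a sketch only; it reuses the string "harry" from the earlier .asciiz example and relies on the fact that the ASCII codes of 'h' and 'H' differ by 32.

```mips
        .text
main:
        la    $a0, name
        lbu   $t0, 0($a0)      # explicit address: $t0 = 0x68, the code for 'h'
                               # bits 8 through 31 of $t0 are zero
        addi  $t0, $t0, -32    # 0x48, the code for 'H'
        sb    $t0, 0($a0)      # store bits 0 through 7 only; the string is now "Harry"
        lbu   $t1, name        # symbolic address: $t1 = 0x48
        li    $v0, 10
        syscall

        .data
name:   .asciiz "harry"
```

The sb instruction changes one byte of memory; the neighboring characters in the same word are not affected.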
• Return some negative number if str1 < str2. Intuitively, str1 is smaller
than str2 if str1 precedes str2 alphabetically.
li $v0,0 # counter
lp_strlen:
lbu $s0, 0($a0) # next char of string
beq $s0, $0, done_strlen # end of string?
addi $v0, $v0, 1 # increment counter
addi $a0, $a0, 1 # address of next char
j lp_strlen
done_strlen:
lw $ra, 0($sp)
lw $s0, 4($sp)
addi $sp, $sp, 8
jr $ra # return
######################## End function strlen
Figure 3.63: Function to find the length of a string. Address of the string is in
register $a0.
(a)
10010000_16   64  6f  6f  47    72  6f  4d  20    67  6e  69  6e    ??  ??  ??  00
(b)
10010000_16    d   o   o   G     r   o   M  ' '    g   n   i   n    ??  ??  ??  \0
Figure 3.64: Diagram of the characters of the string “Good Morning” in data
memory, showing the hex codes (a), and the characters(b)
• Return some positive number if str1 > str2. Intuitively, str1 is greater
than str2 if str1 follows str2 alphabetically.
When comparing strings, we are not comparing the lengths of the strings. Some
examples of string comparisons are:
Note that if a string is a prefix of the string with which it is being compared,
the prefix is smaller.
The logic we use to compare two strings is to compare corresponding charac-
ters of the two strings, beginning at the left end. If the characters are different,
we know which string is smaller. If the characters are equal, we advance to
the next position and repeat until we reach the end of one (or both) of the
strings. When comparing two characters, we will load each into a register and
compare the two registers; hence, we are actually comparing the ASCII codes
of the characters. The function strcmp is shown in Fig 3.65.
In the strcmp function note that if we determine that the two characters are
equal, and one of them is the null byte (zero), then both characters are the null
byte, and the two strings must be equal. At this point register $v0 must be 0,
which is the desired result when the two strings are equal.
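The complete logic can be sketched as follows. The first part of the function agrees with the portion of Fig 3.65 reproduced in this section; everything after the bne instruction, as well as the driver and its data, is an assumption about how the function might be completed.

```mips
        .text
main:
        la    $a0, str1
        la    $a1, str2
        jal   strcmp
        sw    $v0, result              # negative: "Good" is a prefix of "Goodbye"
        li    $v0, 10
        syscall

################# Begin function strcmp (sketch)
### Pre:  $a0 and $a1 contain the addresses of two null-terminated strings
### Post: $v0 is negative if str1 < str2, 0 if equal, positive if str1 > str2
strcmp:
        addi  $sp, $sp, -12
        sw    $ra, 0($sp)              # push return address
        sw    $s0, 4($sp)              # push s registers
        sw    $s1, 8($sp)
lp_strcmp:
        lbu   $s0, 0($a0)              # load byte from str1
        lbu   $s1, 0($a1)              # load byte from str2
        sub   $v0, $s0, $s1            # v0 = s0 - s1
        bne   $v0, $0, done_strcmp     # different chars, finished
        beq   $s0, $0, done_strcmp     # equal and null: end of both strings
        addi  $a0, $a0, 1              # advance to the next char of each string
        addi  $a1, $a1, 1
        j     lp_strcmp
done_strcmp:
        lw    $s1, 8($sp)              # pop s registers
        lw    $s0, 4($sp)
        lw    $ra, 0($sp)              # pop return address
        addi  $sp, $sp, 12
        jr    $ra
################# End function strcmp

        .data
result: .word 0
str1:   .asciiz "Good"
str2:   .asciiz "Goodbye"
```

In the driver, the comparison ends at the fifth position, where the null byte of str1 is compared with the 'b' of str2, so the prefix is found to be smaller.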
3.11.4 Exercises
1. Write the API for the function shown below. In the API describe the
purpose of the function, its parameters, preconditions, if any, explicit
result(s) if any, and side effects, or post conditions, if any. Give the
function an appropriate name.
.text
foo:
addi $sp, $sp, -8
sw $ra, 0($sp)
sw $s0, 4($sp)
lp_foo:
lbu $s0, 0($a0)
beq $s0, $0, done_foo
sb $a1, 0($a0)
addi $a0, $a0, 1
j lp_foo
done_foo:
3.11. STRINGS AND STRING FUNCTIONS 103
strcmp:
addi $sp, $sp, -12 # push return address
sw $ra, 0($sp)
sw $s0, 4($sp)
sw $s1, 8($sp)
lp_strcmp:
lbu $s0, 0($a0) # load byte from str1
lbu $s1, 0($a1) # load byte from str2
sub $v0, $s0, $s1 # v0 = s0 - s1
bne $v0, $0, done_strcmp # different chars, finished
Figure 3.65: Function to compare two strings. A negative result means the first
string is smaller. A positive result means the second string is smaller.
lw $s0, 4($sp)
lw $ra, 0($sp)
addi $sp, $sp, 8
jr $ra
############### End function foo
2. Write and test a function named toUpper which will convert all the al-
phabetic characters in a given string to upper case. Any uppercase char-
acters or non-alphabetic characters should be unchanged. For example,
the string “fooBaR!@” should be changed to “FOOBAR!@”. Assume that
register $a0 points to the string, and that it is terminated by a null byte.
Hint: In Appendix .6.1 compare the binary ASCII codes of lower case
alphabetic characters with the binary ASCII codes of the corresponding upper
case characters.
3. Write and test a function named isNumeric which will determine whether
a string consists entirely of numeric characters, i.e. the characters ’0’..’9’.
Your function should return a 1 in register $v0 if this is so; otherwise it
should return a 0 in register $v0. Assume that register $a0 points to the
string, and that it is terminated by a null byte. The string of length 0 is
not a numeric string.
4. (a) Write and test a function named toInt which will convert a given
string of numeric characters to a binary full word (i.e. an int), which
is returned in register $v0. Assume that register $a0 points to the
string, and that it is terminated by a null byte. Assume the given
string is a valid positive int, and that its length is not 0.
Hint: 10x = 8x + x + x
(b) Modify your solution to part (a) to allow for negative numbers. The
given string could begin with a ’-’ character.
5. Write and test a function which will extract a substring from a given
string. Use the API shown below:
############### Begin function substr
### Return the string which is a substring of a given string.
### Pre: Register $a0 points to the given string, which is
### null-terminated.
### Pre: Register $a1 contains the starting position of the substring
### (First position is 0)
### Pre: Register $a2 is the length of the substring.
### Pre: Register $a3 is the address of a memory buffer for the result,
### which should be a null-terminated substring.
### Author:
6. Write and test a function which will concatenate two strings to produce a
string result. For example, if the strings "Holy" and "Cow" are
concatenated, the result would be "HolyCow".
3.12. MULTIPLICATION OF WHOLE NUMBERS 105
.text
######################## Begin multLoop function
### Multiply two whole numbers using repeated addition
### Pre: The values to be multiplied are in registers
### $a0 and $a1
### Pre: $a0 is not negative
### Return product in $v0
multLoop:
addi $sp, $sp, -4
sw $ra, 0($sp)
Figure 3.66: Function to multiply two whole numbers, using repeated addition
1. Accumulator = 0
2. Multiplier is not 0
3. Multiplier is odd, Accumulator = Accumulator + Multiplicand
= 0 + 110 = 110
4. Shift Multiplicand left
Multiplicand = 1100
5. Shift Multiplier right
Multiplier = 10
2. Multiplier is not 0
3. Multiplier is even
4. Shift Multiplicand left
Multiplicand = 11000
5. Shift Multiplier right
Multiplier = 1
2. Multiplier is not 0
3. Multiplier is odd, Accumulator = Accumulator + Multiplicand
= 110 + 11000 = 11110
4. Shift Multiplicand left
Multiplicand = 110000
5. Shift Multiplier right
Multiplier = 0
6. Terminate, result is Accumulator = 11110 = 30
done_mult:
lw $ra, 0($sp) # return to calling function
addi $sp, $sp, 4
jr $ra
########################## End function mult
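The numbered steps in the trace above can be sketched as a complete function. This is not a copy of the mult function in the figure: the name multShift is hypothetical, the multiplicand is assumed to be in $a0 and the multiplier in $a1, and the product is assumed to fit in one word.

```mips
        .text
main:
        li    $a0, 6                     # multiplicand (110 in binary)
        li    $a1, 5                     # multiplier (101 in binary)
        jal   multShift
        sw    $v0, product               # 30
        li    $v0, 10
        syscall

################# Begin function multShift (sketch)
### Pre:  $a0 = multiplicand, $a1 = multiplier
### Pre:  the product fits in 32 bits
### Post: $v0 = product; $a0 and $a1 are changed
multShift:
        addi  $sp, $sp, -4
        sw    $ra, 0($sp)
        li    $v0, 0                     # 1. Accumulator = 0
lp_multShift:
        beq   $a1, $0, done_multShift    # 2. Multiplier is 0: terminate
        andi  $t0, $a1, 1                # 3. low order bit of Multiplier
        beq   $t0, $0, even_multShift    #    Multiplier is even: no addition
        add   $v0, $v0, $a0              #    odd: Accumulator += Multiplicand
even_multShift:
        sll   $a0, $a0, 1                # 4. Shift Multiplicand left
        srl   $a1, $a1, 1                # 5. Shift Multiplier right
        j     lp_multShift
done_multShift:
        lw    $ra, 0($sp)                # return to calling function
        addi  $sp, $sp, 4
        jr    $ra
################# End function multShift

        .data
product: .word 0
```

The loop repeats once for each significant bit of the multiplier, three times in this example, rather than once for each unit of its value.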
(a)
[label:] mult $rs, $rt [# comment]
[label:] mfhi $rd [# comment]
[label:] mflo $rd [# comment]
(b)
{Hi, Lo} ← Reg[$rs] · Reg[$rt]
Reg[$rd] ← Hi
Reg[$rd] ← Lo
(c)
mult $t3, $a0
mfhi $v1
mflo $v0
Figure 3.70: Multiply Statement: (a) Format of Multiply, Move from Hi, and
Move from Lo (b) Meaning of Multiply, Move from Hi and Move from Lo (c)
Example, which puts the product of registers $t3 and $a0 into the register pair
$v1, $v0.
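To see why the product needs a register pair, consider a short test program in which the product does not fit in 32 bits. This is a sketch; the operand values and the data labels productHi and productLo are chosen only for illustration.

```mips
        .text
main:
        li    $t3, 100000
        li    $a0, 100000
        mult  $t3, $a0         # {Hi, Lo} = 10,000,000,000
        mfhi  $v1              # $v1 = 2, the high order word
        mflo  $v0              # $v0 = 0x540be400, the low order word
        sw    $v1, productHi
        sw    $v0, productLo
        li    $v0, 10
        syscall

        .data
productHi: .word 0
productLo: .word 0
```

The product is 2 · 2^32 + 1,410,065,408; a program which looked only at the Lo register would see 1,410,065,408, which is incorrect.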
3.12.3 Exercises
1. (a) Use pencil and paper to multiply in binary 10111_2 · 110_2.
(b) Use pencil and paper to multiply in binary 1011_2 · 11011_2.
2. You wish to use one of the algorithms given to multiply 4023 (given in
register $a0) times 201 (given in register $a1). How many times will the
loop repeat if using:
3. Show a trace (similar to Fig 3.46) of the mult function given in Fig 3.69
when the value in register $a0 is 25, and the value in register $a1 is 13.
.text
##################### Begin function volume
### Find the volume of a cube
### Edge length is in $a0
### Pre: Edge length must not exceed 0x8000
### (32,768)
### Post: Volume is returned in register pair ($v1,$v0)
volume:
addi $sp, $sp, -8
sw $ra, 0($sp)
sw $s0, 4($sp)
lw $s0, 4($sp)
lw $ra, 0($sp)
addi $sp, $sp, 8
jr $ra
###################### End function volume
4. Write a MIPS function named ‘block’ to find the volume and surface area
of a right rectangular prism (i.e. a box in which all the angles are 90 degrees,
but the edges may have different lengths). The API is shown below:
3.13 Division
In this section we consider the division of whole numbers. Division is interesting
in that there are actually two results for division of integers - a quotient and a
remainder, both of which are also whole numbers. For example 39 divided by 5
produces a quotient of 7 and a remainder of 4. The remainder is often called a
‘modulus’, or simply ‘mod’. Many high level languages provide an operator for
mod (usually the % symbol). Thus in a language such as Java or C++, 39/5 is
7 but 39%5 is 4 (see footnote 33).
33 Be careful when using the % operator with negative numbers; high level languages do not
.text
######################## Begin divLoop function
### Perform division using repeated subtraction
### The dividend is in register $a0
### The divisor is in register $a1
### The quotient is left in $v0
### The remainder is left in $v1
### Pre: The divisor is positive
### The dividend is not negative
### Post: Both $a0 and $a1 are unchanged
### Author: sdb
divLoop:
addi $sp, $sp, -4
sw $ra, 0($sp)
$v1 ← $a0 ← 0
Figure 3.73: Diagram of a left shift on a pair of registers, $v1 and $a0
To implement this shift in a register pair, we can use three instructions (the
left register is $v1, and the right register is $a0):
3. Treating the remainder (left) and the divisor (right) as a register pair,
shift left.
(a) Subtract the divisor from the remainder, leaving the result in the
remainder.
(b) Increment the quotient
5. Repeat from 2, once for each bit in the word (i.e. 32 times).
.text
######################## Begin div function
### Perform division using shift and subtract algorithm
### The dividend is in register $a0
### The divisor is in register $a1
### The quotient is left in $v0
### The remainder is left in $v1
### Pre: The divisor is positive
### Pre: The dividend is not negative
### Author: sdb
div:
addi $sp, $sp, -4
sw $ra, 0($sp)
li $v0, 0 # quotient
li $v1, 0 # remainder
li $t0, 32 # loop counter
lp_div:
beq $t0, $0, done_div
sll $v0, $v0, 1 # shift quotient
sll $v1, $v1, 1 # shift remainder,dividend
bge $a0, $0, notNeg_div
addi $v1, $v1, 1
notNeg_div:
sll $a0, $a0, 1 # shift dividend
blt $v1, $a1, noSubtr_div # subtract?
sub $v1, $v1, $a1
addi $v0, $v0, 1
noSubtr_div:
addi $t0, $t0, -1
j lp_div # Repeat the loop
done_div:
lw $ra, 0($sp) # return to calling function
addi $sp, $sp, 4
jr $ra
########################## End function div
Figure 3.74: Function to divide whole numbers, using shift and subtract algo-
rithm
(a)
[label:] div $rs, $rt [# comment]
[label:] mfhi $rd [# comment - Remainder]
[label:] mflo $rd [# comment - Quotient]
(b)
Hi ← Reg[$rs]%Reg[$rt]
Lo ← Reg[$rs]/Reg[$rt]
Reg[$rd] ← Hi
Reg[$rd] ← Lo
(c)
div $t3, $a0
mfhi $v0
mflo $v1
Figure 3.76: Divide Statement: (a) Format of Divide, Move from Hi, and Move
from Lo (b) Meaning of Divide, Move from Hi and Move from Lo (c) Example,
which divides using register $t3 as the dividend and register $a0 as the divisor,
leaving the quotient in register $v1 and the remainder in register $v0.
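The divide instruction can be tried in a short test program. This is a sketch which uses the same registers as the figure; the dividend 4713, the divisor 60, and the data labels are chosen only for illustration.

```mips
        .text
main:
        li    $t3, 4713        # dividend
        li    $a0, 60          # divisor
        div   $t3, $a0         # Lo = 4713 / 60, Hi = 4713 % 60
        mflo  $v1              # $v1 = 78, the quotient
        mfhi  $v0              # $v0 = 33, the remainder
        sw    $v1, quotient
        sw    $v0, remainder
        li    $v0, 10
        syscall

        .data
quotient:  .word 0
remainder: .word 0
```

As a check, 78 · 60 + 33 = 4713.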
1. Divide the given number by 60. The remainder is the number of
seconds in the result, and the quotient will be the total minutes remaining.
For example, 4713 % 60 is 33, so the number of seconds is 33. And 4713
/ 60 is 78. Save this for the next step.
2. The quotient from the previous step is then divided by 60. The remainder
is the number of minutes, and the quotient is the number of hours. For
example, the number of minutes from the previous step is 78. Divide 78
/ 60. The remainder, 18 is the number of minutes, and the quotient, 1, is
the number of hours.
3.13.3 Exercises
1. Using pencil and paper, perform the following division operations:
(a) 29 / 3 = ?
29 % 3 = ?
(b) 290 / 17 = ?
290 % 17 = ?
(c) 4098 / 256 = ?
4098 % 256 = ?
2. You wish to use one of the algorithms given to divide 4023 by 21. How
many times will the loop repeat if using:
3. Show a trace (similar to Fig 3.46) of the div function given in Fig 3.74 when
the dividend (in register $a0) is 4023 and the divisor (given in register $a1)
is 210.
4. Write a function which will take a distance measurement, given in inches,
and produce the same distance in yards, feet, and inches. For example, if
the given distance is 86 inches, the result should be 2 yards, 1 foot, and 2
inches. The API is shown below:
### Author:
.text
######################## Begin hms function
### Convert a whole number of seconds to
### hours, minutes, and seconds.
### 3701 seconds => 1 hour, 1 minute, 41 seconds
### Total seconds is provided in $a0
### Register $a1 points to a memory area for
### three results: hours, minutes, seconds
### Pre: All values are non-negative.
### Author: sdb
hms:
addi $sp, $sp, -12
sw $ra, 0($sp)
sw $s0, 4($sp)
sw $s1, 8($sp)
li $s0, 60
div $a0, $s0
mfhi $s1
sw $s1, 8($a1) # seconds
mflo $a0 # total minutes
div $a0, $s0
mfhi $s1
sw $s1, 4($a1) # minutes
mflo $s1
sw $s1, 0($a1) # hours
Use the MIPS div instruction for division. Disregard the remainders.
6. Write a MIPS function named ‘dotDiv’ to find the dot quotient of two
vectors. The dot quotient is the vector which contains the quotients of
corresponding elements. For example, if the two vectors are A = (5, 0, 7)
and B = (3, 3, 2), then the quotient vector is (1, 0, 3). The API is shown
below:
Use the MIPS div instruction for division. Disregard the remainders.
• 2.075
• -3.0
• 6.02 · 10^23
• 0.000001
(a)
[label:] add.s $fd, $fs, $ft [# comment]
[label:] sub.s $fd, $fs, $ft [# comment]
[label:] mul.s $fd, $fs, $ft [# comment]
[label:] div.s $fd, $fs, $ft [# comment]
(b)
FpReg[$fd] ← FpReg[$fs] + FpReg[$ft]
FpReg[$fd] ← FpReg[$fs] − FpReg[$ft]
FpReg[$fd] ← FpReg[$fs] · FpReg[$ft]
FpReg[$fd] ← FpReg[$fs]/FpReg[$ft]
(c)
add.s $f2, $f3, $f6
(d)
sub.s $f2, $f2, $f6
(e)
mul.s $f2, $f2, $f2
(f)
div.s $f22, $f22, $f0
Figure 3.78: Single Precision Floating Point Instructions: (a) Format of arith-
metic instructions; (b) Meaning of each instruction from part (a); (c) Example,
which adds the values in floating point registers $f3 and $f6, placing the result
in floating point register $f2; (d) Example which decreases the value in float-
ing point register $f2 by the value in floating point register $f6; (e) Example,
which squares floating point register $f2; (f) Example, which divides the value
in floating point register $f22 by the value in floating point register $f0, leaving
the result in floating point register $f22
(a)
[label:] mov.s $rd, $rs [# comment]
(b)
FpReg[$rd] ← FpReg[$rs]
(c)
mov.s $f2, $f4
Figure 3.79: Floating Point Move Instruction: (a) Format of move instruction;
(b) Meaning of move instruction; (c) Example, which copies the floating point
value from floating point register $f4 into floating point register $f2.
these operations are not commutative. A few comments on these floating point
instructions:
• There is no floating point register which always contains the value zero,
as there is with the general registers.
• The programmer may wish to put the value 0.0 into a floating point reg-
ister, by subtracting a register from itself:
sub.s $f2, $f4, $f4 # Put zero into reg $f2
However, if the register happens to contain a special value such as infinity
or NaN (not a number), the result is not zero, so this should be avoided.
Instead load a zero constant from memory.
• The programmer may wish to put the value 1.0 into a floating point reg-
ister, by dividing a register by itself:
div.s $f2, $f4, $f4 # Put 1.0 into reg $f2
However, if the register happens to contain zero, infinity, or NaN, the
result is not 1.0, so this should be avoided. Instead load a 1.0 constant from memory.
• For division there is only one result, the quotient (unlike the fixed point
division instruction which produces two results).
To transfer the contents of one floating point register into another float-
ing point register, there is a floating point move instruction. It is mov.s (or
mov.d for double precision). The format and definition of a floating point move
instruction is shown in Fig 3.79.
word of memory to a particular single precision value, and the .double directive
initializes two consecutive full words of memory to a particular double precision
value:
pi: .float 3.141592653
pi: .double 3.14159265358979324
An array (i.e. a vector of single precision floating point values) can be
initialized:
numbers: .float 2.3, 0.00001, 45, 6.02e23
The last value in the array named numbers is Avogadro’s number, which is
6.02 × 10^23.
2. Branch to another instruction. This step will use the condition code, set
in the previous step, to determine whether the branch should take place.
The comparison instructions are c.eq.s, for compare floats for equality,
c.lt.s for compare floats for strictly less than, and c.le.s for compare floats
for less than or equal. These instructions are defined in Fig 3.82.
36 The c1 stands for coprocessor 1
(a)
[label:] lwc1 $rt, symbol [# comment] symbolic address
[label:] lwc1 $rt, imm($rs) [# comment] explicit address
[label:] swc1 $rt, symbol [# comment] symbolic address
[label:] swc1 $rt, imm($rs) [# comment] explicit address
(b)
FpReg[$rt] ← Memory[symbol]
FpReg[$rt] ← Memory[imm + Reg[$rs]]
Memory[symbol] ← FpReg[$rt]
Memory[imm + Reg[$rs]] ← FpReg[$rt]
(c)
lwc1 $f0, pi
(d)
(e)
(f)
.text
################## Begin circle function ##########
# Find the area and circumference of a circle having
# a given radius.
# Pre:
# Register $a0 contains the memory address of the circle’s
# radius (single precision)
# Register $a1 contains the memory address for the two
# floating point results:
# - area = pi*r*r
# - circumference = 2*pi*r
# Post: Registers $a0 and $a1 are unchanged
# Author sdb
circle:
addi $sp, $sp, -4
sw $ra, 0($sp)
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra # return
Figure 3.81: Function to compute the area and circumference of a circle having
a given radius
(a)
[label:] c.eq.s $fs, $ft [# comment]
[label:] c.lt.s $fs, $ft [# comment]
[label:] c.le.s $fs, $ft [# comment]
(b)
if FpReg[$fs] = FpReg[$ft], cc ← 1 else cc ← 0
if FpReg[$fs] < FpReg[$ft], cc ← 1 else cc ← 0
if FpReg[$fs] ≤ FpReg[$ft], cc ← 1 else cc ← 0
(c)
c.eq.s $f4, $f6
(d)
c.lt.s $f14, $f12
(e)
c.le.s $f22, $f16
Figure 3.82: Single Precision Floating Point Comparison Instructions: (a) For-
mat of comparison instructions; (b) Meaning of each instruction from part (a)
in which cc is the 1-bit condition code; (c) Example which compares floating
point registers $f4 and $f6 for equality; (d) Example which determines whether
floating point register $f14 is strictly less than floating point register $f12; (e)
Example which determines whether floating point register $f22 is less than or
equal to floating point register $f16
3.14. FLOATING POINT INSTRUCTIONS 129
(a)
[label:] bc1t symbol [# comment]
[label:] bc1f symbol [# comment]
(b)
if cc = true, branch to instruction of symbol
if cc = false, branch to instruction of symbol
(c)
bc1t lp
(d)
bc1f done
Figure 3.83: Single Precision Floating Point Branch Instructions: (a) Format
of conditional branch instructions; (b) Meaning of each branch instruction from
part (a) in which cc is the 1-bit condition code; (c) Example which branches to
lp only if the condition code is 1; (d) Example which branches to done only if
the condition code is 0
.text
##################### Begin function largestOf3
### Author: sdb
### Find the largest of three given floats
### Register $a0 points to three consecutive floats in memory
### Register $a1 points to memory word where largest is to be stored
largestOf3:
addi $sp, $sp, -4
sw $ra, 0($sp)
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
###################### End function largestOf3
Figure 3.84: Function to find the largest of three given floating point values
As one final example for this section, we show a function which searches an
array of floats for a particular target, with a given error tolerance. The reason
for the error tolerance is that floats do not have perfect precision. Thus, for
example, if searching for the target 17.01, we may wish to specify a tolerance
of 0.000001 so that any value in the array which is sufficiently close to 17.01
qualifies as matching the target. Figures 3.85 and 3.86 show this sequential
search function.
In order to determine whether a value from the array is sufficiently close to
the target, we compute the absolute value of the difference between the value
and the target, and compare with the tolerance, epsilon.
|value − target| < epsilon
For absolute value we use a local function which puts the absolute value of
floating point register $f6 into floating point register $f6.
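The tolerance test can be modeled in a few lines of Python before it is coded in MIPS. This is only a sketch: the name seq_search and the convention of returning -1 when nothing matches are assumptions made for illustration, and are not taken from Figs 3.85 and 3.86.

```python
def seq_search(values, target, epsilon):
    """Return the index of the first value within epsilon of target, or -1."""
    for i, value in enumerate(values):
        # Same test as |value - target| < epsilon
        if abs(value - target) < epsilon:
            return i
    return -1  # assumed "not found" convention


data = [3.5, 17.0100004, 42.0]
print(seq_search(data, 17.01, 0.000001))   # index of the near match
print(seq_search(data, 99.0, 0.000001))    # no value is close to 99.0
```

An exact comparison (value == target) would miss the second element here, which is the situation the tolerance is meant to handle.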
• Assignment to a variable:
Figure 3.85: Function to search a given array of floats for a given target, to
within a given tolerance. It calls an absolute value function - Fig 3.86
Figure 3.86: Function to find the absolute value of a float, called by the sequen-
tial search function in Fig 3.85
.data
x: .word 23
result: .float 0
.text
lwc1 $f0, x ## $f0 <- 23
cvt.s.w $f2, $f0 ## $f2 <- 23.0
swc1 $f2, result
float f = 3.0;
int i;
i = f; // convert 3.0 to an int
• A cast:
float f = 3.0;
int i;
i = (int) f; // convert 3.0 to an int
• floor(23.99) = 23
• floor(23.49) = 23
.data
x: .float 23.65
result: .word 0
.text
lwc1 $f0, x ## $f0 <- 23.65
cvt.w.s $f2, $f0 ## $f2 <- 23
swc1 $f2, result
• floor(23.0) = 23
3.14.7 Exercises
1. Test the function which finds the area and circumference of a circle (Fig 3.81)
using the MARS simulator. Write a Driver (i.e. a main program which
calls the circle function), and check the results in memory.
2. Test the function which finds the largest of three floats (Fig 3.84) using
the MARS simulator. Write a Driver (i.e. a main program which calls the
largestOf3 function), and check the result in memory.
3. Test the sequential search function (Fig 3.85) using the MARS simulator.
Write a Driver (i.e. a main program which calls the seqSearch function),
and check the result in memory. Be sure to test the case where the target
is not found in the array, and the case where there is more than one
occurrence of the target in the array.
38 When the decimal place is 5, as in 7.5, for example, we take the position that the result
.data
x: .float 23.65
half: .float 0.5
result: .word 0
.text
lwc1 $f0, x ## $f0 <- 23.65
lwc1 $f2, half ## $f2 <- 0.5
add.s $f0, $f0, $f2 ## $f0 <-- 24.15
cvt.w.s $f2, $f0 ## $f2 <- 24
swc1 $f2, result
4. Show a trace (similar to Fig 3.52) of the largestOf3 function when the three
floats are 2.0, 5.5, and -9.9. Show the contents of floating point registers
in decimal. Do not show memory contents (we have not yet discussed how
floating point values are represented).
5. Write and test a function which will return the volume and surface area
of a sphere having a given radius. Store an approximation of pi as local
data in your function. The API is shown below:
Hint: Assume the first value is the largest; then use a loop to scan the
rest of the array. Each time you find a value larger than the largest you’ve
seen so far, save it in a floating point register.
7. The exponential function, exp(x) = e^x, where e is approximately 2.718281828459,
is the inverse of the natural log function, ln(x). This function can be
computed as an infinite sum of terms (this is called a Taylor series):
exp(x) = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
Define a MIPS function named exp to evaluate the exponential function
for a given value of x, and a given tolerance value, epsilon, such that the
result is within epsilon of the correct result. The API is shown below:
Hints:
• Each term, t, can be calculated from the previous term by multiplying
by x and dividing by a counter, n:
t = t ∗ x/n
• Terminate the loop when a term’s value is smaller than epsilon.
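A reference model in a high level language is useful for checking the output of the MIPS function. The Python sketch below follows the two hints; the name exp_series is an assumption, and the use of the absolute value of the term (so that negative x also terminates) is our own choice rather than part of the exercise.

```python
import math


def exp_series(x, epsilon):
    """Sum the Taylor series for exp(x) until a term is smaller than epsilon."""
    total = 1.0   # the leading term of the series
    term = 1.0
    n = 1
    while True:
        term = term * x / n        # t = t * x / n
        if abs(term) < epsilon:    # stop once a term is smaller than epsilon
            break
        total += term
        n += 1
    return total


print(exp_series(1.0, 1e-9), math.e)
```

Stopping on the size of the last term is a heuristic: it keeps the error small for moderate x, but it is not a proof that the result is within epsilon of the true value.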
8. Write and test a binary search function to search a sorted array of floats
for a given target, within a given tolerance.
The API is shown below:
Hint: Find the midpoint of the array, and compare it with the target. If
equal (within the tolerance), terminate. If the value of the midpoint is
less than the target, you know the target must be after the midpoint if
it is in the array; repeat using the position after the midpoint as the left
end. If the value of the midpoint is greater than the target, you know the
target must be before the midpoint if it is in the array; repeat using the
position before the midpoint as the right end. If the position of the left
end exceeds the position of the right end, the target is not found.
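The hint can be checked against a short Python model before writing the MIPS version. This is a sketch under assumptions: the name binary_search, the use of array indexes for the left and right ends, and the -1 result for "not found" are ours, since the API for this exercise is not reproduced here.

```python
def binary_search(values, target, epsilon):
    """Search a sorted list for target, to within epsilon. Return index or -1."""
    left, right = 0, len(values) - 1
    while left <= right:               # left end has not passed the right end
        mid = (left + right) // 2
        if abs(values[mid] - target) < epsilon:
            return mid                 # equal, within the tolerance
        if values[mid] < target:
            left = mid + 1             # target can only be after the midpoint
        else:
            right = mid - 1            # target can only be before the midpoint
    return -1                          # assumed "not found" convention


print(binary_search([1.5, 2.25, 17.01, 30.0], 17.0100001, 0.000001))
```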
9. Write and test a function named average which will find the average of an
array of floats.
10. Write and test a function named round which will round a float to the
nearest hundred, million, thousandth, etc. The first argument is the float
to be rounded, and the second number describes the kind of rounding
desired. Examples:
round(13189,100) = 13200
round(13189,1000) = 13000
round(13189,10) = 13190
round(17.0653,0.01) = 17.07
round(17.0653,0.1) = 17.1
round(17.0653,0.001) = 17.065
emphasize that these system calls are specific to MARS and may not apply with
other MIPS simulators (such as SPIM).39
All MARS system calls are invoked with the syscall instruction. The
particular function to be performed is specified by the value in register $v0.
Fig 3.90 shows some of the options available for a system call. For a complete
list of options, see the MARS web site.
li $v0, 10
syscall # normal termination
This means that if register $v0 contains the final result of a calculation, it
must be copied to another register, or saved in memory, before terminating
(loading the service code 10 will clobber the result in register $v0).40
As an example, we show below a main function which calls the string com-
parison function shown in Fig 3.65. The main function stores the result in
memory before terminating execution.
.data
str1: .asciiz "Good Morning"
str2: .asciiz "Good morning"
To read a string, load the value 8 into register $v0. Also load the address
of a memory buffer for the input string into register $a0, and load a maximum
length for the input into register $a1 before executing the syscall instruction.
Execution will then pause, waiting for the user to enter any string of characters
on the keyboard and press the Enter key. The string entered will be stored in
the memory buffer, as shown below:
The .space directive reserves 1001 bytes (one extra byte for the null termi-
nating character) of memory for the buffer.
To read a floating point value from the keyboard, load the value 6 into
register $v0 before executing the syscall instruction. The value entered at the
keyboard will be stored in floating point register $f0:
To display an int on the monitor, load the value 1 into register $v0, and
load the value to be displayed into register $a0, before executing the syscall
instruction, as shown below:
# we wish to display the int in register $t3
move $a0, $t3 # move into register $a0
li $v0, 1 # code to put out an int
syscall
# value of $t3 is displayed on monitor
To display a floating point value on the monitor, load the value 2 into register
$v0, and load the value to be displayed into register $f12, before executing the
syscall instruction, as shown below:
# we wish to display the float in register $f0
mov.s $f12, $f0 # move into float register $f12
li $v0, 2 # code to put out a float
syscall
# value of $f0 is displayed on monitor
To display a (null terminated) string on the monitor, load the address of the
string into register $a0, and load the value 4 into register $v0. Then when the
syscall instruction is executed, the string will be displayed on the monitor
(in MARS’ message window):
# we wish to display the message
la $a0, message # address of message to be displayed
li $v0, 4 # code to put out a string
syscall # message is written out
.data
message: .asciiz "Good morning"
• To convert the sum of the lengths (register $s1) to floating point, we use
the cvt.s.w instruction. It expects a fixed point argument in its second
operand, and produces the corresponding floating point value in its first
operand.
• Likewise for the number of strings (register $s2).
3.15.5 Exercises
1. Write a program to calculate 10! and display the result on the monitor
(i.e. MARS’ message window).
2. Write a program to obtain the radius of a circle from the user’s keyboard.
It should then display the area and circumference of the circle on the
monitor (i.e. MARS’ message window).
3. Write a program to input a string from the keyboard, eliminate all spaces
from the string, and write the resulting string out to the monitor (MARS’
message window).
Figure 3.91: Program to display the average length of several strings entered at
the keyboard
Chapter 4
4.1. INSTRUCTION FORMATS 145
R Format
31 26 25 21 20 16 15 11 10 6 5 0
opcode rs rt rd shamt funct
Figure 4.1: Register Format (R) is used for instructions such as add, or, and
srl. The diagram shows the bit positions for each field. A description of each
field is shown in the table.
• The right operand for arithmetic and logical operations for which the right
operand is a constant.
I Format
31 26 25 21 20 16 15 0
opcode rs rt immediate
Figure 4.2: Immediate Format (I) is used for immediate instructions such as
andi, compare instructions such as beq, and memory reference instructions
such as lw. The diagram shows the bit positions for each field. A description of
each field is shown in the table.
J format instructions are used for j (jump) and jal (jump and link) in-
structions. This is the simplest instruction format - it consists of only two fields,
the operation code and the jump address. The format is shown in Fig 4.3.
in Fig 3.1.
2 Conversely, the design of the CPU has an impact on the field widths in an instruction.
J Format
31 26 25 0
opcode address
Figure 4.3: Jump Format (J) is used for unconditional jump instructions, such
as j and jal. The diagram shows the bit positions for each field. A description
of each field is shown in the table.
the width of the register fields in a MIPS instruction is always 5 bits, the CPU
cannot have more than 32 general registers.3
4.1.2 Exercises
1. Explain why the shamt field in an R format shift instruction is 5 bits in
length.
11 9 8 6 5 3 2 0
op left right dest
Figure 4.4: Instruction format for a hypothetical machine. The fields left,
right, and dest specify registers.
• 64 general registers
• 16 different instructions
• Instructions which have two operands, both of which are registers.
Show a diagram, similar to Fig 4.1 of a possible instruction format for this
machine.
(a) How many different instructions (i.e. different opcodes) could this
machine have?
(b) How many general registers could this machine have? Assume the
left, right, and destination fields specify register numbers.
The convention which is normally followed for a field with a length not a
multiple of 4, is to work from right to left (low order bit to high order bit),
grouping the bits into groups of four. The remaining bits at the high order end
can still be represented by a hex digit which may not correspond to four bits.
As an example we take the binary value 01001011010. In groups of four, we
have 010 0101 1010, which is 25a in hexadecimal. Note that the 2 represents
only 3 bits, whereas each of the other hex digits represents 4 bits.
Another example: starting with the 9 bit value 111111111 we have
1 1111 1111 which is 1ff in hex.
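The right-to-left grouping convention can be expressed as a short Python function. This is an illustrative sketch; the name field_to_hex is an assumption, and the function takes the field as a string of 0 and 1 characters.

```python
def field_to_hex(bits):
    """Convert a bit string of any length to hex, grouping from the right."""
    digits = []
    end = len(bits)
    while end > 0:
        start = max(0, end - 4)        # the leftmost group may have fewer than 4 bits
        group = bits[start:end]
        digits.append(format(int(group, 2), "x"))
        end = start
    return "".join(reversed(digits))


print(field_to_hex("01001011010"))   # the 11-bit example from the text
print(field_to_hex("111111111"))     # the 9-bit example from the text
```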
We also will be interested in the opposite transformation: given an
instruction in hexadecimal, find the values of the fields (in hex or decimal). For
example, if we have the 32-bit instruction given in hexadecimal as ba0af8c3, it
can be written in binary as:
1011 1010 0000 1010 1111 1000 1100 0011
If this is an R format instruction we can regroup the fields as:
• opcode = 2ex = 46
• rs = 10x = 16
• rt = 0ax = 10
• rd = 1fx = 31
• shamt = 03x = 3
• funct = 03x = 3
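The same regrouping can be done with shifts and masks. The Python sketch below uses the R format bit positions from Fig 4.1 and the 32-bit value whose binary form is shown above (1011 1010 0000 1010 1111 1000 1100 0011, which is 0xba0af8c3); the name decode_r is an assumption.

```python
def decode_r(word):
    """Split a 32-bit word into the six R format fields."""
    return {
        "opcode": (word >> 26) & 0x3f,   # bits 31..26
        "rs":     (word >> 21) & 0x1f,   # bits 25..21
        "rt":     (word >> 16) & 0x1f,   # bits 20..16
        "rd":     (word >> 11) & 0x1f,   # bits 15..11
        "shamt":  (word >> 6) & 0x1f,    # bits 10..6
        "funct":  word & 0x3f,           # bits 5..0
    }


fields = decode_r(0xba0af8c3)
for name, value in fields.items():
    print(name, format(value, "02x"), value)
```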
4.2.1 Exercises
1. Show each of the following binary fields in hexadecimal. Do not convert
to decimal; rather, group the bits into groups of 4 bits.
(a) 10111
(b) 101010101010110
(c) 110000110110
2. Show each of the following hex values, with an associated field size, in
binary. Do not convert to decimal; rather, treat each hex digit as repre-
senting 4 or fewer bits.
overflow exception.
4.3. PSEUDO OPERATIONS 151
4.3.2 Move
The move operation in assembly language has no corresponding instruction in
machine language; it is a pseudo-op. The assembler will replace each move
operation with an equivalent add instruction, in which one of the operands is
register $0.5
For example, if you use the following instruction:
move $t0, $t3 # move $t3 into $t0
the assembler will substitute:
add $t0, $0, $t3
It accomplishes the desired result - the value in register $t3 will be stored into
register $t0.
As with all pseudo-ops, move was not included in the MIPS architecture
because it is not essential - but the assembler provides it as a convenience for
the programmer.
4.3.3 Not
The logical not operation introduced in chapter 3 is actually a pseudo-op. The
assembler translates the not operation to a machine language nor instruction
in which one of the operands is register $0. The logical identity applied here is:
∼ x =∼ (x ∨ 0).
For example, if the assembly language statement is:
not $v0, $a0 # $v0 = complement of $a0
then the assembler will translate it to:
nor $v0, $a0, $0
This accomplishes the desired result.
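The identity can be checked numerically. The Python sketch below models 32-bit registers with a mask, since Python integers are not limited to 32 bits; the names nor32 and not32 are assumptions.

```python
MASK32 = 0xffffffff   # keep results to 32 bits, like a MIPS register


def nor32(a, b):
    """Bitwise nor of two 32-bit values."""
    return ~(a | b) & MASK32


def not32(x):
    """The not pseudo-op, expressed as nor with register $0 (always 0)."""
    return nor32(x, 0)


print(format(not32(0x0000ffff), "08x"))
```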
overflow exception.
6 MARS would actually translate this to an ori instruction followed by an add instruction.
152 CHAPTER 4. MACHINE LANGUAGE FOR MIPS
• This example makes use of the $at (assembler temporary) register which
is register 1. This register is normally reserved for use by the assembler.
The programmer who attempts to use this register for temporary storage
is asking for trouble.
1. The constant value, 0x00010003, is split into two halves. The high
order half, 0x0001, is put into the high order half of register $at by
the lui instruction.
2. The low order half of the constant, 0x0003, is placed in the low order
half of the $at register, without disturbing the high order half of that
register. This is done with the ori instruction. The $at register is
now storing the full constant, 0x00010003.
3. The addition can now be done with an add instruction, using the $at
register as an operand.
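The effect of the first two steps on the $at register can be modeled in Python. This is a sketch of the register arithmetic only, not of the assembler; the function names lui and ori simply mirror the instructions they imitate.

```python
def lui(imm16):
    """Load upper immediate: imm16 goes in the high half, low half is zero."""
    return (imm16 & 0xffff) << 16


def ori(reg, imm16):
    """Or immediate: the 16-bit constant is zero-extended before the or."""
    return reg | (imm16 & 0xffff)


constant = 0x00010003
at = lui(constant >> 16)          # step 1: $at = 0x00010000
at = ori(at, constant & 0xffff)   # step 2: $at = 0x00010003
print(format(at, "08x"))
```

Because ori zero-extends its constant, it cannot disturb the high order half that lui has already set.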
4.4. R FORMAT INSTRUCTIONS 153
4.3.6 Exercises
1. The following assembly language statements all use pseudo operations.
Show equivalent statements from the MIPS instruction set. Use the $at
register if temporary storage is needed.
Instruction         opcode  function  ignored  Example
                            code      fields
add $rd, $rs, $rt   00x     20x       shamt    add $a1, $v1, $0
                                               $rs  $rt  $rd
                                               03   00   05
sub $rd, $rs, $rt   00x     22x       shamt    sub $t0, $t1, $t2
                                               $rs  $rt  $rd
                                               09   0a   08
Here we show the shamt field with question marks. The add instruction
ignores this field, so it really doesn’t matter what value is in that field.
This is sometimes called a don’t-care value.
2. Group the bits into groups of 4, as shown below.
00 60 28 20
Again we show the shamt field as a don’t care (i.e. question marks) because
the sub instruction ignores this field.
2. Group the bits into groups of 4, as shown below.
Instruction         opcode  function  ignored  Example
                            code      fields
and $rd, $rs, $rt   00x     24x       shamt    and $a1, $v1, $0
                                               $rs  $rt  $rd
                                               03   00   05
or $rd, $rs, $rt    00x     25x       shamt    or $t0, $t1, $t2
                                               $rs  $rt  $rd
                                               09   0a   08
xor $rd, $rs, $rt   00x     26x       shamt    xor $at, $ra, $a0
                                               $rs  $rt  $rd
                                               1f   04   01
nor $rd, $rs, $rt   00x     27x       shamt    nor $t0, $t1, $t2
                                               $rs  $rt  $rd
                                               09   0a   08
Figure 4.6: Machine language for the logical instructions and, or, xor, nor
01 2a 40 22
The student should verify both of these results by assembling the source state-
ments with MARS to view the corresponding machine language instructions in
hexadecimal.
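As an alternative check, the field packing can be modeled in Python. The sketch below assembles the six R format fields into one 32-bit word; the function name encode_r and the small register table are assumptions for illustration, and the ignored shamt field is taken to be zero.

```python
# Register numbers for the registers used in the examples
REG = {"$0": 0, "$at": 1, "$v0": 2, "$v1": 3, "$a0": 4, "$a1": 5,
       "$t0": 8, "$t1": 9, "$t2": 10, "$ra": 31}


def encode_r(funct, rd, rs, rt, shamt=0):
    """Pack an R format instruction (opcode is 0) into a 32-bit word."""
    return ((0 << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (shamt << 6) | funct)


print(format(encode_r(0x20, "$a1", "$v1", "$0"), "08x"))    # add $a1, $v1, $0
print(format(encode_r(0x22, "$t0", "$t1", "$t2"), "08x"))   # sub $t0, $t1, $t2
```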
Again we show the shamt field as a don’t care (i.e. question marks) because
the logical R format instructions ignore this field.
03 e4 08 26
The unary Not operation was covered in chapter 3 but it is actually a pseudo-
operation making use of the identity:
∼x = x Nor 0
The assembler will translate a Not statement into a Nor instruction in machine
language; the second operand will be register $0. As an example, we take:
not $v1, $t9 # $v1 = ~ $t9
For this example there will be an additional step to translate the not statement
to an equivalent nor statement.
03 20 18 27
Instruction               opcode  function  ignored  Example
                                  code      fields
srl $rd, $rt, shamt       00x     02x       rs       srl $a1, $v1, 17
(shift right logical)                                $rt  $rd  shamt
                                                     03   05   11
sll $rd, $rt, shamt       00x     00x       rs       sll $t0, $t0, 12
(shift left logical)                                 $rt  $rd  shamt
                                                     08   08   0c
sra $rd, $rt, shamt       00x     03x       rs       sra $sp, $s0, 2
(shift right arithmetic)                             $rt  $rd  shamt
                                                     10   1d   02
00 10 e8 83
Instruction      opcode  function  ignored    Example
                         code      fields
mult $rs, $rt    00x     18x       rd, shamt  mult $t0, $t1
                                              $rs  $rt
                                              08   09
div $rs, $rt     00x     1ax       rd, shamt  div $a0, $a3
                                              $rs  $rt
                                              04   07
00 87 00 1a
Instruction   opcode  function  ignored        Example
                      code      fields
jr $rs        00x     08x       rt, rd, shamt  jr $ra
                                               $rs
                                               1f
03 e0 00 08
4.4.6 Exercises
1. Translate the following assembly language statements to machine language
instructions. In each case, show your solution in hexadecimal.
(a) add $v0, $t0, $t1
(b) or $ra, $v0, $s1
(c) not $t3, $a3
(d) sll $v1, $a0, 25
(e) mult $a1, $s7
(f) jr $a3
2. Translate each of the following machine language instructions to an equiv-
alent assembly language statement. Note that some of the information
provided should be ignored.
(a) 00 0a 4c c2
(b) 00 85 11 20
(c) 00 a8 20 27
(d) 00 e2 ff 18
opcode rs rt immediate
4.5. I FORMAT INSTRUCTIONS 161
Instruction           opcode  ignored  Example
                              fields
addi $rt, $rs, imm    08x              addi $a1, $v1, 27
                                       $rs  $rt  imm
                                       03   05   001b
andi $rt, $rs, imm    0cx              andi $t0, $t1, -3
                                       $rs  $rt  imm
                                       09   08   fffd
ori $rt, $rs, imm     0dx              ori $t0, $0, 0
                                       $rs  $rt  imm
                                       00   08   0000
xori $rt, $rs, imm    0ex              xori $a3, $a1, 1000
                                       $rs  $rt  imm
                                       05   07   03e8
lui $rt, imm          0fx     $rs      lui $a3, 0x1001
                                       $rt  imm
                                       07   1001
Figure 4.10: Machine language formats for instructions which use the immediate
field for a constant operand
Instruction         opcode  ignored  Example
                            fields
lw $rt, imm($rs)    23x              lw $a1, 12($a0)
                                     $rs  $rt  imm
                                     04   05   000c
sw $rt, imm($rs)    2bx              sw $t0, -24($a1)
                                     $rs  $rt  imm
                                     05   08   ffe8
Figure 4.11: Machine language formats for memory reference instructions with
explicit addressing
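The I format fields can be packed into a 32-bit word in the same way as the R format fields. The Python sketch below is an illustration only; the name encode_i is an assumption, and registers are passed as numbers. A negative constant is reduced to its 16-bit twos complement form by the mask.

```python
def encode_i(opcode, rs, rt, imm):
    """Pack an I format instruction into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xffff)


print(format(encode_i(0x23, 4, 5, 12), "08x"))    # lw $a1, 12($a0)
print(format(encode_i(0x2b, 5, 8, -24), "08x"))   # sw $t0, -24($a1)
```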
1. We show the opcode and register fields in binary, but we find it easier to
show the immediate field in hex:
opcode rs rt immediate
.data
salary: .word 300
neg: .word -1
name: .asciiz "harry"
negByte: .byte -1
zero: .word 0, 12
0x10010000: 00 00 01 2c  ff ff ff ff  72 72 61 68  00 ff 00 79
0x10010010: 00 00 00 00  00 00 00 0c
The labels in the data section have values corresponding to their memory
addresses:
salary = 0x10010000
neg = 0x10010004
name = 0x10010008
negByte = 0x1001000e
zero = 0x10010010
• A data value declared with the .word directive is a twos complement inte-
ger aligned on a full word boundary. This means that its hex address must
end with a 0, 4, 8, or c. Thus the variable zero does not begin immediately
after negByte; it begins at the next full word boundary, 0x10010010.
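The way addresses are assigned to the labels can be modeled in Python. This sketch handles only the three directives used in the example, and it assumes that the data area begins at 0x10010000; the name layout and the tuple representation of each declaration are our own.

```python
def layout(items, base=0x10010000):
    """Assign an address to each label, aligning .word data on a word boundary."""
    address = base
    labels = {}
    for label, directive, value in items:
        if directive == ".word":
            address = (address + 3) & ~3   # round up to a multiple of 4
            size = 4 * len(value)
        elif directive == ".byte":
            size = len(value)
        elif directive == ".asciiz":
            size = len(value) + 1          # one extra byte for the null terminator
        else:
            raise ValueError("unsupported directive: " + directive)
        labels[label] = address
        address += size
    return labels


data_section = [
    ("salary", ".word", [300]),
    ("neg", ".word", [-1]),
    ("name", ".asciiz", "harry"),
    ("negByte", ".byte", [-1]),
    ("zero", ".word", [0, 12]),
]
for label, address in layout(data_section).items():
    print(label, "=", hex(address))
```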
1. A lui (load upper immediate) instruction to load the high order half word
of the address into the $at (assembler temporary) register.
If we were to refer to the data labeled by name, for example, the offset would
be 0x0008. If we have 0x1001 in the high order 16 bits of register $at, then the
data can be accessed at the explicit address:
0x0008($at)
The effective address will be 8 + 0x10010000 = 0x10010008, which is the
memory address represented by the label name.
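The split of an address into the half loaded by lui and the offset used in the memory reference can be sketched in Python. The name split_address is an assumption. The sketch covers only offsets below 0x8000, as in the examples here; a larger offset would be sign-extended by the hardware, which this simple split does not account for.

```python
def split_address(address):
    """Return (high half for lui, low half used as the offset)."""
    high = (address >> 16) & 0xffff
    low = address & 0xffff
    return high, low


high, low = split_address(0x10010008)    # the address of name
at = high << 16                          # value in $at after lui
print(hex(high), hex(low), hex(at + low))
```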
For another example, we will translate the instruction
lw $a3, zero
to hexadecimal machine language. We do this in three steps:
opcode rs rt immediate
(b) Group the binary fields into groups of 4, treating don’t cares as zeros:
0011 1100 0000 0001 0001 0000 0000 0001
(c) Convert the binary fields to hexadecimal:
3c 01 10 01
opcode rs rt immediate
Load address
Recall from chapter 3 that the load address instruction (it is actually a pseudo
op) is used to put the memory address of a data value into a register. This
instruction is most often used with a symbolic address. Using the same data
values given above, we translate the following instruction to machine language.
la $a3, negByte # $a3 = address of negByte
opcode rs rt immediate
(b) Group the binary fields into groups of 4, treating don’t cares as zeros:
0011 1100 0000 0001 0001 0000 0000 0001
(c) Convert the binary fields to hexadecimal:
3c 01 10 01
3. For the ori statement, we use the same three steps:
(a) First we show all fields in binary:
opcode rs rt immediate
Instruction           opcode  ignored  Example
                              fields
beq $rs, $rt, imm     04x              a: beq $t0, $t1, forward
                                       $rs  $rt  imm
                                       $t0  $t1  0003
bne $rs, $rt, imm     05x              b: bne $t3, $t2, start
                                       $rs  $rt  imm
                                       $t3  $t2  fff8
Figure 4.12: Machine language formats for actual MIPS branch instructions
3 instructions after the instruction after the branch instruction. This would
be a branch to the instruction at address 0x00400040. A branch instruction at
address 0x00400030 with a relative branch address of -4 would be a branch to
the instruction which is 3 instructions prior to the branch instruction; it would
be at address 0x00400024.
The branch instructions are I (immediate) format instructions. The two
registers being compared are in the rs and rt fields of the instruction. The
relative address is a twos complement value in the immediate field of the branch
instruction. The branch instructions are described more formally in Fig 4.12.
The labels shown in Fig 4.12 are taken from the following example:
.text
start: # Beginning of text area
li $t0, 0
li $t1, 1
a: beq $t0, $t1, forward # forward branch
li $t2, 2
li $t3, 3
li $t4, 4
forward:
li $t5, 5
b: bne $t3, $t2, start # backward branch
li $t6, 6
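Note that the target of the beq, forward, is three instructions after the
li $t2, 2 instruction (the instruction following the beq), which gives the
immediate value 0003 shown in Fig 4.12. Likewise the target of the bne, start,
is eight instructions prior to the li $t6, 6 instruction (the instruction
following the bne), which gives the immediate value fff8.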
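The relative branch address can be computed from the two instruction addresses. The Python sketch below assumes that the text area begins at 0x00400000 and that each li statement assembles to a single instruction, so that the beq is at 0x00400008, forward is at 0x00400018, and the bne is at 0x0040001c; the function names are also assumptions.

```python
def branch_offset(branch_address, target_address):
    """Number of instructions from the one after the branch to the target."""
    return (target_address - (branch_address + 4)) // 4


def imm_field(offset):
    """The 16-bit twos complement form stored in the immediate field."""
    return offset & 0xffff


forward_offset = branch_offset(0x00400008, 0x00400018)   # beq to forward
start_offset = branch_offset(0x0040001c, 0x00400000)     # bne to start
print(forward_offset, format(imm_field(forward_offset), "04x"))
print(start_offset, format(imm_field(start_offset), "04x"))
```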
Translating the bne instruction to machine language, we use the same three
steps used previously:
b: bne $t3, $t2, start
1. We show the opcode and register fields in binary, but we find it easier to
show the immediate field in hex (-8 = 0xfff8):
Instruction         opcode  function  ignored  Example
                            code      fields
slt $rd, $rs, $rt   00x     2ax       shamt    slt $v0, $a1, $a0
                                               $rs  $rt  $rd
                                               05   04   02
Figure 4.13: Machine language for the ‘set if less than’ instruction
opcode rs rt immediate
2. Group the bits into groups of four, assuming don’t cares are zeros, as
shown below.
1. Compare the two registers using slt, storing the result in the $at
register:
slt $at, $rs, $rt # $rs < $rt ?
2. Branch if the result is 1:
bne $at, $0, dest # branch if not 0
1. Compare the two registers using slt but reverse the operands, stor-
ing the result in the $at register:
slt $at, $rt, $rs # $rt < $rs ?
2. Branch if the result is not 1:
beq $at, $0, dest # branch if 0
1. Compare the two registers using slt but reverse the operands, stor-
ing the result in the $at register:
slt $at, $rt, $rs # $rt < $rs ?
2. Branch if the result is 1:
bne $at, $0, dest # branch if not 0
1. Compare the two registers using slt (with the operands in their original
order), storing the result in the $at register:
slt $at, $rs, $rt # $rs < $rt ?
2. Branch if the result is not 1:
beq $at, $0, dest # branch if 0
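The four two-instruction sequences above can be checked with a small Python model. In this sketch each function returns True when the branch would be taken. The pseudo-op names (blt, ble, bgt, bge) are inferred from the comparison that each sequence performs, and should be read as an assumption rather than as labels taken from the text.

```python
def slt(a, b):
    """Model of slt: 1 if a < b, otherwise 0."""
    return 1 if a < b else 0


def taken_blt(rs, rt):
    return slt(rs, rt) != 0   # slt $at, $rs, $rt ; bne $at, $0, dest


def taken_ble(rs, rt):
    return slt(rt, rs) == 0   # slt $at, $rt, $rs ; beq $at, $0, dest


def taken_bgt(rs, rt):
    return slt(rt, rs) != 0   # slt $at, $rt, $rs ; bne $at, $0, dest


def taken_bge(rs, rt):
    return slt(rs, rt) == 0   # slt $at, $rs, $rt ; beq $at, $0, dest


print(taken_blt(3, 5), taken_ble(5, 5), taken_bgt(5, 5), taken_bge(5, 5))
```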
2. Group the binary fields into groups of 4, assuming don’t cares are zeros:
0000 0001 0000 0100 0000 1000 0010 1010
3. Convert the binary fields to hexadecimal:
01 04 08 2a
Next we translate the bne instruction:
1. We show the opcode and register fields in binary, but we find it easier to
show the immediate field in hex:
opcode rs rt immediate
4.5.5 Exercises
In these exercises assume the text begins at location 0x00400000.
.text
start:
lw $a0, parm
addi $t0, $t0, 64
ble $t0, $s0, done
sw $t0, x
bne $t0, $t1, start
bgt $a0, $a1, start
done:
.data
x: .word 7
parm: .word 3
Figure 4.15: Machine language for j (Jump) and jal (Jump And Link) instruc-
tions. In the example foo labels the instruction at location 0x0040001c
38 e2 ff ff
23 ff ff f8
3c 01 10 01
8c 30 00 04
2. Take the low order 26 bits (the first hex digit represents two bits):
0x0100813
start:
j done
li $t0, 3
add $t0, $t1, $t4
done:
jr $ra
000010 00000100000000000000000011
31 26 25 0
opcode address
08 10 00 03
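The J format encoding can be reproduced in Python. This sketch assumes that the program begins at 0x00400000, so that done labels the instruction at 0x0040000c; the name encode_j is also an assumption.

```python
def encode_j(opcode, target_address):
    """Pack a J format instruction: 6-bit opcode and 26-bit word address."""
    address_field = (target_address >> 2) & 0x03ffffff   # drop the two low bits, keep 26
    return (opcode << 26) | address_field


word = encode_j(0x02, 0x0040000c)   # j done
print(format(word, "08x"))
```

The two low order bits of an instruction address are always zero, which is why they need not be stored in the instruction.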
4.6.1 Exercises
1. Given the program shown below, translate the j lp statement to ma-
chine language. Assume the starting location is 0x00400000.
start:
add $t0, $a0, $0
sub $t1, $a1, $a2
lp:
bne $t0, $0, done
addi $t0, $t0, -1
xori $t3, $t3, 0xffff
j lp
done:
start:
jal fn # call the function
bne $t0, $0, done
addi $t0, $t0, -1
done:
jr $ra
fn:
addi $t3, $t3, 5
jr $ra
start:
li $t3, 7
li $t4, 12
lp:
bne $t3, $t4, done
jal function
addi $t3, $t3, 1
j lp
done:
jr $ra
function:
addi $sp, $sp, 4
mult $a0, $a1
mflo $t0
jr $ra
4.7. FLOATING POINT DATA REPRESENTATION 175
1 2 . 0 5 3
• 10^1 = 10
• 10^0 = 1
• 10^-1 = 0.1
• 10^-2 = 0.01
• 10^-3 = 0.001
9 MARS has no option to show Data Memory in decimal. The user must view floating point
1/2 = 0.5 = 0.1_2
1/4 = 0.25 = 0.01_2    3/4 = 0.75 = 0.11_2
Now that we understand the meaning of places after the binary point, we
can produce the binary fixed point representation for a rational number. For
example, 3/4 = 0.5 + 0.25 = 0.75 = 0.11_2 and 3/8 = 1/4 + 1/8 = 0.375 = 0.011_2.
A diagram of some binary fixed point numbers is shown in Fig 4.17.
In that diagram we show only values which can be represented with perfect
precision. However, as with decimal numbers, there will be some rational num-
bers for which we do not have an exact representation. For example, 1/3 is not
exactly equal to 0.333333, and no matter how many decimal places we write, it
will never be perfectly accurate.10
In base two, 1/3 could be approximated as follows:
1/3 ≈ 1/4 + 1/16 + 1/64 = 0.010101_2
Interestingly there are numbers, such as one tenth, which can be represented
exactly in decimal, but not in binary. This is why we can get apparently strange
results when doing calculations with numbers which are not integers. For ex-
ample, try the following in Java or C++:
if (0.1 + 0.1 + 0.1 == 0.3) ...
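The same experiment can be run in Python, which also uses binary floating point. The sketch below shows the exact comparison failing, and one common remedy: comparing within a tolerance, here with the standard library function math.isclose. The choice of tolerance is an assumption.

```python
import math

total = 0.1 + 0.1 + 0.1
exact_equal = (total == 0.3)                              # exact comparison
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)     # comparison within a tolerance

print(repr(total))
print(exact_equal, close_enough)
```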
10 As noted above a repeating sequence of digits can imply a correct representation.
Just as any rational number can be represented with repeating decimal digits,
any rational number can be represented in binary with a repeating bit sequence.
• Sign (1 bit): This is the sign of the number (not to be confused with the
sign of the exponent). 0 represents positive, 1 represents negative.
• Fraction (23 bits): The fraction is taken from the normalized mantissa;
imagine a binary point with a single bit, always 1, before the binary point,
and the exponent is adjusted accordingly. Since the high order bit of the
mantissa is always 1, it is not stored as part of the fraction!
10000001
30 23
exp
01010000000000000000000
22 0
fraction
0 10000001 01010000000000000000000
00 30 23 22 0
7. Group in groups of 4:
0100 0000 1010 1000 0000 0000 0000 0000
Fig 4.18 shows a few other examples of floating point numbers in IEEE 754
format, specifically 0.375 and -27.0. As you read the table in Fig 4.18, each of
the three examples is in a single row of the table, with the final result labeled
as ‘hex result’.
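The three examples can be checked with a short Python sketch that normalizes the value by hand and then packs the sign, exponent and fraction fields. It relies on the standard IEEE 754 single precision exponent bias of 127. The function names are assumptions, and encode_single handles only nonzero values that are exactly representable; the second function obtains the same result from Python's struct module for comparison.

```python
import struct


def encode_single(value):
    """Encode a nonzero, exactly representable value as IEEE 754 single precision."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 2.0:        # normalize to 1.xxx times a power of two
        magnitude /= 2.0
        exponent += 1
    while magnitude < 1.0:
        magnitude *= 2.0
        exponent -= 1
    fraction = int((magnitude - 1.0) * (1 << 23))   # drop the leading 1, keep 23 bits
    return (sign << 31) | ((exponent + 127) << 23) | fraction


def encode_with_struct(value):
    """The same encoding, obtained from the standard library."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


for x in (5.25, 0.375, -27.0):
    print(x, format(encode_single(x), "08x"), format(encode_with_struct(x), "08x"))
```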
4.7.3 Exercises
1. Show each of the following decimal numbers in binary fixed point notation.
If the number cannot be represented precisely, show enough binary places
to indicate a repeating sequence of bits after the binary point.
(a) 4 3/4
(b) 13/16
(c) 7.0
(d) 0.1
(e) 13.6
4.8. FLOATING POINT INSTRUCTIONS 179
Figure 4.18: Examples of IEEE 754 single precision floating point data: 5.25,
0.375, -27.0
2. Show each of the following numbers in IEEE 754 single precision floating
point format. Show your final result in hexadecimal.
(a) 17.0
(b) 13.375
(c) 0.15625
(d) -5.0
(e) 0.0
(f) 3.6
3. (a) Run the following Java or C++ code, and explain why it appears to
behave in an undesirable way:
for (double x = 0.0; x!=1.0; x = x + 0.1)
System.out.println ("x is " + x); // If using Java
cout << "x is " << x << '\n'; // If using C++
(b) Show a better way to code the following statement:
if (x == 3.1) x = 0.0;
Figure 4.19: Machine language for floating point instructions (all values are in
hexadecimal)
31 26 25 21 20 16 15 11 10 6 5 0
ignored
Instruction          opcode  Example              $rs   $rt   imm
lwc1 $rt, imm($rs)   0x31    lwc1 $f2, 12($a0)    $a0   $f2   000c
swc1 $rt, imm($rs)   0x39    swc1 $f4, -24($a1)   $a1   $f4   ffe8
Figure 4.20: Machine language formats for memory reference instructions for
floating point data (single precision)
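As a rough check on the two examples in Fig 4.20, the fields can be packed into a 32-bit word with shifts and ORs. This Python sketch assumes the usual I format layout (opcode in bits 31..26, rs in 25..21, rt in 20..16, immediate in 15..0); the function name encode_i_format is ours.

```python
def encode_i_format(opcode, rs, rt, imm):
    """Pack opcode (6 bits), rs (5), rt (5) and a 16-bit immediate into one word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

A0, A1 = 4, 5  # register numbers of $a0 and $a1

lwc1 = encode_i_format(0x31, A0, 2, 12)    # lwc1 $f2, 12($a0)
swc1 = encode_i_format(0x39, A1, 4, -24)   # swc1 $f4, -24($a1)

print(f"{lwc1:08x} {swc1:08x}")
```

Masking the immediate with 0xFFFF is what turns -24 into the twos complement field ffe8 shown in the figure.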
opcode rs rt immediate
Figure 4.22: Machine language formats for floating point conditional branch
instructions
3. Show the result in hexadecimal, assuming the don’t cares are zeros.
46 02 20 3c
31 26 25 21 20 16 15 0
lp:
add.s $f2, $f4, $f2
mul.s $f6, $f8, $f2
c.lt.s $f6, $f2
bc1f lp
3. Show the instruction in binary (we show the immediate field in hex)
184 CHAPTER 4. MACHINE LANGUAGE FOR MIPS
4.8.4 Exercises
1. Given the following code:
.text
c.le.s $f6, $f2
bc1t done
sub.s $f12, $f16, $f2
lwc1 $f6, y
sw $t0, x
div.s $f0, $f28, $f2
done:
swc1 $f0, y
.data
x: .float 3.45
y: .float 0
A MIPS Assembler
186 CHAPTER 5. A MIPS ASSEMBLER
skipCommaWhite
Note that in an assembly language statement the mnemonic is separated from
the operands by one or more spaces and/or tab characters (we call this white
space² because that is what it looks like when printed on white paper). Note also
that the operands are separated from each other by commas and possibly white
space. The MARS assembler does not require the use of commas, i.e. white
2 For our purposes here we exclude newline characters from white space.
5.1. VERSION 1 - R FORMAT INSTRUCTIONS ONLY 187
space can be used interchangeably with commas, and we will take the same
approach. Thus the above statement could conceivably (but not advisably) be
written as:
add,$2 $3,, , $8
Thus the first, lowest-level function will be one which scans a string
from a given starting point until it finds a character which is neither white
space nor comma (i.e. neither space nor tab nor comma). We call this function
skipCommaWhite.
This function is shown in Fig 5.1. Note that the API for this function
specifies that register $a0 points to a character in a statement (the start point
for the scan). The post conditions are (1) register $a0 points to the first non-
white, non-comma character found and (2) register $v0 is unchanged. The
skipCommaWhite function uses local data to store the space, tab, and comma
characters, which are loaded into registers $t1, $t2, and $t3, respectively. In
the body of the loop it checks a character from the given string for one of these
delimiters, terminating if not found. In order to satisfy the post condition, it
decrements the pointer in $a0 when finished. This function does not need to
save any registers because it does not use any s registers, and it does not call
any functions (which would clobber the $ra register).
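The behavior described above can be modeled in a few lines of Python. This is only a sketch of the logic, not the MIPS function itself: an index stands in for the pointer in $a0, and the bounds check stands in for the null byte that ends a MIPS string (an assumption of ours).

```python
DELIMITERS = " \t,"   # space, tab, comma

def skip_comma_white(statement, start):
    """Return the index of the first character at or after start that is
    not a space, tab, or comma (len(statement) if there is none)."""
    i = start
    while i < len(statement) and statement[i] in DELIMITERS:
        i += 1
    return i

print(skip_comma_white(" , , abc ", 0))   # index of 'a'
```

Unlike the MIPS version, the sketch tests the character before advancing, so no final decrement is needed.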
As we develop our assembler incrementally, it is important that we test each
function as it is developed. For this purpose we should write a driver for each
function.³ The sole purpose of the driver is to test a specific
function. A driver for the skipCommaWhite function is shown in Fig 5.2.
To run the driver, we could copy and paste the skipCommaWhite function
into the file containing the driver. However, that would give us two identical
copies of skipCommaWhite. This is not a good idea - if we ever need to make a
change to this function, we would need to make the change in each copy. For
this reason, duplicated code should be avoided whenever possible.
The MARS assembler provides a way of avoiding duplicated code: the
.include directive. The operand is a file name, and the code from that file
is included as the assembler processes the current source file. The .include
directive appears after the syscall termination statement in Fig 5.2.
strcmp
We next turn our attention to the mnemonic in a statement. As noted in
chapter 3 the mnemonic represents the operation to be performed. For example
in the statement
add $2, $3, $8
the mnemonic is add. As our assembler encounters a mnemonic in a statement,
we will need to determine which operation it represents. To do this we will use
a table of valid mnemonics, and compare against the entries in the table. In
chapter 3 we developed a function to compare strings, strcmp, but unfortunately
it assumed the strings were terminated with null bytes. We will have to modify
3 In most cases we leave the driver as an exercise.
.text
skipCommaWhite:
lbu $t1, space_skipCommaWhite
lbu $t2, tab_skipCommaWhite
lbu $t3, comma_skipCommaWhite
lp_skipCommaWhite:
lbu $t0, 0($a0) # load byte from input string
addi $a0, $a0, 1
beq $t0, $t1, lp_skipCommaWhite # check for space
beq $t0, $t2, lp_skipCommaWhite # check for tab
beq $t0, $t3, lp_skipCommaWhite # check for comma
done_skipCommaWhite:
addi $a0, $a0, -1
jr $ra
######################## skipCommaWhite function end ##################
Figure 5.1: Function to scan past white space and commas in a statement
## Local data
.data
input: .asciiz " , , abc "
.word -1
result: .byte 0
.text
main:
la $a0, input # address of first byte in string
jal skipCommaWhite
lbu $t0, 0($a0) # should be non-white char
.include "skipCommaWhite.asm"
that function to use it here, because the strings we are comparing could be
terminated with white space or commas.
The modified version of strcmp is shown in Fig 5.3. Note that we load
the space and tab characters into registers $t2 and $t3, respectively. Then,
in the loop, we load a character from the first string into register $t0 and
the corresponding character from the second string into register $t1. After
subtracting these characters, we know that if the result is not zero, the strings
could not be equal, and we terminate the function. If we reach the end of both
strings on the same iteration of the loop, then we know the strings are equal.
The driver for strcmp should test several cases:
• The strings are equal, with identical terminating characters
• The strings are equal, with different terminating characters
• The strings are not equal
• The strings are not equal, but one string is a prefix of the other string,
e.g. “add” and “addi”
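The four test cases listed above can be exercised against a small Python model of the comparison. This sketch is ours, not the MIPS code: it treats space, tab, and comma as terminators (the text mentions commas, while Fig 5.3 loads only space and tab), and it treats the end of a Python string as a terminator as well.

```python
TERMINATORS = " \t,"

def strcmp_token(s1, s2):
    """Compare the tokens at the start of s1 and s2.
    A token ends at a space, tab, comma, or the end of the string.
    Returns 0 if the tokens are equal, nonzero otherwise."""
    i = 0
    while True:
        c1 = s1[i] if i < len(s1) else " "
        c2 = s2[i] if i < len(s2) else " "
        end1 = c1 in TERMINATORS
        end2 = c2 in TERMINATORS
        if end1 and end2:
            return 0                    # both tokens ended on the same iteration
        if end1 or end2 or c1 != c2:
            return ord(c1) - ord(c2)    # characters differ, so this is nonzero
        i += 1

print(strcmp_token("add ", "add\t"))    # equal, different terminators
print(strcmp_token("add ", "addi "))    # one token is a prefix of the other
```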
mnemonic
In the mnemonic function, the assembler scans the mnemonic in the state-
ment, locates that mnemonic in a table of mnemonics, and starts to con-
.text
strcmp:
lb $t2, space_strCmp
lb $t3, tab_strCmp
lp_strCmp:
lb $t0, 0($a0) # load byte from src1
lb $t1, 0($a1) # load byte from src2
sub $v0, $t0, $t1 # v0 = t0 - t1
bne $v0, $0, unequal_strCmp
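beq $t0, $t2, done_strCmp # space or tab?
beq $t0, $t3, done_strCmp
addi $a0, $a0, 1 # advance to next char of src1
addi $a1, $a1, 1 # advance to next char of src2
j lp_strCmp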
unequal_strCmp:
beq $t0, $t2, white0_strCmp
beq $t0, $t3, white0_strCmp
j done_strCmp
white0_strCmp:
beq $t1, $t2, white1_strCmp
beq $t1, $t3, white1_strCmp
j done_strCmp
white1_strCmp:
move $v0, $0 # Strings are equal
done_strCmp:
jr $ra # return
######################## strCmp function end ##################
Figure 5.3: Function to compare strings for equality. The strings are terminated
by white space or a comma
ops_mnemonic:
.asciiz "add " # mnemonic
.word 0x00000020 # opCode, function.
.asciiz "sub "
.word 0x00000022
.asciiz "and "
.word 0x00000024
.asciiz "or "
.word 0x00000025
.asciiz "slt "
.word 0x0000002a
.asciiz "beq "
.word 0x10000000 # opCode = 4 (shift left 2 bits)
.asciiz "bne "
.word 0x14000000 # opCode = 5
.asciiz "j "
.word 0x08000000 # opCode = 2
.asciiz "end "
.word -1
opsEnd_mnemonic:
.text
mnemonic:
addi $sp, $sp, -20
sw $s0, 0($sp)
sw $s1, 4($sp)
sw $s2, 8($sp)
sw $s3, 12($sp)
sw $ra, 16($sp)
move $s3, $a1 # address of instruction
Figure 5.4: Function to search a table for a mnemonic from the assembly lan-
guage statement, and initialize the machine language instruction with an opcode
and a function code (continued in Fig 5.5)
jal skipCommaWhite
move $v0, $0
j done_mnemonic
end_mnemonic:
addi $v0, $0, 1 # end of program
j done_mnemonic
error_mnemonic:
addi $v0, $0, -1
done_mnemonic:
move $a1, $s3
lw $ra, 16($sp)
lw $s3, 12($sp)
lw $s2, 8($sp)
lw $s1, 4($sp)
lw $s0, 0($sp)
addi $sp, $sp, 20
jr $ra
.text
##################### mnemonic function end #######################
.include "strcmp.asm"
Figure 5.5: Function to search a table for a mnemonic from the assembly lan-
guage statement, and initialize the machine language instruction with an opcode
and a function code (continued from Fig 5.4)
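The table search can be pictured with a dictionary lookup. The Python sketch below copies the mnemonics and 32-bit values from the table in Fig 5.4 and mirrors the return codes visible in Fig 5.5 (0 for a valid mnemonic, 1 for end, -1 for an error); returning a (code, word) pair and splitting the statement with str.split are simplifications of ours.

```python
OPS = {
    "add": 0x00000020, "sub": 0x00000022, "and": 0x00000024,
    "or":  0x00000025, "slt": 0x0000002A,
    "beq": 0x10000000, "bne": 0x14000000, "j": 0x08000000,
}

def mnemonic(statement):
    """Return (code, word): code 0 = found, 1 = 'end', -1 = unknown mnemonic.
    word is the instruction initialized with its opcode and function code."""
    tokens = statement.replace(",", " ").split()
    if not tokens:
        return -1, None
    name = tokens[0]
    if name == "end":
        return 1, None
    if name in OPS:
        return 0, OPS[name]
    return -1, None

print(mnemonic(" sub $12, $2, $3"))
```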
## Local data
.data
zero_parseInt: .asciiz "0"
ten_parseInt: .word 10
.text
parseInt:
addi $sp, $sp, -16
sw $ra, 0($sp) # push return address onto stack
sw $s0, 4($sp)
sw $s1, 8($sp)
sw $s2, 12($sp)
Figure 5.6: Function to parse a numeric string, producing a 5-bit binary field
## Local data
.data
zero_isNumeric: .asciiz "0"
nine_isNumeric: .asciiz "9"
###################################### End isNumeric function
.text
isNumeric:
addi $sp, $sp, -4
sw $ra, 0($sp)
li $v0, 0
lbu $t0, zero_isNumeric
blt $a1, $t0, done_isNumeric # too small
lbu $t0, nine_isNumeric
bgt $a1, $t0, done_isNumeric # too big
reg:
addi $sp, $sp, -4
sw $ra, 0($sp)
jal parseInt
li $t0, 32
blt $v0, $t0, done_reg # check for valid reg number
li $v0, -1
done_reg:
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
###################### reg function end ##########################
.include "parseInt.s"
Figure 5.8: Function to scan a register number in a statement, and obtain its
binary value
reg
We are now ready to process a register number in a statement. The function reg,
shown in Fig 5.8, will accept the address of the register number in a statement,
and return the register number, in binary, in register $v0. Thus, if the statement
is
add $31, $20, $20
and register $a0 points to the ’3’ then the binary value, 31, will be returned in
register $v0.
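A Python model of reg makes the range check explicit. This is a sketch under our own assumptions: the argument is the text starting at the first digit, and a missing digit is reported as -1 (the MIPS listing shown above only checks that the parsed number is below 32).

```python
def reg(text):
    """Return the register number at the start of text, or -1 if it is
    missing or not in the range 0..31."""
    digits = ""
    for ch in text:
        if not ("0" <= ch <= "9"):
            break
        digits += ch
    if digits == "":
        return -1
    number = int(digits)
    return number if number < 32 else -1

print(reg("31, $20, $20"))
```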
operand
We next consider a function to process a register operand in an R format state-
ment. The operand could be any of the three operands: RD, RS, or RT. One of
the inputs to the function will determine which of the three operands is being
processed. The operand function will convert the register number to binary,
and place it in the correct field of the instruction.
This function is shown in Figs 5.9 and 5.10, in which the API specifies a
code in register $a2 to specify which operand is being processed. This code is
actually a shift amount, to place the operand in the correct field of the machine
language instruction. For example, if we are processing the RT register, then
$a2 will contain 16. This is used to shift the register number 16 bits left, which
is where it is to be placed in the instruction. To do this we use a variable
shift instruction, sllv, in which the third operand is a register containing the
number of bits to be shifted. Once it has been shifted it can be ORed into the
instruction.
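The shift-and-OR step can be sketched in Python. The shift amount 16 for RT comes from the text; 11 for RD and 21 for RS follow from the standard R format field positions. The constant and function names are ours.

```python
RD_SHIFT, RT_SHIFT, RS_SHIFT = 11, 16, 21

def place_operand(instruction, reg_number, shift):
    """Shift the register number into its field and OR it into the instruction."""
    return instruction | (reg_number << shift)

# add $2, $3, $4: start from the function code for add, then place RD, RS, RT
word = 0x00000020
word = place_operand(word, 2, RD_SHIFT)
word = place_operand(word, 3, RS_SHIFT)
word = place_operand(word, 4, RT_SHIFT)

print(f"{word:08x}")
```

Because each field starts out as zeros, ORing in the shifted register number cannot disturb the fields that were filled in earlier.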
asm
Finally, we have all the tools we need to process an assembly language statement,
and produce the machine language instruction (for a limited subset of the MIPS
architecture). We call this function asm, for assembler.
The asm function is shown in Fig 5.14. This function is fairly short; all the
real work is done in other functions which are called by this function, specifically:
.text
operand:
addi $sp, $sp, -12
sw $s0, 0($sp)
sw $s1, 4($sp)
sw $ra, 8($sp)
Figure 5.9: Function to place an operand (i.e. register number) into the machine
language instruction (continued in Fig 5.10)
error_operand:
move $v0, $a0
done_operand:
move $a1, $s0
lw $s0, 0($sp)
lw $s1, 4($sp)
lw $ra, 8($sp)
addi $sp, $sp, 12
jr $ra
.text
########## operand function end ########################################
.include "reg.s"
.include "skipCommaWhite.s"
Figure 5.10: Function to place an operand (i.e. register number) into the ma-
chine language instruction (continued from Fig 5.9)
operandRD:
addi $sp, $sp, -4
sw $ra, 0($sp)
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
########## operandRD function end ########################################
.include "operand.asm"
Figure 5.11: Function to place the RD operand (i.e. register number) into the
machine language instruction
operandRT:
addi $sp, $sp, -8
sw $s0, 0($sp)
sw $ra, 4($sp)
done_operandRT:
move $a1, $s0
lw $s0, 0($sp)
lw $ra, 4($sp)
addi $sp, $sp, 8
jr $ra
########## operandRT function end ########################################
.include "lineEnd.s"
Figure 5.12: Function to place the RT operand (i.e. register number) into the
machine language instruction and scan to the beginning of the next statement
jal skipCommaWhite
li $v0, 0
lbu $t0, 0($a0) # load byte from input string
beq $t0, $zero, done_lineEnd # null byte, end of stmt
lbu $t1, newline_lineEnd
beq $t0, $t1, done_lineEnd # newline character, end of stmt
done_lineEnd:
addi $a0, $a0, 1
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
######################## lineEnd function end ##################
.text
Figure 5.13: Function to scan to the end of a line, to the start of the next
statement in the assembly language program
The program repeats the above sequence once for each statement in the assembly
language program.
We also show a driver for the asm function; this will tell us whether our as-
sembler is working correctly. The driver is shown in Fig 5.15. In the driver there
are four statements after the syscall statement which terminates the program.
Since there are no labels on those four statements, they can never be reached
during execution. Nevertheless they will be useful. They are the same four
statements which we have placed in memory at the label input_asmDriver. We
can then compare the machine language instructions produced by our assembler
with the instructions produced by MARS, at the end of the driver.
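For reference, the words that the four test statements should produce can be computed independently. The Python sketch below is a stand-in for the whole version 1a pipeline, under our own simplifications: numeric registers only, operands in the order RD, RS, RT, and function codes taken from the table in Fig 5.4.

```python
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25, "slt": 0x2A}

def assemble(statements):
    """Translate R format statements into 32-bit words, stopping at 'end'."""
    words = []
    for statement in statements:
        tokens = statement.replace(",", " ").split()
        if tokens[0] == "end":
            break
        rd, rs, rt = (int(t.lstrip("$")) for t in tokens[1:4])
        words.append(FUNCT[tokens[0]] | (rs << 21) | (rt << 16) | (rd << 11))
    return words

source = [" add $2,$3,$4 ", " sub $12, $2, $3",
          " and $9, $10,$11", " or $21, $22, $23 ", " end "]
machine_code = assemble(source)
for w in machine_code:
    print(f"{w:08x}")
```

These values can be compared with what MARS shows for the four instructions at the end of the driver.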
At this point we emphasize the fact that the assembler function, asm, is
relatively short and simple, because it simply calls other functions, none of
which is excessively long or complex. This exposes a few important principles
of software engineering:
• Use many small and simple components (functions in this case) as opposed
to a few large and complex components.
• Be sure that the interfaces⁴ are appropriate, and are clearly documented
in the API for each component.
# Pre: $a0 contains the memory address of the first line of asm code
# Last asm statement followed by -1.
# $a1 contains memory address for output.
# Post: $v0<0 => syntax error
# $a0 will contain address of error
asm:
addi $sp, $sp, -4
sw $ra, 0($sp)
lp_asm:
jal mnemonic
bne $v0, $0, done_asm # end of source program
jal operandRD
bne $v0, $0, done_asm
jal operandRS
bne $v0, $0, done_asm
jal operandRT
bne $v0, $0, done_asm
addi $a1, $a1, 4 # next instruction
blt $v0, $0, done_asm # error
beq $v0, $0, lp_asm
done_asm:
lw $ra, 0($sp)
addi $sp, $sp, 4
jr $ra
Figure 5.14: Function to scan an assembly language program and create the
corresponding machine language program (version 1a)
# testing assembler
# version 1a: R format instructions only (no shift)
# No symbolic memory addresses
# No symbolic register addresses
# ’end’ mnemonic terminates input
# Limited error checking
# No comments nor Pseudo-ops
.data
input_asmDriver:
.asciiz " add $2,$3,$4 "
.asciiz " sub $12, $2, $3"
.asciiz " and $9, $10,$11"
.asciiz " or $21, $22, $23 "
.asciiz " end "
output_asmDriver:
.word 0, 0, 0, 0
returnCode: .word -1
.text
main:
la $a0, input_asmDriver
la $a1, output_asmDriver
jal asm
sw $v0, returnCode
li $v0, 10
syscall
add $2,$3,$4
sub $12, $2, $3
and $9, $10, $11
or $21, $22, $23
.include "asm.s"
.data
.word -1
regNames_reg: .asciiz "zero "
.asciiz "at "
.asciiz "v0 "
.asciiz "v1 "
.asciiz "a0 "
.asciiz "a1 "
.asciiz "a2 "
.asciiz "a3 "
.asciiz "t0 "
.asciiz "t1 "
.asciiz "t2 "
.asciiz "t3 "
.asciiz "t4 "
.asciiz "t5 "
.asciiz "t6 "
.asciiz "t7 "
.asciiz "s0 "
.asciiz "s1 "
.asciiz "s2 "
.asciiz "s3 "
.asciiz "s4 "
.asciiz "s5 "
.asciiz "s6 "
.asciiz "s7 "
.asciiz "t8 "
.asciiz "t9 "
.asciiz "k0 "
.asciiz "k1 "
.asciiz "gp "
.asciiz "sp "
.asciiz "fp "
.asciiz "ra "
regNamesEnd_reg: .word -1
.asciiz ""
.text
reg:
addi $sp, $sp, -20
sw $ra, 0($sp)
sw $s0, 4($sp)
sw $s1, 8($sp)
sw $s2, 12($sp)
sw $s3, 16($sp)
[Call graph diagram; nodes: asm, skipCommaWhite, reg, parseInt, isNumeric]
Figure 5.18: Call graph for version 1a of the assembler. Solid arrows represent
a function call, with an .include directive. Dashed arrows represent function
calls with no .include directive. operandRD, operandRS, and operandRT have
been abbreviated to save space.
5.1.4 Exercises
1. Write a driver for each of the following functions:
(a) strcmp
(b) mnemonic
(c) isNumeric
(d) parseInt
(e) reg (version 1a)
(f) reg (version 1b)
(g) operand
(h) operandRD
(i) operandRS
(j) operandRT
(k) lineEnd
2. Extend the driver for the assembler (Fig 5.15) to include at least three
other statements. Compare your results with the program produced by
the MARS assembler.
3. Extend the mnemonic table (Fig 5.4) to include the operations xor, mult,
div, mflo, and mfhi,
5. Include the shift instructions sll and srl in your mnemonic table. You
will need to implement changes to mnemonic, asm, operandRT, and
lineEnd. Also write a function, shamt, to process the shift amount instead
of the RS register.
Hint: This is tricky because the shift amount replaces the RS operand in
the instruction, so the second operand in the statement is the RT register.
6. How would the call graph shown in Fig 5.18 be changed for version 1b of
the assembler (i.e. register names are permitted)?
Most of the changes will be in our mnemonic function, which determines the
instruction op code. We now include a one byte value in the table of instruction
mnemonics to tell us whether the instruction is R, I, or J format. We use this
as the return code for the mnemonic function.
We then make use of this return code in the asm function. If the statement
is I format, it will process the $rt and $rs fields. Then instead of a $rd field (and
shamt and function code fields) it will process the immediate field and store it
directly in the low order 16 bits of the instruction. In doing so, the asm function
will call the parseInt function to convert the immediate field from ASCII to a
binary 16-bit value. We used the parseInt function in version 1 to convert a
register number to a binary value. Now we must update this function to accept
negative numbers as well as positive. To do so, we merely check for a minus
sign (’-’) at the beginning. If one is present, we negate the returned result.
This version of our assembler is shown in Appendix ??
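The sign handling described above can be sketched in Python. This is a model of the idea rather than the MIPS parseInt: the names parse_int and immediate_field are ours, and masking with 0xFFFF is our way of showing the 16-bit twos complement field.

```python
def parse_int(text):
    """Convert a leading decimal number, with an optional minus sign, to an int."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    value = 0
    for ch in digits:
        if not ("0" <= ch <= "9"):
            break                       # stop at the first non-digit, e.g. '('
        value = value * 10 + (ord(ch) - ord("0"))
    return -value if negative else value

def immediate_field(text):
    """Return the value as it would be stored in the low order 16 bits."""
    return parse_int(text) & 0xFFFF

print(parse_int("-24($a1)"), f"{immediate_field('-24($a1)'):04x}")
```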
Figure 5.19: Two simple pseudo operations, and their equivalent instructions
lw $t0, 0($sp)
sw $s3, -12($a0)
These are I (immediate) format instructions, in which the $rt register is the
register being loaded or stored, and the effective memory address is the sum of
the $rs register and the immediate field. To handle these changes we will need
to work with the second pass of our assembler which we now call asmPass2.
This function will call a function which determines the operation code, which
we now call mnemonic2. It will determine the instruction type and return values
as shown below:
• $v0 = 0: R format
• $v0 = 3: J format
5.2.3 Exercises
1. Include a clear pseudo operation in version 2b of the assembler. It will
have just one operand, the register to be cleared. For example, the state-
ment clear $t0 will put the value 0 into register $t0.
Hint: Use an add or addi instruction to clear the operand.
2. Does the assembler permit white space in the middle of an explicit address?
For example, is the following handled correctly?
lw $t0, 12 ( $a0 )
If not, make the necessary modifications to version 2b of the assembler.
3.
This chapter begins our discussion of digital hardware. We start with some basic
theory of boolean algebra. We will then show how boolean functions may be
realized using some elementary building blocks - logic gates. We then use these
to build more complex components, which are then used to build more complex
components, and so on. Our goal is to build a small central processing unit, or
CPU. At this point the student should be able to understand the execution of
a machine language program. Various technologies are used to build the logic
gates, but this text will treat the gates as atomic components; i.e. we will work
from the gate level and up.
In what follows, a true value is represented by a binary 1, and a false value
is represented by a binary 0, which is consistent with chapter 3.
214 CHAPTER 6. BOOLEAN ALGEBRA AND DIGITAL LOGIC
x  y  xy + x  term
0  0    0
0  1    0
1  0    1     xy’
1  1    1     xy
Hopefully the context of boolean functions will always make it clear that we
are working with booleans and not with numbers.
In an expression involving more than one operation, the order of operations
is important. Parentheses may be used to specify the order of operations. If
parentheses are omitted, the NOT operation takes precedence over the AND
operation:
x · y′ = x · (y′)
The AND operation takes precedence over the OR operation, consistent with
algebra:
x + y · z = x + (y · z)
Thus we could write more complex expressions such as the one below:
x + yz′ + x′z
which is equivalent to:
(x + (y · (z′))) + ((x′) · z)
The Exclusive OR operation is designated by a ⊕ symbol. Its precedence
level is the same as the +. When an expression contains two or more operations
of the same precedence, the leftmost operation is performed first. For example:
x + y ⊕ z + w = ((x + y) ⊕ z) + w
Canonical Forms
The student may have noticed that in Fig 6.1 the entries in the column for x
are the same as the entries in the column for xy+x. Thus we have the identity,
for any variables, x and y:
xy + x = x
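The identity can be confirmed by generating the truth table. In this Python sketch, & and | play the roles of AND and OR on the values 0 and 1; the helper name truth_table is ours.

```python
from itertools import product

def truth_table(f, n):
    """Return a list of (inputs, output) rows in the usual counting order."""
    return [(bits, int(f(*bits))) for bits in product((0, 1), repeat=n)]

rows = truth_table(lambda x, y: (x & y) | x, 2)
for (x, y), value in rows:
    print(x, y, value)

# The output column matches the x column in every row, so xy + x = x.
identity_holds = all(value == x for (x, y), value in rows)
```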
6.1. NOTATION FOR BOOLEAN FUNCTIONS 215
x  y  z  xy + x’yz’  term
0  0  0      0
0  0  1      0
0  1  0      1       x’yz’
0  1  1      0
1  0  0      0
1  0  1      0
1  1  0      1       xyz’
1  1  1      1       xyz
It should now be clear that there is more than one expression for a given
boolean function. There are many areas of computer science where there exist
multiple representations for an entity. For example, in chapter 4 we saw multiple
representations (mantissa and exponent) of a floating point number. In such
situations we often wish to designate exactly one of those representations as
preferred over the others. This is known as a canonical form or a normal form.
For any boolean expression we have a canonical form which is known as a
sum of products or disjunctive normal form². The sum of products normal form
can be obtained from a truth table as follows (here we assume the variables are
x, y, and z):
1. In each row of the truth table which has a 1 for the function’s value,
include a term involving an AND of all inputs, but for those inputs which
have a 0 value in that row, negate the variable. For example, in the row
for 0 1 0, the term would be x’yz’. These terms are shown in the last
column of Figs 6.1 and 6.2.
2. Form the OR of all the above terms.
For the function of Fig 6.1 the sum of products normal form would be
xy’ + xy. For the function of Fig 6.2 the sum of products normal form would
be x’yz’ + xyz’ + xyz.
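The two-step procedure above is mechanical enough to sketch in Python. The function name sum_of_products is ours, and the sketch writes negation with an ASCII apostrophe.

```python
from itertools import product

def sum_of_products(f, names):
    """Build the sum of products normal form of f from its truth table."""
    terms = []
    for bits in product((0, 1), repeat=len(names)):
        if f(*bits):
            # AND of all inputs, negating each variable that is 0 in this row
            terms.append("".join(n if b else n + "'" for n, b in zip(names, bits)))
    return " + ".join(terms)        # OR of all the terms

fig_6_1 = sum_of_products(lambda x, y: (x & y) | x, "xy")
fig_6_2 = sum_of_products(lambda x, y, z: (x & y) | ((1 - x) & y & (1 - z)), "xyz")
print(fig_6_1)
print(fig_6_2)
```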
here.
two OR operations, and two NOT operations. However since we know that the
expression represents the same function as xy + x’yz’ we can implement the
function with only three AND operations, one OR operation, and two NOT
operations.
As we will see later, minimizing a boolean expression leads to reduced hard-
ware requirements, and consequently reduced production costs. For that reason
we will be interested in finding ways of finding a minimal expression for a boolean
function.
Boolean identities
One way to minimize a boolean expression is by applying various identities,
such as x + xy = x. This identity, and many more, are shown in Fig 6.3³. For
each identity, there is a corresponding dual identity, shown in the last column.
The dual is obtained by changing all AND operations to ORs, changing all OR
operations to ANDs, changing all 1’s to 0’s, and changing all 0’s to 1’s. Note
in particular that, unlike arithmetic algebra, the OR operation distributes over
the AND operation: x + yz = (x+y)(x+z).
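Identities such as these can be checked exhaustively, since each variable has only two possible values. The Python sketch below (helper name ours) checks the absorption identity, its dual, and both distributive laws, and shows that the arithmetic analogue of the second law fails.

```python
from itertools import product

def same_function(f, g, n):
    """True if f and g agree on every combination of n boolean inputs."""
    return all(f(*b) == g(*b) for b in product((0, 1), repeat=n))

absorption      = same_function(lambda x, y: x | (x & y), lambda x, y: x, 2)
absorption_dual = same_function(lambda x, y: x & (x | y), lambda x, y: x, 2)
and_over_or = same_function(lambda x, y, z: x & (y | z),
                            lambda x, y, z: (x & y) | (x & z), 3)
or_over_and = same_function(lambda x, y, z: x | (y & z),
                            lambda x, y, z: (x | y) & (x | z), 3)

# In ordinary arithmetic, addition does not distribute over multiplication.
arithmetic_fails = (1 + 2 * 3) != (1 + 2) * (1 + 3)
```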
As an example we take the boolean expression x’yz + x’yz’ + xyz’ and we
attempt to find an equivalent expression which is, in some sense, minimal. Below
we show each step in the derivation, with the algebraic identity as justification:
      yz=00  yz=01  yz=11  yz=10
x=0
x=1
      yz=00  yz=01  yz=11  yz=10
x=0     1      0      1      1
x=1     1      0      0      0
Figure 6.5: A K-map for the boolean expression x’y’z’ + x’yz + x’yz’ + xy’z’
      yz=00  yz=01  yz=11  yz=10
x=0     1      0      1      1
x=1     1      0      0      0
Figure 6.6: A K-map for the boolean expression x’y’z’ + x’yz + x’yz’ + xy’z’.
Two groups of two are identified. The minimized expression is y’z’ + x’y.
the variables for which the value is zero. For example, in Fig 6.6 there is a
vertical group in the column for y=0, z=0. This means we will have the term
y’z’ in our result. There is also a horizontal group in the row for x=0 and the
columns for which y=1. This means that we will have the term x’y in our result.
Thus, our minimized expression is y’z’ + x’y.
Next we attempt to minimize the boolean expression x’y’z’ + x’yz’. There
are only two 1’s in the map, but they are actually adjacent, because the groups
may wrap around from end to end, as shown in Fig 6.7. Thus the resulting term
will have x=0 and z=0, producing x’z’ as the minimal expression.
We can also have a 2x2 block of 1’s in a K-map, and the blocks may overlap
with other blocks to produce a minimal expression. Our example, shown in
Fig 6.8, is the function x’yz + x’yz’ + xy’z + xyz + xyz’. In this case we have
a 2x2 block (y) and a 1x2 block (xz) which overlap, yielding the minimized
expression y + xz.
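Each of the three minimizations above can be confirmed by comparing the original and minimized expressions on all eight inputs. This Python sketch uses our own helper names; NOT(v) is written as 1 - v.

```python
from itertools import product

def NOT(v):
    return 1 - v

def same_function(f, g, n):
    """True if f and g agree on every combination of n boolean inputs."""
    return all(f(*b) == g(*b) for b in product((0, 1), repeat=n))

# Fig 6.6: x'y'z' + x'yz + x'yz' + xy'z'  =  y'z' + x'y
fig_6_6 = same_function(
    lambda x, y, z: (NOT(x) & NOT(y) & NOT(z)) | (NOT(x) & y & z)
                    | (NOT(x) & y & NOT(z)) | (x & NOT(y) & NOT(z)),
    lambda x, y, z: (NOT(y) & NOT(z)) | (NOT(x) & y), 3)

# Fig 6.7: x'y'z' + x'yz'  =  x'z'
fig_6_7 = same_function(
    lambda x, y, z: (NOT(x) & NOT(y) & NOT(z)) | (NOT(x) & y & NOT(z)),
    lambda x, y, z: NOT(x) & NOT(z), 3)

# Fig 6.8: x'yz + x'yz' + xy'z + xyz + xyz'  =  y + xz
fig_6_8 = same_function(
    lambda x, y, z: (NOT(x) & y & z) | (NOT(x) & y & NOT(z)) | (x & NOT(y) & z)
                    | (x & y & z) | (x & y & NOT(z)),
    lambda x, y, z: y | (x & z), 3)

print(fig_6_6, fig_6_7, fig_6_8)
```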
      yz=00  yz=01  yz=11  yz=10
x=0     1      0      0      1
x=1     0      0      0      0
Figure 6.7: A K-map for the boolean expression x’y’z’ + x’yz’. The group wraps
around from end to end. The minimized expression is x’z’.
      yz=00  yz=01  yz=11  yz=10
x=0     0      0      1      1
x=1     0      1      1      1
Figure 6.8: A K-map for the boolean expression x’yz + x’yz’ + xy’z + xyz +
xyz’. The 2x2 block overlaps the 1x2 block. The minimized expression is y +
xz.
        yz=00  yz=01  yz=11  yz=10
wx=00
wx=01
wx=11
wx=10
        yz=00  yz=01  yz=11  yz=10
wx=00     0      1      0      0
wx=01     0      0      1      1
wx=11     0      0      1      1
wx=10     1      0      0      1
Figure 6.10: A K-map for the boolean expression w’x’y’z + w’xyz + w’xyz’ +
wxyz + wxyz’ + wx’y’z’ + wx’yz’. A 1x2 group and a 2x2 group are identified.
The minimized expression is xy + wx’z’ + w’x’y’z
The technique for minimizing a boolean function with four variables is very
much like the technique used for three variables. However, with four variables
we can now have 4x1 blocks, 2x4 blocks, and 4x2 blocks. We use the same rules
for blocking the 1’s and determining the terms of the minimized expression.
Note that blocks can overlap, as with three variable maps, and that blocks can
now wrap around vertically as well as horizontally.
As an example, we take the boolean function of four variables given by
w’x’y’z + w’xyz + w’xyz’ + wxyz + wxyz’ + wx’y’z’ + wx’yz’. The K-map
is shown in Fig 6.10. Note that we have a 2x2 group (the term is xy) and a
1x2 group, wrapping around horizontally (the term is wx’z’). There is also a 1
which is not adjacent to any others, corresponding to the term w’x’y’z. That
term will have to appear in the final result, which is xy + wx’z’ + w’x’y’z.
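One way to check the result is to rebuild the K-map from the minimized expression and compare it with Fig 6.10. The Python sketch below lays out rows and columns in the same 00, 01, 11, 10 order as the figure; the names GRAY, kmap, and minimized are ours.

```python
GRAY = ("00", "01", "11", "10")   # row and column order used in the K-map

def kmap(f):
    """Return the 4x4 K-map of a four-variable function f(w, x, y, z)."""
    return [[f(int(wx[0]), int(wx[1]), int(yz[0]), int(yz[1])) for yz in GRAY]
            for wx in GRAY]

def minimized(w, x, y, z):
    """xy + wx'z' + w'x'y'z"""
    n = lambda v: 1 - v
    return (x & y) | (w & n(x) & n(z)) | (n(w) & n(x) & n(y) & z)

for label, row in zip(GRAY, kmap(minimized)):
    print("wx=" + label, row)
```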
Don’t cares
Up to this point all boolean expressions have been completely specified (the
truth tables show a 0 or 1 for every possible row). There are often situations
in which the problem does not require a complete specification, i.e. for some
inputs we do not care what the output is. In many applications certain inputs
are not expected, or disallowed, making the corresponding output irrelevant. In
this case we call the output a don’t care - it may be either a 0 or a 1. When
minimizing the boolean expression, it is possible to improve the minimization
by making use of don’t cares.
For example, consider the truth table in Fig 6.11. The don’t care outputs
are shown as question marks. A canonical sum of products expression for this
boolean function is x’y’z + x’yz’. Looking at the K-map in Fig 6.12, at first
x y z f(x,y,z)
0 0 0 ?
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 0
1 0 1 0
1 1 0 0
1 1 1 ?
Figure 6.11: Truth table defining a boolean function with 3 don’t cares (shown
as question marks)
      yz=00  yz=01  yz=11  yz=10
x=0     ?      1      ?      1
x=1     0      0      ?      0
Figure 6.12: A K-map for the boolean function whose truth table is given in
Fig 6.11 (question marks are don’t cares)
glance it appears that it cannot be simplified. But if we assume that the two
don’t cares in the top row are 1’s and the other don’t care is a 0, then we have
a group of 4 1’s in the top row, as shown in Fig 6.12, and this boolean function
simplifies to f(x,y,z) = x’.
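A candidate expression is acceptable as long as it agrees with the truth table on every input that is specified; the don't care rows impose no constraint. The Python sketch below encodes only the specified rows of Fig 6.11 (the names SPEC and agrees are ours).

```python
# Specified rows of Fig 6.11; the inputs 000, 011, and 111 are don't cares.
SPEC = {
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (1, 0, 0): 0,
    (1, 0, 1): 0,
    (1, 1, 0): 0,
}

def agrees(f):
    """True if f matches every specified output."""
    return all(f(*inputs) == output for inputs, output in SPEC.items())

simplified = agrees(lambda x, y, z: 1 - x)            # f = x'
rejected   = agrees(lambda x, y, z: y)                # f = y is wrong for 001
print(simplified, rejected)
```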
6.1.3 Exercises
1. Show a truth table corresponding to each of the following boolean expres-
sions:
(a) x + yz’
(b) xy + x’yz’ + yz
(c) x (y’ + xz)
2. Show the disjunctive normal form (sum of products) expression for the
boolean function corresponding to each of the following expressions:
(a) x + yz’
(b) xy + x’yz’ + yz
(c) x (y’ + xz)
Hint: Form a truth table first; there will be one term for each 1 in the
truth table.
3. Show how to minimize each of the following boolean expressions using the
identities given in Fig 6.3. Justify each step in your derivation by naming
the identity which is used.
(a)  x y z  f(x,y,z)
     0 0 0  1
     0 0 1  ?
     0 1 0  0
     0 1 1  1
     1 0 0  ?
     1 0 1  1
     1 1 0  0
     1 1 1  0
6.2. BASIC LOGIC GATES 223
(b)  x y z  f(x,y,z)
     0 0 0  0
     0 0 1  ?
     0 1 0  0
     0 1 1  0
     1 0 0  1
     1 0 1  ?
     1 1 0  1
     1 1 1  1
(c)  w x y z  f(w,x,y,z)
     0 0 0 0  0
     0 0 0 1  ?
     0 0 1 0  ?
     0 0 1 1  1
     0 1 0 0  0
     0 1 0 1  1
     0 1 1 0  0
     0 1 1 1  1
     1 0 0 0  0
     1 0 0 1  ?
     1 0 1 0  0
     1 0 1 1  1
     1 1 0 0  ?
     1 1 0 1  1
     1 1 1 0  0
     1 1 1 1  ?
[AND gate with inputs x and y; output x·y]
[AND gate with inputs x, y, and z; output x·y·z]
Figure 6.14: A simple AND gate with three inputs, x, y, and z
to use more than five inputs, to avoid cluttering our circuit diagrams). There
is no problem with several inputs because the AND operation is associative:
(x · y) · z = x · (y · z)
Thus the parentheses are not needed, and an AND gate may have any num-
ber of inputs.
6.2.2 OR Gates
The OR gate, shown in Fig 6.15, performs the logical OR operation, as described
in chapter 3. The output of an OR gate is 0 only if all inputs are 0. In all other
respects OR gates are exactly like AND gates. An OR gate with three inputs
is shown in Fig 6.15.
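Gates with any number of inputs are easy to model. In this Python sketch (function names ours), the AND gate outputs 1 only if all inputs are 1, and the OR gate outputs 0 only if all inputs are 0.

```python
def and_gate(*inputs):
    """Output is 1 only if every input is 1."""
    return int(all(inputs))

def or_gate(*inputs):
    """Output is 0 only if every input is 0."""
    return int(any(inputs))

# Associativity: grouping the inputs in pairs gives the same result
# as a single three-input gate.
x, y, z = 1, 1, 0
print(and_gate(and_gate(x, y), z) == and_gate(x, and_gate(y, z)) == and_gate(x, y, z))
```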
When drawing circuit diagrams be sure that your AND gates have a straight
base, and that your OR gates have a curved base. Also, your AND gates should
have a rounded tip, and your OR gates should have a sharp tip.
6.2.3 Inverters
An inverter performs the NOT operation. Thus, an inverter has one input, and
the output is the logical complement of the input:
An inverter is shown in Fig 6.16. Since the NOT operation is a unary operation
(i.e. it has only one operand), the inverter must have just one input. Note that
[OR gate with inputs x, y, and z; output x+y+z]
Figure 6.15: A simple OR gate with three inputs, x, y, and z
[Inverter with input x; output x’]
[Logic diagram with output xy + yz]
the inverter symbol consists of a small triangle with a small circle, or ‘bubble’ on
the tip. As we will see below, the bubble will usually signify a NOT operation.
y (x + y) · (y ⊕ z) · z ′
x y z w
xy ′ zw′ + y ′ x + wx′
Figure 6.19: A sum of products logic diagram for the expression xy ′ zw′ + y ′ x +
wx′
Source Target
Component Component
(a)
(b)
Figure 6.21: Two components connected by four wires, (a) The wires are shown
explicitly, and (b) The wires are shown as a bus of width 4
Buses
In digital logic diagrams we will often need to send several bits from one com-
ponent to another, all at the same time. We will need several parallel wires to
do this; this is called a bus. The width of a bus is the number of wires which
make up the bus. Fig 6.21 shows two components, a source and a target con-
nected by 4 wires. For convenience we can use the diagram in part (b) instead
of the diagram in part (a). Note that the width of the bus is always shown in
parentheses.
[0..7]
(32)
[8..31]
Figure 6.22: A 32-bit bus is split into an 8-bit bus and a 24-bit bus
[0..7]
(16)
[8..15]
Figure 6.23: Two 8-bit buses are joined to form a 16-bit bus
We should note how the buses are split or joined. We note which are the
low-order bits and which are the high-order bits by specifying bit locations in
square brackets. This is done in both Fig 6.22 and Fig 6.23.
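Splitting and joining buses can be mimicked in software with shifts and masks. The sketch below is an illustration under the assumption that bit 0 is the low-order bit, as the bracket labels in Fig 6.22 suggest; the helper names split_bus and join_buses and the example value are hypothetical.

```python
def split_bus(value, low, high):
    """Return the bits numbered low..high (inclusive) of value as a narrower bus."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)

def join_buses(low_part, high_part, low_width):
    """Join two buses; low_part supplies bits 0..low_width-1 of the result."""
    return (high_part << low_width) | low_part

word = 0x12345678                  # an arbitrary 32-bit example value
low8 = split_bus(word, 0, 7)       # bits [0..7]
high24 = split_bus(word, 8, 31)    # bits [8..31]
print(hex(low8), hex(high24))      # 0x78 0x123456
```

Joining the two pieces again with join_buses(low8, high24, 8) gives back the original 32-bit value.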
6.2.7 Exercises
1. Show a logic diagram corresponding to each of the following boolean ex-
pressions (do not attempt to minimize the expressions):
(a) xy + z’
(b) (x+y’+z)(x’+y)z
(c) xy ⊕ y’z’
(d) x(x+yz)
2. Show a logic diagram using only AND gates, OR gates, and Inverters, to
implement the Exclusive OR operation.
Hint: Build a truth table first.
3. (a) Assume you have no OR gates. Show how the function x+y can be
implemented using only AND gates and Inverters.
(b) Assume you have no AND gates. Show how the function xy can be
implemented using only OR gates and Inverters.
Hint: Use the DeMorgan identities from Fig 6.3
4. Show a sum of products logic diagram for each of the boolean expressions
shown below (do not attempt to simplify the expressions):
(a) xy + y’z
(c) w’x’y’z’ + wx + yz
(a) x y z
(b) x y z
?
230 CHAPTER 6. BOOLEAN ALGEBRA AND DIGITAL LOGIC
(c) w x y z
6. Show a Karnaugh map for each of the logic diagrams in the previous
exercise, and show a simplified expression, if possible.
Once we understand how the component is constructed, and how its outputs
are determined from its inputs, we can then represent the component as a plain
box, showing simply its name, inputs, and outputs. In this way we can use it
in a logic diagram without cluttering the diagram with the inner workings of
our component. This process, known as abstraction, is also used in software
development. A clearly defined and appropriate interface replaces the need to
expose the inner workings of a component.
6.3. COMBINATIONAL LOGIC CIRCUITS AND COMPONENTS 231
7
6
5
4
3 3
2 2
1 1
0 0
Figure 6.24: A sign-extend component for which the input is a 4-bit bus, and
the output is an 8-bit bus, preserving the sign of the number
This can be accomplished easily by connecting the high order bit of the input
bus to all the extended bits in the output bus, as shown in Fig 6.24.
A block diagram for a sign extend component is shown in Fig 6.25.
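The wiring described above can be imitated in software. This Python sketch assumes the 4-bit input and 8-bit output of Fig 6.24; the function name sign_extend and its parameters are hypothetical.

```python
def sign_extend(value, in_bits=4, out_bits=8):
    """Copy the high order input bit into every extended output bit."""
    sign = (value >> (in_bits - 1)) & 1          # high order bit of the input bus
    result = value & ((1 << in_bits) - 1)        # low order bits pass straight through
    for i in range(in_bits, out_bits):           # each extended bit is wired to the sign bit
        result |= sign << i
    return result

print(format(sign_extend(0b0101), "08b"))  # 00000101
print(format(sign_extend(0b1011), "08b"))  # 11111011
```

In twos complement, 1011 is -5 as a 4-bit value, and 11111011 is -5 as an 8-bit value, so the sign of the number is preserved.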
6.3.2 Decoders
A component which selects one of several output lines, based on the binary
value of its input, is called a decoder. A decoder with n input lines will have
2^n output lines. For example, a decoder with 3 inputs will have 8 outputs; this
would be known as a 3x8 decoder.
If the 8 outputs of a 3x8 decoder are labeled D0 through D7, then an input
value of i will result in a value of 1 on output Di and a value of 0 on all other
outputs. For example, if the inputs to a 3x8 decoder are 101_2 = 5, then the
outputs will be 00100000_2, in which D5 is 1 (here we show the low order bit, D0,
at the right).
Fig 6.26 shows the 3x8 decoder function as a truth table. The inputs are
labeled I, and the outputs are labeled D. Fig 6.27 shows the logic diagram which
implements that function. Think of the inputs to a decoder as selectors, because
they act to select one output line.
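A behavioral model of a decoder is short enough to write out. The following Python sketch is an illustration only; the names decoder and as_bit_string are hypothetical, and the outputs are returned as a list indexed D0, D1, and so on.

```python
def decoder(value, n):
    """Return the 2**n outputs of an n-input decoder; output D[value] is 1, all others 0."""
    return [1 if i == value else 0 for i in range(2 ** n)]

def as_bit_string(outputs):
    """Show the outputs with the low order bit, D0, at the right."""
    return "".join(str(bit) for bit in reversed(outputs))

print(as_bit_string(decoder(0b101, 3)))  # 00100000
```

Exactly one output is 1 for every input value, which is why the inputs can be thought of as selectors.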
I2 I1 I0 D7 D6 D5 D4 D3 D2 D1 D0
0 0 0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 1 0 0
0 1 1 0 0 0 0 1 0 0 0
1 0 0 0 0 0 1 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0
Now that we understand how to build a decoder of any size, we can present
a more abstract view of a decoder, showing only the name, inputs, and outputs.
This is called a block diagram of a decoder and is shown in Fig 6.28, which
shows a 3x8 decoder. For simplicity we show the three input lines as a bus of
width 3, and the eight output lines as a bus of width 8.
How can a decoder perform a useful task? Consider a traffic signal with
four lights: red, yellow, green, and left-turn arrow. Assume we have a mod-4
counter, i.e. a counter which repeatedly puts out the values 00, 01, 10, 11, 00,
01, 10, 11, ... on two output lines (mod-4 counters will be covered in the section
on sequential circuits, below). We can then use a 2x4 decoder to send a 1 or
0 signal to each light, so that only one light is on at any one time. The logic
diagram is shown in Fig 6.29.
6.3.3 Encoders
We now turn our attention to a component which performs the inverse function
of the decoder; an encoder is a device with 2^n inputs and n outputs. Normally
only one of the inputs will have a value of 1, and the others will all have a value
of 0. If the ith input is a 1, then the binary value of i will be on the output lines.
For example, for an 8x3 encoder, if the inputs are 00010000 (i.e. I4 is 1), then
the output will be 100_2 = 4. Fig 6.30 shows a truth table for an 8x3 encoder.5
How can we build a 4x2 encoder using our basic logic gates? We take the
first four rows of Fig 6.30, and form a K-map for each of the two outputs in
which unexpected input combinations are shown as don’t cares, as shown in
Figures 6.31 and 6.32. In those figures, the inputs which we had previously
been calling w,x,y,z are now I3 , I2 , I1 , I0 , respectively. The resulting minimized
expressions are
E1 = I3 + I2
E0 = I3 + I1
5 This is only a partial truth table; we are not showing all the rows because we assume that
I2 I1 I0
D0
D1
D2
D3
D4
D5
D6
D7
7
2 6
5
Decoder4 (3) Decoder (8)
1
3x8 3 3x8
2
0 1
0
Figure 6.28: Block diagrams of a 3x8 decoder; left diagram shows the inputs
and outputs as separate lines; right diagram shows the inputs and outputs as
buses.
D0 = Red
Decoder D1 = Yellow
(2) 2x4
D2 = Green
Traffic
D3 = Arrow
Figure 6.29: Traffic signal control using a 2x4 decoder. The inputs come from a
mod-4 counter, the outputs go to the four lights on a traffic signal - Red, Yellow,
Green, Left-turn arrow
I7 I6 I5 I4 I3 I2 I1 I0 E2 E1 E0
0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 1
0 0 0 0 0 1 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1 1
0 0 0 1 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 1 0 1
0 1 0 0 0 0 0 0 1 1 0
1 0 0 0 0 0 0 0 1 1 1
Figure 6.30: Truth table defining an 8x3 encoder; the inputs are labeled I, and
the outputs are labeled E
yz yz yz yz
00 01 11 10
wx=00 0 ? ? 0
wx=01 1 ? ? ?
wx=11 ? ? ? ?
wx=10 1 ? ? ?
E1 = w + x = I3 + I2
Figure 6.31: A K-map for the high order output bit, E1 of a 4x2 encoder.
Unexpected input combinations are shown as don’t cares (question marks). The
inputs wxyz = I3 I2 I1 I0 .
Note that the input I0 is not used! Fig 6.33 shows the logic diagram for a
4x2 encoder.
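The two minimized expressions can be checked against the first four rows of Fig 6.30. This Python sketch is an illustration; the function name encoder_4x2 is hypothetical.

```python
def encoder_4x2(i3, i2, i1, i0):
    """4x2 encoder from the minimized expressions E1 = I3 + I2 and E0 = I3 + I1."""
    e1 = i3 | i2
    e0 = i3 | i1      # i0 is accepted as an input but never used
    return e1, e0

# Each valid input has exactly one line set to 1.
for position in range(4):
    inputs = [1 if k == position else 0 for k in (3, 2, 1, 0)]   # I3, I2, I1, I0
    e1, e0 = encoder_4x2(*inputs)
    print(inputs, "->", e1, e0)
```

The output pair (E1, E0) is the binary value of the position of the single 1 input. The don't cares in the K-maps are what allow I0 to drop out of both expressions.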
We could also have 2x1, 8x3, 16x4 encoders, etc. As an example of an appli-
cation which could use an encoder, consider a building with a motion detector
in each room. We wish to put out an alert signal if motion is detected in any
room. The alert signal should indicate the room number in which motion is
detected.6 If the building has 32 rooms, we could use a 32x5 encoder, with
an input coming from a motion detector in each room (0=no motion detected,
1=motion detected). Our output would then be the room number (in binary)
of the room in which motion is detected.
A block diagram for an 8x3 encoder is shown in Fig 6.34.
6.3.4 Multiplexers
There are many applications in which we wish to select one of several input lines
(or buses), and pass it on to the output. This kind of selector is generally called
a multiplexer.7 Thus in addition to the data inputs, the multiplexer will require
control inputs which determine which of the input lines is put on the output
line. For example, a multiplexer with 8 data inputs will have one output line
and 3 control lines to select one of the 8 data inputs. This would be called an
8x1 multiplexer. In an 8x1 multiplexer, if the three control bits are 101 (i.e. 5),
then the value of data input I5 is copied to the output data line.
6 We assume that motion may be detected in no more than one room at a time.
7 Also spelled multiplexor
yz yz yz yz
00 01 11 10
wx=00 0 ? ? 1
wx=01 0 ? ? ?
wx=11 ? ? ? ?
wx=10 1 ? ? ?
E0 = w + y = I3 + I1
Figure 6.32: A K-map for the low order output bit, E0 of a 4x2 encoder. Un-
expected input combinations are shown as don’t cares (question marks). The
inputs wxyz = I3 I2 I1 I0 .
I0
I1
E0 = I1 + I3
I2
E1 = I2 + I3
I3
I7 I6 I5 I4 I3 I2 I1 I0 S2 S1 S0 M
0 0 0 I0
0 0 1 I1
0 1 0 I2
0 1 1 I3
1 0 0 I4
1 0 1 I5
1 1 0 I6
1 1 1 I7
Figure 6.35: Truth table defining an 8x32 multiplexer. Each of the eight data
inputs, I, is a 32-bit bus, and the output,M, is a 32-bit bus.
S I1 I0 M
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
As with other components, the inputs and outputs may be buses. A multi-
plexer with 16 data inputs, each of which is a 32-bit bus, will have one output
bus (also 32 bits) and 4 control bits to select one of the 16 data input busses.
This would be called a 16x32 multiplexer. A truth table for an 8x32 multiplexer
is shown in Fig 6.35.
In general, a multiplexer8 with k control bits, and n-bit data buses would be
called a 2^k x n multiplexer.9
How can we design a simple multiplexer, using basic logic gates? To build a
2x1 multiplexer, working from the truth table in Fig 6.36 we form the K-map
shown in Fig 6.37.
This will then give us the sum of products expression:
M = S’I0 + SI1
The logic diagram is shown in Fig 6.38.
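The sum of products expression can be verified against the truth table of Fig 6.36. In this Python sketch the function names mux2x1 and mux are hypothetical, and the general version simply indexes a list of data inputs instead of modelling individual gates.

```python
from itertools import product

def mux2x1(s, i1, i0):
    """2x1 multiplexer built from the expression M = S'I0 + SI1."""
    return ((1 - s) & i0) | (s & i1)

def mux(select, data_inputs):
    """Behavioral model of a larger multiplexer: data_inputs[k] is input Ik."""
    return data_inputs[select]

# Check every row: when S is 0 the output follows I0, otherwise it follows I1.
for s, i1, i0 in product((0, 1), repeat=3):
    assert mux2x1(s, i1, i0) == (i1 if s else i0)

print(mux(0b101, [10, 11, 12, 13, 14, 15, 16, 17]))  # 15, the value on input I5
```

The list passed to mux may hold single bits or whole bus values, which mirrors the way the same block diagram serves for 8x1 and 8x32 multiplexers.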
Block diagrams of an 8x1 multiplexer and a 4x16 multiplexer are shown in
Fig 6.39.
As an example of an application which could use a multiplexer, consider a
8 Terminology for multiplexers in the literature is not consistent. What we call an 8x4
yz yz yz yz
00 01 11 10
x=0 0 1 1 0
x=1 0 0 1 1
Figure 6.37: K-map used to build a 2x1 multiplexer, derived from the truth
table in Fig 6.36
S I1 I0
7 (16)
6 3
5 (16)
4 Mux 2 (16)
Mux
3 (16)
8x1 1 4x16
2
1 (16)
0 S2 S1 S0 0 S1 S0
Figure 6.39: Block diagrams of an 8x1 multiplexer (3 control inputs), left, and
a 4x16 multiplexer (2 control inputs), right
channel 7 7
channel 6 6
channel 5 5
channel 4 4 Mux Channel 3
channel 3 3 8x1
channel 2 2
1
channel 1
0 S2 S1 S0
channel 0
0 1 1
Figure 6.40: Application of a multiplexer: A digital radio or TV channel selector.
Channel 3 is selected by the user.
radio or TV which needs to select one of several digital channels to be put out
to the user. Each data input to the multiplexer would be the digital signal for
a particular channel, and the control inputs would be used to select one of the
channels, which is then sent to the output data line. A possible diagram is
shown in Fig 6.40, in which the user has selected channel 3.
carries    1 1 1
x = 11     0 1 0 1 1
y = 14     0 1 1 1 0
sum = 25   1 1 0 0 1
x y S C
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
Figure 6.42: Truth table defining the sum and carry outputs of a half adder
a sum bit and a carry bit. The full adder can be implemented using two half
adders.
To design the half adder we refer to the truth table in Fig 6.42 which
shows two outputs, a sum (S) and a carry (C). From the truth table we obtain
boolean expressions for the outputs:
S = x′ y + xy ′ = x ⊕ y
C = xy
Using these expressions for the output we can build the logic diagram for
a half adder; it is shown in Fig 6.43, and a block diagram for a half adder is
shown in Fig 6.44.
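The two boolean expressions translate directly into code. The following Python sketch is illustrative; the function name half_adder is hypothetical, and the outputs are returned in the order (S, C) used in Fig 6.42.

```python
def half_adder(x, y):
    """Half adder: S = x XOR y, C = x AND y."""
    s = x ^ y
    c = x & y
    return s, c

for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        # Read together, C and S are the 2-bit binary sum of x and y.
        assert 2 * c + s == x + y
        print(x, y, s, c)
```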
We now turn our attention to the full adder; it will have three inputs: x, y,
and the carry from the previous stage. It will have two outputs: sum and carry
to the next stage. The truth table for a full adder is shown in Fig 6.45 which
shows the two outputs, a sum (S) and a carry (C). In this figure we distinguish
between the two carries. cin, or carry-in, is the carry from the previous column.
Cout, or carry-out, is the carry out to the next column. In general the carry-out
from column i is the carry-in to column i+1, working from right to left.
At this point we could find minimal boolean expressions for the two outputs
and construct the logic diagram. But there is an easier way: we propose using
S =x⊕y
x
y
C = x·y
Figure 6.43: A logic diagram implementing a half adder. S is the sum, and C is
the carry.
x S
Half Adder
2x2
y C
Figure 6.44: Block diagram for a Half Adder. S is the one-bit sum, x+y, and C
is the carry.
x y cin S Cout
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
Figure 6.45: Truth table defining the sum and carry outputs of a full adder
S
x x S x S
Half Adder Half Adder
2x2 2x2
y y C y C
cin cout
Figure 6.46: Logic diagram for a full adder, using two half adders
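A full adder built from two half adders can be checked against the truth table of Fig 6.45. This Python sketch assumes the usual wiring, in which the carries of the two half adders are combined with an OR gate; that detail is an assumption here, since it is not spelled out in the text. The function names are hypothetical.

```python
from itertools import product

def half_adder(x, y):
    """Half adder: returns (S, C)."""
    return x ^ y, x & y

def full_adder(x, y, cin):
    """Full adder from two half adders; the two carries are ORed (assumed wiring)."""
    s1, c1 = half_adder(x, y)        # first half adder adds x and y
    s, c2 = half_adder(s1, cin)      # second half adder adds the carry-in
    cout = c1 | c2                   # at most one of c1, c2 can be 1
    return s, cout

for x, y, cin in product((0, 1), repeat=3):
    s, cout = full_adder(x, y, cin)
    assert 2 * cout + s == x + y + cin   # matches every row of the truth table
```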
x
S
y Full Adder
3x2
Cout
cin
Figure 6.47: Block diagram for a Full Adder. S is the one-bit sum, x + y + cin ,
and Cout is the carry-out.
x3 y3 x2 y2 x1 y1
x x x x x0
S S S S
FA y FA y FA y FA y y0
3x2 3x2 3x2 3x2
cout Cout Cout Cout Cout
Cin Cin Cin Cin 0
s3 s2 s1 s0
Figure 6.48: Design of a 4-bit adder to find the sum x + y, using four full adders
(32)
A
(32)
ADD A+B
(32)
B
point. It will have two input busses (the values being added) and one output
bus (the sum). The adder will consist of n Full Adders, each of which is as
shown in Fig 6.47, where n is the size of the busses. For example, an adder
which adds two 16-bit numbers, producing a 16-bit result would be called a
16-bit adder. A 4-bit adder is shown in Fig 6.48.11 In this adder the S output
of the ith full adder is the ith bit of the adder’s output bus. The Cout output
of the ith full adder is the cin to the i + 1st full adder. The carry-in to stage 0
is set to the constant 0. The carry out from the high order bit can be used to
detect overflow. Overflow is a condition indicating that the result will not fit in
the number of bits being used. Overflow can be detected if the carry-out from
the last stage is different from the carry-in to the last stage.
A block diagram of a 32-bit adder is shown in Fig 6.49.12
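The chain of full adders, together with the overflow test described above, can be sketched in Python. The names full_adder and ripple_add are hypothetical, and the full adder is written with a minimized carry expression rather than with half adders.

```python
def full_adder(x, y, cin):
    """One stage: returns (S, Cout)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout

def ripple_add(x, y, n=4):
    """Add two n-bit values with n full adders; returns (sum, carry_out, overflow)."""
    carry = 0                       # carry-in to stage 0 is the constant 0
    carry_into_last = 0
    total = 0
    for i in range(n):
        if i == n - 1:
            carry_into_last = carry
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i             # S output of stage i is bit i of the result
    overflow = carry_into_last != carry
    return total, carry, overflow

print(ripple_add(0b0011, 0b0010))   # (5, 0, False)
print(ripple_add(0b0101, 0b0100))   # (9, 0, True): 5 + 4 does not fit in 4-bit twos complement
```

With n = 5, ripple_add(11, 14, 5) reproduces the worked example 01011 + 01110 = 11001.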
11 We have placed the inputs in each full adder on the right, and the outputs on the left,
because binary numbers are normally written with the low-order bit at the right.
12 Historically, the block diagram for an adder has been a wedge shape, for reasons unknown
to the author.
6.3.6 Exercises
1. In each case show the 8-bit output of a 3x8 decoder if the inputs are as
shown below (the low order bit is at the right end):
(a) 000
(b) 011
(c) 110
2. In each case show the 3-bit output of an 8x3 encoder for the inputs given
below
(a) 00000010
(b) 00100000
(c) 10000000
3. (a) Show the logic diagram for an 8x3 encoder using only AND gates,
OR gates, and inverters.
Hint: You will not be able to use K-maps because there are too many
inputs. Instead write a boolean expression for each of the three out-
puts.
(b) What would be the output of your 8x3 encoder if the input were
00101000 ? (This is not a valid input, but your encoder would still
produce a 3-bit output)
4. Show a block diagram for each of the following:
(a) A 4x1 multiplexer
(b) A 4x4 multiplexer
(c) An 8x32 multiplexer
5. Show a logic diagram for a 4x1 multiplexer.
6. What is the output of a 4x8 multiplexer if the inputs are I3 = 01010000,
I2 = 01011110, I1 = 00101101, I0 = 01010000, and the control input is
10?
7. In each case show the output of a full adder if the input is:
(a) 011
(b) 101
(c) 110
8. How many full adders are needed to implement the MIPS add instruction?
9. Construct a full adder using only basic logic gates (i.e. do not use a half
adder). Try to minimize your design.
(a) Show a K-map for each of the two outputs. Work from the truth table
in Fig 6.45.
(b) Draw the logic diagram, or two separate logic diagrams for the two
outputs. The K-maps give you a minimal sum-of-products for the
function. Perhaps using XOR gates provides a cheaper solution.
6.4. SEQUENTIAL CIRCUITS 245
With sequential circuits, components can maintain state, thus storing some
representation of the inputs that they have received over some period of time.
Generally, the components in a sequential circuit need to be synchronized in such a
way that they all update their states at the same time. This is done with a clock
signal. A clock signal is a 1-bit signal that varies periodically, and consistently,
between 0 and 1. A diagram of a clock signal is shown in Fig 6.50 in which the
vertical axis shows the value of the clock signal (0 or 1), and the horizontal axis
is time.
The period of a clock signal is the time it takes to undergo a complete tran-
sition from 0 to 1 and back to 0. This is called a cycle. The frequency of a clock
signal is the number of complete periods in a unit of time. Frequency is usually
expressed with the unit cycles per second, or Hertz.13 Period and frequency
are multiplicative inverses of each other:
frequency = 1 / period
period = 1 / frequency
If the clock signal for a particular CPU has a period of 1 nanosecond (1ns) =
10^-9 sec, then its frequency would be 1/(10^-9 sec) = 10^9 cycles per second =
10^9 Hertz = 1 GigaHertz (1GHz).
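The same conversion can be done mechanically. This small Python sketch uses exact fractions to avoid floating-point rounding; the function names are hypothetical.

```python
from fractions import Fraction

def frequency_hz(period_seconds):
    """frequency = 1 / period"""
    return 1 / Fraction(period_seconds)

def period_seconds(frequency_hz_value):
    """period = 1 / frequency"""
    return 1 / Fraction(frequency_hz_value)

one_ns = Fraction(1, 10**9)          # a period of 1 nanosecond
print(frequency_hz(one_ns))          # 1000000000, i.e. 1 GHz
```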
13 Named after the German Physicist Heinrich Hertz, who proved the existence of electro-
S Q’
Q
R
6.4.1 SR Flip-Flops
Our first example of a flip-flop is known as an SR flip-flop, because the two
inputs are Set and Reset. To begin we construct a one-bit storage device known
as an SR Latch. Though it has some undesirable characteristics, it will lead us
to the design of an SR flip-flop. A diagram of an SR latch is shown in Fig 6.51.
This is our first example of a logic circuit with feedback. The output
of the top NOR gate is used as the first input to the bottom NOR gate, and the
output of the bottom NOR gate is used as the second input to the top NOR
gate. This will require some careful thinking to analyze.
With a NOR gate, if one of the inputs is a 1, we know the output must be
a 0, regardless of the value of the other input. Algebraically, (1+x)’ = 0. This
leads to the table shown below:
Row S R Q’ Q State
1   0 0 ?  ? Unchanged
2   0 1 1  0 Reset (0)
3   1 0 0  1 Set (1)
4   1 1 0  0 Unknown
The Q output of the SR latch represents the state of the latch, and Q’
represents the complement of the state. We explain rows 2, 3, 4, and 1 in this
table:
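• Row 2: State = Reset(0), S=0, R=1. Because R is 1, the output of the bottom
NOR gate must be (x+1)’ = 0, thus Q is 0. The bottom input to the top
NOR gate is 0, thus the output of the top NOR gate is (0+0)’ = 1,
and thus Q’ is 1.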
• Row 3: State = Set(1), S=1, R=0. Because S is 1, the output of the top
NOR gate must be (1+x)’ = 0, thus Q’ is 0. The top input to the bottom
NOR gate is 0, thus the output of the bottom NOR gate is (0+0)’ = 1,
and thus Q is 1.
• Row 4: State = Unknown, S=1, R=1. Because S and R are both 1, the
outputs of both NOR gates must be 0. Thus, Q=0 and Q’=0, which is a
contradiction because Q’ is supposed to be the complement of Q.
• Row 1: State = Unchanged, S=0, R=0. Here the current state of the
flip-flop depends on its previous state. If the previous state had been Set
(Q=1,Q’=0), then the current state would still be Set (Q=1, Q’=0). If the
previous state had been Reset (Q=0,Q’=1), then the current state would
still be Reset (Q=0, Q’=1).
To make effective use of the SR latch, the user must be careful not to set both
inputs at 1.
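The feedback can be simulated by re-evaluating the two NOR gates until their outputs stop changing. The Python sketch below is a simplified model that ignores real gate delays; the names nor and sr_latch are hypothetical. The top gate computes Q’ from S and Q, and the bottom gate computes Q from Q’ and R, as described above.

```python
def nor(a, b):
    """NOR gate: 1 only if both inputs are 0."""
    return 1 - (a | b)

def sr_latch(s, r, q, q_bar):
    """Return the settled (Q, Q') of the latch, starting from state (q, q_bar)."""
    for _ in range(4):                       # a few passes are enough to settle
        new_q_bar = nor(s, q)                # top NOR gate
        new_q = nor(new_q_bar, r)            # bottom NOR gate
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

print(sr_latch(1, 0, 0, 1))   # (1, 0): Set
print(sr_latch(0, 0, 1, 0))   # (1, 0): Unchanged
print(sr_latch(0, 1, 1, 0))   # (0, 1): Reset
print(sr_latch(1, 1, 0, 1))   # (0, 0): both outputs 0, the contradictory case
```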
We will extend the design of the SR latch to arrive at the design of a clocked
SR flip-flop. This is done by adding two AND gates and a clock input, as shown
in Fig 6.52.
The clock input ensures that the flip-flop can change state only when the
clock signal is 1. When the clock signal is 0, both inputs to the NOR gates
are 0, and as we showed above the latch maintains its current state. This will
make the SR flip-flop useful in digital circuits in which all components need to
be synchronized (i.e. change state at the same time).
6.4.2 D Flip-Flops
Above we pointed out that the inputs to the SR flip-flop should not both be
1, because that yields an unknown state. The D flip-flop is similar to the SR
flip-flop; however, it ensures that the S and R inputs are complements of each
other. The D flip-flop is shown in Fig 6.53. As with the SR flip-flop, the clock
input serves to synchronize the change of state with other devices. Aside from
the clock input, there is only one input, D, and its complement is formed with
an inverter. The D input determines the state of the flip-flop, as shown in the
table, below:
D Q’ Q State
1 0 1 0 Reset (0)
2 1 0 1 Set (1)
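The relationship between the clocked SR flip-flop and the D flip-flop can be sketched behaviorally. In this Python illustration the function names clocked_sr and d_flip_flop are hypothetical, and the model returns only the new value of Q.

```python
def clocked_sr(s, r, clock, q):
    """Clocked SR flip-flop: S and R reach the latch only when clock is 1."""
    set_in = s & clock
    reset_in = r & clock
    if set_in and reset_in:
        raise ValueError("S and R must not both be 1")
    if set_in:
        return 1
    if reset_in:
        return 0
    return q                                  # both latch inputs 0: state unchanged

def d_flip_flop(d, clock, q):
    """D flip-flop: S = D and R = D', so S and R are always complements."""
    return clocked_sr(d, 1 - d, clock, q)

print(d_flip_flop(1, 1, 0))   # 1: Set while the clock is 1
print(d_flip_flop(0, 1, 1))   # 0: Reset while the clock is 1
print(d_flip_flop(0, 0, 1))   # 1: clock is 0, so the state is kept
```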
6.4.3 JK Flip-Flops
JK flip-flops are similar to SR flip-flops, with the addition of feedback from the
NOR gates to the AND gates, as shown in Fig 6.54. Because of the feedback,
the behavior of this flip-flop will clearly depend on the current state. In the
table below, Qn represents the current value of the Q output (which represents
the state of the flip-flop), and Qn+1 represents the state of the flip-flop when
the clock signal returns to 1, i.e. when the state is permitted to change.
The behavior of a JK flip-flop is analyzed using the table below:
J K Qn Qn+1
1 0 0 0 0
2 0 0 1 1
3 0 1 0 0
4 0 1 1 0
5 1 0 0 1
6 1 0 1 1
7 1 1 0 1
8 1 1 1 0
Referring to Fig 6.54, when the outputs of both AND gates are 0, the inputs
to both NOR gates are 0, and the state of the flip-flop is unchanged. This is
the case in rows 1, 2, 3, and 6 of the table. In row 4 the output of the top AND
gate is 0, and the output of the bottom AND gate is 1; thus we essentially have
inputs of S=0 and R=1 to the embedded latch (the two NOR gates). The state
is changed to 0 (Q=0). In rows 5 and 7 the output of the top AND gate is 1, and
the output of the bottom AND gate is 0; thus we essentially have inputs of S=1
and R=0 to the embedded latch (the two NOR gates). The state is changed to
1 (Q=1). In row 8 the output of the top AND gate is 0, and the output of the
bottom AND gate is 1; thus we essentially have inputs of S=0 and R=1 to the
embedded latch (the two NOR gates). The state is changed to 0 (Q=0).
The important feature of JK flip-flops, as compared with the other flip-flops
we’ve seen, is that they have a toggle feature: when J and K are both 1, the state of
the flip-flop is complemented. This is like a toggle light control which is simply
a button that turns the light off if it is on, and on if it is off.
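The eight rows of the JK table can be summarized by the single expression Qn+1 = J·Qn’ + K’·Qn, which is the standard characteristic equation of a JK flip-flop rather than something stated in the text. The Python sketch below checks it against the table; the function name jk_next is hypothetical, and the clock is assumed to be 1.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop when the clock permits a change."""
    return (j & (1 - q)) | ((1 - k) & q)

# Rows 1..8 of the table, as (J, K, Qn, Qn+1).
table = [
    (0, 0, 0, 0), (0, 0, 1, 1),
    (0, 1, 0, 0), (0, 1, 1, 0),
    (1, 0, 0, 1), (1, 0, 1, 1),
    (1, 1, 0, 1), (1, 1, 1, 0),
]
for j, k, q, q_next in table:
    assert jk_next(j, k, q) == q_next
```

The last two rows show the toggle: with J = K = 1 the next state is always the complement of the current state.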
SR flip-flop (inputs S, R, and clock; output Q)
S R State (Q)
0 0 No change
0 1 Reset (0)
1 0 Set (1)
1 1 Undefined
D flip-flop (inputs D and clock; output Q)
D State (Q)
0 Reset (0)
1 Set (1)
JK flip-flop (inputs J, K, and clock; output Q)
J K State (Q)
0 0 No change
0 1 Reset (0)
1 0 Set (1)
1 1 Complement
6.4.5 Registers
We have seen that a flip-flop is merely a 1-bit storage device. Thus, several
flip-flops can be used to implement a CPU register. For example, a 32-bit
register would consist of 32 flip-flops, one flip-flop for each bit of the register.
We choose to use D flip-flops for this purpose. However, to ensure that the
register’s state changes only at the appropriate times (for example, during a
load word instruction) we will use a load signal to tell the flip-flops that they
should change state according to the D input. To do this all we need is an AND
gate, taking as inputs the clock signal and the load signal. The output of the
AND gate will then be the clock input to the D flip-flop, as shown in Fig 6.58.
We can use four of these devices to build a 4-bit register with a load signal.
It is shown in Fig 6.59. In this diagram the signal i_k is the kth bit of the input
bus to the register, and the signal reg_k is the kth bit of the register’s stored
value.
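A register with a load signal can be modelled as a list of stored bits that is overwritten only when the gated clock is 1. This Python sketch is an illustration; the class name Register4 and the method clock_tick are hypothetical.

```python
class Register4:
    """4-bit register: four D flip-flops sharing one gated clock."""

    def __init__(self):
        self.reg = [0, 0, 0, 0]          # reg[k] is the stored bit reg_k

    def clock_tick(self, inputs, load):
        """Apply one clock pulse. Each flip-flop's clock input is clock AND load."""
        clock = 1
        gated_clock = clock & load
        if gated_clock:
            self.reg = list(inputs)      # every flip-flop copies its D input
        return self.reg

r = Register4()
print(r.clock_tick([1, 0, 1, 1], load=0))   # [0, 0, 0, 0]: load is 0, state kept
print(r.clock_tick([1, 0, 1, 1], load=1))   # [1, 0, 1, 1]: new value stored
print(r.clock_tick([0, 0, 0, 0], load=0))   # [1, 0, 1, 1]: still holding
```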
Load
clock
i0 D Q reg0
i1 D Q reg1
i2 D Q reg2
i3 D Q reg3
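[State graph of Fig 6.60: states even (start state) and odd (accepting state); an input of 0 leaves the state unchanged, and an input of 1 switches between even and odd]
        Inputs
States  0     1
even    even  odd
odd     odd   even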
• When the machine is in the even state, it has seen an even number of 1’s.
When the machine is in the odd state, it has seen an odd number of 1’s.
Such a machine is known as a parity checker.
• The unlabeled arrow pointing to the even state indicates that the even
state is the start state. Every state machine should have exactly one start
state. Before reading any input symbols, the machine is in the start state.
• The double circle on the odd state indicates that it is an accepting state.15
A state machine may have zero or more accepting states.
State graphs are most useful when designing and analyzing state machines.
To construct the state machine using flip-flops, it will be helpful to represent the
machine as a table. In a state table, the columns are labeled by input symbols,
and the rows are labeled by states. A state, s, in row r and column c indicates
that if the machine is in state r and the input is c, then the machine makes a
transition to state s.
The parity checker of Fig 6.60 is shown in table form, also known as a
transition table, in Fig 6.61.
We are now ready to build the parity checker logic circuit. We know that we
will not need more than one flip-flop because the machine has only two states.
Let 0 represent the even state, and let 1 represent the odd state. We will use a
D flip-flop; the Q output of the flip-flop represents the state of the machine. To
15 If viewed as a terminating state, an accepting state provides the machine with the capa-
bility of defining a language as the set of strings which cause the machine to end up in an
accepting state after the entire string has been read. The machine of Fig 6.60 will accept any
string of 0’s and 1’s which has odd parity. This kind of machine is known as a Moore machine:
the output is associated with the state; state machines which produce an explicit output on
each transition are known as Mealy machines.
build the logic circuit we first show a truth table relating the input, the current
state of the machine, and the next state of the machine. This will show us what
logic gates are needed in a feedback loop from the output of the flip-flop.
i Q D = next State
0 0 0
0 1 1
1 0 1
1 1 0
In this truth table i represents the input symbol, Q represents the current
state, and D represents the next state. We can write a boolean expression for D:
D = i ⊕ Q
Using what we have developed above, we can now build the logic circuit for
our parity checker. It is shown in Fig 6.62.16
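The behavior of the circuit can be traced in software by applying D = i ⊕ Q once per input bit. In this Python sketch the function name parity_checker is hypothetical; the state 0 stands for even and 1 for odd, as chosen above.

```python
def parity_checker(bits):
    """Run the parity state machine over a sequence of input bits; return the final state."""
    q = 0                      # start state: even
    for i in bits:
        d = i ^ q              # next state, D = i XOR Q
        q = d                  # the flip-flop stores D on the clock signal
    return q

print(parity_checker([1, 0, 1, 1]))   # 1: three 1's have been seen, so the state is odd
print(parity_checker([1, 1, 0, 0]))   # 0: two 1's have been seen, so the state is even
```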
6.4.7 Exercises
1. Implement the parity checker of Fig 6.60 using a JK flip-flop and no other
logic gates.
2. Design a state machine to control a traffic signal. The state machine has
outputs which enable the lights on the traffic signal: Red, Yellow, Green,
and Left Turn Arrow. The sequence of states for the traffic signal is
Green and Left Turn, Green, Yellow, Red. Each time a ’clock’ signal is
applied, the machine goes to the next state.
3. Design a state machine which will calculate n % 3 (i.e. n mod 3) for any
unsigned binary number n provided as the input. Examples:
n n%3 n n%3
0000 = 0 00 = 0 0110 = 6 00 = 0
0001 = 1 01 = 1 0111 = 7 01 = 1
0010 = 2 10 = 2 1110 = 14 10 = 2
0011 = 3 00 = 0 1000000 = 64 01 = 1
0100 = 4 01 = 1 1010110 = 86 10 = 2
0101 = 5 10 = 2 1111111 = 127 01 = 1
16 We have chosen to use a D flip-flop because it generalizes well to more complex state
On each clock signal, the machine reads one bit of the number, high order
bit first. The flip-flop outputs represent the binary value of n % 3.
Hints:
• This machine will need 4 states; the machine will never return to
the start state after reading the first input symbol. The other states
represent the three possible results: 0=00, 1=01, and 2=10.
• After reading several bits of a binary number, if the next bit is a 0,
the number is doubled.
• After reading several bits of a binary number, if the next bit is a 1,
the number is doubled, plus 1.
Operation Select   Data Out        Z Out
0000               A AND B         1 if A AND B is all 0’s
0001               A OR B          1 if A OR B is all 0’s
0010               A+B             1 if A + B is all 0’s
0110               A-B             1 if A - B is all 0’s
0111               [unspecified]   1 if A < B
1100               A NOR B         1 if A NOR B is all 0’s
Figure 6.63: Function table for the MIPS ALU. Each of the inputs, A and B, is
a 32-bit bus. Z is a 1-bit output, indicating a zero result
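The function table can be read as a small program. The Python sketch below is a behavioral illustration only, not the hardware design developed later; the names MASK and alu are hypothetical. For select code 0111 the data output is returned as None because the table leaves it unspecified, and the comparison is taken from the sign bit of A - B, which is an assumption that ignores overflow.

```python
MASK = 0xFFFFFFFF   # all buses are 32 bits wide

def alu(select, a, b):
    """Return (data_out, z) for the given operation select code."""
    if select == 0b0111:
        diff = (a - b) & MASK
        return None, (diff >> 31) & 1        # Z = 1 if A < B (assumed signed compare)
    operations = {
        0b0000: a & b,                       # AND
        0b0001: a | b,                       # OR
        0b0010: (a + b) & MASK,              # add, keeping 32 bits
        0b0110: (a - b) & MASK,              # subtract, keeping 32 bits
        0b1100: ~(a | b) & MASK,             # NOR
    }
    data_out = operations[select]
    return data_out, int(data_out == 0)      # Z = 1 when the result is all 0's

print(alu(0b0000, 0x0F, 0xF0))   # (0, 1)
print(alu(0b0110, 5, 7))         # (4294967294, 0), i.e. 0xFFFFFFFE
```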
(32)
A
(32)
ALU
(32)
B
Operation
Select
Figure 6.64: Block diagram of an ALU for the MIPS processor
6.5.1 Exercises
1. Assume the A and B inputs to the 32-bit ALU defined in Fig 6.63 are:
A = 01020304_16
B = f0f100ff_16
Show the 32-bit output (in hex) and the 1-bit Z output for each of the
following operation select inputs (show unspecified outputs as question
marks):
6.6. CONSTRUCTION OF THE ALU 257
Ai
ANDi = Ai · Bi
Bi
Figure 6.65: Implementation of the AND function for stage i of the ALU. Op-
eration select is 0000 (raised dot indicates logical AND)
Ai
ORi = Ai + Bi
Bi
Figure 6.66: Implementation of the OR function for stage i of the ALU. Oper-
ation select is 0001 ( + indicates logical OR)
Ai x Si
S
Bi y Full Adder
3x2 c ci+1
ci cin out
Figure 6.67: Implementation of the twos complement addition function for stage
i of the ALU. Operation select is 0010. cout is connected to cin for stage i+1.
to the high order bit is different from the carry-out of the high order bit.
18 If assuming unsigned representation, one would have to make sure that the B input is
A0 x D0
S
B0 y Full Adder
3x2 cout c1
1 cin
Figure 6.68: Implementation of the twos complement subtract function for stage
0 of the ALU. Operation select is 0110. cin is the constant 1, and cout is
connected to cin for stage 1.
Ai x Di
S
Bi y Full Adder
3x2 c ci+1
ci cin out
Figure 6.69: Implementation of the twos complement subtract function for stage
i, where i > 0, of the ALU. Operation select is 0110. cin is the carry-in, ci , which
is the carry-out of the previous stage. cout is ci+1 and is the carry-in for the
next stage
This means that the low order stage of the 32-bit ALU will be slightly
different from the other 31 stages. Fig 6.68 shows the subtract operation for the
low order stage of the ALU (i.e. stage 0) in which the carry-in to the full adder
is 1 rather than 0. Fig 6.69 shows the subtract operation for all other stages
of the ALU. In both of these figures we complement the B input to produce a
subtraction rather than an addition.
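The subtraction scheme (complement each bit of B and feed a carry-in of 1 to stage 0) can be traced stage by stage. In this Python sketch the function names full_adder and subtract are hypothetical, and the full adder is written with a minimized carry expression.

```python
def full_adder(x, y, cin):
    """One stage: returns (S, Cout)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout

def subtract(a, b, n=32):
    """Compute A - B as A + B' + 1 using n full adder stages."""
    carry = 1                          # stage 0: carry-in is the constant 1
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = 1 - ((b >> i) & 1)       # the B input is complemented
        d_i, carry = full_adder(a_i, b_i, carry)
        result |= d_i << i             # D_i is bit i of the difference
    return result

print(subtract(7, 5))          # 2
print(hex(subtract(5, 7)))     # 0xfffffffe, which is -2 in 32-bit twos complement
```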
NORi
ORi
Figure 6.70: Implementation of the NOR function for stage i of the ALU. ORi
represents the output of the OR operation shown in Fig 6.66. Operation select
is 1100.
For example, the output labeled Di in Fig 6.69 (the ith bit of the result of a
subtraction) is connected to the ith bit of input 6 to the MUX in Fig 6.71.
Fig 6.71 is our first attempt at the ALU design. It handles all of the ALU
operations except for the comparison operation (select code 7 = 0111_2).
In addition to the data output, the ALU also has a 1-bit Z output. This
output is 1 only when the data output bus is all zeros (see Fig 6.63). To
accomplish this we need to OR all the output data bits, and complement the
result:
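As a software illustration of this step, the Python sketch below ORs the 32 output bits together and complements the result, so that Z is 1 exactly when every data bit is 0, in agreement with Fig 6.63. The function name z_output is hypothetical.

```python
def z_output(data_out, n=32):
    """Z = (D0 + D1 + ... + D(n-1))': OR all the data bits, then complement."""
    any_one = 0
    for i in range(n):
        any_one |= (data_out >> i) & 1     # one wide OR over all data bits
    return 1 - any_one                      # complement of the OR

print(z_output(0x00000000))   # 1: the data output is all zeros
print(z_output(0x80000000))   # 0: at least one data bit is 1
```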
15
14
13
NOR 12
11
10
9
8 Mux (32)
ALU Output
7 16x32
D 6
5
4 Z
3
S 2
OR 1
AND 0 S3 S2 S1 S0
(4)
Operation
Select
Figure 6.71: Using a multiplexer to select the desired output for the ALU. Each
data input is a bus, and all data buses are 32 bits.
6.6.7 Exercises
1. If an OR gate can have a maximum of 5 input lines, how many OR gates
would be needed to generate the Z output in Fig 6.71?
2. How many Full Adders are needed in the implementation of the ALU?
3. Show how our ALU could produce the one’s complement of the 32-bit
word on the A input. Do not make any changes to the ALU designed
in this section; instead show the appropriate operation select bits, and a
possible value for the B input to the ALU.
4. Show how to modify the design of the ALU so as to include an XOR
operation. Assume the operation select is 1011₂.
19 There is generally a limit on the number of inputs to a single logic gate. In this case,
several OR gates would be needed, with their outputs all fed into a single OR gate.
20 With the development of computers with several cores, software is starting to become
increasingly parallel, but much remains to be done to make effective use of multiple cores in
a personal computer.
Figure 6.72: Completed implementation of the ALU, showing Z output for the
case where Operation Select = 7 (A < B).
Chapter 7
MIPS Datapath
This chapter shows how digital components can be used to implement the MIPS
instruction set. When the student has completed this chapter, it should be clear
how software can be implemented and executed by hardware.
The name commonly given to the components, storage elements, and connec-
tions which accomplish this is the datapath. We first introduce the components
which will be needed, and then show how they are connected to implement (a
subset of) the MIPS instruction set.
7.1. STORAGE COMPONENTS 265
Figure 7.1: Block diagram of the Register File (32x32). Inputs: Data In (32),
A Sel (5), B Sel (5), Write Sel (5), and R/W. Outputs: A Out (32) and B Out (32).
instructions. Each register has a unique address (or register number) in the
range 0..31. Since there are 32 registers, the register addresses will be 5 bits in
length. For example, the address 10111₂ specifies register number 23. A block
diagram for the register file is shown in Fig 7.1.
The control signal to the register file labeled R/W determines whether a
Read or Write operation is to take place. When the R/W control signal is 1,
the operation is Read, and when R/W is 0, the operation is Write.²
The register file has two output busses, A and B. The register values which
are placed on those busses are determined by the A Select and B Select inputs,
respectively. For example, when the value of R/W is 1, signifying a Read
operation, A Select is 01101₂ = 13, and B Select is 00000₂ = 0, the
contents of register 13 are placed on the A output bus, and the contents of
register 0³ are placed on the B output bus. In this case the Data In and Write
Select busses are ignored.
As another example, when R/W is 0, signifying a Write operation, and
Write Select is 01111₂ = 15, the value on the Data In bus is copied into register
15. In this case the A Select and B Select busses are ignored.⁴
The table in Fig 7.2 shows examples of several other operations that are
possible with the register file.
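The Read and Write behavior described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the hardware itself: the class name RegisterFile and the method name cycle are hypothetical, and the model ignores any special treatment of register 0.

```python
class RegisterFile:
    """Behavioral model of a 32 x 32-bit register file (hypothetical class)."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, rw, a_sel=0, b_sel=0, write_sel=0, data_in=0):
        """rw = 1 is a Read and returns (A, B); rw = 0 is a Write and returns None."""
        if rw == 1:
            # Data In and Write Select are ignored on a Read
            return self.regs[a_sel & 0x1F], self.regs[b_sel & 0x1F]
        # A Select and B Select are ignored on a Write
        self.regs[write_sel & 0x1F] = data_in & 0xFFFFFFFF
        return None


rf = RegisterFile()
rf.cycle(rw=0, write_sel=0b01111, data_in=0x1234)     # write register 15
a, b = rf.cycle(rw=1, a_sel=0b01111, b_sel=0b00000)   # read registers 15 and 0
print(hex(a), hex(b))
```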
Out busses, respectively, even when R/W is 0, but that is beyond the scope of our current
discussion.
Figure 7.3: Data Memory unit for MIPS consists of 1G 32-bit words.
Memory is where the program instructions are stored. Each of these storage
units consists of 1G x 32-bit words, where 1G = 2^30.⁵
Each of these storage units has a 32-bit data input bus and a 32-bit output
bus. Each also needs a 32-bit address input bus to select a word from the
memory. The Data Memory also needs an R/W control signal. A block diagram
of the Data Memory unit is shown in Fig 7.3. When the R/W control signal is
0, the word on the Data In bus is copied into the Data Memory at the address
specified on the Address bus. The existing word at that address is clobbered
(i.e. it is over-written). When the R/W control signal is 1, the word at the
address specified on the Address bus is copied onto the Data Out bus.
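A rough Python model of the Data Memory's R/W behavior is shown below. It is only a sketch: the class name DataMemory is hypothetical, the storage is a sparse dictionary rather than 2^30 real words, addresses are assumed to be word aligned, and words that were never written are assumed to read as 0.

```python
class DataMemory:
    """Behavioral model of a word-organized, byte-addressable data memory."""

    def __init__(self):
        self.words = {}                      # word index -> 32-bit value

    def cycle(self, rw, address, data_in=0):
        """rw = 0 writes data_in at address; rw = 1 returns the word at address."""
        index = (address & 0xFFFFFFFF) >> 2  # 4 bytes per word
        if rw == 0:
            self.words[index] = data_in & 0xFFFFFFFF   # old word is clobbered
            return None
        return self.words.get(index, 0)


mem = DataMemory()
mem.cycle(rw=0, address=0x1000, data_in=0xDEADBEEF)
print(hex(mem.cycle(rw=1, address=0x1000)))
```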
The Instruction Memory differs from the Data Memory in that the Instruc-
tion Memory cannot be changed during program execution, i.e. a MIPS program
cannot modify itself. Thus there is no Instruction In bus for the Instruction
Memory. Also the Instruction Memory is always used in conjunction with two
dedicated CPU registers, the Program Counter register (PC) and the Instruc-
tion Register (IR). A block diagram for the Instruction Memory, with its two
dedicated registers is shown in Fig 7.4.
The PC register always stores the address of the next instruction to be
executed; thus, it serves as an Address input to the Instruction Memory. Each
time an instruction is executed by the CPU, the PC must be incremented by
4 to move to the next instruction in the program. When a branch or jump
5 Recall that since a memory address is stored in a 32-bit word, the memory is byte
addressable, and there are 4 bytes in a word, we will have an address space of 2^30 words.
Figure 7.4: Instruction Memory unit for MIPS consists of 1G 32-bit instructions.
(The PC supplies the 32-bit Address In; the 32-bit Instruction Out is loaded into the IR.)
Figure 7.5: Instruction Register, showing the fields for the three instruction
formats: J, I, and R.
R format: op (6), rs (5), rt (5), rd (5), shamt (5), funct (6)
I format: op (6), rs (5), rt (5), immediate (16)
J format: op (6), jump address (26)
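The field boundaries in Fig 7.5 can be expressed with shifts and masks. The Python sketch below extracts every field from a 32-bit instruction word; the function name decode is an assumption for this illustration, and which fields are meaningful depends on the instruction format. The sample word 01095020ₓ encodes add $t2, $t0, $t1.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of Fig 7.5."""
    instr &= 0xFFFFFFFF
    return {
        "op": (instr >> 26) & 0x3F,          # 6 bits, all formats
        "rs": (instr >> 21) & 0x1F,          # 5 bits, R and I formats
        "rt": (instr >> 16) & 0x1F,          # 5 bits, R and I formats
        "rd": (instr >> 11) & 0x1F,          # 5 bits, R format
        "shamt": (instr >> 6) & 0x1F,        # 5 bits, R format
        "funct": instr & 0x3F,               # 6 bits, R format
        "immediate": instr & 0xFFFF,         # 16 bits, I format
        "jump_address": instr & 0x03FFFFFF,  # 26 bits, J format
    }


fields = decode(0x01095020)   # add $t2, $t0, $t1
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```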
7.1.3 Exercises
1. In an architecture with 16 registers in the register file, each of which is 8
bits:
(a) What is the width of the Data In bus to the Register File?
(b) What is the width of the A out and B out busses from the Register
File?
(c) What is the width of the A sel and B sel busses to the Register File?
(d) What is the width of the Write sel bus to the Register File?
(e) What is the width of the R/W signal to the Register File?
2. Draw a block diagram of the Register File described in the previous prob-
lem.
3. Assume that in the Register File described in Fig 7.1 each register has
been initialized with its own address. Thus, register 0 contains 0, register
1 contains 1, register 2 contains 2, ... register 31 contains 31. Complete
the table shown below by showing the register or bus which is changed
(e.g. reg[8] = 00003212ₓ):
7.2. DESIGN OF THE DATAPATH 269
                          R/W = 0        R/W = 1
Data In   = 00004c3fₓ
A Sel     = 0011₂
B Sel     = 0101₂
Write Sel = 0001₂
4. Refer to the Data Memory in Fig 7.3. Show the input signals and busses
which are needed to:
(a) Put the value of the word at location 04c00124ₓ onto the Data Out
bus.
(b) Change the word at location 4c001008ₓ to ffffffffₓ.
(c) Clear the word at location 4c001000ₓ.
5. Show a block diagram for a byte addressable 16Gx64-bit Data Memory.
6. Refer to the Instruction Register (IR) which is loaded from the Instruction
Memory (Figures 7.4 and 7.5). Show the values of the fields for each of
the following 32-bit instructions. (Hint: The number of fields will depend
on the instruction type.)
(a) 03c42320ₓ
(b) 2d004501ₓ
(c) 0c000048ₓ
Figure 7.7: A clock signal is used to synchronize the effects of all components
in the datapath.
The period of the clock signal determines the speed of the processor. For
example, a clock which goes through one complete cycle in 0.001 second has a
frequency of 1000 cycles per second, or 1000 Hertz = 1 kHz. A clock speed for
a typical processor would be on the order of 10^9 cycles per second = 1 GHz.
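The relationship between period and frequency is simply frequency = 1 / period. A small Python sketch (the function names are assumptions for this illustration):

```python
def frequency_hz(period_seconds: float) -> float:
    """Clock frequency in cycles per second, given the time for one cycle."""
    return 1.0 / period_seconds


def cycles_in(frequency: float, seconds: float) -> float:
    """Number of clock cycles that elapse in the given time."""
    return frequency * seconds


print(frequency_hz(0.001))     # the 1 kHz clock from the example above
print(cycles_in(1e9, 2.0))     # cycles issued by a 1 GHz clock in 2 seconds
```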
In implementing the datapath, we could use more than one clock cycle to
execute a single instruction. For example, an add instruction could load the
ALU inputs from the register file during one clock cycle, and store the result
back into the register file during the next clock cycle. This would be called a
two-cycle datapath. Alternatively, it is possible to implement the register file
in such a way that this instruction could be completed in one clock cycle, even
if the destination register is the same as one of the operand registers. In this
case we would have a one-cycle datapath (for which the clock speed would have
to be slower to avoid clobbering an operand register before the result has been
computed by the ALU). To simplify the exposition of the datapath we will be
using a one-cycle design.
Figure 7.8: Connecting the Register File with the ALU in the datapath.
Figure 7.9: Connecting the Instruction Memory, Instruction Register and the
Register File in the datapath, for an R format instruction, such as add. (The
rs field drives A Sel, the rt field drives B Sel, and the rd field drives Write Sel.)
Figure 7.10: Connecting the Instruction Register, the Register File, the ALU,
and the Data Memory in the datapath, for load/store instructions. (The 16-bit
immediate field passes through a Sign Extend (SE) component.)
Figure 7.11: Connecting the PC, Instruction Memory, and Instruction Register
for sequential transfer of control. A dedicated 32-bit adder is used to increment
the PC by 4 (the second adder is not needed here).
Figure 7.12: Connecting the PC, Instruction Memory, and Instruction Register
for unconditional Jump instructions. The jump address is copied into the PC.
(The dedicated adders are not used here.)
The three options described above are combined into a single diagram showing
the connection of the PC, Instruction Memory, and Instruction Register (IR)
in Fig 7.14. Note that when we combine all this logic into a single diagram,
we produce contradictions, as described in chapter 6. These are points in the
datapath where two or more sources come together; they are circled in Fig 7.14.
It is critical that we resolve these contradictions; each contradiction can be
resolved with a multiplexer. Recall from chapter 6 that a multiplexer (or MUX)
with n select inputs can select one of 2^n input busses to be copied to a single
output bus. In Fig 7.14 each contradiction can be resolved with a 2x32 MUX
(i.e. a MUX with two 32-bit inputs, one 32-bit output, and a single select input).
This is shown in Fig 7.15. In this diagram we label the two MUXes MUX BC
(for BC instructions) and MUX J (for J or JAL instructions); in what follows
we will need to refer to them individually.
Note that nothing is connected to the select input of either of these
multiplexers. Both select signals will have to come from the Control Unit, to
be described below.
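The effect of the two multiplexers on the next value of the PC can be sketched in Python. This is an illustration under assumptions read from the figures: input 0 of MUX BC is taken to be PC + 4 and input 1 the branch adder output, and MUX J is taken to choose between the output of MUX BC (input 0) and the jump address (input 1). The function names are hypothetical, and the branch and jump targets are passed in rather than computed.

```python
def mux2(select, in0, in1):
    """2-input multiplexer: select = 0 passes in0, select = 1 passes in1."""
    return in1 if select else in0


def next_pc(pc, branch_target, jump_target, bc_select, j_select):
    """Choose the next PC from the three possible sources."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF                      # dedicated adder
    after_bc = mux2(bc_select, pc_plus_4, branch_target)   # MUX BC
    return mux2(j_select, after_bc, jump_target)           # MUX J


print(hex(next_pc(0x400000, 0x400020, 0x400100, bc_select=0, j_select=0)))
print(hex(next_pc(0x400000, 0x400020, 0x400100, bc_select=1, j_select=0)))
print(hex(next_pc(0x400000, 0x400020, 0x400100, bc_select=0, j_select=1)))
```

With one select input each, the two multiplexers cover all three cases: sequential execution, a taken branch, and a jump.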
7.2.5 Exercises
1. The speed of a CPU is determined by a clock signal.
(a) For a 20 MHz clock (1 MHz = 10^6 Hz): How many clock cycles are
there in 3.7 seconds?
(b) What is the speed of a clock which issues 6,750 pulses every second?
2. The diagram in Fig 7.8 is designed to execute which of the following MIPS
instructions?
(a) add
(b) lw
(c) or
(d) bc
(e) j
3. (a) What are the names (and widths, in bits) of the unlabeled fields in
the IR shown in Fig 7.9?
(b) Briefly explain why they are not relevant in this diagram.
4. (a) In Fig 7.10 which of the ALU operations should be selected? Refer
to Fig 6.63.
Figure 7.13: Connecting the PC, Instruction Memory, and Instruction Register
for conditional branch instructions. A dedicated 32-bit adder is used to increment
or decrement the PC by the relative branch address (the other adder is
not used here).
Figure 7.14: Connecting the PC, Instruction Memory, and Instruction Register.
Dedicated 32-bit adders are used for transfer of control. Contradictions are
circled.
Figure 7.15: Connecting the PC, Instruction Memory, and Instruction Register.
Dedicated 32-bit adders are used for transfer of control. Contradictions have
been resolved with multiplexers (MUX BC and MUX J, each 2x1 with select input S0).
5. In Fig 7.10 there is a R/W signal to the Register File and to the Data
Memory.
(a) What should be the value of each of those signals if an add instruction
is being executed?
(b) What should be the value of each of those signals if a beq instruction
is being executed?
(c) What should be the value of each of those signals if a lw instruction
is being executed?
(d) What should be the value of each of those signals if a sw instruction
is being executed?
6. In Fig 7.12 the jump address field is only 26 bits, but the PC is 32 bits.
Show a better version of this diagram to rectify this problem. (See the
section on busses in chapter 6)
7. In Fig 7.10 explain briefly why the Sign Extend SE component is needed
for conditional branches.
8. In Fig 7.16 Component 1 has one output, and Component 2 has two
outputs. Identify the contradiction(s), if any, resolve using multiplexer(s)
and redraw the diagram, if necessary. (See Fig 7.14 and Fig 7.15)
• R/W signals for the Data Memory and the Register File.
The datapath logic for the production of these signals is called the control
unit. Since these signals depend on the instruction being executed, the control
unit will take as input the op field⁶ of the instruction; i.e. the operation code.
In cases where several different R format instructions share the same op code,
the control unit will also examine the funct field of the instruction. Using these
inputs the control unit will produce the necessary select and control signals for
the datapath. We examine these output signals below for a subset of the MIPS
instruction set. Specifically, we will handle the following:
6 This was called the opcode field in chapter 4.
7.3. THE CONTROL UNIT 281
Figure 7.16: Diagram of five connected components, Comp 1 through Comp 5
(see exercise 8 of the previous section).
• R format instructions: add, sub, and, or. (We provide a framework for the
slt instruction and leave its completion as an exercise.)
• Unconditional jump: j
7 Recall that 0 asserts a Write operation for the R/W input.
In the above table we see that the high order bit (bit 3) of the ALU operation
code is always 0.
ALU3 = 0
Bit 2 is 1 for the sub and slt instructions only:
ALU2 = op5′ op4′ op3′ op2′ op1′ op0′
( f5 f4′ f3′ f2′ f1 f0′ + f5 f4′ f3 f2′ f1 f0′ )
Bit 1 is 0 only for the and and or instructions; it is 1 for add, sub, and slt
(and for lw and sw, which use the ALU to add):
ALU1 = ( op5′ op4′ op3′ op2′ op1′ op0′
( f5 f4′ f3′ f2 f1′ f0′ + f5 f4′ f3′ f2 f1′ f0 ) )′
Finally, bit 0 of the output to ALU operation select is 1 for the or and slt
instructions only:
ALU0 = op5′ op4′ op3′ op2′ op1′ op0′
( f5 f4′ f3′ f2 f1′ f0 + f5 f4′ f3 f2′ f1 f0′ )
We have shown boolean expressions for each of the four output lines from the
control unit to the ALU operation select. The logic diagram can be constructed
from these four expressions.
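One way to gain confidence in such expressions is to evaluate them for every instruction in the subset and compare with the ALU operation codes. The Python sketch below does this for bit 2 and bit 0 only, using the operation codes this chapter assigns (add 0010, sub 0110, and 0000, or 0001, slt 0111). The names ALU_OP, inv, and alu_bits_2_and_0 are assumptions made for this illustration.

```python
# ALU operation codes for the R format subset (op field = 0), keyed by funct code.
ALU_OP = {
    0x20: 0b0010,  # add
    0x22: 0b0110,  # sub
    0x24: 0b0000,  # and
    0x25: 0b0001,  # or
    0x2A: 0b0111,  # slt
}


def inv(bit):
    """Complement a single bit."""
    return bit ^ 1


def alu_bits_2_and_0(op, funct):
    """Evaluate the minterm expressions for ALU2 and ALU0."""
    f = [(funct >> i) & 1 for i in range(6)]   # f[0] .. f[5]
    r_format = int(op == 0)                    # op5' op4' op3' op2' op1' op0'
    sub = f[5] & inv(f[4]) & inv(f[3]) & inv(f[2]) & f[1] & inv(f[0])
    slt = f[5] & inv(f[4]) & f[3] & inv(f[2]) & f[1] & inv(f[0])
    or_ = f[5] & inv(f[4]) & inv(f[3]) & f[2] & inv(f[1]) & f[0]
    alu2 = r_format & (sub | slt)
    alu0 = r_format & (or_ | slt)
    return alu2, alu0


for funct, code in ALU_OP.items():
    alu2, alu0 = alu_bits_2_and_0(0, funct)
    # Each expression should agree with the corresponding bit of the table entry.
    print(hex(funct), format(code, "04b"), alu2 == ((code >> 2) & 1), alu0 == (code & 1))
```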
Figure 7.17: Generating the Select input for the BC Multiplexer, from the ALU
Z output and the control unit. Signals from control unit are dashed arrows.
(The ALU subtracts, Op Select = 0110, with inputs from the Register File; MUX BC
input 1 comes from the immediate adder, input 0 from the PC+4 adder, and its
output goes to MUX J.)
• BEQ is a signal from the Control Unit that the instruction being executed
is beq
• BNE is a signal from the Control Unit that the instruction being executed
is bne
• Z is the Z output signal from the ALU indicating that the output of the
ALU is 0.
This means we will need two AND gates, an inverter, and an OR gate in our
datapath, forming the select input to the MUX BC multiplexer. The select input
to the BC multiplexer is shown in Fig 7.17.
We can now write the boolean expressions for the Control Unit outputs
described above. The op codes for the conditional branch instructions are beq =
04ₓ = 00 0100₂ and bne = 05ₓ = 00 0101₂.
BEQ = op5′ op4′ op3′ op2 op1′ op0′
BNE = op5′ op4′ op3′ op2 op1′ op0
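The select logic for MUX BC can be sketched in Python. We assume the natural arrangement of the two AND gates, the inverter, and the OR gate, namely select = BEQ·Z + BNE·Z′, and we assume Z = 1 when the ALU result (rs minus rt) is zero. The function name bc_select is hypothetical.

```python
def bc_select(op, z):
    """Select input for MUX BC: 1 means the branch is taken."""
    beq = int(op == 0b000100)        # BEQ signal from the control unit
    bne = int(op == 0b000101)        # BNE signal from the control unit
    return (beq & z) | (bne & (z ^ 1))


print(bc_select(0x04, 1))   # beq with equal operands
print(bc_select(0x05, 1))   # bne with equal operands
```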
being executed is an unconditional jump. Thus, all that is needed is to use that
as the select signal to MUX J, as shown in Fig 7.18.
IR to Write Select
In comparing Fig 7.9 with Fig 7.10 we see another contradiction which needs
to be resolved. This is at the Write Select input to the Register File, which
determines which register is to receive the result of the operation. For an R
format instruction, such as add, the Write Select should come from the rd
field of the IR. For the I format instruction lw (load word) the Write Select
comes from the rt field of the IR. Thus a multiplexer is needed to resolve this
contradiction as shown in Fig 7.20. The control unit will select the rt field only
if the instruction is lw. Otherwise it will select the rd field.
Figure 7.18: Select signal to the MUX J multiplexer is the signal labeled J, from
the control unit.
Figure 7.19: Using a multiplexer to resolve the contradiction at the Data In to
the Register File. The RF signal is produced by the control unit.
Figure 7.20: Using a multiplexer to resolve the contradiction at the Write Select
input to the Register File. The WS signal is produced by the control unit.
Figure 7.21: Using a multiplexer to resolve the contradiction on the B input to
the ALU. The ALU B signal is produced by the control unit.
We finally have all the information we need to build the control unit. We
do this by examining each instruction in our subset, and deciding what each
output of the control unit should be for that instruction. This is shown as a
table in Fig 7.22. The value for RF for the slt instruction is left as an exercise.
Figure 7.22: Table showing the outputs of the control unit for each instruction.
Don’t cares are indicated by question marks.
Note that some of the entries in the table are question marks. These repre-
sent don’t care values. These values could be either 0 or 1; it doesn’t matter.
This gives us more flexibility and can simplify the logic for the control unit. As
an example, for a store word (sw) instruction, the RF signal is a don’t care.
This signal determines whether the Register File is loaded from the ALU out-
put or from the Data Memory. But for a store word instruction, the Register
File is not written, hence it does not matter what comes out of the MUX RF
multiplexer. In general, whenever the RegW signal is 1 (the Register File is not
written), the RF signal to MUX RF will be a don’t care.
We now take one instruction, the and instruction, and explain each control
unit output for that instruction.
• ALUOP must be 0000 because that is the ALU operation code for logical
AND.
To build the control unit we should first write a boolean expression for
each output, using the instruction op code and function code (for R format
instructions). For example, the table in Fig 7.22 shows that the RegW signal
should be 1 when the instruction is any one of the following:
7.3.6 Exercises
1. The Control Unit output to the Register File R/W input is called RegW,
and is shown above. In that expression op represents the 6-bit opcode
field of the instruction and f represents the 6-bit function code. Rewrite
this expression to accommodate the jump register (jr) instruction and the
jump and link (jal) instruction.
2. The table used to determine the Control Unit output to the ALU operation
select is shown below.
Instruction   Op code            Function code      ALU Operation Code
add           0                  20ₓ = 10 0000₂     0010₂
sub           0                  22ₓ = 10 0010₂     0110₂
and           0                  24ₓ = 10 0100₂     0000₂
or            0                  25ₓ = 10 0101₂     0001₂
lw            23ₓ = 10 0011₂     (none)             0010₂
sw            2bₓ = 10 1011₂     (none)             0010₂
(a) Include another row in this table for the set if less than (slt) instruc-
tion.
Figure 7.23: Datapath for the MIPS architecture. Signals from the control unit
are shown with dashed arrows (see also Fig 7.24).
Figure 7.24: The full datapath (with Fig 7.23) showing the control unit. Control
Unit signals are shown with dashed arrows. (The Control Unit takes the 6-bit op
and func fields from the IR and produces RegW, DW, ALUB, RF, WS, J, BEQ, BNE,
and ALU OP.)
(b) Rewrite the boolean expressions, as needed, for the Control Unit’s
4-bit output to the ALU.
3. Describe in words how the two 2x1 multiplexers in Fig 7.15 can be combined
into a single 4x1 multiplexer (with one of its 4 inputs unused).
4. What would be the output of the Control Unit if we were to include the
set if less than (slt) instruction in our subset of MIPS instructions?
(a) Show the value for RF in the row for slt in Fig 7.22.
(b) What other changes would have to be made to the datapath?
5. (a) Show the boolean expression for the DW output of the Control Unit.
(b) Show the boolean expression for the ALUB output of the Control Unit.
(c) Show the boolean expression for the RF output of the Control Unit.
(d) Show the boolean expression for the WS output of the Control Unit.
(e) Show the boolean expression for the J output of the Control Unit.
(f) Show the boolean expression for the BEQ output of the Control Unit.
(g) Show the boolean expression for the BNE output of the Control Unit.
(h) Show the boolean expression for the low order bit of the ALU OP
output of the Control Unit.
Chapter 8
This chapter will focus on strategies used to improve latency for the data
memory. The fast (but expensive) memory in the CPU is known as cache
memory. We will investigate strategies that are used to ensure that frequently
accessed words are kept in the cache as much as possible.
The principles of fast access to data memory will also apply to virtual mem-
ory. Virtual memory is the term normally used to describe an expansion of
the RAM using a secondary storage device such as disk or flash memory. Thus
the memory hierarchy consists of (in order from fastest and most expensive to
cheapest and slowest):
8.1. INTRODUCTION TO THE MEMORY HIERARCHY 295
1. cache memory
2. RAM
3. virtual memory
8.1.2 Exercises
1. Given the following types of storage and memory:
• Flash memory
• Fixed Magnetic Disk
• (Removable) Optical Disk
• Static Random Access Memory (SRAM)
• Magnetic Tape
• Dynamic Random Access Memory (DRAM)
1 We borrow this term from chemistry, in which a volatile fluid is one which evaporates
quickly.
296 CHAPTER 8. THE MEMORY HIERARCHY
sub:
lp:
    ble  $t0, $0, done      # finished with loop?
    lw   $t1, incr          # increment
    add  $t2, $t2, $t1      # $t2 <- $t2 + incr
    addi $t0, $t0, -1       # decrement loop counter
    j    lp                 # repeat
done:
• Every time a write operation occurs the cache block is copied back to the
corresponding block in RAM, ensuring that the cache block agrees with
its corresponding RAM block.
• The cache stores a dirty bit for each cache block. Initially the dirty bits
are all 0. When a write operation occurs, the dirty bit for the selected
cache block is set to 1. When a different RAM block is copied to cache
(see above) the cache checks the dirty bit. If it is 1, the cache block differs
from its corresponding RAM block, so the cache block is copied to the
corresponding RAM block. Then the new RAM block can be copied to
the cache, and its dirty bit is cleared.
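The dirty bit strategy in the second bullet can be sketched in Python for a single cache block. This is a simplified model under assumptions: RAM is a dictionary from block number to a list of bytes, and the names CacheBlock, load_block, and write_byte are hypothetical.

```python
class CacheBlock:
    """One cache block with a dirty bit (hypothetical model)."""

    def __init__(self):
        self.ram_block = None   # which RAM block is currently stored here
        self.data = None
        self.dirty = 0


def load_block(block, new_ram_block, ram):
    """Copy a different RAM block into this cache block, writing back first if dirty."""
    if block.dirty == 1:
        ram[block.ram_block] = list(block.data)   # cache block differs from RAM
    block.ram_block = new_ram_block
    block.data = list(ram[new_ram_block])
    block.dirty = 0                               # new copy agrees with RAM


def write_byte(block, offset, value):
    """A write changes only the cache copy and sets the dirty bit."""
    block.data[offset] = value
    block.dirty = 1


ram = {1: [0] * 8, 9: [7] * 8}
blk = CacheBlock()
load_block(blk, 1, ram)
write_byte(blk, 3, 0xAA)
print(ram[1][3], blk.dirty)    # RAM is stale until the block is replaced
load_block(blk, 9, ram)
print(ram[1][3], blk.dirty)    # write-back happened, dirty bit cleared
```

Under the first strategy (copying to RAM on every write), write_byte would update ram directly and no dirty bit would be needed.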
If you are viewing this page in color, you will see that column 1 in the cache
is yellow, as are columns 1, 9, and 17 in RAM. This shows that these blocks
map directly to block 1 in the cache. In general, since there are 8 blocks in the
cache, block number b in the RAM will map directly to block b mod 8 in the
cache. Also notice that block 3 in the cache is shown in blue, as are blocks 3,
11, and 19 in the RAM, showing that these blocks map directly to block 3 in
the cache.
As mentioned above, the cache needs to write modified blocks back to RAM;
at this point it will need to know which of the 4 possible RAM blocks it is storing.
This information is shown as a 2-bit quantity labeled main block in the cache.
It is also known as a tag.
In this example there was a reference to the byte at address 4bₓ = 0100 1011₂.
This byte is shown with a (red) circle in Fig 8.2. We can dissect that address
as follows: 0100 1011₂ = 01 001 011₂ (tag 01, cache block 001, byte 011).
In general, a RAM address can be viewed as shown in Fig 8.3. The number
of bytes in a block determines the size of the byte field. The number of blocks
in the cache determines the size of the cache block field. The remaining bits in
a RAM address determine the size of the tag field.
In our example, we are referencing the byte at address 4bₓ. The corresponding
block has been copied from block 9 in RAM into block 1 of the cache. Also
the cache stores the 2-bit main block number (or tag) 01 so that it knows which
of the 4 possible RAM blocks it is currently storing.
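Dissecting an address into these fields amounts to shifting and masking. The Python sketch below assumes the geometry of this example (8-bit addresses, 8 bytes per block, 8 blocks in the cache); the function name split_address is an assumption for this illustration.

```python
def split_address(addr, byte_bits=3, block_bits=3):
    """Return (tag, cache block, byte) for a direct-mapped cache."""
    byte = addr & ((1 << byte_bits) - 1)
    cache_block = (addr >> byte_bits) & ((1 << block_bits) - 1)
    tag = addr >> (byte_bits + block_bits)
    return tag, cache_block, byte


tag, cache_block, byte = split_address(0x4B)
print(f"{tag:02b} {cache_block:03b} {byte:03b}")   # fields of address 4b
print(0x4B >> 3)                                   # RAM block number
```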
We now provide an example to illustrate the behavior of the cache memory.
Recall that when the RAM block being accessed is already in cache, it is called
a cache hit, and when that block is not in the cache it is called a cache miss. In
the table below we show a sequence of RAM addresses being accessed, and the
effect that they have on the cache.
Note that the reference to 2dₓ in the last line of the table is a repeat
of the second line. However, the effect is a cache miss because cache block 101
has been clobbered by the reference to efₓ, which is mapped to the same cache
block.
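The hit and miss behavior of a direct-mapped cache can be simulated in a few lines of Python. The sketch below assumes the same geometry as Fig 8.2 and an initially empty cache; the address sequence is chosen for illustration only and is not the sequence from the table. The function name simulate is hypothetical.

```python
def simulate(addresses, byte_bits=3, block_bits=3):
    """Return 'hit' or 'miss' for each address in a direct-mapped cache."""
    tags = {}                      # cache block number -> tag currently stored
    results = []
    for addr in addresses:
        cache_block = (addr >> byte_bits) & ((1 << block_bits) - 1)
        tag = addr >> (byte_bits + block_bits)
        if tags.get(cache_block) == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[cache_block] = tag    # the old block is clobbered
    return results


# 2d and 2e share a RAM block; ef maps to the same cache block with another tag.
print(simulate([0x2D, 0x2E, 0xEF, 0x2D]))
```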
8.2. CACHE MEMORY 299
Figure 8.2: Diagram of direct-mapped cache memory and main memory (RAM).
Byte at memory address 4bₓ = 0100 1011₂ = 01 001 011₂ is accessed.
Figure 8.3: Fields of a RAM address for a direct-mapped cache memory, corre-
sponding to Fig 8.2
Figure 8.4: Fields of a RAM address for an associative cache memory, corre-
sponding to Fig 8.5
The fields of a RAM address, with an associative cache, are similar to those
shown in Fig 8.3. However, instead of a cache block field, we have a set field,
as shown in Fig 8.4. Each RAM block maps to a set of blocks in the cache,
rather than to an individual block.
To see why this scheme is potentially faster than a direct-mapped cache,
consider the case where there are successive references to different RAM blocks,
all of which map to the same cache block. For example, referring to Fig 8.2,
suppose there are references to the following memory locations in the sequence
shown:
88ₓ (= 1000 1000₂ = 10 001 000₂)
4bₓ (= 0100 1011₂ = 01 001 011₂)
09ₓ (= 0000 1001₂ = 00 001 001₂)
caₓ (= 1100 1010₂ = 11 001 010₂)
The bytes referenced by these addresses are all in blocks which map to the same
cache block (001₂). Thus there will be a cache miss on each reference, which
essentially nullifies the speed advantage provided by a cache memory.
If this had been a 4-way set associative cache, it could potentially store all
4 blocks in cache at the same time, thus converting 3 of the cache misses into
cache hits.
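The difference shows up most clearly if the four references are repeated. The Python sketch below counts misses for a cache of 8 blocks organized either as direct-mapped (8 sets of 1 block) or as 4-way set associative (2 sets of 4 blocks). It assumes a least recently used replacement strategy and an initially empty cache; the function name count_misses is hypothetical.

```python
def count_misses(addresses, num_sets, ways, byte_bits=3):
    """Count cache misses for a set associative cache with LRU replacement."""
    sets = [[] for _ in range(num_sets)]   # per set: block numbers, oldest first
    misses = 0
    for addr in addresses:
        block = addr >> byte_bits          # RAM block number
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)                # hit: will be re-appended as newest
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                   # evict the least recently used block
        s.append(block)
    return misses


refs = [0x88, 0x4B, 0x09, 0xCA] * 2        # the four references, made twice
print(count_misses(refs, num_sets=8, ways=1))   # direct-mapped
print(count_misses(refs, num_sets=2, ways=4))   # 4-way set associative
```

In the direct-mapped case every reference misses; in the 4-way case only the first reference to each block misses.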
In Fig 8.5 there is a reference to the byte at RAM address 4bₓ = 0100 1011₂ =
010 01 011₂ (tag 010, set 01, byte 011). The byte at this address is shown with
a filled circle, and its block (number 01001) has been copied to one of the two
blocks in cache set 01. Let's assume it is in the second column of this set,
though it could as well be in the first column (see discussion of block
replacement strategies, below). The tag for this block will be the three high
order bits of the block number: 010.
If you are viewing Fig 8.5 in color, you will see that RAM blocks 1, 5, 9,
13, 17, 21, shown in yellow, all map to the same set, set 1, because these block
numbers are all congruent to 1 (mod 4); if these numbers are expressed as 5-bit
binary numbers, take the last two bits to get the set number. Similarly, RAM
blocks 3, 7, 11, 15, 19, shown in blue, all map to the same set in the cache, set
3.
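For the 2-way associative cache of Fig 8.5 (4 sets, 8 bytes per block), the fields can be extracted as in this Python sketch. The function name split_set_assoc is an assumption for this illustration.

```python
def split_set_assoc(addr, byte_bits=3, set_bits=2):
    """Return (tag, set number, byte) for a set associative cache."""
    byte = addr & ((1 << byte_bits) - 1)
    set_number = (addr >> byte_bits) & ((1 << set_bits) - 1)
    tag = addr >> (byte_bits + set_bits)
    return tag, set_number, byte


tag, set_number, byte = split_set_assoc(0x4B)
print(f"{tag:03b} {set_number:02b} {byte:03b}")

# RAM blocks congruent to 1 (mod 4) all map to set 1.
print([b % 4 for b in (1, 5, 9, 13, 17, 21)])
```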
Figure 8.5: Diagram of a 2-way associative cache memory and main memory
(RAM). Bytes at RAM addresses 4bₓ = 0100 1011₂ = 010 01 011₂ and abₓ =
1010 1011₂ = 101 01 011₂ are accessed, which map to blocks in the same cache
set. (The two blocks of set 01 hold tags 101 and 010.)
Figure 8.6: Initial state of the cache for an example of the LRU strategy.
Tags, left to right (two blocks per set): 111 001 | 101 010 | 101 011 | 010 000
Figure 8.7: Final state of the cache for an example of the LRU strategy.
Tags, left to right (two blocks per set): 111 001 | 101 110 | 101 011 | 010 000
8.2.3 Exercises
1. You are given a byte-addressable RAM with 4K bytes and a direct-mapped
cache memory with 512 bytes. The block size is 32 bytes.
(Hint: Work with exponents of 2)
(a) How many blocks are in the RAM?
(b) How many blocks are in the cache memory?
(c) If a program accesses the byte at location 9c7ₓ, which block of RAM
is copied to the cache? Give the block number as shown in Fig 8.2.
(d) To which block of the cache is it copied? Show the column heading
as shown in Fig 8.2.
(e) What will be the tag value on that cache block?
2. (a) Complete the following table for the cache and RAM shown in Fig 8.2.
Assume the memories are initially clear, so the first reference to any
block is a cache miss.
(c) Put another circle on the cache byte to which the RAM address 0057ₓ
is mapped.
(d) Show the tag value (in hex) in the cache for the block containing the
circled byte.
4. Given a RAM storing 256M bytes and a 4-way associative cache memory
storing 64K bytes, with a block size of 128 bytes:
Hint: Use exponents of 2.
(a) How many blocks are in the RAM?
(b) How many blocks are in the cache?
(c) How many sets of blocks are in the cache?
(d) Show a diagram of a RAM address, similar to Fig 8.4. In the diagram
show the width, in bits, of each field.
5. Refer to the 2-way associative cache of Fig 8.5 and the table of RAM
memory references shown below.
ory was developed by operating systems people, they often use different terminology for the
same concepts.
8.3. VIRTUAL MEMORY 307
Figure 8.8: Diagram of a virtual memory system. Each small block represents
one page.
Figure 8.9: Diagram of a small virtual memory system; the page table is shown
in Fig 8.10. (RAM holds pages 0 to 3; the Disk Swap Space holds pages 0 to 15.)
This table is shown in Fig 8.10. It shows that page 2 in RAM, for example, is a
copy of page 12 in virtual memory. When it is to be replaced, the page table is
consulted so the page can be copied back to the correct page in the Disk Swap
Space.
In our example we have a virtual memory system storing 16 pages, and
only 4 pages in RAM. If we assume that each page is 8K bytes (8K = 2^13), a
virtual memory address would be 17 bits: 4 bits for the page number and 13
bits for the offset within a page. For a virtual memory address of 18158x =
1 1000 0001 0101 1000_2 = 1100 0 0001 0101 1000_2, the page number is 12 =
1100_2 and the byte offset within the page is 0158x = 0 0001 0101 1000_2.
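The split of a virtual address into a page number and an offset can be sketched in Java with one shift and one mask. This is a minimal illustration, assuming the 17-bit address and 8K-byte page of the example above; the class and method names are invented for this sketch.

```java
// Sketch: split a 17-bit virtual address into a 4-bit page number
// and a 13-bit offset (page size 8K = 2^13 bytes).
final class VirtualAddressSketch {
    static final int OFFSET_BITS = 13;                      // 8K bytes per page
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;  // low-order 13 bits

    // The high-order bits of the address select the page.
    static int pageNumber(int virtualAddress) {
        return virtualAddress >>> OFFSET_BITS;
    }

    // The low-order 13 bits give the byte position within the page.
    static int offset(int virtualAddress) {
        return virtualAddress & OFFSET_MASK;
    }

    public static void main(String[] args) {
        int address = 0x18158;
        System.out.println("page   = " + pageNumber(address));                      // page   = 12
        System.out.println("offset = 0x" + Integer.toHexString(offset(address)));   // offset = 0x158
    }
}
```

Because the page size is a power of 2, no division is needed; the hardware simply routes the two groups of address bits to different places.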
As with associative cache memories, a virtual memory system will need a
page replacement policy. This algorithm decides which page is replaced when a
page fault occurs. Virtual memory systems typically use a Least Recently Used
(LRU) algorithm, though First-In First-Out (FIFO) and random algorithms are
also used (see the section on cache memories).
Using the page table of Fig 8.10 and assuming the RAM pages were loaded
in order: 0, 1, 2, 3, we show what results from a reference to virtual memory
address 074fcx = 0 0111 0100 1111 1100_2 = 0011 1 0100 1111 1100_2. We see that
it is a reference to virtual page 0011_2 = 3, which is not currently in RAM. This
is a page fault. If we are using an LRU page replacement algorithm, we will
replace the page at RAM page 0. It is copied back to the disk, and replaced
with page 3 from the disk swap space.
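The page fault just described can be traced with a small Java model. This is only a sketch: the initial contents follow the page table of Fig 8.10, but the class name, the logical clock used to model LRU, and the lastWasFault field are assumptions made for illustration, and no data is actually copied to or from a disk.

```java
// Sketch: a 4-page RAM with LRU replacement, initialized from Fig 8.10.
final class PageTableSketch {
    // frames[r] = virtual page currently held in RAM page r
    private final int[] frames = {5, 1, 12, 6};
    // lastUsed[r] = logical time of the most recent reference to RAM page r;
    // the pages were loaded in order 0, 1, 2, 3
    private final long[] lastUsed = {0, 1, 2, 3};
    private long clock = 4;
    boolean lastWasFault;

    // Returns the RAM page holding virtualPage after the reference.
    int reference(int virtualPage) {
        for (int r = 0; r < frames.length; r++) {
            if (frames[r] == virtualPage) {      // page is already in RAM
                lastUsed[r] = clock++;
                lastWasFault = false;
                return r;
            }
        }
        // Page fault: choose the least recently used RAM page as the victim.
        int victim = 0;
        for (int r = 1; r < frames.length; r++) {
            if (lastUsed[r] < lastUsed[victim]) {
                victim = r;
            }
        }
        frames[victim] = virtualPage;            // stands for the copy-back and reload
        lastUsed[victim] = clock++;
        lastWasFault = true;
        return victim;
    }

    public static void main(String[] args) {
        PageTableSketch table = new PageTableSketch();
        int page = 0x074fc >>> 13;               // virtual page 3
        int ramPage = table.reference(page);
        System.out.println("fault = " + table.lastWasFault + ", RAM page = " + ramPage);
        // fault = true, RAM page = 0
    }
}
```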
We emphasize that the cost of virtual memory is an increased latency in
memory access. A page fault is the term used when the system must copy a
page from the disk swap space into RAM. Every time there is a page fault there
is a significant delay, because access time to the disk is on the order of one
thousand times the access time to RAM. If an executing program causes many
page faults, the system will spend more time swapping pages of memory than
it will executing the user’s program. This situation is called thrashing.
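The effect of the page fault rate on latency can be estimated with a simple model. The sketch below assumes that every reference costs one RAM access and that each page fault adds one disk access costing 1000 RAM accesses (the ratio given above); the fault rates are hypothetical values chosen only for illustration.

```java
// Sketch: average memory access time as a function of the page fault rate.
final class AccessTimeSketch {
    // Times are expressed in units of one RAM access.
    static double averageAccessTime(double ramTime, double diskTime, double faultRate) {
        return ramTime + faultRate * diskTime;
    }

    public static void main(String[] args) {
        double ram = 1.0;       // one RAM access
        double disk = 1000.0;   // disk access is about 1000 times slower
        for (double rate : new double[] {0.0, 0.0001, 0.001, 0.01}) {
            System.out.println("fault rate " + rate + " -> average "
                    + averageAccessTime(ram, disk, rate));
        }
    }
}
```

Under these assumptions, one fault per thousand references doubles the average access time, and one fault per hundred references makes it about eleven times the RAM access time.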
Page Table
Disk          RAM
 5 = 0101     0
 1 = 0001     1
12 = 1100     2
 6 = 0110     3
Figure 8.10: Page table for the small virtual memory of Fig 8.9
[Figure: virtual address fields: page number (4 bits), offset (13 bits)]
Figure 8.11: Diagram of a virtual address for a system with 16 pages in virtual
memory, and 8K bytes in each page. Address is 18158x.
8.3.1 Exercises
1. Given a virtual memory system storing 4G bytes, a page size of 64K bytes,
and a RAM storing 16M bytes.
Hint: Work with powers of 2.
(c) What is the page number for the virtual memory address 402a0100x?
2. Refer to Fig 8.11. Consider a computer with 128K bytes in virtual memory,
and a page size of 8K bytes. The table below shows a sequence of virtual
memory references. Complete the table showing the virtual memory page
number (in binary and in hex), whether or not a page fault has occurred,
and the RAM page which is referenced. Assume there are 4 pages in the
RAM.
(a) Assume an LRU page replacement algorithm is used. Show the page
table when completed.
(b) Assume a FIFO page replacement algorithm is used. Show the page
table when completed.
8.4 Locality
As we have seen in the preceding sections, cache memory is capable of improving
run-time efficiency, and virtual memory is capable of expanding the memory’s
capacity without a significant degradation of performance. However, both of
these improvements are subject to a condition known as locality. If there are
many references to locations in a small number of different cache blocks (or
virtual memory pages), then there will be few cache misses (or page faults). In
this case we say that the executing program exhibits good locality. If there are
references to memory locations in many widely scattered cache blocks (or in
many different virtual memory pages), performance is degraded, and thrashing
may occur. In this case we say that the executing program exhibits poor locality.
This locality principle applies to both the cache memory level and the virtual
memory level in the memory hierarchy. Hence we will use the phrase memory
unit to mean either cache block or virtual memory page. In place of the phrases
‘cache miss’ or ‘page fault’, we use the phrase memory fault.
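One rough way to compare the locality of two reference patterns is to count how many distinct memory units each one touches. The following Java sketch does this for sequences of byte addresses; the 32-byte unit size and the two strides are hypothetical values, and the class and method names are invented for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: count the distinct memory units (blocks or pages) touched
// by a sequence of byte addresses.
final class LocalitySketch {
    static int unitsTouched(int[] addresses, int unitSize) {
        Set<Integer> units = new HashSet<>();
        for (int address : addresses) {
            units.add(address / unitSize);   // unit number containing this byte
        }
        return units.size();
    }

    // Builds 'count' byte addresses starting at 0 and spaced 'stride' bytes apart.
    static int[] strided(int count, int stride) {
        int[] addresses = new int[count];
        for (int i = 0; i < count; i++) {
            addresses[i] = i * stride;
        }
        return addresses;
    }

    public static void main(String[] args) {
        int unitSize = 32;   // hypothetical memory unit size in bytes
        System.out.println(unitsTouched(strided(64, 4), unitSize));     // 8
        System.out.println(unitsTouched(strided(64, 400), unitSize));   // 64
    }
}
```

Both sequences make 64 references, but the first stays within 8 memory units while the second touches a different unit on every reference, so the second is far more likely to cause memory faults.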
In what follows we distinguish between the various kinds of locality on dif-
ferent dimensions: Data vs. Instruction, and Temporal vs. Spatial.
import java.util.Random;
System.out.println ("done");
}
}
Figure 8.12: Java program to contrast good and poor data locality
16 blocks. This would be an example of good temporal locality, but poor spatial
locality. Temporal locality is good when memory units which have been accessed
are accessed again soon thereafter. In this example the program would exhibit
good locality despite the poor (spatial) data locality.
8.4.3 Exercises
1. Which of the following loops exhibits good spatial locality, and which
exhibits good temporal locality?
(a) int MAX = 10000;
for (int i=0; i<MAX; i++)
sum = sum + nums[i];
(b) int MAX = 10000;
int ctr = 0;
for (int i=0; i<MAX; i++)
{ ctr = (ctr+100) % MAX;
sum = sum + nums[ctr];
}
2. Show an example (or template) of a Java program with poor instruction
locality.
3. In Java, as with most programming languages, the elements of an array
are stored in row-major order. That means the elements are mapped to
the one-dimensional memory by rows, first all the elements in row 0, then
all the elements in row 1, then all the elements in row 2, etc. Consider the
program shown in Fig 8.13. The main method simply uses a nested loop
to store a value in each position of a two dimensional array. It does this
twice, once in row-major order, and once in column-major order.
(a) Which loop will execute faster, or do they run in the same time?
(b) Run this program on a computer to verify your response to part (a).
(c) Explain why one of the loops executes much faster than the other, or
explain why they execute in the same amount of time.
4. Which of the following sorting algorithms would you assume exhibit good
data locality, and which would exhibit poor data locality?
• Selection Sort
• Bubble Sort
• Quick Sort
• Merge Sort
System.out.println ("done");
}
}
Figure 8.13: Java program to explore running time for a matrix of ints
Chapter 9
Alternative Architectures
Since the early years of computing, many different designs have been promoted
for the central processing unit of a computer. However, there are some things
which are relatively stable and commonplace. In most computer architectures
today there is a stored program design in which a sequence of instructions is
stored in memory. This design was first proposed by the Princeton
mathematician John Von Neumann in 1945. It has come to be known as the
Von Neumann Architecture and is defined by:
tecture; or they may be in the same memory, in which case a program is capable of modifying
itself.
All arithmetic instructions can produce a result without any operands; they
always operate on the top two values on the stack (popping them from the
stack) and push the result of the operation onto the stack. For example, to
compute a-(b+c):
Push a
Push b
Push c
Add // Pop c, Pop b, Push b+c
Sub // Pop b+c, Pop a, Push a-(b+c)
Alternatively, to compute (a-b)+c:
Push a
Push b
Sub // Pop b, Pop a, Push a-b
Push c
Add // Pop c, Pop a-b, Push (a-b)+c
Note that the value on top of the stack is the right operand of the operation.
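The behavior of these zero-address instructions can be imitated in Java with a stack of integers. This is a sketch for illustration only; the class name, method names, and the sample values of a, b, and c are assumptions, not part of any real instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: a tiny zero-address (stack) machine with Push, Add, and Sub.
final class StackMachineSketch {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int value) {
        stack.push(value);
    }

    // Add: pop the right operand, pop the left operand, push left + right.
    void add() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    // Sub: pop the right operand, pop the left operand, push left - right.
    void sub() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left - right);
    }

    int top() {
        return stack.peek();
    }

    public static void main(String[] args) {
        int a = 10, b = 3, c = 2;   // sample values
        StackMachineSketch m = new StackMachineSketch();
        m.push(a); m.push(b); m.push(c);
        m.add();                    // b+c
        m.sub();                    // a-(b+c)
        System.out.println(m.top());   // 5
    }
}
```

Popping the right operand first is what makes the value on top of the stack the right operand, which matters for Sub but not for Add.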
Clr // Acc = 0
Add x // Acc = x
Some one-address machines would also have a negate operation, Neg, to form
the two’s complement of the value in the accumulator. This can be used for
subtraction:
a-b = a + (-b)
To compute the value of the expression a-(b+c):
Clr // Acc = 0
Add b // Load b into Acc
Add c // Acc = b+c
Neg // Acc = -(b+c)
Add a // Acc = a-(b+c)
To compute the value of the expression (a-b)+c:
Clr // Acc = 0
Add b // Load b into Acc
Neg // Acc = -b
Add a // Acc = a-b
Add c // Acc = (a-b)+c
This machine would also have Load and Store instructions which reference mem-
ory locations:
Load r1, b // r1 = b
Add r1, c // r1 = b+c
Load r2, a // r2 = a
Sub r2, r1 // r2 = a-(b+c)
2 Note that here the second operand could be either a memory location or a register.
9.1.5 Exercises
1. Show how to evaluate the expression x = (a+b+c)-(d-f), where each vari-
able represents a memory location, using
(a) A zero-address architecture
(b) A one-address architecture (assume a Store instruction can store the
accumulator in a given memory location)
(c) A two-address architecture (assume registers r1 and r2 are available)
(d) A three-address architecture (assume registers r1, r2, and r3 are avail-
able)
[Figure: an Instruction with fields op and address; Memory containing the operand]
Figure 9.1: Diagram of the direct addressing mode. The instruction stores an
absolute memory address
[Figure: an Instruction with fields op and address; Memory containing the operand]
Figure 9.2: Diagram of the indirect addressing mode. The instruction stores
the address of the memory word which contains the address of the operand
[Figure: an Instruction with fields op, reg, and disp; Regs; Memory containing the operand]
able. Index registers are usually used to step through the elements of an array,
by starting with 0 in the index register, and incrementing the index register (by
the size of an array element) each time the next array element is needed. A
diagram of the base-displacement mode is shown in Fig 9.4. In this diagram the
instruction contains fields for a base register, an index register, and a displace-
ment. The base register is register 4. The index register is register 1, which
contains 3, and the displacement is 4 memory words. The effective address of
the operand is thus 3 + 4 = 7 words beyond the address in register 4.
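The effective address calculation in this example can be written out as a short Java sketch. Addresses are counted in memory words, as in the text; the starting value 100 in register 4 is a hypothetical value, and the class and method names are invented for illustration.

```java
// Sketch: effective address = base register + index register + displacement.
final class EffectiveAddressSketch {
    static int effectiveAddress(int[] regs, int base, int ndx, int disp) {
        return regs[base] + regs[ndx] + disp;
    }

    public static void main(String[] args) {
        int[] regs = new int[8];
        regs[4] = 100;   // base register: hypothetical starting address
        regs[1] = 3;     // index register contains 3
        int disp = 4;    // displacement of 4 memory words
        System.out.println(effectiveAddress(regs, 4, 1, disp));   // 107
    }
}
```

Stepping through an array then requires changing only the index register; the base register and the displacement stay fixed.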
9.2.5 Exercises
1. Show how the six elements of a list of 32-bit contiguous numbers, named
A, can be added, using instructions with each of the following addressing
modes. Assume that we are using a two-address architecture with the
following instructions:
[Figure: an Instruction with fields op, base, ndx, and disp; Regs; Memory containing the operand]
Instruction          Meaning
add rs,rt            reg[rs] = reg[rs] + reg[rt]
sub rs,rt            reg[rs] = reg[rs] - reg[rt]
lod rs,addr          reg[rs] = memory[addr]
sto rs,addr          memory[addr] = reg[rs]
add rs,addr          reg[rs] = reg[rs] + memory[addr]
sub rs,addr          reg[rs] = reg[rs] - memory[addr]
beq rs,rt,label      branch to label if reg[rs]==reg[rt]
blt rs,rt,label      branch to label if reg[rs]<reg[rt]
bgt rs,rt,label      branch to label if reg[rs]>reg[rt]
ble rs,rt,label      branch to label if reg[rs]≤reg[rt]
bge rs,rt,label      branch to label if reg[rs]≥reg[rt]
bne rs,rt,label      branch to label if reg[rs]≠reg[rt]
(a) Direct addressing. Assume the values in the array have labels A0,
A1, A2, A3, A4, A5. Assume there is an add instruction with two
operands; the first operand is a register and the second operand is
an absolute memory address.
add reg, address
will add the contents of the register to the memory word at the
specified address, and store the sum back into the register.
(b) Indirect addressing. Assume the addresses of the six numbers are in
contiguous memory locations named A0,A1,A2,A3,A4,A5. Add the
following instructions to the instruction set described above:
9.2. ADDRESSING MODES 323
00000100_16   00 00 01 03   00 00 01 08   00 00 01 00   00 00 01 05
00000110_16   00 00 01 07   00 00 01 09   00 00 01 01   00 00 01 0c
Instruction          Meaning
lodI rs,addr         reg[rs] = memory[memory[addr]]
stoI rs,addr         memory[memory[addr]] = reg[rs]
addI rs,addr         reg[rs] = reg[rs] + memory[memory[addr]]
subI rs,addr         reg[rs] = reg[rs] - memory[memory[addr]]
(c) Base-Displacement addressing. Add the following instructions to the
instruction set described above:
Instruction          Meaning
lod rs,(rt)disp      reg[rs] = memory[reg[rt]+disp]
sto rs,(rt)disp      memory[reg[rt]+disp] = reg[rs]
add rs,(rt)disp      reg[rs] = reg[rs] + memory[reg[rt]+disp]
sub rs,(rt)disp      reg[rs] = reg[rs] - memory[reg[rt]+disp]
(d) Base-Index-Displacement addressing. Add the following instructions
to the instruction set described above:
Instruction          Meaning
lod rs,(rt,rx)disp   reg[rs] = memory[reg[rt]+reg[rx]+disp]
sto rs,(rt,rx)disp   memory[reg[rt]+reg[rx]+disp] = reg[rs]
add rs,(rt,rx)disp   reg[rs] = reg[rs] + memory[reg[rt]+reg[rx]+disp]
sub rs,(rt,rx)disp   reg[rs] = reg[rs] - memory[reg[rt]+reg[rx]+disp]
2. Assume you are given the instruction set from the problem above, and
assume that register r0 always contains 0. Also assume that memory has
been initialized as shown in the memory dump in Fig 9.5. Show the value
stored in register r1 when the label done is reached for each of the following
code segments:
9.3 ARM
ARM (Advanced RISC Machine) was first produced in the early 1980’s by the
British corporation Acorn Computers. RISC is a Reduced Instruction Set Com-
puter. These computers typically have many registers, but just a few instruc-
tions in the instruction set, and have often outperformed computers with many
more instructions.
R Format Instructions
An example of an R format instruction is the ADD instruction. In ARM assembly
language it is:
ADD Xd, Xn, Xm
The intent is that registers Xn and Xm are added, with the result placed in
register Xd:
Xd ← Xn + Xm
For example, the instruction below will add the contents of registers X3 and
Figure 9.6: Register names and conventions for the ARM processor
R Format
Bits      Value
0..4      Rd register
5..9      Rn register
10..15    Shift amount
16..20    Rm
21..31    Opcode

I Format
Bits      Value
0..4      Rd register
5..9      Rn register
10..21    Immediate
22..31    Opcode

D Format
CB Format IW Format
There are instructions for multiplication and division. Recall that when
multiplying two n-bit values, the result could require 2n bits. In the ARM
architecture fixed point multiplication is handled with a few instructions. Assuming
the registers are 64-bit registers, the MUL instruction can be used to multiply two
registers, storing a 64-bit result in a third register. To multiply the X7 register
by the X8 register, leaving the 64-bit product in the X2 register:
MUL X2, X7, X8
If the result exceeds 64 bits, the above instruction will produce the low order
64 bits. To obtain the high order 64 bits we must use either the SMULH in-
struction (for a signed multiply) or the UMULH instruction (for an unsigned
multiply). If the previous example produces a result which exceeds 64 bits, we
can put the high order 64 bits of the result into register X3 as shown below:
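Java offers a close analogue of this pair of instructions, which may help in checking results by hand. In the sketch below, ordinary long multiplication keeps only the low order 64 bits (much as MUL does), and the library method Math.multiplyHigh (available since Java 9) returns the high order 64 bits of the signed 128-bit product (much as SMULH does). The class and method names low64 and high64Signed are invented for this illustration.

```java
// Sketch: low and high halves of a signed 64 x 64 bit product.
final class MulHighSketch {
    // Like MUL: the product wraps around, keeping the low order 64 bits.
    static long low64(long a, long b) {
        return a * b;
    }

    // Like SMULH: the high order 64 bits of the signed 128-bit product.
    static long high64Signed(long a, long b) {
        return Math.multiplyHigh(a, b);
    }

    public static void main(String[] args) {
        long a = 1L << 62;   // 2^62
        long b = 4;          // the product is 2^64, which needs 65 bits
        System.out.println(low64(a, b));          // 0
        System.out.println(high64Signed(a, b));   // 1
    }
}
```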
SMULH X3, X7, X8
For division we can obtain the quotient for a fixed point divide using the
SDIV instruction for a signed divide, or the UDIV for an unsigned divide. For
example to divide the X3 register by the X1 register, putting the signed quotient
in the X10 register:
SDIV X10, X3, X1
I Format Instructions
D Format Instructions
An example of a D format instruction is the LDUR (LoaD Unscaled Register)
instruction. In ARM assembly language it is:
LDUR Xt, [Xn, #DtAddress]
This is a memory reference instruction; the effective memory address is the
sum of the address in register Xn and the offset DtAddress. The referenced word
is loaded into register Xt.
Xt ← Mem[Xn + DtAddress]
For example, the instruction below will load the memory word whose address is
the sum of the X3 register plus 48 into register X8.
LDUR X8, [X3, #48]
Note that if you are loading a full word from an array, the array index should
be multiplied by 4 to get the DtAddress, since there are 4 bytes in a word, and
the memory is byte addressable. This instruction is said to be unscaled because
the offset is a byte address. There is also a load instruction which scales the
offset by the size of the word being loaded. LDR scales the offset by multiplying
it by 4, to get the effective address of a given position in an array of full words.
There are also STUR and STR instructions to store a register into memory, with
unscaled and scaled offsets, respectively.
B Format Instructions
A B format instruction is used for unconditional branch instructions (these were
called jump instructions in MIPS). In assembly language we would typically have
a label as the target:
B Label // jump to Label
The assembler finds the memory address associated with the Label, and fills
in a 26 bit address for the branch. There is also a BL instruction, Branch and
Link, for function calls. It stores the return address in the X30 register (LR)
and branches to the function. The BR (Branch Register) instruction is R format,
and branches to the address in the Rt register. It is used to return to the calling
function.
Figure 9.8: Instructions which set (or clear) the condition code flags N,Z,V,C
  0101 = +5        1100 = -4
+ 0100 = +4      + 1001 = -7
  ----------       ----------
  1001 = -7        0101 = +5
Figure 9.10: Two examples of overflow when adding 4-bit words, two’s comple-
ment representation
to 0. The Z flag is set to 1, only if all bits of the result are zeros. The V flag
(oVerflow) is set to 1 only when overflow occurs. When the instruction is an
ADD, overflow can occur when the addition of two positive numbers produces a
negative result, or when the addition of two negative numbers produces a positive
result. To explain the overflow condition more clearly, Fig 9.10 shows examples of
overflow when adding 4-bit words (assuming two’s complement representation).
These principles apply to a 32-bit word, or a word of any size.
When adding positive numbers, or when adding negative numbers, we can
get an incorrect result because the result does not fit into a 4-bit word. This is
called overflow.
Another way to detect the overflow condition is to note that a carry into the
high order bit is different from the carry out of the high order bit. Of course,
overflow can result from a subtract operation as well. In this case, overflow
occurs when the borrow from the sign bit differs from the borrow into the sign
bit.
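The sign rule for overflow on addition can be checked with a short Java sketch that works on 4-bit words, as in Fig 9.10. It is only an illustration: the class and method names are invented, and the 4-bit word size is chosen to match the figure.

```java
// Sketch: 4-bit two's complement addition with overflow (V flag) detection.
final class OverflowSketch {
    static final int BITS = 4;
    static final int MASK = (1 << BITS) - 1;    // 1111
    static final int SIGN = 1 << (BITS - 1);    // 1000

    // Adds two 4-bit patterns, keeping only the low order 4 bits.
    static int add4(int x, int y) {
        return (x + y) & MASK;
    }

    // Overflow: both operands have the same sign, and the sum has the other sign.
    static boolean overflow(int x, int y) {
        int sum = add4(x, y);
        return ((x ^ sum) & (y ^ sum) & SIGN) != 0;
    }

    // Interprets a 4-bit pattern as a two's complement value.
    static int signed4(int pattern) {
        return (pattern & SIGN) != 0 ? pattern - (1 << BITS) : pattern;
    }

    public static void main(String[] args) {
        System.out.println(signed4(add4(0b0101, 0b0100)) + " " + overflow(0b0101, 0b0100));
        // -7 true
        System.out.println(signed4(add4(0b1100, 0b1001)) + " " + overflow(0b1100, 0b1001));
        // 5 true
        System.out.println(signed4(add4(0b0011, 0b0010)) + " " + overflow(0b0011, 0b0010));
        // 5 false
    }
}
```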
The C flag is used primarily when the operands are unsigned, and we do not
consider it here.
There are two kinds of conditional branch instructions, both of which are
type CB:
• CBZ and CBNZ (discussed above) are used to branch if a given register
stores 0 (CBZ) or a nonzero value (CBNZ).
• B.cond is used to branch if the given condition is true (and should be
preceded by a subtract instruction which sets the flags, such as SUBS or
SUBIS).
Thus, conditional branches on inequalities (<, ≥, ...) are also possible. These
conditions are shown in Fig 9.11. Again, these conditional branch instructions
are designed with the intent that they be used after a subtract instruction is
used to set/clear the condition code flags.6
These instructions will need to examine the N and V flags, in addition to
the Z flag. To understand how the flags are used we need to recall that in
two’s complement representation, there are more negative numbers than there
are positive numbers (as described in chapter 2). When testing for > we clearly
6 Note that there are now two ways to compare registers for equality: CBZ (after a subtract)
and B.EQ, and two ways to compare registers for inequality: CBNZ (after a subtract) and
B.NE
Figure 9.11: Usage of condition codes for conditional branch on equalities and
inequalities (assuming the conditional branch is preceded by a subtract which
sets/clears the flags)
       (A)                (B)
  0001 = +1          1010 = -6
- 1000 = -8        - 0011 = +3
  ----------         ----------
  1001 = -7          0111 = +7
Figure 9.12: Two examples, A and B, showing why the overflow flag needs to
be used for B.GT conditional branch
Figure 9.13: Flags tested for a conditional branch if strictly greater (B.GT)
instruction
        yz   yz   yz   yz
        00   01   11   10
x=0      1    0    1    0
x=1      0    0    0    0
Figure 9.14: A Karnaugh map derived from Fig 9.13 showing how the condition
code flags are used in the B.GT instruction
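The two examples of Fig 9.12 can be traced in Java. The sketch below assumes the standard ARM rule that B.GT is taken when Z = 0 and N = V, which corresponds to the two cells containing 1 in the map if x stands for Z and yz for N and V (an assumption about the variable names). It works on 4-bit words to match the figure; the class and method names are invented for illustration.

```java
// Sketch: flags set by a 4-bit subtract, and the B.GT test on those flags.
final class BranchGtSketch {
    static final int MASK = 0xF;   // 4-bit word
    static final int SIGN = 0x8;   // sign bit of a 4-bit word

    // Returns true if B.GT would be taken after computing x - y.
    static boolean greaterThan(int x, int y) {
        int diff = (x - y) & MASK;
        boolean n = (diff & SIGN) != 0;                   // N: result is negative
        boolean z = diff == 0;                            // Z: result is zero
        // V: operands differ in sign, and the result's sign differs from x's sign.
        boolean v = ((x ^ y) & (x ^ diff) & SIGN) != 0;
        return !z && (n == v);
    }

    public static void main(String[] args) {
        System.out.println(greaterThan(0b0001, 0b1000));   // true:  +1 > -8
        System.out.println(greaterThan(0b1010, 0b0011));   // false: -6 > +3 does not hold
    }
}
```

In example (A) the result looks negative, and in example (B) it looks positive; in both cases the V flag reverses the conclusion that the N flag alone would suggest.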
As another example, suppose we wish to branch to the label error if the
addition of registers X3 and X5 results in overflow (and we don’t mind clobbering
register X3). We could use the following pair of instructions:
9.3.3 Exercises
1. Show ARM code to perform each of the following calculations (assume all
operations are done on signed quantities):
Figure 9.15: ARM instructions which test an individual condition code flag
(assuming a prior instruction set the flags)
(b) Show ARM code to branch to the label lp only if register X3 is equal
to register X5. Assume register X1 is available for temporary use.
(c) Show ARM code to branch to the label lp only if register X3 is less
than or equal to register X5. Assume register X1 is available for
temporary use.
(d) Show ARM code to branch to the label lp only if register X3 is greater
than register X5. Assume register X1 is available for temporary use.
4. Explain why each of the last three entries in Fig 9.13 can be don’t-cares.
5. Figures 9.11 and 9.14 describe how the condition code flags are used in the
B.GT instruction to branch if the first operand is strictly greater than the
second operand. Show a similar map and the resulting boolean expression
for each of the following instructions (your solution should agree with
Fig 9.11):
for example, a program that runs on a 386 will also run on a Pentium (but
a program that runs on a Pentium is not assured to run on a 386). The
architectures are all two-address architectures, with the capability of including
memory operands for most instructions.
An instruction set is said to be orthogonal if all addressing modes are avail-
able across all instruction (or data) types. The Intel processors are not orthog-
onal, which makes them difficult to program.9
In this section we describe some aspects of the Intel Pentium architecture.
We will describe the CPU registers, condition code flags, instruction set, and
addressing modes.
M68000 series, which were orthogonal. These chips were used in Apple Macintosh computers,
whereas the Intel chips were used in IBM compatible computers. Ultimately the Intel chips
were favored, and Apple dropped the Motorola chips from future desktop computers.
9.4. INTEL PENTIUM 335
of the mov instruction are summarized in Fig 9.17. Note that the immediate
operand can be a full word.
Fig 9.17 does not show how a memory operand is specified. Here we describe
the addressing modes used to form an effective memory address. The Pentium
architecture uses a variation on the base-index-displacement addressing mode,
described earlier in this chapter. The only difference is that it includes a scaling
factor for the index register. The effective memory address is thus
base + sc * idx + disp
where base is the base register, sc is the scale factor (which must be 1, 2, 4,
or 8), idx is the index register, and disp is the displacement. Thus, a mov
instruction which references memory would have two registers and two constants
for the memory address operand. Example memory addresses are shown in Fig 9.18.
The scaling factor is designed to make array processing efficient. For example,
to step through an array of full words, one would use a scaling factor of 4, since
there are 4 bytes in a word. The index register would contain the index of the
array position being accessed; thus, a value of 3 in the index register would
address position 3 of the array. This design also optimizes matrix multiplication,
which is an important operation in many scientific and simulation applications.
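The scaled address calculation can be sketched in Java as follows. The base address 1000H and the displacement values are hypothetical, and the class and method names are invented for illustration; only the formula base + sc * idx + disp and the permitted scale factors come from the text.

```java
// Sketch: Pentium-style effective address, base + sc * idx + disp.
final class ScaledAddressSketch {
    static long effectiveAddress(long base, int scale, long index, long disp) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4, or 8");
        }
        return base + scale * index + disp;
    }

    public static void main(String[] args) {
        long base = 0x1000;   // hypothetical start address of an array of full words
        // Position 3 of the array, with 4 bytes per word and no displacement.
        long address = effectiveAddress(base, 4, 3, 0);
        System.out.println(Long.toHexString(address));   // 100c
    }
}
```

With the scale factor built into the addressing mode, the index register can hold the array index itself rather than a byte offset.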
These addressing modes can be expressed in assembly language in many
ways (not all of which are available with all assemblers). A few examples are
shown in Fig 9.19.
In assembly language memory is accessed through symbolic addresses as
well, which we will describe later.
In assembly language, the lea instruction, load effective address, can be used
to load the memory address of a symbolic memory location into a register. For
example,
The increment (inc) and decrement (dec) operations are supplied merely
for convenience; they add 1 to, or subtract 1 from, a register or memory
location. For example, a counter could be incremented by one with the
instruction inc counter.
Fig 9.20 also shows whether the instructions set the CPU flags (more on this
later).
The compare (cmp) instruction is used in connection with the CPU flags and
conditional branching, as described later in this section. It does not change any
of the CPU registers nor memory locations.
There are also multiply and divide instructions for fixed-point integer data.
These require more explanation, but are summarized in Fig 9.21. As noted in
chapter 3, when multiplying two n-bit values, the result will not exceed 2n bits.
Also, when dividing an m-bit value by an n-bit value, the remainder cannot
exceed n bits. Thus, register pairs are often used with multiply and divide
instructions. In the Pentium architecture, the pair of registers (EAX,EDX) is
used for this purpose.
Note that when dividing, it is the programmer’s responsibility to ensure that
precision is not lost in the quotient.
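The purpose of the register pair can be illustrated in Java, where a 64-bit long plays the role of the pair. This is a sketch under stated assumptions: the operands are treated as unsigned 32-bit values, the high order half corresponds to EDX and the low order half to EAX (the usual x86 arrangement), and the class and method names are invented for illustration.

```java
// Sketch: a 32 x 32 bit unsigned multiply producing a 64-bit result,
// followed by a divide of the 64-bit value by a 32-bit divisor.
final class RegisterPairSketch {
    // Full 64-bit product of two unsigned 32-bit values.
    static long product(int a, int b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // High order 32 bits (the EDX half of the pair).
    static int high(long product) {
        return (int) (product >>> 32);
    }

    // Low order 32 bits (the EAX half of the pair).
    static int low(long product) {
        return (int) product;
    }

    // Unsigned quotient of the 64-bit value divided by a 32-bit divisor.
    static long quotient(long product, int divisor) {
        return Long.divideUnsigned(product, divisor & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long p = product(100000, 100000);        // 10^10 does not fit in 32 bits
        System.out.println(high(p));             // 2
        System.out.println(low(p));              // 1410065408
        System.out.println(quotient(p, 1000));   // 10000000
    }
}
```

The intermediate product exceeds 32 bits even though the final quotient fits easily, which is why the multiply leaves its result in two registers and the divide takes its dividend from two registers.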
For example, to calculate a*b/c, where a,b,c are all positive integers, we
could use the following sequence of instructions.
imul and idiv instructions. They are similar to the unsigned instructions, with
the inclusion of an immediate operand for multiply. If two (or three) operands
are provided with an imul instruction, it is assumed that the result will fit in a
single (32-bit) register. These instructions are also summarized in Fig 9.21.
For example, to calculate result = a/b*17 for signed quantities, we could
use the following instructions:
mov EDX, 0 ; EDX <- 0
mov EAX, a ; EAX <- a
idiv EAX, b ; EAX <- a/b
imul EAX, EAX, 17 ; EAX <- a/b*17
mov result, EAX ; result <- a/b*17
The multiply instructions which do not use the (EDX,EAX) register pair
presume that the results will fit in a single 32-bit register; it is the programmer’s
responsibility to ensure that this is the case.
[Figure: diagrams of Shift Right, Shift Left, Rotate Right, and Rotate Left on a 32-bit register, bits 31..0]
Figure 9.25: Operands for the Pentium shift and rotate instructions
lp:
dec ECX ; ECX--
jz done
mov tmp,EDX ; save EDX
and EDX,1 ; test low order bit
jz noIncr ; if zero, no increment
add EAX,1 ; increment counter
noIncr:
mov EDX,tmp ; load saved word
shr EDX,1 ; shift right logical
jmp lp
done:
; finished, result in EAX reg.
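Judging by its comments, the loop above appears to count the 1 bits of the word in EDX, leaving the count in EAX. Assuming that reading, and assuming that EAX starts at 0 and that all 32 bits are examined, the same computation can be sketched in Java; the class and method names are invented for illustration.

```java
// Sketch: count the 1 bits in a 32-bit word by testing the low order bit
// and shifting right, one bit per iteration.
final class BitCountSketch {
    static int countOnes(int word) {
        int count = 0;                      // plays the role of EAX
        for (int i = 0; i < 32; i++) {      // the loop counter plays the role of ECX
            if ((word & 1) != 0) {          // and EDX,1 / jz noIncr
                count++;                    // add EAX,1
            }
            word >>>= 1;                    // shr EDX,1 (logical shift)
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOnes(0b1011));   // 3
        System.out.println(countOnes(-1));       // 32
    }
}
```

The result can be compared with the library method Integer.bitCount, which performs the same count.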
9.6 Exercises
1. For each of the following instructions, describe the effect it would have on
a register or memory location. Also, indicate which instructions are not
valid. In each case assume the EAX register contains 17, the EDX register
contains 19, and the memory location loc1 contains -2, and the memory
location loc2 contains -3.
3. Given the array of the preceding problem, show the instruction which will
increment the value at position 23 in that array. Assume the EBP register
contains 10014cd0H and the ESI register contains 00000017H = 23.
4. Show the pair of instructions which will compute (x-y)+1, where the value
of x is in the EAX register, and the value of y is in the EDX register. Leave
the result in the EAX register.
5. Show the assembly language code which will compute (x-y)*(x/y), where
the value of x is in the EAX register, and the value of y is in the ECX
register. Leave the result in the EAX register. Use a memory location,
tmp, for temporary storage.
6. Show the pair of instructions which will do each of the following (do not
use shift instructions, but assume the EDX register is available for use):
(a) Shift the contents of the EAX register to the left by 1 bit, preserving
the sign of the number.
(b) Shift the contents of the EDX register to the right by 3 bits. Do not
preserve the sign; assume it is an unsigned value.
(c) Multiply the EBX register by 32, leaving the result in the EBX reg-
ister, assuming it is a signed value. Use a shift instruction.
(d) Divide the EDX register by 8, leaving the result in the EDX register,
assuming it is unsigned. Use a shift instruction.
(e) Multiply the EBX register by 42, leaving the result in the EBX reg-
ister, assuming it is a signed value. Use three shift instructions, and
two add instructions. Assume the EDX register is available for use.
(In this case use as many as 6 instructions)
(f) Rotate the EAX register to the left by 3 bits, leaving the result in
the EAX register.
8. Show the optimal (in some sense, best) code needed to set the EDX register
to 1 if it is positive, to -1 if it is negative, and leave it at 0 if it is 0. This
is the so-called signum function:
(a) Without using any shift or rotate instructions.
(b) Shift and rotate instructions are permitted.
9. Show how to derive the Flag settings in the first four rows of Fig 9.27.
Assume the jump instruction is preceded by a subtract (or compare) in-
struction.
Hint: Use a 3-variable truth table, and a Karnaugh map, with the vari-
ables SF, OF, and ZF.
Glossary
Exclusive OR gate - A logic gate which puts out the logical Exclusive
OR of its inputs
Exponent - That portion of a floating point number which is used to scale
the number by a power of the base (usually either 2 or 16)
Field - A contiguous portion of an instruction or word
Field Programmable Gate Array - A digital component which can be
programmed and reprogrammed to perform any digital function (FPGA)
FIFO - First-in First-out
First-in First-out - An algorithm in which the item to be removed from
a data structure is the first one to be added (FIFO). Used in cache and virtual
memories
Fixed disk - Secondary storage device (non-volatile) which is not remov-
able
Flash memory - Solid state memory (non-volatile)
Flip-flop - A one-bit storage element
Floating point - A numeric data representation allowing for non-integer
values, very large values, and values which are very close to 0
Floating point instruction - An instruction which performs an arithmetic
operation on floating point data
Floating point register - A register which stores data in floating point
representation
FPGA - See Field Programmable Gate Array
Fraction - That portion of a floating point number representing the man-
tissa of the number, separately from the exponent
Full adder - A logic component with three inputs that puts out the logical
sum and carry
Function - A group of program statements which may be invoked from
elsewhere in a program
Function table - A description of the operating characteristics of a com-
ponent, such as an ALU or a flip-flop
G - An abbreviation for 2^30
Half adder - A logic component with two inputs that puts out the logical
sum and carry
Hertz - A unit of frequency; one cycle per second
Hexadecimal - Base 16 number system
I format - An instruction format for immediate instructions
Tag - The portion of a memory address which stores the block number
Temporal Locality - The degree to which memory units which have been
referenced are referenced again soon thereafter
Thrashing - Excessively frequent cache misses in a cache memory, or ex-
cessively frequent page faults in a virtual memory
Three address architecture - An architecture in which the three memory
addresses are the result, the left operand, and the right operand
Transfer of control - The process in which a program skips to another
instruction non-sequentially
Two address architecture - An architecture in which the result of an
operation is the same register as the left operand
Twos complement representation - A representation for signed binary
integers
Unconditional transfer of control - The process in which a program
skips to another instruction (not dependent on the state of the machine)
Virtual memory - An extension to the RAM using secondary storage
Volatile - Requiring continuous power to retain information
Von Neumann architecture - A classical computer design consisting of
ALU, control unit, memory storing instructions and data, and external storage.
Word - A unit of memory, 32 bits in the MIPS architecture
Write - A signal to a storage component specifying that data on its input
bus is to be stored
Zero address architecture - A stack machine
Appendix: MARS
The MARS (MIPS Assembler and Runtime System) can be used to assemble and
execute MIPS programs; it is written in Java and should run on any computer
which supports Java.
This appendix contains some basic information needed to download and use
the MARS software.
to a folder on some disk such as the main fixed disk for your computer. Each
time you need to make changes to this source file, you can save it with the same
name.
If you terminate MARS and wish to continue later, load this source file from
the place where you had saved it.
• Click the icon which looks like a screwdriver with a wrench.
If you have an incorrect statement in your source file, MARS will show error
message(s) in the bottom window pane (Messages). Correct the errors and try
to assemble again.
Once you have eliminated all syntax errors, when you assemble your source
file, MARS will automatically show the Execute window, which normally
consists of four window panes:
• The Text Segment pane shows your source code (or most of it) on the
right. On the left it shows the machine language code (with memory
addresses) which the assembler produced. The Basic column is an
intermediate form which the assembler uses to translate your source file to
machine language.
• The Registers window pane shows the value of each of 32 general
registers, most of which are initially 0. (This pane should not be showing
coprocessors at this point.)
• The Data Segment pane shows the values currently stored in the Data
Memory, all of which should be 0 at this point. The Data Memory
addresses begin at address 0x10010000.
• The Run I/O pane corresponds to the former Messages pane. It shows
error messages, along with any text output produced by your program
when it executes.
.4 Execute Programs
It is now possible to execute your program and view the effects it may have on
registers, memory, and output. To execute your program do one of the following:
This will execute your program at full speed, and terminate when it encounters
a terminating system call (or invalid code at the end). If you used the sample
program (two statements) shown above, you can see that register $t0 has been
loaded with 23 (17₁₆) and that register $t1 has been loaded with 46 (2e₁₆).
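The decimal and hexadecimal values quoted above can be checked with a short Python sketch. It is only an illustration and not part of MARS: the helper name to_hex32 is hypothetical, and the eight-digit 0x format is an assumption about how a 32-bit register value is typically displayed.

```python
def to_hex32(value: int) -> str:
    """Format an integer as an 8-digit hex string, using 32-bit two's complement."""
    return f"0x{value & 0xFFFFFFFF:08x}"

# Register contents reported in the text for the sample program.
t0 = 23
t1 = 46

print(to_hex32(t0))   # 0x00000017
print(to_hex32(t1))   # 0x0000002e
print(to_hex32(-1))   # 0xffffffff (a negative value in two's complement)
```

Masking with 0xFFFFFFFF keeps the sketch faithful to a 32-bit register, so negative values appear in their two's complement form.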
To execute your program in such a way that you can see intermediate results
as the program executes, you have a few options:
• Slow down the run speed, using the slider at the top of the window, so that
you can view changes to registers and memory as the program executes.
• Choose Step, press function key F7, or click the green circle with a 1
subscript. This will allow you to execute one statement at a time.
• Set breakpoints in your program by selecting one or more Bkpt check
boxes on the left of the Text Segment pane. When running at full speed,
execution will pause at each breakpoint.
By judiciously choosing among these options, you can diagnose difficult
problems. To start over and run again, press function key F12 to Reset, or click
the green circle containing a double white triangle.
If there are semantic (i.e. logical) errors in your program, you will need to
click on the Edit tab of the Text Segment pane (not the Edit menu). This will
take you back to your source file. Make the necessary changes, assemble, and
execute the program again to verify that it is correct.
In general you will be using an Edit/Assemble/Test cycle as you develop
software.
Appendix: MIPS Instruction Set
This appendix shows selected MIPS instructions. For each instruction we show:
.5 Core Instructions
Name Mnemonic Format Assembly Language
Machine Language
Semantics
Add add R add $rd,$rs,$rt
opcode rs rt immediate
$rt ← $rs + imm
opcode rs rt immediate
$rt ← $rs ∧ imm
Branch Equal beq I beq $rs,$rt,addr
opcode rs rt immediate
→ (relative)addr if $rs = $rt
Branch Not Equal bne I bne $rs,$rt,addr
opcode rs rt immediate
→ (relative)addr if $rs ≠ $rt
Jump j J j address
02 (absolute) address
31 26 25 0
opcode address
→ address
Jump and link jal J jal address
03 (absolute) address
31 26 25 0
opcode address
$ra ← address of next instruction
→ address
Jump Reg jr R jr $rs
00 $rs 08
31 26 25 21 20 16 15 11 10 6 5 0
opcode rs rt immediate
$rt₈..₃₁ ← 0, $rt₀..₇ ← Mem[$rs + imm]₀..₇
Load Upper lui I lui $rt, imm
0f $rt imm
31 26 25 21 20 16 15 0
opcode rs rt immediate
$rt₀..₁₅ ← 0, $rt₁₆..₃₁ ← imm
Load Word lw I lw $rt,displ($rs)
opcode rs rt immediate
Nor nor R nor $rd,$rs,$rt
Or or R or $rd,$rs,$rt
opcode rs rt immediate
$rt ← $rs ∨ imm
Set If Less Than slt R slt $rd,$rs,$rt
Shift Left Variable sllv R sllv $rd,$rs,$rt
Shift Right Logical Variable srlv R srlv $rd,$rs,$rt
00 $rs $rt $rd 06
31 26 25 21 20 16 15 11 10 6 5 0
Store Byte sb I sb $rt,displ($rs)
opcode rs rt immediate
Mem[$rs + imm] ← $rt₀..₇
Store Word sw I sw $rt,displ($rs)
opcode rs rt immediate
Mem[$rs + imm] ← $rt
Excl Or Immediate xori I xori $rt,$rs,imm
opcode rs rt immediate
$rt ← $rs ⊕ imm
00 $rs $rt 18
31 26 25 21 20 16 15 11 10 6 5 0
00 $rs $rt 1a
31 26 25 21 20 16 15 11 10 6 5 0
00 $rd 10
31 26 25 21 20 16 15 11 10 6 5 0
00 $rd 12
31 26 25 21 20 16 15 11 10 6 5 0
Add Float add.s FR add.s $fd,$fs,$ft
opcode rs rt immediate
Mem[$rs + displ] ← $ft
Load Float lwc1 I lwc1 $ft,displ($rs)
opcode rs rt immediate
$ft ← Mem[$rs + displ]
Branch If FP True bc1t FI bc1t (rel)address
11 08 01 address
31 26 25 21 20 16 15 0
11 08 00 address
31 26 25 21 20 16 15 0
11 10 $ft $fs 32
31 26 25 21 20 16 15 11 10 6 5 0
11 10 $ft $fs 3e
31 26 25 21 20 16 15 11 10 6 5 0
11 10 $ft $fs 3c
31 26 25 21 20 16 15 11 10 6 5 0
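The bit positions listed in this appendix (31-26 for the opcode, 25-21 and 20-16 for the two register fields, 15-11, 10-6 and 5-0 for the remaining R format fields, 15-0 for an immediate, and 25-0 for a jump address) can be sketched as shift-and-or operations. The Python model below is only an illustration under stated assumptions: the function names are hypothetical, the register numbers ($t0 = 8, $ra = 31) follow the usual MIPS numbering, and the J format field is assumed to hold the byte address divided by 4.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int, opcode: int = 0) -> int:
    """Pack an R format instruction: opcode | rs | rt | rd | shamt | funct."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I format instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(opcode: int, byte_address: int) -> int:
    """Pack a J format instruction: opcode | 26-bit word address (assumed)."""
    return (opcode << 26) | ((byte_address >> 2) & 0x03FFFFFF)

T0, RA = 8, 31  # assumed register numbers for $t0 and $ra

# jr $ra: opcode 00, function code 08, as listed for Jump Reg
print(f"{encode_r(rs=RA, rt=0, rd=0, shamt=0, funct=0x08):08x}")

# lui $t0, 0x1234: opcode 0f, as listed for Load Upper
print(f"{encode_i(opcode=0x0F, rs=0, rt=T0, imm=0x1234):08x}")

# j 0x00400000: opcode 02, as listed for Jump
print(f"{encode_j(opcode=0x02, byte_address=0x00400000):08x}")
```

Keeping one function per format mirrors the three layouts in the tables; each field is simply shifted to its leftmost bit position and combined with a bitwise or.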
Appendix: Pseudo Operations Supported by MARS
Name                                Mnemonic  Assembly Language       Semantics
Absolute Value                      abs       abs $rd, $rs            $rd ← |$rs|
Branch Unconditional                b         b label                 → label
Branch If Equal Zero                beqz      beqz $rs, label         → label if $rs = 0
Branch If Greater Than              bgt       bgt $rs, $rt, address   → address(relative) if $rs > $rt
Branch If Greater Or Equal          bge       bge $rs, $rt, address   → address(relative) if $rs >= $rt
Branch If Less Than                 blt       blt $rs, $rt, address   → address(relative) if $rs < $rt
Branch If Less Or Equal             ble       ble $rs, $rt, address   → address(relative) if $rs <= $rt
Load Immediate                      li        li $rd, imm             $rd ← imm
Divide                              div       div $rd, $rs, $rt       $rd ← $rs/$rt
Divide Immediate                    div       div $rd, $rs, imm       $rd ← $rs/imm
Load Address                        la        la $rd, label           $rd ← label's address
Move                                move      move $rd, $rs           $rd ← $rs
Multiply Short                      mulo      mulo $rd, $rs, $rt      $rd ← $rs · $rt
Negate                              neg       neg $rd, $rs            $rd ← −$rs
Not                                 not       not $rd, $rs            $rd ← ∼$rs
Remainder                           rem       rem $rd, $rs, $rt       $rd ← $rs mod $rt
Remainder Immediate                 rem       rem $rd, $rs, imm       $rd ← $rs mod imm
Rotate Left                         rol       rol $rd, $rs, $rt       $rd ← $rs ↪ $rt
Rotate Left Immediate               rol       rol $rd, $rs, imm       $rd ← $rs ↪ imm
Rotate Right                        ror       ror $rd, $rs, $rt       $rd ← $rs ↩ $rt
Rotate Right Immediate              ror       ror $rd, $rs, imm       $rd ← $rs ↩ imm
Set If Equal                        seq       seq $rd, $rs, $rt       $rd ← $rs == $rt ? 1 : 0
Set If Equal Immediate              seq       seq $rd, $rs, imm       $rd ← $rs == imm ? 1 : 0
Set If Greater or Equal             sge       sge $rd, $rs, $rt       $rd ← $rs ≥ $rt ? 1 : 0
Set If Greater or Equal Immediate   sge       sge $rd, $rs, imm       $rd ← $rs ≥ imm ? 1 : 0
Set If Greater                      sgt       sgt $rd, $rs, $rt       $rd ← $rs > $rt ? 1 : 0
Set If Greater Immediate            sgt       sgt $rd, $rs, imm       $rd ← $rs > imm ? 1 : 0
Set If Less or Equal                sle       sle $rd, $rs, $rt       $rd ← $rs ≤ $rt ? 1 : 0
Set If Less or Equal Immediate      sle       sle $rd, $rs, imm       $rd ← $rs ≤ imm ? 1 : 0
Set If Less                         slt       slt $rd, $rs, $rt       $rd ← $rs < $rt ? 1 : 0
Set If Less Immediate               slt       slt $rd, $rs, imm       $rd ← $rs < imm ? 1 : 0
Set If Not Equal                    sne       sne $rd, $rs, $rt       $rd ← $rs ≠ $rt ? 1 : 0
Set If Not Equal Immediate          sne       sne $rd, $rs, imm       $rd ← $rs ≠ imm ? 1 : 0
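The rotate and set-if semantics in the table can be modeled in a few lines of Python. This is a sketch rather than MARS's own implementation: the function names are hypothetical, registers are assumed to be 32 bits wide, and the set-if comparisons are assumed to treat both operands as signed twos complement values.

```python
MASK32 = 0xFFFFFFFF

def rol32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left; bits shifted out on the left re-enter on the right."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right; bits shifted out on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed twos complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def sge(rs: int, rt: int) -> int:
    """Set If Greater or Equal: 1 when rs >= rt (signed), otherwise 0."""
    return 1 if to_signed(rs) >= to_signed(rt) else 0

def sne(rs: int, rt: int) -> int:
    """Set If Not Equal: 1 when the two 32-bit patterns differ, otherwise 0."""
    return 1 if (rs & MASK32) != (rt & MASK32) else 0

print(f"{rol32(0x80000001, 1):08x}")  # 00000003
print(f"{ror32(0x00000003, 1):08x}")  # 80000001
print(sge(0xFFFFFFFF, 0))             # 0, because 0xFFFFFFFF is -1 when signed
```

Unlike a shift, a rotate loses no bits, so rotating left and then right by the same amount returns the original value.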
Appendix: ASCII Character Set
Dec  Hex  Chr     Dec  Hex  Chr     Dec  Hex  Chr     Dec  Hex  Chr
0    0    null    53   35   5       78   4e   N       103  67   g
8    8    BS      54   36   6       79   4f   O       104  68   h
9    9    HT      55   37   7       80   50   P       105  69   i
10   a    LF      56   38   8       81   51   Q       106  6a   j
13   d    CR      57   39   9       82   52   R       107  6b   k
32   20   space   58   3a   :       83   53   S       108  6c   l
33   21   !       59   3b   ;       84   54   T       109  6d   m
34   22   "       60   3c   <       85   55   U       110  6e   n
35   23   #       61   3d   =       86   56   V       111  6f   o
36   24   $       62   3e   >       87   57   W       112  70   p
37   25   %       63   3f   ?       88   58   X       113  71   q
38   26   &       64   40   @       89   59   Y       114  72   r
39   27   '       65   41   A       90   5a   Z       115  73   s
40   28   (       66   42   B       91   5b   [       116  74   t
41   29   )       67   43   C       92   5c   \       117  75   u
42   2a   *       68   44   D       93   5d   ]       118  76   v
43   2b   +       69   45   E       94   5e   ^       119  77   w
44   2c   ,       70   46   F       95   5f   _       120  78   x
45   2d   -       71   47   G       96   60   `       121  79   y
46   2e   .       72   48   H       97   61   a       122  7a   z
47   2f   /       73   49   I       98   62   b       123  7b   {
48   30   0       74   4a   J       99   63   c       124  7c   |
49   31   1       75   4b   K       100  64   d       125  7d   }
50   32   2       76   4c   L       101  65   e       126  7e   ~
51   33   3       77   4d   M       102  66   f       127  7f   DEL
52   34   4
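Rows of this table can be reproduced with Python's built-in ord and chr functions. The sketch below is only an illustration; the helper names ascii_row and digit_value are hypothetical.

```python
def ascii_row(ch: str) -> tuple:
    """Return (decimal code, hex code, character), matching one table entry."""
    code = ord(ch)
    return (code, format(code, "x"), ch)

def digit_value(ch: str) -> int:
    """Convert a digit character to its numeric value by subtracting the code of '0'."""
    return ord(ch) - ord("0")

for ch in "A", "a", "0", "~":
    print(ascii_row(ch))

# Upper and lower case letters differ by 32 (hex 20) in their codes.
print(ord("a") - ord("A"))

# The digit characters are contiguous, so '7' is 7 codes past '0'.
print(digit_value("7"))
```

The same subtraction trick is what an assembly language program would use to convert a digit character read as input into a number.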
Bibliography
Index
beq, 60
bge, 60
bgt, 60
ble, 60
blt, 60
bne, 60
branch, 60
byte, 97
c.eq.s, 123
c.le.s, 123
c.lt.s, 123
div, 116
divu, 116
fields, 143
floating point, 119, 120
formats, 139
I format, 139
immediate, 42
J format, 140
jump, 62
la, 146
li, 145
load, 51
load address, 55
load upper immediate, 46
logical, 31
logical immediate, 45
lwc1, 123
memory reference, 49
mfhi, 105
mflo, 105
move, 145
mult, 105
multu, 105
not, 33, 146
or, 33
ori, 45
R format, 139, 148
shift, 38
  arithmetic, 39
  logical, 38
store, 51
subtract, 28
swc1, 123
syscall, 134
xor, 33
xori, 45
instruction formats
  floating point, 173
instruction locality, 305
instruction memory
  datapath, 260
instruction prefetch, 289
instruction register
  see IR, 261
Instruction Set Architecture, 310
Intel Pentium, 328
inverter
  logic gate, 218
IO devices, 4
IR
  datapath, 261
ISA, Instruction Set Architecture, 3
isNumeric
  assembler function, 185
iteration structures, 65

J format instructions, 140
  machine language, 166
j instruction, 62
jal instruction, 76
JK flip-flop, 242
  block diagram, 243
  function table, 243
joining
  of buses, 221
jr instruction, 76
jump instruction, 62
  machine language, 166
jump register instruction
  machine language, 153

K = 2¹⁰, 20
Karnaugh map, 211
  four variables, 212

la instruction, 56, 146
label
  assembly language, 26
latch, 240
latency, memory, 289
lb instruction, 97
T = 2⁴⁰, 21
tag
cache memory, 293
temporal locality, 306
termination
of program, 134
termination of a program, 79
thrashing
in cache memory, 299
in virtual memory, 303
three-address architecture, 313
transfer of control, 60
conditional, 60
unconditional, 60
transition, state machine, 245
truth table, 208
two-address architecture, 312
two-cycle
datapath, 265
two-way selection structure, 62, 63
twos complement, 16
wires
in logic diagrams, 220
word, 3
xor instruction, 33
machine language, 150
xori instruction, 45
machine language, 155