4-1. Book Rafiquzzaman
CHAPTER 2
Computer Instruction Set
Computer architecture is defined as the study of the components and their interconnections that form a computer system. The instruction types and data that a computer supports are primary architectural considerations. In this chapter, some important characteristics and properties of computer instruction sets are discussed. Topics include op-code encoding, addressing modes, and instruction types.
2.1 INTRODUCTION
An instruction manipulates the stored data, and a sequence of instructions constitutes a program. In general, an instruction has two components:
+ Op-code field
+ Address field(s)
The op-code field specifies how data is to be manipulated. The data items may reside within a CPU register or in the main memory. The purpose of the address field is to indicate the data address. When operations require data to be read from or stored into two or more addresses, the address field may contain more than one address. For example, consider the following instruction:
ADD            R1, R0
op-code field  address field
Assume that this computer uses R1 as the source register and R0 as the destination register. The preceding instruction then adds the contents of CPU registers R0 and R1 and saves the sum in register R0. The number and types of instructions supported by a computer vary from one computer to another and depend primarily on the architecture of a particular machine.
Depending on the number of addresses specified, one can have the following instruction formats:
+ Three-address
+ Two-address
+ One-address
+ Zero-address
Since instructions are stored in the main memory, instruction formats are designed so that instruction sizes are optimized while still providing powerful processing capabilities. The CPU architecture has considerable influence on a specific instruction format. For example, zero-address instructions are predominant in stack machines.
The following are some important technical points that have to be considered when designing an instruction format:
+ The size of the op-code field of an instruction word is chosen by the designer so that the required number of operations can be specified. An n-bit op-code field allows 2^n distinct operations; for example, with 4- and 8-bit op-code fields, 16 and 256 distinct operations, respectively, can be specified.
+ Instructions are used to manipulate various elements, such as integers, floating-point numbers, and character strings. In particular, all programs written in a symbolic language such as FORTRAN or Pascal are internally stored as characters. Therefore, memory space will not be wasted if the word length of the machine is some integral multiple of the number of bits needed to represent a character. Since characters are typically represented using 8-bit character codes such as ASCII or EBCDIC, it is desirable to have an 8-, 16-, 32-, or 64-bit word length.
+ The size of the address field is chosen to guarantee high resolution. In any computer, the ultimate resolution is a bit. Memory resolution is a function of the instruction length; in particular, short instructions provide less resolution. For example, in a computer with 32K 16-bit memory words, at least 19 bits (15 bits to select one of the 2^15 words plus 4 bits to select one of the 16 bits in that word) are required to access each bit of the word. However, the resolution achieved by some processors lies at the extremes. For example, the Burroughs B1700 processor achieves a 1-bit resolution (each bit is addressable). CDC's CYBER 70 series computer, however, has a minimum addressable unit of one 60-bit memory word.
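The 19-bit figure comes from adding the word-select bits to the bit-select bits. The Python sketch below reproduces that arithmetic; the function names `ceil_log2` and `address_bits` are hypothetical, and the calculation assumes memory sizes that are powers of two.

```python
def ceil_log2(n: int) -> int:
    """Smallest k with 2**k >= n, using exact integer arithmetic (n >= 1)."""
    return (n - 1).bit_length()


def address_bits(num_words: int, bits_per_word: int, bit_addressable: bool) -> int:
    """Minimum address-field width needed to reach every addressable unit.

    Word-addressable memory: only the word has to be selected.
    Bit-addressable memory: the word-select bits plus the bits that
    pick one bit inside the selected word.
    """
    word_select = ceil_log2(num_words)
    if not bit_addressable:
        return word_select
    return word_select + ceil_log2(bits_per_word)


# 32K memory words of 16 bits each, as in the example above
print(address_bits(32 * 1024, 16, bit_addressable=False))  # 15
print(address_bits(32 * 1024, 16, bit_addressable=True))   # 19
```

Moving from word resolution to bit resolution costs log2(word length) extra address bits, which is why bit-addressable machines such as the B1700 need longer address fields.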
Instruction   Meaning
LDA addr      Acc ← (addr)
STA addr      M(addr) ← (Acc)
ADD addr      Acc ← (Acc) + (addr)
AND addr      Acc ← (Acc) ∧ (addr)
CMA           Acc ← (Acc)'
INCA          Acc ← (Acc) + 1
JMP addr      Unconditionally branch to addr
HLT           Halt CPU
Figure 2.1 A Hypothetical Instruction Set
2.2 OP-CODE ENCODING
A processor can execute an instruction only if it is represented as a binary sequence. Assigning a unique binary pattern to each op-code is known as op-code encoding, and some encoding techniques are discussed in this section.
The simplest way to carry out op-code encoding is to assign a fixed-length binary pattern to each op-code. For example, an n-bit binary pattern can represent 2^n distinct op-codes. This method is known as the block-code technique. To illustrate this concept, consider the hypothetical instruction set shown in Figure 2.1. In this figure, there are 8 different instructions that can be encoded using a 3-bit binary pattern p2p1p0 (Figure 2.2).
Op-Code   Binary Pattern
          p2p1p0
LDA       000
STA       001
ADD       010
AND       011
CMA       100
INCA      101
JMP       110
HLT       111
Figure 2.2 Op-code Encoding Using a 3-bit Block Code
The op-codes of the hypothetical instruction set can be decoded using a 3-to-8 decoder such as the 74LS138 shown in Figure 2.3. An n-to-2^n decoder is required for an n-bit op-code. As the value of n increases, the cost of the decoder and the decoding time also increase.
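A minimal Python sketch of the block-code idea follows. It assigns consecutive fixed-length patterns in the order of Figure 2.2 and models an n-to-2^n decoder as a list in which exactly one output is active. The function names are illustrative, and the outputs are shown active-high for readability (the 74LS138 itself drives its selected output low).

```python
# Mnemonics of Figure 2.1 in the order used by Figure 2.2
MNEMONICS = ["LDA", "STA", "ADD", "AND", "CMA", "INCA", "JMP", "HLT"]


def block_encode(mnemonics):
    """Give every mnemonic a fixed-length binary pattern (block code)."""
    width = max(1, (len(mnemonics) - 1).bit_length())  # n bits cover 2**n op-codes
    return {m: format(i, f"0{width}b") for i, m in enumerate(mnemonics)}


def decode_n_to_2n(opcode_bits: str):
    """Model an n-to-2**n decoder: exactly one output line is active (1)."""
    outputs = [0] * (2 ** len(opcode_bits))
    outputs[int(opcode_bits, 2)] = 1
    return outputs


codes = block_encode(MNEMONICS)
print(codes["ADD"])                  # 010
print(decode_n_to_2n(codes["ADD"]))  # [0, 0, 1, 0, 0, 0, 0, 0]
```

The output list doubles in length with every extra op-code bit, which mirrors the growth in decoder cost noted above.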
Some op-code encoding techniques are now considered in which the length of the op-code is a function of parameters such as the number of addresses or the relative frequency of its usage. The following approaches are discussed:
+ Expanding op-code technique
+ Huffman encoding
The rationale behind the expanding op-code technique is to find a compromise between the number of op-codes and memory resolution within a fixed instruction length. Consider an instruction in which the lengths of the op-code and address fields are 4 and 12 bits, respectively. Using such a format, 16 operations can be specified, and 4096 memory locations can be accessed. If the size of the address field is increased to 13 bits and the instruction length is kept at 16 bits, the op-code length becomes 3 bits. This change reduces the number of possible operations by 50% (from 16 to 8), but at the same time it increases memory resolution by 100% (from 4096 to 8192 locations). Conversely, the original instruction format can be changed so that the sizes of the op-code and address fields are 5 and 11 bits, respectively.
With this new format, 16 more operations can be specified, which is a 100% increase. However, this gain results in a 50% reduction in memory resolution, because the 11-bit address field allows access to only 2048 different memory locations. This is the concept behind the expanding op-code technique. The following example explains the usefulness of this approach.
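The trade-off can be tabulated for a fixed 16-bit instruction. The helper `opcode_tradeoffs` below is a hypothetical name used only for this sketch:

```python
def opcode_tradeoffs(instruction_bits, opcode_widths):
    """For each op-code width, return (op-code bits, operations,
    address bits, addressable locations) with the instruction length fixed."""
    rows = []
    for k in opcode_widths:
        address = instruction_bits - k
        rows.append((k, 2 ** k, address, 2 ** address))
    return rows


for k, ops, a, locs in opcode_tradeoffs(16, (3, 4, 5)):
    print(f"{k}-bit op-code: {ops:2d} operations | {a}-bit address: {locs} locations")
```

Each bit moved from the address field to the op-code field doubles the number of operations and halves the number of addressable locations, which is the 100%/50% exchange described above.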
Consider an instruction format with an instruction length and address field size of 8 and 3 bits, respectively. Only 4 distinct two-address instructions can be formed, because the op-code field has only 2 bits. This result is illustrated in Figure 2.4.
If there are 3 two-address instructions rather than 4, then 8 one-address instructions can also be specified. This happens because each one-address instruction requires only one address field, and the other 3-bit address field can be used for specifying the 8 op-codes. This idea is illustrated in Figure 2.5.
Figure 2.3 Instruction Decoder (a 3-to-8 op-code decoder, 74LS138)
<--------------- 8 bits --------------->
<2 bits>   <3 bits>   <3 bits>
op-code    addr 1     addr 2
p1p0
00         a2a1a0     b2b1b0
01         a2a1a0     b2b1b0
10         a2a1a0     b2b1b0
11         a2a1a0     b2b1b0
Figure 2.4 Four Two-Address Instructions Derived Using a 2-bit Op-code Field
The length of the op-code for each one-address instruction is 5 bits. This means that the length of the op-code field increases as the number of address fields decreases. For this reason, this technique is referred to as the expanding op-code technique.
In a typical instruction set, it is necessary to include some zero-address instructions. Suppose the number of one-address instructions is reduced from 8 to 7. Then up to 8 zero-address instructions can be accommodated in the same instruction format. This is illustrated in Figure 2.6.
For zero-address instructions, 8 bits are used for op-code specification. The expanding op-code technique is employed to encode the instruction set of the PDP-11 computer.
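The decoding rule implied by Figure 2.6 can be sketched in Python: the all-ones pattern of a shorter op-code field acts as an escape that tells the decoder to treat the next field as more op-code bits. The function name and its return format are assumptions made for this illustration.

```python
def decode_expanding(instr: int):
    """Decode an 8-bit instruction laid out as in Figure 2.6.

    Returns (instruction class, op-code bits, address fields).
    """
    if not 0 <= instr <= 0xFF:
        raise ValueError("instruction must fit in 8 bits")
    top = instr >> 6             # 2-bit op-code field
    mid = (instr >> 3) & 0b111   # addr 1 field (or op-code extension)
    low = instr & 0b111          # addr 2 field (or op-code extension)
    if top != 0b11:              # 00, 01, 10: two-address instructions
        return ("two-address", format(top, "02b"), (mid, low))
    if mid != 0b111:             # 11 000 to 11 110: one-address instructions
        return ("one-address", format(instr >> 3, "05b"), (low,))
    # 11 111 xxx: zero-address instructions use all 8 bits as op-code
    return ("zero-address", format(instr, "08b"), ())


print(decode_expanding(0b01_010_011))  # ('two-address', '01', (2, 3))
print(decode_expanding(0b11_010_101))  # ('one-address', '11010', (5,))
print(decode_expanding(0b11_111_001))  # ('zero-address', '11111001', ())
```

The decoder must inspect the leading fields before it knows how long the op-code is, which is the price paid for packing three instruction classes into one 8-bit format.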
<--------------- 8 bits --------------->
<2 bits>   <3 bits>   <3 bits>
op-code    addr 1     addr 2
00         a2a1a0     b2b1b0   }
01         a2a1a0     b2b1b0   } three 2-address instructions (2-bit op-code)
10         a2a1a0     b2b1b0   }
11         000        b2b1b0   }
11         001        b2b1b0   }
11         010        b2b1b0   }
11         011        b2b1b0   } eight 1-address instructions (5-bit op-code)
11         100        b2b1b0   }
11         101        b2b1b0   }
11         110        b2b1b0   }
11         111        b2b1b0   }
Figure 2.5 Three 2-address and Eight 1-address Instructions Using an 8-bit Instruction Format

<--------------- 8 bits --------------->
<2 bits>   <3 bits>   <3 bits>
op-code    addr 1     addr 2
00         a2a1a0     b2b1b0   }
01         a2a1a0     b2b1b0   } 3 two-address instructions (2-bit op-code)
10         a2a1a0     b2b1b0   }
11         000        b2b1b0   }
11         001-101    b2b1b0   } 7 one-address instructions (5-bit op-code)
11         110        b2b1b0   }
11         111        000      }
11         111        001-110  } 8 zero-address instructions (8-bit op-code)
11         111        111      }
Figure 2.6 Illustration of 3 two-address, 7 one-address, and 8 zero-address instructions in an 8-bit instruction format

Huffman's encoding scheme is discussed next. The block-code technique assumes that all instructions are used with equal probability. In practice, not all instructions are used with the same relative frequency. On average, 40% of the instructions used in a program are load and store instructions. This pattern is similar to the occurrence of vowels in English text. Therefore, this observation warrants an encoding scheme that encodes the op-codes of the most frequently used instructions with fewer bits and those of the least frequently used instructions with more bits. This minimizes the average number of bits required to encode a typical program.
Huffman's procedure carries out this idea, as explained by the following example. Suppose it is desired to encode the hypothetical instruction set shown in Figure 2.7. The relative frequency count of each instruction is shown in this figure. These values are obtained by inspecting the occurrence of each instruction in a set of representative programs.
Mnemonic   Relative Frequency Count
LOAD       1/4
STO        1/4
ADD        1/8
AND        1/8
NOT        1/16
RSHIFT     1/16
JUMP       1/16
HALT       1/16
Figure 2.7 An Instruction Set with Relative Frequency Count
In this procedure, first arrange the instructions to obtain a graph, as shown in Figure 2.8. In this graph, there is one node for each instruction mnemonic, and these nodes are labeled with the corresponding relative frequency count. The nodes are arranged in order of their relative frequency counts (in Figure 2.8, ascending from right to left).
LOAD(1/4)  STO(1/4)  ADD(1/8)  AND(1/8)  NOT(1/16)  RSHIFT(1/16)  JUMP(1/16)  HALT(1/16)
Figure 2.8 Initial Arrangement of Instruction Mnemonics
Next, scan all the nodes of the graph and select two nodes with minimum values. Create a new node with a value equal to the sum of these minimum values, and exclude the two selected nodes from the subsequent scanning process. This result is shown in Figure 2.9, which is developed by scanning the nodes of Figure 2.8. Here, the nodes corresponding to the instruction mnemonics JUMP and HALT are shown to have the least values. So these two nodes are selected, and a new node with a value of 1/8 (1/16 + 1/16 = 1/8) is created. After this, these nodes are excluded from the subsequent process by crossing them out. The nodes corresponding to the mnemonics NOT and RSHIFT, or RSHIFT and JUMP, could have been chosen instead. When there are several possibilities, the choice is arbitrary. The important aspect of this procedure is to select two nodes with minimum values and form a new node with a value that is the sum of the two chosen nodes. If the scanning process is continued, a new graph develops, as shown in Figure 2.10.
This scanning process is continued until a single node with a value equal to 1 is formed. At this point, a tree is formed, as shown in Figure 2.11.
Figure 2.9 The Result of the Initial Scanning of the Nodes in Figure 2.8
Figure 2.10 The Result of Scanning the Nodes of Figure 2.9
Figure 2.11 The Tree Obtained by Continuously Scanning the Graph of Figure 2.10
Now the right and left branches of this tree are labeled with 0 and 1, respectively, to obtain the Huffman tree shown in Figure 2.12.
Figure 2.12 Huffman Tree
To find the op-code for a mnemonic, a path from the root to the leaf node corresponding to this mnemonic must be found; the 0s and 1s are picked up along the path. For example, the op-code corresponding to the mnemonic LOAD in Figure 2.12 is 11: starting from the root, first move to the left, and then move to the left again to reach the node corresponding to LOAD. The values on both left branches are 1. The op-codes for the other mnemonics can be found in a similar manner; they are tabulated as follows:
MNEMONIC   OP-CODE   PATH FROM THE ROOT
LOAD       11        Left, left
STO        10        Left, right
ADD        011       Right, left, left
AND        010       Right, left, right
NOT        0011      Right, right, left, left
RSHIFT     0010      Right, right, left, right
JUMP       0001      Right, right, right, left
HALT       0000      Right, right, right, right
From the preceding result, it is easy to see that Huffman's procedure encodes the most frequently used instructions with short op-codes and the least frequently used instructions with long op-codes.
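The merging procedure of Figures 2.8 to 2.11 maps naturally onto a priority queue. The sketch below uses Python's standard `heapq` and `fractions.Fraction`; the function name `huffman_codes` is hypothetical. It labels the first subtree removed from the queue with 1 and the second with 0. Ties are broken by insertion order, which is one of the arbitrary choices the text allows; other tie-breaking rules can yield different bit patterns with the same lengths, although with the choice made here the result coincides with the table above.

```python
import heapq
from fractions import Fraction
from itertools import count

FREQUENCIES = {
    "LOAD": Fraction(1, 4), "STO": Fraction(1, 4),
    "ADD": Fraction(1, 8), "AND": Fraction(1, 8),
    "NOT": Fraction(1, 16), "RSHIFT": Fraction(1, 16),
    "JUMP": Fraction(1, 16), "HALT": Fraction(1, 16),
}


def huffman_codes(freqs):
    """Return {mnemonic: op-code bit string} built by Huffman's procedure."""
    tie = count()  # unique tie-breaker so heapq never compares node payloads
    heap = [(f, next(tie), m) for m, f in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two minimum-valued nodes into a new node
    while len(heap) > 1:
        f_a, _, node_a = heapq.heappop(heap)
        f_b, _, node_b = heapq.heappop(heap)
        heapq.heappush(heap, (f_a + f_b, next(tie), (node_a, node_b)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (first, second)
            walk(node[0], prefix + "1")
            walk(node[1], prefix + "0")
        else:                            # leaf: a mnemonic
            codes[node] = prefix or "0"  # covers a one-instruction set

    walk(heap[0][2], "")
    return codes


for mnemonic, code in huffman_codes(FREQUENCIES).items():
    print(f"{mnemonic:7s} {code}")
```

Using `Fraction` keeps the node values exact, so the final root value is exactly 1, matching the stopping condition described above.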
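The average number of bits needed per instruction can be calculated using the formula
$$\bar{l} = \sum_{i} l_i f_i \qquad (2.1)$$
where $l_i$ and $f_i$ are the op-code length and the relative frequency count of the ith instruction, respectively. For this example, the average number of bits is:
$$\begin{aligned}
\bar{l} &= 2\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) + 3\left(\tfrac{1}{8}\right) + 3\left(\tfrac{1}{8}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) + 4\left(\tfrac{1}{16}\right) \\
&= \tfrac{1}{4}(2 + 2) + \tfrac{1}{8}(3 + 3) + \tfrac{1}{16}(4 + 4 + 4 + 4) \\
&= 1 + \tfrac{3}{4} + 1 = 2.75 \text{ bits}
\end{aligned}$$
Using the block-code scheme, each instruction is encoded with a 3-bit op-code ($2^3 = 8$), giving an average value of:
$$\begin{aligned}
\bar{l} &= 3\left(\tfrac{1}{4} + \tfrac{1}{4}\right) + 3\left(\tfrac{1}{8} + \tfrac{1}{8}\right) + 3\left(\tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16}\right) \\
&= 1.5 + 0.75 + 0.75 = 3.0 \text{ bits}
\end{aligned}$$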
From information theory, the optimum number of bits needed to encode a set of messages is
$$-\sum_{i} f_i \log_2 f_i \qquad (2.2)$$
The difference between the actual average length and the optimum length, expressed as a fraction of the actual length, is called the redundancy (R), and it can be written as
$$R = \frac{\text{actual length} - \text{optimum length}}{\text{actual length}} \qquad (2.3)$$
If equation (2.2) is applied to the example, the following result is obtained:
$$-\left[2\left(\tfrac{1}{4}\right)\log_2\tfrac{1}{4} + 2\left(\tfrac{1}{8}\right)\log_2\tfrac{1}{8} + 4\left(\tfrac{1}{16}\right)\log_2\tfrac{1}{16}\right] = 1 + 0.75 + 1 = 2.75 \text{ bits}$$
If equation (2.3) is applied to the result of Huffman's scheme, R is found to be zero. However, the block-code scheme introduces a redundancy of (3 - 2.75)/3 = 1/12, or 8.33%.
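Equations (2.1) to (2.3) can be checked numerically. In this sketch the op-code lengths are taken from the Huffman table and the 3-bit block code above, and the function names are hypothetical:

```python
import math
from fractions import Fraction

FREQUENCIES = {
    "LOAD": Fraction(1, 4), "STO": Fraction(1, 4),
    "ADD": Fraction(1, 8), "AND": Fraction(1, 8),
    "NOT": Fraction(1, 16), "RSHIFT": Fraction(1, 16),
    "JUMP": Fraction(1, 16), "HALT": Fraction(1, 16),
}
HUFFMAN_LENGTHS = {"LOAD": 2, "STO": 2, "ADD": 3, "AND": 3,
                   "NOT": 4, "RSHIFT": 4, "JUMP": 4, "HALT": 4}
BLOCK_LENGTHS = {m: 3 for m in FREQUENCIES}


def average_length(lengths, freqs):
    """Equation (2.1): sum of l_i * f_i."""
    return sum(lengths[m] * f for m, f in freqs.items())


def optimum_length(freqs):
    """Equation (2.2): -sum of f_i * log2(f_i)."""
    return -sum(f * math.log2(f) for f in freqs.values())


def redundancy(actual, optimum):
    """Equation (2.3): (actual - optimum) / actual."""
    return (actual - optimum) / actual


optimum = optimum_length(FREQUENCIES)
for name, lengths in (("Huffman", HUFFMAN_LENGTHS), ("block code", BLOCK_LENGTHS)):
    actual = float(average_length(lengths, FREQUENCIES))
    print(f"{name}: {actual:.2f} bits, R = {redundancy(actual, optimum):.2%}")
# Huffman: 2.75 bits, R = 0.00%
# block code: 3.00 bits, R = 8.33%
```

Huffman's R is exactly zero here because every frequency is a power of 1/2; for other frequency distributions a Huffman code is still optimal among prefix codes but generally leaves a small nonzero redundancy.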
Huffman's scheme achieves an optimal result by keeping the redundancy to a minimum value. However, when op-codes are encoded using Huffman's scheme, the decoding process takes more time because a search must be conducted on the Huffman tree. This idea is used in the Burroughs B1700 computer. The op-codes of the Zilog Z80 microprocessor's instruction set are encoded using a scheme that is very close to Huffman's scheme. Even though block-code encoding takes extra storage space, it is widely used because of the simplicity of the decoding procedure.
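The extra decoding effort can be seen in a small sketch that splits a stream of concatenated Huffman op-codes, one bit at a time, using the table derived above. It models only this bit-serial search and is not a description of how the B1700 or Z80 hardware decodes; the names are hypothetical.

```python
# Op-code table from the Huffman example above (bit pattern -> mnemonic)
HUFFMAN_TABLE = {
    "11": "LOAD", "10": "STO", "011": "ADD", "010": "AND",
    "0011": "NOT", "0010": "RSHIFT", "0001": "JUMP", "0000": "HALT",
}


def decode_opcodes(bits: str):
    """Split a stream of concatenated Huffman op-codes into mnemonics.

    The decoder cannot know an op-code's length in advance, so it extends
    the candidate pattern one bit at a time. Because no code is a prefix
    of another, the first match is always the correct one.
    """
    mnemonics, pending = [], ""
    for bit in bits:
        pending += bit
        if pending in HUFFMAN_TABLE:
            mnemonics.append(HUFFMAN_TABLE[pending])
            pending = ""
    if pending:
        raise ValueError(f"incomplete op-code at end of stream: {pending!r}")
    return mnemonics


print(decode_opcodes("11" + "011" + "10" + "0000"))  # ['LOAD', 'ADD', 'STO', 'HALT']
```

A block-code decoder, by contrast, always reads exactly 3 bits and needs no search.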
2.3 ADDRESSING MODES
A processor executes a sequence of instructions in the following manner:

begin
    read next instruction from the memory
    decode the instruction and recognize the type of operation
    while the required operation is not a halt operation do
        begin
            determine operand addresses
            retrieve the operands
            perform the desired operation
            determine the destination address
            save the result of the operation in the destination
            read the next instruction from memory
            decode the instruction and recognize the type of operation
        end
    while there is no hardware reset do
        skip    (* a dummy statement that keeps the processor in *)
                (* an infinite loop until a hardware reset occurs *)
end
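The instruction-cycle pseudocode above can be turned into a small interpreter for the hypothetical accumulator machine of Figure 2.1. This is a sketch under stated assumptions: memory is modeled as a Python dict holding either `(op-code, address)` tuples or integer data, the accumulator and data words are assumed to be 8 bits wide, and the post-halt reset loop is represented simply by returning from the function.

```python
WORD_MASK = 0xFF  # assumption: 8-bit accumulator and data words


def run(memory, pc=0, max_steps=10_000):
    """Execute a program for the hypothetical machine of Figure 2.1.

    memory maps each address to an (op-code, address) tuple or to an int.
    Returns the final accumulator value; STA updates memory in place.
    """
    acc = 0
    opcode, addr = memory[pc]          # read next instruction and decode it
    pc += 1
    steps = 0
    while opcode != "HLT":             # loop until a halt operation
        if opcode == "LDA":
            acc = memory[addr]
        elif opcode == "STA":
            memory[addr] = acc         # save the result in the destination
        elif opcode == "ADD":
            acc = (acc + memory[addr]) & WORD_MASK
        elif opcode == "AND":
            acc &= memory[addr]
        elif opcode == "CMA":
            acc = ~acc & WORD_MASK     # complement the accumulator
        elif opcode == "INCA":
            acc = (acc + 1) & WORD_MASK
        elif opcode == "JMP":
            pc = addr                  # unconditional branch
        else:
            raise ValueError(f"unknown op-code {opcode!r}")
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")
        opcode, addr = memory[pc]      # read and decode the next instruction
        pc += 1
    return acc                         # the reset-wait loop is not modeled


# Compute 5 + 7 and store the sum at address 12
program = {0: ("LDA", 10), 1: ("ADD", 11), 2: ("STA", 12), 3: ("HLT", None),
           10: 5, 11: 7}
print(run(program), program[12])  # 12 12
```

The `max_steps` guard is an addition for the sketch, so that a program that jumps in a loop forever raises an error instead of hanging.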
The sequence of operations that a processor has to carry out while executing an instruction is called its instruction cycle. The most important activity in an instruction cycle is the determination of the addresses of the operands involved in that instruction.
The manner in which a processor accomplishes this task is called the addressing mode. The typical addressing modes supported by the instruction sets of popular processors such as the 8085, Z80, MC6809, MC68000, PDP-11, and VAX-11 will be examined.
An instruction is said to have an inherent addressing mode if its op-code indicates the address of the operand, which is usually the contents of a CPU register. For example, consider an instruction that operates on the carry flag in the status register. Since the op-code implies the address of the operand, the processor does not have to compute the operand address. This mode is very common with 8-bit microprocessors such as the 8085, Z80, and MC6809.
Whenever an instruction contains the operand value, it is called an immediate-mode instruction. For example, consider the following instruction:
ADD #25, R3 ; R3 ← R3 + 25
In this instruction, the symbol # indicates that it is an immediate-mode instruction. This convention is adopted in the assemblers for processors such as the MC6809, MC68000, PDP-11, and VAX-11. In these systems, the machine representation of this instruction occupies two consecutive memory words: the first word holds the op-code, whereas the next word holds the data value (for the preceding case, it is 25). To execute this instruction, the processor has to access memory twice.
An instruction is said to have an absolute addressing mode if it contains the address of the operand. For example, consider the following move instruction:
MOV @#5000, R2 ; R2 ← (5000)
This instruction copies the contents of memory location 5000 into the CPU register R2. As in the previous case, the machine representation of this instruction occupies two consecutive memory words. However, in this case, the contents of the memory word that follows the op-code are interpreted as the address of the operand. To execute this instruction, the processor has to access memory three times: once for the op-code, once for the address, and once for the operand. Therefore, the execution time of an absolute-mode instruction is longer than that of the corresponding immediate-mode instruction.
An instruction is said to have a register mode if it contains a register address, as opposed to a memory address. In this mode, the operand values are held in the CPU registers. For example, consider the following register-mode instruction:
ADD R2, R3 ; R3 ← R2 + R3
In the register-addressing mode, the effective address (EA) of an operand is a CPU register. Since many contemporary CPUs have a small number of registers, the machine representation of a register-mode instruction requires only a few bits. Memory space can be conserved by using register-mode instructions. In this mode, the processor does not require any memory reference for data retrieval. Hence, the instruction execution rate can be considerably increased. Since there is always a limit on the number of CPU registers, it is not possible to handle a large number of operands by the exclusive use of the register-addressing mode. However, having a CPU with a large number of registers is a key characteristic of reduced-instruction-set computers (RISC). The RISC architecture is discussed in a later section of this chapter.
Whenever an instruction specifies the address of a CPU register that holds the address of an operand, the resulting addressing mode is known as the register indirect mode. From this definition, it follows that the EA of an operand in the register-indirect mode is the contents of the CPU register R. More formally, this result is written as follows:
EA = (R)
To illustrate this idea clearly, consider the following instruction:
MOV (R2), (R3) ; (R3) ← (R2)
Assume that the following configuration exists (all values in hex):
(R2) = 5000
(R3) = 4000
(5000) = 1256
(4000) = 4629
This instruction copies the contents of the memory location whose address is specified by the CPU register R2 into the location whose address is specified by the CPU register R3. Thus, after the execution of this instruction, the memory location 4000 will contain the value 1256. Whenever a CPU register is used as a data pointer, the assembler convention is to enclose that register in parentheses. Alternatively, an indirect register may be specified by using the prefix @:
MOV @R2, @R3
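The modes described so far differ only in where the operand is found. The sketch below models registers and memory as Python dicts and fetches an operand for each mode; the function names and the dict model are assumptions for illustration, and the example reproduces the MOV (R2), (R3) configuration above with the values treated as hexadecimal.

```python
def read_operand(mode, field, regs, memory):
    """Return the operand selected by one of the addressing modes above.

    mode  : 'immediate', 'absolute', 'register', or 'register_indirect'
    field : the value, address, or register name held in the instruction
    """
    if mode == "immediate":          # operand is part of the instruction
        return field
    if mode == "absolute":           # instruction holds the operand address
        return memory[field]
    if mode == "register":           # EA is a CPU register
        return regs[field]
    if mode == "register_indirect":  # EA = (R): the register holds the address
        return memory[regs[field]]
    raise ValueError(f"unsupported addressing mode: {mode}")


def mov_indirect(src, dst, regs, memory):
    """MOV (src), (dst): copy the word addressed by src into the word addressed by dst."""
    memory[regs[dst]] = read_operand("register_indirect", src, regs, memory)


regs = {"R2": 0x5000, "R3": 0x4000}
memory = {0x5000: 0x1256, 0x4000: 0x4629}
mov_indirect("R2", "R3", regs, memory)
print(hex(memory[0x4000]))  # 0x1256
```

Only the `absolute` and `register_indirect` branches touch `memory` for the operand, which matches the observation that register mode needs no memory reference for data retrieval.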
This addressing mode is very useful whenever data must be accessed through a pointer held in a CPU register. Two special cases of the register indirect mode are the auto-increment and auto-decrement modes. In auto-increment mode, first the contents of the specified CPU register are used as the address of the operand, and then the data transfer takes place. After this, the register contents are automatically incremented by some constant k. To indicate this mode, the register involved is enclosed in parentheses and immediately followed by a plus sign. For example, consider the following instruction:
MOV (R2)+, R3
In this instruction, the source operand is in the auto-increment mode, and the action taken by this instruction can be described as follows:
R3 ← (R2)
R2 ← R2 + k
The auto-decrement mode is similar to the auto-increment mode, except that the contents of the specified register are first decremented by k, and the resulting value is then used as the address of the operand. This action is symbolically represented by enclosing the register in parentheses, with a minus sign just before the left parenthesis. For example, consider the auto-decrement-mode clear instruction:
CLR -(R5)
This instruction can be precisely described as follows:
R5 ← R5 - k
(R5) ← 0
The constant value k used in the auto-increment and auto-decrement modes is actually a function of the number of bits involved in the data transfer. Typically, this value is 1 for 8-bit, 2 for 16-bit, and 4 for 32-bit operands.
These modes are useful in array manipulations. For example, assume the CPU registers R2 and R3 are initialized with the starting addresses of two arrays X and Y, respectively. Then, the following instruction transfers the first element of the array X into the first element of the array Y:
MOV (R2)+, (R3)+
After this data transfer, the CPU registers automatically point to the next corresponding elements of these arrays. If the system does not support an auto-increment mode, then the same result can be obtained by using a sequence of three instructions:
MOV (R2), (R3)
ADD #k, R2
ADD #k, R3
The auto-increment mode allows one to write programs that are more efficient both in terms of space and time.
By using the stack pointer (SP) in the auto-decrement and auto-increment modes, PUSH and POP operations can be obtained. For example, consider the following instructions:
MOV R2, -(SP)
MOV R3, -(SP)
MOV R4, -(SP)
This sequence pushes the contents of the CPU registers R2, R3, and R4 onto the stack. To restore the registers, three successive POP operations are performed as follows:
MOV (SP)+, R4
MOV (SP)+, R3
MOV (SP)+, R2
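Both uses can be sketched with two helper functions, one for post-increment (R)+ and one for pre-decrement -(R). This is an illustration only: registers and memory are Python dicts, k is assumed to be 2 (16-bit operands), and the addresses and array contents are hypothetical.

```python
K = 2  # assumption: 16-bit operands, so k = 2


def post_increment(regs, r, k=K):
    """(R)+ : use R as the address, then add k to R."""
    address = regs[r]
    regs[r] += k
    return address


def pre_decrement(regs, r, k=K):
    """-(R) : subtract k from R, then use R as the address."""
    regs[r] -= k
    return regs[r]


def mov_autoinc(regs, mem, src, dst):
    """MOV (src)+, (dst)+ : copy one element and advance both pointers."""
    value = mem[post_increment(regs, src)]
    mem[post_increment(regs, dst)] = value


def push(regs, mem, r):
    """MOV r, -(SP)"""
    mem[pre_decrement(regs, "SP")] = regs[r]


def pop(regs, mem, r):
    """MOV (SP)+, r"""
    regs[r] = mem[post_increment(regs, "SP")]


# Array copy: R2 points to array X, R3 points to array Y
regs = {"R2": 0x0100, "R3": 0x0200}
mem = {0x0100: 11, 0x0102: 22, 0x0104: 33}
for _ in range(3):
    mov_autoinc(regs, mem, "R2", "R3")
print([mem[0x0200 + K * i] for i in range(3)])  # [11, 22, 33]

# Save and restore R2, R3, R4 on a stack that grows toward lower addresses
cpu = {"R2": 7, "R3": 8, "R4": 9, "SP": 0x1000}
for r in ("R2", "R3", "R4"):
    push(cpu, mem, r)
cpu.update(R2=0, R3=0, R4=0)  # registers overwritten by other work
for r in ("R4", "R3", "R2"):
    pop(cpu, mem, r)
print(cpu)  # {'R2': 7, 'R3': 8, 'R4': 9, 'SP': 4096}
```

The registers must be popped in the reverse order of the pushes, which is why the restore sequence above starts with R4.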
Another concept used in the context of addressing modes is address modification. In this approach, the EA of an operand is expressed as the sum of two parameters, a reference address (RA) and a modifier (M), formally written as:
EA = RA + M
The modifier M is also called the offset or displacement. Such an address-modification principle is the basic concept associated with the following addressing modes:
+ Indexed mode
+ Base-register mode
+ Relative mode
In the indexed mode, the value of RA is included in the instruction, and a CPU register X contains the value M. The CPU register X is called the index register. This mode is useful for accessing arrays. For example, consider the following Pascal integer array y:
var
  y: array [0..9] of integer;
Assume that each element of this array requires 1 byte and that the entire array is configured in the memory as shown in Figure 2.13. From this figure, notice that the array starts at memory address 0100. Now, assume the index register X contains the value 0002, and execute the following indexed-mode load instruction:
LDA 0100(X)
This instruction indicates that the register X is the index register. Its machine representation includes the reference address 0100, which is the starting address of array y (see Figure 2.13). For this situation, the EA of the operand is:
EA = RA + X
   = 0100 + 2
   = 0102
Therefore, when this instruction is executed, the contents of memory location 0102 are transferred to the A register. This memory address actually holds the array element y[2]. Since the register X contains the required index value of the array element, it is referred to as the index register. To access the next element, y[3], register X needs only to be incremented by 1. This operation can be performed quickly. Therefore, the indexed mode allows a programmer to carry out array manipulations in an efficient manner.
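The indexed-mode calculation EA = RA + X can be sketched as follows. The array contents (y[i] = 10 * i) are hypothetical values chosen for the example, and one byte per element is assumed, as in Figure 2.13.

```python
def indexed_ea(reference_address: int, index_register: int) -> int:
    """Indexed mode: EA = RA + (X)."""
    return reference_address + index_register


RA = 0x0100  # starting address of array y, carried in the instruction
# Hypothetical contents: y[i] = 10 * i, one byte per element
memory = {RA + i: 10 * i for i in range(10)}

x = 0x0002  # index register X
print(hex(indexed_ea(RA, x)), memory[indexed_ea(RA, x)])  # 0x102 20

# Walking the whole array only requires incrementing X
total = 0
for x in range(10):
    total += memory[indexed_ea(RA, x)]
print(total)  # 450
```

The instruction's RA never changes; only the index register moves, which is what makes loops over arrays cheap in this mode.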
In the base-register addressing mode, the parameter RA is held in a separate register called the base register, and the modifier M is included in the instruction. This mode is very significant in a system that provides virtual memory support. In particular, the base-register mode has application in segmented memory systems. In these systems, the base register holds the base address (or the starting address) of a segment.
Address   Contents
0100      y[0]   <- RA = 0100, carried in the instruction LDA 0100(X)
0101      y[1]
0102      y[2]   <- EA = 0100 + (X), with index register X = 0002; loaded into the A register
0103      y[3]
0104      y[4]
0105      y[5]
0106      y[6]
0107      y[7]
0108      y[8]
0109      y[9]
Figure 2.13 Use of the Indexed Addressing Mode in Accessing Arrays
In general, the size of the indexed-mode modifier (M) is the same as the number of bits required for a direct memory address. In the base-register mode, the number of bits in the modifier field (M) may be less than the number of bits required for a direct memory address. The contents of the M field are often interpreted as a 2's complement number. Whenever the sizes of the modifier and the memory address fields are unequal, the modifier is sign-extended before the effective address is calculated. When the PC is configured as the base register, the relative addressing mode results. This mode is particularly useful in designing branch instructions. For example, consider the Z80 branch instruction JP 0248, which is equivalent to an unconditional GOTO in a high-level language. The machine representation of this instruction requires 3 bytes: 1 byte for the op-code (C3) and 2 bytes for the branch address (0248). Assume this instruction is stored as shown in Figure 2.14(a). Since the PC holds the address of the next instruction to be executed, the implementation of a branch instruction implies loading the PC with the branch address.
Figure 2.14 Mechanics of Absolute and Relative Branch Instructions (All Numbers Are in Hex)
(a) Operation of an absolute jump instruction: the machine representation of JP 0248 is C3 48 02; after the JP 0248 instruction is executed, the PC contains 0248.
(b) Operation of a relative branch instruction: the machine representation of JR 02 is 18 02, stored starting at address 0244; after the instruction fetch, the PC contains 0246, and branch address generation adds the offset 02 to give 0248.
Alternatively, consider the Z80 relative branch instruction JR 02. The numerical value 02 represents the value of the modifier, or offset. The machine representation of this instruction requires only 2 bytes: 1 byte for the op-code (18) and 1 more to hold the offset value 02. If we assume that this instruction is stored as shown in Figure 2.14(b), the execution of this instruction can be explained as follows:
+ First, the entire instruction is fetched.
+ Since this is a 2-byte instruction, after the instruction fetch the PC will contain 0246 (0244 + 2).
+ The branch address is computed as follows:
  branch address = current contents of the PC + offset value
                 = 0246 + 02
                 = 0248
+ Finally, the PC is loaded with this branch address; that is, the next instruction to be executed is located in memory location 0248.
When the offset is a negative number, reverse branching takes place. For example, consider JR -06 with all other data (in hex) as in Figure 2.14(b). In this case, 06 will be subtracted from 0246. Since this combines an 8-bit signed number with a 16-bit address, the correct result is obtained if the 8-bit number (-06, that is, FA) is sign-extended to 16 bits (FFFA) and then added to 0246 using 2's complement arithmetic, ignoring the final carry:
    0246                          0000 0010 0100 0110
  + FFFA (-06, sign-extended)     1111 1111 1111 1010
                                1 0000 0010 0100 0000
Ignoring the carry, the branch address is 0240.
Since the offset value is an 8-bit quantity, only -128 to +127 bytes can be branched relative to the current contents of the PC. In most computers, conditional branch instructions use the relative mode. If one needs to go beyond the -128 to +127 range with conditional branching, then an unconditional branch instruction can be placed anywhere within this range to branch to any location within the computer's directly addressable memory. Another important aspect of the relative addressing mode is its ability to produce a relocatable program. For example, consider the situation in Figure 2.14(b). In this configuration, the branch address is 2 bytes away from the current contents of the PC. When this program is relocated, the JR 02 instruction will be placed in a different location and the PC will hold different contents, but the branch address will still be 2 bytes away from the current contents of the PC.
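The relative-branch arithmetic, including sign extension, the 8-bit range check, and relocation, can be sketched in Python. The function names are hypothetical; the 2-byte instruction length and the 16-bit wraparound of the PC follow the Z80 example above.

```python
def sign_extend_8(byte: int) -> int:
    """Interpret an 8-bit offset field as a 2's complement number (-128..+127)."""
    return byte - 0x100 if byte & 0x80 else byte


def relative_target(instr_addr: int, offset_byte: int, instr_len: int = 2) -> int:
    """Branch address = PC after fetch + sign-extended offset, kept to 16 bits."""
    pc_after_fetch = (instr_addr + instr_len) & 0xFFFF
    return (pc_after_fetch + sign_extend_8(offset_byte)) & 0xFFFF


def encode_offset(instr_addr: int, target: int, instr_len: int = 2) -> int:
    """Return the 8-bit offset byte, or fail if the target is out of range."""
    displacement = target - (instr_addr + instr_len)
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for an 8-bit relative branch")
    return displacement & 0xFF


print(hex(relative_target(0x0244, 0x02)))  # 0x248  (JR 02 in Figure 2.14(b))
print(hex(relative_target(0x0244, 0xFA)))  # 0x240  (offset -06)
print(hex(relative_target(0x1244, 0x02)))  # 0x1248 (same bytes, relocated)
print(hex(encode_offset(0x0244, 0x0240)))  # 0xfa
```

The third call shows the relocation property: the same two bytes branch to a target that moves with the code.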
The usefulness of various addressing modes will now be discussed. The implementation of a particular addressing mode is largely dependent on the processor organization. In a processor with many general registers, indexed and indirect addressing modes are easy to obtain. If a computer supports powerful addressing modes, the task of designing language translators, operating systems, and efficient application programs can be greatly simplified. For example, the auto-increment and auto-decrement modes allow one to implement stack operations and program loops efficiently. For a compiler designer, efficient stack operations are very important for implementing procedure calls. The indexed addressing mode allows an applications programmer to manipulate arrays, whereas the relative mode allows programmers to write position-independent programs. A program is said to be position-independent if it can be placed anywhere in memory and still execute correctly. This is a desirable feature for operating system designers because it allows the operating system to relocate programs dynamically. In a multiuser system, many different users may need the same library program provided by the operating system. If this library routine is position-independent, the operating system can load the machine code of this routine into any available portion of the main memory.
2.4 INSTRUCTION TYPES
The purpose of a computer instruction set is to provide a virtual machine with all the features needed so that compilers, operating systems, and library subroutines can be designed quickly (see Figure 2.15).
From Figure 2.15, it can be seen that a program written using the instruction set is translated into machine code by a translator program (the assembler). Finally, the machine code is interpreted by the microcode so that the hardware produces the desired result. In other words, the microcode directs the hardware by generating the necessary commands (control signals).
Figure 2.15 Creation of a Virtual Machine via Instruction Set (the machine seen by a system programmer)
In general, instructions available in a processor may be broadly classified into six groups. These are:
1. Data transfer instructions
2. Arithmetic instructions
3. Logical instructions
4. Program-control instructions
5. System-control instructions
6. I/O instructions
These instruction types are discussed in the following sections.
2.4.1 Data-Transfer Instructions
Data-transfer instructions are concerned primarily with data transfers between the processor and main memory. Typically, an ideal instruction set must be able to handle the following transfers:
+ Register to register
+ Register to memory
+ Memory to register
+ Memory to memory
In contemporary computers, the system architecture is configured so that the preceding possibilities can be implemented efficiently. For example, in a three-bus architecture, simultaneous data transfers can occur. Various bus architectures are covered in Chapter 4. This group is given primary consideration while designing the control unit because results reported by D. E. Knuth [5] and S. H. Fuller [2] have shown that, on average, 40% of the instructions in many user programs are data-transfer instructions.
2.4.2 Arithmetic Instructions
All instruction sets typically include ADD and SUBTRACT instructions. The instruction sets of VLSI microprocessors such as the Intel 8086 and the Motorola MC68000/68020 include multiplication and division instructions. The CPUs of these processors must include provisions for performing multiplication and division. The design of the hardware required for these operations is covered in Chapter 3. Many applications require the system to manipulate decimal numbers quickly. Therefore, the instruction sets of modern processors include instructions that can directly manipulate BCD numbers. For example, the ABCD instruction of the MC68000 processor is capable of adding two BCD numbers stored in memory. Similarly, to facilitate floating-point number manipulation, some processors include floating-point instructions. For example, the extended instruction set of the PDP-11/70 includes instructions such as MULF and DIVF to multiply and divide floating-point numbers. To speed up floating-point arithmetic operations, the PDP-11/70 CPU includes 6 additional general-purpose floating-point registers (F0 through F5) in its floating-point processor, FP-11. With the advent of VLSI technology, it is possible to obtain numerous floating-point coprocessors compatible with microprocessors. For example, the I8087, AM9511, and MC68881 coprocessor chips are compatible with the I8086, I8085, and MC68000 microprocessors, respectively. Likewise, Motorola's floating-point ROM chip, the MC6839, can be interfaced with the MC6809 microprocessor to handle floating-point computations.
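The sketch below shows what a byte-wide decimal add must do. It is a minimal model assuming packed BCD, with two decimal digits per byte. It is loosely patterned on the result a decimal-add instruction such as the MC68000 ABCD produces. ABCD also adds the X (extend) bit, which is modeled here as `carry_in`. Flag handling is simplified, and `bcd_add_byte` and `bcd_add` are hypothetical helper names.

```python
def bcd_add_byte(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two packed-BCD bytes (two decimal digits each) plus a carry.

    Returns (result_byte, carry_out); condition codes are not modeled.
    """
    for x in (a, b):
        if (x >> 4) > 9 or (x & 0x0F) > 9:
            raise ValueError(f"{x:#04x} is not a valid packed-BCD byte")
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    carry = 0
    if low > 9:                  # decimal-adjust the low digit
        low -= 10
        carry = 1
    high = (a >> 4) + (b >> 4) + carry
    carry_out = 0
    if high > 9:                 # decimal-adjust the high digit
        high -= 10
        carry_out = 1
    return (high << 4) | low, carry_out


def bcd_add(a_bytes: list[int], b_bytes: list[int]) -> tuple[list[int], int]:
    """Add equal-length packed-BCD numbers stored most significant byte first,
    working upward from the least significant byte and propagating the carry."""
    result, carry = [], 0
    for x, y in zip(reversed(a_bytes), reversed(b_bytes)):
        r, carry = bcd_add_byte(x, y, carry)
        result.append(r)
    return list(reversed(result)), carry


# 1999 + 0001 = 2000, i.e. ([0x20, 0x00], 0)
total, overflow = bcd_add([0x19, 0x99], [0x00, 0x01])
```

A plain binary add would give 0x25 + 0x38 = 0x5D. The decimal adjustment turns this into the BCD result 0x63.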
In general, the following assignment statements are common in high-level languages:

    X := X + k
    Y := k

where k is a small constant.
As a consequence, the instruction repertoires of state-of-the-art processors include special instructions to increment or decrement by small quantities and to assign a small value to a specified memory address. The Motorola designers call these quick mode instructions; on VAX-11 computers, they are referred to as short immediate or literal mode instructions. For example, the assignment statement Z := Z - 4 can be implemented in the MC68000 and VAX-11 assembly languages as follows:
MC68000

        LEA     Z, A2          ; Load the address of Z into the address register A2
        SUBQ.B  #4, (A2)       ; Subtract 4 from the memory byte whose address
                               ; is specified in the register A2

The suffix Q in SUBQ indicates that this is a quick mode instruction.

VAX-11

        MOVAB   Z, R6          ; Move the address of Z into the CPU register R6
        SUBB2   S^#4, (R6)     ; Subtract 4 from the memory byte whose address
                               ; is specified in the register R6

The prefix S^ indicates that the operand is in the literal mode.
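Quick and literal instructions are short because the constant is packed into the instruction itself. On the MC68000, ADDQ and SUBQ hold their immediate in a 3-bit field of the opcode word, so only 1 through 8 can be encoded, with 8 stored as 000. VAX short literals use a 6-bit field holding 0 through 63. The sketch below models those encodings and the effect of SUBQ.B on a byte in memory. The helper names and the one-byte "memory" are hypothetical, and condition codes are not modeled.

```python
def encode_quick(value: int) -> int:
    """3-bit data field for an MC68000 ADDQ/SUBQ immediate (1..8; 8 is stored as 0)."""
    if not 1 <= value <= 8:
        raise ValueError("quick immediates must lie in 1..8")
    return value & 0b111


def decode_quick(field: int) -> int:
    """Inverse of encode_quick: a field of 0 means 8."""
    return 8 if field == 0 else field


def fits_vax_short_literal(value: int) -> bool:
    """VAX short-literal (S^#) integers occupy a 6-bit field: 0..63."""
    return 0 <= value <= 63


def subq_byte(memory: bytearray, address: int, value: int) -> None:
    """Effect of SUBQ.B #value,(An) on memory; flags are not modeled."""
    data = decode_quick(encode_quick(value))              # constant travels in the opcode word
    memory[address] = (memory[address] - data) & 0xFF     # byte arithmetic wraps modulo 256


z = bytearray([10])   # the variable Z at hypothetical address 0
subq_byte(z, 0, 4)    # Z := Z - 4, leaving 6
```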
2.4.3 Logical Instructions
Invariably, the instruction sets of all contemporary processors include instructions to perform Boolean AND, OR, NOT, and EXCLUSIVE-OR operations on a bit-by-bit basis. These instruction sets also include the following shift instructions:

+ Arithmetic shift (left or right)
+ Logical shift (left or right)
+ Rotate shift (left or right) through or without the carry flag
The MC68000 and the VAX-11 instruction sets include even more elegant shift instructions. For example, consider the following shift instructions:

MC68000

        LSL.W   #3, D5         ; Perform a left logical shift of the low-order
                               ; 16 bits of the data register D5 by three places

VAX-11

        ROTL    #16, R7, R7    ; Rotate R7 left by 16 places; result back in R7

This instruction rotates the contents of the CPU register R7 to the left by 16 places. After it is executed, the low-order 16 bits and the high-order 16 bits of the 32-bit register R7 are exchanged. The VAX-11 instruction set also allows a programmer to perform memory-to-memory compare operations. The test instruction compares the specified operand with zero and sets the zero and sign flags, depending on the status of the tested operand. These flags are held in a dedicated register called the condition code register, whose design is covered in the next chapter.
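The sketch below models the shift and rotate types listed above on an n-bit register. It uses 16 bits by default and 32 bits for the VAX example. Function names are hypothetical, and only data results are modeled, not flags. On the MC68000, ASL and LSL produce the same data and differ only in the V flag, so one function covers both.

```python
def mask(width: int) -> int:
    return (1 << width) - 1


def lsl(x: int, n: int, width: int = 16) -> int:
    """Logical (and arithmetic) shift left: zeros enter from the right."""
    return (x << n) & mask(width)


def lsr(x: int, n: int, width: int = 16) -> int:
    """Logical shift right: zeros enter from the left."""
    return (x & mask(width)) >> n


def asr(x: int, n: int, width: int = 16) -> int:
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= mask(width)
    if x >> (width - 1):            # sign bit set: reinterpret as negative
        x -= 1 << width
    return (x >> n) & mask(width)   # Python's >> keeps the sign of negative ints


def rol(x: int, n: int, width: int = 16) -> int:
    """Rotate left without the carry: bits leaving on the left re-enter on the right."""
    n %= width
    x &= mask(width)
    return ((x << n) | (x >> (width - n))) & mask(width)


def rol_through_carry(x: int, carry: int, width: int = 16) -> tuple[int, int]:
    """Rotate left one place through the carry flag; returns (result, new_carry)."""
    x &= mask(width)
    new_carry = x >> (width - 1)
    return ((x << 1) | carry) & mask(width), new_carry


# LSL.W #3,D5 changes only the low-order 16 bits of the 32-bit register D5.
d5 = 0xABCD00FF
d5 = (d5 & 0xFFFF0000) | lsl(d5 & 0xFFFF, 3)     # 0xABCD07F8

# ROTL #16,R7,R7 exchanges the two 16-bit halves of R7.
r7 = rol(0x12345678, 16, width=32)               # 0x56781234
```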
2.4.4 Program-control Instructions
In a conventional computer, instructions are normally executed in the same order in which they are presented. In reality, the flow of control depends on the result of a computation. In this situation, a program can select a particular sequence of instructions depending on the results of a computation. Instructions that perform this function are called program-control instructions.
In general, these instructions may be classified into four groups:

+ Unconditional branch instructions
+ Conditional branch instructions
+ Subroutine call instructions
+ Interrupt-handling instructions

An unconditional branch instruction transfers control to the specified address regardless of the status of the computation. Processors such as the Z80, MC68000, PDP-11, and VAX-11 include both absolute and relative branch instructions.
A conditional branch instruction works as follows:

    If (condition) then branch to a new instruction
    else execute the following instruction.

In this situation, assume that the condition flags are set by some instruction that immediately precedes the conditional branch instruction. Typically, this instruction may be an arithmetic instruction (such as ADD, SUBTRACT, INCREMENT, DECREMENT, or COMPARE) or a logical instruction (such as TEST or COMPARE). By testing the condition flag settings, traditional relational operators (such as equal to, not equal to, greater than, greater than or equal to, less than, or less than or equal to) can be implemented. For example, consider the following VAX-11 instruction sequence:
            MOVAB   X, R7           ; Move the address of X into register R7
            MOVAB   Y, R8           ; Move the address of Y into register R8
            TSTB    (R7)            ; Check whether X = 0
            BEQL    UPDTY           ; If it is, then update Y
            BRB     NEXT            ; Otherwise, go to the next instruction
    UPDTY:  ADDB2   S^#2, (R8)      ; Perform Y := Y + 2
    NEXT:

This sequence is equivalent to the Pascal if statement:

    if X = 0 then
        Y := Y + 2
In this example, the variables X and Y are assumed to be 8-bit 2's complement numbers. The test instruction sets the Z flag (zero-status flag) only if X is zero, and the branch instruction BEQL causes a branch only if the Z flag is set to 1. PDP-11, MC68000, and VAX-11 conditional branch instructions can handle both signed and unsigned operand values.
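The sketch below mirrors the TSTB/BEQL sequence above. The test instruction only records flags, and the branch consults the Z flag alone. The helper names are hypothetical. Bytes are kept as unsigned values 0 to 255, with an explicit helper to read them as 8-bit two's complement numbers. As on the VAX, the test sets N and Z from the operand and clears V and C.

```python
def tstb(value: int) -> dict[str, int]:
    """Set flags from a byte operand as a test instruction does (V and C cleared)."""
    value &= 0xFF
    return {"N": value >> 7, "Z": int(value == 0), "V": 0, "C": 0}


def update_y(x: int, y: int) -> int:
    """Mirror of: TSTB (R7); BEQL UPDTY; BRB NEXT; UPDTY: ADDB2 S^#2,(R8); NEXT:"""
    flags = tstb(x)              # TSTB (R7)
    if flags["Z"] == 1:          # BEQL UPDTY is taken only when Z = 1
        y = (y + 2) & 0xFF       # ADDB2 S^#2,(R8): byte addition wraps modulo 256
    return y                     # NEXT: reached directly via BRB NEXT when X != 0


def as_signed_byte(v: int) -> int:
    """Interpret a byte as an 8-bit two's complement number."""
    return v - 256 if v & 0x80 else v
```

For example, with X = 0 and Y = FFH (that is, -1), the update yields 01H, or +1.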
The preceding example suggests that, by using logical and conditional branch instructions, the following control structures can be implemented. These are regarded as the primitive components of a structured programming language such as Pascal or C:

+ if (cond) then (statement)
  else (statement)
+ while (cond) do
  (statement)
+ repeat
  (statement)
  until (cond)
+ case (label) of
  id1: (statement)
  id2: (statement)
  end
The VAX-11 instruction set includes useful instructions such as SOBGTR (subtract 1 from the loop index and branch if the index is still greater than zero) and AOBLEQ (add 1 to the loop index and branch if the index is less than or equal to the limit). These implement Pascal repeat and for loops, respectively.
For example, the VAX-11 code:

            MOVB    #10, R5         ; Initialize R5 with 10
            CVTBL   R5, R5          ; Sign-extend R5 to 32 bits
            CLRB    R6              ; Clear R6 to 0
            MOVAB   SUM, R7         ; Point R7 to SUM
    LOOP:   ADDB2   R5, R6          ; Add the loop index to the running sum
            SOBGTR  R5, LOOP        ; Repeat until (R5) = 0
            MOVB    R6, (R7)        ; Transfer the result to SUM

is equivalent to the Pascal program:

    i := 10; sum := 0;
    repeat
        sum := sum + i;
        i := i - 1
    until i = 0
Similarly, the VAX-11 program fragment

            MOVB    #1, R5          ; Initialize R5 with the initial
            CVTBL   R5, R5          ; value of the loop index
            MOVB    #N, R6          ; Initialize R6 with the limiting
            CVTBL   R6, R6          ; value N of the loop index
            MOVAB   SUM, R7         ; Point R7 to SUM
            CLRB    R8              ; Clear R8
    LOOP:   ADDB2   R5, R8
            AOBLEQ  R6, R5, LOOP    ; Add 1 to the loop index and repeat
                                    ; while the index <= the limit
            MOVB    R8, (R7)        ; Save the result

is equivalent to this C program segment:
for(i = 1
We leave the task of verifying the correctness of the preceding results as an exercise. In VAX-11 programming, all loop parameters must be specified as 32-bit quantities. This is why we use the CVTBL instruction to sign-extend a byte into a 32-bit (long-word) operand.
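As a starting point for that exercise, here is a minimal sketch of the two loop instructions. It steps through the register updates of the listings above. The function names are hypothetical, and the byte additions wrap at 256 as ADDB2 would. The index and limit values used in the check are illustrative. Note that both loops run their body once before testing, so they behave like Pascal repeat loops. For this reason, the AOBLEQ form differs from a C for loop when the initial index already exceeds the limit.

```python
def sum_with_sobgtr(start: int) -> int:
    """Mirror of the SOBGTR loop: R6 accumulates R5 while R5 counts down to 0."""
    r5, r6 = start, 0
    while True:
        r6 = (r6 + r5) & 0xFF     # ADDB2 R5,R6 adds bytes, so the sum wraps at 256
        r5 -= 1                   # SOBGTR: subtract one from the index ...
        if r5 <= 0:               # ... and branch back only while it is > 0
            break
    return r6


def sum_with_aobleq(first: int, limit: int) -> int:
    """Mirror of the AOBLEQ loop: R8 accumulates R5 while R5 counts up to R6."""
    r5, r6, r8 = first, limit, 0
    while True:
        r8 = (r8 + r5) & 0xFF     # ADDB2 R5,R8
        r5 += 1                   # AOBLEQ: add one to the index ...
        if r5 > r6:               # ... and branch back only while it is <= the limit
            break
    return r8
```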
A subroutine is a program segment for carrying out repeatedly needed tasks such as converting code from binary to ASCII, searching, and sorting. A subroutine may be written and tested separately. A subroutine can be linked with a user program so that the latter can call the former as many times as needed. Thus the use of subroutines can save programmer time as well as the memory space needed by an application program.
A large program can be thought of as a collection of independent program modules, where each module may be a subroutine or a set of subroutines. This is the key feature of the modern software approach called modular programming. In this method, the programmer has a global view of all components of a large program, and efficient programs can be developed within a short period of time. Since each subroutine can be independently tested, the modular programming approach considerably improves overall software reliability. The subroutine concept also strongly encourages program sharing. For example, in a multiuser system, several user programs may share the same I/O subroutine provided by the operating system. Therefore, a user does not have to spend time developing I/O routines.
Subroutine calls and returns from subroutines are usually handled by two special instructions, CALL and RET, respectively. The CALL instruction is of the form CALL (addr), where the parameter (addr) refers to the address of the first instruction of the subroutine. When this instruction is executed, the current contents of the PC are saved in the stack, and the PC is loaded with (addr). The current contents of the PC provide the address of the instruction that immediately follows the CALL instruction. This address is also called the return address because it is the point where execution of the calling program resumes after exiting from the subroutine. The CALL instruction is functionally equivalent to the following instruction sequence:

    PUSH    PC      ; Save the return address in the stack: (SP) ← PC
    JP      addr    ; Branch to the subroutine
The RET instruction is usually the last instruction of the subroutine. When this instruction is executed, the return address previously saved in the stack is retrieved and loaded into the PC. Control is then transferred to the calling program. A RET instruction is functionally equivalent to:

    POP     PC      ; PC ← (SP)
At this point, one may wonder why the return address is not saved in a CPU register rather than in the stack. This arrangement fails if nested subroutine calls are to be implemented. Subroutine nesting refers to a subroutine calling another subroutine. For example, consider the main program M and two subroutines P and Q in Figure 2.16(a). The main program calls subroutine P, and this subroutine in turn calls subroutine Q. The expected control flow sequence is shown in Figure 2.16(b). Let MR and PR be the return addresses of the main program (M) and subroutine P, respectively. When the main program calls subroutine P, the return address (MR) is saved in the stack (see Figure 2.16(c)), and control is transferred to subroutine P. Similarly, when subroutine P calls subroutine Q, the return address (PR) is pushed onto the stack (see Figure 2.16(d)), and control is transferred to subroutine Q. When subroutine Q completes its execution, the return address is retrieved from the stack and loaded into the PC. Since the return address is PR, the execution of subroutine P is resumed. Similarly, when subroutine P terminates, the return address (MR) (see Figure 2.16(e)) is retrieved from the stack and loaded into the PC. The execution of the main program is then resumed.
To implement subroutine nesting, the return addresses must be retrieved in exactly the reverse order in which they are saved. Since a stack is a LIFO (last-in, first-out) data structure, its use is a natural solution to this problem. Suppose instead that a single CPU register is used to save the return address. The return address (PR) will then overwrite the return address (MR), and control will never be transferred back to the main program.
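The failure of the single-register scheme is easy to reproduce. The sketch below saves return addresses either on a stack (PUSH PC / POP PC) or in one hypothetical link register. It then replays the M, P, Q nesting described above. Class and function names are illustrative only.

```python
class ReturnLinkage:
    """Saves return addresses either on a stack or in a single link register."""

    def __init__(self, use_stack):
        self.use_stack = use_stack
        self.stack = []             # stands in for the memory addressed by SP
        self.link_register = None   # the single-register alternative

    def call(self, return_address):
        if self.use_stack:
            self.stack.append(return_address)    # PUSH PC
        else:
            self.link_register = return_address  # overwrites any earlier return address

    def ret(self):
        if self.use_stack:
            return self.stack.pop()              # POP PC
        return self.link_register


def nested_returns(use_stack):
    """M calls P (return address MR); P calls Q (return address PR); Q and P then return."""
    linkage = ReturnLinkage(use_stack)
    linkage.call("MR")          # CALL P executed in M
    linkage.call("PR")          # CALL Q executed in P
    first = linkage.ret()       # RET at the end of Q
    second = linkage.ret()      # RET at the end of P
    return [first, second]


print(nested_returns(True))    # ['PR', 'MR']: back to P, then back to M
print(nested_returns(False))   # ['PR', 'PR']: P's RET never reaches M
```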
Solutions to some problems, such as traversing a binary tree, are naturally recursive and are precisely solved by writing recursive subroutines. A subroutine is recursive if it calls itself. A recursive evaluation involves a descent process all the way to the basis part of the recursive definition. It then involves an ascent process all the way up, in exactly the reverse order. Therefore, the use of a stack offers a solution for implementing recursive subroutines.

The registers involved in implementing a subroutine call are called linkage registers, and the process is known as the subroutine linkage convention. In all microprocessors, the PC is used as the linkage register. In the PDP-11 computer, any CPU register can be configured as a linkage register under program control. In the majority of processors, the system hardware automatically uses the stack for subroutine linkage.
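The binary-tree example makes the descent and ascent visible. Below, the same in-order traversal is written twice. The first version is recursive, and the run-time stack keeps the pending calls. The second makes that stack explicit: nodes are pushed on the way down and popped in reverse order on the way up. The `Node` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_recursive(node: Optional[Node]) -> list:
    """Recursive in-order traversal; the run-time stack holds the pending calls."""
    if node is None:                 # basis part of the recursive definition
        return []
    return inorder_recursive(node.left) + [node.key] + inorder_recursive(node.right)


def inorder_explicit_stack(root: Optional[Node]) -> list:
    """Same traversal with the stack made explicit."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:      # descent: remember every node we must return to
            stack.append(node)
            node = node.left
        node = stack.pop()           # ascent: resume the most recently saved node (LIFO)
        out.append(node.key)
        node = node.right
    return out


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```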
Figure 2.16 Implementation of a Two-level Subroutine Nesting: (a) Typical Two-Level Subroutine Nesting; (b) Expected Control Flow; (c) Contents of the stack after executing CALL P; (d) Contents of the stack after executing CALL Q; (e) Contents of the stack after executing the RET instruction in subroutine Q; (f) Contents of the stack after executing the RET instruction in subroutine P.

The effective use of subroutines requires a method for transferring data from the caller to the subroutine, and vice versa. This aspect is often referred to as the parameter-passing or argument-passing convention. High-level languages such as Pascal and Ada adopt two typical parameter-passing conventions: call-by-value and call-by-reference.

In the call-by-value approach, the main program transfers a parameter to a subroutine by copying the parameter value into a local variable that is created when the subroutine becomes active. When the subroutine changes the value of this local variable, the actual parameter of the main program does not change. This is a desirable property for function subprograms because it keeps a function from altering the value of the main program's variables. Hence, the function returns a unique value to the caller.

In the call-by-reference approach, the main program transfers the address of the actual parameter to the subroutine, so the subroutine can alter the value of the parameter that belongs to the main program. This allows a procedure to transmit its results to the caller. The ability of a subroutine to alter the variables of the calling program is known as a side effect; it cannot occur with the call-by-value approach. However, the call-by-reference approach is the standard method for passing results to the caller.

To support these features, contemporary processors include either special instructions or additional hardware. For example, the LINK and UNLK (unlink) instructions of the MC68000 processor allow a compiler designer to implement Pascal functions and procedures with minimum effort. The VAX-11 CPU includes two dedicated registers: AP (argument pointer) and FP (frame pointer). These hardware elements greatly simplify the task of argument passing. A thorough discussion of this topic is beyond the scope of this book.

An interrupt may be defined as a hardware-initiated subroutine call. For example, in a microprocessor-based system, an I/O device such as a keyboard may generate an interrupt to inform the processor that valid data is available. When this interrupt is recognized, the processor suspends the execution of the currently running program, saves the contents of the PC in the stack, and transfers control to a service routine dedicated to serving the keyboard. The service routine in this case reads the keyboard data, stores it in the main memory for further processing, and returns control to the suspended program.

If the service routine needs the CPU registers, their previous contents must be saved in the stack before the service routine actually uses them. This is a time-consuming operation, and to speed it up, different microprocessors use different approaches.
For example, in the Z80 microprocessor, the general-purpose registers are duplicated. For each register X (A, F, B, C, D, E, H, and L), there is an alternate companion register X'. At any given time, either the original or the alternate set of these registers is active. Normally, a user program uses the original set. When an interrupt occurs, the service routine can switch to the alternate set with exchange instructions (EXX and EX AF, AF') instead of saving the registers in the stack. Before returning, it switches back to the original set so that the suspended program can continue. This concept provides a fast response to external interrupts.
Modern processors such as the MC68000 and VAX-11 provide a number of software (internal) interrupt instructions, such as the MC68000 TRAP instructions. These are covered in detail in Chapter 6.
2.4.5 System-control Instructions
With the advent of low-cost VLSI microprocessors, designing multiprocessor systems with several processors is feasible. Such a multiprocessor system offers significant advantages, such as increased processing speed. In a typical multiprocessor system, several processors will be allowed to share a memory unit. In this situation, only one processor should be allowed to access the shared memory at any given time. This problem is known as the mutual exclusion problem.
Some processors, such as the Motorola MC68000, have a special instruction called test and set (TAS) that provides a hardware-based solution to this problem. The TAS instruction is used to prevent other programs from accessing a shared resource while one program has control of it. This is sometimes called lockout. The TAS instruction tests and modifies a byte-length operand held either in a data register or in memory. For example, consider TAS (memory). If (memory) = 0, the zero flag is set (Z = 1); otherwise, Z = 0. The negative flag N is set to bit 7 of the original operand. In either case, bit 7 of the operand is then set to 1. Now, consider TAS (A1). Suppose (A1) = 00H; then after execution of TAS, Z = 1 and (A1) is changed to 80H. If the initial zero value indicated that the memory area was free for use, a subsequent test of the memory area with the value 80H would indicate that the memory is in use. The TAS instruction has an indivisible read-modify-write cycle, which prevents more than one processor in a multiprocessor system from accessing a shared memory simultaneously. Once the 68000 executing the TAS instruction has addressed the operand, the bus is not available to another processor until the TAS instruction is completed.
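The flag behavior of TAS can be summarized in a few lines. The sketch below assumes a one-byte shared location and models only N and Z; the real instruction also clears V and C. Both flags describe the operand as it was before bit 7 is set. The function name and the one-byte `ram` are hypothetical. A single-threaded function cannot show the indivisible bus cycle, which is the part that makes the real instruction safe.

```python
def tas(memory: bytearray, address: int) -> dict[str, int]:
    """Test-and-set one byte: flags describe the old value, then bit 7 is set.

    On the real processor the read and the write form one indivisible bus
    cycle; this single-threaded function cannot show that part.
    """
    old = memory[address]
    flags = {"Z": int(old == 0), "N": old >> 7}   # test the operand as it was
    memory[address] = old | 0x80                   # then set its bit 7
    return flags


ram = bytearray(1)          # hypothetical one-byte shared location, initially 00H
first = tas(ram, 0)         # {'Z': 1, 'N': 0}: the area was free; ram[0] is now 80H
second = tas(ram, 0)        # {'Z': 0, 'N': 1}: the area is already in use
```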
As an illustration of synchronization by the TAS instruction, consider two 68000 processors that are interfaced through a shared RAM, as shown in Figure 2.17. The goal is to transfer (D0) of Processor 1 into D2 of Processor 2 through the RAM byte TRDATA. To accomplish this transfer with proper synchronization, Processor 1 can execute the following routine for writing (D0) into the RAM location TRDATA:

    LOOP1   TAS     TEST
            BNE     LOOP1
            MOVE.B  D0, TRDATA
            CLR.B   TEST

Processor 2 can then transfer (TRDATA) to D2 by executing the following routine:

    LOOP2   TAS     TEST
            BNE     LOOP2
            MOVE.B  TRDATA, D2
            CLR.B   TEST

Figure 2.17 Interfacing Two Processors Via Shared RAM (Processor 1 and Processor 2 connected to a shared RAM)
Initially, the contents of the location TEST are zero. When Processor 1 executes the TAS instruction, it finds that (TEST) = 0; it sets (TEST) to 80H and falls through the BNE instruction. It then executes the MOVE.B D0, TRDATA instruction. At this point, assume that Processor 2 executes the TAS instruction. Processor 2 finds that (TEST) = 80H, and it will not pass LOOP2 (because Z = 0). When Processor 1 is done with the data transfer, it executes the CLR.B TEST instruction, and so (TEST) is cleared to zero. This causes Processor 2 to fall through LOOP2, and hence it can access the data in TRDATA. Observe that at any given time, the value of the memory byte TEST is either 00H or 80H. For this reason, this byte is often referred to as a binary semaphore.
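The sketch below simulates two "processors" as Python threads sharing a semaphore byte. It shows the mutual-exclusion property of this protocol under stated assumptions. A `threading.Lock` stands in for the indivisible read-modify-write bus cycle, and only the TAS step and the clear step take that lock. Each thread repeatedly spins on TAS, performs a read-modify-write on a shared counter, and clears the byte. All names are hypothetical. Note that, like the routines above, the protocol guarantees exclusion but not which processor enters first.

```python
import threading
import time


class SemaphoreByte:
    """The shared byte TEST: 00H means free, 80H means in use."""

    def __init__(self) -> None:
        self.value = 0x00
        self._bus = threading.Lock()   # models the indivisible read-modify-write bus cycle

    def tas(self) -> bool:
        """Return True when the byte was 00H (Z = 1); set bit 7 either way."""
        with self._bus:
            was_free = self.value == 0x00
            self.value |= 0x80
            return was_free

    def clear(self) -> None:           # CLR.B TEST
        with self._bus:
            self.value = 0x00


def processor(sem: SemaphoreByte, shared: dict, iterations: int) -> None:
    for _ in range(iterations):
        while not sem.tas():           # LOOPn: TAS TEST / BNE LOOPn
            time.sleep(0)              # let the other "processor" run while we wait
        count = shared["count"]        # critical section: read ...
        shared["count"] = count + 1    # ... modify and write the shared RAM
        sem.clear()


def run(processors: int = 2, iterations: int = 1000) -> tuple[int, int]:
    sem, shared = SemaphoreByte(), {"count": 0}
    threads = [threading.Thread(target=processor, args=(sem, shared, iterations))
               for _ in range(processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"], sem.value
```

Calling `run()` should return `(2000, 0)`: no increment is lost, and TEST ends up free (00H).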
2.4.6 I/O Instructions
I/O instructions allow a processor to perform input and output operations. An input instruction allows a peripheral to transfer a word to either a CPU register or memory. Similarly, an output instruction enables a processor to transfer a word into the buffer register of a peripheral device. In processors such as the PDP-11 and MC68000, a peripheral device is mapped to a main memory address. In this situation, data moves from the CPU to a memory location and from a memory location to the CPU constitute output and input operations, respectively. This method is known as memory-mapped I/O. The instruction set of any processor employing this approach does not include any special I/O instructions. Also, in this environment, a programmer can exploit all the available addressing modes to perform I/O operations efficiently. In processors such as the 8085, Z80, and 8086, a peripheral device can be mapped to an address in a separate address space called I/O space. The instruction set of each of these processors includes special input (IN) and output (OUT) instructions. These instructions are shorter than the regular move instructions, so program execution time is reduced. Also, since the memory address space does not conflict with the I/O space, the entire memory can be used exclusively for storing code and data.
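The difference between the two schemes comes down to address decoding. The sketch below models a toy bus with 256 bytes of memory and one output device register. The device address F0H and the class and method names are hypothetical. With memory-mapped I/O, an ordinary store to F0H reaches the device, and that memory location is lost to code and data. With a separate I/O space, only the OUT instruction reaches the device, and memory location F0H remains ordinary RAM.

```python
class ToyBus:
    """A 256-byte memory plus one output device register (all values hypothetical)."""

    DEVICE_ADDRESS = 0xF0   # memory address (memory-mapped) or port number (separate I/O space)

    def __init__(self, memory_mapped: bool) -> None:
        self.memory_mapped = memory_mapped
        self.memory = bytearray(256)
        self.device_register = 0

    def store(self, address: int, value: int) -> None:
        """An ordinary move from the CPU to a memory address."""
        if self.memory_mapped and address == self.DEVICE_ADDRESS:
            self.device_register = value      # this "memory write" is an output operation
        else:
            self.memory[address] = value

    def out(self, port: int, value: int) -> None:
        """A separate OUT instruction, available only with a separate I/O space."""
        if self.memory_mapped:
            raise NotImplementedError("memory-mapped I/O uses ordinary moves instead")
        if port == self.DEVICE_ADDRESS:
            self.device_register = value
```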
From the preceding discussion, we notice that the CPU is directly involved in I/O operations. Since a peripheral device is typically many times slower than the processor, this involvement reduces system throughput (the number of tasks processed per unit time). To overcome this difficulty, a hardware approach called direct memory access (DMA) is employed. In this method, the processor initiates an I/O operation by transferring parameters such as the starting address in memory and the block size to a hardware device called a DMA controller. After this, the DMA controller takes care of the I/O transfer, and the processor is free to do other tasks.

2.5 REDUCED INSTRUCTION SET COMPUTER (RISC)

... faster and ... desktop workstations ... and minicomputers ... technical computing market. This market is currently worth $8 billion, and sales could grow
to $28 billion by 1989. The major RISC vendors are Pyramid Technology, which makes
UNIX-based supermini computers, and RIDGE computers, which produces computers
for enginccring workstations.
The basic idea behind RISC is for machines to cost less yet run faster by using a small set of simple instructions. RISC also balances hardware and software according to the functions to be achieved, so that a program runs faster and more efficiently. The philosophy of RISC embraces six principles: reliance on optimizing compilers, few instructions and addressing modes, fixed instruction format, instructions executed in one machine cycle, only load/store instructions accessing memory, and hardwired control.
The trend has always been to build CISCs (complex instruction set computers), which use many detailed instructions. However, because of their complexity, more hardware must be used, which can actually slow down the computer: the more instructions there are, the more hardware logic is needed to implement and support them. For example, in a RISC machine, an ADD instruction takes its data from registers. On a VAX, each operand can be stored in any of 14 different forms, so the compiler must check 14 possibilities.
An understanding of optimizing compilers and of what actually happens when a program is executed leads to RISC. It turns out that RISC is really as much a philosophical approach as a design technique; for many implementers, RISC is just common sense.
However, not everyone in the industry favors RISC over CISC. Some computer architects see RISC as a fad or a misleading claim. They argue that the advantages of RISC have nothing to do with reduced instruction sets. A study done by D. Patterson [7] at the University of California, Berkeley, showed that much of the performance of RISC and CISC machines has come from having lots of registers rather than from having few instructions. Critics also noted that RISC designs must keep balancing program requirements against the number of available registers. If those factors got out of balance, the processor would perform many time-consuming memory swaps, negating much of the performance advantage. Hewlett-Packard's Spectrum machine, which uses only 32 registers, suggests that this is not always the case.
The Spectrum line was the result of the reduction and simplification of the HP 3000 instruction set. To exploit the increased power of the RISC-type architecture, users would employ an optimizing compiler to automatically recompile their existing application software. The software can run efficiently on everything from personal computers to mainframes. Hewlett-Packard was able to use only 32 registers on the basis of a tremendous amount of analysis and simulation. As a result, it achieved optimal performance with very few registers.
Critics also argue that RISC technology is not well suited for modern general-purpose computing jobs. One example is floating-point arithmetic for high-precision numerical calculations. These operations require more than one machine cycle to execute, so they do not fit into the single-cycle-per-instruction RISC philosophy. It is also very difficult to perform memory management, or to swap chunks of data between devices such as disk drives and CPU memory, with simple instructions. Therefore, none of the general-purpose computers built with RISC principles are completely RISC machines. They are called “RISC-like.” This means most include at least a few complex instructions as well as instructions taking more than one machine cycle. However, there are some “pure” RISC machines, built by universities, that have only 30 or 40 instructions.
Critics of RISC also claim that RISC relies too heavily on compiler design. Without a good optimizing compiler, RISC is no better than CISC and may be worse. Proponents counter that writing an optimizing compiler for a RISC machine is easier than for a CISC machine because of the simple RISC design.
2.5.1 Case Study: RISC I (University of California, Berkeley)
The RISC machine presented in this section is the one investigated by D. Patterson and C. Sequin [8]. The authors proposed the computer RISC I with the following design constraints:

1. Only one instruction is executed per cycle.
2. All instructions have the same size.
3. Only load and store instructions can access memory.
4. High-level languages (HLL) are supported.
Owing to the large user communities of C and Pascal, these are the two languages considered for RISC I. A simple architecture implies fewer transistors, which means that most pieces of a RISC HLL system are in software; hardware is used only for time-consuming operations. Using C and Pascal, a study was made to determine how often particular variable and statement types occur. The study revealed that integer constants appeared most frequently. A study of the compiled code revealed that procedure calls are the most time-consuming operations.
Basic RISC Architecture
The RISC I instruction set contains a few simple operations (arithmetic, logical, and shift operations). These instructions operate on registers. Instructions, data, addresses, and registers are all 32 bits long. RISC instructions fall into four categories: ALU, memory access, branch, and miscellaneous. The execution time is given by the time taken to read a register, perform an ALU operation, and store the result in a register. Register 0 always contains 0. Load and store instructions move data between registers and memory; these instructions use two CPU cycles. Variations of the memory-access instructions handle sign-extended or zero-extended 8-bit, 16-bit, and 32-bit data. Absolute and register indirect addressing are not directly available, but they may be synthesized using register 0. Absolute addressing is obtained by using register 0 as the index register with the address as the offset. Register indirect addressing is obtained by using the pointer register as the index with an offset of 0. Branch instructions include CALL, RETURN, and conditional and unconditional jumps. The conditional instructions available are the standard ones used in the PDP-11.
Instruction Format
| opcode (7) | scc (1) | dest (5) | source1 (5) | imm (1) | source2 (13) |

(Field widths in bits; the six fields total 32 bits.)
For register-to-register instructions, dest selects one of the 32 registers as the destination of the result. The operation itself is performed on registers source1 and source2. If imm equals 0, the low-order 5 bits of source2 specify another register. If imm equals 1, then source2 is regarded as a sign-extended 13-bit constant. Since integer constants occur frequently, the immediate field has been made an option in every instruction. Also, scc determines whether the condition codes are set. Memory-access instructions use source1 to specify the index register and source2 to specify the offset.
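The field layout above can be exercised with a small encoder and decoder. This sketch assumes the fields are packed most significant first in the order shown (opcode in bits 31 to 25, down to source2 in bits 12 to 0). The text does not list actual opcode values, so the opcode used below is arbitrary, and the register contents are hypothetical.

```python
FIELDS = [("opcode", 7), ("scc", 1), ("dest", 5), ("source1", 5), ("imm", 1), ("source2", 13)]


def encode(**values: int) -> int:
    """Pack named field values into a 32-bit word, most significant field first."""
    word, shift = 0, 32
    for name, width in FIELDS:
        shift -= width
        word |= (values.get(name, 0) & ((1 << width) - 1)) << shift
    return word


def decode(word: int) -> dict[str, int]:
    """Split a 32-bit word back into its six fields."""
    fields, shift = {}, 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def second_operand(fields: dict[str, int], registers: list[int]) -> int:
    """imm = 0: low 5 bits of source2 name a register; imm = 1: sign-extended 13-bit constant."""
    if fields["imm"] == 0:
        return registers[fields["source2"] & 0x1F]
    value = fields["source2"]
    if value & 0x1000:            # bit 12 is the sign bit of the 13-bit field
        value -= 1 << 13
    return value


regs = [10 * i for i in range(32)]                          # hypothetical register contents
word = encode(opcode=0x01, scc=1, dest=3, source1=4, imm=1, source2=-5)
```

Decoding `word` recovers dest = 3 and an immediate second operand of -5. With imm = 0 and source2 = 7, the second operand comes from register 7 instead.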
Register Windows
As mentioned earlier, the procedure-call statements take the maximum execution time.
A RISC program has more call statements, since the complex instructions available in
CISC are subroutines in RISC. ‘The RISC register window scheme strives to make the
call operation as fast as possible and also to reduce the number of accesses to data
memory. The scheme works as follows.
Using procedures involve two groups of time-consuming oper
ing oF restoring registers on cach call/return and passing parameters and results to and
from the procedure, Statistics indicate that local scalars are the most frequent operands.
This creates a need to support the allocation of locals in the registers. One avail-
able scheme is to provide multiple banks of registers on the chip to avoid saving and
restoring of registers. Thus cach procedure call results in a new set of registers being
allocated for use by that procedure. The return alters a pointer that restores the old set.
A similar scheme is adopted by RISC. However, there are some registers that are not
saved or restored; these are called global registers. In addition, the sets of registers used
by different procedures are overlapped in order to allow parameters to be passed. In other
machines, parameters are usually passed on the stack, with the calling procedure using
a register to point to the beginning of the parameters (and also to the end of the locals).
Thus all references to parameters are indexed references to memory. In RISC I, the set
of window registers (r10 to r31) is divided into three parts. Registers r26 to r31 (HIGH)
hold parameters passed from the calling procedure. Registers r16 to r25 (LOCAL)
are for local scalar storage. Registers r10 to r15 (LOW) are for local storage and for
parameters passed to the called procedure. On each call, a new set of r10 to r31
registers is allocated. The LOW registers of the caller are required to become the HIGH
registers of the called procedure. This is accomplished by having the hardware overlap
the LOW registers of the calling frame with the HIGH registers of the called frame.
Thus parameters are transferred without actually moving the information. (Refer to
Figure 2.18 for an illustration.)
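The overlap can be made concrete with a small register-mapping function. This is a simplified model, not the RISC I hardware: it assumes that r0 to r9 are the global registers, that each window owns 16 physical registers (its 6 LOW plus 10 LOCAL registers), and that 8 windows are arranged circularly. The names `physical_register`, `NUM_WINDOWS`, and the other constants are hypothetical. A window's HIGH registers simply alias the LOW registers of the window below it (its caller), so the caller's r10 and the callee's r26 name the same physical register.

```python
NUM_GLOBALS = 10      # assumed: r0-r9 are global, shared by every procedure
WINDOW_STRIDE = 16    # physical registers owned by one window: 6 LOW + 10 LOCAL
NUM_WINDOWS = 8       # hypothetical number of windows, arranged circularly
TOTAL_REGISTERS = NUM_GLOBALS + WINDOW_STRIDE * NUM_WINDOWS


def physical_register(window: int, r: int) -> int:
    """Map logical register r (0-31) seen by `window` to a physical register index."""
    if not 0 <= r <= 31:
        raise ValueError(f"no register r{r}")
    if r < 10:                    # GLOBAL: never saved or restored
        return r
    if r <= 15:                   # LOW (r10-r15): parameters for the called procedure
        owner, offset = window, r - 10
    elif r <= 25:                 # LOCAL (r16-r25): private local scalars
        owner, offset = window, 6 + (r - 16)
    else:                         # HIGH (r26-r31): aliases the caller's LOW registers
        owner, offset = window - 1, r - 26
    return NUM_GLOBALS + WINDOW_STRIDE * (owner % NUM_WINDOWS) + offset


if __name__ == "__main__":
    caller, callee = 2, 3         # a call moves to the next window
    for i in range(6):
        # Caller's LOW r10+i and callee's HIGH r26+i are one physical register,
        # so passing a parameter needs no data movement.
        assert physical_register(caller, 10 + i) == physical_register(callee, 26 + i)
    print(TOTAL_REGISTERS)
```

Under these assumptions each procedure sees only 32 registers, while the chip holds 10 + 16 × 8 physical registers; the modulo makes the window file wrap around, which is why a mechanism for running out of windows is needed.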
Multiple register banks require a mechanism to handle the case in which there are
no free register banks available. RISC handles this problem with a separate register-overflow
stack in memory and a stack pointer to it. Overflow and underflow are handled
with a trap to a software routine that adjusts the stack. The final step in allocating
variables in registers is handling the problem of pointers. RISC resolves this by giving
addresses to the window registers. If a portion of the address space is reserved, a single
comparison determines whether an address points to a register or to memory.
Load and store are the only instructions that access memory, and they already take an
extra cycle. Hence this feature may be added without reducing the performance of the load
and store instructions. This permits the use of straightforward compiler technology and
still leaves a large fraction of the variables in registers.
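The overflow and underflow handling, together with the one-comparison pointer test, can be modeled in a few lines. This is a deliberately simplified sketch: it treats each window as one frame and ignores the overlap, and the class `WindowFile`, its capacity, and the reserved boundary `REGISTER_SPACE_BASE` are hypothetical values chosen for illustration, not RISC I parameters.

```python
from collections import deque

REGISTER_SPACE_BASE = 0xFFFF_FF00  # hypothetical: top of a 32-bit address space maps to registers


def is_register_address(addr: int) -> bool:
    """One unsigned comparison decides whether a pointer names a register or memory."""
    return addr >= REGISTER_SPACE_BASE


class WindowFile:
    """On-chip windows backed by a register-overflow stack in memory (overlap ignored)."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity   # number of windows that fit on the chip
        self.on_chip = deque()     # resident frames, oldest on the left
        self.memory_stack = []     # frames spilled by the overflow trap
        self.traps = 0             # overflow + underflow traps taken

    def call(self, frame: dict) -> None:
        if len(self.on_chip) == self.capacity:
            # Overflow trap: the software routine spills the oldest window to memory.
            self.traps += 1
            self.memory_stack.append(self.on_chip.popleft())
        self.on_chip.append(frame)

    def ret(self) -> dict:
        frame = self.on_chip.pop()
        if not self.on_chip and self.memory_stack:
            # Underflow trap: the caller's window was spilled, so reload it.
            self.traps += 1
            self.on_chip.append(self.memory_stack.pop())
        return frame


if __name__ == "__main__":
    wf = WindowFile(capacity=4)
    for depth in range(10):        # call chain deeper than the window file
        wf.call({"depth": depth})
    order = [wf.ret()["depth"] for _ in range(10)]
    print(order, wf.traps)
```

Traps occur only when the call depth moves beyond the capacity of the window file; call sequences that stay within it never touch the overflow stack, which is how the scheme reduces data-memory traffic.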
[Figure: register windows of procedures A, B, and C, with regions HIGH_A, LOCAL_A, LOW_A/HIGH_B, LOCAL_B, LOW_B/HIGH_C, LOCAL_C, LOW_C, and a shared GLOBAL region]
Figure 2.18 Usage of Overlapped Register Windows (From Patterson, D., and
C. Sequin, A VLSI RISC, Computer, September 1982. Reprinted with permission.)
Delayed Jump
A normal RISC I instruction cycle is long enough to execute the following sequence of
operations:
1. Read a register.
2. Perform an ALU operation.
3. Store the result back into a register.
Performance is increased by prefetching the next instruction during the current
instruction. To facilitate this, jumps are redefined such that they do not occur until after
the following instruction. This is called delayed jump, and its significance is explained
in Chapter 7.
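The effect of redefining the jump can be shown with a toy interpreter. The instruction names, the tuple encoding, and `run` are hypothetical; this is not RISC I assembly. With `delayed=True`, a jump only records its target, and the instruction immediately after it (the delay slot) still executes before control transfers.

```python
def run(program, delayed=True):
    """Execute a tiny hypothetical instruction list; return (registers, trace of PCs)."""
    regs = {}
    trace = []
    pc = 0
    pending = None                       # jump target waiting for its delay slot to finish
    while pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        next_pc = pc + 1
        new_pending = None
        if op == "add":                  # add reg, imm
            reg, imm = args
            regs[reg] = regs.get(reg, 0) + imm
        elif op == "jmp":                # jmp target
            if delayed:
                new_pending = args[0]    # takes effect after the following instruction
            else:
                next_pc = args[0]        # conventional jump: takes effect immediately
        if pending is not None:          # this instruction was a delay slot
            next_pc = pending
        pending = new_pending
        pc = next_pc
    return regs, trace


PROGRAM = [
    ("jmp", 3),
    ("add", "r1", 10),    # delay slot: executed only under delayed-jump semantics
    ("add", "r1", 100),   # skipped in both cases
    ("halt",),
]

if __name__ == "__main__":
    print(run(PROGRAM, delayed=True))
    print(run(PROGRAM, delayed=False))
```

Because the slot instruction always executes, a compiler for such a machine must place either a useful instruction or a no-op after each jump.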
Evaluation
A study was made to compare RISC I with the VAX, PDP-11, and MC68000. The results
of the test are illustrated in Figure 2.19. The data was collected by studying the code
generated by a C compiler for these four machines for a procedure call. In the preceding
test, two parameters are assumed to have been passed and three ...

System  | Instructions executed | Size (bytes) | Register accesses | Data memory accesses
VAX     | 5                     | 16           | 59                | 19
MC68000 | 9                     | 30           | 41                | 12
PDP-11  | 19                    | 44           | 51                | 15
RISC I  | 6                     | 24           | 12                | 0.2
Figure 2.19 Performance Assessment of RISC I (From Patterson, D., and R. Piepho,
Assessing RISCs in High-level Language Support, IEEE Micro, 1982. Reprinted with
permission.)

2.5.2 RISC Assessment
The following material strives to show that RISC can provide as good a high-level
language (HLL) environment as CISC. Two metrics are used to compare the performance
of HLL programs on RISC and CISC: the first is speed and the second is the penalty of
using HLL on a given machine. The index of evaluation used for the latter is the ratio of
the execution speed of a program written in HLL to the speed of the same program written
in assembly language (equivalently, assembly-language execution time divided by HLL
execution time). This ratio is known as the HLL execution support factor (HLLESF).
A system with an HLLESF close to 0 penalizes the use of HLL, whereas an HLLESF
close to 1 does not reward the use of assembly language.
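Both metrics are simple ratios. The sketch below takes HLLESF as assembly-language execution time divided by HLL execution time, which equals HLL speed divided by assembly speed, so a value of 1.0 means no penalty for writing in HLL. The function names and the sample timings are hypothetical and are not data from the study.

```python
def hllesf(asm_time: float, hll_time: float) -> float:
    """HLL execution support factor: assembly time / HLL time (1.0 = no HLL penalty)."""
    if asm_time <= 0 or hll_time <= 0:
        raise ValueError("execution times must be positive")
    return asm_time / hll_time


def times_slower(machine_time: float, reference_time: float) -> float:
    """How many times slower a machine is than the reference machine on one benchmark."""
    if machine_time <= 0 or reference_time <= 0:
        raise ValueError("execution times must be positive")
    return machine_time / reference_time


if __name__ == "__main__":
    # Hypothetical timings in milliseconds.
    print(hllesf(asm_time=0.30, hll_time=0.50))          # HLL code at a fraction of assembly speed
    print(times_slower(machine_time=1.5, reference_time=0.5))
```

Note the direction of each ratio: a larger "times slower" value is worse for the other machine, while a larger HLLESF is better for HLL users.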
The results of an experiment conducted to evaluate the relative performance of
RISC, 68000, Z80, and VAX-11/780 with respect to the preceding two metrics are
reproduced in Figure 2.20.

Number of times slower than RISC
Benchmark        | RISC (ms) | 68000 | Z80 | VAX-11/780
E: String search | 0.46      | 2.8   | 1.6 | 1.3
F: Bit test      | 0.06      | 4.8   | 7.2 | 4.8
H: Linked list   | 0.10      | 1.6   | 2.4 | 1.2
K: Bit matrix    | 0.43      | 4.0   | 5.2 | 3.0
I: Quicksort     | ...       | ...   | ... | 4.1

HLLESF (HLL Execution Support Factor)
Benchmark | RISC | 68000 | Z80  | VAX-11/780
E         | 0.62 | 0.17  | 0.32 | 0.23
F         | 1.00 | 0.23  | 0.27 | 0.34
H         | 1.00 | 0.92  | 0.96 | 0.88
K         | 0.94 | 0.21  | 0.29 | 0.34
I         | 0.92 | 0.16  | 0.44 | 0.47
Figure 2.20 Two Metrics to Assess the RISC's Effectiveness in HLL Support (From
Patterson, D., and R. Piepho, Assessing RISCs in High-level Language Support, IEEE
Micro, 1982. Reprinted with permission.)

Five benchmark programs have been utilized for this experiment. These are selected as
being representative of frequent “real-world” problems. They allow manipulation of
characters, integers, and floating-point data besides testing interrupt handling and
addressing modes. Actually there are 12 such programs, but 7 were omitted either due to
the lack of virtual memory or the difficulty involved in writing them in HLL. A brief
description of the remaining 5 follows:
1. String search: Examines a long character string for the first occurrence of a
substring.
2. Bit test, set, and reset: Tests, sets, or resets a bit within a tightly packed bit
string.
3. Linked-list insertion: Inserts a new entry into a doubly linked list.
4. Quicksort: Performs a nonrecursive quicksort algorithm on large vectors of
fixed-length records.
5. Bit-matrix transposition: Takes a tightly packed square bit matrix and transposes it.
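To make two of these workloads concrete, the sketch below implements bit test, set, and reset on a packed bit string, plus a straightforward bit-matrix transposition. It illustrates what the benchmarks compute and is not the benchmark code used in the study; all function names and the bit-ordering convention (bit 0 is the least significant bit of byte 0, matrices stored row-major) are assumptions.

```python
def get_bit(bits: bytearray, i: int) -> bool:
    """Bit test: return bit i of a tightly packed bit string."""
    return bool((bits[i >> 3] >> (i & 7)) & 1)


def set_bit(bits: bytearray, i: int) -> None:
    """Bit set: force bit i to 1."""
    bits[i >> 3] |= 1 << (i & 7)


def reset_bit(bits: bytearray, i: int) -> None:
    """Bit reset: force bit i to 0 (mask kept within one byte)."""
    bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF


def transpose(matrix: bytearray, n: int) -> bytearray:
    """Transpose an n x n bit matrix packed row-major, one bit per element."""
    result = bytearray((n * n + 7) // 8)
    for row in range(n):
        for col in range(n):
            if get_bit(matrix, row * n + col):
                set_bit(result, col * n + row)
    return result


if __name__ == "__main__":
    flags = bytearray(2)            # a 16-bit packed bit string
    set_bit(flags, 9)
    print(get_bit(flags, 9), flags.hex())
    reset_bit(flags, 9)
    print(get_bit(flags, 9), flags.hex())

    m = bytearray(2)                # a 3 x 3 matrix needs 9 bits, so 2 bytes
    set_bit(m, 0 * 3 + 1)           # element (row 0, col 1)
    t = transpose(m, 3)
    print(get_bit(t, 1 * 3 + 0))    # element (row 1, col 0) of the transpose
```

Each operation needs shifts and masks to reach a single bit, which is why these benchmarks stress a machine's bit-manipulation and addressing support.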
RISC is an application of the scientific and philosophic rule called Occam's razor,
which states that among competing theories, the simplest should be preferred to the
complex. RISC, which is a trend toward simplicity, will eventually have a definite
influence on the computers of the future.
QUESTIONS AND PROBLEMS
2.1 What are the characteristics of a good instruction format?
2.2 What are the merits and demerits of the block-code encoding technique?
2.3 Explain the key idea behind the expanding op-code and Huffman encoding techniques.
2.4 In a computer instruction format, the instruction length and the size of an address
field are 11 and 4 bits, respectively. Is it possible to have
5 two-address instructions
45 one-address instructions
32 zero-address instructions
using this format? Justify your answer.
2.5 Using the instruction format of Problem 2.4, determine whether or not it is possible to have