Block-3 The Processing Unit
9.0 Introduction
9.1 Objectives
9.2 Instruction Set Characteristics and Design Considerations
9.2.1 Operand Data Types
9.2.2 Types of Instructions
9.2.3 Stored Program Organization
9.3 Number of Addresses and Instruction size
9.4 Instruction Set and Format Design Issues
9.5 Addressing Schemes
9.5.1 Immediate Addressing
9.5.2 Direct Addressing
9.5.3 Indirect Addressing
9.5.4 Register Addressing
9.5.5 Register Indirect Addressing
9.5.6 Relative Addressing Scheme
9.5.7 Base Register Addressing
9.5.8 Indexed Addressing Scheme
9.5.9 Stack Addressing
9.6 Summary
9.7 Answers/Solutions
9.0 INTRODUCTION
In the previous two blocks, you have learnt the concepts of data representation, memory organization and input/output organisation. This unit discusses one of the most fundamental aspects of a general-purpose computer: the instruction. A computer performs general-purpose or specific tasks by executing instructions. Instructions are the mediator between the programmer and the hardware. The instruction set architecture (ISA) is the interface between the software and the hardware of a computer, and it is the only way software can interact with the machine. An instruction set architecture consists of the complete set of instructions supported by a specific computer system.
This unit discusses instruction formats, operand data types, instruction types and various addressing modes in detail.
9.1 OBJECTIVES
9.2 INSTRUCTION SET CHARACTERISTICS AND DESIGN CONSIDERATIONS
A programmer writes program instructions in assembly language, which is easy for humans to understand. However, a computer cannot understand assembly language directly, as it is made of digital modules that work only with binary logic. Hence, before execution, the program is translated (assembled) into binary code that the machine can interpret. Each binary-coded instruction follows a specific format, which the machine decodes and executes. The instruction format is discussed next.
Instruction format
A simple instruction format is shown in Figure 9.1. It consists of three components: the mode, the operation code (opcode) and the operand address. A fixed number of bits is allocated to each component.
Figure 9.1: A simple instruction format (bit 31: I, the mode bit; bits 30 to 24: Opcode; bits 23 to 0: Operand)
Note that the operand field of the instruction format in Figure 9.1 is 24 bits wide. If this field holds a direct memory address, then a machine with this instruction format can address 2^24 = 16 M memory words.
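The field layout of Figure 9.1 can be checked with a short Python sketch. It assumes the instruction word is held as an unsigned 32-bit integer; the function name decode_fig_9_1 and the sample word are illustrative only and do not belong to any real instruction set.

```python
def decode_fig_9_1(word: int) -> dict:
    """Split a 32-bit instruction word into the fields of Figure 9.1."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "I": (word >> 31) & 0x1,        # bit 31: mode bit
        "opcode": (word >> 24) & 0x7F,  # bits 30-24: 7-bit opcode
        "operand": word & 0xFFFFFF,     # bits 23-0: 24-bit operand field
    }

# A 24-bit direct address can name 2**24 distinct memory words.
ADDRESSABLE_WORDS = 2**24
print(ADDRESSABLE_WORDS, ADDRESSABLE_WORDS // 2**20, "M words")  # 16777216 16 M words

# Hypothetical instruction word: I = 1, opcode = 5, operand address = 200
word = (1 << 31) | (5 << 24) | 200
print(decode_fig_9_1(word))  # {'I': 1, 'opcode': 5, 'operand': 200}
```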
9.2.1 Operand Data Types
In a computer, operands are the data bits on which an operation is performed. As shown in Figure 9.2, operand data types can be categorised as logical, numeric and character data. Logical data is Boolean, that is, 1/0 or true/false. Numeric data may be further divided into decimal, fixed-point and floating-point data. Character data use codes such as ASCII and Unicode.
9.2.2 Types of Instructions
Data transfer instructions: In a computer, data transfer takes place between processor registers, between memory and a processor register, and between a processor register and an input or output interface. Instructions whose operands are in processor registers execute faster than instructions whose operands are in memory. Each instruction has a mnemonic, which is used in assembly language. Note that these mnemonics may vary from machine to machine. Some data transfer instructions are listed in Table 9.1.
Table 9.1: Some Data transfer instructions of computer
Logical and bit manipulation instructions: These instructions perform logical operations, or manipulate individual data bits by setting or resetting them. Examples of logical operations are AND, OR and XOR.
Bit manipulation, selective set and mask: Assume A = 1010, and you need to set the least significant bit (LSB) while keeping the remaining bits unchanged. Since only the LSB is to be set, the second operand for the selective set is B = 0001, and the OR operation is used, as shown below:
A 1010
B 0001
A OR B 1011
Notice that the upper three bits of the result (A OR B) remain unchanged, whereas the LSB is set to 1. Similarly, the AND operation can be used to clear selected bits. For example, for A = 1011, suppose you want to keep only the LSB, in other words, make the upper three bits 0. This is done with AND, using B = 0001. This operation is called the mask operation and is shown below:
A 1011
B 0001
A AND B 0001
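The selective set and mask operations above map directly onto the bitwise OR and AND operators. A minimal Python sketch, assuming 4-bit values as in the examples; the helper names are illustrative:

```python
def selective_set(a: int, b: int) -> int:
    """Set the bits of a that are 1 in b; other bits of a are unchanged (OR)."""
    return a | b

def mask(a: int, b: int) -> int:
    """Keep only the bits of a that are 1 in b and clear the rest (AND)."""
    return a & b

def bits(value: int, width: int = 4) -> str:
    """Format value as a fixed-width binary string."""
    return format(value, f"0{width}b")

print(bits(selective_set(0b1010, 0b0001)))  # 1011
print(bits(mask(0b1011, 0b0001)))           # 0001
```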
Shift instructions: Shift instructions shift the data bits of a register to the left or to the right. In shl R, the data bits of register R are shifted left: a 0 enters at the rightmost bit and the leftmost bit is lost. In shr R, the data bits are shifted right: a 0 enters at the leftmost bit and the rightmost bit is lost. In a circular shift, no bit is lost. In cil R, the data bits move to the left and the leftmost bit is stored in the rightmost bit. In cir R, the data bits move to the right and the rightmost bit is stored in the leftmost bit. Table 9.2 shows these two types of shift operations.
Table 9.2: Shift operations on register R = 0111
Operation              Symbol   Effect                                                    Result
Shift right            shr R    Rightmost bit is lost; 0 enters at the leftmost bit       0011
Circular shift left    cil R    Leftmost bit is moved to the rightmost bit; no bit loss    1110
Circular shift right   cir R    Rightmost bit is moved to the leftmost bit; no bit loss    1011
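The four shift operations can be reproduced on a 4-bit register with a short Python sketch. It assumes a register width of 4 bits, as in Table 9.2, and models shl and shr as logical shifts that bring in 0:

```python
WIDTH = 4                  # register size used in Table 9.2
MASK = (1 << WIDTH) - 1    # 0b1111, keeps results within the register

def shl(r: int) -> int:
    """Logical shift left: 0 enters at the right, leftmost bit is lost."""
    return (r << 1) & MASK

def shr(r: int) -> int:
    """Logical shift right: 0 enters at the left, rightmost bit is lost."""
    return r >> 1

def cil(r: int) -> int:
    """Circular shift left: leftmost bit moves to the rightmost position."""
    return ((r << 1) & MASK) | (r >> (WIDTH - 1))

def cir(r: int) -> int:
    """Circular shift right: rightmost bit moves to the leftmost position."""
    return (r >> 1) | ((r & 1) << (WIDTH - 1))

R = 0b0111
for name, op in (("shl", shl), ("shr", shr), ("cil", cil), ("cir", cir)):
    print(name, format(op(R), "04b"))
# shl 1110, shr 0011, cil 1110, cir 1011
```

For R = 0111, shl and cil give the same result only because the bit shifted out on the left happens to be 0.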
Program control instructions: The address of the next instruction to be executed is stored in the program counter (PC) register. Program control instructions may specify a condition which, when satisfied, alters the address in the program counter. Hence, executing a program control instruction changes the PC and breaks the sequential flow of execution.
Branch (BR) and Jump (JMP): Branch and jump instructions change the flow of control of a program to a new instruction address, specified as the target of the instruction. These instructions may be conditional or unconditional. For example:
JMP is an unconditional branch to an instruction address. It may be used to implement simple loops.
JNE (jump if not equal) is a conditional branch instruction. It checks the zero flag to determine whether two operands are equal (how the flags are set is explained in Block 4). The jump to the specified instruction address takes place only if the zero flag is not set.
Consider an example in which a conditional branch loads the PC with memory location 707 if the content of the accumulator register (AC) is zero, and an unconditional JUMP instruction continues execution from memory location 901.
SKIP: The SKIP instruction skips the next instruction in sequence; hence, it increments the PC by one instruction length. SKIP can also be conditional. For example, the ISZ (increment and skip if zero) instruction increments a memory word and skips the next instruction only if the result is zero.
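How these instructions change the PC can be sketched in Python. This is a minimal model, assuming every instruction occupies one word and that the PC has already been advanced past the current instruction when it executes. JMP is from the text; BZA (branch if AC is zero) and SKP (unconditional skip) are hypothetical mnemonics chosen for this sketch.

```python
def next_pc(pc: int, op: str, target: int = 0, ac: int = 0, length: int = 1) -> int:
    """Return the PC value after executing one instruction stored at address pc.

    op is a mnemonic used only in this sketch:
      'JMP' - unconditional jump to target
      'BZA' - hypothetical conditional branch to target if AC is zero
      'SKP' - hypothetical skip of the next instruction
      other - ordinary instruction, execution continues in sequence
    """
    pc += length                  # after fetch, PC points to the next instruction
    if op == "JMP":
        return target
    if op == "BZA" and ac == 0:
        return target
    if op == "SKP":
        return pc + length        # step over the next instruction
    return pc

print(next_pc(300, "BZA", 707, ac=0))  # 707: AC is zero, branch taken
print(next_pc(300, "BZA", 707, ac=5))  # 301: branch not taken
print(next_pc(300, "JMP", 901))        # 901: unconditional jump
print(next_pc(300, "SKP"))             # 302: next instruction skipped
```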
Subroutine call and return instructions: The subroutine call instruction holds the opcode and the start address of the subroutine. When a subroutine call instruction is executed, the return address (the address of the instruction following the call, which is held in the PC) is first saved in memory, and then the PC is loaded with the subroutine address from the call instruction. Once the subroutine completes, the program must return to the calling program and execute the instruction after the call, whose address was saved as the return address. Therefore, every subroutine ends with a return instruction.
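The call and return sequence described above can be sketched in Python. The text only says the return address is saved in memory; this sketch assumes one common arrangement, a stack in memory that grows downwards from a hypothetical address 1000, which also lets subroutines call other subroutines. Each CALL is assumed to occupy one word.

```python
class Machine:
    """Toy machine showing CALL/RET with return addresses on a memory stack."""

    def __init__(self, stack_top: int = 1000):
        self.pc = 0
        self.memory = {}        # address -> word
        self.sp = stack_top     # hypothetical stack area, grows downwards

    def call(self, subroutine_addr: int, length: int = 1) -> None:
        return_addr = self.pc + length        # instruction after the CALL
        self.sp -= 1
        self.memory[self.sp] = return_addr    # save return address in memory
        self.pc = subroutine_addr             # PC <- subroutine address

    def ret(self) -> None:
        self.pc = self.memory[self.sp]        # PC <- saved return address
        self.sp += 1

m = Machine()
m.pc = 100
m.call(500)      # main program calls subroutine at 500
m.pc = 503
m.call(800)      # the subroutine calls another subroutine at 800
m.ret()
print(m.pc)      # 504
m.ret()
print(m.pc)      # 101
```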
9.2.3 Stored Program Organization
A computer is organised into processor registers and a memory unit. Memory holds the program instructions as well as the operands (data) on which operations are performed. For example, consider a memory of 4096 words of 16 bits each, as shown in Figure 9.3, in which a one-word instruction contains an opcode and one operand address. Since the memory size is 2^12 = 4096 words, the operand address is 12 bits, and the remaining 4 bits hold the opcode, which specifies the operation to be performed. The operand address in the instruction is the memory location that holds the 16-bit operand.
The accumulator (AC) is the processor register in this organisation. Operations are performed on the memory operand and the accumulator content. A detailed and enhanced example of this structure is given in Block 4, which discusses the 8086 microprocessor.
Figure 9.3: Stored program organization (a 4096 × 16 memory holding program instructions and operand data; processor register AC; instruction format: bits 15 to 12 opcode, bits 11 to 0 operand address; operand: bits 15 to 0)
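The stored-program idea, with instructions and data sharing one 4096-word memory, can be sketched as a small fetch-decode-execute loop in Python. The 4-bit opcode values (HALT, LOAD, ADD, STORE) are hypothetical and chosen only for this illustration; the 4/12-bit split follows Figure 9.3.

```python
MEMORY_WORDS = 4096            # 2**12 words, so an address needs 12 bits
memory = [0] * MEMORY_WORDS    # each entry models one 16-bit word

# Hypothetical 4-bit opcodes for this sketch only
HALT, LOAD, ADD, STORE = 0x0, 0x1, 0x2, 0x3

def encode(opcode: int, address: int) -> int:
    """Pack a 4-bit opcode (bits 15-12) and a 12-bit address (bits 11-0)."""
    return ((opcode & 0xF) << 12) | (address & 0xFFF)

def run(start: int = 0) -> int:
    """Fetch-decode-execute loop; returns the final accumulator value."""
    ac, pc = 0, start
    while True:
        word = memory[pc]                    # fetch
        pc += 1
        opcode, address = word >> 12, word & 0xFFF   # decode
        if opcode == HALT:
            return ac
        elif opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF     # keep a 16-bit result
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")

# Program (locations 0-3) and data (locations 100-102) share the same memory.
memory[0:4] = [encode(LOAD, 100), encode(ADD, 101), encode(STORE, 102), encode(HALT, 0)]
memory[100], memory[101] = 25, 17
print(run(), memory[102])   # 42 42
```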
3. Suppose A contains 0101 1110 and B contains 0000 1111. What would be the value of:
(a) Selective Set of A using B
(b) Masking of A by B
9.3 Number of Addresses and Instruction size
What is the effect of the number of addresses in an instruction on the instruction size and on the program? This is discussed with the help of a program that evaluates the arithmetic expression:
A = (W + X) × (Y + Z)
Zero address instructions: The instructions of stack-organised computers do not have an address field, except for the PUSH and POP instructions, as the following program shows. The operands are implicit and are taken from the top of the stack; hence, these are called zero address instructions. The program to evaluate A is given in Table 9.4.
Table 9.4: Zero address instructions
Instruction   Top of stack after execution
PUSH W        W
PUSH X        X
ADD           W+X
PUSH Y        Y
PUSH Z        Z
ADD           Y+Z
MUL           (W+X)×(Y+Z)
POP A         Stack is empty; A = (W+X)×(Y+Z)
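The program of Table 9.4 can be executed on a toy stack machine to confirm that it computes A. A minimal Python sketch, assuming symbolic memory names and the sample values W = 2, X = 3, Y = 4, Z = 5 (chosen only for illustration):

```python
def run_stack_program(program, memory):
    """Execute zero-address instructions; only PUSH and POP name a memory operand."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "PUSH":
            stack.append(memory[arg[0]])
        elif op == "POP":
            memory[arg[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            # both operands are taken implicitly from the top of the stack
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction {instr}")
    return memory

program = ["PUSH W", "PUSH X", "ADD", "PUSH Y", "PUSH Z", "ADD", "MUL", "POP A"]
memory = {"W": 2, "X": 3, "Y": 4, "Z": 5}
print(run_stack_program(program, memory)["A"])   # (2+3)*(4+5) = 45
```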
One address instructions: In these instructions, only one address field is present, while the second operand is an implicit register operand, the accumulator (AC). The set of instructions that evaluates the stated expression is given in Table 9.5. Observe that every instruction in Table 9.5 specifies just one operand address.
Two address instructions: Here, each instruction holds two addresses, each of which may be a memory address or a processor register. As listed in Table 9.6, every instruction holds two addresses: one is a processor register and the other is a memory location.
Table 9.6: Two-addresses instructions
Three address instructions: These instructions hold three addresses, each of which may be a processor register or a memory location. As shown in Table 9.7, every instruction holds three addresses. For example:
ADD R2, Y, Z    R2 ← M[Y] + M[Z]    The data of memory locations Y and Z are added, and the result is stored in processor register R2
Advantage of increasing the number of addresses: As Tables 9.4, 9.5, 9.6 and 9.7 show, the number of instructions required to evaluate the arithmetic expression decreases as the number of addresses per instruction increases.
Disadvantage of increasing the number of addresses: The number of bits required to encode an instruction increases, thus increasing the instruction length.
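The trade-off can be checked by running one-, two- and three-address programs for A = (W+X)×(Y+Z) on a small interpreter. The programs below are typical textbook versions written for this sketch (T is a temporary memory location, R1 and R2 are processor registers); they are not necessarily identical to Tables 9.5 to 9.7.

```python
import operator

OPS = {"ADD": operator.add, "MUL": operator.mul}

def run(program, loc):
    """Interpret one-, two- and three-address instructions.

    loc maps register names (AC, R1, R2) and memory names (W, X, ...) to values.
    """
    for instr in program:
        op, _, rest = instr.partition(" ")
        args = [a.strip() for a in rest.split(",")]
        if len(args) == 1:                    # one address: AC is implicit
            if op == "LOAD":
                loc["AC"] = loc[args[0]]
            elif op == "STORE":
                loc[args[0]] = loc["AC"]
            else:
                loc["AC"] = OPS[op](loc["AC"], loc[args[0]])
        elif len(args) == 2:                  # two address: destination is also a source
            dest, src = args
            loc[dest] = loc[src] if op == "MOV" else OPS[op](loc[dest], loc[src])
        else:                                 # three address: destination, source1, source2
            dest, s1, s2 = args
            loc[dest] = OPS[op](loc[s1], loc[s2])
    return loc

programs = {
    "one-address": ["LOAD W", "ADD X", "STORE T", "LOAD Y", "ADD Z", "MUL T", "STORE A"],
    "two-address": ["MOV R1, W", "ADD R1, X", "MOV R2, Y", "ADD R2, Z", "MUL R1, R2", "MOV A, R1"],
    "three-address": ["ADD R1, W, X", "ADD R2, Y, Z", "MUL A, R1, R2"],
}
for name, prog in programs.items():
    result = run(prog, {"W": 2, "X": 3, "Y": 4, "Z": 5})["A"]
    print(f"{name}: {len(prog)} instructions, A = {result}")
```

With these programs the instruction count falls from 7 to 6 to 3, while each instruction must carry more address fields.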
Reduced Instruction Set Computers (RISC): RISC computational instructions use three processor registers, whereas memory access instructions, such as load and store, use one memory address and one processor register. Instructions that perform computations do not access memory. Table 9.8 illustrates a program for a RISC machine.
Table 9.8: reduced instruction set
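The load/store rule can be made explicit in a small Python sketch: only LOAD and STORE touch memory, and ADD/MUL accept register operands only. The program is a typical RISC-style version of A = (W+X)×(Y+Z) written for this illustration, and the convention that register names start with "R" is an assumption of the sketch.

```python
def run_risc(program, memory):
    """Load/store machine: only LOAD/STORE access memory; ALU ops use registers."""
    reg = {}
    alu = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    for instr in program:
        op, _, rest = instr.partition(" ")
        args = [a.strip() for a in rest.split(",")]
        if op == "LOAD":                       # LOAD Rd, M  : Rd <- M[M]
            reg[args[0]] = memory[args[1]]
        elif op == "STORE":                    # STORE M, Rs : M[M] <- Rs
            memory[args[0]] = reg[args[1]]
        elif op in alu:                        # OP Rd, Rs1, Rs2 : registers only
            if not all(a.startswith("R") for a in args):
                raise ValueError(f"{op} may use registers only: {instr}")
            d, s1, s2 = args
            reg[d] = alu[op](reg[s1], reg[s2])
        else:
            raise ValueError(f"unknown instruction {instr}")
    return memory

program = ["LOAD R1, W", "LOAD R2, X", "LOAD R3, Y", "LOAD R4, Z",
           "ADD R1, R1, R2", "ADD R3, R3, R4", "MUL R1, R1, R3", "STORE A, R1"]
result = run_risc(program, {"W": 2, "X": 3, "Y": 4, "Z": 5})["A"]
print(f"{len(program)} instructions, A = {result}")   # 8 instructions, A = 45
```

The RISC program needs more instructions than the three-address version, but each computational instruction names only registers, which keeps the format simple.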
9.4 Instruction Set and Format Design Issues
Figure 9.1 in Section 9.2 shows an example instruction format. This format stores only one operand address, in the lowest 24 bits (bit 0 to bit 23) of the instruction. If this address is a direct operand address, then the format can address only 2^24 = 16 M words of memory; this assumes that each memory address identifies one memory word. The opcode is 7 bits (bit 24 to bit 30), so there can be 2^7 = 128 distinct operation codes on this machine. The most significant bit (bit 31) is the addressing mode bit, which in this format selects direct or indirect memory addressing. The instructions of a computer can have several addressing modes, which are explained in the next section. Thus, an instruction format has three components: opcode, operand and addressing mode, and the instruction length depends on the number of bits allocated to each component. The issues in instruction format design are as follows:
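Instruction Length: Instruction length is a critical design choice, and there is a trade-off between short and long instructions. Longer instructions can accommodate more opcodes, operands and addressing modes, whereas shorter instructions need less memory and can be fetched faster.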
Bit allocation to operand and opcode: The bits allocated to an operand depend on the addressing mode used. For example, a register address requires fewer bits than a memory address. The number of opcodes also depends on the flexibility required. For example, a machine may have several addition operations, such as one memory and one immediate operand, one memory and one register operand, or two memory operands, each assigned a different opcode. Addressing mode bits can reduce the number of such opcodes, since a single add opcode combined with mode bits can cover these cases. RISC computers use only register addressing (except for memory read and write instructions), which simplifies the instruction format.
Addressing mode: The number of bits allocated to the addressing mode depends on the number of addressing modes in the instruction set. k mode bits can distinguish up to 2^k modes, so more addressing modes require more bits.
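Every field-width calculation in this section (opcode bits, mode bits, register and memory address bits) reduces to finding the smallest k with 2^k at least the number of choices. A minimal Python sketch; the function name field_bits is illustrative:

```python
def field_bits(choices: int) -> int:
    """Smallest number of bits that can encode `choices` distinct values."""
    if choices < 1:
        raise ValueError("need at least one choice")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact integer arithmetic
    return (choices - 1).bit_length()

examples = {
    "opcodes (Figure 9.1)": 128,
    "direct/indirect mode": 2,
    "eight addressing modes": 8,
    "words in a 4096-word memory": 4096,
    "256 registers": 256,
    "bytes in a 1 MB memory": 2**20,
}
for what, n in examples.items():
    print(f"{what}: {field_bits(n)} bits")
# 7, 1, 3, 12, 8 and 20 bits respectively
```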
9.5 ADDRESSING SCHEMES
An addressing scheme (addressing mode) specifies how the effective address of an operand is obtained from an instruction. Addressing schemes give the programmer flexibility in writing programs. There are a number of addressing modes, and a machine can support several of them.
Implied mode: In this mode, the operand is implied by the instruction. For example, CPLA is an instruction that complements the accumulator register. The instruction does not contain the address of the operand register; however, it has one implied operand, the accumulator register. A few other examples of implied addressing mode instructions are INCA and DECA, which process the accumulator content:
INCA Increments accumulator register content by one
DECA Decrements accumulator register content by one
9.5.2 Direct Addressing
In this mode, the instruction specifies the memory address where the operand is stored. This is one of the most fundamental addressing modes and is present in most machines, including RISC machines. For example, on execution of the instruction Load AC 200, the content of memory location 200 is loaded in the accumulator register.
In this instruction, memory address 200 holds the operand, which is loaded in the processor register accumulator. Therefore, the address given in the instruction is the effective address. Figure 9.4 shows this instruction at location 20. As per Figure 9.4, location 200 contains the operand 350.
Thus, Effective Address of operand = 200
The value of operand which would be loaded in AC = 350.
Similarly, for an instruction Load R, 350 in Figure 9.4, the effective address would be 350, and the operand value 10 stored in memory address 350 would be loaded in processor register R.
9.5.3 Indirect Addressing
In this addressing mode, the address given in the instruction is the memory address where the effective address of the operand is stored. For example, consider the following instruction in Figure 9.4:
Load AC (200)
This instruction is stored at memory location 21, as shown in Figure 9.4. Memory address 200, given in the instruction, contains 350, which is the effective address of the operand. The operand stored at 350 is 10. Thus, in the indirect addressing mode for this example:
Effective Address of operand = Content of location (200) = 350
The value of operand which would be loaded in AC = 10.
Figure 9.4: Example memory contents
Address   Content
20        Load AC 200
21        Load AC (200)
200       350
350       10
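The direct and indirect examples of Figure 9.4 can be reproduced with a small Python sketch. It models memory as a dictionary holding only the two data words from the figure; the function names are illustrative.

```python
# Data words of Figure 9.4 (address -> content)
memory = {200: 350, 350: 10}

def direct(addr: int) -> tuple:
    """Direct mode: the address field of the instruction is the effective address."""
    ea = addr
    return ea, memory[ea]

def indirect(addr: int) -> tuple:
    """Indirect mode: the addressed word holds the effective address."""
    ea = memory[addr]
    return ea, memory[ea]

print(direct(200))    # (200, 350)  Load AC 200
print(indirect(200))  # (350, 10)   Load AC (200)
print(direct(350))    # (350, 10)   Load R, 350
```

Indirect addressing costs one extra memory access (reading location 200 to obtain 350) compared with direct addressing.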
9.5.4 Register Addressing
In this mode, the operands are in registers that reside within the CPU. For example, consider the instruction:
LD R1
which loads the content of register R1 into the accumulator register (AC). In this instruction, R1 is a processor register. If this instruction is executed on the machine shown in Figure 9.5, the effective address is the address of register R1 itself. On execution, the content of R1, which is 201, is loaded in the AC. Therefore, the AC contains 201 after execution of this instruction.
9.5.5 Register Indirect Addressing
In register indirect addressing, the instruction specifies a processor register that holds the address of the operand in memory. For example, the instruction
LD (R1)
loads the accumulator with the operand from the memory location whose address is in register R1:
AC ← M[R1]
For example, given the values stored in different registers and memory
locations, as shown in Figure 9.5, the instruction LD(R1), which is a register
indirect addressing mode will be interpreted as follows:
• The register R1 contains the value 201, therefore, the effective address of
operand is memory address 201, i.e. EA=201
• The operand value stored in memory address 201 is 175. Hence, content of
memory location 201, which is 175 is loaded in AC register.
Therefore, content of the AC after execution of this instruction would be 175.
9.5.6 Relative Addressing Scheme
In this mode, the operand is stored at the memory address obtained by adding the address given in the instruction to the current content of the program counter (PC). For example, the instruction:
LD $ADR ; ADR is the value of the operand address field of instruction.
results in the operation AC ← M[PC+ADR], which means that the operand is located at memory address (effective address) PC+ADR. Assume this instruction is at location 100 of Figure 9.5 and is written as:
LD $100 ; Please note ADR=100 in this instruction.
The effective address is the content of the PC plus the ADR of the instruction. The instruction starts at location 100 and occupies two words (locations 100 and 101), so after it is fetched, the PC is incremented to the address of the next instruction, 102. Thus, the effective address is 102 + 100 = 202, which means that the operand is at memory location 202.
EA = 202
Therefore, on execution of this instruction, the content of memory location 202, which is 130, is loaded in the accumulator.
9.5.7 Base Register Addressing
In the base register addressing scheme, the instruction specifies a base register and an offset from this register. The effective address is found by adding the base register content to the address part of the instruction. For example,
LD BR, ADR
Here the instruction specifies the base register (BR) and, in its address part (ADR), the offset from the base register. Assuming that the instruction at location 100 uses base register addressing, it can be written as:
LD BR, 100
Since the base register holds BR = 100 and ADR = 100, the effective address is:
EA = BR + ADR = 100 + 100 = 200
Therefore, on execution of the instruction, the content of memory location 200, which is 170, is loaded in the accumulator:
AC ← 170
9.5.8 Indexed Addressing Scheme
In the indexed addressing scheme, the instruction specifies the address of a memory location, and the value of an index register, which represents an offset, is added to it to find the effective address of the operand. This scheme is very useful for accessing an array, where the index register holds the offset of an element within the array. For example, the following instruction uses the index register XR, and the starting address of the data is stored in ADR, the operand field of the instruction.
LD ADR(XR)
The instruction performs the following operation:
AC ← M[ADR+XR]
For example, if the ADR portion of the instruction is 100 and the index register refers to element 200 of an array, that is, XR = 200, then the effective address is:
EA = 100 + 200 = 300
Therefore, on execution of this instruction, the content of memory location 300 is loaded in the accumulator. Thus, the AC is loaded with 140 (refer to Figure 9.5).
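Figure 9.5: Example register and memory contents
Instruction at location 100: Load AC, Mode; address field ADR = 100
Registers: R1 = 201, BR = 100
Address   Content
200       170
201       175
202       130
300       140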
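The five addressing modes worked through with Figure 9.5 can be checked together in one Python sketch. It takes the register and memory values from the figure and the text (including XR = 200 from the indexed example) and assumes, as in the relative addressing example, that the instruction occupies locations 100 and 101 so the PC holds 102 after fetch.

```python
# Register and memory contents from Figure 9.5 and the worked examples
memory = {200: 170, 201: 175, 202: 130, 300: 140}
regs = {"R1": 201, "BR": 100, "XR": 200}

INSTR_ADDR = 100                 # instruction occupies locations 100 and 101
ADR = 100                        # address field of the instruction
PC_AFTER_FETCH = INSTR_ADDR + 2  # PC points to the next instruction

def load_ac(mode: str) -> tuple:
    """Return (effective address, value loaded in AC) for the given mode."""
    if mode == "register":              # LD R1   : operand is R1 itself
        return "R1", regs["R1"]
    if mode == "register indirect":     # LD (R1) : EA = R1
        ea = regs["R1"]
    elif mode == "relative":            # LD $ADR : EA = PC + ADR
        ea = PC_AFTER_FETCH + ADR
    elif mode == "base":                # LD BR, ADR : EA = BR + ADR
        ea = regs["BR"] + ADR
    elif mode == "indexed":             # LD ADR(XR) : EA = ADR + XR
        ea = ADR + regs["XR"]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return ea, memory[ea]

for mode in ("register", "register indirect", "relative", "base", "indexed"):
    ea, ac = load_ac(mode)
    print(f"{mode:<18} EA = {ea}  AC = {ac}")
```

The printed values match the text: AC = 201, 175, 130, 170 and 140 respectively.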
9.5.9 Stack Addressing
A stack operates on the Last In First Out (LIFO) principle: just as with books placed one above another on a shelf, the last book placed is the first to be taken out. A stack is a collection of a finite number of words; Figure 9.6 shows a 64-word stack of memory locations. A register called the Stack Pointer (SP) holds the address of the word at the top of the stack. The data register DR holds the data to be written to or read from the stack. FULL and EMPTY are one-bit registers that indicate whether the stack is full or empty: FULL is 1 when the stack is full, and EMPTY is 1 when the stack is empty. New data is inserted in the stack using PUSH when the stack is not full, i.e. FULL = 0. The PUSH operation is:
SP ← SP + 1     Stack pointer is incremented by 1
M[SP] ← DR      Data in DR is written to the new top-most stack location
Whereas POP, when the stack is not empty (EMPTY = 0), deletes a data item from the top of the stack and puts it in the DR register:
DR ← M[SP]      Data of the top-most stack location is read to DR
SP ← SP - 1     Stack pointer is decremented by 1
Figure 9.6: A 64-word stack (locations 00 to 63) with stack pointer SP, data register DR and one-bit registers FULL and EMPTY
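The PUSH and POP micro-operations, with the FULL and EMPTY flags, can be sketched for the 64-word stack in Python. The rules for updating FULL and EMPTY follow one common textbook convention and are an assumption here: SP is a 6-bit register that wraps around modulo 64, the first PUSH goes to location 1, and location 0 is filled last.

```python
class Stack64:
    """64-word stack with SP, DR and one-bit FULL/EMPTY flags (assumed convention)."""
    SIZE = 64

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.sp = 0        # 6-bit stack pointer
        self.full = 0
        self.empty = 1

    def push(self, dr: int) -> None:
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.SIZE   # SP <- SP + 1
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:                      # wrapped round: all 64 words used
            self.full = 1
        self.empty = 0

    def pop(self) -> int:
        if self.empty:
            raise IndexError("stack empty")
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.SIZE   # SP <- SP - 1
        if self.sp == 0:                      # last item removed
            self.empty = 1
        self.full = 0
        return dr

s = Stack64()
for value in (10, 20, 30):
    s.push(value)
print(s.pop(), s.pop(), s.pop(), s.empty)   # 30 20 10 1
```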
Figure 9.7: The Example Data (R = 702)
You may assume that the instruction occupies locations 500 and 501. Note that AC is the accumulator register, PC is the program counter register, XR is the index register, and R is any processor register. The address of the instruction to be executed is stored in the program counter register.
9.6 SUMMARY
In this unit, the basics of instructions, such as operand data types, instruction length and instruction sets, have been discussed. The instruction format, with details of operands and opcode, has been explained. In addition, various addressing modes and the instruction size with respect to the number of addresses per instruction have been discussed. The effect of the number of addresses in an instruction on program length has been explained, and the flexibility that addressing modes provide to the programmer has also been discussed.
9.7 ANSWERS/SOLUTIONS
= 2^19
Instruction format: bit 31: indirect bit (= 1); bits 30 to 24; bits 23 to 19; bits 18 to 0
Register indirect: EA = 702, operand = 130 (the register holds the memory address where the operand is stored)
UNIT 10 REGISTERS, MICRO-OPERATIONS AND INSTRUCTION EXECUTION
10.0 INTRODUCTION
In the previous unit, you studied the various types of instructions and operands that a computer can have. The main task performed by the CPU is the execution of instructions. This unit focuses on how the CPU executes these instructions. It tries to answer the following two questions regarding instruction execution:
What are the steps required for the execution of an instruction?
How are these steps performed by the CPU?
Execution of an instruction can be divided into a sequence of steps, which together constitute the instruction execution sequence, called the instruction cycle. Each of these steps is carried out as one or more micro-operations. A micro-operation is the smallest operation performed by the CPU, and these operations together execute an instruction.
To answer the second question, recall the basic structure of a computer. The CPU of a computer consists of an Arithmetic Logic Unit (ALU), the Control Unit (CU) and operational registers. The register organisation and the ALU are discussed in this unit, whereas the control unit organisation is discussed in the next unit.
In this unit, we first discuss the basic CPU structure and register organisation in general. This is followed by a discussion of micro-operations, including register transfer, arithmetic, logic and shift micro-operations, and their implementation, which forms the basis of ALU design. The discussion of micro-operations gradually leads to the structure of an ALU. The unit also discusses arithmetic processors, which are commonly used for floating-point computations.
10.1 OBJECTIVES
The number and the nature of registers is a key factor that differentiates among
computers. For example, Intel Pentium has about 32 registers. Some of these registers
are special registers and others are general-purpose registers. Some of the basic
registers in a machine are:
• All von-Neumann machines have a program counter (PC) (or instruction counter
IC), which is a register that contains the address of the instruction that is expected
to be executed next.
• Most computers use special registers to hold the instruction(s) currently being
executed. They are called instruction register (IR).
• There are a number of general-purpose registers, which can be used for
arithmetic computations or any other purpose.
• Memory-address register (MAR) holds the address of next memory operation
(load or store).
• Memory-buffer register (MBR) or Memory data Register (MDR) holds the
content of memory operation (load or store).
• Processor status bits indicate the current status of the processor. They are sometimes combined with other status information into the program status word (PSW). Some processors also use a flags register, which stores flags set by the processor, such as the carry, overflow and zero flags.
• The CPU can access registers faster than it can access main memory.
• Register addressing requires fewer bits in an instruction than memory addressing. For example, addressing 256 registers needs only 8 bits, whereas addressing a 1 MB memory requires 20 address bits, so register addressing saves 60% of the address bits.
• Compilers tend to use a small number of registers, as large numbers of registers are difficult to use effectively. About 32 registers is generally considered a good number for a general machine.
• Registers are more expensive than memory and are far fewer in number.
From a user's point of view, computers have two kinds of registers:
Programmer Visible Registers: These registers can be used by machine or assembly language programmers while programming. A good program minimises references to main memory.
Status and Control Registers: These registers cannot be used directly by programmers but are used to control the CPU or the execution of a program.
Programmer visible registers can be accessed using machine language. In general, there are four types of programmer visible registers.
The general-purpose registers, as the name suggests, can be used for various functions. For example, they may contain operands or be used to calculate operand addresses. However, to simplify the task of programmers and computers, dedicated registers can be used. For example, registers may be dedicated to floating-point operations. Such dedication may lead to the design of separate data and address registers.
The data registers are used only for storing intermediate results or data and not for
operand address calculation.
The address registers are used for address computation. Some dedicated address
registers are:
Segment Pointer : Used to point out a segment of memory.
Index Register : These are used for index addressing scheme.
Stack Pointer : Points to top of the stack when programmer visible stack
addressing is used.
One of the basic issues in register design is the number of general-purpose registers, or data and address registers, to be provided in a microprocessor. The number of registers also affects instruction design, as it determines the number of bits needed in an instruction to specify a register reference. In general, the optimum number of registers in a CPU is in the range 16 to 32. If there are fewer registers, more memory references per instruction will be needed on average, as some intermediate results must then be stored in memory. On the other hand, increasing the number of registers beyond 32 gives no appreciable reduction in memory references. However, some computers use hundreds of registers. These are Reduced Instruction Set Computers (RISC), which have special characteristics that make such large register sets useful. RISC computers are discussed in a later unit.
Why is it important to have fewer memory references? A memory reference takes more time than a register reference; therefore, more memory references result in slower execution of a program.
Register Length: An important characteristic of a register is its length, which normally depends on its use. For example, a register used to calculate addresses must be long enough to hold the largest possible address. If the memory size is 1 MB, then at least 20 bits are required to store an address. Block 4 explains how the 8086 optimises this requirement. Similarly, a data register should be long enough to hold the data type it is meant to hold. If a data register is half the size of the data, two consecutive registers, rather than a single register, may be used to store the data.
Several registers are used to control various operations. These registers cannot be used for data manipulation; however, the programmer can use the content of some of them. Almost all CPUs have a status register, part of which may be programmer visible. A register formed by condition codes is called a condition code register. Some commonly used flags or condition codes in such a register are:
Flag                            Comments
Sign flag                       Indicates whether the result of the previous arithmetic operation was positive (0) or negative (1).
Zero flag                       Set (contains 1) if the result of the last arithmetic operation was zero.
Carry flag                      Set if a carry results from the addition of the highest-order bits, or a borrow is taken on subtraction of the highest-order bit.
Equal flag                      Set if a logic comparison operation finds that both of its operands are equal.
Overflow flag                   Indicates the condition of arithmetic overflow.
Interrupt enable/disable flag   Used for enabling or disabling interrupts.
Supervisor flag                 Used in certain computers to determine whether the CPU is executing in supervisor or user mode. In supervisor mode, the CPU is allowed to execute certain privileged instructions.
Figure 10.1: Flags or conditional codes
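How the sign, zero, carry and overflow flags follow from an addition can be sketched in Python. This assumes an 8-bit two's complement adder; the function name add8 is illustrative, and real processors differ in which instructions update which flags.

```python
def add8(a: int, b: int) -> tuple:
    """8-bit addition returning the result and the flags it would set."""
    a, b = a & 0xFF, b & 0xFF
    raw = a + b
    result = raw & 0xFF
    sign_a, sign_b, sign_r = (a >> 7) & 1, (b >> 7) & 1, (result >> 7) & 1
    flags = {
        "sign": sign_r,                  # 1 if result is negative (MSB set)
        "zero": int(result == 0),        # 1 if result is zero
        "carry": int(raw > 0xFF),        # carry out of the highest-order bit
        # overflow: operands have the same sign but the result's sign differs
        "overflow": int(sign_a == sign_b and sign_r != sign_a),
    }
    return result, flags

for a, b in ((0x7F, 0x01), (0xFF, 0x01), (0x12, 0x34)):
    result, flags = add8(a, b)
    print(f"{a:#04x} + {b:#04x} = {result:#04x}", flags)
```

0x7F + 0x01 overflows (two positive numbers give a negative result) without a carry, while 0xFF + 0x01 produces a carry and a zero result without overflow.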
These flags are set by the CPU hardware while performing an operation. For example, an addition may set the carry flag and the zero flag, and a division by 0 may set the overflow flag. A program tests these flags or condition codes during operations such as a conditional branch. The condition codes are collected in one or more registers. Some RISC machines have several sets of condition code bits, and an instruction specifies which set is to be used. Independent sets of condition codes allow parallelism within the instruction execution unit.
The flag register is often known as the Program Status Word (PSW). It contains the condition
codes plus other status information. There can be several other status and control
registers, such as the interrupt vector register in machines that use vectored interrupts.
☞Check Your Progress 1
1. What is an address register?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2. A machine has 20 general-purpose registers. How many bits will be needed for
register address of this machine?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
3. What is the advantage of having independent sets of condition codes?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
4. Can you store status and control information in the memory?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
Step 1: Get the instruction from memory into the Instruction Register (IR): This step
itself consists of several micro-steps, which require the transfer of content among
registers using the bus.
Step 2: Decode the instruction: This is the job of the control unit (CU), which
issues the necessary set of control signals. This step will be discussed in Unit 11.
Step 3: Fetch the operands, if needed, into the processing unit registers.
Step 4: Execute the instruction using the arithmetic logic unit (ALU) and store the results
back in the processing unit registers.
Step 5: Store the result back to memory, if needed.
Please note that each of these steps may require several micro-steps, and those
micro-steps are the micro-operations. Many of these micro-steps arise from the specific
functions of registers. For example, the program counter (PC) register stores the address
of the instruction that is to be executed next. Thus, in order to get the next instruction
from the memory, you must transfer this address to the register that is used as the
memory address register. Similarly, an instruction, once fetched from memory, may be
held in a data register; since IR is the input to the decode operation, the instruction must
then be sent to the IR register. The micro-operations that represent data transfer
between two registers, or between a register and a memory location, using the bus are
termed register transfer micro-operations.
[Figure: Register IR (bits 15 to 0), with its two halves IR(H) and IR(L)]
2. Information transfer from one register to another is designated by the symbol ←,
which replaces the content of the destination register by the content of the source
register, as explained earlier. The content of the source register remains unchanged.
3. The control signal that enables a register transfer is shown with the help of a
Boolean control function. This feature is very useful, as the operation can be
controlled. For example, if c is a Boolean control variable (that is, it can have a value
of 0 or 1) that controls the transfer of content from R2 to R1, then the transfer will
be represented as:
c: R1 ← R2
The transfer takes place only when c = 1.
4. Micro-operations written on the same line are executed in the same clock period.
However, such micro-operations should be free from conflict; that is, no two of them
may change the same register. The following example depicts a conflict in
micro-operations; these micro-operations should therefore not be executed in parallel.
Example: Consider that a register R1 is to be incremented and is also to be
loaded with the content of the IR register. The following parallel micro-operations
would then be in conflict:
c: R1 ← R1 + 1, R1 ← IR
These two micro-operations would update the register R1 at the same time, so
one of the updated values would be lost.
5. All the register transfers occur during the falling edge transition of the clock.
For more details, you may refer to further readings.
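The conflict rule in item 4 can be sketched in software. The model below is an illustrative assumption, not a description of real hardware: each micro-operation is a (destination, source expression) pair, all sources on one line are read before any destination is loaded (as registers do on a common clock edge), and two writes to the same destination are rejected. The names find_conflicts and clock are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Registers = Dict[str, int]
MicroOp = Tuple[str, Callable[[Registers], int]]  # (destination, source expression)


def find_conflicts(micro_ops: List[MicroOp]) -> Set[str]:
    """Destinations written by more than one micro-operation on the same line."""
    seen: Set[str] = set()
    conflicts: Set[str] = set()
    for dest, _source in micro_ops:
        if dest in seen:
            conflicts.add(dest)
        seen.add(dest)
    return conflicts


def clock(regs: Registers, micro_ops: List[MicroOp]) -> Registers:
    """Execute the micro-operations of one line in a single clock period.

    All sources are evaluated first, then all destinations are loaded,
    mimicking registers that load together on the same clock edge.
    """
    conflicts = find_conflicts(micro_ops)
    if conflicts:
        raise ValueError(f"conflicting writes to {sorted(conflicts)}")
    new_values = {dest: source(regs) for dest, source in micro_ops}
    regs.update(new_values)
    return regs


regs = {"R1": 5, "R2": 9, "IR": 42}
# R1 <- R2, R2 <- R1 : no conflict, so the two registers are swapped in one clock
clock(regs, [("R1", lambda r: r["R2"]), ("R2", lambda r: r["R1"])])
```

Calling clock with the pair R1 ← R1 + 1, R1 ← IR from the example raises a ValueError, because both micro-operations target R1.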
Memory Transfer
A von Neumann machine stores the program and data in the main memory of a
computer. Therefore, instruction execution requires read and write operations
on the memory. The main memory and the processing unit of a computer are
connected through the system bus, which includes the address, data and control buses. The
address bus is used to select a specific RAM word in the memory, whose content
is then transferred over the data bus. The control unit controls the entire process of data
transfer by sending control signals through the control bus. The two basic operations
performed on the memory are Read and Write.
Memory Read: A read operation on the memory requires the location of the memory
word that is to be read. The processor decides which data word or instruction word is
to be read from the memory. The address of that memory word is put in an address
register (AR) and then applied on the address bus, while the memory read operation is
simultaneously enabled using the control bus. This causes the selected word of the
memory unit to be placed on the data bus part of the system bus. The control unit also
enables the data register (DR) to accept the input from the data bus, thus completing the
memory read operation. The following micro-operation shows this memory read
operation (please note the use of the [ ] symbol to represent memory):
mr: DR ← [AR] ; content of the memory word addressed by AR is sent to the DR register
Memory Write: A write operation on the memory requires the location of the memory
word that is to be written and the content that is to be written. The processor puts the
address of the memory word to be written in an address register (AR) and applies this
address on the address bus. Simultaneously, the data to be written, which may be
stored in a data register (DR), is placed on the data bus, and the memory write
operation is enabled using the control bus. This causes the selected memory word to
accept the data from the data bus, thus completing the memory write operation. This
operation can be represented using the following micro-operation:
mw: [AR] ← DR ; write the content of DR into the memory word addressed by AR
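The two memory micro-operations can be modelled with a toy machine that has an address register, a data register and a word-addressed memory. This is a minimal sketch under stated assumptions: the class name Machine, the 16-word memory and the 16-bit word size are illustrative choices, and the buses are not modelled explicitly.

```python
class Machine:
    """Toy model of AR, DR and a word-addressed main memory (all hypothetical)."""

    def __init__(self, size: int = 16, word_bits: int = 16):
        self.mask = (1 << word_bits) - 1
        self.memory = [0] * size
        self.AR = 0
        self.DR = 0

    def memory_read(self) -> None:
        # mr: DR <- [AR]   (AR drives the address bus, memory drives the data bus)
        self.DR = self.memory[self.AR]

    def memory_write(self) -> None:
        # mw: [AR] <- DR   (AR drives the address bus, DR drives the data bus)
        self.memory[self.AR] = self.DR & self.mask


m = Machine()
m.AR, m.DR = 3, 0x1234
m.memory_write()   # word 3 now holds 0x1234
m.DR = 0
m.memory_read()    # DR is reloaded from word 3
```

In both operations AR supplies the address; the only difference is the direction of the data transfer between DR and the selected memory word.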
• There should exist a direct path, such as an internal bus, from the sender register to the
receiver register. It may be noted that the number of bus lines and the sizes of the sender
and receiver registers should be the same.
• Since a micro-operation is expected to be completed in a single clock pulse,
all the data bits of the receiver register should be loaded at the same
time. Thus, the receiver register must support parallel loading of bits (refer to
Unit 4, Block 1).
R3 ← R1 − R2
Please note that this micro-operation can also be represented as:
R3 ← R1 + R2' + 1
Why? R2' is the complement (1's complement) of R2; adding 1 to it gives the 2's
complement of R2. Thus, both of the above micro-operations are equivalent.
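The equivalence of R1 − R2 and R1 + R2' + 1 can be checked exhaustively for small registers. The sketch below assumes 4-bit registers and discards the carry out of the most significant bit, as a register of fixed width would; the function names are hypothetical.

```python
def subtract_direct(r1: int, r2: int, bits: int = 4) -> int:
    """R3 <- R1 - R2, keeping only `bits` bits (as a register would)."""
    return (r1 - r2) & ((1 << bits) - 1)


def subtract_via_complement(r1: int, r2: int, bits: int = 4) -> int:
    """R3 <- R1 + R2' + 1, discarding the carry out of the top bit."""
    mask = (1 << bits) - 1
    r2_complement = ~r2 & mask        # R2' : 1's complement of R2
    return (r1 + r2_complement + 1) & mask


# Exhaustive check over every pair of 4-bit register values.
same = all(
    subtract_direct(a, b) == subtract_via_complement(a, b)
    for a in range(16)
    for b in range(16)
)
print(same)
```

Because every pair of 4-bit values agrees, an adder with a complementing input is sufficient to implement subtraction.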
Addition and subtraction micro-operations can be implemented using an ALU that
supports simple arithmetic operations, as discussed in section 10.4. Assuming that
this ALU implements only the addition, subtraction, simple logic and shift
micro-operations on fixed-point numbers, how will this machine implement multiplication
and division on fixed-point and floating-point numbers? In such a machine, these
operations can be implemented with the help of programs that use micro-operations
like addition, shift and so on. This kind of implementation requires that operations like
fixed-point multiplication and division be carried out in several micro-operation steps
using several micro-instructions. Thus, fixed-point multiplication and division and
floating-point arithmetic instructions cannot be considered micro-operations for this
kind of machine.
10.3.4 Logic Micro-operations
R1 ← R1 . R2
The result of the micro-operation, as given above, will be stored in the R1 register. A
typical use of this micro-operation is shown in the following example.
Example 1: Assume that two four-bit registers R1 and R2 contain the data 1100 and
1010 respectively. What would be the output if the following micro-operations are
performed on these two registers (here . denotes AND, + denotes OR and ~ denotes NOT)?
i. R1 ← R1 . R2
ii. R1 ← R1 + R2
iii. R1 ← ~R1
Solution:
i. 1100 . 1010 = 1000
ii. 1100 + 1010 = 1110
iii. ~1100 = 0011
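The three results of Example 1 can be reproduced with Python's bitwise operators. This is a small sketch, assuming that each micro-operation is applied to the original values of R1 and R2 (as in the solution above) and that NOT must be masked to four bits, because Python integers are not fixed-width.

```python
R1, R2 = 0b1100, 0b1010
MASK4 = 0b1111                   # four-bit registers

and_result = R1 & R2             # i.   R1 <- R1 . R2  (AND)
or_result = R1 | R2              # ii.  R1 <- R1 + R2  (OR)
not_result = ~R1 & MASK4         # iii. R1 <- ~R1      (NOT), kept to 4 bits

for label, value in [("AND", and_result), ("OR", or_result), ("NOT", not_result)]:
    print(label, format(value, "04b"))
```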
Example 2: Consider a register A containing the 8-bit value 01010011. Find the value
of register B and the micro-operation that can be used to set the upper four bits of
register A while the lower four bits remain unchanged.
Solution: To set the upper four bits irrespective of their values in A while keeping the
lower four bits unchanged, register B can hold the value 11110000 and the OR
micro-operation can be used, as shown below:
Register A 0 1 0 1 0 0 1 1
Register B 1 1 1 1 0 0 0 0
A OR B 1 1 1 1 0 0 1 1
Thus, in the output the upper four bits contain the value 1, while the lower four bits are
the same as those of register A.
Example 3: Use the same value of register A as in Example 2. Find the value of
register B and the micro-operation that can be used to clear the upper four bits of
register A while the lower four bits remain unchanged.
Solution: To clear the upper four bits irrespective of their values in A while keeping the
lower four bits unchanged, register B can hold the value 00001111 and the AND
micro-operation can be used, as shown below:
Register A 0 1 0 1 0 0 1 1
Register B 0 0 0 0 1 1 1 1
A AND B 0 0 0 0 0 0 1 1
Thus, in the output the upper four bits contain the value 0, while the lower four bits are
the same as those of register A. This operation is sometimes also referred to as a mask
operation, where the upper four bits of register A are masked out.
Example 4: Use the same value of register A as in Example 2. Find the value of
register B and the micro-operation that can be used to clear all the bits of the register.
Solution: To clear the entire register A, you can use a register B with the same value
as register A and the XOR micro-operation, as shown below:
Register A 0 1 0 1 0 0 1 1
Register B 0 1 0 1 0 0 1 1
A XOR B 0 0 0 0 0 0 0 0
Example 5: Use the same value of register A as in Example 2. Find the value of
register B and the micro-operation that can be used to complement all the bits of the register.
Solution: One of the simplest ways to complement a register is to perform the NOT
micro-operation on the register itself. An alternative method is to perform
XOR with a register B containing all 1's, as shown below:
Register A 0 1 0 1 0 0 1 1
Register B 1 1 1 1 1 1 1 1
A XOR B 1 0 1 0 1 1 0 0
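Examples 2 to 5 each pair a choice of register B with one logic micro-operation. The sketch below reproduces all four results for the 8-bit register A; the helper show and the variable names are illustrative only.

```python
WIDTH = 8
ALL_ONES = (1 << WIDTH) - 1        # 11111111
A = 0b01010011                     # register A from Example 2


def show(value: int) -> str:
    """Format a register value as WIDTH binary digits."""
    return format(value, f"0{WIDTH}b")


set_upper = A | 0b11110000         # Example 2: OR with B = 11110000 sets the upper bits
clear_upper = A & 0b00001111       # Example 3: AND with B = 00001111 masks the upper bits
cleared = A ^ A                    # Example 4: XOR with B = A clears every bit
complemented = A ^ ALL_ONES        # Example 5: XOR with B = 11111111 complements A

for name, value in [("set upper", set_upper), ("clear upper", clear_upper),
                    ("clear all", cleared), ("complement", complemented)]:
    print(f"{name:12} {show(value)}")
```

The pattern generalizes: OR with 1's sets selected bits, AND with 0's clears them, and XOR with 1's complements them, while the other bits of B leave A unchanged.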
10.3.5 Shift Micro-operations
Shifting the bits of a register can be used for several useful functions in a computer,
such as serial transfer of data, multiplication and division. A register
consists of a linear sequence of bits, which can be shifted either towards the left
or towards the right. Shifting by one bit, whether left or right, causes one bit
to move out of the register at one end and one bit to enter the register at the
other end. Shift operations have been discussed in Unit 9.
Shift operations are of three basic types:
1. Logical shift: In a logical shift, the input bit is 0 and the output bit is
discarded.
2. Arithmetic shift: In an arithmetic shift, the sign of the number is kept the same.
3. Circular shift: In a circular shift, the output bit is circulated back to the input.
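The three shift types can be sketched on an 8-bit register value as follows. This is an illustrative assumption, not a circuit description: only a right arithmetic shift is shown, since it is the case where the sign bit must be copied in order to keep the sign unchanged, and all function names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def logical_shift_left(r: int) -> int:
    return (r << 1) & MASK                    # 0 enters on the right, the MSB is lost


def logical_shift_right(r: int) -> int:
    return r >> 1                             # 0 enters on the left, the LSB is lost


def arithmetic_shift_right(r: int) -> int:
    sign = r & (1 << (WIDTH - 1))
    return (r >> 1) | sign                    # sign bit is copied, so the sign is kept


def circular_shift_left(r: int) -> int:
    msb = (r >> (WIDTH - 1)) & 1
    return ((r << 1) & MASK) | msb            # bit shifted out re-enters on the right


def circular_shift_right(r: int) -> int:
    lsb = r & 1
    return (r >> 1) | (lsb << (WIDTH - 1))    # bit shifted out re-enters on the left


r = 0b10010011
for shift in (logical_shift_left, logical_shift_right, arithmetic_shift_right,
              circular_shift_left, circular_shift_right):
    print(f"{shift.__name__:24} {format(shift(r), '08b')}")
```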
………………………………………………………………………………………
………………………………………………………………………………………
………………………………………………………………………………………
3. Consider that a register R1 contains 00010011. Select a suitable register R2 and a
sequence of logic micro-operations that can perform the following tasks:
(i) Reset the complete register R1. You must do this using a mask
micro-operation and not using XOR.
(ii) Insert the data 11001000 into the register R1. You may use more than
one micro-operation to do this task.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
Step 1 (FI cycle): An instruction is available in the memory, and the program counter
register (PC) points to this instruction, which is to be executed. To get the
instruction from the memory, the bus is used along with the memory address
register (MAR). Thus, a sequence of operations is needed to perform this
step. This sequence is represented with the help of a timing sequence using the
timing controls T1, T2, etc. Please note that the micro-operation with timing control T1 is
performed before the micro-operations with timing control T2, and so on. Figure 10.4
lists the micro-operation sequence of the FI cycle.
• Transfer the content of PC to MAR.
  T1: MAR ← PC
• Apply MAR on the address bus; the control unit enables the memory read operation,
  and DR is enabled to receive the content on the data bus. Thus, the content of the
  memory location pointed to by MAR is read into DR.
  T2: DR ← [MAR]
• Perform the following two operations in parallel at time T3:
  o Increment PC so that it points to the next instruction to be executed. (PC is
    incremented by one memory word length, as it is assumed that each instruction
    is just one word long and the memory address is a word address.)
    T3: PC ← PC + 1
  o Send the instruction in DR to IR to complete the FI cycle.
    T3: IR ← DR
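The FI cycle of Figure 10.4 can be traced in a few lines, with each timing step written as one assignment. This is a sketch under the same assumptions as the figure (one-word instructions, word addresses); the register dictionary, the memory contents and the name fetch_cycle are hypothetical.

```python
def fetch_cycle(cpu: dict, memory: list) -> dict:
    """FI cycle of Figure 10.4 as timed micro-operations (toy model)."""
    # T1: MAR <- PC
    cpu["MAR"] = cpu["PC"]
    # T2: DR <- [MAR]   (memory read)
    cpu["DR"] = memory[cpu["MAR"]]
    # T3: PC <- PC + 1 and IR <- DR, in parallel (different destinations, no conflict)
    cpu["PC"], cpu["IR"] = cpu["PC"] + 1, cpu["DR"]
    return cpu


memory = [0x1A05, 0x2B06, 0x3C07]          # hypothetical one-word instructions
cpu = {"PC": 0, "MAR": 0, "DR": 0, "IR": 0}
fetch_cycle(cpu, memory)                   # fetches the instruction at address 0
fetch_cycle(cpu, memory)                   # fetches the instruction at address 1
```

The two T3 micro-operations can share a time step because they write different registers, which is exactly the no-conflict condition described earlier.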
Step 2: DI: The control unit decodes the instruction. It identifies two important
things: first, what operation is to be performed, and second, what addressing modes
are used by the instruction. The addressing modes are decoded so that the data can be
brought into the ALU registers for executing the decoded operation. Since the decode
operation is performed by the control unit, more details related to this operation will
be discussed in Unit 11.
Step 3: FoA: In this step, the operand address is converted to the direct operand
address and stored in the address part of the instruction. This address is used in the next
step for instruction execution. The present instruction format allows only two
addressing types: direct and indirect. In the case of direct addressing, the
address of the operand is already in the operand address part of the instruction; thus, no
additional micro-operation is needed. However, in the case of indirect addressing, the
address of the operand is to be fetched from memory using the address portion of the
instruction. This fetched address should replace the current address portion of the
instruction. This step is shown in Figure 10.5.
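Because Figure 10.5 is not reproduced here, the sketch below shows one plausible micro-operation sequence for the FoA step, assuming an indirect bit I and an address field IR(ADR). The three-step timing, the dictionary keys and the function name are all assumptions made for illustration; the actual figure may differ.

```python
def fetch_operand_address(cpu: dict, memory: list) -> dict:
    """FoA step: leave the operand's direct address in IR(ADR).

    cpu["I"] is the (assumed) indirect bit and cpu["IR_ADR"] the address
    part of the instruction in IR; both names are illustrative only.
    """
    if cpu["I"] == 0:
        return cpu                      # direct: IR(ADR) already holds the address
    cpu["MAR"] = cpu["IR_ADR"]          # T1: MAR <- IR(ADR)
    cpu["DR"] = memory[cpu["MAR"]]      # T2: DR <- [MAR]    (read the pointer)
    cpu["IR_ADR"] = cpu["DR"]           # T3: IR(ADR) <- DR  (replace the address part)
    return cpu


memory = [0] * 16
memory[5] = 12                          # word 5 holds the operand's real address
indirect = fetch_operand_address({"I": 1, "IR_ADR": 5, "MAR": 0, "DR": 0}, memory)
direct = fetch_operand_address({"I": 0, "IR_ADR": 5, "MAR": 0, "DR": 0}, memory)
```

After this step the execute cycle can treat every instruction as direct, since IR(ADR) always holds the operand's direct address.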
Step 4: ExI: Instruction execution is also performed with the help of
micro-operations. The first step of instruction execution requires the operand to be
brought from main memory to a processor register. This is followed by the arithmetic
micro-operation required by the instruction opcode. The following examples explain
the micro-operations required to execute certain instructions.
(1) The steps required to execute a simple addition instruction of the following form
(assuming that the indirect bit is set to 0, i.e., it is a direct instruction):
ADD ADR
The following sequence of micro-operations, as given in Figure 10.6, would
execute this instruction (after the FI cycle):
• Transfer ADR to MAR, and put the return address, which is in PC, into DR.
  T1: MAR ← IR(ADR)
  T1: DR ← PC
• To branch to the subroutine, ADR should be moved to PC. Further, the return
  address is to be stored at this address (ADR). Please note that ADR is already in
  MAR at time T1.
  T2: PC ← IR(ADR)
  T2: [MAR] ← DR
• The first instruction of the subroutine starts at ADR plus one; thus, increment PC.
  T3: PC ← PC + 1
Figure 10.8: Execution cycle of subroutine call instruction
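The subroutine call of Figure 10.8 can be traced with the same kind of toy register model. In this sketch the dictionary keys, the 32-word memory and the example addresses are hypothetical; the T1 and T2 pairs are written as simultaneous assignments because each pair has distinct destinations.

```python
def call_subroutine(cpu: dict, memory: list) -> dict:
    """Execution cycle of a subroutine call (Figure 10.8), toy model."""
    # T1: MAR <- IR(ADR), DR <- PC   (PC already holds the return address)
    cpu["MAR"], cpu["DR"] = cpu["IR_ADR"], cpu["PC"]
    # T2: PC <- IR(ADR), [MAR] <- DR (store the return address at ADR)
    cpu["PC"] = cpu["IR_ADR"]
    memory[cpu["MAR"]] = cpu["DR"]
    # T3: PC <- PC + 1               (first subroutine instruction is at ADR + 1)
    cpu["PC"] += 1
    return cpu


memory = [0] * 32
cpu = {"PC": 11, "IR_ADR": 20, "MAR": 0, "DR": 0}   # call at 10, so PC is already 11
call_subroutine(cpu, memory)
```

Storing the return address in the first word of the subroutine lets the subroutine return later with an indirect branch through ADR.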
It may be noted that even interrupt servicing programs are sequences of instructions.
Each of these instructions is executed as per the instruction cycle. You may refer to the
further readings for more details on the instruction cycle.
In the previous section, you have gone through various steps of instruction execution.
Can these steps be performed in parallel to execute an instruction? The answer is no,
as these steps must be performed in sequence to execute an instruction. However, can
the steps of different instructions be performed in parallel with each other, that is, in
an overlapped manner? Executing several instructions fully in parallel would require
several processing elements, whereas breaking the execution of an instruction into
steps and executing instructions in an overlapped manner may be useful. This is the
principle of instruction pipelining. Thus, a simple instruction pipeline executes
instructions in an overlapped way, which reduces the overall instruction execution
time. This requires that the instruction cycle be divided into equal parts, so that
different stages of different instructions can be executed in parallel. One such
decomposition of the instruction cycle into stages, also called pipeline stages, is:
fetch the instruction (FI), decode the instruction (DI), fetch operand address (FoA) and
execute the instruction (ExI). A new stage has been added here that allows storage of
the result back to the memory location; let us call it StR.
Figure 10.10 shows the overlapped execution of seven instructions using a five-stage
instruction pipeline.
Time Slot      1    2    3    4    5    6    7    8    9    10   11
Instruction 1  FI   DI   FoA  ExI  StR
Instruction 2       FI   DI   FoA  ExI  StR
Instruction 3            FI   DI   FoA  ExI  StR
Instruction 4                 FI   DI   FoA  ExI  StR
Instruction 5                      FI   DI   FoA  ExI  StR
Instruction 6                           FI   DI   FoA  ExI  StR
Instruction 7                                FI   DI   FoA  ExI  StR
In Figure 10.10, you may notice that in time slot 5 the pipeline is executing 5
instructions simultaneously, though in different stages. At the end of time slot 5, the
execution of the first instruction is completed; thereafter, one new instruction is
completed at the end of each time slot. In other words, the first instruction is
completed at the end of time slot 5, the second at the end of time slot 6, the third at the
end of time slot 7, and so on. Thus, executing instructions in an overlapped fashion
results in the completion of almost one instruction per time slot. However, instruction
pipelines suffer from the problem of resource conflicts. For example, in the 5th time
slot the first instruction is storing its result and therefore requires a memory reference;
in the same time slot, the second instruction is in the execution stage, so it fetches its
operand using a memory reference and uses the ALU to execute; at the same time, the
third instruction is in the operand address fetch (FoA) stage, which also requires a
memory reference; the fourth instruction is in the decode stage; and the fifth
instruction is in the fetch stage, which also requires a memory reference. Thus, a
pipelined processor should allow each of these instructions to reference memory
simultaneously through different paths so that there is no conflict; otherwise, the
instructions cannot be executed in a pipelined fashion.
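The timing in Figure 10.10 follows a simple rule: instruction i enters FI in time slot i and spends one slot in each stage. The sketch below builds that ideal schedule, assuming one time slot per stage and no resource conflicts or branches (so no stalls); the function names are hypothetical.

```python
STAGES = ["FI", "DI", "FoA", "ExI", "StR"]


def schedule(n_instructions: int, stages=STAGES) -> dict:
    """Map each time slot to the (instruction, stage) pairs active in it.

    Assumes every stage takes exactly one time slot and that there are no
    resource conflicts or branches, i.e. an ideal pipeline with no stalls.
    """
    table = {}
    for i in range(1, n_instructions + 1):
        for s, stage in enumerate(stages):
            slot = i + s                      # instruction i enters FI in slot i
            table.setdefault(slot, []).append((i, stage))
    return table


def total_slots(n_instructions: int, k_stages: int = len(STAGES)) -> int:
    """Slots needed to finish n instructions on an ideal k-stage pipeline."""
    return k_stages + n_instructions - 1


table = schedule(7)
print(table[5])          # the five instructions active in time slot 5
print(total_slots(7))    # pipelined: 5 + 7 - 1 slots
print(7 * len(STAGES))   # without overlap: 5 slots per instruction
```

For seven instructions the ideal pipeline needs 11 time slots instead of 35, and as the number of instructions grows the rate approaches one instruction per slot, matching the observation above.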
• Pipelined execution works well for a sequence of instructions, but instructions that
transfer control, like conditional branch instructions, may disrupt the pipeline
sequence, as the decision whether a branch will be taken or not can be made only at
the execution stage. If a branch is taken, then all the subsequent instructions that
were already fetched into the pipeline must be removed from the pipeline. The time
lost in this way is called the branch penalty.
The branch penalty can be minimized using any of the following schemes:
• Predicting whether the branch will be taken or not, and fetching the next
instruction accordingly.
• Pre-fetching the instructions that may be executed because of a branch.
• Not fetching the next instruction into the pipeline until the branch decision is
made.
Check Your Progress 3
1) What is the need for the indirect cycle? Will the indirect cycle be needed even if an
instruction uses a register addressing scheme? Justify your answer.
……………………………………………………………………………….
………………………………………………………………………………..
……………………………………………………………………………….
2) What is the fetch cycle? Do present-day machines also have this cycle?
……………………………………………………………………………….
………………………………………………………………………………..
……………………………………………………………………………….
3) What is the role of the interrupt cycle?
……………………………………………………………………………….
……………………………………………………………………………….
………………………………………………………………………………..
[Figure 10.11: ALU structure with registers AC, MQ and DR on the internal bus]
The above structure consists of three registers, AC, MQ and DR, each assumed to be one
word in size. Please note that the parallel adders and other logic circuits (the
arithmetic and logic circuits) of this von Neumann machine have at most two inputs
and one output. In other words, any ALU operation can have at most two input values
and generates a single output along with the other status bits. In Figure 10.11, the two
inputs are the AC and DR registers, while the output is the AC register. The AC and MQ
registers are generally used together as a single AC.MQ register, which is capable of
left or right shift operations. Some of the micro-operations that can be defined on this
ALU are:
Addition : AC ← AC + DR
Subtraction : AC ← AC – DR
AND : AC ← AC ∧ DR
OR : AC ← AC ∨ DR
Exclusive OR : AC ← AC ⊕ DR
NOT : AC ← AC'
In this ALU organisation, multiplication and division were implemented using
shift-add/subtract operations. The MQ (Multiplier-Quotient) register is a special register
used for implementing the multiplication and division instructions. Please note that in
the ALU shown in Figure 10.11, the multiplication and division instructions are not
implemented directly by logic circuits. For more details on these algorithms, such as
Booth's algorithm, please refer to the further readings.
For multiplication or division operations, the DR register stores the multiplicand or the
divisor, respectively. The result of multiplication or division, on applying a suitable
algorithm, is finally obtained in the AC.MQ register combination. Please note that these
are not micro-operations for the given ALU organisation, as execution of these two
instructions requires a series of shift and add/subtract operations.
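To show why multiplication needs a series of micro-operations on this ALU, here is a sketch of a plain unsigned shift-and-add multiplication using AC, MQ and DR. It is not Booth's algorithm, handles unsigned operands only, and assumes an n-bit word with a carry bit that is shifted into AC; the function name is hypothetical.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 8) -> int:
    """Unsigned shift-and-add multiplication using AC, MQ and DR (sketch).

    DR holds the multiplicand, MQ the multiplier, AC starts at 0; the
    2n-bit product ends up in the combined AC.MQ register.
    """
    mask = (1 << n) - 1
    DR, MQ, AC = multiplicand & mask, multiplier & mask, 0
    for _ in range(n):
        carry = 0
        if MQ & 1:                            # add step when the multiplier bit is 1
            total = AC + DR                   # AC <- AC + DR
            AC, carry = total & mask, total >> n
        # shift the combined carry.AC.MQ one bit to the right
        MQ = (MQ >> 1) | ((AC & 1) << (n - 1))
        AC = (AC >> 1) | (carry << (n - 1))
    return (AC << n) | MQ                     # product held in AC.MQ


print(shift_add_multiply(13, 11))
```

Each loop iteration corresponds to one conditional add followed by one right shift of AC.MQ, so an n-bit multiplication takes n such steps rather than a single micro-operation.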
DR is another important register, which is used for storing the second operand. In fact, it
acts as a buffer register and stores the data brought from the memory. In machines
with general-purpose registers, any of the registers can be used as AC,
MQ or DR. For more details on ALUs, you can go through the further readings.
The ALU consists of circuits that execute the micro-operations. Data is input to the ALU
through registers, and the output of the ALU is also stored in an output register. A bus
is used for the input/output of data to the ALU. So, let us first explain how an internal
bus can be used for data transfer.
A computer processor has a large number of registers. These registers are used to store
data that is to be processed by the ALU. But how is data communicated among these
registers? One possibility is to create separate data paths from every register to all
other registers; however, this connection structure would waste a large amount of
processor resources. Therefore, a shared medium called the internal bus is used. A bus
consists of shared data lines, which are connected to every register. Using these
shared lines, any two registers can communicate with each other, but only one transfer
can take place at a time. The number of lines in the shared bus is kept the same as the
size of the registers.
A register is selected for the transfer of data through the bus with the help of control
signals. The common data transfer path, that is, the bus lines, is made using
multiplexer circuits. Figure 10.12 shows an example of a 2-bit data bus using 2×1
multiplexers. Please note that the registers are also two bits in size.
The construction of a bus system for two 2-bit registers using two 2×1 multiplexers is
shown in Figure 10.12. Each register has two bits, viz. Bit 1 and Bit 0. Each
multiplexer has two data inputs, numbered 0 and 1, and one selection line, S. The
circuit assumes no enable bits. The 0th data input of MUX 0 is connected to Bit 0 of
Register A, and the 1st data input of MUX 0 is connected to Bit 0 of Register B.
Similarly, Bit 1 of Register A and Bit 1 of Register B are connected to the 0th and 1st
data inputs of MUX 1, respectively. There is just one selection line, S. When S is 0,
the 0th input values of MUX 0 and MUX 1 are transferred to the output, that is, the
content of Register A is transferred onto the bus, whereas when S is 1, the bits of
Register B are selected for transmission on the bus.
[Figure 10.12: A 2-bit bus for Register A and Register B built from two 2×1 multiplexers, MUX 1 (Bit 1) and MUX 0 (Bit 0)]
Figure 10.13 lists the selection of data based on the selection input to the
multiplexers.
Thus, to construct a bus for 2 registers of 2 bits each, you would require two 2×1
multiplexers. Similarly, to construct a bus for 8 registers of sixteen bits each, you
would require sixteen 8×1 multiplexers, each with 3 selection inputs. Please note that
one multiplexer is needed for the transfer of one bit. Since sixteen bits are to be
transferred, sixteen multiplexers are needed. Further, one of the 8 registers is to be
selected to transfer data onto the bus; therefore, 3 selection inputs are needed, as
2³ = 8.
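The multiplexer-based bus can be sketched bit by bit: multiplexer j receives bit j of every register, and all multiplexers share the same selection value. This is an illustrative model only; the functions mux, bus_transfer and bus_hardware are hypothetical names, and the hardware count simply restates the sizing rule above.

```python
def mux(inputs: list, select: int) -> int:
    """k x 1 multiplexer: the selection value routes one input to the output."""
    return inputs[select]


def bus_transfer(registers: list, select: int, width: int) -> int:
    """Value appearing on a bus built from `width` multiplexers.

    Multiplexer j receives bit j of every register; all multiplexers share
    the same selection lines, so together they pass one whole register.
    """
    value = 0
    for j in range(width):
        bit_j_of_each_register = [(reg >> j) & 1 for reg in registers]
        value |= mux(bit_j_of_each_register, select) << j
    return value


def bus_hardware(n_registers: int, width: int) -> dict:
    """Multiplexers needed for a bus of n_registers registers of `width` bits."""
    select_lines = (n_registers - 1).bit_length()   # smallest s with 2**s >= n_registers
    return {"multiplexers": width, "mux_inputs": n_registers, "select_lines": select_lines}


register_a, register_b = 0b10, 0b01
print(bus_transfer([register_a, register_b], 0, 2))   # S = 0 puts Register A on the bus
print(bus_hardware(8, 16))                            # the 8-register, 16-bit case
```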
Figure 10.14: A two-bit arithmetic circuit (adder-subtractor)
The multiplexers control one of the inputs to the circuit, resulting in a set of
micro-operations. Let us find out how the multiplexer selection line changes one of
the inputs of the adder circuit. Figure 10.15 shows the two inputs that are possible in
Figure 10.14. (Please note the convention used in this table: an uppercase letter
indicates a 2-bit data word, whereas a lowercase letter indicates a bit.)
Figure 10.15: Input to full adders using the multiplexers in Figure 10.14
Now let us discuss how by using the carry-in-bit (Cin) and these input values, you can
obtain various micro-operations.
Input to Circuits
• Register A bits a0 and a1 are input to the X0 and X1 inputs of the full adders (FA).
• Register B bits are input as given in Figure 10.15 to form the Y inputs of the FA.
• Please note that each bit of register A and register B is fed to a different full adder
unit.
• Please also note that the A input goes directly to the adder, but the B input can be
manipulated through the multiplexers to create different input values, as shown
in Figure 10.15. The B input is controlled by the selection line S.
• The input carry Cin, which can be equal to 0 or 1, is input to the full adder that
adds the least significant bits. The carry out of this full adder is then fed to the
full adder of the next higher bit, and so on. The carry out of the most significant
bit full adder is the output carry (Cout) of the circuit. Logically, this is the same as
the addition we perform by hand, where the carry of the lower digits is passed on
to the higher digits. The following expression represents the output of this adder
circuit:
O = X + Y + Cin
Please note that in Figure 10.15, the value of X is a direct input, but the value
of Y is input through the multiplexer using the selection input S. In addition,
Cin is another input. The arithmetic micro-operations that can be
implemented using Figure 10.15 are given in Figure 10.16.
When S = 0, input line B is applied directly to the Y inputs of the full adders. Now,
If input carry Cin = 0, the output will be O = A + B.
If input carry Cin = 1, the output will be O = A + B + 1.
When S = 1, the complement B' is applied to the Y inputs. If input carry Cin = 1, the
output will be O = A + B' + 1, which is A − B.
Reason: Please observe the following example, where A = 0111 and B = 0110, so
B' = 1001. The sum is calculated as:
    0111   (value of A)
  + 1001   (complement of B)
  1 0000   ignore the carry-out bit and add Cin = 1
  = 0001
Thus, it is a subtract micro-operation (7 − 6 = 1).
If input carry Cin = 0, the output will be O = A + B':
    0111   (value of A)
  + 1001   (complement of B)
  1 0000   ignore the carry-out bit; Cin = 0
  = 0000
This operation, thus, is equivalent to:
O = A + B'
  = (A − 1) + (B' + 1)
  = (A − 1) + (2's complement of B)
  = A − (B + 1)
Hence the name subtract with borrow.
Example: Let us assume that Register A is 4 bits wide, contains the value 0101, and is
added to an all-1's value:
    0101
  + 1111
  1 0100
The leading 1 is the carry out and is discarded. Thus, on addition with all 1's, the number
has actually been decremented by one (0101 becomes 0100).
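The behaviour of the adder-subtractor can be simulated as a chain of full adders whose Y inputs come from the multiplexers. This sketch generalizes the 2-bit circuit of Figure 10.14 to n bits (4 by default, to match the worked example), assumes S = 1 selects the 1's complement of B, and uses a hypothetical function name; it returns the output word and the final carry Cout.

```python
def arithmetic_circuit(A: int, B: int, S: int, Cin: int, n: int = 4):
    """n-bit adder-subtractor in the style of Figure 10.14: O = X + Y + Cin.

    X is A directly; Y is B when S = 0, or B' (1's complement) when S = 1,
    as selected by the 2x1 multiplexers. Returns (O, Cout).
    """
    mask = (1 << n) - 1
    Y = B if S == 0 else (~B & mask)          # multiplexer output for each bit
    carry = Cin
    O = 0
    for i in range(n):                        # ripple through the full adders, LSB first
        a, y = (A >> i) & 1, (Y >> i) & 1
        O |= (a ^ y ^ carry) << i             # full-adder sum bit
        carry = (a & y) | (carry & (a ^ y))   # full-adder carry out
    return O, carry


A, B = 0b0111, 0b0110
for S in (0, 1):
    for Cin in (0, 1):
        O, Cout = arithmetic_circuit(A, B, S, Cin)
        print(f"S={S} Cin={Cin}: O={O:04b} Cout={Cout}")
```

The four combinations of S and Cin give A + B, A + B + 1, A + B' (subtract with borrow) and A + B' + 1 (subtract), matching the cases discussed above.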
In many computers, only four logic micro-operations, viz. AND, OR, XOR and NOT,
are implemented. The other logic micro-operations can be derived from these four.
Figure 10.17 shows one bit stage, the ith bit stage, of the four logic operations. Please
note that the circuit consists of 4 gates and a 4 × 1 MUX. The ith bits of registers R1
and R2 are passed through the circuit. On the basis of the selection inputs S1 and S0,
the desired micro-operation is obtained.
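[Figure 10.17: ith bit stage of the logic circuit: the ith bits of R1 and R2 feed four gates, whose outputs go to a 4×1 MUX with selection inputs S1 and S0 and output F]
S1  S0   Output          Operation
0   0    F = R1 ∧ R2     AND operation
0   1    F = R1 ∨ R2     OR operation
1   0    F = R1 ⊕ R2     XOR operation
1   1    F = R1′         Complement of register R1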
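One bit stage of the logic circuit can be sketched as four gate outputs feeding a 4×1 multiplexer, with the same selection applied to every bit position of the registers. The input ordering follows the selection table above; the function names and the 8-bit default width are illustrative assumptions.

```python
def logic_stage(r1_bit: int, r2_bit: int, s1: int, s0: int) -> int:
    """One (ith) bit stage: four gates feed a 4x1 MUX selected by S1 S0."""
    gate_outputs = [
        r1_bit & r2_bit,    # MUX input 0: AND
        r1_bit | r2_bit,    # MUX input 1: OR
        r1_bit ^ r2_bit,    # MUX input 2: XOR
        1 - r1_bit,         # MUX input 3: complement of R1
    ]
    return gate_outputs[(s1 << 1) | s0]


def logic_unit(R1: int, R2: int, s1: int, s0: int, n: int = 8) -> int:
    """Apply the same selection to every bit stage of n-bit registers."""
    F = 0
    for i in range(n):
        F |= logic_stage((R1 >> i) & 1, (R2 >> i) & 1, s1, s0) << i
    return F


for s1, s0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s1, s0, format(logic_unit(0b1100, 0b1010, s1, s0, n=4), "04b"))
```

Using the registers of Example 1 shows that the circuit reproduces the AND, OR and NOT results found there, plus the XOR result.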
A typical processor needs most of its control and data processing hardware for
implementing non-arithmetic functions. As hardware costs are directly related to
chip area, a floating-point circuit, being complex in nature, is costly to implement.
Floating-point instructions therefore need not be included in the instruction set of a
processor; in such systems, floating-point operations were implemented using software
routines. This implementation of floating-point arithmetic is definitely slower than a
hardware implementation. Now, the question is whether a processor can be constructed
only for arithmetic operations. A processor devoted exclusively to arithmetic functions
can implement a full range of arithmetic functions in hardware at a relatively low cost,
and this can be done in a single integrated circuit. Thus, a special-purpose arithmetic
processor, for performing only arithmetic operations, can be constructed. This
processor may be physically separate, yet it can be utilized by the main processor to
execute complex arithmetic instructions. Please note that, in the absence of an
arithmetic processor, these instructions may be executed by the processor itself using
slower software routines. Thus, this auxiliary processor enhances the speed of
execution of programs having a lot of complex arithmetic computations.
If the arithmetic processor has registers and an instruction set that can be considered
an extension of the CPU registers and instruction set, then it is called a tightly coupled
processor. Here, the CPU reserves a special subset of codes for the arithmetic processor.
In such a system, the instructions meant for the arithmetic processor are fetched by the
CPU, decoded jointly by the CPU and the arithmetic processor, and finally executed by
the arithmetic processor. Thus, these processors can be considered a logical extension
of the CPU. Such attached arithmetic processors were termed co-processors.
These days floating point units are implemented as a part of the processor itself. More
details on these can be found in further readings.
10.8 SUMMARY
This unit discusses the concept of instruction execution for a hypothetical machine with the help of micro-operations. It also gives a very simplified view of the implementation of micro-operations using combinational and sequential circuits. The idea is to give you basic information about how a computer system is implemented on the basis of its instruction set. The unit also discusses register transfer language for representing micro-operations and defines the concept of an instruction pipeline. It then discusses the hardware implementation of micro-operations, starting with a simple implementation of a bus, which is the backbone of any register transfer operation. This is followed by a discussion of an arithmetic circuit built from full adders and the micro-operations it performs. The implementation of logic micro-operations has also been discussed. Finally, the unit discusses arithmetic processors.
You may refer to the further readings for more details on the micro-operation concept and the instruction cycle.
The Control Unit
11.0 INTRODUCTION
In the previous units of this block, the instruction set architecture and the concept of micro-operations were discussed. Micro-operations provide the execution environment for the instruction set of a computer system. They are implemented as a part of the ALU and the processor's internal bus.
The control unit is responsible for issuing control signals as required by the micro-operations of a computer's instructions. In addition, it controls all the other units of a computer system. In this unit, we discuss the functions of a control unit and its implementation mechanisms, namely the hardwired control unit and the micro-programmed control unit. Micro-programmed control has been popular in Intel processor architectures because of its flexibility and legacy requirements.
The hardwired control unit and other computer logic circuitry can be designed using Hardware Description Languages (HDLs). These languages describe the structure and the expected behaviour of an electronic circuit. Examples of specialised HDLs include VHDL and Verilog. A discussion of these languages is beyond the scope of this course.
The unit discusses the basic requirements of a control unit, followed by the hardwired control unit and the Wilkes control unit. Finally, we will discuss micro-programmed control.
11.1 OBJECTIVES
moving data between registers using the internal bus, or moving data from/to a memory location using the system bus (register transfer micro-operations)
making the ALU perform a particular operation on the data
regulating other internal operations.
But how does a control unit control the above operations? What are the functional
requirements of the control unit? What is its structure? Let us explore answers to these
questions in the next sections.
The Arithmetic Logic Unit (ALU), which performs the basic arithmetic and logical operations.
Registers, which are used for storing information within the processor.
Internal Data Paths: These paths move data between two registers or between a register and the ALU.
External Data Paths: These data paths normally link the processor registers with memory or I/O interfaces. This role is normally fulfilled by the system bus.
The Control Unit: This causes all the operations to happen in the processor.
The basic responsibility of the control unit is to guide the various components of the processor through the specific sequence of micro-operations that executes an instruction.
What functions does a control unit perform to make instruction execution feasible? An instruction is executed by performing micro-operations in a specific sequence, and this sequence may differ from one instruction to another. Thus, the control unit must perform two basic functions:
But how are these two tasks achieved? The control unit generates control signals, which in turn are responsible for achieving the two tasks stated above. But how are these control signals generated? We will answer this question in later sections. First, let us discuss a simple structure of a control unit.
In the model given above, the control unit is a black box with certain inputs and outputs.
Flags: Flags represent condition codes that can be used in decision making. Flags are set by ALU operations. For example, a zero flag, if set, tells the control unit that the result of the last ALU operation was zero. Thus, if the processor was executing the ISZ instruction (skip the next instruction if the zero flag is set), the next instruction should be skipped. This action is initiated by the control unit, which increments the PC by one instruction length, thus skipping the next instruction.
Control Signals from Control Bus: Some control signals reach the control unit through the control bus. These signals are issued from outside the processor; examples include interrupt signals and acknowledgement signals.
Based on the input signals, the control unit activates certain output control signals, which in turn are responsible for executing an instruction. These output control signals are:
Control signals required within the processor: These control signals cause two types of micro-operations, namely data transfer from one register to another, and arithmetic, logic and shift operations using the ALU.
Control signals to the control bus: These control signals transfer data between processor registers and memory or I/O interfaces. They are issued on the control bus to activate a data path on the data/address bus, etc.
A control unit must know how each instruction is to be executed. It should also know the nature of the results and the indications of possible errors. All this is achieved with the help of flags, opcodes, the clock and control signals.
A control unit contains a clock portion that provides clock pulses. This clock signal determines the timing sequence of the micro-operations. In general, the timing signals from the control unit are kept long enough to accommodate the propagation delays of signals along the various data paths within the processor. Since different control signals are generated at different times within the same instruction cycle to perform different micro-operations, a counter can be used with the clock to keep count of the timing steps. At the end of each instruction cycle, however, the counter should be reset to its initial state. Thus, the clock to the control unit must provide counted timing signals. Examples of the functionality of control units, along with timing diagrams, are given in the further readings.
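A counter driven by the clock and followed by a decoder is one way to produce such counted timing signals. The Python sketch below is illustrative only: the class name `TimingGenerator`, the four timing states and the `end_of_cycle` reset input are hypothetical, and the signals are named T1, T2, … as in this unit.

```python
# Sketch of counted timing signals: a sequence counter advanced by the
# clock and decoded so that exactly one of T1..Tn is active per pulse.
# The class name, the 4-state size and the reset input are illustrative.


class TimingGenerator:
    def __init__(self, num_states: int) -> None:
        self.num_states = num_states
        self.count = 0  # sequence counter; 0 corresponds to T1

    def signals(self) -> list[int]:
        """Decoder output: a one-hot list in which only the current T is 1."""
        return [1 if i == self.count else 0 for i in range(self.num_states)]

    def active(self) -> str:
        """Name of the timing signal that is currently active."""
        return f"T{self.count + 1}"

    def clock(self, end_of_cycle: bool = False) -> None:
        """On each clock pulse, count up, or reset to T1 when the cycle ends."""
        if end_of_cycle:
            self.count = 0
        else:
            self.count = (self.count + 1) % self.num_states


gen = TimingGenerator(num_states=4)
trace = []
for step in range(3):  # an instruction that needs three timing steps
    trace.append(gen.active())
    gen.clock(end_of_cycle=(step == 2))
trace.append(gen.active())  # first timing step of the next instruction
print(trace)  # ['T1', 'T2', 'T3', 'T1']
```

The reset input is what lets instructions of different lengths restart cleanly at T1 instead of running through every counter state.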
How are these control signals applied to realise micro-operations? The control signals are applied directly as binary inputs to the logic gates of the circuits that implement the micro-operations. These inputs select a circuit (for example, through a select or enable input), a path (for example, through multiplexers) or any other operation in the logic circuits.
In the last section, we discussed the control unit in terms of its inputs, outputs and functions. A variety of techniques have been used to organise a control unit. Most of them fall into two major categories:
1. Hardwired control organization
2. Microprogrammed control organization.
The clock signal is an input to the sequencing logic and forms one of the inputs of the control unit. The sequencing logic issues a repetitive sequence of pulses for the execution of micro-operation(s). These timing signals control the sequence in which an instruction is executed and determine which control signal must be applied at what time. A typical example of the timing sequence for the micro-operations of the different sub-cycles of an instruction is given in Unit 10.
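Figure 11.2: Block Diagram of Hardwired Control Unit (the n opcode bits of IR feed a decoder with 2^n output lines, of which only one is selected; the clock drives the sequencing logic, which generates a sequence of timing signals T1, T2, …, Tn; the decoder outputs, the timing signals and the condition codes or flags enter the control unit, which issues the control signals in the required time sequence)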
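In a hardwired unit such as the one in Figure 11.2, each control signal is a Boolean function of the decoded opcode lines, the timing signals and the flags. The Python sketch below writes a few such equations. The two 2-bit opcodes, the T4 execute step and the signal names are hypothetical; the fetch signals follow the usual sequence MAR ← PC at T1, DR ← (MAR) at T2, and PC ← PC + 1 with IR ← DR at T3.

```python
# Sketch of hardwired control logic: each control signal is an AND/OR
# combination of a decoded opcode line, a timing signal and, where needed,
# a flag. Opcodes, signal names and equations are hypothetical.

OPCODE_ADD, OPCODE_BZ = 0b00, 0b01  # hypothetical 2-bit opcodes


def decode(opcode: int, n_bits: int) -> list[int]:
    """n-to-2**n decoder: exactly one output line is 1."""
    return [1 if line == opcode else 0 for line in range(2 ** n_bits)]


def control_signals(opcode: int, t: int, zero_flag: int) -> dict[str, int]:
    """Control signals for the opcode in IR at timing step Tt (t = 1..4)."""
    d = decode(opcode, n_bits=2)
    T = {k: int(k == t) for k in range(1, 5)}  # one-hot T1..T4
    return {
        "MAR<-PC": T[1],                            # fetch, any opcode
        "DR<-(MAR)": T[2],
        "PC<-PC+1": T[3],
        "IR<-DR": T[3],                             # parallel with PC increment
        "ALU_ADD": d[OPCODE_ADD] & T[4],            # execute step of ADD
        "PC<-EA": d[OPCODE_BZ] & T[4] & zero_flag,  # BZ taken only if Z = 1
    }


def active(opcode: int, t: int, zero_flag: int = 0) -> list[str]:
    """Names of the control signals that are 1."""
    return [name for name, v in control_signals(opcode, t, zero_flag).items() if v]


print(active(OPCODE_ADD, 3))    # ['PC<-PC+1', 'IR<-DR']
print(active(OPCODE_BZ, 4, 1))  # ['PC<-EA']
print(active(OPCODE_BZ, 4, 0))  # []
```

Changing an instruction's behaviour here means rewriting the equations, which mirrors why hardwired control is fast but hard to modify.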
11.4 WILKES CONTROL
Figure 11.3: Wilkes control (recoverable labels: IR; a decoder that selects only one control line, that is, one micro-instruction, based on the selection bits; conditional signal input; control signals)
The control memory in Wilkes control was organised as a PLA-like matrix of diodes. Each horizontal line of this matrix has two components: the control signals and the address of the next micro-instruction. The next micro-instruction register stores the address of the next micro-instruction to be loaded. Note that this register should be loaded at the falling edge of the clock, that is, once the previous micro-instruction completes its execution. The next micro-instruction to be executed is specified either by the IR, after the instruction has been decoded, or by the next-address field of the current micro-instruction itself. The register output passes through the address decoder, and the decoded line of the control matrix is selected to generate the control signals for the processor. The Wilkes control unit also handles conditions. For example, a condition such as the zero flag can be attached to the conditional signal input, which determines the micro-instruction to be executed next. More details on the Wilkes control unit may be studied from the further readings.
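The Wilkes sequencing rule (each row supplies its control signals plus the next row address, and a conditional row offers two addresses) can be sketched in Python as follows. The `Row` type, the four-row matrix and its bit patterns are hypothetical; the dictionary lookup stands in for the address decoder selecting exactly one row.

```python
# Sketch of Wilkes-style sequencing. Each row of the control matrix holds
# the control-signal pattern and the next row address; a conditional row
# holds a second address chosen when the conditional signal input is true.
# The rows and bit patterns are hypothetical.
from typing import NamedTuple, Optional


class Row(NamedTuple):
    signals: str                        # pattern on the control lines
    next_addr: int                      # next row (condition false or no condition)
    next_if_true: Optional[int] = None  # only present on conditional rows


def wilkes_step(matrix: dict[int, Row], addr: int, condition: bool) -> tuple[str, int]:
    row = matrix[addr]                  # the decoder selects exactly one row
    if row.next_if_true is not None and condition:
        return row.signals, row.next_if_true
    return row.signals, row.next_addr


def run(matrix: dict[int, Row], entry: int, condition: bool, steps: int) -> list[str]:
    addr, issued = entry, []
    for _ in range(steps):
        signals, addr = wilkes_step(matrix, addr, condition)  # latch next address
        issued.append(signals)
    return issued


matrix = {
    0: Row("1000", 1),
    1: Row("0100", 2, next_if_true=3),  # conditional (branching) row
    2: Row("0010", 0),
    3: Row("0001", 0),
}
print(run(matrix, 0, True, 3))   # ['1000', '0100', '0001']
print(run(matrix, 0, False, 3))  # ['1000', '0100', '0010']
```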
Figure 11.4: Micro-programmed control unit (recoverable labels: flags from the ALU and the clock signal enter the sequencing logic, which generates the address of the next micro-instruction; a register stores the micro-instruction address; the control memory consists of micro-instructions; a register stores the current micro-instruction; outputs are the control signals and the control signals for generating addresses of micro-instructions)
The control memory of the micro-programmed control unit stores the micro-instructions. The micro-instruction address register stores the address of the micro-instruction that should be used to generate the control signals and the address of the next micro-instruction. The micro-instruction register stores the most recently read micro-instruction, which is used to generate the control signals for performing micro-operations and the control signals for generating the address of the next micro-instruction. Executing a micro-instruction primarily involves generating the desired control signals and the signals used to determine the next micro-instruction to be executed. The sequencing logic of this control unit loads the micro-instruction address register. It also issues a read command to the control memory, which stores the micro-instructions.
The following functions are performed by the micro-programmed control unit:
1. The sequencing logic places the address of the control memory word containing the micro-instruction to be read into the micro-instruction address register. It also issues the READ signal to the control memory, so that the desired micro-instruction can be read.
2. The control memory word containing the desired micro-instruction is read into the micro-instruction register.
3. The micro-instruction register forms the input to the logic circuit that generates the control signals for the current micro-instruction. This circuitry also generates the control signals that the sequencing logic uses to generate the address, in the control memory, of the micro-instruction to be executed next.
4. The sequencing logic uses these control signals, together with the flag register, to compute the address of the micro-instruction to be executed next.
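Steps 1 to 4 form a loop that can be sketched in Python. In this sketch, which is an illustration rather than a real control store, each micro-instruction carries its control signals, a branch condition and a branch address; the field names, the condition codes "NEXT" and "ALWAYS", and the demo routine (a hypothetical three-word routine that tests the zero flag Z) are all assumptions.

```python
# Sketch of steps 1-4: the sequencing logic reads the control memory word
# at the address register into the micro-instruction register, issues its
# control signals, then computes the next address from the branch fields
# and the flags. Field names, conditions and the routine are hypothetical.
from typing import NamedTuple


class MicroInstruction(NamedTuple):
    signals: tuple[str, ...]  # control signals generated when executed
    condition: str            # "NEXT", "ALWAYS" or a flag name such as "Z"
    branch_addr: int          # next address when the condition holds


def run_microprogram(control_memory, start, flags, max_steps=16):
    car = start               # micro-instruction address register
    issued = []
    for _ in range(max_steps):
        if car not in control_memory:  # address outside this routine: stop
            break
        mir = control_memory[car]      # steps 1-2: READ into the micro-instruction register
        issued.append(mir.signals)     # step 3: generate the control signals
        taken = mir.condition == "ALWAYS" or (
            mir.condition != "NEXT" and flags.get(mir.condition, 0) == 1
        )
        car = mir.branch_addr if taken else car + 1  # step 4: next address
    return issued


EXIT = -1  # hypothetical address standing for "go to the fetch cycle"
routine = {
    0: MicroInstruction((), "Z", 2),                   # test Z; branch if set
    1: MicroInstruction((), "ALWAYS", EXIT),           # Z clear: nothing to do
    2: MicroInstruction(("PC<-EA",), "ALWAYS", EXIT),  # Z set: load PC
}
print(run_microprogram(routine, 0, {"Z": 1}))  # [(), ('PC<-EA',)]
print(run_microprogram(routine, 0, {"Z": 0}))  # [(), ()]
```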
As discussed earlier, the micro-operations of the execute cycle differ from one instruction to another, and the addressing mode may also differ. All such information is generally coded in the instruction, which is stored in the Instruction Register (IR). The IR input to the micro-instruction address register of the control memory is used to determine the micro-instruction that performs the execute cycle of the instruction. The decoder following the IR uses the IR contents to generate the address, in the control memory, of the first micro-instruction for the instruction in the IR (refer to Figure 11.4).
3. What will be the control signals and the address of the next micro-instruction in the Wilkes control example of Figure 11.3, if the entry address for a machine instruction selects the branching control line and the conditional bit value for the branch is true (assume that, of the two branching lines, the bottom line is selected when the condition is true)?
11.6 THE MICRO-INSTRUCTIONS
A micro-instruction, as defined earlier, is an instruction of a micro-program. It
specifies one or more micro-operations, which can be executed simultaneously. On executing a micro-instruction, a set of control signals is generated, which in turn causes the desired micro-operations to happen.
… … …
Let us give an example of control memory organisation. Take the machine instruction Branch on zero. This instruction causes a branch to a specified main memory address if the result of the last ALU operation is zero, that is, if the zero flag is set. The pseudocode of the micro-program for this instruction may be written as:
Test "Zero flag"; if SET, branch to the micro-code at label ZERO
Unconditional branch to the micro-code at label NON-ZERO
ZERO: Micro-code for the sequence of micro-operations that replaces the PC with the effective address of the instruction operand.
NON-ZERO: Branch to the interrupt or fetch cycle.
Note that if the zero flag is not SET, no operation is needed, as the next instruction in sequence is to be executed and its address is already in the PC. Thus, the next instruction should simply be fetched.
Figure 11.6: Micro-instruction formats (recoverable labels). (a) Horizontal micro-instruction: control signals for the processor's internal operations; control signals for operations using the system bus; jump conditions (unconditional, zero, overflow, indirect); branch address. (b) Vertical micro-instruction: function codes, which pass through decoders to produce the control signals; jump conditions (unconditional, zero, overflow, indirect); branch address.
In a vertical micro-instruction, many similar control signals can be encoded into a few micro-instruction bits. For example, 16 ALU operations, which may require 16 individual control bits in a horizontal micro-instruction, need only 4 encoded bits (2^4 = 16) in a vertical micro-instruction. Similarly, a vertical micro-instruction needs only 3 bits (2^3 = 8) to select one of eight registers. However, these encoded bits must be passed through the respective decoders to obtain the individual control signals. This is shown in Figure 11.6(b).
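To make the 4-bit and 3-bit figures concrete, the sketch below decodes the two encoded fields of a 7-bit vertical micro-instruction into the 16 + 8 = 24 individual lines that a horizontal micro-instruction would carry directly. The word layout (ALU field in bits 6 to 3, register field in bits 2 to 0) is a hypothetical choice.

```python
# Sketch: a vertical micro-instruction stores encoded fields, and decoders
# expand each field into individual control lines. The 7-bit word layout
# (ALU field in bits 6..3, register field in bits 2..0) is hypothetical.


def decode_field(value: int, width: int) -> list[int]:
    """width-to-2**width decoder: one-hot list of control lines."""
    return [1 if line == value else 0 for line in range(1 << width)]


def decode_vertical(word: int) -> tuple[list[int], list[int]]:
    alu_code = (word >> 3) & 0b1111  # selects one of 16 ALU operations
    reg_code = word & 0b111          # selects one of 8 registers
    return decode_field(alu_code, 4), decode_field(reg_code, 3)


alu_lines, reg_lines = decode_vertical(0b0101_011)  # ALU operation 5, register 3
print(len(alu_lines), alu_lines.index(1))  # 16 5
print(len(reg_lines), reg_lines.index(1))  # 8 3
```

The vertical word stores 7 bits instead of 24 and pays for the saving with the two decoders, whose gate delays are the reason vertical micro-instructions execute more slowly.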
In general, horizontal micro-instructions execute faster but require wider words, whereas vertical micro-instructions are shorter but require decoders. Most systems use neither purely horizontal nor purely vertical micro-instructions, but a combination of the two (Figure 11.6(c)).
The micro-instruction cycle consists of two basic cycles: fetch and execute. In the fetch cycle, the address of the micro-instruction is generated and the micro-instruction is read into the micro-instruction register for execution. Executing a micro-instruction simply means generating control signals. These control signals may drive the processor (internal control signals) or the system bus. The format of a micro-instruction and its contents determine the complexity of the logic module that executes it. This logic module depends on the encoding of micro-instructions, which is discussed next.
Since we are dealing with binary control signals, an N-bit micro-instruction can represent 2^N combinations of control signals.
combinations where both these control signals are active for the same
destination are redundant.
2. A register cannot act as a source and a destination at the same time. Thus, such
a combination of control signals is redundant.
3. You can provide only one pattern of control signals at a time to ALU, making
some of the combinations redundant.
4. You can provide only one pattern of control signals at a time to the external
control bus also.
Therefore, you do not need all 2^N combinations. Suppose you need only 2^K combinations (where K < N); then you need only K encoded bits instead of N control signals. A K-bit micro-instruction is an extremely encoded micro-instruction. Let us touch upon the characteristics of extremely encoded and unencoded micro-instructions:
Unencoded micro-instructions
One bit is needed for each control signal; therefore, the number of bits in a micro-instruction is high.
It presents a detailed hardware view, as the need for each control signal can be determined.
Since each control signal can be controlled individually, these micro-instructions are difficult to program. However, concurrency can be exploited easily.
Almost no control logic is needed to decode the micro-instruction, as there is a one-to-one mapping between control signals and the bits of the micro-instruction. Thus, a micro-instruction, and hence the micro-program, executes faster.
Unencoded micro-instructions aim at optimising the performance of a machine.
Highly encoded micro-instructions
An encoded micro-instruction needs fewer bits than an unencoded micro-instruction.
It provides an aggregated, higher-level view of the processor, as only encoded sequences can be used for micro-programming.
Encoding reduces the programming burden; however, concurrency may not be exploited to the fullest.
Complex control logic is needed, as decoding is a must. Thus, the execution of a micro-instruction incurs propagation delay through the decoding gates, and a micro-program takes longer to execute than one made of unencoded micro-instructions.
Highly encoded micro-instructions aim at optimising programming effort.
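The bit counts behind this comparison can be computed directly. The Python sketch below assumes the control signals fall into groups in which exactly one signal is active at a time, in the spirit of the ALU and control bus restrictions listed earlier; the group names and sizes are hypothetical. It returns N for an unencoded micro-instruction, the width when each group is encoded as a separate field, and the minimum K over all valid combinations.

```python
# Sketch: control-word width for unencoded, field-encoded and extremely
# encoded micro-instructions. Each group is a set of mutually exclusive
# control signals (exactly one active at a time is assumed); the group
# names and sizes are hypothetical.
import math


def encoded_bits(combinations: int) -> int:
    """Smallest K with 2**K >= combinations."""
    return (combinations - 1).bit_length()


def widths(groups: dict[str, int]) -> tuple[int, int, int]:
    unencoded = sum(groups.values())                        # one bit per signal (N)
    field_encoded = sum(encoded_bits(n) for n in groups.values())
    extreme = encoded_bits(math.prod(groups.values()))      # K for all valid combinations
    return unencoded, field_encoded, extreme


print(widths({"ALU operation": 16, "register select": 8, "bus operation": 4}))  # (28, 9, 9)
print(widths({"group A": 5, "group B": 5}))                                     # (10, 6, 5)
```

The second case shows that encoding field by field can need a bit or two more than the extreme K when group sizes are not powers of two; in exchange, each field can typically be decoded by its own small, independent decoder.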
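Figure 11.6(c): Partially encoded micro-instruction (recoverable labels: an unencoded part whose bits drive control signals directly; an encoded part of n-bit fields, each passed through a decoder to select one of 2^n control signals; a conditional codes field with its own decoder)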
Next instruction in sequence: Figure 11.5 shows one example of control memory. This control organisation has micro-instructions for the fetch, indirect and interrupt cycles, followed by the execute cycle. Recall from Unit 10 that the fetch sub-cycle of the instruction cycle consists of the following sequence of micro-operations:
T1: MAR ← PC
T2: DR ← (MAR)
T3: PC ← PC + 1; IR ← DR
One micro-instruction can be created for each timing step. For example, for the stated sequence of micro-operations, three micro-instructions would be created, one each for timing T1, T2 and T3. These micro-instructions would be stored at addresses 00h, 01h and 02h. The micro-instruction at 03h would be a conditional branch to the indirect or execute cycle. Note also that at time T3, two micro-operations are to be performed in parallel. Thus, the micro-instruction at 02h should generate control signals that make the related units of the processor increment the PC and transfer DR to IR simultaneously.
The micro-instructions at 00h to 03h are executed in sequence to perform the instruction fetch.
Branch address (conditional or unconditional): Note that the last micro-instruction of the instruction fetch is a conditional branch. Note also that in Figure 11.5, the indirect cycle starts at address 04h and the execute cycle starts at 0Ch. Thus, the micro-instruction at address 03h would be a conditional branch whose condition is whether or not the indirect bit is set. The branch is taken to address 0Ch (the start of the execute cycle) if the indirect bit is CLEAR. If the indirect bit is SET, the next micro-instruction in sequence is executed; this is the first micro-instruction of the indirect cycle, which converts the indirect operand to a direct operand. Note also in Figure 11.5 that the last micro-instruction of the indirect cycle is an unconditional jump to the execute cycle.
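The address sequencing just described can be traced with a short Python sketch. It uses the addresses given for Figure 11.5 (fetch at 00h to 03h, indirect cycle from 04h, execute cycle at 0Ch); because the figure itself is not reproduced here, placing the indirect cycle's final jump at 07h is an assumption.

```python
# Sketch of micro-instruction address sequencing for the instruction fetch:
# 00h-02h hold the fetch micro-operations, 03h branches on the indirect bit,
# the indirect cycle starts at 04h and the execute cycle at 0Ch.
# INDIRECT_LAST = 07h is an assumption; Figure 11.5 is not reproduced here.

EXECUTE_START = 0x0C
BRANCH_ON_INDIRECT = 0x03
INDIRECT_LAST = 0x07  # assumed last word of the indirect cycle


def fetch_trace(indirect_bit: int) -> list[str]:
    """Control-memory addresses visited from 00h until the execute cycle starts."""
    addr, visited = 0x00, []
    while addr != EXECUTE_START:
        visited.append(f"{addr:02X}h")
        if addr == BRANCH_ON_INDIRECT:
            # conditional branch: go straight to execute if the indirect bit is CLEAR
            addr = EXECUTE_START if indirect_bit == 0 else addr + 1
        elif addr == INDIRECT_LAST:
            addr = EXECUTE_START  # unconditional jump to the execute cycle
        else:
            addr += 1             # next micro-instruction in sequence
    visited.append(f"{addr:02X}h")
    return visited


print(fetch_trace(0))  # ['00h', '01h', '02h', '03h', '0Ch']
print(fetch_trace(1))  # ['00h', '01h', '02h', '03h', '04h', '05h', '06h', '07h', '0Ch']
```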
Calculated on the basis of opcode: The opcode in the Instruction Register is decoded to determine the operation to be performed on the operands. The control unit supports this decode operation. In a micro-programmed control unit, the opcode can be used to compute the address of the first micro-instruction to be executed to perform the operation. In Figure 11.5, the execute cycle contains the micro-instructions that jump to the micro-instruction address of the desired operation.
Let us explain this with an example. Assume that in Figure 11.5, the micro-instructions for the opcodes start at micro-instruction address 00h instead of 10h, and the micro-instructions of the fetch, indirect, interrupt and execute cycles start at micro-instruction addresses F0h, F4h, F8h and FCh respectively. Further, assume that the operation of each opcode is performed using just four micro-instructions. The control memory has addresses 00h to FFh, of which 00h to EFh (a total of F0h addresses) store the micro-instructions of the various opcodes. Therefore, this control memory can hold micro-instructions for F0h/4h = 3Ch (60) opcodes. Thus, the possible opcodes for such a machine would be 00000000₂ to 00111011₂, or, using 6 bits, 000000₂ to 111011₂. How would these opcodes be mapped to the related micro-instruction start addresses? The following table shows this mapping:
Note that other control memory and opcode organisations may make this address computation more complex. In any case, you can design the related logic circuit for calculating the micro-instruction address for a given opcode.
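Under the example's assumptions (four micro-instructions per opcode, opcode routines in 00h to EFh), the mapping reduces to appending two zero bits to the 6-bit opcode. The following Python sketch computes a few entries; it illustrates this particular layout only.

```python
# Sketch of the opcode-to-address mapping in the example above: opcode
# routines occupy 00h..EFh and each routine is four micro-instructions
# long, so the start address is opcode * 4 (the opcode shifted left by
# two bits). Only the values given in the text are used.

OPCODE_AREA_SIZE = 0xF0   # addresses 00h..EFh
WORDS_PER_OPCODE = 4


def start_address(opcode: int) -> int:
    if not 0 <= opcode < OPCODE_AREA_SIZE // WORDS_PER_OPCODE:
        raise ValueError(f"opcode {opcode:#x} has no micro-routine")
    return opcode << 2    # same as opcode * WORDS_PER_OPCODE


print(hex(OPCODE_AREA_SIZE // WORDS_PER_OPCODE))  # 0x3c
for op in (0b000000, 0b000001, 0b000010, 0b111011):
    print(f"{op:06b} -> {start_address(op):02X}h")
# 000000 -> 00h
# 000001 -> 04h
# 000010 -> 08h
# 111011 -> ECh
```

In hardware this mapping needs no arithmetic: the 6 opcode bits drive the upper address lines and the two low-order address lines are tied to 0.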
You must refer to further readings for more detailed information on Micro-
programmed Control Unit Design.
d) A decoder is needed to find a branch address in the vertical micro-
instruction.
f) Status bits supplied from ALU to sequencing logic have no role to play
with the sequencing of micro-instruction.
11.8 SUMMARY
In this unit, we have discussed the organisation of control units: hardwired, Wilkes and micro-programmed control units. The key to such control units is the micro-instruction, whose types and formats have been briefly described in this unit. The functioning of a micro-programmed control unit, that is, micro-program execution, has also been discussed. The control unit is key to the optimised performance of a computer. The information given in this unit can be supplemented by going through the further readings.
11.9 SOLUTIONS/ANSWERS
Check Your Progress 1
1. IR, Timing Signal, Flags Register
2. The control unit issues control signals that cause execution of micro-operations in
a pre-determined sequence. This enables execution sequence of an instruction.
3. A logic circuit-based implementation of control unit.
2. (a) False (b) False (c) False (d) False
3. Trace Figure 11.3 from left to right and select the bottom branch line.
The control signals would be 000…00
The address of the next micro-instruction would be 100…10
3. Wilkes control typically has one address field. However, for a conditional
branching micro-instruction, it contains two addresses. The Wilkes control, in
fact, is a hardware representation of a micro-programmed control unit.
4.
Unencoded micro-instructions   | Highly encoded micro-instructions
Large number of bits           | Relatively fewer bits
Difficult to program           | Easy to program
No decoding logic              | Need decoding logic
Optimize machine performance   | Optimize programming effort
Detailed hardware view         | Aggregated view
UNIT 12 REDUCED INSTRUCTION SET COMPUTER ARCHITECTURE
Structure
12.0 Introduction
12.1 Objectives
12.2 Introduction to RISC
12.2.1 Importance of RISC Processors
12.2.2 Reasons for Increased Complexity
12.2.3 High Level Language Program Characteristics
12.3 RISC Architecture
12.4 The Use of Large Register File
12.5 Comments on RISC
12.6 RISC Pipelining
12.7 Summary
12.8 Solutions/Answers
12.0 INTRODUCTION
In the previous units, we discussed the instruction set, register organisation and pipelining, and control unit organisation. The trend of those years was to have a large instruction set, a large number of addressing modes and about 16–32 registers. However, there existed a school of thought that favoured a simple instruction set. This view was based mainly on the types of programs being written for various machines. It led to the development of a new type of computer called the Reduced Instruction Set Computer (RISC). In this unit, we discuss RISC machines. Our emphasis will be on the basic principles of RISC and its pipeline. You may refer to the further readings for more details on this architecture.
12.1 OBJECTIVES
After going through this unit, you should be able to:
However, this assumption is less valid today, when main memory is supported by cache memory. Cache memories have reduced the gap between CPU and memory speeds; therefore, executing an operation as a subroutine of simpler instructions may not be much slower. Let us explain this with the help of an example:
Suppose the floating-point operation ADD A, B requires the following steps, assuming the machine does not have floating-point registers. The registers used for the exponents are E1, E2 and EO (output), and those used for the mantissas are M1, M2 and MO (output):
Load the exponent of A in E1
Load the mantissa of A in M1
Load the exponent of B in E2
Load the mantissa of B in M2
Compare E1 and E2
If E1 = E2 then MO ← M1 + M2 and EO ← E1
- Normalise MO and adjust EO
- Result is contained in MO, EO
else if E1 < E2 then find the difference = E2 – E1
- Shift M1 right by the difference
- MO ← M1 + M2 and EO ← E2
- Normalise MO and adjust EO
- Result is contained in MO, EO
else (E2 < E1) find the difference = E1 – E2
- Shift M2 right by the difference
- MO ← M1 + M2 and EO ← E1
- Normalise MO and adjust EO
- Result is contained in MO, EO
Move the mantissa and exponent of the result to A
Check for overflow or underflow, if any.
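The step list can be checked with a small simulation. This Python sketch follows the same compare, align, add and normalise sequence using integer mantissas, where a number is a pair (mantissa, exponent) with value mantissa × 2^exponent. The 8-bit mantissa width and this representation are assumptions, and sign handling, rounding and the final overflow/underflow check are left out.

```python
# Sketch of the ADD A, B steps for positive operands. A number is a pair
# (mantissa, exponent) with value mantissa * 2**exponent; a normalised
# mantissa has exactly P bits. P, the representation and the omission of
# signs, rounding and overflow/underflow checks are simplifying assumptions.

P = 8  # hypothetical mantissa width in bits


def normalise(m: int, e: int) -> tuple[int, int]:
    if m == 0:
        return 0, 0
    while m >= 1 << P:        # carry out of the mantissa: shift right
        m >>= 1
        e += 1
    while m < 1 << (P - 1):   # leading zero: shift left
        m <<= 1
        e -= 1
    return m, e


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    (m1, e1), (m2, e2) = a, b      # load M1, E1 and M2, E2
    if e1 < e2:
        m1 >>= e2 - e1             # shift M1 right by the difference
        eo = e2
    elif e2 < e1:
        m2 >>= e1 - e2             # shift M2 right by the difference
        eo = e1
    else:
        eo = e1
    return normalise(m1 + m2, eo)  # MO <- M1 + M2, then normalise MO and adjust EO


def value(x: tuple[int, int]) -> float:
    m, e = x
    return m * 2.0 ** e


three = (0b11000000, -6)         # 192 * 2**-6 = 3.0
one_and_half = (0b11000000, -7)  # 192 * 2**-7 = 1.5
result = fp_add(three, one_and_half)
print(result, value(result))     # (144, -5) 4.5
```

Even this simplified version needs several comparisons, shifts and loops, which is the work that would be spread over many instruction cycles if coded as a sequence of machine instructions.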
If all these steps are coded as a sequence of machine instructions, this simple operation will require several instruction cycles. If the operation is instead made part of the machine's instruction set architecture as an instruction ADDF A, B (add the floating-point numbers A and B and store the result in A), it becomes a single machine instruction. All the above steps will then be coded as micro-operations in the control unit's micro-program. Thus, just one instruction cycle (although a long one) may be needed, and this cycle requires just one instruction fetch, whereas with the sequence of machine instructions, every instruction has to be fetched from memory.
However, a fast cache memory for instructions, together with data held in registers, can create an almost similar instruction execution environment, and pipelining can increase the speed further. Thus, creating an instruction such as ADDF may not result in faster execution.
A control unit can be constructed in two ways: as a micro-program that executes micro-instructions, or as circuits built for the execution of each instruction. Micro-programmed control makes the implementation of complex architectures more cost-effective than hardwired control, as the cost of expanding an instruction set is very small: only a few more micro-instructions in the control store. Thus, it may be reasoned that moving subroutines such as string editing, integer-to-floating-point conversion and mathematical evaluations such as polynomial evaluation into the control unit micro-program is more cost-effective. However, such a mechanism may result in slightly slower execution of commonly used instructions.
Smaller programs are advantageous because they require less RAM space, and fewer instructions mean fewer instruction bytes to be fetched. But this does not ensure that a program written for a CISC machine will be smaller than the same program written for a RISC machine. A CISC program may have fewer instructions, yet its overall size in bytes may not be smaller. This is because RISC instructions use register addressing and are simpler, so they generally need fewer bits. In addition, note that even compilers on CISC machines favour simpler instructions. Let us explain this with the help of the following example:
Assume a CISC machine has a 4 GB byte-addressable RAM (2^32 bytes, so a memory address needs 32 bits) and 32 registers (2^5, so a register address needs 5 bits). A machine instruction has two operands, one of which must be a register operand. A comparable RISC machine has the same RAM size and the same number of registers. Further, the CISC machine uses 16 bits to represent the opcode and addressing mode, and the RISC machine uses 8 bits. Figure 12.1(a) shows ADD and MOV instructions for the CISC machine. The RISC machine, on the other hand, would have at least two instruction formats: one to load data from RAM into a register or store a register to memory, and one for addition on registers. Figure 12.1(b) shows these two instruction formats for the RISC machine.
Field widths (bits): 16 | 5 | 32
ADD R1 A ; R1 ← R1 + [A]
ADD A R1 ; [A] ← R1 + [A]
MOV R1 A ; R1 ← [A]
MOV A R1 ; [A] ← R1
(a) Sample instructions of CISC
Field widths (bits): 8 | 5 | 32
LDA R1 A ; R1 ← [A]
STR R1 A ; [A] ← R1
Field widths (bits): 8 | 5 | 5
ADD R1 R2 ; R1 ← R1 + R2
(b) A sample Load, Store and ADD instruction of RISC
Figure 12.1: Sample machine instructions
Figure 12.1 shows the instructions for a CISC and a RISC machine. Each CISC instruction is 16 + 5 + 32 = 53 bits long; therefore, it will be stored in 7 bytes, or 56 bits. The load (LDA) and store (STR) instructions of the RISC machine are 8 + 5 + 32 = 45 bits long, so they would be stored in 6 bytes. The RISC ADD instruction uses 8 + 5 + 5 = 18 bits, or 3 bytes. Consider the following sequence of operations on these two machines:
C = A + B
Field widths (bits): 16 | 5 | 32
MOV R1 A ; R1 ← [A]
ADD R1 B ; R1 ← R1 + [B]
MOV C R1 ; [C] ← R1
Program segment size = 7 × 3 = 21 bytes
Size in bits = 53 × 3 = 159 bits
(a) Segment to compute C=A+B using sample CISC ISA
Field widths (bits): 8 | 5 | 32 (LDA, STR); 8 | 5 | 5 (ADD)
LDA R1 A ; R1 ← [A]
LDA R2 B ; R2 ← [B]
ADD R1 R2 ; R1 ← R1 + R2 (18 bits)
STR R1 C ; [C] ← R1
Program segment size = 6 × 3 + 3 = 21 bytes
Size in bits = 45 × 3 + 18 = 153 bits
(b) Segment to compute C=A+B using sample RISC ISA
Figure 12.2: Execution of C=A+B on hypothetical machines
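The byte and bit totals in Figure 12.2 follow from the field widths in Figure 12.1. The Python sketch below recomputes them, assuming, as the figure does, that each instruction is padded to a whole number of bytes in memory.

```python
# Sketch reproducing the size arithmetic of Figure 12.2: each instruction
# is a tuple of field widths in bits, padded to whole bytes in memory.


def nbytes(fields: tuple[int, ...]) -> int:
    return -(-sum(fields) // 8)    # ceiling division by 8


def segment_size(program: list[tuple[int, ...]]) -> tuple[int, int]:
    bits = sum(sum(f) for f in program)
    return bits, sum(nbytes(f) for f in program)


CISC_MEM = (16, 5, 32)   # opcode/mode, register, memory address
RISC_MEM = (8, 5, 32)    # LDA / STR
RISC_REG = (8, 5, 5)     # register-register ADD

cisc = [CISC_MEM] * 3                            # MOV, ADD, MOV
risc = [RISC_MEM, RISC_MEM, RISC_REG, RISC_MEM]  # LDA, LDA, ADD, STR
print(segment_size(cisc))  # (159, 21)
print(segment_size(risc))  # (153, 21)
```

Although the RISC segment has one more instruction, its short register-register ADD keeps the total at the same 21 bytes and fewer bits.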
So, the expectation that a CISC machine will produce smaller programs may not be correct. In addition, since memory is inexpensive today, this potential advantage of smaller programs is not very compelling.
However, even though instructions closer to high-level language constructs were implemented in Complex Instruction Set Computers (CISCs), these instructions were hard to exploit, because compilers had to find the cases that exactly fit those constructs. In addition, optimising the generated code to minimise code size, reduce the instruction execution count and enhance pipelining is much more difficult with such a complex instruction set.
Another motivation for increasingly complex instruction sets was that a complex
HLL operation would execute more quickly as a single machine instruction than
as a series of more primitive instructions. However, because programmers tend to
favour simpler instructions, it may turn out otherwise. CISC
requires a more complex control unit with a larger microprogram control store to
accommodate its richer instruction set, and this increases the execution time of the simpler
instructions.
Thus, it is far from clear that the trend to complex instruction sets is appropriate. This
has led a number of groups to pursue the opposite path.
GOTO FEW
Others 1-5%
Observations
b) CISC yields smaller programs than RISC, which improves its performance;
therefore, it is far superior to RISC.
One instruction per cycle: A machine cycle is the time taken to fetch two
operands from registers, perform the ALU operation on them and store the
result in a register. Thus, a RISC instruction takes about the same time to execute
as a micro-instruction on a CISC machine. Because such simple instructions are
executed directly rather than through micro-instructions, the control unit can be built
from fast logic circuits, further increasing execution efficiency.
Thus, RISC is potentially a very strong architecture. It has high performance potential
and can support VLSI implementation. Let us discuss these points in more detail.
In general, register storage is faster than both main memory and the cache. Also,
register addressing uses much shorter addresses than those for main memory
and the cache. However, the number of registers in a machine is small, as
the same chip generally also contains the ALU and the control unit. Thus, a strategy is needed that
optimizes register use, allowing the most frequently accessed operands to be
kept in registers in order to minimize register-memory operations.
It may seem that a larger number of registers would always lead to fewer memory accesses;
however, in general, about 32 registers were considered optimum. So how can a
large register file further optimize program execution?
Since most operand references in C programs are to the local variables of a function, these are the
obvious choice for storing in registers. Some registers can also be used for global
variables. The problem, however, is that programs follow a function call-return pattern,
so the local variables of interest belong to the most recently called function. In addition, a call
requires saving the context of the calling program and the return address, and it also
requires parameter passing. On return from a call, the variables of the calling
program must be restored and the results must be passed back to the calling program.
The RISC register file supports such calls and returns with the help of register
windows. The register file is broken into overlapping sets of smaller groups of
registers, as shown in Figure 12.4. Each of these register sets can be used by a different
function/subroutine, and a function call automatically changes the set being used. A call thus
switches from one fixed-size window of registers to another, rather than saving registers in
memory as is done in CISC. Windows for adjacent procedures overlap, which
allows parameters to be passed without moving the variables at all. The following
figure tries to explain this concept:
Assumptions:
The register file contains 138 registers, numbered 0 – 137.
Further, a program has three functions, viz. main, sorting and Xchange. The operating
system calls function main (fmain), which calls function sorting (fsorting), and function
sorting calls function Xchange (fXchange).
Functioning of the registers: at any point of time, only the global registers and the one set of
registers used by the currently executing function are active.
Thus, for programming purposes there may be only 32 registers, although the register file in
the above example has a total of 138 registers. This window consists of:
But what is the maximum depth of nested function calls that RISC can allow? Let us
describe it with the help of a circular buffer diagram; technically, the register windows
described above have to be organized as a circular buffer to follow the call-return hierarchy.
This organization is shown in the following figure. The register buffer has been filled as
function A called function B, function B called function C and function C called function
D. Function D is the current function. The current window pointer (CWP) points
to the register window of the most recent function (function D in this case). Each
register reference in a machine instruction is added to the contents of CWP to
compute the address of the register holding the operand for the executing function.
The other register, the saved window pointer (SWP), points to the most recent
register window that has been saved in the memory of the computer. Such saving will be
needed if a further call is made and there is no space for that call. If function D now
calls function E, the arguments for function E are placed in D's temporary registers
(indicated by D temp) and the CWP is advanced by one window.
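The way a register number in an instruction is combined with the CWP can be illustrated with a short Python sketch. The text does not give the window layout, so the sketch assumes one that is consistent with the 138 physical and 32 visible registers mentioned above: 10 global registers plus 8 windows, each window having 6 incoming-parameter, 10 local and 6 outgoing-parameter registers, with the outgoing registers of one window serving as the incoming registers of the next. These numbers and the function name `physical_register` are assumptions for illustration only.

```python
# Assumed register-window layout (hypothetical parameters, see the lead-in)
N_GLOBAL = 10                    # logical r0..r9: global registers shared by all functions
N_IN, N_LOCAL, N_OUT = 6, 10, 6  # per-window incoming, local and outgoing registers
N_WINDOWS = 8
STEP = N_IN + N_LOCAL            # distance between consecutive windows (16)
WINDOWED = N_WINDOWS * STEP      # windowed part of the register file (128)
VISIBLE = N_GLOBAL + N_IN + N_LOCAL + N_OUT  # registers a program can name (32)


def physical_register(r, cwp):
    """Map logical register r (0..31) in the window selected by cwp to a physical register."""
    if not 0 <= r < VISIBLE:
        raise ValueError("logical register out of range")
    if r < N_GLOBAL:
        return r                 # globals are not windowed
    # Logical r10..r15 = incoming, r16..r25 = local, r26..r31 = outgoing.
    # Adding cwp * STEP selects the window; the modulo makes the file circular.
    return N_GLOBAL + (cwp * STEP + (r - N_GLOBAL)) % WINDOWED


print(N_GLOBAL + WINDOWED, VISIBLE)                       # 138 32
# Outgoing register r26 of window 3 is incoming register r10 of window 4:
print(physical_register(26, 3), physical_register(10, 4))  # 74 74
```

Because logical r26 of one window and logical r10 of the next name the same physical register, a caller can leave an argument in its outgoing registers and the callee finds it in its incoming registers without any copying.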
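(Figure: register windows W0, W1, W2, W3, ... arranged as a circular buffer, marking the current window, the saved window, D's incoming (D.in), local (D.loc) and temporary registers, and the call and return directions.)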
Now assume that function E calls function F. This call cannot be serviced, as the
circular buffer already holds the allowed number of function calls, unless space
equivalent to exactly one window is freed. This condition is easy to detect: the current
window pointer, on being incremented, would become equal to the saved window pointer. Now we
need to create space; how can we do it? The simplest way is to save the register window of
function A to memory and reuse that space. Thus, an N-window register file can support N – 1 levels
of function calls.
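The overflow rule can be modelled by tracking only the two pointers. The Python sketch below is a simplified, hypothetical model (the class and method names are assumptions, and register contents are ignored): the windows after SWP up to CWP hold live functions, the oldest window is saved to memory when incrementing CWP would make it equal to SWP, and a saved window is reloaded on return. With six windows (an assumed size), five nested activations fit, and the call to F forces function A's window out to memory.

```python
class RegisterWindowFile:
    """Toy model of a circular file of n overlapping register windows.

    Only the two pointers are modelled, not register contents. The windows
    after swp, up to and including cwp (modulo n), belong to live functions.
    """

    def __init__(self, n_windows: int) -> None:
        self.n = n_windows
        self.cwp = 0               # current window pointer: main runs in window 0
        self.swp = n_windows - 1   # saved window pointer: boundary below the oldest live window
        self.saved = 0             # windows currently held in memory
        self.spills = 0            # total windows written to memory (overflows)
        self.fills = 0             # total windows read back from memory (underflows)

    def live_windows(self) -> int:
        """Number of function activations whose windows are in the register file."""
        return (self.cwp - self.swp) % self.n

    def call(self) -> None:
        nxt = (self.cwp + 1) % self.n
        if nxt == self.swp:
            # Overflow: the new window's outgoing registers would overlap the
            # incoming registers of the oldest live window, so save that window.
            self.swp = (self.swp + 1) % self.n
            self.saved += 1
            self.spills += 1
        self.cwp = nxt

    def ret(self) -> None:
        if self.cwp == (self.swp + 1) % self.n:
            # Underflow: the caller's window is in memory and must be reloaded.
            if self.saved == 0:
                raise RuntimeError("return from the outermost function")
            self.swp = (self.swp - 1) % self.n
            self.saved -= 1
            self.fills += 1
        self.cwp = (self.cwp - 1) % self.n


rf = RegisterWindowFile(6)            # six windows, an assumed size
for _ in range(4):                    # A calls B, B calls C, C calls D, D calls E
    rf.call()
print(rf.live_windows(), rf.spills)   # 5 0
rf.call()                             # E calls F: overflow, A's window is saved
print(rf.live_windows(), rf.spills)   # 5 1
for _ in range(5):                    # F, E, D, C and B return in turn
    rf.ret()
print(rf.cwp, rf.fills)               # 0 1
```

The last line shows that A's window had to be reloaded when B returned, which is the cost paid for allowing a call deeper than N – 1 levels.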
Thus, the register file organized as above is a small, fast register buffer
that holds most of the variables that are likely to be used heavily. From this point of
view, the register file acts almost like a cache memory. So let us find out how the two
approaches differ.
Characteristics of large-register-file and cache organizations
The basic difference lies in the addressing overhead of the two approaches. Register (R)
addresses are short, whereas a cache reference is generated
from a long memory address. Thus, for accessing simple variables, the register file is
superior to cache memory. However, even in a RISC computer, performance can
be enhanced by adding an instruction cache.
a. RISC has a large register file so that more variables can be stored in registers
for longer periods of time.
b. Only global variables are stored in registers.
Studies have shown that this is not so, for the following reasons:
If an operation can be performed in more than one way, then more cases must be
considered, and the compiler writer has to balance the speed of the compiler against the
quality of the code it generates. In CISCs, compilers need to analyse the potential usage of all available
instructions, which is time-consuming. Thus, it is recommended that there be
one good way to do something. In RISC, there are few choices; for example, if an
operand is in memory, it must first be loaded into a register. Thus, RISC requires only
simple case analysis, which means a simple compiler, although more machine
instructions will be generated in each case.
RISC is tailored for the C language and will not work well with other high-level
languages.
However, studies of other high-level languages found that their most frequently executed
operations are the same simple HLL constructs found in C,
for which RISC has been optimized. Unless an HLL changes the programming paradigm,
similar results can be expected.
The good performance is due to the overlapped register windows; the reduced
instruction set has nothing to do with it.
Certainly, a major portion of the speed is due to the overlapped register windows of
the RISC, which provide support for function calls. However, please note that these register
windows are possible because of the reduction in control unit size. In addition, the control is
simpler in RISC than in CISC, which further helps the simple instructions to execute
faster.
A data processing instruction would require just two pipeline stages:
FI: Fetch the instruction from a memory address.
EI: Execute the instruction on register operands; the result is stored in a register.
Let us explain pipelining in RISC with an example program that uses the instruction set
given in Figure 12.1(b), with one additional instruction, MUL, which uses the same
format as the ADD instruction. The program segment implements the following
expression:
Z = (A + B) × C
(1) LDA R1, A (Load memory location A into R1)
(2) LDA R2, B (Load memory location B into R2)
(3) ADD R1, R2 (R1 ← R1 + R2)
(4) LDA R2, C (Load memory location C into R2)
(5) MUL R1, R2 (R1 ← R1 × R2)
(6) STR R1, Z (Store the result in memory location Z)
As discussed earlier, each of the instructions (1), (2), (4) and (6) will be processed in
three stages, and each of the instructions (3) and (5) will be processed in two stages.
Assuming that one stage is executed in one clock cycle, a total of 4 × 3 + 2 × 2 = 16 clock
cycles would be required if these instructions were executed without an
instruction pipeline. However, a pipeline that allows the stages of different instructions to
overlap and execute in parallel would execute these instructions in only 8 clock
cycles (please refer to Figure 12.6).
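The two cycle counts can be reproduced with a few lines of Python. This is a minimal sketch that assumes one stage per clock cycle and one new instruction fetched every cycle; like the 8-cycle figure quoted above, the overlapped count ignores the data dependencies discussed next. The function names are hypothetical.

```python
# Stages needed by each instruction of the Z = (A + B) x C segment:
# FI and EI for every instruction, plus TD for loads and stores.
program = [
    ("LDA R1, A", 3),
    ("LDA R2, B", 3),
    ("ADD R1, R2", 2),
    ("LDA R2, C", 3),
    ("MUL R1, R2", 2),
    ("STR R1, Z", 3),
]


def unpipelined_cycles(prog):
    """Each instruction completes all of its stages before the next one starts."""
    return sum(stages for _, stages in prog)


def ideal_pipelined_cycles(prog):
    """Instruction i (0-based) is fetched in cycle i + 1 and then advances one
    stage per cycle, so it finishes in cycle i + stages; no stalls are modelled."""
    return max(i + stages for i, (_, stages) in enumerate(prog))


print(unpipelined_cycles(program), ideal_pipelined_cycles(program))  # 16 8
```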
Please note that the pipeline shown above suffers from data dependencies
at instructions (3) and (5). In both cases, the preceding data access instruction
must complete before these instructions can execute. In addition, an instruction
pipeline may suffer from branch instruction penalties. Next, we discuss
how such problems can be minimized.
Optimization of Pipelining
RISC machines can employ a very efficient pipeline scheme because of their simple
and regular instructions. Like all other instruction pipelines, the RISC pipeline suffers from
the problems of data dependencies and branch instructions. The data dependency
problem can be handled by an optimizing compiler, which can reschedule some of
the instructions. For example, in the given program segment, the following changes
may minimize the data dependency:
Interchange instruction (3) and instruction (4); in addition, instead of loading
memory location C into R2, use register R3. Accordingly, in instruction (5),
change R2 to R3. The new instruction pipeline is shown in Figure 12.7.
(1) LDA R1, A   FI  EI  TD
(2) LDA R2, B       FI  EI  TD
(4) LDA R3, C           FI  EI  TD
(3) ADD R1, R2              FI  EI
(5) MUL R1, R3                  FI  EI
(6) STR R1, Z                       FI  EI  TD
Clock cycles    1   2   3   4   5   6   7   8
Time = 8 units
Figure 12.7: Pipeline without data dependencies
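The effect of the reordering can also be checked with a small timing model. The Python sketch below is a simplified, hypothetical model, not the exact timing of any particular machine. It assumes that a new instruction is fetched every cycle (a fetched instruction simply waits if its EI stage is delayed), that EI stages occur in program order at most one per cycle, that a loaded register can be read in EI only in the cycle after the load's TD stage, and that an ALU result can be read in the cycle after the EI that produced it. Under these assumptions the reordered segment finishes in cycle 8, matching Figure 12.7, while the original order finishes in cycle 10.

```python
def schedule(program):
    """Return the clock cycle in which the last stage of the program completes.

    program: list of (opcode, destination register or None, source registers).
    """
    ready = {}             # register -> first cycle its new value can be used in EI
    last_fi = last_ei = 0
    finish = 0
    for opcode, dst, srcs in program:
        fi = last_fi + 1                                   # one fetch per cycle
        ei = max(fi + 1, last_ei + 1,                      # in-order execution
                 *(ready.get(r, 0) for r in srcs))         # wait for operands
        end = ei + 1 if opcode in ("LDA", "STR") else ei   # TD stage for memory ops
        if dst is not None:
            ready[dst] = end + 1
        last_fi, last_ei, finish = fi, ei, max(finish, end)
    return finish


original = [  # instructions (1) to (6) in their original order
    ("LDA", "R1", []), ("LDA", "R2", []), ("ADD", "R1", ["R1", "R2"]),
    ("LDA", "R2", []), ("MUL", "R1", ["R1", "R2"]), ("STR", None, ["R1"]),
]
reordered = [  # (1), (2), (4) using R3, (3), (5) using R3, (6)
    ("LDA", "R1", []), ("LDA", "R2", []), ("LDA", "R3", []),
    ("ADD", "R1", ["R1", "R2"]), ("MUL", "R1", ["R1", "R3"]), ("STR", None, ["R1"]),
]

print(schedule(original), schedule(reordered))  # 10 8
```

The model also shows why R3 is needed: loading C into R2 before the ADD would overwrite the value of B that the ADD still has to read.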
The second problem, the branch instruction penalty, can be reduced in RISC by
several techniques. For example, suppose that a conditional branch instruction exists
after instruction 5: "If, after the computation, the value in register R1 is zero, then
instruction 6 is not executed and the program jumps to instruction 7". This
modified instruction sequence is shown in Figure 12.8.
The problem with this instruction sequence is that instruction 6 has already been
fetched into the pipeline; therefore, if R1 is zero, the pipeline must be
emptied, i.e., instruction 6 must be removed from the pipeline. After that,
instruction (7) must be started. There are two possible solutions to this
problem. First, the fetch of instruction (6) may be delayed by one cycle, so that the
decision whether or not the branch is to be taken has been made. Based on that decision, the
next instruction is fetched. This is shown in Figure 12.9.
Another way to handle the branch penalty is to move the branch instruction earlier, so that
the branch decision is known before the instruction following the branch
has to be fetched. Figure 12.10 shows this solution.
(1) LDA R1, A           FI  EI  TD
(2) LDA R2, B               FI  EI  TD
(4) LDA R3, C                   FI  EI  TD
(3) ADD R1, R2                      FI  EI
(5.b) If R1=0 or R3=0                   FI  EI
      JMP to (7)
(5.a) MUL R1, R3                            FI  EI
(6) STR R1, Z                                   FI  EI  TD
    or
(7) ADD R2, R3                                  FI  EI
Clock cycles            1   2   3   4   5   6   7   8   9
Figure 12.10: Pipeline with optimised branch penalty
Please note that instruction (5.b) is a hypothetical instruction that checks
two conditions at a time; it is shown only to demonstrate the concept.
Please also note the changed order of instructions (5.a) and (5.b). The decision whether
to take the branch is made in clock cycle 6; therefore, in clock cycle 7 it will be known
which of the two instructions, (6) or (7), is to be fetched.
Finally, let us summarize the basic differences between the CISC and RISC architectures.
The following table lists these differences:
CISC                          RISC
1. In general, a large        1. Relatively fewer
   number of instructions        instructions than CISC
3. What are the problems of RISC architecture? How are these problems
compensated such that there is no reduction in performance?
12.7 SUMMARY
RISC represents a new style of computer that takes less time to build yet provides
higher performance. While traditional machines support HLLs with instructions that
look like HLL constructs, RISC supports the use of HLLs with instructions that
HLL compilers can use efficiently. The loss of complexity has not reduced RISC's
functionality; the chosen subset, especially when combined with the register window
scheme, emulates more complex machines. Thus, because of all the
features discussed above, the RISC architecture should prove to be better for certain
applications.
In this unit, we have also covered the details of the pipelining features of the RISC
architecture, which help this architecture achieve better performance.
2.
a) False
b) False
c) False
Incoming parameter   Outgoing parameter   No. of local
registers            registers            registers
1                    1                    22
2                    2                    20
3                    3                    18
4                    4                    16
5                    5                    14
6                    6                    12
7                    7                    10
8                    8                    8
9                    9                    6
10                   10                   4
11                   11                   2
12                   12                   0