Unit3 Coa Notes
Control Unit: Instruction types, formats, instruction cycles and sub-cycles (fetch, execute,
etc.), micro-operations, execution of a complete instruction. Program control, Reduced
Instruction Set Computer, pipelining. Hardwired and microprogrammed control:
microprogram sequencing, concept of horizontal and vertical microprogramming.
When the assembler processes an instruction, it converts the instruction from its mnemonic form to a
standard machine language format called the "instruction format". During conversion, the
assembler must determine the type of instruction, convert symbolic labels and explicit notation to a
base/displacement format, determine the lengths of certain operands, and parse any literals and constants.
An instruction format defines the layout of the bits of an instruction in terms of its constituent parts.
An instruction format must include an opcode and, implicitly or explicitly, zero or more operands.
The format must, implicitly or explicitly, indicate the addressing mode for each operand.
Opcode-Field Address-Field
Example –
IR register contains 0001XXXXXXXXXXXX, i.e. ADD. After fetching and decoding the
instruction, we find that it is a memory reference instruction for the ADD operation.
Hence, DR ← M[AR]
AC ← AC + DR, SC ← 0
Register Reference – These instructions perform operations on registers rather than memory
addresses. IR(14 – 12) is 111 (which differentiates it from memory reference) and IR(15) is 0
(which differentiates it from input/output instructions). The remaining 12 bits specify the register operation.
Example –
IR register contains 0111001000000000, i.e. CMA. After the fetch and decode cycle, we find
that it is a register reference instruction for complementing the accumulator.
Hence, AC ← ~AC
Input/Output – These instructions are for communication between the computer and the outside
environment. IR(14 – 12) is 111 (which differentiates it from memory reference) and IR(15) is 1
(which differentiates it from register reference instructions). The remaining 12 bits specify the I/O operation.
Example –
IR register contains 1111100000000000, i.e. INP. After the fetch and decode cycle, we find
that it is an input/output instruction for inputting a character. Hence, INPUT character from
peripheral device.
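The decoding rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name classify is hypothetical, and the 16-bit word is assumed to be held in an ordinary integer with bit 15 as the leftmost bit.

```python
def classify(ir: int) -> str:
    """Classify a 16-bit instruction word using IR(15) and IR(14-12)."""
    i_bit = (ir >> 15) & 0x1    # IR(15)
    opcode = (ir >> 12) & 0x7   # IR(14-12)
    if opcode != 0b111:
        # Any opcode other than 111 is a memory reference instruction.
        return "memory-reference"
    # Opcode 111: IR(15) separates register reference from input/output.
    return "input/output" if i_bit else "register-reference"


print(classify(0b0001_0000_0000_0000))  # ADD pattern with the address bits set to 0
print(classify(0b0111_0010_0000_0000))  # CMA
print(classify(0b1111_1000_0000_0000))  # INP
```

The three calls use the bit patterns from the examples above, with the don't-care address bits of ADD set to zero.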
The set of instructions incorporated in the 16-bit IR register is:
1. Arithmetic, logical and shift instructions (and, add, complement, circulate left, right, etc)
2. To move information to and from memory (store the accumulator, load the accumulator)
3. Program control instructions with status conditions (branch, skip)
4. Input output instructions (input character, output character)
A computer performs tasks on the basis of the instructions provided. An instruction comprises
groups of bits called fields. Because everything in a computer is represented in 0s and 1s,
each field carries its own meaning, on the basis of which the CPU decides what to
perform. The most common fields are:
• Operation field, which specifies the operation to be performed, such as addition.
• Address field, which contains the location of the operand, i.e., a register or memory location.
• Mode field, which specifies how the operand is to be found.
An instruction can be of various lengths depending on the number of addresses it contains. Generally,
CPU organization is of three types on the basis of the number of address fields:
1. Single Accumulator organization
2. General register organization
3. Stack organization
In the first organization, operations involve a special register called the accumulator. In the
second, multiple registers are used for computation. The third organization works on
stack-based operations, due to which its instructions do not contain any address field. A single
organization is not necessarily applied on its own; a blend of the various organizations is what
we mostly see.
On the basis of the number of addresses, instructions are classified as follows.
Note that we will use the expression X = (A+B)*(C+D) to showcase the procedure.
A stack-based computer does not use an address field in its instructions. To evaluate an expression,
it is first converted to reverse Polish notation, i.e. postfix notation.
Expression: X = (A+B) * (C+D)
Postfix: X = AB+CD+*
TOP means top of stack
M[X] is any memory location
1 PUSH A    TOP = A
2 PUSH B    TOP = B
3 ADD       TOP = A + B
4 PUSH C    TOP = C
5 PUSH D    TOP = D
6 ADD       TOP = C + D
7 MUL       TOP = (C + D) * (A + B)
8 POP X     M[X] = TOP
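A small stack machine makes the zero-address idea concrete. The sketch below is illustrative only: run_stack_program is a hypothetical helper, the sample values of A to D are made up, and the program is the sequence implied by the postfix form AB+CD+* (PUSH for each operand, ADD and MUL for the operators, and an assumed final POP X to store the result).

```python
def run_stack_program(program, memory):
    """Execute a zero-address program; operands come from the stack, not from address fields."""
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(memory[arg[0]])    # TOP = M[arg]
        elif op == "POP":
            memory[arg[0]] = stack.pop()    # M[arg] = TOP
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)             # TOP = sum of the two topmost items
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)             # TOP = product of the two topmost items
        else:
            raise ValueError(f"unknown operation {op!r}")
    return memory


memory = {"A": 2, "B": 3, "C": 4, "D": 5}   # sample values, chosen for illustration
program = [
    ("PUSH", "A"), ("PUSH", "B"), ("ADD",),
    ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
    ("MUL",), ("POP", "X"),
]
run_stack_program(program, memory)
print(memory["X"])  # (2 + 3) * (4 + 5) = 45
```

Note that ADD and MUL carry no operand at all, which is why a stack organization needs no address field for them.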
Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location
1 LOAD A     AC = M[A]
2 ADD B      AC = AC + M[B]
3 STORE T    M[T] = AC
4 LOAD C     AC = M[C]
5 ADD D      AC = AC + M[D]
6 MUL T      AC = AC * M[T]
7 STORE X    M[X] = AC
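The seven steps above can be traced with a tiny accumulator machine. This is a sketch under assumptions: run_accumulator_program is a hypothetical name, memory is modelled as a dictionary, and the values of A to D are sample data.

```python
def run_accumulator_program(program, memory):
    """Execute one-address instructions; the accumulator AC is the implied second operand."""
    ac = 0
    for op, addr in program:
        if op == "LOAD":
            ac = memory[addr]        # AC = M[addr]
        elif op == "ADD":
            ac = ac + memory[addr]   # AC = AC + M[addr]
        elif op == "MUL":
            ac = ac * memory[addr]   # AC = AC * M[addr]
        elif op == "STORE":
            memory[addr] = ac        # M[addr] = AC
        else:
            raise ValueError(f"unknown operation {op!r}")
    return ac


memory = {"A": 2, "B": 3, "C": 4, "D": 5}   # sample values, chosen for illustration
program = [
    ("LOAD", "A"), ("ADD", "B"), ("STORE", "T"),
    ("LOAD", "C"), ("ADD", "D"), ("MUL", "T"),
    ("STORE", "X"),
]
run_accumulator_program(program, memory)
print(memory["T"], memory["X"])  # 5 45
```

The temporary location T is needed because the single accumulator cannot hold A+B and C+D at the same time.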
This is common in commercial computers. Here two addresses can be specified in the instruction.
Unlike the one-address instruction, where the result is stored in the accumulator, here the result
can be stored at a different location rather than just the accumulator, but more bits are required to
represent the addresses.
1 MOV R1, A     R1 = M[A]
2 ADD R1, B     R1 = R1 + M[B]
3 MOV R2, C     R2 = M[C]
4 ADD R2, D     R2 = R2 + M[D]
5 MUL R1, R2    R1 = R1 * R2
6 MOV X, R1     M[X] = R1
This has three address fields, each specifying a register or a memory location. The programs created
are much shorter, but the number of bits per instruction increases. These instructions make writing
programs much easier, but that does not mean programs will run much faster: each
instruction only contains more information, while each micro-operation (changing the content of a
register, loading an address onto the address bus, etc.) is still performed in one cycle.
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
Instruction Cycle
The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch-
execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the
computer has shut down in order to process instructions. It is composed of three main stages: the
fetch stage, the decode stage, and the execute stage.
[Figure: the individual stages of the fetch-decode-execute cycle.]
In simpler CPUs, the instruction cycle is executed sequentially, each instruction being processed
before the next one is started. In most modern CPUs, the instruction cycles are instead
executed concurrently, and often in parallel, through an instruction pipeline: the next instruction
starts being processed before the previous instruction has finished, which is possible because the
cycle is broken up into separate steps.
Role of components
The program counter (PC) is a special register that holds the memory address of the next
instruction to be executed. During the fetch stage, the address stored in the PC is copied into
the memory address register (MAR) and then the PC is incremented in order to "point" to the
memory address of the next instruction to be executed. The CPU then takes the instruction at the
memory address described by the MAR and copies it into the memory data register (MDR). The
MDR also acts as a two-way register that holds data fetched from memory or data waiting to be
stored in memory (it is also known as the memory buffer register (MBR) because of this).
Eventually, the instruction in the MDR is copied into the current instruction register (CIR) which
acts as a temporary holding ground for the instruction that has just been fetched from memory.
During the decode stage, the control unit (CU) will decode the instruction in the CIR. The CU
then sends signals to other components within the CPU, such as the arithmetic logic unit
(ALU) and the floating point unit (FPU). The ALU performs arithmetic operations such as
addition and subtraction and also multiplication via repeated addition and division via repeated
subtraction. It also performs logic operations such as AND, OR, NOT, and binary shifts as well.
The FPU is reserved for performing floating-point operations.
Summary of stages
Each computer's CPU can have different cycles based on different instruction sets, but will be
similar to the following cycle:
1. Fetch Stage: The next instruction is fetched from the memory address that is currently
stored in the program counter and stored into the instruction register. At the end of the
fetch operation, the PC points to the next instruction that will be read at the next cycle.
2. Decode Stage: During this stage, the encoded instruction presented in the instruction
register is interpreted by the decoder.
o Read the effective address: In the case of a memory instruction (direct or
indirect), the execution phase will be during the next clock pulse. If the instruction
has an indirect address, the effective address is read from main memory, and any
required data is fetched from main memory to be processed and then placed into data
registers (clock pulse: T3). If the instruction is direct, nothing is done during this
clock pulse. If this is an I/O instruction or a register instruction, the operation is
performed during the clock pulse.
3. Execute Stage: The control unit of the CPU passes the decoded information as a
sequence of control signals to the relevant function units of the CPU to perform the
actions required by the instruction, such as reading values from registers, passing them to
the ALU to perform mathematical or logic functions on them, and writing the result back
to a register. If the ALU is involved, it sends a condition signal back to the CU. The
result generated by the operation is stored in the main memory or sent to an output
device. Based on the feedback from the ALU, the PC may be updated to a different
address from which the next instruction will be fetched.
4. Repeat Cycle
Initiation
The cycle begins as soon as power is applied to the system, with an initial PC value that is
predefined by the system's architecture (for instance, in Intel IA-32 CPUs, the predefined PC
value is 0xfffffff0). Typically, this address points to a set of instructions in read-only
memory (ROM), which begins the process of loading (or booting) the operating system.
Fetch stage
The fetch step is the same for each instruction:
1. The CPU sends the contents of the PC to the MAR and sends a read command on the
control bus
2. In response to the read command (with address equal to PC), the memory returns the data
stored at the memory location indicated by PC on the data bus
3. The CPU copies the data from the data bus into its MDR (also known as MBR; see
section Role of components above)
4. A fraction of a second later, the CPU copies the data from the MDR to the instruction
register for instruction decoding
5. The PC is incremented so that it points to the next instruction. This step prepares the CPU
for the next cycle.
The control unit fetches the instruction's address from the memory unit.
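The register transfers of the fetch step can be sketched as a small Python class. This is only a model for illustration: the class name CPU, the list used as memory, and the sample instruction words are assumptions, and the buses are not modelled separately.

```python
class CPU:
    """Minimal model of the registers involved in the fetch stage."""

    def __init__(self, memory, pc=0):
        self.memory = memory   # main memory, one instruction word per address
        self.pc = pc           # program counter
        self.mar = 0           # memory address register
        self.mdr = 0           # memory data register (also called MBR)
        self.ir = 0            # instruction register

    def fetch(self):
        self.mar = self.pc                 # step 1: PC -> MAR, read requested
        self.mdr = self.memory[self.mar]   # steps 2-3: memory returns the word into MDR
        self.ir = self.mdr                 # step 4: MDR -> IR for decoding
        self.pc += 1                       # step 5: PC points to the next instruction
        return self.ir


cpu = CPU(memory=[0x1004, 0x7200, 0xF800], pc=0)   # sample words, chosen for illustration
first = cpu.fetch()
print(hex(first), cpu.pc)   # 0x1004 1
second = cpu.fetch()
print(hex(second), cpu.pc)  # 0x7200 2
```

Each call to fetch performs the same five steps regardless of which instruction is being fetched, which is the point made at the start of this section.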
Decode stage
The decoding process allows the CPU to determine what instruction is to be performed so that
the CPU can tell how many operands it needs to fetch in order to perform the instruction. The
opcode fetched from the memory is decoded for the next steps and moved to the appropriate
registers. The decoding is done by the CPU's Control Unit.
Reading the effective address
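This step evaluates which type of operation is to be performed. If it is a memory operation, the
computer checks whether it is a direct or indirect memory operation. For a direct operation,
nothing is done during this clock pulse; for an indirect operation, the effective address is read
from main memory. If it is an I/O or register instruction, the operation is performed during this
clock pulse.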
Execute stage
The function of the instruction is performed. If the instruction involves arithmetic or logic, the
ALU is utilized. This is the only stage of the instruction cycle that is useful from the perspective
of the end user. Everything else is overhead required to make the execute step happen.
Micro operations
Arithmetic Micro-operations
In general, arithmetic micro-operations deal with operations performed on numeric data
stored in registers.
1. Addition
2. Subtraction
3. Increment
4. Decrement
5. Shift
Logical shift: A logical shift is one that transfers 0 through the serial input. We will adopt the
symbols shl and shr for logical shift-left and shift-right microoperations. For example:
R1 ← shl R1
R2 ← shr R2
are two microoperations that specify a 1-bit shift to the left of the content of register R1 and a
1-bit shift to the right of the content of register R2. The register symbol must be the same on
both sides of the arrow. The bit transferred to the end position through the serial input is
assumed to be 0 during a logical shift.
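As a rough illustration, the two logical shifts can be written as bit operations on an integer that stands in for the register. The function names follow the shl and shr symbols; the n_bits parameter and its default of 8 are assumptions made for this sketch.

```python
def shl(value, n_bits=8):
    """Logical shift left by one bit: 0 enters at the right, the leftmost bit is lost."""
    mask = (1 << n_bits) - 1
    return (value << 1) & mask


def shr(value, n_bits=8):
    """Logical shift right by one bit: 0 enters at the left, the rightmost bit is lost."""
    mask = (1 << n_bits) - 1
    return (value & mask) >> 1


r1 = 0b10010011
print(format(shl(r1), "08b"))  # 00100110
print(format(shr(r1), "08b"))  # 01001001
```

The mask keeps the result within the register width, which models the bit that falls off the end of a real register.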
The circular shift (also known as a rotate operation) circulates the bits of the register around
the two ends without loss of information. This is accomplished by connecting the serial output
of the shift register to its serial input. We will use the symbols cil and cir for the circular shift left
and right, respectively. The symbolic notation for the shift microoperations is shown in Table 4-7.
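A circular shift can be sketched the same way, with the bit leaving one end re-entering at the other. The function names follow the cil and cir symbols; the n_bits parameter and its default of 8 are assumptions for this sketch.

```python
def cil(value, n_bits=8):
    """Circular shift left: the leftmost bit re-enters at the rightmost position."""
    mask = (1 << n_bits) - 1
    value &= mask
    msb = value >> (n_bits - 1)          # bit leaving through the serial output
    return ((value << 1) & mask) | msb   # fed back into the serial input


def cir(value, n_bits=8):
    """Circular shift right: the rightmost bit re-enters at the leftmost position."""
    mask = (1 << n_bits) - 1
    value &= mask
    lsb = value & 1
    return (value >> 1) | (lsb << (n_bits - 1))


r = 0b10010011
print(format(cil(r), "08b"))  # 00100111
print(format(cir(r), "08b"))  # 11001001
```

Because no bit is lost, applying cil and then cir returns the original register content.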
An arithmetic shift is a microoperation that shifts a signed binary number to the left or right. An
arithmetic shift-left multiplies a signed binary number by 2. An arithmetic shift-right divides the
number by 2. Arithmetic shifts must leave the sign bit unchanged because the sign of the number
remains the same when it is multiplied or divided by 2. The leftmost bit in a register holds the
sign bit, and the remaining bits hold the number. The sign bit is 0 for positive and 1 for negative.
Negative numbers are in 2's complement form. Figure 4-11 shows a typical register of n bits. Bit
Rn-1 in the leftmost position holds the sign bit. Rn-2 is the most significant bit of the number and
R0 is the least significant bit. The arithmetic shift-right leaves the sign bit unchanged and shifts the
number (including the sign bit) to the right. Thus Rn-1 remains the same, Rn-2 receives the bit
from Rn-1, and so on for the other bits in the register. The bit in R0 is lost.
The arithmetic shift-left inserts a 0 into R0 and shifts all other bits to the left. The initial bit of
Rn-1 is lost and replaced by the bit from Rn-2. A sign reversal occurs if the bit in Rn-1 changes in
value after the shift. This happens if the multiplication by 2 causes an overflow. An overflow
occurs after an arithmetic shift left if initially, before the shift, Rn-1 is not equal to Rn-2. An
overflow flip-flop Vs can be used to detect an arithmetic shift-left overflow.
Vs = Rn-1 ⊕ Rn-2
If Vs = 0, there is no overflow, but if Vs = 1, there is an overflow and a sign reversal after the
shift. Vs must be transferred into the overflow flip-flop with the same clock pulse that shifts the
register.
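The arithmetic shifts and the overflow check can be sketched as follows. The names ashr and ashl and the 8-bit default width are assumptions for this illustration. Overflow is flagged when the two leftmost bits differ before the shift, which is their exclusive-OR.

```python
def ashr(value, n_bits=8):
    """Arithmetic shift right: the sign bit Rn-1 is kept and also copied into Rn-2."""
    mask = (1 << n_bits) - 1
    value &= mask
    sign = value & (1 << (n_bits - 1))
    return (value >> 1) | sign


def ashl(value, n_bits=8):
    """Arithmetic shift left: returns (shifted value, Vs), with Vs = Rn-1 XOR Rn-2."""
    mask = (1 << n_bits) - 1
    value &= mask
    r_n1 = (value >> (n_bits - 1)) & 1   # sign bit before the shift
    r_n2 = (value >> (n_bits - 2)) & 1   # most significant number bit before the shift
    vs = r_n1 ^ r_n2                     # 1 means overflow and sign reversal
    return (value << 1) & mask, vs


print(format(ashr(0b11110100), "08b"))   # -12 becomes -6: 11111010
print(ashl(0b11110100))                  # -12 becomes -24, no overflow: (232, 0)
print(ashl(0b01000000))                  # 64 * 2 does not fit in 8 bits: (128, 1)
```

In the last call the sign bit flips from 0 to 1, which is exactly the sign reversal that Vs = 1 reports.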
Hardware implementation
A possible choice for a shift unit would be a bidirectional shift register with parallel load (see
Fig. 2-9). Information can be transferred to the register in parallel and then shifted to the right or
left. In this type of configuration, a clock pulse is needed for loading the data into the register,
and another pulse is needed to initiate the shift. In a processor unit with many registers it is
more efficient to implement the shift operation with a combinational circuit. In this way the
content of a register that has to be shifted is first placed onto a common bus whose output is
connected to the combinational shifter, and the shifted number is then loaded back into the
register. This requires only one clock pulse for loading the shifted value into the register.
A combinational circuit shifter can be constructed with multiplexers as shown in Fig. 4-12. The
4-bit shifter has four data inputs, A0 through A3, and four data outputs, H0 through H3. There are
two serial inputs, one for shift left (IL) and the other for shift right (IR).
When the selection input S = 0, the input data are shifted right (down in the diagram). When S =
1, the input data are shifted left (up in the diagram). The function table in Fig. 4-12 shows which
input goes to each output after the shift. A shifter with n data inputs and outputs requires n
multiplexers. The two serial inputs can be controlled by another multiplexer to provide the three
possible types of shifts.
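The multiplexer arrangement can be modelled in a few lines. Since the figure is not reproduced here, the exact wiring is an assumption: the sketch takes "shift right" to move each bit toward the higher-numbered output (A0 to H1, and so on) with the shift-right serial input IR feeding H0, and "shift left" to move each bit toward the lower-numbered output with the shift-left serial input IL feeding H3. The names mux2 and shifter4 are hypothetical.

```python
def mux2(sel, in0, in1):
    """2-to-1 multiplexer: passes in0 when sel is 0 and in1 when sel is 1."""
    return in1 if sel else in0


def shifter4(a, s, i_r=0, i_l=0):
    """4-bit combinational shifter built from four 2-to-1 multiplexers.

    a is [A0, A1, A2, A3]; s = 0 shifts right, s = 1 shifts left (assumed wiring).
    """
    a0, a1, a2, a3 = a
    h0 = mux2(s, i_r, a1)
    h1 = mux2(s, a0, a2)
    h2 = mux2(s, a1, a3)
    h3 = mux2(s, a2, i_l)
    return [h0, h1, h2, h3]


a = [1, 0, 1, 1]
print(shifter4(a, s=0))  # [0, 1, 0, 1]
print(shifter4(a, s=1))  # [0, 1, 1, 0]
```

One multiplexer per output is used, matching the statement that n data inputs and outputs require n multiplexers.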
The MIR in the following figure shows the micro-instruction that performs "MOV R1, (R2)" -
move data in R1 to memory location whose address is given in R2. The figure shows the
steps/events/things that will happen in the data-path when this micro-instruction is executed:
What is The Difference Between RISC and CISC Architecture
The capabilities of a Central Processing Unit (CPU) are defined by the Instruction Set
Architecture (ISA) for which it was designed. The two main design approaches are
Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC). In a
CISC design, a single instruction can carry out multi-step operations or use complex addressing
modes; one instruction may perform several low-level actions, for instance a load from memory,
an arithmetic operation, and a memory store. RISC is a CPU design strategy based on the insight
that a simple instruction set gives high performance when combined with a microprocessor
architecture that can execute each instruction in only a few cycles. This section
discusses the difference between the RISC and CISC architectures. Intel x86 processors are the
usual example of CISC, while the PowerPC and ARM processors used in Apple hardware are
examples of RISC.
Before we discuss the differences between the RISC and CISC architectures, let us look at the
concepts of RISC and CISC.
What is RISC?
A reduced instruction set computer is a computer that uses only simple instructions, each of
which performs a low-level operation within a single clock (CLK) cycle, as
its name “Reduced Instruction Set” suggests.
RISC Architecture
The term RISC stands for “Reduced Instruction Set Computer”. It is a CPU design strategy based
on simple instructions that execute fast.
The instruction set is small, or reduced, and every instruction is expected to perform a very
small job. Because the instructions are modest and simple, more complex commands are
composed from them: the instructions all have the same length and are strung together to
get compound tasks done. Most instructions complete in one machine
cycle, and pipelining is a crucial technique used to speed up RISC machines.
What is CISC?
A complex instruction set computer is a computer in which a single instruction can perform
numerous low-level operations, such as a load from memory, an arithmetic operation, and a memory
store, or can carry out multi-step processes or addressing modes, as
its name “Complex Instruction Set” suggests.
CISC Architecture
The term CISC stands for “Complex Instruction Set Computer”. It is a CPU design strategy based
on single instructions that are capable of executing multi-step operations.
CISC computers have small programs. They have a large number of compound instructions, which
take a long time to execute. Here, a single instruction is carried out in several steps; an
instruction set may contain more than 300 separate instructions. Most instructions finish
in two to ten machine cycles. In CISC, instruction pipelining is not easily implemented.
4. RISC: It has no memory unit and uses separate hardware to implement instructions.
   CISC: It has a memory unit to implement complex instructions.
6. RISC: The instruction set is reduced, i.e. it has only a few instructions. Many of these
   instructions are very primitive.
   CISC: The instruction set has a variety of different instructions that can be used for
   complex operations.
9. RISC: Multiple register sets are present.
   CISC: Only a single register set is present.
11. RISC: The complexity lies with the compiler that compiles the program.
    CISC: The complexity lies in the microprogram.
12. RISC: Execution time is very low.
    CISC: Execution time is very high.
13. RISC: Code expansion can be a problem.
    CISC: Code expansion is not a problem.
14. RISC: Decoding of instructions is simple.
    CISC: Decoding of instructions is complex.
Pipelining.
Non-Pipelined Execution-
In non-pipelined architecture,
• All the instructions of a program are executed sequentially one after the other.
• A new instruction executes only after the previous instruction has executed
completely.
• This style of executing the instructions is highly inefficient.
Example-
2. Pipelined Execution-
In pipelined architecture,
• Multiple instructions are executed in parallel.
• This style of executing the instructions is highly efficient.
Instruction Pipelining-
Instruction pipelining is a technique that implements a form of parallelism called instruction-level
parallelism within a single processor.
• A pipelined processor does not wait until the previous instruction has executed
completely.
• Rather, it fetches the next instruction and begins its execution.
Pipelined Architecture-
In pipelined architecture,
• The hardware of the CPU is split up into several functional units.
• Each functional unit performs a dedicated task.
• The number of functional units may vary from processor to processor.
• These functional units are called the stages of the pipeline.
• Control unit manages all the stages using control signals.
• There is a register associated with each stage that holds the data.
• There is a global clock that synchronizes the working of all the stages.
• At the beginning of each clock cycle, each stage takes the input from its register.
• Each stage then processes the data and feeds its output to the register of the next stage.
Four-Stage Pipeline-
Stage-01:
At stage-01,
• First functional unit performs instruction fetch.
• It fetches the instruction to be executed.
Stage-02:
At stage-02,
• Second functional unit performs instruction decode.
• It decodes the instruction to be executed.
Stage-03:
At stage-03,
• Third functional unit performs instruction execution.
• It executes the instruction.
Stage-04:
At stage-04,
• Fourth functional unit performs write back.
• It writes back the result so obtained after executing the instruction.
Execution-
In pipelined architecture,
• Instructions of the program execute in parallel.
• When one instruction goes from the nth stage to the (n+1)th stage, another instruction goes
from the (n-1)th stage to the nth stage.
Phase-Time Diagram-
Time taken to execute three instructions in four stage pipelined architecture = 6 clock
cycles.
NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined
execution.
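The cycle counts above follow from a simple rule: with k stages and n instructions, the first instruction needs k cycles and each later one finishes one cycle after its predecessor. The sketch below assumes an ideal pipeline (one clock cycle per stage, no stalls); the function names and the stage abbreviations IF, ID, EX and WB are chosen here for illustration.

```python
def pipelined_cycles(n_instructions, k_stages):
    """Cycles for an ideal k-stage pipeline: k for the first instruction, 1 for each further one."""
    if n_instructions == 0:
        return 0
    return k_stages + n_instructions - 1


def non_pipelined_cycles(n_instructions, k_stages):
    """Cycles when each instruction runs to completion before the next one starts."""
    return n_instructions * k_stages


def phase_time_rows(n_instructions, stage_names):
    """One row per instruction, one column per clock cycle."""
    total = pipelined_cycles(n_instructions, len(stage_names))
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total
        for s, name in enumerate(stage_names):
            row[i + s] = name   # instruction i occupies stage s during cycle i + s
        rows.append(row)
    return rows


print(pipelined_cycles(3, 4), non_pipelined_cycles(3, 4))  # 6 12
for row in phase_time_rows(3, ["IF", "ID", "EX", "WB"]):
    print(" ".join(row))
```

The printed rows form a phase-time diagram in which each instruction starts one cycle after the previous one.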
Hardwired: The control signals needed for the processor are generated using logic circuits.
Microprogrammed: The control signals are generated with the help of microinstructions.
Hardwired: Faster compared to the microprogrammed control unit, as the required control signals
are generated in hardware.
Microprogrammed: Slower than the hardwired control unit, as microinstructions are used for
generating the signals.
Hardwired: Difficult to modify, as the control signals that need to be generated are hardwired.
Microprogrammed: Easy to modify, as the modification needs to be done only at the instruction
level.
Hardwired: More costly, as everything has to be realized in hardware.
Microprogrammed: Less costly, as microinstructions are used for generating the signals.
Hardwired: Only a limited number of instructions are used.
Microprogrammed: Control signals for many instructions can be generated.
Hardwired: Used in computers that make use of a Reduced Instruction Set Computer (RISC).
Microprogrammed: Used in computers that make use of a Complex Instruction Set Computer (CISC).
Basically, the control unit (CU) is the engine that runs all the functions of a computer by
issuing control signals in the proper sequence. In the micro-programmed control unit approach,
the control signals that are associated with the operations are stored in special memory units. It is
convenient to think of the sets of control signals that cause specific micro-operations to occur as
“microinstructions”. The sequences of microinstructions can be stored in an internal
“control” memory.
Micro-programmed control unit can be classified into two types based on the type of Control
Word stored in the Control Memory, viz., Horizontal micro-programmed control unit and
Vertical micro-programmed control unit.
• In a horizontal micro-programmed control unit, the control signals are represented in
decoded binary format, i.e., 1 bit per control signal (CS). Here n control signals require n bits.
• In a vertical micro-programmed control unit, on the other hand, the control signals are
represented in encoded binary format. Here n control signals require log2 n bits.
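The difference between the two formats can be sketched with plain integers. This is an illustrative model only: the function names are hypothetical, control signals are numbered from 0, and the vertical field is assumed to select exactly one signal, which a decoder then expands to a control line.

```python
def horizontal_word(active_signals, n_signals):
    """Decoded format: one bit per control signal, so several can be active at once."""
    word = 0
    for s in active_signals:
        if not 0 <= s < n_signals:
            raise ValueError(f"signal {s} out of range")
        word |= 1 << s
    return word


def vertical_bits(n_signals):
    """Encoded format: smallest field width that can number n_signals signals."""
    return (n_signals - 1).bit_length()


def decode_vertical(code, n_signals):
    """Decoder: expands an encoded signal number into a single active control line."""
    if not 0 <= code < n_signals:
        raise ValueError(f"code {code} out of range")
    return 1 << code


print(format(horizontal_word({0, 5}, 8), "08b"))  # 00100001, two signals at once
print(vertical_bits(8), vertical_bits(64))        # 3 6
print(format(decode_vertical(5, 8), "08b"))       # 00100000, one signal at a time
```

The horizontal word is wider but needs no decoder; the vertical field is narrower but can activate only one signal per field.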
Comparison between the horizontal micro-programmed control unit and the vertical
micro-programmed control unit:
Horizontal: If the degree of parallelism is n, then n control signals are enabled at a time.
Vertical: It allows a low degree of parallelism, i.e., the degree is either 0 or 1.
Horizontal: It is less flexible than the vertical micro-programmed control unit.
Vertical: It is more flexible than the horizontal micro-programmed control unit.
Horizontal: Every bit of the control field attaches to a control line.
Vertical: A decoder translates the code into individual control signals.
Horizontal: It makes less use of ROM encoding than the vertical micro-programmed control unit.
Vertical: It makes more use of ROM encoding to reduce the length of the control word.
Example: Consider a hypothetical control unit which supports 4K words. The hardware
contains 64 control signals and 16 flags. What is the size of the control word in bits and of the
control memory in bytes using:
a) Horizontal Programming
b) Vertical programming
Solution:
a) For Horizontal
64 bits for 64 signals, i.e. 1 bit per control signal
b) For Vertical
6 bits for 64 signals, i.e. log2 64
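The rest of the calculation can be sketched in Python. The control word layout used here is an assumption, since the notes do not spell it out: a flag-select field (log2 of the number of flags), a control-signal field, and a next-address field wide enough to address every word of the control memory. The function name control_store is hypothetical.

```python
def control_store(words, signals, flags, horizontal):
    """Return (control word size in bits, control memory size in bytes).

    Assumed layout: flag-select bits + control-signal bits + next-address bits.
    """
    address_bits = (words - 1).bit_length()    # 4K words -> 12 bits
    flag_bits = (flags - 1).bit_length()       # 16 flags -> 4 bits
    if horizontal:
        signal_bits = signals                  # 1 bit per control signal
    else:
        signal_bits = (signals - 1).bit_length()   # encoded: log2 of the signal count
    word_bits = flag_bits + signal_bits + address_bits
    memory_bytes = words * word_bits // 8
    return word_bits, memory_bytes


print(control_store(4096, 64, 16, horizontal=True))    # (80, 40960)
print(control_store(4096, 64, 16, horizontal=False))   # (22, 11264)
```

Under this assumed layout the horizontal control word is 4 + 64 + 12 = 80 bits, giving a 40 KB control memory, and the vertical control word is 4 + 6 + 12 = 22 bits, giving 11 KB.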