DLCA - Solved - Question Bank-1
Q3) Explain Different Types Of Distributed And Centralized Bus Arbitration Methods.
In distributed bus arbitration, the responsibility for resolving bus conflicts is distributed among the devices connected to the bus. Each device has the ability to arbitrate for bus control independently. Some common distributed arbitration methods include:
1) Daisy Chain: In this method, devices are connected in a linear fashion, forming a daisy chain. When a device needs access to the bus, it sends a request signal to the next device in the chain. The request signal propagates through the chain until it reaches a device that can grant the access. This device then asserts a grant signal that travels back along the chain to the requesting device, which takes control of the bus.
2) Token Passing: This method uses a special control token that circulates among the devices connected to the bus. Only the device holding the token has the right to access the bus. When a device wants to access the bus, it waits until the token arrives, gains control, performs its operation, and then passes the token to the next device.
3) Random Selection: In this method, devices contend for bus control randomly. Each device generates a random number and compares it with the numbers generated by other devices. The device with the lowest or highest number (depending on the protocol) gains control of the bus. Random selection provides fairness among devices, but it can also result in unpredictable bus access times.
In centralized bus arbitration, there is a dedicated controller or arbiter responsible for granting bus access to devices. The arbiter receives requests from devices and decides which device should be granted access to the bus. Some common centralized bus arbitration methods include:
1) Priority-Based: In this method, each device is assigned a priority level. The arbiter grants bus access to the device with the highest priority among the requesting devices. Priority levels can be fixed or dynamically assigned.
2) Round Robin: In this method, the arbiter grants bus access to devices in a sequential manner. Each device gets a turn to access the bus, and the arbiter cycles through the devices in a fixed order. This method ensures fairness as each device gets an equal opportunity to access the bus.
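As a rough illustration of the round-robin idea, here is a minimal Python sketch of an arbiter that grants the bus to the first requesting device after the one granted most recently. The function name and the representation of requests as a set are assumptions made for this example, not part of any standard.

```python
# Minimal round-robin arbiter sketch (illustrative names, not from the text).
def round_robin_grant(requests, last_granted, num_devices):
    """Grant the bus to the first requesting device after the last granted one."""
    for offset in range(1, num_devices + 1):
        candidate = (last_granted + offset) % num_devices
        if candidate in requests:
            return candidate
    return None  # no device is requesting the bus

# Example: devices 1 and 3 request; device 1 was granted last time.
print(round_robin_grant({1, 3}, last_granted=1, num_devices=4))  # -> 3
```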
The execution of an instruction proceeds in the following phases:
1. Fetch: In this phase, the processor fetches the instruction from memory. The program counter (PC) holds the address of the next instruction to be fetched. The processor reads the instruction from memory using the address provided by the PC and stores it in an instruction register (IR).
2. Decode: In this phase, the processor decodes the fetched instruction. It interprets the
opcode (operation code) portion of the instruction to determine the type of operation to
be performed and identifies the operands or
registers involved.
3. Execute: In this phase, the processor performs the operation specified by the
instruction. It may involve calculations, data manipulation, logical operations, or
control flow modifications based on the decoded instruction.
4. Store: In this phase, the result of the execution is stored in the appropriate location. It could be a register, memory location, or an I/O device, depending on the instruction and the architecture of the computer system.
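To make these four phases concrete, here is a minimal Python sketch of a fetch-decode-execute-store loop for a toy machine; the instruction format, opcodes, and register file are invented for the example.

```python
# Toy fetch-decode-execute-store loop (hypothetical 3-field instructions).
memory = [("ADD", 0, 5), ("ADD", 0, 7), ("HALT", 0, 0)]  # program in memory
registers = [0] * 4
pc = 0  # program counter

while True:
    ir = memory[pc]              # Fetch: read instruction at PC into IR
    pc += 1
    opcode, reg, operand = ir    # Decode: split into opcode and operands
    if opcode == "HALT":
        break
    if opcode == "ADD":
        result = registers[reg] + operand  # Execute: perform the operation
        registers[reg] = result            # Store: write result to a register

print(registers[0])  # -> 12
```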
Q8) Explain Different Addressing Modes.
1. Immediate addressing: The operand is a constant value or immediate data
directly embedded within the instruction itself. It is useful for operations that
involve constants or immediate values.
2. Register addressing: The operand is the content of a specific register. This mode allows
direct access to registers in the processor, which are typically fast storage locations.
3. Direct addressing: The operand is the actual memory address where the data is
stored. The processor directly accesses the memory location specified in the
instruction.
4. Indirect addressing: The operand is a memory address that contains the actual memory
address where the data is stored. The processor accesses the memory location indirectly
by first obtaining the address from the specified memory location.
5. Indexed addressing: The operand is calculated by adding a constant offset or index
value to a base address. It is commonly used in array or table access, where the index
determines the position of the element.
6. Relative addressing: The operand is a memory address calculated relative to the current
program counter (PC) or instruction pointer. It is often used in branch instructions to
specify the target address relative to the current instruction.
7. Stack addressing: The operand is implicitly specified from the top of the stack. It is commonly used in stack-based architectures, where operands are pushed onto and popped from the stack.
8. Base/Offset addressing: The operand is obtained by adding a constant offset to a
base address specified in a register or memory location. It is useful for accessing
data structures like arrays, records, or objects.
9. Indirect indexed addressing: This mode combines indirect and indexed addressing.
The operand is obtained by first obtaining a memory address indirectly and then
adding an index value to that address.
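The following Python sketch illustrates how a few of these modes resolve an operand; the memory contents, register names, and mode names are hypothetical, chosen only to demonstrate the address calculations.

```python
# Sketch of operand resolution under different addressing modes.
memory = [2, 20, 30, 40, 50]
registers = {"R1": 1}

def fetch_operand(mode, value):
    if mode == "immediate":   # operand is the value itself
        return value
    if mode == "register":    # operand is the content of a register
        return registers[value]
    if mode == "direct":      # operand is at the given memory address
        return memory[value]
    if mode == "indirect":    # memory holds the address of the operand
        return memory[memory[value]]
    if mode == "indexed":     # base address plus index register R1
        return memory[value + registers["R1"]]
    raise ValueError(mode)

print(fetch_operand("immediate", 7))    # -> 7
print(fetch_operand("register", "R1"))  # -> 1
print(fetch_operand("direct", 2))       # -> 30
print(fetch_operand("indirect", 0))     # -> 30 (memory[0] = 2, memory[2] = 30)
print(fetch_operand("indexed", 3))      # -> 50 (base 3 + index 1)
```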
A control unit whose binary control variables are stored in memory is known as a
microprogrammed control unit.
In Microprogrammed Control, the control information is stored in the control memory and
is programmed to initiate the required sequence of micro-operations.
At each system clock beat, the controller generates a definite collection of control signals, which form the instructions to be executed. Each of these output signals causes a single micro-operation, such as a register transfer. As a result, the sets of control signals are formed into defined micro-operations that can be preserved in memory.
Each bit in the microinstruction is connected to a single control signal. The control signal is
active when its bit is set. The control signal becomes inactive when it is cleared. The
internal control memory can store a sequence of these microinstructions. A microprogram-
controlled computer's control unit is a computer within a computer.
[Figure: Block diagram of a Microprogrammed Control Organization]
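As a small illustration of the idea that each microinstruction bit drives one control signal, here is a Python sketch; the signal names and the control word layout are invented for this example.

```python
# Sketch: each bit of a microinstruction drives one control signal.
CONTROL_SIGNALS = ["PC_inc", "MAR_load", "MEM_read", "IR_load", "ALU_add"]

def active_signals(microinstruction):
    """Return the control signals whose bits are set in the control word."""
    return [name for i, name in enumerate(CONTROL_SIGNALS)
            if microinstruction & (1 << i)]

# Control word 0b01011 activates PC_inc, MAR_load, and IR_load.
print(active_signals(0b01011))  # -> ['PC_inc', 'MAR_load', 'IR_load']
```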
Encoder
An encoder is a device that converts an active data signal into a coded message format. It is a combinational circuit that converts binary information in the form of 2^N input lines into N output lines, which represent an N-bit code for the input. When an input signal is applied to an encoder, the logic circuitry within it converts that particular input into a coded binary output.
To decode is to perform the reverse operation: converting a code back into its original, unambiguous form. The device which performs this operation is termed a decoder.
Decoder
A decoder is also a combinational circuit like an encoder, but its operation is exactly the reverse of the encoder. A decoder is a device that generates the original signal as output from the coded input signal, converting n lines of input into 2^n lines of output. An AND gate can be used as the basic decoding element because it produces a high output only when all inputs are high.
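The following Python sketch models a 4-to-2 encoder and a 2-to-4 decoder as truth-table functions, illustrating the 2^N-inputs-to-N-outputs relationship described above (the input/output representation is an assumption of the sketch).

```python
# Sketch of a 4-to-2 encoder and a 2-to-4 decoder.
def encode(lines):
    """4-to-2 encoder: index of the single active input line, as 2 bits."""
    assert lines.count(1) == 1, "exactly one input line must be active"
    return format(lines.index(1), "02b")

def decode(code):
    """2-to-4 decoder: activate exactly one of the 2**n output lines."""
    lines = [0] * 4
    lines[int(code, 2)] = 1
    return lines

print(encode([0, 0, 1, 0]))  # -> '10'
print(decode("10"))          # -> [0, 0, 1, 0]
```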
Encoder vs Decoder

| ENCODER | DECODER |
|---|---|
| Encoder circuit basically converts the applied information signal into a coded digital bit stream. | Decoder performs the reverse operation and recovers the original information signal from the coded bits. |
| In case of encoder, the applied signal is the active signal input. | Decoder accepts coded binary data as its input. |
| The output lines for an encoder are n. | The output lines of a decoder are 2^n. |
| The encoder generates coded data bits as its output. | The decoder generates an active output signal in response to the coded data bits. |
| The encoder circuit is installed at the transmitting end. | The decoder circuit is installed at the receiving side. |
| OR gate is the basic logic element used in it. | AND gate along with NOT gate is the basic logic element used in it. |
Von Neumann computer architecture design was proposed in 1945. It was later known as Von Neumann architecture.
It is also known as an ISA (Instruction Set Architecture) computer and has three basic units:
• The Central Processing Unit (CPU)
The central processing unit is an electronic circuit used for executing the instructions of a computer program. It consists of:
1. Control Unit (CU)
2. Arithmetic Logic Unit (ALU)
3. A variety of Registers
• Control Unit –
A control unit (CU) handles all processor control signals. It directs all input and
output flow, fetches code for instructions, and controls how data moves around
the system.
1. Registers – Registers refer to high-speed storage areas in the CPU. The data
processed by the CPU are fetched from the registers. There are different types of
registers used in architecture :-
• Program Counter (PC): Keeps track of the memory location of the next instruction to be dealt with. The PC then passes this next address to the Memory Address Register (MAR).
2. Buses – Data is transmitted from one part of a computer to another, connecting all major internal components to the CPU and memory, by means of buses. Types:
• Data Bus: It carries data among the memory unit, the I/O devices, and the processor.
• Address Bus: It carries the address of data (not the actual data) between memory and processor.
• Control Bus: It carries control commands from the CPU (and status signals from other devices) in order to control and coordinate all the activities within the computer.
3. Input/Output Devices – A program or data is read into main memory from the input device or secondary storage under the control of a CPU input instruction. Output devices are used to output information from a computer. If some results are evaluated by the computer and stored in it, output devices can present them to the user.
This architecture is very important and is used in our PCs and even in supercomputers.
The translation between the logical address space and the physical memory is known as memory mapping. The objectives of memory mapping are to translate logical addresses to physical addresses, to aid in memory protection, and to enable better management of memory resources.
During cache mapping, the block is not moved out of the main memory; the main memory block is simply copied into the cache. Cache memory generally operates in one of the following configurations:
1. Direct mapping
2. Fully associative mapping
3. Set associative mapping
1) Direct Mapping
In a direct mapped cache memory, each block of main memory maps to exactly one location in cache memory.
The cache line to which a particular block of main memory maps is given by:
Cache line number = (Block Address of Main Memory) modulo (Number of lines in Cache)
A direct-mapped cache is like rows in a table with three columns: the main memory address is split into bits for Offset, Index, and Tag. The size of the fields depends on the capacity of memory and the size of the block in the cache.
The least significant w bits are used to identify a word within a block of main memory. The tag corresponds to the remaining bits and is used to determine the proper block of main memory. The line offset (index) is used to select the block to be accessed out of the total blocks available according to the capacity of the cache.
Each row entry contains the data block or cache line holding the actual data fetched and stored, a tag with all or part of the address of the data that was fetched, and a valid flag bit that shows whether the row entry contains valid data.
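A short Python sketch of the address split described above, assuming an illustrative cache of 16 lines and 4 words per block (these sizes are arbitrary):

```python
# Splitting a main-memory address into tag / line / offset for a
# direct-mapped cache.
CACHE_LINES = 16      # number of lines in the cache
BLOCK_WORDS = 4       # words per block -> 2 offset bits

def split_address(addr):
    offset = addr % BLOCK_WORDS
    block = addr // BLOCK_WORDS   # block address of main memory
    line = block % CACHE_LINES    # cache line number (the modulo rule above)
    tag = block // CACHE_LINES    # remaining high-order bits
    return tag, line, offset

print(split_address(1234))  # -> (19, 4, 2): block 308 maps to line 308 % 16 = 4
```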
2) Associative Mapping
In this type of mapping, any main memory block can go into any line of the cache. So we have to use a proper replacement policy to replace a block in the cache if the required block of main memory is not present in the cache. Here, the main memory address is divided into two fields: the word field identifies which word in the block is needed, and the tag field identifies the block. It is considered to be the fastest and the most flexible form of cache mapping.
3) Set Associative Mapping
In this mapping technique, blocks of cache are grouped to form a set, and a block of main memory can go into any block of a specific set.
This also reduces the searching overhead present in associative mapping: here, searching is restricted to the blocks within one set instead of all the blocks in the cache.
17) Grey Code
Grey code, also known as reflected binary code, is a binary numeral system where
two successive values differ in only one bit. Grey code is useful in minimizing
errors in digital communications and is commonly used in analog-to-digital
converters and error correction in digital systems.
For example, the first few 4-bit binary numbers and their corresponding Grey code representations are:
0000 → 0000, 0001 → 0001, 0010 → 0011, 0011 → 0010, 0100 → 0110
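The conversion can be expressed directly in code; this Python sketch uses the standard XOR rule (Grey = binary XOR (binary >> 1)) and its inverse:

```python
# Binary <-> Grey code conversion using the standard XOR rule.
def to_grey(n):
    return n ^ (n >> 1)          # Grey = binary XOR (binary shifted right by 1)

def from_grey(g):
    n = 0
    while g:                     # undo the XOR by folding bits back in
        n ^= g
        g >>= 1
    return n

for i in range(4):
    print(f"{i:04b} -> {to_grey(i):04b}")
# 0000 -> 0000, 0001 -> 0001, 0010 -> 0011, 0011 -> 0010
assert all(from_grey(to_grey(i)) == i for i in range(16))
```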
18) BCD (Binary-Coded Decimal)
BCD is a class of binary encodings of decimal numbers where each decimal digit
is represented by a fixed number of binary digits, usually four or eight. The most
common encoding is the 4-bit encoding, also known as 8421 encoding.
For example, the decimal number 59 is represented in BCD as 0101 1001 (5 = 0101, 9 = 1001).
19) Excess-3
Excess-3 is a binary-coded decimal code that is derived from the natural BCD code by adding 3 (0011 in binary) to each decimal digit and then encoding the result in binary.
For example: 7 + 3 = 10, and 10 in binary is 1010.
So, the decimal number 7 in Excess-3 code is 1010.
These codes are widely used in various digital systems and applications to facilitate error
checking and digital communication.
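As a small worked illustration of both codes, the following Python sketch encodes a decimal number digit by digit in BCD and in Excess-3:

```python
# Encoding a decimal number digit-by-digit in BCD and Excess-3.
def to_bcd(n):
    return " ".join(format(int(d), "04b") for d in str(n))

def to_excess3(n):
    return " ".join(format(int(d) + 3, "04b") for d in str(n))

print(to_bcd(59))      # -> '0101 1001'
print(to_excess3(7))   # -> '1010'  (7 + 3 = 10)
```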
20) List and Explain Characteristics and Hierarchy of Memory
Memory hierarchy is an enhancement to organize the memory such that it can minimize the access time. The memory hierarchy was developed based on a program behavior known as locality of references. The different levels of the memory hierarchy are described below.
Memory hierarchy is one of the most essential things in computer memory, as it helps in optimizing the memory available in the computer. There are multiple levels present in the memory, each one having a different size, cost, etc. Some types of memory, like cache and main memory, are faster than other types but have a smaller size and are also costly, whereas some memory has a somewhat higher storage capacity but is slower. Access to data is not uniform across the types of memory: some have faster access, whereas others have slower access.
Types of Memory Hierarchy
This Memory Hierarchy Design is divided into 2 main types:
• External Memory or Secondary Memory: Comprising magnetic disk, optical disk, and magnetic tape, i.e. peripheral storage devices which are accessible by the processor via an I/O module.
• Internal Memory or Primary Memory: Comprising main memory, cache memory, and CPU registers. This is directly accessible by the processor.
1. Registers
Registers are small, high-speed memory units located in the CPU. They
are used to store the most frequently used data and instructions.
Registers have the fastest access time and the smallest storage capacity,
typically ranging from 16 to 64 bits.
2. Cache Memory
Cache memory is a small, fast memory unit located close to the CPU.
It stores frequently used data and instructions that have been recently
accessed from the main memory. Cache memory is designed to
minimize the time it takes to access data by providing the CPU with
quick access to frequently used data.
3. Main Memory
Main memory, also known as RAM (Random Access Memory), is the
primary memory of a computer system. It has a larger storage capacity
than cache memory, but it is slower. Main memory is used to store data
and instructions that are currently in use by the CPU.
Types of Main Memory
• Static RAM: Static RAM stores the binary information in flip-flops, and the information remains valid as long as power is supplied. It has a faster access time and is used in implementing cache memory.
• Dynamic RAM: It stores the binary information as a charge
on the capacitor. It requires refreshing circuitry to maintain
the charge on the capacitors after a few milliseconds. It
contains more memory cells per unit area as compared to
SRAM.
4. Secondary Storage
Magnetic Disk: Magnetic disks are simply circular plates fabricated from metal, plastic, or a magnetized material. Magnetic disks work at high speed inside the computer and are frequently used.
Magnetic Tape: Magnetic tape is a sequential-access storage medium, generally used for backup and archival of large volumes of data.
| Level | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Name | Register | Cache | Main Memory | Secondary Memory |
| Size | < 1 KB | < 16 MB | < 16 GB | > 100 GB |
| Implementation | Multi-ports | On-chip/SRAM | DRAM (capacitor memory) | Magnetic |
| Access Time | 0.25 ns to 0.5 ns | 0.5 ns to 25 ns | 80 ns to 250 ns | 50 lakh ns (about 5 ms) |
| Bandwidth | 20,000 to 1,00,000 MB/s | 5,000 to 15,000 MB/s | 1,000 to 5,000 MB/s | 20 to 150 MB/s |
| Managed by | Compiler | Hardware | Operating System | Operating System |
| Backing Mechanism | From cache | From Main Memory | From Secondary Memory | - |
21) Pipeline Hazard and Dependencies
Dependencies and Data Hazards in a Pipeline in Computer Organization
In this section, we will learn about dependencies in a pipelined processor, which are described as follows:
The pipeline processor usually has three types of dependencies, which are described as follows:
1. Structural dependencies
2. Data dependencies
3. Control dependencies
Because of these dependencies, stalls are introduced into the pipeline. A stall can be described as a cycle without new input in the pipeline. In other words, a stall happens when a later instruction depends on the output of an earlier instruction.
Structural dependencies
Structural dependency usually arises because of a resource conflict in the pipeline. A resource conflict is a situation in which more than one instruction tries to access the same resource (such as the ALU, memory, or a register) in the same cycle.
Example:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| I1 | IF(Mem) | ID | EX | Mem | |
| I2 | | IF(Mem) | ID | EX | |
| I3 | | | IF(Mem) | ID | EX |
| I4 | | | | IF(Mem) | ID |
The above table contains the four instructions I1, I2, I3, and I4, and five cycles 1, 2, 3, 4, 5. In cycle 4, there is a resource conflict because I1 and I4 are trying to access the same resource. In our case, the resource is memory. The solution to this problem is to keep the instruction waiting until the required resource becomes available. Because of this wait, stalls are introduced into the pipeline, like this:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| I1 | IF(Mem) | ID | EX | Mem | WB | | | |
| I2 | | IF(Mem) | ID | EX | Mem | WB | | |
| I3 | | | IF(Mem) | ID | EX | Mem | WB | |
| I4 | | | | - | - | - | IF(Mem) | |
With the help of a hardware mechanism, we can minimize the structural dependency stalls in a pipeline. The mechanism is known as renaming.
Renaming: In this mechanism, the memory is divided into two independent modules, known as Data Memory (DM) and Code Memory (CM). Here, all the instructions are contained in the CM, and all the operands required by the instructions are contained in the DM.
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| I1 | IF(CM) | ID | EX | DM | WB | | |
| I2 | | IF(CM) | ID | EX | DM | WB | |
| I3 | | | IF(CM) | ID | EX | DM | WB |
| I4 | | | | IF(CM) | ID | EX | DM |
| I5 | | | | | IF(CM) | ID | EX |
| I6 | | | | | | IF(CM) | ID |
| I7 | | | | | | | IF(CM) |
Control dependencies
Control dependency occurs when control instructions are transferred. These instructions can be JMP, CALL, BRANCH, and many more. On many instruction architectures, when the processor wants to add a new instruction into the pipeline, it does not yet know the target address of the control instruction. Because of this drawback, unwanted instructions are inserted into the pipeline.
For example:
For this, we will assume a program with the following sequence of instructions:
100: I1
101: I2 (JMP 250)
102: I3
.
.
250: BI1
Expected output sequence: I1 → I2 → BI1
Note: Only after the ID stage is the processor able to know the target address of the JMP instruction.
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| I1 | IF | ID | EX | MEM | WB | |
| I2 | | IF | ID (PC:250) | EX | MEM | WB |
| I3 | | | IF | ID | EX | MEM |
| BI1 | | | | IF | ID | EX |
Output sequence: I1 → I2 → I3 → BI1
So the above example shows that the output sequence and the expected output sequence are not equal to each other. It shows that the pipeline is not implemented correctly.
We can correct this problem by stopping the instruction fetch until we get the target address of the branch instruction. For this, we implement a delay slot until we get the target address, which is described in the following table:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| I1 | IF | ID | EX | MEM | WB | |
| I2 | | IF | ID (PC:250) | EX | MEM | WB |
| Delay | | | - | - | - | - |
| BI1 | | | | IF | ID | EX |
In the above example, we can see that no operation is performed by the delay slot, so the output sequence now matches the expected output. But because of this slot, a stall is introduced in the pipeline.
In the case of control dependency, we can eliminate the stalls in the pipeline with the help of a method known as branch prediction. In branch prediction, the prediction about whether the branch will be taken is made in the 1st (fetch) stage itself, so branch prediction can achieve a branch penalty of 0.
Branch Penalty: The branch penalty is the number of stalls introduced into the pipeline at the time of a branch operation.
Data dependencies
For this, we will assume an ADD instruction S with three registers, S: ADD R1, R2, R3, where I(S) denotes the set of operands the instruction reads (here {R2, R3}) and O(S) the set it writes (here {R1}).
Two statements S1 → S2 are independent only when O(S1) ∩ I(S2), I(S1) ∩ O(S2), and O(S1) ∩ O(S2) are all empty. This condition is known as the Bernstein condition. When it is violated, there are three cases, which are described as follows:
Flow (data) Dependence: This dependency has O(S1) ∩ I(S2) ≠ ∅, S1 → S2. In this case, S2 reads something only after S1 writes it.
Anti Dependence: This dependency has I(S1) ∩ O(S2) ≠ ∅, S1 → S2. In this case, S1 reads something before S2 overwrites it.
Output Dependence: This dependency has O(S1) ∩ O(S2) ≠ ∅, S1 → S2. In this case, both S1 and S2 write to the same memory location.
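The three cases can be checked mechanically from the read and write sets of two instructions. The following Python sketch classifies the dependence of S1 → S2; the register names are illustrative.

```python
# Classifying the dependence of S1 -> S2 from their read/write sets.
def classify(read1, write1, read2, write2):
    hazards = []
    if write1 & read2:
        hazards.append("flow (RAW)")     # S2 reads what S1 writes
    if read1 & write2:
        hazards.append("anti (WAR)")     # S2 overwrites what S1 reads
    if write1 & write2:
        hazards.append("output (WAW)")   # both write the same location
    return hazards or ["independent (Bernstein condition holds)"]

# ADD R1, R2, R3 followed by SUB R5, R1, R4: R1 is written, then read.
print(classify({"R2", "R3"}, {"R1"}, {"R1", "R4"}, {"R5"}))  # -> ['flow (RAW)']
```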
For example: Here, we will assume that we have two instructions I1 and I2 such that I2 reads a register written by I1, e.g. I1: ADD R1, R2, R3 and I2: SUB R5, R1, R4.
The condition of data dependency occurs when the above instructions I1 and I2 are executed in a pipelined processor: I2 tries to read R1 before I1 writes it. As a result, the instruction I2 incorrectly gets the old value, as described in the following table:
| Instructions / Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| I1 | IF | ID | EX | DM |
| I2 | | IF | ID (old value) | EX |
Here we use operand forwarding so that we can minimize the stalls caused by data dependency.
Operand Forwarding: In this technique, we use the interface registers that exist between the stages. These registers hold the intermediate output. With the help of these intermediate registers, the dependent instruction can directly access the new value.
| Instructions / Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| I1 | IF | ID | EX | DM |
| I2 | | IF | ID | EX |
Data Hazards
Data hazards occur due to data dependency. If the data is modified in different stages of a pipeline by instructions that exhibit data dependency, a data hazard will occur. When instructions read or write registers that are used by other instructions, instruction hazards occur. Because of a data hazard, there will be a delay in the pipeline. Data hazards are basically of three types:
1. RAW
2. WAR
3. WAW
To understand these hazards, we will assume we have two instructions I1 and I2, in such a way
that I2 follows I1. The hazards are described as follows:
RAW:
A RAW hazard can be referred to as 'Read After Write'. It is also known as a flow/true data dependency. If the later instruction tries to read an operand before the earlier instruction writes it, a RAW hazard occurs. The condition to detect a RAW hazard is that O(n) and I(n+1) have at least one common operand.
For example:
There is a RAW hazard because the subtraction instruction reads the output of the addition. The hazard for instructions 'add R1, R2, R3' and 'sub R5, R1, R4' is described as follows:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| I1 | IF | ID | EX | MEM | WB | |
| I2 | | IF | ID | EX | MEM | WB |
WAR
WAR can be referred to as 'Write After Read'. It is also known as an anti data dependency. If the later instruction tries to write an operand before the earlier instruction reads it, a WAR hazard occurs. The condition to detect a WAR hazard is that I(n) and O(n+1) have at least one common operand.
For example:
Here the subtraction instruction creates a WAR hazard because it writes R2, which is read by the earlier addition instruction. In a reasonable (in-order) pipeline, the WAR hazard is very uncommon or impossible. The hazard for instructions 'add R1, R2, R3' and 'sub R2, R5, R4' is described as follows:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| I1 | IF | ID | EX | MEM | WB | |
| I2 | | IF | ID | EX | MEM | WB |
When an instruction tries to enter the write back stage of the pipeline, all the previous instructions in the program have already passed through the register read stage and read their input values. So the writing instruction can write its destination register without causing any problem. WAR hazards cause fewer problems than WAW hazards because, in a pipeline, the register read stage occurs before the write back stage.
WAW
WAW can be referred to as 'Write After Write'. It is also known as an output data dependency. If the later instruction tries to write an operand before the earlier instruction writes it, a WAW hazard occurs. The condition to detect a WAW hazard is that O(n) and O(n+1) have at least one common operand.
For example:
| Instructions / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| I2 | IF | ID | EX | MEM | WB | | |
In the write back stage of a pipeline, the output register of an instruction is written. Instructions with a WAW hazard enter the write back stage of the pipeline in the same order in which they appear in the program, so the results of these instructions are written into the register in the right order. A processor can improve performance over the original program by allowing instructions to execute in different orders, provided such hazards are respected.
WAR hazards and WAW hazards occur because the processor contains a finite number of registers. Because of this, these hazards are also known as name dependencies. If the processor had an infinite number of registers, it would use a different register for the output of each instruction, and there would be no chance of WAR and WAW hazards occurring.
WAR and WAW hazards will not cause a delay if the processor uses the same pipeline for all the instructions and executes them in the same order in which they appear in the program. This is because of the way instructions flow through the pipeline.
Delayed Branch
Disadvantages :
1. Limited Flexibility : Effectiveness heavily depends on the ability to find suitable instructions to fill the delay slots, which isn't always possible.
2. Increased Compiler Complexity : Compilers must perform additional work to identify and move instructions into delay slots, which can increase complexity and compilation time.
3. Wasted Slots : If no useful instructions can be found for the delay slots, these slots may be filled with NOPs (no-operations), leading to wasted cycles.
Branch Prediction
Concept :
- Branch prediction involves guessing the outcome of a branch instruction before
it is known for sure and speculatively executing subsequent instructions based
on the prediction.
- Modern processors use sophisticated branch prediction algorithms, including
static and dynamic techniques, to improve prediction accuracy.
Advantages :
1. Increased Parallelism : Enables the pipeline to stay full by speculatively
executing instructions, potentially leading to significant performance gains.
2. Adaptive to Workloads : Dynamic branch predictors can adapt to the
branching patterns of different workloads, improving accuracy over time.
3. Reduced Pipeline Stalls : By predicting the branch outcome and continuing
execution, branch prediction can minimize the number of pipeline stalls and
keep the CPU busy.
Disadvantages :
1. Complexity : Implementing accurate and efficient branch predictors adds significant complexity to the CPU design.
2. Misprediction Penalty : Incorrect predictions lead to flushing the pipeline and re-executing instructions, which can incur a significant performance penalty.
3. Power Consumption : Additional logic for branch prediction consumes more power, which is a critical consideration for modern processors, especially in mobile and embedded systems.
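As one example of a dynamic technique, here is a minimal Python sketch of a 2-bit saturating-counter predictor, one of the classic schemes. A real predictor keeps a table of such counters indexed by branch address; this single-counter version is a simplification for illustration.

```python
# 2-bit saturating-counter branch predictor (single counter for simplicity).
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2                 # 0..1 predict not-taken, 2..3 taken

    def predict(self):
        return self.counter >= 2         # True means "predict taken"

    def update(self, taken):
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
for taken in [True, True, False, True]:  # actual branch outcomes
    print("predicted:", p.predict(), "actual:", taken)
    p.update(taken)
```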
Comparison
Performance :
- Delayed Branch : Performance improvement is limited by the compiler's
ability to fill delay slots.
- Branch Prediction : Can lead to substantial performance improvements,
especially with accurate predictors and deep pipelines.
Implementation Complexity :
- Delayed Branch : Simpler hardware but requires sophisticated compiler
support.
- Branch Prediction : Complex hardware design but offers greater flexibility and
adaptability.
Efficiency :
- Delayed Branch : Efficiency depends on the presence of suitable instructions
for delay slots.
- Branch Prediction : Efficiency depends on the accuracy of the predictor and
the ability to minimize mis-prediction penalties.
Adaptability :
- Delayed Branch : Less adaptable to changing workloads and branching patterns.
- Branch Prediction : Highly adaptable, especially with dynamic predictors that learn and adjust based on runtime behavior.
Instruction execution can be organized in two ways:
• Non-Pipelined Execution
• Pipelined Execution
1. Non-Pipelined Execution-
In non-pipelined architecture,
• All the instructions of a program are executed sequentially, one after the other.
• A new instruction executes only after the previous instruction has executed completely.
• This style of executing the instructions is highly inefficient.
2. Pipelined Execution-
In pipelined architecture,
• Multiple instructions are executed in parallel.
• This style of executing the instructions is highly efficient.
Instruction Pipelining-
Instruction pipelining is a technique that implements a form of parallelism called instruction level parallelism within a single processor.
• A pipelined processor does not wait until the previous instruction has executed completely.
• Rather, it fetches the next instruction and begins its execution.
Pipelined Architecture-
In pipelined architecture,
• The hardware of the CPU is split up into several functional units.
• Each functional unit performs a dedicated task.
• The number of functional units may vary from processor to processor.
• These functional units are called the stages of the pipeline.
• The control unit manages all the stages using control signals.
• There is a register associated with each stage that holds the data.
• There is a global clock that synchronizes the working of all the stages.
• At the beginning of each clock cycle, each stage takes the input from its register.
• Each stage then processes the data and feeds its output to the register of the next stage.
Four-Stage Pipeline-
In four stage pipelined architecture, the execution of each instruction is completed in the following 4 stages-
Stage-01:
At stage-01,
o First functional unit performs instruction fetch.
o It fetches the instruction to be executed.
Stage-02:
At stage-02,
o Second functional unit performs instruction decode.
o It decodes the instruction to be executed.
Stage-03:
At stage-03,
o Third functional unit performs instruction execution.
o It executes the instruction.
Stage-04:
At stage-04,
o Fourth functional unit performs write back.
o It writes back the result so obtained after executing the instruction.
Execution-
In pipelined architecture,
• Instructions of the program execute in parallel.
• When one instruction goes from the nth stage to the (n+1)th stage, another instruction goes from the (n-1)th stage to the nth stage.
Phase-Time Diagram-
• A phase-time diagram shows the execution of instructions in the pipelined architecture.
• The following diagram shows the execution of three instructions in four stage pipelined architecture.
Time taken to execute three instructions in four stage pipelined architecture = 6 clock cycles.
NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined
execution.
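The timing comparison generalizes: with k pipeline stages and n instructions, a non-pipelined machine needs n × k cycles, while an ideal pipeline needs k + (n - 1) cycles (k cycles for the first instruction to fill the pipe, then one instruction completing per cycle). A small Python check of the numbers used above:

```python
# Clock cycles needed with and without pipelining.
def non_pipelined_cycles(n, k):
    return n * k                 # each of n instructions takes all k stages

def pipelined_cycles(n, k):
    return k + (n - 1)           # fill the pipe once, then 1 completion/cycle

print(non_pipelined_cycles(3, 4))  # -> 12 clock cycles
print(pipelined_cycles(3, 4))      # -> 6 clock cycles
```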