KGP Computer Architecture Exam Paper

This document contains details about an examination for the subject "Computer Architecture and Operating Systems" including 9 questions to be answered in 2 hours. Question 1 involves calculating clock cycles, time, CPI, and speedup for two processor implementations running a program with different instruction classes. Question 2 asks to write MIPS assembly code equivalent to a given C code segment. Question 3 asks to reconstruct a C function from given MIPS assembly code and describe what the function does.

Uploaded by

Utkarsh Patel

Indian Institute of Technology, Kharagpur

Mid-Spring Semester 2021-22

Date of Examination: 22-02-2022 Session: (10-12 noon) Duration: 2 hrs


Subject No.: CS31702
Subject: COMPUTER ARCHITECTURE AND OPERATING SYSTEMS
Department/Center/School: Computer Science and Engineering
Specific charts, graph paper, log book etc., required: NO Total Marks : 100
Special instructions (if any): ANSWER ALL QUESTIONS
Note: (i) All parts of the question (a,b,c,....) should be answered at a stretch.
(ii) All intermediate steps need to be mentioned in the answer script

1. Consider two different implementations of the same instruction set architecture. There are 4 classes
of instructions: A, B, C, and D. The clock rate and CPI of each implementation are given in the
following table.

      Clock rate   CPI of Class A   CPI of Class B   CPI of Class C   CPI of Class D
P1    1.5 GHz      1                2                3                4
P2    2.0 GHz      2                3                4                5

Given a program with 10^6 instructions divided into classes as follows: 10% class A, 20% class B,
50% class C and 20% class D.

(a) Determine the number of clock cycles required for execution of the program by Processors P1
and P2.
(b) Determine the time required by the processors P1 and P2 for execution of the program.
(c) What is the global CPI for each of the processors P1 and P2?
(d) Of the two implementations, which is faster for executing the given program?
Indicate the speed-up factor of the faster processor over the other.

(3+2+2+1 = 8M)

2. Consider the following C program segment. Write the equivalent MIPS assembly program. Assume
that the base addresses of the integer arrays A[n], B[n] and C[n] are stored in registers $s0, $s1
and $s2, respectively. The register $s3 is used for a loop index and the loop limit n is stored in
register $s4. Add appropriate comments to each MIPS instruction for better readability
and understanding.
(NOTE: DON’T USE ANY PSEUDO MACHINE INSTRUCTIONS).
for (i = 0; i < n; i++)
{
    if (A[i] == B[i])
        C[i] = 0;
    else if (A[i] < B[i])
        C[i] = -1;
    else
        C[i] = 1;
}

(12M)

3. The MIPS assembly code given below is a translation of a C function named mipsfun. Reconstruct
the C function mipsfun from the given MIPS code. Your code should be as concise as possible and
should not use any goto statements or explicit pointers. Using one sentence, describe what this
function does.
mipsfun:
addi $t0, $a2, 1
loop:
slt $t3, $a1, $t0
addi $t4, $0, 1
beq $t3, $t4, exit
add $t1, $t0, $t0
add $t1, $t1, $t1
add $t1, $t1, $a0
lw $t2, 0($t1)
addi $t1, $t1, -4
sw $t2, 0($t1)
addi $t0, $t0, 1
j loop
exit: jr $ra

(10M)

4. Using the first version of the division hardware, perform the division 77 ÷ 6. Indicate all iterations
and the sequence of steps clearly. In each iteration, for every step,
state the contents of the quotient, divisor and remainder registers.

(10M)
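
The restoring scheme used by the first version of the division hardware can be modelled in C roughly as follows. This is only an illustrative sketch under stated assumptions: the names `divide_v1` and `div_result` and the register-width parameter `n` are hypothetical and not part of the question. An n-bit dividend is paired with a 2n-bit divisor register (divisor initially in the upper half) and the loop runs n + 1 times.

```c
#include <stdint.h>

/* Final contents of the quotient and remainder registers. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} div_result;

/*
 * Model of restoring division with n-bit operands.
 * Assumptions: 1 <= n <= 31, dividend < 2^n, divisor != 0.
 * The divisor starts in the upper half of a 2n-bit register and is
 * shifted right once per iteration; n + 1 iterations are performed.
 */
div_result divide_v1(uint32_t dividend, uint32_t divisor, int n)
{
    int64_t  rem = (int64_t)dividend;      /* remainder register */
    int64_t  div = (int64_t)divisor << n;  /* divisor in the upper half */
    uint32_t quo = 0;                      /* quotient register */

    for (int iter = 0; iter <= n; iter++) {
        rem -= div;                        /* step 1: subtract divisor */
        if (rem >= 0) {
            quo = (quo << 1) | 1u;         /* step 2a: shift in a 1 */
        } else {
            rem += div;                    /* step 2b: restore remainder, */
            quo <<= 1;                     /*          shift in a 0       */
        }
        div >>= 1;                         /* step 3: shift divisor right */
    }

    div_result r = { quo, (uint32_t)rem };
    return r;
}
```

For example, `divide_v1(77, 6, 8)` models the 8-bit case and, after nine iterations, leaves 12 in the quotient register and 5 in the remainder register.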

5. For the sequence of instructions given below, represent the execution graphically (synchronized w.r.t.
the clock) on a 5-stage pipeline architecture with data forwarding. Clearly show and explain, in
clock-cycle order, the hazards that occur and the actions taken to resolve them during execution
of the sequence. Explain clearly the execution
of each instruction and its dependence on other instructions.
(Note: For the branch instruction, assume that the next instruction is fetched only after the
branch condition is evaluated. In this problem the branch condition is computed in the EX stage (3rd
cycle); assume the outcome is branch "NOT TAKEN".)

lw s2, 40(s1)
add s3, s2, s4
beq s3, s2, Exit
or s5, s2, s6
and s7, s5, s3

(9M)

6. Compute the execution time for the sequence of instructions in question 5 for
(i) the single-cycle approach, (ii) a 5-stage pipeline without data forwarding and (iii) a 5-stage
pipeline with data forwarding. What will the CPU clock frequency be in each of the three
cases? Assume the latencies of the individual stages are as follows: IF = 300 ps, ID = 400 ps, EX = 350 ps,
MEM = 500 ps and WB = 100 ps.

(6M)

7. In class we discussed data hazards and the data-forwarding hardware in the EX stage of the 5-stage
pipeline. Now, write the detailed conditions for detecting data hazards at the DM (data memory) stage
of a 5-stage pipeline architecture. Derive the control signals needed to resolve them. Draw a modified
datapath that incorporates the data forwarding control unit, multiplexers, forwarding paths, and the inputs
and outputs of the forwarding unit.

In this case, you may consider a limited portion of the 5-stage pipeline datapath (without the IF
and ID stages) to illustrate the functionality of the data forwarding control unit. Provide an example
sequence of instructions in which data forwarding is needed soon after the MEM/WB stage.

(10M)

8. The processor fetches the following instruction word: 10001110001101000000001001011000

Assume the instruction is executed in a single-cycle datapath, the data memory is all zeros,
the processor's temporary registers $t0 to $t7 each contain the constant value 500, the saved registers
$s0 to $s7 each contain the constant value 1000, and the other registers contain random values at the
beginning of the cycle in which the above instruction word is fetched:

(a) What are the outputs of the sign-extend unit and the jump "Shift left 2" unit for this instruction
word?
(b) What are the values of the ALU control unit's inputs for this instruction?
(c) What is the new PC address after this instruction is executed? Highlight the path through
which this value is determined.
(d) For each Mux, show the value of its data output during the execution of this instruction with
these register values.
(e) For the ALU and the two add units, what are their data input values?
(f) What are the values of all inputs of the Registers unit?
(g) Translate the above machine instruction into MIPS instruction format.
(h) What are the values of all inputs of the "Data Memory" unit?

(2+1+1.5+2.5+2.5+2.5+1+2 = 15M)
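
As an aid to reading the instruction word in question 8, the fields of a MIPS I-type word can be separated with shifts and masks. The following is a minimal sketch; the names `itype_fields` and `decode_itype` are hypothetical, and it assumes the word really is I-type (opcode, rs, rt, 16-bit immediate), which should be checked from the opcode.

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction word. */
typedef struct {
    unsigned opcode;  /* bits 31-26 */
    unsigned rs;      /* bits 25-21 */
    unsigned rt;      /* bits 20-16 */
    int32_t  imm;     /* bits 15-0, sign-extended to 32 bits */
} itype_fields;

itype_fields decode_itype(uint32_t word)
{
    itype_fields f;
    f.opcode = (word >> 26) & 0x3Fu;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;

    /* Sign-extend the 16-bit immediate without relying on
       implementation-defined narrowing conversions. */
    f.imm = (int32_t)(word & 0xFFFFu);
    if (f.imm & 0x8000)
        f.imm -= 0x10000;
    return f;
}
```

The bit string in the question equals 0x8E340258; `decode_itype(0x8E340258u)` gives opcode 35, rs 17, rt 20 and immediate 600, which corresponds to `lw $s4, 600($s1)`.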

9. For a 2-way set associative cache design with 32-bit address, the following bits of the address are
used to access the cache: Tag: 31-10, index: 9-4 and Offset: 3-0.

(a) Show the hardware implementation of the cache for the above specifications for accessing the
words.
(b) What is the cache block size (in words)?
(c) How many entries does the cache have?
(d) What is the maximum data storage capacity supported by this cache?
(e) Determine the overhead/control information (in bytes) involved in this cache implementation.
(f) Starting from power on, the following byte-addressed cache references are recorded: 160, 200,
172, 1188, 1224, 2212, 2248, 4136, 760 and 3268. Find the following:
i. For each entry find the block number, cache entry number/index, set number and whether
the reference is hit/miss.
ii. How many misses were observed?
iii. How many block replacements (conflict misses) were observed?
iv. What is the hit ratio?
v. List the final state of the cache, with each valid entry represented as a record of <index,
tag, data>

(5+1+1+1+2+10 = 20M)
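
The address partition stated in question 9 (tag: bits 31-10, index: bits 9-4, offset: bits 3-0) can be applied to any byte address with shifts and masks. The sketch below only illustrates that split; the names `cache_addr` and `split_address` are hypothetical, and it does not model hits, misses or replacement.

```c
#include <stdint.h>

/* The three fields of a 32-bit byte address for this cache. */
typedef struct {
    uint32_t tag;     /* bits 31-10 */
    uint32_t index;   /* bits 9-4   */
    uint32_t offset;  /* bits 3-0   */
} cache_addr;

cache_addr split_address(uint32_t addr)
{
    cache_addr a;
    a.offset = addr & 0xFu;          /* low 4 bits */
    a.index  = (addr >> 4) & 0x3Fu;  /* next 6 bits */
    a.tag    = addr >> 10;           /* remaining 22 bits */
    return a;
}
```

For instance, address 160 splits into tag 0, index 10, offset 0, while address 1188 splits into tag 1, index 10, offset 4, so both references map to the same index with different tags.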

Common questions


To calculate the number of clock cycles required to execute a program, you need the number of instructions in each class and the CPI (cycles per instruction) of each class. For each class, multiply the CPI by the number of instructions in that class, then sum these products to obtain the total cycles for the program. Here, with 10^6 instructions, class A has 100,000 instructions with a CPI of 1 (P1) or 2 (P2), class B has 200,000 with a CPI of 2 or 3, class C has 500,000 with a CPI of 3 or 4, and class D has 200,000 with a CPI of 4 or 5. For P1, the total is (100,000 * 1) + (200,000 * 2) + (500,000 * 3) + (200,000 * 4) = 2,800,000 cycles. For P2, the total is (100,000 * 2) + (200,000 * 3) + (500,000 * 4) + (200,000 * 5) = 3,800,000 cycles.
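
The sum-of-products rule described above can be written as a short C function. This is a minimal sketch; the name `total_cycles` and the constant `NUM_CLASSES` are hypothetical, and the per-class counts are taken as 10%, 20%, 50% and 20% of 10^6 instructions.

```c
#include <stdint.h>

#define NUM_CLASSES 4  /* instruction classes A, B, C, D */

/* Total clock cycles = sum over classes of (instruction count * CPI). */
uint64_t total_cycles(const uint64_t counts[NUM_CLASSES],
                      const unsigned cpi[NUM_CLASSES])
{
    uint64_t cycles = 0;
    for (int k = 0; k < NUM_CLASSES; k++)
        cycles += counts[k] * cpi[k];
    return cycles;
}
```

With counts {100000, 200000, 500000, 200000}, the CPI row {1, 2, 3, 4} of P1 yields 2,800,000 cycles and the CPI row {2, 3, 4, 5} of P2 yields 3,800,000 cycles.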

To construct a MIPS assembly program from a C program like the provided segment, translate each C statement into the corresponding MIPS instructions. Begin by initializing the loop counter in a register and testing the loop condition. Translate the 'if' and 'else' conditions using comparison and branch instructions (such as slt, beq, and bne). Load the array elements with lw, compare them with slt or beq, and store the result with sw according to the outcome of the comparison. The resulting sequence also handles the loop increment and the termination condition, uses no pseudoinstructions, and mirrors the logic of the C code.

The global CPI is the weighted average of the CPIs of the instruction classes, weighted by their frequency in the program. It gives a single number for comparing processors. For P1, the global CPI is (0.1 * 1 + 0.2 * 2 + 0.5 * 3 + 0.2 * 4) = 2.8, and for P2 it is (0.1 * 2 + 0.2 * 3 + 0.5 * 4 + 0.2 * 5) = 3.8. Execution time is directly proportional to the global CPI, so a lower global CPI implies faster execution at equal clock rates and instruction counts. With P1 running at 1.5 GHz and P2 at 2.0 GHz, the execution time for P1 is 2,800,000 cycles / (1.5 * 10^9 cycles per second) ≈ 1.87 ms, and for P2 it is 3,800,000 cycles / (2.0 * 10^9 cycles per second) = 1.9 ms. P1 is therefore slightly faster despite its lower clock rate, by a factor of about 1.9 / 1.87 ≈ 1.02, because of its lower global CPI.
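
The relations between cycles, global CPI, execution time and speed-up can be checked with a few one-line C helpers. This is an illustrative sketch only; the function names are hypothetical, and the cycle totals passed in are those obtained from the table (2,800,000 for P1 and 3,800,000 for P2, for 10^6 instructions).

```c
/* Global CPI = total cycles / total instruction count. */
double global_cpi(double total_cycles, double instruction_count)
{
    return total_cycles / instruction_count;
}

/* Execution time in seconds = total cycles / clock rate in Hz. */
double exec_time_seconds(double total_cycles, double clock_hz)
{
    return total_cycles / clock_hz;
}

/* Speed-up of the faster implementation over the slower one. */
double speedup(double slower_time, double faster_time)
{
    return slower_time / faster_time;
}
```

Calling `exec_time_seconds(2800000.0, 1.5e9)` gives roughly 1.87e-3 s for P1 and `exec_time_seconds(3800000.0, 2.0e9)` gives 1.9e-3 s for P2, so `speedup` of P2's time over P1's time is about 1.02.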

Reconstructing a C function from MIPS assembly involves understanding the purpose of each instruction and identifying loop structures, branches, and function calls. By following the sequence of arithmetic, logic, and branch instructions, one can deduce the control flow, such as loops and conditionals. The given MIPS code suggests a loop with increment operations and memory accesses, consistent with copying or modifying array elements. Its purpose can be derived from the loop's effects, such as stepping through memory with specific arithmetic operations [...]
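
One possible reading of the mipsfun listing, written back as C, is sketched below. The parameter names `a`, `last` and `pos` are assumptions: the assembly only shows that $a0 is used as an array base address, $a1 as an upper index bound and $a2 as the index just before the first element read.

```c
/*
 * Candidate reconstruction of mipsfun (parameter names are hypothetical).
 * $t0 plays the role of i: it starts at pos + 1, the loop exits once
 * last < i, and each pass copies a[i] into a[i - 1].
 */
void mipsfun(int a[], int last, int pos)
{
    for (int i = pos + 1; i <= last; i++)
        a[i - 1] = a[i];
}
```

Under this reading the function removes the element at index `pos` by shifting the elements at indices `pos + 1` through `last` one position to the left; for example, applying it to {10, 20, 30, 40, 50} with `last = 4` and `pos = 1` leaves {10, 30, 40, 50, 50}.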
