CS683-exam2-answer
Mid-term-version II
27th October, 2023
Time Limit: 75 Minutes Roll No.:
Tips:
Be concise and cognizant. There will be a penalty for verbosity and “it depends” without
justification. Your logic and your understanding will not lead to marks unless your logic and
understanding respect the design and implementation aspects of Computer Architecture. Do
not spend too much or too little time on any particular question.
“I promise I will write this exam honestly and ethically”. Your Signature:
Your marks in CS683 will certainly not affect the # zeros in your salary.
So relax, enjoy, and write this exam :)
CS683-2023@IITB Mid-term-version II Name:
(b) Using code-II with stride1 = stride2 = 32, size1 = 1056, and size2 = 1024, we
observe latency[0] = 300 cycles. However, if size1 = 1024, latency[0] = 100 cycles.
What is the maximum number of ways in L1? (Note: The replacement policy is a
vanilla policy that can be either FIFO or LRU). [3 points]
(c) The TAs want to find out the exact replacement policy, assuming that the
associativity is the maximum obtained in the previous part. We first
run code-II with stride1 = 32, size1 = 1024, stride2 = 64, and size2 = 1056. Then
(after resetting j), we run code-I with stride = 32 and size = 1024. We observe
latency[1] = 100 cycles. What is the replacement policy? Explain. (Hint: The
replacement policy can be either FIFO or LRU. You need to find the correct one
and explain.) [3 points]
Answer: (a) L1 block size 32 bytes, L2 block size 128 bytes; (b) 32; (c) FIFO, or
closer to FIFO.
2. (10 points) [25 minutes] Suppose now the TAs of CS683 have a system with 32 cores
that share a physical second-level cache. Assume each core is running a single single-
threaded application, and all 32 cores are concurrently running applications. Assume
that the page size of the architecture is 8KB, the block size of the cache is 128 bytes, and
the cache uses LRU replacement. We want to ensure each application gets a dedicated
space in this shared cache without any interference from other cores. We would like to
enforce this using the OS-based page coloring mechanism to partition the cache, as we
discussed in the lecture. Recall that with page coloring, the operating system ensures,
using virtual memory mechanisms, that the applications do not contend for the same
space in the cache. Show all the details.
(a) What is the minimum size the L2 cache needs to be such that each application is
allocated its dedicated space in the cache via page coloring? Show your work. [2.5
points]
(b) Assume the cache is 4MB, 32-way associative. Can the operating system ensure that
the cache is partitioned such that no two applications interfere with cache space? Show
your work. [2.5 points]
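The arithmetic behind parts (a) and (b) can be sketched as follows. This is a minimal sketch, assuming the usual page-coloring model in which the number of colors equals the number of pages that fit in one way of a physically indexed cache; the function name page_colors and the constants are illustrative, not part of the exam.

```python
KB = 1024
MB = 1024 * KB

def page_colors(cache_bytes: int, associativity: int, page_bytes: int) -> int:
    """Number of page colors = pages that fit in one way of the cache (assumed model)."""
    way_bytes = cache_bytes // associativity
    return way_bytes // page_bytes

PAGE = 8 * KB   # page size given in the question
CORES = 32      # one single-threaded application per core

# (a) One color per core; the cache is smallest when it is direct-mapped,
#     because each extra way multiplies the size needed for the same color count.
min_cache_bytes = CORES * PAGE
print(min_cache_bytes // KB, "KB")        # 256 KB

# (b) 4MB, 32-way cache: each way holds 128KB, i.e. 16 pages, so 16 colors.
colors_b = page_colors(4 * MB, 32, PAGE)
print(colors_b, colors_b >= CORES)        # 16 False
```

Under this model, 16 colors cannot be split among 32 applications, so page coloring alone would not isolate them in the cache of part (b).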
(c) Assume you would like to design a 32MB shared cache such that the operating system
has the ability to ensure that the cache is partitioned such that no two applications
interfere with cache space. What is the minimum associativity of the cache such that
this is possible? Show your work. [2.5 points]
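For part (c), the same color-count model gives a bound on associativity. The sketch below assumes, as before, that colors = cache size / (associativity × page size); max_associativity is a hypothetical helper name introduced only for illustration.

```python
KB = 1024
MB = 1024 * KB

def max_associativity(cache_bytes: int, page_bytes: int, colors_needed: int) -> int:
    """Largest associativity that still leaves at least colors_needed page colors."""
    return cache_bytes // (page_bytes * colors_needed)

limit = max_associativity(32 * MB, 8 * KB, 32)
print(limit)   # 128

# Any associativity from 1 up to this limit yields at least 32 colors.
for ways in (1, 32, 128, 256):
    colors = (32 * MB) // (ways * 8 * KB)
    print(ways, colors, colors >= 32)
```

Under this assumption the constraint is an upper bound (128 ways), so the smallest associativity that works would be 1, i.e. a direct-mapped cache.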
(d) Suppose the TAs decide to change the cache design and use way-based cache
partitioning to partition the cache, instead of OS-based page coloring. Assume we
would like to design a 4MB cache with a 128-byte block size. What is the minimum
associativity of the cache such that each application is guaranteed a minimum
amount of space without interference? [2.5 points]
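A rough sketch of the way-partitioning arithmetic for part (d), assuming each of the 32 applications is given at least one dedicated way in every set; way_partition and its parameters are illustrative names, not from the exam.

```python
KB = 1024
MB = 1024 * KB

def way_partition(cache_bytes: int, block_bytes: int, cores: int, ways_per_core: int = 1):
    """Return (associativity, number_of_sets, bytes_per_core) for an even way split."""
    associativity = cores * ways_per_core
    sets = cache_bytes // (associativity * block_bytes)
    bytes_per_core = ways_per_core * sets * block_bytes
    return associativity, sets, bytes_per_core

assoc, sets, per_core = way_partition(4 * MB, 128, cores=32)
print(assoc, sets, per_core // KB, "KB")   # 32 1024 128 KB
```

With one way per application, the cache needs at least 32 ways, and each application then owns 128KB regardless of what the others do.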
3. (10 points) [25 minutes] An IITB student writes two programs A and B and runs them
on two different machines M1 and M2 at SL1 and SL2 labs to determine the type of
prefetching mechanism used in each of these two machines. She observes programs A
and B to have the following access patterns to cache blocks. Note that the addresses are
cache block addresses, not byte addresses.
Program A: 27 accesses
a, a + 1, a + 2, a + 3, a + 4, a + 8, a + 16, a + 32, a + 64,
a, a + 1, a + 2, a + 3, a + 4, a + 8, a + 16, a + 32, a + 64,
a, a + 1, a + 2, a + 3, a + 4, a + 8, a + 16, a + 32, a + 64
The student is able to measure the accuracy and coverage of the prefetching mechanism
in each of the machines. The following table shows her measurement results:
Program   Coverage (M1)   Accuracy (M1)   Coverage (M2)   Accuracy (M2)
A         6/27            6/27            1/3             9/26
B         499/501         499/501         499/501         499/500
The student knows the following facts about M1 and M2:
• The prefetcher prefetches into a fully associative cache whose size is 8 cache blocks.
The cache employs the FIFO (First-In First-Out) replacement policy.
• The prefetchers have large enough resources to detect and store access patterns.
• Each cache block access is separated long enough in time such that all prefetches issued
can be completed before the next access happens.
• There are 3 possible choices for the prefetching mechanism:
1) 1st-next-block prefetcher (degree = 1) – prefetches block N + 1 after seeing block N
2) 4th-next-block prefetcher (degree = 1) – prefetches block N + 4 after seeing block N
3) Intel’s IP-stride prefetcher, as the sysad. buys Intel machines.
Note that none of the above-mentioned prefetchers employ confidence bits.
The prefetchers start out with an empty table when each of programs A and B starts
execution. The prefetcher sends only one prefetch request after a program access (i.e.,
prefetch degree = 1). Determine what type of prefetching mechanism each of the
above-mentioned machines (M1 and M2) uses.
Answer: M1 uses the 4th-next-block prefetcher, and M2 uses the IP-stride prefetcher.
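The next-block candidates can be checked against Program A with a small simulation. This is a sketch under stated assumptions, not the official solution: the 8-block FIFO structure is modeled as holding prefetched blocks only, coverage is counted over all 27 accesses, and accuracy over all prefetches issued. The function name simulate_next_n and the base address 0 standing in for a are illustrative choices.

```python
from collections import deque

def simulate_next_n(accesses, offset, buffer_blocks=8):
    """Simulate an Nth-next-block prefetcher (degree 1) with a FIFO prefetch buffer.

    Returns (covered_accesses, useful_prefetches, prefetches_issued, total_accesses).
    Assumption: only prefetched blocks are placed in the buffer.
    """
    buffer = deque()   # oldest prefetched block sits on the left
    unused = set()     # buffered blocks that have not been hit yet
    covered = useful = issued = 0
    for block in accesses:
        if block in buffer:
            covered += 1
            if block in unused:        # count each prefetch as useful once
                useful += 1
                unused.discard(block)
        target = block + offset
        issued += 1
        if target not in buffer:       # an already-buffered target is not re-inserted
            if len(buffer) == buffer_blocks:
                unused.discard(buffer.popleft())   # FIFO eviction
            buffer.append(target)
            unused.add(target)
    return covered, useful, issued, len(accesses)

pattern = [0, 1, 2, 3, 4, 8, 16, 32, 64]           # offsets from block a
program_a = [d for _ in range(3) for d in pattern]  # 27 accesses, with a = 0

print(simulate_next_n(program_a, 4))   # (6, 6, 27, 27)
print(simulate_next_n(program_a, 1))   # (12, 12, 27, 27)
```

Under these assumptions the 4th-next-block prefetcher gives coverage 6/27 and accuracy 6/27 on Program A, which is consistent with the M1 row of the table, while the 1st-next-block prefetcher gives 12/27 and matches neither machine.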