
CSE340: Computer Architecture

Handout_Chapter - 5: Large and Fast: Exploiting Memory Hierarchy

Prepared by: Partha Bhoumik (PBK)
Course Coordinator, CSE340

Background
In recent years, improving CPU performance has become increasingly difficult.
Instead of making single CPUs faster, designers now focus on increasing the
number of processors or CPU cores to handle more tasks simultaneously.

However, simply adding more cores isn’t enough. A major challenge in modern
computing is ensuring that data is accessed quickly and efficiently. This is why
much of today’s hardware research focuses on memory optimization.

Imagine you have a program with 1000 lines of code, written in a high-level
language. When executed, this translates to 10,000 lines of RISC-V assembly
instructions. While the program as a whole may access many different
memory locations, at any given moment, only a small subset of instructions and
memory locations are actively in use.

Consider a simple loop inside that 1000-line program:

for (i = 0; i < 100; i++) {
    array[i] = array[i] + 1;
}

Even though the entire program consists of 10,000 lines, the processor
repeatedly executes just three lines of code for a significant amount of time (100
iterations). This means that during this period, the CPU is primarily focused on
a small section of code and memory rather than the entire program.

Now, think about another example: a function for factorial calculation.

int factorial(int n) {
    int result = 1;
    for (int i = 1; i <= n; i++) {
        result *= i; // Updating the same variable repeatedly
    }
    return result;
}

Here too, even though the function is part of a larger program, during the
execution of this function, only a single variable is being updated.

Although your larger program is going to access a lot of different memory
locations, at any given time the processor's focus is restricted to a small subset
of instructions and a small subset of memory locations.
This behavior is universal in all programs, no matter how complex.

From this observation, we get two important attributes that all programs have:

Temporal Locality: Items accessed recently are likely to be accessed again soon.
Ex. instructions in a loop.

Spatial Locality: Items near those accessed recently are likely to be accessed soon.
Ex. sequential instruction access, array data.

We can take advantage of these two characteristics by copying the instructions
and data we need into a faster location for a short period, let's say 1 second.

For example, if we could fit these instructions and data into registers for 1
second, the CPU would not need to load or store anything from memory during
that time. This means the CPU can run at full speed without waiting for data.
Once we are done with the data and instructions, we load a new chunk from
memory and continue.​

If we follow this pattern, then we reduce frequent memory accesses and keep
the CPU running efficiently. This is the core concept behind memory
hierarchy.​
This idea goes beyond just registers. To keep frequently used data easily
accessible, we add intermediate levels of memory instead of always relying on
main memory. These levels help the CPU access data faster, improving overall
performance.

Memory Hierarchy

Figure: The basic structure of a memory hierarchy.

Computer memory is designed as a hierarchy to take advantage of the principle
of locality. This hierarchy consists of multiple levels of memory, each with
different speeds, sizes, and costs.

Key Characteristics of Memory Hierarchy:


Faster Memory (Closer to CPU)
i. Higher speed (low access time)
ii. Limited in size due to high cost.

Slower Memory (Far away from CPU)


i. Lower speed (higher access time)
ii. Larger size due to lower cost.

DRAM vs SRAM
Dynamic Random Access Memory (DRAM) uses 1 capacitor and 1 transistor to store 1
bit of data. The data is stored as a charge in the capacitor, and the transistor is
used to access this stored charge, either to read the value or to write a new one.
The charge stored in the capacitor slowly leaks away with time, so it needs regular
refreshing. A refresh means reading the contents and writing them back. Because of
this refreshing, we call this a Dynamic RAM.

In contrast, Static Random Access Memory (SRAM) uses 6-8 transistors to store 1
bit of data. SRAMs do not need to be refreshed. As long as power is applied, the
value can be kept continuously.

Read this to know more:
https://www.geeksforgeeks.org/difference-between-sram-and-dram/

Different terms related to Cache

Figure: Memory Hierarchy Levels

The minimum unit of information that can be either present or not present in a
cache is called a block or line as shown in the above figure.

If the data requested by the processor appears in some block in the upper level,
this is called a hit.

If the data are not found in the upper level, the request is called a miss. The
lower level in the hierarchy is then accessed to retrieve the block containing the
requested data.

Hit rate or Hit ratio is the fraction of memory accesses found in the upper
level (cache); it is often used as a measure of the performance of the memory
hierarchy. (Hits / Accesses)

Miss rate or Miss ratio (1 − Hit Ratio) is the fraction of memory accesses not
found in the upper level (cache).

Hit time is the time required to access a level of the memory hierarchy,
including the time needed to determine whether the access is a hit or a miss.
Since the upper level is smaller and faster, the hit time is very low.

Miss penalty is the time to replace a block in the upper level with
the corresponding block from the lower level, plus the time to deliver this block
to the processor. Since lower-level memory is much slower, the miss penalty is
high and can significantly slow down performance.

An ideal design of cache memory should try to increase the hit rate and reduce
the miss penalty.

Different types of Cache Organizations:
i. Direct Mapped Cache;
ii. Fully Associative;
iii. Set Associative.

Direct Mapped Cache




Direct-mapped cache is the simplest form of cache organization. Each block of
memory maps to exactly one cache block. Lookups in a direct-mapped cache
are faster, and this organization is also easy to implement in hardware. But if
two memory blocks map to the same cache block, one overwrites the other.

Now the question is,

A main memory block will be mapped to which cache index? ​

Cache Index = Block address mod Number of blocks in the cache

Follow the explanation to have a better idea.

Explanation: ​
The above diagram represents how main memory blocks are mapped to a cache
with 8 blocks using direct-mapped cache organization.

The upper portion represents a cache, with 8 blocks labeled using 3-bit binary
indices (000 to 111, or 0–7 in decimal).
Each block in the cache can hold one block of data from main memory.

The lower portion represents the main memory, showing block addresses in
5-bit binary (00001, 00101, etc.).

Now let's try to map the main memory blocks into cache memory.

Mapping: ​
Cache Index = Block address mod Number of blocks in the cache

For example:

Main Memory Block    Formula           Mapped Cache Index
00001 (1)            1 mod 8 = 1       001
00101 (5)            5 mod 8 = 5       101
01001 (9)            9 mod 8 = 1       001
01101 (13)           13 mod 8 = 5      101
10001 (17)           17 mod 8 = 1      001
11001 (25)           25 mod 8 = 1      001
11101 (29)           29 mod 8 = 5      101

From this table, you can also see some conflicts: 00001, 01001, 10001, and
11001 are all mapped to the same cache index, 001.
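
As a quick illustration, the same mapping can be computed with a few lines of C.
This snippet is our own sketch (not part of the original handout); it simply applies
the mod formula to the block addresses from the table above:

#include <stdio.h>

/* Direct-mapped cache: index = block address mod number of cache blocks. */
unsigned cache_index(unsigned block_address, unsigned num_blocks) {
    return block_address % num_blocks;
}

int main(void) {
    unsigned blocks[] = {1, 5, 9, 13, 17, 25, 29}; /* block addresses from the table */
    for (int i = 0; i < 7; i++)
        printf("block %2u -> cache index %u\n", blocks[i], cache_index(blocks[i], 8));
    return 0;
}

Running it reproduces the mapped cache indices from the table
(1, 5, 1, 5, 1, 1, 5, i.e. 001 or 101 in binary).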

Address Subdivision

When a CPU accesses memory, it uses a memory address. This address is split
into three main parts:

Memory Address size = [Tag bits] + [Index bits] + [Offset bits] ; [ ] = size of

Index Bits:
This field is used to select the specific cache block where the main memory
block will be mapped. In a direct-mapped cache, the Index field points to a
single cache block.

Formula to calculate the size of the index bits (for direct-mapped):

Index bit size = log₂(Number of cache blocks)
Example:
If cache has 64 blocks → Index bit size = log₂(64) = 6 bits

Tag Bits:
This field is used to check if the block in the cache is the correct one or not (to
detect a cache hit).

Formula to calculate the size of the Tag bits (for direct-mapped):

Tag bits = Total address bits − Index bits − Offset bits
Example:
If address is 32 bits, index = 6 bits, offset = 4 bits:
Tag bit size = 32 − 6 − 4 = 22 bits

Offset Bits:
This field is used to select the exact byte within a block.

Formula to calculate the size of the Offset bits (for direct-mapped):

Offset bit size = log₂(Block size in bytes)

Example:
If Block size = 16 bytes, Offset bit size = log₂(16) = 4 bits
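
Putting the three formulas together, here is a small C sketch that extracts the
offset, index, and tag fields of an address. It uses the example figures above (64
cache blocks, 16-byte blocks); the helper name log2u and the sample address 320
are our own choices, and a power-of-two block size and block count are assumed:

#include <stdio.h>

/* log2 of a power of two, computed by counting right shifts. */
int log2u(unsigned x) {
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned addr        = 320;  /* sample address (our own choice) */
    unsigned block_bytes = 16;   /* block size in bytes             */
    unsigned num_blocks  = 64;   /* number of cache blocks          */

    int offset_bits = log2u(block_bytes);  /* log2(16) = 4 */
    int index_bits  = log2u(num_blocks);   /* log2(64) = 6 */

    unsigned offset = addr & (block_bytes - 1);                 /* lowest offset bits */
    unsigned index  = (addr >> offset_bits) & (num_blocks - 1); /* next index bits    */
    unsigned tag    = addr >> (offset_bits + index_bits);       /* remaining MSB bits */

    printf("tag = %u, index = %u, offset = %u\n", tag, index, offset);
    return 0;
}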

Valid Bit:
This field is used to identify whether the cache block contains valid data or not.
Remember: the valid bit is not part of the memory address.

Size: Always 1 bit per cache block.


0 → Block is invalid (empty or outdated)
1 → Block contains valid data

For a better understanding, please follow this simulation.

Setup:
Cache Size = 8 blocks;
Block Size = 1 byte or 8 bits;
Offset bits size = log₂(1) = 0; hence, no Offset bits in the address.

Access 1: Memory Address = 22 (10110)


Index = 22 mod 8 = 6 = 110
Tag bits will be the remaining msb bits = 10

At the start, the cache is empty:


All valid bits = 0, meaning the cache doesn’t contain any data yet.
So even though index 6 is pointing to the correct location, the valid bit is not
set.
Hence, Access 1 is a miss.

Word addr    Binary addr    Hit/miss    Cache block
22           10 110         Miss        110

As this is a miss, now the CPU will fetch the corresponding block from main
memory and place it in the cache.

Access 2: Memory Address = 26 (11010)

Index = 26 mod 8 = 2 = 010
Tag bits will be the remaining msb bits = 11

The valid bit at index 010 is still 0, so Access 2 is also a miss.

Word addr    Binary addr    Hit/miss    Cache block
22           10 110         Miss        110
26           11 010         Miss        010



As this is a miss, now the CPU will fetch the corresponding block from main
memory and place it in the cache.

Access 3: Memory Address = 22 (10110)
Index = 22 mod 8 = 6 = 110
Tag bits will be the remaining msb bits = 10
In this case, Tag bits match with the table and the valid bit is set to 1 (Y)
So, Access 3 is a hit.

Word addr    Binary addr    Hit/miss    Cache block
22           10 110         Miss        110
26           11 010         Miss        010
22           10 110         Hit         110

Access 4: Memory Address = 26 (11010)


Index = 26 mod 8 = 2 = 010
Tag bits will be the remaining msb bits = 11
In this case, Tag bits match with the table and the valid bit is set to 1 (Y)
So, Access 4 is a hit.

Word addr    Binary addr    Hit/miss    Cache block
22           10 110         Miss        110
26           11 010         Miss        010
22           10 110         Hit         110
26           11 010         Hit         010

Access 5: Memory Address = 18 (10010)

Index = 18 mod 8 = 2 = 010
Tag bits will be the remaining msb bits = 10
In this case, although the valid bit is set to 1, the tag bits of the address (10)
do not match the tag stored in the table (11).
So, Access 5 is a miss.

Word addr    Binary addr    Hit/miss    Cache block
22           10 110         Miss        110
26           11 010         Miss        010
22           10 110         Hit         110
26           11 010         Hit         010
18           10 010         Miss        010

As this is a miss, now the CPU will fetch the corresponding block from main
memory and place it in the cache.

Can you calculate the Hit and Miss ratio now?

Hit ratio = Number of Hits / Total Accesses
Miss ratio = Number of Misses / Total Accesses

Here, Total accesses = 5, Number of hits = 2, Number of misses = 3
So, ​
Hit ratio = 2/5; 40% of the time, the cache served the data without needing to
access main memory (hit).
Miss ratio = 3/5; 60% of the time, the data was not in the cache and had to be
fetched (miss), causing a delay.
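
If you want to verify these numbers yourself, the following small C program (our
own sketch, not from the handout) replays the same five accesses on an 8-block
direct-mapped cache with 1-byte blocks and counts hits and misses:

#include <stdio.h>

#define NUM_BLOCKS 8 /* 8-block direct-mapped cache, 1-byte blocks (no offset bits) */

int main(void) {
    int valid[NUM_BLOCKS] = {0};
    unsigned tag[NUM_BLOCKS] = {0};
    unsigned trace[] = {22, 26, 22, 26, 18}; /* the access sequence above */
    int hits = 0, misses = 0;

    for (int i = 0; i < 5; i++) {
        unsigned addr = trace[i];
        unsigned idx  = addr % NUM_BLOCKS; /* low 3 bits of the address    */
        unsigned t    = addr / NUM_BLOCKS; /* remaining msb bits (the tag) */
        if (valid[idx] && tag[idx] == t) {
            hits++;
            printf("addr %2u -> index %u : hit\n", addr, idx);
        } else {
            misses++;
            valid[idx] = 1;
            tag[idx]   = t; /* fetch the block from main memory */
            printf("addr %2u -> index %u : miss\n", addr, idx);
        }
    }
    printf("hit ratio = %d/%d, miss ratio = %d/%d\n",
           hits, hits + misses, misses, hits + misses);
    return 0;
}

It prints the same three misses and two hits as the table, i.e. a 2/5 hit ratio
and a 3/5 miss ratio.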

Index Bits Calculation


Example1:
Given, Cache size = 16 blocks
Find the index bits for the block address 125.
Or,
Find the cache index where the block address 125 will be mapped?

Solution:
Cache Index = Block Address mod Number of cache blocks
= 125 mod 16
= 1111101 mod 2⁴ (i.e., keep the lowest 4 bits)
= 1101 = 13

So, block address 125 will be mapped to cache index 13 (1101).
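
Because the number of cache blocks here is a power of two (16 = 2⁴), taking the
mod is the same as keeping the lowest 4 bits, which is why 1111101 mod 2⁴ leaves
1101. A tiny C check of this (our own illustration, not from the handout):

#include <stdio.h>

int main(void) {
    unsigned block_address = 125;                 /* 1111101 in binary     */
    unsigned num_blocks    = 16;                  /* 16 = 2^4 cache blocks */
    unsigned index_mod     = block_address % num_blocks;
    unsigned index_mask    = block_address & (num_blocks - 1); /* keep lowest 4 bits */
    printf("mod: %u, mask: %u\n", index_mod, index_mask);      /* both print 13      */
    return 0;
}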

Example2:
Given, Cache size = 16 blocks; Block Size = 64 bits
Find the index bits for the memory address 320.

Solution:
Memory address = 320 = 101000000
Index bits size = log2(number of cache blocks) = log2(16) = 4
Block Size = 64 bits = 8 bytes
Offset bits size = log2(block size in bytes) = log2(8) = 3
Tag bits size = Total address size − Index bits size − Offset bits size
= 9 − 4 − 3 = 2

Tag bits    Index bits    Offset bits
10          1000          000

So, index bits = 1000

Alternate solution:

Block address = Memory address / Block size in bytes (take the floor value of
the division)
= 320 / 8 = 40

Cache Index = Block Address mod Number of cache blocks
= 40 mod 16
= 101000 mod 2⁴
= 1000 = 8

How to determine a cache hit or miss?

Given an address, you need to determine whether it will be a cache hit or a miss.

Steps:
1. Find the index, tag and offset bits from the given address;
2. Use the index bits to find the correct block in the cache;
3. If (the valid bit of that cache index == 1):
       If (tag bits of the address == the tag bits stored at that cache index):
           Cache Hit;
       Else:
           Cache Miss;
   Else:
       Cache Miss;

Example: Given a cache table:

Index    Valid bit    Tag    Data
000      1            100    Data
001      0
010      1            011    Data
011      1            001    Data
100      0
101      1            000    Data
110      1            010    Data
111      0
Given, the memory address 100110; determine whether it will be a cache hit or
miss?​
Note: the block size is 1 byte.

Solution:
The cache has 8 blocks → Index size = log₂(8) = 3 bits​
Block size = 1 byte → Offset = log₂(1) = 0 bits​
Total address size = 6 bits → Tag size = 6 − 3 − 0 = 3 bits

Breakdown of the address 100110:


Tag = 100
Index = 110

From the cache table:

1. Valid bit at index 110 = 1
2. Now we check if the tag bits match or not;
   the tag stored in the table at index 110 is 010,
   which is not the same as the tag bits of the address (100).
So, it will be a miss.

Block Size Considerations

Benefits of Larger Block Size


Reduces Miss Rate (Initially) => Due to spatial locality: if you access address
A, you are likely to access A+1, A+2, etc. A larger block brings more nearby
data into the cache with each miss.
So, fewer memory accesses. Hence, fewer misses.

Issues of Larger Block Size (Especially in Fixed Cache Size)
1. Fewer Blocks in the Cache: If total cache size is fixed, larger blocks = fewer
blocks. This leads to more conflict misses (more data fighting for fewer slots).
2. Cache Pollution: If you load a large block but only use a few bytes, the rest is
wasted. This "pollutes" the cache and evicts potentially useful data.
3. Increased Miss Penalty: Larger block = more data to fetch from main
memory on a miss. Takes longer → higher latency on each miss. This can
cancel out the benefit of reduced miss rate.

Techniques to Reduce Penalty


1. Early Restart: Start sending data to the CPU as soon as the requested word is
fetched. Don’t wait for the entire block.
2. Critical-Word-First: Request the word you need first, even if it’s in the middle
of the block. The remaining words can be loaded in the background.

What happens when a Cache Miss occurs

In case of any cache miss, the CPU must fetch the data from the next level of
memory, which is usually the main memory. This causes a pipeline stall (the CPU
has to pause while waiting for data).

Instruction Cache Miss: It happens when the CPU tries to get an instruction to
run, but that instruction is not found in the cache. The processor stops fetching
instructions until the required instruction is brought in from memory. (Restart of
the instruction fetch process)
Data Cache Miss: It happens when a load instruction refers to data that is not
present in the cache. The processor stalls until the block containing the data is
fetched from memory, and then completes the data access.

Cache Hit on write operation

A write hit occurs when the CPU wants to store (write) data to a memory
location and that location is already present in the cache. Since the data is
already in the cache, the CPU can perform the write quickly. But how this write
is handled depends on the cache’s write policy.
Different cache write policies to follow for cache write hit:
i. Write-Through;
ii. Write-Back;

Write-Through:
On a write hit,
Data is written to both the cache and main memory simultaneously. This keeps
memory always up-to-date. It also ensures memory and cache are always in
sync.
This approach is a bit slower because every write includes accessing the slower
main memory.
Solution: use a write buffer, which holds data waiting to be written to memory.
This allows the CPU to continue without stalling; it only stalls if the buffer is full.

Write-Back:
On a write hit,
Data is updated in the cache block only. Afterwards, mark the cache block as
dirty. When a dirty block is replaced, write it back to memory.
You can use a write buffer here, too. A write buffer temporarily holds the dirty
block while the new block is fetched into the cache.
This allows the memory system to:
i. Start reading in the new block right away.
ii. Write the old dirty block to memory in the background (asynchronously).
This overlap saves time and helps avoid CPU stalls.

Cache Miss on write operation

A write miss occurs when the CPU wants to store (write) data to a memory
location, but that location is not present in the cache. Since the cache does not
have the data block yet, it must decide what to do — and this depends on the
cache’s write policy.

Different cache write policies to follow for cache write miss:


i. Write Allocation.
ii. No-Write Allocation.

Write Allocation:
On a write miss,
The block is brought into the cache, and then the write is performed in the cache
using either write-through or write-back.

No-Write Allocation:
On a write miss,
The write is performed directly in the main memory, and the block is not loaded
into the cache.

Write Around = Write-Through + No Write Allocate. Explore yourself.
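
To see how the hit and miss policies combine on a store, here is a minimal C
sketch of the store path. It is a rough illustration under our own assumptions
(1-byte blocks, a toy main-memory array, and invented names such as cache_store);
it is not the textbook's implementation:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8

typedef struct {
    bool     valid;
    bool     dirty; /* used only by write-back */
    uint32_t tag;
    uint8_t  data;  /* 1-byte blocks for simplicity */
} CacheLine;

typedef struct {
    CacheLine line[NUM_BLOCKS];
    bool write_through;  /* true: write-through, false: write-back         */
    bool write_allocate; /* true: write allocate, false: no-write allocate */
} Cache;

static uint8_t main_memory[64]; /* toy main memory */

static void    memory_write(uint32_t addr, uint8_t value) { main_memory[addr] = value; }
static uint8_t memory_read_block(uint32_t addr)           { return main_memory[addr]; }

void cache_store(Cache *c, uint32_t addr, uint8_t value) {
    uint32_t idx = addr % NUM_BLOCKS;
    uint32_t tag = addr / NUM_BLOCKS;
    CacheLine *l = &c->line[idx];
    bool hit = l->valid && l->tag == tag;

    if (!hit && !c->write_allocate) {  /* no-write allocate: write around the cache */
        memory_write(addr, value);
        return;
    }
    if (!hit) {                        /* write allocate: bring the block in first */
        if (l->valid && l->dirty)      /* write back the evicted dirty block       */
            memory_write(l->tag * NUM_BLOCKS + idx, l->data);
        l->valid = true;
        l->dirty = false;
        l->tag   = tag;
        l->data  = memory_read_block(addr);
    }
    l->data = value;                   /* perform the write in the cache      */
    if (c->write_through)
        memory_write(addr, value);     /* keep main memory in sync            */
    else
        l->dirty = true;               /* write-back: defer the memory update */
}

int main(void) {
    Cache c = { .write_through = false, .write_allocate = true }; /* write-back + write allocate */
    cache_store(&c, 22, 7); /* write miss: allocate the block, write in cache, mark dirty */
    cache_store(&c, 22, 9); /* write hit: update the cache block only                     */
    printf("cached value = %d, dirty = %d\n",
           c.line[22 % NUM_BLOCKS].data, c.line[22 % NUM_BLOCKS].dirty);
    return 0;
}

Setting write_through = true and write_allocate = false in this sketch gives the
write-around combination mentioned above.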

Average Memory Access Time (AMAT)

AMAT = Hit time + Miss rate × Miss penalty

Example: Given that the CPU has a 1ns clock, hit time = 1 cycle, miss penalty =
20 cycles, and I-cache miss rate = 5%, find the AMAT and how many cycles are
required per instruction? ​

Solution:

Miss penalty = 20 cycles = 20 × 1 ns = 20 ns

AMAT = Hit time + Miss rate × Miss penalty
     = 1 ns + 0.05 × 20 ns
     = 2 ns

Since 1 ns = 1 cycle, AMAT = 2 ns = 2 cycles.
So, 2 cycles are required per instruction.
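
The same calculation in a few lines of C (values taken from the example above):

#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;  /* ns: 1 cycle at a 1 ns clock   */
    double miss_rate    = 0.05; /* 5% I-cache miss rate          */
    double miss_penalty = 20.0; /* ns: 20 cycles at a 1 ns clock */

    double amat_ns = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f ns = %.1f cycles\n", amat_ns, amat_ns / 1.0); /* 2.0 ns = 2 cycles */
    return 0;
}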

Multilevel Cache

Figure: Multilevel Cache

Formulas
1. Cache Index = Block address mod Number of blocks in the cache (Direct-mapped cache)

2. Cache Index size = log₂(Number of cache blocks)

3. Offset bit size = log₂(Block size in bytes)

4. Tag bit size = Total Address Size − Cache Index size − Offset bit size

5. Memory Address consists of Tag bits, Index bits, Offset bits [Order is fixed]

6. Block Address = Memory Address / Block Size in bytes [Always take the floor value of this answer.]

7. AMAT = Hit time + Miss rate × Miss penalty

8. Hit Ratio = Total Hits / Total Accesses = 1 − Miss Ratio

9. Miss Ratio = Total Misses / Total Accesses = 1 − Hit Ratio

