Lecture 11 CDA 3101
MEMORY HIERARCHY
When it comes to memory, there are two universally desirable properties:
• Large Size: ideally, we want to never have to worry about running out of memory.
• Speed of Access: we want the process of accessing memory to take as little time as
possible.
But we cannot optimize both of these properties at the same time. As our memory size
increases, the time to find a memory location and access it grows as well.
Fortunately, programs exhibit predictable patterns of memory access, known as locality:
• Temporal Locality – if an item is referenced, it will tend to be referenced again soon.
• Loop counters and frequently used variables are accessed over and over.
• Spatial Locality – if an item is referenced, items whose addresses are close by will
tend to be referenced soon.
• If we access the location of A[0], we will probably also be accessing A[1], A[2], etc.
• Sequential instruction access also exhibits spatial locality.
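To make this concrete, here is a minimal C sketch (added for this writeup, not from the slides): the loops walk A[0], A[1], ... in address order (spatial locality), while sum and i are reused on every iteration (temporal locality).

#include <stdio.h>

#define N 1024

int main(void) {
    static int A[N];

    /* Spatial locality: A[0], A[1], A[2], ... are touched in address order,
       so each fetched cache block supplies several consecutive elements.   */
    for (int i = 0; i < N; i++)
        A[i] = i;

    /* Temporal locality: sum and i are referenced on every iteration. */
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += A[i];

    printf("sum = %d\n", sum);
    return 0;
}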
MEMORY HIERARCHY
A memory hierarchy, consisting of multiple levels of memory with varying speed and
size, exploits these principles of locality.
• Faster memory is more expensive per bit, so we use it in smaller quantities.
• Slower memory is much cheaper so we can afford to use a lot of it.
The goal is to, whenever possible, keep references in the fastest memory. However,
we also want to minimize our overall memory cost.
MEMORY HIERARCHY
[Figure: the memory hierarchy – levels closest to the processor are fastest and smallest; speed, size, cost ($/bit), and technology (e.g., DRAM for main memory) vary from level to level.]
CACHES
A reference to X_n causes a miss, which forces the cache to fetch X_n from some lower
level of the memory hierarchy, presumably main memory.
Two Questions:
1. How do we know if an item is
present in the cache?
2. How do we find the item in the
cache?
Each cache entry needs a tag that identifies which block it currently holds. What is an obvious choice for the tag? The upper bits of the address of the block!
TAGS
For instance, in this particular example,
let’s say the block at address 01101 is
held in the cache entry with index 101.
Note that initially the valid-bit entries are all ‘N’ for not valid.
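In code terms, extracting the index and tag is just bit masking and shifting. A minimal C sketch using this slide's 5-bit block address (the variable names are mine):

#include <stdio.h>

int main(void) {
    unsigned addr = 0x0D;        /* the block address 01101 from the slide      */
    unsigned idx  = addr & 0x7;  /* low 3 bits select 1 of 8 entries: 101 = 5   */
    unsigned tag  = addr >> 3;   /* remaining upper bits are the tag: 01 = 1    */
    printf("index = %u, tag = %u\n", idx, tag);
    return 0;
}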
EXERCISE
The first reference is for the block at address 22, which uses
the lower bits 110 to index into the cache. The 110 cache
entry is not valid so this is a miss.
We know that 16 KB is 4K words, which is 2^12 words, and, with a block size of 4
words (2^2), 2^10 blocks.
Each block contains 4 words, or 128 bits, of data. Each block also has a tag that is
32 - 10 - 2 - 2 = 18 bits long, as well as one valid bit. Therefore, the total cache size is
2^10 × (128 + 18 + 1) = 2^10 × 147 = 147 Kbits, or about 18.4 KB for a cache holding 16 KB of data.
An n-way set-associative cache consists of some number of sets, each containing n
blocks. A block address maps to exactly one set, within which the block can be
placed in any of the n entries.
To find a reference in a set-associative cache, we figure out its set based on the
address and then search all of the entries in the set.
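A minimal C sketch of that lookup, assuming 64 sets and n = 2 (the struct layout and all names are illustrative, not a prescribed design):

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 64
#define NUM_WAYS 2               /* n = 2: a 2-way set-associative cache */

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_SETS][NUM_WAYS];

/* The index selects one set; every entry (way) in that set is then searched. */
static bool lookup(uint32_t block_addr) {
    uint32_t set = block_addr % NUM_SETS;   /* which set the block maps to       */
    uint32_t tag = block_addr / NUM_SETS;   /* identifies the block within a set */
    for (int way = 0; way < NUM_WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;                    /* hit in one of the n entries */
    return false;                           /* miss */
}

int main(void) {
    uint32_t block = 12;
    cache[block % NUM_SETS][0] = (struct line){ true, block / NUM_SETS };
    return lookup(block) ? 0 : 1;           /* exits 0: the lookup hits */
}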
SET-ASSOCIATIVE CACHE
The example below has a reference with a block address of 12, and each cache
organization has 8 entries. In a direct-mapped cache, block 12 can go only in entry
12 mod 8 = 4; in a 2-way set-associative cache (4 sets), it maps to set 12 mod 4 = 0
and can go in either entry of that set; in a fully associative cache, it can go in any
of the 8 entries.
Let’s say, in the MEM stage of a store word instruction, we write to the data cache.
Then, main memory and data cache will have different values for that particular
block. In this case, they are said to be inconsistent.
There are two solutions to this issue. The method we use becomes our write policy.
• Write-through
• Write-back
WRITE POLICIES
• Write-through
• Always write data into both the cache and main memory (or the next lower level).
• Easily implemented.
• Could slow down the processor → use a write buffer to allow the processor to continue executing
while the data is written to memory.
• Cache and memory are always consistent.
• Write-back
• Only write the data to the cache block.
• The updated block is only written back to memory when it is replaced by another block.
• A dirty bit is used to indicate whether the block needs to be written or not.
• Reduces accesses to the next lower level.
• Write allocate
• The block is loaded into the cache on a write miss, and the write then updates the cached copy.
• Typically used with write-back.
• No-write allocate
• The block is not loaded into the cache on a write miss.
• The block is simply updated in main memory.
• Typically used with write-through.
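To make the combinations concrete, here is a hedged C sketch of a write handler for a toy one-line cache; the policies are from these slides, but every name and the counting scheme are invented for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum policy { WRITE_THROUGH, WRITE_BACK };

struct line { bool valid, dirty; uint32_t tag; };
static struct line way;          /* a one-line "cache" keeps the sketch short */
static int mem_writes;           /* writes that reach the next lower level    */

static void handle_write(uint32_t tag, enum policy p, bool write_allocate) {
    if (way.valid && way.tag == tag) {               /* write hit               */
        if (p == WRITE_THROUGH) mem_writes++;        /* update cache and memory */
        else way.dirty = true;                       /* write-back: mark dirty  */
    } else if (write_allocate) {                     /* write miss, allocate    */
        if (way.valid && way.dirty) mem_writes++;    /* write back dirty victim */
        way = (struct line){ true, p == WRITE_BACK, tag }; /* fetch, then write */
        if (p == WRITE_THROUGH) mem_writes++;
    } else {                                         /* no-write allocate       */
        mem_writes++;                                /* update main memory only */
    }
}

int main(void) {
    handle_write(1, WRITE_THROUGH, false);  /* miss: memory written, no fill    */
    handle_write(1, WRITE_THROUGH, false);  /* misses again: block never cached */
    printf("memory writes = %d\n", mem_writes);      /* prints 2 */
    return 0;
}

Pairing write-back with write allocate lets repeated writes to the same block hit in the cache, while write-through with no-write allocate keeps memory consistent at the cost of a memory access per write.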
WRITE-THROUGH, NO-WRITE ALLOCATE
Assume a 2-way set-associative cache with 64 cache sets, 4 words per block, and an
LRU replacement policy. Fill in the appropriate information for the following memory
references.
R/W   Addr   Tag   Index   Offset   Result   Memref?   Update Cache?
W     300    0     18      12       Miss     Yes       No
R     304    0     19      0        Miss     Yes       Yes
R     4404   4     19      4        Miss     Yes       Yes
W     4408   4     19      8        Hit      Yes       Yes
W     8496   8     19      0        Miss     Yes       No
R     8500   8     19      4        Miss     Yes       Yes
R     304    0     19      0        Miss     Yes       Yes
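The Tag/Index/Offset columns can be checked mechanically: 4 words per block means a 4-bit byte offset, and 64 sets mean a 6-bit index. A small C program (written for this writeup) reproduces the columns above:

#include <stdio.h>

int main(void) {
    /* 4 words/block = 16 bytes -> 4 offset bits; 64 sets -> 6 index bits */
    unsigned addrs[] = { 300, 304, 4404, 4408, 8496, 8500, 304 };
    for (size_t i = 0; i < sizeof addrs / sizeof addrs[0]; i++) {
        unsigned a      = addrs[i];
        unsigned offset = a & 0xF;           /* low 4 bits         */
        unsigned index  = (a >> 4) & 0x3F;   /* next 6 bits        */
        unsigned tag    = a >> 10;           /* all remaining bits */
        printf("%5u: tag=%u index=%u offset=%u\n", a, tag, index, offset);
    }
    return 0;
}

Running it prints 300: tag=0 index=18 offset=12 for the first reference, matching the table.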
WRITE-BACK, WRITE ALLOCATE
Assume a 2-way set-associative cache with 64 cache sets, 4 words per block, and an
LRU replacement policy. Fill in the appropriate information for the following memory
references.
R/W   Addr   Tag   Index   Offset   Result   Memref?   Update Cache?
W     300    0     18      12       Miss     Yes       Yes
R     304    0     19      0        Miss     Yes       Yes
R     4404   4     19      4        Miss     Yes       Yes
W     4408   4     19      8        Hit      No        Yes
W     8496   8     19      0        Miss     Yes       Yes
R     8500   8     19      4        Hit      No        No
R     304    0     19      0        Miss     Yes (2)   Yes
Note: "Yes (2)" marks two memory accesses: the LRU victim (tag 4) is dirty from the earlier write to 4408, so it must be written back before the new block is fetched.
CACHE MISSES
Let’s consider the effect of cache misses for instructions. Assume our miss penalty is 10
cycles and the miss rate is 0.10. In the ideal pipeline, one instruction enters the IF stage
each cycle:

Cycle   1    2    3    4    5    6    7    8
inst1   IF   ID   EX   MEM  WB
inst2        IF   ID   EX   MEM  WB
inst3             IF   ID   EX   MEM  WB
inst4                  IF   ID   EX   MEM  WB

An instruction-cache miss stalls the fetch stage for the full miss penalty, so on average
each instruction fetch takes 1 + 0.10 ∗ 10 = 2 cycles.
MEMORY HIERARCHY MISSES
Not all misses are equal. We can categorize them in the following way:
• Compulsory Misses
• Caused by the first access to a block.
• Can possibly be decreased by increasing the block size.
• Capacity Misses
• Caused when a memory level cannot contain all of the blocks needed during the execution of a process.
• Can be decreased by increasing the cache size.
• Conflict Misses
• Occur when too many blocks compete for the same entry in the cache.
• Can be decreased by increasing associativity.
CRITICAL WORD FIRST AND EARLY RESTART
One way to reduce the miss penalty is to reduce the time spent waiting for the actual
requested word, rather than for the whole block of data.
Critical word first means to request the missed word first from the next memory
hierarchy level to allow the processor to continue while filling in the remaining words
in the block, usually in a wrap-around fill manner.
Early restart means to fetch the words in the normal order, but allow the processor to
continue once the requested word arrives.
MULTILEVEL CACHES
Three levels of cache all on the same chip are now common, where there are
separate L1 instruction and data caches and unified L2 and L3 caches.
• The L1 cache is typically much smaller than the L2 cache, with lower associativity, to
provide faster access times. The same holds for L2 relative to L3.
• The L1 caches typically have smaller block sizes than L2 caches to keep the miss
penalty short. The same holds for L2 relative to L3.
• Because the lower cache levels are much larger and have higher associativity than the
higher levels, they suffer fewer misses, which matters because their misses carry
higher penalties.
MULTILEVEL CACHE PERFORMANCE
The miss penalty of an upper level cache is the average access time of the next lower
level cache.
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝐴𝑐𝑐𝑒𝑠𝑠 𝑇𝑖𝑚𝑒 = 𝐿1 𝐻𝑖𝑡 𝑇𝑖𝑚𝑒 + 𝐿1 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ (𝐿1 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦)
where
𝐿1 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦 = 𝐿2 𝐻𝑖𝑡 𝑇𝑖𝑚𝑒 + 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ (𝐿2 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦)
What is the average access time given that the L1 hit time is 1 cycle, the L1 miss rate
is 0.05, the L2 hit time is 4 cycles, the L2 miss rate is 0.25, and the L2 miss penalty is
50 cycles?
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝐴𝑐𝑐𝑒𝑠𝑠 𝑇𝑖𝑚𝑒 = 1 + .05 ∗ (4 + .25 ∗ 50) = 1 + .05 ∗ 16.5 = 1.825 cycles
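As a quick check, the same computation in C (a throwaway sketch; the variable names are mine):

#include <stdio.h>

int main(void) {
    double l1_hit = 1.0, l1_miss_rate = 0.05;                /* L1 parameters */
    double l2_hit = 4.0, l2_miss_rate = 0.25, l2_pen = 50.0; /* L2 parameters */
    double l1_pen = l2_hit + l2_miss_rate * l2_pen;          /* 4 + 12.5 = 16.5   */
    double amat   = l1_hit + l1_miss_rate * l1_pen;          /* 1 + 0.825 = 1.825 */
    printf("average access time = %.3f cycles\n", amat);
    return 0;
}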
MULTILEVEL CACHE PERFORMANCE
• Local Miss Rate: the fraction of references to one level of a cache that miss at that level.
Example: 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 = 𝑀𝑖𝑠𝑠𝑒𝑠 𝑖𝑛 𝐿2 / 𝐴𝑐𝑐𝑒𝑠𝑠𝑒𝑠 𝑡𝑜 𝐿2
• Global Miss Rate: the fraction of all references that miss in every level of a multilevel
cache.
Example: 𝐺𝑙𝑜𝑏𝑎𝑙 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 = 𝐿1 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ …
With the rates from the previous example, the global miss rate is 0.05 ∗ 0.25 = 0.0125.
IMPROVING CACHE PERFORMANCE
• Techniques for reducing the miss rate:
• Increase the associativity to exploit temporal locality.
• Increase the block size to exploit spatial locality.