Lecture 11 CDA 3101
MEMORY HIERARCHY
When it comes to memory, there are two universally desirable properties:
• Large Size: ideally, we want to never have to worry about running out of memory.
• Speed of Access: we want the process of accessing memory to take as little time as
possible.
But we cannot optimize both of these properties at the same time. As our memory size
increases, the time to find a memory location and access it grows as well.
Fortunately, programs exhibit predictable patterns of memory access, known as locality:
• Temporal Locality – if an item is referenced, it will tend to be referenced again soon.
• Loop counters and frequently used variables are accessed over and over.
• Spatial Locality – if an item is referenced, items whose addresses are close by will
tend to be referenced soon.
• If we access the location of A[0], we will probably also be accessing A[1], A[2], etc.
• Sequential instruction access also exhibits spatial locality.
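To make this concrete, here is a minimal C sketch (added for this writeup, not from the slides): the loops walk A[0], A[1], ... in address order (spatial locality), while sum and i are reused on every iteration (temporal locality).

#include <stdio.h>

#define N 1024

int main(void) {
    static int A[N];

    /* Spatial locality: A[0], A[1], A[2], ... are touched in address order,
       so each fetched cache block supplies several consecutive elements.   */
    for (int i = 0; i < N; i++)
        A[i] = i;

    /* Temporal locality: sum and i are referenced on every iteration. */
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += A[i];

    printf("sum = %d\n", sum);
    return 0;
}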
MEMORY HIERARCHY
A memory hierarchy, consisting of multiple levels of memory with varying speed and
size, exploits these principles of locality.
• Faster memory is more expensive per bit, so we use it in smaller quantities.
• Slower memory is much cheaper so we can afford to use a lot of it.
The goal is to, whenever possible, keep references in the fastest memory. However,
we also want to minimize our overall memory cost.
MEMORY HIERARCHY
[Figure: the memory hierarchy – levels closest to the processor are fastest and smallest; speed, size, cost ($/bit), and technology (e.g., DRAM for main memory) vary from level to level.]
CACHES
A reference to X_n causes a miss, which forces the cache to fetch X_n from some lower
level of the memory hierarchy, presumably main memory.
Two Questions:
1. How do we know if an item is
present in the cache?
2. How do we find the item in the
cache?
Each cache entry needs a tag that identifies which block it currently holds. What is an obvious choice for the tag? The upper bits of the address of the block!
TAGS
For instance, in this particular example,
let’s say the block at address 01101 is
held in the cache entry with index 101.
Note that initially the valid-bit entries are all ‘N’ for not valid.
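In code terms, extracting the index and tag is just bit masking and shifting. A minimal C sketch using this slide's 5-bit block address (the variable names are mine):

#include <stdio.h>

int main(void) {
    unsigned addr = 0x0D;        /* the block address 01101 from the slide      */
    unsigned idx  = addr & 0x7;  /* low 3 bits select 1 of 8 entries: 101 = 5   */
    unsigned tag  = addr >> 3;   /* remaining upper bits are the tag: 01 = 1    */
    printf("index = %u, tag = %u\n", idx, tag);
    return 0;
}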
EXERCISE
The first reference is for the block at address 22, which uses
the lower bits 110 to index into the cache. The 110 cache
entry is not valid so this is a miss.
We know that 16 KB is 4K words, which is 2^12 words, and, with a block size of 4
words (2^2), 2^10 blocks.
Each block contains 4 words, or 128 bits, of data. Each block also has a tag that is
32 - 10 - 2 - 2 = 18 bits long, as well as one valid bit. Therefore, the total cache size is
2^10 × (128 + 18 + 1) = 2^10 × 147 = 147 Kbits, or about 18.4 KB for a cache holding 16 KB of data.
An n-way set-associative cache consists of some number of sets, each containing n
blocks. A block address maps to exactly one set, within which the block can be
placed in any of the n entries.
To find a reference in a set-associative cache, we figure out its set based on the
address and then search all of the entries in the set.
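A minimal C sketch of that lookup, assuming 64 sets and n = 2 (the struct layout and all names are illustrative, not a prescribed design):

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 64
#define NUM_WAYS 2               /* n = 2: a 2-way set-associative cache */

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_SETS][NUM_WAYS];

/* The index selects one set; every entry (way) in that set is then searched. */
static bool lookup(uint32_t block_addr) {
    uint32_t set = block_addr % NUM_SETS;   /* which set the block maps to       */
    uint32_t tag = block_addr / NUM_SETS;   /* identifies the block within a set */
    for (int way = 0; way < NUM_WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;                    /* hit in one of the n entries */
    return false;                           /* miss */
}

int main(void) {
    uint32_t block = 12;
    cache[block % NUM_SETS][0] = (struct line){ true, block / NUM_SETS };
    return lookup(block) ? 0 : 1;           /* exits 0: the lookup hits */
}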
SET-ASSOCIATIVE CACHE
The example below has a reference with a block address of 12, and each cache
organization has 8 entries. In a direct-mapped cache, block 12 can go only in entry
12 mod 8 = 4; in a 2-way set-associative cache (4 sets), it maps to set 12 mod 4 = 0
and can go in either entry of that set; in a fully associative cache, it can go in any
of the 8 entries.
Let’s say, in the MEM stage of a store word instruction, we write to the data cache.
Then, main memory and data cache will have different values for that particular
block. In this case, they are said to be inconsistent.
There are two solutions to this issue. The method we use becomes our write policy.
• Write-through
• Write-back
WRITE POLICIES
• Write-through
• Always write data into both the cache and main memory (or the next lower level).
• Easily implemented.
• Could slow down the processor → use a write buffer to allow the processor to continue executing
while the data is written to memory.
• Cache and memory are always consistent.
• Write-back
• Only write the data to the cache block.
• The updated block is only written back to memory when it is replaced by another block.
• A dirty bit is used to indicate whether the block needs to be written or not.
• Reduces accesses to the next lower level.
• Write allocate
• The block is loaded into the cache on a write miss, and the write then updates the cached copy.
• Typically used with write-back.
• No-write allocate
• The block is not loaded into the cache on a write miss.
• The block is simply updated in main memory.
• Typically used with write-through.
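To make the combinations concrete, here is a hedged C sketch of a write handler for a toy one-line cache; the policies are from these slides, but every name and the counting scheme are invented for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum policy { WRITE_THROUGH, WRITE_BACK };

struct line { bool valid, dirty; uint32_t tag; };
static struct line way;          /* a one-line "cache" keeps the sketch short */
static int mem_writes;           /* writes that reach the next lower level    */

static void handle_write(uint32_t tag, enum policy p, bool write_allocate) {
    if (way.valid && way.tag == tag) {               /* write hit               */
        if (p == WRITE_THROUGH) mem_writes++;        /* update cache and memory */
        else way.dirty = true;                       /* write-back: mark dirty  */
    } else if (write_allocate) {                     /* write miss, allocate    */
        if (way.valid && way.dirty) mem_writes++;    /* write back dirty victim */
        way = (struct line){ true, p == WRITE_BACK, tag }; /* fetch, then write */
        if (p == WRITE_THROUGH) mem_writes++;
    } else {                                         /* no-write allocate       */
        mem_writes++;                                /* update main memory only */
    }
}

int main(void) {
    handle_write(1, WRITE_THROUGH, false);  /* miss: memory written, no fill    */
    handle_write(1, WRITE_THROUGH, false);  /* misses again: block never cached */
    printf("memory writes = %d\n", mem_writes);      /* prints 2 */
    return 0;
}

Pairing write-back with write allocate lets repeated writes to the same block hit in the cache, while write-through with no-write allocate keeps memory consistent at the cost of a memory access per write.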
WRITE-THROUGH, NO-WRITE ALLOCATE
Assume a 2-way set-associative cache with 64 cache sets, 4 words per block, and an
LRU replacement policy. Fill in the appropriate information for the following memory
references.
R/W   Addr   Tag   Index   Offset   Result   Memref?   Update Cache?
W     300    0     18      12       Miss     Yes       No
R     304    0     19      0        Miss     Yes       Yes
R     4404   4     19      4        Miss     Yes       Yes
W     4408   4     19      8        Hit      Yes       Yes
W     8496   8     19      0        Miss     Yes       No
R     8500   8     19      4        Miss     Yes       Yes
R     304    0     19      0        Miss     Yes       Yes
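The Tag/Index/Offset columns can be checked mechanically: 4 words per block means a 4-bit byte offset, and 64 sets mean a 6-bit index. A small C program (written for this writeup) reproduces the columns above:

#include <stdio.h>

int main(void) {
    /* 4 words/block = 16 bytes -> 4 offset bits; 64 sets -> 6 index bits */
    unsigned addrs[] = { 300, 304, 4404, 4408, 8496, 8500, 304 };
    for (size_t i = 0; i < sizeof addrs / sizeof addrs[0]; i++) {
        unsigned a      = addrs[i];
        unsigned offset = a & 0xF;           /* low 4 bits         */
        unsigned index  = (a >> 4) & 0x3F;   /* next 6 bits        */
        unsigned tag    = a >> 10;           /* all remaining bits */
        printf("%5u: tag=%u index=%u offset=%u\n", a, tag, index, offset);
    }
    return 0;
}

Running it prints 300: tag=0 index=18 offset=12 for the first reference, matching the table.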
WRITE-BACK, WRITE ALLOCATE
Assume a 2-way set-associative cache with 64 cache sets, 4 words per block, and an
LRU replacement policy. Fill in the appropriate information for the following memory
references.
R/W   Addr   Tag   Index   Offset   Result   Memref?   Update Cache?
W     300    0     18      12       Miss     Yes       Yes
R     304    0     19      0        Miss     Yes       Yes
R     4404   4     19      4        Miss     Yes       Yes
W     4408   4     19      8        Hit      No        Yes
W     8496   8     19      0        Miss     Yes       Yes
R     8500   8     19      4        Hit      No        No
R     304    0     19      0        Miss     Yes (2)   Yes
Note: "Yes (2)" marks two memory accesses: the LRU victim (tag 4) is dirty from the earlier write to 4408, so it must be written back before the new block is fetched.
CACHE MISSES
Let’s consider the effect of cache misses for instructions. Assume our miss penalty is 10
cycles and the miss rate is 0.10. In the ideal pipeline, one instruction enters the IF stage
each cycle:

Cycle   1    2    3    4    5    6    7    8
inst1   IF   ID   EX   MEM  WB
inst2        IF   ID   EX   MEM  WB
inst3             IF   ID   EX   MEM  WB
inst4                  IF   ID   EX   MEM  WB

An instruction-cache miss stalls the fetch stage for the full miss penalty, so on average
each instruction fetch takes 1 + 0.10 ∗ 10 = 2 cycles.
MEMORY HIERARCHY MISSES
Not all misses are equal. We can categorize them in the following way:
• Compulsory Misses
• Caused by the first access to a block.
• Can possibly be decreased by increasing the block size.
• Capacity Misses
• Caused when a memory level cannot contain all of the blocks needed during the execution of a process.
• Can be decreased by increasing the cache size.
• Conflict Misses
• Occur when too many blocks compete for the same entry in the cache.
• Can be decreased by increasing associativity.
CRITICAL WORD FIRST AND EARLY RESTART
One way to reduce the miss penalty is to reduce the time spent waiting for the actual
requested word, rather than for the whole block of data.
Critical word first means to request the missed word first from the next memory
hierarchy level to allow the processor to continue while filling in the remaining words
in the block, usually in a wrap-around fill manner.
Early restart means to fetch the words in the normal order, but allow the processor to
continue once the requested word arrives.
MULTILEVEL CACHES
Three levels of cache all on the same chip are now common, where there are
separate L1 instruction and data caches and unified L2 and L3 caches.
• The L1 cache is typically much smaller than the L2 cache, with lower associativity, to
provide faster access times. The same holds for L2 relative to L3.
• The L1 caches typically have smaller block sizes than L2 caches to keep the miss
penalty short. The same holds for L2 relative to L3.
• Because the lower cache levels are much larger and have higher associativity than the
higher levels, they suffer fewer misses, which matters because their misses carry
higher penalties.
MULTILEVEL CACHE PERFORMANCE
The miss penalty of an upper level cache is the average access time of the next lower
level cache.
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝐴𝑐𝑐𝑒𝑠𝑠 𝑇𝑖𝑚𝑒 = 𝐿1 𝐻𝑖𝑡 𝑇𝑖𝑚𝑒 + 𝐿1 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ (𝐿1 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦)
where
𝐿1 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦 = 𝐿2 𝐻𝑖𝑡 𝑇𝑖𝑚𝑒 + 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ (𝐿2 𝑀𝑖𝑠𝑠 𝑃𝑒𝑛𝑎𝑙𝑡𝑦)
What is the average access time given that the L1 hit time is 1 cycle, the L1 miss rate
is 0.05, the L2 hit time is 4 cycles, the L2 miss rate is 0.25, and the L2 miss penalty is
50 cycles?
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝐴𝑐𝑐𝑒𝑠𝑠 𝑇𝑖𝑚𝑒 = 1 + .05 ∗ (4 + .25 ∗ 50) = 1 + .05 ∗ 16.5 = 1.825 cycles
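As a quick check, the same computation in C (a throwaway sketch; the variable names are mine):

#include <stdio.h>

int main(void) {
    double l1_hit = 1.0, l1_miss_rate = 0.05;                /* L1 parameters */
    double l2_hit = 4.0, l2_miss_rate = 0.25, l2_pen = 50.0; /* L2 parameters */
    double l1_pen = l2_hit + l2_miss_rate * l2_pen;          /* 4 + 12.5 = 16.5   */
    double amat   = l1_hit + l1_miss_rate * l1_pen;          /* 1 + 0.825 = 1.825 */
    printf("average access time = %.3f cycles\n", amat);
    return 0;
}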
MULTILEVEL CACHE PERFORMANCE
• Local Miss Rate: the fraction of references to one level of a cache that miss at that level.
Example: 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 = 𝑀𝑖𝑠𝑠𝑒𝑠 𝑖𝑛 𝐿2 / 𝐴𝑐𝑐𝑒𝑠𝑠𝑒𝑠 𝑡𝑜 𝐿2
• Global Miss Rate: the fraction of all references that miss in every level of a multilevel
cache.
Example: 𝐺𝑙𝑜𝑏𝑎𝑙 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 = 𝐿1 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ 𝐿2 𝑀𝑖𝑠𝑠 𝑅𝑎𝑡𝑒 ∗ …
With the rates from the previous example, the global miss rate is 0.05 ∗ 0.25 = 0.0125.
IMPROVING CACHE PERFORMANCE
• Techniques for reducing the miss rate:
• Increase the associativity to exploit temporal locality.
• Increase the block size to exploit spatial locality.