11 Cache Memory
and Organization
Chapter 8
Cache Memory
Memory Hierarchy - Diagram
Locality of Reference
• During the course of the execution of a
program, memory references tend to
cluster
• e.g. loops
Cache
• Small amount of fast memory
• Sits between normal main memory and
CPU
• May be located on CPU chip or module
Cache and Main Memory
Cache/Main Memory Structure
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
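The read sequence above can be sketched as a toy model. This is only an illustration under stated assumptions, not a description of real hardware: the class name `ToyCache`, the 4-word block size, the placement rule (block number modulo number of slots), and the use of the whole block number as the tag are all choices made for brevity.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)


class ToyCache:
    """Toy read-only cache: each slot holds a tag plus one block of data."""

    def __init__(self, main_memory, num_slots=4):
        self.memory = main_memory
        self.num_slots = num_slots
        self.tags = [None] * num_slots    # which memory block each slot holds
        self.blocks = [None] * num_slots  # the cached data for each slot
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // BLOCK_SIZE
        offset = address % BLOCK_SIZE
        slot = block_number % self.num_slots  # assumed placement rule
        if self.tags[slot] == block_number:
            self.hits += 1                    # present: serve from cache
        else:
            self.misses += 1                  # absent: fetch the whole block
            start = block_number * BLOCK_SIZE
            self.blocks[slot] = self.memory[start:start + BLOCK_SIZE]
            self.tags[slot] = block_number
        return self.blocks[slot][offset]      # always deliver from the cache


memory = list(range(100, 164))  # 64 words of made-up data
cache = ToyCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 0, 16)]
```

In this trace the first read of address 0 misses and loads the block, the next four reads hit, and address 16 misses because its block competes for the same slot.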
Cache Read Operation - Flowchart
Cache Design
• Addressing
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
Cache Addressing
• Where does cache sit?
— Between processor and virtual memory management
unit
— Between MMU and main memory
• Logical cache (virtual cache) stores data using
virtual addresses
— Processor accesses cache directly, not MMU
— Cache access faster, before MMU address translation
• Physical cache stores data using main memory
physical addresses
Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time
Typical Cache Organization
Comparison of Cache Sizes
Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | —
PDP-11/70 | Minicomputer | 1975 | 1 KB | — | —
VAX 11/780 | Minicomputer | 1978 | 16 KB | — | —
IBM 3033 | Mainframe | 1978 | 64 KB | — | —
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | —
Intel 80486 | PC | 1989 | 8 KB | — | —
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | —
PowerPC 601 | PC | 1993 | 32 KB | — | —
PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | —
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | —
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | —
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | —
CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | —
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | —
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | —
Mapping Function
• Cache of 64kByte
• Cache block of 4 bytes
—i.e. cache is 16k (2^14) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
—(2^24 = 16M)
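The arithmetic behind these example parameters can be checked with a few lines of Python. The variable names are illustrative only; the sizes are the ones given on this slide.

```python
cache_bytes = 64 * 1024            # 64 kByte cache
block_bytes = 4                    # 4-byte cache block
memory_bytes = 16 * 1024 * 1024    # 16 MByte main memory

num_lines = cache_bytes // block_bytes        # lines in the cache
line_bits = num_lines.bit_length() - 1        # log2 of a power of two
address_bits = memory_bytes.bit_length() - 1  # bits needed to address memory
num_blocks = memory_bytes // block_bytes      # blocks in main memory
block_bits = num_blocks.bit_length() - 1

print(num_lines, line_bits, address_bits, num_blocks, block_bits)
```

The cache holds 2^14 lines while main memory holds 2^22 blocks, so many memory blocks must share each cache line; the mapping function decides how.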
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one memory
block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
Direct Mapping Address Structure
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (=22-14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
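The 8/14/2 split can be sketched with shifts and masks. The function name `split_direct` and the sample addresses are assumptions picked for illustration; only the field widths come from the slide.

```python
WORD_BITS = 2   # selects a byte within the 4-byte block
LINE_BITS = 14  # selects one of 2^14 cache lines
TAG_BITS = 8    # remaining most significant bits


def split_direct(address):
    """Split a 24-bit address into (tag, line, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


# Two addresses that land in the same line but differ in tag:
a = split_direct(0x000004)
b = split_direct(0x010004)
# An arbitrary sample address:
c = split_direct(0x16339C)
print([hex(x) for x in c])
```

Addresses 0x000004 and 0x010004 both select line 1, so the stored tag (0 or 1) is the only way to tell which block currently occupies that line.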
Direct Mapping from Cache to Main Memory
Direct Mapping Cache Line Table
Cache line | Main memory blocks held
0 | 0, m, 2m, …, 2^s - m
1 | 1, m+1, 2m+1, …, 2^s - m + 1
…
m-1 | m-1, 2m-1, 3m-1, …, 2^s - 1
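The table is consistent with the usual direct-mapping rule that block j goes to line j mod m. A minimal sketch, assuming that rule and using a deliberately tiny memory (m = 4 lines, s = 4, so 16 blocks) so the rows can be listed in full; `blocks_for_line` is a hypothetical helper name:

```python
def blocks_for_line(line, m, s):
    """List every main memory block that maps to the given cache line."""
    return list(range(line, 2 ** s, m))


m, s = 4, 4  # toy sizes chosen for illustration
table = {line: blocks_for_line(line, m, s) for line in range(m)}
for line, blocks in table.items():
    print(line, blocks)
```

Each row ends at 2^s - m + line, matching the last entry of the corresponding table row.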
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words
or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory
= 2^(s+w)/2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
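These relations can be collected into a small helper and checked against the running example (s = 22, w = 2, r = 14). The function name and dictionary keys are illustrative, not part of the original material.

```python
def direct_mapping_summary(s, w, r):
    """Derive the direct-mapping quantities from the field widths."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "blocks_in_memory": 2 ** s,
        "cache_lines": 2 ** r,
        "tag_bits": s - r,
    }


summary = direct_mapping_summary(s=22, w=2, r=14)
for name, value in summary.items():
    print(name, value)
```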
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program repeatedly accesses 2 blocks that map to
the same line, each access evicts the other block and
the miss rate is very high
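The conflict problem is easy to reproduce in a small simulation. This sketch assumes the block-to-line rule "block number mod number of lines" and an 8-line cache; the traces are made up for illustration.

```python
def count_misses(block_trace, num_lines):
    """Count misses for a direct-mapped cache over a trace of block numbers."""
    lines = [None] * num_lines
    misses = 0
    for block in block_trace:
        line = block % num_lines
        if lines[line] != block:
            misses += 1
            lines[line] = block  # evict whatever was there
    return misses


conflicting = [0, 8] * 5  # blocks 0 and 8 both map to line 0
friendly = [0, 1] * 5     # blocks 0 and 1 map to different lines

conflict_misses = count_misses(conflicting, num_lines=8)
friendly_misses = count_misses(friendly, num_lines=8)
print(conflict_misses, friendly_misses)
```

With the conflicting trace every access misses even though seven of the eight lines are never used; the friendly trace misses only on the first touch of each block.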
Victim Cache
• Lower miss penalty
• Remember what was discarded
—Already fetched
—Use again with little penalty
• Fully associative
• 4 to 16 cache lines
• Between direct mapped L1 cache and next
memory level
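A victim cache can be sketched as a small buffer of recently evicted blocks sitting behind a direct-mapped cache. Everything specific here is an assumption for illustration: the class name, the 8-line L1, the 4-entry victim buffer, and the oldest-first replacement inside the buffer (the slide does not say which policy is used).

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped cache backed by a small fully associative victim cache."""

    def __init__(self, num_lines=8, victim_entries=4):
        self.lines = [None] * num_lines
        self.victim = OrderedDict()  # evicted blocks, oldest first
        self.victim_entries = victim_entries
        self.l1_hits = self.victim_hits = self.misses = 0

    def access(self, block):
        index = block % len(self.lines)
        resident = self.lines[index]
        if resident == block:
            self.l1_hits += 1
            return "l1"
        if block in self.victim:
            del self.victim[block]   # recovered from the victim cache
            self.victim_hits += 1
            outcome = "victim"
        else:
            self.misses += 1         # must go to the next memory level
            outcome = "miss"
        if resident is not None:
            self._remember(resident)
        self.lines[index] = block
        return outcome

    def _remember(self, block):
        self.victim[block] = True
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest entry


cache = DirectMappedWithVictim()
outcomes = [cache.access(b) for b in (0, 8, 0, 8, 0)]
print(outcomes)
```

Blocks 0 and 8 still fight over line 0, but after the two initial misses each eviction is recovered from the victim buffer rather than from the next memory level.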
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Associative Mapping from Cache to Main Memory
Fully Associative Cache Organization
Associative Mapping Address Structure
Tag 22 bit | Word 2 bit
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to
check for hit
• Least significant 2 bits of address identify which
8 bit word is required from 32 bit data block
• e.g.
— Address  Tag     Data      Cache line
— FFFFFC   3FFFFF  24682468  3FFF
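A fully associative lookup can be sketched as a tag split followed by a search over every line. The loop below is a sequential stand-in for what hardware does with parallel comparators. The function names, the two-line cache contents, and the byte order within the data block are assumptions for illustration.

```python
WORD_BITS = 2  # low 2 bits pick a byte; the other 22 bits are the tag


def split_associative(address):
    """Split a 24-bit address into (tag, word)."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(cache_lines, address):
    """Return (line_number, byte) on a hit, or None on a miss."""
    tag, word = split_associative(address)
    for line_number, (line_tag, data) in enumerate(cache_lines):
        if line_tag == tag:  # every line's tag is examined
            return line_number, data[word]
    return None


# A tiny made-up cache: (tag, 4-byte data block) per line
cache_lines = [
    (0x000000, bytes.fromhex("13579246")),
    (0x3FFFFF, bytes.fromhex("24682468")),
]
tag, word = split_associative(0xFFFFFC)
hit = lookup(cache_lines, 0xFFFFFD)
miss = lookup(cache_lines, 0x000010)
```

Because a block may sit in any line, the line number plays no part in the address; only the 22-bit tag identifies the block.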
Associative Mapping Example
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words
or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory
= 2^(s+w)/2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
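A 2-way set associative cache can be sketched as a list of small sets. Two details below are assumptions, since the slide does not specify them: the set is chosen as block number mod number of sets, and the least recently used line in a full set is the one replaced. The class name and sizes are illustrative.

```python
class SetAssociativeCache:
    """k-way set associative cache tracking block numbers only."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of resident blocks, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, block):
        """Return True on a hit, False on a miss."""
        lines = self.sets[block % self.num_sets]  # assumed set selection
        if block in lines:
            lines.remove(block)
            lines.append(block)   # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.pop(0)          # evict the least recently used (assumed)
        lines.append(block)
        return False


cache = SetAssociativeCache()
results = [cache.access(b) for b in (0, 4, 0, 4, 8, 0)]
print(results)
```

Blocks 0 and 4 map to the same set but can coexist in its two lines, so alternating between them hits; only a third block in the same set (8) forces an eviction.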
Set Associative Mapping Address Structure
Tag 9 bit | Set 13 bit | Word 2 bit
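The 9/13/2 split can be sketched the same way as the other address formats. The function name and the sample addresses are assumptions for illustration; the field widths are the ones shown above.

```python
WORD_BITS = 2  # byte within the 4-byte block
SET_BITS = 13  # selects one of 2^13 sets
TAG_BITS = 9   # remaining most significant bits


def split_set_associative(address):
    """Split a 24-bit address into (tag, set_number, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


first = split_set_associative(0xFFFFFC)
second = split_set_associative(0x16339C)
print([hex(x) for x in first], [hex(x) for x in second])
```

With 2^13 sets of 2 lines each, the cache still holds 2^14 lines; one bit has moved from the line field into the tag compared with the direct-mapped format.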