
Memory Design

Key References:
• Computer Organization and Design: The Hardware/Software Interface – Patterson and Hennessy
• Digital Design and Computer Architecture – Harris and Harris
A Computing System
• Three key components
• Computation
• Communication
• Storage/memory

Memory (Programmer’s View)

Memory Abstraction: Virtual vs. Physical
• Programmer sees virtual memory [in general-purpose machines]
• Can assume the memory is "infinite"

• Reality: Physical memory size is much smaller than what the programmer assumes

• Who helps? The system (system software + hardware, cooperatively) maps virtual memory addresses to physical memory
• The system automatically manages the physical memory space transparently to the programmer
+ Programmer does not need to know the physical size of memory nor manage it → A small physical memory can appear as a huge one to the programmer.
Memory Abstraction:
• We need to automatically manage the small physical memory so that it appears to the programmer as a much larger level of storage.
Ideal Memory Conditions

• Instruction supply:
- Zero-latency access
- Infinite capacity
- Zero cost
- Infinite bandwidth

• Pipeline (instruction execution):
- No pipeline stalls
- Perfect data flow (register/memory dependencies)
- Zero-cycle interconnect (operand communication)
- Perfect control flow
- Enough functional units
- Zero-latency compute

• Data supply:
- Zero-latency access
- Infinite capacity
- Infinite bandwidth
- Zero cost
Methods to Store Data?
• Flip-Flops (or Latches)
• Very fast, parallel access
• Very expensive (one bit costs tens of transistors)
• Static RAM (SRAM)
• Relatively fast, but only one data word at a time
• Expensive (one bit costs 6 transistors); used for cache memories
• Dynamic RAM (DRAM)
• Slower, one data word at a time; reading destroys the content (needs refresh); needs a special manufacturing process
• Cheap (one bit costs only one transistor plus one capacitor)
• Other storage technology (flash memory, hard disk, tape)
• Much slower, access takes a long time, non-volatile
• Very cheap (no transistors directly involved)
Building Larger Memories
• Requires larger memory arrays; large → slow
• How do we make the memory large without making it very slow?

• Idea: Divide the memory into smaller arrays and interconnect the arrays to input/output buses.
• Interleaving (banking)
• Goal: Reduce the latency of memory array access and enable multiple accesses in parallel
• Task: Divide a large array into multiple banks that can be accessed independently (in the same cycle / in consecutive cycles)
• Each bank is smaller than the entire memory storage
• Accesses to different banks can be overlapped (a simple address-to-bank sketch follows)
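A minimal Python sketch, not from the slides, of how a block-interleaved layout could assign addresses to banks; the bank count and block size are illustrative assumptions.

```python
# Illustrative sketch: mapping an address to (bank, row) in a block-interleaved memory.
NUM_BANKS = 4        # assumption: power of two, so simple arithmetic suffices
BLOCK_SIZE = 8       # assumption: bytes per block

def bank_of(address: int) -> tuple[int, int]:
    """Return (bank index, row within that bank) for a block-interleaved layout."""
    block = address // BLOCK_SIZE        # which block the byte falls in
    bank = block % NUM_BANKS             # consecutive blocks go to consecutive banks
    row_in_bank = block // NUM_BANKS     # position of the block inside the selected bank
    return bank, row_in_bank

# Consecutive blocks land in different banks, so their accesses can be overlapped.
for addr in (0, 8, 16, 24, 32):
    print(addr, bank_of(addr))
```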

DRAM vs. SRAM
• DRAM
• Slower access (capacitor)
• Higher density (1T 1C cell)
• Lower cost
• Requires refresh (power, performance, circuitry)
• Manufacturing requires putting capacitor and logic together

• SRAM
• Faster access (no capacitor)
• Lower density (6T cell)
• Higher cost
• No need for refresh
• Manufacturing compatible with logic process (no capacitor)
The Memory Hierarchy: Ideal Memory
• Zero access time (latency)
• Infinite capacity
• Zero cost
• Infinite bandwidth (to support multiple accesses in parallel)

• Observe: different memories have different properties, each away from the ideal. So we need to do something better and different.

The Problem
• Ideal memory's requirements oppose each other

• Bigger is slower
• Bigger → takes longer to determine the location

• Faster is more expensive
• Memory technology: SRAM → DRAM → Disk → Tape

• Higher bandwidth is more expensive
• Need more banks, more ports, higher frequency, or faster technology
The Problem
• Bigger is slower
• SRAM, 512 Bytes, sub-nanosec
• SRAM, KByte~MByte, ~nanosec
• DRAM, Gigabyte, ~50 nanosec
• Hard Disk, Terabyte, ~10 millisec

• Faster is more expensive (dollars and chip area)
• SRAM, < $10 per Megabyte
• DRAM, < $1 per Megabyte
• Hard Disk, < $1 per Gigabyte
• Flash memory and others…
Why Memory Hierarchy?
• Yet, we want memory that is both fast and large

• Unfortunately, we cannot achieve both with a single level of memory

• So, what now?

• Have multiple levels of storage (progressively bigger and slower as the levels get farther from the processor) and ensure most of the data the processor needs is kept in the fast(er) level(s)
The Memory Hierarchy
• Move frequently needed data here: small, faster per byte, close to the processor
• Back up everything here: big but slow, cheaper per byte

**With good locality of reference, memory appears as fast and as large as possible
Memory Hierarchy
• Fundamental tradeoff
• Fast memory: small
• Large memory: slow
• Idea: Memory hierarchy

CPU (RF) ↔ Cache ↔ Main Memory (DRAM) ↔ Hard Disk

• The levels differ in latency, cost, size, and bandwidth
Locality
• Locality underlies the idea of the memory hierarchy.

• Temporal Locality: with reference to time, you tend to do the same task again in the near future.
• Temporal: A program tends to reference the same memory location many times, all within a small time frame. For example, a loop: the same instructions get executed again and again.

• Spatial Locality: with reference to space (surroundings), you tend to repeat the task in the neighborhood as well.
• Spatial: A program tends to reference a set of nearby memory locations at a time [PC, PC+4, PC+8… are all consecutive], or array/data-structure references. (The loop sketch below illustrates both.)
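An illustrative Python loop, not from the slides, showing both kinds of locality in one place.

```python
# Illustrative only: a simple reduction loop exhibiting temporal and spatial locality.
data = list(range(1024))

total = 0
for i in range(len(data)):   # temporal locality: the loop body (and `total`)
    total += data[i]         #   is reused again and again within a short time window
                             # spatial locality: data[0], data[1], data[2], ...
                             #   occupy consecutive addresses, like PC, PC+4, PC+8
print(total)
```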
Caching Basics: Temporal & Spatial Locality
• Temporal Idea: Store recently accessed data in the cache (fast memory)
• Temporal locality principle
• Recently accessed data will be accessed again in the near future

• Spatial Idea: Store addresses adjacent to the recently accessed one in the cache.
• Logically divide memory into equal-size blocks (e.g., some IBM systems used a 16 KByte cache with 64-byte blocks)
• Fetch the accessed block into the cache in its entirety
• Spatial locality principle
• Nearby data in memory will be accessed in the near future
Cache Memory
• An automatically managed memory structure, based on SRAM, that memorizes frequently used results to avoid:
• repeating the long-latency operations required to reproduce the results from scratch [i.e., paying the DRAM access latency]
Caching in a Pipelined Design
• The cache needs to be synchronized with the pipeline
• i.e., accessed in 1 cycle so that load-dependent operations do not stall
• High-frequency pipeline → cannot make the cache large (why? we don't want delay!)
• But we want a large cache AND a pipelined design
• Idea: Cache hierarchy [i.e., zoom in on the memory hierarchy] (a rough two-level AMAT sketch follows)

CPU (RF) ↔ Level 1 Cache ↔ Level 2 Cache ↔ Main Memory (DRAM)

** An L3 cache is also possible.
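A rough Python sketch, not from the slides, of why a small L1 plus a larger L2 works: average access time stays close to L1. All latencies and hit rates below are made-up illustrative numbers, and the formula assumes a miss first pays the access time of the level that missed.

```python
# Hypothetical latencies (cycles) and hit rates, chosen only for illustration.
L1_HIT, L2_HIT, DRAM = 1, 10, 100
L1_RATE, L2_RATE = 0.90, 0.95      # hit rate of each level, for accesses that reach it

# The L1 miss penalty is itself an average over L2 hits and DRAM accesses.
l2_amat = L2_RATE * L2_HIT + (1 - L2_RATE) * (L2_HIT + DRAM)
amat    = L1_RATE * L1_HIT + (1 - L1_RATE) * (L1_HIT + l2_amat)
print(f"AMAT = {amat:.2f} cycles")  # ~2.5 cycles: far closer to L1 than to DRAM
```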


A Modern Memory Hierarchy
• Register File: 32 words, sub-nsec
• L1 cache: ~32 KB, ~nsec
• L2 cache: 512 KB ~ 1 MB, many nsec
• L3 cache: …
• Main memory (DRAM): GBs, ~100 nsec
• Swap Disk: 100 GB, ~10 msec
Hit or Miss??
• If the processor requests data that is available in the cache, it is returned quickly. This is called a cache hit.

• Otherwise, the processor retrieves the data from main memory (DRAM). This is called a cache miss.

• Memory system performance metrics are the miss rate (or hit rate) and the average memory access time.
Hit or Miss??
• Average memory access time (AMAT) is the average time a processor must wait for memory per load or store instruction.
• [i.e., search the cache → then DRAM → then disk (virtual memory)?]
Caching Basics
• Block (line): Unit of storage in the cache
• Memory is logically divided into cache blocks that map to locations in the cache

• On a reference:
• HIT: If in the cache, use the cached data instead of accessing memory
• MISS: If not in the cache, bring the block into the cache
• May have to evict something else to make room

• Some important cache design decisions
• Placement: where and how to place/find a block in the cache?
• Replacement: what data to remove to make room in the cache?
• Instructions/data: do we treat them separately? (i.e., do we have separate caches?)
Cache Abstraction

Address → [ Tag Store (is the address in the cache? + bookkeeping) | Data Store (stores memory blocks) ] → Hit/miss? + Data
• Cache hit rate = (# hits) / (# hits + # misses) = (# hits) / (# accesses)


• Average memory access time (AMAT)
• = ( hit-rate * hit-latency ) + ( miss-rate * miss-latency )
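A small Python sketch applying the hit-rate and AMAT formulas above; the hit/miss counts and latencies are illustrative numbers, not from the slides.

```python
# Illustrative numbers only.
hits, misses = 950, 50
hit_latency, miss_latency = 1, 60     # cycles; miss latency = time to service a miss

hit_rate  = hits / (hits + misses)    # = (# hits) / (# accesses)
miss_rate = 1 - hit_rate

amat = hit_rate * hit_latency + miss_rate * miss_latency
print(hit_rate, amat)                 # 0.95 hit rate -> AMAT = 0.95*1 + 0.05*60 = 3.95 cycles
```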

Blocks and Addressing the Cache
• Memory is logically divided into fixed-size blocks

• Each block maps to a location in the cache, determined by the index bits in the address
• The index bits are used to index into the tag and data stores
• Example 8-bit address: tag (2 bits) | index (3 bits) | byte in block (3 bits); the index finds the block, the byte offset finds the byte

• Cache access process (a bit-slicing sketch follows):
• 1) index into the tag and data stores with the index bits of the address
• 2) check the valid bit in the tag store
• 3) compare the tag bits of the address with the tag stored in the tag store

• If a block is in the cache (cache hit), the stored tag should be valid and match the tag of the block
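A minimal Python sketch of step 1 of the access process, splitting the 8-bit address used on this slide (2-bit tag, 3-bit index, 3-bit byte-in-block).

```python
# 8-bit address: 2-bit tag | 3-bit index | 3-bit byte-in-block.
def split_address(addr: int) -> tuple[int, int, int]:
    byte_in_block = addr & 0b111          # low 3 bits: which byte within the block
    index         = (addr >> 3) & 0b111   # next 3 bits: which tag/data store entry
    tag           = (addr >> 6) & 0b11    # top 2 bits: compared against the stored tag
    return tag, index, byte_in_block

print(split_address(0b10_011_101))        # -> (2, 3, 5)
```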
Direct-Mapped Cache: Placement and Access
• Assume byte-addressable memory: 256 bytes, 8-byte blocks → 32 blocks (i.e., 256/8 = 32)
• Assume cache: 64 bytes, 8 blocks (i.e., each block is 8 bytes)
• Direct-mapped: A block can go to only one location
• Address: tag (2 bits) | index (3 bits) | byte in block (3 bits)
• The tag store holds a valid bit (V) and the tag for each cache block; the data store holds the blocks
• On an access, the index selects one entry; the stored tag is compared (=?) with the address tag, and a MUX selects the requested byte within the block, producing Hit? and Data
• Addresses with the same index contend for the same location → cause conflict misses
[Figure: main memory shown as 32 blocks, 00000 through 11111]
Direct-Mapped Caches
• Direct-mapped cache: Two blocks in memory that map to the same index in the cache cannot be present in the cache at the same time
• One index → one entry

• Can lead to a 0% hit rate if more than one block accessed in an interleaved manner maps to the same index
• Assume addresses A and B have the same index bits but different tag bits
• A, B, A, B, A, B, A, B, … → conflict in the cache index [the rest of the cache is completely wasted]
• All accesses are conflict misses (a small simulation of this pattern follows)

• Summary: when two recently accessed addresses map to the same cache block, a conflict occurs, and the most recently accessed address evicts the previous one from the block.
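A Python sketch, not from the slides, of a tiny direct-mapped tag store that replays the A, B, A, B pattern; the 8-entry, 8-byte-block geometry follows the earlier example.

```python
# Tiny direct-mapped tag store (8 entries, 8-byte blocks); sketch only.
NUM_BLOCKS, BLOCK_BITS, INDEX_BITS = 8, 3, 3
tags = [None] * NUM_BLOCKS            # None = invalid entry

def access(addr: int) -> bool:
    index = (addr >> BLOCK_BITS) & (NUM_BLOCKS - 1)
    tag   = addr >> (BLOCK_BITS + INDEX_BITS)
    hit = tags[index] == tag
    if not hit:
        tags[index] = tag             # the new block evicts whatever was there
    return hit

A, B = 0b00_000_000, 0b01_000_000     # same index (000), different tags
print([access(x) for x in (A, B, A, B, A, B)])   # all False: every access is a conflict miss
```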
• The two least significant bits of the 32-bit address are called the byte offset.
• The next three bits are called the set bits; they indicate which cache set the address maps to.
• The remaining 27 bits form the tag, which identifies the memory address of the data stored in a given cache set.

• A load instruction reads the specified entry from the cache and checks the tag and valid bits.
• If the tag matches the most significant 27 bits of the address and the valid bit is 1, the cache hits and the data is returned to the processor. (A field-extraction sketch follows.)
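A Python sketch following the field widths described above (2-bit byte offset, 3-bit set, 27-bit tag); the cache data structure itself is a hypothetical stand-in used only to show the hit check.

```python
# Field widths from the text: 2-bit byte offset, 3-bit set, 27-bit tag (32-bit address).
def fields(addr: int) -> tuple[int, int, int]:
    byte_offset = addr & 0b11
    set_index   = (addr >> 2) & 0b111
    tag         = addr >> 5
    return tag, set_index, byte_offset

# Hypothetical cache state: one (valid, tag, data) entry per set (8 sets).
cache = [{"valid": False, "tag": 0, "data": 0} for _ in range(8)]

def load(addr: int):
    tag, s, _ = fields(addr)
    entry = cache[s]
    if entry["valid"] and entry["tag"] == tag:   # tag matches and the valid bit is 1
        return entry["data"]                     # cache hit
    return None                                  # cache miss: go to main memory
```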
• Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words, and assume that the main memory is addressable by a 16-bit address. The main memory has 64K words, which we will view as 4K blocks of 16 words each.

• Direct Mapping:
• In this technique, block j of the main memory maps onto block j mod 128 of the cache.
• Whenever one of the main memory blocks 0, 128, 256, … is loaded into the cache, it is stored in cache block 0. Blocks 1, 129, 257, … are stored in cache block 1, and so on.

• Now there is contention! How do we handle it?

• The memory address can be divided into three fields (see the sketch below):
• Tag (5 bits): 4096/128 = 32 (which of the 32 candidate memory blocks occupies the cache block)
• Block (7 bits): the cache has 128 blocks
• Word (4 bits): each block has 16 words (block offset)
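A Python sketch reproducing the field arithmetic of this example: a 16-bit address split into a 5-bit tag, 7-bit block, and 4-bit word field, with block j of memory mapping to cache block j mod 128.

```python
# Direct mapping for the example above: 16-bit address = 5-bit tag | 7-bit block | 4-bit word.
CACHE_BLOCKS = 128

def direct_map(mem_block_j: int) -> int:
    return mem_block_j % CACHE_BLOCKS       # block j of memory -> cache block j mod 128

def fields(addr16: int) -> tuple[int, int, int]:
    word  = addr16 & 0xF            # 4 bits: word within the 16-word block
    block = (addr16 >> 4) & 0x7F    # 7 bits: which of the 128 cache blocks
    tag   = addr16 >> 11            # 5 bits: which of the 4096/128 = 32 candidate memory blocks
    return tag, block, word

print(direct_map(0), direct_map(128), direct_map(256))   # 0, 0, 0 -> they contend for cache block 0
```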
Associative Mapping
• The most flexible mapping method, in which a main memory block can be placed into any cache block position.
• So we only need to worry about the word offset; the rest of the memory address is the tag itself → simple! (A lookup sketch follows.)
• 12 bits: tag (identifies one of the 4096 main memory blocks)
• 4 bits: which word in a block (16 words per block)

• More efficient use of the space in the cache.
• When a new block is brought into the cache, it replaces an existing block only if the cache is full → we need an algorithm to select the block to be replaced [not discussed here].

• The complexity of an associative cache is higher than that of a direct-mapped cache, because we need to search all tag patterns to determine whether a given block is in the cache.
• We do this searching of tags in parallel → but it is expensive.

• Done with a Content-Addressable Memory (CAM): it compares input search data against a table of stored data and returns the address of the matching data.
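A Python sketch, not from the slides, that mimics in software what the CAM does in hardware: in hardware all tag comparisons happen in parallel, here they are simply scanned. The cache structure is an illustrative assumption; field widths (12-bit tag, 4-bit word offset) follow the example above.

```python
# Software stand-in for the CAM lookup in a fully associative cache.
cache = []   # list of entries: {"tag": block_number, "data": [... 16 words ...]}

def lookup(addr16: int):
    tag  = addr16 >> 4               # the whole 12-bit block number is the tag
    word = addr16 & 0xF
    for entry in cache:              # CAM: in hardware, all comparisons happen at once
        if entry["tag"] == tag:
            return entry["data"][word]   # hit: a matching entry was found
    return None                      # miss: the block may be placed in ANY cache block
```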
Set-Associative Mapping
• It uses a combination of the direct- and associative-mapping techniques.
• The blocks of the cache are grouped into sets, and the mapping allows a block of the main memory to reside in any block of a specific set.
• Less contention than in the direct method: there are a few choices for block placement.
• Hardware cost is reduced (compared to fully associative) by decreasing the size of the associative search.

• Example: a cache with two blocks per set.
• In this case, memory blocks 0, 64, 128, …, 4032 map into cache set 0, and they can occupy either of the two block positions within this set.

• The 16-bit address is divided as follows (see the sketch below):
• 6 bits (set): 64 sets
• 4 bits: word offset
• 6 bits (tag): 4096/64 sets = 64 → compared with the tags of the two blocks of the set
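A Python sketch of a 2-way set-associative lookup for this example (6-bit tag, 6-bit set, 4-bit word); the cache data structure is an illustrative assumption.

```python
# 2-way set-associative lookup: 16-bit address = 6-bit tag | 6-bit set | 4-bit word.
SETS, WAYS = 64, 2
cache = [[{"valid": False, "tag": 0} for _ in range(WAYS)] for _ in range(SETS)]

def lookup(addr16: int) -> bool:
    word = addr16 & 0xF
    s    = (addr16 >> 4) & 0x3F      # 6 set bits select one of the 64 sets
    tag  = addr16 >> 10              # 6 tag bits, compared against BOTH blocks in the set
    return any(way["valid"] and way["tag"] == tag for way in cache[s])

# Memory blocks 0, 64, 128, ... all share set 0,
# but each can occupy either of its two block positions.
```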
• Further Reading:
• The number of blocks per set is a parameter that can be selected to suit the requirements of a particular computer.
• For example: four blocks per set can be accommodated by a 5-bit set field (128/4 = 32 sets), eight blocks per set by a 4-bit set field, and so on.
• The extreme condition of 128 blocks per set requires no set bits and corresponds to the fully associative technique, with 12 tag bits.
• The other extreme of one block per set is the direct-mapping method.
• A cache that has k blocks per set is referred to as a k-way set-associative cache. (We studied the 2-way case; the calculation below tabulates the field widths.)
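A small Python calculation reproducing the field widths quoted above for different degrees of associativity of the 128-block cache with 4096 memory blocks.

```python
import math

TOTAL_CACHE_BLOCKS = 128
BLOCK_NUMBER_BITS  = 12            # 4096 memory blocks -> 12-bit block number

for k in (1, 2, 4, 8, 128):        # blocks per set (k-way)
    sets     = TOTAL_CACHE_BLOCKS // k
    set_bits = int(math.log2(sets))
    tag_bits = BLOCK_NUMBER_BITS - set_bits
    print(f"{k:3}-way: {sets:3} sets, {set_bits} set bits, {tag_bits} tag bits")
# 1-way -> direct-mapped (7 set bits); 128-way -> fully associative (0 set bits, 12 tag bits)
```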
