ELT3047 Computer Architecture: Lecture 11: Virtual Memory
How can we prevent each program from interfering with another's memory?
[Figure: Program 1 and Program 2 each assume a 4 GB program address space, but only 1 GB of physical memory exists.]
⚠️ Crash if we try to access an address > 0x3FFFFFFF.
⚠️ Corruption if each process can access any memory address.
Virtual Memory
- Give each process the illusion of a full memory address space.
- Parts of the program (the working set) reside in RAM; the others are on disk.
[Figure: virtual pages of Process 1 (A–D) and Process 2 (E–H) mapped either to DRAM frames or to disk.]
Virtual Address Space Illusion
[Figure: the processor (PC, registers, ALU, control, datapath, instruction cache, data cache) issues virtual addresses; memory is accessed with physical addresses. Each process sees the same layout: code at 0x00000000, then static data, heap (growing up), unused space, and stack (growing down) from 0xFFFFFFFF.]
- Processes use virtual addresses.
- Many processes run, all using the same (conflicting) addresses.
- Different processes run simultaneously by context switching.
Benefits of Virtual Memory
- Illusion of more memory than physically available: old data is moved to disk to free up RAM for active processes.
- Each process sees a contiguous block of memory in its virtual address space, making development easier.
- The memory management unit (MMU) takes care of the mapping between virtual and physical addresses.
- Protection: each process has its own virtual address space → one program cannot directly access the memory of another program.
- Virtual memory works thanks to the Principle of Locality: a program often accesses only a small portion of its address space during a period of time.
Physical Addresses
- f: frame number (f_max frames); o: frame offset (o_max bytes/frame).
- A pair (f, o) names one byte; Physical address = o_max × f + o.
- Example: o_max = 512 bytes/frame → the low 9 bits of the address are the offset, the remaining high bits the frame number.
  PA: 0000 0110 0000 0110 (bits 16–10: frame, bits 9–1: offset).

Virtual Addresses
- p: page number (p_max pages); o: page offset (o_max bytes/page).
- A pair (p, o) names one byte; Virtual address = o_max × p + o.
- Virtual addresses run from (0, 0) to (p_max − 1, o_max − 1).

Observations
- o_max is the same for both pages and frames.
- p_max and f_max in principle have no relation.
- Normally, we have more pages than frames.
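The address formula above can be checked with a short sketch (Python; reading the example bit pattern with a 9-bit offset gives frame 3, offset 6):

```python
# o_max = 512 bytes/frame -> 9 offset bits; frame number in the remaining high bits
O_MAX = 512

def physical_address(f, o):
    """Physical address = o_max * f + o."""
    assert 0 <= o < O_MAX
    return O_MAX * f + o

def split(pa):
    """Recover (frame, offset) from a physical address."""
    return pa // O_MAX, pa % O_MAX

pa = physical_address(3, 6)   # the 16-bit example: 0000 0110 0000 0110
print(hex(pa))                # 0x606
print(split(pa))              # (3, 6)
```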
Address mapping
- Each memory request requires a mapping from virtual space to physical space.
- It is possible for a virtual page to be absent from main memory (DRAM).
[Figure: pages of Program 1 and Program 2 mapped to frames scattered across main memory; a virtual page can reside at any physical address.]
Address translation: introduction
Translating a Virtual Address (VA) to a Physical Address (PA) is done by a combination of hardware and software.

Page Table
- 2^32 virtual addresses / (2^12 B/page) = 2^20 virtual page numbers → 2^20 entries (1 Mi pages).
- Every VPN has an entry: no tags are needed, since the VPN itself is used as an index into the page table.
- Page table contents: status bits (valid, dirty, ...) plus either the memory page (frame number) or, for pages swapped out, the location in swap space (disk block).
- Protection: the page table entry also includes a write protection bit; if on, then the page is "protected".
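A quick sanity check of the arithmetic above (Python; the 4-byte entry size is an assumption here, matching the figure given later for the two-level table):

```python
VA_BITS, PAGE_BYTES, PTE_BYTES = 32, 4096, 4

num_pages = 2**VA_BITS // PAGE_BYTES   # 2^20 = 1,048,576 virtual page numbers
table_size = num_pages * PTE_BYTES     # flat page table size in bytes
print(num_pages)                       # 1048576
print(table_size // 2**20, "MiB")      # 4 MiB
```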
Address translation: Page hit example
[Figure: a program with a 32-bit virtual address space, its page table (VPN → PPN, with valid bits), and a DRAM physical address space of 4 × 4 KiB frames.]
1. The program executes lb t0, 0xFFFFF004(x0).
2. The CPU extracts VPN 0xFFFFF and uses it to index the page table.
3. The entry is valid (valid bit = 1), so it supplies the physical page number (PPN).
4. The data for t0 is read from that DRAM frame at offset 0x004.
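The hit path can be sketched as a toy translation (Python; the page table contents, and the PPN value in particular, are hypothetical, chosen only to mirror the figure):

```python
# 4 KiB pages -> 12 offset bits
page_table = {0xFFFFF: 1}   # hypothetical: VPN 0xFFFFF resident in frame 1

def translate(va):
    """Translate a 32-bit VA on a page hit: index the table by VPN, keep the offset."""
    vpn, offset = va >> 12, va & 0xFFF
    ppn = page_table[vpn]          # valid entry supplies the PPN
    return (ppn << 12) | offset

print(hex(translate(0xFFFFF004)))  # 0x1004 (frame 1, offset 0x004)
```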
Address translation: Page fault example
[Figure: same setup, 32-bit virtual address space, page table, DRAM with 4 × 4 KiB frames.]
1. The program executes lb t1, 0x60000030(x0); the CPU indexes the page table with VPN 0x60000.
2. The entry's valid bit is 0 → page fault, go to disk! The OS uses the swap table to find the disk block.
3. The OS reads the disk, loads the data to RAM, updates the page table, and returns it to the program.
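The fault path can be sketched similarly (Python; the swap block number and free-frame list are made up for illustration):

```python
# Entry is a PPN when resident, or ("disk", block) when the valid bit is off
page_table = {0xFFFFF: 1, 0x60000: ("disk", 7)}   # block 7: hypothetical swap location
free_frames = [2]

def load(vpn):
    """Return the PPN for vpn, servicing a page fault if the page is on disk."""
    entry = page_table[vpn]
    if isinstance(entry, int):
        return entry                 # valid: page hit
    _, block = entry                 # page fault: swap info gives the disk block
    ppn = free_frames.pop()          # OS picks a free frame (or evicts one)
    page_table[vpn] = ppn            # update the page table: valid again
    return ppn

print(load(0x60000))   # 2  (faulted in from disk)
print(load(0x60000))   # 2  (now a hit)
```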
Dealing with Large Page Tables
- When the OS starts a new process, it creates space on disk for all the pages of the process (all valid bits in the page table = zero).
- This is called Demand Paging: pages of the process are loaded from disk only when needed.
- With demand paging, physical memory fills quickly → the overall page table size becomes too big.
- A variety of solutions trade page table size against slower performance.

Two-level page table
[Figure: a 32-bit virtual address split into three fields: bits 31–22 index the page directory, bits 21–12 index a second-level page table, bits 11–0 are the page offset. The PTBR points to the page directory; a PDEntry points to a page table ("where is my translation?"); a PTEntry supplies the PPN ("where is my physical page?").]
- Total page table size: 2^10 (#tables) × 2^10 (#entries/table) × 4 bytes = 4 MiB.
- Benefits: don't need to allocate every page table, only those containing valid entries.
- Drawbacks: longer lookups.
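The index extraction for the two-level scheme can be sketched as follows (Python; field widths follow the 31–22 / 21–12 / 11–0 split above):

```python
def split_two_level(va):
    """Split a 32-bit VA into (directory index, table index, page offset)."""
    pd_index = (va >> 22) & 0x3FF   # bits 31-22: which second-level table
    pt_index = (va >> 12) & 0x3FF   # bits 21-12: which entry in that table
    offset   = va & 0xFFF           # bits 11-0: byte within the 4 KiB page
    return pd_index, pt_index, offset

print(split_two_level(0xFFFFF004))  # (1023, 1023, 4)
```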
Disk Write and Load control
- Use write-back: a dirty bit in the page table entry is set when the page is written, and only dirty pages are written back to disk on eviction.
- To improve CPU utilization, the CPU must switch to service other processes during a disk write.
- The system is thrashing when a chain of page faults occurs → CPU utilization falls: the system spends all of its time paging.
- Load control: determining how many jobs can be in memory at one time.
- When a process faults and memory is full, some page must be swapped out.
Replacement algorithms
- Clairvoyant: replace the page that won't be needed for the longest time in the future – optimal but impractical (can't look forward in time).
- LRU: replace the page that hasn't been referenced for the longest time. Look backwards and use the recent past to predict the near future.

LRU example (4 page frames, initially holding a, b, c, d):

Time      1  2  3  4  5  6  7  8  9  10
Requests  c  a  d  b  e  b  a  b  c  d
Frame 0   a  a  a  a  a  a  a  a  a  a
Frame 1   b  b  b  b  b  b  b  b  b  b
Frame 2   c  c  c  c  e  e  e  e  e  d
Frame 3   d  d  d  d  d  d  d  d  c  c
Faults                •           •  •

Time each page was last used, at each fault:
- Fault at t=5: a=2, b=4, c=1, d=3 → replace c.
- Fault at t=9: a=7, b=8, e=5, d=3 → replace d.
- Fault at t=10: a=7, b=8, e=5, c=9 → replace e.

At t=9 we replaced a page (d) that we're just going to reference 1 (virtual) time unit later.
LRU Replacement: Implementation
Maintain a stack of pages ordered by recency of use (most recent on top). On each reference, move the page to the top; on a fault, replace the page at the bottom of the stack.

Time             1  2  3  4  5  6  7  8  9  10
Requests         c  a  d  b  e  b  a  b  c  d
LRU page stack   c  a  d  b  e  b  a  b  c  d
(most recent        c  a  d  b  e  b  a  b  c
 on top)               c  a  d  d  e  e  a  b
                          c  a  a  d  d  e  a
Faults                       •           •  •
Page to replace              c           d  e
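The stack-based LRU bookkeeping above can be simulated directly (Python; the initial recency order of the four resident pages is an assumption, though it does not affect this reference string):

```python
def lru_simulate(resident, requests):
    """Simulate LRU with a recency stack (most recently used at the front).

    resident: pages initially loaded (index = frame number).
    Returns [(time, replaced_page), ...], one pair per page fault.
    """
    stack = list(resident)
    faults = []
    for t, page in enumerate(requests, start=1):
        if page in stack:
            stack.remove(page)       # hit: lift the page out of the stack
        else:
            victim = stack.pop()     # fault: evict least recently used (bottom)
            faults.append((t, victim))
        stack.insert(0, page)        # referenced page goes on top
    return faults

print(lru_simulate(['a', 'b', 'c', 'd'], list("cadbebabcd")))
# [(5, 'c'), (9, 'd'), (10, 'e')], matching the faults in the table
```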
LRU Replacement: Approximation
- Exact LRU is expensive; approximate it with a "used" (reference) bit per page.
- A clock hand sweeps over pages looking for one with used bit = 0, clearing used bits as it passes.
- The search starts with the 1st valid entry in the page table and thereafter continues where it left off last time.
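A minimal sketch of the clock sweep, assuming one reference bit per resident page frame:

```python
def clock_replace(used, hand):
    """One sweep of the clock (second-chance) approximation of LRU.

    used: reference bits, one per page frame (treated as circular).
    hand: index where the sweep resumes (where it left off last time).
    Returns (victim_frame, new_hand); clears used bits as it passes.
    """
    n = len(used)
    while True:
        if used[hand] == 0:
            return hand, (hand + 1) % n   # found a page not recently used
        used[hand] = 0                    # second chance: clear and move on
        hand = (hand + 1) % n

used = [1, 1, 0, 1]
victim, hand = clock_replace(used, 0)
print(victim, hand, used)   # 2 3 [0, 0, 0, 1]
```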
Translation Lookaside Buffer (TLB)
- Without a TLB, every memory access would take two: one to translate the Virtual Address into a Physical Address (page table lookup), plus one for the data itself.
- Solution: cache recently used translations in a small, fast TLB.
- Typical design: fully associative, 16–512 entries, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–1% miss rate, random/FIFO replacement policy.
[Figure: the CPU sends a VA to the TLB lookup. On a hit, the PA goes straight to the cache/main memory and the data returns to the CPU. On a miss, the page table performs the translation, the TLB is refilled, and the access proceeds.]
Does a TLB miss mean a page fault?
- No, if the page is already loaded into main memory → the handler finds & loads the translation from the page table into the TLB (takes ~10s of cycles).
- Yes, if the page is not in main memory → it's a true page fault (takes ~millions of cycles to service).
- TLB misses are much more frequent than true page faults.
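The three cases can be sketched with illustrative cycle costs drawn from the ranges above (the exact numbers are assumptions, not part of the lecture):

```python
# Illustrative cycle costs: TLB hit, TLB refill from page table, true page fault
TLB_HIT, TLB_REFILL, PAGE_FAULT = 1, 50, 2_000_000

def access_cost(vpn, tlb, page_table):
    """Cycle cost of one access; page_table maps VPN -> PPN, or None if on disk."""
    if vpn in tlb:
        return TLB_HIT                   # translation cached in the TLB
    if page_table.get(vpn) is not None:
        tlb[vpn] = page_table[vpn]       # TLB miss, page resident: refill TLB
        return TLB_REFILL
    return PAGE_FAULT                    # TLB miss and page not in memory

tlb = {0xFFFFF: 1}
page_table = {0xFFFFF: 1, 0x60000: None}       # None = valid bit off (on disk)
print(access_cost(0xFFFFF, tlb, page_table))   # 1: TLB hit
print(access_cost(0x60000, tlb, page_table))   # 2000000: true page fault
```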
Summary: steps in memory access