Virtual Memory & Page Replacement
The Core Philosophy of Virtual Memory
Virtual memory allows programs to utilize an address space significantly larger than the available physical RAM. Its core philosophy is: You do not need to load the entire program into physical memory to run it. The OS only loads the pages currently required for execution; the rest remain stored on the disk (in the Swap space or Pagefile).
Analogy: A library reading desk (Physical RAM) has limited space, but you can always retrieve books from the massive archives (Disk). When your desk is full, you put a book you aren't currently reading back into the archives to make room for a new one.
Virtual Address Space (Massive)
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ P0 │ P1 │ P2 │ P3 │ P4 │ P5 │ P6 │ P7 │
└───┬┴───┬┴────┴───┬┴────┴───┬┴────┴────┘
    │    │         │         │
    ▼    ▼         ▼         ▼
┌────┬────┬────┬────┐            Disk (Swap Space)
│ P0 │ P1 │ P3 │ P5 │            ┌────┬────┬────┐
│    │    │    │    │            │ P2 │ P4 │ P6 │
└────┴────┴────┴────┘            └────┴────┴────┘
 Physical RAM (Limited)           Temporarily Evicted Pages
The Page Fault
When the CPU attempts to access a virtual page, but that page is not currently in physical RAM (the 'Valid' bit in the Page Table is 0), the hardware triggers a Page Fault exception:
CPU Accesses Virtual Address
            │
            ▼
Check Page Table ── Valid Bit = 1 ──▶ Normal Physical Memory Access
            │
      Valid Bit = 0 (Page Fault)
            │
            ▼
Hardware Triggers Page Fault Exception
            │
            ▼
OS Kernel Intervenes:
            │
            ├── Free Frame Available → Load page from disk into free frame
            │                        → Update Page Table
            │                        → Re-execute the faulting instruction
            │
            └── Physical RAM is Full → Select a "Victim Page" (Replacement Algorithm)
                                     → If victim is a "Dirty Page" (modified), write it to disk
                                     → Load new page into the now-freed frame
                                     → Update Page Table
                                     → Re-execute the faulting instruction
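The same flow can be sketched in code. Below is a minimal, self-contained Python model of demand paging; every name in it (TinyVM, PTE, access, _handle_fault) is invented for the sketch rather than a real kernel API, and FIFO stands in for whichever replacement algorithm the OS actually uses.

```python
from dataclasses import dataclass

@dataclass
class PTE:
    """One page-table entry (illustrative layout, not a real kernel structure)."""
    valid: bool = False
    dirty: bool = False
    frame: int = -1

class TinyVM:
    """Toy model of demand paging with FIFO victim selection."""
    def __init__(self, num_pages, num_frames):
        self.page_table = [PTE() for _ in range(num_pages)]
        self.free_frames = list(range(num_frames))
        self.resident = []            # FIFO queue of resident virtual page numbers
        self.faults = 0

    def access(self, vpn, write=False):
        pte = self.page_table[vpn]
        if not pte.valid:             # valid bit = 0 -> page fault
            self.faults += 1
            self._handle_fault(vpn)
        if write:
            pte.dirty = True
        return pte.frame              # "physical" frame actually used

    def _handle_fault(self, vpn):
        if self.free_frames:                      # free frame available
            frame = self.free_frames.pop()
        else:                                     # RAM full: pick a victim
            victim = self.page_table[self.resident.pop(0)]
            if victim.dirty:
                pass                              # a real OS writes the dirty page to swap here
            frame = victim.frame
            victim.valid, victim.dirty = False, False
        # "loading the page from disk" is implicit in this toy model
        self.page_table[vpn].frame = frame
        self.page_table[vpn].valid = True
        self.resident.append(vpn)

vm = TinyVM(num_pages=8, num_frames=4)
for vpn in [0, 1, 3, 5, 2, 0, 4]:
    vm.access(vpn)
print(vm.faults)   # 7: once the 4 frames fill up, every later miss evicts the oldest resident page
```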
The Page Fault Rate directly dictates system performance. Assuming RAM access takes 100ns and Disk access takes 10ms:
- Effective Access Time = (1 − p) × 100ns + p × 10,000,000ns, where p is the page fault rate
- If p = 0.001 (1 fault per 1,000 accesses), the effective access time balloons to ~10,100ns (~10μs). Performance degrades by roughly 100x.
Therefore, a well-chosen Page Replacement Algorithm is mission-critical.
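Plugging numbers into the formula makes the sensitivity to p obvious. A small Python sketch using the 100ns / 10ms figures from above:

```python
def effective_access_time(p_fault, ram_ns=100, disk_service_ns=10_000_000):
    """EAT = (1 - p) * RAM latency + p * page-fault service time (disk)."""
    return (1 - p_fault) * ram_ns + p_fault * disk_service_ns

print(effective_access_time(0.001))      # ~10,100 ns (~10 us): ~100x slower than pure RAM access
print(effective_access_time(0.000001))   # ~110 ns: p must be tiny to stay near RAM speed
```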
Page Replacement Algorithms
1. Optimal Replacement Algorithm (OPT)
Replaces the page that will not be used for the longest period of time in the future. This is mathematically optimal but requires clairvoyance (predicting future execution). Therefore, it is impossible to implement in practice and only serves as a theoretical benchmark to evaluate other algorithms.
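OPT cannot be run online, but it can be computed offline once the full reference string is known, which is exactly how it is used as a benchmark. A minimal Python sketch (using the same reference string as the FIFO and LRU traces below):

```python
def opt_faults(refs, num_frames):
    """Offline OPT: on a fault, evict the resident page whose next use lies
    farthest in the future (pages never used again are evicted first)."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = refs[i + 1:]
        # distance to the next use; "never used again" sorts as farthest away
        victim = max(frames, key=lambda p: future.index(p) if p in future else len(future))
        frames[frames.index(victim)] = page
    return faults

print(opt_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))   # -> 6, the minimum possible for this string
```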
2. First-In, First-Out (FIFO)
Replaces the page that entered memory earliest. It is simple to implement but performs poorly, as it often evicts heavily used pages simply because they were loaded early.
Page Reference String: 7 0 1 2 0 3 0 4
Frames = 3:
Access 7: [7, -, -] Fault ✓
Access 0: [7, 0, -] Fault ✓
Access 1: [7, 0, 1] Fault ✓
Access 2: [2, 0, 1] Fault ✓ (Evict 7, the oldest)
Access 0: [2, 0, 1] Hit
Access 3: [2, 3, 1] Fault ✓ (Evict 0, the oldest)
Access 0: [2, 3, 0] Fault ✓ (Evict 1, the oldest)
Access 4: [4, 3, 0] Fault ✓ (Evict 2, the oldest)
Total Faults = 7
Belady's Anomaly: Under FIFO, increasing the number of physical frames can paradoxically increase the number of page faults. This counter-intuitive flaw affects FIFO because it is not a "stack algorithm"; LRU and OPT never exhibit it.
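A FIFO simulator is a few lines of Python. The sketch below reproduces the 7 faults from the trace above and also demonstrates Belady's Anomaly on the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5, which faults 9 times with 3 frames but 10 times with 4:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Simulate FIFO page replacement; return the number of page faults."""
    frames = deque()           # oldest page at the left
    faults = 0
    for page in refs:
        if page in frames:
            continue           # hit: FIFO order is unaffected by hits
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()   # evict the page that entered memory earliest
        frames.append(page)
    return faults

print(fifo_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))   # -> 7, matching the trace above

belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(belady, 3))                     # -> 9
print(fifo_faults(belady, 4))                     # -> 10: more frames, *more* faults
```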
3. Least Recently Used (LRU)
Replaces the page that has not been accessed for the longest time. It relies on the principle of locality—if a page hasn't been used recently, it is unlikely to be used soon.
Same Reference String: 7 0 1 2 0 3 0 4
Frames = 3:
Access 7: [7, -, -] Fault ✓
Access 0: [7, 0, -] Fault ✓
Access 1: [7, 0, 1] Fault ✓
Access 2: [2, 0, 1] Fault ✓ (7 is Least Recently Used)
Access 0: [2, 0, 1] Hit (0 becomes Most Recently Used)
Access 3: [2, 0, 3] Fault ✓ (1 is Least Recently Used)
Access 0: [2, 0, 3] Hit
Access 4: [4, 0, 3] Fault ✓ (2 is Least Recently Used)
Total Faults = 6 (Superior to FIFO's 7)
Implementation Overhead:
- Counter Method: Stamp each page with the clock time on every access. To replace, search all pages for the oldest stamp. High search overhead.
- Stack Method: Use a doubly-linked list. On every access, move the page to the top; the bottom entry is then always the LRU page. This requires up to six pointer updates on every single memory reference, which is prohibitively expensive without dedicated hardware support.
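In simulation, Python's OrderedDict can stand in for the stack method's doubly-linked list (most recently used at the back, LRU at the front). This sketch reproduces the 6 faults from the trace above:

```python
from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Simulate LRU; the OrderedDict plays the role of the LRU 'stack'."""
    frames = OrderedDict()     # key = page; insertion order = recency order
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)        # accessed again: becomes most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)      # evict the least recently used page
        frames[page] = None
    return faults

print(lru_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))   # -> 6, versus 7 under FIFO
```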
4. Clock Replacement Algorithm (Second Chance)
Because strict LRU is too expensive, the Clock Algorithm approximates LRU. It is the most common implementation in actual operating systems. Every page is assigned a hardware Reference Bit. The MMU sets this bit to 1 automatically whenever the page is read or written.
(Circular list: the hand wraps from Page D back around to Page A)

            ┌─────────┐
            │ Page A  │  R=1
            ├─────────┤
            │ Page B  │  R=0
            ├─────────┤
Clock ────▶ │ Page C  │  R=1  → second chance: clear R to 0, advance the hand
Pointer     ├─────────┤
            │ Page D  │  R=0  ← Replace this one!
            └─────────┘
Algorithm Mechanics:
1. Examine the page under the pointer.
2. If R=0 → Replace this page.
3. If R=1 → Set R=0, move pointer to the next page (give it a "Second Chance").
4. Repeat until an R=0 page is found.
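These mechanics map directly onto a circular array of frames plus a hand index. A minimal Python sketch, where setting the reference bit on every access emulates what the MMU does in hardware:

```python
def clock_faults(refs, num_frames):
    """Simulate the Clock (second-chance) algorithm; return the page-fault count."""
    frames = [None] * num_frames      # resident pages, arranged in a circle
    ref_bit = [0] * num_frames        # per-frame reference bit
    hand = 0                          # the clock pointer
    faults = 0
    for page in refs:
        if page in frames:
            ref_bit[frames.index(page)] = 1   # the MMU would set R on every access
            continue
        faults += 1
        while ref_bit[hand] == 1:             # R=1: clear it and give a second chance
            ref_bit[hand] = 0
            hand = (hand + 1) % num_frames
        frames[hand] = page                   # R=0: replace this frame
        ref_bit[hand] = 1
        hand = (hand + 1) % num_frames
    return faults

print(clock_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))   # -> 6: matches LRU for this string
```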
Thrashing
When the system runs too many processes concurrently, the number of physical frames allocated to each process drops below the minimum required to hold their active working sets. Processes constantly trigger page faults, stealing frames from each other. The CPU spends the vast majority of its time resolving page faults (swapping data to/from disk) rather than executing application code. This total system collapse is called Thrashing.
CPU Utilization
  ▲
  │        ╱╲
  │       ╱  ╲
  │      ╱    ╲   ← Thrashing Begins
  │     ╱      ╲
  │    ╱        ╲___
  │   ╱
  └────────────────────▶ Degree of Multiprogramming
       Optimal Point        (Number of Processes)
Resolution Tactics:
- Decrease the degree of multiprogramming (kill or suspend processes).
- Implement the Working Set Model: Track the set of pages each process has accessed recently, and guarantee that the OS provides enough frames to hold that working set before allowing the process to run.
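The working set is simply the set of distinct pages a process has referenced within its last Δ references (Δ is a window size chosen by the OS). A tiny Python sketch over a recorded reference string:

```python
def working_set(refs, t, delta):
    """Pages referenced in the window (t - delta, t]; the OS must provide at
    least this many frames for the process to run without thrashing."""
    return set(refs[max(0, t - delta + 1): t + 1])

refs = [7, 0, 1, 2, 0, 3, 0, 4]
print(working_set(refs, t=6, delta=4))   # -> {0, 2, 3}: three frames suffice at this moment
```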
System Design Audit & Observability
For high-throughput systems (databases, caches, JVMs), swapping to disk is the equivalent of a catastrophic outage. Memory is fast; disk I/O is notoriously slow.
1. The Swap Death Spiral
When Linux runs out of memory, it aggressively pages out "idle" memory to the Swap partition to prevent an OOM (Out Of Memory) killer invocation. If the system starts swapping actively used heap memory, the application will experience unpredictable latency spikes (seconds or even minutes).
- Audit Command: Run vmstat 1. If the si (swap in) and so (swap out) columns consistently show non-zero values, your system is actively swapping and very likely thrashing. You must immediately provision more RAM, optimize the application's memory footprint, or reduce the concurrency limits.
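For automated checks, the same signal is available programmatically: the pswpin and pswpout counters in /proc/vmstat are the cumulative page-swap counts that vmstat derives si/so from. A minimal, Linux-only Python sketch:

```python
import time

def swap_activity(interval=1.0):
    """Sample the cumulative swap counters in /proc/vmstat and return the
    swap-in / swap-out rates in pages per second (rough equivalent of si/so)."""
    def read_counters():
        counters = {}
        with open("/proc/vmstat") as f:
            for line in f:
                key, value = line.split()
                counters[key] = int(value)
        return counters["pswpin"], counters["pswpout"]

    si0, so0 = read_counters()
    time.sleep(interval)
    si1, so1 = read_counters()
    return (si1 - si0) / interval, (so1 - so0) / interval

print(swap_activity())   # consistently non-zero rates mean the host is actively swapping
```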
2. Tuning Swappiness
Linux controls how aggressively it swaps memory vs. dropping filesystem cache via the swappiness parameter (0-100). The default is usually 60, which is terrible for databases and containerized workloads.
- Audit Protocol: Check cat /proc/sys/vm/swappiness. For systems running Kubernetes nodes, Elasticsearch, or PostgreSQL, this value should be explicitly set to 1 or 10. This tells the kernel: "Only swap to disk if it is absolutely necessary to prevent a system crash."
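A trivial audit script can flag hosts that still run the default; the threshold of 10 below simply mirrors the recommendation above:

```python
def audit_swappiness(max_allowed=10):
    """Read vm.swappiness and flag hosts still running an aggressive default."""
    with open("/proc/sys/vm/swappiness") as f:
        value = int(f.read())
    verdict = "OK" if value <= max_allowed else "TOO AGGRESSIVE (default is usually 60)"
    print(f"vm.swappiness = {value}: {verdict}")

audit_swappiness()   # to change it at runtime: sysctl -w vm.swappiness=10
```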
3. The JVM GC vs. Swap Conflict
If an application runtime with a Garbage Collector (like Java or Go) has its heap swapped to disk, a GC pause transforms from a 50ms operation into a 30-second system freeze. The GC attempts to scan the entire heap, forcing the OS to swap every single page back into physical RAM and saturating disk I/O in the process.
- Audit Protocol: For latency-critical JVM applications, always disable swap entirely on the host machine (swapoff -a), or keep the heap resident by pre-touching it at startup with -XX:+AlwaysPreTouch and pinning it with OS-level memory locking (mlockall()), preventing the OS from ever paging it out.
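Memory locking itself is an OS call rather than a JVM flag. A minimal ctypes sketch of mlockall() for Linux/glibc follows; it assumes libc.so.6 is present, uses the standard Linux flag values, and requires CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK to succeed.

```python
import ctypes

# Standard Linux flag values: lock all current and future mappings into RAM.
MCL_CURRENT, MCL_FUTURE = 1, 2

libc = ctypes.CDLL("libc.so.6", use_errno=True)   # assumes glibc on Linux
if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
    err = ctypes.get_errno()
    raise OSError(err, "mlockall failed; check CAP_IPC_LOCK / RLIMIT_MEMLOCK")
print("All current and future pages of this process are now pinned in physical RAM")
```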