Memory Allocation & Fragmentation
Memory Fragmentation
Memory fragmentation is the phenomenon where memory becomes inefficiently utilized, leading to wasted space or the inability to satisfy allocation requests despite having enough total free memory.
External Fragmentation
Occurs when the free memory is scattered in small, non-contiguous blocks throughout the physical memory space.
- The Problem: The total sum of free memory might be 100MB, but if the largest contiguous block is only 5MB, a request for a 10MB block will fail.
- Common in: Segmentation architectures.
Internal Fragmentation
Occurs when the memory allocated to a process is strictly larger than what the process actually requested, and the excess memory is locked inside the allocated block, completely unusable by anything else.
- The Problem: If the system uses fixed 4KB pages, and a process requests 4.1KB of memory, the OS must allocate two full pages (8KB). The remaining 3.9KB in the second page is wasted.
- Common in: Paging architectures.
Contiguous Allocation Strategies
When dealing with external fragmentation, how does the OS choose which "hole" (free block) to allocate to a new process?
| Strategy | Methodology | Characteristics |
|---|---|---|
| First-Fit | Scan from the beginning and allocate the very first hole that is large enough. | Extremely fast. However, it tends to concentrate small, unusable fragments at the low end of the address space. |
| Best-Fit | Scan all holes and allocate the smallest hole that satisfies the request. | Leaves the smallest possible leftover fragment. However, scanning takes time, and the resulting tiny fragments are often completely useless. |
| Worst-Fit | Scan all holes and allocate the largest available hole. | Leaves a massive leftover fragment that is still large enough to be highly useful for future requests. However, it quickly depletes the largest available blocks. |
The Buddy System
The Linux kernel manages physical page frames using the Buddy System. Core Philosophy: Memory is always divided into blocks sized as powers of 2 (e.g., 4KB, 8KB, 16KB, 32KB). When allocating, blocks are repeatedly split in half. When freeing, adjacent "buddies" are merged back together.
Request: 21KB → Round up to the nearest power of 2: 32KB
256KB Block
├── 128KB
│ ├── 64KB
│ │ ├── 32KB ← Allocated (Internal Fragmentation: 11KB)
│ │ └── 32KB (Free Buddy)
│ └── 64KB (Free)
└── 128KB (Free)
On Free: The 32KB block is released → Merges with its free 32KB buddy to form 64KB → Merges with its 64KB buddy to form 128KB → ...
Pros: Allocation and deallocation are extremely fast ($O(\log n)$). It effectively eliminates severe External Fragmentation because merging is guaranteed and immediate. Cons: High Internal Fragmentation due to power-of-2 rounding.
The Slab Allocator
The Buddy System's minimum allocation unit is a single Page (4KB). However, the Linux kernel frequently needs to allocate millions of tiny data structures (like process descriptors, inodes, or socket buffers) that are only dozens of bytes in size. Using the Buddy System directly would cause catastrophic internal fragmentation.
The Slab Allocator sits on top of the Buddy System. It requests a full page from the Buddy System, and then carves that page up into identically-sized chunks specifically tailored for a specific type of kernel object.
Slab (A single Page allocated by the Buddy System)
┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐
│obj │ │obj │ │free│ │obj │ │free│
│Used│ │Used│ │ │ │Used│ │ │
└────┘ └────┘ └────┘ └────┘ └────┘
All objects are exactly the same size (e.g., `task_struct`)
Architectural Advantages:
- Zero Internal Fragmentation: The chunks are exactly the size of the object.
- Object Caching: When a
task_structis freed, the memory isn't returned to the OS. The Slab keeps it initialized. The next time atask_structis needed, the pre-initialized block is instantly handed out, bypassing constructor overhead. - Speed: Allocation is a simple pointer bump.
The Linux Memory Allocation Landscape
How does it all fit together?
- User Space (
malloc()):- For small allocations: Modifies the data segment boundary using the
brk()system call (expanding the heap). - For large allocations (>128KB): Bypasses the heap and asks the kernel directly for anonymous memory pages via the
mmap()system call.
- For small allocations: Modifies the data segment boundary using the
- Kernel Space:
- The Buddy System manages raw physical page frames.
- The Slab Allocator manages small, frequently used kernel objects on top of the Buddy System.
System Design Audit & Observability
Memory fragmentation is a silent killer in long-running applications like databases or caching layers (e.g., Redis).
1. The Glibc malloc Heap Fragmentation Trap
In user space, C/C++ applications relying on standard glibc malloc() can suffer severe external fragmentation if they allocate and free highly variable object sizes over weeks of uptime. The heap grows, but memory isn't returned to the OS because a few live objects are "pinning" the high addresses.
- Audit Protocol: If an application exhibits a steadily growing Resident Set Size (RSS) but its internal metric of "used memory" is low, you are experiencing heap fragmentation. Switch to an arena-based allocator like jemalloc (used by Redis) or tcmalloc (used by Google), which are specifically designed to group similar-sized allocations into thread-local slabs, drastically reducing fragmentation.
2. Observing Kernel Slab Consumption
Sometimes a server runs out of memory, but user-space applications (checked via top or ps) show low consumption. The culprit is often a kernel-level Slab leak (e.g., millions of lingering network sockets or file descriptors).
- Audit Command: Run
slabtopor inspectcat /proc/slabinfo. If you see massive consumption underdentry(directory entries),inode_cache, orskbuff_head_cache(network buffers), the kernel itself is holding the memory. You can manually drop filesystem caches usingecho 3 > /proc/sys/vm/drop_cachesto reclaimdentryandinodeslabs if the system is starved.
3. Tuning the Buddy System for HugePages
The Buddy System's power-of-2 merging is mathematically elegant, but after months of uptime, physical memory becomes heavily fragmented. Even if 10GB is free, there might not be a single contiguous 2MB block available.
- Audit Protocol: If you intend to use
HugePages(which require large, contiguous blocks of physical memory), you must allocate them at boot time. Attempting to allocateHugePageson a system that has been running for weeks will likely fail due to Buddy System external fragmentation. Useecho 1 > /proc/sys/vm/compact_memoryto force the kernel to aggressively defragment physical memory, though this is a highly expensive CPU operation.