Linux File Systems
Everything is a File
One of the most foundational design philosophies of Linux is "Everything is a file". Not only are standard documents treated as files, but directories, devices, pipes, and sockets are all abstracted as files. They are uniformly manipulated via the standard open() / read() / write() / close() system calls.
| File Type | Symbol | Example |
|---|---|---|
| Regular File | - | /etc/passwd |
| Directory | d | /home/user/ |
| Symbolic Link | l | /usr/bin/python -> python3 |
| Block Device | b | /dev/sda (Disk) |
| Character Device | c | /dev/tty (Terminal) |
| Pipe | p | Named Pipe (FIFO) |
| Socket | s | /var/run/mysql.sock |
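The advantage of this abstraction is a unified interface: once a program holds a file descriptor, it reads and writes with the same system calls whether the descriptor refers to a disk file, a pipe, or a network socket. Some objects are created with their own calls (a socket comes from socket() rather than open()), but the resulting descriptor behaves like any other.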
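The Symbol column above is the first character of the mode string that ls -l prints. The following is a minimal sketch, assuming GNU coreutils stat is available; file_type_symbol is a hypothetical helper name, not a standard tool.

```shell
# Hypothetical helper: print the one-character type symbol for a path.
# Assumes GNU coreutils `stat`, whose %A format prints the mode string
# in `ls -l` style (for example "drwxr-xr-x").
file_type_symbol() {
    mode=$(stat -c '%A' "$1") || return 1
    # Keep only the first character of the mode string.
    printf '%s\n' "${mode%"${mode#?}"}"
}

# Usage in a throwaway sandbox directory.
sandbox=$(mktemp -d)
touch "$sandbox/regular"
mkdir "$sandbox/dir"
ln -s regular "$sandbox/link"
mkfifo "$sandbox/fifo"

for name in regular dir link fifo; do
    printf '%s %s\n' "$(file_type_symbol "$sandbox/$name")" "$name"
done
rm -rf "$sandbox"
```

Because stat does not follow symbolic links unless asked to, the link is reported as l rather than as the type of its target.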
inode: The Identity Card of a File
In a Linux file system, the filename is merely a label. The true identity of a file is its inode (Index Node).
Directory Entry (dentry)   inode                  Data Blocks
┌──────────────┐       ┌──────────────┐       ┌──────────────────┐
│ Name: foo.txt│──────▶│ inode ID: 42 │──────▶│ Actual File Data │
│ inode: 42    │       │ Size: 1024   │       │ Hello World...   │
└──────────────┘       │ Perms: 644   │       └──────────────────┘
                       │ Owner: root  │       ┌──────────────────┐
                       │ Timestamps   │──────▶│ More Data Blocks │
                       │ Block Ptrs   │       └──────────────────┘
                       │ Link Count: 1│
                       └──────────────┘
Information Stored in an inode
| Information | Description |
|---|---|
| File Type | Regular file, directory, link, etc. |
| Permissions | Read/Write/Execute bits (9 bits) |
| Owner / Group | UID and GID |
| File Size | Size in bytes |
| Timestamps | atime (access), mtime (modify), ctime (metadata change) |
| Hard Link Count | Number of directory entries pointing to this inode |
| Data Block Pointers | Pointers to the actual disk blocks storing the data |
Critical Concept: The inode does not store the filename. Filenames live in directory entries (dentries), each of which maps a filename to an inode number.
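One way to see that the name lives outside the inode is to rename a file and compare the inode metadata before and after. This is a small sketch, assuming GNU coreutils stat; inode_summary is a hypothetical helper name.

```shell
# Print "<inode> <link count> <size in bytes>" for a path.
# Assumes GNU coreutils `stat` (%i, %h, %s format sequences).
inode_summary() {
    stat -c '%i %h %s' "$1"
}

sandbox=$(mktemp -d)
printf 'Hello World\n' > "$sandbox/foo.txt"
before=$(inode_summary "$sandbox/foo.txt")

# Renaming within one file system rewrites the directory entry only.
mv "$sandbox/foo.txt" "$sandbox/renamed.txt"
after=$(inode_summary "$sandbox/renamed.txt")

printf 'before rename: %s\nafter rename:  %s\n' "$before" "$after"
rm -rf "$sandbox"
```

Both lines should show the same inode number, link count, and size, since only the directory entry changed.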
Data Block Pointer Structure
For large files, the classic ext2/ext3-style inode uses a multi-level indirect pointer scheme to address data blocks:
inode
├── Direct Pointers ×12 → Point directly to data blocks (up to 48KB)
├── Single Indirect ×1 → Points to a block of pointers → data blocks
├── Double Indirect ×1 → Pointer block → Pointer block → data blocks
└── Triple Indirect ×1 → Pointer block → Pointer block → Pointer block → data blocks
Assuming a 4KB block size and 4-byte pointers (so each pointer block holds 4KB / 4B = 1024 pointers), each tier addresses:
- Direct: 12 × 4KB = 48KB
- Single: 1024 × 4KB = 4MB
- Double: 1024² × 4KB = 4GB
- Triple: 1024³ × 4KB = 4TB
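The arithmetic above can be reproduced for other block and pointer sizes. The sketch below assumes a shell with 64-bit integer arithmetic (bash and dash on 64-bit systems qualify); pointer_tier_capacity is a hypothetical helper name.

```shell
# Print the capacity addressed by each pointer tier, in bytes.
# Arguments: block size and pointer size, both in bytes.
pointer_tier_capacity() {
    block=$1
    ptr=$2
    per_block=$((block / ptr))      # pointers that fit in one block
    echo "direct $((12 * block))"
    echo "single $((per_block * block))"
    echo "double $((per_block * per_block * block))"
    echo "triple $((per_block * per_block * per_block * block))"
}

pointer_tier_capacity 4096 4
# direct 49152
# single 4194304
# double 4294967296
# triple 4398046511104
```

The four values correspond to 48KB, 4MB, 4GB, and 4TB respectively.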
Hard Links vs. Symbolic Links
Hard Links
A hard link occurs when multiple directory entries point to the exact same inode. It is akin to a single person having multiple names (but only one social security number).
Directory Entry A      inode 42
┌──────────┐       ┌──────────────┐
│ foo.txt  │──────▶│ Link Count: 2│──────▶ Data Blocks
└──────────┘       └──────────────┘
                          ▲
Directory Entry B         │
┌──────────┐              │
│ bar.txt  │──────────────┘
└──────────┘
Delete foo.txt → Link count becomes 1 → Data remains untouched.
Delete bar.txt → Link count becomes 0 → Data blocks are freed.
# Create a hard link
ln foo.txt bar.txt
# Verification: Both share the exact same inode number
ls -li foo.txt bar.txt
# 42 -rw-r--r-- 2 user group 1024 foo.txt
# 42 -rw-r--r-- 2 user group 1024 bar.txt
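The deletion behaviour described above can be checked in a scratch directory. This is a minimal sketch, assuming GNU coreutils stat; link_count is a hypothetical helper name.

```shell
# Assumes GNU coreutils `stat`; %h is the hard link count.
link_count() {
    stat -c '%h' "$1"
}

sandbox=$(mktemp -d)
printf 'payload\n' > "$sandbox/foo.txt"
ln "$sandbox/foo.txt" "$sandbox/bar.txt"
count_two_names=$(link_count "$sandbox/foo.txt")

# Removing one name only drops the link count; the data stays reachable.
rm "$sandbox/foo.txt"
count_one_name=$(link_count "$sandbox/bar.txt")
surviving_data=$(cat "$sandbox/bar.txt")

printf 'links with both names: %s\n' "$count_two_names"
printf 'links after rm foo.txt: %s\n' "$count_one_name"
printf 'data via bar.txt: %s\n' "$surviving_data"
rm -rf "$sandbox"
```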
Symbolic Links (Soft Links)
A symbolic link is an independent file whose data content is merely the path string to the target file. It behaves similarly to a Windows shortcut.
Directory Entry       inode 100            Data Block
┌──────────┐       ┌──────────────┐    ┌──────────────┐
│ link.txt │──────▶│ Type: Symlink│───▶│ "/path/foo"  │
└──────────┘       └──────────────┘    └──────────────┘
                                              │
                                              ▼ Path resolved at runtime
                                           inode 42
                                       ┌──────────────┐
                                       │ Actual Data  │
                                       └──────────────┘
# Create a symbolic link
ln -s foo.txt link.txt
# Verification: They possess different inode numbers
ls -li foo.txt link.txt
# 42 -rw-r--r-- 1 user group 1024 foo.txt
# 100 lrwxrwxrwx 1 user group 7 link.txt -> foo.txt
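Because a symbolic link stores only a path, removing the target leaves the link in place but unresolvable. A small sketch of detecting that state with the standard test operators follows; is_dangling_symlink is a hypothetical helper name.

```shell
# Return success when the path is a symbolic link whose target is missing.
# -L tests the link itself; -e follows the link and tests the target.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

sandbox=$(mktemp -d)
printf 'data\n' > "$sandbox/foo.txt"
ln -s foo.txt "$sandbox/link.txt"

readlink "$sandbox/link.txt"      # prints the stored path: foo.txt

rm "$sandbox/foo.txt"
if is_dangling_symlink "$sandbox/link.txt"; then
    echo "link.txt is dangling"
fi
rm -rf "$sandbox"
```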
Direct Comparison
| Dimension | Hard Link | Symbolic Link |
|---|---|---|
| Nature | Multiple dentries for the same inode | Independent file storing a path string |
| inode Number | Identical | Different |
| Target Deletion | Data persists (Link count > 0) | Link becomes dangling (broken) |
| Cross-Filesystem | ❌ Forbidden | ✅ Supported |
| Link to Directory | ❌ Forbidden (prevents cycles) | ✅ Supported |
| File Size | Same as target | Length of the path string |
File Permission Model
Linux file permissions are divided into three classes (owner, group, others), each holding three bits. Each class therefore maps to one octal digit:
Owner (u)   Group (g)   Others (o)
   rwx         rwx         rwx
   421         421         421
Example: -rwxr-xr-- = 754
         │└┬┘└┬┘└┬┘
         │ │  │  └── Others: r-- = 4 (Read-only)
         │ │  └───── Group:  r-x = 5 (Read + Execute)
         │ └──────── Owner:  rwx = 7 (Read + Write + Execute)
         └────────── File Type: - (Regular File)
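The 4/2/1 weighting can be applied mechanically. The sketch below converts a nine-character permission string to its octal form; perm_to_octal is a hypothetical helper name, and special bits (s, t) are deliberately ignored for brevity.

```shell
# Convert a 9-character permission string such as "rwxr-xr--" to octal.
# Hypothetical helper; it treats any character other than "-" as a set bit
# and does not handle the special bits (s, t).
perm_to_octal() {
    rest=$1
    result=""
    for _class in owner group others; do
        digit=0
        for weight in 4 2 1; do
            case $rest in
                -*) ;;                              # bit is off
                *)  digit=$((digit + weight)) ;;    # bit is on
            esac
            rest=${rest#?}                          # move to the next character
        done
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr--    # 754
perm_to_octal rw-r--r--    # 644
```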
Permission Semantics
| Permission | On a File | On a Directory |
|---|---|---|
| r (read) | Can read file content | Can list directory contents (ls) |
| w (write) | Can modify file content | Can create/delete files within the directory |
| x (execute) | Can execute as a program | Can traverse/enter the directory (cd) |
Special Permission Bits
| Permission | Symbol | Numeric | Effect |
|---|---|---|---|
| SetUID | s (Owner x-bit) | 4000 | Process runs with the file owner's effective UID |
| SetGID | s (Group x-bit) | 2000 | Process runs with the file's group as its effective GID; on a directory, new files inherit the directory's group |
| Sticky | t (Others x-bit) | 1000 | Files in the directory can be deleted or renamed only by their owner, the directory's owner, or root |
# SetUID Example: passwd command modifies /etc/shadow (requires root write)
ls -l /usr/bin/passwd
# -rwsr-xr-x 1 root root 68208 passwd
#    ^-- SetUID: Regular users execute this with temporary root privileges
# Sticky Bit Example: /tmp directory
ls -ld /tmp
# drwxrwxrwt 10 root root 4096 /tmp
#          ^-- Sticky: Anyone can create files, but can only delete their own
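The special bits can also be set and inspected on scratch files without touching system binaries. This is a sketch assuming GNU coreutils stat and a file system that honours these mode bits; show_mode is a hypothetical helper name.

```shell
# Assumes GNU coreutils `stat`: %a is the octal mode, %A the ls-style string.
show_mode() {
    stat -c '%a %A' "$1"
}

sandbox=$(mktemp -d)
touch "$sandbox/tool"
mkdir "$sandbox/shared"

chmod 4755 "$sandbox/tool"      # SetUID + rwxr-xr-x
chmod 1777 "$sandbox/shared"    # Sticky + rwxrwxrwx

show_mode "$sandbox/tool"       # 4755 -rwsr-xr-x
show_mode "$sandbox/shared"     # 1777 drwxrwxrwt
rm -rf "$sandbox"
```

For an audit across a real system, find with -perm -4000 lists files that have the SetUID bit set.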
Common Permission Commands
# Modify permissions (Numeric)
chmod 755 script.sh # rwxr-xr-x
# Modify permissions (Symbolic)
chmod u+x script.sh # Add execute for owner
chmod go-w file.txt # Remove write for group/others
chmod a+r file.txt # Add read for all
# Change owner
chown user:group file.txt
# Check default permission mask
umask # Typically 022
# The mask clears bits, so the result is base & ~umask (not arithmetic subtraction)
# New file perm = 666 & ~022 = 644 (rw-r--r--)
# New dir perm = 777 & ~022 = 755 (rwxr-xr-x)
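The umask is applied as a bit mask: every bit set in the umask is cleared from the base mode. For 022 this happens to match plain subtraction, but the two diverge for masks such as 033. A minimal sketch, with apply_umask as a hypothetical helper name:

```shell
# Compute the mode a new object receives: base & ~umask.
# Arguments: base mode and umask, both written in octal digits.
apply_umask() {
    # The leading 0 makes the shell read each argument as an octal constant.
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

apply_umask 666 022    # 644 (new regular file)
apply_umask 777 022    # 755 (new directory)
apply_umask 666 033    # 644, whereas subtraction would suggest 633
```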
The ext4 File System
ext4 (Fourth Extended Filesystem) is the default file system on many Linux distributions. It brings several structural improvements over ext3:
| Feature | ext3 | ext4 |
|---|---|---|
| Max File Size (4KB blocks) | 2TB | 16TB |
| Max Volume Size (4KB blocks) | 16TB | 1EB |
| Subdirectories | 32,000 | Unlimited |
| Block Allocation | Direct / Indirect | Extent (Contiguous ranges) |
| Journal Checksum | No | Yes |
| Delayed Allocation | Unsupported | Supported |
The Extent Mechanism
ext4 replaced the archaic indirect block mapping with Extents. An extent records a contiguous range of physical blocks:
Traditional (ext3): Records every single block pointer
inode → [blk1] [blk2] [blk3] [blk4] [blk5] ...
A 1GB file requires ~260,000 block pointers.
Extent (ext4): Records start block + length
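inode → [Start_Block=100, Length=32,768] [Start_Block=32,868, Length=32,768] ...
A single extent covers at most 32,768 blocks (128MB with 4KB blocks), so a fully contiguous 1GB file needs as few as 8 extents.
This sharply reduces metadata overhead. Because each extent describes a physically contiguous run of blocks, large files can be read and written with fewer metadata lookups and more sequential I/O, provided the allocator was able to place the file contiguously.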
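An extent's length field is bounded, so a very large contiguous file is described by a handful of extents rather than a single one. The sketch below computes that lower bound; min_extents is a hypothetical helper name, and the 32,768-block cap is the limit for an initialized ext4 extent (its length field is 16 bits wide).

```shell
# Minimum number of extents for a fully contiguous file.
# Assumption: an initialized ext4 extent covers at most 32768 blocks.
# Arguments: file size in bytes, block size in bytes.
min_extents() {
    size_bytes=$1
    block_size=$2
    max_len=32768
    blocks=$(( (size_bytes + block_size - 1) / block_size ))   # round up
    echo $(( (blocks + max_len - 1) / max_len ))               # round up
}

min_extents $((1024 * 1024 * 1024)) 4096    # 8
```

Compared with roughly 260,000 individual block pointers for the same 1GB file, the metadata saving remains several orders of magnitude. On a real system, a fragmented file will need more extents than this lower bound.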
System Design Audit & Observability
1. Inode Exhaustion vs. Space Exhaustion
A file system can report "No space left on device" even when disk space is plentiful. This occurs when the inode pool is exhausted (typically caused by millions of tiny files or an unmanaged /tmp).
- Audit Command: df -i reveals inode consumption, as distinct from the block consumption reported by df -h.
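For scripted monitoring, the same counters that df -i reports can be read directly. This is a sketch assuming GNU coreutils stat with the -f option; inode_usage is a hypothetical helper name, and file systems that allocate inodes dynamically may report a total of 0.

```shell
# Print inode usage for the file system holding a path as "<used> <total>".
# Assumes GNU coreutils `stat -f`: %c = total inodes, %d = free inodes.
inode_usage() {
    stat -f -c '%c %d' "$1" | awk '{ printf "%d %d\n", $1 - $2, $1 }'
}

inode_usage /
```

An alert can then compare used against total, skipping file systems whose total is 0.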
2. The "Phantom Disk Space" Leak
A critical production issue arises when a log file is deleted via rm while a daemon (e.g., Nginx, Docker) still holds an open file descriptor (fd) to it.
- Mechanic: Linux reclaims disk blocks only when the inode link count drops to 0 AND the open fd count drops to 0.
- Audit Command: Use lsof | grep deleted to identify ghost files monopolizing disk space. The correct mitigation is to truncate the file (> app.log) rather than rm it, or to force the daemon to reload its descriptors.
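The mechanic can be reproduced safely in a scratch directory: a file removed while a descriptor is still open stays readable through that descriptor. A minimal sketch, with demo_deleted_but_open as a hypothetical function name and descriptor 3 chosen arbitrarily:

```shell
demo_deleted_but_open() {
    dir=$(mktemp -d)
    printf 'log line\n' > "$dir/app.log"

    exec 3< "$dir/app.log"    # hold the file open on descriptor 3
    rm "$dir/app.log"         # link count drops to 0, blocks stay allocated

    cat <&3                   # the data is still readable through the fd
    exec 3<&-                 # closing the last fd lets the kernel free the blocks
    rmdir "$dir"
}

demo_deleted_but_open    # prints: log line
```

The rmdir succeeds because the directory entry is already gone; only the open descriptor was keeping the data alive.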
3. Hard Link Constraints in Builds
When designing high-performance build systems (like Gradle or Bazel), hard links are often used to deduplicate build caches. However, architects must enforce one boundary: hard links cannot span file systems, so a link across mount points fails. Build caching infrastructure must ensure the cache directory resides on the same file system as the build working directory, or fall back to copying.
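A cache layer can guard against this constraint by checking device IDs before linking and falling back to a copy. The following is a sketch under stated assumptions, not how any particular build tool does it: it assumes GNU coreutils stat, and same_filesystem and link_or_copy are hypothetical helper names.

```shell
# Succeed when both paths live on a file system with the same device ID.
# Assumes GNU coreutils `stat`; %d is the device number.
same_filesystem() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Hard-link src to dst when possible, otherwise copy.
# Prints "linked" or "copied" so callers can log the outcome.
link_or_copy() {
    if same_filesystem "$1" "$(dirname "$2")" && ln "$1" "$2" 2>/dev/null; then
        echo linked
    else
        cp "$1" "$2" && echo copied
    fi
}

sandbox=$(mktemp -d)
printf 'artifact\n' > "$sandbox/cache_entry"
link_or_copy "$sandbox/cache_entry" "$sandbox/build_output"
rm -rf "$sandbox"
```

The ln call is still allowed to fail and trigger the copy path, because matching device IDs alone do not guarantee that a link is permitted (bind mounts of one file system are an example).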