MySQL Architecture and Storage Engines
Overall MySQL Architecture
MySQL operates on a layered architecture that separates the "brain" (logic/SQL handling) from the "limbs" (data storage).
Client (Java/Python/CLI/GUI)
│
▼
┌───────────────────────────────────────┐
│ Connection Handler │ ← Authentication, Connection Mgmt
├───────────────────────────────────────┤
│ Query Cache (Removed 8.0) │ ← Instant returns on hits
├───────────────────────────────────────┤
│ Parser │ ← Syntax analysis, AST building
├───────────────────────────────────────┤
│ Optimizer │ ← Execution plans, index selection
├───────────────────────────────────────┤
│ Executor │ ← API calls to the engine
├───────────────────────────────────────┤
│ Storage Engines (Pluggable) │
│ ┌──────┐ ┌──────┐ ┌────────┐ │
│ │InnoDB│ │MyISAM│ │Memory │ │
│ └──────┘ └──────┘ └────────┘ │
└───────────────────────────────────────┘
MySQL uses a Pluggable Storage Engine Architecture. The "Server Layer" handles everything cross-engine (SQL parsing, optimization, permissions), while the "Storage Engine Layer" is responsible for the actual storage and retrieval of data on disk/memory.
InnoDB vs. MyISAM
| Feature | InnoDB | MyISAM |
|---|---|---|
| Transactions | ✅ ACID Compliant | ❌ Not supported |
| Locking | Row-level locking | Table-level locking |
| Foreign Keys | ✅ Supported | ❌ Not supported |
| MVCC | ✅ Supported | ❌ Not supported |
| Crash Recovery | ✅ Redo logs | ❌ High risk of corruption |
| Index Type | Clustered (Index + Data) | Non-clustered (Index points to data) |
| Caching | Caches Data AND Indices | Caches ONLY Indices |
| Default | Yes (5.5+) | No |
InnoDB is the industry standard for almost all production scenarios due to its reliability and high-concurrency support.
Core Components of InnoDB
- Clustered Index: Data is physically organized by the Primary Key. The leaf node of the Primary Key index is the data row itself.
- MVCC (Multi-Version Concurrency Control): Allows reads and writes to happen simultaneously without blocking each other.
- Redo Log: A "Write-Ahead Logging" (WAL) mechanism that ensures durability. Changes are written to the log first so they can be recovered if the system crashes before data pages hit the disk.
- Buffer Pool: A giant chunk of memory used to cache data and index pages. It minimizes disk I/O by keeping "hot" data in RAM.
The Lifecycle of a SQL Statement
SELECT * FROM users WHERE id = 1;
- Connection: Authenticate and check permissions.
- Parsing: Break down the string. Does
usersexist? Isida valid column? Build an Abstract Syntax Tree (AST). - Optimization: Should I use the
idindex? Or a full table scan? The optimizer picks the cheapest path. - Execution: The executor calls the InnoDB API.
- Check the Buffer Pool (RAM).
- If miss, load from the
.ibdfile on disk to RAM. - Return the row.
Deep Technical Insights
Why Row-Level Locking Wins
In MyISAM, if a user updates one row, the entire table is locked. No one else can read or write until it's done. In InnoDB, only the specific row (or range) is locked. This is the difference between a library that only lets one person in at a time vs. a library where multiple people can read different books simultaneously.
The Power of WAL (Write-Ahead Logging)
Writing to a data file on disk is an "expensive" random I/O operation (the pointer has to move to specific blocks). Writing to a Redo Log is an "inexpensive" sequential I/O (just append at the end). By using Redo Logs, MySQL can instantly confirm a transaction as committed while deferring the heavy work of updating data files until later.