Indexing Principles
What is an Index?
An index is a specialized data structure that helps MySQL locate data efficiently. Without an index, the engine must perform a "Full Table Scan" (checking every single row). With an index, it can jump straight to the relevant data.
The Book Metaphor: An index is the Table of Contents at the back of a 1,000-page book. Instead of flipping through every page to find "B+ Trees," you check the index, see "Page 567," and go there directly.
Why B+ Trees?
InnoDB uses B+ Trees as its primary indexing structure. Why not Binary Trees, Red-Black Trees, or Hash Tables?
Why not Red-Black Trees?
Binary trees are too deep. For 1 million rows, a Red-Black tree has a height of 20. Every level usually requires a distinct disk I/O. 20 I/Os for one row is unacceptable. A B+ Tree, being "wide and flat" (multi-way), can store millions of rows in just 34 levels.
Why not Hash Indexes?
Hash indexes are extremely fast ($O(1)$) but only support exact matches (=, IN). They cannot handle range queries (<, >, BETWEEN), sorting (ORDER BY), or prefix matching. A B+ Tree treats all these operations natively.
B-Tree vs. B+ Tree
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Storage | Every node stores keys + full row data | Only leaf nodes store data |
| Node Layout | Thick (fewer keys per node) | Thin (many keys per node) |
| Query Path | Varies (might stop at middle node) | Always travels to leaf (Stable) |
| Scanning | Requires complex tree traversal | Linked List on leaves (Sequential) |
B+ Tree Topology:
[10 | 20 | 30] ← Non-leaf (Pointers only)
╱ │ ╲
[3|5|8] [12|15|18] [22|25|28]
╱ │ ╲ ╱ │ ╲ ╱ │ ╲
Data Data Data Data Data Data Data Data ← Leaf (Full Row/PK)
◀═════════════════════════════════════▶ Double Linked List
Fan-out Advantage: Because non-leaf nodes in a B+ Tree don't store row data, they can hold thousands of child pointers. This massive "fan-out" keeps the tree extremely short, minimizing expensive disk disk seek operations.
InnoDB Index Types
Clustered Index (Primary Key Index)
In InnoDB, the table is the index. Data rows are physically sorted by the Primary Key. The leaf node of a clustered index contains the complete data row. There can be only one clustered index per table.
Secondary Index (Non-Clustered)
The leaf node of a secondary index (e.g., an index on email) does not store the full row. Instead, it stores the Primary Key value.
The "Double Lookup" (Bookmark Lookup):
If you search by email, MySQL finds the email in the secondary index, retrieves the id (Primary Key), and then performs a second search in the Clustered Index to get the actual row. This second step is called a "Key Lookup" or "Returning to the Table" (回表).
Architectural Insights
The Magic of the Linked List
The linked list connecting leaf nodes is what makes BETWEEN 10 AND 50 so fast. Once MySQL finds the first match at a leaf node, it simply follows the linked list to harvest the rest of the range. No tree re-traversal required.
Page Size and Pointers
By default, InnoDB pages are 16KB. A typical 8-byte pointer + 8-byte key means a single non-leaf page can point to ~1,000 children. $1000^3 = 1$ billion. This is why B+ Trees stay so shallow despite massive data volumes.