正在切换页面...

Indexing Principles

hardMySQLIndexingB+ TreeHash IndexUpdated

What is an Index?

An index is a specialized data structure that helps MySQL locate data efficiently. Without an index, the engine must perform a "Full Table Scan" (checking every single row). With an index, it can jump straight to the relevant data.

The Book Metaphor: An index is the Table of Contents at the back of a 1,000-page book. Instead of flipping through every page to find "B+ Trees," you check the index, see "Page 567," and go there directly.

Why B+ Trees?

InnoDB uses B+ Trees as its primary indexing structure. Why not Binary Trees, Red-Black Trees, or Hash Tables?

Why not Red-Black Trees?

Binary trees are too deep. For 1 million rows, a Red-Black tree has a height of ~~20. Every level usually requires a distinct disk I/O. 20 I/Os for one row is unacceptable. A B+ Tree, being "wide and flat" (multi-way), can store millions of rows in just 3~~4 levels.

Why not Hash Indexes?

Hash indexes are extremely fast ($O(1)$) but only support exact matches (=, IN). They cannot handle range queries (<, >, BETWEEN), sorting (ORDER BY), or prefix matching. A B+ Tree treats all these operations natively.

B-Tree vs. B+ Tree

Feature	B-Tree	B+ Tree
Data Storage	Every node stores keys + full row data	Only leaf nodes store data
Node Layout	Thick (fewer keys per node)	Thin (many keys per node)
Query Path	Varies (might stop at middle node)	Always travels to leaf (Stable)
Scanning	Requires complex tree traversal	Linked List on leaves (Sequential)

B+ Tree Topology:
                    [10 | 20 | 30]           ← Non-leaf (Pointers only)
                   ╱     │      ╲
          [3|5|8]    [12|15|18]    [22|25|28]
         ╱  │  ╲    ╱   │   ╲     ╱   │   ╲
        Data Data Data Data Data Data Data Data  ← Leaf (Full Row/PK)
        ◀═════════════════════════════════════▶  Double Linked List

Fan-out Advantage: Because non-leaf nodes in a B+ Tree don't store row data, they can hold thousands of child pointers. This massive "fan-out" keeps the tree extremely short, minimizing expensive disk disk seek operations.

InnoDB Index Types

Clustered Index (Primary Key Index)

In InnoDB, the table is the index. Data rows are physically sorted by the Primary Key. The leaf node of a clustered index contains the complete data row. There can be only one clustered index per table.

Secondary Index (Non-Clustered)

The leaf node of a secondary index (e.g., an index on email) does not store the full row. Instead, it stores the Primary Key value.

The "Double Lookup" (Bookmark Lookup): If you search by email, MySQL finds the email in the secondary index, retrieves the id (Primary Key), and then performs a second search in the Clustered Index to get the actual row. This second step is called a "Key Lookup" or "Returning to the Table" (回表).

Architectural Insights

The Magic of the Linked List

The linked list connecting leaf nodes is what makes BETWEEN 10 AND 50 so fast. Once MySQL finds the first match at a leaf node, it simply follows the linked list to harvest the rest of the range. No tree re-traversal required.

Page Size and Pointers

By default, InnoDB pages are 16KB. A typical 8-byte pointer + 8-byte key means a single non-leaf page can point to ~1,000 children. $1000^3 = 1$ billion. This is why B+ Trees stay so shallow despite massive data volumes.