HashMap Internal Principles
hardHashMapHashingRedBlackTreePerformance
The HashMap is the most versatile tool in the Java professional's arsenal. In JDK 1.8+, it evolved into a sophisticated Array + Linked List + Red-Black Tree hybrid architecture designed to maintain $O(1)$ performance even under collision-heavy conditions.
1. The Architectural Blueprint
A HashMap consists of an array of Node<K,V> buckets. Each bucket handles collisions via:
- Linked List: Used for the first few collisions.
- Red-Black Tree (Treeification): When a bucket's length exceeds 8 AND the total map capacity is at least 64, the list is transformed into a Red-Black Tree to prevent $O(n)$ performance degradation (Worst-case search becomes $O(\log n)$).
2. The Power of 2 Rationale
HashMap capacity is always $2^n$ (e.g., 16, 32, 64). This is used to optimize the index calculation:
- Index Formula:
(n - 1) & hash - Why it works: When $n$ is a power of 2, $(n-1)$ is a bitmask of all 1s (e.g., $16-1 = 1111$ in binary). Bitwise ANDing with the hash is an ultra-fast alternative to the modulo operator (
%).
3. Resizing and Expansion: The High-Bit Trick
When the map reaches its Load Factor (default 0.75), it doubles its capacity. JDK 1.8 introduced a genius optimization for the resize:
- No Re-hashing: Instead of calculating
hash & (newCap - 1)for every element, the map performs a single-bit check:hash & oldCap. - Outcome:
- If the result is 0, the element stays at its original index
i. - If the result is non-zero, the element moves to
i + oldCap. This avoids full $O(n)$ re-calculation and keeps the distribution uniform.
- If the result is 0, the element stays at its original index
4. Key Takeaways for Production
- Initial Capacity: Always calculate
ExpectedSize / 0.75 + 1to avoid expensive resizing. - HashCode and Equals: You MUST override both. If two keys are equal, they must have the same
hashCode. - Thread Safety:
HashMapis NOT thread-safe. UseConcurrentHashMapfor multi-threaded environments. Excessive writing in a rawHashMapfrom multiple threads can cause Infinite Loops (in JDK 1.7) or Data Loss (in JDK 1.8).