Hash Table Principles and Applications
Core Concept of Hash Tables
A Hash Table is a data structure designed to perform insertion, deletion, and search operations in O(1) average time, trading off extra memory for speed.
The central idea is Time-Space Trade-off: use a hash function to map Keys to array indices, enabling direct access and eliminating the O(n) overhead of sequential comparison.
key "alice" → hash("alice") % 8 = 3 → Stored at array[3]
key "bob" → hash("bob") % 8 = 5 → Stored at array[5]
key "carol" → hash("carol") % 8 = 1 → Stored at array[1]
Lookup "bob": hash("bob") % 8 = 5, direct access array[5] → O(1)
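The mapping step above can be sketched in a few lines of Java. Note that `Math.floorMod` is used instead of `%` because `hashCode()` may return a negative number; the slot values shown earlier are illustrative, not the actual JDK hash values.

```java
// Minimal sketch of the key-to-index mapping for an 8-slot table.
// Math.floorMod keeps the index non-negative even when hashCode() is negative.
int capacity = 8;
String key = "bob";
int index = Math.floorMod(key.hashCode(), capacity);
// index is guaranteed to fall in [0, 8), so the entry can live at array[index]
```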
Hash Functions
A high-quality hash function satisfies:
- Determinism: The same key always yields the same hash value.
- Uniformity: Spreads keys evenly across the available array range.
- Efficiency: Computes hash values rapidly.
Common Function (Integer Keys):
h(k) = k mod m (m is typically a prime to reduce periodic collisions)
Java's String.hashCode() uses polynomial rolling hash:
// Simplified logic of Java String hashing
int hash = 0;
for (char c : s.toCharArray()) {
hash = hash * 31 + c;
}
// 31 is prime; 31 * i = 32 * i - i = (i << 5) - i, which the JVM can optimize into a shift and a subtraction
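The simplified loop really does reproduce the JDK's result. A quick check (using `"abc"`, whose well-known hash code is 96354):

```java
// Recompute the polynomial rolling hash by hand and compare it
// against the JDK's own String.hashCode().
String s = "abc";
int hash = 0;
for (char c : s.toCharArray()) {
    hash = hash * 31 + c; // h = h * 31 + c at each step
}
// ((0*31 + 'a')*31 + 'b')*31 + 'c' = 96354, matching "abc".hashCode()
```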
Hash Collisions and Resolutions
A Hash Collision occurs when two distinct keys hash to the same array index. While collisions are inevitable due to the Pigeonhole Principle, they are manageable.
Separate Chaining
Each array slot points to a linked list. Colliding elements are appended to the list at that slot.
array[3]: → [("alice", 1)] → null
array[5]: → [("bob", 2)] → [("eve", 7)] → null ← Collision resolved!
array[1]: → [("carol", 3)] → null
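The chaining scheme can be sketched as a tiny table class (`ChainedTable` and `Node` are illustrative names, not HashMap's internals; there is no resizing or tree conversion here):

```java
// Minimal separate-chaining hash table: each bucket holds the head
// of a singly linked list of entries.
class ChainedTable {
    static class Node {
        String key; int value; Node next;
        Node(String k, int v, Node n) { key = k; value = v; next = n; }
    }

    Node[] buckets = new Node[8];

    int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // update existing key
        }
        buckets[i] = new Node(key, value, buckets[i]); // prepend new entry to the chain
    }

    Integer get(String key) {
        for (Node n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null; // key absent
    }
}
```

Colliding keys simply share a bucket; lookup walks the short chain, which stays O(1) on average while the load factor is kept low.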
Java's HashMap implements this with an optimization:
When a list's length $\ge 8$ and array capacity $\ge 64$, the list converts into a Red-Black Tree, reducing the worst-case lookup from O(n) to O(log n).
// HashMap internal threshold constants
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
Open Addressing
Upon collision, search for the next available empty slot within the array itself.
Linear Probing: Search for the next available slot sequentially.
Array: [_, _, alice, _, bob, eve, _, _]
Index:  0  1    2    3   4    5   6  7
Collision at index 5? Try 6, then 7, then wrap around to 0... until an empty slot is found.
Disadvantage: Clustering—collisions form continuous sequences, making subsequent collisions more frequent.
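Linear probing can be sketched as follows (`ProbingTable` is an illustrative name; this sketch omits resizing and deletion, so it assumes the table never fills up):

```java
// Minimal open-addressing hash table with linear probing.
class ProbingTable {
    String[] keys = new String[8];
    int[] values = new int[8];

    void put(String key, int value) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // collision: step to the next slot
        }
        keys[i] = key;
        values[i] = value;
    }

    Integer get(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length; // keep probing past other keys
        }
        return null; // reached an empty slot: key absent
    }
}
```

Deletion is the subtle part of open addressing in practice: naively clearing a slot would break probe chains, which is why real implementations use tombstones or re-insertion.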
Load Factor
Load Factor = $(\text{number of stored elements}) / (\text{array capacity})$.
Java's HashMap defaults to 0.75. When the number of elements exceeds 75% of capacity, it triggers a Resize: the capacity is doubled, and all elements are rehashed (O(n) overhead).
0.75 is the sweet spot between time and space: too low (e.g., 0.5) wastes memory; too high (e.g., 0.9) increases collisions and degrades performance.
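The threshold arithmetic with HashMap's defaults:

```java
// Resize threshold for HashMap's default settings.
int capacity = 16;
float loadFactor = 0.75f;
int threshold = (int) (capacity * loadFactor); // 16 * 0.75 = 12
// Inserting the 13th element exceeds the threshold and triggers a resize:
// capacity doubles to 32 and every entry is rehashed into the larger array.
```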
Java HashMap Internal Implementation
// Default initial capacity is 16 (always a power of 2 for bitwise modulo optimization)
// h & (capacity - 1) is equivalent to h % capacity when capacity is a power of 2.
Why power of 2?
The modulo operation h % m can be replaced by the faster bitwise h & (m-1) if $m$ is a power of 2.
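The equivalence is easy to verify empirically for non-negative values of `h` (HashMap's index computation only ever feeds non-negative masked values into the `&`):

```java
// Check h % m == h & (m - 1) for a power-of-2 m across a range of inputs.
int m = 16; // power of 2, so m - 1 = 0b1111 simply masks the low 4 bits
boolean allEqual = true;
for (int h = 0; h < 1000; h++) {
    if ((h % m) != (h & (m - 1))) {
        allEqual = false; // never reached for non-negative h
    }
}
```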
HashMap Put Workflow:
- Compute hash: (h = key.hashCode()) ^ (h >>> 16) (mixes high and low bits to reduce collisions).
- Compute index: hash & (capacity - 1).
- If the slot is empty: insert immediately.
- If the slot is occupied:
  - If the keys are identical (equals()): update the value.
  - If different and the bucket is a list: append at the tail (Java 8+).
  - If different and the bucket is a tree: insert into the Red-Black Tree.
- If the total size exceeds the threshold, resize.
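The hash-mixing step matters because the index computation only looks at the low bits. A small demonstration with two hash codes that differ only in their high bits:

```java
// Why HashMap XORs in the high bits: with capacity 16, the index uses only
// the low 4 bits, so hashes differing only in high bits would always collide.
int capacity = 16;
int h1 = 0x10000, h2 = 0x20000;                    // differ only above bit 4
int naive1 = h1 & (capacity - 1);                  // 0
int naive2 = h2 & (capacity - 1);                  // 0 -> collision
int spread1 = (h1 ^ (h1 >>> 16)) & (capacity - 1); // 1
int spread2 = (h2 ^ (h2 >>> 16)) & (capacity - 1); // 2 -> no collision
```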
Core Java Hash Structures
HashMap: Key-Value Mapping
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.getOrDefault("b", 0); // Return default if key missing
map.merge("a", 1, Integer::sum); // Increment if exists, else initialize
map.containsKey("a");
// Iteration
for (Map.Entry<String, Integer> e : map.entrySet()) {
System.out.println(e.getKey() + ": " + e.getValue());
}
HashSet: Element De-duplication
Set<Integer> set = new HashSet<>();
set.add(1);
set.contains(1); // O(1)
set.remove(1);
LinkedHashMap: Preserve Insertion Order
Map<String, Integer> lruMap = new LinkedHashMap<>(16, 0.75f, true) {
// If accessOrder=true, entries are sorted by access recency (LRU baseline)
@Override
protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
return size() > 10; // Auto-evict oldest once capacity exceeds 10
}
};
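Exercising an access-ordered LinkedHashMap of this shape as a tiny LRU cache (capacity 3 here for brevity):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap acting as a 3-entry LRU cache.
Map<String, Integer> lru = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > 3; // evict the least recently used entry beyond 3
    }
};
lru.put("a", 1);
lru.put("b", 2);
lru.put("c", 3);
lru.get("a");    // touching "a" makes "b" the least recently used
lru.put("d", 4); // exceeds capacity 3 -> evicts "b"
```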
TreeMap: Key-Ordered
TreeMap<Integer, String> treeMap = new TreeMap<>();
treeMap.put(3, "c");
treeMap.put(1, "a");
treeMap.firstKey(); // 1 (Smallest key)
treeMap.floorKey(2); // 1 (Largest key <= 2)
treeMap.ceilingKey(2); // 3 (Smallest key >= 2)
// Built on Red-Black Tree; lookup is O(log n).
Common Architectural Patterns
Pattern 1: Two Sum (Complement Finding)
// Time O(n), Space O(n)
int[] twoSum(int[] nums, int target) {
Map<Integer, Integer> seen = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
int complement = target - nums[i];
if (seen.containsKey(complement)) {
return new int[]{seen.get(complement), i};
}
seen.put(nums[i], i);
}
return new int[]{};
}
Pattern 2: Frequency Counting
// Check if t is an anagram of s (same characters, different order)
boolean isAnagram(String s, String t) {
if (s.length() != t.length()) return false;
int[] count = new int[26]; // Array is faster than HashMap for small, fixed ranges
for (char c : s.toCharArray()) count[c - 'a']++;
for (char c : t.toCharArray()) count[c - 'a']--;
for (int n : count) if (n != 0) return false;
return true;
}
Pattern 3: Grouping (Characteristic Aggregation)
List<List<String>> groupAnagrams(String[] strs) {
Map<String, List<String>> groups = new HashMap<>();
for (String s : strs) {
char[] chars = s.toCharArray();
Arrays.sort(chars); // Sorted string serves as a unique key for anagrams
String key = new String(chars);
groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
}
return new ArrayList<>(groups.values());
}
Complexity Summary
| Operation | Average | Worst (Total Collision) |
|---|---|---|
| Insertion | O(1) | O(n) |
| Search | O(1) | O(n) (Linked List) / O(log n) (Tree Optimization) |
| Deletion | O(1) | O(n) |
| Space | O(n) | O(n) |
A hash table's O(1) is an expected average-case bound (amortized across resizes), not a guarantee. Performance can degrade under adversarial inputs (Hash DoS). Java's Red-Black Tree fallback mitigates this effectively.
Summary
The essence of a hash table is mapping any Key to an integer index via a hash function for direct indexing.
Critical Design choices:
- Chaining vs. Open Addressing: Chaining is easier to implement; Open Addressing offers better data locality.
- Expansion Threshold: Java's 0.75 provides a balance; expansion is O(n).
- Java 8 Optimization: Red-Black tree conversion prevents O(n) worst-case lookups.
Engineering Pro-Tip: "If you need to check if a value has appeared before or track attributes of encountered elements quickly, use a HashMap."