Union-Find (Disjoint Set Union)
mediumGraphUnion-FindDSUConnectivity
What is Union-Find?
Union-Find (also known as Disjoint Set Union / DSU) is a specialized data structure designed to solve dynamic connectivity problems efficiently: determining if two elements belong to the same set and merging two distinct sets.
Core Operations:
find(x): Locates the root representative ("representative element") of the set containingx.union(x, y): Merges the sets containingxandyinto a single set.
Union by Rank + Path Compression
By combining these two optimizations, the time complexity for both operations becomes nearly O(1) (specifically, amortized O($\alpha(n)$), where $\alpha$ is the inverse Ackermann function).
class UnionFind {
private final int[] parent;
private final int[] rank; // Union by Rank: estimation of tree height
private int count; // Number of disjoint components
public UnionFind(int n) {
parent = new int[n];
rank = new int[n];
count = n;
for (int i = 0; i < n; i++) {
parent[i] = i; // Every node is its own parent initially
}
}
// Path Compression: Make every node in the search path point directly to the root
public int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
// Union by Rank: Attach the smaller tree under the larger tree's root
public boolean union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return false; // Already in the same set
if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else {
parent[rootY] = rootX;
rank[rootX]++;
}
count--;
return true;
}
public boolean connected(int x, int y) {
return find(x) == find(y);
}
public int getCount() { return count; }
}
Classical Problem 1: Number of Islands (Union-Find approach)
While DFS is more common, Union-Find is useful for large/dynamic grids:
int numIslands(char[][] grid) {
if (grid == null || grid.length == 0) return 0;
int rows = grid.length, cols = grid[0].length;
UnionFind uf = new UnionFind(rows * cols);
int waterCount = 0;
int[] dr = {0, 1}, dc = {1, 0}; // Check right and down to avoid redundancy
for (int r = 0; r < rows; r++) {
for (int c = 0; c < cols; c++) {
if (grid[r][c] == '0') {
waterCount++;
continue;
}
for (int d = 0; d < 2; d++) {
int nr = r + dr[d], nc = c + dc[d];
if (nr < rows && nc < cols && grid[nr][nc] == '1') {
uf.union(r * cols + c, nr * cols + nc);
}
}
}
}
// Result = Total slots - water slots - (union operations performed automatically reduced count)
return uf.getCount() - waterCount;
}
Classical Problem 2: Redundant Connection (LeetCode 684)
In an undirected graph, find an edge that could be removed such that the remaining graph is a tree (acyclic and connected).
int[] findRedundantConnection(int[][] edges) {
int n = edges.length;
UnionFind uf = new UnionFind(n + 1); // Nodes are 1-indexed
for (int[] edge : edges) {
// If union fails, it means the two nodes were already connected.
// Adding this edge creates a cycle.
if (!uf.union(edge[0], edge[1])) {
return edge;
}
}
return new int[0];
}
Classical Problem 3: Accounts Merge (LeetCode 721)
Merge email accounts that belong to the same person based on shared email addresses.
public List<List<String>> accountsMerge(List<List<String>> accounts) {
// Treat emails as nodes. If two emails appear in the same account list, union them.
// Use a HashMap to map email -> ID for the UnionFind structure.
// ... logic omitted for brevity, implementation strategy similar to DSU components ...
// Steps:
// 1. Map each email to a unique index.
// 2. Perform unions across emails in the same account.
// 3. Group emails by their UnionFind root.
return new ArrayList<>();
}
Union-Find vs. BFS/DFS
| Context | Recommended Solution | Why? |
|---|---|---|
| Connectivity only (Static Graph) | Union-Find | Faster, compact code. |
| Path finding (Finding the way) | BFS/DFS | Union-Find doesn't store paths. |
| Dynamically adding edges | Union-Find | Optimal for online updates; BFS/DFS would need full rebuild. |
| Shortest Path | BFS | BFS inherently finds Layer 0 $\rightarrow$ Target. |
Advanced Variants
- Weighted Union-Find: The
rankorparentstores a value representing distance or probability (e.g., LeetCode 399: Evaluate Division). - Undoable DSU: Union by rank without path compression allows $O(log N)$ rollbacks.
- Cycle Detection: As seen in "Redundant Connection," DSU is the standard for detecting cycles in undirected graphs.
Summary
- Path Compression flattens the tree during
find, making future lookups $O(1)$. - Union by Rank ensures the tree remains balanced, preventing it from degenerating into a linked list.
- Use when you need to answer: "Are these two things connected?" or "How many separate groups exist?"