Graph Algorithms: DFS, BFS, and Union Find
2025-04-16 Hanlun Wang
Graph algorithms form the foundation of many complex problem-solving techniques in computer science. This article explores three fundamental graph traversal and manipulation algorithms: Depth-First Search (DFS), Breadth-First Search (BFS), and Union Find. We'll examine their implementations, applications, and differences through practical LeetCode problems.
Problem 1: Number of Islands
Let's start with LeetCode Problem 200: Number of Islands.
Problem Description
Given an m x n
2D binary grid grid
which represents a map of '1'
s (land) and '0'
s (water), return the number of islands.
An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.
Example 1:
Example 2:
Core Approach
The core idea for solving this problem is to traverse the entire grid. When we find an unvisited land cell ('1'), we increment our island count and mark all connected land cells as visited. This way, when we discover another unvisited land cell, we know it belongs to a different island.
This approach raises two key questions:
- How do we mark cells as visited?
- How do we find all connected land cells?
Marking Visited Cells
For the first question, there are two common approaches:
-
In-place modification: Change the value in the original grid (e.g., convert '1' to '0' or add a marker like '-'). This is space-efficient but modifies the input.
-
Using a separate data structure: Create a separate grid or set to track visited positions. This preserves the input but requires additional space.
For most graph problems, if possible, using the first method saves space complexity, so we'll use that approach.
Finding Connected Land Cells
For the second question, we have two classic approaches:
1. Depth-First Search (DFS)
DFS explores as deeply as possible along each branch before backtracking. When we find a land cell ('1'), we immediately explore one of its neighbors, then that neighbor's neighbors, and so on recursively. We follow one path as far as possible before trying alternative paths.
In DFS, the exploration pattern resembles diving deep into one direction first:
- Start at a land cell
- Mark it as visited
- Recursively explore all its unvisited neighbors
- When no more unvisited neighbors exist, backtrack and explore other paths
Let's examine the DFS process in detail:
This recursive approach elegantly handles the depth-first traversal without needing an explicit stack data structure.
2. Breadth-First Search (BFS)
BFS explores all neighbors at the current depth level before moving deeper. Instead of immediately diving into a newly discovered land cell as in DFS, BFS records all possible paths (typically using a queue) and visits them one by one.
The BFS exploration pattern is level by level:
- Start at a land cell
- Mark it as visited
- Enqueue all its unvisited neighbors
- Process the queue one by one, marking and enqueueing new neighbors
Let's examine the BFS process in detail:
Unlike DFS which uses the call stack for recursion, BFS explicitly uses a queue data structure to manage the traversal order.
DFS Solution
The time complexity is O(m×n) where m and n are the grid dimensions, as we potentially visit each cell once. The space complexity is O(m×n) in the worst case due to the recursion stack depth when the grid is filled with land cells in a specific pattern that maximizes stack usage.
BFS Solution
Like the DFS solution, the BFS solution has a time complexity of O(m×n). The space complexity is also O(m×n) in the worst case, which occurs when many land cells are at the same distance from the starting point, causing the queue to grow large.
Comparing DFS and BFS
For the island counting problem, both DFS and BFS are effective:
-
DFS: Leverages recursion to explore paths deeply before backtracking. It navigates from one land cell to the next immediately, following a path as far as possible.
-
BFS: Uses a queue to explore cells level by level. It records all potentially explorable paths first, then processes them one by one.
The key differences are:
- DFS may be more intuitive for this problem and often requires less code
- BFS is often better for finding the shortest path (though not needed here)
- DFS can lead to stack overflow for very large grids due to deep recursion
- BFS tends to use more memory for the queue in the worst case
Union Find
Before explaining how Union Find can solve the Island problem, let's first explore Union Find through another problem.
Problem 2: Redundant Connection
LeetCode Problem 684: Redundant Connection
Problem Description
In this problem, a tree is an undirected graph that is connected and has no cycles.
You are given a graph that started as a tree with n nodes labeled from 1 to n, with one additional edge added. The added edge has two different vertices chosen from 1 to n, and was not an edge that already existed. The graph is represented as an array edges of length n where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the graph.
Return an edge that can be removed so that the resulting graph is a tree of n nodes. If there are multiple answers, return the answer that occurs last in the input.
Example 1:
Example 2:
Core Approach Using Union Find
Union Find (also known as Disjoint Set) is especially well-suited for problems where we need to identify elements that share a common characteristic.
In this problem, we need to find which edge is redundant. An edge is redundant if it connects two nodes that are already connected through some path. Adding such an edge creates a cycle in the graph, which violates the definition of a tree.
The key question is: how do we determine if two nodes are already connected? A straightforward approach is to find their "root" nodes (also called representatives of their sets). If they share the same root, they're already connected.
Union Find uses two main operations:
- Find: Determine which set (or component) an element belongs to
- Union: Merge two sets together
Union Find Implementation in Detail
First, let's initialize the Union Find data structure:
The Find operation uses path compression to locate a node's root efficiently:
Path compression is a crucial optimization that flattens the tree structure during traversal, reducing the time complexity of future operations.
The Union operation merges two sets by rank to maintain a balanced tree structure:
Union by rank ensures that the tree doesn't get too deep, which keeps the Find operation efficient.
Complete Solution for Redundant Connection
Let's walk through example 2 step by step:
- Initially: parent = [0,1,2,3,4,5], rank = [0,1,1,1,1,1]
- Process [1,2]:
- Find roots: 1 and 2 (their own parents)
- Union: parent = [0,1,1,3,4,5], rank = [0,2,1,1,1,1]
- Process [2,3]:
- Find roots: 1 (parent of 2) and 3
- Union: parent = [0,1,1,1,4,5], rank = [0,3,1,1,1,1]
- Process [3,4]:
- Find roots: 1 (parent of 3) and 4
- Union: parent = [0,1,1,1,1,5], rank = [0,4,1,1,1,1]
- Process [1,4]:
- Find roots: 1 and 1 (parent of 4)
- They're already connected, so this edge is redundant
- Return [1,4]
The time complexity is O(n α(n)) where α is the inverse Ackermann function, which grows extremely slowly and is effectively constant for all practical purposes. The space complexity is O(n) for the parent and rank arrays.
Applying Union Find to the Number of Islands Problem
Union Find can also elegantly solve the Number of Islands problem:
How Union Find Works for Island Counting - A Detailed Example
The Union Find approach to the island problem is particularly elegant:
- We start by considering each land cell as its own island
- We maintain a count of total islands (initially equal to the number of land cells)
- As we discover adjacent land cells, we merge their islands using Union
- Whenever we perform a successful union, we decrease our island count
- The final count represents the number of distinct islands
Let's walk through a detailed example:
First, we map this 2D grid to a 1D array (0-5) for easier processing:
Now we initialize:
- parent = [0,1,-1,3,-1,5] (-1 for water cells)
- rank = [0,0,0,0,0,0]
- count = 4 (number of land cells)
Processing step by step:
-
Cell (0,0) at index 0:
- Mark as visited
- Check right neighbor (0,1) at index 1: it's land
- Union(0, 1): parent becomes [0,0,-1,3,-1,5], count = 3
- Check bottom neighbor (1,0) at index 3: it's land
- Union(0, 3): parent becomes [0,0,-1,0,-1,5], count = 2
-
Cell (0,1) at index 1:
- Already visited (marked as "0"), so skip
-
Cell (1,0) at index 3:
- Already visited, so skip
-
Cell (2,2) at index 5:
- Mark as visited
- No adjacent land cells
- This remains a separate island
Final state:
- parent = [0,0,-1,0,-1,5]
- count = 2 (representing 2 distinct islands)
The Union Find solution has a time complexity of O(m×n α(m×n)), which is essentially O(m×n) for all practical purposes. The space complexity is O(m×n) for the parent and rank arrays.
Comprehensive Comparison of the Three Approaches
Let's compare DFS, BFS, and Union Find for the Number of Islands problem:
-
DFS:
- Algorithm: Recursive, explores deeply before backtracking
- Implementation: Uses the call stack for recursion
- Traversal Pattern: Follows one path completely before trying alternatives
- Time Complexity: O(m×n)
- Space Complexity: O(m×n) worst case (deep recursion)
- Advantages: Intuitive implementation, good for maze-like exploration
- Disadvantages: Risk of stack overflow for large grids
-
BFS:
- Algorithm: Iterative, explores level by level
- Implementation: Uses a queue data structure
- Traversal Pattern: Explores all nodes at current depth before going deeper
- Time Complexity: O(m×n)
- Space Complexity: O(min(m,n)) average case, O(m×n) worst case
- Advantages: Finds shortest paths, avoids stack overflow
- Disadvantages: Slightly more complex implementation
-
Union Find:
- Algorithm: Set-based, tracks connected components
- Implementation: Uses parent and rank arrays with path compression
- Traversal Pattern: No explicit traversal; merges sets as connections are discovered
- Time Complexity: O(m×n α(m×n)) ≈ O(m×n)
- Space Complexity: O(m×n)
- Advantages: Excellent for dynamic connectivity queries
- Disadvantages: More complex implementation, less intuitive
Advanced Considerations: When to Use Each Algorithm
-
DFS:
- Best for exploring all possible paths
- Ideal for problems requiring exhaustive search
- Good for detecting cycles in directed graphs
- Memory-efficient when graph is sparse and deep
-
BFS:
- Optimal for finding shortest paths in unweighted graphs
- Preferred for level-order processing
- Better for graphs with high branching factor
- More predictable memory usage
-
Union Find:
- Excels at dynamic connectivity problems
- Ideal for checking if two elements are connected
- Perfect for incremental processing of graph edges
- Efficient for merging connected components
- Often used in Kruskal's algorithm for minimum spanning trees
Conclusion
Graph algorithms are powerful tools for solving complex problems across various domains. Understanding the nuances of DFS, BFS, and Union Find is essential for selecting the right approach for different problem types.
DFS dives deep, BFS explores broadly, and Union Find efficiently tracks connectivity patterns. Each has its strengths and optimal use cases. By mastering these fundamental techniques, you'll be well-equipped to tackle a wide range of graph-related challenges, from network analysis to game development, from pathfinding to connectivity problems.
The ability to choose the right algorithm for a specific problem is a hallmark of an experienced developer. These three powerful graph algorithms form the foundation of many complex problem-solving techniques and will serve you well across countless programming challenges.