My Blog

Graph Algorithms: DFS, BFS, and Union Find

2025-04-16 Hanlun Wang

Graph algorithms form the foundation of many complex problem-solving techniques in computer science. This article explores three fundamental graph traversal and manipulation algorithms: Depth-First Search (DFS), Breadth-First Search (BFS), and Union Find. We'll examine their implementations, applications, and differences through practical LeetCode problems.

Problem 1: Number of Islands

Let's start with LeetCode Problem 200: Number of Islands.

Problem Description

Given an m x n 2D binary grid grid which represents a map of '1's (land) and '0's (water), return the number of islands.

An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Example 1:

Input: grid = [
  ["1","1","1","1","0"],
  ["1","1","0","1","0"],
  ["1","1","0","0","0"],
  ["0","0","0","0","0"]
]
Output: 1

Example 2:

Input: grid = [
  ["1","1","0","0","0"],
  ["1","1","0","0","0"],
  ["0","0","1","0","0"],
  ["0","0","0","1","1"]
]
Output: 3

Core Approach

The core idea for solving this problem is to traverse the entire grid. When we find an unvisited land cell ('1'), we increment our island count and mark all connected land cells as visited. This way, when we discover another unvisited land cell, we know it belongs to a different island.

This approach raises two key questions:

  1. How do we mark cells as visited?
  2. How do we find all connected land cells?

Marking Visited Cells

For the first question, there are two common approaches:

  1. In-place modification: Change the value in the original grid (e.g., convert '1' to '0' or add a marker like '-'). This is space-efficient but modifies the input.

  2. Using a separate data structure: Create a separate grid or set to track visited positions. This preserves the input but requires additional space.

For most graph problems, if possible, using the first method saves space complexity, so we'll use that approach.

Finding Connected Land Cells

For the second question, we have two classic approaches:

1. Depth-First Search (DFS)

DFS explores as deeply as possible along each branch before backtracking. When we find a land cell ('1'), we immediately explore one of its neighbors, then that neighbor's neighbors, and so on recursively. We follow one path as far as possible before trying alternative paths.

In DFS, the exploration pattern resembles diving deep into one direction first:

  1. Start at a land cell
  2. Mark it as visited
  3. Recursively explore all its unvisited neighbors
  4. When no more unvisited neighbors exist, backtrack and explore other paths

Let's examine the DFS process in detail:

def dfs(i, j):
    # Base case: if out of bounds or water or already visited, return
    if (i < 0 or i >= rows or
        j < 0 or j >= cols or
        grid[i][j] == '0'):
        return
 
    # Mark current cell as visited by changing it to water
    grid[i][j] = '0'
 
    # Recursively explore all four directions
    dfs(i+1, j)  # down
    dfs(i-1, j)  # up
    dfs(i, j+1)  # right
    dfs(i, j-1)  # left

This recursive approach elegantly handles the depth-first traversal without needing an explicit stack data structure.

2. Breadth-First Search (BFS)

BFS explores all neighbors at the current depth level before moving deeper. Instead of immediately diving into a newly discovered land cell as in DFS, BFS records all possible paths (typically using a queue) and visits them one by one.

The BFS exploration pattern is level by level:

  1. Start at a land cell
  2. Mark it as visited
  3. Enqueue all its unvisited neighbors
  4. Process the queue one by one, marking and enqueueing new neighbors

Let's examine the BFS process in detail:

def bfs(i, j):
    # Initialize queue with starting cell
    queue = deque([(i, j)])
    grid[i][j] = '0'  # Mark starting cell as visited
 
    while queue:
        # Dequeue the next cell to process
        curr_i, curr_j = queue.popleft()
 
        # Define all four directions to explore
        directions = [(1,0), (-1,0), (0,1), (0,-1)]  # down, up, right, left
 
        # Check all four adjacent cells
        for di, dj in directions:
            next_i, next_j = curr_i + di, curr_j + dj
 
            # If valid land cell, mark as visited and enqueue
            if (0 <= next_i < rows and
                0 <= next_j < cols and
                grid[next_i][next_j] == '1'):
                queue.append((next_i, next_j))
                grid[next_i][next_j] = '0'  # Mark as visited when enqueueing

Unlike DFS which uses the call stack for recursion, BFS explicitly uses a queue data structure to manage the traversal order.

DFS Solution

class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        if not grid:
            return 0
 
        count = 0
        rows, cols = len(grid), len(grid[0])
 
        def dfs(i, j):
            # Check boundaries and if current cell is land
            if (i < 0 or i >= rows or
                j < 0 or j >= cols or
                grid[i][j] == '0'):
                return
 
            # Mark current cell as visited
            grid[i][j] = '0'
 
            # Explore all 4 directions
            dfs(i+1, j)  # down
            dfs(i-1, j)  # up
            dfs(i, j+1)  # right
            dfs(i, j-1)  # left
 
        # Iterate through each cell in the grid
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] == '1':
                    count += 1
                    dfs(i, j)
 
        return count

The time complexity is O(m×n) where m and n are the grid dimensions, as we potentially visit each cell once. The space complexity is O(m×n) in the worst case due to the recursion stack depth when the grid is filled with land cells in a specific pattern that maximizes stack usage.

BFS Solution

from collections import deque
 
class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        if not grid:
            return 0
 
        count = 0
        rows, cols = len(grid), len(grid[0])
 
        def bfs(i, j):
            queue = deque([(i, j)])
            grid[i][j] = '0'  # Mark as visited
 
            while queue:
                curr_i, curr_j = queue.popleft()
                # Check all 4 directions
                directions = [(1,0), (-1,0), (0,1), (0,-1)]
 
                for di, dj in directions:
                    next_i, next_j = curr_i + di, curr_j + dj
                    if (0 <= next_i < rows and
                        0 <= next_j < cols and
                        grid[next_i][next_j] == '1'):
                        queue.append((next_i, next_j))
                        grid[next_i][next_j] = '0'  # Mark as visited
 
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] == '1':
                    count += 1
                    bfs(i, j)
 
        return count

Like the DFS solution, the BFS solution has a time complexity of O(m×n). The space complexity is also O(m×n) in the worst case, which occurs when many land cells are at the same distance from the starting point, causing the queue to grow large.

Comparing DFS and BFS

For the island counting problem, both DFS and BFS are effective:

  • DFS: Leverages recursion to explore paths deeply before backtracking. It navigates from one land cell to the next immediately, following a path as far as possible.

  • BFS: Uses a queue to explore cells level by level. It records all potentially explorable paths first, then processes them one by one.

The key differences are:

  • DFS may be more intuitive for this problem and often requires less code
  • BFS is often better for finding the shortest path (though not needed here)
  • DFS can lead to stack overflow for very large grids due to deep recursion
  • BFS tends to use more memory for the queue in the worst case

Union Find

Before explaining how Union Find can solve the Island problem, let's first explore Union Find through another problem.

Problem 2: Redundant Connection

LeetCode Problem 684: Redundant Connection

Problem Description

In this problem, a tree is an undirected graph that is connected and has no cycles.

You are given a graph that started as a tree with n nodes labeled from 1 to n, with one additional edge added. The added edge has two different vertices chosen from 1 to n, and was not an edge that already existed. The graph is represented as an array edges of length n where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the graph.

Return an edge that can be removed so that the resulting graph is a tree of n nodes. If there are multiple answers, return the answer that occurs last in the input.

Example 1:

example 1

Input: edges = [[1,2],[1,3],[2,3]]
Output: [2,3]

Example 2:

example 2

Input: edges = [[1,2],[2,3],[3,4],[1,4],[1,5]]
Output: [1,4]

Core Approach Using Union Find

Union Find (also known as Disjoint Set) is especially well-suited for problems where we need to identify elements that share a common characteristic.

In this problem, we need to find which edge is redundant. An edge is redundant if it connects two nodes that are already connected through some path. Adding such an edge creates a cycle in the graph, which violates the definition of a tree.

The key question is: how do we determine if two nodes are already connected? A straightforward approach is to find their "root" nodes (also called representatives of their sets). If they share the same root, they're already connected.

Union Find uses two main operations:

  1. Find: Determine which set (or component) an element belongs to
  2. Union: Merge two sets together

Union Find Implementation in Detail

First, let's initialize the Union Find data structure:

# Initialize parent array and rank array
par = [i for i in range(len(edges) + 1)]  # Each node is initially its own parent
rank = [1] * (len(edges) + 1)  # Initial rank of each node is 1

The Find operation uses path compression to locate a node's root efficiently:

def find(n):
    """Find the root of node n with path compression"""
    p = par[n]
    while p != par[p]:  # If current node is not the root (not its own parent)
        par[p] = par[par[p]]  # Path compression: point to grandparent
        p = par[p]  # Move up to parent
    return p

Path compression is a crucial optimization that flattens the tree structure during traversal, reducing the time complexity of future operations.

The Union operation merges two sets by rank to maintain a balanced tree structure:

def union(n1, n2):
    """Union by rank to keep the tree balanced"""
    p1, p2 = find(n1), find(n2)
 
    if p1 == p2:  # Already in the same set
        return False
 
    # Union by rank: attach smaller tree to larger tree
    if rank[p1] > rank[p2]:
        par[p2] = p1
        rank[p1] += rank[p2]
    else:
        par[p1] = p2
        rank[p2] += rank[p1]
    return True

Union by rank ensures that the tree doesn't get too deep, which keeps the Find operation efficient.

Complete Solution for Redundant Connection

class Solution:
    def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
        par = [i for i in range(len(edges) + 1)]
        rank = [1] * (len(edges) + 1)
 
        def find(n):
            """Find root with path compression"""
            p = par[n]
            while p != par[p]:
                par[p] = par[par[p]]  # Path compression
                p = par[p]
            return p
 
        def union(n1, n2):
            """Union by rank"""
            p1, p2 = find(n1), find(n2)
 
            if p1 == p2:  # Already connected, adding this edge creates a cycle
                return False
 
            # Union by rank
            if rank[p1] > rank[p2]:
                par[p2] = p1
                rank[p1] += rank[p2]
            else:
                par[p1] = p2
                rank[p2] += rank[p1]
            return True
 
        for n1, n2 in edges:
            if not union(n1, n2):  # If already connected, this edge is redundant
                return [n1, n2]

Let's walk through example 2 step by step:

  1. Initially: parent = [0,1,2,3,4,5], rank = [0,1,1,1,1,1]
  2. Process [1,2]:
    • Find roots: 1 and 2 (their own parents)
    • Union: parent = [0,1,1,3,4,5], rank = [0,2,1,1,1,1]
  3. Process [2,3]:
    • Find roots: 1 (parent of 2) and 3
    • Union: parent = [0,1,1,1,4,5], rank = [0,3,1,1,1,1]
  4. Process [3,4]:
    • Find roots: 1 (parent of 3) and 4
    • Union: parent = [0,1,1,1,1,5], rank = [0,4,1,1,1,1]
  5. Process [1,4]:
    • Find roots: 1 and 1 (parent of 4)
    • They're already connected, so this edge is redundant
    • Return [1,4]

The time complexity is O(n α(n)) where α is the inverse Ackermann function, which grows extremely slowly and is effectively constant for all practical purposes. The space complexity is O(n) for the parent and rank arrays.

Applying Union Find to the Number of Islands Problem

Union Find can also elegantly solve the Number of Islands problem:

class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        if not grid or not grid[0]:
            return 0
 
        count = 0
        m, n = len(grid), len(grid[0])
        parent = []
        rank = []
 
        # Initialize parent and rank arrays
        for i in range(m):
            for j in range(n):
                if grid[i][j] == "1":
                    parent.append(i * n + j)  # Map 2D coordinates to 1D
                    count += 1
                else:
                    parent.append(-1)  # Water cells marked as -1
                rank.append(0)
 
        def find(i: int) -> int:
            """Find root of set with path compression"""
            if parent[i] != i:
                parent[i] = find(parent[i])
            return parent[i]
 
        def union(x: int, y: int) -> None:
            """Union by rank"""
            rootx = find(x)
            rooty = find(y)
 
            if rootx != rooty:
                if rank[rootx] > rank[rooty]:
                    parent[rooty] = rootx
                elif rank[rootx] < rank[rooty]:
                    parent[rootx] = rooty
                else:
                    parent[rooty] = rootx
                    rank[rootx] += 1
                nonlocal count
                count -= 1  # Decrease count when merging two islands
 
        # Process each cell in the grid
        for r in range(m):
            for c in range(n):
                if grid[r][c] == "1":
                    grid[r][c] = "0"  # Mark as visited
 
                    # Check all 4 adjacent cells
                    directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # Up, Down, Left, Right
                    for dr, dc in directions:
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < m and 0 <= nc < n and
                            grid[nr][nc] == "1"):
                            union(r * n + c, nr * n + nc)
 
        return count

How Union Find Works for Island Counting - A Detailed Example

The Union Find approach to the island problem is particularly elegant:

  1. We start by considering each land cell as its own island
  2. We maintain a count of total islands (initially equal to the number of land cells)
  3. As we discover adjacent land cells, we merge their islands using Union
  4. Whenever we perform a successful union, we decrease our island count
  5. The final count represents the number of distinct islands

Let's walk through a detailed example:

Grid:
1 1 0
1 0 1

First, we map this 2D grid to a 1D array (0-5) for easier processing:

Index:            0  1  2  3  4  5
For grid cells:   1  1  0  1  0  1

Now we initialize:

  • parent = [0,1,-1,3,-1,5] (-1 for water cells)
  • rank = [0,0,0,0,0,0]
  • count = 4 (number of land cells)

Processing step by step:

  1. Cell (0,0) at index 0:

    • Mark as visited
    • Check right neighbor (0,1) at index 1: it's land
      • Union(0, 1): parent becomes [0,0,-1,3,-1,5], count = 3
    • Check bottom neighbor (1,0) at index 3: it's land
      • Union(0, 3): parent becomes [0,0,-1,0,-1,5], count = 2
  2. Cell (0,1) at index 1:

    • Already visited (marked as "0"), so skip
  3. Cell (1,0) at index 3:

    • Already visited, so skip
  4. Cell (2,2) at index 5:

    • Mark as visited
    • No adjacent land cells
    • This remains a separate island

Final state:

  • parent = [0,0,-1,0,-1,5]
  • count = 2 (representing 2 distinct islands)

The Union Find solution has a time complexity of O(m×n α(m×n)), which is essentially O(m×n) for all practical purposes. The space complexity is O(m×n) for the parent and rank arrays.

Comprehensive Comparison of the Three Approaches

Let's compare DFS, BFS, and Union Find for the Number of Islands problem:

  1. DFS:

    • Algorithm: Recursive, explores deeply before backtracking
    • Implementation: Uses the call stack for recursion
    • Traversal Pattern: Follows one path completely before trying alternatives
    • Time Complexity: O(m×n)
    • Space Complexity: O(m×n) worst case (deep recursion)
    • Advantages: Intuitive implementation, good for maze-like exploration
    • Disadvantages: Risk of stack overflow for large grids
  2. BFS:

    • Algorithm: Iterative, explores level by level
    • Implementation: Uses a queue data structure
    • Traversal Pattern: Explores all nodes at current depth before going deeper
    • Time Complexity: O(m×n)
    • Space Complexity: O(min(m,n)) average case, O(m×n) worst case
    • Advantages: Finds shortest paths, avoids stack overflow
    • Disadvantages: Slightly more complex implementation
  3. Union Find:

    • Algorithm: Set-based, tracks connected components
    • Implementation: Uses parent and rank arrays with path compression
    • Traversal Pattern: No explicit traversal; merges sets as connections are discovered
    • Time Complexity: O(m×n α(m×n)) ≈ O(m×n)
    • Space Complexity: O(m×n)
    • Advantages: Excellent for dynamic connectivity queries
    • Disadvantages: More complex implementation, less intuitive

Advanced Considerations: When to Use Each Algorithm

  • DFS:

    • Best for exploring all possible paths
    • Ideal for problems requiring exhaustive search
    • Good for detecting cycles in directed graphs
    • Memory-efficient when graph is sparse and deep
  • BFS:

    • Optimal for finding shortest paths in unweighted graphs
    • Preferred for level-order processing
    • Better for graphs with high branching factor
    • More predictable memory usage
  • Union Find:

    • Excels at dynamic connectivity problems
    • Ideal for checking if two elements are connected
    • Perfect for incremental processing of graph edges
    • Efficient for merging connected components
    • Often used in Kruskal's algorithm for minimum spanning trees

Conclusion

Graph algorithms are powerful tools for solving complex problems across various domains. Understanding the nuances of DFS, BFS, and Union Find is essential for selecting the right approach for different problem types.

DFS dives deep, BFS explores broadly, and Union Find efficiently tracks connectivity patterns. Each has its strengths and optimal use cases. By mastering these fundamental techniques, you'll be well-equipped to tackle a wide range of graph-related challenges, from network analysis to game development, from pathfinding to connectivity problems.

The ability to choose the right algorithm for a specific problem is a hallmark of an experienced developer. These three powerful graph algorithms form the foundation of many complex problem-solving techniques and will serve you well across countless programming challenges.