Graph Components and Traversal Algorithms

Component analysis and graph traversal are fundamental techniques for understanding graph structure, detecting communities, validating properties, and exploring relationships. This chapter covers the theory and implementation of four essential algorithms in aprender's graph module.

Overview

Aprender implements four key algorithms for graph exploration and decomposition:

  1. Depth-First Search (DFS): Stack-based graph traversal
  2. Connected Components: Find groups of reachable nodes (undirected graphs)
  3. Strongly Connected Components (SCCs): Find mutually reachable groups (directed graphs)
  4. Topological Sort: Linear ordering of directed acyclic graphs (DAGs)

All algorithms operate on the Compressed Sparse Row (CSR) graph representation for optimal cache locality and memory efficiency.

Depth-First Search (DFS)

Algorithm

Depth-First Search explores a graph by going as deep as possible along each branch before backtracking. It uses a stack (explicit or via recursion) to track the exploration path.

Properties:

  • Time Complexity: O(n + m) where n = nodes, m = edges
  • Space Complexity: O(n) for visited tracking and stack
  • Explores one branch completely before trying others
  • Returns nodes in pre-order visitation

Implementation

use aprender::graph::Graph;

let g = Graph::from_edges(&[(0, 1), (1, 2), (2, 3), (1, 4)], false);

// DFS from node 0
let order = g.dfs(0).expect("node should exist");
// Possible result: [0, 1, 2, 3, 4] or [0, 1, 4, 2, 3]
// Order depends on neighbor iteration order

// DFS on disconnected graph only visits reachable nodes
let g2 = Graph::from_edges(&[(0, 1), (2, 3)], false);
let order2 = g2.dfs(0).expect("node should exist");
assert_eq!(order2, vec![0, 1]); // Only component with node 0

// Invalid starting node returns None
assert!(g.dfs(100).is_none());

How It Works

  1. Initialization: Push source node onto stack, mark as visited
  2. Loop: While stack is not empty:
    • Pop node from stack
    • If already visited, skip
    • Mark as visited, add to result
    • Push unvisited neighbors onto stack (in reverse order for consistent traversal)
  3. Termination: Stack is empty when all reachable nodes explored

Visual Example (tree):

Graph:      0
           / \
          1   2
         /
        3

DFS from 0:
Stack: [0]           Visited: {}        Order: []
Stack: [2, 1]        Visited: {0}       Order: [0]
Stack: [2, 3]        Visited: {0,1}     Order: [0,1]
Stack: [2]           Visited: {0,1,3}   Order: [0,1,3]
Stack: []            Visited: {0,1,2,3} Order: [0,1,3,2]

Stack-Based vs Recursive:

  • Aprender uses explicit stack (not recursion)
  • Avoids stack overflow on deep graphs (>10K depth)
  • Pre-order traversal: node added to result when first visited
  • Neighbors pushed in reverse order for deterministic left-to-right traversal

Use Cases

  • Cycle Detection: DFS can detect cycles by tracking in-stack nodes
  • Path Finding: Find any path between two nodes (not necessarily shortest)
  • Maze Solving: Explore all paths until exit found
  • Topological Sort: DFS post-order is foundation for DAG ordering
  • Connected Components: DFS from each unvisited node finds components

Comparison with BFS

AspectDFSBFS
Data StructureStack (LIFO)Queue (FIFO)
ExplorationDeep (branch-first)Wide (level-first)
Path FoundAny pathShortest path (unweighted)
MemoryO(n) worst caseO(n) worst case
Use CaseStructure analysisDistance computation
use aprender::graph::Graph;

let g = Graph::from_edges(
    &[(0, 1), (0, 2), (1, 3), (2, 3)],
    false
);

// DFS might visit: 0 → 1 → 3 → 2
let dfs_order = g.dfs(0).expect("node exists");

// BFS (via shortest_path) visits: 0 → 1, 2 → 3 (level-by-level)
let path_to_3 = g.shortest_path(0, 3).expect("path exists");
assert_eq!(path_to_3.len(), 3); // 0 → 1 → 3 (or 0 → 2 → 3)

Connected Components

Algorithm

Connected Components identifies groups of nodes that are mutually reachable in an undirected graph. Aprender uses Union-Find (also called Disjoint Set Union) with path compression and union by rank.

Properties:

  • Time Complexity: O(m α(n)) where α = inverse Ackermann function (effectively constant)
  • Space Complexity: O(n) for parent and rank arrays
  • Near-linear performance in practice
  • Returns component ID for each node

Implementation

use aprender::graph::Graph;

// Three components: {0,1}, {2,3,4}, {5}
let g = Graph::from_edges(
    &[(0, 1), (2, 3), (3, 4)],
    false
);

let components = g.connected_components();
assert_eq!(components.len(), 6);

// Nodes in same component have same ID
assert_eq!(components[0], components[1]); // 0 and 1 connected
assert_eq!(components[2], components[3]); // 2 and 3 connected
assert_eq!(components[3], components[4]); // 3 and 4 connected

// Different components have different IDs
assert_ne!(components[0], components[2]);
assert_ne!(components[0], components[5]);

// Count number of components
use std::collections::HashSet;
let num_components: usize = components.iter().collect::<HashSet<_>>().len();
assert_eq!(num_components, 3);

How It Works

Union-Find maintains a forest of trees where each tree represents a component.

Data Structures:

  • parent[i]: Parent of node i (root if parent[i] == i)
  • rank[i]: Approximate depth of tree rooted at i

Operations:

  1. Find(x): Find root of x's tree with path compression
fn find(parent: &mut [usize], x: usize) -> usize {
    if parent[x] != x {
        parent[x] = find(parent, parent[x]); // Path compression
    }
    parent[x]
}
  1. Union(x, y): Merge trees of x and y with union by rank
fn union(parent: &mut [usize], rank: &mut [usize], x: usize, y: usize) {
    let root_x = find(parent, x);
    let root_y = find(parent, y);

    if root_x == root_y { return; }

    // Attach smaller tree under larger tree
    if rank[root_x] < rank[root_y] {
        parent[root_x] = root_y;
    } else if rank[root_x] > rank[root_y] {
        parent[root_y] = root_x;
    } else {
        parent[root_y] = root_x;
        rank[root_x] += 1;
    }
}

Visual Example:

Graph: 0---1   2---3---4   5

Initial: parent=[0,1,2,3,4,5], rank=[0,0,0,0,0,0]

Process edge (0,1):
  Union(0,1): parent=[0,0,2,3,4,5], rank=[1,0,0,0,0,0]

Process edge (2,3):
  Union(2,3): parent=[0,0,2,2,4,5], rank=[1,0,1,0,0,0]

Process edge (3,4):
  Union(2,4): parent=[0,0,2,2,2,5], rank=[1,0,2,0,0,0]

Final components:
  Component 0: {0,1}
  Component 2: {2,3,4}
  Component 5: {5}

Path Compression

Path compression flattens trees during find operations, making future queries faster.

Without path compression:

Find(4): 4 → 3 → 2  (3 steps)

With path compression:

After Find(4): 4 → 2, 3 → 2  (all point to root)
Next Find(4): 4 → 2  (1 step)

This achieves amortized O(α(n)) ≈ O(1) time per operation.

Use Cases

  • Network Connectivity: Identify isolated sub-networks
  • Image Segmentation: Group connected pixels
  • Social Network Clusters: Find friend groups
  • Graph Partitioning: Identify disconnected regions
  • Reachability Queries: "Can I get from A to B?"

Strongly Connected Components (SCCs)

Algorithm

Strongly Connected Components finds groups of nodes in a directed graph where every node can reach every other node in the group. Aprender uses Tarjan's algorithm (single DFS pass).

Properties:

  • Time Complexity: O(n + m) - single DFS traversal
  • Space Complexity: O(n) for discovery time, low-link values, and stack
  • Returns component ID for each node
  • Components are returned in reverse topological order

Implementation

use aprender::graph::Graph;

// Directed graph with 2 SCCs: {0,1,2} and {3}
//   0 → 1 → 2 → 0 (cycle)
//   2 → 3 (one-way edge to isolated node)
let g = Graph::from_edges(
    &[(0, 1), (1, 2), (2, 0), (2, 3)],
    true  // directed
);

let sccs = g.strongly_connected_components();
assert_eq!(sccs.len(), 4);

// Cycle forms one SCC
assert_eq!(sccs[0], sccs[1]);
assert_eq!(sccs[1], sccs[2]);

// Node 3 is separate SCC (no incoming edges in cycle)
assert_ne!(sccs[0], sccs[3]);

// On DAG, each node is its own SCC
let dag = Graph::from_edges(&[(0, 1), (1, 2)], true);
let dag_sccs = dag.strongly_connected_components();
assert_ne!(dag_sccs[0], dag_sccs[1]);
assert_ne!(dag_sccs[1], dag_sccs[2]);

How It Works

Tarjan's algorithm uses DFS with two timestamps per node:

  • disc[v]: Discovery time (when v first visited)
  • low[v]: Lowest discovery time reachable from v

Key Insight: If low[v] == disc[v], then v is the root of an SCC.

Algorithm Steps:

  1. DFS Traversal: Visit nodes in DFS order
  2. Discovery Time: Assign disc[v] = time++ when visiting v
  3. Low-Link Calculation:
    • For tree edges: low[v] = min(low[v], low[w])
    • For back edges: low[v] = min(low[v], disc[w])
  4. SCC Detection: If low[v] == disc[v], pop stack until v is found
  5. Stack Management: Maintain stack of nodes in current DFS path

Visual Example:

Graph:  0 → 1 → 2
        ↑       ↓
        └───────┘

DFS from 0:
Visit 0: disc[0]=0, low[0]=0, stack=[0]
Visit 1: disc[1]=1, low[1]=1, stack=[0,1]
Visit 2: disc[2]=2, low[2]=2, stack=[0,1,2]
Back edge 2→0: low[2]=min(2,0)=0
               low[1]=min(1,0)=0
               low[0]=min(0,0)=0

SCC detection at 0: low[0]==disc[0]
Pop stack until 0: {2,1,0} form one SCC

Comparison: Tarjan vs Kosaraju

AspectTarjanKosaraju
DFS Passes12
Transpose GraphNoYes
ComplexityO(n+m)O(n+m)
ImplementationMore complexSimpler
Performance~30% fasterEasier to understand

Aprender uses Tarjan's for better performance.

Use Cases

  • Dependency Analysis: Find circular dependencies
  • Compiler Optimization: Detect infinite loops
  • Web Crawling: Identify link cycles
  • Database Transactions: Detect deadlocks
  • Social Network Analysis: Find tightly-knit groups

Topological Sort

Algorithm

Topological Sort produces a linear ordering of nodes in a directed acyclic graph (DAG) such that for every edge u → v, u appears before v. This is used for task scheduling, dependency resolution, and build systems.

Properties:

  • Time Complexity: O(n + m) - DFS-based
  • Space Complexity: O(n) for visited and in-stack tracking
  • Returns Some(order) for DAGs, None for graphs with cycles
  • Multiple valid orderings may exist

Implementation

use aprender::graph::Graph;

// DAG: 0 → 1 → 3
//      ↓    ↓
//      2 ───┘
let g = Graph::from_edges(
    &[(0, 1), (0, 2), (1, 3), (2, 3)],
    true  // directed
);

let order = g.topological_sort().expect("DAG should have valid ordering");
assert_eq!(order.len(), 4);

// Verify ordering: each edge (u,v) has u before v
let pos: std::collections::HashMap<_, _> =
    order.iter().enumerate().map(|(i, &v)| (v, i)).collect();

// Edge 0→1: pos[0] < pos[1]
assert!(pos[&0] < pos[&1]);
assert!(pos[&0] < pos[&2]);
assert!(pos[&1] < pos[&3]);
assert!(pos[&2] < pos[&3]);

// Cycle detection: returns None
let cycle = Graph::from_edges(&[(0, 1), (1, 2), (2, 0)], true);
assert!(cycle.topological_sort().is_none());

How It Works

Topological sort uses DFS with post-order traversal and cycle detection.

Algorithm Steps:

  1. Initialization: Mark all nodes as unvisited
  2. DFS with Cycle Detection: For each unvisited node:
    • Mark as in-stack (currently exploring)
    • Recursively visit all unvisited neighbors
    • If neighbor is in-stack, cycle detected → return None
    • Mark as visited (finished exploring)
    • Add to result in post-order (after all descendants)
  3. Reverse: Reverse post-order to get topological order

Visual Example:

Graph:  0 → 1 → 3
        ↓    ↓
        2 ───┘

DFS from 0:
  Visit 0 (in_stack)
    Visit 1 (in_stack)
      Visit 3 (in_stack)
      3 done → post_order=[3]
    1 done → post_order=[3,1]
    Visit 2 (in_stack)
      3 already visited, skip
    2 done → post_order=[3,1,2]
  0 done → post_order=[3,1,2,0]

Reverse: [0,2,1,3] (valid topological order)

Cycle Detection:

Graph: 0 → 1 → 2 → 0 (cycle)

DFS from 0:
  Visit 0 (in_stack={0})
    Visit 1 (in_stack={0,1})
      Visit 2 (in_stack={0,1,2})
        Visit 0 (in_stack={0,1,2})
        0 is in_stack → CYCLE DETECTED
        Return None

Multiple Valid Orderings

DAGs often have multiple valid topological orderings:

use aprender::graph::Graph;

// Diamond DAG:  0
//              / \
//             1   2
//              \ /
//               3

let g = Graph::from_edges(&[(0, 1), (0, 2), (1, 3), (2, 3)], true);
let order = g.topological_sort().expect("valid DAG");

// Valid orderings: [0,1,2,3] or [0,2,1,3]
// Both satisfy: 0 before 1,2 and 1,2 before 3

Use Cases

  • Build Systems: Compile source files in dependency order (Makefile, Cargo)
  • Course Prerequisites: Schedule classes respecting prerequisites
  • Task Scheduling: Execute tasks with dependencies (CI/CD pipelines)
  • Package Managers: Install dependencies before dependents (npm, pip)
  • Spreadsheet Calculations: Compute cells in formula dependency order

Kahn's Algorithm (Alternative)

Kahn's algorithm is an alternative using in-degree counting:

  1. Find all nodes with in-degree 0
  2. Add them to result, remove from graph
  3. Repeat until graph is empty (valid) or no zero in-degree nodes (cycle)

Comparison:

AspectDFS-based (aprender)Kahn's Algorithm
ComplexityO(n+m)O(n+m)
Cycle DetectionEarly terminationEnd of algorithm
Output OrderDeterministicQueue-dependent
ImplementationRecursive/stackQueue-based

Aprender uses DFS-based for early cycle detection and simpler implementation.

Performance Comparison

Complexity Summary

AlgorithmTimeSpaceUse Case
DFSO(n+m)O(n)Graph exploration
Connected ComponentsO(m α(n))O(n)Undirected connectivity
SCCs (Tarjan)O(n+m)O(n)Directed connectivity
Topological SortO(n+m)O(n)DAG ordering

All algorithms achieve near-linear performance on sparse graphs (m ≈ n).

Benchmark Results

Synthetic graphs (average degree ≈ 3):

Algorithm              | 100 nodes | 1000 nodes | 5000 nodes |
-----------------------|-----------|------------|------------|
DFS                    | 580 ns    | 5.6 µs     | 28 µs      |
Connected Components   | 1.2 µs    | 11.5 µs    | 58 µs      |
SCCs (Tarjan)          | 1.8 µs    | 17.2 µs    | 87 µs      |
Topological Sort       | 620 ns    | 6.2 µs     | 31 µs      |

Key Observations:

  • Perfect linear scaling: 10x nodes → ~10x time
  • DFS and topological sort have minimal overhead
  • SCCs ~1.5x slower than connected components (directed graph complexity)
  • All algorithms <100µs for 5000-node graphs

Advanced Topics

Bi-Connected Components

Bi-connected components are maximal subgraphs with no articulation points (bridges). Removing any single node doesn't disconnect the component.

Application: Network resilience analysis

Not yet implemented in aprender (future roadmap).

Condensation Graph

The condensation graph represents SCCs as nodes, with edges between SCCs.

Original:  0 → 1 ⇄ 2      Condensation:  {0} → {1,2} → {3}
           ↓       ↓
           3 ←─────┘

Property: Condensation is always a DAG

Use Case: Simplify graph analysis by collapsing cycles

Parallel Algorithms

DFS is inherently sequential (stack-based), but components can be parallelized:

  • Parallel Union-Find: Use concurrent data structures for find/union
  • Parallel SCCs: Multiple independent DFS starting points
  • Parallel Topological Sort: Level-based parallelization

Not yet implemented in aprender (future optimization).

See Also

References

  1. Tarjan, R. E. (1972). "Depth-first search and linear graph algorithms." SIAM Journal on Computing, 1(2), 146-160.

  2. Tarjan, R. E. (1975). "Efficiency of a good but not linear set union algorithm." Journal of the ACM, 22(2), 215-225.

  3. Cormen, T. H., et al. (2009). Introduction to Algorithms (3rd ed.). MIT Press.

    • Chapter 22: Elementary Graph Algorithms (DFS, topological sort)
    • Chapter 21: Data Structures for Disjoint Sets (Union-Find)
  4. Knuth, D. E. (1997). The Art of Computer Programming, Volume 1: Fundamental Algorithms (3rd ed.). Section 2.3.3: Topological Sorting.

  5. Sharir, M. (1981). "A strong-connectivity algorithm and its applications in data flow analysis." Computers & Mathematics with Applications, 7(1), 67-72.