Graph Components and Traversal Algorithms
Component analysis and graph traversal are fundamental techniques for understanding graph structure, detecting communities, validating properties, and exploring relationships. This chapter covers the theory and implementation of four essential algorithms in aprender's graph module.
Overview
Aprender implements four key algorithms for graph exploration and decomposition:
- Depth-First Search (DFS): Stack-based graph traversal
- Connected Components: Find groups of reachable nodes (undirected graphs)
- Strongly Connected Components (SCCs): Find mutually reachable groups (directed graphs)
- Topological Sort: Linear ordering of directed acyclic graphs (DAGs)
All algorithms operate on the Compressed Sparse Row (CSR) graph representation for optimal cache locality and memory efficiency.
Depth-First Search (DFS)
Algorithm
Depth-First Search explores a graph by going as deep as possible along each branch before backtracking. It uses a stack (explicit or via recursion) to track the exploration path.
Properties:
- Time Complexity: O(n + m) where n = nodes, m = edges
- Space Complexity: O(n) for visited tracking and stack
- Explores one branch completely before trying others
- Returns nodes in pre-order visitation
Implementation
use aprender::graph::Graph;
let g = Graph::from_edges(&[(0, 1), (1, 2), (2, 3), (1, 4)], false);
// DFS from node 0
let order = g.dfs(0).expect("node should exist");
// Possible result: [0, 1, 2, 3, 4] or [0, 1, 4, 2, 3]
// Order depends on neighbor iteration order
// DFS on disconnected graph only visits reachable nodes
let g2 = Graph::from_edges(&[(0, 1), (2, 3)], false);
let order2 = g2.dfs(0).expect("node should exist");
assert_eq!(order2, vec![0, 1]); // Only component with node 0
// Invalid starting node returns None
assert!(g.dfs(100).is_none());
How It Works
- Initialization: Push source node onto stack, mark as visited
- Loop: While stack is not empty:
- Pop node from stack
- If already visited, skip
- Mark as visited, add to result
- Push unvisited neighbors onto stack (in reverse order for consistent traversal)
- Termination: Stack is empty when all reachable nodes explored
Visual Example (tree):
Graph: 0
/ \
1 2
/
3
DFS from 0:
Stack: [0] Visited: {} Order: []
Stack: [2, 1] Visited: {0} Order: [0]
Stack: [2, 3] Visited: {0,1} Order: [0,1]
Stack: [2] Visited: {0,1,3} Order: [0,1,3]
Stack: [] Visited: {0,1,2,3} Order: [0,1,3,2]
Stack-Based vs Recursive:
- Aprender uses explicit stack (not recursion)
- Avoids stack overflow on deep graphs (>10K depth)
- Pre-order traversal: node added to result when first visited
- Neighbors pushed in reverse order for deterministic left-to-right traversal
Use Cases
- Cycle Detection: DFS can detect cycles by tracking in-stack nodes
- Path Finding: Find any path between two nodes (not necessarily shortest)
- Maze Solving: Explore all paths until exit found
- Topological Sort: DFS post-order is foundation for DAG ordering
- Connected Components: DFS from each unvisited node finds components
Comparison with BFS
| Aspect | DFS | BFS |
|---|---|---|
| Data Structure | Stack (LIFO) | Queue (FIFO) |
| Exploration | Deep (branch-first) | Wide (level-first) |
| Path Found | Any path | Shortest path (unweighted) |
| Memory | O(n) worst case | O(n) worst case |
| Use Case | Structure analysis | Distance computation |
use aprender::graph::Graph;
let g = Graph::from_edges(
&[(0, 1), (0, 2), (1, 3), (2, 3)],
false
);
// DFS might visit: 0 → 1 → 3 → 2
let dfs_order = g.dfs(0).expect("node exists");
// BFS (via shortest_path) visits: 0 → 1, 2 → 3 (level-by-level)
let path_to_3 = g.shortest_path(0, 3).expect("path exists");
assert_eq!(path_to_3.len(), 3); // 0 → 1 → 3 (or 0 → 2 → 3)
Connected Components
Algorithm
Connected Components identifies groups of nodes that are mutually reachable in an undirected graph. Aprender uses Union-Find (also called Disjoint Set Union) with path compression and union by rank.
Properties:
- Time Complexity: O(m α(n)) where α = inverse Ackermann function (effectively constant)
- Space Complexity: O(n) for parent and rank arrays
- Near-linear performance in practice
- Returns component ID for each node
Implementation
use aprender::graph::Graph;
// Three components: {0,1}, {2,3,4}, {5}
let g = Graph::from_edges(
&[(0, 1), (2, 3), (3, 4)],
false
);
let components = g.connected_components();
assert_eq!(components.len(), 6);
// Nodes in same component have same ID
assert_eq!(components[0], components[1]); // 0 and 1 connected
assert_eq!(components[2], components[3]); // 2 and 3 connected
assert_eq!(components[3], components[4]); // 3 and 4 connected
// Different components have different IDs
assert_ne!(components[0], components[2]);
assert_ne!(components[0], components[5]);
// Count number of components
use std::collections::HashSet;
let num_components: usize = components.iter().collect::<HashSet<_>>().len();
assert_eq!(num_components, 3);
How It Works
Union-Find maintains a forest of trees where each tree represents a component.
Data Structures:
parent[i]: Parent of node i (root if parent[i] == i)rank[i]: Approximate depth of tree rooted at i
Operations:
- Find(x): Find root of x's tree with path compression
fn find(parent: &mut [usize], x: usize) -> usize {
if parent[x] != x {
parent[x] = find(parent, parent[x]); // Path compression
}
parent[x]
}
- Union(x, y): Merge trees of x and y with union by rank
fn union(parent: &mut [usize], rank: &mut [usize], x: usize, y: usize) {
let root_x = find(parent, x);
let root_y = find(parent, y);
if root_x == root_y { return; }
// Attach smaller tree under larger tree
if rank[root_x] < rank[root_y] {
parent[root_x] = root_y;
} else if rank[root_x] > rank[root_y] {
parent[root_y] = root_x;
} else {
parent[root_y] = root_x;
rank[root_x] += 1;
}
}
Visual Example:
Graph: 0---1 2---3---4 5
Initial: parent=[0,1,2,3,4,5], rank=[0,0,0,0,0,0]
Process edge (0,1):
Union(0,1): parent=[0,0,2,3,4,5], rank=[1,0,0,0,0,0]
Process edge (2,3):
Union(2,3): parent=[0,0,2,2,4,5], rank=[1,0,1,0,0,0]
Process edge (3,4):
Union(2,4): parent=[0,0,2,2,2,5], rank=[1,0,2,0,0,0]
Final components:
Component 0: {0,1}
Component 2: {2,3,4}
Component 5: {5}
Path Compression
Path compression flattens trees during find operations, making future queries faster.
Without path compression:
Find(4): 4 → 3 → 2 (3 steps)
With path compression:
After Find(4): 4 → 2, 3 → 2 (all point to root)
Next Find(4): 4 → 2 (1 step)
This achieves amortized O(α(n)) ≈ O(1) time per operation.
Use Cases
- Network Connectivity: Identify isolated sub-networks
- Image Segmentation: Group connected pixels
- Social Network Clusters: Find friend groups
- Graph Partitioning: Identify disconnected regions
- Reachability Queries: "Can I get from A to B?"
Strongly Connected Components (SCCs)
Algorithm
Strongly Connected Components finds groups of nodes in a directed graph where every node can reach every other node in the group. Aprender uses Tarjan's algorithm (single DFS pass).
Properties:
- Time Complexity: O(n + m) - single DFS traversal
- Space Complexity: O(n) for discovery time, low-link values, and stack
- Returns component ID for each node
- Components are returned in reverse topological order
Implementation
use aprender::graph::Graph;
// Directed graph with 2 SCCs: {0,1,2} and {3}
// 0 → 1 → 2 → 0 (cycle)
// 2 → 3 (one-way edge to isolated node)
let g = Graph::from_edges(
&[(0, 1), (1, 2), (2, 0), (2, 3)],
true // directed
);
let sccs = g.strongly_connected_components();
assert_eq!(sccs.len(), 4);
// Cycle forms one SCC
assert_eq!(sccs[0], sccs[1]);
assert_eq!(sccs[1], sccs[2]);
// Node 3 is separate SCC (no incoming edges in cycle)
assert_ne!(sccs[0], sccs[3]);
// On DAG, each node is its own SCC
let dag = Graph::from_edges(&[(0, 1), (1, 2)], true);
let dag_sccs = dag.strongly_connected_components();
assert_ne!(dag_sccs[0], dag_sccs[1]);
assert_ne!(dag_sccs[1], dag_sccs[2]);
How It Works
Tarjan's algorithm uses DFS with two timestamps per node:
- disc[v]: Discovery time (when v first visited)
- low[v]: Lowest discovery time reachable from v
Key Insight: If low[v] == disc[v], then v is the root of an SCC.
Algorithm Steps:
- DFS Traversal: Visit nodes in DFS order
- Discovery Time: Assign
disc[v] = time++when visiting v - Low-Link Calculation:
- For tree edges:
low[v] = min(low[v], low[w]) - For back edges:
low[v] = min(low[v], disc[w])
- For tree edges:
- SCC Detection: If
low[v] == disc[v], pop stack until v is found - Stack Management: Maintain stack of nodes in current DFS path
Visual Example:
Graph: 0 → 1 → 2
↑ ↓
└───────┘
DFS from 0:
Visit 0: disc[0]=0, low[0]=0, stack=[0]
Visit 1: disc[1]=1, low[1]=1, stack=[0,1]
Visit 2: disc[2]=2, low[2]=2, stack=[0,1,2]
Back edge 2→0: low[2]=min(2,0)=0
low[1]=min(1,0)=0
low[0]=min(0,0)=0
SCC detection at 0: low[0]==disc[0]
Pop stack until 0: {2,1,0} form one SCC
Comparison: Tarjan vs Kosaraju
| Aspect | Tarjan | Kosaraju |
|---|---|---|
| DFS Passes | 1 | 2 |
| Transpose Graph | No | Yes |
| Complexity | O(n+m) | O(n+m) |
| Implementation | More complex | Simpler |
| Performance | ~30% faster | Easier to understand |
Aprender uses Tarjan's for better performance.
Use Cases
- Dependency Analysis: Find circular dependencies
- Compiler Optimization: Detect infinite loops
- Web Crawling: Identify link cycles
- Database Transactions: Detect deadlocks
- Social Network Analysis: Find tightly-knit groups
Topological Sort
Algorithm
Topological Sort produces a linear ordering of nodes in a directed acyclic graph (DAG) such that for every edge u → v, u appears before v. This is used for task scheduling, dependency resolution, and build systems.
Properties:
- Time Complexity: O(n + m) - DFS-based
- Space Complexity: O(n) for visited and in-stack tracking
- Returns
Some(order)for DAGs,Nonefor graphs with cycles - Multiple valid orderings may exist
Implementation
use aprender::graph::Graph;
// DAG: 0 → 1 → 3
// ↓ ↓
// 2 ───┘
let g = Graph::from_edges(
&[(0, 1), (0, 2), (1, 3), (2, 3)],
true // directed
);
let order = g.topological_sort().expect("DAG should have valid ordering");
assert_eq!(order.len(), 4);
// Verify ordering: each edge (u,v) has u before v
let pos: std::collections::HashMap<_, _> =
order.iter().enumerate().map(|(i, &v)| (v, i)).collect();
// Edge 0→1: pos[0] < pos[1]
assert!(pos[&0] < pos[&1]);
assert!(pos[&0] < pos[&2]);
assert!(pos[&1] < pos[&3]);
assert!(pos[&2] < pos[&3]);
// Cycle detection: returns None
let cycle = Graph::from_edges(&[(0, 1), (1, 2), (2, 0)], true);
assert!(cycle.topological_sort().is_none());
How It Works
Topological sort uses DFS with post-order traversal and cycle detection.
Algorithm Steps:
- Initialization: Mark all nodes as unvisited
- DFS with Cycle Detection: For each unvisited node:
- Mark as in-stack (currently exploring)
- Recursively visit all unvisited neighbors
- If neighbor is in-stack, cycle detected → return None
- Mark as visited (finished exploring)
- Add to result in post-order (after all descendants)
- Reverse: Reverse post-order to get topological order
Visual Example:
Graph: 0 → 1 → 3
↓ ↓
2 ───┘
DFS from 0:
Visit 0 (in_stack)
Visit 1 (in_stack)
Visit 3 (in_stack)
3 done → post_order=[3]
1 done → post_order=[3,1]
Visit 2 (in_stack)
3 already visited, skip
2 done → post_order=[3,1,2]
0 done → post_order=[3,1,2,0]
Reverse: [0,2,1,3] (valid topological order)
Cycle Detection:
Graph: 0 → 1 → 2 → 0 (cycle)
DFS from 0:
Visit 0 (in_stack={0})
Visit 1 (in_stack={0,1})
Visit 2 (in_stack={0,1,2})
Visit 0 (in_stack={0,1,2})
0 is in_stack → CYCLE DETECTED
Return None
Multiple Valid Orderings
DAGs often have multiple valid topological orderings:
use aprender::graph::Graph;
// Diamond DAG: 0
// / \
// 1 2
// \ /
// 3
let g = Graph::from_edges(&[(0, 1), (0, 2), (1, 3), (2, 3)], true);
let order = g.topological_sort().expect("valid DAG");
// Valid orderings: [0,1,2,3] or [0,2,1,3]
// Both satisfy: 0 before 1,2 and 1,2 before 3
Use Cases
- Build Systems: Compile source files in dependency order (Makefile, Cargo)
- Course Prerequisites: Schedule classes respecting prerequisites
- Task Scheduling: Execute tasks with dependencies (CI/CD pipelines)
- Package Managers: Install dependencies before dependents (npm, pip)
- Spreadsheet Calculations: Compute cells in formula dependency order
Kahn's Algorithm (Alternative)
Kahn's algorithm is an alternative using in-degree counting:
- Find all nodes with in-degree 0
- Add them to result, remove from graph
- Repeat until graph is empty (valid) or no zero in-degree nodes (cycle)
Comparison:
| Aspect | DFS-based (aprender) | Kahn's Algorithm |
|---|---|---|
| Complexity | O(n+m) | O(n+m) |
| Cycle Detection | Early termination | End of algorithm |
| Output Order | Deterministic | Queue-dependent |
| Implementation | Recursive/stack | Queue-based |
Aprender uses DFS-based for early cycle detection and simpler implementation.
Performance Comparison
Complexity Summary
| Algorithm | Time | Space | Use Case |
|---|---|---|---|
| DFS | O(n+m) | O(n) | Graph exploration |
| Connected Components | O(m α(n)) | O(n) | Undirected connectivity |
| SCCs (Tarjan) | O(n+m) | O(n) | Directed connectivity |
| Topological Sort | O(n+m) | O(n) | DAG ordering |
All algorithms achieve near-linear performance on sparse graphs (m ≈ n).
Benchmark Results
Synthetic graphs (average degree ≈ 3):
Algorithm | 100 nodes | 1000 nodes | 5000 nodes |
-----------------------|-----------|------------|------------|
DFS | 580 ns | 5.6 µs | 28 µs |
Connected Components | 1.2 µs | 11.5 µs | 58 µs |
SCCs (Tarjan) | 1.8 µs | 17.2 µs | 87 µs |
Topological Sort | 620 ns | 6.2 µs | 31 µs |
Key Observations:
- Perfect linear scaling: 10x nodes → ~10x time
- DFS and topological sort have minimal overhead
- SCCs ~1.5x slower than connected components (directed graph complexity)
- All algorithms <100µs for 5000-node graphs
Advanced Topics
Bi-Connected Components
Bi-connected components are maximal subgraphs with no articulation points (bridges). Removing any single node doesn't disconnect the component.
Application: Network resilience analysis
Not yet implemented in aprender (future roadmap).
Condensation Graph
The condensation graph represents SCCs as nodes, with edges between SCCs.
Original: 0 → 1 ⇄ 2 Condensation: {0} → {1,2} → {3}
↓ ↓
3 ←─────┘
Property: Condensation is always a DAG
Use Case: Simplify graph analysis by collapsing cycles
Parallel Algorithms
DFS is inherently sequential (stack-based), but components can be parallelized:
- Parallel Union-Find: Use concurrent data structures for find/union
- Parallel SCCs: Multiple independent DFS starting points
- Parallel Topological Sort: Level-based parallelization
Not yet implemented in aprender (future optimization).
See Also
- Graph Algorithms - Centrality and structural analysis
- Graph Pathfinding - Shortest path algorithms
- Graph Link Prediction - Community detection and link analysis
- Graph Examples - Practical usage examples
References
-
Tarjan, R. E. (1972). "Depth-first search and linear graph algorithms." SIAM Journal on Computing, 1(2), 146-160.
-
Tarjan, R. E. (1975). "Efficiency of a good but not linear set union algorithm." Journal of the ACM, 22(2), 215-225.
-
Cormen, T. H., et al. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
- Chapter 22: Elementary Graph Algorithms (DFS, topological sort)
- Chapter 21: Data Structures for Disjoint Sets (Union-Find)
-
Knuth, D. E. (1997). The Art of Computer Programming, Volume 1: Fundamental Algorithms (3rd ed.). Section 2.3.3: Topological Sorting.
-
Sharir, M. (1981). "A strong-connectivity algorithm and its applications in data flow analysis." Computers & Mathematics with Applications, 7(1), 67-72.