Ring-Allreduce
Demonstrates the ring-allreduce algorithm for aggregating gradients across distributed worker nodes. The workers form a logical ring and the algorithm runs in two phases, scatter-reduce followed by allgather, so that each worker's communication volume stays nearly independent of the number of workers.
CLI Equivalent
N/A
Key Concepts
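- Scatter-reduce phase: with N workers, each worker sends one gradient chunk to its ring successor per step and adds the chunk it receives; after N-1 steps each worker holds one fully reduced chunk
- Allgather phase: the fully reduced chunks circulate around the ring for another N-1 steps until every worker holds the complete result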
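- Bandwidth-optimal communication pattern: 2(N-1) steps in total, with each worker sending about 2(N-1)/N times the gradient size, so per-worker traffic barely grows with the worker count N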
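The two phases can be sketched as a single-process simulation. This is an illustrative sketch, not the code of the `distributed_ring_allreduce` example itself: the function name `ring_allreduce`, the use of `Vec<f32>` buffers, and the requirement that the gradient length be divisible by the number of workers are all assumptions made to keep the chunking simple.

```rust
/// Single-process simulation of a sum ring-allreduce (hypothetical helper).
/// `workers[r]` is the gradient buffer held by rank `r`.
/// Assumes every buffer has the same length, divisible by the worker count.
fn ring_allreduce(workers: &mut [Vec<f32>]) {
    let n = workers.len();
    if n < 2 {
        return; // nothing to exchange
    }
    let len = workers[0].len();
    assert!(workers.iter().all(|w| w.len() == len), "buffers must match in length");
    assert!(len % n == 0, "length must be divisible by the worker count");
    let chunk = len / n;

    // Phase 1, scatter-reduce: in step s, rank r sends chunk (r - s) mod n
    // to rank r + 1, which adds it to its own copy of that chunk.
    for s in 0..n - 1 {
        // Snapshot all outgoing messages first so the sends within a step
        // behave as if they happened simultaneously.
        let msgs: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|r| {
                let c = (r + n - s) % n;
                let dst = (r + 1) % n;
                (dst, c, workers[r][c * chunk..(c + 1) * chunk].to_vec())
            })
            .collect();
        for (dst, c, data) in msgs {
            let target = &mut workers[dst][c * chunk..(c + 1) * chunk];
            for (x, y) in target.iter_mut().zip(data) {
                *x += y;
            }
        }
    }

    // Rank r now holds the fully reduced chunk (r + 1) mod n.
    // Phase 2, allgather: in step s, rank r sends chunk (r + 1 - s) mod n
    // to rank r + 1, which overwrites its stale copy.
    for s in 0..n - 1 {
        let msgs: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|r| {
                let c = (r + 1 + n - s) % n;
                let dst = (r + 1) % n;
                (dst, c, workers[r][c * chunk..(c + 1) * chunk].to_vec())
            })
            .collect();
        for (dst, c, data) in msgs {
            workers[dst][c * chunk..(c + 1) * chunk].copy_from_slice(&data);
        }
    }
}
```

Calling `ring_allreduce` on a slice of equally sized buffers leaves every buffer holding the elementwise sum of the inputs. Each simulated worker sends only one chunk of `len / n` elements per step, which is where the 2(N-1)/N per-worker traffic factor comes from.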
Run
cargo run --example distributed_ring_allreduce