Speaker Diarization
Identifies "who spoke when" in a multi-speaker audio stream using simplified speaker embeddings and k-means clustering. Demonstrates the full pipeline: audio generation, per-frame feature extraction, clustering, and turn merging.
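The clustering stage of this pipeline can be sketched as follows. This is a minimal illustration of Lloyd's k-means over per-frame feature vectors, not the example's actual code: the names `dist2`, `nearest`, and `kmeans`, the 2-D toy embeddings, and the evenly spaced seeding are all assumptions made for the sketch.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `p` (ties go to the lower index).
fn nearest(p: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    for c in 1..centroids.len() {
        if dist2(p, &centroids[c]) < dist2(p, &centroids[best]) {
            best = c;
        }
    }
    best
}

/// Lloyd's k-means. Returns one cluster label per input point.
/// Assumes 1 <= k <= points.len(), max_iters >= 1, and a shared dimension.
fn kmeans(points: &[Vec<f64>], k: usize, max_iters: usize) -> Vec<usize> {
    let dim = points[0].len();
    // Hypothetical deterministic seeding: k evenly spaced frames.
    let mut centroids: Vec<Vec<f64>> = (0..k)
        .map(|c| points[c * points.len() / k].clone())
        .collect();
    let mut labels: Vec<usize> = Vec::new();
    for _ in 0..max_iters {
        // Assignment step: label each frame with its nearest centroid.
        let new_labels: Vec<usize> =
            points.iter().map(|p| nearest(p, &centroids)).collect();
        if new_labels == labels {
            break; // No label changed, so the centroids are stable.
        }
        labels = new_labels;
        // Update step: move each centroid to the mean of its members.
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in points.iter().zip(&labels) {
            counts[l] += 1;
            for (s, x) in sums[l].iter_mut().zip(p) {
                *s += x;
            }
        }
        for c in 0..k {
            if counts[c] > 0 {
                centroids[c] = sums[c].iter().map(|s| s / counts[c] as f64).collect();
            }
        }
    }
    labels
}

fn main() {
    // Toy frame embeddings: one speaker near (0, 0), another near (5, 5).
    let frames = vec![
        vec![0.0, 0.1], vec![0.1, 0.0], vec![0.05, 0.05],
        vec![5.0, 5.1], vec![5.1, 5.0], vec![0.1, 0.1],
    ];
    println!("{:?}", kmeans(&frames, 2, 20));
}
```

The cluster indices are arbitrary: they separate speakers from one another but do not say which real person each cluster corresponds to.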
CLI Equivalent
N/A
Key Concepts
- Speaker embedding extraction from audio frames
- K-means clustering for speaker identification
- Turn merging for contiguous speaker segments
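Turn merging, the last concept above, can be sketched as a single pass over the per-frame labels. The `Turn` struct, the `merge_turns` function, and the fixed frame duration `frame_sec` are hypothetical names chosen for this sketch. The example itself may represent turns differently.

```rust
/// A contiguous stretch of frames attributed to one speaker.
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    speaker: usize,
    start_sec: f64,
    end_sec: f64,
}

/// Collapse per-frame speaker labels into contiguous turns.
/// `frame_sec` is the assumed fixed duration of one frame, in seconds.
fn merge_turns(labels: &[usize], frame_sec: f64) -> Vec<Turn> {
    let mut turns: Vec<Turn> = Vec::new();
    for (i, &speaker) in labels.iter().enumerate() {
        let end_sec = (i + 1) as f64 * frame_sec;
        // Same speaker as the previous frame: extend the open turn.
        if let Some(last) = turns.last_mut() {
            if last.speaker == speaker {
                last.end_sec = end_sec;
                continue;
            }
        }
        // Speaker changed (or this is the first frame): start a new turn.
        turns.push(Turn {
            speaker,
            start_sec: i as f64 * frame_sec,
            end_sec,
        });
    }
    turns
}

fn main() {
    // Labels as they might come out of clustering, one per 0.5 s frame.
    let labels = [0, 0, 0, 1, 1, 0, 0];
    for t in merge_turns(&labels, 0.5) {
        println!("speaker {}: {:.1}s to {:.1}s", t.speaker, t.start_sec, t.end_sec);
    }
}
```

With the labels in `main`, the seven frames collapse into three turns: speaker 0, then speaker 1, then speaker 0 again. A production pipeline would typically also smooth out very short turns, which this sketch does not attempt.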
Run
cargo run --example speech_diarization