Bottleneck Identification
Identifying the true bottleneck before optimizing keeps effort from being spent on code that does not limit performance. This chapter covers CPU profiling, syscall analysis, and memory allocation tracking.
CPU Profiling with Flamegraph
cargo install flamegraph
cargo flamegraph --root --bin batuta -- analyze /path/to/project
Reading the Flamegraph
| Pattern | Meaning | Action |
|---|---|---|
| Wide plateau at top | Single function dominates | Optimize or parallelize |
| Many thin towers | Overhead spread evenly | Algorithmic improvement |
| Deep call stack | Excessive abstraction | Consider inlining |
| alloc:: frames | Allocation overhead | Pre-allocate or stack buffers |
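
The "pre-allocate" action in the last row can be illustrated with a small sketch. It uses capacity changes as a rough proxy for the reallocations that would show up as `alloc::` frames; the function names are hypothetical and are not part of batuta.

```rust
/// Pushes `n` values into `v` and counts how many times the capacity
/// changed, which approximates how often the backing buffer was regrown.
fn count_regrowths(mut v: Vec<u64>, n: u64) -> usize {
    let mut regrowths = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last_capacity {
            regrowths += 1;
            last_capacity = v.capacity();
        }
    }
    regrowths
}

/// Starts from an empty vector, so the buffer is regrown as it fills.
fn regrowths_without_preallocation(n: u64) -> usize {
    count_regrowths(Vec::new(), n)
}

/// Reserves room for all `n` elements up front, so no regrowth is needed.
fn regrowths_with_preallocation(n: u64) -> usize {
    count_regrowths(Vec::with_capacity(n as usize), n)
}
```

Calling both functions with the same `n` shows the difference: the pre-allocated version reports zero regrowths, while the other reports several, one for each time the vector outgrew its buffer.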
Syscall Analysis with renacer
renacer trace -- batuta transpile --source ./src
| Symptom | Syscall Pattern | Fix |
|---|---|---|
| Slow file I/O | Many small read() calls | BufReader |
| Slow startup | Many open() on configs | Lazy load or include_str! |
| Memory pressure | Frequent mmap/munmap | Pre-allocate, reuse buffers |
| Lock contention | futex() spinning | Reduce critical section |
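
To see why `BufReader` addresses the "many small read() calls" pattern, the sketch below counts how many reads reach the underlying source with and without buffering. `CountingReader` is a hypothetical stand-in for a file; on a real file each counted call would roughly correspond to a `read()` syscall in the trace.

```rust
use std::io::{self, BufReader, Read};

/// Wraps a reader and counts calls to `read` that reach it.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `data` one byte at a time directly from the source and
/// returns the number of `read` calls the source received.
fn unbuffered_read_calls(data: &[u8]) -> usize {
    let mut source = CountingReader { inner: data, calls: 0 };
    let mut byte = [0u8; 1];
    while source.read(&mut byte).expect("in-memory read cannot fail") == 1 {}
    source.calls
}

/// Runs the same byte-at-a-time loop through a `BufReader`, which
/// fetches large chunks from the source and serves small reads from memory.
fn buffered_read_calls(data: &[u8]) -> usize {
    let mut reader = BufReader::new(CountingReader { inner: data, calls: 0 });
    let mut byte = [0u8; 1];
    while reader.read(&mut byte).expect("in-memory read cannot fail") == 1 {}
    reader.get_ref().calls
}
```

For the same input, the unbuffered loop issues one call per byte plus a final call that signals end of input, while the buffered loop issues only a handful.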
Memory Allocation Tracking
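// Placeholder for the real per-item work.
fn process(buffer: &[u8]) {
    let _ = buffer.len();
}

fn main() {
    // Example inputs; in real code these come from the workload.
    let items: [&[u8]; 3] = [b"alpha", b"beta", b"gamma"];
    let max_item_size = items.iter().map(|item| item.len()).max().unwrap_or(0);

    // Reuse one buffer instead of allocating a new one per item.
    let mut buffer: Vec<u8> = Vec::with_capacity(max_item_size);
    for item in items {
        buffer.clear(); // Drops the contents but keeps the capacity.
        buffer.extend_from_slice(item);
        process(&buffer);
    }
}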
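
One way to check whether buffer reuse reduces allocations is to count them. The sketch below is a minimal counting allocator built on the standard library's `GlobalAlloc` trait. It is an illustration only: a dedicated tool such as DHAT reports far more detail, and the helper functions here are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far, process-wide.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator and counts each allocation request.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns how many allocations were requested meanwhile.
/// The counter is process-wide, so other threads can inflate the result.
fn allocations_during(f: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

/// Allocates a fresh `Vec` for every item and returns the bytes handled.
fn process_with_fresh_buffers(items: &[&[u8]]) -> usize {
    let mut total = 0;
    for item in items {
        let buffer = item.to_vec();
        // black_box keeps the optimizer from removing the allocation.
        total += black_box(&buffer).len();
    }
    total
}

/// Reuses one pre-sized buffer for all items and returns the bytes handled.
fn process_with_reused_buffer(items: &[&[u8]], max_item_size: usize) -> usize {
    let mut buffer = Vec::with_capacity(max_item_size);
    let mut total = 0;
    for item in items {
        buffer.clear();
        buffer.extend_from_slice(item);
        total += black_box(&buffer).len();
    }
    total
}
```

Wrapping each variant in `allocations_during` gives a rough comparison: in a single-threaded run, the fresh-buffer version should request about one allocation per non-empty item, and the reused-buffer version about one in total.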
The Bottleneck Decision Tree
CPU-bound? (check with perf stat)
├── Yes -> Flamegraph -> Find hot function -> Optimize or SIMD
└── No
    ├── I/O-bound? (renacer trace)
    │   ├── Disk -> Buffered I/O, mmap, async
    │   └── Network -> Connection pooling, batching
    └── Memory-bound? (perf stat bandwidth)
        ├── Allocation-heavy -> DHAT, pre-allocate
        └── Cache-miss-heavy -> Improve data layout
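
The tree can also be read as a function that checks each question in order. The sketch below encodes it that way. The `Observations` type and its fields are hypothetical summaries of what the tools showed, not output of `perf` or `renacer`, and the final fallback branch is an addition for cases the tree does not cover.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IoKind {
    Disk,
    Network,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryKind {
    AllocationHeavy,
    CacheMissHeavy,
}

/// Hypothetical summary of what the measurements showed.
#[derive(Debug, Clone, Copy)]
struct Observations {
    cpu_bound: bool,
    io_bound: Option<IoKind>,
    memory_bound: Option<MemoryKind>,
}

/// Walks the decision tree top to bottom and returns the suggested action.
fn next_step(obs: Observations) -> &'static str {
    // The CPU question is asked first, as in the tree.
    if obs.cpu_bound {
        return "Flamegraph -> find hot function -> optimize or SIMD";
    }
    // The I/O branch is only reached when the workload is not CPU-bound.
    if let Some(kind) = obs.io_bound {
        return match kind {
            IoKind::Disk => "Buffered I/O, mmap, async",
            IoKind::Network => "Connection pooling, batching",
        };
    }
    match obs.memory_bound {
        Some(MemoryKind::AllocationHeavy) => "DHAT, pre-allocate",
        Some(MemoryKind::CacheMissHeavy) => "Improve data layout",
        // Not part of the tree: nothing matched, so measure again.
        None => "Unclassified: gather more measurements",
    }
}
```

Encoding the order explicitly makes one property of the tree visible: a workload flagged as CPU-bound never reaches the I/O or memory branches, even if those symptoms are also present.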
The 2.05x throughput improvement described in Profiling was found with this process: perf stat showed low IPC, the flamegraph showed rayon synchronization overhead, and reducing the thread count from 48 to 16 eliminated the cache line bouncing.
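
A thread-count experiment like this can be approximated by timing the same work at several thread counts. The sketch below uses only the standard library as a stand-in; it is not the rayon-based code from the project, and `parallel_sum` and `sweep_thread_counts` are hypothetical names.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Sums `data` with up to `threads` scoped workers, one contiguous chunk each.
fn parallel_sum(data: &[u64], threads: usize) -> u64 {
    let threads = threads.max(1);
    // Ceiling division; at least 1 so `chunks` never receives zero.
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any, so they run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

/// Times `parallel_sum` once at each candidate thread count.
fn sweep_thread_counts(data: &[u64], candidates: &[usize]) -> Vec<(usize, Duration)> {
    candidates
        .iter()
        .map(|&threads| {
            let start = Instant::now();
            // black_box keeps the result from being optimized away.
            black_box(parallel_sum(data, threads));
            (threads, start.elapsed())
        })
        .collect()
}
```

A single timing per thread count is noisy, so a real comparison would repeat each measurement and compare it against the IPC reported by perf stat. More threads do not always mean more throughput, which is the effect the 48-to-16 change exploited.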