Bottleneck Identification

Identifying the true bottleneck before optimizing prevents effort being spent on code that does not limit performance. This chapter covers CPU profiling, syscall analysis, and memory allocation tracking.

CPU Profiling with Flamegraph

cargo install flamegraph
cargo flamegraph --root --bin batuta -- analyze /path/to/project

Reading the Flamegraph

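| Pattern              | Meaning                   | Action                        |
|----------------------|---------------------------|-------------------------------|
| Wide plateau at top  | Single function dominates | Optimize or parallelize       |
| Many thin towers     | Overhead spread evenly    | Algorithmic improvement       |
| Deep call stack      | Excessive abstraction     | Consider inlining             |
| alloc:: frames       | Allocation overhead       | Pre-allocate or stack buffers |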
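
The "pre-allocate" action in the last row can be checked with a small experiment. The sketch below is illustrative only: `fill` is a hypothetical helper, not part of batuta, and it counts how often a `Vec` changes capacity while being filled. Each capacity change corresponds to an allocation or reallocation, which is the kind of work that shows up as `alloc::` frames.

```rust
/// Pushes `n` values into `v` and returns how many times the capacity
/// changed along the way. Each change means the Vec had to allocate
/// or reallocate its backing storage.
fn fill(mut v: Vec<u64>, n: usize) -> usize {
    let mut capacity_changes = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != last_capacity {
            capacity_changes += 1;
            last_capacity = v.capacity();
        }
    }
    capacity_changes
}

fn main() {
    let n = 10_000;
    // Starts empty and grows on demand.
    let grown = fill(Vec::new(), n);
    // Reserves room for all n elements up front.
    let preallocated = fill(Vec::with_capacity(n), n);
    println!("grown on demand: {grown} capacity changes");
    println!("pre-allocated:   {preallocated} capacity changes");
}
```

`Vec::with_capacity(n)` guarantees room for at least `n` elements, so the pre-allocated run never has to grow. The exact count for the growing run depends on the standard library's growth strategy.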

Syscall Analysis with renacer

renacer trace -- batuta transpile --source ./src

| Symptom         | Syscall Pattern          | Fix                           |
|-----------------|--------------------------|-------------------------------|
| Slow file I/O   | Many small read() calls  | BufReader                     |
| Slow startup    | Many open() on configs   | Lazy load or include_str!     |
| Memory pressure | Frequent mmap/munmap     | Pre-allocate, reuse buffers   |
| Lock contention | futex() spinning         | Reduce critical section       |
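
As a sketch of the first row, the example below contrasts a deliberately unbuffered reader with one wrapped in `std::io::BufReader`. The function names and the temporary file are assumptions made for illustration. The unbuffered version issues one `read()` per byte, which is the pattern a syscall trace would flag; the buffered version reads in larger chunks and serves lines from memory.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};
use std::path::Path;

/// Counts lines through a BufReader, which fills an in-memory buffer
/// with large reads and hands out lines from that buffer.
fn count_lines_buffered(path: &Path) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        line?; // Propagate I/O errors instead of ignoring them.
        count += 1;
    }
    Ok(count)
}

/// Anti-pattern for comparison: reads one byte per call, so every
/// byte of the file costs a separate read() on the File.
fn count_lines_unbuffered(path: &Path) -> io::Result<usize> {
    let mut file = File::open(path)?;
    let mut byte = [0u8; 1];
    let mut count = 0;
    while file.read(&mut byte)? == 1 {
        if byte[0] == b'\n' {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file used only for this demonstration.
    let path = std::env::temp_dir().join("bufreader_demo.txt");
    let contents: String = (0..1000).map(|i| format!("line {i}\n")).collect();
    std::fs::write(&path, contents)?;

    println!("buffered:   {} lines", count_lines_buffered(&path)?);
    println!("unbuffered: {} lines", count_lines_unbuffered(&path)?);

    std::fs::remove_file(&path)?;
    Ok(())
}
```

Both functions return the same count for a file whose lines all end in a newline. The difference is in how many syscalls they make, which is what the trace would show.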

Memory Allocation Tracking

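fn main() {
    // Illustrative inputs; real code would take these from the workload.
    let items: [&[u8]; 3] = [b"alpha", b"beta", b"gamma"];
    let max_item_size = items.iter().map(|item| item.len()).max().unwrap_or(0);

    // Allocate once and reuse the buffer instead of allocating per item.
    let mut buffer: Vec<u8> = Vec::with_capacity(max_item_size);
    for item in items {
        buffer.clear(); // Resets the length but keeps the capacity.
        buffer.extend_from_slice(item);
        process(&buffer);
    }
}

// Placeholder for the real per-item work.
fn process(data: &[u8]) {
    println!("processing {} bytes", data.len());
}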
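
Buffer reuse only pays off if allocation is actually frequent, so it helps to count allocations before and after the change. The following is a minimal sketch that assumes no external profiler is available: it wraps the system allocator in a hypothetical `CountingAllocator` and compares a fresh-buffer loop with a reused-buffer loop. The helper names are invented for this example. The counter is process-wide, so the numbers are approximate in multi-threaded programs.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards to the system allocator and counts allocation requests.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns how many allocations the process made meanwhile.
fn count_allocations<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

/// Allocates a new Vec for every item.
fn fresh_buffer_each_time(items: &[&[u8]]) -> usize {
    let mut total = 0;
    for item in items {
        let buffer = item.to_vec();
        // black_box keeps the optimizer from removing the allocation.
        total += black_box(&buffer).len();
    }
    total
}

/// Allocates one Vec up front and reuses it for every item.
fn reused_buffer(items: &[&[u8]]) -> usize {
    let max_item_size = items.iter().map(|item| item.len()).max().unwrap_or(0);
    let mut buffer: Vec<u8> = Vec::with_capacity(max_item_size);
    let mut total = 0;
    for item in items {
        buffer.clear();
        buffer.extend_from_slice(item);
        total += black_box(&buffer).len();
    }
    total
}

fn main() {
    let items: Vec<&[u8]> = vec![b"some payload bytes".as_slice(); 1000];
    let fresh = count_allocations(|| {
        fresh_buffer_each_time(&items);
    });
    let reused = count_allocations(|| {
        reused_buffer(&items);
    });
    println!("fresh buffers: {fresh} allocations");
    println!("reused buffer: {reused} allocations");
}
```

A counter like this only reports how many allocations happened, not where they came from. A heap profiler such as DHAT attributes allocations to call sites and is the better tool once the count looks suspicious.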

The Bottleneck Decision Tree

CPU-bound? (check with perf stat)
├── Yes -> Flamegraph -> Find hot function -> Optimize or SIMD
└── No
    ├── I/O-bound? (renacer trace)
    │   ├── Disk -> Buffered I/O, mmap, async
    │   └── Network -> Connection pooling, batching
    └── Memory-bound? (perf stat bandwidth)
        ├── Allocation-heavy -> DHAT, pre-allocate
        └── Cache-miss-heavy -> Improve data layout
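
For the last branch, "improve data layout" often means keeping the fields a hot loop touches next to each other in memory. The sketch below uses a hypothetical `Particle` type, not taken from batuta, to compare an array-of-structs layout with a struct-of-arrays layout for a loop that reads only one field.

```rust
/// Array-of-structs element: 28 bytes, of which a mass-only loop uses 4.
#[derive(Clone, Copy)]
struct Particle {
    position: [f32; 3],
    velocity: [f32; 3],
    mass: f32,
}

/// Struct-of-arrays: each field is stored contiguously, so a loop over
/// `masses` touches no position or velocity data.
#[allow(dead_code)] // positions and velocities are not read in this sketch
struct Particles {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
    masses: Vec<f32>,
}

/// Converts the interleaved layout into the per-field layout.
fn to_soa(particles: &[Particle]) -> Particles {
    Particles {
        positions: particles.iter().map(|p| p.position).collect(),
        velocities: particles.iter().map(|p| p.velocity).collect(),
        masses: particles.iter().map(|p| p.mass).collect(),
    }
}

/// Strides through whole Particle structs to read one field each.
fn total_mass_aos(particles: &[Particle]) -> f32 {
    particles.iter().map(|p| p.mass).sum()
}

/// Reads a dense array of f32 values.
fn total_mass_soa(particles: &Particles) -> f32 {
    particles.masses.iter().sum()
}

fn main() {
    let aos = vec![
        Particle { position: [0.0; 3], velocity: [1.0; 3], mass: 2.0 };
        1000
    ];
    let soa = to_soa(&aos);
    println!("AoS total mass: {}", total_mass_aos(&aos));
    println!("SoA total mass: {}", total_mass_soa(&soa));
}
```

Both functions compute the same sum. Whether the struct-of-arrays form is faster depends on the access pattern, so the change should be confirmed with cache-miss counters rather than assumed.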

The 2.05x throughput improvement described in the Profiling chapter was found with this process: perf stat showed low IPC, the flamegraph showed rayon synchronization overhead, and reducing the thread count from 48 to 16 eliminated cache-line bouncing.

