Performance Regressions

Transpiled Rust code is expected to run faster than the original, but regressions do happen. This chapter covers the three most common causes and how to catch regressions in CI.

1. Allocation Hotspots

The most frequent cause is excessive heap allocation from naive type translations:

// BAD: allocates a new Vec on every iteration
for line in lines {
    let tokens: Vec<&str> = line.split(',').collect();
    process(&tokens);
}

// GOOD: reuse the vector. This compiles only when `lines` yields &str
// borrowed from a buffer that outlives `tokens`.
let mut tokens: Vec<&str> = Vec::with_capacity(64);
for line in lines {
    tokens.clear();
    tokens.extend(line.split(','));
    process(&tokens);
}

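Diagnose with perf stat -e page-faults ./target/release/app. Page faults are only a coarse proxy for allocation activity, so compare the count before and after a change rather than reading it in isolation.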
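The reuse pattern above can be checked without a profiler. The following is a minimal sketch, assuming the input is one borrowed &str so that tokens can safely outlive each line; the function names total_tokens_naive and total_tokens_reuse are hypothetical and exist only for this illustration. It confirms that both loops compute the same result and that the reused buffer is never reallocated, by comparing its data pointer before and after the loop.

```rust
/// Counts comma-separated tokens, collecting into a fresh Vec per line.
fn total_tokens_naive(text: &str) -> usize {
    let mut total = 0;
    for line in text.lines() {
        // A new heap allocation happens here on every iteration.
        let tokens: Vec<&str> = line.split(',').collect();
        total += tokens.len();
    }
    total
}

/// Same count, but reuses one buffer.
/// Returns the total and whether the buffer had to be reallocated.
fn total_tokens_reuse(text: &str) -> (usize, bool) {
    let mut tokens: Vec<&str> = Vec::with_capacity(64);
    let start_ptr = tokens.as_ptr();
    let mut total = 0;
    for line in text.lines() {
        tokens.clear(); // keeps the capacity, drops the contents
        tokens.extend(line.split(','));
        total += tokens.len();
    }
    // The pointer only changes if the Vec grew past its capacity.
    (total, tokens.as_ptr() != start_ptr)
}

fn main() {
    let text = "a,b,c\nd,e\nf,g,h,i\n";
    let naive = total_tokens_naive(text);
    let (reused, reallocated) = total_tokens_reuse(text);
    println!("naive = {naive}, reused = {reused}, reallocated = {reallocated}");
}
```

The initial capacity of 64 is taken from the example above; if a line can hold more tokens than that, the buffer grows once and then stays at the larger size.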

2. SIMD Not Engaging

By default, rustc targets a conservative baseline CPU (on x86-64, SSE2 only). AVX2 and AVX-512 require explicit opt-in:

# .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]

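A binary built with target-cpu=native may fail with an illegal-instruction error on a CPU that lacks those features. Alternatively, use trueno for automatic runtime SIMD dispatch: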

use trueno::Vector;
let result = Vector::from_slice(&data).sum();
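To make the idea of runtime dispatch concrete, here is a minimal sketch that uses only the standard library. It is not how trueno is implemented; the function names add_scalar, add_avx2, and add_in_place are hypothetical. The AVX2 variant contains the same loop as the fallback, compiled with the avx2 target feature enabled for that one function, and the dispatcher picks a variant after checking the CPU at run time.

```rust
/// Portable fallback, compiled for the baseline target.
fn add_scalar(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

/// Same loop, but the compiler may use AVX2 instructions inside this function.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

/// Hypothetical dispatcher: picks the AVX2 variant when the CPU supports it.
fn add_in_place(dst: &mut [f32], src: &[f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at run time just above.
            unsafe { add_avx2(dst, src) };
            return;
        }
    }
    add_scalar(dst, src);
}

fn main() {
    let mut dst = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    let src = vec![0.5_f32; 9];
    add_in_place(&mut dst, &src);
    println!("{dst:?}");
}
```

Because the check happens at run time, the same binary stays runnable on CPUs without AVX2, which is the trade-off that target-cpu=native gives up.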

3. GPU Overhead Exceeding Benefit

The 5x PCIe rule: offload to the GPU only when its compute is at least 5x faster than the CPU. Smaller speedups are consumed by the cost of moving data across PCIe.

Workload Size    CPU Time   GPU Total   Use GPU?
1K elements      0.1 ms     0.52 ms     No
100K elements    10 ms      1.0 ms      Yes
10M elements     1000 ms    7 ms        Yes

Batuta’s backend selector applies this rule automatically.
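The arithmetic behind the table can be sketched in a few lines. This is one reading of the rule, assuming the comparison is made against the GPU total time (transfer plus compute) as listed in the table; it is not Batuta's selector, and the names gpu_speedup and should_use_gpu are hypothetical.

```rust
/// Hypothetical helper: speedup of the GPU path, transfers included.
fn gpu_speedup(cpu_ms: f64, gpu_total_ms: f64) -> f64 {
    cpu_ms / gpu_total_ms
}

/// Assumed reading of the 5x rule: offload only at a 5x or better speedup.
fn should_use_gpu(cpu_ms: f64, gpu_total_ms: f64) -> bool {
    gpu_speedup(cpu_ms, gpu_total_ms) >= 5.0
}

fn main() {
    // (label, CPU time in ms, GPU total in ms), copied from the table above.
    let rows = [
        ("1K elements", 0.1, 0.52),
        ("100K elements", 10.0, 1.0),
        ("10M elements", 1000.0, 7.0),
    ];
    for (label, cpu_ms, gpu_ms) in rows {
        println!(
            "{label}: speedup {:.2}x, use GPU: {}",
            gpu_speedup(cpu_ms, gpu_ms),
            should_use_gpu(cpu_ms, gpu_ms)
        );
    }
}
```

For the 1K row the GPU path is slower than the CPU outright, so any threshold at or above 1x rejects it; the 5x margin only matters for workloads between those extremes.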

Regression Detection in CI

# Save baseline on main branch
cargo bench -- --save-baseline main

# On PR branch, compare
cargo bench -- --baseline main

Criterion reports statistical significance. A regression greater than 5% should block the merge.
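The 5% gate is a relative-change check. A minimal sketch of that arithmetic follows, assuming mean timings in nanoseconds have already been extracted from the benchmark output; the function names relative_change and blocks_merge are hypothetical and are not part of Criterion.

```rust
/// Relative change of `current` versus `baseline` (0.10 means 10% slower).
fn relative_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns
}

/// Hypothetical gate: true when the slowdown exceeds the threshold.
fn blocks_merge(baseline_ns: f64, current_ns: f64, threshold: f64) -> bool {
    relative_change(baseline_ns, current_ns) > threshold
}

fn main() {
    let threshold = 0.05; // the 5% limit from the text above
    // Illustrative timings only, not measured results.
    for (baseline, current) in [(100.0, 103.0), (100.0, 110.0), (100.0, 90.0)] {
        println!(
            "{baseline} ns -> {current} ns: change {:+.1}%, block merge: {}",
            relative_change(baseline, current) * 100.0,
            blocks_merge(baseline, current, threshold)
        );
    }
}
```

A gate like this is best combined with Criterion's significance report, so that a noisy run on a shared CI machine does not block a merge on its own.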

