Benchmarking Methodology

Trueno-DB uses Criterion.rs for statistical benchmarking, following the Toyota Way principle of Genchi Genbutsu (Go and See).

Running Benchmarks

# Run all benchmarks
cargo bench

# Run specific benchmark suite
cargo bench --bench storage_benchmarks

# Run specific benchmark
cargo bench --bench storage_benchmarks parquet_loading

Benchmark Suites

Storage Benchmarks (storage_benchmarks.rs) ✅

Measures Arrow storage backend performance:

  • Parquet file loading
  • Morsel iteration overhead
  • RecordBatch slicing (zero-copy)
  • Memory calculation performance
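
For orientation, the RecordBatch-slicing case looks roughly like the sketch below; the make_batch helper, the single Int64 column, and the 64K morsel size are illustrative rather than the exact code in storage_benchmarks.rs.

use std::hint::black_box;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use criterion::{criterion_group, criterion_main, Criterion};

// Illustrative helper: a single Int64 column with `rows` values.
fn make_batch(rows: usize) -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let col: ArrayRef = Arc::new(Int64Array::from_iter_values(0..rows as i64));
    RecordBatch::try_new(schema, vec![col]).unwrap()
}

fn bench_batch_slicing(c: &mut Criterion) {
    let batch = make_batch(1_000_000);
    // slice() is zero-copy: it only adjusts offset/length on the shared buffers.
    c.bench_function("batch_slicing/64k", |b| b.iter(|| black_box(batch.slice(0, 65_536))));
}

criterion_group!(benches, bench_batch_slicing);
criterion_main!(benches);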

Aggregation Benchmarks (aggregations.rs) 🚧

Future: GPU vs SIMD vs Scalar comparison for:

  • SUM aggregations
  • AVG, COUNT, MIN, MAX
  • Target: 50-100x GPU speedup for 100M+ rows
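
A rough shape for this future suite, using a plain scalar SUM as the baseline; sum_scalar is a stand-in, and the SIMD and GPU variants would follow the same pattern once implemented.

use std::hint::black_box;
use criterion::{criterion_group, criterion_main, Criterion, Throughput};

// Stand-in scalar baseline; the SIMD and GPU kernels live in trueno-db itself.
fn sum_scalar(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn bench_sum(c: &mut Criterion) {
    let values: Vec<i64> = (0..1_000_000).collect();
    let mut group = c.benchmark_group("sum");
    // Report rows/second so backends are compared by throughput, not just raw time.
    group.throughput(Throughput::Elements(values.len() as u64));
    group.bench_function("scalar/1M", |b| b.iter(|| black_box(sum_scalar(&values))));
    // group.bench_function("simd/1M", ...) and "gpu/1M" would be added alongside.
    group.finish();
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);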

Backend Comparison (backend_comparison.rs) 🚧

Future: Backend equivalence verification:

  • GPU == SIMD == Scalar (correctness)
  • Performance comparison
  • Cost model validation
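
The correctness half can be expressed as an ordinary test. A minimal sketch, assuming a hypothetical sum(values, backend) entry point (the real trueno-db API may differ):

// Hypothetical backend selector for illustration only.
#[derive(Clone, Copy)]
enum Backend { Scalar, Simd, Gpu }

fn sum(values: &[i64], backend: Backend) -> i64 {
    match backend {
        // Each arm would dispatch to the real kernel; the scalar path stands in here.
        Backend::Scalar | Backend::Simd | Backend::Gpu => values.iter().sum(),
    }
}

#[test]
fn backends_agree_on_sum() {
    let values: Vec<i64> = (0..100_000).collect();
    let scalar = sum(&values, Backend::Scalar);
    // Integer aggregates must match exactly; float aggregates would use a tolerance.
    assert_eq!(scalar, sum(&values, Backend::Simd));
    assert_eq!(scalar, sum(&values, Backend::Gpu));
}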

Statistical Rigor

Criterion.rs provides:

  • Warm-up phase (3 seconds) - eliminates cold-cache effects
  • 100 samples per benchmark - enough measurements for stable estimates
  • Outlier detection - flags anomalous measurements
  • Confidence intervals - quantify measurement uncertainty
  • Regression detection - tracks performance across runs
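
These are Criterion's defaults; they can also be set explicitly on the benchmark group. A minimal sketch (the bench target name is arbitrary):

use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_example(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| std::hint::black_box(1 + 1)));
}

// Make the defaults explicit: 3 s warm-up, 100 samples, 95% confidence level.
fn configured() -> Criterion {
    Criterion::default()
        .warm_up_time(Duration::from_secs(3))
        .sample_size(100)
        .confidence_level(0.95)
}

criterion_group! {
    name = benches;
    config = configured();
    targets = bench_example
}
criterion_main!(benches);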

Interpreting Results

Time Measurements

parquet_loading/10000   time:   [124.53 µs 125.77 µs 127.17 µs]
                                  ^^^^^^^^  ^^^^^^^^  ^^^^^^^^
                                  Lower CI  Estimate  Upper CI
  • Estimate: Criterion's point estimate of the typical iteration time
  • Confidence Interval: the 95% bounds on that estimate
  • Outliers: measurements flagged as falling outside the expected range

Throughput Calculation

// For 10,000 rows loaded in 125.77 µs:
throughput = 10_000 rows / 0.12577 ms
           ≈ 79,500 rows/ms
           ≈ 79.5M rows/second
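
Criterion can report this number directly when the benchmark declares how many elements each iteration processes. A sketch (the group name is illustrative, and the measured body is a placeholder):

use std::hint::black_box;
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_parquet_throughput(c: &mut Criterion) {
    let rows: u64 = 10_000;
    let mut group = c.benchmark_group("parquet_loading");
    // With Throughput::Elements, Criterion prints rows/second next to the time estimate.
    group.throughput(Throughput::Elements(rows));
    group.bench_function(BenchmarkId::from_parameter(rows), |b| {
        // The real Parquet load goes here; the placeholder keeps the sketch compilable.
        b.iter(|| black_box(rows))
    });
    group.finish();
}

criterion_group!(benches, bench_parquet_throughput);
criterion_main!(benches);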

Toyota Way Principles

Genchi Genbutsu (Go and See)

Measure actual performance - Don't guess, benchmark

  • Real Parquet files (not synthetic data)
  • Multiple dataset sizes (1K, 10K, 100K, 1M rows)
  • Realistic workloads (storage → morsel → GPU queue)
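
A sketch of how the size sweep can be parameterized, assuming pre-generated files under a bench_data/ directory (the path scheme is illustrative):

use std::fs::File;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn bench_parquet_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("parquet_loading");
    for rows in [1_000u64, 10_000, 100_000, 1_000_000] {
        // Illustrative path scheme; the files would be generated before the run.
        let path = format!("bench_data/rows_{rows}.parquet");
        group.bench_with_input(BenchmarkId::from_parameter(rows), &path, |b, path| {
            b.iter(|| {
                let file = File::open(path).unwrap();
                let reader = ParquetRecordBatchReaderBuilder::try_new(file)
                    .unwrap()
                    .build()
                    .unwrap();
                let batches: Vec<_> = reader.map(Result::unwrap).collect();
                black_box(batches)
            })
        });
    }
    group.finish();
}

criterion_group!(benches, bench_parquet_sizes);
criterion_main!(benches);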

Kaizen (Continuous Improvement)

Track performance over time

  • Criterion saves historical data
  • Regression detection on every run
  • Identify performance regressions early

Example output:

parquet_loading/10000   time:   [125.77 µs 126.34 µs 126.93 µs]
                        change: [-2.3421% -1.8934% -1.4128%] (p = 0.00 < 0.05)
                        Performance has improved.
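
Criterion also supports explicit baselines, which makes the comparison independent of whatever run happened to come last (the baseline name main here is just a label):

# Save a baseline, e.g. on the main branch
cargo bench --bench storage_benchmarks -- --save-baseline main

# Later, compare a change against that saved baseline
cargo bench --bench storage_benchmarks -- --baseline main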

Muda (Waste Elimination)

Identify bottlenecks before optimizing

  • Measure morsel iteration overhead
  • Quantify zero-copy benefits
  • Validate architectural assumptions
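
As one example, the zero-copy claim can be quantified by benchmarking a shared-buffer slice against a materialized copy of the same 64K values (a sketch; the sizes are illustrative):

use std::hint::black_box;

use arrow::array::{Array, Int64Array};
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_zero_copy(c: &mut Criterion) {
    let array = Int64Array::from_iter_values(0..1_000_000);
    // Zero-copy: slice() shares the underlying buffer and only adjusts offsets.
    c.bench_function("slice/zero_copy", |b| b.iter(|| black_box(array.slice(0, 65_536))));
    // Materialized copy of the same 64K values, for comparison.
    c.bench_function("slice/copied", |b| {
        b.iter(|| black_box(Int64Array::from(array.values()[..65_536].to_vec())))
    });
}

criterion_group!(benches, bench_zero_copy);
criterion_main!(benches);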

Benchmark-Driven Development

  1. RED: Write a failing performance test (a fuller sketch follows this list)

    // Target: < 1ms for 100K rows
    assert!(duration < Duration::from_millis(1));
  2. GREEN: Implement until benchmark passes

    parquet_loading/100000  time:   [881.1 µs ...]  ✅ PASS
    
  3. REFACTOR: Optimize with benchmarks as safety net

    • Change implementation
    • Re-run benchmarks
    • Ensure no regression
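
A fuller version of the step 1 (RED) check can be written as a plain test with a latency budget. A minimal sketch, assuming a pre-generated 100K-row file at an illustrative path; note that one-shot wall-clock assertions like this are noisier than Criterion measurements, so the budget should leave headroom:

use std::fs::File;
use std::time::{Duration, Instant};

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

#[test]
fn parquet_loading_100k_under_1ms() {
    // Illustrative fixture path; the Phase 1 target below is < 1 ms for 100K rows.
    let file = File::open("bench_data/rows_100000.parquet").unwrap();

    let start = Instant::now();
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)
        .unwrap()
        .build()
        .unwrap();
    let batches: Vec<_> = reader.map(Result::unwrap).collect();
    let duration = start.elapsed();

    assert!(!batches.is_empty());
    assert!(duration < Duration::from_millis(1), "took {duration:?}");
}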

Performance Targets

Phase 1 (Current)

Component              | Target   | Actual | Status
Parquet loading (100K) | < 1 ms   | 881 µs | ✅
Morsel iteration       | < 500 ns | 119 ns | ✅
Batch slicing          | < 200 ns | 108 ns | ✅

Phase 2 (Future)

Component           | Target   | Status
GPU SUM (100M rows) | < 100 ms | 🚧
Backend selection   | < 10 µs  | 🚧
JIT compilation     | < 1 ms   | 🚧

Profiling Integration

For detailed performance analysis:

# CPU profiling with perf: run for 60 s without Criterion's analysis so a profiler can attach
cargo bench --bench storage_benchmarks -- --profile-time 60

# Memory profiling with valgrind
valgrind --tool=massif target/release/deps/storage_benchmarks-*

# Flame graph generation
cargo flamegraph --bench storage_benchmarks

CI/CD Integration

Benchmarks run on every PR:

  • Detect performance regressions
  • A slowdown of more than 5% blocks approval
  • Historical tracking in target/criterion/

Next Steps