Golden Trace Validation
Status: Integrated (v0.7.0)
Tool: Renacer 0.6.2
Purpose: Performance regression detection via syscall tracing
Overview
Golden trace validation uses Renacer, a pure Rust syscall tracer, to capture canonical execution traces for Trueno compute examples. These traces serve as performance baselines, enabling:
- Regression Detection: Detect performance degradation via syscall count/latency budgets
- PCIe Bottleneck Analysis: Identify inefficient GPU memory transfers
- Build-Time Assertions: Enforce performance contracts in CI/CD
- Root Cause Analysis: Correlate syscalls to Rust source code
Quick Start
1. Install Renacer
cargo install renacer --version 0.6.2
2. Capture Golden Traces
cd /path/to/trueno
./scripts/capture_golden_traces.sh
Output:
✅ Captured: golden_traces/backend_detection.json (0.73ms, 87 syscalls)
✅ Captured: golden_traces/matrix_operations.json (1.56ms, 168 syscalls)
✅ Captured: golden_traces/activation_functions.json (1.30ms, 159 syscalls)
✅ Captured: golden_traces/performance_demo.json (1.51ms, 138 syscalls)
✅ Captured: golden_traces/ml_similarity.json (0.82ms, 109 syscalls)
3. View Trace Summary
cat golden_traces/backend_detection_summary.txt
Example Output:
Syscall Summary:
write: 23 calls (0.15ms total)
mmap: 13 calls (0.21ms total)
mprotect: 6 calls (0.08ms total)
munmap: 5 calls (0.04ms total)
...
TOTAL: 87 calls (0.73ms total)
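The summary above is a per-syscall roll-up of call counts and total time. A minimal sketch of that aggregation in Rust, assuming a hypothetical `SyscallEvent` record that holds a syscall name and a duration in microseconds (this is not Renacer's JSON schema or its internal code):

```rust
use std::collections::BTreeMap;

/// Hypothetical trace record: one syscall and how long it took.
struct SyscallEvent {
    name: String,
    duration_us: u64,
}

/// Roll events up into per-syscall totals: (call count, total microseconds).
fn summarize(events: &[SyscallEvent]) -> BTreeMap<String, (usize, u64)> {
    let mut summary: BTreeMap<String, (usize, u64)> = BTreeMap::new();
    for ev in events {
        let entry = summary.entry(ev.name.clone()).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += ev.duration_us;
    }
    summary
}

fn main() {
    let events = vec![
        SyscallEvent { name: "mmap".into(), duration_us: 20 },
        SyscallEvent { name: "write".into(), duration_us: 7 },
        SyscallEvent { name: "mmap".into(), duration_us: 15 },
    ];
    let summary = summarize(&events);

    // Order rows by call count, highest first (stable sort keeps ties alphabetical).
    let mut rows: Vec<(&String, &(usize, u64))> = summary.iter().collect();
    rows.sort_by(|a, b| b.1.0.cmp(&a.1.0));

    let mut total_calls: usize = 0;
    let mut total_us: u64 = 0;
    println!("Syscall Summary:");
    for &(name, &(calls, us)) in &rows {
        println!("{name}: {calls} calls ({:.2}ms total)", us as f64 / 1000.0);
        total_calls += calls;
        total_us += us;
    }
    println!("TOTAL: {total_calls} calls ({:.2}ms total)", total_us as f64 / 1000.0);
}
```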
Traced Operations
1. Backend Detection (backend_detection)
Purpose: Validate SIMD backend auto-selection (AVX-512 → AVX2 → SSE2 → Scalar)
Performance Budget:
- Runtime: <10ms
- Syscalls: <100
- Memory: <10MB
Actual Performance: ✅
- Runtime: 0.73ms (~14× under budget)
- Syscalls: 87
- Top syscalls: write (23), mmap (13), mprotect (6)
Trace Capture:
renacer --format json -- ./target/release/examples/backend_detection > backend_detection.json
2. Matrix Operations (matrix_operations)
Purpose: Measure SIMD-accelerated matrix multiply and transpose overhead
Performance Budget:
- Runtime: <20ms
- Syscalls: <200
Actual Performance: ✅
- Runtime: 1.56ms (~13× under budget)
- Syscalls: 168
Key Insight: SIMD operations are compute-bound; the syscalls that remain come mostly from stdout writes and allocation.
3. ML Activation Functions (activation_functions)
Purpose: Measure SIMD-accelerated activations (ReLU, sigmoid, tanh, GELU, swish)
Performance Budget:
- Runtime: <20ms
- Syscalls: <200
Actual Performance: ✅
- Runtime: 1.30ms
- Syscalls: 159
4. Performance Demo (performance_demo)
Purpose: Comprehensive benchmark across vector ops, matrix ops, and backend comparisons
Performance Budget:
- Runtime: <50ms
- Syscalls: <300
Actual Performance: ✅
- Runtime: 1.51ms (~33× under budget)
- Syscalls: 138
5. ML Similarity (ml_similarity)
Purpose: Measure vector similarity operations (cosine, Euclidean, Manhattan)
Performance Budget:
- Runtime: <20ms
- Syscalls: <200
Actual Performance: ✅ Fastest compute example (only backend_detection, which does no math workload, runs quicker)
- Runtime: 0.82ms
- Syscalls: 109
Why Fast: Heavily optimized SIMD dot product, minimal allocations
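The headroom figures above are simply the runtime budget divided by the measured runtime. A small Rust sketch that recomputes them from the numbers in this section and checks each syscall budget; the `Example` struct and `EXAMPLES` table are hypothetical helpers for illustration, not part of Trueno or Renacer:

```rust
/// One traced example: its budgets and the measured values listed above.
struct Example {
    name: &'static str,
    budget_ms: f64,
    budget_syscalls: u32,
    runtime_ms: f64,
    syscalls: u32,
}

impl Example {
    /// Budgets are strict "<" limits, as written in each section.
    fn within_budget(&self) -> bool {
        self.runtime_ms < self.budget_ms && self.syscalls < self.budget_syscalls
    }

    /// How many times the measured runtime fits into the runtime budget.
    fn headroom(&self) -> f64 {
        self.budget_ms / self.runtime_ms
    }
}

const EXAMPLES: [Example; 5] = [
    Example { name: "backend_detection", budget_ms: 10.0, budget_syscalls: 100, runtime_ms: 0.73, syscalls: 87 },
    Example { name: "matrix_operations", budget_ms: 20.0, budget_syscalls: 200, runtime_ms: 1.56, syscalls: 168 },
    Example { name: "activation_functions", budget_ms: 20.0, budget_syscalls: 200, runtime_ms: 1.30, syscalls: 159 },
    Example { name: "performance_demo", budget_ms: 50.0, budget_syscalls: 300, runtime_ms: 1.51, syscalls: 138 },
    Example { name: "ml_similarity", budget_ms: 20.0, budget_syscalls: 200, runtime_ms: 0.82, syscalls: 109 },
];

fn main() {
    for ex in EXAMPLES.iter() {
        let status = if ex.within_budget() { "OK" } else { "OVER BUDGET" };
        println!(
            "{:<22} {:>5.1}x headroom, {}/{} syscalls, {}",
            ex.name, ex.headroom(), ex.syscalls, ex.budget_syscalls, status
        );
    }
}
```

Computed this way, matrix_operations has about 12.8× headroom and activation_functions about 15.4×.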
Performance Assertions (renacer.toml)
Critical Path Latency
[[assertion]]
name = "example_startup_latency"
type = "critical_path"
max_duration_ms = 100
fail_on_violation = true
enabled = true
Rationale: Compute examples should complete quickly. 100ms allows for SIMD initialization and small-scale computations.
Violation Symptoms:
- SIMD overhead issues
- Unexpected I/O operations
- Debug build instead of release
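To check the same 100ms budget outside Renacer, the sketch below times an example binary end to end with `std::time::Instant` and reports any overrun in the "Actual / Budget / Regression" form used by the CI failure example on this page. It approximates the critical path with the process's total wall-clock time, which is an assumption; Renacer's own `critical_path` analysis may compute it differently. `time_command` and `check_latency` are hypothetical names.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Run a program and return its wall-clock duration (spawn through exit).
fn time_command(program: &str) -> std::io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new(program).status()?;
    let elapsed = start.elapsed();
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ));
    }
    Ok(elapsed)
}

/// Compare a measured time against a max_duration_ms budget; report overrun in percent.
fn check_latency(actual_ms: f64, max_duration_ms: f64) -> Result<(), String> {
    if actual_ms <= max_duration_ms {
        Ok(())
    } else {
        let regression = (actual_ms - max_duration_ms) / max_duration_ms * 100.0;
        Err(format!(
            "Actual: {actual_ms:.0}ms, Budget: {max_duration_ms:.0}ms, Regression: +{regression:.0}%"
        ))
    }
}

fn main() {
    const MAX_DURATION_MS: f64 = 100.0; // mirrors example_startup_latency
    match time_command("./target/release/examples/backend_detection") {
        Ok(elapsed) => {
            let ms = elapsed.as_secs_f64() * 1000.0;
            match check_latency(ms, MAX_DURATION_MS) {
                Ok(()) => println!("within budget: {ms:.2}ms"),
                Err(msg) => eprintln!("FAILED: {msg}"),
            }
        }
        Err(e) => eprintln!("could not run example: {e}"),
    }
}
```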
Syscall Budget
[[assertion]]
name = "max_syscall_budget"
type = "span_count"
max_spans = 500
fail_on_violation = true
enabled = true
Rationale: SIMD operations are CPU-bound, so the few syscalls they make come mostly from stdout writes and mmap-backed allocation. The budget catches regressions that introduce unexpected I/O.
Typical Syscalls:
- write: stdout output (20-50 calls)
- mmap: vector/matrix allocation (10-30 calls)
- mprotect: memory permissions (5-10 calls)
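The typical ranges can double as a quick sanity check on a syscall histogram. A sketch, assuming per-syscall counts have already been extracted from a trace into a map (extraction depends on Renacer's output format and is not shown); `check_syscalls` is a hypothetical helper, and the ranges are treated as soft expectations rather than hard limits:

```rust
use std::collections::HashMap;

/// Expected call-count ranges from the "Typical Syscalls" list.
const TYPICAL: [(&str, usize, usize); 3] = [("write", 20, 50), ("mmap", 10, 30), ("mprotect", 5, 10)];
/// Mirrors max_spans in the max_syscall_budget assertion.
const MAX_SPANS: usize = 500;

/// Return human-readable problems; an empty list means the mix looks normal.
fn check_syscalls(counts: &HashMap<&str, usize>) -> Vec<String> {
    let mut problems = Vec::new();
    let total: usize = counts.values().sum();
    if total > MAX_SPANS {
        problems.push(format!("total {total} exceeds span budget {MAX_SPANS}"));
    }
    for (name, lo, hi) in TYPICAL {
        let n = counts.get(name).copied().unwrap_or(0);
        if n < lo || n > hi {
            problems.push(format!("{name}: {n} calls, typical range {lo}-{hi}"));
        }
    }
    problems
}

fn main() {
    // backend_detection counts from the trace summary earlier on this page.
    let counts = HashMap::from([("write", 23), ("mmap", 13), ("mprotect", 6), ("munmap", 5)]);
    let problems = check_syscalls(&counts);
    if problems.is_empty() {
        println!("syscall mix looks normal");
    }
    for p in &problems {
        println!("warning: {p}");
    }
}
```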
Memory Allocation Budget
[[assertion]]
name = "memory_allocation_budget"
type = "memory_usage"
max_bytes = 104857600 # 100MB
tracking_mode = "allocations"
fail_on_violation = true
enabled = true
Rationale: Small examples should have minimal memory footprint. 100MB accommodates matrix allocations and SIMD buffers.
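A complementary in-process check is a counting global allocator that records peak live heap bytes and compares them with the same 100MB figure. This is a sketch independent of Renacer; its `tracking_mode = "allocations"` may measure something different (for example cumulative rather than peak bytes). `CountingAlloc` and `peak_bytes` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and tracks live and peak heap bytes.
struct CountingAlloc;

static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Add to the live total, then raise the peak if this is a new high
            // (approximate under heavy concurrency, fine for a budget check).
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // alloc_zeroed and realloc keep the trait's default implementations,
    // which call alloc/dealloc above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Mirrors max_bytes in memory_allocation_budget (100MB).
const MAX_BYTES: usize = 104_857_600;

fn peak_bytes() -> usize {
    PEAK.load(Ordering::Relaxed)
}

fn main() {
    // Stand-in workload: a 1024x1024 f32 matrix (4 MiB).
    let matrix = vec![0.0f32; 1024 * 1024];
    std::hint::black_box(&matrix);
    let peak = peak_bytes();
    println!("peak heap: {peak} bytes (budget {MAX_BYTES})");
    assert!(peak <= MAX_BYTES, "memory budget exceeded");
}
```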
PCIe Bottleneck Detection
[[assertion]]
name = "detect_pcie_bottleneck"
type = "anti_pattern"
pattern = "PCIeBottleneck"
threshold = 0.7
fail_on_violation = false # Warning only
enabled = true
Pattern Detected: GPU transfer time >> compute time
Symptoms:
- Many write/read syscalls to /dev/nvidia*
- High ioctl frequency for GPU operations
- Transfer overhead dominates (>70% of total time)
Example Warning:
⚠️ PCIe Bottleneck detected (confidence: 85%)
GPU transfers: 45ms (90% of total time)
Compute time: 5ms (10% of total time)
Recommendation: Batch operations, keep data on GPU
Solution:
- Batch multiple operations
- Keep intermediate results on GPU
- Use larger workloads (amortize transfer costs)
- Trueno automatically disables GPU for small ops (v0.2.1+)
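The rule compares transfer time with total time. With the numbers from the example warning (45ms of transfers, 5ms of compute), the transfer share is 45 / (45 + 5) = 0.9, well past the 0.7 threshold. A minimal sketch of that ratio check; the function names are hypothetical, and Renacer's actual detector (including how it derives its confidence score) is not modeled here:

```rust
/// Fraction of total time spent moving data over PCIe.
fn transfer_fraction(transfer_ms: f64, compute_ms: f64) -> f64 {
    let total = transfer_ms + compute_ms;
    if total == 0.0 { 0.0 } else { transfer_ms / total }
}

/// True when transfers exceed `threshold` of total time (0.7 = more than 70%).
fn is_pcie_bottleneck(transfer_ms: f64, compute_ms: f64, threshold: f64) -> bool {
    transfer_fraction(transfer_ms, compute_ms) > threshold
}

fn main() {
    const THRESHOLD: f64 = 0.7; // detect_pcie_bottleneck threshold
    let (transfer_ms, compute_ms) = (45.0, 5.0); // from the example warning
    let share = transfer_fraction(transfer_ms, compute_ms);
    if is_pcie_bottleneck(transfer_ms, compute_ms, THRESHOLD) {
        println!("PCIe bottleneck: transfers are {:.0}% of total time", share * 100.0);
    } else {
        println!("transfer share {:.0}% is within threshold", share * 100.0);
    }
}
```

Batching and keeping intermediate results on the GPU shrink `transfer_ms` relative to `compute_ms`, which is what moves the ratio back under the threshold.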
CI/CD Integration
GitHub Actions Workflow
Add to .github/workflows/ci.yml:
name: Golden Trace Validation
on: [push, pull_request]
jobs:
  validate-traces:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Renacer
        run: cargo install renacer --version 0.6.2
      - name: Build Examples (Release)
        run: cargo build --release --examples
      - name: Capture Golden Traces
        run: ./scripts/capture_golden_traces.sh
      - name: Run Performance Assertions
        run: |
          renacer --assert renacer.toml -- ./target/release/examples/backend_detection
          renacer --assert renacer.toml -- ./target/release/examples/matrix_operations
          renacer --assert renacer.toml -- ./target/release/examples/activation_functions
      - name: Upload Traces
        uses: actions/upload-artifact@v4
        with:
          name: golden-traces
          path: golden_traces/
CI Failure Example:
❌ Assertion 'example_startup_latency' FAILED
Actual: 125ms
Budget: 100ms
Regression: +25%
⚠️ Build BLOCKED. SIMD overhead regression detected.
Advanced Usage
1. Source Code Correlation
Map syscalls to Rust source code:
renacer -s -- ./target/release/examples/backend_detection
Output:
write(1, "Backend: AVX2\n", 14) = 14 [src/lib.rs:245]
mmap(...) = 0x7f... [src/vector.rs:89]
Use Case: Identify which code paths trigger GPU initialization or excessive allocations.
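When the annotated output gets long, grouping syscalls by source file shows which modules drive allocation or I/O. A sketch that parses lines in the format shown above, assuming each line ends with a `[file:line]` annotation as in the sample (real output may vary); `parse_line` and `calls_per_file` are hypothetical helpers:

```rust
use std::collections::BTreeMap;

/// Split one annotated line into (syscall name, "file:line" location).
fn parse_line(line: &str) -> Option<(&str, &str)> {
    let name = line.split('(').next()?.trim();
    // Use the last bracket pair so brackets inside syscall arguments are ignored.
    let open = line.rfind('[')?;
    let close = line.rfind(']')?;
    if name.is_empty() || close <= open {
        return None;
    }
    Some((name, &line[open + 1..close]))
}

/// Count syscalls per source file, dropping the line number.
fn calls_per_file<'a>(lines: impl Iterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for line in lines {
        if let Some((_, location)) = parse_line(line) {
            let file = location.split(':').next().unwrap_or(location);
            *counts.entry(file).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // Sample lines copied from the `renacer -s` output above.
    let output = "write(1, \"Backend: AVX2\\n\", 14) = 14 [src/lib.rs:245]\n\
                  mmap(...) = 0x7f... [src/vector.rs:89]";
    for (file, n) in calls_per_file(output.lines()) {
        println!("{file}: {n} syscall(s)");
    }
}
```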
2. OpenTelemetry Export
Visualize traces in Jaeger:
# Start Jaeger
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 4317:4317 \
-p 16686:16686 \
jaegertracing/all-in-one:latest
# Export trace
renacer --otlp http://localhost:4317 -- ./target/release/examples/performance_demo
# View in Jaeger UI (`open` is macOS; on Linux use xdg-open or a browser)
open http://localhost:16686
Use Case: Visualize syscall timelines for multi-operation pipelines.
3. Regression Analysis
Compare current execution against baseline:
# Capture current trace
renacer --format json -- ./target/release/examples/backend_detection > current.json
# Compare with golden
diff <(jq '.syscalls | length' golden_traces/backend_detection.json) \
<(jq '.syscalls | length' current.json)
Expected: Syscall count within ±5% of the golden trace. Note that diff flags any difference, so a mismatch still has to be checked against that tolerance.
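Because `diff` only reports whether the counts differ, a small Rust check can apply the ±5% tolerance directly. This sketch assumes the golden and current counts are passed as two command-line arguments (for example the numbers printed by the two `jq` commands above); `check_tolerance` is a hypothetical binary name.

```rust
/// Allowed relative change in syscall count (5%).
const TOLERANCE: f64 = 0.05;

/// Relative change of `current` against `golden` as a fraction (0.05 = 5%).
fn relative_change(golden: usize, current: usize) -> f64 {
    if golden == 0 {
        // Avoid dividing by zero: no change is 0%, anything else is unbounded.
        return if current == 0 { 0.0 } else { f64::INFINITY };
    }
    (current as f64 - golden as f64).abs() / golden as f64
}

fn within_tolerance(golden: usize, current: usize, tolerance: f64) -> bool {
    relative_change(golden, current) <= tolerance
}

fn main() {
    let args: Vec<usize> = std::env::args()
        .skip(1)
        .filter_map(|a| a.trim().parse().ok())
        .collect();
    let (golden, current) = match args.as_slice() {
        [g, c] => (*g, *c),
        _ => {
            eprintln!("usage: check_tolerance <golden_count> <current_count>");
            return;
        }
    };
    let change = relative_change(golden, current) * 100.0;
    if within_tolerance(golden, current, TOLERANCE) {
        println!("OK: {golden} -> {current} syscalls ({change:.1}% change)");
    } else {
        eprintln!("REGRESSION: {golden} -> {current} syscalls ({change:.1}% change, limit 5%)");
        std::process::exit(1);
    }
}
```

For example, `check_tolerance 87 90` passes (3.4% change), while `check_tolerance 87 95` reports a 9.2% change and exits with status 1, which fails a CI step.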
4. GPU Workload Analysis
For GPU-enabled builds:
# Build with GPU feature
cargo build --release --examples --features gpu
# Trace GPU example
renacer --format json -- ./target/release/examples/gpu_test > gpu_trace.json
# Filter GPU device operations
jq '.syscalls[] | select(.name == "ioctl" or .name == "write")' gpu_trace.json
Expected: GPU operations show as ioctl calls to /dev/nvidia0
Red Flag: If transfer syscalls dominate, GPU is inefficient for this workload size.
Toyota Way Principles
Andon (Stop the Line)
Implementation: Build-time assertions fail CI on regression
[[assertion]]
fail_on_violation = true # ← Andon: Stop the CI pipeline
Poka-Yoke (Error-Proofing)
Implementation: Golden traces make expected patterns explicit
# Automated comparison prevents silent regressions
diff golden_traces/backend_detection.json new_trace.json
Jidoka (Autonomation)
Implementation: Automated quality enforcement without manual intervention
# GitHub Actions runs golden trace validation automatically
- name: Validate Performance
  run: ./scripts/capture_golden_traces.sh
Troubleshooting
Issue: Capture script fails with "Binary not found"
Solution:
cargo build --release --examples
./scripts/capture_golden_traces.sh
Issue: Performance regression detected
Diagnosis:
renacer --summary --timing -- ./target/release/examples/backend_detection
cat golden_traces/backend_detection_summary.txt
Common Causes:
- Debug build instead of release (cargo build --release)
- SIMD features disabled (check RUSTFLAGS)
- New dependencies (increase initialization overhead)
Issue: Syscall count regression
Diagnosis:
renacer -- ./target/release/examples/backend_detection > current_trace.txt
diff current_trace.txt golden_traces/backend_detection_summary.txt
Common Causes:
- New logging initialization (env_logger, tracing)
- Allocator changes (jemalloc → system allocator)
- Library updates (different I/O patterns)
Performance Baselines (v0.7.0)
| Example | Runtime | Syscalls | Top Syscall | Status |
|---|---|---|---|---|
| backend_detection | 0.73ms | 87 | write (23) | ✅ |
| matrix_operations | 1.56ms | 168 | write (45) | ✅ |
| activation_functions | 1.30ms | 159 | write (38) | ✅ |
| performance_demo | 1.51ms | 138 | mmap (25) | ✅ |
| ml_similarity | 0.82ms | 109 | write (28) | ✅ Fastest compute example |
Platform: x86_64 Linux, AVX2 backend, Release build
Last Updated: 2025-11-23
Renacer Version: 0.6.2
Trueno Version: 0.7.0