Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Phase 3: Optimization

Phase 3 analyzes transpiled code for compute-intensive patterns and selects optimal execution backends using Mixture-of-Experts (MoE) routing.

Overview

After transpilation produces Rust code, the optimization phase identifies opportunities for hardware acceleration:

Transpiled .rs files
       │
       ▼
┌──────────────────┐
│ Pattern Scanner  │ ← Scan for matmul, reduce, iter patterns
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  MoE Router      │ ← BackendSelector::select_with_moe()
│  (5× PCIe Rule)  │
└────────┬─────────┘
         │
    ┌────┼────┐
    ▼    ▼    ▼
 Scalar SIMD  GPU     ← Per-pattern recommendation

The 5x PCIe Dispatch Rule

Based on Gregg & Hazelwood (2011), GPU dispatch is only beneficial when:

compute_time > 5 × transfer_time

This prevents wasteful GPU dispatch for small workloads where PCIe transfer overhead dominates. The --gpu-threshold flag controls the matrix size cutoff (default: 500).

Compute Pattern Classification

PatternComplexityRecommended Backend
matmul/gemm/dot_productHighGPU (if above threshold)
.sum()/.fold()/reduceMediumSIMD
.iter().map()/.zip()LowScalar

Cargo Profile Optimization

The optimizer writes [profile.release] settings to Cargo.toml:

Profileopt-levelLTOcodegen-unitsStrip
Fast2off16
Balanced3thin4
Aggressive3full1symbols

Jidoka Integration

If optimization analysis fails (e.g., output directory missing), the phase is marked as failed in the workflow state machine. Subsequent phases (Validation, Build) will refuse to run until the issue is resolved.

CLI Reference

See batuta optimize for full command documentation.


Previous: Phase 2: Transpilation Next: Phase 4: Validation