Phase 3: Optimization

Phase 3 analyzes transpiled code for compute-intensive patterns and selects optimal execution backends using Mixture-of-Experts (MoE) routing.

Overview

After transpilation produces Rust code, the optimization phase identifies opportunities for hardware acceleration:

Transpiled .rs files
       │
       ▼
┌──────────────────┐
│ Pattern Scanner  │ ← Scan for matmul, reduce, iter patterns
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  MoE Router      │ ← BackendSelector::select_with_moe()
│  (5× PCIe Rule)  │
└────────┬─────────┘
         │
    ┌────┼────┐
    ▼    ▼    ▼
 Scalar SIMD  GPU     ← Per-pattern recommendation

The 5x PCIe Dispatch Rule

Based on Gregg & Hazelwood (2011), GPU dispatch is only beneficial when:

compute_time > 5 × transfer_time

This rule prevents GPU dispatch for small workloads, where PCIe transfer overhead dominates total runtime. The --gpu-threshold flag sets the matrix size cutoff for GPU recommendations (default: 500).
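
The inequality above can be sketched as a small predicate. This is a minimal illustration, not Batuta's implementation: the names CostEstimate, gpu_is_worthwhile, and choose_backend are hypothetical, and the cost figures are assumed to come from some external estimator.

```rust
/// Backends shown in the pipeline diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// Hypothetical cost estimate for one workload, in milliseconds.
#[derive(Debug, Clone, Copy)]
pub struct CostEstimate {
    pub compute_ms: f64,
    pub transfer_ms: f64,
}

/// The factor from the rule: compute_time > 5 x transfer_time.
pub const PCIE_DISPATCH_RATIO: f64 = 5.0;

/// Returns true only when compute time strictly exceeds five times
/// the PCIe transfer time.
pub fn gpu_is_worthwhile(cost: CostEstimate) -> bool {
    cost.compute_ms > PCIE_DISPATCH_RATIO * cost.transfer_ms
}

/// Picks the GPU when the rule holds, otherwise the caller's fallback.
/// The fallback parameter is an assumption made for this sketch.
pub fn choose_backend(cost: CostEstimate, fallback: Backend) -> Backend {
    if gpu_is_worthwhile(cost) {
        Backend::Gpu
    } else {
        fallback
    }
}
```

For example, a workload with 10 ms of compute and 1 ms of transfer passes the rule, while one with 4 ms of compute and 1 ms of transfer falls back to the CPU-side backend.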

Compute Pattern Classification

Pattern	Complexity	Recommended Backend
`matmul`/`gemm`/`dot_product`	High	GPU (if above threshold)
`.sum()`/`.fold()`/`reduce`	Medium	SIMD
`.iter().map()`/`.zip()`	Low	Scalar
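
One way to picture the table is as a substring scan over transpiled source lines. The sketch below is illustrative only: classify_line and recommend are hypothetical names, the matching is naive text search rather than real syntax analysis, and the SIMD fallback for below-threshold matrix work is an assumption the table does not state.

```rust
/// Complexity tiers from the classification table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// True if the line contains any of the given substrings.
fn hits(line: &str, needles: &[&str]) -> bool {
    needles.iter().any(|n| line.contains(*n))
}

/// Classifies one source line, checking the highest tier first so that
/// `xs.iter().map(..).sum()` counts as a reduction, not a plain map.
pub fn classify_line(line: &str) -> Option<Complexity> {
    const HIGH: [&str; 3] = ["matmul", "gemm", "dot_product"];
    const MEDIUM: [&str; 3] = [".sum()", ".fold(", "reduce"];
    const LOW: [&str; 2] = [".iter().map(", ".zip("];

    if hits(line, &HIGH) {
        Some(Complexity::High)
    } else if hits(line, &MEDIUM) {
        Some(Complexity::Medium)
    } else if hits(line, &LOW) {
        Some(Complexity::Low)
    } else {
        None
    }
}

/// Maps a tier to a backend. `gpu_threshold` mirrors the role of
/// --gpu-threshold; treating "above" as strictly greater, and falling
/// back to SIMD below the threshold, are assumptions of this sketch.
pub fn recommend(complexity: Complexity, matrix_size: usize, gpu_threshold: usize) -> Backend {
    match complexity {
        Complexity::High if matrix_size > gpu_threshold => Backend::Gpu,
        Complexity::High | Complexity::Medium => Backend::Simd,
        Complexity::Low => Backend::Scalar,
    }
}
```

With a threshold of 500, a matmul over size 1024 would be recommended for the GPU, while the same pattern at size 100 would stay on the CPU.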

Cargo Profile Optimization

The optimizer writes [profile.release] settings to Cargo.toml:

Profile	`opt-level`	LTO	`codegen-units`	Strip
Fast	2	off	16	—
Balanced	3	thin	4	—
Aggressive	3	full	1	symbols
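
To make the table concrete, the sketch below renders each profile as a [profile.release] section. The Profile enum and render_release_profile are hypothetical names, not Batuta's API. Two mapping choices are assumptions: Cargo spells full LTO as "fat" (or true), so the table's "full" is written as "fat" here, and a dash in the Strip column is taken to mean the strip key is omitted.

```rust
/// The three optimization profiles from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Fast,
    Balanced,
    Aggressive,
}

/// Renders a `[profile.release]` section for the given profile.
pub fn render_release_profile(profile: Profile) -> String {
    // (opt-level, lto, codegen-units, strip) per table row.
    let (opt_level, lto, codegen_units, strip) = match profile {
        Profile::Fast => (2, "off", 16, None),
        Profile::Balanced => (3, "thin", 4, None),
        // Assumption: the table's "full" LTO maps to Cargo's "fat".
        Profile::Aggressive => (3, "fat", 1, Some("symbols")),
    };

    let mut out = String::from("[profile.release]\n");
    out.push_str(&format!("opt-level = {opt_level}\n"));
    out.push_str(&format!("lto = \"{lto}\"\n"));
    out.push_str(&format!("codegen-units = {codegen_units}\n"));
    // Only emit `strip` when the profile asks for it.
    if let Some(mode) = strip {
        out.push_str(&format!("strip = \"{mode}\"\n"));
    }
    out
}
```

Under these assumptions, the Aggressive profile would produce a section with opt-level = 3, lto = "fat", codegen-units = 1, and strip = "symbols".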

Jidoka Integration

If optimization analysis fails (e.g., the output directory is missing), the phase is marked as failed in the workflow state machine. Subsequent phases (Validation, Build) refuse to run until the issue is resolved.
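
The gating behaviour described above can be sketched as a tiny state machine. This is an assumed model for illustration only: Phase, Status, and Workflow are hypothetical types, and only the phases named on this page are included.

```rust
use std::collections::HashMap;

/// Phases named on this page, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Phase {
    Transpilation,
    Optimization,
    Validation,
    Build,
}

const ORDER: [Phase; 4] = [
    Phase::Transpilation,
    Phase::Optimization,
    Phase::Validation,
    Phase::Build,
];

/// Recorded outcome of a phase; a phase with no entry has not run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Completed,
    Failed(String),
}

#[derive(Default)]
pub struct Workflow {
    statuses: HashMap<Phase, Status>,
}

impl Workflow {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records (or overwrites) the outcome of a phase.
    pub fn record(&mut self, phase: Phase, status: Status) {
        self.statuses.insert(phase, status);
    }

    /// A phase may run only if every earlier phase has completed.
    pub fn can_run(&self, phase: Phase) -> Result<(), String> {
        for earlier in ORDER.iter().take_while(|p| **p != phase) {
            match self.statuses.get(earlier) {
                Some(Status::Completed) => {}
                Some(Status::Failed(reason)) => {
                    return Err(format!("{earlier:?} failed: {reason}"));
                }
                None => return Err(format!("{earlier:?} has not completed")),
            }
        }
        Ok(())
    }
}
```

In this model, recording Optimization as failed blocks both Validation and Build; recording it as completed after a fix unblocks Validation again.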

CLI Reference

See batuta optimize for full command documentation.
