Optimization Settings

The [optimization] section controls Phase 3 of the pipeline: SIMD vectorization, GPU dispatch, backend selection, and the Trueno compute backend.

Top-Level Settings

```toml
[optimization]
profile = "balanced"
enable_simd = true
enable_gpu = false
gpu_threshold = 500
use_moe_routing = false
```

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `profile` | string | `"balanced"` | Optimization profile: `"fast"`, `"balanced"`, or `"aggressive"` |
| `enable_simd` | bool | `true` | Enable SIMD vectorization (AVX2/AVX-512/NEON) |
| `enable_gpu` | bool | `false` | Enable GPU dispatch via wgpu |
| `gpu_threshold` | integer | `500` | Minimum matrix dimension before GPU dispatch is considered |
| `use_moe_routing` | bool | `false` | Enable Mixture-of-Experts backend selection |

Optimization Profiles

| Profile | Compile Time | Runtime | Use Case |
|---------|--------------|---------|----------|
| `fast` | Fastest | Good | Development iteration |
| `balanced` | Moderate | Better | Default for most projects |
| `aggressive` | Slowest | Best | Production, benchmarking |
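For example, a production or benchmarking build might pair the `aggressive` profile with GPU dispatch enabled. This is a sketch, not a recommended default; tune `gpu_threshold` to your hardware:

```toml
# Production configuration: longest compile times, best runtime.
[optimization]
profile = "aggressive"
enable_simd = true
enable_gpu = true
gpu_threshold = 500
```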

Backend Selection Thresholds

Batuta selects backends with a cost model derived from the 5x PCIe rule (Gregg and Hazelwood, 2011). The gpu_threshold value sets the minimum matrix dimension at which GPU dispatch becomes profitable after accounting for host-to-device transfer overhead.

  • Below the threshold: SIMD or scalar execution on CPU.
  • Above the threshold: GPU dispatch if enable_gpu is true.

When use_moe_routing is enabled, a Mixture-of-Experts router learns from prior dispatch decisions and adjusts thresholds adaptively.
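The static (non-MoE) decision rule above can be sketched as follows. This is a hypothetical illustration of the threshold logic, not Batuta's actual selector; the function name and signature are invented for this example:

```python
# Hypothetical sketch of threshold-based backend selection. Below
# gpu_threshold, PCIe transfer overhead dominates any GPU speedup,
# so work stays on the CPU (SIMD); at or above it, GPU dispatch is
# considered profitable (per the 5x PCIe rule discussed above).

def select_backend(matrix_dim: int,
                   gpu_threshold: int = 500,
                   enable_gpu: bool = True) -> str:
    """Pick an execution backend for a square matrix of size matrix_dim."""
    if enable_gpu and matrix_dim >= gpu_threshold:
        return "gpu"
    return "simd"
```

With the default threshold of 500, a 100x100 matrix stays on SIMD while a 1000x1000 matrix is dispatched to the GPU; disabling `enable_gpu` forces SIMD regardless of size.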

Trueno Backend Configuration

```toml
[optimization.trueno]
backends = ["simd", "cpu"]
adaptive_thresholds = false
cpu_threshold = 500
```

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `backends` | array | `["simd", "cpu"]` | Backend priority order (`"gpu"`, `"simd"`, `"cpu"`) |
| `adaptive_thresholds` | bool | `false` | Learn dispatch thresholds from runtime telemetry |
| `cpu_threshold` | integer | `500` | Element count below which scalar CPU is preferred over SIMD |

Target Architecture Hints

The backends array is ordered by preference. Batuta tries each backend in order and falls back to the next if the preferred one is unavailable or below the dispatch threshold.

```toml
# GPU-first configuration for a machine with a discrete GPU
[optimization.trueno]
backends = ["gpu", "simd", "cpu"]
adaptive_thresholds = true
cpu_threshold = 256
```

```toml
# Conservative CPU-only configuration
[optimization.trueno]
backends = ["cpu"]
adaptive_thresholds = false
cpu_threshold = 0
```

The row-major tensor layout mandate (LAYOUT-002) applies to all backends. See the Memory Layout chapter for details.
