
# MoE Backend Selection

The Mixture-of-Experts (MoE) router is the core decision engine in Phase 3 optimization. It classifies each compute operation by complexity and data size, then selects the optimal backend: Scalar, SIMD, or GPU.

## How MoE Routing Works

The `BackendSelector::select_with_moe()` method takes two inputs:

  1. Operation complexity – Low, Medium, or High
  2. Data size – number of elements in the operation
```rust
pub fn select_with_moe(&self, complexity: OpComplexity, data_size: usize) -> Backend {
    match complexity {
        OpComplexity::Low => {
            if data_size > 1_000_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
        OpComplexity::Medium => {
            if data_size > 100_000 { Backend::GPU }
            else if data_size > 10_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
        OpComplexity::High => {
            if data_size > 10_000 { Backend::GPU }
            else if data_size > 1_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
    }
}
```

## Complexity Classification

| Level  | Operations                       | Algorithmic Complexity | Memory Pattern   |
|--------|----------------------------------|------------------------|------------------|
| Low    | add, subtract, multiply, reshape | O(n)                   | Memory-bound     |
| Medium | sum, mean, max, min, dot product | O(n)                   | Moderate compute |
| High   | matmul, convolution, attention   | O(n^2) or O(n^3)       | Compute-bound    |
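The classification above can be sketched as a small enum with a lookup. The `Op` enum and its variants here are illustrative placeholders, not the converters' actual operation types:

```rust
// Illustrative sketch of complexity classification; the real converters
// attach this metadata to their own operation enums.
#[derive(Debug, PartialEq, Clone, Copy)]
enum OpComplexity { Low, Medium, High }

#[derive(Debug, Clone, Copy)]
enum Op { Add, Reshape, Sum, Mean, MatMul, Attention }

impl Op {
    fn complexity(self) -> OpComplexity {
        match self {
            // O(n), memory-bound element-wise ops
            Op::Add | Op::Reshape => OpComplexity::Low,
            // O(n) reductions with moderate compute
            Op::Sum | Op::Mean => OpComplexity::Medium,
            // O(n^2) or worse, compute-bound ops
            Op::MatMul | Op::Attention => OpComplexity::High,
        }
    }
}

fn main() {
    assert_eq!(Op::Add.complexity(), OpComplexity::Low);
    assert_eq!(Op::Mean.complexity(), OpComplexity::Medium);
    assert_eq!(Op::MatMul.complexity(), OpComplexity::High);
}
```

Keeping the mapping in one `match` means the compiler forces every new operation to declare its complexity class.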

## Threshold Table

| Complexity | Scalar          | SIMD               | GPU            |
|------------|-----------------|--------------------|----------------|
| Low        | <= 1M elements  | > 1M elements      | Never          |
| Medium     | <= 10K elements | 10K – 100K elements | > 100K elements |
| High       | <= 1K elements  | 1K – 10K elements  | > 10K elements  |

These thresholds are derived from empirical benchmarks on Trueno SIMD kernels and the 5x PCIe dispatch rule from Gregg and Hazelwood (2011).
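The dispatch rule itself reduces to a simple comparison: GPU offload pays off only when kernel compute time exceeds roughly 5x the PCIe transfer cost. A minimal sketch, where the 5x factor is the rule cited above and the timing inputs are assumed to come from profiling (the function name is illustrative):

```rust
// Sketch of the 5x PCIe dispatch rule: offload to the GPU only when the
// kernel's compute time exceeds ~5x the transfer cost. The factor comes
// from the rule cited in the text; timings are caller-supplied.
fn gpu_dispatch_worthwhile(compute_us: f64, transfer_us: f64) -> bool {
    compute_us > 5.0 * transfer_us
}

fn main() {
    // Heavy compute, small transfer: offload wins.
    assert!(gpu_dispatch_worthwhile(1_000.0, 50.0));
    // Transfer dominates: stay on the CPU.
    assert!(!gpu_dispatch_worthwhile(100.0, 80.0));
}
```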

## Per-Converter Integration

Each framework converter embeds complexity metadata in its operation mappings:

```rust
// NumPy
NumPyOp::Add.complexity()                         // Low
NumPyOp::Sum.complexity()                         // Medium
NumPyOp::Dot.complexity()                         // High

// sklearn
SklearnAlgorithm::StandardScaler.complexity()     // Low
SklearnAlgorithm::LinearRegression.complexity()   // Medium
SklearnAlgorithm::KMeans.complexity()             // High

// PyTorch
PyTorchOperation::TensorCreation.complexity()     // Low
PyTorchOperation::Linear.complexity()             // Medium
PyTorchOperation::Forward.complexity()            // High
```

## End-to-End Example

```rust
let converter = NumPyConverter::new();

// Small array addition: Scalar
converter.recommend_backend(&NumPyOp::Add, 100);       // Scalar

// Large array addition: SIMD
converter.recommend_backend(&NumPyOp::Add, 2_000_000); // SIMD

// Large matrix multiply: GPU
converter.recommend_backend(&NumPyOp::Dot, 50_000);    // GPU
```

The cost model parameters are configurable for different hardware. See GPU Acceleration for tuning details.
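As a sketch of what configurable thresholds could look like, the `MoeThresholds` struct below is hypothetical (not Trueno's actual API); its defaults mirror the threshold table in this chapter:

```rust
// Hypothetical sketch: routing thresholds pulled into a tunable struct
// so the cost model can be re-benchmarked per machine.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend { Scalar, SIMD, GPU }

#[derive(Clone, Copy)]
enum OpComplexity { Low, Medium, High }

struct MoeThresholds {
    low_simd: usize,    // Low: SIMD above this element count
    medium_simd: usize, // Medium: SIMD above this count
    medium_gpu: usize,  // Medium: GPU above this count
    high_simd: usize,   // High: SIMD above this count
    high_gpu: usize,    // High: GPU above this count
}

impl Default for MoeThresholds {
    fn default() -> Self {
        // Defaults taken from the threshold table above.
        Self {
            low_simd: 1_000_000,
            medium_simd: 10_000,
            medium_gpu: 100_000,
            high_simd: 1_000,
            high_gpu: 10_000,
        }
    }
}

impl MoeThresholds {
    fn select(&self, complexity: OpComplexity, n: usize) -> Backend {
        match complexity {
            OpComplexity::Low if n > self.low_simd => Backend::SIMD,
            OpComplexity::Low => Backend::Scalar,
            OpComplexity::Medium if n > self.medium_gpu => Backend::GPU,
            OpComplexity::Medium if n > self.medium_simd => Backend::SIMD,
            OpComplexity::Medium => Backend::Scalar,
            OpComplexity::High if n > self.high_gpu => Backend::GPU,
            OpComplexity::High if n > self.high_simd => Backend::SIMD,
            OpComplexity::High => Backend::Scalar,
        }
    }
}

fn main() {
    let t = MoeThresholds::default();
    assert_eq!(t.select(OpComplexity::Low, 100), Backend::Scalar);
    assert_eq!(t.select(OpComplexity::Low, 2_000_000), Backend::SIMD);
    assert_eq!(t.select(OpComplexity::High, 50_000), Backend::GPU);
}
```

Raising `medium_gpu` on a machine with a slow PCIe link, or lowering it for a GPU with unified memory, changes routing without touching the match logic.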

