MoE Backend Selection

The Mixture-of-Experts (MoE) router is the core decision engine in Phase 3 optimization. It classifies each compute operation by complexity and data size, then selects the optimal backend: Scalar, SIMD, or GPU.

How MoE Routing Works

The BackendSelector::select_with_moe() method takes two inputs:

Operation complexity – Low, Medium, or High
Data size – number of elements in the operation

#![allow(unused)]
fn main() {
pub fn select_with_moe(&self, complexity: OpComplexity, data_size: usize) -> Backend {
    match complexity {
        OpComplexity::Low => {
            if data_size > 1_000_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
        OpComplexity::Medium => {
            if data_size > 100_000 { Backend::GPU }
            else if data_size > 10_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
        OpComplexity::High => {
            if data_size > 10_000 { Backend::GPU }
            else if data_size > 1_000 { Backend::SIMD }
            else { Backend::Scalar }
        }
    }
}
}

Complexity Classification

Level	Operations	Algorithmic Complexity	Memory Pattern
Low	add, subtract, multiply, reshape	O(n)	Memory-bound
Medium	sum, mean, max, min, dot product	O(n)	Moderate compute
High	matmul, convolution, attention	O(n^2) or O(n^3)	Compute-bound

Threshold Table

Complexity	Scalar	SIMD	GPU
Low	< 1M elements	>= 1M elements	Never
Medium	< 10K elements	10K – 100K elements	> 100K elements
High	< 1K elements	1K – 10K elements	> 10K elements

These thresholds are derived from empirical benchmarks on Trueno SIMD kernels and the 5x PCIe dispatch rule from Gregg and Hazelwood (2011).

Per-Converter Integration

Each framework converter embeds complexity metadata in its operation mappings:

#![allow(unused)]
fn main() {
// NumPy
NumPyOp::Add.complexity()                         // Low
NumPyOp::Sum.complexity()                         // Medium
NumPyOp::Dot.complexity()                         // High

// sklearn
SklearnAlgorithm::StandardScaler.complexity()     // Low
SklearnAlgorithm::LinearRegression.complexity()   // Medium
SklearnAlgorithm::KMeans.complexity()             // High

// PyTorch
PyTorchOperation::TensorCreation.complexity()     // Low
PyTorchOperation::Linear.complexity()             // Medium
PyTorchOperation::Forward.complexity()            // High
}

End-to-End Example

#![allow(unused)]
fn main() {
let converter = NumPyConverter::new();

// Small array addition: Scalar
converter.recommend_backend(&NumPyOp::Add, 100);       // Scalar

// Large array addition: SIMD
converter.recommend_backend(&NumPyOp::Add, 2_000_000); // SIMD

// Large matrix multiply: GPU
converter.recommend_backend(&NumPyOp::Dot, 50_000);    // GPU
}

The cost model parameters are configurable for different hardware. See GPU Acceleration for tuning details.

Navigate: Table of Contents