SIMD Vectorization

SIMD (Single Instruction, Multiple Data) vectorization is the primary optimization target in Phase 3. The Trueno crate provides portable SIMD backends that accelerate element-wise and reduction operations across CPU architectures.

Supported SIMD Backends

| Backend | Architecture | Register width | Typical speedup |
|---|---|---|---|
| AVX2 | x86-64 (Haswell+) | 256-bit (8 x f32) | 4-8x |
| AVX-512 | x86-64 (Skylake-X+) | 512-bit (16 x f32) | 8-16x |
| NEON | ARM (ARMv8+) | 128-bit (4 x f32) | 2-4x |
| Scalar | All | 32/64-bit | 1x (baseline) |

Automatic Detection

Trueno detects the best available SIMD instruction set at runtime using cpuid (x86) or feature registers (ARM). When the BackendSelector returns Backend::SIMD, it maps to trueno::Backend::Auto, letting Trueno pick the optimal instruction set:

pub fn to_trueno_backend(backend: Backend) -> trueno::Backend {
    match backend {
        Backend::Scalar => trueno::Backend::Scalar,
        // Let Trueno choose AVX2, AVX-512, or NEON at runtime
        Backend::SIMD   => trueno::Backend::Auto,
        Backend::GPU    => trueno::Backend::GPU,
    }
}
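For readers curious what runtime detection involves, here is a minimal sketch using the Rust standard library's feature-detection macros. It is not Trueno's implementation: the function name `detect_simd` and its string labels are hypothetical, and only the instruction sets from the backend table are checked.

```rust
/// Hypothetical helper (not part of Trueno): returns the widest SIMD
/// instruction set from the backend table that this CPU supports at runtime.
pub fn detect_simd() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        // These std macros perform a cpuid-based check at runtime.
        if is_x86_feature_detected!("avx512f") {
            return "AVX-512";
        }
        if is_x86_feature_detected!("avx2") {
            return "AVX2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        // NEON is part of the AArch64 baseline, but the check is cheap.
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "NEON";
        }
    }
    // No supported SIMD extension found: fall back to scalar code.
    "Scalar"
}
```

On a Haswell-class CPU without AVX-512, this sketch would return `"AVX2"`.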

When SIMD Is Selected

The MoE router selects SIMD for:

  • Low complexity operations (element-wise add, multiply) at 1M+ elements
  • Medium complexity operations (reductions, dot product) at 10K-100K elements
  • High complexity operations (matrix multiply) at 1K-10K elements

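Below these thresholds, scalar code is sufficient. Above the upper bounds for medium- and high-complexity operations, GPU dispatch becomes beneficial; low-complexity element-wise operations have no upper bound in this list and stay on SIMD.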
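The routing rules above can be expressed as a small pure function. This is a sketch only: the `Complexity` enum, the `select_backend` name, and the handling of exact boundary values (inclusive here) are assumptions, not the router's actual code.

```rust
/// Hypothetical operation classes matching the three bullets above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity { Low, Medium, High }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend { Scalar, SIMD, GPU }

/// Sketch of the thresholds listed above; boundary inclusivity is assumed.
pub fn select_backend(complexity: Complexity, n: usize) -> Backend {
    match complexity {
        // Element-wise add/multiply: SIMD from 1M elements, no GPU tier listed.
        Complexity::Low if n >= 1_000_000 => Backend::SIMD,
        Complexity::Low => Backend::Scalar,
        // Reductions and dot products: SIMD between 10K and 100K elements.
        Complexity::Medium if n < 10_000 => Backend::Scalar,
        Complexity::Medium if n <= 100_000 => Backend::SIMD,
        Complexity::Medium => Backend::GPU,
        // Matrix multiply: SIMD between 1K and 10K elements.
        Complexity::High if n < 1_000 => Backend::Scalar,
        Complexity::High if n <= 10_000 => Backend::SIMD,
        Complexity::High => Backend::GPU,
    }
}
```

For example, a 50K-element dot product maps to `Backend::SIMD`, while a 1M-element one maps to `Backend::GPU`.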

Code Patterns That Benefit

| Pattern | Python | Trueno (SIMD) |
|---|---|---|
| Vector addition | np.add(a, b) | a.add(&b) |
| Element-wise multiply | a * b | a.mul(&b) |
| Dot product | np.dot(a, b) | a.dot(&b) |
| Sum reduction | np.sum(a) | a.sum() |
| Matrix multiply | a @ b | mat_a.matmul(&mat_b) |
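To illustrate what SIMD does for a reduction such as `a.dot(&b)`, the sketch below is a portable dot product. It keeps eight independent accumulators, one per f32 lane of a 256-bit AVX2 register, and sums them horizontally at the end. It is plain Rust rather than intrinsics and is not Trueno's implementation; `dot_8_lanes` is a hypothetical name.

```rust
/// Illustrative dot product that mimics an 8-lane (AVX2-width) SIMD loop.
pub fn dot_8_lanes(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    const LANES: usize = 8;
    // One running partial sum per lane.
    let mut acc = [0.0f32; LANES];
    let chunks_a = a.chunks_exact(LANES);
    let chunks_b = b.chunks_exact(LANES);
    // Elements that do not fill a whole 8-lane chunk.
    let (tail_a, tail_b) = (chunks_a.remainder(), chunks_b.remainder());
    for (ca, cb) in chunks_a.zip(chunks_b) {
        // Lanes are independent, so this maps naturally onto vector registers.
        for i in 0..LANES {
            acc[i] += ca[i] * cb[i];
        }
    }
    // Horizontal sum of the lane accumulators, then the scalar tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in tail_a.iter().zip(tail_b) {
        sum += x * y;
    }
    sum
}
```

Because each lane accumulates independently, the optimizer can map the inner loop onto vector instructions. Whether it actually does depends on the enabled target features.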

Example: Vector Addition

#![allow(unused)]
fn main() {
use trueno::Vector;

let a = Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
let b = Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);
let c = a.add(&b).unwrap();
// c = [6.0, 8.0, 10.0, 12.0]
// Automatically uses AVX2/AVX-512/NEON based on CPU
}
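Element-wise operations split naturally into register-sized chunks plus a scalar tail for lengths that are not a multiple of the lane count. The sketch below illustrates that structure in plain Rust with a 4-lane (NEON-width) chunk size. `add_chunked` is a hypothetical helper, not the Trueno API, and it panics on mismatched lengths instead of returning an error.

```rust
/// Hypothetical helper: element-wise addition processed in 4-lane chunks
/// (the f32 width of a 128-bit NEON register), followed by a scalar tail.
pub fn add_chunked(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    const LANES: usize = 4;
    let mut out = Vec::with_capacity(a.len());
    let chunks_a = a.chunks_exact(LANES);
    let chunks_b = b.chunks_exact(LANES);
    let (tail_a, tail_b) = (chunks_a.remainder(), chunks_b.remainder());
    for (ca, cb) in chunks_a.zip(chunks_b) {
        // One "vector" step: all four lanes are added independently.
        for i in 0..LANES {
            out.push(ca[i] + cb[i]);
        }
    }
    // Leftover elements (fewer than LANES) are handled one at a time.
    for (x, y) in tail_a.iter().zip(tail_b) {
        out.push(x + y);
    }
    out
}
```

Each lane performs the same IEEE 754 addition as a plain scalar loop, so the output is identical. For the example inputs above, the result is `[6.0, 8.0, 10.0, 12.0]`.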

Verifying SIMD Usage

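# List the SIMD features the host CPU supports
# (without -C target-cpu=native, only the target's baseline features are printed)
rustc --print cfg -C target-cpu=native | grep target_feature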

# Verify Trueno detected the correct backend
# (RUST_LOG only has an effect if the application initializes a logger, e.g. env_logger)
RUST_LOG=trueno=debug cargo run 2>&1 | grep "Selected backend"
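Features enabled at compile time can differ from what the CPU offers at runtime. The sketch below reports both views for AVX2 on x86-64, using `cfg!(target_feature = ...)` for the compile-time answer and `is_x86_feature_detected!` for the runtime one. The function name `avx2_status` is hypothetical.

```rust
/// Hypothetical diagnostic: (enabled at compile time, detected at runtime) for AVX2.
pub fn avx2_status() -> (bool, bool) {
    // True only if the crate was compiled with AVX2 enabled,
    // e.g. via RUSTFLAGS="-C target-cpu=native".
    let compile_time = cfg!(target_feature = "avx2");

    // Runtime check of the actual CPU; not applicable off x86-64.
    #[cfg(target_arch = "x86_64")]
    let runtime = is_x86_feature_detected!("avx2");
    #[cfg(not(target_arch = "x86_64"))]
    let runtime = false;

    (compile_time, runtime)
}
```

A binary built for the default x86-64 target typically reports `(false, true)` on a Haswell-or-newer CPU. That is the situation runtime dispatch is designed for.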

Portability

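Code using trueno::Backend::Auto compiles and runs on any platform. On systems without SIMD support, Trueno falls back to scalar loops. Element-wise results should be identical to the SIMD path. Reductions such as sum and dot product may differ in the last bits, because SIMD changes the order in which floating-point values are added. No conditional compilation or feature flags are needed in user code.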
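Floating-point addition is not associative. A reduction that accumulates in separate lanes can therefore round differently from a sequential scalar loop, whereas element-wise operations are unaffected. The sketch below makes the effect concrete with two hand-written sums over the same four f32 values. The function names are hypothetical and no Trueno code is involved.

```rust
/// Sequential left-to-right sum, as a simple scalar loop would compute it.
pub fn sum_sequential(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Two-lane sum: even and odd indices accumulate separately, then combine,
/// mimicking the reordering a SIMD reduction performs.
pub fn sum_two_lanes(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 2];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 2] += x;
    }
    lanes[0] + lanes[1]
}
```

For `[1.0e8, 1.0, -1.0e8, 1.0]`, the sequential loop returns 1.0 and the two-lane version returns 2.0 (the exact answer). The difference arises because 1.0e8 + 1.0 rounds back to 1.0e8 in f32. When comparing backends on reductions, a tolerance-based check is more appropriate than exact equality.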


Navigate: Table of Contents