Trueno: Multi-target Compute

Trueno (Spanish: “thunder”) is a Rust library providing unified, high-performance compute primitives across multiple execution targets. It serves as the foundation for numerical computation in the sovereign stack.

Overview

Trueno delivers:

CPU SIMD - x86 (SSE2/AVX/AVX2/AVX-512), ARM (NEON), WASM (SIMD128)
GPU - Vulkan/Metal/DX12/WebGPU via wgpu
WebAssembly - Portable SIMD128 for browser/edge deployment

┌─────────────────────────────────────────────────┐
│           Trueno Public API (Safe)              │
│  compute(), map(), reduce(), transform()        │
└─────────────────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
   ┌────────┐   ┌─────────┐   ┌──────────┐
   │  SIMD  │   │   GPU   │   │   WASM   │
   │ Backend│   │ Backend │   │  Backend │
   └────────┘   └─────────┘   └──────────┘
        │             │             │
   ┌────┴────┐   ┌────┴────┐   ┌───┴─────┐
   │ Runtime │   │  wgpu   │   │ SIMD128 │
   │ Detect  │   │ Compute │   │ Portable│
   └─────────┘   └─────────┘   └─────────┘

Installation

[dependencies]
trueno = "0.14"

# With GPU support
trueno = { version = "0.14", features = ["gpu"] }

# With CUDA monitoring (NVIDIA GPUs)
trueno = { version = "0.14", features = ["cuda-monitor"] }

What’s New in 0.14

Streaming Tensors: Memory-mapped streaming for large datasets
Q5K/Q6K Quantization: Extended quantization formats
Improved WASM: Better WebAssembly SIMD128 support
LZ4/ZSTD Compression: Built-in tensor compression for memory efficiency
GPU PTX Fixes: Resolved NVIDIA PTX codegen issues
AVX-512 Improvements: Better auto-vectorization
Simulation Framework: Toyota-style Jidoka guards and stress testing

Core Features

Vector Operations

#![allow(unused)]
fn main() {
use trueno::{Vector, VectorOps};

// Create vectors
let a = Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
let b = Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);

// Element-wise operations (auto-selects best SIMD backend)
let sum = a.add(&b)?;       // [6.0, 8.0, 10.0, 12.0]
let product = a.mul(&b)?;   // [5.0, 12.0, 21.0, 32.0]
let dot = a.dot(&b)?;       // 70.0

// Reductions
let total = a.sum()?;       // 10.0
let average = a.mean()?;    // 2.5
}

Matrix Operations

#![allow(unused)]
fn main() {
use trueno::Matrix;

let a = Matrix::from_slice(2, 3, &[
    1.0, 2.0, 3.0,
    4.0, 5.0, 6.0,
]);

let b = Matrix::from_slice(3, 2, &[
    7.0, 8.0,
    9.0, 10.0,
    11.0, 12.0,
]);

// Matrix multiplication (SIMD-accelerated)
let c = a.matmul(&b)?;  // 2x2 result

// Transpose
let at = a.transpose();

// Eigendecomposition (symmetric matrices)
let eigen = matrix.symmetric_eigen()?;
}

Activation Functions

#![allow(unused)]
fn main() {
use trueno::activations::*;

let x = Vector::from_slice(&[-1.0, 0.0, 1.0, 2.0]);

// Neural network activations (SIMD-optimized)
let relu_out = relu(&x)?;      // [0.0, 0.0, 1.0, 2.0]
let sigmoid_out = sigmoid(&x)?;
let gelu_out = gelu(&x)?;
let swish_out = swish(&x)?;
let tanh_out = tanh_activation(&x)?;
}

Backend Selection

Trueno automatically selects the optimal backend based on:

Data size - GPU only for large workloads (>100K elements)
CPU features - AVX-512 > AVX2 > AVX > SSE2 > NEON
Operation complexity - Complex ops benefit more from GPU

#![allow(unused)]
fn main() {
use trueno::Backend;

// Auto-select (recommended)
let result = vector.add(&other)?;

// Force specific backend
let result = vector.add_with_backend(&other, Backend::Avx2)?;
let result = vector.add_with_backend(&other, Backend::GPU)?;
}

Backend Priority

Priority	Backend	Condition
1	GPU	Available + size > 100K
2	AVX-512	CPU supports
3	AVX2	CPU supports
4	AVX	CPU supports
5	SSE2	x86_64 baseline
6	NEON	ARM64
7	SIMD128	WASM
8	Scalar	Fallback

Simulation Testing Framework (v0.8.5+)

Trueno 0.8.5 introduces a comprehensive simulation testing framework based on Toyota Production System principles.

SimRng: Deterministic Random Number Generator

#![allow(unused)]
fn main() {
use trueno::simulation::SimRng;

// Deterministic PCG-based RNG
let mut rng = SimRng::new(42);  // Seed for reproducibility

// Generate deterministic random values
let value = rng.next_f32();           // [0.0, 1.0)
let int = rng.next_u32();             // Full u32 range
let range = rng.range(1.0, 10.0);     // Custom range
let normal = rng.normal(0.0, 1.0);    // Gaussian distribution

// Fork for parallel testing (maintains determinism)
let child_rng = rng.fork();
}

BackendSelector: Intelligent Backend Selection

#![allow(unused)]
fn main() {
use trueno::simulation::{BackendSelector, BackendThresholds};

let thresholds = BackendThresholds {
    gpu_min_elements: 100_000,
    simd_min_elements: 32,
};

let selector = BackendSelector::new(thresholds);
let backend = selector.select(data_size, op_complexity);
}

JidokaGuard: Stop-on-Defect Quality Checks

#![allow(unused)]
fn main() {
use trueno::simulation::JidokaGuard;

// Toyota-style quality gate - stops on first defect
let guard = JidokaGuard::new();

// Check for NaN/Inf values
guard.check_finite(&result)?;

// Custom invariant checking
guard.assert_invariant(|| value >= 0.0, "Value must be non-negative")?;
}

BufferRenderer: Visual Regression Testing

#![allow(unused)]
fn main() {
use trueno::simulation::{BufferRenderer, ColorPalette};

let renderer = BufferRenderer::new(800, 600);
let palette = ColorPalette::viridis();

// Render data to RGBA buffer for visual comparison
let buffer = renderer.render_heatmap(&data, &palette)?;

// Compare with golden baseline
let diff = renderer.compare_buffers(&buffer, &golden)?;
assert!(diff.max_error < 1e-5);
}

StressTestConfig: Stress Testing Infrastructure

#![allow(unused)]
fn main() {
use trueno::simulation::{StressTestConfig, StressTestResult};

let config = StressTestConfig {
    iterations: 10_000,
    data_size_range: 100..1_000_000,
    anomaly_threshold: 3.0,  // Standard deviations
};

let result = stress_test(&operation, &config)?;
assert!(result.anomaly_count == 0);
}

BackendTolerance: Cross-Backend Comparison

#![allow(unused)]
fn main() {
use trueno::simulation::BackendTolerance;

let tolerance = BackendTolerance::relaxed();

// Get tolerance for comparing results across backends
let tol = tolerance.for_backends(Backend::GPU, Backend::Scalar);
assert!((gpu_result - scalar_result).abs() < tol);
}

GPU Compute

Synchronous API

#![allow(unused)]
fn main() {
use trueno::gpu::GpuDevice;

let device = GpuDevice::new()?;

// Large matrix multiplication on GPU
let result = device.matmul(&a, &b)?;

// Batch operations
let results = device.batch_add(&vectors_a, &vectors_b)?;
}

Async API

#![allow(unused)]
fn main() {
use trueno::gpu::GpuDevice;

let device = GpuDevice::new()?;

// Non-blocking GPU operations
let future = device.matmul_async(&a, &b);
let result = future.await?;
}

NumPy Compatibility (via Batuta)

Trueno is the target for NumPy → Rust transpilation:

NumPy	Trueno
`np.array([1,2,3])`	`Vector::from_slice(&[1.0,2.0,3.0])`
`np.dot(a, b)`	`a.dot(&b)?`
`a + b`	`a.add(&b)?`
`a @ b`	`a.matmul(&b)?`
`np.sum(a)`	`a.sum()?`
`np.mean(a)`	`a.mean()?`

Performance

Expected speedups vs scalar baseline:

Operation	Size	SSE2	AVX2	AVX-512	GPU
add_f32	1K	2x	4x	8x	-
add_f32	100K	2x	4x	8x	3x
add_f32	1M	2x	4x	8x	10x
add_f32	10M	2x	4x	8x	50x
dot_product	1M	3x	6x	12x	20x
matmul	1K×1K	3x	6x	12x	30x

trueno-gpu - CUDA monitoring via NVML
trueno-db - High-performance vector database
trueno-graph - Graph analytics engine
trueno-viz - GPU-accelerated visualization
trueno-rag - RAG pipeline components

References

Navigate: Table of Contents | Previous: Foundation Libraries | Next: Aprender

Keyboard shortcuts

The Batuta Book