This chapter demonstrates how to integrate the trueno 0.8.8+ compute infrastructure with aprender's ML training pipeline.
The `aprender::compute` module provides ML-specific wrappers around trueno's simulation testing infrastructure, following Toyota Way principles:
- **Jidoka**: Built-in quality, stop on defect (NaN/Inf detection)
- **Poka-Yoke**: Mistake-proofing via type-safe backend selection
- **Heijunka**: Leveled testing across compute backends
| Feature | Description | Use Case |
|---|---|---|
| Backend Selection | Auto CPU/GPU dispatch | Optimize compute for data size |
| Training Guards | NaN/Inf detection | Training stability |
| Divergence Checking | Cross-backend validation | GPU correctness verification |
| Reproducibility | Deterministic seeding | Reproducible experiments |
Automatically select the optimal compute backend based on data size:
```rust
use aprender::compute::{select_backend, should_use_gpu, BackendCategory};

// Auto-select backend
let category = select_backend(data.len(), gpu_available);

match category {
    BackendCategory::SimdOnly => {
        // N < 1,000: pure SIMD (low overhead)
    }
    BackendCategory::SimdParallel => {
        // 1,000 <= N < 100,000: SIMD + Rayon parallelism
    }
    BackendCategory::Gpu => {
        // N >= 100,000: GPU compute (if available)
    }
}

// Helper functions
if should_use_gpu(data.len()) {
    // Offload to GPU
}
```
| Data Size | Backend | Rationale |
|---|---|---|
| N < 1,000 | SIMD only | Parallelization overhead exceeds benefit |
| 1,000 <= N < 100,000 | SIMD + parallel | Rayon parallelism beneficial |
| N >= 100,000 | GPU | GPU offload cost amortized |
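Putting the categories to work, here is a minimal dispatch sketch. It assumes the three `BackendCategory` variants shown above are exhaustive; the `dot_simd`, `dot_simd_parallel`, and `dot_gpu` kernels are naive stand-ins for illustration, not part of aprender or trueno:

```rust
use aprender::compute::{select_backend, BackendCategory};

// Naive scalar stand-ins for backend-specific kernels; real code would call
// into trueno's SIMD, parallel, or GPU paths instead.
fn dot_simd(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
fn dot_simd_parallel(a: &[f32], b: &[f32]) -> f32 {
    dot_simd(a, b)
}
fn dot_gpu(a: &[f32], b: &[f32]) -> f32 {
    dot_simd(a, b)
}

/// Dispatch a dot product through the recommended backend category.
fn dot(a: &[f32], b: &[f32], gpu_available: bool) -> f32 {
    match select_backend(a.len(), gpu_available) {
        BackendCategory::SimdOnly => dot_simd(a, b),
        BackendCategory::SimdParallel => dot_simd_parallel(a, b),
        BackendCategory::Gpu => dot_gpu(a, b),
    }
}
```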
Detect numerical instabilities during training:
```rust
use aprender::compute::TrainingGuard;

let guard = TrainingGuard::new("epoch_1");

// After computing gradients
guard.check_gradients(&gradients)?;

// After weight update
guard.check_weights(&weights)?;

// After loss computation
guard.check_loss(loss)?;
```
| Issue | Cause | Detection |
|---|---|---|
| NaN values | 0/0, sqrt(-1), log(-1) | `check_gradients()`, `check_weights()` |
| Infinity | Overflow, 1/0 | `check_gradients()`, `check_weights()` |
| NaN loss | Gradient explosion | `check_loss()` |
| Infinite loss | Numerical overflow | `check_loss()` |
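Conceptually, each check is a single scan for non-finite values. A minimal sketch of the idea (not the actual aprender implementation) looks like this:

```rust
/// Illustrative only: index of the first non-finite (NaN or +/- infinity)
/// value, if any. The real guard checks also attach the context string
/// ("epoch_1", "training_step_42", ...) to the error they return.
fn first_non_finite(values: &[f32]) -> Option<usize> {
    // f32::is_finite() is false for NaN and both infinities,
    // so one scan covers every row in the table above.
    values.iter().position(|v| !v.is_finite())
}
```

When a check fails, the guard returns an `AprenderError::ValidationError` whose message carries the Jidoka context: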
```rust
use aprender::compute::TrainingGuard;
use aprender::error::AprenderError;

let guard = TrainingGuard::new("training_step_42");

match guard.check_gradients(&gradients) {
    Ok(()) => {
        // Continue training
    }
    Err(AprenderError::ValidationError { message }) => {
        // Jidoka triggered: stop and investigate
        eprintln!("Training stopped: {}", message);
        // Example: "Jidoka: NaN in gradients at training_step_42:nan"
    }
    Err(e) => {
        // Other error: log or propagate
        eprintln!("Unexpected error: {e}");
    }
}
```
Validate that different compute backends produce consistent results:
```rust
use aprender::compute::DivergenceGuard;

// Default ML tolerance (1e-5)
let guard = DivergenceGuard::default_tolerance("cpu_vs_gpu");

// Compare CPU and GPU results
let cpu_result = compute_on_cpu(&input);
let gpu_result = compute_on_gpu(&input);
guard.check(&cpu_result, &gpu_result)?;

// Custom tolerance for specific operations
let relaxed_guard = DivergenceGuard::new(0.01, "approximate_softmax");
relaxed_guard.check(&approx_result, &exact_result)?;
```
| Operation | Recommended Tolerance | Rationale |
|---|---|---|
| Exact arithmetic | 0.0 | Bit-exact results expected |
| FP32 operations | 1e-5 | IEEE 754 single precision |
| Mixed precision | 1e-4 | FP16 accumulation error |
| Approximate kernels | 1e-2 | Algorithmic differences |
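Under the hood, this kind of check amounts to comparing the element-wise difference against the tolerance. A minimal sketch (not the actual `DivergenceGuard` implementation) might look like:

```rust
/// Illustrative only: largest absolute element-wise difference and whether it
/// stays within the tolerance. A production check would also verify that the
/// slices have equal length and carry a context string for error reporting.
fn within_tolerance(a: &[f32], b: &[f32], tolerance: f32) -> (f32, bool) {
    let max_diff = a
        .iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f32, f32::max);
    (max_diff, max_diff <= tolerance)
}
```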
Ensure deterministic training with structured seeding:
```rust
use aprender::compute::ExperimentSeed;
use rand::{rngs::StdRng, SeedableRng};

// Derive all seeds from the master seed
let seed = ExperimentSeed::from_master(42);

println!("Master: {}", seed.master);
println!("Data shuffle: {}", seed.data_shuffle);
println!("Weight init: {}", seed.weight_init);
println!("Dropout: {}", seed.dropout);

// Use in training
let mut rng_data = StdRng::seed_from_u64(seed.data_shuffle);
let mut rng_weights = StdRng::seed_from_u64(seed.weight_init);
let mut rng_dropout = StdRng::seed_from_u64(seed.dropout);
```
Seeds are derived deterministically using LCG multipliers:
| Seed | Derivation | Use |
|---|---|---|
| `master` | Input | Experiment identifier |
| `data_shuffle` | `master * 6364136223846793005` | Dataset shuffling |
| `weight_init` | `master * 1442695040888963407` | Parameter initialization |
| `dropout` | `master * 2685821657736338717` | Dropout/regularization |
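Assuming the multiplications wrap on overflow (the usual way LCG-style derivation is written for `u64` in Rust), the table corresponds roughly to the following; the actual `from_master` may differ in detail:

```rust
/// Illustrative reconstruction of the derivation table above; the real
/// ExperimentSeed::from_master may handle overflow or mixing differently.
fn derive_seeds(master: u64) -> (u64, u64, u64) {
    let data_shuffle = master.wrapping_mul(6364136223846793005);
    let weight_init = master.wrapping_mul(1442695040888963407);
    let dropout = master.wrapping_mul(2685821657736338717);
    (data_shuffle, weight_init, dropout)
}
```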
| Function | Description |
|---|---|
| `select_backend(size, gpu_available)` | Returns recommended `BackendCategory` |
| `should_use_gpu(size)` | Returns `true` if size >= 100,000 |
| `should_use_parallel(size)` | Returns `true` if size >= 1,000 |
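A quick sanity check of the documented cutoffs (both thresholds are inclusive, per the descriptions above):

```rust
use aprender::compute::{should_use_gpu, should_use_parallel};

// Threshold behaviour as documented in the table above.
assert!(!should_use_parallel(999));
assert!(should_use_parallel(1_000));
assert!(!should_use_gpu(99_999));
assert!(should_use_gpu(100_000));
```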
| Method | Description |
|---|---|
| `TrainingGuard::new(context)` | Create guard with context string |
| `check_gradients(&[f32])` | Check for NaN/Inf in gradients |
| `check_weights(&[f32])` | Check for NaN/Inf in weights |
| `check_loss(f32)` | Check for NaN/Inf loss value |
| `check_f64(&[f64], kind)` | Check f64 values |
| Method | Description |
|---|---|
| `DivergenceGuard::new(tolerance, context)` | Create with custom tolerance |
| `DivergenceGuard::default_tolerance(context)` | Create with 1e-5 tolerance |
| `check(&[f32], &[f32])` | Compare two result arrays |
| Method | Description |
|---|---|
| `ExperimentSeed::from_master(seed)` | Derive all seeds from master |
| `ExperimentSeed::new(...)` | Create with explicit seeds |
| `ExperimentSeed::default()` | Master seed = 42 |
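For example, since `default()` uses master seed 42, its derived seeds should match an explicit `from_master(42)` (field names as in the earlier example):

```rust
use aprender::compute::ExperimentSeed;

// Default() is documented as master seed 42, so its derived seeds should
// match an explicit from_master(42).
let default_seed = ExperimentSeed::default();
let explicit_seed = ExperimentSeed::from_master(42);
assert_eq!(default_seed.master, 42);
assert_eq!(default_seed.data_shuffle, explicit_seed.data_shuffle);
```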
Run the integration example:

```bash
cargo run --example trueno_compute_integration
```
The output demonstrates:

- **Backend Selection**: auto-dispatch based on data size
- **Training Guards**: NaN/Inf detection (Jidoka triggered)
- **Divergence Checking**: cross-backend tolerance validation
- **Reproducibility**: deterministic seed derivation
```rust
use aprender::compute::{TrainingGuard, select_backend, ExperimentSeed};

fn train(data: &[f32], epochs: usize) -> Result<Vec<f32>> {
    let seed = ExperimentSeed::from_master(42);
    let backend = select_backend(data.len(), check_gpu_available());
    let mut weights = initialize_weights(seed.weight_init);

    for epoch in 0..epochs {
        let guard = TrainingGuard::new(format!("epoch_{}", epoch));

        // Forward pass
        let output = forward(&weights, data);

        // Backward pass
        let gradients = backward(&output, data);
        guard.check_gradients(&gradients)?;

        // Update weights
        update_weights(&mut weights, &gradients);
        guard.check_weights(&weights)?;

        // Compute loss
        let loss = compute_loss(&output, data);
        guard.check_loss(loss)?;

        println!("Epoch {}: loss = {:.4}", epoch, loss);
    }

    Ok(weights)
}
```
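A caller can treat a Jidoka stop differently from other failures. The following is a sketch, assuming the `Result` alias used by `train` wraps `AprenderError`; `run_experiment` is a hypothetical helper, not part of the aprender API:

```rust
use aprender::error::AprenderError;

// Hypothetical caller: a Jidoka stop (ValidationError) is handled explicitly,
// while any other error is propagated unchanged.
fn run_experiment(data: &[f32]) -> Result<(), AprenderError> {
    match train(data, 10) {
        Ok(weights) => {
            println!("Trained {} weights", weights.len());
            Ok(())
        }
        Err(AprenderError::ValidationError { message }) => {
            eprintln!("Jidoka stop: {message}");
            Ok(())
        }
        Err(e) => Err(e),
    }
}
```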
| Principle | Implementation |
|---|---|
| Jidoka | `TrainingGuard` stops on NaN/Inf |
| Poka-Yoke | Type-safe `BackendCategory` selection |
| Genchi Genbutsu | Detailed error context in guards |
| Heijunka | Leveled backend thresholds |