realizar: Inference Engine
Toyota Way Principle (Heijunka): Level the workload. Batch inference for consistent throughput and predictable latency.
Status: Complete
The Problem: Unpredictable Inference
Traditional inference systems suffer from hidden non-determinism:
# PyTorch inference - hidden non-determinism
model.eval()
with torch.no_grad():
    pred1 = model(x)
    pred2 = model(x)  # May differ on GPU: non-deterministic kernels, cuDNN autotuning
The realizar Solution: Deterministic Inference
┌──────────────────────────────────────────────────────────┐
│                    realizar Pipeline                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Input → Validate → Batch → Predict → Verify → Output     │
│    │        │         │        │        │        │        │
│    ↓        ↓         ↓        ↓        ↓        ↓        │
│  Typed    Bounds   Efficient  Exact   Tracked  Logged     │
│  Data     Check    Batches    Results  Bounds  Response   │
│                                                           │
└──────────────────────────────────────────────────────────┘
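Each stage maps to a small, testable function. A minimal driver sketch, using the `Model` type and `predict_batch` method defined later in this chapter (the `run_pipeline` name and `String` error type are illustrative, not the actual realizar API):
fn run_pipeline(model: &Model, inputs: &[Vec<f64>]) -> Result<Vec<f64>, String> {
    // Validate: every input must match the model's dimensionality.
    for x in inputs {
        if x.len() != model.weights.len() {
            return Err(format!("expected {} features, got {}",
                model.weights.len(), x.len()));
        }
    }
    // Batch + Predict: one pass over the leveled batch.
    let preds = model.predict_batch(inputs);
    // Verify: reject non-finite outputs before they are logged and returned.
    if preds.iter().any(|p| !p.is_finite()) {
        return Err("non-finite prediction".to_string());
    }
    Ok(preds)
}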
Validation
Run all chapter examples:
make run-ch13 # Run inference example
make test-ch13 # Run all tests
Model Definition
#[derive(Debug, Clone)]
struct Model {
    weights: Vec<f64>,
    bias: f64,
    config: InferenceConfig,
}

impl Model {
    fn new(weights: Vec<f64>, bias: f64) -> Self {
        Self {
            weights,
            bias,
            config: InferenceConfig::default(),
        }
    }
}
Single Prediction
impl Model {
    /// Linear prediction: y = w · x + b
    fn predict(&self, x: &[f64]) -> f64 {
        let sum: f64 = self.weights.iter()
            .zip(x.iter())
            .map(|(w, xi)| w * xi)
            .sum();
        sum + self.bias
    }
}
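A quick usage check (the weights and bias are assumed here, chosen to match the Example Output table below):
let model = Model::new(vec![2.0, 3.0], 1.0); // assumed weights and bias
// y = 2·1.0 + 3·1.0 + 1.0 = 6.0
assert!((model.predict(&[1.0, 1.0]) - 6.0).abs() < 1e-10);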
Batch Inference
For efficiency, process multiple inputs at once:
impl Model {
    fn predict_batch(&self, batch: &[Vec<f64>]) -> Vec<f64> {
        batch.iter().map(|x| self.predict(x)).collect()
    }
}
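Under the same assumed weights, a single batch call reproduces the table that follows:
let model = Model::new(vec![2.0, 3.0], 1.0); // assumed weights, as above
let batch = vec![vec![1.0, 1.0], vec![2.0, 2.0], vec![3.0, 3.0]];
let preds = model.predict_batch(&batch);
assert!((preds[0] - 6.0).abs() < 1e-10);
assert!((preds[1] - 11.0).abs() < 1e-10);
assert!((preds[2] - 16.0).abs() < 1e-10);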
Example Output
Input      │ Prediction
───────────┼───────────
[1.0, 1.0] │     6.0000
[2.0, 2.0] │    11.0000
[3.0, 3.0] │    16.0000
Uncertainty Quantification
Provide confidence bounds with predictions:
struct PredictionResult {
    value: f64,
    lower_bound: f64,
    upper_bound: f64,
}

impl PredictionResult {
    /// True if `v` lies inside the inclusive confidence interval
    /// (used by the tests later in this chapter).
    fn contains(&self, v: f64) -> bool {
        v >= self.lower_bound && v <= self.upper_bound
    }
}

impl Model {
    fn predict_with_bounds(&self, x: &[f64], uncertainty: f64) -> PredictionResult {
        let prediction = self.predict(x);
        PredictionResult {
            value: prediction,
            lower_bound: prediction - uncertainty,
            upper_bound: prediction + uncertainty,
        }
    }
}
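A usage sketch, assuming a one-feature model y = 2x + 1 and a ±0.5 band (consistent with the validation table below):
let model = Model::new(vec![2.0], 1.0); // assumed weights and bias
let result = model.predict_with_bounds(&[1.0], 0.5);
assert!((result.value - 3.0).abs() < 1e-10); // bounds are [2.50, 3.50]
assert!(result.contains(3.0));               // target 3.00 is covered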
Validation Against Targets
  x  │ Target │    Bounds    │ Hit?
─────┼────────┼──────────────┼──────
 1.0 │   3.00 │ [2.50, 3.50] │  ✅
 2.0 │   5.00 │ [4.50, 5.50] │  ✅
 3.0 │   6.50 │ [6.50, 7.50] │  ✅
 4.0 │  10.00 │ [8.50, 9.50] │  ❌
Inference Engine
Manage multiple models:
struct InferenceEngine {
    models: Vec<(String, Model)>,
}

impl InferenceEngine {
    fn new() -> Self {
        Self { models: Vec::new() }
    }

    fn register_model(&mut self, name: &str, model: Model) {
        self.models.push((name.to_string(), model));
    }

    /// Look up a registered model by name (used by `predict` below).
    fn get_model(&self, name: &str) -> Option<&Model> {
        self.models.iter()
            .find(|(n, _)| n == name)
            .map(|(_, m)| m)
    }

    fn predict(&self, model_name: &str, x: &[f64]) -> Option<f64> {
        self.get_model(model_name).map(|m| m.predict(x))
    }
}
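Typical usage (the model name "linear" is hypothetical):
let mut engine = InferenceEngine::new();
engine.register_model("linear", Model::new(vec![2.0], 1.0));
assert_eq!(engine.predict("linear", &[3.0]), Some(7.0)); // y = 2·3 + 1
assert_eq!(engine.predict("missing", &[3.0]), None);     // unknown model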
Determinism Guarantee
#[test]
fn test_inference_determinism() {
    let model = Model::new(vec![1.5, 2.5], 0.5);
    let input = vec![1.0, 2.0];
    let mut results = Vec::new();
    for _ in 0..10 {
        results.push(model.predict(&input));
    }
    let first = results[0];
    assert!(results.iter().all(|&r| (r - first).abs() < 1e-15),
        "Inference must be deterministic");
}
Result: All 10 runs produce identical results to 15 decimal places.
Configuration
#[derive(Debug, Clone)]
struct InferenceConfig {
    batch_size: usize,
    num_threads: usize,
    precision: Precision,
}

impl Default for InferenceConfig {
    // Needed by Model::new above; these values are illustrative
    // assumptions, not the canonical defaults.
    fn default() -> Self {
        Self {
            batch_size: 32,
            num_threads: 1,
            precision: Precision::F64,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    F32,
    F64,
}
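How the configuration might drive workload leveling (Heijunka); a sketch assuming the `Model` from earlier sections and illustrative config values:
let model = Model::new(vec![2.0], 1.0); // assumed weights
let config = InferenceConfig {
    batch_size: 2,
    num_threads: 1,
    precision: Precision::F64,
};
// Split an uneven workload into fixed-size batches for predictable latency.
let inputs: Vec<Vec<f64>> = (1..=5).map(|i| vec![i as f64]).collect();
for chunk in inputs.chunks(config.batch_size) {
    let _preds = model.predict_batch(chunk); // at most batch_size inputs per call
}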
EU AI Act Compliance
Article 10: Data Governance
- Model weights fully specified
- No external model loading
- Inference data stays local
Article 13: Transparency
- Predictions fully explainable
- Uncertainty bounds provided
- Model architecture visible
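For a linear model the explainability claim is concrete: each feature contributes w_i · x_i to the output. A minimal sketch (the `explain` helper is hypothetical, not part of the chapter's code):
impl Model {
    /// Hypothetical helper: per-feature contribution to the prediction.
    fn explain(&self, x: &[f64]) -> Vec<f64> {
        self.weights.iter()
            .zip(x.iter())
            .map(|(w, xi)| w * xi)
            .collect()
    }
}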
Article 15: Robustness
- Deterministic predictions
- Type-safe operations
- Reliable batch processing
Comparison: realizar vs TensorFlow Serving
| Aspect | TensorFlow Serving | realizar |
|---|---|---|
| Model format | SavedModel (opaque) | Rust struct (transparent) |
| Determinism | Approximate | Exact |
| Batching | Automatic | Explicit |
| Uncertainty | Not built-in | First-class support |
| Memory safety | C++ runtime | Rust ownership |
Testing
#[test]
fn test_single_prediction() {
    let model = Model::new(vec![2.0], 1.0);
    let pred = model.predict(&[3.0]);
    // y = 2*3 + 1 = 7
    assert!((pred - 7.0).abs() < 1e-10);
}

#[test]
fn test_batch_prediction() {
    let model = Model::new(vec![2.0], 0.0);
    let batch = vec![vec![1.0], vec![2.0], vec![3.0]];
    let preds = model.predict_batch(&batch);
    assert_eq!(preds.len(), 3);
    assert!((preds[0] - 2.0).abs() < 1e-10);
    assert!((preds[1] - 4.0).abs() < 1e-10);
    assert!((preds[2] - 6.0).abs() < 1e-10);
}

#[test]
fn test_prediction_bounds() {
    let model = Model::new(vec![1.0], 0.0);
    let result = model.predict_with_bounds(&[5.0], 1.0);
    assert!(result.contains(5.0));
    assert!(result.contains(4.5));
    assert!(!result.contains(3.0));
}
Key Takeaways
- Deterministic Inference: Same input always produces same output
- Batch Processing: Efficient handling of multiple inputs
- Uncertainty Bounds: Every prediction has confidence intervals
- Model Registry: Manage multiple models in one engine
- Type Safety: Compile-time guarantees on model operations
Next Steps
- Chapter 14: entrenar - Distributed training
- Chapter 15: trueno-db - Vector database
Source Code
Full implementation: examples/ch13-realizar/
# Verify all claims
make test-ch13
# Run examples
make run-ch13