Case Study: Model Serialization (.apr Format)

Save and load ML models with built-in quality controls: checksums, signatures, encryption, and WASM compatibility.

Quick Start

use aprender::format::{save, load, ModelType, SaveOptions};
use aprender::linear_model::LinearRegression;

// Train model
let mut model = LinearRegression::new();
model.fit(&x, &y)?;

// Save
save(&model, ModelType::LinearRegression, "model.apr", SaveOptions::default())?;

// Load
let loaded: LinearRegression = load("model.apr", ModelType::LinearRegression)?;

WASM Compatibility (Hard Requirement)

The .apr format is designed for universal deployment. Every feature works in:

  • Native (Linux, macOS, Windows)
  • WASM (browsers, Cloudflare Workers, Vercel Edge)
  • Embedded (no_std with alloc)

// Same model works everywhere
#[cfg(target_arch = "wasm32")]
async fn load_in_browser() -> Result<LinearRegression> {
    // `fetch` here is an app-provided wrapper over the browser Fetch API
    let bytes = fetch("https://models.example.com/house-prices.apr").await?;
    load_from_bytes(&bytes, ModelType::LinearRegression)
}

#[cfg(not(target_arch = "wasm32"))]
fn load_native() -> Result<LinearRegression> {
    load("house-prices.apr", ModelType::LinearRegression)
}

Why this matters:

  • Train once, deploy anywhere
  • Browser-based ML demos
  • Edge inference (low latency)
  • Serverless functions

Format Structure

┌─────────────────────────────────────────┐
│ Header (32 bytes, fixed)                │ ← Magic, version, type, sizes
├─────────────────────────────────────────┤
│ Metadata (variable, MessagePack)        │ ← Hyperparameters, metrics
├─────────────────────────────────────────┤
│ Salt + Nonce (if ENCRYPTED)             │ ← Security parameters
├─────────────────────────────────────────┤
│ Payload (variable, compressed)          │ ← Model weights (bincode)
├─────────────────────────────────────────┤
│ Signature (if SIGNED)                   │ ← Ed25519 signature
├─────────────────────────────────────────┤
│ License (if LICENSED)                   │ ← Commercial protection
├─────────────────────────────────────────┤
│ Checksum (4 bytes, CRC32)               │ ← Integrity verification
└─────────────────────────────────────────┘
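
The fixed 32-byte header makes cheap validation possible before any deserialization. As a rough sketch (the magic value and field offsets below are assumptions for illustration, not the normative layout from the spec):

use std::fs::File;
use std::io::Read;

// Sketch only: the real magic value and field offsets live in
// docs/specifications/model-format-spec.md, not here.
const APR_MAGIC: &[u8; 4] = b"APRN"; // hypothetical magic

fn check_header(path: &str) -> std::io::Result<()> {
    let mut header = [0u8; 32];
    File::open(path)?.read_exact(&mut header)?;

    // Magic bytes identify the file as .apr
    assert_eq!(&header[0..4], APR_MAGIC.as_slice(), "not an .apr file");

    // Assumed layout: one byte each for major/minor version,
    // then the model type as a little-endian u16
    let (major, minor) = (header[4], header[5]);
    let model_type = u16::from_le_bytes([header[6], header[7]]);
    println!("format {}.{}, model type 0x{:04X}", major, minor, model_type);
    Ok(())
}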

Built-in Quality (Jidoka)

CRC32 Checksum

Every .apr file has a CRC32 checksum. Corruption is detected immediately:

// Automatic verification on load
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
// If checksum fails: AprenderError::ChecksumMismatch { expected, actual }
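
The check itself is straightforward. A minimal sketch with the crc32fast crate, assuming the checksum is stored little-endian in the trailing 4 bytes (the authoritative layout is in the spec):

// Sketch: recompute CRC32 over everything except the trailing checksum.
// Assumes little-endian storage of the final 4 bytes.
fn verify_crc32(bytes: &[u8]) -> bool {
    let (body, tail) = bytes.split_at(bytes.len() - 4);
    let stored = u32::from_le_bytes(tail.try_into().unwrap());
    crc32fast::hash(body) == stored
}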

Type Safety

The model type is encoded in the header, so loading the wrong type fails fast:

// Saved as LinearRegression
save(&lr_model, ModelType::LinearRegression, "lr.apr", opts)?;

// Attempt to load as KMeans - fails immediately
let result: Result<KMeans> = load("lr.apr", ModelType::KMeans);
// Error: "Model type mismatch: file contains LinearRegression, expected KMeans"

Metadata

Store hyperparameters, metrics, and custom data:

let mut options = SaveOptions::default()
    .with_name("house-price-predictor")
    .with_description("Trained on Boston Housing dataset");

// Add hyperparameters
options.metadata.hyperparameters.insert(
    "learning_rate".to_string(),
    serde_json::json!(0.01)
);

// Add metrics
options.metadata.metrics.insert(
    "r2_score".to_string(),
    serde_json::json!(0.95)
);

save(&model, ModelType::LinearRegression, "model.apr", options)?;

Inspection Without Loading

Check model info without deserializing weights:

use aprender::format::inspect;

let info = inspect("model.apr")?;
println!("Model type: {:?}", info.model_type);
println!("Format version: {}.{}", info.format_version.0, info.format_version.1);
println!("Payload size: {} bytes", info.payload_size);
println!("Created: {}", info.metadata.created_at);
println!("Encrypted: {}", info.encrypted);
println!("Signed: {}", info.signed);

Model Types

Value    Type                 Use Case
0x0001   LinearRegression     Regression
0x0002   LogisticRegression   Binary classification
0x0003   DecisionTree         Interpretable classification
0x0004   RandomForest         Ensemble classification
0x0005   GradientBoosting     High-performance ensemble
0x0006   KMeans               Clustering
0x0007   Pca                  Dimensionality reduction
0x0008   NaiveBayes           Probabilistic classification
0x0009   Knn                  Distance-based classification
0x000A   Svm                  Support vector machine
0x0010   NgramLm              Language modeling
0x0011   TfIdf                Text vectorization
0x0012   CountVectorizer      Bag of words
0x0020   NeuralSequential     Deep learning
0x0021   NeuralCustom         Custom architectures
0x0030   ContentRecommender   Recommendations
0x0040   MixtureOfExperts     Sparse/dense MoE ensembles
0x00FF   Custom               User-defined
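
A sketch of how the table maps onto a Rust enum with explicit discriminants (aprender's actual ModelType definition may differ):

// Sketch mirroring the table above; the crate's real enum may differ.
#[repr(u16)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelTypeId {
    LinearRegression = 0x0001,
    LogisticRegression = 0x0002,
    DecisionTree = 0x0003,
    RandomForest = 0x0004,
    GradientBoosting = 0x0005,
    KMeans = 0x0006,
    Pca = 0x0007,
    NaiveBayes = 0x0008,
    Knn = 0x0009,
    Svm = 0x000A,
    NgramLm = 0x0010,
    TfIdf = 0x0011,
    CountVectorizer = 0x0012,
    NeuralSequential = 0x0020,
    NeuralCustom = 0x0021,
    ContentRecommender = 0x0030,
    MixtureOfExperts = 0x0040,
    Custom = 0x00FF,
}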

Encryption (Feature: format-encryption)

Password-Based (Personal/Team)

use aprender::format::{save_encrypted, load_encrypted};

// Save with password (Argon2id + AES-256-GCM)
save_encrypted(&model, ModelType::LinearRegression, "secure.apr",
    SaveOptions::default(), "my-strong-password")?;

// Load with password
let model: LinearRegression = load_encrypted("secure.apr",
    ModelType::LinearRegression, "my-strong-password")?;

Security properties:

  • Argon2id: Memory-hard, GPU-resistant key derivation
  • AES-256-GCM: Authenticated encryption (detects tampering)
  • Random salt: Same password produces different ciphertexts
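
Under the hood this is a standard KDF-plus-AEAD construction. A simplified sketch using the argon2 and aes-gcm crates (parameters and on-disk layout here are illustrative; save_encrypted handles salt and nonce storage for you):

use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Key, Nonce};
use argon2::Argon2;
use rand::{rngs::OsRng, RngCore};

// Sketch of the Argon2id -> AES-256-GCM pipeline; not aprender's exact code.
fn encrypt_payload(password: &str, plaintext: &[u8]) -> Vec<u8> {
    // Fresh random salt: the same password yields a different ciphertext
    let mut salt = [0u8; 16];
    OsRng.fill_bytes(&mut salt);

    // Argon2id derives a 256-bit key; memory-hard, GPU-resistant
    let mut key_bytes = [0u8; 32];
    Argon2::default()
        .hash_password_into(password.as_bytes(), &salt, &mut key_bytes)
        .expect("key derivation");

    // AES-256-GCM authenticates as it encrypts: tampering fails on decrypt
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&key_bytes));
    let mut nonce = [0u8; 12];
    OsRng.fill_bytes(&mut nonce);
    let ciphertext = cipher
        .encrypt(Nonce::from_slice(&nonce), plaintext)
        .expect("encrypt");

    // Illustrative layout only: salt || nonce || ciphertext
    [&salt[..], &nonce[..], &ciphertext[..]].concat()
}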

Recipient-Based (Commercial Distribution)

use aprender::format::{save_for_recipient, load_as_recipient};
use x25519_dalek::{PublicKey, StaticSecret};
use rand::rngs::OsRng;

// Generate buyer's keypair (done once by buyer)
let buyer_secret = StaticSecret::random_from_rng(OsRng);
let buyer_public = PublicKey::from(&buyer_secret);

// Seller encrypts for buyer's public key (no password sharing!)
save_for_recipient(&model, ModelType::LinearRegression, "commercial.apr",
    SaveOptions::default(), &buyer_public)?;

// Only buyer's secret key can decrypt
let model: LinearRegression = load_as_recipient("commercial.apr",
    ModelType::LinearRegression, &buyer_secret)?;

Benefits:

  • No password sharing required
  • Cryptographically bound to buyer (non-transferable)
  • Forward secrecy via ephemeral sender keys
  • Perfect for model marketplaces
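
The no-password-sharing and forward-secrecy properties come from an ephemeral-static X25519 exchange. A rough sketch of the sender side (key schedule simplified; the real pipeline also runs the shared secret through HKDF, per the hkdf dependency):

use rand::rngs::OsRng;
use x25519_dalek::{EphemeralSecret, PublicKey};

// Sketch of ephemeral-static X25519; aprender's actual key schedule
// additionally applies HKDF before using the key with AES-256-GCM.
fn sender_shared_secret(buyer_public: &PublicKey) -> ([u8; 32], PublicKey) {
    // Fresh ephemeral key per file: compromising the seller's long-term
    // keys later cannot recover this secret (forward secrecy)
    let ephemeral = EphemeralSecret::random_from_rng(OsRng);
    let ephemeral_public = PublicKey::from(&ephemeral);

    // Only the holder of buyer_secret can recompute this value
    let shared = ephemeral.diffie_hellman(buyer_public);

    // ephemeral_public ships in the file so the buyer can derive the key
    (*shared.as_bytes(), ephemeral_public)
}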

Digital Signatures (Feature: format-signing)

Verify model provenance:

use aprender::format::{save_signed, load_verified};
use ed25519_dalek::{SigningKey, VerifyingKey};
use rand::rngs::OsRng;

// Generate seller's keypair (done once)
let signing_key = SigningKey::generate(&mut OsRng);
let verifying_key = VerifyingKey::from(&signing_key);

// Sign model with private key
save_signed(&model, ModelType::LinearRegression, "signed.apr",
    SaveOptions::default(), &signing_key)?;

// Verify signature before loading (reject tampering)
let model: LinearRegression = load_verified("signed.apr",
    ModelType::LinearRegression, Some(&verifying_key))?;

Use cases:

  • Model marketplaces (verify seller identity)
  • Compliance (audit trail)
  • Supply chain security

Compression (Feature: format-compression)

use aprender::format::{Compression, SaveOptions};

let options = SaveOptions::default()
    .with_compression(Compression::ZstdDefault);  // Level 3, good balance

// Or maximum compression for archival
let archival = SaveOptions::default()
    .with_compression(Compression::ZstdMax);  // Level 19

Algorithm     Ratio   Speed       Use Case
None          1:1     Instant     Debugging
ZstdDefault   ~3:1    Fast        Distribution
ZstdMax       ~4:1    Slow        Archival
LZ4           ~2:1    Very fast   Streaming

WASM Loading Patterns

Browser (Fetch API)

use serde::de::DeserializeOwned;

#[cfg(target_arch = "wasm32")]
pub async fn load_from_url<M: DeserializeOwned>(
    url: &str,
    model_type: ModelType,
) -> Result<M> {
    // `fetch` is an app-provided wrapper over the browser Fetch API
    let response = fetch(url).await?;
    let bytes = response.bytes().await?;
    load_from_bytes(&bytes, model_type)
}

// Usage
let model = load_from_url::<LinearRegression>(
    "https://models.example.com/house-prices.apr",
    ModelType::LinearRegression
).await?;

IndexedDB Cache

#[cfg(target_arch = "wasm32")]
pub async fn load_cached<M: DeserializeOwned>(
    cache_key: &str,
    url: &str,
    model_type: ModelType,
) -> Result<M> {
    // idb_get / idb_set are app-provided IndexedDB helpers
    // Try cache first
    if let Some(bytes) = idb_get(cache_key).await? {
        return load_from_bytes(&bytes, model_type);
    }

    // Fetch and cache
    let bytes = fetch(url).await?.bytes().await?;
    idb_set(cache_key, &bytes).await?;
    load_from_bytes(&bytes, model_type)
}

Graceful Degradation

Some features are native-only (STREAMING, TRUENO_NATIVE). In WASM, they're silently ignored:

// This works in both native and WASM
let options = SaveOptions::default()
    .with_compression(Compression::ZstdDefault)  // Works everywhere
    .with_streaming(true);  // Ignored in WASM, no error

// WASM: loads via in-memory path
// Native: uses mmap for large models
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;

Ecosystem Integration

The .apr format coordinates with alimentar's .ald dataset format:

Training Pipeline (Native):
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ dataset.ald │ → │  aprender   │ → │  model.apr  │
│ (alimentar) │    │  training   │    │  (aprender) │
└─────────────┘    └─────────────┘    └─────────────┘

Inference Pipeline (WASM):
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Fetch .apr  │ → │   aprender  │ → │ Prediction  │
│ from CDN    │    │  inference  │    │ in browser  │
└─────────────┘    └─────────────┘    └─────────────┘

Shared properties:

  • Same crypto stack (aes-gcm, ed25519-dalek, x25519-dalek)
  • Same WASM compatibility requirements
  • Same Toyota Way principles (Jidoka, checksums, signatures)

Private Inference (HIPAA/GDPR)

For sensitive data, use bidirectional encryption:

// Model publishes its inference public key in metadata
let info = inspect("medical-model.apr")?;
let model_pub_key = info.metadata.custom.get("inference_pub_key");

// User encrypts input with the model's public key
// (encrypt_for_model is an application-level helper, not a library API)
let encrypted_input = encrypt_for_model(&patient_data, model_pub_key)?;

// Send encrypted_input to the model owner.
// The owner decrypts, runs inference, and encrypts the response with the
// user's public key - only the user can decrypt the prediction.

Use cases:

  • HIPAA-compliant medical inference
  • GDPR-compliant EU data processing
  • Financial data analysis
  • Zero-trust ML APIs

Toyota Way Principles

Principle         Implementation
Jidoka            CRC32 checksum stops on corruption
Jidoka            Type verification stops on mismatch
Jidoka            Signature verification stops on tampering
Jidoka            Decryption fails on wrong key (authenticated)
Genchi Genbutsu   inspect() to see actual file contents
Kaizen            Semantic versioning for format evolution
Heijunka          Graceful degradation (WASM ignores native-only flags)

Error Handling

use aprender::error::AprenderError;

match load::<LinearRegression>("model.apr", ModelType::LinearRegression) {
    Ok(model) => { /* use model */ },
    Err(AprenderError::ChecksumMismatch { expected, actual }) => {
        eprintln!("File corrupted: expected {:08X}, got {:08X}", expected, actual);
    },
    Err(AprenderError::ModelTypeMismatch { expected, found }) => {
        eprintln!("Wrong model type: expected {:?}, found {:?}", expected, found);
    },
    Err(AprenderError::SignatureInvalid) => {
        eprintln!("Signature verification failed - model may be tampered");
    },
    Err(AprenderError::DecryptionFailed) => {
        eprintln!("Decryption failed - wrong password or key");
    },
    Err(AprenderError::UnsupportedVersion { found, supported }) => {
        eprintln!("Version {}.{} not supported (max {}.{})",
            found.0, found.1, supported.0, supported.1);
    },
    Err(e) => eprintln!("Error: {}", e),
}

Feature Flags

Feature              Crates Added                                Binary Size   WASM
(core)               bincode, rmp-serde                          ~60KB         ✓
format-compression   zstd                                        +250KB        ✓
format-signing       ed25519-dalek                               +150KB        ✓
format-encryption    aes-gcm, argon2, x25519-dalek, hkdf, sha2   +180KB        ✓

# Cargo.toml
[dependencies]
aprender = { version = "0.9", features = ["format-encryption", "format-signing"] }

Single Binary Deployment

The .apr format's killer feature: embed models directly in your executable.

The Pattern

// Embed model at compile time - zero runtime dependencies
const MODEL: &[u8] = include_bytes!("sentiment.apr");

fn main() -> Result<()> {
    let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)?;

    // SIMD inference immediately available
    let prediction = model.predict(&features)?;
    Ok(())
}

Build and deploy:

cargo build --release --target aarch64-unknown-linux-gnu
# Output: single 5MB binary with model embedded
./app  # Runs anywhere, NEON SIMD active on ARM

Why This Matters

Metric         Docker + Python         aprender Binary
Cold start     5-30 seconds            <100ms
Memory         500MB - 2GB             10-50MB
Dependencies   Python, PyTorch, etc.   None
Artifacts      5-20 files              1 file

AWS Lambda ARM (Graviton)

Based on ruchy-lambda research, a blocking I/O runtime achieves a 7.69ms cold start.

const MODEL: &[u8] = include_bytes!("classifier.apr");

fn main() {
    let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)
        .expect("embedded model valid");

    // Lambda Runtime API loop (blocking, no tokio)
    loop {
        let event = get_next_event();           // blocking GET
        let pred = model.predict(&event.data);  // NEON SIMD
        send_response(pred);                    // blocking POST
    }
}

Performance: 128MB ARM64, <10ms cold start, ~$0.0000002/request.

Deployment Targets

Target                      Binary   SIMD       Use Case
x86_64-unknown-linux-gnu    ~5MB     AVX2/512   Lambda x86, servers
aarch64-unknown-linux-gnu   ~4MB     NEON       Lambda ARM, RPi
wasm32-unknown-unknown      ~500KB   -          Browser, Workers

Quantization

Reduce model size 4-8x with integer weights (GGUF-compatible).

Quick Start

# Quantize existing model
apr quantize model.apr --type q4_0 --output model-q4.apr

# Inspect
apr inspect model-q4.apr --quantization
# Type: Q4_0, Block size: 32, Bits/weight: 4.5

Types (GGUF Standard)

Type   Bits   Block   Use Case
Q8_0   8      32      High accuracy
Q4_0   4      32      Balanced
Q4_1   4      32      Better accuracy
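
The bits-per-weight figure follows from the block layout: a Q4_0 block stores a 16-bit scale plus 32 four-bit values, i.e. 144 bits / 32 weights = 4.5 bits/weight. A simplified symmetric sketch of one block (not byte-exact GGUF, which uses an f16 scale and a different rounding convention):

// Simplified Q4_0-style block quantization, for illustration only.
fn quantize_block_q4(block: &[f32; 32]) -> (f32, [u8; 16]) {
    // One scale per 32-weight block, chosen so the max value maps to +-7
    let max_abs = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };

    // Pack two 4-bit values (offset by 8 to make them unsigned) per byte
    let mut packed = [0u8; 16];
    for i in 0..16 {
        let lo = ((block[2 * i] / scale).round() as i8).clamp(-7, 7) + 8;
        let hi = ((block[2 * i + 1] / scale).round() as i8).clamp(-7, 7) + 8;
        packed[i] = (lo as u8) | ((hi as u8) << 4);
    }
    (scale, packed)
}

// Dequantize: weight ~= scale * (nibble - 8)
fn dequantize_q4(scale: f32, packed: &[u8; 16], out: &mut [f32; 32]) {
    for i in 0..16 {
        out[2 * i] = scale * ((packed[i] & 0x0F) as i32 - 8) as f32;
        out[2 * i + 1] = scale * ((packed[i] >> 4) as i32 - 8) as f32;
    }
}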

API

use aprender::format::{save, QuantType};

// Quantize and save
let quantized = model.quantize(QuantType::Q4_0)?;
save(&quantized, ModelType::NeuralSequential, "model-q4.apr", opts)?;

Export

# To GGUF (llama.cpp compatible)
apr export model-q4.apr --format gguf --output model.gguf

# To SafeTensors (HuggingFace)
apr export model-q4.apr --format safetensors --output model/

Knowledge Distillation

Train smaller models from larger teachers with full provenance tracking.

The Pipeline

# 1. Distill 7B → 1B
apr distill teacher-7b.apr --output student-1b.apr \
    --temperature 3.0 --alpha 0.7

# 2. Quantize
apr quantize student-1b.apr --type q4_0 --output student-q4.apr

# 3. Embed in binary
# include_bytes!("student-q4.apr")

Size reduction:

Stage                Size     Reduction
Teacher (7B, FP32)   28 GB    baseline
Student (1B, FP32)   4 GB     7x
Student (Q4_0)       500 MB   56x
+ Zstd               400 MB   70x

Provenance

Every distilled model stores teacher information:

let info = inspect("student.apr")?;
let distill = info.distillation.unwrap();

println!("Teacher: {}", distill.teacher.hash);      // SHA256
println!("Method: {:?}", distill.method);           // Standard/Progressive/Ensemble
println!("Temperature: {}", distill.params.temperature);
println!("Final loss: {}", distill.params.final_loss);

Methods

Method        Description
Standard      KL divergence on final logits
Progressive   Layer-wise intermediate matching
Ensemble      Multiple teachers averaged

# Progressive distillation with layer mapping
apr distill teacher.apr --output student.apr \
    --method progressive --layer-map "0:0,1:2,2:4"

# Ensemble from multiple teachers
apr distill teacher1.apr teacher2.apr teacher3.apr \
    --output student.apr --method ensemble
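
For reference, the Standard method's objective blends a softened KL term with the hard-label loss, controlled by the --temperature and --alpha flags above. A minimal sketch of that objective:

// Sketch of the standard KD objective:
//   loss = alpha * T^2 * KL(softmax(t/T) || softmax(s/T)) + (1 - alpha) * CE(s, y)
fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().fold(f32::NEG_INFINITY, |m, &x| m.max(x));
    let exps: Vec<f32> = logits.iter().map(|&x| ((x - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn distill_loss(student: &[f32], teacher: &[f32], label: usize, t: f32, alpha: f32) -> f32 {
    let p = softmax(teacher, t); // softened teacher distribution
    let q = softmax(student, t); // softened student distribution

    // KL(p || q), scaled by T^2 to keep gradient magnitudes comparable
    let kl: f32 = p.iter().zip(&q).map(|(pi, qi)| pi * (pi / qi).ln()).sum();

    // Hard-label cross-entropy at temperature 1
    let ce = -softmax(student, 1.0)[label].ln();

    alpha * t * t * kl + (1.0 - alpha) * ce
}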

Complete SLM Pipeline

End-to-end: large model → edge deployment.

┌──────────────────┐
│ LLaMA 7B (28GB)  │  Teacher model
└────────┬─────────┘
         │ distill (entrenar)
         ▼
┌──────────────────┐
│ Student 1B (4GB) │  Knowledge transferred
└────────┬─────────┘
         │ quantize (Q4_0)
         ▼
┌──────────────────┐
│ Quantized (500MB)│  4-bit weights
└────────┬─────────┘
         │ compress (zstd)
         ▼
┌──────────────────┐
│ Compressed 400MB │  70x smaller
└────────┬─────────┘
         │ embed (include_bytes!)
         ▼
┌──────────────────┐
│ Single Binary    │  Deploy anywhere
│ ARM NEON SIMD    │  <10ms cold start
│ 2GB RAM device   │  $0.0000002/req
└──────────────────┘

Cargo.toml for minimal binary:

[profile.release]
lto = true
codegen-units = 1
panic = "abort"
strip = true
opt-level = "z"

Mixture of Experts (MoE)

MoE models use bundled persistence - a single .apr file contains the gating network and all experts:

model.apr
├── Header (ModelType::MixtureOfExperts = 0x0040)
├── Metadata (MoeConfig)
└── Payload
    ├── Gating Network
    └── Experts[0..n]

use aprender::ensemble::{MixtureOfExperts, MoeConfig, SoftmaxGating};

// Build MoE
let moe = MixtureOfExperts::builder()
    .gating(SoftmaxGating::new(n_features, n_experts))
    .expert(expert_0)
    .expert(expert_1)
    .expert(expert_2)
    .config(MoeConfig::default().with_top_k(2))
    .build()?;

// Save bundled (single file)
moe.save_apr("model.apr")?;

// Load
let loaded = MixtureOfExperts::<MyExpert, SoftmaxGating>::load("model.apr")?;

Benefits:

  • Atomic save/load (no partial states)
  • Single file deployment
  • Checksummed integrity

See Case Study: Mixture of Experts for full API documentation.

Specification

Full specification: docs/specifications/model-format-spec.md

Key properties:

  • Pure Rust (Sovereign AI, zero C/C++ dependencies)
  • WASM compatibility (hard requirement, spec §1.0)
  • Single binary deployment (spec §1.1)
  • GGUF-compatible quantization (spec §6.2)
  • Knowledge distillation provenance (spec §6.3)
  • MoE bundled architecture (spec §6.4)
  • 32-byte fixed header for fast scanning
  • MessagePack metadata (compact, fast)
  • bincode payload (zero-copy potential)
  • CRC32 integrity, Ed25519 signatures, AES-256-GCM encryption
  • trueno-native mode for zero-copy SIMD inference (native only)