Case Study: Model Serialization (.apr Format)
Save and load ML models with built-in quality: checksums, signatures, encryption, WASM compatibility.
Quick Start
use aprender::format::{save, load, ModelType, SaveOptions};
use aprender::linear_model::LinearRegression;
// Train model
let mut model = LinearRegression::new();
model.fit(&x, &y)?;
// Save
save(&model, ModelType::LinearRegression, "model.apr", SaveOptions::default())?;
// Load
let loaded: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
WASM Compatibility (Hard Requirement)
The .apr format is designed for universal deployment. Every core feature works in:
- Native (Linux, macOS, Windows)
- WASM (browsers, Cloudflare Workers, Vercel Edge)
- Embedded (no_std with alloc)
// Same model works everywhere
#[cfg(target_arch = "wasm32")]
async fn load_in_browser() -> Result<LinearRegression> {
    // `fetch` stands in for an application HTTP helper (e.g. via gloo-net or web-sys)
    let bytes = fetch("https://models.example.com/house-prices.apr").await?;
    load_from_bytes(&bytes, ModelType::LinearRegression)
}
#[cfg(not(target_arch = "wasm32"))]
fn load_native() -> Result<LinearRegression> {
    load("house-prices.apr", ModelType::LinearRegression)
}
Why this matters:
- Train once, deploy anywhere
- Browser-based ML demos
- Edge inference (low latency)
- Serverless functions
Format Structure
┌─────────────────────────────────────────┐
│ Header (32 bytes, fixed)                │ ← Magic, version, type, sizes
├─────────────────────────────────────────┤
│ Metadata (variable, MessagePack)        │ ← Hyperparameters, metrics
├─────────────────────────────────────────┤
│ Salt + Nonce (if ENCRYPTED)             │ ← Security parameters
├─────────────────────────────────────────┤
│ Payload (variable, compressed)          │ ← Model weights (bincode)
├─────────────────────────────────────────┤
│ Signature (if SIGNED)                   │ ← Ed25519 signature
├─────────────────────────────────────────┤
│ License (if LICENSED)                   │ ← Commercial protection
├─────────────────────────────────────────┤
│ Checksum (4 bytes, CRC32)               │ ← Integrity verification
└─────────────────────────────────────────┘
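For orientation, the fixed header can be pictured as a plain 32-byte struct. The sketch below is illustrative only: the field names, order, and widths are assumptions, and the normative layout lives in the format specification.
// Illustrative layout only - the normative 32-byte header is defined in the spec
#[repr(C)]
struct AprHeaderSketch {
    magic: [u8; 4],     // file magic identifying .apr
    version_major: u8,  // format version (semantic)
    version_minor: u8,
    model_type: u16,    // e.g. 0x0001 = LinearRegression (see table below)
    flags: u32,         // ENCRYPTED / SIGNED / LICENSED / STREAMING / ...
    metadata_len: u32,  // MessagePack metadata size in bytes
    payload_len: u64,   // compressed payload size in bytes
    reserved: [u8; 8],  // pads the struct to exactly 32 bytes
}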
Built-in Quality (Jidoka)
CRC32 Checksum
Every .apr file has a CRC32 checksum. Corruption is detected immediately:
// Automatic verification on load
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
// If checksum fails: AprenderError::ChecksumMismatch { expected, actual }
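To reproduce the check outside the loader, CRC32 can be computed with the crc32fast crate. A sketch, assuming the checksum covers every byte before the trailing 4-byte field and is stored little-endian (the spec defines the exact range and endianness):
use crc32fast::Hasher;
fn verify_crc32(file: &[u8]) -> bool {
    // Split off the stored checksum (last 4 bytes of the file)
    let (body, tail) = file.split_at(file.len() - 4);
    let stored = u32::from_le_bytes(tail.try_into().unwrap());
    // Recompute over everything that precedes it
    let mut hasher = Hasher::new();
    hasher.update(body);
    hasher.finalize() == stored
}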
Type Safety
Model type is encoded in header. Loading wrong type fails fast:
// Saved as LinearRegression
save(&lr_model, ModelType::LinearRegression, "lr.apr", opts)?;
// Attempt to load as KMeans - fails immediately
let result: Result<KMeans> = load("lr.apr", ModelType::KMeans);
// Error: "Model type mismatch: file contains LinearRegression, expected KMeans"
Metadata
Store hyperparameters, metrics, and custom data:
let mut options = SaveOptions::default()
    .with_name("house-price-predictor")
    .with_description("Trained on Boston Housing dataset");
// Add hyperparameters
options.metadata.hyperparameters.insert(
    "learning_rate".to_string(),
    serde_json::json!(0.01),
);
// Add metrics
options.metadata.metrics.insert(
    "r2_score".to_string(),
    serde_json::json!(0.95),
);
save(&model, ModelType::LinearRegression, "model.apr", options)?;
Inspection Without Loading
Check model info without deserializing weights:
use aprender::format::inspect;
let info = inspect("model.apr")?;
println!("Model type: {:?}", info.model_type);
println!("Format version: {}.{}", info.format_version.0, info.format_version.1);
println!("Payload size: {} bytes", info.payload_size);
println!("Created: {}", info.metadata.created_at);
println!("Encrypted: {}", info.encrypted);
println!("Signed: {}", info.signed);
Model Types
| Value | Type | Use Case |
|---|---|---|
| 0x0001 | LinearRegression | Regression |
| 0x0002 | LogisticRegression | Binary classification |
| 0x0003 | DecisionTree | Interpretable classification |
| 0x0004 | RandomForest | Ensemble classification |
| 0x0005 | GradientBoosting | High-performance ensemble |
| 0x0006 | KMeans | Clustering |
| 0x0007 | Pca | Dimensionality reduction |
| 0x0008 | NaiveBayes | Probabilistic classification |
| 0x0009 | Knn | Distance-based classification |
| 0x000A | Svm | Support vector machine |
| 0x0010 | NgramLm | Language modeling |
| 0x0011 | TfIdf | Text vectorization |
| 0x0012 | CountVectorizer | Bag of words |
| 0x0020 | NeuralSequential | Deep learning |
| 0x0021 | NeuralCustom | Custom architectures |
| 0x0030 | ContentRecommender | Recommendations |
| 0x0040 | MixtureOfExperts | Sparse/dense MoE ensembles |
| 0x00FF | Custom | User-defined |
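These discriminants map naturally onto a #[repr(u16)] enum. A sketch of the shape such a type takes, abbreviated from the table above (aprender's actual ModelType may differ in detail):
#[repr(u16)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelTypeSketch {
    LinearRegression = 0x0001,
    LogisticRegression = 0x0002,
    KMeans = 0x0006,
    NgramLm = 0x0010,
    NeuralSequential = 0x0020,
    MixtureOfExperts = 0x0040,
    Custom = 0x00FF,
    // ... remaining variants follow the table above
}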
Encryption (Feature: format-encryption)
Password-Based (Personal/Team)
use aprender::format::{save_encrypted, load_encrypted};
// Save with password (Argon2id + AES-256-GCM)
save_encrypted(&model, ModelType::LinearRegression, "secure.apr",
    SaveOptions::default(), "my-strong-password")?;
// Load with password
let model: LinearRegression = load_encrypted("secure.apr",
    ModelType::LinearRegression, "my-strong-password")?;
Security properties:
- Argon2id: Memory-hard, GPU-resistant key derivation
- AES-256-GCM: Authenticated encryption (detects tampering)
- Random salt: Same password produces different ciphertexts
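Under the hood this is the standard Argon2id-then-AES-256-GCM construction, matching the crates pulled in by format-encryption. A minimal sketch of that flow; the Argon2 parameters, salt/nonce handling, and framing here are illustrative assumptions, not aprender's exact choices:
use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Nonce};
use argon2::Argon2;
fn encrypt_sketch(password: &[u8], salt: &[u8; 16], nonce: &[u8; 12], plaintext: &[u8]) -> Vec<u8> {
    // Argon2id stretches the password into a 256-bit key (memory-hard, GPU-resistant)
    let mut key = [0u8; 32];
    Argon2::default()
        .hash_password_into(password, salt, &mut key)
        .expect("valid Argon2 parameters");
    // AES-256-GCM authenticates as well as encrypts, so tampering fails decryption
    let cipher = Aes256Gcm::new_from_slice(&key).expect("32-byte key");
    cipher
        .encrypt(Nonce::from_slice(nonce), plaintext)
        .expect("encryption failed")
}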
Recipient-Based (Commercial Distribution)
use aprender::format::{save_for_recipient, load_as_recipient};
use x25519_dalek::{PublicKey, StaticSecret};
// Generate buyer's keypair (done once by buyer)
let buyer_secret = StaticSecret::random_from_rng(&mut rng);
let buyer_public = PublicKey::from(&buyer_secret);
// Seller encrypts for buyer's public key (no password sharing!)
save_for_recipient(&model, ModelType::LinearRegression, "commercial.apr",
    SaveOptions::default(), &buyer_public)?;
// Only buyer's secret key can decrypt
let model: LinearRegression = load_as_recipient("commercial.apr",
    ModelType::LinearRegression, &buyer_secret)?;
Benefits:
- No password sharing required
- Cryptographically bound to buyer (non-transferable)
- Forward secrecy via ephemeral sender keys
- Perfect for model marketplaces
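Recipient encryption is an ECIES-style construction over the same crate stack (x25519-dalek, hkdf, sha2, aes-gcm). A sketch of the sender-side key agreement; the HKDF info label and the exact derivation are assumptions:
use hkdf::Hkdf;
use sha2::Sha256;
use x25519_dalek::{EphemeralSecret, PublicKey};
// A fresh ephemeral key per file is what gives forward secrecy
let ephemeral = EphemeralSecret::random_from_rng(&mut rng);
let ephemeral_public = PublicKey::from(&ephemeral); // shipped in the file for the recipient
let shared = ephemeral.diffie_hellman(&buyer_public);
// Derive the symmetric key from the shared secret via HKDF-SHA256
let hk = Hkdf::<Sha256>::new(None, shared.as_bytes());
let mut key = [0u8; 32];
hk.expand(b"apr-recipient-v1", &mut key).expect("32 bytes is a valid HKDF output length");
// `key` then feeds AES-256-GCM exactly as in the password path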
Digital Signatures (Feature: format-signing)
Verify model provenance:
use aprender::format::{save_signed, load_verified};
use ed25519_dalek::{SigningKey, VerifyingKey};
// Generate seller's keypair (done once)
let signing_key = SigningKey::generate(&mut rng);
let verifying_key = VerifyingKey::from(&signing_key);
// Sign model with private key
save_signed(&model, ModelType::LinearRegression, "signed.apr",
    SaveOptions::default(), &signing_key)?;
// Verify signature before loading (reject tampering)
let model: LinearRegression = load_verified("signed.apr",
    ModelType::LinearRegression, Some(&verifying_key))?;
Use cases:
- Model marketplaces (verify seller identity)
- Compliance (audit trail)
- Supply chain security
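What save_signed and load_verified do underneath is ordinary Ed25519 over the file's signed byte range (the spec defines exactly which bytes; signed_bytes below stands in for that range):
use ed25519_dalek::{Signature, Signer, Verifier};
// Sign: the private key covers the signed byte range
let sig: Signature = signing_key.sign(signed_bytes);
// Verify: any bit flip in the covered range fails verification
assert!(verifying_key.verify(signed_bytes, &sig).is_ok());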
Compression (Feature: format-compression)
use aprender::format::{Compression, SaveOptions};
let options = SaveOptions::default()
    .with_compression(Compression::ZstdDefault); // Level 3, good balance
// Or maximum compression for archival
let archival = SaveOptions::default()
    .with_compression(Compression::ZstdMax); // Level 19
| Algorithm | Ratio | Speed | Use Case |
|---|---|---|---|
| None | 1:1 | Instant | Debugging |
| ZstdDefault | ~3:1 | Fast | Distribution |
| ZstdMax | ~4:1 | Slow | Archival |
| LZ4 | ~2:1 | Very fast | Streaming |
WASM Loading Patterns
Browser (Fetch API)
#[cfg(target_arch = "wasm32")]
pub async fn load_from_url<M: DeserializeOwned>(
    url: &str,
    model_type: ModelType,
) -> Result<M> {
    let response = fetch(url).await?;
    let bytes = response.bytes().await?;
    load_from_bytes(&bytes, model_type)
}
// Usage
let model = load_from_url::<LinearRegression>(
    "https://models.example.com/house-prices.apr",
    ModelType::LinearRegression,
).await?;
IndexedDB Cache
#[cfg(target_arch = "wasm32")]
pub async fn load_cached<M: DeserializeOwned>(
    cache_key: &str,
    url: &str,
    model_type: ModelType,
) -> Result<M> {
    // Try cache first (idb_get / idb_set are app-provided IndexedDB helpers,
    // e.g. built on the rexie or idb crates)
    if let Some(bytes) = idb_get(cache_key).await? {
        return load_from_bytes(&bytes, model_type);
    }
    // Fetch and cache
    let bytes = fetch(url).await?.bytes().await?;
    idb_set(cache_key, &bytes).await?;
    load_from_bytes(&bytes, model_type)
}
Graceful Degradation
Some features are native-only (STREAMING, TRUENO_NATIVE). In WASM, they're silently ignored:
// This works in both native and WASM
let options = SaveOptions::default()
    .with_compression(Compression::ZstdDefault) // Works everywhere
    .with_streaming(true); // Ignored in WASM, no error
// WASM: loads via in-memory path
// Native: uses mmap for large models
let model: LinearRegression = load("model.apr", ModelType::LinearRegression)?;
Ecosystem Integration
The .apr format coordinates with alimentar's .ald dataset format:
Training Pipeline (Native):
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ dataset.ald │  →  │  aprender   │  →  │  model.apr  │
│ (alimentar) │     │  training   │     │ (aprender)  │
└─────────────┘     └─────────────┘     └─────────────┘
Inference Pipeline (WASM):
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Fetch .apr  │  →  │  aprender   │  →  │ Prediction  │
│ from CDN    │     │  inference  │     │ in browser  │
└─────────────┘     └─────────────┘     └─────────────┘
Shared properties:
- Same crypto stack (aes-gcm, ed25519-dalek, x25519-dalek)
- Same WASM compatibility requirements
- Same Toyota Way principles (Jidoka, checksums, signatures)
Private Inference (HIPAA/GDPR)
For sensitive data, use bidirectional encryption:
// Model publishes its inference public key in metadata
let info = inspect("medical-model.apr")?;
let model_pub_key = info.metadata.custom.get("inference_pub_key")
    .expect("model publishes an inference key");
// User encrypts input with the model's public key
// (encrypt_for_model is application-level glue, not part of the format API)
let encrypted_input = encrypt_for_model(&patient_data, model_pub_key)?;
// Send encrypted_input to the model owner, who decrypts, runs inference,
// and encrypts the response with the user's public key.
// Only the user can decrypt the prediction.
Use cases:
- HIPAA-compliant medical inference
- GDPR-compliant EU data processing
- Financial data analysis
- Zero-trust ML APIs
Toyota Way Principles
| Principle | Implementation |
|---|---|
| Jidoka | CRC32 checksum stops on corruption |
| Jidoka | Type verification stops on mismatch |
| Jidoka | Signature verification stops on tampering |
| Jidoka | Decryption fails on wrong key (authenticated) |
| Genchi Genbutsu | inspect() to see actual file contents |
| Kaizen | Semantic versioning for format evolution |
| Heijunka | Graceful degradation (WASM ignores native-only flags) |
Error Handling
use aprender::error::AprenderError;
match load::<LinearRegression>("model.apr", ModelType::LinearRegression) {
    Ok(model) => { /* use model */ },
    Err(AprenderError::ChecksumMismatch { expected, actual }) => {
        eprintln!("File corrupted: expected {:08X}, got {:08X}", expected, actual);
    },
    Err(AprenderError::ModelTypeMismatch { expected, found }) => {
        eprintln!("Wrong model type: expected {:?}, found {:?}", expected, found);
    },
    Err(AprenderError::SignatureInvalid) => {
        eprintln!("Signature verification failed - model may have been tampered with");
    },
    Err(AprenderError::DecryptionFailed) => {
        eprintln!("Decryption failed - wrong password or key");
    },
    Err(AprenderError::UnsupportedVersion { found, supported }) => {
        eprintln!("Version {}.{} not supported (max {}.{})",
            found.0, found.1, supported.0, supported.1);
    },
    Err(e) => eprintln!("Error: {}", e),
}
Feature Flags
| Feature | Crates Added | Binary Size | WASM |
|---|---|---|---|
| (core) | bincode, rmp-serde | ~60KB | ✓ |
| format-compression | zstd | +250KB | ✓ |
| format-signing | ed25519-dalek | +150KB | ✓ |
| format-encryption | aes-gcm, argon2, x25519-dalek, hkdf, sha2 | +180KB | ✓ |
# Cargo.toml
[dependencies]
aprender = { version = "0.9", features = ["format-encryption", "format-signing"] }
Single Binary Deployment
The .apr format's killer feature: embed models directly in your executable.
The Pattern
// Embed model at compile time - zero runtime dependencies
const MODEL: &[u8] = include_bytes!("sentiment.apr");
fn main() -> Result<()> {
    let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)?;
    // SIMD inference immediately available
    let prediction = model.predict(&features)?;
    Ok(())
}
Build and deploy:
cargo build --release --target aarch64-unknown-linux-gnu
# Output: single 5MB binary with model embedded
./app # Runs anywhere, NEON SIMD active on ARM
Why This Matters
| Metric | Docker + Python | aprender Binary |
|---|---|---|
| Cold start | 5-30 seconds | <100ms |
| Memory | 500MB - 2GB | 10-50MB |
| Dependencies | Python, PyTorch, etc. | None |
| Artifacts | 5-20 files | 1 file |
AWS Lambda ARM (Graviton)
Based on ruchy-lambda research: a blocking I/O event loop (no async runtime) achieves a 7.69ms cold start.
const MODEL: &[u8] = include_bytes!("classifier.apr");
fn main() {
    let model: LogisticRegression = load_from_bytes(MODEL, ModelType::LogisticRegression)
        .expect("embedded model valid");
    // Lambda Runtime API loop (blocking, no tokio);
    // get_next_event / send_response are thin wrappers over the Runtime API HTTP endpoints
    loop {
        let event = get_next_event(); // blocking GET
        let pred = model.predict(&event.data); // NEON SIMD
        send_response(pred); // blocking POST
    }
}
Performance: 128MB ARM64, <10ms cold start, ~$0.0000002/request.
Deployment Targets
| Target | Binary | SIMD | Use Case |
|---|---|---|---|
| x86_64-unknown-linux-gnu | ~5MB | AVX2/512 | Lambda x86, servers |
| aarch64-unknown-linux-gnu | ~4MB | NEON | Lambda ARM, RPi |
| wasm32-unknown-unknown | ~500KB | - | Browser, Workers |
Quantization
Reduce model size 4-8x with integer weights (GGUF-compatible).
Quick Start
# Quantize existing model
apr quantize model.apr --type q4_0 --output model-q4.apr
# Inspect
apr inspect model-q4.apr --quantization
# Type: Q4_0, Block size: 32, Bits/weight: 4.5
Types (GGUF Standard)
| Type | Bits | Block | Use Case |
|---|---|---|---|
| Q8_0 | 8 | 32 | High accuracy |
| Q4_0 | 4 | 32 | Balanced |
| Q4_1 | 4 | 32 | Better accuracy |
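The arithmetic behind these types is small: each block of 32 weights stores one scale plus low-bit integers. A self-contained sketch of Q8_0-style quantization (GGUF stores the scale as f16; f32 is used here for brevity):
const BLOCK: usize = 32;
struct BlockQ8 {
    scale: f32,          // GGUF stores f16; f32 here for brevity
    quants: [i8; BLOCK], // 8-bit integer weights
}
fn quantize_q8_0(weights: &[f32; BLOCK]) -> BlockQ8 {
    // The scale maps the largest magnitude in the block onto the i8 range
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let mut quants = [0i8; BLOCK];
    for (q, w) in quants.iter_mut().zip(weights) {
        *q = (w / scale).round() as i8;
    }
    BlockQ8 { scale, quants }
}
fn dequantize_q8_0(block: &BlockQ8) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (o, q) in out.iter_mut().zip(&block.quants) {
        *o = f32::from(*q) * block.scale;
    }
    out
}
Q4_0 follows the same pattern with 4-bit quants packed two per byte, which is where the 4.5 bits/weight figure comes from: 32 × 4 bits plus one 16-bit scale per 32-weight block.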
API
use aprender::format::QuantType;
// Quantize, then save like any other model
let quantized = model.quantize(QuantType::Q4_0)?;
save(&quantized, ModelType::NeuralSequential, "model-q4.apr", opts)?;
Export
# To GGUF (llama.cpp compatible)
apr export model-q4.apr --format gguf --output model.gguf
# To SafeTensors (HuggingFace)
apr export model-q4.apr --format safetensors --output model/
Knowledge Distillation
Train smaller models from larger teachers with full provenance tracking.
The Pipeline
# 1. Distill 7B → 1B
apr distill teacher-7b.apr --output student-1b.apr \
    --temperature 3.0 --alpha 0.7
# 2. Quantize
apr quantize student-1b.apr --type q4_0 --output student-q4.apr
# 3. Embed in binary
# include_bytes!("student-q4.apr")
Size reduction:
| Stage | Size | Reduction |
|---|---|---|
| Teacher (7B, FP32) | 28 GB | baseline |
| Student (1B, FP32) | 4 GB | 7x |
| Student (Q4_0) | 500 MB | 56x |
| + Zstd | 400 MB | 70x |
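The arithmetic behind the table: 7B parameters at 4 bytes each (FP32) is 28 GB, and a 1B-parameter student at FP32 is 4 GB. Q4_0 stores roughly 4.5 bits per weight, so 1B weights land around 500-560 MB, and zstd then squeezes out the remaining redundancy.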
Provenance
Every distilled model stores teacher information:
let info = inspect("student.apr")?;
let distill = info.distillation.unwrap();
println!("Teacher: {}", distill.teacher.hash); // SHA256
println!("Method: {:?}", distill.method); // Standard/Progressive/Ensemble
println!("Temperature: {}", distill.params.temperature);
println!("Final loss: {}", distill.params.final_loss);
Methods
| Method | Description |
|---|---|
| Standard | KL divergence on final logits |
| Progressive | Layer-wise intermediate matching |
| Ensemble | Multiple teachers averaged |
# Progressive distillation with layer mapping
apr distill teacher.apr --output student.apr \
    --method progressive --layer-map "0:0,1:2,2:4"
# Ensemble from multiple teachers
apr distill teacher1.apr teacher2.apr teacher3.apr \
    --output student.apr --method ensemble
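For the Standard method, the objective behind --temperature and --alpha is the classic Hinton-style blend: a temperature-softened KL term between teacher and student logits plus the ordinary hard-label loss. A self-contained sketch, not aprender's internal code:
fn softmax_t(logits: &[f32], t: f32) -> Vec<f32> {
    // Subtract the max for numerical stability before exponentiating
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|z| ((z - max) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
// alpha blends soft (teacher) and hard (label) losses; t > 1 softens both distributions
fn distill_loss(teacher_logits: &[f32], student_logits: &[f32], hard_loss: f32, t: f32, alpha: f32) -> f32 {
    let p = softmax_t(teacher_logits, t);
    let q = softmax_t(student_logits, t);
    // KL(p || q), scaled by t^2 so gradients stay comparable across temperatures
    let kl: f32 = p.iter().zip(&q).map(|(pi, qi)| pi * (pi / qi.max(1e-12)).ln()).sum();
    alpha * t * t * kl + (1.0 - alpha) * hard_loss
}
With --temperature 3.0 --alpha 0.7, 70% of the weight is on matching the teacher's softened distribution.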
Complete SLM Pipeline
End-to-end: large model → edge deployment.
┌───────────────────┐
│ LLaMA 7B (28GB)   │  Teacher model
└─────────┬─────────┘
          │ distill (entrenar)
          ▼
┌───────────────────┐
│ Student 1B (4GB)  │  Knowledge transferred
└─────────┬─────────┘
          │ quantize (Q4_0)
          ▼
┌───────────────────┐
│ Quantized (500MB) │  4-bit weights
└─────────┬─────────┘
          │ compress (zstd)
          ▼
┌───────────────────┐
│ Compressed (400MB)│  70x smaller
└─────────┬─────────┘
          │ embed (include_bytes!)
          ▼
┌───────────────────┐
│   Single Binary   │  Deploy anywhere
│   ARM NEON SIMD   │  <10ms cold start
│  2GB RAM device   │  $0.0000002/req
└───────────────────┘
Cargo.toml for minimal binary:
[profile.release]
lto = true
codegen-units = 1
panic = "abort"
strip = true
opt-level = "z"
Mixture of Experts (MoE)
MoE models use bundled persistence - a single .apr file contains the gating network and all experts:
model.apr
├── Header (ModelType::MixtureOfExperts = 0x0040)
├── Metadata (MoeConfig)
└── Payload
    ├── Gating Network
    └── Experts[0..n]
use aprender::ensemble::{MixtureOfExperts, MoeConfig, SoftmaxGating};
// Build MoE
let moe = MixtureOfExperts::builder()
    .gating(SoftmaxGating::new(n_features, n_experts))
    .expert(expert_0)
    .expert(expert_1)
    .expert(expert_2)
    .config(MoeConfig::default().with_top_k(2))
    .build()?;
// Save bundled (single file)
moe.save_apr("model.apr")?;
// Load
let loaded = MixtureOfExperts::<MyExpert, SoftmaxGating>::load("model.apr")?;
Benefits:
- Atomic save/load (no partial states)
- Single file deployment
- Checksummed integrity
See Case Study: Mixture of Experts for full API documentation.
Specification
Full specification: docs/specifications/model-format-spec.md
Key properties:
- Pure Rust (Sovereign AI, zero C/C++ dependencies)
- WASM compatibility (hard requirement, spec §1.0)
- Single binary deployment (spec §1.1)
- GGUF-compatible quantization (spec §6.2)
- Knowledge distillation provenance (spec §6.3)
- MoE bundled architecture (spec §6.4)
- 32-byte fixed header for fast scanning
- MessagePack metadata (compact, fast)
- bincode payload (zero-copy potential)
- CRC32 integrity, Ed25519 signatures, AES-256-GCM encryption
- trueno-native mode for zero-copy SIMD inference (native only)