APR Complete Specification

Version: 2.0.0-draft
Status: Draft
Created: 2025-12-16
GitHub Issue: https://github.com/paiml/aprender/issues/119


Table of Contents

  1. Abstract
  2. Design Principles
  3. APR v2 Format
  4. CLI Operations
  5. Auxiliary Data Patterns
  6. Format Comparison
  7. Error Handling
  8. Configuration
  9. Quality Gates
  10. Multi-Format Conversion Specification
  11. Conversion QA Checklist (25 Points)
  12. Automated Conversion Validation
  13. Falsification QA Checklist (Legacy)
  14. Implementation Roadmap
  15. References
  16. Appendices

1. Abstract

APR (Aprender Portable Representation) is a WASM-first serialization format for machine learning models. This specification covers:

  • APR v2 Format: Binary format supporting web-scale models (10B+ parameters) with tensor alignment, LZ4 streaming compression, and multi-file sharding
  • CLI Operations: Comprehensive tooling for inspect, debug, trace, export, convert, import, merge, diff, and validate operations
  • Auxiliary Data: Patterns for storing vocabulary, tokenizer config, mel filterbanks, and other model-specific data

2. Design Principles

2.1 WASM-First Design

  1. WASM-first: Must work in wasm32-unknown-unknown without Emscripten
  2. Progressive enhancement: Features degrade gracefully (mmap → heap, compression → raw)
  3. Backward compatibility: APR1 files remain readable
  4. Zero-copy where possible: Alignment enables direct tensor access
  5. Streaming: Support chunked loading for large models

2.2 Toyota Way Alignment

Principle         Application
Genchi Genbutsu   Go and see the actual model data, not abstractions
Visualization     Make model internals visible for debugging
Jidoka            Stop on quality issues (corrupted models, NaN weights)
Kaizen            Continuous improvement via diff and merge operations
Standardization   Consistent CLI interface across all operations

3. APR v2 Format

3.1 Format Overview

┌─────────────────────────────────────────────────────────────┐
│ Header (32 bytes, aligned)                                  │
├─────────────────────────────────────────────────────────────┤
│ Metadata Section (JSON, variable length)                    │
├─────────────────────────────────────────────────────────────┤
│ Tensor Index (binary, variable length)                      │
├─────────────────────────────────────────────────────────────┤
│ [Padding to 64-byte alignment]                              │
├─────────────────────────────────────────────────────────────┤
│ Tensor Data Section (aligned tensors)                       │
│   ├── Tensor 0 (64-byte aligned)                           │
│   ├── Tensor 1 (64-byte aligned)                           │
│   └── ...                                                   │
├─────────────────────────────────────────────────────────────┤
│ Footer (16 bytes)                                           │
└─────────────────────────────────────────────────────────────┘

3.2 Header (32 bytes)

Offset  Size  Field            Description
0       4     magic            APR2 (0x41505232)
4       2     version_major    Format major version (2)
6       2     version_minor    Format minor version (0)
8       4     flags            Feature flags (see below)
12      4     metadata_offset  Offset to metadata section
16      4     metadata_size    Size of metadata section
20      4     index_offset     Offset to tensor index
24      4     index_size       Size of tensor index
28      4     data_offset      Offset to tensor data section
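
The field table above can be read with a few fixed-offset loads. The following is a minimal sketch, not the reference parser: the names `AprHeader` and `parse_header` are hypothetical, and it assumes little-endian integers, which the table does not state.

```rust
/// Fixed-size APR v2 header (32 bytes), mirroring the field table.
/// `AprHeader` and `parse_header` are illustrative names, not part of the spec.
#[derive(Debug, PartialEq)]
pub struct AprHeader {
    pub version_major: u16,
    pub version_minor: u16,
    pub flags: u32,
    pub metadata_offset: u32,
    pub metadata_size: u32,
    pub index_offset: u32,
    pub index_size: u32,
    pub data_offset: u32,
}

// Assumption: multi-byte integers are stored little-endian.
fn u16_at(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

fn u32_at(buf: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}

/// Validate the magic bytes and decode the remaining header fields.
pub fn parse_header(buf: &[u8]) -> Result<AprHeader, String> {
    if buf.len() < 32 {
        return Err(format!("header needs 32 bytes, got {}", buf.len()));
    }
    if &buf[0..4] != b"APR2" {
        return Err("bad magic: expected APR2".to_string());
    }
    Ok(AprHeader {
        version_major: u16_at(buf, 4),
        version_minor: u16_at(buf, 6),
        flags: u32_at(buf, 8),
        metadata_offset: u32_at(buf, 12),
        metadata_size: u32_at(buf, 16),
        index_offset: u32_at(buf, 20),
        index_size: u32_at(buf, 24),
        data_offset: u32_at(buf, 28),
    })
}
```

Comparing the magic as the byte string `APR2` rather than as an integer keeps that check independent of the byte-order assumption.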

3.3 Feature Flags

use bitflags::bitflags;

bitflags! {
    pub struct AprFlags: u32 {
        const COMPRESSED     = 0b0000_0001;  // LZ4 compression enabled
        const ALIGNED_64     = 0b0000_0010;  // 64-byte tensor alignment
        const ALIGNED_32     = 0b0000_0100;  // 32-byte tensor alignment (GGUF compat)
        const SHARDED        = 0b0000_1000;  // Multi-file model
        const ENCRYPTED      = 0b0001_0000;  // AES-256-GCM encryption
        const SIGNED         = 0b0010_0000;  // Ed25519 signature present
        const QUANTIZED      = 0b0100_0000;  // Contains quantized tensors
        const STREAMING      = 0b1000_0000;  // Streaming-optimized layout
    }
}
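
As an illustration of how the flag word might be rendered for display (for example the `COMPRESSED | SIGNED` style used by `apr inspect`), here is a small sketch that works on the raw `u32` without the bitflags crate. The helper `describe_flags` and the `(none)` placeholder are hypothetical; the bit values are copied from the definition above.

```rust
/// Bit values copied from the AprFlags definition; the table itself is illustrative.
const FLAG_NAMES: [(u32, &str); 8] = [
    (0b0000_0001, "COMPRESSED"),
    (0b0000_0010, "ALIGNED_64"),
    (0b0000_0100, "ALIGNED_32"),
    (0b0000_1000, "SHARDED"),
    (0b0001_0000, "ENCRYPTED"),
    (0b0010_0000, "SIGNED"),
    (0b0100_0000, "QUANTIZED"),
    (0b1000_0000, "STREAMING"),
];

/// Render the set bits as a `A | B` string, in ascending bit order.
/// `describe_flags` is a hypothetical helper name.
pub fn describe_flags(bits: u32) -> String {
    let names: Vec<&str> = FLAG_NAMES
        .iter()
        .filter(|(bit, _)| bits & *bit != 0)
        .map(|(_, name)| *name)
        .collect();
    if names.is_empty() {
        "(none)".to_string()
    } else {
        names.join(" | ")
    }
}
```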

3.4 Metadata Section

JSON object containing model configuration and auxiliary data.

Required Keys

{
  "apr_version": "2.0.0",
  "model_type": "whisper",
  "architecture": {
    "n_vocab": 51865,
    "n_audio_ctx": 1500,
    "n_text_ctx": 448,
    "n_mels": 80,
    "n_audio_layer": 4,
    "n_text_layer": 4,
    "n_audio_head": 6,
    "n_text_head": 6,
    "n_audio_state": 384,
    "n_text_state": 384
  }
}

Optional Keys

{
  "vocab": ["<|endoftext|>", "<|startoftranscript|>", "..."],
  "mel_filterbank": [0.0, 0.0, "..."],
  "mel_filterbank_shape": [80, 201],
  "tokenizer_config": { "..." },
  "model_card": { "..." },
  "quantization": {
    "method": "Q8_0",
    "bits_per_weight": 8.5
  }
}

3.5 Tensor Index (Binary)

Index Header (8 bytes)

Offset  Size  Field
0       4     tensor_count
4       4     reserved

Tensor Entry (variable, ~40+ bytes each)

Offset  Size      Field     Description
0       2         name_len  Length of tensor name
2       name_len  name      UTF-8 tensor name
+0      1         dtype     Data type enum
+1      1         n_dims    Number of dimensions (1-8)
+2      8×n_dims  dims      Dimension sizes (u64 each)
+n      8         offset    Byte offset in data section
+n+8    8         size      Compressed size (or raw size)
+n+16   8         raw_size  Uncompressed size (0 if not compressed)
+n+24   4         flags     Per-tensor flags

Offsets prefixed with + are relative to the end of the name field; n = 2 + 8×n_dims.
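
Because the name and dimension list are variable length, an entry has to be read sequentially. The sketch below walks one entry with a cursor. It assumes little-endian integers (not stated in the table), and `TensorEntry`, `take`, `read_u64`, and `parse_tensor_entry` are hypothetical names.

```rust
/// One decoded tensor index entry (field names follow the table).
#[derive(Debug, PartialEq)]
pub struct TensorEntry {
    pub name: String,
    pub dtype: u8,
    pub dims: Vec<u64>,
    pub offset: u64,
    pub size: u64,
    pub raw_size: u64,
    pub flags: u32,
}

/// Return the next `n` bytes and advance `pos`; error on truncated input.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], String> {
    let end = pos.checked_add(n).ok_or("offset overflow")?;
    let slice = buf.get(*pos..end).ok_or("truncated tensor entry")?;
    *pos = end;
    Ok(slice)
}

// Assumption: integers are little-endian.
fn read_u64(buf: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(take(buf, pos, 8)?);
    Ok(u64::from_le_bytes(raw))
}

/// Parse one entry starting at `*pos`, leaving `*pos` at the next entry.
pub fn parse_tensor_entry(buf: &[u8], pos: &mut usize) -> Result<TensorEntry, String> {
    let len = take(buf, pos, 2)?;
    let name_len = u16::from_le_bytes([len[0], len[1]]) as usize;
    let name = String::from_utf8(take(buf, pos, name_len)?.to_vec())
        .map_err(|e| e.to_string())?;
    let dtype = take(buf, pos, 1)?[0];
    let n_dims = take(buf, pos, 1)?[0] as usize;
    if !(1..=8).contains(&n_dims) {
        return Err(format!("n_dims must be 1-8, got {}", n_dims));
    }
    let mut dims = Vec::with_capacity(n_dims);
    for _ in 0..n_dims {
        dims.push(read_u64(buf, pos)?);
    }
    let offset = read_u64(buf, pos)?;
    let size = read_u64(buf, pos)?;
    let raw_size = read_u64(buf, pos)?;
    let f = take(buf, pos, 4)?;
    let flags = u32::from_le_bytes([f[0], f[1], f[2], f[3]]);
    Ok(TensorEntry { name, dtype, dims, offset, size, raw_size, flags })
}
```

Calling `parse_tensor_entry` `tensor_count` times, starting just after the 8-byte index header, would yield the whole index.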

Data Type Enum

#[repr(u8)]
pub enum DType {
    F32 = 0, F16 = 1, BF16 = 2, I8 = 3, I16 = 4, I32 = 5, I64 = 6, U8 = 7,
    Q8_0 = 16, Q4_0 = 17, Q4_1 = 18, Q5_0 = 19, Q5_1 = 20,
}

3.6 Tensor Data Section

Tensors stored contiguously with alignment padding.

  • Default: 64-byte alignment (cache-line optimal)
  • GGUF-compatible: 32-byte alignment
  • Compression: Per-tensor LZ4 block compression (64KB blocks)
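
The alignment rule amounts to rounding each tensor's start offset up to the next multiple of the alignment. A minimal sketch follows; `align_up` and `layout_offsets` are hypothetical helper names, and offsets are taken relative to the start of the data section.

```rust
/// Round `offset` up to the next multiple of `alignment` (a power of two).
pub fn align_up(offset: u64, alignment: u64) -> u64 {
    assert!(alignment.is_power_of_two(), "alignment must be a power of two");
    (offset + alignment - 1) & !(alignment - 1)
}

/// Assign aligned start offsets to tensors stored back to back.
/// `sizes` holds each tensor's stored size in bytes.
pub fn layout_offsets(sizes: &[u64], alignment: u64) -> Vec<u64> {
    let mut next = 0u64;
    sizes
        .iter()
        .map(|&size| {
            let start = align_up(next, alignment);
            next = start + size;
            start
        })
        .collect()
}
```

For example, tensors of 100, 10, and 64 bytes at 64-byte alignment start at offsets 0, 128, and 192; the gaps are padding.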

3.7 Footer (16 bytes)

Offset  Size  Field      Description
0       4     crc32      CRC32 of all preceding bytes
4       4     magic_end  2RPA (reverse magic)
8       8     file_size  Total file size for validation
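
A reader can use the three footer fields as a cheap integrity check before touching any tensor data. The sketch below assumes the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib) and little-endian integers; the specification names neither, and `crc32` and `verify_footer` are hypothetical helpers.

```rust
/// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
/// Assumption: this is the CRC variant the format intends.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Check end magic, recorded file size, and CRC of everything before the footer.
pub fn verify_footer(file: &[u8]) -> Result<(), String> {
    if file.len() < 16 {
        return Err("file shorter than the 16-byte footer".to_string());
    }
    let body_len = file.len() - 16;
    let footer = &file[body_len..];
    if &footer[4..8] != b"2RPA" {
        return Err("bad end magic: expected 2RPA".to_string());
    }
    let mut size = [0u8; 8];
    size.copy_from_slice(&footer[8..16]);
    if u64::from_le_bytes(size) != file.len() as u64 {
        return Err("file_size does not match actual length".to_string());
    }
    let stored = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    if crc32(&file[..body_len]) != stored {
        return Err("CRC32 mismatch".to_string());
    }
    Ok(())
}
```

The magic and size checks run first because they are constant time, while the CRC pass touches every byte of the file.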

3.8 Sharding (Multi-File)

For models larger than 2 GB, split the tensors across shard files and describe them in a JSON manifest:

{
  "apr_version": "2.0.0",
  "sharded": true,
  "shard_count": 4,
  "shards": [
    {"file": "model-00001-of-00004.apr", "size": 2147483648, "crc32": "..."},
    {"file": "model-00002-of-00004.apr", "size": 2147483648, "crc32": "..."}
  ],
  "tensor_shard_map": {
    "encoder.conv1.weight": 0,
    "decoder.token_embedding.weight": 1
  }
}
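
To show how a loader might resolve a tensor to its shard, here is a small sketch. It assumes that values in `tensor_shard_map` are zero-based positions in the `shards` array and that file names follow the `model-NNNNN-of-NNNNN.apr` pattern seen in the example; neither is stated as a rule. `shard_file_name` and `file_for_tensor` are hypothetical names.

```rust
use std::collections::HashMap;

/// Build the shard file name for a zero-based shard index.
/// Assumption: names follow the pattern used in the manifest example.
pub fn shard_file_name(shard_index: usize, shard_count: usize) -> String {
    format!("model-{:05}-of-{:05}.apr", shard_index + 1, shard_count)
}

/// Look up which shard file holds `tensor`, if the manifest lists it.
pub fn file_for_tensor(
    tensor_shard_map: &HashMap<String, usize>,
    shard_count: usize,
    tensor: &str,
) -> Option<String> {
    let &index = tensor_shard_map.get(tensor)?;
    if index < shard_count {
        Some(shard_file_name(index, shard_count))
    } else {
        None // manifest refers to a shard that does not exist
    }
}
```

A real loader would more likely read the file name from `shards[index].file` directly; deriving it here only keeps the sketch free of a JSON dependency.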

3.9 WASM Considerations

pub trait StreamingLoader {
    fn load_metadata(&mut self) -> Result<AprMetadata>;
    fn load_index(&mut self) -> Result<Vec<TensorDescriptor>>;
    fn load_tensor(&mut self, name: &str) -> Result<Tensor>;
    fn prefetch(&mut self, names: &[&str]);
}
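
One way the trait might be used: load metadata and the index first, hint the loader about upcoming tensors, then pull tensors one at a time so only one is resident. Everything in this sketch other than the four method names is an assumption: the specification does not define `AprMetadata`, `TensorDescriptor`, `Tensor`, or the error type, so minimal placeholders (with `String` errors) stand in for them, and `MemoryLoader` and `count_parameters` are hypothetical.

```rust
use std::collections::HashMap;

// Placeholder types: named by the spec but not defined there.
pub struct AprMetadata { pub model_type: String }
pub struct TensorDescriptor { pub name: String }
pub struct Tensor { pub data: Vec<f32> }

pub trait StreamingLoader {
    fn load_metadata(&mut self) -> Result<AprMetadata, String>;
    fn load_index(&mut self) -> Result<Vec<TensorDescriptor>, String>;
    fn load_tensor(&mut self, name: &str) -> Result<Tensor, String>;
    fn prefetch(&mut self, names: &[&str]);
}

/// In-memory loader that exists only to exercise the trait.
pub struct MemoryLoader {
    pub tensors: HashMap<String, Vec<f32>>,
    pub prefetched: Vec<String>,
}

impl StreamingLoader for MemoryLoader {
    fn load_metadata(&mut self) -> Result<AprMetadata, String> {
        Ok(AprMetadata { model_type: "example".to_string() })
    }
    fn load_index(&mut self) -> Result<Vec<TensorDescriptor>, String> {
        let mut names: Vec<String> = self.tensors.keys().cloned().collect();
        names.sort(); // deterministic order for the example
        Ok(names.into_iter().map(|name| TensorDescriptor { name }).collect())
    }
    fn load_tensor(&mut self, name: &str) -> Result<Tensor, String> {
        self.tensors
            .get(name)
            .map(|d| Tensor { data: d.clone() })
            .ok_or_else(|| format!("unknown tensor: {}", name))
    }
    fn prefetch(&mut self, names: &[&str]) {
        // A real loader might start range requests here; this one records the hint.
        self.prefetched.extend(names.iter().map(|n| n.to_string()));
    }
}

/// Stream every tensor once and count its elements.
pub fn count_parameters<L: StreamingLoader>(loader: &mut L) -> Result<usize, String> {
    loader.load_metadata()?;
    let index = loader.load_index()?;
    let names: Vec<&str> = index.iter().map(|d| d.name.as_str()).collect();
    loader.prefetch(&names);
    let mut total = 0;
    for name in names {
        total += loader.load_tensor(name)?.data.len();
    }
    Ok(total)
}
```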

4. CLI Operations

4.1 Command Overview

apr - APR Model Operations Tool

COMMANDS:
    inspect     Inspect model metadata, vocab, and structure
    debug       Simple debugging output ("drama" mode)
    validate    Validate model integrity
    diff        Compare two models
    tensors     List tensor information
    export      Export model to other formats
    import      Import from external formats
    convert     Convert between model types
    merge       Merge multiple models
    trace       Trace model operations with renacer
    lint        Check for best practices and conventions
    explain     Explain errors, architecture, and tensors
    tui         Interactive terminal UI for exploration

4.2 Inspect Command

$ apr inspect whisper.apr

=== whisper.apr ===
Type:        NeuralCustom (Whisper ASR)
Version:     1.0
Size:        1.5 GB (compressed: 890 MB)
Parameters:  39,000,000
Vocab Size:  51,865
Flags:       COMPRESSED | SIGNED
Checksum:    0xA1B2C3D4 (valid)

Options: --vocab, --filters, --json, --full

4.2.1 Visual Inspection

For suspect tensors, generate an in-terminal histogram to visualize distributions (e.g., detecting shifted means):

$ apr tensors model.apr --hist encoder.layer_norm.weight

Distribution: encoder.layer_norm.weight (shape: [384])
Min: 10.4  Max: 12.1  Mean: 11.2  Std: 0.2

       |          *
       |         ***
  50%  |        *****
       |       *******
       |      *********
       +------------------
       10.0      11.2      12.5
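
The statistics line and the bar chart can both be derived in one pass over the tensor values. The following is a sketch only: `Summary`, `summarize`, and `histogram` are hypothetical names, the standard deviation is the population form (divide by n), and equal-width bins are an assumption about how the tool bins data.

```rust
/// Min, max, mean, and population standard deviation of a tensor.
pub struct Summary {
    pub min: f32,
    pub max: f32,
    pub mean: f32,
    pub std: f32,
}

pub fn summarize(values: &[f32]) -> Option<Summary> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    // Accumulate in f64 to limit rounding error on large tensors.
    let mean = values.iter().map(|&v| v as f64).sum::<f64>() / n;
    let var = values.iter().map(|&v| (v as f64 - mean).powi(2)).sum::<f64>() / n;
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    Some(Summary { min, max, mean: mean as f32, std: var.sqrt() as f32 })
}

/// Count values into `bins` equal-width buckets over [lo, hi].
/// Values outside the range are ignored; `hi` itself lands in the last bucket.
pub fn histogram(values: &[f32], lo: f32, hi: f32, bins: usize) -> Vec<usize> {
    let mut counts = vec![0usize; bins];
    if bins == 0 || hi <= lo {
        return counts;
    }
    let width = (hi - lo) / bins as f32;
    for &v in values {
        if v >= lo && v <= hi {
            let idx = (((v - lo) / width) as usize).min(bins - 1);
            counts[idx] += 1;
        }
    }
    counts
}
```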

4.3 Debug Command ("Drama" Mode)

$ apr debug whisper.apr --drama

====[ DRAMA: whisper.apr ]====

ACT I: THE HEADER
  Scene 1: Magic bytes... APRN (applause!)
  Scene 2: Version check... 1.0 (standing ovation!)

ACT II: THE METADATA
  Scene 1: Parameters... 39,000,000 (a cast of millions!)

ACT III: THE VERDICT
  CURTAIN CALL: Model is PRODUCTION READY!

Options: --hex, --strings, --limit

4.4 Validate Command

$ apr validate model.apr --quality

=== 100-Point Quality Assessment ===

Structure (25 pts):     24/25
Security (25 pts):      20/25
Weights (25 pts):       25/25
Metadata (25 pts):      22/25

TOTAL: 91/100 (EXCELLENT)

4.5 Diff Command

$ apr diff model_v1.apr model_v2.apr

Similarity: 94.2%
Weight Changes: Max delta 0.0234, L2 distance 1.234
Vocab Changes: Added 42 tokens, Removed 3 tokens

Diff vs Reference

Compare an APR model against a raw .safetensors reference to detect translation drift:

$ apr diff model.apr source.safetensors --tensor-mapping mapping.json

# Output:
# encoder.conv1.weight: MATCH (delta < 1e-6)
# encoder.layer_norm.weight: DRIFT (delta = 10.2) !!!
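
The per-tensor figures reported by `apr diff` (maximum delta, L2 distance, and the MATCH/DRIFT label) can be computed as sketched below. The 1e-6 tolerance comes from the sample output; the names `WeightDiff`, `Verdict`, `weight_diff`, and `classify` are hypothetical, and the tool's actual rule may differ.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Match,
    Drift,
}

pub struct WeightDiff {
    pub max_delta: f64,
    pub l2_distance: f64,
}

/// Element-wise comparison of two tensors; `None` if the lengths differ.
pub fn weight_diff(a: &[f32], b: &[f32]) -> Option<WeightDiff> {
    if a.len() != b.len() {
        return None;
    }
    let mut max_delta = 0.0f64;
    let mut sum_sq = 0.0f64;
    for (x, y) in a.iter().zip(b) {
        let d = (f64::from(*x) - f64::from(*y)).abs();
        max_delta = max_delta.max(d);
        sum_sq += d * d;
    }
    Some(WeightDiff { max_delta, l2_distance: sum_sq.sqrt() })
}

/// Label a tensor MATCH when its largest element-wise delta is below `tolerance`.
pub fn classify(diff: &WeightDiff, tolerance: f64) -> Verdict {
    if diff.max_delta < tolerance {
        Verdict::Match
    } else {
        Verdict::Drift
    }
}
```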

4.6 Export Command

Format        Extension      Use Case
ONNX          .onnx          Cross-framework inference
SafeTensors   .safetensors   HuggingFace ecosystem
GGUF          .gguf          llama.cpp / local inference
TorchScript   .pt            PyTorch deployment

apr export model.apr --format gguf --quantize q4_0 --output model.gguf

4.7 Import Command

apr import hf://openai/whisper-tiny --output whisper.apr
apr import model.safetensors --from safetensors --output model.apr

4.8 Convert Command

Model optimization and size reduction operations.

apr convert model.apr --quantize q8_0 --output model_q8.apr
apr convert model.apr --precision fp16 --output model_fp16.apr

4.8.1 Size Reduction Techniques

Technique        Flag        Reduction  Quality   Reversible
Quantization     --quantize  2-8x       Low loss  No
Compression      --compress  1.2-2x     Lossless  Yes
Pruning          --prune     2-10x      Medium    No
Distillation     --distill   2-10x      Medium    No
Low-rank (SVD)   --lowrank   2-4x       Low loss  No
Sparsity         --sparse    2-5x       Low loss  Yes

Quantization

Reduce precision of weights:

# Integer quantization
apr convert model.apr --quantize int8 -o model-int8.apr      # 4x smaller
apr convert model.apr --quantize int4 -o model-int4.apr      # 8x smaller

# Float quantization
apr convert model.apr --quantize fp16 -o model-fp16.apr      # 2x smaller
apr convert model.apr --quantize bf16 -o model-bf16.apr      # 2x smaller

# GGUF-style quantization
apr convert model.apr --quantize q4_k_m -o model-q4km.apr    # 4.5 bits/weight
apr convert model.apr --quantize q8_0 -o model-q8.apr        # 8.5 bits/weight
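
To make the GGUF-style option concrete, here is a sketch of Q8_0-style block quantization as used in the GGML family: weights are grouped in blocks of 32, each block stores one scale plus 32 signed 8-bit values. This sketch keeps the scale as `f32` for simplicity, whereas the on-disk layout uses a 16-bit float, which is where 8.5 bits per weight comes from. The names `BlockQ8`, `quantize_block`, `dequantize_block`, and `bits_per_weight` are hypothetical, and APR's own encoder may differ.

```rust
pub const BLOCK: usize = 32;

/// One quantized block: a shared scale and 32 signed 8-bit values.
pub struct BlockQ8 {
    pub scale: f32, // stored as f16 on disk in GGML; f32 here for simplicity
    pub quants: [i8; BLOCK],
}

pub fn quantize_block(x: &[f32; BLOCK]) -> BlockQ8 {
    // Scale so that the largest magnitude maps to 127.
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0i8; BLOCK];
    for (q, v) in quants.iter_mut().zip(x.iter()) {
        *q = (v * inv).round() as i8;
    }
    BlockQ8 { scale, quants }
}

pub fn dequantize_block(b: &BlockQ8) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (o, q) in out.iter_mut().zip(b.quants.iter()) {
        *o = *q as f32 * b.scale;
    }
    out
}

/// Storage cost per weight when the scale occupies 2 bytes (f16).
pub fn bits_per_weight() -> f32 {
    ((BLOCK + 2) * 8) as f32 / BLOCK as f32
}
```

The round-trip error per weight is bounded by half the block scale, which is why blocks with one large outlier lose the most precision.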

Compression

Lossless compression of tensor data:

# LZ4 (fast, default)
apr convert model.apr --compress lz4 -o model-lz4.apr

# Zstd (better ratio)
apr convert model.apr --compress zstd -o model-zstd.apr
apr convert model.apr --compress zstd:19 -o model-zstd19.apr  # Max compression

# Combine with quantization
apr convert model.apr --quantize int8 --compress zstd -o model-int8-zstd.apr

Pruning

Remove low-magnitude weights:

# Unstructured pruning (sparse tensors)
apr convert model.apr --prune 0.5 -o model-pruned.apr        # 50% sparsity

# Structured pruning (remove entire neurons/heads)
apr convert model.apr --prune-heads 2 -o model-pruned.apr    # Remove 2 attention heads
apr convert model.apr --prune-layers 1 -o model-pruned.apr   # Remove 1 layer

# Magnitude-based with threshold
apr convert model.apr --prune-threshold 0.01 -o model-pruned.apr
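
Both unstructured variants reduce to zeroing the smallest-magnitude weights. The sketch below shows the threshold form and the target-sparsity form; `prune_threshold` and `prune_fraction` are hypothetical names, and it is an assumption that `--prune 0.5` means "zero the smallest 50% by magnitude".

```rust
/// Zero every weight with magnitude below `threshold`.
/// Returns the resulting fraction of zero weights.
pub fn prune_threshold(weights: &mut [f32], threshold: f32) -> f32 {
    let mut zeros = 0usize;
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
        }
        if *w == 0.0 {
            zeros += 1;
        }
    }
    if weights.is_empty() {
        0.0
    } else {
        zeros as f32 / weights.len() as f32
    }
}

/// Zero the smallest-magnitude `fraction` of the weights.
/// Returns how many weights were zeroed.
pub fn prune_fraction(weights: &mut [f32], fraction: f32) -> usize {
    let k = (weights.len() as f32 * fraction.clamp(0.0, 1.0)).floor() as usize;
    if k == 0 {
        return 0;
    }
    // Sort indices by magnitude so the original positions can be zeroed.
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by(|&a, &b| weights[a].abs().total_cmp(&weights[b].abs()));
    for &i in order.iter().take(k) {
        weights[i] = 0.0;
    }
    k
}
```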

Distillation

Train smaller model from larger (requires reference data):

# Distill to smaller architecture
apr convert model-large.apr --distill tiny --data train.jsonl -o model-tiny.apr

# Layer reduction
apr convert model.apr --distill-layers 4 --data train.jsonl -o model-4layer.apr

# Knowledge distillation with temperature
apr convert model.apr --distill small --temperature 2.0 --data train.jsonl -o model-small.apr

Note: Distillation requires training data and compute. Use --epochs and --lr to control the training run.
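
For context on the --temperature option: in standard knowledge distillation the teacher's logits are divided by a temperature T before the softmax, which flattens the distribution so the student also learns from lower-ranked classes. The sketch below shows that one step only; `softmax_with_temperature` is a hypothetical name and this is not a description of how `apr convert --distill` is implemented.

```rust
/// Softmax over `logits / temperature`, computed in a numerically stable way.
/// Higher temperatures yield a flatter (softer) distribution.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "temperature must be positive");
    // Subtract the maximum logit before exponentiating to avoid overflow.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&z| ((z - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

With logits [2, 1, 0], raising T from 1.0 to 2.0 lowers the top probability and raises the bottom one while the ranking stays the same.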

Low-Rank Factorization

Decompose weight matrices using SVD/LoRA:

# SVD decomposition
apr convert model.apr --lowrank svd --rank 64 -o model-svd.apr

# LoRA-style decomposition
apr convert model.apr --lowrank lora --rank 16 -o model-lora.apr

# Target specific layers
apr convert model.apr --lowrank svd --rank 32 --target "*.fc1.weight" -o model-svd.apr
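
The size effect of a rank-r factorization follows from simple counting: an m×n matrix has m·n parameters, while its two factors (m×r and r×n) have r·(m+n). The sketch below works that out; all function names are hypothetical and the 1536×384 shape in the note is only an illustrative example.

```rust
/// Parameters in a dense rows x cols matrix.
pub fn dense_params(rows: usize, cols: usize) -> usize {
    rows * cols
}

/// Parameters in the two factors (rows x rank) and (rank x cols).
pub fn factored_params(rows: usize, cols: usize, rank: usize) -> usize {
    rank * (rows + cols)
}

/// How many times smaller the factored form is than the dense matrix.
pub fn reduction_factor(rows: usize, cols: usize, rank: usize) -> f64 {
    dense_params(rows, cols) as f64 / factored_params(rows, cols, rank) as f64
}

/// Largest rank for which factoring still saves parameters:
/// rank * (rows + cols) < rows * cols.
pub fn max_useful_rank(rows: usize, cols: usize) -> usize {
    let total = rows + cols;
    if total == 0 || rows * cols == 0 {
        return 0;
    }
    (rows * cols - 1) / total
}
```

For a 1536×384 matrix at rank 64, this gives 122,880 parameters instead of 589,824, a 4.8x reduction; above rank 307 the factored form would be larger than the original.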

Sparsity Encoding

Efficient storage for sparse tensors:

# CSR format for sparse tensors
apr convert model.apr --sparse csr --threshold 0.001 -o model-sparse.apr

# Block sparsity (GPU-friendly)
apr convert model.apr --sparse block:4 -o model-block-sparse.apr
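
For reference, CSR (compressed sparse row) stores only the retained values, their column indices, and one running offset per row. The sketch below converts a row-major dense matrix; `Csr` and `to_csr` are hypothetical names, and it is an assumption that --threshold drops entries whose magnitude is strictly below the given value.

```rust
/// Compressed sparse row representation of a 2-D tensor.
pub struct Csr {
    pub values: Vec<f32>,    // retained entries, row by row
    pub col_idx: Vec<usize>, // column of each retained entry
    pub row_ptr: Vec<usize>, // row r occupies values[row_ptr[r]..row_ptr[r + 1]]
}

/// Convert a row-major dense matrix, dropping entries with |v| < threshold.
pub fn to_csr(dense: &[f32], rows: usize, cols: usize, threshold: f32) -> Csr {
    assert_eq!(dense.len(), rows * cols, "shape mismatch");
    let mut csr = Csr {
        values: Vec::new(),
        col_idx: Vec::new(),
        row_ptr: vec![0],
    };
    for r in 0..rows {
        for c in 0..cols {
            let v = dense[r * cols + c];
            if v.abs() >= threshold {
                csr.values.push(v);
                csr.col_idx.push(c);
            }
        }
        csr.row_ptr.push(csr.values.len());
    }
    csr
}
```

CSR only pays off when most entries are dropped, since every retained value also carries a column index.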

4.8.2 Combination Examples

# Maximum compression pipeline
apr convert model.apr \
  --quantize int4 \
  --prune 0.3 \
  --compress zstd:19 \
  -o model-optimized.apr
# Result: ~20x smaller than original

# WASM-optimized (fast decode, small size)
apr convert model.apr \
  --quantize int8 \
  --compress lz4 \
  -o model-wasm.apr
# Result: ~5x smaller, fast streaming decode

# Quality-preserving compression
apr convert model.apr \
  --quantize fp16 \
  --lowrank svd --rank 128 \
  --compress zstd \
  -o model-quality.apr
# Result: ~3x smaller, minimal quality loss

4.8.3 Size Comparison Table

Technique          Whisper Tiny  Whisper Base  LLaMA 7B
Original (f32)     145 MB        290 MB        26 GB
fp16               73 MB         145 MB        13 GB
int8               37 MB         73 MB         6.5 GB
int4               19 MB         37 MB         3.3 GB
int4 + zstd        15 MB         29 MB         2.6 GB
int4 + prune 50%   10 MB         19 MB         1.7 GB

4.8.4 Quality Validation (Pre vs Post)

Compare model quality before and after optimization:

# Compare outputs between original and optimized
apr validate model.apr model-optimized.apr --quality

Quality Comparison: model.apr vs model-optimized.apr
═══════════════════════════════════════════════════════════════
                          Original    Optimized    Δ
Tensor count              167         167          0
Total params              39.0M       39.0M        0
Non-zero params           39.0M       19.5M        -50%
Size                      145 MB      15 MB        -89%

Output Comparison (10 test inputs):
  Mean L2 distance:       0.0234      (threshold: 0.1)  ✓ PASS
  Max L2 distance:        0.0891      (threshold: 0.5)  ✓ PASS
  Cosine similarity:      0.9987      (threshold: 0.99) ✓ PASS

Layer-by-layer drift:
  encoder.conv1:          0.001       ✓
  encoder.layer_norm:     0.002       ✓
  decoder.layer_norm:     0.089       ⚠ (highest drift)

VERDICT: ✓ PASS - Optimized model within quality tolerance
═══════════════════════════════════════════════════════════════
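
The output-comparison metrics in the report can be reproduced as sketched below. The three thresholds come from the sample report; treating the cosine figure as a mean over test inputs is an assumption, and `Thresholds`, `l2_distance`, `cosine_similarity`, and `passes_quality_gate` are hypothetical names.

```rust
/// Pass/fail limits for the output comparison.
pub struct Thresholds {
    pub max_mean_l2: f64,
    pub max_l2: f64,
    pub min_cosine: f64,
}

pub fn l2_distance(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (f64::from(*x) - f64::from(*y)).powi(2))
        .sum::<f64>()
        .sqrt()
}

pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
    let na = a.iter().map(|x| f64::from(*x).powi(2)).sum::<f64>().sqrt();
    let nb = b.iter().map(|y| f64::from(*y).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Each pair holds (original output, optimized output) for one test input.
pub fn passes_quality_gate(pairs: &[(Vec<f32>, Vec<f32>)], t: &Thresholds) -> bool {
    if pairs.is_empty() {
        return false; // nothing compared, so nothing can be vouched for
    }
    let n = pairs.len() as f64;
    let l2: Vec<f64> = pairs.iter().map(|(a, b)| l2_distance(a, b)).collect();
    let mean_l2 = l2.iter().sum::<f64>() / n;
    let max_l2 = l2.iter().cloned().fold(0.0, f64::max);
    let mean_cos = pairs.iter().map(|(a, b)| cosine_similarity(a, b)).sum::<f64>() / n;
    mean_l2 <= t.max_mean_l2 && max_l2 <= t.max_l2 && mean_cos >= t.min_cosine
}
```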

Canary Inputs

Define reference inputs with expected outputs for regression testing:

# Create canary test suite
apr canary create model.apr --input test.wav --output canary.json

# Validate optimized model against canary
apr canary check model-optimized.apr --canary canary.json

Canary Test Results:
  Input: test.wav
  Expected: "The quick brown fox jumps over the lazy dog"
  Original:  "The quick brown fox jumps over the lazy dog"  ✓
  Optimized: "The quick brown fox jumps over the lazy dog"  ✓

  Token-level accuracy: 100%
  Character error rate: 0.0%
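
Character error rate is conventionally the Levenshtein edit distance between the expected and actual transcripts divided by the length of the expected one. The sketch below follows that convention; whether `apr canary` computes it exactly this way is an assumption, and `edit_distance` and `character_error_rate` are hypothetical names.

```rust
/// Levenshtein distance over Unicode scalar values, using two rolling rows.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            let best = substitute.min(delete).min(insert);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Edit distance divided by the reference length.
pub fn character_error_rate(reference: &str, hypothesis: &str) -> f64 {
    let n = reference.chars().count();
    if n == 0 {
        return if hypothesis.is_empty() { 0.0 } else { 1.0 };
    }
    edit_distance(reference, hypothesis) as f64 / n as f64
}
```

A single substituted character, such as "box" for "fox", gives a rate of one third on that word.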

Automatic Quality Gates

# Fail optimization if quality degrades beyond threshold
apr convert model.apr --quantize int4 --prune 0.5 \
  --quality-check \
  --max-drift 0.1 \
  --canary canary.json \
  -o model-optimized.apr

# If quality check fails:
# ERROR: Quality gate failed
#   - L2 drift: 0.24 (max: 0.1)
#   - Canary "test.wav" failed: expected "fox" got "box"
# Use --force to ignore quality gates

4.8.5 Payload Tracing (Radioactive Tracer)

Trace a payload through the model step-by-step, like a radioactive tracer in medicine:

apr trace model.apr --input test.wav --trace-payload

Payload Trace: test.wav → model.apr
═══════════════════════════════════════════════════════════════

Step 1: Audio Input
  Shape: [1, 480000]  (30s @ 16kHz)
  Stats: mean=0.002, std=0.15, range=[-0.98, 0.97]

Step 2: Mel Spectrogram
  Shape: [1, 80, 3000]
  Stats: mean=-4.2, std=2.1
  ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁  (frequency distribution)

Step 3: encoder.conv1
  Shape: [1, 384, 3000]
  Stats: mean=0.12, std=0.34
  Time: 2.3ms
  ⚠ Activation spike at position 1247 (value: 12.4)

Step 4: encoder.conv2
  Shape: [1, 384, 1500]
  Stats: mean=0.08, std=0.29
  Time: 1.8ms

Step 5: encoder.positional_embedding
  Shape: [1, 1500, 384]
  Stats: mean=0.08, std=0.31

Step 6: encoder.layers.0.self_attn
  Shape: [1, 1500, 384]
  Attention pattern:
  ░░░░░░░░░░░░░░░░░░░░
  ░░░░████░░░░░░░░░░░░  ← attending to positions 40-80
  ░░░░░░░░░░░░████░░░░

  ... (layers 1-3) ...

Step 10: encoder.layer_norm
  Shape: [1, 1500, 384]
  Stats: mean=0.00, std=1.02  ✓ (properly normalized)

Step 11: decoder.token_embedding (SOT token)
  Shape: [1, 1, 384]
  Token: <|startoftranscript|> (50258)

  ... (decoder steps) ...

Step 47: Output Logits
  Shape: [1, 12, 51865]
  Top predictions:
    1. "The" (0.94)
    2. "A" (0.03)
    3. "This" (0.01)

═══════════════════════════════════════════════════════════════
Total time: 142ms | Peak memory: 312MB | Tokens generated: 12

Comparing Traces (Diff Mode)

Compare payload path between two models:

apr trace model.apr model-optimized.apr --input test.wav --diff

Trace Diff: model.apr vs model-optimized.apr
═══════════════════════════════════════════════════════════════

Step    Layer                    Original     Optimized    Drift
─────   ─────                    ────────     ─────────    ─────
1       audio_input              ████████     ████████     0.000
2       mel_spectrogram          ████████     ████████     0.000
3       encoder.conv1            ████████     ███████░     0.012
4       encoder.conv2            ████████     ███████░     0.018
...
10      encoder.layer_norm       ████████     ██████░░     0.089 ⚠
11      decoder.token_embed      ████████     ████████     0.001
...
47      output_logits            ████████     ███████░     0.023

Divergence detected at: encoder.layer_norm (step 10)
  Original mean:  0.0023
  Optimized mean: 0.0892

Recommendation: Check layer norm weight quantization

Anomaly Detection

Automatically detect unusual activations:

apr trace model.apr --input test.wav --detect-anomalies

Anomaly Report:
═══════════════════════════════════════════════════════════════

⚠ ANOMALY at encoder.layers.2.self_attn (step 8)
  - Activation explosion: max=847.3 (expected <10)
  - Possible cause: NaN propagation or weight corruption
  - Affected tokens: positions 120-135

⚠ ANOMALY at decoder.layer_norm (step 15)
  - Dead neurons: 12% of outputs are exactly 0
  - Possible cause: Aggressive pruning or ReLU saturation

✓ No anomalies in remaining 45 layers
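
The two anomaly classes in the report (activation explosion and dead outputs) can be detected with simple per-layer checks. The sketch below is illustrative only: the limit of 10 for activation magnitude is taken from the sample report, the dead-output limit is a caller-supplied assumption, and `Anomaly` and `detect_anomalies` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Anomaly {
    /// At least one NaN or infinite activation.
    NonFinite,
    /// Largest finite magnitude exceeds the expected bound.
    ActivationExplosion { max_abs: f32 },
    /// Too large a share of outputs are exactly zero.
    DeadOutputs { fraction: f32 },
}

/// Inspect one layer's activations and report every anomaly found.
pub fn detect_anomalies(acts: &[f32], max_expected: f32, dead_limit: f32) -> Vec<Anomaly> {
    let mut found = Vec::new();
    if acts.is_empty() {
        return found;
    }
    if acts.iter().any(|v| !v.is_finite()) {
        found.push(Anomaly::NonFinite);
    }
    let max_abs = acts
        .iter()
        .filter(|v| v.is_finite())
        .fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs > max_expected {
        found.push(Anomaly::ActivationExplosion { max_abs });
    }
    let zeros = acts.iter().filter(|v| **v == 0.0).count();
    let fraction = zeros as f32 / acts.len() as f32;
    if fraction > dead_limit {
        found.push(Anomaly::DeadOutputs { fraction });
    }
    found
}
```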

Interactive Trace Mode (TUI)

apr trace model.apr --input test.wav --interactive

┌─────────────────────────────────────────────────────────────────┐
│  Payload Trace: test.wav                        [Interactive]   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─ Pipeline ───────────────────────────────────────────────┐  │
│  │                                                          │  │
│  │  [Audio] ──▶ [Mel] ──▶ [Conv1] ──▶ [Conv2] ──▶ ...      │  │
│  │     ✓         ✓         ✓          ✓                     │  │
│  │                                    ▲                      │  │
│  │                                    │ YOU ARE HERE         │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Current Layer: encoder.conv2 ───────────────────────────┐  │
│  │ Input:  [1, 384, 3000]   Output: [1, 384, 1500]          │  │
│  │ Params: 442,752          Time: 1.8ms                     │  │
│  │                                                          │  │
│  │ Activation Distribution:                                 │  │
│  │     ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁                                      │  │
│  │   -2.0            0            2.0                       │  │
│  │                                                          │  │
│  │ Weight Stats: mean=0.002, std=0.04                       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Payload Snapshot ───────────────────────────────────────┐  │
│  │ [0.12, 0.34, -0.21, 0.08, 0.45, -0.11, 0.02, ...]       │  │
│  │ mean=0.08  std=0.29  min=-1.2  max=2.1                  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ [←/→] step  [Enter] inspect  [d]iff  [e]xport  [q]uit   4/47   │
└─────────────────────────────────────────────────────────────────┘

Export Trace for Analysis

# Export full trace to JSON
apr trace model.apr --input test.wav --export trace.json

# Export to Chrome trace format (for chrome://tracing)
apr trace model.apr --input test.wav --export trace.perfetto

# Export intermediate activations for debugging
apr trace model.apr --input test.wav --dump-activations ./activations/

4.8.6 Debugging Conversion

# Analyze source tensor stats without converting
apr convert model.safetensors --analyze-source --arch whisper

# Output:
# [PASS] encoder.conv1.weight: mean=0.003 (expected ~0.0)
# [FAIL] encoder.layer_norm.weight: mean=11.2 (expected ~1.0) -> SOURCE ALREADY CORRUPT?

4.9 Merge Command

Strategy   Description
average    Average weights (ensemble)
weighted   Weighted average by performance
ties       TIES merging (trim, elect, sign)
dare       DARE merging (drop and rescale)
slerp      Spherical linear interpolation

apr merge model1.apr model2.apr --strategy ties --output merged.apr
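
Two of the strategies are simple enough to show per tensor. The sketch below implements plain averaging and the textbook spherical linear interpolation formula, falling back to linear interpolation when the two tensors are nearly parallel. The function names are hypothetical, and how `apr merge` applies these across a whole model is not specified here.

```rust
/// Element-wise mean of two equally shaped tensors.
pub fn average(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "tensor shapes must match");
    a.iter().zip(b).map(|(x, y)| 0.5 * (x + y)).collect()
}

/// Spherical linear interpolation between `a` (t = 0) and `b` (t = 1).
pub fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "tensor shapes must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    // Angle between the two tensors viewed as flat vectors.
    let cos = if na > 0.0 && nb > 0.0 {
        (dot / (na * nb)).clamp(-1.0, 1.0)
    } else {
        1.0
    };
    let omega = cos.acos();
    let sin = omega.sin();
    let (wa, wb) = if sin.abs() < 1e-6 {
        (1.0 - t, t) // nearly parallel: plain linear interpolation
    } else {
        (((1.0 - t) * omega).sin() / sin, (t * omega).sin() / sin)
    };
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```

Unlike averaging, slerp between two unit vectors yields another unit vector, which is the usual argument for preferring it when weight norms matter.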

4.10 Trace Command

$ apr trace model.apr --input sample.wav

Layer                          Time (ms)   Memory (MB)
encoder.conv1                      12.3         45.2
decoder.attention.0                15.4         12.3
TOTAL                             142.5        312.4

4.11 Lint Command

Static analysis for best practices, conventions, and "soft" requirements. Unlike validate (which checks for corruption/invalidity), lint checks for quality and standardization.

$ apr lint model.apr

[WARN] Metadata: Missing 'license' field
[WARN] Metadata: Missing 'model_card'
[INFO] Tensor Naming: 'encoder.w' should be 'encoder.weight' for auto-mapping
[INFO] Efficiency: 12 tensors could be aligned to 64 bytes (currently 32)

Falsifiable Guarantees (Must Fail If):

  • Naming: Any tensor name not matching canonical schema (Section 10.8) raises INFO/WARN.
  • Metadata: Missing license, model_card, or provenance raises WARN.
  • Efficiency: Tensors unaligned to 64 bytes raise INFO.
  • Compression: Uncompressed tensors >1MB raise INFO.

4.12 Explain Command

Provides human-readable context, architectural explanations, and error troubleshooting.

Explain Model Architecture

$ apr explain model.apr

This is a **Whisper (Tiny)** model.
- **Purpose**: Automatic Speech Recognition (ASR)
- **Architecture**: Encoder-Decoder Transformer
- **Input**: 80-channel Mel spectrograms
- **Output**: Text tokens (multilingual)

Explain Specific Tensor

$ apr explain model.apr --tensor encoder.conv1.weight

**encoder.conv1.weight**
- **Role**: Initial feature extraction (Audio -> Latent)
- **Shape**: [384, 80, 3] (Filters, Input Channels, Kernel Size)
- **Stats**: Mean 0.002, Std 0.04 (Healthy)

Explain Error Codes

$ apr explain E002

**E002: Corrupted Data**
The payload checksum does not match the header.
- **Common Causes**: Interrupted download, bit rot, disk error.
- **Troubleshooting**:
  1. Run `apr validate --checksum` to verify.
  2. Check source file integrity (MD5/SHA256).

Falsifiable Guarantees:

  • Unknown Error: apr explain E999 must return "Unknown Error Code" (not crash).
  • Unknown Tensor: apr explain --tensor nonexistent must list fuzzy matches.
  • Architecture: Must correctly identify all supported architectures (Section 10).

4.13 TUI Command

Interactive terminal UI for model exploration, statistics visualization, and comparison. Built with ratatui and trueno-viz.

$ apr tui model.apr
$ apr tui model1.apr model2.apr --compare

4.13.1 Graph View

ASCII/Unicode graph visualization of model architecture:

┌─────────────────────────────────────────────────────────────────┐
│  Model: whisper-tiny.apr                          [Graph View]  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐                    │
│   │  Audio  │───▶│  Conv1  │───▶│  Conv2  │                    │
│   │ [80,3000]│    │[384,80,3]│   │[384,384]│                    │
│   └─────────┘    └─────────┘    └─────────┘                    │
│                                      │                          │
│                                      ▼                          │
│   ┌──────────────────────────────────────────────────────┐     │
│   │              Encoder Layers (×4)                      │     │
│   │  ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   │     │
│   │  │Self-Attn│──▶│  LN   │──▶│  FFN   │──▶│  LN    │   │     │
│   │  └────────┘   └────────┘   └────────┘   └────────┘   │     │
│   └──────────────────────────────────────────────────────┘     │
│                           │                                     │
│                           ▼                                     │
│   ┌──────────────────────────────────────────────────────┐     │
│   │              Decoder Layers (×4)                      │     │
│   │  ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   │     │
│   │  │Self-Attn│──▶│Cross-Attn│─▶│  FFN   │──▶│  LN    │   │     │
│   │  └────────┘   └────────┘   └────────┘   └────────┘   │     │
│   └──────────────────────────────────────────────────────┘     │
│                           │                                     │
│                           ▼                                     │
│                    ┌─────────────┐                              │
│                    │   Output    │                              │
│                    │  [51865]    │                              │
│                    └─────────────┘                              │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ [g]raph [s]tats [c]ompare [t]ensors [h]ist [q]uit    Page 1/3  │
└─────────────────────────────────────────────────────────────────┘

4.13.2 Descriptive Statistics View

Live-updating tensor statistics dashboard:

┌─────────────────────────────────────────────────────────────────┐
│  Model: whisper-tiny.apr                          [Stats View]  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─ Overview ───────────────────────────────────────────────┐  │
│  │ Total Params: 39,000,000    Tensors: 167    Size: 145MB  │  │
│  │ Quantization: f32           Vocab: 51,865   Arch: Whisper│  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Layer Norm Health ──────────────────────────────────────┐  │
│  │ Tensor                        Mean    Std    Status      │  │
│  │ encoder.layer_norm.weight     1.48    0.32   ✓ OK        │  │
│  │ decoder.layer_norm.weight    11.10    0.21   ✗ BAD       │  │
│  │ encoder.layers.0.ln.weight    1.22    0.28   ✓ OK        │  │
│  │ encoder.layers.1.ln.weight    1.35    0.31   ✓ OK        │  │
│  │ encoder.layers.2.ln.weight    1.41    0.29   ✓ OK        │  │
│  │ encoder.layers.3.ln.weight   10.94    0.18   ✗ BAD       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Weight Distribution ────────────────────────────────────┐  │
│  │                                                          │  │
│  │  Attention:  ████████████████████  Mean: 0.002  ✓        │  │
│  │  FFN:        ███████████████████   Mean: 0.001  ✓        │  │
│  │  Embedding:  █████████████████     Mean: 0.015  ✓        │  │
│  │  LayerNorm:  ██████████████████████████████████  ✗       │  │
│  │              ↑ outlier: decoder.layer_norm.weight        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Validation Score ───────────────────────────────────────┐  │
│  │ ████████████████████░░░░  21/25 FAIL                     │  │
│  │ Critical: 2 Layer Norm weights outside [0.5, 3.0]        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ [g]raph [s]tats [c]ompare [t]ensors [h]ist [q]uit    Page 1/1  │
└─────────────────────────────────────────────────────────────────┘
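
The Layer Norm Health panel flags weights whose mean falls outside [0.5, 3.0]. A sketch of that check follows; treating both bounds as inclusive is an assumption, and `LayerNormHealth` and `layer_norm_health` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum LayerNormHealth {
    Healthy,
    OutOfRange,
}

/// Classify a LayerNorm weight tensor by its mean.
/// The [0.5, 3.0] range comes from the validation message; NaN is out of range.
pub fn layer_norm_health(weight_mean: f32) -> LayerNormHealth {
    if (0.5..=3.0).contains(&weight_mean) {
        LayerNormHealth::Healthy
    } else {
        LayerNormHealth::OutOfRange
    }
}
```

Applied to the means in the panel, 1.48 is healthy while 11.10 and 10.94 are flagged.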

4.13.3 Comparison View

Side-by-side model comparison with diff highlighting:

┌─────────────────────────────────────────────────────────────────┐
│  Comparing: model_v1.apr vs model_v2.apr         [Compare View] │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─ Summary ────────────────────────────────────────────────┐  │
│  │ Similarity: 94.2%    Changed: 12 tensors    New: 0       │  │
│  │ Max Δ: 0.0234        L2 Dist: 1.234         Removed: 0   │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Tensor Comparison ──────────────────────────────────────┐  │
│  │ Tensor                    v1 Mean   v2 Mean   Δ          │  │
│  │ encoder.conv1.weight      0.0023    0.0025    +0.0002    │  │
│  │ encoder.layer_norm.wt     1.4832    1.4901    +0.0069    │  │
│  │ decoder.layer_norm.wt    11.0983    1.0521   -10.0462 !! │  │
│  │ decoder.layers.0.fc1.wt   0.0012    0.0014    +0.0002    │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Distribution Comparison ────────────────────────────────┐  │
│  │                                                          │  │
│  │  decoder.layer_norm.weight:                              │  │
│  │                                                          │  │
│  │  v1: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████  (mean=11.1)   │  │
│  │  v2: ░░░░░░░░░░████░░░░░░░░░░░░░░░░░░░░░░  (mean=1.05)   │  │
│  │      ──────────────────────────────────────              │  │
│  │      0         5         10        15                    │  │
│  │                                                          │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌─ Validation Score Comparison ────────────────────────────┐  │
│  │ v1: ████████████████████░░░░  21/25 FAIL                 │  │
│  │ v2: ████████████████████████  25/25 PASS  ← IMPROVED     │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ [g]raph [s]tats [c]ompare [t]ensors [h]ist [q]uit    Page 1/1  │
└─────────────────────────────────────────────────────────────────┘

4.13.4 Histogram View

Per-tensor distribution visualization with sparklines:

┌─────────────────────────────────────────────────────────────────┐
│  Tensor: decoder.layer_norm.weight               [Histogram]    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Shape: [384]    dtype: f32    Size: 1.5 KB                    │
│  Mean: 11.0983   Std: 0.2134   Min: 10.42   Max: 12.01         │
│                                                                 │
│  Distribution:                                                  │
│                                                                 │
│   150 │                    ▄▄▄▄                                 │
│       │                  ▄██████▄                               │
│   100 │                ▄██████████▄                             │
│       │              ▄██████████████▄                           │
│    50 │            ▄██████████████████▄                         │
│       │          ▄██████████████████████▄                       │
│     0 ├──────────────────────────────────────────────           │
│       10.0      10.5      11.0      11.5      12.0              │
│                                                                 │
│  ⚠ ANOMALY DETECTED:                                           │
│  Expected mean ≈ 1.0 for LayerNorm weight                       │
│  Actual mean = 11.0983 (~11x higher than expected)              │
│                                                                 │
│  Possible causes:                                               │
│  • Incorrect tensor scaling during conversion                   │
│  • Wrong tensor mapped to this name                             │
│  • Source model corruption                                      │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ [←/→] prev/next tensor  [Enter] select  [q] back    12/167     │
└─────────────────────────────────────────────────────────────────┘

4.13.5 Keybindings

| Key | Action |
|-----|--------|
| g | Switch to Graph view |
| s | Switch to Stats view |
| c | Switch to Compare view (if 2 models) |
| t | Switch to Tensor list |
| h | Switch to Histogram view |
| Enter | Select/drill down |
| Esc | Back/cancel |
| ↑/↓ | Navigate list |
| ←/→ | Previous/next page or tensor |
| / | Search tensors |
| ? | Help |
| q | Quit |

4.13.6 Implementation

Crates:

  • ratatui = "0.28" - Terminal UI framework
  • crossterm = "0.28" - Cross-platform terminal handling
  • trueno-viz - Tensor visualization utilities (optional)

Feature Flag:

[features]
tui = ["ratatui", "crossterm"]

5. Auxiliary Data Patterns

5.1 JSON Metadata Pattern

[APR magic] → [metadata_len] → [JSON metadata] → [tensors] → [CRC32]
                                     ↑
                            Auxiliary data here

5.2 Common Auxiliary Data Types

Vocabulary (NLP)

{"vocab": ["<pad>", "<unk>", "the", "..."], "vocab_size": 51865}

Mel Filterbank (Audio)

{"mel_filterbank": [0.0, "..."], "mel_filterbank_shape": [80, 201]}

Tokenizer Config

{"tokenizer_config": {"type": "bpe", "unk_token": "<|unk|>", "eos_token": "<|endoftext|>"}}

Image Preprocessing (Vision)

{"image_config": {"image_size": 224, "mean": [0.485, 0.456, 0.406]}}

Label Mapping (Classification)

{"labels": {"0": "cat", "1": "dog"}, "num_labels": 2}

5.3 Tensor Storage for Large Data

| Data Size | JSON Metadata | Tensor |
|-----------|---------------|--------|
| < 100KB | Preferred | Overkill |
| 100KB - 1MB | Acceptable | Good |
| > 1MB | Avoid | Preferred |

Naming convention: audio.mel_filterbank, text.token_embedding
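
The size guidance above can be sketched as a small decision function. This is only an illustration: the `Storage` enum and `recommend_storage` are hypothetical names, and the sketch assumes 1 KB = 1024 bytes, which the table does not specify.

```rust
/// Where a piece of auxiliary data should live (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Storage {
    /// Small payloads: keep them in the JSON metadata.
    JsonMetadata,
    /// Mid-sized payloads: JSON is acceptable, a tensor is good.
    Either,
    /// Large payloads: store as a named tensor.
    Tensor,
}

/// Applies the thresholds from the table above (assumes binary kilobytes).
fn recommend_storage(size_bytes: u64) -> Storage {
    const KB: u64 = 1024;
    const MB: u64 = 1024 * KB;
    if size_bytes < 100 * KB {
        Storage::JsonMetadata
    } else if size_bytes <= MB {
        Storage::Either
    } else {
        Storage::Tensor
    }
}
```

For example, an 80 x 201 f32 mel filterbank occupies 64,320 bytes, so under these assumptions it still fits the JSON metadata tier.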

5.4 Best Practices

  1. Use standard keys: Follow HuggingFace/GGUF conventions
  2. Include shape info: Always store shape alongside flattened arrays
  3. Version metadata: Include format_version for compatibility
  4. Document units: Specify if values are normalized, in Hz, etc.
  5. Validate on load: Check array lengths match expected shapes
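
Practices 2 and 5 can be combined into a single load-time check. The sketch below is illustrative only; `expected_len` and `check_flat_array` are hypothetical helper names, not part of the APR API.

```rust
/// Number of elements implied by `shape`, or None if the product overflows usize.
fn expected_len(shape: &[usize]) -> Option<usize> {
    shape.iter().try_fold(1usize, |acc, &d| acc.checked_mul(d))
}

/// Checks that a flattened array has exactly as many values as its stored shape implies.
fn check_flat_array(shape: &[usize], data_len: usize) -> Result<(), String> {
    match expected_len(shape) {
        Some(n) if n == data_len => Ok(()),
        Some(n) => Err(format!(
            "shape {:?} implies {} values, found {}",
            shape, n, data_len
        )),
        None => Err(format!("shape {:?} overflows usize", shape)),
    }
}
```

With the mel filterbank example from Section 5.2, a shape of [80, 201] requires exactly 16,080 values in the flattened array.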

6. Format Comparison

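| Feature | APR1 | APR2 | GGUF | SafeTensors |
|---------|------|------|------|-------------|
| WASM-first | Yes | Yes | No | Yes |
| Tensor alignment | No | Yes (64B) | Yes (32B) | Yes |
| Compression | No | LZ4 | No | No |
| Quantization | Metadata | Native | Native | No |
| Sharding | No | Yes | No | Yes |
| Streaming | No | Yes | No | No |
| JSON metadata | Yes | Yes | Typed KV | JSON |
| CRC32 | Yes | Yes | No | No |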

7. Error Handling

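| Code | Category | Description |
|------|----------|-------------|
| E001 | FORMAT | Invalid file format |
| E002 | CORRUPT | Corrupted data |
| E003 | VERSION | Unsupported version |
| E004 | CHECKSUM | Checksum mismatch |
| E005 | DECRYPT | Decryption failed |
| E006 | SIGNATURE | Signature invalid |
| E007 | IO | File I/O error |
| E008 | MEMORY | Out of memory |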
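
One way to keep these codes in a single place is a static lookup table. The sketch below is an assumption about how a lookup might be structured, not the actual implementation; `ERROR_TABLE` and `explain` are hypothetical names.

```rust
/// (code, category, description) triples copied from the table above.
static ERROR_TABLE: [(&str, &str, &str); 8] = [
    ("E001", "FORMAT", "Invalid file format"),
    ("E002", "CORRUPT", "Corrupted data"),
    ("E003", "VERSION", "Unsupported version"),
    ("E004", "CHECKSUM", "Checksum mismatch"),
    ("E005", "DECRYPT", "Decryption failed"),
    ("E006", "SIGNATURE", "Signature invalid"),
    ("E007", "IO", "File I/O error"),
    ("E008", "MEMORY", "Out of memory"),
];

/// Returns (category, description) for a known code, or None for an unknown one.
fn explain(code: &str) -> Option<(&'static str, &'static str)> {
    ERROR_TABLE
        .iter()
        .find(|entry| entry.0 == code)
        .map(|entry| (entry.1, entry.2))
}
```

Returning `None` for an unrecognized code leaves it to the caller to report it as unknown.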

8. Configuration

# ~/.config/apr/config.toml

[defaults]
output_format = "text"
color = true

[inspect]
show_vocab = true
max_tokens_display = 20

[debug]
drama_mode = false
hex_limit = 256

[validate]
strict = true
require_signature = false

9. Quality Gates

# .pmat-gates.toml
[apr-ops]
test_coverage_minimum = 95.0
max_cyclomatic_complexity = 10
satd_maximum = 0
mutation_score_minimum = 85.0
max_inspect_latency_ms = 100

10. Multi-Format Conversion Specification

10.1 Supported Input Formats

APR supports conversion from all major ML model formats:

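| Format | Extensions | Source | Priority | Status |
|--------|------------|--------|----------|--------|
| SafeTensors | .safetensors | HuggingFace | P0 | ✅ Implemented |
| PyTorch | .pt, .pth, .bin | PyTorch | P0 | 🔲 Planned |
| GGUF | .gguf | llama.cpp | P1 | 🔲 Planned |
| GGML | .bin | Legacy llama.cpp | P2 | 🔲 Planned |
| ONNX | .onnx | ONNX Runtime | P1 | 🔲 Planned |
| TensorFlow | .pb, .h5, SavedModel | TensorFlow/Keras | P2 | 🔲 Planned |
| Core ML | .mlmodel, .mlpackage | Apple | P3 | 🔲 Future |
| TensorRT | .engine, .plan | NVIDIA | P3 | 🔲 Future |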

Critical Lesson Learned: A single incorrect tensor conversion (e.g., decoder.layer_norm.weight with mean=11 instead of ~1) can cause complete model failure while passing basic structural checks.


10.2 SafeTensors (HuggingFace)

Status: ✅ Primary implementation

File Structure:

model.safetensors
├── Header (8 bytes): JSON length (u64 LE)
├── JSON Metadata: tensor names, shapes, dtypes, offsets
└── Tensor Data: contiguous f32/f16/bf16 arrays
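
The layout above can be split with a few bounds checks before any JSON parsing happens. This is a minimal sketch using only the standard library; `split_safetensors` is a hypothetical helper name and the sketch stops short of decoding the JSON itself.

```rust
use std::convert::TryFrom;

/// Splits a SafeTensors buffer into (JSON header text, raw tensor data).
fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("file shorter than the 8-byte length prefix".to_string());
    }
    // First 8 bytes: JSON length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let json_len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| "header length does not fit in usize".to_string())?;
    // Reject lengths that overflow or point past the end of the file.
    let json_end = 8usize
        .checked_add(json_len)
        .ok_or("header length overflows")?;
    if json_end > bytes.len() {
        return Err("header length exceeds file size".to_string());
    }
    let json = std::str::from_utf8(&bytes[8..json_end]).map_err(|e| e.to_string())?;
    Ok((json, &bytes[json_end..]))
}
```

Checking the declared length against the file size first avoids allocating or slicing based on an untrusted value.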

CLI Usage:

apr convert model.safetensors -o model.apr
apr convert model.safetensors --quantize int8 -o model-int8.apr

# From HuggingFace Hub
apr convert hf://openai/whisper-tiny -o whisper-tiny.apr

Data Types:

| SafeTensors Type | APR Conversion |
|------------------|----------------|
| F32 | Direct copy |
| F16 | Convert to f32 or keep as f16 |
| BF16 | Convert to f32 |
| I8 | Keep as int8 (quantized) |
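
The BF16 row is the simplest conversion to illustrate: a bfloat16 value is the upper 16 bits of an f32, so widening is a shift. The helper names below are hypothetical, and the sketch assumes little-endian storage, as SafeTensors uses.

```rust
/// Widens one bfloat16 bit pattern to f32 by placing it in the upper 16 bits.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Converts a little-endian BF16 byte buffer to f32 values.
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
fn bf16_le_bytes_to_f32(raw: &[u8]) -> Vec<f32> {
    raw.chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

The conversion is exact in this direction, since every BF16 value is representable as an f32.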

Crate: safetensors = "0.4"


10.3 PyTorch (.pt, .pth, .bin)

Status: 🔲 Planned (P0)

File Structure:

model.pt (ZIP archive)
├── data.pkl          # Python pickle with tensor metadata
├── data/0            # Raw tensor bytes
├── data/1
└── ...

Security Warning: PyTorch files use Python pickle, which can execute arbitrary code. APR conversion MUST:

  1. Use pickle in restricted mode (no arbitrary imports)
  2. Validate tensor shapes before allocation
  3. Reject files with suspicious pickle opcodes

CLI Usage:

apr convert model.pt -o model.apr --arch whisper
apr convert model.pth -o model.apr --arch llama

# With state_dict key prefix
apr convert model.pt -o model.apr --prefix "model."

Implementation Notes:

  • Use zip crate for archive extraction
  • Implement minimal pickle parser (BINGET, MARK, TUPLE, etc.)
  • Map torch.float32 → f32, torch.float16 → f16
  • Handle both full checkpoints and state_dict-only files

Crate: Custom pickle parser (no Python dependency)


10.4 GGUF (llama.cpp)

Status: 🔲 Planned (P1)

File Structure:

model.gguf
├── Magic (4 bytes): "GGUF"
├── Version (4 bytes): u32
├── Tensor Count (8 bytes): u64
├── Metadata KV Count (8 bytes): u64
├── Metadata KV Pairs: typed key-value store
├── Tensor Infos: name, dims, type, offset
└── Tensor Data: aligned, possibly quantized
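
The fixed-size prefix of this layout is 24 bytes and can be read without touching the variable-length sections. The sketch below assumes little-endian encoding; `GgufHeader` and `parse_gguf_header` are hypothetical names for illustration, not the planned parser.

```rust
use std::convert::TryInto;

/// The fixed-size fields at the start of a GGUF file.
#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Parses magic, version, tensor count, and metadata KV count (assumes little-endian).
fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("file too short for a GGUF header".to_string());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic: expected \"GGUF\"".to_string());
    }
    // The slice lengths are fixed above, so these conversions cannot fail.
    Ok(GgufHeader {
        version: u32::from_le_bytes(bytes[4..8].try_into().unwrap()),
        tensor_count: u64::from_le_bytes(bytes[8..16].try_into().unwrap()),
        metadata_kv_count: u64::from_le_bytes(bytes[16..24].try_into().unwrap()),
    })
}
```

A real converter would also bound `tensor_count` and `metadata_kv_count` against the file size before iterating over them.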

CLI Usage:

apr convert model.gguf -o model.apr
apr convert model-q4_k_m.gguf -o model.apr --dequantize f32
apr convert model.gguf -o model.apr --keep-quantization

Quantization Types:

| GGUF Type | Bits | APR Handling |
|-----------|------|--------------|
| F32 | 32 | Direct copy |
| F16 | 16 | Convert or keep |
| Q8_0 | 8 | Dequantize or convert to APR int8 |
| Q4_0 | 4 | Dequantize to f32 |
| Q4_K_M | 4.5 | Dequantize to f32 |
| Q5_K_M | 5.5 | Dequantize to f32 |
| Q6_K | 6 | Dequantize to f32 |

Metadata Mapping:

| GGUF Key | APR Metadata |
|----------|--------------|
| general.architecture | model_type |
| general.name | model_name |
| llama.context_length | context_length |
| llama.embedding_length | hidden_size |
| tokenizer.ggml.tokens | Vocabulary |

Crate: Custom GGUF parser


10.5 GGML (Legacy)

Status: 🔲 Planned (P2)

File Structure:

model.bin
├── Magic (4 bytes): "lmgg" or "tjgg"
├── Hyperparameters: model-specific struct
├── Vocabulary: token strings
└── Tensors: name + dims + data (unaligned)

CLI Usage:

apr convert model.bin -o model.apr --format ggml --arch llama

Notes:

  • Legacy format, prefer GGUF for new conversions
  • No standardized metadata format
  • Architecture must be specified manually

10.6 ONNX

Status: 🔲 Planned (P1)

File Structure:

model.onnx (Protobuf)
├── ModelProto
│   ├── graph: GraphProto
│   │   ├── node[]: operators
│   │   ├── input[]: model inputs
│   │   ├── output[]: model outputs
│   │   └── initializer[]: weight tensors
│   └── metadata_props: key-value pairs

CLI Usage:

apr convert model.onnx -o model.apr
apr convert model.onnx -o model.apr --opset 17

Data Types:

| ONNX Type | APR Conversion |
|-----------|----------------|
| FLOAT | f32 |
| FLOAT16 | f16 |
| BFLOAT16 | f32 (convert) |
| INT8 | int8 |
| UINT8 | int8 (reinterpret) |

Crate: onnx-pb = "0.1" or custom protobuf parser


10.7 TensorFlow/Keras

Status: 🔲 Planned (P2)

Supported Formats:

| Format | Description | CLI Flag |
|--------|-------------|----------|
| SavedModel | Directory with saved_model.pb | --format savedmodel |
| HDF5 | Keras .h5 files | --format h5 |
| Frozen Graph | Single .pb file | --format frozen |
| TFLite | .tflite mobile format | --format tflite |

CLI Usage:

apr convert saved_model/ -o model.apr --format savedmodel
apr convert model.h5 -o model.apr --format h5
apr convert model.tflite -o model.apr --format tflite

Notes:

  • HDF5 requires hdf5 crate
  • SavedModel requires protobuf parsing
  • TFLite uses FlatBuffers

10.8 Tensor Name Mapping

Each source format uses different naming conventions. APR standardizes to a canonical form:

Whisper Model Mapping

| Source Format | Source Name | APR Name |
|---------------|-------------|----------|
| SafeTensors | model.encoder.conv1.weight | encoder.conv1.weight |
| SafeTensors | model.encoder.embed_positions.weight | encoder.positional_embedding |
| SafeTensors | model.decoder.embed_tokens.weight | decoder.token_embedding |
| PyTorch | encoder.conv1.weight | encoder.conv1.weight |
| GGUF | encoder.conv1.weight | encoder.conv1.weight |
| ONNX | /encoder/conv1/weight | encoder.conv1.weight |

LLaMA Model Mapping

| Source Format | Source Name | APR Name |
|---------------|-------------|----------|
| SafeTensors | model.embed_tokens.weight | token_embedding |
| SafeTensors | model.layers.0.self_attn.q_proj.weight | layers.0.attn.q_proj.weight |
| GGUF | token_embd.weight | token_embedding |
| GGUF | blk.0.attn_q.weight | layers.0.attn.q_proj.weight |

Full HuggingFace Whisper Mapping

| HuggingFace Name | APR Name |
|------------------|----------|
| model.encoder.conv1.weight | encoder.conv1.weight |
| model.encoder.conv1.bias | encoder.conv1.bias |
| model.encoder.conv2.weight | encoder.conv2.weight |
| model.encoder.conv2.bias | encoder.conv2.bias |
| model.encoder.embed_positions.weight | encoder.positional_embedding |
| model.encoder.layer_norm.weight | encoder.layer_norm.weight |
| model.encoder.layer_norm.bias | encoder.layer_norm.bias |
| model.encoder.layers.N.self_attn_layer_norm.weight | encoder.layers.N.self_attn_layer_norm.weight |
| model.encoder.layers.N.self_attn.q_proj.weight | encoder.layers.N.self_attn.q_proj.weight |
| model.decoder.embed_tokens.weight | decoder.token_embedding |
| model.decoder.embed_positions.weight | decoder.positional_embedding |
| model.decoder.layer_norm.weight | decoder.layer_norm.weight |
| model.decoder.layer_norm.bias | decoder.layer_norm.bias |
| model.decoder.layers.N.self_attn_layer_norm.weight | decoder.layers.N.self_attn_layer_norm.weight |
| model.decoder.layers.N.encoder_attn_layer_norm.weight | decoder.layers.N.encoder_attn_layer_norm.weight |
| model.decoder.layers.N.final_layer_norm.weight | decoder.layers.N.final_layer_norm.weight |
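
Every row in the HuggingFace Whisper table follows one of two rules: strip the leading "model." prefix, and rename the two embedding tensors. A minimal sketch of that rule set follows; `map_whisper_name` is a hypothetical function, and treating names without the "model." prefix as unmapped is an assumption of this sketch.

```rust
/// Maps a HuggingFace Whisper tensor name to its APR canonical name.
/// Returns None when the name lacks the "model." prefix (assumed to mean "unknown").
fn map_whisper_name(hf_name: &str) -> Option<String> {
    let rest = hf_name.strip_prefix("model.")?;
    let mapped = if let Some(scope) = rest.strip_suffix(".embed_tokens.weight") {
        // e.g. decoder.embed_tokens.weight becomes decoder.token_embedding
        format!("{}.token_embedding", scope)
    } else if let Some(scope) = rest.strip_suffix(".embed_positions.weight") {
        // e.g. encoder.embed_positions.weight becomes encoder.positional_embedding
        format!("{}.positional_embedding", scope)
    } else {
        // All other rows in the table only drop the prefix.
        rest.to_string()
    };
    Some(mapped)
}
```

Returning `Option<String>` matches the shape of a name-mapping hook that can reject unknown tensors instead of passing them through silently.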

10.9 Expected Tensor Statistics

Layer Norm Weights (gamma) - MUST have mean ≈ 1.0:

Tensor                                   Expected Mean   Acceptable Range
encoder.layer_norm.weight                1.0 - 2.0       [0.5, 3.0]
decoder.layer_norm.weight                1.0 - 2.0       [0.5, 3.0]
*.self_attn_layer_norm.weight            1.0 - 2.0       [0.5, 3.0]
*.encoder_attn_layer_norm.weight         1.0 - 2.0       [0.5, 3.0]
*.final_layer_norm.weight                1.0 - 2.0       [0.5, 3.0]

Layer Norm Bias (beta) - MUST have mean ≈ 0.0:

Tensor                                   Expected Mean   Acceptable Range
*.layer_norm.bias                        0.0             [-0.5, 0.5]

Attention/Linear Weights - Should have mean ≈ 0.0:

Tensor                                   Expected Mean   Expected Std
*.q_proj.weight                          ~0.0            0.02 - 0.10
*.k_proj.weight                          ~0.0            0.02 - 0.10
*.v_proj.weight                          ~0.0            0.02 - 0.10
*.out_proj.weight                        ~0.0            0.02 - 0.10
*.fc1.weight                             ~0.0            0.02 - 0.05
*.fc2.weight                             ~0.0            0.02 - 0.05

Embeddings:

Tensor                                   Expected Mean   Expected Std
token_embedding                          ~0.0            0.02 - 0.05
positional_embedding                     ~0.0            0.01 - 0.02
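
The statistics in these tables are plain mean and standard deviation over the tensor values. The sketch below shows one way to compute them and apply the LayerNorm weight range; `Stats`, `compute_stats`, and `layer_norm_weight_ok` are hypothetical names, and the use of population (not sample) standard deviation with f64 accumulation is an assumption.

```rust
/// Mean and population standard deviation of a tensor.
#[derive(Debug)]
struct Stats {
    mean: f64,
    std: f64,
}

/// Computes statistics in f64 to limit accumulation error; None for an empty tensor.
fn compute_stats(data: &[f32]) -> Option<Stats> {
    if data.is_empty() {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().map(|&x| x as f64).sum::<f64>() / n;
    let var = data
        .iter()
        .map(|&x| {
            let d = x as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    Some(Stats { mean, std: var.sqrt() })
}

/// Applies the acceptable range [0.5, 3.0] for LayerNorm weights.
fn layer_norm_weight_ok(data: &[f32]) -> bool {
    matches!(compute_stats(data), Some(s) if (0.5..=3.0).contains(&s.mean))
}
```

A tensor of 384 values near 11.1, like the decoder.layer_norm.weight case described in Section 10.1, fails this check, while one near 1.0 passes.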

10.10 Conversion Validation Requirements

  1. Shape Validation: Every tensor must match expected shape for model architecture
  2. Value Validation: Every tensor must have statistics within expected ranges
  3. Reference Comparison: Converted model must produce outputs within tolerance of HF reference
  4. Inline Validation (Strict Mode): The apr convert tool MUST run the statistical checks (Section 10.9) as tensors are being written.
    • Default Behavior: If a tensor violates the "Acceptable Range" (e.g., LayerNorm mean > 3.0), the conversion aborts with an error.
    • Override: Use --force or --relaxed to bypass this check.
    • Justification: Better to fail early than produce a "zombie" model.

10.11 Known Failure Modes

| Failure | Symptom | Root Cause | Troubleshooting |
|---------|---------|------------|-----------------|
| LN weight mean=11 | Repetitive token output (e.g., "...") | Incorrect tensor scaling or name mapping | Use apr tensors --hist to visualize distribution |
| Missing conv bias | Zero encoder output | Conv layer not loaded | Check --analyze-source |
| Transposed weights | Garbage output | Row-major vs column-major confusion | Run apr diff vs reference |
| Truncated tensors | Partial outputs | Size mismatch during copy | Verify header vs file size |

11. Master Falsification QA Checklist (100 Points)

This checklist unifies structural, physical, operational, and conversion requirements into a single 100-point quality gate. Every point must be testable and falsifiable.

A. Format & Structural Integrity (25 Points)

| # | Claim | Test Command | Falsification (How to Fail) |
|---|-------|--------------|-----------------------------|
| 1 | Magic bytes valid | head -c4 m.apr \| grep APR2 | Edit file to start with "APR1" or random bytes |
| 2 | Header size fixed | apr inspect m.apr --header | Insert 1 byte before data offset |
| 3 | Version supported | Load v2.0 file | Load v3.0 file (should fail E003) |
| 4 | Checksum valid | apr validate m.apr --checksum | Flip 1 bit in payload (should fail E004) |
| 5 | JSON Metadata | apr inspect m.apr --json | Corrupt JSON syntax in editor |
| 6 | Tensor Alignment | apr lint m.apr checks 64B | Create file with 1-byte alignment (should warn) |
| 7 | Index Sorted | Validate index sort order | Swap two entries in binary index |
| 8 | Compression | apr info shows lz4 | Compress with unsupported algo (should fail) |
| 9 | Sharding Manifest | Load sharded model | Delete one shard file (should fail E007) |
| 10 | Endianness | Read on Big Endian system | (Simulate BE) Read LE floats incorrectly |
| 11 | Flags Parsed | Check specific flag bits | Set undefined flag bit (should warn/ignore) |
| 12 | Footer Magic | Check 2RPA at EOF | Truncate last 16 bytes (should fail) |
| 13 | File Size | Header size == ls -l | Append garbage to EOF (should warn) |
| 14 | Tensor Offsets | Read last tensor | Set offset beyond EOF (should fail E002) |
| 15 | Empty Model | Load model with 0 tensors | Create valid header, 0 tensors (should pass) |
| 16 | Huge Header | Metadata > 100MB | Create 200MB JSON header (should stream/fail gracefully) |
| 17 | UTF-8 Names | Tensor names are UTF-8 | Insert invalid UTF-8 in name (should fail) |
| 18 | Duplicate Names | Index has unique names | Duplicate "tensor.a" in index (should fail) |
| 19 | Dimension Limit | Support 8 dims | Create 9-dim tensor (should fail) |
| 20 | Zero Dims | Support scalar (0-dim) | Create 0-dim tensor (should pass) |
| 21 | Datatypes | Support all DType enums | Use invalid enum id 255 (should fail) |
| 22 | Padding Bytes | Padding is zeroed | Fill padding with 0xFF (should warn in lint) |
| 23 | Signature | Verify Ed25519 (if signed) | Modify 1 byte of signature (should fail E006) |
| 24 | Encryption | Decrypt AES-256-GCM | Provide wrong key (should fail E005) |
| 25 | WASM Load | Load in wasm32 env | Run in browser (must work) |
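
Item 4 relies on the checksum changing whenever a payload bit is flipped. The sketch below shows that property with a bitwise CRC32. The specification does not state which CRC32 variant APR uses, so the IEEE 802.3 polynomial (reflected form 0xEDB88320) here is an assumption, and `crc32` is a hypothetical helper, not the library's function.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected). Assumed variant; slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Returns a copy of `data` with a single bit flipped, as in the falsification step.
fn flip_bit(data: &[u8], byte_index: usize, bit: u8) -> Vec<u8> {
    let mut out = data.to_vec();
    out[byte_index] ^= 1 << (bit % 8);
    out
}
```

A production reader would normally use a table-driven or hardware-accelerated CRC; the bitwise form is shown only because it is short enough to audit by eye.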

B. Tensor Physics & Statistics (25 Points)

| # | Claim | Test Command | Falsification (How to Fail) |
|---|-------|--------------|-----------------------------|
| 26 | No NaNs | apr validate --nan-check | Manually inject 0x7FC00000 (NaN) into f32 tensor |
| 27 | No Infs | apr validate --nan-check | Inject 0x7F800000 (+Inf) |
| 28 | LayerNorm Mean | apr tensors --stats in [0.5, 3] | Set LN weights to 11.0 (should fail/warn) |
| 29 | LayerNorm Bias | apr tensors --stats in [-0.5, 0.5] | Set LN bias to 5.0 (should fail/warn) |
| 30 | Embedding Std | apr tensors --stats < 0.2 | Set embedding std to 1.0 (should warn) |
| 31 | Zero Tensors | apr validate --zero-check | Set entire tensor to 0.0 (should warn) |
| 32 | Shape Match | apr validate --shapes | Resize tensor [384]->[383] (should fail) |
| 33 | Vocab Match | Metadata n_vocab == tensor dim | Change metadata n_vocab to mismatch (should fail) |
| 34 | Quantization Range | q8_0 values in [-127, 127] | Manually set byte -128 (if using symm quant) |
| 35 | Attn/Linear Mean | Mean approx 0.0 | Set Linear weight mean to 1.0 (should warn) |
| 36 | Softmax Valid | (If traceable) Output sums to 1.0 | (Hard to fuzz statically, use trace) |
| 37 | Mel Filters | Values >= 0.0 | Set negative filter bank value (should warn) |
| 38 | Pos Embeddings | Correct shape for ctx len | Truncate pos embedding (should fail shape) |
| 39 | Token IDs | (Trace) Output tokens < vocab | (Trace) Force output token > vocab_max |
| 40 | Audio Range | (Trace) Input in [-1, 1] | Feed audio with amp 10.0 (trace should warn) |
| 41 | FP16 Range | Values within FP16 limits | value > 65504 in FP16 tensor (should become Inf) |
| 42 | Sparsity | (If sparse) Check non-zero % | Claim sparse but 100% dense (lint warning) |
| 43 | Dead Neurons | (Trace) Activations never > 0 | (Trace) Detect 0-activation neuron across 100 inputs |
| 44 | Exploding Grads | (Trace) Values > 1e6 | (Trace) Detect activation spike |
| 45 | Repeat Tokens | (Trace) Repetition > 5x | (Trace) Feed silence, check for hallucination |
| 46 | Silence Input | (Trace) Output is empty/silence | Feed silence, check non-empty output |
| 47 | White Noise | (Trace) Output is garbage | Feed noise, check for confident output (bad) |
| 48 | Mel Shape | Filterbank matches audio/mels | Mismatch n_mels 80 vs 128 (should fail) |
| 49 | Text Context | Pos embed covers text ctx | Input text > max context (should truncate/fail) |
| 50 | L2 Distance | apr diff vs ref < 1.0 | Compare against random tensor (should fail L2) |
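
Items 26 and 27 inject specific bit patterns into an f32 tensor. A check that separates the two cases might look like the following sketch; `count_non_finite` is a hypothetical helper, not the validator's actual interface.

```rust
/// Counts NaN and infinite values separately: returns (nan_count, inf_count).
fn count_non_finite(data: &[f32]) -> (usize, usize) {
    data.iter().fold((0, 0), |(nan, inf), x| {
        if x.is_nan() {
            (nan + 1, inf)
        } else if x.is_infinite() {
            (nan, inf + 1)
        } else {
            (nan, inf)
        }
    })
}
```

The falsification patterns from the table can be reproduced with `f32::from_bits(0x7FC0_0000)` for a quiet NaN and `f32::from_bits(0x7F80_0000)` for positive infinity.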

C. Tooling & Operations (25 Points)

| # | Claim | Test Command | Falsification (How to Fail) |
|---|-------|--------------|-----------------------------|
| 51 | Inspect Speed | inspect < 100ms | (Perf) Load 100GB model (should be fast) |
| 52 | Lint Defaults | apr lint runs default checks | Create file with no license (must warn) |
| 53 | Drama Mode | apr debug --drama | Run on CI (no tty) - should output text |
| 54 | TUI Graph | apr tui renders graph | Create cyclic graph (should handle/error) |
| 55 | TUI Stats | apr tui stats match CLI | (Manual) Compare TUI number vs CLI number |
| 56 | Diff Identity | apr diff a.apr a.apr | Diff same file (must show 100% match) |
| 57 | Diff Detection | apr diff a.apr b.apr | Diff modified file (must show mismatch) |
| 58 | Merge Average | apr merge averages weights | Merge [1.0] and [3.0] -> expect [2.0] |
| 59 | Merge TIES | apr merge --strategy ties | (Complex) Verify TIES masking logic |
| 60 | Export ONNX | apr export --format onnx | Validate output with onnx.checker |
| 61 | Export GGUF | apr export --format gguf | Load output in llama.cpp |
| 62 | Convert Quant | apr convert --quantize int8 | Check output size < 25% of input |
| 63 | Convert Prune | apr convert --prune 0.5 | Check non-zero count is 50% |
| 64 | Trace Output | apr trace produces JSON | Corrupt input audio (should err/warn) |
| 65 | Explain Error | apr explain E001 | Ask for E999 (should say unknown) |
| 66 | Explain Tensor | apr explain --tensor | Ask for random name (should fuzzy match) |
| 67 | Analyze Source | convert --analyze-source | Run on corrupt safetensors (must fail) |
| 68 | Inline Valid | convert fails on bad stat | Force bad mean in source, run convert (must abort) |
| 69 | Force Override | convert --force | Same as 68, but use --force (must pass) |
| 70 | Cache Dir | Uses APR_CACHE | Set APR_CACHE=/tmp/x (check files there) |
| 71 | Config Load | Uses config.toml | Set output_format=json in config (check output) |
| 72 | Canary Check | apr canary check | Modify weights to cause regression (should fail canary) |
| 73 | JSON Output | apr inspect --json | Pipe to jq (must parse) |
| 74 | Trace Payload | apr trace --payload | Corrupt tensor, check for anomaly in trace output |
| 75 | Trace Diff | apr trace --diff | Diff identical models (should show 0 drift) |
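
Item 58 expects an element-wise average of two tensors. The sketch below shows that expectation in isolation; `merge_average` is a hypothetical function, and rejecting mismatched lengths is an assumption about how a merge should behave.

```rust
/// Element-wise average of two equally sized tensors.
fn merge_average(a: &[f32], b: &[f32]) -> Result<Vec<f32>, String> {
    if a.len() != b.len() {
        return Err(format!("length mismatch: {} vs {}", a.len(), b.len()));
    }
    Ok(a.iter().zip(b).map(|(x, y)| 0.5 * (x + y)).collect())
}
```

Merging [1.0] and [3.0] yields [2.0], which is the exact case the checklist uses as its falsification test.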

D. Conversion & Interoperability (25 Points)

| # | Claim | Test Command | Falsification (How to Fail) |
|---|-------|--------------|-----------------------------|
| 76 | SafeTensors | Import .safetensors | Import renamed .txt file (should fail) |
| 77 | PyTorch | Import .pt (pickle) | Import malicious pickle (should fail/block) |
| 78 | GGUF Import | Import .gguf | Import GGUF with unknown arch (should fail) |
| 79 | Roundtrip | APR->ONNX->APR | Compare tensor values (drift < 1e-5) |
| 80 | HF Mapping | Maps model.layers.0 correctly | Rename layer in source (should fail map) |
| 81 | Q-DeepCopy | Preserves quantization | Convert q8->apr (should stay q8 if supported) |
| 82 | F32->BF16 | convert --precision bf16 | Check dtype is BF16 |
| 83 | BF16->F32 | convert --precision f32 | Check dtype is F32 |
| 84 | Vocab Import | Imports full vocab | Truncate vocab in source (check count) |
| 85 | Special Tokens | Preserves BOS/EOS/UNK | Check metadata for token IDs |
| 86 | Metadata Copy | Copies model card/license | Remove metadata from source (check warnings) |
| 87 | Tensor Name Norm | Normalizes to encoder.x | Check for "model.encoder.x" (bad) |
| 88 | Permutation | Transposes weights if needed | Disable transpose (check output garbage) |
| 89 | Scale Factors | Applies rescaling (e.g. div 2) | Disable scaling (check mean drift) |
| 90 | Sharded Import | Imports model-0001... | Missing shard 2 (should fail) |
| 91 | Remote Import | apr import hf://... | Network down (should fail gracefully) |
| 92 | Cache Hit | Second import is fast | Clear cache, time it; run again, time it |
| 93 | Checksum Verify | Verify source SHA256 | Modify source file (should fail checksum) |
| 94 | License Warning | Warns on non-commercial | Import CC-BY-NC model (check warning) |
| 95 | Arch Detect | Auto-detects Whisper/LLaMA | Import unknown arch (should ask user) |
| 96 | Output Path | Honors --output | Check file exists at path |
| 97 | Overwrite | Fails if exists (no -f) | Create file, run export (should fail) |
| 98 | Disk Full | Handle ENOSPC | Simulate small disk (should fail clean) |
| 99 | Memory Limit | Respect APR_RAM_LIMIT | Set low limit, load big model (should error/mmap) |
| 100 | Golden Trace | Passes canonical trace | Run against golden_traces/ (must pass) |
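
Item 79 bounds roundtrip drift at 1e-5. One plausible reading is a maximum absolute element-wise difference, which the sketch below implements; `max_abs_drift` and `roundtrip_ok` are hypothetical names, and the choice of max-abs rather than another norm is an assumption.

```rust
/// Largest absolute element-wise difference, or None if the lengths differ.
/// A NaN difference is reported as NaN so that it can never pass a threshold check.
fn max_abs_drift(original: &[f32], roundtrip: &[f32]) -> Option<f32> {
    if original.len() != roundtrip.len() {
        return None;
    }
    let mut worst = 0.0f32;
    for (a, b) in original.iter().zip(roundtrip) {
        let d = (a - b).abs();
        if d.is_nan() {
            return Some(f32::NAN);
        }
        worst = worst.max(d);
    }
    Some(worst)
}

/// True when every element survived the roundtrip within the 1e-5 tolerance.
fn roundtrip_ok(original: &[f32], roundtrip: &[f32]) -> bool {
    matches!(max_abs_drift(original, roundtrip), Some(d) if d < 1e-5)
}
```

Handling NaN explicitly matters because `f32::max` ignores NaN operands, which would otherwise let a corrupted tensor report zero drift.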

12. Automated Validation Script

The apr-qa tool runs this 100-point checklist automatically.

# Run the full suite
apr-qa verify model.apr --score

# Run specific category
apr-qa verify model.apr --category physics

# CI/CD usage (fail if score < 95)
apr-qa verify model.apr --min-score 95

13. Import/Convert Pipeline

The complete pipeline for downloading, converting, validating, and optimizing models.

13.1 Pipeline Overview

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Source    │───▶│   Import    │───▶│  Validate   │───▶│   Output    │
│ (HF/Local)  │    │ (Converter) │    │ (100-Point) │    │   (.apr)    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
      │                  │                  │                  │
      ▼                  ▼                  ▼                  ▼
  hf://openai/     SafeTensors→APR    Inline checks      Quantized/
  whisper-tiny     Name mapping       Tensor stats       Compressed

13.2 CLI Interface

# Full pipeline: download → convert → validate
apr import hf://openai/whisper-tiny -o whisper.apr

# With quantization
apr import hf://openai/whisper-tiny -o whisper-int8.apr --quantize int8

# Local file conversion
apr import model.safetensors -o model.apr

# Validate after import (automatic, but can run standalone)
apr validate whisper.apr --quality --min-score 95

# Post-import optimization
apr convert whisper.apr --quantize int8 --compress lz4 -o whisper-optimized.apr

13.3 SDK Interface

// Assumes Quantization, Compression, and apr_import are exported from the
// same module as AprConverter.
use aprender::format::{
    apr_import, AprConverter, Compression, ImportOptions, Quantization, ValidationConfig,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Full pipeline with builder pattern
    let apr_bytes = AprConverter::new()
        .source("hf://openai/whisper-tiny")
        .architecture("whisper")
        .validate(ValidationConfig::strict()) // Inline validation
        .quantize(Quantization::Int8)
        .compress(Compression::Lz4)
        .convert()?;

    // Save to file
    std::fs::write("whisper.apr", apr_bytes)?;

    // Or use the high-level API
    apr_import("hf://openai/whisper-tiny", "whisper.apr", ImportOptions::default())?;

    Ok(())
}

13.4 Source Types

| Source | Format | Example |
|--------|--------|---------|
| HuggingFace Hub | hf://org/repo | hf://openai/whisper-tiny |
| HuggingFace File | hf://org/repo/file | hf://openai/whisper-tiny/model.safetensors |
| Local SafeTensors | Path | ./model.safetensors |
| Local PyTorch | Path | ./model.pt |
| Local GGUF | Path | ./model.gguf |
| URL | https:// | https://example.com/model.safetensors |
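
The source forms above can be told apart by prefix alone. The following sketch classifies a source string accordingly; the `Source` enum and `parse_source` are hypothetical names for illustration, and treating anything without a recognized scheme as a local path is an assumption.

```rust
/// The kinds of import source listed in the table above.
#[derive(Debug, PartialEq)]
enum Source {
    HfRepo { org: String, repo: String },
    HfFile { org: String, repo: String, file: String },
    Url(String),
    Local(String),
}

/// Classifies a source string by its scheme; anything else is treated as a local path.
fn parse_source(s: &str) -> Result<Source, String> {
    if let Some(rest) = s.strip_prefix("hf://") {
        // Split into at most org / repo / file; the file part may contain slashes.
        let mut parts = rest.splitn(3, '/');
        let org = parts.next().filter(|p| !p.is_empty());
        let repo = parts.next().filter(|p| !p.is_empty());
        let file = parts.next().filter(|p| !p.is_empty());
        return match (org, repo, file) {
            (Some(o), Some(r), None) => Ok(Source::HfRepo {
                org: o.to_string(),
                repo: r.to_string(),
            }),
            (Some(o), Some(r), Some(f)) => Ok(Source::HfFile {
                org: o.to_string(),
                repo: r.to_string(),
                file: f.to_string(),
            }),
            _ => Err(format!("expected hf://org/repo[/file], got '{}'", s)),
        };
    }
    if s.starts_with("https://") {
        return Ok(Source::Url(s.to_string()));
    }
    Ok(Source::Local(s.to_string()))
}
```

Format detection for local paths (SafeTensors, PyTorch, GGUF) would happen in a later step, for example by extension or magic bytes.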

13.5 Tensor Name Mapping

During import, tensor names are normalized from source format to APR canonical form:

/// Tensor name mapper trait
pub trait TensorNameMapper {
    /// Map source tensor name to APR name
    fn map_name(&self, source_name: &str) -> Option<String>;

    /// Get expected tensor statistics for validation
    fn expected_stats(&self, apr_name: &str) -> Option<TensorExpectation>;
}

/// Built-in mappers
pub enum Architecture {
    Auto,     // Auto-detect from the source model (the ImportOptions default)
    Whisper,  // HuggingFace Whisper → APR Whisper
    Llama,    // HuggingFace LLaMA → APR LLaMA
    Bert,     // HuggingFace BERT → APR BERT
    Custom(Box<dyn TensorNameMapper>),
}

Whisper Mapping Example:

HuggingFace                           → APR
model.encoder.conv1.weight            → encoder.conv1.weight
model.decoder.layer_norm.weight       → decoder.layer_norm.weight
model.decoder.layers.0.self_attn...   → decoder.layers.0.self_attn...

13.6 Inline Validation

Critical: Validation runs DURING conversion, not after. If a tensor fails validation, conversion aborts immediately.

/// Validation that runs inline during conversion
pub struct InlineValidator {
    config: ValidationConfig,
    report: ValidationReport,
}

impl InlineValidator {
    /// Called for each tensor during conversion
    pub fn validate_tensor(&mut self, name: &str, data: &[f32]) -> Result<(), ValidationError> {
        let stats = TensorStats::compute(name, data);

        // Check for NaN (an Inf check would follow the same pattern)
        if stats.nan_count > 0 {
            return Err(ValidationError::NanDetected { name: name.to_string(), count: stats.nan_count });
        }

        // Check LayerNorm weights (mean should be ~1.0)
        if name.contains("layer_norm") && name.ends_with(".weight") {
            if stats.mean < 0.5 || stats.mean > 3.0 {
                return Err(ValidationError::LayerNormMean {
                    name: name.to_string(),
                    mean: stats.mean,
                    expected: (0.5, 3.0),
                });
            }
        }

        Ok(())
    }
}

13.7 Import Options

/// Options for the import pipeline
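// Debug and Clone are not derived: Architecture::Custom holds a
// Box<dyn TensorNameMapper>, which implements neither trait.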
pub struct ImportOptions {
    /// Target architecture for name mapping
    pub architecture: Architecture,

    /// Validation configuration
    pub validation: ValidationConfig,

    /// Quantization (None = keep original precision)
    pub quantize: Option<Quantization>,

    /// Compression algorithm
    pub compress: Option<Compression>,

    /// Force import even if validation fails
    pub force: bool,

    /// Cache downloaded files
    pub cache: bool,

    /// HuggingFace token (from env HF_TOKEN if None)
    pub hf_token: Option<String>,
}

impl Default for ImportOptions {
    fn default() -> Self {
        Self {
            architecture: Architecture::Auto,  // Auto-detect
            validation: ValidationConfig::strict(),
            quantize: None,
            compress: None,
            force: false,
            cache: true,
            hf_token: None,
        }
    }
}

13.8 Error Handling

Import errors are specific and actionable:

#[derive(Debug, thiserror::Error)]
pub enum ImportError {
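    // The field is named `uri` rather than `source`: thiserror treats a field
    // named `source` as the underlying error, which must implement std::error::Error.
    #[error("Download failed: {uri} - {reason}")]
    DownloadFailed { uri: String, reason: String },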

    #[error("Unsupported format: {extension}")]
    UnsupportedFormat { extension: String },

    #[error("Tensor validation failed: {name} - {reason}")]
    ValidationFailed { name: String, reason: String },

    #[error("Name mapping failed: unknown tensor '{source_name}'")]
    UnknownTensor { source_name: String },

    #[error("Architecture mismatch: expected {expected}, found {found}")]
    ArchitectureMismatch { expected: String, found: String },

    #[error("Missing required tensor: {name}")]
    MissingTensor { name: String },
}

13.9 Caching

Downloaded models are cached to avoid re-downloading:

~/.cache/apr/
├── hf/
│   └── openai/
│       └── whisper-tiny/
│           ├── model.safetensors
│           └── config.json
└── checksum.json

# Clear cache
apr cache clear

# Show cache usage
apr cache info

# Pre-download without converting
apr download hf://openai/whisper-tiny
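
The cache layout above maps a HuggingFace repository to a directory under the cache root. The sketch below resolves that path; `cache_root` and `hf_cache_dir` are hypothetical helpers, and passing the APR_CACHE value and home directory in as arguments (instead of reading the environment directly) is a choice made here to keep the example testable.

```rust
use std::path::{Path, PathBuf};

/// Picks the cache root: the APR_CACHE override if set and non-empty,
/// otherwise ~/.cache/apr under the given home directory.
fn cache_root(apr_cache: Option<&str>, home: &Path) -> PathBuf {
    match apr_cache {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => home.join(".cache").join("apr"),
    }
}

/// Directory holding the downloaded files for one HuggingFace repository.
fn hf_cache_dir(root: &Path, org: &str, repo: &str) -> PathBuf {
    root.join("hf").join(org).join(repo)
}
```

A caller would typically supply `std::env::var("APR_CACHE").ok().as_deref()` as the first argument. A real implementation should also reject `org` or `repo` values containing path separators or `..` before joining them.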

13.10 Testing Requirements

Every import path must have:

  1. Unit Test: Test name mapping and validation logic
  2. Integration Test: Download real model, convert, validate
  3. Golden Test: Compare output against known-good .apr file
  4. Regression Test: Ensure tensor statistics match expected values

#[test]
fn test_whisper_tiny_import() {
    let result = apr_import(
        "hf://openai/whisper-tiny",
        "/tmp/test.apr",
        ImportOptions::default(),
    );

    assert!(result.is_ok());

    // Read the output once; the validator and the reader both borrow these bytes
    let bytes = std::fs::read("/tmp/test.apr").unwrap();

    // Validate the output
    let validator = AprValidator::new();
    let report = validator.validate(&bytes);

    assert!(report.passed(95), "Score: {}/100", report.total_score);

    // Check specific tensor that was previously buggy
    let reader = AprReader::new(&bytes).unwrap();
    let ln_weight = reader.load_tensor("decoder.layer_norm.weight").unwrap();
    let stats = TensorStats::compute("decoder.layer_norm.weight", &ln_weight);

    assert!((0.5..=3.0).contains(&stats.mean),
        "decoder.layer_norm.weight mean={} should be in [0.5, 3.0]", stats.mean);
}

14. Implementation Roadmap

Phase 1: Alignment (v2.0)

  • 64-byte tensor alignment
  • Binary tensor index
  • Backward-compatible reader
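
The 64-byte alignment in Phase 1 amounts to rounding each tensor offset up to the next multiple of 64 and zero-filling the gap. A minimal sketch of the arithmetic follows; `align_up` and `padding_for` are hypothetical helper names.

```rust
/// Tensor alignment in bytes, per the Phase 1 roadmap item.
const TENSOR_ALIGN: u64 = 64;

/// Rounds `offset` up to the next multiple of `align` (a power of two).
/// Returns None if the rounded value would overflow u64.
fn align_up(offset: u64, align: u64) -> Option<u64> {
    debug_assert!(align.is_power_of_two());
    let mask = align - 1;
    offset.checked_add(mask).map(|v| v & !mask)
}

/// Number of padding bytes needed before a tensor that would start at `offset`.
fn padding_for(offset: u64) -> Option<u64> {
    align_up(offset, TENSOR_ALIGN).map(|aligned| aligned - offset)
}
```

For instance, a tensor that would otherwise start at byte 100 is moved to byte 128, with 28 bytes of padding in between.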

Phase 2: Compression (v2.1)

  • LZ4 block compression
  • Per-tensor compression flag
  • Streaming decompression

Phase 3: Sharding (v2.2)

  • Manifest file format
  • Multi-file loader
  • Tensor-level demand loading

15. References

  1. Sculley, D., et al. (2015). "Hidden Technical Debt in Machine Learning Systems." NeurIPS 2015
  2. Amershi, S., et al. (2019). "Software Engineering for Machine Learning." ICSE 2019
  3. Vartak, M., et al. (2016). "ModelDB: A System for ML Model Management." SIGMOD 2016
  4. Baylor, D., et al. (2017). "TFX: A TensorFlow-Based Production-Scale ML Platform." KDD 2017
  5. Zaharia, M., et al. (2018). "Accelerating the ML Lifecycle with MLflow." IEEE Data Eng. Bull.

Code References:

  • APR v1: src/serialization/apr.rs
  • GGUF: src/format/gguf.rs
  • Bundle system: src/bundle/
  • SafeTensors: src/serialization/safetensors.rs

16. Appendices

A. Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | File not found |
| 4 | Format error |
| 5 | Validation failed |

B. Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| APR_CONFIG | Config file path | ~/.config/apr/config.toml |
| APR_CACHE | Cache directory | ~/.cache/apr |
| APR_LOG_LEVEL | Log level | info |
| APR_COLOR | Enable colors | auto |

Document generated following Toyota Way principles and PMAT quality standards.