Case Study: APR with JSON Metadata
This case study demonstrates embedding arbitrary JSON metadata (vocabulary, tokenizer config, model settings) alongside tensor data in a single .apr file for WASM-ready deployment.
The Problem
Modern ML models need more than just weights:
| Data Type | Traditional Approach | Problem |
|---|---|---|
| Vocabulary | Separate vocab.json | Multiple files to manage |
| Tokenizer | Separate tokenizer.json | Version mismatches |
| Config | Separate config.yaml | Deployment complexity |
| Custom | Application-specific files | N+1 file problem |
The Solution: Embedded Metadata
The .apr format supports arbitrary JSON metadata embedded directly in the model file:
use aprender::serialization::apr::{AprWriter, AprReader};
use serde_json::json;
let mut writer = AprWriter::new();
// Embed any JSON metadata
writer.set_metadata("model_type", json!("whisper-tiny"));
writer.set_metadata("n_vocab", json!(51865));
writer.set_metadata("tokenizer", json!({
"tokens": ["<|endoftext|>", "<|startoftranscript|>"],
"merges": [["t", "h"], ["th", "e"]],
"special_tokens": {"eot": 50256, "sot": 50257}
}));
// Add tensors
writer.add_tensor_f32("encoder.weight", vec![384, 80], &weights);
// Single file contains everything
writer.write("model.apr")?;
Complete Example
Run: cargo run --example apr_with_metadata
// Run this example:
// cargo run --example apr_with_metadata
//
// See the CLI reference and source code in crates/ for implementation details.
Key Features
1. Arbitrary JSON Metadata
Any JSON-serializable data can be embedded:
// Strings
writer.set_metadata("model_name", json!("my-model"));
// Numbers
writer.set_metadata("n_layers", json!(12));
// Arrays
writer.set_metadata("supported_languages", json!(["en", "es", "fr"]));
// Objects
writer.set_metadata("config", json!({
"hidden_size": 768,
"num_attention_heads": 12
}));
2. Type-Safe Tensor Storage
Tensors are stored with shape information:
writer.add_tensor_f32("layer.0.weight", vec![768, 768], &weights);
writer.add_tensor_f32("layer.0.bias", vec![768], &bias);
3. Single-File Deployment
Perfect for WASM:
// Embed at compile time
const MODEL: &[u8] = include_bytes!("model.apr");
fn inference(input: &[f32]) -> Vec<f32> {
let reader = AprReader::from_bytes(MODEL.to_vec()).unwrap();
// Access metadata
let vocab = reader.get_metadata("tokenizer").unwrap();
// Access tensors
let weights = reader.read_tensor_f32("encoder.weight").unwrap();
// ... inference logic
}
Use Cases
Speech Recognition (Whisper-style)
writer.set_metadata("tokenizer", json!({
"tokens": vocab_tokens,
"merges": bpe_merges,
"special_tokens": {
"eot": 50256,
"sot": 50257,
"transcribe": 50358,
"translate": 50359
}
}));
Language Models
writer.set_metadata("tokenizer", json!({
"type": "BPE",
"vocab_size": 32000,
"pad_token": "<pad>",
"eos_token": "</s>",
"bos_token": "<s>"
}));
Custom Models
writer.set_metadata("preprocessing", json!({
"mean": [0.485, 0.456, 0.406],
"std": [0.229, 0.224, 0.225],
"input_size": [224, 224]
}));
Audio Models (Mel Filterbank)
For speech recognition models like Whisper, embedding the exact mel filterbank used during training is critical for correct transcription. Computing filterbanks at runtime produces different values due to normalization differences.
// Store filterbank as a named tensor (most efficient for 64KB+ data)
writer.add_tensor_f32(
"audio.mel_filterbank",
vec![80, 201], // n_mels x n_freqs
&filterbank_data,
);
// Store audio preprocessing config in metadata
writer.set_metadata("audio", json!({
"sample_rate": 16000,
"n_fft": 400,
"hop_length": 160,
"n_mels": 80
}));
Reading back:
let reader = AprReader::from_bytes(model_bytes)?;
// Read filterbank tensor
let filterbank = reader.read_tensor_f32("audio.mel_filterbank")?;
// Get audio config
let audio_config = reader.get_metadata("audio").unwrap();
let n_mels = audio_config["n_mels"].as_u64().unwrap() as usize;
let n_freqs = filterbank.len() / n_mels;
// Use filterbank for mel spectrogram computation
let mel_spectrogram = compute_mel(&audio_samples, &filterbank, n_mels, n_freqs);
Why this matters: Whisper was trained with librosa's slaney-normalized filterbank where row sums are ~0.025. Computing from scratch produces peak-normalized filterbanks with row sums of ~1.0+. This mismatch causes the "rererer" hallucination bug.
Benefits
| Benefit | Description |
|---|---|
| Single file | No more managing multiple files |
| Version-locked | Metadata travels with weights |
| WASM-ready | Embed entire model in binary |
| Type-safe | CRC32 checksum for integrity |
| Flexible | Any JSON structure supported |
Binary Data: Metadata vs Tensor
When storing binary data (filterbanks, embeddings), choose the right approach:
| Data Size | JSON Metadata | Named Tensor |
|---|---|---|
| < 100KB | Preferred | Overkill |
| 100KB - 1MB | Acceptable | Recommended |
| > 1MB | Avoid (slow JSON parsing) | Required |
Mel filterbank (64KB): Both work; tensor is more efficient.
Vocabulary (1-5MB): Use JSON for string arrays, tensor for embedding matrices.
Large embeddings (>10MB): Always use tensors.