Case Study: Model Bundling and Memory Paging
Deploy large ML models on resource-constrained devices using aprender's bundle module with LRU-based memory paging.
Quick Start
use aprender::bundle::{ModelBundle, BundleBuilder, PagedBundle, PagingConfig};
// Create a bundle with multiple models
let bundle = BundleBuilder::new("models.apbundle")
.add_model("encoder", encoder_weights)
.add_model("decoder", decoder_weights)
.add_model("classifier", classifier_weights)
.build()?;
// Load with memory paging (10MB limit)
let mut paged = PagedBundle::open("models.apbundle",
PagingConfig::new().with_max_memory(10_000_000))?;
// Access models on-demand - only loads what's needed
let weights = paged.get_model("encoder")?;
Motivation
Modern ML models can exceed available RAM, especially on:
- Edge devices (IoT, embedded systems)
- Mobile applications
- Multi-model deployments
- Development machines running multiple services
The bundle module solves this with:
- Model Bundling: Package multiple models atomically
- Memory Paging: On-demand loading with LRU or LFU eviction
- Pre-fetching: Proactive loading based on access patterns
The .apbundle Format
┌─────────────────────────────────────────────────┐
│ Magic: "APBUNDLE" (8 bytes) │
├─────────────────────────────────────────────────┤
│ Version: 1 (4 bytes) │
├─────────────────────────────────────────────────┤
│ Manifest Length (4 bytes) │
├─────────────────────────────────────────────────┤
│ Manifest (JSON) │
│ - model_count │
│ - models: [{name, offset, size, checksum}] │
├─────────────────────────────────────────────────┤
│ Model Data │
│ - encoder weights (aligned) │
│ - decoder weights (aligned) │
│ - classifier weights (aligned) │
└─────────────────────────────────────────────────┘
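As a rough illustration of the layout above, the fixed-size fields can be read with a few lines of standard-library Rust. This is a sketch under assumptions, not aprender's reader: the diagram does not state the byte order or the alignment, so little-endian integers and the `align` parameter are assumed here, and `BundleHeader`, `parse_header`, and `align_up` are hypothetical names.

```rust
/// Magic bytes shown in the format diagram.
pub const MAGIC: &[u8; 8] = b"APBUNDLE";
/// 8 bytes of magic + 4 bytes of version + 4 bytes of manifest length.
pub const HEADER_LEN: usize = 16;

/// Fixed-size fields at the start of the file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct BundleHeader {
    pub version: u32,
    pub manifest_len: u32,
}

/// Parses the 16-byte header.
/// ASSUMPTION: integers are little-endian; the diagram does not say.
pub fn parse_header(bytes: &[u8]) -> Result<BundleHeader, String> {
    if bytes.len() < HEADER_LEN {
        return Err(format!("header too short: {} bytes", bytes.len()));
    }
    if !bytes.starts_with(MAGIC) {
        return Err("bad magic, expected APBUNDLE".to_string());
    }
    let version = u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    let manifest_len = u32::from_le_bytes([bytes[12], bytes[13], bytes[14], bytes[15]]);
    Ok(BundleHeader { version, manifest_len })
}

/// Rounds `offset` up to the next multiple of `align` (which must be non-zero).
/// ASSUMPTION: the actual alignment used for model data is not given in the diagram.
pub fn align_up(offset: u64, align: u64) -> u64 {
    (offset + align - 1) / align * align
}
```

With a 100-byte manifest and an assumed 64-byte alignment, `align_up(16 + 100, 64)` places the first model at offset 128.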
Memory Paging Strategies
LRU (Least Recently Used)
let config = PagingConfig::new()
.with_max_memory(10_000_000) // 10MB limit
.with_eviction(EvictionStrategy::LRU);
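Evicts the model that has gone longest without being accessed. Best for workloads with temporal locality, where a recently used model is likely to be needed again soon.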
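The bookkeeping behind an LRU policy with a byte budget can be sketched as follows. This is a minimal stand-alone illustration, not the code inside `PagedBundle`; `LruModelCache` and its methods are hypothetical names, and the linear scan in `touch` is chosen for brevity rather than speed.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal byte-budgeted LRU cache keyed by model name (hypothetical type).
pub struct LruModelCache {
    max_bytes: usize,
    used_bytes: usize,
    data: HashMap<String, Vec<u8>>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl LruModelCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, data: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `name` to the most-recently-used position.
    fn touch(&mut self, name: &str) {
        self.order.retain(|n| n != name);
        self.order.push_back(name.to_string());
    }

    /// Returns the cached bytes and refreshes recency, or None on a miss.
    pub fn get(&mut self, name: &str) -> Option<&[u8]> {
        if self.data.contains_key(name) {
            self.touch(name);
        }
        self.data.get(name).map(|v| v.as_slice())
    }

    /// Inserts a model, evicting least recently used entries until it fits.
    /// Returns the names that were evicted. A model larger than the whole
    /// budget is still stored after everything else has been evicted.
    pub fn insert(&mut self, name: &str, bytes: Vec<u8>) -> Vec<String> {
        // Drop any existing entry stored under the same name.
        if let Some(old) = self.data.remove(name) {
            self.used_bytes -= old.len();
            self.order.retain(|n| n != name);
        }
        let mut evicted = Vec::new();
        while self.used_bytes + bytes.len() > self.max_bytes {
            match self.order.pop_front() {
                Some(victim) => {
                    if let Some(old) = self.data.remove(&victim) {
                        self.used_bytes -= old.len();
                    }
                    evicted.push(victim);
                }
                None => break, // cache is empty; nothing left to evict
            }
        }
        self.used_bytes += bytes.len();
        self.data.insert(name.to_string(), bytes);
        self.order.push_back(name.to_string());
        evicted
    }
}
```

With a 10-byte budget and three 4-byte models, reading `encoder` just before inserting `classifier` means `decoder` is the entry that gets evicted.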
LFU (Least Frequently Used)
let config = PagingConfig::new()
.with_max_memory(10_000_000)
.with_eviction(EvictionStrategy::LFU);
Evicts models with fewest accesses. Best for workloads with hot/cold patterns.
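For comparison, LFU victim selection only needs a per-model access counter. The sketch below is an assumption-laden illustration rather than aprender's implementation: `LfuTracker` is a hypothetical name, ties are broken by model name to keep the result deterministic, and the victim is found with a linear scan instead of a heap.

```rust
use std::collections::HashMap;

/// Per-model access counter used to pick an LFU eviction victim (hypothetical type).
#[derive(Default)]
pub struct LfuTracker {
    counts: HashMap<String, u64>,
}

impl LfuTracker {
    /// Increments the access count for `name`.
    pub fn record_access(&mut self, name: &str) {
        *self.counts.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Returns the model with the fewest recorded accesses.
    /// Ties are broken by name so the choice is deterministic.
    pub fn victim(&self) -> Option<&str> {
        self.counts
            .iter()
            .min_by(|a, b| a.1.cmp(b.1).then_with(|| a.0.cmp(b.0)))
            .map(|(name, _)| name.as_str())
    }

    /// Forgets a model once it has been evicted.
    pub fn remove(&mut self, name: &str) {
        self.counts.remove(name);
    }
}
```

A frequently used `encoder` survives here even if it was not the most recent access, which is the behavior that makes LFU attractive for hot/cold workloads.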
Pre-fetching
Enable proactive loading based on access patterns:
let config = PagingConfig::new()
.with_prefetch(true)
.with_prefetch_count(2); // Pre-fetch next 2 likely models
let mut bundle = PagedBundle::open("models.apbundle", config)?;
// Manual hint
bundle.prefetch_hint("classifier")?;
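The text does not say how likely models are predicted. One simple possibility, shown purely as an assumption, is to count which model tends to be requested right after another and pre-fetch the most common successors. `SuccessorPredictor` and its methods are hypothetical names and are not part of the aprender API.

```rust
use std::collections::HashMap;

/// Counts how often one model is requested immediately after another (hypothetical type).
#[derive(Default)]
pub struct SuccessorPredictor {
    transitions: HashMap<String, HashMap<String, u32>>,
    last: Option<String>,
}

impl SuccessorPredictor {
    /// Records an access and bumps the previous -> current transition count.
    pub fn record(&mut self, name: &str) {
        if let Some(prev) = self.last.take() {
            *self
                .transitions
                .entry(prev)
                .or_default()
                .entry(name.to_string())
                .or_insert(0) += 1;
        }
        self.last = Some(name.to_string());
    }

    /// Returns up to `count` models most often seen right after `name`.
    pub fn predict(&self, name: &str, count: usize) -> Vec<String> {
        let mut candidates: Vec<(&String, &u32)> = match self.transitions.get(name) {
            Some(next) => next.iter().collect(),
            None => return Vec::new(),
        };
        // Highest count first; model name as a deterministic tie-breaker.
        candidates.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
        candidates.into_iter().take(count).map(|(n, _)| n.clone()).collect()
    }
}
```

The `count` argument plays the role that `with_prefetch_count(2)` plays in the configuration: it caps how many candidates would be loaded ahead of time.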
Paging Statistics
Monitor cache performance:
let stats = bundle.stats();
println!("Hits: {}", stats.hits);
println!("Misses: {}", stats.misses);
println!("Evictions: {}", stats.evictions);
println!("Hit Rate: {:.1}%", stats.hit_rate() * 100.0);
println!("Memory Used: {} bytes", stats.memory_used);
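Because the snippet multiplies `hit_rate()` by 100 before printing, the method presumably returns a fraction between 0 and 1. A minimal sketch of that calculation, using a hypothetical stand-in type rather than aprender's own statistics struct, looks like this:

```rust
/// Counters mirroring the fields printed above (hypothetical stand-in type).
#[derive(Debug, Default, Clone, Copy)]
pub struct PagingStatsSketch {
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
}

impl PagingStatsSketch {
    /// Fraction of accesses served from cache, in the range 0.0..=1.0.
    /// Returns 0.0 when nothing has been accessed yet, to avoid dividing by zero.
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

For example, 47 hits and 3 misses give a hit rate of 47 / 50, which the format string above would print as 94.0%.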
Shell Completion Example
aprender-shell uses paging for large histories:
# Train with 10MB memory limit
aprender-shell train --memory-limit 10
# Suggestions load n-gram segments on-demand
aprender-shell suggest "git " --memory-limit 10
# View paging statistics
aprender-shell stats --memory-limit 10
Output:
📊 Paged Model Statistics:
N-gram size: 3
Total commands: 50000
Vocabulary size: 15000
Total segments: 25
Loaded segments: 3
Memory limit: 10.0 MB
Loaded bytes: 2.5 KB
📈 Paging Statistics:
Page hits: 47
Page misses: 3
Evictions: 0
Hit rate: 94.0%
Architecture
┌──────────────────────────────────────────────────────────────┐
│ PagedBundle │
├──────────────────────────────────────────────────────────────┤
│ BundleReader │ LRU Cache │ PageTable │
│ ───────────── │ ────────── │ ───────── │
│ read_manifest() │ HashMap<K,V> │ track access │
│ read_model() │ LRU ordering │ find LRU/LFU │
│ │ eviction │ timestamps │
├──────────────────────────────────────────────────────────────┤
│ PagingConfig │
│ max_memory: 10MB │ eviction: LRU │ prefetch: true │
└──────────────────────────────────────────────────────────────┘
API Reference
BundleBuilder
let bundle = BundleBuilder::new("path.apbundle")
.add_model("name", data)
.with_config(BundleConfig::new()
.with_compression(false)
.with_max_memory(10_000_000))
.build()?;
ModelBundle
// Create empty bundle
let mut bundle = ModelBundle::new();
bundle.add_model("model1", weights);
bundle.save("path.apbundle")?;
// Load bundle
let bundle = ModelBundle::load("path.apbundle")?;
let weights = bundle.get_model("model1");
PagedBundle
// Open with paging
let mut bundle = PagedBundle::open("path.apbundle",
PagingConfig::new().with_max_memory(10_000_000))?;
// Get model (loads on-demand)
let data = bundle.get_model("model1")?;
// Check cache state
assert!(bundle.is_cached("model1"));
// Manually evict
bundle.evict("model1");
// Clear all cached data
bundle.clear_cache();
PagingConfig
let config = PagingConfig::new()
.with_max_memory(10_000_000) // 10MB limit
.with_page_size(4096) // 4KB pages
.with_prefetch(true) // Enable pre-fetching
.with_prefetch_count(2) // Pre-fetch 2 models
.with_eviction(EvictionStrategy::LRU);
Performance Characteristics
| Operation | Time | Notes |
|---|---|---|
| Bundle creation | O(n) | n = total model bytes |
| Bundle load (metadata) | O(m) | m = manifest size |
| Model access (cached) | O(1) | Hash lookup |
| Model access (uncached) | O(k) | k = model size, disk I/O |
| Eviction | O(1) for LRU, O(log c) for LFU | c = cached models; LRU: deque pop; LFU: heap |
| Pre-fetch | O(k) | Background loading |
Best Practices
- Size models appropriately: Split large models into logical components
- Choose eviction wisely: LRU when recently used models tend to be reused, LFU for hot/cold
- Monitor hit rates: Target >80% for good performance
- Use pre-fetching: Reduce latency for predictable access patterns
- Test memory limits: Profile actual usage before deployment
Troubleshooting
| Issue | Solution |
|---|---|
| Low hit rate | Increase memory limit or reduce model sizes |
| High eviction count | Raise the memory limit or split models that are too large for it |
| Slow first access | Use pre-fetch hints for critical models |
| OOM errors | Reduce max_memory, ensure eviction works |
Implementation Details
The bundle module is implemented in pure Rust with:
- 42 tests covering all components
- Zero unsafe code
- No external dependencies beyond std
- Cross-platform (mmap-style access is simulated with std I/O)
See src/bundle/ for implementation:
- mod.rs: ModelBundle, BundleBuilder, BundleConfig
- format.rs: Binary format reader/writer
- manifest.rs: JSON manifest handling
- mmap.rs: Memory-mapped file abstraction
- paging.rs: PagedBundle, PagingConfig, eviction strategies