Case Study: APR Model Cache

This example demonstrates the hierarchical caching system, which applies Toyota Way Just-In-Time principles to model management.

Overview

The caching module provides a multi-tier cache for model storage (see the lookup sketch after the list):

  • L1 (Hot): In-memory, lowest latency
  • L2 (Warm): Memory-mapped files
  • L3 (Cold): Persistent storage
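
Conceptually, a lookup falls through these tiers in order: check the hot tier first, then the warm tier, then cold storage, promoting an entry upward on a hit. The sketch below is a toy illustration of that access pattern using only standard-library types; it is not the module's API (the real types, such as ModelRegistry, appear later on this page).

use std::collections::HashMap;
use std::path::PathBuf;

// Toy illustration of the tier fall-through (not the module's API):
// try the cheapest tier first and promote entries toward L1 on a hit.
struct ToyTieredCache {
    l1: HashMap<String, Vec<u8>>, // hot: decompressed bytes in memory
    l2: HashMap<String, PathBuf>, // warm: stand-in for memory-mapped files
    l3_dir: PathBuf,              // cold: persistent storage directory
}

impl ToyTieredCache {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        // L1 hit: lowest latency, return directly.
        if let Some(bytes) = self.l1.get(key) {
            return Some(bytes.clone());
        }
        // L2 hit: read the warm copy and promote it to L1.
        if let Some(path) = self.l2.get(key) {
            let bytes = std::fs::read(path).ok()?;
            self.l1.insert(key.to_owned(), bytes.clone());
            return Some(bytes);
        }
        // L3 fallback: read from cold storage and promote on success.
        let bytes = std::fs::read(self.l3_dir.join(key)).ok()?;
        self.l1.insert(key.to_owned(), bytes.clone());
        Some(bytes)
    }
}

Promotion on hit is what keeps recently used models in the fast tiers while the cold tier remains the source of truth.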

Toyota Way Principles

Principle      Application
Right Amount   Cache only what's needed for current inference
Right Time     Prefetch before access, evict after use
Right Place    L1 = hot, L2 = warm, L3 = cold storage
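
Loosely, the same principles map onto the cache's configuration knobs. The snippet below is an illustrative sketch that reuses the CacheConfig builder methods shown later on this page; the sizes, TTL, and policy are example values, not recommendations.

use std::time::Duration;

// Illustrative mapping of the principles onto CacheConfig (values are examples).
let config = CacheConfig::new()
    // Right Amount: an eviction policy keeps only what inference currently needs.
    .with_eviction_policy(EvictionPolicy::LRU)
    // Right Time: prefetch before access; a TTL retires entries after use.
    .with_prefetch(true)
    .with_ttl(Duration::from_secs(600))
    // Right Place: explicit budgets for the hot (L1) and warm (L2) tiers.
    .with_l1_size(64 * 1024 * 1024)
    .with_l2_size(1024 * 1024 * 1024);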

Running the Example

cargo run --example apr_cache

Eviction Policies

Policy   Description                       Best For
LRU      Least Recently Used               General workloads
LFU      Least Frequently Used             Repeated inference
ARC      Adaptive Replacement Cache        Mixed workloads
Clock    Clock algorithm (FIFO variant)    High throughput
Fixed    No eviction                       Embedded systems

The example enumerates each policy and prints its characteristics:

let policies = [
    EvictionPolicy::LRU,
    EvictionPolicy::LFU,
    EvictionPolicy::ARC,
    EvictionPolicy::Clock,
    EvictionPolicy::Fixed,
];

for policy in &policies {
    println!("{:?}: {}", policy, policy.description());
    println!("  Supports eviction: {}", policy.supports_eviction());
    println!("  Recommended for: {}", policy.recommended_use_case());
}

Memory Budget

Control cache memory with watermarks:

// Default watermarks (90% high, 70% low)
let mut budget = MemoryBudget::new(100);

// Check eviction decisions
println!("90 pages: needs_eviction={}", budget.needs_eviction(90));  // true
println!("70 pages: can_stop={}", budget.can_stop_eviction(70));     // true

// Custom watermarks
let custom = MemoryBudget::with_watermarks(1000, 0.95, 0.80);
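// By the same arithmetic, eviction would start at 950 used pages and stop at 800.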

// Reserved pages (won't be evicted)
budget.reserve_page(1);
budget.reserve_page(2);
println!("Page 1 can_evict: {}", budget.can_evict(1));  // false

Access Statistics

Track cache performance:

let mut stats = AccessStats::new();

// Record cache accesses: 80 hits, then 20 misses (expected hit rate: 80%)
for i in 0..80 {
    stats.record_hit(100 + (i % 50), i);
}
for i in 80..100 {
    stats.record_miss(i);
}

// Prefetch tracking
for _ in 0..30 {
    stats.record_prefetch_hit();
}

println!("Hit Rate: {:.1}%", stats.hit_rate() * 100.0);
println!("Avg Access Time: {:.1} ns", stats.avg_access_time_ns());
println!("Prefetch Effectiveness: {:.1}%", stats.prefetch_effectiveness() * 100.0);

Cache Configuration

Default Configuration

let default = CacheConfig::default();
println!("L1 Max: {} MB", default.l1_max_bytes / (1024 * 1024));
println!("L2 Max: {} MB", default.l2_max_bytes / (1024 * 1024));
println!("Eviction: {:?}", default.eviction_policy);
println!("Prefetch: {}", default.prefetch_enabled);

Embedded Configuration

let embedded = CacheConfig::embedded(1024 * 1024);  // 1MB
// L2 disabled, no eviction (Fixed policy)

Custom Configuration

let custom = CacheConfig::new()
    .with_l1_size(128 * 1024 * 1024)
    .with_l2_size(2 * 1024 * 1024 * 1024)
    .with_eviction_policy(EvictionPolicy::ARC)
    .with_ttl(Duration::from_secs(3600))
    .with_prefetch(true);

Model Registry

Manage cached models:

let config = CacheConfig::new()
    .with_l1_size(10 * 1024)
    .with_eviction_policy(EvictionPolicy::LRU);

let mut registry = ModelRegistry::new(config);

// Insert models
for i in 0..5 {
    let data = vec![0u8; 2048];
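    // Each entry carries 2 KB of data, so five entries roughly fill the 10 KB L1 budget.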
    let entry = CacheEntry::new(
        [i as u8; 32],
        ModelType::new(1),
        CacheData::Decompressed(data),
    );
    registry.insert_l1(format!("model_{}", i), entry);
}

// Access models
let _ = registry.get("model_0");
let _ = registry.get("model_2");

// Get statistics
let stats = registry.stats();
println!("L1 Entries: {}", stats.l1_entries);
println!("L1 Bytes: {} KB", stats.l1_bytes / 1024);
println!("Hit Rate: {:.1}%", stats.hit_rate() * 100.0);

Cache Tiers

Tier       Name          Typical Latency
L1 (Hot)   Hot Cache     ~1 microsecond
L2 (Warm)  Warm Cache    ~100 microseconds
L3 (Cold)  Cold Storage  ~10 milliseconds

Cache Data Variants

// In-memory (decompressed)
let decompressed = CacheData::Decompressed(vec![0u8; 1000]);

// In-memory (compressed)
let compressed = CacheData::Compressed(vec![0u8; 500]);

// Memory-mapped file
let mapped = CacheData::Mapped {
    path: "/tmp/model.cache".into(),
    offset: 0,
    length: 2000,
};

println!("Decompressed size: {}", decompressed.size());
println!("Compressed: {}", compressed.is_compressed());
println!("Mapped: {}", mapped.is_mapped());

Source Code

  • Example: examples/apr_cache.rs
  • Module: src/cache/mod.rs