Case Study: APR CLI Commands Demo
This case study demonstrates creating test models and using 27 apr-cli commands for model inspection, validation, transformation, testing, and inference.
The Problem
APR model files need comprehensive tooling for:
| Need | Traditional Approach | Problem |
|---|---|---|
| Inspection | Custom scripts | No standardization |
| Validation | Manual checksums | Incomplete coverage |
| Transformation | Framework-specific | Lock-in |
| Regression | Manual testing | Error-prone |
The Solution: apr-cli
The apr CLI provides 29+ commands for complete model lifecycle management:
# Build the CLI
cargo build -p apr-cli
# Inspect model metadata
./target/debug/apr inspect model.apr --json
# Validate integrity (100-point QA)
./target/debug/apr validate model.apr --quality
# Quantize model
./target/debug/apr convert model.apr --quantize int8 -o model-int8.apr
Complete Example
Run: cargo run --example apr_cli_commands
See the CLI reference and the source code in crates/ for implementation details.
All Commands
Model Inspection
1. INSPECT - View Model Metadata
apr inspect model.apr # Basic info
apr inspect model.apr --json # JSON output
apr inspect model.apr --weights # Include tensor info
Shows model type, framework, hyperparameters, and training info.
2. TENSORS - List Tensor Info
apr tensors model.apr # List all tensors
apr tensors model.apr --stats # Include statistics
apr tensors model.apr --json # JSON output
Lists tensor names, shapes, dtypes, and statistics.
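To make the statistics concrete, here is a minimal sketch of that kind of per-tensor summary, computed with awk over a plain list of numbers. The helper name `tensor_stats`, the one-value-per-line input, the output layout, and the use of population standard deviation are all assumptions for illustration; the source does not say how apr computes or formats these values.

```shell
# Hypothetical helper (not part of apr): summarize numbers read from stdin,
# one value per line. Uses population standard deviation.
tensor_stats() {
    awk '
    NR == 1 { min = $1; max = $1 }
    {
        sum += $1
        sumsq += $1 * $1
        if ($1 < min) min = $1
        if ($1 > max) max = $1
    }
    END {
        if (NR == 0) exit 1            # no input, nothing to summarize
        mean = sum / NR
        var = sumsq / NR - mean * mean
        if (var < 0) var = 0           # guard against rounding below zero
        printf "mean=%.4f std=%.4f min=%.4f max=%.4f\n", mean, sqrt(var), min, max
    }'
}

# Usage: printf '1\n2\n3\n4\n' | tensor_stats
```

A single pass keeps only running sums, so the same approach works on value dumps too large to hold in memory.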
3. TRACE - Layer-by-Layer Analysis
apr trace model.apr # Basic trace
apr trace model.apr --verbose # Detailed trace
apr trace model.apr --json # JSON output
Analyzes model layer by layer for debugging inference.
4. DEBUG - Debug Output
apr debug model.apr # Standard debug
apr debug model.apr --drama # Detailed drama mode
apr debug model.apr --hex --limit 64 # Hex dump
Provides detailed tensor inspection for debugging.
Quality & Validation
5. VALIDATE - Check Model Integrity
apr validate model.apr # Basic validation
apr validate model.apr --quality # 100-point QA checklist
apr validate model.apr --strict # Strict mode
With --quality, runs the 100-point quality assessment and reports a grade from A+ to F.
6. LINT - Best Practices Check
apr lint model.apr # Check best practices
Static analysis for naming conventions, metadata completeness, and efficiency.
Checks:
- Standard tensor naming patterns (layer.0.weight, not l0_w)
- Required metadata (author, license, provenance)
- Tensor alignment (64-byte boundaries)
- Compression for large tensors (>1MB)
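The checks listed above can be approximated in a few lines of shell, which may help when pre-screening tensor names and offsets before running the linter. This is only a sketch: the regular expression, the function names, and the reading of "1MB" as 1048576 bytes are assumptions, not the rules apr lint actually implements.

```shell
# Hypothetical naming check: dot-separated lowercase segments such as
# layer.0.weight. Names without a dot (for example l0_w) are rejected.
is_standard_tensor_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+$'
}

# True when a byte offset sits on a 64-byte boundary.
is_aligned_64() {
    [ $(( $1 % 64 )) -eq 0 ]
}

# True when a tensor is larger than 1 MB (taken here as 1048576 bytes),
# which is the size above which compression is expected.
is_large_tensor() {
    [ "$1" -gt 1048576 ]
}

# Usage:
#   is_standard_tensor_name layer.0.weight && echo "name ok"
#   is_aligned_64 128 && echo "aligned"
```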
7. DIFF - Compare Two Models
apr diff model_v1.apr model_v2.apr # Compare models
apr diff model_v1.apr model_v2.apr --json # JSON output
Shows metadata and tensor differences between model versions.
Model Transformation
8. CONVERT - Quantization/Optimization
apr convert model.apr --quantize int8 -o model-int8.apr
apr convert model.apr --quantize int4 -o model-int4.apr
apr convert model.apr --quantize fp16 -o model-fp16.apr
Applies quantization for reduced model size and faster inference.
| Quantization | Size Reduction (vs. fp32) | Accuracy Impact |
|---|---|---|
| fp16 | 50% | Minimal |
| int8 | 75% | Small |
| int4 | 87.5% | Moderate |
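The percentages in the table follow from the bit width alone if the baseline is 32-bit floats: a weight stored in b bits takes b/32 of its original space. The sketch below reproduces that arithmetic. The function names are illustrative, and the calculation ignores metadata, scale factors, and any tensors left unquantized, so real files will deviate somewhat.

```shell
# Bytes needed for the weights after quantization, assuming a 32-bit baseline.
# $1 = original size in bytes, $2 = target bits per weight (16, 8, or 4)
quantized_size() {
    echo $(( $1 * $2 / 32 ))
}

# Percentage saved relative to fp32 for a given bit width.
size_reduction_percent() {
    awk -v bits="$1" 'BEGIN { printf "%.1f\n", (1 - bits / 32) * 100 }'
}

# Usage:
#   quantized_size 1000000 8      # weights of a 1,000,000-byte fp32 model at int8
#   size_reduction_percent 4      # saving for int4
```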
9. EXPORT - Export to Other Formats
apr export model.apr --format safetensors -o model.safetensors
apr export model.apr --format gguf -o model.gguf
Exports APR models to other ecosystems:
- SafeTensors - HuggingFace ecosystem
- GGUF - llama.cpp / local inference
10. MERGE - Merge Models
apr merge model1.apr model2.apr --strategy average -o merged.apr
apr merge model1.apr model2.apr --strategy weighted -o merged.apr
Combines multiple models using different strategies:
- average - Simple tensor averaging
- weighted - Weighted combination
Import & Interop
11. IMPORT - Import External Models
apr import external.safetensors -o imported.apr
apr import hf://org/repo -o model.apr --arch whisper
Imports from SafeTensors, HuggingFace Hub, and other formats.
Testing & Regression
12. CANARY - Regression Testing
# Create canary from original model
apr canary create model.apr --input ref.wav --output canary.json
# Check optimized model against canary
apr canary check model-optimized.apr --canary canary.json
Captures tensor statistics for regression testing after transformations (quantization, pruning).
Canary data includes:
- Tensor shapes and counts
- Mean, std, min, max for each tensor
- Drift tolerance checking
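Drift tolerance checking amounts to comparing each captured statistic with its counterpart from the transformed model. A minimal sketch of one such comparison follows; the function name, the absolute-difference rule, and the idea of passing the values by hand are assumptions, since the source does not describe the canary.json layout or the tolerance formula apr uses.

```shell
# Hypothetical drift check: succeed when |cur - ref| <= tol.
# $1 = reference statistic, $2 = statistic from the new model,
# $3 = absolute tolerance
within_tolerance() {
    awk -v ref="$1" -v cur="$2" -v tol="$3" 'BEGIN {
        drift = cur - ref
        if (drift < 0) drift = -drift
        if (drift > tol) exit 1
        exit 0
    }'
}

# Usage: compare a tensor mean before and after quantization.
#   within_tolerance 0.50 0.51 0.02 && echo "mean within tolerance"
```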
13. PROBAR - Visual Regression Testing
apr probar model.apr -o probar_output # Create probar suite
apr probar model.apr -o output --format json # JSON format
Exports model data for visual regression testing.
Help & Documentation
14. EXPLAIN - Get Explanations
apr explain E002 # Explain error code
apr explain --tensor encoder.conv1.weight # Explain tensor by convention
apr explain --tensor conv1 --file model.safetensors # Look up in actual model
apr explain --file model.apr # Analyze architecture
apr explain --kernel llama # Kernel pipeline for family
apr explain --kernel qwen2 --json # JSON output for tooling
apr explain --kernel /path/to/config.json --verbose # Resolve from config.json
apr explain --kernel Qwen/Qwen2.5-Coder-0.5B-Instruct # Resolve from HF repo
apr explain --kernel gemma --proof-status # Include proof status
Provides context-aware explanations for errors, tensors, model architectures, and kernel pipelines. When --file is provided with --tensor, looks up the tensor in the actual model via RosettaStone (supports APR, GGUF, SafeTensors). The --kernel flag explains which kernel equivalence class (A-F) a model uses, the architectural constraints that drive selection, and the kernel ops pipeline.
Interactive
15. TUI - Interactive Terminal UI
apr tui model.apr # Launch interactive UI
Interactive terminal interface for model exploration with four tabs:
| Tab | Key | Description |
|---|---|---|
| Overview | 1 | Model metadata, hyperparameters, training info |
| Tensors | 2 | Tensor list with shapes, dtypes, sizes |
| Stats | 3 | Tensor statistics (mean, std, min, max, zeros, NaNs) |
| Help | ? | Keyboard shortcuts and navigation help |
Keyboard Navigation:
- 1, 2, 3, ? - Switch tabs directly
- Tab / Shift+Tab - Cycle through tabs
- j / ↓ - Next item in list
- k / ↑ - Previous item in list
- q / Esc - Quit
Inference (requires --features inference)
Build with inference support:
cargo build -p apr-cli --features inference
16. RUN - Run Model Inference
apr run model.apr --input "[1.0, 2.0]" # JSON array input
apr run model.apr --input "1.0,2.0" # CSV input
apr run model.apr --input "[1.0, 2.0]" --json # JSON output
Accepts APR, SafeTensors, or GGUF files; the work performed depends on the format:
| Format | Inference Type |
|---|---|
| APR (.apr) | Full ML inference via realizar |
| SafeTensors (.safetensors) | Tensor inspection |
| GGUF (.gguf) | Model inspection (mmap) |
Input Formats:
- JSON array: "[1.0, 2.0, 3.0]"
- CSV: "1.0,2.0,3.0"
17. SERVE - Inference Server and Capacity Planning
Serve Plan — Pre-flight capacity planning (no weights loaded):
# Plan from local file
apr serve plan model.gguf --gpu
# Plan from HuggingFace repo (fetches only ~2KB config.json)
apr serve plan hf://Qwen/Qwen2.5-Coder-1.5B-Instruct --gpu --quant Q4_K_M
# JSON output for CI/tooling
apr serve plan microsoft/phi-2 --gpu --format json
Serve Run — Start inference server:
apr serve run model.apr --port 8080 # Start on port 8080
apr serve run model.apr --host 0.0.0.0 --port 3000 # Bind to all interfaces
Starts a REST API server for model inference:
APR Models (full inference):
# Health check
curl http://localhost:8080/health
# Run inference
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"input": [1.0, 2.0]}'
Server Features:
- /health - Health check endpoint
- /predict - Inference endpoint (APR models)
- /model - Model info endpoint (GGUF/SafeTensors)
- /tensors - Tensor listing (SafeTensors)
- Graceful shutdown via Ctrl+C
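When scripting against the server, it helps to wait for /health before sending the first request and to build the /predict body in one place. The helpers below are a sketch under stated assumptions: the function names are hypothetical, the retry loop with a one-second pause is a design choice rather than documented behaviour, and the CSV handling simply rewrites the CSV input style shown for apr run into the JSON array that the /predict example uses.

```shell
# Build the /predict request body from either input style:
# a JSON array ("[1.0, 2.0]") or a CSV string ("1.0,2.0").
predict_payload() {
    case $1 in
        \[*) values=$1 ;;   # already a JSON array
        *)   values="[$(printf '%s\n' "$1" | sed 's/ *, */, /g')]" ;;   # CSV
    esac
    printf '{"input": %s}\n' "$values"
}

# Poll /health until the server answers or the attempts run out.
# $1 = base URL, $2 = maximum attempts (one second apart)
wait_for_health() {
    base_url=$1
    attempts=$2
    while [ "$attempts" -gt 0 ]; do
        if curl -sf "$base_url/health" >/dev/null; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_health http://localhost:8080 10 &&
#   curl -X POST http://localhost:8080/predict \
#        -H "Content-Type: application/json" \
#        -d "$(predict_payload "1.0,2.0")"
```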
Chat & Comparison
18. CHAT - Interactive Chat (LLM models)
apr chat model.gguf # Interactive chat
apr chat model.gguf --system "You are a helpful assistant" # Custom system prompt
19. FLOW - Visualize Data Flow
apr flow model.safetensors # Show data flow
apr flow model.gguf --json # JSON output (architecture, groups)
apr flow model.apr --verbose # Verbose with shapes
Detects architecture (Encoder-Decoder, Decoder-Only, Encoder-Only) and groups tensors by layer. Supports APR, GGUF, and SafeTensors.
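Grouping tensors by layer can be pictured as cutting each tensor name at its first numeric segment and counting the names that share the resulting prefix. The sketch below does this with awk. The name `group_by_layer`, the prefix rule, and the `(other)` bucket are assumptions made for illustration; the heuristics apr flow actually applies are not described in the source.

```shell
# Hypothetical grouping: read tensor names from stdin (one per line) and
# print "<layer prefix> <tensor count>" in first-seen order. The prefix is
# everything up to and including the first all-digit segment; names with
# no numeric segment fall into "(other)".
group_by_layer() {
    awk -F. '
    {
        key = "(other)"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/) {
                key = $1
                for (j = 2; j <= i; j++) key = key "." $j
                break
            }
        }
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END { for (k = 1; k <= n; k++) print order[k], count[order[k]] }'
}

# Usage:
#   printf 'layers.0.attn.weight\nlayers.0.mlp.weight\nembed.weight\n' | group_by_layer
```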
20. COMPARE-HF - Compare Against HuggingFace Source
apr compare-hf model.apr --hf openai/whisper-tiny # APR format
apr compare-hf model.gguf --hf openai/whisper-tiny # GGUF format
apr compare-hf model.safetensors --hf openai/whisper-tiny # SafeTensors format
apr compare-hf model.apr --hf openai/whisper-tiny --json # JSON output
Auto-detects local model format. Compares tensor-by-tensor against HuggingFace source.
HuggingFace Hub
21. PUBLISH - Push to HuggingFace Hub
apr publish model_dir/ org/model-name --dry-run
22. PULL - Download Model
apr pull hf://Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF -o ./models/
Benchmarking & QA
23. QA - Falsifiable QA Checklist
apr qa model.gguf # Run 8-gate QA checklist
apr qa model.gguf --json # JSON output
24. QUALIFY - Cross-Subcommand Smoke Test
apr qualify model.gguf # Smoke test all 11 tools
apr qualify model.gguf --tier full # Full tier (+contracts +playbook)
apr qualify model.gguf --json # JSON output for CI
apr qualify model.gguf --skip validate,validate_quality # Skip slow gates
Runs every diagnostic CLI tool against a model to verify no crashes. Three tiers: smoke (11 in-process gates), standard (+contract audit), full (+playbook check).
25. SHOWCASE - Performance Benchmark
apr showcase model.gguf --warmup 3 --iterations 10
26. PROFILE - Deep Performance Profiling
apr profile model.gguf --roofline
27. BENCH - Run Benchmarks
apr bench model.gguf --iterations 100
Example Output
Running the example creates demo models:
=== APR CLI Commands Demo ===
--- Part 1: Creating Demo Model ---
Adding tensors...
Model type: Linear Regression
Tensors: 4
Size: 1690 bytes
Created: /tmp/apr_cli_demo/demo_model.apr
--- Part 2: Creating Second Model (for diff) ---
Model type: Linear Regression v2
Tensors: 4
Size: 1707 bytes
Created: /tmp/apr_cli_demo/demo_model_v2.apr
Use Cases
CI/CD Model Validation
# In CI pipeline
apr validate model.apr --strict --min-score 90 && apr lint model.apr
if [ $? -ne 0 ]; then
echo "Model validation failed"
exit 1
fi
Model Optimization Pipeline
# Quantize for production
apr convert model.apr --quantize int8 -o model-int8.apr
# Verify no regression
apr canary create model.apr --input test.wav --output canary.json
apr canary check model-int8.apr --canary canary.json
# Export for deployment
apr export model-int8.apr --format gguf -o model.gguf
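In a script, the steps above are usually chained so that a failed canary check prevents the export. The wrapper below is one way to do that; the function name `optimize_and_export` and its two positional arguments are hypothetical, while every apr invocation is copied from the pipeline above.

```shell
# Quantize, verify against a canary, then export; stop at the first failure.
# $1 = source model, $2 = reference input used to build the canary
optimize_and_export() {
    model=$1
    ref_input=$2
    apr convert "$model" --quantize int8 -o model-int8.apr &&
    apr canary create "$model" --input "$ref_input" --output canary.json &&
    apr canary check model-int8.apr --canary canary.json &&
    apr export model-int8.apr --format gguf -o model.gguf
}

# Usage:
#   optimize_and_export model.apr test.wav || echo "pipeline stopped early"
```

Because each step is joined with `&&`, the function returns the exit status of the first command that fails, so a CI job can act on it directly.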
Model Version Comparison
# Compare before/after optimization
apr diff original.apr quantized.apr --json | jq '.tensor_changes'
Debugging Inference Issues
# Layer-by-layer trace
apr trace model.apr --verbose | grep -iE "nan|inf"
# Drama mode for detailed analysis
apr debug model.apr --drama
Benefits
| Benefit | Description |
|---|---|
| Standardized | Consistent CLI for all APR models |
| Comprehensive | 29+ commands cover full lifecycle |
| Scriptable | JSON output for automation |
| Debuggable | Deep inspection with drama mode |
| Validatable | 100-point QA with grades |
| Transformable | Quantization and format conversion |
| Testable | Canary regression testing |
| Inference | Run predictions and serve REST APIs |