Case Study: APR CLI Commands Demo

This case study demonstrates creating test models and using all 27 apr-cli commands for model inspection, validation, transformation, testing, and inference.

The Problem

APR model files need comprehensive tooling for:

Need             Traditional Approach    Problem
Inspection       Custom scripts          No standardization
Validation       Manual checksums        Incomplete coverage
Transformation   Framework-specific      Lock-in
Regression       Manual testing          Error-prone

The Solution: apr-cli

The apr CLI provides 27 commands for complete model lifecycle management:

# Build the CLI
cargo build -p apr-cli

# Inspect model metadata
./target/debug/apr inspect model.apr --json

# Validate integrity (100-point QA)
./target/debug/apr validate model.apr --quality

# Quantize model
./target/debug/apr convert model.apr --quantize int8 -o model-int8.apr

Complete Example

Run: cargo run --example apr_cli_commands

// Run this example:
//   cargo run --example apr_cli_commands
//
// See the CLI reference and source code in crates/ for implementation details.

All Commands

Model Inspection

1. INSPECT - View Model Metadata

apr inspect model.apr              # Basic info
apr inspect model.apr --json       # JSON output
apr inspect model.apr --weights    # Include tensor info

Shows model type, framework, hyperparameters, and training info.

2. TENSORS - List Tensor Info

apr tensors model.apr              # List all tensors
apr tensors model.apr --stats      # Include statistics
apr tensors model.apr --json       # JSON output

Lists tensor names, shapes, dtypes, and statistics.

3. TRACE - Layer-by-Layer Analysis

apr trace model.apr                # Basic trace
apr trace model.apr --verbose      # Detailed trace
apr trace model.apr --json         # JSON output

Analyzes model layer by layer for debugging inference.

4. DEBUG - Debug Output

apr debug model.apr                # Standard debug
apr debug model.apr --drama        # Detailed drama mode
apr debug model.apr --hex --limit 64  # Hex dump

Provides detailed tensor inspection for debugging.

Quality & Validation

5. VALIDATE - Check Model Integrity

apr validate model.apr             # Basic validation
apr validate model.apr --quality   # 100-point QA checklist
apr validate model.apr --strict    # Strict mode

Runs the 100-point quality assessment with grades A+ to F.

6. LINT - Best Practices Check

apr lint model.apr                 # Check best practices

Static analysis for naming conventions, metadata completeness, and efficiency.

Checks:

  • Standard tensor naming patterns (layer.0.weight, not l0_w)
  • Required metadata (author, license, provenance)
  • Tensor alignment (64-byte boundaries)
  • Compression for large tensors (>1MB)
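
As an illustration of the first check, a naming-convention rule can be sketched as a simple pattern match. This is a hypothetical sketch, not apr's actual lint code; the pattern and function name are assumptions:

```python
import re

# Hypothetical lint rule: accept dotted hierarchical names like
# "layer.0.weight" or "encoder.conv1.bias", reject terse names like "l0_w".
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+$")

def lint_tensor_name(name: str) -> bool:
    """Return True if the name follows the dotted naming convention."""
    return NAME_PATTERN.match(name) is not None

print(lint_tensor_name("layer.0.weight"))  # True
print(lint_tensor_name("l0_w"))            # False
```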

7. DIFF - Compare Two Models

apr diff model_v1.apr model_v2.apr       # Compare models
apr diff model_v1.apr model_v2.apr --json  # JSON output

Shows metadata and tensor differences between model versions.
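
Conceptually, a tensor-level diff reduces to comparing two name-to-shape maps. The following is a minimal sketch of that idea, not apr's implementation:

```python
def diff_tensors(a: dict, b: dict) -> dict:
    """Compare two {tensor_name: shape} maps and report differences."""
    added = sorted(set(b) - set(a))
    removed = sorted(set(a) - set(b))
    changed = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return {"added": added, "removed": removed, "changed": changed}

v1 = {"layer.0.weight": (4, 2), "layer.0.bias": (4,)}
v2 = {"layer.0.weight": (8, 2), "layer.0.bias": (4,), "layer.1.weight": (4, 8)}
print(diff_tensors(v1, v2))
# {'added': ['layer.1.weight'], 'removed': [], 'changed': ['layer.0.weight']}
```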

Model Transformation

8. CONVERT - Quantization/Optimization

apr convert model.apr --quantize int8 -o model-int8.apr
apr convert model.apr --quantize int4 -o model-int4.apr
apr convert model.apr --quantize fp16 -o model-fp16.apr

Applies quantization for reduced model size and faster inference.

Quantization   Size Reduction   Accuracy Impact
fp16           50%              Minimal
int8           75%              Small
int4           87.5%            Moderate
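
The size-reduction figures follow directly from the bit widths relative to fp32 weights (ignoring the small per-tensor overhead of scales and zero points that real quantization adds):

```python
def quantized_size(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8

def reduction_vs_fp32(bits: int) -> float:
    """Fractional size reduction compared to 32-bit weights."""
    return 1.0 - bits / 32.0

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {reduction_vs_fp32(bits):.1%} smaller than fp32")
# fp16: 50.0% smaller than fp32
# int8: 75.0% smaller than fp32
# int4: 87.5% smaller than fp32
```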

9. EXPORT - Export to Other Formats

apr export model.apr --format safetensors -o model.safetensors
apr export model.apr --format gguf -o model.gguf

Exports APR models to other ecosystems:

  • SafeTensors - HuggingFace ecosystem
  • GGUF - llama.cpp / local inference

10. MERGE - Merge Models

apr merge model1.apr model2.apr --strategy average -o merged.apr
apr merge model1.apr model2.apr --strategy weighted -o merged.apr

Combines multiple models using different strategies:

  • average - Simple tensor averaging
  • weighted - Weighted combination
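
The two strategies can be sketched as element-wise operations over same-shape tensors. Plain Python lists stand in for tensors here; this is an illustration of the math, not apr's merge code:

```python
def merge_average(tensors):
    """Element-wise mean of same-shape tensors (the 'average' strategy)."""
    return [sum(vals) / len(tensors) for vals in zip(*tensors)]

def merge_weighted(tensors, weights):
    """Element-wise weighted combination; weights should sum to 1."""
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*tensors)]

a = [1.0, 2.0]
b = [3.0, 6.0]
print(merge_average([a, b]))                 # [2.0, 4.0]
print(merge_weighted([a, b], [0.75, 0.25]))  # [1.5, 3.0]
```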

Import & Interop

11. IMPORT - Import External Models

apr import external.safetensors -o imported.apr
apr import hf://org/repo -o model.apr --arch whisper

Imports from SafeTensors, HuggingFace Hub, and other formats.

Testing & Regression

12. CANARY - Regression Testing

# Create canary from original model
apr canary create model.apr --input ref.wav --output canary.json

# Check optimized model against canary
apr canary check model-optimized.apr --canary canary.json

Captures tensor statistics for regression testing after transformations (quantization, pruning).

Canary data includes:

  • Tensor shapes and counts
  • Mean, std, min, max for each tensor
  • Drift tolerance checking
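
The idea behind the canary can be sketched as capturing per-tensor statistics and bounding how far they may drift after a transformation. The stat names follow the list above; the tolerance logic and function names are illustrative assumptions:

```python
import math

def tensor_stats(values):
    """Capture the per-tensor statistics a canary records."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {"count": n, "mean": mean, "std": std,
            "min": min(values), "max": max(values)}

def check_drift(baseline, candidate, tolerance=0.05):
    """Pass if the mean drifts by no more than `tolerance` (relative)."""
    denom = abs(baseline["mean"]) or 1.0
    return abs(candidate["mean"] - baseline["mean"]) / denom <= tolerance

ref = tensor_stats([1.0, 2.0, 3.0, 4.0])
quant = tensor_stats([1.01, 1.99, 3.02, 3.98])
print(check_drift(ref, quant))  # True: well within 5% drift
```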

13. PROBAR - Visual Regression Testing

apr probar model.apr -o probar_output         # Create probar suite
apr probar model.apr -o output --format json  # JSON format

Exports model data for visual regression testing.

Help & Documentation

14. EXPLAIN - Get Explanations

apr explain E002                                            # Explain error code
apr explain --tensor encoder.conv1.weight                   # Explain tensor by convention
apr explain --tensor conv1 --file model.safetensors         # Look up in actual model
apr explain --file model.apr                                # Analyze architecture
apr explain --kernel llama                                  # Kernel pipeline for family
apr explain --kernel qwen2 --json                           # JSON output for tooling
apr explain --kernel /path/to/config.json --verbose         # Resolve from config.json
apr explain --kernel Qwen/Qwen2.5-Coder-0.5B-Instruct      # Resolve from HF repo
apr explain --kernel gemma --proof-status                   # Include proof status

Provides context-aware explanations for errors, tensors, model architectures, and kernel pipelines. When --file is provided with --tensor, looks up the tensor in the actual model via RosettaStone (supports APR, GGUF, SafeTensors). The --kernel flag explains which kernel equivalence class (A-F) a model uses, the architectural constraints that drive selection, and the kernel ops pipeline.

Interactive

15. TUI - Interactive Terminal UI

apr tui model.apr                          # Launch interactive UI

Interactive terminal interface for model exploration with four tabs:

Tab        Key   Description
Overview   1     Model metadata, hyperparameters, training info
Tensors    2     Tensor list with shapes, dtypes, sizes
Stats      3     Tensor statistics (mean, std, min, max, zeros, NaNs)
Help       ?     Keyboard shortcuts and navigation help

Keyboard Navigation:

  • 1, 2, 3, ? - Switch tabs directly
  • Tab / Shift+Tab - Cycle through tabs
  • j / ↓ - Next item in list
  • k / ↑ - Previous item in list
  • q / Esc - Quit

Inference (requires --features inference)

Build with inference support:

cargo build -p apr-cli --features inference

16. RUN - Run Model Inference

apr run model.apr --input "[1.0, 2.0]"       # JSON array input
apr run model.apr --input "1.0,2.0"          # CSV input
apr run model.apr --input "[1.0, 2.0]" --json  # JSON output

Runs inference on APR, SafeTensors, or GGUF models:

Format                       Inference Type
APR (.apr)                   Full ML inference via realizar
SafeTensors (.safetensors)   Tensor inspection
GGUF (.gguf)                 Model inspection (mmap)

Input Formats:

  • JSON array: "[1.0, 2.0, 3.0]"
  • CSV: "1.0,2.0,3.0"
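
Accepting both formats amounts to branching on the leading bracket. A minimal sketch of that parsing (illustrative, not apr's code):

```python
import json

def parse_input(text: str):
    """Accept either a JSON array ("[1.0, 2.0]") or CSV ("1.0,2.0")."""
    text = text.strip()
    if text.startswith("["):
        return [float(x) for x in json.loads(text)]
    return [float(x) for x in text.split(",")]

print(parse_input("[1.0, 2.0, 3.0]"))  # [1.0, 2.0, 3.0]
print(parse_input("1.0,2.0,3.0"))      # [1.0, 2.0, 3.0]
```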

17. SERVE - Inference Server and Capacity Planning

Serve Plan — Pre-flight capacity planning (no weights loaded):

# Plan from local file
apr serve plan model.gguf --gpu

# Plan from HuggingFace repo (fetches only ~2KB config.json)
apr serve plan hf://Qwen/Qwen2.5-Coder-1.5B-Instruct --gpu --quant Q4_K_M

# JSON output for CI/tooling
apr serve plan microsoft/phi-2 --gpu --format json

Serve Run — Start inference server:

apr serve run model.apr --port 8080              # Start on port 8080
apr serve run model.apr --host 0.0.0.0 --port 3000  # Bind to all interfaces

Starts a REST API server for model inference:

APR Models (full inference):

# Health check
curl http://localhost:8080/health

# Run inference
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"input": [1.0, 2.0]}'

Server Features:

  • /health - Health check endpoint
  • /predict - Inference endpoint (APR models)
  • /model - Model info endpoint (GGUF/SafeTensors)
  • /tensors - Tensor listing (SafeTensors)
  • Graceful shutdown via Ctrl+C

Chat & Comparison

18. CHAT - Interactive Chat (LLM models)

apr chat model.gguf                                          # Interactive chat
apr chat model.gguf --system "You are a helpful assistant"   # Custom system prompt

19. FLOW - Visualize Data Flow

apr flow model.safetensors            # Show data flow
apr flow model.gguf --json            # JSON output (architecture, groups)
apr flow model.apr --verbose           # Verbose with shapes

Detects architecture (Encoder-Decoder, Decoder-Only, Encoder-Only) and groups tensors by layer. Supports APR, GGUF, and SafeTensors.
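
Grouping tensors by layer can be sketched as extracting the layer index from each tensor name. The regex and the sample names below are assumptions chosen to illustrate the idea, not apr's detection logic:

```python
import re
from collections import defaultdict

def group_by_layer(tensor_names):
    """Group tensor names by layer index; non-layer tensors go to 'other'."""
    groups = defaultdict(list)
    for name in tensor_names:
        # Matches common conventions like "layers.0.", "blocks.3.", "h.12."
        m = re.search(r"\b(?:layers?|blocks?|h)\.(\d+)\.", name)
        groups[int(m.group(1)) if m else "other"].append(name)
    return dict(groups)

names = ["model.layers.0.self_attn.q_proj.weight",
         "model.layers.0.mlp.gate_proj.weight",
         "model.layers.1.self_attn.q_proj.weight",
         "model.embed_tokens.weight"]
print(group_by_layer(names))
```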

20. COMPARE-HF - Compare Against HuggingFace Source

apr compare-hf model.apr --hf openai/whisper-tiny              # APR format
apr compare-hf model.gguf --hf openai/whisper-tiny             # GGUF format
apr compare-hf model.safetensors --hf openai/whisper-tiny      # SafeTensors format
apr compare-hf model.apr --hf openai/whisper-tiny --json       # JSON output

Auto-detects local model format. Compares tensor-by-tensor against HuggingFace source.

HuggingFace Hub

21. PUBLISH - Push to HuggingFace Hub

apr publish model_dir/ org/model-name --dry-run

22. PULL - Download Model

apr pull hf://Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF -o ./models/

Benchmarking & QA

23. QA - Falsifiable QA Checklist

apr qa model.gguf                     # Run 8-gate QA checklist
apr qa model.gguf --json              # JSON output

24. QUALIFY - Cross-Subcommand Smoke Test

apr qualify model.gguf                              # Smoke test all 11 tools
apr qualify model.gguf --tier full                   # Full tier (+contracts +playbook)
apr qualify model.gguf --json                        # JSON output for CI
apr qualify model.gguf --skip validate,validate_quality  # Skip slow gates

Runs every diagnostic CLI tool against a model to verify no crashes. Three tiers: smoke (11 in-process gates), standard (+contract audit), full (+playbook check).

25. SHOWCASE - Performance Benchmark

apr showcase model.gguf --warmup 3 --iterations 10

26. PROFILE - Deep Performance Profiling

apr profile model.gguf --roofline

27. BENCH - Run Benchmarks

apr bench model.gguf --iterations 100

Example Output

Running the example creates demo models:

=== APR CLI Commands Demo ===

--- Part 1: Creating Demo Model ---
  Adding tensors...
  Model type: Linear Regression
  Tensors: 4
  Size: 1690 bytes
Created: /tmp/apr_cli_demo/demo_model.apr

--- Part 2: Creating Second Model (for diff) ---
  Model type: Linear Regression v2
  Tensors: 4
  Size: 1707 bytes
Created: /tmp/apr_cli_demo/demo_model_v2.apr

Use Cases

CI/CD Model Validation

# In CI pipeline
apr validate model.apr --strict --min-score 90 && apr lint model.apr
if [ $? -ne 0 ]; then
    echo "Model validation failed"
    exit 1
fi

Model Optimization Pipeline

# Quantize for production
apr convert model.apr --quantize int8 -o model-int8.apr

# Verify no regression
apr canary create model.apr --input test.wav --output canary.json
apr canary check model-int8.apr --canary canary.json

# Export for deployment
apr export model-int8.apr --format gguf -o model.gguf

Model Version Comparison

# Compare before/after optimization
apr diff original.apr quantized.apr --json | jq '.tensor_changes'

Debugging Inference Issues

# Layer-by-layer trace
apr trace model.apr --verbose | grep -i "nan\|inf"

# Drama mode for detailed analysis
apr debug model.apr --drama

Benefits

Benefit         Description
Standardized    Consistent CLI for all APR models
Comprehensive   27 commands cover the full lifecycle
Scriptable      JSON output for automation
Debuggable      Deep inspection with drama mode
Validatable     100-point QA with grades
Transformable   Quantization and format conversion
Testable        Canary regression testing
Inference       Run predictions and serve REST APIs