apr inspect
The apr inspect command reveals model structure, metadata, and tensor
statistics. For microGPT, the equivalent is cargo run --example inspect_model.
Model metadata
$ cargo run --example inspect_model
microGPT Model Inspection
══════════════════════════════════════════════════
Architecture:
Family: microGPT (character-level, 1-layer GPT)
Kernel Class: Custom (MHA + RMSNorm + ReLU)
Hidden Size: 16
Attention Heads: 4 (head_dim=4)
FFN Size: 64
Vocab Size: 27
Context Length: 16
Parameters: 4192
Compare with a production model:
$ apr inspect qwen-1.5b-q4k.apr
Architecture:
Family: qwen2
Hidden Size: 1536
Intermediate Size: 8960
Vocab Size: 151936
Max Position: 32768
RoPE Theta: 1000000
microGPT is ~360,000x smaller than Qwen-1.5B (1.5B / 4,192 ≈ 358K) but uses the same fundamental operations.
Tensor inventory
Equivalent to apr tensors <file> --stats:
Tensors (cf. `apr tensors`):
Name Shape Numel Stats
──────────────────────────────────────────────────────────────
wte (token embeddings) [27, 16] 432 μ=+0.010 σ=0.083
wpe (position embeddings) [16, 16] 256 μ=-0.010 σ=0.079
wq[0] (query proj head 0) [16, 4] 64 μ=-0.010 σ=0.081
...
w_fc1 (MLP up projection) [16, 64] 1024 μ=+0.000 σ=0.082
w_fc2 (MLP down projection) [64, 16] 1024 μ=+0.003 σ=0.079
w_lm (LM head) [16, 27] 432 μ=+0.003 σ=0.078
All weights are initialized from N(0, 0.08²), matching Karpathy’s
original Python implementation.
Kernel pipeline
Equivalent to apr explain --kernel:
┌─────────────────────────┬─────────────────────────────────┬─────────────────────────┐
│ Operation │ Implementation │ Contract │
├─────────────────────────┼─────────────────────────────────┼─────────────────────────┤
│ Embedding Lookup │ one_hot @ wte (differentiable) │ microgpt-v1 § one_hot │
│ RMSNorm │ x * (1/rms) per row │ microgpt-v1 § rms_norm │
│ Attention (4-head MHA) │ per-head Q/K/V/O matmul │ attention-kernel-v1 │
│ Causal Mask │ lower-tri 0, upper -1e9 │ microgpt-v1 § causal │
│ Softmax │ aprender autograd softmax │ softmax-kernel-v1 │
│ MLP (ReLU) │ fc1 → ReLU → fc2 │ activation-kernel-v1 │
│ LM Head │ matmul to vocab logits │ matmul-kernel-v1 │
└─────────────────────────┴─────────────────────────────────┴─────────────────────────┘
Every operation maps to a provable contract. The kernel pipeline for
a production LLaMA model (via apr explain --kernel llama) shows the
same structure at scale:
$ apr explain --kernel llama --proof-status
Kernel Pipeline (9 ops):
MatVec (Q4K) → matvec-kernel-v1 ◉ Tested
Softmax → softmax-kernel-v1 ◉ Tested
Attention (GQA) → element-wise-ops-v1 ◉ Tested
Normalization → normalization-kernel-v1 ◉ Tested
Activation → element-wise-ops-v1 ◉ Tested
MLP → element-wise-ops-v1 ◉ Tested
Position (RoPE) → rope-kernel-v1 ◉ Tested
Tensor roles
Equivalent to apr explain --tensor <name>:
| Tensor | Role |
|---|---|
wte | Token embedding — maps token IDs to dense vectors |
wpe | Position embedding — adds positional information |
wq[h] | Query projection in multi-head attention |
wk[h] | Key projection in multi-head attention |
wv[h] | Value projection in multi-head attention |
wo[h] | Output projection — combines head outputs |
w_fc1 | Feed-forward up projection (expand to 4x) |
w_fc2 | Feed-forward down projection (compress to 1x) |
w_lm | Language model head — projects to vocabulary logits |