APR Leaderboard Specification

Status: ACTIVE Version: 2.2.0 Date: 2026-03-22 Authors: APR Team

Quick Status

MetricValue
apr CLI subcommands verified19
Makefile targets45
Shell scripts10
YAML configs19 (7 models + 8 recipes + 1 eval + 2 pipeline + 1 data)
Python scripts0 (zero-Python constraint)
TOML configs0 (YAML-only)
Provable contracts5 (pass-at-k, decontamination, throughput, lora-algebra, quantization)
GPU sharing tests143 (entrenar, 9 modules)
HumanEval pass@1 (best 7B)87.20% (few-shot, 0.60pp from HF parity)
HumanEval pass@1 (best 32B)90.85% (standard, CPU batch)
MBPP pass@1 (best 7B)76.20% (standard + test assertions)
Perplexity (WikiText-2)6.63 (1.5B-Instruct Q4K)
ACs verified8 verified, 4 partial, 15 not tested, 2 blocked
Open issues6 (GH-8, GH-10, GH-11, GH-12, GH-13, GH-14)

See Implementation Status for detailed tracking.

Definitive spec: docs/specifications/leaderboard-spec.md — single executive summary with component files.