Provable Contracts (Design by Contract)
Every kernel in the pipeline MUST have a provable-contracts YAML contract binding it to its mathematical specification. This ensures the optimization techniques produce correct results, not just plausible ones.
16.0 Implementation Status
The provable-contracts crate is wired into apr-leaderboard as a path dependency (../provable-contracts/crates/provable-contracts). Contract validation is integrated into the acceptance --verify command:
# Validate all contracts in contracts/ directory
apr-leaderboard acceptance --verify
# Output:
# Acceptance Criteria Scaffold Verification:
# Scaffolded: 12/27
# Pending (needs real models): 11
# External (needs tooling): 4
#
# Contract validation:
# contracts/pass-at-k.yaml — 1 equations, 3 obligations
Wired APIs:
provable_contracts::schema::parse_contract(path)— Parse YAML contract filesprovable_contracts::schema::validate_contract(&contract)— Check equations, proof obligations, falsification testsprovable_contracts::error::Severity— Filter validation violations by severity
Current contracts (30 in contracts/ directory, all parsed by pv proof-status):
| Contract | Level | Obligs | Tests | Kani | Scope |
|---|---|---|---|---|---|
pass-at-k.yaml | L2 | 3 | 3 | 0 | Eval estimator (Chen et al.) |
inference-throughput.yaml | L2 | 2 | 2 | 0 | CPU/GPU throughput bounds |
decontamination.yaml | L2 | 3 | 3 | 0 | N-gram overlap gate |
distillation.yaml | L2 | 3 | 3 | 0 | Teacher→student quality (PMAT-007) |
lora-algebra.yaml | L2 | 3 | 3 | 0 | LoRA rank/merge math |
quantization.yaml | L2 | 3 | 3 | 0 | INT4/Q4K size + ordering |
dpo-alignment.yaml v2.0 | L1 | 6 | 5 | 0 | DPO e2e pipeline + MBPP target (PMAT-008) |
qlora-training-loop.yaml | L3 | 7 | 8 | 3 | Full training pipeline (§26) |
fused-cross-entropy.yaml | L3 | 4 | 5 | 2 | Chunked CE loss |
nf4-dequantization.yaml | L3 | 4 | 6 | 3 | NF4 codebook + roundtrip |
wgsl-gemm-tiled.yaml | L3 | 4 | 5 | 2 | CUTLASS-derived WGSL GEMM |
wgsl-transpose.yaml | L1 | 3 | 1 | 0 | GPU transpose shader |
gpu-output-norm.yaml | L2 | 3 | 3 | 0 | GPU-resident RMSNorm |
forward-pass-perf.yaml | L1 | 2 | 1 | 0 | Per-op layer timing |
lora-finetune-eval.yaml | L2 | 3 | 3 | 0 | Train→merge→eval (PMAT-008) |
merge-weight-norm.yaml v2.0 | L2 | 4 | 6 | 0 | SLERP/TIES norm + AC-024 (PMAT-010) |
leaderboard-gate.yaml | L2 | 3 | 3 | 0 | AC-022 compound gate |
preference-pairs.yaml | L1 | 4 | 3 | 0 | N-sampling→DPO pairs (PMAT-014) |
compile-binary.yaml | L2 | 3 | 3 | 0 | apr compile (AC-010/026) |
pipeline-validation.yaml | L2 | 3 | 3 | 0 | make verify/validate |
perplexity-baseline.yaml | L2 | 3 | 3 | 0 | WikiText-2 PPL (AC-002) |
tokenizer-preservation.yaml | L2 | 3 | 3 | 0 | GH-580/581 tokenizer in merge/quantize |
data-governance.yaml | L2 | 3 | 3 | 0 | Data catalog + lineage |
quantization-quality.yaml | L2 | 3 | 3 | 0 | INT4 pass@1 retention (AC-023) |
data-quality.yaml | L2 | 4 | 4 | 0 | Training data quality (AC-025) |
pruning-quality.yaml | L2 | 4 | 4 | 0 | Wanda pruning quality (AC-008) |
binding-coverage.yaml | L2 | 3 | 3 | 0 | Contract binding coverage (AC-012) |
hf-parity.yaml | L2 | 4 | 4 | 0 | HuggingFace parity gap (AC-014) |
ties-sign-resolution.yaml | L2 | 4 | 4 | 0 | TIES sign conflict resolution (AC-007) |
ft-completeness.yaml | L1 | 3 | 3 | 0 | All FTs pass — meta contract (AC-015) |
Totals: 101 proof obligations, 101 falsification tests, 10 Kani harnesses. Levels: L1=5, L2=20, L3=4.
Cross-project contracts (in ../provable-contracts/contracts/):
| Contract | Equations | Proof Obligations | Falsification Tests | Status |
|---|---|---|---|---|
gpu-multi-backend-parity-v1.yaml | 4 (multi_backend_parity, backend_priority, bandwidth_bound, jit_correctness) | 6 (parity, no garbage, determinism, wgpu, nvrtc, bandwidth) | 7 (F-MBP-001..007) | Active |
gpu-context-health-v1.yaml | 2 (fp8_guard, context_health) | 3 (FP8 disabled on Blackwell, no poison, Ada still enabled) | 3 (FT-GPU-CTX-001..003) | Verified |
ptx-target-parity-v1.yaml | 3 (target_parity, no_hardcoded, jit_success) | 4 (target match, no emit_ptx, kernels with_target, JIT success) | 5 (FALSIFY-PTP-001..005) | Violated on sm_121 |
gqa-kernel-v1.yaml | 1 (GQA formula) | 8 (normalization, MHA equiv, convex bound, KV broadcast, SIMD, GPU, head mapping) | 9 (FALSIFY-GQ-001..009) | Active |
16.0.1 Binding Coverage (AC-012)
Contract binding coverage tracks how many proof obligations have corresponding code implementations identified. See contracts/BINDING_REGISTRY.md for the full mapping.
| Metric | Current | Target |
|---|---|---|
| Obligations bound | 80/98 | 93/98 |
| Coverage | 81.6% | ≥95% |
| Gap | 18 unbound | 5 allowed |
Top unbound areas: TIES sign election (3), pruning eval pipeline (2), DPO pipeline (2), binding meta (2). See BINDING_REGISTRY.md for the priority list.
16.1 Contract Coverage Requirements
The leaderboard pipeline touches these kernel equivalence classes from the provable-contracts registry:
| Kernel Class | Contracts Required | Pipeline Stage |
|---|---|---|
| E (Qwen) | RMSNorm, SwiGLU, GQA, RoPE | Inference (eval, distill, chat) |
| Attention | attention-kernel-v1, flash-attention-kernel-v1 | Inference, distillation |
| Quantization | quantization-ordering-v1, q4k-q6k-superblock-v1 | apr quantize, QLoRA base weights |
| LoRA | lora-algebra-v1 | apr finetune --method lora/qlora |
| Softmax | softmax-kernel-v1 | Attention, sampling |
| Matmul | matmul-kernel-v1 | All linear layers |
| AdamW | adamw-kernel-v1 | Training optimizer |
16.2 Contract Verification Gates
Each pipeline stage MUST pass its contract obligations before proceeding:
# Verify all kernel contracts are bound and implemented
pv proof-status ../provable-contracts/contracts/ \
--binding ../provable-contracts/contracts/aprender/binding.yaml \
--format json
# Verify Qwen2 architecture contracts specifically
pv audit ../provable-contracts/contracts/model/qwen35-shapes-v1.yaml \
--binding ../provable-contracts/contracts/aprender/binding.yaml
# Run falsification tests for all pipeline-relevant kernels
cargo test --features kani -p aprender -- contract
16.3 Pipeline-Specific Proof Obligations
| Obligation | Property | Verification Level | Gate |
|---|---|---|---|
| PO-LB-001 | Distillation preserves architecture invariants | L2 (falsification) | Before apr distill |
| PO-LB-002 | Merge preserves tensor shape flow | L3 (proptest) | Before apr merge |
| PO-LB-003 | Prune maintains attention head structure | L2 (falsification) | Before apr prune |
| PO-LB-004 | Quantization ordering matches golden order §8 | L1 (type system) | Compile-time |
| PO-LB-005 | LoRA adapter rank ≤ hidden dim | L1 (Poka-Yoke) | Compile-time |
| PO-LB-006 | Q4K dequantize × quantize ≈ identity (CPU + wgpu) | L4 (Kani, bound=256) | CI |
| PO-LB-007 | Softmax normalization: sum(output) ≈ 1.0 (CPU + wgpu) | L4 (Kani, bound=16) | CI |
| PO-LB-008 | SLERP interpolation preserves weight norms | L3 (proptest) | Before apr merge --strategy slerp |
16.4 #[contract] Annotations
Every function in the apr-leaderboard pipeline that performs a mathematical operation MUST carry a #[contract] annotation linking it to its provable-contracts YAML:
#![allow(unused)] fn main() { use provable_contracts_macros::contract; #[contract("quantization-ordering-v1", equation = "quantize_int4")] pub fn quantize_model(model: &AprModel, scheme: QuantScheme) -> Result<AprModel> { // Implementation — contract macro enforces binding at compile time } #[contract("lora-algebra-v1", equation = "lora_forward")] pub fn lora_forward(base: &Tensor, a: &Tensor, b: &Tensor, scale: f32) -> Tensor { // output = base @ x + scale * (B @ (A @ x)) } }
If the binding is missing from contracts/aprender/binding.yaml, the build fails. Zero tolerance for unbound kernels.
16.5 Falsification Test Results
Tests run via make check-contracts (64 passed, 1 failed, updated 2026-04-03):
| Category | Tests | Status | Details |
|---|---|---|---|
| pass@k | 5 | PASS | FT-001..005 (boundary, ratio, high-c) |
| throughput | 2 | PASS | 2.5 tok/s, 385ms TTFT |
| benchmark data | 3 | PASS | HumanEval 164, MBPP 974, BCB 1140 |
| decontamination | 1 | PASS | 0% HE/MBPP overlap |
| eval results | 3 | PASS | 90.85% best, 15 runs, latest >= 80% |
| distillation | 2 | PASS | 32B > 7B, 11 categories |
| MBPP eval | 1 | PASS | 76.2% >= 70% |
| AC-022 gate | 1 | FAIL | HE=90.85% MBPP=76.2% < 80% |
| quantization | 3 | PASS | Q4K 35% FP16, apr check, golden ordering |
| distillation data | 3 | PASS | 99 completions, valid JSONL, 99 prompts |
| oracle analysis | 2 | PASS | 96.34% upper bound, 6 never-solved |
| pipeline | 3 | PASS | 24 scripts, 22 configs, 57 targets |
| compile | 1 | PASS | apr compile available |
| data catalog | 2 | PASS | 9 contract bindings, 13 datasets |
| leaderboard coverage | 2 | PASS | 20 eval runs, 2 benchmarks |
| HF parity | 1 | PASS | 3.05pp gap (apr=90.85%, HF=87.8%) |
| contract coverage | 1 | PASS | 29 contract YAMLs >= 25 |
| structure | 29 | PASS | All 29 contract YAMLs valid |
Makefile gate: make check-contracts — 68 passed, 1 failed (FT-GATE-001: MBPP 76.2% < 80%).
pv proof-status: 30/30 contracts parsed, 101 obligations, 101 tests, 10 Kani.
See contracts/CONTRACT_STATUS.md for full audit trail.