Provable Contracts (Design by Contract)

Every kernel in the pipeline MUST have a provable-contracts YAML contract binding it to its mathematical specification. This ensures the optimization techniques produce correct results, not just plausible ones.

16.0 Implementation Status

The provable-contracts crate is wired into apr-leaderboard as a path dependency (../provable-contracts/crates/provable-contracts). Contract validation is integrated into the acceptance --verify command:

# Validate all contracts in contracts/ directory
apr-leaderboard acceptance --verify
# Output:
#   Acceptance Criteria Scaffold Verification:
#     Scaffolded: 12/27
#     Pending (needs real models): 11
#     External (needs tooling): 4
#
#   Contract validation:
#     contracts/pass-at-k.yaml — 1 equations, 3 obligations

Wired APIs:

  • provable_contracts::schema::parse_contract(path) — Parse YAML contract files
  • provable_contracts::schema::validate_contract(&contract) — Check equations, proof obligations, falsification tests
  • provable_contracts::error::Severity — Filter validation violations by severity

Current contracts (30 in contracts/ directory, all parsed by pv proof-status):

ContractLevelObligsTestsKaniScope
pass-at-k.yamlL2330Eval estimator (Chen et al.)
inference-throughput.yamlL2220CPU/GPU throughput bounds
decontamination.yamlL2330N-gram overlap gate
distillation.yamlL2330Teacher→student quality (PMAT-007)
lora-algebra.yamlL2330LoRA rank/merge math
quantization.yamlL2330INT4/Q4K size + ordering
dpo-alignment.yaml v2.0L1650DPO e2e pipeline + MBPP target (PMAT-008)
qlora-training-loop.yamlL3783Full training pipeline (§26)
fused-cross-entropy.yamlL3452Chunked CE loss
nf4-dequantization.yamlL3463NF4 codebook + roundtrip
wgsl-gemm-tiled.yamlL3452CUTLASS-derived WGSL GEMM
wgsl-transpose.yamlL1310GPU transpose shader
gpu-output-norm.yamlL2330GPU-resident RMSNorm
forward-pass-perf.yamlL1210Per-op layer timing
lora-finetune-eval.yamlL2330Train→merge→eval (PMAT-008)
merge-weight-norm.yaml v2.0L2460SLERP/TIES norm + AC-024 (PMAT-010)
leaderboard-gate.yamlL2330AC-022 compound gate
preference-pairs.yamlL1430N-sampling→DPO pairs (PMAT-014)
compile-binary.yamlL2330apr compile (AC-010/026)
pipeline-validation.yamlL2330make verify/validate
perplexity-baseline.yamlL2330WikiText-2 PPL (AC-002)
tokenizer-preservation.yamlL2330GH-580/581 tokenizer in merge/quantize
data-governance.yamlL2330Data catalog + lineage
quantization-quality.yamlL2330INT4 pass@1 retention (AC-023)
data-quality.yamlL2440Training data quality (AC-025)
pruning-quality.yamlL2440Wanda pruning quality (AC-008)
binding-coverage.yamlL2330Contract binding coverage (AC-012)
hf-parity.yamlL2440HuggingFace parity gap (AC-014)
ties-sign-resolution.yamlL2440TIES sign conflict resolution (AC-007)
ft-completeness.yamlL1330All FTs pass — meta contract (AC-015)

Totals: 101 proof obligations, 101 falsification tests, 10 Kani harnesses. Levels: L1=5, L2=20, L3=4.

Cross-project contracts (in ../provable-contracts/contracts/):

ContractEquationsProof ObligationsFalsification TestsStatus
gpu-multi-backend-parity-v1.yaml4 (multi_backend_parity, backend_priority, bandwidth_bound, jit_correctness)6 (parity, no garbage, determinism, wgpu, nvrtc, bandwidth)7 (F-MBP-001..007)Active
gpu-context-health-v1.yaml2 (fp8_guard, context_health)3 (FP8 disabled on Blackwell, no poison, Ada still enabled)3 (FT-GPU-CTX-001..003)Verified
ptx-target-parity-v1.yaml3 (target_parity, no_hardcoded, jit_success)4 (target match, no emit_ptx, kernels with_target, JIT success)5 (FALSIFY-PTP-001..005)Violated on sm_121
gqa-kernel-v1.yaml1 (GQA formula)8 (normalization, MHA equiv, convex bound, KV broadcast, SIMD, GPU, head mapping)9 (FALSIFY-GQ-001..009)Active

16.0.1 Binding Coverage (AC-012)

Contract binding coverage tracks how many proof obligations have corresponding code implementations identified. See contracts/BINDING_REGISTRY.md for the full mapping.

MetricCurrentTarget
Obligations bound80/9893/98
Coverage81.6%≥95%
Gap18 unbound5 allowed

Top unbound areas: TIES sign election (3), pruning eval pipeline (2), DPO pipeline (2), binding meta (2). See BINDING_REGISTRY.md for the priority list.

16.1 Contract Coverage Requirements

The leaderboard pipeline touches these kernel equivalence classes from the provable-contracts registry:

Kernel ClassContracts RequiredPipeline Stage
E (Qwen)RMSNorm, SwiGLU, GQA, RoPEInference (eval, distill, chat)
Attentionattention-kernel-v1, flash-attention-kernel-v1Inference, distillation
Quantizationquantization-ordering-v1, q4k-q6k-superblock-v1apr quantize, QLoRA base weights
LoRAlora-algebra-v1apr finetune --method lora/qlora
Softmaxsoftmax-kernel-v1Attention, sampling
Matmulmatmul-kernel-v1All linear layers
AdamWadamw-kernel-v1Training optimizer

16.2 Contract Verification Gates

Each pipeline stage MUST pass its contract obligations before proceeding:

# Verify all kernel contracts are bound and implemented
pv proof-status ../provable-contracts/contracts/ \
    --binding ../provable-contracts/contracts/aprender/binding.yaml \
    --format json

# Verify Qwen2 architecture contracts specifically
pv audit ../provable-contracts/contracts/model/qwen35-shapes-v1.yaml \
    --binding ../provable-contracts/contracts/aprender/binding.yaml

# Run falsification tests for all pipeline-relevant kernels
cargo test --features kani -p aprender -- contract

16.3 Pipeline-Specific Proof Obligations

ObligationPropertyVerification LevelGate
PO-LB-001Distillation preserves architecture invariantsL2 (falsification)Before apr distill
PO-LB-002Merge preserves tensor shape flowL3 (proptest)Before apr merge
PO-LB-003Prune maintains attention head structureL2 (falsification)Before apr prune
PO-LB-004Quantization ordering matches golden order §8L1 (type system)Compile-time
PO-LB-005LoRA adapter rank ≤ hidden dimL1 (Poka-Yoke)Compile-time
PO-LB-006Q4K dequantize × quantize ≈ identity (CPU + wgpu)L4 (Kani, bound=256)CI
PO-LB-007Softmax normalization: sum(output) ≈ 1.0 (CPU + wgpu)L4 (Kani, bound=16)CI
PO-LB-008SLERP interpolation preserves weight normsL3 (proptest)Before apr merge --strategy slerp

16.4 #[contract] Annotations

Every function in the apr-leaderboard pipeline that performs a mathematical operation MUST carry a #[contract] annotation linking it to its provable-contracts YAML:

#![allow(unused)]
fn main() {
use provable_contracts_macros::contract;

#[contract("quantization-ordering-v1", equation = "quantize_int4")]
pub fn quantize_model(model: &AprModel, scheme: QuantScheme) -> Result<AprModel> {
    // Implementation — contract macro enforces binding at compile time
}

#[contract("lora-algebra-v1", equation = "lora_forward")]
pub fn lora_forward(base: &Tensor, a: &Tensor, b: &Tensor, scale: f32) -> Tensor {
    // output = base @ x + scale * (B @ (A @ x))
}
}

If the binding is missing from contracts/aprender/binding.yaml, the build fails. Zero tolerance for unbound kernels.

16.5 Falsification Test Results

Tests run via make check-contracts (64 passed, 1 failed, updated 2026-04-03):

CategoryTestsStatusDetails
pass@k5PASSFT-001..005 (boundary, ratio, high-c)
throughput2PASS2.5 tok/s, 385ms TTFT
benchmark data3PASSHumanEval 164, MBPP 974, BCB 1140
decontamination1PASS0% HE/MBPP overlap
eval results3PASS90.85% best, 15 runs, latest >= 80%
distillation2PASS32B > 7B, 11 categories
MBPP eval1PASS76.2% >= 70%
AC-022 gate1FAILHE=90.85% MBPP=76.2% < 80%
quantization3PASSQ4K 35% FP16, apr check, golden ordering
distillation data3PASS99 completions, valid JSONL, 99 prompts
oracle analysis2PASS96.34% upper bound, 6 never-solved
pipeline3PASS24 scripts, 22 configs, 57 targets
compile1PASSapr compile available
data catalog2PASS9 contract bindings, 13 datasets
leaderboard coverage2PASS20 eval runs, 2 benchmarks
HF parity1PASS3.05pp gap (apr=90.85%, HF=87.8%)
contract coverage1PASS29 contract YAMLs >= 25
structure29PASSAll 29 contract YAMLs valid

Makefile gate: make check-contracts68 passed, 1 failed (FT-GATE-001: MBPP 76.2% < 80%).

pv proof-status: 30/30 contracts parsed, 101 obligations, 101 tests, 10 Kani.

See contracts/CONTRACT_STATUS.md for full audit trail.