14. Batuta Falsification Checklist

14.1 108-Item Popperian Assessment

The Albor project itself is subject to batuta’s 108-item falsification checklist:

# Full assessment
batuta falsify . --verbose --format markdown --output docs/falsification-report.md

# Critical-only (blocks release)
batuta falsify . --critical-only

# CI-friendly output
batuta falsify . --format github-actions --min-grade kaizen-required

14.2 Key Sections Applied to Albor

Section 1: Sovereign Data Governance (SDG)

  • All training data has documented provenance (HuggingFace commit SHAs)
  • No PII in training corpus (alimentar quality check)
  • Data residency: all data stored on owned hardware (lambda + intel)
  • Teacher model license verified (Apache 2.0)
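As a rough illustration of the provenance item above, a check could flag any dataset whose revision is not pinned to a full commit SHA. The `DatasetProvenance` struct, its field names, and both helper functions are hypothetical and do not reflect the alimentar or batuta schema. The sketch assumes a full commit SHA is a 40-character hexadecimal git hash.

```rust
/// Hypothetical provenance record for one training dataset.
/// Field names are illustrative, not the schema of alimentar or batuta.
#[derive(Debug, Clone)]
pub struct DatasetProvenance {
    pub name: String,
    /// Commit SHA pinning the exact dataset revision.
    pub commit_sha: String,
    pub license: String,
}

/// Returns true when `sha` looks like a full 40-character hex git commit hash.
pub fn is_full_commit_sha(sha: &str) -> bool {
    sha.len() == 40 && sha.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Collects the names of datasets whose revision is not pinned to a full SHA
/// (for example a branch name such as "main", which can move over time).
pub fn unpinned_datasets(records: &[DatasetProvenance]) -> Vec<&str> {
    records
        .iter()
        .filter(|r| !is_full_commit_sha(&r.commit_sha))
        .map(|r| r.name.as_str())
        .collect()
}
```

An empty result from `unpinned_datasets` would mean every listed dataset is pinned to an exact revision. It says nothing about PII or residency, which need their own checks.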

Section 3: Hypothesis-Driven Development (HDD)

  • Each improvement stage has a falsifiable hypothesis:
    • “Distillation improves avg benchmark by >5%” (FALSIFY-ALBOR-005)
    • “Pruning at 50% sparsity degrades benchmarks by <2%” (FALSIFY-ALBOR-008)
    • “Q4 quantization degrades perplexity by <5%” (FALSIFY-ALBOR-009)
  • Reproducibility standard: Gold (deterministic seeds, versioned data, BLAKE3 checkpoint hashes, Cargo.lock pinning)
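Each hypothesis above can be read as a threshold on a measured change. The sketch below shows one way to turn such a claim into a pass/fail verdict. It assumes the percentages are relative changes against a baseline, which the text does not state. The `Claim`, `Verdict`, `improvement_pct`, and `evaluate` names are illustrative and are not part of batuta.

```rust
/// A falsifiable claim about how a metric moves relative to a baseline.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Claim {
    /// Relative improvement must be strictly greater than the threshold (percent).
    ImprovesByMoreThan(f64),
    /// Relative degradation must be strictly less than the threshold (percent).
    DegradesByLessThan(f64),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Corroborated,
    Falsified,
}

/// Signed relative improvement in percent; positive means the candidate is better.
/// `higher_is_better` is true for benchmark accuracy and false for perplexity.
pub fn improvement_pct(baseline: f64, candidate: f64, higher_is_better: bool) -> f64 {
    let change = (candidate - baseline) / baseline * 100.0;
    if higher_is_better { change } else { -change }
}

/// Compares the measured improvement against the claim's threshold.
pub fn evaluate(claim: Claim, improvement_pct: f64) -> Verdict {
    let holds = match claim {
        Claim::ImprovesByMoreThan(t) => improvement_pct > t,
        // Degradation is the negated improvement.
        Claim::DegradesByLessThan(t) => -improvement_pct < t,
    };
    if holds { Verdict::Corroborated } else { Verdict::Falsified }
}
```

Under these assumptions, a benchmark average that moves from 40.0 to 43.0 would corroborate a ">5%" improvement claim, while perplexity that rises from 20.0 to 21.5 would falsify a "<5%" degradation claim. These numbers are made up for illustration and are not Albor results.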

Section 4: Numerical Reproducibility (NR)

  • Float determinism enforced via fixed seeds and operator ordering
  • Cross-platform consistency: checkpoint trained on lambda loads on intel
  • SIMD parity: all kernels have provable-contracts SIMD equivalence obligations
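Operator ordering matters because floating-point addition is not associative: the same values summed in a different order can differ in the last bit. The sketch below illustrates this with plain `f64` sums. The function names are illustrative and are not taken from Albor's kernels.

```rust
/// Sums in a fixed left-to-right order, so the result is bit-identical
/// across runs for the same input.
pub fn sum_in_order(xs: &[f64]) -> f64 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Sums the same values right-to-left, standing in for a reduction whose
/// order changed (for example a different thread or SIMD lane schedule).
pub fn sum_reversed(xs: &[f64]) -> f64 {
    xs.iter().rev().fold(0.0, |acc, &x| acc + x)
}

/// Bitwise equality, the comparison a reproducibility check would
/// typically want for floats.
pub fn bit_identical(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}
```

For the input `[0.1, 0.2, 0.3]` the two orders produce results that are not bit-identical, while repeating `sum_in_order` always reproduces the same bits. A scalar and a SIMD kernel can only be compared bit for bit if both commit to the same reduction order.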

Section 5: Performance & Waste Elimination (PW)

  • Seven Wastes (Muda) applied to training pipeline:

    • No redundant data copies (alimentar streaming)
    • No idle GPU time (pre-computed teacher logits)
    • No over-processing (progressive model sizing: 50M → 125M → 350M)

Section 6: Safety & Formal Verification (SF)

  • Critical kernels have Kani proofs (softmax, attention, cross-entropy)
  • New kernels (KD loss, gradient accumulation) get Kani harnesses
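For readers unfamiliar with Kani, a proof harness is an ordinary Rust function marked `#[kani::proof]` that asserts a property over symbolic inputs. The sketch below uses a hypothetical `row_max` helper, the max-subtraction step of a numerically stable softmax, which is not Albor's actual kernel. The harness module is gated behind `cfg(kani)`, so it is compiled only when Kani runs.

```rust
/// Hypothetical helper: the maximum of a row, as used by the
/// max-subtraction step of a numerically stable softmax.
/// Returns negative infinity for an empty row.
pub fn row_max(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(f32::NEG_INFINITY, f32::max)
}

#[cfg(kani)]
mod verification {
    use super::*;

    /// For any 4 finite inputs, the result is finite and bounds every element.
    #[kani::proof]
    #[kani::unwind(5)]
    fn row_max_bounds_every_element() {
        // Symbolic inputs: Kani explores all values that satisfy the assumptions.
        let xs: [f32; 4] = kani::any();
        for x in xs.iter() {
            kani::assume(x.is_finite());
        }
        let m = row_max(&xs);
        for x in xs.iter() {
            assert!(m >= *x);
        }
        assert!(m.is_finite());
    }
}
```

The fixed length of 4 keeps the proof bounded. A real harness would choose sizes and assumptions that match the kernel's documented contract.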

Section 10: Architectural Invariants (AI) — CRITICAL

  • AI-01: All model operations use apr (no manual weight manipulation)
  • AI-02: Every checkpoint is BLAKE3-hashed and version-tracked
  • AI-03: Training config is immutable once committed (no runtime overrides)
  • AI-04: Eval results are reproducible (fixed seed, deterministic batching)
  • AI-05: No undeclared dependencies (Cargo.lock enforced)
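AI-04 depends on batch composition being a pure function of the seed. A minimal sketch of that idea follows, using a seeded Fisher-Yates shuffle. The SplitMix64 generator is a dependency-free stand-in chosen for this example, and `deterministic_batches` is a hypothetical name. Neither is claimed to be what Albor uses.

```rust
/// SplitMix64: a small PRNG used here only as a dependency-free stand-in.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Shuffles sample indices 0..n with a seeded Fisher-Yates pass, then splits
/// them into batches. The same (n, batch_size, seed) always yields the same
/// batches. The modulo introduces a slight bias, acceptable for a sketch.
pub fn deterministic_batches(n: usize, batch_size: usize, seed: u64) -> Vec<Vec<usize>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut idx: Vec<usize> = (0..n).collect();
    let mut rng = SplitMix64::new(seed);
    for i in (1..n).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }
    idx.chunks(batch_size).map(|c| c.to_vec()).collect()
}
```

Two calls with the same arguments return identical batches, so an eval run can be replayed exactly as long as the seed is recorded alongside the results.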

14.3 Current Grade

Perfect Score: 100.0% (108/108 PASS) — achieved 2026-03-04.

This meets the Toyota Standard (90-100%) target at the top of its range:

  • All 5 Critical items pass (Section 10)
  • All Major items pass
  • All Minor items pass
  • Zero PARTIAL, zero FAIL
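The arithmetic behind the grade can be sketched as follows. This assumes the score is simply the share of items that PASS and that PARTIAL earns no credit. batuta's actual weighting is not described here, so treat `score_pct` and `meets_toyota_standard` as illustrative names only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Partial,
    Fail,
}

/// Percentage of items that passed. Assumption: PARTIAL and FAIL both count
/// as zero; the real checklist may weight them differently.
pub fn score_pct(outcomes: &[Outcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let passed = outcomes.iter().filter(|o| **o == Outcome::Pass).count();
    passed as f64 / outcomes.len() as f64 * 100.0
}

/// The Toyota Standard band quoted above is 90-100%.
pub fn meets_toyota_standard(score: f64) -> bool {
    score >= 90.0
}
```

With 108 items all passing, `score_pct` returns 100.0, matching the reported 108/108 result.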

Score progression across 14 MLOps survey batches: 34% → 100% (see entrenar/docs/specifications/world-class-mlops-survey.md).