Milestone history
CCPA's work is organized as a continuous sequence of M-rows (milestone-rows) tracked in docs/specifications/milestones-*.md. Each M-row is one substantive deliverable (a PR, a fixture, a finding) with its own scope and acceptance criteria.
High-level phases
| Phase | M-row range | What it shipped |
|---|---|---|
| Phase 1 (RECORD) — out-of-scope post-M222 | M0-M14 | original HTTPS-proxy recording path; rescoped to subprocess-driver |
| Phase 2 (REPLAY) | M15-M50 | trace schema, replayer, mock harness, hook+skill projection |
| Phase 3 (DISTILL — function-scale) | M51-M100 | MultiPL-E-Rust HumanEval bench, function-scale parity measurement (n=5, 1.0000) |
| Phase 4 (project-scale prep) | M101-M150 | fixture authoring for project-scale; differ enhancements; bidirectional sensitivity |
| Phase 5 (ARENA — project-scale) | M150-M234 | Arena runner, calibration-and-scale corpus, first arena scores |
| Phase 6 (UNDER-CONTRACT) | M250-M294 | compliance-enforced dispatch, V1_004 chain, Coder-finetune-distribution finding |
Notable M-rows
- M9 — regression corpus added (bidirectional sensitivity)
- M15 — schema v2 (hook_event + skill_invocation)
- M16 — `FALSIFY-CCPA-007` hard-blocking corpus coverage gate
- M150 — first measured function-scale parity (n=5, 1.0000)
- M194-M210 — Arena runner Phase 5 P5.1-P5.5
- M222 — RECORD path out-of-scope directive (rescope to subprocess-driver only)
- M230 — `FALSIFY-CCPA-008` flipped to ADVISORY after M196-M224 four-bug-stack revealed meter under-sensitivity
- M234 — Popperian-falsification of static-fixture as project-scale predictor (claude 1/5, apr code 0/5)
- M236 — `FALSIFY-CCPA-019` (calibration_required_before_verdict) introduced
- M280 — Phase 6 CCPA project SUSPENSION declared (1.5B model below testability floor)
- M286 — M32d MoE KV cache shipped (19× speedup; unblocks V1_004)
- M287 — greedy baseline pattern; uniform `driver_error` on 30B-Coder
- M291 — sub-bench B pattern shift; `driver_error→oracle_failed_after_max_turns`
- M292 — `ArenaOutcome::AgentTextLoop` detector (Gap 3 closure)
- M293 — `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS` env var wiring
- M294 — finetune-distribution A/B; non-Coder Qwen3-30B-A3B-Instruct-2507 confirmed at smoke level
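To make the M292 and M293 entries concrete, here is a minimal sketch of how a consecutive-text-turn detector could be driven by an environment variable. Only the variable name `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS` comes from the list above. The default of 3, the `"text"`/`"tool"` turn labels, and both function names are assumptions made for illustration and are not taken from the CCPA code base.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical fallback: the source does not state a default value.
DEFAULT_MAX_TEXT_TURNS = 3


def max_text_turns(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the limit from the environment, falling back to the assumed default."""
    env = os.environ if env is None else env
    raw = env.get("PHASE6_MAX_CONSECUTIVE_TEXT_TURNS")
    if raw is None:
        return DEFAULT_MAX_TEXT_TURNS
    value = int(raw)
    if value < 1:
        raise ValueError("limit must be a positive integer")
    return value


def detect_text_loop(turn_kinds: Iterable[str], limit: int) -> bool:
    """Return True once `limit` text-only turns occur back to back.

    Any non-"text" turn (assumed here to be a tool call) resets the streak.
    """
    streak = 0
    for kind in turn_kinds:
        streak = streak + 1 if kind == "text" else 0
        if streak >= limit:
            return True
    return False
```

Under these assumptions, a run whose turns are `["text", "tool", "text", "text", "text"]` trips the detector at a limit of 3, while `["text", "text", "tool", "text", "text"]` does not, because the tool turn resets the streak.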
How M-rows are tracked
Each M-row gets a row in docs/specifications/milestones-mNNN-mMMM.md. The row body explains:
- What was shipped
- Why (motivation, prior M-row references)
- Acceptance criteria (tests, evidence, contract entries)
- Cross-references (PR numbers, evidence file paths)
A doc-drift detector (scripts/check-doc-drift.sh) asserts that the milestone counters on five cross-reference surfaces (README, CONTRIBUTING, top spec, status-snapshots, milestones doc) all agree.
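The agreement check can be sketched in a few lines. This is an illustration of the idea only, not the contents of scripts/check-doc-drift.sh: it assumes that a surface's "milestone counter" is the highest `M<number>` token it mentions, and the names `latest_milestone` and `check_drift` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: a counter appears as a standalone token such as "M294".
COUNTER_RE = re.compile(r"\bM(\d+)\b")


def latest_milestone(text: str) -> Optional[int]:
    """Return the highest M-number mentioned in `text`, or None if there is none."""
    numbers = [int(n) for n in COUNTER_RE.findall(text)]
    return max(numbers) if numbers else None


def check_drift(surfaces: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Map each surface name to its counter and raise if the counters disagree."""
    counters = {name: latest_milestone(body) for name, body in surfaces.items()}
    if len(set(counters.values())) > 1:
        raise ValueError(f"milestone counter drift: {counters}")
    return counters
```

A caller would read the five surfaces from disk and pass their contents in as `surfaces`. Keeping file access outside the check makes the comparison easy to test with in-memory strings.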
Operator-coordinated vs autonomous M-rows
- Autonomous — anything that doesn't require operator-only data (compute budget, model-class decision, contract amendment). The autonomous ship-cycle (per `CLAUDE.md`) ships these continuously without check-in.
- Operator-coordinated — anything that needs operator-only data: dispatching benches, deciding model class, amending contract gates. The substantive→mechanical→substantive cadence pauses ONLY for these.