CCPA — The Claude Code Parity Harness
A record-replay-distill harness measuring apr code against Claude Code at the action-stream level.
This book is the reference companion to the claude-code-parity-apr repository. It explains the methodology, the falsifier gates, the empirical findings, and the architectural decisions that shape every measurement.
Why this exists
A sovereign, locally-hosted coding agent (apr code) needs an honest, falsifiable yardstick to measure itself against the industry baseline (Claude Code). Without a rigorous yardstick:
- "It works" claims drift from "it works like the reference"
- Regressions hide behind narrative
- The compliance posture of code an agent emits has no contract gate
CCPA closes that gap with three commitments:
- Contract-first. Every behavior gate (FALSIFY-CCPA-001..020) is encoded as a falsifiable assertion in a YAML contract before code lands. Tests prove the gate; pv validate proves the contract; pmat comply proves the project's compliance posture. No code ships without a contract.
- Two complementary measurement paths. A static path (authored teacher/student trace pairs scored by a deterministic differ) validates the meter. An Arena path (multi-turn live dispatches of real claude + real apr code against real Rust fixtures with test-shaped oracles) validates the system. The two paths cross-falsify each other.
- Empirical calibration. Every Arena verdict requires a fresh bidirectional-sensitivity calibration on file (FALSIFY-CCPA-019). Static-fixture parity is calibrated against project-scale Arena reality; any drift between them is recorded and explained.
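To make the static path concrete, here is a minimal sketch of what "scoring a teacher/student trace pair with a deterministic differ" could look like. It is an illustration only: the Action shape, the parity function, and the use of Python's difflib are assumptions made for this example, not the repository's actual schema or scoring rule.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Action:
    """One step in an action stream (hypothetical shape, not the repo's schema)."""
    tool: str    # e.g. "read", "edit", "shell"
    target: str  # e.g. a file path or a command line


def parity(teacher: list[Action], student: list[Action]) -> float:
    """Return a similarity score in [0, 1] between two action traces.

    Uses difflib's ratio 2*M/T, where M is the number of matched actions
    and T is the combined length of both traces. The same inputs always
    produce the same score, which is the property a deterministic differ needs.
    """
    if not teacher and not student:
        return 1.0  # two empty traces are trivially identical
    matcher = SequenceMatcher(a=teacher, b=student, autojunk=False)
    return matcher.ratio()


# Illustrative traces (invented for this sketch, not measured data).
teacher = [
    Action("read", "src/lib.rs"),
    Action("edit", "src/lib.rs"),
    Action("shell", "cargo test"),
]
student = [
    Action("read", "src/lib.rs"),
    Action("shell", "cargo test"),
]

print(parity(teacher, teacher))  # 1.0
print(parity(teacher, student))  # 0.8 (2 matched actions out of 5 total)
```

Because the score depends only on the two authored traces, it can be rerun at any time with the same result. That is what lets the static path check the meter independently of live model behavior.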
Honest framing
At function-scale (single-prompt code generation on HumanEval-style fixtures), claude and apr code are functionally interchangeable — both pass each other's tests (1.0000 parity, n=5, M150).
At project-scale (multi-turn Arena with real GitHub-issue fixtures), the static-fixture approach is Popperian-falsified as a project-scale predictor: claude solves 1/5 and apr code solves 0/5 on the phase-5 corpus (M234). The direction agrees with the static verdict, but the magnitudes diverge.
The empirical chain in this book — M1 → M294 — is the honest record of what we measured, when, and how confident we are. Negative results are evidence; this book treats them as such.
Status as of writing
- Contract v1.32.0 — 20 gates registered (16 ACTIVE_RUNTIME, 4 PROPOSED)
- M0 → M294 all SHIPPED
- Phase 6 under-contract dispatch is in active, operator-coordinated bench cycles against Qwen3-30B-A3B-Instruct-2507
- V1_004 (Phase 6 non-zero student pass rate) is the open gate
How to read this book
- Want the methodology in 10 minutes? → What is CCPA? + Methodology
- Want to add a fixture or run a bench? → CLI reference
- Want the empirical story (the interesting part)? → V1_004 chain
- Want the academic basis? → Academic basis
License
Apache-2.0 OR MIT. See the repository root.