CCPA — The Claude Code Parity Harness


A record-replay-distill harness measuring apr code against Claude Code at the action-stream level.

This book is the reference companion to the claude-code-parity-apr repository. It explains the methodology, the falsifier gates, the empirical findings, and the architectural decisions that shape every measurement.

Why this exists

A sovereign, locally hosted coding agent (apr code) needs an honest, falsifiable yardstick for measuring itself against the industry baseline (Claude Code). Without a rigorous yardstick:

"It works" claims drift from "it works like the reference"
Regressions hide behind narrative
The compliance posture of code an agent emits has no contract gate

CCPA closes that gap with three commitments:

Contract-first. Every behavior gate (FALSIFY-CCPA-001..020) is encoded as a falsifiable assertion in a YAML contract before code lands. Tests prove the gate; pv validate proves the contract; pmat comply proves the project's compliance posture. No code ships without a contract.
Two complementary measurement paths. A static path, in which authored teacher/student trace pairs are scored by a deterministic differ, validates the meter. An Arena path, in which live multi-turn dispatches of the real claude and apr code agents run against real Rust fixtures and are judged by test-shaped oracles, validates the system. The two paths cross-falsify each other.
Empirical calibration. Every Arena verdict requires a fresh bidirectional-sensitivity calibration on file (FALSIFY-CCPA-019). Static-fixture parity is calibrated against project-scale Arena reality; any drift between them is recorded and explained.
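The static path's meter can be pictured with a small sketch. Everything in it is an assumption rather than the harness's actual differ: it models a trace as a list of (kind, target) action pairs, uses Python's difflib.SequenceMatcher as a stand-in deterministic scoring rule, and its hypothetical calibrate helper is one possible reading of a bidirectional-sensitivity check (accept an identical pair, reject a known-divergent pair in both argument orders). It is not the FALSIFY-CCPA-019 procedure itself.

```python
from difflib import SequenceMatcher

# Assumption: one action is a (kind, target) pair such as ("edit", "src/lib.rs").
# The harness's real trace schema is not described in this chapter.
Action = tuple[str, str]


def parity_score(teacher: list[Action], student: list[Action]) -> float:
    """Deterministic similarity in [0, 1] between two action streams.

    SequenceMatcher.ratio() is a stand-in scoring rule: for fixed inputs it
    always returns the same value, which is the property a meter needs.
    autojunk=False stops long streams from having frequent actions ignored.
    """
    return SequenceMatcher(a=teacher, b=student, autojunk=False).ratio()


def calibrate(reference: list[Action], perturbed: list[Action], threshold: float = 0.9) -> bool:
    """Hypothetical sensitivity check on the meter itself.

    The meter must score an identical copy at or above the threshold, and
    score a known-divergent stream below it in both argument orders.
    """
    accepts_identical = parity_score(reference, list(reference)) >= threshold
    rejects_divergent = (
        parity_score(reference, perturbed) < threshold
        and parity_score(perturbed, reference) < threshold
    )
    return accepts_identical and rejects_divergent


# Illustrative streams: the student skips the edit step.
teacher = [("read", "src/lib.rs"), ("edit", "src/lib.rs"), ("run", "cargo test")]
student = [("read", "src/lib.rs"), ("run", "cargo test")]
print(parity_score(teacher, student))  # 0.8
print(calibrate(teacher, student))     # True
```

Here two of five total actions match on each side, so the ratio is 2 * 2 / 5 = 0.8, below the illustrative 0.9 threshold. Passing an identical stream as the "perturbed" input makes calibrate return False, which is the failure a sensitivity check exists to catch.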

Honest framing

At function-scale (single-prompt code generation on HumanEval-style fixtures), claude and apr code are functionally interchangeable — both pass each other's tests (1.0000 parity, n=5, M150).

At project-scale (multi-turn Arena runs on real GitHub-issue fixtures), the static-fixture approach is Popperian-falsified as a project-scale predictor: on the phase-5 corpus, claude solves 1/5 fixtures and apr code 0/5 (M234). The direction agrees with the static verdict, but the magnitudes diverge.
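Both Arena counts come from five fixtures, so it helps to see how little n=5 pins down. The sketch below is an illustration only, not a method this harness is documented to use: it computes a 95% Wilson score interval (z = 1.96) for each pass count reported for M234.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Arena pass counts on the phase-5 corpus (M234).
claude_lo, claude_hi = wilson_interval(1, 5)
apr_lo, apr_hi = wilson_interval(0, 5)
print(f"claude   1/5 = 0.20, 95% CI [{claude_lo:.3f}, {claude_hi:.3f}]")
print(f"apr code 0/5 = 0.00, 95% CI [{apr_lo:.3f}, {apr_hi:.3f}]")
```

The intervals come out near [0.036, 0.624] and [0.000, 0.434]. They overlap, which fits reading M234 as a directional signal rather than a precise measurement of the gap.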

The empirical chain in this book — M1 → M294 — is the honest record of what we measured, when, and how confident we are. Negative results are evidence; this book treats them as such.

Status as of writing

Contract v1.32.0 — 20 gates registered (16 ACTIVE_RUNTIME, 4 PROPOSED)
M0 → M294 all SHIPPED
Phase 6 under-contract dispatch is running in active, operator-coordinated bench cycles against Qwen3-30B-A3B-Instruct-2507
V1_004 (Phase 6 non-zero student pass rate) is the open gate
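The gate counts in the status list can be checked mechanically. A minimal sketch, assuming a registry that maps each gate ID in the FALSIFY-CCPA-001..020 range to one status string. The real contract is YAML checked by pv validate, and its schema is not shown here; check_registry and gate_ids are hypothetical helpers.

```python
from collections import Counter

# Status names taken from the contract summary; the real contract may define more.
KNOWN_STATUSES = {"ACTIVE_RUNTIME", "PROPOSED"}


def gate_ids(first: int = 1, last: int = 20) -> list[str]:
    """Expand the FALSIFY-CCPA-001..020 range into individual gate IDs."""
    return [f"FALSIFY-CCPA-{i:03d}" for i in range(first, last + 1)]


def check_registry(registry: dict[str, str], expected: dict[str, int]) -> Counter:
    """Verify the registry covers exactly the gates in the range, uses only
    known statuses, and matches the per-status counts in the summary line."""
    ids = set(gate_ids())
    if set(registry) != ids:
        missing = sorted(ids - set(registry))
        extra = sorted(set(registry) - ids)
        raise ValueError(f"missing={missing} extra={extra}")
    unknown = {g: s for g, s in registry.items() if s not in KNOWN_STATUSES}
    if unknown:
        raise ValueError(f"unknown statuses: {unknown}")
    counts = Counter(registry.values())
    if dict(counts) != expected:
        raise ValueError(f"summary mismatch: {dict(counts)} != {expected}")
    return counts


# Illustration only: this chapter does not say which gates are PROPOSED,
# so assigning the last four IDs to PROPOSED is arbitrary.
statuses = ["ACTIVE_RUNTIME"] * 16 + ["PROPOSED"] * 4
registry = dict(zip(gate_ids(), statuses))
print(check_registry(registry, {"ACTIVE_RUNTIME": 16, "PROPOSED": 4}))
```

A check of this shape fails loudly if a gate is dropped, an unknown status appears, or the summary line drifts from the registry.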

How to read this book

Want the methodology in 10 minutes? → What is CCPA? + Methodology
Want to add a fixture or run a bench? → CLI reference
Want the empirical story (the interesting part)? → V1_004 chain
Want the academic basis? → Academic basis

License

Apache-2.0 OR MIT. See the repository root.