CCPA — The Claude Code Parity Harness

CCPA — record-replay-distill harness measuring claude vs apr code

A record-replay-distill harness measuring apr code against Claude Code at the action-stream level.

This book is the reference companion to the claude-code-parity-apr repository. It explains the methodology, the falsifier gates, the empirical findings, and the architectural decisions that shape every measurement.

Why this exists

A sovereign, locally-hosted coding agent (apr code) needs an honest, falsifiable yardstick to measure itself against the industry baseline (Claude Code). Without a rigorous yardstick:

  • "It works" claims drift from "it works like the reference"
  • Regressions hide behind narrative
  • The compliance posture of code an agent emits has no contract gate

CCPA closes that gap with three commitments:

  1. Contract-first. Every behavior gate (FALSIFY-CCPA-001..020) is encoded as a falsifiable assertion in a YAML contract before code lands. Tests prove the gate; pv validate proves the contract; pmat comply proves the project's compliance posture. No code ships without a contract.

  2. Two complementary measurement paths. A static path — authored teacher/student trace pairs scored by a deterministic differ — validates the meter. An Arena path — multi-turn live dispatches of real claude + real apr code against real Rust fixtures with test-shaped oracles — validates the system. The two paths cross-falsify each other.

  3. Empirical calibration. Every Arena verdict requires a fresh bidirectional-sensitivity calibration on file (FALSIFY-CCPA-019). Static-fixture parity is calibrated against project-scale Arena reality; any drift between them is recorded and explained.

Honest framing

At function-scale (single-prompt code generation on HumanEval-style fixtures), claude and apr code are functionally interchangeable — both pass each other's tests (1.0000 parity, n=5, M150).

At project-scale (multi-turn Arena with real GitHub-issue fixtures), the static-fixture approach is Popperian-falsified as a project-scale predictor: claude solves 1/5, apr code 0/5 on phase-5 corpus (M234). Direction agrees with static verdict, magnitudes diverge.

The empirical chain in this book — M1 → M294 — is the honest record of what we measured, when, and how confident we are. Negative results are evidence; this book treats them as such.

Status as of writing

  • Contract v1.32.0 — 20 gates registered (16 ACTIVE_RUNTIME, 4 PROPOSED)
  • M0 → M294 all SHIPPED
  • Phase 6 under-contract dispatch in active operator-coordinated bench cycles against Qwen3-30B-A3B-Instruct-2507
  • V1_004 (Phase 6 non-zero student pass rate) is the open gate

How to read this book

License

Apache-2.0 OR MIT. See the repository root.