Behavioral parity gates

These gates define what `apr code` ↔ `claude` parity means. Each gate is a falsifiable assertion about action-stream equivalence between the two systems.

CCPA-001 — `trace_schema_roundtrip`

Asserts: every trace record kind serializes → parses → re-serializes → equals the original.

Why: a lossy schema would silently drop information that downstream parity computation depends on. Catches schema-bump regressions.

Tests: 17 pin tests in crates/ccpa-trace/tests/falsify_ccpa_001_roundtrip.rs.
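A minimal sketch of the round-trip property. It assumes a toy tab-separated encoding, not whatever format ccpa-trace actually uses. `TraceRecord`, its three variants, `serialize`, `parse`, and `roundtrips` are hypothetical names chosen for illustration.

```rust
// Hypothetical, simplified record kinds. The real schema lives in
// crates/ccpa-trace and is not reproduced here.
#[derive(Debug, Clone, PartialEq)]
enum TraceRecord {
    ToolCall { tool: String, input: String },
    ToolResult { ok: bool, output: String },
    Message { text: String },
}

// Toy tab-separated encoding for this sketch only; fields are assumed
// to contain no tab characters.
fn serialize(r: &TraceRecord) -> String {
    match r {
        TraceRecord::ToolCall { tool, input } => format!("call\t{tool}\t{input}"),
        TraceRecord::ToolResult { ok, output } => format!("result\t{ok}\t{output}"),
        TraceRecord::Message { text } => format!("msg\t{text}"),
    }
}

fn parse(s: &str) -> Option<TraceRecord> {
    let parts: Vec<&str> = s.split('\t').collect();
    match parts.as_slice() {
        ["call", tool, input] => Some(TraceRecord::ToolCall {
            tool: tool.to_string(),
            input: input.to_string(),
        }),
        ["result", ok, output] => Some(TraceRecord::ToolResult {
            ok: ok.parse().ok()?,
            output: output.to_string(),
        }),
        ["msg", text] => Some(TraceRecord::Message { text: text.to_string() }),
        _ => None,
    }
}

// The CCPA-001 property: serialize -> parse -> re-serialize must give back
// both the original value and the original bytes.
fn roundtrips(r: &TraceRecord) -> bool {
    let first = serialize(r);
    match parse(&first) {
        Some(parsed) => parsed == *r && serialize(&parsed) == first,
        None => false,
    }
}

fn main() {
    // One sample per record kind, mirroring "every trace record kind".
    let samples = [
        TraceRecord::ToolCall { tool: "Bash".into(), input: "ls -la".into() },
        TraceRecord::ToolResult { ok: true, output: "file.txt".into() },
        TraceRecord::Message { text: "done".into() },
    ];
    for s in &samples {
        assert!(roundtrips(s), "lossy roundtrip for {s:?}");
    }
    println!("all {} record kinds roundtrip", samples.len());
}
```

The byte comparison also catches a serializer whose output is not stable across calls (for example, one that iterates a `HashMap`), which a value-only comparison would not.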

CCPA-002 — `replay_determinism`

Asserts: replaying a recorded trace through ccpa-replayer::MockHarness produces byte-identical output across runs.

Why: nondeterminism in the replay path would invalidate any parity claim. Catches hidden time/random/PID dependencies.

Tests: 16 tests in crates/ccpa-replayer/.
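A sketch of what "byte-identical across runs" means in practice, assuming a replay that emits its output as bytes. `RecordedEvent`, `replay`, and `is_deterministic` are hypothetical stand-ins, not the ccpa-replayer API. The output depends only on the trace: no timestamps, PIDs, or random IDs, and per-tool counters live in a `BTreeMap` because `HashMap`'s default hasher is randomly seeded, so its iteration order can differ between runs.

```rust
use std::collections::BTreeMap;

// Hypothetical recorded event: a tool name plus the canned result the
// mock returns for it.
struct RecordedEvent {
    tool: &'static str,
    canned_output: &'static str,
}

// Stand-in for a mock replay. Every emitted byte is derived from the
// trace itself (step index, tool name, canned output).
fn replay(trace: &[RecordedEvent]) -> Vec<u8> {
    // Ordered map: iteration order is fixed by key, not by a random seed.
    let mut calls_per_tool: BTreeMap<&str, usize> = BTreeMap::new();
    let mut out = Vec::new();
    for (i, ev) in trace.iter().enumerate() {
        *calls_per_tool.entry(ev.tool).or_insert(0) += 1;
        out.extend_from_slice(format!("{i}:{}:{}\n", ev.tool, ev.canned_output).as_bytes());
    }
    for (tool, n) in &calls_per_tool {
        out.extend_from_slice(format!("summary:{tool}:{n}\n").as_bytes());
    }
    out
}

// The CCPA-002 property, checked over `runs` repeated replays.
fn is_deterministic(trace: &[RecordedEvent], runs: usize) -> bool {
    let first = replay(trace);
    (1..runs).all(|_| replay(trace) == first)
}

fn main() {
    let trace = [
        RecordedEvent { tool: "Read", canned_output: "fn main() {}" },
        RecordedEvent { tool: "Bash", canned_output: "ok" },
        RecordedEvent { tool: "Read", canned_output: "fn main() {}" },
    ];
    assert!(is_deterministic(&trace, 10));
    print!("{}", String::from_utf8(replay(&trace)).unwrap());
}
```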

CCPA-003 — `mock_completeness`

Asserts: the MockHarness covers every tool kind defined in the schema.

Why: an incomplete mock means some real-world traces can't be replayed. Catches gaps when new tools are added.
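One way to make mock completeness partly structural, assuming the mock is written in Rust: an exhaustive `match` with no wildcard arm. `ToolKind`, `ALL_TOOLS`, `mock_response`, and `mock_is_complete` are hypothetical, and the four variants are an illustrative subset, not the schema's actual tool list.

```rust
// Hypothetical subset of tool kinds; the real set is whatever the CCPA
// trace schema defines.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolKind {
    Bash,
    Read,
    Write,
    Edit,
}

// Hand-maintained list of every variant, used by the runtime check below.
const ALL_TOOLS: [ToolKind; 4] = [ToolKind::Bash, ToolKind::Read, ToolKind::Write, ToolKind::Edit];

// Canned mock response per tool. The match deliberately has no `_` arm:
// adding a variant to ToolKind turns a missing mock into a compile error
// (non-exhaustive patterns) instead of a replay-time gap.
fn mock_response(tool: ToolKind) -> &'static str {
    match tool {
        ToolKind::Bash => "exit_code=0",
        ToolKind::Read => "<file contents>",
        ToolKind::Write => "wrote file",
        ToolKind::Edit => "applied edit",
    }
}

// Runtime half of the check: every listed tool has a non-empty response.
fn mock_is_complete() -> bool {
    ALL_TOOLS.iter().all(|t| !mock_response(*t).is_empty())
}

fn main() {
    assert!(mock_is_complete());
    for t in ALL_TOOLS {
        println!("{t:?} -> {}", mock_response(t));
    }
}
```

The compiler checks the `match`, but not `ALL_TOOLS`: a variant added to the enum and forgotten in the array still compiles, so a test comparing the array's length with the schema's tool count would still be needed.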

CCPA-004 — `tool_call_equivalence`

Asserts: per-tool equivalence rules are deterministic, total functions over (teacher.input, student.input) pairs.

Why: this is the heart of the parity score. If the equivalence rule for, say, Bash has a bug, the score is meaningless.

Tests: 36 tests in crates/ccpa-differ/tests/falsify_ccpa_004_tool_equivalence.rs. One test per (tool, equivalence-class) pair.
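A sketch of one per-tool rule as a deterministic, total function over `(teacher.input, student.input)`. The `BashClass` classes and the whitespace-collapsing rule are hypothetical illustrations, not the rules in ccpa-differ.

```rust
// Hypothetical equivalence classes for Bash tool calls.
#[derive(Debug, PartialEq, Eq)]
enum BashClass {
    Exact,
    WhitespaceOnly,
    Different,
}

// Collapse runs of whitespace to a single space and trim both ends.
fn normalize_ws(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Total: defined for every (teacher, student) pair and never panics.
// Deterministic: the result depends only on the two arguments.
fn classify_bash(teacher: &str, student: &str) -> BashClass {
    if teacher == student {
        BashClass::Exact
    } else if normalize_ws(teacher) == normalize_ws(student) {
        BashClass::WhitespaceOnly
    } else {
        BashClass::Different
    }
}

fn main() {
    let pairs = [
        ("git status", "git status"),
        ("git  status ", "git status"),
        ("git status", "git diff"),
    ];
    for (teacher, student) in pairs {
        println!("{teacher:?} vs {student:?}: {:?}", classify_bash(teacher, student));
    }
}
```

Collapsing whitespace is only safe outside quotes (`echo "a  b"` and `echo "a b"` print different things), so a production Bash rule would likely need shell-aware tokenization. With one test per (tool, class) pair, this sketch would need three Bash tests.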

CCPA-005 — `file_mutation_equivalence`

Asserts: a Write and an Edit that produce the same post-state file SHA256 are equivalent at the file-mutation level.

Why: enables the differ to recognize "same effect, different tool" as equivalent at the file level (separately from the action-stream level).

Tests: 15 tests in crates/ccpa-differ/tests/falsify_ccpa_005_file_mutation.rs.
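A sketch of file-level equivalence between a Write and an Edit. The source compares post-state SHA256 digests. The Rust standard library has no SHA256, so this sketch compares post-state bytes directly: identical bytes always give identical digests, and for SHA256 the converse holds barring a collision. `Mutation`, `apply`, and `file_level_equivalent` are hypothetical, and Edit is modeled (an assumption) as replacing the first occurrence of `old` and failing if `old` is absent.

```rust
// Hypothetical, simplified mutation model.
enum Mutation<'a> {
    Write { content: &'a str },
    Edit { old: &'a str, new: &'a str },
}

// Returns the post-state file contents, or None if the mutation cannot apply.
fn apply(before: &str, m: &Mutation<'_>) -> Option<String> {
    match m {
        Mutation::Write { content } => Some(content.to_string()),
        Mutation::Edit { old, new } => {
            if old.is_empty() || !before.contains(*old) {
                return None;
            }
            Some(before.replacen(*old, new, 1))
        }
    }
}

// Same effect, possibly different tool: equal post-state bytes.
fn file_level_equivalent(before: &str, a: &Mutation<'_>, b: &Mutation<'_>) -> bool {
    match (apply(before, a), apply(before, b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    let before = "fn main() {}\n";
    let write = Mutation::Write { content: "fn main() { run(); }\n" };
    let edit = Mutation::Edit { old: "{}", new: "{ run(); }" };
    println!("equivalent: {}", file_level_equivalent(before, &write, &edit));
}
```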

CCPA-006 — `sovereignty_on_replay`

Asserts: Tier3 SovereigntyViolation fires deterministically on any trace that performs a network egress to a non-localhost endpoint outside the allow-list, OR reads a credential-bearing env var.

Why: the sovereignty contract is the hardest gate. False negatives here are catastrophic.

Tests: 10 tests in crates/ccpa-differ/tests/falsify_ccpa_006_sovereignty.rs.
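A sketch of the two CCPA-006 triggers as a pure function over a trace, which is what makes the check fire deterministically. The `Event` model, the localhost set, and especially the substring heuristic for "credential-bearing" env vars are hypothetical; the sketch returns the index of the first violating event and does not model the Tier3 naming.

```rust
// Hypothetical event model: only the event types CCPA-006 cares about.
enum Event<'a> {
    NetworkEgress { host: &'a str },
    EnvRead { name: &'a str },
    Other,
}

// Endpoints treated as local; egress to these never violates.
const LOCAL_HOSTS: [&str; 3] = ["localhost", "127.0.0.1", "::1"];
// Hypothetical heuristic for "credential-bearing": substring match on the
// upper-cased variable name. A real rule would likely use an explicit list.
const CREDENTIAL_MARKERS: [&str; 4] = ["TOKEN", "SECRET", "PASSWORD", "API_KEY"];

fn is_violation(ev: &Event<'_>, allow_list: &[&str]) -> bool {
    match ev {
        // Non-localhost egress outside the allow-list.
        Event::NetworkEgress { host } => {
            !LOCAL_HOSTS.contains(host) && !allow_list.contains(host)
        }
        // Read of a credential-bearing environment variable.
        Event::EnvRead { name } => {
            let upper = name.to_ascii_uppercase();
            CREDENTIAL_MARKERS.iter().any(|marker| upper.contains(marker))
        }
        Event::Other => false,
    }
}

// Pure function of (trace, allow_list): the same inputs always give the
// same answer.
fn first_violation(trace: &[Event<'_>], allow_list: &[&str]) -> Option<usize> {
    trace.iter().position(|ev| is_violation(ev, allow_list))
}

fn main() {
    let allow_list = ["models.internal.example"];
    let trace = [
        Event::NetworkEgress { host: "localhost" },
        Event::Other,
        Event::EnvRead { name: "HOME" },
        Event::EnvRead { name: "GITHUB_TOKEN" },
    ];
    match first_violation(&trace, &allow_list) {
        Some(i) => println!("SovereigntyViolation at event {i}"),
        None => println!("no violation"),
    }
}
```

Because false negatives are the catastrophic case here, a deny-by-default shape (anything not local and not allow-listed fires) is the safer direction to err in.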

CCPA-007 — `corpus_coverage` (HARD-BLOCKING since M16)

Asserts: every required row of apr-code-parity-v1.yaml has at least one fixture exercising it.

Why: prevents the meter from being declared valid when it covers only a curated subset of the parity surface. New rows in apr-code-parity-v1.yaml MUST come with a fixture.

Tests: 15 tests, plus a per-PR CI run of `ccpa coverage --apr-code-parity-yaml ... --oos-rows ...`.
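A sketch of the set arithmetic behind a coverage gate: required rows, minus rows some fixture exercises, minus exempt rows. It assumes, from the flag name alone, that `--oos-rows` lists out-of-scope rows exempt from the check. `uncovered_rows` and the row IDs are hypothetical.

```rust
use std::collections::BTreeSet;

// Required rows that no fixture exercises and that are not exempt, in
// sorted order so the CI message is stable across runs.
fn uncovered_rows(required: &[&str], fixture_rows: &[&[&str]], oos_rows: &[&str]) -> Vec<String> {
    let covered: BTreeSet<&str> = fixture_rows
        .iter()
        .flat_map(|rows| rows.iter().copied())
        .collect();
    let oos: BTreeSet<&str> = oos_rows.iter().copied().collect();
    let missing: BTreeSet<&str> = required
        .iter()
        .copied()
        .filter(|r| !covered.contains(r) && !oos.contains(r))
        .collect();
    missing.into_iter().map(String::from).collect()
}

fn main() {
    let required = ["R-001", "R-002", "R-003"];
    let fixture_a: &[&str] = &["R-001"];
    let fixture_b: &[&str] = &["R-001", "R-003"];
    let missing = uncovered_rows(&required, &[fixture_a, fixture_b], &[]);
    if missing.is_empty() {
        println!("coverage ok");
    } else {
        // A hard-blocking gate would fail the build here.
        println!("uncovered required rows: {missing:?}");
    }
}
```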

CCPA-008 — `parity_score_bound` (ADVISORY, M230)

Asserts: the aggregate parity score over the canonical corpus is ≥ the threshold (currently 0.95).

Why: the differ's output IS the parity score; this is the corpus-level acceptance bound.

Status: ADVISORY since M230. The threshold was relaxed because the M196-M224 stack of four bugs revealed that "always 1.0 on canonical" was actually evidence of meter under-sensitivity, not of perfect performance.

Tests: 24 tests in crates/ccpa-differ/tests/falsify_ccpa_008_parity_score.rs.
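A sketch of the corpus-level bound. The source fixes only the threshold (0.95) and the advisory status; the arithmetic-mean aggregation and the names `Verdict`, `aggregate`, and `check` are assumptions made for illustration.

```rust
// Outcome of the corpus-level check. Because CCPA-008 is advisory, a miss
// is reported as a warning rather than failing the build.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    AdvisoryWarn,
}

// Assumed aggregation: arithmetic mean of per-fixture scores in [0, 1].
fn aggregate(scores: &[f64]) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    Some(scores.iter().sum::<f64>() / scores.len() as f64)
}

// An empty corpus provides no evidence, so it is treated as a warning.
fn check(scores: &[f64], threshold: f64) -> Verdict {
    match aggregate(scores) {
        Some(s) if s >= threshold => Verdict::Pass,
        _ => Verdict::AdvisoryWarn,
    }
}

const THRESHOLD: f64 = 0.95;

fn main() {
    let scores = [1.0, 0.97, 0.92];
    println!("{:?} -> {:?}", aggregate(&scores), check(&scores, THRESHOLD));
}
```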
