Behavioral parity gates

These gates define what parity between apr code and claude means. Each one is a falsifiable assertion about the action-stream equivalence between the two systems.

CCPA-001 — trace_schema_roundtrip

Asserts: every trace record kind serializes → parses → re-serializes → equals the original.

Why: a lossy schema would silently drop information that downstream parity computation depends on. Catches schema-bump regressions.

Tests: 17 pin tests in crates/ccpa-trace/tests/falsify_ccpa_001_roundtrip.rs.
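The roundtrip property can be sketched as follows. This is a minimal illustration, not the ccpa-trace schema: the Record enum, the tab-separated wire format, and the function names are all hypothetical.

```rust
/// Hypothetical trace record kinds; the real schema lives in ccpa-trace.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    ToolCall { tool: String, input: String },
    Exit { code: i32 },
}

/// Serialize to a tab-separated line (illustrative wire format only;
/// it assumes the tool name itself contains no tab).
fn serialize(r: &Record) -> String {
    match r {
        Record::ToolCall { tool, input } => format!("tool_call\t{}\t{}", tool, input),
        Record::Exit { code } => format!("exit\t{}", code),
    }
}

/// Parse a line produced by `serialize`; returns None on unknown kinds.
fn parse(s: &str) -> Option<Record> {
    let mut parts = s.splitn(3, '\t');
    match parts.next()? {
        "tool_call" => Some(Record::ToolCall {
            tool: parts.next()?.to_string(),
            input: parts.next()?.to_string(),
        }),
        "exit" => Some(Record::Exit { code: parts.next()?.parse().ok()? }),
        _ => None,
    }
}

/// The gate's property: serialize, parse, re-serialize, compare with the original.
fn roundtrips(r: &Record) -> bool {
    let first = serialize(r);
    match parse(&first) {
        Some(back) => back == *r && serialize(&back) == first,
        None => false,
    }
}
```

A pin test would then call roundtrips once per record kind, so that adding a kind without a matching parse arm fails immediately.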

CCPA-002 — replay_determinism

Asserts: replaying a recorded trace through ccpa-replayer::MockHarness produces byte-identical output across runs.

Why: nondeterminism in the replay path would invalidate any parity claim. Catches hidden time/random/PID dependencies.

Tests: 16 tests in crates/ccpa-replayer/.
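A determinism check of this kind can be sketched as below. The helper is_deterministic and the stand-in replay_trace are hypothetical; the sketch does not use the actual MockHarness API.

```rust
/// Run a replay function `runs` times and report whether every output
/// is byte-identical to the first one.
fn is_deterministic<F>(mut replay: F, runs: usize) -> bool
where
    F: FnMut() -> Vec<u8>,
{
    let first = replay();
    (1..runs).all(|_| replay() == first)
}

/// Hypothetical stand-in for a mock replay: the output depends only on the trace.
fn replay_trace(trace: &[&str]) -> Vec<u8> {
    trace.join("\n").into_bytes()
}
```

A replay whose output depends on hidden state (a clock, a counter, a PID) fails this check on the second run, which is the class of dependency the gate is meant to catch.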

CCPA-003 — mock_completeness

Asserts: the MockHarness covers every tool kind defined in the schema.

Why: an incomplete mock means some real-world traces can't be replayed. Catches gaps when new tools are added.
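One way to make such a gap visible, sketched under assumptions: the ToolKind enum below is a hypothetical four-tool list (Read is not named in this document), and the function names are illustrative.

```rust
/// Hypothetical tool kinds; the real list comes from the trace schema.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolKind {
    Bash,
    Read,
    Write,
    Edit,
}

const ALL_TOOL_KINDS: [ToolKind; 4] =
    [ToolKind::Bash, ToolKind::Read, ToolKind::Write, ToolKind::Edit];

/// A mock that handles every kind. The match has no wildcard arm, so adding
/// a variant to ToolKind is a compile error until it is handled here.
fn full_mock_handles(kind: ToolKind) -> bool {
    match kind {
        ToolKind::Bash | ToolKind::Read | ToolKind::Write | ToolKind::Edit => true,
    }
}

/// List the tool kinds a given mock does not handle.
fn uncovered_kinds(handles: impl Fn(ToolKind) -> bool) -> Vec<ToolKind> {
    ALL_TOOL_KINDS.iter().copied().filter(|k| !handles(*k)).collect()
}
```

The gate passes when uncovered_kinds returns an empty list for the mock under test.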

CCPA-004 — tool_call_equivalence

Asserts: per-tool equivalence rules are deterministic, total functions over (teacher.input, student.input) pairs.

Why: the heart of the parity score. If the equivalence rule for Bash (say) has a bug, the score is meaningless.

Tests: 36 tests in crates/ccpa-differ/tests/falsify_ccpa_004_tool_equivalence.rs. One test per (tool, equivalence-class) pair.
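As a rough sketch of what "deterministic, total" means here: the function below returns a verdict for every possible pair of inputs and depends on nothing but its arguments. The specific rules (whitespace normalization for Bash, exact match for Write) are assumptions chosen for illustration, not the rules implemented in ccpa-differ.

```rust
/// Hypothetical subset of tools with per-tool rules.
enum Tool {
    Bash,
    Write,
}

/// Collapse runs of whitespace and trim both ends.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Total: every (teacher, student) pair gets a verdict, no panics.
/// Deterministic: no clock, randomness, or global state is consulted.
fn equivalent(tool: Tool, teacher_input: &str, student_input: &str) -> bool {
    match tool {
        // Illustrative rule only; real shell equivalence must respect quoting.
        Tool::Bash => normalize_ws(teacher_input) == normalize_ws(student_input),
        // Illustrative rule: file content must match byte for byte.
        Tool::Write => teacher_input == student_input,
    }
}
```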

CCPA-005 — file_mutation_equivalence

Asserts: a Write and an Edit that produce the same post-state file SHA256 are equivalent at the file-mutation level.

Why: enables the differ to recognize "same effect, different tool" as equivalent at the file level (separately from the action-stream level).

Tests: 15 tests in crates/ccpa-differ/tests/falsify_ccpa_005_file_mutation.rs.
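The idea can be sketched with an in-memory file. The Mutation type and its semantics are hypothetical, and the sketch compares post-state contents directly where the gate compares SHA256 digests (identical bytes imply identical digests, and the standard library has no SHA256).

```rust
/// Hypothetical file mutations applied to a single file.
enum Mutation {
    /// Replace the whole file.
    Write { content: String },
    /// Replace the first occurrence of `old` with `new`.
    Edit { old: String, new: String },
}

/// Compute the post-state of a file given its pre-state and one mutation.
fn apply(pre: &str, m: &Mutation) -> String {
    match m {
        Mutation::Write { content } => content.clone(),
        Mutation::Edit { old, new } => pre.replacen(old.as_str(), new.as_str(), 1),
    }
}

/// Equivalent at the file-mutation level when both post-states are identical,
/// regardless of which tool produced them.
fn same_post_state(pre: &str, teacher: &Mutation, student: &Mutation) -> bool {
    apply(pre, teacher) == apply(pre, student)
}
```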

CCPA-006 — sovereignty_on_replay

Asserts: Tier3 SovereigntyViolation fires deterministically on any trace that performs a network egress to a non-localhost endpoint outside the allow-list, OR reads a credential-bearing env var.

Why: the sovereignty contract is the hardest gate. False negatives here are catastrophic.

Tests: 10 tests in crates/ccpa-differ/tests/falsify_ccpa_006_sovereignty.rs.
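The two triggers in the assertion can be sketched as a pure predicate over a trace. Everything here is an assumption for illustration: the Event type, the localhost spellings, and in particular the substring heuristic for credential-bearing variable names.

```rust
/// Hypothetical trace events relevant to the sovereignty check.
enum Event {
    NetEgress { host: String },
    EnvRead { var: String },
    Other,
}

fn is_localhost(host: &str) -> bool {
    matches!(host, "localhost" | "127.0.0.1" | "::1")
}

/// Assumed heuristic: a variable is credential-bearing if its name
/// contains one of these markers.
fn looks_like_credential(var: &str) -> bool {
    let upper = var.to_ascii_uppercase();
    ["TOKEN", "SECRET", "API_KEY", "PASSWORD"].iter().any(|m| upper.contains(*m))
}

/// True if any event is an egress to a non-localhost host outside the
/// allow-list, or a read of a credential-bearing env var.
fn violates(trace: &[Event], allow_list: &[&str]) -> bool {
    trace.iter().any(|e| match e {
        Event::NetEgress { host } => {
            !is_localhost(host) && !allow_list.iter().any(|a| *a == host.as_str())
        }
        Event::EnvRead { var } => looks_like_credential(var),
        Event::Other => false,
    })
}
```

Because the predicate is a pure function of the trace and the allow-list, the same trace always yields the same verdict, which is the determinism the gate asserts.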

CCPA-007 — corpus_coverage (HARD-BLOCKING since M16)

Asserts: every required-row of apr-code-parity-v1.yaml has at least one fixture exercising it.

Why: prevents the meter from being valid only on a curated subset of the parity surface. New rows in apr-code-parity-v1.yaml MUST come with a fixture.

Tests: 15 tests, plus a per-PR CI run of ccpa coverage --apr-code-parity-yaml ... --oos-rows ....
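The coverage computation amounts to a set difference. The sketch below assumes rows are identified by string ids and that the --oos-rows flag names rows excluded as out of scope; both are guesses, and the function is not the ccpa coverage implementation.

```rust
use std::collections::BTreeSet;

/// Return the required rows that no fixture exercises and that are not
/// explicitly excluded. An empty result means the gate passes.
fn uncovered_rows<'a>(
    required: &[&'a str],
    fixture_rows: &[&str],
    out_of_scope: &[&str],
) -> Vec<&'a str> {
    let covered: BTreeSet<&str> = fixture_rows.iter().copied().collect();
    let excluded: BTreeSet<&str> = out_of_scope.iter().copied().collect();
    required
        .iter()
        .copied()
        .filter(|r| !covered.contains(*r) && !excluded.contains(*r))
        .collect()
}
```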

CCPA-008 — parity_score_bound (ADVISORY, M230)

Asserts: canonical corpus aggregate parity score ≥ threshold (currently ≥ 0.95).

Why: the differ's output IS the parity score; this is the corpus-level acceptance bound.

Status: ADVISORY since M230. The gate was downgraded from blocking because the M196-M224 four-bug stack revealed that "always 1.0 on canonical" was evidence of meter under-sensitivity, not of perfect performance.

Tests: 24 tests in crates/ccpa-differ/tests/falsify_ccpa_008_parity_score.rs.
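A threshold gate with an advisory mode might look like the following sketch. It assumes the aggregate is the arithmetic mean of per-fixture scores, which this document does not state; GateOutcome and the function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum GateOutcome {
    Pass,
    /// Below threshold, but the gate is advisory: report without blocking.
    Warn,
    Fail,
}

/// Assumed aggregate: arithmetic mean of per-fixture scores.
fn aggregate(scores: &[f64]) -> Option<f64> {
    if scores.is_empty() {
        None
    } else {
        Some(scores.iter().sum::<f64>() / scores.len() as f64)
    }
}

/// An empty corpus never passes; advisory mode turns a failure into a warning.
fn parity_score_gate(scores: &[f64], threshold: f64, advisory: bool) -> GateOutcome {
    match aggregate(scores) {
        Some(score) if score >= threshold => GateOutcome::Pass,
        _ if advisory => GateOutcome::Warn,
        _ => GateOutcome::Fail,
    }
}
```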

CCPA-013 — first_recorded_parity_score (DISCHARGED)

Asserts: a first measured aggregate parity score on the canonical corpus exists, dated, with n and aggregate recorded.

Status: DISCHARGED. fixtures/canonical/measured-parity.json (n=30, aggregate=1.0000).

CCPA-014 — os_event_parity_bound

Asserts: OS-level events (file opens, process spawns, stat calls) recorded on teacher and student match, modulo an allow-list of permitted nondeterminism.

Why: catches "same tool input, different OS effects" drift.
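A comparison "modulo an allow-list" can be sketched as: drop allowed events from both sides, then compare what remains. The string event encoding, the prefix-based allow-list, and the order-insensitive comparison are all assumptions made for this sketch.

```rust
/// Drop events matching any allowed prefix, then sort so that ordering
/// differences between the two recordings are ignored (an assumption).
fn filtered_sorted<'a>(events: &[&'a str], allowed_prefixes: &[&str]) -> Vec<&'a str> {
    let mut kept: Vec<&'a str> = events
        .iter()
        .copied()
        .filter(|e| !allowed_prefixes.iter().any(|p| e.starts_with(*p)))
        .collect();
    kept.sort_unstable();
    kept
}

/// Teacher and student match when their filtered event lists are equal.
fn os_events_match(teacher: &[&str], student: &[&str], allowed_prefixes: &[&str]) -> bool {
    filtered_sorted(teacher, allowed_prefixes) == filtered_sorted(student, allowed_prefixes)
}
```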

CCPA-015 — os_trace_output_purity

Asserts: subprocess stdout/stderr captures are byte-pure (no PID injection, no timestamp jitter introduced by the capture machinery).

Why: if the capture itself adds nondeterminism, every downstream comparison is wrong.

CCPA-016 — outcome_parity_bound

Asserts: per-fixture oracle_passed outcomes agree at corpus-level rate ≥ threshold.

Why: outcome parity (did both systems solve the bug?) is the project-scale analog of action parity. Necessary for the M234 Popperian-falsification claim to be sharp.
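The corpus-level agreement rate can be sketched as below, assuming each fixture yields one (teacher, student) pair of oracle_passed booleans; the function name is hypothetical.

```rust
/// Fraction of fixtures on which teacher and student oracle outcomes agree.
/// Returns None for an empty corpus, where no rate is defined.
fn outcome_agreement(pairs: &[(bool, bool)]) -> Option<f64> {
    if pairs.is_empty() {
        return None;
    }
    let agree = pairs.iter().filter(|(teacher, student)| teacher == student).count();
    Some(agree as f64 / pairs.len() as f64)
}
```

Note that two systems failing the same fixture count as agreement under this definition; the rate measures parity, not success.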

CCPA-017 — project_scale_parity_bound (PROPOSED, v1.28.0)

Asserts: project-scale Arena verdict on phase-5 corpus must match the static-fixture verdict in direction (not magnitude).

Why: M234 showed magnitudes diverge (1.0 vs 0.0 / 0.0); direction agreement (claude > apr code) is the falsifiable part.
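Direction agreement, as opposed to magnitude agreement, can be sketched by comparing only the ordering of the two systems' scores under each measurement. The (teacher, student) tuple layout and the function names are assumptions.

```rust
use std::cmp::Ordering;

/// Ordering of teacher score relative to student score; None if either is NaN.
fn direction(teacher: f64, student: f64) -> Option<Ordering> {
    teacher.partial_cmp(&student)
}

/// True when both measurements rank the two systems the same way,
/// whatever the size of the gap. Incomparable scores never agree.
fn directions_agree(static_pair: (f64, f64), arena_pair: (f64, f64)) -> bool {
    match (
        direction(static_pair.0, static_pair.1),
        direction(arena_pair.0, arena_pair.1),
    ) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}
```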

CCPA-018 — arena_recovery_rate_bound (PROPOSED, v1.29.0)

Asserts: the apr code recovery_rate (the fraction of OraclePassed fixtures in which at least one non-zero exit was recovered from) is bounded below by a threshold.

Why: a 0% recovery rate signals the agent doesn't retry meaningfully; threshold gate codifies the expectation.
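One reading of the recovery_rate definition, sketched with a hypothetical FixtureRun type: among fixtures whose oracle passed, count those whose run contained at least one non-zero exit code along the way.

```rust
/// Hypothetical per-fixture summary of one apr code run.
struct FixtureRun {
    oracle_passed: bool,
    /// Exit codes of the commands executed during the run, in order.
    exit_codes: Vec<i32>,
}

/// Fraction of passing fixtures that hit at least one non-zero exit before
/// passing. Returns None when no fixture passed, since the rate is undefined.
fn recovery_rate(runs: &[FixtureRun]) -> Option<f64> {
    let passed: Vec<&FixtureRun> = runs.iter().filter(|r| r.oracle_passed).collect();
    if passed.is_empty() {
        return None;
    }
    let recovered = passed
        .iter()
        .filter(|r| r.exit_codes.iter().any(|c| *c != 0))
        .count();
    Some(recovered as f64 / passed.len() as f64)
}
```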

CCPA-019 — calibration_required_before_verdict (PROPOSED, v1.32.0)

Asserts: no Arena verdict ships without a fresh CalibrationRecord (≤90 days old) on file.

Why: codifies the lesson of the M196-M224 four-bug stack. See Bidirectional sensitivity.
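The freshness rule can be sketched as a simple guard. The CalibrationRecord fields are hypothetical (dates are modeled as whole days since an arbitrary epoch to keep the example free of time libraries); only the 90-day limit comes from the gate text.

```rust
const MAX_AGE_DAYS: u64 = 90;

/// Hypothetical record; the day is counted from an arbitrary epoch.
struct CalibrationRecord {
    recorded_day: u64,
}

/// A verdict may ship only if a record exists, is not dated in the future,
/// and is at most MAX_AGE_DAYS old.
fn may_ship_verdict(latest: Option<&CalibrationRecord>, today: u64) -> bool {
    match latest {
        Some(r) => today >= r.recorded_day && today - r.recorded_day <= MAX_AGE_DAYS,
        None => false,
    }
}
```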

CCPA-020 — contract_compliance_per_turn (PROPOSED, v1.32.0)

Asserts: in Phase 6 dispatch, per-turn pmat comply check fires after every Write/Edit; the agent SEES compliance results in next-turn history.

Why: makes the under-contract regime mechanically distinguishable from the control regime. Without this gate, "under contract" could silently degrade to "same as control."
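A checker for this property could inspect the turn history directly. The sketch below assumes a hypothetical history encoding in which a compliance result, when present, appears as the entry immediately after the mutating tool call; it does not invoke pmat.

```rust
/// Hypothetical history entries visible to the agent on the next turn.
enum Entry {
    ToolCall(&'static str),
    Compliance { passed: bool },
}

/// True when every Write or Edit is immediately followed by a compliance
/// result, regardless of whether that result passed or failed.
fn compliance_visible_after_every_mutation(history: &[Entry]) -> bool {
    history.iter().enumerate().all(|(i, entry)| match entry {
        Entry::ToolCall(name) if *name == "Write" || *name == "Edit" => {
            matches!(history.get(i + 1), Some(Entry::Compliance { .. }))
        }
        _ => true,
    })
}
```

A history produced under the control regime, with no compliance entries at all, fails this check as soon as it contains a single Write or Edit, which is what makes the two regimes mechanically distinguishable.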