Behavioral parity gates
These gates govern what apr code ↔ claude parity means. Each one is a falsifiable assertion about the action-stream equivalence between the two systems.
CCPA-001 — trace_schema_roundtrip
Asserts: every trace record kind serializes → parses → re-serializes → equals the original.
Why: a lossy schema would silently drop information that downstream parity computation depends on. Catches schema-bump regressions.
Tests: 17 pin tests in crates/ccpa-trace/tests/falsify_ccpa_001_roundtrip.rs.
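The round-trip property can be sketched with a toy encoding. This is only an illustration: the `TraceRecord` variants and the tab-separated format below are hypothetical stand-ins, not the real schema or wire format in crates/ccpa-trace.

```rust
// Hypothetical record kinds; the real set is defined by the trace schema.
#[derive(Debug, Clone, PartialEq)]
enum TraceRecord {
    ToolCall { tool: String, input: String },
    ToolResult { exit_code: i32 },
}

impl TraceRecord {
    // Toy tab-separated encoding standing in for the real wire format.
    fn serialize(&self) -> String {
        match self {
            TraceRecord::ToolCall { tool, input } => format!("call\t{tool}\t{input}"),
            TraceRecord::ToolResult { exit_code } => format!("result\t{exit_code}"),
        }
    }

    // Returns None for unknown kinds or missing/malformed fields.
    fn parse(line: &str) -> Option<TraceRecord> {
        let mut parts = line.splitn(3, '\t');
        match parts.next()? {
            "call" => Some(TraceRecord::ToolCall {
                tool: parts.next()?.to_string(),
                input: parts.next()?.to_string(),
            }),
            "result" => Some(TraceRecord::ToolResult {
                exit_code: parts.next()?.parse().ok()?,
            }),
            _ => None,
        }
    }
}

// serialize -> parse -> re-serialize must reproduce both the value and the bytes.
fn roundtrips(record: &TraceRecord) -> bool {
    let first = record.serialize();
    match TraceRecord::parse(&first) {
        Some(parsed) => parsed == *record && parsed.serialize() == first,
        None => false,
    }
}
```

A pin test in this style would call `roundtrips` once per record kind, so a new kind added without a matching parser arm fails immediately.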
CCPA-002 — replay_determinism
Asserts: replaying a recorded trace through ccpa-replayer::MockHarness produces byte-identical output across runs.
Why: nondeterminism in the replay path would invalidate any parity claim. Catches hidden time/random/PID dependencies.
Tests: 16 tests in crates/ccpa-replayer/.
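A minimal sketch of the determinism check, assuming a replay can be modeled as a function from a trace to output bytes. `pure_replay` and `is_deterministic` are hypothetical names for illustration, not the `MockHarness` API.

```rust
// Hypothetical stand-in for a replay: folds recorded events into output bytes.
// It depends only on the trace, never on clocks, RNGs, or PIDs.
fn pure_replay(trace: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for (step, event) in trace.iter().enumerate() {
        out.extend_from_slice(format!("{step}:{event}\n").as_bytes());
    }
    out
}

// Replays the same trace `runs` times and checks that every later output
// is byte-identical to the first one.
fn is_deterministic<F>(mut replay: F, trace: &[&str], runs: usize) -> bool
where
    F: FnMut(&[&str]) -> Vec<u8>,
{
    let baseline = replay(trace);
    (1..runs).all(|_| replay(trace) == baseline)
}
```

Passing a replay function that mixes in any per-run state (a counter, a timestamp) makes `is_deterministic` return false, which is the failure this gate is meant to surface.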
CCPA-003 — mock_completeness
Asserts: the MockHarness covers every tool kind defined in the schema.
Why: an incomplete mock means some real-world traces can't be replayed. Catches gaps when new tools are added.
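One way to make this kind of completeness checkable is an exhaustive `match` plus a runtime sweep over all kinds. The sketch below assumes a hypothetical four-variant `ToolKind`; the real set is whatever the schema defines.

```rust
// Hypothetical tool kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolKind {
    Bash,
    Read,
    Write,
    Edit,
}

impl ToolKind {
    // Must be kept in sync with the enum by hand in this sketch.
    const ALL: [ToolKind; 4] = [ToolKind::Bash, ToolKind::Read, ToolKind::Write, ToolKind::Edit];
}

// Toy mock: a canned response per kind. The match has no wildcard arm,
// so adding a variant without handling it is a compile error.
fn mock_response(kind: ToolKind) -> Option<&'static str> {
    match kind {
        ToolKind::Bash => Some("exit 0"),
        ToolKind::Read => Some("<file contents>"),
        ToolKind::Write => Some("ok"),
        ToolKind::Edit => Some("ok"),
    }
}

// Lists every kind the mock cannot answer; empty means complete.
fn uncovered_kinds() -> Vec<ToolKind> {
    ToolKind::ALL
        .iter()
        .copied()
        .filter(|k| mock_response(*k).is_none())
        .collect()
}
```

A completeness test then asserts that `uncovered_kinds()` is empty.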
CCPA-004 — tool_call_equivalence
Asserts: per-tool equivalence rules are deterministic, total functions over (teacher.input, student.input) pairs.
Why: these rules are the heart of the parity score. If the equivalence rule for, say, Bash has a bug, the score is meaningless.
Tests: 36 tests in crates/ccpa-differ/tests/falsify_ccpa_004_tool_equivalence.rs. One test per (tool, equivalence-class) pair.
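To make "deterministic, total function over pairs" concrete, here is a toy rule for Bash inputs. The whitespace-collapsing rule is an assumption made for the example; the actual per-tool rules in ccpa-differ are not shown here and may differ (for instance, a real rule would have to respect quoting).

```rust
// Collapses runs of whitespace so that spacing differences do not matter.
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Hypothetical Bash equivalence rule.
// Total: returns a verdict for every pair of inputs and never panics.
// Deterministic: the verdict depends only on the two arguments.
fn bash_equivalent(teacher_input: &str, student_input: &str) -> bool {
    normalize(teacher_input) == normalize(student_input)
}
```

A test per (tool, equivalence-class) pair would then pin one pair that must be equivalent and one that must not be.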
CCPA-005 — file_mutation_equivalence
Asserts: a Write and an Edit that produce the same post-state file SHA256 are equivalent at the file-mutation level.
Why: enables the differ to recognize "same effect, different tool" as equivalent at the file level (separately from the action-stream level).
Tests: 15 tests in crates/ccpa-differ/tests/falsify_ccpa_005_file_mutation.rs.
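The idea of comparing post-states rather than tool calls can be sketched as follows. Two loud assumptions: `apply_write` and `apply_edit` are simplified stand-ins for the real tools, and std's `DefaultHasher` is swapped in for SHA256 purely so the sketch needs no external crates. It is not cryptographic and is not what the gate uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in digest (NOT SHA256): good enough to illustrate the comparison only.
fn digest(contents: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

// Simplified Write: replaces the whole file.
fn apply_write(new_contents: &str) -> String {
    new_contents.to_string()
}

// Simplified Edit: replaces the first occurrence of `old` with `new`.
fn apply_edit(before: &str, old: &str, new: &str) -> String {
    before.replacen(old, new, 1)
}

// Two mutations are file-level equivalent when their post-state digests match.
fn same_post_state(a: &str, b: &str) -> bool {
    digest(a) == digest(b)
}
```

With these helpers, an Edit that swaps one token and a Write of the full resulting file compare as equivalent, even though the action streams differ.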
CCPA-006 — sovereignty_on_replay
Asserts: Tier3 SovereigntyViolation fires deterministically on any trace that performs a network egress to a non-localhost endpoint outside the allow-list, OR reads a credential-bearing env var.
Why: the sovereignty contract is the hardest gate. False negatives here are catastrophic.
Tests: 10 tests in crates/ccpa-differ/tests/falsify_ccpa_006_sovereignty.rs.
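The two trigger conditions can be sketched as a single predicate over a trace. Everything below is hypothetical: the `Event` shape, the localhost spellings, and especially the name-based heuristic for "credential-bearing" are assumptions for illustration, not the real detection logic.

```rust
// Hypothetical trace events.
enum Event {
    NetworkEgress { host: String },
    EnvRead { var: String },
    FileRead { path: String },
}

fn is_localhost(host: &str) -> bool {
    matches!(host, "localhost" | "127.0.0.1" | "::1")
}

// Assumed heuristic: a variable is credential-bearing if its name
// contains one of these markers (case-insensitive).
fn is_credential_var(var: &str) -> bool {
    let upper = var.to_uppercase();
    ["TOKEN", "SECRET", "API_KEY", "PASSWORD"]
        .iter()
        .any(|marker| upper.contains(*marker))
}

// True if any event is a non-localhost egress outside the allow-list
// OR a read of a credential-bearing env var.
fn violates_sovereignty(trace: &[Event], allow_list: &[&str]) -> bool {
    trace.iter().any(|event| match event {
        Event::NetworkEgress { host } => {
            !is_localhost(host) && !allow_list.contains(&host.as_str())
        }
        Event::EnvRead { var } => is_credential_var(var),
        Event::FileRead { .. } => false,
    })
}
```

Because the predicate is a pure function of the trace and the allow-list, it fires the same way on every replay, which is the determinism the gate asserts.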
CCPA-007 — corpus_coverage (HARD-BLOCKING since M16)
Asserts: every required-row of apr-code-parity-v1.yaml has at least one fixture exercising it.
Why: prevents the meter from being valid only on a curated subset of the parity surface. New rows in apr-code-parity-v1.yaml MUST come with a fixture.
Tests: 15 tests + per-PR CI ccpa coverage --apr-code-parity-yaml ... --oos-rows ....
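At its core the coverage check is a set difference. The sketch below assumes rows are identified by string ids and that rows passed as out-of-scope are exempt from the fixture requirement; both are assumptions about how the CI command behaves, and the row ids are made up.

```rust
use std::collections::BTreeSet;

// Returns required rows that have no fixture, ignoring out-of-scope rows.
// An empty result means the corpus covers the required surface.
fn uncovered_rows(required: &[&str], oos: &[&str], fixture_rows: &[&str]) -> Vec<String> {
    let covered: BTreeSet<&str> = fixture_rows.iter().copied().collect();
    let skipped: BTreeSet<&str> = oos.iter().copied().collect();
    required
        .iter()
        .copied()
        .filter(|row| !skipped.contains(row) && !covered.contains(row))
        .map(str::to_string)
        .collect()
}
```

A hard-blocking gate would fail the PR whenever this list is non-empty and print the missing row ids.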
CCPA-008 — parity_score_bound (ADVISORY, M230)
Asserts: canonical corpus aggregate parity score ≥ threshold (currently ≥ 0.95).
Why: the differ's output IS the parity score; this is the corpus-level acceptance bound.
Status: ADVISORY since M230. The threshold was relaxed because the M196-M224 4-bug stack revealed that "always 1.0 on canonical" was actually evidence of meter under-sensitivity, not perfect performance.
Tests: 24 tests in crates/ccpa-differ/tests/falsify_ccpa_008_parity_score.rs.
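A sketch of the bound check, assuming the aggregate is the arithmetic mean of per-fixture scores in [0, 1]. The source does not say how the aggregate is computed, so the mean is an assumption, as are the `GateResult` names.

```rust
// Assumption: aggregate = arithmetic mean of per-fixture scores.
fn aggregate(scores: &[f64]) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    Some(scores.iter().sum::<f64>() / scores.len() as f64)
}

#[derive(Debug, PartialEq)]
enum GateResult {
    Pass,
    AdvisoryWarn, // below threshold, but the gate is advisory so it does not block
    NoData,
}

fn parity_score_gate(scores: &[f64], threshold: f64) -> GateResult {
    match aggregate(scores) {
        None => GateResult::NoData,
        Some(score) if score >= threshold => GateResult::Pass,
        Some(_) => GateResult::AdvisoryWarn,
    }
}
```

Treating an empty corpus as `NoData` rather than `Pass` avoids a vacuous pass when nothing was measured.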
CCPA-013 — first_recorded_parity_score (DISCHARGED)
Asserts: a first measured aggregate parity score on the canonical corpus exists, dated, with n and aggregate recorded.
Status: DISCHARGED. fixtures/canonical/measured-parity.json (n=30, aggregate=1.0000).
CCPA-014 — os_event_parity_bound
Asserts: OS-level events (file opens, process spawns, stat calls) recorded on teacher and student match, modulo a whitelist of allowed nondeterminism.
Why: catches "same tool input, different OS effects" drift.
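Comparing event streams modulo a whitelist can be sketched as filter-then-compare. The `OsEvent` shape, the path-prefix whitelist semantics, and the choice to compare events in order are all assumptions made for this example.

```rust
// Hypothetical OS-level events, each carrying the path or program involved.
#[derive(Debug, Clone, PartialEq)]
enum OsEvent {
    Open(String),
    Spawn(String),
    Stat(String),
}

// Assumed whitelist semantics: an event is ignored when its target
// starts with one of the whitelisted prefixes.
fn is_whitelisted(event: &OsEvent, prefixes: &[&str]) -> bool {
    let target = match event {
        OsEvent::Open(p) | OsEvent::Spawn(p) | OsEvent::Stat(p) => p,
    };
    prefixes.iter().any(|prefix| target.starts_with(*prefix))
}

fn without_whitelisted(events: &[OsEvent], prefixes: &[&str]) -> Vec<OsEvent> {
    events
        .iter()
        .filter(|e| !is_whitelisted(e, prefixes))
        .cloned()
        .collect()
}

// Streams match when they are equal, in order, after dropping whitelisted events.
fn os_events_match(teacher: &[OsEvent], student: &[OsEvent], prefixes: &[&str]) -> bool {
    without_whitelisted(teacher, prefixes) == without_whitelisted(student, prefixes)
}
```

With a `/tmp/` prefix whitelisted, an extra stat of a temp file on one side does not count as drift, while the same trace pair fails when the whitelist is empty.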
CCPA-015 — os_trace_output_purity
Asserts: subprocess stdout/stderr captures are byte-pure (no PID injection, no timestamp jitter introduced by the capture machinery).
Why: if the capture itself adds nondeterminism, every downstream comparison is wrong.
CCPA-016 — outcome_parity_bound
Asserts: per-fixture oracle_passed outcomes agree at corpus-level rate ≥ threshold.
Why: outcome parity (did both systems solve the bug?) is the project-scale analog of action parity. Necessary for the M234 Popperian-falsification claim to be sharp.
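The corpus-level agreement rate reduces to counting fixtures where both outcomes are the same. A minimal sketch, assuming each fixture yields one `(teacher, student)` pair of `oracle_passed` booleans:

```rust
// Each pair is (teacher oracle_passed, student oracle_passed) for one fixture.
// Returns None on an empty corpus so that "no data" cannot pass the gate.
fn outcome_agreement_rate(outcomes: &[(bool, bool)]) -> Option<f64> {
    if outcomes.is_empty() {
        return None;
    }
    let agree = outcomes.iter().filter(|(t, s)| t == s).count();
    Some(agree as f64 / outcomes.len() as f64)
}

fn outcome_parity_holds(outcomes: &[(bool, bool)], threshold: f64) -> bool {
    outcome_agreement_rate(outcomes).map_or(false, |rate| rate >= threshold)
}
```

Note that two systems failing the same fixture count as agreement under this definition; the rate measures parity, not success.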
CCPA-017 — project_scale_parity_bound (PROPOSED, v1.28.0)
Asserts: project-scale Arena verdict on phase-5 corpus must match the static-fixture verdict in direction (not magnitude).
Why: M234 showed magnitudes diverge (1.0 vs 0.0 / 0.0); direction agreement (claude > apr code) is the falsifiable part.
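Direction agreement compares only which system scored higher in each setting. The sketch below assumes each verdict can be summarized as a `(claude, apr_code)` score pair; that summary shape is hypothetical.

```rust
use std::cmp::Ordering;

// Which system scored higher, ignoring by how much.
// None if either score is NaN.
fn direction(claude: f64, apr_code: f64) -> Option<Ordering> {
    claude.partial_cmp(&apr_code)
}

// Each tuple is (claude score, apr code score) for one verdict.
fn directions_agree(static_scores: (f64, f64), arena_scores: (f64, f64)) -> bool {
    let static_dir = direction(static_scores.0, static_scores.1);
    let arena_dir = direction(arena_scores.0, arena_scores.1);
    match (static_dir, arena_dir) {
        (Some(a), Some(b)) => a == b,
        _ => false, // NaN on either side: there is no verdict to compare
    }
}
```

Under this check, static scores of (0.9, 0.8) and Arena scores of (0.6, 0.1) agree in direction despite very different magnitudes.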
CCPA-018 — arena_recovery_rate_bound (PROPOSED, v1.29.0)
Asserts: apr code recovery_rate (the fraction of OraclePassed fixtures with at least one non-zero exit recovered) is bounded below by a threshold.
Why: a 0% recovery rate signals that the agent doesn't retry meaningfully; a threshold gate codifies the expectation.
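One reading of the recovery_rate definition, sketched under assumptions: a fixture counts as "recovered" if it ended OraclePassed and saw at least one non-zero exit code along the way. The `FixtureRun` shape is hypothetical.

```rust
// Hypothetical per-fixture summary of one run.
struct FixtureRun {
    oracle_passed: bool,
    exit_codes: Vec<i32>, // exit codes of commands run during the fixture
}

// Fraction of passing fixtures that hit at least one non-zero exit.
// None when no fixture passed, since the rate is undefined there.
fn recovery_rate(runs: &[FixtureRun]) -> Option<f64> {
    let passed: Vec<&FixtureRun> = runs.iter().filter(|r| r.oracle_passed).collect();
    if passed.is_empty() {
        return None;
    }
    let recovered = passed
        .iter()
        .filter(|r| r.exit_codes.iter().any(|&code| code != 0))
        .count();
    Some(recovered as f64 / passed.len() as f64)
}
```

Failed fixtures are excluded from the denominator, so an agent that hits errors and never passes does not inflate the rate.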
CCPA-019 — calibration_required_before_verdict (PROPOSED, v1.32.0)
Asserts: no Arena verdict ships without a fresh CalibrationRecord (≤90 days old) on file.
Why: codifies the lesson of the M196-M224 four-bug stack. See Bidirectional sensitivity.
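The freshness rule can be sketched as a guard evaluated before any verdict is emitted. The `CalibrationRecord` here is a hypothetical one-field stand-in; only the 90-day limit comes from the gate text.

```rust
use std::time::{Duration, SystemTime};

// 90 days, per the gate's freshness requirement.
const MAX_AGE: Duration = Duration::from_secs(90 * 24 * 60 * 60);

// Hypothetical record shape; only the timestamp matters for this check.
struct CalibrationRecord {
    recorded_at: SystemTime,
}

// A verdict may ship only if a record exists and is at most MAX_AGE old.
fn may_ship_verdict(latest: Option<&CalibrationRecord>, now: SystemTime) -> bool {
    match latest {
        None => false,
        // duration_since fails when recorded_at is in the future;
        // this sketch treats such a record as invalid.
        Some(record) => now
            .duration_since(record.recorded_at)
            .map_or(false, |age| age <= MAX_AGE),
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check itself deterministic and easy to test.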
CCPA-020 — contract_compliance_per_turn (PROPOSED, v1.32.0)
Asserts: in Phase 6 dispatch, per-turn pmat comply check fires after every Write/Edit; the agent SEES compliance results in next-turn history.
Why: makes the under-contract regime mechanically distinguishable from the control regime. Without this gate, "under contract" could silently degrade to "same as control."