Architecture at a glance

Workspace layout

claude-code-parity-apr/
├── contracts/                 # pin.lock + smoke YAML; canonical YAML lives in aprender
├── crates/
│   ├── ccpa-trace/            # JSONL trace schema, types, validators
│   ├── ccpa-differ/           # per-tool equivalence rules, parity score
│   ├── ccpa-recorder/         # stream-json parser (claude side)
│   ├── ccpa-subproc/          # subprocess driver (deterministic stdout/stderr capture)
│   ├── ccpa-replayer/         # mock harness for replay determinism
│   ├── ccpa-arena/            # multi-turn live runner + bench binary
│   └── ccpa-cli/              # `ccpa` user-facing binary
├── docs/specifications/       # 25 spec files (all <500 LOC, doc-drift gated)
├── evidence/                  # per-phase measured-output snapshots
├── fixtures/                  # canonical, regression, project-scale, calibration-and-scale, under-contract
└── scripts/                   # bench dispatch + drift detectors

Crate dependency graph

                       ccpa-cli
                          │
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
       ccpa-differ    ccpa-arena   ccpa-recorder
            │             │             │
            └─────────────┼─────────────┘
                          ▼
                     ccpa-trace
                          │
                          ▼
                     ccpa-subproc

ccpa-trace is the schema root: every crate above it in the graph consumes its Trace, Record, ToolUse, and ToolResult types. A new trace record kind is added here first; the schema bump then propagates to each dependent crate through compile-time type checks.
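As a rough illustration of how a schema bump can surface at compile time, the sketch below assumes Record is an enum and that consumers match on it without a wildcard arm. The variant names and fields are hypothetical stand-ins, not the actual ccpa-trace definitions.

```rust
// Hypothetical stand-ins for the ccpa-trace types; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolUse {
    pub tool: String,
    pub input: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub tool: String,
    pub output: String,
}

// Assumed shape: one variant per trace record kind.
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    ToolUse(ToolUse),
    ToolResult(ToolResult),
    HookEvent { trigger: String },
    SkillInvocation { skill: String },
}

/// A consumer-side function with no wildcard (`_`) arm.
/// Adding a variant to `Record` makes this match non-exhaustive,
/// so the consumer fails to compile until it handles the new kind.
pub fn kind_label(record: &Record) -> &'static str {
    match record {
        Record::ToolUse(_) => "tool_use",
        Record::ToolResult(_) => "tool_result",
        Record::HookEvent { .. } => "hook_event",
        Record::SkillInvocation { .. } => "skill_invocation",
    }
}
```

Under this assumption, the design choice is to avoid catch-all arms in dependent crates, so that a new record kind cannot be silently ignored.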

How ccpa diff produces a parity score

  1. Load both JSONL files via ccpa_trace::parse::parse_file. The parser hard-enforces schema v2 (hook_event + skill_invocation records added at M15).
  2. Pair records by index. The two traces must have exactly the same length; a record-count imbalance is a hard error (see the tool_call_equivalence falsifier).
  3. Project hook events and skill invocations onto their target tool record (M15 hook/skill semantics).
  4. Match each paired record under its per-tool equivalence rule:
    • Bash: command tokenization + whitelist of allowed nondeterminism
    • Write/Edit: post-state file SHA256 must agree
    • Read: path + range + content excerpt
    • Skill: invocation site + arguments
    • Hook: trigger + target tool's invocation
  5. Score: count the matched pairs and divide by the total number of pairs. Score ∈ [0.0, 1.0].
  6. Categorize drifts: any mismatch is classified into a closed DriftCategory enum. Tier 0 = no drift; Tier 1 = cosmetic; Tier 2 = semantic; Tier 3 = sovereignty violation (see crates/ccpa-differ/src/sovereignty.rs).
  7. Report: ParityReport { score, drifts[] } — JSON-serializable, the unit of measurement.
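The pairing and scoring steps above can be sketched as follows. This is a minimal sketch, not the ccpa-differ code: the function name parity, the DiffError type, and the use of plain record indices for drifts are assumptions made for illustration (the real report carries categorized drifts), and returning 1.0 for two empty traces is also an assumption.

```rust
// Hypothetical error type for the hard length check in step 2.
#[derive(Debug, PartialEq)]
pub enum DiffError {
    LengthMismatch { left: usize, right: usize },
}

// Simplified report: drifts are recorded as the indices of mismatched pairs.
#[derive(Debug, PartialEq)]
pub struct ParityReport {
    pub score: f64,
    pub drifts: Vec<usize>,
}

/// Pair records by index, apply an equivalence rule to each pair,
/// and score as matched pairs divided by total pairs.
pub fn parity<T>(
    left: &[T],
    right: &[T],
    equivalent: impl Fn(&T, &T) -> bool,
) -> Result<ParityReport, DiffError> {
    // Step 2: lengths must match exactly.
    if left.len() != right.len() {
        return Err(DiffError::LengthMismatch {
            left: left.len(),
            right: right.len(),
        });
    }

    // Step 4: collect the index of every pair that fails its rule.
    let mut drifts = Vec::new();
    for (index, (a, b)) in left.iter().zip(right).enumerate() {
        if !equivalent(a, b) {
            drifts.push(index);
        }
    }

    // Step 5: matches / total. Assumption: empty traces score 1.0.
    let total = left.len();
    let score = if total == 0 {
        1.0
    } else {
        (total - drifts.len()) as f64 / total as f64
    };

    Ok(ParityReport { score, drifts })
}
```

For example, calling parity on two three-record traces that differ only at index 1 yields a score of 2/3 and a single drift at index 1, while traces of different lengths return an error instead of a score.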

How ccpa-arena-bench runs a fixture

1. Copy fixture's cwd-tree to /tmp/p6-uc-<fixture>-<side>.<rand>
2. Read prompt.txt
3. Launch driver subprocess:
     - teacher: claude --output-format=stream-json --print "<prompt>"
     - student: apr code --model=<path> -p "<prompt>" (apr serve is auto-spawned)
4. Multi-turn loop (max_turns=20 default, wall=900s default):
   a. Render history into prompt suffix
   b. driver.next_turn(prompt + history) → NextTurn { blocks, stop_reason }
   c. Extract first ToolUse block → dispatch in fixture cwd
   d. Append TurnRecord to history
   e. Every K turns (oracle_check_interval=3 default) OR on EndTurn:
      - Run oracle: cargo test 2>&1 | grep "test result: ok"
      - Pass → return OraclePassed
   f. Phase 6 only: if compliance_enforced, per-Write/Edit run pmat comply check
   g. Trap detectors: ComplianceTrap (N consecutive same-(file,sha) failures),
      AgentTextLoop (N consecutive text-only turns, M292, opt-in)
5. On max_turns / wall / driver_error / compliance_trap → return the appropriate ArenaOutcome
6. Emit BenchResult JSON to evidence/<phase>/captures/<fixture>/<side>.bench.json
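The ComplianceTrap rule in step 4g (N consecutive failures on the same (file, sha) pair) could be tracked with a small streak counter. The sketch below is an assumption about how such a detector might look; the type name, method names, and the reset-on-success behaviour are hypothetical and not taken from ccpa-arena.

```rust
/// Hypothetical detector: fires once the same (file, sha) pair has failed
/// the compliance check `threshold` times in a row.
pub struct ComplianceTrapDetector {
    threshold: usize,
    last: Option<(String, String)>,
    streak: usize,
}

impl ComplianceTrapDetector {
    pub fn new(threshold: usize) -> Self {
        Self { threshold, last: None, streak: 0 }
    }

    /// Record a failed check. Returns true when the trap should fire.
    pub fn observe_failure(&mut self, file: &str, sha: &str) -> bool {
        let key = (file.to_string(), sha.to_string());
        if self.last.as_ref() == Some(&key) {
            // Same file with the same content hash failed again.
            self.streak += 1;
        } else {
            // A different file or different content starts a new streak.
            self.last = Some(key);
            self.streak = 1;
        }
        self.streak >= self.threshold
    }

    /// Assumption: a passing check clears the streak.
    pub fn observe_success(&mut self) {
        self.last = None;
        self.streak = 0;
    }
}
```

Keying on the content hash as well as the path means an agent that keeps rewriting a file with different contents is not flagged; only repeated submissions of identical content count toward the trap.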

Because the outcome is a typed enum, aggregate scoring (recovery_rate, oracle_passed_rate, compliance_cost_ratio) can pattern-match on variants instead of parsing strings.
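To illustrate, here is a minimal sketch of one such aggregate computed by matching on variants. The variant list is inferred from the termination conditions listed above and is an assumption, as is the choice to return None for an empty result set; the real ArenaOutcome may carry more variants or payloads.

```rust
// Assumed variants, derived from the termination conditions in the run loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArenaOutcome {
    OraclePassed,
    MaxTurns,
    WallTimeout,
    DriverError,
    ComplianceTrap,
}

/// Fraction of runs that ended with the oracle passing.
/// Returns None when there are no runs, to avoid dividing by zero.
pub fn oracle_passed_rate(outcomes: &[ArenaOutcome]) -> Option<f64> {
    if outcomes.is_empty() {
        return None;
    }
    let passed = outcomes
        .iter()
        .filter(|outcome| matches!(outcome, ArenaOutcome::OraclePassed))
        .count();
    Some(passed as f64 / outcomes.len() as f64)
}
```

No string comparison is involved: a renamed or removed variant becomes a compile error in the scoring code rather than a silently wrong rate.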

Two binaries, one config space

  • ccpa — user-facing CLI for the static path (diff, corpus, coverage, validate)
  • ccpa-arena-bench — Arena dispatcher (operator-coordinated)

Both consume the same Trace/ArenaOutcome types and emit the same JSON shapes that downstream tools depend on.