Architecture at a glance
Workspace layout
claude-code-parity-apr/
├── contracts/ # pin.lock + smoke YAML; canonical YAML lives in aprender
├── crates/
│   ├── ccpa-trace/ # JSONL trace schema, types, validators
│   ├── ccpa-differ/ # per-tool equivalence rules, parity score
│   ├── ccpa-recorder/ # stream-json parser (claude side)
│   ├── ccpa-subproc/ # subprocess driver (deterministic stdout/stderr capture)
│   ├── ccpa-replayer/ # mock harness for replay determinism
│   ├── ccpa-arena/ # multi-turn live runner + bench binary
│   └── ccpa-cli/ # `ccpa` user-facing binary
├── docs/specifications/ # 25 spec files (all <500 LOC, doc-drift gated)
├── evidence/ # per-phase measured-output snapshots
├── fixtures/ # canonical, regression, project-scale, calibration-and-scale, under-contract
└── scripts/ # bench dispatch + drift detectors
Crate dependency graph
                ccpa-cli
                   │
     ┌─────────────┼─────────────┐
     ▼             ▼             ▼
ccpa-differ   ccpa-arena   ccpa-recorder
     │             │             │
     └─────────────┼─────────────┘
                   ▼
              ccpa-trace
                   │
                   ▼
             ccpa-subproc
ccpa-trace is the schema root: every crate consumes its Trace, Record, ToolUse, and ToolResult types. A new trace record kind is added here first; the schema bump then propagates to every dependent crate through compile-time type checks.
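The compile-time cascade can be illustrated with a minimal sketch. The `Record` enum and `describe` function below are hypothetical, simplified stand-ins; the real ccpa-trace definitions and their fields are not shown in this document. The sketch assumes downstream crates match on records exhaustively, without a wildcard arm.

```rust
/// Hypothetical, simplified stand-in for a ccpa-trace record type.
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    ToolUse { tool: String, input: String },
    ToolResult { tool: String, output: String },
    // Adding a variant here (for example a hook event) makes every
    // exhaustive `match` in dependent crates fail to compile until
    // the new kind is handled.
}

/// A downstream consumer: an exhaustive match with no `_` arm, so the
/// compiler flags this function when `Record` gains a variant.
pub fn describe(record: &Record) -> String {
    match record {
        Record::ToolUse { tool, input } => format!("use {tool}: {input}"),
        Record::ToolResult { tool, output } => format!("result {tool}: {output}"),
    }
}
```

Avoiding a catch-all arm is what turns a schema bump into a compile error instead of a silently ignored record.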
How ccpa diff produces a parity score
- Load both JSONL files via ccpa-trace::parse::parse_file. The parser hard-enforces schema v2 (hook_event + skill_invocation records added at M15).
- Pair records by index. Length must match exactly (a record-count imbalance is a hard error; see the tool_call_equivalence falsifier).
- Project hook events and skill invocations onto their target tool record (M15 hook/skill semantics).
- Match each paired record under its per-tool equivalence rule:
  - Bash: command tokenization + whitelist of allowed nondeterminism
  - Write/Edit: post-state file SHA256 must agree
  - Read: path + range + content excerpt
  - Skill: invocation site + arguments
  - Hook: trigger + target tool's invocation
- Score: count matches, divide by total. Score ∈ [0.0, 1.0].
- Categorize drifts: any mismatch is classified into a closed DriftCategory enum. Tier 0 = no drift; Tier 1 = cosmetic; Tier 2 = semantic; Tier 3 = sovereignty violation (see crates/ccpa-differ/src/sovereignty.rs).
- Report: ParityReport { score, drifts[] }, JSON-serializable, the unit of measurement.
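The pairing and scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the ccpa-differ implementation: `Tier`, `DiffError`, the `(index, tier)` drift shape, and the caller-supplied `classify` closure are all hypothetical. It also assumes that any tier above 0 counts as a mismatch and that two empty traces score 1.0.

```rust
/// Hypothetical drift tiers mirroring the Tier 0..3 description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    None,        // Tier 0: no drift
    Cosmetic,    // Tier 1
    Semantic,    // Tier 2
    Sovereignty, // Tier 3
}

#[derive(Debug, PartialEq)]
pub struct ParityReport {
    pub score: f64,
    /// (record index, tier) for every mismatching pair.
    pub drifts: Vec<(usize, Tier)>,
}

#[derive(Debug, PartialEq)]
pub enum DiffError {
    LengthMismatch { teacher: usize, student: usize },
}

/// Pair records by index, classify each pair, and score matches / total.
pub fn diff<R>(
    teacher: &[R],
    student: &[R],
    classify: impl Fn(&R, &R) -> Tier,
) -> Result<ParityReport, DiffError> {
    // A record-count imbalance is a hard error, not a low score.
    if teacher.len() != student.len() {
        return Err(DiffError::LengthMismatch {
            teacher: teacher.len(),
            student: student.len(),
        });
    }
    let mut drifts = Vec::new();
    for (i, (t, s)) in teacher.iter().zip(student).enumerate() {
        let tier = classify(t, s);
        if tier != Tier::None {
            drifts.push((i, tier));
        }
    }
    let total = teacher.len();
    // Assumption: empty traces agree trivially.
    let score = if total == 0 {
        1.0
    } else {
        (total - drifts.len()) as f64 / total as f64
    };
    Ok(ParityReport { score, drifts })
}
```

Passing the equivalence rule in as a closure keeps the sketch independent of any particular tool; a per-tool dispatcher would take its place in practice.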
How ccpa-arena-bench runs a fixture
1. Copy fixture's cwd-tree to /tmp/p6-uc-<fixture>-<side>.<rand>
2. Read prompt.txt
3. Launch driver subprocess:
   - teacher: claude --output-format=stream-json --print "<prompt>"
   - student: apr code --model=<path> -p "<prompt>" + apr serve auto-spawned
4. Multi-turn loop (max_turns=20 default, wall=900s default):
   a. Render history into prompt suffix
   b. driver.next_turn(prompt + history) → NextTurn { blocks, stop_reason }
   c. Extract first ToolUse block → dispatch in fixture cwd
   d. Append TurnRecord to history
   e. Every K turns (oracle_check_interval=3 default) OR on EndTurn:
      - Run oracle: cargo test 2>&1 | grep "test result: ok"
      - Pass → return OraclePassed
   f. Phase 6 only: if compliance_enforced, per-Write/Edit run pmat comply check
   g. Trap detectors: ComplianceTrap (N consecutive same-(file,sha) failures),
      AgentTextLoop (N consecutive text-only turns, M292, opt-in)
5. On max_turns / wall / driver_error / compliance_trap → return the appropriate ArenaOutcome
6. Emit BenchResult JSON to evidence/<phase>/captures/<fixture>/<side>.bench.json
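The ComplianceTrap detector from step 4g can be sketched as a small streak counter. This is a hypothetical illustration: the struct layout, the `observe` method, and the choice to reset the streak on a passing check are assumptions, not the ccpa-arena code.

```rust
/// Hypothetical detector: trips after `limit` consecutive compliance
/// failures on the same (file, sha) pair.
pub struct ComplianceTrap {
    limit: usize,
    last: Option<(String, String)>,
    streak: usize,
}

impl ComplianceTrap {
    pub fn new(limit: usize) -> Self {
        Self { limit, last: None, streak: 0 }
    }

    /// Record one compliance check; returns true when the trap trips.
    pub fn observe(&mut self, file: &str, sha: &str, passed: bool) -> bool {
        if passed {
            // Assumption: any passing check clears the streak.
            self.last = None;
            self.streak = 0;
            return false;
        }
        let key = (file.to_string(), sha.to_string());
        if self.last.as_ref() == Some(&key) {
            self.streak += 1;
        } else {
            // A different file or a changed sha starts a new streak.
            self.last = Some(key);
            self.streak = 1;
        }
        self.streak >= self.limit
    }
}
```

Keying on the sha as well as the path means the trap only fires when the agent keeps producing the same failing content, not merely when it keeps editing the same file.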
The cleanly-typed outcome enum lets aggregate scoring (recovery_rate, oracle_passed_rate, compliance_cost_ratio) pattern-match without parsing strings.
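As a rough sketch of that pattern matching, the enum below uses variant names taken from the prose (OraclePassed, max_turns, wall, driver_error, compliance_trap); the payloads and the exact definition of `oracle_passed_rate` are assumptions, and the real ArenaOutcome may differ.

```rust
/// Hypothetical outcome enum; payloads are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum ArenaOutcome {
    OraclePassed { turns: u32 },
    MaxTurns,
    WallTimeout,
    DriverError(String),
    ComplianceTrap,
}

/// Fraction of runs that ended in OraclePassed.
/// Returns None for an empty slice rather than dividing by zero.
pub fn oracle_passed_rate(outcomes: &[ArenaOutcome]) -> Option<f64> {
    if outcomes.is_empty() {
        return None;
    }
    let passed = outcomes
        .iter()
        .filter(|o| matches!(o, ArenaOutcome::OraclePassed { .. }))
        .count();
    Some(passed as f64 / outcomes.len() as f64)
}
```

Because the aggregate matches on variants, renaming or adding an outcome is caught by the compiler instead of by a string comparison that quietly stops matching.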
Two binaries, one config space
- ccpa: user-facing CLI for the static path (diff, corpus, coverage, validate)
- ccpa-arena-bench: Arena dispatcher (operator-coordinated)
Both consume the same Trace/ArenaOutcome types and emit the same JSON shapes downstream tools depend on.