Arena runner overview
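The Arena is CCPA's live-execution path. It dispatches real `claude` and real `apr code` subprocesses against real Rust bugs in real cwd-trees. It scores each run with a test-shaped oracle, whose failure modes include a non-zero exit and a zero exit without the expected output pattern.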
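The sketch below illustrates the two driver invocations listed under step 3 of the loop. It builds, but does not spawn, a `std::process::Command` for each side. Only the binaries and flags come from this document. `Side`, `driver_command` and the example paths are hypothetical. The sketch does not reproduce `SubprocessDriver`, stream-json parsing or the `apr serve` auto-spawn.

```rust
use std::process::Command;

// Hypothetical side selector; the real crate's types are not shown in this doc.
#[derive(Clone, Copy, Debug)]
enum Side {
    Teacher,
    Student,
}

/// Builds (but does not spawn) one side's driver command, rooted in the copied
/// fixture tree. Binaries and flags mirror the doc; `model_path` is student-only.
fn driver_command(side: Side, prompt: &str, model_path: &str, cwd: &str) -> Command {
    let mut cmd = match side {
        Side::Teacher => {
            let mut c = Command::new("claude");
            c.args(["--output-format=stream-json", "--print", prompt]);
            c
        }
        Side::Student => {
            // The doc says `apr serve` is auto-spawned; this sketch does not start it.
            let mut c = Command::new("apr");
            c.arg("code").arg(format!("--model={model_path}")).args(["-p", prompt]);
            c
        }
    };
    // Each run executes inside its own temporary copy of the fixture.
    cmd.current_dir(cwd);
    cmd
}

fn main() {
    for side in [Side::Teacher, Side::Student] {
        let cmd = driver_command(side, "<prompt>", "/path/to/model", "/tmp/p6-uc-demo");
        let args: Vec<String> = cmd.get_args().map(|a| a.to_string_lossy().into_owned()).collect();
        println!("{:?}: {} {}", side, cmd.get_program().to_string_lossy(), args.join(" "));
    }
}
```

The prompt is passed as its own argv element rather than through a shell string. This avoids quoting problems with multi-line prompts.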
The Arena loop (per fixture, per side)
1. Copy the fixture's cwd-tree to `/tmp/p6-uc-<fixture>-<side>.<rand>`.
2. Read `prompt.txt`.
3. Launch the driver subprocess via `SubprocessDriver`:
   - teacher: `claude --output-format=stream-json --print "<prompt>"`
   - student: `apr code --model=<path> -p "<prompt>"` (`apr serve` is auto-spawned)
4. Run the multi-turn `ArenaSession::run` loop, `for turn in 1..=max_turns`:
   - a. Check the wall-clock budget.
   - b. Render the history into the prompt suffix: `"<prompt>\n\n<rendered_history>### Continue:\n"`.
   - c. `driver.next_turn(prompt)` → `NextTurn { blocks, stop_reason }`.
   - d. Extract the first `ToolUse` block from `blocks`:
     - some → dispatch the tool in the cwd and record the `ToolResult`;
     - none → record `ToolInvocation::Text`.
   - e. Phase 6 only: the `ComplianceTrap` detector observes `ToolResult::FileMutated`.
   - f. M292: the `AgentTextLoop` detector observes `ToolInvocation::Text`.
   - g. Append the `TurnRecord` to the history.
   - h. Every `oracle_check_interval` turns, or on `StopReason::EndTurn`, run `run_oracle_compound` → `OracleOutcome { Passed | FailedDueToCompliance | NonZeroExit | ExitZeroNoPatternMatch }`:
     - `Passed` → return `ArenaOutcome::OraclePassed`;
     - `FailedDueToCompliance` (Phase 6) → return `ArenaOutcome::ComplianceFailed`;
     - any other outcome → continue with the next turn.
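5. Loop exits after `max_turns` without an oracle pass → `ArenaOutcome::OracleFailedAfterMaxTurns`
6. Wall-clock budget exceeded → `ArenaOutcome::WallTimeout`
7. Driver error → `ArenaOutcome::DriverError { reason, turns_before_error }`
8. Compliance trap (Phase 6, after `max_consecutive_compliance_failures` consecutive failures) → `ArenaOutcome::ComplianceTrap { file, last_reason, consecutive_count }`
9. Text loop (M292, after `max_consecutive_text_turns` consecutive text-only turns) → `ArenaOutcome::AgentTextLoop { consecutive_text_turns, last_text_excerpt }`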
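Below is a minimal, self-contained sketch of the per-turn control flow in step 4. It uses simplified stand-ins for `NextTurn`, `StopReason`, `OracleOutcome` and `ArenaOutcome`; the real types' fields and signatures are not shown in this document. `Knobs`, `run_session`, `ScriptedDriver`, `tool_turn`, `text_turn` and the closure-based tool dispatch and oracle are hypothetical.

The sketch makes several simplifications:
- It omits the Phase 6 `ComplianceTrap` detector.
- It renders history as plain strings instead of `TurnRecord`s.
- It assumes `turns_before_error` counts completed turns.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-ins: names follow this doc, fields and signatures are assumptions.
#[derive(Debug)]
enum Block { Text(String), ToolUse { name: String, input: String } }
#[derive(Debug, Clone, Copy, PartialEq)]
enum StopReason { EndTurn, ToolUse }
struct NextTurn { blocks: Vec<Block>, stop_reason: StopReason }
trait Driver { fn next_turn(&mut self, prompt: &str) -> Result<NextTurn, String>; }
#[derive(Debug, PartialEq)]
enum OracleOutcome { Passed, FailedDueToCompliance, NonZeroExit, ExitZeroNoPatternMatch }
#[derive(Debug, PartialEq)]
enum ArenaOutcome {
    OraclePassed,
    ComplianceFailed,
    OracleFailedAfterMaxTurns,
    WallTimeout,
    DriverError { reason: String, turns_before_error: u32 },
    AgentTextLoop { consecutive_text_turns: u32, last_text_excerpt: String },
}
struct Knobs {
    max_turns: u32,
    max_wall: Duration,
    oracle_check_interval: u32,
    max_consecutive_text_turns: u32, // 0 disables the text-loop detector
}

fn run_session(
    driver: &mut dyn Driver,
    prompt: &str,
    knobs: &Knobs,
    mut dispatch_tool: impl FnMut(&str, &str) -> String, // (tool name, input) -> result summary
    mut run_oracle: impl FnMut() -> OracleOutcome,
) -> ArenaOutcome {
    let start = Instant::now();
    let mut history = String::new();
    let mut text_streak = 0u32;
    for turn in 1..=knobs.max_turns {
        // a. Wall-clock budget.
        if start.elapsed() >= knobs.max_wall { return ArenaOutcome::WallTimeout; }
        // b-c. Render history into the prompt suffix and request one turn.
        let rendered = format!("{prompt}\n\n{history}### Continue:\n");
        let next = match driver.next_turn(&rendered) {
            Ok(next) => next,
            Err(reason) => return ArenaOutcome::DriverError { reason, turns_before_error: turn - 1 },
        };
        // d. Dispatch only the first ToolUse block, if there is one.
        let tool = next.blocks.iter().find_map(|b| match b {
            Block::ToolUse { name, input } => Some((name.as_str(), input.as_str())),
            Block::Text(_) => None,
        });
        if let Some((name, input)) = tool {
            text_streak = 0;
            let result = dispatch_tool(name, input);
            history.push_str(&format!("turn {turn}: {name} -> {result}\n"));
        } else {
            // f. Text-loop detector: count consecutive text-only turns.
            text_streak += 1;
            let excerpt: String = next.blocks.iter()
                .find_map(|b| match b { Block::Text(t) => Some(t.as_str()), _ => None })
                .unwrap_or("").chars().take(80).collect();
            if knobs.max_consecutive_text_turns > 0 && text_streak >= knobs.max_consecutive_text_turns {
                return ArenaOutcome::AgentTextLoop { consecutive_text_turns: text_streak, last_text_excerpt: excerpt };
            }
            history.push_str(&format!("turn {turn}: text: {excerpt}\n"));
        }
        // h. Oracle every `oracle_check_interval` turns, or when the model ends its turn.
        let interval_hit = knobs.oracle_check_interval > 0 && turn % knobs.oracle_check_interval == 0;
        if interval_hit || next.stop_reason == StopReason::EndTurn {
            match run_oracle() {
                OracleOutcome::Passed => return ArenaOutcome::OraclePassed,
                OracleOutcome::FailedDueToCompliance => return ArenaOutcome::ComplianceFailed,
                OracleOutcome::NonZeroExit | OracleOutcome::ExitZeroNoPatternMatch => {}
            }
        }
    }
    ArenaOutcome::OracleFailedAfterMaxTurns
}

// Illustrative scripted driver: replays canned turns, then errors.
struct ScriptedDriver(Vec<NextTurn>);
impl Driver for ScriptedDriver {
    fn next_turn(&mut self, _prompt: &str) -> Result<NextTurn, String> {
        if self.0.is_empty() { Err("script exhausted".into()) } else { Ok(self.0.remove(0)) }
    }
}
fn tool_turn(name: &str) -> NextTurn {
    NextTurn { blocks: vec![Block::ToolUse { name: name.into(), input: "{}".into() }], stop_reason: StopReason::ToolUse }
}
fn text_turn(text: &str, stop_reason: StopReason) -> NextTurn {
    NextTurn { blocks: vec![Block::Text(text.into())], stop_reason }
}

fn main() {
    let knobs = Knobs { max_turns: 20, max_wall: Duration::from_secs(900), oracle_check_interval: 5, max_consecutive_text_turns: 0 };
    let mut driver = ScriptedDriver(vec![tool_turn("read_file"), tool_turn("edit_file"), text_turn("done", StopReason::EndTurn)]);
    let outcome = run_session(&mut driver, "fix the bug", &knobs, |_, _| "ok".to_string(), || OracleOutcome::Passed);
    println!("{outcome:?}"); // OraclePassed: turn 3 ends with EndTurn, which triggers the oracle
}
```

In this sketch the oracle is consulted only on interval turns or on `EndTurn`. A fix that lands on an unchecked turn is therefore detected no earlier than the next check. If `max_turns` runs out first, the run ends as `OracleFailedAfterMaxTurns` even though the fix is present.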
Default knobs
| Knob | Default | Set by |
|---|---|---|
| `max_turns` | 20 | `PHASE6_MAX_TURNS` env / `--max-turns` flag |
| `max_wall_seconds` | 900 (Phase 5) / 3600 (Phase 6) | `PHASE6_WALL_SECONDS` / `--wall-seconds` |
| `oracle_check_interval` | 5 (Phase 5) / 3 (Phase 6) | `PHASE6_ORACLE_INTERVAL` / `--oracle-check-interval` |
| `compliance_enforced` | false (Phase 5) / true (Phase 6) | `PHASE6_COMPLIANCE_ENFORCED` / `--compliance-enforced` |
| `max_consecutive_compliance_failures` | 3 | `PHASE6_MAX_CONSECUTIVE_COMPLIANCE_FAILURES` |
| `max_consecutive_text_turns` (M292) | 0 (disabled) | `PHASE6_MAX_CONSECUTIVE_TEXT_TURNS` |
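The table can be read as a three-level lookup. This is a minimal sketch built on three assumptions:
- An explicit flag overrides the environment variable, which in turn overrides the phase default. The table does not state this precedence.
- Boolean values are spelled `true`/`false`.
- A bare switch such as `--compliance-enforced` is surfaced by the caller as the value `"true"`.

`Phase`, `Knobs`, `resolve` and `resolve_knobs` are hypothetical names. Flag lookup is abstracted as a closure rather than tied to a particular argument parser.

```rust
use std::env;
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase { Five, Six }

// Hypothetical resolved configuration mirroring the knob table.
#[derive(Debug, PartialEq)]
struct Knobs {
    max_turns: u32,
    max_wall_seconds: u64,
    oracle_check_interval: u32,
    compliance_enforced: bool,
    max_consecutive_compliance_failures: u32,
    max_consecutive_text_turns: u32, // 0 disables the M292 detector
}

/// Assumed precedence: explicit flag value, then env var, then phase default.
fn resolve<T: FromStr>(flag: Option<String>, env_val: Option<String>, default: T) -> Result<T, String> {
    match flag.or(env_val) {
        Some(raw) => raw.trim().parse::<T>().map_err(|_| format!("invalid knob value {raw:?}")),
        None => Ok(default),
    }
}

/// `flag` maps a flag name to its value; `env_var` maps an env var name to its value.
fn resolve_knobs(
    phase: Phase,
    flag: impl Fn(&str) -> Option<String>,
    env_var: impl Fn(&str) -> Option<String>,
) -> Result<Knobs, String> {
    let six = phase == Phase::Six;
    Ok(Knobs {
        max_turns: resolve(flag("--max-turns"), env_var("PHASE6_MAX_TURNS"), 20)?,
        max_wall_seconds: resolve(flag("--wall-seconds"), env_var("PHASE6_WALL_SECONDS"), if six { 3600 } else { 900 })?,
        oracle_check_interval: resolve(flag("--oracle-check-interval"), env_var("PHASE6_ORACLE_INTERVAL"), if six { 3 } else { 5 })?,
        compliance_enforced: resolve(flag("--compliance-enforced"), env_var("PHASE6_COMPLIANCE_ENFORCED"), six)?,
        // The table lists no CLI flag for the last two knobs.
        max_consecutive_compliance_failures: resolve(None, env_var("PHASE6_MAX_CONSECUTIVE_COMPLIANCE_FAILURES"), 3)?,
        max_consecutive_text_turns: resolve(None, env_var("PHASE6_MAX_CONSECUTIVE_TEXT_TURNS"), 0)?,
    })
}

fn main() {
    // No flags are parsed in this sketch; the environment is read from the current process.
    match resolve_knobs(Phase::Six, |_| None, |name| env::var(name).ok()) {
        Ok(knobs) => println!("{knobs:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

An unparsable value is reported as an error rather than silently replaced by the default. A typo in an env var therefore cannot quietly change a benchmark's budget.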
Two binaries
- `ccpa-arena-bench` (in `crates/ccpa-arena/src/bin/`): one-fixture dispatcher. It reads the prompt, oracle config, and driver config from flags and emits `BenchResult` JSON.
- `scripts/phase-{5,6}-bench.sh`: corpus walker that calls `ccpa-arena-bench` per fixture and aggregates the results into a per-side `scores.json`.
The shell script handles model pre-warming, evidence directory layout, and per-fixture orchestration; the Rust binary handles the loop.
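The per-side aggregation lives in the shell script. Purely to illustrate the kind of tally involved, here is a Rust sketch that counts per-fixture outcomes for one side and renders a small JSON object. The function name `scores_json` and the JSON field names (`side`, `fixtures`, `passed`, `outcomes`) are hypothetical and not the real `scores.json` schema. Outcome labels reuse the `ArenaOutcome` variant names, and all strings are assumed to need no JSON escaping.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-side tally: counts outcome labels and renders a compact JSON object.
/// Assumes labels and side names are plain identifiers (no JSON escaping needed).
fn scores_json(side: &str, outcomes: &[&str]) -> String {
    // BTreeMap keeps outcome keys sorted, so the output is deterministic.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for &o in outcomes {
        *counts.entry(o).or_insert(0) += 1;
    }
    let passed = counts.get("OraclePassed").copied().unwrap_or(0);
    let by_outcome: Vec<String> = counts.iter().map(|(k, v)| format!("\"{k}\":{v}")).collect();
    format!(
        "{{\"side\":\"{side}\",\"fixtures\":{},\"passed\":{passed},\"outcomes\":{{{}}}}}",
        outcomes.len(),
        by_outcome.join(",")
    )
}

fn main() {
    println!("{}", scores_json("teacher", &["OraclePassed", "WallTimeout", "OraclePassed"]));
}
```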