Arena runner overview

The Arena is CCPA's live-execution path. It dispatches real claude and real apr code subprocesses against real Rust bugs in real cwd-trees, and scores each run with a test-shaped oracle.

The Arena loop (per fixture, per side)

1. Copy fixture's cwd-tree to /tmp/p6-uc-<fixture>-<side>.<rand>
2. Read prompt.txt
3. Launch driver subprocess via SubprocessDriver:
     teacher: claude --output-format=stream-json --print "<prompt>"
     student: apr code --model=<path> -p "<prompt>"  (apr serve auto-spawned)
4. Multi-turn ArenaSession::run loop:
   for turn in 1..=max_turns:
     a. Check wall-clock budget
     b. Render history into prompt suffix:
          "<prompt>\n\n<rendered_history>### Continue:\n"
     c. driver.next_turn(prompt) → NextTurn { blocks, stop_reason }
     d. Extract first ToolUse block from blocks:
          some → dispatch the tool in cwd, record ToolResult
          none → record ToolInvocation::Text
     e. Phase 6 only: ComplianceTrap detector observes ToolResult::FileMutated
     f. M292: AgentTextLoop detector observes ToolInvocation::Text
     g. Append TurnRecord to history
     h. Every oracle_check_interval turns OR on StopReason::EndTurn:
          run_oracle_compound → OracleOutcome { Passed | FailedDueToCompliance | NonZeroExit | ExitZeroNoPatternMatch }
          Passed → return ArenaOutcome::OraclePassed
          FailedDueToCompliance (Phase 6) → return ArenaOutcome::ComplianceFailed
   end for
5. Loop exit → ArenaOutcome::OracleFailedAfterMaxTurns
6. Wall-time exit → ArenaOutcome::WallTimeout
7. Driver error → ArenaOutcome::DriverError { reason, turns_before_error }
8. Compliance trap → ArenaOutcome::ComplianceTrap { file, last_reason, consecutive_count }
9. Text loop (M292) → ArenaOutcome::AgentTextLoop { consecutive_text_turns, last_text_excerpt }
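
The control flow of steps 4b, 4f, 4h and 5 can be sketched in a few lines of Rust. This is a minimal illustration, not the ArenaSession::run implementation: Turn, Outcome, render_prompt, oracle_due and run_loop are hypothetical names, the history line format is invented, and the driver and oracle are replaced by closures. It also assumes that the text-loop check runs before the oracle check (following the order of the list above), that a ToolUse turn resets the text counter, and that a failed oracle check lets the loop continue.

```rust
/// What one turn produced, reduced to the two cases the loop distinguishes.
#[derive(Debug, Clone, PartialEq)]
enum Turn {
    ToolUse,      // a ToolUse block was found and dispatched
    Text(String), // no ToolUse block: recorded as text
}

/// Subset of the outcomes listed above (hypothetical, simplified shapes).
#[derive(Debug, PartialEq)]
enum Outcome {
    OraclePassed { turn: u32 },
    OracleFailedAfterMaxTurns,
    AgentTextLoop { consecutive_text_turns: u32 },
}

/// Step 4b: original prompt, blank line, rendered history, continue marker.
fn render_prompt(prompt: &str, rendered_history: &str) -> String {
    format!("{prompt}\n\n{rendered_history}### Continue:\n")
}

/// Step 4h: the oracle is due every `interval` turns or when the driver
/// reports EndTurn. An interval of 0 never triggers on turn count alone.
fn oracle_due(turn: u32, interval: u32, end_turn: bool) -> bool {
    end_turn || (interval > 0 && turn % interval == 0)
}

/// `next_turn` stands in for the driver and returns (turn, stopped_with_end_turn);
/// `oracle_passes` stands in for run_oracle_compound.
fn run_loop(
    prompt: &str,
    max_turns: u32,
    oracle_check_interval: u32,
    max_consecutive_text_turns: u32,
    mut next_turn: impl FnMut(&str) -> (Turn, bool),
    mut oracle_passes: impl FnMut() -> bool,
) -> Outcome {
    let mut history = String::new();
    let mut text_run = 0u32;
    for turn in 1..=max_turns {
        let (result, end_turn) = next_turn(&render_prompt(prompt, &history));
        // Step 4f: count consecutive text-only turns; 0 disables the detector.
        match &result {
            Turn::ToolUse => text_run = 0,
            Turn::Text(_) => text_run += 1,
        }
        if max_consecutive_text_turns > 0 && text_run >= max_consecutive_text_turns {
            return Outcome::AgentTextLoop { consecutive_text_turns: text_run };
        }
        // Step 4g: append to history (the line format here is made up).
        history.push_str(&format!("turn {turn}: {result:?}\n"));
        // Step 4h: periodic or EndTurn-triggered oracle check.
        if oracle_due(turn, oracle_check_interval, end_turn) && oracle_passes() {
            return Outcome::OraclePassed { turn };
        }
    }
    // Step 5: the turn budget ran out without a passing oracle.
    Outcome::OracleFailedAfterMaxTurns
}
```

With oracle_check_interval set to 3 and a driver that never reports EndTurn, the oracle closure is only called on turns 3, 6, 9 and so on, so a fix that lands on turn 4 is not scored until turn 6.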

Default knobs

Knob                                 Default                           Set by
max_turns                            20                                PHASE6_MAX_TURNS env / --max-turns flag
max_wall_seconds                     900 (phase 5) / 3600 (phase 6)    PHASE6_WALL_SECONDS / --wall-seconds
oracle_check_interval                5 (phase 5) / 3 (phase 6)         PHASE6_ORACLE_INTERVAL / --oracle-check-interval
compliance_enforced                  false (phase 5) / true (phase 6)  PHASE6_COMPLIANCE_ENFORCED / --compliance-enforced
max_consecutive_compliance_failures  3                                 PHASE6_MAX_CONSECUTIVE_COMPLIANCE_FAILURES
max_consecutive_text_turns (M292)    0 (disabled)                      PHASE6_MAX_CONSECUTIVE_TEXT_TURNS
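
As an illustration of how these defaults and environment overrides could fit together, here is a minimal Rust sketch. It is not taken from the ccpa-arena source: Phase, Knobs, defaults and from_lookup are hypothetical names, and the sketch assumes that an unset or unparsable variable falls back to the phase default, that the boolean knob is spelled "true" or "false", and that the PHASE6_* variables are consulted for either phase. Command-line flags and their precedence over the environment are left out because the table does not specify them.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Five,
    Six,
}

#[derive(Debug, PartialEq)]
struct Knobs {
    max_turns: u32,
    max_wall_seconds: u64,
    oracle_check_interval: u32,
    compliance_enforced: bool,
    max_consecutive_compliance_failures: u32,
    max_consecutive_text_turns: u32, // 0 means the detector is disabled
}

impl Knobs {
    /// Defaults exactly as listed in the table above.
    fn defaults(phase: Phase) -> Self {
        let six = phase == Phase::Six;
        Knobs {
            max_turns: 20,
            max_wall_seconds: if six { 3600 } else { 900 },
            oracle_check_interval: if six { 3 } else { 5 },
            compliance_enforced: six,
            max_consecutive_compliance_failures: 3,
            max_consecutive_text_turns: 0,
        }
    }

    /// Overlay environment-style overrides on the phase defaults.
    /// `lookup` is injected so the logic can be exercised without
    /// touching the real process environment.
    fn from_lookup(phase: Phase, lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Assumption: values that are missing or fail to parse keep the default.
        fn parse<T: std::str::FromStr>(raw: Option<String>, default: T) -> T {
            raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
        }
        let d = Self::defaults(phase);
        Knobs {
            max_turns: parse(lookup("PHASE6_MAX_TURNS"), d.max_turns),
            max_wall_seconds: parse(lookup("PHASE6_WALL_SECONDS"), d.max_wall_seconds),
            oracle_check_interval: parse(
                lookup("PHASE6_ORACLE_INTERVAL"),
                d.oracle_check_interval,
            ),
            compliance_enforced: parse(
                lookup("PHASE6_COMPLIANCE_ENFORCED"),
                d.compliance_enforced,
            ),
            max_consecutive_compliance_failures: parse(
                lookup("PHASE6_MAX_CONSECUTIVE_COMPLIANCE_FAILURES"),
                d.max_consecutive_compliance_failures,
            ),
            max_consecutive_text_turns: parse(
                lookup("PHASE6_MAX_CONSECUTIVE_TEXT_TURNS"),
                d.max_consecutive_text_turns,
            ),
        }
    }
}
```

Against the real environment the call would be Knobs::from_lookup(Phase::Six, |k| std::env::var(k).ok()).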

Two entry points

  • ccpa-arena-bench (in crates/ccpa-arena/src/bin/): single-fixture dispatcher. Reads the prompt, oracle config, and driver config from flags; emits BenchResult JSON.
  • scripts/phase-{5,6}-bench.sh: corpus walker that calls ccpa-arena-bench once per fixture and aggregates the results into a per-side scores.json.

The shell script handles model pre-warming, evidence directory layout, and per-fixture orchestration; the Rust binary handles the loop.