M291 — sub-bench B pattern shift

Date: 2026-05-21

Source PR: CCPA#259 (merged)

## What changed from M287

| Setting | M287 (greedy) | M291 (sub-bench B) |
|---|---|---|
| Sampling | greedy (temp=0) | temp=0.3, top_k=50, top_p=0.95 |
| Repetition penalty | none | repeat_penalty=1.2, repeat_last_n=64 |
| EOS stop_token | NOT plumbed | `< |
| `clean_chat_output` | NOT called in MoE path | called via #1852 |
| `CODE_SYSTEM_PROMPT` | no `<tool_call>` examples | 3 concrete examples + anti-Markdown rule via #1849 |
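The M291 sampling and repetition-penalty settings can be read as one pipeline. The sketch below is a minimal Rust illustration, assuming llama.cpp-style semantics (the penalty is applied once per distinct token in the last `repeat_last_n` positions, then temperature, top-k, and top-p). It is not aprender's sampler: `SamplerConfig`, `apply_repeat_penalty`, and `sample` are hypothetical names, and the caller supplies the uniform draw so no RNG crate is needed.

```rust
/// Hypothetical config holding the M291 sampling settings.
struct SamplerConfig {
    temperature: f32,
    top_k: usize,
    top_p: f32,
    repeat_penalty: f32,
    repeat_last_n: usize,
}

/// llama.cpp-style repetition penalty: each distinct token seen in the last
/// `repeat_last_n` positions has a positive logit divided by the penalty and a
/// negative logit multiplied by it, so it becomes less likely either way.
fn apply_repeat_penalty(logits: &mut [f32], history: &[usize], cfg: &SamplerConfig) {
    let start = history.len().saturating_sub(cfg.repeat_last_n);
    let mut seen = vec![false; logits.len()];
    for &tok in &history[start..] {
        if tok < logits.len() && !seen[tok] {
            seen[tok] = true;
            let l = logits[tok];
            logits[tok] = if l > 0.0 { l / cfg.repeat_penalty } else { l * cfg.repeat_penalty };
        }
    }
}

/// Temperature + top-k + top-p sampling over non-empty `logits`.
/// `u` is a uniform draw in [0, 1) supplied by the caller.
fn sample(logits: &[f32], cfg: &SamplerConfig, u: f32) -> usize {
    // Rank token ids by descending logit (total_cmp gives a total order).
    let mut ids: Vec<usize> = (0..logits.len()).collect();
    ids.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));

    // temp=0 is the M287 greedy setting: take the arg-max.
    if cfg.temperature <= 0.0 {
        return ids[0];
    }
    ids.truncate(cfg.top_k.max(1));

    // Softmax over the top-k survivors; subtracting the max keeps exp() stable.
    let max = logits[ids[0]];
    let mut probs: Vec<f32> = ids
        .iter()
        .map(|&i| ((logits[i] - max) / cfg.temperature).exp())
        .collect();
    let sum: f32 = probs.iter().sum();
    probs.iter_mut().for_each(|p| *p /= sum);

    // Nucleus: keep the shortest prefix whose cumulative probability reaches top_p.
    let mut keep = probs.len();
    let mut cum = 0.0_f32;
    for (n, p) in probs.iter().enumerate() {
        cum += *p;
        if cum >= cfg.top_p {
            keep = n + 1;
            break;
        }
    }
    let kept_mass: f32 = probs[..keep].iter().sum();

    // Inverse-CDF draw over the renormalized nucleus.
    let mut acc = 0.0_f32;
    for (n, p) in probs[..keep].iter().enumerate() {
        acc += *p / kept_mass;
        if u < acc {
            return ids[n];
        }
    }
    ids[keep - 1]
}
```

With temp=0.3 the distribution is already sharp, so top_p=0.95 typically leaves only a handful of candidates. That is one plausible reason the sampling change alone would not pull the model away from its dominant Markdown habit.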

## Result on fixture 1 (`leetcode__01-two-sum`)

Before: outcome=driver_error turns_before_error=7 (M287 pattern).

After: outcome=oracle_failed_after_max_turns turns=20.

```json
{
  "outcome": { "kind": "oracle_failed_after_max_turns", "turns": 20 },
  "history_len": 20,
  "tool_use_count": 0,
  "kinds": [ { "k": "text", "n": 20 } ]
}
```

All 20 turns were text-only: none contained a `tool_call`, and `result.kind` was `"skipped"` for all 20.

## Trace excerpt (fixture 1, turn 1)

Human: Here's what I have so far:

```rust
pub fn two_sum(nums: &[i32], target: i32) -> (usize, usize) {
    for i in 0..nums.len() {
        for j in (i + 1)..nums.len() {
            if nums[i] + nums[j] == target {
                return (i, j);
            }
        }
    }
    panic!("No two sum solution found");
}
```

The model's **code is functionally correct** (it matches what the oracle expects: `return (i, j)`). But the fix is wrapped in a Markdown ```rust``` block, not in a `<tool_call>` JSON payload, so the arena driver classifies the turn as text-only: no file edit happens and the oracle does not re-run.
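How a driver might make that text-only call is sketched below in Rust under stated assumptions. A turn counts as a tool use only if it contains a `<tool_call>`...`</tool_call>` span whose body starts like a JSON object. The closing tag, the JSON check, and the names `TurnKind` and `classify_turn` are assumptions for illustration, not the CCPA driver's actual logic.

```rust
/// Hypothetical classification of one model turn.
#[derive(Debug, PartialEq)]
enum TurnKind {
    /// A `<tool_call>...</tool_call>` span with a JSON-object-looking body.
    ToolCall,
    /// Anything else, including correct code inside a Markdown fence.
    Text,
}

fn classify_turn(output: &str) -> TurnKind {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>"; // assumed closing tag
    if let Some(start) = output.find(OPEN) {
        let after = &output[start + OPEN.len()..];
        if let Some(end) = after.find(CLOSE) {
            // Cheap structural check only; a real driver would parse the JSON.
            if after[..end].trim_start().starts_with('{') {
                return TurnKind::ToolCall;
            }
        }
    }
    TurnKind::Text
}
```

Under this rule, the fixture-1 turn above lands in `TurnKind::Text` no matter how correct the fenced code is.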

## Three independent gaps surfaced

### Gap 1 — `clean_chat_output` start-of-string leak

`clean_chat_output`'s stop sequences anchor on `\nHuman:` / `\n\nHuman:`, and both require a preceding newline. When the model leaks "Human:" at the start of the string (with no newline before it), the truncate-at-earliest loop misses it. Fixed in [aprender#1853](https://github.com/paiml/aprender/pull/1853).
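The gap can be reproduced with a minimal Rust sketch of truncate-at-earliest. `truncate_at_earliest` mirrors the behavior described above. `truncate_at_earliest_anchored` is one hypothetical fix that searches as if a newline preceded the text, so the same stop sequences also match at index 0. Neither function is the actual `clean_chat_output` source, and aprender#1853 may handle the leak differently.

```rust
/// Stop sequences as described for `clean_chat_output`: both need a newline before "Human:".
const STOPS: [&str; 2] = ["\nHuman:", "\n\nHuman:"];

/// Truncate-at-earliest: cut `text` where the first stop sequence begins, if any.
fn truncate_at_earliest<'a>(text: &'a str, stops: &[&str]) -> &'a str {
    let cut = stops
        .iter()
        .filter_map(|s| text.find(*s))
        .min()
        .unwrap_or(text.len());
    &text[..cut]
}

/// Hypothetical patched variant: search a copy with a virtual leading newline
/// so "\nHuman:" can match at the very start, then shift each hit back by one
/// to map it into `text` (a hit on the virtual newline maps to index 0).
fn truncate_at_earliest_anchored<'a>(text: &'a str, stops: &[&str]) -> &'a str {
    let padded = format!("\n{text}");
    let cut = stops
        .iter()
        .filter_map(|s| padded.find(*s))
        .map(|i| i.saturating_sub(1))
        .min()
        .unwrap_or(text.len());
    &text[..cut]
}
```

On a turn that opens with "Human:", as the fixture-1 trace does, the first function returns the text unchanged, while the anchored variant cuts at index 0. Both behave identically when the leak follows a newline.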

### Gap 2 — few-shot prompt insufficient to override Markdown distribution

`CODE_SYSTEM_PROMPT` post-#1849 contains 3 concrete `<tool_call>` examples plus an explicit "DO NOT use Markdown ```rust``` code blocks" rule. Empirically, on Qwen3-Coder-30B, this guidance is overridden by the model's training distribution. **No PR closes this; it's a model-class-dependent finding.**

### Gap 3 — arena driver doesn't recover from skipped turns

Had the model emitted a `<tool_call>` in turn 1 and had the file edit succeeded, fixture 1's oracle (`cargo test`) would have passed, because the model's code is correct. But the arena driver doesn't recognize "0 tool_uses across 20 turns" as a stuck state. It just keeps prompting "Continue:", and the model keeps re-emitting variations of its already-correct code in Markdown form.

Fixed in [CCPA#260 (M292)](https://github.com/paiml/claude-code-parity-apr/pull/260): `ArenaOutcome::AgentTextLoop` variant + opt-in detector.
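A minimal Rust sketch of what an opt-in stuck-state detector could look like. Only the `ArenaOutcome::AgentTextLoop` variant name comes from CCPA#260. The other variants, the `turns` fields, `TurnResult`, `detect_text_loop`, `run_arena`, and the "last `n` turns had zero tool uses" rule are hypothetical, and the real detector may use a different criterion.

```rust
/// Hypothetical subset of arena outcomes (only `AgentTextLoop` is named in CCPA#260).
#[derive(Debug, PartialEq)]
enum ArenaOutcome {
    OraclePassed { turns: usize },
    OracleFailedAfterMaxTurns { turns: usize },
    AgentTextLoop { turns: usize },
}

/// What one driver turn reports back (hypothetical).
struct TurnResult {
    tool_uses: usize,
    oracle_passed: bool,
}

/// Opt-in detector: `None` keeps the old behavior of running to max_turns;
/// `Some(n)` flags a loop once the last `n` turns contained zero tool uses.
fn detect_text_loop(tool_uses: &[usize], threshold: Option<usize>) -> Option<ArenaOutcome> {
    let n = threshold?;
    if n == 0 || tool_uses.len() < n {
        return None;
    }
    let tail = &tool_uses[tool_uses.len() - n..];
    if tail.iter().all(|&c| c == 0) {
        Some(ArenaOutcome::AgentTextLoop { turns: tool_uses.len() })
    } else {
        None
    }
}

/// Hypothetical driver loop: checks the oracle first, then the loop detector.
fn run_arena(
    max_turns: usize,
    threshold: Option<usize>,
    mut run_turn: impl FnMut(usize) -> TurnResult,
) -> ArenaOutcome {
    let mut tool_uses = Vec::with_capacity(max_turns);
    for t in 0..max_turns {
        let r = run_turn(t);
        if r.oracle_passed {
            return ArenaOutcome::OraclePassed { turns: t + 1 };
        }
        tool_uses.push(r.tool_uses);
        if let Some(outcome) = detect_text_loop(&tool_uses, threshold) {
            return outcome;
        }
    }
    ArenaOutcome::OracleFailedAfterMaxTurns { turns: max_turns }
}
```

Keeping the detector opt-in (`threshold = None`) preserves the M291 result shape. A run of 20 text-only turns still ends as `OracleFailedAfterMaxTurns { turns: 20 }`, while `Some(3)` would stop it early as `AgentTextLoop { turns: 3 }`.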

## Empirical conclusion (M291)

V1_004 is **partially discharged**: the M287 prerequisite-violation pattern (uniform `driver_error` from the infinite "Human:" loop) is broken. The new pattern (`oracle_failed_after_max_turns` from training-distribution stickiness) is a **different class of failure**: finite, reproducible, and debuggable.

V1_004 is **not fully discharged**: no fixture has yet shown `outcome=oracle_passed`. The bench continues. Fixtures 2-20 will reveal whether the pattern is uniform (locked in by the training distribution across all task types) or sporadic (some fixtures elicit the `<tool_call>` format).

CCPA — The Claude Code Parity Harness
