Chapter 69: R8 Performance Matrix and the HTTP Stub Defect
R8 of the dogfood rounds (2026-04-18) did what R1–R7 deferred: measure every CLI path, plot the slow ones, and cross-reference pmat’s published O(1) contract against observed reality. The output is a benchmark matrix covering 80+ command surfaces, a new defect in the exit-0-on-error family (D72), a verification that D42 (MCP stdin EOF hang) is fixed, and a fresh look at the MCP surface gap that Chapter 64 flagged. This chapter is the reader’s fast path to those findings.
Full raw data lives in /tmp/pmat-dogfood/round8/bench-r8.md on the dogfood host; issue #347 tracks D72 and related follow-ups.
69.1 The Benchmark Matrix — What We Measured
Runner: /usr/bin/time -v, 3 runs per command, wall_p50 = median, wall_p95 = max-of-3. Two corpora: tiny (a 2-function Rust hello-world) and repo (the PMAT codebase itself, ~18K functions). Thresholds: O(1) paths ≤500 ms wall, ≤256 MB RSS.
Trivial paths (help, version, list, cache stats, memory stats, diagnose) are all ≤1 ms wall, ~15 MB RSS. Forty+ help outputs: clean.
The slow tail on the real repo is the headline:
Top-10 Slowest Commands (by wall_p50)
| # | cmd | wall_p50 | wall_p95 | rss | notes |
|---|---|---|---|---|---|
| 1 | pmat score | 56.05 s | 56.90 s | 1.41 GB | Peer to comply check (D10/D54, 52 s / 1.44 GB). |
| 2 | pmat quality-gate | 31.90 s | 32.60 s | 319 MB | 64× over O(1) wall threshold. |
| 3 | pmat perfection-score | 5.35 s | 5.40 s | 145 MB | |
| 4 | pmat rust-project-score | 4.14 s | 4.14 s | 52 MB | |
| 5 | pmat tdg . | 2.91 s | 5.16 s | 47 MB | |
| 6 | pmat analyze duplicates | 2.08 s | 15.43 s | 44 MB | Extreme p50→p95 variance (7.4×). |
| 7 | pmat analyze complexity | 1.11 s | 1.12 s | 30 MB | |
| 8 | pmat infra-score | 0.95 s | 1.03 s | 37 MB | Borderline. |
| 9 | pmat analyze satd | 0.49 s | 0.55 s | 18 MB | Meets budget. |
| 10 | pmat analyze churn --days 30 | 0.47 s | 0.48 s | 70 MB | Meets budget. |
The score entry is the most interesting finding of R8: it is a structural peer of the comply-check defect (D10/D54), sitting at an essentially identical time/memory profile. §69.3 explains why that matters.
69.2 D72 — pmat serve --transport http Binds No Port
R8’s second headline is a new entry in the exit-0-on-error family. Chapter 65 already documented D13 (the HTTP server is a stub for --transport http). R8 extends that finding: every one of the four transports is a stub, none bind a port, and every one reports “Server ready”. This is D72.
Reproduction
pmat serve --transport http --port 58719 &
PID=$!
sleep 2
ss -tln | grep 58719 # expect: (nothing — no listener)
curl -sSf http://127.0.0.1:58719/health 2>&1
# expect: "Connection refused"
kill $PID
Stderr banner while “running”:
🚀 Starting PMAT HTTP server on http://127.0.0.1:58719
✅ Server ready!
Health check: http://127.0.0.1:58719/health
API base: http://127.0.0.1:58719/api/v1
HTTP server functionality ready for implementation.
Press Ctrl+C to exit.
All four transport values show the same pattern:
| transport | bound? | banner |
|---|---|---|
--transport http | NO | “HTTP server functionality ready for implementation.” |
--transport web-socket | NO | “WebSocket server implementation ready for 127.0.0.1:PORT” |
--transport http-sse | NO | “HTTP-SSE server implementation ready for 127.0.0.1:PORT” |
--transport all | NO | “Full server implementation ready for 127.0.0.1:PORT” |
Why This Is a Silent-Hang Footgun
A CI job or integration test that runs:
pmat serve --transport http --port 9090 &
sleep 3
curl http://localhost:9090/health # hangs or reports connection refused
will either:
- Timeout on the curl — reported as a downstream test failure, not a pmat bug.
- Leak the background process —
pmat servestays alive until the CI runner tears down the job, consuming a runner slot.
The banner says “ready” and the exit code (when eventually killed) is 0. Nothing in the contract tells the caller that pmat serve --transport http is not shippable. This is the same class as D16/D25/D26/D35/D57/D64..D71: a command that reports success but does not do what its name promises. D72 adds a wrinkle — instead of exiting 0 immediately, it exits 0 after hanging forever. That’s worse, because automated pipelines hang.
Harness
The companion regression harness is examples/http_stub_probe.rs in the pmat repo (R8 PR #346):
cargo run --example http_stub_probe
# expected today: exit=1, "PORT NOT BOUND — D72 reproduced. Banner lies."
# expected post-fix: exit=0, "PORT BOUND — D72 appears FIXED."
It spawns the server, waits 1.5 s, attempts TcpStream::connect_timeout, and flips its exit code when the defect closes. Zero new dependencies — stdlib only.
Fix Options
The D72 remediation is one of three:
- Implement the transports. Replace each stub
handle_*_serverinserver/src/cli/analysis_utilities/comprehensive_serve.rswith a realaxum::Router+tokio::net::TcpListener::bindpair wiring the existing MCP dispatch logic to REST/SSE/WebSocket. Medium scope; tracked as the expected fix for issue #333 D13. - Make the stub exit 1. Until (1) lands, change the handlers to eprintln “not yet implemented” and return
exit code 2. This turns a silent hang into a loud failure, which is the Toyota-Way honest move. - Remove the stub from the CLI surface.
claprejects unknown values with exit 2 automatically; removinghttp/web-socket/http-sse/allfrom the--transportenum eliminates the surface entirely until it can ship.
Option (2) is the R8 recommendation as a tactical patch.
69.3 The “Structural Peer” Pattern
A structural peer defect is one where two surfaces share enough of a codepath that one fix plausibly closes both. R8’s headline structural peer:
| defect | command | wall | rss |
|---|---|---|---|
| D10/D54 | pmat comply check | 52 s | 1.44 GB |
| R8 finding | pmat score | 56 s | 1.41 GB |
These are not independent slow paths. Both commands walk the project tree with walkdir::WalkDir, both deserialise large JSON graphs, both materialise intermediate results in a shared-shape Report. R7’s CB-1800 patch sketch (Chapter 68 §68.5) targeted comply-check specifically, but the data-types it proposes — ProjectSnapshot, FileCorpus, NamedTimer — generalise to any command that does a full-project walk. quality-gate (32 s, 319 MB) is the same family. rust-project-score (4 s, 52 MB) and perfection-score (5 s, 145 MB) are lighter peers but share the pattern.
The pattern matters for remediation triage. When you profile comply-check and fix the WalkDir-deduplication bug, you close the top-5 slow commands with the same PR instead of fixing them one at a time. This is the argument for landing CB-1800 as the first-priority R9 kaizen ticket: its blast radius covers D10, D54, plus the R8 top-4 violators.
69.4 D42 VERIFIED RESOLVED — MCP stdin EOF
Chapter 67 catalogued D42 — “MCP stdin never closes; pmat serve –mode mcp hangs 60 s after last JSON-RPC request”. R8’s MCP harness found it fixed in 3.14.0. Evidence:
cat request.json | /usr/bin/time pmat serve --mode mcp
# observed: exits within 10 ms of EOF, user=0.02 sys=0.01 real=0.03
All four MCP calls (tools/list, tools/call get_server_info, tools/call list_templates, tools/call analyze_complexity) complete sub-50 ms, ≤22 MB RSS. MCP-over-stdio is the one transport surface that is healthy and meets the O(1) contract. Chapter 64 is now accurate; Chapter 68 §68.1’s D42 row can be annotated RESOLVED in a future revision.
69.5 MCP Surface Gap — 21 of ~70 Tools Wired
A secondary R8 finding: tools/list returns exactly 21 MCP tools:
get_server_info, generate_template, list_templates, validate_template,
scaffold_project, search_templates, analyze_code_churn, analyze_complexity,
analyze_dag, generate_context, analyze_dead_code, analyze_deep_context,
analyze_duplicates_vectorized, analyze_graph_metrics_vectorized,
analyze_name_similarity_vectorized, analyze_symbol_table_vectorized,
analyze_incremental_coverage_vectorized, analyze_big_o_vectorized,
generate_enhanced_report, analyze_satd, analyze_lint_hotspot
PMAT’s CLI surface has ~70 subcommands. Four of the most user-visible are not exposed via MCP:
quality-gate— the single most-demanded MCP tool per dogfood feedback.tdg— agents ask for TDG grades constantly; today they must shell out.score/repo-score/rust-project-score— the whole score family.query— semantic search; agents want this as an MCP tool badly.
The R8 recommendation is to track MCP-completeness as its own ticket (roughly “MCP-0001: wire query/quality-gate/score/tdg under the existing MCP dispatch”). The pmcp 2.3 release ships #[mcp_tool] derive macros (PR #340 in the pmat repo is pending) which make this mechanical — annotate each CLI handler, auto-derive the MCP inputSchema, ship. The gap is configuration, not code.
69.6 The O(1) Scorecard
Applying the 500 ms / 256 MB threshold from the PMAT hook contract:
| Bucket | Count | Examples |
|---|---|---|
| Meets budget (≤500 ms wall, ≤256 MB RSS) | ~45 | all --help, diagnose, list, cache stats, query *, deps-audit, repo-score, analyze satd, analyze churn |
| Borderline (500 ms–2 s) | ~8 | analyze complexity, infra-score, analyze dead-code (p95 spike), query dispatch (p95 420 ms) |
| Over wall, under memory (2–10 s, ≤256 MB) | ~6 | analyze duplicates (p95 15 s), tdg ., rust-project-score, perfection-score, analyze dead-code (p95) |
| Over wall AND over memory (V++) | 2 | pmat score (56 s / 1.41 GB), pmat quality-gate (32 s / 319 MB) |
| Silent hang (∞ wall, 0 work) | 4 | pmat serve --transport {http,web-socket,http-sse,all} — D72 |
The scorecard is healthier than Chapter 67 suggested: the help-class and query-class commands comfortably meet the contract. The score-class does not, and the 2× V++ entries (score + quality-gate) consume disproportionate blame. Fix the WalkDir-dedup hotspot, and the scorecard moves from 2 V++ / 6 V / 8 borderline to 0 V++ / 2 V / 4 borderline overnight.
69.7 How to Reproduce R8
# 1. Benchmark matrix (skip V++ commands unless you have 5+ minutes)
cd /path/to/paiml-mcp-agent-toolkit
for cmd in "--version" "--help" "analyze satd" "analyze churn --days 30" \
"query dispatch --limit 5" "repo-score" "analyze complexity" \
"tdg ." "score" "quality-gate"; do
/usr/bin/time -v pmat $cmd 2>&1 | grep -E "(Elapsed|Maximum resident)"
done
# 2. D72 HTTP stub probe
cargo run --example http_stub_probe
# expected today: exit=1, "PORT NOT BOUND — D72 reproduced."
# 3. D42 MCP stdin EOF regression check
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
| /usr/bin/time -v pmat serve --mode mcp 2>&1 \
| grep -E "(Elapsed|Maximum resident)"
# expected: Elapsed <100 ms; D42 RESOLVED
# 4. MCP surface audit
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' \
| pmat serve --mode mcp | python3 -c "
import json, sys
data = json.load(sys.stdin)
tools = [t['name'] for t in data['result']['tools']]
print(f'{len(tools)} MCP tools; gaps: quality-gate, tdg, score, query')
"
69.8 What Landed, What Didn’t
R8 was deliberately book-only. Three harnesses shipped with the R7 PR (#346 examples: exit_code_audit_driver, mcp_timing_bench, o1_hook_probe). R8 adds a fourth — http_stub_probe.rs — tracked in the same PR. No pmat repo commits from R8 beyond that example; the data is in this chapter and in /tmp/pmat-dogfood/round8/bench-r8.md. Cross-references:
- Issue #347 — D72 HTTP stub, R8 performance matrix.
- Issue #337 — kaizen roadmap; expect CB-1800 (see Chapter 68 §68.5) to land R9 as first priority.
- PR #346 — R8 examples (4 harnesses: mcp_timing_bench, o1_hook_probe, exit_code_audit_driver, http_stub_probe).
- Chapter 65 — HTTP server stub (D13, pre-dates D72 by a release).
- Chapter 67, Chapter 68 — dogfooding precursors.
69.9 Closing
R8 added two durable findings:
pmat score= 56 s / 1.41 GB. Structural peer of comply-check. The top-2 V++ slow paths share a hotspot. Fix one, close both.- D72 — the HTTP serve stub is a silent-hang defect. Banner promises, port doesn’t bind, process hangs. Exit-code-0-on-error’s worst variant.
The discipline holds from Chapter 68:
If you cannot reproduce a defect, it is not a defect. If you cannot gate a defect, it will regress. If a performance claim has no benchmark, the claim is marketing.
R8 put numbers on the performance claims. Most of them hold. Two of them don’t. One of them is a socket that isn’t there. The work of R9 is to close the top structural-peer hotspot and stop printing “Server ready!” for servers that are not.
Cross-references:
- GitHub issues #347, #337, #333
- PR #346 — R8 examples
- Chapter 65 (
ch65-00-http-server.md) — HTTP server D13 (pre-R8 surface) - Chapter 67 (
ch67-00-dogfooding.md) — five-round dogfood catalogue (D1..D58) - Chapter 68 (
ch68-00-r7-defect-remediation.md) — CB-1800 patch sketch - Source:
server/src/cli/analysis_utilities/comprehensive_serve.rs(stubs) - Raw data:
/tmp/pmat-dogfood/round8/bench-r8.md,results.tsv,results-mcp.tsv