Chapter 70: R9 — Execve Coupling and the FileCache Pattern

R9 of the dogfood rounds (2026-04-18, same day as R8) did what R8 flagged: strace the 56 s pmat score path, find the structural peer bug in code, and name the remediation algebra. The answer is worse than Chapter 69 predicted. pmat score does not merely share a walker with pmat comply check — it literally execves pmat comply check as a child process, along with 122 other subprocess spawns for sibling score calculations. This chapter documents that finding, the five redundant WalkDir walks inside muda, the good model that already exists in-tree, and the refined kaizen ordering (KAIZEN-0087..0089) needed to close the hotspot.

Prior context: Chapter 68 (R7) sketched FileCache, ProjectSnapshot, FileCorpus. Chapter 69 (R8) measured the perf matrix and named score a structural peer of comply check (both 52–56 s, 1.41–1.44 GB). R9 proves the peer relationship is not coincidence — it is a literal parent/child relationship.

70.1 The Execve Discovery

The first R9 strace run on pmat score reported syscalls that no single-process code path should produce:

| syscall | count |
|---|---|
| `statx` | 1,600,000 |
| `openat` | 540,000 |
| `getdents64` | 203,000 |
| `read` | 877,000 |

1.6 M statx calls on an 18K-function repo means the tree is being walked ~90 times. The explanation is in the source. server/src/cli/handlers/score_handler.rs:258:

async fn compute_comply(path: &Path) -> (f64, usize, usize) {
    // Run pmat comply check --format json as subprocess to avoid internal coupling
    let output = std::process::Command::new("pmat")
        .args(["comply", "check", "--format", "json"])
        .current_dir(path)
        .stdout(std::process::Stdio::piped())
        ...
}

The comment says the quiet part aloud: to avoid internal coupling. The fix for coupling was to execve a fresh pmat binary. That fresh binary loads .pmat/context.db (275 MB for a workspace), re-parses every Cargo.toml, and re-walks the full tree. R9’s accounting: pmat score spawns 124 pmat children — one for comply, 122 for per-category sub-scorers, one for the final aggregate. Each child pays the full startup tax. 56 s wall, 1.41 GB RSS, 4.3 M syscalls. That is the number Chapter 69 measured; R9 shows why.

70.2 Muda’s Five Redundant Walks

R9’s second finding is inside muda, the waste-detection subsystem of comply. Across server/src/cli/handlers/comply_handlers/muda_handlers_metrics.rs and muda_handlers_measurement.rs, muda reads the source tree five times per invocation:

| file:line | purpose |
|---|---|
| `muda_handlers_metrics.rs:11` | `WalkDir::new(&src_dir)`: count files |
| `muda_handlers_metrics.rs:83` | `WalkDir::new(&src_dir)`: collect `.rs` paths |
| `muda_handlers_metrics.rs:119` | `WalkDir::new(&src_dir)`: size metrics |
| `muda_handlers_metrics.rs:241` | `WalkDir::new(&src_dir)`: age metrics |
| `muda_handlers_measurement.rs:92` | `WalkDir::new(src_dir)`: coverage denominator |

None of these walks cache. None of them share state. muda is called once per comply check. If every one of the 124 children reached muda, five walks × 124 children × 18K files would give about 11.2 M stat operations. That is an upper bound, not a match for the trace: the observed statx count is 1.6 M (about 89 full-tree walks), and §70.1 counts only one of the 124 children as comply itself. R9’s estimate is that the muda walks account for ~70% of the observed syscall budget.
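The arithmetic in this section can be checked in a few lines. This is a minimal sketch that uses only figures quoted in this chapter; the constant and function names are illustrative, and the model of one stat call per file per walk is an assumption, not something R9 measured.

```rust
// Figures quoted in this chapter; the sketch itself measures nothing.
const FILES: u64 = 18_000; // "18K files"
const CHILDREN: u64 = 124; // pmat children per `pmat score`
const MUDA_WALKS: u64 = 5; // WalkDir call sites in muda
const OBSERVED_STATX: u64 = 1_600_000; // statx row of the strace table

/// Stat operations if every child ran all five walks (assumed: one stat per file per walk).
fn worst_case_stats() -> u64 {
    MUDA_WALKS * CHILDREN * FILES
}

/// Full-tree walks implied by the observed statx count under the same assumption.
fn implied_walks() -> f64 {
    OBSERVED_STATX as f64 / FILES as f64
}

fn main() {
    println!("worst case: {} stat ops", worst_case_stats());
    println!("implied full-tree walks: {:.1}", implied_walks());
}
```

The product is a ceiling that assumes every child executes every muda walk; the implied-walk figure is what the trace supports under the same per-file model.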

70.3 The Good Model Already Exists

R9’s third finding is the one that turns this into a tractable kaizen ticket instead of a rewrite. The correct pattern is already in the codebase, at server/src/services/rust_project_score/orchestrator.rs:161:

// Kaizen Round 4: Create FileCache once, share across all scorers
// BEFORE: Each scorer read filesystem independently (22 walks!)
// AFTER: Single filesystem walk, cache shared (3x faster)
let file_cache = match FileCache::populate(project_path) {
    Ok(cache) => { /* ... */ Some(cache) }
    Err(e) => { /* ... */ None }
};

// Kaizen Round 5: Parallel scorer execution for 2-3x speedup
let results: Result<Vec<_>, ScorerError> = Ok(self.scorers
    .par_iter()
    .filter_map(|scorer| {
        scorer.score_with_cache(project_path, mode, file_cache.as_ref())
            .ok().map(/* ... */)
    })
    .collect());

rust-project-score measures in at 4.1 s / 52 MB (Chapter 69 top-10 row 4) with the same category breadth as score precisely because it does one walk, then parallel in-memory consumption. score’s 56 s / 1.41 GB is the cost of not doing that. The KAIZEN-0083 hypothesis from Chapter 68 — “expose every per-file analyzer as Fn(FileFacts) -> Result<Metric>” — is the long-term end state. But the short-term fix is less ambitious: delete the execve, call comply_check() as a library function with a shared FileCache.
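The walk-once, consume-many shape can be sketched with the standard library alone. Everything below is hypothetical: this `FileCache` is a stand-in and not the in-tree type, the three scorers are toy metrics, and plain threads replace rayon so the sketch has no dependencies.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the in-tree `FileCache`: path -> contents, filled by one walk.
struct FileCache {
    files: HashMap<PathBuf, String>,
}

impl FileCache {
    /// Walk `root` once and read every regular file that is valid UTF-8.
    /// Symlinks and unreadable or non-UTF-8 files are skipped.
    fn populate(root: &Path) -> io::Result<Self> {
        let mut files = HashMap::new();
        let mut pending = vec![root.to_path_buf()];
        while let Some(dir) = pending.pop() {
            for entry in fs::read_dir(&dir)? {
                let entry = entry?;
                let path = entry.path();
                let kind = entry.file_type()?;
                if kind.is_dir() {
                    pending.push(path);
                } else if kind.is_file() {
                    if let Ok(text) = fs::read_to_string(&path) {
                        files.insert(path, text);
                    }
                }
            }
        }
        Ok(Self { files })
    }
}

/// A scorer is a function of the shared cache; none of them touch the filesystem.
type Scorer = fn(&FileCache) -> f64;

fn file_count(cache: &FileCache) -> f64 {
    cache.files.len() as f64
}

fn rust_file_count(cache: &FileCache) -> f64 {
    cache
        .files
        .keys()
        .filter(|p| p.extension().map_or(false, |e| e == "rs"))
        .count() as f64
}

fn total_lines(cache: &FileCache) -> f64 {
    cache.files.values().map(|t| t.lines().count()).sum::<usize>() as f64
}

/// One walk, then each scorer runs on its own thread against the same `Arc<FileCache>`.
fn score_all(root: &Path, scorers: &[Scorer]) -> io::Result<Vec<f64>> {
    let cache = Arc::new(FileCache::populate(root)?);
    let handles: Vec<_> = scorers
        .iter()
        .map(|&scorer| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || scorer(&cache))
        })
        .collect();
    Ok(handles
        .into_iter()
        .map(|h| h.join().expect("scorer panicked"))
        .collect())
}

fn main() -> io::Result<()> {
    let scores = score_all(Path::new("."), &[file_count, rust_file_count, total_lines])?;
    println!("{scores:?}");
    Ok(())
}
```

Adding a scorer here costs one more in-memory pass, not one more tree walk, which is the property the orchestrator excerpt relies on.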

70.4 The Algebra (arXiv:2412.10632)

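R9’s fourth finding is the framing from Composable Program Analyses via Monoid Homomorphisms (arXiv:2412.10632). In that framing, a file-level metric aggregator can be evaluated in any order and under any partition of the file set when its combining operations satisfy the properties below. A monoid additionally needs an identity element, which here is the metric of the empty file set.

⊕ commutative: files can be walked in any order and the result is stable.
⊗ sequencing: per-file metrics compose associatively.
Associativity: (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c), so the result is independent of how the files are partitioned.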

Every scorer in score today satisfies this trivially on paper — they are sums, means, or ratios over file properties. But the execve coupling breaks it: subprocess boundaries mean the monoid is evaluated 124× instead of once. The refactor target is to move from spawn × N to rayon::par_iter() × (FileFacts → Metric), where the paper’s algebra guarantees the rayon partition produces the same result as the sequential one.

This is the R9 contribution to the spec: the Fn(FileFacts) -> Result<Metric> trait in KAIZEN-0083 is not arbitrary. It is the shape the monoid algebra requires.
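A small sketch makes the partition-independence claim concrete. The `Monoid` trait, `FileFacts` struct, and `MeanLines` accumulator below are hypothetical names chosen for illustration; the chapter does not show the real types. The sketch also shows one detail the prose glosses over: a mean is not monoidal on its own, but the pair (sum, count) is, and the mean is derived at the end.

```rust
/// A monoid: an identity element plus an associative `combine`.
trait Monoid: Sized {
    fn empty() -> Self;
    fn combine(self, other: Self) -> Self;
}

/// Hypothetical per-file facts; the real `FileFacts` is not shown in this chapter.
struct FileFacts {
    lines: u64,
}

/// Accumulator for "mean lines per file", carried as (sum, count).
#[derive(Clone, Copy, Debug, PartialEq)]
struct MeanLines {
    sum: u64,
    count: u64,
}

impl Monoid for MeanLines {
    fn empty() -> Self {
        MeanLines { sum: 0, count: 0 }
    }
    fn combine(self, other: Self) -> Self {
        MeanLines { sum: self.sum + other.sum, count: self.count + other.count }
    }
}

impl MeanLines {
    /// Lift one file into the monoid.
    fn of(facts: &FileFacts) -> Self {
        MeanLines { sum: facts.lines, count: 1 }
    }
    /// Derive the mean only at the end; `None` for the empty file set.
    fn mean(self) -> Option<f64> {
        (self.count > 0).then(|| self.sum as f64 / self.count as f64)
    }
}

/// Sequential evaluation: one left fold over all files.
fn fold_all(files: &[FileFacts]) -> MeanLines {
    files.iter().map(MeanLines::of).fold(MeanLines::empty(), MeanLines::combine)
}

/// Partitioned evaluation: fold each chunk, then fold the partial results.
fn fold_chunked(files: &[FileFacts], chunk: usize) -> MeanLines {
    files
        .chunks(chunk.max(1))
        .map(fold_all)
        .fold(MeanLines::empty(), MeanLines::combine)
}

fn main() {
    let files: Vec<FileFacts> =
        [10u64, 20, 30, 40, 50].iter().map(|&lines| FileFacts { lines }).collect();
    assert_eq!(fold_all(&files), fold_chunked(&files, 2));
    println!("{:?}", fold_all(&files).mean());
}
```

With integer sums the two evaluations agree exactly. With floating-point sums associativity holds only approximately, so a parallel fold can differ from the sequential one in the last bits.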

70.5 Refined Kaizen Ordering (KAIZEN-0087..0089)

Chapter 68 §68.5 sketched CB-1800 as one big ticket. R9 splits it along the dependency graph into five smaller ones, ordered by risk and reward:

| ticket | scope | effort | expected impact |
|---|---|---|---|
| KAIZEN-0087 | Remove `execve` in `score_handler.rs:258`; call `comply_check()` as library | 0.5 d | drops 124 subprocess spawns → 1 |
| KAIZEN-0084 | Wire `HooksCacheManager` into `comply check`; stop re-parsing config per run | 1 d | −15% wall on comply |
| KAIZEN-0088 | De-duplicate the 5 `muda` walks into one `Arc<FileCache>` pass | 1 d | −70% of syscall budget |
| KAIZEN-0089 | Migrate `file_health` + `pv_lint` + `quality-gate` walkers to `FileCache` | 1 d | closes remaining V/V++ from ch69 top-10 |
| KAIZEN-0083 | Expose all per-file analyzers as `Fn(FileFacts) -> Result<Metric>`; monoid-clean | 5 d | long-term architecture win |

KAIZEN-0087 is first because it is the highest-leverage, lowest-risk change: one Command::new replaced with a library call, and the top-1 V++ row from ch69 collapses. KAIZEN-0083 is last because it is a type-system refactor and its correctness proof rides on the arXiv:2412.10632 algebra — worth doing, but not worth blocking the immediate wins on.

70.6 Five Instrumentation Points for Future Profiling

R9’s dhat run (same methodology as the March 2026 memory profiling note in MEMORY.md) identified five cold spots that need NamedTimer instrumentation before KAIZEN-0087..0089 ship. Without timers, the post-fix claim “we dropped 56 s to N s” has no receipt:

score_handler.rs:258 — subprocess spawn for comply. Timer wraps the Command::new → output.wait() span.
muda_handlers_metrics.rs:11 — first WalkDir. Timer wraps the collect().
muda_handlers_metrics.rs:83 — second WalkDir. Same pattern.
muda_handlers_measurement.rs:92 — coverage-denominator walk. Attributable to a different scorer; separate timer so we can see which scorer dominates.
rust_project_score/orchestrator.rs:161 — FileCache::populate itself. This is the good path; timer here lets us quote the ratio bad:good honestly.

The NamedTimer type itself is sketched in Chapter 68 §68.5; R9 recommends landing it as its own tiny ticket (≈30 LOC) before the kaizen work so each KAIZEN-00XX PR ships with a before/after row in the chapter.
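For orientation, a drop-based timer of roughly that size could look like the following. This is a guess at the shape, not the Chapter 68 design: the field names, the `timed` helper, the `Vec` sink, and the span labels are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical RAII timer; the real `NamedTimer` is sketched in Chapter 68 and may differ.
struct NamedTimer<'a> {
    name: &'static str,
    start: Instant,
    sink: &'a mut Vec<(&'static str, Duration)>,
}

impl<'a> NamedTimer<'a> {
    fn start(name: &'static str, sink: &'a mut Vec<(&'static str, Duration)>) -> Self {
        NamedTimer { name, start: Instant::now(), sink }
    }
}

impl Drop for NamedTimer<'_> {
    /// Record the elapsed time when the timer goes out of scope.
    fn drop(&mut self) {
        self.sink.push((self.name, self.start.elapsed()));
    }
}

/// Run `work` under a named timer and return its result.
fn timed<T>(
    name: &'static str,
    sink: &mut Vec<(&'static str, Duration)>,
    work: impl FnOnce() -> T,
) -> T {
    let _timer = NamedTimer::start(name, sink);
    work()
}

fn main() {
    let mut rows = Vec::new();
    // Span labels are illustrative; the closures stand in for the real work.
    let n = timed("muda_walk_1", &mut rows, || (0..1_000u64).sum::<u64>());
    timed("muda_walk_2", &mut rows, || std::thread::sleep(Duration::from_millis(5)));
    for (name, elapsed) in &rows {
        println!("{name}: {elapsed:?}");
    }
    println!("sum = {n}");
}
```

Recording on drop means an early return or `?` inside the timed span still produces a row, which keeps the before/after table complete.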

70.7 Why No New Example This Round

Chapter 69 added http_stub_probe.rs to PR #346 alongside mcp_timing_bench.rs, o1_hook_probe.rs, and exit_code_audit_driver.rs. R9 considered a fifth — filecache_redundancy_demo.rs — to measure the ratio of N sequential WalkDir passes vs one walk + N in-memory iterations. The measurement would produce a clean ~N× ratio on any tree, and the code is ≤80 LOC stdlib-only.

R9 skipped it. Reasoning: PR #346 is awaiting merge (user signalled a merge pass is worth doing first), and the ratio is already observable in production data: the 5 muda walks vs the 1 FileCache::populate in rust_project_score give the exact number the demo would reproduce. A demo adds a fifth example to a pending PR that is queued behind a merge gate; the book reference to lines 11/83/119/241/92 vs line 161 is sufficient. If merging #346 clears the queue, KAIZEN-0088’s PR is a natural home for the demo, where it can be the before/after harness for the fix itself.

70.8 How to Reproduce R9

# 1. strace the execve calls (confirms the 124 subprocess spawns)
#    No -c here: -c prints only a summary table and suppresses the per-call lines grep needs.
strace -f -e execve pmat score 2>&1 | grep -c "execve.*pmat"
# expected: 124  (1 root + 1 comply + 122 sub-scorers)

# 2. syscall counts matching R9's table
strace -f -c pmat score 2>&1 | tail -20
# expected: statx ≈ 1.6M, openat ≈ 540K, getdents64 ≈ 203K, read ≈ 877K

# 3. Confirm the good path: rust-project-score ≈ 4 s, same repo
#    GNU time is assumed; the shell builtin `time` does not report RSS.
/usr/bin/time -v pmat rust-project-score
# expected: Elapsed ≈ 4 s, RSS ≈ 52 MB

# 4. Count the muda walks in-source
grep -n "walkdir::WalkDir" \
  server/src/cli/handlers/comply_handlers/muda_handlers_metrics.rs \
  server/src/cli/handlers/comply_handlers/muda_handlers_measurement.rs
# expected: 5 matches (lines 11, 83, 119, 241 + 92)

# 5. Point at the good model
grep -n "FileCache::populate" server/src/services/rust_project_score/orchestrator.rs
# expected: line 161

70.9 Closing

R9 answered the question Chapter 69 left open: why is pmat score a structural peer of comply check? Answer: it is not a peer, it is a parent that spawns comply as a child, and that child itself walks the tree five redundant times. The structural-peer framing under-stated the coupling. It is not that two commands share a walker; it is that one command spawns 124 pmat children, comply check among them, and the inner command is its own five-walk waste factory.

The good news is that the fix is already written. rust_project_score/orchestrator.rs:161 is the model. KAIZEN-0087 is a 0.5-day execve-removal PR. The monoid algebra from arXiv:2412.10632 says the parallel refactor produces the same result as the sequential one. The five WalkDir lines in muda are a visible checklist. Chapter 68’s CB-1800 sketch now has a precise ordering and a precise dependency graph.

Chapter 68’s rule holds: if you cannot reproduce a defect, it is not a defect. R9’s contribution: if the fix is already in the same repo, the only kaizen that matters is copying it next door.

Cross-references:

Chapter 68 (ch68-00-r7-defect-remediation.md) — CB-1800 patch sketch (pre-split).
Chapter 69 (ch69-00-r8-performance-and-http-stub.md) — performance matrix + structural peer framing.
PR #346 — R8 examples (pending merge; R9 added no new example).
Issue #347 — R8/R9 perf findings, D72.
Issue #337 — kaizen roadmap; KAIZEN-0087..0089 to land here.
Source cited: server/src/cli/handlers/score_handler.rs:258, server/src/cli/handlers/comply_handlers/muda_handlers_metrics.rs:{11,83,119,241}, server/src/cli/handlers/comply_handlers/muda_handlers_measurement.rs:92, server/src/services/rust_project_score/orchestrator.rs:161.
Paper: arXiv:2412.10632 — Composable Program Analyses via Monoid Homomorphisms.

PMAT: The PAIML MCP Agent Toolkit