Chapter 70: R9 — Execve Coupling and the FileCache Pattern
R9 of the dogfood rounds (2026-04-18, same day as R8) did what R8 flagged: strace the 56 s pmat score path, find the structural peer bug in code, and name the remediation algebra. The answer is worse than Chapter 69 predicted. pmat score does not merely share a walker with pmat comply check; it execves pmat comply check as a child process, along with 123 other subprocess spawns (122 per-category sub-scorers and one final aggregate, 124 children in total). This chapter documents that finding, the five redundant WalkDir walks inside muda, the good model that already exists in-tree, and the refined kaizen ordering (KAIZEN-0087..0089) needed to close the hotspot.
Prior context: Chapter 68 (R7) sketched FileCache, ProjectSnapshot, FileCorpus. Chapter 69 (R8) measured the perf matrix and named score a structural peer of comply check (both 52–56 s, 1.41–1.44 GB). R9 proves the peer relationship is not coincidence — it is a literal parent/child relationship.
70.1 The Execve Discovery
The first R9 strace run on pmat score reported syscalls that no single-process code path should produce:
| syscall | count |
|---|---|
| statx | 1,600,000 |
| openat | 540,000 |
| getdents64 | 203,000 |
| read | 877,000 |
1.6 M statx calls on an 18K-function repo means the tree is being walked ~90 times. The explanation is in the source. server/src/cli/handlers/score_handler.rs:258:
    async fn compute_comply(path: &Path) -> (f64, usize, usize) {
        // Run pmat comply check --format json as subprocess to avoid internal coupling
        let output = std::process::Command::new("pmat")
            .args(["comply", "check", "--format", "json"])
            .current_dir(path)
            .stdout(std::process::Stdio::piped())
            ...
    }
The comment says the quiet part aloud: to avoid internal coupling. The fix for coupling was to execve a fresh pmat binary. That fresh binary loads .pmat/context.db (275 MB for a workspace), re-parses every Cargo.toml, and re-walks the full tree. R9’s accounting: pmat score spawns 124 pmat children — one for comply, 122 for per-category sub-scorers, one for the final aggregate. Each child pays the full startup tax. 56 s wall, 1.41 GB RSS, 4.3 M syscalls. That is the number Chapter 69 measured; R9 shows why.
70.2 Muda’s Five Redundant Walks
R9’s second finding is inside muda, the waste-detection subsystem of comply. The muda handlers under server/src/cli/handlers/comply_handlers/ read the source tree five times per invocation, four times in muda_handlers_metrics.rs and once in muda_handlers_measurement.rs:
| file:line | purpose |
|---|---|
| muda_handlers_metrics.rs:11 | WalkDir::new(&src_dir): count files |
| muda_handlers_metrics.rs:83 | WalkDir::new(&src_dir): collect .rs paths |
| muda_handlers_metrics.rs:119 | WalkDir::new(&src_dir): size metrics |
| muda_handlers_metrics.rs:241 | WalkDir::new(&src_dir): age metrics |
| muda_handlers_measurement.rs:92 | WalkDir::new(src_dir): coverage denominator |
None of these walks cache, and none of them share state. muda is called once per comply check, which itself is called 124× per score. Five walks × 124 children × 18K files ≈ 11.2 M stat operations in the worst case. The strace table in §70.1 shows 1.6 M statx calls, so that product is an upper bound on the redundancy rather than a match for the observed count. R9’s estimate is that the muda bug alone accounts for ~70% of the observed syscall budget.
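A minimal sketch of what collapsing the walks into one traversal could look like, using only the standard library. `WalkFacts` and `walk_once` are hypothetical names for illustration; the in-tree FileCache may be shaped quite differently, and the real muda metrics are richer than the four fields shown here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical container: everything the separate walks compute,
/// gathered during a single traversal.
#[derive(Debug, Default)]
pub struct WalkFacts {
    pub file_count: usize,                // count files
    pub rs_paths: Vec<PathBuf>,           // collect .rs paths
    pub total_bytes: u64,                 // size metrics
    pub oldest_mtime: Option<SystemTime>, // age metrics
}

/// Visit every entry under `dir` exactly once, filling `facts` as we go.
pub fn walk_once(dir: &Path, facts: &mut WalkFacts) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // One metadata lookup per entry; symlinks are not followed.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            walk_once(&path, facts)?;
        } else if meta.is_file() {
            facts.file_count += 1;
            facts.total_bytes += meta.len();
            if let Ok(mtime) = meta.modified() {
                // Keep the earliest modification time seen so far.
                facts.oldest_mtime = Some(match facts.oldest_mtime {
                    Some(old) if old <= mtime => old,
                    _ => mtime,
                });
            }
            if path.extension().map_or(false, |ext| ext == "rs") {
                facts.rs_paths.push(path);
            }
        }
    }
    Ok(())
}
```

Each consumer then reads from the populated `WalkFacts` instead of starting its own WalkDir; a coverage denominator, for example, could plausibly be derived from `rs_paths.len()` without touching the filesystem again.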
70.3 The Good Model Already Exists
R9’s third finding is the one that turns this into a tractable kaizen ticket instead of a rewrite. The correct pattern is already in the codebase. server/src/services/rust_project_score/orchestrator.rs:161:
    // Kaizen Round 4: Create FileCache once, share across all scorers
    // BEFORE: Each scorer read filesystem independently (22 walks!)
    // AFTER: Single filesystem walk, cache shared (3x faster)
    let file_cache = match FileCache::populate(project_path) {
        Ok(cache) => { /* ... */ Some(cache) }
        Err(e) => { /* ... */ None }
    };
    // Kaizen Round 5: Parallel scorer execution for 2-3x speedup
    let results: Result<Vec<_>, ScorerError> = Ok(self.scorers
        .par_iter()
        .filter_map(|scorer| {
            scorer.score_with_cache(project_path, mode, file_cache.as_ref())
                .ok().map(/* ... */)
        })
        .collect());
rust-project-score measures in at 4.1 s / 52 MB (Chapter 69 top-10 row 4) with the same category breadth as score precisely because it does one walk, then parallel in-memory consumption. score’s 56 s / 1.41 GB is the cost of not doing that. The KAIZEN-0083 hypothesis from Chapter 68 — “expose every per-file analyzer as Fn(FileFacts) -> Result<Metric>” — is the long-term end state. But the short-term fix is less ambitious: delete the execve, call comply_check() as a library function with a shared FileCache.
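To make the short-term fix concrete, here is a hedged sketch of the shape it might take. Everything in it is a stand-in: this `FileCache` struct, the `comply_check` signature, and the toy "file contains TODO" rule are assumptions for illustration, and the meaning assigned to the `(f64, usize, usize)` tuple is a guess, since the source only shows its type.

```rust
use std::path::PathBuf;
use std::sync::Arc;

/// Hypothetical stand-in for the in-tree FileCache:
/// paths and contents loaded once, then shared.
pub struct FileCache {
    pub files: Vec<(PathBuf, String)>,
}

/// Hypothetical library entry point that would replace the
/// `pmat comply check --format json` subprocess.
/// Toy rule: a file "violates" if it contains the text "TODO".
/// Returns (score, violations, files checked); this tuple meaning is assumed.
pub fn comply_check(cache: &FileCache) -> (f64, usize, usize) {
    let total = cache.files.len();
    let violations = cache
        .files
        .iter()
        .filter(|(_, src)| src.contains("TODO"))
        .count();
    let score = if total == 0 {
        100.0
    } else {
        100.0 * (total - violations) as f64 / total as f64
    };
    (score, violations, total)
}

/// Same return type as the original compute_comply, but no Command::new:
/// the already-populated cache is passed in and read in-process.
pub fn compute_comply(cache: Arc<FileCache>) -> (f64, usize, usize) {
    comply_check(&cache)
}
```

The point of the `Arc` is that the other scorers can hold clones of the same cache, so the tree is read once per score run rather than once per child process.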
70.4 The Algebra (arXiv:2412.10632)
R9’s fourth finding is the framing from Composable Program Analyses via Monoid Homomorphisms (arXiv:2412.10632). In that framing, a file-level metric aggregator is a monoid homomorphism when its combining operations satisfy:
- ⊕ commutative: files can be walked in any order and the result is stable.
- ⊗ sequencing: per-file metrics compose associatively.
- Associativity: (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c), so the result is unique regardless of how the files are partitioned.
Every scorer in score today satisfies this trivially on paper — they are sums, means, or ratios over file properties. But the execve coupling breaks it: subprocess boundaries mean the monoid is evaluated 124× instead of once. The refactor target is to move from spawn × N to rayon::par_iter() × (FileFacts → Metric), where the paper’s algebra guarantees the rayon partition produces the same result as the sequential one.
This is the R9 contribution to the spec: the Fn(FileFacts) -> Result<Metric> trait in KAIZEN-0083 is not arbitrary. It is the shape the monoid algebra requires.
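A small sketch of why the monoid shape matters in practice. A mean is not directly mergeable, but a (sum, count) pair is, and merging partial pairs gives the same answer however the files are split. `MeanAcc`, `fold_all`, and `fold_chunked` are illustrative names, not pmat types, and the chunked fold only imitates what a parallel runtime would do with partitions.

```rust
/// Partial aggregate for a mean: a (sum, count) pair.
/// The mean itself does not merge cleanly, but this pair does.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MeanAcc {
    pub sum: u64,
    pub count: u64,
}

impl MeanAcc {
    /// Identity element: merging with it changes nothing.
    pub const EMPTY: MeanAcc = MeanAcc { sum: 0, count: 0 };

    /// Lift one file's measurement (for example, a line count) into the monoid.
    pub fn of(value: u64) -> MeanAcc {
        MeanAcc { sum: value, count: 1 }
    }

    /// Associative and commutative merge of two partial aggregates.
    pub fn merge(self, other: MeanAcc) -> MeanAcc {
        MeanAcc {
            sum: self.sum + other.sum,
            count: self.count + other.count,
        }
    }

    /// Final projection; None when no files were seen.
    pub fn mean(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum as f64 / self.count as f64)
        }
    }
}

/// Fold per-file values in one sequential pass.
pub fn fold_all(values: &[u64]) -> MeanAcc {
    values
        .iter()
        .map(|&v| MeanAcc::of(v))
        .fold(MeanAcc::EMPTY, MeanAcc::merge)
}

/// Fold each chunk separately, then merge the partial results,
/// the way a partitioned (parallel) evaluation would.
pub fn fold_chunked(values: &[u64], chunk: usize) -> MeanAcc {
    values
        .chunks(chunk.max(1))
        .map(fold_all)
        .fold(MeanAcc::EMPTY, MeanAcc::merge)
}
```

For any chunk size, `fold_chunked` returns the same `MeanAcc` as `fold_all`. Integer sums are used deliberately: floating-point addition is not exactly associative, so scorers that sum floats would match only up to rounding.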
70.5 Refined Kaizen Ordering (KAIZEN-0087..0089)
Chapter 68 §68.5 sketched CB-1800 as one big ticket. R9 splits it along the dependency graph into five smaller ones, ordered by risk and reward:
| ticket | scope | effort | expected impact |
|---|---|---|---|
| KAIZEN-0087 | Remove execve in score_handler.rs:258; call comply_check() as library | 0.5 d | drops 124 subprocess spawns → 1 |
| KAIZEN-0084 | Wire HooksCacheManager into comply check; stop re-parsing config per run | 1 d | −15% wall on comply |
| KAIZEN-0088 | De-duplicate the 5 muda walks into one Arc<FileCache> pass | 1 d | −70% of syscall budget |
| KAIZEN-0089 | Migrate file_health + pv_lint + quality-gate walkers to FileCache | 1 d | closes remaining V/V++ from ch69 top-10 |
| KAIZEN-0083 | Expose all per-file analyzers as Fn(FileFacts) -> Result<Metric>; monoid-clean | 5 d | long-term architecture win |
KAIZEN-0087 is first because it is the highest-leverage, lowest-risk change: one Command::new replaced with a library call, and the top-1 V++ row from ch69 collapses. KAIZEN-0083 is last because it is a type-system refactor and its correctness proof rides on the arXiv:2412.10632 algebra — worth doing, but not worth blocking the immediate wins on.
70.6 Five Instrumentation Points for Future Profiling
R9’s dhat run (same methodology as the March 2026 memory profiling note in MEMORY.md) identified five code sites that need NamedTimer instrumentation before KAIZEN-0087..0089 ship. Without timers, the post-fix claim “we dropped 56 s to N s” has no receipt:
- score_handler.rs:258 (subprocess spawn for comply). Timer wraps the Command::new → output.wait() span.
- muda_handlers_metrics.rs:11 (first WalkDir). Timer wraps the collect().
- muda_handlers_metrics.rs:83 (second WalkDir). Same pattern.
- muda_handlers_measurement.rs:92 (coverage-denominator walk). Attributable to a different scorer; separate timer so we can see which scorer dominates.
- rust_project_score/orchestrator.rs:161 (FileCache::populate itself). This is the good path; a timer here lets us quote the ratio bad:good honestly.
The NamedTimer type itself is sketched in Chapter 68 §68.5; R9 recommends landing it as its own tiny ticket (≈30 LOC) before the kaizen work so each KAIZEN-00XX PR ships with a before/after row in the chapter.
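For orientation, a timer of roughly that size might look like the sketch below. This is an assumption about its shape, not the Chapter 68 design: the `start`/`stop` methods and the `timed` helper are hypothetical names, and a real version would likely write to the project's logging or metrics sink rather than return the duration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical NamedTimer: labels a span and reports its wall time.
pub struct NamedTimer {
    name: &'static str,
    start: Instant,
}

impl NamedTimer {
    /// Begin timing a span identified by `name`.
    pub fn start(name: &'static str) -> NamedTimer {
        NamedTimer { name, start: Instant::now() }
    }

    /// Stop the timer and hand (label, elapsed) back to the caller to record.
    pub fn stop(self) -> (&'static str, Duration) {
        (self.name, self.start.elapsed())
    }
}

/// Run `work` inside a named span; return its result plus the timing row.
pub fn timed<T>(name: &'static str, work: impl FnOnce() -> T) -> (T, &'static str, Duration) {
    let timer = NamedTimer::start(name);
    let out = work();
    let (label, elapsed) = timer.stop();
    (out, label, elapsed)
}
```

Wrapping a site is then a one-line change, for example `timed("muda_walk_1", || count_files(&src_dir))` with `count_files` standing in for whatever the existing walk does, and the returned duration becomes the before/after row.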
70.7 Why No New Example This Round
Chapter 69 added http_stub_probe.rs to PR #346 alongside mcp_timing_bench.rs, o1_hook_probe.rs, and exit_code_audit_driver.rs. R9 considered a fifth — filecache_redundancy_demo.rs — to measure the ratio of N sequential WalkDir passes vs one walk + N in-memory iterations. The measurement would produce a clean ~N× ratio on any tree, and the code is ≤80 LOC stdlib-only.
R9 skipped it. Reasoning: PR #346 is awaiting merge (user signalled a merge pass is worth doing first), and the ratio is already observable in production data: the 5 muda walks vs the 1 FileCache::populate in rust_project_score give the exact number the demo would reproduce. A demo would add a fifth example to a pending PR that is queued behind a merge gate; the book reference to lines 11/83/119/241/92 vs line 161 is sufficient. If merging #346 clears the queue, KAIZEN-0088’s PR is a natural home for the demo, where it can serve as the before/after harness for the fix itself.
70.8 How to Reproduce R9
    # 1. strace the execve count (confirms the 124 subprocess spawns)
    #    No -c here: -c prints only the summary table and suppresses the per-call lines grep needs.
    strace -f -e execve pmat score 2>&1 | grep -c "execve.*pmat"
    # expected: 124 children (1 comply + 122 sub-scorers + 1 aggregate, per §70.1), plus 1 execve for the root process
    # 2. syscall counts matching R9's table
    strace -f -c pmat score 2>&1 | tail -20
    # expected: statx ≈ 1.6M, openat ≈ 540K, getdents64 ≈ 203K, read ≈ 877K
    # 3. Confirm the good path: rust-project-score ≈ 4 s, same repo
    #    GNU time with -v is assumed; the shell builtin `time` does not report RSS.
    /usr/bin/time -v pmat rust-project-score
    # expected: Elapsed (wall clock) time ≈ 4 s, Maximum resident set size ≈ 52 MB
    # 4. Count the muda walks in-source
    grep -n "walkdir::WalkDir" \
      server/src/cli/handlers/comply_handlers/muda_handlers_metrics.rs \
      server/src/cli/handlers/comply_handlers/muda_handlers_measurement.rs
    # expected: 5 matches (lines 11, 83, 119, 241 + 92)
    # 5. Point at the good model
    grep -n "FileCache::populate" server/src/services/rust_project_score/orchestrator.rs
    # expected: line 161
70.9 Closing
R9 answered the question Chapter 69 left open: why is pmat score a structural peer of comply check? Answer: it is not a peer, it is a parent that spawns comply as a child, and that child itself walks the tree five redundant times. The structural-peer framing under-stated the coupling. It is not that two commands share a walker; it is that one command execves the other 124 times and the inner command is its own five-walk waste factory.
The good news is that the fix is already written. rust_project_score/orchestrator.rs:161 is the model. KAIZEN-0087 is a 0.5-day execve-removal PR. The monoid algebra from arXiv:2412.10632 says the parallel refactor produces the same result as the sequential one. The five WalkDir lines in muda are a visible checklist. Chapter 68’s CB-1800 sketch now has a precise ordering and a precise dependency graph.
Chapter 68’s rule holds: if you cannot reproduce a defect, it is not a defect. R9’s contribution: if the fix is already in the same repo, the only kaizen that matters is copying it next door.
Cross-references:
- Chapter 68 (ch68-00-r7-defect-remediation.md): CB-1800 patch sketch (pre-split).
- Chapter 69 (ch69-00-r8-performance-and-http-stub.md): performance matrix + structural peer framing.
- PR #346: R8 examples (pending merge; R9 added no new example).
- Issue #347: R8/R9 perf findings, D72.
- Issue #337: kaizen roadmap; KAIZEN-0087..0089 to land here.
- Source cited: server/src/cli/handlers/score_handler.rs:258, server/src/cli/handlers/comply_handlers/muda_handlers_metrics.rs:{11,83,119,241}, server/src/cli/handlers/comply_handlers/muda_handlers_measurement.rs:92, server/src/services/rust_project_score/orchestrator.rs:161.
- Paper: arXiv:2412.10632, Composable Program Analyses via Monoid Homomorphisms.