Chapter 43: File Health and Max-Lines Enforcement (CB-040)

The File Health system enforces code maintainability by preventing excessively large files. Based on Toyota Production System principles and peer-reviewed research, this feature ensures that:

- No new file exceeds 500 lines
- Existing files cannot grow (ratchet mechanism)
- Test requirements scale with file size (Test-to-Lines Ratio, TLR)

The Problem

Large files tend to violate the Single Responsibility Principle and become:

- Untestable (cognitive overload)
- Unmaintainable (merge conflicts)
- Error-prone (complexity hotspots)

Research shows files over 500 lines have 2.4x higher defect rates (Nagappan et al., IEEE TSE 2006).

Quick Start

# Check file health in your project
pmat comply check

# View detailed file health report
pmat comply check --verbose

File Health Metrics

1. File Size Classes

Class	Lines	Risk Level
Optimal	0-200	Low
Acceptable	201-500	Medium
Warning	501-1000	High
Critical	1001-2000	Very High
Emergency	> 2000	Extreme
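
As a minimal sketch, the size classes can be written as a lookup function. The names `SizeClass` and `classify` are illustrative and not part of PMAT's API, and the sketch assumes that a file with exactly 2000 lines still counts as Critical, as the 1001-2000 row suggests.

```rust
/// Size classes from the table above (illustrative type, not PMAT's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SizeClass {
    Optimal,
    Acceptable,
    Warning,
    Critical,
    Emergency,
}

/// Map a line count to its size class.
/// Upper bounds are inclusive: 200, 500, 1000, 2000.
pub fn classify(lines: usize) -> SizeClass {
    match lines {
        0..=200 => SizeClass::Optimal,
        201..=500 => SizeClass::Acceptable,
        501..=1000 => SizeClass::Warning,
        1001..=2000 => SizeClass::Critical,
        // Assumption: anything above 2000 lines is Emergency.
        _ => SizeClass::Emergency,
    }
}
```

Under these assumptions, `classify(500)` is `Acceptable` and `classify(12_087)` is `Emergency`.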

2. Test-to-Lines Ratio (TLR)

TLR requirements scale with file size:

File Size	Required TLR	Rationale
< 100 lines	0.3	Simple code needs fewer tests
100-300 lines	0.5	Moderate complexity
301-500 lines	0.8	Complex code needs more tests
501-1000 lines	1.2	High complexity penalty
> 1000 lines	1.5	Critical files need extensive tests
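
The scaling rule can be sketched as a small lookup. This is an illustration rather than PMAT's implementation: the function names are hypothetical, and the sketch assumes that a line count sitting on a shared boundary (300, 500, 1000) falls into the lower bracket, which matches the "> 1000" row and the size classes above.

```rust
/// Required Test-to-Lines Ratio for a file with `lines` lines.
/// Assumption: boundary values (300, 500, 1000) belong to the lower bracket.
pub fn required_tlr(lines: usize) -> f64 {
    match lines {
        0..=99 => 0.3,
        100..=300 => 0.5,
        301..=500 => 0.8,
        501..=1000 => 1.2,
        _ => 1.5,
    }
}

/// True when a file's measured TLR satisfies the requirement for its size.
pub fn meets_tlr(lines: usize, actual_tlr: f64) -> bool {
    actual_tlr >= required_tlr(lines)
}
```

For instance, a 750-line file with a TLR of 1.0 fails the check because its bracket requires 1.2.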

3. File Health Score Formula

Health Score = (Size Score × 0.30) + (TLR Score × 0.40) +
               (Complexity Score × 0.20) + (Stability Score × 0.10)

Where:
- Size Score: 100 - (lines / max_lines × 100)
- TLR Score: min(100, actual_tlr / required_tlr × 100)
- Complexity Score: 100 - (avg_complexity / threshold × 100)
- Stability Score: 100 - (churn_30d / 10 × 100)
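
The formula can be sketched in Rust as follows. The struct and function names are hypothetical, and clamping each component to the 0-100 range is an assumption: the formula as written would otherwise go negative whenever `lines` exceeds `max_lines` or churn exceeds 10.

```rust
/// Inputs to the health score (illustrative struct, not PMAT's API).
/// `max_lines`, `required_tlr`, and `complexity_threshold` must be non-zero.
pub struct FileMetrics {
    pub lines: f64,
    pub max_lines: f64,
    pub actual_tlr: f64,
    pub required_tlr: f64,
    pub avg_complexity: f64,
    pub complexity_threshold: f64,
    pub churn_30d: f64,
}

/// Assumption: every component score is clamped to 0..=100.
fn clamp_score(x: f64) -> f64 {
    x.clamp(0.0, 100.0)
}

/// Weighted health score: size 30%, TLR 40%, complexity 20%, stability 10%.
pub fn health_score(m: &FileMetrics) -> f64 {
    let size = clamp_score(100.0 - m.lines / m.max_lines * 100.0);
    let tlr = clamp_score(m.actual_tlr / m.required_tlr * 100.0);
    let complexity =
        clamp_score(100.0 - m.avg_complexity / m.complexity_threshold * 100.0);
    let stability = clamp_score(100.0 - m.churn_30d / 10.0 * 100.0);
    size * 0.30 + tlr * 0.40 + complexity * 0.20 + stability * 0.10
}
```

As a worked example with made-up inputs: a 250-line file against a 500-line maximum has a Size Score of 50; a TLR of 0.4 against a required 0.5 gives 80; an average complexity of 5 against a threshold of 20 gives 75; a churn of 2 gives 80. The result is 50 × 0.30 + 80 × 0.40 + 75 × 0.20 + 80 × 0.10 = 15 + 32 + 15 + 8 = 70.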

4. Health Grades

Grade	Score Range	Status
A+	95-100	Excellent
A	90-94	Great
B	80-89	Good
C	70-79	Acceptable
D	60-69	Needs Work
F	< 60	Critical
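
A sketch of the grade lookup, assuming the lower bound of each range acts as the threshold so that fractional scores such as 94.5 map to the lower grade. The function name `grade` is illustrative.

```rust
/// Map a health score (0-100) to a letter grade.
/// Assumption: each range's lower bound is the cutoff for that grade.
pub fn grade(score: f64) -> &'static str {
    if score >= 95.0 {
        "A+"
    } else if score >= 90.0 {
        "A"
    } else if score >= 80.0 {
        "B"
    } else if score >= 70.0 {
        "C"
    } else if score >= 60.0 {
        "D"
    } else {
        "F"
    }
}
```

A score of 73 maps to grade C under this sketch.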

Pre-commit Hook Enforcement

The pre-commit hook enforces two rules:

Rule 1: New Files Must Not Exceed 500 Lines

# New file check
if [ "$LINES" -gt "$MAX_LINES_NEW" ]; then
    echo "❌ NEW file $file has $LINES lines (max: $MAX_LINES_NEW)"
    exit 1
fi

Rule 2: Existing Files Cannot Grow (Ratchet)

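# Ratchet mechanism - Toyota Way Kaizen
# Line count of the committed version; 0 when the file is not in HEAD.
# "|| true" sits inside the pipeline so the count stays a single number
# even under "set -o pipefail" (wc -l already prints 0 for empty input).
BASELINE=$( { git show HEAD:"$file" 2>/dev/null || true; } | wc -l)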
if [ "$LINES" -gt "$BASELINE" ] && [ "$BASELINE" -gt 0 ]; then
    echo "❌ RATCHET: $file grew from $BASELINE to $LINES lines"
    echo "   Files can only shrink or stay the same (Toyota Way: Kaizen)"
    exit 1
fi
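
The decision logic of the two rules can be sketched as a pure function, which makes the edge cases easy to test. This is an illustration, not the hook's actual code: the `Violation` type and `check_file` name are hypothetical, and the sketch assumes a file counts as new when it has no baseline at HEAD (or an empty one).

```rust
/// Why a commit would be rejected (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum Violation {
    NewFileTooLarge { lines: usize, max: usize },
    Ratchet { baseline: usize, lines: usize },
}

/// `baseline` is the file's line count at HEAD, or `None` if it is not committed yet.
pub fn check_file(
    lines: usize,
    baseline: Option<usize>,
    max_lines_new: usize,
) -> Result<(), Violation> {
    match baseline {
        // Rule 2: an existing, non-empty file may shrink or stay the same.
        Some(b) if b > 0 => {
            if lines > b {
                Err(Violation::Ratchet { baseline: b, lines })
            } else {
                Ok(())
            }
        }
        // Rule 1: a new file must not exceed the limit.
        _ => {
            if lines > max_lines_new {
                Err(Violation::NewFileTooLarge { lines, max: max_lines_new })
            } else {
                Ok(())
            }
        }
    }
}
```

Note that under the ratchet an existing 2000-line file that shrinks to 1900 lines passes, while a 120-line file that grows by a single line is rejected.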

Installing the Pre-commit Hook

# Install PMAT hooks (includes file health check)
pmat hooks install

# Verify hook is installed
grep "File Health" .git/hooks/pre-commit

Compliance Check Output

When you run pmat comply check, the file health section shows:

📊 File Health Summary
├── 60 files >2000 lines (CRITICAL)
├── 117 files >1000 lines
├── 459 files >500 lines
├── Average Health Score: 73%
└── Grade: C

Priority Files for Refactoring:
1. analysis_utilities.rs (12,087 lines) - EMERGENCY
2. deep_context.rs (7,211 lines) - EMERGENCY
3. commands.rs (6,273 lines) - EMERGENCY
4. tools.rs (6,111 lines) - EMERGENCY

Toyota Way Principles

Jidoka (Built-in Quality)

Quality is built into the process through automated enforcement at commit time.

Kaizen (Continuous Improvement)

The ratchet mechanism ensures existing files never grow: each commit can only shrink a file or leave its size unchanged.

Muda (Waste Elimination)

Large files represent waste: duplicated logic, dead code, and cognitive overhead.

Genchi Genbutsu (Go and See)

File health metrics are based on actual measurements, not estimates.

Refactoring Strategies

When a file exceeds limits, use these strategies:

1. Extract Module

// Before: large_file.rs holds validation, processing, and reporting (2000+ lines)

// After: large_file.rs only declares the modules; the code lives in
// validation.rs, processing.rs, reporting.rs (~500 lines each)
mod validation;
mod processing;
mod reporting;

2. Extract Trait

// Before: one monolithic impl block
struct LargeService;

impl LargeService {
    fn validate(&self) { /* ... */ }
    fn process(&self) { /* ... */ }
    fn report(&self) { /* ... */ }
}

// After: focused traits, each implemented in its own file
trait Validator { fn validate(&self); }
trait Processor { fn process(&self); }
trait Reporter { fn report(&self); }

3. Extract Constants

// Before: inline constants throughout
const TIMEOUT: u64 = 30;
const MAX_RETRIES: u32 = 3;

// After: constants module
mod constants {
    pub const TIMEOUT: u64 = 30;
    pub const MAX_RETRIES: u32 = 3;
}

Peer-Reviewed References

Nagappan, N., Ball, T. (2006). “Using Software Dependencies and Churn Metrics to Predict Field Failures.” IEEE TSE.
Zimmermann, T., Nagappan, N. (2008). “Predicting Defects using Network Analysis on Dependency Graphs.” ICSE.
Ohno, T. (1988). “Toyota Production System: Beyond Large-Scale Production.” Productivity Press.
Bird, C., et al. (2011). “Don’t Touch My Code! Examining the Effects of Ownership on Software Quality.” FSE.
Bacchelli, A., Bird, C. (2013). “Expectations, Outcomes, and Challenges of Modern Code Review.” ICSE.

Popperian Falsifiability

The file health system uses testable hypotheses:

Falsifiable Claims

1. Claim: Files > 500 lines have higher defect rates
   - Test: Compare defect density in small vs large files
   - Threshold: 2x higher in large files

2. Claim: TLR < 0.5 correlates with bugs
   - Test: Track bug rates by TLR quartile
   - Threshold: Bottom quartile has 3x more bugs

3. Claim: Ratchet prevents regression
   - Test: Measure average file size over 6 months
   - Threshold: Average must not increase
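
The first claim's test can be sketched as a defect-density comparison. The `Sample` type and `claim_survives` function are hypothetical, the numbers in the usage note are invented for illustration, and the sketch reads the threshold as "the claim survives only if large files show at least the stated multiple of the small-file density".

```rust
/// Defect count and total line count for one group of files (illustrative type).
/// `lines` must be non-zero.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub defects: u32,
    pub lines: u64,
}

impl Sample {
    /// Defects per 1000 lines.
    pub fn density(&self) -> f64 {
        f64::from(self.defects) / self.lines as f64 * 1000.0
    }
}

/// True when the large-file density is at least `min_ratio` times the
/// small-file density; false means the observation falsifies the claim.
pub fn claim_survives(small: Sample, large: Sample, min_ratio: f64) -> bool {
    let (s, l) = (small.density(), large.density());
    l > 0.0 && l >= min_ratio * s
}
```

With invented data of 10 defects in 20,000 lines of small files (0.5 per 1000 lines) and 30 defects in 25,000 lines of large files (1.2 per 1000 lines), the ratio is 2.4 and the claim survives a 2x threshold. With only 15 defects in the large files (0.6 per 1000 lines), it would be falsified.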

Configuration

Configure file health thresholds in .pmat/project.toml:

[file-health]
max_lines_new = 500
max_lines_critical = 2000
required_tlr_scaling = true
enforce_ratchet = true

[file-health.thresholds]
optimal = 200
acceptable = 500
warning = 1000
critical = 2000
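
For tooling that consumes these settings, the configuration could be mirrored in a typed struct. The sketch below is an assumption about shape only: the type names and the `validate` sanity check are hypothetical, and no TOML parsing is shown because it would require an external crate.

```rust
/// Mirrors the [file-health.thresholds] table (illustrative type).
#[derive(Debug, Clone, PartialEq)]
pub struct Thresholds {
    pub optimal: u32,
    pub acceptable: u32,
    pub warning: u32,
    pub critical: u32,
}

/// Mirrors the [file-health] table (illustrative type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileHealthConfig {
    pub max_lines_new: u32,
    pub max_lines_critical: u32,
    pub required_tlr_scaling: bool,
    pub enforce_ratchet: bool,
    pub thresholds: Thresholds,
}

impl Default for FileHealthConfig {
    /// Defaults copied from the example configuration above.
    fn default() -> Self {
        Self {
            max_lines_new: 500,
            max_lines_critical: 2000,
            required_tlr_scaling: true,
            enforce_ratchet: true,
            thresholds: Thresholds {
                optimal: 200,
                acceptable: 500,
                warning: 1000,
                critical: 2000,
            },
        }
    }
}

impl FileHealthConfig {
    /// Hypothetical sanity check: class boundaries must be strictly increasing.
    pub fn validate(&self) -> Result<(), String> {
        let t = &self.thresholds;
        if t.optimal < t.acceptable && t.acceptable < t.warning && t.warning < t.critical {
            Ok(())
        } else {
            Err("thresholds must be strictly increasing".to_string())
        }
    }
}
```

A check like `validate` catches a misordered threshold (for example `warning = 100`) before it silently reclassifies files.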

Summary

File Health enforcement prevents the accumulation of technical debt through:

- Hard limits on new file sizes (500 lines)
- Ratchet mechanism preventing growth of existing files
- TLR scaling requiring more tests for larger files
- Health scoring with actionable grades
- Pre-commit hooks for automated enforcement

This implements Toyota Way principles (Jidoka, Kaizen, Muda elimination) with evidence-based thresholds from peer-reviewed research.

PMAT: The PAIML MCP Agent Toolkit