Chapter 53: Spec Falsification Engine

The pmat falsify command validates specification claims against the actual codebase using Popperian falsification. Rather than trying to confirm that a spec is correct, it actively searches for disconfirming evidence. This follows Karl Popper’s philosophy of science: a single contradicting observation refutes a claim, whereas no number of matching observations can prove it.

Overview

Specifications drift from reality. A spec might claim “MUST use SQLite for storage” when the code actually uses PostgreSQL, or reference a file path that no longer exists. The falsification engine extracts testable claims from markdown and YAML documents and cross-references them against the codebase to catch contradictions before they become bugs.

┌──────────────────┐     ┌───────────────────┐     ┌──────────────────┐
│  Spec Document   │────▶│  Claim Extractor   │────▶│  94 Claims       │
│  (markdown/yaml) │     │  (RFC-2119, paths, │     │  extracted       │
│                  │     │   entities, etc.)  │     │                  │
└──────────────────┘     └───────────────────┘     └────────┬─────────┘
                                                            │
                                                            ▼
┌──────────────────┐     ┌───────────────────┐     ┌──────────────────┐
│  Report          │◀────│  Falsification     │◀────│  Strategy Router │
│  (pass/fail per  │     │  Strategies        │     │  (per category)  │
│   claim)         │     │  (path, entity,    │     │                  │
│                  │     │   absence, etc.)   │     │                  │
└──────────────────┘     └───────────────────┘     └──────────────────┘

Quick Start

# Falsify a single specification
pmat falsify docs/specifications/my-feature.md

# Falsify all specs in a directory
pmat falsify docs/specifications/

# Dry run — show extracted claims without checking
pmat falsify docs/specifications/my-feature.md --dry-run

# JSON output for CI/CD
pmat falsify docs/specifications/my-feature.md --format json 2>/dev/null

# Show only failures
pmat falsify docs/specifications/my-feature.md --failures-only

Claim Extraction

The engine automatically extracts falsifiable claims from structured documents using nine pattern categories. Five of them are described below:

RFC-2119 Keywords

Claims using MUST, SHALL, SHOULD, MAY (per RFC 2119) are extracted with priority based on keyword strength:

<!-- These become falsifiable claims: -->
The system MUST store receipts in `.pmat-work/{id}/falsification/`
The API SHOULD return results within 200ms
Configuration files SHALL use TOML format

| Keyword | Priority | Meaning |
|---|---|---|
| MUST / SHALL / REQUIRED | P0 (Critical) | Absolute requirement |
| SHOULD / RECOMMENDED | P1 (High) | Strong recommendation |
| MAY / OPTIONAL | P2 (Medium) | Truly optional |
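
The keyword-to-priority mapping above can be sketched in a few lines of Python. This is an illustration only, not pmat's implementation: the function name `extract_keyword_claims` and the choice to report the strongest keyword on a line are assumptions made for the example.

```python
import re

# Keyword -> priority, copied from the keyword table above.
PRIORITY = {
    "MUST": "P0", "SHALL": "P0", "REQUIRED": "P0",
    "SHOULD": "P1", "RECOMMENDED": "P1",
    "MAY": "P2", "OPTIONAL": "P2",
}

# Uppercase-only match, so ordinary prose such as "you may" is ignored.
KEYWORD_RE = re.compile(r"\b(" + "|".join(PRIORITY) + r")\b")


def extract_keyword_claims(text):
    """Return (line_number, priority, line) for each line with a keyword.

    Assumption: when a line holds several keywords, the strongest one
    (lowest P-number) decides the priority.
    """
    claims = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        keywords = KEYWORD_RE.findall(line)
        if keywords:
            priority = min(PRIORITY[k] for k in keywords)
            claims.append((lineno, priority, line.strip()))
    return claims
```

Matching only uppercase keywords mirrors the RFC 2119 convention that the terms carry their special meaning when capitalized.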

Path References

Any file or directory path is checked for existence:

<!-- Extracted and verified against filesystem: -->
Configuration at `.pmat-metrics.toml`
Source in `src/services/spec_falsification.rs`
Schema at `docs/schemas/receipt.json`
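
A path check of this kind could look roughly like the following Python sketch. The regular expression is an assumption made for illustration (a backtick-quoted token that contains a slash or ends in a file extension), and `check_path_claims` is a hypothetical helper; pmat's actual extraction rules may differ.

```python
import re
from pathlib import Path

# Assumption: a path is a backtick-quoted token without whitespace that
# either contains a slash or ends in a file extension.
PATH_RE = re.compile(r"`([^`\s]*(?:/[^`\s]*|\.[A-Za-z0-9]+))`")


def check_path_claims(text, root):
    """Map each path mentioned in `text` to whether it exists under `root`."""
    root = Path(root)
    return {p: (root / p).exists() for p in PATH_RE.findall(text)}
```

A `False` entry corresponds to a falsified path claim. Identifiers such as `FalsificationEngine` are skipped because they contain neither a slash nor an extension.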

Code Entities

References to functions, structs, modules, and traits are searched in the codebase index:

<!-- Verified via pmat query: -->
The `FalsificationEngine` struct handles all strategies
The `extract_claims()` method returns a `Vec<SpecClaim>`

Numeric Thresholds

Measurable claims with specific numbers:

<!-- Extracted as metric claims: -->
Response time MUST be under 200ms
Coverage SHOULD exceed 85%
Maximum file size: 500 lines
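
To make the idea of a metric claim concrete, here is a minimal Python sketch that pulls a comparator, a number, and a unit out of a sentence and tests a measured value against it. The comparator words, the supported units, and both function names are assumptions for illustration. Phrasings such as "Maximum file size: 500 lines" are not handled by this simplified pattern.

```python
import re

# Assumption: these comparison words map to operators as shown.
COMPARATORS = {"under": "<", "below": "<", "exceed": ">", "above": ">"}

THRESHOLD_RE = re.compile(
    r"\b(under|below|exceed|above)\s+(\d+(?:\.\d+)?)\s*(ms|s|%)?",
    re.IGNORECASE,
)


def extract_threshold(claim):
    """Return (operator, limit, unit) or None if no threshold is found."""
    match = THRESHOLD_RE.search(claim)
    if match is None:
        return None
    word, value, unit = match.groups()
    return COMPARATORS[word.lower()], float(value), unit or ""


def satisfies(measured, threshold):
    """True if the measured value is within the stated bound."""
    operator, limit, _unit = threshold
    return measured < limit if operator == "<" else measured > limit
```

For example, `extract_threshold("Coverage SHOULD exceed 85%")` yields `(">", 85.0, "%")`, and a measured coverage of 80 would not satisfy it.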

Absence Assertions

Claims that something should NOT exist:

<!-- Verified by searching for the forbidden pattern: -->
MUST NOT use unwrap() in production code
SHALL NOT store secrets in plaintext
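
The reverse search behind an absence claim can be illustrated with a plain substring scan. This Python sketch assumes the sources are already loaded into a dictionary of file name to text; `find_forbidden` is a hypothetical helper. It does not distinguish production code from test code, which a real check for a claim like the `unwrap()` example would need to do.

```python
def find_forbidden(pattern, sources):
    """Return (file, line_number, line) for every occurrence of `pattern`.

    `sources` maps file names to their text content. An empty result means
    the absence claim survived; any hit is evidence that falsifies it.
    """
    hits = []
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append((name, lineno, line.strip()))
    return hits
```

Note the inversion compared with path and entity claims: here, finding a match is what falsifies the claim.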

Claim Categories and Strategies

Each extracted claim is categorized and routed to the appropriate falsification strategy:

| Category | Strategy | How It’s Falsified |
|---|---|---|
| PathReference | Check filesystem | Path exists? Similar file found? |
| CodeEntity | Search codebase index | Entity found via pmat query? |
| AbsenceClaim | Reverse search | Forbidden pattern found? |
| CommandClaim | Validate executable | Command exists and is parseable? |
| MetricClaim | Check threshold | Metric within stated bounds? |
| ArchitecturalClaim | Structural analysis | Architecture matches description? |
| BehaviorClaim | Evidence search | Behavioral evidence found? |
| Requirement | Evidence search | Implementation evidence exists? |
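
Routing by category amounts to a dispatch table. The Python sketch below wires up two of the categories to show the shape of the idea. The claim and context dictionaries, the function names, and the fallback to NotTestable for categories without a strategy are all assumptions of this example, not a description of pmat's internals.

```python
def falsify_path(claim, ctx):
    # Falsified when the referenced path is not among the known files.
    return "Survived" if claim["target"] in ctx["files"] else "Falsified"


def falsify_absence(claim, ctx):
    # Reverse search: finding the forbidden pattern falsifies the claim.
    found = any(claim["target"] in text for text in ctx["files"].values())
    return "Falsified" if found else "Survived"


# Category name -> strategy function (only two categories in this sketch).
STRATEGIES = {
    "PathReference": falsify_path,
    "AbsenceClaim": falsify_absence,
}


def route(claim, ctx):
    """Dispatch a claim to the strategy registered for its category."""
    strategy = STRATEGIES.get(claim["category"])
    if strategy is None:
        # Assumption: categories without a strategy here are not testable.
        return "NotTestable"
    return strategy(claim, ctx)
```

Keeping the strategies in a lookup table means a new category only needs a new function and one table entry.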

Verdict Statuses

Each claim receives one of four verdicts:

| Status | Meaning | Action |
|---|---|---|
| Survived | Claim could not be falsified (evidence supports it) | No action needed |
| Falsified | Evidence contradicts the claim | Fix the spec or the code |
| NotTestable | Claim cannot be empirically tested | Consider rewriting |
| Skipped | Claim filtered or deferred | Review manually |

Output Formats

Human-Readable (Default)

Spec Falsification Report: docs/specifications/my-feature.md
============================================================

  94 claims extracted, 40 survived, 9 falsified, 45 not testable

  Health Score: 0.82 (GOOD)

  FALSIFIED Claims:
  ─────────────────
  [P1] Line 142: "SHOULD use trueno-rag for vector search"
       Category: CodeEntity
       Evidence: Entity 'trueno-rag' not found in codebase
       Suggestion: Update spec or implement trueno-rag integration

  [P2] Line 87: Schema at `docs/schemas/claim.json`
       Category: PathReference
       Evidence: Path does not exist
       Similar: docs/schemas/receipt.json

JSON Output

pmat falsify docs/specifications/my-feature.md --format json 2>/dev/null
{
  "file": "docs/specifications/my-feature.md",
  "total_claims": 94,
  "survived": 40,
  "falsified": 9,
  "not_testable": 45,
  "skipped": 0,
  "health_score": 0.82,
  "verdicts": [
    {
      "claim": "SHOULD use trueno-rag for vector search",
      "line": 142,
      "category": "CodeEntity",
      "priority": "P1",
      "status": "Falsified",
      "evidence": "Entity 'trueno-rag' not found in codebase"
    }
  ]
}
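
A report in this shape is easy to post-process. The Python sketch below assumes only the fields visible in the sample above (`verdicts`, `status`, `priority`, `line`, `claim`); the helper name `falsified_claims` is hypothetical.

```python
import json


def falsified_claims(report_json):
    """Return (priority, line, claim) for each falsified verdict,
    ordered by priority and then by line number."""
    report = json.loads(report_json)
    rows = [
        (v["priority"], v["line"], v["claim"])
        for v in report["verdicts"]
        if v["status"] == "Falsified"
    ]
    return sorted(rows)
```

Sorting the tuples puts P0 findings first, which is usually the order in which they should be addressed.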

Health Score

The health score (0.0 to 1.0) is calculated as:

health = survived / (survived + falsified)

Claims that are NotTestable or Skipped are excluded from the calculation. This means a spec with many vague claims won’t get a high score just because those claims can’t be disproven.

| Score Range | Rating | Meaning |
|---|---|---|
| 0.90 - 1.00 | Excellent | Spec is highly aligned with codebase |
| 0.75 - 0.89 | Good | Minor drift, review falsified claims |
| 0.50 - 0.74 | Fair | Significant drift, update needed |
| 0.00 - 0.49 | Poor | Spec is largely contradicted by code |
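
The formula and rating bands can be written out as a short Python sketch. Two details are assumptions, since the text does not specify them: returning `None` when no claim was testable (the denominator would be zero), and treating each band's lower edge as an inclusive threshold so that a value such as 0.895 falls into Good.

```python
def health_score(survived, falsified):
    """survived / (survived + falsified); NotTestable and Skipped
    claims never enter the calculation."""
    tested = survived + falsified
    if tested == 0:
        return None  # assumption: no testable claims means no score
    return survived / tested


def rating(score):
    """Map a score to its rating band (lower bounds inclusive)."""
    if score >= 0.90:
        return "Excellent"
    if score >= 0.75:
        return "Good"
    if score >= 0.50:
        return "Fair"
    return "Poor"
```

For instance, a spec with 18 surviving and 2 falsified claims scores 18 / 20 = 0.9, regardless of how many additional claims were not testable.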

Directory Mode

Falsify all specs in a directory at once:

# Scan entire specifications directory
pmat falsify docs/specifications/

# Only show failures across all specs
pmat falsify docs/specifications/ --failures-only

The engine collects all .md, .yaml, and .yml files recursively and produces a report for each.
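
The recursive collection step can be approximated with `pathlib`. This is a sketch of the behavior described above, not pmat's code: the function name, the case-sensitive suffix match, and the sorted output order are assumptions.

```python
from pathlib import Path

# File extensions named in the text above.
SPEC_SUFFIXES = {".md", ".yaml", ".yml"}


def collect_spec_files(directory):
    """Recursively list spec files under `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix in SPEC_SUFFIXES
    )
```

Each returned path would then be falsified on its own, yielding one report per file.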

Integration with Work System

The pmat falsify command integrates with the work system in two ways:

Work Item Falsification

When given a work item ID instead of a file path, the command delegates to the existing work contract falsification system:

# Falsify a work item's contract claims
pmat falsify GH-123

# With overrides
pmat falsify GH-123 --override-claims "claim-1,claim-2" --ticket "DEBT-456"

Spec Falsification in Work Complete

The pmat work complete command automatically runs falsification on any linked specification files as part of the quality gate. See Chapter 34 for details on the work contract system and falsification ledger.

CI/CD Integration

Add spec falsification to your CI pipeline:

# .github/workflows/quality.yml
- name: Falsify Specifications
  run: |
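    # stderr is discarded so that only the JSON report lands in falsify.json
    pmat falsify docs/specifications/ --format json --failures-only 2>/dev/null > falsify.json
    # -s slurps the input, so the count works for one report object or a stream of them
    FALSIFIED=$(jq -s '[.[].verdicts[] | select(.status == "Falsified")] | length' falsify.json)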
    if [ "$FALSIFIED" -gt 0 ]; then
      echo "::error::$FALSIFIED spec claims falsified by codebase"
      jq '.verdicts[] | select(.status == "Falsified")' falsify.json
      exit 1
    fi

Dry Run Mode

Use --dry-run to see what claims would be extracted without running falsification:

pmat falsify docs/specifications/my-feature.md --dry-run
Dry Run: 94 claims extracted from docs/specifications/my-feature.md
─────────────────────────────────────────────────────────────────────

  [P0] Line 14: "MUST use SQLite for storage" (Requirement)
  [P1] Line 42: Path `src/services/cache.rs` (PathReference)
  [P1] Line 87: Entity `FalsificationEngine` (CodeEntity)
  [P1] Line 123: "SHOULD return within 200ms" (MetricClaim)
  ...

Related Commands

| Command | Purpose | Input | Focus |
|---|---|---|---|
| pmat falsify | Falsify spec claims | Spec files, work items | Disconfirming evidence |
| pmat validate-readme | Validate README accuracy | README.md | Confirmation of claims |
| pmat work complete | Gate work completion | Work contract | Contract claim verification |
| pmat popper-score | Score falsifiability | Project-wide | Falsifiability measurement |
| pmat spec score | Score spec quality | Spec file | Structural quality (95-point) |

Summary

The pmat falsify command brings Popperian epistemology to specification management:

  • Automatically extracts falsifiable claims from specs using RFC-2119 keywords, path references, code entities, and more
  • Actively searches for disconfirming evidence rather than confirming matches
  • Reports per-claim verdicts with evidence and suggestions
  • Supports JSON output for CI/CD integration
  • Works on individual files or entire directories
  • Integrates with the work system for automated quality gates

For the underlying philosophy, see Chapter 37: Popper Falsifiability Score. For work system integration, see Chapter 34: Unified Workflow Management.