Chapter 53: Spec Falsification Engine

The pmat falsify command validates specification claims against the actual codebase using Popperian falsification. Rather than confirming that a spec is correct, it actively searches for disconfirming evidence. This follows Karl Popper’s philosophy of science: a single contradiction is enough to refute a claim, whereas no number of confirming matches can prove it.

Overview

Specifications drift from reality. A spec might claim “MUST use SQLite for storage” when the code actually uses PostgreSQL, or reference a file path that no longer exists. The falsification engine extracts testable claims from markdown documents and cross-references them against the codebase to catch contradictions before they become bugs.

┌──────────────────┐     ┌───────────────────┐     ┌──────────────────┐
│  Spec Document   │────▶│  Claim Extractor   │────▶│  94 Claims       │
│  (markdown/yaml) │     │  (RFC-2119, paths, │     │  extracted       │
│                  │     │   entities, etc.)  │     │                  │
└──────────────────┘     └───────────────────┘     └────────┬─────────┘
                                                            │
                                                            ▼
┌──────────────────┐     ┌───────────────────┐     ┌──────────────────┐
│  Report          │◀────│  Falsification     │◀────│  Strategy Router │
│  (pass/fail per  │     │  Strategies        │     │  (per category)  │
│   claim)         │     │  (path, entity,    │     │                  │
│                  │     │   absence, etc.)   │     │                  │
└──────────────────┘     └───────────────────┘     └──────────────────┘

Quick Start

# Falsify a single specification
pmat falsify docs/specifications/my-feature.md

# Falsify all specs in a directory
pmat falsify docs/specifications/

# Dry run: show extracted claims without checking them
pmat falsify docs/specifications/my-feature.md --dry-run

# JSON output for CI/CD
pmat falsify docs/specifications/my-feature.md --format json 2>/dev/null

# Show only failures
pmat falsify docs/specifications/my-feature.md --failures-only

Claim Extraction

The engine automatically extracts falsifiable claims from structured documents using nine pattern categories. Five of them are described below:

RFC-2119 Keywords

Claims using MUST, SHALL, SHOULD, MAY (per RFC 2119) are extracted with priority based on keyword strength:

<!-- These become falsifiable claims: -->
The system MUST store receipts in `.pmat-work/{id}/falsification/`
The API SHOULD return results within 200ms
Configuration files SHALL use TOML format

Keyword	Priority	Meaning
MUST / SHALL / REQUIRED	P0 (Critical)	Absolute requirement
SHOULD / RECOMMENDED	P1 (High)	Strong recommendation
MAY / OPTIONAL	P2 (Medium)	Truly optional
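
The keyword-to-priority mapping in the table can be illustrated with a short Python sketch. This is not pmat's implementation: the names `PRIORITY`, `KEYWORD_RE`, and `extract_rfc2119_claims` are hypothetical, and the sketch assumes one claim per line, taking the first keyword it finds on that line.

```python
import re

# Keyword -> priority, following the table above.
PRIORITY = {
    "MUST": "P0", "SHALL": "P0", "REQUIRED": "P0",
    "SHOULD": "P1", "RECOMMENDED": "P1",
    "MAY": "P2", "OPTIONAL": "P2",
}

# Uppercase-only match: RFC 2119 keywords are conventionally capitalized.
KEYWORD_RE = re.compile(r"\b(" + "|".join(PRIORITY) + r")\b")


def extract_rfc2119_claims(text):
    """Return (line_number, priority, line) for each line containing a keyword."""
    claims = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = KEYWORD_RE.search(line)
        if match:
            # Assumption: the first keyword on the line decides the priority.
            claims.append((number, PRIORITY[match.group(1)], line.strip()))
    return claims


spec = """The system MUST store receipts in `.pmat-work/{id}/falsification/`
The API SHOULD return results within 200ms
This line is plain prose.
Configuration files SHALL use TOML format"""

for claim in extract_rfc2119_claims(spec):
    print(claim)
```

Lines without a keyword, such as the third line of the sample, produce no claim at all.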

Path References

Any file or directory path mentioned in a spec is checked for existence on the filesystem:

<!-- Extracted and verified against filesystem: -->
Configuration at `.pmat-metrics.toml`
Source in `src/services/spec_falsification.rs`
Schema at `docs/schemas/receipt.json`
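
A path check with a "similar file" suggestion might look like the following Python sketch. It is an illustration under stated assumptions, not pmat's code: `check_path_claim` is a hypothetical name, and the use of `difflib` with a 0.6 similarity cutoff is an arbitrary choice made for this example.

```python
import difflib
import tempfile
from pathlib import Path


def check_path_claim(root, claimed):
    """Return a verdict dict for one claimed path, relative to root."""
    root = Path(root)
    if (root / claimed).exists():
        return {"status": "Survived", "similar": None}
    # The path is missing: look for a near-miss to suggest as a correction.
    existing = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    close = difflib.get_close_matches(claimed, existing, n=1, cutoff=0.6)
    return {"status": "Falsified", "similar": close[0] if close else None}


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    schema = Path(tmp) / "docs" / "schemas" / "receipt.json"
    schema.parent.mkdir(parents=True)
    schema.write_text("{}")
    ok = check_path_claim(tmp, "docs/schemas/receipt.json")
    missing = check_path_claim(tmp, "docs/schemas/claim.json")
    print(ok)
    print(missing)
```

The second call reports the claim as falsified and offers the existing `receipt.json` as the closest match.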

Code Entities

References to functions, structs, modules, and traits are looked up in the codebase index:

<!-- Verified via pmat query: -->
The `FalsificationEngine` struct handles all strategies
The `extract_claims()` method returns a `Vec<SpecClaim>`

Numeric Thresholds

Measurable claims that state a specific number are extracted as metric claims:

<!-- Extracted as metric claims: -->
Response time MUST be under 200ms
Coverage SHOULD exceed 85%
Maximum file size: 500 lines
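
To make the idea of a metric claim concrete, here is a minimal Python sketch that pulls a number, a unit, and a comparison direction out of a sentence. Everything in it is an assumption made for illustration: the unit list, the comparator words, and the function names `extract_metric` and `is_falsified` do not come from pmat.

```python
import operator
import re

# A number followed by a unit; the unit list is illustrative only.
THRESHOLD_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|%|lines|s)(?![A-Za-z])")

# Words that hint at the direction of the comparison (assumed vocabulary).
COMPARATORS = {
    "under": "<", "below": "<",
    "exceed": ">", "above": ">",
    "maximum": "<=", "minimum": ">=",
}
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def extract_metric(line):
    """Return {'value', 'unit', 'op'} for a line, or None if no threshold is found."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    lowered = line.lower()
    op = next((sym for word, sym in COMPARATORS.items() if word in lowered), None)
    return {"value": float(match.group(1)), "unit": match.group(2), "op": op}


def is_falsified(metric, measured):
    """True when the measured value violates the stated bound."""
    if metric["op"] is None:
        raise ValueError("claim has no recognizable comparison direction")
    return not OPS[metric["op"]](measured, metric["value"])


claim = extract_metric("Response time MUST be under 200ms")
print(claim)
print(is_falsified(claim, 250.0))
```

A measured response time of 250 ms violates the "under 200ms" bound, so the claim would count as falsified in this sketch.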

Absence Assertions

Claims that something must not exist are checked by searching the codebase for the forbidden pattern:

<!-- Verified by searching for the forbidden pattern: -->
MUST NOT use unwrap() in production code
SHALL NOT store secrets in plaintext
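
The reverse search can be sketched in Python as a plain regular-expression scan. This is a simplified illustration, not pmat's strategy: `find_forbidden` is a hypothetical helper, it scans only `*.rs` files by default, and it makes no attempt to tell production code from test code.

```python
import re
import tempfile
from pathlib import Path


def find_forbidden(root, pattern, glob="*.rs"):
    """Return (relative_path, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


# Demo: one source file that contains the forbidden call once.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "src" / "main.rs"
    source.parent.mkdir(parents=True)
    source.write_text("let x = foo().unwrap();\nlet y = bar()?;\n", encoding="utf-8")
    hits = find_forbidden(tmp, r"\.unwrap\(\)")
    print(hits)
```

Any hit is evidence against the claim, so an absence claim survives only when the list comes back empty.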

Claim Categories and Strategies

Each extracted claim is categorized and routed to the appropriate falsification strategy:

Category	Strategy	How It’s Falsified
`PathReference`	Check filesystem	Path exists? Similar file found?
`CodeEntity`	Search codebase index	Entity found via `pmat query`?
`AbsenceClaim`	Reverse search	Forbidden pattern found?
`CommandClaim`	Validate executable	Command exists and is parseable?
`MetricClaim`	Check threshold	Metric within stated bounds?
`ArchitecturalClaim`	Structural analysis	Architecture matches description?
`BehaviorClaim`	Evidence search	Behavioral evidence found?
`Requirement`	Evidence search	Implementation evidence exists?
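
Routing by category is essentially a dispatch table. The Python sketch below shows the shape of that idea with two placeholder strategies. The function names, the claim fields `exists` and `matches`, and the fallback to NotTestable for unregistered categories are all assumptions of this example rather than documented pmat behavior.

```python
from typing import Callable, Dict


def path_strategy(claim: dict) -> str:
    # Placeholder: a real strategy would check the filesystem.
    return "Survived" if claim.get("exists") else "Falsified"


def absence_strategy(claim: dict) -> str:
    # Placeholder: finding the forbidden pattern falsifies the claim.
    return "Falsified" if claim.get("matches", 0) > 0 else "Survived"


# Category name -> strategy function; only two categories are registered here.
STRATEGIES: Dict[str, Callable[[dict], str]] = {
    "PathReference": path_strategy,
    "AbsenceClaim": absence_strategy,
}


def route(claim: dict) -> str:
    """Dispatch a claim to the strategy registered for its category."""
    strategy = STRATEGIES.get(claim["category"])
    if strategy is None:
        # Assumption: claims without a registered strategy are NotTestable.
        return "NotTestable"
    return strategy(claim)


print(route({"category": "PathReference", "exists": True}))
print(route({"category": "AbsenceClaim", "matches": 2}))
print(route({"category": "BehaviorClaim"}))
```

Keeping the table separate from the strategies means a new category needs only one new function and one new dictionary entry.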

Verdict Statuses

Each claim receives one of four verdicts:

Status	Meaning	Action
Survived	Claim could not be falsified (evidence supports it)	No action needed
Falsified	Evidence contradicts the claim	Fix the spec or the code
NotTestable	Claim cannot be empirically tested	Consider rewriting
Skipped	Claim filtered or deferred	Review manually

Output Formats

Human-Readable (Default)

Spec Falsification Report: docs/specifications/my-feature.md
============================================================

  94 claims extracted, 40 survived, 9 falsified, 45 not testable

  Health Score: 0.82 (GOOD)

  FALSIFIED Claims:
  ─────────────────
  [P1] Line 142: "SHOULD use trueno-rag for vector search"
       Category: CodeEntity
       Evidence: Entity 'trueno-rag' not found in codebase
       Suggestion: Update spec or implement trueno-rag integration

  [P2] Line 87: Schema at `docs/schemas/claim.json`
       Category: PathReference
       Evidence: Path does not exist
       Similar: docs/schemas/receipt.json

JSON Output

pmat falsify docs/specifications/my-feature.md --format json 2>/dev/null

{
  "file": "docs/specifications/my-feature.md",
  "total_claims": 94,
  "survived": 40,
  "falsified": 9,
  "not_testable": 45,
  "skipped": 0,
  "health_score": 0.82,
  "verdicts": [
    {
      "claim": "SHOULD use trueno-rag for vector search",
      "line": 142,
      "category": "CodeEntity",
      "priority": "P1",
      "status": "Falsified",
      "evidence": "Entity 'trueno-rag' not found in codebase"
    }
  ]
}
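
A script can consume this JSON directly. The Python sketch below groups falsified verdicts by priority, relying only on the field names shown in the sample above. The helper name `falsified_by_priority` is hypothetical, and the second verdict in the embedded report is invented purely to show that non-falsified entries are filtered out.

```python
import json


def falsified_by_priority(report):
    """Group the report's Falsified verdicts by their priority label."""
    grouped = {}
    for verdict in report["verdicts"]:
        if verdict["status"] == "Falsified":
            grouped.setdefault(verdict["priority"], []).append(verdict)
    return dict(sorted(grouped.items()))


# Trimmed report; the "Survived" entry is a made-up example for illustration.
raw = """
{
  "file": "docs/specifications/my-feature.md",
  "verdicts": [
    {
      "claim": "SHOULD use trueno-rag for vector search",
      "line": 142,
      "category": "CodeEntity",
      "priority": "P1",
      "status": "Falsified",
      "evidence": "Entity 'trueno-rag' not found in codebase"
    },
    {
      "claim": "Configuration files SHALL use TOML format",
      "line": 51,
      "category": "Requirement",
      "priority": "P0",
      "status": "Survived",
      "evidence": "hypothetical evidence text"
    }
  ]
}
"""

report = json.loads(raw)
for priority, verdicts in falsified_by_priority(report).items():
    for verdict in verdicts:
        print(f"[{priority}] line {verdict['line']}: {verdict['claim']}")
```

In practice the `raw` string would be replaced by the contents of the file that the command's output was redirected to.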

Health Score

The health score (0.0 to 1.0) is calculated as:

health = survived / (survived + falsified)

Claims that are NotTestable or Skipped are excluded from the calculation. This means a spec with many vague claims won’t get a high score just because those claims can’t be disproven.

Score Range	Rating	Meaning
0.90 - 1.00	Excellent	Spec is highly aligned with codebase
0.75 - 0.89	Good	Minor drift, review falsified claims
0.50 - 0.74	Fair	Significant drift, update needed
0.00 - 0.49	Poor	Spec is largely contradicted by code
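
The formula and rating bands translate into a few lines of Python. This sketch uses made-up counts and two assumptions that the text above does not settle: a spec with no tested claims returns `None` instead of a score, and scores that fall between two printed bands (for example 0.895) are assigned to the lower band.

```python
def health_score(survived, falsified):
    """survived / (survived + falsified); NotTestable and Skipped are ignored."""
    tested = survived + falsified
    if tested == 0:
        # Assumption: the score is undefined when nothing could be tested.
        return None
    return survived / tested


# Lower bound of each band, highest first, following the table above.
RATINGS = [(0.90, "Excellent"), (0.75, "Good"), (0.50, "Fair"), (0.00, "Poor")]


def rating(score):
    """Map a score in [0.0, 1.0] to its rating label."""
    for floor, label in RATINGS:
        if score >= floor:
            return label
    raise ValueError("score must be between 0.0 and 1.0")


# Illustrative counts: 17 survived, 3 falsified, 30 not testable.
score = health_score(survived=17, falsified=3)
print(round(score, 2), rating(score))
```

The 30 untestable claims never enter the calculation, so the result depends only on the 20 claims that could actually be checked.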

Directory Mode

Falsify all specs in a directory at once:

# Scan entire specifications directory
pmat falsify docs/specifications/

# Only show failures across all specs
pmat falsify docs/specifications/ --failures-only

The engine collects all .md, .yaml, and .yml files recursively and produces a report for each.
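
Recursive collection by extension can be sketched with Python's `pathlib`. The helper `collect_specs` is hypothetical and matches extensions case-sensitively, which is an assumption; the text above does not say how pmat treats names such as `SPEC.MD`.

```python
import tempfile
from pathlib import Path

# Extensions named in the text above.
SPEC_SUFFIXES = {".md", ".yaml", ".yml"}


def collect_specs(root):
    """Return spec files under root as sorted, root-relative POSIX paths."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix in SPEC_SUFFIXES
    )


# Demo tree with three spec files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.md", "sub/b.yaml", "sub/notes.txt", "sub/deep/d.yml"]:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("", encoding="utf-8")
    specs = collect_specs(tmp)
    print(specs)
```

The unrelated `notes.txt` is skipped, while files in nested directories are picked up.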

Integration with Work System

The pmat falsify command integrates with the work system in two ways:

Work Item Falsification

When given a work item ID instead of a file path, the command delegates to the existing work contract falsification system:

# Falsify a work item's contract claims
pmat falsify GH-123

# With overrides
pmat falsify GH-123 --override-claims "claim-1,claim-2" --ticket "DEBT-456"

Spec Falsification in Work Complete

The pmat work complete command automatically runs falsification on any linked specification files as part of the quality gate. See Chapter 34 for details on the work contract system and falsification ledger.

CI/CD Integration

Add spec falsification to your CI pipeline:

# .github/workflows/quality.yml
- name: Falsify Specifications
  run: |
    pmat falsify docs/specifications/ --format json --failures-only 2>/dev/null > falsify.json
    FALSIFIED=$(jq '.verdicts | map(select(.status == "Falsified")) | length' falsify.json)
    if [ "$FALSIFIED" -gt 0 ]; then
      echo "::error::$FALSIFIED spec claims falsified by codebase"
      jq '.verdicts[] | select(.status == "Falsified")' falsify.json
      exit 1
    fi

Dry Run Mode

Use --dry-run to see what claims would be extracted without running falsification:

pmat falsify docs/specifications/my-feature.md --dry-run

Dry Run: 94 claims extracted from docs/specifications/my-feature.md
─────────────────────────────────────────────────────────────────────

  [P0] Line 14: "MUST use SQLite for storage" (Requirement)
  [P1] Line 42: Path `src/services/cache.rs` (PathReference)
  [P1] Line 87: Entity `FalsificationEngine` (CodeEntity)
  [P1] Line 123: "SHOULD return within 200ms" (MetricClaim)
  ...

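Comparison with Related Commands

The table below shows how pmat falsify relates to other pmat commands:

Command	Purpose	Input	Focus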
`pmat falsify`	Falsify spec claims	Spec files, work items	Disconfirming evidence
`pmat validate-readme`	Validate README accuracy	README.md	Confirmation of claims
`pmat work complete`	Gate work completion	Work contract	Contract claim verification
`pmat popper-score`	Score falsifiability	Project-wide	Falsifiability measurement
`pmat spec score`	Score spec quality	Spec file	Structural quality (95-point)

Summary

The pmat falsify command brings Popperian epistemology to specification management:

Automatically extracts falsifiable claims from specs using RFC-2119 keywords, path references, code entities, numeric thresholds, and absence assertions
Actively searches for disconfirming evidence rather than confirming matches
Reports per-claim verdicts with evidence and suggestions
Supports JSON output for CI/CD integration
Works on individual files or entire directories
Integrates with the work system for automated quality gates

For the underlying philosophy, see Chapter 37: Popper Falsifiability Score. For work system integration, see Chapter 34: Unified Workflow Management.

PMAT: The PAIML MCP Agent Toolkit