Chapter 59: Design by Contract for Work Items

The pmat work system implements Meyer’s Design by Contract (DbC) to enforce quality during development. Every work item carries a contract — a set of falsifiable claims organized into preconditions, invariants, and postconditions that must hold throughout the work lifecycle.

Overview

Traditional ticket systems track what to do. Design by Contract tracks what must remain true while you do it:

┌─────────────────────────────────────────────────────────┐
│                    Work Contract                        │
├────────────┬─────────────────┬──────────────────────────┤
│  REQUIRE   │   INVARIANT     │         ENSURE           │
│ (before)   │  (throughout)   │        (after)           │
├────────────┼─────────────────┼──────────────────────────┤
│ Compiles   │ No new warnings │ Coverage ≥ 85%           │
│ Tests pass │ No SATD added   │ TDG grade ≥ baseline     │
│ Clean git  │ Files < 500 LOC │ All tests pass           │
│            │                 │ Complexity ≤ threshold   │
└────────────┴─────────────────┴──────────────────────────┘

Quick Start

# Start work with a contract (auto-detects profile)
pmat work start PMAT-123

# Start with a specific profile
pmat work start PMAT-123 --profile rust

# Start with the strict PMAT profile
pmat work start PMAT-123 --profile pmat

# Exclude specific claims
pmat work start PMAT-123 --without coverage --without complexity

# Run a checkpoint (tests invariants)
pmat work checkpoint PMAT-123

# Complete work (tests all postconditions)
pmat work done PMAT-123

Contract Profiles

Profiles determine which claims are included in a contract.

Universal Profile (6 claims)

The minimal profile that works for any language:

Kind	Claim	Threshold
Require	Clean working directory	-
Require	Tests pass	-
Invariant	No new SATD comments	-
Invariant	File count stable	±10%
Ensure	Tests still pass	-
Ensure	No regression in test count	-

Rust Profile (14 claims)

Extends Universal with Rust-specific checks:

Kind	Claim	Threshold
Invariant	No new clippy warnings	-
Invariant	No unsafe added	-
Invariant	File line limit	≤ 500 LOC
Invariant	Dependency count stable	±5%
Ensure	Coverage ≥ threshold	configurable
Ensure	TDG grade ≥ baseline	per-function
Ensure	Complexity ≤ threshold	configurable
Ensure	No new dead code	-

PMAT Profile (25 claims)

The strictest profile, used for PMAT’s own development:

Kind	Claim	Threshold
Require	Index freshness	< 24h
Require	CLAUDE.md configured	-
Invariant	Mutation score stable	≥ 80%
Invariant	Property test coverage	≥ baseline
Ensure	Coverage ≥ 95%	strict
Ensure	Spec score ≥ 95	all specs
Ensure	Zero falsified claims	all specs

The Meyer Triad

Require (Preconditions)

Checked once when pmat work start is called. If any precondition fails, work cannot begin:

$ pmat work start PMAT-456
❌ Precondition failed: REQ-002 — Tests must pass before starting work
   Run `cargo test` to fix, then retry.

Invariant (Maintained Throughout)

Checked at every checkpoint (pmat work checkpoint). Invariants must hold continuously:

$ pmat work checkpoint PMAT-456
  ✅ INV-001: No new compiler warnings
  ✅ INV-002: No new SATD comments
  ❌ INV-003: File line limit exceeded
     → src/handlers/big_handler.rs: 523 lines (limit: 500)

Ensure (Postconditions)

Checked when work is completed (pmat work done). All postconditions must pass:

$ pmat work done PMAT-456
  ✅ ENS-001: Tests pass
  ✅ ENS-002: Coverage 96.2% (≥ 95%)
  ✅ ENS-003: TDG grade A (≥ baseline A)
  ❌ ENS-004: Complexity regression
     → parse_config(): cyclomatic 32 (limit: 30)

Subcontracting

When reworking a ticket across iterations, postconditions can only strengthen (Liskov Substitution Principle):

# First iteration: coverage target 85%
pmat work start PMAT-789 --iteration 1
pmat work done PMAT-789

# Second iteration: coverage must be ≥ 85% (inherited)
# Can strengthen to 90%, cannot weaken to 80%
pmat work start PMAT-789 --iteration 2

# This would fail:
# ❌ Subcontracting violation: postcondition weakened
#    ENS-001 coverage: 85% → 80% (must be monotonically non-decreasing)

Stack Manifests

Third-party tool stacks can define their own claims via .dbc-stack.toml:

[stack]
name = "nextjs-stack"
version = "1.0.0"
extends = "universal"  # Inherit universal claims

[[require]]
id = "npm-audit"
description = "No critical vulnerabilities"
check = "npm audit --audit-level=critical"
timeout = 60

[[ensure]]
id = "lighthouse-perf"
description = "Lighthouse performance score ≥90"
check = "npx lighthouse --output=json http://localhost:3000"
metric_pattern = "performance.*?([\\d.]+)"
threshold = { metric = "perf_score", op = "Gte", value = 0.9 }

[[rescue]]
for_clause = "lighthouse-perf"
strategy = "diagnose"
command = "npx lighthouse http://localhost:3000 --view"
guidance = "Check render-blocking resources and image optimization"

Security Restrictions

Stack manifest commands are validated against security restrictions:

No pipe to shell (| sh, | bash)
No backtick substitution (`cmd`)
No $() command substitution
No network fetch + execute (curl | bash)
No redirect to executable with chmod

TOFU Trust Model

Stack manifests use Trust On First Use (TOFU):

# First use: prompted to review and trust
$ pmat work start TASK-001 --stack ./nextjs-stack.toml
⚠️  New stack manifest: nextjs-stack v1.0.0
   Commands: npm audit, npx lighthouse
   Trust this manifest? [y/N]

# Subsequent uses: trusted if hash matches
$ pmat work start TASK-002 --stack ./nextjs-stack.toml
✅ Stack nextjs-stack v1.0.0 (trusted)

Rescue Protocol

When postconditions fail, the rescue protocol attempts automated recovery:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Claim Fails │────▶│  Strategy    │────▶│  Execute     │
│              │     │  Selection   │     │  Rescue      │
└──────────────┘     └──────────────┘     └──────┬───────┘
                                                  │
                                           ┌──────▼───────┐
                                           │  Re-test     │
                                           │  (max 3)     │
                                           └──────────────┘

Rescue strategies by claim type:

Claim Type	Strategy	Action
Coverage	`auto-fix`	Run `pmat oracle` to generate tests
Complexity	`diagnose`	Run `pmat analyze complexity`
TDG	`diagnose`	Run `pmat analyze tdg`
SATD	`guidance`	Show SATD locations
Stack	`command`	Run manifest rescue command

Contract Quality

The --without flag excludes claims from a contract. Contract quality tracks what percentage of applicable claims are active:

$ pmat work start PMAT-123 --without coverage --without complexity
⚠️  Contract quality: 75% (Strong) — 15/20 claims active
   Excluded: ENS-001 (coverage), ENS-004 (complexity)

Quality ratings:

Excellent: ≥ 90% claims active
Strong: ≥ 75%
Adequate: ≥ 50%
Weak: < 50%

Contract Scoring (v1.2.0)

Every contract receives a 5-dimension quality score adapted from provable-contracts:

# Score a work contract
pmat work score PMAT-123

# Score with minimum threshold (CI/CD mode)
pmat work score PMAT-123 --min-score 0.60

# JSON output for automation
pmat work score PMAT-123 --format json

Scoring Dimensions

Dimension	Weight	What it measures
spec_depth	0.20	How comprehensive are the contract clauses?
falsification	0.25	What fraction of claims have been verified?
invariant_health	0.25	Invariant pass rate across checkpoints
subcontracting	0.10	Monotonic postcondition strengthening
traceability	0.20	Coverage of require/ensure/invariant triad

Grade Scale

Grade	Score Range
A	>= 0.90
B	>= 0.75
C	>= 0.60
D	>= 0.40
F	< 0.40

DBC Lint Rules (v1.2.0)

13 lint rules validate contract health, modeled after provable-contracts PV-* rules:

Rule ID	Severity	Description
DBC-VAL-001	Warning	Missing preconditions (require empty)
DBC-VAL-002	Error	Missing postconditions (ensure empty)
DBC-VAL-003	Warning	Missing invariants (invariant empty)
DBC-VAL-004	Error	Empty claim hypothesis
DBC-AUD-001	Warning	Postcondition without falsification test
DBC-AUD-002	Info	Invariant without checkpoint evaluation
DBC-AUD-003	Info	Claim defined but never verified
DBC-SCR-001	Error	Contract score below threshold
DBC-SCR-002	Warning	More than 30% of claims excluded
DBC-PRV-001	Error	Subcontracting violation detected
DBC-DRF-001	Warning	Contract drift exceeds bound
DBC-TRD-001	Warning	Quality trend declining
DBC-TRD-002	Info	Rescue success rate below 50%

Rules run in 5 sequential gates: validation, audit, score, provability, drift/trend.

Drift Detection (v1.2.0)

Based on the ABC drift bounds theorem (arXiv:2602.22302):

alpha (drift rate): Increases with time since last checkpoint
gamma (recovery rate): Increases with checkpoint frequency
D* = alpha / gamma: Bounded drift (lower is better)
Staleness: Contracts without a checkpoint for >24h are flagged

Drift metrics are computed at every checkpoint and stored with the checkpoint record.

Trend Tracking (v1.2.0)

Quality trend tracking uses a 7-snapshot rolling window:

Each pmat work checkpoint records a trend snapshot
Rolling average score is computed over the last 7 snapshots
Drift detection: >5% drop from rolling average triggers a warning (DBC-DRF-001)
Trend direction: improving, stable, or declining

# View trend data in the score report
pmat work score PMAT-123

Lint Configuration (v1.2.0)

Per-project lint configuration via .pmat-work/dbc-lint.toml:

# Score threshold for CI/CD
min_score = 0.60

# Promote all warnings to errors
strict = false

# Per-rule severity overrides
[lint.rules]
DBC-AUD-003 = "error"     # Promote to error
DBC-SCR-002 = "off"       # Disable this rule

# Suppress rules entirely
[lint.suppress]
rules = ["DBC-AUD-002"]

# Trend tracking
[lint.trend]
enabled = true
retention_days = 90
drift_threshold = 0.05

Strict mode promotes all warnings to errors — useful for CI/CD pipelines where any finding should block the build.

Rule suppression completely silences a rule. Unlike severity override to “off”, suppressed rules don’t appear in SARIF output.

Codebase-Level Scoring (v1.2.0)

Aggregate quality metrics across all active work contracts:

# Portfolio quality score
pmat work codebase-score

# JSON output for dashboards
pmat work codebase-score --format json

Codebase scoring dimensions:

Contract coverage: fraction of contracts scoring >= C (0.60)
Mean score: average quality across all contracts
Lint pass rate: fraction of contracts with zero lint errors
Mean drift: average drift bound (lower is better)
Composite: 40% mean_score + 25% coverage + 20% lint_pass + 15% freshness

Runtime Violation Tracking (v1.2.0)

Based on the ABC framework (arXiv:2602.22302), runtime violations are tracked for stack manifest commands:

Command failures: non-zero exit codes recorded with command and exit code
Timing anomalies: execution times >3 sigma from rolling 50-observation mean
Trust violations: manifest content hash changes detected

Violation data is persisted to .pmat-work/{id}/violations/tracker.json and used to detect elevated violation sessions (>2x historical average).

A content hash chain links trust decisions: each trust entry contains chain_hash = SHA256(content_hash + prev_hash), preventing trust history tampering.

Configuration

DbC settings in .pmat-work/dbc-config.toml:

# Override default thresholds
[thresholds]
coverage_pct = 95.0
max_complexity = 25
max_file_lines = 500
tdg_grade_minimum = "A"

# Profile defaults
[profile]
default = "rust"

Running the Example

cargo run --example dbc_contract_demo

This demonstrates all DbC concepts programmatically: Meyer’s triad, subcontracting validation, stack manifest parsing, contract quality scoring, command security restrictions, 5-dimension scoring, ABC drift bounds, 13-rule lint gate, lint configuration, trend tracking, codebase-level scoring, and runtime violation tracking.

Specification

Full specification: docs/specifications/dbc.md

Based on:

Meyer, B. (1997). Object-Oriented Software Construction, 2nd ed. Prentice Hall.
Popper, K. (1959). The Logic of Scientific Discovery. Routledge.
Liskov, B. (1987). “Data Abstraction and Hierarchy.” OOPSLA.
ABC Drift Bounds (arXiv:2602.22302). Contracts with recovery rate gamma > alpha bound drift to D* = alpha/gamma.

PMAT: The PAIML MCP Agent Toolkit