Jidoka (Autonomation)

Jidoka is a Toyota Production System principle meaning "automation with a human touch." When a defect is detected, the process stops immediately rather than producing more defective items.

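In software, Jidoka means stopping at the point where bad data is detected instead of passing it downstream. This chapter enforces that with the type system, by making invalid states unrepresentable.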

Theoretical Foundation

  • Shingo, S. (1986). Zero Quality Control: Source Inspection and the Poka-Yoke System. Productivity Press.
  • Brady, E. (2017). Type-Driven Development with Idris. Manning Publications.
  • King, A. (2019). "Parse, Don't Validate" - https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

The Problem: Runtime Validation Can Be Bypassed

Traditional runtime validation has a fatal flaw: nothing forces a caller to run it, so it can be bypassed.

// DANGEROUS: Nothing prevents using unvalidated data
fn process_embedding(data: Vec<f32>, vocab_size: usize, hidden_dim: usize) {
    // Validation can be skipped or forgotten
    if !validate_embedding(&data, vocab_size, hidden_dim) {
        return; // Easy to forget this check
    }
    // ... process data
}

// This compiles and runs, even with garbage data:
process_embedding(vec![0.0; 100], 151936, 896); // Wrong size!

The Solution: Poka-Yoke via Newtypes

Poka-Yoke (mistake-proofing) makes the invalid state impossible to represent:

/// Validated embedding - inner data is PRIVATE
pub struct ValidatedEmbedding {
    data: Vec<f32>,      // PRIVATE - cannot be accessed directly
    vocab_size: usize,
    hidden_dim: usize,
}

impl ValidatedEmbedding {
    /// The ONLY way to create a ValidatedEmbedding
    pub fn new(data: Vec<f32>, vocab_size: usize, hidden_dim: usize)
        -> Result<Self, ContractValidationError>
    {
        // All validation gates run here
        if data.len() != vocab_size * hidden_dim {
            return Err(/* shape error */);
        }
        if zero_pct(&data) > 50.0 {
            return Err(/* density error - catches PMAT-234 bug */);
        }
        // ... more gates

        Ok(Self { data, vocab_size, hidden_dim })
    }

    /// Access validated data
    pub fn data(&self) -> &[f32] { &self.data }
}

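Now a ValidatedEmbedding holding invalid data cannot be constructed outside the module that defines it, so any function that takes one is guaranteed validated input: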

// This function REQUIRES validated data - cannot be bypassed
fn process_embedding(embedding: ValidatedEmbedding) {
    // Compiler guarantees data passed validation
    let data = embedding.data(); // Safe to use
}

// Cannot call with invalid data - compilation error!
process_embedding(vec![0.0; 100]); // ERROR: expected ValidatedEmbedding
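
The snippets above elide the error values and helpers. A self-contained sketch that compiles is given below. The shape of ContractValidationError (a rule plus a message), the zero_pct helper, the "SHAPE" rule label, and the mean_value consumer are assumptions made for illustration, not the project's actual definitions. The type lives in its own module because Rust field privacy is per module: only code outside contract is forced through new.

```rust
mod contract {
    use std::fmt;

    /// Hypothetical error type; the field names are assumptions of this sketch.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ContractValidationError {
        pub rule: &'static str,
        pub message: String,
    }

    impl fmt::Display for ContractValidationError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}] {}", self.rule, self.message)
        }
    }

    impl std::error::Error for ContractValidationError {}

    /// Assumed helper: percentage of elements that are exactly 0.0.
    /// An empty tensor is treated as 100% zeros so that it is rejected.
    fn zero_pct(data: &[f32]) -> f64 {
        if data.is_empty() {
            return 100.0;
        }
        let zeros = data.iter().filter(|v| **v == 0.0).count();
        100.0 * zeros as f64 / data.len() as f64
    }

    /// Fields are private, so code outside `contract` must go through `new`.
    /// `Debug` is derived because `Result::unwrap_err` requires it.
    #[derive(Debug)]
    pub struct ValidatedEmbedding {
        data: Vec<f32>,
        vocab_size: usize,
        hidden_dim: usize,
    }

    impl ValidatedEmbedding {
        pub fn new(
            data: Vec<f32>,
            vocab_size: usize,
            hidden_dim: usize,
        ) -> Result<Self, ContractValidationError> {
            // checked_mul turns an overflowing shape into a rejection.
            if vocab_size.checked_mul(hidden_dim) != Some(data.len()) {
                return Err(ContractValidationError {
                    rule: "SHAPE", // placeholder label, not a gate ID from the contract
                    message: format!(
                        "SHAPE FAILURE: got {} elements for {}x{}",
                        data.len(),
                        vocab_size,
                        hidden_dim
                    ),
                });
            }
            let pct = zero_pct(&data);
            if pct > 50.0 {
                return Err(ContractValidationError {
                    rule: "F-DATA-QUALITY-001",
                    message: format!("DENSITY FAILURE: {:.1}% zeros (max 50%)", pct),
                });
            }
            Ok(Self { data, vocab_size, hidden_dim })
        }

        /// Read-only access to the validated data.
        pub fn data(&self) -> &[f32] {
            &self.data
        }

        /// (vocab_size, hidden_dim) as checked at construction.
        pub fn shape(&self) -> (usize, usize) {
            (self.vocab_size, self.hidden_dim)
        }
    }
}

/// A consumer names the validated type, so a raw Vec<f32> is a type error.
fn mean_value(embedding: &contract::ValidatedEmbedding) -> f32 {
    let data = embedding.data();
    data.iter().sum::<f32>() / data.len() as f32
}
```

Outside the contract module, writing a struct literal for ValidatedEmbedding is rejected by the compiler because the fields are private, which is what makes new the single entry point.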

Validation Gates

The contract defines these validation gates (see contracts/tensor-layout-v1.yaml):

Gate ID               Rule                Threshold
F-DATA-QUALITY-001    Embedding density   < 50% zeros
F-DATA-QUALITY-002    No NaN/Inf          count = 0
F-DATA-QUALITY-003    Non-degenerate      L2 > 1e-6
F-DATA-QUALITY-004    Spot check          tokens at 10/50/90% non-zero
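
One way the four gates could be written as plain functions is sketched below. It is not taken from the contract file. The GateFailure type and the function names are hypothetical. The density gate follows the constructor shown earlier (reject only above 50% zeros), so its behavior at exactly 50% is an assumption. The spot check is read as "the token rows at 10%, 50%, and 90% of the vocabulary must each contain a non-zero value", which is also an interpretation.

```rust
/// A failed gate: which rule fired and why. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub struct GateFailure {
    pub gate: &'static str,
    pub detail: String,
}

fn fail(gate: &'static str, detail: String) -> Result<(), GateFailure> {
    Err(GateFailure { gate, detail })
}

/// F-DATA-QUALITY-001: reject tensors that are mostly zeros.
fn gate_density(data: &[f32]) -> Result<(), GateFailure> {
    let zeros = data.iter().filter(|v| **v == 0.0).count();
    let pct = 100.0 * zeros as f64 / data.len().max(1) as f64;
    if pct > 50.0 {
        return fail("F-DATA-QUALITY-001", format!("{:.1}% zeros (max 50%)", pct));
    }
    Ok(())
}

/// F-DATA-QUALITY-002: no NaN or infinite values.
fn gate_finite(data: &[f32]) -> Result<(), GateFailure> {
    let bad = data.iter().filter(|v| !v.is_finite()).count();
    if bad != 0 {
        return fail("F-DATA-QUALITY-002", format!("{} NaN/Inf values", bad));
    }
    Ok(())
}

/// F-DATA-QUALITY-003: the L2 norm must exceed 1e-6.
fn gate_non_degenerate(data: &[f32]) -> Result<(), GateFailure> {
    // Accumulate in f64 so that small f32 squares are not lost.
    let sum_sq: f64 = data.iter().map(|v| (*v as f64) * (*v as f64)).sum();
    let l2 = sum_sq.sqrt();
    if l2 <= 1e-6 {
        return fail("F-DATA-QUALITY-003", format!("L2 norm {:e} is not above 1e-6", l2));
    }
    Ok(())
}

/// F-DATA-QUALITY-004: token rows at 10%, 50%, and 90% of the vocabulary
/// must each hold at least one non-zero value (assumed interpretation).
fn gate_spot_check(data: &[f32], vocab_size: usize, hidden_dim: usize) -> Result<(), GateFailure> {
    for &pct in &[10usize, 50, 90] {
        let row = vocab_size * pct / 100;
        let start = row * hidden_dim;
        match data.get(start..start + hidden_dim) {
            Some(token) if token.iter().any(|v| *v != 0.0) => {}
            Some(_) => {
                return fail("F-DATA-QUALITY-004", format!("token {} ({}%) is all zeros", row, pct))
            }
            None => {
                return fail("F-DATA-QUALITY-004", format!("token {} ({}%) is out of range", row, pct))
            }
        }
    }
    Ok(())
}

/// Runs the gates in table order and stops at the first failure.
pub fn run_gates(data: &[f32], vocab_size: usize, hidden_dim: usize) -> Result<(), GateFailure> {
    gate_density(data)?;
    gate_finite(data)?;
    gate_non_degenerate(data)?;
    gate_spot_check(data, vocab_size, hidden_dim)
}
```

Stopping at the first failing gate mirrors the Jidoka idea: the first detected defect halts the load, and the gate ID in the error says which rule fired.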

Real Bug: PMAT-234

This pattern caught a real bug where SafeTensors data had 94.5% leading zeros due to an offset calculation error:

# Trace output showing the bug
embedding len: 136134656 floats
first non-zero at index 128684288: value=-0.0131
leading zeros: 128684288 (94.5% of total)

Without validation, this data would have been used for inference, producing garbage output (Hebrew characters instead of English).

With ValidatedEmbedding, the load fails immediately with:

[F-DATA-QUALITY-001] Tensor 'embedding': DENSITY FAILURE: 94.5% zeros
(max 50%). Data likely loaded from wrong offset!
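
The numbers in the trace are mutually consistent: the shape used earlier, 151936 × 896, gives exactly 136134656 elements, and 128684288 / 136134656 ≈ 0.945. A small sketch of how such a report could be computed follows; the function names are hypothetical and are not the project's tracing code.

```rust
/// Number of elements before the first non-zero value.
fn leading_zeros(data: &[f32]) -> usize {
    data.iter().take_while(|v| **v == 0.0).count()
}

/// `part` as a percentage of `total`.
fn percent(part: usize, total: usize) -> f64 {
    100.0 * part as f64 / total as f64
}

/// Formats a line in the same style as the trace output above.
fn density_report(leading: usize, total: usize) -> String {
    format!(
        "leading zeros: {} ({:.1}% of total)",
        leading,
        percent(leading, total)
    )
}
```

Calling density_report(128684288, 136134656) reproduces the last line of the trace, and leading_zeros applied to a loaded slice would supply the first argument.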

Popperian Falsification Tests

Following Popper (1959), each validation rule has explicit falsification tests. Each test tries to get data that violates the rule past the constructor, and so to disprove that the contract works:

#[test]
fn falsify_001_embedding_rejects_all_zeros() {
    let bad_data = vec![0.0f32; 100 * 64]; // 100% zeros
    let result = ValidatedEmbedding::new(bad_data, 100, 64);
    assert!(result.is_err(), "Should reject 100% zeros");
    assert!(result.unwrap_err().message.contains("DENSITY"));
}

If the constructor accepts the bad data, the assertion fails: the falsification attempt has succeeded, and the contract implementation is broken.
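
The same idea extends to a table of attacks. The sketch below uses a stand-in accepts function that covers only the density and NaN/Inf rules; it is hypothetical and much simpler than the real constructor. Each case is data that must be rejected, and a control test checks that good data still passes, since a validator that rejects everything would survive every falsification attempt trivially.

```rust
/// Stand-in for the constructor's gates (density and NaN/Inf only); hypothetical.
fn accepts(data: &[f32]) -> bool {
    let zeros = data.iter().filter(|v| **v == 0.0).count();
    // Accept at most 50% zeros, matching the `> 50.0` rejection shown earlier.
    let dense_enough = !data.is_empty() && zeros * 2 <= data.len();
    let all_finite = data.iter().all(|v| v.is_finite());
    dense_enough && all_finite
}

/// Each case is an attempt to falsify the contract: data that must be rejected.
fn falsification_cases() -> Vec<(&'static str, Vec<f32>)> {
    let mut with_nan = vec![0.1f32; 64];
    with_nan[7] = f32::NAN;
    let mut with_inf = vec![0.1f32; 64];
    with_inf[0] = f32::INFINITY;
    // 945 leading zeros out of 1000 elements imitates the PMAT-234 offset bug.
    let mut offset_bug = vec![0.0f32; 1000];
    for v in offset_bug.iter_mut().skip(945) {
        *v = 0.01;
    }
    vec![
        ("all zeros", vec![0.0; 64]),
        ("94.5% leading zeros", offset_bug),
        ("contains NaN", with_nan),
        ("contains Inf", with_inf),
        ("empty", Vec::new()),
    ]
}

/// Names of cases the validator wrongly accepted; empty means the contract survived.
fn surviving_falsifications() -> Vec<&'static str> {
    falsification_cases()
        .into_iter()
        .filter(|(_, data)| accepts(data))
        .map(|(name, _)| name)
        .collect()
}

#[test]
fn falsify_contract_rejects_all_bad_inputs() {
    assert_eq!(surviving_falsifications(), Vec::<&str>::new());
}

#[test]
fn control_good_data_is_accepted() {
    assert!(accepts(&[0.1f32; 64]));
}
```

Reporting the names of the surviving cases, instead of a bare boolean, makes a failing run say which attack got through.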

Running the Example

cargo run --example validated_tensors

Output:

═══════════════════════════════════════════════════════════════════
     PMAT-235: Validated Tensors - Compile-Time Enforcement
═══════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────┐
│ Demo 1: Valid Embedding (Passes All Gates)                      │
└─────────────────────────────────────────────────────────────────┘

  ✅ ValidatedEmbedding created successfully!
     vocab_size: 100
     hidden_dim: 64
     Statistics:
       elements: 6400
       zero_pct: 0.0%
       min: -0.0998, max: 0.0998
       L2 norm: 6.3489

┌─────────────────────────────────────────────────────────────────┐
│ Demo 2: Density Rejection (Catches PMAT-234 Bug)                │
└─────────────────────────────────────────────────────────────────┘

  Creating embedding with 94.5% zeros (simulates offset bug)...
  ✅ Correctly rejected!
     Rule: F-DATA-QUALITY-001
     Tensor: embedding
     Error: DENSITY FAILURE: 94.5% zeros (max 50%)

Key Takeaways

  1. Make invalid states unrepresentable - Use newtypes with private fields
  2. Validation at construction - The only way to create the type runs validation
  3. Compiler enforcement - Cannot bypass validation because the type requires it
  4. Popperian testing - Write tests that attempt to disprove correctness

See also: