Mutation Testing

Mutation testing is the most rigorous form of test quality assessment. While code coverage tells you what code your tests execute, mutation testing tells you whether your tests actually verify the code's behavior.

The Problem with Coverage Metrics

Consider this code with 100% line coverage:

pub fn calculate_discount(price: f32, is_member: bool) -> f32 {
    if is_member {
        price * 0.9  // 10% discount
    } else {
        price
    }
}

#[test]
fn test_discount() {
    let result = calculate_discount(100.0, true);
    assert!(result > 0.0);  // Weak assertion!
}

This test achieves 100% coverage but would pass even if we changed 0.9 to 0.5 or 1.0. Mutation testing catches this.
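
A test that pins down the expected value closes this gap. Below is a minimal sketch reusing calculate_discount from above; the 1e-6 tolerance is an arbitrary choice for comparing f32 results:

#[test]
fn test_discount_exact_values() {
    // 100.0 * 0.9 = 90.0, so literal mutants such as 0.9 -> 0.5 or 0.9 -> 1.0 now fail
    let member_price = calculate_discount(100.0, true);
    assert!((member_price - 90.0).abs() < 1e-6);

    // Non-members pay full price, which kills mutants that flip the branch condition
    let regular_price = calculate_discount(100.0, false);
    assert!((regular_price - 100.0).abs() < 1e-6);
}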

How Mutation Testing Works

  1. Generate Mutants: The tool creates variations of your code (mutants)
  2. Run Tests: Each mutant is tested against your test suite
  3. Kill or Survive: If tests fail, the mutant is "killed" (good). If tests pass, it "survives" (bad)
  4. Calculate Score: Mutation Score = Killed Mutants / Viable Mutants (unviable mutants that fail to compile are excluded; see the sketch after this list)
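
The score itself is simple arithmetic. Here is a minimal sketch of that calculation, assuming the killed and survived counts are already known; this helper is illustrative and not part of any tool's API:

fn mutation_score(killed: u32, survived: u32) -> f64 {
    // Unviable mutants never compiled, so they are excluded before this point
    let viable = killed + survived;
    if viable == 0 {
        return 100.0; // nothing to kill
    }
    100.0 * f64::from(killed) / f64::from(viable)
}

fn main() {
    // 47 killed out of 50 viable mutants is a 94% mutation score
    println!("score = {:.1}%", mutation_score(47, 3));
}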

Common Mutation Operators

| Operator   | Original | Mutant   | Tests Should Catch  |
|------------|----------|----------|---------------------|
| Arithmetic | a + b    | a - b    | Value changes       |
| Relational | a < b    | a <= b   | Boundary conditions |
| Logical    | a && b   | a \|\| b | Boolean logic       |
| Literal    | 0.9      | 0.0      | Magic numbers       |
| Return     | return x | return 0 | Return value usage  |
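
The relational row shows why exact boundaries matter: a < b and a <= b differ only when the two values are equal, so a suite that never tests that exact case cannot kill the mutant. A minimal illustration with a hypothetical helper:

fn is_under_limit(value: u32, limit: u32) -> bool {
    value < limit // a relational mutant would change this to value <= limit
}

#[test]
fn test_limit_boundary() {
    assert!(is_under_limit(9, 10));   // general case: original and mutant both pass
    assert!(!is_under_limit(10, 10)); // exact boundary: only the original passes
}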

Using cargo-mutants in Aprender

Installation

cargo install cargo-mutants --locked

Makefile Targets

Aprender provides tiered mutation testing targets:

# Quick sample (~5 min) - for rapid feedback
make mutants-fast

# Full suite (~30-60 min) - for comprehensive analysis
make mutants

# Single file - for targeted improvements
make mutants-file FILE=src/metrics/mod.rs

# List potential mutants without running
make mutants-list

Direct Usage

# Run on entire crate
cargo mutants --no-times --timeout 300 -- --all-features

# Run on specific file
cargo mutants --no-times --timeout 120 --file src/loss/mod.rs

# Run with sharding for CI parallelization
cargo mutants --no-times --shard 1/4 -- --lib

Interpreting Results

Output Format

src/metrics/mod.rs:42: replace mse -> f32 with 0.0 ... KILLED
src/metrics/mod.rs:42: replace mse -> f32 with 1.0 ... KILLED
src/metrics/mod.rs:58: replace mae -> f32 with 0.0 ... SURVIVED  ⚠️

Result Categories

| Status   | Meaning                   | Action                     |
|----------|---------------------------|----------------------------|
| KILLED   | Tests caught the mutation | Good - no action needed    |
| SURVIVED | Tests missed the mutation | Add stronger assertions    |
| TIMEOUT  | Tests took too long       | May indicate infinite loop |
| UNVIABLE | Mutant doesn't compile    | Normal - skip these        |

Improving Your Mutation Score

1. Strengthen Assertions

// ❌ Weak - survives many mutants
assert!(result > 0.0);

// ✅ Strong - kills most mutants
assert!((result - expected).abs() < 1e-6);

2. Test Boundary Conditions

#[test]
fn test_boundaries() {
    // Test exact boundaries, not just general cases
    assert_eq!(classify(0), Category::Zero);
    assert_eq!(classify(1), Category::Positive);
    assert_eq!(classify(-1), Category::Negative);
}

3. Verify Return Values

// ❌ Just calling the function
let _ = process_data(&input);

// ✅ Verify the actual result
let result = process_data(&input);
assert_eq!(result.len(), expected_len);
assert!(result.iter().all(|x| *x >= 0.0));

4. Test Error Paths

#[test]
fn test_error_handling() {
    // Verify errors are returned, not just that function doesn't panic
    let result = parse_config("invalid");
    assert!(result.is_err());
    assert!(result.unwrap_err().to_string().contains("invalid"));
}

Mutation Score Targets

| Project Stage | Target Score | Rationale               |
|---------------|--------------|-------------------------|
| Prototype     | 50%          | Focus on functionality  |
| Development   | 70%          | Growing confidence      |
| Production    | 80%          | Reliability requirement |
| Critical Path | 90%+         | Zero-defect tolerance   |

Aprender targets 85%+ mutation score for core algorithms.

CI Integration

GitHub Actions Example

mutation-test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install cargo-mutants
      run: cargo install cargo-mutants --locked
    - name: Run mutation tests (sample)
      run: cargo mutants --no-times --shard 1/4 --timeout 300
      continue-on-error: true
    - name: Upload results
      uses: actions/upload-artifact@v4
      with:
        name: mutants-results
        path: mutants.out/

Sharding for Parallelization

# Split across 4 CI jobs
cargo mutants --shard 1/4  # Job 1
cargo mutants --shard 2/4  # Job 2
cargo mutants --shard 3/4  # Job 3
cargo mutants --shard 4/4  # Job 4

Real Example: Fixing a Surviving Mutant

The Surviving Mutant

src/loss/mod.rs:85: replace - with + in cross_entropy ... SURVIVED

The Original Test

#[test]
fn test_cross_entropy() {
    let predictions = vec![0.9, 0.1];
    let targets = vec![1.0, 0.0];
    let loss = cross_entropy(&predictions, &targets);
    assert!(loss > 0.0);  // Too weak!
}

The Fix

#[test]
fn test_cross_entropy_value() {
    let predictions = vec![0.9, 0.1];
    let targets = vec![1.0, 0.0];
    let loss = cross_entropy(&predictions, &targets);

    // Expected: -1.0 * ln(0.9) - 0.0 * ln(0.1) ≈ 0.105
    assert!((loss - 0.105).abs() < 0.01);
}

#[test]
fn test_cross_entropy_increases_with_wrong_prediction() {
    let good_pred = cross_entropy(&[0.9], &[1.0]);
    let bad_pred = cross_entropy(&[0.1], &[1.0]);

    assert!(bad_pred > good_pred);  // Wrong predictions = higher loss
}

Best Practices

  1. Start Small: Run mutants-fast during development
  2. Target High-Risk Code: Focus on algorithms and business logic
  3. Skip Test Code: Don't mutate test files themselves (see the sketch after this list)
  4. Use Timeouts: Prevent infinite loops from stalling CI
  5. Review Survivors: Each surviving mutant is a potential bug
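
On point 3: cargo-mutants skips #[test] functions by default, and individual functions can also be excluded from mutation. The sketch below uses the #[mutants::skip] attribute described in the cargo-mutants documentation, which relies on the small mutants helper crate; treat the exact attribute and crate setup as something to verify for your version:

// Cargo.toml: add the `mutants` helper crate so the attribute compiles;
// cargo-mutants then generates no mutants for this function.
#[mutants::skip]
fn log_shutdown() {
    // Mutating logging/cleanup code rarely reveals bugs and can stall the test run
    eprintln!("shutting down");
}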

Relationship to Other Testing

| Test Type        | What It Measures | Speed  |
|------------------|------------------|--------|
| Unit Tests       | Functionality    | Fast   |
| Property Tests   | Invariants       | Medium |
| Coverage         | Code execution   | Fast   |
| Mutation Testing | Test quality     | Slow   |

Mutation testing is the final arbiter of test suite quality. Use it to validate that your other testing efforts actually catch bugs.

See Also