Mutation Testing
Mutation testing validates the quality of your tests by deliberately introducing bugs (“mutations”) into your code and checking if your tests catch them. pforge targets a ≥90% mutation kill rate using cargo-mutants, ensuring our 115 tests are actually effective.
The Problem Mutation Testing Solves
You can have 100% test coverage and still have ineffective tests:
// Production code
pub fn validate_config(config: &ForgeConfig) -> Result<()> {
if config.tools.is_empty() {
return Err(ConfigError::EmptyTools);
}
Ok(())
}
// Test with 100% line coverage but zero assertions
#[test]
fn test_validate_config() {
let config = create_valid_config();
validate_config(&config); // ❌ No assertion! Test passes even if code is broken
}
Coverage says: ✅ 100% line coverage
Reality: This test catches nothing!
Mutation testing finds these weak tests by mutating code and seeing if tests fail.
How Mutation Testing Works
- Baseline: Run all tests → they should pass
- Mutate: Change code in a specific way (e.g., change == to !=)
- Test: Run tests again
- Result:
  - Tests fail → Mutation killed ✅ (good test!)
  - Tests pass → Mutation survived ❌ (weak test!)
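The loop above can be imitated in a few lines of Rust. This is a toy sketch, not how cargo-mutants works internally: the mutants are written by hand as separate functions, and `exceeds`, the `mutant_*` functions, and the two suites are hypothetical names invented for illustration.

```rust
// Illustrative sketch only: mutants are hand-written functions here,
// whereas a real tool such as cargo-mutants rewrites your source code.

/// The "production" function under test (hypothetical).
fn exceeds(count: u32, threshold: u32) -> bool {
    count > threshold
}

// Hand-written mutants of `exceeds`.
fn mutant_ge(count: u32, threshold: u32) -> bool {
    count >= threshold
}
fn mutant_lt(count: u32, threshold: u32) -> bool {
    count < threshold
}
fn mutant_always_true(_count: u32, _threshold: u32) -> bool {
    true
}

type Impl = fn(u32, u32) -> bool;

/// A weak "test suite": only checks one clearly-above case.
fn weak_suite(f: Impl) -> bool {
    f(10, 5)
}

/// A stronger suite: checks above, below, and exactly at the boundary.
fn strong_suite(f: Impl) -> bool {
    f(6, 5) && !f(4, 5) && !f(5, 5)
}

/// Runs `suite` against every mutant and returns (killed, survived).
/// A mutant is killed when the suite fails against it.
fn run_mutants(suite: fn(Impl) -> bool, mutants: &[Impl]) -> (usize, usize) {
    let killed = mutants.iter().filter(|&&m| !suite(m)).count();
    (killed, mutants.len() - killed)
}

fn main() {
    let mutants: [Impl; 3] = [mutant_ge, mutant_lt, mutant_always_true];

    // Baseline: both suites must pass against the original code.
    assert!(weak_suite(exceeds) && strong_suite(exceeds));

    let (killed, survived) = run_mutants(weak_suite, &mutants);
    println!("weak suite:   {} killed, {} survived", killed, survived);

    let (killed, survived) = run_mutants(strong_suite, &mutants);
    println!("strong suite: {} killed, {} survived", killed, survived);
}
```

In this sketch the weak suite kills only `mutant_lt`, while the strong suite kills all three mutants, because it also asserts on the below-threshold and boundary cases.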
Example Mutation
// Original code
pub fn has_handler(&self, name: &str) -> bool {
self.handlers.contains_key(name) // Original
}
// Mutation 1: Change return value
pub fn has_handler(&self, name: &str) -> bool {
!self.handlers.contains_key(name) // Mutated: inverted logic
}
// Mutation 2: Change to always return true
pub fn has_handler(&self, name: &str) -> bool {
true // Mutated: constant return
}
// Mutation 3: Change to always return false
pub fn has_handler(&self, name: &str) -> bool {
false // Mutated: constant return
}
Good test (catches all mutations):
#[test]
fn test_has_handler() {
let mut registry = HandlerRegistry::new();
// Should return false for non-existent handler
assert!(!registry.has_handler("nonexistent")); // Kills mutations 1 & 2
registry.register("test", TestHandler);
// Should return true for registered handler
assert!(registry.has_handler("test")); // Kills mutations 1 & 3
}
Weak test (mutations survive):
#[test]
fn test_has_handler_weak() {
let mut registry = HandlerRegistry::new();
registry.register("test", TestHandler);
// Only tests positive case - mutation 2 (always true) survives!
assert!(registry.has_handler("test"));
}
Setting Up cargo-mutants
Installation
cargo install cargo-mutants
Basic Usage
# Run mutation testing
cargo mutants
# Run on specific crate
cargo mutants -p pforge-runtime
# Run on specific file
cargo mutants --file crates/pforge-runtime/src/registry.rs
# Show what would be mutated without running tests
cargo mutants --list
Configuration
Create .cargo/mutants.toml:
# Timeout per mutant, in seconds (here, 5 minutes)
timeout = 300
# Exclude certain patterns
exclude_globs = [
"**/tests/**",
"**/*_test.rs",
]
# Additional arguments passed to cargo test
additional_cargo_test_args = ["--release"]
Common Mutations
cargo-mutants applies various mutation operators:
1. Replace Function Return Values
// Original
fn get_count(&self) -> usize {
self.handlers.len()
}
// Mutations
fn get_count(&self) -> usize { 0 } // Always 0
fn get_count(&self) -> usize { 1 } // Always 1
fn get_count(&self) -> usize { usize::MAX } // Max value
Test that kills:
#[test]
fn test_get_count() {
let mut registry = HandlerRegistry::new();
assert_eq!(registry.get_count(), 0); // Kills non-zero mutations
registry.register("test", TestHandler);
assert_eq!(registry.get_count(), 1); // Kills 0 and MAX mutations
}
2. Negate Boolean Conditions
// Original
if config.tools.is_empty() {
return Err(ConfigError::EmptyTools);
}
// Mutation
if !config.tools.is_empty() { // Inverted!
return Err(ConfigError::EmptyTools);
}
Test that kills:
#[test]
fn test_validation_rejects_empty_tools() {
let config = create_config_with_no_tools();
assert!(validate_config(&config).is_err()); // Catches inversion
}
#[test]
fn test_validation_accepts_valid_tools() {
let config = create_config_with_tools();
assert!(validate_config(&config).is_ok()); // Also needed!
}
3. Change Comparison Operators
// Original
if count > threshold {
// ...
}
// Mutations
if count >= threshold { } // Change > to >=
if count < threshold { } // Change > to <
if count == threshold { } // Change > to ==
if count != threshold { } // Change > to !=
Test that kills:
#[test]
fn test_threshold_boundary() {
assert!(!exceeds_threshold(5, 5)); // count == threshold
assert!(!exceeds_threshold(4, 5)); // count < threshold
assert!(exceeds_threshold(6, 5)); // count > threshold
}
4. Delete Statements
// Original
fn process(&mut self) -> Result<()> {
self.validate()?; // Original
self.execute()
}
// Mutation: Delete validation
fn process(&mut self) -> Result<()> {
// self.validate()?; // Deleted!
self.execute()
}
Test that kills:
#[test]
fn test_process_validates_before_executing() {
let mut processor = create_invalid_processor();
// Should fail during validation
assert!(processor.process().is_err());
}
5. Replace Binary Operators
// Original
let sum = a + b;
// Mutations
let sum = a - b; // + → -
let sum = a * b; // + → *
let sum = a / b; // + → /
pforge Mutation Testing Strategy
Target: 90% Kill Rate
Mutation Score = (Killed Mutants / Total Mutants) × 100%
pforge target: ≥ 90%
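As a worked instance of the formula, here is a minimal sketch that computes the score and compares it with the 90% target. The function name `mutation_score` and the constant `TARGET_PERCENT` are illustrative assumptions, not part of pforge or cargo-mutants; the counts are the ones reported in this chapter's example run.

```rust
/// Mutation score as a percentage: killed / total * 100.
/// Returns `None` when no mutants were tested, to avoid dividing by zero.
fn mutation_score(killed: u32, total: u32) -> Option<f64> {
    if total == 0 {
        return None;
    }
    Some(f64::from(killed) * 100.0 / f64::from(total))
}

/// The target stated in this chapter (hypothetical constant name).
const TARGET_PERCENT: f64 = 90.0;

fn main() {
    // Counts taken from this chapter's example run: 117 caught out of 127 tested.
    let (killed, total) = (117, 127);
    match mutation_score(killed, total) {
        Some(score) => {
            let verdict = if score >= TARGET_PERCENT { "meets" } else { "is below" };
            println!(
                "{}/{} = {:.1}%, which {} the {}% target",
                killed, total, score, verdict, TARGET_PERCENT
            );
        }
        None => println!("no mutants were tested"),
    }
}
```

Returning an `Option` rather than a bare number is a design choice of this sketch: a run with zero mutants has no meaningful score, and the caller is forced to handle that case explicitly.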
Running Mutation Tests
# Full mutation test suite
make mutants
# Or manually
cargo mutants --jobs 8
Example Run Output
Testing mutants:
crates/pforge-runtime/src/registry.rs:114:5: replace HandlerRegistry::new -> HandlerRegistry with Default::default()
CAUGHT in 0.2s
crates/pforge-runtime/src/registry.rs:121:9: replace <impl HandlerRegistry>::register -> () with ()
CAUGHT in 0.3s
crates/pforge-config/src/validator.rs:9:20: replace <impl>::validate -> Result<()> with Ok(())
CAUGHT in 0.2s
crates/pforge-config/src/validator.rs:15:16: replace != with ==
CAUGHT in 0.1s
Summary:
Tested: 127 mutants
Caught: 117 mutants (92.1%)
Missed: 8 mutants (6.3%)
Timeout: 2 mutants (1.6%)
Interpreting Results
- Caught: ✅ Test suite detected the mutation (good!)
- Missed: ❌ Test suite didn’t detect mutation (add test!)
- Timeout: ⚠️ Test took too long (possibly infinite loop)
- Unviable: Mutation wouldn’t compile (ignored)
Improving Kill Rate
Strategy 1: Test Both Branches
// Code with branch
fn validate(&self) -> Result<()> {
if self.is_valid() {
Ok(())
} else {
Err(Error::Invalid)
}
}
// Weak: Only tests one branch
#[test]
fn test_validate_success() {
let validator = create_valid();
assert!(validator.validate().is_ok());
}
// Strong: Tests both branches
#[test]
fn test_validate_success() {
let validator = create_valid();
assert!(validator.validate().is_ok());
}
#[test]
fn test_validate_failure() {
let validator = create_invalid();
assert!(validator.validate().is_err());
}
Strategy 2: Test Boundary Conditions
// Code with comparison
fn is_large(&self) -> bool {
self.size > 100
}
// Weak: Only tests middle of range
#[test]
fn test_is_large() {
assert!(Item { size: 150 }.is_large());
assert!(!Item { size: 50 }.is_large());
}
// Strong: Tests boundary
#[test]
fn test_is_large_boundary() {
assert!(!Item { size: 100 }.is_large()); // Exactly at boundary
assert!(!Item { size: 99 }.is_large()); // Just below
assert!(Item { size: 101 }.is_large()); // Just above
}
Strategy 3: Test Return Values
// Code
fn get_status(&self) -> Status {
if self.is_ready() {
Status::Ready
} else {
Status::NotReady
}
}
// Weak: No assertion on return value
#[test]
fn test_get_status() {
let item = Item::new();
item.get_status(); // ❌ Doesn't assert anything!
}
// Strong: Asserts actual vs expected
#[test]
fn test_get_status_ready() {
let item = create_ready_item();
assert_eq!(item.get_status(), Status::Ready);
}
#[test]
fn test_get_status_not_ready() {
let item = create_not_ready_item();
assert_eq!(item.get_status(), Status::NotReady);
}
Strategy 4: Test Error Cases
// Code
fn parse(input: &str) -> Result<Config> {
if input.is_empty() {
return Err(Error::EmptyInput);
}
// ... parse logic
Ok(config)
}
// Weak: Only tests success
#[test]
fn test_parse_success() {
let result = parse("valid config");
assert!(result.is_ok());
}
// Strong: Tests both success and error
#[test]
fn test_parse_success() {
let result = parse("valid config");
assert!(result.is_ok());
}
#[test]
fn test_parse_empty_input() {
let result = parse("");
assert!(matches!(result.unwrap_err(), Error::EmptyInput));
}
Real pforge Mutation Test Results
Before Mutation Testing
Initial run showed 82% kill rate with 23 surviving mutants:
Survived mutations:
1. validator.rs:25 - Changed `contains_key` to always return true
2. registry.rs:142 - Removed error handling
3. config.rs:18 - Changed `is_empty()` to `!is_empty()`
...
After Adding Tests
// Added test for mutation 1
#[test]
fn test_duplicate_detection_both_cases() {
// Tests that contains_key is actually checked
let mut seen = HashSet::new();
assert!(!seen.contains("key")); // Not present
seen.insert("key");
assert!(seen.contains("key")); // Present
}
// Added test for mutation 2
#[test]
fn test_error_propagation() {
let result = fallible_function();
assert!(result.is_err());
match result.unwrap_err() {
Error::Expected => {}, // Verify specific error
_ => panic!("Wrong error type"),
}
}
// Added test for mutation 3
#[test]
fn test_empty_check() {
let empty = Vec::<String>::new();
assert!(is_empty_error(&empty).is_err()); // Empty case
let nonempty = vec!["item".to_string()];
assert!(is_empty_error(&nonempty).is_ok()); // Non-empty case
}
Final Result
Summary:
Tested: 127 mutants
Caught: 117 mutants (92.1%) ✅
Missed: 8 mutants (6.3%)
Timeout: 2 mutants (1.6%)
Mutation score: 92.1% (TARGET: ≥90%)
Acceptable Mutations
Some mutations are acceptable to miss:
1. Logging Statements
// Original
fn process(&self) {
log::debug!("Processing item");
// ... actual logic
}
// Mutation: Delete log statement
fn process(&self) {
// log::debug!("Processing item"); // Deleted
// ... actual logic
}
Acceptable: Tests shouldn’t depend on logging.
2. Performance Optimizations
// Original
fn calculate(&self) -> i32 {
self.cached_value.unwrap_or_else(|| expensive_calculation())
}
// Mutation: Always calculate
fn calculate(&self) -> i32 {
expensive_calculation() // Remove cache
}
Acceptable: Result is same, just slower.
3. Error Messages
// Original
return Err(Error::Invalid("Field 'name' is required".to_string()));
// Mutation
return Err(Error::Invalid("".to_string()));
Acceptable if: Test only checks error variant, not message.
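The following sketch shows why the message mutation survives a variant-only assertion and what it would take to kill it. The `Error` enum, `check_name`, and the hand-written `check_name_mutant` are hypothetical stand-ins written for this example; they are not pforge code.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    Invalid(String),
}

/// Original behaviour: descriptive message.
fn check_name(name: &str) -> Result<(), Error> {
    if name.is_empty() {
        return Err(Error::Invalid("Field 'name' is required".to_string()));
    }
    Ok(())
}

/// Hand-written mutant: same variant, empty message.
fn check_name_mutant(name: &str) -> Result<(), Error> {
    if name.is_empty() {
        return Err(Error::Invalid(String::new()));
    }
    Ok(())
}

type Check = fn(&str) -> Result<(), Error>;

/// Asserts only on the variant, so the message mutant survives.
fn variant_only_test(check: Check) -> bool {
    matches!(check(""), Err(Error::Invalid(_)))
}

/// Asserts on the message too, so the message mutant is killed.
fn message_test(check: Check) -> bool {
    check("") == Err(Error::Invalid("Field 'name' is required".to_string()))
}

fn main() {
    println!(
        "variant-only test passes on mutant: {}",
        variant_only_test(check_name_mutant)
    );
    println!(
        "message test passes on mutant:      {}",
        message_test(check_name_mutant)
    );
}
```

Whether the stricter assertion is worth it depends on whether the message is part of the contract. If callers or users rely on the wording, asserting on it is reasonable; otherwise the surviving mutant can be accepted.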
Integration with CI/CD
GitHub Actions
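# .github/workflows/mutation.yml
name: Mutation Testing
on:
  pull_request:
  schedule:
    - cron: '0 0 * * 0' # Weekly
jobs:
  mutants:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v3
      - name: Install cargo-mutants
        run: cargo install cargo-mutants
      - name: Run mutation tests
        # cargo-mutants exits non-zero when mutants survive; the next step enforces the threshold
        run: cargo mutants --jobs 4 || true
      - name: Check mutation score
        run: |
          # Results are written to mutants.out/, one mutant per line in each file
          CAUGHT=$(wc -l < mutants.out/caught.txt)
          MISSED=$(wc -l < mutants.out/missed.txt)
          TIMEOUT=$(wc -l < mutants.out/timeout.txt)
          TOTAL=$((CAUGHT + MISSED + TIMEOUT))
          if [ "$TOTAL" -eq 0 ]; then echo "No mutants were tested"; exit 1; fi
          SCORE=$(echo "scale=1; 100 * $CAUGHT / $TOTAL" | bc -l)
          if (( $(echo "$SCORE < 90" | bc -l) )); then
            echo "Mutation score $SCORE% below target 90%"
            exit 1
          fi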
Local Pre-Push Hook
#!/bin/bash
# .git/hooks/pre-push
echo "Running mutation tests..."
cargo mutants --jobs 8 || {
echo "❌ Mutation testing failed"
echo "Fix tests or accept surviving mutants"
exit 1
}
echo "✅ Mutation testing passed"
Performance Optimization
Mutation testing is slow. Optimize:
1. Parallel Execution
# Test several mutants in parallel (here, one job per core)
cargo mutants --jobs "$(nproc)"
2. Incremental Testing
# Only test changed files
cargo mutants --file src/changed_file.rs
3. Shorter Timeouts
# Set 60 second timeout per mutant
cargo mutants --timeout=60
4. Baseline Filtering
# Skip mutants in tests
cargo mutants --exclude '**/tests/**'
Mutation Testing Best Practices
1. Run Regularly, Not Every Commit
# Weekly in CI, or before releases
make mutants # Part of quality gate
2. Focus on Critical Code
# Prioritize high-value files
cargo mutants --file src/runtime/registry.rs
cargo mutants --file src/config/validator.rs
3. Track Metrics Over Time
# Save mutation results (cargo-mutants writes them to mutants.out/)
cargo mutants
cp mutants.out/outcomes.json mutation-report.json
4. Don’t Aim for 100%
90% is excellent. Diminishing returns above that:
- 90%: ✅ Excellent test quality
- 95%: ⚠️ Very good, some effort
- 100%: ❌ Not worth the effort
5. Use with Other Metrics
Mutation testing + coverage + complexity:
make quality-gate # Runs all quality checks
Limitations
- Slow: Can take 10-60 minutes for large codebases
- False positives: Some mutations are semantically equivalent to the original code, so no test can catch them and they are reported as missed
- Not exhaustive: Can’t test all possible bugs
- Requires good tests: Mutation testing validates tests, not code
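To make the equivalent-mutant limitation concrete, here is a small sketch of a mutation that no test can kill. `sum_first` and its hand-written mutant are hypothetical functions invented for this example; the mutant swaps `<` for `!=` in a loop condition.

```rust
/// Original: sums the first `n` elements using `i < n` as the loop condition.
fn sum_first(values: &[i32], n: usize) -> i32 {
    let n = n.min(values.len());
    let mut total = 0;
    let mut i = 0;
    while i < n {
        total += values[i];
        i += 1;
    }
    total
}

/// Hand-written mutant: `<` replaced with `!=`.
/// Because `i` starts at 0 and increases by exactly 1, it reaches `n`
/// without skipping over it, so both loops stop at the same point.
fn sum_first_mutant(values: &[i32], n: usize) -> i32 {
    let n = n.min(values.len());
    let mut total = 0;
    let mut i = 0;
    while i != n {
        total += values[i];
        i += 1;
    }
    total
}

fn main() {
    let values = [1, 2, 3, 4, 5];
    // Compare original and mutant for every prefix length, including
    // lengths past the end of the slice.
    let all_equal = (0..=values.len() + 1)
        .all(|n| sum_first(&values, n) == sum_first_mutant(&values, n));
    println!("original and mutant agree on every input tried: {}", all_equal);
}
```

A surviving mutant like this one does not indicate a weak test. It is a candidate for the accepted list rather than a reason to add assertions.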
Summary
Mutation testing is the ultimate validation of test quality:
- Purpose: Validate that tests actually catch bugs
- Target: ≥90% mutation kill rate
- Tool: cargo-mutants
- Integration: Weekly CI runs, pre-release checks
- Benefit: Confidence that tests are effective
Mutation Testing in Context
| Metric | What it measures | pforge target |
|---|---|---|
| Line coverage | Lines executed | ≥80% |
| Mutation score | Test effectiveness | ≥90% |
| Complexity | Code simplicity | ≤20 |
| TDG | Technical debt | ≥0.75 |
All four metrics together ensure comprehensive quality.
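One way to picture the four targets working together is a single check that reports every metric that misses its threshold. The sketch below is an assumption about how such a gate could be written; `QualityMetrics` and `failures` are hypothetical names, not pforge's actual quality-gate implementation, and the sample values are placeholders.

```rust
/// Hypothetical snapshot of the four metrics; not pforge's real data model.
struct QualityMetrics {
    line_coverage_percent: f64,
    mutation_score_percent: f64,
    max_complexity: u32,
    tdg: f64,
}

impl QualityMetrics {
    /// Returns the names of the metrics that miss the targets in the table.
    fn failures(&self) -> Vec<&'static str> {
        let mut failed = Vec::new();
        if self.line_coverage_percent < 80.0 {
            failed.push("line coverage");
        }
        if self.mutation_score_percent < 90.0 {
            failed.push("mutation score");
        }
        if self.max_complexity > 20 {
            failed.push("complexity");
        }
        if self.tdg < 0.75 {
            failed.push("TDG");
        }
        failed
    }
}

fn main() {
    // Placeholder values chosen for illustration.
    let metrics = QualityMetrics {
        line_coverage_percent: 85.0,
        mutation_score_percent: 92.0,
        max_complexity: 20,
        tdg: 0.82,
    };
    let failed = metrics.failures();
    if failed.is_empty() {
        println!("quality gate passed");
    } else {
        println!("quality gate failed: {}", failed.join(", "));
    }
}
```

Collecting all failures instead of stopping at the first one means a single run shows everything that needs attention.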
The Complete Testing Picture
pforge’s multi-layered testing strategy:
- Unit tests (Chapter 9.1): Fast, focused component tests
- Integration tests (Chapter 9.2): Cross-component workflows
- Property tests (Chapter 9.3): Automated edge case discovery
- Mutation tests (Chapter 9.4): Validate test effectiveness
Result: 115 high-quality tests that provide genuine confidence in pforge’s reliability.
Quality Metrics
115 total tests
├── 74 unit tests (<1ms each)
├── 26 integration tests (<100ms each)
├── 12 property tests (10K cases each = 120K total)
└── Validated by mutation testing (92% kill rate)
Coverage: 85% lines, 78% branches
Complexity: All functions ≤20
Mutation score: 92%
TDG: 0.82
This comprehensive approach ensures pforge maintains production-ready quality while enabling rapid, confident development through strict TDD discipline.
Further Reading
- cargo-mutants documentation
- PIT Mutation Testing - Java mutation testing
- pforge mutation config: .cargo/mutants.toml