Data Quality Pipeline
Demonstrates the full data quality pipeline for fine-tuning (GH-453):
- PII filtering -- detection and redaction of emails, IP addresses, and other identifiers
- Quality scoring and filtering -- length, diversity, repetition, structure
- EvolKit-style instruction evolution -- complexity enhancement
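The first two stages listed above can be sketched with the standard library alone. This is a minimal illustration and not the code from the example or from crates/. The function names (`looks_like_email`, `redact_pii`, `quality_score`), the `[EMAIL]` and `[IP]` placeholders, the 20-word length target, and the 0.5 threshold are all assumptions made for this sketch. Only emails and IPv4 addresses are handled, and the score covers length and diversity only. Instruction evolution is omitted because it depends on a language model.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Hypothetical helper: true if `token` has the rough shape local@domain.tld.
fn looks_like_email(token: &str) -> bool {
    match token.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !domain.contains('@')
        }
        None => false,
    }
}

/// Replaces whitespace-separated tokens that look like emails or IPv4
/// addresses with placeholders. Runs of whitespace collapse to one space.
fn redact_pii(text: &str) -> String {
    text.split_whitespace()
        .map(|raw| {
            // Trim surrounding punctuation so "alice@example.com," still matches.
            let token = raw.trim_matches(|c: char| !c.is_alphanumeric());
            if looks_like_email(token) {
                raw.replace(token, "[EMAIL]")
            } else if token.parse::<Ipv4Addr>().is_ok() {
                raw.replace(token, "[IP]")
            } else {
                raw.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Scores text in [0, 1] as the mean of lexical diversity (unique-word
/// ratio) and a length term that saturates at an assumed 20 words.
fn quality_score(text: &str) -> f64 {
    let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();
    if words.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = words.iter().map(String::as_str).collect();
    let diversity = unique.len() as f64 / words.len() as f64;
    let length = (words.len() as f64 / 20.0).min(1.0);
    0.5 * diversity + 0.5 * length
}

fn main() {
    let samples = [
        "Contact alice@example.com, from 10.0.0.1.",
        "spam spam spam spam",
    ];
    for sample in samples {
        let cleaned = redact_pii(sample);
        let score = quality_score(&cleaned);
        // Assumed threshold: keep samples scoring at least 0.5.
        let verdict = if score >= 0.5 { "keep" } else { "drop" };
        println!("{verdict} ({score:.2}): {cleaned}");
    }
}
```

Redacting before scoring means placeholders, not raw identifiers, contribute to the diversity term. A real pipeline would likely add repetition and structure checks, as the list above mentions.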
Run
cargo run --example data_quality_pipeline
Source
// Run this example:
// cargo run --example data_quality_pipeline
//
// See the CLI reference and source code in crates/ for implementation details.