Case Study: APR 100-Point Quality Scoring

This example demonstrates the comprehensive model quality scoring system that evaluates models across six dimensions based on ML best practices and Toyota Way principles.

Overview

The scoring system provides a standardized 100-point quality assessment:

| Dimension                    | Max Points | Toyota Way Principle               |
|------------------------------|------------|------------------------------------|
| Accuracy & Performance       | 25         | Kaizen (continuous improvement)    |
| Generalization & Robustness  | 20         | Jidoka (quality built-in)          |
| Model Complexity             | 15         | Muda elimination (waste reduction) |
| Documentation & Provenance   | 15         | Genchi Genbutsu (go and see)       |
| Reproducibility              | 15         | Standardization                    |
| Security & Safety            | 10         | Poka-yoke (error-proofing)         |
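Internally, the six capped dimension scores sum to the 100-point total. A minimal self-contained sketch of that aggregation (the struct and field names are hypothetical, not aprender's internals):

```rust
// Hypothetical sketch of the six scoring dimensions and their caps.
// Field names are illustrative; they do not mirror aprender's internals.
struct DimensionScores {
    accuracy: f64,        // capped at 25
    generalization: f64,  // capped at 20
    complexity: f64,      // capped at 15
    documentation: f64,   // capped at 15
    reproducibility: f64, // capped at 15
    security: f64,        // capped at 10
}

impl DimensionScores {
    /// Clamp each dimension to its cap, then sum to a 0-100 total.
    fn total(&self) -> f64 {
        self.accuracy.min(25.0)
            + self.generalization.min(20.0)
            + self.complexity.min(15.0)
            + self.documentation.min(15.0)
            + self.reproducibility.min(15.0)
            + self.security.min(10.0)
    }
}

fn main() {
    let perfect = DimensionScores {
        accuracy: 25.0,
        generalization: 20.0,
        complexity: 15.0,
        documentation: 15.0,
        reproducibility: 15.0,
        security: 10.0,
    };
    println!("Max total: {:.1}", perfect.total()); // 100.0
}
```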

Running the Example

```sh
cargo run --example apr_scoring
```

Grade System

| Grade | Score Range | Passing |
|-------|-------------|---------|
| A+    | 97-100      | Yes     |
| A     | 93-96       | Yes     |
| A-    | 90-92       | Yes     |
| B+    | 87-89       | Yes     |
| B     | 83-86       | Yes     |
| B-    | 80-82       | Yes     |
| C+    | 77-79       | Yes     |
| C     | 73-76       | Yes     |
| C-    | 70-72       | Yes     |
| D     | 60-69       | No      |
| F     | <60         | No      |
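The grade boundaries above reduce to a simple threshold lookup, with C- (70 points) as the passing floor. A self-contained sketch (function names are hypothetical; the real mapping lives in the scoring module):

```rust
// Hypothetical score-to-grade mapping following the table above.
// Not aprender's actual API; boundaries copied from the grade table.
fn grade_for(score: f64) -> &'static str {
    match score {
        s if s >= 97.0 => "A+",
        s if s >= 93.0 => "A",
        s if s >= 90.0 => "A-",
        s if s >= 87.0 => "B+",
        s if s >= 83.0 => "B",
        s if s >= 80.0 => "B-",
        s if s >= 77.0 => "C+",
        s if s >= 73.0 => "C",
        s if s >= 70.0 => "C-",
        s if s >= 60.0 => "D",
        _ => "F",
    }
}

/// A grade of C- (70 points) or better passes.
fn is_passing(score: f64) -> bool {
    score >= 70.0
}

fn main() {
    let score = 95.0;
    println!("{}: grade {}, passing: {}", score, grade_for(score), is_passing(score));
}
```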

Model Types and Metrics

Each model type has specific scoring criteria:

```rust
let types = [
    ScoredModelType::LinearRegression,      // Primary: R2, needs regularization
    ScoredModelType::LogisticRegression,    // Primary: accuracy
    ScoredModelType::DecisionTree,          // High interpretability
    ScoredModelType::RandomForest,          // Ensemble, lower interpretability
    ScoredModelType::GradientBoosting,      // Ensemble, needs tuning
    ScoredModelType::Knn,                   // Instance-based
    ScoredModelType::KMeans,                // Clustering
    ScoredModelType::NaiveBayes,            // Probabilistic
    ScoredModelType::NeuralSequential,      // Deep learning
    ScoredModelType::Svm,                   // Kernel methods
];

// Each type has:
println!("Interpretability: {:.1}", model_type.interpretability_score());
println!("Primary Metric: {}", model_type.primary_metric());
println!("Acceptable Threshold: {:.2}", model_type.acceptable_threshold());
println!("Needs Regularization: {}", model_type.needs_regularization());
```
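To make the idea of per-type criteria concrete, here is a self-contained sketch for a few of the types. The enum, values, and method bodies below are a simplified stand-in chosen for illustration only; the authoritative values come from `ScoredModelType` itself:

```rust
// Illustrative per-type scoring criteria. This enum is a simplified
// stand-in for aprender's ScoredModelType, not its real API.
#[derive(Clone, Copy, Debug)]
enum ModelKind {
    LinearRegression,
    RandomForest,
    NeuralSequential,
}

impl ModelKind {
    /// Name of the primary metric used for the accuracy dimension.
    fn primary_metric(self) -> &'static str {
        match self {
            ModelKind::LinearRegression => "r2_score",
            ModelKind::RandomForest | ModelKind::NeuralSequential => "accuracy",
        }
    }

    /// Whether the type is expected to carry regularization settings.
    fn needs_regularization(self) -> bool {
        matches!(
            self,
            ModelKind::LinearRegression | ModelKind::NeuralSequential
        )
    }
}

fn main() {
    for kind in [
        ModelKind::LinearRegression,
        ModelKind::RandomForest,
        ModelKind::NeuralSequential,
    ] {
        println!(
            "{:?}: metric={}, regularization={}",
            kind,
            kind.primary_metric(),
            kind.needs_regularization()
        );
    }
}
```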

Scoring a Model

Minimal Metadata

```rust
let mut metadata = ModelMetadata {
    model_name: Some("BasicModel".to_string()),
    model_type: Some(ScoredModelType::LinearRegression),
    ..Default::default()
};
metadata.metrics.insert("r2_score".to_string(), 0.85);

let config = ScoringConfig::default();
let score = compute_quality_score(&metadata, &config);

println!("Total: {:.1}/100 (Grade: {})", score.total, score.grade);
```

Comprehensive Metadata

```rust
let mut metadata = ModelMetadata {
    model_name: Some("IrisRandomForest".to_string()),
    description: Some("Random Forest classifier for Iris".to_string()),
    model_type: Some(ScoredModelType::RandomForest),
    n_parameters: Some(5000),
    aprender_version: Some("0.15.0".to_string()),
    training: Some(TrainingInfo {
        source: Some("iris_dataset.csv".to_string()),
        n_samples: Some(150),
        n_features: Some(4),
        duration_ms: Some(2500),
        random_seed: Some(42),
        test_size: Some(0.2),
    }),
    flags: ModelFlags {
        has_model_card: true,
        is_signed: true,
        is_encrypted: false,
        has_feature_importance: true,
        has_edge_case_tests: true,
        has_preprocessing_steps: true,
    },
    ..Default::default()
};

// Add metrics
metadata.metrics.insert("accuracy".to_string(), 0.967);
metadata.metrics.insert("cv_score_mean".to_string(), 0.953);
metadata.metrics.insert("cv_score_std".to_string(), 0.025);
metadata.metrics.insert("train_score".to_string(), 0.985);
metadata.metrics.insert("test_score".to_string(), 0.967);
```
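Two of the robustness signals these metrics feed (cross-validation stability and the train/test gap) reduce to simple arithmetic. A self-contained sketch using the metric values from this example; the 0.05 thresholds and function names are illustrative, not aprender's:

```rust
// Illustrative robustness checks over the metrics above. The threshold
// values and function names are hypothetical, not aprender's API.

/// CV scores are considered stable when their std stays under a cap.
fn cv_is_stable(cv_std: f64, max_cv_std: f64) -> bool {
    cv_std <= max_cv_std
}

/// Overfitting gap: how far the train score exceeds the test score.
fn overfitting_gap(train_score: f64, test_score: f64) -> f64 {
    (train_score - test_score).max(0.0)
}

fn main() {
    // Values from the comprehensive-metadata example above.
    let stable = cv_is_stable(0.025, 0.05);
    let gap = overfitting_gap(0.985, 0.967);
    println!("cv stable: {stable}, train/test gap: {gap:.3}");
}
```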

Security Detection

The scoring system detects security issues:

```rust
// Model with leaked secrets
let mut bad_metadata = ModelMetadata::default();
bad_metadata.custom.insert("api_key".to_string(), "sk-secret123".to_string());
bad_metadata.custom.insert("password".to_string(), "admin123".to_string());

let config = ScoringConfig {
    require_signed: true,
    require_model_card: true,
    ..Default::default()
};

let score = compute_quality_score(&bad_metadata, &config);
println!("Critical Issues: {}", score.critical_issues.len());
```

Critical Issues Detected

  • Leaked API keys or passwords in metadata
  • Missing required signatures
  • Missing model cards in production
  • Excessive train/test gap (overfitting)
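The leaked-secret check can be approximated by scanning the custom metadata map for suspicious key names. A minimal self-contained sketch (the key list and function name are hypothetical; aprender's detector may be more thorough):

```rust
use std::collections::HashMap;

// Hypothetical secret scan over custom metadata keys. The suspicious-key
// list is illustrative; a real detector may cover more patterns.
fn find_leaked_secrets(custom: &HashMap<String, String>) -> Vec<String> {
    const SUSPICIOUS: [&str; 4] = ["api_key", "password", "secret", "token"];
    custom
        .keys()
        .filter(|k| {
            let lower = k.to_lowercase();
            SUSPICIOUS.iter().any(|s| lower.contains(s))
        })
        .cloned()
        .collect()
}

fn main() {
    let mut custom = HashMap::new();
    custom.insert("api_key".to_string(), "sk-secret123".to_string());
    custom.insert("password".to_string(), "admin123".to_string());
    custom.insert("author".to_string(), "alice".to_string());
    // "author" is benign; the other two keys should be flagged.
    let leaks = find_leaked_secrets(&custom);
    println!("{} suspicious keys found", leaks.len()); // 2
}
```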

Scoring Configuration

```rust
// Default config
let default_config = ScoringConfig::default();

// Strict config for production
let strict_config = ScoringConfig {
    min_primary_metric: 0.9,    // Require a primary metric of at least 0.90
    max_cv_std: 0.05,           // Max CV standard deviation
    max_train_test_gap: 0.05,   // Max overfitting tolerance
    require_signed: true,       // Require model signature
    require_model_card: true,   // Require documentation
    ..Default::default()
};
```

Source Code

  • Example: examples/apr_scoring.rs
  • Module: src/scoring/mod.rs