Grid Search Hyperparameter Tuning

This example demonstrates grid search for finding optimal regularization hyperparameters using cross-validation with Ridge, Lasso, and ElasticNet regression.

Overview

Grid search is a systematic way to find the best hyperparameters by:

  1. Defining a grid of candidate values
  2. Evaluating each combination using cross-validation
  3. Selecting parameters that maximize CV score
  4. Retraining the final model with optimal parameters

Running the Example

cargo run --example grid_search_tuning

Key Concepts

Problem: Default hyperparameters are rarely optimal for your specific dataset

Solution: Systematically search the parameter space to find the best values

Benefits:

  • Automated hyperparameter optimization
  • Cross-validation guards against overfitting to a single validation split
  • Reproducible model selection
  • Better generalization performance

Grid Search Process

  1. Define parameter grid: Range of values to try
  2. K-Fold CV: Split training data into K folds
  3. Evaluate: Train model on K-1 folds, validate on remaining fold
  4. Average scores: Mean performance across all K folds
  5. Select best: Parameters with highest CV score
  6. Final model: Retrain on all training data with best parameters
  7. Test: Evaluate on held-out test set
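
These steps map directly onto the API used in this example (see Implementation Details below). A minimal end-to-end sketch, assuming x_train, y_train, x_test, and y_test already exist from a train/test split (the Ridge import path is an assumption here):

use aprender::model_selection::{grid_search_alpha, KFold};
// NOTE: the Ridge import path below is assumed for this sketch.
use aprender::linear_model::Ridge;

// 1. Define the parameter grid (log scale).
let alphas = vec![0.001, 0.01, 0.1, 1.0, 10.0, 100.0];

// 2. Set up K-fold cross-validation on the training data.
let kfold = KFold::new(5).with_random_state(42);

// 3-5. Evaluate each alpha with CV and keep the best one.
let result = grid_search_alpha("ridge", &alphas, &x_train, &y_train, &kfold, None).unwrap();

// 6. Retrain on the full training set with the best alpha.
let mut model = Ridge::new(result.best_alpha);
model.fit(&x_train, &y_train).unwrap();

// 7. Final evaluation on the held-out test set.
println!("Best α: {}, test R²: {:.4}", result.best_alpha, model.score(&x_test, &y_test));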

Examples Demonstrated

Example 1: Ridge Regression Alpha Tuning

Shows grid search for Ridge regression regularization strength (alpha):

Alpha Grid: [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

Cross-Validation Scores:
  α=0.001   → R²=0.9510
  α=0.010   → R²=0.9510
  α=0.100   → R²=0.9510  ← Best
  α=1.000   → R²=0.9508
  α=10.000  → R²=0.9428
  α=100.000 → R²=0.8920

Best Parameters: α=0.100, CV Score=0.9510
Test Performance: R²=0.9626

Observation: Performance degrades with very large alpha (underfitting).
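
The per-alpha table above can be reproduced from the GridSearchResult returned by grid_search_alpha, which records every alpha tried and its CV score (see Implementation Details). A minimal sketch:

// `result` is the GridSearchResult returned by grid_search_alpha for the Ridge search.
for (alpha, score) in result.alphas.iter().zip(result.scores.iter()) {
    let marker = if *alpha == result.best_alpha { "  ← Best" } else { "" };
    println!("α={:<8.3} → R²={:.4}{}", alpha, score, marker);
}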

Example 2: Lasso Regression Alpha Tuning

Demonstrates grid search for Lasso with feature selection:

Alpha Grid: [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]

Best Parameters: α=1.0000
Test Performance: R²=0.9628
Non-zero coefficients: 5/5 (no coefficients were driven to zero at this alpha on this dataset)

Key Feature: Lasso performs automatic feature selection by driving some coefficients to exactly zero.
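
A rough sketch of how you might check this after tuning; both Lasso::new and the coefficients() accessor are hypothetical names used for illustration and may not match aprender's actual API:

// Hypothetical sketch: count surviving features after fitting Lasso at the tuned alpha.
// `Lasso::new` is assumed to mirror `Ridge::new`; `coefficients()` is a hypothetical accessor.
let mut lasso = Lasso::new(result.best_alpha);
lasso.fit(&x_train, &y_train).unwrap();

let coefs = lasso.coefficients();
let nonzero = coefs.iter().filter(|c| c.abs() > 1e-8).count();
println!("Non-zero coefficients: {}/{}", nonzero, coefs.len());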

Alpha guidelines:

  • Too small: Overfitting (no regularization)
  • Optimal: Balance between fit and complexity
  • Too large: Underfitting (excessive regularization)

Example 3: ElasticNet with L1 Ratio Tuning

Shows 2D grid search over both alpha and l1_ratio:

Searching over:
  α: [0.001, 0.01, 0.1, 1.0, 10.0]
  l1_ratio: [0.25, 0.5, 0.75]

Best Parameters:
  α=1.000, l1_ratio=0.75
  CV Score: 0.9511

l1_ratio Parameter:

  • 0.0: Pure Ridge (L2 only)
  • 0.5: Equal mix of Lasso and Ridge
  • 1.0: Pure Lasso (L1 only)

When to use ElasticNet:

  • Many correlated features (Ridge component)
  • Want feature selection (Lasso component)
  • Best of both regularization types
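
A sketch of the 2D search, running one alpha sweep per l1_ratio via grid_search_alpha; the "elastic_net" model-type string is an assumption for illustration:

let alphas = vec![0.001, 0.01, 0.1, 1.0, 10.0];
let l1_ratios = vec![0.25_f32, 0.5, 0.75];
let kfold = KFold::new(5).with_random_state(42);

// Track the best (score, alpha, l1_ratio) seen across the 2D grid.
let mut best = (f32::NEG_INFINITY, 0.0_f32, 0.0_f32);
for &l1_ratio in &l1_ratios {
    // One 1D alpha search per l1_ratio; "elastic_net" is an assumed model-type string.
    let result = grid_search_alpha("elastic_net", &alphas, &x_train, &y_train, &kfold, Some(l1_ratio)).unwrap();
    if result.best_score > best.0 {
        best = (result.best_score, result.best_alpha, l1_ratio);
    }
}
println!("Best: α={}, l1_ratio={}, CV score={:.4}", best.1, best.2, best.0);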

Example 4: Visualizing Alpha vs Score

Compares Ridge and Lasso performance curves:

     Alpha      Ridge R²      Lasso R²
----------------------------------------
    0.0001        0.9510        0.9510
    0.0010        0.9510        0.9510
    0.0100        0.9510        0.9510
    0.1000        0.9510        0.9510
    1.0000        0.9508        0.9511
   10.0000        0.9428        0.9480
  100.0000        0.8920        0.8998

Observations:

  • Plateau region: Performance stable across small alphas
  • Ridge: Gradual degradation with large alpha
  • Lasso: Sharper drop after optimal point
  • Both: Performance collapses with excessive regularization
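
The comparison table can be produced by running the same alpha grid through grid_search_alpha once per model; the "lasso" model-type string is assumed here by analogy with "ridge":

let alphas = vec![0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0];
let kfold = KFold::new(5).with_random_state(42);

let ridge = grid_search_alpha("ridge", &alphas, &x_train, &y_train, &kfold, None).unwrap();
let lasso = grid_search_alpha("lasso", &alphas, &x_train, &y_train, &kfold, None).unwrap();

// `scores` is parallel to `alphas`, so the two results can be printed side by side.
println!("{:>10} {:>12} {:>12}", "Alpha", "Ridge R²", "Lasso R²");
for i in 0..alphas.len() {
    println!("{:>10.4} {:>12.4} {:>12.4}", alphas[i], ridge.scores[i], lasso.scores[i]);
}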

Example 5: Default vs Optimized Comparison

Demonstrates value of hyperparameter tuning:

Ridge Regression Comparison:

Default (α=1.0):
  Test R²: 0.9628

Grid Search Optimized (α=0.100):
  CV R²:   0.9510
  Test R²: 0.9626

→ Similar performance in this case: grid search confirms that the default alpha is already near-optimal for this dataset

Interpretation:

  • When default is good: Data well-suited to default parameters
  • When improvement significant: Dataset-specific tuning helps
  • Always worth checking: Small cost, potentially large benefit
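
A minimal sketch of this comparison, fitting Ridge at the default α=1.0 and at the grid-search optimum, then scoring both on the same held-out test set:

// Baseline: Ridge with the default regularization strength.
let mut default_model = Ridge::new(1.0);
default_model.fit(&x_train, &y_train).unwrap();

// Tuned: Ridge with the alpha chosen by grid search (`result` from grid_search_alpha).
let mut tuned_model = Ridge::new(result.best_alpha);
tuned_model.fit(&x_train, &y_train).unwrap();

println!("Default   (α=1.0)  test R² = {:.4}", default_model.score(&x_test, &y_test));
println!("Optimized (α={})   test R² = {:.4}", result.best_alpha, tuned_model.score(&x_test, &y_test));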

Implementation Details

Using grid_search_alpha()

use aprender::model_selection::{grid_search_alpha, KFold};

// Define parameter grid
let alphas = vec![0.001, 0.01, 0.1, 1.0, 10.0];

// Setup cross-validation
let kfold = KFold::new(5).with_random_state(42);

// Run grid search
let result = grid_search_alpha(
    "ridge",        // Model type
    &alphas,        // Parameter grid
    &x_train,       // Training features
    &y_train,       // Training targets
    &kfold,         // CV strategy
    None,           // l1_ratio (ElasticNet only)
).unwrap();

// Get best parameters
println!("Best alpha: {}", result.best_alpha);
println!("Best CV score: {}", result.best_score);

// Train final model
let mut model = Ridge::new(result.best_alpha);
model.fit(&x_train, &y_train).unwrap();

GridSearchResult Structure

pub struct GridSearchResult {
    pub best_alpha: f32,       // Optimal alpha value
    pub best_score: f32,       // Best CV score
    pub alphas: Vec<f32>,      // All alphas tried
    pub scores: Vec<f32>,      // Corresponding scores
}

Methods:

  • best_index(): Index of best alpha in grid
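
For example, best_index() can be used to look up the winning entry in the parallel alphas and scores vectors (assuming it returns an index into them):

let i = result.best_index();
println!("Best α {} scored R²={:.4}", result.alphas[i], result.scores[i]);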

Best Practices

1. Define Appropriate Grid

// ✅ Good: Log-scale grid
let alphas = vec![0.001, 0.01, 0.1, 1.0, 10.0, 100.0];

// ❌ Bad: Linear grid missing optimal region
let alphas = vec![1.0, 2.0, 3.0, 4.0, 5.0];

Guideline: Use log-scale for regularization parameters.

2. Sufficient K-Folds

// ✅ Good: 5-10 folds typical
let kfold = KFold::new(5).with_random_state(42);

// ❌ Bad: Too few folds (unreliable estimates)
let kfold = KFold::new(2);

3. Evaluate on Test Set

// ✅ Correct workflow
let (x_train, x_test, y_train, y_test) = train_test_split(...);
let result = grid_search_alpha(..., &x_train, &y_train, ...);
let mut model = Ridge::new(result.best_alpha);
model.fit(&x_train, &y_train).unwrap();
let test_score = model.score(&x_test, &y_test); // Final evaluation

// ❌ Incorrect: Using CV score as final metric
println!("Final performance: {}", result.best_score); // Wrong!

4. Use Random State for Reproducibility

let kfold = KFold::new(5).with_random_state(42);
// Same results every run

Choosing Alpha Ranges

Ridge Regression

  • Start: [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
  • Refine: Zoom in on best region
  • Typical optimal: 0.1 - 10.0

Lasso Regression

  • Start: [0.0001, 0.001, 0.01, 0.1, 1.0]
  • Note: Usually needs smaller alphas than Ridge
  • Typical optimal: 0.001 - 1.0

ElasticNet

  • Alpha: Same as Ridge/Lasso
  • L1 ratio: [0.1, 0.3, 0.5, 0.7, 0.9] or [0.25, 0.5, 0.75]
  • Tip: Start with 3-5 l1_ratio values

Common Pitfalls

  1. Fitting grid search on all data: Always split train/test first
  2. Too fine grid: Computationally expensive, minimal benefit
  3. Ignoring CV variance: High variance suggests unstable model
  4. Overfitting to CV: Test set still needed for final validation
  5. Wrong scale: Linear grid misses optimal regions

Computational Cost

Formula: cost = n_alphas × n_folds × cost_per_fit

Example:

  • 6 alphas
  • 5 folds
  • Total fits: 6 × 5 = 30

Optimization:

  • Start with coarse grid
  • Refine around best region
  • Use fewer folds for very large datasets
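
A sketch of the coarse-then-fine strategy, refining a log-scale grid around the best coarse alpha (the refinement multipliers are an illustrative choice):

let kfold = KFold::new(5).with_random_state(42);

// Pass 1: coarse log-scale grid (6 alphas × 5 folds = 30 fits).
let coarse = vec![0.001, 0.01, 0.1, 1.0, 10.0, 100.0];
let first = grid_search_alpha("ridge", &coarse, &x_train, &y_train, &kfold, None).unwrap();

// Pass 2: finer grid centered on the coarse winner (illustrative multipliers).
let fine: Vec<f32> = [0.25, 0.5, 1.0, 2.0, 4.0]
    .iter()
    .map(|m| first.best_alpha * m)
    .collect();
let second = grid_search_alpha("ridge", &fine, &x_train, &y_train, &kfold, None).unwrap();

println!("Refined α: {}, CV score: {:.4}", second.best_alpha, second.best_score);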

Key Takeaways

  1. Grid search automates hyperparameter optimization
  2. Cross-validation estimates generalization performance without touching the test set
  3. Log-scale grids work best for regularization parameters
  4. Ridge degrades gradually with large alpha; Lasso is more sensitive to the choice of alpha
  5. ElasticNet offers 2D tuning flexibility
  6. Always validate final model on held-out test set
  7. Reproducibility: Use random_state for consistent results
  8. Computational cost scales with grid size and K-folds