# Case Study: Batch Optimization
This example demonstrates batch optimization algorithms for minimizing smooth, differentiable objective functions using gradient and Hessian information.
## Overview
Batch optimization algorithms process the entire dataset at once (as opposed to stochastic/mini-batch methods). This example covers three powerful batch methods:
- L-BFGS: Limited-memory BFGS (quasi-Newton method)
- Conjugate Gradient: CG with multiple β formulas
- Damped Newton: Newton's method with a finite-difference Hessian and step damping
## Test Functions
The examples use classic optimization test functions:
### Rosenbrock Function
f(x,y) = (1-x)² + 100(y-x²)²
Global minimum at (1, 1). Features a narrow, curved valley making it challenging for optimizers.
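For reference, a minimal Rust sketch of this function and its analytic gradient (the function names are illustrative, not the example's API):

```rust
// Rosenbrock function and its analytic gradient (illustrative sketch).
fn rosenbrock(p: &[f64]) -> f64 {
    let (x, y) = (p[0], p[1]);
    (1.0 - x).powi(2) + 100.0 * (y - x * x).powi(2)
}

fn rosenbrock_grad(p: &[f64]) -> Vec<f64> {
    let (x, y) = (p[0], p[1]);
    vec![
        -2.0 * (1.0 - x) - 400.0 * x * (y - x * x), // df/dx
        200.0 * (y - x * x),                        // df/dy
    ]
}
```

At the minimum (1, 1) both the function value and the gradient are exactly zero.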
### Sphere Function
f(x) = Σ x_i²
Convex quadratic with the global minimum at the origin. An easy test case: all optimizers should converge quickly.
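A corresponding sketch for arbitrary dimension (again, names are illustrative):

```rust
// Sphere function and its gradient in any dimension (illustrative sketch).
fn sphere(x: &[f64]) -> f64 {
    x.iter().map(|xi| xi * xi).sum()
}

fn sphere_grad(x: &[f64]) -> Vec<f64> {
    x.iter().map(|xi| 2.0 * xi).collect()
}
```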
### Booth Function
f(x,y) = (x + 2y - 7)² + (2x + y - 5)²
Global minimum at (1, 3) with f(1, 3) = 0.
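A sketch of the Booth function with its gradient, obtained by the chain rule on the two residuals (names are illustrative):

```rust
// Booth function and its analytic gradient (illustrative sketch).
fn booth(p: &[f64]) -> f64 {
    let (x, y) = (p[0], p[1]);
    (x + 2.0 * y - 7.0).powi(2) + (2.0 * x + y - 5.0).powi(2)
}

fn booth_grad(p: &[f64]) -> Vec<f64> {
    let (x, y) = (p[0], p[1]);
    let (r1, r2) = (x + 2.0 * y - 7.0, 2.0 * x + y - 5.0); // the two residuals
    vec![2.0 * r1 + 4.0 * r2, 4.0 * r1 + 2.0 * r2]
}
```

At (1, 3) both residuals vanish, so the value and the gradient are zero.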
## Examples Covered
### 1. Rosenbrock Function with Different Optimizers
Compares L-BFGS, Conjugate Gradient (Polak-Ribière and Fletcher-Reeves), and Damped Newton on the challenging Rosenbrock function.
### 2. Sphere Function (5D)
Tests all optimizers on a simple convex quadratic to verify correct implementation and fast convergence.
### 3. Booth Function
Demonstrates convergence on a moderately difficult quadratic problem.
### 4. Convergence Comparison
Runs optimizers from different initial points to analyze convergence behavior and robustness.
### 5. Optimizer Configuration
Shows how to configure (see the sketch after this list):
- L-BFGS history size (m)
- CG periodic restart
- Damped Newton finite difference epsilon
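The exact configuration API is defined in the example itself; the struct and field names below are hypothetical and only illustrate the kinds of knobs involved:

```rust
// Hypothetical configuration types -- names and values are illustrative
// only and do not reflect the example's actual API.
struct LbfgsConfig {
    history_size: usize, // m: number of stored (s, y) curvature pairs
}

struct CgConfig {
    restart_every: usize, // reset to steepest descent every n iterations
}

struct DampedNewtonConfig {
    fd_epsilon: f64, // step used for finite-difference Hessian entries
}

fn main() {
    let _lbfgs = LbfgsConfig { history_size: 10 };
    let _cg = CgConfig { restart_every: 5 }; // e.g. the problem dimension n
    let _newton = DampedNewtonConfig { fd_epsilon: 1e-6 };
}
```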
## Key Insights
### L-BFGS
- Memory: Stores the m most recent step/gradient-difference pairs (s, y) (typically m = 10)
- Convergence: Linear in theory but fast in practice on smooth functions (full BFGS is superlinear)
- Use case: General-purpose, large-scale optimization
- Cost: O(mn) per iteration (see the two-loop recursion sketch below)
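The O(mn) cost comes from the standard two-loop recursion over the stored pairs. A self-contained sketch (illustrative; not the example's actual code), where `s_hist` and `y_hist` hold the step and gradient differences, oldest first:

```rust
// Two-loop recursion: computes the search direction -H_k * grad from the
// m most recent (s, y) pairs. Illustrative sketch only.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn lbfgs_direction(grad: &[f64], s_hist: &[Vec<f64>], y_hist: &[Vec<f64>]) -> Vec<f64> {
    let m = s_hist.len();
    let mut q = grad.to_vec();
    let mut alpha = vec![0.0; m];

    // First loop: newest pair to oldest.
    for i in (0..m).rev() {
        let rho = 1.0 / dot(&y_hist[i], &s_hist[i]);
        alpha[i] = rho * dot(&s_hist[i], &q);
        for (qj, yj) in q.iter_mut().zip(&y_hist[i]) {
            *qj -= alpha[i] * yj;
        }
    }

    // Scale by gamma = s'y / y'y from the most recent pair (initial Hessian guess).
    if m > 0 {
        let gamma = dot(&s_hist[m - 1], &y_hist[m - 1]) / dot(&y_hist[m - 1], &y_hist[m - 1]);
        q.iter_mut().for_each(|qj| *qj *= gamma);
    }

    // Second loop: oldest pair to newest.
    for i in 0..m {
        let rho = 1.0 / dot(&y_hist[i], &s_hist[i]);
        let beta = rho * dot(&y_hist[i], &q);
        for (qj, sj) in q.iter_mut().zip(&s_hist[i]) {
            *qj += (alpha[i] - beta) * sj;
        }
    }

    q.iter_mut().for_each(|qj| *qj = -*qj); // negate to get a descent direction
    q
}
```

The resulting direction is then combined with a line search to produce the actual step.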
### Conjugate Gradient
- Formulas: Polak-Ribière, Fletcher-Reeves, Hestenes-Stiefel (sketched below)
- Memory: O(n) only (no history storage)
- Convergence: Finite termination (at most n steps) on quadratics in exact arithmetic; typically linear, and can stall on general non-quadratic problems
- Use case: When memory is limited or the Hessian is expensive
- Tip: Periodic restart (every n iterations) helps on non-quadratic problems
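The β formulas differ only in how the new gradient, old gradient, and previous direction are combined in the update d_{k+1} = -g_{k+1} + β·d_k; a restart simply sets β = 0. A sketch with illustrative names:

```rust
// Beta formulas for nonlinear CG. Setting beta = 0 restarts with a pure
// steepest-descent step. Illustrative sketch, not the example's code.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Polak-Ribiere (clipped at zero, the common "PR+" variant).
fn beta_pr(g_new: &[f64], g_old: &[f64]) -> f64 {
    ((dot(g_new, g_new) - dot(g_new, g_old)) / dot(g_old, g_old)).max(0.0)
}

/// Fletcher-Reeves.
fn beta_fr(g_new: &[f64], g_old: &[f64]) -> f64 {
    dot(g_new, g_new) / dot(g_old, g_old)
}

/// Hestenes-Stiefel.
fn beta_hs(g_new: &[f64], g_old: &[f64], d: &[f64]) -> f64 {
    let y: Vec<f64> = g_new.iter().zip(g_old).map(|(a, b)| a - b).collect();
    dot(g_new, &y) / dot(d, &y)
}
```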
### Damped Newton
- Hessian: Approximated via finite differences (see the sketch below)
- Convergence: Quadratic near the minimum (fastest locally)
- Use case: High-accuracy solutions, few variables
- Cost: O(n²) Hessian approximation per iteration, plus the linear solve for the Newton step
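A minimal sketch of one way to build the finite-difference Hessian (central differences on the gradient, one column per coordinate); the damping itself, e.g. adding λI before solving for the Newton step, is omitted. Names are illustrative:

```rust
// Central-difference Hessian approximation: column j holds d(grad)/dx_j.
// Costs two gradient evaluations per coordinate (O(n) gradient calls,
// O(n^2) entries). Illustrative sketch, not the example's implementation.
fn fd_hessian(grad: impl Fn(&[f64]) -> Vec<f64>, x: &[f64], eps: f64) -> Vec<Vec<f64>> {
    let n = x.len();
    let mut h = vec![vec![0.0; n]; n];
    for j in 0..n {
        let mut x_plus = x.to_vec();
        let mut x_minus = x.to_vec();
        x_plus[j] += eps;
        x_minus[j] -= eps;
        let (g_plus, g_minus) = (grad(x_plus.as_slice()), grad(x_minus.as_slice()));
        for i in 0..n {
            h[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * eps);
        }
    }
    h
}
```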
## Convergence Comparison
| Method | Rosenbrock Iters | Sphere Iters | Memory |
|---|---|---|---|
| L-BFGS | ~40-60 | ~10-15 | O(mn) |
| CG-PR | ~80-120 | ~5-10 | O(n) |
| CG-FR | ~100-150 | ~8-12 | O(n) |
| Damped Newton | ~20-30 | ~3-5 | O(n²) |
## Running the Example
`cargo run --example batch_optimization`
The example runs all test functions with all optimizers, displaying:
- Convergence status
- Iteration count
- Final solution
- Objective value
- Gradient norm
- Elapsed time
## Optimization Tips
- L-BFGS is the default choice for most smooth optimization problems
- Use CG when memory is constrained (large n)
- Use Damped Newton for high accuracy on smaller problems
- Always try multiple starting points to reduce the risk of landing in a poor local minimum
- Monitor the gradient norm; it should decrease to near zero at the optimum (see the sketch below)
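The last two tips can be folded into a tiny driver: run the chosen optimizer from several starting points, keep the best result, and confirm the gradient norm is small. A sketch in which the `minimize` closure stands in for whichever optimizer is used (all names are illustrative):

```rust
// Multi-start wrapper and gradient-norm check. Illustrative sketch;
// `minimize` is a stand-in closure for any of the optimizers above.
fn grad_norm(g: &[f64]) -> f64 {
    g.iter().map(|gi| gi * gi).sum::<f64>().sqrt()
}

fn multi_start(
    f: impl Fn(&[f64]) -> f64,
    minimize: impl Fn(Vec<f64>) -> Vec<f64>,
    starts: Vec<Vec<f64>>,
) -> Vec<f64> {
    starts
        .into_iter()
        .map(minimize)
        .min_by(|a, b| {
            f(a.as_slice())
                .partial_cmp(&f(b.as_slice()))
                .expect("objective returned NaN")
        })
        .expect("at least one starting point")
}
```

A small `grad_norm` of the gradient at the returned point is the sanity check the last tip refers to.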
## Code Location
See `examples/batch_optimization.rs` for the full implementation.