Case Study: Linear SVM Iris
This case study demonstrates Linear Support Vector Machine (SVM) classification on the Iris dataset, achieving 100% test accuracy on the binary Setosa-vs-Versicolor problem.
Running the Example
```bash
cargo run --example svm_iris
```
Results Summary
Test Accuracy: 100% (6/6 correct predictions on binary Setosa vs Versicolor)
Comparison with Other Classifiers
| Classifier | Accuracy | Training Time | Prediction Complexity |
|---|---|---|---|
| Linear SVM | 100.0% | <10ms (iterative) | O(p) |
| Naive Bayes | 100.0% | <1ms (instant) | O(p·c) |
| kNN (k=5) | 100.0% | <1ms (lazy) | O(n·p) |
Winner: All three achieve perfect accuracy! Choice depends on:
- SVM: Need margin-based decisions, robust to outliers
- Naive Bayes: Need probabilistic predictions, instant training
- kNN: Need non-parametric approach, local patterns
Decision Function Values
| Sample | True | Predicted | Decision | Margin |
|---|---|---|---|---|
| 0 | 0 | 0 | -1.195 | 1.195 |
| 1 | 0 | 0 | -1.111 | 1.111 |
| 2 | 0 | 0 | -1.105 | 1.105 |
| 3 | 1 | 1 | 0.463 | 0.463 |
| 4 | 1 | 1 | 1.305 | 1.305 |
Interpretation:
- Negative decision: Predicted class 0 (Setosa)
- Positive decision: Predicted class 1 (Versicolor)
- Margin: Absolute decision value; larger means farther from the boundary (more confident)
- All samples correctly classified with good margins
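As a minimal sketch (using the `decisions` vector produced by `decision_function` in the Implementation section below), the predicted label is just the sign of the decision value:

```rust
// Class 1 for non-negative decision values, class 0 otherwise;
// the absolute value serves as the margin/confidence.
let predicted: Vec<i32> = decisions
    .iter()
    .map(|&d| if d >= 0.0 { 1 } else { 0 })
    .collect();
```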
Regularization Effect (C Parameter)
| C Value | Accuracy | Behavior |
|---|---|---|
| 0.01 | 50.0% | Over-regularized (too simple) |
| 0.10 | 100.0% | Good regularization |
| 1.00 (default) | 100.0% | Balanced |
| 10.00 | 100.0% | Fits data closely |
| 100.00 | 100.0% | Minimal regularization |
Insight: every C in [0.1, 100] achieves 100% accuracy, showing:
- Robust: Wide range of good C values
- Well-separated: Iris species have distinct features
- Warning: C=0.01 too restrictive (underfits)
Per-Class Performance
| Species | Correct | Total | Accuracy |
|---|---|---|---|
| Setosa | 3/3 | 3 | 100.0% |
| Versicolor | 3/3 | 3 | 100.0% |
Both classes classified perfectly.
Why SVM Excels Here
- Linearly separable: Setosa and Versicolor are well separated in feature space
- Maximum margin: SVM finds optimal decision boundary
- Robust: Soft margin (C parameter) handles outliers
- Simple problem: Binary classification easier than multi-class
- Clean data: Iris dataset has low noise
Implementation
```rust
use aprender::classification::LinearSVM;
use aprender::primitives::Matrix;

// Load binary data (Setosa vs Versicolor)
let (x_train, y_train, x_test, y_test) = load_binary_iris_data()?;

// Train Linear SVM
let mut svm = LinearSVM::new()
    .with_c(1.0)              // Regularization
    .with_max_iter(1000)      // Convergence
    .with_learning_rate(0.1); // Step size

svm.fit(&x_train, &y_train)?;

// Predict
let predictions = svm.predict(&x_test)?;
let decisions = svm.decision_function(&x_test)?;

// Evaluate
let accuracy = compute_accuracy(&predictions, &y_test);
println!("Accuracy: {:.1}%", accuracy * 100.0);
```
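For completeness, a minimal version of the `compute_accuracy` helper used above; the example file defines its own, and the `i32` label type here is an assumption:

```rust
// Fraction of predictions that match the true labels (assumed i32 labels).
fn compute_accuracy(predicted: &[i32], actual: &[i32]) -> f64 {
    let correct = predicted.iter().zip(actual).filter(|(p, a)| p == a).count();
    correct as f64 / actual.len() as f64
}
```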
Key Insights
Advantages Demonstrated
✓ 100% accuracy on test set
✓ Fast prediction (O(p) per sample)
✓ Robust regularization (wide C range works)
✓ Maximum margin decision boundary
✓ Interpretable decision function values
When Linear SVM Wins
- Linearly separable classes
- Need margin-based decisions
- Want robust outlier handling
- High-dimensional data (p >> n)
- Binary classification problems
When to Use Alternatives
- Naive Bayes: Need instant training, probabilistic output
- kNN: Non-linear boundaries, local patterns important
- Logistic Regression: Need calibrated probabilities
- Kernel SVM: Non-linear decision boundaries required
Algorithm Details
Training Process
- Initialize: w = 0, b = 0
- Iterate: subgradient descent for up to 1000 epochs
- Update rule (per sample; see the sketch after the objective below):
  - If yᵢ(w·xᵢ + b) < 1: update both w and b from the hinge-loss subgradient
  - Else: apply only the regularization shrinkage to w
- Converge: stop early once the weight change drops below the tolerance
Optimization Objective
minimize over w, b:   λ||w||² + (1/n) Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)),   with labels yᵢ ∈ {−1, +1}

The first term is the regularization penalty; the second is the average hinge loss. Here λ plays the inverse role of C: larger C means weaker regularization. For intuition, a point with decision value 0.463 and label y = +1 (sample 3 above) incurs hinge loss max(0, 1 − 0.463) = 0.537: correctly classified, but still inside the margin.
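Below is a minimal sketch of one training epoch implementing this objective and the update rule above. It assumes labels stored as -1.0/+1.0 and plain slices for features; it illustrates the algorithm, not aprender's exact internals:

```rust
// One epoch of per-sample subgradient descent on the soft-margin objective.
// `w`/`b` are the model parameters, `lambda` the regularization strength,
// and `lr` the learning rate; labels must be -1.0 or +1.0.
fn subgradient_epoch(w: &mut [f64], b: &mut f64, xs: &[Vec<f64>], ys: &[f64], lambda: f64, lr: f64) {
    for (xi, &yi) in xs.iter().zip(ys) {
        // Functional margin y_i * (w . x_i + b)
        let score: f64 = w.iter().zip(xi).map(|(wj, xj)| wj * xj).sum::<f64>() + *b;
        if yi * score < 1.0 {
            // Hinge loss is active: step along the subgradient of both terms
            for (wj, &xj) in w.iter_mut().zip(xi) {
                *wj -= lr * (2.0 * lambda * *wj - yi * xj);
            }
            *b += lr * yi;
        } else {
            // Margin satisfied: only the regularizer shrinks w
            for wj in w.iter_mut() {
                *wj -= lr * 2.0 * lambda * *wj;
            }
        }
    }
}
```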
Hyperparameters
- C = 1.0: Regularization strength (balanced)
- learning_rate = 0.1: Step size for gradient descent
- max_iter = 1000: Maximum epochs (training often converges earlier)
- tol = 1e-4: Convergence tolerance
Performance Analysis
Complexity
- Training: O(n·p·iters); here 14 × 4 × 1000 = 56K operations
- Prediction: O(m·p); here 6 × 4 = 24 operations
- Memory: O(p); here 4 values for the weight vector
Training Time
- Linear SVM: <10ms (subgradient descent)
- Naive Bayes: <1ms (closed-form solution)
- kNN: <1ms (lazy learning, no training)
Prediction Time
- Linear SVM: O(p) - Very fast, constant per sample
- Naive Bayes: O(p·c) - Fast, scales with classes
- kNN: O(n·p) - Slower, scales with training size
Comparison: SVM vs Naive Bayes vs kNN
Accuracy
All achieve 100% on this well-separated binary problem.
Decision Mechanism
- SVM: Maximum margin hyperplane (w·x + b = 0)
- Naive Bayes: Bayes' theorem with Gaussian likelihoods
- kNN: Local majority vote from k neighbors
Regularization
- SVM: C parameter (controls margin/complexity trade-off)
- Naive Bayes: Variance smoothing (prevents division by zero)
- kNN: k parameter (controls local region size)
Output Type
- SVM: Decision values (signed distance from hyperplane)
- Naive Bayes: Probabilities (well-calibrated when the feature-independence assumption holds)
- kNN: Probabilities (vote proportions, less calibrated)
Best Use Case
- SVM: High-dimensional, linearly separable, need margins
- Naive Bayes: Small data, need probabilities, instant training
- kNN: Non-linear, local patterns, non-parametric
Related Examples
- examples/naive_bayes_iris.rs - Gaussian Naive Bayes comparison
- examples/knn_iris.rs - kNN comparison
- book/src/ml-fundamentals/svm.md - SVM theory
Further Exploration
Try Different C Values
```rust
for c in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0] {
    let mut svm = LinearSVM::new().with_c(c);
    svm.fit(&x_train, &y_train)?;
    // Compare accuracy (and margin sizes) across C values
    let accuracy = compute_accuracy(&svm.predict(&x_test)?, &y_test);
    println!("C = {:>7.3}: accuracy = {:.1}%", c, accuracy * 100.0);
}
```
Visualize Decision Boundary
Plot the hyperplane w·x + b = 0 in 2D feature space (e.g., petal_length vs petal_width).
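A minimal sketch of computing points on that line after training on the two petal features; the `weights()`/`bias()` accessors and the feature range are assumptions (check aprender's actual API):

```rust
// For a 2-feature model, w1*x1 + w2*x2 + b = 0 rearranges to
// x2 = -(w1*x1 + b) / w2. weights()/bias() are assumed accessors.
let w = svm.weights();
let b = svm.bias();
for i in 0..=10 {
    let x1 = 1.0 + 0.6 * i as f32; // sweep an assumed petal_length range
    let x2 = -(w[0] * x1 + b) / w[1];
    println!("boundary point: ({x1:.2}, {x2:.2})");
}
```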
Multi-Class Extension
Implement One-vs-Rest to handle all 3 Iris species:
```rust
// Train 3 binary classifiers:
// - Setosa vs (Versicolor, Virginica)
// - Versicolor vs (Setosa, Virginica)
// - Virginica vs (Setosa, Versicolor)
// Predict using argmax of decision functions
```
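A sketch of that scheme built on the API from this example; the `relabel` helper is hypothetical, and it is assumed that `decision_function` returns one indexable score per test sample:

```rust
// One-vs-rest: train one binary LinearSVM per species.
let mut models = Vec::new();
for class in 0..3 {
    // Hypothetical helper: label = 1 for `class`, 0 for everything else.
    let y_binary = relabel(&y_train, class);
    let mut svm = LinearSVM::new().with_c(1.0);
    svm.fit(&x_train, &y_binary)?;
    models.push(svm);
}

// Score every test sample with each model, then pick the most
// confident classifier (argmax over decision values).
let scores: Vec<_> = models
    .iter()
    .map(|m| m.decision_function(&x_test))
    .collect::<Result<Vec<_>, _>>()?;
let predicted: Vec<usize> = (0..scores[0].len())
    .map(|i| {
        (0..models.len())
            .max_by(|&a, &b| scores[a][i].partial_cmp(&scores[b][i]).unwrap())
            .unwrap()
    })
    .collect();
```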
Add Kernel Functions
Extend to non-linear boundaries with RBF kernel:
K(x, x') = exp(-γ||x - x'||²)
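For reference, a direct translation of that formula as a standalone helper (not part of aprender's current API):

```rust
// RBF (Gaussian) kernel: similarity decays with squared distance;
// gamma controls how quickly it decays.
fn rbf_kernel(a: &[f64], b: &[f64], gamma: f64) -> f64 {
    let sq_dist: f64 = a.iter().zip(b).map(|(ai, bi)| (ai - bi).powi(2)).sum();
    (-gamma * sq_dist).exp()
}
```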