Logistic Regression Theory

Chapter Status: ✅ 100% Working (All examples verified)

Status              Count   Examples
✅ Working           5+      All verified by tests + SafeTensors
⏳ In Progress       0       -
⬜ Not Implemented   0       -

Last tested: 2025-11-19
Aprender version: 0.3.0
Test file: src/classification/mod.rs (core tests + SafeTensors tests)


Overview

Logistic regression is the foundation of binary classification. Despite its name, it's a classification algorithm that predicts probabilities using the logistic (sigmoid) function.

Key Concepts:

  • Sigmoid function: Maps any real value to a probability in (0, 1)
  • Binary classification: Predict class 0 or 1
  • Gradient descent: Iterative optimization (no closed-form)

Why This Matters: Logistic regression powers countless applications: spam detection, medical diagnosis, credit scoring. It's interpretable, fast, and surprisingly effective.


Mathematical Foundation

The Sigmoid Function

The sigmoid (logistic) function squashes any real number into the open interval (0, 1):

σ(z) = 1 / (1 + e^(-z))

Properties:

  • σ(0) = 0.5 (decision boundary)
  • σ(+∞) → 1 (high confidence for class 1)
  • σ(-∞) → 0 (high confidence for class 0)
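
A quick way to check these properties is a standalone sketch (plain Rust, independent of the aprender API):

fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

fn main() {
    assert!((sigmoid(0.0) - 0.5).abs() < 1e-12); // σ(0) = 0.5: the decision boundary
    assert!(sigmoid(10.0) > 0.999);              // large positive z → probability near 1
    assert!(sigmoid(-10.0) < 0.001);             // large negative z → probability near 0
}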

Logistic Regression Model

For input x and coefficients β:

P(y=1|x) = σ(β·x + intercept)
         = 1 / (1 + e^(-(β·x + intercept)))

Decision Rule: Predict class 1 if P(y=1|x) ≥ 0.5, else class 0
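
End to end, prediction is a dot product, a sigmoid, and a threshold. A minimal sketch for one sample (plain Rust, not the aprender internals):

/// Predict class 0 or 1 for one sample given coefficients and intercept.
fn predict_class(beta: &[f64], x: &[f64], intercept: f64) -> u8 {
    // Linear score: z = β·x + intercept
    let z: f64 = beta.iter().zip(x).map(|(b, xi)| b * xi).sum::<f64>() + intercept;
    // Probability: p = σ(z)
    let p = 1.0 / (1.0 + (-z).exp());
    // Decision rule: class 1 if P(y=1|x) ≥ 0.5
    if p >= 0.5 { 1 } else { 0 }
}

Since σ(z) ≥ 0.5 exactly when z ≥ 0, the decision boundary is the hyperplane β·x + intercept = 0.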

Training: Gradient Descent

Unlike linear regression, logistic regression has no closed-form solution for its coefficients. We use gradient descent to minimize the binary cross-entropy loss:

Loss = -[y log(p) + (1-y) log(1-p)]

Where p = σ(β·x + intercept) is the predicted probability.
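
The gradient has a clean form: for each sample, the derivative of the loss with respect to the linear score z is simply p − y, so each coefficient's gradient is (p − y)·x_j averaged over the batch. A self-contained sketch of one batch update (the underlying math, not aprender's actual internals):

/// One batch gradient-descent step for logistic regression.
/// x: n samples of m features each; y: n labels in {0, 1}; lr: learning rate.
fn gd_step(beta: &mut [f64], intercept: &mut f64, x: &[Vec<f64>], y: &[f64], lr: f64) {
    let n = x.len() as f64;
    let m = beta.len();
    let mut grad_beta = vec![0.0; m];
    let mut grad_intercept = 0.0;
    for (xi, &yi) in x.iter().zip(y) {
        let z: f64 = beta.iter().zip(xi).map(|(b, v)| b * v).sum::<f64>() + *intercept;
        let p = 1.0 / (1.0 + (-z).exp()); // predicted probability
        let err = p - yi;                 // dLoss/dz = p − y
        for j in 0..m {
            grad_beta[j] += err * xi[j];
        }
        grad_intercept += err;
    }
    // Average over the batch and take one step downhill
    for j in 0..m {
        beta[j] -= lr * grad_beta[j] / n;
    }
    *intercept -= lr * grad_intercept / n;
}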

Test Reference: Implementation uses gradient descent in src/classification/mod.rs


Implementation in Aprender

Example 1: Binary Classification

use aprender::classification::LogisticRegression;
use aprender::primitives::{Matrix, Vector};

// Binary classification data (linearly separable)
let x = Matrix::from_vec(4, 2, vec![
    1.0, 1.0,  // Class 0
    1.0, 2.0,  // Class 0
    3.0, 3.0,  // Class 1
    3.0, 4.0,  // Class 1
]).unwrap();
let y = Vector::from_vec(vec![0.0, 0.0, 1.0, 1.0]);

// Train with gradient descent
let mut model = LogisticRegression::new()
    .with_learning_rate(0.1)
    .with_max_iter(1000)
    .with_tol(1e-4);

model.fit(&x, &y).unwrap();

// Predict probabilities
let x_test = Matrix::from_vec(1, 2, vec![2.0, 2.5]).unwrap();
let proba = model.predict_proba(&x_test);
println!("P(class=1) = {:.3}", proba[0]); // e.g., 0.612

Test Reference: src/classification/mod.rs::tests::test_logistic_regression_fit

Example 2: Model Serialization (SafeTensors)

Logistic regression models can be saved and loaded:

// Save model
model.save_safetensors("model.safetensors").unwrap();

// Load model (in production environment)
let loaded = LogisticRegression::load_safetensors("model.safetensors").unwrap();

// Predictions match exactly
let proba_original = model.predict_proba(&x_test);
let proba_loaded = loaded.predict_proba(&x_test);
assert_eq!(proba_original[0], proba_loaded[0]); // Exact match

Why This Matters: SafeTensors format is compatible with HuggingFace, PyTorch, TensorFlow, enabling cross-platform ML pipelines.

Test Reference: src/classification/mod.rs::tests::test_save_load_safetensors_roundtrip

Case Study: See Case Study: Logistic Regression for complete SafeTensors implementation (281 lines)


Verification Through Tests

Logistic regression has comprehensive test coverage:

Core Functionality Tests:

  • Fitting on linearly separable data
  • Probability predictions in [0, 1]
  • Decision boundary at 0.5 threshold

SafeTensors Tests (5 tests):

  • Unfitted model error handling
  • Save/load roundtrip
  • Corrupted file handling
  • Missing file error
  • Probability preservation (critical for classification)

All tests passing ensures production readiness.
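
To illustrate the shape of such a test, here is a hedged sketch reusing the Example 1 data (make_separable_data is a hypothetical helper, the actual test names live in src/classification/mod.rs, and we assume the returned probability vector exposes iter()):

#[test]
fn probabilities_are_valid() {
    let (x, y) = make_separable_data(); // hypothetical helper building the Example 1 matrices
    let mut model = LogisticRegression::new().with_max_iter(1000);
    model.fit(&x, &y).unwrap();

    // Every predicted probability must lie in [0, 1]
    for p in model.predict_proba(&x).iter() {
        assert!((0.0..=1.0).contains(p));
    }
}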


Practical Considerations

When to Use Logistic Regression

  • Good for:

    • Binary classification (2 classes)
    • Interpretable coefficients (feature importance)
    • Probability estimates needed
    • Linearly separable data
  • Not good for:

    • Non-linear decision boundaries (use kernels or neural nets)
    • Multi-class classification (use softmax regression)
    • Imbalanced classes without adjustment

Performance Characteristics

  • Time Complexity: O(n·m·iter) for n samples, m features, and iter gradient-descent iterations (typically 100-1000)
  • Space Complexity: O(n·m)
  • Convergence: Usually fast (< 1000 iterations)

Common Pitfalls

  1. Unscaled Features:

    • Problem: Features with different scales slow convergence
    • Solution: Use StandardScaler before training (see the sketch after this list)
  2. Non-convergence:

    • Problem: Learning rate too high → oscillation
    • Solution: Reduce learning_rate or increase max_iter
  3. Assuming Linearity:

    • Problem: Non-linear boundaries → poor accuracy
    • Solution: Add polynomial features or use kernel methods
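
For pitfall 1, the workflow looks roughly like this (a sketch assuming a StandardScaler type with a fit_transform method; the import path is hypothetical, so check aprender's preprocessing module for the exact API):

use aprender::preprocessing::StandardScaler; // hypothetical import path

// Standardize features to zero mean and unit variance, then fit
let mut scaler = StandardScaler::new();
let x_scaled = scaler.fit_transform(&x).unwrap();

let mut model = LogisticRegression::new().with_learning_rate(0.1);
model.fit(&x_scaled, &y).unwrap();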

Comparison with Alternatives

Approach             Pros                                         Cons                                              When to Use
Logistic Regression  Interpretable; fast training; probabilities  Linear boundaries only; gradient descent needed   Interpretable binary classification
SVM                  Non-linear kernels; max-margin               No probabilities; slow on large data              Non-linear boundaries
Decision Trees       Non-linear; no feature scaling needed        Prone to overfitting; unstable                    Quick baseline

Real-World Application

Case Study Reference: See Case Study: Logistic Regression for:

  • Complete SafeTensors implementation (281 lines)
  • RED-GREEN-REFACTOR workflow
  • 5 comprehensive tests
  • Production deployment example (aprender → realizar)

Key Insight: SafeTensors enables cross-platform ML. Train in Rust, deploy anywhere (Python, C++, WASM).


Further Reading

Peer-Reviewed Paper

Cox, D. R. (1958). The Regression Analysis of Binary Sequences. Journal of the Royal Statistical Society, Series B, 20(2), 215-242.

  • Relevance: Original paper introducing logistic regression
  • Link: JSTOR
  • Key Contribution: Maximum likelihood estimation for binary outcomes
  • Applied in: src/classification/mod.rs

Summary

What You Learned:

  • ✅ Sigmoid function: σ(z) = 1/(1 + e^(-z))
  • ✅ Binary classification via probability thresholding
  • ✅ Gradient descent training (no closed-form)
  • ✅ SafeTensors serialization for production

Verification Guarantee: All logistic regression code is extensively tested (10+ tests) including SafeTensors roundtrip. See case study for complete implementation.

Test Summary:

  • 5+ core tests (fitting, predictions, probabilities)
  • 5 SafeTensors tests (serialization, errors)
  • 100% passing rate

Next Chapter: Decision Trees Theory

Previous Chapter: Regularization Theory

REQUIRED: Read Case Study: Logistic Regression for SafeTensors implementation