Naive Bayes

Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem with the "naive" assumption of feature independence. Despite this strong assumption, Naive Bayes classifiers are remarkably effective in practice, especially for text classification.

Bayes' Theorem

The foundation of Naive Bayes is Bayes' theorem:

P(y|X) = P(X|y) * P(y) / P(X)

Where:

  • P(y|X): Posterior probability (probability of class y given features X)
  • P(X|y): Likelihood (probability of features X given class y)
  • P(y): Prior probability (probability of class y)
  • P(X): Evidence (probability of features X); it is the same for every class, so it can be dropped when comparing posteriors

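As a quick worked example with made-up numbers: suppose 20% of emails are spam (P(spam) = 0.2), and the word "free" appears in 60% of spam but only 5% of legitimate mail. Then:

P("free") = 0.6 * 0.2 + 0.05 * 0.8 = 0.16
P(spam|"free") = (0.6 * 0.2) / 0.16 = 0.75

Observing "free" raises the spam probability from the 20% prior to a 75% posterior.
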
The Naive Assumption

Naive Bayes assumes conditional independence between features:

P(X|y) = P(x₁|y) * P(x₂|y) * ... * P(xₚ|y)

This simplifies estimation dramatically: instead of modeling the joint distribution over all feature combinations, only one univariate distribution per feature per class needs to be estimated, reducing the parameter count from exponential to linear in the number of features.

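To make the saving concrete with an illustrative count: for p = 20 binary features, modeling P(X|y) directly would require roughly 2²⁰ − 1 ≈ 1,000,000 probabilities per class (one for each feature combination), whereas under the independence assumption only 20 per-feature estimates per class are needed.
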
Gaussian Naive Bayes

Assumes features follow a Gaussian (normal) distribution within each class.

Training

For each class c and feature i (a plain-Rust sketch follows the list):

  1. Compute mean: μᵢ,c = mean(xᵢ where y=c)
  2. Compute variance: σ²ᵢ,c = var(xᵢ where y=c)
  3. Compute prior: P(y=c) = count(y=c) / n

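The following is a minimal sketch of these three statistics in plain Rust. It is illustrative only, not Aprender's internal implementation, and it assumes x is stored as one Vec<f32> row per sample:

// Illustrative training sketch: per-class means, variances, and priors.
// Assumes every class in 0..n_classes appears at least once in y.
fn fit_gaussian_nb(
    x: &[Vec<f32>],
    y: &[usize],
    n_classes: usize,
) -> (Vec<Vec<f32>>, Vec<Vec<f32>>, Vec<f32>) {
    let p = x[0].len();
    let mut means = vec![vec![0.0f32; p]; n_classes];
    let mut vars = vec![vec![0.0f32; p]; n_classes];
    let mut counts = vec![0.0f32; n_classes];

    // First pass: per-class feature sums and class counts.
    for (row, &c) in x.iter().zip(y) {
        counts[c] += 1.0;
        for i in 0..p {
            means[c][i] += row[i];
        }
    }
    for c in 0..n_classes {
        for i in 0..p {
            means[c][i] /= counts[c]; // sums -> means
        }
    }
    // Second pass: per-class population variances.
    for (row, &c) in x.iter().zip(y) {
        for i in 0..p {
            let d = row[i] - means[c][i];
            vars[c][i] += d * d;
        }
    }
    for c in 0..n_classes {
        for i in 0..p {
            vars[c][i] /= counts[c];
        }
    }
    // Priors: class frequency in the training set.
    let n = x.len() as f32;
    let priors: Vec<f32> = counts.iter().map(|&k| k / n).collect();
    (means, vars, priors)
}
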
Prediction

For each class c:

log P(y=c|X) = log P(y=c) + Σᵢ log P(xᵢ|y=c) + const

where P(xᵢ|y=c) is the Gaussian density N(μᵢ,c, σ²ᵢ,c) evaluated at xᵢ, and const = −log P(X) is identical for every class, so it can be ignored.

Return the class with the highest log posterior. Logs are used because multiplying many small densities would underflow in floating point; summing their logarithms is numerically stable. A plain-Rust sketch of this rule is shown below.

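A matching sketch of the decision rule (again illustrative, not Aprender's internals), using the Gaussian log-density log N(x; μ, σ²) = -0.5 * log(2πσ²) - (x - μ)² / (2σ²):

use std::f32::consts::PI;

// Log of the Gaussian PDF; working in logs avoids underflow.
fn log_gaussian_pdf(x: f32, mean: f32, var: f32) -> f32 {
    -0.5 * (2.0 * PI * var).ln() - (x - mean).powi(2) / (2.0 * var)
}

// Return the class with the highest (unnormalized) log posterior.
fn predict_one(
    row: &[f32],
    means: &[Vec<f32>],
    vars: &[Vec<f32>],
    priors: &[f32],
) -> usize {
    (0..priors.len())
        .map(|c| {
            let log_likelihood: f32 = row
                .iter()
                .enumerate()
                .map(|(i, &xi)| log_gaussian_pdf(xi, means[c][i], vars[c][i]))
                .sum();
            (c, priors[c].ln() + log_likelihood)
        })
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(c, _)| c)
        .unwrap()
}
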
Implementation in Aprender

use aprender::classification::GaussianNB;
use aprender::primitives::Matrix;

// Create and train (x_train: Matrix<f32> with one row per sample,
// y_train: class labels as usize)
let mut nb = GaussianNB::new();
nb.fit(&x_train, &y_train)?;

// Predict class labels for the test set
let predictions = nb.predict(&x_test)?;

// Get per-class probabilities (one Vec<f32> per test sample)
let probabilities = nb.predict_proba(&x_test)?;

Variance Smoothing

Variance smoothing adds a small constant ε to every per-feature variance to prevent numerical instability:

let nb = GaussianNB::new().with_var_smoothing(1e-9);

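Without smoothing, a feature that is (nearly) constant within a class yields σ² ≈ 0, and the Gaussian log-density blows up through division by zero. With smoothing, every variance is replaced by σ² + ε when evaluating the density:

log P(xᵢ|y=c) = -0.5 * log(2π(σ²ᵢ,c + ε)) - (xᵢ - μᵢ,c)² / (2(σ²ᵢ,c + ε))

This is one common formulation; some libraries additionally scale ε by the largest feature variance, so consult Aprender's documentation for its exact behavior.
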
Complexity

Operation     Time        Space
Training      O(n·p)      O(c·p)
Prediction    O(m·p·c)    O(m·c)

Where: n = training samples, p = features, c = classes, m = test samples

Advantages

  • Extremely fast training and prediction
  • Probabilistic predictions with confidence scores
  • Works with small datasets
  • Handles high-dimensional data well
  • Naturally handles imbalanced classes via priors

Disadvantages

  • Independence assumption rarely holds in practice
  • Gaussian assumption may not fit the data
  • Cannot capture feature interactions
  • Probability estimates are often poorly calibrated (despite good classification)

When to Use

✓ Text classification (spam detection, sentiment analysis)
✓ Small datasets (<1000 samples)
✓ High-dimensional data (p > n)
✓ Baseline classifier (fast to implement and test)
✓ Real-time prediction requirements

Example Results

On Iris dataset:

  • Training time: <1ms
  • Test accuracy: 100% (30 samples)
  • Outperforms kNN: 100% vs 90%

See examples/naive_bayes_iris.rs for a complete example.

API Reference

// Constructor
pub fn new() -> Self

// Builder
pub fn with_var_smoothing(mut self, var_smoothing: f32) -> Self

// Training
pub fn fit(&mut self, x: &Matrix<f32>, y: &[usize]) -> Result<(), &'static str>

// Prediction
pub fn predict(&self, x: &Matrix<f32>) -> Result<Vec<usize>, &'static str>
pub fn predict_proba(&self, x: &Matrix<f32>) -> Result<Vec<Vec<f32>>, &'static str>
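
Putting the pieces together, here is a hedged end-to-end sketch. The Matrix constructor used below, from_vec(rows, cols, data), is assumed for illustration; check aprender::primitives::Matrix for the actual construction API:

use aprender::classification::GaussianNB;
use aprender::primitives::Matrix;

fn main() -> Result<(), &'static str> {
    // Hypothetical constructor: rows, cols, row-major data.
    let x_train = Matrix::from_vec(4, 2, vec![
        1.0, 2.0, // class 0
        1.2, 1.9, // class 0
        5.0, 6.0, // class 1
        5.1, 5.8, // class 1
    ]);
    let y_train = vec![0usize, 0, 1, 1];

    let mut nb = GaussianNB::new().with_var_smoothing(1e-9);
    nb.fit(&x_train, &y_train)?;

    let x_test = Matrix::from_vec(1, 2, vec![1.1, 2.1]);
    println!("predicted: {:?}", nb.predict(&x_test)?);
    println!("probabilities: {:?}", nb.predict_proba(&x_test)?);
    Ok(())
}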