t-SNE Theory

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique optimized for visualizing high-dimensional data in 2D or 3D space.

Core Idea

t-SNE preserves local structure by:

Computing pairwise similarities in high-dimensional space (Gaussian kernel)
Computing pairwise similarities in low-dimensional space (Student's t-distribution)
Minimizing the KL divergence between these two distributions

Algorithm

Step 1: High-Dimensional Similarities

Compute conditional probabilities using Gaussian kernel:

P(j|i) = exp(-||x_i - x_j||² / (2σ_i²)) / Σ_k exp(-||x_i - x_k||² / (2σ_i²))

Where σ_i is chosen such that the perplexity equals a target value.

Perplexity controls the effective number of neighbors:

Perplexity(P_i) = 2^H(P_i)
where H(P_i) = -Σ_j P(j|i) log₂ P(j|i)

Typical range: 5-50 (default: 30)

Step 2: Symmetric Joint Probabilities

Make probabilities symmetric:

P_{ij} = (P(j|i) + P(i|j)) / (2N)

Step 3: Low-Dimensional Similarities

Use Student's t-distribution (heavy-tailed) to avoid "crowding problem":

Q_{ij} = (1 + ||y_i - y_j||²)^{-1} / Σ_{k≠l} (1 + ||y_k - y_l||²)^{-1}

Step 4: Minimize KL Divergence

Minimize Kullback-Leibler divergence:

KL(P||Q) = Σ_i Σ_j P_{ij} log(P_{ij} / Q_{ij})

Using gradient descent with momentum:

∂KL/∂y_i = 4 Σ_j (P_{ij} - Q_{ij}) · (y_i - y_j) · (1 + ||y_i - y_j||²)^{-1}

Parameters

n_components (default: 2): Embedding dimensions (usually 2 or 3 for visualization)
perplexity (default: 30.0): Balance between local and global structure
- Low (5-10): Very local, reveals fine clusters
- Medium (20-30): Balanced
- High (50+): More global structure
learning_rate (default: 200.0): Gradient descent step size
n_iter (default: 1000): Number of optimization iterations
- More iterations → better convergence but slower

Time and Space Complexity

Time: O(n²) per iteration for pairwise distances
- Total: O(n² · iterations)
- Impractical for n > 10,000
Space: O(n²) for distance and probability matrices

✓ Non-linear: Captures complex manifolds ✓ Local Structure: Preserves neighborhoods excellently ✓ Visualization: Best for 2D/3D plots ✓ Cluster Revelation: Makes clusters visually obvious

Disadvantages

✗ Slow: O(n²) doesn't scale to large datasets ✗ Stochastic: Different runs give different embeddings ✗ No Transform: Cannot embed new data points ✗ Global Structure: Distances between clusters not meaningful ✗ Tuning: Sensitive to perplexity, learning rate, iterations

Comparison with PCA

Feature	t-SNE	PCA
Type	Non-linear	Linear
Preserves	Local structure	Global variance
Speed	O(n²·iter)	O(n·d·k)
New Data	No	Yes
Stochastic	Yes	No
Use Case	Visualization	Preprocessing

When to Use

Use t-SNE for:

Visualizing high-dimensional data (>3D)
Exploratory data analysis
Finding hidden clusters
Presentations and reports (2D plots)

Don't use t-SNE for:

Large datasets (n > 10,000)
Feature reduction before modeling (use PCA instead)
When you need to transform new data
When global structure matters

Best Practices

Normalize data before t-SNE (different scales affect distances)
Try multiple perplexity values (5, 10, 30, 50) to see different structures
Run multiple times with different random seeds (stochastic)
Use enough iterations (500-1000 minimum)
Don't over-interpret distances between clusters
Consider PCA first if dataset > 50 dimensions (reduce to ~50D first)

Example Usage

use aprender::prelude::*;

// High-dimensional data
let data = Matrix::from_vec(100, 50, high_dim_data)?;

// Reduce to 2D for visualization
let mut tsne = TSNE::new(2)
    .with_perplexity(30.0)
    .with_n_iter(1000)
    .with_random_state(42);

let embedding = tsne.fit_transform(&data)?;

// Plot embedding[i, 0] vs embedding[i, 1]

References

van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. JMLR, 9, 2579-2605.
Wattenberg, et al. (2016). How to Use t-SNE Effectively. Distill.
Kobak, D., & Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nature Communications, 10, 5416.

EXTREME TDD - The Aprender Guide to Zero-Defect Machine Learning