Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

microGPT

A port of Karpathy’s microGPT to Rust using aprender.

What is microGPT?

microGPT is a 4,192-parameter GPT that generates human-like names after training on 32K examples. It demonstrates the algorithmic essence of large language models: tokenization, embeddings, multi-head attention with causal masking, residual connections, RMSNorm, MLPs, and cross-entropy loss with backpropagation.

Everything else is just efficiency.

Why Rust + aprender?

The original Python implementation uses a custom autograd engine operating on scalar Value objects. This Rust port uses aprender-core for SIMD-accelerated tensor operations with automatic differentiation, achieving the same architecture with production-grade numerics.

Quick start

cargo run --release

Trains for 5,000 steps (~2 seconds on CPU), then samples 20 names:

step    0 | loss 3.29
step 2500 | loss 2.56
step 4999 | loss 1.95

Generated names (temperature=0.5):
  karila, maria, misha, mayla, anara