MNIST Dataset

Handwritten digit recognition dataset (LeCun et al., 1998).

Overview

  • Embedded: 100 samples (10 per digit)
  • Full (hf-hub): 70,000 samples
  • Features: 784 pixels (28x28 grayscale)
  • Classes: 10 digits (0-9)
  • Task: Multi-class classification

Loading

use alimentar::datasets::{mnist, CanonicalDataset};

// Embedded sample (offline)
let dataset = mnist()?;
assert_eq!(dataset.len(), 100);
assert_eq!(dataset.num_features(), 784);
assert_eq!(dataset.num_classes(), 10);

Full Dataset

Enable the hf-hub feature in Cargo.toml to load the complete MNIST dataset:

[dependencies]
alimentar = { version = "0.1", features = ["hf-hub"] }

Then load the full dataset:

use alimentar::datasets::MnistDataset;

let full = MnistDataset::load_full()?;
assert_eq!(full.len(), 70_000);

Schema

Column               Type   Description
pixel_0..pixel_783   f32    Pixel intensities (0.0-1.0)
label                i32    Digit class (0-9)
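
The schema constraints can be checked on a single row before it enters a pipeline. The following is a minimal sketch using only the standard library; `validate_row` and `NUM_PIXELS` are hypothetical names for illustration and are not part of alimentar.

```rust
/// Number of pixel columns in one row (28 x 28). Hypothetical helper constant.
const NUM_PIXELS: usize = 784;

/// Check one row against the schema: 784 pixel values in [0.0, 1.0]
/// and a label in 0..=9. Returns a description of the first violation.
/// Hypothetical helper, not an alimentar API.
fn validate_row(pixels: &[f32], label: i32) -> Result<(), String> {
    if pixels.len() != NUM_PIXELS {
        return Err(format!("expected {} pixels, got {}", NUM_PIXELS, pixels.len()));
    }
    // NaN fails the range check as well, so it is reported as out of range.
    if let Some(i) = pixels.iter().position(|p| !(0.0..=1.0).contains(p)) {
        return Err(format!("pixel_{} = {} is outside [0.0, 1.0]", i, pixels[i]));
    }
    if !(0..=9).contains(&label) {
        return Err(format!("label {} is outside 0-9", label));
    }
    Ok(())
}
```

A row of 784 values of 0.5 with label 7 passes, while a label of 10 or a pixel value of 1.5 is rejected.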

Train/Test Split

use alimentar::datasets::{mnist, CanonicalDataset};

let dataset = mnist()?;
let split = dataset.split()?;

// 80/20 split
assert_eq!(split.train.len(), 80);
assert_eq!(split.test.len(), 20);
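
To make the 80/20 arithmetic concrete, here is a minimal standalone sketch of a ratio-based split. `split_at_ratio` is a hypothetical helper, not the crate's implementation; it keeps the original order, and whether alimentar's `split()` shuffles or stratifies by class is not stated here.

```rust
/// Split `items` into (train, test), sending roughly `train_ratio` of the
/// items to the training set. Order is preserved: no shuffling, no
/// stratification. Hypothetical helper, not an alimentar API.
fn split_at_ratio<T>(mut items: Vec<T>, train_ratio: f64) -> (Vec<T>, Vec<T>) {
    // Clamp so the split point can never exceed the number of items.
    let ratio = train_ratio.clamp(0.0, 1.0);
    let n_train = ((items.len() as f64) * ratio).round() as usize;
    // `split_off` leaves the first `n_train` items in `items`
    // and returns the remainder.
    let test = items.split_off(n_train);
    (items, test)
}
```

With 100 samples and a ratio of 0.8, this yields 80 training and 20 test samples, matching the sizes asserted above.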

Pixel Layout

Pixels are stored in row-major order:

pixel_0   pixel_1   ... pixel_27     (row 0)
pixel_28  pixel_29  ... pixel_55     (row 1)
...
pixel_756 pixel_757 ... pixel_783    (row 27)

To reconstruct a 28x28 image, map each (row, col) position to its flat pixel index:

fn pixel_index(row: usize, col: usize) -> usize {
    row * 28 + col
}
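
Building on the row-major layout, a flat row of 784 values can be reshaped into a 2D grid. This is a minimal sketch using only the standard library; `to_image`, `pixel_coords`, and `SIDE` are hypothetical names for illustration, not alimentar APIs.

```rust
/// Width and height of an MNIST image in pixels.
const SIDE: usize = 28;

/// Reshape a flat, row-major slice of 784 pixels into a 28x28 grid.
/// Returns `None` if the slice does not have exactly 784 entries.
fn to_image(pixels: &[f32]) -> Option<[[f32; SIDE]; SIDE]> {
    if pixels.len() != SIDE * SIDE {
        return None;
    }
    let mut image = [[0.0_f32; SIDE]; SIDE];
    // Each consecutive chunk of 28 values is one image row.
    for (row, chunk) in pixels.chunks_exact(SIDE).enumerate() {
        image[row].copy_from_slice(chunk);
    }
    Some(image)
}

/// Inverse of `row * 28 + col`: recover (row, col) from a flat index.
fn pixel_coords(index: usize) -> (usize, usize) {
    (index / SIDE, index % SIDE)
}
```

After reshaping, `image[row][col]` holds the value of `pixel_{row * 28 + col}`; for example, `pixel_29` sits at row 1, column 1.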

Embedded Sample

The embedded dataset contains procedurally generated digit patterns:

  • 10 samples per digit class
  • Simple geometric representations
  • Useful for testing pipelines without downloads

Example: Digit Classification Pipeline

use alimentar::datasets::{mnist, CanonicalDataset};
use alimentar::{DataLoader, Normalize, NormMethod, Transform};

let dataset = mnist()?;
let split = dataset.split()?;

// Normalize pixel values
let normalizer = Normalize::new(NormMethod::MinMax);

let train_loader = DataLoader::new(split.train)
    .batch_size(32)
    .shuffle(true);

for batch in train_loader {
    let normalized = normalizer.apply(batch)?;
    // Feed to model...
}
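
For reference, textbook min-max normalization rescales each value as (x - min) / (max - min). The sketch below applies it to a flat slice of values; `min_max_normalize` is a hypothetical standalone function, and it is an assumption that `NormMethod::MinMax` follows this definition (the crate may, for instance, compute the range per column rather than per slice).

```rust
/// Rescale values in place to [0, 1] using min-max normalization:
/// x' = (x - min) / (max - min). A slice whose values are all equal
/// maps to all zeros to avoid dividing by zero.
/// Hypothetical helper, not an alimentar API.
fn min_max_normalize(values: &mut [f32]) {
    if values.is_empty() {
        return;
    }
    let min = values.iter().copied().fold(f32::INFINITY, f32::min);
    let max = values.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    for v in values.iter_mut() {
        *v = if range > 0.0 { (*v - min) / range } else { 0.0 };
    }
}
```

Raw 8-bit intensities such as 0, 127.5, and 255 map to 0.0, 0.5, and 1.0. Values that already span exactly 0.0 to 1.0 are left unchanged.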

Reference

LeCun, Y., Cortes, C., & Burges, C.J. (1998). "The MNIST database of handwritten digits." http://yann.lecun.com/exdb/mnist/