Canonical Datasets

Alimentar provides built-in access to well-known ML datasets for tutorials, benchmarking, and quick experimentation. All datasets follow a sovereign-first design: embedded samples work offline without any network dependency.

Design Philosophy

  • Offline by default: Small embedded samples work without downloads
  • Optional full data: Enable the hf-hub feature for complete datasets
  • Uniform API: All datasets implement the CanonicalDataset trait
  • Zero configuration: One-liner loading with sensible defaults

Available Datasets

Dataset        Function         Embedded  Full (hf-hub)  Use Case
Iris           iris()           150       N/A            Classification intro
MNIST          mnist()          100       70,000         Digit recognition
Fashion-MNIST  fashion_mnist()  100       70,000         Clothing classification
CIFAR-10       cifar10()        100       60,000         Image classification
CIFAR-100      cifar100()       100       60,000         Fine-grained classification

Quick Start

use alimentar::datasets::{iris, mnist, cifar10, CanonicalDataset};

// Load datasets (no network required)
let iris = iris()?;
let mnist = mnist()?;
let cifar = cifar10()?;

// Common trait methods
println!("Iris: {} samples, {} features", iris.len(), iris.num_features());
println!("MNIST: {} classes", mnist.num_classes());
println!("CIFAR-10: {}", cifar.description());

The CanonicalDataset Trait

All canonical datasets implement this trait:

pub trait CanonicalDataset {
    fn data(&self) -> &ArrowDataset;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool;
    fn num_features(&self) -> usize;
    fn num_classes(&self) -> usize;
    fn feature_names(&self) -> &'static [&'static str];
    fn target_name(&self) -> &'static str;
    fn description(&self) -> &'static str;
}
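
Because every dataset exposes the same methods, helper code can be written once against the trait. The sketch below is an illustration only, not Alimentar's code. To compile without the crate it uses a cut-down stand-in trait, DatasetInfo (hypothetical), which mirrors the methods listed above except data(), and gives is_empty() a default body as an assumption. ToyDataset and summarize are likewise invented for the example.

```rust
/// Hypothetical stand-in mirroring part of `CanonicalDataset`.
/// The `data()` accessor is omitted so the sketch needs no external crate.
pub trait DatasetInfo {
    fn len(&self) -> usize;
    /// Assumption: provided here as a default method for brevity.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
    fn num_features(&self) -> usize;
    fn num_classes(&self) -> usize;
    fn feature_names(&self) -> &'static [&'static str];
    fn target_name(&self) -> &'static str;
    fn description(&self) -> &'static str;
}

/// Toy dataset used only for illustration (not part of Alimentar).
pub struct ToyDataset {
    rows: usize,
}

impl ToyDataset {
    pub fn new(rows: usize) -> Self {
        Self { rows }
    }
}

impl DatasetInfo for ToyDataset {
    fn len(&self) -> usize {
        self.rows
    }
    fn num_features(&self) -> usize {
        // Derive the count from the names so the two cannot drift apart.
        self.feature_names().len()
    }
    fn num_classes(&self) -> usize {
        2
    }
    fn feature_names(&self) -> &'static [&'static str] {
        &["x", "y"]
    }
    fn target_name(&self) -> &'static str {
        "label"
    }
    fn description(&self) -> &'static str {
        "Toy two-class dataset"
    }
}

/// Formats a one-line summary for any dataset implementing the trait.
pub fn summarize<D: DatasetInfo>(ds: &D) -> String {
    format!(
        "{}: {} samples, {} features, {} classes, target `{}`",
        ds.description(),
        ds.len(),
        ds.num_features(),
        ds.num_classes(),
        ds.target_name()
    )
}
```

With the real crate, the same generic bound would presumably be written against CanonicalDataset, so a function like summarize could accept the values returned by iris(), mnist(), or cifar10() without per-dataset code.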

Train/Test Splits

MNIST and CIFAR-10 provide built-in 80/20 splits:

let mnist = mnist()?;
let split = mnist.split()?;

println!("Train: {} samples", split.train.len());
println!("Test: {} samples", split.test.len());
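
To make the 80/20 ratio concrete, here is a minimal sketch of an index-based split. It is an illustration under stated assumptions rather than Alimentar's implementation: the text above does not say whether split() shuffles or stratifies, so this version simply takes the first 80% of rows in order. The helper split_indices is hypothetical.

```rust
/// Hypothetical helper: splits row indices `0..n` into a training prefix
/// and a test suffix. `train_percent` is clamped to 100.
/// Assumption: rows are taken in order, with no shuffling or stratification.
pub fn split_indices(n: usize, train_percent: usize) -> (Vec<usize>, Vec<usize>) {
    let pct = train_percent.min(100);
    // Integer arithmetic avoids floating-point rounding surprises.
    let train_len = n * pct / 100;
    let train: Vec<usize> = (0..train_len).collect();
    let test: Vec<usize> = (train_len..n).collect();
    (train, test)
}
```

Applied to 100 rows with train_percent set to 80, this yields 80 training indices and 20 test indices.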

Full Datasets (Optional)

For production use, enable the hf-hub feature to download complete datasets:

# Cargo.toml
[dependencies]
alimentar = { version = "0.1", features = ["hf-hub"] }

// Downloads from HuggingFace Hub on first use
let full_mnist = MnistDataset::load_full()?;
let full_cifar = Cifar10Dataset::load_full()?;