CIFAR-10 Dataset

Color image classification dataset (Krizhevsky, 2009).

Overview

  • Embedded: 100 samples (10 per class)
  • Full (hf-hub): 60,000 samples (50,000 training + 10,000 test images)
  • Features: 3,072 values per image (32x32 pixels x 3 RGB channels)
  • Classes: 10 object categories
  • Task: Multi-class image classification

Loading

use alimentar::datasets::{cifar10, CanonicalDataset};

let dataset = cifar10()?;
assert_eq!(dataset.len(), 100);
assert_eq!(dataset.num_features(), 3072);
assert_eq!(dataset.num_classes(), 10);

Full Dataset

Enable the hf-hub feature to load the complete CIFAR-10 dataset:

# Cargo.toml
[dependencies]
alimentar = { version = "0.1", features = ["hf-hub"] }

// Rust
use alimentar::datasets::Cifar10Dataset;

let full = Cifar10Dataset::load_full()?;
assert_eq!(full.len(), 60_000);

Class Names

use alimentar::datasets::{Cifar10Dataset, CIFAR10_CLASSES};

// All class names
println!("{:?}", CIFAR10_CLASSES);
// ["airplane", "automobile", "bird", "cat", "deer",
//  "dog", "frog", "horse", "ship", "truck"]

// Lookup by label
let name = Cifar10Dataset::class_name(0); // Some("airplane")
let name = Cifar10Dataset::class_name(9); // Some("truck")
let name = Cifar10Dataset::class_name(10); // None
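
The lookup above goes from label to name. The reverse direction can be sketched without the crate, as below. This is a minimal, self-contained illustration: `CLASSES` is a local copy of the list printed above, and `class_index` is a hypothetical helper, not part of the alimentar API.

```rust
/// Local copy of the ten class names, in label order (0 = airplane).
const CLASSES: [&str; 10] = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
];

/// Returns the name for a label, or `None` if it is outside 0..=9.
fn class_name(label: i32) -> Option<&'static str> {
    if label < 0 {
        return None;
    }
    CLASSES.get(label as usize).copied()
}

/// Hypothetical reverse lookup: returns the label for a class name,
/// or `None` if the name is not one of the ten classes.
fn class_index(name: &str) -> Option<i32> {
    CLASSES.iter().position(|&c| c == name).map(|i| i as i32)
}
```

Returning `Option` in both directions keeps out-of-range labels and unknown names from panicking.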

Schema

Column                 Type   Description
pixel_0..pixel_3071    f32    Pixel intensities (0.0-1.0)
label                  i32    Class index (0-9)
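
Intensities in the 0.0-1.0 range usually need to be mapped back to 8-bit values for display or export. The sketch below assumes the common convention of scaling by 255. How alimentar performs its own normalization is not stated here, and both helper names are hypothetical.

```rust
/// Scales an 8-bit channel value to the 0.0-1.0 range
/// (assumes the divide-by-255 convention).
fn normalize(value: u8) -> f32 {
    f32::from(value) / 255.0
}

/// Maps a normalized intensity back to 8 bits.
/// Input outside 0.0-1.0 is clamped before scaling.
fn denormalize(value: f32) -> u8 {
    (value.clamp(0.0, 1.0) * 255.0).round() as u8
}
```

Rounding rather than truncating in `denormalize` makes the two functions round-trip for every `u8` value.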

Pixel Layout

Pixels are stored channel-first (planar):

R channel: pixel_0    .. pixel_1023   (32x32 = 1024)
G channel: pixel_1024 .. pixel_2047
B channel: pixel_2048 .. pixel_3071

To extract RGB for pixel (row, col):

fn rgb_indices(row: usize, col: usize) -> (usize, usize, usize) {
    let idx = row * 32 + col;
    (idx, idx + 1024, idx + 2048)  // R, G, B
}
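
Many image libraries expect interleaved RGB triples rather than separate planes. The following sketch converts one planar feature vector, laid out as described above, into row-major `[r, g, b]` triples. It is self-contained and does not use the alimentar API; `planar_to_interleaved` is a hypothetical helper name.

```rust
/// Side length of a CIFAR-10 image in pixels.
const SIDE: usize = 32;
/// Number of values in one color plane (32 * 32 = 1024).
const PLANE: usize = SIDE * SIDE;

/// Converts one planar image (all R, then all G, then all B) into
/// interleaved RGB triples in row-major order.
/// Returns `None` if the input does not hold exactly 3 * 1024 values.
fn planar_to_interleaved(planar: &[f32]) -> Option<Vec<[f32; 3]>> {
    if planar.len() != 3 * PLANE {
        return None;
    }
    // Split the flat vector into its three color planes.
    let (r, rest) = planar.split_at(PLANE);
    let (g, b) = rest.split_at(PLANE);
    // Walk the planes in lockstep and emit one triple per pixel.
    Some(
        r.iter()
            .zip(g)
            .zip(b)
            .map(|((&r, &g), &b)| [r, g, b])
            .collect(),
    )
}
```

The triple for pixel (row, col) then sits at index `row * 32 + col` of the returned vector.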

Train/Test Split

use alimentar::datasets::{cifar10, CanonicalDataset};

let dataset = cifar10()?;
let split = dataset.split()?;

// 80/20 split
assert_eq!(split.train.len(), 80);
assert_eq!(split.test.len(), 20);
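
Whether `split()` shuffles or stratifies by class is not documented here. For illustration only, the sketch below shows one way to get an 80/20 split that keeps classes balanced: a per-class (stratified) split over label indices. The function name and parameters are hypothetical, and this is not alimentar's implementation.

```rust
/// Splits sample indices into (train, test) so that each class contributes
/// roughly `train_fraction` of its samples to the training set.
/// No shuffling is done; samples keep their original order within a class.
/// Panics if a label is outside 0..num_classes.
fn stratified_split(
    labels: &[i32],
    num_classes: usize,
    train_fraction: f64,
) -> (Vec<usize>, Vec<usize>) {
    let fraction = train_fraction.clamp(0.0, 1.0);

    // Group sample indices by class label.
    let mut by_class: Vec<Vec<usize>> = vec![Vec::new(); num_classes];
    for (i, &label) in labels.iter().enumerate() {
        by_class[label as usize].push(i);
    }

    // Take the first part of each class for training, the rest for testing.
    let mut train = Vec::new();
    let mut test = Vec::new();
    for indices in by_class {
        let n_train = (indices.len() as f64 * fraction).round() as usize;
        train.extend_from_slice(&indices[..n_train]);
        test.extend_from_slice(&indices[n_train..]);
    }
    (train, test)
}
```

With the embedded dataset's 10 samples per class, a fraction of 0.8 puts 8 samples of each class in the training set and 2 in the test set, which matches the 80/20 totals asserted above.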

Embedded Sample

The embedded dataset uses class-specific color patterns:

Class         Color Pattern
airplane      Sky blue
automobile    Gray
bird          Brown
cat           Orange
deer          Dark brown
dog           Tan
frog          Green
horse         Brown
ship          Navy
truck         Red

Example: Image Classification Pipeline

use alimentar::datasets::{cifar10, Cifar10Dataset, CanonicalDataset};
use alimentar::DataLoader;

let dataset = cifar10()?;
let split = dataset.split()?;

let train_loader = DataLoader::new(split.train)
    .batch_size(64)
    .shuffle(true);

for batch in train_loader {
    println!("Batch: {} images", batch.num_rows());
    // Extract features and labels for training...
}
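
With the 80 embedded training samples and a batch size of 64, the loop would see two batches if the loader keeps the final partial batch. That behavior is an assumption here, not something this page confirms for `DataLoader`. The arithmetic can be checked with a small standalone sketch; `batch_sizes` is a hypothetical helper.

```rust
/// Returns the size of each batch when `n` samples are split into
/// batches of at most `batch_size`, keeping the final partial batch.
/// Panics if `batch_size` is zero.
fn batch_sizes(n: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let indices: Vec<usize> = (0..n).collect();
    indices
        .chunks(batch_size)
        .map(|chunk| chunk.len())
        .collect()
}
```

Under that assumption, `batch_sizes(80, 64)` yields one full batch of 64 and one partial batch of 16, so training code should not rely on every batch having the same number of rows.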

Reference

Krizhevsky, A. (2009). "Learning Multiple Layers of Features from Tiny Images." Technical Report, University of Toronto.