Iris Dataset

Fisher's classic Iris dataset (1936), used for classification tasks.

Overview

  • Samples: 150 (all embedded)
  • Features: 4 numeric measurements
  • Classes: 3 species (setosa, versicolor, virginica)
  • Task: Multi-class classification

Loading

use alimentar::datasets::{iris, CanonicalDataset};

// `?` requires an enclosing function that returns a Result
let dataset = iris()?;
assert_eq!(dataset.len(), 150);
assert_eq!(dataset.num_features(), 4);
assert_eq!(dataset.num_classes(), 3);

Schema

Column        Type    Description
sepal_length  f64     Sepal length (cm)
sepal_width   f64     Sepal width (cm)
petal_length  f64     Petal length (cm)
petal_width   f64     Petal width (cm)
species       string  "setosa", "versicolor", "virginica"

Feature Access

let dataset = iris()?;

// Get feature names
let names = dataset.feature_names();
// ["sepal_length", "sepal_width", "petal_length", "petal_width"]

// Extract features only (no labels)
let features = dataset.features()?;
assert_eq!(features.schema().fields().len(), 4);

Label Access

let dataset = iris()?;

// String labels
let labels = dataset.labels();
// ["setosa", "setosa", ..., "virginica"]

// Numeric labels (0, 1, 2)
let numeric = dataset.labels_numeric();
// [0, 0, ..., 2]
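
The correspondence between string and numeric labels can be sketched as a pair of lookup functions. This is only an illustration: `species_to_label` and `label_to_species` are hypothetical helpers, not part of alimentar, and the mapping (setosa = 0, versicolor = 1, virginica = 2) is taken from the Class Distribution table in this document. The `u8` label type is an assumption; the type returned by `labels_numeric()` is not stated here.

```rust
/// Map a species name to its numeric class label.
/// Hypothetical helper; the mapping follows the Class Distribution table.
fn species_to_label(species: &str) -> Option<u8> {
    match species {
        "setosa" => Some(0),
        "versicolor" => Some(1),
        "virginica" => Some(2),
        _ => None, // unknown species name
    }
}

/// Map a numeric class label back to its species name.
/// Hypothetical helper; inverse of `species_to_label`.
fn label_to_species(label: u8) -> Option<&'static str> {
    match label {
        0 => Some("setosa"),
        1 => Some("versicolor"),
        2 => Some("virginica"),
        _ => None, // label outside 0..=2
    }
}

fn main() {
    // Round-trip every species name through both helpers.
    for name in ["setosa", "versicolor", "virginica"] {
        let label = species_to_label(name).expect("known species");
        let back = label_to_species(label).expect("valid label");
        println!("{name} -> {label} -> {back}");
    }
}
```

Returning `Option` rather than panicking keeps the helpers usable on data that may contain unexpected labels.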

Class Distribution

The dataset is perfectly balanced:

Class       Label  Count
setosa      0      50
versicolor  1      50
virginica   2      50
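
One way to check the balance claim is to count how often each numeric label occurs. The sketch below uses only the standard library and a stand-in label vector built from the counts in the table above (50 samples per class). In real use the labels would presumably come from `dataset.labels_numeric()`, whose exact return type is not stated here, so `u8` is an assumption. `class_counts` and `is_balanced` are hypothetical helper names.

```rust
use std::collections::BTreeMap;

/// Count occurrences of each numeric class label.
fn class_counts(labels: &[u8]) -> BTreeMap<u8, usize> {
    let mut counts = BTreeMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    counts
}

/// True when every class that appears has the same number of samples.
fn is_balanced(counts: &BTreeMap<u8, usize>) -> bool {
    let mut values = counts.values();
    match values.next() {
        Some(first) => values.all(|count| count == first),
        None => true, // no labels at all: trivially balanced
    }
}

fn main() {
    // Stand-in for the numeric labels: 50 samples for each of classes 0, 1, 2.
    let labels: Vec<u8> = (0u8..3)
        .flat_map(|class| std::iter::repeat(class).take(50))
        .collect();

    let counts = class_counts(&labels);
    println!("{:?}", counts); // prints {0: 50, 1: 50, 2: 50}
    assert!(is_balanced(&counts));
}
```

A `BTreeMap` is used so the counts print in label order.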

Example: Simple Classification

use alimentar::datasets::{iris, CanonicalDataset};
use alimentar::DataLoader;

let dataset = iris()?;
let features = dataset.features()?;
// Numeric labels to use as training targets (not consumed by the loader below)
let labels = dataset.labels_numeric();

// Create batched loader
let loader = DataLoader::new(features)
    .batch_size(32)
    .shuffle(true);

for batch in loader {
    println!("Batch: {} rows", batch.num_rows());
}
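
To see what batch sizes to expect from the loop above, the row ranges can be worked out by hand. The sketch below is not alimentar code: `batch_ranges` is a hypothetical helper, and it assumes the loader yields a final partial batch rather than dropping it, which this document does not state. Under that assumption, 150 rows with a batch size of 32 give four full batches and one batch of 22 rows.

```rust
use std::ops::Range;

/// Split `num_rows` rows into consecutive batches of at most `batch_size`.
/// Hypothetical helper; assumes the last, shorter batch is kept.
fn batch_ranges(num_rows: usize, batch_size: usize) -> Vec<Range<usize>> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..num_rows)
        .step_by(batch_size)
        .map(|start| start..usize::min(start + batch_size, num_rows))
        .collect()
}

fn main() {
    // 150 samples, batch size 32: 4 batches of 32 rows and 1 batch of 22 rows.
    for range in batch_ranges(150, 32) {
        println!("rows {}..{} ({} rows)", range.start, range.end, range.len());
    }
}
```

With shuffling enabled, the rows in each batch would not be consecutive, but the batch sizes would be the same under the stated assumption.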

Reference

Fisher, R.A. (1936). "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7(2), 179-188.