Documentation Standards

Good documentation is essential for maintainable, discoverable, and usable code. Aprender follows Rust's documentation conventions with additional ML-specific guidance.

Why Documentation Matters

Documentation serves multiple audiences:

  1. Users: Learn how to use your APIs
  2. Contributors: Understand implementation details
  3. Future you: Remember why you made certain decisions
  4. Compiler: Doctests are executable examples that prevent documentation rot

Benefits:

  • Faster onboarding (new team members)
  • Better API discoverability (cargo doc)
  • Fewer support questions (self-service)
  • Higher confidence in refactoring (doctests catch breaking changes)

Rustdoc Basics

Rust has three types of documentation comments:

/// Documents the item that follows (function, struct, enum, etc.)
pub fn fit(&mut self, x: &Matrix<f32>, y: &Vector<f32>) -> Result<()> { }

//! Documents the enclosing item (module, crate)
//! Used at the top of files for module-level docs

/// Field documentation for struct fields
pub struct LinearRegression {
    /// Coefficients for features (excluding intercept).
    coefficients: Option<Vector<f32>>,
}

Generate documentation:

cargo doc --no-deps --open       # Generate and open in browser
cargo test --doc                 # Run doctests only
cargo doc --document-private-items  # Include private items

Module-Level Documentation

Every module should start with //! documentation:

//! Clustering algorithms.
//!
//! Includes K-Means, DBSCAN, Hierarchical, Gaussian Mixture Models, and Isolation Forest.
//!
//! # Example
//!
//! ```
//! use aprender::cluster::KMeans;
//! use aprender::primitives::Matrix;
//!
//! let data = Matrix::from_vec(6, 2, vec![
//!     0.0, 0.0, 0.1, 0.1, 0.2, 0.0,  // Cluster 1
//!     10.0, 10.0, 10.1, 10.1, 10.0, 10.2,  // Cluster 2
//! ]).unwrap();
//!
//! let mut kmeans = KMeans::new(2);
//! kmeans.fit(&data).unwrap();
//! let labels = kmeans.predict(&data);
//! ```

use crate::error::Result;
use crate::primitives::{Matrix, Vector};

Location: src/cluster/mod.rs:1-13

Elements:

  1. Summary: One sentence describing the module
  2. Details: Additional context (algorithms included, purpose)
  3. Example: Complete working example demonstrating module usage
  4. Imports: Show what users need to import

Function Documentation

Document public functions with standard sections:

/// Fits the model to training data.
///
/// Uses normal equations: `β = (X^T X)^-1 X^T y` via Cholesky decomposition.
/// Requires X to have full column rank (non-singular X^T X matrix).
///
/// # Arguments
///
/// * `x` - Feature matrix (n_samples × n_features)
/// * `y` - Target values (n_samples)
///
/// # Returns
///
/// `Ok(())` on success, or an error if fitting fails.
///
/// # Errors
///
/// Returns an error if:
/// - Dimensions don't match (x.n_rows() != y.len())
/// - Matrix is singular (collinear features)
/// - No data provided (n_samples == 0)
///
/// # Examples
///
/// ```
/// use aprender::prelude::*;
///
/// let x = Matrix::from_vec(4, 2, vec![
///     1.0, 1.0,
///     2.0, 4.0,
///     3.0, 9.0,
///     4.0, 16.0,
/// ]).unwrap();
/// let y = Vector::from_slice(&[2.1, 4.2, 6.1, 8.3]);
///
/// let mut model = LinearRegression::new();
/// model.fit(&x, &y).unwrap();
/// assert!(model.is_fitted());
/// ```
///
/// # Performance
///
/// - Time complexity: O(n²p + p³) where n = samples, p = features
/// - Space complexity: O(np) for storing X^T X
/// - Best for p < 10,000; use SGD for larger feature spaces
pub fn fit(&mut self, x: &Matrix<f32>, y: &Vector<f32>) -> Result<()> {
    // Implementation...
}

Sections (in order):

  1. Summary: One sentence describing what the function does
  2. Details: Algorithm, approach, or important context
  3. Arguments: Document each parameter (type is inferred from signature)
  4. Returns: What the function returns
  5. Errors: When the function returns Err (for Result types)
  6. Panics: When the function might panic (avoid panics in public APIs)
  7. Examples: Complete, runnable code demonstrating usage
  8. Performance: Complexity analysis, scaling behavior

When to Document Panics

/// Returns the coefficients (excluding intercept).
///
/// # Panics
///
/// Panics if model is not fitted. Call `fit()` first.
///
/// # Examples
///
/// ```
/// use aprender::prelude::*;
///
/// let x = Matrix::from_vec(2, 1, vec![1.0, 2.0]).unwrap();
/// let y = Vector::from_slice(&[3.0, 5.0]);
///
/// let mut model = LinearRegression::new();
/// model.fit(&x, &y).unwrap();
///
/// let coefs = model.coefficients();  // OK: model is fitted
/// assert_eq!(coefs.len(), 1);
/// ```
#[must_use]
pub fn coefficients(&self) -> &Vector<f32> {
    self.coefficients
        .as_ref()
        .expect("Model not fitted. Call fit() first.")
}

Location: src/linear_model/mod.rs:88-98

Guideline:

  • Document panics for unrecoverable programmer errors
  • Prefer Result for recoverable errors (user errors, I/O failures)
  • Use is_fitted() to provide non-panicking alternative

When to Document Errors

/// Saves the model to a binary file using bincode.
///
/// The file can be loaded later using `load()` to restore the model.
///
/// # Arguments
///
/// * `path` - Path where the model will be saved
///
/// # Errors
///
/// Returns an error if:
/// - Serialization fails (internal error)
/// - File writing fails (permissions, disk full, invalid path)
///
/// # Examples
///
/// ```no_run
/// use aprender::prelude::*;
///
/// let mut model = LinearRegression::new();
/// // ... fit the model ...
///
/// model.save("model.bin").unwrap();
/// ```
pub fn save<P: AsRef<Path>>(&self, path: P) -> std::result::Result<(), String> {
    let bytes = bincode::serialize(self).map_err(|e| format!("Serialization failed: {}", e))?;
    fs::write(path, bytes).map_err(|e| format!("File write failed: {}", e))?;
    Ok(())
}

Location: src/linear_model/mod.rs:112-121

Guideline:

  • Document all error conditions for functions returning Result
  • Be specific about when each error occurs
  • Group related errors (e.g., "I/O errors", "validation errors")

Type Documentation

Struct Documentation

/// Ordinary Least Squares (OLS) linear regression.
///
/// Fits a linear model by minimizing the residual sum of squares between
/// observed targets and predicted targets. The model equation is:
///
/// ```text
/// y = X β + ε
/// ```
///
/// where `β` is the coefficient vector and `ε` is random error.
///
/// # Solver
///
/// Uses normal equations: `β = (X^T X)^-1 X^T y` via Cholesky decomposition.
///
/// # Examples
///
/// ```
/// use aprender::prelude::*;
///
/// let x = Matrix::from_vec(4, 1, vec![1.0, 2.0, 3.0, 4.0]).unwrap();
/// let y = Vector::from_slice(&[3.0, 5.0, 7.0, 9.0]);
///
/// let mut model = LinearRegression::new();
/// model.fit(&x, &y).unwrap();
/// let predictions = model.predict(&x);
/// ```
///
/// # Performance
///
/// - Time complexity: O(n²p + p³) where n = samples, p = features
/// - Space complexity: O(np)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LinearRegression {
    /// Coefficients for features (excluding intercept).
    coefficients: Option<Vector<f32>>,
    /// Intercept (bias) term.
    intercept: f32,
    /// Whether to fit an intercept.
    fit_intercept: bool,
}

Location: src/linear_model/mod.rs:13-62

Elements:

  1. Summary: What the type represents
  2. Algorithm/Theory: Mathematical foundation (for ML types)
  3. Examples: How to create and use the type
  4. Performance: Complexity, memory usage
  5. Field docs: Document all fields (even private ones)

Enum Documentation

/// Errors that can occur in aprender operations.
///
/// This enum represents all error conditions that can occur when using
/// aprender. Variants provide detailed context about what went wrong.
///
/// # Examples
///
/// ```
/// use aprender::error::AprenderError;
///
/// let err = AprenderError::DimensionMismatch {
///     expected: "100x10".to_string(),
///     actual: "50x10".to_string(),
/// };
///
/// println!("Error: {}", err);
/// ```
#[derive(Debug, Clone, PartialEq)]
pub enum AprenderError {
    /// Matrix/vector dimensions don't match for the operation.
    DimensionMismatch {
        expected: String,
        actual: String,
    },

    /// Matrix is singular (non-invertible).
    SingularMatrix {
        det: f64,
    },

    /// Algorithm failed to converge within iteration limit.
    ConvergenceFailure {
        iterations: usize,
        final_loss: f64,
    },

    // ... more variants
}

Location: src/error.rs:7-78

Elements:

  1. Summary: Purpose of the enum
  2. Examples: Creating and using variants
  3. Variant docs: Document each variant's meaning

Trait Documentation

/// Primary trait for supervised learning estimators.
///
/// Estimators implement fit/predict/score following sklearn conventions.
/// Models that implement this trait can be used interchangeably in pipelines,
/// cross-validation, and hyperparameter tuning.
///
/// # Required Methods
///
/// - `fit()`: Train the model on labeled data
/// - `predict()`: Make predictions on new data
/// - `score()`: Evaluate model performance
///
/// # Examples
///
/// ```
/// use aprender::prelude::*;
///
/// fn train_and_evaluate<E: Estimator>(mut estimator: E) -> f32 {
///     let x = Matrix::from_vec(4, 1, vec![1.0, 2.0, 3.0, 4.0]).unwrap();
///     let y = Vector::from_slice(&[3.0, 5.0, 7.0, 9.0]);
///
///     estimator.fit(&x, &y).unwrap();
///     estimator.score(&x, &y)
/// }
///
/// // Works with any Estimator
/// let model = LinearRegression::new();
/// let r2 = train_and_evaluate(model);
/// assert!(r2 > 0.99);
/// ```
pub trait Estimator {
    /// Fits the model to training data.
    fn fit(&mut self, x: &Matrix<f32>, y: &Vector<f32>) -> Result<()>;

    /// Predicts target values for input data.
    fn predict(&self, x: &Matrix<f32>) -> Vector<f32>;

    /// Computes the score (R² for regression, accuracy for classification).
    fn score(&self, x: &Matrix<f32>, y: &Vector<f32>) -> f32;
}

Location: src/traits.rs:8-44

Elements:

  1. Summary: Purpose of the trait
  2. Context: When to implement, design philosophy
  3. Required Methods: List and explain each method
  4. Examples: Generic function using the trait

Doctests

Doctests are executable examples in documentation:

Basic Doctest

/// Computes the dot product of two vectors.
///
/// # Examples
///
/// ```
/// use aprender::primitives::Vector;
///
/// let a = Vector::from_slice(&[1.0, 2.0, 3.0]);
/// let b = Vector::from_slice(&[4.0, 5.0, 6.0]);
///
/// let dot = a.dot(&b);
/// assert_eq!(dot, 32.0);  // 1*4 + 2*5 + 3*6 = 32
/// ```
pub fn dot(&self, other: &Vector<f32>) -> f32 {
    // Implementation...
}

Run doctests:

cargo test --doc              # Run all doctests
cargo test --doc -- linear    # Run doctests containing "linear"

Doctest Attributes

/// Saves model to disk.
///
/// # Examples
///
/// ```no_run
/// # use aprender::prelude::*;
/// let model = LinearRegression::new();
/// model.save("model.bin").unwrap();  // Don't actually write file during test
/// ```

Common attributes:

  • no_run: Compile but don't execute (for I/O operations)
  • ignore: Skip this doctest entirely
  • should_panic: Expect the code to panic

Hidden Lines in Doctests

/// Computes R² score.
///
/// # Examples
///
/// ```
/// # use aprender::prelude::*;
/// # let x = Matrix::from_vec(2, 1, vec![1.0, 2.0]).unwrap();
/// # let y = Vector::from_slice(&[3.0, 5.0]);
/// # let mut model = LinearRegression::new();
/// # model.fit(&x, &y).unwrap();
/// let score = model.score(&x, &y);
/// assert!(score > 0.99);
/// ```

Lines starting with # are hidden in rendered docs but executed in tests.

Use for:

  • Imports (use aprender::prelude::*;)
  • Setup code (creating test data)
  • Boilerplate that distracts from the example

Documentation Patterns

Pattern 1: Progressive Disclosure

Start simple, add complexity gradually:

/// K-Means clustering algorithm.
///
/// # Basic Example
///
/// ```
/// use aprender::prelude::*;
///
/// let data = Matrix::from_vec(4, 2, vec![
///     0.0, 0.0,
///     0.1, 0.1,
///     10.0, 10.0,
///     10.1, 10.1,
/// ]).unwrap();
///
/// let mut kmeans = KMeans::new(2);
/// kmeans.fit(&data).unwrap();
/// ```
///
/// # Advanced: Hyperparameter Tuning
///
/// ```
/// # use aprender::prelude::*;
/// # let data = Matrix::from_vec(4, 2, vec![0.0; 8]).unwrap();
/// let mut kmeans = KMeans::new(3)
///     .with_max_iter(500)
///     .with_tol(1e-6)
///     .with_random_state(42);
///
/// kmeans.fit(&data).unwrap();
/// let inertia = kmeans.inertia();
/// ```

Pattern 2: Show Both Success and Failure

/// Loads a model from disk.
///
/// # Examples
///
/// ## Success
///
/// ```no_run
/// # use aprender::prelude::*;
/// let model = LinearRegression::load("model.bin").unwrap();
/// let predictions = model.predict(&x);
/// ```
///
/// ## Handling Errors
///
/// ```no_run
/// # use aprender::prelude::*;
/// match LinearRegression::load("model.bin") {
///     Ok(model) => println!("Loaded successfully"),
///     Err(e) => eprintln!("Failed to load: {}", e),
/// }
/// ```
/// Splits data into training and test sets.
///
/// See also:
/// - [`KFold`] for cross-validation splits
/// - [`cross_validate`] for complete cross-validation
///
/// [`KFold`]: crate::model_selection::KFold
/// [`cross_validate`]: crate::model_selection::cross_validate

Use intra-doc links to help users discover related functionality.

Common Documentation Pitfalls

Pitfall 1: Outdated Examples

// ❌ Example doesn't compile - API changed
/// # Examples
///
/// ```
/// let model = LinearRegression::new(true);  // Constructor signature changed!
/// model.train(&x, &y);  // Method renamed to fit()!
/// ```

Prevention: Run cargo test --doc regularly. Doctests prevent documentation rot.

Pitfall 2: Missing Imports

// ❌ Example won't compile - missing imports
/// ```
/// let model = LinearRegression::new();  // Where does this come from?
/// ```

Fix:

// ✅ Show imports
/// ```
/// use aprender::prelude::*;
///
/// let model = LinearRegression::new();
/// ```

Pitfall 3: Incomplete Examples

// ❌ Example doesn't show how to use the result
/// ```
/// let model = LinearRegression::new();
/// model.fit(&x, &y).unwrap();
/// // Now what?
/// ```

Fix:

// ✅ Complete workflow
/// ```
/// # use aprender::prelude::*;
/// # let x = Matrix::from_vec(2, 1, vec![1.0, 2.0]).unwrap();
/// # let y = Vector::from_slice(&[3.0, 5.0]);
/// let mut model = LinearRegression::new();
/// model.fit(&x, &y).unwrap();
///
/// // Make predictions
/// let predictions = model.predict(&x);
///
/// // Evaluate
/// let r2 = model.score(&x, &y);
/// println!("R² = {}", r2);
/// ```

Pitfall 4: No Motivation

// ❌ Doesn't explain *why* you'd use this
/// Sets the tolerance parameter.
pub fn with_tolerance(mut self, tol: f32) -> Self { }

Fix:

// ✅ Explains purpose and impact
/// Sets the convergence tolerance.
///
/// Smaller values lead to more accurate solutions but require more iterations.
/// Larger values converge faster but may be less precise.
///
/// Default: 1e-4 (good for most use cases)
///
/// # Examples
///
/// ```
/// # use aprender::cluster::KMeans;
/// // High precision (slower)
/// let kmeans = KMeans::new(3).with_tol(1e-8);
///
/// // Fast convergence (less precise)
/// let kmeans = KMeans::new(3).with_tol(1e-2);
/// ```
pub fn with_tolerance(mut self, tol: f32) -> Self { }

Pitfall 5: Assuming Knowledge

// ❌ Uses jargon without explanation
/// Uses k-means++ initialization with Lloyd's algorithm.

Fix:

// ✅ Explains concepts
/// Initializes centroids using k-means++ (smart initialization that spreads
/// centroids apart) then runs Lloyd's algorithm (iteratively assign points
/// to nearest centroid and recompute centroids).

Documentation Checklist

Before merging code, verify:

  • Module has //! documentation with example
  • All public types have /// documentation
  • All public functions have:
    • Summary line
    • Example (that compiles and runs)
    • # Errors section (if returns Result)
    • # Panics section (if can panic)
    • # Arguments section (for complex parameters)
  • Doctests compile and pass (cargo test --doc)
  • Examples show complete workflow (imports, setup, usage)
  • Links to related items (traits, types, functions)
  • Performance notes (for algorithms and hot paths)

Tools

Generate Documentation

# Generate docs for your crate only (no dependencies)
cargo doc --no-deps --open

# Include private items (for internal docs)
cargo doc --document-private-items

# Check for broken links
cargo doc --no-deps 2>&1 | grep "warning: unresolved link"

Test Documentation

# Run all doctests
cargo test --doc

# Run specific doctest
cargo test --doc -- linear_regression

# Show doctest output
cargo test --doc -- --nocapture

Documentation Coverage

# Check which items lack documentation (requires nightly)
cargo +nightly rustdoc -- -Z unstable-options --show-coverage

Summary

Good documentation is code—it must be maintained, tested, and refactored:

Key principles:

  1. Executable examples: Use doctests to prevent documentation rot
  2. Progressive disclosure: Start simple, add complexity
  3. Complete workflows: Show imports, setup, and usage
  4. Explain why: Motivation, trade-offs, when to use
  5. Consistent structure: Follow standard sections (Args, Returns, Errors, Examples)
  6. Link related items: Help users discover functionality
  7. Test regularly: cargo test --doc catches broken examples

Documentation sections (in order):

  1. Summary (one sentence)
  2. Details (algorithm, approach)
  3. Arguments
  4. Returns
  5. Errors
  6. Panics
  7. Examples
  8. Performance

Real-world examples:

  • src/lib.rs:1-47 - Module-level documentation with Quick Start
  • src/linear_model/mod.rs:13-62 - Struct documentation with math and examples
  • src/traits.rs:8-44 - Trait documentation with generic examples
  • src/error.rs:7-78 - Enum documentation with variant descriptions

Tools:

  • cargo doc --no-deps --open - Generate and view documentation
  • cargo test --doc - Run doctests to verify examples
  • # hidden lines - Hide boilerplate while keeping tests complete

Documentation is not an afterthought—it's an essential part of your API that ensures your code is usable, maintainable, and discoverable.