Case Study: Normal-InverseGamma Bayesian Inference

This case study demonstrates Bayesian inference for continuous data with unknown mean and variance using the Normal-InverseGamma conjugate family. We cover four practical scenarios: manufacturing quality control, medical data analysis, sequential learning, and prior comparison.

Overview

The Normal-InverseGamma conjugate family is fundamental for Bayesian inference on normally distributed data with both parameters unknown:

  • Prior: Normal-InverseGamma(μ₀, κ₀, α₀, β₀) for (μ, σ²)
  • Likelihood: Normal(μ, σ²) for continuous observations
  • Posterior: Normal-InverseGamma with closed-form parameter updates

This hierarchical structure models:

  • σ² ~ InverseGamma(α, β) - variance prior
  • μ | σ² ~ Normal(μ₀, σ²/κ) - conditional mean prior

This enables exact bivariate Bayesian inference without numerical integration.
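
Given n observations with sample mean x̄ and sum of squared deviations S = Σ(xᵢ − x̄)², the closed-form updates are:

  • κₙ = κ₀ + n
  • μₙ = (κ₀μ₀ + n·x̄)/(κ₀ + n)
  • αₙ = α₀ + n/2
  • βₙ = β₀ + S/2 + κ₀·n·(x̄ − μ₀)²/(2(κ₀ + n))

These are the standard textbook updates; as a minimal standalone sketch (illustrative only, not aprender's internal code):

// Textbook Normal-InverseGamma conjugate update; illustrative only.
fn nig_update(
    (mu0, kappa0, alpha0, beta0): (f64, f64, f64, f64),
    data: &[f64],
) -> (f64, f64, f64, f64) {
    let n = data.len() as f64; // assumes at least one observation
    let xbar = data.iter().sum::<f64>() / n;
    let ss: f64 = data.iter().map(|x| (x - xbar).powi(2)).sum();

    let kappa_n = kappa0 + n;
    let mu_n = (kappa0 * mu0 + n * xbar) / kappa_n;
    let alpha_n = alpha0 + n / 2.0;
    let beta_n = beta0 + ss / 2.0 + kappa0 * n * (xbar - mu0).powi(2) / (2.0 * kappa_n);
    (mu_n, kappa_n, alpha_n, beta_n)
}

With the Example 4 data and priors below, these formulas reproduce the tabulated posteriors exactly.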

Running the Example

cargo run --example normal_inverse_gamma_inference

Expected output: Four demonstrations showing prior specification, bivariate posterior updating, credible intervals for both parameters, and sequential learning.

Example 1: Manufacturing Quality Control

Problem

You're manufacturing precision parts with target diameter 10.0mm. Over a production run, you measure 10 parts: [9.98, 10.02, 9.97, 10.03, 10.01, 9.99, 10.04, 9.96, 10.00, 10.02] mm.

Is the manufacturing process on-target? What is the process precision (standard deviation)?

Solution

use aprender::bayesian::NormalInverseGamma;

// Weakly informative prior centered on target
// μ₀ = 10.0 (target), κ₀ = 1.0 (low confidence)
// α₀ = 3.0, β₀ = 0.02 (weak prior for variance)
let mut model = NormalInverseGamma::new(10.0, 1.0, 3.0, 0.02)
    .expect("Valid parameters");

println!("Prior:");
println!("  E[μ] = {:.4} mm", 10.0);
println!("  E[σ²] = {:.6} mm²", 0.02 / (3.0 - 1.0));  // β/(α-1) = 0.01

// Update with observed measurements
let measurements = vec![9.98, 10.02, 9.97, 10.03, 10.01, 9.99, 10.04, 9.96, 10.00, 10.02];
model.update(&measurements);

let mean_mu = model.posterior_mean_mu();  // E[μ|D] ≈ 10.002
let mean_var = model.posterior_mean_variance().unwrap();  // E[σ²|D] ≈ 0.0033
let std_dev = mean_var.sqrt();  // E[σ|D] ≈ 0.058

Posterior Statistics

use aprender::bayesian::NormalInverseGamma;

// Assume model is already updated with data
let mut model = NormalInverseGamma::new(10.0, 1.0, 3.0, 0.02).expect("Valid parameters");
let measurements = vec![9.98, 10.02, 9.97, 10.03, 10.01, 9.99, 10.04, 9.96, 10.00, 10.02];
model.update(&measurements);

// Posterior mean of μ (location parameter)
let mean_mu = model.posterior_mean_mu();  // 10.002 mm

// Posterior mean of σ² (variance parameter)
let mean_var = model.posterior_mean_variance().unwrap();  // 0.0033 mm²
let std_dev = mean_var.sqrt();  // 0.058 mm

// Posterior variance of μ (uncertainty about mean)
let var_mu = model.posterior_variance_mu().unwrap();  // Var(μ|D) = βₙ/(κₙ(αₙ − 1))

// 95% credible interval for μ
let (lower, upper) = model.credible_interval_mu(0.95).unwrap();
// [9.97, 10.04] mm

// Posterior predictive for next measurement
let predicted = model.posterior_predictive();  // E[x_new | D] = mean_mu

Interpretation

Posterior mean μ (10.002mm): The process mean is very close to the 10.0mm target.

Credible interval [9.97, 10.04]: We are 95% confident the true mean diameter is between 9.97mm and 10.04mm. Since the target (10.0mm) falls within this interval, the process is on-target.

Standard deviation (0.058mm): The manufacturing process has good precision with σ ≈ 0.058mm. For ±3σ coverage, parts will range from 9.83mm to 10.17mm.

Practical Application

Process capability: With a 6σ spread of 0.348mm against a typical tolerance of ±0.1mm (0.2mm total), the capability index Cp = 0.2/0.348 ≈ 0.57 falls well below 1: either the process needs tightening or the tolerance specification is too strict.

Quality control: Parts outside [mean - 3σ, mean + 3σ] = [9.83, 10.17] should be investigated as potential outliers.
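
As a sketch, the control check can be wired directly to the posterior. The constructor, update, and posterior getters are the aprender calls shown earlier; the loop and its hypothetical measurements are illustrative:

use aprender::bayesian::NormalInverseGamma;

let mut model = NormalInverseGamma::new(10.0, 1.0, 3.0, 0.02).expect("Valid parameters");
let measurements = vec![9.98, 10.02, 9.97, 10.03, 10.01, 9.99, 10.04, 9.96, 10.00, 10.02];
model.update(&measurements);

// 3σ control limits from the posterior estimates
let mean = model.posterior_mean_mu();
let sigma = model.posterior_mean_variance().unwrap().sqrt();
let (lcl, ucl) = (mean - 3.0 * sigma, mean + 3.0 * sigma); // ≈ [9.83, 10.17]

// Hypothetical incoming parts checked against the limits
for part in [10.01_f64, 9.80, 10.05] {
    if part < lcl || part > ucl {
        println!("{part:.2} mm: outside control limits - investigate");
    } else {
        println!("{part:.2} mm: within limits");
    }
}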

Example 2: Medical Data Analysis

Problem

You're monitoring two patients' blood pressure (systolic BP in mmHg):

  • Patient A: [118, 122, 120, 119, 121, 120, 118, 122] mmHg
  • Patient B: [135, 142, 138, 145, 140, 137, 143, 139] mmHg

Does Patient B have significantly higher BP? Which patient has more variable BP?

Solution

use aprender::bayesian::NormalInverseGamma;

// Patient A
let patient_a = vec![118.0, 122.0, 120.0, 119.0, 121.0, 120.0, 118.0, 122.0];
let mut model_a = NormalInverseGamma::noninformative();
model_a.update(&patient_a);

let mean_a = model_a.posterior_mean_mu();  // 120.0 mmHg
let (lower_a, upper_a) = model_a.credible_interval_mu(0.95).unwrap();
// 95% CI: [118.4, 121.6]
let var_a = model_a.posterior_mean_variance().unwrap();  // 5.4 mmHg²

// Patient B
let patient_b = vec![135.0, 142.0, 138.0, 145.0, 140.0, 137.0, 143.0, 139.0];
let mut model_b = NormalInverseGamma::noninformative();
model_b.update(&patient_b);

let mean_b = model_b.posterior_mean_mu();  // 139.9 mmHg
let (lower_b, upper_b) = model_b.credible_interval_mu(0.95).unwrap();
// 95% CI: [137.1, 142.7]
let var_b = model_b.posterior_mean_variance().unwrap();  // 16.1 mmHg²

Decision Rules

Mean comparison:

use aprender::bayesian::NormalInverseGamma;

// Setup from previous example
let patient_a = vec![118.0, 122.0, 120.0, 119.0, 121.0, 120.0, 118.0, 122.0];
let mut model_a = NormalInverseGamma::noninformative();
model_a.update(&patient_a);
let (lower_a, upper_a) = model_a.credible_interval_mu(0.95).unwrap();

let patient_b = vec![135.0, 142.0, 138.0, 145.0, 140.0, 137.0, 143.0, 139.0];
let mut model_b = NormalInverseGamma::noninformative();
model_b.update(&patient_b);
let (lower_b, upper_b) = model_b.credible_interval_mu(0.95).unwrap();

if lower_b > upper_a {
    println!("Patient B has significantly higher BP (95% confidence)");
} else if lower_a > upper_b {
    println!("Patient A has significantly higher BP (95% confidence)");
} else {
    println!("Credible intervals overlap - no clear difference");
}
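
The overlap rule is conservative: intervals can overlap even when P(μ_B > μ_A) is high. That probability can be estimated directly by sampling each posterior. A self-contained Monte Carlo sketch, assuming the rand and rand_distr crates and re-deriving posterior parameters with the textbook updates from the Overview (the NIG(0, 0.001, 1, 1) prior is an assumed stand-in for noninformative(), whose exact parameters are not documented here):

use rand::Rng;
use rand_distr::{Distribution, Gamma, Normal};

// Textbook NIG update (see Overview); illustrative only.
fn nig_update(p: (f64, f64, f64, f64), data: &[f64]) -> (f64, f64, f64, f64) {
    let (mu0, kappa0, alpha0, beta0) = p;
    let n = data.len() as f64;
    let xbar = data.iter().sum::<f64>() / n;
    let ss: f64 = data.iter().map(|x| (x - xbar).powi(2)).sum();
    let kappa_n = kappa0 + n;
    (
        (kappa0 * mu0 + n * xbar) / kappa_n,
        kappa_n,
        alpha0 + n / 2.0,
        beta0 + ss / 2.0 + kappa0 * n * (xbar - mu0).powi(2) / (2.0 * kappa_n),
    )
}

// Draw one μ from the posterior: σ² ~ InverseGamma(αₙ, βₙ), sampled as
// 1 / Gamma(shape = αₙ, scale = 1/βₙ), then μ | σ² ~ Normal(μₙ, √(σ²/κₙ)).
fn sample_mu<R: Rng>(post: (f64, f64, f64, f64), rng: &mut R) -> f64 {
    let (mu_n, kappa_n, alpha_n, beta_n) = post;
    let var = 1.0 / Gamma::new(alpha_n, 1.0 / beta_n).unwrap().sample(rng);
    Normal::new(mu_n, (var / kappa_n).sqrt()).unwrap().sample(rng)
}

fn main() {
    let a = vec![118.0, 122.0, 120.0, 119.0, 121.0, 120.0, 118.0, 122.0];
    let b = vec![135.0, 142.0, 138.0, 145.0, 140.0, 137.0, 143.0, 139.0];
    let prior = (0.0, 1e-3, 1.0, 1.0); // assumed stand-in for noninformative()
    let (post_a, post_b) = (nig_update(prior, &a), nig_update(prior, &b));

    let mut rng = rand::thread_rng();
    let draws = 100_000;
    let wins = (0..draws)
        .filter(|_| sample_mu(post_b, &mut rng) > sample_mu(post_a, &mut rng))
        .count();
    println!("P(mu_B > mu_A | data) ≈ {:.4}", wins as f64 / draws as f64);
}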

Variability comparison:

use aprender::bayesian::NormalInverseGamma;

// Setup from previous example
let patient_a = vec![118.0, 122.0, 120.0, 119.0, 121.0, 120.0, 118.0, 122.0];
let mut model_a = NormalInverseGamma::noninformative();
model_a.update(&patient_a);
let var_a = model_a.posterior_mean_variance().unwrap();

let patient_b = vec![135.0, 142.0, 138.0, 145.0, 140.0, 137.0, 143.0, 139.0];
let mut model_b = NormalInverseGamma::noninformative();
model_b.update(&patient_b);
let var_b = model_b.posterior_mean_variance().unwrap();

if var_b > 2.0 * var_a {
    println!("Patient B shows {:.1}x higher BP variability", var_b / var_a);
    println!("High variability may indicate cardiovascular instability.");
}

Interpretation

Output: "Patient B has significantly higher BP than Patient A (95% confidence)"

The credible intervals do NOT overlap: [118.4, 121.6] for A and [137.1, 142.7] for B. Patient B's minimum plausible BP (137.1) exceeds Patient A's maximum (121.6), indicating a clinically significant difference.

Variability: Patient B shows 3.0× higher variance (16.1 vs 5.4 mmHg²), suggesting BP instability that may require medical attention beyond the elevated mean.

Clinical Significance

  • Patient A: Normal BP (120 mmHg) with stable readings
  • Patient B: Stage 2 hypertension (140 mmHg) with high variability
  • Recommendation: Patient B requires immediate intervention (medication, lifestyle changes)

Example 3: Sequential Learning

Problem

Demonstrate how uncertainty about both mean and variance decreases with sequential sensor calibration data.

Solution

Collect temperature readings in batches (true temperature: 25.0°C):

use aprender::bayesian::NormalInverseGamma;

let mut model = NormalInverseGamma::noninformative();

let experiments = vec![
    vec![25.2, 24.8, 25.1, 24.9, 25.0],               // 5 readings
    vec![25.3, 24.7, 25.2, 24.8, 25.1],               // 5 more
    vec![25.0, 25.1, 24.9, 25.2, 24.8, 25.0],         // 6 more
    vec![25.1, 24.9, 25.0, 25.2, 24.8, 25.1, 25.0],  // 7 more
    vec![25.0, 25.1, 24.9, 25.0, 25.2, 24.8, 25.1, 25.0], // 8 more
];

for batch in experiments {
    model.update(&batch);
    let mean = model.posterior_mean_mu();
    let var_mu = model.posterior_variance_mu().unwrap();
    let (lower, upper) = model.credible_interval_mu(0.95).unwrap();
    // Print statistics...
}

Results

Readings | E[μ] (°C) | Var(μ) | E[σ²] (°C²) | 95% CI Width (°C)
-------- | --------- | ------ | ----------- | -----------------
5        | 24.995    | 0.0484 | 0.2421      | 0.8625
10       | 25.008    | 0.0125 | 0.1245      | 0.4374
16       | 25.005    | 0.0049 | 0.0783      | 0.2743
23       | 25.008    | 0.0025 | 0.0574      | 0.1958
31       | 25.009    | 0.0015 | 0.0453      | 0.1499

Interpretation

Observation 1: Posterior mean E[μ] converges to true value (25.0°C)

Observation 2: Variance of mean Var(μ) decreases inversely with sample size

For the Normal-InverseGamma posterior: Var(μ | D) = βₙ/(κₙ(αₙ − 1))

As α and κ increase with data, Var(μ) decreases approximately as 1/n.
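
Equivalently, Var(μ | D) = E[σ² | D]/κₙ, since E[σ² | D] = βₙ/(αₙ − 1); with κₙ = κ₀ + n this gives the 1/n decay. Every row of the table satisfies the identity with κₙ ≈ n: at n = 31, 0.0453/31 ≈ 0.0015, matching the tabulated Var(μ).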

Observation 3: Estimate of σ² becomes more precise

E[σ²] decreases from 0.24 (n=5) to 0.045 (n=31), converging to the true sensor noise level.

Observation 4: Credible interval width shrinks roughly as 1/√n

The 95% CI width drops from 0.86°C (n=5) to 0.15°C (n=31), reflecting increased certainty.

Practical Application

Sensor calibration: After 31 readings, we know the sensor's mean bias (0.009°C above true) and noise level (σ ≈ 0.21°C) with high precision.

Anomaly detection: Future readings outside [24.58, 25.44]°C (mean ± 2σ at n=31) should trigger recalibration.
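
A sketch of the recalibration trigger, assuming `model` holds the posterior after the 31 readings from the loop above:

// Assume `model` is the NormalInverseGamma posterior after all 31 readings.
let mean = model.posterior_mean_mu();                        // ≈ 25.009
let sigma = model.posterior_mean_variance().unwrap().sqrt(); // ≈ 0.213

let reading = 25.6; // hypothetical new sensor reading
if (reading - mean).abs() > 2.0 * sigma {
    println!("{reading:.1}°C is outside mean ± 2σ - recalibrate the sensor");
}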

Example 4: Prior Comparison

Problem

Demonstrate how different priors affect bivariate posterior inference with limited data.

Solution

Same data ([22.1, 22.5, 22.3, 22.7, 22.4]°C), three different priors:

use aprender::bayesian::NormalInverseGamma;

let measurements = vec![22.1, 22.5, 22.3, 22.7, 22.4];

// 1. Noninformative Prior NIG(0, 1, 1, 1)
let mut noninformative = NormalInverseGamma::noninformative();
noninformative.update(&measurements);
// E[μ] = 22.40°C, E[σ²] = 0.23°C²

// 2. Weakly Informative Prior NIG(22, 1, 3, 2) [μ ≈ 22, σ² ≈ 1]
let mut weak = NormalInverseGamma::new(22.0, 1.0, 3.0, 2.0).unwrap();
weak.update(&measurements);
// E[μ] = 22.33°C, E[σ²] = 0.48°C²

// 3. Informative Prior NIG(20, 10, 10, 5) [strong μ = 20, σ² ≈ 0.56]
let mut informative = NormalInverseGamma::new(20.0, 10.0, 10.0, 5.0).unwrap();
informative.update(&measurements);
// E[μ] = 20.80°C, E[σ²] = 1.28°C²

Results

Prior Type     | Prior NIG(μ₀, κ₀, α₀, β₀) | Posterior E[μ] | Posterior E[σ²]
-------------- | ------------------------- | -------------- | ---------------
Noninformative | (0, 1, 1, 1)              | 22.40°C        | 0.23°C²
Weak           | (22, 1, 3, 2)             | 22.33°C        | 0.48°C²
Informative    | (20, 10, 10, 5)           | 20.80°C        | 1.28°C²

Interpretation

Weak priors (noninformative and weakly informative): The posterior mean lands at 22.33-22.40°C, essentially the sample mean of 22.4°C. Posterior E[σ²] (0.23-0.48°C²) still exceeds the sample variance (≈ 0.05°C²) because with only n = 5 observations even a weak variance prior carries real weight.

Strong prior (NIG(20, 10, 10, 5)): Posterior pulled strongly toward prior belief (μ = 20°C vs data mean = 22.4°C)

The informative prior has effective sample size κ₀ = 10 for the mean and 2α₀ = 20 for the variance. With only 5 new observations, the prior dominates, pulling E[μ] from 22.4°C down to 20.8°C.
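
The pull on the mean is exactly the precision-weighted average from the update rule:

  E[μ | D] = (κ₀μ₀ + n·x̄)/(κ₀ + n) = (10 × 20.0 + 5 × 22.4)/(10 + 5) = 312/15 = 20.8°C

The prior's 10 pseudo-observations outvote the 5 real measurements two to one.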

When to Use Strong Priors

Use informative priors for μ when:

  • Calibrating instruments with known reference standards
  • Manufacturing processes with historical mean specifications
  • Medical baselines from large population studies

Use informative priors for σ² when:

  • Equipment with known precision specifications
  • Process capability studies with historical variance data
  • Measurement devices with manufacturer-specified accuracy

Avoid informative priors when:

  • Exploring novel systems with no historical data
  • Prior assumptions may be biased or outdated
  • Stakeholders require purely "data-driven" decisions

Prior Sensitivity Analysis

  1. Run inference with noninformative prior NIG(0, 1, 1, 1)
  2. Run inference with domain-informed prior (e.g., historical mean/variance)
  3. If posteriors differ substantially, collect more data until convergence (sketched below)
  4. With sufficient data (n > 30), all reasonable priors converge (Bernstein-von Mises theorem)
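
A minimal sketch of this workflow, reusing the Example 4 data (the 0.1°C convergence tolerance is an arbitrary illustration):

use aprender::bayesian::NormalInverseGamma;

let data = vec![22.1, 22.5, 22.3, 22.7, 22.4];

// Step 1: noninformative baseline
let mut baseline = NormalInverseGamma::noninformative();
baseline.update(&data);

// Step 2: domain-informed prior (historical mean 22.0°C, moderate confidence)
let mut informed = NormalInverseGamma::new(22.0, 1.0, 3.0, 2.0).unwrap();
informed.update(&data);

// Step 3: compare posteriors; 0.1°C is an arbitrary illustrative tolerance
let gap = (baseline.posterior_mean_mu() - informed.posterior_mean_mu()).abs();
if gap > 0.1 {
    println!("Posteriors differ by {gap:.2}°C - collect more data");
} else {
    println!("Prior choice is immaterial at this sample size");
}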

Key Takeaways

1. Bivariate conjugate prior for (μ, σ²)

  • Hierarchical structure: σ² ~ InverseGamma, μ | σ² ~ Normal
  • Closed-form posterior updates for both parameters
  • No MCMC required

2. Credible intervals quantify uncertainty

  • Separate intervals for μ and σ²
  • Width shrinks roughly as 1/√n as data accumulates
  • Can construct joint credible regions (ellipses) for (μ, σ²)

3. Sequential updating is natural

  • Each posterior becomes next prior
  • Order-independent (commutativity; see the check below)
  • Ideal for online learning (sensor monitoring, quality control)
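
A quick check of order-independence, reusing the first two Example 3 batches:

use aprender::bayesian::NormalInverseGamma;

let batch1 = vec![25.2, 24.8, 25.1, 24.9, 25.0];
let batch2 = vec![25.3, 24.7, 25.2, 24.8, 25.1];

// Update in one order...
let mut forward = NormalInverseGamma::noninformative();
forward.update(&batch1);
forward.update(&batch2);

// ...and in the reverse order.
let mut reverse = NormalInverseGamma::noninformative();
reverse.update(&batch2);
reverse.update(&batch1);

// Same posterior either way, up to floating-point rounding.
assert!((forward.posterior_mean_mu() - reverse.posterior_mean_mu()).abs() < 1e-9);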

4. Prior choice affects both parameters

  • κ₀: effective sample size for mean belief
  • α₀, β₀: shape and scale of the InverseGamma variance prior
  • Always perform sensitivity analysis with small n

5. Practical applications

  • Manufacturing: process mean and precision monitoring
  • Medical: patient population mean and variability
  • Sensors: bias (mean) and noise (variance) estimation

6. Advantages over frequentist methods

  • Direct probability statements: "95% confident μ ∈ [9.97, 10.04]"
  • Natural handling of small samples (no asymptotic approximations)
  • Coherent framework for sequential testing
