Case Study: Beta-Binomial Bayesian Inference

This case study demonstrates Bayesian inference for binary outcomes using conjugate priors. We cover four practical scenarios: coin flip inference, A/B testing, sequential learning, and prior comparison.

Overview

The Beta-Binomial conjugate family is the foundation of Bayesian inference for binary data:

  • Prior: Beta(α, β) distribution over probability parameter θ ∈ [0, 1]
  • Likelihood: Binomial(n, θ) for k successes in n trials
  • Posterior: Beta(α + k, β + n - k) with closed-form update

This enables exact Bayesian inference without numerical integration.
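
The entire update is two additions. A minimal, library-independent sketch (the conjugate_update helper below is illustrative, not part of aprender):

// Beta(a, b) prior + k successes in n trials -> Beta(a + k, b + n - k)
fn conjugate_update(alpha: f64, beta: f64, k: u64, n: u64) -> (f64, f64) {
    assert!(k <= n);
    (alpha + k as f64, beta + (n - k) as f64)
}

// Uniform prior Beta(1, 1) plus 7 successes in 10 trials:
let (a, b) = conjugate_update(1.0, 1.0, 7, 10);
assert_eq!((a, b), (8.0, 4.0)); // Beta(8, 4)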

Running the Example

cargo run --example beta_binomial_inference

Expected output: Four demonstrations showing prior specification, posterior updating, credible intervals, and sequential learning.

Example 1: Coin Flip Inference

Problem

You flip a coin 10 times and observe 7 heads. What is the probability that this coin is fair (θ = 0.5)?

Solution

use aprender::bayesian::BetaBinomial;

// Start with uniform prior Beta(1, 1) = complete ignorance
let mut model = BetaBinomial::uniform();
println!("Prior: Beta({}, {})", model.alpha(), model.beta());
println!("  Prior mean: {:.4}", model.posterior_mean());  // 0.5

// Observe 7 heads in 10 flips
model.update(7, 10);

// Posterior is Beta(1+7, 1+3) = Beta(8, 4)
println!("Posterior: Beta({}, {})", model.alpha(), model.beta());
println!("  Posterior mean: {:.4}", model.posterior_mean());  // 0.6667

Posterior Statistics

// Point estimates
let mean = model.posterior_mean();  // E[θ|D] = 8/12 = 0.6667
let mode = model.posterior_mode().unwrap();  // (8-1)/(12-2) = 0.7
let variance = model.posterior_variance();  // ≈ 0.017

// 95% credible interval
let (lower, upper) = model.credible_interval(0.95).unwrap();
// ≈ [0.41, 0.92] - wide interval due to small sample size

// Posterior predictive
let prob_heads = model.posterior_predictive();  // 0.6667
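
For reference, these point estimates come straight from the Beta(α, β) formulas. A standalone sketch (the beta_summaries helper is hypothetical, shown to make the math explicit rather than as part of the aprender API):

// Summary statistics of Beta(a, b), from the standard formulas
fn beta_summaries(a: f64, b: f64) -> (f64, Option<f64>, f64) {
    let mean = a / (a + b);
    // Mode (a - 1) / (a + b - 2) is only defined for a > 1 and b > 1
    let mode = if a > 1.0 && b > 1.0 {
        Some((a - 1.0) / (a + b - 2.0))
    } else {
        None
    };
    let variance = a * b / ((a + b).powi(2) * (a + b + 1.0));
    (mean, mode, variance)
}

// beta_summaries(8.0, 4.0) -> (0.6667, Some(0.7), ≈0.0171)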

Interpretation

Posterior mean (0.667): Our best estimate is that the coin has a 66.7% chance of heads.

Credible interval [0.41, 0.92]: We are 95% confident that the true probability is between 41% and 92%. This wide interval reflects uncertainty from small sample size.

Posterior predictive (0.667): The probability of heads on the next flip is 66.7%, integrating over all possible values of θ weighted by the posterior.

Is the coin fair?

The credible interval includes 0.5, so we cannot rule out that the coin is fair. With only 10 flips, the data is consistent with a fair coin that happened to land heads 7 times by chance.

Example 2: A/B Testing

Problem

You run an A/B test comparing two website variants:

  • Variant A: 120 conversions out of 1,000 visitors (12% conversion rate)
  • Variant B: 145 conversions out of 1,000 visitors (14.5% conversion rate)

Is Variant B significantly better, or could the difference be due to chance?

Solution

// Variant A: 120 conversions / 1000 visitors
let mut variant_a = BetaBinomial::uniform();
variant_a.update(120, 1000);
let mean_a = variant_a.posterior_mean();  // 0.1208
let (lower_a, upper_a) = variant_a.credible_interval(0.95).unwrap();
// 95% CI: [0.1006, 0.1409]

// Variant B: 145 conversions / 1000 visitors
let mut variant_b = BetaBinomial::uniform();
variant_b.update(145, 1000);
let mean_b = variant_b.posterior_mean();  // 0.1457
let (lower_b, upper_b) = variant_b.credible_interval(0.95).unwrap();
// 95% CI: [0.1239, 0.1675]

Decision Rule

Check if credible intervals overlap:

if lower_b > upper_a {
    println!("✓ Variant B is significantly better (95% confidence)");
} else if lower_a > upper_b {
    println!("✓ Variant A is significantly better (95% confidence)");
} else {
    println!("⚠ No clear winner yet - credible intervals overlap");
    println!("  Consider collecting more data");
}

Interpretation

Output: "No clear winner yet - credible intervals overlap"

The credible intervals overlap: [10.06%, 14.09%] for A and [12.39%, 16.75%] for B. While B appears better (14.57% vs 12.08%), the uncertainty intervals overlap, meaning we cannot conclusively say B is superior.

Recommendation: Collect more data to reduce uncertainty and determine if the 2.5 percentage point difference is real or due to sampling variability.
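
Interval overlap is a conservative criterion. A finer-grained summary is the posterior probability that B beats A, P(θ_B > θ_A | data), which is easy to estimate by sampling both posteriors. A sketch assuming the rand and rand_distr crates (the prob_b_beats_a helper is illustrative, not part of aprender):

use rand::thread_rng;
use rand_distr::{Beta, Distribution};

// Estimate P(theta_b > theta_a) by Monte Carlo over the two posteriors
fn prob_b_beats_a(a: (f64, f64), b: (f64, f64), samples: usize) -> f64 {
    let mut rng = thread_rng();
    let post_a = Beta::new(a.0, a.1).unwrap();
    let post_b = Beta::new(b.0, b.1).unwrap();
    let wins = (0..samples)
        .filter(|_| post_b.sample(&mut rng) > post_a.sample(&mut rng))
        .count();
    wins as f64 / samples as f64
}

// Posteriors from this test: A ~ Beta(121, 881), B ~ Beta(146, 856)
// prob_b_beats_a((121.0, 881.0), (146.0, 856.0), 100_000) ≈ 0.95

A result of roughly 0.95 says B is probably better, but falls short of a decisive threshold such as 0.99, consistent with the recommendation to collect more data.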

Bayesian vs Frequentist

Frequentist approach: Run a two-proportion z-test on these counts: z ≈ 1.65, two-sided p ≈ 0.10. Conclude "not significant at the α = 0.05 level" and report only that binary verdict, which hides how close the call is.

Bayesian advantage:

  • Direct probability statements: "95% confident B's conversion rate is between 12.4% and 16.8%"
  • Can incorporate prior knowledge (e.g., historical conversion rates)
  • Natural stopping rules: collect data until credible intervals separate
  • No p-value misinterpretation (a p-value is NOT the probability that the null hypothesis is true)

Example 3: Sequential Learning

Problem

Demonstrate how uncertainty decreases as we collect more data, even with a consistent underlying success rate.

Solution

Run 5 sequential experiments with true success rate ≈ 77%:

let mut model = BetaBinomial::uniform();

let experiments = vec![
    (7, 10),   // 70% success
    (15, 20),  // 75% success
    (23, 30),  // 76.7% success
    (31, 40),  // 77.5% success
    (77, 100), // 77% success
];

let mut total_trials = 0;
for (successes, trials) in experiments {
    model.update(successes, trials);
    total_trials += trials;

    let mean = model.posterior_mean();
    let variance = model.posterior_variance();
    let (lower, upper) = model.credible_interval(0.95).unwrap();
    let width = upper - lower;

    println!("Trials: {}, Mean: {:.3}, Variance: {:.7}, CI Width: {:.4}",
             total_trials, mean, variance, width);
}

Results

Trials   Successes   Mean    Variance    95% CI Width
10       7           0.667   0.0170940   0.5125
30       22          0.719   0.0061257   0.3068
60       45          0.742   0.0030392   0.2161
100      76          0.755   0.0017964   0.1661
200      153         0.762   0.0008924   0.1171

Interpretation

Observation 1: Posterior mean converges toward the true rate (0.762 after 200 trials vs. ≈ 0.77)

Observation 2: Variance decreases inversely with sample size

For Beta(α, β): Var[θ] = αβ / [(α+β)²(α+β+1)]

As α + β (total count) increases, variance decreases approximately as 1/(α+β).

Observation 3: Credible interval width shrinks like 1/√n

Under a normal approximation, the 95% CI width is roughly 3.92 · √(θ(1−θ)/(α+β)). Accordingly, the width drops from 51% (n=10) to 12% (n=200), reflecting increased certainty.

Practical Application

Early Stopping: If the credible intervals separate during an A/B test, you can stop early and deploy the winner; there is no need for the fixed sample-size planning that frequentist tests require.

Sample Size Planning: Want a 95% CI width below 5%? With θ ≈ 0.77, solving 3.92 · √(θ(1−θ)/n) < 0.05 gives n ≈ 1,100 trials, as the sketch below shows.
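
A rough planning calculation under the normal approximation above (assumes θ ≈ 0.77 as in this example; the trials_for_ci_width helper is illustrative, not part of aprender):

// Trials needed so the ~95% CI width falls below `target_width`
fn trials_for_ci_width(theta: f64, target_width: f64) -> f64 {
    // width ≈ 3.92 * sqrt(theta * (1 - theta) / n)  =>  solve for n
    (3.92 / target_width).powi(2) * theta * (1.0 - theta)
}

// trials_for_ci_width(0.77, 0.05) ≈ 1,089 trials
// trials_for_ci_width(0.77, 0.10) ≈ 273 trials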

Example 4: Prior Comparison

Problem

Demonstrate how different priors affect the posterior with limited data.

Solution

Same data (7 successes in 10 trials), three different priors:

// 1. Uniform Prior Beta(1, 1)
let mut uniform = BetaBinomial::uniform();
uniform.update(7, 10);
// Posterior: Beta(8, 4), mean = 0.6667

// 2. Jeffreys Prior Beta(0.5, 0.5)
let mut jeffreys = BetaBinomial::jeffreys();
jeffreys.update(7, 10);
// Posterior: Beta(7.5, 3.5), mean = 0.6818

// 3. Informative Prior Beta(50, 50) - strong belief that θ ≈ 0.5
let mut informative = BetaBinomial::new(50.0, 50.0).unwrap();
informative.update(7, 10);
// Posterior: Beta(57, 53), mean = 0.5182

Results

Prior Type    Prior           Posterior        Posterior Mean
Uniform       Beta(1, 1)      Beta(8, 4)       0.6667
Jeffreys      Beta(0.5, 0.5)  Beta(7.5, 3.5)   0.6818
Informative   Beta(50, 50)    Beta(57, 53)     0.5182

Interpretation

Weak priors (Uniform, Jeffreys): Posterior dominated by data (mean ≈ 0.67)

Strong prior (Beta(50, 50)): Posterior pulled toward prior belief (51.8% vs 66.7%)

The informative prior Beta(50, 50) encodes a strong belief that θ ≈ 0.5 with effective sample size of 100. With only 10 new observations, the prior dominates, pulling the posterior mean from 0.667 down to 0.518.
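
This shrinkage is exactly a weighted average: the posterior mean can be rewritten as

E[θ|D] = w · (α/(α+β)) + (1 − w) · (k/n),   where w = (α+β)/(α+β+n)

With the Beta(50, 50) prior and k = 7, n = 10: w = 100/110 ≈ 0.909, so the posterior mean is 0.909 · 0.5 + 0.091 · 0.7 ≈ 0.518, matching the table above.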

When to Use Strong Priors

Use informative priors when:

  • You have reliable historical data
  • Expert domain knowledge is available
  • Rare events require regularization
  • Hierarchical learning across related tasks

Avoid informative priors when:

  • No reliable prior knowledge exists
  • Prior assumptions may be wrong
  • Stakeholders require "data-driven" decisions
  • Exploring novel domains

Prior Sensitivity Analysis

Always check robustness:

  1. Run inference with weak prior (Beta(1, 1))
  2. Run inference with strong prior (Beta(50, 50))
  3. If posteriors differ substantially, collect more data until they converge

With enough data, all reasonable priors converge to the same posterior (Bayesian consistency).
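
A minimal version of this check, using the same aprender API as the examples above (values shown are for the 7-in-10 data):

// Same data, weak vs. strong prior
let mut weak = BetaBinomial::uniform();                   // Beta(1, 1)
let mut strong = BetaBinomial::new(50.0, 50.0).unwrap();  // Beta(50, 50)
weak.update(7, 10);
strong.update(7, 10);

// The gap between posterior means measures prior sensitivity
let gap = (weak.posterior_mean() - strong.posterior_mean()).abs();
// gap ≈ 0.149 with only 10 trials; rerun as data accumulates and
// proceed once the gap is negligible for your decision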

Key Takeaways

1. Conjugate priors enable closed-form updates

  • No MCMC or numerical integration required
  • Efficient for real-time sequential updating (online learning)

2. Credible intervals quantify uncertainty

  • Direct probability statements about parameters
  • Width decreases like 1/√n as data accumulates

3. Sequential updating is natural in Bayesian framework

  • Each posterior becomes the next prior
  • Final result is order-independent, as the check below verifies
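
A quick verification using the same API as the examples above:

let mut ab = BetaBinomial::uniform();
ab.update(7, 10);   // batch 1, then batch 2
ab.update(15, 20);

let mut ba = BetaBinomial::uniform();
ba.update(15, 20);  // batch 2, then batch 1
ba.update(7, 10);

// Both orders arrive at Beta(23, 9)
assert_eq!((ab.alpha(), ab.beta()), (ba.alpha(), ba.beta()));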

4. Prior choice matters with small data

  • Weak priors: let data speak
  • Strong priors: incorporate domain knowledge
  • Always perform sensitivity analysis

5. Bayesian A/B testing avoids p-value pitfalls

  • No arbitrary α = 0.05 threshold
  • Natural early stopping rules
  • Direct decision-theoretic framework
