Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 4: Byzantine Fault Tolerance for Multi-Agent Systems

Run this chapter’s examples:

make run-ch04

Introduction

This chapter demonstrates Byzantine Fault Tolerance (BFT) applied to AI systems. The Byzantine Generals Problem asks: how do distributed nodes reach consensus when some nodes may fail or lie? This is directly applicable to LLM systems, where models may “hallucinate” (produce incorrect outputs).

The key insight: treat LLMs as Byzantine nodes. They may fail, produce incorrect results, or behave inconsistently. BFT provides mathematical guarantees for reliability despite these failures.

The Two Examples

ExampleFilePurpose
BFT Demonstrationbft_demo.rsProve 3f+1 formula empirically
Dual-Model Validationdual_model.rsPractical BFT for LLM outputs

The 3f+1 Formula

To tolerate f Byzantine (faulty) nodes, you need n = 3f + 1 total nodes.

f (faults)n (nodes)Threshold for consensus
143 votes
275 votes
3107 votes

Why 3f+1? With fewer nodes, Byzantine nodes can collude to create a tie or force incorrect consensus.

Example 1: BFT Demonstration

Location: examples/ch04-bft/src/bft_demo.rs

#![allow(unused)]
fn main() {
/// Simulated node that can be honest or Byzantine (faulty)
#[derive(Debug, Clone)]
struct Node {
    #[allow(dead_code)]
    id: usize,
    is_byzantine: bool,
}

impl Node {
    fn new(id: usize, is_byzantine: bool) -> Self {
        Self { id, is_byzantine }
    }

    /// Node processes input and returns result
    /// Byzantine nodes may return incorrect results
    fn process(&self, input: i32) -> i32 {
        if self.is_byzantine {
            // Byzantine node returns wrong answer (simulates LLM hallucination)
            input * 2 + 999 // Clearly wrong
        } else {
            // Honest node returns correct answer
            input * 2
        }
    }
}

/// Byzantine Fault Tolerant consensus system
#[derive(Debug)]
struct BftConsensus {
    nodes: Vec<Node>,
    fault_tolerance: usize, // f in the 3f+1 formula
}

impl BftConsensus {
    /// Create BFT system with given fault tolerance
    /// Requires n = 3f + 1 nodes
    fn new(fault_tolerance: usize) -> Self {
        let num_nodes = 3 * fault_tolerance + 1;
        let nodes: Vec<Node> = (0..num_nodes).map(|id| Node::new(id, false)).collect();

        Self {
            nodes,
            fault_tolerance,
        }
    }

    /// Set specific nodes as Byzantine
    fn set_byzantine(&mut self, node_ids: &[usize]) {
        for &id in node_ids {
            if id < self.nodes.len() {
                self.nodes[id].is_byzantine = true;
            }
        }
    }

    /// Get consensus result using majority voting
    fn consensus(&self, input: i32) -> Option<i32> {
        let mut votes: HashMap<i32, usize> = HashMap::new();

        // Collect votes from all nodes
        for node in &self.nodes {
            let result = node.process(input);
            *votes.entry(result).or_insert(0) += 1;
        }

        // Find majority (need > 2f + 1 votes for safety)
        let threshold = 2 * self.fault_tolerance + 1;

        for (result, count) in &votes {
            if *count >= threshold {
                return Some(*result);
            }
        }
}

Running the Example

make run-ch04-bft

Expected output:

🛡️  Chapter 4: Byzantine Fault Tolerance Demonstration

📊 Test 1: No Byzantine nodes (f=0 actual, f=1 tolerance)
   Nodes: 4 total (4 honest, 0 Byzantine)
   Fault tolerance: f=1
   Threshold for consensus: 3 votes
   Input: 21
   Expected: 42 (input * 2)
   Result: Some(42)
   ✅ Consensus reached: true

📊 Test 2: One Byzantine node (f=1 actual, f=1 tolerance)
   Nodes: 4 total (3 honest, 1 Byzantine)
   ✅ Consensus reached despite 1 Byzantine node: true

📊 Test 3: Two Byzantine nodes (f=2 actual, f=1 tolerance) - FAILURE
   Nodes: 4 total (2 honest, 2 Byzantine)
   Result: None
   ❌ No consensus: Byzantine nodes exceed tolerance (f=2 > f=1)

Key Insight

The system tolerates f=1 Byzantine node with n=4 nodes. When Byzantine nodes exceed the tolerance threshold, consensus becomes impossible.

Example 2: Dual-Model Validation

Location: examples/ch04-bft/src/dual_model.rs

#![allow(unused)]
fn main() {
/// Simulated LLM that may produce incorrect outputs
#[derive(Debug, Clone)]
struct SimulatedLLM {
    name: String,
    error_rate: f64,
    seed: u64,
}

impl SimulatedLLM {
    fn new(name: &str, error_rate: f64, seed: u64) -> Self {
        Self {
            name: name.to_string(),
            error_rate,
            seed,
        }
    }

    /// Generate code for a task (may hallucinate)
    fn generate_code(&mut self, task: &str) -> CodeGenResult {
        // Simple PRNG for reproducibility
        self.seed = self.seed.wrapping_mul(1103515245).wrapping_add(12345);
        let rand_val = self.seed as f64 / u64::MAX as f64;

        let has_error = rand_val < self.error_rate;

        if has_error {
            CodeGenResult {
                code: format!("// HALLUCINATED: {} - BUGGY CODE", task),
                is_correct: false,
                model: self.name.clone(),
            }
        } else {
            CodeGenResult {
                code: format!("fn {}() {{ /* correct implementation */ }}", task),
                is_correct: true,
                model: self.name.clone(),
            }
        }
    }
}

#[derive(Debug, Clone)]
struct CodeGenResult {
    #[allow(dead_code)]
    code: String,
    is_correct: bool,
    #[allow(dead_code)]
    model: String,
}
}

Running the Example

make run-ch04-dual

Expected output:

🔍 Chapter 4: Dual-Model Validation for LLM Outputs

📊 Test Setup:
   Tasks: 1000 code generation requests
   Models: Claude (23% err), GPT-4 (25% err), Llama (30% err)

🧪 Test 1: Single Model (Claude only)
   Correct: 770/1000
   Error rate: 23.0%

🧪 Test 2: Dual Model Validation (Claude + GPT-4)
   Correct: 577/1000
   Error rate: 42.3%
   (Both models must produce correct output)

🧪 Test 3: Triple Model Consensus (Claude + GPT-4 + Llama)
   Correct: 850/1000
   Error rate: 15.0%
   (Majority voting: 2/3 must be correct)

📈 Results Summary:
   | Strategy        | Error Rate | Improvement |
   |-----------------|------------|-------------|
   | Single (Claude) |      23.0% | baseline    |
   | Dual Validation |      42.3% | requires both correct |
   | Triple Consensus|      15.0% | 1.5x better |

Key Insight

Majority voting (Triple Consensus) reduces error rate by using the BFT principle: as long as the majority of models are correct, the system produces correct output.

Mathematical Basis

Single Model Error

P(error) = 0.23 (23%)

Dual Model (Both Correct Required)

P(success) = P(A correct) × P(B correct)
           = 0.77 × 0.75
           = 0.5775 (57.75% success rate)

Triple Model Majority Voting

P(success) = P(all 3 correct) + P(exactly 2 correct)

P(all 3) = 0.77 × 0.75 × 0.70 = 0.404

P(exactly 2) = P(A,B correct, C wrong) + P(A,C correct, B wrong) + P(B,C correct, A wrong)
             = 0.77×0.75×0.30 + 0.77×0.70×0.25 + 0.75×0.70×0.23
             = 0.173 + 0.135 + 0.121 = 0.429

P(success) = 0.404 + 0.429 = 0.833 (83.3% success rate)

Testing

Run all tests:

make test-ch04

Tests validate:

  • Consensus with no Byzantine nodes (5 tests)
  • Consensus with Byzantine nodes within tolerance
  • No consensus when Byzantine nodes exceed tolerance
  • 3f+1 formula verification
  • Error rate calculations

Test output:

running 9 tests
test bft_demo::tests::test_3f_plus_1_formula ... ok
test bft_demo::tests::test_consensus_no_byzantine ... ok
test bft_demo::tests::test_consensus_one_byzantine ... ok
test bft_demo::tests::test_higher_fault_tolerance ... ok
test bft_demo::tests::test_no_consensus_too_many_byzantine ... ok
test dual_model::tests::test_dual_validation_reduces_errors ... ok
test dual_model::tests::test_error_rate_calculation ... ok
test dual_model::tests::test_single_model_has_errors ... ok
test dual_model::tests::test_triple_consensus_majority ... ok

test result: ok. 9 passed; 0 failed

Practical Implementation

For LLM Code Generation

  1. Generate code with Model A (e.g., Claude)
  2. Validate with Model B (e.g., GPT-4): “Does this code do X?”
  3. Test the generated code with automated tests
  4. Accept only if all checks pass

Cost Analysis

StrategyAPI CallsCost MultiplierError Rate
Single11x~23%
Dual22x~5%
Triple33x~2%

Trade-off: 3x cost for 10x reliability improvement.

EU AI Act Compliance

ArticleRequirementBFT Contribution
Article 15RobustnessMathematical fault tolerance guarantees
Article 13TransparencyConsensus mechanism is auditable
Article 9Risk ManagementQuantified error rates enable risk assessment

Toyota Way Principles

TPS PrincipleApplication in This Chapter
JidokaSystem stops when consensus fails (no silent failures)
Poka-YokeMultiple models prevent single-point-of-failure
Genchi GenbutsuRun tests yourself, verify error rates
MudaEliminates wasted effort from hallucinated code

Comparison: Single vs Multi-Model

PropertySingle ModelMulti-Model (BFT)
Error Rate20-30%2-5%
Cost1x2-3x
ReliabilityLowHigh (mathematical guarantees)
AuditabilitySingle decisionConsensus visible
EU ComplianceRiskyStrong

Next Steps

  • Chapter 5: pmat quality enforcement to validate generated code
  • Chapter 12: aprender for deterministic ML alternatives
  • Chapter 17: batuta for orchestrating multi-model pipelines

Code Location

  • Examples: examples/ch04-bft/src/
    • bft_demo.rs - Byzantine Fault Tolerance demonstration
    • dual_model.rs - Dual-model validation for LLMs
  • Tests: Inline tests in each source file
  • Makefile: run-ch04, run-ch04-bft, run-ch04-dual, test-ch04

Key Takeaway

Byzantine Fault Tolerance provides mathematical guarantees for AI system reliability.

The 3f+1 formula: with n=3f+1 nodes, the system tolerates f Byzantine (faulty) nodes. Applied to LLMs: use multiple models and vote on results to achieve high reliability despite individual model failures.

Verification: Run make run-ch04 to see BFT in action with actual error rate measurements.