Case Study: Federation Routing Policies

This case study walks through the routing policies the federation gateway uses for distributed ML inference. Each policy evaluates the candidate nodes and contributes to a composite score, and the highest-scoring eligible node handles the request.

Overview

Routing policies answer the question: "Given multiple nodes that can handle this request, which one should we use?"

The federation gateway supports five built-in policies:

Policy     Purpose                        Default Weight
Health     Penalize unhealthy nodes       2.0
Latency    Prefer fast nodes              1.0
Privacy    Enforce data sovereignty       1.0
Locality   Prefer same-region nodes       1.0
Cost       Balance price vs performance   1.0

Running the Example

cargo run -p apr-cli --features inference --example federation_routing

Health Policy

The health policy strongly penalizes unhealthy or degraded nodes:

use apr_cli::federation::policy::HealthPolicy;
use apr_cli::federation::traits::RoutingPolicyTrait;

let policy = HealthPolicy {
    weight: 2.0,           // Double importance
    healthy_score: 1.0,    // Full score for healthy
    degraded_score: 0.3,   // 30% for degraded
};

// Scoring
// Healthy node:  1.0 * 2.0 = 2.0
// Degraded node: 0.3 * 2.0 = 0.6
// Unhealthy:     0.0 * 2.0 = 0.0 (not eligible)

Health States

State       Description                   Score
Healthy     All checks passing            1.0
Degraded    Some issues but operational   0.3-0.5
Unhealthy   Node failing, excluded        0.0
Unknown     No recent health data         0.3
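
The state table and the weighted arithmetic above can be sketched as a small stand-alone function. This is an illustration only, not the gateway's code: the `HealthState` enum and `weighted_health_score` helper are hypothetical names, and the degraded score is assumed to sit at the low end (0.3) of the documented range.

```rust
/// Health states from the table above (hypothetical stand-alone enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum HealthState {
    Healthy,
    Degraded,
    Unhealthy,
    Unknown,
}

/// Returns the weighted health score, or `None` when the node is
/// excluded from routing. Assumption: degraded nodes score 0.3.
fn weighted_health_score(state: HealthState, weight: f64) -> Option<f64> {
    let base = match state {
        HealthState::Healthy => 1.0,
        HealthState::Degraded => 0.3,
        HealthState::Unknown => 0.3,
        // Unhealthy nodes are not eligible, so they are never scored.
        HealthState::Unhealthy => return None,
    };
    Some(base * weight)
}
```

Returning `Option` keeps "excluded" distinct from "scored zero", which matters once scores from several policies are averaged.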

Latency Policy

Scores nodes on a linear scale that decreases as latency approaches max_latency:

use apr_cli::federation::policy::LatencyPolicy;
use std::time::Duration;

let policy = LatencyPolicy {
    weight: 1.0,
    max_latency: Duration::from_secs(5),  // Nodes above this get score 0
};

// Scoring formula: 1.0 - (latency_ms / max_ms)
//
// Example with max_latency = 5000ms:
//   45ms  → 1.0 - (45/5000)   = 0.991
//   120ms → 1.0 - (120/5000)  = 0.976
//   200ms → 1.0 - (200/5000)  = 0.960
//   4000ms → 1.0 - (4000/5000) = 0.200
//   5000ms+ → 0.0 (not eligible)

Eligibility

Nodes with latency exceeding max_latency are excluded from routing:

// This node is NOT eligible
assert!(!policy.is_eligible(&slow_candidate, &request));
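
The scoring formula and the eligibility cutoff can be combined in one function. The sketch below is a stand-alone approximation under stated assumptions: `latency_score` is a hypothetical helper, the policy weight is left out, and a node exactly at `max_latency` is assumed to remain eligible with a score of 0.0.

```rust
use std::time::Duration;

/// Hypothetical stand-alone version of the formula
/// `1.0 - (latency / max_latency)`.
/// Returns `None` when the node exceeds `max_latency` (not eligible).
fn latency_score(latency: Duration, max_latency: Duration) -> Option<f64> {
    // Guard against division by zero and apply the eligibility cutoff.
    if max_latency.is_zero() || latency > max_latency {
        return None;
    }
    Some(1.0 - latency.as_secs_f64() / max_latency.as_secs_f64())
}
```

With `max_latency` set to 5 seconds, a 45 ms node scores about 0.991 and a 4000 ms node about 0.2, matching the worked values above.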

Privacy Policy

Enforces data sovereignty by filtering nodes based on privacy levels:

use apr_cli::federation::policy::PrivacyPolicy;
use apr_cli::federation::traits::{PrivacyLevel, RegionId};

let policy = PrivacyPolicy::default()
    .with_region(RegionId("eu-west-1".to_string()), PrivacyLevel::Confidential)
    .with_region(RegionId("us-east-1".to_string()), PrivacyLevel::Internal)
    .with_region(RegionId("ap-south-1".to_string()), PrivacyLevel::Public);

Privacy Levels

Level          Description        Example Use
Public         No restrictions    Public APIs, demos
Internal       Company data       Internal tools
Confidential   Sensitive data     PII, financial
Restricted     Highest security   Healthcare, government

Eligibility Matrix

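The request's privacy level determines which nodes are eligible. Consistent with the example that follows, a region qualifies only when its level is at least as strict as the request's:

Request        Public Region   Internal Region   Confidential Region
Public         Yes             Yes               Yes
Internal       No              Yes               Yes
Confidential   No              No                Yes

// Request requires confidential handling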
let request = InferenceRequest {
    qos: QoSRequirements {
        privacy: PrivacyLevel::Confidential,
        ..Default::default()
    },
    ..Default::default()
};

// Only eu-west-1 is eligible (Confidential region)
assert!(policy.is_eligible(&eu_candidate, &request));
assert!(!policy.is_eligible(&us_candidate, &request));
assert!(!policy.is_eligible(&ap_candidate, &request));
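
One way to express the eligibility rule implied by these assertions is an ordered enum. The sketch below is an assumption about how the check could work, not the crate's implementation: it redefines a local `PrivacyLevel` and a hypothetical `region_can_serve` helper so that it compiles on its own.

```rust
/// Privacy levels ordered from least to most restrictive.
/// Deriving `PartialOrd`/`Ord` compares variants in declaration order,
/// so Public < Internal < Confidential < Restricted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PrivacyLevel {
    Public,
    Internal,
    Confidential,
    Restricted,
}

/// Assumed rule: a region may serve a request only when the region's
/// level is at least as strict as the level the request asks for.
fn region_can_serve(region: PrivacyLevel, request: PrivacyLevel) -> bool {
    region >= request
}
```

Under this rule a Confidential request is accepted only by Confidential or Restricted regions, while a Public request can go anywhere.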

Locality Policy

Prefers nodes in the same region as the request origin:

use apr_cli::federation::policy::LocalityPolicy;

let policy = LocalityPolicy {
    weight: 1.0,
    same_region_boost: 0.3,      // +30% for same region
    cross_region_penalty: 0.1,   // -10% for cross region
};

// If request originates from us-west-2:
//   us-west node: base + 0.3 = higher score
//   eu-west node: base - 0.1 = lower score
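
The boost and penalty described in the comments can be written out as a plain function. This is a minimal sketch: `LocalityParams` and `locality_adjusted` are hypothetical names, regions are compared as exact strings, and no clamping is applied because the source does not mention any.

```rust
/// Hypothetical stand-in for the two tunables of the locality policy.
struct LocalityParams {
    same_region_boost: f64,
    cross_region_penalty: f64,
}

/// Adds the boost when the node is in the request's origin region,
/// otherwise subtracts the cross-region penalty.
fn locality_adjusted(
    base: f64,
    node_region: &str,
    origin_region: &str,
    params: &LocalityParams,
) -> f64 {
    if node_region == origin_region {
        base + params.same_region_boost
    } else {
        base - params.cross_region_penalty
    }
}
```

With a base of 0.5, a boost of 0.3, and a penalty of 0.1, a same-region node ends at 0.8 and a cross-region node at 0.4.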

Benefits

  • Reduced network latency
  • Lower data transfer costs
  • Compliance with data residency requirements

Cost Policy

Balances cost versus performance based on user tolerance:

use apr_cli::federation::policy::CostPolicy;
use apr_cli::federation::traits::RegionId;

let policy = CostPolicy::default()
    .with_region_cost(RegionId("us-west-2".to_string()), 0.8)   // Expensive GPU
    .with_region_cost(RegionId("eu-west-1".to_string()), 0.6)   // Mid-tier
    .with_region_cost(RegionId("ap-south-1".to_string()), 0.3); // Budget CPU

Cost Tolerance

The cost_tolerance field in QoS requirements controls the tradeoff:

Tolerance   Behavior
0-30        Strongly prefer cheap nodes
31-50       Balanced
51-70       Prefer performance
71-100      Accept premium for best performance

// Budget-conscious request
let cheap_request = InferenceRequest {
    qos: QoSRequirements {
        cost_tolerance: 20,  // Strongly prefer cheap
        ..Default::default()
    },
    ..Default::default()
};

// Premium request (willing to pay for speed)
let premium_request = InferenceRequest {
    qos: QoSRequirements {
        cost_tolerance: 80,  // Accept expensive nodes
        ..Default::default()
    },
    ..Default::default()
};
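
The source does not give the formula that turns cost_tolerance into a score, so the following is only one plausible interpretation. In this sketch the hypothetical `cost_score` helper treats the region cost as a penalty in the range 0.0 to 1.0 and shrinks that penalty as tolerance rises.

```rust
/// Hypothetical cost scoring: `region_cost` is in 0.0..=1.0 and
/// `cost_tolerance` is in 0..=100 (larger values are clamped to 100).
/// Assumption: the penalty for an expensive region scales with
/// how cost-sensitive the request is.
fn cost_score(region_cost: f64, cost_tolerance: u8) -> f64 {
    let tolerance = f64::from(cost_tolerance.min(100)) / 100.0;
    let sensitivity = 1.0 - tolerance;
    1.0 - region_cost * sensitivity
}
```

Under this assumption a request with tolerance 20 sees a wide gap between a 0.3-cost and a 0.8-cost region (0.76 versus 0.36), while a request with tolerance 80 sees a narrow one (0.94 versus 0.84), so other policies such as latency dominate the premium request's decision.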

Composite Policy

Combines all policies with weighted scoring:

use apr_cli::federation::policy::{
    CompositePolicy, CostPolicy, HealthPolicy, LatencyPolicy, PrivacyPolicy,
};

// Enterprise default combines all policies
let policy = CompositePolicy::enterprise_default();

// Custom composition
let custom = CompositePolicy::new()
    .with_policy(HealthPolicy { weight: 3.0, ..Default::default() })  // Triple health weight
    .with_policy(LatencyPolicy { weight: 2.0, ..Default::default() }) // Double latency weight
    .with_policy(PrivacyPolicy::default())
    .with_policy(CostPolicy::default());

Scoring Formula

total_score = average(policy₁.score, policy₂.score, ..., policyₙ.score)

Where each policy's score is already weighted internally.
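
The averaging step can be sketched in a few lines. `composite_score` is a hypothetical helper, and returning `None` for an empty policy list is an assumption made here to avoid dividing by zero.

```rust
/// Averages per-policy scores that are already weighted internally.
/// Returns `None` when no policies contributed a score.
fn composite_score(policy_scores: &[f64]) -> Option<f64> {
    if policy_scores.is_empty() {
        return None;
    }
    let total: f64 = policy_scores.iter().sum();
    Some(total / policy_scores.len() as f64)
}
```

Using the earlier numbers, a healthy node (2.0) at 45 ms (0.991) averages to roughly 1.4955, while a degraded node (0.6) at the same latency averages to roughly 0.7955. Because weights are folded into each score before averaging, the total is not confined to the 0 to 1 range.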

Eligibility

A candidate must pass ALL policy eligibility checks:

impl RoutingPolicyTrait for CompositePolicy {
    fn is_eligible(&self, candidate: &RouteCandidate, request: &InferenceRequest) -> bool {
        // Must pass ALL policies
        self.policies.iter().all(|p| p.is_eligible(candidate, request))
    }
}
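
Putting eligibility and scoring together, node selection amounts to filtering and then taking the maximum. The sketch below uses simplified stand-in types (`Candidate`, `Policy`, `Health`, `Latency`, `select_best`), all hypothetical, rather than the real `RouteCandidate` and `RoutingPolicyTrait`.

```rust
/// Simplified stand-in for a route candidate.
struct Candidate {
    node: &'static str,
    healthy: bool,
    latency_ms: f64,
}

/// Simplified stand-in for the routing policy trait.
trait Policy {
    fn score(&self, c: &Candidate) -> f64;
    fn is_eligible(&self, c: &Candidate) -> bool;
}

struct Health;
struct Latency {
    max_ms: f64,
}

impl Policy for Health {
    // Weight of 2.0 applied to a healthy score of 1.0.
    fn score(&self, c: &Candidate) -> f64 {
        if c.healthy { 2.0 } else { 0.0 }
    }
    fn is_eligible(&self, c: &Candidate) -> bool {
        c.healthy
    }
}

impl Policy for Latency {
    fn score(&self, c: &Candidate) -> f64 {
        (1.0 - c.latency_ms / self.max_ms).max(0.0)
    }
    fn is_eligible(&self, c: &Candidate) -> bool {
        c.latency_ms <= self.max_ms
    }
}

/// Keeps candidates that pass ALL policies, then returns the one
/// with the highest average score.
fn select_best<'a>(
    policies: &[Box<dyn Policy>],
    candidates: &'a [Candidate],
) -> Option<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| policies.iter().all(|p| p.is_eligible(c)))
        .map(|c| {
            let total: f64 = policies.iter().map(|p| p.score(c)).sum();
            (c, total / policies.len() as f64)
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(c, _)| c)
}
```

Filtering before scoring means a node that fails any single policy can never win on the strength of its other scores, which is the behavior the `all` check above describes.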

Custom Policies

Implement RoutingPolicyTrait for custom routing logic:

use std::collections::HashMap;

use apr_cli::federation::traits::{
    RoutingPolicyTrait, RouteCandidate, InferenceRequest,
};

struct TenantAffinityPolicy {
    weight: f64,
    tenant_preferences: HashMap<String, String>,  // tenant_id -> preferred_node
}

impl RoutingPolicyTrait for TenantAffinityPolicy {
    fn score(&self, candidate: &RouteCandidate, request: &InferenceRequest) -> f64 {
        if let Some(tenant_id) = &request.tenant_id {
            if let Some(preferred) = self.tenant_preferences.get(tenant_id) {
                if candidate.target.node_id.0 == *preferred {
                    return 1.0 * self.weight;  // Strong boost for preferred node
                }
            }
        }
        0.5 * self.weight  // Neutral for non-preferred
    }

    fn is_eligible(&self, _candidate: &RouteCandidate, _request: &InferenceRequest) -> bool {
        true  // Affinity is a preference, not a hard requirement
    }

    fn name(&self) -> &str {
        "tenant_affinity"
    }
}

Testing Policies

#[test]
fn test_latency_policy_scoring() {
    let policy = LatencyPolicy::default();
    let request = mock_request();

    let fast = mock_candidate(100, 1.0);   // 100ms latency
    let slow = mock_candidate(4000, 1.0);  // 4000ms latency

    let fast_score = policy.score(&fast, &request);
    let slow_score = policy.score(&slow, &request);

    assert!(fast_score > slow_score);
    assert!(fast_score > 0.9);  // Fast node scores high
}

#[test]
fn test_privacy_policy_eligibility() {
    let policy = PrivacyPolicy::default()
        .with_region(RegionId("eu".to_string()), PrivacyLevel::Confidential)
        .with_region(RegionId("us".to_string()), PrivacyLevel::Public);

    let mut request = mock_request();
    request.qos.privacy = PrivacyLevel::Confidential;

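    // eu_candidate and us_candidate are RouteCandidate fixtures for the
    // "eu" and "us" regions (setup omitted here).
    // EU meets confidential requirement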
    assert!(policy.is_eligible(&eu_candidate, &request));
    // US is public, doesn't meet confidential
    assert!(!policy.is_eligible(&us_candidate, &request));
}

Best Practices

  1. Tune weights for your use case - Production workloads may need different weights
  2. Monitor policy decisions - Log which policies influenced routing
  3. Test edge cases - Verify behavior when all nodes are degraded
  4. Consider fairness - Ensure no node gets starved of traffic
  5. Update region costs - Keep cost data current
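
For the monitoring practice above, one option is to log a per-policy breakdown next to the final score. The helper below is a hypothetical sketch; the `explain_decision` name and the log format are assumptions, and it takes plain name and score pairs rather than the gateway's types.

```rust
/// Formats a routing decision as a single log line showing each
/// policy's contribution and the averaged total.
fn explain_decision(node: &str, scores: &[(&str, f64)]) -> String {
    let parts: Vec<String> = scores
        .iter()
        .map(|(name, score)| format!("{name}={score:.3}"))
        .collect();
    // Average of the per-policy scores; 0.0 when nothing was scored.
    let total = if scores.is_empty() {
        0.0
    } else {
        scores.iter().map(|(_, score)| score).sum::<f64>() / scores.len() as f64
    };
    format!("node={node} total={total:.3} [{}]", parts.join(", "))
}
```

A line such as `node=node-a total=1.250 [health=2.000, latency=0.500]` makes it easy to see which policy dominated a given routing choice.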

Further Reading