Skip to content

Provider Routing

Provider routing allows intelligent selection and fallback across multiple LLM providers.

Overview

Provider routing enables:

  • Multi-Provider: Use multiple LLM providers simultaneously
  • Cost Optimization: Route to cheapest provider
  • Latency Optimization: Route to fastest provider
  • Failover: Automatic fallback on errors
  • Load Balancing: Distribute across providers

Configuration

Rust
pub struct ProviderRoutingConfig {
    pub providers: Vec<ProviderSpec>,
    pub strategy: RoutingStrategy,
    pub fallback_enabled: bool,
    pub max_retries: usize,
}

pub enum RoutingStrategy {
    RoundRobin,
    LeastLatency,
    LowestCost,
    Random,
}

Usage

Rust
let openai = ProviderSpec::new("openai", "gpt-4o")
    .with_priority(1);

let anthropic = ProviderSpec::new("anthropic", "claude-3-5-sonnet")
    .with_priority(2);

let config = ProviderRoutingConfig::new(openai, vec![anthropic])
    .with_strategy(RoutingStrategy::RoundRobin);

let router = ModelClientRouter::new(config)?;

Routing Strategies

RoundRobin

Alternate between providers:

Text Only
Request 1 → OpenAI
Request 2 → Anthropic
Request 3 → OpenAI

LeastLatency

Use fastest responding provider:

Text Only
OpenAI: 200ms
Anthropic: 150ms (selected)

LowestCost

Use cheapest provider:

Text Only
GPT-4: $0.03/1K tokens
Claude: $0.01/1K tokens (selected)

Random

Random provider selection.

Failover

Automatic fallback on errors:

Text Only
Request → OpenAI (fails)
       → Anthropic (fallback, succeeds)

Configure with:

Rust
config.fallback_enabled = true;
config.max_retries = 3;

Load Balancing

Distribute across providers to avoid rate limits:

Text Only
10 concurrent requests
├─ 5 to OpenAI (50%)
└─ 5 to Anthropic (50%)

Cost Tracking

Monitor spending across providers:

Rust
let usage = router.get_usage_stats();
println!("OpenAI: {} tokens, ${}", 
    usage.openai_tokens, 
    usage.openai_cost);

Implementation Patterns

See MARS Configuration for integration examples.