RSA (Reinforced Self-Aggregation)¶

RSA is a population-based, evolutionary strategy that iteratively improves solution quality by maintaining a population, selecting the best solutions, refining them, and cycling to improve overall population quality. Similar to genetic algorithms, it balances exploration and exploitation.

How It Works¶

Algorithm Overview¶

Text Only

┌─────────────────────────────────────────┐
│   Initial Population of N Solutions     │
└────────────┬────────────────────────────┘
             │
             ▼
    ┌─────────────────────┐
    │  Iteration 1 to T   │
    │                     │
    │  1. Select K best   │
    │  2. Refine them     │
    │  3. Add to pop.     │
    │  4. Trim to N       │
    │  5. Keep best ever  │
    │                     │
    └────────┬────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│   Return Best Solution from Final Pop   │
└─────────────────────────────────────────┘

Key Phases¶

Initialization: Start with N diverse solutions
Selection Loop (per iteration):
Select K best solutions based on criterion
Refine them using configured strategy
Add refined solutions back to population
Maintain population size through trimming
Preserve best solution ever found (elitism)

Selection Criteria¶

BestScore (Default)¶

Select solutions with highest verification scores.

Best for: When verification scores are reliable indicators of quality Trade-off: Greedy approach, may lose diversity

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::BestScore);

Diversity¶

Select solutions with varying reasoning lengths.

Best for: Maintaining population diversity across iterations Trade-off: May not select best solutions

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Diversity);

Thoroughness¶

Select solutions with longest reasoning.

Best for: When more detailed reasoning correlates with quality Trade-off: Biased toward verbose responses

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Thoroughness);

Random¶

Random selection from population.

Best for: Exploration and avoiding local optima Trade-off: Less guided optimization

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Random);

Tournament ⭐ (Recommended)¶

Tournament-style selection (3-way competition).

Best for: Balancing quality and diversity Trade-off: Slight randomness in each tournament

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Tournament);

Synthesis¶

Combine reasoning from selected solutions.

Process: Merge reasoning paths, select consensus answer Best for: Leveraging multiple perspectives Result: Synthesized solution combining best aspects

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Synthesis);

Merge¶

Create variants by combining solution components.

Process: Mix elements from selected solutions Best for: Exploring solution space around good answers Result: Multiple variants for further exploration

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Merge);

Iterative¶

Focus improvement on the best solution.

Process: Enhance highest-scoring solution Best for: Refining good solutions toward excellence Result: Slightly improved version of best solution

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Iterative);

Ensemble ⭐ (Recommended)¶

Ensemble approach combining selected solutions.

Process: Average reasoning and synthesize answer Best for: Robust improvement across iterations Result: Consensus solution from ensemble

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Ensemble);

Configuration¶

Basic Configuration¶

Rust

use optillm_mars::{RSAConfig, SelectionCriterion, RefinementStrategy};

// Start with 5 initial solutions
// Select 3 for refinement each iteration
// Run 2 iterations
let config = RSAConfig::new(5, 3, 2);

// Run RSA
let (best_solution, metadata) = optillm_mars::RSAAggregator::run_rsa(
    &initial_solutions,
    config,
)?;

println!("Best answer: {}", best_solution.answer);
println!("Score: {:.2}", best_solution.verification_score);

Advanced Configuration¶

Rust

let config = RSAConfig::new(7, 4, 3)
    .with_selection_criterion(SelectionCriterion::Tournament)
    .with_refinement_strategy(RefinementStrategy::Ensemble)
    .with_elitism(true);

let (best_solution, metadata) = optillm_mars::RSAAggregator::run_rsa(
    &initial_solutions,
    config,
)?;

// Analyze results
let stats = optillm_mars::RSAAggregator::get_statistics(&metadata);
println!("Iterations: {}", stats.total_iterations);
println!("Best iteration: {}", stats.best_iteration);
println!("Total improvement: {:.2}", stats.total_improvement);

Use Cases¶

When you have multiple initial solutions and want iterative improvement.

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Tournament)
    .with_refinement_strategy(RefinementStrategy::Ensemble);

let (refined, _) = RSAAggregator::run_rsa(&initial_solutions, config)?;

Population Improvement ✓✓ (Good)¶

When you want to improve a diverse population of answers.

Rust

let config = RSAConfig::new(10, 5, 3)
    .with_selection_criterion(SelectionCriterion::Diversity);

let (best, _) = RSAAggregator::run_rsa(&population, config)?;

Quality Enhancement ✓✓ (Good)¶

When you want to boost already-good solutions.

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Iterative);

let (enhanced, _) = RSAAggregator::run_rsa(&solutions, config)?;

Consensus Building ✓ (Moderate)¶

When you want to find common ground across solutions.

Rust

let config = RSAConfig::new(7, 4, 2)
    .with_refinement_strategy(RefinementStrategy::Synthesis);

let (consensus, _) = RSAAggregator::run_rsa(&solutions, config)?;

Integration with MARS¶

Use RSA within the MARS pipeline for post-processing:

Rust

use optillm_mars::{Aggregator, RSAConfig, SelectionCriterion};

// Run initial generation and verification
// Then apply RSA to improve population

let rsa_config = RSAConfig::new(5, 3, 2)
    .with_selection_criterion(SelectionCriterion::Tournament);

let refined = Aggregator::aggregate_rsa(&verified_solutions, rsa_config)?;

Performance Characteristics¶

Token Usage¶

Cost: No additional tokens (uses existing solutions)
Computation: O(I × K) where I = iterations, K = selection size
Memory: O(P) where P = population size

Quality Improvement¶

Per iteration: 2-5% typical improvement
Over 3 iterations: 5-15% cumulative improvement
Best case: 20-30% improvement with good initial solutions

Convergence¶

Typical: Improvement slows after 2-3 iterations
Max useful iterations: 3-5 (diminishing returns)
Population diversity: Critical for continued improvement

Advantages vs Disadvantages¶

Advantages¶

✓ No additional API calls (uses existing solutions) ✓ Iterative improvement with transparent progress ✓ Flexible selection and refinement strategies ✓ Maintains solution diversity through elitism ✓ Works with any population of solutions ✓ Can be combined with other strategies ✓ Transparent iteration-by-iteration improvement

Disadvantages¶

✗ Requires good initial population to start ✗ Diminishing returns after few iterations ✗ Computationally intensive (O(I×P×K)) ✗ No new reasoning paths generated ✗ Depends on quality of initial solutions ✗ May get stuck in local optima

Comparison with Other Strategies¶

Strategy	Type	Initial Cost	Iterations	Improvement
Best-of-N	Selection	Nx	0	15-30%
Self-Consistency	Voting	Kx	0	25-40%
MOA	Synthesis	3x	0	20-35%
RSA	Evolutionary	0x	I×K	5-15%
MARS	Full Pipeline	10-20x	2-3	40-60%

Benchmarks¶

Performance on Standard Tasks¶

With Good Initial Population (5-7 solutions) - Baseline quality: 75% - After RSA (2 iterations): 80% (+6.7%) - After RSA (3 iterations): 82% (+9.3%)

Math Problem Refinement - Initial: 65% - After 1 iteration: 68% - After 2 iterations: 72% - After 3 iterations: 75%

Consensus Building - 5 diverse solutions: average answer agreement 40% - After RSA synthesis: agreement increases to 60-70%

Cost Optimization¶

1. Minimize Iterations for Quick Improvement¶

Rust

// Fast: 1 iteration for quick win
let config = RSAConfig::new(5, 3, 1);

// Thorough: 3 iterations for better quality
let config = RSAConfig::new(5, 3, 3);

2. Match Population and Selection Sizes¶

Rust

// Fast track: smaller selections
let config = RSAConfig::new(5, 2, 2);

// Deep exploration: larger selections
let config = RSAConfig::new(7, 5, 3);

Rust

// Synthesis is most computation-intensive
// Merge is lighter weight
let config = RSAConfig::new(5, 3, 2)
    .with_refinement_strategy(RefinementStrategy::Merge);

4. Disable Elitism for Faster Convergence¶

Rust

let config = RSAConfig::new(5, 3, 2)
    .with_elitism(false);

Troubleshooting¶

Quality Not Improving¶

Problem: Iterations don't improve score
Solutions:
Check initial solution quality (need at least some good solutions)
Try different selection criterion (Tournament, not just BestScore)
Use Ensemble refinement instead of Merge
Increase iterations (more time to find improvements)

Too Many Iterations Without Improvement¶

Problem: Running 5+ iterations with no gain
Solutions:
Reduce iterations to 2-3 (diminishing returns)
Initial population quality too low
Try Tournament selection for more exploration
Disable elitism to allow more variation

Population Diversity Lost¶

Problem: All solutions become similar
Solutions:
Use Diversity selection criterion
Increase selection size (K > N/2)
Use Random selection mixed with BestScore
Disable elitism to allow population churn

Example: Complete Application¶

Rust

use optillm_mars::{
    RSAAggregator, RSAConfig,
    SelectionCriterion, RefinementStrategy
};

#[tokio::main]
async fn main() -> Result<()> {
    // Assume you have 5-10 initial solutions from other strategies
    let initial_solutions = generate_initial_solutions().await?;

    // Configure RSA for iterative improvement
    let config = RSAConfig::new(7, 4, 3)
        .with_selection_criterion(SelectionCriterion::Tournament)
        .with_refinement_strategy(RefinementStrategy::Ensemble)
        .with_elitism(true);

    // Run RSA optimization
    let (best_solution, metadata) = RSAAggregator::run_rsa(
        &initial_solutions,
        config,
    )?;

    // Display results
    println!("Initial best: {:.2}", initial_solutions[0].verification_score);
    println!("Final best: {:.2}", best_solution.verification_score);
    println!("Improvement: {:.2}", metadata.best_solution_score - initial_solutions[0].verification_score);

    // Analyze iteration-by-iteration progress
    for stat in &metadata.iteration_history {
        println!(
            "Iteration {}: best={:.2}, avg={:.2}",
            stat.iteration, stat.best_score, stat.avg_score
        );
    }

    println!("Population diversity: {:.2}", metadata.population_diversity);
    println!("Final population: {} solutions", metadata.final_population_size);

    Ok(())
}

Next Steps¶

Start with 5-10 initial solutions (from Best-of-N, MOA, etc.)
Configure RSA with Tournament selection
Use Ensemble refinement for robustness
Run 2-3 iterations initially
Monitor improvement across iterations
Adjust based on results
Combine with other strategies in pipeline

Best-of-N: Initial selection (feed into RSA)
Self-Consistency: Consensus voting (can precede RSA)
MOA: Synthesis-based (alternative to RSA refinement)
MARS: Full pipeline (RSA can be part of aggregation phase)

Algorithm Details¶

Population Management¶

Keep population at fixed size N
Replace worst solutions with refined ones
Elitism: preserve best-ever solution

Selection Methods¶

BestScore: Top K by verification score
Diversity: K solutions with different reasoning lengths
Thoroughness: K longest reasoning paths
Random: Random K solutions
Tournament: K winners of 3-way competitions

Synthesis: Combine all reasoning, consensus answer
Merge: Create variants from selected solutions
Iterative: Improve best solution incrementally
Ensemble: Average approach with consensus

References¶

Population-based optimization strategies
Genetic algorithms and evolutionary computation
Multi-start optimization with local search

RSA (Reinforced Self-Aggregation)¶

How It Works¶

Algorithm Overview¶

Key Phases¶

Selection Criteria¶

BestScore (Default)¶

Diversity¶

Thoroughness¶

Random¶

Tournament ⭐ (Recommended)¶

Refinement Strategies¶

Synthesis¶

Merge¶

Iterative¶

Ensemble ⭐ (Recommended)¶

Configuration¶

Basic Configuration¶

Advanced Configuration¶

Use Cases¶

Multi-Turn Reasoning Refinement ✓✓✓ (Excellent)¶

Population Improvement ✓✓ (Good)¶

Quality Enhancement ✓✓ (Good)¶

Consensus Building ✓ (Moderate)¶

Integration with MARS¶

Performance Characteristics¶

Token Usage¶

Quality Improvement¶

Convergence¶

Advantages vs Disadvantages¶

Advantages¶

Disadvantages¶

Comparison with Other Strategies¶

Benchmarks¶

Performance on Standard Tasks¶

Cost Optimization¶

1. Minimize Iterations for Quick Improvement¶

2. Match Population and Selection Sizes¶

3. Choose Efficient Refinement¶

4. Disable Elitism for Faster Convergence¶

Troubleshooting¶

Quality Not Improving¶

Too Many Iterations Without Improvement¶

Population Diversity Lost¶

Example: Complete Application¶

Next Steps¶

Related Strategies¶

Algorithm Details¶

Population Management¶

Selection Methods¶

Refinement Methods¶

References¶