RSA (Reinforced Self-Aggregation)¶
RSA is a population-based, evolutionary strategy that iteratively improves solution quality by maintaining a population, selecting the best solutions, refining them, and cycling to improve overall population quality. Similar to genetic algorithms, it balances exploration and exploitation.
How It Works¶
Algorithm Overview¶
┌─────────────────────────────────────────┐
│ Initial Population of N Solutions │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────┐
│ Iteration 1 to T │
│ │
│ 1. Select K best │
│ 2. Refine them │
│ 3. Add to pop. │
│ 4. Trim to N │
│ 5. Keep best ever │
│ │
└────────┬────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Return Best Solution from Final Pop │
└─────────────────────────────────────────┘
Key Phases¶
- Initialization: Start with N diverse solutions
- Selection Loop (per iteration):
- Select K best solutions based on criterion
- Refine them using configured strategy
- Add refined solutions back to population
- Maintain population size through trimming
- Preserve best solution ever found (elitism)
Selection Criteria¶
BestScore (Default)¶
Select solutions with highest verification scores.
Best for: When verification scores are reliable indicators of quality Trade-off: Greedy approach, may lose diversity
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::BestScore);
Diversity¶
Select solutions with varying reasoning lengths.
Best for: Maintaining population diversity across iterations Trade-off: May not select best solutions
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Diversity);
Thoroughness¶
Select solutions with longest reasoning.
Best for: When more detailed reasoning correlates with quality Trade-off: Biased toward verbose responses
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Thoroughness);
Random¶
Random selection from population.
Best for: Exploration and avoiding local optima Trade-off: Less guided optimization
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Random);
Tournament ⭐ (Recommended)¶
Tournament-style selection (3-way competition).
Best for: Balancing quality and diversity Trade-off: Slight randomness in each tournament
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Tournament);
Refinement Strategies¶
Synthesis¶
Combine reasoning from selected solutions.
Process: Merge reasoning paths, select consensus answer Best for: Leveraging multiple perspectives Result: Synthesized solution combining best aspects
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Synthesis);
Merge¶
Create variants by combining solution components.
Process: Mix elements from selected solutions Best for: Exploring solution space around good answers Result: Multiple variants for further exploration
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Merge);
Iterative¶
Focus improvement on the best solution.
Process: Enhance highest-scoring solution Best for: Refining good solutions toward excellence Result: Slightly improved version of best solution
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Iterative);
Ensemble ⭐ (Recommended)¶
Ensemble approach combining selected solutions.
Process: Average reasoning and synthesize answer Best for: Robust improvement across iterations Result: Consensus solution from ensemble
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Ensemble);
Configuration¶
Basic Configuration¶
use optillm_mars::{RSAConfig, SelectionCriterion, RefinementStrategy};
// Start with 5 initial solutions
// Select 3 for refinement each iteration
// Run 2 iterations
let config = RSAConfig::new(5, 3, 2);
// Run RSA
let (best_solution, metadata) = optillm_mars::RSAAggregator::run_rsa(
&initial_solutions,
config,
)?;
println!("Best answer: {}", best_solution.answer);
println!("Score: {:.2}", best_solution.verification_score);
Advanced Configuration¶
let config = RSAConfig::new(7, 4, 3)
.with_selection_criterion(SelectionCriterion::Tournament)
.with_refinement_strategy(RefinementStrategy::Ensemble)
.with_elitism(true);
let (best_solution, metadata) = optillm_mars::RSAAggregator::run_rsa(
&initial_solutions,
config,
)?;
// Analyze results
let stats = optillm_mars::RSAAggregator::get_statistics(&metadata);
println!("Iterations: {}", stats.total_iterations);
println!("Best iteration: {}", stats.best_iteration);
println!("Total improvement: {:.2}", stats.total_improvement);
Use Cases¶
Multi-Turn Reasoning Refinement ✓✓✓ (Excellent)¶
When you have multiple initial solutions and want iterative improvement.
let config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Tournament)
.with_refinement_strategy(RefinementStrategy::Ensemble);
let (refined, _) = RSAAggregator::run_rsa(&initial_solutions, config)?;
Population Improvement ✓✓ (Good)¶
When you want to improve a diverse population of answers.
let config = RSAConfig::new(10, 5, 3)
.with_selection_criterion(SelectionCriterion::Diversity);
let (best, _) = RSAAggregator::run_rsa(&population, config)?;
Quality Enhancement ✓✓ (Good)¶
When you want to boost already-good solutions.
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Iterative);
let (enhanced, _) = RSAAggregator::run_rsa(&solutions, config)?;
Consensus Building ✓ (Moderate)¶
When you want to find common ground across solutions.
let config = RSAConfig::new(7, 4, 2)
.with_refinement_strategy(RefinementStrategy::Synthesis);
let (consensus, _) = RSAAggregator::run_rsa(&solutions, config)?;
Integration with MARS¶
Use RSA within the MARS pipeline for post-processing:
use optillm_mars::{Aggregator, RSAConfig, SelectionCriterion};
// Run initial generation and verification
// Then apply RSA to improve population
let rsa_config = RSAConfig::new(5, 3, 2)
.with_selection_criterion(SelectionCriterion::Tournament);
let refined = Aggregator::aggregate_rsa(&verified_solutions, rsa_config)?;
Performance Characteristics¶
Token Usage¶
- Cost: No additional tokens (uses existing solutions)
- Computation: O(I × K) where I = iterations, K = selection size
- Memory: O(P) where P = population size
Quality Improvement¶
- Per iteration: 2-5% typical improvement
- Over 3 iterations: 5-15% cumulative improvement
- Best case: 20-30% improvement with good initial solutions
Convergence¶
- Typical: Improvement slows after 2-3 iterations
- Max useful iterations: 3-5 (diminishing returns)
- Population diversity: Critical for continued improvement
Advantages vs Disadvantages¶
Advantages¶
✓ No additional API calls (uses existing solutions) ✓ Iterative improvement with transparent progress ✓ Flexible selection and refinement strategies ✓ Maintains solution diversity through elitism ✓ Works with any population of solutions ✓ Can be combined with other strategies ✓ Transparent iteration-by-iteration improvement
Disadvantages¶
✗ Requires good initial population to start ✗ Diminishing returns after few iterations ✗ Computationally intensive (O(I×P×K)) ✗ No new reasoning paths generated ✗ Depends on quality of initial solutions ✗ May get stuck in local optima
Comparison with Other Strategies¶
| Strategy | Type | Initial Cost | Iterations | Improvement |
|---|---|---|---|---|
| Best-of-N | Selection | Nx | 0 | 15-30% |
| Self-Consistency | Voting | Kx | 0 | 25-40% |
| MOA | Synthesis | 3x | 0 | 20-35% |
| RSA | Evolutionary | 0x | I×K | 5-15% |
| MARS | Full Pipeline | 10-20x | 2-3 | 40-60% |
Benchmarks¶
Performance on Standard Tasks¶
With Good Initial Population (5-7 solutions) - Baseline quality: 75% - After RSA (2 iterations): 80% (+6.7%) - After RSA (3 iterations): 82% (+9.3%)
Math Problem Refinement - Initial: 65% - After 1 iteration: 68% - After 2 iterations: 72% - After 3 iterations: 75%
Consensus Building - 5 diverse solutions: average answer agreement 40% - After RSA synthesis: agreement increases to 60-70%
Cost Optimization¶
1. Minimize Iterations for Quick Improvement¶
// Fast: 1 iteration for quick win
let config = RSAConfig::new(5, 3, 1);
// Thorough: 3 iterations for better quality
let config = RSAConfig::new(5, 3, 3);
2. Match Population and Selection Sizes¶
// Fast track: smaller selections
let config = RSAConfig::new(5, 2, 2);
// Deep exploration: larger selections
let config = RSAConfig::new(7, 5, 3);
3. Choose Efficient Refinement¶
// Synthesis is most computation-intensive
// Merge is lighter weight
let config = RSAConfig::new(5, 3, 2)
.with_refinement_strategy(RefinementStrategy::Merge);
4. Disable Elitism for Faster Convergence¶
let config = RSAConfig::new(5, 3, 2)
.with_elitism(false);
Troubleshooting¶
Quality Not Improving¶
- Problem: Iterations don't improve score
- Solutions:
- Check initial solution quality (need at least some good solutions)
- Try different selection criterion (Tournament, not just BestScore)
- Use Ensemble refinement instead of Merge
- Increase iterations (more time to find improvements)
Too Many Iterations Without Improvement¶
- Problem: Running 5+ iterations with no gain
- Solutions:
- Reduce iterations to 2-3 (diminishing returns)
- Initial population quality too low
- Try Tournament selection for more exploration
- Disable elitism to allow more variation
Population Diversity Lost¶
- Problem: All solutions become similar
- Solutions:
- Use Diversity selection criterion
- Increase selection size (K > N/2)
- Use Random selection mixed with BestScore
- Disable elitism to allow population churn
Example: Complete Application¶
use optillm_mars::{
RSAAggregator, RSAConfig,
SelectionCriterion, RefinementStrategy
};
#[tokio::main]
async fn main() -> Result<()> {
// Assume you have 5-10 initial solutions from other strategies
let initial_solutions = generate_initial_solutions().await?;
// Configure RSA for iterative improvement
let config = RSAConfig::new(7, 4, 3)
.with_selection_criterion(SelectionCriterion::Tournament)
.with_refinement_strategy(RefinementStrategy::Ensemble)
.with_elitism(true);
// Run RSA optimization
let (best_solution, metadata) = RSAAggregator::run_rsa(
&initial_solutions,
config,
)?;
// Display results
println!("Initial best: {:.2}", initial_solutions[0].verification_score);
println!("Final best: {:.2}", best_solution.verification_score);
println!("Improvement: {:.2}", metadata.best_solution_score - initial_solutions[0].verification_score);
// Analyze iteration-by-iteration progress
for stat in &metadata.iteration_history {
println!(
"Iteration {}: best={:.2}, avg={:.2}",
stat.iteration, stat.best_score, stat.avg_score
);
}
println!("Population diversity: {:.2}", metadata.population_diversity);
println!("Final population: {} solutions", metadata.final_population_size);
Ok(())
}
Next Steps¶
- Start with 5-10 initial solutions (from Best-of-N, MOA, etc.)
- Configure RSA with Tournament selection
- Use Ensemble refinement for robustness
- Run 2-3 iterations initially
- Monitor improvement across iterations
- Adjust based on results
- Combine with other strategies in pipeline
Related Strategies¶
- Best-of-N: Initial selection (feed into RSA)
- Self-Consistency: Consensus voting (can precede RSA)
- MOA: Synthesis-based (alternative to RSA refinement)
- MARS: Full pipeline (RSA can be part of aggregation phase)
Algorithm Details¶
Population Management¶
- Keep population at fixed size N
- Replace worst solutions with refined ones
- Elitism: preserve best-ever solution
Selection Methods¶
- BestScore: Top K by verification score
- Diversity: K solutions with different reasoning lengths
- Thoroughness: K longest reasoning paths
- Random: Random K solutions
- Tournament: K winners of 3-way competitions
Refinement Methods¶
- Synthesis: Combine all reasoning, consensus answer
- Merge: Create variants from selected solutions
- Iterative: Improve best solution incrementally
- Ensemble: Average approach with consensus
References¶
- Population-based optimization strategies
- Genetic algorithms and evolutionary computation
- Multi-start optimization with local search