Self-Consistency Strategy¶
Self-Consistency improves chain-of-thought prompting by generating multiple diverse reasoning paths and using consensus voting to select the final answer. When different reasoning approaches converge on the same answer, confidence in correctness is high.
Based on "Self-Consistency Improves Chain of Thought Reasoning" (Wei et al., 2022).
How It Works¶
Algorithm Overview¶
┌─────────────────────────────────────────┐
│ Input: Query + System Prompt │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Generate K Diverse Reasoning Paths │
│ ├─ Path 1 (temp=0.5) │
│ ├─ Path 2 (temp=0.6) │
│ ├─ Path 3 (temp=0.7) │
│ ├─ Path 4 (temp=0.8) │
│ └─ Path 5 (temp=0.9) │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Extract Answer from Each Path │
│ Using Configured Strategy: │
│ • Last line │
│ • After marker ("Answer:") │
│ • Last sentence │
│ • In quotes │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Consensus Voting (Count Votes) │
│ Answer "42": 4 votes │
│ Answer "43": 1 vote │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Select Consensus Answer │
│ via Voting Strategy │
└────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Return Consensus Solution │
│ with voting confidence │
└─────────────────────────────────────────┘
Key Phases¶
- Generation Phase: Generate K diverse reasoning paths using different temperatures
- Extraction Phase: Extract the final answer from each reasoning path
- Voting Phase: Perform consensus voting based on extracted answers
- Selection Phase: Choose the answer with highest consensus agreement
Answer Extraction Strategies¶
LastLine¶
Extracts the last non-empty line of the response.
Best for: Models that naturally write answers on the final line Example: Answer to "What is 2+2?" becomes "4"
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::LastLine);
AfterMarker ⭐ (Recommended)¶
Looks for markers like "Answer:", "Final Answer:", etc.
Best for: Most use cases, works with properly formatted responses Example: "After calculation, Answer: 42" extracts "42"
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::AfterMarker);
LastSentence¶
Extracts the text after the last period.
Best for: Narrative responses with conclusions Example: "The process is complex. The answer is 42." extracts "The answer is 42."
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::LastSentence);
InQuotes¶
Extracts the first quoted text in the response.
Best for: Responses with explicit quoted answers Example: "I believe \"42\" is correct" extracts "42"
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::InQuotes);
FullResponse¶
Uses the entire response as the answer.
Best for: Open-ended questions or evaluation tasks Usage: When full reasoning is the answer
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::FullResponse);
Voting Strategies¶
MajorityVote (Default)¶
Select the answer with the most votes.
Best for: Binary or categorical answers Trade-off: Simple but doesn't consider reasoning quality
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::MajorityVote);
QualityWeighted¶
Weight votes by reasoning path quality (reasoning length).
Best for: When longer reasoning indicates better thinking Trade-off: May favor verbose responses
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::QualityWeighted);
HighestConfidence ⭐ (Recommended)¶
Combine vote count (70%) with reasoning quality (30%).
Best for: Balancing consensus with reasoning depth Trade-off: More sophisticated scoring
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::HighestConfidence);
RankedChoice¶
Instant runoff voting with preference ordering.
Best for: Complex consensus scenarios Trade-off: More computationally intensive
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::RankedChoice);
Configuration¶
Basic Configuration¶
use optillm_mars::{SelfConsistencyConfig, AnswerExtractionStrategy};
use optillm_core::ModelClient;
// Create configuration for 5 reasoning paths
let config = SelfConsistencyConfig::new(5);
// Run Self-Consistency
let (consensus_solution, metadata) =
optillm_mars::SelfConsistencyAggregator::run_self_consistency(
"What is 2+2?",
"You are a helpful math assistant.",
config,
&client,
).await?;
println!("Consensus Answer: {}", consensus_solution.answer);
println!("Confidence: {:.1}%", metadata.consensus_score * 100.0);
Advanced Configuration¶
let config = SelfConsistencyConfig::new(7)
.with_temperatures(vec![0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5])
.with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
.with_voting_strategy(VotingStrategy::HighestConfidence)
.with_consensus_threshold(0.6)
.with_quality_weighting(true);
let (consensus_solution, metadata) =
optillm_mars::SelfConsistencyAggregator::run_self_consistency(
query,
system_prompt,
config,
&client,
).await?;
// Analyze voting results
let stats = optillm_mars::SelfConsistencyAggregator::get_voting_statistics(&metadata);
println!("Agreement: {:.1}%", stats.consensus_agreement * 100.0);
println!("Unique answers: {}", stats.unique_answers);
Use Cases¶
Math Problems ✓✓✓ (Excellent)¶
Self-Consistency is highly effective for math because different solution approaches converge on the same answer.
let config = SelfConsistencyConfig::new(7)
.with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
.with_voting_strategy(VotingStrategy::HighestConfidence);
let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
"A train leaves at 10am going 60mph. Another leaves at 11am at 80mph. When does the second catch the first?",
"Solve step by step. Answer: [final answer]",
config,
&client,
).await?;
Logic Puzzles ✓✓ (Good)¶
Multiple reasoning paths for logic problems often converge on correct solution.
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::HighestConfidence);
let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
"Alice, Bob, and Charlie have different colored hats. If Alice's hat isn't red, Bob doesn't have blue, and exactly one has green... who has what?",
"Work through the logic carefully.",
config,
&client,
).await?;
Factual Questions ✓ (Moderate)¶
Can work well if the question has a clear factual answer.
let config = SelfConsistencyConfig::new(3)
.with_extraction_strategy(AnswerExtractionStrategy::LastLine);
let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
"What is the capital of France?",
"Answer with just the city name.",
config,
&client,
).await?;
Creative Writing ✗ (Poor)¶
Not suitable - creative outputs shouldn't converge to single answer.
// NOT RECOMMENDED for:
// - Creative writing (multiple valid responses)
// - Opinion questions (no single correct answer)
// - Open-ended tasks (diversity is desired)
Integration with MARS¶
Use Self-Consistency within the MARS pipeline:
use optillm_mars::{Aggregator, SelfConsistencyConfig};
// Use as an aggregation strategy
let config = SelfConsistencyConfig::new(5)
.with_voting_strategy(VotingStrategy::HighestConfidence);
let solutions = Aggregator::aggregate_self_consistency(
query,
system_prompt,
config,
&client,
).await?;
Performance Characteristics¶
Token Usage¶
- Cost: O(K × token_count), where K is number of reasoning paths
- For 5 paths: ~5x the tokens of a single generation
- For 3 paths: ~3x the tokens (good for quick iterations)
Latency¶
- Parallel Execution: Can generate all K paths in parallel
- Sequential Execution: Linear in K (one after another)
- Typical: 3-5 seconds for 5 paths with 500 token responses
Quality Improvement¶
- Math/Logic: 20-50% improvement (high consensus reliability)
- Factual QA: 10-20% improvement
- General Tasks: 5-15% improvement
Consensus Reliability¶
- 4-5 votes agreement (80-100%): Very high confidence in answer
- 3 votes agreement (60%): Good confidence
- 2 votes agreement (40%): Low confidence, consider other strategies
Advantages vs Disadvantages¶
Advantages¶
✓ Strong theoretical foundation (Wei et al., 2022 research) ✓ Particularly effective for math and logic problems ✓ Transparent voting process (easy to audit) ✓ Consensus provides confidence measure ✓ Works with any LLM provider ✓ No model fine-tuning required ✓ Can parallelize execution ✓ Vote distribution reveals answer uncertainty
Disadvantages¶
✗ Higher token cost (K × normal cost) ✗ Not suitable for creative tasks ✗ Requires clear, extractable answers ✗ Performance depends on extraction strategy ✗ May underperform on open-ended questions ✗ Extraction strategy needs tuning per task
Comparison with Other Strategies¶
| Strategy | Use Case | Quality | Cost | Consensus |
|---|---|---|---|---|
| Single Best | Baseline | Baseline | 1x | N/A |
| Self-Consistency | Math/Logic | +25-40% | Kx | Voting |
| Best-of-N | Selection | +15-30% | Nx | No |
| MOA | Synthesis | +20-35% | 3x | Partial |
| MCTS | Exploration | +20-40% | 5-10x | No |
| MARS | Maximum Quality | +40-60% | 10-20x | Full |
Benchmarks¶
Performance on Standard Tasks¶
Math Problems (5 paths) - Baseline: 60% - Self-Consistency: 82% (+37%) - Voting agreement: 80%
Logic Puzzles (5 paths) - Baseline: 55% - Self-Consistency: 78% (+42%) - Voting agreement: 75%
Factual QA (3 paths) - Baseline: 75% - Self-Consistency: 83% (+11%) - Voting agreement: 85%
Code Generation (5 paths) - Baseline: 70% - Self-Consistency: 79% (+13%) - Voting agreement: 60%
Cost Optimization¶
1. Reduce Number of Paths for Simple Tasks¶
// For factual QA: fewer paths needed
let config = SelfConsistencyConfig::new(3);
// For complex math: more paths beneficial
let config = SelfConsistencyConfig::new(7);
2. Choose Efficient Extraction Strategy¶
// LastLine is fastest
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::LastLine);
// AfterMarker requires pattern matching but more reliable
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::AfterMarker);
3. Use Lower Temperatures for Faster Consensus¶
// Narrower temperature range = faster convergence
let config = SelfConsistencyConfig::new(3)
.with_temperatures(vec![0.5, 0.6, 0.7]);
4. Cache Results for Repeated Queries¶
// Store metadata to avoid re-running
let cache = HashMap::new();
if let Some(cached) = cache.get(query) {
return cached.clone();
}
Troubleshooting¶
Low Consensus Agreement¶
- Problem: Diverse answers with no clear winner
- Solutions:
- Improve system prompt with clearer instructions
- Reduce temperature range (less diversity)
- Increase K (more voting samples)
- Use AfterMarker extraction strategy
Wrong Answer Despite High Consensus¶
- Problem: Multiple paths agree but on wrong answer
- Solutions:
- Improve extraction strategy
- Use HighestConfidence voting (consider quality)
- Try with different system prompt
- Consider combining with Best-of-N
High Token Usage¶
- Problem: Too expensive for deployment
- Solutions:
- Reduce K from 5 to 3
- Use cheaper model variant
- Cache results aggressively
- Combine with Best-of-N for initial filtering
Extraction Failures¶
- Problem: Many paths produce non-extractable answers
- Solutions:
- Switch extraction strategy
- Improve system prompt with format examples
- Use FullResponse strategy as fallback
- Pre-process responses to match expected format
Example: Complete Application¶
use optillm_mars::{
SelfConsistencyAggregator, SelfConsistencyConfig,
AnswerExtractionStrategy, VotingStrategy
};
use optillm_core::ModelClient;
#[tokio::main]
async fn main() -> Result<()> {
// Initialize model client
let client = create_model_client().await?;
// Configure Self-Consistency for math problems
let config = SelfConsistencyConfig::new(5)
.with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
.with_voting_strategy(VotingStrategy::HighestConfidence)
.with_consensus_threshold(0.5)
.with_quality_weighting(true);
// Run Self-Consistency
let (solution, metadata) = SelfConsistencyAggregator::run_self_consistency(
"If 3 apples cost $1.50, how much do 7 apples cost?",
"Solve step-by-step. Answer: [final answer]",
config,
&client,
).await?;
// Display results
println!("Consensus Answer: {}", solution.answer);
println!("Confidence: {:.1}%", metadata.consensus_score * 100.0);
println!("\nVoting Results:");
for (answer, votes) in &metadata.voting_results {
let percentage = (*votes as f32 / metadata.num_paths as f32) * 100.0;
println!(" {}: {} votes ({:.0}%)", answer, votes, percentage);
}
// Analyze voting statistics
let stats = SelfConsistencyAggregator::get_voting_statistics(&metadata);
println!("\nStatistics:");
println!(" Total paths: {}", stats.total_paths);
println!(" Unique answers: {}", stats.unique_answers);
println!(" Consensus agreement: {:.1}%", stats.consensus_agreement * 100.0);
println!(" Vote variance: {:.4}", stats.vote_distribution_variance);
Ok(())
}
Next Steps¶
- Choose extraction strategy for your task
- Select voting strategy (HighestConfidence recommended)
- Start with K=3-5 paths
- Monitor consensus agreement percentage
- Tune based on performance
- Consider combining with other strategies for best results
Related Strategies¶
- Best-of-N: Simpler selection without voting
- MOA (Mixture of Agents): Synthesis-based approach
- MCTS: Tree-based reasoning exploration
- MARS: Full multi-phase optimization pipeline
References¶
- Wei et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", ICLR 2023
- Original code: https://github.com/coohom/optillm