Self-Consistency Strategy¶

Self-Consistency improves chain-of-thought prompting by generating multiple diverse reasoning paths and using consensus voting to select the final answer. When different reasoning approaches converge on the same answer, confidence in correctness is high.

Based on "Self-Consistency Improves Chain of Thought Reasoning" (Wei et al., 2022).

How It Works¶

Algorithm Overview¶

Text Only

┌─────────────────────────────────────────┐
│      Input: Query + System Prompt       │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│  Generate K Diverse Reasoning Paths     │
│  ├─ Path 1 (temp=0.5)                  │
│  ├─ Path 2 (temp=0.6)                  │
│  ├─ Path 3 (temp=0.7)                  │
│  ├─ Path 4 (temp=0.8)                  │
│  └─ Path 5 (temp=0.9)                  │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│   Extract Answer from Each Path         │
│   Using Configured Strategy:            │
│   • Last line                           │
│   • After marker ("Answer:")            │
│   • Last sentence                       │
│   • In quotes                           │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│    Consensus Voting (Count Votes)       │
│    Answer "42": 4 votes                 │
│    Answer "43": 1 vote                  │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│    Select Consensus Answer              │
│    via Voting Strategy                  │
└────────────┬────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────┐
│   Return Consensus Solution             │
│   with voting confidence                │
└─────────────────────────────────────────┘

Key Phases¶

Generation Phase: Generate K diverse reasoning paths using different temperatures
Extraction Phase: Extract the final answer from each reasoning path
Voting Phase: Perform consensus voting based on extracted answers
Selection Phase: Choose the answer with highest consensus agreement

Answer Extraction Strategies¶

LastLine¶

Extracts the last non-empty line of the response.

Best for: Models that naturally write answers on the final line Example: Answer to "What is 2+2?" becomes "4"

Rust

let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::LastLine);

AfterMarker ⭐ (Recommended)¶

Looks for markers like "Answer:", "Final Answer:", etc.

Best for: Most use cases, works with properly formatted responses Example: "After calculation, Answer: 42" extracts "42"

Rust

let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::AfterMarker);

LastSentence¶

Extracts the text after the last period.

Best for: Narrative responses with conclusions Example: "The process is complex. The answer is 42." extracts "The answer is 42."

Rust

let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::LastSentence);

InQuotes¶

Extracts the first quoted text in the response.

Best for: Responses with explicit quoted answers Example: "I believe \"42\" is correct" extracts "42"

Rust

let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::InQuotes);

FullResponse¶

Uses the entire response as the answer.

Best for: Open-ended questions or evaluation tasks Usage: When full reasoning is the answer

Rust

let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::FullResponse);

Voting Strategies¶

MajorityVote (Default)¶

Select the answer with the most votes.

Best for: Binary or categorical answers Trade-off: Simple but doesn't consider reasoning quality

Rust

let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::MajorityVote);

QualityWeighted¶

Weight votes by reasoning path quality (reasoning length).

Best for: When longer reasoning indicates better thinking Trade-off: May favor verbose responses

Rust

let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::QualityWeighted);

HighestConfidence ⭐ (Recommended)¶

Combine vote count (70%) with reasoning quality (30%).

Best for: Balancing consensus with reasoning depth Trade-off: More sophisticated scoring

Rust

let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::HighestConfidence);

RankedChoice¶

Instant runoff voting with preference ordering.

Best for: Complex consensus scenarios Trade-off: More computationally intensive

Rust

let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::RankedChoice);

Configuration¶

Basic Configuration¶

Rust

use optillm_mars::{SelfConsistencyConfig, AnswerExtractionStrategy};
use optillm_core::ModelClient;

// Create configuration for 5 reasoning paths
let config = SelfConsistencyConfig::new(5);

// Run Self-Consistency
let (consensus_solution, metadata) =
    optillm_mars::SelfConsistencyAggregator::run_self_consistency(
        "What is 2+2?",
        "You are a helpful math assistant.",
        config,
        &client,
    ).await?;

println!("Consensus Answer: {}", consensus_solution.answer);
println!("Confidence: {:.1}%", metadata.consensus_score * 100.0);

Advanced Configuration¶

Rust

let config = SelfConsistencyConfig::new(7)
    .with_temperatures(vec![0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5])
    .with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
    .with_voting_strategy(VotingStrategy::HighestConfidence)
    .with_consensus_threshold(0.6)
    .with_quality_weighting(true);

let (consensus_solution, metadata) =
    optillm_mars::SelfConsistencyAggregator::run_self_consistency(
        query,
        system_prompt,
        config,
        &client,
    ).await?;

// Analyze voting results
let stats = optillm_mars::SelfConsistencyAggregator::get_voting_statistics(&metadata);
println!("Agreement: {:.1}%", stats.consensus_agreement * 100.0);
println!("Unique answers: {}", stats.unique_answers);

Use Cases¶

Math Problems ✓✓✓ (Excellent)¶

Self-Consistency is highly effective for math because different solution approaches converge on the same answer.

Rust

let config = SelfConsistencyConfig::new(7)
    .with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
    .with_voting_strategy(VotingStrategy::HighestConfidence);

let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
    "A train leaves at 10am going 60mph. Another leaves at 11am at 80mph. When does the second catch the first?",
    "Solve step by step. Answer: [final answer]",
    config,
    &client,
).await?;

Logic Puzzles ✓✓ (Good)¶

Multiple reasoning paths for logic problems often converge on correct solution.

Rust

let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::HighestConfidence);

let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
    "Alice, Bob, and Charlie have different colored hats. If Alice's hat isn't red, Bob doesn't have blue, and exactly one has green... who has what?",
    "Work through the logic carefully.",
    config,
    &client,
).await?;

Factual Questions ✓ (Moderate)¶

Can work well if the question has a clear factual answer.

Rust

let config = SelfConsistencyConfig::new(3)
    .with_extraction_strategy(AnswerExtractionStrategy::LastLine);

let (answer, _) = SelfConsistencyAggregator::run_self_consistency(
    "What is the capital of France?",
    "Answer with just the city name.",
    config,
    &client,
).await?;

Creative Writing ✗ (Poor)¶

Not suitable - creative outputs shouldn't converge to single answer.

Rust

// NOT RECOMMENDED for:
// - Creative writing (multiple valid responses)
// - Opinion questions (no single correct answer)
// - Open-ended tasks (diversity is desired)

Integration with MARS¶

Use Self-Consistency within the MARS pipeline:

Rust

use optillm_mars::{Aggregator, SelfConsistencyConfig};

// Use as an aggregation strategy
let config = SelfConsistencyConfig::new(5)
    .with_voting_strategy(VotingStrategy::HighestConfidence);

let solutions = Aggregator::aggregate_self_consistency(
    query,
    system_prompt,
    config,
    &client,
).await?;

Performance Characteristics¶

Token Usage¶

Cost: O(K × token_count), where K is number of reasoning paths
For 5 paths: ~5x the tokens of a single generation
For 3 paths: ~3x the tokens (good for quick iterations)

Latency¶

Parallel Execution: Can generate all K paths in parallel
Sequential Execution: Linear in K (one after another)
Typical: 3-5 seconds for 5 paths with 500 token responses

Quality Improvement¶

Math/Logic: 20-50% improvement (high consensus reliability)
Factual QA: 10-20% improvement
General Tasks: 5-15% improvement

Consensus Reliability¶

4-5 votes agreement (80-100%): Very high confidence in answer
3 votes agreement (60%): Good confidence
2 votes agreement (40%): Low confidence, consider other strategies

Advantages vs Disadvantages¶

Advantages¶

✓ Strong theoretical foundation (Wei et al., 2022 research) ✓ Particularly effective for math and logic problems ✓ Transparent voting process (easy to audit) ✓ Consensus provides confidence measure ✓ Works with any LLM provider ✓ No model fine-tuning required ✓ Can parallelize execution ✓ Vote distribution reveals answer uncertainty

Disadvantages¶

✗ Higher token cost (K × normal cost) ✗ Not suitable for creative tasks ✗ Requires clear, extractable answers ✗ Performance depends on extraction strategy ✗ May underperform on open-ended questions ✗ Extraction strategy needs tuning per task

Comparison with Other Strategies¶

Strategy	Use Case	Quality	Cost	Consensus
Single Best	Baseline	Baseline	1x	N/A
Self-Consistency	Math/Logic	+25-40%	Kx	Voting
Best-of-N	Selection	+15-30%	Nx	No
MOA	Synthesis	+20-35%	3x	Partial
MCTS	Exploration	+20-40%	5-10x	No
MARS	Maximum Quality	+40-60%	10-20x	Full

Benchmarks¶

Performance on Standard Tasks¶

Math Problems (5 paths) - Baseline: 60% - Self-Consistency: 82% (+37%) - Voting agreement: 80%

Logic Puzzles (5 paths) - Baseline: 55% - Self-Consistency: 78% (+42%) - Voting agreement: 75%

Factual QA (3 paths) - Baseline: 75% - Self-Consistency: 83% (+11%) - Voting agreement: 85%

Code Generation (5 paths) - Baseline: 70% - Self-Consistency: 79% (+13%) - Voting agreement: 60%

Cost Optimization¶

1. Reduce Number of Paths for Simple Tasks¶

Rust

// For factual QA: fewer paths needed
let config = SelfConsistencyConfig::new(3);

// For complex math: more paths beneficial
let config = SelfConsistencyConfig::new(7);

2. Choose Efficient Extraction Strategy¶

Rust

// LastLine is fastest
let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::LastLine);

// AfterMarker requires pattern matching but more reliable
let config = SelfConsistencyConfig::new(5)
    .with_extraction_strategy(AnswerExtractionStrategy::AfterMarker);

3. Use Lower Temperatures for Faster Consensus¶

Rust

// Narrower temperature range = faster convergence
let config = SelfConsistencyConfig::new(3)
    .with_temperatures(vec![0.5, 0.6, 0.7]);

4. Cache Results for Repeated Queries¶

Rust

// Store metadata to avoid re-running
let cache = HashMap::new();
if let Some(cached) = cache.get(query) {
    return cached.clone();
}

Troubleshooting¶

Low Consensus Agreement¶

Problem: Diverse answers with no clear winner
Solutions:
Improve system prompt with clearer instructions
Reduce temperature range (less diversity)
Increase K (more voting samples)
Use AfterMarker extraction strategy

Wrong Answer Despite High Consensus¶

Problem: Multiple paths agree but on wrong answer
Solutions:
Improve extraction strategy
Use HighestConfidence voting (consider quality)
Try with different system prompt
Consider combining with Best-of-N

High Token Usage¶

Problem: Too expensive for deployment
Solutions:
Reduce K from 5 to 3
Use cheaper model variant
Cache results aggressively
Combine with Best-of-N for initial filtering

Extraction Failures¶

Problem: Many paths produce non-extractable answers
Solutions:
Switch extraction strategy
Improve system prompt with format examples
Use FullResponse strategy as fallback
Pre-process responses to match expected format

Example: Complete Application¶

Rust

use optillm_mars::{
    SelfConsistencyAggregator, SelfConsistencyConfig,
    AnswerExtractionStrategy, VotingStrategy
};
use optillm_core::ModelClient;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize model client
    let client = create_model_client().await?;

    // Configure Self-Consistency for math problems
    let config = SelfConsistencyConfig::new(5)
        .with_extraction_strategy(AnswerExtractionStrategy::AfterMarker)
        .with_voting_strategy(VotingStrategy::HighestConfidence)
        .with_consensus_threshold(0.5)
        .with_quality_weighting(true);

    // Run Self-Consistency
    let (solution, metadata) = SelfConsistencyAggregator::run_self_consistency(
        "If 3 apples cost $1.50, how much do 7 apples cost?",
        "Solve step-by-step. Answer: [final answer]",
        config,
        &client,
    ).await?;

    // Display results
    println!("Consensus Answer: {}", solution.answer);
    println!("Confidence: {:.1}%", metadata.consensus_score * 100.0);
    println!("\nVoting Results:");
    for (answer, votes) in &metadata.voting_results {
        let percentage = (*votes as f32 / metadata.num_paths as f32) * 100.0;
        println!("  {}: {} votes ({:.0}%)", answer, votes, percentage);
    }

    // Analyze voting statistics
    let stats = SelfConsistencyAggregator::get_voting_statistics(&metadata);
    println!("\nStatistics:");
    println!("  Total paths: {}", stats.total_paths);
    println!("  Unique answers: {}", stats.unique_answers);
    println!("  Consensus agreement: {:.1}%", stats.consensus_agreement * 100.0);
    println!("  Vote variance: {:.4}", stats.vote_distribution_variance);

    Ok(())
}

Next Steps¶

Choose extraction strategy for your task
Select voting strategy (HighestConfidence recommended)
Start with K=3-5 paths
Monitor consensus agreement percentage
Tune based on performance
Consider combining with other strategies for best results

Best-of-N: Simpler selection without voting
MOA (Mixture of Agents): Synthesis-based approach
MCTS: Tree-based reasoning exploration
MARS: Full multi-phase optimization pipeline

References¶

Wei et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", ICLR 2023
Original code: https://github.com/coohom/optillm

Self-Consistency Strategy¶

How It Works¶

Algorithm Overview¶

Key Phases¶

Answer Extraction Strategies¶

LastLine¶

AfterMarker ⭐ (Recommended)¶

LastSentence¶

InQuotes¶

FullResponse¶

Voting Strategies¶

MajorityVote (Default)¶

QualityWeighted¶

HighestConfidence ⭐ (Recommended)¶

RankedChoice¶

Configuration¶

Basic Configuration¶

Advanced Configuration¶

Use Cases¶

Math Problems ✓✓✓ (Excellent)¶

Logic Puzzles ✓✓ (Good)¶

Factual Questions ✓ (Moderate)¶

Creative Writing ✗ (Poor)¶

Integration with MARS¶

Performance Characteristics¶

Token Usage¶

Latency¶

Quality Improvement¶

Consensus Reliability¶

Advantages vs Disadvantages¶

Advantages¶

Disadvantages¶

Comparison with Other Strategies¶

Benchmarks¶

Performance on Standard Tasks¶

Cost Optimization¶

1. Reduce Number of Paths for Simple Tasks¶

2. Choose Efficient Extraction Strategy¶

3. Use Lower Temperatures for Faster Consensus¶

4. Cache Results for Repeated Queries¶

Troubleshooting¶

Low Consensus Agreement¶

Wrong Answer Despite High Consensus¶

High Token Usage¶

Extraction Failures¶

Example: Complete Application¶

Next Steps¶

Related Strategies¶

References¶