MARS Verification¶

Solution verification is critical for ensuring answer quality in MARS.

Overview¶

Verification involves cross-validating solutions across multiple dimensions:

Correctness: Is the answer factually correct?
Reasoning: Is the logic sound?
Completeness: Does it address all aspects?
Clarity: Is it well-explained?
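
One way to picture the four dimensions is as separate sub-scores that roll up into a single value. The sketch below is only an illustration: `DimensionScores` and its unweighted mean are assumptions made for this example, and this page does not say whether MARS scores the dimensions individually or how it combines them.

```rust
/// Hypothetical per-dimension scores, each expected to lie in 0.0..=1.0.
/// This type is an illustration and is not taken from the MARS codebase.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DimensionScores {
    pub correctness: f32,
    pub reasoning: f32,
    pub completeness: f32,
    pub clarity: f32,
}

impl DimensionScores {
    /// Unweighted mean of the four dimensions. The equal weighting is an
    /// assumption; a real verifier may weight correctness more heavily.
    pub fn overall(&self) -> f32 {
        (self.correctness + self.reasoning + self.completeness + self.clarity) / 4.0
    }
}
```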

Verification Process¶

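Cross-Agent Verification: Each agent verifies the solutions produced by the other agents
Scoring: Every solution receives a numerical score between 0.0 and 1.0
Thresholding: Only solutions whose score meets the verification threshold proceed
Feedback: Solutions that score below the threshold are sent to an improvement phase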

Verifier Component¶

Rust

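pub struct Verifier {
    system_prompt: String,
}

impl Verifier {
    /// Scores `solution` using `client`; the returned score lies in 0.0-1.0.
    pub async fn verify_solution(
        &self,
        solution: &Solution,
        client: &dyn ModelClient,
    ) -> Result<f32> {
        // 1. Generate the verification prompt
        // 2. Score the solution
        // 3. Return the score
        todo!("outline only: the body is elided on this page")
    }
}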

Verification Score¶

Scores represent confidence in solution quality:

Score	Interpretation
0.9-1.0	Excellent, ready to use
0.7-0.9	Good, minor improvements possible
0.5-0.7	Acceptable, needs refinement
0.0-0.5	Poor, requires major work
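
The table can be turned into a small lookup, which is handy for logging or reporting. The following is a minimal sketch: `ScoreBand` and `classify_score` are hypothetical names, and because adjacent ranges in the table share a boundary value, the sketch assumes each lower bound is inclusive (so 0.7 counts as "Good").

```rust
/// Hypothetical labels for the score ranges in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScoreBand {
    Excellent,  // 0.9-1.0
    Good,       // 0.7-0.9
    Acceptable, // 0.5-0.7
    Poor,       // 0.0-0.5
}

/// Maps a score to its band. Lower bounds are treated as inclusive, which is
/// an assumption. Returns `None` for NaN or values outside 0.0..=1.0.
pub fn classify_score(score: f32) -> Option<ScoreBand> {
    if !(0.0..=1.0).contains(&score) {
        return None;
    }
    Some(if score >= 0.9 {
        ScoreBand::Excellent
    } else if score >= 0.7 {
        ScoreBand::Good
    } else if score >= 0.5 {
        ScoreBand::Acceptable
    } else {
        ScoreBand::Poor
    })
}
```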

Configuration¶

Verification is configured via MarsConfig:

Rust

pub struct MarsConfig {
    pub num_verification_rounds: usize,
    pub verification_threshold: f32,
}

Verification Rounds¶

Number of verification passes:

Rust

// Default: 2 rounds
config.num_verification_rounds = 3;  // More thorough

Verification Threshold¶

Minimum score required to accept a solution:

Rust

// Default: 0.7
config.verification_threshold = 0.8;  // Stricter
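
To show the two settings together, here is a minimal, self-contained sketch. It redeclares `MarsConfig` with only the two fields shown on this page (the real struct may have more), uses the defaults stated in the comments above (2 rounds, threshold 0.7), and adds a `validate` method and a `strict_config` helper that are hypothetical and exist only for this example.

```rust
/// Reduced copy of `MarsConfig` containing only the two verification fields.
#[derive(Debug, Clone, PartialEq)]
pub struct MarsConfig {
    pub num_verification_rounds: usize,
    pub verification_threshold: f32,
}

impl Default for MarsConfig {
    /// Defaults as documented on this page: 2 rounds, threshold 0.7.
    fn default() -> Self {
        Self {
            num_verification_rounds: 2,
            verification_threshold: 0.7,
        }
    }
}

impl MarsConfig {
    /// Hypothetical sanity check: at least one round, and a threshold that
    /// lies inside the 0.0..=1.0 score range.
    pub fn validate(&self) -> Result<(), String> {
        if self.num_verification_rounds == 0 {
            return Err("num_verification_rounds must be at least 1".to_string());
        }
        if !(0.0..=1.0).contains(&self.verification_threshold) {
            return Err("verification_threshold must be within 0.0..=1.0".to_string());
        }
        Ok(())
    }
}

/// Hypothetical helper that applies the stricter values used as examples above.
pub fn strict_config() -> MarsConfig {
    let mut config = MarsConfig::default();
    config.num_verification_rounds = 3; // more thorough
    config.verification_threshold = 0.8; // stricter
    config
}
```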

Multi-Round Verification¶

Text Only

Round 1: Initial Verification
  ├─ Agent 0 verifies Agent 1 & 2
  ├─ Agent 1 verifies Agent 0 & 2
  └─ Agent 2 verifies Agent 0 & 1

Round 2: Secondary Verification
  └─ Check verified solutions again
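
The Round 1 pattern in the diagram generalizes to any number of agents: every agent verifies every other agent's solution and never its own. A minimal sketch of that pairing follows; `cross_verification_pairs` is a hypothetical helper, not a MARS function.

```rust
/// Returns `(verifier, author)` index pairs for one round in which each agent
/// verifies the solutions of all other agents and skips its own.
pub fn cross_verification_pairs(num_agents: usize) -> Vec<(usize, usize)> {
    (0..num_agents)
        .flat_map(|verifier| {
            (0..num_agents)
                .filter(move |&author| author != verifier)
                .map(move |author| (verifier, author))
        })
        .collect()
}
```

For three agents this yields the six assignments drawn in the diagram: agent 0 checks agents 1 and 2, agent 1 checks agents 0 and 2, and agent 2 checks agents 0 and 1.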

Score Aggregation¶

When multiple agents verify a solution, their scores are averaged and the mean is compared with the threshold:

Rust

// Query each agent in turn and collect its score
let mut scores: Vec<f32> = Vec::with_capacity(agents.len());
for agent in agents.iter() {
    scores.push(agent.verify(&solution, client).await?);
}

let avg_score = scores.iter().sum::<f32>() / scores.len() as f32;
let is_verified = avg_score >= threshold;
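
The averaging step is easy to isolate and test without any model calls. The sketch below uses two hypothetical helpers, `average_score` and `is_verified`. It also handles the empty case explicitly, because dividing by a length of zero would produce NaN; treating "no scores" as "not verified" is an assumption of this example.

```rust
/// Mean of the collected scores, or `None` when no agent produced a score.
pub fn average_score(scores: &[f32]) -> Option<f32> {
    if scores.is_empty() {
        return None;
    }
    Some(scores.iter().sum::<f32>() / scores.len() as f32)
}

/// True when the mean score reaches the threshold. An empty score list is
/// treated as "not verified" (an assumption of this sketch).
pub fn is_verified(scores: &[f32], threshold: f32) -> bool {
    average_score(scores).map_or(false, |avg| avg >= threshold)
}
```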

Verification Failure Handling¶

If a solution fails verification:

Rust

if solution.verification_score < threshold {
    // Trigger the improvement phase, passing along the verifier feedback
    let improved = coordinator.improve_solution(
        solution,
        verification_feedback,
    ).await?;
}
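
In practice an improvement step is usually followed by another verification, and the cycle needs an upper bound so it cannot run forever. The following sketch shows one such bounded loop. It is synchronous and generic so that it stays self-contained; `Outcome`, `verify_with_retries`, and the `max_improvements` cap are hypothetical and are not part of the MARS API shown above.

```rust
/// Hypothetical result of the verify-and-improve loop.
#[derive(Debug, PartialEq)]
pub enum Outcome<S> {
    Accepted { solution: S, score: f32 },
    Rejected { solution: S, score: f32 },
}

/// Verifies `solution`; while the score is below `threshold`, calls `improve`
/// and verifies again, up to `max_improvements` times.
pub fn verify_with_retries<S>(
    mut solution: S,
    threshold: f32,
    max_improvements: usize,
    mut verify: impl FnMut(&S) -> f32,
    mut improve: impl FnMut(S, f32) -> S,
) -> Outcome<S> {
    let mut score = verify(&solution);
    let mut attempts = 0;
    while score < threshold && attempts < max_improvements {
        // Pass the failing score along as (very coarse) feedback.
        solution = improve(solution, score);
        score = verify(&solution);
        attempts += 1;
    }
    if score >= threshold {
        Outcome::Accepted { solution, score }
    } else {
        Outcome::Rejected { solution, score }
    }
}
```

Returning the last solution together with its score in the `Rejected` case lets the caller decide whether to fall back to it or discard it.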

Custom Verification¶

Implement custom verification logic:

Rust

pub async fn custom_verify(
    solution: &Solution,
    domain_rules: &DomainRules,
) -> Result<f32> {
    // Check domain-specific constraints
    if !domain_rules.validate(&solution.answer)? {
        return Ok(0.0);
    }

    // Check format
    if !is_valid_format(&solution.answer)? {
        return Ok(0.2);
    }

    // Check completeness
    if !is_complete(solution)? {
        return Ok(0.5);
    }

    Ok(0.9)
}

Performance Impact¶

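More rounds = better quality, but slower (each round adds verification calls)
Stricter threshold = higher quality, but more improvement iterations (fewer solutions pass)
Fewer agents = faster, but less thorough (each solution gets fewer independent checks)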
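
A rough way to reason about these trade-offs is to count verifier calls. The sketch below assumes that every round repeats a full cross-verification in which each agent checks every other agent's solution. That is an assumption: the diagram on this page only spells out this pattern for Round 1, so treat the result as an upper-bound estimate. `estimated_verifier_calls` is a hypothetical helper.

```rust
/// Estimated number of verifier calls, assuming each round performs a full
/// cross-verification: rounds * agents * (agents - 1).
pub fn estimated_verifier_calls(num_agents: usize, num_rounds: usize) -> usize {
    num_rounds * num_agents * num_agents.saturating_sub(1)
}
```

Under this assumption, three agents with the default two rounds need 12 calls, and raising the round count to three brings that to 18. Cost grows linearly with rounds but quadratically with agents.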

See MARS Overview and Configuration.