research(orchestration): CoE collaborative entropy — uncertainty quantification for multi-LLM routing confidence scoring #2505

@bug-ops

Source

arXiv:2603.28360 — "CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems" (March 30, 2026)

Finding

CoE combines two entropy signals:

  1. Intra-model semantic entropy: token-probability variance across a single model's output
  2. Inter-model divergence: semantic disagreement between multiple models on the same query

The combined metric serves as a routing-confidence signal: high CoE → escalate to a stronger model; low CoE → accept the current model's output. The paper reports that CoE outperforms single-entropy baselines at detecting hallucination-prone responses by 15%.
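
To make the routing rule concrete, here is a minimal Rust sketch. This issue does not reproduce the paper's formula for combining the two signals, so the convex combination in `coe_score`, the `weight` parameter, the `Route` enum, and the threshold value are all assumptions for illustration, not the paper's definition.

```rust
/// Routing outcome driven by the combined score (hypothetical type).
#[derive(Debug, PartialEq)]
enum Route {
    Accept,
    Escalate,
}

/// Placeholder combination: a convex mix of the two signals.
/// ASSUMPTION: the paper's actual combination rule may differ.
fn coe_score(intra: f32, inter: f32, weight: f32) -> f32 {
    let w = weight.clamp(0.0, 1.0);
    w * intra + (1.0 - w) * inter
}

/// High score -> escalate to a stronger model; low score -> accept.
fn route(score: f32, threshold: f32) -> Route {
    if score > threshold {
        Route::Escalate
    } else {
        Route::Accept
    }
}

fn main() {
    // Equal weighting of intra-model entropy and inter-model divergence.
    let score = coe_score(0.9, 0.5, 0.5);
    println!("score = {score:.2}, route = {:?}", route(score, 0.6));
}
```

Any monotone combination would fit the same routing rule; the weighted sum is used here only because it keeps the threshold easy to reason about.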

Applicability to Zeph

The zeph-llm BaRP router uses Thompson sampling on reward signals (latency + quality). CoE adds a proactive uncertainty signal: instead of waiting for a bad outcome to update the bandit, it can flag an uncertain response before that response is committed and trigger a verification or escalation step.

Implementation sketch:

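  • LlmProvider::response_entropy() -> f32: expose per-token log-prob variance (OpenAI supports logprobs; Claude support needs verifying, since the Anthropic Messages API has not historically returned them, so a fallback may be required)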
  • After each response: compute intra-model entropy; if > threshold, run same prompt on secondary provider
  • inter_model_divergence(r1, r2) -> f32 via embedding cosine distance
  • If divergence > threshold: prefer higher-confidence response or escalate to orchestrator
  • Config:
      [llm.coe]
      enabled = false
      intra_threshold = 0.8
      inter_threshold = 0.3
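
The steps above could be sketched in Rust roughly as follows. This is a standalone illustration under stated assumptions, not the zeph-llm API: `CoeConfig`, `CoeAction`, `logprob_variance`, `after_primary`, and `after_secondary` are hypothetical names; only `inter_model_divergence` and the threshold defaults come from the list above. It assumes the provider has already returned per-token log-probs and that embeddings for both responses are available, and it collapses "prefer higher-confidence response or escalate" into a single `Escalate` action.

```rust
/// Thresholds mirroring the proposed `[llm.coe]` section (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct CoeConfig {
    enabled: bool,
    intra_threshold: f32,
    inter_threshold: f32,
}

impl Default for CoeConfig {
    fn default() -> Self {
        Self { enabled: false, intra_threshold: 0.8, inter_threshold: 0.3 }
    }
}

/// Population variance of per-token log-probabilities.
/// ASSUMPTION: an empty slice (no logprobs available) yields 0.0.
fn logprob_variance(logprobs: &[f32]) -> f32 {
    if logprobs.is_empty() {
        return 0.0;
    }
    let n = logprobs.len() as f32;
    let mean = logprobs.iter().sum::<f32>() / n;
    logprobs.iter().map(|lp| (lp - mean).powi(2)).sum::<f32>() / n
}

/// Cosine distance (1 - cosine similarity) between two response embeddings.
/// Returns 1.0 if the lengths differ or either vector has zero norm.
fn inter_model_divergence(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 1.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

#[derive(Debug, PartialEq)]
enum CoeAction {
    Accept,     // keep the primary response
    CrossCheck, // run the same prompt on a secondary provider
    Escalate,   // hand off to a stronger model or the orchestrator
}

/// Stage 1: after the primary response, decide whether to cross-check.
fn after_primary(cfg: &CoeConfig, logprobs: &[f32]) -> CoeAction {
    if cfg.enabled && logprob_variance(logprobs) > cfg.intra_threshold {
        CoeAction::CrossCheck
    } else {
        CoeAction::Accept
    }
}

/// Stage 2: after the secondary response, compare the two embeddings.
fn after_secondary(cfg: &CoeConfig, primary: &[f32], secondary: &[f32]) -> CoeAction {
    if inter_model_divergence(primary, secondary) > cfg.inter_threshold {
        CoeAction::Escalate
    } else {
        CoeAction::Accept
    }
}

fn main() {
    let cfg = CoeConfig { enabled: true, ..CoeConfig::default() };
    let logprobs = [-0.1, -2.5, -0.2, -3.0];
    println!("stage 1: {:?}", after_primary(&cfg, &logprobs));
    println!("stage 2: {:?}", after_secondary(&cfg, &[1.0, 0.0], &[0.0, 1.0]));
}
```

Splitting the decision into two stages keeps the secondary provider call (the expensive part) off the path for responses whose intra-model signal is already below threshold.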

Priority rationale

P3: quality improvement for multi-model routing. Practical only with ≥2 providers configured. The response verification layer (PR #1862) already does post-hoc checking; CoE would make it proactive and cost-aware.

    Labels

    P3 (Research, medium-high complexity), llm (zeph-llm crate: Ollama, Claude), research (Research-driven improvement)
