
feat(routing): BaRP cost_weight dial and MAR memory_hit_confidence signal #2466

Merged
bug-ops merged 1 commit into main from barp-memory-augmented-routing
Mar 30, 2026


bug-ops commented Mar 30, 2026

Summary

  • Add cost_weight: f32 to BanditConfig so operators can tune the cost/quality trade-off at runtime without retraining (BaRP, arXiv:2510.07429)
  • Add a memory_hit_confidence routing signal: a high-confidence memory recall routes the request to a cheaper model, reported to recover 69% of large-model quality at a 96% cost reduction (MAR, arXiv:2603.23013)
  • Both features integrate with the existing LinUCB bandit router and are backward-compatible (serde defaults; cost_weight = 0.0 preserves current behavior)

Test plan

  • cost_weight = 0.0 → no routing change (existing tests pass)
  • cost_weight = 1.0 → cheap providers win in arm selection
  • memory_hit_confidence >= threshold and cost_weight > 0.0 → boost applied
  • memory_hit_confidence >= threshold and cost_weight = 0.0 → no boost
  • memory_hit_confidence < threshold → no boost
  • provider_cost_estimate with an unknown provider → returns 0.3
  • Live session test with cargo run --features full -- --config .local/config/testing.toml

Closes #2415, closes #2443

github-actions bot added labels on Mar 30, 2026: documentation, llm (zeph-llm crate), rust, core (zeph-core crate), enhancement, size/XL (500+ lines)
bug-ops enabled auto-merge (squash) on March 30, 2026 at 20:15
…e signal

Implements two complementary enhancements to the LinUCB bandit router:

BaRP (#2415): add cost_weight: f32 field to BanditConfig (0.0 = pure
quality, 1.0 = pure cost). The UCB arm selection penalizes expensive
providers proportionally: adjusted_ucb = raw_ucb - cost_weight * cost_est.
provider_cost_estimate() maps model names to relative cost tiers using
pattern matching on both model_id and provider name fields.
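
As a rough illustration of the BaRP penalty described above, the sketch below applies adjusted_ucb = raw_ucb - cost_weight * cost_est during arm selection. The Arm struct, select_arm helper, the (provider, model_id) argument order, and the 0.1 / 0.9 cost tiers with their name patterns are assumptions made for this example, not the PR's actual code; only the formula and the 0.3 fallback for unknown providers come from the description.

```rust
/// Relative cost tier in [0.0, 1.0] for a provider/model pair.
/// The patterns and the 0.1 / 0.9 tiers are hypothetical; only the 0.3
/// fallback for unknown providers is taken from the test plan.
fn provider_cost_estimate(provider: &str, model_id: &str) -> f32 {
    let name = format!("{} {}", provider, model_id).to_lowercase();
    if name.contains("ollama") {
        0.1 // assumed: local models are the cheapest tier
    } else if name.contains("opus") {
        0.9 // assumed: a large hosted model is the most expensive tier
    } else {
        0.3 // unknown provider
    }
}

/// UCB score after the cost penalty: raw_ucb - cost_weight * cost_est.
fn adjusted_ucb(raw_ucb: f32, cost_weight: f32, cost_est: f32) -> f32 {
    raw_ucb - cost_weight * cost_est
}

/// Hypothetical arm record: a name, its raw UCB score, and its cost estimate.
struct Arm {
    name: &'static str,
    raw_ucb: f32,
    cost_est: f32,
}

/// Pick the arm with the highest cost-adjusted UCB score.
fn select_arm(arms: &[Arm], cost_weight: f32) -> Option<&Arm> {
    arms.iter().max_by(|a, b| {
        adjusted_ucb(a.raw_ucb, cost_weight, a.cost_est)
            .total_cmp(&adjusted_ucb(b.raw_ucb, cost_weight, b.cost_est))
    })
}
```

With cost_weight = 0.0 the penalty vanishes and the arm with the highest raw UCB wins, which matches the backward-compatibility claim; raising cost_weight toward 1.0 shifts selection toward arms with a low cost estimate.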

MAR (#2443): add memory_hit_confidence: Option<f32> propagation from
SemanticMemory top-1 recall score through MemoryState.last_recall_confidence
to the router. When confidence >= memory_confidence_threshold (default 0.9),
cheap providers receive a boost: (1 - cost_est) * confidence * cost_weight.
When cost_weight = 0.0, the boost is zero, preserving pure-quality mode.
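
The MAR boost rule above can be sketched as a single function. The name memory_boost and its parameter order are hypothetical; the threshold comparison and the (1 - cost_est) * confidence * cost_weight expression follow the description. How the boost is combined with the adjusted UCB score is not stated here, so the sketch only returns the boost value.

```rust
/// Boost for an arm given the top-1 memory recall confidence.
/// Returns 0.0 when there was no recall or the confidence is below the threshold.
fn memory_boost(
    memory_hit_confidence: Option<f32>,
    memory_confidence_threshold: f32,
    cost_weight: f32,
    cost_est: f32,
) -> f32 {
    match memory_hit_confidence {
        // Cheap arms (low cost_est) get the largest boost; a zero
        // cost_weight multiplies the whole expression down to zero.
        Some(confidence) if confidence >= memory_confidence_threshold => {
            (1.0 - cost_est) * confidence * cost_weight
        }
        _ => 0.0,
    }
}
```

For example, a confidence of 0.95 with threshold 0.9, cost_weight 0.5, and cost_est 0.1 gives 0.9 * 0.95 * 0.5 = 0.4275, while the same call with cost_weight 0.0 gives 0.0.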

Both config fields have serde defaults (cost_weight = 0.0,
memory_confidence_threshold = 0.9) and are clamped to [0.0, 1.0] at bootstrap.
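
A minimal sketch of the defaults and bootstrap clamping, using only the standard library. BanditConfigSketch and its clamped method are stand-ins invented for this example; the real BanditConfig has more fields and gets its defaults through serde rather than a Default impl.

```rust
/// Simplified stand-in for the two new config fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BanditConfigSketch {
    cost_weight: f32,
    memory_confidence_threshold: f32,
}

impl Default for BanditConfigSketch {
    /// Mirrors the defaults named in the commit message.
    fn default() -> Self {
        Self {
            cost_weight: 0.0,
            memory_confidence_threshold: 0.9,
        }
    }
}

impl BanditConfigSketch {
    /// Clamp both fields to [0.0, 1.0], as would happen once at bootstrap.
    fn clamped(self) -> Self {
        Self {
            cost_weight: self.cost_weight.clamp(0.0, 1.0),
            memory_confidence_threshold: self.memory_confidence_threshold.clamp(0.0, 1.0),
        }
    }
}
```

Clamping once at startup means the routing code can assume both values are already in range instead of re-validating them on every request.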
