feat(routing): BaRP cost_weight dial and MAR memory_hit_confidence signal #2466
Merged
…e signal

Implements two complementary enhancements to the LinUCB bandit router:

BaRP (#2415): add cost_weight: f32 field to BanditConfig (0.0 = pure quality, 1.0 = pure cost). The UCB arm selection penalizes expensive providers proportionally: adjusted_ucb = raw_ucb - cost_weight * cost_est. provider_cost_estimate() maps model names to relative cost tiers using pattern matching on both model_id and provider name fields.

MAR (#2443): add memory_hit_confidence: Option<f32> propagation from SemanticMemory top-1 recall score through MemoryState.last_recall_confidence to the router. When confidence >= memory_confidence_threshold (default 0.9), cheap providers receive a boost: (1 - cost_est) * confidence * cost_weight. When cost_weight = 0.0, the boost is zero, preserving pure-quality mode.

Both config fields have serde defaults (cost_weight = 0.0, threshold = 0.9) and are clamped to [0.0, 1.0] at bootstrap.
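The two formulas above can be combined into one scoring function. The following is a minimal sketch, not the code from this PR: the struct shows only the two fields named in the description, the `clamped` and `adjusted_ucb` helpers are hypothetical names, and adding the MAR boost on top of the cost-penalized score is an assumption about how the two terms compose.

```rust
/// Illustrative subset of the router config; the real `BanditConfig`
/// is assumed to carry more fields than the two described in this PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BanditConfig {
    /// 0.0 = pure quality, 1.0 = pure cost.
    pub cost_weight: f32,
    /// Minimum recall confidence before cheap providers get a boost.
    pub memory_confidence_threshold: f32,
}

impl Default for BanditConfig {
    // Defaults as stated in the description.
    fn default() -> Self {
        Self {
            cost_weight: 0.0,
            memory_confidence_threshold: 0.9,
        }
    }
}

impl BanditConfig {
    /// Hypothetical helper: clamp both fields to [0.0, 1.0],
    /// as the description says happens at bootstrap.
    pub fn clamped(self) -> Self {
        Self {
            cost_weight: self.cost_weight.clamp(0.0, 1.0),
            memory_confidence_threshold: self.memory_confidence_threshold.clamp(0.0, 1.0),
        }
    }
}

/// Hypothetical helper: score one arm from its raw UCB value,
/// its relative cost estimate, and the optional memory recall confidence.
pub fn adjusted_ucb(
    cfg: &BanditConfig,
    raw_ucb: f32,
    cost_est: f32,
    memory_hit_confidence: Option<f32>,
) -> f32 {
    // BaRP: penalize expensive providers in proportion to cost_weight.
    let penalized = raw_ucb - cfg.cost_weight * cost_est;

    // MAR: boost cheap providers only when recall confidence clears the threshold.
    // With cost_weight = 0.0 the boost is zero, so pure-quality mode is unchanged.
    let boost = match memory_hit_confidence {
        Some(conf) if conf >= cfg.memory_confidence_threshold => {
            (1.0 - cost_est) * conf * cfg.cost_weight
        }
        _ => 0.0,
    };

    penalized + boost
}
```

With the default config, `adjusted_ucb` returns `raw_ucb` unchanged for any cost estimate and confidence, which matches the claim that existing routing behavior is preserved.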
d2a0934 to f93dcbb
This was referenced Mar 30, 2026
Closed
Summary
- Add cost_weight: f32 to BanditConfig: operators can dial the cost vs quality trade-off at runtime without retraining (BaRP, arXiv:2510.07429)
- Add memory_hit_confidence routing signal: high-confidence memory recall routes to a cheaper model, recovering 69% of large-model quality at 96% cost reduction (MAR, arXiv:2603.23013)

Test plan
- cost_weight = 0.0 → no routing change (existing tests pass)
- cost_weight = 1.0 → cheap providers win in arm selection
- memory_hit_confidence >= threshold and cost_weight > 0.0 → boost applied
- memory_hit_confidence >= threshold and cost_weight = 0.0 → no boost
- memory_hit_confidence < threshold → no boost
- provider_cost_estimate with an unknown provider → returns 0.3
- cargo run --features full -- --config .local/config/testing.toml

Closes #2415, closes #2443
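The unknown-provider case in the test plan can be illustrated with a small sketch of pattern-based cost tiers. Only the 0.3 fallback and the idea of matching on both the provider name and model_id come from this PR; the tier table, its patterns, its cost values, and the function signature are hypothetical placeholders.

```rust
/// Fallback cost for providers that match no known pattern (value from the test plan).
const UNKNOWN_COST: f32 = 0.3;

/// Hypothetical tier table: substring pattern -> relative cost in [0.0, 1.0].
/// These patterns and values are illustrative only, not the ones used in the PR.
const TIERS: &[(&str, f32)] = &[
    ("local", 0.0),
    ("small", 0.1),
    ("large", 0.9),
];

/// Map a provider name and model id to a relative cost estimate.
/// The first matching pattern wins; anything unmatched gets UNKNOWN_COST.
fn provider_cost_estimate(provider: &str, model_id: &str) -> f32 {
    // Match case-insensitively against both fields at once.
    let haystack = format!("{} {}", provider, model_id).to_lowercase();
    TIERS
        .iter()
        .find(|tier| haystack.contains(tier.0))
        .map(|tier| tier.1)
        .unwrap_or(UNKNOWN_COST)
}
```

One design caveat with substring matching: short patterns can match unintended names, so a real table would likely need anchored or more specific patterns.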