Commit 00b8644

feat(routing): add BaRP cost_weight dial and MAR memory_hit_confidence signal (#2466)
Implements two complementary enhancements to the LinUCB bandit router:

BaRP (#2415): add cost_weight: f32 field to BanditConfig (0.0 = pure quality, 1.0 = pure cost). The UCB arm selection penalizes expensive providers proportionally: adjusted_ucb = raw_ucb - cost_weight * cost_est. provider_cost_estimate() maps model names to relative cost tiers using pattern matching on both model_id and provider name fields.

MAR (#2443): add memory_hit_confidence: Option<f32> propagation from SemanticMemory top-1 recall score through MemoryState.last_recall_confidence to the router. When confidence >= memory_confidence_threshold (default 0.9), cheap providers receive a boost: (1 - cost_est) * confidence * cost_weight. When cost_weight = 0.0, the boost is zero, preserving pure-quality mode.

Both config fields have serde defaults (cost_weight = 0.0, threshold = 0.9) and are clamped to [0.0, 1.0] at bootstrap.
1 parent 7813191 commit 00b8644
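
The two formulas in the commit message can be read as a single scoring step. The sketch below is one possible interpretation, not the router's actual code: the `Arm` struct, `adjusted_score`, and `pick_arm` are hypothetical names, and it assumes the MAR boost is simply added to the cost-penalised UCB score.

```rust
/// Hypothetical per-arm inputs; the field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct Arm {
    /// Raw LinUCB score for this provider.
    raw_ucb: f32,
    /// Relative cost estimate, assumed to lie in [0.0, 1.0].
    cost_est: f32,
}

/// BaRP penalty plus, when recall confidence meets the threshold, the MAR boost.
/// Assumption: the boost is added on top of the penalised score.
fn adjusted_score(arm: Arm, cost_weight: f32, confidence: Option<f32>, threshold: f32) -> f32 {
    // adjusted_ucb = raw_ucb - cost_weight * cost_est
    let penalised = arm.raw_ucb - cost_weight * arm.cost_est;
    // boost = (1 - cost_est) * confidence * cost_weight, only at or above the threshold
    let boost = match confidence {
        Some(c) if c >= threshold => (1.0 - arm.cost_est) * c * cost_weight,
        _ => 0.0,
    };
    penalised + boost
}

/// Index of the arm with the highest adjusted score, or `None` for an empty slice.
fn pick_arm(arms: &[Arm], cost_weight: f32, confidence: Option<f32>, threshold: f32) -> Option<usize> {
    arms.iter()
        .enumerate()
        .map(|(i, arm)| (i, adjusted_score(*arm, cost_weight, confidence, threshold)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Under these assumptions, an expensive arm (`raw_ucb = 1.0`, `cost_est = 0.9`) beats a cheap one (`raw_ucb = 0.8`, `cost_est = 0.1`) at `cost_weight = 0.2` until a recall confidence of 0.95 arrives, at which point the cheap arm wins. With `cost_weight = 0.0` both terms vanish and the raw UCB ordering is kept. Because the comparison is `>=`, a threshold of exactly 1.0 would still fire on a recall score of exactly 1.0.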

16 files changed, +495 -28 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - feat(core): `/new` slash command — resets conversation context (messages, compaction state, tool caches, focus/sidequest, pending plans) while preserving memory, MCP connections, providers, and skills; creates a new `ConversationId` in SQLite for audit trail; generates a session digest for the outgoing conversation fire-and-forget unless `--no-digest` is passed; active sub-agents and background compression tasks are cancelled; `--keep-plan` preserves a pending plan graph; available in all channels (CLI, TUI, Telegram) via the unified `handle_builtin_command` path (closes #2451)
 - feat(memory): Kumiho AGM-inspired belief revision for graph edges — new `BeliefRevisionConfig` with `similarity_threshold`; `find_superseded_edges()` uses contradiction heuristic (same relation domain + high cosine similarity = supersession); `superseded_by` column added to `graph_edges` for audit trail; `invalidate_edge_with_supersession()` in `GraphStore`; `resolve_edge_typed` accepts optional `BeliefRevisionConfig`; controlled by `[memory.graph.belief_revision] enabled = false` (migration 056, closes #2441)
 - feat(memory): D-MEM RPE-based tiered graph extraction routing — `RpeRouter` computes heuristic surprise score from context similarity and entity novelty; low-RPE turns skip the MAGMA LLM extraction pipeline; `consecutive_skips` safety valve forces extraction after `max_skip_turns` consecutive skips; `extract_candidate_entities()` helper for cheap regex+keyword entity detection; controlled by `[memory.graph.rpe] enabled = false, threshold = 0.3, max_skip_turns = 5` (closes #2442)
+- feat(llm): BaRP cost-weight dial in bandit router — `cost_weight` now penalises UCB arm scores during provider selection in addition to the existing reward-signal penalty; higher values bias the bandit toward cheaper providers at inference time; static cost tier heuristics based on provider name and model identifier; `cost_weight` is clamped to [0.0, 1.0] at bootstrap (#2415)
+- feat(llm): MAR (Memory-Augmented Routing) — new `[llm.routing.bandit] memory_confidence_threshold` (default 0.9); when the top-1 semantic recall score for the current query meets or exceeds the threshold the bandit biases toward fast/cheap providers; signal propagated from `SemanticMemory::recall` through `ContextSlot::SemanticRecall` to `RouterProvider`; no routing change when `cost_weight = 0.0` (operator intent respected) (#2443)
 - feat(acp): expose current model in `session/list` and emit `SessionInfoUpdate` on model change — each in-memory `SessionInfo` now carries `meta.currentModel`; after `session/set_config_option` with `configId=model` a `SessionInfoUpdate` notification with `meta.currentModel` is sent in addition to the existing `ConfigOptionUpdate`; same notification is sent after `session/set_session_model` (closes #2435)
 - feat(tools): adversarial policy agent — LLM-based pre-execution tool call validation against plain-language policies; configurable fail-closed/fail-open behavior (`fail_open = false` default); prompt injection hardening via code-fence param quoting; strict allow/deny response parsing; full `ToolExecutor` trait delegation; audit log `adversarial_policy_decision` field; executor chain order `PolicyGateExecutor → AdversarialPolicyGateExecutor → TrustGateExecutor`; gated on `policy-enforcer` feature; config `[tools.adversarial_policy]` (closes #2447)
 - feat(memory): Memex tool output archive — before compaction, `ToolOutput` bodies in the compaction range are saved to `tool_overflow` with `archive_type = 'archive'`; archived UUIDs are appended as a postfix after LLM summarization so references survive compaction; controlled by `[memory.compression] archive_tool_outputs = false`; archives are excluded from the short-lived cleanup job via `archive_type` column (migration 054, closes #2432)

crates/zeph-config/src/providers.rs

Lines changed: 13 additions & 0 deletions
@@ -565,6 +565,18 @@ pub struct BanditConfig {
     /// Do not place it in world-writable directories.
     #[serde(default)]
     pub state_path: Option<String>,
+
+    /// MAR (Memory-Augmented Routing) confidence threshold.
+    ///
+    /// When the top-1 semantic recall score for the current query is >= this value,
+    /// the bandit biases toward cheaper providers (the answer is likely in memory).
+    /// Set to 1.0 to disable MAR. Default: 0.9.
+    #[serde(default = "default_bandit_memory_confidence_threshold")]
+    pub memory_confidence_threshold: f32,
+}
+
+fn default_bandit_memory_confidence_threshold() -> f32 {
+    0.9
 }
 
 impl Default for BanditConfig {
@@ -578,6 +590,7 @@ impl Default for BanditConfig {
             embedding_timeout_ms: default_bandit_embedding_timeout_ms(),
             cache_size: default_bandit_cache_size(),
             state_path: None,
+            memory_confidence_threshold: default_bandit_memory_confidence_threshold(),
         }
     }
 }

crates/zeph-core/src/agent/context/assembly.rs

Lines changed: 26 additions & 13 deletions
@@ -237,14 +237,15 @@ impl<C: Channel> Agent<C> {
     ) -> Result<(), super::super::error::AgentError> {
         self.remove_recall_messages();
 
-        if let Some(msg) = Self::fetch_semantic_recall(
+        let (msg, _score) = Self::fetch_semantic_recall(
             &self.memory_state,
             query,
             token_budget,
             &self.metrics.token_counter,
             None,
         )
-        .await?
+        .await?;
+        if let Some(msg) = msg
             && self.msg.messages.len() > 1
         {
             self.msg.messages.insert(1, msg);
@@ -259,12 +260,12 @@ impl<C: Channel> Agent<C> {
         token_budget: usize,
         tc: &TokenCounter,
         router: Option<&dyn zeph_memory::MemoryRouter>,
-    ) -> Result<Option<Message>, super::super::error::AgentError> {
+    ) -> Result<(Option<Message>, Option<f32>), super::super::error::AgentError> {
         let Some(memory) = &memory_state.memory else {
-            return Ok(None);
+            return Ok((None, None));
         };
         if memory_state.recall_limit == 0 || token_budget == 0 {
-            return Ok(None);
+            return Ok((None, None));
         }
 
         let recalled = if let Some(r) = router {
@@ -277,9 +278,11 @@ impl<C: Channel> Agent<C> {
                 .await?
         };
         if recalled.is_empty() {
-            return Ok(None);
+            return Ok((None, None));
         }
 
+        let top_score = recalled.first().map(|r| r.score);
+
         let mut recall_text = String::with_capacity(token_budget * 3);
         recall_text.push_str(RECALL_PREFIX);
         let mut tokens_used = tc.count_tokens(&recall_text);
@@ -300,12 +303,15 @@ impl<C: Channel> Agent<C> {
         }
 
         if tokens_used > tc.count_tokens(RECALL_PREFIX) {
-            Ok(Some(Message::from_parts(
-                Role::System,
-                vec![MessagePart::Recall { text: recall_text }],
-            )))
+            Ok((
+                Some(Message::from_parts(
+                    Role::System,
+                    vec![MessagePart::Recall { text: recall_text }],
+                )),
+                top_score,
+            ))
         } else {
-            Ok(None)
+            Ok((None, None))
         }
     }
 
@@ -852,6 +858,7 @@ impl<C: Channel> Agent<C> {
         let mut summaries_msg: Option<Message> = None;
         let mut cross_session_msg: Option<Message> = None;
         let mut recall_msg: Option<Message> = None;
+        let mut recall_confidence: Option<f32> = None;
         let mut doc_rag_msg: Option<Message> = None;
         let mut corrections_msg: Option<Message> = None;
         let mut code_rag_text: Option<String> = None;
@@ -894,7 +901,7 @@ impl<C: Channel> Agent<C> {
                     Some(&router),
                 )
                 .await
-                .map(ContextSlot::SemanticRecall)
+                .map(|(msg, score)| ContextSlot::SemanticRecall(msg, score))
             }));
             fetchers.push(Box::pin(async {
                 Self::fetch_document_rag(memory_state, &query, alloc.semantic_recall, &tc)
@@ -922,7 +929,10 @@ impl<C: Channel> Agent<C> {
                 Ok(slot) => match slot {
                     ContextSlot::Summaries(msg) => summaries_msg = msg,
                     ContextSlot::CrossSession(msg) => cross_session_msg = msg,
-                    ContextSlot::SemanticRecall(msg) => recall_msg = msg,
+                    ContextSlot::SemanticRecall(msg, score) => {
+                        recall_msg = msg;
+                        recall_confidence = score;
+                    }
                     ContextSlot::DocumentRag(msg) => doc_rag_msg = msg,
                     ContextSlot::Corrections(msg) => corrections_msg = msg,
                     ContextSlot::CodeContext(text) => code_rag_text = text,
@@ -938,6 +948,9 @@ impl<C: Channel> Agent<C> {
             }
         }
 
+        // Store top-1 recall score on agent state for MAR routing signal.
+        self.memory_state.last_recall_confidence = recall_confidence;
+
         // MemoryFirst: drain conversation history BEFORE inserting memory messages so that the
         // memory inserts land into the shorter array and are not accidentally removed.
         if memory_first {
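
The signature change in `fetch_semantic_recall` pairs the recall message with a top-1 score taken from `recalled.first()`. A reduced sketch of that shape follows. The `Recalled` struct and both function names are hypothetical stand-ins, and the sketch assumes the recall results arrive sorted by descending score, which the diff does not show.

```rust
/// Hypothetical stand-in for a recalled memory item; only `score` mirrors the diff.
#[derive(Debug, Clone)]
struct Recalled {
    text: String,
    score: f32,
}

/// Returns the joined recall text together with the top-1 score.
/// Assumption: `recalled` is sorted best-first, so `first()` is the top hit.
fn format_recall(recalled: &[Recalled]) -> (Option<String>, Option<f32>) {
    if recalled.is_empty() {
        // No message and no score, matching the `(None, None)` early returns.
        return (None, None);
    }
    let top_score = recalled.first().map(|r| r.score);
    let text = recalled
        .iter()
        .map(|r| r.text.as_str())
        .collect::<Vec<_>>()
        .join("\n");
    (Some(text), top_score)
}

/// Alternative when the ordering is not guaranteed: scan for the maximum score.
fn best_score(recalled: &[Recalled]) -> Option<f32> {
    recalled.iter().map(|r| r.score).max_by(|a, b| a.total_cmp(b))
}
```

If the ordering assumption does not hold for a given memory backend, `best_score` gives the same signal at the cost of a linear scan.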

crates/zeph-core/src/agent/context/mod.rs

Lines changed: 2 additions & 1 deletion
@@ -93,7 +93,8 @@ pub(super) enum CompactionOutcome {
 pub(super) enum ContextSlot {
     Summaries(Option<Message>),
     CrossSession(Option<Message>),
-    SemanticRecall(Option<Message>),
+    /// Semantic recall result. Carries the formatted message and the top-1 similarity score.
+    SemanticRecall(Option<Message>, Option<f32>),
     DocumentRag(Option<Message>),
     Corrections(Option<Message>),
     CodeContext(Option<String>),

crates/zeph-core/src/agent/context/tests.rs

Lines changed: 1 addition & 0 deletions
@@ -3179,6 +3179,7 @@ fn make_mem_state(
         shutdown_summary_max_messages: 20,
         shutdown_summary_timeout_secs: 10,
         structured_summaries: false,
+        last_recall_confidence: None,
         digest_config: crate::config::DigestConfig::default(),
         cached_session_digest: None,
         context_strategy: crate::config::ContextStrategy::default(),

crates/zeph-core/src/agent/mod.rs

Lines changed: 5 additions & 0 deletions
@@ -320,6 +320,7 @@ impl<C: Channel> Agent<C> {
                 shutdown_summary_max_messages: 20,
                 shutdown_summary_timeout_secs: 10,
                 structured_summaries: false,
+                last_recall_confidence: None,
                 digest_config: crate::config::DigestConfig::default(),
                 cached_session_digest: None,
                 context_strategy: crate::config::ContextStrategy::default(),
@@ -3771,6 +3772,10 @@ impl<C: Channel> Agent<C> {
             tracing::warn!("context preparation failed: {e:#}");
         }
 
+        // MAR: propagate top-1 recall confidence to the router for cost-aware routing.
+        self.provider
+            .set_memory_confidence(self.memory_state.last_recall_confidence);
+
         self.learning_engine.reset_reflection();
 
         let mut all_image_parts = std::mem::take(&mut self.msg.pending_image_parts);

crates/zeph-core/src/agent/state/mod.rs

Lines changed: 4 additions & 0 deletions
@@ -55,6 +55,10 @@ pub(crate) struct MemoryState {
     /// When `true`, hard compaction uses `AnchoredSummary` (structured JSON) instead of
     /// free-form prose. Falls back to prose on any LLM or validation failure.
     pub(crate) structured_summaries: bool,
+    /// Top-1 semantic recall score from the most recent `prepare_context` cycle.
+    /// Used by MAR (Memory-Augmented Routing) to bias the bandit toward cheap providers
+    /// when memory confidence is high. Reset to `None` at the start of each turn.
+    pub(crate) last_recall_confidence: Option<f32>,
     /// Session digest configuration (#2289).
     pub(crate) digest_config: crate::config::DigestConfig,
     /// Cached session digest text and its token count, loaded at session start.

crates/zeph-core/src/bootstrap/provider.rs

Lines changed: 2 additions & 1 deletion
@@ -535,11 +535,12 @@ fn create_provider_from_pool(config: &Config) -> Result<AnyProvider, BootstrapEr
             let router_bandit_cfg = BanditRouterConfig {
                 alpha: bandit_cfg.alpha,
                 dim: bandit_cfg.dim,
-                cost_weight: bandit_cfg.cost_weight,
+                cost_weight: bandit_cfg.cost_weight.clamp(0.0, 1.0),
                 decay_factor: bandit_cfg.decay_factor,
                 warmup_queries: 0, // computed by with_bandit() from provider count
                 embedding_timeout_ms: bandit_cfg.embedding_timeout_ms,
                 cache_size: bandit_cfg.cache_size,
+                memory_confidence_threshold: bandit_cfg.memory_confidence_threshold.clamp(0.0, 1.0),
             };
             // Resolve embedding provider for feature vectors.
             let embed_provider = if bandit_cfg.embedding_provider.is_empty() {
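
One property of the bootstrap clamp worth keeping in mind: `f32::clamp` returns NaN when the input is NaN, so a NaN config value would pass through unchanged. Whether the config loader can ever produce NaN is not shown in this diff. The helper below, `sanitize_unit`, is a hypothetical sketch of a stricter variant that substitutes a fallback instead.

```rust
/// Clamp a config value to [0.0, 1.0], replacing NaN with `fallback`.
/// Hypothetical helper; the commit itself calls `clamp(0.0, 1.0)` directly.
fn sanitize_unit(value: f32, fallback: f32) -> f32 {
    if value.is_nan() {
        // `NaN.clamp(..)` stays NaN, and `confidence >= NaN` is always false,
        // so a NaN threshold would silently disable the MAR comparison.
        fallback
    } else {
        value.clamp(0.0, 1.0)
    }
}
```

For example, `sanitize_unit(1.5, 0.0)` yields `1.0` and `sanitize_unit(f32::NAN, 0.9)` yields `0.9`.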

crates/zeph-llm/src/any.rs

Lines changed: 14 additions & 0 deletions
@@ -56,6 +56,16 @@ pub enum AnyProvider {
 }
 
 impl AnyProvider {
+    /// Set the MAR memory recall confidence for the current turn.
+    ///
+    /// Delegates to [`RouterProvider::set_memory_confidence`] when the inner provider is
+    /// a bandit router. No-op for all other provider types.
+    pub fn set_memory_confidence(&self, confidence: Option<f32>) {
+        if let AnyProvider::Router(r) = self {
+            r.set_memory_confidence(confidence);
+        }
+    }
+
     /// Return a cloneable closure that calls `embed()` on this provider.
     pub fn embed_fn(&self) -> impl Fn(&str) -> crate::provider::EmbedFuture + Send + Sync + use<> {
         let provider = std::sync::Arc::new(self.clone());
@@ -296,6 +306,10 @@ impl LlmProvider for AnyProvider {
         delegate_provider!(self, |p| p.name())
     }
 
+    fn model_identifier(&self) -> &str {
+        delegate_provider!(self, |p| p.model_identifier())
+    }
+
     fn supports_structured_output(&self) -> bool {
         delegate_provider!(self, |p| p.supports_structured_output())
     }
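
`set_memory_confidence` takes `&self`, which implies the router stores the per-turn value behind some form of interior mutability. The `RouterProvider` side is not part of this excerpt, so the sketch below is only one way it could look: `Router`, `Provider`, and the `Mutex` field are all hypothetical.

```rust
use std::sync::Mutex;

/// Hypothetical router holding the per-turn confidence behind a mutex.
struct Router {
    memory_confidence: Mutex<Option<f32>>,
}

impl Router {
    fn new() -> Self {
        Self { memory_confidence: Mutex::new(None) }
    }

    /// Overwrite the stored confidence; `None` clears it.
    fn set_memory_confidence(&self, confidence: Option<f32>) {
        // Recover the guard even if another thread panicked while holding the lock.
        let mut slot = self.memory_confidence.lock().unwrap_or_else(|e| e.into_inner());
        *slot = confidence;
    }

    fn memory_confidence(&self) -> Option<f32> {
        *self.memory_confidence.lock().unwrap_or_else(|e| e.into_inner())
    }
}

/// Hypothetical two-variant stand-in for the provider enum.
enum Provider {
    Router(Router),
    Single,
}

impl Provider {
    /// Delegates to the router; a no-op for every other variant.
    fn set_memory_confidence(&self, confidence: Option<f32>) {
        if let Provider::Router(r) = self {
            r.set_memory_confidence(confidence);
        }
    }

    fn memory_confidence(&self) -> Option<f32> {
        match self {
            Provider::Router(r) => r.memory_confidence(),
            Provider::Single => None,
        }
    }
}
```

In this sketch the value is overwritten on every call, so passing `None` clears a stale confidence from the previous turn.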

crates/zeph-llm/src/claude/mod.rs

Lines changed: 4 additions & 0 deletions
@@ -814,6 +814,10 @@ impl LlmProvider for ClaudeProvider {
         "claude"
     }
 
+    fn model_identifier(&self) -> &str {
+        &self.model
+    }
+
     fn supports_structured_output(&self) -> bool {
         true
     }
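
The new `model_identifier()` accessor supplies the second input to the cost heuristic that the commit message calls `provider_cost_estimate()`. That function's body is not in this excerpt. The sketch below only illustrates the general idea of substring-based tiering: the function name `cost_estimate`, the matched substrings, and every tier value are hypothetical.

```rust
/// Map a provider name and model identifier to a relative cost in [0.0, 1.0].
/// All patterns and tier values here are made up for illustration.
fn cost_estimate(provider: &str, model_id: &str) -> f32 {
    let provider = provider.to_ascii_lowercase();
    let model = model_id.to_ascii_lowercase();

    if provider == "local" {
        // Assumed: locally hosted models are treated as free.
        0.0
    } else if model.contains("opus") {
        1.0
    } else if model.contains("sonnet") {
        0.6
    } else if model.contains("haiku") || model.contains("mini") {
        0.2
    } else {
        // Unknown models land in the middle so they are neither favoured nor punished.
        0.5
    }
}
```

A mid-range default for unrecognised models keeps the penalty term from dominating arm selection when the heuristic has no information.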
