Commit a7ee1b7

feat(skills): add ARISE trace evolution, STEM pattern-to-skill, and ERL heuristics (#2398, #2462, #2463) (#2482)
Implements three complementary skill evolution mechanisms in zeph-skills:

- ARISE: summarizes successful multi-tool solution traces into SKILL.md description updates via background LLM call; uses historical success/failure rates to bias embedding candidate re-ranking (policy-driven selection)
- STEM: tracks recurring tool-use patterns in skill_usage_log (migration 057); auto-generates quarantined SKILL.md candidates when a pattern repeats >= N times with sufficient success rate
- ERL: post-task reflection extracts transferable heuristics into skill_heuristics table (migration 058); injects top heuristics into skill matching context as a ## Learned Heuristics section

All three features are disabled by default, fire-and-forget via tokio::spawn, expose *_provider config fields for multi-model routing, and emit TUI status spinners. LLM-generated skill content is validated and starts at quarantined trust level. Config migration and --init wizard updated for the 14 new fields.
1 parent fc09164 commit a7ee1b7

15 files changed: +1717 -1 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 - feat(tools): `[tools.shell] max_snapshot_bytes` config option to limit transaction snapshot size — returns `SnapshotFailed` when cumulative copied bytes exceed the limit; `0` means unlimited (default)
 - feat(tools): transactional `ShellExecutor` — opt-in snapshot+rollback for shell commands; file-level snapshot is captured before write commands (detected via `WRITE_INDICATORS` heuristic + redirection target extraction); rollback restores originals on configurable exit codes; new `ShellConfig` fields: `transactional`, `transaction_scope` (glob-filtered paths), `auto_rollback`, `auto_rollback_exit_codes`, `snapshot_required`; new `ToolError::SnapshotFailed`, `AuditResult::Rollback`, `ToolEvent::Rollback` variants; backed by `tempfile::TempDir` for automatic cleanup on success (closes #2414)
+- feat(skills): ARISE trace-based skill improvement — after a successful multi-tool turn, `spawn_arise_trace_improvement()` fires a background LLM call to summarize the tool sequence into an improved SKILL.md body; new version is saved with `source = 'arise_trace'` and starts at `quarantined` trust level (never inherits parent trust); controlled by `[skills.learning] arise_enabled = false`, `arise_min_tool_calls = 2`, `arise_trace_provider = ""` (closes #2398)
+- feat(skills): STEM pattern-to-skill conversion — `spawn_stem_detection()` logs every tool sequence to `skill_usage_log` after each turn; `find_recurring_patterns()` detects sequences meeting `stem_min_occurrences` and `stem_min_success_rate`; qualifying patterns trigger a background LLM call to generate a SKILL.md candidate written to the managed skills directory at `quarantined` trust level; retention pruning via `stem_retention_days`; controlled by `[skills.learning] stem_enabled = false` (closes #2462)
+- feat(skills): ERL experiential reflective learning — `spawn_erl_reflection()` fires a background LLM call after each successful skill+tool turn to extract transferable heuristics; heuristics are stored in `skill_heuristics` table with Jaccard deduplication; at skill matching time `build_erl_heuristics_prompt()` prepends a `## Learned Heuristics` section to the skill context; controlled by `[skills.learning] erl_enabled = false`, `erl_max_heuristics_per_skill = 3`, `erl_min_confidence = 0.5` (closes #2463)
+- feat(db): migrations 057 (`skill_usage_log`) and 058 (`skill_heuristics`) for STEM and ERL storage; both SQLite and Postgres variants
+- feat(config): `LearningConfig` extended with 14 new fields for ARISE/STEM/ERL (all disabled by default); new fields registered in `config/default.toml` as commented-out entries
 - feat(core): `/new` slash command — resets conversation context (messages, compaction state, tool caches, focus/sidequest, pending plans) while preserving memory, MCP connections, providers, and skills; creates a new `ConversationId` in SQLite for audit trail; generates a session digest for the outgoing conversation fire-and-forget unless `--no-digest` is passed; active sub-agents and background compression tasks are cancelled; `--keep-plan` preserves a pending plan graph; available in all channels (CLI, TUI, Telegram) via the unified `handle_builtin_command` path (closes #2451)
 - feat(memory): Kumiho AGM-inspired belief revision for graph edges — new `BeliefRevisionConfig` with `similarity_threshold`; `find_superseded_edges()` uses contradiction heuristic (same relation domain + high cosine similarity = supersession); `superseded_by` column added to `graph_edges` for audit trail; `invalidate_edge_with_supersession()` in `GraphStore`; `resolve_edge_typed` accepts optional `BeliefRevisionConfig`; controlled by `[memory.graph.belief_revision] enabled = false` (migration 056, closes #2441)
 - feat(memory): D-MEM RPE-based tiered graph extraction routing — `RpeRouter` computes heuristic surprise score from context similarity and entity novelty; low-RPE turns skip the MAGMA LLM extraction pipeline; `consecutive_skips` safety valve forces extraction after `max_skip_turns` consecutive skips; `extract_candidate_entities()` helper for cheap regex+keyword entity detection; controlled by `[memory.graph.rpe] enabled = false, threshold = 0.3, max_skip_turns = 5` (closes #2442)
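
Reviewer note on the ERL entry above: the changelog says heuristics are stored "with Jaccard deduplication", and the config doc comment says a match above `erl_dedup_threshold` increments `use_count` instead of inserting. The sketch below illustrates that behaviour under stated assumptions. The `Heuristic` struct, the whitespace/lowercase tokenization, the strict `>` comparison, and the in-memory `Vec` store are all hypothetical; the real `skill_heuristics` schema and the committed dedup code are not shown in this excerpt.

```rust
use std::collections::HashSet;

/// A stored heuristic with a usage counter.
/// Hypothetical shape, not the real `skill_heuristics` row.
#[derive(Debug, Clone, PartialEq)]
pub struct Heuristic {
    pub text: String,
    pub use_count: u32,
}

/// Jaccard similarity over lowercase, whitespace-separated tokens.
/// The tokenization is an assumption; the commit only names the metric.
pub fn jaccard(a: &str, b: &str) -> f32 {
    let ta: HashSet<String> = a.split_whitespace().map(str::to_lowercase).collect();
    let tb: HashSet<String> = b.split_whitespace().map(str::to_lowercase).collect();
    if ta.is_empty() && tb.is_empty() {
        // Two empty texts are treated as identical.
        return 1.0;
    }
    let intersection = ta.intersection(&tb).count() as f32;
    let union = ta.union(&tb).count() as f32;
    intersection / union
}

/// Insert `text`, or bump `use_count` on the first stored heuristic whose
/// similarity exceeds `threshold`. Returns `true` when a new entry was added.
pub fn upsert_heuristic(store: &mut Vec<Heuristic>, text: &str, threshold: f32) -> bool {
    if let Some(existing) = store
        .iter_mut()
        .find(|h| jaccard(&h.text, text) > threshold)
    {
        existing.use_count += 1;
        return false;
    }
    store.push(Heuristic {
        text: text.to_string(),
        use_count: 1,
    });
    true
}
```

With the default threshold of 0.9, a heuristic that differs only in letter case would be folded into the existing entry, while an unrelated one would be stored separately.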

config/default.toml

Lines changed: 17 additions & 0 deletions
@@ -188,6 +188,23 @@ cooldown_minutes = 60
 # max_auto_sections = 3
 # Domain evaluation gate before promoting auto-generated skills
 # domain_success_gate = false
+# ARISE: trace-based skill improvement from successful multi-tool turns
+# arise_enabled = false
+# arise_min_tool_calls = 2
+# arise_trace_provider = ""
+# STEM: automatic tool pattern detection and skill candidate generation
+# stem_enabled = false
+# stem_min_occurrences = 3
+# stem_min_success_rate = 0.8
+# stem_provider = ""
+# stem_retention_days = 90
+# stem_pattern_window_days = 30
+# ERL: post-task heuristic extraction and injection at skill match time
+# erl_enabled = false
+# erl_extract_provider = ""
+# erl_max_heuristics_per_skill = 3
+# erl_dedup_threshold = 0.9
+# erl_min_confidence = 0.5
 
 [skills.trust]
 # Default trust level for newly discovered skills: trusted, verified, quarantined, blocked
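
Reviewer note on the STEM thresholds above: `stem_min_occurrences` and `stem_min_success_rate` gate which tool sequences become skill candidates. The following is a minimal in-memory sketch of that filter, assuming exact-sequence matching. `UsageRow`, `Pattern`, and `recurring_patterns` are hypothetical names; the committed `find_recurring_patterns()` queries the `skill_usage_log` table, and neither its signature nor the table schema appears in this excerpt.

```rust
use std::collections::HashMap;

/// One logged turn: ordered tool names plus the turn outcome.
/// Hypothetical shape; the real `skill_usage_log` schema is not shown here.
pub struct UsageRow {
    pub tools: Vec<String>,
    pub success: bool,
}

impl UsageRow {
    pub fn new(tools: &[&str], success: bool) -> Self {
        Self {
            tools: tools.iter().map(|t| t.to_string()).collect(),
            success,
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Pattern {
    pub tools: Vec<String>,
    pub occurrences: u32,
    pub success_rate: f64,
}

/// Group rows by exact tool sequence and keep the sequences that meet both
/// thresholds. Results are ordered by occurrence count, most frequent first.
pub fn recurring_patterns(
    rows: &[UsageRow],
    min_occurrences: u32,
    min_success_rate: f64,
) -> Vec<Pattern> {
    // Maps a tool sequence to (total turns, successful turns).
    let mut counts: HashMap<&[String], (u32, u32)> = HashMap::new();
    for row in rows {
        let entry = counts.entry(row.tools.as_slice()).or_insert((0, 0));
        entry.0 += 1;
        if row.success {
            entry.1 += 1;
        }
    }
    let mut out: Vec<Pattern> = counts
        .into_iter()
        .map(|(tools, (total, ok))| Pattern {
            tools: tools.to_vec(),
            occurrences: total,
            success_rate: f64::from(ok) / f64::from(total),
        })
        .filter(|p| p.occurrences >= min_occurrences && p.success_rate >= min_success_rate)
        .collect();
    out.sort_by(|a, b| {
        b.occurrences
            .cmp(&a.occurrences)
            .then_with(|| a.tools.cmp(&b.tools))
    });
    out
}
```

The sketch ignores `stem_pattern_window_days` and `stem_retention_days`; in a database-backed version those would presumably become time predicates on the query and a periodic delete, respectively.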

crates/zeph-config/src/learning.rs

Lines changed: 160 additions & 0 deletions
@@ -79,6 +79,38 @@ fn default_max_auto_sections() -> u32 {
     3
 }
 
+fn default_arise_min_tool_calls() -> u32 {
+    2
+}
+
+fn default_stem_min_occurrences() -> u32 {
+    3
+}
+
+fn default_stem_min_success_rate() -> f64 {
+    0.8
+}
+
+fn default_stem_retention_days() -> u32 {
+    90
+}
+
+fn default_stem_pattern_window_days() -> u32 {
+    30
+}
+
+fn default_erl_max_heuristics_per_skill() -> u32 {
+    3
+}
+
+fn default_erl_dedup_threshold() -> f32 {
+    0.9
+}
+
+fn default_erl_min_confidence() -> f64 {
+    0.5
+}
+
 /// Strategy for detecting implicit user corrections.
 #[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Deserialize, Serialize)]
 #[serde(rename_all = "lowercase")]
@@ -184,6 +216,58 @@ pub struct LearningConfig {
     /// activation is skipped (the version is still saved for manual review).
     #[serde(default)]
     pub domain_success_gate: bool,
+
+    // --- ARISE: trace-based skill improvement ---
+    /// Enable ARISE trace-based skill improvement (disabled by default).
+    #[serde(default)]
+    pub arise_enabled: bool,
+    /// Minimum tool calls in a turn to trigger ARISE trace improvement.
+    #[serde(default = "default_arise_min_tool_calls")]
+    pub arise_min_tool_calls: u32,
+    /// Provider name from `[[llm.providers]]` for ARISE trace summarization.
+    /// Empty = fall back to primary provider.
+    #[serde(default)]
+    pub arise_trace_provider: String,
+
+    // --- STEM: pattern-to-skill conversion ---
+    /// Enable STEM automatic tool pattern detection and skill generation (disabled by default).
+    #[serde(default)]
+    pub stem_enabled: bool,
+    /// Minimum occurrences of a tool sequence before generating a skill candidate.
+    #[serde(default = "default_stem_min_occurrences")]
+    pub stem_min_occurrences: u32,
+    /// Minimum success rate of the pattern before generating a skill candidate.
+    #[serde(default = "default_stem_min_success_rate")]
+    pub stem_min_success_rate: f64,
+    /// Provider name from `[[llm.providers]]` for STEM skill generation.
+    /// Empty = fall back to primary provider.
+    #[serde(default)]
+    pub stem_provider: String,
+    /// Days to retain rows in `skill_usage_log` before pruning.
+    #[serde(default = "default_stem_retention_days")]
+    pub stem_retention_days: u32,
+    /// Window in days for pattern detection queries (limits scan cost on large tables).
+    #[serde(default = "default_stem_pattern_window_days")]
+    pub stem_pattern_window_days: u32,
+
+    // --- ERL: experiential reflective learning ---
+    /// Enable ERL post-task heuristic extraction (disabled by default).
+    #[serde(default)]
+    pub erl_enabled: bool,
+    /// Provider name from `[[llm.providers]]` for ERL heuristic extraction.
+    /// Empty = fall back to primary provider.
+    #[serde(default)]
+    pub erl_extract_provider: String,
+    /// Maximum heuristics prepended per skill at match time.
+    #[serde(default = "default_erl_max_heuristics_per_skill")]
+    pub erl_max_heuristics_per_skill: u32,
+    /// Text similarity threshold (Jaccard) for heuristic deduplication.
+    /// When exact text match exceeds this, increment `use_count` instead of inserting.
+    #[serde(default = "default_erl_dedup_threshold")]
+    pub erl_dedup_threshold: f32,
+    /// Minimum confidence to include a heuristic at match time.
+    #[serde(default = "default_erl_min_confidence")]
+    pub erl_min_confidence: f64,
 }
 
 impl Default for LearningConfig {
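
Reviewer note on the ARISE fields above: the doc comments describe two rules, a minimum tool-call count that triggers trace improvement and an empty provider name that falls back to the primary provider. A minimal sketch of both, assuming they are evaluated as plain predicates; `AriseSettings`, `should_run_arise`, and `resolve_provider` are hypothetical names that only mirror the field names in the diff, and the committed `spawn_arise_trace_improvement()` may gate on additional conditions.

```rust
/// Subset of the learning config relevant to ARISE.
/// Field names mirror the diff; the struct itself is hypothetical.
pub struct AriseSettings {
    pub arise_enabled: bool,
    pub arise_min_tool_calls: u32,
    pub arise_trace_provider: String,
}

/// ARISE is described as running only after a successful multi-tool turn,
/// so all three conditions must hold.
pub fn should_run_arise(cfg: &AriseSettings, tool_calls: u32, turn_succeeded: bool) -> bool {
    cfg.arise_enabled && turn_succeeded && tool_calls >= cfg.arise_min_tool_calls
}

/// "Empty = fall back to primary provider", as stated in the field docs.
pub fn resolve_provider<'a>(configured: &'a str, primary: &'a str) -> &'a str {
    if configured.is_empty() {
        primary
    } else {
        configured
    }
}
```

The same fallback rule would apply to `stem_provider` and `erl_extract_provider`, since their doc comments use identical wording.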
@@ -215,6 +299,20 @@ impl Default for LearningConfig {
             min_sessions_before_demote: default_min_sessions_before_demote(),
             max_auto_sections: default_max_auto_sections(),
             domain_success_gate: false,
+            arise_enabled: false,
+            arise_min_tool_calls: default_arise_min_tool_calls(),
+            arise_trace_provider: String::new(),
+            stem_enabled: false,
+            stem_min_occurrences: default_stem_min_occurrences(),
+            stem_min_success_rate: default_stem_min_success_rate(),
+            stem_provider: String::new(),
+            stem_retention_days: default_stem_retention_days(),
+            stem_pattern_window_days: default_stem_pattern_window_days(),
+            erl_enabled: false,
+            erl_extract_provider: String::new(),
+            erl_max_heuristics_per_skill: default_erl_max_heuristics_per_skill(),
+            erl_dedup_threshold: default_erl_dedup_threshold(),
+            erl_min_confidence: default_erl_min_confidence(),
         }
     }
 }
@@ -298,6 +396,68 @@ feedback_provider = "fast""#;
         assert_eq!(cfg.min_sessions_before_demote, 1);
     }
 
+    #[test]
+    fn arise_stem_erl_defaults() {
+        let cfg = LearningConfig::default();
+        assert!(!cfg.arise_enabled);
+        assert_eq!(cfg.arise_min_tool_calls, 2);
+        assert!(cfg.arise_trace_provider.is_empty());
+        assert!(!cfg.stem_enabled);
+        assert_eq!(cfg.stem_min_occurrences, 3);
+        assert!((cfg.stem_min_success_rate - 0.8).abs() < f64::EPSILON);
+        assert!(cfg.stem_provider.is_empty());
+        assert_eq!(cfg.stem_retention_days, 90);
+        assert_eq!(cfg.stem_pattern_window_days, 30);
+        assert!(!cfg.erl_enabled);
+        assert!(cfg.erl_extract_provider.is_empty());
+        assert_eq!(cfg.erl_max_heuristics_per_skill, 3);
+        assert!((cfg.erl_dedup_threshold - 0.9).abs() < f32::EPSILON);
+        assert!((cfg.erl_min_confidence - 0.5).abs() < f64::EPSILON);
+    }
+
+    #[test]
+    fn arise_stem_erl_serde_roundtrip() {
+        let toml = r#"
+arise_enabled = true
+arise_min_tool_calls = 3
+arise_trace_provider = "fast"
+stem_enabled = true
+stem_min_occurrences = 5
+stem_min_success_rate = 0.9
+stem_provider = "mid"
+stem_retention_days = 60
+stem_pattern_window_days = 14
+erl_enabled = true
+erl_extract_provider = "fast"
+erl_max_heuristics_per_skill = 5
+erl_dedup_threshold = 0.85
+erl_min_confidence = 0.6
+"#;
+        let cfg: LearningConfig = toml::from_str(toml).unwrap();
+        assert!(cfg.arise_enabled);
+        assert_eq!(cfg.arise_min_tool_calls, 3);
+        assert_eq!(cfg.arise_trace_provider, "fast");
+        assert!(cfg.stem_enabled);
+        assert_eq!(cfg.stem_min_occurrences, 5);
+        assert!((cfg.stem_min_success_rate - 0.9).abs() < f64::EPSILON);
+        assert_eq!(cfg.stem_provider, "mid");
+        assert_eq!(cfg.stem_retention_days, 60);
+        assert_eq!(cfg.stem_pattern_window_days, 14);
+        assert!(cfg.erl_enabled);
+        assert_eq!(cfg.erl_extract_provider, "fast");
+        assert_eq!(cfg.erl_max_heuristics_per_skill, 5);
+        assert!((cfg.erl_dedup_threshold - 0.85_f32).abs() < f32::EPSILON);
+        assert!((cfg.erl_min_confidence - 0.6).abs() < f64::EPSILON);
+    }
+
+    #[test]
+    fn arise_stem_erl_empty_section_uses_defaults() {
+        let cfg: LearningConfig = toml::from_str("").unwrap();
+        assert!(!cfg.arise_enabled);
+        assert!(!cfg.stem_enabled);
+        assert!(!cfg.erl_enabled);
+    }
+
     #[test]
     fn learning_config_new_fields_serde_roundtrip() {
         let toml = r"

crates/zeph-core/src/agent/context/assembly.rs

Lines changed: 6 additions & 1 deletion
@@ -1484,11 +1484,16 @@ impl<C: Channel> Agent<C> {
             other => other,
         };
 
-        let skills_prompt = if effective_mode == crate::config::SkillPromptMode::Compact {
+        let mut skills_prompt = if effective_mode == crate::config::SkillPromptMode::Compact {
             format_skills_prompt_compact(&active_skills)
         } else {
             format_skills_prompt(&active_skills, &trust_map, &health_map)
         };
+        // ERL: append learned heuristics for active skills (no-op when erl_enabled = false).
+        let erl_suffix = self.build_erl_heuristics_prompt().await;
+        if !erl_suffix.is_empty() {
+            skills_prompt.push_str(&erl_suffix);
+        }
         let catalog_prompt = format_skills_catalog(&remaining_skills);
         self.skill_state
             .last_skills_prompt
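
Reviewer note on the hunk above: `build_erl_heuristics_prompt()` is called here but its body is not part of this excerpt. The sketch below shows one way such a helper could produce the `## Learned Heuristics` suffix, assuming it filters by `erl_min_confidence`, keeps the highest-confidence entries up to `erl_max_heuristics_per_skill`, and returns an empty string when ERL is disabled or nothing qualifies. `ScoredHeuristic`, `format_heuristics_section`, and the bullet layout are hypothetical.

```rust
/// A heuristic with its confidence score.
/// Hypothetical shape, not the real `skill_heuristics` row.
pub struct ScoredHeuristic {
    pub text: String,
    pub confidence: f64,
}

/// Build the suffix appended to the skills prompt. Returns an empty string
/// when ERL is disabled or no heuristic passes the confidence filter, which
/// matches the `is_empty()` check at the call site.
pub fn format_heuristics_section(
    enabled: bool,
    heuristics: &[ScoredHeuristic],
    min_confidence: f64,
    max_per_skill: usize,
) -> String {
    if !enabled {
        return String::new();
    }
    let mut kept: Vec<&ScoredHeuristic> = heuristics
        .iter()
        .filter(|h| h.confidence >= min_confidence)
        .collect();
    // Highest confidence first; `total_cmp` gives a total order on f64.
    kept.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    kept.truncate(max_per_skill);
    if kept.is_empty() {
        return String::new();
    }
    let mut out = String::from("\n## Learned Heuristics\n");
    for h in kept {
        out.push_str("- ");
        out.push_str(&h.text);
        out.push('\n');
    }
    out
}
```

Returning an empty string for the disabled case keeps the call site a no-op, which is consistent with the comment in the diff.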
