6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -8,15 +8,15 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

### Added

- feat(memory): Memex tool output archive — before compaction, `ToolOutput` bodies in the compaction range are saved to `tool_overflow` with `archive_type = 'archive'`; archived UUIDs are appended as a postfix after LLM summarization so references survive compaction; controlled by `[memory.compression] archive_tool_outputs = false`; archives are excluded from the short-lived cleanup job via `archive_type` column (migration 054, closes #2432)
- feat(memory): ACON per-category compression guidelines — `compression_failure_pairs` now stores a `category` column (`tool_output`, `assistant_reasoning`, `user_context`, `unknown`); the compression guidelines table gains a `category` column with `UNIQUE(version, category)` constraint; the `compression_guidelines` updater can now maintain per-category guideline documents when `categorized_guidelines = true`; failure category is classified from the compaction summary content before calling the LLM (migration 054, closes #2433)
- feat(memory): RL-based admission control — new `AdmissionStrategy` enum with `heuristic` (default) and `rl` variants; `admission_training_data` table records all messages seen by A-MAC (admitted and rejected) to eliminate survivorship bias; `was_recalled` flag is set by `SemanticMemory::recall()` to provide positive training signal; lightweight logistic regression model in `admission_rl.rs` replaces the LLM `future_utility` factor when enough samples are available; weights persisted in `admission_rl_weights` table; controlled by `[memory.admission] admission_strategy`, `rl_min_samples = 500`, `rl_retrain_interval_secs = 3600` (migration 055, closes #2416)
- feat(security): MCP-to-ACP confused-deputy boundary enforcement — when `mcp_to_acp_boundary = true` (default) and agent is in an ACP session, MCP tool results are unconditionally quarantined before entering the ACP response stream; cross-boundary flows emit `CrossBoundaryMcpToAcp` security events and `cross_boundary_mcp_to_acp: true` audit entries (#2417)
- feat(security): env var sanitization for MCP stdio child processes — `LD_PRELOAD`, `LD_LIBRARY_PATH`, `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `DYLD_FRAMEWORK_PATH`, `DYLD_FALLBACK_LIBRARY_PATH` are stripped from ACP-provided env vars (#2417)
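The three memory features above are all driven from config. A minimal sketch of the relevant keys, assembled from the options named in these entries (the non-default values shown, such as enabling the archive and RL admission, are illustrative choices, not new defaults):

```toml
[memory.compression]
archive_tool_outputs = true      # Memex tool-output archive; default is false

[memory.compression_guidelines]
categorized_guidelines = true    # per-category ACON guideline documents

[memory.admission]
admission_strategy = "rl"        # "heuristic" (default) or "rl"
rl_min_samples = 500             # heuristic fallback until this many samples
rl_retrain_interval_secs = 3600  # logistic-regression retrain cadence
```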

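The env var sanitization entry lists exactly six loader-injection variables stripped before spawning an MCP stdio child. A minimal sketch of that strip (the function name is illustrative, not the crate's actual API):

```rust
use std::collections::HashMap;

/// Variables that let a caller inject code into a child process on
/// Linux (`LD_*`) and macOS (`DYLD_*`); the changelog entry lists
/// exactly these six as stripped from ACP-provided env vars.
const DANGEROUS_ENV_VARS: &[&str] = &[
    "LD_PRELOAD",
    "LD_LIBRARY_PATH",
    "DYLD_INSERT_LIBRARIES",
    "DYLD_LIBRARY_PATH",
    "DYLD_FRAMEWORK_PATH",
    "DYLD_FALLBACK_LIBRARY_PATH",
];

/// Remove loader-injection variables before spawning the child process.
fn sanitize_env(env: &mut HashMap<String, String>) {
    for var in DANGEROUS_ENV_VARS {
        env.remove(*var);
    }
}

fn main() {
    let mut env = HashMap::from([
        ("PATH".to_string(), "/usr/bin".to_string()),
        ("LD_PRELOAD".to_string(), "/tmp/evil.so".to_string()),
    ]);
    sanitize_env(&mut env);
    assert!(!env.contains_key("LD_PRELOAD"));
    assert!(env.contains_key("PATH")); // benign vars pass through
}
```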
### Changed

- **BREAKING**: `tool_allowlist` type changed from `Vec<String>` to `Option<Vec<String>>` in `ServerEntry` and `McpServerConfig` — `None` means no override (all tools, with untrusted warning), `Some(vec![])` means explicit deny-all (fail-closed), `Some(vec![...])` filters to named tools (#2417)
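The three-state semantics of the new `Option<Vec<String>>` type can be sketched as follows (the function is a hypothetical illustration of the rule stated above, not the crate's actual API):

```rust
/// `None` = no override (all tools, with an untrusted warning elsewhere);
/// `Some(vec![])` = explicit deny-all (fail-closed);
/// `Some(names)` = only the named tools pass.
fn is_tool_allowed(allowlist: &Option<Vec<String>>, tool: &str) -> bool {
    match allowlist {
        None => true, // no override: expose all tools
        Some(names) => names.iter().any(|n| n.as_str() == tool), // empty list denies everything
    }
}

fn main() {
    assert!(is_tool_allowed(&None, "read_file"));
    assert!(!is_tool_allowed(&Some(vec![]), "read_file")); // fail-closed
    assert!(is_tool_allowed(&Some(vec!["read_file".into()]), "read_file"));
    assert!(!is_tool_allowed(&Some(vec!["read_file".into()]), "exec"));
}
```

Note the migration hazard: code that previously passed an empty `Vec<String>` must now decide between `None` (allow all) and `Some(vec![])` (deny all).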

### Added (continued)

- feat(acp): implement `session/close` handler — `ZephAcpAgent::close_session` removes the in-memory session entry, fires `cancel_signal` to stop any running turn, and returns an idempotent `Ok` for unknown session IDs; advertises the `session.close` capability in `initialize()`; gated behind the `unstable-session-close` feature, which is included in the `default` and `acp-unstable` feature sets (closes #2421)
- feat(acp): bump `agent-client-protocol` 0.10.2→0.10.3, `agent-client-protocol-schema` 0.11.2→0.11.3; add `unstable-logout` feature with no-op logout handler and `auth.logout` capability advertisement; add `unstable-elicitation` feature gate (exposes schema types; SDK methods not yet available upstream); fix discovery endpoint `protocol_version` to use `ProtocolVersion::LATEST`; fix double-feature-activation antipattern in `zeph-acp` feature flags (#2411)
- feat(skills): add `category` field to SKILL.md frontmatter — optional grouping for skill library organisation; all 26 bundled skills annotated with categories (`web`, `data`, `dev`, `system`) (#2268)
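The shape of the idempotent `session/close` handler described above can be sketched with hypothetical stand-in types (the `Agent`/`Session` structs and signal type below are illustrative; they are not the zeph codebase's actual definitions):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the agent's session map and cancel signal.
struct Session {
    cancel_signal: std::sync::mpsc::Sender<()>,
}

struct Agent {
    sessions: HashMap<String, Session>,
}

impl Agent {
    /// Close a session: drop the in-memory entry, signal any running
    /// turn to stop, and return `Ok` even for unknown IDs (idempotent).
    fn close_session(&mut self, session_id: &str) -> Result<(), String> {
        if let Some(session) = self.sessions.remove(session_id) {
            let _ = session.cancel_signal.send(()); // best-effort cancel
        }
        Ok(()) // unknown session IDs are not an error
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    let mut agent = Agent {
        sessions: HashMap::from([("sess-1".to_string(), Session { cancel_signal: tx })]),
    };
    assert!(agent.close_session("sess-1").is_ok());
    assert!(rx.try_recv().is_ok()); // running turn was signalled
    assert!(agent.close_session("sess-1").is_ok()); // second close: still Ok
    assert!(agent.close_session("unknown").is_ok()); // unknown ID: still Ok
}
```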
1 change: 1 addition & 0 deletions Cargo.lock


4 changes: 2 additions & 2 deletions README.md
@@ -72,8 +72,8 @@ zeph
|---|---|
| **Hybrid inference** | Ollama, Claude, OpenAI, Google Gemini, any OpenAI-compatible API, or fully local via Candle (GGUF). Providers are declared as `[[llm.providers]]` entries in config. Gemini supports SSE streaming, thinking-part surfacing (Gemini 2.5), and streaming `functionCall` parts. Multi-model orchestrator with fallback chains, EMA latency routing, and adaptive Thompson Sampling for exploration/exploitation-balanced model selection. Cascade routing supports `cost_tiers` for explicit cheapest-first provider ordering and `ClassifierMode::Judge` for LLM-scored query routing. **Complexity triage routing** (`LlmRoutingStrategy::Triage`) classifies each request into Simple/Medium/Complex/Expert tiers before inference and dispatches to the tier-matched provider pool, avoiding over-provisioning cheap queries to expensive models. **PILOT LinUCB bandit routing** (`LlmRoutingStrategy::Bandit`) applies a contextual LinUCB bandit to provider selection — features include query complexity, provider latency history, and time-of-day signals; configured via `[llm.router.bandit]`. Claude extended context (`--extended-context` flag or `enable_extended_context = true`) enables the 1M token window with a TUI `[1M CTX]` header badge; cost warning emitted automatically. Built-in pricing includes gpt-5 and gpt-5-mini. [→ Providers](https://bug-ops.github.io/zeph/concepts/providers.html) |
| **Skills-first architecture** | YAML+Markdown skill files with BM25+cosine hybrid retrieval. Bayesian re-ranking, 4-tier trust model, and self-learning evolution — skills improve from real usage. Agent-as-a-Judge feedback detection with adaptive regex/LLM hybrid analysis across 7 languages (English, Russian, Spanish, German, French, Portuguese, Chinese). The `load_skill` tool lets the LLM fetch the full body of any skill outside the active TOP-N set on demand. [→ Skills](https://bug-ops.github.io/zeph/concepts/skills.html) · [→ Self-learning](https://bug-ops.github.io/zeph/advanced/self-learning.html) |
| **Context engineering** | Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. **HiAgent subgoal-aware compaction** tracks active and completed subgoals — active subgoal messages are protected from eviction while completed subgoals are candidates for summarization with MIG redundancy scoring. Large tool outputs are stored in SQLite (not on disk) and injected on demand via the native `read_overflow` tool, eliminating absolute-path leakage and enabling automatic cleanup on conversation delete. **Failure-driven compression guidelines** (ACON): after each hard compaction, the agent monitors responses for context-loss signals; confirmed failure pairs train an LLM-generated `<compression-guidelines>` block that is injected into every future compaction prompt. `--debug-dump [PATH]` writes every LLM request, response, and raw tool output to numbered files for context debugging; `--dump-format <json\|raw\|trace>` (or `/dump-format` at runtime) switches the output format — `trace` emits OpenTelemetry-compatible OTLP JSON with a session → iteration → LLM-call/tool-call/memory-search span hierarchy. [→ Context](https://bug-ops.github.io/zeph/advanced/context.html) · [→ Debug Dump](https://bug-ops.github.io/zeph/advanced/debug-dump.html) |
| **Semantic memory** | SQLite (default) or PostgreSQL + Qdrant with MMR re-ranking, temporal decay, write-time importance scoring, query-aware memory routing (keyword/semantic/hybrid/episodic), cross-session recall, implicit correction detection, and credential scrubbing. **Structured anchored summarization** preserves factual anchors during compaction; **compaction probe validation** verifies quality via probe questions before committing. **Semantic response caching** deduplicates recall queries. Optional **graph memory** adds entity-relationship tracking with typed edges (8 relationship types), FTS5-accelerated entity search, BFS traversal for multi-hop reasoning, bi-temporal edge versioning (`valid_from`/`valid_to`) with point-in-time historical queries (`/graph history <name>`), configurable `temporal_decay_rate` for recency-weighted scoring, and embedding-based entity resolution for semantic deduplication. **SYNAPSE spreading activation** propagates energy through the entity graph with hop-by-hop decay, lateral inhibition, and edge-type filtering (`[memory.graph.spreading_activation]`). **A-MEM dynamic note linking** creates fire-and-forget similarity edges between notes on each graph write (`[memory.graph.note_linking]`). Background LLM extraction runs fire-and-forget on each turn; graph facts are injected into the context window alongside semantic recall. [→ Memory](https://bug-ops.github.io/zeph/concepts/memory.html) · [→ Graph Memory](https://bug-ops.github.io/zeph/concepts/graph-memory.html) |
| **Context engineering** | Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. **HiAgent subgoal-aware compaction** tracks active and completed subgoals — active subgoal messages are protected from eviction while completed subgoals are candidates for summarization with MIG redundancy scoring. Large tool outputs are stored in SQLite (not on disk) and injected on demand via the native `read_overflow` tool, eliminating absolute-path leakage and enabling automatic cleanup on conversation delete. **Failure-driven compression guidelines** (ACON): after each hard compaction, the agent monitors responses for context-loss signals; confirmed failure pairs train an LLM-generated `<compression-guidelines>` block that is injected into every future compaction prompt. **ACON per-category guidelines** (`categorized_guidelines = true` in `[memory.compression_guidelines]`) tags each failure pair by category (tool_output / assistant_reasoning / user_context) and maintains separate per-category guideline blocks for finer-grained compression control. **Memex tool-output archive** (`archive_tool_outputs = true` in `[memory.compression]`) saves tool output bodies to SQLite before compaction and injects UUID back-references into summaries, preserving retrievability after the live context is discarded. `--debug-dump [PATH]` writes every LLM request, response, and raw tool output to numbered files for context debugging; `--dump-format <json\|raw\|trace>` (or `/dump-format` at runtime) switches the output format — `trace` emits OpenTelemetry-compatible OTLP JSON with a session → iteration → LLM-call/tool-call/memory-search span hierarchy. [→ Context](https://bug-ops.github.io/zeph/advanced/context.html) · [→ Debug Dump](https://bug-ops.github.io/zeph/advanced/debug-dump.html) |
| **Semantic memory** | SQLite (default) or PostgreSQL + Qdrant with MMR re-ranking, temporal decay, write-time importance scoring, query-aware memory routing (keyword/semantic/hybrid/episodic), cross-session recall, implicit correction detection, and credential scrubbing. **Structured anchored summarization** preserves factual anchors during compaction; **compaction probe validation** verifies quality via probe questions before committing. **Semantic response caching** deduplicates recall queries. Optional **graph memory** adds entity-relationship tracking with typed edges (8 relationship types), FTS5-accelerated entity search, BFS traversal for multi-hop reasoning, bi-temporal edge versioning (`valid_from`/`valid_to`) with point-in-time historical queries (`/graph history <name>`), configurable `temporal_decay_rate` for recency-weighted scoring, and embedding-based entity resolution for semantic deduplication. **SYNAPSE spreading activation** propagates energy through the entity graph with hop-by-hop decay, lateral inhibition, and edge-type filtering (`[memory.graph.spreading_activation]`). **A-MEM dynamic note linking** creates fire-and-forget similarity edges between notes on each graph write (`[memory.graph.note_linking]`). **RL-based admission control** (`admission_strategy = "rl"`) replaces the static heuristic write-gate with a logistic regression model trained on the `was_recalled` signal; falls back to heuristic until `rl_min_samples` is reached. Background LLM extraction runs fire-and-forget on each turn; graph facts are injected into the context window alongside semantic recall. [→ Memory](https://bug-ops.github.io/zeph/concepts/memory.html) · [→ Graph Memory](https://bug-ops.github.io/zeph/concepts/graph-memory.html) |
| **IDE integration (ACP)** | Stdio, HTTP+SSE, or WebSocket transport. Multi-session isolation with per-session conversation history and SQLite persistence. Session modes, live tool streaming, LSP diagnostics injection, file following, usage reporting. Works in Zed, Helix, VS Code. [→ ACP](https://bug-ops.github.io/zeph/advanced/acp.html) |
| **Multi-channel I/O** | CLI, Telegram, TUI dashboard — all with streaming. Voice and vision input supported. [→ Channels](https://bug-ops.github.io/zeph/advanced/channels.html) |
| **MCP & A2A** | MCP client with full tool exposure to the model. All MCP tool definitions are sanitized at registration time and again on every `tools/list_changed` refresh — 17 injection-detection patterns, Unicode Cf-category strip, and a 1024-byte description cap prevent prompt injection via malicious server metadata. Configure [mcpls](https://github.com/bug-ops/mcpls) as an MCP server for compiler-level code intelligence: hover, definition, references, diagnostics, call hierarchy, and safe rename via rust-analyzer, pyright, gopls, and 30+ other LSP servers. A2A agent-to-agent protocol for multi-agent orchestration. [→ MCP](https://bug-ops.github.io/zeph/guides/mcp.html) · [→ LSP](https://bug-ops.github.io/zeph/guides/lsp.html) · [→ A2A](https://bug-ops.github.io/zeph/advanced/a2a.html) |
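The three-tier compaction pipeline in the table above (deferred summaries at 70% context usage, pruning at 80%, LLM compaction on overflow) can be sketched as a threshold dispatcher. The thresholds come from the README row; the enum and function names are illustrative, not zeph's actual internals:

```rust
/// Which compaction action fires at a given context usage level.
#[derive(Debug, PartialEq)]
enum CompactionTier {
    None,
    ApplyDeferredSummaries, // >= 70% usage
    Prune,                  // >= 80% usage
    LlmCompaction,          // overflow (>= 100%)
}

fn select_tier(used_tokens: usize, window_tokens: usize) -> CompactionTier {
    let usage = used_tokens as f64 / window_tokens as f64;
    if usage >= 1.0 {
        CompactionTier::LlmCompaction
    } else if usage >= 0.80 {
        CompactionTier::Prune
    } else if usage >= 0.70 {
        CompactionTier::ApplyDeferredSummaries
    } else {
        CompactionTier::None
    }
}

fn main() {
    assert_eq!(select_tier(65_000, 100_000), CompactionTier::None);
    assert_eq!(select_tier(72_000, 100_000), CompactionTier::ApplyDeferredSummaries);
    assert_eq!(select_tier(85_000, 100_000), CompactionTier::Prune);
    assert_eq!(select_tier(120_000, 100_000), CompactionTier::LlmCompaction);
}
```

Cheaper tiers fire first, so the expensive LLM compaction step only runs once summary application and pruning have already failed to keep the window under budget.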