|---|---|
| **Hybrid inference** | Ollama, Claude, OpenAI, Google Gemini, any OpenAI-compatible API, or fully local via Candle (GGUF). Providers are declared as `[[llm.providers]]` entries in config. Gemini supports SSE streaming, thinking-part surfacing (Gemini 2.5), and streaming `functionCall` parts. Multi-model orchestrator with fallback chains, EMA latency routing, and adaptive Thompson Sampling for exploration/exploitation-balanced model selection. Cascade routing supports `cost_tiers` for explicit cheapest-first provider ordering and `ClassifierMode::Judge` for LLM-scored query routing. **Complexity triage routing** (`LlmRoutingStrategy::Triage`) classifies each request into Simple/Medium/Complex/Expert tiers before inference and dispatches to the tier-matched provider pool, avoiding over-provisioning cheap queries to expensive models. **PILOT LinUCB bandit routing** (`LlmRoutingStrategy::Bandit`) applies a contextual LinUCB bandit to provider selection — features include query complexity, provider latency history, and time-of-day signals; configured via `[llm.router.bandit]`. Claude extended context (`--extended-context` flag or `enable_extended_context = true`) enables the 1M token window with a TUI `[1M CTX]` header badge; a cost warning is emitted automatically. Built-in pricing includes gpt-5 and gpt-5-mini. [→ Providers](https://bug-ops.github.io/zeph/concepts/providers.html) |
| **Skills-first architecture** | YAML+Markdown skill files with BM25+cosine hybrid retrieval. Bayesian re-ranking, 4-tier trust model, and self-learning evolution — skills improve from real usage. Agent-as-a-Judge feedback detection with adaptive regex/LLM hybrid analysis across 7 languages (English, Russian, Spanish, German, French, Portuguese, Chinese). The `load_skill` tool lets the LLM fetch the full body of any skill outside the active TOP-N set on demand. [→ Skills](https://bug-ops.github.io/zeph/concepts/skills.html) · [→ Self-learning](https://bug-ops.github.io/zeph/advanced/self-learning.html) |
| **Context engineering** | Semantic skill selection, command-aware output filters, tool-pair summarization with deferred application (pre-computed eagerly, applied lazily to stabilize the Claude API prompt cache prefix), proactive context compression (reactive + proactive strategies), and reactive middle-out compaction keep the window efficient under any load. Three-tier compaction pipeline: deferred summary application at 70% context usage → pruning at 80% → LLM compaction on overflow. **HiAgent subgoal-aware compaction** tracks active and completed subgoals — active subgoal messages are protected from eviction while completed subgoals are candidates for summarization with MIG redundancy scoring. Large tool outputs are stored in SQLite (not on disk) and injected on demand via the native `read_overflow` tool, eliminating absolute-path leakage and enabling automatic cleanup on conversation delete. **Failure-driven compression guidelines** (ACON): after each hard compaction, the agent monitors responses for context-loss signals; confirmed failure pairs train an LLM-generated `<compression-guidelines>` block that is injected into every future compaction prompt. **ACON per-category guidelines** (`categorized_guidelines = true` in `[memory.compression_guidelines]`) tags each failure pair by category (tool_output / assistant_reasoning / user_context) and maintains separate per-category guideline blocks for finer-grained compression control. **Memex tool-output archive** (`archive_tool_outputs = true` in `[memory.compression]`) saves tool output bodies to SQLite before compaction and injects UUID back-references into summaries, preserving retrievability after the live context is discarded. `--debug-dump [PATH]` writes every LLM request, response, and raw tool output to numbered files for context debugging; `--dump-format <json\|raw\|trace>` (or `/dump-format` at runtime) switches the output format — `trace` emits OpenTelemetry-compatible OTLP JSON with a session → iteration → LLM-call/tool-call/memory-search span hierarchy. [→ Context](https://bug-ops.github.io/zeph/advanced/context.html) · [→ Debug Dump](https://bug-ops.github.io/zeph/advanced/debug-dump.html) |
| **Semantic memory** | SQLite (default) or PostgreSQL + Qdrant with MMR re-ranking, temporal decay, write-time importance scoring, query-aware memory routing (keyword/semantic/hybrid/episodic), cross-session recall, implicit correction detection, and credential scrubbing. **Structured anchored summarization** preserves factual anchors during compaction; **compaction probe validation** verifies quality via probe questions before committing. **Semantic response caching** deduplicates recall queries. Optional **graph memory** adds entity-relationship tracking with typed edges (8 relationship types), FTS5-accelerated entity search, BFS traversal for multi-hop reasoning, bi-temporal edge versioning (`valid_from`/`valid_to`) with point-in-time historical queries (`/graph history <name>`), configurable `temporal_decay_rate` for recency-weighted scoring, and embedding-based entity resolution for semantic deduplication. **SYNAPSE spreading activation** propagates energy through the entity graph with hop-by-hop decay, lateral inhibition, and edge-type filtering (`[memory.graph.spreading_activation]`). **A-MEM dynamic note linking** creates fire-and-forget similarity edges between notes on each graph write (`[memory.graph.note_linking]`). **RL-based admission control** (`admission_strategy = "rl"`) replaces the static heuristic write-gate with a logistic regression model trained on the `was_recalled` signal; falls back to heuristic until `rl_min_samples` is reached. Background LLM extraction runs fire-and-forget on each turn; graph facts are injected into the context window alongside semantic recall. [→ Memory](https://bug-ops.github.io/zeph/concepts/memory.html) · [→ Graph Memory](https://bug-ops.github.io/zeph/concepts/graph-memory.html) |
| **IDE integration (ACP)** | Stdio, HTTP+SSE, or WebSocket transport. Multi-session isolation with per-session conversation history and SQLite persistence. Session modes, live tool streaming, LSP diagnostics injection, file following, usage reporting. Works in Zed, Helix, VS Code. [→ ACP](https://bug-ops.github.io/zeph/advanced/acp.html) |
| **Multi-channel I/O** | CLI, Telegram, TUI dashboard — all with streaming. Voice and vision input supported. [→ Channels](https://bug-ops.github.io/zeph/advanced/channels.html) |
| **MCP & A2A** | MCP client with full tool exposure to the model. All MCP tool definitions are sanitized at registration time and again on every `tools/list_changed` refresh — 17 injection-detection patterns, Unicode Cf-category strip, and a 1024-byte description cap prevent prompt injection via malicious server metadata. Configure [mcpls](https://github.com/bug-ops/mcpls) as an MCP server for compiler-level code intelligence: hover, definition, references, diagnostics, call hierarchy, and safe rename via rust-analyzer, pyright, gopls, and 30+ other LSP servers. A2A agent-to-agent protocol for multi-agent orchestration. [→ MCP](https://bug-ops.github.io/zeph/guides/mcp.html) · [→ LSP](https://bug-ops.github.io/zeph/guides/lsp.html) · [→ A2A](https://bug-ops.github.io/zeph/advanced/a2a.html) |
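
The provider and routing features above are driven by config. A minimal sketch of how the keys named in the table might fit together — `[[llm.providers]]`, `[llm.router.bandit]`, `enable_extended_context`, and the routing strategies come from the table, while the `name` and `strategy` keys and all values are illustrative assumptions, not canonical defaults:

```toml
# Illustrative sketch — only the section names and flags cited in the
# feature table are canonical; every other key and value is hypothetical.

[[llm.providers]]                 # one entry per backend (Claude, Ollama, ...)
name = "claude"
enable_extended_context = true    # 1M token window; TUI shows a [1M CTX] badge

[[llm.providers]]
name = "ollama"

[llm.router]
strategy = "triage"               # complexity triage (Simple/Medium/Complex/Expert),
                                  # or "bandit" for PILOT LinUCB routing

[llm.router.bandit]               # LinUCB features per the table: query complexity,
alpha = 1.0                       # provider latency history, time-of-day
                                  # (the `alpha` key is a hypothetical placeholder)
```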
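
The memory-side switches quoted in the table map onto config sections in the same way. A hedged sketch — the bracketed section names and the flags `categorized_guidelines`, `archive_tool_outputs`, `admission_strategy`, `rl_min_samples`, and `temporal_decay_rate` are taken from the table; section placement of the admission keys, the `enabled` keys, and all values are assumptions:

```toml
# Illustrative sketch — keys not quoted in the feature table are hypothetical.

[memory]
admission_strategy = "rl"          # logistic-regression write-gate; falls back
rl_min_samples = 200               # to the heuristic until this many samples

[memory.compression]
archive_tool_outputs = true        # Memex: tool output bodies -> SQLite archive

[memory.compression_guidelines]
categorized_guidelines = true      # ACON per-category guideline blocks

[memory.graph]
temporal_decay_rate = 0.05         # recency-weighted graph scoring

[memory.graph.spreading_activation]  # SYNAPSE energy propagation
enabled = true                       # (key hypothetical)

[memory.graph.note_linking]          # A-MEM similarity edges between notes
enabled = true                       # (key hypothetical)
```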