feat: per-tool inline filter stats in CLI chat (#473)
* feat: per-tool inline filter stats in CLI chat
Add raw/filtered line counts to FilterResult and FilterStats.
Display inline savings after tool output in CLI:
[shell] cargo test (342 lines -> 28 lines, 91.8% filtered)
Update CHANGELOG, README, and mdBook docs for M26/M26.1 features.
Closes #449
* fix: handle filter_stats field in ToolEvent pattern match
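As a hedged sketch of how the inline stats line from the commit message could be produced (the `FilterStats` type is named in the commit, but the `raw_lines`/`filtered_lines` fields and the `summary` helper here are illustrative, not Zeph's actual API):

```rust
// Illustrative sketch only: field and method names are assumptions,
// not Zeph's real FilterStats API.
struct FilterStats {
    raw_lines: usize,
    filtered_lines: usize,
}

impl FilterStats {
    /// Percentage of lines removed by the filter.
    fn percent_filtered(&self) -> f64 {
        if self.raw_lines == 0 {
            return 0.0;
        }
        100.0 * (self.raw_lines - self.filtered_lines) as f64 / self.raw_lines as f64
    }

    /// Render the one-line summary printed after tool output in CLI chat.
    fn summary(&self, tool: &str, cmd: &str) -> String {
        format!(
            "[{tool}] {cmd} ({} lines -> {} lines, {:.1}% filtered)",
            self.raw_lines,
            self.filtered_lines,
            self.percent_filtered()
        )
    }
}

fn main() {
    let stats = FilterStats { raw_lines: 342, filtered_lines: 28 };
    // Reproduces the example line from the PR description.
    println!("{}", stats.summary("shell", "cargo test"));
}
```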
README.md (20 additions, 27 deletions)
@@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF
## Why Zeph

-**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed.
+**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, log deduplication) — per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard.
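The O(K) selection described above can be sketched as a plain top-K cosine-similarity search (a minimal sketch: the function names are illustrative, and Zeph actually runs this search against a Qdrant index rather than an in-memory scan):

```rust
// Illustrative top-K skill selection by cosine similarity.
// Names are hypothetical, not Zeph's actual types.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return indices of the k skill embeddings most similar to the query.
fn top_k(query: &[f32], skills: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = skills
        .iter()
        .map(|s| cosine(query, s))
        .enumerate()
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let query = vec![1.0, 0.0];
    let skills = vec![vec![0.9, 0.1], vec![0.0, 1.0], vec![1.0, 0.0]];
    // Only the 2 most relevant skills enter the prompt: prompt size is O(k),
    // no matter how many skills are installed.
    assert_eq!(top_k(&query, &skills, 2), vec![2, 0]);
}
```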
**Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
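The proportional budget split above (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) can be illustrated with simple integer arithmetic. The percentages come from the README; the helper function itself is hypothetical:

```rust
// Hypothetical illustration of the proportional context budget split.
// Only the percentages are from the README; the function is not Zeph's API.
fn allocate(total_tokens: usize) -> [(&'static str, usize); 5] {
    [
        ("summaries", total_tokens * 8 / 100),
        ("semantic recall", total_tokens * 8 / 100),
        ("cross-session", total_tokens * 4 / 100),
        ("code context", total_tokens * 30 / 100),
        ("recent history", total_tokens * 50 / 100),
    ]
}

fn main() {
    // For a 100K-token context window, recent history gets half the budget.
    for (name, tokens) in allocate(100_000) {
        println!("{name}: {tokens} tokens");
    }
}
```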
|**Skill Trust & Quarantine**| 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills ||
|**Prompt Caching**| Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context ||
|**Graceful Shutdown**| Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining ||
-|**TUI Dashboard**| ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) |[TUI](https://bug-ops.github.io/zeph/guide/tui.html)|
+|**TUI Dashboard**| ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) |[TUI](https://bug-ops.github.io/zeph/guide/tui.html)|
|**Multi-Channel I/O**| CLI, Discord, Slack, Telegram, and TUI with streaming support |[Channels](https://bug-ops.github.io/zeph/guide/channels.html)|
|`qdrant`| On | Qdrant vector search for skills and MCP tools |
-|`self-learning`| On | Skill evolution system |
-|`vault-age`| On | Age-encrypted secret storage |
-|`a2a`| Off | A2A protocol client and server |
-|`candle`| Off | Local HuggingFace inference (GGUF) |
-|`index`| Off | AST-based code indexing and semantic retrieval |
-|`mcp`| Off | MCP client for external tool servers |
-|`orchestrator`| Off | Multi-model routing with fallback |
-|`router`| Off | Prompt-based model selection via RouterProvider |
-|`discord`| Off | Discord bot with Gateway v10 WebSocket |
-|`slack`| Off | Slack bot with Events API webhook |
-|`gateway`| Off | HTTP gateway for webhook ingestion |
-|`daemon`| Off | Daemon supervisor for component lifecycle |
-|`scheduler`| Off | Cron-based periodic task scheduler |
-|`otel`| Off | OpenTelemetry OTLP export for Prometheus/Grafana |
-|`metal`| Off | Metal GPU acceleration (macOS) |
-|`tui`| Off | ratatui TUI dashboard with real-time metrics |
-|`cuda`| Off | CUDA GPU acceleration (Linux) |
+The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`.
+
+| Feature | Description |
+|---------|-------------|
+|`a2a`| A2A protocol client and server |
+|`candle`| Local HuggingFace inference (GGUF) |
+|`index`| AST-based code indexing and semantic retrieval |
+|`discord`| Discord bot with Gateway v10 WebSocket |
+|`slack`| Slack bot with Events API webhook |
+|`gateway`| HTTP gateway for webhook ingestion |
+|`daemon`| Daemon supervisor for component lifecycle |
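Assuming a standard Cargo workspace, opting into the non-default features from the table above would look something like this (illustrative invocation using Cargo's standard `--features` flag, not taken from the repo's docs):

```shell
# Default build: only the always-on features (openai, qdrant, mcp, ...).
cargo build --release

# Opt into extras; feature names come from the table above.
cargo build --release --features "tui,discord,index"
```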
docs/src/architecture/token-efficiency.md (24 additions, 0 deletions)
@@ -55,6 +55,30 @@ MCP tools follow the same pipeline:
Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory.
+
+### Output Filter Pipeline
+
+Tool output is compressed before it enters the LLM context. A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM never sees raw output.
+
+After each filtered execution, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System — Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details.
+
+### Token Savings Tracking
+
+`MetricsSnapshot` tracks cumulative filter metrics across the session:
+
+- `filter_raw_tokens` / `filter_saved_tokens` — aggregate volume before and after filtering
+- `filter_total_commands` / `filter_filtered_commands` — hit rate denominator/numerator
+- `filter_confidence_full/partial/fallback` — distribution of filter confidence levels
+
+These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands.
+
### Two-Tier Context Pruning
Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details.
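The command-aware dispatch described in the Output Filter Pipeline section above could be sketched as follows (hypothetical types and a toy test-runner filter; Zeph's built-in filters and matching logic are more involved than this):

```rust
// Hypothetical sketch of command-aware filter dispatch.
// Types and filter logic are illustrative, not Zeph's implementation.
struct Filter {
    /// Predicate deciding whether this filter handles the command.
    matches: fn(&str) -> bool,
    /// Transformation applied to the raw tool output.
    apply: fn(&str) -> String,
}

/// Toy test-runner filter: keep only summary and failure lines.
fn keep_test_summary(output: &str) -> String {
    output
        .lines()
        .filter(|l| l.contains("test result:") || l.contains("FAILED") || l.starts_with("error"))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Run the first matching filter; unmatched commands pass through raw.
fn run_pipeline(cmd: &str, raw: &str) -> String {
    let filters = [Filter {
        matches: |c: &str| c.starts_with("cargo test"),
        apply: keep_test_summary,
    }];
    for f in &filters {
        if (f.matches)(cmd) {
            return (f.apply)(raw);
        }
    }
    raw.to_string()
}

fn main() {
    let raw = "running 3 tests\ntest a ... ok\ntest b ... ok\ntest c ... ok\ntest result: ok. 3 passed; 0 failed";
    // Noise lines are stripped; only the summary reaches the LLM context.
    assert_eq!(run_pipeline("cargo test", raw), "test result: ok. 3 passed; 0 failed");
}
```

Because the dispatch is synchronous and pure (a function of command plus output), it can sit directly in the tool executor's hot path without extra LLM calls.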