Commit 0b2debe

feat: per-tool inline filter stats in CLI chat (#473)
* feat: per-tool inline filter stats in CLI chat

  Add raw/filtered line counts to FilterResult and FilterStats.
  Display inline savings after tool output in CLI:

  [shell] cargo test (342 lines -> 28 lines, 91.8% filtered)

  Update CHANGELOG, README, and mdBook docs for M26/M26.1 features.

  Closes #449

* fix: handle filter_stats field in ToolEvent pattern match
1 parent d13f5f8 commit 0b2debe

11 files changed

+281
-70
lines changed

CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -7,6 +7,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]

 ### Added
+- Per-tool inline filter stats in CLI chat: `[shell] cargo test (342 lines -> 28 lines, 91.8% filtered)` (#449)
+- Filter metrics in TUI Resources panel: confidence distribution, command hit rate, token savings (#448)
+- Periodic 250ms tick in TUI event loop for real-time metrics refresh (#447)
+- Output filter architecture improvements (M26.1): `CommandMatcher` enum, `FilterConfidence`, `FilterPipeline`, `SecurityPatterns`, per-filter TOML config (#452)
+- Token savings tracking and metrics for output filtering (#445)
 - Smart tool output filtering: command-aware filters that compress tool output before context insertion
 - `OutputFilter` trait and `OutputFilterRegistry` with first-match-wins dispatch
 - `sanitize_output()` ANSI escape and progress bar stripping (runs on all tool output)
@@ -37,7 +42,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - Extract bootstrap logic from main.rs into `zeph-core::bootstrap::AppBuilder` (#393): main.rs reduced from 2313 to 978 lines
 - `SecurityConfig` and `TimeoutConfig` gain `Clone + Copy`
 - `AnyChannel` moved from main.rs to zeph-channels crate
-- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
+- Remove 8 lightweight feature gates, make always-on: openai, compatible, orchestrator, router, self-learning, qdrant, vault-age, mcp (#438)
+- Default features reduced to minimal set (empty after M26)
 - Skill matcher concurrency reduced from 50 to 20
 - `String::with_capacity` in context building loops
 - CI updated to use `--features full`

README.md

Lines changed: 20 additions & 27 deletions
@@ -15,7 +15,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingF

 ## Why Zeph

-**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed.
+**Token-efficient by design.** Most agent frameworks inject every tool and instruction into every prompt. Zeph embeds skills and MCP tools as vectors (with concurrent embedding via `buffer_unordered`), then selects only the top-K relevant ones per query via cosine similarity. Prompt size stays O(K) -- not O(N) -- regardless of how many capabilities are installed. Smart output filtering further reduces token consumption by 70-99% for common tool outputs (test results, git logs, clippy diagnostics, directory listings, log deduplication) -- per-command filter stats are shown inline in CLI chat and aggregated in the TUI dashboard.

 **Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.

@@ -118,7 +118,7 @@ cargo build --release --features tui
 | **Skill Trust & Quarantine** | 4-tier trust model (Trusted/Verified/Quarantined/Blocked) with blake3 integrity verification, anomaly detection with automatic blocking, and restricted tool access for untrusted skills | |
 | **Prompt Caching** | Automatic prompt caching for Anthropic and OpenAI providers, reducing latency and cost on repeated context | |
 | **Graceful Shutdown** | Ctrl-C triggers ordered teardown with MCP server cleanup and pending task draining | |
-| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics, message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
+| **TUI Dashboard** | ratatui terminal UI with tree-sitter syntax highlighting, markdown rendering, deferred model warmup, scrollbar, mouse scroll, thinking blocks, conversation history, splash screen, live metrics (including filter savings), message queueing (max 10, FIFO with Ctrl+K clear) | [TUI](https://bug-ops.github.io/zeph/guide/tui.html) |
 | **Multi-Channel I/O** | CLI, Discord, Slack, Telegram, and TUI with streaming support | [Channels](https://bug-ops.github.io/zeph/guide/channels.html) |
 | **Defense-in-Depth** | Shell sandbox with relative path traversal detection, file sandbox, command filter, secret redaction (Google/GitLab patterns), audit log, SSRF protection (agent + MCP), rate limiter TTL eviction, doom-loop detection, skill trust quarantine | [Security](https://bug-ops.github.io/zeph/security.html) |

@@ -155,34 +155,27 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o

 ## Feature Flags

-| Feature | Default | Description |
-|---------|---------|-------------|
-| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
-| `openai` | On | OpenAI provider |
-| `qdrant` | On | Qdrant vector search for skills and MCP tools |
-| `self-learning` | On | Skill evolution system |
-| `vault-age` | On | Age-encrypted secret storage |
-| `a2a` | Off | A2A protocol client and server |
-| `candle` | Off | Local HuggingFace inference (GGUF) |
-| `index` | Off | AST-based code indexing and semantic retrieval |
-| `mcp` | Off | MCP client for external tool servers |
-| `orchestrator` | Off | Multi-model routing with fallback |
-| `router` | Off | Prompt-based model selection via RouterProvider |
-| `discord` | Off | Discord bot with Gateway v10 WebSocket |
-| `slack` | Off | Slack bot with Events API webhook |
-| `gateway` | Off | HTTP gateway for webhook ingestion |
-| `daemon` | Off | Daemon supervisor for component lifecycle |
-| `scheduler` | Off | Cron-based periodic task scheduler |
-| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
-| `metal` | Off | Metal GPU acceleration (macOS) |
-| `tui` | Off | ratatui TUI dashboard with real-time metrics |
-| `cuda` | Off | CUDA GPU acceleration (Linux) |
+The following features are always compiled in (no flag needed): `openai`, `compatible`, `orchestrator`, `router`, `self-learning`, `qdrant`, `vault-age`, `mcp`.
+
+| Feature | Description |
+|---------|-------------|
+| `a2a` | A2A protocol client and server |
+| `candle` | Local HuggingFace inference (GGUF) |
+| `index` | AST-based code indexing and semantic retrieval |
+| `discord` | Discord bot with Gateway v10 WebSocket |
+| `slack` | Slack bot with Events API webhook |
+| `gateway` | HTTP gateway for webhook ingestion |
+| `daemon` | Daemon supervisor for component lifecycle |
+| `scheduler` | Cron-based periodic task scheduler |
+| `otel` | OpenTelemetry OTLP export for Prometheus/Grafana |
+| `metal` | Metal GPU acceleration (macOS) |
+| `tui` | ratatui TUI dashboard with real-time metrics |
+| `cuda` | CUDA GPU acceleration (Linux) |

 ```bash
-cargo build --release # default features only
-cargo build --release --features full # all non-platform features
+cargo build --release # default build (all always-on features included)
+cargo build --release --features full # all optional features
 cargo build --release --features metal # macOS Metal GPU
-cargo build --release --no-default-features # minimal binary (Ollama + Claude only)
 cargo build --release --features tui # with TUI dashboard
 ```

crates/zeph-core/src/agent/streaming.rs

Lines changed: 7 additions & 0 deletions
@@ -313,6 +313,13 @@ impl<C: Channel, T: ToolExecutor> Agent<C, T> {
             let display = self.maybe_redact(&formatted_output);
             self.channel.send(&display).await?;

+            if let Some(ref fs) = output.filter_stats
+                && fs.filtered_lines < fs.raw_lines
+            {
+                let stats_line = fs.format_inline(&output.tool_name);
+                self.channel.send(&stats_line).await?;
+            }
+
             self.push_message(Message::from_parts(
                 Role::User,
                 vec![MessagePart::ToolOutput {
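
Reviewer note: the guard in the hunk above sends the stats line only when stats exist and filtering removed at least one line. The sketch below restates that condition as a match guard, which compiles without `if let` chains. `LineStats` and `stats_message` are hypothetical names introduced for illustration, not part of the commit, and the message omits the percentage to keep the example small.

```rust
/// Hypothetical stand-in for the two line-count fields of `FilterStats`.
#[derive(Debug, Clone, Copy, Default)]
pub struct LineStats {
    pub raw_lines: usize,
    pub filtered_lines: usize,
}

/// Returns the extra message to send, if any.
/// Mirrors the guard in the hunk: stats must be present and the filter
/// must have removed at least one line.
pub fn stats_message(tool_name: &str, stats: Option<&LineStats>) -> Option<String> {
    match stats {
        Some(fs) if fs.filtered_lines < fs.raw_lines => Some(format!(
            "[{}] {} lines -> {} lines",
            tool_name, fs.raw_lines, fs.filtered_lines
        )),
        // No stats, or the filter left the line count unchanged: stay silent.
        _ => None,
    }
}
```

With this shape, an unfiltered command (equal line counts) produces no extra channel message, so the CLI output only grows when there is something to report.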

crates/zeph-tools/src/executor.rs

Lines changed: 33 additions & 0 deletions
@@ -13,6 +13,8 @@ pub struct ToolCall {
 pub struct FilterStats {
     pub raw_chars: usize,
     pub filtered_chars: usize,
+    pub raw_lines: usize,
+    pub filtered_lines: usize,
     pub confidence: Option<crate::FilterConfidence>,
 }

@@ -30,6 +32,16 @@ impl FilterStats {
     pub fn estimated_tokens_saved(&self) -> usize {
         self.raw_chars.saturating_sub(self.filtered_chars) / 4
     }
+
+    #[must_use]
+    pub fn format_inline(&self, tool_name: &str) -> String {
+        format!(
+            "[{tool_name}] {} lines -> {} lines, {:.1}% filtered",
+            self.raw_lines,
+            self.filtered_lines,
+            self.savings_pct()
+        )
+    }
 }

 /// Structured result from tool execution.
@@ -85,6 +97,7 @@ pub enum ToolEvent {
         command: String,
         output: String,
         success: bool,
+        filter_stats: Option<FilterStats>,
     },
 }

@@ -293,4 +306,24 @@ mod tests {
         };
         assert_eq!(fs.estimated_tokens_saved(), 200); // (1000 - 200) / 4
     }
+
+    #[test]
+    fn filter_stats_format_inline() {
+        let fs = FilterStats {
+            raw_chars: 1000,
+            filtered_chars: 200,
+            raw_lines: 342,
+            filtered_lines: 28,
+            ..Default::default()
+        };
+        let line = fs.format_inline("shell");
+        assert_eq!(line, "[shell] 342 lines -> 28 lines, 80.0% filtered");
+    }
+
+    #[test]
+    fn filter_stats_format_inline_zero() {
+        let fs = FilterStats::default();
+        let line = fs.format_inline("bash");
+        assert_eq!(line, "[bash] 0 lines -> 0 lines, 0.0% filtered");
+    }
 }
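
Reviewer note: the test above expects `80.0%` for 342 -> 28 lines, which matches the character counts (1000 -> 200) rather than the line counts. The commit message example (`342 lines -> 28 lines, 91.8% filtered`) matches a line-based ratio instead, and also includes the command and parentheses that `format_inline` does not emit. The sketch below makes the two ratios explicit. The body of `savings_pct` is an assumption inferred from the tests (it is not shown in this diff), `line_savings_pct` is a hypothetical helper, and the `confidence` field is omitted.

```rust
/// Sketch of the stats struct from the diff, without the `confidence` field.
#[derive(Debug, Clone, Default)]
pub struct FilterStats {
    pub raw_chars: usize,
    pub filtered_chars: usize,
    pub raw_lines: usize,
    pub filtered_lines: usize,
}

/// Percentage of `raw` that was removed; 0.0 when there was no input.
fn removed_pct(raw: usize, filtered: usize) -> f64 {
    if raw == 0 {
        return 0.0;
    }
    raw.saturating_sub(filtered) as f64 / raw as f64 * 100.0
}

impl FilterStats {
    /// ASSUMPTION: character-based savings, inferred from the tests in the diff.
    pub fn savings_pct(&self) -> f64 {
        removed_pct(self.raw_chars, self.filtered_chars)
    }

    /// HYPOTHETICAL helper (not in the commit): savings measured in lines.
    pub fn line_savings_pct(&self) -> f64 {
        removed_pct(self.raw_lines, self.filtered_lines)
    }

    /// Same format string as `format_inline` in the diff.
    pub fn format_inline(&self, tool_name: &str) -> String {
        format!(
            "[{}] {} lines -> {} lines, {:.1}% filtered",
            tool_name,
            self.raw_lines,
            self.filtered_lines,
            self.savings_pct()
        )
    }
}
```

If the inline message is meant to describe the line counts it prints, a line-based percentage along these lines may be the less surprising choice; that is a design question for the author rather than something this diff settles.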

crates/zeph-tools/src/filter/mod.rs

Lines changed: 33 additions & 0 deletions
@@ -38,6 +38,8 @@ pub struct FilterResult {
     pub output: String,
     pub raw_chars: usize,
     pub filtered_chars: usize,
+    pub raw_lines: usize,
+    pub filtered_lines: usize,
     pub confidence: FilterConfidence,
 }

@@ -131,6 +133,8 @@ impl<'a> FilterPipeline<'a> {
         FilterResult {
             raw_chars: initial_len,
             filtered_chars: current.len(),
+            raw_lines: count_lines(output),
+            filtered_lines: count_lines(&current),
             output: current,
             confidence: worst,
         }
@@ -531,9 +535,15 @@ pub fn sanitize_output(raw: &str) -> String {
     result
 }

+fn count_lines(s: &str) -> usize {
+    if s.is_empty() { 0 } else { s.lines().count() }
+}
+
 fn make_result(raw: &str, output: String, confidence: FilterConfidence) -> FilterResult {
     let filtered_chars = output.len();
     FilterResult {
+        raw_lines: count_lines(raw),
+        filtered_lines: count_lines(&output),
         output,
         raw_chars: raw.len(),
         filtered_chars,
@@ -577,6 +587,8 @@ mod tests {
             output: String::new(),
             raw_chars: 1000,
             filtered_chars: 200,
+            raw_lines: 0,
+            filtered_lines: 0,
             confidence: FilterConfidence::Full,
         };
         assert!((r.savings_pct() - 80.0).abs() < 0.01);
@@ -588,11 +600,30 @@ mod tests {
             output: String::new(),
             raw_chars: 0,
             filtered_chars: 0,
+            raw_lines: 0,
+            filtered_lines: 0,
             confidence: FilterConfidence::Full,
         };
         assert!((r.savings_pct()).abs() < 0.01);
     }

+    #[test]
+    fn count_lines_helper() {
+        assert_eq!(count_lines(""), 0);
+        assert_eq!(count_lines("one"), 1);
+        assert_eq!(count_lines("one\ntwo\nthree"), 3);
+        assert_eq!(count_lines("trailing\n"), 1);
+    }
+
+    #[test]
+    fn make_result_counts_lines() {
+        let raw = "line1\nline2\nline3\nline4\nline5";
+        let filtered = "line1\nline3".to_owned();
+        let r = make_result(raw, filtered, FilterConfidence::Full);
+        assert_eq!(r.raw_lines, 5);
+        assert_eq!(r.filtered_lines, 2);
+    }
+
     #[test]
     fn registry_disabled_returns_none() {
         let r = OutputFilterRegistry::new(false);
@@ -751,6 +782,8 @@ extra_patterns = ["TODO: security review"]
             output: "short".into(),
             raw_chars: 100,
             filtered_chars: 5,
+            raw_lines: 10,
+            filtered_lines: 1,
             confidence: FilterConfidence::Full,
         };
         m.record(&r);
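
Reviewer note: `count_lines` relies on `str::lines`, so its edge cases follow the standard library's rules. The sketch below copies the helper from the diff and adds a second, hypothetical variant (`count_lines_unguarded`) to show that the empty-string guard looks redundant, since `"".lines()` yields no items. That guard may still be worth keeping for readability.

```rust
/// Same body as the helper added in the diff.
pub fn count_lines(s: &str) -> usize {
    if s.is_empty() { 0 } else { s.lines().count() }
}

/// HYPOTHETICAL variant without the guard, for comparison only.
pub fn count_lines_unguarded(s: &str) -> usize {
    s.lines().count()
}
```

Cases worth keeping in mind when reading the inline stats: a trailing newline does not add a line, a lone `"\n"` counts as one (empty) line, blank lines in the middle are counted, and `\r\n` endings are treated like `\n`.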

crates/zeph-tools/src/shell.rs

Lines changed: 21 additions & 9 deletions
@@ -207,16 +207,8 @@ impl ShellExecutor {
             };
             self.log_audit(block, result, duration_ms).await;

-            if let Some(ref tx) = self.tool_event_tx {
-                let _ = tx.send(ToolEvent::Completed {
-                    tool_name: "bash".to_owned(),
-                    command: (*block).to_owned(),
-                    output: out.clone(),
-                    success: !out.contains("[error]"),
-                });
-            }
-
             let sanitized = sanitize_output(&out);
+            let mut per_block_stats: Option<FilterStats> = None;
             let filtered = if let Some(ref registry) = self.output_filter_registry {
                 match registry.apply(block, &sanitized, exit_code) {
                     Some(fr) => {
@@ -227,21 +219,41 @@ impl ShellExecutor {
                             savings_pct = fr.savings_pct(),
                             "output filter applied"
                         );
+                        let block_fs = FilterStats {
+                            raw_chars: fr.raw_chars,
+                            filtered_chars: fr.filtered_chars,
+                            raw_lines: fr.raw_lines,
+                            filtered_lines: fr.filtered_lines,
+                            confidence: Some(fr.confidence),
+                        };
                         let stats =
                             cumulative_filter_stats.get_or_insert_with(FilterStats::default);
                         stats.raw_chars += fr.raw_chars;
                         stats.filtered_chars += fr.filtered_chars;
+                        stats.raw_lines += fr.raw_lines;
+                        stats.filtered_lines += fr.filtered_lines;
                         stats.confidence = Some(match (stats.confidence, fr.confidence) {
                             (Some(prev), cur) => crate::filter::worse_confidence(prev, cur),
                             (None, cur) => cur,
                         });
+                        per_block_stats = Some(block_fs);
                         fr.output
                     }
                     None => sanitized,
                 }
             } else {
                 sanitized
             };
+
+            if let Some(ref tx) = self.tool_event_tx {
+                let _ = tx.send(ToolEvent::Completed {
+                    tool_name: "bash".to_owned(),
+                    command: (*block).to_owned(),
+                    output: out.clone(),
+                    success: !out.contains("[error]"),
+                    filter_stats: per_block_stats,
+                });
+            }
             outputs.push(format!("$ {block}\n{filtered}"));
         }
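
Reviewer note: this hunk keeps two sets of stats, a per-block value attached to each `ToolEvent::Completed` and a cumulative value summed across blocks. The sketch below isolates the accumulation step. The `Partial` and `Fallback` variants and their ordering are assumptions based on the metric names in the docs, `worse_confidence` here is a hypothetical stand-in for `crate::filter::worse_confidence`, and `accumulate` is a name introduced for illustration.

```rust
/// ASSUMPTION: variants ordered from best (`Full`) to worst (`Fallback`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FilterConfidence {
    Full,
    Partial,
    Fallback,
}

/// Hypothetical stand-in for `crate::filter::worse_confidence`.
pub fn worse_confidence(a: FilterConfidence, b: FilterConfidence) -> FilterConfidence {
    a.max(b) // the derived ordering makes the later variant the "worse" one
}

#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct FilterStats {
    pub raw_chars: usize,
    pub filtered_chars: usize,
    pub raw_lines: usize,
    pub filtered_lines: usize,
    pub confidence: Option<FilterConfidence>,
}

/// Folds one block's stats into the running total, field by field.
pub fn accumulate(total: &mut Option<FilterStats>, block: &FilterStats) {
    let stats = total.get_or_insert_with(FilterStats::default);
    stats.raw_chars += block.raw_chars;
    stats.filtered_chars += block.filtered_chars;
    stats.raw_lines += block.raw_lines;
    stats.filtered_lines += block.filtered_lines;
    // The total is only as trustworthy as its least confident block.
    stats.confidence = match (stats.confidence, block.confidence) {
        (Some(prev), Some(cur)) => Some(worse_confidence(prev, cur)),
        (None, cur) => cur,
        (prev, None) => prev,
    };
}
```

Keeping the per-block value separate means the inline CLI message can describe exactly one command, while the cumulative value stays available for the aggregate result.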

docs/src/architecture/token-efficiency.md

Lines changed: 24 additions & 0 deletions
@@ -55,6 +55,30 @@ MCP tools follow the same pipeline:

 Prompt size stays constant as you add more capabilities. The only cost of more skills is a slightly larger embedding index in Qdrant or memory.

+### Output Filter Pipeline
+
+Tool output is compressed before it enters the LLM context. A command-aware filter pipeline matches each shell command against a set of built-in filters (test runner output, Clippy diagnostics, git log/diff, directory listings, log deduplication) and strips noise while preserving signal. The pipeline runs synchronously inside the tool executor, so the LLM never sees raw output.
+
+Typical savings by command type:
+
+| Command | Raw lines | Filtered lines | Savings |
+|---------|-----------|----------------|---------|
+| `cargo test` (100 passing, 2 failing) | ~340 | ~30 | ~91% |
+| `cargo clippy` (many warnings) | ~200 | ~50 | ~75% |
+| `git log --oneline -50` | 50 | 20 | 60% |
+
+After each filtered execution, CLI mode prints a one-line stats summary and TUI mode accumulates the savings in the Resources panel. See [Tool System: Output Filter Pipeline](../guide/tools.md#output-filter-pipeline) for configuration details.
+
+### Token Savings Tracking
+
+`MetricsSnapshot` tracks cumulative filter metrics across the session:
+
+- `filter_raw_tokens` / `filter_saved_tokens`: aggregate token volume before filtering and the tokens saved by it
+- `filter_total_commands` / `filter_filtered_commands`: hit rate denominator/numerator
+- `filter_confidence_full/partial/fallback`: distribution of filter confidence levels
+
+These feed into the [TUI filter metrics display](../guide/tui.md#filter-metrics) and are emitted as `tracing::debug!` every 50 commands.
+
 ### Two-Tier Context Pruning

 Long conversations accumulate tool outputs that consume significant context space. Zeph uses a two-tier strategy: Tier 1 selectively prunes old tool outputs (cheap, no LLM call), and Tier 2 falls back to full LLM compaction only when Tier 1 is insufficient. See [Context Engineering](../guide/context.md) for details.
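
Reviewer note: the counters listed under Token Savings Tracking are enough to derive the two headline numbers, command hit rate and token savings. The sketch below shows one way to compute them. `FilterMetrics` is a hypothetical subset of `MetricsSnapshot`: the field names come from the docs above, but the `u64` types and both helper methods are assumptions.

```rust
/// Hypothetical subset of `MetricsSnapshot`; field types are assumed.
#[derive(Debug, Default, Clone, Copy)]
pub struct FilterMetrics {
    pub filter_raw_tokens: u64,
    pub filter_saved_tokens: u64,
    pub filter_total_commands: u64,
    pub filter_filtered_commands: u64,
}

/// `num / den` as a percentage, returning 0.0 for an empty denominator.
fn ratio_pct(num: u64, den: u64) -> f64 {
    if den == 0 {
        0.0
    } else {
        num as f64 / den as f64 * 100.0
    }
}

impl FilterMetrics {
    /// Share of commands for which a filter matched.
    pub fn hit_rate_pct(&self) -> f64 {
        ratio_pct(self.filter_filtered_commands, self.filter_total_commands)
    }

    /// Share of raw tokens removed by filtering.
    pub fn token_savings_pct(&self) -> f64 {
        ratio_pct(self.filter_saved_tokens, self.filter_raw_tokens)
    }
}
```

For example, 5 filtered commands out of 20 gives a 25% hit rate, and 800 saved tokens out of 1000 raw tokens gives 80% savings. Guarding the zero denominator keeps a fresh session from reporting `NaN`.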
