Commit d19ceff

docs: update documentation, changelog, and readme for M24
Update feature-flags, configuration, channels, architecture, and security docs to reflect ProviderKind enum, minimal default features, Telegram auth guard, config validation, and path sanitization. Add doc tests step to CI workflow. Update CHANGELOG.md with Unreleased section for M24 changes. Update README.md with new feature flags and architecture notes.
1 parent 95a70d8 commit d19ceff

8 files changed

+99
-29
lines changed

.github/workflows/ci.yml

Lines changed: 5 additions & 0 deletions
@@ -71,6 +71,11 @@ jobs:
         env:
           RUSTC_WRAPPER: sccache
           SCCACHE_GHA_ENABLED: "true"
+      - name: Run doc tests
+        run: cargo test --workspace --features full --doc
+        env:
+          RUSTC_WRAPPER: sccache
+          SCCACHE_GHA_ENABLED: "true"
 
   integration:
     name: Integration Tests

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
@@ -6,6 +6,34 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Added
+- `ProviderKind` enum for type-safe provider selection in config
+- `RuntimeConfig` struct grouping agent runtime fields
+- `AnyProvider::embed_fn()` shared embedding closure helper
+- `Config::validate()` with bounds checking for critical config values
+- `sanitize_paths()` for stripping absolute paths from error messages
+- 10-second timeout wrapper for embedding API calls
+- `full` feature flag enabling all optional features
+
+### Changed
+- `AnyChannel` moved from main.rs to zeph-channels crate
+- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
+- Skill matcher concurrency reduced from 50 to 20
+- `String::with_capacity` in context building loops
+- CI updated to use `--features full`
+
+### Breaking
+- `LlmConfig.provider` changed from `String` to `ProviderKind` enum
+- Default features reduced -- users needing a2a, candle, mcp, openai, orchestrator, router, tui must enable explicitly or use `--features full`
+- Telegram channel rejects empty `allowed_users` at startup
+- Config with extreme values now rejected by `Config::validate()`
+
+### Deprecated
+- `ToolExecutor::execute()` string-based dispatch (use `execute_tool_call()` instead)
+
+### Fixed
+- Closed #410 (clap dropped atty), #411 (rmcp updated quinn-udp), #413 (A2A body limit already present)
+
 ## [0.9.9] - 2026-02-17
 
 ### Added
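
Reviewer note on the `ProviderKind` breaking change: the sketch below illustrates why an enum is safer than a `String` here, since a typo in the provider name fails at parse time rather than at first use. The variant names are inferred from the provider values documented in this commit (`ollama`, `claude`, `openai`, `candle`, `compatible`, `orchestrator`, `router`); the `UnknownProvider` error type and the `FromStr` approach are assumptions for illustration, and the real crate may well rely on serde instead.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical mirror of the provider enum; variants are inferred from
/// the documented config values, not copied from the Zeph source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    Ollama,
    Claude,
    OpenAi,
    Candle,
    Compatible,
    Orchestrator,
    Router,
}

/// Hypothetical error carrying the rejected provider string.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownProvider(pub String);

impl fmt::Display for UnknownProvider {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown LLM provider: {}", self.0)
    }
}

impl FromStr for ProviderKind {
    type Err = UnknownProvider;

    /// Accepts exactly the lowercase names listed in the docs.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ollama" => Ok(Self::Ollama),
            "claude" => Ok(Self::Claude),
            "openai" => Ok(Self::OpenAi),
            "candle" => Ok(Self::Candle),
            "compatible" => Ok(Self::Compatible),
            "orchestrator" => Ok(Self::Orchestrator),
            "router" => Ok(Self::Router),
            other => Err(UnknownProvider(other.to_string())),
        }
    }
}
```

With this shape, `"compatible".parse::<ProviderKind>()` succeeds, while a misspelling such as `"olama"` is rejected before any provider is constructed.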

README.md

Lines changed: 26 additions & 19 deletions
@@ -7,7 +7,7 @@
 [![MSRV](https://img.shields.io/badge/MSRV-1.88-blue)](https://www.rust-lang.org)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, and HuggingFace** models — with semantic skill matching, vector memory, MCP tooling, and agent-to-agent communication. Ships as a single binary for Linux, macOS, and Windows.
+Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingFace, and OpenAI-compatible endpoints** (Together AI, Groq, etc.) — with semantic skill matching, vector memory, MCP tooling, and agent-to-agent communication. Ships as a single binary for Linux, macOS, and Windows.
 
 <div align="center">
 <img src="asset/zeph-logo.png" alt="Zeph" width="600">
@@ -19,7 +19,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, and Hugg
 
 **Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
 
-**Run anywhere.** Local models via Ollama or Candle (GGUF with Metal/CUDA), cloud APIs (Claude, OpenAI, GPT-compatible endpoints like Together AI and Groq), or all of them at once through the multi-model orchestrator with automatic fallback chains.
+**Run anywhere.** Local models via Ollama or Candle (GGUF with Metal/CUDA), cloud APIs (Claude, OpenAI), OpenAI-compatible endpoints (Together AI, Groq, Fireworks) via `CompatibleProvider`, or all of them at once through the multi-model orchestrator with automatic fallback chains and `RouterProvider` for prompt-based model selection.
 
 **Production-ready security.** Shell sandboxing with path restrictions and relative path traversal detection, pattern-based permission policy per tool, destructive command confirmation, file operation sandbox with path traversal protection, tool output overflow-to-file (with LLM-accessible paths), secret redaction (AWS, OpenAI, Anthropic, Google, GitLab), audit logging, SSRF protection (including MCP client), rate limiter with TTL-based eviction, and Trivy-scanned container images with 0 HIGH/CRITICAL CVEs.

@@ -72,8 +72,12 @@ For cloud providers:
 # Claude
 ZEPH_LLM_PROVIDER=claude ZEPH_CLAUDE_API_KEY=sk-ant-... ./target/release/zeph
 
-# OpenAI (or any compatible API)
+# OpenAI
 ZEPH_LLM_PROVIDER=openai ZEPH_OPENAI_API_KEY=sk-... ./target/release/zeph
+
+# OpenAI-compatible endpoint (Together AI, Groq, Fireworks, etc.)
+ZEPH_LLM_PROVIDER=compatible ZEPH_COMPATIBLE_BASE_URL=https://api.together.xyz/v1 \
+ZEPH_COMPATIBLE_API_KEY=... ./target/release/zeph
 ```
 
 For Discord or Slack bot mode (requires respective feature):
@@ -101,7 +105,7 @@ cargo build --release --features tui
 | Feature | Description | Docs |
 |---------|-------------|------|
 | **Native Tool Use** | Structured tool calling via Claude tool_use and OpenAI function calling APIs; automatic fallback to text extraction for local models | [Tools](https://bug-ops.github.io/zeph/guide/tools.html) |
-| **Hybrid Inference** | Ollama, Claude, OpenAI, Candle (GGUF) — local, cloud, or both | [OpenAI](https://bug-ops.github.io/zeph/guide/openai.html) · [Candle](https://bug-ops.github.io/zeph/guide/candle.html) |
+| **Hybrid Inference** | Ollama, Claude, OpenAI, Candle (GGUF), Compatible (any OpenAI-compatible API) — local, cloud, or both | [OpenAI](https://bug-ops.github.io/zeph/guide/openai.html) · [Candle](https://bug-ops.github.io/zeph/guide/candle.html) |
 | **Skills-First Architecture** | Embedding-based top-K matching, progressive loading, hot-reload | [Skills](https://bug-ops.github.io/zeph/guide/skills.html) |
 | **Code Indexing** | AST-based chunking (tree-sitter), semantic retrieval, repo map generation, incremental indexing | [Code Indexing](https://bug-ops.github.io/zeph/guide/code-indexing.html) |
 | **Context Engineering** | Two-tier context pruning (selective tool-output pruning before LLM compaction), semantic recall injection, proportional budget allocation, token-based protection zone for recent context, config hot-reload | [Context](https://bug-ops.github.io/zeph/guide/context.html) · [Configuration](https://bug-ops.github.io/zeph/getting-started/configuration.html) |
@@ -120,15 +124,15 @@ cargo build --release --features tui
 ## Architecture
 
 ```
-zeph (binary) — bootstrap, AnyChannel dispatch, vault resolution (anyhow for top-level errors)
+zeph (binary) — bootstrap, vault resolution (anyhow for top-level errors)
 ├── zeph-core — Agent split into 7 submodules (context, streaming, persistence,
 │ learning, mcp, index), daemon supervisor, typed AgentError/ChannelError, config hot-reload
-├── zeph-llm — LlmProvider: Ollama, Claude, OpenAI, Candle, orchestrator,
-│ native tool_use (Claude/OpenAI), typed LlmError
+├── zeph-llm — LlmProvider: Ollama, Claude, OpenAI, Candle, Compatible, orchestrator,
+RouterProvider, native tool_use (Claude/OpenAI), typed LlmError
 ├── zeph-skills — SKILL.md parser, embedding matcher, hot-reload, self-learning, typed SkillError
 ├── zeph-memory — SQLite + Qdrant, semantic recall, summarization, typed MemoryError
 ├── zeph-index — AST-based code indexing, semantic retrieval, repo map (optional)
-├── zeph-channels — Discord, Slack, Telegram adapters with streaming
+├── zeph-channels — AnyChannel dispatch, Discord, Slack, Telegram adapters with streaming
 ├── zeph-tools — schemars-driven tool registry (shell, file ops, web scrape), composite dispatch
 ├── zeph-mcp — MCP client, multi-server lifecycle, unified tool matching
 ├── zeph-a2a — A2A client + server, agent discovery, JSON-RPC 2.0
@@ -137,7 +141,7 @@ zeph (binary) — bootstrap, AnyChannel dispatch, vault resolution (anyhow for t
 └── zeph-tui — ratatui TUI dashboard with live agent metrics (optional)
 ```
 
-**Error handling:** Typed errors throughout all library crates -- `AgentError` (7 variants), `ChannelError` (4 variants), `LlmError`, `MemoryError`, `SkillError`. `anyhow` is used only in `main.rs` for top-level orchestration. Shared Qdrant operations consolidated via `QdrantOps` helper. `AnyProvider` dispatch deduplicated via `delegate_provider!` macro.
+**Error handling:** Typed errors throughout all library crates -- `AgentError` (7 variants), `ChannelError` (4 variants), `LlmError`, `MemoryError`, `SkillError`. `anyhow` is used only in `main.rs` for top-level orchestration. Shared Qdrant operations consolidated via `QdrantOps` helper. `AnyProvider` dispatch deduplicated via `delegate_provider!` macro. `AnyChannel` enum dispatch lives in `zeph-channels` for reuse across binaries.
 
 **Agent decomposition:** The agent module in `zeph-core` is split into 7 submodules (`mod.rs`, `context.rs`, `streaming.rs`, `persistence.rs`, `learning.rs`, `mcp.rs`, `index.rs`) with 5 inner field-grouping structs (`MemoryState`, `SkillState`, `ContextState`, `McpState`, `IndexState`).

@@ -152,29 +156,32 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o
 
 | Feature | Default | Description |
 |---------|---------|-------------|
-| `a2a` | On | A2A protocol client and server |
-| `openai` | On | OpenAI-compatible provider |
-| `mcp` | On | MCP client for external tool servers |
-| `candle` | On | Local HuggingFace inference (GGUF) |
-| `orchestrator` | On | Multi-model routing with fallback |
-| `qdrant` | On | Qdrant vector search for skills and MCP tools (opt-out) |
+| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
+| `openai` | On | OpenAI provider |
+| `qdrant` | On | Qdrant vector search for skills and MCP tools |
 | `self-learning` | On | Skill evolution system |
 | `vault-age` | On | Age-encrypted secret storage |
-| `index` | On | AST-based code indexing and semantic retrieval |
+| `a2a` | Off | A2A protocol client and server |
+| `candle` | Off | Local HuggingFace inference (GGUF) |
+| `index` | Off | AST-based code indexing and semantic retrieval |
+| `mcp` | Off | MCP client for external tool servers |
+| `orchestrator` | Off | Multi-model routing with fallback |
+| `router` | Off | Prompt-based model selection via RouterProvider |
 | `discord` | Off | Discord bot with Gateway v10 WebSocket |
 | `slack` | Off | Slack bot with Events API webhook |
 | `gateway` | Off | HTTP gateway for webhook ingestion |
 | `daemon` | Off | Daemon supervisor for component lifecycle |
 | `scheduler` | Off | Cron-based periodic task scheduler |
+| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
 | `metal` | Off | Metal GPU acceleration (macOS) |
 | `tui` | Off | ratatui TUI dashboard with real-time metrics |
 | `cuda` | Off | CUDA GPU acceleration (Linux) |
 
 ```bash
-cargo build --release # all defaults
+cargo build --release # default features only
+cargo build --release --features full # all non-platform features
 cargo build --release --features metal # macOS Metal GPU
-cargo build --release --no-default-features # minimal binary
-cargo build --release --features index # with code indexing
+cargo build --release --no-default-features # minimal binary (Ollama + Claude only)
 cargo build --release --features tui # with TUI dashboard
 ```

docs/src/architecture/crates.md

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ SQLite-backed conversation persistence with Qdrant vector search.
 
 Channel implementations for the Zeph agent.
 
+- `AnyChannel` — enum dispatch over all channel variants (Cli, Telegram, Discord, Slack, Tui), used by the binary for runtime channel selection
 - `ChannelError` — typed error enum (`Telegram`, `NoActiveChat`) replacing prior `anyhow` usage
 - `CliChannel` — stdin/stdout with immediate streaming output, blocking recv (queue always empty)
 - `TelegramChannel` — teloxide adapter with MarkdownV2 rendering, streaming via edit-in-place, user whitelisting, inline confirmation keyboards, mpsc-backed message queue with 500ms merge window
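
Reviewer note on `AnyChannel`: the "enum dispatch" described in the added line can be pictured roughly as below. This is a minimal sketch, not the crate's code. The `Channel` trait, its two methods, and the in-memory `sent` buffers are hypothetical stand-ins, and only two of the five documented variants are shown.

```rust
/// Hypothetical minimal channel interface; the real Zeph trait is richer.
pub trait Channel {
    fn name(&self) -> &'static str;
    fn send(&mut self, text: &str) -> Result<(), String>;
}

/// Stand-in channels that record outgoing messages in memory.
#[derive(Default)]
pub struct CliChannel {
    pub sent: Vec<String>,
}

#[derive(Default)]
pub struct TelegramChannel {
    pub sent: Vec<String>,
}

impl Channel for CliChannel {
    fn name(&self) -> &'static str {
        "cli"
    }
    fn send(&mut self, text: &str) -> Result<(), String> {
        self.sent.push(text.to_string());
        Ok(())
    }
}

impl Channel for TelegramChannel {
    fn name(&self) -> &'static str {
        "telegram"
    }
    fn send(&mut self, text: &str) -> Result<(), String> {
        self.sent.push(text.to_string());
        Ok(())
    }
}

/// One enum variant per concrete channel; the binary picks one at runtime.
pub enum AnyChannel {
    Cli(CliChannel),
    Telegram(TelegramChannel),
}

/// Each method forwards to the wrapped channel via `match`, so callers
/// get static dispatch without needing a `Box<dyn Channel>`.
impl Channel for AnyChannel {
    fn name(&self) -> &'static str {
        match self {
            Self::Cli(c) => c.name(),
            Self::Telegram(c) => c.name(),
        }
    }
    fn send(&mut self, text: &str) -> Result<(), String> {
        match self {
            Self::Cli(c) => c.send(text),
            Self::Telegram(c) => c.send(text),
        }
    }
}
```

Keeping the enum in the channels crate rather than in `main.rs` means any binary can construct an `AnyChannel` and treat it as a single channel type.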

docs/src/feature-flags.md

Lines changed: 11 additions & 6 deletions
@@ -4,15 +4,17 @@ Zeph uses Cargo feature flags to control optional functionality. Default feature
 
 | Feature | Default | Description |
 |---------|---------|-------------|
-| `a2a` | Enabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
+| `compatible` | Enabled | `CompatibleProvider` for OpenAI-compatible third-party APIs |
 | `openai` | Enabled | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) |
-| `mcp` | Enabled | MCP client for external tool servers via stdio/HTTP transport |
-| `candle` | Enabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
-| `orchestrator` | Enabled | Multi-model routing with task-based classification and fallback chains |
-| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
 | `qdrant` | Enabled | Qdrant-backed vector storage for skill matching (`zeph-skills`) and MCP tool registry (`zeph-mcp`) |
+| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
 | `vault-age` | Enabled | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) |
-| `index` | Enabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
+| `a2a` | Disabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
+| `candle` | Disabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
+| `index` | Disabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
+| `mcp` | Disabled | MCP client for external tool servers via stdio/HTTP transport |
+| `orchestrator` | Disabled | Multi-model routing with task-based classification and fallback chains |
+| `router` | Disabled | `RouterProvider` for chaining multiple providers with fallback |
 | `discord` | Disabled | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) |
 | `slack` | Disabled | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) |
 | `otel` | Disabled | OpenTelemetry tracing export via OTLP/gRPC ([guide](guide/observability.md)) |
@@ -33,9 +35,12 @@ cargo build --release --features tui # with TUI dashboard
 cargo build --release --features discord # with Discord bot
 cargo build --release --features slack # with Slack bot
 cargo build --release --features gateway,daemon,scheduler # with infrastructure components
+cargo build --release --features full # all optional features
 cargo build --release --no-default-features # minimal binary
 ```
 
+The `full` feature enables every optional feature except `metal`, `cuda`, and `otel`.
+
 ## zeph-index Language Features
 
 When `index` is enabled, tree-sitter grammars are controlled by sub-features on the `zeph-index` crate. All are enabled by default.

docs/src/getting-started/configuration.md

Lines changed: 14 additions & 2 deletions
@@ -17,6 +17,18 @@ ZEPH_CONFIG=/path/to/custom.toml zeph
 
 Priority: `--config` > `ZEPH_CONFIG` > `config/default.toml`.
 
+## Validation
+
+`Config::validate()` runs at startup and rejects out-of-range values:
+
+| Field | Constraint |
+|-------|-----------|
+| `memory.history_limit` | <= 10,000 |
+| `memory.context_budget_tokens` | <= 1,000,000 (when > 0) |
+| `agent.max_tool_iterations` | <= 100 |
+| `a2a.rate_limit` | > 0 |
+| `gateway.rate_limit` | > 0 |
+
 ## Hot-Reload
 
 Zeph watches the config file for changes and applies runtime-safe fields without restart. The file watcher uses 500ms debounce to avoid redundant reloads.
Zeph watches the config file for changes and applies runtime-safe fields without restart. The file watcher uses 500ms debounce to avoid redundant reloads.
@@ -45,7 +57,7 @@ name = "Zeph"
 max_tool_iterations = 10 # Max tool loop iterations per response (default: 10)
 
 [llm]
-provider = "ollama"
+provider = "ollama" # ollama, claude, openai, candle, compatible, orchestrator, router
 base_url = "http://localhost:11434"
 model = "mistral:7b"
 embedding_model = "qwen3-embedding" # Model for text embeddings
@@ -148,7 +160,7 @@ rate_limit = 60
 
 | Variable | Description |
 |----------|-------------|
-| `ZEPH_LLM_PROVIDER` | `ollama`, `claude`, `openai`, `candle`, or `orchestrator` |
+| `ZEPH_LLM_PROVIDER` | `ollama`, `claude`, `openai`, `candle`, `compatible`, `orchestrator`, or `router` |
 | `ZEPH_LLM_BASE_URL` | Ollama API endpoint |
 | `ZEPH_LLM_MODEL` | Model name for Ollama |
 | `ZEPH_LLM_EMBEDDING_MODEL` | Embedding model for Ollama (default: `qwen3-embedding`) |
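
Reviewer note on the Validation table added to this file: the five constraints can be read as the checks sketched below. The bounds come straight from the table; everything else is an assumption. In particular the flat `Limits` struct, its field types, and the `String` error are hypothetical, since the real `Config` nests these fields under `memory`, `agent`, `a2a`, and `gateway`.

```rust
/// Hypothetical flattened view of the validated fields.
#[derive(Debug, Clone)]
pub struct Limits {
    pub history_limit: u32,
    pub context_budget_tokens: u64,
    pub max_tool_iterations: u32,
    pub a2a_rate_limit: u32,
    pub gateway_rate_limit: u32,
}

/// Returns the first violated bound as a human-readable message.
pub fn validate(l: &Limits) -> Result<(), String> {
    if l.history_limit > 10_000 {
        return Err(format!(
            "memory.history_limit must be <= 10000, got {}",
            l.history_limit
        ));
    }
    // Zero passes this check, matching the "(when > 0)" note in the table.
    if l.context_budget_tokens > 1_000_000 {
        return Err(format!(
            "memory.context_budget_tokens must be <= 1000000, got {}",
            l.context_budget_tokens
        ));
    }
    if l.max_tool_iterations > 100 {
        return Err(format!(
            "agent.max_tool_iterations must be <= 100, got {}",
            l.max_tool_iterations
        ));
    }
    if l.a2a_rate_limit == 0 {
        return Err("a2a.rate_limit must be > 0".to_string());
    }
    if l.gateway_rate_limit == 0 {
        return Err("gateway.rate_limit must be > 0".to_string());
    }
    Ok(())
}
```

Returning on the first violation keeps the sketch short; collecting every violation into a list would be a reasonable alternative if users are expected to fix several values at once.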

docs/src/guide/channels.md

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Restrict bot access to specific Telegram usernames:
 allowed_users = ["alice", "bob"]
 ```
 
-When `allowed_users` is empty, the bot accepts messages from all users. Messages from unauthorized users are silently rejected with a warning log.
+The `allowed_users` list **must not be empty**. The Telegram channel refuses to start without at least one allowed username to prevent accidentally exposing the bot to all users. Messages from unauthorized users are silently rejected with a warning log.
 
 ### Bot Commands
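
Reviewer note on the `allowed_users` change: the new behaviour amounts to failing closed at construction time instead of treating an empty list as "allow everyone". A minimal sketch of that guard follows; `TelegramGuard`, `StartupError`, and the exact-match username comparison are hypothetical names and choices made for illustration, not the channel's actual types.

```rust
/// Hypothetical startup error for the Telegram channel.
#[derive(Debug, PartialEq, Eq)]
pub enum StartupError {
    EmptyAllowedUsers,
}

/// Hypothetical holder of the username whitelist.
#[derive(Debug)]
pub struct TelegramGuard {
    allowed: Vec<String>,
}

impl TelegramGuard {
    /// Refuses to construct a guard from an empty whitelist, so a
    /// misconfigured bot never starts in an open state.
    pub fn new(allowed_users: Vec<String>) -> Result<Self, StartupError> {
        if allowed_users.is_empty() {
            return Err(StartupError::EmptyAllowedUsers);
        }
        Ok(Self { allowed: allowed_users })
    }

    /// Exact-match check against the whitelist.
    pub fn is_authorized(&self, username: &str) -> bool {
        self.allowed.iter().any(|u| u == username)
    }
}
```

Because the check lives in the constructor, every later call to `is_authorized` can assume the whitelist has at least one entry.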

docs/src/security.md

Lines changed: 13 additions & 1 deletion
@@ -113,6 +113,18 @@ LLM responses are scanned for common secret patterns before display:
 - Secrets replaced with `[REDACTED]` preserving original whitespace formatting
 - Enabled by default (`security.redact_secrets = true`), applied to both streaming and non-streaming responses
 
+## Config Validation
+
+`Config::validate()` enforces upper bounds at startup to catch configuration errors early:
+
+- `memory.history_limit` <= 10,000
+- `memory.context_budget_tokens` <= 1,000,000 (when non-zero)
+- `agent.max_tool_iterations` <= 100
+- `a2a.rate_limit` > 0
+- `gateway.rate_limit` > 0
+
+The agent exits with an error message if any bound is violated.
+
 ## Timeout Policies
 
 Configurable per-operation timeouts prevent hung connections:
@@ -133,7 +145,7 @@ a2a_seconds = 30 # A2A remote calls
 **Safe execution model:**
 - Commands parsed for blocked patterns, then sandbox-validated, then confirmation-checked
 - Timeout enforcement (default: 30s, configurable)
-- Full errors logged to system, sanitized messages shown to users
+- Full errors logged to system; user-facing messages pass through `sanitize_paths()` which replaces absolute filesystem paths (`/home/`, `/Users/`, `/root/`, `/tmp/`, `/var/`) with `[PATH]` to prevent information disclosure
 - Audit trail for all tool executions (when enabled)
 
 ## Container Security
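
Reviewer note on `sanitize_paths()`: the behaviour described in the changed bullet (absolute paths under `/home/`, `/Users/`, `/root/`, `/tmp/`, `/var/` replaced by `[PATH]`) could look roughly like the sketch below. Only the function name, the prefix list, and the `[PATH]` token come from the docs. The scanning strategy is an assumption: a path is taken to end at whitespace or a quote, and a prefix is matched wherever it appears, even in the middle of a longer path. The real implementation may use a regex and different boundaries.

```rust
/// Path prefixes listed in the security docs.
const PREFIXES: [&str; 5] = ["/home/", "/Users/", "/root/", "/tmp/", "/var/"];

/// Replaces each absolute path starting with a known prefix by `[PATH]`.
/// Assumption: a path runs until whitespace, a quote, or end of input.
pub fn sanitize_paths(msg: &str) -> String {
    let mut out = String::with_capacity(msg.len());
    let mut rest = msg;
    while let Some(ch) = rest.chars().next() {
        if PREFIXES.iter().any(|p| rest.starts_with(p)) {
            // Skip the whole path and emit the placeholder instead.
            let end = rest
                .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
                .unwrap_or(rest.len());
            out.push_str("[PATH]");
            rest = &rest[end..];
        } else {
            // Copy one character through unchanged (UTF-8 safe).
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```

For example, an error such as `failed to read /home/alice/.config/zeph.toml (os error 2)` would reach the user as `failed to read [PATH] (os error 2)`, while relative paths like `src/main.rs` pass through untouched.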
