docs: update documentation, changelog, and readme for M24

bug-ops · bug-ops · commit d19ceff7b104 · 2026-02-17T02:28:57.000+01:00
Update feature-flags, configuration, channels, architecture, and
security docs to reflect ProviderKind enum, minimal default features,
Telegram auth guard, config validation, and path sanitization.

Add doc tests step to CI workflow.
Update CHANGELOG.md with Unreleased section for M24 changes.
Update README.md with new feature flags and architecture notes.
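
Note on the config validation mentioned above: the bounds documented in the configuration.md and security.md hunks can be sketched roughly as follows. This is a minimal illustration, not the Zeph implementation. The `Limits` struct, its field names, the free function `validate`, and the `String` error type are hypothetical stand-ins; only the numeric bounds are taken from this commit.

```rust
/// Hypothetical, flattened stand-in for the fields that `Config::validate()` checks.
#[derive(Debug, Clone)]
struct Limits {
    history_limit: u32,         // memory.history_limit
    context_budget_tokens: u64, // memory.context_budget_tokens (0 is accepted)
    max_tool_iterations: u32,   // agent.max_tool_iterations
    a2a_rate_limit: u32,        // a2a.rate_limit
    gateway_rate_limit: u32,    // gateway.rate_limit
}

/// Returns the first violated bound as an error message (assumed error type).
fn validate(limits: &Limits) -> Result<(), String> {
    if limits.history_limit > 10_000 {
        return Err(format!(
            "memory.history_limit must be <= 10000, got {}",
            limits.history_limit
        ));
    }
    // The docs apply this bound "when > 0"; zero passes the check trivially.
    if limits.context_budget_tokens > 1_000_000 {
        return Err(format!(
            "memory.context_budget_tokens must be <= 1000000, got {}",
            limits.context_budget_tokens
        ));
    }
    if limits.max_tool_iterations > 100 {
        return Err(format!(
            "agent.max_tool_iterations must be <= 100, got {}",
            limits.max_tool_iterations
        ));
    }
    if limits.a2a_rate_limit == 0 {
        return Err("a2a.rate_limit must be > 0".to_string());
    }
    if limits.gateway_rate_limit == 0 {
        return Err("gateway.rate_limit must be > 0".to_string());
    }
    Ok(())
}
```

Under these assumptions, a caller would run `validate` once at startup and exit with the returned message on `Err`, which matches the "agent exits with an error message" behavior the security docs describe.
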
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -71,6 +71,11 @@ jobs:
         env:
           RUSTC_WRAPPER: sccache
           SCCACHE_GHA_ENABLED: "true"
+      - name: Run doc tests
+        run: cargo test --workspace --features full --doc
+        env:
+          RUSTC_WRAPPER: sccache
+          SCCACHE_GHA_ENABLED: "true"
 
   integration:
     name: Integration Tests
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,34 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Added
+- `ProviderKind` enum for type-safe provider selection in config
+- `RuntimeConfig` struct grouping agent runtime fields
+- `AnyProvider::embed_fn()` shared embedding closure helper
+- `Config::validate()` with bounds checking for critical config values
+- `sanitize_paths()` for stripping absolute paths from error messages
+- 10-second timeout wrapper for embedding API calls
+- `full` feature flag enabling all optional features
+
+### Changed
+- `AnyChannel` moved from main.rs to zeph-channels crate
+- Default features reduced to minimal set (qdrant, self-learning, vault-age, compatible, index)
+- Skill matcher concurrency reduced from 50 to 20
+- Context-building loops now preallocate buffers with `String::with_capacity`
+- CI updated to use `--features full`
+
+### Breaking
+- `LlmConfig.provider` changed from `String` to `ProviderKind` enum
+- Default features reduced -- users needing a2a, candle, mcp, openai, orchestrator, router, tui must enable explicitly or use `--features full`
+- Telegram channel rejects empty `allowed_users` at startup
+- Configs with out-of-range values are now rejected by `Config::validate()`
+
+### Deprecated
+- `ToolExecutor::execute()` string-based dispatch (use `execute_tool_call()` instead)
+
+### Fixed
+- Closed #410 (clap dropped atty), #411 (rmcp updated quinn-udp), #413 (A2A body limit already present)
+
 ## [0.9.9] - 2026-02-17
 
 ### Added
diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 [![MSRV](https://img.shields.io/badge/MSRV-1.88-blue)](https://www.rust-lang.org)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, and HuggingFace** models — with semantic skill matching, vector memory, MCP tooling, and agent-to-agent communication. Ships as a single binary for Linux, macOS, and Windows.
+Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, HuggingFace, and OpenAI-compatible endpoints** (Together AI, Groq, etc.) — with semantic skill matching, vector memory, MCP tooling, and agent-to-agent communication. Ships as a single binary for Linux, macOS, and Windows.
 
 <div align="center">
   <img src="asset/zeph-logo.png" alt="Zeph" width="600">
@@ -19,7 +19,7 @@ Lightweight AI agent that routes tasks across **Ollama, Claude, OpenAI, and Hugg
 
 **Intelligent context management.** Two-tier context pruning: Tier 1 selectively removes old tool outputs (clearing bodies from memory after persisting to SQLite) before falling back to Tier 2 LLM-based compaction, reducing unnecessary LLM calls. A token-based protection zone preserves recent context from pruning. Parallel context preparation via `try_join!` and optimized byte-length token estimation. Cross-session memory transfers knowledge between conversations with relevance filtering. Proportional budget allocation (8% summaries, 8% semantic recall, 4% cross-session, 30% code context, 50% recent history) keeps conversations efficient. Tool outputs are truncated at 30K chars with optional LLM-based summarization for large outputs. Doom-loop detection breaks runaway tool cycles after 3 identical consecutive outputs, with configurable iteration limits (default 10). ZEPH.md project config discovery walks up the directory tree and injects project-specific context when available. Config hot-reload applies runtime-safe fields (timeouts, security, memory limits) on file change without restart.
 
-**Run anywhere.** Local models via Ollama or Candle (GGUF with Metal/CUDA), cloud APIs (Claude, OpenAI, GPT-compatible endpoints like Together AI and Groq), or all of them at once through the multi-model orchestrator with automatic fallback chains.
+**Run anywhere.** Local models via Ollama or Candle (GGUF with Metal/CUDA), cloud APIs (Claude, OpenAI), OpenAI-compatible endpoints (Together AI, Groq, Fireworks) via `CompatibleProvider`, or all of them at once through the multi-model orchestrator with automatic fallback chains and `RouterProvider` for prompt-based model selection.
 
 **Production-ready security.** Shell sandboxing with path restrictions and relative path traversal detection, pattern-based permission policy per tool, destructive command confirmation, file operation sandbox with path traversal protection, tool output overflow-to-file (with LLM-accessible paths), secret redaction (AWS, OpenAI, Anthropic, Google, GitLab), audit logging, SSRF protection (including MCP client), rate limiter with TTL-based eviction, and Trivy-scanned container images with 0 HIGH/CRITICAL CVEs.
 
@@ -72,8 +72,12 @@ For cloud providers:
 # Claude
 ZEPH_LLM_PROVIDER=claude ZEPH_CLAUDE_API_KEY=sk-ant-... ./target/release/zeph
 
-# OpenAI (or any compatible API)
+# OpenAI
 ZEPH_LLM_PROVIDER=openai ZEPH_OPENAI_API_KEY=sk-... ./target/release/zeph
+
+# OpenAI-compatible endpoint (Together AI, Groq, Fireworks, etc.)
+ZEPH_LLM_PROVIDER=compatible ZEPH_COMPATIBLE_BASE_URL=https://api.together.xyz/v1 \
+  ZEPH_COMPATIBLE_API_KEY=... ./target/release/zeph
 ```
 
 For Discord or Slack bot mode (requires respective feature):
@@ -101,7 +105,7 @@ cargo build --release --features tui
 | Feature | Description | Docs |
 |---------|-------------|------|
 | **Native Tool Use** | Structured tool calling via Claude tool_use and OpenAI function calling APIs; automatic fallback to text extraction for local models | [Tools](https://bug-ops.github.io/zeph/guide/tools.html) |
-| **Hybrid Inference** | Ollama, Claude, OpenAI, Candle (GGUF) — local, cloud, or both | [OpenAI](https://bug-ops.github.io/zeph/guide/openai.html) · [Candle](https://bug-ops.github.io/zeph/guide/candle.html) |
+| **Hybrid Inference** | Ollama, Claude, OpenAI, Candle (GGUF), Compatible (any OpenAI-compatible API) — local, cloud, or both | [OpenAI](https://bug-ops.github.io/zeph/guide/openai.html) · [Candle](https://bug-ops.github.io/zeph/guide/candle.html) |
 | **Skills-First Architecture** | Embedding-based top-K matching, progressive loading, hot-reload | [Skills](https://bug-ops.github.io/zeph/guide/skills.html) |
 | **Code Indexing** | AST-based chunking (tree-sitter), semantic retrieval, repo map generation, incremental indexing | [Code Indexing](https://bug-ops.github.io/zeph/guide/code-indexing.html) |
 | **Context Engineering** | Two-tier context pruning (selective tool-output pruning before LLM compaction), semantic recall injection, proportional budget allocation, token-based protection zone for recent context, config hot-reload | [Context](https://bug-ops.github.io/zeph/guide/context.html) · [Configuration](https://bug-ops.github.io/zeph/getting-started/configuration.html) |
@@ -120,15 +124,15 @@ cargo build --release --features tui
 ## Architecture
 
 ```
-zeph (binary) — bootstrap, AnyChannel dispatch, vault resolution (anyhow for top-level errors)
+zeph (binary) — bootstrap, vault resolution (anyhow for top-level errors)
 ├── zeph-core       — Agent split into 7 submodules (context, streaming, persistence,
 │                     learning, mcp, index), daemon supervisor, typed AgentError/ChannelError, config hot-reload
-├── zeph-llm        — LlmProvider: Ollama, Claude, OpenAI, Candle, orchestrator,
-│                     native tool_use (Claude/OpenAI), typed LlmError
+├── zeph-llm        — LlmProvider: Ollama, Claude, OpenAI, Candle, Compatible, orchestrator,
+│                     RouterProvider, native tool_use (Claude/OpenAI), typed LlmError
 ├── zeph-skills     — SKILL.md parser, embedding matcher, hot-reload, self-learning, typed SkillError
 ├── zeph-memory     — SQLite + Qdrant, semantic recall, summarization, typed MemoryError
 ├── zeph-index      — AST-based code indexing, semantic retrieval, repo map (optional)
-├── zeph-channels   — Discord, Slack, Telegram adapters with streaming
+├── zeph-channels   — AnyChannel dispatch, Discord, Slack, Telegram adapters with streaming
 ├── zeph-tools      — schemars-driven tool registry (shell, file ops, web scrape), composite dispatch
 ├── zeph-mcp        — MCP client, multi-server lifecycle, unified tool matching
 ├── zeph-a2a        — A2A client + server, agent discovery, JSON-RPC 2.0
@@ -137,7 +141,7 @@ zeph (binary) — bootstrap, AnyChannel dispatch, vault resolution (anyhow for t
 └── zeph-tui        — ratatui TUI dashboard with live agent metrics (optional)
 ```
 
-**Error handling:** Typed errors throughout all library crates -- `AgentError` (7 variants), `ChannelError` (4 variants), `LlmError`, `MemoryError`, `SkillError`. `anyhow` is used only in `main.rs` for top-level orchestration. Shared Qdrant operations consolidated via `QdrantOps` helper. `AnyProvider` dispatch deduplicated via `delegate_provider!` macro.
+**Error handling:** Typed errors throughout all library crates -- `AgentError` (7 variants), `ChannelError` (4 variants), `LlmError`, `MemoryError`, `SkillError`. `anyhow` is used only in `main.rs` for top-level orchestration. Shared Qdrant operations consolidated via `QdrantOps` helper. `AnyProvider` dispatch deduplicated via `delegate_provider!` macro. `AnyChannel` enum dispatch lives in `zeph-channels` for reuse across binaries.
 
 **Agent decomposition:** The agent module in `zeph-core` is split into 7 submodules (`mod.rs`, `context.rs`, `streaming.rs`, `persistence.rs`, `learning.rs`, `mcp.rs`, `index.rs`) with 5 inner field-grouping structs (`MemoryState`, `SkillState`, `ContextState`, `McpState`, `IndexState`).
 
@@ -152,29 +156,32 @@ Deep dive: [Architecture overview](https://bug-ops.github.io/zeph/architecture/o
 
 | Feature | Default | Description |
 |---------|---------|-------------|
-| `a2a` | On | A2A protocol client and server |
-| `openai` | On | OpenAI-compatible provider |
-| `mcp` | On | MCP client for external tool servers |
-| `candle` | On | Local HuggingFace inference (GGUF) |
-| `orchestrator` | On | Multi-model routing with fallback |
-| `qdrant` | On | Qdrant vector search for skills and MCP tools (opt-out) |
+| `compatible` | On | OpenAI-compatible provider (Together AI, Groq, Fireworks, etc.) |
+| `openai` | On | OpenAI provider |
+| `qdrant` | On | Qdrant vector search for skills and MCP tools |
 | `self-learning` | On | Skill evolution system |
 | `vault-age` | On | Age-encrypted secret storage |
-| `index` | On | AST-based code indexing and semantic retrieval |
+| `a2a` | Off | A2A protocol client and server |
+| `candle` | Off | Local HuggingFace inference (GGUF) |
+| `index` | Off | AST-based code indexing and semantic retrieval |
+| `mcp` | Off | MCP client for external tool servers |
+| `orchestrator` | Off | Multi-model routing with fallback |
+| `router` | Off | Prompt-based model selection via RouterProvider |
 | `discord` | Off | Discord bot with Gateway v10 WebSocket |
 | `slack` | Off | Slack bot with Events API webhook |
 | `gateway` | Off | HTTP gateway for webhook ingestion |
 | `daemon` | Off | Daemon supervisor for component lifecycle |
 | `scheduler` | Off | Cron-based periodic task scheduler |
+| `otel` | Off | OpenTelemetry OTLP export for Prometheus/Grafana |
 | `metal` | Off | Metal GPU acceleration (macOS) |
 | `tui` | Off | ratatui TUI dashboard with real-time metrics |
 | `cuda` | Off | CUDA GPU acceleration (Linux) |
 
 ```bash
-cargo build --release                        # all defaults
+cargo build --release                        # default features only
+cargo build --release --features full        # all non-platform features
 cargo build --release --features metal       # macOS Metal GPU
-cargo build --release --no-default-features  # minimal binary
-cargo build --release --features index       # with code indexing
+cargo build --release --no-default-features  # minimal binary (Ollama + Claude only)
 cargo build --release --features tui         # with TUI dashboard
 ```
 
diff --git a/docs/src/architecture/crates.md b/docs/src/architecture/crates.md
@@ -58,6 +58,7 @@ SQLite-backed conversation persistence with Qdrant vector search.
 
 Channel implementations for the Zeph agent.
 
+- `AnyChannel` — enum dispatch over all channel variants (Cli, Telegram, Discord, Slack, Tui), used by the binary for runtime channel selection
 - `ChannelError` — typed error enum (`Telegram`, `NoActiveChat`) replacing prior `anyhow` usage
 - `CliChannel` — stdin/stdout with immediate streaming output, blocking recv (queue always empty)
 - `TelegramChannel` — teloxide adapter with MarkdownV2 rendering, streaming via edit-in-place, user whitelisting, inline confirmation keyboards, mpsc-backed message queue with 500ms merge window
diff --git a/docs/src/feature-flags.md b/docs/src/feature-flags.md
@@ -4,15 +4,17 @@ Zeph uses Cargo feature flags to control optional functionality. Default feature
 
 | Feature | Default | Description |
 |---------|---------|-------------|
-| `a2a` | Enabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
+| `compatible` | Enabled | `CompatibleProvider` for OpenAI-compatible third-party APIs |
 | `openai` | Enabled | OpenAI-compatible provider (GPT, Together, Groq, Fireworks, etc.) |
-| `mcp` | Enabled | MCP client for external tool servers via stdio/HTTP transport |
-| `candle` | Enabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
-| `orchestrator` | Enabled | Multi-model routing with task-based classification and fallback chains |
-| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
 | `qdrant` | Enabled | Qdrant-backed vector storage for skill matching (`zeph-skills`) and MCP tool registry (`zeph-mcp`) |
+| `self-learning` | Enabled | Skill evolution via failure detection, self-reflection, and LLM-generated improvements |
 | `vault-age` | Enabled | Age-encrypted vault backend for file-based secret storage ([age](https://age-encryption.org/)) |
-| `index` | Enabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
+| `a2a` | Disabled | [A2A protocol](https://github.com/a2aproject/A2A) client and server for agent-to-agent communication |
+| `candle` | Disabled | Local HuggingFace model inference via [candle](https://github.com/huggingface/candle) (GGUF quantized models) |
+| `index` | Disabled | AST-based code indexing and semantic retrieval via tree-sitter ([guide](guide/code-indexing.md)) |
+| `mcp` | Disabled | MCP client for external tool servers via stdio/HTTP transport |
+| `orchestrator` | Disabled | Multi-model routing with task-based classification and fallback chains |
+| `router` | Disabled | `RouterProvider` for chaining multiple providers with fallback |
 | `discord` | Disabled | Discord channel adapter with Gateway v10 WebSocket and slash commands ([guide](guide/channels.md#discord-channel)) |
 | `slack` | Disabled | Slack channel adapter with Events API webhook and HMAC-SHA256 verification ([guide](guide/channels.md#slack-channel)) |
 | `otel` | Disabled | OpenTelemetry tracing export via OTLP/gRPC ([guide](guide/observability.md)) |
@@ -33,9 +35,12 @@ cargo build --release --features tui                      # with TUI dashboard
 cargo build --release --features discord                    # with Discord bot
 cargo build --release --features slack                      # with Slack bot
 cargo build --release --features gateway,daemon,scheduler  # with infrastructure components
+cargo build --release --features full                      # all optional features except metal, cuda, otel
 cargo build --release --no-default-features               # minimal binary
 ```
 
+The `full` feature enables every optional feature except `metal`, `cuda`, and `otel`.
+
 ## zeph-index Language Features
 
 When `index` is enabled, tree-sitter grammars are controlled by sub-features on the `zeph-index` crate. All are enabled by default.
diff --git a/docs/src/getting-started/configuration.md b/docs/src/getting-started/configuration.md
@@ -17,6 +17,18 @@ ZEPH_CONFIG=/path/to/custom.toml zeph
 
 Priority: `--config` > `ZEPH_CONFIG` > `config/default.toml`.
 
+## Validation
+
+`Config::validate()` runs at startup and rejects out-of-range values:
+
+| Field | Constraint |
+|-------|-----------|
+| `memory.history_limit` | <= 10,000 |
+| `memory.context_budget_tokens` | <= 1,000,000 (when > 0) |
+| `agent.max_tool_iterations` | <= 100 |
+| `a2a.rate_limit` | > 0 |
+| `gateway.rate_limit` | > 0 |
+
 ## Hot-Reload
 
 Zeph watches the config file for changes and applies runtime-safe fields without restart. The file watcher uses 500ms debounce to avoid redundant reloads.
@@ -45,7 +57,7 @@ name = "Zeph"
 max_tool_iterations = 10  # Max tool loop iterations per response (default: 10)
 
 [llm]
-provider = "ollama"
+provider = "ollama"  # ollama, claude, openai, candle, compatible, orchestrator, router
 base_url = "http://localhost:11434"
 model = "mistral:7b"
 embedding_model = "qwen3-embedding"  # Model for text embeddings
@@ -148,7 +160,7 @@ rate_limit = 60
 
 | Variable | Description |
 |----------|-------------|
-| `ZEPH_LLM_PROVIDER` | `ollama`, `claude`, `openai`, `candle`, or `orchestrator` |
+| `ZEPH_LLM_PROVIDER` | `ollama`, `claude`, `openai`, `candle`, `compatible`, `orchestrator`, or `router` |
 | `ZEPH_LLM_BASE_URL` | Ollama API endpoint |
 | `ZEPH_LLM_MODEL` | Model name for Ollama |
 | `ZEPH_LLM_EMBEDDING_MODEL` | Embedding model for Ollama (default: `qwen3-embedding`) |
diff --git a/docs/src/guide/channels.md b/docs/src/guide/channels.md
@@ -64,7 +64,7 @@ Restrict bot access to specific Telegram usernames:
 allowed_users = ["alice", "bob"]
 ```
 
-When `allowed_users` is empty, the bot accepts messages from all users. Messages from unauthorized users are silently rejected with a warning log.
+The `allowed_users` list **must not be empty**. The Telegram channel refuses to start without at least one allowed username, which prevents accidentally exposing the bot to all users. Messages from unauthorized users are silently rejected with a warning log.
 
 ### Bot Commands
 
diff --git a/docs/src/security.md b/docs/src/security.md
@@ -113,6 +113,18 @@ LLM responses are scanned for common secret patterns before display:
 - Secrets replaced with `[REDACTED]` preserving original whitespace formatting
 - Enabled by default (`security.redact_secrets = true`), applied to both streaming and non-streaming responses
 
+## Config Validation
+
+`Config::validate()` enforces bounds at startup to catch configuration errors early:
+
+- `memory.history_limit` <= 10,000
+- `memory.context_budget_tokens` <= 1,000,000 (when non-zero)
+- `agent.max_tool_iterations` <= 100
+- `a2a.rate_limit` > 0
+- `gateway.rate_limit` > 0
+
+The agent exits with an error message if any bound is violated.
+
 ## Timeout Policies
 
 Configurable per-operation timeouts prevent hung connections:
@@ -133,7 +145,7 @@ a2a_seconds = 30        # A2A remote calls
 **Safe execution model:**
 - Commands parsed for blocked patterns, then sandbox-validated, then confirmation-checked
 - Timeout enforcement (default: 30s, configurable)
-- Full errors logged to system, sanitized messages shown to users
+- Full errors logged to system; user-facing messages pass through `sanitize_paths()`, which replaces absolute filesystem paths (`/home/`, `/Users/`, `/root/`, `/tmp/`, `/var/`) with `[PATH]` to prevent information disclosure
 - Audit trail for all tool executions (when enabled)
 
 ## Container Security
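
Note on the path sanitization described in the security.md hunk above: one way such a helper could work is sketched below. This is an assumption-laden illustration using only the standard library, not the code shipped in Zeph. The prefix list and the `[PATH]` placeholder come from the docs in this commit; the function signature, the boundary check, and the choice of terminating characters are hypothetical.

```rust
/// Prefixes listed in the security docs of this commit.
const PATH_PREFIXES: [&str; 5] = ["/home/", "/Users/", "/root/", "/tmp/", "/var/"];

/// Hypothetical sketch: replace absolute paths under known prefixes with `[PATH]`.
fn sanitize_paths(message: &str) -> String {
    let mut out = String::with_capacity(message.len());
    let mut rest = message;
    while let Some(ch) = rest.chars().next() {
        // Only treat a prefix as a path start if it does not continue a longer token
        // (assumed rule, so that `src/tmp/x` is left untouched).
        let at_boundary = out
            .chars()
            .next_back()
            .map_or(true, |c| !(c.is_alphanumeric() || matches!(c, '/' | '.' | '_' | '-')));
        if at_boundary && PATH_PREFIXES.iter().any(|p| rest.starts_with(p)) {
            // Assumed terminators: whitespace, quotes, a closing parenthesis, or a colon.
            let end = rest
                .find(|c: char| c.is_whitespace() || matches!(c, '"' | '\'' | ')' | ':'))
                .unwrap_or(rest.len());
            out.push_str("[PATH]");
            rest = &rest[end..];
        } else {
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```

With this sketch, a message such as `failed to open /home/alice/.config/zeph.toml: No such file` would be shown to the user as `failed to open [PATH]: No such file`, while the full error can still be written to the system log.
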