Commit 68ac332

chore: release v0.4.0 (#66)
Update version to 0.4.0 across workspace manifests. Add comprehensive documentation for M9 milestone features including Qdrant integration, semantic memory orchestration, conversation summarization, and context budget management.

Changes:
- Bump version from 0.3.0 to 0.4.0 in workspace package section
- Update all zeph-* crate versions in workspace dependencies
- Document M9 Phase 1 (Qdrant integration) in CHANGELOG
- Document M9 Phase 2 (semantic memory integration) in CHANGELOG
- Document M9 Phase 3 (summarization and context budget) in CHANGELOG
- Add semantic memory section to README with Qdrant setup instructions
- Add conversation summarization section to README
- Update configuration examples with new memory.semantic settings
- Add environment variables for Qdrant URL and summarization config
- Update architecture diagram with embeddings and semantic capabilities
1 parent 5c01711 commit 68ac332

File tree: 4 files changed, +120 −18 lines

CHANGELOG.md

Lines changed: 38 additions & 0 deletions

```diff
@@ -6,8 +6,46 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+## [0.4.0] - 2026-02-08
+
 ### Added
 
+#### M9 Phase 3: Conversation Summarization and Context Budget (Issue #62)
+- New `SemanticMemory::summarize()` method for LLM-based conversation compression
+- Automatic summarization triggered when message count exceeds threshold
+- SQLite migration `003_summaries.sql` creates dedicated summaries table with CASCADE constraints
+- `SqliteStore::save_summary()` stores summary with metadata (first/last message IDs, token estimate)
+- `SqliteStore::load_summaries()` retrieves all summaries for a conversation ordered by ID
+- `SqliteStore::load_messages_range()` fetches messages after specific ID with limit for batch processing
+- `SqliteStore::count_messages()` counts total messages in conversation
+- `SqliteStore::latest_summary_last_message_id()` gets last summarized message ID for resumption
+- `ContextBudget` struct for proportional token allocation (15% summaries, 25% semantic recall, 60% recent history)
+- `estimate_tokens()` helper using chars/4 heuristic (100x faster than tiktoken, ±25% accuracy)
+- `Agent::check_summarization()` lazy trigger after `persist_message()` when threshold exceeded
+- Batch size = threshold/2 to balance summary quality with LLM call frequency
+- Configuration: `memory.summarization_threshold` (default: 100), `memory.context_budget_tokens` (default: 0 = unlimited)
+- Environment overrides: `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD`, `ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS`
+- Inline comments in `config/default.toml` documenting all configuration parameters
+- 26 new unit tests for summarization and context budget (196 total tests, 75.31% coverage)
+- Architecture Decision Records ADR-016 through ADR-019 for summarization design
+- Foreign key constraint added to `messages.conversation_id` with ON DELETE CASCADE
+
+#### M9 Phase 2: Semantic Memory Integration (Issue #61)
+- `SemanticMemory<P: LlmProvider>` orchestrator coordinating SQLite, Qdrant, and LlmProvider
+- `SemanticMemory::remember()` saves message to SQLite, generates embedding, stores in Qdrant
+- `SemanticMemory::recall()` performs semantic search with query embedding and fetches messages from SQLite
+- `SemanticMemory::has_embedding()` checks if message already embedded to prevent duplicates
+- `SemanticMemory::embed_missing()` background task to embed old messages (with LIMIT parameter)
+- `Agent<P, C, T>` now generic over LlmProvider to support SemanticMemory
+- `Agent::with_memory()` replaces SqliteStore with SemanticMemory
+- Graceful degradation: embedding failures logged but don't block message save
+- Qdrant connection failures silently downgrade to SQLite-only mode (no semantic recall)
+- Generic provider pattern: `SemanticMemory<P: LlmProvider>` instead of `Arc<dyn LlmProvider>` for Edition 2024 async trait compatibility
+- `AnyProvider`, `OllamaProvider`, `ClaudeProvider` now derive/implement `Clone` for semantic memory integration
+- Integration test updated for SemanticMemory API (`with_memory` now takes 5 parameters including `recall_limit`)
+- Semantic memory config: `memory.semantic.enabled`, `memory.semantic.recall_limit` (default: 5)
+- 18 new tests for semantic memory orchestration (recall, remember, embed_missing, graceful degradation)
+
 #### M9 Phase 1: Qdrant Integration (Issue #60)
 - New `QdrantStore` module in zeph-memory for vector storage and similarity search
 - `QdrantStore::store()` persists embeddings to Qdrant and tracks metadata in SQLite
```
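The `estimate_tokens()` entry above trades accuracy for speed. A minimal sketch under the stated chars/4 assumption — the real helper in zeph-memory may count or round differently:

```rust
/// Rough token estimate: ~4 characters per token for typical English text.
/// Avoids a tokenizer dependency (the changelog cites ~100x faster than
/// tiktoken) at the cost of roughly ±25% accuracy.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

fn main() {
    // 18 characters -> ~4 tokens under the chars/4 heuristic.
    assert_eq!(estimate_tokens("A quick brown fox."), 4);
    println!("estimate ok");
}
```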

Cargo.lock

Lines changed: 7 additions & 7 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 7 additions & 7 deletions
```diff
@@ -5,7 +5,7 @@ resolver = "3"
 [workspace.package]
 edition = "2024"
 rust-version = "1.88"
-version = "0.3.0"
+version = "0.4.0"
 authors = ["bug-ops"]
 license = "MIT"
 repository = "https://github.com/bug-ops/zeph"
@@ -30,12 +30,12 @@ toml = "0.9"
 tracing = "0.1"
 tracing-subscriber = "0.3"
 uuid = "1.20"
-zeph-channels = { path = "crates/zeph-channels", version = "0.3.0" }
-zeph-core = { path = "crates/zeph-core", version = "0.3.0" }
-zeph-llm = { path = "crates/zeph-llm", version = "0.3.0" }
-zeph-memory = { path = "crates/zeph-memory", version = "0.3.0" }
-zeph-skills = { path = "crates/zeph-skills", version = "0.3.0" }
-zeph-tools = { path = "crates/zeph-tools", version = "0.3.0" }
+zeph-channels = { path = "crates/zeph-channels", version = "0.4.0" }
+zeph-core = { path = "crates/zeph-core", version = "0.4.0" }
+zeph-llm = { path = "crates/zeph-llm", version = "0.4.0" }
+zeph-memory = { path = "crates/zeph-memory", version = "0.4.0" }
+zeph-skills = { path = "crates/zeph-skills", version = "0.4.0" }
+zeph-tools = { path = "crates/zeph-tools", version = "0.4.0" }
 
 [workspace.lints.clippy]
 all = "warn"
```

README.md

Lines changed: 68 additions & 4 deletions
```diff
@@ -4,7 +4,7 @@
 [![codecov](https://codecov.io/gh/bug-ops/zeph/graph/badge.svg?token=S5O0GR9U6G)](https://codecov.io/gh/bug-ops/zeph)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-Lightweight AI agent with hybrid inference (Ollama / Claude), skills-first architecture, and multi-channel I/O.
+Lightweight AI agent with hybrid inference (Ollama / Claude), skills-first architecture, semantic memory with Qdrant, and multi-channel I/O.
 
 ## Installation
```

```diff
@@ -80,6 +80,12 @@ paths = ["./skills"]
 [memory]
 sqlite_path = "./data/zeph.db"
 history_limit = 50
+summarization_threshold = 100  # Trigger summarization after N messages
+context_budget_tokens = 0      # 0 = unlimited (proportional split: 15% summaries, 25% recall, 60% recent)
+
+[memory.semantic]
+enabled = false                # Enable semantic search via Qdrant
+recall_limit = 5               # Number of semantically relevant messages to inject
 
 [tools]
 enabled = true
```
```diff
@@ -100,6 +106,9 @@ blocked_commands = [] # Additional patterns beyond defaults
 | `ZEPH_CLAUDE_API_KEY` | Anthropic API key (required for Claude) |
 | `ZEPH_TELEGRAM_TOKEN` | Telegram bot token (enables Telegram mode) |
 | `ZEPH_SQLITE_PATH` | SQLite database path |
+| `ZEPH_QDRANT_URL` | Qdrant server URL (default: `http://localhost:6334`) |
+| `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD` | Trigger summarization after N messages (default: 100) |
+| `ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS` | Context budget for proportional token allocation (default: 0 = unlimited) |
 | `ZEPH_TOOLS_TIMEOUT` | Shell command timeout in seconds (default: 30) |
 
 > [!IMPORTANT]
```
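The `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD` override above feeds the lazy trigger described in the changelog (summarize a batch of threshold/2 messages once the unsummarized count exceeds the threshold). A hedged sketch of that trigger logic; the struct, method names, and exact trigger condition are illustrative, not the real zeph API:

```rust
// Illustrative model of the lazy summarization trigger; not zeph's actual types.
struct Summarizer {
    threshold: usize, // memory.summarization_threshold (default: 100)
}

impl Summarizer {
    /// Returns the batch size to summarize if the trigger fires, else None.
    fn check(&self, total_messages: usize, last_summarized_id: usize) -> Option<usize> {
        // Only messages not yet covered by a summary count toward the threshold.
        let unsummarized = total_messages.saturating_sub(last_summarized_id);
        // Batch size = threshold/2, balancing summary quality against LLM call frequency.
        (unsummarized > self.threshold).then(|| self.threshold / 2)
    }
}

fn main() {
    let s = Summarizer { threshold: 100 };
    assert_eq!(s.check(90, 0), None);      // below threshold: no summarization
    assert_eq!(s.check(101, 0), Some(50)); // trigger fires: summarize a batch of 50
    assert_eq!(s.check(160, 120), None);   // only 40 unsummarized: no trigger
    println!("trigger ok");
}
```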
````diff
@@ -130,6 +139,61 @@ Use curl to fetch search results...
 
 All loaded skills are injected into the system prompt.
 
+## Semantic Memory (Optional)
+
+> [!TIP]
+> Enable semantic search to retrieve contextually relevant messages from conversation history using vector similarity.
+
+Zeph supports optional integration with [Qdrant](https://qdrant.tech/) for semantic memory:
+
+1. **Start Qdrant:**
+
+   ```bash
+   docker compose up -d qdrant
+   ```
+
+2. **Enable semantic memory in config:**
+
+   ```toml
+   [memory.semantic]
+   enabled = true
+   recall_limit = 5
+   ```
+
+3. **Automatic embedding:** Messages are embedded asynchronously using the configured `embedding_model` and stored in Qdrant alongside SQLite.
+
+4. **Semantic recall:** The context builder injects semantically relevant messages from the full history, not just recent messages.
+
+5. **Graceful degradation:** If Qdrant is unavailable, Zeph falls back to SQLite-only mode (recency-based history).
+
+> [!NOTE]
+> Requires Ollama with an embedding model (e.g., `qwen3-embedding`). Claude API does not support embeddings natively.
+
````
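The graceful-degradation behaviour in step 5 can be sketched as follows. This is an illustrative model, not zeph's actual `SemanticMemory`: the `Embedder` trait and the in-memory stand-ins for SQLite and Qdrant are assumptions. The point is only the ordering — persistence happens before embedding, so an embedding failure never loses a message:

```rust
// Toy model of remember() with graceful degradation; names are illustrative.
trait Embedder {
    fn embed(&self, text: &str) -> Option<Vec<f32>>; // None = embedding failed
}

struct AlwaysFails; // simulates an unreachable Qdrant/embedding backend
impl Embedder for AlwaysFails {
    fn embed(&self, _text: &str) -> Option<Vec<f32>> {
        None
    }
}

struct Memory<E: Embedder> {
    embedder: E,
    messages: Vec<String>,           // stand-in for SQLite rows
    vectors: Vec<(usize, Vec<f32>)>, // stand-in for Qdrant points
}

impl<E: Embedder> Memory<E> {
    fn remember(&mut self, text: &str) {
        // Persist first, so an embedding failure never blocks the save.
        self.messages.push(text.to_string());
        let id = self.messages.len() - 1;
        if let Some(vector) = self.embedder.embed(text) {
            self.vectors.push((id, vector));
        } // else: logged in the real system; semantic recall just skips this message
    }
}

fn main() {
    let mut mem = Memory { embedder: AlwaysFails, messages: vec![], vectors: vec![] };
    mem.remember("hello");
    assert_eq!(mem.messages.len(), 1); // message saved despite embed failure
    assert!(mem.vectors.is_empty());   // no vector -> no semantic recall for it
    println!("graceful degradation ok");
}
```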
````diff
+## Conversation Summarization (Optional)
+
+> [!TIP]
+> Automatically compress long conversation histories using LLM-based summarization to stay within context budget limits.
+
+Zeph supports automatic conversation summarization:
+
+- Triggered when message count exceeds `summarization_threshold` (default: 100)
+- Summaries stored in SQLite with token estimates
+- Context builder allocates proportional token budget:
+  - 15% for summaries
+  - 25% for semantic recall (if enabled)
+  - 60% for recent message history
+
+Enable via configuration:
+
+```toml
+[memory]
+summarization_threshold = 100
+context_budget_tokens = 8000  # Set to LLM context window size (0 = unlimited)
+```
+
+> [!IMPORTANT]
+> Summarization requires an LLM provider (Ollama or Claude). Set `context_budget_tokens = 0` to disable proportional allocation and use unlimited context.
+
 ## Docker
 
 ### Apple Silicon (Ollama on host with Metal GPU)
````
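The 15/25/60 split documented above can be checked with a small sketch; the `ContextBudget` field names are assumptions, but the arithmetic matches the stated proportions:

```rust
// Hypothetical shape of the proportional context budget; field names are
// illustrative, the 15/25/60 percentages come from the documentation above.
struct ContextBudget {
    summaries: usize,
    semantic_recall: usize,
    recent_history: usize,
}

impl ContextBudget {
    fn new(total_tokens: usize) -> Self {
        Self {
            summaries: total_tokens * 15 / 100,       // 15% for summaries
            semantic_recall: total_tokens * 25 / 100, // 25% for semantic recall
            recent_history: total_tokens * 60 / 100,  // 60% for recent history
        }
    }
}

fn main() {
    // With context_budget_tokens = 8000, as in the config example above:
    let b = ContextBudget::new(8000);
    assert_eq!(
        (b.summaries, b.semantic_recall, b.recent_history),
        (1200, 2000, 4800)
    );
    println!("split ok");
}
```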
````diff
@@ -160,10 +224,10 @@ docker compose --profile gpu -f docker-compose.yml -f docker-compose.gpu.yml up
 
 ```
 zeph (binary)
-├── zeph-core      Agent loop, config, channel trait
-├── zeph-llm       LlmProvider trait, Ollama + Claude backends, token streaming
+├── zeph-core      Agent loop, config, channel trait, context builder
+├── zeph-llm       LlmProvider trait, Ollama + Claude backends, token streaming, embeddings
 ├── zeph-skills    SKILL.md parser, registry, prompt formatter
-├── zeph-memory    SQLite conversation persistence
+├── zeph-memory    SQLite + Qdrant, SemanticMemory orchestrator, summarization
 ├── zeph-channels  Telegram adapter (teloxide) with streaming
 └── zeph-tools     ToolExecutor trait, ShellExecutor with bash parser
 ```
````
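The changelog's generic-provider pattern (`SemanticMemory<P: LlmProvider>` rather than `Arc<dyn LlmProvider>`) can be illustrated with a synchronous toy version; the trait shape and mock below are assumptions, not zeph-llm's real definitions:

```rust
// Toy illustration of the generic-provider pattern: the memory is generic
// over P, so the provider is monomorphized (no trait object), which is what
// makes async trait methods workable under Edition 2024. The real trait is
// async; this sketch is synchronous for brevity.
trait LlmProvider: Clone {
    fn embed(&self, text: &str) -> Vec<f32>;
}

#[derive(Clone)]
struct MockProvider; // stand-in for OllamaProvider / ClaudeProvider

impl LlmProvider for MockProvider {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32] // trivial "embedding" for illustration
    }
}

// Generic over the provider, mirroring `SemanticMemory<P: LlmProvider>`.
// The Clone bound explains why the real providers now implement Clone.
struct SemanticMemory<P: LlmProvider> {
    provider: P,
}

impl<P: LlmProvider> SemanticMemory<P> {
    fn embed(&self, text: &str) -> Vec<f32> {
        self.provider.embed(text)
    }
}

fn main() {
    let mem = SemanticMemory { provider: MockProvider };
    assert_eq!(mem.embed("hi"), vec![2.0]);
    println!("generic provider ok");
}
```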
