Commit 68ac332

chore: release v0.4.0 (#66)
Update version to 0.4.0 across workspace manifests. Add comprehensive documentation for M9 milestone features including Qdrant integration, semantic memory orchestration, conversation summarization, and context budget management.

Changes:
- Bump version from 0.3.0 to 0.4.0 in workspace package section
- Update all zeph-* crate versions in workspace dependencies
- Document M9 Phase 1 (Qdrant integration) in CHANGELOG
- Document M9 Phase 2 (semantic memory integration) in CHANGELOG
- Document M9 Phase 3 (summarization and context budget) in CHANGELOG
- Add semantic memory section to README with Qdrant setup instructions
- Add conversation summarization section to README
- Update configuration examples with new memory.semantic settings
- Add environment variables for Qdrant URL and summarization config
- Update architecture diagram with embeddings and semantic capabilities
1 parent 5c01711 commit 68ac332

File tree: 4 files changed, +120 −18 lines

CHANGELOG.md

Lines changed: 38 additions & 0 deletions

```diff
@@ -6,8 +6,46 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+## [0.4.0] - 2026-02-08
+
 ### Added
 
+#### M9 Phase 3: Conversation Summarization and Context Budget (Issue #62)
+- New `SemanticMemory::summarize()` method for LLM-based conversation compression
+- Automatic summarization triggered when message count exceeds threshold
+- SQLite migration `003_summaries.sql` creates dedicated summaries table with CASCADE constraints
+- `SqliteStore::save_summary()` stores summary with metadata (first/last message IDs, token estimate)
+- `SqliteStore::load_summaries()` retrieves all summaries for a conversation ordered by ID
+- `SqliteStore::load_messages_range()` fetches messages after specific ID with limit for batch processing
+- `SqliteStore::count_messages()` counts total messages in conversation
+- `SqliteStore::latest_summary_last_message_id()` gets last summarized message ID for resumption
+- `ContextBudget` struct for proportional token allocation (15% summaries, 25% semantic recall, 60% recent history)
+- `estimate_tokens()` helper using chars/4 heuristic (100x faster than tiktoken, ±25% accuracy)
+- `Agent::check_summarization()` lazy trigger after `persist_message()` when threshold exceeded
+- Batch size = threshold/2 to balance summary quality with LLM call frequency
+- Configuration: `memory.summarization_threshold` (default: 100), `memory.context_budget_tokens` (default: 0 = unlimited)
+- Environment overrides: `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD`, `ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS`
+- Inline comments in `config/default.toml` documenting all configuration parameters
+- 26 new unit tests for summarization and context budget (196 total tests, 75.31% coverage)
+- Architecture Decision Records ADR-016 through ADR-019 for summarization design
+- Foreign key constraint added to `messages.conversation_id` with ON DELETE CASCADE
+
+#### M9 Phase 2: Semantic Memory Integration (Issue #61)
+- `SemanticMemory<P: LlmProvider>` orchestrator coordinating SQLite, Qdrant, and LlmProvider
+- `SemanticMemory::remember()` saves message to SQLite, generates embedding, stores in Qdrant
+- `SemanticMemory::recall()` performs semantic search with query embedding and fetches messages from SQLite
+- `SemanticMemory::has_embedding()` checks if message already embedded to prevent duplicates
+- `SemanticMemory::embed_missing()` background task to embed old messages (with LIMIT parameter)
+- `Agent<P, C, T>` now generic over LlmProvider to support SemanticMemory
+- `Agent::with_memory()` replaces SqliteStore with SemanticMemory
+- Graceful degradation: embedding failures logged but don't block message save
+- Qdrant connection failures silently downgrade to SQLite-only mode (no semantic recall)
+- Generic provider pattern: `SemanticMemory<P: LlmProvider>` instead of `Arc<dyn LlmProvider>` for Edition 2024 async trait compatibility
+- `AnyProvider`, `OllamaProvider`, `ClaudeProvider` now derive/implement `Clone` for semantic memory integration
+- Integration test updated for SemanticMemory API (`with_memory` now takes 5 parameters including `recall_limit`)
+- Semantic memory config: `memory.semantic.enabled`, `memory.semantic.recall_limit` (default: 5)
+- 18 new tests for semantic memory orchestration (recall, remember, embed_missing, graceful degradation)
+
 #### M9 Phase 1: Qdrant Integration (Issue #60)
 - New `QdrantStore` module in zeph-memory for vector storage and similarity search
 - `QdrantStore::store()` persists embeddings to Qdrant and tracks metadata in SQLite
```
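The `estimate_tokens()` entry above trades accuracy for speed. A minimal sketch under the stated chars/4 assumption — the real helper in zeph-memory may count or round differently:

```rust
/// Rough token estimate: ~4 characters per token for typical English text.
/// Avoids a tokenizer dependency (the changelog cites ~100x faster than
/// tiktoken) at the cost of roughly ±25% accuracy.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

fn main() {
    // 18 characters -> ~4 tokens under the chars/4 heuristic.
    assert_eq!(estimate_tokens("A quick brown fox."), 4);
    println!("estimate ok");
}
```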

Cargo.lock

Lines changed: 7 additions & 7 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 7 additions & 7 deletions
```diff
@@ -5,7 +5,7 @@ resolver = "3"
 [workspace.package]
 edition = "2024"
 rust-version = "1.88"
-version = "0.3.0"
+version = "0.4.0"
 authors = ["bug-ops"]
 license = "MIT"
 repository = "https://github.com/bug-ops/zeph"
@@ -30,12 +30,12 @@ toml = "0.9"
 tracing = "0.1"
 tracing-subscriber = "0.3"
 uuid = "1.20"
-zeph-channels = { path = "crates/zeph-channels", version = "0.3.0" }
-zeph-core = { path = "crates/zeph-core", version = "0.3.0" }
-zeph-llm = { path = "crates/zeph-llm", version = "0.3.0" }
-zeph-memory = { path = "crates/zeph-memory", version = "0.3.0" }
-zeph-skills = { path = "crates/zeph-skills", version = "0.3.0" }
-zeph-tools = { path = "crates/zeph-tools", version = "0.3.0" }
+zeph-channels = { path = "crates/zeph-channels", version = "0.4.0" }
+zeph-core = { path = "crates/zeph-core", version = "0.4.0" }
+zeph-llm = { path = "crates/zeph-llm", version = "0.4.0" }
+zeph-memory = { path = "crates/zeph-memory", version = "0.4.0" }
+zeph-skills = { path = "crates/zeph-skills", version = "0.4.0" }
+zeph-tools = { path = "crates/zeph-tools", version = "0.4.0" }
 
 [workspace.lints.clippy]
 all = "warn"
```

README.md

Lines changed: 68 additions & 4 deletions
```diff
@@ -4,7 +4,7 @@
 [![codecov](https://codecov.io/gh/bug-ops/zeph/graph/badge.svg?token=S5O0GR9U6G)](https://codecov.io/gh/bug-ops/zeph)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-Lightweight AI agent with hybrid inference (Ollama / Claude), skills-first architecture, and multi-channel I/O.
+Lightweight AI agent with hybrid inference (Ollama / Claude), skills-first architecture, semantic memory with Qdrant, and multi-channel I/O.
 
 ## Installation
```

```diff
@@ -80,6 +80,12 @@ paths = ["./skills"]
 [memory]
 sqlite_path = "./data/zeph.db"
 history_limit = 50
+summarization_threshold = 100  # Trigger summarization after N messages
+context_budget_tokens = 0      # 0 = unlimited (proportional split: 15% summaries, 25% recall, 60% recent)
+
+[memory.semantic]
+enabled = false                # Enable semantic search via Qdrant
+recall_limit = 5               # Number of semantically relevant messages to inject
 
 [tools]
 enabled = true
```
```diff
@@ -100,6 +106,9 @@ blocked_commands = [] # Additional patterns beyond defaults
 | `ZEPH_CLAUDE_API_KEY` | Anthropic API key (required for Claude) |
 | `ZEPH_TELEGRAM_TOKEN` | Telegram bot token (enables Telegram mode) |
 | `ZEPH_SQLITE_PATH` | SQLite database path |
+| `ZEPH_QDRANT_URL` | Qdrant server URL (default: `http://localhost:6334`) |
+| `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD` | Trigger summarization after N messages (default: 100) |
+| `ZEPH_MEMORY_CONTEXT_BUDGET_TOKENS` | Context budget for proportional token allocation (default: 0 = unlimited) |
 | `ZEPH_TOOLS_TIMEOUT` | Shell command timeout in seconds (default: 30) |
 
 > [!IMPORTANT]
```
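The `ZEPH_MEMORY_SUMMARIZATION_THRESHOLD` override above feeds the lazy trigger described in the changelog (summarize a batch of threshold/2 messages once the unsummarized count exceeds the threshold). A hedged sketch of that trigger logic; the struct, method names, and exact trigger condition are illustrative, not the real zeph API:

```rust
// Illustrative model of the lazy summarization trigger; not zeph's actual types.
struct Summarizer {
    threshold: usize, // memory.summarization_threshold (default: 100)
}

impl Summarizer {
    /// Returns the batch size to summarize if the trigger fires, else None.
    fn check(&self, total_messages: usize, last_summarized_id: usize) -> Option<usize> {
        // Only messages not yet covered by a summary count toward the threshold.
        let unsummarized = total_messages.saturating_sub(last_summarized_id);
        // Batch size = threshold/2, balancing summary quality against LLM call frequency.
        (unsummarized > self.threshold).then(|| self.threshold / 2)
    }
}

fn main() {
    let s = Summarizer { threshold: 100 };
    assert_eq!(s.check(90, 0), None);      // below threshold: no summarization
    assert_eq!(s.check(101, 0), Some(50)); // trigger fires: summarize a batch of 50
    assert_eq!(s.check(160, 120), None);   // only 40 unsummarized: no trigger
    println!("trigger ok");
}
```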
````diff
@@ -130,6 +139,61 @@ Use curl to fetch search results...
 
 All loaded skills are injected into the system prompt.
 
+## Semantic Memory (Optional)
+
+> [!TIP]
+> Enable semantic search to retrieve contextually relevant messages from conversation history using vector similarity.
+
+Zeph supports optional integration with [Qdrant](https://qdrant.tech/) for semantic memory:
+
+1. **Start Qdrant:**
+
+   ```bash
+   docker compose up -d qdrant
+   ```
+
+2. **Enable semantic memory in config:**
+
+   ```toml
+   [memory.semantic]
+   enabled = true
+   recall_limit = 5
+   ```
+
+3. **Automatic embedding:** Messages are embedded asynchronously using the configured `embedding_model` and stored in Qdrant alongside SQLite.
+
+4. **Semantic recall:** The context builder injects semantically relevant messages from the full history, not just recent messages.
+
+5. **Graceful degradation:** If Qdrant is unavailable, Zeph falls back to SQLite-only mode (recency-based history).
+
+> [!NOTE]
+> Requires Ollama with an embedding model (e.g., `qwen3-embedding`). Claude API does not support embeddings natively.
+
````
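The graceful-degradation behaviour in step 5 can be sketched as follows. This is an illustrative model, not zeph's actual `SemanticMemory`: the `Embedder` trait and the in-memory stand-ins for SQLite and Qdrant are assumptions. The point is only the ordering — persistence happens before embedding, so an embedding failure never loses a message:

```rust
// Toy model of remember() with graceful degradation; names are illustrative.
trait Embedder {
    fn embed(&self, text: &str) -> Option<Vec<f32>>; // None = embedding failed
}

struct AlwaysFails; // simulates an unreachable Qdrant/embedding backend
impl Embedder for AlwaysFails {
    fn embed(&self, _text: &str) -> Option<Vec<f32>> {
        None
    }
}

struct Memory<E: Embedder> {
    embedder: E,
    messages: Vec<String>,           // stand-in for SQLite rows
    vectors: Vec<(usize, Vec<f32>)>, // stand-in for Qdrant points
}

impl<E: Embedder> Memory<E> {
    fn remember(&mut self, text: &str) {
        // Persist first, so an embedding failure never blocks the save.
        self.messages.push(text.to_string());
        let id = self.messages.len() - 1;
        if let Some(vector) = self.embedder.embed(text) {
            self.vectors.push((id, vector));
        } // else: logged in the real system; semantic recall just skips this message
    }
}

fn main() {
    let mut mem = Memory { embedder: AlwaysFails, messages: vec![], vectors: vec![] };
    mem.remember("hello");
    assert_eq!(mem.messages.len(), 1); // message saved despite embed failure
    assert!(mem.vectors.is_empty());   // no vector -> no semantic recall for it
    println!("graceful degradation ok");
}
```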
````diff
+## Conversation Summarization (Optional)
+
+> [!TIP]
+> Automatically compress long conversation histories using LLM-based summarization to stay within context budget limits.
+
+Zeph supports automatic conversation summarization:
+
+- Triggered when message count exceeds `summarization_threshold` (default: 100)
+- Summaries stored in SQLite with token estimates
+- Context builder allocates proportional token budget:
+  - 15% for summaries
+  - 25% for semantic recall (if enabled)
+  - 60% for recent message history
+
+Enable via configuration:
+
+```toml
+[memory]
+summarization_threshold = 100
+context_budget_tokens = 8000  # Set to LLM context window size (0 = unlimited)
+```
+
+> [!IMPORTANT]
+> Summarization requires an LLM provider (Ollama or Claude). Set `context_budget_tokens = 0` to disable proportional allocation and use unlimited context.
+
 ## Docker
 
 ### Apple Silicon (Ollama on host with Metal GPU)
````
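The 15/25/60 split documented above can be checked with a small sketch; the `ContextBudget` field names are assumptions, but the arithmetic matches the stated proportions:

```rust
// Hypothetical shape of the proportional context budget; field names are
// illustrative, the 15/25/60 percentages come from the documentation above.
struct ContextBudget {
    summaries: usize,
    semantic_recall: usize,
    recent_history: usize,
}

impl ContextBudget {
    fn new(total_tokens: usize) -> Self {
        Self {
            summaries: total_tokens * 15 / 100,       // 15% for summaries
            semantic_recall: total_tokens * 25 / 100, // 25% for semantic recall
            recent_history: total_tokens * 60 / 100,  // 60% for recent history
        }
    }
}

fn main() {
    // With context_budget_tokens = 8000, as in the config example above:
    let b = ContextBudget::new(8000);
    assert_eq!(
        (b.summaries, b.semantic_recall, b.recent_history),
        (1200, 2000, 4800)
    );
    println!("split ok");
}
```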
````diff
@@ -160,10 +224,10 @@ docker compose --profile gpu -f docker-compose.yml -f docker-compose.gpu.yml up
 
 ```
 zeph (binary)
-├── zeph-core      Agent loop, config, channel trait
-├── zeph-llm       LlmProvider trait, Ollama + Claude backends, token streaming
+├── zeph-core      Agent loop, config, channel trait, context builder
+├── zeph-llm       LlmProvider trait, Ollama + Claude backends, token streaming, embeddings
 ├── zeph-skills    SKILL.md parser, registry, prompt formatter
-├── zeph-memory    SQLite conversation persistence
+├── zeph-memory    SQLite + Qdrant, SemanticMemory orchestrator, summarization
 ├── zeph-channels  Telegram adapter (teloxide) with streaming
 └── zeph-tools     ToolExecutor trait, ShellExecutor with bash parser
 ```
````
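The changelog's generic-provider pattern (`SemanticMemory<P: LlmProvider>` rather than `Arc<dyn LlmProvider>`) can be illustrated with a synchronous toy version; the trait shape and mock below are assumptions, not zeph-llm's real definitions:

```rust
// Toy illustration of the generic-provider pattern: the memory is generic
// over P, so the provider is monomorphized (no trait object), which is what
// makes async trait methods workable under Edition 2024. The real trait is
// async; this sketch is synchronous for brevity.
trait LlmProvider: Clone {
    fn embed(&self, text: &str) -> Vec<f32>;
}

#[derive(Clone)]
struct MockProvider; // stand-in for OllamaProvider / ClaudeProvider

impl LlmProvider for MockProvider {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32] // trivial "embedding" for illustration
    }
}

// Generic over the provider, mirroring `SemanticMemory<P: LlmProvider>`.
// The Clone bound explains why the real providers now implement Clone.
struct SemanticMemory<P: LlmProvider> {
    provider: P,
}

impl<P: LlmProvider> SemanticMemory<P> {
    fn embed(&self, text: &str) -> Vec<f32> {
        self.provider.embed(text)
    }
}

fn main() {
    let mem = SemanticMemory { provider: MockProvider };
    assert_eq!(mem.embed("hi"), vec![2.0]);
    println!("generic provider ok");
}
```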
