diff --git a/docs/02-implementation/AGENT_GUIDE.md b/docs/02-implementation/AGENT_GUIDE.md index 5093701..ce780be 100644 --- a/docs/02-implementation/AGENT_GUIDE.md +++ b/docs/02-implementation/AGENT_GUIDE.md @@ -203,7 +203,7 @@ schedule = ["24h", "6h"] @property def database_path(self) -> Path: """Get canonical database file path. - + Automatically strips query parameters (e.g., ?mode=ro) from SQLite URLs before extracting the file path. This prevents invalid file paths when URLs include query parameters. @@ -277,7 +277,7 @@ def init_db() -> None: async def init_db_async() -> None: """Initialize database asynchronously (for FastAPI lifespan). - + Runs migrations in a threadpool using asyncio.to_thread() to avoid blocking the event loop. """ diff --git a/docs/02-implementation/PR-PLANS.md b/docs/02-implementation/PR-PLANS.md index 0ac7258..c2a12d7 100644 --- a/docs/02-implementation/PR-PLANS.md +++ b/docs/02-implementation/PR-PLANS.md @@ -2,19 +2,25 @@ **Status:** Spec Complete | Implementation In Progress **Last Reviewed:** 2025-12-31 -**Total PRs:** 17 (PR-001 through PR-012, plus breakdowns: PR-003a/b/c, PR-005a/b) +**Total PRs:** 18 (PR-001 through PR-012, plus PR-003B and PR-013 through PR-017) ## Overview This document tracks all planned Pull Requests for the TaskGenie project. The plan follows an **Interactive TUI-First** strategy, with chat as the primary capability within the interactive interface. +## Objective (North Star) + +Ship a **local-first task manager** where a user can run `tgenie`, manage tasks in an interactive TUI, and use chat as the primary workflow (with RAG + integrations layered on later). + **Strategy:** 1. **Foundation:** DB + Task API (unblocks everything) -2. **UX First:** Interactive TUI early, iterate fast on usability -3. **Chat Next:** Add LLM chat inside the TUI once the UX shell exists -4. **Feature Delivery:** Add attachments, then ship the earliest “tryable” features (Notifications and/or Integrations) -5. 
**Intelligence:** RAG + semantic search to level-up chat and discovery -6. **Polish:** Web UI (optional) + deployment/docs +2. **Observability:** Structured logs + telemetry early to de-risk debugging +3. **UX First:** Interactive TUI early, iterate fast on usability +4. **Chat Next:** Add LLM chat inside the TUI once the UX shell exists +5. **Agent Foundations:** Tool-calling + event system for safe actions and updates +6. **Feature Delivery:** Add attachments, then ship the earliest “tryable” features (Notifications and/or Integrations) +7. **Intelligence:** RAG + semantic search to level-up chat and discovery +8. **Polish:** Agent UX + Web UI (optional) + deployment/docs > **Note:** The primary UX is an interactive TUI entered via `tgenie` (default), with chat as the main mode. Non-interactive subcommands (`tgenie add`, `tgenie list`, etc.) exist for scripting and automation. @@ -25,32 +31,37 @@ This document tracks all planned Pull Requests for the TaskGenie project. The pl - Interactive TUI: `docs/01-design/DESIGN_TUI.md` - Background jobs (no queue MVP): `docs/01-design/DESIGN_BACKGROUND_JOBS.md` - Core architecture: `docs/01-design/DESIGN_ARCHITECTURE.md` +- Skill enrichment tags: `docs/02-implementation/SKILL_ENRICHMENT_SUMMARY.md` ## Recommended Execution Order (UX-First) This sequence prioritizes **something usable early** (good UX) and then adds capabilities bit-by-bit. | Seq | PR | Title | Why now? 
| Depends on | Skill Enrichment | -|---:|---|---|---|---| +|---:|---|---|---|---|---| | 1 | PR-001 | Database & Configuration | Foundation + migrations | - | - | -| 2 | PR-002 | Task CRUD API | Core workflows + enables clients | PR-001 | api-testing | -| 3 | PR-008 | Interactive TUI (Tasks MVP) | Validate UX early | PR-002 | tui-dev | -| 4 | PR-003a | LLM Provider Abstraction | Provider configuration foundation | PR-001 | - | -| 5 | PR-003b | Streaming Chat API | API surface for chat | PR-001, PR-002, PR-003a | api-testing | -| 6 | PR-003c | TUI Chat Integration | Make chat real inside TUI | PR-002, PR-003a, PR-003b | tui-dev | -| 7 | PR-009 | CLI Subcommands (Secondary) | Scriptable workflows | PR-002 | task-workflow | -| 8 | PR-004 | Attachments + Link Detection | Context capture for real work | PR-002 | task-workflow | -| 9 | PR-011 | Notifications | Early "daily value" | PR-002 | task-workflow | -| 10 | PR-007 | GitHub Integration | High-value for dev tasks | PR-004 | integration-setup | -| 11 | PR-006 | Gmail Integration | High-value, higher complexity | PR-004 | integration-setup | -| 12 | PR-005a | ChromaDB + Embeddings | Vector store foundation | PR-001, PR-004 | rag-testing | -| 13 | PR-005b | Semantic Search + RAG | Better recall + better chat | PR-001, PR-003b, PR-004, PR-005a | rag-testing, context-optimization, context-compression | -| 14 | PR-010 | Web UI | Secondary UX for rich preview | PR-002 (chat optional: PR-003c) | - | -| 15 | PR-012 | Deployment + Docs | Make it easy to run/share | PR-010, PR-011 | - | +| 2 | PR-017 | DB Config Follow-ups | Close PR-001 gaps | PR-001 | - | +| 3 | PR-016 | Observability Baseline | De-risk debugging early | PR-001 | - | +| 4 | PR-002 | Task CRUD API | Core workflows + enables clients | PR-001 | api-testing | +| 5 | PR-008 | Interactive TUI (Tasks MVP) | Validate UX early | PR-002 | tui-dev | +| 6 | PR-003 | LLM + Chat Backbone | Make chat real (provider + API + TUI) | PR-001, PR-002, PR-008 | api-testing, 
tui-dev |
+| 7 | PR-004 | Attachments + Link Detection | Context capture for real work | PR-002 | task-workflow |
+| 8 | PR-013 | Event System + Realtime Updates | Enable subscriptions + hooks | PR-002, PR-004 | - |
+| 9 | PR-003B | Agent Tool-Calling Foundation | Safe tool execution | PR-003, PR-002, PR-004 | - |
+| 10 | PR-011 | Notifications | Early "daily value" | PR-002 | task-workflow |
+| 11 | PR-007 | GitHub Integration | High-value for dev tasks | PR-004 | integration-setup |
+| 12 | PR-006 | Gmail Integration | High-value, higher complexity | PR-004 | integration-setup |
+| 13 | PR-005 | RAG + Semantic Search | Better recall + better chat | PR-003, PR-004 | rag-testing, context-optimization, context-compression |
+| 14 | PR-014 | Multi-Agent Orchestration | Coordinated agent runs | PR-003B, PR-013 | - |
+| 15 | PR-015 | Agent UX Panel | Visibility + controls | PR-008, PR-003B, PR-013, PR-014 | - |
+| 16 | PR-009 | CLI Subcommands (Secondary) | Scriptable workflows + agent CLI | PR-002, PR-003B, PR-014 | task-workflow |
+| 17 | PR-010 | Web UI | Secondary UX for rich preview | PR-002, PR-004 (chat optional: PR-003) | - |
+| 18 | PR-012 | Deployment + Docs | Make it easy to run/share | PR-001, PR-017, PR-010, PR-011 | - |

Notes:
-- You can swap **Seq 7–9** based on what you can test earliest (notifications vs integrations).
+- You can swap **Seq 10–13** based on what you can test earliest (notifications vs integrations vs RAG).
- PR-010 can be started earlier for task pages, but chat streaming needs PR-003.
+- PR-015 depends on PR-014 for agent run endpoints.
- Specs (with test scenarios): `pr-specs/INDEX.md` ## PR Dependency Diagram @@ -58,60 +69,150 @@ Notes: ```mermaid flowchart TD PR001["PR-001: Database & Config"] + PR017["PR-017: DB Config Follow-ups"] + PR016["PR-016: Observability Baseline"] PR002["PR-002: Task CRUD API"] PR008["PR-008: Interactive TUI (Tasks MVP)"] - PR003a["PR-003a: LLM Provider"] - PR003b["PR-003b: Streaming Chat API"] - PR003c["PR-003c: TUI Chat Integration"] - PR009["PR-009: CLI Subcommands"] + PR003["PR-003: LLM + Chat Backbone"] PR004["PR-004: Attachments + Link Detection"] + PR013["PR-013: Event System + Realtime Updates"] + PR003B["PR-003B: Agent Tool-Calling"] PR011["PR-011: Notifications"] PR007["PR-007: GitHub Integration"] PR006["PR-006: Gmail Integration"] - PR005a["PR-005a: ChromaDB + Embeddings"] - PR005b["PR-005b: Semantic Search + RAG"] + PR005["PR-005: RAG + Semantic Search"] + PR014["PR-014: Multi-Agent Orchestration"] + PR015["PR-015: Agent UX Panel"] + PR009["PR-009: CLI Subcommands"] PR010["PR-010: Web UI"] PR012["PR-012: Deployment + Docs"] + PR001 --> PR017 + PR001 --> PR016 PR001 --> PR002 PR002 --> PR008 - PR001 --> PR003a - PR002 --> PR003b - PR003a --> PR003b - PR002 --> PR003c - PR003a --> PR003c - PR003b --> PR003c - PR002 --> PR009 + PR001 --> PR003 + PR002 --> PR003 + PR008 --> PR003 PR002 --> PR004 + PR002 --> PR013 + PR004 --> PR013 + PR003 --> PR003B + PR002 --> PR003B + PR004 --> PR003B PR004 --> PR007 PR004 --> PR006 - PR001 --> PR005a - PR004 --> PR005a + PR003 --> PR005 + PR004 --> PR005 PR002 --> PR011 - PR001 --> PR005b - PR003b --> PR005b - PR004 --> PR005b - PR005a --> PR005b + PR003B --> PR014 + PR013 --> PR014 + PR008 --> PR015 + PR003B --> PR015 + PR013 --> PR015 + PR014 --> PR015 + PR014 --> PR009 + PR003B --> PR009 PR002 --> PR010 - PR003c -. "chat UI (optional)" .-> PR010 + PR004 --> PR010 + PR003 -. 
"chat UI (optional)" .-> PR010
+ PR001 --> PR012
+ PR017 --> PR012
PR010 --> PR012
PR011 --> PR012
```

-Notes:
-- PR-003 has been split into PR-003a (provider), PR-003b (API), PR-003c (TUI integration).
-- PR-005 has been split into PR-005a (indexing) and PR-005b (search + RAG).
-- PR-010 can ship "tasks-only" early; chat streaming waits on PR-003c.
-
Notes:
- Edges reflect planned dependency relationships.
- PR-010 can ship “tasks-only” early; chat streaming waits on PR-003.

-## Phase 1: Foundation + UX MVP (Weeks 1-2)
+## Dependency Diagrams by Phase
+
+These diagrams break the full dependency graph into smaller, phase-focused views.
+
+### Phase 1-3 (Foundation, UX, Chat, Attachments, Agent Foundations)
+
+```mermaid
+flowchart TD
+ PR001["PR-001: Database & Config"]
+ PR017["PR-017: DB Config Follow-ups"]
+ PR016["PR-016: Observability Baseline"]
+ PR002["PR-002: Task CRUD API"]
+ PR008["PR-008: Interactive TUI"]
+ PR003["PR-003: LLM + Chat Backbone"]
+ PR004["PR-004: Attachments + Link Detection"]
+ PR013["PR-013: Event System"]
+ PR003B["PR-003B: Agent Tool-Calling"]
+
+ PR001 --> PR017
+ PR001 --> PR016
+ PR001 --> PR002
+ PR002 --> PR008
+ PR001 --> PR003
+ PR002 --> PR003
+ PR008 --> PR003
+ PR002 --> PR004
+ PR002 --> PR013
+ PR004 --> PR013
+ PR003 --> PR003B
+ PR004 --> PR003B
+ PR002 --> PR003B
+```
+
+### Phase 4-6 (Early Value, Intelligence, Agent Orchestration + UX)
+
+```mermaid
+flowchart TD
+ PR002["PR-002: Task CRUD API"]
+ PR004["PR-004: Attachments + Link Detection"]
+ PR003["PR-003: LLM + Chat Backbone"]
+ PR013["PR-013: Event System"]
+ PR003B["PR-003B: Agent Tool-Calling"]
+ PR008["PR-008: Interactive TUI"]
+ PR011["PR-011: Notifications"]
+ PR007["PR-007: GitHub Integration"]
+ PR006["PR-006: Gmail Integration"]
+ PR005["PR-005: RAG + Semantic Search"]
+ PR014["PR-014: Multi-Agent Orchestration"]
+ PR015["PR-015: Agent UX Panel"]
+
+ PR002 --> PR011
+ PR004 --> PR007
+ PR004 --> PR006
+ PR003 --> PR005
+ PR004 --> PR005
+ PR003B --> PR014
+ PR013 --> PR014
+ PR008 --> PR015 + PR003B --> PR015 + PR013 --> PR015 + PR014 --> PR015 +``` + +### Phase 7-8 (Secondary UIs + Scripting, Deploy + Docs) + +```mermaid +flowchart TD + PR002["PR-002: Task CRUD API"] + PR003B["PR-003B: Agent Tool-Calling"] + PR009["PR-009: CLI Subcommands"] + PR010["PR-010: Web UI"] + PR011["PR-011: Notifications"] + PR012["PR-012: Deployment + Docs"] + PR003["PR-003: LLM + Chat Backbone"] + + PR002 --> PR009 + PR003B --> PR009 + PR002 --> PR010 + PR003 -. "chat UI (optional)" .-> PR010 + PR010 --> PR012 + PR011 --> PR012 +``` + +## Phase 1: Foundation + Observability + UX MVP (Weeks 1-2) ### PR-001: Database & Configuration Setup **Branch:** `feature/db-config` -**Status:** ⬜ Not Started +**Status:** ✅ Implemented **Description:** Initialize SQLite database with migrations and environment configuration. **Spec:** `pr-specs/PR-001-db-config.md` **Files to modify:** @@ -123,6 +224,27 @@ Notes: - [ ] Environment variables load correctly - [ ] Tests pass for database operations +### PR-017: DB Config Follow-ups +**Branch:** `feature/db-config-followups` +**Status:** ⬜ Not Started +**Dependency:** PR-001 +**Description:** Close gaps after PR-001 (dir creation, restore safety, FK enforcement, docs alignment). +**Spec:** `pr-specs/PR-017-db-config-followups.md` +**Acceptance Criteria:** +- [ ] `data/chroma` directory is created in app data dir +- [ ] `tgenie db restore` prompts unless `--yes` is provided +- [ ] SQLite foreign keys are enforced on every connection + +### PR-016: Observability Baseline +**Branch:** `feature/observability-baseline` +**Status:** ⬜ Not Started +**Dependency:** PR-001 +**Description:** Structured JSON logs, request IDs, and a lightweight telemetry endpoint. 
+**Spec:** `pr-specs/PR-016-observability-baseline.md` +**Acceptance Criteria:** +- [ ] Logs include request IDs in JSON format +- [ ] `/api/v1/telemetry` reports DB health and migration version + ### PR-002: Task CRUD API Endpoints **Branch:** `feature/task-crud` **Status:** ⬜ Not Started @@ -144,8 +266,8 @@ Notes: **Description:** The `tgenie` command entry point. Ships an interactive TUI early for core task workflows (chat can be a stub until PR-003). **Spec:** `pr-specs/PR-008-interactive-tui.md` **Files to modify:** -- `backend/cli/main.py` -- `backend/cli/chat_repl.py` +- `backend/cli/main.py` - launch the TUI by default +- `backend/cli/tui/*` - Textual app, screens/widgets, API client **Acceptance Criteria:** - [ ] `tgenie` opens interactive TUI - [ ] Core task flows work end-to-end (create/list/show/edit/done) @@ -155,52 +277,17 @@ Notes: ## Phase 2: Chat + Attachments (Weeks 3-4) -### PR-003a: LLM Provider Abstraction & Configuration -**Branch:** `feature/llm-provider` +### PR-003: LLM + Chat Backbone +**Branch:** `feature/llm-chat-backbone` **Status:** ⬜ Not Started -**Dependency:** PR-001 -**Description:** Implement core LLM provider abstraction and configuration system. -**Spec:** `pr-specs/PR-003a-llm-provider.md` -**Files to modify:** -- `backend/services/llm_service.py` - LLM Provider logic -- `backend/config.py` - LLM configuration -**Acceptance Criteria:** -- [ ] `LLMService` class implements stream_chat interface -- [ ] Configuration loads from env vars and config file -- [ ] Missing API key raises clear `ValueError` -- [ ] OpenRouter integration works end-to-end - -### PR-003b: Streaming Chat API Endpoint -**Branch:** `feature/streaming-chat-api` -**Status:** ⬜ Not Started -**Dependency:** PR-001, PR-002, PR-003a -**Description:** Create FastAPI streaming chat endpoint with SSE. 
-**Spec:** `pr-specs/PR-003b-streaming-chat-api.md` -**Files to modify:** -- `backend/api/chat.py` - Chat endpoint -- `backend/schemas/chat.py` - Request/response schemas +**Dependency:** PR-001, PR-002, PR-008 +**Description:** Provider configuration + streaming API + TUI wiring (tool-calling is in PR-003B). +**Spec:** `pr-specs/PR-003-llm-chat-backbone.md` **Acceptance Criteria:** -- [ ] `POST /api/v1/chat` endpoint exists -- [ ] SSE format matches spec (`data:` prefix, `[DONE]` terminator) -- [ ] Missing LLM API key returns 500 with clear error -- [ ] OpenAPI docs include SSE protocol explanation - -### PR-003c: TUI Chat Integration -**Branch:** `feature/tui-chat` -**Status:** ⬜ Not Started -**Dependency:** PR-002, PR-003a, PR-003b -**Description:** Integrate chat functionality into interactive TUI from PR-008. -**Spec:** `pr-specs/PR-003c-tui-chat-integration.md` -**Files to modify:** -- `backend/cli/tui/widgets/chat_panel.py` - Chat widget -- `backend/cli/tui/screens/main.py` - Integrate chat panel -- `backend/cli/tui/client.py` - Streaming support -**Acceptance Criteria:** -- [ ] Chat panel widget exists with input + message display -- [ ] User messages appear immediately after sending -- [ ] AI responses stream in real-time -- [ ] Missing LLM API key shows clear error modal -- [ ] Chat history persists for session +- [ ] Chat endpoint returns a streaming response. +- [ ] TUI can send a message and display a streamed reply. +- [ ] Missing/invalid API key results in a clear error message (in TUI and API). +- [ ] Provider selection (model/base_url) works via config. 
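The acceptance criteria above pin the streaming behavior but not the wire format; the earlier split spec (PR-003b) fixed it as SSE with a `data:` prefix and a `[DONE]` terminator. A minimal, framework-free sketch of that framing — the `sse_events`/`read_sse` helpers and the `{"content": ...}` payload shape are illustrative, not part of the spec:

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Frame LLM chat chunks as SSE events: each event is a 'data: ' line
    followed by a blank line, and the stream ends with a [DONE] sentinel."""
    for chunk in chunks:
        yield f"data: {json.dumps({'content': chunk})}\n\n"
    yield "data: [DONE]\n\n"


def read_sse(events: Iterable[str]) -> str:
    """Reassemble the streamed reply on the client side, stopping at [DONE]."""
    parts = []
    for event in events:
        payload = event.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload)["content"])
    return "".join(parts)
```

On the server side this generator would typically be wrapped in a FastAPI `StreamingResponse` with `media_type="text/event-stream"`; the TUI consumes the same framing.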
### PR-004: Attachment API & Link Detection
**Branch:** `feature/attachments`
**Status:** ⬜ Not Started
@@ -218,7 +305,32 @@

---

-## Phase 3: Early Value Track (Weeks 5-6)
+## Phase 3: Agent Foundations + Events (Weeks 5-6)
+
+### PR-013: Event System + Realtime Updates
+**Branch:** `feature/event-system`
+**Status:** ⬜ Not Started
+**Dependency:** PR-002, PR-004
+**Description:** Event log + SSE stream for task/attachment lifecycle events.
+**Spec:** `pr-specs/PR-013-event-system.md`
+**Acceptance Criteria:**
+- [ ] Task and attachment changes emit events
+- [ ] SSE stream supports reconnect with `Last-Event-ID`
+
+### PR-003B: Agent Tool-Calling Foundation
+**Branch:** `feature/agent-tools`
+**Status:** ⬜ Not Started
+**Dependency:** PR-003, PR-002, PR-004
+**Description:** Tool schema + execution pipeline for safe agent actions.
+**Spec:** `pr-specs/PR-003B-agent-tool-calling.md`
+**Acceptance Criteria:**
+- [ ] Tool schema exposed to LLM provider
+- [ ] Tool calls execute with validation and timeouts
+- [ ] Destructive tools require confirmation
+
+---
+
+## Phase 4: Early Value Track (Weeks 7-8)

This phase is intentionally flexible: pick what’s easiest to validate early from a user POV.

@@ -259,57 +371,58 @@ This phase is intentionally flexible: pick what’s easiest to validate early fr

---

-## Phase 4: Intelligence (Weeks 8-9)
+## Phase 5: Intelligence (Weeks 9-10)

-### PR-005a: ChromaDB Setup & Embeddings Pipeline
-**Branch:** `feature/chromadb-indexing`
**Status:** ⬜ Not Started
-**Dependency:** PR-001, PR-004
-**Description:** Implement ChromaDB vector store and embedding service with sentence-transformers.
-**Spec:** `pr-specs/PR-005a-chromadb-embeddings.md` -**Files to modify:** -- `backend/services/rag_service.py` - ChromaDB setup -- `backend/services/embedding_service.py` - Embedding generation -- `backend/config.py` - ChromaDB configuration +**Dependency:** PR-003, PR-004 +**Description:** Vector store + indexing + semantic search + chat context injection (single PR spec for now; see “Potential Future Splits”). +**Spec:** `pr-specs/PR-005-rag-semantic-search.md` **Acceptance Criteria:** -- [ ] `EmbeddingService` generates 384-dimension embeddings -- [ ] `RAGService` creates ChromaDB collection on first use -- [ ] Task indexing works on create/update -- [ ] Attachment content is included in parent task's document -- [ ] Batch indexing processes multiple tasks efficiently - -### PR-005b: Semantic Search API & RAG Context Injection -**Branch:** `feature/semantic-search-rag` +- [ ] Tasks and cached attachments are embedded and indexed automatically. +- [ ] Semantic search returns relevant results for representative queries. +- [ ] Chat responses cite retrieved task/attachment context where applicable. + +--- + +## Phase 6: Agent Orchestration + UX (Weeks 11-12) + +### PR-014: Multi-Agent Orchestration +**Branch:** `feature/multi-agent` **Status:** ⬜ Not Started -**Dependency:** PR-001, PR-003b, PR-004, PR-005a -**Description:** Implement semantic search API endpoint and RAG context injection for chat. -**Spec:** `pr-specs/PR-005b-semantic-search-rag.md` -**Files to modify:** -- `backend/api/search.py` - Semantic search endpoint -- `backend/services/rag_service.py` - Search and context building -- `backend/api/chat.py` - RAG context integration +**Dependency:** PR-003B, PR-013 +**Description:** Run management and coordination for multi-agent workflows. 
+**Spec:** `pr-specs/PR-014-multi-agent-orchestration.md`
+**Acceptance Criteria:**
+- [ ] Agent runs can be started, paused, and canceled
+- [ ] Run status is persisted and queryable
+
+### PR-015: Agent UX Panel
+**Branch:** `feature/agent-ux`
+**Status:** ⬜ Not Started
+**Dependency:** PR-008, PR-003B, PR-013, PR-014
+**Description:** TUI panel for agent status, tool execution, and controls.
+**Spec:** `pr-specs/PR-015-agent-ux-panel.md`
**Acceptance Criteria:**
-- [ ] `GET /api/v1/search/semantic` endpoint exists
-- [ ] Search returns relevant results sorted by similarity
-- [ ] Filters (status, priority) work correctly
-- [ ] RAG context builder includes task metadata
-- [ ] Context truncation respects token budget
-- [ ] Chat endpoint injects RAG context into prompts
+- [ ] Agent panel updates in real time
+- [ ] Pause/resume/cancel controls work

---

-## Phase 5: Secondary UIs + Scripting (Weeks 9-10)
+## Phase 7: Secondary UIs + Scripting (Weeks 13-14)

### PR-009: CLI Standard Commands (Secondary)
**Branch:** `feature/cli-commands`
**Status:** ⬜ Not Started
-**Dependency:** PR-002
-**Description:** Standard commands (`add`, `list`, `edit`) for scripting/power users.
+**Dependency:** PR-002, PR-003B, PR-014
+**Description:** Standard commands for scripting plus agent run/status commands.
**Spec:** `pr-specs/PR-009-cli-subcommands.md` **Files to modify:** - `backend/cli/commands.py` **Acceptance Criteria:** - [ ] `tgenie list`, `tgenie add` work as subcommands (for scripting) +- [ ] `tgenie agent run` and `tgenie agent status` return usable results - [ ] Rich terminal output ### PR-010: Web UI (Chat & Tasks) @@ -328,7 +441,7 @@ This phase is intentionally flexible: pick what’s easiest to validate early fr --- -## Phase 6: Deploy + Docs (Weeks 11-12) +## Phase 8: Deploy + Docs (Weeks 15-16) ### PR-012: Deployment & Documentation **Branch:** `feature/deploy` @@ -346,24 +459,34 @@ This phase is intentionally flexible: pick what’s easiest to validate early fr | Phase | Focus | Weeks | Key PRs | |-------|-------|--------|----------| -| **1** | **Foundation + UX MVP** | 1-2 | PR-001 (DB), PR-002 (Task API), PR-008 (TUI Tasks) | -| **2** | **Chat + Attachments** | 3-4 | PR-003a (Provider), PR-003b (API), PR-003c (TUI), PR-004 (Attachments) | -| **3** | **Early Value Track** | 5-6 | PR-011 (Notifications) and/or PR-007 (GitHub) / PR-006 (Gmail) | -| **4** | **Intelligence** | 8-9 | PR-005a (Indexing), PR-005b (Search+RAG) | -| **5** | **Secondary UIs** | 10-11 | PR-009 (CLI subcommands), PR-010 (Web UI) | -| **6** | **Deploy + Docs** | 12 | PR-012 (Deployment & Docs) | - -**Total Estimated Effort:** ~150 hours (~18-20 weeks for one developer) +| **1** | **Foundation + Observability + UX MVP** | 1-2 | PR-001 (DB), PR-017 (DB follow-ups), PR-016 (Observability), PR-002 (Task API), PR-008 (TUI Tasks) | +| **2** | **Chat + Attachments** | 3-4 | PR-003 (Chat backbone), PR-004 (Attachments) | +| **3** | **Agent Foundations + Events** | 5-6 | PR-013 (Events), PR-003B (Tool-calling) | +| **4** | **Early Value Track** | 7-8 | PR-011 (Notifications) and/or PR-007 (GitHub) / PR-006 (Gmail) | +| **5** | **Intelligence** | 9-10 | PR-005 (RAG + Semantic Search) | +| **6** | **Agent Orchestration + UX** | 11-12 | PR-014 (Multi-agent), PR-015 (Agent UX) | +| **7** | 
**Secondary UIs + Scripting** | 13-14 | PR-009 (CLI subcommands), PR-010 (Web UI) | +| **8** | **Deploy + Docs** | 15-16 | PR-012 (Deployment & Docs) | -**Key Changes:** -- **PR-003 split**: Provider (003a) → API (003b) → TUI (003c) for parallel work and focused testing -- **PR-005 split**: Indexing (005a) + Search/RAG (005b) to separate technical concerns +**Total Estimated Effort:** ~140 hours (≈16–18 weeks at ~8h/week, or ≈4 weeks full-time) **Skill Integration:** -- `api-testing`: PR-002, PR-003b -- `rag-testing`: PR-005a, PR-005b +- `api-testing`: PR-002, PR-003 +- `rag-testing`: PR-005 - `integration-setup`: PR-006, PR-007 -- `tui-dev`: PR-008, PR-003c -- `context-optimization`: PR-005b -- `context-compression`: PR-005b (future chat history) +- `tui-dev`: PR-008, PR-003 +- `context-optimization`: PR-005 +- `context-compression`: PR-005 (future chat history) - `task-workflow`: PR-004, PR-009, PR-011 + +## Potential Future Splits (Not Part of Current Specs) + +If additional splits are needed, prefer numeric suffixes (e.g., `PR-003-1`, `PR-003-2`) for easier sorting. PR-003B is reserved for the tool-calling foundation. 
+ +- PR-003 (candidate): + - `PR-003-1`: LLM provider config + service abstraction + - `PR-003-2`: Streaming chat API (SSE) + - `PR-003-3`: TUI chat integration +- PR-005 (candidate): + - `PR-005-1`: Vector store + embeddings + indexing + - `PR-005-2`: Semantic search + RAG context injection diff --git a/docs/02-implementation/pr-specs/INDEX.md b/docs/02-implementation/pr-specs/INDEX.md index ea66187..cb22036 100644 --- a/docs/02-implementation/pr-specs/INDEX.md +++ b/docs/02-implementation/pr-specs/INDEX.md @@ -12,6 +12,7 @@ Design deep-dives live in `docs/01-design/` (notably `DESIGN_TUI.md`, `DESIGN_CH - [PR-001-db-config.md](PR-001-db-config.md) - Database, config, migrations, backup/restore - [PR-002-task-crud-api.md](PR-002-task-crud-api.md) - Task CRUD API - [PR-003-llm-chat-backbone.md](PR-003-llm-chat-backbone.md) - LLM service + chat API + TUI chat wiring +- [PR-003B-agent-tool-calling.md](PR-003B-agent-tool-calling.md) - Tool-calling foundation for agents - [PR-004-attachments-link-detection.md](PR-004-attachments-link-detection.md) - Attachments API + link detection - [PR-005-rag-semantic-search.md](PR-005-rag-semantic-search.md) - ChromaDB/RAG + semantic search + chat context - [PR-006-gmail-integration.md](PR-006-gmail-integration.md) - Gmail URL/OAuth + content fetch/cache @@ -21,3 +22,8 @@ Design deep-dives live in `docs/01-design/` (notably `DESIGN_TUI.md`, `DESIGN_CH - [PR-010-web-ui.md](PR-010-web-ui.md) - Web UI (tasks first, chat optional) - [PR-011-notifications.md](PR-011-notifications.md) - Notifications scheduling + delivery + history - [PR-012-deployment-docs.md](PR-012-deployment-docs.md) - Docker/deploy + docs +- [PR-013-event-system.md](PR-013-event-system.md) - Event system + realtime updates +- [PR-014-multi-agent-orchestration.md](PR-014-multi-agent-orchestration.md) - Multi-agent orchestration +- [PR-015-agent-ux-panel.md](PR-015-agent-ux-panel.md) - TUI agent panel + controls +- 
[PR-016-observability-baseline.md](PR-016-observability-baseline.md) - Structured logging + telemetry +- [PR-017-db-config-followups.md](PR-017-db-config-followups.md) - DB config follow-ups diff --git a/docs/02-implementation/pr-specs/PR-001-db-config.md b/docs/02-implementation/pr-specs/PR-001-db-config.md index a09fb3a..8b39336 100644 --- a/docs/02-implementation/pr-specs/PR-001-db-config.md +++ b/docs/02-implementation/pr-specs/PR-001-db-config.md @@ -6,54 +6,43 @@ ## Goal -Establish a reliable local-first foundation: -- configuration loading (env + config file) -- SQLite database initialization -- repeatable schema migrations -- safe backup/restore workflows +Establish the local-first foundation: config loading, SQLite initialization, +migrations, and backup/restore workflows. ## User Value - The app starts consistently on any machine. -- Schema changes are safe (no “delete DB and hope”). -- Users can backup/restore with one or two commands. +- Schema changes are safe and repeatable. +- Users can back up and restore data with clear commands. + +## References + +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` +- `docs/01-design/DECISIONS.md` ## Scope ### In -- Standardize config sources and precedence (e.g., env vars → `.env` → `~/.taskgenie/config.toml`). -- Introduce migrations (recommended: Alembic) for SQLite schema evolution. -- Add a simple DB CLI surface (either under `tgenie db ...` or a dedicated script): - - upgrade/downgrade migrations - - dump to `.sql` - - restore from `.sql` (with clear warnings) -- Define canonical data paths (DB file, vector store, attachments cache). +- Configuration sources and precedence (env vars, `.env`, user config, defaults). +- Canonical app data directory and subpaths (DB, cache, vector store, attachments, logs). +- Alembic migrations for SQLite and CLI wrappers for upgrade/downgrade/revision. +- Backup and restore commands with explicit confirmation on restore. 
+- FastAPI startup initializes the database and runs migrations when needed. ### Out -- Building a full “admin console” for DB management. -- Cross-database support (Postgres/MySQL). SQLite-first only. +- Admin UI for DB management. +- Non-SQLite databases. ## Mini-Specs -- Config: - - precedence and validation (env + `.env` + `~/.taskgenie/config.toml`). -- Paths: - - canonical app data dir and subpaths (DB, cache, logs, vector store). -- Migrations: - - Alembic initialized and runnable for SQLite. - - wrapper CLI surface (`tgenie db upgrade|downgrade|revision`). -- Backup/restore: - - dump to `.sql` and restore from `.sql` with explicit overwrite confirmation. -- Docs: - - backup/restore + migration commands documented with examples. - -## References - -- `docs/01-design/DESIGN_DATA.md` (schema + backup/migration notes) -- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` (job persistence patterns) -- `docs/01-design/DECISIONS.md` (TUI-first and migration approach) +- Implement Pydantic Settings in `backend/config.py` with defined precedence. +- Add path helpers for data directory resolution without creating directories at import time. +- Initialize Alembic in `backend/migrations/` with SQLite-safe config and FK enforcement. +- Provide `tgenie db` commands: upgrade, downgrade, revision, dump, restore (reset optional). +- Document backup/restore and migration usage in `docs/`. ## User Stories @@ -62,430 +51,87 @@ Establish a reliable local-first foundation: - As a user, I can dump my data to `backup.sql` and restore it on a new machine. - As a developer, I can reset my local DB quickly during iteration. -## Technical Design +## UX Notes (if applicable) -### Config precedence +- Destructive actions (restore/reset) require explicit confirmation. -- Use **Pydantic Settings** for validation and environment management. -- Recommended precedence (highest wins): - 1. env vars - 2. `.env` file (dev convenience) - 3. config file (default: `~/.taskgenie/config.toml`) - 4. 
built-in defaults +## Technical Design -### Data locations +### Architecture -- Define a canonical “app data dir” (configurable) for: - - DB file - - vector store - - attachment cache - - logs -- Default to `~/.taskgenie/` to match the rest of the design docs. -- (Optional) On Linux, consider XDG mappings later (config/data/state) if we want to be a better citizen. +- Use Pydantic Settings with cached `get_settings()` and explicit precedence: + env vars > `.env` > `~/.taskgenie/config.toml` > defaults. +- Provide a single resolver for app data paths (configurable via `TASKGENIE_DATA_DIR`). +- FastAPI lifespan calls `init_db_async()` to avoid blocking the event loop. -### Migrations (Alembic) +### Data Model / Migrations -- Add Alembic configuration in `backend/migrations/`. -- Ensure `env.py` is configured for SQLite (handling `PRAGMA foreign_keys=ON`). -- Support: - - `tgenie db upgrade [--rev head]` (wraps `alembic upgrade head`) - - `tgenie db downgrade --rev |-1` - - `tgenie db revision -m "..." --autogenerate` +- Initialize Alembic under `backend/migrations/`. +- Ensure SQLite foreign keys are enabled on every connection. +- Migrations use `sqlite://` for Alembic even when runtime uses `sqlite+aiosqlite://`. -### Backup/restore +### API Contract -- Support the standard SQLite `.dump` workflow. -- If wrapped in CLI, require explicit confirmation on restore. +- CLI surface under `tgenie db`: + `upgrade`, `downgrade`, `revision`, `dump`, `restore` (and optional `reset`). -### DB CLI surface +### Background Jobs -- Provide (or plan) these commands: - - `tgenie db dump --out backup.sql` - - `tgenie db restore --in backup.sql` (confirm overwrite) - - `tgenie db reset --yes` (dev-only convenience) +- N/A. -## Acceptance Criteria +### Security / Privacy -### AC1: Fresh Install Creates Database Automatically ✅ +- Backup/restore commands do not log SQL contents. +- Restore and reset require confirmation before overwrite. 
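The migrations note above requires SQLite foreign keys to be enabled on every connection, because SQLite ships with the pragma off per connection rather than per database — this is also what PR-017's acceptance criterion checks. A stdlib-only sketch of the behavior; in the real app the pragma would be issued from the engine's connect hook, and the two-table schema here is a simplified stand-in for the actual `tasks`/`attachments` models:

```python
import sqlite3


def connect(db_path: str) -> sqlite3.Connection:
    """Open a SQLite connection with FK enforcement turned on.

    SQLite leaves foreign_keys OFF for each new connection, so the
    pragma must be re-issued every time, not once at DB creation.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


conn = connect(":memory:")
conn.executescript(
    # Simplified stand-ins for the real tasks/attachments schema.
    "CREATE TABLE tasks (id TEXT PRIMARY KEY);"
    "CREATE TABLE attachments ("
    "  id INTEGER PRIMARY KEY,"
    "  task_id TEXT NOT NULL REFERENCES tasks(id));"
)
try:
    conn.execute("INSERT INTO attachments (task_id) VALUES ('missing')")
except sqlite3.IntegrityError:
    pass  # orphan attachment correctly rejected
```

With SQLAlchemy the same pragma is usually issued from a `connect` event listener on the engine, so every pooled connection gets it automatically.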
-**Success Criteria:** -- [ ] No database file exists initially -- [ ] Running `tgenie db upgrade head` creates database at `~/.taskgenie/data/taskgenie.db` (or configured path) -- [ ] All required tables are created: `tasks`, `attachments`, `notifications`, `chat_history`, `config`, `alembic_version` -- [ ] FastAPI startup automatically runs migrations if DB doesn't exist -- [ ] `/health` endpoint returns `200 OK` after startup - -**How to Test:** - -**Automated:** -```python -def test_fresh_install_creates_db(tmp_path, monkeypatch): - """Test that fresh install creates DB and tables.""" - db_path = tmp_path / "test.db" - monkeypatch.setenv("DATABASE_URL", f"sqlite+aiosqlite:///{db_path}") - - # Verify DB doesn't exist - assert not db_path.exists() - - # Run upgrade - runner = CliRunner() - result = runner.invoke(db_app, ["upgrade", "head"]) - assert result.exit_code == 0 - - # Verify DB created with all tables - assert db_path.exists() - conn = sqlite3.connect(str(db_path)) - cursor = conn.cursor() - cursor.execute("SELECT name FROM sqlite_master WHERE type='table'") - tables = {row[0] for row in cursor.fetchall()} - assert "tasks" in tables - assert "attachments" in tables - assert "notifications" in tables - assert "alembic_version" in tables - conn.close() -``` - -**Manual:** -```bash -# 1. Remove existing DB (if any) -rm ~/.taskgenie/data/taskgenie.db - -# 2. Run upgrade -uv run tgenie db upgrade head - -# 3. Verify DB created -ls -la ~/.taskgenie/data/taskgenie.db - -# 4. Start API and verify health -uv run python -m backend.main & -sleep 2 -curl http://localhost:8080/health -# Expected: {"status": "ok", "version": "0.1.0"} -``` - ---- - -### AC2: Migrations Can Be Created and Applied ✅ +### Error Handling -**Success Criteria:** -- [ ] `tgenie db revision -m "..." 
--autogenerate` creates new migration file -- [ ] `tgenie db upgrade head` applies all pending migrations -- [ ] `tgenie db downgrade -1` reverts last migration (where feasible) -- [ ] `alembic_version` table tracks current revision -- [ ] Multiple sequential migrations apply correctly - -**How to Test:** - -**Automated:** -```python -def test_create_and_apply_migration(tmp_path, monkeypatch): - """Test creating and applying a migration.""" - # Setup: Fresh DB at initial migration - runner.invoke(db_app, ["upgrade", "head"]) - - # Create new migration - result = runner.invoke( - db_app, - ["revision", "-m", "test migration", "--autogenerate"] - ) - assert result.exit_code == 0 - - # Verify migration file created - versions_dir = Path("backend/migrations/versions") - migration_files = list(versions_dir.glob("*.py")) - assert len(migration_files) >= 2 # initial + new - - # Apply migration - result = runner.invoke(db_app, ["upgrade", "head"]) - assert result.exit_code == 0 - - # Verify alembic_version updated - conn = sqlite3.connect(str(db_path)) - cursor = conn.cursor() - cursor.execute("SELECT version_num FROM alembic_version") - version = cursor.fetchone()[0] - assert version is not None - conn.close() -``` - -**Manual:** -```bash -# 1. Ensure DB is at head -uv run tgenie db upgrade head - -# 2. Create a test migration (add a dummy column to tasks) -# Edit backend/models/task.py to add a test column -# Then generate migration: -uv run tgenie db revision -m "add test column" --autogenerate - -# 3. Review generated migration file -cat backend/migrations/versions/*_add_test_column.py - -# 4. Apply migration -uv run tgenie db upgrade head - -# 5. Verify schema changed -sqlite3 ~/.taskgenie/data/taskgenie.db ".schema tasks" - -# 6. 
(Optional) Test downgrade -uv run tgenie db downgrade -1 -sqlite3 ~/.taskgenie/data/taskgenie.db ".schema tasks" # Verify reverted -``` - ---- - -### AC3: Backup and Restore Work Correctly ✅ +- Missing optional config files are non-fatal with clear defaults. +- Migration and backup/restore failures surface actionable CLI errors. -**Success Criteria:** -- [ ] `tgenie db dump --out backup.sql` creates SQL dump file -- [ ] Dump file contains all table schemas and data -- [ ] `tgenie db restore --in backup.sql` restores database (with confirmation) -- [ ] Restored database has same schema and data as original -- [ ] Foreign key relationships preserved after restore -- [ ] Restore prompts for confirmation before overwriting - -**How to Test:** - -**Automated:** -```python -async def test_backup_restore_preserves_data(tmp_path, temp_settings): - """Test that backup/restore preserves all data and relationships.""" - # Create task with attachment - async for session in get_db(): - task = Task(id="test-1", title="Test", status="pending", priority="medium") - session.add(task) - await session.flush() - attachment = Attachment(task_id=task.id, type="file", reference="/test.txt") - session.add(attachment) - await session.commit() - - # Backup - backup_file = tmp_path / "backup.sql" - result = runner.invoke(db_app, ["dump", "--out", str(backup_file)]) - assert result.exit_code == 0 - assert backup_file.exists() - - # Delete DB - settings.db_path.unlink() - - # Restore - result = runner.invoke( - db_app, ["restore", "--in", str(backup_file)], - input="y\n" - ) - assert result.exit_code == 0 - - # Verify data restored - async for session in get_db(): - task = await session.get(Task, "test-1") - assert task is not None - assert len(task.attachments) == 1 -``` - -**Manual:** -```bash -# 1. 
Create some test data (via SQLAlchemy or direct SQL) -sqlite3 ~/.taskgenie/data/taskgenie.db < ~/.taskgenie/config.toml < .env > config.toml > defaults +### Automated ---- +- Unit: settings precedence and path resolution. +- Integration: Alembic upgrade/downgrade paths against temp DB. +- CLI: `tgenie db` commands (upgrade, dump, restore). -## Related Test Documentation +### Manual -- **Migrations Guide:** [`MIGRATIONS.md`](../MIGRATIONS.md) - How to create and manage migrations -- **Testing Guide:** [`TESTING_GUIDE.md`](../TESTING_GUIDE.md) - General testing patterns and practices +- Run `tgenie db upgrade head` on a clean machine; confirm DB created. +- Create data, dump/restore, and confirm data persists. +- Start API and verify `/health` returns OK. ## Notes / Risks / Open Questions -- SQLite downgrade support can be limited depending on migration operations; keep early migrations simple. -- If `tgenie` CLI wiring isn’t available yet, these commands can be exposed via `uv run ...` as an interim step, but the end state should be `tgenie db ...`. +- SQLite downgrade support is limited; keep early migrations simple. +- Structured logging and telemetry are handled in PR-016. diff --git a/docs/02-implementation/pr-specs/PR-002-task-crud-api.md b/docs/02-implementation/pr-specs/PR-002-task-crud-api.md index 792932a..b39204f 100644 --- a/docs/02-implementation/pr-specs/PR-002-task-crud-api.md +++ b/docs/02-implementation/pr-specs/PR-002-task-crud-api.md @@ -6,187 +6,120 @@ ## Goal -Provide the minimal API surface to create, read, update, and delete tasks (plus basic filtering/pagination), so UIs can ship early. +Provide the minimal API surface to create, read, update, and delete tasks with basic +filtering and pagination. ## User Value - Enables the first usable UI (TUI) quickly. -- Establishes a stable contract for future features (attachments, notifications, chat actions). +- Establishes a stable contract for later features. 
+ +## References + +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/API_REFERENCE.md` +- `docs/01-design/DESIGN_CLI.md` ## Scope ### In -- REST endpoints for tasks: - - create (`POST /api/v1/tasks`) - - list (`GET /api/v1/tasks?status=&priority=&due_before=&due_after=&limit=&offset=`) - - get by id (`GET /api/v1/tasks/{id}`) - - update (`PATCH /api/v1/tasks/{id}`) - - delete (`DELETE /api/v1/tasks/{id}`) -- Validation rules (required title, enum validation, timestamps). -- Consistent error format for 400/404/500. +- REST endpoints for task CRUD. +- Basic filters (status, priority, due_before, due_after) and pagination. +- Input validation and consistent error shape. ### Out - Attachments (PR-004). - Semantic search (PR-005). -- Authentication/multi-user (future). +- Event system and realtime updates (PR-013). +- Agent tool schema and actions (PR-003B). +- Authentication/multi-user. ## Mini-Specs -- API routes: - - `POST /api/v1/tasks` - - `GET /api/v1/tasks` (filters + pagination) - - `GET /api/v1/tasks/{id}` - - `PATCH /api/v1/tasks/{id}` - - `DELETE /api/v1/tasks/{id}` -- Validation: - - `title` required; enums validated; timestamps server-generated. -- Error format: - - 404 uses `{"error": "...", "code": "TASK_NOT_FOUND"}` (align `API_REFERENCE.md`). -- OpenAPI: - - Endpoints appear in `/docs` with correct schemas and examples. -- Tests: - - CRUD happy path + validation errors + list filters/pagination. - -## References - -- `docs/01-design/DESIGN_DATA.md` (task schema + relationships) -- `docs/01-design/API_REFERENCE.md` (endpoint examples) -- `docs/01-design/DESIGN_CLI.md` (commands that will rely on this API) +- Implement `/api/v1/tasks` CRUD endpoints with pagination defaults. +- Validate enums and required fields at the API boundary. +- Standardize error responses (404 and validation). +- Publish OpenAPI schemas with examples aligned to `API_REFERENCE.md`. ## User Stories - As a user, I can create tasks and see them appear in my task list. 
- As a user, I can update task status/priority/ETA and see it reflected everywhere. -- As a user, I can filter tasks by status/priority/due window. +- As a user, I can filter tasks by status, priority, and due window. + +## UX Notes (if applicable) + +- N/A. ## Technical Design -### Data model (SQLAlchemy) +### Architecture -- `Task` model in `backend/models/task.py`: - - `id`: UUID (stored as `VARCHAR(36)` / string) - - `title`: `VARCHAR(255) NOT NULL` - - `description`: `TEXT NULL` - - `status`: `VARCHAR(20) NOT NULL` (`pending|in_progress|completed`) - - `priority`: `VARCHAR(20) NOT NULL` (`low|medium|high|critical`) - - `eta`: `DATETIME NULL` - - `created_at`, `updated_at`: `DATETIME NOT NULL` - - `tags`: `JSON NULL` - - `metadata`: `JSON NULL` -- Index: `(status, eta)` for common “what’s due?” queries. +- FastAPI router with service/repository layer; no direct DB access from clients. +- Use async SQLAlchemy sessions. -### API schemas (Pydantic) +### Data Model / Migrations -- Use the types in `backend/schemas/task.py`: - - `TaskCreate`, `TaskUpdate`, `TaskResponse` - - `TaskStatus` / `TaskPriority` enums validated at the API boundary - - `TaskListResponse`: `{"tasks": [...], "total": 42, "page": 1, "page_size": 50}` +- `Task` fields: id (UUID string), title, description, status, priority, eta, + created_at, updated_at, tags, metadata. +- Index on `(status, eta)` to support "what is due" queries. -### API contract +### API Contract -- `POST /api/v1/tasks` → 201 Created -- `GET /api/v1/tasks` → 200 OK - - Defaults: `limit=50`, `offset=0` -- `GET /api/v1/tasks/{id}` → 200 OK -- `PATCH /api/v1/tasks/{id}` → 200 OK -- `DELETE /api/v1/tasks/{id}` → 204 No Content +- `POST /api/v1/tasks` -> 201 Created. +- `GET /api/v1/tasks` -> 200 OK, supports filters and `limit`/`offset` (defaults 50/0). +- `GET /api/v1/tasks/{id}` -> 200 OK. +- `PATCH /api/v1/tasks/{id}` -> 200 OK, partial update. +- `DELETE /api/v1/tasks/{id}` -> 204 No Content. 
+- Error shape: `{"error": "...", "code": "TASK_NOT_FOUND"}` for 404s. -### Validation + transitions +### Background Jobs -- Validate enums at the API boundary. -- Status transitions can be permissive in MVP; tighten later if needed. +- N/A. -### Error format +### Security / Privacy -- Standardize errors: - - `{"error": "...", "code": "..."}` +- N/A (no auth in MVP). + +### Error Handling + +- Validation errors return 422 with clear field errors. +- Not found uses standard error shape and code. ## Acceptance Criteria -- [ ] All CRUD endpoints exist under `/api/v1/tasks` and match `docs/01-design/API_REFERENCE.md`. -- [ ] Validation rejects empty title and invalid enums. -- [ ] List supports `status`, `priority`, `due_before`, `due_after`, `limit`, `offset`. -- [ ] 404s return the standard error shape and code. -- [ ] OpenAPI docs render the endpoints + schemas. +### AC1: CRUD Endpoints + +**Success Criteria:** +- [ ] All CRUD endpoints exist and match `API_REFERENCE.md`. +- [ ] OpenAPI docs render endpoints and schemas. + +### AC2: Validation and Error Shape + +**Success Criteria:** +- [ ] Empty titles and invalid enums are rejected. +- [ ] 404 responses use the standard error shape and code. + +### AC3: Filters and Pagination + +**Success Criteria:** +- [ ] List endpoint supports status, priority, due_before, due_after. +- [ ] Limit/offset default to 50/0 and are enforced. 
## Test Plan ### Automated -```python -# tests/test_api/test_tasks_crud.py -import pytest -from httpx import AsyncClient - -class TestTasksCRUD: - """Core CRUD scenarios for /api/v1/tasks.""" - - @pytest.mark.asyncio - async def test_create_list_get_update_delete(self, client: AsyncClient): - create_resp = await client.post("/api/v1/tasks", json={"title": "Buy groceries"}) - assert create_resp.status_code == 201 - - task = create_resp.json() - task_id = task["id"] - assert task["title"] == "Buy groceries" - assert task["status"] == "pending" - assert task["priority"] == "medium" - - list_resp = await client.get("/api/v1/tasks", params={"limit": 50, "offset": 0}) - assert list_resp.status_code == 200 - list_data = list_resp.json() - assert any(t["id"] == task_id for t in list_data["tasks"]) - assert isinstance(list_data["total"], int) - - get_resp = await client.get(f"/api/v1/tasks/{task_id}") - assert get_resp.status_code == 200 - assert get_resp.json()["id"] == task_id - - patch_resp = await client.patch(f"/api/v1/tasks/{task_id}", json={"status": "in_progress"}) - assert patch_resp.status_code == 200 - assert patch_resp.json()["status"] == "in_progress" - - del_resp = await client.delete(f"/api/v1/tasks/{task_id}") - assert del_resp.status_code == 204 - - missing_resp = await client.get(f"/api/v1/tasks/{task_id}") - assert missing_resp.status_code == 404 - - @pytest.mark.asyncio - async def test_validation_errors(self, client: AsyncClient): - response = await client.post("/api/v1/tasks", json={"title": ""}) - assert response.status_code == 422 - - @pytest.mark.asyncio - async def test_list_filters(self, client: AsyncClient): - await client.post("/api/v1/tasks", json={"title": "Pending", "status": "pending"}) - await client.post("/api/v1/tasks", json={"title": "Completed", "status": "completed"}) - - response = await client.get("/api/v1/tasks", params={"status": "pending"}) - assert response.status_code == 200 - tasks = response.json()["tasks"] - assert 
all(t["status"] == "pending" for t in tasks) -``` +- API tests for CRUD, validation, and 404s. +- API tests for filters and pagination ordering. ### Manual -1. Start the API server. -2. Create: - - `curl -X POST http://localhost:8080/api/v1/tasks -H 'content-type: application/json' -d '{"title":"Buy groceries"}'` -3. List: - - `curl http://localhost:8080/api/v1/tasks?limit=50&offset=0` -4. Get by id: - - `curl http://localhost:8080/api/v1/tasks/` -5. Update: - - `curl -X PATCH http://localhost:8080/api/v1/tasks/ -H 'content-type: application/json' -d '{"status":"in_progress"}'` -6. Delete: - - `curl -X DELETE http://localhost:8080/api/v1/tasks/` -3. `curl .../tasks` to list and verify filters. -4. `curl -X PATCH .../tasks/{id}` to update. -5. `curl -X DELETE .../tasks/{id}` then `GET` to confirm 404. +- Use curl to create/list/get/update/delete tasks. +- Verify filtering results for status and due windows. ## Notes / Risks / Open Questions -- Decide whether list endpoints should default-sort (e.g., ETA ascending, then created_at). +- Decide default sort order (eta asc vs created_at desc). diff --git a/docs/02-implementation/pr-specs/PR-003-llm-chat-backbone.md b/docs/02-implementation/pr-specs/PR-003-llm-chat-backbone.md index 04615f4..85bf972 100644 --- a/docs/02-implementation/pr-specs/PR-003-llm-chat-backbone.md +++ b/docs/02-implementation/pr-specs/PR-003-llm-chat-backbone.md @@ -6,216 +6,115 @@ ## Goal -Make chat real: -- implement multi-provider LLM service (OpenRouter/BYOK) -- expose a streaming Chat API -- wire the interactive TUI chat panel to the backend +Implement provider-agnostic LLM chat with streaming responses and wire it into the TUI. ## User Value -- The primary interaction mode (chat inside the TUI) becomes usable. -- Users can ask questions and trigger task actions without leaving the TUI. +- Chat inside the TUI becomes usable. +- Users can bring their own API keys and models. 
+ +## References + +- `docs/01-design/DESIGN_CHAT.md` +- `docs/01-design/API_REFERENCE.md` +- `docs/01-design/DESIGN_TUI.md` +- `docs/01-design/DECISIONS.md` ## Scope ### In -- LLM provider abstraction (at minimum OpenRouter via OpenAI SDK; optional OpenAI-compatible providers). -- Chat endpoint supporting streaming. -- TUI chat view talks to the API and renders streaming responses. -- “No LLM configured” is a first-class UX (clear message, no crash). +- LLM provider abstraction (OpenRouter via OpenAI-compatible SDK). +- Streaming chat API endpoint (SSE). +- TUI chat panel sends messages and renders streamed output. +- Friendly UX for missing/invalid configuration. ### Out - RAG/semantic search context injection (PR-005). -- Advanced tool-calling / agent spawning (future). +- Tool-calling foundation (PR-003B). +- Multi-agent orchestration (PR-014). ## Mini-Specs -- LLM provider configuration: - - OpenRouter via OpenAI SDK (OpenAI-compatible `base_url` + `api_key`). - - model selection via config (`LLM_MODEL`, etc). -- Streaming chat endpoint: - - `POST /api/v1/chat` returns SSE (chunked `data:` messages + `[DONE]`). -- TUI wiring: - - chat panel sends user messages and renders streaming output. - - friendly UX for “not configured”, 401, rate limit, and network errors. -- Observability: - - structured logs for provider errors (no secrets in logs). +- Configurable provider settings (`LLM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`). +- `POST /api/v1/chat` returns SSE with `[DONE]` terminator. +- TUI chat client handles streaming, rate limits, and network errors. +- Structured logging for provider errors without leaking content. -## References +## User Stories -- `docs/01-design/DESIGN_CHAT.md` (streaming + error handling) -- `docs/01-design/API_REFERENCE.md` (chat endpoint shape) -- `docs/01-design/DESIGN_TUI.md` (TUI chat panel wiring) -- `docs/01-design/DECISIONS.md` (provider strategy) +- As a user, I can ask "what is due today?" and get a streamed reply in the TUI. 
+- As a user, I can use my own API key and switch models/providers. +- As a user, I see clear guidance when my API key is missing or invalid. -## User Stories +## UX Notes (if applicable) -- As a user, I can ask “what’s due today?” from inside the TUI and get a streamed response. -- As a user, I can use my own API key (BYOK) and switch models/providers. -- As a user, if my API key is missing/invalid, the UI tells me exactly what to configure. +- Show a clear "LLM not configured" state with setup guidance. ## Technical Design -### Provider abstraction +### Architecture + +- `LLMService` exposes `stream_chat(messages)` and hides provider details. +- Use OpenAI-compatible client with configurable base URL (OpenRouter default). +- TUI uses `httpx.AsyncClient` streaming to append chunks in real time. + +### Data Model / Migrations -- `LLMService` in `backend/services/llm_service.py`: - - `async def stream_chat(messages: List[Dict[str, str]]) -> AsyncIterator[str]` - - Configurable `base_url` and `api_key` for OpenRouter/OpenAI compatibility. +- N/A (chat history persistence is out of scope for this PR). -### API contract (streaming) +### API Contract -- `POST /api/v1/chat` (see `docs/01-design/API_REFERENCE.md` for request fields) -- Request: `application/json` -- Response: Server-Sent Events (SSE) stream (`text/event-stream`) - - Each chunk: `data: \n\n` - - Terminator: `data: [DONE]\n\n` +- `POST /api/v1/chat` accepts message list or single message per `API_REFERENCE.md`. +- Response is `text/event-stream` with `data:` chunks and `[DONE]` terminator. -### TUI integration +### Background Jobs -- `ChatPanel` widget: - - Uses `httpx.AsyncClient` with `stream=True` - - Appends streamed text chunks to the chat transcript incrementally - - Handles `ConnectTimeout` and `401 Unauthorized` with user-friendly modals +- N/A. 
-### Error mapping +### Security / Privacy -- Normalize common failures: - - missing API key → “configure LLM” message - - 401 → “invalid key” - - 429 → “rate limit” + retry suggestion - - network error → “cannot reach provider” +- Do not log prompts or responses by default. +- Keys are read from env/config and never persisted to DB. + +### Error Handling + +- Map provider failures to user-facing errors: missing key, 401 invalid key, 429 + rate limit, and network timeouts. ## Acceptance Criteria -- [ ] Chat endpoint returns a streaming response. +### AC1: Streaming Chat API + +**Success Criteria:** +- [ ] Chat endpoint streams SSE chunks and ends with `[DONE]`. + +### AC2: TUI Chat Integration + +**Success Criteria:** - [ ] TUI can send a message and display a streamed reply. -- [ ] Missing/invalid API key results in a clear error message (in TUI and API). -- [ ] Provider selection (model/base_url) works via config. + +### AC3: Configuration and Error UX + +**Success Criteria:** +- [ ] Missing or invalid keys produce clear, non-crashing messages. +- [ ] Provider/model configuration works via settings. 
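The SSE framing required above (chunked `data:` messages ending in `[DONE]`) can be exercised with a small stdlib-only sketch; the helper names are illustrative, not the actual server or TUI client code:

```python
def sse_event(chunk: str) -> str:
    """Frame one text chunk as a server-sent event."""
    return f"data: {chunk}\n\n"

def sse_done() -> str:
    """Terminator that tells the client to stop reading."""
    return "data: [DONE]\n\n"

def read_sse(stream: str) -> list[str]:
    """Client side: collect data payloads until the [DONE] terminator."""
    chunks = []
    for block in stream.split("\n\n"):
        if not block.startswith("data: "):
            continue
        payload = block[len("data: "):]
        if payload == "[DONE]":
            break
        chunks.append(payload)
    return chunks
```

Keeping this framing identical on both ends is what lets the TUI and a future Web UI share one chat endpoint.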
## Test Plan ### Automated -```python -# tests/test_api/test_chat.py -import pytest -from httpx import AsyncClient, ASGITransport -from unittest.mock import AsyncMock, patch, MagicMock - -class TestChatEndpoint: - """Tests for POST /api/v1/chat""" - - @pytest.mark.asyncio - async def test_chat_streams_response(self, client: AsyncClient): - """Chat endpoint returns SSE stream.""" - mock_llm = AsyncMock() - mock_llm.stream_chat = AsyncMock(return_value=async_gen_chunks(["Hello", " ", "world", "!"])) - - with patch("backend.services.llm_service.llm_service.stream_chat", mock_llm): - response = await client.post("/api/v1/chat", json={"message": "Hello"}) - assert response.status_code == 200 - - # Verify SSE stream format - content = response.text - assert "data:" in content - assert "[DONE]" in content - - @pytest.mark.asyncio - async def test_chat_missing_api_key(self, client: AsyncClient): - """Missing API key returns clear error.""" - with patch("backend.services.llm_service.llm_service", side_effect=ValueError("API key not configured")): - response = await client.post("/api/v1/chat", json={"message": "test"}) - assert response.status_code == 500 - assert "configure" in response.json()["detail"].lower() - - -class TestLLMService: - """Tests for LLM provider abstraction""" - - @pytest.mark.asyncio - async def test_stream_chat_openrouter(self): - """OpenRouter integration works.""" - from backend.services.llm_service import LLMService - from unittest.mock import AsyncMock - - mock_client = AsyncMock() - mock_response = MagicMock() - mock_response.aiter_bytes.return_value = [b"data: Hello", b"data: World", b"data: [DONE]"] - mock_client.stream.return_value = mock_response - - service = LLMService(api_key="test-key", base_url="https://openrouter.ai/api") - service._client = mock_client - - chunks = [] - async for chunk in service.stream_chat([{"role": "user", "content": "test"}]): - chunks.append(chunk) - - assert len(chunks) == 3 - assert chunks[0].decode() == "data: 
Hello" - assert chunks[2].decode() == "data: [DONE]" - - @pytest.mark.asyncio - async def test_error_handling_401(self): - """401 Unauthorized returns clear error.""" - from backend.services.llm_service import LLMService - from unittest.mock import AsyncMock, MagicMock - - mock_client = AsyncMock() - mock_response = MagicMock() - mock_response.status_code = 401 - mock_response.json.return_value = {"error": "Invalid API key"} - mock_client.stream.return_value = mock_response - - service = LLMService(api_key="bad-key") - service._client = mock_client - - with pytest.raises(ValueError, match="Invalid API key"): - async for _ in service.stream_chat([{"role": "user", "content": "test"}]): - pass -``` - -```python -# tests/test_services/test_rag_context.py -import pytest - -class TestRAGContext: - """Tests for RAG context building""" - - @pytest.mark.asyncio - async def test_build_context_includes_metadata(self, rag_service_test): - """Context includes task status and priority.""" - context = await rag_service_test.build_context( - "what tasks?", - n_results=2, - include_metadata=True - ) - - assert "status:" in context - assert "priority:" in context - assert "similarity:" in context - - @pytest.mark.asyncio - async def test_build_context_no_results(self, rag_service_test): - """Context message when no relevant results.""" - mock_search = AsyncMock(return_value=[]) - rag_service_test.search = mock_search - - context = await rag_service_test.build_context("test query", n_results=3) - assert "No relevant tasks found" in context -``` +- API tests for SSE format and error mapping. +- Unit tests for `LLMService` provider errors. +- TUI client tests using mocked SSE streams. ### Manual -1. Configure a valid API key and model. -2. Start API and run `tgenie`: - - send "What tasks are due today?" - - verify response streams and renders correctly -3. 
Unset API key and retry: - - verify UI explains how to configure LLM and does not crash +- Configure a valid key and verify streaming replies in the TUI. +- Unset the key and verify setup guidance is shown. ## Notes / Risks / Open Questions -- Keep SSE payload format consistent across TUI and Web UI to avoid divergence. -- Avoid logging prompts/responses by default (privacy). +- Keep SSE payload format consistent for TUI and Web UI. +- Tool execution and agent workflows are handled in PR-003B. diff --git a/docs/02-implementation/pr-specs/PR-003B-agent-tool-calling.md b/docs/02-implementation/pr-specs/PR-003B-agent-tool-calling.md new file mode 100644 index 0000000..697a425 --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-003B-agent-tool-calling.md @@ -0,0 +1,623 @@ +# PR-003B: Agent Tool-Calling Foundation (Spec) + +**Status:** Spec Only +**Depends on:** PR-003, PR-002, PR-004 +**Last Reviewed:** 2026-01-01 + +## Goal + +Enable tool-calling in chat so the assistant can take safe, auditable actions on +behalf of the user. + +## User Value + +- Users can ask the assistant to create/update tasks directly. +- The assistant can refresh attachments and fetch context on demand. + +## References + +- `docs/01-design/DESIGN_CHAT.md` +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/API_REFERENCE.md` + +## Scope + +### In + +- Tool schema definition and registry (name, description, parameters). +- Tool execution pipeline with validation, timeouts, and retries. +- Safe defaults: allowlist and confirmation for destructive actions. +- Initial tool set covers task CRUD and attachment list/create/delete. +- Semantic search tool is included if PR-005 exists. +- Tool execution logging with correlation IDs (no content logs). + +### Out + +- Multi-agent orchestration (PR-014). +- Long-running background tools and scheduling. +- Marketplace or dynamic tool discovery. + +## Mini-Specs + +- Tool schema defined in Pydantic and exported to JSON Schema for LLM providers. 
+- `ToolRegistry` for registering tools and validating args. +- `ToolExecutor` that runs tools, captures results, and maps errors. +- Confirmation gate for destructive tools (delete, overwrite). +- Tool results are appended to the chat stream before final model response. + +## User Stories + +- As a user, I can say "mark my task as done" and the assistant updates it. +- As a user, I can ask the assistant to refresh a GitHub attachment. +- As a developer, I can add a new tool with a schema and handler. + +## UX Notes (if applicable) + +- Destructive actions require explicit confirmation before execution. + +## Technical Design + +### Architecture + +- `ToolDefinition` (schema + metadata) and `ToolHandler` (callable). +- Registry stores tools and exposes schema list to the LLM provider. +- Chat flow: + 1. Send messages + tool schema to LLM. + 2. If tool call returned, validate and execute tool. + 3. Append tool result and call LLM again for final response. + +### Data Model / Migrations + +- N/A (tool execution logs are structured logs, not persisted). + +### API Contract + +- `POST /api/v1/chat` accepts tool-enabled requests; tool calls are handled server-side. +- Optional `GET /api/v1/tools` returns tool schema for debugging. + +### Background Jobs + +- N/A (all tools execute inline in this PR). + +### Security / Privacy + +- Tool allowlist and confirmation for destructive operations. +- Tool execution logs contain metadata only (no user content or secrets). + +### Error Handling + +- Invalid tool args return a tool error message back to the model. +- Tool timeouts return a clear failure response without crashing chat. 
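A minimal sketch of the registry/executor flow above, including the confirmation gate for destructive tools. The class and method names are assumptions, not the project's final API:

```python
class ToolRegistry:
    """Maps tool names to handlers plus their JSON-Schema parameters."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, parameters, handler, destructive=False):
        self._tools[name] = {
            "description": description,
            "parameters": parameters,
            "handler": handler,
            "destructive": destructive,
        }

    def schemas(self):
        """Schema list handed to the LLM provider alongside the messages."""
        return [
            {"name": n, "description": t["description"], "parameters": t["parameters"]}
            for n, t in self._tools.items()
        ]

    def execute(self, name, args, confirmed=False):
        """Run a tool; failures come back as readable payloads, never exceptions."""
        tool = self._tools.get(name)
        if tool is None:
            return {"error": f"unknown tool: {name}", "code": "UNKNOWN_TOOL"}
        if tool["destructive"] and not confirmed:
            return {"error": f"{name} requires confirmation", "code": "CONFIRMATION_REQUIRED"}
        try:
            return {"result": tool["handler"](**args)}
        except TypeError as exc:  # bad/missing arguments from the model
            return {"error": str(exc), "code": "INVALID_ARGS"}
```

Error payloads are returned (not raised) so they can be appended to the chat transcript for the model to recover from.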
+ +### Implementation Notes + +#### Consolidation Principle + +Prefer single comprehensive tools over multiple narrow tools: + +```python +# ❌ Anti-pattern: Multiple narrow tools +def get_task_by_id(task_id: str) -> Task: + """Retrieve task by ID.""" + pass + +def get_tasks_by_status(status: str) -> list[Task]: + """Retrieve tasks by status.""" + pass + +def get_tasks_due_before(date: datetime) -> list[Task]: + """Retrieve tasks due before date.""" + pass + +# ✅ Better: Single comprehensive tool +def query_tasks( + filters: TaskFilters, + limit: int = 50, + sort: str = "eta_asc" +) -> TaskQueryResult: + """ + Query tasks with flexible filtering. + + Use when: + - User asks "What's due today?" + - User asks "Show high-priority tasks" + - User asks "List completed tasks" + + Args: + filters: Filter criteria (status, priority, date_range, tags) + limit: Max results to return (default 50) + sort: Sort order (eta_asc, priority_desc, created_desc) + + Returns: + TaskQueryResult with tasks, total count, and applied filters + + Consolidates: get_task_by_id, get_tasks_by_status, get_tasks_due_before + """ + pass +``` + +#### Tool Description Engineering + +Write descriptions that answer what, when, and what returns: + +```python +@tool_definition( + name="create_task", + description=""" + Create a new task in the task manager. + + Use when: + - User explicitly asks to create a task + - User says "Add a task for X" + - User needs to capture a quick action item + + The task will be created with 'pending' status and 'medium' priority by default. 
+ + Args: + title: Task title (required, 1-255 characters) + description: Detailed description (optional, markdown supported) + priority: Task priority (optional, one of: low, medium, high, critical) + eta: Due date/time (optional, ISO 8601 format) + tags: List of tags (optional, for categorization) + + Returns: + Created task object with ID, timestamps, and initial state + + Errors: + INVALID_TITLE: Title too long or contains invalid characters + INVALID_PRIORITY: Priority value not in allowed set + INVALID_ETA: Date format not parseable + """ +) +async def create_task( + title: str, + description: str | None = None, + priority: str = "medium", + eta: str | None = None, + tags: list[str] | None = None +) -> dict: + """Create a new task.""" + # Implementation... +``` + +## Acceptance Criteria + +### AC1: Tool Schema and Registry + +**Success Criteria:** +- [ ] Tools are defined with JSON Schema parameters. +- [ ] Tool registry exposes the schema list to the chat pipeline. + +### AC2: Tool Execution Flow + +**Success Criteria:** +- [ ] Tool calls are validated and executed with timeout handling. +- [ ] Tool results are included in the chat response flow. + +### AC3: Safety and Confirmation + +**Success Criteria:** +- [ ] Destructive tools require confirmation or are blocked by default. +- [ ] Tool errors are surfaced as readable assistant responses. + +## Test Plan + +### Automated + +- Unit tests for tool schema validation and registry behavior. +- Integration tests for tool call -> execution -> response flow with mocked LLM. +- Safety tests for destructive tool confirmation gating. + +### Manual + +- Run chat and ask the assistant to create/update/complete a task. +- Attempt a destructive action and verify confirmation is required. + +## Notes / Risks / Open Questions + +- Decide whether tool execution logs should be persisted for audit (future). 
+
+### Response Format Optimization
+
+Provide concise and detailed format options:
+
+```python
+from enum import Enum
+
+class ResponseFormat(Enum):
+    """Control response verbosity for token efficiency."""
+    CONCISE = "concise"    # ID + status only
+    STANDARD = "standard"  # Core fields (title, status, priority, eta)
+    DETAILED = "detailed"  # Full object with all fields
+
+@tool_definition(
+    name="list_tasks",
+    description="""
+    Query tasks from the task manager.
+
+    Use when:
+    - User asks "What tasks do I have?"
+    - User needs to see tasks by status/priority/date
+    - User wants to review workload
+
+    Args:
+        filters: Filter criteria (optional)
+        limit: Max results to return (default 50)
+        format: Response format - 'concise', 'standard', or 'detailed'
+
+    Use 'concise' for quick overviews or when the user needs minimal info.
+    Use 'standard' for most queries, where the core fields are enough.
+    Use 'detailed' when the user needs a comprehensive view with all fields.
+
+    Returns:
+        Formatted task list based on requested format.
+
+        'concise': Returns only task ID and status
+        'standard': Returns title, status, priority, eta, created_at
+        'detailed': Returns all fields including description, tags, attachments
+    """
+)
+async def list_tasks(
+    filters: dict | None = None,
+    limit: int = 50,
+    format: str = "standard"
+) -> dict:
+    """List tasks with format control."""
+    # Implementation...
+```
+
+### Tool Definition Schema
+
+Use consistent schema with naming conventions:
+
+```python
+from pydantic import BaseModel, Field
+from typing import Any, List, Optional
+
+class ToolParameter(BaseModel):
+    """Tool parameter definition."""
+    name: str
+    type: str  # "string", "integer", "boolean", "array", "object"
+    description: str
+    required: bool
+    default: Optional[Any] = None
+    enum: Optional[List[str]] = None
+    properties: Optional[dict] = None  # Nested schema for object-typed parameters
+
+class ToolDefinition(BaseModel):
+    """Tool definition following JSON Schema."""
+    name: str = Field(..., description="Tool identifier (verb_noun pattern)")
+    description: str = Field(..., description="What the tool does, when to use it, and what it returns")
+    parameters: ToolParameter = Field(..., description="Parameters schema")
+    returns: dict = Field(
+        default={"type": "object"},
+        description="Return type and structure"
+    )
+    examples: Optional[List[dict]] = Field(
+        None,
+        description="Example usage patterns for agents"
+    )
+
+# Example: Create task tool
+CREATE_TASK_TOOL = ToolDefinition(
+    name="create_task",
+    description=(
+        "Create a new task. Use when user asks to add, create, "
+        "or capture a task. Returns the created task with ID and timestamps."
+ ), + parameters=ToolParameter( + name="args", + type="object", + description="Task creation parameters", + required=True, + properties={ + "title": { + "type": "string", + "description": "Task title (1-255 characters)" + }, + "priority": { + "type": "string", + "description": "Task priority (default: medium)", + "enum": ["low", "medium", "high", "critical"], + "default": "medium" + }, + "eta": { + "type": "string", + "description": "Due date/time (ISO 8601 format)" + } + } + ), + examples=[ + { + "input": {"title": "Buy groceries", "priority": "high"}, + "output": {"id": "task-123", "status": "pending", "created_at": "2025-01-15T10:00:00Z"} + } + ] +) +``` + +### Error Message Design + +Design error messages for agent recovery: + +```python +class ToolError(Exception): + """Tool error with actionable recovery guidance.""" + + def __init__( + self, + code: str, + message: str, + retryable: bool = False, + recover_action: str | None = None + ): + self.code = code + self.message = message + self.retryable = retryable + self.recover_action = recover_action + + def to_dict(self) -> dict: + return { + "error": self.message, + "code": self.code, + "retryable": self.retryable, + "recover_action": self.recover_action + } + +async def create_task(title: str, priority: str) -> dict: + """Create a task with clear error handling.""" + if len(title) > 255: + raise ToolError( + code="INVALID_TITLE", + message=f"Title too long: {len(title)} characters (max 255)", + recover_action="Truncate title to 255 characters and retry" + ) + + if priority not in ["low", "medium", "high", "critical"]: + raise ToolError( + code="INVALID_PRIORITY", + message=f"Invalid priority: {priority}", + recover_action="Use one of: low, medium, high, critical" + ) + + # Implementation... 
+
+import httpx  # Assumed available for this example; `github_service` is an illustrative client.
+
+async def fetch_github_pr(owner: str, repo: str, number: int) -> dict:
+    """Fetch GitHub PR with rate limit handling."""
+    try:
+        pr = await github_service.get_pr(owner, repo, number)
+        return pr
+    except httpx.HTTPStatusError as e:
+        if e.response.status_code == 429:
+            raise ToolError(
+                code="RATE_LIMITED",
+                message="GitHub rate limit exceeded",
+                retryable=True,
+                recover_action="Wait 1 hour and retry, or configure GITHUB_TOKEN"
+            )
+        elif e.response.status_code == 403:
+            raise ToolError(
+                code="FORBIDDEN",
+                message="Repository access forbidden",
+                recover_action="Check repository permissions or verify GITHUB_TOKEN"
+            )
+        else:
+            raise ToolError(
+                code="FETCH_FAILED",
+                message=f"Failed to fetch PR: {e}",
+                retryable=True,
+                recover_action=f"Check that {owner}/{repo}#{number} exists"
+            )
+```
+
+### Async Tool Execution Patterns
+
+Implement async tool patterns with timeout handling:
+
+```python
+import asyncio
+import logging
+from typing import Callable
+
+logger = logging.getLogger(__name__)
+
+class AsyncToolExecutor:
+    """Async tool execution with timeout and cancellation."""
+
+    def __init__(self, timeout: float = 30.0):
+        self.timeout = timeout
+
+    async def execute(
+        self,
+        tool_name: str,
+        tool_func: Callable,
+        **kwargs
+    ) -> dict:
+        """Execute tool with timeout and error handling."""
+        try:
+            # Execute with timeout
+            result = await asyncio.wait_for(
+                tool_func(**kwargs),
+                timeout=self.timeout
+            )
+            return {
+                "status": "success",
+                "result": result
+            }
+
+        except asyncio.TimeoutError:
+            logger.warning({
+                "event": "tool_timeout",
+                "tool": tool_name,
+                "timeout": self.timeout
+            })
+            return {
+                "status": "timeout",
+                "error": f"Tool execution timed out after {self.timeout}s",
+                "recoverable": True
+            }
+
+        except ToolError as e:
+            logger.warning({
+                "event": "tool_error",
+                "tool": tool_name,
+                "code": e.code,
+                "retryable": e.retryable
+            })
+            return {
+                "status": "error",
+                "error": e.message,
+                "code": e.code,
+                "retryable": e.retryable,
+                "recover_action": e.recover_action
+            }
+
+        except Exception as e:
+            logger.error({
+                "event": "tool_unexpected_error",
+                "tool": tool_name,
+                "error": str(e)
+            })
+            return {
+                "status": "error",
+                "error": "Unexpected tool error",
+                "recoverable": False
+            }
+
+# Usage
+executor = AsyncToolExecutor(timeout=30.0)
+result = await executor.execute(
+    tool_name="create_task",
+    tool_func=create_task,
+    title="Buy groceries",
+    priority="high"
+)
+```
+
+### Tool Result Caching
+
+Implement memoization for expensive tool calls:
+
+```python
+import hashlib
+import logging
+from datetime import datetime
+from typing import Callable
+
+logger = logging.getLogger(__name__)
+
+def cache_key(tool_name: str, **kwargs) -> str:
+    """Generate stable cache key (md5 is fine here: caching, not security)."""
+    # Sort kwargs for consistent keys
+    sorted_items = sorted(kwargs.items())
+    key_str = f"{tool_name}:{sorted_items}"
+    return hashlib.md5(key_str.encode()).hexdigest()
+
+class CachedToolExecutor:
+    """Tool executor with memoization."""
+
+    def __init__(self, ttl_seconds: int = 300):
+        self.ttl_seconds = ttl_seconds
+        self._cache: dict[str, tuple[dict, datetime]] = {}
+
+    async def execute(
+        self,
+        tool_name: str,
+        tool_func: Callable,
+        **kwargs
+    ) -> dict:
+        """Execute tool with caching."""
+        key = cache_key(tool_name, **kwargs)
+
+        # Check cache
+        if key in self._cache:
+            result, cached_at = self._cache[key]
+            age_seconds = (datetime.now() - cached_at).total_seconds()
+
+            if age_seconds < self.ttl_seconds:
+                logger.debug({
+                    "event": "tool_cache_hit",
+                    "tool": tool_name,
+                    "key": key
+                })
+                return result
+
+        # Execute tool
+        result = await tool_func(**kwargs)
+
+        # Cache result
+        self._cache[key] = (result, datetime.now())
+        logger.debug({
+            "event": "tool_cache_miss",
+            "tool": tool_name,
+            "key": key
+        })
+
+        return result
+
+# Usage
+executor = CachedToolExecutor(ttl_seconds=300)
+await executor.execute(tool_name="list_tasks", tool_func=list_tasks, status="pending")
+```
+
+### Tool Composition and Chaining
+
+Support sequential tool chaining for workflows:
+
+```python
+from typing import Callable, List
+
+class ToolChainer:
+    """Chain multiple tools in sequence."""
+
+    def __init__(self):
+        self.tools: dict[str, Callable] = {}
+
+    async def chain(
+        self,
+        steps: List[dict]
+    ) -> List[dict]:
+        """
+        Execute tools in sequence, passing outputs to next steps.
+
+        Steps format:
+        [
+            {"tool": "get_task", "args": {"task_id": "$task_id"}},
+            {"tool": "update_task", "args": {"task_id": "$task_id", "status": "completed"}}
+        ]
+        """
+        context: dict = {}
+        results: List[dict] = []
+
+        for step in steps:
+            tool_name = step["tool"]
+            step_args = self._resolve_args(step["args"], context)
+
+            result = await self.tools[tool_name](**step_args)
+            results.append(result)
+
+            # Update context for next step
+            context.update(result.get("output", {}))
+
+        return results
+
+    def _resolve_args(self, args: dict, context: dict) -> dict:
+        """Resolve argument references from previous step outputs."""
+        resolved = {}
+        for key, value in args.items():
+            if isinstance(value, str) and value.startswith("$"):
+                ref_key = value[1:]  # Remove $ prefix
+                resolved[key] = context.get(ref_key, value)
+            else:
+                resolved[key] = value
+        return resolved
+
+# Example workflow chain (assumes these tools are registered in workflow.tools)
+workflow = ToolChainer()
+results = await workflow.chain([
+    {
+        "tool": "create_task",
+        "args": {"title": "Review PR #123"}  # Expected to put task_id into context
+    },
+    {
+        "tool": "search_github_pr",
+        "args": {"query": "PR #123"}  # Expected to put pr_url into context
+    },
+    {
+        "tool": "add_attachment",
+        "args": {"task_id": "$task_id", "url": "$pr_url"}
+    }
+])
+```
diff --git a/docs/02-implementation/pr-specs/PR-004-attachments-link-detection.md b/docs/02-implementation/pr-specs/PR-004-attachments-link-detection.md
index 6db620b..eaa444a 100644
--- a/docs/02-implementation/pr-specs/PR-004-attachments-link-detection.md
+++ b/docs/02-implementation/pr-specs/PR-004-attachments-link-detection.md
@@ -6,24 +6,27 @@
 
 ## Goal
 
-Enable “context-first tasks” by supporting attachments and auto-detecting URLs in task content.
+Support task attachments and automatically detect URLs in task content.
## User Value -- Users can capture a task and paste a URL (GitHub PR, Gmail, doc) and know it will be saved + surfaced later. -- This unlocks integrations incrementally (GitHub/Gmail can be layered on without changing core flows). +- Users can attach links to tasks and keep context nearby. +- Integrations can be layered later without changing the core workflow. + +## References + +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/INTEGRATION_GUIDE.md` +- `docs/01-design/API_REFERENCE.md` ## Scope ### In -- Attachment model + schema (type, reference URL, title, cached content, timestamps). -- Attachment CRUD endpoints (at least: create/list/delete; update optional). -- Link detection service that scans task title/description for URLs and creates attachments automatically. -- Minimal provider registry: - - `match_url(url) -> bool` - - `normalize(url) -> reference` - - default provider for “generic url” +- Attachment model and DB table. +- Attachment CRUD endpoints (create/list/delete; update optional). +- Link detection on task create/update with URL normalization and dedup. +- Provider registry with `match_url` and `normalize` hooks. ### Out @@ -32,231 +35,86 @@ Enable “context-first tasks” by supporting attachments and auto-detecting UR ## Mini-Specs -- Data model: - - `Attachment` schema + DB table aligned with `DESIGN_DATA.md`. -- API endpoints: - - create/list/delete attachments under `/api/v1/...` (align `API_REFERENCE.md`). -- URL normalization + dedup: - - normalize provider references (`github:...`, `gmail:...`, generic URLs). - - prevent duplicates per `(task_id, normalized_reference)`. -- Link detection: - - parse task title/description and auto-create attachments for supported URLs. - - provider registry for `match_url` / `normalize` hooks. -- Tests: - - manual attach, auto-detect, dedup, 404/409 behavior. +- Attachment schema aligned with `DESIGN_DATA.md`. +- Endpoints for creating/listing/deleting attachments under `/api/v1`. 
+- URL normalization to stable references; dedup per task and reference. +- Auto-detection runs on task create/update for title/description. +- Attachment service exposes safe methods that can be wrapped as tools (PR-003B). -## References +## User Stories -- `docs/01-design/DESIGN_DATA.md` (attachment schema) -- `docs/01-design/INTEGRATION_GUIDE.md` (provider protocol pattern) -- `docs/01-design/API_REFERENCE.md` (attachments endpoints, if/when defined) +- As a user, I can attach a URL to a task and see it listed. +- As a user, URLs pasted into task content create attachments automatically. +- As a user, duplicate URLs do not create duplicate attachments. -## User Stories +## UX Notes (if applicable) -- As a user, I can attach a URL to a task and see it listed on the task. -- As a user, when I paste a URL into a task description, an attachment is created automatically. -- As a user, I don’t get duplicate attachments for the same link pasted twice. +- N/A. ## Technical Design -### Attachment model - -- Attachments belong to a task (`task_id` foreign key). 
-- Store: - - `type` (e.g., `url|github|gmail|notion|...`) - - `reference` (canonical/normalized reference) - - `title` (optional) - - `content` (optional cached content; may be empty until integrations run) - - timestamps - -### Endpoints (minimum) - -- Create attachment: - - `POST /api/v1/tasks/{id}/attachments` (or `POST /api/v1/attachments` with `task_id`) -- List attachments for a task: - - `GET /api/v1/tasks/{id}/attachments` -- Delete attachment: - - `DELETE /api/v1/attachments/{attachment_id}` - -### Auto-detection behavior - -- Extract URL candidates from `title` and `description` on: - - task create - - task update (when title/description changes) -- **Parsing:** Start with a simple regex (e.g., `r"https?://\\S+"`), then: - - trim common trailing punctuation (e.g., `)`, `]`, `.`, `,`) - - normalize with `urllib.parse` (lowercase scheme/host, drop obvious tracking params if desired) -- **LinkDetectionService:** - - Providers: - - `GitHubProvider`: matches `github.com/(.*)/(.*)/(pull|issues)/(\d+)` - - `GmailProvider`: matches `mail.google.com/mail/u/\d+/#(inbox|label|all)/(\w+)` - - `GenericProvider`: matches any other valid URL -- **Deduplication:** Check if `reference` already exists for the given `task_id` before inserting. - -### Attachment lifecycle - -- Attachments are created immediately with reference URL. -- Cached content is optional at this stage (can be empty until integrations fetch it). +### Architecture + +- `LinkDetectionService` parses URLs and calls the provider registry. +- Provider registry includes GitHub, Gmail, and generic URL providers. +- Attachment service handles create/list/delete with dedup checks. +- Provide a thin tool wrapper layer in PR-003B (no tool execution here). + +### Data Model / Migrations + +- `Attachment` fields: id, task_id (FK), type, reference, title, content, + created_at, updated_at. +- Unique constraint on `(task_id, reference)` to prevent duplicates. 
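A minimal sketch of the detection and normalization steps, assuming a simple regex plus trailing-punctuation trimming; `extract_urls` and `normalize_url` are illustrative names, and the exact normalization rules (tracking params, fragments) are still open:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Candidate pattern and trailing punctuation to trim from matched URLs.
URL_RE = re.compile(r"https?://\S+")
TRAILING_PUNCT = ".,)]}>\"'"

def extract_urls(text: str) -> list[str]:
    """Find URL candidates in free text and trim common trailing punctuation."""
    return [m.group(0).rstrip(TRAILING_PUNCT) for m in URL_RE.finditer(text)]

def normalize_url(url: str) -> str:
    """Lowercase scheme/host and drop the fragment to get a stable reference."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))
```

The normalized string would serve as the `reference` in the `(task_id, reference)` uniqueness check.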
+ +### API Contract + +- `POST /api/v1/tasks/{id}/attachments` (or `/api/v1/attachments` with `task_id`). +- `GET /api/v1/tasks/{id}/attachments`. +- `DELETE /api/v1/attachments/{attachment_id}`. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- Only store references; no external fetching in this PR. + +### Error Handling + +- 404 for missing task/attachment. +- 409 conflict for duplicate normalized references. ## Acceptance Criteria -- [ ] Users can manually attach a URL to a task. -- [ ] URLs in task description auto-create attachments. -- [ ] Attachments are listed with task and have stable IDs. -- [ ] Link detection correctly identifies GitHub, Gmail, and generic URLs. -- [ ] Deduplication prevents duplicate attachments for same normalized URL. +### AC1: Manual Attachment CRUD + +**Success Criteria:** +- [ ] Users can add, list, and delete attachments for a task. + +### AC2: Auto-Detect URLs + +**Success Criteria:** +- [ ] URLs in title/description create attachments automatically. + +### AC3: Normalization and Deduplication + +**Success Criteria:** +- [ ] Providers normalize URLs to stable references. +- [ ] Duplicate references for a task are rejected or no-op per design. 
## Test Plan ### Automated -```python -# tests/test_api/test_attachments.py -import pytest -from httpx import AsyncClient -from backend.services.attachment_service import AttachmentType - -class TestAddAttachment: - """Tests for POST /api/v1/tasks/{id}/attachments""" - - @pytest.mark.asyncio - async def test_add_github_attachment(self, client: AsyncClient, sample_task): - """Add GitHub PR attachment.""" - response = await client.post( - f"/api/v1/tasks/{sample_task.id}/attachments", - json={ - "type": "github", - "reference": "https://github.com/owner/repo/pull/123" - } - ) - assert response.status_code == 201 - data = response.json() - assert data["type"] == "github" - - @pytest.mark.asyncio - async def test_add_gmail_attachment(self, client: AsyncClient, sample_task): - """Add Gmail attachment.""" - response = await client.post( - f"/api/v1/tasks/{sample_task.id}/attachments", - json={ - "type": "gmail", - "reference": "gmail:18e4f7a2b3c4d5e" - } - ) - assert response.status_code == 201 - data = response.json() - assert data["type"] == "gmail" - - @pytest.mark.asyncio - async def test_add_generic_url_attachment(self, client: AsyncClient, sample_task): - """Add generic URL attachment.""" - response = await client.post( - f"/api/v1/tasks/{sample_task.id}/attachments", - json={ - "type": "url", - "reference": "https://example.com/docs" - } - ) - assert response.status_code == 201 - - @pytest.mark.asyncio - async def test_add_attachment_task_not_found(self, client: AsyncClient): - """Return 404 when task doesn't exist.""" - response = await client.post( - "/api/v1/tasks/nonexistent/attachments", - json={"type": "url", "reference": "https://example.com"} - ) - assert response.status_code == 404 - - -class TestLinkDetection: - """Tests for automatic link detection""" - - @pytest.mark.asyncio - async def test_detect_github_pr_url(self, db_session): - """Detect GitHub PR URL in task description.""" - from backend.services.link_detection import LinkDetectionService - - 
task = await create_task_with_desc( - "Check https://github.com/owner/repo/pull/123 please" - ) - - detected = await LinkDetectionService.detect_and_create(db_session, task) - assert len(detected) == 1 - assert detected[0]["type"] == "github" - assert detected[0]["reference"] == "github:owner/repo/pull/123" - - @pytest.mark.asyncio - async def test_detect_github_issue_url(self, db_session): - """Detect GitHub Issue URL.""" - from backend.services.link_detection import LinkDetectionService - - task = await create_task_with_desc( - "Issue at owner/repo#456" - ) - - detected = await LinkDetectionService.detect_and_create(db_session, task) - assert len(detected) == 1 - assert detected[0]["type"] == "github" - - @pytest.mark.asyncio - async def test_detect_gmail_url(self, db_session): - """Detect Gmail URL.""" - from backend.services.link_detection import LinkDetectionService - - task = await create_task_with_desc( - "Email https://mail.google.com/mail/u/0/#inbox/18e4f7a2b3c4d5e" - ) - - detected = await LinkDetectionService.detect_and_create(db_session, task) - assert len(detected) == 1 - assert detected[0]["type"] == "gmail" - - -class TestDeduplication: - """Tests for attachment deduplication""" - - @pytest.mark.asyncio - async def test_same_url_no_duplicate(self, client: AsyncClient, sample_task, db_session): - """Same URL doesn't create duplicate attachment.""" - from backend.services.link_detection import LinkDetectionService - - # First attachment - await LinkDetectionService.create_attachment( - db_session, sample_task.id, "github", "github:owner/repo/pull/123" - ) - - # Try same reference again - response = await client.post( - f"/api/v1/tasks/{sample_task.id}/attachments", - json={"type": "github", "reference": "github:owner/repo/pull/123"} - ) - assert response.status_code == 409 # Conflict - - @pytest.mark.asyncio - async def test_normalized_url_no_duplicate(self, client: AsyncClient, sample_task, db_session): - """Normalized URLs don't create duplicates.""" 
- from backend.services.link_detection import LinkDetectionService - - # Create with different param in URL - await LinkDetectionService.create_attachment( - db_session, sample_task.id, "github", "github:owner/repo/pull/123" - ) - - # Try with query param (normalized to same) - response = await client.post( - f"/api/v1/tasks/{sample_task.id}/attachments", - json={"type": "github", "reference": "github:owner/repo/pull/123?foo=bar"} - ) - assert response.status_code == 409 # Conflict -``` +- API tests for attachment CRUD. +- Unit tests for URL parsing, normalization, and dedup. ### Manual -1. Create task with `https://github.com/owner/repo/pull/123` in description. -2. Verify task details show 1 attachment entry. -3. Edit task and add a second URL; verify attachments list updates. +- Create tasks with URLs in description; verify attachments created. +- Paste duplicate links; verify no duplicate attachments. ## Notes / Risks / Open Questions -- Decide canonical URL normalization rules early (strip tracking params? keep fragments?). +- Finalize normalization rules (tracking params, fragments). diff --git a/docs/02-implementation/pr-specs/PR-005-rag-semantic-search.md b/docs/02-implementation/pr-specs/PR-005-rag-semantic-search.md index 8297ff8..0d2ac55 100644 --- a/docs/02-implementation/pr-specs/PR-005-rag-semantic-search.md +++ b/docs/02-implementation/pr-specs/PR-005-rag-semantic-search.md @@ -6,108 +6,114 @@ ## Goal -Add semantic recall across tasks and cached attachment content, and improve chat answers by injecting the most relevant context. +Add semantic search across tasks and cached attachment content, and use retrieved +context to improve chat responses. ## User Value -- “Where did I put that?” becomes fast: semantic search finds tasks/attachments by meaning. -- Chat becomes more accurate because it can ground responses in local data. +- Users can find tasks by meaning, not just keywords. +- Chat responses can reference local context. 
+ +## References + +- `docs/01-design/DESIGN_CHAT.md` +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/API_REFERENCE.md` ## Scope ### In -- Local vector store (ChromaDB). -- Embedding pipeline for: - - tasks - - attachments (cached content) -- Semantic search endpoint (query → top-k results with scores). -- Chat integration: - - retrieve top context snippets - - inject into prompt (bounded by token limits) +- Local vector store (ChromaDB) and embeddings pipeline. +- Indexing for tasks and cached attachment content. +- Semantic search endpoint returning ranked results. +- Chat context injection with strict token limits. ### Out -- Cross-user/multi-tenant search. -- Remote vector DB. +- Multi-tenant or remote vector databases. +- Hybrid keyword search, re-ranking, and query expansion (future). ## Mini-Specs -- Embeddings: - - choose default embedding model/provider (configurable). -- Indexing: - - tasks + cached attachments are embedded and stored in ChromaDB. - - re-index on create/update and after attachment fetch. -- Query: - - semantic search API (`GET /api/v1/search/semantic`) returns ranked results. -- Chat augmentation: - - retrieve top snippets and inject into chat prompt (bounded by token limits). -- Tests: - - small fixture corpus; validate top-k results for representative queries. +- Configurable embedding model and vector store location. +- Indexing triggers on task create/update and attachment content updates. +- `/api/v1/search/semantic` endpoint returning top-k results with scores. +- Prompt assembly that injects retrieved context within a token budget. -## References +## User Stories -- `docs/01-design/DESIGN_CHAT.md` (RAG strategy + prompt assembly) -- `docs/01-design/DESIGN_DATA.md` (RAG document structure) -- `docs/01-design/API_REFERENCE.md` (semantic search endpoint examples) +- As a user, I can search by meaning and find relevant tasks. +- As a user, chat answers reference the right tasks or attachments. 
-## User Stories +## UX Notes (if applicable) -- As a user, I can search by meaning (“login issues”) and find the right task. -- As a user, chat answers can reference the correct task/attachment context. +- N/A. ## Technical Design -### Indexing triggers +### Architecture + +- Embedding service creates chunks and writes to ChromaDB. +- Retrieval service returns top-k snippets with scores and metadata. +- Chat pipeline requests top snippets and injects them into the prompt. + +### Data Model / Migrations + +- Vector store collection `taskgenie_items` with metadata: + id, source_type (task|attachment), parent_task_id, text, embedding_version. -- Index tasks: - - on task create/update -- Index attachments: - - when attachment content is fetched/updated (PR-006/PR-007) +### API Contract -### Embeddings +- `GET /api/v1/search/semantic?query=...&limit=...` returns results with id, type, + score, snippet, and task linkage. -- **Model:** `sentence-transformers/all-MiniLM-L6-v2` (via `sentence-transformers`; make configurable later) -- **Vector Store:** ChromaDB (running in-process) -- **Collection Naming:** `taskgenie_items` -- **Document Metadata:** - - `id`: task or attachment ID - - `source_type`: `task | attachment` - - `parent_task_id`: ID of the task - - `text`: the chunked content +### Background Jobs -### Retrieval + response format +- Optional background indexing for large attachment content to avoid blocking + requests. -- Semantic search endpoint returns: - - top-k results with scores - - short excerpts for display -- Chat integration: - - retrieve top snippets - - inject into prompt within a strict token budget - - (optional) include “sources” in response for traceability +### Security / Privacy + +- Do not log raw content or prompts by default. +- Embeddings are stored locally in the app data directory. + +### Error Handling + +- Empty index returns empty results (200 OK). +- Embedding failures return actionable errors and do not crash chat. 
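The token-budget constraint on context injection can be sketched as follows; the whitespace-based token estimate is a stand-in assumption for the real model tokenizer, and the function names are illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real implementation would use the model tokenizer."""
    return len(text.split())

def build_context(snippets: list[str], budget: int) -> list[str]:
    """Take top-ranked snippets, in order, until the token budget would be exceeded."""
    chosen: list[str] = []
    used = 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break  # Stop at the first overflow to preserve ranking order
        chosen.append(snippet)
        used += cost
    return chosen
```

Stopping at the first overflow (rather than skipping ahead to smaller snippets) keeps the injected context aligned with retrieval ranking.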
## Acceptance Criteria +### AC1: Indexing Pipeline + +**Success Criteria:** - [ ] Tasks and cached attachments are embedded and indexed automatically. -- [ ] Semantic search returns relevant results for representative queries. -- [ ] Chat responses cite retrieved task/attachment context where applicable. + +### AC2: Semantic Search API + +**Success Criteria:** +- [ ] Search returns relevant results for representative queries. + +### AC3: Chat Context Injection + +**Success Criteria:** +- [ ] Chat responses include retrieved context when available and respect token + limits. ## Test Plan ### Automated -- Unit: chunking logic, prompt assembly, context truncation. -- Integration: - 1. Create two tasks with different topics; query semantic search; verify ranking. - 2. Add attachment content; query that content; verify attachment appears in results. - 3. Chat query uses retrieved context (verify via stubbed “context used” markers). +- Unit tests for chunking, prompt assembly, token budgeting. +- Integration tests for search ranking with a small fixture corpus. ### Manual -1. Create tasks “Fix auth bug” and “Plan vacation”. -2. Search “login issues” → should surface the auth task. -3. Ask chat “What’s the status of the login work?” → response should reference the correct task. +- Create tasks and attachments and query semantic search. +- Ask chat a question that should be grounded in a task. ## Notes / Risks / Open Questions -- Decide whether to store embeddings for task description only vs (title + description). +- Decide whether embeddings include title only or title + description. +- Consider hybrid search and re-ranking in a follow-on PR for higher precision. 
diff --git a/docs/02-implementation/pr-specs/PR-006-gmail-integration.md b/docs/02-implementation/pr-specs/PR-006-gmail-integration.md index 028c658..851c809 100644 --- a/docs/02-implementation/pr-specs/PR-006-gmail-integration.md +++ b/docs/02-implementation/pr-specs/PR-006-gmail-integration.md @@ -6,306 +6,112 @@ ## Goal -Given a Gmail URL attached to a task, authenticate via OAuth and fetch the email content, then cache it locally as attachment content. +Given a Gmail URL attachment, authenticate via OAuth, fetch email content, and cache +it locally on the attachment. ## User Value -- Email-linked tasks keep the important context without re-opening Gmail. -- Cached email content can later be searched via RAG (PR-005). +- Email-linked tasks keep essential context without reopening Gmail. +- Cached email content can later be searched via RAG. + +## References + +- `docs/01-design/INTEGRATION_GUIDE.md` +- `docs/01-design/DESIGN_DATA.md` +- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` ## Scope ### In -- Parse Gmail URLs to a stable reference (message id / thread id as appropriate). -- OAuth flow (local machine): - - open browser for consent - - store credentials securely on disk -- Fetch: - - subject - - from/to - - date - - body (plain text preferred; HTML optional) -- Cache fetched content into the attachment record. +- Gmail URL normalization to a stable reference. +- OAuth flow for local machines and secure credential storage. +- Fetch subject/from/to/date/body and store in `attachment.content`. +- Explicit refresh action and/or background fetch for pending attachments. ### Out -- Two-way sync (labeling, replying, etc.). -- Bulk email ingestion. +- Two-way sync or bulk ingestion. ## Mini-Specs -- URL recognition + normalization: - - detect Gmail message URLs and normalize to a stable reference key. -- OAuth + credential storage: - - `tgenie config --gmail-auth` (or equivalent) to set up tokens. 
-  - store secrets under `~/.taskgenie/credentials/` with `0600` permissions.
-- Fetch + cache:
-  - fetch subject/from/date/body and store into `attachment.content`.
-- Background fetch:
-  - allow explicit refresh and/or background “pending fetch” processing.
-- Tests:
-  - URL parsing, auth error mapping, and fetch-to-cache pipeline (mocked HTTP).
+- Provider that parses Gmail URLs and normalizes to `gmail:<message_id>`.
+- OAuth setup command and token storage under `~/.taskgenie/credentials/` with `0600` permissions.
+- Fetch pipeline to populate attachment title/content and metadata.
+- Optional background job to refresh pending attachments.
+- Expose a tool wrapper for attachment refresh (registered in PR-003B).
 
-## References
+## User Stories
 
-- `docs/01-design/INTEGRATION_GUIDE.md` (provider pattern + security)
-- `docs/01-design/DESIGN_DATA.md` (attachment cache fields)
-- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` (fetch jobs without a queue)
+- As a user, I can paste a Gmail URL and see email context on the task.
+- As a user, my OAuth tokens are stored securely and not logged.
 
-## User Stories
+## UX Notes (if applicable)
 
-- As a user, I can paste a Gmail URL into a task and see key email context in the attachment.
-- As a user, the system handles OAuth securely and doesn’t leak email content into logs.
+- Provide clear setup instructions when Gmail auth is missing.
 
 ## Technical Design
 
-### URL-first normalization
+### Architecture
+
+- Gmail provider service with methods: `normalize_url`, `fetch_message`,
+  `cache_content`.
+- Auth uses Google OAuth InstalledAppFlow with local browser callback.
+- A refresh entrypoint (CLI or background job) triggers fetch per attachment.
+- Tool wrapper integration is defined in PR-003B and calls the refresh entrypoint.
 
-- Accept Gmail URLs as the primary input.
-- Normalize to a stable identifier (message id and/or thread id) stored in `attachment.reference`.
+### Data Model / Migrations -### OAuth + credential storage +- Store Gmail message metadata in `attachment.metadata` and content in + `attachment.content`. -- OAuth flow opens the browser and stores credentials on disk with strict permissions. -- Do not store OAuth secrets inside the main SQLite DB. +### API Contract -### Fetch + cache +- No new public API required beyond attachment refresh endpoint/command. +- If adding API: `POST /api/v1/attachments/{id}/refresh` returns updated attachment. -- Fetch minimal useful fields (subject, from/to, date, text body). -- Store formatted content into `attachment.content` for later viewing/searching. +### Background Jobs -### Background fetch +- Optional background loop to process attachments with missing content. -- Fetch can be initiated: - - explicitly (e.g., “refresh attachment” action), and/or - - as a background job that processes “pending fetch” attachments. +### Security / Privacy + +- Store OAuth credentials on disk with 0600 permissions. +- Do not log email bodies or tokens. + +### Error Handling + +- Map Gmail API errors (401/403/404/429) to user-facing messages. +- Fail fetch gracefully without deleting the attachment. ## Acceptance Criteria -- [ ] OAuth flow works and credentials persist. -- [ ] Gmail URL is normalized and fetch succeeds. -- [ ] Email content is cached and viewable from task attachment. -- [ ] Credentials stored securely with file permissions (0600). +### AC1: OAuth and Credential Persistence -## Technical Design +**Success Criteria:** +- [ ] OAuth flow completes and tokens persist with secure permissions. + +### AC2: URL Normalization and Fetch + +**Success Criteria:** +- [ ] Gmail URLs normalize to stable references and fetch succeeds. 
+ +### AC3: Cache Attachment Content -### OAuth Flow Implementation - -```python -# backend/services/gmail_service.py -from google_auth_oauthlib.flow import InstalledAppFlow -from google.oauth2.credentials import Credentials -from pathlib import Path -import json - -class GmailService: - def __init__(self): - self.credentials: Credentials | None = None - self.SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"] - self.CREDENTIALS_DIR = Path.home() / ".taskgenie" / "credentials" - self.CLIENT_SECRET_PATH = self.CREDENTIALS_DIR / "gmail_client_secret.json" - self.TOKEN_PATH = self.CREDENTIALS_DIR / "gmail_token.json" - - async def authenticate(self) -> bool: - """Run OAuth flow for Gmail.""" - # Check for existing token - if self.TOKEN_PATH.exists(): - self.credentials = Credentials.from_authorized_user_file( - str(self.TOKEN_PATH), self.SCOPES - ) - - # Refresh if expired - if self.credentials.expired and self.credentials.refresh_token: - self.credentials.refresh(Request()) - self._save_token() - return True - - # Run OAuth flow - if not self.CLIENT_SECRET_PATH.exists(): - raise FileNotFoundError( - f"Gmail client secret not found at {self.CLIENT_SECRET_PATH}. " - "Download from Google Cloud Console." - ) - - flow = InstalledAppFlow.from_client_secrets_file( - str(self.CLIENT_SECRET_PATH), self.SCOPES - ) - - self.credentials = flow.run_local_server( - port=0, - prompt="consent", - success_message="Authentication successful! You can close this tab." 
- ) - - self._save_token() - return True - - def _save_token(self): - """Save token to file with secure permissions.""" - self.TOKEN_PATH.parent.mkdir(parents=True, exist_ok=True) - - # Save token - with open(self.TOKEN_PATH, "w") as f: - f.write(self.credentials.to_json()) - - # Set secure file permissions (owner read/write only) - self.TOKEN_PATH.chmod(0o600) - - async def get_message(self, message_id: str) -> dict: - """Fetch Gmail message by ID.""" - if not self.credentials or not self.credentials.valid: - await self.authenticate() - - service = build("gmail", "v1", credentials=self.credentials) - message = service.users().messages().get( - userId="me", - id=message_id, - format="full" - ).execute() - - return self._parse_message(message) - - def _parse_message(self, raw_message: dict) -> dict: - """Parse Gmail message to structured data.""" - headers = { - h["name"].lower(): h["value"] - for h in raw_message.get("payload", {}).get("headers", []) - } - - # Extract body - payload = raw_message.get("payload", {}) - body = self._extract_body(payload) - - return { - "id": raw_message["id"], - "thread_id": raw_message["threadId"], - "subject": headers.get("subject", ""), - "from": headers.get("from", ""), - "to": headers.get("to", ""), - "date": headers.get("date", ""), - "body": body, - "snippet": raw_message.get("snippet", ""), - } - - def _extract_body(self, payload: dict) -> str: - """Extract body from message payload.""" - import base64 - - # Try direct body first - if "body" in payload and payload["body"].get("data"): - return base64.urlsafe_b64decode(payload["body"]["data"]).decode("utf-8") - - # Try multipart - if "parts" in payload: - for part in payload["parts"]: - if part.get("mimeType") == "text/plain" and part.get("body", {}).get("data"): - return base64.urlsafe_b64decode(part["body"]["data"]).decode("utf-8") - - return "" -``` +**Success Criteria:** +- [ ] Attachment content includes subject/from/to/date/body excerpt. 
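The credential-persistence criteria above can be sketched as follows. The helper and file name are illustrative; note `stat.S_IMODE` when checking permissions, since a raw `st_mode` value also carries file-type bits. POSIX permissions are assumed (on Windows, `chmod` is mostly a no-op).

```python
import json
import stat
import tempfile
from pathlib import Path

def save_token(token_json: str, token_path: Path) -> None:
    """Persist an OAuth token with owner-only permissions (sketch)."""
    token_path.parent.mkdir(parents=True, exist_ok=True)
    token_path.write_text(token_json)
    # Restrict to owner read/write only, after the file exists.
    token_path.chmod(0o600)

# Usage against a throwaway directory instead of ~/.taskgenie/credentials/:
tmp = Path(tempfile.mkdtemp())
save_token(json.dumps({"token": "example"}), tmp / "gmail_token.json")
# S_IMODE strips the file-type bits before comparing permission bits.
assert stat.S_IMODE((tmp / "gmail_token.json").stat().st_mode) == 0o600
```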
## Test Plan ### Automated -```python -# tests/test_services/test_gmail_service.py -import pytest -from unittest.mock import MagicMock, patch -from backend.services.gmail_service import GmailService - -class TestGmailOAuth: - """Tests for Gmail OAuth flow""" - - @pytest.mark.asyncio - async def test_authenticate_creates_token(self, tmp_path): - """OAuth flow creates token file.""" - gmail = GmailService() - - # Mock OAuth flow - mock_flow = MagicMock() - mock_flow.run_local_server.return_value = MagicMock( - to_json=lambda: '{"refresh_token": "test-refresh", "token": "test-token"}' - ) - - with patch("backend.services.gmail_service.InstalledAppFlow", return_value=mock_flow): - await gmail.authenticate() - - # Verify token was saved - assert gmail.TOKEN_PATH.exists() - - # Check file permissions - import os - assert os.stat(gmail.TOKEN_PATH).st_mode == 0o600 - - @pytest.mark.asyncio - async def test_authenticate_loads_existing_token(self, tmp_path): - """Existing token is loaded and refreshed if needed.""" - gmail = GmailService() - - # Create valid token file - gmail.TOKEN_PATH.parent.mkdir(parents=True, exist_ok=True) - with open(gmail.TOKEN_PATH, "w") as f: - f.write('{"refresh_token": "valid", "token": "valid"}') - - # Load (should not run OAuth flow) - mock_flow = MagicMock() - with patch("backend.services.gmail_service.InstalledAppFlow", return_value=mock_flow): - await gmail.authenticate() - - # Verify existing token was used - mock_flow.run_local_server.assert_not_called() - - -class TestGmailMessageParsing: - """Tests for Gmail message parsing""" - - def test_parse_simple_message(self): - """Parse simple text message.""" - gmail = GmailService() - raw = { - "id": "msg-123", - "threadId": "thread-456", - "snippet": "Test message", - "payload": { - "headers": [ - {"name": "Subject", "value": "Test Subject"}, - {"name": "From", "value": "sender@example.com"} - ], - "body": {"data": "SGVsbG8gV29ybGQ="} # "Hello World" - } - } - - result = 
gmail._parse_message(raw) - assert result["subject"] == "Test Subject" - assert result["body"] == "Hello World" - - def test_parse_multipart_message(self): - """Parse multipart message.""" - gmail = GmailService() - raw = { - "id": "msg-123", - "payload": { - "parts": [ - { - "mimeType": "text/plain", - "body": {"data": "TWVsbG8gV29ybGQ="} - } - ] - } - } - - result = gmail._parse_message(raw) - assert result["body"] == "Hello World" -``` +- Unit tests for URL normalization and message parsing. +- Integration tests for token storage and refresh behavior (mocked HTTP). ### Manual -1. Run `tgenie config --gmail-auth` (or equivalent) and complete OAuth in browser. -2. Create task containing a Gmail URL. -3. Trigger fetch and verify attachment shows subject/from/date/body excerpt. +- Run OAuth setup, attach a Gmail URL, refresh, and verify cached content. ## Notes / Risks / Open Questions -- Gmail URL formats vary; normalization rules should be tested against real examples. +- Gmail URL formats vary; expand normalization fixtures with real samples. diff --git a/docs/02-implementation/pr-specs/PR-007-github-integration.md b/docs/02-implementation/pr-specs/PR-007-github-integration.md index 5820471..6200fb2 100644 --- a/docs/02-implementation/pr-specs/PR-007-github-integration.md +++ b/docs/02-implementation/pr-specs/PR-007-github-integration.md @@ -6,335 +6,110 @@ ## Goal -Given a GitHub Issue/PR URL attached to a task, fetch the useful content and cache it locally as attachment content. +Given a GitHub Issue/PR URL attachment, fetch useful content and cache it locally. ## User Value -- Tasks that reference GitHub become “self-contained”: the user can see key PR/Issue details without context switching. -- Cached content becomes searchable later (RAG is layered on in PR-005). +- GitHub-linked tasks become self-contained with key PR/issue details. +- Cached content becomes searchable via RAG later. 
+
+## References
+
+- `docs/01-design/INTEGRATION_GUIDE.md`
+- `docs/01-design/DESIGN_DATA.md`
+- `docs/01-design/DESIGN_BACKGROUND_JOBS.md`

 ## Scope

 ### In

-- Recognize GitHub URLs (issue, pull request).
-- Fetch:
-  - title
-  - description/body
-  - key metadata (state, author, repo, number, updated_at)
-- Cache fetched content into the attachment record.
-- Auth:
-  - support `GITHUB_TOKEN` (PAT) for higher rate limits
-  - handle unauthenticated mode with stricter limits
+- URL recognition and normalization for issues and pull requests.
+- Fetch title/body and key metadata.
+- Cache content on the attachment with a refresh strategy.
+- Optional `GITHUB_TOKEN` auth with rate-limit handling.

 ### Out

-- Full thread/comment ingestion (future / optional).
-- Webhooks (future).
+- Full comment ingestion or webhooks.

 ## Mini-Specs

-- URL recognition + normalization:
-  - issues and pull requests (public + private).
-- Fetch + cache:
-  - title/body + key metadata cached into the attachment record.
-  - TTL/refresh strategy to avoid excessive refetching.
-- Auth + rate limits:
-  - support `GITHUB_TOKEN` (PAT) and unauthenticated mode.
-  - map 401/403/404/429 to actionable user messages.
-- Tests:
-  - normalize URLs; mocked GitHub API responses for success + error cases.
-
-## References
-
-- `docs/01-design/INTEGRATION_GUIDE.md` (provider pattern + security)
-- `docs/01-design/DESIGN_DATA.md` (attachment cache fields)
-- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` (fetch jobs without a queue)
+- Provider that normalizes GitHub URLs to `github:<owner>/<repo>/<type>/<number>`.
+- Fetch pipeline that stores title/body and metadata on the attachment.
+- Token-based auth and unauthenticated mode with clear rate-limit messaging.
+- Optional TTL to avoid refetching too often.
+- Expose a tool wrapper for attachment refresh (registered in PR-003B).

 ## User Stories

-- As a user, I can paste a GitHub PR/Issue URL and see the important details in my task.
+- As a user, I can paste a GitHub URL and see key details in the task. - As a user, I get a clear error if the repo is private or my token is invalid. +## UX Notes (if applicable) + +- Show actionable errors for auth and rate limits. + ## Technical Design -### Provider implementation +### Architecture + +- GitHub provider service with `normalize_url`, `fetch_issue`, `fetch_pr`. +- Refresh entrypoint (CLI or background job) triggers fetch per attachment. +- Tool wrapper integration is defined in PR-003B and calls the refresh entrypoint. -- Implement a GitHub provider with: - - `match_url()`, `normalize()` - - `fetch_content()` returning a concise text payload - - `metadata()` returning structured fields (repo, number, state, updated_at) +### Data Model / Migrations -### Fetch + cache +- Store GitHub metadata (state, author, repo, number, updated_at) in + `attachment.metadata`. +- Store cached title/body in `attachment.title` and `attachment.content`. -- Cache: - - `attachment.title` from PR/Issue title - - `attachment.content` from body + a short metadata header -- Avoid refetching too frequently (cache TTL). +### API Contract -### Auth + rate limits +- No new public API required beyond attachment refresh endpoint/command. +- If adding API: `POST /api/v1/attachments/{id}/refresh` returns updated attachment. + +### Background Jobs + +- Optional background loop to refresh stale attachments based on TTL. + +### Security / Privacy + +- Read `GITHUB_TOKEN` from env/config; never log token values. + +### Error Handling -- Support `GITHUB_TOKEN` (PAT) and graceful unauthenticated mode. - Map 401/403/404/429 to actionable user messages. +- Unauthenticated mode works for public repos with lower rate limits. ## Acceptance Criteria -- [ ] GitHub PR/Issue URL is recognized and normalized. -- [ ] Content is fetched and cached in attachment record. -- [ ] Errors are actionable (401/403/404/429 show clear messages). -- [ ] Unauthenticated mode works with public repos. 
-- [ ] PAT authentication uses token for higher rate limits. +### AC1: URL Recognition and Normalization -## Technical Design +**Success Criteria:** +- [ ] GitHub issue/PR URLs normalize to stable references. + +### AC2: Fetch and Cache + +**Success Criteria:** +- [ ] Attachment content includes title/body and key metadata. + +### AC3: Auth and Rate Limits -### Provider Implementation - -```python -# backend/services/github_service.py -import httpx -import re -from dataclasses import dataclass -from backend.config import settings - -@dataclass -class GitHubPR: - """GitHub Pull Request data.""" - number: int - title: str - state: str - author: str - url: str - body: str - additions: int - deletions: int - labels: list[str] - created_at: str - updated_at: str - -@dataclass -class GitHubIssue: - """GitHub Issue data.""" - number: int - title: str - state: str - author: str - url: str - body: str - labels: list[str] - created_at: str - updated_at: str - - -class GitHubService: - """GitHub API service wrapper.""" - - API_BASE = "https://api.github.com" - - def __init__(self, token: str | None = None): - self.token = token or settings.github_token - self._client: httpx.AsyncClient | None = None - - @property - def client(self) -> httpx.AsyncClient: - """Lazy-initialize async HTTP client.""" - if self._client is None: - headers = { - "Accept": "application/vnd.github+json", - "X-GitHub-Api-Version": "2022-11-28", - } - if self.token: - headers["Authorization"] = f"Bearer {self.token}" - - self._client = httpx.AsyncClient( - base_url=self.API_BASE, - headers=headers, - timeout=30.0, - ) - return self._client - - async def close(self): - """Close HTTP client.""" - if self._client: - await self._client.aclose() - self._client = None - - async def get_pull_request( - self, owner: str, repo: str, number: int - ) -> GitHubPR: - """Fetch pull request details.""" - response = await self.client.get(f"/repos/{owner}/{repo}/pulls/{number}") - self._handle_errors(response) - data = 
response.json() - - return GitHubPR( - number=data["number"], - title=data["title"], - state=data["state"], - author=data["user"]["login"], - url=data["html_url"], - body=data.get("body") or "", - additions=data.get("additions", 0), - deletions=data.get("deletions", 0), - labels=[l["name"] for l in data.get("labels", [])], - created_at=data["created_at"], - updated_at=data["updated_at"], - ) - - async def get_issue(self, owner: str, repo: str, number: int) -> GitHubIssue: - """Fetch issue details.""" - response = await self.client.get(f"/repos/{owner}/{repo}/issues/{number}") - self._handle_errors(response) - data = response.json() - - return GitHubIssue( - number=data["number"], - title=data["title"], - state=data["state"], - author=data["user"]["login"], - url=data["html_url"], - body=data.get("body") or "", - labels=[l["name"] for l in data.get("labels", [])], - created_at=data["created_at"], - updated_at=data["updated_at"], - ) - - @staticmethod - def parse_github_url(url: str) -> tuple[str, str, str, int]: - """ - Parse GitHub URL to extract owner, repo, type, and number. - - Returns: (owner, repo, type, number) - type is 'pull' or 'issues' - """ - patterns = [ - r"github\.com/([^/]+)/([^/]+)/(pull|issues)/(\d+)", - r"^([^/]+)/([^/]+)/(pull|issues)/(\d+)$", - r"^([^/]+)/([^/]+)#(\d+)$", - ] - - for pattern in patterns: - match = re.search(pattern, url) - if match: - groups = match.groups() - if len(groups) == 4: - return groups[0], groups[1], groups[2], int(groups[3]) - elif len(groups) == 3: - return groups[0], groups[1], "issues", int(groups[2]) - - raise ValueError(f"Invalid GitHub URL: {url}") - - def _handle_errors(self, response: httpx.Response): - """Map GitHub API errors to actionable messages.""" - if response.status_code == 401: - raise ValueError("Invalid GitHub token. Configure GITHUB_TOKEN.") - elif response.status_code == 403: - raise ValueError("Forbidden. 
Check repository access permissions.") - elif response.status_code == 404: - raise ValueError("Repository or resource not found.") - elif response.status_code == 429: - raise ValueError("Rate limit exceeded. Wait before retrying.") - - response.raise_for_status() -``` +**Success Criteria:** +- [ ] Authenticated and unauthenticated modes both function with clear errors. ## Test Plan ### Automated -```python -# tests/test_services/test_github_service.py -import pytest -from unittest.mock import AsyncMock, patch -import httpx -from backend.services.github_service import GitHubService, GitHubPR - -class TestGitHubService: - """Tests for GitHub API integration""" - - @pytest.mark.asyncio - async def test_get_pull_request_success(self): - """Fetch PR details successfully.""" - service = GitHubService(token="test-token") - mock_client = AsyncMock() - - mock_response = AsyncMock() - mock_response.json.return_value = { - "number": 123, - "title": "Fix authentication bug", - "state": "open", - "user": {"login": "developer"}, - "html_url": "https://github.com/owner/repo/pull/123", - "body": "This PR fixes that issue", - "additions": 50, - "deletions": 10, - "labels": [{"name": "bug"}], - "created_at": "2025-01-10T10:00:00Z", - "updated_at": "2025-01-12T15:30:00Z", - } - mock_response.status_code = 200 - mock_response.raise_for_status = MagicMock() - mock_client.get = AsyncMock(return_value=mock_response) - - service._client = mock_client - pr = await service.get_pull_request("owner", "repo", 123) - - assert pr.number == 123 - assert pr.title == "Fix authentication bug" - assert "bug" in pr.labels - mock_client.get.assert_called_once_with("/repos/owner/repo/pulls/123") - - @pytest.mark.asyncio - async def test_get_issue_success(self): - """Fetch issue details successfully.""" - service = GitHubService(token="test-token") - mock_client = AsyncMock() - - mock_response = AsyncMock() - mock_response.json.return_value = { - "number": 456, - "title": "Add new feature", - "state": 
"open", - "user": {"login": "developer"}, - "html_url": "https://github.com/owner/repo/issues/456", - "labels": [{"name": "enhancement"}], - } - mock_response.status_code = 200 - mock_client.get = AsyncMock(return_value=mock_response) - service._client = mock_client - - issue = await service.get_issue("owner", "repo", 456) - assert issue.number == 456 - - def test_parse_github_url_full(self): - """Parse full GitHub URL.""" - owner, repo, type_, num = GitHubService.parse_github_url( - "https://github.com/owner/repo/pull/123" - ) - assert owner == "owner" - assert repo == "repo" - assert type_ == "pull" - assert num == 123 - - def test_parse_github_url_shorthand(self): - """Parse shorthand owner/repo#123 format.""" - owner, repo, type_, num = GitHubService.parse_github_url("owner/repo#456") - assert owner == "owner" - assert repo == "repo" - assert type_ == "issues" - assert num == 456 - - def test_parse_github_url_invalid(self): - """Parse invalid URL raises error.""" - with pytest.raises(ValueError, match="Invalid GitHub URL"): - GitHubService.parse_github_url("not-a-valid-url") -``` +- Unit tests for URL normalization (issues and PRs). +- Integration tests with mocked GitHub API responses and error mapping. ### Manual -1. Create a task with a GitHub PR URL in its description (auto-detect should attach it via PR-004). -2. Trigger attachment fetch (explicit command or background fetch). -3. Verify attachment shows title + body summary in task detail. +- Attach a public PR URL and verify cached content. +- Set/unset `GITHUB_TOKEN` and verify rate-limit/error messaging. ## Notes / Risks / Open Questions -- Decide whether to include comments later; MVP should keep API calls minimal. +- Decide whether to include comments in a later PR to avoid excessive API calls. 
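A hedged sketch of the issue/PR URL normalization named in the mini-specs. The `github:<owner>/<repo>/<type>/<number>` reference format and the regex are assumptions for illustration, not the final provider API.

```python
import re

# Hypothetical normalizer for GitHub issue/PR URLs (illustrative only).
_GITHUB_URL = re.compile(
    r"github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?P<type>pull|issues)/(?P<num>\d+)"
)

def normalize_github_url(url: str) -> str:
    """Return a stable `github:<owner>/<repo>/<type>/<number>` reference."""
    match = _GITHUB_URL.search(url)
    if not match:
        raise ValueError(f"Not a recognized GitHub issue/PR URL: {url}")
    return "github:{owner}/{repo}/{type}/{num}".format(**match.groupdict())
```

Keying the attachment on this normalized reference (rather than the raw URL) lets the TTL-based refresh loop deduplicate fetches for the same issue or PR.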
diff --git a/docs/02-implementation/pr-specs/PR-008-interactive-tui.md b/docs/02-implementation/pr-specs/PR-008-interactive-tui.md index 95013a4..aac1a30 100644 --- a/docs/02-implementation/pr-specs/PR-008-interactive-tui.md +++ b/docs/02-implementation/pr-specs/PR-008-interactive-tui.md @@ -6,196 +6,113 @@ ## Goal -Ship a first-class interactive TUI as early as possible so we can iterate on UX from day 1. +Ship a first-class interactive TUI so we can iterate on UX early. ## User Value -- The user can actually “try the product” early (add/list/edit/complete tasks). -- UX feedback happens before heavy features (integrations/RAG) lock in assumptions. +- Users can try the product quickly and give feedback. +- Provides a primary workflow before integrations and RAG. + +## References + +- `docs/01-design/DESIGN_TUI.md` +- `docs/01-design/DESIGN_ARCHITECTURE.md` +- `docs/01-design/DESIGN_DATA.md` ## Scope ### In -- `tgenie` launches an interactive TUI by default. -- Core task workflows: - - view task list - - view task details - - create task - - edit task - - mark done -- Clear empty/loading/error states. -- A “chat panel” placeholder is allowed, but real LLM chat is PR-003. +- `tgenie` launches the interactive TUI by default. +- Task list, detail, create/edit, and mark-done flows. +- Empty/loading/error states and confirmation for destructive actions. +- Placeholder chat panel (no LLM yet). ### Out -- Full attachment viewing (PR-004). +- Attachment viewing (PR-004). - LLM-backed chat (PR-003). +- Agent panel and tool execution UI (PR-015). - Full web UI (PR-010). ## Mini-Specs -- Entry point: - - `tgenie` starts the interactive TUI (tasks MVP). -- Screens/widgets: - - task list + task detail pane, modal flows for add/edit/delete. -- Keybindings: - - navigation + common actions (add/edit/done/delete/refresh/help/quit). -- API client: - - calls PR-002 endpoints; clear API-down errors and retry flow. 
-- Chat panel: - - placeholder UI until PR-003, but visible and non-crashing. -- Tests: - - smoke test for app start + basic widget interactions (where feasible). - -## References - -- `docs/01-design/DESIGN_TUI.md` -- `docs/01-design/DESIGN_ARCHITECTURE.md` -- `docs/01-design/DESIGN_DATA.md` +- Textual-based app with list/detail split view and modal forms. +- Keybindings for navigation and common actions. +- Async API client for PR-002 endpoints with retry flow. +- Clear UI states for empty list and API-down scenarios. ## User Stories -- As a user, I can open `tgenie` and immediately see my tasks with status/priority. -- As a user, I can create a task quickly without remembering flags. -- As a user, I can edit a task (title/description/eta/priority) and see it update immediately. -- As a user, I can mark a task done and watch it leave the “pending” list. -- As a user, if the API is down, I see a clear “cannot connect” state (not a traceback). +- As a user, I can open `tgenie` and see my tasks. +- As a user, I can create and edit tasks without leaving the UI. +- As a user, I can mark tasks done and see status update. +- As a user, I see a clear error when the API is unavailable. -## UX Notes +## UX Notes (if applicable) -- MVP is “tasks-first”; chat can be a visible placeholder until PR-003. -- Prefer visible affordances (help bar / action hints) over hidden shortcuts-only UX. -- Destructive actions (delete) require confirmation. +- Prefer visible affordances (help bar, hints) over hidden shortcuts. +- Destructive actions require confirmation. ## Technical Design -### TUI framework +### Architecture -- Prefer **Textual** for a modern full-screen TUI and rapid iteration. -- Use httpx.AsyncClient for API calls -- Use reactive variables for state management (filters, loading state) -- Style with TCSS (Textual CSS) +- Textual app in `backend/cli/tui/` with screens and widgets for list/detail/forms. +- Thin client: all task mutations go through the API. 
-### Architecture +### Data Model / Migrations -**File Structure:** -``` -backend/cli/tui/ -├── __init__.py -├── app.py # TodoApp (main app class) -├── client.py # TaskAPIClient (async HTTP wrapper) -├── styles.tcss # TCSS styling -├── screens/ -│ ├── __init__.py -│ ├── main.py # MainScreen (list + detail split view) -│ ├── task_form.py # TaskFormScreen (add/edit modal) -│ └── help.py # HelpScreen -└── widgets/ - ├── __init__.py - ├── task_list.py # TaskListView + TaskItem + TaskRow - ├── task_detail.py # TaskDetailPanel (rich table) - ├── filter_bar.py # FilterBar (filter buttons) - └── status_bar.py # StatusBar (bottom status) -``` - -**Component Layout:** -- A Textual app with: - - task list pane (filterable, left side) - - task detail pane (selected task, right side) - - filter bar at top (status/priority filters) - - status/help bar at bottom (keybindings) - - chat pane placeholder (stub, for PR-003) - -**Message Passing:** -- TaskListView posts TaskSelected message -- MainScreen receives and updates TaskDetailPanel -- TaskFormScreen posts TaskUpdated message on save -- App handles task refresh/notifications - -**Thin client principle:** -- all task operations go through the Task CRUD API -- no direct DB access from the TUI -- API base URL configurable via env var/config +- N/A. ### API Contract -- Requires PR-002 endpoints: - - `POST /api/v1/tasks` - - `GET /api/v1/tasks` - - `GET /api/v1/tasks/{id}` - - `PATCH /api/v1/tasks/{id}` - - `DELETE /api/v1/tasks/{id}` -- Mark done uses PATCH: `{"status": "completed"}`. +- Uses PR-002 task endpoints for all CRUD operations. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- N/A. -### Resilience +### Error Handling -- API unreachable: - - show non-blocking banner + retry affordance - - avoid “hanging” renders (timeouts on HTTP calls) +- Timeouts and connection failures show a banner/state and allow retry. +- API errors are surfaced as friendly messages, not stack traces. 
## Acceptance Criteria -- [ ] `tgenie` opens a responsive TUI (no stack traces). -- [ ] Create/list/show/edit/done flows work end-to-end against the API. -- [ ] UI handles empty state and API-down state with clear messaging. -- [ ] Keyboard navigation works (arrows, j/k, g/G, enter, escape). -- [ ] All keybindings work (a/e/d/D/r/?/q). -- [ ] Filter by status/priority works correctly. -- [ ] Task list shows priority colors and status styling. -- [ ] Task detail panel shows formatted table with attachments. -- [ ] Destructive actions (delete) show confirmation modal. +### AC1: Tasks MVP Flows + +**Success Criteria:** +- [ ] `tgenie` opens a responsive TUI without tracebacks. +- [ ] Create/list/show/edit/done flows work against the API. + +### AC2: Navigation and Keybindings + +**Success Criteria:** +- [ ] Navigation keys and action bindings work consistently. +- [ ] Delete requires confirmation. + +### AC3: Empty and Error States + +**Success Criteria:** +- [ ] Empty list and API-down states are clear and recoverable. ## Test Plan ### Automated -- Unit: UI state reducers / formatting utilities (where applicable). -- Integration (light): API client calls mocked via HTTPX mock transport. -- Widget tests: TaskRow rendering, TaskListView message passing, reactive properties -- App tests: keybindings, screen navigation, error handling with mocked API - -**Test structure:** -``` -tests/test_tui/ -├── __init__.py -├── test_app.py # TodoApp lifecycle, keybindings, error handling -├── test_widgets.py # TaskRow, TaskListView, TaskDetailPanel rendering -└── test_screens.py # MainScreen, TaskFormScreen modal behavior -``` - -**Running tests:** -```bash -pytest tests/test_tui/ -v -textual run --dev backend.cli.tui.app:TodoApp -``` +- Widget tests for rendering and message passing. +- App tests for keybindings and error handling with mocked API client. ### Manual -1. Start the API. -2. 
Run `tgenie`: - - empty state renders correctly with hint to add task - - add a task from the UI → task appears in list with correct styling - - open task detail → shows fields in rich table format - - edit task title/eta/priority → persists immediately - - mark task done → status updates in list (strikethrough) - - delete task → shows confirmation modal - - test all filters (1/2/3/4 keys for status, p for priority) - - test search (/ key) → filters by title/description -3. Stop the API and re-run `tgenie`: - - UI shows "cannot connect" state and recovery guidance - - 'r' key retries connection -4. Test all keybindings: - - 'q' quits app - - '?' shows help screen - - 'a' opens add task form - - 'e' opens edit task form - - 'd' marks task done - - 'D' deletes task with confirmation - - arrows/j/k navigate task list - - g/G jump to top/bottom - - Enter shows task details +- Run the API and verify task flows end to end. +- Stop the API and verify the TUI shows a recoverable error state. ## Notes / Risks / Open Questions -- If Textual is adopted, we should ensure a non-interactive mode still exists (`tgenie add`, etc.) for scripting (PR-009). +- Ensure non-interactive CLI commands remain available (PR-009). diff --git a/docs/02-implementation/pr-specs/PR-009-cli-subcommands.md b/docs/02-implementation/pr-specs/PR-009-cli-subcommands.md index 8a89a8d..ab79cb7 100644 --- a/docs/02-implementation/pr-specs/PR-009-cli-subcommands.md +++ b/docs/02-implementation/pr-specs/PR-009-cli-subcommands.md @@ -1,284 +1,124 @@ # PR-009: CLI Subcommands (Secondary) (Spec) **Status:** Spec Only -**Depends on:** PR-002 +**Depends on:** PR-002, PR-003B, PR-014 **Last Reviewed:** 2025-12-29 ## Goal -Provide non-interactive commands for scripting and automation, while keeping the interactive TUI as the primary UX. +Provide non-interactive commands for scripting while the TUI remains primary. ## User Value -- Users can automate workflows (`cron`, shell scripts, quick one-liners). 
-- Enables easy debugging of API behavior without the TUI.
+- Users can automate workflows in shell scripts or cron.
+- Developers can debug API behavior without the TUI.
+
+## References
+
+- `docs/01-design/DESIGN_CLI.md`
+- `docs/01-design/API_REFERENCE.md`
+- `docs/01-design/DESIGN_TUI.md`
+- `docs/02-implementation/pr-specs/PR-008-interactive-tui.md`

 ## Scope

 ### In

-- Subcommands (initial set):
-  - `tgenie add`
-  - `tgenie list`
-  - `tgenie show`
-  - `tgenie edit`
-  - `tgenie done`
-  - `tgenie delete`
-  - Output conventions:
-  - human-friendly default
-  - optional `--json` for scripting (recommended)
+- `tgenie add|list|show|edit|done|delete` subcommands.
+- `tgenie agent run` and `tgenie agent status` for autonomous workflows.
+- Human-friendly output with optional `--json` for scripting.
+- Commands call the API, not the DB directly.
+- Stable exit codes and clear error messages.

 ### Out

-- Full import/export formats (can be added later).
-- Power-user TUI features (PR-008 iteration).
+- Import/export formats.
+- Advanced TUI power-user features.

 ## Mini-Specs

-- Commands:
-  - `tgenie add|list|show|edit|done|delete` (non-interactive).
-- Output:
-  - human-friendly by default; optional `--json` where useful.
-- API integration:
-  - commands call the API, not DB directly.
-- Error handling:
-  - stable exit codes and actionable errors when API is down or args invalid.
-- Tests:
-  - CLI runner tests for basic flag parsing + JSON output validity.
+- Typer-based CLI under `backend/cli/main.py`.
+- Consistent flags across commands (status, priority, eta, tags).
+- `--json` output for list/show/add/edit where useful.
+- Exit codes for API errors and invalid args.
+- `tgenie agent run "<goal>"` starts a run and returns a run ID.
+- `tgenie agent status <run_id>` prints status (supports `--json`).
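A minimal Typer sketch of the subcommand and `--json` conventions from the mini-specs. Command bodies use placeholder data instead of real API calls, and the flag set shown is illustrative, not the full CLI surface.

```python
import json
import typer

app = typer.Typer(name="tgenie", no_args_is_help=True)

@app.command("add")
def add(title: str,
        priority: str = typer.Option("medium", "-p", "--priority"),
        json_output: bool = typer.Option(False, "--json")):
    """Create a task (sketch; a real command would POST to the API)."""
    task = {"title": title, "priority": priority, "status": "pending"}
    typer.echo(json.dumps(task) if json_output else f"Created: {title}")

@app.command("list")
def list_tasks(json_output: bool = typer.Option(False, "--json")):
    """List tasks (sketch; placeholder data instead of an API call)."""
    tasks = [{"id": "1", "title": "demo", "status": "pending"}]
    if json_output:
        # --json mode prints machine-readable output only: no banners, no colors.
        typer.echo(json.dumps({"tasks": tasks}))
    else:
        for t in tasks:
            typer.echo(f"[{t['status']}] {t['title']}")
```

Typer's `CliRunner` covers the "CLI runner tests" item: invoke a command, check the exit code, and parse the `--json` output as JSON.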
-## References +## User Stories -- `docs/01-design/DESIGN_CLI.md` (command UX and flags) -- `docs/01-design/API_REFERENCE.md` (API endpoints) -- `docs/01-design/DESIGN_TUI.md` (division of responsibilities: TUI vs subcommands, keybindings, screen layout) -- `docs/02-implementation/pr-specs/PR-008-interactive-tui.md` (TUI implementation) +- As a user, I can add and list tasks from shell scripts. +- As a user, I can request machine-readable output with `--json`. +- As a user, I get clear errors when the API is down. +- As a user, I can run an agent goal from the CLI and check its status. -## Technical Design +## UX Notes (if applicable) -- Thin client: subcommands call to API, not the DB directly. -- Provide stable scripting behavior: - - exit code `0` on success, non-zero on errors - - `--json` prints machine-readable output only (no extra formatting) - - Prefer consistent flags across commands (e.g., `--status`, `--priority`, `--eta`). +- Keep output stable and minimal in `--json` mode. 
## Technical Design -### CLI Framework - -```python -# backend/cli/main.py -import typer -from typing import Optional - -app = typer.Typer( - name="tgenie", - help="Personal task management with AI chat", - no_args_is_help=True, -) - -@app.command() -def add( - title: str, - description: Optional[str] = typer.Option(None, "-d", "--description"), - attach: list[str] = typer.Option(None, "-a", "--attach", multiple=True), - eta: Optional[str] = typer.Option(None, "-e", "--eta"), - priority: str = typer.Option("medium", "-p", "--priority"), - status: str = typer.Option("pending", "-s", "--status"), - json_output: bool = typer.Option(False, "--json", help="Output in JSON format"), -): - """Create a new task.""" - # Implementation would call API or TUI - typer.echo(f"Task '{title}' created") - -@app.command() -def list_cmd( - status: Optional[str] = typer.Option(None, "--status"), - priority: Optional[str] = typer.Option(None, "--priority"), - due: Optional[str] = typer.Option(None, "--due"), - search: Optional[str] = typer.Option(None, "--search"), - limit: int = typer.Option(50, "--limit"), - json_output: bool = typer.Option(False, "--json", help="Output in JSON format"), -): - """List tasks with optional filters.""" - # Implementation would call API and format output - typer.echo(f"Listing tasks...") - -@app.command() -def show( - task_id: str, - json_output: bool = typer.Option(False, "--json", help="Output in JSON format"), -): - """Show task details.""" - typer.echo(f"Showing task {task_id}") - -@app.command() -def edit( - task_id: str, - title: Optional[str] = typer.Option(None, "-t", "--title"), - description: Optional[str] = typer.Option(None, "-d", "--description"), - status: Optional[str] = typer.Option(None, "-s", "--status"), - priority: Optional[str] = typer.Option(None, "-p", "--priority"), - eta: Optional[str] = typer.Option(None, "-e", "--eta"), -): - """Update task fields.""" - typer.echo(f"Updating task {task_id}") - -@app.command() -def done( - 
task_id: str, - note: Optional[str] = typer.Option(None, "--note", help="Add completion note"), -): - """Mark task as completed.""" - typer.echo(f"Task {task_id} marked as completed") - -@app.command() -def delete( - task_id: str, - force: bool = typer.Option(False, "--yes", "-y", help="Skip confirmation"), -): - """Delete a task.""" - typer.echo(f"Task {task_id} deleted") - -if __name__ == "__main__": - app() -``` +### Architecture + +- Typer command group `tgenie` that delegates to a shared API client. +- Output formatter for human vs JSON output. + +### Data Model / Migrations + +- N/A. + +### API Contract + +- Uses PR-002 task endpoints. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- N/A. + +### Error Handling + +- Non-zero exit codes on API/network errors. +- Actionable messages for 404/409/500 responses. ## Acceptance Criteria -- [ ] Core subcommands work end-to-end against the API. -- [ ] `--json` output is valid and stable for scripting (where provided). -- [ ] Helpful errors on API-down or invalid arguments. +### AC1: Core Subcommands -## Test Plan +**Success Criteria:** +- [ ] Add/list/show/edit/done/delete work against the API. -### Automated +### AC2: JSON Output + +**Success Criteria:** +- [ ] `--json` output is valid JSON with no extra formatting. -```python -# tests/test_cli/test_commands.py -import pytest -from typer.testing import CliRunner -from backend.cli.main import app +### AC3: Errors and Exit Codes -@pytest.fixture -def runner(): - return CliRunner() +**Success Criteria:** +- [ ] API errors and network failures return non-zero exit codes. 
-class TestAddCommand: - """Tests for 'tgenie add' command""" +### AC4: Agent Run Commands - def test_add_minimal(self, runner): - """Add task with title only.""" - result = runner.invoke(app, ["add", "Buy groceries"]) - assert result.exit_code == 0 - assert "Created" in result.output - - def test_add_with_all_options(self, runner): - """Add task with all fields.""" - result = runner.invoke(app, [ - "add", "Complete project", - "-d", "Finish implementation", - "-p", "high", - "-e", "2025-01-15", - "-s", "in_progress", - "--json" - ]) - assert result.exit_code == 0 - assert json.loads(result.output)["priority"] == "high" - - def test_add_invalid_priority(self, runner): - """Reject invalid priority.""" - result = runner.invoke(app, ["add", "Test", "-p", "super-high"]) - assert result.exit_code != 0 - assert "Invalid" in result.output or "Error" in result.output - - -class TestListCommand: - """Tests for 'tgenie list' command""" - - def test_list_json_output(self, runner): - """List with JSON output.""" - result = runner.invoke(app, ["list", "--json"]) - assert result.exit_code == 0 - - # Verify valid JSON - import json - data = json.loads(result.output) - assert "tasks" in data - - def test_list_with_filters(self, runner): - """List with status and priority filters.""" - # Add a task first - runner.invoke(app, ["add", "Test task"]) - - result = runner.invoke(app, ["list", "--status", "pending"]) - assert result.exit_code == 0 - - -class TestEditCommand: - """Tests for 'tgenie edit' command""" - - def test_edit_status(self, runner): - """Update task status.""" - result = runner.invoke(app, ["edit", "task-123", "-s", "in_progress"]) - assert result.exit_code == 0 - - def test_edit_not_found(self, runner): - """Error on non-existent task.""" - # Mock API to return 404 - result = runner.invoke(app, ["edit", "nonexistent", "-s", "done"]) - assert result.exit_code != 0 - - -class TestDoneCommand: - """Tests for 'tgenie done' command""" - - def test_done_success(self, 
runner): - """Mark task as completed.""" - result = runner.invoke(app, ["done", "task-123"]) - assert result.exit_code == 0 - assert "completed" in result.output.lower() - - def test_done_with_note(self, runner): - """Mark task completed with note.""" - result = runner.invoke(app, ["done", "task-123", "--note", "Deployed successfully"]) - assert result.exit_code == 0 - - -class TestDeleteCommand: - """Tests for 'tgenie delete' command""" - - def test_delete_requires_confirmation(self, runner): - """Delete requires confirmation without --yes.""" - result = runner.invoke(app, ["delete", "task-123"]) - assert result.exit_code != 0 # Typer should prompt for confirmation - - def test_delete_with_force(self, runner): - """Delete with --yes skips confirmation.""" - result = runner.invoke(app, ["delete", "task-123", "--yes"]) - assert result.exit_code == 0 - - -class TestExitCodes: - """Tests for proper exit codes""" - - def test_success_returns_zero(self, runner): - """Successful commands return exit code 0.""" - result = runner.invoke(app, ["list"]) - assert result.exit_code == 0 - - def test_api_error_returns_nonzero(self, runner): - """API errors return non-zero exit code.""" - # Mock HTTP client to return 500 - result = runner.invoke(app, ["list"]) - assert result.exit_code != 0 -``` +**Success Criteria:** +- [ ] `tgenie agent run` starts an agent run and returns a run ID. +- [ ] `tgenie agent status` returns the current state of a run. + +## Test Plan + +### Automated + +- CLI runner tests for command behavior and JSON output. +- Mocked API failures to validate exit codes. +- Agent CLI tests with mocked agent service responses. ### Manual -1. Start API. -2. Run: - - `tgenie add "Test task"` - - `tgenie list` - - `tgenie show ` - - `tgenie done ` -3. Stop API; verify commands fail with clear guidance. +- Run commands against a live API and verify output. + +## Notes / Risks / Open Questions + +- Ensure CLI flags align with TUI field names and API enums. 
+- Agent commands depend on PR-003B and PR-014 (agent run API contract). diff --git a/docs/02-implementation/pr-specs/PR-010-web-ui.md b/docs/02-implementation/pr-specs/PR-010-web-ui.md index 13c6e9d..ebbe9e5 100644 --- a/docs/02-implementation/pr-specs/PR-010-web-ui.md +++ b/docs/02-implementation/pr-specs/PR-010-web-ui.md @@ -1,93 +1,118 @@ # PR-010: Web UI (Spec) **Status:** Spec Only -**Depends on:** PR-002 (chat optional: PR-003) +**Depends on:** PR-002, PR-004 (attachment viewing), PR-003 (chat optional) **Last Reviewed:** 2025-12-30 ## Goal -Provide a secondary web interface for: -- managing tasks -- richer viewing of attachments (longer text, links) -- optional chat streaming in the browser (once chat exists) +Provide a secondary web interface for managing tasks and viewing attachments, with +optional chat streaming. ## User Value -- Easy browsing and reading for long attachments. -- A fallback UX when terminal UI isn’t ideal. +- Easier reading of longer task descriptions and attachments. +- A fallback UX when terminal UI is not ideal. + +## References + +- `docs/01-design/DESIGN_WEB.md` +- `docs/01-design/DESIGN_CHAT.md` +- `docs/01-design/API_REFERENCE.md` ## Scope ### In -- Tasks pages: - - list + filters - - detail view - - create/edit (HTMX forms) -- Optional (if PR-003 exists): - - chat page with streaming responses +- Task list, detail, create, and edit pages (HTMX forms). +- Attachment viewing pages for reading attachment content. +- Basic responsive layout. +- Optional chat page if PR-003 is implemented. ### Out -- Authentication/multi-user (future). -- Mobile-first polish beyond basic responsiveness (future iteration). +- Authentication/multi-user. +- Advanced mobile-first polish. ## Mini-Specs -- Pages: - - tasks list + task detail; create/edit flows (HTMX forms). -- Chat (optional): - - if PR-003 is present, chat page streams responses and handles reconnects. -- Notifications (optional): - - in-app notification feed view (if PR-011 exists). 
-- Design: - - responsive layout; minimal JS (HTMX + Tailwind or equivalent). -- Tests: - - basic route rendering + API integration smoke tests. +- FastAPI template routes for tasks list/detail and edit/create. +- Attachment viewing pages for displaying attachment content. +- HTMX interactions for inline updates and form submissions. +- Optional chat page using SSE via EventSource. -## References +## User Stories + +- As a user, I can view and edit tasks in a browser. +- As a user, I can read long attachment content comfortably. +- As a user, I can use chat in the browser when available. + +## UX Notes (if applicable) -- `docs/01-design/DESIGN_WEB.md` (page layouts + HTMX interactions) -- `docs/01-design/DESIGN_CHAT.md` (streaming/SSE handling) -- `docs/01-design/API_REFERENCE.md` (API endpoints consumed) +- Keep JS minimal and favor HTMX for interactions. ## Technical Design -- **Backend:** FastAPI with Jinja2Templates -- **Frontend:** - - **CSS:** Tailwind CSS (via CDN or standalone CLI) - - **JS:** HTMX for SPA-like feel without a full framework -- **Patterns:** - - `GET /tasks` → returns full page - - `POST /tasks` → returns HTMX fragment (single task row) to be prepended to the list - - `GET /tasks/{id}/edit` → returns HTMX fragment (form) to swap into the row/modal -- **Chat (optional):** - - Start with plain SSE via `EventSource` for streaming chat output. - - If we want tighter HTMX integration later, consider the `htmx-sse` extension. -- **Thin Client:** - - call the same backend services as the API (no duplicated business logic) - - rely on the Task API for CRUD +### Architecture + +- FastAPI + Jinja2 templates for server-rendered pages. +- HTMX for partial updates; Tailwind (or equivalent) for styling. + +### Data Model / Migrations + +- N/A. + +### API Contract + +- Uses existing task API for CRUD operations. +- Optional chat page uses `/api/v1/chat` SSE. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- N/A (no auth in MVP). 
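The fragment pattern in the Architecture section (full page on `GET /tasks`, single-row fragment on `POST /tasks`) can be sketched with a minimal render helper. The route wiring is omitted and `task_row_fragment` is an illustrative name, not part of the spec:

```python
# Sketch of the HTMX fragment contract:
# POST /tasks returns only this row fragment, which HTMX prepends
# to the task list; GET /tasks returns the full page.

def task_row_fragment(task: dict) -> str:
    """Render the single-row HTML fragment returned by POST /tasks."""
    return (
        f'<tr id="task-{task["id"]}">'
        f'<td>{task["title"]}</td>'
        f'<td>{task["status"]}</td>'
        "</tr>"
    )
```

In the real app the fragment would come from a Jinja2 partial; the contract that matters is "POST returns a fragment, not a page".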
+ +### Error Handling + +- Render friendly error states when API calls fail. ## Acceptance Criteria -- [ ] Task pages work end-to-end against the API. -- [ ] Basic responsive layout (desktop + narrow viewport). -- [ ] If PR-003 is present: chat page streams responses correctly and handles disconnects gracefully. +### AC1: Task Pages + +**Success Criteria:** +- [ ] List/detail/create/edit flows work against the API. + +### AC2: Attachment Viewing + +**Success Criteria:** +- [ ] Attachment viewing pages display attachment content correctly. + +### AC3: Responsive Layout + +**Success Criteria:** +- [ ] Pages remain usable on narrow viewports. + +### AC4: Optional Chat UI + +**Success Criteria:** +- [ ] If PR-003 is present, chat page streams responses and handles disconnects. ## Test Plan ### Automated -- Integration: fetch pages and verify key elements render. -- E2E (optional): Playwright smoke test for task list → create → detail. +- Integration tests for page rendering and basic actions. +- Optional Playwright smoke test for task flow. ### Manual -1. Start API. -2. Open tasks page; verify list renders. -3. Create a task; verify it appears and detail page loads. -4. If chat is enabled, open chat page and send a message; verify streaming. +- Verify task CRUD in the browser and resize for responsiveness. +- If chat enabled, verify streaming behavior. ## Notes / Risks / Open Questions -- Keep web UI optional/secondary; avoid blocking core UX iteration in the TUI. +- Keep the web UI optional so it does not block core TUI iteration. 
diff --git a/docs/02-implementation/pr-specs/PR-011-notifications.md b/docs/02-implementation/pr-specs/PR-011-notifications.md index f7bdae7..11e70e6 100644 --- a/docs/02-implementation/pr-specs/PR-011-notifications.md +++ b/docs/02-implementation/pr-specs/PR-011-notifications.md @@ -1,110 +1,127 @@ # PR-011: Notifications (Spec) **Status:** Spec Only -**Depends on:** PR-002 +**Depends on:** PR-002 (agent notifications optional: PR-003B, PR-014) **Last Reviewed:** 2025-12-30 ## Goal -Deliver early "daily value" by reminding user about tasks with ETAs (24h/6h/overdue), with a path that works both locally and in Docker. +Deliver reminders for tasks with ETAs (24h/6h/overdue) with a path that works locally +and in Docker. ## User Value -- Users feel app helping them without needing integrations or RAG. -- Encourages regular usage (reminders + quick completion loop). +- Users get timely reminders without external integrations. +- Encourages regular task completion. + +## References + +- `docs/01-design/DESIGN_NOTIFICATIONS.md` +- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` +- `docs/01-design/REQUIREMENTS_AUDIT.md` ## Scope ### In -- Scheduling logic: - - reminders at configured offsets (default 24h + 6h) - - overdue alerts - - quiet hours - - dedup (don't spam same notification) -- Notification history persisted (so UI can show "what was sent"). -- Delivery mechanism (pragmatic): - - **Local run:** desktop notifications (e.g., `plyer`) - - **Docker run:** provide an in-app notification channel (Web UI / API), with desktop as optional later -- Scheduler runs every 1-5 minutes (configurable). +- Scheduler logic for reminders and overdue alerts. +- Quiet hours and deduplication. +- Persisted notification history for UI/TUI viewing. +- Delivery channels: local desktop notifications and in-app feed for Docker. +- Optional agent-run notifications (started/completed/failed) when agent system exists (requires PR-003B and PR-014). ### Out -- Mobile notifications (future). 
-- Multi-device sync (future). +- Mobile notifications. +- Multi-device sync. ## Mini-Specs -- Data model: - - `notifications` table to persist delivery history (type, task_id, sent_at, channel, status, error). -- Scheduler: - - in-process tick loop (APScheduler or background task), configurable interval (default 60s). - - computes due reminders (24h/6h/overdue) for non-completed tasks with `eta`. - - deduplication via persisted history (no repeated sends for same task/type). -- Delivery channels: - - local dev: desktop notifications (e.g., `plyer`) - - Docker: in-app notification feed (API-backed) as the reliable baseline -- UX: - - quiet hours support (defer or suppress) - - clear “why did I get this?” content (task title + due time + shortcut to open task) -- Tests: - - time-based computations, deduplication, quiet hours, and Docker/local channel selection. +- `notifications` table to store delivery history and status. +- In-process scheduler tick (default 60s) to compute due reminders. +- Channel adapters for desktop (local) and in-app notifications. +- In-app API to list notifications for UI/TUI. +- Notification types for agent runs and tool failures (depends on PR-003B/PR-014). -## References +## User Stories -- `docs/01-design/DESIGN_NOTIFICATIONS.md` -- `docs/01-design/DESIGN_BACKGROUND_JOBS.md` -- `docs/01-design/REQUIREMENTS_AUDIT.md` (Docker notification constraints) +- As a user, I get reminders before tasks are due and when they are overdue. +- As a user, I can see a history of notifications in the app. +- As a user, notifications respect quiet hours. +- As a user, I can see when an agent run completes or fails. + +## UX Notes (if applicable) + +- Notifications should explain why they fired (task title + due time). ## Technical Design -### Scheduler Implementation - -- **Background Worker:** Use `asyncio.create_task` or a separate process started on app startup. -- **Precision:** Run every 60 seconds. -- **Logic:** - 1. 
Query `tasks` where `status != 'completed'` and `eta` is not null. - 2. For each task, check if a notification of type (24h, 6h, overdue) has already been sent (check `notifications` table). - 3. If not sent and `now() >= eta - offset`, trigger delivery. -- **Delivery:** - - `DesktopNotifier`: wrapper around `plyer` for local notifications. - - `InAppNotifier`: inserts into `notifications` table with `status='unread'`. - -### Notification Delivery Channels - -- Implement a `NotificationService` that exposes a single “tick” operation: - - query due tasks - - compute which reminder types should fire - - write delivery history (dedup) - - dispatch via the configured channel(s) -- Implement channel adapters: - - `DesktopNotifier` (local) wraps `plyer` - - `InAppNotifier` persists a notification record for UI/TUI to display +### Architecture + +- Background task or scheduler runs on app startup and calls a `NotificationService`. +- `NotificationService` computes due reminders, persists history, and dispatches + through configured channels. + +### Data Model / Migrations + +- `notifications` table: id, task_id, type, channel, status, error, sent_at, + created_at. + +### API Contract + +- `GET /api/v1/notifications` returns recent notifications for UI/TUI. +- `PATCH /api/v1/notifications/{id}` marks as read (optional). + +### Background Jobs + +- In-process scheduler runs every 60s (configurable). + +### Security / Privacy + +- Do not log notification content beyond metadata. + +### Error Handling + +- Delivery failures are stored with error reason and do not stop the scheduler. ## Acceptance Criteria -- [ ] Reminders fire at default offsets (24h/6h) and for overdue tasks. -- [ ] No duplicate notifications for the same task/type (dedup persisted). -- [ ] Quiet hours are respected (defer or suppress, per config). +### AC1: Scheduling and Overdue + +**Success Criteria:** +- [ ] Reminders fire at configured offsets and for overdue tasks. 
+ +### AC2: Deduplication and History + +**Success Criteria:** +- [ ] No duplicate notifications for the same task/type. +- [ ] Notification history is persisted and queryable. + +### AC3: Delivery Channels + +**Success Criteria:** - [ ] Docker mode uses in-app notifications as the baseline channel. -- [ ] Notification history is persisted and queryable (for UI/TUI listing). +- [ ] Local mode can use desktop notifications. + +### AC4: Quiet Hours + +**Success Criteria:** +- [ ] Quiet hours suppress or defer notifications per config. ## Test Plan ### Automated -- Unit (pure logic): given `(now, eta, offsets, history)` compute which notifications should fire. -- Integration (DB): history persistence prevents duplicates across ticks/restarts. -- Channel selection: local run uses desktop channel; Docker run uses in-app channel. -- Quiet hours: notifications are suppressed/deferred during quiet window. +- Unit tests for schedule computation and quiet hours. +- Integration tests for history persistence and dedup across restarts. +- Channel selection tests for local vs Docker mode. ### Manual -1. Create a task due in ~10 minutes; temporarily set schedule to `["10m"]` for testing. -2. Run scheduler job (or wait for interval). -3. Verify a notification appears and is recorded in history. -4. Mark task complete; ensure no further reminders fire. +- Configure a short offset and verify reminders fire. +- Run in Docker and confirm in-app notifications appear. ## Notes / Risks / Open Questions -- Desktop notifications from Docker are non-trivial; plan for an "in-app notifications" channel as a reliable baseline. +- Desktop notifications from Docker are unreliable; in-app feed is the baseline. +- Agent notifications depend on agent run APIs and may be staged after PR-003B. 
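The pure scheduling computation exercised by the automated tests ("given now, eta, offsets, and history, which reminders fire") can be sketched as follows. Names like `due_notification_types` are illustrative assumptions, not part of the spec:

```python
from datetime import datetime, timedelta

# Offsets for reminder types, per the default 24h/6h schedule.
REMINDER_OFFSETS = {"24h": timedelta(hours=24), "6h": timedelta(hours=6)}

def due_notification_types(
    now: datetime,
    eta: datetime,
    already_sent: set[str],
) -> list[str]:
    """Return which reminder types should fire for one task."""
    due: list[str] = []
    for name, offset in REMINDER_OFFSETS.items():
        # Fire once inside the reminder window, unless already sent (dedup).
        if now >= eta - offset and name not in already_sent:
            due.append(name)
    if now >= eta and "overdue" not in already_sent:
        due.append("overdue")
    return due
```

Keeping this a pure function of `(now, eta, history)` makes the dedup and quiet-hours tests deterministic.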
diff --git a/docs/02-implementation/pr-specs/PR-012-deployment-docs.md b/docs/02-implementation/pr-specs/PR-012-deployment-docs.md index f64d825..26e83a2 100644 --- a/docs/02-implementation/pr-specs/PR-012-deployment-docs.md +++ b/docs/02-implementation/pr-specs/PR-012-deployment-docs.md @@ -1,94 +1,108 @@ # PR-012: Deployment + Documentation (Spec) **Status:** Spec Only -**Depends on:** PR-010, PR-011 +**Depends on:** PR-001 (backup/restore), PR-010, PR-011, PR-017 (backup/restore guidance) **Last Reviewed:** 2025-12-29 ## Goal -Make the system easy to run, upgrade, and back up: -- Docker Compose “just works” -- docs match reality -- data persistence is safe +Make the system easy to run, upgrade, and back up with reliable Docker and accurate +docs. ## User Value -- Faster setup for new machines. -- Lower risk of data loss (clear volumes + backup instructions). +- Faster setup on new machines. +- Lower risk of data loss via clear persistence and backup guidance. + +## References + +- `docs/SETUP.md` +- `docs/01-design/DESIGN_DATA.md` +- `docs/02-implementation/PR-PLANS.md` ## Scope ### In -- Docker Compose configuration: - - API service - - persistent volume for SQLite DB and vector store - - env var configuration - - health checks -- Documentation updates: - - setup instructions - - backup/restore - - “what works today” vs planned +- Docker Compose configuration with persistent volumes and health checks. +- Environment variable documentation and `.env.example` alignment. +- Backup/restore and upgrade instructions. ### Out -- Cloud deployment (Kubernetes, etc.). -- Multi-user auth/HTTPS termination (future). +- Kubernetes or cloud deployment. +- Multi-user auth and HTTPS termination. ## Mini-Specs -- Docker Compose: - - one-command local run with persisted data volumes. -- Environment docs: - - `.env.example` and `docs/SETUP.md` aligned with the actual config keys. -- Developer ergonomics: - - smoke checks (`/health`, basic task CRUD) documented. 
-- Docs hygiene: - - docs link/name checks in CI (no `todo` command examples; consistent `tgenie` usage; no broken relative links). -- Release: - - minimal “how to run” instructions and expected ports/paths. +- Compose file for API service with volume mounts for DB and vector store. +- Health check on `/health` and clear port mappings. +- Docs updates for setup, backups, and upgrade path. -## References +## User Stories -- `docs/SETUP.md` -- `docs/01-design/DESIGN_DATA.md` (data locations + backup) -- `docs/02-implementation/PR-PLANS.md` (current roadmap) +- As a user, I can start the app with `docker compose up` and keep my data. +- As a user, I can follow docs and get a working setup on a new machine. + +## UX Notes (if applicable) + +- N/A. ## Technical Design -### Docker Compose +### Architecture + +- Docker Compose with a single API service and named volumes for data. +- Document environment variables and default ports. + +### Data Model / Migrations + +- N/A. -- Define volumes for: - - SQLite DB - - vector store - - attachment cache (if needed) -- Add health checks and clear env var configuration. +### API Contract -### Upgrade path +- N/A. -- When migrations exist: - - document `tgenie db upgrade` for upgrades - - document backup/restore workflows (SQL dump + restore) +### Background Jobs + +- N/A. + +### Security / Privacy + +- Document data locations and backup guidance to avoid accidental loss. + +### Error Handling + +- Health checks surface startup failures in Docker. ## Acceptance Criteria -- [ ] `docker compose up` starts successfully and `/health` returns ok. +### AC1: Docker Compose Boots Cleanly + +**Success Criteria:** +- [ ] `docker compose up` starts successfully and `/health` returns OK. + +### AC2: Data Persistence + +**Success Criteria:** - [ ] Data persists across container restarts. -- [ ] Docs are accurate and consistent with the chosen CLI name and UX. 
+ +### AC3: Docs Match Reality + +**Success Criteria:** +- [ ] Setup and backup docs match actual commands, ports, and paths. ## Test Plan ### Automated -- Optional: CI smoke test that builds images and runs health check (if CI exists). +- Optional CI smoke test to build images and hit `/health`. ### Manual -1. `docker compose up -d` -2. Create a task (via TUI or API). -3. Restart containers; verify task persists. -4. Run `tgenie db upgrade` inside container context and verify no errors. +- Run compose, create a task, restart, and verify data persists. +- Follow `docs/SETUP.md` and validate commands. ## Notes / Risks / Open Questions -- Keep local dev (no docker) and docker paths aligned to avoid “works on my machine” data-loss issues. +- Keep docker and local paths aligned to avoid data-loss confusion. diff --git a/docs/02-implementation/pr-specs/PR-013-event-system.md b/docs/02-implementation/pr-specs/PR-013-event-system.md new file mode 100644 index 0000000..6366ab1 --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-013-event-system.md @@ -0,0 +1,427 @@ +# PR-013: Event System + Realtime Updates (Spec) + +**Status:** Spec Only +**Depends on:** PR-002, PR-004 (for attachment event emission) +**Last Reviewed:** 2026-01-01 + +## Goal + +Introduce a lightweight event system for task lifecycle events and realtime updates +for UI and agent hooks. + +## User Value + +- UIs can update in realtime without polling. +- Agent workflows can react to task and attachment changes. + +## References + +- `docs/01-design/DESIGN_ARCHITECTURE.md` +- `docs/01-design/DESIGN_DATA.md` + +## Scope + +### In + +- Event schema and naming conventions (e.g., `task.created`, `task.updated`). +- Event emitter hooks in task and attachment services. +- Persistent event log for replay and SSE resume. +- SSE endpoint for realtime updates. +- Optional webhook delivery to configured endpoints. + +### Out + +- External message brokers (Kafka, SNS/SQS). 
+- Guaranteed delivery beyond basic retry/backoff. +- Multi-tenant or ACL-based event filtering. + +## Mini-Specs + +- `events` table with `id`, `type`, `payload`, `created_at`. +- `EventService.emit(type, payload)` used by task/attachment mutations. +- `GET /api/v1/events` SSE stream with `Last-Event-ID` support. +- Webhook dispatcher reads configured endpoints and retries on failure. + +## User Stories + +- As a user, my UI updates when tasks change without manual refresh. +- As a developer, I can subscribe to task lifecycle events. + +## UX Notes (if applicable) + +- N/A. + +## Technical Design + +### Architecture + +- In-process event bus writes to the `events` table and publishes to subscribers. +- SSE stream reads from the event log and emits ordered events. +- Webhook dispatcher sends event payloads to configured endpoints. + +### Data Model / Migrations + +- `events` table for lightweight event storage and replay. + +### API Contract + +- `GET /api/v1/events` returns `text/event-stream` with `event:` and `data:` fields. +- Optional query params: `types=task.created,task.updated` for filtering. + +### Background Jobs + +- Webhook delivery worker with basic retry/backoff. + +### Security / Privacy + +- Event payloads include IDs and minimal metadata only. +- Webhook targets are allowlisted via config. + +### Error Handling + +- Failed webhook deliveries are logged and retried without blocking core flows. +- SSE clients can reconnect with `Last-Event-ID`. 
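The `Last-Event-ID` resume behavior can be sketched as a replay over the persisted event log. This is a sketch only; the event shape and function names are illustrative, and the real endpoint would query the `events` table instead of an in-memory list:

```python
from typing import Iterable, Iterator, Optional

def format_sse(event: dict) -> str:
    """Serialize one stored event as an SSE frame with a resumable id."""
    return (
        f"id: {event['id']}\n"
        f"event: {event['type']}\n"
        f"data: {event['payload']}\n"
        "\n"
    )

def replay_events(events: Iterable[dict], last_event_id: Optional[int]) -> Iterator[str]:
    """Yield SSE frames for events the client has not seen yet."""
    for event in events:
        # Skip events delivered before the client disconnected.
        if last_event_id is not None and event["id"] <= last_event_id:
            continue
        yield format_sse(event)
```

Because event ids are monotonically increasing, resuming from `Last-Event-ID` yields each event at most once.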
+
+### Implementation Notes
+
+#### Event Batching for Efficiency
+
+Aggregate high-frequency events to reduce processing overhead:
+
+```python
+import logging
+import uuid
+from collections import defaultdict
+from dataclasses import dataclass, field
+from datetime import datetime, timedelta
+from typing import List, Optional
+
+logger = logging.getLogger(__name__)
+
+@dataclass
+class EventBatch:
+    """Batched events for efficient processing."""
+    events: List[dict]
+    batch_id: str
+    created_at: datetime
+    target_sse_client_id: Optional[str] = None
+
+class EventBatcher:
+    """Event batching and debouncing for high-frequency events."""
+
+    def __init__(
+        self,
+        batch_window_ms: int = 100,
+        max_batch_size: int = 50
+    ):
+        self.batch_window_ms = batch_window_ms
+        self.max_batch_size = max_batch_size
+        self.batches: dict[str, EventBatch] = {}
+        self.pending_events: defaultdict[Optional[str], list[dict]] = defaultdict(list)
+        self.last_emit_time: dict[Optional[str], datetime] = {}
+
+    async def emit(
+        self,
+        event_type: str,
+        payload: dict,
+        client_id: Optional[str] = None
+    ) -> str:
+        """Emit event with batching."""
+        event_id = str(uuid.uuid4())
+        now = datetime.now()
+
+        # Add to pending events
+        self.pending_events[client_id].append({
+            "id": event_id,
+            "type": event_type,
+            "payload": payload,
+            "created_at": now
+        })
+
+        # Flush if the batch window has elapsed since the last emit
+        last_emit = self.last_emit_time.get(client_id, now - timedelta(days=1))
+        if (now - last_emit).total_seconds() >= self.batch_window_ms / 1000:
+            await self._flush_batch(client_id)
+
+        # Flush if batch size exceeded
+        if len(self.pending_events[client_id]) >= self.max_batch_size:
+            await self._flush_batch(client_id)
+
+        return event_id
+
+    async def _flush_batch(self, client_id: Optional[str]) -> None:
+        """Flush pending events as a batch."""
+        if not self.pending_events[client_id]:
+            return
+
+        events = self.pending_events[client_id].copy()
+        self.pending_events[client_id].clear()
+
+        batch = EventBatch(
+            events=events,
+            batch_id=str(uuid.uuid4()),
+            created_at=datetime.now()
+        )
+
+        # Store batch in memory (for 
SSE delivery)
+        self.batches[f"{client_id}:{batch.batch_id}"] = batch
+
+        # Emit batch event
+        await self._emit_batch_event(client_id, batch)
+
+        self.last_emit_time[client_id] = datetime.now()
+
+        logger.info({
+            "event": "event_batch_emitted",
+            "client_id": client_id,
+            "batch_size": len(events),
+            "batch_id": batch.batch_id
+        })
+
+    async def _emit_batch_event(
+        self,
+        client_id: Optional[str],
+        batch: EventBatch
+    ) -> None:
+        """Emit the batch metadata event."""
+        # Store batch metadata in event log
+        batch_event = {
+            "id": str(uuid.uuid4()),
+            "type": "event_batch",
+            "payload": {
+                "batch_id": batch.batch_id,
+                "event_count": len(batch.events),
+                "created_at": batch.created_at.isoformat()
+            },
+            "created_at": datetime.now().isoformat()
+        }
+
+        # Store in database (store_event is the PR-013 event-log helper)
+        await store_event(batch_event)
+
+        # Emit to connected clients (sse_broadcast is the SSE fan-out helper)
+        await sse_broadcast(client_id, {
+            "type": "batch",
+            "data": batch
+        })
+```
+
+#### Event Deduplication
+
+Prevent duplicate events from being emitted. Hashes are stored with the time they
+were last seen so that age-based cleanup is possible:
+
+```python
+import json
+import logging
+from datetime import datetime, timedelta
+from hashlib import md5
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+class EventDeduplicator:
+    """Deduplicate events based on content hash."""
+
+    def __init__(self, dedup_window_seconds: int = 10):
+        self.dedup_window_seconds = dedup_window_seconds
+        # Map content hash -> time it was last emitted
+        self.recent_hashes: dict[str, datetime] = {}
+
+    async def emit_if_unique(
+        self,
+        event_type: str,
+        payload: dict
+    ) -> Optional[str]:
+        """Emit event if not a recent duplicate."""
+        # Create content hash
+        content = f"{event_type}:{json.dumps(payload, sort_keys=True)}"
+        content_hash = md5(content.encode()).hexdigest()
+
+        # Check if emitted within the dedup window
+        now = datetime.now()
+        seen_at = self.recent_hashes.get(content_hash)
+        if seen_at and (now - seen_at).total_seconds() < self.dedup_window_seconds:
+            logger.debug({
+                "event": "event_deduped",
+                "type": event_type,
+                "hash": content_hash
+            })
+            return None
+
+        # Record (or refresh) the hash timestamp
+        self.recent_hashes[content_hash] = now
+
+        # emit_event is the underlying event-log helper
+        return await emit_event(event_type, payload)
+
+    async def cleanup_old_hashes(self) -> int:
+        """Drop hashes older than the dedup window; return how many were removed."""
+        cutoff = datetime.now() - timedelta(seconds=self.dedup_window_seconds)
+        before = len(self.recent_hashes)
+
+        self.recent_hashes = {
+            h: seen_at for h, seen_at in self.recent_hashes.items()
+            if seen_at >= cutoff
+        }
+
+        cleaned = before - len(self.recent_hashes)
+        logger.debug({
+            "event": "event_dedup_cleanup",
+            "cleaned_count": cleaned
+        })
+        return cleaned
+```
+
+#### Event Filtering and Routing
+
+Filter events by type and content to reduce context noise:
+
+```python
+import json
+from dataclasses import dataclass, field
+from typing import List
+
+@dataclass
+class EventFilter:
+    """Event filtering rules for context optimization."""
+    include_types: List[str] = field(default_factory=list)
+    exclude_types: List[str] = field(default_factory=list)
+    include_patterns: List[str] = field(default_factory=list)
+    exclude_patterns: List[str] = field(default_factory=list)
+
+    async def should_emit(
+        self,
+        event_type: str,
+        payload: dict
+    ) -> bool:
+        """Check if event should be emitted to context."""
+        # Check type filters
+        if self.include_types and event_type not in self.include_types:
+            return False
+
+        if event_type in self.exclude_types:
+            return False
+
+        # Check content patterns
+        payload_str = json.dumps(payload, sort_keys=True)
+
+        if any(pattern in payload_str for pattern in self.exclude_patterns):
+            return False
+
+        # At least one include pattern must match (if any are configured)
+        if self.include_patterns and not any(
+            pattern in payload_str for pattern in self.include_patterns
+        ):
+            return False
+
+        return True
+
+# Example filters for agent context
+AGENT_CONTEXT_FILTER = EventFilter(
+    include_types=[
+        "task.created",
+        "task.updated",
+        "attachment.created",
+        "agent.run_started",
+        "agent.run_completed"
+    ],
+    exclude_types=[
+        # Exclude high-frequency noise
+        "heartbeat",
+        "ping",
+        "debug"
+    ],
+    include_patterns=[
+        # Keep only events carrying agent-relevant identifiers
+        "task_id",
+        "run_id",
+        "agent_name"
+    ]
+)
+```
+
+#### Event Prioritization
+
+Prioritize events for context loading (most relevant first):
+
+```python
+from enum import Enum
+# unused import removed: from typing 
import Callable + +class EventPriority(Enum): + CRITICAL = 1 + HIGH = 2 + MEDIUM = 3 + LOW = 4 + INFO = 5 + +def get_event_priority( + event_type: str, + payload: dict +) -> EventPriority: + """Determine event priority based on type and content.""" + # Critical: errors, failures, security events + if event_type in ["error", "failure", "security_alert"]: + return EventPriority.CRITICAL + + # High: task completions, agent results + if event_type in ["task.completed", "agent.run_completed"]: + return EventPriority.HIGH + + # Medium: task updates, attachments + if event_type in ["task.updated", "attachment.created", "attachment.updated"]: + return EventPriority.MEDIUM + + # Low: informational events + if event_type in ["heartbeat", "ping", "debug"]: + return EventPriority.LOW + + # Default info + return EventPriority.INFO + +async def emit_with_priority( + event_type: str, + payload: dict +) -> str: + """Emit event with priority metadata.""" + priority = get_event_priority(event_type, payload) + + event_id = await emit_event( + event_type, + { + **payload, + "priority": priority.value + } + ) + + logger.info({ + "event": "event_emitted_with_priority", + "type": event_type, + "priority": priority.name, + "event_id": event_id + }) + + return event_id +``` + +## Acceptance Criteria + +### AC1: Event Emission + +**Success Criteria:** +- [ ] Task create/update/delete emits the expected event types. +- [ ] Attachment create/delete emits events when applicable. + +### AC2: SSE Streaming + +**Success Criteria:** +- [ ] SSE endpoint streams events in order. +- [ ] Reconnect with `Last-Event-ID` resumes without duplicates. + +### AC3: Webhook Delivery (Optional) + +**Success Criteria:** +- [ ] Webhooks receive events when configured. +- [ ] Delivery failures are retried and logged. + +## Test Plan + +### Automated + +- Unit tests for event emission and payloads. +- Integration tests for SSE streaming and resume behavior. +- Webhook dispatcher tests with mocked HTTP endpoints. 
+ +### Manual + +- Create a task and observe SSE events in a terminal client. +- Configure a webhook endpoint and verify deliveries. + +## Notes / Risks / Open Questions + +- Decide how long to retain events in the log. diff --git a/docs/02-implementation/pr-specs/PR-014-multi-agent-orchestration.md b/docs/02-implementation/pr-specs/PR-014-multi-agent-orchestration.md new file mode 100644 index 0000000..7b1461a --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-014-multi-agent-orchestration.md @@ -0,0 +1,967 @@ +# PR-014: Multi-Agent Orchestration (Spec) + +**Status:** Spec Only +**Depends on:** PR-003B, PR-013 +**Last Reviewed:** 2026-01-01 + +## Goal + +Enable multiple agents to collaborate on a goal with shared memory and coordinated +execution. + +## User Value + +- Complex goals can be decomposed across specialized agents. +- Users can track agent progress and outcomes over time. + +## References + +- `docs/01-design/DESIGN_CHAT.md` +- `docs/01-design/DESIGN_ARCHITECTURE.md` + +## Scope + +### In + +- Agent manager that spawns and supervises agent runs. +- Shared memory store for agent context and summaries. +- Run lifecycle: start, pause, resume, cancel, complete. +- Concurrency controls and rate limiting. + +### Out + +- Agent marketplace or third-party agent plugins. +- Cross-user collaboration and shared workspaces. + +## Mini-Specs + +- `AgentRun` record with status, goal, timestamps, and summary. +- `AgentManager` orchestrates runs and delegates tool calls. +- Shared memory storage for summaries and recent context. +- Event integration to emit run status updates (PR-013). + +## User Stories + +- As a user, I can launch an agent run and watch progress updates. +- As a user, I can pause or cancel a long-running agent. + +## UX Notes (if applicable) + +- Provide a clear "running" indicator and last-updated timestamp. + +## Technical Design + +### Architecture + +- Supervisor process manages agent workers and run state. 
+- Agent workers execute plans using the tool-calling foundation.
+- Memory store persists summaries and last-known context for runs.
+
+### Data Model / Migrations
+
+- `agent_runs` table: id, goal, status, started_at, finished_at, summary.
+- Optional `agent_messages` table for recent context slices.
+
+### API Contract
+
+- `POST /api/v1/agents/run` starts a run and returns `run_id`.
+- `GET /api/v1/agents/{run_id}` returns status and summary.
+- `POST /api/v1/agents/{run_id}/cancel` cancels a run.
+
+### Background Jobs
+
+- Agent worker loop and run scheduler.
+
+### Security / Privacy
+
+- Run data stored locally; no prompts or responses logged by default.
+
+### Error Handling
+
+- Failed runs are marked with error status and short reason.
+
+### Implementation Notes
+
+#### Architectural Pattern Selection
+
+Implement a **swarm/peer-to-peer** pattern (rather than a central supervisor) to avoid the "telephone game" of relaying every result through one agent:
+
+```python
+@dataclass
+class AgentDefinition:
+    """Agent definition for swarm pattern."""
+    name: str
+    role: str  # e.g. researcher, planner, executor
+    tools: list[str]  # Tools available to this agent
+    system_prompt: str | None = None  # Optional role-specific prompt
+    can_handoff: bool = True  # Supports peer handoff
+
+class SwarmOrchestrator:
+    """Peer-to-peer agent orchestration without a supervisor bottleneck."""
+
+    def __init__(self):
+        self.agents: dict[str, AgentDefinition] = {}
+        self.active_runs: dict[str, AgentRun] = {}
+
+    def register_agent(self, agent_def: AgentDefinition) -> None:
+        """Register agent with role and tools."""
+        self.agents[agent_def.name] = agent_def
+        logger.info({
+            "event": "agent_registered",
+            "agent_name": agent_def.name,
+            "role": agent_def.role,
+            "tools": agent_def.tools
+        })
+
+    async def delegate_task(self, goal: str, context: dict) -> AgentRun:
+        """Delegate to appropriate specialist agent."""
+        # Analyze goal to select initial agent
+        initial_agent = self._select_agent(goal)
+
+        # Create run with isolated context
+        run_id = str(uuid.uuid4())
+        run = AgentRun(
+            id=run_id,
+            goal=goal,
+            agent_name=initial_agent.name,
+            status="running",
+            started_at=datetime.now()
+        )
+
+        self.active_runs[run_id] = run
+
+        logger.info({
+            "event": "run_started",
+            "run_id": run_id,
+            "agent": initial_agent.name,
+            "goal": goal
+        })
+
+        # Initial agent has control; _execute drives its tool-calling loop
+        # and returns the updated run (LLM loop elided in this sketch)
+        return await self._execute(initial_agent, run, context)
+
+    def _select_agent(self, goal: str) -> AgentDefinition:
+        """Select appropriate agent based on goal analysis."""
+        goal_lower = goal.lower()
+
+        if any(kw in goal_lower for kw in ["research", "find", "search", "look up"]):
+            selected = self.agents.get("researcher")
+        elif any(kw in goal_lower for kw in ["plan", "design", "organize", "schedule"]):
+            selected = self.agents.get("planner")
+        elif any(kw in goal_lower for kw in ["execute", "do", "implement", "write", "create"]):
+            selected = self.agents.get("executor")
+        else:
+            selected = None
+
+        # Fall back to a registered generalist when no specialist matches
+        return selected or self.agents["generalist"]
+
+# Example agent definitions
+RESEARCHER = AgentDefinition(
+    name="researcher",
+    role="Research specialist",
+    system_prompt="You are a research specialist. Gather information from available sources and provide factual findings.",
+    tools=["search_tasks", "search_rag", "fetch_attachment"],
+    can_handoff=True
+)
+
+PLANNER = AgentDefinition(
+    name="planner",
+    role="Planning specialist",
+    system_prompt="You are a planning specialist. Break down goals into actionable steps and delegate to specialists.",
+    tools=["list_tasks", "create_task", "add_attachment"],
+    can_handoff=True
+)
+
+EXECUTOR = AgentDefinition(
+    name="executor",
+    role="Execution specialist",
+    system_prompt="You are an execution specialist. Perform tasks directly using available tools. Focus on correctness and efficiency.",
+    tools=["update_task", "mark_done", "delete_task"],
+    can_handoff=False
+)
+
+orchestrator = SwarmOrchestrator()
+orchestrator.register_agent(RESEARCHER)
+orchestrator.register_agent(PLANNER)
+orchestrator.register_agent(EXECUTOR)
+# A "generalist" fallback agent is assumed to be registered as well
+```
+
+#### Handoff Protocol
+
+Implement an explicit handoff mechanism with a `forward_to_agent` tool:
+
+```python
+def forward_to_agent(
+    agent_name: str,
+    message: str,
+    context: dict | None = None
+) -> dict:
+    """
+    Forward to another agent with full context preservation.
+
+    Use when:
+    - Current agent lacks required tools for task
+    - Task requires different specialization
+    - Agent reached depth/convergence limit
+
+    Returns:
+        Handoff directive with target agent and full message
+    """
+    return {
+        "type": "handoff",
+        "target_agent": agent_name,
+        "message": message,
+        "context": context  # Pass accumulated context
+    }
+
+# Register as tool in agent tool definitions
+HANDOFF_TOOL = ToolDefinition(
+    name="forward_to_agent",
+    description=(
+        "Transfer control to a different agent. "
+        "Use when current agent lacks required tools or specialization. "
+        "Preserves full conversation context for handoff."
+ ), + parameters=ToolParameter( + name="args", + type="object", + required=True, + properties={ + "agent_name": { + "type": "string", + "description": "Target agent name" + }, + "message": { + "type": "string", + "description": "Reason for handoff and accumulated findings" + } + } + ) +) + +# Agent usage +async def researcher_agent(goal: str, context: dict) -> dict: + """Researcher agent with handoff capability.""" + findings = await perform_research(goal) + + # If goal requires planning, handoff to planner + if requires_planning(goal): + return forward_to_agent("planner", f"Research findings: {findings}", context) + + # If goal requires execution, handoff to executor + if requires_execution(goal): + return forward_to_agent("executor", f"Action plan: {findings}", context) + + return {"type": "final", "result": findings} +``` + +#### Convergence and Consensus + +Implement debate protocol for complex decisions: + +```python +class DebateCoordinator: + """Coordinate debate between multiple agents.""" + + def __init__(self): + self.participants: list[str] = [] + self.round: int = 0 + self.max_rounds: int = 3 + + async def coordinate(self, question: str) -> dict: + """Run debate and aggregate results.""" + self.participants = self._select_participants(question) + self.round = 0 + + while self.round < self.max_rounds: + self.round += 1 + + # Get positions from all agents + positions = await self._gather_positions(question) + + # Critique phase: agents critique each other's positions + critiques = await self._gather_critiques(positions) + + # Present critiques to agents for next round + updated_positions = await self._incorporate_critiques(positions, critiques) + + # Check convergence + if self._has_converged(updated_positions): + break + + return self._aggregate_results(updated_positions) + + def _select_participants(self, question: str) -> list[str]: + """Select relevant agents for question.""" + question_lower = question.lower() + + if "github" in question_lower or 
"pr" in question_lower:
+            return ["github_specialist", "code_reviewer"]
+        elif "email" in question_lower or "gmail" in question_lower:
+            return ["gmail_specialist", "email_analyst"]
+        else:
+            return ["generalist", "researcher", "planner"]
+
+    def _has_converged(self, positions: list[dict]) -> bool:
+        """Check if agents have converged."""
+        if len(positions) < 2:
+            return False
+
+        # Check agreement threshold (e.g., 80% agree)
+        positions_text = [p.get("answer") for p in positions]
+        most_common = max(set(positions_text), key=positions_text.count)
+        agreement_ratio = positions_text.count(most_common) / len(positions_text)
+
+        return agreement_ratio >= 0.8
+
+    def _aggregate_results(self, positions: list[dict]) -> dict:
+        """Aggregate debate results."""
+        # Weight votes by confidence (every position gets at least one vote)
+        weighted_votes = []
+        for p in positions:
+            confidence = p.get("confidence", 0.5)
+            weighted_votes.extend([p.get("answer")] * max(1, int(confidence * 10)))
+
+        most_common = max(set(weighted_votes), key=weighted_votes.count)
+
+        return {
+            "type": "consensus",
+            "answer": most_common,
+            "confidence": weighted_votes.count(most_common) / len(weighted_votes),
+            "rounds": self.round
+        }
+```
+
+#### Failure Mode Mitigations
+
+**Supervisor Bottleneck:** Implement output schema constraints
+
+```python
+@dataclass
+class SupervisorOutput:
+    """Constrained output to prevent supervisor context bloat."""
+    status: str  # "success", "partial", "failed"
+    summary: str  # Concise summary only
+    findings: list[str]  # Key findings (not full context)
+    next_actions: list[str]  # Actionable next steps
+
+async def supervisor_delegate(
+    task: str,
+    sub_agents: list[str]
+) -> SupervisorOutput:
+    """Delegate to sub-agents with constrained output."""
+    results = await asyncio.gather(
+        *(run_agent(agent, task) for agent in sub_agents)
+    )
+
+    return SupervisorOutput(
+        status="success",
+        summary=f"Delegated to {len(results)} agents",
+        findings=[r["output"] for r in results if "output" in r],
+        next_actions=["Review findings for action plan"]
+    )
+```
+
+**Divergence Prevention:** Define clear objective boundaries
+
+```python
+@dataclass
+class AgentObjective:
+    """Agent objective with boundaries."""
+    primary_goal: str
+    success_criteria: list[str]  # When to consider complete
+    constraints: list[str]  # What NOT to do
+    handoff_triggers: list[str]  # When to transfer control
+
+def enforce_objectives(agent_run: AgentRun, objective: AgentObjective) -> bool:
+    """Enforce objective boundaries during execution."""
+    current_goal = agent_run.goal.lower()
+
+    # Check for divergence
+    for constraint in objective.constraints:
+        if constraint.lower() in current_goal:
+            logger.warning({
+                "event": "agent_divergence",
+                "agent": agent_run.agent_name,
+                "objective": objective.primary_goal,
+                "constraint": constraint,
+                "goal": agent_run.goal
+            })
+            return False
+
+    return True
+```
+
+---
+
+### Memory Implementation Notes
+
+#### Shared Memory Architecture
+
+Implement hierarchical memory with temporal validity for multi-agent coordination:
+
+```python
+from dataclasses import dataclass, field
+from datetime import datetime, timedelta
+from typing import Optional, Dict, List
+
+@dataclass
+class MemoryFact:
+    """Memory fact with temporal validity."""
+    entity_type: str  # "task", "attachment", "github_pr", "email"
+    entity_id: str
+    property_name: str
+    property_value: str
+    source_agent: str  # Which agent learned this fact
+    valid_from: datetime  # Fact becomes valid at
+    # Fields with defaults must follow the required fields above
+    valid_until: Optional[datetime] = None  # Fact expires at
+    confidence: float = 1.0  # Confidence in fact
+    created_at: datetime = field(default_factory=datetime.now)
+
+class SharedMemory:
+    """Shared memory store for agent coordination."""
+
+    def __init__(self):
+        self.facts: Dict[str, MemoryFact] = {}
+        self.entity_indices: Dict[str, set[str]] = {}
+        self.temporal_indexes: Dict[str, list[str]] = {}
+
+    def store_fact(
+        self,
+        entity_type: str,
+        entity_id: str,
+        property_name: str,
+        property_value: str,
+        source_agent: str,
+        confidence: float = 1.0,
valid_hours: int | None = None + ) -> None: + """Store a memory fact with optional temporal validity.""" + fact_id = f"{entity_type}:{entity_id}:{property_name}" + + valid_from = datetime.now() + valid_until = datetime.now() + timedelta(hours=valid_hours) if valid_hours else None + + fact = MemoryFact( + entity_type=entity_type, + entity_id=entity_id, + property_name=property_name, + property_value=property_value, + source_agent=source_agent, + valid_from=valid_from, + valid_until=valid_until, + confidence=confidence, + created_at=datetime.now() + ) + + self.facts[fact_id] = fact + + # Update entity index + if entity_type not in self.entity_indices: + self.entity_indices[entity_type] = set() + self.entity_indices[entity_type].add(entity_id) + + # Update temporal index + if valid_until: + time_key = valid_until.strftime("%Y-%m-%d") + if time_key not in self.temporal_indexes: + self.temporal_indexes[time_key] = [] + self.temporal_indexes[time_key].append(fact_id) + + logger.info({ + "event": "memory_fact_stored", + "fact_id": fact_id, + "source_agent": source_agent, + "confidence": confidence + }) + + def retrieve_facts( + self, + entity_type: str, + entity_id: str, + as_of: datetime | None = None + ) -> List[MemoryFact]: + """Retrieve facts for entity, respecting temporal validity.""" + as_of = as_of or datetime.now() + prefix = f"{entity_type}:{entity_id}:" + + matching = [ + fact for fact_id, fact in self.facts.items() + if fact_id.startswith(prefix) + and fact.valid_from <= as_of + and (fact.valid_until is None or fact.valid_until > as_of) + ] + + # Sort by confidence and recency + matching.sort(key=lambda f: (f.confidence, f.created_at), reverse=True) + + logger.debug({ + "event": "memory_retrieval", + "entity_type": entity_type, + "entity_id": entity_id, + "facts_found": len(matching), + "as_of": as_of.isoformat() + }) + + return matching + + def temporal_query( + self, + entity_type: str, + query_time: datetime, + time_window_hours: int = 24 + ) -> 
List[MemoryFact]: + """Query memory state as of specific time (time-travel).""" + window_start = query_time - timedelta(hours=time_window_hours) + window_end = query_time + timedelta(hours=time_window_hours) + + matching = [ + fact for fact in self.facts.values() + if fact.entity_type == entity_type + and fact.valid_from <= window_end + and (fact.valid_until is None or fact.valid_until > window_start) + ] + + logger.info({ + "event": "memory_temporal_query", + "entity_type": entity_type, + "query_time": query_time.isoformat(), + "window_start": window_start.isoformat(), + "window_end": window_end.isoformat(), + "facts_found": len(matching) + }) + + return matching + +# Global shared memory instance +shared_memory = SharedMemory() + +# Agent usage +async def researcher_agent(goal: str, context: dict) -> dict: + """Researcher agent stores findings in shared memory.""" + findings = await perform_research(goal) + + # Store in shared memory for other agents + for finding in findings: + shared_memory.store_fact( + entity_type="research_finding", + entity_id=str(hash(finding)), + property_name="content", + property_value=str(finding), + source_agent="researcher", + confidence=0.8, + valid_hours=48 # Facts valid for 48 hours + ) + + return {"type": "final", "result": findings} + +async def executor_agent(goal: str, context: dict) -> dict: + """Executor agent reads from shared memory.""" + task_id = context.get("task_id") + + # Retrieve relevant facts about task + task_facts = shared_memory.retrieve_facts( + entity_type="task", + entity_id=task_id + ) + + return {"type": "final", "result": f"Executing task with: {task_facts}"} +``` + +#### Entity Consistency Tracking + +Track entity identity across sessions and agents: + +```python +class EntityRegistry: + """Maintain entity consistency across sessions.""" + + def __init__(self, shared_memory: SharedMemory): + self.memory = shared_memory + self.entity_references: Dict[str, set[str]] = {} + + def track_reference( + self, + 
session_id: str,
+        entity_type: str,
+        entity_id: str,
+        reference_text: str
+    ) -> None:
+        """Track when an entity is referenced in conversation."""
+        key = f"{session_id}:{entity_type}"
+
+        if key not in self.entity_references:
+            self.entity_references[key] = set()
+
+        self.entity_references[key].add(entity_id)
+
+        # Check if this is a new entity reference
+        existing_facts = self.memory.retrieve_facts(entity_type, entity_id)
+        is_new_entity = len(existing_facts) == 0
+
+        if is_new_entity:
+            logger.info({
+                "event": "entity_first_mention",
+                "session_id": session_id,
+                "entity_type": entity_type,
+                "entity_id": entity_id,
+                "reference": reference_text[:100]
+            })
+
+    def resolve_identity(
+        self,
+        session_id: str,
+        entity_type: str,
+        entity_id: str,
+        potential_matches: list[str]
+    ) -> Optional[str]:
+        """Resolve entity identity when multiple candidates exist."""
+        key = f"{session_id}:{entity_type}"
+        tracked_entities = self.entity_references.get(key, set())
+
+        if not tracked_entities:
+            return None
+
+        # Find best match among tracked and candidate entities
+        best_match = self._find_best_match(entity_id, potential_matches, tracked_entities)
+
+        if best_match and best_match != entity_id:
+            # Link identities: record that this reference resolves to a known entity
+            self.memory.store_fact(
+                entity_type="entity_alias",
+                entity_id=entity_id,
+                property_name="same_as",
+                property_value=best_match,
+                source_agent="system",
+                confidence=0.95,
+                valid_hours=None
+            )
+            return best_match
+
+        return entity_id
+
+    def _find_best_match(
+        self,
+        entity_id: str,
+        potential_matches: list[str],
+        tracked_entities: set[str]
+    ) -> Optional[str]:
+        """Find best match using string similarity."""
+        from difflib import SequenceMatcher
+
+        best_match = None
+        best_ratio = 0.0
+
+        for candidate in tracked_entities | set(potential_matches):
+            ratio = SequenceMatcher(None, candidate, entity_id).ratio()
+
+            if ratio > best_ratio:
+                best_ratio = ratio
+                best_match = candidate
+
+        # Require strong similarity before treating candidates as the same entity
+        return best_match if best_ratio >= 0.8 else None
+```
+
+#### Memory Consolidation
+
+Implement periodic consolidation to prevent unbounded growth:
+
+```python
+class MemoryConsolidator:
+    """Consolidate memory to prevent unbounded growth."""
+
+    def __init__(self, shared_memory: SharedMemory):
+        self.memory = shared_memory
+        self.last_consolidation: Optional[datetime] = None
+        self.consolidation_interval_hours: int = 24
+
+    async def consolidate_if_needed(self) -> dict:
+        """Consolidate memory if interval elapsed."""
+        now = datetime.now()
+
+        if (self.last_consolidation is None or
+                (now - self.last_consolidation).total_seconds() >
+                self.consolidation_interval_hours * 3600):
+
+            logger.info({"event": "memory_consolidation_started"})
+
+            stats = await self._consolidate()
+
+            self.last_consolidation = now
+
+            return stats
+
+        return {"status": "skipped", "reason": "Too recent"}
+
+    async def _consolidate(self) -> dict:
+        """Perform consolidation."""
+        # 1. Remove outdated facts
+        removed = await self._remove_outdated_facts()
+
+        # 2. Merge duplicate facts
+        merged = await self._merge_duplicate_facts()
+
+        # 3. Update validity periods
+        updated = await self._refresh_validity_periods()
+
+        # 4. 
Remove low-confidence facts + pruned = await self._prune_low_confidence() + + return { + "removed_facts": removed, + "merged_facts": merged, + "updated_validity": updated, + "pruned_low_confidence": pruned + } + + async def _remove_outdated_facts(self) -> int: + """Remove facts past validity period.""" + now = datetime.now() + fact_ids_to_remove = [] + + for fact_id, fact in self.memory.facts.items(): + if fact.valid_until and fact.valid_until < now: + fact_ids_to_remove.append(fact_id) + + for fact_id in fact_ids_to_remove: + del self.memory.facts[fact_id] + + logger.info({ + "event": "memory_consolidation", + "action": "removed_outdated", + "count": len(fact_ids_to_remove) + }) + + return len(fact_ids_to_remove) + + async def _merge_duplicate_facts(self) -> int: + """Merge facts about same entity/property.""" + # Group facts by entity:property + from collections import defaultdict + property_groups = defaultdict(list) + + for fact_id, fact in self.memory.facts.items(): + key = f"{fact.entity_type}:{fact.entity_id}:{fact.property_name}" + property_groups[key].append(fact) + + merged_count = 0 + for facts in property_groups.values(): + if len(facts) > 1: + # Keep highest confidence, mark others as superseded + facts.sort(key=lambda f: f.confidence, reverse=True) + best_fact = facts[0] + + for fact in facts[1:]: + if fact.property_value != best_fact.property_value: + # Mark as superseded + self.memory.store_fact( + entity_type=fact.entity_type, + entity_id=fact.entity_id, + property_name="superseded_by", + property_value=best_fact.property_value, + source_agent="consolidation", + confidence=0.5, + valid_hours=None + ) + merged_count += 1 + + logger.info({ + "event": "memory_consolidation", + "action": "merged_duplicates", + "count": merged_count + }) + + return merged_count + + async def _prune_low_confidence(self) -> int: + """Remove facts below confidence threshold.""" + confidence_threshold = 0.3 + + fact_ids_to_remove = [ + fact_id for fact_id, fact in 
self.memory.facts.items() + if fact.confidence < confidence_threshold + ] + + for fact_id in fact_ids_to_remove: + del self.memory.facts[fact_id] + + logger.info({ + "event": "memory_consolidation", + "action": "pruned_low_confidence", + "count": len(fact_ids_to_remove) + }) + + return len(fact_ids_to_remove) +``` + +#### Integration with Context Loading + +Load relevant memories into agent context: + +```python +async def load_relevant_memory( + agent_name: str, + goal: str, + context_budget: int = 2000 +) -> str: + """Load relevant memories into agent context.""" + # Extract entities from goal + entities = extract_entities(goal) + + relevant_facts = [] + tokens_used = 0 + + for entity_type, entity_id in entities: + facts = shared_memory.retrieve_facts(entity_type, entity_id) + + for fact in facts: + fact_tokens = estimate_tokens(str(fact)) + + if tokens_used + fact_tokens <= context_budget: + relevant_facts.append(fact) + tokens_used += fact_tokens + else: + break + + if not relevant_facts: + return "" + + memory_context = "\n\n".join([ + f"[Memory from {fact.source_agent}] {fact.entity_type}:{fact.entity_id} " + f"{fact.property_name}={fact.property_value} " + f"(confidence: {fact.confidence})" + for fact in relevant_facts + ]) + + logger.info({ + "event": "memory_context_loaded", + "agent": agent_name, + "entities_count": len(entities), + "facts_loaded": len(relevant_facts), + "tokens_used": tokens_used, + "tokens_remaining": context_budget - tokens_used + }) + + return memory_context + +def estimate_tokens(text: str) -> int: + """Rough token estimation.""" + return len(text.split()) * 1.3 +``` + +#### Agent Task Summarization + +Store agent summaries in memory for future retrieval: + +```python +@dataclass +class AgentRunSummary: + """Summary of agent run stored in memory.""" + run_id: str + agent_name: str + goal: str + summary: str # Exec summary + key_findings: list[str] # Important discoveries + outcomes: list[str] # Results achieved + tokens_used: int + 
duration_seconds: int + +async def store_agent_summary( + run: AgentRun, + summary: str, + key_findings: list[str], + outcomes: list[str] +) -> None: + """Store agent run summary in shared memory.""" + run_summary = AgentRunSummary( + run_id=run.id, + agent_name=run.agent_name, + goal=run.goal, + summary=summary, + key_findings=key_findings, + outcomes=outcomes, + tokens_used=run.tokens_used or 0, + duration_seconds=(run.finished_at - run.started_at).total_seconds() if run.finished_at else 0 + ) + + # Store as memory facts for later retrieval + shared_memory.store_fact( + entity_type="agent_run", + entity_id=run.id, + property_name="summary", + property_value=summary, + source_agent=run.agent_name, + confidence=0.9, + valid_hours=None + ) + + for i, finding in enumerate(key_findings): + shared_memory.store_fact( + entity_type="agent_run", + entity_id=run.id, + property_name=f"finding_{i}", + property_value=finding, + source_agent=run.agent_name, + confidence=0.8, + valid_hours=None + ) + + for i, outcome in enumerate(outcomes): + shared_memory.store_fact( + entity_type="agent_run", + entity_id=run.id, + property_name=f"outcome_{i}", + property_value=outcome, + source_agent=run.agent_name, + confidence=0.9, + valid_hours=None + ) + + logger.info({ + "event": "agent_summary_stored", + "run_id": run.id, + "agent": run.agent_name, + "key_findings": len(key_findings), + "outcomes": len(outcomes) + }) +``` + +## Acceptance Criteria + +### AC1: Run Lifecycle + +**Success Criteria:** +- [ ] Agent runs can be started, paused, resumed, and canceled. +- [ ] Run status is persisted and queryable. + +### AC2: Concurrency Control + +**Success Criteria:** +- [ ] Concurrency limits prevent excessive parallel runs. +- [ ] Runs back off or queue when limits are reached. + +### AC3: Event Integration + +**Success Criteria:** +- [ ] Run status updates emit events for UI consumption. + +## Test Plan + +### Automated + +- Unit tests for run state transitions. 
+- Integration tests for API start/cancel/status. +- Concurrency tests for max-parallel settings. + +### Manual + +- Start a run, observe status updates, then cancel it. + +## Notes / Risks / Open Questions + +- Decide how much agent context to persist vs summarize. diff --git a/docs/02-implementation/pr-specs/PR-015-agent-ux-panel.md b/docs/02-implementation/pr-specs/PR-015-agent-ux-panel.md new file mode 100644 index 0000000..f8dd70c --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-015-agent-ux-panel.md @@ -0,0 +1,108 @@ +# PR-015: Agent UX Panel (Spec) + +**Status:** Spec Only +**Depends on:** PR-008, PR-003B, PR-013, PR-014 +**Last Reviewed:** 2026-01-01 + +## Goal + +Add a TUI panel for agent runs, tool execution status, and controls. + +## User Value + +- Users can see what the agent is doing and intervene when needed. +- Tool execution is transparent and traceable. + +## References + +- `docs/01-design/DESIGN_TUI.md` +- `docs/01-design/DESIGN_CHAT.md` + +## Scope + +### In + +- Agent panel in the TUI showing run status and recent actions. +- Tool execution timeline with success/failure states. +- Controls to pause, resume, and cancel agent runs. +- Real-time updates via event stream (PR-013). + +### Out + +- Web UI for agent runs. +- Multi-agent visualization dashboards. + +## Mini-Specs + +- New TUI screen/panel showing agent run list and details. +- Status indicator (running/paused/failed/completed) with timestamps. +- Keybindings for pause/resume/cancel actions. +- Inline view of recent tool calls and results. + +## User Stories + +- As a user, I can see when an agent starts, progresses, and finishes. +- As a user, I can pause or cancel a run from the TUI. + +## UX Notes (if applicable) + +- Avoid showing hidden reasoning; show tool actions and status only. + +## Technical Design + +### Architecture + +- TUI panel subscribes to SSE events for agent run updates. +- Agent controls call agent run endpoints (`/api/v1/agents/...`). 
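
The SSE subscription above ultimately reduces to parsing the `id:`/`event:`/`data:` wire format into updates the panel can apply. A minimal, dependency-free sketch (simplified: multi-line `data:` fields are not joined, which is fine for single-line JSON payloads):

```python
def parse_sse_events(raw: str) -> list[dict]:
    """Parse raw SSE text into dicts of the fields each event carried."""
    events: list[dict] = []
    current: dict = {}
    for line in raw.splitlines():
        if line == "":
            # A blank line terminates an event
            if current:
                events.append(current)
                current = {}
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field_name, _, value = line.partition(":")
            current[field_name] = value.lstrip(" ")
    if current:
        events.append(current)
    return events
```

The TUI loop would feed each decoded chunk through this parser and dispatch `agent.run_*` events to the panel's state; the `id` field doubles as the `Last-Event-ID` cursor on reconnect.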
+ +### Data Model / Migrations + +- N/A. + +### API Contract + +- Uses PR-014 run endpoints and PR-013 event stream. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- Do not display or log raw prompts by default. + +### Error Handling + +- If event stream disconnects, show a recoverable error state and retry option. + +## Acceptance Criteria + +### AC1: Agent Panel Rendering + +**Success Criteria:** +- [ ] Agent panel renders and updates in real time. + +### AC2: Tool Execution Visibility + +**Success Criteria:** +- [ ] Tool calls and results appear in the timeline with statuses. + +### AC3: Run Controls + +**Success Criteria:** +- [ ] Pause/resume/cancel actions work from the TUI. + +## Test Plan + +### Automated + +- TUI widget tests for agent panel rendering. +- Integration tests with mocked SSE events and agent API calls. + +### Manual + +- Start an agent run and verify live status updates and controls. + +## Notes / Risks / Open Questions + +- Decide how much tool detail to surface without overwhelming users. diff --git a/docs/02-implementation/pr-specs/PR-016-observability-baseline.md b/docs/02-implementation/pr-specs/PR-016-observability-baseline.md new file mode 100644 index 0000000..ddf2606 --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-016-observability-baseline.md @@ -0,0 +1,113 @@ +# PR-016: Observability Baseline (Spec) + +**Status:** Spec Only +**Depends on:** PR-001 +**Last Reviewed:** 2026-01-01 + +## Goal + +Provide baseline observability with structured logs and a lightweight telemetry +endpoint. + +## User Value + +- Easier debugging of failures and performance issues. +- Clear visibility into DB health and migration status. + +## References + +- `docs/01-design/DESIGN_ARCHITECTURE.md` + +## Scope + +### In + +- JSON structured logging with correlation IDs. +- Request logging middleware for API routes. +- Telemetry endpoint with health and basic metrics. +- Log redaction for secrets and tokens. 
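
The redaction item above can be sketched as a pattern-based scrubber applied before records are emitted. The token patterns here are illustrative only; the real formats depend on which integrations are configured:

```python
import re

# Illustrative secret shapes: emails, common token prefixes, key=value pairs
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<email>"),
    (re.compile(r"\b(gh[pousr]_[A-Za-z0-9]{20,}|xox[baprs]-[A-Za-z0-9-]+)\b"), "<token>"),
    (re.compile(r"(?i)\b(authorization|api[_-]?key|token)\b\s*[:=]\s*(?:Bearer\s+)?\S+"),
     r"\1=<redacted>"),
]

def redact(message: str) -> str:
    """Scrub known secret shapes from a log message before emission."""
    for pattern, replacement in REDACTIONS:
        message = pattern.sub(replacement, message)
    return message
```

Wiring this into the JSON formatter (rather than call sites) keeps redaction mandatory for every log record.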
+ +### Out + +- Full distributed tracing or external telemetry exporters. +- Long-term metrics storage. + +## Mini-Specs + +- Log format includes `request_id`, `event`, `duration_ms`, `status`. +- Middleware assigns request IDs and injects into logs. +- `GET /api/v1/telemetry` returns JSON metrics: + - DB connectivity + - migration version + - event queue size (if PR-013 exists) + - agent run counts (if PR-014 exists) +- Redact tokens and email addresses from logs. + +## User Stories + +- As a developer, I can correlate logs by request ID. +- As a user, I can see a single endpoint with system health metrics. + +## UX Notes (if applicable) + +- N/A. + +## Technical Design + +### Architecture + +- Logging config in `backend/logging.py` (or equivalent) with JSON formatter. +- Middleware attaches `request_id` to context and response headers. + +### Data Model / Migrations + +- N/A. + +### API Contract + +- `GET /api/v1/telemetry` returns JSON metrics. + +### Background Jobs + +- N/A. + +### Security / Privacy + +- Telemetry excludes PII and content payloads. + +### Error Handling + +- Telemetry endpoint degrades gracefully if optional subsystems are missing. + +## Acceptance Criteria + +### AC1: Structured Logging + +**Success Criteria:** +- [ ] Logs are JSON and include `request_id` for API requests. + +### AC2: Telemetry Endpoint + +**Success Criteria:** +- [ ] `/api/v1/telemetry` returns JSON with DB health and migration version. + +### AC3: Redaction + +**Success Criteria:** +- [ ] Tokens and emails are redacted from logs. + +## Test Plan + +### Automated + +- Unit tests for log formatter and redaction. +- Integration tests for telemetry endpoint response shape. + +### Manual + +- Trigger API requests and verify logs contain `request_id`. +- Call `/api/v1/telemetry` and verify metrics payload. + +## Notes / Risks / Open Questions + +- Decide whether to expose telemetry in production by default. 
diff --git a/docs/02-implementation/pr-specs/PR-017-db-config-followups.md b/docs/02-implementation/pr-specs/PR-017-db-config-followups.md new file mode 100644 index 0000000..d4fee70 --- /dev/null +++ b/docs/02-implementation/pr-specs/PR-017-db-config-followups.md @@ -0,0 +1,130 @@ +# PR-017: DB Config Follow-ups (Spec) + +**Status:** Spec Only +**Depends on:** PR-001 +**Last Reviewed:** 2025-12-30 + +## Goal + +Close gaps discovered after PR-001 by tightening directory creation, restore safety, +SQLite FK enforcement, and docs alignment. + +## User Value + +- App data directories are consistent and complete across machines. +- Restore confirmations prevent accidental overwrites while supporting automation. +- SQLite foreign keys are enforced for all connections. +- Troubleshooting docs reflect current behavior. + +## References + +- `docs/02-implementation/pr-specs/PR-001-db-config.md` +- `docs/01-design/DESIGN_DATA.md` +- `docs/02-implementation/MIGRATIONS.md` + +## Scope + +### In + +- Ensure `ensure_app_dirs()` creates the vector store directory (`data/chroma`). +- Keep confirmation when `tgenie db restore` would overwrite an existing DB, + and add a `--yes` flag to skip the prompt for automation. +- Enforce SQLite foreign keys on every SQLAlchemy connection (not just sessions). +- Update `docs/TROUBLESHOOTING.md` to remove PR-001 "planned" wording and align + default DB path guidance with current settings. + +### Out + +- Non-SQLite databases or new backup/retention policies. +- Changes to migration scripts or schema. + +## Mini-Specs + +- Add vector store directory creation to settings dir bootstrap. +- Add a restore confirmation bypass flag for overwrite prompts. +- Add engine-level SQLite FK enforcement hook. +- Align troubleshooting docs with actual DB/config behavior. + +## User Stories + +- As a user, I see my vector store directory created alongside the DB. +- As a user, I must explicitly confirm a restore before it overwrites data. 
+- As a developer, I can rely on SQLite foreign keys for any connection.
+
+## UX Notes (if applicable)
+
+- Restore prompts should be explicit about the target DB path.
+- The `--yes` flag should be consistent with `db reset`.
+
+## Technical Design
+
+### Architecture
+
+- In `Settings.ensure_app_dirs()`, create `data/chroma` using the same root as
+  `database_path`.
+- For SQLite, attach a SQLAlchemy engine `connect` event that runs
+  `PRAGMA foreign_keys=ON` on every new DBAPI connection (sync + async engines).
+- Keep `get_db()` PRAGMA as a secondary guard.
+
+### Data Model / Migrations
+
+- No schema changes.
+
+### API Contract
+
+- `tgenie db restore --in <path>` prompts when the target DB exists, unless
+  `--yes` is provided.
+
+### Background Jobs
+
+- N/A.
+
+### Security / Privacy
+
+- Do not log SQL contents during restore.
+
+### Error Handling
+
+- Restore failures should remain actionable and exit non-zero.
+
+## Acceptance Criteria
+
+### AC1: App Dir Completeness
+
+**Success Criteria:**
+- [ ] `ensure_app_dirs()` creates `data/chroma` under the resolved app data dir.
+- [ ] No directories are created at import time.
+
+### AC2: Restore Confirmation
+
+**Success Criteria:**
+- [ ] `tgenie db restore` prompts for confirmation only when the target DB exists.
+- [ ] `--yes` skips the prompt when an existing DB would be overwritten.
+
+### AC3: SQLite FK Enforcement
+
+**Success Criteria:**
+- [ ] SQLite `PRAGMA foreign_keys=ON` is executed for every engine connection.
+
+### AC4: Docs Alignment
+
+**Success Criteria:**
+- [ ] `docs/TROUBLESHOOTING.md` reflects current DB wiring and default paths.
+
+## Test Plan
+
+### Automated
+
+- Unit: `ensure_app_dirs()` creates `data/chroma`.
+- CLI: `tgenie db restore` prompts only when the DB exists; `--yes` skips.
+- Integration: engine-level FK PRAGMA is applied for a fresh connection.
+
+### Manual
+
+- Run `tgenie db restore --in backup.sql` on an existing DB and confirm prompt.
+- Run `tgenie db restore --in backup.sql` on a missing DB and confirm no prompt. +- Start the API and confirm `~/.taskgenie/data/chroma` is created. + +## Notes / Risks / Open Questions + +- N/A. diff --git a/docs/02-implementation/pr-specs/TEMPLATE.md b/docs/02-implementation/pr-specs/TEMPLATE.md index ff9a526..f503352 100644 --- a/docs/02-implementation/pr-specs/TEMPLATE.md +++ b/docs/02-implementation/pr-specs/TEMPLATE.md @@ -2,7 +2,7 @@ **Status:** Spec Only **Depends on:** PR-XXX -**Last Reviewed:** 2025-12-29 +**Last Reviewed:** YYYY-MM-DD ## Goal @@ -20,14 +20,16 @@ ## Mini-Specs -- Bullet list of concrete deliverables (sub-features) to ship in this PR. +- Concrete deliverables for this PR. ## User Stories -- As a user, I can … +- As a user, I can ... ## UX Notes (if applicable) +- N/A. + ## Technical Design ### Architecture @@ -36,7 +38,9 @@ ### API Contract -### Background Jobs (if applicable) +### Background Jobs + +- N/A. ### Security / Privacy @@ -44,6 +48,16 @@ ## Acceptance Criteria +### AC1: + +**Success Criteria:** +- [ ] ... + +### AC2: + +**Success Criteria:** +- [ ] ... + ## Test Plan ### Automated diff --git a/scripts/check_docs.py b/scripts/check_docs.py index b6f8400..7eed5a6 100644 --- a/scripts/check_docs.py +++ b/scripts/check_docs.py @@ -31,7 +31,12 @@ def check_relative_links(markdown_files: list[Path]) -> list[str]: for md in markdown_files: text = md.read_text(encoding="utf-8") - for raw_target in LINK_RE.findall(text): + + # Remove code blocks to avoid false positives from code examples + code_block_pattern = re.compile(r"```[\s\S]*?```", re.MULTILINE) + text_without_code = code_block_pattern.sub("", text) + + for raw_target in LINK_RE.findall(text_without_code): target = raw_target.strip() if not target or target.startswith("#"): continue
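
The code-block stripping added to `check_docs.py` can be exercised in isolation to confirm that links inside fenced examples no longer trigger link checks. `LINK_RE` below is a simplified stand-in for the script's actual pattern, which is not shown in this diff.

```python
import re

# Simplified stand-in for the script's link pattern: captures the (target) part.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)]+)\)")
# Same shape as the pattern added in check_docs.py.
CODE_BLOCK_RE = re.compile(r"```[\s\S]*?```")

text = (
    "See [the spec](pr-specs/PR-016-observability-baseline.md).\n"
    "```markdown\n"
    "[example only](does/not/exist.md)\n"
    "```\n"
)

# Strip fenced blocks first, then collect link targets from what remains.
links = LINK_RE.findall(CODE_BLOCK_RE.sub("", text))
```

Only the real link survives; the one inside the fence is discarded before the existence check runs.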