Commit 0b9941d

Author: Eric Oliver
Commit message: lots of work on cleanup, SSE and buffering messages.
1 parent 1f14bba, commit 0b9941d

32 files changed: +4920 -187 lines changed

docs/product-stories/api-ask.md

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
# PRD: API Ask Functionality

## Executive Summary

The API needs to support interactive questions from the LLM when running as an API server. This enables the agent to ask clarifying questions during task execution, just as it does in VSCode and CLI modes. The implementation must support asynchronous, multi-channel interaction patterns so that SMS, email, IM, and other communication channels can be added later.

**Critical Design Principle**: This system is designed for long-lived workflow tasks where users operate on their own timeline. Users may go home, take vacations, or travel, and return days or weeks later to continue their tasks. Questions should persist indefinitely and never be deleted by a timeout, and the system should support task state persistence and resumption.

## Problem Statement

Currently, when tasks run via the API server with SSE streaming, the [`askFollowupQuestionTool`](src/core/tools/askFollowupQuestionTool.ts:6) cannot ask questions and wait for responses. The [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:86) returns default values instead of waiting for input, so the agent cannot gather the additional information it needs to complete tasks effectively.

## Goals

### Primary Goals

- **Enable Interactive Tasks**: Allow the LLM to ask questions and receive responses during API task execution
- **Maintain Async Architecture**: Keep ask/answer channels decoupled for future multi-channel support
- **Preserve User Experience**: Ensure the existing [`test-api.js`](test-api.js:1) client works seamlessly
- **Support Blocking Behavior**: Task execution should pause until a response is received

### Secondary Goals

- **Prepare for Multi-Channel**: Design the architecture to support future SMS, email, and IM channels
- **Optional Timeout Handling**: When a timeout is configured (it is disabled by default), questions should time out gracefully if no response is received
- **Question History**: Maintain an audit trail of questions and responses
- **Transport Agnostic**: Support future WebSocket adapters alongside SSE

## User Stories

### Story 1: Basic Ask Question Flow

**As an** API client
**I want** to receive questions from the LLM via SSE
**So that** I can provide additional information to help complete tasks

**Acceptance Criteria:**

- Questions are sent via SSE with a unique question ID
- Questions include the question text and suggested answers
- Question SSE events carry all necessary metadata (timestamp, jobId, questionId)
- Client receives the question immediately when the LLM asks
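
The criteria above can be sketched as a typed event plus a serializer. This is a minimal illustration, not the project's actual schema: the `QuestionEvent` interface, the `suggestions` field name, the named `question` event, and `formatSseEvent` are all assumptions made for the example. Only the metadata fields (timestamp, jobId, questionId) come from the PRD.

```typescript
// Hypothetical event shape; only timestamp, jobId and questionId are named in the PRD.
interface QuestionEvent {
  type: "question";
  questionId: string;
  jobId: string;
  timestamp: string; // ISO 8601
  question: string;
  suggestions: string[]; // suggested answers (field name is an assumption)
}

// Serialize into the SSE wire format: an `event:` line, a `data:` line
// holding JSON, and a blank line that terminates the message.
function formatSseEvent(event: QuestionEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

const frame = formatSseEvent({
  type: "question",
  questionId: "q-1",
  jobId: "job-42",
  timestamp: new Date(0).toISOString(),
  question: "Which color scheme should I use?",
  suggestions: ["light", "dark"],
});
```

Because `JSON.stringify` escapes newlines inside strings, the payload always fits on a single `data:` line.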

### Story 2: Answer Submission

**As an** API client
**I want** to submit answers to questions via HTTP POST
**So that** the LLM can continue task execution with my input

**Acceptance Criteria:**

- New `/api/questions/{questionId}/answer` endpoint accepts POST requests
- Endpoint validates that the question ID exists and is pending
- Answer submission unblocks the waiting task immediately
- Invalid question IDs return appropriate error responses
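
The validation rules above could look roughly like the following framework-independent handler. The PRD only asks for "appropriate error responses", so the specific status codes (400, 404, 409), the `StoredQuestion` shape, and the `handleAnswer` name are assumptions chosen for illustration.

```typescript
type QuestionState = "pending" | "answered" | "expired" | "cancelled";

// Hypothetical stored record for the sketch.
interface StoredQuestion {
  id: string;
  state: QuestionState;
  answer?: string;
}

interface AnswerResult {
  status: number;
  body: { success: boolean; error?: string };
}

// Validate and apply an answer; status codes are assumptions, not PRD requirements.
function handleAnswer(
  store: Map<string, StoredQuestion>,
  questionId: string,
  answer: unknown,
): AnswerResult {
  if (typeof answer !== "string" || answer.trim() === "") {
    return { status: 400, body: { success: false, error: "answer must be a non-empty string" } };
  }
  const question = store.get(questionId);
  if (!question) {
    return { status: 404, body: { success: false, error: "unknown question ID" } };
  }
  if (question.state !== "pending") {
    return { status: 409, body: { success: false, error: `question is ${question.state}` } };
  }
  question.state = "answered";
  question.answer = answer;
  return { status: 200, body: { success: true } };
}
```

A route in `FastifyServer.ts` would call a function like this and map the result onto the HTTP reply; keeping the logic separate from the framework makes it easy to unit test.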

### Story 3: Task Blocking and Resumption

**As a** task execution system
**I want** to pause task execution when questions are asked
**So that** the LLM waits for user input before continuing

**Acceptance Criteria:**

- Task execution blocks when `askFollowupQuestionTool` is invoked
- Task resumes immediately when an answer is received
- Blocking mechanism supports concurrent questions across different jobs
- Task state is preserved during the blocking period
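
One way to get this blocking behavior without tying up a thread is to hand the task a Promise and keep its resolver keyed by question ID. The sketch below assumes that approach; the `PendingQuestions` class and its method names are hypothetical and do not describe the eventual `ApiQuestionManager`.

```typescript
// Hypothetical in-memory registry of unanswered questions.
class PendingQuestions {
  private resolvers = new Map<string, (answer: string) => void>();
  private nextId = 1;

  // Register a question and return a Promise the task can await.
  ask(jobId: string): { questionId: string; answer: Promise<string> } {
    const questionId = `${jobId}-q${this.nextId++}`;
    const answer = new Promise<string>((resolve) => {
      // The executor runs synchronously, so the resolver is stored before ask() returns.
      this.resolvers.set(questionId, resolve);
    });
    return { questionId, answer };
  }

  // Resolve the matching Promise; returns false for unknown or already answered IDs.
  submitAnswer(questionId: string, text: string): boolean {
    const resolve = this.resolvers.get(questionId);
    if (!resolve) return false;
    this.resolvers.delete(questionId);
    resolve(text);
    return true;
  }
}
```

Each job awaits only its own Promise, so questions from different jobs can be outstanding at the same time and answered in any order.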

### Story 4: Test Client Integration

**As a** developer using [`test-api.js`](test-api.js:1)
**I want** to automatically handle questions during task execution
**So that** I can interact with the agent seamlessly

**Acceptance Criteria:**

- Test client detects question SSE events
- Client prompts the user for input via the command line
- Client automatically submits the answer via the POST endpoint
- Question/answer flow integrates smoothly with existing streaming output
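
Detecting question events on the client side amounts to splitting the incoming stream into SSE messages and checking the event name. The parser below is a simplified sketch that assumes questions arrive as a named `question` event and that lines end with `\n`; it ignores `id:` and `retry:` fields and comments. How `test-api.js` actually reads the stream is not described in the PRD, so `parseSseChunk` is a hypothetical helper.

```typescript
interface SseMessage {
  event: string;
  data: string;
}

// Split buffered stream text into complete SSE messages.
// Anything after the last blank line is returned as `rest` for the next chunk.
function parseSseChunk(buffer: string): { messages: SseMessage[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? "";
  const messages: SseMessage[] = [];
  for (const block of blocks) {
    let event = "message"; // default event name when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice("event:".length).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice("data:".length).trimStart());
    }
    if (dataLines.length > 0) messages.push({ event, data: dataLines.join("\n") });
  }
  return { messages, rest };
}
```

A client loop would append each network chunk to `rest`, call `parseSseChunk`, prompt the user for every message whose `event` is `question`, and pass everything else to the existing streaming output.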

### Story 5: Long-lived Question Management

**As a** task execution system
**I want** to support long-lived questions without forced timeouts
**So that** users can work on their own timeline without losing progress

**Acceptance Criteria:**

- Questions persist indefinitely without automatic deletion
- Configurable timeout for optional automated responses (default: disabled)
- Task state can be preserved during long question periods
- Questions remain available even if the SSE connection is lost
- System supports task resumption after extended periods
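
The "disabled by default" timeout can be expressed as a wrapper that leaves the pending answer untouched unless a timeout is explicitly configured. This is a sketch under that assumption; `withOptionalTimeout` and its `fallback` parameter (standing in for the automated response) are hypothetical names.

```typescript
// Wait for an answer; only if timeoutMs is set, fall back to an automated response.
function withOptionalTimeout<T>(
  pending: Promise<T>,
  timeoutMs: number | undefined,
  fallback: T,
): Promise<T> {
  // Default behavior: no timer at all, so the question can wait indefinitely.
  if (timeoutMs === undefined) return pending;

  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    pending.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that the timeout only supplies an automated answer to the waiting task. It does not delete the question, which stays consistent with the rule that questions are never removed automatically.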

### Story 6: Persistent Question State Management

**As an** API server
**I want** to track question state and history with persistence
**So that** I can manage long-lived questions and support task resumption

**Acceptance Criteria:**

- Questions have states: pending, answered, expired (optional), cancelled
- Question store supports concurrent questions across multiple jobs
- Question history is persisted to disk for long-term retention
- Manual cleanup mechanisms for completed questions (no automatic deletion)
- Question state survives server restarts and connection losses
- Integration with existing task state persistence mechanisms
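
The four states listed above suggest a small state machine. The PRD names the states but not the allowed transitions, so the table below is an assumption: `pending` is the only state that can change, and the other three are treated as terminal.

```typescript
type QuestionState = "pending" | "answered" | "expired" | "cancelled";

// Assumed transition table: only pending questions can move to another state.
const allowedTransitions: Record<QuestionState, QuestionState[]> = {
  pending: ["answered", "expired", "cancelled"],
  answered: [],
  expired: [],
  cancelled: [],
};

// Return the new state, or throw if the move is not allowed.
function transition(current: QuestionState, next: QuestionState): QuestionState {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```

Centralizing the rule in one function means the answer endpoint, the optional expiry timer, and manual cleanup all reject the same invalid moves, for example answering a question that was already cancelled.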

## Technical Architecture

### Core Components

#### 1. Question Manager (`ApiQuestionManager`)

- Manages question lifecycle and persistent state
- Generates unique question IDs with long-term stability
- Handles optional expiration (disabled by default)
- Supports concurrent questions across jobs
- Persists question state to disk for durability
- Integrates with task checkpoint/resumption system
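
Durability implies that a question is stored as plain data, since an in-memory Promise cannot be written to disk. The sketch below shows one possible record format and a restore step; the `QuestionRecord` fields, the `version` wrapper, and both function names are assumptions, and the actual file I/O is left out.

```typescript
type QuestionState = "pending" | "answered" | "expired" | "cancelled";

// Hypothetical on-disk record: plain data only, no Promise or resolver.
interface QuestionRecord {
  id: string;
  jobId: string;
  question: string;
  state: QuestionState;
  createdAt: string; // ISO 8601
  answer?: string;
}

// Produce the JSON text that would be written to the question store file.
function serializeQuestions(records: QuestionRecord[]): string {
  return JSON.stringify({ version: 1, questions: records }, null, 2);
}

// After a restart, parse the stored text and return the questions still awaiting an answer.
function restorePending(json: string): QuestionRecord[] {
  const parsed = JSON.parse(json) as { version: number; questions: QuestionRecord[] };
  return parsed.questions.filter((q) => q.state === "pending");
}
```

On startup the manager would create a fresh waiter for each restored record, so a resumed task can await the same question ID it asked before the restart.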

#### 2. Enhanced SSE Output Adapter

- Extends current [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:21) to support long-lived blocking questions
- Implements `IUserInterface.askQuestion()` with Promise-based blocking that can persist across sessions
- Emits question SSE events with unique IDs
- Integrates with Question Manager for persistent question state
- Supports task suspension and resumption when questions are pending

#### 3. Question API Endpoints

- `POST /api/questions/{questionId}/answer` - Submit answer
- `GET /api/questions/{questionId}` - Get question status (optional)
- `GET /api/questions` - List pending questions for job (optional)

#### 4. Enhanced Test Client

- Extend [`test-api.js`](test-api.js:289) to handle question SSE events
- Interactive command-line prompting for questions
- Automatic answer submission via HTTP POST
- Maintain existing streaming behavior

### Data Flow Diagram

```mermaid
sequenceDiagram
    participant LLM
    participant Task
    participant SSEAdapter
    participant QuestionMgr
    participant SSEStream
    participant Client
    participant AnswerAPI

    LLM->>Task: askFollowupQuestionTool()
    Task->>SSEAdapter: askQuestion()
    SSEAdapter->>QuestionMgr: createQuestion()
    QuestionMgr-->>SSEAdapter: questionId + Promise
    SSEAdapter->>SSEStream: emit question event
    SSEStream->>Client: SSE: question event
    Client->>Client: prompt user
    Client->>AnswerAPI: POST /api/questions/{questionId}/answer
    AnswerAPI->>QuestionMgr: submitAnswer()
    QuestionMgr->>QuestionMgr: resolve Promise
    SSEAdapter->>Task: return answer
    Task->>LLM: continue with answer
```

### Integration Points

#### Current System Integration

- **Fastify Server**: Add new question endpoints to [`FastifyServer.ts`](src/api/server/FastifyServer.ts:18)
- **SSE Types**: Extend [`types.ts`](src/api/streaming/types.ts:8) with question event types
- **Task System**: Modify Task creation to use enhanced SSE adapter

#### Future Extension Points

- **Multi-Channel Support**: Question Manager can support multiple output adapters
- **WebSocket Support**: Alternative transport can use same Question Manager
- **External Integrations**: SMS/Email services can integrate via Question Manager API

## Implementation Stories

### Epic 1: Core Question Infrastructure

**Story 1.1**: Create `ApiQuestionManager` class with question lifecycle management
**Story 1.2**: Extend SSE event types and [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:21) for blocking questions
**Story 1.3**: Add question-related API endpoints to [`FastifyServer`](src/api/server/FastifyServer.ts:18)

### Epic 2: Client Integration

**Story 2.1**: Enhance [`test-api.js`](test-api.js:1) to detect and handle question events
**Story 2.2**: Implement interactive command-line question prompting
**Story 2.3**: Add automatic answer submission functionality

### Epic 3: Persistence and Long-lived Task Support

**Story 3.1**: Implement persistent question storage with disk-based state
**Story 3.2**: Add comprehensive error handling for invalid questions/answers
**Story 3.3**: Create question state management with manual cleanup options
**Story 3.4**: Integrate with existing task checkpoint/resumption system

### Epic 4: Testing and Documentation

**Story 4.1**: Create comprehensive unit tests for question functionality
**Story 4.2**: Add integration tests with actual LLM question scenarios
**Story 4.3**: Update API documentation and examples

## Success Metrics

### Functional Metrics

- ✅ LLM can ask questions and receive answers during API task execution
- ✅ [`test-api.js`](test-api.js:1) client handles questions seamlessly
- ✅ Questions persist across server restarts and connection losses
- ✅ Multiple concurrent questions work across different jobs
- ✅ Tasks can be suspended and resumed with pending questions intact

### Performance Metrics

- ⏱️ Question-to-answer roundtrip time < 1 second (excluding user think time)
- 🔄 Support for 100+ concurrent long-lived questions across jobs
- 💾 Memory usage stays roughly constant as questions accumulate, because question state lives in persistent disk-based storage
- 🗃️ Question storage scales to thousands of persistent questions

### User Experience Metrics

- 🎯 Zero breaking changes to existing [`test-api.js`](test-api.js:1) usage
- 📱 Question/answer flow feels natural and responsive
- 🛠️ Clear error messages for debugging question issues

## Risk Mitigation

### Technical Risks

- **Resource Exhaustion**: Disk-based storage and manual cleanup prevent memory issues
- **State Corruption**: Transactional question state updates with rollback capability
- **Race Conditions**: Thread-safe question state management with file locking
- **Long-term Storage**: Monitoring and alerting for question storage growth

### Integration Risks

- **Breaking Changes**: Maintain backward compatibility with existing SSE streaming
- **Client Complexity**: Keep question handling optional, with graceful degradation for clients that ignore question events
- **Performance Impact**: Minimal overhead when questions are not used

## Future Considerations

### Multi-Channel Vision

The architecture is designed to support the eventual multi-channel vision:

- **Output Channels**: SMS, Email, Slack, Teams, etc. for sending questions
- **Input Channels**: Web forms, IM bots, voice interfaces for receiving answers
- **Channel Bridging**: Questions sent via one channel can be answered via another
- **Async by Design**: Long delays between questions and answers are supported
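
Channel bridging works if the question, rather than the channel, owns the answer. The following sketch assumes that design: `OutputChannel` and `BridgedQuestion` are hypothetical names, and the "first answer wins" rule is an assumption the PRD does not state.

```typescript
// Hypothetical outbound channel (SSE, SMS, email, ...).
interface OutputChannel {
  readonly name: string;
  send(questionId: string, text: string): void;
}

class BridgedQuestion {
  readonly questionId: string;
  readonly text: string;
  answer: string | undefined = undefined;
  answeredVia: string | undefined = undefined;

  constructor(questionId: string, text: string) {
    this.questionId = questionId;
    this.text = text;
  }

  // Send the same question over every configured output channel.
  sendTo(channels: OutputChannel[]): string[] {
    channels.forEach((channel) => channel.send(this.questionId, this.text));
    return channels.map((channel) => channel.name);
  }

  // Accept an answer from any input channel; only the first one counts (assumed rule).
  accept(inputChannel: string, answer: string): boolean {
    if (this.answer !== undefined) return false;
    this.answer = answer;
    this.answeredVia = inputChannel;
    return true;
  }
}
```

Because `accept` does not care which channel delivered the question, a question pushed over SSE and SMS can be answered from a web form, and later duplicate answers are rejected.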

### Advanced Features

- **Question Threading**: Related questions grouped together
- **Rich Media**: Questions with images, files, complex UI elements
- **Collaborative Answers**: Multiple users contributing to complex questions
- **Question Analytics**: Insights into question patterns and user behavior

---

_This PRD serves as the foundation for implementing interactive API task execution while maintaining the flexibility needed for future multi-channel communication scenarios._
