| 1 | +# PRD: API Ask Functionality |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +The API needs to support interactive questions from the LLM when running as an API server. This enables the agent to ask clarifying questions during task execution, just like it does in VSCode and CLI modes. The implementation must support asynchronous, multi-channel interaction patterns to eventually support SMS, email, IM, and other communication channels. |
| 6 | + |
| 7 | +**Critical Design Principle**: This system is designed for long-lived workflow tasks where users operate on their own timeline. Users may go home, take vacations, or travel, and return days or weeks later to continue their tasks. Questions should therefore persist indefinitely and never be deleted automatically on timeout, and the system should support task state persistence and resumption. |
| 8 | + |
| 9 | +## Problem Statement |
| 10 | + |
| 11 | +Currently, when tasks run via the API server with SSE streaming, the [`askFollowupQuestionTool`](src/core/tools/askFollowupQuestionTool.ts:6) cannot ask questions and wait for responses. The [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:86) returns default values instead of waiting for user input, so the agent cannot gather the additional information it needs to complete tasks effectively. |
| 12 | + |
| 13 | +## Goals |
| 14 | + |
| 15 | +### Primary Goals |
| 16 | + |
| 17 | +- **Enable Interactive Tasks**: Allow the LLM to ask questions and receive responses during API task execution |
| 18 | +- **Maintain Async Architecture**: Keep ask/answer channels decoupled for future multi-channel support |
| 19 | +- **Preserve User Experience**: Ensure the existing [`test-api.js`](test-api.js:1) client works seamlessly |
| 20 | +- **Support Blocking Behavior**: Task execution should pause until a response is received |
| 21 | + |
| 22 | +### Secondary Goals |
| 23 | + |
| 24 | +- **Prepare for Multi-Channel**: Design architecture to support future SMS, email, IM channels |
| 25 | +- **Optional Timeout Handling**: When a timeout is configured (disabled by default), questions should time out gracefully if no response is received |
| 26 | +- **Question History**: Maintain audit trail of questions and responses |
| 27 | +- **Transport Agnostic**: Support future WebSocket adapters alongside SSE |
| 28 | + |
| 29 | +## User Stories |
| 30 | + |
| 31 | +### Story 1: Basic Ask Question Flow |
| 32 | + |
| 33 | +**As an** API client |
| 34 | +**I want** to receive questions from the LLM via SSE |
| 35 | +**So that** I can provide additional information to help complete tasks |
| 36 | + |
| 37 | +**Acceptance Criteria:** |
| 38 | + |
| 39 | +- Questions are sent via SSE with unique question ID |
| 40 | +- Questions include the question text and suggested answers |
| 41 | +- Question SSE events have all necessary metadata (timestamp, jobId, questionId) |
| 42 | +- Client receives question immediately when LLM asks |
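
The acceptance criteria above can be illustrated with a minimal sketch of a question event and its SSE framing. The `QuestionEvent` field names and the `formatSseFrame` helper are assumptions made for illustration; the event shape actually used by `SSEOutputAdapter` may differ.

```typescript
// Hypothetical shape of a question event; field names are assumptions,
// chosen to cover the metadata listed in the acceptance criteria.
interface QuestionEvent {
  type: "question";
  jobId: string;
  questionId: string;
  timestamp: string; // ISO 8601
  question: string;
  suggestions: string[];
}

// Serialize the event as one SSE frame: an event-name line, a data line
// carrying the JSON payload, and a blank line that terminates the frame.
function formatSseFrame(event: QuestionEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

const exampleFrame = formatSseFrame({
  type: "question",
  jobId: "job-1",
  questionId: "q-1",
  timestamp: "2024-01-01T00:00:00.000Z",
  question: "Which database should the project use?",
  suggestions: ["PostgreSQL", "SQLite"],
});
```

Because `JSON.stringify` emits no raw newlines, the payload fits on a single `data:` line.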
| 43 | + |
| 44 | +### Story 2: Answer Submission |
| 45 | + |
| 46 | +**As an** API client |
| 47 | +**I want** to submit answers to questions via HTTP POST |
| 48 | +**So that** the LLM can continue task execution with my input |
| 49 | + |
| 50 | +**Acceptance Criteria:** |
| 51 | + |
| 52 | +- New `/api/questions/{questionId}/answer` endpoint accepts POST requests |
| 53 | +- Endpoint validates question ID exists and is pending |
| 54 | +- Answer submission unblocks the waiting task immediately |
| 55 | +- Invalid question IDs return appropriate error responses |
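
As a rough sketch of the validation these criteria describe, the function below maps an answer submission to an HTTP-style result. The status codes (404, 409, 400), the `{ answer }` payload shape, and the `handleAnswer` name are assumptions, not part of the PRD; wiring it into a Fastify route is left out.

```typescript
type QuestionState = "pending" | "answered" | "expired" | "cancelled";

interface QuestionRecord {
  questionId: string;
  state: QuestionState;
  answer?: string;
}

interface AnswerResponse {
  status: number;
  body: { ok: boolean; error?: string };
}

// Validate an answer submission against an in-memory store (hypothetical helper).
function handleAnswer(
  store: Map<string, QuestionRecord>,
  questionId: string,
  payload: unknown,
): AnswerResponse {
  const record = store.get(questionId);
  if (!record) {
    return { status: 404, body: { ok: false, error: "Unknown question ID" } };
  }
  if (record.state !== "pending") {
    return { status: 409, body: { ok: false, error: `Question is already ${record.state}` } };
  }
  const answer = (payload as { answer?: unknown } | null)?.answer;
  if (typeof answer !== "string" || answer.trim() === "") {
    return { status: 400, body: { ok: false, error: "Missing answer" } };
  }
  // Mark the question answered; unblocking the waiting task would happen here.
  record.state = "answered";
  record.answer = answer;
  return { status: 200, body: { ok: true } };
}
```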
| 56 | + |
| 57 | +### Story 3: Task Blocking and Resumption |
| 58 | + |
| 59 | +**As a** task execution system |
| 60 | +**I want** to pause task execution when questions are asked |
| 61 | +**So that** the LLM waits for user input before continuing |
| 62 | + |
| 63 | +**Acceptance Criteria:** |
| 64 | + |
| 65 | +- Task execution blocks when `askFollowupQuestionTool` is invoked |
| 66 | +- Task resumes immediately when answer is received |
| 67 | +- Blocking mechanism supports concurrent questions across different jobs |
| 68 | +- Task state is preserved during blocking period |
| 69 | + |
| 70 | +### Story 4: Test Client Integration |
| 71 | + |
| 72 | +**As a** developer using [`test-api.js`](test-api.js:1) |
| 73 | +**I want** to automatically handle questions during task execution |
| 74 | +**So that** I can interact with the agent seamlessly |
| 75 | + |
| 76 | +**Acceptance Criteria:** |
| 77 | + |
| 78 | +- Test client detects question SSE events |
| 79 | +- Client prompts user for input via command line |
| 80 | +- Client automatically submits answer via POST endpoint |
| 81 | +- Question/answer flow integrates smoothly with existing streaming output |
| 82 | + |
| 83 | +### Story 5: Long-lived Question Management |
| 84 | + |
| 85 | +**As a** task execution system |
| 86 | +**I want** to support long-lived questions without forced timeouts |
| 87 | +**So that** users can work on their own timeline without losing progress |
| 88 | + |
| 89 | +**Acceptance Criteria:** |
| 90 | + |
| 91 | +- Questions persist indefinitely without automatic deletion |
| 92 | +- Configurable timeout for optional automated responses (default: disabled) |
| 93 | +- Task state can be preserved during long question periods |
| 94 | +- Questions remain available even if SSE connection is lost |
| 95 | +- System supports task resumption after extended periods |
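
One way to read the "configurable timeout, disabled by default" criterion is sketched below: when no timeout is configured the answer Promise is returned untouched and can stay pending indefinitely; when one is configured, a fallback answer is supplied after the delay. `withOptionalTimeout` and its parameters are hypothetical names, and the fallback-answer behavior is an assumption.

```typescript
// Wrap a pending answer with an optional timeout (hypothetical helper).
// timeoutMs === undefined means "disabled": the question waits indefinitely.
function withOptionalTimeout(
  answer: Promise<string>,
  timeoutMs: number | undefined,
  fallbackAnswer: string,
): Promise<string> {
  if (timeoutMs === undefined) {
    return answer;
  }
  return new Promise<string>((resolve, reject) => {
    // Supply the automated response if the user has not answered in time.
    const timer = setTimeout(() => resolve(fallbackAnswer), timeoutMs);
    answer.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that the timeout only produces an automated response in this sketch; it does not delete the question record, which keeps it consistent with the no-automatic-deletion principle.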
| 96 | + |
| 97 | +### Story 6: Persistent Question State Management |
| 98 | + |
| 99 | +**As an** API server |
| 100 | +**I want** to track question state and history with persistence |
| 101 | +**So that** I can manage long-lived questions and support task resumption |
| 102 | + |
| 103 | +**Acceptance Criteria:** |
| 104 | + |
| 105 | +- Questions have states: pending, answered, expired (optional), cancelled |
| 106 | +- Question store supports concurrent questions across multiple jobs |
| 107 | +- Question history is persisted to disk for long-term retention |
| 108 | +- Manual cleanup mechanisms for completed questions (no automatic deletion) |
| 109 | +- Question state survives server restarts and connection losses |
| 110 | +- Integration with existing task state persistence mechanisms |
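
The question states and persistence criteria can be sketched as a small transition table plus a JSON round trip. Treating `answered`, `expired`, and `cancelled` as terminal states is an assumption, as are the `PersistedQuestion` fields; actual disk I/O and file locking are deliberately omitted.

```typescript
type QuestionState = "pending" | "answered" | "expired" | "cancelled";

// Assumed lifecycle: only pending questions can change state.
const ALLOWED_TRANSITIONS: Record<QuestionState, QuestionState[]> = {
  pending: ["answered", "expired", "cancelled"],
  answered: [],
  expired: [],
  cancelled: [],
};

function canTransition(from: QuestionState, to: QuestionState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical on-disk record; field names are illustrative.
interface PersistedQuestion {
  questionId: string;
  jobId: string;
  question: string;
  state: QuestionState;
  createdAt: string;
  answer?: string;
}

// Serialize records to the JSON text that would be written to disk.
function serializeQuestions(questions: PersistedQuestion[]): string {
  return JSON.stringify(questions);
}

// Restore records after a restart, dropping entries with unknown states.
function restoreQuestions(json: string): PersistedQuestion[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) return [];
  const knownStates = Object.keys(ALLOWED_TRANSITIONS);
  return parsed.filter(
    (q): q is PersistedQuestion =>
      typeof q === "object" &&
      q !== null &&
      typeof q.questionId === "string" &&
      typeof q.state === "string" &&
      knownStates.indexOf(q.state) !== -1,
  );
}
```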
| 111 | + |
| 112 | +## Technical Architecture |
| 113 | + |
| 114 | +### Core Components |
| 115 | + |
| 116 | +#### 1. Question Manager (`ApiQuestionManager`) |
| 117 | + |
| 118 | +- Manages question lifecycle and persistent state |
| 119 | +- Generates unique question IDs with long-term stability |
| 120 | +- Handles optional expiration (disabled by default) |
| 121 | +- Supports concurrent questions across jobs |
| 122 | +- Persists question state to disk for durability |
| 123 | +- Integrates with task checkpoint/resumption system |
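
A minimal in-memory sketch of how such a manager might pair each question with a Promise that resolves when the answer arrives. The method names follow the data flow diagram (`createQuestion`, `submitAnswer`); the signatures, the `listPending` helper, and the counter-based ID scheme are assumptions, and disk persistence is left out.

```typescript
interface PendingQuestion {
  questionId: string;
  jobId: string;
  question: string;
  suggestions: string[];
  resolve: (answer: string) => void;
}

class ApiQuestionManager {
  private pending = new Map<string, PendingQuestion>();
  private counter = 0;

  // Register a question and return its ID plus a Promise the caller can await.
  createQuestion(
    jobId: string,
    question: string,
    suggestions: string[] = [],
  ): { questionId: string; answer: Promise<string> } {
    // Placeholder ID scheme; a real implementation would need durable unique IDs.
    const questionId = `q_${jobId}_${++this.counter}`;
    let resolve!: (answer: string) => void;
    const answer = new Promise<string>((res) => {
      resolve = res;
    });
    this.pending.set(questionId, { questionId, jobId, question, suggestions, resolve });
    return { questionId, answer };
  }

  // Resolve the waiting Promise; returns false for unknown or already-answered IDs.
  submitAnswer(questionId: string, answer: string): boolean {
    const entry = this.pending.get(questionId);
    if (!entry) return false;
    this.pending.delete(questionId);
    entry.resolve(answer);
    return true;
  }

  // List pending question IDs for one job.
  listPending(jobId: string): string[] {
    return Array.from(this.pending.values())
      .filter((q) => q.jobId === jobId)
      .map((q) => q.questionId);
  }
}
```

An adapter's `askQuestion()` could then `await` the returned `answer` Promise, which is what blocks the task until `submitAnswer()` is called from the answer endpoint.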
| 124 | + |
| 125 | +#### 2. Enhanced SSE Output Adapter |
| 126 | + |
| 127 | +- Extends current [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:21) to support long-lived blocking questions |
| 128 | +- Implements `IUserInterface.askQuestion()` with Promise-based blocking that can persist across sessions |
| 129 | +- Emits question SSE events with unique IDs |
| 130 | +- Integrates with Question Manager for persistent question state |
| 131 | +- Supports task suspension and resumption when questions are pending |
| 132 | + |
| 133 | +#### 3. Question API Endpoints |
| 134 | + |
| 135 | +- `POST /api/questions/{questionId}/answer` - Submit answer |
| 136 | +- `GET /api/questions/{questionId}` - Get question status (optional) |
| 137 | +- `GET /api/questions` - List pending questions for job (optional) |
| 138 | + |
| 139 | +#### 4. Enhanced Test Client |
| 140 | + |
| 141 | +- Extend [`test-api.js`](test-api.js:289) to handle question SSE events |
| 142 | +- Interactive command-line prompting for questions |
| 143 | +- Automatic answer submission via HTTP POST |
| 144 | +- Maintain existing streaming behavior |
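
On the client side, the detection and submission steps might look like the sketch below. It assumes question events arrive as JSON on `data:` lines with a `type` of `"question"`, that each chunk contains complete frames, and that the answer body is `{ answer }`; none of this is confirmed by the PRD, and `test-api.js` itself is JavaScript rather than TypeScript.

```typescript
interface ReceivedQuestion {
  type: "question";
  questionId: string;
  question: string;
  suggestions?: string[];
}

// Pull question events out of a chunk of SSE text (assumes complete frames).
function extractQuestions(chunk: string): ReceivedQuestion[] {
  const questions: ReceivedQuestion[] = [];
  for (const frame of chunk.split("\n\n")) {
    for (const line of frame.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      try {
        const parsed = JSON.parse(line.slice("data: ".length));
        if (parsed && parsed.type === "question" && typeof parsed.questionId === "string") {
          questions.push(parsed as ReceivedQuestion);
        }
      } catch {
        // Ignore data lines that are not JSON (for example plain progress text).
      }
    }
  }
  return questions;
}

// Describe the POST request that submits an answer (hypothetical payload shape).
function buildAnswerRequest(baseUrl: string, questionId: string, answer: string) {
  return {
    url: `${baseUrl}/api/questions/${encodeURIComponent(questionId)}/answer`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ answer }),
  };
}
```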
| 145 | + |
| 146 | +### Data Flow Diagram |
| 147 | + |
| 148 | +```mermaid |
| 149 | +sequenceDiagram |
| 150 | + participant LLM |
| 151 | + participant Task |
| 152 | + participant SSEAdapter |
| 153 | + participant QuestionMgr |
| 154 | + participant SSEStream |
| 155 | + participant Client |
| 156 | + participant AnswerAPI |
| 157 | + |
| 158 | + LLM->>Task: askFollowupQuestionTool() |
| 159 | + Task->>SSEAdapter: askQuestion() |
| 160 | + SSEAdapter->>QuestionMgr: createQuestion() |
| 161 | + QuestionMgr-->>SSEAdapter: questionId + Promise |
| 162 | + SSEAdapter->>SSEStream: emit question event |
| 163 | + SSEStream->>Client: SSE: question event |
| 164 | + Client->>Client: prompt user |
| 165 | + Client->>AnswerAPI: POST /api/questions/{id}/answer |
| 166 | + AnswerAPI->>QuestionMgr: submitAnswer() |
| 167 | + QuestionMgr->>QuestionMgr: resolve Promise |
| 168 | + SSEAdapter->>Task: return answer |
| 169 | + Task->>LLM: continue with answer |
| 170 | +``` |
| 171 | + |
| 172 | +### Integration Points |
| 173 | + |
| 174 | +#### Current System Integration |
| 175 | + |
| 176 | +- **Fastify Server**: Add new question endpoints to [`FastifyServer.ts`](src/api/server/FastifyServer.ts:18) |
| 177 | +- **SSE Types**: Extend [`types.ts`](src/api/streaming/types.ts:8) with question event types |
| 178 | +- **Task System**: Modify Task creation to use enhanced SSE adapter |
| 179 | + |
| 180 | +#### Future Extension Points |
| 181 | + |
| 182 | +- **Multi-Channel Support**: Question Manager can support multiple output adapters |
| 183 | +- **WebSocket Support**: Alternative transport can use same Question Manager |
| 184 | +- **External Integrations**: SMS/Email services can integrate via Question Manager API |
| 185 | + |
| 186 | +## Implementation Stories |
| 187 | + |
| 188 | +### Epic 1: Core Question Infrastructure |
| 189 | + |
| 190 | +**Story 1.1**: Create `ApiQuestionManager` class with question lifecycle management |
| 191 | +**Story 1.2**: Extend SSE event types and [`SSEOutputAdapter`](src/api/streaming/SSEOutputAdapter.ts:21) for blocking questions |
| 192 | +**Story 1.3**: Add question-related API endpoints to [`FastifyServer`](src/api/server/FastifyServer.ts:18) |
| 193 | + |
| 194 | +### Epic 2: Client Integration |
| 195 | + |
| 196 | +**Story 2.1**: Enhance [`test-api.js`](test-api.js:1) to detect and handle question events |
| 197 | +**Story 2.2**: Implement interactive command-line question prompting |
| 198 | +**Story 2.3**: Add automatic answer submission functionality |
| 199 | + |
| 200 | +### Epic 3: Persistence and Long-lived Task Support |
| 201 | + |
| 202 | +**Story 3.1**: Implement persistent question storage with disk-based state |
| 203 | +**Story 3.2**: Add comprehensive error handling for invalid questions/answers |
| 204 | +**Story 3.3**: Create question state management with manual cleanup options |
| 205 | +**Story 3.4**: Integrate with existing task checkpoint/resumption system |
| 206 | + |
| 207 | +### Epic 4: Testing and Documentation |
| 208 | + |
| 209 | +**Story 4.1**: Create comprehensive unit tests for question functionality |
| 210 | +**Story 4.2**: Add integration tests with actual LLM question scenarios |
| 211 | +**Story 4.3**: Update API documentation and examples |
| 212 | + |
| 213 | +## Success Metrics |
| 214 | + |
| 215 | +### Functional Metrics |
| 216 | + |
| 217 | +- ✅ LLM can ask questions and receive answers during API task execution |
| 218 | +- ✅ [`test-api.js`](test-api.js:1) client handles questions seamlessly |
| 219 | +- ✅ Questions persist across server restarts and connection losses |
| 220 | +- ✅ Multiple concurrent questions work across different jobs |
| 221 | +- ✅ Tasks can be suspended and resumed with pending questions intact |
| 222 | + |
| 223 | +### Performance Metrics |
| 224 | + |
| 225 | +- ⏱️ Question-to-answer roundtrip time < 1 second (excluding user think time) |
| 226 | +- 🔄 Support for 100+ concurrent long-lived questions across jobs |
| 227 | +- 💾 Memory usage remains constant with persistent disk-based storage |
| 228 | +- 🗃️ Question storage scales to thousands of persistent questions |
| 229 | + |
| 230 | +### User Experience Metrics |
| 231 | + |
| 232 | +- 🎯 Zero breaking changes to existing [`test-api.js`](test-api.js:1) usage |
| 233 | +- 📱 Question/answer flow feels natural and responsive |
| 234 | +- 🛠️ Clear error messages for debugging question issues |
| 235 | + |
| 236 | +## Risk Mitigation |
| 237 | + |
| 238 | +### Technical Risks |
| 239 | + |
| 240 | +- **Resource Exhaustion**: Disk-based storage and manual cleanup prevent memory issues |
| 241 | +- **State Corruption**: Transactional question state updates with rollback capability |
| 242 | +- **Race Conditions**: Concurrency-safe question state management with file locking |
| 243 | +- **Long-term Storage**: Monitoring and alerting for question storage growth |
| 244 | + |
| 245 | +### Integration Risks |
| 246 | + |
| 247 | +- **Breaking Changes**: Maintain backward compatibility with existing SSE streaming |
| 248 | +- **Client Complexity**: Keep question handling optional, with graceful degradation for clients that do not handle question events |
| 249 | +- **Performance Impact**: Minimal overhead when questions are not used |
| 250 | + |
| 251 | +## Future Considerations |
| 252 | + |
| 253 | +### Multi-Channel Vision |
| 254 | + |
| 255 | +The architecture is designed to support the eventual multi-channel vision: |
| 256 | + |
| 257 | +- **Output Channels**: SMS, Email, Slack, Teams, etc. for sending questions |
| 258 | +- **Input Channels**: Web forms, IM bots, voice interfaces for receiving answers |
| 259 | +- **Channel Bridging**: Questions sent via one channel can be answered via another |
| 260 | +- **Async by Design**: Long delays between questions and answers are supported |
| 261 | + |
| 262 | +### Advanced Features |
| 263 | + |
| 264 | +- **Question Threading**: Related questions grouped together |
| 265 | +- **Rich Media**: Questions with images, files, complex UI elements |
| 266 | +- **Collaborative Answers**: Multiple users contributing to complex questions |
| 267 | +- **Question Analytics**: Insights into question patterns and user behavior |
| 268 | + |
| 269 | +--- |
| 270 | + |
| 271 | +_This PRD serves as the foundation for implementing interactive API task execution while maintaining the flexibility needed for future multi-channel communication scenarios._ |