| 1 | +# OCR Workaround Implementation - Detailed Agent Instructions |
| 2 | + |
| 3 | +## Project Context |
| 4 | +Faxbot currently has a severe UX limitation: its MCP servers require base64-encoded file content, so large PDFs (>1MB) fail because the encoded payload exhausts the model's token budget. The solution is to implement MCP prompts that extract the text from a PDF on the server side and fax that text, instead of passing the entire file through the conversation as base64. |
| 5 | + |
| 6 | +## Current Project Structure (CRITICAL - DO NOT BREAK THIS) |
| 7 | +``` |
| 8 | +/Users/davidmontgomery/faxbot/ |
| 9 | +├── api/ # Main FastAPI service + Node MCP servers |
| 10 | +│ ├── app/ # FastAPI Python code |
| 11 | +│ ├── mcp_server.js # Node MCP stdio server |
| 12 | +│ ├── mcp_http_server.js # Node MCP HTTP server |
| 13 | +│ ├── mcp_sse_server.js # Node MCP SSE+OAuth server |
| 14 | +│ ├── package.json # Node dependencies |
| 15 | +│ └── setup-mcp.js # MCP installer script |
| 16 | +├── python_mcp/ # Python MCP servers (EXISTING - DO NOT TOUCH) |
| 17 | +│ ├── stdio_server.py |
| 18 | +│ ├── http_server.py |
| 19 | +│ ├── server.py |
| 20 | +│ └── requirements.txt |
| 21 | +├── sdks/ # Client SDKs |
| 22 | +│ ├── node/ |
| 23 | +│ └── python/ |
| 24 | +└── docs/ # Documentation |
| 25 | +``` |
| 26 | + |
| 27 | +## Proposed Structure Addition (NEW - CREATE THIS) |
| 28 | +``` |
| 29 | +/Users/davidmontgomery/faxbot/ |
| 30 | +├── node_mcp/ # NEW: Organized Node MCP servers |
| 31 | +│ ├── src/ # Source code |
| 32 | +│ │ ├── servers/ # Individual server implementations |
| 33 | +│ │ │ ├── stdio.js # Stdio transport server |
| 34 | +│ │ │ ├── http.js # HTTP transport server |
| 35 | +│ │ │ └── sse.js # SSE+OAuth transport server |
| 36 | +│ │ ├── prompts/ # MCP prompt definitions |
| 37 | +│ │ │ ├── faxbot.js # Faxbot prompts (OCR workflow) |
| 38 | +│ │ │ └── index.js # Prompt registry |
| 39 | +│ │ ├── tools/ # MCP tool implementations |
| 40 | +│ │ │ ├── fax-tools.js # send_fax, get_fax_status |
| 41 | +│ │ │ └── pdf-tools.js # extract_pdf_text (internal) |
| 42 | +│ │ └── shared/ # Shared utilities |
| 43 | +│ │ ├── pdf-extractor.js # PDF text extraction logic |
| 44 | +│ │ └── fax-client.js # Faxbot API client |
| 45 | +│ ├── package.json # Dependencies (pdf-parse, etc.) |
| 46 | +│ ├── README.md # Node MCP documentation |
| 47 | +│ └── scripts/ # Build/run scripts |
| 48 | +│ ├── start-stdio.sh |
| 49 | +│ ├── start-http.sh |
| 50 | +│ └── start-sse.sh |
| 51 | +``` |
| 52 | + |
| 53 | +## Implementation Tasks |
| 54 | + |
| 55 | +### Phase 1: Create New Structure |
| 56 | +1. **Create /node_mcp directory structure** |
| 57 | + - All directories and subdirectories as shown above |
| 58 | + - DO NOT modify anything in /api directory yet |
| 59 | + - This is a clean slate implementation |
| 60 | + |
| 61 | +2. **Initialize package.json in /node_mcp** |
| 62 | + ```json |
| 63 | + { |
| 64 | + "name": "faxbot-node-mcp", |
| 65 | + "version": "1.0.0", |
| 66 | + "description": "Node.js MCP servers for Faxbot with OCR workflow support", |
| 67 | + "main": "src/servers/stdio.js", |
| 68 | + "scripts": { |
| 69 | + "stdio": "node src/servers/stdio.js", |
| 70 | + "http": "node src/servers/http.js", |
| 71 | + "sse": "node src/servers/sse.js" |
| 72 | + }, |
| 73 | + "dependencies": { |
| 74 | + "@modelcontextprotocol/sdk": "^1.17.5", |
| 75 | + "axios": "^1.7.0", |
| 76 | + "form-data": "^4.0.0", |
| 77 | + "pdf-parse": "^1.1.1" |
| 80 | + } |
| 81 | + } |
| 82 | + ``` |
| 83 | + |
| 84 | +### Phase 2: Implement Core Utilities |
| 85 | + |
| 86 | +3. **Create /node_mcp/src/shared/pdf-extractor.js** |
| 87 | + - Import pdf-parse library |
| 88 | + - Function: `extractTextFromPDF(filePath)` |
| 89 | + - Function: `extractTextFromBuffer(buffer)` |
| 90 | + - Error handling for corrupted PDFs |
| 91 | + - Return cleaned text (remove excessive whitespace, format nicely) |
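
A minimal sketch of how step 3 might look. It assumes pdf-parse's documented call shape (a function that takes a Buffer and resolves to an object with a `text` field). The `cleanText` helper and the injectable `parse` parameter are illustrative names that do not exist in the current codebase.

```javascript
const fs = require('node:fs/promises');

// Normalize line endings, collapse runs of spaces/tabs, strip spaces around
// newlines, and squeeze three or more newlines down to a single blank line.
function cleanText(raw) {
  return raw
    .replace(/\r\n?/g, '\n')
    .replace(/[ \t]+/g, ' ')
    .replace(/ ?\n ?/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// ASSUMPTION: `parse` is hypothetical dependency injection. It defaults to
// pdf-parse, which is only loaded when no parser is passed in, so the function
// can be tested without the dependency installed.
async function extractTextFromBuffer(buffer, parse = require('pdf-parse')) {
  let data;
  try {
    data = await parse(buffer);
  } catch (err) {
    // Corrupted or unreadable PDFs surface as a single, descriptive error.
    throw new Error(`PDF extraction failed: ${err.message}`);
  }
  return cleanText(data.text || '');
}

async function extractTextFromPDF(filePath, parse) {
  const buffer = await fs.readFile(filePath);
  return extractTextFromBuffer(buffer, parse);
}

module.exports = { cleanText, extractTextFromBuffer, extractTextFromPDF };
```

Keeping `cleanText` separate from the parsing step makes the whitespace rules easy to adjust once real fax output has been reviewed.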
| 92 | + |
| 93 | +4. **Create /node_mcp/src/shared/fax-client.js** |
| 94 | + - Axios-based client for Faxbot API |
| 95 | + - Functions: `sendFax(to, content, type)`, `getFaxStatus(jobId)` |
| 96 | + - Handle API authentication (X-API-Key header) |
| 97 | + - Base URL from environment variable |
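
One possible shape for the client in step 4, sketched under assumptions. The `/fax` and `/fax/{jobId}` routes and the JSON request body are placeholders and should be checked against the real Faxbot API. The `http` parameter is a hypothetical injection point that accepts any axios-like object (`post(url, data, config)`, `get(url, config)`); in production it would be the axios module itself.

```javascript
// Build a small client around an axios-like `http` object.
function createFaxClient({ baseUrl, apiKey, http }) {
  // Every request carries the API key in the X-API-Key header.
  const config = { headers: { 'X-API-Key': apiKey } };

  return {
    // ASSUMPTION: POST {baseUrl}/fax with a JSON body; verify against the API.
    async sendFax(to, content, type) {
      const res = await http.post(`${baseUrl}/fax`, { to, content, type }, config);
      return res.data;
    },

    // ASSUMPTION: GET {baseUrl}/fax/{jobId} returns the job status.
    async getFaxStatus(jobId) {
      const url = `${baseUrl}/fax/${encodeURIComponent(jobId)}`;
      const res = await http.get(url, config);
      return res.data;
    },
  };
}

module.exports = { createFaxClient };
```

Passing the HTTP layer in explicitly lets the tools and prompt workflow be tested against a fake client, without a running Faxbot API.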
| 98 | + |
| 99 | +### Phase 3: Implement MCP Tools |
| 100 | + |
| 101 | +5. **Create /node_mcp/src/tools/pdf-tools.js** |
| 102 | + - MCP tool: `extract_pdf_text` |
| 103 | + - Input schema: `{ filePath: string }` |
| 104 | + - Uses pdf-extractor.js internally |
| 105 | + - This is an INTERNAL tool, not exposed to the user |
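
The tool definition in step 5 could be sketched as below. The `inputSchema` follows the JSON Schema shape that MCP tool listings use. The `validateExtractArgs` helper, and its checks for an absolute path and a `.pdf` extension, are assumptions added for illustration.

```javascript
const path = require('node:path');

const EXTRACT_PDF_TEXT_TOOL = {
  name: 'extract_pdf_text',
  description: 'Extract plain text from a local PDF file',
  inputSchema: {
    type: 'object',
    properties: {
      filePath: { type: 'string', description: 'Path to the PDF file' },
    },
    required: ['filePath'],
  },
};

// Reject missing, relative, or non-PDF paths before touching the filesystem.
// ASSUMPTION: the absolute-path and extension rules are illustrative policy.
function validateExtractArgs(args) {
  const filePath = args && args.filePath;
  if (typeof filePath !== 'string' || filePath.length === 0) {
    throw new Error('filePath is required and must be a string');
  }
  if (!path.isAbsolute(filePath)) {
    throw new Error(`filePath must be absolute: ${filePath}`);
  }
  if (path.extname(filePath).toLowerCase() !== '.pdf') {
    throw new Error(`filePath must point to a .pdf file: ${filePath}`);
  }
  return filePath;
}

module.exports = { EXTRACT_PDF_TEXT_TOOL, validateExtractArgs };
```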
| 106 | + |
| 107 | +6. **Create /node_mcp/src/tools/fax-tools.js** |
| 108 | + - MCP tools: `send_fax`, `get_fax_status` (existing tools) |
| 109 | + - Move logic from current /api/mcp_server.js |
| 110 | + - Clean up and organize properly |
| 111 | + |
| 112 | +### Phase 4: Implement MCP Prompts (THE KEY FEATURE) |
| 113 | + |
| 114 | +7. **Create /node_mcp/src/prompts/faxbot.js** |
| 115 | + ```javascript |
| 116 | + const FAXBOT_PROMPTS = { |
| 117 | + "faxbot_pdf": { |
| 118 | + name: "faxbot_pdf", |
| 119 | + description: "Extract text from PDF and send as fax (avoids base64 token limits)", |
| 120 | + arguments: [ |
| 121 | + { |
| 122 | + name: "pdf_path", |
| 123 | + description: "Absolute path to PDF file", |
| 124 | + required: true |
| 125 | + }, |
| 126 | + { |
| 127 | + name: "to", |
| 128 | + description: "Fax number (E.164 format preferred)", |
| 129 | + required: true |
| 130 | + }, |
| 131 | + { |
| 132 | + name: "header_text", |
| 133 | + description: "Optional header text to add", |
| 134 | + required: false |
| 135 | + } |
| 136 | + ] |
| 137 | + } |
| 138 | + }; |
| 139 | + ``` |
| 140 | + |
| 141 | +8. **Create /node_mcp/src/prompts/index.js** |
| 142 | + - Export all prompt definitions |
| 143 | + - Registry pattern for easy expansion |
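
A rough sketch of the registry in step 8. To stay self-contained it inlines a copy of `FAXBOT_PROMPTS`; in the real layout that object would be imported from `./faxbot.js`. The function names `listPrompts`, `getPromptDefinition`, and `missingArguments` are hypothetical.

```javascript
// Stand-in for the FAXBOT_PROMPTS object defined in ./faxbot.js.
const FAXBOT_PROMPTS = {
  faxbot_pdf: {
    name: 'faxbot_pdf',
    description: 'Extract text from PDF and send as fax (avoids base64 token limits)',
    arguments: [
      { name: 'pdf_path', description: 'Absolute path to PDF file', required: true },
      { name: 'to', description: 'Fax number (E.164 format preferred)', required: true },
      { name: 'header_text', description: 'Optional header text to add', required: false },
    ],
  },
};

// One lookup table for all prompt modules; additional modules would be
// merged in here, e.g. new Map([...Object.entries(A), ...Object.entries(B)]).
const REGISTRY = new Map(Object.entries(FAXBOT_PROMPTS));

function listPrompts() {
  return [...REGISTRY.values()];
}

function getPromptDefinition(name) {
  const prompt = REGISTRY.get(name);
  if (!prompt) throw new Error(`Unknown prompt: ${name}`);
  return prompt;
}

// Return the names of required arguments that are absent or empty in `args`.
function missingArguments(name, args = {}) {
  return getPromptDefinition(name).arguments
    .filter((a) => a.required && (args[a.name] === undefined || args[a.name] === ''))
    .map((a) => a.name);
}

module.exports = { listPrompts, getPromptDefinition, missingArguments };
```

A `Map` is used rather than a plain object so that names such as `constructor` cannot resolve to inherited properties.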
| 144 | + |
| 145 | +### Phase 5: Implement MCP Servers |
| 146 | + |
| 147 | +9. **Create /node_mcp/src/servers/stdio.js** |
| 148 | + - Copy structure from /api/mcp_server.js |
| 149 | + - Add ListPromptsRequestSchema, GetPromptRequestSchema handlers |
| 150 | + - Add prompt execution logic for faxbot_pdf |
| 151 | + - Import tools and prompts from organized modules |
| 152 | + |
| 153 | +10. **Create /node_mcp/src/servers/http.js** |
| 154 | + - Copy structure from /api/mcp_http_server.js |
| 155 | + - Add same prompt support as stdio server |
| 156 | + - Maintain HTTP transport functionality |
| 157 | + |
| 158 | +11. **Create /node_mcp/src/servers/sse.js** |
| 159 | + - Copy structure from /api/mcp_sse_server.js |
| 160 | + - Add same prompt support as stdio server |
| 161 | + - Maintain OAuth2/JWT functionality |
| 162 | + |
| 163 | +### Phase 6: Prompt Execution Logic |
| 164 | + |
| 165 | +12. **Implement faxbot_pdf workflow in each server** |
| 166 | + ```javascript |
| 167 | + async function executeSmartFaxPdf(args) { |
| 168 | + // 1. Validate PDF file exists |
| 169 | + // 2. Extract text using pdf-extractor |
| 170 | + // 3. Format text nicely (add headers if provided) |
| 171 | + // 4. Send as TXT fax using fax-client |
| 172 | + // 5. Return job ID and confirmation |
| 173 | + // 6. Handle errors gracefully (file not found, extraction failed, etc.) |
| 174 | + } |
| 175 | + ``` |
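
The six commented steps above could be filled in roughly as follows. This is a sketch, not the final implementation: `deps` is a hypothetical parameter carrying `extractText` and `sendFax`, so the workflow can be exercised without pdf-parse or a running API, and the returned object shape is an assumption.

```javascript
const fs = require('node:fs/promises');

// deps = { extractText(path) -> Promise<string>, sendFax(to, content, type) -> Promise<{ jobId }> }
async function executeSmartFaxPdf(args, deps) {
  const { pdf_path: pdfPath, to, header_text: headerText } = args;

  // 1. Validate that the PDF exists and is readable.
  try {
    await fs.access(pdfPath);
  } catch {
    throw new Error(`PDF not found or not readable: ${pdfPath}`);
  }

  // 2. Extract text; an empty result usually means there is no text layer.
  const text = await deps.extractText(pdfPath);
  if (!text) {
    throw new Error(`No extractable text in ${pdfPath}`);
  }

  // 3. Prepend the optional header, separated by a blank line.
  const body = headerText ? `${headerText}\n\n${text}` : text;

  // 4. Send as a TXT fax. Errors from the API propagate unchanged (step 6).
  const result = await deps.sendFax(to, body, 'txt');

  // 5. Return the job ID plus a size figure for the confirmation message.
  return { jobId: result.jobId, characters: body.length };
}

module.exports = { executeSmartFaxPdf };
```

Because each server (stdio, HTTP, SSE) needs the same workflow, placing one such function under `src/shared/` and importing it would avoid three diverging copies.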
| 176 | + |
| 177 | +### Phase 7: Testing & Integration |
| 178 | + |
| 179 | +13. **Create test scripts in /node_mcp/scripts/** |
| 180 | + - start-stdio.sh, start-http.sh, start-sse.sh |
| 181 | + - Test with small PDF, large PDF, corrupted PDF |
| 182 | + - Verify text extraction quality |
| 183 | + - Confirm fax transmission works |
| 184 | + |
| 185 | +14. **Update documentation** |
| 186 | + - Create /node_mcp/README.md with usage examples |
| 187 | + - Update main project docs to reference new structure |
| 188 | + - Add migration guide from /api servers to /node_mcp servers |
| 189 | + |
| 190 | +### Phase 8: Migration Path (CAREFUL) |
| 191 | + |
| 192 | +15. **DO NOT DELETE /api MCP servers yet** |
| 193 | + - Keep them as fallback |
| 194 | + - Add deprecation notices |
| 195 | + - Update setup scripts to point to /node_mcp by default |
| 196 | + - Test extensively before considering removal |
| 197 | + |
| 198 | +## Key Implementation Details |
| 199 | + |
| 200 | +### MCP Prompt Handler Structure |
| 201 | +```javascript |
| 202 | +this.server.setRequestHandler(GetPromptRequestSchema, async (request) => { |
| 203 | +  const { name, arguments: args } = request.params; |
| 204 | + |
| 205 | +  switch (name) { |
| 206 | +    case 'faxbot_pdf': { |
| 207 | +      const text = await extractTextFromPDF(args.pdf_path); // Extract text from PDF |
| 208 | +      const result = await sendFax(args.to, text, 'txt'); // Send as text fax |
| 209 | +      return { // Formatted message for LLM |
| 210 | +        messages: [ |
| 211 | +          { |
| 212 | +            role: 'user', |
| 213 | +            content: { |
| 214 | +              type: 'text', |
| 215 | +              text: `Faxbot workflow initiated. PDF "${args.pdf_path}" extracted to ${text.length} characters. Fax job ID: ${result.jobId}` |
| 216 | +            } |
| 217 | +          } |
| 218 | +        ] |
| 219 | +      }; |
| 220 | +    } |
| 221 | +    default: |
| 222 | +      throw new Error(`Unknown prompt: ${name}`); |
| 223 | +  } |
| 224 | +}); |
| 225 | +``` |
| 226 | + |
| 227 | +### Error Handling Requirements |
| 228 | +- File not found: Clear error message with file path |
| 229 | +- PDF extraction failed: Graceful fallback message |
| 230 | +- Fax API errors: Pass through original error |
| 231 | +- Large text extraction: Warn if >100KB of text |
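
The four requirements above might map onto two small helpers, sketched here under assumptions. The `err.stage === 'extract'` tag is a hypothetical convention for marking extraction failures, and the exact message wording is illustrative.

```javascript
// Return a warning string when the UTF-8 size of `text` exceeds `maxBytes`,
// otherwise null. The default mirrors the 100KB threshold.
function sizeWarning(text, maxBytes = 100000) {
  const bytes = Buffer.byteLength(text, 'utf8');
  return bytes > maxBytes
    ? `Extracted text is ${bytes} bytes, above the ${maxBytes}-byte limit`
    : null;
}

// Translate low-level failures into user-facing messages.
function describeError(err, filePath) {
  // File not found: include the path so the user can correct it.
  if (err && err.code === 'ENOENT') {
    return `File not found: ${filePath}`;
  }
  // ASSUMPTION: callers tag extraction failures with err.stage = 'extract'.
  if (err && err.stage === 'extract') {
    return `Could not extract text from ${filePath}; the PDF may be corrupted or contain only images`;
  }
  // Fax API errors (and anything else) pass through with the original message.
  return err && err.message ? err.message : String(err);
}

module.exports = { sizeWarning, describeError };
```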
| 232 | + |
| 233 | +### Environment Variables |
| 234 | +- `FAX_API_URL`: Faxbot API endpoint (default: http://localhost:8080) |
| 235 | +- `API_KEY`: Faxbot API authentication key |
| 236 | +- `MAX_TEXT_SIZE`: Maximum extracted text size in bytes (default: 100000) |
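
A small sketch of reading these variables in one place, using the defaults listed above. The `loadConfig` name and the returned field names are hypothetical, and stripping a trailing slash from the URL is an added convenience rather than a stated requirement.

```javascript
// Read Faxbot MCP settings from an environment-like object.
function loadConfig(env = process.env) {
  const maxTextSize = Number.parseInt(env.MAX_TEXT_SIZE || '100000', 10);
  if (!Number.isInteger(maxTextSize) || maxTextSize <= 0) {
    throw new Error(`MAX_TEXT_SIZE must be a positive integer, got "${env.MAX_TEXT_SIZE}"`);
  }

  return {
    // Strip trailing slashes so callers can append paths such as "/fax".
    faxApiUrl: (env.FAX_API_URL || 'http://localhost:8080').replace(/\/+$/, ''),
    // An empty key is allowed here; the API decides whether it is required.
    apiKey: env.API_KEY || '',
    maxTextSize,
  };
}

module.exports = { loadConfig };
```

Centralizing the lookup keeps endpoints and limits out of the server files, in line with the rule against hardcoding API endpoints.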
| 237 | + |
| 238 | +## Expected User Experience After Implementation |
| 239 | + |
| 240 | +### Before (Broken): |
| 241 | +``` |
| 242 | +User: "Fax report.pdf to +1234567890" |
| 243 | +Claude: "I need to read the file first and encode it as base64..." |
| 244 | +Result: Token limit exceeded, fails |
| 245 | +``` |
| 246 | + |
| 247 | +### After (Working): |
| 248 | +``` |
| 249 | +User: "Faxbot report.pdf to +1234567890" |
| 250 | +Claude: "I'll use the faxbot_pdf workflow to extract text and send it." |
| 251 | +Result: PDF text extracted, sent as text fax, succeeds |
| 252 | +``` |
| 253 | + |
| 254 | +## Critical Success Criteria |
| 255 | +1. **File size handling**: 10MB PDF → ~100KB text (99% reduction) |
| 256 | +2. **Token efficiency**: No base64 encoding in conversation |
| 257 | +3. **Text fidelity**: Extracted text is readable and formatted |
| 258 | +4. **Error resilience**: Graceful failures with helpful messages |
| 259 | +5. **Backward compatibility**: Existing tools still work |
| 260 | +6. **Project structure**: Clean, organized, maintainable code |
| 261 | + |
| 262 | +## What NOT To Do |
| 263 | +- DO NOT modify /api directory during initial implementation |
| 264 | +- DO NOT delete existing MCP servers until new ones are proven |
| 265 | +- DO NOT break existing functionality |
| 266 | +- DO NOT create files in random locations |
| 267 | +- DO NOT ignore error handling |
| 268 | +- DO NOT hardcode file paths or API endpoints |
| 269 | +- DO NOT add unnecessary dependencies |
| 270 | + |
| 271 | +## Deliverables |
| 272 | +1. Complete /node_mcp directory structure |
| 273 | +2. Working MCP servers with prompt support |
| 274 | +3. PDF text extraction functionality |
| 275 | +4. Documentation and examples |
| 276 | +5. Test scripts and validation |
| 277 | +6. Migration guide |
| 278 | + |
| 279 | +This implementation will solve the base64 limitation while maintaining clean project structure and providing a foundation for future MCP prompt workflows. |