Commit c62ced2

docs: enhance MCP integration documentation with detailed transport options matrix and critical limitations regarding base64 file handling; update troubleshooting guide for common MCP issues
1 parent c0aee92 commit c62ced2

20 files changed

+2969
-69
lines changed

OCR_WORKAROUND_IMPLEMENTATION.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
1+
# OCR Workaround Implementation - Detailed Agent Instructions
2+
3+
## Project Context
4+
Faxbot currently has a severe UX limitation: its MCP servers require file content to be passed as base64, so large PDFs (>1MB) fail because the encoded payload consumes too many tokens. The solution is to implement MCP prompts that extract the text from a PDF on the server side and fax that text, instead of sending the entire file as base64.
5+
6+
## Current Project Structure (CRITICAL - DO NOT BREAK THIS)
7+
```
/Users/davidmontgomery/faxbot/
├── api/                      # Main FastAPI service + Node MCP servers
│   ├── app/                  # FastAPI Python code
│   ├── mcp_server.js         # Node MCP stdio server
│   ├── mcp_http_server.js    # Node MCP HTTP server
│   ├── mcp_sse_server.js     # Node MCP SSE+OAuth server
│   ├── package.json          # Node dependencies
│   └── setup-mcp.js          # MCP installer script
├── python_mcp/               # Python MCP servers (EXISTING - DO NOT TOUCH)
│   ├── stdio_server.py
│   ├── http_server.py
│   ├── server.py
│   └── requirements.txt
├── sdks/                     # Client SDKs
│   ├── node/
│   └── python/
└── docs/                     # Documentation
```
26+
27+
## Proposed Structure Addition (NEW - CREATE THIS)
28+
```
/Users/davidmontgomery/faxbot/
├── node_mcp/                     # NEW: Organized Node MCP servers
│   ├── src/                      # Source code
│   │   ├── servers/              # Individual server implementations
│   │   │   ├── stdio.js          # Stdio transport server
│   │   │   ├── http.js           # HTTP transport server
│   │   │   └── sse.js            # SSE+OAuth transport server
│   │   ├── prompts/              # MCP prompt definitions
│   │   │   ├── faxbot.js         # Faxbot prompts (OCR workflow)
│   │   │   └── index.js          # Prompt registry
│   │   ├── tools/                # MCP tool implementations
│   │   │   ├── fax-tools.js      # send_fax, get_fax_status
│   │   │   └── pdf-tools.js      # extract_pdf_text (internal)
│   │   └── shared/               # Shared utilities
│   │       ├── pdf-extractor.js  # PDF text extraction logic
│   │       └── fax-client.js     # Faxbot API client
│   ├── package.json              # Dependencies (pdf-parse, etc.)
│   ├── README.md                 # Node MCP documentation
│   └── scripts/                  # Build/run scripts
│       ├── start-stdio.sh
│       ├── start-http.sh
│       └── start-sse.sh
```
52+
53+
## Implementation Tasks
54+
55+
### Phase 1: Create New Structure
56+
1. **Create /node_mcp directory structure**
57+
- All directories and subdirectories as shown above
58+
- DO NOT modify anything in /api directory yet
59+
- This is a clean slate implementation
60+
61+
2. **Initialize package.json in /node_mcp**
62+
```json
{
  "name": "faxbot-node-mcp",
  "version": "1.0.0",
  "description": "Node.js MCP servers for Faxbot with OCR workflow support",
  "main": "src/servers/stdio.js",
  "scripts": {
    "stdio": "node src/servers/stdio.js",
    "http": "node src/servers/http.js",
    "sse": "node src/servers/sse.js"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.17.5",
    "axios": "^1.7.0",
    "form-data": "^4.0.0",
    "pdf-parse": "^1.1.1"
  }
}
```
83+
84+
### Phase 2: Implement Core Utilities
85+
86+
3. **Create /node_mcp/src/shared/pdf-extractor.js**
87+
- Import pdf-parse library
88+
- Function: `extractTextFromPDF(filePath)`
89+
- Function: `extractTextFromBuffer(buffer)`
90+
- Error handling for corrupted PDFs
91+
- Return cleaned text (remove excessive whitespace, format nicely)
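
A minimal sketch of how `pdf-extractor.js` could look, assuming CommonJS and the `pdf-parse` call style `pdf(buffer)` resolving to an object with a `text` field. The `parse` parameter and the `cleanExtractedText` helper are hypothetical additions so the logic can be exercised without the library installed. Note that `pdf-parse` reads a PDF's embedded text layer and does not perform OCR, so image-only scans would likely come back empty.

```javascript
const fs = require('node:fs/promises');

// Hypothetical helper: normalize the whitespace noise typical of PDF text layers.
function cleanExtractedText(raw) {
  return raw
    .replace(/\r\n?/g, '\n')     // normalize line endings
    .replace(/[ \t]+/g, ' ')     // collapse runs of spaces and tabs
    .replace(/ *\n */g, '\n')    // trim spaces around line breaks
    .replace(/\n{3,}/g, '\n\n')  // keep at most one blank line in a row
    .trim();
}

// `parse` defaults to pdf-parse; it is only required when no parser is injected.
async function extractTextFromBuffer(buffer, parse = require('pdf-parse')) {
  let data;
  try {
    data = await parse(buffer);
  } catch (err) {
    // Corrupted or unsupported PDFs end up here.
    throw new Error(`PDF extraction failed: ${err.message}`);
  }
  return cleanExtractedText(data.text || '');
}

async function extractTextFromPDF(filePath, parse) {
  const buffer = await fs.readFile(filePath);
  return extractTextFromBuffer(buffer, parse);
}

module.exports = { cleanExtractedText, extractTextFromBuffer, extractTextFromPDF };
```

Injecting the parser keeps the cleaning and error-wrapping logic testable on its own; callers in the servers would simply omit the second argument.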
92+
93+
4. **Create /node_mcp/src/shared/fax-client.js**
94+
- Axios-based client for Faxbot API
95+
- Functions: `sendFax(to, content, type)`, `getFaxStatus(jobId)`
96+
- Handle API authentication (X-API-Key header)
97+
- Base URL from environment variable
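
One possible shape for `fax-client.js` is sketched here. To stay dependency-free it swaps Axios for Node's built-in `fetch`, `FormData`, and `Blob` (Node 18+); an Axios version would follow the same structure. The endpoints `POST /fax` and `GET /fax/{id}`, the form field names `to` and `file`, and the `createFaxClient` factory are assumptions that should be checked against the FastAPI code in /api/app.

```javascript
// Hypothetical factory; `fetchImpl` is injectable so the client can be tested offline.
function createFaxClient({
  baseUrl = process.env.FAX_API_URL || 'http://localhost:8080',
  apiKey = process.env.API_KEY || '',
  fetchImpl = fetch,
} = {}) {
  // Only send the auth header when a key is configured.
  const headers = apiKey ? { 'X-API-Key': apiKey } : {};

  async function parseResponse(res) {
    if (!res.ok) {
      // Pass the API's own error text through to the caller.
      throw new Error(`Fax API error ${res.status}: ${await res.text()}`);
    }
    return res.json();
  }

  return {
    // `type` is 'txt' or 'pdf'; content is a string or Buffer.
    async sendFax(to, content, type = 'txt') {
      const mime = type === 'pdf' ? 'application/pdf' : 'text/plain';
      const form = new FormData();
      form.append('to', to);
      form.append('file', new Blob([content], { type: mime }), `fax.${type}`);
      const res = await fetchImpl(`${baseUrl}/fax`, { method: 'POST', headers, body: form });
      return parseResponse(res);
    },

    async getFaxStatus(jobId) {
      const res = await fetchImpl(`${baseUrl}/fax/${encodeURIComponent(jobId)}`, { headers });
      return parseResponse(res);
    },
  };
}

module.exports = { createFaxClient };
```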
98+
99+
### Phase 3: Implement MCP Tools
100+
101+
5. **Create /node_mcp/src/tools/pdf-tools.js**
102+
- MCP tool: `extract_pdf_text`
103+
- Input schema: `{ filePath: string }`
104+
- Uses pdf-extractor.js internally
105+
- This is an INTERNAL tool, not exposed to the user
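
As a rough illustration, the tool definition and handler might look like the following. The `inputSchema` is ordinary JSON Schema as MCP tools use it; the handler name `handleExtractPdfText` and the injected `extractTextFromPDF` argument are hypothetical, chosen so the sketch runs without the extractor module.

```javascript
// Tool metadata in the shape MCP servers return from a tools/list request.
const EXTRACT_PDF_TEXT_TOOL = {
  name: 'extract_pdf_text',
  description: 'Extract the text layer from a local PDF file',
  inputSchema: {
    type: 'object',
    properties: {
      filePath: { type: 'string', description: 'Absolute path to the PDF file' },
    },
    required: ['filePath'],
  },
};

// Hypothetical handler; the extractor is passed in rather than imported.
async function handleExtractPdfText(args, extractTextFromPDF) {
  if (!args || typeof args.filePath !== 'string' || args.filePath.length === 0) {
    throw new Error('extract_pdf_text requires a non-empty string "filePath"');
  }
  const text = await extractTextFromPDF(args.filePath);
  // MCP tool results carry a `content` array of typed parts.
  return { content: [{ type: 'text', text }] };
}

module.exports = { EXTRACT_PDF_TEXT_TOOL, handleExtractPdfText };
```

Keeping the tool internal could mean simply leaving `EXTRACT_PDF_TEXT_TOOL` out of the list returned to clients while still calling the handler from the prompt workflow.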
106+
107+
6. **Create /node_mcp/src/tools/fax-tools.js**
108+
- MCP tools: `send_fax`, `get_fax_status` (existing tools)
109+
- Move logic from current /api/mcp_server.js
110+
- Clean up and organize properly
111+
112+
### Phase 4: Implement MCP Prompts (THE KEY FEATURE)
113+
114+
7. **Create /node_mcp/src/prompts/faxbot.js**
115+
```javascript
// Prompt definitions in the shape returned by an MCP prompts/list request.
const FAXBOT_PROMPTS = {
  "faxbot_pdf": {
    name: "faxbot_pdf",
    description: "Extract text from PDF and send as fax (avoids base64 token limits)",
    arguments: [
      {
        name: "pdf_path",
        description: "Absolute path to PDF file",
        required: true
      },
      {
        name: "to",
        description: "Fax number (E.164 format preferred)",
        required: true
      },
      {
        name: "header_text",
        description: "Optional header text to add",
        required: false
      }
    ]
  }
};

// CommonJS export (package.json above does not set "type": "module").
module.exports = { FAXBOT_PROMPTS };
```
140+
141+
8. **Create /node_mcp/src/prompts/index.js**
142+
- Export all prompt definitions
143+
- Registry pattern for easy expansion
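
The registry could be as small as the sketch below. The function names `listPrompts`, `getPrompt`, and `missingArguments` are hypothetical, and the prompt definition is inlined (in trimmed form) only to keep the example self-contained; the real module would import it from `./faxbot.js`.

```javascript
// Inlined stand-in for the definitions exported by ./faxbot.js.
const PROMPTS = {
  faxbot_pdf: {
    name: 'faxbot_pdf',
    description: 'Extract text from PDF and send as fax (avoids base64 token limits)',
    arguments: [
      { name: 'pdf_path', description: 'Absolute path to PDF file', required: true },
      { name: 'to', description: 'Fax number (E.164 format preferred)', required: true },
      { name: 'header_text', description: 'Optional header text to add', required: false },
    ],
  },
};

// All prompt definitions, e.g. for a prompts/list response.
function listPrompts() {
  return Object.values(PROMPTS);
}

// Look up one definition; own-property check avoids matching names like "constructor".
function getPrompt(name) {
  if (!Object.prototype.hasOwnProperty.call(PROMPTS, name)) {
    throw new Error(`Unknown prompt: ${name}`);
  }
  return PROMPTS[name];
}

// Names of required arguments that the caller did not supply.
function missingArguments(name, args = {}) {
  return getPrompt(name)
    .arguments.filter((a) => a.required && (args[a.name] === undefined || args[a.name] === ''))
    .map((a) => a.name);
}

module.exports = { listPrompts, getPrompt, missingArguments };
```

Adding a new workflow later would then only require merging another definitions object into `PROMPTS`.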
144+
145+
### Phase 5: Implement MCP Servers
146+
147+
9. **Create /node_mcp/src/servers/stdio.js**
148+
- Copy structure from /api/mcp_server.js
149+
- Add ListPromptsRequestSchema, GetPromptRequestSchema handlers
150+
- Add prompt execution logic for faxbot_pdf
151+
- Import tools and prompts from organized modules
152+
153+
10. **Create /node_mcp/src/servers/http.js**
154+
- Copy structure from /api/mcp_http_server.js
155+
- Add same prompt support as stdio server
156+
- Maintain HTTP transport functionality
157+
158+
11. **Create /node_mcp/src/servers/sse.js**
159+
- Copy structure from /api/mcp_sse_server.js
160+
- Add same prompt support as stdio server
161+
- Maintain OAuth2/JWT functionality
162+
163+
### Phase 6: Prompt Execution Logic
164+
165+
12. **Implement faxbot_pdf workflow in each server**
166+
```javascript
async function executeSmartFaxPdf(args) {
  // 1. Validate PDF file exists
  // 2. Extract text using pdf-extractor
  // 3. Format text nicely (add headers if provided)
  // 4. Send as TXT fax using fax-client
  // 5. Return job ID and confirmation
  // 6. Handle errors gracefully (file not found, extraction failed, etc.)
}
```
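
The six steps above could be filled in roughly as follows. This is a sketch under stated assumptions, not the final implementation: the second parameter (an object carrying `extractText` and `sendFax`) and the `{ ok, ... }` result shape are hypothetical, and `result.jobId` follows the field name used elsewhere in this document.

```javascript
const fs = require('node:fs/promises');

// Dependencies are injected so the workflow can be tested without a PDF parser or API.
async function executeSmartFaxPdf(args, { extractText, sendFax }) {
  const { pdf_path: pdfPath, to, header_text: headerText } = args;

  // 1. Validate PDF file exists
  try {
    await fs.access(pdfPath);
  } catch {
    return { ok: false, error: `File not found: ${pdfPath}` };
  }

  // 2. Extract text using pdf-extractor
  let text;
  try {
    text = await extractText(pdfPath);
  } catch (err) {
    return { ok: false, error: `Could not extract text from ${pdfPath}: ${err.message}` };
  }
  if (!text.trim()) {
    return { ok: false, error: `No extractable text found in ${pdfPath}` };
  }

  // 3. Format text (prepend the optional header)
  const body = headerText ? `${headerText}\n\n${text}` : text;

  // 4. Send as TXT fax, 5. return job ID and confirmation
  try {
    const result = await sendFax(to, body, 'txt');
    return { ok: true, jobId: result.jobId, characters: body.length };
  } catch (err) {
    // 6. Pass the original fax API error through
    return { ok: false, error: err.message };
  }
}

module.exports = { executeSmartFaxPdf };
```

Returning a result object instead of throwing lets each server turn failures into a readable prompt message without its own try/catch.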
176+
177+
### Phase 7: Testing & Integration
178+
179+
13. **Create test scripts in /node_mcp/scripts/**
180+
- start-stdio.sh, start-http.sh, start-sse.sh
181+
- Test with small PDF, large PDF, corrupted PDF
182+
- Verify text extraction quality
183+
- Confirm fax transmission works
184+
185+
14. **Update documentation**
186+
- Create /node_mcp/README.md with usage examples
187+
- Update main project docs to reference new structure
188+
- Add migration guide from /api servers to /node_mcp servers
189+
190+
### Phase 8: Migration Path (CAREFUL)
191+
192+
15. **DO NOT DELETE /api MCP servers yet**
193+
- Keep them as fallback
194+
- Add deprecation notices
195+
- Update setup scripts to point to /node_mcp by default
196+
- Test extensively before considering removal
197+
198+
## Key Implementation Details
199+
200+
### MCP Prompt Handler Structure
201+
```javascript
// At the top of the server module (CommonJS assumed):
const { GetPromptRequestSchema } = require('@modelcontextprotocol/sdk/types.js');

// Inside the server class, where this.server is the MCP Server instance:
this.server.setRequestHandler(GetPromptRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  switch (name) {
    case 'faxbot_pdf': {
      // Extract text from PDF
      const text = await extractTextFromPDF(args.pdf_path);
      // Send as text fax
      const result = await sendFax(args.to, text, 'txt');
      // Return formatted message for LLM
      return {
        messages: [
          {
            role: 'user',
            content: {
              type: 'text',
              text: `Faxbot workflow initiated. PDF "${args.pdf_path}" extracted to ${text.length} characters. Fax job ID: ${result.jobId}`
            }
          }
        ]
      };
    }
    default:
      // Fail loudly on unknown prompt names instead of returning undefined
      throw new Error(`Unknown prompt: ${name}`);
  }
});
```
226+
227+
### Error Handling Requirements
228+
- File not found: Clear error message with file path
229+
- PDF extraction failed: Graceful fallback message
230+
- Fax API errors: Pass through original error
231+
- Large text extraction: Warn if >100KB of text
232+
233+
### Environment Variables
234+
- `FAX_API_URL`: Faxbot API endpoint (default: http://localhost:8080)
235+
- `API_KEY`: Faxbot API authentication key
236+
- `MAX_TEXT_SIZE`: Maximum extracted text size in bytes (default: 100000)
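
A small sketch of reading these variables with their documented defaults, plus the size check behind the ">100KB" warning. The names `loadConfig` and `exceedsMaxTextSize` are hypothetical; the size is measured in UTF-8 bytes because `MAX_TEXT_SIZE` is specified in bytes.

```javascript
// Read configuration from an env-like object, applying the defaults listed above.
function loadConfig(env = process.env) {
  const maxTextSize = Number.parseInt(env.MAX_TEXT_SIZE || '100000', 10);
  if (!Number.isInteger(maxTextSize) || maxTextSize <= 0) {
    throw new Error(`MAX_TEXT_SIZE must be a positive integer, got "${env.MAX_TEXT_SIZE}"`);
  }
  return {
    // Strip trailing slashes so paths can be appended safely.
    faxApiUrl: (env.FAX_API_URL || 'http://localhost:8080').replace(/\/+$/, ''),
    apiKey: env.API_KEY || '',
    maxTextSize,
  };
}

// True when the extracted text is larger than the configured byte limit.
function exceedsMaxTextSize(text, config) {
  return Buffer.byteLength(text, 'utf8') > config.maxTextSize;
}

module.exports = { loadConfig, exceedsMaxTextSize };
```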
237+
238+
## Expected User Experience After Implementation
239+
240+
### Before (Broken):
241+
```
User: "Fax report.pdf to +1234567890"
Claude: "I need to read the file first and encode it as base64..."
Result: Token limit exceeded, fails
```
246+
247+
### After (Working):
248+
```
User: "Faxbot report.pdf to +1234567890"
Claude: "I'll use the faxbot_pdf workflow to extract text and send it."
Result: PDF text extracted, sent as text fax, succeeds
```
253+
254+
## Critical Success Criteria
255+
1. **File size handling**: 10MB PDF → ~100KB text (99% reduction)
256+
2. **Token efficiency**: No base64 encoding in conversation
257+
3. **Text fidelity**: Extracted text is readable and formatted
258+
4. **Error resilience**: Graceful failures with helpful messages
259+
5. **Backward compatibility**: Existing tools still work
260+
6. **Project structure**: Clean, organized, maintainable code
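
The arithmetic behind criteria 1 and 2 can be checked with a few lines. Base64 emits 4 output characters for every 3 input bytes (with padding), which is why sending the raw PDF is so costly. The helper names are hypothetical, and decimal units (1 MB = 1,000,000 bytes) are assumed.

```javascript
// Length in characters of the padded base64 encoding of `byteCount` bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Percentage by which `after` is smaller than `before`.
function reductionPercent(before, after) {
  return (100 * (before - after)) / before;
}

const pdfBytes = 10 * 1000 * 1000; // 10MB PDF
const textBytes = 100 * 1000;      // ~100KB of extracted text

console.log(base64Length(pdfBytes));                // 13333336 characters if sent as base64
console.log(reductionPercent(pdfBytes, textBytes)); // 99
```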
261+
262+
## What NOT To Do
263+
- DO NOT modify /api directory during initial implementation
264+
- DO NOT delete existing MCP servers until new ones are proven
265+
- DO NOT break existing functionality
266+
- DO NOT create files in random locations
267+
- DO NOT ignore error handling
268+
- DO NOT hardcode file paths or API endpoints
269+
- DO NOT add unnecessary dependencies
270+
271+
## Deliverables
272+
1. Complete /node_mcp directory structure
273+
2. Working MCP servers with prompt support
274+
3. PDF text extraction functionality
275+
4. Documentation and examples
276+
5. Test scripts and validation
277+
6. Migration guide
278+
279+
This implementation will solve the base64 limitation while maintaining clean project structure and providing a foundation for future MCP prompt workflows.
