This guide helps you test and diagnose the PDF Extraction MCP Server to isolate issues between the server and clients (like Forest Admin).
pip install httpx httpx-sse psutil
Test against your Heroku deployment:
python test_mcp_client.py https://pdf-extraction-mcp-54041c60e7d7.herokuapp.com/mcp 20
Or test locally:
# Terminal 1: Start server
cd src
python -m pdf_extraction.http_server --port 8000
# Terminal 2: Run tests
python test_mcp_client.py http://localhost:8000/mcp 20
While the server is running, check its status:
# Health check
curl https://pdf-extraction-mcp-54041c60e7d7.herokuapp.com/health
# Detailed metrics
curl https://pdf-extraction-mcp-54041c60e7d7.herokuapp.com/metrics
✅ 100% success rate = Server is working correctly
- If test script succeeds but Forest Admin fails → Forest Admin client issue
- Check Forest Admin's request frequency and session management
✅ Consistent failures = Identifiable server issue
- Check Heroku logs for error details
- Look for resource constraints (memory, CPU)
- Check temp file cleanup issues
❌ Every other request fails = Session/state management issue
- Possible causes:
  - SSE connection not being reused properly
  - Session state corruption
  - Race condition in temp file handling
❌ Random failures (no pattern) = Resource or timing issue
- Possible causes:
  - Heroku dyno sleeping
  - Memory/CPU exhaustion
  - Network timeouts
  - OCR/Tesseract failures
❌ First request fails, others succeed = Initialization issue
- Cold start problems
- Missing dependencies on first run
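The patterns above can be told apart mechanically once each request's outcome is recorded. The following is a minimal sketch, assuming the outcomes are collected as a list of booleans (True for success); the function name `classify_failures` and the returned labels are illustrative and not part of test_mcp_client.py.

```python
from typing import List


def classify_failures(results: List[bool]) -> str:
    """Map per-request outcomes (True = success) to one of the patterns above."""
    if not results:
        return "no data"
    failures = [i for i, ok in enumerate(results) if not ok]
    if not failures:
        return "all succeed"
    if len(failures) == len(results):
        return "consistent failures"
    if failures == [0]:
        return "first request fails"
    # Strict alternation: every adjacent pair of outcomes differs.
    # Require at least 4 samples so a short run is not over-interpreted.
    if len(results) >= 4 and all(a != b for a, b in zip(results, results[1:])):
        return "every other request fails"
    return "random failures"
```

A run of 20 requests that alternates success and failure would be labelled "every other request fails", which points at session or state management rather than resources.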
Watch real-time logs while testing:
heroku logs --tail --app pdf-extraction-mcp-54041c60e7d7
Look for:
- Session connection/disconnection patterns
- Tool call timing and success/failure
- Error messages and stack traces
- Resource usage warnings
Test whether the server handles requests in quick succession:
python test_mcp_client.py https://pdf-extraction-mcp-54041c60e7d7.herokuapp.com/mcp 50
Modify test_mcp_client.py to test with different PDFs:
test_pdfs = [
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
    "https://example.com/other-pdf.pdf",
]
Test with a large multi-page PDF to check memory handling:
result = await client.call_tool(
    "extract-pdf-contents",
    {"pdf_path": "https://example.com/large.pdf", "pages": None},  # All pages
)
Memory, from the resources section of the metrics output:
{
  "resources": {
    "memory_rss_mb": 150.5,
    "memory_vms_mb": 350.2
  }
}
- Normal: RSS < 200 MB
- Warning: RSS > 300 MB (approaching Heroku limits)
- Critical: RSS > 450 MB (may cause dyno restart)
CPU, from the resources section of the metrics output:
{
  "resources": {
    "cpu_percent": 45.2
  }
}
- Normal: < 50% during extraction
- Warning: Sustained > 80%
- Critical: Sustained 100% (requests will queue)
Request success rate, from the requests section of the metrics output:
{
  "requests": {
    "total": 100,
    "errors": 5,
    "success_rate": "95.0%"
  }
}
- Excellent: > 98%
- Good: 95-98%
- Poor: < 95% (investigate errors)
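The thresholds above can be applied to a metrics snapshot automatically. This is a rough sketch, assuming the fields shown in the snippets arrive together in one dictionary; the function name `rate_metrics` and the "unclassified" label (for values the guide does not place in any band, such as RSS between 200 and 300 MB) are illustrative.

```python
from typing import Any, Dict


def rate_metrics(metrics: Dict[str, Any]) -> Dict[str, str]:
    """Rate memory, CPU, and success rate against the guide's thresholds."""
    resources = metrics.get("resources", {})
    requests = metrics.get("requests", {})
    report: Dict[str, str] = {}

    rss = resources.get("memory_rss_mb")
    if rss is not None:
        if rss > 450:
            report["memory"] = "critical"
        elif rss > 300:
            report["memory"] = "warning"
        elif rss < 200:
            report["memory"] = "normal"
        else:
            report["memory"] = "unclassified"  # 200-300 MB has no band in the guide

    # A single sample cannot show that load is sustained; poll repeatedly
    # before acting on a "warning" or "critical" CPU rating.
    cpu = resources.get("cpu_percent")
    if cpu is not None:
        if cpu >= 100:
            report["cpu"] = "critical"
        elif cpu > 80:
            report["cpu"] = "warning"
        elif cpu < 50:
            report["cpu"] = "normal"
        else:
            report["cpu"] = "unclassified"  # 50-80% has no band in the guide

    rate_text = requests.get("success_rate")
    if rate_text is not None:
        rate = float(str(rate_text).rstrip("%"))  # "95.0%" -> 95.0
        if rate > 98:
            report["success_rate"] = "excellent"
        elif rate >= 95:
            report["success_rate"] = "good"
        else:
            report["success_rate"] = "poor"

    return report
```

Feeding it the sample values shown above (RSS 150.5 MB, CPU 45.2%, success rate "95.0%") rates memory and CPU as normal and the success rate as good.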
Diagnosis: Session state issue
Solution:
- Check if Forest Admin is properly managing SSE connections
- Verify session IDs are being passed correctly
- Consider adding session persistence
Diagnosis: Resource exhaustion or cold starts
Solution:
- Upgrade Heroku dyno tier
- Optimize PDF processing (limit page count)
- Add request queuing
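One way to add request queuing is to cap how many extractions run at once, so extra requests wait instead of competing for memory. The sketch below uses `asyncio.Semaphore` for that. The class `ExtractionQueue`, the limit of 2, and the stand-in coroutine `fake_extract` are assumptions for illustration, not the server's actual code.

```python
import asyncio

MAX_CONCURRENT_EXTRACTIONS = 2  # assumed limit; tune to the dyno's memory


async def fake_extract(pdf_path: str) -> str:
    """Stand-in for the real extraction call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"text from {pdf_path}"


class ExtractionQueue:
    """Lets at most `limit` extractions run at once; the rest wait their turn."""

    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self.active = 0  # extractions currently running
        self.peak = 0    # highest concurrency observed

    async def run(self, pdf_path: str) -> str:
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await fake_extract(pdf_path)
            finally:
                self.active -= 1


async def main():
    queue = ExtractionQueue(MAX_CONCURRENT_EXTRACTIONS)
    paths = [f"doc{i}.pdf" for i in range(6)]
    results = await asyncio.gather(*(queue.run(p) for p in paths))
    return results, queue.peak


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The `peak` counter is only there to make the cap observable in a test; a real server would more likely log queue depth or wait time.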
Diagnosis: Missing Tesseract language data
Solution:
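# Check which buildpacks are installed
heroku buildpacks --app pdf-extraction-mcp-54041c60e7d7
# Add the apt buildpack, which installs the packages listed in an Aptfile
# (Tesseract and its language data must be listed there)
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-apt --app pdf-extraction-mcp-54041c60e7d7
Diagnosis: File system issues or cleanup race conditions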
Solution:
- Check logs for "Failed to clean up temp file" warnings
- Verify the /tmp directory has space
- Add retry logic for file operations
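Retry logic for temp file cleanup could look roughly like the sketch below. The helper name `remove_with_retry`, the attempt count, and the delay are assumptions; only the warning text mirrors the log message mentioned above.

```python
import logging
import os
import tempfile
import time

logger = logging.getLogger(__name__)


def remove_with_retry(path: str, attempts: int = 3, delay: float = 0.1) -> bool:
    """Try to delete `path`, retrying on OS errors. Returns True once the file is gone."""
    for attempt in range(1, attempts + 1):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # Another request already cleaned it up; treat as success.
            return True
        except OSError as exc:
            logger.warning(
                "Failed to clean up temp file %s (attempt %d/%d): %s",
                path, attempt, attempts, exc,
            )
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer each time
    return False


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    print(remove_with_retry(tmp_path))
```

Treating a missing file as success matters when two requests race to clean up the same path, which is one of the failure modes suspected here.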
In http_server.py, raise the log level to DEBUG:
logging.basicConfig(
    level=logging.DEBUG,  # Changed from INFO
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
Create custom tests for specific scenarios:
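# test_custom.py
import asyncio

from test_mcp_client import MCPTestClient

# Server under test; this is the local URL used earlier in this guide
server_url = "http://localhost:8000/mcp"


async def test_concurrent():
    """Test multiple concurrent clients"""
    clients = [MCPTestClient(server_url) for _ in range(5)]
    # Connect all clients
    await asyncio.gather(*[c.connect() for c in clients])
    # Run concurrent tool calls
    tasks = []
    for client in clients:
        for _ in range(10):
            # {...} is a placeholder: replace it with the tool arguments
            tasks.append(client.call_tool("extract-pdf-contents", {...}))
    # return_exceptions=True collects failures instead of raising on the first one
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Analyze results: count the calls that raised an exception
    failures = [r for r in results if isinstance(r, Exception)]
    print(f"{len(failures)} of {len(results)} calls raised an exception")


asyncio.run(test_concurrent())
- Use Forest Admin to call the tool 20 times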
- Document which attempts fail
- Note the failure pattern
- Run test_mcp_client.py with the same parameters
- Compare failure patterns
| Scenario | Forest Admin | Test Script | Conclusion |
|---|---|---|---|
| Both fail with same pattern | ✗ | ✗ | Server issue |
| Forest fails, script succeeds | ✗ | ✓ | Forest Admin issue |
| Script fails, Forest succeeds | ✓ | ✗ | Test script issue |
| Both succeed | ✓ | ✓ | No issue found |
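The decision table can also be written as a small helper, which is handy when comparing many runs. This is a minimal sketch, assuming each run is recorded as a list of booleans (True for success). The function name `compare_runs` and the "Inconclusive" outcome for differing failure patterns are illustrative additions, since the table only covers the case where both fail with the same pattern.

```python
from typing import List


def compare_runs(forest_admin: List[bool], test_script: List[bool]) -> str:
    """Apply the decision table to two runs of per-request outcomes."""
    forest_ok = all(forest_admin)
    script_ok = all(test_script)
    if forest_ok and script_ok:
        return "No issue found"
    if script_ok:
        return "Forest Admin issue"
    if forest_ok:
        return "Test script issue"
    # Both runs contain failures. "Same pattern" is taken literally here:
    # the same requests failed in both runs.
    if forest_admin == test_script:
        return "Server issue"
    return "Inconclusive: both fail, but with different patterns"
```

Exact equality is a strict reading of "same pattern"; comparing the pattern labels of the two runs (for example, both "every other request fails") would be a looser alternative.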
If tests reveal issues:
- Save test output:
  python test_mcp_client.py ... > test_results.txt 2>&1
- Save metrics:
  curl https://.../metrics > metrics.json
- Save logs (--tail streams until you press Ctrl+C):
  heroku logs --tail > heroku_logs.txt
- Share these files for diagnosis
After identifying the issue:
- Server issue: Fix in http_server.py or pdf_extractor.py, redeploy
- Forest Admin issue: Report to Forest Admin support with test results
- Infrastructure issue: Upgrade Heroku dyno or optimize resources