An OpenCode tool for processing PDF files using DeepSeek-OCR. Converts PDFs to high-quality images, performs OCR on each page, and returns markdown or plain text output.
This tool should be installed globally at ~/.config/opencode/tool/.
Use the provided deployment script for one-command installation and updates:

```bash
# Initial installation
./deploy-tool.sh

# Update after making changes to the repository
./deploy-tool.sh

# Force reinstallation (even if already installed)
./deploy-tool.sh --force

# Specify custom repository path
./deploy-tool.sh --repo /path/to/opencode-ocr
```

The script automatically detects whether the tool is already installed and performs an update instead.
```bash
# Create tool directory
mkdir -p ~/.config/opencode/tool/

# Copy files
cp pdf-ocr.ts ~/.config/opencode/tool/
cp pdf_ocr_backend.py ~/.config/opencode/tool/
cp pyproject.toml ~/.config/opencode/tool/

# Install Python dependencies
cd ~/.config/opencode/tool && uv sync
```

Important: Python scripts must be run with `uv run` to ensure proper dependency management:
```bash
# Direct backend execution (with .env file)
uv run --directory ~/.config/opencode/tool --env-file .env pdf_ocr_backend.py <pdf_path> <output_format>

# Via OpenCode agent
# The agent will use the pdf-ocr tool automatically
```

Arguments:
- `pdf_path`: absolute path to the PDF file
- `output_format`: output format, `"markdown"` or `"text"` (defaults to `"markdown"`)
- openai>=1.0.0
- PyMuPDF>=1.23.0
- Pillow>=10.0.0
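A minimal `pyproject.toml` declaring these dependencies might look like the following sketch (the project name and Python lower bound are illustrative assumptions, not taken from the repository):

```toml
[project]
name = "pdf-ocr-backend"        # illustrative name
version = "0.1.0"
requires-python = ">=3.10"      # assumed lower bound
dependencies = [
    "openai>=1.0.0",
    "PyMuPDF>=1.23.0",
    "Pillow>=10.0.0",
]
```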
The tool connects to an OpenAI-compatible endpoint. The endpoint can be configured in three ways:

1. `.env` file (recommended for persistent configuration): copy `.env.example` to `.env` and edit it:

   ```bash
   cp .env.example .env
   # Edit .env with your endpoint URL
   ```

   Then run with `uv run --env-file .env`.

2. Environment variable:

   ```bash
   export DEEPSEEK_OCR_BASE_URL="http://your-endpoint:8080/v1"
   ```

3. Command-line argument (overrides both of the above):

   ```bash
   uv run --directory ~/.config/opencode/tool pdf_ocr_backend.py <pdf_path> <output_format> --base-url http://your-endpoint:8080/v1
   ```

If none of these are set, the tool throws an error.
The tool uses model-based routing to determine which OCR method to use. This is configured in ocr_routing.json:
Location: ~/.config/opencode/tool/pdf-ocr/tool/ocr_routing.json
Structure:
```json
{
  "_comment": "OCR Routing Configuration - Maps model IDs to preferred OCR method",
  "_routing_options": {
    "deepseek-ocr": "Use DeepSeek-OCR model (requires sufficient VRAM)",
    "current_model": "Use the currently loaded model (requires vision support)"
  },
  "ocr_routing": {
    "kimi-k2.5": "deepseek-ocr",
    "kimi-k2.5-abliterated": "deepseek-ocr"
  },
  "default": "current_model"
}
```

Routing Options:
- `deepseek-ocr`: Use the dedicated DeepSeek-OCR model for OCR tasks
- `current_model`: Use the currently loaded model (requires vision/multimodal support)
Matching Logic:
- Exact match: the full model ID appears as a key in `ocr_routing`
- Partial match: a pattern appears in the model ID, or the model ID appears in a pattern
- Default: falls back to the `default` value if no match is found
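The matching rules can be sketched as follows. This is an illustrative re-implementation of the logic described above, not the backend's actual code:

```python
def resolve_ocr_method(model_id: str, routing: dict[str, str], default: str) -> str:
    """Pick an OCR method for a model ID using the matching rules above."""
    # 1. Exact match on the full model ID
    if model_id in routing:
        return routing[model_id]
    # 2. Partial match: pattern in model ID, or model ID in pattern
    for pattern, method in routing.items():
        if pattern in model_id or model_id in pattern:
            return method
    # 3. Fall back to the configured default
    return default

routing = {"kimi-k2.5": "deepseek-ocr"}
resolve_ocr_method("kimi-k2.5", routing, "current_model")              # exact -> "deepseek-ocr"
resolve_ocr_method("kimi-k2.5-abliterated", routing, "current_model")  # partial -> "deepseek-ocr"
resolve_ocr_method("llama-3", routing, "current_model")                # default -> "current_model"
```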
Examples:
Enable DeepSeek-OCR for Kimi models:
```json
{
  "ocr_routing": {
    "kimi-k2.5": "deepseek-ocr",
    "kimi-k2.5-abliterated": "deepseek-ocr"
  },
  "default": "current_model"
}
```

Always use the current model (disable DeepSeek-OCR):

```json
{
  "ocr_routing": {},
  "default": "current_model"
}
```

Enable DeepSeek-OCR for all models by default:

```json
{
  "ocr_routing": {},
  "default": "deepseek-ocr"
}
```

- PDF-to-image conversion at 144 DPI (high quality for OCR)
- PNG format with RGB color space
- Sequential page processing for memory management
- OCR parameters: temperature=0.0, max_tokens=8192, ngram_size=30, window_size=90
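The processing settings above can be collected into constants for reference. This is a sketch: the names are illustrative, and whether `ngram_size` and `window_size` are sent as extra request fields to the OpenAI-compatible endpoint is an assumption:

```python
# PDF pages render at 72 DPI by default, so 144 DPI corresponds to a
# 2.0x zoom factor (e.g. fitz.Matrix(ZOOM, ZOOM) with PyMuPDF).
RENDER_DPI = 144
ZOOM = RENDER_DPI / 72

OCR_PARAMS = {
    "temperature": 0.0,
    "max_tokens": 8192,
    # DeepSeek-OCR-specific decoding options; assumed to be passed as
    # extra fields in the chat-completion request
    "ngram_size": 30,
    "window_size": 90,
}
```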
| Exit Code | Meaning | Scenario |
|---|---|---|
| 0 | Success | OCR completed successfully using either DeepSeek-OCR or a multimodal model |
| 1 | General Error | File not found, API error, processing error, or DeepSeek-OCR failure |
| 3 | NO_OCR_SUPPORT | Current model is configured to use current_model routing but lacks vision/multimodal support |
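A caller wrapping the backend could branch on these exit codes. The sketch below uses illustrative names and maps each code from the table to a suggested action:

```python
EXIT_SUCCESS = 0
EXIT_GENERAL_ERROR = 1
EXIT_NO_OCR_SUPPORT = 3

def suggest_action(code: int) -> str:
    """Map a backend exit code to a suggested next step (illustrative helper)."""
    if code == EXIT_SUCCESS:
        return "ok"
    if code == EXIT_NO_OCR_SUPPORT:
        return "route model to deepseek-ocr in ocr_routing.json or switch to a vision model"
    # Exit code 1 covers file-not-found, API, processing, and DeepSeek-OCR failures
    return "check the PDF path, endpoint availability, and backend logs"
```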
This error occurs when:
- The current model is configured to use `current_model` for OCR routing
- The model does not support multimodal/vision capabilities

Solution: add the model to `ocr_routing.json` with the `deepseek-ocr` value:

```json
{
  "ocr_routing": {
    "your-model-name": "deepseek-ocr"
  },
  "default": "current_model"
}
```

Or switch to a model with vision support.
If DeepSeek-OCR is configured but fails:
- Verify the endpoint is running and accessible
- Check that the DeepSeek-OCR model is loaded
- Review endpoint logs for errors
If `ocr_routing.json` is missing, the tool falls back to the defaults (`current_model`). To create the configuration file, run:

```bash
./deploy-tool.sh
```

- Partial page range support (e.g., process pages 5-10 only)
- Progress reporting during OCR processing
- Batch processing of multiple PDFs
- Additional output formats (e.g., JSON, HTML)
- Image quality settings configuration