This repository provides a Dockerized REST API wrapper around the opendataloader-pdf Python package.
The API accepts one or more PDF streams and returns extracted output in multiple formats using content negotiation (Accept header) and/or explicit conversion options.
OpenDataLoader extracts structured content from PDFs for downstream use cases like search, indexing, RAG, and document automation. It can produce formats such as JSON, Markdown, HTML, text, and annotated PDF output through conversion options.
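As a rough illustration of the content negotiation described above, the sketch below maps a desired output format to an `Accept` header value. The media types are standard registered ones, but which of them this API actually honors is an assumption, and the helper name `accept_header_for` is hypothetical. REST_API.md remains the authoritative reference.

```python
# Hypothetical helper: the format-to-media-type table is an assumption,
# not taken from this repository's API reference.
ACCEPT_BY_FORMAT = {
    "json": "application/json",
    "markdown": "text/markdown",
    "html": "text/html",
    "text": "text/plain",
    "pdf": "application/pdf",
    "zip": "application/zip",
}


def accept_header_for(output_format: str) -> dict[str, str]:
    """Return request headers asking for the given output format."""
    key = output_format.strip().lower()
    if key not in ACCEPT_BY_FORMAT:
        raise ValueError(f"unsupported format: {output_format!r}")
    return {"Accept": ACCEPT_BY_FORMAT[key]}
```

The returned dictionary can be passed as the headers of whatever HTTP client is used to upload the PDF.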
For full product details, capabilities, and documentation, see the official site:
Full API reference for this repository:
REST_API.md
- Docker Desktop (or Docker Engine + Compose)
- Python 3.10+ (only needed to run the local test script)
From the repository root:
docker compose up -d --build
Check service health:
curl http://localhost:8080/health
Expected response:
{"status":"ok"}
Run the local test script:
python scripts/test_rest_api.py
This validates:
- /health and /options
- Single-file conversion with JSON and Markdown responses
- Multi-file ZIP response
- JSON options payload handling
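The health check shown earlier with curl can also be scripted, for example to wait until the container is ready before running tests. The following is a minimal sketch that uses only the standard library. The URL and the expected `{"status":"ok"}` body come from the commands above, while the function names, retry count, and delay are illustrative assumptions and are not taken from scripts/test_rest_api.py.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8080/health"  # same URL as the curl command


def fetch_health(url: str = HEALTH_URL) -> str:
    """Return the raw response body of the health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def is_ok(body: str) -> bool:
    """True when the body is a JSON object whose "status" is "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def wait_until_healthy(
    fetch: Callable[[], str] = fetch_health,
    attempts: int = 10,  # assumed retry budget
    delay: float = 1.0,  # assumed pause between tries, in seconds
) -> bool:
    """Poll the health endpoint until it reports ok or attempts run out."""
    for attempt in range(attempts):
        try:
            if is_ok(fetch()):
                return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be starting
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing `fetch` as a parameter keeps the polling logic testable without a running container.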
Open:
docker-api/opendataloader-api-examples.http
Run requests directly from VS Code REST Client to test common scenarios.
Stop and remove the containers:
docker compose down
The container reads configuration from:
/app/docker-api/config.yaml
Override with environment variable:
APP_CONFIG
For JSON format reference, see:
docker-api/config.example.json
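The override described above could be resolved along the following lines. This is a sketch under stated assumptions: it assumes `APP_CONFIG` holds a file path and that the file format is inferred from the extension. Neither the function names nor this behavior is confirmed by this README.

```python
import os
from collections.abc import Mapping
from pathlib import PurePosixPath

DEFAULT_CONFIG_PATH = "/app/docker-api/config.yaml"  # default named above


def resolve_config_path(env: Mapping[str, str] = os.environ) -> PurePosixPath:
    """Use APP_CONFIG if it is set and non-empty, else the default path."""
    override = env.get("APP_CONFIG", "").strip()
    return PurePosixPath(override or DEFAULT_CONFIG_PATH)


def config_format(path: PurePosixPath) -> str:
    """Guess the parser from the file extension (assumed convention)."""
    return "json" if path.suffix.lower() == ".json" else "yaml"
```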
- The API validates uploaded PDF streams and returns clear 400 errors for invalid/truncated uploads.
- The implementation installs and uses opendataloader-pdf from PyPI inside the container.
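To give a sense of what upload validation can look like, here is a minimal heuristic sketch. It is not this repository's actual check: it only looks for the `%PDF-` header near the start and the `%%EOF` marker near the end of the byte stream, and the function name and messages are hypothetical.

```python
def check_pdf_bytes(data: bytes) -> tuple[int, str]:
    """Return an (HTTP status, message) pair for an uploaded byte stream."""
    if not data:
        return 400, "empty upload"
    # PDF files begin with a "%PDF-" header; tolerate leading bytes
    # within the first 1 KiB.
    if b"%PDF-" not in data[:1024]:
        return 400, "missing %PDF- header"
    # A complete file ends with an "%%EOF" marker; its absence
    # suggests the upload was cut short.
    if b"%%EOF" not in data[-1024:]:
        return 400, "missing %%EOF marker (truncated upload?)"
    return 200, "ok"
```

A check like this catches obviously wrong or cut-off uploads cheaply, but a file can pass it and still fail to parse, so it complements the parser's own errors and does not replace them.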