This document describes the Dockerized OpenDataLoader PDF REST API provided by this repository.
Base URL (local default):
http://localhost:8080
The API wraps the opendataloader-pdf package and supports:
- single or multiple PDF uploads (multipart/form-data)
- all OpenDataLoader conversion options as request parameters
- Accept header content negotiation
- ZIP packaging for multi-file or multi-format responses
No authentication is implemented in this project by default.
When format is not explicitly provided, output format can be inferred from the Accept header.
Supported Accept values:
- application/json
- text/plain
- text/markdown
- text/html
- application/pdf
- application/zip
If no supported value is present, the API falls back to configured defaults.
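The negotiation order described above (explicit format, then Accept, then configured default) can be sketched in Python. This is an illustration only, not the project's implementation: the media-type-to-format table, the default of "json", and the handling of q-values are all assumptions. The real mapping comes from api.accepted_media_types, and application/zip is left out here because the document treats it as packaging rather than a conversion format.

```python
# Hypothetical mapping; the real one is configured in api.accepted_media_types.
ACCEPTED_MEDIA_TYPES = {
    "application/json": "json",
    "text/plain": "text",
    "text/markdown": "markdown",
    "text/html": "html",      # format name assumed
    "application/pdf": "pdf",  # format name assumed
}


def infer_format(accept_header, explicit_format=None, default_format="json"):
    """Pick a format: explicit value first, then Accept, then the default."""
    if explicit_format:
        return explicit_format

    candidates = []
    for position, item in enumerate((accept_header or "").split(",")):
        media_type, _, params = item.strip().partition(";")
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            candidates.append((-quality, position, media_type.strip().lower()))

    for _, _, media_type in sorted(candidates):
        if media_type in ACCEPTED_MEDIA_TYPES:
            return ACCEPTED_MEDIA_TYPES[media_type]
    # No supported value present: fall back to the configured default.
    return default_format
```

Whether the server honors q-values at all is not stated in this document; if it does not, the first supported value in header order would be the natural reading.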
Health check endpoint.
Response 200:
{"status":"ok"}

Returns the full option metadata list supported by the current opendataloader-pdf runtime.
Response 200 (example excerpt):
[
  {
    "name": "format",
    "python_name": "format",
    "type": "string",
    "required": false,
    "default": null,
    "description": "Output formats ..."
  }
]

Use this endpoint as the authoritative source for supported parameters.
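A client can use the option metadata to check its own parameters before sending a conversion request. The sketch below assumes only the response shape shown in the excerpt (a JSON list whose entries carry name and python_name); the helper names fetch_options, index_options and unknown_keys are hypothetical and not part of the API.

```python
import json
from urllib.request import urlopen


def fetch_options(base_url="http://localhost:8080"):
    """Download the option metadata list from GET /options."""
    with urlopen(f"{base_url}/options") as response:
        return json.load(response)


def index_options(options):
    """Map both `name` and `python_name` to the option's metadata entry."""
    index = {}
    for option in options:
        index[option["name"]] = option
        index[option["python_name"]] = option
    return index


def unknown_keys(requested, options):
    """Return the requested keys that the runtime does not list, sorted."""
    index = index_options(options)
    return sorted(key for key in requested if key not in index)
```

Calling unknown_keys(my_options, fetch_options()) against a running container would list any keys likely to be rejected as unsupported.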
Converts uploaded PDF files to one or more output formats.
Request type:
multipart/form-data
Required form fields:
files: one or more file parts (application/pdf)
Optional form fields:
- Any conversion option listed by GET /options
- options: JSON object containing conversion options (using either name or python_name keys)
Notes:
- If both direct form fields and options JSON contain the same option, options JSON wins.
- Multiple file uploads are supported by repeating the files part.
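The precedence rule in the notes above can be illustrated with a small merge function. This is a sketch under stated assumptions, not the server's code: merge_options is a hypothetical helper, and it does not reconcile the case where one source uses name and the other uses python_name for the same option.

```python
import json


def merge_options(form_fields, options_json=None):
    """Combine direct form fields with the `options` JSON; JSON values win."""
    merged = dict(form_fields)
    if options_json:
        try:
            parsed = json.loads(options_json)
        except json.JSONDecodeError as exc:
            # The API reports invalid options JSON as 400 Bad Request.
            raise ValueError(f"invalid options JSON: {exc}") from exc
        if not isinstance(parsed, dict):
            raise ValueError("options must be a JSON object")
        # Keys present in both sources take the value from the JSON object.
        merged.update(parsed)
    return merged
```

For example, a request with the form field format=markdown and options={"format":"text"} would be converted as text under this rule.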
curl -X POST "http://localhost:8080/convert" \
  -H "Accept: application/json" \
  -F "files=@samples/pdf/lorem.pdf"

curl -X POST "http://localhost:8080/convert" \
  -H "Accept: text/markdown" \
  -F "files=@samples/pdf/lorem.pdf" \
  -F "format=markdown"

curl -X POST "http://localhost:8080/convert" \
  -H "Accept: text/plain" \
  -F "files=@samples/pdf/lorem.pdf" \
  -F 'options={"format":"text","keep-line-breaks":true,"sanitize":true}'

curl -X POST "http://localhost:8080/convert" \
  -H "Accept: application/zip" \
  -F "files=@samples/pdf/lorem.pdf" \
  -F "files=@samples/pdf/1901.03003.pdf" \
  --output output.zip

POST /convert returns one of these forms:
- application/zip:
  - returned when multiple files are uploaded
  - or when Accept: application/zip
  - or when format includes multiple values (for example json,markdown)
- Direct file response (single output):
  - media type based on negotiated result (application/json, text/markdown, text/plain, text/html, application/pdf)
- JSON envelope:
  - when multiple outputs remain after filtering and response is not zipped
  - payload shape:
{
  "outputs": [
    {
      "name": "relative/path.ext",
      "size": 1234,
      "content": "...or parsed JSON..."
    }
  ],
  "count": 1
}

Common error responses:
- 400 Bad Request
  - non-PDF upload extension
  - invalid options JSON
  - unsupported option key in options
  - invalid/truncated PDF stream
  - parser rejection of incomplete/invalid PDF stream
- 413 Payload Too Large
  - total uploaded content exceeds configured max_upload_mb
- 500 Internal Server Error
  - conversion failure not mapped to a known client error
  - no output files produced
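Some of the client errors listed above can be detected before a PDF is parsed. The sketch below shows one way such pre-checks might map to status codes. It is not the project's code: check_request is a hypothetical helper, the 50 MB default, the 1024 * 1024 bytes-per-MB conversion, the order of the checks and the message texts are all assumptions. Only the status codes and the "detail" key come from this document.

```python
import json


def check_request(filenames, total_bytes, options_json=None, max_upload_mb=50):
    """Return (status_code, body) for early client errors, or (200, None)."""
    # 413: total uploaded content exceeds the configured limit.
    if total_bytes > max_upload_mb * 1024 * 1024:
        return 413, {"detail": f"Upload exceeds {max_upload_mb} MB"}

    # 400: non-PDF upload extension.
    for filename in filenames:
        if not filename.lower().endswith(".pdf"):
            return 400, {"detail": f"Not a PDF file: {filename}"}

    # 400: invalid options JSON.
    if options_json is not None:
        try:
            json.loads(options_json)
        except json.JSONDecodeError:
            return 400, {"detail": "Invalid options JSON"}

    return 200, None
```

Errors that depend on the PDF content itself (truncated streams, parser rejections, conversions that produce no output) can only be raised during conversion and are not covered by this sketch.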
Error body shape:
{
  "detail": "Error description"
}

Runtime behavior is controlled by config (docker-api/config.yaml by default):
- runtime.max_upload_mb: request upload size limit
- runtime.infer_format_from_accept: enable/disable Accept inference
- runtime.default_format: fallback conversion format
- runtime.default_response: fallback response media mapping
- api.accepted_media_types: map Accept values to formats
- api.direct_media_types: map single output formats to response media types
To use a different config file, set the APP_CONFIG environment variable.
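As a rough illustration of how the config path and dotted setting names might be resolved, consider the sketch below. It assumes that APP_CONFIG holds a file path and that the dotted names correspond to nested sections (runtime, api) in the YAML file. The helper names and the example values in the usage note are hypothetical, and parsing the YAML itself is left out.

```python
import os

DEFAULT_CONFIG_PATH = "docker-api/config.yaml"


def resolve_config_path(environ=None):
    """Return the config file path, preferring the APP_CONFIG variable."""
    environ = os.environ if environ is None else environ
    return environ.get("APP_CONFIG") or DEFAULT_CONFIG_PATH


def get_setting(config, dotted_key, default=None):
    """Look up a dotted key such as 'runtime.max_upload_mb' in nested dicts."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

With a parsed config such as {"runtime": {"max_upload_mb": 50}}, get_setting(config, "runtime.max_upload_mb") returns 50, and a missing key returns the supplied default.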
- Request examples: docker-api/opendataloader-api-examples.http
- Automated tests: scripts/test_rest_api.py