Why do all my PDF files take almost exactly 5 seconds (occasionally 10) to be processed? #2259
Hi @chris-prezi! I'm Dosu and I'm helping the docling team.

The consistent 5-second (or 10-second) processing time per PDF chunk in docling-serve is not caused by any hardcoded delay or timeout in the codebase. The returned `processing_time` field measures only the core extraction step and excludes I/O, model loading, and network overhead, which is why it reports microseconds even though the wall-clock time is much longer.

Common causes of a fixed delay like this:

- Model loading or initialization on each request, especially if your API parameters differ from the docling-serve defaults, which can prevent effective caching.
- Network or storage latency, such as models being downloaded at runtime instead of being pre-mounted.
- Infrastructure bottlenecks such as thread or worker configuration.

To reduce processing time:

- Pre-download all required models, mount them into the container, and set the `DOCLING_SERVE_ARTIFACTS_PATH` environment variable to that path.
- Align your API parameters with the docling-serve defaults to maximize cache reuse (see the sketch below).
- Tune thread and worker settings (e.g., `DOCLING_NUM_THREADS`) to match your hardware.
- If you're running in a cluster or containerized environment, use persistent storage for models and avoid runtime downloads.
- If you're using async mode, be aware that running multiple Uvicorn workers can cause reliability issues unless you use an external orchestrator like RQ or KFP.
- If you want to match sagemaker/docling performance, check which PDF backend it uses (e.g., pypdfium2 is much faster than the default for some documents) and set the same backend in docling-serve.
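To make the "align your API parameters" and "set the same backend" points concrete, here is a minimal client-side sketch. The endpoint path (`/v1alpha/convert/source`), the option names, and the top-level `processing_time` field reflect docling-serve's request/response shapes as I understand them and may differ between versions, so check your deployment's OpenAPI page (`/docs`) before relying on them.

```python
import time

import requests

# Assumed local deployment; adjust host, port, and API version to match yours.
DOCLING_SERVE_URL = "http://localhost:5001/v1alpha/convert/source"

# Keep the options identical across requests so the server can reuse its cached
# converter instead of re-initializing pipelines/models on every call.
OPTIONS = {
    "to_formats": ["md"],
    "do_ocr": False,
    # pypdfium2 can be much faster than the default parser for some documents.
    "pdf_backend": "pypdfium2",
}


def convert(url: str) -> dict:
    payload = {
        "options": OPTIONS,
        "http_sources": [{"url": url}],
    }
    start = time.perf_counter()
    response = requests.post(DOCLING_SERVE_URL, json=payload, timeout=300)
    response.raise_for_status()
    wall_clock = time.perf_counter() - start
    result = response.json()
    # processing_time covers only the core extraction step; the wall-clock number
    # also includes model loading, queueing, and network transfer.
    print(f"wall clock: {wall_clock:.3f}s, reported: {result.get('processing_time')}")
    return result


if __name__ == "__main__":
    convert("https://arxiv.org/pdf/2206.01062")
```

Comparing the wall-clock figure with the reported `processing_time` in your own logs is the quickest way to see which part of the 5 seconds is extraction and which is setup overhead.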
My service splits larger PDFs into chunks of varying size, then passes them to docling-serve for text extraction. Checking the extraction durations for all chunks in my logs, they are all 5.xxx seconds, or occasionally 10.xxx seconds, which suggests some fixed 5-second process or delay. I'm looking at docling-serve to replace sagemaker/docling, but sagemaker is much faster at processing individual chunks. I'm also not sure what to make of the processing_time returned by docling-serve, which always claims to be a few microseconds.
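For reference, a stripped-down sketch of the split-and-submit loop described above, with per-chunk wall-clock logging. The chunk size, the use of pypdf for splitting, and the `/v1alpha/convert/file` endpoint with a `files` multipart field are illustrative assumptions, not the actual service code.

```python
import io
import time

import requests
from pypdf import PdfReader, PdfWriter

DOCLING_SERVE_URL = "http://localhost:5001/v1alpha/convert/file"  # assumed endpoint


def split_pdf(data: bytes, pages_per_chunk: int = 20) -> list[bytes]:
    """Split a PDF into fixed page ranges (real chunk sizes vary)."""
    reader = PdfReader(io.BytesIO(data))
    chunks = []
    for start in range(0, len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for idx in range(start, min(start + pages_per_chunk, len(reader.pages))):
            writer.add_page(reader.pages[idx])
        buf = io.BytesIO()
        writer.write(buf)
        chunks.append(buf.getvalue())
    return chunks


def extract(pdf_bytes: bytes) -> None:
    for i, chunk in enumerate(split_pdf(pdf_bytes)):
        start = time.perf_counter()
        resp = requests.post(
            DOCLING_SERVE_URL,
            files={"files": (f"chunk_{i}.pdf", chunk, "application/pdf")},
            timeout=300,
        )
        resp.raise_for_status()
        # These per-chunk durations are the ones clustering around 5.xxx s
        # (occasionally 10.xxx s), while the returned processing_time is microseconds.
        print(f"chunk {i}: {time.perf_counter() - start:.3f}s")
```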