Replies: 1 comment
docling-serve's scaling is limited by its architecture: each instance runs an in-process async orchestrator with a pool of worker threads, controlled by DOCLING_SERVE_ENG_LOC_NUM_WORKERS. To scale up, you can increase the number of Uvicorn worker processes and raise the engine's worker-thread count.
If you see little improvement from more workers or hardware, the bottleneck may be in PDF parsing, I/O, or model inference. Profiling with logs, metrics, or external profilers can help pinpoint where time is spent. For GPU memory management, consider adding explicit cleanup between jobs.
For best throughput, maximize both the number of Uvicorn worker processes and DOCLING_SERVE_ENG_LOC_NUM_WORKERS.
To reply, just mention @dosu.
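As a rough illustration of the profiling and GPU-cleanup suggestions above, here is a minimal sketch that calls docling directly rather than going through docling-serve. It assumes the standard docling DocumentConverter API and an optional PyTorch install; the timing wrapper, cleanup helper, and file names are illustrative assumptions, not part of docling-serve itself.

```python
import gc
import time

from docling.document_converter import DocumentConverter

converter = DocumentConverter()

def convert_with_timing(path: str) -> str:
    """Convert one PDF and log how long the conversion step takes."""
    start = time.perf_counter()
    result = converter.convert(path)  # PDF parsing + layout/model inference
    elapsed = time.perf_counter() - start
    print(f"{path}: {elapsed:.1f}s")
    return result.document.export_to_markdown()

def free_gpu_memory() -> None:
    """Explicit cleanup between jobs, if running on a CUDA device."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only install; nothing to clean up

if __name__ == "__main__":
    for pdf in ["doc1.pdf", "doc2.pdf"]:  # hypothetical input files
        convert_with_timing(pdf)
        free_gpu_memory()
```

If the per-document times are dominated by model inference, adding more worker threads alone won't help much, which matches the advice above about locating the bottleneck first.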
Hi everyone,
We're very happy with docling's accuracy and we want to use it in production. We need to be able to process hundreds of documents every 5-10 minutes.
I've found docling-serve and I tried to stress-test it but the performance was quite slow. I'd love to know if I'm doing something wrong.
I've performed 5 different experiments - 1 locally (MacBook Pro M2, 12 cores, 32 GB memory) and 4 in Google Cloud Run with GPU (1 GPU - NVIDIA L4).
I've tried to process 5 PDF documents at once (using the UI's file upload feature), with an overall size of 6.2 MB. The PDF page counts are 2, 3, 8, 13 and 101 pages - 127 pages in total.
- Setting DOCLING_SERVE_ENG_LOC_NUM_WORKERS to 4 didn't seem to help
- Setting DOCLING_SERVE_ENG_LOC_NUM_WORKERS to 8 didn't seem to help

As you can see, the lowest duration was 4m 37s. I also tried to increase DOCLING_SERVE_ENG_LOC_NUM_WORKERS, but that didn't seem to work.
I'd love to know how I can scale docling (and maybe docling-serve) to a point where it can process 500-1000 pages in under 1-2 minutes.
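For reference, here is a rough sketch of how the same 5-file batch could be submitted programmatically instead of through the UI, so the client isn't the limiting factor. It assumes docling-serve exposes a /v1alpha/convert/file upload endpoint with a "files" multipart field (check your deployment's API docs); the base URL, port, and file names are placeholders.

```python
import asyncio
from pathlib import Path

import httpx

# Placeholder base URL and file names; the endpoint path and multipart
# field name are assumptions - adjust to match your docling-serve version.
BASE_URL = "http://localhost:5001"
PDFS = ["doc_2p.pdf", "doc_3p.pdf", "doc_8p.pdf", "doc_13p.pdf", "doc_101p.pdf"]

async def convert(client: httpx.AsyncClient, path: str) -> None:
    payload = {"files": (Path(path).name, Path(path).read_bytes(), "application/pdf")}
    resp = await client.post(f"{BASE_URL}/v1alpha/convert/file", files=payload)
    resp.raise_for_status()
    print(f"{path}: HTTP {resp.status_code}")

async def main() -> None:
    # Fire all five uploads at once so the server's worker pool,
    # not the client, determines the total duration.
    async with httpx.AsyncClient(timeout=None) as client:
        await asyncio.gather(*(convert(client, p) for p in PDFS))

if __name__ == "__main__":
    asyncio.run(main())
```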