I want to learn about lazy loading in Docling and how it can help me scale my service, which depends on Docling #2180
Replies: 2 comments 3 replies
Docling doesn't have a dedicated "lazy loading" utility, but it uses several resource-efficient patterns that help with scalability. For example, it processes documents in batches and uses iterator/generator-based reading, so data is loaded and processed only as needed. This is similar to lazy loading and helps reduce memory usage and speed up processing for large workloads. You can control batch size for Vision Language Models (VLMs) with the [...]

Docling also caches initialized pipelines keyed by their configuration, so pipelines are only created when needed and reused across requests. This thread-safe caching is a form of lazy initialization that boosts performance in concurrent environments (source).

If you don't need OCR, make sure it's disabled in your pipeline configuration; skipping OCR saves significant resources (source). For further scalability, use batching and parallel processing (e.g., with [...]).

Example: Batch processing pages with the Docling CLI (run `docling --help` to confirm the flags available in your version)

```bash
docling input.pdf --to json --output ./output --page-batch-size 10
```

Example: Disabling image generation and using a lightweight PDF backend

```python
from pathlib import Path
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Turn off OCR and image rendering; parse PDFs with the pypdfium2 backend.
options = PdfPipelineOptions(
    do_ocr=False,
    images_scale=1.0,
    generate_page_images=False,
    generate_picture_images=False,
)
pdf_format = PdfFormatOption(pipeline_options=options, backend=PyPdfiumDocumentBackend)
converter = DocumentConverter(format_options={InputFormat.PDF: pdf_format})
result = converter.convert("input.pdf")
result.document.save_as_json(Path("output.json"))
```

Example: Thread-safe pipeline caching (from Docling source)

```python
with _PIPELINE_CACHE_LOCK:
    if cache_key not in self.initialized_pipelines:
        self.initialized_pipelines[cache_key] = pipeline_class(pipeline_options=pipeline_options)
    # else: reuse cached pipeline
```

For large-scale or distributed workloads, consider using [...]

Let me know if you want more details or examples for a specific deployment scenario!
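To make the two patterns described above concrete (generator-based batching and lazy, thread-safe initialization), here is a minimal sketch in plain Python. It does not use Docling at all: `batched`, `LazyCache`, and `make_pipeline` are hypothetical names invented for this illustration, and `make_pipeline` only stands in for an expensive pipeline constructor.

```python
import threading
from itertools import islice
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, pulling from `items` only on demand."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class LazyCache:
    """Build each value on first request and reuse it afterwards (thread-safe)."""

    def __init__(self, factory: Callable[[Hashable], object]) -> None:
        self._factory = factory
        self._values: Dict[Hashable, object] = {}
        self._lock = threading.Lock()
        self.builds = 0  # how many times the factory actually ran

    def get(self, key: Hashable) -> object:
        with self._lock:
            if key not in self._values:
                self._values[key] = self._factory(key)
                self.builds += 1
            return self._values[key]


def make_pipeline(config: Hashable) -> dict:
    """Hypothetical stand-in for an expensive pipeline constructor."""
    return {"config": config}


cache = LazyCache(make_pipeline)
paths = (f"doc_{i}.pdf" for i in range(5))  # generator: nothing is materialized yet

processed = []
for batch in batched(paths, size=2):
    pipeline = cache.get(("pdf", "no_ocr"))  # built on first use, then reused
    processed.append((pipeline["config"], batch))

print(len(processed), cache.builds)  # prints "3 1"
```

Only one batch of paths is held in memory at a time, and the pipeline is constructed once even though it is requested for every batch. In a real service the cache key would be derived from the pipeline configuration, as in the Docling snippet above.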
Tell me about pypdfium: how is it lightweight, and how is it beneficial? Please include proper code samples.
Please help me, and also provide code snippets.
Also tell me if there are any specific ways to improve the scalability of the service in my use case. I don't need OCR.