I want to learn about lazy loading in Docling and how it can help me scale my service, which depends on Docling #2180

madmaxtwentytwo · 2025-09-03T06:10:12Z

madmaxtwentytwo
Sep 3, 2025

Please help me, and also provide code snippets.

Also, tell me if there are any specific ways to improve the scalability of the service. In my use case I don't need OCR.

2025-09-03T06:12:39Z

dosubot[bot]
bot Sep 3, 2025

Docling doesn't have a direct "lazy loading" utility, but it uses several resource-efficient patterns that help with scalability. For example, it processes documents in batches and uses iterator/generator-based reading, so data is loaded and processed only as needed—this is similar to lazy loading and helps reduce memory usage and speed up processing for large workloads. You can control batch size for Vision Language Models (VLMs) with the --page-batch-size option, which processes pages in chunks instead of all at once, improving resource management and scalability (source).
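To make the iterator-based batching idea concrete, here is a minimal standard-library sketch. It is not Docling's implementation: `iter_batches` and `page_numbers` are hypothetical names used only for illustration. It shows how a generator hands out pages in fixed-size groups, so only one batch needs to be in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, pulling from the source lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def page_numbers(total: int) -> Iterator[int]:
    """Hypothetical page source: yields page numbers one at a time."""
    for number in range(1, total + 1):
        yield number


# 25 pages in batches of 10: two full batches and one partial batch
batch_sizes = [len(batch) for batch in iter_batches(page_numbers(25), 10)]
print(batch_sizes)  # [10, 10, 5]
```

Because both the page source and the batcher are generators, no page is produced until a batch actually asks for it.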

Docling also caches initialized pipelines keyed by their configuration, so pipelines are only created when needed and reused across requests. This thread-safe caching is a form of lazy initialization that boosts performance in concurrent environments (source).

If you don't need OCR, disable it explicitly in your pipeline configuration. In the standard PDF pipeline OCR is enabled by default (do_ocr=True in PdfPipelineOptions, --ocr on the CLI), so set do_ocr=False or pass --no-ocr to save significant resources (source). For further scalability, use batching, parallel processing (e.g., with ThreadPoolExecutor), and hardware acceleration (set num_threads for CPU, or use cuda/mps for GPU).
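As a rough illustration of the ThreadPoolExecutor suggestion, the sketch below fans a list of files out to worker threads. `convert_one` is a hypothetical placeholder, not a Docling function; in a real service it would wrap whatever conversion call you use, ideally sharing one pre-built converter across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def convert_one(path: str) -> Dict[str, str]:
    """Hypothetical placeholder for a single-document conversion call."""
    return {"source": path, "status": "ok"}


def convert_many(paths: List[str], max_workers: int = 4) -> List[Dict[str, str]]:
    """Convert documents concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_one, paths))


results = convert_many(["a.pdf", "b.pdf", "c.pdf"])
print([r["source"] for r in results])  # ['a.pdf', 'b.pdf', 'c.pdf']
```

`pool.map` preserves input order, which keeps it simple to match results back to requests.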

Example: Batch processing pages in Docling CLI

docling input.pdf --to json --output ./output --page-batch-size 10

Example: Disabling image generation and using a lightweight PDF backend

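from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Skip OCR and image generation to keep conversion light
options = PdfPipelineOptions(
    do_ocr=False,
    images_scale=1.0,
    generate_page_images=False,
    generate_picture_images=False,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=options, backend=PyPdfiumDocumentBackend)
    }
)
result = converter.convert("input.pdf")
result.document.save_as_json("output.json")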

Example: Thread-safe pipeline caching (from Docling source)

with _PIPELINE_CACHE_LOCK:
    if cache_key not in self.initialized_pipelines:
        self.initialized_pipelines[cache_key] = pipeline_class(pipeline_options=pipeline_options)
    # else: reuse cached pipeline
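The excerpt above depends on surrounding Docling code, so here is a self-contained sketch of the same lock-guarded, create-on-first-use pattern. `PipelineCache` and `build_pipeline` are hypothetical names for illustration, not Docling classes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable


class PipelineCache:
    """Create each pipeline once per configuration key and reuse it afterwards."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pipelines: Dict[Hashable, object] = {}

    def get(self, cache_key: Hashable, factory: Callable[[], object]) -> object:
        with self._lock:
            if cache_key not in self._pipelines:
                # First request for this configuration: build the pipeline now
                self._pipelines[cache_key] = factory()
            return self._pipelines[cache_key]


build_count = 0


def build_pipeline() -> object:
    """Stand-in for an expensive pipeline constructor (model loading, etc.)."""
    global build_count
    build_count += 1
    return object()


cache = PipelineCache()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 32 concurrent requests that all use the same configuration key
    pipelines = list(pool.map(lambda _: cache.get("pdf-no-ocr", build_pipeline), range(32)))

print(build_count)  # 1
```

Holding the lock while the factory runs means concurrent first requests wait rather than building duplicate pipelines, at the cost of serializing initialization.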

For large-scale or distributed workloads, consider using docling-serve and pre-loading models before startup to avoid runtime downloads (source). This ensures your service is ready to handle requests immediately and scales efficiently.

Let me know if you want more details or examples for a specific deployment scenario!


0 replies

madmaxtwentytwo · 2025-09-03T06:31:42Z

madmaxtwentytwo
Sep 3, 2025
Author

Tell me about pypdfium2: how is it lightweight and how is it beneficial? Please include proper code samples
and the different settings we can configure in it.

3 replies

dosubot[bot] bot Sep 3, 2025

pypdfium2 is a Python binding for the PDFium library, designed for fast and memory-efficient PDF rendering. In Docling, it's considered lightweight because it loads and processes only what's needed, supports lazy page parsing, and can skip heavy image generation steps—making it ideal for scalable, high-throughput services.

Using pypdfium2 as the PDF backend in Docling speeds up document conversion and reduces memory usage, especially when you disable unnecessary features like page or picture image generation. Docling's backend defers parsing of PDF pages until text or image data is actually requested, and can unload pages after processing to free resources, which is a form of lazy loading and helps with scalability and resource management (source).
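The defer-then-unload behaviour described above can be sketched in plain Python. This is a conceptual illustration only: `LazyPage` and `fake_parser` are hypothetical and do not mirror Docling's backend classes.

```python
from typing import Callable, Optional


class LazyPage:
    """Parse page content on first access and allow it to be unloaded."""

    def __init__(self, page_no: int, parser: Callable[[int], str]) -> None:
        self.page_no = page_no
        self._parser = parser
        self._text: Optional[str] = None
        self.parse_calls = 0

    @property
    def text(self) -> str:
        if self._text is None:
            # Parsing is deferred until the text is actually requested
            self.parse_calls += 1
            self._text = self._parser(self.page_no)
        return self._text

    def unload(self) -> None:
        """Drop the parsed content so its memory can be reclaimed."""
        self._text = None


def fake_parser(page_no: int) -> str:
    """Stand-in for an expensive page parse."""
    return f"text of page {page_no}"


page = LazyPage(1, fake_parser)
calls_before_access = page.parse_calls  # nothing parsed yet
first = page.text                       # triggers the parse
second = page.text                      # served from the cached value
calls_after_access = page.parse_calls
page.unload()
```

Constructing the page costs almost nothing, the parse runs once on first access, and `unload` releases the result once the page has been processed.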

Key configuration options when using pypdfium2 in Docling (the backend is selected on PdfFormatOption; the other options are PdfPipelineOptions fields):

backend=PyPdfiumDocumentBackend in Python, or --pdf-backend pypdfium2 on the CLI: Selects pypdfium2 as the PDF backend.
images_scale: Controls the scaling factor for images (set to 1.0 for speed).
generate_page_images and generate_picture_images: Set to False to skip image generation and save resources.
generate_parsed_pages: Controls whether parsed page data is kept after processing (set to False to reduce memory usage).

Python example:

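from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# OCR off, no image generation, parsed pages not retained
options = PdfPipelineOptions(
    do_ocr=False,
    images_scale=1.0,
    generate_page_images=False,
    generate_picture_images=False,
    generate_parsed_pages=False,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=options, backend=PyPdfiumDocumentBackend)
    }
)
result = converter.convert("input.pdf")
result.document.save_as_json("output.json")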

CLI example (selects the backend and disables OCR; the image-related options are set through PdfPipelineOptions as in the Python example):

docling input.pdf --to json --output ./output --no-ocr --pdf-backend pypdfium2

For advanced usage, you can also control whether parsed pages are retained; thread safety for concurrent access is handled internally (source). These settings help you scale Docling-based services efficiently, especially when you don't need OCR or image outputs (source).


madmaxtwentytwo Sep 4, 2025
Author

What is batch processing in Docling, and how do I enable it? Please give me code samples along with it,
and explain with an example how it is beneficial.

dosubot[bot] bot Sep 4, 2025

Batch processing in Docling means processing multiple pages or documents together in groups (batches) instead of one at a time. This improves scalability and efficiency by reducing overhead, making better use of CPU/GPU resources, and speeding up processing for large files or datasets. For example, when using Vision Language Models (VLMs), batching lets you process several pages in a single model call, which is much faster than processing each page individually.

You can enable batch processing in the CLI with the --page-batch-size flag:

docling input.pdf --to json --output ./output --page-batch-size 8

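In Python, the batch size is a global performance setting rather than an argument to the conversion call:

from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter

# Number of pages per batch (applies process-wide)
settings.perf.page_batch_size = 8

converter = DocumentConverter()
result = converter.convert("input.pdf")
result.document.save_as_json("output.json")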

For advanced scenarios, the same settings.perf object also exposes other batch and concurrency settings, such as doc_batch_size and doc_batch_concurrency.

Example benefit: If you have a 100-page PDF and set --page-batch-size 10, Docling will process 10 pages at a time, so a batched model runs 10 times instead of 100. This can cut total processing time significantly, especially when using GPU or remote models. Speedups of up to 40% are claimed for pipelines with heavy models like OCR or VLMs, but the actual gain depends on your hardware and models, so benchmark your own workload. Batching also keeps memory usage predictable by limiting how many pages are in flight at once, which is especially useful for large-scale or server deployments where throughput and resource control matter most.
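The call-count arithmetic behind this example is a ceiling division. The helper below is only an illustration (`model_calls` is a hypothetical name), and it assumes one model call per batch.

```python
import math


def model_calls(num_pages: int, page_batch_size: int) -> int:
    """Number of batched calls needed, assuming one call per batch."""
    return math.ceil(num_pages / page_batch_size)


print(model_calls(100, 1))   # 100
print(model_calls(100, 10))  # 10
print(model_calls(105, 10))  # 11
```

The last case shows that a partial final batch still costs one full call.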

Best practice: Tune the batch size based on your hardware and workload. Too large a batch may use too much memory; too small may underutilize your hardware. Start with 4–16 and profile for your use case.
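One way to follow this advice is a small timing harness that runs the same pages through several candidate batch sizes. In the sketch below, `fake_process_batch` is a hypothetical stand-in that simulates a fixed per-call overhead plus a per-page cost. Replace it with your real batch-processing call to get meaningful numbers.

```python
import time
from typing import Callable, Dict, List, Sequence


def measure_batch_sizes(
    pages: Sequence[int],
    process_batch: Callable[[Sequence[int]], None],
    candidates: List[int],
) -> Dict[int, float]:
    """Return wall-clock seconds taken to process all pages at each batch size."""
    timings: Dict[int, float] = {}
    for size in candidates:
        start = time.perf_counter()
        for offset in range(0, len(pages), size):
            process_batch(pages[offset:offset + size])
        timings[size] = time.perf_counter() - start
    return timings


def fake_process_batch(batch: Sequence[int]) -> None:
    """Simulated work: fixed per-call overhead plus a small per-page cost."""
    time.sleep(0.001 + 0.0001 * len(batch))


timings = measure_batch_sizes(list(range(64)), fake_process_batch, [4, 8, 16])
for size, seconds in sorted(timings.items()):
    print(f"batch size {size}: {seconds:.4f}s")
```

With a real workload, record peak memory alongside the timings, since the fastest batch size may not fit within your memory budget.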

References:
Batching support for VLMs and --page-batch-size CLI flag
Threaded PDF pipeline and batch settings
