# feat: Layout + VLM model with layout prompt #2244
base: main
## Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit — Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
✅ DCO Check Passed — Thanks @cau-git, all your commits are properly signed off. 🎉
Force-pushed from 0dbdd1f to 72007b9.
Codecov Report: ❌ Patch coverage is …
Force-pushed from a63ebb1 to 0a39e2f.
Force-pushed from d572cf4 to 6ea00bb.
@dosu review
The experimental VLM pipeline is implemented in the main codebase (not a separate experimental module) and supports producing DocTags as well as injecting layout information into prompts for vision-language models. The pipeline is flexible: prompts can be customized per page, and multiple VLM backends (Transformers, MLX, vLLM, API) are supported, with output formats such as DocTags, Markdown, and HTML. Layout prompt injection is handled by constructing prompts that can include box locations or other layout model outputs at runtime.

Documentation and usage examples have been updated and are available, including scripts for minimal pipelines, model comparisons, API usage, and GPU acceleration. These examples demonstrate how to configure the pipeline, inject layout into prompts, and export results in various formats. Recent PRs confirm that these examples and docs are maintained and integrated into the CI workflow for visibility and testing (source, source, source). Automated test coverage for this experimental feature is not clearly documented, but the code is modular, maintainable, and well documented for its experimental status. For concrete usage, see the bundled example scripts.

To reply, just mention @dosu.
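For readers who want a feel for what "injecting layout into prompts" means in practice, here is a minimal, self-contained sketch. The helper and type names (`make_layout_prompt`, `Box`) are illustrative inventions, not docling API:

```python
# Illustrative sketch only: make_layout_prompt and Box are invented names,
# not docling API. The idea is to append predicted layout boxes to the base
# VLM prompt at runtime, as the pipeline description above outlines.
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    label: str
    left: float
    top: float
    right: float
    bottom: float


def make_layout_prompt(base_prompt: str, boxes: List[Box]) -> str:
    """Append layout boxes to the base prompt as a simple XML-like string."""
    layout_xml = "".join(
        f'<{b.label} l="{b.left:.0f}" t="{b.top:.0f}" r="{b.right:.0f}" b="{b.bottom:.0f}"/>'
        for b in boxes
    )
    return base_prompt + layout_xml


boxes = [Box("text", 56, 72, 540, 130), Box("table", 56, 150, 540, 420)]
print(make_layout_prompt("Convert this page to DocTags.", boxes))
```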
### Pull Request Overview
This PR standardizes the build_prompt method call signature across VLM models and introduces a new experimental threaded pipeline that combines layout model preprocessing with VLM processing.
- Standardizes `build_prompt()` to accept full `Page` objects instead of inconsistent `page.parsed_page` or callable checks (a signature sketch follows this list)
- Introduces `BaseVlmPageModel` as a common base class for VLM page models
- Adds experimental `ThreadedLayoutVlmPipeline` that injects layout coordinates into VLM prompts
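As context for the first bullet, a sketch of what the standardized signature looks like. The class body is illustrative and the import paths are my best guess; only the `Union[Page, SegmentedPage]` annotation is taken from the diff below:

```python
# Sketch of the standardized signature; the class and prompt body are
# illustrative, only the annotation reflects the diff in this PR.
from typing import Union

from docling.datamodel.base_models import Page
from docling_core.types.doc.page import SegmentedPage


class ExampleVlmOptions:
    def build_prompt(self, page: Union[Page, SegmentedPage]) -> str:
        # Receiving the full Page (rather than page.parsed_page) means the
        # prompt builder can reach layout predictions, images, and parsed text.
        return "Convert this page to DocTags."
```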
### Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

Summary per file:
| File | Description |
|---|---|
| `docling/models/vlm_models_inline/mlx_model.py` | Simplified to use the standardized `build_prompt(page)` instead of a callable check |
| `docling/models/vlm_models_inline/hf_transformers_model.py` | Changed from `build_prompt(page.parsed_page)` to `build_prompt(page)` for consistency |
| `docling/models/api_vlm_model.py` | Refactored to inherit from `BaseVlmPageModel`, added a `process_images()` method, and standardized prompt building (with an inconsistency issue) |
| `docling/experimental/pipeline/threaded_layout_vlm_pipeline.py` | New experimental pipeline combining layout and VLM processing with layout coordinate injection |
| `docling/experimental/demo_layout_vlm.py` | Demo script for the new threaded pipeline (contains a typo in a parameter name) |
| `docling/experimental/datamodel/threaded_layout_vlm_pipeline_options.py` | Configuration options for the new pipeline |
| `docling/experimental/pipeline/__init__.py` | New package initialization |
| `docling/experimental/datamodel/__init__.py` | New package initialization |
| `docling/experimental/__init__.py` | New experimental package with docstring |
| `docling/datamodel/pipeline_options_vlm_model.py` | Updated `build_prompt()` signature to accept `Union[Page, SegmentedPage]` (with a duplicate import) |
```python
if hi_res_image is not None:
    images.append(hi_res_image)
prompt = self.vlm_options.build_prompt(
    page.parsed_page
```
Copilot AI · Nov 5, 2025
Inconsistent parameter passed to `build_prompt()`. This file calls `build_prompt(page.parsed_page)` while the other VLM models (`mlx_model.py`, `hf_transformers_model.py`) call `build_prompt(page)`. This should be changed to `self.vlm_options.build_prompt(page)` for consistency.
Suggested change:

```diff
-    page.parsed_page
+    page
```
```python
    images.append(hi_res_image)
prompt = self.vlm_options.build_prompt(
    page.parsed_page
) # ask christoph
```
Copilot AI · Nov 5, 2025
Remove or resolve the TODO comment '# ask christoph' before merging to production.
Suggested change:

```diff
-) # ask christoph
+)
```
```python
stop_reason = VlmStopReason.UNSPECIFIED
# ...
if self.vlm_options.custom_stopping_criteria: # Ask christoph
```
Copilot AI · Nov 5, 2025
Remove or resolve the TODO comment '# Ask christoph' before merging to production.
Suggested change:

```diff
-if self.vlm_options.custom_stopping_criteria: # Ask christoph
+if self.vlm_options.custom_stopping_criteria:
```
```python
    generate_page_images=True,
)

pipeline_options_classic_vlm = VlmPipelineOptions(vlm_otpions=GRANITEDOCLING_VLLM)
```
Copilot AI · Nov 5, 2025
Typo: `vlm_otpions` should be `vlm_options`.
Suggested change:

```diff
-pipeline_options_classic_vlm = VlmPipelineOptions(vlm_otpions=GRANITEDOCLING_VLLM)
+pipeline_options_classic_vlm = VlmPipelineOptions(vlm_options=GRANITEDOCLING_VLLM)
```
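A side note on why this typo is easy to miss: docling's pipeline options are Pydantic models, and with Pydantic's default `extra="ignore"` behavior an unknown keyword such as `vlm_otpions` is silently dropped, so the demo runs with default options instead of failing. A generic sketch (plain Pydantic, not docling's actual model config):

```python
# Plain-Pydantic sketch of the failure mode; docling's real model config
# may differ, this just shows why the misspelling does not raise by default.
from pydantic import BaseModel, ConfigDict, ValidationError


class Options(BaseModel):
    vlm_options: str = "default"


opts = Options(vlm_otpions="custom")  # typo is silently ignored by default
print(opts.vlm_options)  # -> "default"


class StrictOptions(BaseModel):
    model_config = ConfigDict(extra="forbid")

    vlm_options: str = "default"


try:
    StrictOptions(vlm_otpions="custom")
except ValidationError as err:
    print(err)  # extra="forbid" turns the typo into a hard error
```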
```diff
-from typing import Any, Dict, List, Literal, Optional, Union
+from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Union

 from docling_core.types.doc.page import SegmentedPage
```
Copilot AI · Nov 5, 2025
Duplicate import: `SegmentedPage` is imported both at line 4 and inside the `TYPE_CHECKING` block at line 13. The import at line 4 should be removed, since it is only used for type annotations and is already imported in the `TYPE_CHECKING` block.
Suggested change:

```diff
-from docling_core.types.doc.page import SegmentedPage
```
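For reference, the pattern the comment is asking for, as a generic sketch; the function body is illustrative, not the code under review:

```python
# Generic sketch of keeping an annotation-only import inside TYPE_CHECKING;
# the import is seen by type checkers only and never executed at runtime.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from docling_core.types.doc.page import SegmentedPage


def build_prompt(page: "SegmentedPage") -> str:
    # Quoted annotation (or `from __future__ import annotations`) avoids
    # evaluating SegmentedPage at runtime, so no duplicate import is needed.
    return "Convert this page to DocTags."
```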
```python
layout_injection = f"{layout_xml}"

custom_prompt = base_prompt + layout_injection
print(f"Layout injection prompt: {custom_prompt}")
```
Copilot AI · Nov 5, 2025
Debug print statement should be removed or replaced with proper logging using the `_log` logger that is already defined at the module level.
Suggested change:

```diff
-print(f"Layout injection prompt: {custom_prompt}")
+_log.debug(f"Layout injection prompt: {custom_prompt}")
```
```python
_log = logging.getLogger(__name__)
```
Copilot AI · Nov 5, 2025
The global variable '_log' is not used.
Suggested change:

```diff
-_log = logging.getLogger(__name__)
```
```python
import itertools
import logging
from pathlib import Path
from typing import Iterable, List, Optional, Union, cast
```
Copilot AI · Nov 5, 2025
Import of 'Iterable' is not used.
Suggested change:

```diff
-from typing import Iterable, List, Optional, Union, cast
+from typing import List, Optional, Union, cast
```
Establishes a pipeline targeting VLMs that:

- combines layout model preprocessing with VLM processing, injecting layout coordinates into the VLM prompt;
- lives under the `docling.experimental` module path.

Checklist:
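For orientation, a rough sketch of how the experimental pipeline might be wired into a converter. The options class name `ThreadedLayoutVlmPipelineOptions` is inferred from the options file in this PR's diff, and the wiring follows docling's usual `pipeline_cls` pattern; treat both as assumptions until the PR lands:

```python
# Rough sketch, not the confirmed API: ThreadedLayoutVlmPipelineOptions is
# inferred from threaded_layout_vlm_pipeline_options.py in this PR.
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.experimental.datamodel.threaded_layout_vlm_pipeline_options import (
    ThreadedLayoutVlmPipelineOptions,
)
from docling.experimental.pipeline.threaded_layout_vlm_pipeline import (
    ThreadedLayoutVlmPipeline,
)

pipeline_options = ThreadedLayoutVlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=ThreadedLayoutVlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)

result = converter.convert("example.pdf")  # path is a placeholder
print(result.document.export_to_markdown())
```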