Commit 0be5ebc

Update from main
Signed-off-by: Christoph Auer <[email protected]>
2 parents 98fde58 + 8322c2e commit 0be5ebc

49 files changed, +4826 -2716 lines changed

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -1,3 +1,35 @@
+## [v2.53.0](https://github.com/docling-project/docling/releases/tag/v2.53.0) - 2025-09-17
+
+### Feature
+
+* Add granite-docling model ([#2272](https://github.com/docling-project/docling/issues/2272)) ([`17afb66`](https://github.com/docling-project/docling/commit/17afb664d005168b5a6f12a2df4432076a9329bb))
+* **RapidOcr:** Support generic extra arguments for RapidOcr ([#2266](https://github.com/docling-project/docling/issues/2266)) ([`0e95171`](https://github.com/docling-project/docling/commit/0e95171dd64733ba52f2f0906642be24f6237977))
+
+### Fix
+
+* Handle empty result from RapidOCR to avoid crash ([#2264](https://github.com/docling-project/docling/issues/2264)) ([`609d902`](https://github.com/docling-project/docling/commit/609d902eef157ae68e33faa26d73533ef7a4a749))
+
+### Documentation
+
+* Describe examples ([#2262](https://github.com/docling-project/docling/issues/2262)) ([`ff351fd`](https://github.com/docling-project/docling/commit/ff351fd40c4b635133401e77dea89bec8cd0ca33))
+
+## [v2.52.0](https://github.com/docling-project/docling/releases/tag/v2.52.0) - 2025-09-11
+
+### Feature
+
+* Enrichment steps on all convert pipelines (incl docx, html, etc) ([#2251](https://github.com/docling-project/docling/issues/2251)) ([`2c91234`](https://github.com/docling-project/docling/commit/2c9123419f541feda8cc98c53aeb37288fabcaee))
+
+### Fix
+
+* Add missing features in ThreadedStandardPdfPipeline ([#2252](https://github.com/docling-project/docling/issues/2252)) ([`0700af2`](https://github.com/docling-project/docling/commit/0700af212cce8d90dbe0477dcb06d69370649e97))
+* Address deprecation warnings of dependencies ([#2237](https://github.com/docling-project/docling/issues/2237)) ([`c696549`](https://github.com/docling-project/docling/commit/c6965495a22703d0e35105b5daafcaaf8a8063d6))
+
+### Documentation
+
+* Add an example of RAG with OpenSearch ([#2238](https://github.com/docling-project/docling/issues/2238)) ([`f8cc545`](https://github.com/docling-project/docling/commit/f8cc545bab07e5fdd79bcff7042e9279e18926c6))
+* Add instructions for using Docling with MCP to README ([#2219](https://github.com/docling-project/docling/issues/2219)) ([`e5cd702`](https://github.com/docling-project/docling/commit/e5cd7020bd281aea63519db9a5332dd2dcca54b4))
+* Document VLM support requirement in extraction example ([#2231](https://github.com/docling-project/docling/issues/2231)) ([`55f5f37`](https://github.com/docling-project/docling/commit/55f5f3752f33f5b495cb2af5e6a3aee5d157fad8))
+
 ## [v2.51.0](https://github.com/docling-project/docling/releases/tag/v2.51.0) - 2025-09-05

 ### Feature

README.md

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@ Docling simplifies document processing, parsing diverse formats — including ad
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
-* 👓 Support of several Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
+* 👓 Support of several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
 * 🎙️ Audio support with Automatic Speech Recognition (ASR) models
 * 🔌 Connect to any agent using the [MCP server](https://docling-project.github.io/docling/usage/mcp/)
 * 💻 Simple and convenient CLI
@@ -88,9 +88,9 @@ Docling has a built-in CLI to run conversions.
 docling https://arxiv.org/pdf/2206.01062
 ```

-You can also use 🥚[SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview) and other VLMs via Docling CLI:
+You can also use 🥚[GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M) and other VLMs via Docling CLI:
 ```bash
-docling --pipeline vlm --vlm-model smoldocling https://arxiv.org/pdf/2206.01062
+docling --pipeline vlm --vlm-model granite_docling https://arxiv.org/pdf/2206.01062
 ```
 This will use MLX acceleration on supported Apple Silicon hardware.

docling/cli/main.py

Lines changed: 2 additions & 2 deletions
@@ -336,7 +336,7 @@ def convert( # noqa: C901
     vlm_model: Annotated[
         VlmModelType,
         typer.Option(..., help="Choose the VLM model to use with PDF or image files."),
-    ] = VlmModelType.SMOLDOCLING,
+    ] = VlmModelType.GRANITEDOCLING,
     asr_model: Annotated[
         AsrModelType,
         typer.Option(..., help="Choose the ASR model to use with audio/video files."),
@@ -695,7 +695,7 @@ def convert( # noqa: C901
                             pipeline_options.vlm_options = GRANITEDOCLING_MLX
                         except ImportError:
                             _log.warning(
-                                "To run SmolDocling faster, please install mlx-vlm:\n"
+                                "To run GraniteDocling faster, please install mlx-vlm:\n"
                                 "pip install mlx-vlm"
                             )
                 elif vlm_model == VlmModelType.SMOLDOCLING_VLLM:
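
The second hunk sits in a fallback path: the CLI switches to the MLX variant and only warns when `mlx-vlm` cannot be imported. A minimal sketch of that optional-dependency pattern follows. The helper name `choose_vlm_variant`, the string labels, and the `darwin` platform check are assumptions made for illustration; the diff itself only shows the assignment to `pipeline_options.vlm_options` and the warning text.

```python
import importlib.util
import logging
import sys
from typing import Optional

_log = logging.getLogger(__name__)


def choose_vlm_variant(
    platform: str = sys.platform, mlx_available: Optional[bool] = None
) -> str:
    """Return "mlx" when MLX can be used, otherwise "transformers".

    Hypothetical helper for illustration; it is not part of docling.
    """
    if mlx_available is None:
        # find_spec returns None when the package is not installed.
        mlx_available = importlib.util.find_spec("mlx_vlm") is not None
    if platform != "darwin":
        # Assumption: MLX is only considered on macOS.
        return "transformers"
    if mlx_available:
        return "mlx"
    # Same message as in the diff: degrade gracefully instead of failing.
    _log.warning(
        "To run GraniteDocling faster, please install mlx-vlm:\n"
        "pip install mlx-vlm"
    )
    return "transformers"
```

Passing `mlx_available` explicitly keeps the decision testable on machines where the package is absent.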

docling/cli/models.py

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,8 @@ class _AvailableModels(str, Enum):
     CODE_FORMULA = "code_formula"
     PICTURE_CLASSIFIER = "picture_classifier"
     SMOLVLM = "smolvlm"
+    GRANITEDOCLING = "granitedocling"
+    GRANITEDOCLING_MLX = "granitedocling_mlx"
     SMOLDOCLING = "smoldocling"
     SMOLDOCLING_MLX = "smoldocling_mlx"
     GRANITE_VISION = "granite_vision"
@@ -108,6 +110,8 @@ def download(
         with_code_formula=_AvailableModels.CODE_FORMULA in to_download,
         with_picture_classifier=_AvailableModels.PICTURE_CLASSIFIER in to_download,
         with_smolvlm=_AvailableModels.SMOLVLM in to_download,
+        with_granitedocling=_AvailableModels.GRANITEDOCLING in to_download,
+        with_granitedocling_mlx=_AvailableModels.GRANITEDOCLING_MLX in to_download,
         with_smoldocling=_AvailableModels.SMOLDOCLING in to_download,
         with_smoldocling_mlx=_AvailableModels.SMOLDOCLING_MLX in to_download,
         with_granite_vision=_AvailableModels.GRANITE_VISION in to_download,
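
Each new enum member is paired with a `with_*` keyword whose value is a membership test on `to_download`. The sketch below shows that mapping in isolation. `AvailableModels` is a trimmed stand-in for the private `_AvailableModels` enum, and `download_flags` is a hypothetical helper; the real code spells out every keyword by hand rather than deriving the names in a comprehension.

```python
from enum import Enum
from typing import Dict, Iterable


class AvailableModels(str, Enum):
    """Trimmed stand-in for the enum in docling/cli/models.py."""

    SMOLVLM = "smolvlm"
    GRANITEDOCLING = "granitedocling"
    GRANITEDOCLING_MLX = "granitedocling_mlx"
    SMOLDOCLING = "smoldocling"


def download_flags(to_download: Iterable[AvailableModels]) -> Dict[str, bool]:
    """Map every known model to a `with_<name>` boolean flag."""
    selected = set(to_download)
    return {f"with_{model.value}": model in selected for model in AvailableModels}
```

Because the enum subclasses `str`, a command-line value such as `"granitedocling_mlx"` converts directly with `AvailableModels("granitedocling_mlx")`.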

docling/datamodel/pipeline_options.py

Lines changed: 7 additions & 3 deletions
@@ -12,7 +12,7 @@
 )
 from typing_extensions import deprecated

-from docling.datamodel import asr_model_specs
+from docling.datamodel import asr_model_specs, vlm_model_specs

 # Import the following for backwards compatibility
 from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
@@ -114,7 +114,11 @@ class RapidOcrOptions(OcrOptions):
     cls_model_path: Optional[str] = None  # same default as rapidocr
     rec_model_path: Optional[str] = None  # same default as rapidocr
     rec_keys_path: Optional[str] = None  # same default as rapidocr
-    rec_font_path: Optional[str] = None  # same default as rapidocr
+    rec_font_path: Optional[str] = None  # Deprecated, please use font_path instead
+    font_path: Optional[str] = None  # same default as rapidocr
+
+    # Dictionary to overwrite or pass-through additional parameters
+    rapidocr_params: Dict[str, Any] = Field(default_factory=dict)

     model_config = ConfigDict(
         extra="forbid",
@@ -286,7 +290,7 @@ class VlmPipelineOptions(PaginatedPipelineOptions):
     )
     # If True, text from backend will be used instead of generated text
     vlm_options: Union[InlineVlmOptions, ApiVlmOptions] = (
-        smoldocling_vlm_conversion_options
+        vlm_model_specs.GRANITEDOCLING_TRANSFORMERS
     )

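
The new `rapidocr_params` field is applied in `docling/models/rapid_ocr_model.py` with `params.update(...)`, so user-supplied keys take precedence over the values derived from the typed options. A dependency-free sketch of that precedence is given here. `RapidOcrOptionsSketch` and `build_params` are hypothetical stand-ins, the `text_score` default of 0.5 is an assumption, and `Det.example_key` is a made-up key used only to show pass-through.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

_log = logging.getLogger(__name__)


@dataclass
class RapidOcrOptionsSketch:
    """Stand-in carrying only the fields relevant to this change."""

    text_score: float = 0.5  # assumed default, for illustration only
    rec_font_path: Optional[str] = None  # deprecated in favour of font_path
    font_path: Optional[str] = None
    # default_factory gives each instance its own dict (no shared mutable default).
    rapidocr_params: Dict[str, Any] = field(default_factory=dict)


def build_params(options: RapidOcrOptionsSketch) -> Dict[str, Any]:
    """Build the flat parameter dict, letting user overrides win."""
    params: Dict[str, Any] = {
        "Global.text_score": options.text_score,
        "Global.font_path": options.font_path,
        "Rec.font_path": options.rec_font_path,
    }
    if options.rec_font_path is not None:
        _log.warning("'rec_font_path' is deprecated. Please use 'font_path' instead.")
    # Later keys overwrite earlier ones, so this is where overrides take effect.
    params.update(options.rapidocr_params)
    return params
```

Keys that the typed options do not cover are passed through unchanged, which is what makes the field a generic escape hatch.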

docling/datamodel/vlm_model_specs.py

Lines changed: 4 additions & 3 deletions
@@ -20,10 +20,11 @@

 # Granite-Docling
 GRANITEDOCLING_TRANSFORMERS = InlineVlmOptions(
-    repo_id="ds4sd/granite-docling-258m-2-9-2025-v2",
+    repo_id="ibm-granite/granite-docling-258M",
     prompt="Convert this page to docling.",
     response_format=ResponseFormat.DOCTAGS,
-    inference_framework=InferenceFramework.MLX,
+    inference_framework=InferenceFramework.TRANSFORMERS,
+    transformers_model_type=TransformersModelType.AUTOMODEL_IMAGETEXTTOTEXT,
     supported_devices=[
         AcceleratorDevice.CPU,
         AcceleratorDevice.CUDA,
@@ -35,7 +36,7 @@
 )

 GRANITEDOCLING_MLX = InlineVlmOptions(
-    repo_id="ds4sd/granite-docling-258m-2-9-2025-v2-mlx-bf16",
+    repo_id="ibm-granite/granite-docling-258M-mlx",
     prompt="Convert this page to docling.",
     response_format=ResponseFormat.DOCTAGS,
     inference_framework=InferenceFramework.MLX,

docling/models/rapid_ocr_model.py

Lines changed: 40 additions & 25 deletions
@@ -62,32 +62,44 @@ def __init__(
             }
             backend_enum = _ALIASES.get(self.options.backend, EngineType.ONNXRUNTIME)

+            params = {
+                # Global settings (these are still correct)
+                "Global.text_score": self.options.text_score,
+                "Global.font_path": self.options.font_path,
+                # "Global.verbose": self.options.print_verbose,
+                # Detection model settings
+                "Det.model_path": self.options.det_model_path,
+                "Det.use_cuda": use_cuda,
+                "Det.use_dml": use_dml,
+                "Det.intra_op_num_threads": intra_op_num_threads,
+                # Classification model settings
+                "Cls.model_path": self.options.cls_model_path,
+                "Cls.use_cuda": use_cuda,
+                "Cls.use_dml": use_dml,
+                "Cls.intra_op_num_threads": intra_op_num_threads,
+                # Recognition model settings
+                "Rec.model_path": self.options.rec_model_path,
+                "Rec.font_path": self.options.rec_font_path,
+                "Rec.keys_path": self.options.rec_keys_path,
+                "Rec.use_cuda": use_cuda,
+                "Rec.use_dml": use_dml,
+                "Rec.intra_op_num_threads": intra_op_num_threads,
+                "Det.engine_type": backend_enum,
+                "Cls.engine_type": backend_enum,
+                "Rec.engine_type": backend_enum,
+            }
+
+            if self.options.rec_font_path is not None:
+                _log.warning(
+                    "The 'rec_font_path' option for RapidOCR is deprecated. Please use 'font_path' instead."
+                )
+            user_params = self.options.rapidocr_params
+            if user_params:
+                _log.debug("Overwriting RapidOCR params with user-provided values.")
+                params.update(user_params)
+
             self.reader = RapidOCR(
-                params={
-                    # Global settings (these are still correct)
-                    "Global.text_score": self.options.text_score,
-                    # "Global.verbose": self.options.print_verbose,
-                    # Detection model settings
-                    "Det.model_path": self.options.det_model_path,
-                    "Det.use_cuda": use_cuda,
-                    "Det.use_dml": use_dml,
-                    "Det.intra_op_num_threads": intra_op_num_threads,
-                    # Classification model settings
-                    "Cls.model_path": self.options.cls_model_path,
-                    "Cls.use_cuda": use_cuda,
-                    "Cls.use_dml": use_dml,
-                    "Cls.intra_op_num_threads": intra_op_num_threads,
-                    # Recognition model settings
-                    "Rec.model_path": self.options.rec_model_path,
-                    "Rec.font_path": self.options.rec_font_path,
-                    "Rec.keys_path": self.options.rec_keys_path,
-                    "Rec.use_cuda": use_cuda,
-                    "Rec.use_dml": use_dml,
-                    "Rec.intra_op_num_threads": intra_op_num_threads,
-                    "Det.engine_type": backend_enum,
-                    "Cls.engine_type": backend_enum,
-                    "Rec.engine_type": backend_enum,
-                }
+                params=params,
             )

     def __call__(
def __call__(
@@ -120,6 +132,9 @@ def __call__(
                         use_cls=self.options.use_cls,
                         use_rec=self.options.use_rec,
                     )
+                    if result is None or result.boxes is None:
+                        _log.warning("RapidOCR returned empty result!")
+                        continue
                     result = list(
                         zip(result.boxes.tolist(), result.txts, result.scores)
                     )
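
The added guard skips a region when the OCR call returns nothing, before the code dereferences `result.boxes`. A minimal sketch of the same check, under the assumption that each result exposes `boxes`, `txts`, and `scores` as in the hunk: `collect_cells` is a hypothetical helper, and plain lists stand in for the array that the real code converts with `.tolist()`.

```python
import logging
from typing import Any, Iterable, List, Tuple

_log = logging.getLogger(__name__)


def collect_cells(results: Iterable[Any]) -> List[Tuple[Any, str, float]]:
    """Flatten per-region OCR results into (box, text, score) tuples."""
    cells: List[Tuple[Any, str, float]] = []
    for result in results:
        # An empty region may yield None, or an object whose boxes are None.
        if result is None or result.boxes is None:
            _log.warning("OCR returned empty result; skipping region")
            continue
        cells.extend(zip(result.boxes, result.txts, result.scores))
    return cells
```

Using `continue` rather than raising means one blank region does not abort the remaining regions.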

docling/utils/model_downloader.py

Lines changed: 22 additions & 0 deletions
@@ -10,6 +10,8 @@
 )
 from docling.datamodel.settings import settings
 from docling.datamodel.vlm_model_specs import (
+    GRANITEDOCLING_MLX,
+    GRANITEDOCLING_TRANSFORMERS,
     SMOLDOCLING_MLX,
     SMOLDOCLING_TRANSFORMERS,
 )
@@ -34,6 +36,8 @@ def download_models(
     with_code_formula: bool = True,
     with_picture_classifier: bool = True,
     with_smolvlm: bool = False,
+    with_granitedocling: bool = False,
+    with_granitedocling_mlx: bool = False,
     with_smoldocling: bool = False,
     with_smoldocling_mlx: bool = False,
     with_granite_vision: bool = False,
@@ -86,6 +90,24 @@ def download_models(
             progress=progress,
         )

+    if with_granitedocling:
+        _log.info("Downloading GraniteDocling model...")
+        download_hf_model(
+            repo_id=GRANITEDOCLING_TRANSFORMERS.repo_id,
+            local_dir=output_dir / GRANITEDOCLING_TRANSFORMERS.repo_cache_folder,
+            force=force,
+            progress=progress,
+        )
+
+    if with_granitedocling_mlx:
+        _log.info("Downloading GraniteDocling MLX model...")
+        download_hf_model(
+            repo_id=GRANITEDOCLING_MLX.repo_id,
+            local_dir=output_dir / GRANITEDOCLING_MLX.repo_cache_folder,
+            force=force,
+            progress=progress,
+        )
+
     if with_smoldocling:
         _log.info("Downloading SmolDocling model...")
         download_hf_model(

docs/examples/batch_convert.py

Lines changed: 45 additions & 1 deletion
@@ -1,3 +1,33 @@
+# %% [markdown]
+# Batch convert multiple PDF files and export results in several formats.
+
+# What this example does
+# - Loads a small set of sample PDFs.
+# - Runs the Docling PDF pipeline once per file.
+# - Writes outputs to `scratch/` in multiple formats (JSON, HTML, Markdown, text, doctags, YAML).
+
+# Prerequisites
+# - Install Docling and dependencies as described in the repository README.
+# - Ensure you can import `docling` from your Python environment.
+# <!-- YAML export requires `PyYAML` (`pip install pyyaml`). -->
+
+# Input documents
+# - By default, this example uses a few PDFs from `tests/data/pdf/` in the repo.
+# - If you cloned without test data, or want to use your own files, edit
+# `input_doc_paths` below to point to PDFs on your machine.
+
+# Output formats (controlled by flags)
+# - `USE_V2 = True` enables the current Docling document exports (recommended).
+# - `USE_LEGACY = False` keeps legacy Deep Search exports disabled.
+# You can set it to `True` if you need legacy formats for compatibility tests.
+
+# Notes
+# - Set `pipeline_options.generate_page_images = True` to include page images in HTML.
+# - The script logs conversion progress and raises if any documents fail.
+# <!-- This example shows both helper methods like `save_as_*` and lower-level
+# `export_to_*` + manual file writes; outputs may overlap intentionally. -->
+# %%
+
 import json
 import logging
 import time
@@ -15,6 +45,9 @@

 _log = logging.getLogger(__name__)

+# Export toggles:
+# - USE_V2 controls modern Docling document exports.
+# - USE_LEGACY enables legacy Deep Search exports for comparison or migration.
 USE_V2 = True
 USE_LEGACY = False

@@ -35,6 +68,9 @@ def export_documents(
             doc_filename = conv_res.input.file.stem

             if USE_V2:
+                # Recommended modern Docling exports. These helpers mirror the
+                # lower-level "export_to_*" methods used below, but handle
+                # common details like image handling.
                 conv_res.document.save_as_json(
                     output_dir / f"{doc_filename}.json",
                     image_mode=ImageRefMode.PLACEHOLDER,
@@ -121,6 +157,9 @@ def export_documents(
 def main():
     logging.basicConfig(level=logging.INFO)

+    # Location of sample PDFs used by this example. If your checkout does not
+    # include test data, change `data_folder` or point `input_doc_paths` to
+    # your own files.
     data_folder = Path(__file__).parent / "../../tests/data"
     input_doc_paths = [
         data_folder / "pdf/2206.01062.pdf",
@@ -139,6 +178,8 @@ def main():
     # settings.debug.visualize_tables = True
     # settings.debug.visualize_cells = True

+    # Configure the PDF pipeline. Enabling page image generation improves HTML
+    # previews (embedded images) but adds processing time.
     pipeline_options = PdfPipelineOptions()
     pipeline_options.generate_page_images = True

@@ -152,11 +193,14 @@ def main():

     start_time = time.time()

+    # Convert all inputs. Set `raises_on_error=False` to keep processing other
+    # files even if one fails; errors are summarized after the run.
     conv_results = doc_converter.convert_all(
         input_doc_paths,
         raises_on_error=False,  # to let conversion run through all and examine results at the end
     )
-    success_count, partial_success_count, failure_count = export_documents(
+    # Write outputs to ./scratch and log a summary.
+    _success_count, _partial_success_count, failure_count = export_documents(
         conv_results, output_dir=Path("scratch")
     )
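
The example converts every input with `raises_on_error=False`, tallies the outcomes, and only then decides whether to fail. A small sketch of that "run everything, judge at the end" pattern is shown here. `Status`, `count_outcomes`, and `check_run` are hypothetical names chosen for illustration and are not taken from the example file.

```python
from enum import Enum
from typing import Iterable, Tuple


class Status(Enum):
    """Hypothetical stand-in for the per-document conversion status."""

    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILURE = "failure"


def count_outcomes(statuses: Iterable[Status]) -> Tuple[int, int, int]:
    """Return (success, partial_success, failure) counts."""
    items = list(statuses)
    return (
        items.count(Status.SUCCESS),
        items.count(Status.PARTIAL_SUCCESS),
        items.count(Status.FAILURE),
    )


def check_run(statuses: Iterable[Status]) -> int:
    """Raise once the whole batch has been examined if any document failed."""
    items = list(statuses)
    # Leading underscores mark counts that are computed but not used here.
    _success, _partial, failure = count_outcomes(items)
    if failure > 0:
        raise RuntimeError(f"{failure} of {len(items)} documents failed to convert")
    return len(items)
```

The underscore-prefixed names mirror the rename in the hunk, which signals to linters that the unused counts are intentional.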
