feat: add use_bundled_models flag to RapidOcrOptions (#2584) #2896
Open
geoHeil wants to merge 2 commits into docling-project:main
✅ DCO Check Passed: Thanks @geoHeil, all your commits are properly signed off. 🎉
…#2584)

Add explicit control for using RapidOCR's bundled models instead of artifacts_path. This addresses the issue where users couldn't use RapidOCR's pre-packaged models when artifacts_path was set globally.

Changes:
- Add use_bundled_models flag to RapidOcrOptions (default: False)
- When True: Ignores artifacts_path and uses RapidOCR's bundled models
- When False: Follows Docling's standard behavior (uses artifacts_path)
- Explicitly set model paths always take precedence

This maintains Docling's design philosophy where artifacts_path controls model sources by default, while providing an explicit opt-in for users who want to use RapidOCR's pre-installed models.

Fixes docling-project#2584

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
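For readers skimming the PR, the precedence described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `OcrOptionsSketch`, `resolve_det_model`, and the `det.onnx` file name are hypothetical stand-ins, and the behavior when neither `artifacts_path` nor the flag is set is an assumption (the sketch simply defers to RapidOCR).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class OcrOptionsSketch:
    """Hypothetical stand-in for RapidOcrOptions; only the fields discussed here."""

    det_model_path: Optional[str] = None
    use_bundled_models: bool = False


def resolve_det_model(
    options: OcrOptionsSketch,
    artifacts_path: Optional[Path],
    default_rel_path: str = "det.onnx",  # hypothetical file name, for illustration only
) -> Optional[Path]:
    """Return the detection model path, or None to defer to RapidOCR's own models."""
    # 1. An explicitly set model path always takes precedence.
    if options.det_model_path is not None:
        return Path(options.det_model_path)
    # 2. Explicit opt-in: ignore artifacts_path and use the bundled models.
    if options.use_bundled_models:
        return None
    # 3. Default behavior: resolve the model under artifacts_path when it is set.
    if artifacts_path is not None:
        return artifacts_path / default_rel_path
    # 4. Assumption: with nothing configured, defer to RapidOCR as well.
    return None
```

The same ordering would apply to `cls_model_path` and `rec_model_path`; only one model is shown to keep the sketch short.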
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
b70f196 to d34d11a
Codecov Report❌ Patch coverage is
Related Documentation
2 document(s) may need updating based on files changed in this PR:
Docling: Can I use a custom OCR model in Docling, and how do I set its path in the pipeline options?
@@ -24,7 +24,7 @@
)
```
-- **RapidOCR:** Set `det_model_path`, `cls_model_path`, `rec_model_path`, and `font_path` in `RapidOcrOptions` to your custom model files and font file. The `font_path` option allows you to specify a custom font for text recognition.
+- **RapidOCR:** Set `det_model_path`, `cls_model_path`, `rec_model_path`, and `font_path` in `RapidOcrOptions` to your custom model files and font file. The `font_path` option allows you to specify a custom font for text recognition. Alternatively, set `use_bundled_models=True` to use RapidOCR's bundled models (shipped with the package) instead of models from `artifacts_path`.
```python
from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
pipeline_options = PdfPipelineOptions(
@@ -38,6 +38,20 @@
)
)
```
+
+ To use RapidOCR's bundled models:
+ ```python
+ from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
+ pipeline_options = PdfPipelineOptions(
+ do_ocr=True,
+ ocr_options=RapidOcrOptions(
+ lang=["en", "ru"],
+ use_bundled_models=True
+ )
+ )
+ ```
+
+ Note: Explicitly set model paths (`det_model_path`, etc.) always take precedence over `use_bundled_models` and `artifacts_path`.
After configuring your `pipeline_options`, pass them to your `DocumentConverter` as usual.
Models handling in Docling Serve
@@ -143,11 +143,21 @@
## Local Docker Execution
For local Docker or Podman execution, you can use any of the approaches above. Mounting a local directory with pre-downloaded models is the most reliable for repeated runs and avoids network dependencies. EasyOCR models are included by default in auto-ocr workflows.
+## RapidOCR Bundled Models
+RapidOCR supports using bundled models (shipped with the package) instead of models from the artifacts path. This is controlled by the `use_bundled_models` flag in `RapidOcrOptions`:
+
+- When `use_bundled_models=True`: RapidOCR uses models from its own package directory (`site-packages/rapidocr/models`), ignoring the artifacts path.
+- When `use_bundled_models=False` (default): RapidOCR follows Docling's standard behavior and uses models from the artifacts path when set.
+- Explicitly set model paths (such as `det_model_path`, `cls_model_path`, or `rec_model_path`) always take precedence over both bundled models and the artifacts path.
+
+This option is useful when you want to use RapidOCR's default models without managing separate model downloads.
+
## Troubleshooting and Best Practices
- If a required model is missing from the artifacts path, Docling Serve will raise a runtime error.
- Always ensure the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches the directory where models are stored and mounted.
- For multi-worker or reload scenarios, use the environment variable, not the CLI argument, to set the artifacts path.
- For production and cluster environments, prefer persistent storage and pre-loading models via a dedicated job.
- EasyOCR models are now included by default in auto-ocr; explicit inclusion is only needed for custom workflows.
+- For RapidOCR, you can use bundled models by setting `use_bundled_models=True` in the OCR options, which bypasses the need for artifacts path configuration.
For more details and YAML manifest examples, see the [pre-loading models documentation](https://github.com/docling-project/docling-serve/blob/fd1b987e8dc174f1a6013c003dde33e9acbae39a/docs/pre-loading-models.md) and [deployment documentation](https://github.com/docling-project/docling-serve/blob/fd1b987e8dc174f1a6013c003dde33e9acbae39a/docs/deployment.md).
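The suggested documentation states that the bundled models live in the package directory (`site-packages/rapidocr/models`). One way to check whether such a directory exists in a given environment is sketched below. The helper name `bundled_models_dir` is hypothetical, and the `models` subdirectory name is taken from the text above rather than verified against any particular RapidOCR release.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def bundled_models_dir(package: str = "rapidocr") -> Optional[Path]:
    """Return <package dir>/models if the package is installed and the directory exists."""
    spec = find_spec(package)  # None when the top-level package is not installed
    if spec is None or not spec.submodule_search_locations:
        return None
    # Assumption: models sit in a "models" subdirectory of the package directory.
    candidate = Path(next(iter(spec.submodule_search_locations))) / "models"
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    location = bundled_models_dir()
    print(location if location is not None else "no bundled models directory found")
```

A `None` result means that either the package is not installed or it does not ship a `models` directory at that location, in which case `use_bundled_models=True` may not find local model files.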
resolves: #2584 (comment)