
feat: add use_bundled_models flag to RapidOcrOptions (#2584) #2896

Open
geoHeil wants to merge 2 commits into docling-project:main from geoHeil:fix/rapidocr-v2

@geoHeil
Contributor

@geoHeil geoHeil commented Jan 20, 2026

resolves: #2584 (comment)

@github-actions
Contributor

github-actions bot commented Jan 20, 2026

DCO Check Passed

Thanks @geoHeil, all your commits are properly signed off. 🎉

…#2584)

Add explicit control for using RapidOCR's bundled models instead of
artifacts_path. This addresses the issue where users couldn't use
RapidOCR's pre-packaged models when artifacts_path was set globally.

Changes:
- Add use_bundled_models flag to RapidOcrOptions (default: False)
- When True: Ignores artifacts_path and uses RapidOCR's bundled models
- When False: Follows Docling's standard behavior (uses artifacts_path)
- Explicitly set model paths always take precedence

This maintains Docling's design philosophy where artifacts_path controls
model sources by default, while providing an explicit opt-in for users
who want to use RapidOCR's pre-installed models.
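
The precedence described in the commit message can be sketched as a small resolver. This is only an illustration of the stated rules, not the code in `rapid_ocr_model.py`: the `OcrModelConfig` class, the `resolve_det_model` function, the `det_model.onnx` file name, and the `"unspecified"` fallback label are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass
class OcrModelConfig:
    """Hypothetical stand-in for the two RapidOcrOptions fields discussed here."""

    det_model_path: Optional[str] = None
    use_bundled_models: bool = False


def resolve_det_model(
    config: OcrModelConfig, artifacts_path: Optional[str]
) -> Tuple[str, Optional[Path]]:
    """Return (source, path) for the detection model using the stated precedence."""
    # 1. An explicitly set model path always wins.
    if config.det_model_path is not None:
        return "explicit", Path(config.det_model_path)
    # 2. Opt-in flag: ignore artifacts_path and let RapidOCR use its bundled models.
    if config.use_bundled_models:
        return "bundled", None
    # 3. Default: follow artifacts_path when it is set.
    #    The file name below is an assumed layout, for illustration only.
    if artifacts_path is not None:
        return "artifacts", Path(artifacts_path) / "det_model.onnx"
    # 4. Neither is set: the commit message does not describe this case.
    return "unspecified", None


if __name__ == "__main__":
    print(resolve_det_model(OcrModelConfig(use_bundled_models=True), "/opt/artifacts"))
    print(resolve_det_model(OcrModelConfig(), "/opt/artifacts"))
```

In this sketch the flag only matters when no explicit path is given, which matches the "explicit opt-in" framing above.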

Fixes docling-project#2584

Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
@mergify

mergify bot commented Jan 20, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
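
To see why this PR's title satisfies the rule, the pattern can be tried locally. The sketch below assumes Python's `re` module behaves closely enough to Mergify's `~=` regex matching for this pattern; the `is_conventional` helper and the second and third sample titles are made up for illustration.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


if __name__ == "__main__":
    samples = [
        "feat: add use_bundled_models flag to RapidOcrOptions (#2584)",  # this PR
        "feat(ocr)!: drop legacy option",  # hypothetical, scoped and breaking
        "Add bundled models flag",  # hypothetical, no type prefix
    ]
    for title in samples:
        print(f"{is_conventional(title)!s:5}  {title}")
```

The pattern only anchors the start of the title, so the trailing `(#2584)` issue reference has no effect on the check.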

@codecov

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 53.84615% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| docling/models/stages/ocr/rapid_ocr_model.py | 50.00% | 6 Missing ⚠️ |
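
The report gives percentages and missing-line counts but not totals. As a rough back-of-the-envelope check, the totals can be inferred from those two numbers, assuming coverage is simply covered / (covered + missing) with no partially covered lines. The `infer_covered` helper is hypothetical and the results are inferences, not figures from the report.

```python
def infer_covered(missing: int, coverage_percent: float) -> int:
    """Infer the covered-line count from missing lines and a coverage percentage.

    Assumes coverage = covered / (covered + missing), with no partial lines.
    """
    ratio = coverage_percent / 100.0
    return round(missing * ratio / (1.0 - ratio))


# Whole patch: 53.84615% with 6 lines missing.
patch_covered = infer_covered(6, 53.84615)
patch_total = patch_covered + 6

# rapid_ocr_model.py alone: 50.00% with 6 lines missing.
file_covered = infer_covered(6, 50.00)
file_total = file_covered + 6

if __name__ == "__main__":
    print(f"patch: {patch_covered}/{patch_total} lines covered")
    print(f"rapid_ocr_model.py: {file_covered}/{file_total} lines covered")
    print(f"changed lines in other files: {patch_total - file_total}")
```

Under that assumption the patch touches about 13 measured lines, 12 of them in `rapid_ocr_model.py`, and all of the missing coverage sits in that one file.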


@geoHeil geoHeil marked this pull request as ready for review February 19, 2026 16:07
@dosubot

dosubot bot commented Feb 19, 2026

Related Documentation

2 document(s) may need updating based on files changed in this PR:

Docling

Can I use a custom OCR model in Docling, and how do I set its path in the pipeline options?
@@ -24,7 +24,7 @@
   )
   ```
 
-- **RapidOCR:** Set `det_model_path`, `cls_model_path`, `rec_model_path`, and `font_path` in `RapidOcrOptions` to your custom model files and font file. The `font_path` option allows you to specify a custom font for text recognition.
+- **RapidOCR:** Set `det_model_path`, `cls_model_path`, `rec_model_path`, and `font_path` in `RapidOcrOptions` to your custom model files and font file. The `font_path` option allows you to specify a custom font for text recognition. Alternatively, set `use_bundled_models=True` to use RapidOCR's bundled models (shipped with the package) instead of models from `artifacts_path`.
   ```python
   from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
   pipeline_options = PdfPipelineOptions(
@@ -38,6 +38,20 @@
       )
   )
   ```
+  
+  To use RapidOCR's bundled models:
+  ```python
+  from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
+  pipeline_options = PdfPipelineOptions(
+      do_ocr=True,
+      ocr_options=RapidOcrOptions(
+          lang=["en", "ru"],
+          use_bundled_models=True
+      )
+  )
+  ```
+  
+  Note: Explicitly set model paths (`det_model_path`, etc.) always take precedence over `use_bundled_models` and `artifacts_path`.
 
 After configuring your `pipeline_options`, pass them to your `DocumentConverter` as usual.
 


Models handling in Docling Serve
@@ -143,11 +143,21 @@
 ## Local Docker Execution
 For local Docker or Podman execution, you can use any of the approaches above. Mounting a local directory with pre-downloaded models is the most reliable for repeated runs and avoids network dependencies. EasyOCR models are included by default in auto-ocr workflows.
 
+## RapidOCR Bundled Models
+RapidOCR supports using bundled models (shipped with the package) instead of models from the artifacts path. This is controlled by the `use_bundled_models` flag in `RapidOcrOptions`:
+
+- When `use_bundled_models=True`: RapidOCR uses models from its own package directory (`site-packages/rapidocr/models`), ignoring the artifacts path.
+- When `use_bundled_models=False` (default): RapidOCR follows Docling's standard behavior and uses models from the artifacts path when set.
+- Explicitly set model paths (such as `det_model_path`, `cls_model_path`, or `rec_model_path`) always take precedence over both bundled models and the artifacts path.
+
+This option is useful when you want to use RapidOCR's default models without managing separate model downloads.
+
 ## Troubleshooting and Best Practices
 - If a required model is missing from the artifacts path, Docling Serve will raise a runtime error.
 - Always ensure the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches the directory where models are stored and mounted.
 - For multi-worker or reload scenarios, use the environment variable, not the CLI argument, to set the artifacts path.
 - For production and cluster environments, prefer persistent storage and pre-loading models via a dedicated job.
 - EasyOCR models are now included by default in auto-ocr; explicit inclusion is only needed for custom workflows.
+- For RapidOCR, you can use bundled models by setting `use_bundled_models=True` in the OCR options, which bypasses the need for artifacts path configuration.
 
 For more details and YAML manifest examples, see the [pre-loading models documentation](https://github.com/docling-project/docling-serve/blob/fd1b987e8dc174f1a6013c003dde33e9acbae39a/docs/pre-loading-models.md) and [deployment documentation](https://github.com/docling-project/docling-serve/blob/fd1b987e8dc174f1a6013c003dde33e9acbae39a/docs/deployment.md).


Development

Successfully merging this pull request may close these issues.

Make RapidOcrOptions ignore artifacts_path
