Description
Requested feature
Currently, when using RapidOCR, the initialization of RapidOcrModel considers artifacts_path. That is, it searches for model artifacts under artifacts_path as defined here:
docling/docling/models/auto_ocr_model.py
Lines 65 to 74 in 6a04e27

    self._engine = RapidOcrModel(
        enabled=self.enabled,
        artifacts_path=artifacts_path,
        options=RapidOcrOptions(
            backend="onnxruntime",
            bitmap_area_threshold=self.options.bitmap_area_threshold,
            force_full_page_ocr=self.options.force_full_page_ocr,
        ),
        accelerator_options=accelerator_options,
    )

docling/docling/models/rapid_ocr_model.py
Lines 125 to 149 in 6a04e27

    if artifacts_path is not None:
        det_model_path = (
            det_model_path
            or artifacts_path
            / self._model_repo_folder
            / self._default_models[backend_enum.value]["det_model_path"]["path"]
        )
        cls_model_path = (
            cls_model_path
            or artifacts_path
            / self._model_repo_folder
            / self._default_models[backend_enum.value]["cls_model_path"]["path"]
        )
        rec_model_path = (
            rec_model_path
            or artifacts_path
            / self._model_repo_folder
            / self._default_models[backend_enum.value]["rec_model_path"]["path"]
        )
        rec_keys_path = (
            rec_keys_path
            or artifacts_path
            / self._model_repo_folder
            / self._default_models[backend_enum.value]["rec_keys_path"]["path"]
        )

The rapidocr package already ships with the base ONNX models, which are installed under ...\Lib\site-packages\rapidocr\models, so we usually do not need to download these models separately from Modelscope. However, when artifacts_path is set in the pipeline, e.g., to load the layout detection or table structure model, we are currently unable to load the default ONNX models shipped with rapidocr.
@geoHeil Would it be possible to either a) deliberately skip a globally defined artifacts_path for RapidOCR, so that the shipped models are loaded and nothing is downloaded from Modelscope, or b) make loading from ...\Lib\site-packages\rapidocr\models the default?
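
To make the request more concrete, here is a rough sketch of the fallback order I have in mind. It is not taken from docling or rapidocr: `packaged_models_dir` and `resolve_model_path` are hypothetical helper names, and the assumption that the shipped models sit flat in the package's models folder under the same file names is mine and would need checking. The idea is that an explicitly configured path wins, then artifacts_path is used if the file actually exists there, and only then the models shipped with the package are tried.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def packaged_models_dir(package: str = "rapidocr") -> Optional[Path]:
    """Return <package dir>/models if the package is installed and the folder exists.

    Hypothetical helper: the "models" folder name is taken from the path
    quoted in this issue, not from a documented rapidocr API.
    """
    try:
        spec = find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    models_dir = Path(spec.origin).parent / "models"
    return models_dir if models_dir.is_dir() else None


def resolve_model_path(
    explicit: Optional[Path],
    artifacts_path: Optional[Path],
    repo_folder: str,
    relative_path: str,
    packaged_dir: Optional[Path],
) -> Optional[Path]:
    """Pick a model file: explicit option, then artifacts_path, then shipped models.

    Returns None when nothing is found, leaving the decision (e.g. a
    download) to the caller.
    """
    # 1. A path set explicitly in the options always wins.
    if explicit is not None:
        return explicit
    # 2. Use artifacts_path only if the file is really there.
    if artifacts_path is not None:
        candidate = artifacts_path / repo_folder / relative_path
        if candidate.exists():
            return candidate
    # 3. Fall back to the models shipped with the package.
    #    Assumption: they are stored flat, under the same file name.
    if packaged_dir is not None:
        candidate = packaged_dir / Path(relative_path).name
        if candidate.exists():
            return candidate
    return None
```

With something like this, a globally defined artifacts_path that only contains the layout and table structure models would no longer prevent RapidOCR from finding its shipped ONNX files, while users who do place RapidOCR models under artifacts_path would keep the current behavior.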