Commit 4ab7e9d

fix: Guard against attribute errors in TesseractOcrModel __del__ (#1494)
This moves the initialization of the `reader` and `script_readers` attributes to before the attempt to import tesserocr, so that the attributes exist when the `__del__` finalizer later accesses them. This requires changing the type of the `script_readers` dict values to `Any`, because the actual type is a tesserocr type that cannot be referenced before the import.

This prevents an exception from being thrown during garbage collection when a TesseractOcrModel instance did not initialize properly, for example when its initializer throws an `ImportError`.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
1 parent cc45396 commit 4ab7e9d

1 file changed: +3 -3 lines changed

docling/models/tesseract_ocr_model.py

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 import logging
 from collections.abc import Iterable
 from pathlib import Path
-from typing import Optional, Type
+from typing import Any, Optional, Type
 
 from docling_core.types.doc import BoundingBox, CoordOrigin
 from docling_core.types.doc.page import BoundingRectangle, TextCell
@@ -38,6 +38,8 @@ def __init__(
         self.options: TesseractOcrOptions
 
         self.scale = 3  # multiplier for 72 dpi == 216 dpi.
+        self.reader = None
+        self.script_readers: dict[str, Any] = {}
 
         if self.enabled:
             install_errmsg = (
@@ -84,9 +86,7 @@ def __init__(
                 "oem": tesserocr.OEM.DEFAULT,
             }
 
-            self.reader = None
             self.osd_reader = None
-            self.script_readers: dict[str, tesserocr.PyTessBaseAPI] = {}
 
             if self.options.path is not None:
                 tesserocr_kwargs["path"] = self.options.path
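
The failure mode described in the commit message can be reproduced with a minimal sketch. Everything here is hypothetical and stands in for the real code: `load_backend` simulates the failing `import tesserocr`, `UnguardedModel` and `GuardedModel` mimic the attribute ordering before and after this change, and the `__del__` body is an assumption about what the real finalizer roughly does, since it is not part of this diff. Exceptions raised inside `__del__` are not propagated to the caller, so the sketch captures them through `sys.unraisablehook` (Python 3.8+).

```python
import gc
import sys
from typing import Any


def load_backend() -> None:
    """Hypothetical stand-in for the optional `import tesserocr` step."""
    raise ImportError("tesserocr is not installed (simulated)")


class UnguardedModel:
    """Mimics the old ordering: attributes are assigned after the import."""

    def __init__(self) -> None:
        load_backend()  # raises before the attributes below exist
        self.reader: Any = None
        self.script_readers: dict[str, Any] = {}

    def __del__(self) -> None:
        # Assumed finalizer shape: release any reader that was created.
        if self.reader is not None:
            self.reader.End()
        for reader in self.script_readers.values():
            reader.End()


class GuardedModel(UnguardedModel):
    """Mimics the new ordering: attributes are assigned before the import."""

    def __init__(self) -> None:
        self.reader: Any = None
        self.script_readers: dict[str, Any] = {}
        load_backend()  # still raises, but __del__ now finds its attributes


def finalizer_errors(cls: type) -> list[str]:
    """Build and drop an instance; return names of exceptions raised in __del__."""
    seen: list[str] = []
    previous_hook = sys.unraisablehook
    sys.unraisablehook = lambda unraisable: seen.append(unraisable.exc_type.__name__)
    try:
        try:
            cls()
        except ImportError:
            pass  # the constructor failure itself is expected
        gc.collect()  # make sure the half-built instance has been finalized
    finally:
        sys.unraisablehook = previous_hook
    return seen


if __name__ == "__main__":
    print("unguarded:", finalizer_errors(UnguardedModel))
    print("guarded:", finalizer_errors(GuardedModel))
```

In this sketch the unguarded variant should report an `AttributeError` from the finalizer, while the guarded variant reports nothing. The `ImportError` still reaches the caller in both cases, which matches the intent of the change: only the secondary error during garbage collection goes away.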
