Commit 0610d01

fix: enrichment of documents without pages metadata (pptx and xlsx) (#2401)
fix logic for pptx and xlsx

Signed-off-by: Michele Dolfi <[email protected]>
1 parent 9705f40 commit 0610d01

1 file changed, +2 -2 lines changed

docling/models/base_model.py

Lines changed: 2 additions & 2 deletions
@@ -173,11 +173,11 @@ def prepare_element(
         assert isinstance(element, DocItem)
 
         # Allow the case of documents without page images but embedded images (e.g. Word and HTML docs)
-        if len(element.prov) == 0 and isinstance(element, PictureItem):
+        if isinstance(element, PictureItem):
             embedded_im = element.get_image(conv_res.document)
             if embedded_im is not None:
                 return ItemAndImageEnrichmentElement(item=element, image=embedded_im)
-            else:
+            elif len(element.prov) == 0:
                 return None
 
         # Crop the image form the page
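
The control-flow change in this hunk can be illustrated with a small standalone sketch. It does not use docling: `FakePicture`, `route_before`, and `route_after` are hypothetical names invented for illustration, the string return values stand in for the real return paths, and the element is assumed to already be a picture (so the `isinstance(element, PictureItem)` check is taken as true). The reading that pptx and xlsx pictures carry provenance entries without page images is inferred from the commit title, not confirmed by the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakePicture:
    """Hypothetical stand-in for a picture element; not the docling class."""
    prov: List[int] = field(default_factory=list)  # provenance entries
    embedded_image: Optional[str] = None           # placeholder for an image


def route_before(item: FakePicture) -> Optional[str]:
    # Old logic: the embedded image is only consulted when prov is empty.
    if len(item.prov) == 0:
        if item.embedded_image is not None:
            return "embedded"
        else:
            return None
    return "crop_from_page"


def route_after(item: FakePicture) -> Optional[str]:
    # New logic: the embedded image is always tried first for pictures.
    if item.embedded_image is not None:
        return "embedded"
    elif len(item.prov) == 0:
        return None
    return "crop_from_page"


# Picture with provenance and an embedded image (assumed pptx/xlsx case).
with_prov = FakePicture(prov=[1], embedded_image="img")
print(route_before(with_prov), route_after(with_prov))
```

Under these assumptions, the only input whose routing changes is a picture that has both provenance entries and an embedded image: it previously fell through to page cropping and now uses the embedded image. Pictures without provenance, and pictures with provenance but no embedded image, are routed as before.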
