PDF - Only OCR Images and Background Images #2370
Replies: 2 comments
-
I have a similar issue, but I see some images.
-
Interestingly, docling-parse finds them all.
-
Good day Community,
To optimize performance, plain text in PDFs should not go through OCR, but images/pictures and background images should go through external VLM OCR, and the resulting text/description/annotation should be inserted at the correct location in the Markdown export.
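As a rough illustration of the "insert at the correct location" step, here is a minimal sketch in plain Python. It assumes the Markdown export marks each picture with a placeholder string (`<!-- image -->` is assumed here) and that placeholders appear in the same order as the pictures. `fill_placeholders` and the description list are hypothetical names, not part of Docling.

```python
# Hypothetical helper: swap image placeholders for externally produced
# descriptions, in document order. The placeholder text is an assumption.
PLACEHOLDER = "<!-- image -->"


def fill_placeholders(markdown: str, descriptions: list[str],
                      placeholder: str = PLACEHOLDER) -> str:
    parts = markdown.split(placeholder)
    out = [parts[0]]
    for i, tail in enumerate(parts[1:]):
        # Keep the placeholder when no description exists for this picture.
        out.append(descriptions[i] if i < len(descriptions) else placeholder)
        out.append(tail)
    return "".join(out)
```

Pictures without a description keep their placeholder, so a second pass can fill them in later without shifting the positions of the others.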
I identified that these images/pictures are located in two different places:
pictures[no].image.uri
pages[no].image.uri
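To check which of the two places actually carries an image, something like the following could be run against the document exported as a dictionary. This is only a sketch: it assumes `pictures` is a list, `pages` is a mapping keyed by page number (a list is also tolerated), and each entry may hold an `image` object with a `uri` field. `collect_image_uris` is a hypothetical helper name.

```python
# Hypothetical helper: list every image URI found under pictures[no].image.uri
# and pages[no].image.uri in a document dict. The layout is an assumption.
def collect_image_uris(doc: dict) -> dict:
    found = {"pictures": [], "pages": []}

    for idx, pic in enumerate(doc.get("pictures", [])):
        uri = (pic.get("image") or {}).get("uri")
        if uri:
            found["pictures"].append((idx, uri))

    pages = doc.get("pages", {})
    # Accept either a mapping keyed by page number or a plain list.
    items = pages.items() if isinstance(pages, dict) else enumerate(pages)
    for page_no, page in items:
        uri = (page.get("image") or {}).get("uri")
        if uri:
            found["pages"].append((page_no, uri))

    return found
```

An empty result for both keys would suggest the images were never stored in the document structure, rather than being lost at export time.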
Thanks and Best Regards,
Sascha
Edit: Correction, the image(s) in question do not show up in the Docling document structure.