Is there a way to easily tell which pages of a PDF were skipped when OCR is turned off? #1135
Unanswered. sanhuezapablo asked this question in Q&A.
Replies: 0 comments
Is there an easy way to tell which pages of a PDF were skipped when OCR is not enabled? What I'm looking for is to process a PDF with Docling and then use the `DoclingDocument` to determine which pages were skipped, so that I can match them against the generated page images and process them later. Is there a way to do this?

Example (mixed PDF with scanned pages and pages with searchable text):
Page 1 - Searchable text
Page 2 - Searchable text
Page 3 - Scanned
Page 4 - Scanned
Page 5 - Searchable text
Pages not processed when OCR is not enabled: Pages 3 and 4
If it's not easily doable, any hints as to what I should be looking for?
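
To make the idea concrete, here is a minimal sketch of one possible heuristic, not a confirmed Docling feature: treat a page as "skipped" if no item with non-empty text points at it. It assumes the document object exposes a `pages` mapping keyed by page number and an `iterate_items()` method yielding `(item, level)` pairs whose items carry a `prov` list with `page_no`. The helper names `pages_with_text` and `pages_without_text` are hypothetical, and the stand-in document below is fake data built only to mirror the five-page example.

```python
from types import SimpleNamespace
from typing import Iterable, List, Set


def pages_with_text(doc) -> Set[int]:
    """Collect page numbers referenced by at least one item with non-empty text.

    Assumption: `doc.iterate_items()` yields (item, level) pairs, and each item
    may have a `text` string and a `prov` list whose entries carry `page_no`.
    """
    found: Set[int] = set()
    for item, _level in doc.iterate_items():
        text = getattr(item, "text", "") or ""
        if not text.strip():
            continue  # ignore items without text (e.g. pictures)
        for prov in getattr(item, "prov", None) or []:
            found.add(prov.page_no)
    return found


def pages_without_text(all_pages: Iterable[int], covered: Iterable[int]) -> List[int]:
    """Return, in ascending order, the pages that no text item refers to."""
    covered_set = set(covered)
    return sorted(p for p in set(all_pages) if p not in covered_set)


def make_item(text: str, page_no: int) -> SimpleNamespace:
    """Build a fake item with the attributes the heuristic relies on."""
    return SimpleNamespace(text=text, prov=[SimpleNamespace(page_no=page_no)])


# Fake stand-in for a converted document: pages 1, 2 and 5 have text items,
# pages 3 and 4 (the scanned ones) have none.
fake_items = [make_item("Intro", 1), make_item("Body", 2), make_item("End", 5)]
fake_doc = SimpleNamespace(
    pages={n: None for n in range(1, 6)},
    iterate_items=lambda: iter([(item, 0) for item in fake_items]),
)

skipped = pages_without_text(fake_doc.pages.keys(), pages_with_text(fake_doc))
print(skipped)  # [3, 4]
```

With a real conversion, `fake_doc` would be replaced by the converted document, and the resulting page numbers could be used to look up the corresponding page images for later OCR. This is only a heuristic: a page whose only content is a table or a picture would also be reported, so the filter may need adjusting.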