Is there a way to easily tell which pages of a PDF were skipped when OCR is turned off? #1135

sanhuezapablo · 2025-03-08T02:34:34Z

Is there an easy way to tell which pages of a PDF were skipped when OCR is not enabled? I'd like to process a PDF with Docling and then use the resulting DoclingDocument to determine which pages were skipped, so that I can match them to the generated page images and process those pages later. Is there a way to do this?

Example (mixed PDF with scanned pages and pages with searchable text):
Page 1 - Searchable text
Page 2 - Searchable text
Page 3 - Scanned
Page 4 - Scanned
Page 5 - Searchable text

Pages not processed when OCR is not enabled: Pages 3 and 4

If it's not easily doable, any hints as to what I should be looking for?
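
To make the goal concrete, here is a minimal sketch of the check I have in mind, written as a plain set difference in Python. The helper name `pages_without_text` and the sample data are hypothetical and not part of Docling. How to obtain the two inputs from a DoclingDocument is exactly the open question; my untested assumption is that the full page list could come from the document's page collection and the per-item page numbers from each item's provenance.

```python
from typing import Iterable, List


def pages_without_text(all_pages: Iterable[int], text_pages: Iterable[int]) -> List[int]:
    """Return the page numbers that produced no text items, in ascending order.

    all_pages:  every page number in the PDF.
    text_pages: the page number of each extracted text item (repeats allowed).
    """
    return sorted(set(all_pages) - set(text_pages))


# Mixed PDF from the example above: pages 1, 2 and 5 contain searchable text.
# Assumption (not verified): with Docling these values might be gathered from
# the document's pages and from the page number recorded on each item.
all_pages = range(1, 6)
text_pages = [1, 1, 2, 5]  # one entry per extracted text item

skipped = pages_without_text(all_pages, text_pages)
print(skipped)  # [3, 4]
```

The pages in `skipped` would then be the ones to look up among the generated page images and send to OCR later.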

Replies: 0 comments
