Running OCR only on the embedded images of a PDF (using Poppler pdfimages or ImageMapping) instead of on whole PDF pages converted to PNG? #84

Requesting a version of PDF OCR that runs Tesseract OCR only on the images embedded in a PDF, instead of capturing each whole page of the PDF.

A lot of my professors use PowerPoint slides converted to PDF. The slide text is already real text, but the screen grabs they include are not, and those could benefit from OCR.

I believe this could save time for others as well, since many PDF documents are not purely images but a combination of text and images.
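
To illustrate the idea, here is a rough Python sketch of what I have in mind. It is not how this project works today. It assumes the `pdfimages` (Poppler) and `tesseract` command-line tools are installed and on the PATH, and that `pdfimages -png` writes files named `<prefix>-NNN.png`. The function names are hypothetical, made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def pdfimages_command(pdf_path, out_prefix):
    """Build the Poppler command that extracts embedded images as PNG files."""
    return ["pdfimages", "-png", str(pdf_path), str(out_prefix)]


def tesseract_command(image_path):
    """Build the Tesseract command that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout"]


def ocr_embedded_images(pdf_path):
    """Return {image file name: recognized text} for each embedded image.

    Assumption: both CLI tools are installed; pages are never rasterized,
    so text that is already text in the PDF is left alone.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "img"
        subprocess.run(pdfimages_command(pdf_path, prefix), check=True)
        # Assumed output naming: img-000.png, img-001.png, ...
        for image in sorted(Path(tmp).glob("img-*.png")):
            done = subprocess.run(
                tesseract_command(image),
                check=True,
                capture_output=True,
                text=True,
            )
            results[image.name] = done.stdout
    return results
```

Calling `ocr_embedded_images("lecture.pdf")` (a placeholder file name) would then give the OCR text per extracted image, which could be merged with the text the PDF already contains.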

