How to convert a scanned PDF to a searchable PDF #2474
Unanswered
Laxmi530
asked this question in
Looking for help
Replies: 1 comment 10 replies
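There is more than one option. One is to use PyMuPDF's built-in Tesseract OCR support, rendering each page to an image and appending the OCRed result to a new document: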
import fitz  # PyMuPDF; OCR requires a Tesseract installation with language data

DPI = 150  # desired resolution
src = fitz.open("input.pdf")
doc = fitz.open()  # output PDF with text layer
for page in src:
    pix = page.get_pixmap(dpi=DPI)  # render the page to an image
    imgpdf = fitz.open("pdf", pix.pdfocr_tobytes())  # make 1-page temp PDF with text layer
    doc.insert_pdf(imgpdf)  # append page
    imgpdf.close()
doc.save("input-ocr.pdf")
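If you would rather stay with pytesseract, as in the question, the same append-per-page pattern applies. The original code was not posted, so this is only a guess at the cause: a common reason for ending up with just the last page is writing each page's OCR output to the same file inside the loop. The sketch below assumes pytesseract and Pillow are installed; the names `ocr_each_page` and `make_searchable_pdf` are made up for illustration.

```python
def ocr_each_page(images, image_to_pdf):
    """Yield one single-page PDF (as bytes) per input image, in page order."""
    for img in images:
        yield image_to_pdf(img)


def make_searchable_pdf(src_path, out_path, dpi=150):
    """Hypothetical helper: OCR every page with pytesseract and merge the results."""
    # Third-party imports are local so ocr_each_page stays usable without them.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    def render_pages(src):
        # Render each page to an RGB Pillow image (get_pixmap has no alpha by default).
        for page in src:
            pix = page.get_pixmap(dpi=dpi)
            yield Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

    def to_pdf(img):
        # pytesseract returns a one-page PDF with an invisible text layer, as bytes.
        return pytesseract.image_to_pdf_or_hocr(img, extension="pdf")

    src = fitz.open(src_path)
    out = fitz.open()  # empty output document
    for pdf_bytes in ocr_each_page(render_pages(src), to_pdf):
        page_pdf = fitz.open("pdf", pdf_bytes)
        out.insert_pdf(page_pdf)  # append the page instead of overwriting the output
        page_pdf.close()
    out.save(out_path)  # save once, after all pages have been appended
    out.close()
    src.close()
```

Calling `make_searchable_pdf("input.pdf", "input-ocr.pdf")` should then produce one output page per input page. The key point is that the output document is created once before the loop and saved once after it.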
Hi,
I am trying to convert a scanned (non-searchable) PDF to a searchable PDF using pytesseract. This works when the PDF has a single page, but when it has multiple pages, only the last page ends up in the output.
Can someone please help? Is there a method available in PyMuPDF for this? Below is the sample code I am using.
Thank you in advance.