Pixmap.copy not reflected in saved document #1495
Unanswered
python3Berg
asked this question in
Q&A
Taking the liberty to transform this into a Discussions item ...
I don't know about "unpaper", but it seems it does something with an image to produce a new image.
imgdoc = fitz.open("png", newpix.tobytes()) # make a Document from the pixmap's image (tobytes() defaults to PNG)
pdfbytes = imgdoc.convert_to_pdf() # convert it to the binary stream of a PDF
imgpdf = fitz.open("pdf", pdfbytes) # make a PDF doc from the stream data
doc.insert_pdf(imgpdf) # append the original image as a PDF page
# clean up stuff
imgdoc.close()
imgpdf.close()
newpix = None
As I said: just an example. You could have OCRed newpix first too, or whatever.
I am trying to use unpaper to preprocess PDF images before sending them to OCR. Following the documentation examples, it appears I should be able to copy a deskewed pixmap into the existing document and then have a cleaner page for subsequent processing. I followed the example almost verbatim and save the document at the end of the page loop, so I would expect to see something similar to the deskewing provided by ocrmypdf or equivalent. Instead, the saved document appears unchanged. I am far more experienced with extraction than modification, so perhaps I am missing something obvious. The preprocessing itself is working; the result is just not being saved.
Your configuration (mandatory)
Ubuntu 20.04
python 3.8
pymupdf 1.18.17