Original text visible beneath translated text. #2317
This occurs after turning a non-searchable PDF into an image (PyMuPDF), running OCR on it (Tesseract), and converting the result back to PDF. I then run the PDF through Google Translate, and the result is translated text written over the original text, which is still visible.
PDF before translation:
PDF after translation:
The translation has converted the original page into an image and written the translated text on top of it.
So the easiest way is probably just to remove that image:

    import fitz  # PyMuPDF

    doc = fitz.open("page_4_tr.pdf")
    page = doc[0]
    print(page.get_images())
    # [(13, 0, 1681, 2378, 8, 'DeviceRGB', '', 'FXX1', 'DCTDecode')]
    page.delete_image(13)  # 13 is the xref, the first item of the tuple above
    doc.save("english.pdf", garbage=3, deflate=True)

This gives a page with only the English text and nothing else. Note the garbage collection and compression options (garbage=3, deflate=True) in the save call, which keep the output file small.
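If the translated file has more than one page, the same idea can probably be applied in a loop. The following is only a minimal sketch, assuming a PyMuPDF version that provides `Page.delete_image` and assuming every image in the file is an unwanted background scan (any logos or figures would be removed too). The names `image_xrefs` and `strip_images` are hypothetical helpers, not part of PyMuPDF.

```python
from typing import Iterable, List, Sequence


def image_xrefs(image_entries: Iterable[Sequence]) -> List[int]:
    """Return the distinct xrefs from Page.get_images() entries, in order.

    Each entry is a tuple whose first item is the image's xref.
    """
    seen: List[int] = []
    for entry in image_entries:
        xref = entry[0]
        if xref not in seen:
            seen.append(xref)
    return seen


def strip_images(src_path: str, dst_path: str) -> int:
    """Delete every image in the document and save a cleaned copy.

    Returns the number of distinct image xrefs that were deleted.
    """
    # PyMuPDF is imported here so image_xrefs() stays usable without it.
    import fitz

    doc = fitz.open(src_path)
    done = set()  # an image shared by several pages only needs deleting once
    for page in doc:
        for xref in image_xrefs(page.get_images()):
            if xref not in done:
                page.delete_image(xref)
                done.add(xref)
    # Same save options as in the single-page example above.
    doc.save(dst_path, garbage=3, deflate=True)
    doc.close()
    return len(done)
```

With the file from this thread, the call would be `strip_images("page_4_tr.pdf", "english.pdf")`. If only some images should go, filter the `get_images()` entries first, for example by width and height, before deleting.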