-
Additional data point: with
-
I'd need to see evidence that this affects all viewers, not just qpdfview.
-
Something is strange, and I'm not sure what the problem could be.

I'm processing files with `--redo-ocr --sidecar`. A typical file has 1300 pages and is 54 MB before processing and 59 MB after. The sidecar is a 1.9 MB txt file. The original PDF already has some OCR; if I extract the text with `pdf2txt.py` from pdfminer, it produces a 1.7 MB text file. So the amount of text is roughly similar, by that metric.

When I open the original PDF in qpdfview, a search takes less than 2 seconds. In the processed PDF, searching for the same text takes 28-30 seconds!

Is there something in the PDF/A structure that would explain this? And is there some way to improve the situation? This effect was seen in 16.4.2 and 16.6.0, so it is probably not a random bug.
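
In case it helps collect evidence outside qpdfview, here is a minimal sketch of a timing harness. It assumes that the time a command-line tool takes to extract all text is a reasonable proxy for search time, which may not hold for every viewer. The function names `time_command` and `slowdown` are made up for this example.

```python
import subprocess
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times; return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the output; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(before_seconds, after_seconds):
    """Return how many times slower the processed file is than the original."""
    return after_seconds / before_seconds
```

For example, with Poppler installed, `time_command(["pdftotext", "original.pdf", "-"])` and the same call on `processed.pdf` (both file names are placeholders) give two numbers to pass to `slowdown`. With the qpdfview timings above, `slowdown(2, 28)` is 14, so a similar ratio from another tool would suggest the effect is not specific to one viewer.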