Please be more specific: what does "randomly" mean? These are multi-page documents, so a page number would help.
On page 0 of test1.pdf, the output is the same with and without sort=True.

As a general comment:
There is never a guarantee that sort=True will deliver text in the sequence you want. The reason is how PDF works, not (Py)MuPDF: the characters of a page can be stored internally in any order. For the 1090 characters on page 0 of test1.pdf, this means there are 1090! ≈ 2.1E+2839 different ways to produce the exact same page appearance. I tried to demonstrate this with the two files file1 and file2. They look the same, but file2 has its characters stored in random sequence. S…
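
To make the point above concrete, here is a minimal sketch in plain Python (no PyMuPDF involved). The page model, the `render` and `extract` helpers, and the two-line sample text are all hypothetical and only illustrate the idea: characters carry their own coordinates, so shuffling the storage order leaves the appearance unchanged, while naive extraction follows storage order unless the characters are sorted by position first.

```python
import math
import random

# Hypothetical page model (an assumption for illustration, not PyMuPDF's
# internals): each character is stored as (x, y, char), with y growing downward.
lines = ["Hello", "World"]
chars = [(x, y, ch) for y, line in enumerate(lines) for x, ch in enumerate(line)]


def render(char_list):
    """Place every character at its coordinates; storage order is irrelevant."""
    width = max(x for x, _, _ in char_list) + 1
    height = max(y for _, y, _ in char_list) + 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, ch in char_list:
        grid[y][x] = ch
    return "\n".join("".join(row) for row in grid)


def extract(char_list, sort=False):
    """Join characters in storage order, or top-left to bottom-right if sort=True."""
    if sort:
        char_list = sorted(char_list, key=lambda c: (c[1], c[0]))
    return "".join(ch for _, _, ch in char_list)


# Store the same characters in a different (shuffled) order.
shuffled = chars[:]
random.Random(0).shuffle(shuffled)

same_appearance = render(chars) == render(shuffled)  # both pages look identical
sorted_text = extract(shuffled, sort=True)           # position sort restores order

# Number of possible storage orders for a page with 1090 characters.
orderings = math.factorial(1090)
exponent = len(str(orderings)) - 1                   # power of ten of 1090!

print(same_appearance, sorted_text, f"1090! is about 10^{exponent}")
```

Sorting by position works in this toy case because the layout is a simple grid. On real pages (multiple columns, tables, slightly offset baselines) a position sort is only a heuristic, which is why it cannot guarantee the reading order you expect.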

Replies: 3 comments 1 reply

Answer selected by JorjMcKie
2 participants
Converted from issue

This discussion was converted from issue #1900 on August 30, 2022 07:51.