
Why are you using the flags value 31? Its binary representation is 0b11111, so all five low-order flag bits are set. Among other things, this sets fitz.TEXT_INHIBIT_SPACES, which suppresses the corrective MuPDF action that inserts spaces where deemed beneficial.
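
To make the decomposition concrete, here is a minimal sketch that lists which flag bits are set in a given value. The numeric constants are copied from the PyMuPDF documentation as I understand it and are hard-coded (an assumption) so the snippet runs without PyMuPDF installed; `decompose_flags` is a hypothetical helper, not part of the library.

```python
# Flag values as listed in the PyMuPDF documentation (assumed here so the
# snippet runs standalone; with PyMuPDF installed, use fitz.TEXT_* instead).
TEXT_FLAGS = {
    "TEXT_PRESERVE_LIGATURES": 1,
    "TEXT_PRESERVE_WHITESPACE": 2,
    "TEXT_PRESERVE_IMAGES": 4,
    "TEXT_INHIBIT_SPACES": 8,
    "TEXT_DEHYPHENATE": 16,
}


def decompose_flags(value):
    """Return the names of all known flags whose bit is set in `value`."""
    return [name for name, bit in TEXT_FLAGS.items() if value & bit]


print(bin(31))
for name in decompose_flags(31):
    print(name)
```

With these values, 31 includes the bit for TEXT_INHIBIT_SPACES (8), which is the one relevant to the missing spaces.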

Here is what I get as a result:

In [1]: import fitz

In [2]: doc=fitz.open("en.company_presentation.pdf")

In [3]: page=doc[1]

In [4]: print(page.get_text(sort=True))  # sort option internally causes "blocks" extraction
We are creators and makers of technology
One of the worlds largest semiconductor companies
$16.1 billion revenues
in 2022
Over 50,000 employees
of which 9,000+ in R&D
14 main manufacturing
sites
Over 80 sales & marketing
offices serving over 2…
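
If the other bits of 31 are still wanted, one option is to clear only the space-inhibiting bit before extracting. The following is a sketch under assumptions: the constant value 8 is hard-coded to mirror fitz.TEXT_INHIBIT_SPACES, and `without_inhibit_spaces` and `extract_text` are hypothetical helper names, not PyMuPDF functions.

```python
TEXT_INHIBIT_SPACES = 8  # assumed to equal fitz.TEXT_INHIBIT_SPACES


def without_inhibit_spaces(flags):
    """Clear the TEXT_INHIBIT_SPACES bit and leave all other bits unchanged."""
    return flags & ~TEXT_INHIBIT_SPACES


def extract_text(pdf_path, page_number, flags):
    """Extract sorted plain text from one page using the given flags."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(pdf_path)
    try:
        return doc[page_number].get_text("text", flags=flags, sort=True)
    finally:
        doc.close()


flags = without_inhibit_spaces(31)
print(flags, bin(flags))
```

Usage would then look like `extract_text("en.company_presentation.pdf", 1, flags)`. Simply omitting the flags argument, as in the session above, uses the library defaults instead.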

Answer selected by henrygriffiths
This discussion was converted from issue #2437 on June 01, 2023 07:34.