Matching the width of existing text on PDF #3267
Replies: 1 comment
Your basic problem is that our text extraction does not deliver the inter-character spacing information to you:
In PDF, each character can be positioned individually. Sometimes the characters themselves "decide" their distance to the predecessor, and sometimes a modifier is added that shifts the current character slightly to the left or right.
Other differences stem from how justified text is implemented: if words are significantly far apart from each other, they form separate spans; otherwise (distance not large enough), MuPDF decides to leave them in the same span.
Etc.
Whatever algorithm you choose, it will not reproduce the original width perfectly or reliably this way.
You probably have to …
Another option is using the morph parameter: this is a tuple (point, matrix).
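To illustrate the morph idea, here is a minimal sketch, not a definitive implementation. It assumes that a pure horizontal scale matrix applied around the insertion point is an acceptable way to stretch or squeeze the new text to the width of the old one. The helper names `horizontal_scale` and `insert_text_matching_width` are hypothetical; `get_text_length`, `Point`, `Matrix` and `Page.insert_text(..., morph=...)` are the PyMuPDF calls it relies on.

```python
def horizontal_scale(target_width: float, natural_width: float) -> float:
    """Return the x-scale factor that stretches text of natural_width to target_width."""
    if natural_width <= 0:
        raise ValueError("natural_width must be positive")
    return target_width / natural_width


def insert_text_matching_width(page, origin, text, target_width,
                               fontname="helv", fontsize=11):
    """Insert text at origin, scaled horizontally to target_width (hypothetical helper)."""
    # Imported lazily so horizontal_scale() can be used without PyMuPDF installed.
    try:
        import pymupdf            # newer PyMuPDF releases
    except ImportError:
        import fitz as pymupdf    # older releases use the name "fitz"

    # Width the text would have without any morphing.
    natural = pymupdf.get_text_length(text, fontname=fontname, fontsize=fontsize)
    sx = horizontal_scale(target_width, natural)

    point = pymupdf.Point(origin)
    # morph is a tuple (point, matrix): scale x only, keep the insertion point fixed.
    page.insert_text(point, text, fontname=fontname, fontsize=fontsize,
                     morph=(point, pymupdf.Matrix(sx, 1)))
    return sx
```

One possible source for `target_width` and `origin` is a span from `page.get_text("dict")`: the span's `bbox` gives `bbox[2] - bbox[0]` as the width, and its `origin` gives the insertion point. This scales all glyphs uniformly, so it matches the overall width only and does not reproduce the original per-character spacing.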