Open
Description
The text extracted by pdfplumber sometimes comes back with every character duplicated. Here is a short snippet of what it looks like:
HHeett mmaaxxiimmaaaall fíiissccaaaall...
Unfortunately, I cannot provide an example PDF, since it contains private data.
This is a known issue, and pdfplumber has a method 'dedupe_chars' to fix it. Could a setting be added that calls this method in the PDF processing pipeline?
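To illustrate the request, here is a rough sketch of what such a setting might look like. The function names `page_text` and `pdf_text` and the parameter names `dedupe_chars` and `tolerance` are hypothetical and not taken from this project's code. The only pdfplumber calls assumed are `pdfplumber.open`, `Page.dedupe_chars(tolerance=...)`, and `Page.extract_text()`.

```python
from typing import Any


def page_text(page: Any, dedupe_chars: bool = False, tolerance: float = 1) -> str:
    """Extract text from one pdfplumber page, optionally de-duplicating first.

    `dedupe_chars` and `tolerance` are hypothetical setting names for this sketch.
    """
    if dedupe_chars:
        # Page.dedupe_chars returns a new page object in which characters with
        # the same text lying within `tolerance` of each other are collapsed.
        page = page.dedupe_chars(tolerance=tolerance)
    # Fall back to an empty string in case a page yields no text.
    return page.extract_text() or ""


def pdf_text(path: str, dedupe_chars: bool = False, tolerance: float = 1) -> str:
    """Extract text from every page of a PDF, joined by newlines."""
    import pdfplumber  # imported lazily so page_text has no hard dependency

    with pdfplumber.open(path) as pdf:
        return "\n".join(
            page_text(page, dedupe_chars=dedupe_chars, tolerance=tolerance)
            for page in pdf.pages
        )
```

Keeping the setting off by default would leave the current behavior unchanged for PDFs that do not show the duplication.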