Handling indents #2197
vdavez asked this question in Looking for help
No, the only way is to note the x-coordinate at which a line's first character starts and relate it to the page width and to the corresponding values of the other lines. Why don't you use the "dict" format?

```python
import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # hypothetical file name
page = doc[0]

for block in page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]:
    for line in block["lines"]:
        bbox = line["bbox"]  # (x0, y0, x1, y1); x0 is where the line starts
        text = "".join([span["text"] for span in line["spans"]])
        print(f"line '{text}' starts at {bbox[0]}")
```

To be a little picky with the wording:
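Building on the reply's idea of comparing each line's starting x-coordinate with those of the other lines, here is a minimal sketch that turns those x0 values into indentation levels. It assumes the left margin is the smallest x0 among the lines and that one indent level is a fixed horizontal step; `indent_levels` and the 18 pt default are hypothetical choices, not part of PyMuPDF.

```python
# Hypothetical helper, not a PyMuPDF API: map line start positions to indent levels.
# Assumptions: the left margin is the smallest x0 among the lines, and one
# indentation level equals a fixed horizontal step in points (default 18 pt).

def indent_levels(lines, step=18.0):
    """lines: list of (x0, text) pairs, x0 being the left edge of a line's bbox.
    Returns a list of (level, text) pairs."""
    if not lines:
        return []
    margin = min(x0 for x0, _ in lines)
    levels = []
    for x0, text in lines:
        # round() absorbs jitter smaller than half a step, e.g. 72.4 vs. 72.0
        levels.append((round((x0 - margin) / step), text))
    return levels


sample = [
    (72.0, "Section 1"),
    (90.0, "indented once"),
    (108.2, "indented twice"),
    (72.4, "back at the margin"),
]
for level, text in indent_levels(sample):
    print("    " * level + text)
```

In practice the (x0, text) pairs would come from `bbox[0]` and the joined span text in the loop from the reply, and the step is better measured from the document than assumed. If a page has no unindented lines, its smallest x0 is not the true margin, so taking the minimum across several pages may be more robust.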
Hello! First of all, this library is amazing. Thank you!
I am working with a PDF where indentation matters to a human reader, but the PDF itself doesn't contain any whitespace characters for it. When I convert to XHTML (my desired output), I lose the indentation. Is there any smart way to determine whether a line of text starts indented and then insert whitespace characters, so that the XHTML output preserves that aspect of the layout?
Here's a screenshot of what I'm working with...
Thank you!!!
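On the XHTML side, prepending ordinary spaces to a line will not help much, because renderers collapse leading whitespace inside a paragraph. A minimal sketch that carries the indent as CSS `margin-left` instead; the `to_xhtml` name and its `(indent_pt, text)` input are assumptions for illustration, and the sketch builds its own fragment rather than modifying PyMuPDF's `get_text("xhtml")` output.

```python
from html import escape

# Hypothetical helper: emit XHTML paragraphs whose CSS margin-left carries the
# indentation, since leading spaces inside a <p> are collapsed when rendered.
def to_xhtml(lines):
    """lines: list of (indent_pt, text) pairs, indent_pt being the distance in
    points from the left margin (e.g. a line's bbox x0 minus the smallest x0)."""
    parts = []
    for indent_pt, text in lines:
        # escape() turns <, > and & in the extracted text into entities
        parts.append(f'<p style="margin-left:{indent_pt:g}pt">{escape(text)}</p>')
    return "\n".join(parts)


print(to_xhtml([(0.0, "Section 1"), (18.0, "indented line with <tags> & more")]))
```

Non-breaking spaces (`&#160;`) would also survive rendering, but their width depends on the font, whereas a margin in points maps directly onto the PDF coordinates.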