Detect subscript and superscript with pymupdf? #3286
Answered by JorjMcKie
CaiSamuelsFSA asked this question in Looking for help
I currently have a script that extracts text from a PDF file and converts it into HTML. Some of the documents include superscript for references and footnotes. Is there a way for pymupdf to detect subscript and superscript so I can output the HTML equivalent?
Answered by JorjMcKie on Mar 20, 2024
Answer selected by CaiSamuelsFSA
You can detect many (not all) superscripts by checking `span["flags"]` in the output of `page.get_text("dict", ...)`; the same works with "rawdict". There is no way to detect subscripts.
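To make the flag check concrete, here is a minimal sketch of turning the "dict" output into HTML. It relies on bit 0 (value 1) of `span["flags"]` being the superscript bit. The function name `spans_to_html`, the `<br>` line joining, and the file name in the usage comment are assumptions for illustration, not part of the answer above.

```python
import html

# Bit 0 of span["flags"] marks a span that PyMuPDF considers superscripted.
SUPERSCRIPT_FLAG = 1


def spans_to_html(page_dict):
    """Convert the result of page.get_text("dict") into simple HTML.

    Spans with the superscript flag set are wrapped in <sup>...</sup>.
    Hypothetical helper: one output line per text line, joined with <br>.
    """
    html_lines = []
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            parts = []
            for span in line["spans"]:
                text = html.escape(span["text"])
                if span["flags"] & SUPERSCRIPT_FLAG:
                    text = f"<sup>{text}</sup>"
                parts.append(text)
            html_lines.append("".join(parts))
    return "<br>\n".join(html_lines)


# Usage sketch (assumes PyMuPDF is installed and "input.pdf" exists;
# older releases are imported as `fitz` instead of `pymupdf`):
#
#   import pymupdf
#   doc = pymupdf.open("input.pdf")
#   for page in doc:
#       print(spans_to_html(page.get_text("dict")))
```

As the answer notes, detection is not complete, so some superscripts may come through as plain text, and subscripts are never flagged.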