Detect subscript and superscript with pymupdf? #3286

CaiSamuelsFSA · 2024-03-20T17:40:28Z

I currently have a script that extracts text from a PDF file and converts it into HTML.

Some of the documents include superscript for references and footnotes.

Is there a way for pymupdf to detect subscript and superscript so I can output the HTML equivalent?

JorjMcKie · 2024-03-20T20:42:44Z

Maintainer

You can detect many (but not all) superscripts by checking span["flags"] for each span returned by page.get_text("dict", ...). The same works with "rawdict".
There is no way to detect subscripts.
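
For illustration, here is a minimal sketch of that flags check, not an official recipe. It assumes the superscript indicator is bit 0 (value 1) of span["flags"], as described in the PyMuPDF documentation. The helper names spans_to_html and pdf_page_to_html are made up for this example, and the one-line-per-<br> HTML layout is just a placeholder for whatever your converter already emits.

```python
from html import escape

# Assumption: bit 0 of span["flags"] marks a superscripted span.
SUPERSCRIPT_FLAG = 1


def spans_to_html(page_dict):
    """Turn a page.get_text("dict") result into HTML, wrapping
    spans flagged as superscript in <sup> tags."""
    html_lines = []
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            parts = []
            for span in line.get("spans", []):
                text = escape(span["text"])
                if span["flags"] & SUPERSCRIPT_FLAG:
                    text = f"<sup>{text}</sup>"
                parts.append(text)
            html_lines.append("".join(parts))
    return "<br>\n".join(html_lines)


def pdf_page_to_html(path, page_number=0):
    """Hypothetical wrapper: open a PDF and convert one page."""
    try:
        import pymupdf  # newer releases
    except ImportError:
        import fitz as pymupdf  # older releases use this module name
    with pymupdf.open(path) as doc:
        return spans_to_html(doc[page_number].get_text("dict"))
```

Calling pdf_page_to_html("input.pdf") (the file name is a placeholder) returns the HTML for the first page. Since the flag is not set for every superscript, expect some footnote markers to come through as plain text, and subscripts are not marked at all.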
