How to get bytes instead of str inside span? #2834
Unanswered
ivanstepanovftw
asked this question in
Q&A
-
This is not an issue, but a Discussions post. Transferring ...
-
I need to get the raw bytes that are inside a span. `span["text"]` is of type `str`, and `span["chars"][N]["c"]` is of type `str` too. What should I do to get the raw bytes?

PDF: http://tug.ctan.org/macros/latex/exptl/mem/arabic.pdf

This script is from https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/OCR/tesseract2.py. The idea is to extract the bytes of each span character, look up the matching glyph in the current font (which is a `.cff` file), and then render that glyph for OCR.

[FontForge screenshot]
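For reference, here is a minimal sketch of one way to turn the `str` values of a span into bytes. It assumes the span dictionaries have the shape described in the question (`span["text"]`, or `span["chars"][N]["c"]` in the per-character layout). The helper name `span_chars_as_bytes` and the sample span are hypothetical, not part of PyMuPDF. Note that the result is the Unicode code point and its encoding, which, as far as I know, need not match the character codes stored in the PDF content stream or the glyph index in the font.

```python
def span_chars_as_bytes(span, encoding="utf-8"):
    """Return (char, code point, encoded bytes) for each character of a span.

    `span` is assumed to be a dict shaped like the spans in the question:
    either with a "chars" list of {"c": str} dicts, or with a "text" str.
    """
    if "chars" in span:
        # Per-character layout: one dict per character
        chars = [ch["c"] for ch in span["chars"]]
    else:
        # Plain layout: the whole span text as one str
        chars = list(span["text"])
    return [(c, ord(c), c.encode(encoding)) for c in chars]


# Hand-made span for illustration only (hypothetical data, not read from a PDF)
sample_span = {"chars": [{"c": "ب"}, {"c": "A"}]}
result = span_chars_as_bytes(sample_span)

for char, codepoint, raw in result:
    print(f"{char!r}: U+{codepoint:04X} -> {raw!r}")
```

The code point from `ord()` is probably the more useful value for a glyph lookup, since a font's character map is keyed by Unicode value rather than by UTF-8 bytes. Whether the embedded `.cff` font carries a usable character map is a separate question.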