Can TextWriter embed a subset of a font automatically? #1910
Background: this code puts two Chinese characters on the page. File size, 4KiB.
This one does something very similar but file size is 3.5MiB:
I know the comparison is not fair because the first one does not embed fonts and (I'm supposing) does not work reliably on printers, for example. Adobe Acrobat users have to download extra stuff. There are various advantages to the second option, but file size is certainly not one of them. Is it doable for PyMuPDF to embed a subset? I realize this gets really tricky when one uses multiple TextWriters over multiple pages. But to keep it simple, suppose I had just one TextWriter. In my use case, it's to "stamp" folks' names on the front of existing files.
Replies: 3 comments 2 replies
Docs:
Hmmm, so why is my example 3.5MiB?
Very nice, and very complete answer. Sorry for not RTFM on One thing that scares me a bit about In particular, if the upstream PDF did not subset their fonts, it seems "rude"/dangerous for me to do so. Maybe this concern is unfounded. Is it realistic to request an allow-list for
The Document method subset_fonts() is independent of TextWriter and always works. It walks through all the PDF pages and collects all their characters by font, but only for fonts that are not already subsets. It then passes each font, together with all of its characters used in the file, to fonttools and lets it compute a subset font.
If successful (this should work for OTF, TTF and WOFF fonts), the subset font file replaces the original. The font (base) name is also prefixed with the PDF-specific 6-character prefix.
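To make the collection step above more concrete, here is a rough, standard-library-only sketch. It is not PyMuPDF's actual implementation. The `spans` input (pairs of font name and text) and the font names are made up for illustration, and the check for "already a subset" assumes the usual PDF convention that a subset font name starts with six uppercase letters followed by a plus sign.

```python
import re
from collections import defaultdict

# Assumed PDF convention: a subset font's base name starts with six
# uppercase letters and a plus sign, e.g. "ABCDEF+SomeFont".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def is_subset(font_name: str) -> bool:
    """Return True if the font name already carries a subset prefix."""
    return bool(SUBSET_PREFIX.match(font_name))


def chars_by_font(spans):
    """Collect the set of characters used per font, skipping subset fonts.

    `spans` is a hypothetical iterable of (font_name, text) pairs that
    stands in for whatever a real walk over the PDF pages would yield.
    """
    used = defaultdict(set)
    for font_name, text in spans:
        if is_subset(font_name):
            continue  # already a subset: leave it alone
        used[font_name].update(text)
    return dict(used)


# Illustrative data only; the font names are invented.
spans = [
    ("ExampleCJKFont", "你好"),
    ("ExampleCJKFont", "好的"),
    ("ABCDEF+Helvetica", "Hello"),
]
result = chars_by_font(spans)
```

Here `result` holds only the three distinct characters used with the non-subset font, which is the kind of per-font character set that would then be handed to fonttools. In PyMuPDF itself, all of this happens inside the single subset_fonts() call on the Document.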