Unable to extract "Oblique font style" and "Helvetica font family" being extracted in two different ways in two different machines #1878
I am trying to extract font styles and families using the code below:
I am attaching my input PDF below. It contains text in the Times New Roman font family in normal, italic, and oblique styles (1st, 2nd, and 3rd paragraphs respectively), plus a 4th paragraph in the Helvetica font family. The output of the above code is
in one machine and
in another machine
Replies: 1 comment 2 replies
This is what I get:
for b in page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]:
    for l in b["lines"]:
        for s in l["spans"]:
            print(f"font {s['font']}, text: '{s['text']}'")
font TimesNewRomanPSMT, text: 'This is a paragraph in normal style. '
font TimesNewRomanPS-ItalicMT, text: 'This is a paragraph in italic style. '
font TimesNewRomanPS-ItalicMT, text: 'This is a paragraph in oblique style. '
font Helvetica, text: 'This is a paragraph in Helvetica font family. '
font Calibri, text: ' '
Who wrote 1 space char in Calibri?
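The output above shows two things: the italic and oblique paragraphs report the same font name, and a whitespace-only span adds an extra font to the list. The following is a minimal sketch of how one might summarize that, not code from the thread. The helper names `decode_flags` and `fonts_in_use` are hypothetical, `sample` is a hand-built stand-in for the dictionary returned by `page.get_text("dict")`, and the bit values assume the span `flags` layout described in the PyMuPDF documentation, which should be checked against the installed version.

```python
from collections import defaultdict

# Assumed bit layout of a span's integer "flags" (per the PyMuPDF docs;
# verify against your installed version).
FLAG_NAMES = {1: "superscript", 2: "italic", 4: "serifed", 8: "monospaced", 16: "bold"}


def decode_flags(flags):
    """Return the style names whose bits are set in a span's flags."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]


def fonts_in_use(page_dict, skip_blank=True):
    """Map each font name to the span texts that use it.

    With skip_blank=True, whitespace-only spans are ignored so that a
    stray space does not add a font to the report.
    """
    usage = defaultdict(list)
    for block in page_dict["blocks"]:
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line["spans"]:
                if skip_blank and not span["text"].strip():
                    continue
                usage[span["font"]].append(span["text"])
    return dict(usage)


# Hand-built stand-in for page.get_text("dict"), using the spans printed above.
sample = {"blocks": [{"lines": [{"spans": [
    {"font": "TimesNewRomanPSMT", "text": "This is a paragraph in normal style. "},
    {"font": "TimesNewRomanPS-ItalicMT", "text": "This is a paragraph in italic style. "},
    {"font": "TimesNewRomanPS-ItalicMT", "text": "This is a paragraph in oblique style. "},
    {"font": "Helvetica", "text": "This is a paragraph in Helvetica font family. "},
    {"font": "Calibri", "text": " "},
]}]}]}

report = fonts_in_use(sample)
for font, texts in report.items():
    print(f"{font}: {len(texts)} span(s)")
```

With a real document, `sample` would be replaced by the dictionary from `page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)`, and `decode_flags(s["flags"])` could be printed next to each font name to see the style bits that the extractor reports for that span.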
This is what I get:
Who wrote 1 space char in Calibri?
Blame him!