How to get text horizontal width #1950
-
How can I get the horizontal scaling value of the text? Please refer to the image below.
Replies: 4 comments
-
There is no way to extract this value directly, but it can be estimated from the width of the span's bbox.
pprint(page.get_text("dict")["blocks"]) # get the text in dictionary format
[{'bbox': (18.8164005279541,
17.379932403564453,
27.15677833557129,
22.965869903564453),
'lines': [{'bbox': (18.8164005279541,
17.379932403564453,
27.15677833557129,
22.965869903564453),
'dir': (1.0, 0.0),
'spans': [{'ascender': 0.9052734375,
'bbox': (18.8164005279541,
17.379932403564453,
27.15677833557129,
22.965869903564453),
'color': 22352,
'descender': -0.2119140625,
'flags': 16,
'font': 'Arial-BoldMT',
'origin': (18.8164005279541, 21.906299591064453),
'size': 3.535533905029297,
'text': '123456'}],
'wmode': 0}],
'number': 0,
'type': 0},
{'bbox': (18.8164005279541,
25.971233367919922,
43.837547302246094,
40.14897155761719),
'lines': [{'bbox': (18.8164005279541,
25.971233367919922,
35.49716567993164,
31.557170867919922),
'dir': (1.0, 0.0),
'spans': [{'ascender': 0.9052734375,
'bbox': (18.8164005279541,
25.971233367919922,
35.49716567993164,
31.557170867919922),
'color': 22352,
'descender': -0.2119140625,
'flags': 16,
'font': 'Arial-BoldMT',
'origin': (18.8164005279541, 30.497600555419922),
'size': 5.0,
'text': '123456'}],
'wmode': 0},
{'bbox': (18.8164005279541,
34.56303405761719,
43.837547302246094,
40.14897155761719),
'dir': (1.0, 0.0),
'spans': [{'ascender': 0.9052734375, # this is the text span we want to look at
'bbox': (18.8164005279541,
34.56303405761719,
43.837547302246094, # width of the text is bbox.width
40.14897155761719),
'color': 22352,
'descender': -0.2119140625,
'flags': 16,
'font': 'Arial-BoldMT',
'origin': (18.8164005279541, 39.08940124511719),
'size': 6.123724460601807, # the font size used
'text': '123456'}],
'wmode': 0}],
'number': 1,
'type': 0}]
pprint(page.get_fonts()) # extract the font of that text span
[(5, 'ttf', 'TrueType', 'TWVHJA+Arial-BoldMT', 'TT0', 'WinAnsiEncoding')]
font=fitz.Font(fontbuffer=doc.extract_font(5)[-1]) # create the fitz.Font of that font
font.text_length("123456",fontsize=6.123724460601807) # check the "normal" text width of that font size
20.434342267457396
width=43.837547302246094 - 18.8164005279541 # bbox width
width / 20.434342267457396 # compute actual width divided by "normal" width
1.2244654829991415 # so it looks like our text is 22.45% wider
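The calculation above can be wrapped in a small helper. This is a minimal sketch rather than part of PyMuPDF: `stretch_factor` is a hypothetical name, and the numbers are copied from the output above so that the snippet runs without a PDF.

```python
def stretch_factor(bbox, normal_width):
    """Return the actual span width divided by its unscaled ("normal") width.

    bbox is (x0, y0, x1, y1) as reported by page.get_text("dict");
    normal_width is what font.text_length() returns for the same text
    and font size.
    """
    x0, _, x1, _ = bbox
    return (x1 - x0) / normal_width


# Values copied from the third span above.
bbox = (18.8164005279541, 34.56303405761719, 43.837547302246094, 40.14897155761719)
factor = stretch_factor(bbox, 20.434342267457396)
print(f"{factor:.4f}")  # about 1.2245, i.e. roughly 22.45% wider
```

A factor above 1 means the text is stretched horizontally, and a factor below 1 means it is condensed.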
-
Looking at all the different spans:
spans = []
for b in page.get_text("dict")["blocks"]:
    for l in b["lines"]:
        for s in l["spans"]:
            spans.append(s)
for s in spans:
    fs = s["size"]
    width = s["bbox"][2] - s["bbox"][0]
    tl = font.text_length("123456", fontsize=fs)
    print(width / tl)
0.706945111138104
0.9997719353599649
1.2244654829991415
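The loop above hard-codes the string "123456" and relies on a single `font` object. A slightly more general sketch follows; `span_stretch_factors` and its `measure` parameter are hypothetical names, and `measure(text, size)` stands in for a call such as `font.text_length(text, fontsize=size)`. It assumes every span uses the font that `measure` wraps.

```python
def span_stretch_factors(blocks, measure):
    """Yield (text, factor) for every span in a get_text("dict") block list.

    measure(text, size) must return the unscaled width of text at that
    font size, e.g. lambda t, s: font.text_length(t, fontsize=s).
    """
    for block in blocks:
        # Image blocks have no "lines" key, so default to an empty list.
        for line in block.get("lines", []):
            for span in line["spans"]:
                x0, _, x1, _ = span["bbox"]
                normal = measure(span["text"], span["size"])
                if normal > 0:  # skip empty spans to avoid dividing by zero
                    yield span["text"], (x1 - x0) / normal


# Stand-in data and a stand-in measure function so the sketch runs without
# a PDF: pretend every character is 0.5 em wide.
demo_blocks = [{"lines": [{"spans": [
    {"text": "ab", "size": 10.0, "bbox": (0.0, 0.0, 15.0, 10.0)},
]}]}]
demo = list(span_stretch_factors(demo_blocks, lambda t, s: 0.5 * len(t) * s))
print(demo)
```

With a real page, the call would look something like `span_stretch_factors(page.get_text("dict")["blocks"], lambda t, s: font.text_length(t, fontsize=s))`.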
-
@JorjMcKie thanks for this solution, it's working perfectly. But I have another issue, which I know is caused by the horizontal scaling: my original requirement is to get the correct font size as reported by Adobe Illustrator or Adobe Acrobat.
-
You could do this:
{'bbox': (18.8164005279541,
19.041404724121094,
20.2067813873291,
22.57693862915039),
'c': '1',
'origin': (18.8164005279541, 21.906299591064453)}
Please also note that the term "original font size" is not quite correct, because Adobe does not report it either!
print(page.read_contents().decode())
/OC /MC0 BDC
BT
/CS0 cs 1 scn
/GS0 gs
/TT0 1 Tf # "original" fontsize is 1!
2.5 0 0 5 18.8164 36.4541 Tm # this text matrix scales x by 2.5 and y by 5
(123456)Tj
5 0 0 5 18.8164 27.8628 Tm
(123456)Tj
7.5 0 0 5 18.8164 19.271 Tm
(123456)Tj
ET
EMC
So it is somewhat arbitrary to say that 5 is the original font size. The reported font size 3.535533905029297 equals sqrt(2.5 * 5), the geometric mean of the two scaling factors.
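To illustrate that relationship, the sketch below pulls the `Tm` operands out of the content stream shown above and computes the geometric mean of the x and y scale factors. It is only a rough illustration: the regular expression is an assumption that fits this simple stream (no rotation or skew, `Tf` size of 1) and is not a general PDF parser.

```python
import math
import re

# Content stream text copied from the output above (annotations removed).
contents = """\
BT
/TT0 1 Tf
2.5 0 0 5 18.8164 36.4541 Tm
(123456)Tj
5 0 0 5 18.8164 27.8628 Tm
(123456)Tj
7.5 0 0 5 18.8164 19.271 Tm
(123456)Tj
ET
"""

# Six numbers followed by the Tm operator: a b c d e f Tm
number = r"(-?\d*\.?\d+)"
tm_pattern = re.compile(r"\s+".join([number] * 6) + r"\s+Tm")

results = []
for match in tm_pattern.finditer(contents):
    a, b, c, d, e, f = map(float, match.groups())
    size = math.sqrt(a * d)     # geometric mean of x and y scaling
    stretch = math.sqrt(a / d)  # x scaling relative to that mean
    results.append((a, d, size, stretch))
    print(f"x scale {a}, y scale {d}: size {size:.4f}, stretch {stretch:.4f}")
```

The three sizes agree with the span sizes reported earlier in the thread (about 3.5355, 5.0 and 6.1237), and the `stretch` values (about 0.7071, 1.0 and 1.2247) are close to the measured width ratios 0.7069, 0.9998 and 1.2245.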
You could do this: extract the text after executing fitz.TOOLS.set_small_glyph_heights(True). Then font.glyph_advance(ord("1")) = 0.55615234375 for character "1".
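Putting these numbers together gives the horizontal scale of the first line. The sketch below is plain arithmetic on values copied from this thread; the variable names are illustrative, and treating the character bbox width as the scaled glyph advance is an assumption that holds for this simple, unrotated text.

```python
# Character entry for "1" copied from the output earlier in this thread.
char = {
    "bbox": (18.8164005279541, 19.041404724121094, 20.2067813873291, 22.57693862915039),
    "c": "1",
}
glyph_advance = 0.55615234375       # font.glyph_advance(ord("1")), at font size 1
reported_size = 3.535533905029297   # span "size" reported for this line

x0, y0, x1, y1 = char["bbox"]
x_scale = (x1 - x0) / glyph_advance     # horizontal scale applied to the glyph
y_scale = reported_size ** 2 / x_scale  # because size == sqrt(x_scale * y_scale)

print(round(x_scale, 3), round(y_scale, 3))  # 2.5 5.0
```

These match the 2.5 and 5 in the first `Tm` matrix of the content stream, so the value 5 that Adobe shows can be recovered as the vertical scale.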