How to un-rotate all text character bounding boxes #1723
Replies: 1 comment
As a follow-up to this question: since I can't share the data, here is how the text appears at a low level in the PDF (NOTE: I have obfuscated the text). The "dir" is (0, -1) and the page shows a rotation of 90. As you can see from the BBOX and the ORIGIN of each character, they are going up along the Y-axis. However, on the screen the text is going from right to left, just in landscape mode.
CH X Origin: (114.18000030517578, 720.0) BBOX: (108.78533935546875, 715.5, 116.28533935546875, 720.0)
I have a PDF in which the LINES of text run down the page according to the character bboxes. The image, however, is rotated 90 degrees and appears as a normal landscape image. I am finding text and annotating strings, and I need to check whether an annotation splits a line (or a page), which I do by iterating over the text and detecting the move to the next line from changes in the y positions. With this arrangement, however, that doesn't really work (each character actually gets its own annotation).
Is there a simple way to rotate the coordinates of all the text GLOBALLY so that they conform to the visual page on the screen (i.e. text rows going from left to right and not top to bottom)? I have tried rotating the page (page.set_rotation(0), for example), but the text coordinates do not change.
I could iterate and apply the derotation matrix to each element, but that seems like a very expensive call to make.
Do you have any suggestions on how to resolve this?
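To make the question concrete, here is a minimal pure-Python sketch of the per-character transform I have in mind. It assumes that a page rotation of 90 maps an unrotated point (x, y) to (page_height - y, x), where page_height is the height of the unrotated page. The helper names `rotate_bbox_90` and `count_lines`, the 792-point page height, and the 2-point tolerance are all hypothetical choices for illustration, not part of PyMuPDF.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def rotate_bbox_90(bbox: BBox, page_height: float) -> BBox:
    """Map a bbox from unrotated page space to the 90-degree rotated view.

    Assumption: (x, y) -> (page_height - y, x), with page_height being the
    height of the unrotated page.
    """
    x0, y0, x1, y1 = bbox
    # Transform two opposite corners, then re-normalise so that
    # x0 <= x1 and y0 <= y1 still hold in the rotated space.
    ax, ay = page_height - y0, x0
    bx, by = page_height - y1, x1
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def count_lines(bboxes: Iterable[BBox], page_height: float, tol: float = 2.0) -> int:
    """Count visual lines: a new line starts whenever the bottom edge of the
    rotated bbox moves by more than `tol` points from the previous character."""
    lines = 0
    last_y = None
    for bbox in bboxes:
        y = rotate_bbox_90(bbox, page_height)[3]
        if last_y is None or abs(y - last_y) > tol:
            lines += 1
        last_y = y
    return lines


# The character from my earlier comment, on a hypothetical 792-point-high page.
ch = (108.78533935546875, 715.5, 116.28533935546875, 720.0)
rotated = rotate_bbox_90(ch, 792.0)
```

The arithmetic per character is only a couple of subtractions and comparisons, so the loop itself may be cheaper than I feared. As far as I understand, PyMuPDF's `page.rotation_matrix` is meant for this unrotated-to-rotated mapping (with `page.derotation_matrix` as the inverse), so multiplying each character's rect by that matrix should be the library equivalent of the hand-written helper above.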