How to extract text from rectangles larger than CropBox #1603
Describe the bug
I am trying to extract text from a PDF for offset printing that contains text for the printing workflow in the mediabox (outside the trimbox). I would like this extra-page information not to be extracted by get_text, so I set the clip argument equal to the bleed box, but the method always returns text that is outside the page as well.

To Reproduce
Sample file: p5.pdf

Expected behavior
Extract only visible text.
Replies: 3 comments
Both rectangles, cropbox and bleedbox, contain all the text!
To confirm, draw the rectangles, set the cropbox to the mediabox, and save to a new file.
No - revoking the previous! Forget it.
Text extraction does not / cannot "know" that you are handing in the /BleedBox. It has no choice but to assume that clip is a rectangle relative to the cropbox; it will not be regarded as being relative to the mediabox. So what always works is choosing the cropbox as clip.
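A minimal sketch of "choosing the cropbox as clip", assuming PyMuPDF (imported as `fitz`) and the sample file name from the question. It relies on my understanding that `page.rect` is the cropbox expressed in the page's own coordinates, which is the coordinate system `clip` is interpreted in. The helper name `visible_text` is hypothetical, not part of PyMuPDF.

```python
def visible_text(page):
    """Return the text inside the visible page area (the cropbox).

    `page` is expected to be a PyMuPDF page object.
    """
    # Assumption: page.rect is the cropbox in page coordinates,
    # so it can be passed to `clip` without any conversion.
    return page.get_text("text", clip=page.rect)


# Usage (requires PyMuPDF and the sample file from the question):
#   import fitz
#   doc = fitz.open("p5.pdf")
#   for page in doc:
#       print(visible_text(page))
```

Passing a box such as `page.bleedbox` directly would not be equivalent, because that rectangle is not expressed relative to the cropbox.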
If you want a rectangle larger than that, you must