Text Positions #1802

meghanaviyyapu · 2022-07-07T12:02:25Z

I have extracted the rect values of text positions from a PDF, and I want to insert text at those positions. Can the text matrix (Tm) values of a PDF be obtained from the rect values? Is there any difference between the rect values of text and the text matrix values?

JorjMcKie · 2022-07-07T12:47:49Z

Maintainer

The text matrix Tm in the PDF /Contents source is not directly available. PyMuPDF is also not designed around using it, although you could of course read /Contents as a stream and dig your way through to all the information you want.

The bbox accompanying extracted text already reflects all computations required to deliver the correct position.

So you can use those bboxes (plus span["origin"], which is even more important) for inserting the span text.
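
A minimal sketch of that approach, assuming both arguments are PyMuPDF Page objects. The helper name copy_spans and the file names in the usage comment are hypothetical. The built-in font "helv" is an assumption standing in for the source font, which is not carried over automatically, so glyph widths may differ from the original.

```python
def copy_spans(src_page, dst_page):
    """Re-insert every text span of src_page into dst_page at its origin.

    Only Page.get_text("dict") and Page.insert_text() are used.
    Returns the number of spans written.
    """
    count = 0
    for block in src_page.get_text("dict")["blocks"]:
        # Image blocks carry no "lines" key, so .get() skips them.
        for line in block.get("lines", []):
            for span in line["spans"]:
                if not span["text"].strip():
                    continue  # nothing visible to insert
                dst_page.insert_text(
                    span["origin"],         # start of the baseline, MuPDF coordinates
                    span["text"],
                    fontsize=span["size"],
                    fontname="helv",        # assumption: built-in Helvetica
                )
                count += 1
    return count


# Usage sketch (requires PyMuPDF; file names are placeholders):
#   import fitz
#   src = fitz.open("input.pdf")
#   out = fitz.open()
#   for page in src:
#       new_page = out.new_page(width=page.rect.width, height=page.rect.height)
#       copy_spans(page, new_page)
#   out.save("output.pdf")
```

Because both pages live in PyMuPDF here, the extracted origin can be passed on unchanged; no matrix is involved.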

11 replies

JorjMcKie Jul 7, 2022
Maintainer

In this case, simply set the required text matrix to the identity matrix (1,0,0,1,0,0).
But first confirm that the other package uses the same geometry as MuPDF: top-left = (0,0).
Otherwise, use the matrix ~page.transformation_matrix to recover PDF geometry: bottom-left = (0,0).
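
The effect of that inverse matrix can be illustrated in plain Python, without PyMuPDF. This sketch assumes an unrotated page, 842 points high, whose MediaBox starts at (0,0); for such a page the transformation matrix is expected to be (1, 0, 0, -1, 0, 842). The helper names apply and invert are hypothetical. Inside PyMuPDF, the same result should come from fitz.Point(x, y) * ~page.transformation_matrix.

```python
def apply(matrix, point):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to a point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def invert(matrix):
    """Return the inverse of a PDF-style matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return (
        d / det, -b / det,
        -c / det, a / det,
        (c * f - d * e) / det, (b * e - a * f) / det,
    )


page_height = 842                              # assumption: A4 portrait, in points
pdf_to_mupdf = (1, 0, 0, -1, 0, page_height)   # assumed transformation matrix
mupdf_to_pdf = invert(pdf_to_mupdf)            # what ~matrix computes in PyMuPDF

origin = (72, 100)                             # a span origin, top-left geometry
pdf_point = apply(mupdf_to_pdf, origin)        # same point, bottom-left geometry
```

Only the y value changes: a point 100 points below the top edge ends up 742 points above the bottom edge, while x stays at 72.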

meghanaviyyapu Jul 7, 2022
Author

My objective is to insert text into a new PDF based on the text extracted from the input PDF. To insert the text at the same position as in the input PDF, which values are required, and can the extracted values be used directly without any transformation?

meghanaviyyapu Jul 7, 2022
Author

In this case, simply set the required text matrix to the identity matrix (1,0,0,1,0,0). But first confirm that the other package uses the same geometry as MuPDF: top-left = (0,0). Otherwise, use the matrix ~page.transformation_matrix to recover PDF geometry: bottom-left = (0,0).

Can you let me know what (x,y) values I should provide while inserting text to insert at the exact position of the input PDF?

JorjMcKie Jul 7, 2022
Maintainer

This is span["origin"]. As I said, it may need to be multiplied by the inverse of the transformation matrix, depending on the geometry principle of that other package.

meghanaviyyapu Jul 7, 2022
Author

Ok, Thanks for clarifying.
