-
I'm interested in redacting and modifying certain words in PDF documents, which I am able to do successfully. First I do some location detection, then redact at that location, then add a shape with a textbox inserted at that same location. This works. However, I am having trouble maintaining alignment: there does not seem to be anywhere within the PyMuPDF objects where I can access this data. I have looked at extracting with rawdict, expecting perhaps a block-level or even span-level attribute, but this does not seem to be available. The Adobe PDF reference specifies a TextAlign attribute for block-level structure elements. I may be misinterpreting this, as I am not very familiar with the PDF standard, but I take it to mean that the alignment info should be available somewhere in the raw PDF data, and I want a way to expose that to my program. Ideally it would be added to the output of rawdict extraction, or be retrievable by some other means. If I need to get a specific textbox reference or something, that is fine also. If there is an existing way to do this I would very much appreciate a pointer in the right direction; I just haven't found anything where I have looked so far. Thanks!
Replies: 3 comments
-
The replacement text in the redact method itself is indeed only roughly aligned. However, if you know that the new text will fit exactly, you can use the "origin" point from its span as the insertion point, and likewise the span's font size.
This means you need a separate text insertion step after applying the redactions.
From: Adair Fulweber
Sent: Tuesday, February 21, 2023 4:21:49 PM
Subject: [pymupdf/PyMuPDF] Text Alignment Parsing (Issue #2244)
-
I am using a separate text insertion step, with the text rect the same size as the original area, but my new text may not have the same pixel width/height. When it does not (i.e. the new text does not entirely fill the area that the old text occupied), I would like to match the alignment of the previous text. If the original text is centered I should pass align=1 when inserting the textbox, and so on. However, I do not see any way to obtain that data, though the specification seems to imply that it may exist.
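One possible workaround, offered only as a heuristic sketch and not as something PyMuPDF provides: guess the alignment from geometry by comparing each line's bbox with its block's bbox. The function name `guess_alignment` and the 1-point tolerance are assumptions, and the guess needs at least two lines of differing width to say anything.

```python
def guess_alignment(line_bboxes, block_bbox, tol=1.0):
    """Guess an `align` value for insert_textbox from line geometry.

    Returns 0 (left), 1 (center), 2 (right), 3 (justify), or None if unclear.
    Heuristic only: bboxes are (x0, y0, x1, y1) tuples, `tol` is in points.
    """
    if len(line_bboxes) < 2:
        return None  # a single line always spans its own block

    bx0, _, bx1, _ = block_bbox
    left_gaps = [x0 - bx0 for x0, _, _, _ in line_bboxes]
    right_gaps = [bx1 - x1 for _, _, x1, _ in line_bboxes]

    left_flush = all(g <= tol for g in left_gaps)
    right_flush = all(g <= tol for g in right_gaps)

    if left_flush and right_flush:
        return None  # all lines equally wide: cannot distinguish
    if left_flush:
        # Justified text is flush on both sides except for its last line.
        body_right_flush = all(g <= tol for g in right_gaps[:-1])
        return 3 if len(line_bboxes) >= 3 and body_right_flush else 0
    if right_flush:
        return 2
    if all(abs(l - r) <= tol for l, r in zip(left_gaps, right_gaps)):
        return 1  # equal margins on both sides of every line
    return None
```

With `page.get_text("dict")`, the inputs would be `block["bbox"]` and `[line["bbox"] for line in block["lines"]]`; a non-`None` result could then be passed as `align=` to `insert_textbox`.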
-
It is not possible to find out a text's original alignment - sorry. Even the PDF spec does not have this concept, at least not in this way: