Can we delete vertical text? #2271
Replies: 2 comments 7 replies
EDIT: maybe I found a way.
If the above statement is true, then this should do the job, although I'm not sure that deleting a block actually changes the pdf variable. Can you guys double-check for me?

import fitz  # PyMuPDF

def delete_vertical_text(pdf):
    # loop through every page
    for page in pdf:
        # get all the blocks
        blocks = page.get_text("dict")["blocks"]
        # loop through every block
        for block in blocks:
            # get the bounding box of the block
            bbox = fitz.Rect(block["bbox"])
            # calculate the width and height of the block
            width, height = bbox.width, bbox.height
            # if the block is vertical, remove it from the page
            if height > width:
                page.delete_block(block)
    return pdf
page.add_redact_annot(bbox1)
page.add_redact_annot(bbox2)
...
page.apply_redactions()

Of course, your check for vertical text is error-prone and will only work in some cases. A clean solution should check the actual writing direction, like this:

for block in page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]:
    for line in block["lines"]:
        wdir = line["dir"]  # writing direction = (cosine, sine)
        if wdir[0] == 0:  # either 90° or 270°
            page.add_redact_annot(line["bbox"])
page.apply_redactions(images=fitz.REDACT_IMAGE_NONE)  # remove text, but no image
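In case it helps, here is a minimal sketch of the same direction check as a standalone function that works on the dictionary returned by page.get_text("dict"). The function name vertical_line_bboxes, the tol parameter, and the sample data are made up for illustration and are not part of PyMuPDF. The tolerance is only a precaution, on the assumption that the cosine might not come out as exactly 0 in every file.

```python
def vertical_line_bboxes(page_dict, tol=1e-3):
    """Collect bboxes of text lines whose writing direction is vertical.

    page_dict is assumed to have the layout of page.get_text("dict"):
    a "blocks" list, where text blocks hold a "lines" list and each
    line has a "dir" (cosine, sine) tuple and a "bbox".
    """
    bboxes = []
    for block in page_dict.get("blocks", []):
        # blocks without a "lines" key (e.g. images) are skipped
        for line in block.get("lines", []):
            cosine, _sine = line["dir"]
            # cosine close to 0 means the line runs at 90 or 270 degrees
            if abs(cosine) <= tol:
                bboxes.append(tuple(line["bbox"]))
    return bboxes


# Hand-made sample (hypothetical data): one horizontal line,
# one vertical line, and one block without any lines.
sample = {
    "blocks": [
        {
            "type": 0,
            "lines": [
                {"dir": (1.0, 0.0), "bbox": (50.0, 50.0, 300.0, 62.0)},
                {"dir": (0.0, -1.0), "bbox": (10.0, 20.0, 22.0, 200.0)},
            ],
        },
        {"type": 1, "bbox": (100.0, 100.0, 200.0, 200.0)},
    ]
}

print(vertical_line_bboxes(sample))
```

Each returned bbox could then be handed to page.add_redact_annot(...) before a single page.apply_redactions(...) call, as in the snippet above.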
🙋🏻♂️ Hey there, I noticed that in 99% of my docs vertical texts are pretty much useless, so I want to get rid of them.
❓ I read it can be done with page.get_text("dict", sort=True), but I could not find examples online.
🤖 ChatGPT gave me this output, but I'm pretty sure the "angle" property does not exist after reading the documentation here
📝 Here is the doc I'm parsing with PyMuPDF
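On the "angle" point: the line dictionaries carry a "dir" entry, a (cosine, sine) pair, rather than an angle. If an angle in degrees is wanted, it can be derived from that pair. This is only a sketch; the helper name angle_from_dir is made up here, and which of 90 or 270 corresponds to "reading upwards" depends on the page coordinate system, so treat the sign as something to verify on your own files.

```python
import math


def angle_from_dir(direction):
    """Convert a (cosine, sine) direction pair to degrees in [0, 360)."""
    cosine, sine = direction
    # atan2 handles all four quadrants; the modulo maps -90 to 270
    return math.degrees(math.atan2(sine, cosine)) % 360


for d in [(1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    print(d, angle_from_dir(d))
```

A line would then count as vertical when the result is close to 90 or 270.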
