How to modify line properties on an existing pdf, or how to copy all the content to a new pdf changing only few properties #2870

kompre · 2023-12-06T18:43:20Z

Hello,

I'm new to PyMuPDF and would like to know whether it is possible to copy a source PDF almost exactly while changing only a few properties.

The source file is a technical drawing that contains vector graphics (e.g. lines) and text. I only need to change the lineCap and lineJoin properties of the existing line objects; everything else can stay the same.

Following the how-to guide for drawing and graphics, I'm able to change the linework as I want, but all the text is missing from the resulting PDF.

I've gathered that vector graphics need to be redrawn from scratch, since there is no simple way to change a property in place. So I suppose I must create a new PDF rather than modify the existing one, which means I also need to copy every object that is not a drawing/graphics object to the same position as in the original PDF.

Playing a bit with the mutool.exe CLI, I've seen that I can produce a new PDF containing only the text from the original (-KK options), so it must be doable.

Can someone please nudge me in the correct direction?

JorjMcKie (Maintainer) · 2023-12-08T12:20:43Z

This should be possible by extracting the vector graphics and the text and re-inserting both on a new page.

The redrawing happens via the Page / Shape .draw_*() methods. Depending on how sophisticated the coloring in the original is (I am talking about the "even-odd" parameter), fill colors may not come out correctly.

Otherwise please look at this example script
extract-draws.zip

Please do not hesitate to ask if running into issues.

kompre (Author) · 2023-12-12T08:49:22Z

Hey, thanks for your support.

I've tried the script you attached and modified it to better suit my case, but I'm not quite there yet: while the linework seems fine, some text blocks "fall off", i.e. they do not get rendered (each run yields the same result).

I attach the modified script, in which I:

set the lineJoin and lineCap properties to my preferred values in shape.finish()
added `or 0` to a few properties (width, stroke_opacity, fill_opacity) that could be None, which would otherwise raise a TypeError ('>=' not supported between instances of 'NoneType' and 'int')
forced the use of the Arial Narrow font, which is what the input document uses
changed some camelCase method names to snake_case (e.g. getText -> get_text; maybe it was an old script?)
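
The `or 0` guard from the list above can be looked at in isolation. The sketch below is only an illustration under stated assumptions: `value_or` is a hypothetical helper (not part of PyMuPDF) and the `path` dictionary is made-up sample data. It shows two things: `dict.get(key, default)` does not help when the key exists with a `None` value, and `None or 0` turns a missing opacity into 0, which would mean fully transparent, so a default of 1 may be the safer choice for the opacity values.

```python
def value_or(mapping, key, default):
    """Return mapping[key], or default when the key is missing or its value is None."""
    value = mapping.get(key)
    return default if value is None else value


# Made-up sample resembling one entry of page.get_drawings() (assumption).
path = {"width": None, "stroke_opacity": None, "fill_opacity": 0.5}

# The key exists, so .get() returns the stored None, not the fallback 1.
assert path.get("stroke_opacity", 1) is None

width = value_or(path, "width", 0)                    # 0
stroke_opacity = value_or(path, "stroke_opacity", 1)  # 1 (opaque)
fill_opacity = value_or(path, "fill_opacity", 1)      # 0.5, kept as is

print(width, stroke_opacity, fill_opacity)
```

Checking `is None` instead of relying on `or` also keeps a legitimate value of 0 from being replaced by the default.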

#%%
import fitz

doc = fitz.open("input.pdf")
page = doc[0]
paths = page.get_drawings()  # extract existing drawings
# this is a list of "paths", which can directly be drawn again using Shape
# -------------------------------------------------------------------------
#
# define some output page with the same dimensions
outpdf = fitz.open()
outpage = outpdf.new_page(width=page.rect.width, height=page.rect.height)
shape = outpage.new_shape()  # make a drawing canvas for the output page

pathrects = []  # store all path rectangles here for text processing


def check_span(span):
    r = fitz.Rect(span["bbox"])
    for prect in pathrects:
        if prect.intersects(r):
            return True
    return False

#%%
# --------------------------------------
# loop through the paths and draw them
# --------------------------------------
for path in paths:
    pathrects.append(path["rect"])
    # ------------------------------------
    # draw each entry of the 'items' list
    # ------------------------------------
    for item in path["items"]:  # these are the draw commands
        if item[0] == "l":  # line
            shape.draw_line(item[1], item[2])
        elif item[0] == "re":  # rectangle
            shape.draw_rect(item[1])
        elif item[0] == "qu":  # quad
            shape.draw_quad(item[1])
        elif item[0] == "c":  # curve
            shape.draw_bezier(item[1], item[2], item[3], item[4])
        else:
            raise ValueError("unhandled drawing", item)
    # ------------------------------------------------------
    # all items are drawn, now apply the common properties
    # to finish the path
    # ------------------------------------------------------
    shape.finish(
        fill=path["fill"],  # fill color
        color=path["color"],  # line color
        dashes=path["dashes"],  # line dashing
        even_odd=path.get("even_odd", True),  # control color of overlaps
        closePath=path["closePath"],  # whether to connect last and first point
        lineJoin=1,  # line join style: 1 = round join
        lineCap=1,  # line cap style: 1 = round cap
        # -------
        # reading these properties can yield `None`, which would raise an error: `or 0` is added to avoid that
        width=path["width"] or 0,  # line width
        stroke_opacity=path.get("stroke_opacity", 1) or 0,  # same value for both
        fill_opacity=path.get("fill_opacity", 1) or 0,  # opacity parameters
    )

# %%
blocks = page.get_text("dict", flags=0)["blocks"]
for block in blocks:
    for line in block["lines"]:
        for span in line["spans"]:
            if not check_span(span):
                continue
            shape.insert_text(
                span["origin"],
                span["text"],
                fontsize=span["size"],
                color=fitz.sRGB_to_pdf(span["color"]),
                # -------------------------------
                # I force the use of Arial Narrow
                fontname="Arial-Narrow",
                fontfile=r"C:\WINDOWS\FONTS\ARIALN.TTF",  # raw string keeps the backslashes literal
            )

#%%
# all paths processed - commit the shape to its page
shape.commit()
outpdf.save("output.pdf")
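
One thing that may be worth checking for the missing text: the script above only re-inserts a span when check_span() finds a drawing rectangle that intersects the span's bbox; every other span is skipped by the `continue`. Whether this is the actual cause for this input file is an assumption. The sketch below reproduces that filter with plain tuples instead of fitz.Rect, so it runs without PyMuPDF. The helper names and coordinates are made up, and the overlap test is a simplification of Rect.intersects().

```python
def intersects(a, b):
    """True if rectangles a and b, given as (x0, y0, x1, y1), overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_spans(spans, pathrects):
    """Separate spans into those the filter keeps and those it drops."""
    kept, dropped = [], []
    for span in spans:
        if any(intersects(span["bbox"], rect) for rect in pathrects):
            kept.append(span["text"])
        else:
            dropped.append(span["text"])
    return kept, dropped


# Made-up data: one drawing rectangle and two text spans.
pathrects = [(0, 0, 100, 50)]
spans = [
    {"text": "inside frame", "bbox": (10, 10, 60, 20)},
    {"text": "title block note", "bbox": (10, 200, 90, 210)},
]

kept, dropped = split_spans(spans, pathrects)
print("kept:", kept)
print("dropped:", dropped)
```

If all text should be copied regardless of nearby drawings, removing the check_span() test from the text loop would be the corresponding change. This has not been verified against the input file.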
