Commit dd4f4f2

Support Image Stamps
Add support for image stamp annotations. Add support for recoloring pages.
1 parent a0240dc commit dd4f4f2

4 files changed: +141 -43 lines changed


docs/document.rst

Lines changed: 19 additions & 3 deletions
@@ -96,6 +96,7 @@ For details on **embedded files** refer to Appendix 3.
 :meth:`Document.pdf_catalog`            PDF only: :data:`xref` of catalog (root)
 :meth:`Document.pdf_trailer`            PDF only: trailer source
 :meth:`Document.prev_location`          return (chapter, pno) of preceding page
+:meth:`Document.recolor`                PDF only: execute :meth:`Page.recolor` for all pages
 :meth:`Document.reload_page`            PDF only: provide a new copy of a page
 :meth:`Document.resolve_names`          PDF only: Convert destination names into a Python dict
 :meth:`Document.save`                   PDF only: save the document
@@ -594,6 +595,16 @@ For details on **embedded files** refer to Appendix 3.
 
     To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapter_count` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements.
 
+
+  .. method:: recolor(components=1)
+
+    PDF only: Change the color component count of all object types (text, images and vector graphics) on all pages.
+
+    :arg int components: desired color space indicated by the number of color components: 1 = DeviceGray, 3 = DeviceRGB, 4 = DeviceCMYK.
+
+    The typical use case is 1 (DeviceGray), which converts the PDF to grayscale.
+
+
   .. method:: reload_page(page)
 
     * New in v1.16.10
@@ -924,14 +935,14 @@ For details on **embedded files** refer to Appendix 3.
 
   .. method:: get_page_fonts(pno, full=False)
 
-    PDF only: Return a list of all fonts (directly or indirectly) referenced by the page.
+    PDF only: Return a list of all fonts (directly or indirectly) referenced by the page object definition.
 
     :arg int pno: page number, 0-based, `-∞ < pno < page_count`.
     :arg bool full: whether to also include the referencer's :data:`xref`. If *True*, the returned items are one entry longer. Use this option if you need to know, whether the page directly references the font. In this case the last entry is 0. If the font is referenced by an `/XObject` of the page, you will find its :data:`xref` here.
 
     :rtype: list
 
-    :returns: a list of fonts referenced by this page. Each entry looks like
+    :returns: a list of fonts referenced by the object definition of the page. Each entry looks like
 
     **(xref, ext, type, basefont, name, encoding, referencer)**,

@@ -959,7 +970,12 @@ For details on **embedded files** refer to Appendix 3.
 
     .. note::
       * This list has no duplicate entries: the combination of :data:`xref`, *name* and *referencer* is unique.
-      * In general, this is a superset of the fonts actually in use by this page. The PDF creator may e.g. have specified some global list, of which each page only makes partial use.
+      * In general, this is a true superset of the fonts actually in use by this page. The PDF creator may e.g. have specified some global list, of which each page makes only partial use.
+      * Be aware that font names returned by some variants of :meth:`Page.get_text` (respectively :ref:`TextPage` methods) need not (exactly) equal the base font name shown here. Reasons for any differences include:
+
+        - This method always shows any subset prefixes (the pattern ``ABCDEF+``), whereas text extractions do not do this by default.
+        - Text extractions use the base library to access the font name, which has a length cap of 31 bytes and generally interrogates the font file binary to access the name. Method ``get_page_fonts()`` however looks at the PDF definition source.
+        - Text extractions work for all supported document types in exactly the same way -- not just for PDFs. Consequently they do not contain PDF-specifics.
 
   .. method:: get_page_text(pno, output="text", flags=3, textpage=None, sort=False)
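
The new note on `get_page_fonts()` says base font names carry a subset prefix of the form `ABCDEF+`, while text extraction omits it by default. A minimal sketch of how a caller might normalize names before comparing the two sources is shown below. The helper name `strip_subset_prefix` is hypothetical and not part of PyMuPDF, and the sketch deliberately ignores the 31-byte cap and font-binary lookups mentioned in the note.

```python
import re

# Hypothetical helper, not a PyMuPDF API: a subset prefix is assumed to be
# exactly six uppercase ASCII letters followed by "+", e.g. "ABCDEF+".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(basefont: str) -> str:
    """Return the font name without a leading subset prefix, if one is present."""
    return SUBSET_PREFIX.sub("", basefont)


# Names as get_page_fonts() might report them (illustrative values only).
print(strip_subset_prefix("ABCDEF+Arial-Bold"))
print(strip_subset_prefix("Helvetica"))
```

Even after stripping the prefix, an exact match is not guaranteed, for the other reasons listed in the note.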

docs/page.rst

Lines changed: 28 additions & 8 deletions
@@ -106,6 +106,7 @@ In a nutshell, this is what you can do with PyMuPDF:
 :meth:`Page.load_widget`            PDF only: load a specific field
 :meth:`Page.load_links`             return the first link on a page
 :meth:`Page.new_shape`              PDF only: create a new :ref:`Shape`
+:meth:`Page.recolor`                PDF only: change the colorspace of objects
 :meth:`Page.remove_rotation`        PDF only: set page rotation to 0
 :meth:`Page.replace_image`          PDF only: replace an image
 :meth:`Page.search_for`             search for a string
@@ -543,23 +544,34 @@ In a nutshell, this is what you can do with PyMuPDF:
 
   .. method:: add_stamp_annot(rect, stamp=0)
 
-    PDF only: Add a "rubber stamp" like annotation to e.g. indicate the document's intended use ("DRAFT", "CONFIDENTIAL", etc.).
+    PDF only: Add a "rubber stamp" annotation to e.g. indicate the document's intended use ("DRAFT", "CONFIDENTIAL", etc.). The parameter may be either an integer to select text from a predefined array of standard texts or an image.
 
     :arg rect_like rect: rectangle where to place the annotation.
+    :arg multiple stamp: The following options are available:
+
+      * The id number (int) of the stamp text. For available stamps see :ref:`StampIcons`.
+
+      * A string specifying an image file path.
 
-    :arg int stamp: id number of the stamp text. For available stamps see :ref:`StampIcons`.
+      * A ``bytes``, ``bytearray`` or ``io.BytesIO`` object for an image in memory.
 
-    .. note::
+      * A :ref:`Pixmap`.
+
+    1. **Text-based stamps**
 
-       * The stamp's text and its border line will automatically be sized and be put horizontally and vertically centered in the given rectangle. :attr:`Annot.rect` is automatically calculated to fit the given **width** and will usually be smaller than this parameter.
+       * :attr:`Annot.rect` is automatically calculated as the largest rectangle with an aspect ratio of ``width:height = 3.8`` that fits in the provided ``rect``. Its position is vertically and horizontally centered.
        * The font chosen is "Times Bold" and the text will be upper case.
-       * The appearance can be changed using :meth:`Annot.set_opacity` and by setting the "stroke" color (no "fill" color supported).
-       * This can be used to create watermark images: on a temporary PDF page create a stamp annotation with a low opacity value, make a pixmap from it with *alpha=True* (and potentially also rotate it), discard the temporary PDF page and use the pixmap with :meth:`insert_image` for your target PDF.
+       * The appearance can be modified using :meth:`Annot.set_opacity` and by setting the "stroke" color. By PDF specification, stamp annotations have no "fill" color.
 
+       .. image:: images/img-stampannot.*
 
-       .. image:: images/img-stampannot.*
-          :scale: 80
+    2. **Image-based stamps**
 
+       * The image is scaled to fit into the rectangle `rect` such that the image's center and the center of `rect` coincide. The aspect ratio of the image is preserved, so the image may not fill the entire rectangle. However, at least one of the given rectangle's width or height is fully covered.
+       * The annotation can be modified via :meth:`Annot.set_opacity`. This method therefore is a way to display images transparently even if no alpha channel is present.
+       * Setting colors has no effect on image stamps.
+       * Rotating image-based stamps **is not supported**. Setting the rotation may lead to unexpected results.
+
   .. method:: add_widget(widget)
 
     PDF only: Add a PDF Form field ("widget") to a page. This also **turns the PDF into a Form PDF**. Because of the large amount of different options available for widgets, we have developed a new class :ref:`Widget`, which contains the possible PDF field attributes. It must be used for both, form field creation and updates.
@@ -1935,6 +1947,14 @@ In a nutshell, this is what you can do with PyMuPDF:
 
     :arg int rotate: An integer specifying the required rotation in degrees. Must be an integer multiple of 90. Values will be converted to one of 0, 90, 180, 270.
 
+  .. method:: recolor(components=1)
+
+    PDF only: Change the colorspace components of all objects on the page.
+
+    :arg int components: The desired count of color components. Must be one of 1, 3 or 4, which results in color spaces DeviceGray, DeviceRGB or DeviceCMYK respectively. The method affects text, images and vector graphics. For instance, with the default value 1, a page will be converted to grayscale. If a page is already grayscale, the method will not cause visible changes -- independent of the value of ``components``.
+
+    These changes are **permanent** and cannot be reverted.
+
   .. method:: remove_rotation()
 
     PDF only: Set page rotation to 0 while maintaining appearance and page content.
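
The `recolor` documentation above maps the `components` argument to a target color space. The mapping and its validation can be sketched in plain Python as follows. The names `COLORSPACES` and `colorspace_for` are hypothetical and exist only for illustration; the actual conversion is delegated to the base library and is not reproduced here.

```python
# Hypothetical lookup table mirroring the documented values of `components`.
COLORSPACES = {1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}


def colorspace_for(components: int = 1) -> str:
    """Return the color space name for a component count, rejecting other values."""
    if components not in COLORSPACES:
        # Same message that Page.recolor raises in this commit.
        raise ValueError("components must be one of 1, 3, 4")
    return COLORSPACES[components]


print(colorspace_for())   # default: grayscale
print(colorspace_for(4))
```

Because the documented change is permanent, a cautious caller would presumably run `recolor` on a copy of the document and save it under a new name.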

src/__init__.py

Lines changed: 84 additions & 31 deletions
@@ -5366,6 +5366,19 @@ def resolve_link(self, uri=None, chapters=0):
         pno = mupdf.fz_page_number_from_location(self.this, loc)
         return pno, xp, yp
 
+    def recolor(self, components=1):
+        """Change the color component count on all pages.
+
+        Args:
+            components: (int) desired color component count, one of 1, 3, 4.
+
+        Invokes the same-named method for all pages.
+        """
+        if not self.is_pdf:
+            raise ValueError("is no PDF")
+        for i in range(self.page_count):
+            self.load_page(i).recolor(components)
+
     def resolve_names(self):
         """Convert the PDF's destination names into a Python dict.

@@ -7717,42 +7730,69 @@ def _add_square_or_circle(self, rect, annot_type):
         return Annot(annot)
 
     def _add_stamp_annot(self, rect, stamp=0):
+        rect = Rect(rect)
+        r = JM_rect_from_py(rect)
+        if mupdf.fz_is_infinite_rect(r) or mupdf.fz_is_empty_rect(r):
+            raise ValueError(MSG_BAD_RECT)
         page = self._pdf_page()
         stamp_id = [
-                PDF_NAME('Approved'),
-                PDF_NAME('AsIs'),
-                PDF_NAME('Confidential'),
-                PDF_NAME('Departmental'),
-                PDF_NAME('Experimental'),
-                PDF_NAME('Expired'),
-                PDF_NAME('Final'),
-                PDF_NAME('ForComment'),
-                PDF_NAME('ForPublicRelease'),
-                PDF_NAME('NotApproved'),
-                PDF_NAME('NotForPublicRelease'),
-                PDF_NAME('Sold'),
-                PDF_NAME('TopSecret'),
-                PDF_NAME('Draft'),
+                "Approved",
+                "AsIs",
+                "Confidential",
+                "Departmental",
+                "Experimental",
+                "Expired",
+                "Final",
+                "ForComment",
+                "ForPublicRelease",
+                "NotApproved",
+                "NotForPublicRelease",
+                "Sold",
+                "TopSecret",
+                "Draft",
                 ]
         n = len(stamp_id)
-        name = stamp_id[0]
-        r = JM_rect_from_py(rect)
-        if mupdf.fz_is_infinite_rect(r) or mupdf.fz_is_empty_rect(r):
-            raise ValueError( MSG_BAD_RECT)
-        if _INRANGE(stamp, 0, n-1):
+        buf = None
+        name = None
+        if stamp in range(n):
             name = stamp_id[stamp]
+        elif isinstance(stamp, Pixmap):
+            buf = stamp.tobytes()
+        elif isinstance(stamp, str):
+            buf = pathlib.Path(stamp).read_bytes()
+        elif isinstance(stamp, (bytes, bytearray)):
+            buf = stamp
+        elif isinstance(stamp, io.BytesIO):
+            buf = stamp.getvalue()
+        else:
+            name = stamp_id[0]
+
         annot = mupdf.pdf_create_annot(page, mupdf.PDF_ANNOT_STAMP)
-        mupdf.pdf_set_annot_rect(annot, r)
-        try:
-            n = PDF_NAME('Name')
-            mupdf.pdf_dict_put(mupdf.pdf_annot_obj(annot), PDF_NAME('Name'), name)
-        except Exception:
-            if g_exceptions_verbose: exception_info()
-            raise
-        mupdf.pdf_set_annot_contents(
-                annot,
-                mupdf.pdf_dict_get_name(mupdf.pdf_annot_obj(annot), PDF_NAME('Name')),
-                )
+        if buf: # image stamp
+            fzbuff = mupdf.fz_new_buffer_from_copied_data(buf)
+            img = mupdf.fz_new_image_from_buffer(fzbuff)
+
+            # compute image boundary box on page
+            w, h = img.w(), img.h()
+            scale = min(rect.width / w, rect.height / h)
+            width = w * scale # bbox width
+            height = h * scale # bbox height
+
+            # center of "rect"
+            center = (rect.tl + rect.br) / 2
+            x0 = center.x - width / 2
+            y0 = center.y - height / 2
+            x1 = x0 + width
+            y1 = y0 + height
+            r = mupdf.fz_make_rect(x0, y0, x1, y1)
+            mupdf.pdf_set_annot_rect(annot, r)
+            mupdf.pdf_set_annot_stamp_image(annot, img)
+            mupdf.pdf_dict_put(mupdf.pdf_annot_obj(annot), PDF_NAME("Name"), mupdf.pdf_new_name("ImageStamp"))
+            mupdf.pdf_set_annot_contents(annot, "Image Stamp")
+        else: # text stamp
+            mupdf.pdf_set_annot_rect(annot, r)
+            mupdf.pdf_dict_put(mupdf.pdf_annot_obj(annot), PDF_NAME("Name"), PDF_NAME(name))
+            mupdf.pdf_set_annot_contents(annot, name)
         mupdf.pdf_update_annot(annot)
         JM_add_annot_id(annot, "A")
         return Annot(annot)
@@ -8510,7 +8550,7 @@ def add_squiggly_annot(
         q = CheckMarkerArg(quads)
         return self._add_text_marker(q, mupdf.PDF_ANNOT_SQUIGGLY)
 
-    def add_stamp_annot(self, rect: rect_like, stamp: int =0) -> Annot:
+    def add_stamp_annot(self, rect: rect_like, stamp=0) -> Annot:
         """Add a ('rubber') 'Stamp' annotation."""
         old_rotation = annot_preprocess(self)
         try:
@@ -8601,6 +8641,19 @@ def annots(self, types=None):
             annot._yielded=True
             yield annot
 
+    def recolor(self, components=1):
+        """Convert colorspaces of objects on the page.
+
+        Valid values are 1, 3 and 4.
+        """
+        if components not in (1, 3, 4):
+            raise ValueError("components must be one of 1, 3, 4")
+        pdfdoc = _as_pdf_document(self.parent)
+        ropt = mupdf.pdf_recolor_options()
+        ropt.num_comp = components
+        ropts = mupdf.PdfRecolorOptions(ropt)
+        mupdf.pdf_recolor_page(pdfdoc, self.number, ropts)
+
     @property
     def artbox(self):
         """The ArtBox"""

tests/test_annots.py

Lines changed: 10 additions & 1 deletion
@@ -144,15 +144,24 @@ def test_fileattachment():
 def test_stamp():
     doc = pymupdf.open()
     page = doc.new_page()
-    annot = page.add_stamp_annot(r, stamp=10)
+    annot = page.add_stamp_annot(r, stamp=0)
     assert annot.type == (13, "Stamp")
+    assert annot.info["content"] == "Approved"
     annot_id = annot.info["id"]
     annot_xref = annot.xref
     page.load_annot(annot_id)
     page.load_annot(annot_xref)
     page = doc.reload_page(page)
 
 
+def test_image_stamp():
+    doc = pymupdf.open()
+    page = doc.new_page()
+    filename = os.path.join(scriptdir, "resources", "nur-ruhig.jpg")
+    annot = page.add_stamp_annot(r, stamp=filename)
+    assert annot.info["content"] == "Image Stamp"
+
+
 def test_redact1():
     doc = pymupdf.open()
     page = doc.new_page()
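
The two stamp tests above exercise the integer and file-path forms of `stamp`. As a rough, library-free sketch of what `_add_stamp_annot` does with its argument, the code below separates the argument dispatch from the centered, aspect-preserving fit. The names `classify_stamp` and `fit_centered` are hypothetical. The `Pixmap` branch is omitted so the sketch runs without PyMuPDF, and the integer check uses `isinstance` rather than the commit's `stamp in range(n)`, which is an assumption made for clarity.

```python
import io
import pathlib

STAMP_NAMES = [
    "Approved", "AsIs", "Confidential", "Departmental", "Experimental",
    "Expired", "Final", "ForComment", "ForPublicRelease", "NotApproved",
    "NotForPublicRelease", "Sold", "TopSecret", "Draft",
]


def classify_stamp(stamp):
    """Return (name, image_bytes); exactly one of the two is not None."""
    if isinstance(stamp, int) and 0 <= stamp < len(STAMP_NAMES):
        return STAMP_NAMES[stamp], None          # predefined text stamp
    if isinstance(stamp, str):
        return None, pathlib.Path(stamp).read_bytes()  # image file path
    if isinstance(stamp, (bytes, bytearray)):
        return None, bytes(stamp)                # image already in memory
    if isinstance(stamp, io.BytesIO):
        return None, stamp.getvalue()
    return STAMP_NAMES[0], None                  # fallback, as in the diff


def fit_centered(rect, aspect):
    """Largest (x0, y0, x1, y1) with width/height == aspect, centered in rect."""
    x0, y0, x1, y1 = rect
    avail_w, avail_h = x1 - x0, y1 - y0
    if avail_w <= 0 or avail_h <= 0 or aspect <= 0:
        raise ValueError("rect must be non-empty and aspect positive")
    width = min(avail_w, avail_h * aspect)       # the tighter dimension wins
    height = width / aspect
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


name, data = classify_stamp(0)                   # name == "Approved", data is None
# A 50 x 50 image in a 200 x 100 rectangle covers the full height only.
image_box = fit_centered((0, 0, 200, 100), 50 / 50)  # -> (50.0, 0.0, 150.0, 100.0)
# Text stamps use the fixed 3.8 width:height ratio quoted in docs/page.rst.
text_box = fit_centered((0, 0, 380, 200), 3.8)
```

The same `fit_centered` helper covers both cases because the documented text-stamp rule (largest 3.8:1 rectangle, centered) and the image rule (scale by `min(rect.width / w, rect.height / h)`, centered) differ only in the aspect ratio they preserve.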
