Any plans to support arbitrary image rotation angles? #1658

RamKromberg · 2022-03-30T19:29:20Z

RamKromberg
Mar 30, 2022

I want to losslessly rotate and shear skewed scanned images (typically smartphone captures) by:

1. Packaging them as PDFs.
2. Applying arbitrary-angle (integer or real) rotations, shears and shifts via CTMs.
3. Setting the bleed box to hide the skewed edges.

Looking at PDF 32000-1:2008 p.205, it seems applying transformation matrices to images is possible:

An image can be placed on the output page in any position, orientation, and size by using the cm operator to modify the current transformation matrix (CTM) so as to map the unit square of user space to the rectangle or parallelogram in which the image shall be painted.
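
As a rough illustration of what that paragraph describes, here is a plain-Python sketch of building such a matrix by hand. The helper names (`mul`, `apply`, `image_ctm`) are invented for this example and belong to no library; the only things taken from the spec are the six-number form `a b c d e f` and the rotation matrix `cos θ, sin θ, -sin θ, cos θ, 0, 0`.

```python
import math

# A PDF matrix is six numbers (a, b, c, d, e, f) standing for
#   | a b 0 |
#   | c d 0 |
#   | e f 1 |
# Points are row vectors: (x', y') = (a*x + c*y + e, b*x + d*y + f).

def mul(m, n):
    """Return the matrix that applies m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )

def apply(m, x, y):
    """Transform the point (x, y) by m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)

def image_ctm(width, height, angle_deg, cx, cy):
    """CTM that paints a width x height image, rotated counter-clockwise
    by angle_deg about its own centre, with that centre at (cx, cy)."""
    t = math.radians(angle_deg)
    scale = (width, 0, 0, height, 0, 0)                # unit square -> w x h
    to_origin = (1, 0, 0, 1, -width / 2, -height / 2)  # centre onto the origin
    rotate = (math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 0, 0)
    to_place = (1, 0, 0, 1, cx, cy)                    # centre onto (cx, cy)
    return mul(mul(mul(scale, to_origin), rotate), to_place)

# The operand string one would put in front of "cm" in a content stream.
print(" ".join(f"{v:.4f}" for v in image_ctm(607, 741, 45, 300, 400)), "cm")
```

With an angle of 0 this collapses to the familiar `w 0 0 h x y cm` form; any other angle maps the unit square to a rotated rectangle around the same centre.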

However, image-maintenance.md states that:

Currently, PyMuPDF supports image rotations by multiples of 90° only. Other angles are not supported for modification.

So:

1. Is this still the case?
2. Does MuPDF have this limitation as well?
3. Any idea when/if image CTMs will be implemented?
4. Any alternatives/suggestions that will spare me learning PDF programming and reverse-engineering how img2pdf does its thing?

Regards

JorjMcKie · 2022-03-31T14:12:54Z

JorjMcKie
Mar 31, 2022
Maintainer

First of all, CTMs are of course already used, in image insertion and in all other insertions.
For image insertion, I just couldn't figure out how to compute the matrix for angles that are not multiples of 90°.

The script image_maintenance.py you were alluding to is only a demo and not part of the package itself, so its limitations do not necessarily apply to the package.

I have no difficulty showing pages from other PDFs at any angle. So you might:

1. Convert the image to a 1-page PDF. At this point you can choose from files, images in memory or pixmaps. This gives you a first chance to manipulate the image before actually converting it.
2. Insert the new PDF page into your target PDF page. Here you can rotate by any angle. You can also restrict the visible part of the image (i.e. of the source PDF page) with the clip parameter.

0 replies

JorjMcKie · 2022-03-31T14:37:40Z

JorjMcKie
Mar 31, 2022
Maintainer

For example, insert some Pillow image rotated left by 45°:

import io

import fitz  # PyMuPDF
from PIL import Image

img = Image.open("beauty-contest.jpg")
print(img.size)  # (607, 741) for this image

src = fitz.open()  # the 1-page PDF with the image
# give page width / height from image
spage = src.new_page(width=img.size[0], height=img.size[1])
fp = io.BytesIO()
img.save(fp, "JPEG")  # write the image to memory
# and insert in PDF as full-page image (the interactive session printed 5 here)
spage.insert_image(spage.rect, stream=fp.getvalue())

doc = fitz.open()  # your target PDF
page = doc.new_page()
rect = fitz.Rect(100, 100, 300, 300)  # image should land inside this
# show the source page rotated by 45 degrees (the session printed 7 here)
page.show_pdf_page(rect, src, 0, rotate=45)
doc.ez_save("test.pdf")

Result looks like this:

0 replies

RamKromberg · 2022-04-01T10:17:36Z

RamKromberg
Apr 1, 2022
Author

I'll need to test it out on pdf input since pillow and the like don't handle jpegs losslessly but this seems about right. Thanks!

Btw, refactoring show_pdf_page() into a primitive that takes a CTM (so that calc_matrix becomes an external function) would be for the best. Putting aside how that mirrors the PDF's own structure, even if you were to add parameters for simple shifting and shearing, there are some fancy projection transformation matrices that can compensate for image capture angles and would still be best left as CTMs.
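
To make the CTM suggestion a bit more concrete, here is a rough plain-Python sketch of how a shear and a shift fold into a single six-number matrix. `concat`, `shear`, `shift` and `to_cm` are hypothetical helper names, not PyMuPDF API; the skew form `1 tan α tan β 1 0 0` is the one given in the PDF spec. One caveat to keep in mind: the `cm` operator only takes an affine matrix, so rotation, scaling, shear and shift all fit, but a true perspective (projective) correction does not.

```python
import math

def concat(m, n):
    """Return the matrix that applies m first, then n (PDF row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )

def shear(alpha_deg, beta_deg):
    """Skew the x axis by alpha and the y axis by beta."""
    return (1, math.tan(math.radians(alpha_deg)),
            math.tan(math.radians(beta_deg)), 1, 0, 0)

def shift(tx, ty):
    """Translate by (tx, ty)."""
    return (1, 0, 0, 1, tx, ty)

def to_cm(m):
    """Format a matrix as the operand string of a cm operator."""
    return " ".join(f"{v:g}" for v in m) + " cm"

# Order matters: shearing and then shifting is not the same as the reverse.
shear_then_shift = concat(shear(0, 5), shift(10, -4))
shift_then_shear = concat(shift(10, -4), shear(0, 5))
print(to_cm(shear_then_shift))
print(to_cm(shift_then_shear))
```

A primitive that accepts one such matrix would cover all of these cases without needing a separate parameter per transformation.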

On a side note, the math of linear transformations is covered in game engine tutorials. Personally I can't remember any of it, so it's more of a note-to-self really... ;) Still, thought I'd mention it in case someone passes by with similar interests and no clue what transformation matrices are.

2 replies

JorjMcKie Apr 1, 2022
Maintainer

Pillow was just an example. PyMuPDF can read most images directly too - don't need Pillow for JPEG, PNG, BMP, TIFF, GIF and more.

RamKromberg Apr 1, 2022
Author

Yeah it's just the losslessness that needs figuring out really. Regardless, I'll want to figure out how to insert PDFs first (like you suggested in your answer's 2nd point) since that will circumvent a lot of potential limitations.

RamKromberg · 2022-04-28T21:28:15Z

RamKromberg
Apr 28, 2022
Author

Found some time to mess around with this again and managed a proof-of-concept prototype img2pdf output transformation:

import fitz, sys

infile = sys.argv[1]
outfile = sys.argv[2]

doc = fitz.open(infile)
width, height = doc[0].mediabox_size

content_stream_xref = doc[0].get_contents()[0]
content_stream = doc.xref_stream(content_stream_xref)

# note: this only works with img2pdf output
# content streams don't use newlines as delimiters;
# they are postfix (RPN-style) scripts that need their own tokenizer and parser...
# but for img2pdf this is enough, I guess
stack = content_stream.decode("ascii").split('\n')
stack.reverse()

for i in range(len(stack)):
	if (stack[i].split(' ')[-1:]==['cm']):
		break

m = fitz.Matrix(stack[i].split(' ')[:-1])
m = m.prerotate(45)

stack[i]=" ".join(map(str, m))+" cm"
stack.reverse()

stack='\n'.join(stack).encode("ascii")

doc.update_stream(content_stream_xref, stack, new=False)

doc.save(outfile, garbage = 4, deflate = False)

I believe img2pdf always has just the 1 content stream so it should work on all img2pdf single-page outputs but masks might be an issue? Regardless, the general purpose version will cover those scenarios I guess.
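
For anyone who wants to try the line-based idea without a PDF at hand, here is a small plain-Python sketch of the same step on a bytes object. The sample stream is only my assumption of what a typical img2pdf content stream looks like, and `prerotate` / `rotate_last_cm` are names made up for this example. The rotation is applied before the existing matrix, which is how I understand `Matrix.prerotate()` to behave.

```python
import math

def prerotate(m, angle_deg):
    """Return the matrix that rotates first and then applies m."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    a, b, c, d, e, f = m
    return (
        cos_t * a + sin_t * c, cos_t * b + sin_t * d,
        -sin_t * a + cos_t * c, -sin_t * b + cos_t * d,
        e, f,
    )

def rotate_last_cm(stream: bytes, angle_deg: float) -> bytes:
    """Pre-rotate the last 'cm' that sits alone on a line with its six operands.

    Only suitable for simple streams with one operator per line.
    """
    lines = stream.decode("ascii").split("\n")
    for i in range(len(lines) - 1, -1, -1):
        tokens = lines[i].split()
        if len(tokens) == 7 and tokens[-1] == "cm":
            m = prerotate(tuple(float(tok) for tok in tokens[:6]), angle_deg)
            lines[i] = " ".join(f"{v:.4f}" for v in m) + " cm"
            return "\n".join(lines).encode("ascii")
    raise ValueError("no 'cm' operator found on a line of its own")

# Assumed shape of an img2pdf content stream for a 607 x 741 image.
sample = b"q\n607.0000 0 0 741.0000 0.0000 0.0000 cm\n/Im0 Do\nQ"
print(rotate_last_cm(sample, 45).decode("ascii"))
```

Note that this rotates about the origin of the page, so the image can end up partly outside the page; that is why the pikepdf version below also scales and translates.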

I actually did all of this first in pikepdf (the qpdf bindings) since the low-level APIs there were well documented:

import sys
import pikepdf

infile = sys.argv[1]
outfile = sys.argv[2]

src = pikepdf.Pdf.open(infile)
page1 = src.pages[0]
width = page1.mediabox[2]
height = page1.mediabox[3]
commands = []
for operands, operator in pikepdf.parse_content_stream(page1):
	commands.append([operands, operator])
print(repr(commands[1][0]))
original = pikepdf.PdfMatrix(commands[1][0])

new_matrix = original.scaled(0.5, 0.5).rotated(45).translated(width/2, height/2)

commands[1][0] = pikepdf.Array([*new_matrix.shorthand])
new_content_stream = pikepdf.unparse_content_stream(commands)

page1.Contents = src.make_stream(new_content_stream)

src.save(outfile)

But I don't want to deal with c++ for such a simple use case and I want to render with mupdf so I've converted it to pymupdf.

The general purpose version does things differently. I got it working in pikepdf by duplicating the page to create a graphic stack and modifying that:

import sys
import pikepdf

infile = sys.argv[1]
outfile = sys.argv[2]

src = pikepdf.Pdf.open(infile)
page1 = src.pages[0]
width = page1.mediabox[2]
height = page1.mediabox[3]
commands = []
for operands, operator in pikepdf.parse_content_stream(page1):
	commands.append([operands, operator])
print(repr(commands[1][0]))
original = pikepdf.PdfMatrix(commands[1][0])

new_matrix = original.scaled(0.5, 0.5).rotated(45).translated(width/2, height/2)

commands[1][0] = pikepdf.Array([*new_matrix.shorthand])
new_content_stream = pikepdf.unparse_content_stream(commands)

page1.Contents = src.make_stream(new_content_stream)

src.save(outfile)

I believe pymupdf's existing n-up APIs should similarly create graphic stacks so it should work about the same but I haven't looked into it yet.

Anyhow, that's about it for now. I'll try to find some time and wrap things up in the next couple of weeks.

Thanks for the tips.

p.s. In https://pymupdf.readthedocs.io/en/latest/faq.html#low-level-interfaces , the first example should read print(doc.xref_object(xref, compressed=False)) instead of print(doc.xref_object(i, compressed=False)).

0 replies

RamKromberg · 2022-04-29T17:02:42Z

RamKromberg
Apr 29, 2022
Author

The slightly more general-purpose prototype with PyMuPDF:

import fitz, sys, re

infile = sys.argv[1]
outfile = sys.argv[2]

src = fitz.open(infile)

doc = fitz.open()
page=doc.new_page()
# the cropbox must lie inside the mediabox, so set the mediabox first
page.set_mediabox(src[0].mediabox)
page.set_cropbox(src[0].cropbox)
#page.set_artbox(src[0].artbox)
#page.set_bleedbox(src[0].bleedbox)
#page.set_trimbox(src[0].trimbox)

r = fitz.Rect(0, 0, page.rect.width, page.rect.height)

content_stream_xref = page.show_pdf_page(r, src)
content_stream = doc.xref_stream(content_stream_xref)

#basic lexing around the white space.
white_spaces = (
	"\x00", #NULL (NUL)
	"\x09", #HORIZONTAL TAB (HT)
	"\x0a", #LINE FEED (LF)
	"\x0c", #FORM FEED (FF)
	"\x0d", #CARRIAGE RETURN (CR)
	"\x20"  #SPACE (SP)
)

stack=re.split('|'.join(white_spaces),content_stream.decode("ascii"))

stack.reverse()
index = stack.index('cm')
m = stack[index+1:index+7]
m.reverse()
stack.reverse()
index=len(stack)-index

m = fitz.Matrix(m)
m = m.prerotate(45)

stack[index-7:index-1] = m[0:6]


stack=' '.join(map(str, stack)).encode("ascii")

doc.update_stream(content_stream_xref, stack, new=False)

doc.save(outfile, garbage = 4, deflate = False)

Anyhow, I still don't know how to use the MuPDF parser on the content stream (it should be in there somewhere...), so I rolled a half-baked, junky lexer and hacked around that. Still, this should work on all single-page PDFs, img2pdf or otherwise, so that's good enough for me.

That's that I guess.

1 reply

JorjMcKie Apr 30, 2022
Maintainer

Thank you for working on this!
As for using MuPDF to deal with content streams: the method Page.clean_contents() is a good way to prepare these sources for easier Python string manipulation:

It corrects and standardizes: inserts any missing q/Q wrappings, shortens numbers (like 1.23000 => 1.23), reduces multiple token delimiters to one, etc.
It concatenates multiple content streams of a page into one.
It puts every PDF command on its own line, so .splitlines() gives you a list of bytes objects in which the last token of each line is always the PDF operator. For example, 1 2 3 4 \n 5 6 cm 50 60 100 100 re becomes:

1 2 3 4 5 6 cm
50 60 100 100 re

The above is also done automatically for any Form XObjects (recursively). These objects come with their own content streams.

Annotations consist of multiple separate (xref-ed) objects, some of which can also have content streams. Using Annot.clean_contents() does the above for all content streams associated with an annotation. The source of the main one used (found under PDF key AP/N, should always exist) can be extracted and stored back using source = Annot._getAP(), resp. Annot._setAP(source).
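
Once a stream has that one-command-per-line shape, picking out operators takes only a few lines. The sketch below works on a plain bytes object so it can be tried without a PDF; `commands` and `matrices` are names made up for this example. In real use the bytes would presumably come from calling Page.clean_contents() and then reading the page's contents back, which is an assumption on my side rather than something shown here.

```python
def commands(cleaned: bytes):
    """Yield (operator, operands) for each non-empty line of a cleaned stream.

    Assumes one command per line with the operator as the last token.
    Operands that contain spaces (e.g. literal strings) are split naively,
    and inline image data is not handled.
    """
    for line in cleaned.splitlines():
        tokens = line.split()
        if tokens:
            yield tokens[-1], tokens[:-1]

def matrices(cleaned: bytes):
    """Return every cm matrix in the stream as a tuple of six floats."""
    return [
        tuple(float(tok.decode("ascii")) for tok in operands)
        for operator, operands in commands(cleaned)
        if operator == b"cm" and len(operands) == 6
    ]

# The two-line example from above.
sample = b"1 2 3 4 5 6 cm\n50 60 100 100 re\n"
for operator, operands in commands(sample):
    print(operator, operands)
print(matrices(sample))
```

Writing a modified matrix back is then a matter of replacing that one line and joining the lines again before updating the stream.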

RamKromberg · 2022-04-30T12:24:36Z

RamKromberg
Apr 30, 2022
Author

Thanks! Page.clean_contents() sounds promising and I'll be sure to experiment with it.

And yeah, to clarify, the hacked-up mini-lexer above only happens to work on the specific, non-arbitrary graphic stacks coming off img2pdf and Page.show_pdf_page(). It doesn't handle comments... It doesn't handle data structures... It doesn't handle out-of-spec or faulty code... So, don't go using it in your production code :D

Thanks again :)

p.s. References on how to roll your own lexer/parser:

The specs: PDF32000 Lexical Conventions (section 7.2, p.12)
The real world: https://github.com/ArtifexSoftware/mupdf/blob/master/source/pdf/pdf-lex.c
A python library for lexers: http://www.dabeaz.com/ply/ply.html
A toy implementation: https://feliam.wordpress.com/2010/08/06/lexing-pdf-just-for-the-un-fun-of-it/
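
To give a feel for what "handling comments and data structures" involves, here is a slightly less naive tokenizer sketch in plain Python. It is still a toy under stated assumptions: it skips `%` comments, keeps literal strings `( ... )` with nesting and backslash escapes in one piece, and splits off `[ ]`, `<< >>` and hex strings, but it ignores inline images and makes no attempt to recover from malformed input. `tokenize` and `last_cm` are names invented for this example.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"

def tokenize(data: bytes):
    """Split a content stream into a list of bytes tokens."""
    tokens, i, n = [], 0, len(data)
    while i < n:
        ch = data[i:i + 1]
        if ch in WHITESPACE:
            i += 1
        elif ch == b"%":                      # comment: skip to end of line
            while i < n and data[i:i + 1] not in b"\r\n":
                i += 1
        elif ch == b"(":                      # literal string, may nest
            start, depth = i, 0
            while i < n:
                c = data[i:i + 1]
                if c == b"\\":
                    i += 1                    # skip the escaped byte
                elif c == b"(":
                    depth += 1
                elif c == b")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            tokens.append(data[start:i])
        elif ch in b"[]{}":
            tokens.append(ch)
            i += 1
        elif ch in b"<>":
            if data[i:i + 2] in (b"<<", b">>"):
                tokens.append(data[i:i + 2])
                i += 2
            elif ch == b"<":                  # hex string
                end = data.find(b">", i)
                end = n if end < 0 else end + 1
                tokens.append(data[i:end])
                i = end
            else:
                tokens.append(ch)
                i += 1
        else:                                 # number, name or operator
            start = i
            i += 1
            while (i < n and data[i:i + 1] not in WHITESPACE
                   and data[i:i + 1] not in DELIMITERS):
                i += 1
            tokens.append(data[start:i])
    return tokens

def last_cm(tokens):
    """Return the six operands of the last cm operator as floats."""
    idx = len(tokens) - 1 - tokens[::-1].index(b"cm")
    return tuple(float(tok.decode("ascii")) for tok in tokens[idx - 6:idx])

# "cm" appears three times in the text but only once as an operator.
data = (b"q % save state, cm is mentioned here\n"
        b"1 0 0 1 10 20 cm\n"
        b"(a (nested) cm \\) string) Tj\n"
        b"/Im0 Do Q")
tokens = tokenize(data)
print(tokens)
print(last_cm(tokens))
```

A whitespace split of the same data would report three `cm` tokens, which is exactly the kind of input the hacks above would trip over.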

Best of luck.

2 replies

JorjMcKie Apr 30, 2022
Maintainer

Thanks for the references - I will look into them. The MuPDF one I knew of course.

RamKromberg May 1, 2022
Author

Oh those were just warnings for any poor soul passing through here after googling how to do low-level stuff on pdfs thinking it's ok to use my hacks on arbitrary inputs :D

As for PyMuPDF, I just assumed the reason you don't have low-level functions bound or a content stream parser is that the project's scope is rendering, high-level editing and content generation, in much the same way that qpdf/pikepdf don't have a renderer or content creation since they're focused on low-level work.

If it's something you're considering, I doubt that blog tutorial would be of much use, seeing how binding the existing MuPDF lexer and parser functions would be easier, safer, more performant and faster to code than writing your own.

Regardless of what anyone does, best of luck :)
