Filter out unwanted/empty drawings from Page.get_drawings() #2305
Unanswered
megh-khaire asked this question in Looking for help
Replies: 1 comment 3 replies
Thank you very much for the feedback! Also, chapeau for your advanced use of PyMuPDF features👏! You could use Pixmap color counting to exclude insignificant vector graphics.
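The code from this reply did not survive on the page, so here is only a minimal sketch of what Pixmap color counting could look like, not the original snippet. It relies on `Pixmap.is_unicolor` and `Pixmap.color_topusage()`, which are available in recent PyMuPDF versions. The function names, the `rects` argument (the clip rectangles your script already computes), the 0.95 threshold, and the 150 dpi value are all assumptions made for illustration.

```python
def is_insignificant(pix, max_top_ratio=0.95):
    """Return True if a rendered clip is (almost) a single color.

    pix: a fitz.Pixmap rendered from the candidate rectangle.
    max_top_ratio: assumed threshold; tune it for your documents.
    """
    # Every pixel has the same color, e.g. pure white space.
    if pix.is_unicolor:
        return True
    # color_topusage() returns (ratio, pixel): the share of pixels
    # that use the most frequent color, and that color's bytes.
    ratio, _pixel = pix.color_topusage()
    return ratio >= max_top_ratio


def render_significant_clips(pdf_path, page_number, rects, dpi=150):
    """Render each candidate rectangle and keep only the significant ones.

    rects: hypothetical list of fitz.Rect objects, e.g. the bounding
    boxes your extract_drawings step derives from page.get_drawings().
    """
    import fitz  # PyMuPDF, imported here so the filter above stays standalone

    kept = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for rect in rects:
            pix = page.get_pixmap(clip=rect, dpi=dpi)
            if not is_insignificant(pix):
                kept.append(pix)
    return kept
```

A clip that contains only a thin frame or a couple of lines on a white background will usually have a very high top-color ratio, so it gets dropped, while a real figure spreads its pixels over more colors. If genuine figures with a lot of white background are dropped as well, lower the rendering cost by keeping the dpi but raise `max_top_ratio`.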
Hello, I have a PDF file that looks something like this: lech102.pdf
I want to extract all images from this PDF file.
Sample of the expected output:
I tried using a simple image extraction script to extract these images from the PDF file. However, that did not work, and I realized that the images I want to extract are actually vector graphics. So I implemented the following script (based on your drawing extraction script) to extract the vector images and save them as pictures:
To save time, jump directly to extract_drawings.
This script works; however, it creates a lot of unwanted pictures that contain only whitespace and/or a few lines or rectangles.


Could someone please help me out in filtering out these unwanted pictures?
PS: PyMuPDF is amazing ❤️
Thank you!
System: Windows 10 (64 bit)
Python Version: 3.11.2
PyMuPDF Version: 1.21.1