claims to respect access permissions on encrypted PDFs but actually doesn't #590

Camelot contains this code: https://github.com/camelot-dev/camelot/blob/master/camelot/utils.py#L1391

It appears to respect the "extractable" flag on encrypted PDFs and refuse to process files that disallow extraction. Why you would actually want to do that is a mystery to me, but I can see the benefit in some compliance-related situations.

Unfortunately it does no such thing. Camelot also decrypts PDFs while splitting them into individual pages with pypdf, so the resulting pages are unencrypted. That eliminates the possibility of knowing whether the author wanted you to extract text or not, since this flag is only available on encrypted PDFs.

So the code above does nothing. If you want to respect the permissions, you'll need to look at the `user_access_permissions` property on the `PdfReader` at the point where you do that decryption:

https://pypdf.readthedocs.io/en/stable/modules/PdfDocCommon.html#pypdf._doc_common.PdfDocCommon.user_access_permissions
https://pypdf.readthedocs.io/en/stable/modules/constants.html#pypdf.constants.UserAccessPermissions
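
For illustration, here is a rough sketch of what such a check might look like. It is not Camelot's code. The names `EXTRACT_BIT`, `extraction_allowed` and `open_checked` are hypothetical, and I'm assuming that raising `PermissionError` is an acceptable way to refuse a file. The mask is bit 5 of the `/P` entry (value 16), which pypdf exposes as `UserAccessPermissions.EXTRACT`.

```python
from typing import Optional

# Bit 5 of the /P entry (value 16) is the "copy or extract text and graphics"
# permission; pypdf exposes the same bit as UserAccessPermissions.EXTRACT.
EXTRACT_BIT = 1 << 4


def extraction_allowed(reader) -> bool:
    """Return True unless the document is encrypted and forbids extraction.

    `reader` is expected to behave like pypdf.PdfReader: it needs the
    `is_encrypted` and `user_access_permissions` attributes.
    """
    if not reader.is_encrypted:
        # Unencrypted PDFs carry no permission flags at all.
        return True
    permissions = reader.user_access_permissions
    if permissions is None:
        return True
    # /P is often stored as a negative 32-bit value; Python's & handles that.
    return bool(int(permissions) & EXTRACT_BIT)


def open_checked(path: str, password: Optional[str] = None):
    """Hypothetical helper: open a PDF, decrypt it, then enforce the flag."""
    # Imported here so the check above can be exercised without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    if reader.is_encrypted:
        # is_encrypted stays True after decrypt(), so the check still works.
        reader.decrypt(password or "")
    if not extraction_allowed(reader):
        raise PermissionError(f"{path} does not permit text extraction")
    return reader
```

The important part is that the check runs against the original reader, before any decrypted per-page copies are written out.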

Or, alternately, you could apply #589 and have one fewer PDF parser to deal with ;-)

There is no way to reproduce this because it's an expected behaviour that doesn't occur, but I noticed it with this PDF from the test set:

https://github.com/camelot-dev/camelot/blob/master/tests/files/birdisland.pdf

