
claims to respect access permissions on encrypted PDFs but actually doesn't #590

@dhdaines

Description

Camelot contains this code: https://github.com/camelot-dev/camelot/blob/master/camelot/utils.py#L1391

This would appear to respect the "extractable" flag on encrypted PDFs and refuse to process those that disallow text extraction. Why you would actually want to do that is a mystery to me, but I can see the benefit in some compliance-related situations.
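For context, the "extractable" flag is not a separate field. It is bit 5 (value 16) of the signed 32-bit /P permissions integer stored in an encrypted PDF's encryption dictionary, and as far as I can tell this is the bit that pdfminer.six's `is_extractable` tests. A minimal standard-library sketch of reading it (`extraction_allowed` is a hypothetical helper, not part of Camelot or pdfminer.six):

```python
# The /P entry of an encrypted PDF's encryption dictionary is a signed
# 32-bit integer whose bits are the user access permissions. Bits are
# numbered from 1 (least significant), as in the PDF specification.
EXTRACT_BIT = 5  # "copy or otherwise extract text and graphics" (value 16)


def extraction_allowed(p: int) -> bool:
    """Return True if the /P permissions value allows text extraction.

    Negative /P values are normal (the reserved high bits are set), and
    Python's & operator treats negative ints as two's complement, so no
    unsigned conversion is needed before testing the bit.
    """
    return bool(p & (1 << (EXTRACT_BIT - 1)))


print(extraction_allowed(-4))   # True: every bit except 1 and 2 is set
print(extraction_allowed(-20))  # False: as above, but bit 5 is also cleared
```

Unencrypted PDFs have no /P entry at all, which is why the flag can only ever be checked on an encrypted file.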

Unfortunately, it does no such thing, because Camelot also decrypts PDFs while splitting them into individual pages with pypdf. The resulting single-page files are no longer encrypted, and since this flag only exists on encrypted PDFs, by the time the check runs there is no way to know whether the author wanted you to extract text or not.

So the check linked above never triggers. If you want to respect the permissions, then you'll need to look at the permissions property on the PdfReader at the point where you do that decryption:

https://pypdf.readthedocs.io/en/stable/modules/PdfDocCommon.html#pypdf._doc_common.PdfDocCommon.user_access_permissions
https://pypdf.readthedocs.io/en/stable/modules/constants.html#pypdf.constants.UserAccessPermissions

Or, alternatively, you could apply #589 and have one fewer PDF parser to deal with ;-)

There is no way to reproduce this, since it's an expected behaviour that never occurs, but I noticed it with this PDF from the test set:

https://github.com/camelot-dev/camelot/blob/master/tests/files/birdisland.pdf

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests