Skip to content

Conversation

@dignissimus
Copy link
Contributor

@dignissimus dignissimus commented Apr 3, 2022

Co-authored by Alexey Boriskin

https://bugs.python.org/issue15795

TODO

  • Documentation
    • Document constants
    • Document permission preservation parameters
    • Document new behaviour for archives created on windows
  • Testing
    • Get current tests to pass
      • Work out why extracted files have mode 777
    • Add more tests for the other stat flags
      • Explicitly state file type by applying file-type masks
      • Re-add tests for files with mode 0
    • Check the permissions of extracted files against the umask
    • Extract test archive contents to a test directory
  • Enumerations
    • Replace constants with enumerations
    • Document enumerations

@dignissimus dignissimus changed the title bpo-15795: Add option to preserve permissions in ZipFile.extract gh-59999: Add option to preserve permissions in ZipFile.extract Apr 13, 2022
@dignissimus dignissimus marked this pull request as ready for review April 13, 2022 10:26
@dignissimus dignissimus requested a review from merwok April 13, 2022 10:26
@sharkwouter
Copy link

Thanks for making this. I didn't really get back to my PR after it look a year to be reviewed.

What I did differently is that I made preserving the permissions the default. The reason for this is that the unzip command which comes with any Linux distribution, MacOS or BSD system will preserve permissions by default as well. So the user will probably expect that to be the default.

@dignissimus
Copy link
Contributor Author

Thanks for making this. I didn't really get back to my PR after it look a year to be reviewed.

It's no problem

What I did differently is that I made preserving the permissions the default. The reason for this is that the unzip command which comes with any Linux distribution, MacOS or BSD system will preserve permissions by default as well. So the user will probably expect that to be the default.

On the BPO issue (#59999) I proposed not doing this to maintain backwards compatibility

@sharkwouter
Copy link

Makes some sense, thanks for clarifying. Keep up the good work!

@MaxwellDupre
Copy link
Contributor

Running test_zipfile I get two errors:
FAIL: test_extract_preserve_none (test.test_zipfile.TestsPermissionExtraction.test_extract_preserve_none)

Traceback (most recent call last):
File "/home/dougal/Documents/GitHub/cpython/Lib/test/test_zipfile.py", line 2099, in test_extract_preserve_none
self.assertEqual(os.stat(filename).st_mode,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 33204 != 33188

FAIL: test_extractall_preserve_none (test.test_zipfile.TestsPermissionExtraction.test_extractall_preserve_none)
Traceback (most recent call last):
File "/home/dougal/Documents/GitHub/cpython/Lib/test/test_zipfile.py", line 2090, in test_extractall_preserve_none
self.assertEqual(os.stat(filename).st_mode,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 33204 != 33188

Ran 274 tests in 94.070s
FAILED (failures=2, skipped=3)

@dignissimus
Copy link
Contributor Author

Ok, interesting. What OS and file system are you running this on? And, do the other test_extract tests pass, or were some of them skipped? And, do tests still fail on 6470201?

@MaxwellDupre

@MaxwellDupre
Copy link
Contributor

I am running Fedora 35. I did a clean start (compiling python from scratch just in case) and ran the test again and got the same result.
Ran 274 tests in 78.664s
FAILED (failures=2, skipped=3)

Other test_extract tests are ok ( except
test_extract_hackers_arcnames_windows_only (test.test_zipfile.ExtractTests.test_extract_hackers_arcnames_windows_only) ... skipped 'Requires \ as path separator.'
)

I don't know what you mean by 6470201, I didn't see any tests there.

@dignissimus
Copy link
Contributor Author

I don't know what you mean by 6470201, I didn't see any tests there.

Oh of course, they didn't exist in that revision.

I've just checked and the default mode for created files differs on Fedora. On CI and my system it's 644 on Fedora it seems to be 664. I'll write the test so it checks against what the default for the system is instead of the 0o644 constant.

@dignissimus dignissimus requested a review from merwok February 15, 2025 01:38
@dignissimus dignissimus requested a review from merwok February 17, 2025 09:53
@merwok merwok requested review from ethanfurman and removed request for merwok February 17, 2025 14:02
Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I just have a few suggestions, mostly nitpicks.

@merwok
Copy link
Member

merwok commented Mar 5, 2025

(Please follow the devguide: avoid force pushes, they create notifications with broken links for reviewers, and make it harder to see changes compared to previous time. Thanks!)

@sigma67
Copy link

sigma67 commented Apr 11, 2025

Seems another look is needed from a reviewer?

@merwok merwok requested a review from jaraco April 11, 2025 13:34
extracted as accurately as possible. *member* can be a filename or a
:class:`ZipInfo` object.

*path*, *pwd*, and *preserve_permissions* have the same meaning as for :meth:`extract`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a mis-copy perhaps?

return

# Ignore permissions if the archive was created on Windows
if member.create_system == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the check should probably be more conservative than this, there is also 10 which is Windows NTFS, and 20-255 are unused and could be invalid for this use case. An allow-list would be safer.

Copy link
Contributor

@lordmauve lordmauve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some improvements, but if we follow @thatch's guidance and only support x bits then &-ing bitfields is not correct; what we would want to do instead is exactly what pip does here and

is_executable = mode and stat.S_ISREG(mode) and mode & 0o111
mode = 0o777 if is_executable else 0o666

so that any x bit in u/g/o is extended to 0o111 then ANDed with umask

Comment on lines +380 to +384
class PreserveMode(enum.Enum):
"""Options for preserving file permissions upon extraction."""
NONE = enum.auto()
SAFE = enum.auto()
ALL = enum.auto()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a mode for x bits only - see #59999 (comment) . (Also this is the behaviour I want and how I found this issue.)

It is not very natural to use an enum here - much more general is to define preserve_mode taking an int so that I can pass 0o333 or something to preserve wx bits and default the r bit to 1 (from a default of 0o666).

The constants here could directly be

PRESERVE_NONE = 0
PRESERVE_SAFE = 0o0777
PRESERVE_ALL = 0o7777

import importlib.util
import io
import os
import pathlib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used?

XXX references to utf-8 need further investigation.
"""
import binascii
import contextlib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

PreserveMode.SAFE: 0o777,
PreserveMode.ALL: 0xFFFF,
}
new_mode = (member.external_attr >> 16) & mask[mode]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should honour the current umask. It's clumsy to get the current umask so the easiest way to do this is not to chmod the file after creation but to pass the requested mode bits into os.open() with os.O_CREAT when opening the target file for writing. It's also one fewer syscall so might be slightly faster.

Comment on lines +1868 to +1871
mask = {
PreserveMode.SAFE: 0o777,
PreserveMode.ALL: 0xFFFF,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could define

class PreserveMode(enum.IntEnum):
    NONE = 0
    SAFE = 0o777
    ALL = 0o7777

then here it's just

(member.external_attr >> 16) & mode

(However, see my other comment about not restricting this to specific values - users may have arbitrary requirements that an int can capture better.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.