-
-
Notifications
You must be signed in to change notification settings - Fork 33.3k
gh-117829: add --include and --exclude flag to zipapp #120537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…829' into implement-issue-117829
…829' into implement-issue-117829
Lib/test/test_zipapp.py
Outdated
| (source / 'bin' / 'baz').touch() | ||
| (source / '__main__.py').touch() | ||
|
|
||
| args = [str(source), '--include-pattern', r'.*\.py', '--exclude-pattern', r'.*z.*'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can self.tmpdir contain a z? If so, I believe this exclude pattern would ignore all files (since AFAICS it's matched against absolute paths, not against paths relative to the source directory)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @godlygeek , note that this is fixed.
…829' into implement-issue-117829
Doc/library/zipapp.rst
Outdated
| this case, any other options are ignored and SOURCE must be an archive, not a | ||
| directory. | ||
|
|
||
| .. option:: --include-pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tian suggested --include and --exclude; what is the reason for the -pattern suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, --include-pattern is more verbose without adding any significant benefit (IMO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, I have addressed this suggestion
Lib/zipapp.py
Outdated
| filter=None, compressed=False, include_pattern=None, | ||
| exclude_pattern=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to directly translate the --include and --exclude CLI options into a filter function, thus leaving the create_archive API unchanged, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'm a strong -1 on changing the create_archive API for this. The functionality is already available there, this should be purely a CLI change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback, I have updated the PR accordingly
Lib/test/test_zipapp.py
Outdated
| with zipfile.ZipFile(new_target, 'r') as z: | ||
| self.assertEqual(set(z.namelist()), {'__main__.py'}) | ||
|
|
||
| def test_create_archive_with_include_pattern(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test isn't needed, as we shouldn't be changing create_archive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, i have addressed this
|
Thanks for making the requested changes! @pfmoore: please review the changes made to this pull request. |
This comment was marked as off-topic.
This comment was marked as off-topic.
Doc/library/zipapp.rst
Outdated
| $ ls myapp | ||
| __main__.py helper.py notthis.py | ||
| $ python -m zipapp myapp -o myapp.pyz --include "help*" --include "not*" --exclude "n*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example seems over-complicated. I'd suggest two examples, one showing --include (and specifically that anything not covered by the --include pattern is implicitly excluded), and one showing --exclude (demonstrating that everything except what's excluded gets added to the archive).
IMO the case where both --include and --exclude are used is too much of an edge case to warrant an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the examples as requested. Please let me know of other changes that are needed.
|
|
||
| Accept glob-patterns filtering files to be denied inclusion in output archive. This will | ||
| run second if :option:`--include` is also used. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the second sentence (explaining priorities) from these two descriptions, because it's not very clear. Instead, have a separate paragraph that explains how the options interact. Something like this:
If both
--includeand--excludeare specified, the files to be included are picked first. Then from that list of files, any to be excluded are removed. The order of the options does not affect how they are processed.
Also, add a note against --include saying "If this option is not specified, all files in the given directory are included by default."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made that "second paragraph" as a note. Please let me know what other things need to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these need to be notes, they are too visually imposing like that. Please make them ordinary text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the change, please let me know if I should change something else
Lib/zipapp.py
Outdated
|
|
||
| def _normalize_patterns(values: Iterable[str] | None) -> list[str]: | ||
| """ | ||
| Split comma-separated items, strip whitespace, drop empties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not allow comma-separated items in one argument. Instead of --include a,b the user should use --include a --include b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've chosen to interpret "should not allow comma-separated items in one argument" as in we'll take the patterns as-is and won't make any assumptions about commas actually have extra semantics. I've also made the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless you prefer that I throw a "ValueError" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree commas should have no special semantics.
| return False | ||
| return True | ||
|
|
||
| return _filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all feels way too complicated. We should just use the pathlib glob matching function, pth.match(). Something like:
- If there are no
--includeor--excludeoptions, the filter is none. - Otherwise,
filter(pth)should check whetherpth.match(inc)for each include pattern. If none of them match, returnFalse. Then, checkpth.match(exc)against each exclude filter. If any of them match, returnFalse. Otherwise, returnTrue.
The documentation can simply state that patterns are standard glob patterns, as implemented by PurePath.match.
In particular, matches should not assume Posix path separators or case sensitivity. The pathlib functions handle this automatically, your current code doesn't.
And yes, this does mean that to match a directory and all of its contents, you need to use --include foo/**. I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the thorough review. I was also hesitant at first about whether I wanted to implement all this complexities but I figured let's see if it's possible and let you/other reviewers see if it's appropriate. I've made the changes
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @pfmoore: please review the changes made to this pull request. |
|
|
||
| Accept glob-patterns filtering files to be denied inclusion in output archive. This will | ||
| run second if :option:`--include` is also used. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these need to be notes, they are too visually imposing like that. Please make them ordinary text.
Doc/library/zipapp.rst
Outdated
| * Patterns follow :class:`pathlib.PurePath.match`. To match all of a | ||
| directory contents, use ``dir/**``. | ||
| * Path separator handling is platform-specific. On POSIX, backslashes are | ||
| literals (``\`` does not act as a separator). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the second bullet is worth including. It probably causes more confusion than help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the second bullet point
| # build a filter from include and exclude flags | ||
| filter_fn = None | ||
| src_path = pathlib.Path(args.source) | ||
| if src_path.exists() and src_path.is_dir() and (args.include or args.exclude): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we only supporting --include and --exclude for directory sources? I don't see why they wouldn't be useful for archives as well. What's the motivation for this limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, if a file source is given, we do a byte-copy operation other than modifying the shebang (_copy_archive), so my implementation applied --include/--exclude only when we’re creating a new archive from a directory as I wasn't sure if applying filtering to an archive source is expected/unexpected. If we want those filters to work with an existing archive, that mode would no longer be a pure copy as we’d need to unzip, filter, then re-zip the contents.
That being said, I'm happy to implement a "repack with filtering" functionality. Or I could have the CLI error when an archive source has filters applied.
if os.path.isfile(args.source) and (args.include or args.exclude):
raise SystemExit("--include/--exclude only apply when SOURCE is a directory")Please let me know what you prefer and I'll update the PR accordingly @pfmoore .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good instinct here. This is a problem that was introduced when we added the filter argument to create_archive, but which was not picked up then. IMO it's a bad UI for the filter argument to do nothing when copying an archive - but there's no practical use case that I'm aware of for filtering an existing archive.
I don't want to add extra complexity (that needs to be maintained, tested and supported) when no-one will actually use it, but equally I don't think we should have a UI that suggests an "obvious" usage (zipapp foo.pyz -o new_foo.pyz --exclude secret_data.txt) but doesn't actually implement it.
Honestly, at this point I'm wishing I'd stuck to my original principles and rejected the addition of the filter argument in the first place 🙁
Let's just raise an error when the source is an existing zipapp and --include/--exclude are specified. But please also add a similar error in the create_archive function. Also add a note to the command line help, and to the documentation of the create_archive function, explaining that when the source is an existing zipapp, the only supported functionality is to modify the shebang line (via the -p/--python command line argument, and the interpreter argument of the function, respectively).
Lib/zipapp.py
Outdated
| v = v.strip() | ||
| if v: | ||
| out.append(v) | ||
| return out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't feel very useful. Why not just inline its logic into _make_glob_filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made it inline
Lib/zipapp.py
Outdated
| exc = _normalize_patterns(values=excludes) | ||
|
|
||
| if not inc and not exc: | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can never happen, because we check in the caller that one of includes or excludes is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this check
Lib/zipapp.py
Outdated
| - Patterns are standard glob patterns as implemented by PurePath.match. | ||
| - If 'includes' is empty, all files/dirs are initially eligible. | ||
| - If any exclude pattern matches, the path is rejected. | ||
| - Matching respects the current platform's path flavor (separators, case). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the final bullet point, as it's implied by the use of PurePath.match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the last bullet point
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @pfmoore: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. We need to explicitly reject using the filter argument and --include/--exclude when the source is a file. We also need to document the limitation that the only thing you can do when the source is a file is change the shebang.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Hi, I'm a first time contributor to Python. I saw issue gh-117829 is interesting and took a stab at it. Currently it's a Work in Progress PR for others to see if my direction is good. Will add tests soon.