Skip to content

Conversation

@VietThan
Copy link
Contributor

@VietThan VietThan commented Jun 15, 2024

Hi, I'm a first time contributor to Python. I saw issue gh-117829 is interesting and took a stab at it. Currently it's a Work in Progress PR for others to see if my direction is good. Will add tests soon.

@VietThan VietThan marked this pull request as ready for review August 17, 2024 03:39
(source / 'bin' / 'baz').touch()
(source / '__main__.py').touch()

args = [str(source), '--include-pattern', r'.*\.py', '--exclude-pattern', r'.*z.*']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can self.tmpdir contain a z? If so, I believe this exclude pattern would ignore all files (since AFAICS it's matched against absolute paths, not against paths relative to the source directory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @godlygeek , note that this is fixed.

this case, any other options are ignored and SOURCE must be an archive, not a
directory.

.. option:: --include-pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tian suggested --include and --exclude; what is the reason for the -pattern suffix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, --include-pattern is more verbose without adding any significant benefit (IMO).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I have addressed this suggestion

Lib/zipapp.py Outdated
Comment on lines 78 to 79
filter=None, compressed=False, include_pattern=None,
exclude_pattern=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to directly translate the --include and --exclude CLI options into a filter function, thus leaving the create_archive API unchanged, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'm a strong -1 on changing the create_archive API for this. The functionality is already available there, this should be purely a CLI change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback, I have updated the PR accordingly

with zipfile.ZipFile(new_target, 'r') as z:
self.assertEqual(set(z.namelist()), {'__main__.py'})

def test_create_archive_with_include_pattern(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test isn't needed, as we shouldn't be changing create_archive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, i have addressed this

@bedevere-app
Copy link

bedevere-app bot commented Sep 27, 2025

Thanks for making the requested changes!

@pfmoore: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from pfmoore September 27, 2025 03:46
@mowmanminer

This comment was marked as off-topic.

$ ls myapp
__main__.py helper.py notthis.py
$ python -m zipapp myapp -o myapp.pyz --include "help*" --include "not*" --exclude "n*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example seems over-complicated. I'd suggest two examples, one showing --include (and specifically that anything not covered by the --include pattern is implicitly excluded), and one showing --exclude (demonstrating that everything except what's excluded gets added to the archive).

IMO the case where both --include and --exclude are used is too much of an edge case to warrant an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the examples as requested. Please let me know of other changes that are needed.


Accept glob-patterns filtering files to be denied inclusion in output archive. This will
run second if :option:`--include` is also used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the second sentence (explaining priorities) from these two descriptions, because it's not very clear. Instead, have a separate paragraph that explains how the options interact. Something like this:

If both --include and --exclude are specified, the files to be included are picked first. Then from that list of files, any to be excluded are removed. The order of the options does not affect how they are processed.

Also, add a note against --include saying "If this option is not specified, all files in the given directory are included by default."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made that "second paragraph" as a note. Please let me know what other things need to be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these need to be notes, they are too visually imposing like that. Please make them ordinary text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the change, please let me know if I should change something else

Lib/zipapp.py Outdated

def _normalize_patterns(values: Iterable[str] | None) -> list[str]:
"""
Split comma-separated items, strip whitespace, drop empties.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not allow comma-separated items in one argument. Instead of --include a,b the user should use --include a --include b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've chosen to interpret "should not allow comma-separated items in one argument" as in we'll take the patterns as-is and won't make any assumptions about commas actually have extra semantics. I've also made the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you prefer that I throw a "ValueError" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree commas should have no special semantics.

return False
return True

return _filter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all feels way too complicated. We should just use the pathlib glob matching function, pth.match(). Something like:

  • If there are no --include or --exclude options, the filter is none.
  • Otherwise, filter(pth) should check whether pth.match(inc) for each include pattern. If none of them match, return False. Then, check pth.match(exc) against each exclude filter. If any of them match, return False. Otherwise, return True.

The documentation can simply state that patterns are standard glob patterns, as implemented by PurePath.match.

In particular, matches should not assume Posix path separators or case sensitivity. The pathlib functions handle this automatically, your current code doesn't.

And yes, this does mean that to match a directory and all of its contents, you need to use --include foo/**. I'm fine with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the thorough review. I was also hesitant at first about whether I wanted to implement all this complexities but I figured let's see if it's possible and let you/other reviewers see if it's appropriate. I've made the changes

@bedevere-app
Copy link

bedevere-app bot commented Sep 27, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@VietThan
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Sep 28, 2025

Thanks for making the requested changes!

@pfmoore: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from pfmoore September 28, 2025 02:08

Accept glob-patterns filtering files to be denied inclusion in output archive. This will
run second if :option:`--include` is also used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these need to be notes, they are too visually imposing like that. Please make them ordinary text.

* Patterns follow :class:`pathlib.PurePath.match`. To match all of a
directory contents, use ``dir/**``.
* Path separator handling is platform-specific. On POSIX, backslashes are
literals (``\`` does not act as a separator).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the second bullet is worth including. It probably causes more confusion than help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the second bullet point

# build a filter from include and exclude flags
filter_fn = None
src_path = pathlib.Path(args.source)
if src_path.exists() and src_path.is_dir() and (args.include or args.exclude):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we only supporting --include and --exclude for directory sources? I don't see why they wouldn't be useful for archives as well. What's the motivation for this limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, if a file source is given, we do a byte-copy operation other than modifying the shebang (_copy_archive), so my implementation applied --include/--exclude only when we’re creating a new archive from a directory as I wasn't sure if applying filtering to an archive source is expected/unexpected. If we want those filters to work with an existing archive, that mode would no longer be a pure copy as we’d need to unzip, filter, then re-zip the contents.

That being said, I'm happy to implement a "repack with filtering" functionality. Or I could have the CLI error when an archive source has filters applied.

if os.path.isfile(args.source) and (args.include or args.exclude):
    raise SystemExit("--include/--exclude only apply when SOURCE is a directory")

Please let me know what you prefer and I'll update the PR accordingly @pfmoore .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good instinct here. This is a problem that was introduced when we added the filter argument to create_archive, but which was not picked up then. IMO it's a bad UI for the filter argument to do nothing when copying an archive - but there's no practical use case that I'm aware of for filtering an existing archive.

I don't want to add extra complexity (that needs to be maintained, tested and supported) when no-one will actually use it, but equally I don't think we should have a UI that suggests an "obvious" usage (zipapp foo.pyz -o new_foo.pyz --exclude secret_data.txt) but doesn't actually implement it.

Honestly, at this point I'm wishing I'd stuck to my original principles and rejected the addition of the filter argument in the first place 🙁

Let's just raise an error when the source is an existing zipapp and --include/--exclude are specified. But please also add a similar error in the create_archive function. Also add a note to the command line help, and to the documentation of the create_archive function, explaining that when the source is an existing zipapp, the only supported functionality is to modify the shebang line (via the -p/--python command line argument, and the interpreter argument of the function, respectively).

Lib/zipapp.py Outdated
v = v.strip()
if v:
out.append(v)
return out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't feel very useful. Why not just inline its logic into _make_glob_filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made it inline

Lib/zipapp.py Outdated
exc = _normalize_patterns(values=excludes)

if not inc and not exc:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can never happen, because we check in the caller that one of includes or excludes is present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this check

Lib/zipapp.py Outdated
- Patterns are standard glob patterns as implemented by PurePath.match.
- If 'includes' is empty, all files/dirs are initially eligible.
- If any exclude pattern matches, the path is rejected.
- Matching respects the current platform's path flavor (separators, case).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the final bullet point, as it's implied by the use of PurePath.match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the last bullet point

@bedevere-app
Copy link

bedevere-app bot commented Sep 28, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@VietThan
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Sep 28, 2025

Thanks for making the requested changes!

@pfmoore: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from pfmoore September 28, 2025 19:38
Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. We need to explicitly reject using the filter argument and --include/--exclude when the source is a file. We also need to document the limitation that the only thing you can do when the source is a file is change the shebang.

@bedevere-app
Copy link

bedevere-app bot commented Oct 11, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants