@gdh1995 gdh1995 commented Mar 4, 2025

The latest tarfile module can still produce an archive that differs slightly from the one GNU Tar generates whenever a path name is longer than 100 bytes. This PR tries to eliminate that difference.

More details are in #130819.
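
For context, the extra header that tarfile emits for such a long name can be inspected directly. This is a minimal sketch, assuming GNU format and a made-up 150-character member name. It only prints the fields of the `././@LongLink` pseudo-header that the discussion is about and does not show the change proposed in this PR:

```python
import tarfile

# Hypothetical member whose name exceeds the 100-byte name field.
info = tarfile.TarInfo(name="d/" + "x" * 148)
buf = info.tobuf(format=tarfile.GNU_FORMAT)

# First 512-byte block: the GNU long-name pseudo-header.
long_header = buf[:512]
pseudo_name = long_header[:100].rstrip(b"\0")      # name field
mode_field = long_header[100:108]                  # mode field
type_flag = long_header[156:157]                   # typeflag field
uname_field = long_header[265:297].rstrip(b"\0")   # uname field

print(pseudo_name, type_flag, mode_field, uname_field)
```

The values printed for the mode and owner fields depend on the Python version in use, so they are not listed here.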

@gdh1995 gdh1995 requested a review from ethanfurman as a code owner March 4, 2025 03:03

ghost commented Mar 4, 2025

All commit authors signed the Contributor License Agreement.

bedevere-app bot commented Mar 4, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gdh1995 gdh1995 force-pushed the fix_long_gnu_name_in_tarfile branch from 24b90cf to 283b34e Compare March 5, 2025 02:38
Member

@picnixz picnixz left a comment

Can you motivate this choice? Namely, is there a real benefit to setting an explicit user and mode rather than keeping the "defaults"? More importantly, can you cite the relevant man page or spec where this is defined?

Note: whether this is accepted or not, this should be treated as a feature request and not a bug IMO. As such, a What's New entry will need to be created, unless the motivation behind this change is not sufficient (in which case we would close the issue as "not planned").

Lib/tarfile.py Outdated
Comment on lines 1193 to 1195
info["mode"] = 0o100644
info["uname"] = "root"
info["gname"] = "root"
Member

Where in the specs are these decided?

bedevere-app bot commented Mar 8, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Member

picnixz commented Mar 8, 2025

Oh btw, please reply on the issue instead of the PR (I'll repost my comment above)

@gdh1995 gdh1995 force-pushed the fix_long_gnu_name_in_tarfile branch 4 times, most recently from 80b8591 to 5282dd6 Compare April 23, 2025 08:31
Author

gdh1995 commented Apr 23, 2025

I have made the requested changes; please review again

bedevere-app bot commented Apr 23, 2025

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from picnixz April 23, 2025 09:04
Lib/tarfile.py Outdated
Comment on lines 909 to 910
_unames = {} # Cached mappings of uid=0 -> uname
_gnames = {} # Cached mappings of gid=0 -> gname
Member

I prefer that we keep per-instance caches instead of per-class caches, even for 0.

Author

Hello, sorry, but there seems to be no TarFile object available for _create_gnu_long_header to cache the lookup result in:

  • a TarFile instance does have a cache member, self._unames: Dict[uid, uname]
  • however, along the call stack TarInfo.tobuf() -> TarInfo.create_gnu_header() -> TarInfo._create_gnu_long_header(), there is no TarFile argument.

If we add a TarFile parameter to TarInfo.tobuf(...), this PR may break existing subclasses of TarInfo. Is that really necessary?
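
The missing argument can be seen from the public signature. A small check, assuming the `TarInfo.tobuf()` signature of the current standard library:

```python
import inspect
import tarfile

# Parameters accepted by TarInfo.tobuf(); none of them is a TarFile.
tobuf_params = list(inspect.signature(tarfile.TarInfo.tobuf).parameters)
print(tobuf_params)
```

A subclass that overrides `tobuf()` with this parameter list would not accept an additional TarFile argument, which is the compatibility concern raised above.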

Author

Um, to make this cache safe for the free-threading build of CPython, I've replaced _unames = {} with _name_uid0 = None (and _gnames = {} with _name_gid0 = None).

Do the developers have any suggestions?

Author

@picnixz Any idea about where to put the cache object?

@gdh1995 gdh1995 force-pushed the fix_long_gnu_name_in_tarfile branch 2 times, most recently from 47861a1 to ca20885 Compare May 6, 2025 09:26
@gdh1995 gdh1995 force-pushed the fix_long_gnu_name_in_tarfile branch from ca20885 to 02bbde5 Compare May 8, 2025 12:22
Author

gdh1995 commented May 9, 2025

I have made the requested changes; please review again.

bedevere-app bot commented May 9, 2025

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from picnixz May 9, 2025 02:47
(Contributed by Xuehai Pan in :gh:`131799`.)


tarfile
Member

This will need to be moved to whatsnew/3.15.rst now.

Comment on lines +898 to +899
_name_uid0 = None # Cached uname of uid=0
_name_gid0 = None # Cached gname of gid=0
Member

@picnixz picnixz May 9, 2025

What I meant before is: why use a class variable? The issue is that once we deduce the name for uid=0, we're stuck with it for the entire Python process.

EDIT: I didn't see your comment, my bad. Then we need to think of another solution, because storing them in TarFile feels wrong. What we can do is add a private attribute to TarInfo and populate it from TarFile. When writing, if the attribute is not set, we populate it eagerly (so subclasses of TarInfo will be slower, but they won't be broken). Or we can simply dump them the legacy way (namely, without aligning with GNU Tar). Only default TarFile and TarInfo objects will have this new feature.

More generally, we should be able to set cached contextual information on TarInfo objects coming from a TarFile.
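
One way to read this suggestion, as a rough sketch only. All names here (`ArchiveSketch`, `MemberSketch`, `_owner_names`, `lookup_name`) are hypothetical stand-ins and not tarfile API. The real lookup would presumably go through `pwd`/`grp`; it is replaced by an injected function so the example runs on any platform:

```python
class MemberSketch:
    """Hypothetical stand-in for TarInfo."""

    # Private context set by the owning archive; None means "not populated".
    _owner_names = None

    def long_header_owner(self):
        if self._owner_names is None:
            # Legacy fallback: no archive supplied context, so emit empty names.
            return ("", "")
        return self._owner_names


class ArchiveSketch:
    """Hypothetical stand-in for TarFile: owns a per-instance cache."""

    def __init__(self, lookup_name):
        self._lookup_name = lookup_name  # assumed wrapper around pwd/grp lookups
        self._names_for_id0 = None       # per instance, not per class

    def add(self, member):
        if self._names_for_id0 is None:
            # Resolve the names for uid 0 and gid 0 once per archive.
            self._names_for_id0 = (self._lookup_name("uid", 0),
                                   self._lookup_name("gid", 0))
        # Populate the member's private attribute from the archive.
        member._owner_names = self._names_for_id0
        return member.long_header_owner()


calls = []

def fake_lookup(kind, ident):
    # Records each lookup so the caching behaviour is observable.
    calls.append((kind, ident))
    return "root"

archive = ArchiveSketch(fake_lookup)
first = archive.add(MemberSketch())
second = archive.add(MemberSketch())
standalone = MemberSketch().long_header_owner()
```

In this sketch the lookup runs once per archive instance rather than once per process, and a member that never passes through an archive keeps the legacy output, which corresponds to the second option mentioned above.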
