@stefanor commented Aug 10, 2024

The multiarch triple encodes the platform name, so there is no need to have both in the _sysconfigdata filename.

Debian has been carrying a (dumber) variant of this patch for a while.

Fixes: #81742
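For illustration, a minimal sketch of the two naming schemes under discussion. The first format mirrors what CPython has built since 3.6 (ABI flags, `sys.platform`, multiarch). The second is an assumption about what this change produces, since the diff isn't shown here; it simply drops the `sys.platform` component. Both helper names are hypothetical.

```python
def current_name(abiflags: str, platform: str, multiarch: str) -> str:
    """Format used since CPython 3.6: ABI flags, sys.platform, multiarch."""
    return f"_sysconfigdata_{abiflags}_{platform}_{multiarch}"


def shortened_name(abiflags: str, multiarch: str) -> str:
    """ASSUMPTION: the proposed format drops the sys.platform component."""
    return f"_sysconfigdata_{abiflags}_{multiarch}"


if __name__ == "__main__":
    # Empty ABI flags yield the double underscore seen in real file names.
    print(current_name("", "linux", "x86_64-linux-gnu"))
    print(shortened_name("", "x86_64-linux-gnu"))
```

With empty ABI flags on a typical Linux build, the first call gives `_sysconfigdata__linux_x86_64-linux-gnu` and the second gives `_sysconfigdata__x86_64-linux-gnu`.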

This is used in cross-compilation, so let's document it.
@stefanor force-pushed the sysconfigdata-multiarch branch from e662510 to 552ad77 on December 19, 2024 20:42
@freakboy3742
Contributor

There's no real context on this PR or on the original ticket for why this change is necessary or beneficial, other than "Debian does it", and a possible issue with the MACHDEP definition on kFreeBSD. It's not clear to me why Debian has chosen to make this change, or why the fix isn't with the MACHDEP definition for that specific platform.

At least in the iOS context, platform and multiarch are encoding different things - _sysconfigdata__ios_arm64-iphoneos.py tells you it's the iOS platform, using the iphoneos SDK. You can certainly infer that sys.platform is ios based on the last part of the multiarch definition, but at the very least that change would be disruptive to the existing body of code that can reliably predict the sysconfig module name.

So - what's the motivation for this change?

@stefanor
Contributor Author

Historically, the motivation was that the FreeBSD kernel's platform name changed frequently, without coordinating with any other userland change. So encoding sys.platform in here really caused problems on Debian/kFreeBSD.

Debian no longer has any kFreeBSD ports. So the issue is mostly moot for us. If you don't want it, we can attempt to transition away from this patch.

@freakboy3742
Contributor

> Historically, the motivation was that the FreeBSD kernel's platform name changed frequently, without coordinating with any other userland change. So encoding sys.platform in here really caused problems on Debian/kFreeBSD.
>
> Debian no longer has any kFreeBSD ports. So the issue is mostly moot for us. If you don't want it, we can attempt to transition away from this patch.

I'm willing to be convinced otherwise, but based on your explanation, my inclination is that this sounds like a kFreeBSD problem (or, an issue with kFreeBSD platform identification in the configure script), so there's no reason to be disruptive with _sysconfigdata naming.

@stefanor
Contributor Author

It's basically the same issue as linux/linux2 in sys.platform, where Python eventually hardcoded linux. FreeBSD just bumps their number a lot more frequently.
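For context, a small sketch of the check that a versioned `sys.platform` forces on callers: match on the prefix rather than the full string. The helper name is hypothetical, and the versioned strings in the usage lines are illustrative values only.

```python
import sys


def is_platform(prefix: str, platform: str = sys.platform) -> bool:
    """Return True if the platform string starts with the given prefix.

    A prefix match tolerates a trailing kernel version number, which an
    equality check such as `platform == "freebsd"` would not.
    """
    return platform.startswith(prefix)


if __name__ == "__main__":
    print(is_platform("linux", "linux2"))       # historical Linux value
    print(is_platform("freebsd", "freebsd14"))  # illustrative versioned value
    print(is_platform("freebsd", "linux"))
```

Any name that embeds the full `sys.platform` string, as the `_sysconfigdata` file name does, changes whenever that version suffix changes.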

@freakboy3742
Contributor

> It's basically the same issue as linux/linux2 in sys.platform, where Python eventually hardcoded linux. FreeBSD just bumps their number a lot more frequently.

In which case, it sounds like the same solution is called for - a transition to using freebsd as the sys.platform identifier, rather than modifying the name of the _sysconfigdata filename.

@stefanor
Contributor Author

Agreed with all of that.

> At least in the iOS context, platform and multiarch are encoding different things - _sysconfigdata__ios_arm64-iphoneos.py tells you it's the iOS platform, using the iphoneos SDK. You can certainly infer that sys.platform is ios based on the last part of the multiarch definition, but at the very least that change would be disruptive to the existing body of code that can reliably predict the sysconfig module name.

Yes, I can't argue that this wouldn't be disruptive to code that attempts to predict sysconfig data module names like this. https://github.com/search?q=sysconfigdata_&type=code shows up a fairly large number of results. Making the change in Debian hasn't broken anything for any of the Python stuff we have, that we've noticed. There were probably knock-on effects in projects that build on Debian/Ubuntu. And we will have those effects again if we revert this patch.

I would consider the sysconfig data module name to be an undocumented part of Python's private API, and available for change. There is a private (and undocumented) API for querying the name (sysconfig._get_sysconfigdata_name()). It is something that one has to twiddle with (via _PYTHON_SYSCONFIGDATA_NAME) to do cross-compilation with Python.
https://codesearch.debian.net/search?q=sysconfigdata_name&literal=1&perpkg=1 shows a couple of things using it, but that's about it.
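As a rough sketch of how the name gets resolved, including the environment override used for cross-compilation: the function below is a hypothetical stand-in that mirrors the logic as I understand it, not the private `sysconfig._get_sysconfigdata_name()` itself. The `environ` parameter is an addition for illustration, so the override can be shown without touching the real environment.

```python
import os
import sys


def resolve_sysconfigdata_name(environ=None) -> str:
    """Hypothetical mirror of the private lookup; not a supported API."""
    if environ is None:
        environ = os.environ
    # sys.abiflags and sys.implementation._multiarch are not defined on
    # every platform, so fall back to empty strings.
    abiflags = getattr(sys, "abiflags", "")
    multiarch = getattr(sys.implementation, "_multiarch", "")
    default = f"_sysconfigdata_{abiflags}_{sys.platform}_{multiarch}"
    # A cross-compiling build sets this variable to point at the target's data.
    return environ.get("_PYTHON_SYSCONFIGDATA_NAME", default)


if __name__ == "__main__":
    print(resolve_sysconfigdata_name({}))
    target = {"_PYTHON_SYSCONFIGDATA_NAME": "_sysconfigdata__linux_aarch64-linux-gnu"}
    print(resolve_sysconfigdata_name(target))
```

Code that asks for the name this way, rather than rebuilding it from `sys.platform` and the multiarch triple, would be unaffected by a change to the format.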

The name has previously changed in 3.6 (c4b53af).

I've described the historical, practical need for the change, which no longer applies in Debian.

The philosophical motivation for the change was that the multiarch-triplet is a more specific description of the platform than the platform name. Why specify both a vague and a specific name, when we can just specify one? It's more concise, and not ambiguous. Yes, you may need a lookup of one to the other. But you'll need a variety of lookups anyway, if you are doing anything with this value.

As this is no longer a pressing issue for Debian, I'd be fine to close this MR and move towards retiring this patch in Debian.
I do still think the change is a reasonable and correct cleanup, which is why I proposed it here.

@freakboy3742
Contributor

> The philosophical motivation for the change was that the multiarch-triplet is a more specific description of the platform than the platform name. Why specify both a vague and a specific name, when we can just specify one? It's more concise, and not ambiguous. Yes, you may need a lookup of one to the other. But you'll need a variety of lookups anyway, if you are doing anything with this value.

The counterargument: even accepting that this falls into the category of a "private" API, with a Python API to retrieve the module name, many of the uses will be in build systems and the like. A change of this nature will involve some disruption, and we don't actually gain anything (that I can tell) from the disruption, other than a shorter name.

This differs from the previous name change (in 3.6), which was necessary to add support for multiple co-existing sysconfigdata instances. That change was equally disruptive, but there was a tangible benefit to it (one that the iOS and Android ports make good use of).

If we had our time over, I might agree that the more concise name would be better; but given the current state of things, and the limited benefits to a disruptive change, I'm not convinced the change is worth it.

On that basis, I'll close this PR and the original ticket; however, if you want to make your case, I'd suggest starting a discussion on https://discuss.python.org.

@stefanor deleted the sysconfigdata-multiarch branch on December 22, 2024 13:24

Linked issue: #81742, "the _sysconfigdata name should not encode MACHDEP and PLATFORM_TRIPLET"