Skip to content

Conversation

@picnixz
Copy link
Member

@picnixz picnixz commented Feb 7, 2025

This PR fixes the UBSan failures and addresses some minor cosmetic changes. PEP-7 changes were not applied since they could scramble the diff but other semantic changes affecting the signature of touched functions may have been done.

@ZeroIntensity ZeroIntensity self-requested a review February 7, 2025 14:44
@picnixz picnixz marked this pull request as ready for review February 7, 2025 18:09
@picnixz
Copy link
Member Author

picnixz commented Feb 7, 2025

Ah I'm sorry I put this one as ready for review but I learned that _ + capital letter was also UB (#128248 (comment)) so I'll fix them tomorrow.

@picnixz picnixz marked this pull request as draft February 7, 2025 18:41
@ZeroIntensity
Copy link
Member

It's not exactly UB, it's just that those names are reserved for the C standard. In practice, it's probably fine to leave it, and not worth the inconsistency/confusion of adding a _Py prefix to everything.

@picnixz
Copy link
Member Author

picnixz commented Feb 7, 2025

The problem with _Py would remain. It seems to be UB with _ + capital letter. So I followed @encukou's advice.

@ZeroIntensity
Copy link
Member

Half the codebase is UB then! I understand the issue, but what's the actual chance that we'll get broken by prefixing something with _Py? Or with _ClassName_CAST?

@picnixz
Copy link
Member Author

picnixz commented Feb 8, 2025

The chance is probably almost zero but Petr wanted to avoid adding more UBs. So I guess it makes sense. By the way, I found it easier to actually type the macro without the leading underscore as well so it's also a good idea to actually ease writing!

@encukou
Copy link
Member

encukou commented Feb 8, 2025

Half the codebase is UB then!

Half?
Most of the codebase technically has UB.

It's not exactly UB, it's just that those names are reserved for the C standard.

It is exactly UB. That's a hard fact. The C23 standard says:

  • “All identifiers that begin with a double underscore (__) or begin with an underscore (_) followed by an uppercase letter are reserved for any use, except those identifiers which are lexically identical to keywords.”
  • “All identifiers that begin with an underscore are reserved for use as identifiers with file scope in both the ordinary and tag name spaces.”
  • “If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), the behavior is undefined.”

The chance is probably almost zero

Also yes. CPython is big enough that we can hope that compiler/libc authors won't actually use the _Py prefix, or underscore+lowercase, for naming new features. The disruption wouldn't be worth it, we hope.
The fact remains that CPython is doing the wrong thing here, and relying on the compiler/libc to be lenient.

In practice, new features in C nowadays tend to start with double-underscore or underscore+uppercase. Let's not use more of those -- especially in PRs that remove UB.


(Speaking of technicalities: PEP-7 says “#undef file local macros after use”. I think it's OK to ignore that for macros that would go in a header, if there was a header for the module. But it's always good to know the rules one is breaking :)

@picnixz
Copy link
Member Author

picnixz commented Feb 8, 2025

Concerning the #undef, I can address that in a separate issue (I don't mind doing mechanical tasks like these, especially if this kind of task can be automated; the task of fixing UB is much more harder though). We can also fix this kind of issue by using a static inline function but since I started using macros instead for the.. 50 previous PRs, I think it's ok to keep a macro in a .c file (and not undef it if it has small chance to being included from a header)

@picnixz picnixz marked this pull request as ready for review February 8, 2025 09:27
@ZeroIntensity
Copy link
Member

It is exactly UB. That's a hard fact. The C23 standard says:

Not disagreeing, I'm just saying it's not UB in the same way that using uninitialized memory is UB. I'm just pointing out that messing around with it probably isn't worth the weird inconsistencies in our macros. There's no actual plan to remove _ from the codebase, right?

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one question.

@encukou encukou merged commit 6669905 into python:main Feb 17, 2025
43 checks passed
@picnixz picnixz deleted the fix/ubsan/interp-111178 branch February 17, 2025 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants