-
Notifications
You must be signed in to change notification settings - Fork 13.8k
repr(C) enums: fix enum size computation #146504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some changes occurred in match lowering cc @Nadrieril These commits modify compiler targets. Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt This PR modifies cc @jieyouxu |
r? @davidtwco rustbot has assigned @davidtwco. Use |
}; | ||
|
||
// Each value fits in i32 or u32, but not all values fit into the same type. | ||
// FIXME: This seems to do the wrong thing for 32bit Linux? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will ask about this in #124403 (comment).
With the infrastructure in this PR we could fix this by setting the maximum enum size to 64bit on affected 32bit targets... but I have no idea which targets may be affected.^^
This comment has been minimized.
This comment has been minimized.
f28ab61
to
be18677
Compare
Some changes occurred in src/tools/clippy cc @rust-lang/clippy |
This comment has been minimized.
This comment has been minimized.
Hm, @rust-lang/rust-analyzer is there some way to get a |
Looking at the code there I think you can fetch it via |
Very nice, I'll take a look later today at the code, but I am in favor of such a change. |
compiler/rustc_abi/src/lib.rs
Outdated
/// Maximum size of #[repr(C)] enums (defaults to pointer size). | ||
pub c_enum_max_size: Option<Integer>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda feels like we should just have a Range here, but on the other hand ranges usually have an unwieldy API so I'm not sure that's actually a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also how would the range look like in JSON?
Also, the top end of the range is optional so it can be pointer-sized (Integer
only has fixed-sized variants). Though maybe that's not actually correct and the pointer size has no influence on the enum size in C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does indeed seem to be an incorrect assumption. ( Apologies as I am probably going to reiterate things you have already read a bit, but there are a few comments I have to give along the way. )
C11
C11 says only that the enum's representation should fit an int
:
The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an
int
.
This would seem to disallow using something that only fits in a long
, but then the following is specified:
Each enumerated type shall be compatible with
char
, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.
This is what suggests that you can just do whatever in terms of sizing the C enum. Some use this to shrink enums to char
, thus our usage of c_enum_min_size
to begin with, but the previous clause seems to ban using long int
and thus seems to mandate a maximum.
Now, as we know, most C compilers except MSVC disregard this and allow long
-fitted values in practice. This is partially because C++ allows larger enums, which also have specified underlying integer types (like Rust does).
C23
So, they updated to allow the C++ form, and then updated the logic to make clearer that the real requirement is that there be a unifying underlying type:
All enumerations have an underlying type. The underlying type can be explicitly specified using an enum type specifier and is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible type, which is either
char
or a standard or extended signed or unsigned integer type.
An "extended... integer type" can include a "vendor type", as with __int128
on some platforms (so, i128). By saying "standard or extended... integer type", they have opened the door to a compiler even using essentially any size of integer type. They then reiterate that there must be a unifying type for the enum (emphasis mine):
For all the integer constant expressions which make up the values of the enumeration constants, there shall be a type capable of representing all the values that is a standard or extended signed or unsigned integer type, or
char
.
Note however that the reality that the enumeration members, as syntactically interacted with by programmers, are really constants, remains in effect. That means that if there is no fixed underlying type named by the programmer, then they have their own types when named as constants, which may differ from the type of the enum. This becomes visible when you do a cast, as given by 6.7.7.3.20 (EXAMPLE 4):
enum no_underlying {
a0
};
int main (void) {
int a = _Generic(a0,
int: 2,
unsigned char: 1,
default: 0
);
int b = _Generic((enum no_underlying)a0,
int: 2,
unsigned char: 1,
default: 0
);
return a + b;
}
The value returned is implementation-defined, based on whether the underlying type picked by the compiler for the enum is an int
or an unsigned char
. If an underlying type has been specified using the equivalent to our repr(u8)
:
enum underlying: unsigned char {
a0
};
Then we have a guaranteed value, because we have a guaranteed type for every one of the enum's constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Mind, the enumeration constant detail is pure nonsense from Rust's POV, so it mostly serves to explain why C compilers have such a rough time figuring out how to handle this, since they didn't have the good sense to not extend the language with implementation-defined "sure, why not?"s. On some level they have to simultaneously pretend their enums are both int
s and then some other type.
Here, it does mean that "pointer-size" for a C enum would seem to be meaningless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Thinking about it, I guess they can hypothetically pick intptr_t
as the type they're using, if that's a type that is held as distinct from any other integer type instead of a typedef? Hmm. I kinda think that's not going to be the case, though, and the limits will be arbitrary instead.
be18677
to
63d04de
Compare
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer |
Thanks! That works, except that that function returns a EDIT: Though there are some unwraps of this elsewhere in the code, so maybe this is okay?
|
This comment has been minimized.
This comment has been minimized.
// MSVC does not seem to ever automatically increase enums beyond their default size (see | ||
// <https://github.com/rust-lang/rust/issues/124403>, <https://godbolt.org/z/1Pdb3hP9E>). | ||
c_enum_min_bits: Some(32), | ||
c_enum_max_bits: Some(32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also affects the UEFI targets. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, they use the MSVC ABI for things, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the psABI document the enum size rules? I would be surprised if it did.^^
I almost think we should seriously consider issuing a deprecation warning on |
A less disruptive option would be to make Once you get into the realm of integers that do not fit into |
I have extended the PR to also fix the enum size on 32bit targets, by defaulting the discriminant type to I also realized that this is a much bigger breaking change than I thought: any time the enum discriminant is given by calling a function, or by reusing some existing constants, it'll now have the wrong type... it used to be @bors try |
This comment has been minimized.
This comment has been minimized.
repr(C) enums: fix enum size computation
The job Click to see the possible cause of the failure (guessed by this bot)
|
@craterbot check |
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
🚧 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
🎉 Experiment
Footnotes
|
Yeah, as expected this approach is entirely dead in the water. There's tons of
|
☔ The latest upstream changes (presumably #146805) made this pull request unmergeable. Please resolve the merge conflicts. |
Of note, C23 defines the behavior of Thus, in my opinion, the best option is to:
It's unfortunate that this doesn't handle long enums on 32-bit targets, especially since C23 standardizes their support, but at least that is hinted at by the required |
Yeah, our hands are entirely tied for this one.
👍 , I was thinking the same. Basically we only support the pre-C23 fragment of enums in C, where nothing beyond c_int was standardized. Due to how we fix the type for the discriminant to be I wonder if this should be an FCW and eventually hard error, given that repr(C) doesn't really work as advertised for such enums.
We already do that, right? |
#124403 is exactly that we don't do so correctly (note the "wrapping as necessary") — on the x64 MSVC target, we translate But looking again at that sentence it was an unclear mix of terms to basically say “match how the C compiler would behave if each specified variant value is specified with literal suffix |
Wrapping the value into the c_int range does the wrong thing on GCC, so that does not seem like a clear improvement. Or are you saying we should keep the part of this PR where we have per-target "max enum size" information? TBH I am not convinced that's worth it. We should IMO declare everything exceeding c_int to be unsupported. Having enum discriminants wrap just on some targets (potentially causing target-specified duplicates) doesn't sound great. |
Uh, okay, so... how do I do this?
Lucky enough, "generic parameters may not be used in enum discriminant values", so it should be possible to do this all pre-monomorphization. @oli-obk @compiler-errors any ideas? |
I found the |
Closing in favor of #147017 which implements the lint mentioned above. |
This fixes #124403: the layout of some repr(C) enums did not always match the C compiler.
Specifically, what rustc did is to always begin with
isize
as the discriminant type for a repr(C) enum, and ensure that all discriminant values fit that type (i.e., that's the type for which the given discriminant value is const-eval'd; the value gets wrapped if needed). Then if the layout code notices that the enum can fit an i32 or u32, its size is reduced accordingly (but not more than that, since for most targets, 32bit is the minimum size for unannotated C enums).However, for 64bit MSVC, that's just not correct: as https://godbolt.org/z/1Pdb3hP9E shows, this enum has size 4, despite its discriminant not fitting a 32bit integer:
For 32bit targets, it's not correct wrt GCC/clang either: the above enum has size 8 there, but Rust gives it size 4.
So this PR adds a notion of "maximum enum size" to overwrite the
isize
default, and sets it to i32 for MSVC targets and to i64 everywhere else.This changes the size of some existing enums. I guess that's a kind of breaking change? But it changes their size to match what MSVC does, so... this is probably fine? The bigger issue is that it changes the type of the constant that denotes the discriminant value: if that is a literal, that's fine since its type can be inferred, but if the discriminant is computed by calling a function, then that function would no have to return
i32
on MSVC andi64
everywhere else. That seems like it could be quite problematic.A migration lint for the 64bit MSVC case should be possible but is not obvious -- we have to know the actual discriminant values to figure out if the size of the enum is different with the old vs new logic. For the 32bit GCC case however I don't know how to do one since the discriminant value is already wrapped to
isize
so how could we know whether it would have been different withi64
? We'd have to analyze the actual expression that defines the discriminant, not just its final value, and that gets arbitrarily complicated very quickly.TODO:
Cc @workingjubilee @CAD97 @RossSmyth @ChrisDenton @wesleywiser @lcnr