Skip to content

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Sep 13, 2025

This fixes #124403: the layout of some repr(C) enums did not always match the C compiler.

Specifically, what rustc did is to always begin with isize as the discriminant type for a repr(C) enum, and ensure that all discriminant values fit that type (i.e., that's the type for which the given discriminant value is const-eval'd; the value gets wrapped if needed). Then if the layout code notices that the enum can fit an i32 or u32, its size is reduced accordingly (but not more than that, since for most targets, 32bit is the minimum size for unannotated C enums).

However, for 64bit MSVC, that's just not correct: as https://godbolt.org/z/1Pdb3hP9E shows, this enum has size 4, despite its discriminant not fitting a 32bit integer:

enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807, // i64::MAX
};

For 32bit targets, it's not correct wrt GCC/clang either: the above enum has size 8 there, but Rust gives it size 4.

So this PR adds a notion of "maximum enum size" to overwrite the isize default, and sets it to i32 for MSVC targets and to i64 everywhere else.

This changes the size of some existing enums. I guess that's a kind of breaking change? But it changes their size to match what MSVC does, so... this is probably fine? The bigger issue is that it changes the type of the constant that denotes the discriminant value: if that is a literal, that's fine since its type can be inferred, but if the discriminant is computed by calling a function, then that function would no have to return i32 on MSVC and i64 everywhere else. That seems like it could be quite problematic.

A migration lint for the 64bit MSVC case should be possible but is not obvious -- we have to know the actual discriminant values to figure out if the size of the enum is different with the old vs new logic. For the 32bit GCC case however I don't know how to do one since the discriminant value is already wrapped to isize so how could we know whether it would have been different with i64? We'd have to analyze the actual expression that defines the discriminant, not just its final value, and that gets arbitrarily complicated very quickly.

TODO:

  • I have not yet hooked this up in the target spec JSON parser; I'll wait for feedback on the PR first.
  • Make sure t-types is fine with what this does on the type system level
  • Inquire with the lang team about migration plans

Cc @workingjubilee @CAD97 @RossSmyth @ChrisDenton @wesleywiser @lcnr

@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

Some changes occurred in match lowering

cc @Nadrieril

These commits modify compiler targets.
(See the Target Tier Policy.)

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

This PR modifies tests/auxiliary/minicore.rs.

cc @jieyouxu

@rustbot rustbot added A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 13, 2025
@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

};

// Each value fits in i32 or u32, but not all values fit into the same type.
// FIXME: This seems to do the wrong thing for 32bit Linux?
Copy link
Member Author

@RalfJung RalfJung Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will ask about this in #124403 (comment).

With the infrastructure in this PR we could fix this by setting the maximum enum size to 64bit on affected 32bit targets... but I have no idea which targets may be affected.^^

@rust-log-analyzer

This comment has been minimized.

@RalfJung RalfJung force-pushed the msvc-c-enums branch 3 times, most recently from f28ab61 to be18677 Compare September 13, 2025 12:31
@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@rustbot rustbot added the T-clippy Relevant to the Clippy team. label Sep 13, 2025
@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member Author

Hm, @rust-lang/rust-analyzer is there some way to get a HasDataLayout value in RA's chalk logic?

@Veykril
Copy link
Member

Veykril commented Sep 13, 2025

Looking at the code there I think you can fetch it via self.db.target_data_layout(self.krate) (self being ChalkContext) if I see this right. Though that code path seems to have been moved on rust-analyzer master as we are currently in the middle of removing chalk (sync is stalled though #146238)

@RossSmyth
Copy link
Contributor

Very nice, I'll take a look later today at the code, but I am in favor of such a change.

Comment on lines 291 to 292
/// Maximum size of #[repr(C)] enums (defaults to pointer size).
pub c_enum_max_size: Option<Integer>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda feels like we should just have a Range here, but on the other hand ranges usually have an unwieldy API so I'm not sure that's actually a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how would the range look like in JSON?

Also, the top end of the range is optional so it can be pointer-sized (Integer only has fixed-sized variants). Though maybe that's not actually correct and the pointer size has no influence on the enum size in C.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does indeed seem to be an incorrect assumption. ( Apologies as I am probably going to reiterate things you have already read a bit, but there are a few comments I have to give along the way. )

C11

C11 says only that the enum's representation should fit an int:

The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

This would seem to disallow using something that only fits in a long, but then the following is specified:

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.

This is what suggests that you can just do whatever in terms of sizing the C enum. Some use this to shrink enums to char, thus our usage of c_enum_min_size to begin with, but the previous clause seems to ban using long int and thus seems to mandate a maximum.

Now, as we know, most C compilers except MSVC disregard this and allow long-fitted values in practice. This is partially because C++ allows larger enums, which also have specified underlying integer types (like Rust does).

C23

So, they updated to allow the C++ form, and then updated the logic to make clearer that the real requirement is that there be a unifying underlying type:

All enumerations have an underlying type. The underlying type can be explicitly specified using an enum type specifier and is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible type, which is either char or a standard or extended signed or unsigned integer type.

An "extended... integer type" can include a "vendor type", as with __int128 on some platforms (so, i128). By saying "standard or extended... integer type", they have opened the door to a compiler even using essentially any size of integer type. They then reiterate that there must be a unifying type for the enum (emphasis mine):

For all the integer constant expressions which make up the values of the enumeration constants, there shall be a type capable of representing all the values that is a standard or extended signed or unsigned integer type, or char.

Note however that the reality that the enumeration members, as syntactically interacted with by programmers, are really constants, remains in effect. That means that if there is no fixed underlying type named by the programmer, then they have their own types when named as constants, which may differ from the type of the enum. This becomes visible when you do a cast, as given by 6.7.7.3.20 (EXAMPLE 4):

enum no_underlying {
    a0
};

int main (void) {
    int a = _Generic(a0,
        int: 2,
        unsigned char: 1,
        default: 0
    );
    int b = _Generic((enum no_underlying)a0,
        int: 2,
        unsigned char: 1,
        default: 0
    );
    return a + b;
}

The value returned is implementation-defined, based on whether the underlying type picked by the compiler for the enum is an int or an unsigned char. If an underlying type has been specified using the equivalent to our repr(u8):

enum underlying: unsigned char {
    a0
};

Then we have a guaranteed value, because we have a guaranteed type for every one of the enum's constants.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Mind, the enumeration constant detail is pure nonsense from Rust's POV, so it mostly serves to explain why C compilers have such a rough time figuring out how to handle this, since they didn't have the good sense to not extend the language with implementation-defined "sure, why not?"s. On some level they have to simultaneously pretend their enums are both ints and then some other type.

Here, it does mean that "pointer-size" for a C enum would seem to be meaningless.

Copy link
Member

@workingjubilee workingjubilee Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Thinking about it, I guess they can hypothetically pick intptr_t as the type they're using, if that's a type that is held as distinct from any other integer type instead of a typedef? Hmm. I kinda think that's not going to be the case, though, and the limits will be arbitrary instead.

@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2025

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@rustbot rustbot added the T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue. label Sep 13, 2025
@RalfJung
Copy link
Member Author

RalfJung commented Sep 13, 2025

Looking at the code there I think you can fetch it via self.db.target_data_layout(self.krate) (self being ChalkContext) if I see this right. Though that code path seems to have been moved on rust-analyzer master as we are currently in the middle of removing chalk (sync is stalled though #146238)

Thanks! That works, except that that function returns a Result. I presume unwrap is not the correct error handling strategy? ;)

EDIT: Though there are some unwraps of this elsewhere in the code, so maybe this is okay?

.map(|layout| Layout(layout, db.target_data_layout(self.krate(db).into()).unwrap()))

db.target_data_layout(parent_enum.krate(db).into()).unwrap(),

@rust-log-analyzer

This comment has been minimized.

// MSVC does not seem to ever automatically increase enums beyond their default size (see
// <https://github.com/rust-lang/rust/issues/124403>, <https://godbolt.org/z/1Pdb3hP9E>).
c_enum_min_bits: Some(32),
c_enum_max_bits: Some(32),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also affects the UEFI targets. Is that intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, they use the MSVC ABI for things, yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the psABI document the enum size rules? I would be surprised if it did.^^

@workingjubilee
Copy link
Member

workingjubilee commented Sep 13, 2025

I almost think we should seriously consider issuing a deprecation warning on repr(C) enums and say that you should use a C compiler that can handle C23's fixed underlying types for enums, then update to match that after picking your preferred integer size from stdint.h or such.

@RalfJung
Copy link
Member Author

A less disruptive option would be to make repr(C) only support values that fit in int (but still have the automatic shrinking a la "short enum"). Maybe with some allowance for the case where all values fit in unsigned int (but not all fit in int), if that is sufficiently coherent among compilers.

Once you get into the realm of integers that do not fit into long long int, things start to get dicey as technically even just writing those literals (without a u suffix) is already UB in C. But in C you at least have the option of putting a trailing u everywhere and then you get an unsigned enum; in Rust we'd need repr(C_unsigned) or so for that...

@RalfJung RalfJung changed the title repr(C) enums: fix enum size mismatch on MSVC targets repr(C) enums: fix enum size computation Sep 14, 2025
@RalfJung
Copy link
Member Author

RalfJung commented Sep 14, 2025

I have extended the PR to also fix the enum size on 32bit targets, by defaulting the discriminant type to i64.

I also realized that this is a much bigger breaking change than I thought: any time the enum discriminant is given by calling a function, or by reusing some existing constants, it'll now have the wrong type... it used to be isize everywhere, now it's i32 on MSVC and i64 everywhere else. That seems pretty bad, I honestly don't know if we can actually pull this through. Let's see what crater says.

@bors try

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Sep 14, 2025
repr(C) enums: fix enum size computation
@rust-log-analyzer
Copy link
Collaborator

The job aarch64-gnu-llvm-19-2 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@rust-bors
Copy link

rust-bors bot commented Sep 14, 2025

☀️ Try build successful (CI)
Build commit: 969026c (969026c535f4240ff7866ef11207dfda18a26af6, parent: ddaf12390d3ffb7d5ba74491a48f3cd528e5d777)

@RalfJung
Copy link
Member Author

@craterbot check

@craterbot
Copy link
Collaborator

👌 Experiment pr-146504 created and queued.
🤖 Automatically detected try build 969026c
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 14, 2025
@craterbot
Copy link
Collaborator

🚧 Experiment pr-146504 is now running

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot
Copy link
Collaborator

🎉 Experiment pr-146504 is completed!
📊 1783 regressed and 0 fixed (699011 total)
📊 1492 spurious results on the retry-regessed-list.txt, consider a retry1 if this is a significant amount.
📰 Open the summary report.

⚠️ If you notice any spurious failure please add them to the denylist!
ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

Footnotes

  1. re-run the experiment with crates=https://crater-reports.s3.amazonaws.com/pr-146504/retry-regressed-list.txt

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Sep 16, 2025
@RalfJung
Copy link
Member Author

Yeah, as expected this approach is entirely dead in the water. There's tons of

[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/dbus-0.5.4/src/watch.rs:48:13
[INFO] [stdout]    |
[INFO] [stdout] 48 |     Error = ffi::DBUS_WATCH_ERROR as isize,
[INFO] [stdout]    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/libsqlite3-sys-0.11.1/src/lib.rs:24:31
[INFO] [stdout]    |
[INFO] [stdout] 24 |     SQLITE_LIMIT_SQL_LENGTH = SQLITE_LIMIT_SQL_LENGTH as isize,
[INFO] [stdout]    |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout] error[E0308]: mismatched types
[INFO] [stdout]   --> /opt/rustwide/cargo-home/registry/src/index.crates.io-1949cf8c6b5b557f/sp-core-38.0.0/src/offchain/mod.rs:69:15
[INFO] [stdout]    |
[INFO] [stdout] 69 |     PERSISTENT = 0_isize,
[INFO] [stdout]    |                  ^^^^^^^ expected `i64`, found `isize`
[INFO] [stdout]    |

@bors
Copy link
Collaborator

bors commented Sep 20, 2025

☔ The latest upstream changes (presumably #146805) made this pull request unmergeable. Please resolve the merge conflicts.

@CAD97
Copy link
Contributor

CAD97 commented Sep 24, 2025

Of note, C23 defines the behavior of enum declarations w.r.t. enumeration-constant values that don't fit in c_int. Specifically, it specifies the type to be that of the integer expression if the value does not fit in c_int.

Thus, in my opinion, the best option is to:

  • Calculate the discriminant value using isize;
  • If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]; and
  • Use the compatible type selected by the target C compiler as the underlying type, wrapping any discriminant values if necessary, and erroring if this creates duplicate discriminators.

It's unfortunate that this doesn't handle long enums on 32-bit targets, especially since C23 standardizes their support, but at least that is hinted at by the required as isize.

@RalfJung
Copy link
Member Author

Calculate the discriminant value using isize;

Yeah, our hands are entirely tied for this one.

If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]

👍 , I was thinking the same.

Basically we only support the pre-C23 fragment of enums in C, where nothing beyond c_int was standardized. Due to how we fix the type for the discriminant to be isize, we can't support C23 enums.

I wonder if this should be an FCW and eventually hard error, given that repr(C) doesn't really work as advertised for such enums.

Use the compatible type selected by the target C compiler as the underlying type, wrapping any discriminant values if necessary, and erroring if this creates duplicate discriminators.

We already do that, right?

@CAD97
Copy link
Contributor

CAD97 commented Sep 24, 2025

We already do that, right?

#124403 is exactly that we don't do so correctly (note the "wrapping as necessary") — on the x64 MSVC target, we translate repr(C) enums with a variant value > c_uint::MAX as i64 enums with the value intact, but the MSVC compiler wraps the value into the c_int range.

But looking again at that sentence it was an unclear mix of terms to basically say “match how the C compiler would behave if each specified variant value is specified with literal suffix z (ssize_t).”

@RalfJung
Copy link
Member Author

Wrapping the value into the c_int range does the wrong thing on GCC, so that does not seem like a clear improvement.

Or are you saying we should keep the part of this PR where we have per-target "max enum size" information? TBH I am not convinced that's worth it. We should IMO declare everything exceeding c_int to be unsupported. Having enum discriminants wrap just on some targets (potentially causing target-specified duplicates) doesn't sound great.

@RalfJung
Copy link
Member Author

If the value exceeds that of c_int::MAX, lint the enum as being questionably portable and recommend the use of #[repr(int)]

Uh, okay, so... how do I do this?

  • Do I make this a lint pass? But then all I get is HIR stuff and I have no idea how to get the discriminant values out of that.
  • Do I put this into the layout computation code? That doesn't sound right, it would make this a post-mono error.
  • Do I put this wherever HIR types are converted to "middle" types? This sounds like the best plan to me, assuming we actually do this for all types in a crate. In particular this avoids having to duplicate the logic that fills in missing discriminants by incrementing the previous one. Where does that happen?

Lucky enough, "generic parameters may not be used in enum discriminant values", so it should be possible to do this all pre-monomorphization.

@oli-obk @compiler-errors any ideas?

@RalfJung
Copy link
Member Author

I found the EnumDiscriminantOverflowed error and where it is emitted. That looks like it may be a good place also for this new check.

@RalfJung
Copy link
Member Author

Closing in favor of #147017 which implements the lint mentioned above.

@RalfJung RalfJung closed this Sep 25, 2025
@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rust-analyzer Relevant to the rust-analyzer team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

repr(C) on an enum accepts discriminants that do not fit into the default C enum size
10 participants