Conversation

RalfJung
Member

@RalfJung RalfJung commented Sep 25, 2025

Context: #146504.

The current behavior of repr(C) enums is as follows:

  • The discriminant values are interpreted as const expressions of type isize
  • We compute the smallest size that can hold all discriminant values
  • The target spec contains the smallest size for repr(C) enums
  • We take the larger of these two sizes
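
The four steps above can be modelled as a small function. This is only a sketch, not the compiler's layout code: the names `smallest_fitting_size` and `repr_c_enum_size` are hypothetical, `isize` is modelled as `i64` (a 64bit target), and `c_enum_min_size` stands in for the minimum size from the target spec, in bytes.

```rust
/// Smallest size in bytes (1, 2, 4 or 8) whose signed or unsigned
/// integer range contains every discriminant value.
/// Assumption: discriminants have already been evaluated as a 64-bit isize.
fn smallest_fitting_size(discrs: &[i64]) -> u64 {
    let min = discrs.iter().copied().min().unwrap_or(0) as i128;
    let max = discrs.iter().copied().max().unwrap_or(0) as i128;
    for bytes in [1u32, 2, 4] {
        let bits = bytes * 8;
        // Range check against the signed type of this width.
        let fits_signed = min >= -(1i128 << (bits - 1)) && max < (1i128 << (bits - 1));
        // Range check against the unsigned type of this width.
        let fits_unsigned = min >= 0 && max < (1i128 << bits);
        if fits_signed || fits_unsigned {
            return bytes as u64;
        }
    }
    // Every i64 value fits into 8 bytes.
    8
}

/// Hypothetical model of the rule: take the larger of the smallest
/// fitting size and the target's minimum size for repr(C) enums.
fn repr_c_enum_size(discrs: &[i64], c_enum_min_size: u64) -> u64 {
    smallest_fitting_size(discrs).max(c_enum_min_size)
}
```

With a minimum size of 4, `repr_c_enum_size(&[i64::MAX], 4)` yields 8 under this model, which is the size MSVC does not agree with.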

Unfortunately, this doesn't always match what C compilers do. In particular, MSVC seems to always give enums a size of 4 bytes, whereas the algorithm above will give enums a size of up to 8 bytes on 64bit targets. Here's an example enum affected by this:

// We give this size 4 on 32bit targets (with a warning since the discriminant is wrapped to fit an isize)
// and size 8 on 64bit targets.
#[repr(C)]
enum OverflowingEnum {
    A = 9223372036854775807, // i64::MAX
}

// MSVC always gives this size 4 (without any warning).
// GCC always gives it size 8 (without any warning).
// Godbolt: https://godbolt.org/z/P49MaYvMd
enum overflowing_enum {
    OVERFLOWING_ENUM_A = 9223372036854775807,
};

If we look at the C standard, then up until C17, there was no official support for enums without an explicit underlying type and with discriminants that do not fit an int. With C23, this has changed: now enums have to grow automatically if there is an integer type that can hold all their discriminants. MSVC does not implement this part of C23.

Furthermore, Rust fundamentally cannot implement this (without major changes)! Enum discriminants work fundamentally differently in Rust and C:

  • In Rust, every enum has a discriminant type entirely determined by its repr flags, and then the discriminant values must be const expressions of that type. For repr(C), that type is isize. So from the outset we interpret 9223372036854775807 as an isize literal and never give it a chance to be stored in a bigger type. If the discriminant is given as a literal without type annotation, it gets wrapped implicitly with a warning; otherwise the user has to write as isize explicitly and thus trigger the wrapping. Later, we can then decide to make the tag that stores the discriminant smaller than the discriminant type if all discriminant values fit into a smaller type, but those values have already all been made to fit an isize so nothing bigger than isize could ever come out of this. That makes the behavior of 32bit GCC impossible for us to match.
  • In C, things flow the other way around: every discriminant value has a type determined entirely by its constant expression, and then the type for the enum is determined based on that. IOW, the expression can have any type a priori, different variants can even use a different type, and then the compiler is supposed to look at the resulting values (presumably as mathematical integers) and find a type that can hold them all. For the example above, 9223372036854775807 is a signed integer, so the compiler looks for the smallest signed type that can hold it, which is long long, and then uses that to compute the size of the enum (at least that's what C23 says should happen and GCC does this correctly).
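
To make the Rust direction of flow concrete, here is a rough model of what happens to the literal from the example before any layout decision is made. The function names are hypothetical, and the target's `isize` is modelled as `i32` or `i64` so the snippet behaves the same on any host.

```rust
/// Models a 32bit target: the discriminant is forced into a
/// 32-bit isize (modelled as i32), truncating the high bits.
fn as_isize_on_32bit(literal: i128) -> i32 {
    literal as i32
}

/// Models a 64bit target: isize is modelled as i64.
fn as_isize_on_64bit(literal: i128) -> i64 {
    literal as i64
}
```

Under this model, `as_isize_on_32bit(9223372036854775807)` is `-1`, because the low 32 bits of `i64::MAX` are all ones. The layout code only ever sees that wrapped value, so it has no way to choose an 8-byte tag as 32bit GCC does.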

Realistically I think the best we can do is to not attempt to support C23 enums, and to require repr(C) enums to satisfy the C17 requirements: all discriminants must fit into a c_int. So that's what this PR implements, by adding an FCW (future-compatibility warning) for enums with discriminants that do not fit into c_int. As a slight extension, we do not lint enums where all discriminants fit into a c_uint (i.e. unsigned int): while C17 does (in my reading) not allow this, and C23 does not prescribe the size of such an enum, this seems to behave consistently across compilers (giving the enum the size of an unsigned int). IOW, the lint fires whenever our layout algorithm would make the enum larger than an int, irrespective of whether we pick a signed or unsigned discriminant. This extension was added because crater found multiple cases of such enums across the ecosystem.
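
The lint condition described in this paragraph can be written down as a predicate. This is a sketch under stated assumptions, not the code added by the PR: `lint_fires` is a hypothetical name, `c_int` and `c_uint` are assumed to be 32 bits wide, and the discriminants are assumed to be already evaluated on a 64bit target.

```rust
/// Returns true when the enum would be flagged: neither do all
/// discriminants fit into a 32-bit c_int, nor do they all fit into
/// a 32-bit c_uint.
fn lint_fires(discrs: &[i64]) -> bool {
    let all_fit_int = discrs.iter().all(|&d| i32::try_from(d).is_ok());
    let all_fit_uint = discrs.iter().all(|&d| u32::try_from(d).is_ok());
    !(all_fit_int || all_fit_uint)
}
```

For example, `lint_fires(&[0, 0xffff_ffff])` is false (everything fits an unsigned int), while `lint_fires(&[-1, 0xffff_ffff])` is true, since no single 32-bit type holds both values.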

Note that it is impossible to trigger this FCW on targets where isize and c_int are the same size (i.e., the typical 32bit target): since we interpret discriminant values as isize, by the time we look at them, they have already been wrapped. However, we have an existing lint (overflowing_literals) that should notify people when this kind of wrapping occurs implicitly. Also, 64bit targets are much more common. On the other hand, even on 64bit targets it is possible to fall into the same trap by writing a literal that is so big that it does not fit into isize, gets wrapped (triggering overflowing_literals), and the wrapped value fits into c_int. Furthermore, overflowing_literals is just a lint, so if it occurs in a dependency you won't notice. (Arguably there is also a more general problem here: for literals of type usize/isize, it is fairly easy to write code that only triggers overflowing_literals on 32bit targets, and to never see that lint if one develops on a 64bit target.)
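
The 64bit variant of the trap mentioned above can be illustrated with plain casts. The helper names are hypothetical, `isize` is modelled as `i64`, and `c_int` is assumed to be 32 bits wide.

```rust
/// Models wrapping an oversized literal into a 64-bit isize.
fn wrap_to_i64(literal: u128) -> i64 {
    literal as i64
}

/// Checks whether an already-wrapped value fits a 32-bit c_int.
fn fits_c_int(value: i64) -> bool {
    i32::try_from(value).is_ok()
}
```

A literal equal to 2^64 + 1 wraps to 1, so `fits_c_int(wrap_to_i64((1u128 << 64) + 1))` is true: a check that only sees the wrapped value cannot notice the problem, and only overflowing_literals reports it.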

Specifically, the above example triggers the FCW on 64bit targets, but on 32bit targets we get this err-by-default lint instead (which will be hidden if it occurs in a dependency):

error: literal out of range for `isize`
  --> $DIR/repr-c-big-discriminant1.rs:16:9
   |
LL |     A = 9223372036854775807,
   |         ^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `9223372036854775807` does not fit into the type `isize` whose range is `-2147483648..=2147483647`
   = note: `#[deny(overflowing_literals)]` on by default

Also see the tests added by this PR.

This isn't perfect, but so far I don't think I have seen a better option. In #146504 I tried adjusting our enum logic to make the size of the example enum above actually match what C compilers do, but that's a massive breaking change since we have to change the expected type of the discriminant expression from isize to i64 or even i128 -- so that seems like a no-go. To improve the lint we could analyze things on the HIR level and specifically catch "repr(C) enums with discriminants defined as literals that are too big", but that would have to be on top of the lint in this PR I think since we'd still want to also always check the actually evaluated value (which we can't always determine on the HIR level).

Cc @workingjubilee @CAD97

@rustbot
Collaborator

rustbot commented Sep 25, 2025

This PR modifies tests/auxiliary/minicore.rs.

cc @jieyouxu

@rustbot rustbot added A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 25, 2025
@rustbot
Collaborator

rustbot commented Sep 25, 2025

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@future_incompatible = FutureIncompatibleInfo {
    reason: FutureIncompatibilityReason::FutureReleaseError,
    reference: "issue #124403 <https://github.com/rust-lang/rust/issues/124403>",
    report_in_deps: false,
Member Author

I wouldn't mind making this show up in dependencies immediately since it should be rather rare; let's see what the lang team thinks.

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from ea5e355 to 8d8188e Compare September 25, 2025 14:25
@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 8d8188e to ca6a04d Compare September 25, 2025 15:58
@workingjubilee
Member

I think we should crater it (presumably dialed up to deny-by-default) before we go with any report_in_deps: true right off the bat.

@RalfJung
Member Author

presumably dialed up to deny-by-default

That's still ignored in dependencies, we'd want a hard error. And sadly converting between a lint and a hard error is a huge pain in the neck ever since we got that (by now apparently unmaintained) translatable diagnostics infrastructure. :/

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from ca6a04d to 491bfd6 Compare September 25, 2025 17:31
@RalfJung
Member Author

Ah, I can just not use translatable diagnostics. :)

@bors try

rust-bors bot added a commit that referenced this pull request Sep 25, 2025
FCW for repr(C) enums whose discriminant values do not fit into a c_int
@workingjubilee
Member

Yes, if new code is added using the translatable diagnostics infrastructure that's fine but also we should just bypass it the moment it is an inconvenience.

@rust-bors

rust-bors bot commented Sep 25, 2025

☀️ Try build successful (CI)
Build commit: 0f31ace (0f31acebf540c89e4a4d5d114959fe91973419cb, parent: 6f34f4ee074ce0affc7bbf4e2c835f66cd576f13)

@RalfJung
Member Author

@craterbot check

@craterbot
Collaborator

👌 Experiment pr-147017 created and queued.
🤖 Automatically detected try build 0f31ace
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 25, 2025
Member

@davidtwco davidtwco left a comment

Implementation looks good to me. I had looked at the previous PR but didn't have any suggestions as to how to resolve the issues with that approach, so this seems like the best we can do.

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Oct 1, 2025
@RalfJung
Member Author

RalfJung commented Oct 1, 2025

This did uncover an interesting case... enums like this where all discriminants fit in a c_uint but not all fit in a c_int. I think this technically is not covered by C17 (and also C23 does not define whether this will end up with the next-largest signed type, or an unsigned type).

But unless we have evidence that there is actually non-portability with such enums, there is probably no good reason to lint on that.

@workingjubilee
Member

Wonderful, thank you for making the crater run and digging that up.

Yes, we probably want to remain compatible with that, as the way Standard C is written suggests strongly that the flexibility between using i32 and u32 is basically "the reason" that some of the rules of C enums are ambiguous. The other compiler-specific type extensions that have resulted seem an accidental consequence.

Is there a good way to do that?

@RalfJung
Member Author

RalfJung commented Oct 1, 2025

Yeah it's definitely possible to extend the logic this PR added, though the logic becomes more gnarly...

(Layout code already kind of has that logic but I don't know if we can reuse it.)

@RalfJung
Member Author

RalfJung commented Oct 2, 2025

FWIW enums like that will already trigger the overflowing_literals lint on 32bit systems. But that doesn't show up in dependencies so it is easy to miss...

@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 789c382 to 470702b Compare October 2, 2025 08:12
@RalfJung
Member Author

RalfJung commented Oct 2, 2025

@bors try

rust-bors bot added a commit that referenced this pull request Oct 2, 2025
FCW for repr(C) enums whose discriminant values do not fit into a c_int
@rust-bors

rust-bors bot commented Oct 2, 2025

☀️ Try build successful (CI)
Build commit: 8037014 (80370147ae6685a1436ab7bcd206f61157af70d2, parent: 42b384ec0dfcd528d99a4db0a337d9188a9eecaa)

@RalfJung
Member Author

RalfJung commented Oct 2, 2025

@craterbot
Collaborator

👌 Experiment pr-147017-1 created and queued.
🤖 Automatically detected try build 8037014
🔍 You can check out the queue and this experiment's details.

@craterbot craterbot added S-waiting-on-crater Status: Waiting on a crater run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 2, 2025
@RalfJung
Member Author

RalfJung commented Oct 3, 2025

I began this post trying to point out some subtle behavior that is actually fine albeit in a non-obvious way, and then realized there may actually be a problem here...

There's something non-obvious going on with enums like this one:

#[repr(C)]
enum E {
  A = 0,
  B = 0xffffffff,
}
  • On 64bit, these are interpreted as isize, so the discriminants are considered to have the values 0 and 4294967295. The enum layout logic then realizes that these both fit into a 32bit unsigned integer, so that is picked as the underlying type for the discriminant. Loading/storing the discriminant involves a cast between u32 and isize.
  • On 32bit, first of all we get a deny-by-default lint since 0xffffffff does not fit an isize. If we ignore that (e.g. because we don't see it as it occurs in a dependency), the discriminants are considered to have the values 0 and -1. The enum layout logic then decides they all fit into i32, and on most targets this is picked as the underlying type. Loading/storing the discriminant involves a cast between i32 and isize, which are the same type. We end up with the same size as on 64bit, but a different sign... which might actually matter for ABI purposes? (EDIT: I don't think there's any target where this actually makes a difference, since enums are generally 32bit on 32bit targets and then sign vs zero-extending doesn't matter.)
    On top of this, some targets are special! If c_enum_min_size is set to less than the size of a C int, then we will give this enum a smaller size, since the discriminant values all fit into an i8... so for instance on armv7r-none-eabi, this enum gets size 1.
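
The sign difference between the two cases can be seen by widening the same 32 bits in both ways. This is an illustrative sketch with hypothetical function names; it models only the integer conversions, not any actual calling convention.

```rust
/// 64bit case: the tag is treated as unsigned, so widening zero-extends.
fn widen_unsigned_tag(tag_bits: u32) -> i64 {
    tag_bits as i64
}

/// 32bit case: the same bits are treated as signed, so widening
/// reinterprets them as i32 and then sign-extends.
fn widen_signed_tag(tag_bits: u32) -> i64 {
    (tag_bits as i32) as i64
}
```

For the tag bits `0xffff_ffff`, the first function gives 4294967295 and the second gives -1. The value -1 also fits into an i8, which is why a target with a small `c_enum_min_size` can shrink this enum to a single byte.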

I'm not entirely sure what, if anything, we should do here. This is not a new issue and so far I didn't see any reports about it. When compiling the code for the affected target, a deny-by-default lint is triggered saying that 0xffffffff is wrapped to fit into isize. The only way this could ever really go wrong is if such an enum is defined in crate A which is never tested on a 32bit target, and then some downstream crate starts using it for FFI on 32bit.

OTOH this makes me question whether it is really a good idea to exempt enums where all discriminant values fit into unsigned int from this lint. Arguably, enums like the ones crater identified should be declared with repr(u32).

@craterbot
Collaborator

🚧 Experiment pr-147017-1 is now running

@craterbot
Collaborator

🎉 Experiment pr-147017-1 is completed!
📊 3 regressed and 0 fixed (1706 total)
📊 96 spurious results on the retry-regressed-list.txt, consider a retry [1] if this is a significant amount.
📰 Open the summary report.

⚠️ If you notice any spurious failure please add them to the denylist!

Footnotes

  1. re-run the experiment with crates=https://crater-reports.s3.amazonaws.com/pr-147017-1/retry-regressed-list.txt

@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Oct 4, 2025
@RalfJung
Member Author

RalfJung commented Oct 5, 2025

  • this is an actually-too-big enum, but in the latest version of this crate it already got converted to u64
  • this has -1 and 0xffffffff in the same enum. The variant with the latter is called ForceDWORD so this seems to be an indirect attempt to control the size, but it doesn't look right since an enum that has both negative and u32::MAX discriminants is 8 bytes large on GCC compilers.

Those both look like legit cases where the lint should indeed fire.

@RalfJung
Member Author

RalfJung commented Oct 5, 2025

@rust-lang/lang so to summarize, the questions for you are:

  • Are you okay with the general direction here (not changing anything about how we compute layout, but identify potentially problematic enums and FCW against them telling people to use an explicit integer type instead)? Explicit integer types have downsides, e.g. there's no repr(c_int), but given all its problems I don't think using repr(C) instead is a good fix for this... or if you do, then write all discriminants as ... as c_int as isize, which will also silence the FCW.
  • Should the FCW start out as immediately also reporting issues in dependencies, or not?
  • What about enums where all values would fit into an unsigned int, but some are too big for an int? If we warn for them, we'll trigger the warning a lot more often (crater says 57 vs 3 cases; I didn't look through those 57 regressions to see how many root regressions this has but it is clear that a good chunk of them come from seccomp-sys). On the targets where the warning would trigger (which can only be on 64bit targets), I don't think we have an example of such enums actually being problematic. They are definitely problematic on "short-enums" targets (ARM32, hexagon) but there the new lint can't fire; the existing overflowing_literals lint fires instead. So this partially boils down to, do we want to reject enums that are problematic for the current target, or reject problematic enums in general? We're not doing amazing at the latter for the reasons explained in the PR description -- we'd need a HIR-level lint directly accessing the literals instead of their already-wrapped results -- but this particular check is something we can do easily. Do we want a principled check with very clear scope, or something slightly more fuzzy that can detect more potential problems?
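
For reference, the two workarounds mentioned in the first question could look roughly like this. The enum and function names are made up for illustration; only `#[repr(u32)]`, `#[repr(C)]` and `core::ffi::c_int` are real.

```rust
use core::ffi::c_int;

/// Explicit integer type: the tag is a u32 on every target,
/// so 0xffff_ffff is never wrapped.
#[repr(u32)]
#[derive(Clone, Copy)]
enum Flags {
    Empty = 0,
    All = 0xffff_ffff,
}

/// Staying with repr(C): each discriminant is first forced through
/// c_int and then converted to the isize that repr(C) expects.
#[repr(C)]
#[derive(Clone, Copy)]
enum Status {
    Error = -1 as c_int as isize,
    Success = 0 as c_int as isize,
}

/// Reads a Flags discriminant back as its declared integer type.
fn flags_bits(f: Flags) -> u32 {
    f as u32
}

/// Reads a Status discriminant back as an isize.
fn status_value(s: Status) -> isize {
    s as isize
}
```

The `repr(u32)` form fixes both size and signedness up front, at the cost of no longer tracking the platform's `int`; the cast form keeps `repr(C)` but relies on every discriminant being written with the double cast.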

@workingjubilee
Member

workingjubilee commented Oct 5, 2025

re: 2 (FCW immediately or not?): Unless we choose the most-minimal-impact version of this, I think we should slow-roll this instead of jumping to warning in deps to avoid a repeat of the "lol this FCW hits the windows crate" situation, as it seems likely to be something that is quickly found and fixed in deps.

re: 3 (enums that hypothetically fit into uints) and this:

OTOH this makes me question whether it is really a good idea to exempt enums where all discriminant values fit into unsigned int from this lint. Arguably, enums like the ones crater identified should be declared with repr(u32).

Yes, despite saying that allowing u32 or i32 is "the point" of how C enums work, I think, based on what you found, that maybe we shouldn't try to further handle that in Rust. We should probably take a stronger "#[repr(C)] enum is bad, actually" stance. People are obviously trying to translate their intuitions from C compilers, but the ways the C compilers behave are non-intuitive, so the result simply cannot make sense when you try to cross-reference the way rustc and C compilers behave.

@traviscross traviscross added I-lang-radar Items that are on lang's radar and will need eventual work or consideration. T-lang Relevant to the language team needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. labels Oct 5, 2025
@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from 470702b to aa78e00 Compare October 5, 2025 20:51
@RalfJung RalfJung force-pushed the repr-c-big-discriminant branch from aa78e00 to 929a236 Compare October 5, 2025 20:55
Labels
A-test-infra-minicore Area: `minicore` test auxiliary and `//@ add-core-stubs` I-lang-nominated Nominated for discussion during a lang team meeting. I-lang-radar Items that are on lang's radar and will need eventual work or consideration. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team