x64: report an error instead of panicking on stack frames larger than i32::MAX#13783
x64: report an error instead of panicking on stack frames larger than i32::MAX#13783soumik15630m wants to merge 2 commits into
Conversation
| if rsp_relative.saturating_add(probe_margin) > i32::MAX as u64 | ||
| || rbp_relative > i32::MAX as u64 | ||
| { | ||
| return Err(crate::CodegenError::Unsupported( |
There was a problem hiding this comment.
Should probably use ImplLimitExceeded instead.
There was a problem hiding this comment.
Good call, thanks. ImplLimitExceeded doesn't carry a message though ...so is it okay?Happy to switch either way; want to make sure I'm not throwing away information you'd want to keep.
| insts | ||
| } | ||
|
|
||
| fn validate_frame_layout(frame_layout: &FrameLayout, guard_size: u32) -> CodegenResult<()> { |
There was a problem hiding this comment.
This validation should probably be done in generic code (maybe the caller of compute_frame_layout?). The FrameLayout type uses 32bit ints, so the 4GB stack frame size applies to all targets.
There was a problem hiding this comment.
Sure ... will push a commit soon
There was a problem hiding this comment.
Pushed a commit adding a generic check in compute_frame_layout.
I kept the x64-specific validate_frame_layout check as well, since it's
catching a different, tighter condition: x64 encodes these offsets as
signed 32-bit immediates (i32::MAX, not u32::MAX), and the inline
probestack loop rounds the frame size up to the next guard-page boundary
before encoding it - so a frame just under i32::MAX can get pushed
over it by that rounding. The original repro (~2.147GB) is under
u32::MAX (~4.295GB), so the generic check alone doesn't catch it -
only the x64-specific one does.
Happy to remove the x64-specific check if you'd rather keep this simpler
and generic-only, but wanted to flag that it currently exists to catch
a real case the generic check can't - unless other backends share x64's
signed-immediate constraint, in which case maybe it'd make sense to
generalize the tighter check too rather than keep it x64-only. Let me
know which direction you'd prefer.
48d5999 to
b94de39
Compare
|
Thanks for the PR, but personally I'm pretty wary of adding a parallel path that calculates information about a frame that duplicates what's already in each backend. Frame handling/layout is pretty precise and requires extremely careful handling, and I wouldn't be confident that it could be accurately mirrored elsewhere in the system. What I think should happen here is either (a) to propagate a result upwards at the time of a |
|
@alexcrichton 1. RSP arithmetic (the 2. Stack-relative memory addressing (
One thing worth flagging on option (a): Given that, (b) seems like the more proportionate direction, and I'm |
|
I'd like to confirm that you're aware of the Bytecode Alliance's AI tool usage policy and would ask that if you haven't seen that that you read over it. My main concern here is still that I would want to make sure that there's not duplication in calculating or interpreting frame layout. I believe there should be a "mostly central" location that each backend handles all this and ideally a limit, if any, would fall out of that. It's true that returning an error during emission would require changing signatures and it would also affect other situations such as stack offsets. I'll bring this up at next week's Cranelift meeting because I'm not sure how best to handle this. One option would be to just say "all backends must support up to N byte stack frames" and return a hard error for anything over N for all backends. That would notably be a Cranelift/cross-platform limitation rather than just the x64 backend. Alternatively if N is high enough (e.g. 4GiB) then it would require changes to the x64 backend (and possibly others). I think it'd be good to loop in others on this discussion though. |
|
Sure I will go through the ai tool usuage policy ... and yeah this issue needs a deeper discussion cause to me both the approach looks equally reasonable ..so i'm happy to wait for that discussion, or start on whichever direction comes out of it. Let me know how I can help move it forward. |
|
(Ghost-of-cfallin popping in from PTO briefly to offer thoughts as original-person-who-inflicted-this-ABI-code-upon-us) I think that while this is a real issue that should indeed be surfaced into |
Summary
x64 codegen panics with
TryFromIntErrorwhen a function's stack frameis large enough that offsets relative to it no longer fit in a 32-bit
immediate. This affects three places: the inline stack-probe loop
(
guard_size * probe_count), clobber save/restore (RSP-relativeoffsets), and incoming-argument access (RBP-relative offsets via
SyntheticAmode::IncomingArg::finalize).This was reported downstream in
rust-lang/rustc_codegen_cranelift#1656, where a ~2GB stack allocation
(
[0; 536870787_usize], a[i32; N]array) triggers an ICE:thread 'optimize module ...' panicked at cranelift/codegen/src/isa/x64/inst/emit.rs:373:18:
guard_size * probe_count is too large to fit in a 32-bit immediate: TryFromIntError(PosOverflow)
What this does
Adds
ABIMachineSpec::validate_frame_layout, a new trait method with ano-op default, overridden for x64. It's called from
VCode::emitrightafter
compute_frame_layout, before any offset encoding takes place,and checks:
On overflow, this returns
CodegenError::Unsupportedinstead ofpanicking. Only x64 overrides the new trait method; other backends are
unaffected (
aarch64/riscv64/s390x/pulleyneeded a one-line?each to propagate
emit's now-fallible return type, no behaviorchange).
Testing
isa/x64/abi.rscovering: a normal frame(accepted), an oversized
fixed_frame_storage_size(rejected), anoversized
tail_args_sizevia the RBP-relative path (rejected), anda frame that's under
i32::MAXon its own but pushed over by theguard-page rounding margin (rejected) — this last case specifically
reproduces the original bug.
Known follow-up (not included here)
bjorn3's comment mentions "all stack load/store instructionlowerings" more broadly. I traced and covered the two offset-encoding
paths that are actually reachable from the repro (
gen_clobber_save/gen_clobber_restoreandSyntheticAmode::IncomingArg::finalize), butI have not exhaustively audited every stack-relative offset computation
in the x64 backend for the same class of narrowing. Happy to widen this
if there's a specific path you'd like checked.
Separately,
rustc_codegen_craneliftcurrently panics when it receivesany
ErrfromContext::compileatsrc/base.rs:224rather than converting it to a rustc diagnostic. I have open a PR rust-lang/rustc_codegen_cranelift/pull/1669 there to handle
CodegenError::Unsupportedthe same waythe existing
CodegenError::Verifierarm already does.