@MacDue commented Jun 25, 2025

This patch moves the computation and storing of VG in functions with streaming mode changes to a new pre-RA pass (MachineSMEABI). The goal is to make saving VG simpler: computing VG may require calling `__arm_get_current_vg`, which requires preserving X0 and LR around the call (among other complexities in frame lowering). Doing this pre-RA lets the register allocator handle it, rather than relying on manual register scavenging.

The MachineSMEABI pass saves VG to the `AArch64::SAVED_STREAMING_VG_SLOT` and `AArch64::SAVED_VG_SLOT` target frame indices. These are resolved to actual frame indices during PEI (as they are not known before then).
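As a rough illustration of the placeholder idea only (this is a toy model, not LLVM's API or the code in this patch), the sketch below has an early phase emit stores against symbolic slot IDs and a later phase replace them with real indices. `PlaceholderSlot`, `Store`, and `resolvePlaceholders` are hypothetical names invented for this example.

```cpp
#include <map>
#include <vector>

// Hypothetical placeholder IDs standing in for the target frame indices.
// Negative values mark "not yet a real frame index" in this toy model.
enum PlaceholderSlot : int { SavedStreamingVG = -1, SavedVG = -2 };

// A store instruction reduced to the one field we care about here.
struct Store {
  int frameIndex; // real index (>= 0) or a PlaceholderSlot (< 0)
};

// Late phase (analogous in spirit to PEI): give each placeholder that is
// actually used one real slot, and rewrite every store that refers to it.
// Returns the next unused real index.
int resolvePlaceholders(std::vector<Store> &stores, int nextRealIndex) {
  std::map<int, int> assigned; // placeholder ID -> real frame index
  for (Store &s : stores) {
    if (s.frameIndex >= 0)
      continue; // already a real frame index
    auto it = assigned.find(s.frameIndex);
    if (it == assigned.end())
      it = assigned.emplace(s.frameIndex, nextRealIndex++).first;
    s.frameIndex = it->second;
  }
  return nextRealIndex;
}
```

In this model, two stores to the same placeholder end up sharing one real slot, and a placeholder that is never used gets no slot at all.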

For the most part this does not significantly change codegen. One downside, however, is that resolving the frame indices outside of the prologue may need extra instructions (to step past later allocations on the stack, such as scalable vectors).

Fixes #145635

Successfully merging this pull request may close these issues.

Miscompile with __arm_locally_streaming with -march=armv8-a+sme
