Document ppc inline asm support #2056
base: master
@@ -17,6 +17,7 @@ Support for inline assembly is stable on the following architectures:

- RISC-V
- LoongArch
- s390x
- PowerPC and PowerPC64

The compiler will emit an error if an assembly macro is used on an unsupported target.

@@ -602,11 +603,24 @@ Here is the list of currently supported register classes:

| s390x | `freg` | `f[0-15]` | `f` |
| s390x | `vreg` | `v[0-31]` | Only clobbers |
| s390x | `areg` | `a[2-15]` | Only clobbers |
| PowerPC | `reg` | `r0`, `r[3-12]`, `r[14-28]` | `r` |
| PowerPC | `reg_nonzero` | `r[3-12]`, `r[14-28]` | `b` |
| PowerPC | `spe_acc` | `spe_acc` | Only clobbers |
| PowerPC64 | `reg` | `r0`, `r[3-12]`, `r[14-29]` | `r` |
| PowerPC64 | `reg_nonzero` | `r[3-12]`, `r[14-29]` | `b` |
| PowerPC/PowerPC64 | `freg` | `f[0-31]` | `f` |
| PowerPC/PowerPC64 | `vreg` | `v[0-31]` | `v` |
| PowerPC/PowerPC64 | `vsreg` | `vs[0-63]` | `wa` |
| PowerPC/PowerPC64 | `cr` | `cr[0-7]`, `cr` | Only clobbers |
| PowerPC/PowerPC64 | `ctr` | `ctr` | Only clobbers |
| PowerPC/PowerPC64 | `lr` | `lr` | Only clobbers |
| PowerPC/PowerPC64 | `xer` | `xer` | Only clobbers |

> [!NOTE]
> - On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.
> - On x86-64 the high byte registers (e.g. `ah`) are not available in the `reg_byte` register class.
> - Some register classes are marked as "Only clobbers" which means that registers in these classes cannot be used for inputs or outputs, only clobbers of the form `out(<explicit register>) _` or `lateout(<explicit register>) _`.
> - The `spe_acc` register is only available on PowerPC SPE targets.
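A minimal illustrative sketch of two rules from the table and note above: `ctr` is clobber-only, so it can only be declared as `out("ctr") _`, and the loop accumulator uses `reg_nonzero` because `addi` reads a source of `r0` as the literal value 0. The helper `count_up` is hypothetical. The PowerPC path assumes a toolchain on which PowerPC `asm!` is stable, as this PR documents (earlier nightlies gated it behind `asm_experimental_arch`); on other targets a plain-Rust stand-in keeps the sketch buildable.

```rust
#[cfg(any(target_arch = "powerpc", target_arch = "powerpc64"))]
pub fn count_up(start: usize, n: usize) -> usize {
    use std::arch::asm;
    // `bdnz` decrements CTR before testing it, so a count of 0 would wrap
    // around and loop (almost) forever; handle that case in Rust instead.
    if n == 0 {
        return start;
    }
    let mut acc = start;
    unsafe {
        asm!(
            "mtctr {n}",
            "2:",
            // `addi` treats r0 as the constant 0, so `acc` must come from
            // `reg_nonzero` rather than `reg`.
            "addi {acc}, {acc}, 1",
            "bdnz 2b",
            n = in(reg) n,
            acc = inout(reg_nonzero) acc,
            // `ctr` is clobber-only: it may appear only as `out("ctr") _`.
            out("ctr") _,
            options(nomem, nostack),
        );
    }
    acc
}

// Portable stand-in (an assumption of this sketch) for non-PowerPC targets.
#[cfg(not(any(target_arch = "powerpc", target_arch = "powerpc64")))]
pub fn count_up(start: usize, n: usize) -> usize {
    start.wrapping_add(n)
}
```

On either path, `count_up(5, 3)` returns 8: the assembly runs the `addi` body three times before CTR reaches zero.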

r[asm.register-operands.value-type-constraints]
Each register class has constraints on which value types they can be used with.

@@ -649,6 +663,17 @@ The availability of supported types for a particular register class may depend o

| s390x | `freg` | None | `f32`, `f64` |
| s390x | `vreg` | N/A | Only clobbers |
| s390x | `areg` | N/A | Only clobbers |
| PowerPC | `spe_acc` | None | Only clobbers |
| PowerPC/PowerPC64 | `reg` | None | `i8`, `i16`, `i32`, `i64` (PowerPC64 only) |
| PowerPC/PowerPC64 | `reg_nonzero` | None | `i8`, `i16`, `i32`, `i64` (PowerPC64 only) |
| PowerPC/PowerPC64 | `freg` | None | `f32`, `f64` |
| PowerPC/PowerPC64 | `vreg` | `altivec` | `i8x16`, `i16x8`, `i32x4`, `f32x4` |
| PowerPC/PowerPC64 | `vreg` | `vsx` | `f32`, `f64`, `i64x2`, `f64x2` |
| PowerPC/PowerPC64 | `vsreg` | `vsx` | The union of vsx and altivec vreg types |
| PowerPC/PowerPC64 | `cr` | None | Only clobbers |
| PowerPC/PowerPC64 | `ctr` | None | Only clobbers |
| PowerPC/PowerPC64 | `lr` | None | Only clobbers |
| PowerPC/PowerPC64 | `xer` | None | Only clobbers |
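As an illustrative sketch of the `freg` row above, an `f64` can be passed in floating-point registers without any target feature. The helper name `fused_mul_add` is hypothetical; the PowerPC path assumes a target with classic FPRs (not an SPE target) and a toolchain on which PowerPC `asm!` is available. Other targets fall back to `f64::mul_add`, which also rounds only once.

```rust
#[cfg(any(target_arch = "powerpc", target_arch = "powerpc64"))]
pub fn fused_mul_add(a: f64, b: f64, c: f64) -> f64 {
    use std::arch::asm;
    let out: f64;
    unsafe {
        asm!(
            // fmadd FRT, FRA, FRC, FRB computes FRT = FRA * FRC + FRB.
            "fmadd {out}, {a}, {b}, {c}",
            out = lateout(freg) out,
            a = in(freg) a,
            b = in(freg) b,
            c = in(freg) c,
            // No `preserves_flags`: `fmadd` may update FPSCR status bits.
            options(pure, nomem, nostack),
        );
    }
    out
}

// Portable stand-in (an assumption of this sketch) for non-PowerPC targets.
#[cfg(not(any(target_arch = "powerpc", target_arch = "powerpc64")))]
pub fn fused_mul_add(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}
```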

> [!NOTE]
> For the purposes of the above table, pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i16`/`i32`/`i64` depending on the target).

@@ -790,6 +815,10 @@ Here is the list of all supported register aliases:

| LoongArch | `$f[0-7]` | `$fa[0-7]` |
| LoongArch | `$f[8-23]` | `$ft[0-15]` |
| LoongArch | `$f[24-31]` | `$fs[0-7]` |
| PowerPC/PowerPC64 | `r1` | `sp` |
| PowerPC/PowerPC64 | `r31` | `fp` |
| PowerPC/PowerPC64 | `r[0-31]` | `[0-31]` |
| PowerPC/PowerPC64 | `f[0-31]` | `fr[0-31]` |

`` ```rust ``
`` # #[cfg(target_arch = "x86_64")] { ``

@@ -804,10 +833,10 @@ Some registers cannot be used for input or output operands:

| Architecture | Unsupported register | Reason |
| ------------ | -------------------- | ------ |
| All | `sp`, `r15` (s390x) | The stack pointer must be restored to its original value at the end of the assembly code or before jumping to a `label` block. |
| All | `bp` (x86), `x29` (AArch64 and Arm64EC), `x8` (RISC-V), `$fp` (LoongArch), `r11` (s390x) | The frame pointer cannot be used as an input or output. |
| All | `sp`, `r15` (s390x), `r1` (PowerPC and PowerPC64) | The stack pointer must be restored to its original value at the end of the assembly code or before jumping to a `label` block. |
| All | `bp` (x86), `x29` (AArch64 and Arm64EC), `x8` (RISC-V), `$fp` (LoongArch), `r11` (s390x), `fp` (PowerPC and PowerPC64) | The frame pointer cannot be used as an input or output. |
| ARM | `r7` or `r11` | On ARM the frame pointer can be either `r7` or `r11` depending on the target. The frame pointer cannot be used as an input or output. |
| All | `si` (x86-32), `bx` (x86-64), `r6` (ARM), `x19` (AArch64 and Arm64EC), `x9` (RISC-V), `$s8` (LoongArch) | This is used internally by LLVM as a "base pointer" for functions with complex stack frames. |
| All | `si` (x86-32), `bx` (x86-64), `r6` (ARM), `x19` (AArch64 and Arm64EC), `x9` (RISC-V), `$s8` (LoongArch), `r29` and `r30` (PowerPC), `r30` (PowerPC64) | This is used internally by LLVM as a "base pointer" for functions with complex stack frames. |
| x86 | `ip` | This is the program counter, not a real register. |
| AArch64 | `xzr` | This is a constant zero register which can't be modified. |
| AArch64 | `x18` | This is an OS-reserved register on some AArch64 targets. |

@@ -823,6 +852,8 @@ Some registers cannot be used for input or output operands:

| LoongArch | `$r21` | This is reserved by the ABI. |
| s390x | `c[0-15]` | Reserved by the kernel. |
| s390x | `a[0-1]` | Reserved for system use. |
| PowerPC/PowerPC64 | `r2`, `r13` | These are system-reserved registers. |
| PowerPC/PowerPC64 | `vrsave` | The `vrsave` register cannot be used as an input or output. |

`` ```rust,compile_fail ``
`` # #[cfg(target_arch = "x86_64")] { ``

@@ -898,6 +929,11 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen

| s390x | `reg` | None | `%r0` | None |
| s390x | `reg_addr` | None | `%r1` | None |
| s390x | `freg` | None | `%f0` | None |
| PowerPC/PowerPC64 | `reg` | None | `0` | None |
| PowerPC/PowerPC64 | `reg_nonzero` | None | `3` | None |
| PowerPC/PowerPC64 | `freg` | None | `0` | None |
| PowerPC/PowerPC64 | `vreg` | None | `0` | None |
| PowerPC/PowerPC64 | `vsreg` | None | `0` | None |
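The example-output column above means that on PowerPC a register operand is printed as a bare number (such as `3`) rather than `%r3`. The hypothetical helper below relies on that in two ways: `{x}` expands to whatever register number the allocator picks, and, because explicit-register operands cannot be referenced as `{}` placeholders, the explicitly requested `r3` is written in the template as the literal `3`. This is a sketch that assumes a toolchain on which PowerPC `asm!` is available; other targets use a plain-Rust stand-in.

```rust
#[cfg(any(target_arch = "powerpc", target_arch = "powerpc64"))]
pub fn inc_then_add(x: usize, y: usize) -> usize {
    use std::arch::asm;
    let mut x = x;
    unsafe {
        asm!(
            // `{x}` prints as a bare number, so this assembles as e.g.
            // `addi 5, 5, 1`. `reg_nonzero` avoids r0, which `addi` reads as 0.
            "addi {x}, {x}, 1",
            // Explicit registers cannot be `{}` placeholders, so r3 is
            // spelled directly using the same bare-number syntax.
            "add {x}, {x}, 3",
            x = inout(reg_nonzero) x,
            in("r3") y,
            options(pure, nomem, nostack),
        );
    }
    x
}

// Portable stand-in (an assumption of this sketch) for non-PowerPC targets.
#[cfg(not(any(target_arch = "powerpc", target_arch = "powerpc64")))]
pub fn inc_then_add(x: usize, y: usize) -> usize {
    x.wrapping_add(1).wrapping_add(y)
}
```

Both paths wrap on overflow, since `addi` and `add` do not trap.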

> [!NOTE]
> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.

@@ -1316,6 +1352,10 @@ r[asm.rules.stack-below-sp]

- You should adjust the stack pointer when allocating stack memory as required by the target ABI.
- The stack pointer must be restored to its original value before leaving the assembly code.

r[asm.rules.stack-above-sp]
- Unless the `nostack` option is set, assembly code is allowed to modify the caller's stack frame in specific cases.
- The target ABI requires storing certain values in the caller's frame (e.g. saving the `lr` on PowerPC64).

Comment on lines +1355 to +1357

Contributor: Is this meaning to say:
?

Author: I intended it to be a sublist. I am not sure there aren't other cases which may need mentioning.

Contributor: Right; I'm asking here about content rather than factoring -- I'm checking with you my reading is correct. Are there any cases other than when the target ABI requires values to be stored in the caller's frame?

Author: The parameter save space, if any, of the caller can be used to spill arguments on ppc64 in some cases. That's usually for C variadic functions.

Member: As far as I know writing into the caller's stack frame is something that only happens on PowerPC.

r[asm.rules.noreturn]
- If the `noreturn` option is set then behavior is undefined if execution falls through the end of the assembly code.

@@ -1346,6 +1386,11 @@ r[asm.rules.preserved-registers]

  - Vector extension state (`vtype`, `vl`, `vxsat`, and `vxrm`).
- LoongArch
  - Floating-point condition flags in `$fcc[0-7]`.
- PowerPC/PowerPC64
  - Floating-point status and sticky bits in the `fpscr` (any field other than DRN, VE, OE, UE, ZE, XE, NI, or RN).
  - Vector status and sticky bits in the `vscr` (any field other than NJ).
- PowerPC SPE
  - The sticky and status bits of the `spefscr` (any field other than FINXE, FINVE, FDBZE, FUNFE, FOVFE, or FRMC).
- s390x
  - The condition code register `cc`.
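The `fpscr` entry above concerns status and sticky bits that assembly may change; code that only reads the FPSCR changes nothing, so it can also use `preserves_flags`. A minimal sketch, assuming a PowerPC target with classic FPRs and a toolchain on which PowerPC `asm!` is available: `mffs` copies the FPSCR into the low word of a floating-point register, and the rounding-mode field RN occupies its two lowest bits (0 = round to nearest). The helper `rounding_mode` is hypothetical, and the non-PowerPC fallback simply reports the default mode, which is an assumption of this sketch.

```rust
#[cfg(any(target_arch = "powerpc", target_arch = "powerpc64"))]
pub fn rounding_mode() -> u64 {
    use std::arch::asm;
    let image: f64;
    unsafe {
        // `mffs` only reads the FPSCR, so no FPSCR field needs restoring.
        asm!(
            "mffs {0}",
            out(freg) image,
            options(nomem, nostack, preserves_flags),
        );
    }
    // The FPSCR lands in the low 32 bits of the register; RN is bits 30-31
    // of that word in PowerPC's MSB-0 numbering, i.e. the two lowest bits.
    image.to_bits() & 0b11
}

// Portable stand-in (an assumption of this sketch): report round-to-nearest.
#[cfg(not(any(target_arch = "powerpc", target_arch = "powerpc64")))]
pub fn rounding_mode() -> u64 {
    0
}
```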

The source has the following FIXME linking to rust-lang/rust#131551 (comment); maybe that can be resolved? https://github.com/rust-lang/rust/blob/a3f2d5abe45a9acfaccbf09266b33e1fd7ab193e/compiler/rustc_target/src/asm/powerpc.rs#L58-L62
cc @taiki-e

The integer support for vsx is quite limited. It's only 128b integers (power8) with very restricted functionality prior to power10 (notably, no divide support). Maybe that could be supported via `VecI128(1)`? I suspect the LLVM issues remain. Is that a blocker for stabilization? `F128` is another type which should be allowed, but I think that support remains unstable?

Not a blocker I think; this just seemed like a good moment to go over the implementation while the specifics are "in ram". If you believe the Rust implementation is complete given what PowerPC is able to provide, we're all good here.
LLVM currently does not allow putting `f128` into `vreg` or `vsreg`: https://github.com/llvm/llvm-project/blob/f59d12001fd877e44e25f260db888a352d5ab755/llvm/lib/Target/PowerPC/PPCISelLowering.cpp#L18201-L18207
Do you think that is an oversight in LLVM?
(The `f128` type is still unstable, but we'd like to add support for it in the compiler where possible; see rust-lang/rust#125398.)

I think so. GCC allows it, and I believe ELFv2 calling conventions require they be passed via vector registers. LLVM also crashed when I tried wrapping it into a vector.
As for I128, it behaves strangely using LLVM. If you don't wrap it in a vector, it gets treated like a register pair. If you wrap it in a vector and fail to set the minimum CPU to pwr8 or newer (on ppc64), it quietly generates bad code to move between GPRs and VRs.

Maybe our target maintainers can confirm the expected behavior here?
cc @daltenty @gilamn5tr @amy-kwan

I'd be happy to submit a patch to LLVM if we can agree on how this should work.