-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LangRef] inline asm: the instructions are treated opaquely #157080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d9137b4
to
4fe6b43
Compare
@llvm/pr-subscribers-llvm-ir Author: Ralf Jung (RalfJung) ChangesThis wasn't true until recently, but #156571 got fixed to make it true. I was not entirely sure where to put this; for now I made it a new paragraph fairly early on in the inline asm docs. Cc @nikic @jyknight @efriedma-quic Full diff: https://github.com/llvm/llvm-project/pull/157080.diff 1 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index e64b9343b7622..bdf90c8465d74 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -5229,6 +5229,11 @@ flag that indicates whether or not the inline asm expression has side effects,
and a flag indicating whether the function containing the asm needs to align its
stack conservatively.
+LLVM will treat the actual instructions entirely opaquely (i.e. no optimizations
+will be performed based on the contents of the template string). Only the
+operand constraints are used to deduce what the expression may do during
+execution.
+
The template string supports argument substitution of the operands using "``$``"
followed by a number, to indicate substitution of the given register/memory
location, as specified by the constraint string. "``${NUM:MODIFIER}``" may also
|
Interesting. I know absolutely nothing about the AMDGPU backend (and hardly anything about the general execution model for GPU kernels), but from a purely conceptual point -- if the compiler needs more information about what happens inside the asm block, the usual way such information is provided is via clobbers. |
4fe6b43
to
cd10858
Compare
What we want to do is allow the user to specify the scheduling unit for an inline asm instruction. The current implementation does so in a specially formatted assembly language comment which is part of the inline asm string. The comment looks like Do you have another way we could pass that information? If not, it may be best not to add this to the specification so that we can give users the ability to specify this information, and perhaps other similar information in the future. |
Well, we need something like this in the specification to give frontends like Rust the assurances they would like to provide to their users -- e.g. for use of inline asm blocks in self-modifying code. Using magic comments as a channel between the user and the backend is kind of a hack, but also doesn't affect self-modifying code since the comments anyway don't end up in the binary. So we could either bless the hack by exempting backend-specific magic comments from the "the asm template is opaque" property, or have a better way to convey extra information to the backend. |
llvm/docs/LangRef.rst
Outdated
LLVM will treat the actual instructions entirely opaquely (i.e. no optimizations | ||
will be performed based on the contents of the template string). Only the flags, | ||
operand constraints, and attributes on the call and surrounding function are | ||
used to deduce what the expression may do during execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit stronger in that it is not only about instructions but the entire template string, to require preserving and not looking at comments, pragmas to the assembler, labels, whitespace indentation...
LLVM needs to look at the template string to interpolate certain things to produce the final string, and it should only be allowed to look and modify those interpolation points, and nothing else (at least as a starting point).
LLVM will treat the actual instructions entirely opaquely (i.e. no optimizations | |
will be performed based on the contents of the template string). Only the flags, | |
operand constraints, and attributes on the call and surrounding function are | |
used to deduce what the expression may do during execution. | |
Except for the template substitution features described below, LLVM preserves | |
the inline asm template string verbatim and treats its contents as an opaque blob. | |
LLVM does not inspect this opaque blob and therefore does not perform any | |
optimizations based on the contents of the inline asm template string itself. | |
The only mechanisms available to LLVM to deduce what inline asm expressions | |
may do during execution are their flags, operand constraints, and attributes on | |
the call and surrounding function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That rules out the AMDGPU case of having magic comments in that string even more explicitly than before, I think.
I don't have a strong preference for whether that should be allowed or not, I am just trying to get some reasonable baseline documented. :) @nikic @arsenm do you have an opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc LLVM will sometimes reformat the asm string, so it's not strictly true that it's used verbatim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assembler is part of "LLVM", so we need to be careful when saying what "LLVM" will do. That is, when the integrated assembler is enabled, "LLVM" certainly does look at the contents of the string -- even when writing textual assembly, we parse it, then re-emit with the assembly-writer.
I'd also suggest to shorten this, it doesn't really need to be this verbose I think. Maybe just stick a single sentence on the end of the first paragraph, e.g.:
The compiler's understanding of the semantics of the expression comes
only from the list of operand constraints and the flags -- not the
contents of the template string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels worth mentioning "optimizations" explicitly though to make it clear what is meany by "The compiler's understanding of the semantics of the expression". I have updated this to a version derived from yours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That rules out the AMDGPU case of having magic comments in that string even more explicitly than before, I think.
Correct, from @arsenm and @jyknight responses in the other threads there seems to be some consensus that those magic comments should not be allowed and the information should be conveyed somehow else.
iirc LLVM will sometimes reformat the asm string, so it's not strictly true that it's used verbatim.
In my experience, it is quite common for programmers to include \n
and \t
characters in inline asm template strings with the intent of precisely controlling their formatting. It may make sense to document the reformatting conditions/rules in the LangRef as well.
IMO, we should not allow a magic comment in inline-asm strings to be semantically meaningful to the compiler. Additional information can be attached to inline assembly IR in other more suitable ways; for example, attributes or metadata attached to the call. |
cd10858
to
173fc51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
That would require changing the clang frontend to support a nonstandard C/C++ Unless there is another specific, minimally intrusive, standards-compliant mechanism for a user to attach the information we need, could we please weaken the wording for this? |
@RalfJung I don't think there is a better way to convey the information required for PR #155491, so could you please weaken the wording to make clear that this is only for self-modifying code and would not disrupt the mechanism used for PR #155491? There's no reason we couldn't strengthen it later if a better mechanism for passing that scheduling information is discovered later. |
@RalfJung how about this: |
using just standard C++11: [[amdgpu::sgmask("...")]] asm ("instructions_like_normal"); |
Or via a new backend-specific clobber: asm ("instructions_like_normal" : : : "sgmask(...)" ); or backend-specific operand asm (
"instruction0;\n" // doesn't get the property
"instruction1 %0;" // gets the property
: : "sgmask"(0)
); The rules of the clobber or operand then say what the effect of this new option is (e.g. it inserts a hint so that the assembler can do X). |
I don't think that's sufficient, since from how I read that it allows LLVM to assume pub fn f(mut v: u64) -> u64 {
unsafe {
asm!(
".balign 16",
".nop 8", // this can be atomically replaced with an 8-byte call instruction
inout("rdi") v,
);
}
v
} |
@programmerjake @gonzalobg thank you for these suggestions. For each of these suggestions, if the user used, for instance, GCC, which does not support this backend feature in its AMDGPU backend, would your suggested directive be ignored, or would the user encounter a compiler error? |
That's not enough for self-patching code. For instance, if the asm block writes a constant to a given register, it is crucial that the compiler is not allowed to use this information for constant-propagation (since the code could actually be changed later to use a different constant). If we do allow any kind of analysis of the template string, we have to restrict it to something like magic comments, and we should link to a place where such magic comments will be documented. So, something like: "The compiler's understanding of the semantics of the Maybe we should also add that this is explicitly intended to allow self-modifying code?
C has existing facilities for adjusting the code to the abilities of the current compiler. |
I'd expect the C++ attributes to warn and be ignored by GCC unless |
If the inline asm writes to a register, it should have an operand constraint indicating that it writes to that register. I intended copy-propagating that write to be in violation of the proposed language. Perhaps language like "operand constraints must be treated opaquely" or something similar would make that clearer. |
Changing the code to write a different value to the register is not in violation of the operand constraint. So no, your wording simply doesn't constrain the compiler enough. No information about what the inline asm block does must be used to optimize the surrounding code. Note that register contents are just one of countless examples. "The constraints say that this may write to memory but actually it doesn't, let me use that for optimizations" must also be excluded. I have no idea what you mean by "operand constraints must be treated opaquely"; the entire point of those constraints is for the compiler to analyze them so they are not opaque. It's the actually asm template string that must be opaque. |
Just to clarify my position on this: LLVM should absolutely not be looking at magic comments in inline assembly to derive semantics. And I think that everyone apart from @linuxrocks123 agrees with that -- including AMDGPU maintainers, who have already rejected this approach in the linked PR. We should not be changing the wording to accommodate this use case. |
Okay, thanks. I don't know who has which roles here so this can be hard to navigate. :) Do you think it makes sense to explicitly mention self-modifying code like in my latest revision, or do you prefer the previous, more concise (but maybe harder to interpret) version? |
@RalfJung whether or not we ultimately go with the comment approach -- which is the approach currently in use by the user of this feature, who already has a private build -- I don't think anyone has made a case that it is necessary to prohibit anything not relevant to self-modifying code. I agree that my previous suggested language was insufficient, so how about this: "No optimizations may be performed which would be incorrect if the inline assembly string were changed to any other instruction or sequence of instructions that also respects all of the operand constraints and attributes placed on the inline assembly." |
If there are magic comments, presumably it also has to still respect those? I don't know what these scheduler comments actually do/mean.^^ Anyway, I'm not the one you have to convince here; I am not an LLVM maintainer. But I do agree with the maintainers that magic comments are a first step towards an unmaintainable mess. And modulo magic comments, your wording and mine are equivalent. |
@RalfJung well, if I can't convince anyone to let me solve this issue in a way satisfactory to the user, what I have to do instead is take pity on the user and provide him private builds until the performance difference is big enough that it can't be ignored ;) But that's my problem, not yours. The idea behind my wording is that the compiler can use the inline assembly string in order to get hints -- whether those hints come from magic comments, from parsing the assembly directly, or from anything else that might be in the string. It can use those hints to do things like schedule the inline assembly in a place where there is a free execution unit, or where register stalls would be avoided, etc. What it can't do is anything that would be incorrect if that inline assembly string ultimately isn't what the compiler expects. Maybe the code isn't as fast if the text gets swapped out, because maybe there's not actually a free execution unit in the place it got put or maybe it actually created a WAR hazard instead of avoiding one, but it will still be correct. I think that gives the self-modifying code people what they need without restricting anything else unnecessarily. Edit: What are the self-modifying code people doing?^H^H^H^H oh they're probably changing branch destinations or something. Never mind. |
This PR clarifies what the LangRef intent is. There is at least one LLVM backend that is "buggy" according to the LangRef intent. Merging this PR does NOT immediately tank the performance of any user of such bugs. |
Yeah, exactly. Or changing constants. The linux kernel has various such tricks to implement "global variables" that change rarely in a way that has extremely low overhead for the reader (compared to a compile-time constant). |
@RalfJung @gonzalobg to be clear I meant "solve the issue fixed by #155491", not solve the language in this documentation. Re this PR, @RalfJung, what do you think of this proposed language:
Do you like it? |
@linuxrocks123 Maybe it helps if we try to provide some context, cause the LangRef won't say why it says what it does. What the Anything that allows the compiler to look at the string (beyond templating mechanisms) is against the soul of this feature. Multiple LLVM backends being buggy here is just "business as usual". We've fixed some for which the fix didn't have a big impact, and we'll need to figure out a transition path for those in which users are impacted. But my hope is that this results in LLVM, those backends, and their users all ending up better off, because instead of hacking the inline asm string they'll get a first-class feature to solve their problems that will be less brittle and allow backends to do more. |
@gonzalobg You've mentioned a few times that there are backends violating this rule -- which ones are that? |
The x86 and Arm backend were, but you fixed it here: https://github.com/llvm/llvm-project/pull/156617/files. |
@gonzalobg other than the approach taken by PR #155491, I don't think there's any way to give hints about inline asm that won't cause compilation errors on compilers that don't understand the hints, whether that compiler is GCC or a version of LLVM prior to the release that understands the hints. That is motivation for giving the hints as part of the asm string. Since neither Rust nor anyone else needs to break that, I hope @RalfJung will decide not to break it. |
I'm not deciding anything here, I am awaiting the decision of the LLVM maintainers.
|
@RalfJung this is your PR, so I'm confused. To whom are you delegating the decision on the wording of that paragraph in your PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current wording LGTM.
From pushback from arsenm on PR 155491, I don't think the proposal to add a magic comment hack to AMDGPU looks destined to be accepted regardless of what happens here. Given that, I don't think there's any reason to have a debate in this PR about whether we ought to accommodate it.
In general, storing structured data in inline-asm comments is not a good design -- and at the IR level, source-code compatibility with other compilers is not at all a valid rationale. That is to say, if passing this data in a inline-asm comment string was the best way to spell this functionality in C (which I am skeptical of, but just for argument's sake...), then the frontend may parse the magic comment, and emit a more reasonable representation in IR.
@jyknight the current wording may not just break magic comments. It may also break parsing the assembly and then doing optimizations based on that even if those optimizations would not compromise the correctness of the code if the assembly were changed. Why break those potential optimizations? Re magic comments, I'm not too familiar with clang, so I'll ask: do we have backend-specific hooks that would allow parsing an inline assembly string for a particular architecture? Like, is there enough backend-specific code in clang to where there would be a place to put that, if someone wanted to do that? |
@RalfJung worst case scenario, I assume this isn't set in stone, right? Because, I mean, it's being changed here, after all. So, if PR #155491 gets accepted, or someone writes an inline assembly hint analysis, we can change this since those applications don't break the actual motivation for this, which is just self-modifying code? |
You're suggesting that the compiler could do certain optimizations based on the inline asm which don't impact correctness? Like, trying to guess the latency of instructions inside the inline asm? If it doesn't affect semantics, it's hard to say it's wrong. But I'm not sure it's actually a good idea: the user would have limited control if the compiler guesses wrong.
Yeah, this wouldn't be too hard; we already do a bunch of target-specific stuff for inline asm to handle constraints/clobbers/etc. |
Yes, exactly! Scheduling optimizations are the easiest to imagine doing that way (and PR #155491 is in fact about a scheduling hint).
Good to know, although involving the frontend still feels superfluous to me. |
Yes I think allowing the compiler to heuristically analyze the asm string, e.g. for the purpose of scheduling decisions, is fine in principle. It's a bit unclear whether this even needs to be mentioned since it doesn't actually affect the semantics of the program. It could still be undesirable if the scheduler decision turns out to be really bad once the code changes to something else. But that's more of a case-by-case thing than something that needs to go into the LangRef. @nikic @efriedma-quic what do you think? Paging @Amanieu to ensure this is fine from a Rust perspective. |
analyzing the asm string seems fine to me as long as the output is still correct if the asm string is replaced at runtime with any other asm string -- so that means the compiler can look at it to guess how many instructions it contains and their latency to schedule code around it, but the compiler isn't allowed to e.g. see that it's an empty string and assume it can never have any effect, or see that it writes a |
Agree. The programmer doesn't have control about how instructions around the string are scheduled, so I think its fine and even desirable for a backend to inspect the string to guess its latency to be able to schedule it properly around surrounding instructions (or equivalently, to be able to properly schedule surrounding instructions around it). What I think would not be fine is to modify the string (e.g., reorder instructions within the string to improve latency, e.g., based on information from comments in the string), or impact the semantic of the programs (e.g. backend erroring if it doesn't know the latency of an instruction within the string). I have no suggestions about how to word this. |
related, gcc documents that they inspect inline asm to guess maximum size (presumably for deciding if gcc can emit instructions that require their target labels to be within a small range, e.g. x86 jumps with 1 byte offsets), as well as for inlining cost estimation: https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html |
What about something like this?
|
looks good, though you should probably explicitly mention that empty strings can't be assumed to have no effect, there can be hardware-specific mechanisms to trap and change state there even if there are no instructions that can be overwritten there. also it's conventionally used as a compiler barrier. |
I think that's already implied... not sure if it is worth listing this as an explicit non-exception (if we list all non-exceptions we'll be here for a while ;). |
not necessarily, because if you're thinking about overwriting code, zero bytes are always zero bytes, so could be assumed to have no effect. so I think we should explicitly state otherwise. |
The "self-modifying code" part is a clarification/motivated stated at the end. The main point is:
This clearly includes the empty string as well. |
LGTM. Just to double check. This part:
constraints the implementation to not modify the inline asm string beyond expanding the templating, guaranteeing that the block of code entered by the programmer is otherwise preserved kind of verbatim, which enables replacing it at runtime. |
That guarantee does not hold unconditionally since pure asm blocks can be removed.
I am not sure how to best phrase a separate "preserve verbatim" guarantee or if it is needed. After all most replacements would require analyzing what the asm block does and we are ruling that out. In particular that bswap optimization should be ruled out.
|
This wasn't true until recently, but #156571 got fixed to make it true.
I was not entirely sure where to put this; for now I made it a new paragraph fairly early on in the inline asm docs.
Cc @nikic @jyknight @efriedma-quic