-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[NVPTX] Fix lit test issue from used_bytes_mask #171220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Fix lit test issue from used_bytes_mask #171220
Conversation
|
@llvm/pr-subscribers-backend-nvptx Author: Drew Kersnar (dakersnar) ChangesWhoops, I made the same mistake as https://github.com/llvm/llvm-project/pull/169535/files again, so 5c8c7f3 is causing build issues. This should fix that, I updated the tests to use sm_90 and ptx 8.8, where the used bytes mask feature is approved. Full diff: https://github.com/llvm/llvm-project/pull/171220.diff 2 Files Affected:
diff --git a/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll b/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
index a75ddd032d4c0..a0dfe15db2ba9 100644
--- a/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
+++ b/llvm/test/CodeGen/NVPTX/LoadStoreVectorizer.ll
@@ -1,8 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s | FileCheck -check-prefix=ENABLED %s
-; RUN: llc -disable-nvptx-load-store-vectorizer < %s | FileCheck -check-prefix=DISABLED %s
-; RUN: %if ptxas %{ llc < %s | %ptxas-verify %}
-; RUN: %if ptxas %{ llc -disable-nvptx-load-store-vectorizer < %s | %ptxas-verify %}
+; RUN: llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | FileCheck -check-prefix=ENABLED %s
+; RUN: llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 -disable-nvptx-load-store-vectorizer < %s | FileCheck -check-prefix=DISABLED %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.8 %{ llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | %ptxas-verify -arch=sm_90 %}
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.8 %{ llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 -disable-nvptx-load-store-vectorizer < %s | %ptxas-verify -arch=sm_90 %}
target triple = "nvptx64-nvidia-cuda"
@@ -52,7 +52,7 @@ define half @fh(ptr %p) {
; ENABLED-NEXT: ld.param.b64 %rd1, [fh_param_0];
; ENABLED-NEXT: .pragma "used_bytes_mask 0x3ff";
; ENABLED-NEXT: ld.v4.b32 {%r1, %r2, %r3, %r4}, [%rd1];
-; ENABLED-NEXT: { .reg .b16 tmp; mov.b32 {%rs1, tmp}, %r3; }
+; ENABLED-NEXT: mov.b32 {%rs1, _}, %r3;
; ENABLED-NEXT: mov.b32 {%rs2, %rs3}, %r2;
; ENABLED-NEXT: mov.b32 {%rs4, %rs5}, %r1;
; ENABLED-NEXT: cvt.f32.f16 %r5, %rs5;
diff --git a/llvm/test/CodeGen/NVPTX/param-vectorize-device.ll b/llvm/test/CodeGen/NVPTX/param-vectorize-device.ll
index 643de006f14c4..a2658ac5873a4 100644
--- a/llvm/test/CodeGen/NVPTX/param-vectorize-device.ll
+++ b/llvm/test/CodeGen/NVPTX/param-vectorize-device.ll
@@ -1,5 +1,5 @@
-; RUN: llc < %s -mtriple=nvptx64-unknown-unknown | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64-unknown-unknown | %ptxas-verify %}
+; RUN: llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | FileCheck %s
+; RUN: %if ptxas-sm_90 && ptxas-isa-8.8 %{ llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | %ptxas-verify -arch=sm_90 %}
;
; Check that parameters of a __device__ function with private or internal
; linkage called from a __global__ (kernel) function get increased alignment,
|
Artem-B
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, there's still something odd here. Why does llc with the default flags generates something that is not supported by a recent ptxas?
Hmm, I see what you mean. I might have missed something in my original masked load lowering implementation. Perhaps there needs to be a guard for this feature based on sm and ptx version. |
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
Should we revert the original patch in the mean time? Or is fixing forward the best option here? |
This sounds plausible. AFAICT, all failures we see follow the same pattern, complaining about |
FWIW as far as PTX is concerned the correctness of the PTX is not impacted by the existence of the pragma, the pragma is a feature that exists to be a hint for the sanitizer. So for older versions, we should be able to create masked loads and simply not emit the used_bytes_mask pragma if the subtarget check fails.
I can have a fix up in 5-10 mins, which I think might be easier. |
If it's just a missing predicate, fixing forward may be the easiest. If it's going to take more time, then we should revert it, as it will break everyone with the older GPU (though I doubt there are many users of cuda older than 12.3 or pre-sm_50 GPUs). |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
I think this should do it. Trying to get my build built locally to confirm. I also think we should keep the changes to the llc commands in these tests so that they are testing more recent versions, rather than having them call llc with no arguments and not generate the used_bytes_mask feature. But I can confirm locally that without the llc arguments the used_bytes_mask is not generated. |
I would suggest adding a dedicated test that verifies that the pragma is generated or not. |
|
Shoot, @Artem-B any chance you know the fix to this off the top of your head? I was hoping I could just add this argument to printUsedBytesMaskPragma but it doesn't seem to be that simple.
|
|
Looks like tablegen back-end assumes a particular list of arguments passed to the operand |
|
An alternative solution is to filter these masks away in |
That's not great. :-/ If the change is still revertable cleanly, let's revert it for now and try again later. If reverting is not doable cleanly, we can comment out pragma generation altogether for now, as a stop-gap workaround and fix it forward later. |
This may work. |
|
Let me put up the alternate solution, and if we decide it's a no go, I'm ok with reverting. |
8cd79d7 to
b6e1e51
Compare
|
To minimize churn, I'm setting the existing tests to run on sm_50 and ptx 8.3 |
|
(setting it to sm_90 unlocks some other optimizations in LoadStoreVectorizer.ll that I don't think we should include in this fix change) |
|
Ok @Artem-B this looks good locally. Sorry again for missing this, but assuming we are ok with this solution I think we could merge this as soon as checks are green and it should solve the build break. |
| ; RUN: llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | FileCheck -check-prefix=ENABLED %s | ||
| ; RUN: llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 -disable-nvptx-load-store-vectorizer < %s | FileCheck -check-prefix=DISABLED %s | ||
| ; RUN: %if ptxas-sm_90 && ptxas-isa-8.8 %{ llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 < %s | %ptxas-verify -arch=sm_90 %} | ||
| ; RUN: %if ptxas-sm_90 && ptxas-isa-8.8 %{ llc -march=nvptx64 -mcpu=sm_90 -mattr=+ptx88 -disable-nvptx-load-store-vectorizer < %s | %ptxas-verify -arch=sm_90 %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be left with the defaults.
Pragma's own tests should live in a separate test. They are coincidental here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've removed the version specifiers and with the addition of the new gate all the pragmas as no longer generated, as we expect.
| @@ -0,0 +1,36 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6 | |||
| ; RUN: llc < %s -march=nvptx64 -mcpu=sm_50 -mattr=+ptx82 | FileCheck %s -check-prefixes=NOMASK | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do bother to explicitly specify a GPU variant, I'd use a more recent one, so the tests continue to work with cuda-13 which no longer supports sm_50. E.g. sm_80 or sm_90 would be a choice reasonably stable long term.
I'd also add a comment statiing the actual constrint on the pragma use, and that we're using PTX to gate it, and GPU is fixed for stability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with sm_90 for this test. Done.
Up to you. sm_50 is OK to use for now. |
117fe03 to
d01d16c
Compare
|
@boomanaiden154 this should be good to merge whenever, although I'm not sure whether you would prefer to wait for the final two in-progress checks. |
Should be good to merge now. AArch64 is non-blocking and the libcxx-ci checks usually never finish. |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/14563 Here is the relevant piece of the build log for the reference |
|
I don't think the above is caused by me? |
Whoops, I made the same mistake as https://github.com/llvm/llvm-project/pull/169535/files again, so llvm@5c8c7f3 is causing build issues.

Whoops, I made the same mistake as https://github.com/llvm/llvm-project/pull/169535/files again, so 5c8c7f3 is causing build issues. This should fix that, I updated the tests to use sm_90 and ptx 8.8, where the used bytes mask feature is approved.