Skip to content

Conversation

@SergeantCooper
Copy link

Solves #114339

Implemented the isTruncateFree and isZExtFree virtual functions in the NVPTXTargetLowering class. This implementation optimizes truncation from i64 to i32 as a free operation for NVPTX arch.

Details:

  • The isTruncateFree function returns true for truncation from i64 to i32.
  • The isZExtFree function is implemented to return false for zero-extension from i32 to i64.
  • These changes improve the efficiency of code generation by leveraging hardware capabilities.

Testing:

  • Added lit tests to validate the functionality of these optimizations.

@github-actions
Copy link

github-actions bot commented Nov 2, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2024

@llvm/pr-subscribers-backend-nvptx

Author: None (Quark-69)

Changes

Solves #114339

Implemented the isTruncateFree and isZExtFree virtual functions in the NVPTXTargetLowering class. This implementation optimizes truncation from i64 to i32 as a free operation for NVPTX arch.

Details:

  • The isTruncateFree function returns true for truncation from i64 to i32.
  • The isZExtFree function is implemented to return false for zero-extension from i32 to i64.
  • These changes improve the efficiency of code generation by leveraging hardware capabilities.

Testing:

  • Added lit tests to validate the functionality of these optimizations.

Full diff: https://github.com/llvm/llvm-project/pull/114683.diff

3 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+17)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (+4)
  • (added) llvm/test/CodeGen/NVPTX/truncate_zext.ll (+17)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index d3bf0ecfe2cc92..d208caebbd1151 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -3340,6 +3340,23 @@ bool NVPTXTargetLowering::splitValueIntoRegisterParts(
   return false;
 }
 
+bool llvm::NVPTXTargetLowering::isTruncateFree(EVT FromVT, EVT ToVT) const {
+  if (FromVT.isVector() || ToVT.isVector() || !FromVT.isInteger() ||
+      !ToVT.isInteger()) {
+    return false;
+  }
+
+  return FromVT.getSizeInBits() == 64 && ToVT.getSizeInBits() == 32;
+}
+
+bool llvm::NVPTXTargetLowering::isZExtFree(EVT FromVT, EVT ToVT) const {
+  return false;
+}
+
+bool llvm::NVPTXTargetLowering::isZExtFree(Type *SrcTy, Type *DstTy) const {
+  return false;
+}
+
 // This creates target external symbol for a function parameter.
 // Name of the symbol is composed from its index and the function name.
 // Negative index corresponds to special parameter (unsized array) used for
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index c8b589ae39413e..fa73938a35a168 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -616,6 +616,10 @@ class NVPTXTargetLowering : public TargetLowering {
     return true;
   }
 
+  bool isTruncateFree(EVT FromVT, EVT ToVT) const override;
+  bool isZExtFree(EVT FromVT, EVT ToVT) const override;
+  bool isZExtFree(Type *SrcTy, Type *DstTy) const override;
+
 private:
   const NVPTXSubtarget &STI; // cache the subtarget here
   SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;
diff --git a/llvm/test/CodeGen/NVPTX/truncate_zext.ll b/llvm/test/CodeGen/NVPTX/truncate_zext.ll
new file mode 100644
index 00000000000000..decc02c5840491
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/truncate_zext.ll
@@ -0,0 +1,17 @@
+; RUN: llc -march=nvptx64 < %s | FileCheck %s
+
+; Test for truncation from i64 to i32
+define i32 @test_trunc_i64_to_i32(i64 %val) {
+  ; CHECK-LABEL: test_trunc_i64_to_i32
+  ; CHECK: trunc
+  %trunc = trunc i64 %val to i32
+  ret i32 %trunc
+}
+
+; Test for zero-extension from i32 to i64
+define i64 @test_zext_i32_to_i64(i32 %val) {
+  ; CHECK-LABEL: test_zext_i32_to_i64
+  ; CHECK: zext
+  %zext = zext i32 %val to i64
+  ret i64 %zext
+}
\ No newline at end of file

Comment on lines 3343 to 3365
bool llvm::NVPTXTargetLowering::isTruncateFree(EVT FromVT, EVT ToVT) const {
if (FromVT.isVector() || ToVT.isVector() || !FromVT.isInteger() ||
!ToVT.isInteger()) {
return false;
}

return FromVT.getSizeInBits() == 64 && ToVT.getSizeInBits() == 32;
}

bool llvm::NVPTXTargetLowering::isZExtFree(EVT FromVT, EVT ToVT) const {
return false;
}

bool llvm::NVPTXTargetLowering::isZExtFree(Type *SrcTy, Type *DstTy) const {
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, let's move the implementation of these functions to NVPTXISelLowering.h. They are small enough that it should be okay, and it's the existing convention.

Comment on lines 619 to 621
bool isTruncateFree(EVT FromVT, EVT ToVT) const override;
bool isZExtFree(EVT FromVT, EVT ToVT) const override;
bool isZExtFree(Type *SrcTy, Type *DstTy) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this up to where the existing implementation of isTruncateFree() resides.

; CHECK: zext
%zext = zext i32 %val to i64
ret i64 %zext
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new line at the end of the file.

@@ -0,0 +1,17 @@
; RUN: llc -march=nvptx64 < %s | FileCheck %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test looks like it's checking that the correct PTX is generated for trunc and zext. This is of course nice to have, but not what we need for this PR.

The purpose of the test associated with this PR should be to ensure that your implementation of isTruncateFree() has an impact. Thus, the test should fail without your change and pass with it.

I like to find a place where isTruncateFree(EVT, EVT) is used in a simple peephole and check against that. For example, you could piggyback off of this peephole (Added and tested in 3332b70).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to that. The test should have some code which has to choose between free ops vs an alternative which would be used otherwise (e.g. i32 trunc(add(i64,i64)) vs i32 add(trunc(i64), trunc(i64)))

Comment on lines 3352 to 3364
bool llvm::NVPTXTargetLowering::isZExtFree(EVT FromVT, EVT ToVT) const {
return false;
}

bool llvm::NVPTXTargetLowering::isZExtFree(Type *SrcTy, Type *DstTy) const {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following up offline on what our approach for these should be. I'll update this thread later.


bool llvm::NVPTXTargetLowering::isTruncateFree(EVT FromVT, EVT ToVT) const {
if (FromVT.isVector() || ToVT.isVector() || !FromVT.isInteger() ||
!ToVT.isInteger()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit. LLVM prefers no braces around single-statement bodies.
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

Also, my personal nit: I prefer minimizing the number of negations. .. || !(FromVT.isInteger() && ToVT.isInteger()) is, IMO, better at conveying the intent ( as in "we want both to be integers") instead of eliminating unwanted parts piecemeal.

@@ -0,0 +1,17 @@
; RUN: llc -march=nvptx64 < %s | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to use llvm/utils/update_llc_test_checks.py to generate the checks.

@@ -0,0 +1,17 @@
; RUN: llc -march=nvptx64 < %s | FileCheck %s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to that. The test should have some code which has to choose between free ops vs an alternative which would be used otherwise (e.g. i32 trunc(add(i64,i64)) vs i32 add(trunc(i64), trunc(i64)))

@SergeantCooper
Copy link
Author

@justinfargnoli, @Artem-B can you specify what checks should I append to my test case.

@Artem-B
Copy link
Member

Artem-B commented Nov 4, 2024

@justinfargnoli has also pointed you at a the code processing IR which would be affected by your changes,
His example is simlar to what I've suggested, only uses select instead of add: trunc (select c, a, b) -> select c, (trunc a), (trunc b)

You should create an function which does that (or, find an existing example that may already exist).
I would check the git blame for the code @justinfargnoli pointed at, and look at the tests that were committed when the optimization was implemented. That may give you an example of what kind of tests you may want to add to this PR.

@SergeantCooper
Copy link
Author

Thanks I'll look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants