[RISCV] Support umin/umax in tryFoldSelectIntoOp #157548

Conversation
This is coverage for an upcoming change, but I thought the choice of configurations to check was probably worth a moment of consideration as well.
The neutral values for these are -1U and 0, respectively. We already have good arithmetic lowerings for selects with one arm equal to these values. smin/smax are a bit harder (their neutral values are the signed max and min, which don't admit the same cheap mask-based select lowering), and will be a separate change. Somewhat surprisingly, this looks to be a net code improvement in all of the configurations. With both zbb and zicond, it's a clear win. With only zicond, we still seem to come out ahead because we reduce the number of zicond instructions needed (since we lower min/max to them). Without either zbb or zicond, it's a bit more of a wash, but the available arithmetic sequences are good enough that doing the select unconditionally before using branches for the min/max is probably still worthwhile.
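In IR terms, the umin fold looks like the following (a hand-written sketch, not taken from the patch's tests; function names are illustrative):

```llvm
; Before: the select guards the whole umin.
define i32 @before(i1 zeroext %cond, i32 %a, i32 %b) {
  %m = call i32 @llvm.umin(i32 %a, i32 %b)
  %res = select i1 %cond, i32 %m, i32 %b
  ret i32 %res
}

; After: the select moves onto one operand, with umin's neutral
; value -1 on the false arm. umin(-1, %b) == %b, so both functions
; agree. The inner select then lowers arithmetically on RISC-V as
; roughly (%cond - 1) | %a, with no branch or zicond needed.
define i32 @after(i1 zeroext %cond, i32 %a, i32 %b) {
  %s = select i1 %cond, i32 %a, i32 -1
  %m = call i32 @llvm.umin(i32 %s, i32 %b)
  ret i32 %m
}
```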
@llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames). Changes: as described above. This stacks on #157539, and includes that change. Patch is 47.25 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/157548.diff. 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 5f01633126c7b..1fed0721c994d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -18835,6 +18835,8 @@ static SDValue tryFoldSelectIntoOp(SDNode *N, SelectionDAG &DAG,
case ISD::ADD:
case ISD::OR:
case ISD::XOR:
+ case ISD::UMIN:
+ case ISD::UMAX:
break;
}
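For reference, the UMAX case mirrors this with 0 as the neutral value; a hypothetical IR sketch (not from the patch) of the same shape:

```llvm
; select %cond, umax(%a, %b), %b  ==  umax(select(%cond, %a, 0), %b),
; because umax(0, %b) == %b. The inner select lowers as (-%cond) & %a;
; see the `neg` + `and` sequences in the tests below.
define i32 @umax_fold(i1 zeroext %cond, i32 %a, i32 %b) {
  %s = select i1 %cond, i32 %a, i32 0
  %m = call i32 @llvm.umax(i32 %s, i32 %b)
  ret i32 %m
}
```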
diff --git a/llvm/test/CodeGen/RISCV/select-zbb.ll b/llvm/test/CodeGen/RISCV/select-zbb.ll
new file mode 100644
index 0000000000000..6bf4009eceea1
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/select-zbb.ll
@@ -0,0 +1,1462 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s | FileCheck --check-prefixes=RV32IM %s
+; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s | FileCheck --check-prefixes=RV64IM %s
+; RUN: llc -mtriple=riscv32 -mattr=+m,+zbb -verify-machineinstrs < %s | FileCheck --check-prefixes=RV32IMZBB %s
+; RUN: llc -mtriple=riscv64 -mattr=+m,+zbb -verify-machineinstrs < %s | FileCheck --check-prefixes=RV64IMZBB %s
+; RUN: llc -mtriple=riscv32 -mattr=+m,+zicond -verify-machineinstrs < %s | FileCheck --check-prefixes=RV32IMZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+m,+zicond -verify-machineinstrs < %s | FileCheck --check-prefixes=RV64IMZICOND %s
+; RUN: llc -mtriple=riscv32 -mattr=+m,+zicond,+zbb -verify-machineinstrs < %s | FileCheck --check-prefixes=RV32IMBOTH %s
+; RUN: llc -mtriple=riscv64 -mattr=+m,+zicond,+zbb -verify-machineinstrs < %s | FileCheck --check-prefixes=RV64IMBOTH %s
+
+
+define i32 @select_umin_1(i1 zeroext %cond, i32 %a, i32 %b) {
+; RV32IM-LABEL: select_umin_1:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: addi a0, a0, -1
+; RV32IM-NEXT: or a1, a0, a1
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: bltu a2, a1, .LBB0_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: .LBB0_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umin_1:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a3, a0
+; RV64IM-NEXT: sext.w a0, a2
+; RV64IM-NEXT: addi a3, a3, -1
+; RV64IM-NEXT: or a1, a3, a1
+; RV64IM-NEXT: sext.w a1, a1
+; RV64IM-NEXT: bltu a0, a1, .LBB0_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB0_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umin_1:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: addi a0, a0, -1
+; RV32IMZBB-NEXT: or a0, a0, a1
+; RV32IMZBB-NEXT: minu a0, a2, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umin_1:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a2, a2
+; RV64IMZBB-NEXT: addi a0, a0, -1
+; RV64IMZBB-NEXT: or a0, a0, a1
+; RV64IMZBB-NEXT: sext.w a0, a0
+; RV64IMZBB-NEXT: minu a0, a2, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umin_1:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: addi a0, a0, -1
+; RV32IMZICOND-NEXT: or a0, a0, a1
+; RV32IMZICOND-NEXT: sltu a1, a2, a0
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a1
+; RV32IMZICOND-NEXT: czero.eqz a1, a2, a1
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umin_1:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a2, a2
+; RV64IMZICOND-NEXT: addi a0, a0, -1
+; RV64IMZICOND-NEXT: or a0, a0, a1
+; RV64IMZICOND-NEXT: sext.w a0, a0
+; RV64IMZICOND-NEXT: sltu a1, a2, a0
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a1
+; RV64IMZICOND-NEXT: czero.eqz a1, a2, a1
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umin_1:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: addi a0, a0, -1
+; RV32IMBOTH-NEXT: or a0, a0, a1
+; RV32IMBOTH-NEXT: minu a0, a2, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umin_1:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a2, a2
+; RV64IMBOTH-NEXT: addi a0, a0, -1
+; RV64IMBOTH-NEXT: or a0, a0, a1
+; RV64IMBOTH-NEXT: sext.w a0, a0
+; RV64IMBOTH-NEXT: minu a0, a2, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umin(i32 %a, i32 %b)
+ %res = select i1 %cond, i32 %c, i32 %b
+ ret i32 %res
+}
+
+define i32 @select_umin_2(i1 zeroext %cond, i32 %a, i32 %b) {
+; RV32IM-LABEL: select_umin_2:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: neg a0, a0
+; RV32IM-NEXT: ori a2, a0, 32
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: bltu a1, a2, .LBB1_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: .LBB1_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umin_2:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a2, a0
+; RV64IM-NEXT: sext.w a0, a1
+; RV64IM-NEXT: neg a1, a2
+; RV64IM-NEXT: ori a1, a1, 32
+; RV64IM-NEXT: bltu a0, a1, .LBB1_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB1_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umin_2:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: neg a0, a0
+; RV32IMZBB-NEXT: ori a0, a0, 32
+; RV32IMZBB-NEXT: minu a0, a1, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umin_2:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a1, a1
+; RV64IMZBB-NEXT: neg a0, a0
+; RV64IMZBB-NEXT: ori a0, a0, 32
+; RV64IMZBB-NEXT: minu a0, a1, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umin_2:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: neg a0, a0
+; RV32IMZICOND-NEXT: ori a0, a0, 32
+; RV32IMZICOND-NEXT: sltu a2, a1, a0
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV32IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umin_2:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a1, a1
+; RV64IMZICOND-NEXT: neg a0, a0
+; RV64IMZICOND-NEXT: ori a0, a0, 32
+; RV64IMZICOND-NEXT: sltu a2, a1, a0
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV64IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umin_2:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: neg a0, a0
+; RV32IMBOTH-NEXT: ori a0, a0, 32
+; RV32IMBOTH-NEXT: minu a0, a1, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umin_2:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a1, a1
+; RV64IMBOTH-NEXT: neg a0, a0
+; RV64IMBOTH-NEXT: ori a0, a0, 32
+; RV64IMBOTH-NEXT: minu a0, a1, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umin(i32 %a, i32 32)
+ %res = select i1 %cond, i32 %a, i32 %c
+ ret i32 %res
+}
+
+define i32 @select_umin_3(i1 zeroext %cond, i32 %a) {
+; RV32IM-LABEL: select_umin_3:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: neg a0, a0
+; RV32IM-NEXT: ori a2, a0, 32
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: bltu a1, a2, .LBB2_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: .LBB2_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umin_3:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a2, a0
+; RV64IM-NEXT: sext.w a0, a1
+; RV64IM-NEXT: neg a1, a2
+; RV64IM-NEXT: ori a1, a1, 32
+; RV64IM-NEXT: bltu a0, a1, .LBB2_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB2_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umin_3:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: neg a0, a0
+; RV32IMZBB-NEXT: ori a0, a0, 32
+; RV32IMZBB-NEXT: minu a0, a1, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umin_3:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a1, a1
+; RV64IMZBB-NEXT: neg a0, a0
+; RV64IMZBB-NEXT: ori a0, a0, 32
+; RV64IMZBB-NEXT: minu a0, a1, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umin_3:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: neg a0, a0
+; RV32IMZICOND-NEXT: ori a0, a0, 32
+; RV32IMZICOND-NEXT: sltu a2, a1, a0
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV32IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umin_3:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a1, a1
+; RV64IMZICOND-NEXT: neg a0, a0
+; RV64IMZICOND-NEXT: ori a0, a0, 32
+; RV64IMZICOND-NEXT: sltu a2, a1, a0
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV64IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umin_3:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: neg a0, a0
+; RV32IMBOTH-NEXT: ori a0, a0, 32
+; RV32IMBOTH-NEXT: minu a0, a1, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umin_3:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a1, a1
+; RV64IMBOTH-NEXT: neg a0, a0
+; RV64IMBOTH-NEXT: ori a0, a0, 32
+; RV64IMBOTH-NEXT: minu a0, a1, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umin(i32 %a, i32 32)
+ %res = select i1 %cond, i32 %a, i32 %c
+ ret i32 %res
+}
+
+define i32 @select_umin_4(i1 zeroext %cond, i32 %x) {
+; RV32IM-LABEL: select_umin_4:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: neg a0, a0
+; RV32IM-NEXT: or a0, a0, a1
+; RV32IM-NEXT: li a1, 128
+; RV32IM-NEXT: bltu a0, a1, .LBB3_2
+; RV32IM-NEXT: # %bb.1:
+; RV32IM-NEXT: li a0, 128
+; RV32IM-NEXT: .LBB3_2:
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umin_4:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: neg a0, a0
+; RV64IM-NEXT: or a0, a0, a1
+; RV64IM-NEXT: sext.w a0, a0
+; RV64IM-NEXT: li a1, 128
+; RV64IM-NEXT: bltu a0, a1, .LBB3_2
+; RV64IM-NEXT: # %bb.1:
+; RV64IM-NEXT: li a0, 128
+; RV64IM-NEXT: .LBB3_2:
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umin_4:
+; RV32IMZBB: # %bb.0:
+; RV32IMZBB-NEXT: neg a0, a0
+; RV32IMZBB-NEXT: or a0, a0, a1
+; RV32IMZBB-NEXT: li a1, 128
+; RV32IMZBB-NEXT: minu a0, a0, a1
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umin_4:
+; RV64IMZBB: # %bb.0:
+; RV64IMZBB-NEXT: neg a0, a0
+; RV64IMZBB-NEXT: or a0, a0, a1
+; RV64IMZBB-NEXT: sext.w a0, a0
+; RV64IMZBB-NEXT: li a1, 128
+; RV64IMZBB-NEXT: minu a0, a0, a1
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umin_4:
+; RV32IMZICOND: # %bb.0:
+; RV32IMZICOND-NEXT: neg a0, a0
+; RV32IMZICOND-NEXT: or a0, a0, a1
+; RV32IMZICOND-NEXT: sltiu a1, a0, 128
+; RV32IMZICOND-NEXT: addi a0, a0, -128
+; RV32IMZICOND-NEXT: czero.eqz a0, a0, a1
+; RV32IMZICOND-NEXT: addi a0, a0, 128
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umin_4:
+; RV64IMZICOND: # %bb.0:
+; RV64IMZICOND-NEXT: neg a0, a0
+; RV64IMZICOND-NEXT: or a0, a0, a1
+; RV64IMZICOND-NEXT: sext.w a0, a0
+; RV64IMZICOND-NEXT: sltiu a1, a0, 128
+; RV64IMZICOND-NEXT: addi a0, a0, -128
+; RV64IMZICOND-NEXT: czero.eqz a0, a0, a1
+; RV64IMZICOND-NEXT: addi a0, a0, 128
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umin_4:
+; RV32IMBOTH: # %bb.0:
+; RV32IMBOTH-NEXT: neg a0, a0
+; RV32IMBOTH-NEXT: or a0, a0, a1
+; RV32IMBOTH-NEXT: li a1, 128
+; RV32IMBOTH-NEXT: minu a0, a0, a1
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umin_4:
+; RV64IMBOTH: # %bb.0:
+; RV64IMBOTH-NEXT: neg a0, a0
+; RV64IMBOTH-NEXT: or a0, a0, a1
+; RV64IMBOTH-NEXT: sext.w a0, a0
+; RV64IMBOTH-NEXT: li a1, 128
+; RV64IMBOTH-NEXT: minu a0, a0, a1
+; RV64IMBOTH-NEXT: ret
+ %add = call i32 @llvm.umin(i32 %x, i32 128)
+ %sel = select i1 %cond, i32 128, i32 %add
+ ret i32 %sel
+}
+
+define i32 @select_umax_1(i1 zeroext %cond, i32 %a, i32 %b) {
+; RV32IM-LABEL: select_umax_1:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: neg a0, a0
+; RV32IM-NEXT: and a1, a0, a1
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: bltu a1, a2, .LBB4_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: .LBB4_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umax_1:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a3, a0
+; RV64IM-NEXT: sext.w a0, a2
+; RV64IM-NEXT: neg a2, a3
+; RV64IM-NEXT: and a1, a2, a1
+; RV64IM-NEXT: sext.w a1, a1
+; RV64IM-NEXT: bltu a1, a0, .LBB4_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB4_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umax_1:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: neg a0, a0
+; RV32IMZBB-NEXT: and a0, a0, a1
+; RV32IMZBB-NEXT: maxu a0, a2, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umax_1:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a2, a2
+; RV64IMZBB-NEXT: neg a0, a0
+; RV64IMZBB-NEXT: and a0, a0, a1
+; RV64IMZBB-NEXT: sext.w a0, a0
+; RV64IMZBB-NEXT: maxu a0, a2, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umax_1:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: czero.eqz a0, a1, a0
+; RV32IMZICOND-NEXT: sltu a1, a0, a2
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a1
+; RV32IMZICOND-NEXT: czero.eqz a1, a2, a1
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umax_1:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a2, a2
+; RV64IMZICOND-NEXT: czero.eqz a0, a1, a0
+; RV64IMZICOND-NEXT: sext.w a0, a0
+; RV64IMZICOND-NEXT: sltu a1, a0, a2
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a1
+; RV64IMZICOND-NEXT: czero.eqz a1, a2, a1
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umax_1:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: czero.eqz a0, a1, a0
+; RV32IMBOTH-NEXT: maxu a0, a2, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umax_1:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a2, a2
+; RV64IMBOTH-NEXT: czero.eqz a0, a1, a0
+; RV64IMBOTH-NEXT: sext.w a0, a0
+; RV64IMBOTH-NEXT: maxu a0, a2, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umax(i32 %a, i32 %b)
+ %res = select i1 %cond, i32 %c, i32 %b
+ ret i32 %res
+}
+
+define i32 @select_umax_2(i1 zeroext %cond, i32 %a, i32 %b) {
+; RV32IM-LABEL: select_umax_2:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: addi a0, a0, -1
+; RV32IM-NEXT: andi a2, a0, 32
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: bltu a2, a1, .LBB5_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: .LBB5_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umax_2:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a2, a0
+; RV64IM-NEXT: sext.w a0, a1
+; RV64IM-NEXT: addi a2, a2, -1
+; RV64IM-NEXT: andi a1, a2, 32
+; RV64IM-NEXT: bltu a1, a0, .LBB5_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB5_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umax_2:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: addi a0, a0, -1
+; RV32IMZBB-NEXT: andi a0, a0, 32
+; RV32IMZBB-NEXT: maxu a0, a1, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umax_2:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a1, a1
+; RV64IMZBB-NEXT: addi a0, a0, -1
+; RV64IMZBB-NEXT: andi a0, a0, 32
+; RV64IMZBB-NEXT: maxu a0, a1, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umax_2:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: addi a0, a0, -1
+; RV32IMZICOND-NEXT: andi a0, a0, 32
+; RV32IMZICOND-NEXT: sltu a2, a0, a1
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV32IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umax_2:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a1, a1
+; RV64IMZICOND-NEXT: addi a0, a0, -1
+; RV64IMZICOND-NEXT: andi a0, a0, 32
+; RV64IMZICOND-NEXT: sltu a2, a0, a1
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV64IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umax_2:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: addi a0, a0, -1
+; RV32IMBOTH-NEXT: andi a0, a0, 32
+; RV32IMBOTH-NEXT: maxu a0, a1, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umax_2:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a1, a1
+; RV64IMBOTH-NEXT: addi a0, a0, -1
+; RV64IMBOTH-NEXT: andi a0, a0, 32
+; RV64IMBOTH-NEXT: maxu a0, a1, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umax(i32 %a, i32 32)
+ %res = select i1 %cond, i32 %a, i32 %c
+ ret i32 %res
+}
+
+define i32 @select_umax_3(i1 zeroext %cond, i32 %a) {
+; RV32IM-LABEL: select_umax_3:
+; RV32IM: # %bb.0: # %entry
+; RV32IM-NEXT: addi a0, a0, -1
+; RV32IM-NEXT: andi a2, a0, 32
+; RV32IM-NEXT: mv a0, a1
+; RV32IM-NEXT: bltu a2, a1, .LBB6_2
+; RV32IM-NEXT: # %bb.1: # %entry
+; RV32IM-NEXT: mv a0, a2
+; RV32IM-NEXT: .LBB6_2: # %entry
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umax_3:
+; RV64IM: # %bb.0: # %entry
+; RV64IM-NEXT: mv a2, a0
+; RV64IM-NEXT: sext.w a0, a1
+; RV64IM-NEXT: addi a2, a2, -1
+; RV64IM-NEXT: andi a1, a2, 32
+; RV64IM-NEXT: bltu a1, a0, .LBB6_2
+; RV64IM-NEXT: # %bb.1: # %entry
+; RV64IM-NEXT: mv a0, a1
+; RV64IM-NEXT: .LBB6_2: # %entry
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umax_3:
+; RV32IMZBB: # %bb.0: # %entry
+; RV32IMZBB-NEXT: addi a0, a0, -1
+; RV32IMZBB-NEXT: andi a0, a0, 32
+; RV32IMZBB-NEXT: maxu a0, a1, a0
+; RV32IMZBB-NEXT: ret
+;
+; RV64IMZBB-LABEL: select_umax_3:
+; RV64IMZBB: # %bb.0: # %entry
+; RV64IMZBB-NEXT: sext.w a1, a1
+; RV64IMZBB-NEXT: addi a0, a0, -1
+; RV64IMZBB-NEXT: andi a0, a0, 32
+; RV64IMZBB-NEXT: maxu a0, a1, a0
+; RV64IMZBB-NEXT: ret
+;
+; RV32IMZICOND-LABEL: select_umax_3:
+; RV32IMZICOND: # %bb.0: # %entry
+; RV32IMZICOND-NEXT: addi a0, a0, -1
+; RV32IMZICOND-NEXT: andi a0, a0, 32
+; RV32IMZICOND-NEXT: sltu a2, a0, a1
+; RV32IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV32IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV32IMZICOND-NEXT: or a0, a1, a0
+; RV32IMZICOND-NEXT: ret
+;
+; RV64IMZICOND-LABEL: select_umax_3:
+; RV64IMZICOND: # %bb.0: # %entry
+; RV64IMZICOND-NEXT: sext.w a1, a1
+; RV64IMZICOND-NEXT: addi a0, a0, -1
+; RV64IMZICOND-NEXT: andi a0, a0, 32
+; RV64IMZICOND-NEXT: sltu a2, a0, a1
+; RV64IMZICOND-NEXT: czero.nez a0, a0, a2
+; RV64IMZICOND-NEXT: czero.eqz a1, a1, a2
+; RV64IMZICOND-NEXT: or a0, a1, a0
+; RV64IMZICOND-NEXT: ret
+;
+; RV32IMBOTH-LABEL: select_umax_3:
+; RV32IMBOTH: # %bb.0: # %entry
+; RV32IMBOTH-NEXT: addi a0, a0, -1
+; RV32IMBOTH-NEXT: andi a0, a0, 32
+; RV32IMBOTH-NEXT: maxu a0, a1, a0
+; RV32IMBOTH-NEXT: ret
+;
+; RV64IMBOTH-LABEL: select_umax_3:
+; RV64IMBOTH: # %bb.0: # %entry
+; RV64IMBOTH-NEXT: sext.w a1, a1
+; RV64IMBOTH-NEXT: addi a0, a0, -1
+; RV64IMBOTH-NEXT: andi a0, a0, 32
+; RV64IMBOTH-NEXT: maxu a0, a1, a0
+; RV64IMBOTH-NEXT: ret
+entry:
+ %c = call i32 @llvm.umax(i32 %a, i32 32)
+ %res = select i1 %cond, i32 %a, i32 %c
+ ret i32 %res
+}
+
+define i32 @select_umax_4(i1 zeroext %cond, i32 %x) {
+; RV32IM-LABEL: select_umax_4:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: addi a0, a0, -1
+; RV32IM-NEXT: and a0, a0, a1
+; RV32IM-NEXT: li a1, 128
+; RV32IM-NEXT: bltu a1, a0, .LBB7_2
+; RV32IM-NEXT: # %bb.1:
+; RV32IM-NEXT: li a0, 128
+; RV32IM-NEXT: .LBB7_2:
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: select_umax_4:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: addi a0, a0, -1
+; RV64IM-NEXT: and a0, a0, a1
+; RV64IM-NEXT: sext.w a0, a0
+; RV64IM-NEXT: li a1, 128
+; RV64IM-NEXT: bltu a1, a0, .LBB7_2
+; RV64IM-NEXT: # %bb.1:
+; RV64IM-NEXT: li a0, 128
+; RV64IM-NEXT: .LBB7_2:
+; RV64IM-NEXT: ret
+;
+; RV32IMZBB-LABEL: select_umax...
[truncated]
case ISD::UMIN:
case ISD::UMAX:
I think SMAX/SMIN would also be good to add, and yield similar results?
This is mentioned in the description: "smin/smax are a bit harder, and will be a separate change."
ping?
LGTM
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/31009