
Conversation

@slachowsky
Contributor

Provide shorter atomic LR/SC sequences using non-base instructions (e.g., 'B' extension instructions) when an implementation opts in to FeaturePermissiveZalrsc. Currently this shortens the atomicrmw {min,max,umin,umax} pseudo expansions.

There is no functional change when this target feature is not requested.

@llvmbot
Member

llvmbot commented Oct 24, 2025

@llvm/pr-subscribers-backend-risc-v

Author: None (slachowsky)

Changes

Provide shorter atomic LR/SC sequences using non-base instructions (e.g., 'B' extension instructions) when an implementation opts in to FeaturePermissiveZalrsc. Currently this shortens the atomicrmw {min,max,umin,umax} pseudo expansions.

There is no functional change when this target feature is not requested.
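For illustration, the shorter expansion can be requested from llc as follows (a usage sketch inferred from the RUN lines in the new test, not separate documentation; input.ll is a placeholder for any module containing atomicrmw min/max operations). The tests pass -mattr=+b, which implies Zbb; the expansion code itself checks only for Zbb, so +zbb should suffice:

  llc -mtriple=riscv64 -mattr=+zbb,+zalrsc,+permissive-zalrsc input.ll

Without +permissive-zalrsc, the original constrained branch-based expansion is emitted unchanged.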


Patch is 46.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165042.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp (+23)
  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+16)
  • (added) llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll (+1074)
diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index 98b636e8e0e55..9bd66a43717e7 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -373,6 +373,26 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
         .addReg(ScratchReg)
         .addImm(-1);
     break;
+  case AtomicRMWInst::Max:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MAX), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::Min:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MIN), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::UMax:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MAXU), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
+  case AtomicRMWInst::UMin:
+    BuildMI(LoopMBB, DL, TII->get(RISCV::MINU), ScratchReg)
+        .addReg(DestReg)
+        .addReg(IncrReg);
+    break;
   }
   BuildMI(LoopMBB, DL, TII->get(getSCForRMW(Ordering, Width, STI)), ScratchReg)
       .addReg(ScratchReg)
@@ -682,6 +702,9 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
     MachineBasicBlock::iterator &NextMBBI) {
+  // Using MIN(U)/MAX(U) is preferable if permitted
+  if (STI->hasPermissiveZalrsc() && STI->hasStdExtZbb() && !IsMasked)
+    return expandAtomicBinOp(MBB, MBBI, BinOp, IsMasked, Width, NextMBBI);
 
   MachineInstr &MI = *MBBI;
   DebugLoc DL = MI.getDebugLoc();
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 2754d789b9899..3c1e9665d823e 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1906,6 +1906,22 @@ def FeatureForcedAtomics : SubtargetFeature<
 def HasAtomicLdSt
     : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;
 
+// The RISCV Unprivileged Architecture defines _constrained_ LR/SC loops:
+//   The dynamic code executed between the LR and SC instructions can only
+//   contain instructions from the base ''I'' instruction set, excluding loads,
+//   stores, backward jumps, taken backward branches, JALR, FENCE, and SYSTEM
+//   instructions. Compressed forms of the aforementioned ''I'' instructions in
+//   the Zca and Zcb extensions are also permitted.
+// LR/SC loops that do not adhere to the above are _unconstrained_ LR/SC loops,
+// and success is implementation specific. For implementations which know that
+// non-base instructions (such as the ''B'' extension) will not violate any
+// forward progress guarantees, using these instructions to reduce the LR/SC
+// sequence length is desirable.
+def FeaturePermissiveZalrsc
+    : SubtargetFeature<
+          "permissive-zalrsc", "HasPermissiveZalrsc", "true",
+          "Implementation permits non-base instructions between LR/SC pairs">;
+
 def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
     "AllowTaggedGlobals",
     "true", "Use an instruction sequence for taking the address of a global "
diff --git a/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll b/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll
new file mode 100644
index 0000000000000..9ce987c1add50
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/atomic-rmw-minmax.ll
@@ -0,0 +1,1074 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 -mattr=+zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32I-ZALRSC %s
+; RUN: llc -mtriple=riscv32 -mattr=+b,+zalrsc,+permissive-zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IB-ZALRSC %s
+; RUN: llc -mtriple=riscv32 -mattr=+a -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV32IA %s
+;
+; RUN: llc -mtriple=riscv64 -mattr=+zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64I-ZALRSC %s
+; RUN: llc -mtriple=riscv64 -mattr=+b,+zalrsc,+permissive-zalrsc -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IB-ZALRSC %s
+; RUN: llc -mtriple=riscv64 -mattr=+a -verify-machineinstrs < %s \
+; RUN:   | FileCheck -check-prefixes=RV64IA %s
+
+define i32 @atomicrmw_max_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bge a3, a1, .LBB0_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB0_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB0_3: # in Loop: Header=BB0_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    max a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_max_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomax.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bge a3, a2, .LBB0_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB0_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB0_3: # in Loop: Header=BB0_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB0_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    max a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB0_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_max_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomax.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw max ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_min_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bge a1, a3, .LBB1_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB1_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB1_3: # in Loop: Header=BB1_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    min a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_min_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomin.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bge a2, a3, .LBB1_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB1_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB1_3: # in Loop: Header=BB1_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB1_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    min a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB1_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_min_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomin.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw min ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_umax_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bgeu a3, a1, .LBB2_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB2_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB2_3: # in Loop: Header=BB2_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    maxu a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amomaxu.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bgeu a3, a2, .LBB2_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB2_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB2_3: # in Loop: Header=BB2_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB2_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    maxu a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB2_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_umax_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amomaxu.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw umax ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i32 @atomicrmw_umin_i32_seq_cst(ptr %a, i32 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32I-ZALRSC-NEXT:    mv a3, a2
+; RV32I-ZALRSC-NEXT:    bgeu a1, a3, .LBB3_3
+; RV32I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB3_1 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a3, a1
+; RV32I-ZALRSC-NEXT:  .LBB3_3: # in Loop: Header=BB3_1 Depth=1
+; RV32I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32I-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV32I-ZALRSC-NEXT:  # %bb.4:
+; RV32I-ZALRSC-NEXT:    mv a0, a2
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    lr.w.aqrl a2, (a0)
+; RV32IB-ZALRSC-NEXT:    minu a3, a2, a1
+; RV32IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV32IB-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV32IB-ZALRSC-NEXT:  # %bb.2:
+; RV32IB-ZALRSC-NEXT:    mv a0, a2
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    amominu.w.aqrl a0, a1, (a0)
+; RV32IA-NEXT:    ret
+;
+; RV64I-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64I-ZALRSC:       # %bb.0:
+; RV64I-ZALRSC-NEXT:    sext.w a2, a1
+; RV64I-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV64I-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64I-ZALRSC-NEXT:    mv a3, a1
+; RV64I-ZALRSC-NEXT:    bgeu a2, a3, .LBB3_3
+; RV64I-ZALRSC-NEXT:  # %bb.2: # in Loop: Header=BB3_1 Depth=1
+; RV64I-ZALRSC-NEXT:    mv a3, a2
+; RV64I-ZALRSC-NEXT:  .LBB3_3: # in Loop: Header=BB3_1 Depth=1
+; RV64I-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64I-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV64I-ZALRSC-NEXT:  # %bb.4:
+; RV64I-ZALRSC-NEXT:    mv a0, a1
+; RV64I-ZALRSC-NEXT:    ret
+;
+; RV64IB-ZALRSC-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64IB-ZALRSC:       # %bb.0:
+; RV64IB-ZALRSC-NEXT:    sext.w a2, a1
+; RV64IB-ZALRSC-NEXT:  .LBB3_1: # =>This Inner Loop Header: Depth=1
+; RV64IB-ZALRSC-NEXT:    lr.w.aqrl a1, (a0)
+; RV64IB-ZALRSC-NEXT:    minu a3, a1, a2
+; RV64IB-ZALRSC-NEXT:    sc.w.rl a3, a3, (a0)
+; RV64IB-ZALRSC-NEXT:    bnez a3, .LBB3_1
+; RV64IB-ZALRSC-NEXT:  # %bb.2:
+; RV64IB-ZALRSC-NEXT:    mv a0, a1
+; RV64IB-ZALRSC-NEXT:    ret
+;
+; RV64IA-LABEL: atomicrmw_umin_i32_seq_cst:
+; RV64IA:       # %bb.0:
+; RV64IA-NEXT:    amominu.w.aqrl a0, a1, (a0)
+; RV64IA-NEXT:    ret
+  %1 = atomicrmw umin ptr %a, i32 %b seq_cst
+  ret i32 %1
+}
+
+define i64 @atomicrmw_max_i64_seq_cst(ptr %a, i64 %b) nounwind {
+; RV32I-ZALRSC-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32I-ZALRSC:       # %bb.0:
+; RV32I-ZALRSC-NEXT:    addi sp, sp, -32
+; RV32I-ZALRSC-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32I-ZALRSC-NEXT:    mv s0, a2
+; RV32I-ZALRSC-NEXT:    mv s1, a0
+; RV32I-ZALRSC-NEXT:    lw a4, 0(a0)
+; RV32I-ZALRSC-NEXT:    lw a5, 4(a0)
+; RV32I-ZALRSC-NEXT:    mv s2, a1
+; RV32I-ZALRSC-NEXT:    j .LBB4_2
+; RV32I-ZALRSC-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    sw a4, 8(sp)
+; RV32I-ZALRSC-NEXT:    sw a5, 12(sp)
+; RV32I-ZALRSC-NEXT:    addi a1, sp, 8
+; RV32I-ZALRSC-NEXT:    li a4, 5
+; RV32I-ZALRSC-NEXT:    li a5, 5
+; RV32I-ZALRSC-NEXT:    mv a0, s1
+; RV32I-ZALRSC-NEXT:    call __atomic_compare_exchange_8
+; RV32I-ZALRSC-NEXT:    lw a4, 8(sp)
+; RV32I-ZALRSC-NEXT:    lw a5, 12(sp)
+; RV32I-ZALRSC-NEXT:    bnez a0, .LBB4_7
+; RV32I-ZALRSC-NEXT:  .LBB4_2: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32I-ZALRSC-NEXT:    beq a5, s0, .LBB4_4
+; RV32I-ZALRSC-NEXT:  # %bb.3: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    slt a0, s0, a5
+; RV32I-ZALRSC-NEXT:    j .LBB4_5
+; RV32I-ZALRSC-NEXT:  .LBB4_4: # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    sltu a0, s2, a4
+; RV32I-ZALRSC-NEXT:  .LBB4_5: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a2, a4
+; RV32I-ZALRSC-NEXT:    mv a3, a5
+; RV32I-ZALRSC-NEXT:    bnez a0, .LBB4_1
+; RV32I-ZALRSC-NEXT:  # %bb.6: # %atomicrmw.start
+; RV32I-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32I-ZALRSC-NEXT:    mv a2, s2
+; RV32I-ZALRSC-NEXT:    mv a3, s0
+; RV32I-ZALRSC-NEXT:    j .LBB4_1
+; RV32I-ZALRSC-NEXT:  .LBB4_7: # %atomicrmw.end
+; RV32I-ZALRSC-NEXT:    mv a0, a4
+; RV32I-ZALRSC-NEXT:    mv a1, a5
+; RV32I-ZALRSC-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
+; RV32I-ZALRSC-NEXT:    addi sp, sp, 32
+; RV32I-ZALRSC-NEXT:    ret
+;
+; RV32IB-ZALRSC-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32IB-ZALRSC:       # %bb.0:
+; RV32IB-ZALRSC-NEXT:    addi sp, sp, -32
+; RV32IB-ZALRSC-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32IB-ZALRSC-NEXT:    mv s0, a2
+; RV32IB-ZALRSC-NEXT:    mv s1, a0
+; RV32IB-ZALRSC-NEXT:    lw a4, 0(a0)
+; RV32IB-ZALRSC-NEXT:    lw a5, 4(a0)
+; RV32IB-ZALRSC-NEXT:    mv s2, a1
+; RV32IB-ZALRSC-NEXT:    j .LBB4_2
+; RV32IB-ZALRSC-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    sw a4, 8(sp)
+; RV32IB-ZALRSC-NEXT:    sw a5, 12(sp)
+; RV32IB-ZALRSC-NEXT:    addi a1, sp, 8
+; RV32IB-ZALRSC-NEXT:    li a4, 5
+; RV32IB-ZALRSC-NEXT:    li a5, 5
+; RV32IB-ZALRSC-NEXT:    mv a0, s1
+; RV32IB-ZALRSC-NEXT:    call __atomic_compare_exchange_8
+; RV32IB-ZALRSC-NEXT:    lw a4, 8(sp)
+; RV32IB-ZALRSC-NEXT:    lw a5, 12(sp)
+; RV32IB-ZALRSC-NEXT:    bnez a0, .LBB4_7
+; RV32IB-ZALRSC-NEXT:  .LBB4_2: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # =>This Inner Loop Header: Depth=1
+; RV32IB-ZALRSC-NEXT:    beq a5, s0, .LBB4_4
+; RV32IB-ZALRSC-NEXT:  # %bb.3: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    slt a0, s0, a5
+; RV32IB-ZALRSC-NEXT:    j .LBB4_5
+; RV32IB-ZALRSC-NEXT:  .LBB4_4: # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    sltu a0, s2, a4
+; RV32IB-ZALRSC-NEXT:  .LBB4_5: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    mv a2, a4
+; RV32IB-ZALRSC-NEXT:    mv a3, a5
+; RV32IB-ZALRSC-NEXT:    bnez a0, .LBB4_1
+; RV32IB-ZALRSC-NEXT:  # %bb.6: # %atomicrmw.start
+; RV32IB-ZALRSC-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IB-ZALRSC-NEXT:    mv a2, s2
+; RV32IB-ZALRSC-NEXT:    mv a3, s0
+; RV32IB-ZALRSC-NEXT:    j .LBB4_1
+; RV32IB-ZALRSC-NEXT:  .LBB4_7: # %atomicrmw.end
+; RV32IB-ZALRSC-NEXT:    mv a0, a4
+; RV32IB-ZALRSC-NEXT:    mv a1, a5
+; RV32IB-ZALRSC-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
+; RV32IB-ZALRSC-NEXT:    addi sp, sp, 32
+; RV32IB-ZALRSC-NEXT:    ret
+;
+; RV32IA-LABEL: atomicrmw_max_i64_seq_cst:
+; RV32IA:       # %bb.0:
+; RV32IA-NEXT:    addi sp, sp, -32
+; RV32IA-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
+; RV32IA-NEXT:    mv s0, a2
+; RV32IA-NEXT:    mv s1, a0
+; RV32IA-NEXT:    lw a4, 0(a0)
+; RV32IA-NEXT:    lw a5, 4(a0)
+; RV32IA-NEXT:    mv s2, a1
+; RV32IA-NEXT:    j .LBB4_2
+; RV32IA-NEXT:  .LBB4_1: # %atomicrmw.start
+; RV32IA-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; RV32IA-NEXT:    sw a4, 8(sp)
+; RV32IA-NEXT:    sw a5, 12(sp)
+; RV32IA-NEXT:    addi a1, sp, 8
+; RV32IA-NEXT:    li a4, 5
+; RV32IA-NEXT:    li a5, ...
[truncated]

@slachowsky
Contributor Author

@topperc @lenary @fpetrogalli @arichardson @wangpc-pp
This follows on from the previous #163672 'Zalrsc' enablement patch.

Provide shorter atomic LR/SC sequences using non-base instructions
(e.g., 'B' extension instructions) when implementations opt in to
FeaturePermissiveZalrsc. Currently this shortens `atomicrmw
{min,max,umin,umax}` pseudo expansions.

There is no functional change when this target feature is not
requested.
@topperc
Collaborator

topperc commented Oct 24, 2025

Please do not force push unless absolutely necessary. https://llvm.org/docs/GitHub.html#rebasing-pull-requests-and-force-pushes

def HasAtomicLdSt
    : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;

// The RISCV Unprivileged Architecture defines _constrained_ LR/SC loops:
Collaborator

RISCV -> RISC-V. It is a trademark that we should respect whenever possible.

Contributor Author

Done.

@fpetrogalli fpetrogalli self-requested a review October 27, 2025 16:44
Member

@fpetrogalli fpetrogalli left a comment

LGTM, thank you!

Collaborator

@topperc topperc left a comment

LGTM

Comment on lines +1923 to +1926
def FeaturePermissiveZalrsc
    : SubtargetFeature<
          "permissive-zalrsc", "HasPermissiveZalrsc", "true",
          "Implementation permits non-base instructions between LR/SC pairs">;
Member

Can I ask about the general trajectory here:

Do we expect all "permissive" cores to support the same set of instructions in zalrsc, or might different cores allow different sets of extensions in their loops?

If we expect different cores to allow different extensions, we should not be using such a generic name - I would prefer something like FeatureZalrscAllowsEXT for different EXTs (in this case, FeatureZalrscAllowsZbb), perhaps?

I would be slightly concerned if we're going to have a massive spread of these features (as for SFB).

Contributor Author

@slachowsky slachowsky Oct 28, 2025

Certainly a reasonable ask.

This feature is from the point of view of a minimal RISC-V core with LR/SC and a global monitor that is external to the core. In such a configuration the global monitor is aware only of the load/store transactions to the memory system, and completely unaware of what instructions or control flow occurred on the CPU(s) (or non-CPU devices) to generate those transactions. Any instruction mix is permissible in this style of system (ignoring higher-order concerns of guaranteed forward progress / eventual success), as long as the same memory transactions are presented to the monitor.

Some FeaturePermissiveZalrsc control is necessary to enable 'unconstrained' LR/SC loops, and the proposal here is that there are no constraints on what is permissible. The idea is to admit shorter sequences by checking for the availability of secondary extensions:

if (STI->hasPermissiveZalrsc() && STI->hasVendorExtABC())
  // build short vendor ABC instruction sequence
else if (STI->hasPermissiveZalrsc() && STI->hasStdExtXYZ())
  // build short standard XYZ instruction sequence
else
  // build original constrained sequence with only 'I' instructions

This avoids an explosion of features covering the cross products of permissive Zalrsc x {XYZ, ABC, etc.}. If a core has no constraints on what is permitted, and it also has an instruction extension that gives a shorter sequence, go ahead and use it.

Realistically, though, there is a tiny vocabulary of atomicrmw operations, and the existing pseudo expansions for these are very tightly coded, so there is limited opportunity for improvement here. Other than the Zbb MIN/MAX instructions in this patch, the only other kind of instruction extension I can think of with utility is some sort of bit-field insertion / bit-select instruction that could shorten the xor + and + xor sequence used in the masked atomics.
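For context, a rough sketch of that merge step (illustrative only, with placeholder registers; not code from this patch). The masked expansions insert the new field into the loaded word by computing ((old ^ new) & mask) ^ old, which equals (old & ~mask) | (new & mask):

  xor a4, a3, a2   # a3 = shifted new value, a2 = loaded (old) word
  and a4, a4, a5   # a5 = field mask; keep only the bits being replaced
  xor a4, a4, a2   # merge: old bits outside the mask, new bits inside

A single bit-field insert or bit-select instruction could replace all three ALU ops.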

Member

Great, thanks for explaining.


The comment does not mention the "16 instructions placed sequentially in memory" rule that also applies to constrained LR/SC sequences. Is it considered to be relaxed here?

LR/SC sequences succeed on Rocket if the SC comes after the LR within a fixed window measured in cycles and there are no intervening memory operations. Importantly, all of the instructions allowed in a constrained LR/SC loop are single-cycle on Rocket; min/max would very likely be fine, but div is problematic, mul might be depending on the configuration, and floating point may work if there is real hardware support, but definitely won't if it is emulated by privileged software (which is allowed even if F/D are in -march).

Member

@lenary lenary left a comment

LGTM

@slachowsky
Contributor Author

Ready for anyone with approval to merge.

@lenary lenary merged commit e940119 into llvm:main Oct 28, 2025
10 checks passed
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
Provide shorter atomic LR/SC sequences using non-base instructions
(e.g., 'B' extension instructions) when implementations opt in to
FeaturePermissiveZalrsc. Currently this shortens `atomicrmw
{min,max,umin,umax}` pseudo expansions.

There is no functional change when this target feature is not
requested.