Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,26 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
.addReg(ScratchReg)
.addImm(-1);
break;
case AtomicRMWInst::Max:
BuildMI(LoopMBB, DL, TII->get(RISCV::MAX), ScratchReg)
.addReg(DestReg)
.addReg(IncrReg);
break;
case AtomicRMWInst::Min:
BuildMI(LoopMBB, DL, TII->get(RISCV::MIN), ScratchReg)
.addReg(DestReg)
.addReg(IncrReg);
break;
case AtomicRMWInst::UMax:
BuildMI(LoopMBB, DL, TII->get(RISCV::MAXU), ScratchReg)
.addReg(DestReg)
.addReg(IncrReg);
break;
case AtomicRMWInst::UMin:
BuildMI(LoopMBB, DL, TII->get(RISCV::MINU), ScratchReg)
.addReg(DestReg)
.addReg(IncrReg);
break;
}
BuildMI(LoopMBB, DL, TII->get(getSCForRMW(Ordering, Width, STI)), ScratchReg)
.addReg(ScratchReg)
Expand Down Expand Up @@ -682,6 +702,9 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
MachineBasicBlock::iterator &NextMBBI) {
// Using MIN(U)/MAX(U) is preferrable if permitted
if (STI->hasPermissiveZalrsc() && STI->hasStdExtZbb() && !IsMasked)
return expandAtomicBinOp(MBB, MBBI, BinOp, IsMasked, Width, NextMBBI);

MachineInstr &MI = *MBBI;
DebugLoc DL = MI.getDebugLoc();
Expand Down
19 changes: 19 additions & 0 deletions llvm/lib/Target/RISCV/RISCVFeatures.td
Original file line number Diff line number Diff line change
Expand Up @@ -1906,6 +1906,25 @@ def FeatureForcedAtomics : SubtargetFeature<
def HasAtomicLdSt
: Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;

// The RISC-V Unprivileged Architecture - ISA Volume 1 (Version: 20250508)
// [https://docs.riscv.org/reference/isa/_attachments/riscv-unprivileged.pdf]
// in section 13.3. Eventual Success of Store-Conditional Instructions, defines
// _constrained_ LR/SC loops:
// The dynamic code executed between the LR and SC instructions can only
// contain instructions from the base ''I'' instruction set, excluding loads,
// stores, backward jumps, taken backward branches, JALR, FENCE, and SYSTEM
// instructions. Compressed forms of the aforementioned ''I'' instructions in
// the Zca and Zcb extensions are also permitted.
// LR/SC loops that do not adhere to the above are _unconstrained_ LR/SC loops,
// and success is implementation specific. For implementations which know that
// non-base instructions (such as the ''B'' extension) will not violate any
// forward progress guarantees, using these instructions to reduce the LR/SC
// sequence length is desirable.
def FeaturePermissiveZalrsc
: SubtargetFeature<
"permissive-zalrsc", "HasPermissiveZalrsc", "true",
"Implementation permits non-base instructions between LR/SC pairs">;
Comment on lines +1923 to +1926
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask about the general trajectory here:

Do we expect all "permissive" cores to support the same set of instructions in zalrsc, or might different cores allow different sets of extensions in their loops?

If we expect different cores to allow different extensions, we should not be using such a generic name - I would prefer something like FeatureZalrscAllowsEXT for different EXTs (in this case, FeatureZalrscAllowsZbb), perhaps?

I would be slightly concerned if we're going to have a massive spread of these features (as for SFB).

Copy link
Contributor Author

@slachowsky slachowsky Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly a reasonable ask.

This feature is from the point of view of a minimal RISC-V core with LR/SC, and a global monitor that is external to the core. In such a configuration the global monitor is aware only of the load/store transactions to the memory system, and completely unaware of what instructions or control flow occurred on the CPU(s) (or non-CPU devices) to generate those transactions. Any instruction mix is permissible in this style of system (ignoring higher order concerns of guaranteed forward progress / eventual success), as long as the same memory transactions present to the monitor.

It is necessary to have some FeaturePermissiveZalrsc control to enable 'unconstrained' LR/SC loops, and the proposal here is there are no constraints on what is permissible. The idea is to admit shorter sequences via checks on extant secondary extension feature availability:

if (STI->hasPermissiveZalrsc() && STI->hasVendorExtABC())
  // build short vendor ABC instruction sequence
else if (STI->hasPermissiveZalrsc() && STI->hasStdExtXYZ())
  // build short standard XYZ instruction sequence
else
  // build original constrained sequence with only 'I' instructions

This avoids the explosion in features to cover the cross-products of permitted Zalrsc x {XYZ, ABC, etc}. If a core has no constraints on what is permitted, and it also has an instruction extension that gives a shorter sequence go ahead and use it.

Realistically though there is a tiny vocabulary of atomicrmw <ops>, and the existing pseudo expansions for these are very tightly coded, so there is very limited opportunity for improvement here. Other than Zbb MIN/MAX instructions in this patch, the only other instruction extension that I can think of that has utility is some sort of bit field insertion / bit select instructions that could shorten the xor + and + xor sequence used in the masked atomics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for explaining.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does not mention the "16 instructions placed sequentially in memory" rule that is also applied on constrained LR/SC sequences. Is it considered to be relaxed here?

LR/SC sequences succeed on Rocket if the SC comes after the LR within a fixed window measured in cycles and there are no intervening memory operations. Importantly, all of the instructions allowed in a constrained LR/SC loop are single-cycle on Rocket; min/max would extremely likely be fine, but div is problematic, mul might be depending on configuration, and floating point may work if there is real hardware support, but definitely won't if it is emulated by privileged software (which is allowed even if F/D are in -march).


def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
"AllowTaggedGlobals",
"true", "Use an instruction sequence for taking the address of a global "
Expand Down
Loading