[RISCV][LLVM] Enable atomics for 'Zalrsc' #163672

Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-clang

Author: None (slachowsky)

Changes

The 'A' atomics extension is composed of two subextensions, 'Zaamo' which has atomic memory operation instructions, and 'Zalrsc' which has load-reserved/store-conditional instructions.

For machines where 'Zalrsc' is present, but 'Zaamo' is not, implement and enable atomic memory operations through pseudo expansion. Updates the predication and lowering control to be more precise about which 'Zaamo'/'Zalrsc' feature was truly requisite.

There will be no functional change to subtargets supporting 'A', while allowing 'Zalrsc'-only subtargets to utilize atomics at an increased code footprint.

Patch is 1.02 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163672.diff

11 Files Affected:
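For illustration only (not part of the patch): on a 'Zalrsc'-only target, an operation that 'Zaamo' would handle with a single AMO instruction (e.g. amoadd.w) is instead expanded into an LR/SC retry loop. A minimal sketch of the 32-bit fetch-and-add expansion, with illustrative register assignments:

# Sketch: atomicrmw add expansion on a 'Zalrsc'-only target.
# a0 = address, a1 = increment, a2 = returned old value, a3 = scratch.
.LBB0_1:
        lr.w    a2, (a0)        # load-reserved the current value
        add     a3, a2, a1      # compute the new value into a scratch register
        sc.w    a3, a3, (a0)    # store-conditional; a3 is nonzero on failure
        bnez    a3, .LBB0_1     # retry if the reservation was lost

Note the loop body keeps only base 'I' instructions between the lr.w and sc.w, matching the forward-progress constraints in the ISA manual (quoted in the review below).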
diff --git a/clang/lib/Basic/Targets/RISCV.cpp b/clang/lib/Basic/Targets/RISCV.cpp
index 04da4e637af51..edfd92fc20bde 100644
--- a/clang/lib/Basic/Targets/RISCV.cpp
+++ b/clang/lib/Basic/Targets/RISCV.cpp
@@ -192,8 +192,10 @@ void RISCVTargetInfo::getTargetDefines(const LangOptions &Opts,
Builder.defineMacro("__riscv_muldiv");
}
- if (ISAInfo->hasExtension("a")) {
+ if (ISAInfo->hasExtension("a"))
Builder.defineMacro("__riscv_atomic");
+
+ if (ISAInfo->hasExtension("a") || ISAInfo->hasExtension("zalrsc")) {
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
diff --git a/clang/lib/Basic/Targets/RISCV.h b/clang/lib/Basic/Targets/RISCV.h
index d8b0e64c90dd6..3e1442db131a7 100644
--- a/clang/lib/Basic/Targets/RISCV.h
+++ b/clang/lib/Basic/Targets/RISCV.h
@@ -195,7 +195,7 @@ class LLVM_LIBRARY_VISIBILITY RISCV32TargetInfo : public RISCVTargetInfo {
void setMaxAtomicWidth() override {
MaxAtomicPromoteWidth = 128;
- if (ISAInfo->hasExtension("a"))
+ if (ISAInfo->hasExtension("a") || ISAInfo->hasExtension("zalrsc"))
MaxAtomicInlineWidth = 32;
}
};
@@ -225,7 +225,7 @@ class LLVM_LIBRARY_VISIBILITY RISCV64TargetInfo : public RISCVTargetInfo {
void setMaxAtomicWidth() override {
MaxAtomicPromoteWidth = 128;
- if (ISAInfo->hasExtension("a"))
+ if (ISAInfo->hasExtension("a") || ISAInfo->hasExtension("zalrsc"))
MaxAtomicInlineWidth = 64;
}
};
diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index 5dd4bf415a23c..8f033b6b14038 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -109,12 +109,72 @@ bool RISCVExpandAtomicPseudo::expandMI(MachineBasicBlock &MBB,
// expanded instructions for each pseudo is correct in the Size field of the
// tablegen definition for the pseudo.
switch (MBBI->getOpcode()) {
+ case RISCV::PseudoAtomicSwap32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Xchg, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicSwap64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Xchg, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadAdd32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Add, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadAdd64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Add, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadSub32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Sub, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadSub64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Sub, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadAnd32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::And, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadAnd64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::And, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadOr32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Or, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadOr64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Or, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadXor32:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Xor, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadXor64:
+ return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Xor, false, 64,
+ NextMBBI);
case RISCV::PseudoAtomicLoadNand32:
return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Nand, false, 32,
NextMBBI);
case RISCV::PseudoAtomicLoadNand64:
return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Nand, false, 64,
NextMBBI);
+ case RISCV::PseudoAtomicLoadMin32:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::Min, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadMin64:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::Min, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadMax32:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::Max, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadMax64:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::Max, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadUMin32:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::UMin, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadUMin64:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::UMin, false, 64,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadUMax32:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::UMax, false, 32,
+ NextMBBI);
+ case RISCV::PseudoAtomicLoadUMax64:
+ return expandAtomicMinMaxOp(MBB, MBBI, AtomicRMWInst::UMax, false, 64,
+ NextMBBI);
case RISCV::PseudoMaskedAtomicSwap32:
return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Xchg, true, 32,
NextMBBI);
@@ -264,6 +324,9 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
Register ScratchReg = MI.getOperand(1).getReg();
Register AddrReg = MI.getOperand(2).getReg();
Register IncrReg = MI.getOperand(3).getReg();
+ bool IsUnsigned = BinOp == AtomicRMWInst::UMin ||
+ BinOp == AtomicRMWInst::UMax;
+ bool Zext = IsUnsigned && STI->is64Bit() && Width == 32;
AtomicOrdering Ordering =
static_cast<AtomicOrdering>(MI.getOperand(4).getImm());
@@ -277,6 +340,36 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
switch (BinOp) {
default:
llvm_unreachable("Unexpected AtomicRMW BinOp");
+ case AtomicRMWInst::Xchg:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::ADDI), ScratchReg)
+ .addReg(IncrReg)
+ .addImm(0);
+ break;
+ case AtomicRMWInst::Add:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::ADD), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::Sub:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::SUB), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::And:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::AND), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::Or:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::OR), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::Xor:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::XOR), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
case AtomicRMWInst::Nand:
BuildMI(LoopMBB, DL, TII->get(RISCV::AND), ScratchReg)
.addReg(DestReg)
@@ -285,6 +378,34 @@ static void doAtomicBinOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
.addReg(ScratchReg)
.addImm(-1);
break;
+ case AtomicRMWInst::Min:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::MIN), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::Max:
+ BuildMI(LoopMBB, DL, TII->get(RISCV::MAX), ScratchReg)
+ .addReg(DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::UMin:
+ if (Zext)
+ BuildMI(LoopMBB, DL, TII->get(RISCV::ADD_UW), ScratchReg)
+ .addReg(DestReg)
+ .addReg(RISCV::X0);
+ BuildMI(LoopMBB, DL, TII->get(RISCV::MINU), ScratchReg)
+ .addReg(Zext ? ScratchReg : DestReg)
+ .addReg(IncrReg);
+ break;
+ case AtomicRMWInst::UMax:
+ if (Zext)
+ BuildMI(LoopMBB, DL, TII->get(RISCV::ADD_UW), ScratchReg)
+ .addReg(DestReg)
+ .addReg(RISCV::X0);
+ BuildMI(LoopMBB, DL, TII->get(RISCV::MAXU), ScratchReg)
+ .addReg(Zext ? ScratchReg : DestReg)
+ .addReg(IncrReg);
+ break;
}
BuildMI(LoopMBB, DL, TII->get(getSCForRMW(Ordering, Width, STI)), ScratchReg)
.addReg(ScratchReg)
@@ -433,38 +554,105 @@ static void insertSext(const RISCVInstrInfo *TII, DebugLoc DL,
.addReg(ShamtReg);
}
-bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
- MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
- AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
- MachineBasicBlock::iterator &NextMBBI) {
- assert(IsMasked == true &&
- "Should only need to expand masked atomic max/min");
- assert(Width == 32 && "Should never need to expand masked 64-bit operations");
+static void insertZext(const RISCVInstrInfo *TII, DebugLoc DL,
+ MachineBasicBlock *MBB, Register ValReg,
+ Register SrcReg, int64_t Shamt) {
+ BuildMI(MBB, DL, TII->get(RISCV::SLLI), ValReg)
+ .addReg(SrcReg)
+ .addImm(Shamt);
+ BuildMI(MBB, DL, TII->get(RISCV::SRLI), ValReg)
+ .addReg(ValReg)
+ .addImm(Shamt);
+}
- MachineInstr &MI = *MBBI;
- DebugLoc DL = MI.getDebugLoc();
- MachineFunction *MF = MBB.getParent();
- auto LoopHeadMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
- auto LoopIfBodyMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
- auto LoopTailMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
- auto DoneMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
+static void doAtomicMinMaxOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
+ DebugLoc DL, MachineBasicBlock *ThisMBB,
+ MachineBasicBlock *LoopHeadMBB,
+ MachineBasicBlock *LoopIfBodyMBB,
+ MachineBasicBlock *LoopTailMBB,
+ MachineBasicBlock *DoneMBB,
+ AtomicRMWInst::BinOp BinOp, int Width,
+ const RISCVSubtarget *STI) {
+ Register DestReg = MI.getOperand(0).getReg();
+ Register ScratchReg = MI.getOperand(1).getReg();
+ Register AddrReg = MI.getOperand(2).getReg();
+ Register IncrReg = MI.getOperand(3).getReg();
+ bool IsUnsigned = BinOp == AtomicRMWInst::UMin ||
+ BinOp == AtomicRMWInst::UMax;
+ bool Zext = IsUnsigned && STI->is64Bit() && Width == 32;
+ AtomicOrdering Ordering =
+ static_cast<AtomicOrdering>(MI.getOperand(4).getImm());
- // Insert new MBBs.
- MF->insert(++MBB.getIterator(), LoopHeadMBB);
- MF->insert(++LoopHeadMBB->getIterator(), LoopIfBodyMBB);
- MF->insert(++LoopIfBodyMBB->getIterator(), LoopTailMBB);
- MF->insert(++LoopTailMBB->getIterator(), DoneMBB);
+ // .loophead:
+ // lr.[w|d] dest, (addr)
+ // mv scratch, dest
+ // ifnochangeneeded scratch, incr, .looptail
+ BuildMI(LoopHeadMBB, DL, TII->get(getLRForRMW(Ordering, Width, STI)), DestReg)
+ .addReg(AddrReg);
+ if (Zext)
+ insertZext(TII, DL, LoopHeadMBB, ScratchReg, DestReg, 32);
+ else
+ BuildMI(LoopHeadMBB, DL, TII->get(RISCV::ADDI), ScratchReg)
+ .addReg(DestReg)
+ .addImm(0);
+ switch (BinOp) {
+ default:
+ llvm_unreachable("Unexpected AtomicRMW BinOp");
+ case AtomicRMWInst::Max: {
+ BuildMI(LoopHeadMBB, DL, TII->get(RISCV::BGE))
+ .addReg(ScratchReg)
+ .addReg(IncrReg)
+ .addMBB(LoopTailMBB);
+ break;
+ }
+ case AtomicRMWInst::Min: {
+ BuildMI(LoopHeadMBB, DL, TII->get(RISCV::BGE))
+ .addReg(IncrReg)
+ .addReg(ScratchReg)
+ .addMBB(LoopTailMBB);
+ break;
+ }
+ case AtomicRMWInst::UMax:
+ BuildMI(LoopHeadMBB, DL, TII->get(RISCV::BGEU))
+ .addReg(ScratchReg)
+ .addReg(IncrReg)
+ .addMBB(LoopTailMBB);
+ break;
+ case AtomicRMWInst::UMin:
+ BuildMI(LoopHeadMBB, DL, TII->get(RISCV::BGEU))
+ .addReg(IncrReg)
+ .addReg(ScratchReg)
+ .addMBB(LoopTailMBB);
+ break;
+ }
- // Set up successors and transfer remaining instructions to DoneMBB.
- LoopHeadMBB->addSuccessor(LoopIfBodyMBB);
- LoopHeadMBB->addSuccessor(LoopTailMBB);
- LoopIfBodyMBB->addSuccessor(LoopTailMBB);
- LoopTailMBB->addSuccessor(LoopHeadMBB);
- LoopTailMBB->addSuccessor(DoneMBB);
- DoneMBB->splice(DoneMBB->end(), &MBB, MI, MBB.end());
- DoneMBB->transferSuccessors(&MBB);
- MBB.addSuccessor(LoopHeadMBB);
+ // .loopifbody:
+ // mv scratch, incr
+ BuildMI(LoopIfBodyMBB, DL, TII->get(RISCV::ADDI), ScratchReg)
+ .addReg(IncrReg)
+ .addImm(0);
+
+ // .looptail:
+ // sc.[w|d] scratch, scratch, (addr)
+ // bnez scratch, loop
+ BuildMI(LoopTailMBB, DL, TII->get(getSCForRMW(Ordering, Width, STI)), ScratchReg)
+ .addReg(ScratchReg)
+ .addReg(AddrReg);
+ BuildMI(LoopTailMBB, DL, TII->get(RISCV::BNE))
+ .addReg(ScratchReg)
+ .addReg(RISCV::X0)
+ .addMBB(LoopHeadMBB);
+}
+static void doMaskedAtomicMinMaxOpExpansion(const RISCVInstrInfo *TII, MachineInstr &MI,
+ DebugLoc DL, MachineBasicBlock *ThisMBB,
+ MachineBasicBlock *LoopHeadMBB,
+ MachineBasicBlock *LoopIfBodyMBB,
+ MachineBasicBlock *LoopTailMBB,
+ MachineBasicBlock *DoneMBB,
+ AtomicRMWInst::BinOp BinOp, int Width,
+ const RISCVSubtarget *STI) {
+ assert(Width == 32 && "Should never need to expand masked 64-bit operations");
Register DestReg = MI.getOperand(0).getReg();
Register Scratch1Reg = MI.getOperand(1).getReg();
Register Scratch2Reg = MI.getOperand(2).getReg();
@@ -541,6 +729,48 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
.addReg(Scratch1Reg)
.addReg(RISCV::X0)
.addMBB(LoopHeadMBB);
+}
+
+bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
+ MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+ AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
+ MachineBasicBlock::iterator &NextMBBI) {
+
+ // Prefer expansion using MIN/MAX/MINU/MAXU instructions if available
+ if (STI->hasStdExtZbb() && !IsMasked)
+ return expandAtomicBinOp(MBB, MBBI, BinOp, IsMasked, Width, NextMBBI);
+
+ MachineInstr &MI = *MBBI;
+ DebugLoc DL = MI.getDebugLoc();
+ MachineFunction *MF = MBB.getParent();
+ auto LoopHeadMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
+ auto LoopIfBodyMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
+ auto LoopTailMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
+ auto DoneMBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock());
+
+ // Insert new MBBs.
+ MF->insert(++MBB.getIterator(), LoopHeadMBB);
+ MF->insert(++LoopHeadMBB->getIterator(), LoopIfBodyMBB);
+ MF->insert(++LoopIfBodyMBB->getIterator(), LoopTailMBB);
+ MF->insert(++LoopTailMBB->getIterator(), DoneMBB);
+
+ // Set up successors and transfer remaining instructions to DoneMBB.
+ LoopHeadMBB->addSuccessor(LoopIfBodyMBB);
+ LoopHeadMBB->addSuccessor(LoopTailMBB);
+ LoopIfBodyMBB->addSuccessor(LoopTailMBB);
+ LoopTailMBB->addSuccessor(LoopHeadMBB);
+ LoopTailMBB->addSuccessor(DoneMBB);
+ DoneMBB->splice(DoneMBB->end(), &MBB, MI, MBB.end());
+ DoneMBB->transferSuccessors(&MBB);
+ MBB.addSuccessor(LoopHeadMBB);
+
+ if (!IsMasked)
+ doAtomicMinMaxOpExpansion(TII, MI, DL, &MBB, LoopHeadMBB, LoopIfBodyMBB,
+ LoopTailMBB, DoneMBB, BinOp, Width, STI);
+ else
+ doMaskedAtomicMinMaxOpExpansion(TII, MI, DL, &MBB, LoopHeadMBB,
+ LoopIfBodyMBB, LoopTailMBB, DoneMBB, BinOp,
+ Width, STI);
NextMBBI = MBB.end();
MI.eraseFromParent();
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 5ceb477069188..f5c12998a8baf 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -218,6 +218,7 @@ def HasStdExtZaamo
: Predicate<"Subtarget->hasStdExtZaamo()">,
AssemblerPredicate<(any_of FeatureStdExtZaamo),
"'Zaamo' (Atomic Memory Operations)">;
+def NoStdExtZaamo : Predicate<"!Subtarget->hasStdExtZaamo()">;
def FeatureStdExtZalrsc
: RISCVExtension<1, 0, "Load-Reserved/Store-Conditional">;
@@ -1861,7 +1862,7 @@ def FeatureForcedAtomics : SubtargetFeature<
"forced-atomics", "HasForcedAtomics", "true",
"Assume that lock-free native-width atomics are available">;
def HasAtomicLdSt
- : Predicate<"Subtarget->hasStdExtA() || Subtarget->hasForcedAtomics()">;
+ : Predicate<"Subtarget->hasStdExtZalrsc() || Subtarget->hasForcedAtomics()">;
def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
"AllowTaggedGlobals",
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 7123a2d706787..1e1b4b785059f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -688,7 +688,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
else if (Subtarget.hasStdExtZicbop())
setOperationAction(ISD::PREFETCH, MVT::Other, Legal);
- if (Subtarget.hasStdExtA()) {
+ if (Subtarget.hasStdExtZalrsc()) {
setMaxAtomicSizeInBitsSupported(Subtarget.getXLen());
if (Subtarget.hasStdExtZabha() && Subtarget.hasStdExtZacas())
setMinCmpXchgSizeInBits(8);
@@ -1558,7 +1558,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
}
}
- if (Subtarget.hasStdExtA())
+ if (Subtarget.hasStdExtZaamo())
setOperationAction(ISD::ATOMIC_LOAD_SUB, XLenVT, Expand);
if (Subtarget.hasForcedAtomics()) {
@@ -21876,7 +21876,7 @@ unsigned RISCVTargetLowering::ComputeNumSignBitsForTargetNode(
// result is then sign extended to XLEN. With +A, the minimum width is
// 32 for both 64 and 32.
assert(getMinCmpXchgSizeInBits() == 32);
- assert(Subtarget.hasStdExtA());
+ assert(Subtarget.hasStdExtZalrsc());
return Op.getValueSizeInBits() - 31;
}
break;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
index 571d72f904ca7..8b88b2ab89941 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
@@ -308,7 +308,67 @@ class PseudoMaskedAMOMinMaxPat<Intrinsic intrin, Pseudo AMOInst>
(AMOInst GPR:$addr, GPR:$incr, GPR:$mask, GPR:$shiftamt,
timm:$ordering)>;
-let Predicates = [HasStdExtA] in {
+let Predicates = [HasStdExtZalrsc, NoStdExtZaamo] in {
+
+let Size = 16 in {
+def PseudoAtomicSwap32 : PseudoAMO;
+def PseudoAtomicLoadAdd32 : PseudoAMO;
+def PseudoAtomicLoadSub32 : PseudoAMO;
+def PseudoAtomicLoadAnd32 : PseudoAMO;
+def PseudoAtomicLoadOr32 : PseudoAMO;
+def PseudoAtomicLoadXor32 : PseudoAMO;
+} // Size = 16
+let Size = 24 in {
+def PseudoAtomicLoadMax32 : PseudoAMO;
+def PseudoAtomicLoadMin32 : PseudoAMO;
+} // Size = 24
+let Size = 28 in {
+def PseudoAtomicLoadUMax32 : PseudoAMO;
+def PseudoAtomicLoadUMin32 : PseudoAMO;
+} // Size = 28
+
+defm : PseudoAMOPat<"atomic_swap_i32", PseudoAtomicSwap32>;
+defm : PseudoAMOPat<"atomic_load_add_i32", PseudoAtomicLoadAdd32>;
+defm : PseudoAMOPat<"atomic_load_sub_i32", PseudoAtomicLoadSub32>;
+defm : PseudoAMOPat<"atomic_load_and_i32", PseudoAtomicLoadAnd32>;
+defm : PseudoAMOPat<"atomic_load_or_i32", PseudoAtomicLoadOr32>;
+defm : PseudoAMOPat<"atomic_load_xor_i32", PseudoAtomicLoadXor32>;
+defm : PseudoAMOPat<"atomic_load_max_i32", PseudoAtomicLoadMax32>;
+defm : PseudoAMOPat<"atomic_load_min_i32", PseudoAtomicLoadMin32>;
+defm : PseudoAMOPat<"atomic_load_umax_i32", PseudoAtomicLoadUMax32>;
+defm : PseudoAMOPat<"atomic_load_umin_i32", PseudoAtomicLoadUMin32>;
+} // Predicates = [HasStdExtZalrsc, NoStdExtZaamo]
+
+let Predicates = [HasStdExtZalrsc, NoStdExtZaamo, IsRV64] in {
+
+let Size = 16 in {
+def PseudoAtomicSwap64 : PseudoAMO;
...
[truncated]
      break;
    case AtomicRMWInst::UMin:
      if (Zext)
        BuildMI(LoopMBB, DL, TII->get(RISCV::ADD_UW), ScratchReg)
ADD_UW belongs to Zba, not Zbb, which I don't think you checked for.

add.uw, min(u), and max(u) aren't allowed to be in an LR/SC loop by my reading of the ISA manual:

"The dynamic code executed between the LR and SC instructions can only contain instructions from the base ''I'' instruction set, excluding loads, stores, backward jumps, taken backward branches, JALR, FENCE, and SYSTEM instructions. If the ''C'' extension is supported, then compressed forms of the aforementioned ''I'' instructions are also permitted."
I missed that. We can drop the middle patch which implements the 'Zbb' enhancement, unless there is interest in keeping this path for machines which exceed the specification.
Let's drop it from this PR. If there is interest we can make another PR and add some way of enabling it.
Done.
Force-pushed from 7deaf9f to 1c06bd8.
; RV64I-ZALRSC-NEXT: .LBB165_1: # =>This Inner Loop Header: Depth=1
; RV64I-ZALRSC-NEXT: lr.w a2, (a0)
; RV64I-ZALRSC-NEXT: mv a3, a2
; RV64I-ZALRSC-NEXT: bge a3, a1, .LBB165_3
The upper 32 bits of a1 are garbage.
; RV64I-ZALRSC-NEXT: lr.w a2, (a0)
; RV64I-ZALRSC-NEXT: slli a3, a2, 32
; RV64I-ZALRSC-NEXT: srli a3, a3, 32
; RV64I-ZALRSC-NEXT: bgeu a3, a1, .LBB175_3
The upper 32 bits of a1 are garbage.
This is unfortunate.

The type legalizer handles the 32-bit arguments properly for the cmpxchg case because TLI.getExtendForAtomicCmpSwapArg() controls the argument promotion in DAGTypeLegalizer::PromoteIntRes_AtomicCmpSwap(), and the RISCV TLI overrides this virtual to return SIGN_EXTEND for correct behavior.

No such control was plumbed for DAGTypeLegalizer::PromoteIntRes_Atomic1(), used for the atomicrmw <op> cases, so we always get any-extended arguments.

We could add another virtual TLI function to control this more explicitly for anyone else that might want it. Or some early RISCV DAGCombine to adjust atomicrmw {min,max,umin,umax} arguments and avoid the default legalizer behavior. Or solve this in the pseudo expansion, but that is not ideal because I believe it would require a second scratch register.

Thoughts?
Reference to the same issue that plagued cmpxchg many years ago: 616289e
Resolved in additional commit: f8ac954
Let me know if you dislike this approach, or want to squash this into the first commit.
Is there an abridged history of the RISC-V "A" spec anywhere? I feel reasonably out-of-touch with how it has developed (and been split into sub-extensions).
clang/lib/Basic/Targets/RISCV.h (outdated)
     MaxAtomicPromoteWidth = 128;
-    if (ISAInfo->hasExtension("a"))
+    if (ISAInfo->hasExtension("a") || ISAInfo->hasExtension("zalrsc"))
Shouldn't 'a' imply 'zalrsc'?
It does, and we could drop the redundant "a" clause. I could be in the wrong here coding-style wise, but my thinking was that "a" for atomic is well known, and "zalrsc" is not, so there is some benefit to readability / grep-ability by keeping both.
I'd prefer the frontend to adhere to the actual ISA split in the specs, and remove the "a" check.
How about just adding two lines of comment explaining why zalrsc and not "a" is checked here? Might be a bit redundant (the commit message will explain the change), but very effective for readability and grep-ability.
done
I added these two extensions at the MC level but haven't gotten a chance to support them in CodeGen. :-)
Force-pushed from 289c3e4 to f8ac954.
Force-pushed from b21459d to 7283821.
Force-pushed from 53d6db2 to 33a5aa1.
clang/lib/Basic/Targets/RISCV.cpp (outdated)
-    if (ISAInfo->hasExtension("a")) {
+    // The "a" extension is composed of "zalrsc" and "zaamo"
+    if (ISAInfo->hasExtension("zalrsc") && ISAInfo->hasExtension("zaamo"))
I think this should stay "a". I put up #163890 to make Zalrsc+Zaamo imply A.
Done.
rebased on top of #163890
Force-pushed from 33a5aa1 to d867abd.
The 'A' atomics extension is composed of two subextensions, 'Zaamo' which has atomic memory operation instructions, and 'Zalrsc' which has load-reserved/store-conditional instructions. For machines where 'Zalrsc' is present, but 'Zaamo' is not, implement and enable atomic memory operations through pseudo expansion. Updates the predication and lowering control to be more precise about which 'Zaamo'/'Zalrsc' feature was truly requisite. There will be no functional change to subtargets supporting 'A', while allowing 'Zalrsc'-only subtargets to utilize atomics at an increased code footprint.
Similar to the argument sign-extend control provided by getExtendForAtomicCmpSwapArg for `cmpxchg`, this change adds getExtendForAtomicRMWArg for `atomicrmw <op>`. This mechanism is used to correct the RISCV code generation when using atomic pseudo expansions that perform sub-word comparisons inside an LR/SC block and expect a correctly sign-extended input argument. A shorter instruction sequence is possible for umin/umax by leveraging the fact that two sign-extended arguments still order correctly during an unsigned comparison.
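For a rough idea of the effect (a sketch, not the exact emitted sequence; register assignments are illustrative): with the 32-bit argument sign-extended on RV64, the min/max loop can compare the lr.w result (itself sign-extended) directly, with no zero-extension inside the loop:

# Sketch: RV64 expansion of a 32-bit atomicrmw min after this change.
# a0 = address, a1 = incoming value (sign-extended by the legalizer).
.LBB0_1:
        lr.w    a2, (a0)        # lr.w sign-extends the loaded 32-bit value
        mv      a3, a2          # scratch starts as the old value
        bge     a1, a3, .LBB0_3 # if incoming >= old, the old value is the min
        mv      a3, a1          # otherwise the incoming value is the new min
.LBB0_3:
        sc.w    a3, a3, (a0)    # try to commit; nonzero a3 means failure
        bnez    a3, .LBB0_1     # retry if the reservation was lost

For umin/umax the same shape applies with bgeu, since two sign-extended values still order correctly under an unsigned comparison.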
'Zalrsc'-only subtargets are atomics-capable; ensure the RISCV target exposes this correctly.
Force-pushed from f0da61d to a676c2d.