Commit eb76404

[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062)
## Short Summary

This patch adds a new pass, `aarch64-machine-sme-abi`, to handle the ABI for ZA state (e.g., lazy saves and agnostic ZA functions). It is not yet enabled by default (the aim is to enable it by default in LLVM 22). The goal is for the new pass to place ZA saves/restores more optimally and to work with exception handling.

## Long Description

This patch reimplements the management of ZA state for functions with private and shared ZA state. Agnostic ZA functions will be handled in a later patch. For now, the new lowering is gated behind the flag `-aarch64-new-sme-abi`; however, we intend for it to replace the current SelectionDAG implementation once complete.

The approach taken here is to mark instructions as requiring ZA to be in a specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions that implicitly define or use ZA registers (such as $zt0 or $zab0) require the "ACTIVE" state. Function calls may require the "LOCAL_SAVED" or "ACTIVE" state, depending on whether the callee has shared or private ZA. We already add ZA register uses/definitions to machine instructions, so no extra work is needed to mark these. Calls are marked by gluing AArch64ISD::INOUT_ZA_USE or AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START. The MachineSMEABIPass then uses these markers to find the instructions at which a transition between required ZA states occurs; these are the points where it must insert code to set up or restore a ZA save (or initialize ZA).

To handle control flow between blocks (which may have different ZA state requirements), we bundle the incoming and outgoing edges of blocks. Bundles are formed by assigning each block an incoming and an outgoing bundle (initially, every block has its own two bundles); bundles are then combined by joining the outgoing bundle of each block with the incoming bundles of all its successors. Each bundle is then assigned a ZA state based on the blocks that participate in it: blocks whose incoming edges are in a bundle "vote" for the state required at their first instruction, and likewise, blocks whose outgoing edges are in a bundle vote for the state required at their last instruction. The state with the most votes is used, which aims to minimize the number of state transitions (see the sketch after this description).
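To make the edge-bundling and voting concrete, here is a minimal, self-contained C++ sketch of the scheme described above. It is an illustration under stated assumptions, not the pass's actual code: `BlockInfo`, `EdgeBundles`, and `assignBundleStates` are hypothetical names, only the two states discussed here are modeled, and a toy union-find stands in for whatever bundle representation the in-tree pass uses.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

enum class ZAState : unsigned { Active, LocalSaved, NumStates };

struct BlockInfo {
  ZAState FirstInstState;         // ZA state required at the first instruction.
  ZAState LastInstState;          // ZA state required at the last instruction.
  std::vector<size_t> Successors; // Indices of successor blocks.
};

// Union-find over bundle IDs. Bundle 2*B is block B's incoming bundle and
// 2*B+1 its outgoing bundle, so initially every block has its own two bundles.
struct EdgeBundles {
  std::vector<size_t> Parent;
  explicit EdgeBundles(size_t N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), size_t(0));
  }
  size_t find(size_t X) {
    while (Parent[X] != X)
      X = Parent[X] = Parent[Parent[X]]; // Path halving.
    return X;
  }
  void join(size_t A, size_t B) { Parent[find(A)] = find(B); }
};

std::vector<ZAState> assignBundleStates(const std::vector<BlockInfo> &Blocks) {
  size_t N = Blocks.size();
  EdgeBundles Bundles(2 * N);

  // Join the outgoing bundle of each block with the incoming bundle of
  // every successor.
  for (size_t B = 0; B < N; ++B)
    for (size_t Succ : Blocks[B].Successors)
      Bundles.join(2 * B + 1, 2 * Succ);

  // Each block votes for the state required at its boundary instructions.
  std::vector<std::array<unsigned, size_t(ZAState::NumStates)>> Votes(
      2 * N, {{0, 0}});
  for (size_t B = 0; B < N; ++B) {
    ++Votes[Bundles.find(2 * B)][size_t(Blocks[B].FirstInstState)];
    ++Votes[Bundles.find(2 * B + 1)][size_t(Blocks[B].LastInstState)];
  }

  // The state with the most votes wins, minimizing edge transitions.
  std::vector<ZAState> BundleStates(2 * N);
  for (size_t Bundle = 0; Bundle < 2 * N; ++Bundle) {
    const auto &V = Votes[Bundles.find(Bundle)];
    BundleStates[Bundle] =
        ZAState(std::max_element(V.begin(), V.end()) - V.begin());
  }
  return BundleStates;
}
```

In the real pass, the chosen bundle states then drive code insertion: wherever an instruction's required state differs from the state of the surrounding edges, a save, restore, or ZA initialization is emitted. (This sketch breaks vote ties arbitrarily by taking the first maximum.)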
1 parent 4ab87ff commit eb76404

24 files changed (+4194 −501 lines)

llvm/lib/Target/AArch64/AArch64.h

Lines changed: 2 additions & 0 deletions
@@ -60,6 +60,7 @@ FunctionPass *createAArch64CleanupLocalDynamicTLSPass();
 FunctionPass *createAArch64CollectLOHPass();
 FunctionPass *createSMEABIPass();
 FunctionPass *createSMEPeepholeOptPass();
+FunctionPass *createMachineSMEABIPass();
 ModulePass *createSVEIntrinsicOptsPass();
 InstructionSelector *
 createAArch64InstructionSelector(const AArch64TargetMachine &,
@@ -111,6 +112,7 @@ void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
 void initializeLDTLSCleanupPass(PassRegistry&);
 void initializeSMEABIPass(PassRegistry &);
 void initializeSMEPeepholeOptPass(PassRegistry &);
+void initializeMachineSMEABIPass(PassRegistry &);
 void initializeSVEIntrinsicOptsPass(PassRegistry &);
 void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
 } // end namespace llvm

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp

Lines changed: 43 additions & 14 deletions
@@ -92,8 +92,9 @@ class AArch64ExpandPseudo : public MachineFunctionPass {
   bool expandCALL_BTI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI);
   bool expandStoreSwiftAsyncContext(MachineBasicBlock &MBB,
                                     MachineBasicBlock::iterator MBBI);
-  MachineBasicBlock *expandRestoreZA(MachineBasicBlock &MBB,
-                                     MachineBasicBlock::iterator MBBI);
+  MachineBasicBlock *
+  expandCommitOrRestoreZASave(MachineBasicBlock &MBB,
+                              MachineBasicBlock::iterator MBBI);
   MachineBasicBlock *expandCondSMToggle(MachineBasicBlock &MBB,
                                         MachineBasicBlock::iterator MBBI);
 };
@@ -990,44 +991,69 @@ bool AArch64ExpandPseudo::expandStoreSwiftAsyncContext(
   return true;
 }
 
-MachineBasicBlock *
-AArch64ExpandPseudo::expandRestoreZA(MachineBasicBlock &MBB,
-                                     MachineBasicBlock::iterator MBBI) {
+static constexpr unsigned ZERO_ALL_ZA_MASK = 0b11111111;
+
+MachineBasicBlock *AArch64ExpandPseudo::expandCommitOrRestoreZASave(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) {
   MachineInstr &MI = *MBBI;
+  bool IsRestoreZA = MI.getOpcode() == AArch64::RestoreZAPseudo;
+  assert((MI.getOpcode() == AArch64::RestoreZAPseudo ||
+          MI.getOpcode() == AArch64::CommitZASavePseudo) &&
+         "Expected ZA commit or restore");
   assert((std::next(MBBI) != MBB.end() ||
           MI.getParent()->successors().begin() !=
               MI.getParent()->successors().end()) &&
          "Unexpected unreachable in block that restores ZA");
 
   // Compare TPIDR2_EL0 value against 0.
   DebugLoc DL = MI.getDebugLoc();
-  MachineInstrBuilder Cbz = BuildMI(MBB, MBBI, DL, TII->get(AArch64::CBZX))
-                                .add(MI.getOperand(0));
+  MachineInstrBuilder Branch =
+      BuildMI(MBB, MBBI, DL,
+              TII->get(IsRestoreZA ? AArch64::CBZX : AArch64::CBNZX))
+          .add(MI.getOperand(0));
 
   // Split MBB and create two new blocks:
   //  - MBB now contains all instructions before RestoreZAPseudo.
-  //  - SMBB contains the RestoreZAPseudo instruction only.
-  //  - EndBB contains all instructions after RestoreZAPseudo.
+  //  - SMBB contains the [Commit|RestoreZA]Pseudo instruction only.
+  //  - EndBB contains all instructions after [Commit|RestoreZA]Pseudo.
   MachineInstr &PrevMI = *std::prev(MBBI);
   MachineBasicBlock *SMBB = MBB.splitAt(PrevMI, /*UpdateLiveIns*/ true);
   MachineBasicBlock *EndBB = std::next(MI.getIterator()) == SMBB->end()
                                  ? *SMBB->successors().begin()
                                  : SMBB->splitAt(MI, /*UpdateLiveIns*/ true);
 
-  // Add the SMBB label to the TB[N]Z instruction & create a branch to EndBB.
-  Cbz.addMBB(SMBB);
+  // Add the SMBB label to the CB[N]Z instruction & create a branch to EndBB.
+  Branch.addMBB(SMBB);
   BuildMI(&MBB, DL, TII->get(AArch64::B))
       .addMBB(EndBB);
   MBB.addSuccessor(EndBB);
 
   // Replace the pseudo with a call (BL).
   MachineInstrBuilder MIB =
       BuildMI(*SMBB, SMBB->end(), DL, TII->get(AArch64::BL));
-  MIB.addReg(MI.getOperand(1).getReg(), RegState::Implicit);
+  // Copy operands (mainly the regmask) from the pseudo.
   for (unsigned I = 2; I < MI.getNumOperands(); ++I)
     MIB.add(MI.getOperand(I));
-  BuildMI(SMBB, DL, TII->get(AArch64::B)).addMBB(EndBB);
 
+  if (IsRestoreZA) {
+    // Mark the TPIDR2 block pointer (X0) as an implicit use.
+    MIB.addReg(MI.getOperand(1).getReg(), RegState::Implicit);
+  } else /*CommitZA*/ {
+    auto *TRI = MBB.getParent()->getSubtarget().getRegisterInfo();
+    // Clear TPIDR2_EL0.
+    BuildMI(*SMBB, SMBB->end(), DL, TII->get(AArch64::MSR))
+        .addImm(AArch64SysReg::TPIDR2_EL0)
+        .addReg(AArch64::XZR);
+    bool ZeroZA = MI.getOperand(1).getImm() != 0;
+    if (ZeroZA) {
+      assert(MI.definesRegister(AArch64::ZAB0, TRI) && "should define ZA!");
+      BuildMI(*SMBB, SMBB->end(), DL, TII->get(AArch64::ZERO_M))
+          .addImm(ZERO_ALL_ZA_MASK)
+          .addDef(AArch64::ZAB0, RegState::ImplicitDefine);
+    }
+  }
+
+  BuildMI(SMBB, DL, TII->get(AArch64::B)).addMBB(EndBB);
   MI.eraseFromParent();
   return EndBB;
 }
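Read as ordinary control flow, the expansion above produces a conditional diamond around the runtime call. The following C++ sketch is an illustration only (the actual BL target and register operands come from the pseudo's operands; `arm_tpidr2_save`/`arm_tpidr2_restore` are stand-in names for the SME ABI lazy-save routines):

```cpp
#include <cstdint>

// Stand-ins for the routines reached via the BL copied from the pseudo.
extern "C" void arm_tpidr2_restore(void *Tpidr2Block);
extern "C" void arm_tpidr2_save();

void commitOrRestoreZASave(uint64_t Tpidr2, void *Tpidr2Block,
                           bool IsRestoreZA, bool ZeroZA) {
  if (IsRestoreZA) {
    // RestoreZAPseudo expands to CBZ: restore only if the callee committed
    // the lazy save (and therefore zeroed TPIDR2_EL0).
    if (Tpidr2 == 0)
      arm_tpidr2_restore(Tpidr2Block); // BL in SMBB; X0 = TPIDR2 block.
  } else {
    // CommitZASavePseudo expands to CBNZ: commit only if a lazy save is
    // still pending (TPIDR2_EL0 is non-null).
    if (Tpidr2 != 0) {
      arm_tpidr2_save(); // BL in SMBB.
      // Then MSR TPIDR2_EL0, xzr clears the lazy-save pointer, and
      // ZERO {za} runs when the pseudo's immediate (ZeroZA) requests it.
      (void)ZeroZA;
    }
  }
}
```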
@@ -1646,8 +1672,9 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     return expandCALL_BTI(MBB, MBBI);
   case AArch64::StoreSwiftAsyncContext:
     return expandStoreSwiftAsyncContext(MBB, MBBI);
+  case AArch64::CommitZASavePseudo:
   case AArch64::RestoreZAPseudo: {
-    auto *NewMBB = expandRestoreZA(MBB, MBBI);
+    auto *NewMBB = expandCommitOrRestoreZASave(MBB, MBBI);
     if (NewMBB != &MBB)
       NextMBBI = MBB.end(); // The NextMBBI iterator is invalidated.
     return true;
@@ -1658,6 +1685,8 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     NextMBBI = MBB.end(); // The NextMBBI iterator is invalidated.
     return true;
   }
+  case AArch64::InOutZAUsePseudo:
+  case AArch64::RequiresZASavePseudo:
   case AArch64::COALESCER_BARRIER_FPR16:
   case AArch64::COALESCER_BARRIER_FPR32:
   case AArch64::COALESCER_BARRIER_FPR64:

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Lines changed: 91 additions & 58 deletions
@@ -17,6 +17,7 @@
 #include "AArch64PerfectShuffle.h"
 #include "AArch64RegisterInfo.h"
 #include "AArch64Subtarget.h"
+#include "AArch64TargetMachine.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
 #include "Utils/AArch64BaseInfo.h"
 #include "Utils/AArch64SMEAttributes.h"
@@ -1998,6 +1999,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
     setOperationAction(Op, MVT::f16, Promote);
   }
 
+const AArch64TargetMachine &AArch64TargetLowering::getTM() const {
+  return static_cast<const AArch64TargetMachine &>(getTargetMachine());
+}
+
 void AArch64TargetLowering::addTypeForNEON(MVT VT) {
   assert(VT.isVector() && "VT should be a vector type");
 
@@ -8285,53 +8290,54 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
   if (Subtarget->hasCustomCallingConv())
     Subtarget->getRegisterInfo()->UpdateCustomCalleeSavedRegs(MF);
 
-  // Create a 16 Byte TPIDR2 object. The dynamic buffer
-  // will be expanded and stored in the static object later using a pseudonode.
-  if (Attrs.hasZAState()) {
-    TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
-    TPIDR2.FrameIndex = MFI.CreateStackObject(16, Align(16), false);
-    SDValue SVL = DAG.getNode(AArch64ISD::RDSVL, DL, MVT::i64,
-                              DAG.getConstant(1, DL, MVT::i32));
-
-    SDValue Buffer;
-    if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
-      Buffer = DAG.getNode(AArch64ISD::ALLOCATE_ZA_BUFFER, DL,
-                           DAG.getVTList(MVT::i64, MVT::Other), {Chain, SVL});
-    } else {
-      SDValue Size = DAG.getNode(ISD::MUL, DL, MVT::i64, SVL, SVL);
-      Buffer = DAG.getNode(ISD::DYNAMIC_STACKALLOC, DL,
-                           DAG.getVTList(MVT::i64, MVT::Other),
-                           {Chain, Size, DAG.getConstant(1, DL, MVT::i64)});
-      MFI.CreateVariableSizedObject(Align(16), nullptr);
-    }
-    Chain = DAG.getNode(
-        AArch64ISD::INIT_TPIDR2OBJ, DL, DAG.getVTList(MVT::Other),
-        {/*Chain*/ Buffer.getValue(1), /*Buffer ptr*/ Buffer.getValue(0)});
-  } else if (Attrs.hasAgnosticZAInterface()) {
-    // Call __arm_sme_state_size().
-    SDValue BufferSize =
-        DAG.getNode(AArch64ISD::GET_SME_SAVE_SIZE, DL,
-                    DAG.getVTList(MVT::i64, MVT::Other), Chain);
-    Chain = BufferSize.getValue(1);
-
-    SDValue Buffer;
-    if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
-      Buffer =
-          DAG.getNode(AArch64ISD::ALLOC_SME_SAVE_BUFFER, DL,
-                      DAG.getVTList(MVT::i64, MVT::Other), {Chain, BufferSize});
-    } else {
-      // Allocate space dynamically.
-      Buffer = DAG.getNode(
-          ISD::DYNAMIC_STACKALLOC, DL, DAG.getVTList(MVT::i64, MVT::Other),
-          {Chain, BufferSize, DAG.getConstant(1, DL, MVT::i64)});
-      MFI.CreateVariableSizedObject(Align(16), nullptr);
+  if (!getTM().useNewSMEABILowering() || Attrs.hasAgnosticZAInterface()) {
+    // Old SME ABI lowering (deprecated):
+    // Create a 16 Byte TPIDR2 object. The dynamic buffer
+    // will be expanded and stored in the static object later using a
+    // pseudonode.
+    if (Attrs.hasZAState()) {
+      TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
+      TPIDR2.FrameIndex = MFI.CreateStackObject(16, Align(16), false);
+      SDValue SVL = DAG.getNode(AArch64ISD::RDSVL, DL, MVT::i64,
+                                DAG.getConstant(1, DL, MVT::i32));
+      SDValue Buffer;
+      if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
+        Buffer = DAG.getNode(AArch64ISD::ALLOCATE_ZA_BUFFER, DL,
+                             DAG.getVTList(MVT::i64, MVT::Other), {Chain, SVL});
+      } else {
+        SDValue Size = DAG.getNode(ISD::MUL, DL, MVT::i64, SVL, SVL);
+        Buffer = DAG.getNode(ISD::DYNAMIC_STACKALLOC, DL,
+                             DAG.getVTList(MVT::i64, MVT::Other),
+                             {Chain, Size, DAG.getConstant(1, DL, MVT::i64)});
+        MFI.CreateVariableSizedObject(Align(16), nullptr);
+      }
+      Chain = DAG.getNode(
+          AArch64ISD::INIT_TPIDR2OBJ, DL, DAG.getVTList(MVT::Other),
+          {/*Chain*/ Buffer.getValue(1), /*Buffer ptr*/ Buffer.getValue(0)});
+    } else if (Attrs.hasAgnosticZAInterface()) {
+      // Call __arm_sme_state_size().
+      SDValue BufferSize =
+          DAG.getNode(AArch64ISD::GET_SME_SAVE_SIZE, DL,
+                      DAG.getVTList(MVT::i64, MVT::Other), Chain);
+      Chain = BufferSize.getValue(1);
+      SDValue Buffer;
+      if (!Subtarget->isTargetWindows() && !hasInlineStackProbe(MF)) {
+        Buffer = DAG.getNode(AArch64ISD::ALLOC_SME_SAVE_BUFFER, DL,
+                             DAG.getVTList(MVT::i64, MVT::Other),
+                             {Chain, BufferSize});
+      } else {
+        // Allocate space dynamically.
+        Buffer = DAG.getNode(
+            ISD::DYNAMIC_STACKALLOC, DL, DAG.getVTList(MVT::i64, MVT::Other),
+            {Chain, BufferSize, DAG.getConstant(1, DL, MVT::i64)});
+        MFI.CreateVariableSizedObject(Align(16), nullptr);
+      }
+      // Copy the value to a virtual register, and save that in FuncInfo.
+      Register BufferPtr =
+          MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
+      FuncInfo->setSMESaveBufferAddr(BufferPtr);
+      Chain = DAG.getCopyToReg(Chain, DL, BufferPtr, Buffer);
     }
-
-    // Copy the value to a virtual register, and save that in FuncInfo.
-    Register BufferPtr =
-        MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
-    FuncInfo->setSMESaveBufferAddr(BufferPtr);
-    Chain = DAG.getCopyToReg(Chain, DL, BufferPtr, Buffer);
   }
 
   if (CallConv == CallingConv::PreserveNone) {
@@ -8348,6 +8354,15 @@ SDValue AArch64TargetLowering::LowerFormalArguments(
     }
   }
 
+  if (getTM().useNewSMEABILowering()) {
+    // Clear new ZT0 state. TODO: Move this to the SME ABI pass.
+    if (Attrs.isNewZT0())
+      Chain = DAG.getNode(
+          ISD::INTRINSIC_VOID, DL, MVT::Other, Chain,
+          DAG.getConstant(Intrinsic::aarch64_sme_zero_zt, DL, MVT::i32),
+          DAG.getTargetConstant(0, DL, MVT::i32));
+  }
+
   return Chain;
 }
 
@@ -8919,7 +8934,6 @@ static SDValue emitSMEStateSaveRestore(const AArch64TargetLowering &TLI,
   MachineFunction &MF = DAG.getMachineFunction();
   AArch64FunctionInfo *FuncInfo = MF.getInfo<AArch64FunctionInfo>();
   FuncInfo->setSMESaveBufferUsed();
-
   TargetLowering::ArgListTy Args;
   Args.emplace_back(
       DAG.getCopyFromReg(Chain, DL, Info->getSMESaveBufferAddr(), MVT::i64),
90609074
CallConv = CallingConv::AArch64_SVE_VectorCall;
90619075
}
90629076

9077+
// Determine whether we need any streaming mode changes.
9078+
SMECallAttrs CallAttrs = getSMECallAttrs(MF.getFunction(), *this, CLI);
9079+
bool UseNewSMEABILowering = getTM().useNewSMEABILowering();
9080+
bool IsAgnosticZAFunction = CallAttrs.caller().hasAgnosticZAInterface();
9081+
auto ZAMarkerNode = [&]() -> std::optional<unsigned> {
9082+
// TODO: Handle agnostic ZA functions.
9083+
if (!UseNewSMEABILowering || IsAgnosticZAFunction)
9084+
return std::nullopt;
9085+
if (!CallAttrs.caller().hasZAState() && !CallAttrs.caller().hasZT0State())
9086+
return std::nullopt;
9087+
return CallAttrs.requiresLazySave() ? AArch64ISD::REQUIRES_ZA_SAVE
9088+
: AArch64ISD::INOUT_ZA_USE;
9089+
}();
9090+
90639091
if (IsTailCall) {
90649092
// Check if it's really possible to do a tail call.
90659093
IsTailCall = isEligibleForTailCallOptimization(CLI);
90669094

90679095
// A sibling call is one where we're under the usual C ABI and not planning
90689096
// to change that but can still do a tail call:
9069-
if (!TailCallOpt && IsTailCall && CallConv != CallingConv::Tail &&
9070-
CallConv != CallingConv::SwiftTail)
9097+
if (!ZAMarkerNode && !TailCallOpt && IsTailCall &&
9098+
CallConv != CallingConv::Tail && CallConv != CallingConv::SwiftTail)
90719099
IsSibCall = true;
90729100

90739101
if (IsTailCall)
@@ -9119,9 +9147,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
91199147
assert(FPDiff % 16 == 0 && "unaligned stack on tail call");
91209148
}
91219149

9122-
// Determine whether we need any streaming mode changes.
9123-
SMECallAttrs CallAttrs = getSMECallAttrs(MF.getFunction(), *this, CLI);
9124-
91259150
auto DescribeCallsite =
91269151
[&](OptimizationRemarkAnalysis &R) -> OptimizationRemarkAnalysis & {
91279152
R << "call from '" << ore::NV("Caller", MF.getName()) << "' to '";
@@ -9135,7 +9160,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
91359160
return R;
91369161
};
91379162

9138-
bool RequiresLazySave = CallAttrs.requiresLazySave();
9163+
bool RequiresLazySave = !UseNewSMEABILowering && CallAttrs.requiresLazySave();
91399164
bool RequiresSaveAllZA = CallAttrs.requiresPreservingAllZAState();
91409165
if (RequiresLazySave) {
91419166
const TPIDR2Object &TPIDR2 = FuncInfo->getTPIDR2Obj();
@@ -9210,10 +9235,20 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
         AArch64ISD::SMSTOP, DL, DAG.getVTList(MVT::Other, MVT::Glue), Chain,
         DAG.getTargetConstant((int32_t)(AArch64SVCR::SVCRZA), DL, MVT::i32));
 
-  // Adjust the stack pointer for the new arguments...
+  // Adjust the stack pointer for the new arguments... and mark ZA uses.
   // These operations are automatically eliminated by the prolog/epilog pass
-  if (!IsSibCall)
+  assert((!IsSibCall || !ZAMarkerNode) && "ZA markers require CALLSEQ_START");
+  if (!IsSibCall) {
     Chain = DAG.getCALLSEQ_START(Chain, IsTailCall ? 0 : NumBytes, 0, DL);
+    if (ZAMarkerNode) {
+      // Note: We need the CALLSEQ_START to glue the ZAMarkerNode to, simply
+      // using a chain can result in incorrect scheduling. The markers refer to
+      // the position just before the CALLSEQ_START (though occur after as
+      // CALLSEQ_START lacks in-glue).
+      Chain = DAG.getNode(*ZAMarkerNode, DL, DAG.getVTList(MVT::Other),
+                          {Chain, Chain.getValue(1)});
+    }
+  }
 
   SDValue StackPtr = DAG.getCopyFromReg(Chain, DL, AArch64::SP,
                                         getPointerTy(DAG.getDataLayout()));
@@ -9684,7 +9719,7 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
     }
   }
 
-  if (CallAttrs.requiresEnablingZAAfterCall())
+  if (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall())
     // Unconditionally resume ZA.
     Result = DAG.getNode(
         AArch64ISD::SMSTART, DL, DAG.getVTList(MVT::Other, MVT::Glue), Result,
@@ -9706,7 +9741,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
     SDValue TPIDR2_EL0 = DAG.getNode(
         ISD::INTRINSIC_W_CHAIN, DL, MVT::i64, Result,
         DAG.getConstant(Intrinsic::aarch64_sme_get_tpidr2, DL, MVT::i32));
-
     // Copy the address of the TPIDR2 block into X0 before 'calling' the
     // RESTORE_ZA pseudo.
     SDValue Glue;
@@ -9718,7 +9752,6 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
         DAG.getNode(AArch64ISD::RESTORE_ZA, DL, MVT::Other,
                     {Result, TPIDR2_EL0, DAG.getRegister(AArch64::X0, MVT::i64),
                      RestoreRoutine, RegMask, Result.getValue(1)});
-
     // Finally reset the TPIDR2_EL0 register to 0.
     Result = DAG.getNode(
         ISD::INTRINSIC_VOID, DL, MVT::Other, Result,
llvm/lib/Target/AArch64/AArch64ISelLowering.h

Lines changed: 8 additions & 0 deletions
@@ -23,6 +23,8 @@
 
 namespace llvm {
 
+class AArch64TargetMachine;
+
 namespace AArch64 {
 /// Possible values of current rounding mode, which is specified in bits
 /// 23:22 of FPCR.
@@ -64,6 +66,8 @@ class AArch64TargetLowering : public TargetLowering {
   explicit AArch64TargetLowering(const TargetMachine &TM,
                                  const AArch64Subtarget &STI);
 
+  const AArch64TargetMachine &getTM() const;
+
   /// Control the following reassociation of operands: (op (op x, c1), y) -> (op
   /// (op x, y), c1) where N0 is (op x, c1) and N1 is y.
   bool isReassocProfitable(SelectionDAG &DAG, SDValue N0,
@@ -173,6 +177,10 @@ class AArch64TargetLowering : public TargetLowering {
   MachineBasicBlock *EmitZTInstr(MachineInstr &MI, MachineBasicBlock *BB,
                                  unsigned Opcode, bool Op0IsDef) const;
   MachineBasicBlock *EmitZero(MachineInstr &MI, MachineBasicBlock *BB) const;
+
+  // Note: The following group of functions are only used as part of the old SME
+  // ABI lowering. They will be removed once -aarch64-new-sme-abi=true is the
+  // default.
   MachineBasicBlock *EmitInitTPIDR2Object(MachineInstr &MI,
                                           MachineBasicBlock *BB) const;
   MachineBasicBlock *EmitAllocateZABuffer(MachineInstr &MI,

0 commit comments
