Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions llvm/include/llvm/CodeGen/MachineFunction.h
Original file line number Diff line number Diff line change
Expand Up @@ -1168,6 +1168,8 @@ class LLVM_ABI MachineFunction {
/// Allocate and initialize a register mask with @p NumRegister bits.
uint32_t *allocateRegMask();

MutableArrayRef<uint32_t> allocateRegMaskArray();

ArrayRef<int> allocateShuffleMask(ArrayRef<int> Mask);

/// Allocate and construct an extra info structure for a `MachineInstr`.
Expand Down
6 changes: 5 additions & 1 deletion llvm/lib/CodeGen/MachineFunction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -620,11 +620,15 @@ const char *MachineFunction::createExternalSymbolName(StringRef Name) {
}

uint32_t *MachineFunction::allocateRegMask() {
return allocateRegMaskArray().data();
}

MutableArrayRef<uint32_t> MachineFunction::allocateRegMaskArray() {
unsigned NumRegs = getSubtarget().getRegisterInfo()->getNumRegs();
unsigned Size = MachineOperand::getRegMaskSize(NumRegs);
uint32_t *Mask = Allocator.Allocate<uint32_t>(Size);
memset(Mask, 0, Size * sizeof(Mask[0]));
return Mask;
return {Mask, Size};
}

ArrayRef<int> MachineFunction::allocateShuffleMask(ArrayRef<int> Mask) {
Expand Down
22 changes: 22 additions & 0 deletions llvm/lib/CodeGen/MachineOutliner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1117,6 +1117,7 @@ bool MachineOutliner::outline(
// instruction. It also updates call site information for moved
// code.
SmallSet<Register, 2> UseRegs, DefRegs;
SmallPtrSet<const uint32_t *, 2> RegMasks;
// Copy over the defs in the outlined range.
// First inst in outlined range <-- Anything that's defined in this
// ... .. range has to be added as an
Expand All @@ -1130,6 +1131,12 @@ bool MachineOutliner::outline(
MachineInstr *MI = &*Iter;
SmallSet<Register, 2> InstrUseRegs;
for (MachineOperand &MOP : MI->operands()) {
// Collect all regmasks. Merge them in the end.
if (MOP.isRegMask()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you attempt to optimize the case with 1 just for copying the pointer. But I'd simply merging reg mask in place here, without using the extra set.
Basically, you could declare RegMaskSize outside the loop, as it's a constant for the function.

uint32_t *RegMask = nullptr;
...
            if (MOP.isRegMask()) {
              const uint32_t *CurrRegMask = MOP.getRegMask();
              if (!RegMask) {
                RegMask = MF->allocateRegMask();
                memcpy(RegMask, CurrRegMask, sizeof(RegMask[0]) * RegMaskSize);
              } else {
                for (unsigned I = 0; I < RegMaskSize; ++I)
                  RegMask[I] &= CurrRegMask[I];
              }
              continue;
            }
...
            if (RegMask)
              CallInst->addOperand(MachineOperand::CreateRegMask(RegMask));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe 1-regmask is the most common case and the memory allocation in MF->allocateRegMask() should be avoided if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of RegMasks depends on the number of call instructions in the outlined candidate, and I believe this usually one. However, I think you might be forgetting that non-call instruction can clobber a register. My understanding of RegMask is that it tells you which registers could be changed after executing that instruction.

/// A RegMask operand represents a set of non-clobbered physical registers
/// on an instruction that clobbers many registers, typically a call. The
/// bit mask has a bit set for each physreg that is preserved by this
/// instruction, as described in the documentation for
/// TargetRegisterInfo::getCallPreservedMask().
///
/// Any physreg with a 0 bit in the mask is clobbered by the instruction.

But an outlined function might have other instructions as well as call instructions. I think you need to mark MOP.isDef() registers as clobbered in the mask too (I believe this is what @MatzeB was alluding to).

In that case, we will likely need to copy the mask using allocateRegMask() anyway, so I think @kyulee-com suggestion might be worth considering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to mark registers from def operands in the mask, I have a follow-up question: should we drop all the def operands in favor of using regmask, or should we add the masked registers to the def list as well? My understanding is that regmask and def operands serve different purposes - regmask is primarily used to describe calling conventions, while def operands provide additional information specific to the callee. Including all defs in the regmask seems redundant to me. Could anyone clarify the intended distinction between these two, or share best practices for handling this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. It seems like RegMask and Def/Use registers hold similar information. I think the RegMask is responsible only to tell you which registers are clobbered after an instruction (usually a call). Def/Use registers are required to know which registers to pass as inputs to a function, and which to use a return values.

There is similar code in FastISel.cpp where it uses both Def/Use (In/Out) registers and a RegMask.

// Push the arguments from the call instruction.
for (auto Reg : CLI.OutRegs)
Ops.push_back(MachineOperand::CreateReg(Reg, /*isDef=*/false));
// Push live variables for the stack map.
if (!addStackMapLiveVars(Ops, I, NumMetaOpers + NumArgs))
return false;
// Push the register mask info.
Ops.push_back(MachineOperand::CreateRegMask(
TRI.getCallPreservedMask(*FuncInfo.MF, CC)));
// Add scratch registers as implicit def and early clobber.
const MCPhysReg *ScratchRegs = TLI.getScratchRegisters(CC);
for (unsigned i = 0; ScratchRegs[i]; ++i)
Ops.push_back(MachineOperand::CreateReg(
ScratchRegs[i], /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
/*isDead=*/false, /*isUndef=*/false, /*isEarlyClobber=*/true));
// Add implicit defs (return values).
for (auto Reg : CLI.InRegs)
Ops.push_back(MachineOperand::CreateReg(Reg, /*isDef=*/true,
/*isImp=*/true));

RegMasks.insert(MOP.getRegMask());
continue;
}

// Skip over anything that isn't a register.
if (!MOP.isReg())
continue;
Expand All @@ -1153,6 +1160,21 @@ bool MachineOutliner::outline(
MI->getMF()->eraseAdditionalCallInfo(MI);
}

if (!RegMasks.empty()) {
if (RegMasks.size() == 1) {
CallInst->addOperand(
MachineOperand::CreateRegMask(*RegMasks.begin()));
} else {
auto RegMask = MF->allocateRegMaskArray();
for (unsigned I = 0; I < RegMask.size(); ++I)
RegMask[I] = UINT32_MAX;
for (const uint32_t *Mask : RegMasks)
for (unsigned I = 0; I < RegMask.size(); ++I)
RegMask[I] &= Mask[I];
CallInst->addOperand(MachineOperand::CreateRegMask(RegMask.data()));
}
}

for (const Register &I : DefRegs)
// If it's a def, add it to the call instruction.
CallInst->addOperand(
Expand Down
95 changes: 95 additions & 0 deletions llvm/test/CodeGen/AArch64/machine-outliner-regmask.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# RUN: llc -mtriple=aarch64-apple-ios -run-pass=prologepilog -run-pass=machine-outliner %s -o - | FileCheck %s
--- |
declare swiftcc void @bar()
declare void @baz(i32, i32, i32) #0

define void @test_same_regmask() #0 {
ret void
}
define void @test_different_regmasks() #0 {
ret void
}
define void @foo(i32, i32, i32, i32, i32, i32, i32, i32) #0 {
ret void
}

attributes #0 = { minsize }
...
---
name: foo
tracksRegLiveness: true
body: |
bb.0:
RET undef $lr


...
---
name: test_same_regmask
tracksRegLiveness: true
body: |
; CHECK-LABEL: name: test_same_regmask
; CHECK: bb.1:
; CHECK-NEXT: BL @OUTLINED_FUNCTION_1, csr_aarch64_aapcs

bb.0:
$sp = frame-setup SUBXri $sp, 16, 0

bb.1:
$w0 = MOVZWi 1, 0
$w1 = MOVZWi 2, 0
$w2 = MOVZWi 3, 0
BL @baz, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $w0, implicit killed $w1, implicit killed $w2, implicit-def $sp
$w0 = MOVZWi 1, 0
$w1 = MOVZWi 2, 0
$w2 = MOVZWi 3, 0
BL @baz, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $w0, implicit killed $w1, implicit killed $w2, implicit-def $sp
$sp = ADDXri $sp, 16, 0
RET undef $lr


...
---
name: test_different_regmasks
tracksRegLiveness: true
body: |
; CHECK-LABEL: name: test_different_regmasks
; CHECK: bb.1:
; CHECK-NEXT: BL @OUTLINED_FUNCTION_0, CustomRegMask($fp,$lr,$wzr,$wzr_hi,$xzr,$b8,$b9,$b10,$b11,$b12,$b13,$b14,$b15,$d8,$d9,$d10,$d11,$d12,$d13,$d14,$d15,$h8,$h9,$h10,$h11,$h12,$h13,$h14,$h15,$s8,$s9,$s10,$s11,$s12,$s13,$s14,$s15,$w19,$w20,$w22,$w23,$w24,$w25,$w26,$w27,$w28,$w29,$w30,$x19,$x20,$x22,$x23,$x24,$x25,$x26,$x27,$x28,$b8_hi,$b9_hi,$b10_hi,$b11_hi,$b12_hi,$b13_hi,$b14_hi,$b15_hi,$h8_hi,$h9_hi,$h10_hi,$h11_hi,$h12_hi,$h13_hi,$h14_hi,$h15_hi,$s8_hi,$s9_hi,$s10_hi,$s11_hi,$s12_hi,$s13_hi,$s14_hi,$s15_hi,$w19_hi,$w20_hi,$w22_hi,$w23_hi,$w24_hi,$w25_hi,$w26_hi,$w27_hi,$w28_hi,$w29_hi,$w30_hi,$d8_d9,$d9_d10,$d10_d11,$d11_d12,$d12_d13,$d13_d14,$d14_d15,$d8_d9_d10_d11,$d9_d10_d11_d12,$d10_d11_d12_d13,$d11_d12_d13_d14,$d12_d13_d14_d15,$d8_d9_d10,$d9_d10_d11,$d10_d11_d12,$d11_d12_d13,$d12_d13_d14,$d13_d14_d15,$x22_x23_x24_x25_x26_x27_x28_fp,$w22_w23,$w24_w25,$w26_w27,$w28_w29,$x28_fp,$x22_x23,$x24_x25,$x26_x27)

bb.0:
$sp = frame-setup SUBXri $sp, 16, 0

bb.1:
$w0 = MOVZWi 1, 0
$w1 = MOVZWi 2, 0
$w2 = MOVZWi 3, 0
$w3 = MOVZWi 4, 0
$w4 = MOVZWi 5, 0
$w5 = MOVZWi 6, 0
$w6 = MOVZWi 7, 0
$w7 = MOVZWi 8, 0
BL @foo, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $w0, implicit killed $w1, implicit killed $w2, implicit killed $w3, implicit killed $w4, implicit killed $w5, implicit killed $w6, implicit killed $w7, implicit-def $sp
BL @bar, csr_aarch64_aapcs_swifterror, implicit-def dead $lr, implicit $sp, implicit-def $sp
$w0 = MOVZWi 1, 0
$w1 = MOVZWi 2, 0
$w2 = MOVZWi 3, 0
$w3 = MOVZWi 4, 0
$w4 = MOVZWi 5, 0
$w5 = MOVZWi 6, 0
$w6 = MOVZWi 7, 0
$w7 = MOVZWi 8, 0
BL @foo, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $w0, implicit killed $w1, implicit killed $w2, implicit killed $w3, implicit killed $w4, implicit killed $w5, implicit killed $w6, implicit killed $w7, implicit-def $sp
BL @bar, csr_aarch64_aapcs_swifterror, implicit-def dead $lr, implicit $sp, implicit-def $sp
$w0 = MOVZWi 1, 0
$w1 = MOVZWi 2, 0
$w2 = MOVZWi 3, 0
$w3 = MOVZWi 4, 0
$w4 = MOVZWi 5, 0
$w5 = MOVZWi 6, 0
$w6 = MOVZWi 7, 0
$w7 = MOVZWi 8, 0
BL @foo, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $w0, implicit killed $w1, implicit killed $w2, implicit killed $w3, implicit killed $w4, implicit killed $w5, implicit killed $w6, implicit killed $w7, implicit-def $sp
BL @bar, csr_aarch64_aapcs_swifterror, implicit-def dead $lr, implicit $sp, implicit-def $sp
$sp = ADDXri $sp, 16, 0
RET undef $lr
Loading