Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions llvm/lib/CodeGen/LiveVariables.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,22 @@ LiveVariables::FindLastPartialDef(Register Reg,
return LastDef;
}

static void fixupLiveIns(MachineInstr &MI, MCPhysReg SubReg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't belong in this PR (and we should also delete LiveVariables)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was necessary to get the added testcase to pass; otherwise it would crash in LiveVariables.

MachineBasicBlock &MBB = *MI.getParent();
if (MBB.isLiveIn(SubReg))
return;

for (MachineBasicBlock::reverse_iterator RIt = MI.getReverseIterator();
RIt != MBB.rend(); RIt++)
if (RIt->definesRegister(SubReg, nullptr))
return;

MBB.addLiveIn(SubReg);
for (const auto &PredMBB : MBB.predecessors())
if (!PredMBB->empty())
fixupLiveIns(PredMBB->back(), SubReg);
}

/// HandlePhysRegUse - Turn previous partial def's into read/mod/writes. Add
/// implicit defs to a machine instruction if there was an earlier def of its
/// super-register.
Expand Down Expand Up @@ -279,6 +295,7 @@ void LiveVariables::HandlePhysRegUse(Register Reg, MachineInstr &MI) {
LastPartialDef->addOperand(MachineOperand::CreateReg(SubReg,
false/*IsDef*/,
true/*IsImp*/));
fixupLiveIns(MI, SubReg);
PhysRegDef[SubReg] = LastPartialDef;
Processed.insert_range(TRI->subregs(SubReg));
}
Expand Down
11 changes: 6 additions & 5 deletions llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -256,17 +256,18 @@ AMDGPUResourceUsageAnalysisImpl::analyzeResourceUsage(

const Function *Callee = getCalleeFunction(*CalleeOp);

auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
return F == &MF.getFunction();
};

// Avoid crashing on undefined behavior with an illegal call to a
// kernel. If a callsite's calling convention doesn't match the
// function's, it's undefined behavior. If the callsite calling
// convention does match, that would have errored earlier.
if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()))
if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()) &&
!isSameFunction(MF, Callee))
report_fatal_error("invalid call to entry function");
Comment on lines +267 to 269
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I may be missing something but why is this error message being constrained?
Or, to phrase it differently, why are self-recursive (entry) functions allowed beyond this error now?
Also, is there a test case that showcases this? If this change to ResourceUsageAnalysis is worth it, it may be better off in its own PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added testcase, from a fuzzer, uses self-recursion. If that's not allowed, please let me know so the fuzzer can be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, the isEntryFunctionCC-ness of this is the blocker on why it cannot be a callee, regardless of (self-)recursion. Recursion itself is possible (albeit dubious in terms of resource allocation and runtime use, but I digress) so long as it isn't marked as an entry function.


auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
return F == &MF.getFunction();
};

if (Callee && !isSameFunction(MF, Callee))
Info.Callees.push_back(Callee);

Expand Down
7 changes: 5 additions & 2 deletions llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -898,12 +898,13 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI,
TII->get(AMDGPU::V_READFIRSTLANE_B32), TmpReg)
.add(MI.getOperand(1));
MI.getOperand(1).setReg(TmpReg);
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No else after return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind I understand now.

} else if (tryMoveVGPRConstToSGPR(MI.getOperand(1), DstReg, MI.getParent(),
MI, MI.getDebugLoc())) {
I = std::next(I);
MI.eraseFromParent();
return true;
}
return true;
}
if (!SrcReg.isVirtual() || TRI->isAGPR(*MRI, SrcReg)) {
SIInstrWorklist worklist;
Expand All @@ -929,7 +930,9 @@ void SIFixSGPRCopies::analyzeVGPRToSGPRCopy(MachineInstr* MI) {
if (PHISources.contains(MI))
return;
Register DstReg = MI->getOperand(0).getReg();
const TargetRegisterClass *DstRC = MRI->getRegClass(DstReg);
const TargetRegisterClass *DstRC = DstReg.isVirtual()
? MRI->getRegClass(DstReg)
: TRI->getPhysRegBaseClass(DstReg);

V2SCopyInfo Info(getNextVGPRToSGPRCopyId(), MI,
TRI->getRegSizeInBits(*DstRC));
Expand Down
44 changes: 44 additions & 0 deletions llvm/test/CodeGen/AMDGPU/511263.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -O1 < %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be piped to FileCheck + add checks on expected output (can probably just use update_llc_test_check.py)

I think this test file could use some cleanup (manually or automated through llvm-reduce) and a file rename to also add the swdev prefix + what's being checked.

source_filename = "i1-copy-from-loop.ll"

@G = global ptr addrspace(8) poison

define amdgpu_ps void @i1_copy_from_loop(ptr addrspace(8) inreg %rsrc, i32 %tid) {
entry:
br label %for.body

for.body: ; preds = %end.loop, %entry
%i = phi i32 [ 0, %entry ], [ %i.inc, %end.loop ]
%LGV = load ptr addrspace(8), ptr @G, align 8
%cc = icmp ult i32 %i, 4
call void @i1_copy_from_loop(ptr addrspace(8) %LGV, i32 -2147483648)
br i1 %cc, label %mid.loop, label %for.end

mid.loop: ; preds = %for.body
%v = call float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) %rsrc, i32 %tid, i32 %i, i32 0, i32 0)
%cc2 = fcmp oge float %v, 0.000000e+00
br i1 %cc2, label %end.loop, label %for.end

end.loop: ; preds = %mid.loop
%i.inc = add i32 %i, 1
br label %for.body

for.end: ; preds = %mid.loop, %for.body
br i1 %cc, label %if, label %end

if: ; preds = %for.end
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float undef, float undef, float undef, float undef, i1 true, i1 true)
br label %end

end: ; preds = %if, %for.end
ret void
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
declare float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) #1

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
attributes #1 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }

This file was deleted.

65 changes: 0 additions & 65 deletions llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll

This file was deleted.

This file was deleted.

10 changes: 4 additions & 6 deletions llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 2> %t.err < %s | FileCheck %s
; RUN: FileCheck -check-prefix=ERR %s < %t.err
; FIXME: These tests cannot be tail called, and should be executed in a waterfall loop.
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s

declare hidden void @void_func_i32_inreg(i32 inreg)

Expand All @@ -20,11 +18,11 @@ define void @tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: v_writelane_b32 v40, s16, 2
; CHECK-NEXT: s_addk_i32 s32, 0x400
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_getpc_b64 s[16:17]
; CHECK-NEXT: s_add_u32 s16, s16, void_func_i32_inreg@rel32@lo+4
; CHECK-NEXT: s_addc_u32 s17, s17, void_func_i32_inreg@rel32@hi+12
; CHECK-NEXT: ; illegal copy v0 to s0
; CHECK-NEXT: v_readfirstlane_b32 s0, v0
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
; CHECK-NEXT: v_readlane_b32 s30, v40, 0
Expand Down Expand Up @@ -58,8 +56,8 @@ define void @indirect_tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: s_addc_u32 s17, s17, constant@rel32@hi+12
; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
; CHECK-NEXT: v_readfirstlane_b32 s0, v0
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: ; illegal copy v0 to s0
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
Expand Down
1 change: 0 additions & 1 deletion llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
; XFAIL: *
; REQUIRES: asserts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't require asserts, and also needs to check what is produced

; RUN: llc -mtriple=amdgcn -mcpu=bonaire -verify-machineinstrs < %s

Expand Down