Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 4 additions & 6 deletions llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1417,9 +1417,8 @@ static bool shouldRunLdsBranchVmemWARHazardFixup(const MachineFunction &MF,
bool HasVmem = false;
for (auto &MBB : MF) {
for (auto &MI : MBB) {
HasLds |= SIInstrInfo::isDS(MI);
HasVmem |= (SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) ||
SIInstrInfo::isSegmentSpecificFLAT(MI);
HasLds |= SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI);
HasVmem |= SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isLDSDMA(MI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having a deep understanding of the hazard, it is not clear to me why you would exclude isLDSDMA here. I.e. why shouldn't an instruction like GLOBAL_LOAD_LDS_DWORD satisfy both the HasLds and HasVmem conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I currently don't have a deep understanding either. While working on the other PR, I just noticed that the current implementation only considers FLAT instruction targeting Global/Scratch memory and I thought that it might make more sense to exclude LDS DMA instead.
However, I'm not sure what is actually supported by the ISA. I mean, we have an LDS bit in the FLAT instruction encoding which is described as "0 = normal, 1 = transfer data between LDS and memory instead of VGPRs and memory", which would suggest that no vgprs are touched. On the other hand, FLAT_Global_Load_LDS_Pseudo still uses VM_CNT and has SchedRW = [WriteVMEM, WriteLDS].
(Note, though, that this change introduces more synchronization in the current tests)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDS covers the encoding, it's not a hasLds test. GLOBAL_LOAD_LDS_DWORD is definitively a FLAT encoded instruction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, the HasLds part makes sense. I don't understand the exemption on HasVmem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case a VMEM instruction does LDS DMA, it doesn't actually access the vector memory anymore, or is this assumption wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do actually flat FLAT instructions support the LDS bit? I thought the bit only applied to GLOBAL_* instructions (or in some ISAs the dedicated LDS opcodes), but I can't seem to find this as a documented restriction.

I'd guess they don't really support LDS->LDS copy, so I would assume they implicitly assume a global memory source address if the bit isn't just ignored. In any case I don't think we ever codegen a FLAT_* instruction with the LDS bit.

I think the intent here was to check for GLOBAL_* or SCRATCH_* instructions, excluding FLAT_* instructions (but all 3 sets are FLAT encoded instructions)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent here was to check for GLOBAL_* or SCRATCH_* instructions, excluding FLAT_* instructions (but all 3 sets are FLAT encoded instructions)

Agreed, that matches the GFX9/GFX10 docs that I have found. Only MUBUF, GLOBAL_* and SCRATCH_* instructions support load-to-LDS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what we actually want for HasVmem is just plain SIInstrInfo::isVMEM(MI)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what we actually want for HasVmem is just plain SIInstrInfo::isVMEM(MI)?

That seems trivially correct to me. Anything different would be weird and would need justification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

if (HasLds && HasVmem)
return true;
}
Expand All @@ -1441,10 +1440,9 @@ bool GCNHazardRecognizer::fixLdsBranchVmemWARHazard(MachineInstr *MI) {
assert(!ST.hasExtendedWaitCounts());

auto IsHazardInst = [](const MachineInstr &MI) {
if (SIInstrInfo::isDS(MI))
if (SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI))
return 1;
if ((SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isFLAT(MI)) ||
SIInstrInfo::isSegmentSpecificFLAT(MI))
if (SIInstrInfo::isVMEM(MI) && !SIInstrInfo::isLDSDMA(MI))
return 2;
return 0;
};
Expand Down
8 changes: 6 additions & 2 deletions llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
Original file line number Diff line number Diff line change
Expand Up @@ -269,11 +269,15 @@ body: |
S_ENDPGM 0
...

# GCN-LABEL: name: no_hazard_lds_branch_flat
# FLAT_* instructions "look at the per-workitem address and determine for each
# work item if the target memory address is in global, private or scratch
# memory" (RDNA2 ISA)
# GCN-LABEL: name: hazard_lds_branch_flat
# GCN: bb.1:
# GFX10-NEXT: S_WAITCNT_VSCNT undef $sgpr_null, 0
# GCN-NEXT: FLAT_LOAD_DWORD
---
name: no_hazard_lds_branch_flat
name: hazard_lds_branch_flat
body: |
bb.0:
successors: %bb.1
Expand Down