Merged
15 changes: 15 additions & 0 deletions llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -989,6 +989,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setOperationAction(ISD::FLOG2, {MVT::v2f16, MVT::v2bf16}, Expand);
}

setOperationAction(ISD::ADDRSPACECAST, {MVT::i32, MVT::i64}, Custom);

// No FPOW or FREM in PTX.

// Now deduce the information based on the above mentioned
@@ -2652,6 +2654,8 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
return SDValue();
case ISD::FRAMEADDR:
return SDValue();
case ISD::ADDRSPACECAST:
return LowerADDRSPACECAST(Op, DAG);
case ISD::GlobalAddress:
return LowerGlobalAddress(Op, DAG);
case ISD::INTRINSIC_W_CHAIN:
@@ -2767,6 +2771,17 @@ unsigned NVPTXTargetLowering::getJumpTableEncoding() const {
return MachineJumpTableInfo::EK_Inline;
}

SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op,
                                                SelectionDAG &DAG) const {
AddrSpaceCastSDNode *N = cast<AddrSpaceCastSDNode>(Op.getNode());
unsigned SrcAS = N->getSrcAddressSpace();
unsigned DestAS = N->getDestAddressSpace();
if (SrcAS != llvm::ADDRESS_SPACE_GENERIC &&
DestAS != llvm::ADDRESS_SPACE_GENERIC)
return DAG.getUNDEF(Op.getValueType());
return Op;
}

// This function is almost a copy of SelectionDAG::expandVAArg().
// The only diff is that this one produces loads from local address space.
SDValue NVPTXTargetLowering::LowerVAARG(SDValue Op, SelectionDAG &DAG) const {
1 change: 1 addition & 0 deletions llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -264,6 +264,7 @@ class NVPTXTargetLowering : public TargetLowering {
const NVPTXSubtarget &STI; // cache the subtarget here
SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;

SDValue LowerADDRSPACECAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
21 changes: 16 additions & 5 deletions llvm/test/CodeGen/NVPTX/addrspacecast.ll
@@ -1,15 +1,15 @@
-; RUN: llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,CLS32,G32
-; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64,G64
-; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr| FileCheck %s -check-prefixes=ALL,PTRCONV,CLS64,G64
+; RUN: llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,CLS32
+; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64
+; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | FileCheck %s -check-prefixes=ALL,PTRCONV,CLS64
; RUN: %if ptxas && !ptxas-12.0 %{ llc -O0 < %s -mtriple=nvptx -mcpu=sm_20 | %ptxas-verify %}
; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 | %ptxas-verify %}
; RUN: %if ptxas %{ llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_20 --nvptx-short-ptr | %ptxas-verify %}

; ALL-LABEL: conv1
define i32 @conv1(ptr addrspace(1) %ptr) {
-; G32: cvta.global.u32
+; CLS32: cvta.global.u32
; ALL-NOT: cvt.u64.u32
-; G64: cvta.global.u64
+; CLS64: cvta.global.u64
; ALL: ld.u32
%genptr = addrspacecast ptr addrspace(1) %ptr to ptr
%val = load i32, ptr %genptr
@@ -99,6 +99,17 @@ define i32 @conv8(ptr %ptr) {
ret i32 %val
}

; ALL-LABEL: conv9
define i32 @conv9(ptr addrspace(1) %ptr) {
; CLS32: // implicit-def: %[[ADDR:r[0-9]+]]
; PTRCONV: // implicit-def: %[[ADDR:r[0-9]+]]
; NOPTRCONV: // implicit-def: %[[ADDR:rd[0-9]+]]
; ALL: ld.shared.u32 %r{{[0-9]+}}, [%[[ADDR]]]
%specptr = addrspacecast ptr addrspace(1) %ptr to ptr addrspace(3)
Member: I'm not convinced that allowing ASCs via the generic AS for any AS combination is the right thing to do here.

While we technically can generate PTX that compiles, doing so when it's clearly an error is not a great choice, IMO. It pushes error detection from the compilation phase to runtime and substantially raises the cost of dealing with the consequences. While I agree that diagnosing by crashing is not a good user interface, not diagnosing the problem at all is worse.

I think an incompatible ASC in IR should still be an error. In this particular case we have all the information we need to diagnose the invalid ASC early on (an IR validation pass on load, perhaps?) and may be able to fail with a somewhat more sensible diagnostic via llvm::report_fatal_error.

Contributor: I disagree; I think we ought to handle invalid addrspacecasts as poison and stop treating them as a backend error. As it stands, it is possible to write an assume that introduces UB, resulting in a compiler error that depends on the optimization level, which is a bad property to have.

Member: Are you saying that if such an invalid ASC were placed in a conditional branch that may or may not be eliminated, it would result in a back-end crash when that branch was not eliminated by the optimizations?

If that's the case, then it's exactly the problem we have now, with the back-end crashing when we have no way to lower the bad ASC, and I agree that a crash in the back-end is not something we want. It's way too late.

I was thinking of diagnosing the error early on, if possible, i.e. treating it as if it were a target-specific syntax error, triggered when the back-end knows up front that such a combination is invalid.

Treating an invalid ASC as poison is indeed fundamentally more sound, at the expense of practical usability. We do know that the input is wrong, but we have no way to make the user aware of that; LLVM's feedback mechanisms are not great.

If we do need compilation to succeed, then I'd rather generate a trap for the invalid ASC, possibly with an inline asm comment explaining what's going on. At least the failure will be obvious and maybe even somewhat debuggable. That's less bad than having to chase a mysteriously mangled invalid pointer created by a nonsensical ASC laundered via conversion to the generic AS and back.

Contributor: AMDGPU currently uses DiagnosticInfo to report the invalid cast, plus lowers to undef. I've been meaning to remove the report-error case for a really long time.

Contributor: Done in #127487; this happened to come up again since 64-bit flat atomicrmws have apparently been broken since October at -O0.
%val = load i32, ptr addrspace(3) %specptr
ret i32 %val
}

; Check that we support addrspacecast when splitting the vector
; result (<2 x ptr> => 2 x <1 x ptr>).
; This also checks that scalarization works for addrspacecast