Skip to content

Commit 04aabc6

Browse files
committed
Reland [CUDA][HIP] Fix CTAD for host/device constructors (#168711)
Clang currently does not allow using CTAD in CUDA/HIP device functions since deduction guides are treated as host-only. This patch fixes that by treating deduction guides as host+device. The rationale is that deduction guides do not actually generate code in IR, and there is an existing check for device/host correctness for constructors. The patch also suppresses duplicate implicit deduction guides from host/device constructors with identical signatures and constraints to prevent ambiguity. For CUDA/HIP, deduction guides are now always implicitly enabled for both host and device, which matches nvcc's effective behavior. Unlike nvcc, which silently ignores explicit CUDA/HIP target attributes on deduction guides, Clang diagnoses device- and host-only attributes as errors to keep the syntax clean and avoid confusion. It emits a deprecation warning for host+device attributes. This ensures CTAD works correctly in CUDA/HIP for constructors with different target attributes and provides clearer diagnostics when users attempt to annotate deduction guides with CUDA/HIP target attributes. Example: ``` #include <tuple> __host__ __device__ void func() { std::tuple<int, int> t = std::tuple(1, 1); } ``` This compiles with nvcc but fails with clang for CUDA/HIP without this fix. Reference: https://godbolt.org/z/WhT1GrhWE Fixes: ROCm/ROCm#5646 Fixes: #146646
1 parent 31ec45a commit 04aabc6

File tree

9 files changed

+366
-2
lines changed

9 files changed

+366
-2
lines changed

clang/docs/HIPSupport.rst

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -412,6 +412,57 @@ Example Usage
412412
__host__ __device__ int Four(void) __attribute__((weak, alias("_Z6__Fourv")));
413413
__host__ __device__ float Four(float f) __attribute__((weak, alias("_Z6__Fourf")));
414414

415+
C++17 Class Template Argument Deduction (CTAD) Support
416+
======================================================
417+
418+
Clang supports C++17 Class Template Argument Deduction (CTAD) in both host and
419+
device code for HIP. This allows you to omit template arguments when creating
420+
class template instances, letting the compiler deduce them from constructor
421+
arguments.
422+
423+
.. code-block:: c++
424+
425+
#include <tuple>
426+
427+
__host__ __device__ void func() {
428+
std::tuple<int, int> t = std::tuple(1, 1);
429+
}
430+
431+
In the above example, ``std::tuple(1, 1)`` automatically deduces the type to be
432+
``std::tuple<int, int>``.
433+
434+
Deduction Guides
435+
----------------
436+
437+
User-defined deduction guides are also supported. Since deduction guides are not
438+
executable code and only participate in type deduction, they semantically behave
439+
as ``__host__ __device__``. This ensures they are available for deduction in both
440+
host and device contexts, and CTAD continues to respect any constraints on the
441+
corresponding constructors in the usual C++ way.
442+
443+
.. code-block:: c++
444+
445+
template <typename T>
446+
struct MyType {
447+
T value;
448+
__device__ MyType(T v) : value(v) {}
449+
};
450+
451+
MyType(float) -> MyType<double>;
452+
453+
__device__ void deviceFunc() {
454+
MyType m(1.0f); // Deduces MyType<double>
455+
}
456+
457+
.. note::
458+
459+
Explicit HIP target attributes such as ``__host__`` or ``__device__``
460+
are currently only permitted on deduction guides when both are present
461+
(``__host__ __device__``). This usage is deprecated and will be rejected
462+
in a future version of Clang; prefer omitting HIP target attributes on
463+
deduction guides entirely. Clang treats all deduction guides as if they
464+
were ``__host__ __device__``, so ``__host__``-only, ``__device__``-only,
465+
or ``__global__`` deduction guides are rejected as ill-formed.
415466

416467
Host and Device Attributes of Default Destructors
417468
===================================================

clang/docs/ReleaseNotes.rst

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -659,6 +659,23 @@ RISC-V Support
659659
CUDA/HIP Language Changes
660660
^^^^^^^^^^^^^^^^^^^^^^^^^
661661

662+
- Clang now supports C++17 Class Template Argument Deduction (CTAD) in CUDA/HIP
663+
device code by treating deduction guides as if they were ``__host__ __device__``.
664+
665+
- Clang avoids ambiguous CTAD in CUDA/HIP by not synthesizing duplicate implicit
666+
deduction guides when ``__host__`` and ``__device__`` constructors differ only
667+
in CUDA target attributes (same signature and constraints).
668+
669+
- Clang diagnoses CUDA/HIP deduction guides that are annotated as host-only,
670+
device-only, or ``__global__`` as errors. Explicit ``__host__ __device__``
671+
deduction guides remain accepted for now but are deprecated and will be
672+
rejected in a future version of Clang; deduction guides do not participate
673+
in code generation and are treated as implicitly host+device.
674+
675+
- Clang preserves distinct implicit deduction guides for constructors that differ
676+
by constraints, so constraint-based CTAD works in CUDA/HIP device code as in
677+
standard C++.
678+
662679
CUDA Support
663680
^^^^^^^^^^^^
664681

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2769,6 +2769,14 @@ def err_deduction_guide_name_not_class_template : Error<
27692769
"cannot specify deduction guide for "
27702770
"%select{<error>|function template|variable template|alias template|"
27712771
"template template parameter|concept|dependent template name}0 %1">;
2772+
def err_deduction_guide_target_attr : Error<
2773+
"in CUDA/HIP, deduction guides may only be annotated with "
2774+
"'__host__ __device__'; '__host__'-only, '__device__'-only, or "
2775+
"'__global__' deduction guides are not allowed">;
2776+
def warn_deduction_guide_target_attr_deprecated : Warning<
2777+
"use of CUDA/HIP target attributes on deduction guides is deprecated; "
2778+
"they will be rejected in a future version of Clang">,
2779+
InGroup<DeprecatedAttributes>;
27722780
def err_deduction_guide_wrong_scope : Error<
27732781
"deduction guide must be declared in the same scope as template %q0">;
27742782
def err_deduction_guide_defines_function : Error<

clang/lib/Sema/SemaCUDA.cpp

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,6 +215,12 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
215215
if (D == nullptr)
216216
return CurCUDATargetCtx.Target;
217217

218+
// C++ deduction guides are never codegen'ed and only participate in template
219+
// argument deduction. Treat them as if they were always host+device so that
220+
// CUDA/HIP target checking never rejects their use based solely on target.
221+
if (isa<CXXDeductionGuideDecl>(D))
222+
return CUDAFunctionTarget::HostDevice;
223+
218224
if (D->hasAttr<CUDAInvalidTargetAttr>())
219225
return CUDAFunctionTarget::InvalidTarget;
220226

@@ -986,6 +992,12 @@ bool SemaCUDA::CheckCall(SourceLocation Loc, FunctionDecl *Callee) {
986992
if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
987993
return true;
988994

995+
// C++ deduction guides participate in overload resolution but are not
996+
// callable functions and are never codegen'ed. Treat them as always
997+
// allowed for CUDA/HIP compatibility checking.
998+
if (isa<CXXDeductionGuideDecl>(Callee))
999+
return true;
1000+
9891001
// FIXME: Is bailing out early correct here? Should we instead assume that
9901002
// the caller is a global initializer?
9911003
FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);

clang/lib/Sema/SemaDeclAttr.cpp

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8056,6 +8056,30 @@ void Sema::ProcessDeclAttributeList(
80568056
}
80578057
}
80588058

8059+
// CUDA/HIP: restrict explicit CUDA target attributes on deduction guides.
8060+
//
8061+
// Deduction guides are not callable functions and never participate in
8062+
// codegen; they are always treated as host+device for CUDA/HIP semantic
8063+
// checks. We therefore allow either no CUDA target attributes or an explicit
8064+
// '__host__ __device__' annotation, but reject guides that are host-only,
8065+
// device-only, or marked '__global__'. The use of explicit CUDA/HIP target
8066+
// attributes on deduction guides is deprecated and will be rejected in a
8067+
// future Clang version.
8068+
if (getLangOpts().CUDA)
8069+
if (auto *Guide = dyn_cast<CXXDeductionGuideDecl>(D)) {
8070+
bool HasHost = Guide->hasAttr<CUDAHostAttr>();
8071+
bool HasDevice = Guide->hasAttr<CUDADeviceAttr>();
8072+
bool HasGlobal = Guide->hasAttr<CUDAGlobalAttr>();
8073+
8074+
if (HasGlobal || HasHost != HasDevice) {
8075+
Diag(Guide->getLocation(), diag::err_deduction_guide_target_attr);
8076+
Guide->setInvalidDecl();
8077+
} else if (HasHost && HasDevice) {
8078+
Diag(Guide->getLocation(),
8079+
diag::warn_deduction_guide_target_attr_deprecated);
8080+
}
8081+
}
8082+
80598083
// Do not permit 'constructor' or 'destructor' attributes on __device__ code.
80608084
if (getLangOpts().CUDAIsDevice && D->hasAttr<CUDADeviceAttr>() &&
80618085
(D->hasAttr<ConstructorAttr>() || D->hasAttr<DestructorAttr>()) &&

clang/lib/Sema/SemaTemplateDeductionGuide.cpp

Lines changed: 64 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,26 @@ using namespace clang;
5454
using namespace sema;
5555

5656
namespace {
57+
58+
/// Return true if two associated-constraint sets are semantically equal.
59+
static bool HaveSameAssociatedConstraints(
60+
Sema &SemaRef, const NamedDecl *Old, ArrayRef<AssociatedConstraint> OldACs,
61+
const NamedDecl *New, ArrayRef<AssociatedConstraint> NewACs) {
62+
if (OldACs.size() != NewACs.size())
63+
return false;
64+
if (OldACs.empty())
65+
return true;
66+
67+
// General case: pairwise compare each associated constraint expression.
68+
Sema::TemplateCompareNewDeclInfo NewInfo(New);
69+
for (size_t I = 0, E = OldACs.size(); I != E; ++I)
70+
if (!SemaRef.AreConstraintExpressionsEqual(
71+
Old, OldACs[I].ConstraintExpr, NewInfo, NewACs[I].ConstraintExpr))
72+
return false;
73+
74+
return true;
75+
}
76+
5777
/// Tree transform to "extract" a transformed type from a class template's
5878
/// constructor to a deduction guide.
5979
class ExtractTypeForDeductionGuide
@@ -218,9 +238,51 @@ buildDeductionGuide(Sema &SemaRef, TemplateDecl *OriginalTemplate,
218238
TInfo->getTypeLoc().castAs<FunctionProtoTypeLoc>().getParams();
219239

220240
// Build the implicit deduction guide template.
241+
QualType GuideType = TInfo->getType();
242+
243+
// In CUDA/HIP mode, avoid duplicate implicit guides that differ only in CUDA
244+
// target attributes (same constructor signature and constraints).
245+
if (IsImplicit && Ctor && SemaRef.getLangOpts().CUDA) {
246+
SmallVector<AssociatedConstraint, 4> NewACs;
247+
Ctor->getAssociatedConstraints(NewACs);
248+
249+
for (NamedDecl *Existing : DC->lookup(DeductionGuideName)) {
250+
auto *ExistingFT = dyn_cast<FunctionTemplateDecl>(Existing);
251+
auto *ExistingGuide =
252+
ExistingFT
253+
? dyn_cast<CXXDeductionGuideDecl>(ExistingFT->getTemplatedDecl())
254+
: dyn_cast<CXXDeductionGuideDecl>(Existing);
255+
if (!ExistingGuide)
256+
continue;
257+
258+
// Only consider guides that were also synthesized from a constructor.
259+
auto *ExistingCtor = ExistingGuide->getCorrespondingConstructor();
260+
if (!ExistingCtor)
261+
continue;
262+
263+
// If the underlying constructors are overloads (different signatures once
264+
// CUDA attributes are ignored), they should each get their own guides.
265+
if (SemaRef.IsOverload(Ctor, ExistingCtor,
266+
/*UseMemberUsingDeclRules=*/false,
267+
/*ConsiderCudaAttrs=*/false))
268+
continue;
269+
270+
// At this point, the constructors have the same signature ignoring CUDA
271+
// attributes. Decide whether their associated constraints are also the
272+
// same; only in that case do we treat one guide as a duplicate of the
273+
// other.
274+
SmallVector<AssociatedConstraint, 4> ExistingACs;
275+
ExistingCtor->getAssociatedConstraints(ExistingACs);
276+
277+
if (HaveSameAssociatedConstraints(SemaRef, ExistingCtor, ExistingACs,
278+
Ctor, NewACs))
279+
return Existing;
280+
}
281+
}
282+
221283
auto *Guide = CXXDeductionGuideDecl::Create(
222-
SemaRef.Context, DC, LocStart, ES, Name, TInfo->getType(), TInfo, LocEnd,
223-
Ctor, DeductionCandidate::Normal, FunctionTrailingRC);
284+
SemaRef.Context, DC, LocStart, ES, Name, GuideType, TInfo, LocEnd, Ctor,
285+
DeductionCandidate::Normal, FunctionTrailingRC);
224286
Guide->setImplicit(IsImplicit);
225287
Guide->setParams(Params);
226288

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
2+
// RUN: -fcuda-is-device -verify %s
3+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
4+
// RUN: -verify %s
5+
6+
#include "Inputs/cuda.h"
7+
8+
template <typename T>
9+
struct S {
10+
__host__ __device__ S(T);
11+
};
12+
13+
// A host+device deduction guide is allowed and participates in CTAD, but its
14+
// explicit target attributes are deprecated and will be rejected in a future
15+
// Clang version.
16+
template <typename T>
17+
__host__ __device__ S(T) -> S<T>; // expected-warning {{use of CUDA/HIP target attributes on deduction guides is deprecated; they will be rejected in a future version of Clang}}
18+
19+
__host__ __device__ void use_hd_guide() {
20+
S s(42); // uses the explicit __host__ __device__ deduction guide above
21+
}
22+
23+
// CUDA/HIP target attributes on deduction guides are rejected when they make
24+
// the guide host-only, device-only, or a kernel.
25+
template <typename U>
26+
__host__ S(U) -> S<U>; // expected-error {{in CUDA/HIP, deduction guides may only be annotated with '__host__ __device__'; '__host__'-only, '__device__'-only, or '__global__' deduction guides are not allowed}}
27+
28+
template <typename V>
29+
__device__ S(V) -> S<V>; // expected-error {{in CUDA/HIP, deduction guides may only be annotated with '__host__ __device__'; '__host__'-only, '__device__'-only, or '__global__' deduction guides are not allowed}}
30+
31+
template <typename W>
32+
__global__ S(W) -> S<W>; // expected-error {{in CUDA/HIP, deduction guides may only be annotated with '__host__ __device__'; '__host__'-only, '__device__'-only, or '__global__' deduction guides are not allowed}}
Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
// RUN: %clang_cc1 -std=c++20 -triple nvptx64-nvidia-cuda -fsyntax-only \
2+
// RUN: -fcuda-is-device -verify %s
3+
// RUN: %clang_cc1 -std=c++20 -triple nvptx64-nvidia-cuda -fsyntax-only \
4+
// RUN: -verify %s
5+
// expected-no-diagnostics
6+
7+
#include "Inputs/cuda.h"
8+
9+
// This test exercises class template argument deduction (CTAD) when there are
10+
// multiple constructors that differ only by constraints. In CUDA/HIP mode, the
11+
// implementation must *not* collapse implicit deduction guides that have the
12+
// same function type but different constraints; otherwise, CTAD can lose viable
13+
// candidates.
14+
15+
template <typename T>
16+
concept Signed = __is_signed(T);
17+
18+
template <typename T>
19+
concept NotSigned = !Signed<T>;
20+
21+
// 1) Constrained ctors with different constraints: ensure we keep
22+
// deduction guides that differ only by constraints.
23+
24+
template <typename T>
25+
struct OverloadCTAD {
26+
__host__ __device__ OverloadCTAD(T) requires Signed<T>;
27+
__host__ __device__ OverloadCTAD(T) requires NotSigned<T>;
28+
};
29+
30+
__host__ __device__ void use_overload_ctad_hd() {
31+
OverloadCTAD a(1); // T = int, uses Signed-constrained guide
32+
OverloadCTAD b(1u); // T = unsigned int, uses NotSigned-constrained guide
33+
}
34+
35+
__device__ void use_overload_ctad_dev() {
36+
OverloadCTAD c(1);
37+
OverloadCTAD d(1u);
38+
}
39+
40+
__global__ void use_overload_ctad_global() {
41+
OverloadCTAD e(1);
42+
OverloadCTAD f(1u);
43+
}
44+
45+
// 2) Add a pair of constructors that have the same signature and the same
46+
// constraint but differ only by CUDA target attributes. This exercises the
47+
// case where two implicit deduction guides would be identical except for
48+
// their originating constructor's CUDA target.
49+
50+
template <typename T>
51+
struct OverloadCTADTargets {
52+
__host__ OverloadCTADTargets(T) requires Signed<T>;
53+
__device__ OverloadCTADTargets(T) requires Signed<T>;
54+
};
55+
56+
__host__ void use_overload_ctad_targets_host() {
57+
OverloadCTADTargets g(1);
58+
}
59+
60+
__device__ void use_overload_ctad_targets_device() {
61+
OverloadCTADTargets h(1);
62+
}
63+
64+
// 3) Unconstrained host/device duplicates: identical signatures and no
65+
// constraints, differing only by CUDA target attributes.
66+
67+
template <typename T>
68+
struct UnconstrainedHD {
69+
__host__ UnconstrainedHD(T);
70+
__device__ UnconstrainedHD(T);
71+
};
72+
73+
__host__ __device__ void use_unconstrained_hd_hd() {
74+
UnconstrainedHD u1(1);
75+
}
76+
77+
__device__ void use_unconstrained_hd_dev() {
78+
UnconstrainedHD u2(1);
79+
}
80+
81+
__global__ void use_unconstrained_hd_global() {
82+
UnconstrainedHD u3(1);
83+
}
84+
85+
// 4) Constrained vs unconstrained ctors with the same signature: guides
86+
// must not be collapsed away when constraints differ.
87+
88+
template <typename T>
89+
concept IsInt = __is_same(T, int);
90+
91+
template <typename T>
92+
struct ConstrainedVsUnconstrained {
93+
__host__ __device__ ConstrainedVsUnconstrained(T);
94+
__host__ __device__ ConstrainedVsUnconstrained(T) requires IsInt<T>;
95+
};
96+
97+
__host__ __device__ void use_constrained_vs_unconstrained_hd() {
98+
ConstrainedVsUnconstrained a(1); // T = int, constrained guide viable
99+
ConstrainedVsUnconstrained b(1u); // T = unsigned, only unconstrained guide
100+
}
101+
102+
__device__ void use_constrained_vs_unconstrained_dev() {
103+
ConstrainedVsUnconstrained c(1);
104+
ConstrainedVsUnconstrained d(1u);
105+
}
106+
107+
__global__ void use_constrained_vs_unconstrained_global() {
108+
ConstrainedVsUnconstrained e(1);
109+
ConstrainedVsUnconstrained f(1u);
110+
}
111+

0 commit comments

Comments
 (0)