Skip to content

Commit aa7482a

Browse files
committed
[CUDA][HIP] Fix CTAD for host/device constructors
Clang currently does not allow using CTAD in CUDA/HIP device functions since deduction guides are treated as host-only. This patch fixes that by treating deduction guides as host+device. The rationale is that deduction guides do not actually generate code in IR, and there is an existing check for device/host correctness for constructors. The patch also suppresses duplicate implicit deduction guides from host/device constructors with identical signatures to prevent ambiguity. For CUDA/HIP, deduction guides are now always implicitly enabled for both host and device, which matches nvcc's effective behavior. Unlike nvcc, which silently ignores explicit CUDA/HIP target attributes on deduction guides, Clang diagnoses such attributes as errors to keep the syntax clean and avoid confusion. This ensures CTAD works correctly in CUDA/HIP for constructors with different target attributes and provides clearer diagnostics when users attempt to annotate deduction guides with CUDA/HIP target attributes. Example: ``` #include <tuple> __host__ __device__ void func() { std::tuple<int, int> t = std::tuple(1, 1); } ``` This compiles with nvcc but fails with clang for CUDA/HIP without this fix. Reference: https://godbolt.org/z/WhT1GrhWE Fixes: ROCm/ROCm#5646 Fixes: #146646
1 parent de4e128 commit aa7482a

File tree

8 files changed

+177
-2
lines changed

8 files changed

+177
-2
lines changed

clang/docs/HIPSupport.rst

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -376,6 +376,51 @@ Example Usage
376376
basePtr->virtualFunction(); // Allowed since obj is constructed in device code
377377
}
378378

379+
C++17 Class Template Argument Deduction (CTAD) Support
380+
======================================================
381+
382+
Clang supports C++17 Class Template Argument Deduction (CTAD) in both host and device code for HIP.
383+
This allows you to omit template arguments when creating class template instances, letting the compiler
384+
deduce them from constructor arguments.
385+
386+
.. code-block:: c++
387+
388+
#include <tuple>
389+
390+
__host__ __device__ void func() {
391+
std::tuple<int, int> t = std::tuple(1, 1);
392+
}
393+
394+
In the above example, ``std::tuple(1, 1)`` automatically deduces the type to be ``std::tuple<int, int>``.
395+
396+
Deduction Guides
397+
----------------
398+
399+
User-defined deduction guides are also supported. Since deduction guides are not executable code and only
400+
participate in type deduction, they semantically behave as ``__host__ __device__``. This ensures they are
401+
available for deduction in both host and device contexts.
402+
403+
.. code-block:: c++
404+
405+
template <typename T>
406+
struct MyType {
407+
T value;
408+
__device__ MyType(T v) : value(v) {}
409+
};
410+
411+
MyType(float) -> MyType<double>;
412+
413+
__device__ void deviceFunc() {
414+
MyType m(1.0f); // Deduces MyType<double>
415+
}
416+
417+
.. note::
418+
419+
Explicit HIP target attributes such as ``__host__`` or ``__device__``
420+
are not allowed on deduction guides. Clang treats all deduction guides
421+
as if they were ``__host__ __device__`` and diagnoses any explicit
422+
target attributes on them as errors.
423+
379424
Host and Device Attributes of Default Destructors
380425
===================================================
381426

clang/docs/ReleaseNotes.rst

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -641,6 +641,15 @@ RISC-V Support
641641
CUDA/HIP Language Changes
642642
^^^^^^^^^^^^^^^^^^^^^^^^^
643643

644+
- Clang now supports C++17 Class Template Argument Deduction (CTAD) in CUDA/HIP
645+
device code by treating deduction guides as if they were ``__host__ __device__``.
646+
647+
- Clang avoids ambiguous CTAD in CUDA/HIP by not synthesizing duplicate implicit
648+
deduction guides when ``__host__`` and ``__device__`` constructors share a signature.
649+
650+
- Clang diagnoses CUDA/HIP target attributes written on deduction guides as errors,
651+
since deduction guides do not participate in code generation.
652+
644653
CUDA Support
645654
^^^^^^^^^^^^
646655

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2769,6 +2769,9 @@ def err_deduction_guide_name_not_class_template : Error<
27692769
"cannot specify deduction guide for "
27702770
"%select{<error>|function template|variable template|alias template|"
27712771
"template template parameter|concept|dependent template name}0 %1">;
2772+
def err_deduction_guide_target_attr : Error<
2773+
"in CUDA/HIP, target attributes are not allowed on deduction guides; "
2774+
"deduction guides are implicitly enabled for both host and device">;
27722775
def err_deduction_guide_wrong_scope : Error<
27732776
"deduction guide must be declared in the same scope as template %q0">;
27742777
def err_deduction_guide_defines_function : Error<

clang/lib/Sema/SemaCUDA.cpp

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -137,6 +137,12 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
137137
if (D == nullptr)
138138
return CurCUDATargetCtx.Target;
139139

140+
// C++ deduction guides are never codegen'ed and only participate in template
141+
// argument deduction. Treat them as if they were always host+device so that
142+
// CUDA/HIP target checking never rejects their use based solely on target.
143+
if (isa<CXXDeductionGuideDecl>(D))
144+
return CUDAFunctionTarget::HostDevice;
145+
140146
if (D->hasAttr<CUDAInvalidTargetAttr>())
141147
return CUDAFunctionTarget::InvalidTarget;
142148

@@ -907,6 +913,12 @@ bool SemaCUDA::CheckCall(SourceLocation Loc, FunctionDecl *Callee) {
907913
if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
908914
return true;
909915

916+
// C++ deduction guides participate in overload resolution but are not
917+
// callable functions and are never codegen'ed. Treat them as always
918+
// allowed for CUDA/HIP compatibility checking.
919+
if (isa<CXXDeductionGuideDecl>(Callee))
920+
return true;
921+
910922
// FIXME: Is bailing out early correct here? Should we instead assume that
911923
// the caller is a global initializer?
912924
FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);

clang/lib/Sema/SemaDeclAttr.cpp

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7984,6 +7984,19 @@ void Sema::ProcessDeclAttributeList(
79847984
}
79857985
}
79867986

7987+
// CUDA/HIP: disallow explicit CUDA target attributes on deduction guides.
7988+
// Deduction guides are not callable functions and never participate in
7989+
// codegen; they are always treated as host+device for CUDA/HIP semantic
7990+
// checks, so explicit target attributes on them would be misleading noise.
7991+
if (getLangOpts().CUDA)
7992+
if (auto *Guide = dyn_cast<CXXDeductionGuideDecl>(D);
7993+
Guide &&
7994+
(Guide->hasAttr<CUDAHostAttr>() || Guide->hasAttr<CUDADeviceAttr>() ||
7995+
Guide->hasAttr<CUDAGlobalAttr>())) {
7996+
Diag(Guide->getLocation(), diag::err_deduction_guide_target_attr);
7997+
Guide->setInvalidDecl();
7998+
}
7999+
79878000
// Do not permit 'constructor' or 'destructor' attributes on __device__ code.
79888001
if (getLangOpts().CUDAIsDevice && D->hasAttr<CUDADeviceAttr>() &&
79898002
(D->hasAttr<ConstructorAttr>() || D->hasAttr<DestructorAttr>()) &&

clang/lib/Sema/SemaTemplateDeductionGuide.cpp

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -218,9 +218,31 @@ buildDeductionGuide(Sema &SemaRef, TemplateDecl *OriginalTemplate,
218218
TInfo->getTypeLoc().castAs<FunctionProtoTypeLoc>().getParams();
219219

220220
// Build the implicit deduction guide template.
221+
QualType GuideType = TInfo->getType();
222+
223+
// In CUDA/HIP mode, avoid creating duplicate implicit deduction guides with
224+
// identical function types. This can happen when there are separate
225+
// __host__ and __device__ constructors with the same signature; each would
226+
// otherwise synthesize its own implicit deduction guide, leading to
227+
// ambiguous CTAD purely due to target attributes. For such cases we keep the
228+
// first guide we created and skip building another one.
229+
if (IsImplicit && Ctor && SemaRef.getLangOpts().CUDA)
230+
for (NamedDecl *Existing : DC->lookup(DeductionGuideName)) {
231+
auto *ExistingFT = dyn_cast<FunctionTemplateDecl>(Existing);
232+
auto *ExistingGuide =
233+
ExistingFT
234+
? dyn_cast<CXXDeductionGuideDecl>(ExistingFT->getTemplatedDecl())
235+
: dyn_cast<CXXDeductionGuideDecl>(Existing);
236+
if (!ExistingGuide)
237+
continue;
238+
239+
if (SemaRef.Context.hasSameType(ExistingGuide->getType(), GuideType))
240+
return Existing;
241+
}
242+
221243
auto *Guide = CXXDeductionGuideDecl::Create(
222-
SemaRef.Context, DC, LocStart, ES, Name, TInfo->getType(), TInfo, LocEnd,
223-
Ctor, DeductionCandidate::Normal, FunctionTrailingRC);
244+
SemaRef.Context, DC, LocStart, ES, Name, GuideType, TInfo, LocEnd, Ctor,
245+
DeductionCandidate::Normal, FunctionTrailingRC);
224246
Guide->setImplicit(IsImplicit);
225247
Guide->setParams(Params);
226248

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
2+
// RUN: -fcuda-is-device -verify %s
3+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
4+
// RUN: -verify %s
5+
6+
#include "Inputs/cuda.h"
7+
8+
template <typename T>
9+
struct S {
10+
__host__ __device__ S(T);
11+
};
12+
13+
template <typename T>
14+
S(T) -> S<T>;
15+
16+
// CUDA/HIP target attributes on deduction guides are rejected.
17+
template <typename U>
18+
__host__ S(U) -> S<U>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
19+
20+
template <typename V>
21+
__device__ S(V) -> S<V>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
22+
23+
template <typename W>
24+
__global__ S(W) -> S<W>; // expected-error {{in CUDA/HIP, target attributes are not allowed on deduction guides; deduction guides are implicitly enabled for both host and device}}
Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
2+
// RUN: -fcuda-is-device -verify=expected,dev %s
3+
// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
4+
// RUN: -verify %s
5+
6+
#include "Inputs/cuda.h"
7+
8+
template <class T>
9+
struct CTADType { // expected-note 2{{candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided}}
10+
// expected-note@-1 2{{candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided}}
11+
T first;
12+
T second;
13+
14+
CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
15+
__device__ CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
16+
__host__ __device__ CTADType(T x, T y) : first(x), second(y) {} // expected-note 2{{candidate constructor not viable: requires 2 arguments, but 3 were provided}}
17+
CTADType(T x, T y, T z) : first(x), second(z) {} // dev-note {{'CTADType' declared here}}
18+
// expected-note@-1 {{candidate constructor not viable: call to __host__ function from __device__ function}}
19+
// expected-note@-2 {{candidate constructor not viable: call to __host__ function from __global__ function}}
20+
};
21+
22+
template <class T>
23+
CTADType(T, T) -> CTADType<T>;
24+
25+
__host__ __device__ void use_ctad_host_device() {
26+
CTADType ctad_from_two_args(1, 1);
27+
CTADType ctad_from_one_arg(1);
28+
CTADType ctad_from_three_args(1, 2, 3); // dev-error {{reference to __host__ function 'CTADType' in __host__ __device__ function}}
29+
}
30+
31+
__host__ void use_ctad_host() {
32+
CTADType ctad_from_two_args(1, 1);
33+
CTADType ctad_from_one_arg(1);
34+
CTADType ctad_from_three_args(1, 2, 3);
35+
}
36+
37+
__device__ void use_ctad_device() {
38+
CTADType ctad_from_two_args(1, 1);
39+
CTADType ctad_from_one_arg(1);
40+
CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
41+
}
42+
43+
__global__ void use_ctad_global() {
44+
CTADType ctad_from_two_args(1, 1);
45+
CTADType ctad_from_one_arg(1);
46+
CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
47+
}

0 commit comments

Comments
 (0)