[CUDA][HIP] Fix CTAD for host/device constructors #168711

yxsamliu · 2025-11-19T14:24:25Z

Clang currently does not allow using CTAD in CUDA/HIP device functions
since deduction guides are treated as host-only. This patch fixes that
by treating deduction guides as host+device. The rationale is that
deduction guides do not actually generate code in IR, and there is an
existing check for device/host correctness for constructors.

The patch also suppresses duplicate implicit deduction guides from
host/device constructors with identical signatures and constraints
to prevent ambiguity.

For CUDA/HIP, deduction guides are now always implicitly enabled for
both host and device, which matches nvcc's effective behavior. Unlike
nvcc, which silently ignores explicit CUDA/HIP target attributes on
deduction guides, Clang diagnoses such attributes as errors to keep
the syntax clean and avoid confusion.

This ensures CTAD works correctly in CUDA/HIP for constructors with
different target attributes and provides clearer diagnostics when users
attempt to annotate deduction guides with CUDA/HIP target attributes.

Example:

  #include <tuple>

  __host__ __device__ void func()
  {
    std::tuple<int, int> t = std::tuple(1, 1);
  }

This compiles with nvcc but fails with clang for CUDA/HIP without this
fix.

Reference: https://godbolt.org/z/WhT1GrhWE
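
The rationale above (a deduction guide only steers type deduction and emits no code of its own) can be illustrated outside CUDA/HIP. The following is a minimal host-only C++17 sketch; `Pair` and `sum_of_deduced_pair` are hypothetical names invented for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class template used only for illustration.
template <class T>
struct Pair {
  T first;
  T second;
  Pair(T x, T y) : first(x), second(y) {}
};

// User-declared deduction guide: a compile-time rule with no body,
// so there is nothing to emit for the host or for the device.
template <class T>
Pair(T, T) -> Pair<T>;

// CTAD selects Pair<int>; the only function called at run time is the
// constructor, which is where target checking still applies in CUDA/HIP.
inline int sum_of_deduced_pair() {
  Pair p(1, 2);
  static_assert(std::is_same_v<decltype(p), Pair<int>>);
  return p.first + p.second;
}
```

In CUDA/HIP code the constructor would still need suitable target attributes to be usable from device functions; only the guide is treated as host+device.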

Fixes: ROCm/ROCm#5646

Fixes: #146646

llvmbot · 2025-11-19T14:25:03Z

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Currently Clang does not allow using CTAD in CUDA/HIP device functions since deduction guides are treated as host functions. This patch fixes that by treating deduction guides as host+device. The rationale is that deduction guides do not actually generate code in IR, and there is an existing check for device/host correctness for constructors.

Also suppress duplicate implicit deduction guides from host/device constructors with identical signatures to prevent ambiguity.

This ensures CTAD works correctly in CUDA/HIP for constructors with different target attributes.

Fixes: ROCm/ROCm#5646

Fixes: #146646

Full diff: https://github.com/llvm/llvm-project/pull/168711.diff

3 Files Affected:

(modified) clang/lib/Sema/SemaCUDA.cpp (+12)
(modified) clang/lib/Sema/SemaTemplateDeductionGuide.cpp (+26-2)
(added) clang/test/SemaCUDA/deduction-guide.cu (+47)

diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 31735a0f5feb3..8d1e03c8bc571 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -137,6 +137,12 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
   if (D == nullptr)
     return CurCUDATargetCtx.Target;
 
+  // C++ deduction guides are never codegen'ed and only participate in template
+  // argument deduction.  Treat them as if they were always host+device so that
+  // CUDA/HIP target checking never rejects their use based solely on target.
+  if (isa<CXXDeductionGuideDecl>(D))
+    return CUDAFunctionTarget::HostDevice;
+
   if (D->hasAttr<CUDAInvalidTargetAttr>())
     return CUDAFunctionTarget::InvalidTarget;
 
@@ -907,6 +913,12 @@ bool SemaCUDA::CheckCall(SourceLocation Loc, FunctionDecl *Callee) {
   if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
     return true;
 
+  // C++ deduction guides participate in overload resolution but are not
+  // callable functions and are never codegen'ed.  Treat them as always
+  // allowed for CUDA/HIP compatibility checking.
+  if (isa<CXXDeductionGuideDecl>(Callee))
+    return true;
+
   // FIXME: Is bailing out early correct here?  Should we instead assume that
   // the caller is a global initializer?
   FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
diff --git a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
index bfb10665c25b1..f91d84916fa3e 100644
--- a/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
+++ b/clang/lib/Sema/SemaTemplateDeductionGuide.cpp
@@ -218,9 +218,33 @@ buildDeductionGuide(Sema &SemaRef, TemplateDecl *OriginalTemplate,
       TInfo->getTypeLoc().castAs<FunctionProtoTypeLoc>().getParams();
 
   // Build the implicit deduction guide template.
+  QualType GuideType = TInfo->getType();
+
+  // In CUDA/HIP mode, avoid creating duplicate implicit deduction guides with
+  // identical function types. This can happen when there are separate
+  // __host__ and __device__ constructors with the same signature; each would
+  // otherwise synthesize its own implicit deduction guide, leading to
+  // ambiguous CTAD purely due to target attributes. For such cases we keep the
+  // first guide we created and skip building another one.
+  if (IsImplicit && Ctor && SemaRef.getLangOpts().CUDA) {
+    for (NamedDecl *Existing : DC->lookup(DeductionGuideName)) {
+      auto *ExistingFT = dyn_cast<FunctionTemplateDecl>(Existing);
+      auto *ExistingGuide =
+          ExistingFT
+              ? dyn_cast<CXXDeductionGuideDecl>(ExistingFT->getTemplatedDecl())
+              : dyn_cast<CXXDeductionGuideDecl>(Existing);
+      if (!ExistingGuide)
+        continue;
+
+      if (SemaRef.Context.hasSameType(ExistingGuide->getType(), GuideType)) {
+        return Existing;
+      }
+    }
+  }
+
   auto *Guide = CXXDeductionGuideDecl::Create(
-      SemaRef.Context, DC, LocStart, ES, Name, TInfo->getType(), TInfo, LocEnd,
-      Ctor, DeductionCandidate::Normal, FunctionTrailingRC);
+      SemaRef.Context, DC, LocStart, ES, Name, GuideType, TInfo, LocEnd, Ctor,
+      DeductionCandidate::Normal, FunctionTrailingRC);
   Guide->setImplicit(IsImplicit);
   Guide->setParams(Params);
 
diff --git a/clang/test/SemaCUDA/deduction-guide.cu b/clang/test/SemaCUDA/deduction-guide.cu
new file mode 100644
index 0000000000000..30e02f7518053
--- /dev/null
+++ b/clang/test/SemaCUDA/deduction-guide.cu
@@ -0,0 +1,47 @@
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -fcuda-is-device -verify=expected,dev %s
+// RUN: %clang_cc1 -std=c++17 -triple nvptx64-nvidia-cuda -fsyntax-only \
+// RUN:            -verify %s
+
+#include "Inputs/cuda.h"
+
+template <class T>
+struct CTADType { // expected-note 2{{candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided}}
+                  // expected-note@-1 2{{candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided}}
+  T first;
+  T second;
+
+  CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+  __device__ CTADType(T x) : first(x), second(x) {} // expected-note 2{{candidate constructor not viable: requires single argument 'x', but 3 arguments were provided}}
+  __host__ __device__ CTADType(T x, T y) : first(x), second(y) {} // expected-note 2{{candidate constructor not viable: requires 2 arguments, but 3 were provided}}
+  CTADType(T x, T y, T z) : first(x), second(z) {} // dev-note {{'CTADType' declared here}}
+                                                   // expected-note@-1 {{candidate constructor not viable: call to __host__ function from __device__ function}}
+                                                   // expected-note@-2 {{candidate constructor not viable: call to __host__ function from __global__ function}}
+};
+
+template <class T>
+CTADType(T, T) -> CTADType<T>;
+
+__host__ __device__ void use_ctad_host_device() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType ctad_from_three_args(1, 2, 3); // dev-error {{reference to __host__ function 'CTADType' in __host__ __device__ function}}
+}
+
+__host__ void use_ctad_host() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType ctad_from_three_args(1, 2, 3);
+}
+
+__device__ void use_ctad_device() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}
+
+__global__ void use_ctad_global() {
+  CTADType ctad_from_two_args(1, 1);
+  CTADType ctad_from_one_arg(1);
+  CTADType<int> ctad_from_three_args(1, 2, 3); // expected-error {{no matching constructor for initialization of 'CTADType<int>'}}
+}
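
The duplicate suppression added to `buildDeductionGuide` can be modeled outside Clang. The sketch below is a simplified stand-in, not the Clang implementation: `CtorModel`, `implicit_guides`, and `guides_for_test_type` are hypothetical names, and signatures are compared as strings, whereas the patch compares function types with `hasSameType`. It only shows the "keep the first guide, skip later ones with the same type" idea.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a constructor: a target tag plus a signature.
struct CtorModel {
  std::string target;     // "host", "device", or "host device"
  std::string signature;  // e.g. "(T)" or "(T, T)"
};

// Returns the signatures of the implicit guides that would be kept:
// the first guide per distinct signature, regardless of target.
inline std::vector<std::string> implicit_guides(
    const std::vector<CtorModel>& ctors) {
  std::vector<std::string> guides;
  for (const CtorModel& c : ctors) {
    if (std::find(guides.begin(), guides.end(), c.signature) == guides.end())
      guides.push_back(c.signature);
  }
  return guides;
}

// Mirrors the constructors of CTADType in the test file above.
inline std::vector<std::string> guides_for_test_type() {
  std::vector<CtorModel> ctors = {
      CtorModel{"host", "(T)"},
      CtorModel{"device", "(T)"},
      CtorModel{"host device", "(T, T)"},
      CtorModel{"host", "(T, T, T)"},
  };
  return implicit_guides(ctors);
}
```

With the host and device `(T)` constructors collapsed into one guide, `CTADType ctad_from_one_arg(1);` no longer sees two equally good candidates that differ only by target.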

github-actions · 2025-11-19T16:07:32Z

🐧 Linux x64 Test Results

111729 tests passed
4469 tests skipped

✅ The build succeeded and all tests passed.

clang/docs/HIPSupport.rst

cor3ntin

Can you add a release note? Otherwise this looks good, thanks!

clang/include/clang/Basic/DiagnosticSemaKinds.td

clang/lib/Sema/SemaDeclAttr.cpp

yxsamliu · 2025-11-26T14:27:49Z

> Can you add a release note?

will do

clang/lib/Sema/SemaDeclAttr.cpp

clang/lib/Sema/SemaTemplateDeductionGuide.cpp

Artem-B

LGTM

cor3ntin

LGTM modulo nit

clang/lib/Sema/SemaTemplateDeductionGuide.cpp

This reverts commit e719e93.

Revert this since it caused a regression in our internal CI. Deduction guides with host/device attributes are already used in https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/library/src/rng/utils/cpp_utils.hpp#L249

```
template<class V> __host__ __device__ vec_wrapper(V) -> vec_wrapper<V>;
```

yxsamliu · 2025-12-02T20:47:32Z

I reverted my PR because it caused build failures in our internal CI. The ROCm library rocRAND already uses host and device attributes on a deduction guide:

https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/library/src/rng/utils/cpp_utils.hpp#L249

I think this is probably the tip of the iceberg, so it seems we need to change the error to a warning.

Artem-B · 2025-12-02T21:08:50Z

I thought we agreed on complaining on host or device attributes only, allowing implicit or explicit HD. That should've worked for the ROCm code above.

yxsamliu · 2025-12-02T21:21:53Z

> I thought we agreed on complaining on host or device attributes only, allowing implicit or explicit HD. That should've worked for the ROCm code above.

I will try that.
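
For library code affected by this discussion, one option is to drop the target attributes from the guide and keep them only on the constructor. The sketch below assumes a simplified `vec_wrapper`; its real definition in rocRAND is not shown in this thread, and `EXAMPLE_HD` and `wrapped_value` are hypothetical names. The macro expands to nothing when the file is compiled as plain C++.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper macro: target attributes only under CUDA/HIP.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define EXAMPLE_HD __host__ __device__
#else
#define EXAMPLE_HD
#endif

// Simplified stand-in for the rocRAND wrapper (assumed shape).
template <class V>
struct vec_wrapper {
  V value;
  EXAMPLE_HD explicit vec_wrapper(V v) : value(v) {}
};

// Deduction guide without target attributes; with this patch it is
// treated as host+device implicitly.
template <class V>
vec_wrapper(V) -> vec_wrapper<V>;

EXAMPLE_HD inline int wrapped_value() {
  vec_wrapper w(7);
  static_assert(std::is_same_v<decltype(w), vec_wrapper<int>>);
  return w.value;
}
```

An unannotated guide is not affected by the new attribute diagnostics, whichever severity they end up with.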

Clang currently does not allow using CTAD in CUDA/HIP device functions since deduction guides are treated as host-only. This patch fixes that by treating deduction guides as host+device. The rationale is that deduction guides do not actually generate code in IR, and there is an existing check for device/host correctness for constructors.

The patch also suppresses duplicate implicit deduction guides from host/device constructors with identical signatures and constraints to prevent ambiguity.

For CUDA/HIP, deduction guides are now always implicitly enabled for both host and device, which matches nvcc's effective behavior. Unlike nvcc, which silently ignores explicit CUDA/HIP target attributes on deduction guides, Clang diagnoses device- and host-only attributes as errors to keep the syntax clean and avoid confusion. It emits a deprecation warning for host+device attributes.

This ensures CTAD works correctly in CUDA/HIP for constructors with different target attributes and provides clearer diagnostics when users attempt to annotate deduction guides with CUDA/HIP target attributes.

Example:

```
#include <tuple>

__host__ __device__ void func()
{
  std::tuple<int, int> t = std::tuple(1, 1);
}
```

This compiles with nvcc but fails with clang for CUDA/HIP without this fix.

Reference: https://godbolt.org/z/WhT1GrhWE

Fixes: ROCm/ROCm#5646

Fixes: llvm#146646

…170481)

Clang currently does not allow using CTAD in CUDA/HIP device functions since deduction guides are treated as host-only. This patch fixes that by treating deduction guides as host+device. The rationale is that deduction guides do not actually generate code in IR, and there is an existing check for device/host correctness for constructors.

The patch also suppresses duplicate implicit deduction guides from host/device constructors with identical signatures and constraints to prevent ambiguity.

For CUDA/HIP, deduction guides are now always implicitly enabled for both host and device, which matches nvcc's effective behavior. Unlike nvcc, which silently ignores explicit CUDA/HIP target attributes on deduction guides, Clang diagnoses device- and host-only attributes as errors to keep the syntax clean and avoid confusion.

This ensures CTAD works correctly in CUDA/HIP for constructors with different target attributes and provides clearer diagnostics when users attempt to annotate deduction guides with CUDA/HIP target attributes.

Example:

```
#include <tuple>

__host__ __device__ void func()
{
  std::tuple<int, int> t = std::tuple(1, 1);
}
```

This compiles with nvcc but fails with clang for CUDA/HIP without this fix.

Reference: https://godbolt.org/z/WhT1GrhWE

Fixes: ROCm/ROCm#5646

Fixes: #146646

yxsamliu requested review from Artem-B, ldionne and zygoloid November 19, 2025 14:24

llvmbot added the clang and clang:frontend labels Nov 19, 2025

yxsamliu mentioned this pull request Nov 19, 2025

[CUDA][HIP] Fix deduction guide #69366

Closed

yxsamliu force-pushed the deduct-guide branch from bc72715 to 8ddc3fc Compare November 19, 2025 14:39

yxsamliu changed the title ~~[CUDA] Fix CTAD for host/device constructors~~ [CUDA][HIP] Fix CTAD for host/device constructors Nov 19, 2025

yxsamliu force-pushed the deduct-guide branch 2 times, most recently from a45974d to db344b9 Compare November 19, 2025 15:13

Artem-B reviewed Nov 19, 2025

clang/docs/HIPSupport.rst (outdated, resolved)

clang/docs/HIPSupport.rst (outdated, resolved)

clang/docs/HIPSupport.rst (outdated, resolved)

yxsamliu force-pushed the deduct-guide branch from db344b9 to eaf4aec Compare November 26, 2025 03:50

cor3ntin reviewed Nov 26, 2025

clang/include/clang/Basic/DiagnosticSemaKinds.td (outdated, resolved)

clang/lib/Sema/SemaDeclAttr.cpp (outdated, resolved)

yxsamliu force-pushed the deduct-guide branch from eaf4aec to aa7482a Compare November 26, 2025 14:34

yxsamliu mentioned this pull request Nov 26, 2025

[clang][Driver] Add ROCm lib/llvm/bin detection test #169705

Open

cor3ntin reviewed Nov 27, 2025

clang/lib/Sema/SemaDeclAttr.cpp (outdated, resolved)

clang/lib/Sema/SemaTemplateDeductionGuide.cpp (resolved)

yxsamliu force-pushed the deduct-guide branch from aa7482a to 03a9d02 Compare December 1, 2025 15:41

Artem-B approved these changes Dec 1, 2025

cor3ntin approved these changes Dec 2, 2025

clang/lib/Sema/SemaTemplateDeductionGuide.cpp (outdated, resolved)

yxsamliu force-pushed the deduct-guide branch from 03a9d02 to 2f82086 Compare December 2, 2025 14:34

yxsamliu merged commit e719e93 into llvm:main Dec 2, 2025
11 checks passed

ronlieb added a commit to ROCm/llvm-project that referenced this pull request Dec 2, 2025

Revert "[CUDA][HIP] Fix CTAD for host/device constructors (llvm#168711)"

72ecf80

breaks rocRAND This reverts commit e719e93.

yxsamliu mentioned this pull request Dec 4, 2025

[rocRand] host device attributes on deduction guide ROCm/rocm-libraries#3123

Open
