This is a simple reproducer for NVIDIA bug [5840342](https://developer.nvidia.com/bugs/5840342).
nvcc accepts invalid device code when a constexpr function containing host-only C++ standard library types (e.g. std::string) is instantiated and called from a CUDA kernel, particularly when using --std=c++20 --expt-relaxed-constexpr. The code compiles without diagnostics even though such constructs are not supported in device code. This can lead to silent miscompilation or undefined behavior at runtime.
A reproducer can be as simple as test.cu:
#include <cstdio>
#include <string>
template <int I>
constexpr void dump() {
    std::string data = std::to_string(I);
    printf("%s\n", data.data());
}
template <int I>
__global__ void kernel() {
    dump<I>();
}
int main() {
    printf("Before\n");
    cudaDeviceSynchronize();
    kernel<42><<<1, 1>>>();
    cudaDeviceSynchronize();
    printf("After\n");
}
Compiling with:
/usr/local/cuda-13.0/bin/nvcc -ccbin /opt/rh/gcc-toolset-14/root/usr/bin/gcc -std=c++20 --expt-relaxed-constexpr test.cu -o test
Further investigation indicates that the behavior depends on the host compiler. For example, nvcc v13.0.88 with gcc 11.5.0 produces the expected compilation error, whereas switching to gcc 14.2.1 suppresses the error and allows the invalid code to compile.
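Because the outcome depends on the nvcc and host compiler pairing, it can help to record both versions from inside the translation unit when comparing results. The following is a minimal sketch, not part of the original reproducer: ToolchainInfo, toolchain_info and print_toolchain_info are hypothetical names, and it relies only on the predefined macros __CUDACC_VER_MAJOR__/__CUDACC_VER_MINOR__/__CUDACC_VER_BUILD__ (nvcc) and __GNUC__/__GNUC_MINOR__/__GNUC_PATCHLEVEL__ (GCC and GCC-compatible compilers).

```cpp
#include <cstdio>

// Hypothetical helper for bug reports: collects compiler version macros.
struct ToolchainInfo {
    int nvcc_major, nvcc_minor, nvcc_build;  // all 0 when not built with nvcc
    int gcc_major, gcc_minor, gcc_patch;     // all 0 when __GNUC__ is not defined
};

// Reads the predefined version macros at compile time.
// Note: GCC-compatible compilers such as clang also define __GNUC__,
// so the "gcc" fields are only meaningful when the host really is GCC.
constexpr ToolchainInfo toolchain_info() {
    return ToolchainInfo{
#ifdef __CUDACC_VER_MAJOR__
        __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, __CUDACC_VER_BUILD__,
#else
        0, 0, 0,
#endif
#ifdef __GNUC__
        __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__
#else
        0, 0, 0
#endif
    };
}

// Prints one line describing the compiler pairing used for this build.
inline void print_toolchain_info() {
    constexpr ToolchainInfo info = toolchain_info();
    std::printf("nvcc %d.%d.%d, host gcc %d.%d.%d\n",
                info.nvcc_major, info.nvcc_minor, info.nvcc_build,
                info.gcc_major, info.gcc_minor, info.gcc_patch);
}
```

Calling print_toolchain_info() at the start of main() in test.cu would put the pairing next to the Before/After output, which makes it easier to tell which build accepted the invalid code.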
As an example, nvcc 12.9.1 with gcc (GCC) 11.5.0 produces the correct error messages:
<source>(17): error: calling a __host__ function("void ::dump<(int)42> ()") from a __global__ function("kernel<(int)42> ") is not allowed
<source>(17): error: identifier "dump<(int)42> " is undefined in device code
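For contrast, a variant of dump that stays within what device code supports could avoid std::string entirely and format the integer into a fixed-size buffer. This is only a sketch under assumptions: IntText, to_text, dump_device_safe, kernel_device_safe and the HD macro are hypothetical names that do not appear in the reproducer or the bug report, and the device path has not been verified against any particular nvcc release.

```cpp
#include <cstdio>

// HD expands to CUDA execution-space qualifiers under nvcc and to nothing
// for a plain host compiler, so the helper can also be checked on the host.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical fixed-size buffer holding the decimal text of an int.
struct IntText {
    char chars[12] = {};  // fits "-2147483648" plus the terminating '\0'
};

// Converts `value` to decimal text without std::string or heap allocation.
HD constexpr IntText to_text(int value) {
    IntText out{};
    char reversed[11] = {};
    int count = 0;
    // Work on the magnitude as unsigned so INT_MIN does not overflow.
    unsigned int magnitude = value < 0
        ? 0u - static_cast<unsigned int>(value)
        : static_cast<unsigned int>(value);
    // Emit digits least-significant first.
    do {
        reversed[count++] = static_cast<char>('0' + magnitude % 10u);
        magnitude /= 10u;
    } while (magnitude != 0u);
    // Copy the sign and the digits back in the right order.
    int pos = 0;
    if (value < 0) out.chars[pos++] = '-';
    while (count > 0) out.chars[pos++] = reversed[--count];
    out.chars[pos] = '\0';
    return out;
}

// Same intent as dump<I>() in the reproducer, using only the buffer above.
template <int I>
HD void dump_device_safe() {
    constexpr IntText text = to_text(I);
    printf("%s\n", text.chars);
}

#ifdef __CUDACC__
// Only compiled by nvcc; mirrors kernel<I> from the reproducer.
template <int I>
__global__ void kernel_device_safe() {
    dump_device_safe<I>();
}
#endif
```

Since to_text is explicitly annotated for both host and device, this variant does not depend on --expt-relaxed-constexpr to be callable from the kernel, so it should not exercise the missing diagnostic described above.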