Description
I've been diagnosing a problem where a CUDA kernel launch (cudaLaunchKernel) returns "named symbol not found" (cudaErrorSymbolNotFound) for certain kernel launches.
The code that produces the problem is quite complicated and I haven't yet been able to isolate it into something simpler, but it looks like name mangling is producing invalid names.
Running this code:
    KernelCUDA<ExecutionShapes_t, Threads, Blocks, Functions_t, decltype(Parameters)...><<<Blocks, Threads, MaxShmem, stream>>>(Functions, Parameters...);
    if (cudaGetLastError() == cudaErrorSymbolNotFound) {
        auto name = typeid(KernelCUDA<ExecutionShapes_t, Threads, Blocks, Functions_t, decltype(Parameters)...>).name();
        std::cout << "Mangled: " << name << std::endl;
        int status = 0;
        auto demangled = abi::__cxa_demangle(name, 0, 0, &status);
        std::cout << "Demangled: " << demangled << std::endl;
    }
The mangled name ends up being FvN5tadma5tupleIJZZZZNS_9OptimizerIJNS_4NodeINS_8SequenceIJEEEZ9InputNodeIJRNS_6TaggedILi0ENS_6TensorIlNS_9AllocatorILNS_6MemoryE1EEENS3_IJLi1ELi10EEEES4_EEEERNS6_ILi2ESC_EERNS6_ILi1ESC_EEEEDaDpOT_EUlDpRKSJ_E_JSD_SF_SH_EEENS2_INS3_IJLi1ELi10ELi768EEEEZNS_6gatherILi0ETkNS_9AnyTensorENS7_IfSA_NS3_IJLi30522ELi768EEEES4_EETkNS_9AnyTensorESD_EEDaRKT0_RKT1_EUlT_RSV_S10_RKT2_E_JNS7_IfSA_SR_S4_EEKSU_KSD_EEENS2_ISR_ZNSS_ILi0ETkNS_9AnyTensorENS7_IfSA_NS3_IJLi2ELi768EEEES4_EETkNS_9AnyTensorESH_EEDaSX_S10_EUlS11_S12_S10_S15_E_JS17_KS1C_KSH_EEENS2_ISR_ZNS_13CombineToNodeITkNS_9AnyTensorENS6_ILm3765833763ES17_EETkNS_9AnyTensorENS6_ILm2639559269ES17_EEZNS_plITkNS_9AnyTensorES1I_TkNS_9AnyTensorES1J_EEDaRKS11_SX_EUlS1M_SX_E_EEDaS1M_SX_S10_EUlS1M_SX_S10_RS13_E_JKS1I_KS1J_S17_EEENS2_ISR_ZNS1H_ITkNS_9AnyTensorENS6_ILm2072007968ES17_EETkNS_9AnyTensorES17_ZNS1K_ITkNS_9AnyTensorES1T_TkNS_9AnyTensorES17_EEDaS1M_SX_EUlS1M_SX_E_EEDaS1M_SX_S10_EUlS1M_SX_S10_S1O_E_JKS1T_KS17_S17_EEENS2_INS3_IJXtlNS_15BlockSizeMarkerILl512EEEEELi10ELi1EEEENS6_ILj2048EZNS_10ReduceNodeILi2ELb1ETkNS_9AnyTensorENS6_ILm3627180312ES17_EEZNHS17_4meanILin1ELb1ERS23_EEDcOSY_EUlS1M_SX_E_UlS1M_E_ZNHS24_ILin1ELb1ES25_EEDcS26_EUlS1M_E_QeqsrSY_6deviceLS9_1EEEDaS10_S15_RKT3_RKT4_EUlS1M_SX_RSY_E_EEJKS23_NS7_IfSA_NS3_IJLi1ELi10ELi1EEEES4_EEEEENS2_IS4_Z10OutputNodeIJRNS6_ILm129000195ENS7_IfSA_NS3_IJLi1ELi10ELi30522EEEES4_EEEEEEDaSL_EUlSO_E_JS2Q_EEEEE12CompileGraphILS9_1EEENS_13CompiledGraphERKSQ_RKS1A_RKS1G_RKS1S_RKS1Y_RKS2M_RKS2T_ENKUlTnivE_clILi3EEEDavENKUlTpTnivE_clIJLi2ELi3EEEEDavENKUlTnivE_clILi0EEEDavEUlRS11_DpOT0_E_ZZZZNS2V_ILS9_1EEES2W_S2Y_S30_S32_S34_S36_S38_S3A_ENKS3C_ILi3EEEDavENKS3E_IJLi2ELi3EEEEDavENKS3G_ILi1EEEDavEUlS3H_S3K_E_EEERNS7_IfNS8_ILS9_3EEESR_S4_EERS1C_RSC_RS17_S3T_E
abi::__cxa_demangle then returns status code -2: "mangled_name is not a valid name under the C++ ABI mangling rules".
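For anyone trying to reproduce the check, here is a minimal host-only sketch of inspecting the status code defensively. On failure abi::__cxa_demangle returns a null pointer, so the result should not be streamed directly. The names DemangleResult and TryDemangle are hypothetical helpers introduced for illustration, not part of the code in this report, and the sketch assumes an Itanium-ABI toolchain (GCC or Clang with libstdc++ or libc++abi).

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical helper type (not from the report): outcome of one demangling attempt.
struct DemangleResult {
    int status;        // 0 = success, -1 = allocation failure, -2 = invalid mangled name, -3 = invalid argument
    std::string text;  // demangled name on success, otherwise the input string unchanged
};

// Hypothetical helper: wraps abi::__cxa_demangle so a failed demangle never
// hands a null pointer to the caller, and the malloc'ed buffer is always freed.
inline DemangleResult TryDemangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, decltype(&std::free)> buffer(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    if (status == 0 && buffer) {
        return {status, std::string(buffer.get())};
    }
    // Fall back to the raw input so the caller can still log something useful.
    return {status, std::string(mangled ? mangled : "")};
}
```

In the snippet above, the call site would become something like `auto r = TryDemangle(name);` followed by printing both `r.status` and `r.text`, which keeps the -2 status visible next to the raw mangled string.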
This happens when instantiating any kernel at this point in the code, but only if it's instantiated with a lambda.
    __global__ void RandomKernel(auto) {}
    ...
    RandomKernel<<<1,1>>>([]{}); // Boom
So a kernel instantiated with a lambda (or any type?) dependent on a complicated mess of templates breaks name mangling at some point.
I'm still working to isolate the problem and will post any additional findings here. Any thoughts are appreciated.