Description
Describe the bug
After upgrading to Julia 1.12.1, GPU kernels that compiled and ran perfectly on Julia 1.11.7 (same code, environment, and hardware) now fail during PTX assembly with:
```
ERROR: Failed to compile PTX code (ptxas exited with code 4294967295)
ptxas ... error   : Modifier '.NaN' requires .target sm_80 or higher
```
This happens on an RTX 2080 (sm_75) using CUDA.jl 5.8.5 and cuDNN 1.4.5.
It appears that PTX code is being generated with `.NaN` modifiers that require sm_80+, even though the target GPU is sm_75.
Reverting to Julia 1.11.7 immediately fixes the issue.
To reproduce
Minimal working example (any CUDA kernel that performs a reduction, such as `fast_maximum` or `logsumexp`, will trigger this):

```julia
using Flux, CUDA, NNlib

CUDA.allowscalar(false)
x = cu(rand(Float32, 128, 128))
y = maximum(x)  # or NNlib.logsumexp(x) to trigger reduction kernels
```

This throws:

```
ptxas ... error : Modifier '.NaN' requires .target sm_80 or higher
```

On Julia 1.11.7, the same code executes successfully.
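For reference, the device's compute capability can be confirmed with CUDA.jl's standard `capability` query; on this machine it is 7.5, below the sm_80 that the `.NaN` modifier requires:

```julia
using CUDA

# Query the compute capability of the active device; an RTX 2080 reports
# v"7.5.0" (sm_75), below the sm_80 required by the '.NaN' PTX modifier.
CUDA.capability(CUDA.device())
```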
Manifest.toml
```
CUDA.jl v5.8.5
cuDNN.jl v1.4.5
Flux.jl v0.16.5
NNlib.jl v0.9.31
GPUCompiler.jl v0.26.9
GPUArrays.jl v10.2.0
LLVM.jl v6.3.0
```
Expected behavior
PTX assembly should correctly target the actual GPU architecture (sm_75 for the RTX 2080) without emitting `.NaN` instruction modifiers, which require sm_80+.
The same code should compile and run on Julia 1.12.1 just as it does on Julia 1.11.7.
Version info
Details on Julia:
```
Julia Version 1.12.1 (2025-10-25)
Commit 3bdfb0c7f0 (2025-09-28 06:23 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: 16.0.6
Environment:
  JULIA_CUDA_USE_BINARYBUILDER = false
  CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9
```
Details on CUDA:
```
CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 577.00
CUDA.jl 5.8.5
cuDNN.jl 1.4.5
Device: NVIDIA GeForce RTX 2080 (sm_75)
```
Additional context
- The same project runs perfectly under Julia 1.11.7 with identical packages.
- Reproducible across fresh environments (`Pkg.activate("temp"); Pkg.add("CUDA")`).
- Behavior strongly suggests a regression introduced in Julia 1.12, or in the updated LLVM/PTX codegen used by CUDA.jl.
- The stack trace points to `fast_maximum`/`logsumexp` inside `NNlibCUDAExt`, failing at `ptxas` compilation.
- Possibly related to the earlier issue #2148 ("LLVM generates max.NaN which only works on sm_80"), which was fixed for earlier LLVM versions but might have re-emerged under Julia 1.12 (an IR-level check of this mechanism is sketched below).
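To illustrate the suspected mechanism (an assumption based on #2148, not something verified in this report): Julia's `max` on floats propagates NaN, and LLVM canonicalizes such patterns to the `llvm.maximum` intrinsic, which the NVPTX backend lowers to `max.NaN`, an sm_80+ instruction. The optimized IR can be checked from the host side:

```julia
using InteractiveUtils

# Dump the optimized LLVM IR for Base.max on Float32. A call to
# @llvm.maximum.f32 in the output would lower to 'max.NaN' under NVPTX.
code_llvm(max, Tuple{Float32,Float32})
```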
 
Workaround
- Downgrading Julia to 1.11.7 fully resolves the issue.
- Forcing the PTX target via `ENV["JULIA_CUDA_ARCH"] = "sm_75"` does not eliminate the error. (A possible code-level stopgap is sketched below.)
Suggested next steps
- Verify the PTX emitted for `fast_maximum` under Julia 1.12 with CUDA.jl 5.8.5 / 5.9.2 (see the sketch after this list).
- Confirm whether LLVM 16.0.6 in Julia 1.12 reintroduced `.NaN` modifier generation for non-Ampere GPUs.
- If confirmed, patch CUDA.jl's PTX lowering, or default the codegen to avoid `.NaN` for targets below sm_80.
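A minimal way to check the first step, using CUDA.jl's standard reflection macro (the exact kernel names in the output are version-dependent):

```julia
using CUDA

x = cu(rand(Float32, 128, 128))

# Dump the PTX generated for the maximum() reduction kernel and look for
# 'max.NaN'; on an sm_75 target that modifier should never be emitted.
CUDA.@device_code_ptx maximum(x)
```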