
PTX compile error: ".NaN requires .target sm_80 or higher" on Julia 1.12 (RTX 2080 / sm_75, works fine on Julia 1.11.7) #2946

@potaslab


Describe the bug

After upgrading to Julia 1.12.1, GPU kernels that compiled and ran perfectly on Julia 1.11.7 (same code, environment, and hardware) now fail during PTX assembly with:

ERROR: Failed to compile PTX code (ptxas exited with code 4294967295)
ptxas ... error   : Modifier '.NaN' requires .target sm_80 or higher

This happens on an RTX 2080 (sm_75) using CUDA.jl 5.8.5 and cuDNN 1.4.5.
It appears that PTX code is being generated with .NaN modifiers requiring sm_80+, even though the target GPU is sm_75.

Reverting to Julia 1.11.7 immediately fixes the issue.


To reproduce

Minimal working example (any CUDA kernel that performs reductions like fast_maximum or logsumexp will trigger this):

using Flux, CUDA, NNlib

CUDA.allowscalar(false)

x = cu(rand(Float32, 128, 128))
y = maximum(x)  # or NNlib.logsumexp(x) to trigger reduction kernels

This throws:

ptxas ... error : Modifier '.NaN' requires .target sm_80 or higher

On Julia 1.11.7, the same code executes successfully.
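To confirm that the offending modifier is actually present in the emitted PTX (rather than being introduced later), the generated code for the failing reduction can be dumped with CUDA.jl's `@device_code_ptx` macro. A minimal sketch, assuming a CUDA-capable GPU is available:

```julia
using CUDA

CUDA.allowscalar(false)

x = cu(rand(Float32, 128, 128))

# Print the PTX that CUDA.jl generates for the reduction kernel.
# On the affected setup, the output should show a `.target sm_75`
# directive together with a `max.NaN.f32` instruction, which is
# exactly the combination ptxas rejects.
CUDA.@device_code_ptx maximum(x)
```

Searching the dumped PTX for `.NaN` makes it easy to tell whether the modifier comes from Julia's LLVM codegen or from a later stage.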

Manifest.toml

CUDA.jl v5.8.5
cuDNN.jl v1.4.5
Flux.jl v0.16.5
NNlib.jl v0.9.31
GPUCompiler.jl v0.26.9
GPUArrays.jl v10.2.0
LLVM.jl v6.3.0


Expected behavior

PTX assembly should correctly target the actual GPU architecture (sm_75 for RTX 2080) without emitting .NaN instruction modifiers that require sm_80+.
The same code should compile and run on Julia 1.12.1 just as it does on Julia 1.11.7.


Version info

Details on Julia:

Julia Version 1.12.1 (2025-10-25)
Commit 3bdfb0c7f0 (2025-09-28 06:23 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: 16.0.6
Environment:
  JULIA_CUDA_USE_BINARYBUILDER = false
  CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9

Details on CUDA:

CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 577.00
CUDA.jl 5.8.5
cuDNN.jl 1.4.5
Device: NVIDIA GeForce RTX 2080 (sm_75)

Additional context

  • The same project runs perfectly under Julia 1.11.7 with identical packages.
  • Reproducible across fresh environments (Pkg.activate("temp"); Pkg.add("CUDA")).
  • Behavior strongly suggests a regression introduced in Julia 1.12 or updated LLVM/PTX codegen used by CUDA.jl.
  • The stack trace points to fast_maximum / logsumexp inside NNlibCUDAExt, where the ptxas compilation failure occurs.
  • Possibly related to previous issue #2148 ("LLVM generates max.NaN which only works on sm_80"), which was fixed for earlier LLVM versions but might have re-emerged under Julia 1.12.

Workaround

  • Downgrading Julia to 1.11.7 fully resolves the issue.
  • Forcing the target architecture via ENV["JULIA_CUDA_ARCH"] = "sm_75" does not eliminate the error.

Suggested next steps

  • Verify PTX emission for fast_maximum under Julia 1.12 + CUDA.jl 5.8.5 / 5.9.2.
  • Confirm whether LLVM 16.0.6 in Julia 1.12 reintroduced .NaN modifier generation for non-Ampere GPUs.
  • If confirmed, patch CUDA.jl’s PTX lowering or set the default codegen to avoid .NaN for < sm_80.
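The first two steps above can be checked locally. A minimal sketch using CUDA.jl's public API (`CUDA.capability`, `@device_code_ptx`), assuming a CUDA-capable GPU:

```julia
using CUDA, NNlib

# Report the device's actual compute capability;
# an RTX 2080 should show v"7.5".
@show CUDA.capability(CUDA.device())

x = cu(rand(Float32, 128, 128))

# Dump the PTX for the failing NNlib reduction path and compare the
# `.target` directive against the instructions used: any `.NaN`
# modifier alongside `.target sm_75` confirms the mismatch.
CUDA.@device_code_ptx NNlib.logsumexp(x)
```

Running this under both Julia 1.11.7 and 1.12.1 and diffing the PTX would pinpoint whether the `.NaN` modifier first appears with the newer LLVM.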
