Description
Describe the bug
After upgrading to Julia 1.12.1, GPU kernels that compiled and ran perfectly on Julia 1.11.7 (same code, environment, and hardware) now fail during PTX assembly with:
```
ERROR: Failed to compile PTX code (ptxas exited with code 4294967295)
ptxas ... error   : Modifier '.NaN' requires .target sm_80 or higher
```
This happens on an RTX 2080 (sm_75) using CUDA.jl 5.8.5 and cuDNN 1.4.5.
It appears that PTX code is being generated with `.NaN` modifiers that require sm_80+, even though the target GPU is sm_75.
Reverting to Julia 1.11.7 immediately fixes the issue.
To reproduce
Minimal working example (any CUDA kernel that performs a reduction, such as `fast_maximum` or `logsumexp`, will trigger this):

```julia
using Flux, CUDA, NNlib

CUDA.allowscalar(false)
x = cu(rand(Float32, 128, 128))
y = maximum(x)  # or NNlib.logsumexp(x) to trigger reduction kernels
```

This throws:

```
ptxas ... error : Modifier '.NaN' requires .target sm_80 or higher
```

On Julia 1.11.7, the same code executes successfully.
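For reference, the device's compute capability can be confirmed with CUDA.jl's standard `capability` query; on this machine it is 7.5, below the sm_80 that the `.NaN` modifier requires:

```julia
using CUDA

# Query the compute capability of the active device; an RTX 2080 reports
# v"7.5.0" (sm_75), below the sm_80 required by the '.NaN' PTX modifier.
CUDA.capability(CUDA.device())
```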
Manifest.toml
```
CUDA.jl v5.8.5
cuDNN.jl v1.4.5
Flux.jl v0.16.5
NNlib.jl v0.9.31
GPUCompiler.jl v0.26.9
GPUArrays.jl v10.2.0
LLVM.jl v6.3.0
```
Expected behavior
PTX assembly should correctly target the actual GPU architecture (sm_75 for the RTX 2080) without emitting `.NaN` instruction modifiers, which require sm_80+.
The same code should compile and run on Julia 1.12.1 just as it does on Julia 1.11.7.
Version info
Details on Julia:
```
Julia Version 1.12.1 (2025-10-25)
Commit 3bdfb0c7f0 (2025-09-28 06:23 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: 16.0.6
Environment:
  JULIA_CUDA_USE_BINARYBUILDER = false
  CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9
```
Details on CUDA:
```
CUDA runtime 12.9, artifact installation
CUDA driver 12.9
NVIDIA driver 577.00
CUDA.jl 5.8.5
cuDNN.jl 1.4.5
Device: NVIDIA GeForce RTX 2080 (sm_75)
```
Additional context
- The same project runs perfectly under Julia 1.11.7 with identical packages.
- Reproducible across fresh environments (`Pkg.activate("temp"); Pkg.add("CUDA")`).
- Behavior strongly suggests a regression introduced in Julia 1.12, or in the updated LLVM/PTX codegen used by CUDA.jl.
- The stack trace points to `fast_maximum`/`logsumexp` inside `NNlibCUDAExt`, failing at `ptxas` compilation.
- Possibly related to the earlier issue #2148 ("LLVM generates max.NaN which only works on sm_80"), which was fixed for earlier LLVM versions but might have re-emerged under Julia 1.12 (an IR-level check of this mechanism is sketched below).
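To illustrate the suspected mechanism (an assumption based on #2148, not something verified in this report): Julia's `max` on floats propagates NaN, and LLVM canonicalizes such patterns to the `llvm.maximum` intrinsic, which the NVPTX backend lowers to `max.NaN`, an sm_80+ instruction. The optimized IR can be checked from the host side:

```julia
using InteractiveUtils

# Dump the optimized LLVM IR for Base.max on Float32. A call to
# @llvm.maximum.f32 in the output would lower to 'max.NaN' under NVPTX.
code_llvm(max, Tuple{Float32,Float32})
```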
 
Workaround
- Downgrading Julia to 1.11.7 fully resolves the issue.
- Forcing the PTX target via `ENV["JULIA_CUDA_ARCH"] = "sm_75"` does not eliminate the error. (A possible code-level stopgap is sketched below.)
Suggested next steps
- Verify the PTX emitted for `fast_maximum` under Julia 1.12 with CUDA.jl 5.8.5 / 5.9.2 (see the sketch after this list).
- Confirm whether LLVM 16.0.6 in Julia 1.12 reintroduced `.NaN` modifier generation for non-Ampere GPUs.
- If confirmed, patch CUDA.jl's PTX lowering, or default the codegen to avoid `.NaN` for targets below sm_80.
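A minimal way to check the first step, using CUDA.jl's standard reflection macro (the exact kernel names in the output are version-dependent):

```julia
using CUDA

x = cu(rand(Float32, 128, 128))

# Dump the PTX generated for the maximum() reduction kernel and look for
# 'max.NaN'; on an sm_75 target that modifier should never be emitted.
CUDA.@device_code_ptx maximum(x)
```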