Using `@fastmath` for a function inlined in a kernel yields PTX compile error for Float64 arrays in Julia 1.12.0

In the following MWE, using `@fastmath` with Float64 arrays in a GPU kernel produces a PTX compile error in Julia 1.12.0. Removing `@fastmath` or using Float32 does not yield an error. I am not sure if this is a KA bug or a CUDA.jl bug, so I will move the issue there if necessary.

```julia
using CUDA
using KernelAbstractions: get_backend, @index, @kernel

@fastmath @inline function flux_out(I::CartesianIndex{d},u) where {d}
    s = zero(eltype(u))
    for i in 1:d
        @inbounds s += max(0,u[I,i])
    end
    return s
end
@kernel function _CFL!(σ, u)
    I = @index(Global, Cartesian)
    σ[I] = flux_out(I,u)
end
function CFL!(σ, u)
    _CFL!(get_backend(σ), 64)(σ, u, ndrange=size(σ))
end
function main()
    T = Float64
    σ = CUDA.rand(T,16,16)
    u = CUDA.rand(T,16,16,2)
    CFL!(σ, u)
end

main()  # call at top level to trigger kernel compilation (see the stack trace below)
```
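As noted above, dropping `@fastmath` avoids the error. A minimal sketch of that variant of the helper follows. The name `flux_out_nofastmath` is hypothetical (not part of the MWE or of WaterLily), and the check at the end runs on a plain CPU `Array` only, so it confirms the arithmetic but not the GPU compilation behavior.

```julia
# Same helper as in the MWE, but without `@fastmath`.
# `flux_out_nofastmath` is a hypothetical name used only for this sketch.
@inline function flux_out_nofastmath(I::CartesianIndex{d}, u) where {d}
    s = zero(eltype(u))
    for i in 1:d
        # Plain `max`, not rewritten by `@fastmath`
        @inbounds s += max(0, u[I, i])
    end
    return s
end

# CPU-only sanity check on a regular Array (no GPU required)
u = rand(Float64, 16, 16, 2)
I = CartesianIndex(3, 4)
expected = max(0, u[3, 4, 1]) + max(0, u[3, 4, 2])
@assert flux_out_nofastmath(I, u) ≈ expected
```

To try it in the MWE, the kernel body would call `flux_out_nofastmath(I, u)` in place of `flux_out(I, u)`; everything else stays the same.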

<details>

<summary>Error trace</summary>

```julia
ERROR: Failed to compile PTX code (ptxas exited with code 255)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_89 --output-file /tmp/jl_kFmdGHiFet.cubin /tmp/jl_6nYDKLCf2P.ptx
ptxas /tmp/jl_6nYDKLCf2P.ptx, line 114; error   : Illegal modifier '.NaN' for instruction 'max'
ptxas /tmp/jl_6nYDKLCf2P.ptx, line 120; error   : Illegal modifier '.NaN' for instruction 'max'
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach /tmp/jl_6nYDKLCf2P.ptx
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:44
  [2] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/OnIOF/src/compiler/compilation.jl:356
  [3] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Gp8bZ/src/execution.jl:245
  [4] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Gp8bZ/src/execution.jl:159
  [5] macro expansion
    @ ~/.julia/packages/CUDA/OnIOF/src/compiler/execution.jl:373 [inlined]
  [6] macro expansion
    @ ./lock.jl:376 [inlined]
  [7] cufunction(f::typeof(gpu__CFL!), tt::Type{Tuple{…}}; kwargs::@Kwargs{always_inline::Bool, maxthreads::Int64})
    @ CUDA ~/.julia/packages/CUDA/OnIOF/src/compiler/execution.jl:368
  [8] macro expansion
    @ ~/.julia/packages/CUDA/OnIOF/src/compiler/execution.jl:112 [inlined]
  [9] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels ~/.julia/packages/CUDA/OnIOF/src/CUDAKernels.jl:124
 [10] Kernel
    @ ~/.julia/packages/CUDA/OnIOF/src/CUDAKernels.jl:110 [inlined]
 [11] CFL!
    @ ~/Workspace/tudelft1/WaterLily-Tests/test_fluxout2.jl:17 [inlined]
 [12] main()
    @ Main ~/Workspace/tudelft1/WaterLily-Tests/test_fluxout2.jl:23
 [13] top-level scope
    @ ~/Workspace/tudelft1/WaterLily-Tests/test_fluxout2.jl:25
Some type information was truncated. Use `show(err)` to see complete types.
```

</details>

We caught the bug in WaterLily: see https://github.com/WaterLily-jl/WaterLily.jl/issues/258 and https://github.com/WaterLily-jl/WaterLily.jl/pull/261.
