Cannot use `lambertw` on the GPU

I've been trying to use `lambertw` on the GPU, but I always get an error (I originally reported this at https://github.com/CliMA/Oceananigans.jl/issues/3438). I mostly work with the GPU indirectly through KernelAbstractions.jl, and the following MWE illustrates the error I'm getting:

```julia
using KernelAbstractions
using LambertW: lambertw

@kernel function mul2_kernel(A)
  I = @index(Global)
  A[I] = lambertw(A[I])
end

using CUDA: CuArray
A = CuArray(ones(10, 10))

backend = get_backend(A)
mul2_kernel(backend, 64)(A, ndrange=size(A))
synchronize(backend)
display(A)
```

Instead of this working, I get a huge error that starts with

```
ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mul2_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(64, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, ::CUDA.CuDeviceMatrix{Float64, 1}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to __memcmp_evex_movbe)
Stacktrace:
 [1] _memcmp
   @ ./strings/string.jl:124
 [2] ==
   @ ./strings/string.jl:136
 [3] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
 [1] multiple call sites
   @ unknown:0
```

You can see the whole error [here](https://github.com/JuliaMath/LambertW.jl/files/14041475/error.txt).

I was also able to come up with a simpler example that uses CUDA.jl directly and produces a similar error. For some reason it occasionally fails with a segfault instead, so I'm not sure whether the same thing is going on here (or whether I'm doing something wrong, since I've never really worked with CUDA directly). Here's the CUDA MWE:

```julia
using LambertW: lambertw
using CUDA: CuArray

A = CuArray(ones(10, 10))
B = lambertw.(A)
display(B)
```

Is there any way to make `lambertw` work on the GPU, and more specifically inside KernelAbstractions kernels?
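For context, one possible stopgap is a hand-rolled principal branch that avoids strings and exceptions, which is what the error above seems to point at. The following is only a minimal sketch under assumptions: `lambertw0_halley` is a hypothetical name and not part of LambertW.jl, it handles only real `x >= 0` of moderate size, it uses a fixed number of Halley iterations, and it has not been tested inside a GPU kernel.

```julia
# Hypothetical helper, NOT part of LambertW.jl.
# Principal branch W0(x) for real x >= 0 via a fixed number of Halley iterations.
# It uses no strings, exceptions, or allocations, so it is at least a candidate
# for use inside a GPU kernel (assumption: untested on a GPU).
function lambertw0_halley(x::T) where {T<:AbstractFloat}
    # Initial guess: log1p(x) >= W0(x) for all x >= 0
    w = log1p(x)
    for _ in 1:8
        ew = exp(w)
        f = w * ew - x                       # residual of w * exp(w) = x
        # Halley step: f' = ew * (w + 1), f'' = ew * (w + 2)
        denom = ew * (w + 1) - (w + 2) * f / (2w + 2)
        w -= f / denom
    end
    return w
end

# CPU check via broadcasting; every entry should satisfy w * exp(w) ≈ 1
A = ones(10, 10)
B = lambertw0_halley.(A)
```

If this turned out to compile on the GPU, the kernel body in the first MWE would call `lambertw0_halley(A[I])` instead of `lambertw(A[I])`. It would still be much better to have the real `lambertw` working, since this sketch ignores the other branches, negative and complex arguments, and very large inputs.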

Thanks!

Cannot use `lambertw` on the GPU #30
