Description
I've been trying to use lambertw on the GPU, but I always get an error (the original issue I posted is CliMA/Oceananigans.jl#3438). I mostly work with the GPU indirectly through KernelAbstractions.jl, and the following MWE illustrates the error I'm getting:
using KernelAbstractions
using LambertW: lambertw
@kernel function mul2_kernel(A)
    I = @index(Global)
    A[I] = lambertw(A[I])
end
using CUDA: CuArray
A = CuArray(ones(10, 10))
backend = get_backend(A)
mul2_kernel(backend, 64)(A, ndrange=size(A))
synchronize(backend)
display(A)
Instead of this working, I get a huge error that starts with
ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mul2_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(64, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, ::CUDA.CuDeviceMatrix{Float64, 1}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to __memcmp_evex_movbe)
Stacktrace:
[1] _memcmp
@ ./strings/string.jl:124
[2] ==
@ ./strings/string.jl:136
[3] multiple call sites
@ unknown:0
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
[1] multiple call sites
@ unknown:0
You can see the whole error here.
I was also able to come up with a simpler example that uses CUDA.jl directly and produces a similar error. For some reason it occasionally fails with a segfault instead, so I'm not sure whether the same thing is going on there (or maybe I'm doing something wrong, since I've never really worked with CUDA directly). Here's the CUDA MWE:
using LambertW: lambertw
using CUDA: CuArray
A = CuArray(ones(10, 10))
B = lambertw.(A)
display(B)
Is there any way to make lambertw work on the GPU, and more specifically with KernelAbstractions?
Thanks!
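For reference, here is one possible workaround, sketched under assumptions rather than taken from LambertW.jl. The stack trace above points at a String comparison (`==` in strings/string.jl) reached from inside the kernel, which suggests the library call touches code paths the GPU compiler rejects. The hypothetical helper `lambertw0_sketch` below sidesteps that by computing only the principal branch for real x >= 0 with a fixed number of Halley iterations on w*exp(w) = x, using nothing but scalar arithmetic. The name, the iteration count, and the log1p starting guess are all illustrative choices, and I have only reasoned about this on the CPU; whether it compiles on a given GPU backend is untested.

```julia
# Hypothetical helper, NOT part of LambertW.jl: principal branch W0(x) for
# real x >= 0 only. It uses no strings, no allocations, and no explicit
# error paths, just scalar floating-point arithmetic.
function lambertw0_sketch(x::T) where {T<:AbstractFloat}
    # Starting guess: log1p(x) is an upper bound on W0(x) for x >= 0.
    w = log1p(x)
    # Fixed iteration count (an assumption), so there is no
    # data-dependent convergence test inside the loop.
    for _ in 1:8
        ew = exp(w)
        f = w * ew - x                      # residual of w*exp(w) = x
        # Halley's update for f(w) = w*exp(w) - x
        w -= f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
    end
    return w
end

# CPU check: the result should satisfy w*exp(w) ≈ x.
w = lambertw0_sketch(10.0)
println(w * exp(w))
```

In the KernelAbstractions MWE above, the idea would be to replace the call in the kernel body with `A[I] = lambertw0_sketch(A[I])`. Negative arguments and the other branch are deliberately out of scope here.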