Cannot use lambertw on the GPU #30

@tomchor

I've been trying to use lambertw on the GPU, but I always get an error (I originally posted this as CliMA/Oceananigans.jl#3438). I mostly work with the GPU indirectly through KernelAbstractions.jl, and the following MWE illustrates the error I'm getting:

using KernelAbstractions
using LambertW: lambertw

@kernel function mul2_kernel(A)
  I = @index(Global)
  A[I] = lambertw(A[I])
end

using CUDA: CuArray
A = CuArray(ones(10, 10))

backend = get_backend(A)
mul2_kernel(backend, 64)(A, ndrange=size(A))
synchronize(backend)
display(A)

Instead of running, this produces a very long error that starts with:

ERROR: LoadError: InvalidIRError: compiling MethodInstance for gpu_mul2_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(64, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, ::CUDA.CuDeviceMatrix{Float64, 1}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to __memcmp_evex_movbe)
Stacktrace:
 [1] _memcmp
   @ ./strings/string.jl:124
 [2] ==
   @ ./strings/string.jl:136
 [3] multiple call sites
   @ unknown:0
Reason: unsupported call to an unknown function (call to julia.get_pgcstack)
Stacktrace:
 [1] multiple call sites
   @ unknown:0

You can see the whole error here.

I was also able to come up with a simpler example that uses CUDA.jl directly and produces a similar error. For some reason it occasionally fails with a segfault instead, so I'm not sure whether the same thing is going on (or maybe I'm doing something wrong, since I've never really worked with CUDA directly). Here's the CUDA MWE:

using LambertW: lambertw
using CUDA: CuArray

A = CuArray(ones(10, 10))
B = lambertw.(A)
display(B)

Is there any way to make lambertw work on the GPU? More specifically, can it be made to work inside KernelAbstractions kernels?
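
To make the question concrete, here is a minimal sketch of the kind of function I would expect to compile inside a kernel. It is not how LambertW.jl is implemented: `lambertw0_sketch` is a hypothetical name, and the method (a fixed number of Halley iterations from a `log1p` starting guess) is just a standard root-finder that I swapped in because it needs no strings, no exceptions, and no allocations. It only covers the principal branch for moderate real x >= 0. I have only reasoned about it on the CPU, so whether it actually compiles on the GPU is an assumption.

```julia
# Hypothetical helper, NOT part of LambertW.jl: principal branch W0(x) for
# real x >= 0, i.e. the w >= 0 that solves w * exp(w) == x.
# Assumption: x is not so large that w * exp(w) overflows for the float type.
function lambertw0_sketch(x::T) where {T<:AbstractFloat}
    # Starting guess: log1p(x) is an upper bound on W0(x) for x >= 0.
    w = log1p(x)
    # Fixed iteration count, so there is no convergence check that could throw.
    for _ in 1:20
        ew = exp(w)
        f = w * ew - x  # residual of w * exp(w) = x
        # Halley update for f(w) = w * exp(w) - x
        w -= f / (ew * (w + 1) - (w + 2) * f / (2w + 2))
    end
    return w
end

# CPU sanity check: W0(1) is the omega constant, roughly 0.567143.
w1 = lambertw0_sketch(1.0)
@assert abs(w1 * exp(w1) - 1.0) < 1e-12
```

In the KernelAbstractions MWE above, the idea would be to call `lambertw0_sketch(A[I])` in place of `lambertw(A[I])`.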

Thanks!
