localmem does not work for cpu backend? #544

@ww1g11

Hi, I am trying to write a matrix-vector multiplication (matvec) kernel that uses shared memory via @localmem. It works on the CUDA backend, but when I switch to the CPU backend it fails with the following error:

```
ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Stacktrace:
 [1] cpu_matvec_kernel!
```
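The error reports `j` as an undefined global in `Main`. In plain Julia, that is exactly what happens when a variable is assigned only inside nested loop bodies and then read outside them. As a simplified model, assume the CPU backend runs each stretch of code between two `@synchronize` calls as its own loop over the work-items of a group, and evaluates the condition of a loop that contains `@synchronize` outside those per-item loops. Under that model the same error can be reproduced without KernelAbstractions. The function `lowered_reduction_broken` is a hypothetical hand-written lowering for illustration, not the code KernelAbstractions actually generates.

```julia
# Hypothetical hand-lowered model of the kernel's reduction on a CPU backend:
# each stretch between two barriers becomes its own loop over the work-items.
function lowered_reduction_broken(groupsize::Int)
    for item in 1:groupsize          # segment ending at the first @synchronize
        j = 16                       # a new local, visible only in this loop body
    end
    while j > 0                      # loop header sits outside all item loops
        for item in 1:groupsize      # segment ending at the barrier in the loop
            # the cache update would go here
        end
        for item in 1:groupsize      # segment after that barrier
            j = j ÷ 2                # yet another local, unrelated to the header's `j`
        end
    end
    return nothing
end

# The `j` in `while j > 0` resolves to a global, which does not exist.
err = try
    lowered_reduction_broken(256)
    nothing
catch e
    e
end
@assert err isa UndefVarError && err.var === :j
```

Under the same model, the lane `i` computed before the first `@synchronize` would not be visible in the later segments either, so moving `j` alone may not be enough.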

How can I fix this? Many thanks.

The Julia script is shown below:

```julia
using KernelAbstractions
using CUDA
using Test

@kernel function matvec_kernel!(output, @Const(A), @Const(b))
    I = @index(Global, Linear)
    I = div(I-1, 32) + 1
    idx = @index(Local, Linear)
    i = (idx - 1) % 32 + 1  # local index within the warp

    cache_size = @uniform @groupsize()
    cache = @localmem eltype(output) cache_size

    N = size(A, 2)
    sum = zero(eltype(output))
    @inbounds begin
        for J = i:32:N
            sum += A[I, J] * b[J]
        end
        cache[idx] = sum
    end
    @synchronize

    j::Int = 16
    while j > 0
        if i <= j
            @inbounds cache[idx] += cache[idx + j]  # `j` cannot be found on the CPU backend
        end
        @synchronize
        j = j ÷ 2
    end

    if i == 1
        @inbounds output[I] = cache[idx]
    end

end

function matvec!(output, A, b)
    backend = KernelAbstractions.get_backend(A)
    kernel! = matvec_kernel!(backend, 256)
    kernel!(output, A, b; ndrange=32*size(A, 1))
end


m, n = 2^10, 2^10
A = CUDA.rand(Float32, m, n)
b = CUDA.rand(Float32, n)
output = CUDA.rand(Float32, m)

matvec!(output, A, b)
@test isapprox(output, A * b)

matvec!(Array(output), Array(A), Array(b))
```
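For comparison, here is a plain-Julia model of what the kernel computes, arranged the way a barrier-splitting CPU backend would have to run it. Each row is handled by `LANES = 32` consecutive work-items, which store strided partial sums in a per-group scratch array standing in for `@localmem` and then tree-reduce them. The reduction counter `j` is kept outside the per-item loops because it is the same for the whole group, and the row and lane are recomputed in every segment instead of being carried across a barrier. `matvec_model!` and `LANES` are hypothetical names, and the model assumes `32 * size(A, 1)` is a multiple of the group size. In KernelAbstractions itself, `@uniform` (group-wide values) and `@private` (per-item values that must survive `@synchronize`) look like the intended tools for these two roles, but this sketch does not verify how they behave inside this particular `while` loop.

```julia
using LinearAlgebra  # for the reference product A * b in the usage example

const LANES = 32  # work-items cooperating on one row, as in the original kernel

# Hypothetical plain-Julia model of one launch: every barrier-free segment is a
# loop over the work-items of a group, and `j` is a group-wide (uniform) value.
function matvec_model!(output, A, b; groupsize::Int = 256)
    m, n = size(A)
    ndrange = LANES * m
    @assert groupsize % LANES == 0 && ndrange % groupsize == 0
    T = eltype(output)
    for group in 0:(ndrange ÷ groupsize - 1)
        cache = zeros(T, groupsize)  # stands in for @localmem
        # Segment 1: each work-item stores one strided partial sum.
        for idx in 1:groupsize
            row  = (group * groupsize + idx - 1) ÷ LANES + 1
            lane = (idx - 1) % LANES + 1
            acc = zero(T)
            for col in lane:LANES:n
                acc += A[row, col] * b[col]
            end
            cache[idx] = acc
        end
        # Tree reduction: `j` is uniform, so it lives outside the item loops.
        j = LANES ÷ 2
        while j > 0
            for idx in 1:groupsize
                lane = (idx - 1) % LANES + 1  # recomputed in every segment
                if lane <= j
                    cache[idx] += cache[idx + j]
                end
            end
            j ÷= 2  # updated once per step for the whole group
        end
        # Final segment: lane 1 of each row writes the reduced value.
        for idx in 1:groupsize
            row  = (group * groupsize + idx - 1) ÷ LANES + 1
            lane = (idx - 1) % LANES + 1
            if lane == 1
                output[row] = cache[idx]
            end
        end
    end
    return output
end

# Usage: 64 rows -> ndrange 2048, i.e. 8 groups of 256 work-items.
A = rand(Float32, 64, 100)
b = rand(Float32, 100)
out = matvec_model!(zeros(Float32, 64), A, b)
@assert isapprox(out, A * b)
```

Running the reduction steps one after another is safe here because, in each step, lanes `1:j` only read lanes `j+1:2j` of the same row, which that step never writes.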

The versioninfo() gives:

```
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × 13th Gen Intel(R) Core(TM) i7-13700F
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 561.3.0

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+561.3

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 3060 Ti (sm_86, 5.048 GiB / 8.000 GiB available)
```
