Open
Description
Hi, I am trying to write a matvec kernel that uses shared memory. It works with the CUDA backend, but it fails with the following error when I switch to the CPU backend:
ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Stacktrace:
 [1] cpu_matvec_kernel!
How can I fix this? Many thanks.
The Julia script is shown below:
using KernelAbstractions
using CUDA
using Test
@kernel function matvec_kernel!(output, @Const(A), @Const(b))
    I = @index(Global, Linear)
    I = div(I-1, 32) + 1
    idx = @index(Local, Linear)
    i = (idx - 1) % 32 + 1  # local index within the warp
    cache_size = @uniform @groupsize()
    cache = @localmem eltype(output) cache_size
    N = size(A, 2)
    sum = zero(eltype(output))
    @inbounds begin
        for J = i:32:N
            sum += A[I, J] * b[J]
        end
        cache[idx] = sum
    end
    @synchronize
    j::Int = 16
    while j > 0
        if i <= j
            @inbounds cache[idx] += cache[idx + j]  # j cannot be found on the CPU backend
        end
        @synchronize
        j = j ÷ 2
    end
    if i == 1
        @inbounds output[I] = cache[idx]
    end
    
end
function matvec!(output, A, b)
    backend = KernelAbstractions.get_backend(A)
    kernel! = matvec_kernel!(backend, 256)
    kernel!(output, A, b; ndrange=32*size(A, 1))
end
m, n = 2^10, 2^10
A = CUDA.rand(Float32, m, n)
b = CUDA.rand(Float32, n)
output = CUDA.rand(Float32, m)
matvec!(output, A, b)
@test isapprox(output, A * b)
matvec!(Array(output), Array(A), Array(b))

The output of versioninfo() is:
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × 13th Gen Intel(R) Core(TM) i7-13700F
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)
julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 561.3.0
CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+561.3
Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0
Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6
1 device:
  0: NVIDIA GeForce RTX 3060 Ti (sm_86, 5.048 GiB / 8.000 GiB available)