Undefined variable error in kernel after update to 0.9.34 #575

@nHackel

Hello, a kernel I wrote with KernelAbstractions.jl has been broken since updating to version 0.9.34. I've reduced my kernel to this MWE:

using KernelAbstractions

function foo(arr::AbstractMatrix{T}) where T
	backend = get_backend(arr)

	@kernel function foo_kernel(res, @Const(arr))
		row = @index(Group, Linear)
		grid_stride = prod(@groupsize)
		localIdx = @index(Local, Linear)
		N = size(arr, 2)
		
		shared = @localmem eltype(res) grid_stride
		shared[localIdx] = zero(eltype(res))
		
		tmp = zero(eltype(res))
		for i = localIdx:grid_stride:N
			tmp += arr[row, i]
		end
		shared[localIdx] = tmp
		@synchronize


		@private s = div(min(grid_stride, N), Int32(2))
		while s > Int32(0)
			if localIdx <= s
				shared[localIdx] = shared[localIdx] + shared[localIdx + s]
			end
			s >>= 1
			@synchronize
		end
	
		if localIdx == 1
			res[row] = shared[localIdx]
		end
	end
	
	res = similar(arr, size(arr, 1))
	kernel = foo_kernel(backend, 512)
	kernel(res, arr; ndrange = (512, size(arr, 1)))
	KernelAbstractions.synchronize(backend)
	return res
end


using CUDA
A = cu(rand(1024, 1024))
foo(A)

I'll try to reduce it even further.
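
For reference, as far as can be read from the MWE, `foo` is meant to return one sum per row of `arr` (each workgroup handles a row and reduces it in shared memory). A minimal CPU-only sketch of that expected result, using only Base Julia, is below. The names `A_cpu` and `expected` are illustrative and not part of the original kernel, and the final comparison is left as a comment because it assumes a working GPU setup.

```julia
# CPU-only reference for what the MWE appears to compute: one sum per row.
# `A_cpu` and `expected` are illustrative names, not part of the original report.
A_cpu = rand(Float32, 1024, 1024)

# Sum over the second dimension, flattened to a vector of length size(A_cpu, 1).
expected = vec(sum(A_cpu; dims = 2))

# Hypothetical check once `foo` runs again on the GPU (assumes CUDA.jl is loaded):
# Array(foo(cu(A_cpu))) ≈ expected
```

Comparing against a plain `sum` would separate "the kernel throws" from "the kernel returns wrong values" while reducing the MWE further.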

I've run into this issue with my custom kernel on both CUDA.jl and AMDGPU.jl. However, I could only reproduce the error with the MWE on CUDA.jl, since I don't have access to the AMD card at the moment.

With CUDA.jl I get the following error:

ERROR: a undefined variable error was thrown during kernel execution on thread (449, 1, 1) in block (40, 1, 1).
Stacktrace:
[1] macro expansion at C:\...\.julia\packages\KernelAbstractions\sWSE0\src\KernelAbstractions.jl:242
[2] gpu_foo_kernel at C:\...\.julia\packages\KernelAbstractions\sWSE0\src\macros.jl:318
[3] gpu_foo_kernel at .\none:0

ERROR: LoadError: KernelException: exception thrown during kernel execution on device Quadro RTX 5000
Stacktrace:
[1] check_exceptions()

(Note that on Windows I can't copy-paste the MWE directly into the REPL and have to include it from a file, while on Linux pasting works. I can reproduce the error on both systems.)
