Closed
Description
Hello, a kernel I wrote with KernelAbstractions.jl is now broken with version 0.9.34. I've reduced my kernel to this MWE:
using KernelAbstractions
function foo(arr::AbstractMatrix{T}) where T
	backend = get_backend(arr)
	@kernel function foo_kernel(res, @Const(arr))
		row = @index(Group, Linear)
		grid_stride = prod(@groupsize)
		localIdx = @index(Local, Linear)
		N = size(arr, 2)
		
		shared = @localmem eltype(res) grid_stride
		shared[localIdx] = zero(eltype(res))
		
		tmp = zero(eltype(res))
		for i = localIdx:grid_stride:N
			tmp += arr[row, i]
		end
		shared[localIdx] = tmp
		@synchronize
		@private s = div(min(grid_stride, N), Int32(2))
		while s > Int32(0)
			if localIdx <= s
				shared[localIdx] = shared[localIdx] + shared[localIdx + s]
			end
			s >>= 1
			@synchronize
		end
	
		if localIdx == 1
			res[row] = shared[localIdx]
		end
	end
	
	res = similar(arr, size(arr, 1))
	kernel = foo_kernel(backend, 512)
	kernel(res, arr; ndrange = (512, size(arr, 1)))
	KernelAbstractions.synchronize(backend)
	return res
end
using CUDA
A = cu(rand(1024, 1024))
foo(A)

I'll try to reduce it even further.
I've run into this issue with both CUDA.jl and AMDGPU.jl with my custom kernel. However, I could only reproduce the error with the MWE on CUDA.jl, since I don't have access to the AMD card at the moment.
With CUDA.jl I get the following error:
ERROR: a undefined variable error was thrown during kernel execution on thread (449, 1, 1) in block (40, 1, 1).
Stacktrace:                                                                                                              
[1] macro expansion at C:\...\.julia\packages\KernelAbstractions\sWSE0\src\KernelAbstractions.jl:242
[2] gpu_foo_kernel at C:\...\.julia\packages\KernelAbstractions\sWSE0\src\macros.jl:318
[3] gpu_foo_kernel at .\none:0                                                                                                                                                                                                                 
ERROR: LoadError: KernelException: exception thrown during kernel execution on device Quadro RTX 5000
Stacktrace:                                                                                                              
[1] check_exceptions()

(Note that on Windows I can't copy-paste the MWE directly into the REPL and have to include it from a file, while on Linux it works. On both systems I can reproduce the error.)
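For checking the kernel's output once it runs again, a CPU reference may be useful. The MWE above appears to compute one sum per row of `arr`, using a strided accumulation into shared memory followed by a tree reduction. The sketch below replays those same steps sequentially in plain Julia. The name `rowsum_reference` and the `groupsize` argument are hypothetical and not part of the original report, and, like the kernel, it assumes `min(groupsize, N)` is a power of two.

```julia
# CPU sketch of the reduction performed by foo_kernel in the MWE.
# `rowsum_reference` is a hypothetical helper, not part of KernelAbstractions.jl.
function rowsum_reference(arr::AbstractMatrix{T}, groupsize::Int = 512) where T
    nrows, N = size(arr)
    res = zeros(T, nrows)
    for row in 1:nrows                      # one workgroup per row
        shared = zeros(T, groupsize)        # stands in for @localmem
        for localIdx in 1:groupsize         # strided per-thread partial sums
            tmp = zero(T)
            for i in localIdx:groupsize:N
                tmp += arr[row, i]
            end
            shared[localIdx] = tmp
        end
        s = div(min(groupsize, N), 2)       # tree reduction, halving each step
        while s > 0
            for localIdx in 1:s
                shared[localIdx] += shared[localIdx + s]
            end
            s >>= 1
        end
        res[row] = shared[1]
    end
    return res
end

A_cpu = rand(Float32, 8, 1024)
rowsum_reference(A_cpu) ≈ vec(sum(A_cpu; dims = 2))
```

On a working version, `Array(foo(A))` should then agree with `rowsum_reference(Array(A))` up to floating-point rounding.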