Cartesian indexing issue for high dimensional arrays #340

Description

@GiggleLiu

When an array has 16 or more dimensions, indexing `CartesianIndices` allocates dynamically, and many CUDA array functions break as a result (the GPU compiler cannot emit calls into the Julia runtime allocator).

julia> x = (fill(2, 15)...,)
(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)

julia> @benchmark CartesianIndices($x)[3]
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     84.947 ns (0.00% GC)
  median time:      85.537 ns (0.00% GC)
  mean time:        86.939 ns (0.00% GC)
  maximum time:     183.274 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     963

julia> x = (fill(2, 16)...,)
(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)

julia> @benchmark CartesianIndices($x)[3]
BenchmarkTools.Trial: 
  memory estimate:  4.39 KiB
  allocs estimate:  73
  --------------
  minimum time:     3.644 μs (0.00% GC)
  median time:      3.713 μs (0.00% GC)
  mean time:        4.002 μs (0.98% GC)
  maximum time:     68.644 μs (92.66% GC)
  --------------
  samples:          10000
  evals/sample:     8

I think it might be related to tuple splatting. The current `getindex` that `CartesianIndices` falls through to is:

function getindex(A::AbstractArray, I...)
    @_propagate_inbounds_meta
    error_if_canonical_getindex(IndexStyle(A), A, I...)
    _getindex(IndexStyle(A), A, to_indices(A, I)...)
end
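
For reference, a CPU-only check (a sketch, no CUDA needed) that reproduces the 15- vs. 16-dimension allocation cliff from the benchmarks above; `index_allocs` is a hypothetical helper name:

```julia
# Measure allocations for a single CartesianIndices lookup
# at a given dimensionality.
function index_allocs(dims)
    ci = CartesianIndices(dims)
    ci[3]                 # warm up: force compilation before measuring
    @allocated ci[3]
end

index_allocs(ntuple(_ -> 2, Val(15)))  # 0 bytes, matching the benchmark above
index_allocs(ntuple(_ -> 2, Val(16)))  # nonzero (≈4.4 KiB above) on affected versions
```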

MWE to trigger the error:

julia> cz = CUDA.zeros(fill(2, 20)...);

julia> cz / Float32(0.3)
ERROR: InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Float32,20,1}, Base.Broadcast.Broadcasted{Nothing,NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,20,1},NTuple{20,Bool},NTuple{20,Int64}},Float32}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] macro expansion at /home/leo/.julia/dev/GPUArrays/src/device/indexing.jl:81
 [2] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:61
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:62
Reason: unsupported dynamic function invocation (call to setindex!)
Stacktrace:
 [1] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:62
Reason: unsupported call through a literal pointer (call to jl_alloc_array_1d)
Stacktrace:
 [1] Array at boot.jl:406
 [2] map at tuple.jl:168
 [3] axes at abstractarray.jl:75
 [4] CartesianIndices at multidimensional.jl:264
 [5] macro expansion at /home/leo/.julia/dev/GPUArrays/src/device/indexing.jl:81
 [6] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:61
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] map at tuple.jl:172
 [2] axes at abstractarray.jl:75
 [3] CartesianIndices at multidimensional.jl:264
 [4] macro expansion at /home/leo/.julia/dev/GPUArrays/src/device/indexing.jl:81
 [5] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:61
Reason: unsupported dynamic function invocation (call to CartesianIndices)
Stacktrace:
 [1] CartesianIndices at multidimensional.jl:264
 [2] macro expansion at /home/leo/.julia/dev/GPUArrays/src/device/indexing.jl:81
 [3] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:61
Stacktrace:
 [1] check_ir(::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget,CUDA.CUDACompilerParams}, ::LLVM.Module) at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/validation.jl:123
 [2] macro expansion at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:239 [inlined]
 [3] macro expansion at /home/leo/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [4] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:237
 [5] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [6] compile at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [7] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/leo/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:310
 [8] cufunction_compile(::GPUCompiler.FunctionSpec) at /home/leo/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:305
 [9] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,20,1},Base.Broadcast.Broadcasted{Nothing,NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,20,1},NTuple{20,Bool},NTuple{20,Int64}},Float32}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [10] broadcast_kernel at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:60 [inlined]
 [11] cached_compilation at /home/leo/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65 [inlined]
 [12] cufunction(::GPUArrays.var"#broadcast_kernel#12", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,20,1},Base.Broadcast.Broadcasted{Nothing,NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,20,1},NTuple{20,Bool},NTuple{20,Int64}},Float32}},Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/leo/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:297
 [13] cufunction at /home/leo/.julia/packages/CUDA/YeS8q/src/compiler/execution.jl:294 [inlined]
 [14] launch_heuristic(::CUDA.CuArrayBackend, ::GPUArrays.var"#broadcast_kernel#12", ::CuArray{Float32,20}, ::Base.Broadcast.Broadcasted{Nothing,NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{Base.Broadcast.Extruded{CuArray{Float32,20},NTuple{20,Bool},NTuple{20,Int64}},Float32}}, ::Int64; maximize_blocksize::Bool) at /home/leo/.julia/packages/CUDA/YeS8q/src/gpuarrays.jl:19
 [15] launch_heuristic at /home/leo/.julia/packages/CUDA/YeS8q/src/gpuarrays.jl:17 [inlined]
 [16] copyto!(::CuArray{Float32,20}, ::Base.Broadcast.Broadcasted{Nothing,NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{CuArray{Float32,20},Float32}}) at /home/leo/.julia/dev/GPUArrays/src/host/broadcast.jl:66
 [17] copyto! at ./broadcast.jl:886 [inlined]
 [18] copy(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{20},NTuple{20,Base.OneTo{Int64}},typeof(/),Tuple{CuArray{Float32,20},Float32}}) at ./broadcast.jl:862
 [19] materialize at ./broadcast.jl:837 [inlined]
 [20] broadcast_preserving_zero_d at ./broadcast.jl:826 [inlined]
 [21] /(::CuArray{Float32,20}, ::Float32) at ./arraymath.jl:55
 [22] top-level scope at REPL[16]:1
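
A possible workaround to experiment with (a sketch only: `cartesian_at` is a hypothetical helper, `Base._ind2sub` is internal API, and whether this actually dodges the splatting heuristics would need to be checked with `@allocated`):

```julia
# Convert a linear index to a CartesianIndex directly, bypassing the
# varargs getindex/to_indices path shown above.
cartesian_at(dims::Dims, i::Integer) = CartesianIndex(Base._ind2sub(dims, i))

cartesian_at((2, 2, 2), 3)  # CartesianIndex(1, 2, 1)
```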

Related issue: JuliaLang/julia#27398, but that one appears to have been fixed back in 2018...
@Roger-luo do you know what is happening in this case?

Originally posted by @GiggleLiu in #334 (comment)
