-
Notifications
You must be signed in to change notification settings - Fork 65
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Questionnaire
- Does ROCm works for you outside of Julia, e.g. C/C++/Python?
yes
- Post output of
rocminfo.
output of `rocminfo`
$ rocminfo
ROCk module version 6.3.6 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131342572(0x7d420ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131342572(0x7d420ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131342572(0x7d420ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132111288(0x7dfdbb8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132111288(0x7dfdbb8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132111288(0x7dfdbb8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 3
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 2
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 2
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132111292(0x7dfdbbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132111292(0x7dfdbbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132111292(0x7dfdbbc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 4
*******
Name: AMD EPYC 7A53 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7A53 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 3
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 3
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 132008820(0x7de4b74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 132008820(0x7de4b74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 132008820(0x7de4b74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 5
*******
Name: gfx90a
Uuid: GPU-23e41c33cd99e2ca
Marketing Name: AMD Instinct MI250X
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 4
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 29704(0x7408)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1700
BDFID: 55552
Internal Node ID: 4
Compute Unit: 110
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: TRUE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 2048(0x800)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 78
SDMA engine uCode:: 8
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 67092480(0x3ffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 4
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
- Post output of
AMDGPU.versioninfo()if possible.
Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬──────────────────────────────────────────────────────────────────────────────────────────┐
│ Available │ Name │ Version │ Path │
├───────────┼──────────────────┼───────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│ + │ LLD │ - │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/llvm/bin/ld.lld │
│ + │ Device Libraries │ - │ /users/barthale/.julia/artifacts/b46ab46ef568406312e5f500efb677511199c2f9/amdgcn/bitcode │
│ + │ HIP │ 6.2.41134 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/libamdhip64.so │
│ + │ rocBLAS │ 4.2.1 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/librocblas.so │
│ + │ rocSOLVER │ 3.26.0 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/librocsolver.so │
│ + │ rocSPARSE │ 3.2.0 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/librocsparse.so │
│ + │ rocRAND │ 2.10.5 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/librocrand.so │
│ + │ rocFFT │ 1.0.29 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/librocfft.so │
│ + │ MIOpen │ 3.2.0 │ /appl/lumi/SW/LUMI-24.03/G/EB/rocm/6.2.2/lib/libMIOpen.so │
└───────────┴──────────────────┴───────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
Reproducing the bug
- Describe what's not working.
While experimenting for issue #844 , I found that allocating large (>= 4GB) with AMDGPU.ones fails. The GPU should have 64 GB available.
- Provide MWE to reproduce it (if possible).
This is the first command in a julia session:
julia> using AMDGPU; a= AMDGPU.ones(UInt8,4*1024^3);
ERROR: HIPError(code hipErrorInvalidConfiguration, invalid configuration)
Stacktrace:
[1] check
@ ~/.julia/packages/AMDGPU/TqRG0/src/hip/error.jl:145 [inlined]
[2] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/utils.jl:123 [inlined]
[3] hipModuleLaunchKernel(f::AMDGPU.HIP.HIPFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, stream::HIPStream, kernelParams::Vector{…}, extra::Ptr{…})
@ AMDGPU.HIP ~/.julia/packages/AMDGPU/TqRG0/src/hip/libhip.jl:5120
[4] #33
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:130 [inlined]
[5] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:110 [inlined]
[6] macro expansion
@ ./none:0 [inlined]
[7] pack_arguments(::AMDGPU.Runtime.var"#33#34"{…}, ::AMDGPU.KernelState, ::KernelAbstractions.CompilerMetadata{…}, ::AMDGPU.Device.ROCDeviceVector{…}, ::UInt8)
@ AMDGPU.Runtime ./none:0
[8] #launch#31
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:124 [inlined]
[9] launch
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:116 [inlined]
[10] #25
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:85 [inlined]
[11] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:78 [inlined]
[12] macro expansion
@ ./none:0 [inlined]
[13] convert_arguments
@ ./none:0 [inlined]
[14] #roccall#23
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:86 [inlined]
[15] roccall
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:84 [inlined]
[16] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:50 [inlined]
[17] macro expansion
@ ./none:0 [inlined]
[18] #call#1
@ ./none:0 [inlined]
[19] call
@ ./none:-1 [inlined]
[20] (::AMDGPU.Runtime.HIPKernel{…})(::KernelAbstractions.CompilerMetadata{…}, ::ROCArray{…}, ::UInt8; stream::HIPStream, call_kwargs::@Kwargs{…})
@ AMDGPU.Runtime ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:59
[21] (::KernelAbstractions.Kernel{…})(::ROCArray{…}, ::Vararg{…}; ndrange::Int64, workgroupsize::Nothing)
@ AMDGPU.ROCKernels ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:113
[22] Kernel
@ ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:93 [inlined]
[23] fill!(A::ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}, x::UInt8)
@ GPUArrays ~/.julia/packages/GPUArrays/w335n/src/host/construction.jl:22
[24] ones(T::Type, dims::Int64)
@ AMDGPU ~/.julia/packages/AMDGPU/TqRG0/src/array.jl:253
[25] top-level scope
@ REPL[1]:1
Some type information was truncated. Use `show(err)` to see complete types.
The allocation per-se seem to work, but the error is raised in the assignement.
julia> using AMDGPU; a = ROCVector{UInt8}(undef,4*1024^3);
julia> a .= 1;
ERROR: HIPError(code hipErrorInvalidConfiguration, invalid configuration)
Stacktrace:
[1] check
@ ~/.julia/packages/AMDGPU/TqRG0/src/hip/error.jl:145 [inlined]
[2] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/utils.jl:123 [inlined]
[3] hipModuleLaunchKernel(f::AMDGPU.HIP.HIPFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, stream::HIPStream, kernelParams::Vector{…}, extra::Ptr{…})
@ AMDGPU.HIP ~/.julia/packages/AMDGPU/TqRG0/src/hip/libhip.jl:5120
[4] #33
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:130 [inlined]
[5] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:110 [inlined]
[6] macro expansion
@ ./none:0 [inlined]
[7] pack_arguments(::AMDGPU.Runtime.var"#33#34"{…}, ::AMDGPU.KernelState, ::KernelAbstractions.CompilerMetadata{…}, ::AMDGPU.Device.ROCDeviceVector{…}, ::Base.Broadcast.Broadcasted{…})
@ AMDGPU.Runtime ./none:0
[8] #launch#31
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:124 [inlined]
[9] launch
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:116 [inlined]
[10] #25
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:85 [inlined]
[11] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:78 [inlined]
[12] macro expansion
@ ./none:0 [inlined]
[13] convert_arguments
@ ./none:0 [inlined]
[14] #roccall#23
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:86 [inlined]
[15] roccall
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:84 [inlined]
[16] macro expansion
@ ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:50 [inlined]
[17] macro expansion
@ ./none:0 [inlined]
[18] #call#1
@ ./none:0 [inlined]
[19] call
@ ./none:-1 [inlined]
[20] (::AMDGPU.Runtime.HIPKernel{…})(::KernelAbstractions.CompilerMetadata{…}, ::ROCArray{…}, ::Base.Broadcast.Broadcasted{…}; stream::HIPStream, call_kwargs::@Kwargs{…})
@ AMDGPU.Runtime ~/.julia/packages/AMDGPU/TqRG0/src/runtime/hip-execution.jl:59
[21] (::KernelAbstractions.Kernel{…})(::ROCArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
@ AMDGPU.ROCKernels ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:113
[22] Kernel
@ ~/.julia/packages/AMDGPU/TqRG0/src/ROCKernels.jl:93 [inlined]
[23] _copyto!
@ ~/.julia/packages/GPUArrays/w335n/src/host/broadcast.jl:71 [inlined]
[24] materialize!
@ ~/.julia/packages/GPUArrays/w335n/src/host/broadcast.jl:38 [inlined]
[25] materialize!(dest::ROCArray{…}, bc::Base.Broadcast.Broadcasted{…})
@ Base.Broadcast ./broadcast.jl:902
[26] top-level scope
@ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
(@lux-dev) pkg> st
Status `/pfs/lustrep4/users/barthale/.julia/environments/lux-dev/Project.toml`
[21141c5a] AMDGPU v2.1.2
⌃ [6e4b80f9] BenchmarkTools v1.6.0
[63c18a36] KernelAbstractions v0.9.39
⌃ [b2108857] Lux v1.24.0
[872c559c] NNlib v0.9.31
[90137ffa] StaticArrays v1.9.15
[36e64239] Unroll v0.1.0
Info Packages marked with ⌃ have new versions available and may be upgradable.
Is that a known issue?
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working