Status: Closed
Labels: bug (Something isn't working)

Description
I get the following error when using AMDGPU to replace all NaNs with another value:
julia> import AMDGPU
julia> x0 = AMDGPU.roc([1,2,3,NaN32]);
julia> x0[isnan.(x0)] .= 0;
┌ Warning: Global hostcalls detected!
│ - Source: MethodInstance for AcceleratedKernels.gpu__accumulate_block!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, ::typeof(+), ::AMDGPU.Device.ROCDeviceVector{Bool, 1}, ::Bool, ::Bool, ::Bool, ::AMDGPU.Device.ROCDeviceVector{Int8, 1}, ::AMDGPU.Device.ROCDeviceVector{Bool, 1})
│ - Hostcalls: [:malloc_hostcall]
│
│ Use `AMDGPU.synchronize(; stop_hostcalls=false)` to synchronize and stop them.
│ Otherwise, performance might degrade if they keep running in the background.
└ @ AMDGPU.Compiler /tmp/julia-depot-DINDiff-barthale/packages/AMDGPU/BgSqf/src/compiler/codegen.jl:208
Memory access fault by GPU node-4 (Agent handle: 0x10f0140) on address 0x152f8c650000. Reason: Unknown.
[115426] signal 6 (-6): Aborted
in expression starting at REPL[5]:1
Allocations: 28864586 (Pool: 28863762; Big: 824); GC: 21
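The warning above names `AcceleratedKernels.gpu__accumulate_block!` applied with `+` to a `Bool` vector. That suggests the masked assignment `x0[isnan.(x0)] .= 0` first converts the boolean mask into a list of indices using a prefix sum (stream compaction). Below is a minimal CPU sketch of that general technique in plain Julia. `mask_to_indices` is a hypothetical helper for illustration only. It is not the actual implementation in AMDGPU or AcceleratedKernels.

```julia
# Hypothetical helper (not part of AMDGPU or AcceleratedKernels): turns a boolean
# mask into the indices of its `true` entries via an inclusive prefix sum.
function mask_to_indices(mask::AbstractVector{Bool})
    # pos[i] is the running count of `true` values up to and including i,
    # i.e. the 1-based output slot for element i when mask[i] is true.
    pos = cumsum(mask)
    n = isempty(pos) ? 0 : pos[end]
    idx = Vector{Int}(undef, n)
    for i in eachindex(mask)
        if mask[i]
            idx[pos[i]] = i  # scatter: each selected element writes to its own slot
        end
    end
    return idx
end

x = Float32[1, 2, 3, NaN32]
mask = isnan.(x)
idx = mask_to_indices(mask)
x[idx] .= 0  # assign only at the selected positions
```

On the CPU, the result matches `findall(mask)`.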
I am using AMDGPU 1.2.3 and Julia 1.11.3.
julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Available │ Name │ Version │ Path │
├───────────┼──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ + │ LLD │ - │ /opt/rocm-6.0.3/llvm/bin/ld.lld │
│ + │ Device Libraries │ - │ /tmp/julia-depot-DINDiff-barthale/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode │
│ + │ HIP │ 6.0.32831 │ /opt/rocm-6.0.3/lib/libamdhip64.so │
│ + │ rocBLAS │ 4.0.0 │ /opt/rocm-6.0.3/lib/librocblas.so │
│ + │ rocSOLVER │ 3.24.0 │ /opt/rocm-6.0.3/lib/librocsolver.so │
│ + │ rocSPARSE │ 3.0.2 │ /opt/rocm-6.0.3/lib/librocsparse.so │
│ + │ rocRAND │ 2.10.5 │ /opt/rocm-6.0.3/lib/librocrand.so │
│ + │ rocFFT │ 1.0.27 │ /opt/rocm-6.0.3/lib/librocfft.so │
│ + │ MIOpen │ 3.0.0 │ /opt/rocm-6.0.3/lib/libMIOpen.so │
└───────────┴──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────┘
[ Info: AMDGPU devices
┌────┬─────────────────────┬────────────────────────┬───────────┬────────────┬───────────────┐
│ Id │ Name │ GCN arch │ Wavefront │ Memory │ Shared Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼────────────┼───────────────┤
│ 1 │ AMD Instinct MI250X │ gfx90a:sramecc+:xnack- │ 64 │ 63.984 GiB │ 64.000 KiB │
└────┴─────────────────────┴────────────────────────┴───────────┴────────────┴───────────────┘
This is not a big problem for me, as I can use `x0 = ifelse.(isnan.(x0), zero(eltype(x0)), x0);` instead.
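The sketch below compares the workaround with an in-place variant. It uses a plain CPU `Array` as a stand-in for the `ROCArray`, so it runs without a GPU. The in-place form assumes that the same fused broadcast also compiles to a plain elementwise kernel for a `ROCArray`. That assumption has not been verified on the MI250X setup reported above.

```julia
# CPU stand-in for the ROCArray from the report; with AMDGPU loaded,
# x0 would instead come from AMDGPU.roc([1, 2, 3, NaN32]).
x0 = Float32[1, 2, 3, NaN32]

# Out-of-place workaround from the report: allocates a new array.
y = ifelse.(isnan.(x0), zero(eltype(x0)), x0)

# In-place variant: all dotted calls fuse into one broadcast that writes back
# into x0, so no intermediate boolean mask or index list is materialized.
x0 .= ifelse.(isnan.(x0), zero(eltype(x0)), x0)
```

Both forms avoid logical indexing, so neither needs the prefix-sum step that appears in the warning.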
Thank you for this great package. I would not be able to use a "serious" compute cluster for my research without it :-)
Assignees: luraess