Open
Description
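Running the sharding weak-scaling test on ALPS, targeting a single GPU, fails with the error "type ReactantState has no field connectivity". The error is raised at line 86 of sharded_baroclinic_instability_simulation_run.jl, after which the Reactant backend and the distributed runtime shut down cleanly and srun reports exit code 1. Full output: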
ERROR: LoadError: type ReactantState has no field connectivity
Stacktrace:
[1] getproperty(x::ReactantState, f::Symbol)
@ Base ./Base.jl:49
[2] top-level scope
@ /capstor/scratch/cscs/lraess/GB-25/sharding/runs/2026-02-23T11-14-58.674_uVu1/sharded_baroclinic_instability_simulation_run.jl:86
in expression starting at /capstor/scratch/cscs/lraess/GB-25/sharding/runs/2026-02-23T11-14-58.674_uVu1/sharded_baroclinic_instability_simulation_run.jl:86
┌ Debug: [GETPID 286099] Cleanup Backend State, Reactant.XLA.IFRTBackendState(true, Dict{String, Reactant.XLA.IFRT.Client}("cpu" => Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001b97e790), "cuda" => Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001c239990)), Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001c239990)), Reactant.XLA.State(0, 1, nothing, Reactant.XLA.DistributedRuntimeService(Ptr{Nothing} @0x000000001c3b3cd0), Reactant.XLA.DistributedRuntimeClient(Ptr{Nothing} @0x000000001b276f80), "nid005812:63939", "[::]:63939")
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/XLA.jl:111
┌ Debug: [GETPID 286099] Finalizing backend state, Reactant.XLA.IFRTBackendState(true, Dict{String, Reactant.XLA.IFRT.Client}("cpu" => Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001b97e790), "cuda" => Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001c239990)), Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001c239990))
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/XLA.jl:77
┌ Debug: [GETPID 286099] Freeing Client Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001b97e790)
└ @ Reactant.XLA.IFRT /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/IFRT/Client.jl:14
┌ Debug: [GETPID 286099] Freeing Client Reactant.XLA.IFRT.Client(Ptr{Nothing} @0x000000001c239990)
└ @ Reactant.XLA.IFRT /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/IFRT/Client.jl:14
┌ Debug: [GETPID 286099] Shutdown DistributedRuntimeClient
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/Distributed.jl:52
I0000 00:00:1771845448.636690 286099 client.cc:151] Distributed task shutdown initiated.
I0000 00:00:1771845448.636798 286099 coordination_service_agent.cc:393] Coordination agent has initiated Shutdown().
I0000 00:00:1771845448.637367 286775 coordination_service.cc:1373] Barrier(Shutdown::7348303068028219848::0) has passed with status: OK
I0000 00:00:1771845448.637472 286775 coordination_service.cc:1725] Shutdown barrier in coordination service has passed.
I0000 00:00:1771845448.637558 286099 coordination_service_agent.cc:411] Coordination agent has successfully shut down.
I0000 00:00:1771845448.637767 286099 client.cc:153] Distributed task shutdown result: OK
I0000 00:00:1771845448.637787 286797 coordination_service_agent.cc:288] Cancelling error polling because the service or the agent is shutting down.
┌ Debug: [GETPID 286099] Shutting down DistributedRuntimeService
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/Distributed.jl:100
I0000 00:00:1771845448.637840 286099 service.cc:115] Jax service shutting down
I0000 00:00:1771845448.649744 286775 coordination_service.cc:746] /job:jax_worker/replica:0/task:0 has disconnected from coordination service.
┌ Debug: [GETPID 286099] Freeing distributed runtime client
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/Distributed.jl:34
┌ Debug: [GETPID 286099] Freeing DistributedRuntimeService
└ @ Reactant.XLA /capstor/scratch/cscs/lraess/.julia/gh200/juliaup/depot/packages/Reactant/j2PDd/src/xla/Distributed.jl:91
srun: error: nid005812: task 0: Exited with exit code 1
srun: Terminating StepId=2742723.0