Behavior
- Works: filename = "output" creates output_rank0.jld2, output_rank1.jld2, etc.
- Fails: filename = "output.jld2" hangs (all ranks attempt to write to the same file)
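The naming pattern in the working case can be sketched in plain Julia. This is only an illustration of the observed behavior, not Oceananigans' internal logic: `rank_filename` is a hypothetical helper, and stripping a trailing `.jld2` before appending the rank suffix is an assumption about one way both spellings could map to distinct per-rank files.

```julia
# Hypothetical helper (not part of Oceananigans): build a per-rank file name
# following the pattern observed above, e.g. output_rank0.jld2.
function rank_filename(filename::AbstractString, rank::Integer)
    prefix, ext = splitext(filename)
    # Strip only a trailing ".jld2"; leave any other suffix untouched.
    prefix = ext == ".jld2" ? prefix : filename
    return string(prefix, "_rank", rank, ".jld2")
end

# Both spellings map to the same per-rank names for a 2x2 partition (4 ranks).
names_without_ext = [rank_filename("output", r) for r in 0:3]
names_with_ext = [rank_filename("output.jld2", r) for r in 0:3]
```

Until the writer handles the extension itself, the workaround reported here is to pass the extension-free prefix, filename = "output".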
Reproducer
using MPI, Oceananigans
MPI.Init()
arch = Distributed(GPU(); partition = Partition(2, 2))
grid = RectilinearGrid(arch, size=(64, 64, 16), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid)
simulation = Simulation(model, Δt=10, stop_iteration=100)
# This hangs:
simulation.output_writers[:out] = JLD2Writer(model, model.tracers,
                                             filename = "output.jld2",
                                             schedule = IterationInterval(10))
run!(simulation)
Test Results
Tested with filename = "output" (without extension):
- Grid: 1440×720×30 (0.25° Arctic Ocean)
- GPUs: 4× RTX 4090
- Partition: Partition(2, 2)
- Output interval: 6 hours
Results:
- All 4 ranks completed successfully
- Created output_rank0.jld2 through output_rank3.jld2
- Good I/O performance (~10 ms write time after initialization)
Originally reported in ClimaOcean.jl: CliMA/ClimaOcean.jl#746
Credit to @glwagner for diagnosing the bug.