
Conversation

Collaborator

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski commented Dec 22, 2025

As described in the title, I think this is what was happening.

Tagging @jagoosw since I ran into the problem while running OceanBioME with MPI.

Julia script reproducing the problem
using OceanBioME, Oceananigans, Printf
using Oceananigans.Units
using MPI
using Oceananigans.DistributedComputations

# Initialize MPI
MPI.Init()

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

MPI.Barrier(comm)

Rx = 1
Ry = 1
Rz = 1

@assert Rx * Ry * Rz == nranks  # Product of the process grid dimensions must equal the number of MPI ranks.

partition = Partition(;
    x = Rx, 
    y = Ry, 
    z = Rz,
)
# Set the architecture to use.
arch = Distributed(
    CPU(); 
    partition, 
    communicator = MPI.COMM_WORLD,
    synchronized_communication=true
)

duration = 10days # Duration of the simulation


# set the number of gridpoints in each direction
Nx = 16
Ny = 16
Nz = 8

# set the domain size
Lx = 1kilometer
Ly = 1kilometer
Lz = 100meters

distributed_grid = RectilinearGrid(arch, size = (Nx, Ny, Nz), extent = (Lx, Ly, Lz))
grid = distributed_grid # alias for convenience

# Set the Coriolis and buoyancy models.
coriolis = FPlane(f = 1e-4) # [s⁻¹]
buoyancy = SeawaterBuoyancy()

# Specify parameters that are used to construct the background state.
background_state_parameters = (M = 1e-4,       # s⁻¹, geostrophic shear
                               f = coriolis.f, # s⁻¹, Coriolis parameter
                               N = 1e-4,       # s⁻¹, buoyancy frequency
                               H = grid.Lz,
                               g = buoyancy.gravitational_acceleration,
                               α = buoyancy.equation_of_state.thermal_expansion)


T(x, y, z, t, p) = (p.M^2 * x + p.N^2 * (z + p.H)) / (p.g * p.α)
V(x, y, z, t, p) = p.M^2 / p.f * (z + p.H)

V_field = BackgroundField(V, parameters = background_state_parameters)
T_field = BackgroundField(T, parameters = background_state_parameters)

# Specify some horizontal and vertical viscosity and diffusivity.
νᵥ = κᵥ = 1e-4 # [m² s⁻¹]
vertical_diffusivity = VerticalScalarDiffusivity(ν = νᵥ, κ = κᵥ)

# Setup the biogeochemical model with optional carbonate chemistry turned on.
biogeochemistry = LOBSTER(; grid,
                            carbonate_system = CarbonateSystem(),
                            oxygen = Oxygen(),
                            detritus = VariableRedfieldDetritus(grid),
                            scale_negatives = true)



DIC_bcs = FieldBoundaryConditions(top = CarbonDioxideGasExchangeBoundaryCondition())


# Model instantiation
model = NonhydrostaticModel(; grid,
                              biogeochemistry,
                              boundary_conditions = (DIC = DIC_bcs, ),
                              advection = WENO(),
                              timestepper = :RungeKutta3,
                              coriolis,
                              tracers = (:T, :S),
                              buoyancy,
                              background_fields = (T = T_field, v = V_field),
                              closure = vertical_diffusivity)



MPI.Barrier(comm)
MPI.Finalize()
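
For reference, a minimal sketch of how to launch a script like this, assuming it is saved as reproducer.jl (the file name is hypothetical) and using the mpiexec helper shipped with MPI.jl:

using MPI

MPI.mpiexec() do exe
    # Launch on a single rank, matching Rx = Ry = Rz = 1 above.
    run(`$exe -n 1 $(Base.julia_cmd()) --project reproducer.jl`)
end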

I would get the message that CommunicationBuffers has no property time (i.e. the clock argument was actually of type CommunicationBuffers) from:

# Return ContinuousBoundaryFunction on the bottom or top interface of a cell adjacent to an immersed boundary
@inline function getbc(cbf::ZBoundaryFunction{LX, LY, S}, i::Integer, j::Integer, k::Integer,
                       grid::AbstractGrid, clock, model_fields, args...) where {LX, LY, S}
    k′ = cell_boundary_index(S(), k)
    args = user_function_arguments(i, j, k, grid, model_fields, cbf.parameters, cbf)
    X = node(i, j, k′, grid, LX(), LY(), Face())
    return cbf.func(X..., clock.time, args...)
end

I think the problem was what I have changed in this PR: the buffers would get slurped into args... and incorrectly passed to the kernel here:

fill_halo_event!(c, kernels![task], bcs[task], loc, grid, args...; kwargs...)

I am not sure, though, whether I am missing some use case where the first argument should be the buffers...
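
To make the failure mode concrete, here is a toy sketch (stand-in types, not the actual Oceananigans definitions) of how splatting an extra leading value shifts every positional argument by one, so the slot the kernel reads as clock ends up holding the buffers:

struct Clock
    time::Float64
end

struct CommunicationBuffers end  # stand-in for the real buffer type

# The "kernel" expects (grid, clock, fields...); reading clock.time fails
# when a buffer object lands in the clock slot.
kernel(grid, clock, fields...) = clock.time

args = (CommunicationBuffers(), Clock(0.0))  # buffers accidentally slurped in
kernel("grid", args...)
# throws something like: type CommunicationBuffers has no field time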

@Mikolaj-A-Kowalski
Collaborator Author

Mikolaj-A-Kowalski commented Dec 22, 2025

I am not sure, though, whether I am missing some use case where the first argument should be the buffers...

Apparently I did :-/

https://github.com/CliMA/Oceananigans.jl/actions/runs/20431663262/job/58703558803?pr=5063

EDIT: I will probably not have much time to work on it until the new year, so in the meantime feel 100% free to close it, push, etc. ;-)

@navidcy added the distributed 🕸️ Our plan for total cluster domination label Dec 22, 2025
@simone-silvestri
Collaborator

You probably need to extend

@inline fill_halo_event!(c, kernel!, bcs, loc, grid, args...; kwargs...) = kernel!(c, bcs..., loc, grid, Tuple(args))
@inline fill_halo_event!(c, ::Nothing, ::NoBCs, loc, grid, args...; kwargs...) = nothing

in the DistributedComputations module to throw away the buffers argument in the case of a distributed grid with a kernel that is not a DistributedFillHalo, something like

fill_halo_event!(c, kernel!, bcs, loc, grid::DistributedGrid, buffers, args...; kwargs...) = kernel!(c, bcs..., loc, grid, Tuple(args)) 
fill_halo_event!(c, ::Nothing, ::NoBCs, loc, grid::DistributedGrid, buffers, args...; kwargs...) = nothing
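
For illustration, a self-contained sketch of that dispatch pattern with toy types (not the actual Oceananigans signatures): the method specialized on the distributed grid peels the buffers off before forwarding the rest:

abstract type AbstractGrid end
struct SerialGrid <: AbstractGrid end
struct DistributedGrid <: AbstractGrid end

# Generic method: everything after `grid` goes straight to the kernel.
fill_halo_event!(kernel!, grid::AbstractGrid, args...) = kernel!(grid, args...)

# Distributed method: the first trailing argument is the communication
# buffers, which a plain kernel must not see, so it is dropped here.
fill_halo_event!(kernel!, grid::DistributedGrid, buffers, args...) =
    kernel!(grid, args...)

clock_kernel!(grid, clock) = clock  # stand-in kernel returning its clock slot

fill_halo_event!(clock_kernel!, SerialGrid(), :clock)                 # -> :clock
fill_halo_event!(clock_kernel!, DistributedGrid(), :buffers, :clock)  # -> :clock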

@simone-silvestri
Collaborator

Pretty annoying that we cannot run the distributed tests from a fork. @Mikolaj-A-Kowalski, if you have time, can you verify locally that the tests run (on CPU at least; there should be no difference on a GPU)?

Collaborator Author

@Mikolaj-A-Kowalski Mikolaj-A-Kowalski left a comment


Pretty annoying that we cannot run the distributed tests from a fork. @Mikolaj-A-Kowalski, if you have time, can you verify locally that the tests run (on CPU at least; there should be no difference on a GPU)?

I tried running the tests on my side; however, I wasn't able to get the new test to work.
I have added some comments with the things I had to change while trying to debug it, but in the end I was not able to get it past the assertions (it was giving me values different from those expected in the test for u).

Maybe it would be better to close this PR and reopen it from within the Oceananigans repo to let the CI run normally.

@simone-silvestri simone-silvestri merged commit 459a553 into CliMA:main Jan 24, 2026
62 of 64 checks passed