Update CI Julia version to 1.12.0 #4836
Conversation
I am very interested in this. Let's hope it works and we can move on from Julia 1.10.
    
I am disabling the reactant tests for the moment to check if the rest works.
    
If docs still break on the …
    
Seems that we are hitting the same NaN issue on the internal tide example.

The ghosts of the past still haunt us…
    
Apparently also …
    
If I run the example locally, it works. Why would it error on CI? Do we have a way to reproduce this error locally?

One thing to try might be to run the example locally and on CI using the exact same Manifest.toml, if possible. We can commit a Manifest.toml to this branch for debugging. I can't think of which dependency would lead to such a big difference, but it's one thing we can control for.
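A minimal sketch of how that could look locally, assuming the debugging Manifest.toml is committed next to the Project.toml at the root of this branch (path and layout are assumptions for illustration):

```julia
# Sketch: reproduce the CI environment locally from a committed Manifest.toml.
# Assumes the Manifest.toml sits next to the Project.toml at the repository root.
using Pkg

Pkg.activate(".")    # activate the project carrying the committed Manifest.toml
Pkg.instantiate()    # install exactly the versions recorded in the Manifest
Pkg.status()         # print the resolved versions to compare against the CI log
```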
    
From the Julia v1.11 chat I recall that the error was showing up only on Linux, not on macOS?
    
With this environment (manifests for v1.11 and v1.12 both included) …
    
I can make the simulation error early with

```diff
diff --git a/src/Diagnostics/nan_checker.jl b/src/Diagnostics/nan_checker.jl
index 57945c5dc..893a9e283 100644
--- a/src/Diagnostics/nan_checker.jl
+++ b/src/Diagnostics/nan_checker.jl
@@ -5,7 +5,7 @@ mutable struct NaNChecker{F}
     erroring :: Bool
 end
 
-NaNChecker(fields) = NaNChecker(fields, false) # default
+NaNChecker(fields) = NaNChecker(fields, true) # default
 default_nan_checker(model) = nothing
 
 function Base.summary(nc::NaNChecker)
@@ -28,7 +28,7 @@ a container with key-value pairs like a dictionary or `NamedTuple`.
 
 If `erroring=true`, the `NaNChecker` will throw an error on NaN detection.
 """
-NaNChecker(; fields, erroring=false) = NaNChecker(fields, erroring)
+NaNChecker(; fields, erroring=true) = NaNChecker(fields, erroring)
 
 hasnan(field::AbstractArray) = any(isnan, parent(field))
 hasnan(model) = hasnan(first(fields(model)))
```

I presume there's also a way to set the … Can we use a callback to print out to file all the steps, so that we can compare 1:1 the progress on different machines? Presumably we're initially interested in the field …
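For the callback idea, a hedged sketch (the file name, callback name, and the particular statistics logged are made up for illustration; it assumes `simulation` is built as in examples/internal_tide.jl):

```julia
using Printf
using Oceananigans  # for Callback, IterationInterval, interior

# Append a few summary statistics of u every iteration, so runs on different
# machines can be diffed line by line.
function log_u_stats(sim)
    u = interior(sim.model.velocities.u)
    open("u_stats.txt", "a") do io
        @printf(io, "%d %.17g %.17g %.17g\n",
                sim.model.clock.iteration, minimum(u), maximum(u), sum(u) / length(u))
    end
    return nothing
end

simulation.callbacks[:u_stats] = Callback(log_u_stats, IterationInterval(1))
```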
    
Before Oceananigans.jl/examples/internal_tide.jl line 175 (at ea25179):

```julia
julia> simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.281029, min=0.281029, mean=0.281029
```

on both machines. If I'm looking at the right field and this display says enough about it, then they're the same at the beginning, but then on macOS I have

```julia
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.256 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info:     ... simulation initialization complete (887.307 ms)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (128.489 ms).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.31715, min=0.265116, mean=0.280967

julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.335864, min=0.264486, mean=0.280859
```

and on Ubuntu

```julia
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.391 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info:     ... simulation initialization complete (1.130 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (20.645 ms).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.31715, min=0.265116, mean=0.280967

julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.333478, min=0.264645, mean=0.280863
```

so there's a significant divergence already after two timesteps.

Update:

```julia
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.269 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info:     ... simulation initialization complete (11.788 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (12.640 seconds).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.31715, min=0.265116, mean=0.280967

julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│   └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
    └── max=0.335864, min=0.264486, mean=0.280859
```

is also what I see on Ubuntu with Julia v1.10, which is consistent with all versions of Julia on macOS.
    
The plot thickens: it works correctly in Julia v1.12 on Ampere eMAG (aarch64) with AlmaLinux 8.10 as operating system, which rules out an operating system difference. aarch64 is also the architecture on macOS, so I'm starting to suspect there's an architecture dependence. Can someone point me to the operation performed on the …
    
          
Nice work so far though!! The entire time-step is a complex chain of operations. I do think it is a good start to save down all fields every time-step. We may find that differences arise in one field versus another. Note that the NaNChecker checks …
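If both machines save all fields, something along these lines could locate the first differing output (a sketch only: the file names and the output name "u" are placeholders, and it assumes both runs wrote the same JLD2 outputs):

```julia
using Oceananigans  # for FieldTimeSeries and interior

# Walk through two saved runs and report the first output index where a field differs.
function first_divergence(file_a, file_b, name)
    a = FieldTimeSeries(file_a, name)
    b = FieldTimeSeries(file_b, name)
    for n in 1:length(a.times)
        Δ = maximum(abs, interior(a[n]) .- interior(b[n]))
        Δ > 0 && return (output_index = n, maxdiff = Δ)
    end
    return nothing
end

first_divergence("internal_tide_macos.jld2", "internal_tide_ubuntu.jld2", "u")
```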
    
To save every iteration, change this line (Oceananigans.jl/examples/internal_tide.jl line 170 at ea25179) to … (see the sketch below). The difference should arise in the very first time-step? We could compare those. It seems annoyingly laborious to do this across architectures, but maybe @giordano you have good ideas on how to do this efficiently.
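A hedged sketch of that change, assuming the example's output writer is the JLD2 one (the constructor is JLD2OutputWriter in the versions in use here; the writer name and filename below are placeholders, and `model`/`simulation` are assumed to be set up as in the example):

```julia
# Save every time step instead of the example's coarser schedule, so the very
# first steps can be compared one-to-one across machines.
simulation.output_writers[:debug_fields] =
    JLD2OutputWriter(model, merge(model.velocities, model.tracers);
                     filename = "internal_tide_debug.jld2",
                     schedule = IterationInterval(1),
                     overwrite_existing = true)
```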
    
We made some progress during a pair-debugging session with @simone-silvestri (during which we discovered the typo fixed by #4901): we found that already after the first step the pressure is different. The pressure is updated by Oceananigans.jl/src/Models/NonhydrostaticModels/update_hydrostatic_pressure.jl lines 12 to 20 (at 4265add). Replacing z_dot_g_bᶜᶜᶠ with

```julia
@inline my_z_dot_g_bᶜᶜᶠ(i, j, k, grid::Oceananigans.Grids.AbstractGrid{FT}, bf, C) where FT = FT(0.5) * (C.b[i, j, k] + C.b[i, j, k-1])
```

seems to be a workaround, but that's puzzling because it's pretty much the same as the original definition, just without the `@inbounds`. Putting the `@inbounds` in the definition of `my_z_dot_g_bᶜᶜᶠ` makes the pressure diverge and the NaNs pop up again. I still don't have a full explanation of what's happening, nor of why this happens only on some systems.
    
The issue with Julia v1.11+ seems to be fixed by JuliaGPU/KernelAbstractions.jl#653, resolved by @vchuravy independently of the Oceananigans troubles, but very timely nonetheless 😁
    
I will switch to a local accumulator in that kernel like we did in the …
    
It worked! The simulation does not produce NaNs anymore. Now we have other problems to solve, but they all seem easier.
    
The local-accumulator change in update_hydrostatic_pressure.jl suggested in review:

```diff
-@inbounds pHY′[i, j, grid.Nz] = - z_dot_g_bᶜᶜᶠ(i, j, grid.Nz+1, grid, buoyancy, C) * Δzᶜᶜᶠ(i, j, grid.Nz+1, grid)
+pᵏ = - z_dot_g_bᶜᶜᶠ(i, j, grid.Nz+1, grid, buoyancy, C) * Δzᶜᶜᶠ(i, j, grid.Nz+1, grid)
+@inbounds pHY′[i, j, grid.Nz] = pᵏ

 for k in grid.Nz-1 : -1 : 1
-    @inbounds pHY′[i, j, k] = pHY′[i, j, k+1] - z_dot_g_bᶜᶜᶠ(i, j, k+1, grid, buoyancy, C) * Δzᶜᶜᶠ(i, j, k+1, grid)
+    pᵏ -= z_dot_g_bᶜᶜᶠ(i, j, k+1, grid, buoyancy, C) * Δzᶜᶜᶠ(i, j, k+1, grid)
+    @inbounds pHY′[i, j, k] = pᵏ
```
If we require KernelAbstractions.jl v0.9.39 we don't need the workaround anymore.
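A minimal sketch of what requiring it could look like, using Pkg's compat helper (whether to express the bound this way, or edit Project.toml by hand alongside the existing entries, is a judgment call for the PR):

```julia
# Sketch: raise the KernelAbstractions compat bound in Oceananigans' Project.toml
# so the fixed release (>= 0.9.39) is required and the kernel workaround can go.
using Pkg
Pkg.activate(".")                              # run from the Oceananigans.jl repo root
Pkg.compat("KernelAbstractions", "0.9.39")     # writes KernelAbstractions = "0.9.39" to [compat]
```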
Doctest failures are due to #4840 (comment).