Introduce 4 new topology types for distributed tripolar grid fold boundaries:
- LeftConnectedRightCenterFolded (1xN UPivot)
- LeftConnectedRightFaceFolded (1xN FPivot, Face-extended)
- LeftConnectedRightCenterConnected{WestOfPivot/EastOfPivot} (MxN UPivot)
- LeftConnectedRightFaceConnected{WestOfPivot/EastOfPivot} (MxN FPivot, Face-extended)
These replace the previous FullyConnected y-topology on northernmost distributed
ranks, which lost fold information and caused FPivot CF/FF fields to crash (OOB)
because they need Ny+1 Face points.
Key changes:
- Grids.jl: new types, WestOfPivot/EastOfPivot, global_fold_topology()
- grid_utils.jl: BoundedTopology union includes Face-extended fold types
- distributed_grids.jl: insert_connected_topology 5-arg methods for fold topologies
- distributed_zipper.jl: complete rewrite with topology-based buffer dispatch,
corrected y-ranges for Face-extended grids, fold-line WoP/EoP handling
- Tripolar struct simplified to 3 type params (fold info now in grid topology)
- Split-explicit, advection, and BC unions updated for new types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lo fill
- Make Tripolar{N,F,S,FT} carry the fold topology as a phantom type
parameter (isbits for GPU). Add fold_topology() getter, remove
global_fold_topology() which could not resolve the fold type from
non-fold rank topologies (FullyConnected, RightConnected).
- Fix reconstruct_global_grid and with_halo for distributed TripolarGrid:
pass fold_topology from conformal_mapping so FPivot grids reconstruct
correctly (was defaulting to RightCenterFolded).
- Fix distributed zipper corner buffers: use has_fold_line to size
corner buffers (Hy+1 when fold line present), dispatch on
TwoDZipperBuffer only (corners not needed for 1D partitions),
remove redundant arch::Distributed constraint, rename H→Hy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atch
Remove the WestOfPivot/EastOfPivot type parameter from topology types and
replace with per-buffer fold-line control via two new type parameters on
TwoDZipperBuffer and ZipperCornerBuffer:
- FL (fold line in buffer): true when the buffer contains the fold-line row
(Hy+1 rows). Set to has_fold_line() for ALL buffers to ensure MPI size
matching between mirror partners.
- WFL (writes fold line): true when the recv should write the fold-line row.
Computed per-buffer from the rank's x-position relative to the pivot (Nx/2),
accounting for periodic wrap at domain boundaries.
Helper functions north_writes_fold_line, northwest_writes_fold_line,
northeast_writes_fold_line determine WFL based on rx and Rx.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atch
- Remove WestOfPivot/EastOfPivot type parameters and pivot_side function
- Add FL (fold-line in buffer) and WFL (writes fold line) type parameters
to TwoDZipperBuffer and ZipperCornerBuffer
- Per-buffer fold-line helpers: north/northwest/northeast_writes_fold_line
accounting for both pivot points (x-periodicity)
- TripolarXBuffer with location-aware y-size using length(loc_y, topo, Ny)
instead of has_fold_line (only FPivot Face-y gets Ny+1, not UPivot)
- Temporary @info diagnostics for send/recv tracing (to be removed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add FL/WFL type parameters to TripolarXBuffer, mirroring the north/corner
buffer pattern. The west/east x-buffers complement the adjacent NW/NE corner:
if the corner writes the fold line, the x-buffer skips it (and vice versa),
ensuring no overlapping writes with async MPI.
Separate west_tripolar_buffer and east_tripolar_buffer constructors since
they complement different corners (NW vs NE).
Remaining: Face-x pivot point (global x = Nx/2+1) at the fold line is
unfilled in distributed because it falls in the gap between the NE corner
(Hx-1 columns) and the east x-buffer (WFL=false). This only affects the
exact pivot point which is under land in real ocean configurations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t ZBC
- Remove local zipper_bc function (duplicated north_fold_boundary_condition)
- Use north_fold_boundary_condition(fold_topology(...))(sign) directly
- Remove local const ZBC (already defined in BoundaryConditions)
- Import ZBC from BoundaryConditions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Simone Silvestri <silvestri.simone0@gmail.com>
- Refactor distributed_zipper.jl: replace ~65 combinatorial send/recv methods
with ~20 using helper functions for y-ranges, x-ranges, and type-parameter
accessors. Move FL/WFL to front of type parameter list for clean dispatch.
Consolidate 4 corner buffer constructors into 2.
- Fix FPivot distributed simulation mismatch: extend async buffer tendency
kernel parameters to use worksize(grid) instead of size(grid), ensuring the
fold line at Ny+1 is covered by the buffer pass for RightFaceFolded grids.
- Add worksize override for distributed FPivot grids (DRFTRG) so kernels
correctly operate on the extra Face-y row at Ny+1.
- Remove stale explicit imports (FZBC, UZBC, RightConnected,
instantiated_location) and fix self-qualified fold_topology access.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The overrides were modifying the serial fold BC to skip the fold-line
consistency substitution, which is no longer needed now that the distributed
fold handles this correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Simone Silvestri <silvestri.simone0@gmail.com>
OneDZipperBuffer was never constructed because slab partitions (Rx=1) use
the serial fold BC directly, not a DistributedZipper. The OneDFoldTopology +
DistributedZipper combination was impossible.
Re-add loc_id import needed by distributed_zipper_north_tags.jl.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark Comparison: PR vs Main
NSYS Kernel Profiling: EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr
This PR fixes the distributed tripolar fold with no extra MPI passes, for both UPivot and FPivot topologies. The current distributed UPivot on main is broken (halos don't match the serial case), so this PR addresses that and adds FPivot support.
Key changes
New distributed fold topologies
Doing this without extra MPI passes requires local y-topologies that encode more information than just FullyConnected. This is accomplished by introducing 4 new topology types for distributed tripolar grid fold boundaries:
- LeftConnectedRightCenterFolded (1xN slab UPivot)
- LeftConnectedRightFaceFolded (1xN slab FPivot)
- LeftConnectedRightCenterConnected (MxN pencil UPivot)
- LeftConnectedRightFaceConnected (MxN pencil FPivot)
These replace the previous FullyConnected y-topology on northernmost distributed ranks.
Fold-aware MPI communication buffers
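Before the buffer types themselves, here is a minimal sketch of how the partition layout could select the y-topology of the northernmost ranks, and hence whether distributed zipper buffers are needed at all. All types are empty stand-ins declared locally so the snippet runs on its own, and `northern_y_topology` and `needs_zipper_buffers` are hypothetical helper names for illustration, not functions from this PR.

```julia
# Hypothetical stand-ins for the pivot and topology types named in this PR.
# The real definitions live in the package and may carry type parameters.
struct UPivot end
struct FPivot end

struct LeftConnectedRightCenterFolded end
struct LeftConnectedRightFaceFolded end
struct LeftConnectedRightCenterConnected end
struct LeftConnectedRightFaceConnected end

# Rx is the number of ranks in x. Rx == 1 is a slab (1xN) partition,
# Rx > 1 is a pencil (MxN) partition.
northern_y_topology(::UPivot, Rx::Int) =
    Rx == 1 ? LeftConnectedRightCenterFolded() : LeftConnectedRightCenterConnected()

northern_y_topology(::FPivot, Rx::Int) =
    Rx == 1 ? LeftConnectedRightFaceFolded() : LeftConnectedRightFaceConnected()

# Slab topologies apply the serial fold BC directly; only the pencil
# topologies exchange the fold through distributed zipper buffers.
needs_zipper_buffers(::LeftConnectedRightCenterFolded) = false
needs_zipper_buffers(::LeftConnectedRightFaceFolded) = false
needs_zipper_buffers(::LeftConnectedRightCenterConnected) = true
needs_zipper_buffers(::LeftConnectedRightFaceConnected) = true
```

For example, `needs_zipper_buffers(northern_y_topology(FPivot(), 4))` is `true` for a pencil partition with four ranks in x, while the same call with `Rx = 1` returns `false`.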
Three buffer types encode fold geometry for the distributed zipper for pencil partitions (slab partitions use the serial fold BC directly):
- TwoDZipperBuffer: interior-width x-reversal with FL/WFL flags controlling fold-line inclusion and write ownership based on pivot position.
- ZipperCornerBuffer: NW/NE corner halos where the fold intersects x-boundaries between ranks.
- TripolarXBuffer: X-direction halos adjacent to the fold with matching fold-line awareness.
Worksize-based kernel coverage for FPivot
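The FL flag above and the worksize override in this section both account for the same extra fold-line row. The sketch below illustrates that bookkeeping under stated assumptions: `SketchGrid` is a hypothetical stand-in for a distributed tripolar grid that carries only the pivot type and the interior sizes, and `buffer_rows` is an illustrative helper name, not a function from this PR.

```julia
# Hypothetical pivot markers, declared locally so the snippet is self-contained.
struct UPivot end
struct FPivot end

# Hypothetical stand-in grid: the pivot type is a phantom type parameter.
struct SketchGrid{P}
    Nx::Int
    Ny::Int
    Nz::Int
end

Base.size(grid::SketchGrid) = (grid.Nx, grid.Ny, grid.Nz)

# By default kernels cover the interior size; FPivot grids also need the
# extra Face-y row at Ny+1, where the fold line sits.
worksize(grid::SketchGrid) = size(grid)
worksize(grid::SketchGrid{FPivot}) = (grid.Nx, grid.Ny + 1, grid.Nz)

# Rows in a north or corner buffer: Hy halo rows, plus one when the buffer
# contains the fold-line row (FL == true).
buffer_rows(Hy::Int, FL::Bool) = Hy + (FL ? 1 : 0)
```

With `Hy = 3`, a buffer that carries the fold line has `buffer_rows(3, true) == 4` rows, and `worksize(SketchGrid{FPivot}(20, 20, 1))` returns `(20, 21, 1)`, while the UPivot grid keeps `(20, 20, 1)`.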
Added a worksize override for distributed FPivot grids returning (Nx, Ny+1, Nz), matching the serial override from #5408. Extended the async distributed buffer tendency kernel parameters to also use worksize(grid) instead of size(grid), closing a coverage gap at the fold line that caused serial/distributed divergence after 2 time steps.
TODO: extend testing
I have a bunch of extra tests that I have run locally, and which seem to pass. I intend to include some here, in particular to cover the FPivot topology. Future commits could add to CI:
Pivot point mismatch
There is one single point where serial and distributed halo-filling don't match: the pivot points. That's because it is a very special case that would need special handling, and I don't think we should "waste" lines of code on it since it will always be covered by land. Take the ranks A and B "touching" the central pivot point pictured below (for an FPivot pencil partition; the star represents the pivot point, which is at FF location for this topology). When you fill the halos of A around the pivot, you use the north, corner, and x buffers. The buffers all have nice "rectangle-like" shapes. But the pivot point must come from B, despite being on a row that comes from A itself through the corner buffer. And this only happens at the pivot, so for any partition (Rx,Ry) where Rx>2 it would need an extra type/encoding to correctly fill that single point, only for the rank next to the pivot. So I think it's not worth it, but happy to be convinced otherwise.
Visual verification
The fold is verified visually using index-tracking fields with half-circles representing 180° rotation around pivot points. For a 4x2 UPivot 20x20 grid with Hx=Hy=3:
The fold is correct if all halos are filled and the circles are centered around the stars (= the pivot points).
For a 4x2 FPivot:
And I also checked the halos for each rank, e.g., 4x2 UPivot rank 7:
For the record, this is what it currently looks like on the main branch: the top halo row and the fold line are missing, for example, and some cells are incorrectly pivoted (non-concentric circles):
So overall I think that the src/ part is ready for review.
Follows from #5408.
Supersedes #5381.