
Conversation


@katsmith133 katsmith133 commented Oct 30, 2025

Add the VertMix class to Omega, which consists of:

Methods to calculate the vertical diffusivity and viscosity using (for now only) the PP option, which can linearly add contributions from background, convective, and shear mixing based on user choices in the config file. The VertMix options and parameter values are set in the YAML config file. There is a dummy KPP option in there now to set up the structure for adding it in the near future (I can remove it if we think this is messy).

This PR includes tests for each individual mixing process (background, convective, and shear), as well as one that linearly adds all of them together. Because both the convective and shear mixing require the Brunt–Väisälä frequency, this PR also adds the calculation of BruntVaisalaFreq to the EOS class and adds corresponding tests. When the linear EOS is used, BruntVaisalaFreq is calculated as it was in MPAS-O (i.e., with linear coefficients and the derivative based on changes in z); when the TEOS-10 EOS is used, it is calculated the same way the TEOS-10 package calculates it (i.e., with nonlinear coefficients and the derivative based on changes in p).
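The linear combination of mixing contributions described above can be sketched roughly as follows. This is a hypothetical illustration in the PP (Pacanowski–Philander) spirit, not Omega's actual VertMix code; the function name, the constants, and the cubic Richardson-number dependence are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch of a PP-style vertical diffusivity: contributions from
// background, convective, and shear mixing are computed independently and
// summed. Names and constants are illustrative only, not Omega's.
double vertDiffPP(double N2,               // Brunt-Vaisala frequency squared
                  double Ri,               // gradient Richardson number
                  double KappaBackground = 1.0e-5, // m^2/s, always active
                  double KappaConvective = 1.0,    // m^2/s, applied when N^2 < 0
                  double Kappa0          = 5.0e-3, // m^2/s, shear-mixing scale
                  double AlphaPP         = 5.0) {
   // Background mixing: a constant floor.
   double Kappa = KappaBackground;
   // Convective mixing: triggered by static instability (N^2 < 0).
   if (N2 < 0.0)
      Kappa += KappaConvective;
   // Shear mixing (PP-like form): decays with increasing Richardson number.
   double RiPos = std::max(Ri, 0.0);
   Kappa += Kappa0 / std::pow(1.0 + AlphaPP * RiPos, 3);
   return Kappa;
}
```

In this sketch each process adds independently, matching the "linearly add contributions" behavior described in the summary; which terms are included would be controlled by the config-file choices.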

Ran ctests successfully on pm-cpu, pm-gpu, and chrysalis.

Checklist

  • Documentation:
    • User's Guide has been updated
    • Developer's Guide has been updated
    • Documentation has been built locally and changes look as expected
  • Building
    • CMake build does not produce any new warnings from changes in this PR
  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • CTest unit tests for new features have been added per the approved design.
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.


@philipwjones philipwjones left a comment


This fails on Frontier GPU with a memory access error, so some array/variable may be missing or not scoped? I have some other comments spread below.

@katsmith133
Author

> This fails on Frontier GPU with a memory access error, so some array/variable may be missing or not scoped? I have some other comments spread below.

@philipwjones did the error provide any more information than that? At the moment my Frontier build is stopping at the cmake step (it keeps hanging at "Cpptrace auto config: Using cxxabi for demangling"), so I can't test to see what this issue is.

@philipwjones

@katsmith133 No additional info and I didn't have a chance to narrow it further, but I forgot to mention that it failed both EOS and VertMix unit tests, so it could point to something in the EOS additions. And I think this was with the AMD compiler - I can look into it further next week if no one gets to it first.

@katsmith133
Author

> @katsmith133 No additional info and I didn't have a chance to narrow it further, but I forgot to mention that it failed both EOS and VertMix unit tests, so it could point to something in the EOS additions. And I think this was with the AMD compiler - I can look into it further next week if no one gets to it first.

Thanks @philipwjones! I am in a Deep Learning course this week, and probably will not get to it. So if you have time, that would be appreciated! If not, I can pick it back up next week.

Member

@mwarusz mwarusz left a comment


@katsmith133 @philipwjones

I had a chance to look into the test failures of this PR on Frontier. There are issues with accessing host pointers on the device in EosTest.cpp and VertMixTest.cpp. My suggestion shows how to fix one loop, but similar changes need to be applied for all accesses to VCoord->ZMid and various Mesh->X arrays in these two tests.

@philipwjones

Just pushed some changes to fix the Frontier issue in the EOS test driver (VertMix test driver mods will come soon). The commit also adds some additional EOS checks and cleans up some error handling, so folks should pull these changes if working on this PR.

Member

@mwarusz mwarusz left a comment


I have a few more comments and suggestions. There are a couple of serial vertical loops in this PR that I think could easily be done in parallel.

@philipwjones

Ok, I pushed a new version of the vert mix test driver that fixes the Frontier issues, expands some error checks to full arrays, and cleans up a few things. This now passes ctests on Frontier.

@katsmith133
Author

@philipwjones Thanks for the help in fixing the Frontier issues!

@mwarusz, @cbegeman, and @vanroekel I believe I have addressed all of your comments. Please let me know if not. Thanks!

@philipwjones philipwjones self-requested a review November 17, 2025 22:08

@philipwjones philipwjones left a comment


I'm fine with this now.

Real PInt =
0.5_Real * (Pressure(ICell, K) + Pressure(ICell, K - 1));
Real SpInt = 0.5_Real * (SpecVol(ICell, K) + SpecVol(ICell, K - 1));
Real AlphaInt = calcAlpha(SaInt, CtInt, PInt, SpInt);
Collaborator


We may want to put alpha and beta in member arrays, since KPP will need these too, and then we won't have to recalculate (I think).

Author


I'll keep this as a note to change when we implement KPP. Thanks for pointing it out!
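For context, the linear-EOS form of the Brunt–Väisälä frequency mentioned in the PR summary (derivative based on changes in z) could be sketched as below. This is a hypothetical illustration, not Omega's EOS code; the function name, coefficient names, and values are assumptions for this sketch.

```cpp
// Illustrative linear-EOS Brunt-Vaisala frequency squared at a layer
// interface, N^2 = -(g/Rho0) * dRho/dz, with the density derivative taken
// from changes in z. Coefficient names and values are assumptions.
double bruntVaisalaFreq2(double TTop, double TBot, // temperature above/below
                         double STop, double SBot, // salinity above/below
                         double DeltaZ,            // interface spacing (> 0)
                         double Alpha = 0.2,   // kg m^-3 K^-1, thermal expansion
                         double Beta  = 0.8,   // kg m^-3 PSU^-1, haline contraction
                         double Rho0  = 1026.0,   // kg m^-3, reference density
                         double Grav  = 9.80665) {
   // Linear EOS: Rho = Rho0 - Alpha*(T - TRef) + Beta*(S - SRef), so with z
   // positive upward, dRho/dz = -Alpha*dT/dz + Beta*dS/dz and
   // N^2 = (Grav/Rho0) * (Alpha*dT/dz - Beta*dS/dz).
   double DTDz = (TTop - TBot) / DeltaZ;
   double DSDz = (STop - SBot) / DeltaZ;
   return (Grav / Rho0) * (Alpha * DTDz - Beta * DSDz);
}
```

A positive result indicates stable stratification (no convective mixing triggered); a negative result would activate the convective-mixing contribution in VertMix.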


@alicebarthel alicebarthel left a comment


Thanks for all your work on this @katsmith133! Most of my comments are suggestions for clarification (in the documentation or the naming). The key point to revise or resolve is whether $\nu_b$ should or should not be included in $\kappa_{shear}$ (see additional comment above on cvmix). We agreed to proceed with the new (smaller) PCoeff array, which will need to be revisited in the performance optimization or the submeso implementation.

@katsmith133 katsmith133 requested review from alicebarthel and removed request for alicebarthel December 23, 2025 00:35
@katsmith133 katsmith133 force-pushed the omega/add-vertical-mixing branch from 0dfdfdc to 57fb146 Compare December 23, 2025 00:39
@mark-petersen
Collaborator

Passes all tests on Perlmutter and Frontier, CPU and GPU, including Test #36: VERTMIX_TEST.

Test commands:

######### Frontier CPU #########
export CODEDIR=opr
#export CODEDIR=omega-develop
export RUNDIR=test_omega_cpu
mkdir /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR
cd !$

cd /ccs/home/mpetersen/repos/E3SM/${CODEDIR}
git submodule update --init --recursive components/omega/external/GSW-C components/omega/external/yaml-cpp externals/YAKL externals/ekat externals/scorpio cime externals/cpptrace
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR

module load cmake
rm -rf build
mkdir build
cd build

# compiler options are:
export compiler=craycray
#export compiler=craygnu
#export compiler=crayamd

export PARMETIS_ROOT=/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view

module load python
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_VECTOR_LENGTH=1 \
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -S /ccs/home/mpetersen/repos/E3SM/${CODEDIR}/components/omega -B .
# note OMEGA_VECTOR_LENGTH=8 fails MPI tests on CPUs.
./omega_build.sh

# linking:
cd test
ln -isf ~/meshes/omega/O*nc .
cp /ccs/home/mpetersen/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

# -S is number of GPUs: Count of Specialized Cores per node

salloc -A cli115 -J inter -t 1:00:00 -q debug -N 1 -S 0
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR/build
./omega_ctest.sh

######### Frontier GPU #########
export CODEDIR=opr
export RUNDIR=test_omega_gpu
mkdir /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR
cd !$

cd /ccs/home/mpetersen/repos/E3SM/${CODEDIR}
git submodule update --init --recursive components/omega/external/GSW-C components/omega/external/yaml-cpp externals/YAKL externals/ekat externals/scorpio cime externals/cpptrace
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR

module load cmake
rm -rf build
mkdir build
cd build

# compiler options are:
export compiler=craycray-mphipcc
#export compiler=craygnu-mphipcc  # 250422 error on cmake, module cmake, git missing
#export compiler=crayamd-mphipcc  

export PARMETIS_ROOT=/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view

#module load Core/24.07
#module load cmake/3.27.9 git/2.45.1
module load python
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_BUILD_TEST=ON \
   -DOMEGA_VECTOR_LENGTH=1 \
   -Wno-dev \
   -Wno-deprecated \
   -S /ccs/home/mpetersen/repos/E3SM/${CODEDIR}/components/omega -B .
./omega_build.sh

# linking:
cd test
ln -isf ~/meshes/omega/O*nc .
cp /ccs/home/mpetersen/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

salloc -A cli115 -J inter -t 1:00:00 -q debug -N 1 -p batch
# -p is partition name
cd /lustre/orion/cli115/scratch/mpetersen/runs/$RUNDIR/build
./omega_ctest.sh

######### Perlmutter CPU #########
#export CODEDIR=omega-develop
export CODEDIR=opr
export RUNDIR=test_omega_cpu

cd /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}
#git fetch
#git reset --hard origin/develop
git submodule update --init --recursive components/omega/external/GSW-C components/omega/external/yaml-cpp externals/ekat externals/scorpio cime externals/cpptrace
cd components/omega/

module load cmake
mkdir ${PSCRATCH}/runs/$RUNDIR
cd ${PSCRATCH}/runs/$RUNDIR

rm -rf build
mkdir build
cd build

# compiler options are:
export compiler=gnu
#export compiler=nvidia # not working 250421

export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-cpu/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view

# nvidia or gnu compiler:
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=pm-cpu \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -DOMEGA_VECTOR_LENGTH=1 \
   -Wno-dev \
   -S /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega -B .
# note OMEGA_VECTOR_LENGTH=8 fails MPI tests on CPUs.
cd ${PSCRATCH}/runs/$RUNDIR/build
./omega_build.sh

# linking:
cd test
ln -isf /global/homes/m/mpeterse/meshes/omega/O*nc .
cp /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

# run test:
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account=m4572 # or e3sm
cd ${PSCRATCH}/runs/${RUNDIR}/build

./omega_ctest.sh

######### Perlmutter GPU #########
salloc --nodes 4 --qos interactive --time 01:00:00 --constraint gpu --tasks-per-node=2 --gpus-per-task 1 --account=m4572_g # or e3sm_g

# Perlmutter has nodes with either 40 or 80 GB of high-bandwidth memory, and
# the system defaults to 40 GB. You can request 80 GB nodes with the sbatch
# flag --constraint="gpu&hbm80gb"

export CODEDIR=opr
#export CODEDIR=omega-develop
export RUNDIR=test_omega_gpu
mkdir ${PSCRATCH}/runs/$RUNDIR
cd !$

rm -rf build
mkdir build
cd build
module load cmake

# compiler options are:
export compiler=gnugpu
#export compiler=nvidiagpu

export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-gpu/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=pm-gpu \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -DOMEGA_MPI_ON_DEVICE:BOOL=OFF \
   -S /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega -B .
# needed for compiler bug: OMEGA_MPI_ON_DEVICE:BOOL=OFF. See https://github.com/E3SM-Project/Omega/issues/214
./omega_build.sh

# linking:
cd test
ln -isf /global/homes/m/mpeterse/meshes/omega/O*nc .
cp /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

cd ..
./omega_ctest.sh

@mark-petersen mark-petersen merged commit f2e951a into E3SM-Project:develop Dec 23, 2025
1 check passed
@mark-petersen
Collaborator

Thank you @katsmith133 and congratulations on the completion of this project.

