Using Graphical Debuggers with GEOSgcm
This page covers debugging GEOSgcm with graphical debuggers. It focuses on debugging at NCCS, as that is where we do most of our debugging.
Below we focus on GEOSgcm built with the Intel compilers and Intel MPI, but the process should be similar with other compilers and MPI stacks.
The prerequisite for any of these steps is to build GEOSgcm with debugging flags. Do this either by passing --debug if you use parallel_build.csh or by specifying -DCMAKE_BUILD_TYPE=Debug if you build manually.
Either way, set up an experiment to the point where you would normally run sbatch gcm_run.j. In this case, however, we will run it interactively.
Now we need to run GEOSgcm on discover. At this point we assume you have an experiment set up exactly how you want it. Note that there are a limited number of TotalView licenses at NCCS, so it's best to run with a small number of processes, like 6 to 24.
To debug, you will want to get some interactive nodes in the usual way using salloc. When we run gcm_run.j below, we need to make sure we do so on these nodes.
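As a rough illustration of the allocation step, the sketch below requests a single node for one hour with 24 tasks. The node count, task count, and time limit are assumptions chosen for the example, and any account, QOS, or constraint flags your project needs at NCCS are deliberately left out.

```shell
# Number of MPI processes to debug with. Keep this small, since TotalView
# licenses are limited. 24 is only an example value.
NPROCS=24

# Request one interactive node for an hour. Site-specific flags (account,
# QOS, constraint) are omitted here and may be required in practice.
if command -v salloc >/dev/null 2>&1; then
    salloc --nodes=1 --ntasks="$NPROCS" --time=01:00:00
else
    echo "salloc not found: run this from a discover login node" >&2
fi
```

Whatever process count you request here should match the experiment layout and the -np value used later when launching under TotalView.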
In the examples below, we will be focusing on using TotalView 2025.2.6. The main thrust is to set up and use the Remote Client to the point where you can start debugging. Beyond that, please consult the TotalView documentation.
At the moment, to get the Remote Client for TotalView, contact NCCS Support. While TotalView does provide remote clients on its website, you need a license key to get them. NCCS can do this for you. For macOS, the client was named something like totalview_remote_client-2025.2.6.darwin-arm64-installer.dmg.
Once you get the Remote Client, install it as usual. Note that on macOS at least, unlike the DDT Remote Client, it must go into /Applications so you will need some sort of temporary sudo access or help from your sysadmins.
For ease of use, we recommend setting up an SSH ControlMaster to discover so that, once the ControlMaster connection is open, you can run ssh discover again without re-authenticating. Note that this is not necessary, but it makes things easier.
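For readers who have not set up connection sharing before, here is a minimal sketch of what such a stanza might look like. ControlMaster, ControlPath, and ControlPersist are standard OpenSSH client options, but the HostName and User values are placeholders, and the 4h persistence time is an arbitrary choice. Your route to discover (for example through a login or bastion host) may need extra options not shown here. The snippet writes the stanza to a scratch file so you can review it and merge it into ~/.ssh/config by hand.

```shell
# Write an example ControlMaster stanza to a scratch file for review.
# Nothing here touches ~/.ssh/config; merge it manually once edited.
STANZA_FILE="${TMPDIR:-/tmp}/discover-controlmaster.conf"

cat > "$STANZA_FILE" <<'EOF'
# HostName and User below are placeholders: replace them with your own.
Host discover
    HostName your.discover.login.host
    User user
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 4h
EOF

echo "Example stanza written to $STANZA_FILE"
```

With a stanza like this in place, the first ssh discover authenticates and opens the shared connection, and later sessions (including those started by the Remote Client) reuse it for as long as it persists.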
To set it up, start the TotalView Remote Client, and on the main screen, click the "three-sliders" button under "Launch Remote Debugger" (see arrow):
That will pop up the Remote Connections screen:
Click "create a new connection" and fill it in as follows. Note that this assumes you get to discover with something like ssh user@discover. If you use a different HostName alias, use that. It also assumes you are using the tview/2025.2.6 module. If you use a newer one, update the path accordingly (or make a new connection).
- Connection Name: discover
- Remote Host(s): user@discover
- TotalView Remote Installation Directory: /usr/local/toolworks/2025/toolworks/totalview.2025.2.6/bin
- Remote TotalView Arguments: -nomrnet
NOTE: The -nomrnet argument seems to be critical for Intel MPI.
Once you have this, click "Ok".
To launch a remote debugging session, start the TotalView Remote Client on your laptop, open the dropdown under "Launch Remote Debugger", and select the connection you want. A window will pop open and then close:
We now need to edit gcm_run.j to have it launch with TotalView such that it will use our Remote Client. Open gcm_run.j and find the part where GEOSgcm.x is actually run. In modern gcm_run.j that line is around line 910 and looks like:
$RUN_CMD $TOTAL_PES $GEOSEXE $IOSERVER_OPTIONS $IOSERVER_EXTRA --logging_config 'logging.yaml'
We want to replace this with:
ml tview/2025.2.6
tvconnect `which mpiexec` -np 24 $GEOSEXE
where 24 should change to the number of processes you are running with.
Here we first load TotalView and then run tvconnect. Testing with Intel MPI has shown that:
- We need to use mpiexec, not mpirun (or esma_mpirun)
- We need to use the full path to mpiexec. We use which here to make this a bit more flexible (since your g5_modules might have a different Intel MPI version)
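Since the launch depends on both mpiexec and tvconnect resolving to full paths, a quick check in the interactive session can save a failed start. The following is a minimal POSIX shell sketch, not part of gcm_run.j. The helper name resolve_cmd is made up for this example, and it assumes you have already sourced g5_modules and loaded the TotalView module in the shell where you run it.

```shell
# Print the full path of a command, or fail with a message if it is missing.
# resolve_cmd is a hypothetical helper written for this example.
resolve_cmd() {
    path=$(command -v "$1") || {
        echo "missing: $1 (is the right module loaded?)" >&2
        return 1
    }
    echo "$path"
}

# mpiexec must be passed to tvconnect by full path, so resolve both up front.
for cmd in mpiexec tvconnect; do
    resolve_cmd "$cmd" || echo "fix your environment before running gcm_run.j" >&2
done
```

If both commands print a path, the mpiexec path shown should be the one that which mpiexec expands to inside gcm_run.j, provided the script loads the same modules.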
Now, run gcm_run.j on your interactive session with:
./gcm_run.j
and if all works, the Remote Client should now have a popup saying a reverse request was found:
Press "Yes".
When you do this, it will look like it is trying to debug mpiexec. For now, just press the green Go triangle button in the upper left:
When that is pressed, you'll get a dialog saying that mpiexec was detected and is a parallel job, and asking "Do you want to stop the job now?":
Press "Yes". Now it should move on to GEOSgcm.x and the logger window will show that it is reading symbols:
Then a popup will appear with it loading symbols:
Once that is done, you can use TotalView as normal. For example, click on the "Lookup File or Function" tab and type in something. I chose GEOS_GwdGridComp.F90 (choose the .F90, not the .i90). Now you can set a breakpoint:
Once done, press the green "Go" play button and it should advance until it hits your breakpoint: