
Using Graphical Debuggers with GEOSgcm

Matt Thompson edited this page Sep 18, 2025 · 1 revision

This page covers debugging GEOSgcm with graphical debuggers. We will focus on debugging at NCCS, as that is where most of our debugging is done.

Below we will focus on GEOSgcm built with the Intel compilers and Intel MPI, but the process with other compilers and MPI stacks should be similar.

Compile GEOSgcm with Debugging flags

The prerequisite for any of these steps is to build GEOSgcm with debugging flags: either pass --debug if you use parallel_build.csh, or specify -DCMAKE_BUILD_TYPE=Debug if you build manually.
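As a concrete sketch, either of the following should produce a debug build (the build directory name and parallelism are illustrative; adjust to your checkout and machine):

```shell
# Option 1: parallel_build.csh with the debug flag
./parallel_build.csh --debug

# Option 2: a manual CMake configure and build of type Debug
cmake -B build-debug -S . -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug --target install -j 6
```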

Set up an experiment

No matter what, you'll want to set up an experiment to the point where you would normally run sbatch gcm_run.j; in this case, however, we will be running interactively.

Grab interactive nodes

Now we need to run GEOSgcm on discover. At this point we assume you have an experiment set up exactly how you want it. Note that TotalView licenses at NCCS are limited, so it's best to run with a small number of processes, such as 6 to 24.

To debug, you will want to get some interactive nodes in the usual way using salloc. When we run gcm_run.j below, we need to make sure we do so on these nodes.
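For example, a minimal salloc request might look like the following sketch (the node count, tasks per node, and time limit are illustrative; use whatever your site and experiment require):

```shell
# Request one interactive node for two hours; adjust to your needs.
salloc --nodes=1 --ntasks-per-node=24 --time=02:00:00
```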

DDT

Native on Discover

Using Remote Client

TotalView

In the examples below, we will focus on TotalView 2025.2.6. The main thrust is to set up and use the Remote Client to the point where you can start debugging. Beyond that, please consult the TotalView documentation.

Native on Discover

Using Remote Client

Obtaining the Remote Client

At the moment, to get the Remote Client for TotalView, contact NCCS Support. While TotalView does provide remote clients on its website, you need a license key to get them; NCCS can do this for you. For macOS, the client was named something like totalview_remote_client-2025.2.6.darwin-arm64-installer.dmg.

Once you get the Remote Client, install it as usual. Note that on macOS at least, unlike the DDT Remote Client, it must go into /Applications so you will need some sort of temporary sudo access or help from your sysadmins.

Setting up the Remote Client (One-Time per connection)

For ease of use, we recommend setting up an SSH ControlMaster to discover so that, once the ControlMaster is established, you can do ssh discover without re-authenticating. Note, this is not necessary, but it makes things easier.
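A minimal ControlMaster setup in ~/.ssh/config might look like this sketch (the HostName, User, and ControlPath values are illustrative; adjust them to your own account and preferences):

```
# ~/.ssh/config
Host discover
    HostName discover.nccs.nasa.gov
    User youruser
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 8h
```

After the first ssh discover, subsequent connections reuse the master connection without prompting again.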

To set it up, start the Totalview Remote Client, and on the main screen, click the "three-sliders" button under "Launch Remote Debugger" (see arrow):

Main Screen

That will pop up the Remote Connections screen:

Remote Connections - Bare

Click "create a new connection" and fill it in as follows. Note, this assumes that to get to discover you do something like ssh user@discover. If you use a different HostName alias, use that. It also assumes you are using the tview/2025.2.6 module. If you use a newer one, update the path accordingly (or make a new connection).

  • Connection Name: discover
  • Remote Host(s): user@discover
  • TotalView Remote Installation Directory: /usr/local/toolworks/2025/toolworks/totalview.2025.2.6/bin
  • Remote TotalView Arguments: -nomrnet

NOTE: The -nomrnet argument seems to be critical for Intel MPI.

Remote Connections - Discover

Once you have this, click "Ok".

Launch Remote Debugger

To launch a remote debugging session, start the TotalView Remote Client on your laptop, open the dropdown under "Launch Remote Debugger", and select the connection you want. A window will pop open and then close:

Launching Connection

Editing gcm_run.j for TotalView

We now need to edit gcm_run.j to have it launch with TotalView such that it will use our Remote Client. Open gcm_run.j and find the part where GEOSgcm.x is actually run. In modern gcm_run.j that line is around line 910 and looks like:

    $RUN_CMD $TOTAL_PES $GEOSEXE $IOSERVER_OPTIONS $IOSERVER_EXTRA --logging_config 'logging.yaml'

We want to replace this with:

    ml tview/2025.2.6
    tvconnect `which mpiexec` -np 24 $GEOSEXE

where 24 should change to the number of processes you are running with.

Here we first load TotalView and then run tvconnect. Testing with Intel MPI has shown that:

  1. We need to use mpiexec, not mpirun (or esma_mpirun)
  2. We need to use the full path to mpiexec. We use which here to make this a bit more flexible (since your g5_modules might have a different Intel MPI version)
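Before editing, it can be worth confirming which mpiexec will be picked up after sourcing your g5_modules, since that full path is what tvconnect will receive:

```shell
# Print the full path to the mpiexec that tvconnect will use; it should
# point into your Intel MPI installation (exact path varies by module set).
which mpiexec
```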

Run gcm_run.j

Now, run gcm_run.j on your interactive session with:

    ./gcm_run.j

and if all works, the Remote Client should now have a popup saying a reverse request was found:

Reverse request found

Press "Yes".

When you do this, it will look like it is trying to debug mpiexec. For now, just press the green Go triangle button in the upper left:

Now Press Play

When that is pressed, you'll get a dialog saying that mpiexec was detected and is a parallel job, asking "Do you want to stop the job now?":

Detected mpiexec

Press "Yes". Now it should move on to GEOSgcm.x and the logger window will show that it is reading symbols:

Reading symbols

Then a popup will appear with it loading symbols:

Loading symbols

Once that is done, you can use TotalView as normal. For example, click on the "Lookup File or Function" tab and type in a file name. I chose GEOS_GwdGridComp.F90 (choose the .F90, not the .i90). Now you can set a breakpoint:

Setting Breakpoint

Once done, press the green "Go" play button and it should advance until it hits your breakpoint:

At the breakpoint
