Skip to content

Issues with Timer start and stop methods when using collect_performance: true config option #326

@fmalatino

Description

@fmalatino

Describe the bug
When setting the configuration option collect_performance: true for dace:cpu runs of more than one time step, the start and stop methods of the TImer class in ndsl/performance/timer.py throw the following exceptions:

2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:CubedToLatLon
Exception ignored on calling ctypes callback function: <function flatten_callback.<locals>.make_cb.<locals>.cb_func at 0x7f14739942c0>
Traceback (most recent call last):
  File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/dace/frontend/python/preprocessing.py", line 441, in cb_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 87, in __exit__
    self.timer.stop(name)
  File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 43, in stop
    self._accumulated_time[name] += time() - self._clock_starts.pop(name)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'mainloop'
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:Finished stepping 0
/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/collector.py:75: RuntimeWarning: Retrieved times while clocks are still going, incomplete times are not included: ['DynCore']
  self.times_per_step.append(self.timestep_timer.times)
/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/collector.py:76: RuntimeWarning: Retrieved hit counts while clocks are still going, incomplete times are not included: ['DynCore']
  self.hits_per_step.append(self.timestep_timer.hits)
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:diagnostics for step 2000-01-01 00:03:45 started
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:diagnostics for step 2000-01-01 00:03:45 finished
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:DynCore
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:TracerAdvection
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:DynCore
Exception ignored on calling ctypes callback function: <bound method Timer.clock.<locals>.Wrapper.__enter__ of <ndsl.performance.timer.Timer.clock.<locals>.Wrapper object at 0x7f14737072d0>>
Traceback (most recent call last):
  File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 83, in __enter__
    self.timer.start(name)
  File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 28, in start
    raise ValueError(f"clock already started for '{name}'")
ValueError: clock already started for 'DynCore'
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:TracerAdvection

To Reproduce
Run a configuration with backend: dace:cpu and collect_performance: true

Expected behavior
The timer should be able to handle that the clock has already started for the dynamical core, and that mainloop exists as a key for the _accumulated_time and _clock_starts dictionaries.

System Environment
Describe the system environment, include:

  • OS: SUSE Linux Enterprise Server 15 SP6
  • Backend used: dace:cpu
  • Environment variables set: FV3_DACEMODE=Build
  • Compiler(s): Type and version: gcc-native/12.3
  • MPI type, and version: cray-mpich-abi/8.1.28
  • netCDF Version: N/A
  • Configure options: collect_performance: true
  • If this bug came from a model run, which model: baroclinic_c12.yaml in Pace, with a 1x1 layout, and total runtime of 15 minutes

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions