-
Notifications
You must be signed in to change notification settings - Fork 15
Issues with Timer start and stop methods when using collect_performance: true config option #326
Copy link
Copy link
Open
Labels
BugSomething isn't workingSomething isn't working
Description
Describe the bug
When setting the configuration option collect_performance: true for dace:cpu runs of more than one time step, the start and stop methods of the TImer class in ndsl/performance/timer.py throw the following exceptions:
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:CubedToLatLon
Exception ignored on calling ctypes callback function: <function flatten_callback.<locals>.make_cb.<locals>.cb_func at 0x7f14739942c0>
Traceback (most recent call last):
File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/dace/frontend/python/preprocessing.py", line 441, in cb_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 87, in __exit__
self.timer.stop(name)
File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 43, in stop
self._accumulated_time[name] += time() - self._clock_starts.pop(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'mainloop'
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:Finished stepping 0
/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/collector.py:75: RuntimeWarning: Retrieved times while clocks are still going, incomplete times are not included: ['DynCore']
self.times_per_step.append(self.timestep_timer.times)
/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/collector.py:76: RuntimeWarning: Retrieved hit counts while clocks are still going, incomplete times are not included: ['DynCore']
self.hits_per_step.append(self.timestep_timer.hits)
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:diagnostics for step 2000-01-01 00:03:45 started
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:diagnostics for step 2000-01-01 00:03:45 finished
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:DynCore
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:TracerAdvection
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:DynCore
Exception ignored on calling ctypes callback function: <bound method Timer.clock.<locals>.Wrapper.__enter__ of <ndsl.performance.timer.Timer.clock.<locals>.Wrapper object at 0x7f14737072d0>>
Traceback (most recent call last):
File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 83, in __enter__
self.timer.start(name)
File "/ncrc/home2/Frank.Malatino/.conda/envs/throw2/lib/python3.11/site-packages/ndsl/performance/timer.py", line 28, in start
raise ValueError(f"clock already started for '{name}'")
ValueError: clock already started for 'DynCore'
2025-11-19 13:16:39|INFO|rank 0|ndsl.logging:TracerAdvection
To Reproduce
Run a configuration with backend: dace:cpu and collect_performance: true
Expected behavior
The timer should be able to handle that the clock has already started for the dynamical core, and that mainloop exists as a key for the _accumulated_time and _clock_starts dictionaries.
System Environment
Describe the system environment, include:
- OS: SUSE Linux Enterprise Server 15 SP6
- Backend used: dace:cpu
- Environment variables set: FV3_DACEMODE=Build
- Compiler(s): Type and version: gcc-native/12.3
- MPI type, and version: cray-mpich-abi/8.1.28
- netCDF Version: N/A
- Configure options: collect_performance: true
- If this bug came from a model run, which model: baroclinic_c12.yaml in Pace, with a 1x1 layout, and total runtime of 15 minutes
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
BugSomething isn't workingSomething isn't working