Commit f0544e9

Bump Zeus to v0.12.1 with last touch
1 parent 87f1be5 commit f0544e9

3 files changed: 21 additions, 4 deletions

docs/measure/index.md

Lines changed: 18 additions & 2 deletions
@@ -11,6 +11,8 @@ Measuring power and energy is also very low overhead, typically taking less than

 ## Programmatic measurement

+### Time and energy consumption of a chunk of code
+
 [`ZeusMonitor`][zeus.monitor.ZeusMonitor] makes it very simple to measure the GPU time and energy consumption of arbitrary Python code blocks.

 A *measurement window* is defined by a code block wrapped with [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window].
@@ -60,15 +62,29 @@ if __name__ == "__main__":
 In general, energy optimizers measure the energy of the GPU through a [`ZeusMonitor`][zeus.monitor.ZeusMonitor] instance that is passed to their constructor.
 Thus, only the GPUs specified by `gpu_indices` will be the target of optimization.

+### Power consumption over time
+
+Apart from energy, you can also measure the power consumption of GPUs over time by directly using the [`PowerMonitor`][zeus.monitor.power.PowerMonitor].
+It measures power in three *power domains*:
+- **GPU average power**: Windowed average power consumption of the GPU over a one-second interval.
+- **GPU instantaneous power**: Instantaneous power consumption of the GPU at the time of the query.
+- **GPU memory average power** (Hopper or newer): Windowed average power consumption of the GPU's memory.
+
+!!! Important
+    Not all GPUs support all power domains, and this is not really documented well. You'll have to check on your GPU by instantiating [`PowerMonitor`][zeus.monitor.power.PowerMonitor], which will automatically detect supported power domains.
+
+When `PowerMonitor` is instantiated, it spawns separate processes that poll the device's power consumption API and collects deduplicated power samples in-memory.
+Then, you can call [`get_all_power_timelines`][zeus.monitor.power.PowerMonitor.get_all_power_timelines] or [`get_power_timeline`][zeus.monitor.power.PowerMonitor.get_power_timeline] for a specific power domain to retrieve the power samples collected either for the whole lifetime of the monitor, or for a specific time window.
+
 ### Synchronizing CPU and GPU computations

 Deep learning frameworks typically run actual computation on GPUs in an asynchronous fashion.
 That is, the CPU (Python interpreter) asynchronously dispatches computations to run on the GPU and moves on to dispatch the next computation without waiting for the GPU to finish.
 This helps GPUs achieve higher utilization with less idle time.

 Due to this asynchronous nature of Deep Learning frameworks, we need to be careful when we want to take time and energy measurements of GPU execution.
-We want *only and all of* the computations dispatched between `begin_window` and `end_window` to be captured by our time and energy measurement.
-That's what the `sync_execution_with` paramter in [`ZeusMonitor`][zeus.monitor.ZeusMonitor] and `sync_execution` paramter in [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window] are for.
+We want *only and all* the computations dispatched between `begin_window` and `end_window` to be captured by our time and energy measurement.
+That's what the `sync_execution_with` parameter in [`ZeusMonitor`][zeus.monitor.ZeusMonitor] and `sync_execution` paramter in [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window] are for.
 Depending on the Deep Learning framework you're using (currently PyTorch and JAX are supported), [`ZeusMonitor`][zeus.monitor.ZeusMonitor] will automatically synchronize CPU and GPU execution to make sure all and only the computations dispatched between the window are captured.

 !!! Tip
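
The sampling behavior described in the added "Power consumption over time" section (poll, keep only readings that changed, then query by time window) can be sketched in a few lines. This is an illustration under stated assumptions, not Zeus's implementation: `PowerTimeline`, `record`, and `window` are hypothetical names, deduplication is assumed to mean "drop a reading equal to the previous one", and the real `PowerMonitor` polls from separate processes rather than being fed samples by hand.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class PowerTimeline:
    """Hypothetical in-memory store of (timestamp, watts) samples for one GPU."""

    timestamps: list[float] = field(default_factory=list)
    watts: list[float] = field(default_factory=list)

    def record(self, timestamp: float, power_w: float) -> bool:
        """Store a sample unless it repeats the last stored reading.

        Returns True if the sample was stored, False if it was deduplicated.
        """
        if self.watts and self.watts[-1] == power_w:
            return False
        self.timestamps.append(timestamp)
        self.watts.append(power_w)
        return True

    def window(
        self, start: float | None = None, end: float | None = None
    ) -> list[tuple[float, float]]:
        """Return samples with start <= timestamp <= end; None means unbounded."""
        lo = 0 if start is None else bisect_left(self.timestamps, start)
        hi = len(self.timestamps) if end is None else bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.watts[lo:hi]))


timeline = PowerTimeline()
# Simulated polling results: consecutive polls often return the same value.
for t, p in [(0.0, 100.0), (0.1, 100.0), (0.2, 120.0), (0.3, 120.0), (0.4, 90.0)]:
    timeline.record(t, p)

whole_lifetime = timeline.window()    # every stored sample
recent = timeline.window(start=0.15)  # only samples at or after t = 0.15
```

Keeping timestamps sorted lets a window query use binary search instead of scanning every sample, which matters when a monitor runs for a long time.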

zeus/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -11,4 +11,4 @@
 - [`_legacy`][zeus._legacy.policy]: Legacy code mostly to keep our papers reproducible
 """

-__version__ = "0.12.0"
+__version__ = "0.12.1"

zeus/monitor/power.py

Lines changed: 2 additions & 1 deletion
@@ -239,8 +239,9 @@ def _determine_supported_domains(self) -> list[PowerDomain]:
 try:
     _ = method(0)
     supported.append(domain)
+    logger.info("Power domain %s is supported", domain.value)
 except ZeusGPUNotSupportedError:
-    pass
+    logger.info("Power domain %s is not supported", domain.value)
 except Exception as e:
     logger.warning(
         "Unexpected error while checking for %s support on GPU %d: %s",
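
The probe-and-log pattern in this hunk can be tried in isolation with stand-ins. The sketch below is not the Zeus code: `NotSupportedError`, `Domain`, `determine_supported`, and the `probes` mapping are hypothetical placeholders for `ZeusGPUNotSupportedError`, `PowerDomain`, `_determine_supported_domains`, and the per-domain GPU query methods, and the returned wattages are made up.

```python
from __future__ import annotations

import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("power_probe_sketch")


class NotSupportedError(Exception):
    """Stand-in for ZeusGPUNotSupportedError (hypothetical)."""


class Domain(Enum):
    """Stand-in for PowerDomain; the member values are made up."""

    AVERAGE = "average"
    INSTANT = "instant"
    MEMORY_AVERAGE = "memory_average"


def determine_supported(probes: dict[Domain, Callable[[int], float]]) -> list[Domain]:
    """Call each probe on GPU 0 and keep the domains whose probe does not raise."""
    supported = []
    for domain, method in probes.items():
        try:
            _ = method(0)
            supported.append(domain)
            logger.info("Power domain %s is supported", domain.value)
        except NotSupportedError:
            # Expected on some hardware, so log at info level instead of failing.
            logger.info("Power domain %s is not supported", domain.value)
        except Exception as e:
            logger.warning("Unexpected error while checking %s: %s", domain.value, e)
    return supported


def unsupported_probe(gpu_index: int) -> float:
    """Simulate a GPU that lacks this power counter."""
    raise NotSupportedError(f"GPU {gpu_index} lacks this counter")


probes = {
    Domain.AVERAGE: lambda gpu_index: 250.0,
    Domain.INSTANT: lambda gpu_index: 245.0,
    Domain.MEMORY_AVERAGE: unsupported_probe,
}
supported = determine_supported(probes)
```

Logging both outcomes at info level, as the commit does, makes the detection result visible to users without treating an unsupported domain as an error.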
