You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/measure/index.md
+18-2Lines changed: 18 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -11,6 +11,8 @@ Measuring power and energy is also very low overhead, typically taking less than
11
11
12
12
## Programmatic measurement
13
13
14
+
### Time and energy consumption of a chunk of code
15
+
14
16
[`ZeusMonitor`][zeus.monitor.ZeusMonitor] makes it very simple to measure the GPU time and energy consumption of arbitrary Python code blocks.
15
17
16
18
A *measurement window* is defined by a code block wrapped with [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window].
@@ -60,15 +62,29 @@ if __name__ == "__main__":
60
62
In general, energy optimizers measure the energy of the GPU through a [`ZeusMonitor`][zeus.monitor.ZeusMonitor] instance that is passed to their constructor.
61
63
Thus, only the GPUs specified by `gpu_indices` will be the target of optimization.
62
64
65
+
### Power consumption over time
66
+
67
+
Apart from energy, you can also measure the power consumption of GPUs over time by directly using the [`PowerMonitor`][zeus.monitor.power.PowerMonitor].
68
+
It measures power in three *power domains*:
69
+
-**GPU average power**: Windowed average power consumption of the GPU over a one-second interval.
70
+
-**GPU instantaneous power**: Instantaneous power consumption of the GPU at the time of the query.
71
+
-**GPU memory average power** (Hopper or newer): Windowed average power consumption of the GPU's memory.
72
+
73
+
!!! Important
74
+
Not all GPUs support all power domains, and this is not really documented well. You'll have to check on your GPU by instantiating [`PowerMonitor`][zeus.monitor.power.PowerMonitor], which will automatically detect supported power domains.
75
+
76
+
When `PowerMonitor` is instantiated, it spawns separate processes that poll the device's power consumption API and collects deduplicated power samples in-memory.
77
+
Then, you can call [`get_all_power_timelines`][zeus.monitor.power.PowerMonitor.get_all_power_timelines] or [`get_power_timeline`][zeus.monitor.power.PowerMonitor.get_power_timeline] for a specific power domain to retrieve the power samples collected either for the whole lifetime of the monitor, or for a specific time window.
78
+
63
79
### Synchronizing CPU and GPU computations
64
80
65
81
Deep learning frameworks typically run actual computation on GPUs in an asynchronous fashion.
66
82
That is, the CPU (Python interpreter) asynchronously dispatches computations to run on the GPU and moves on to dispatch the next computation without waiting for the GPU to finish.
67
83
This helps GPUs achieve higher utilization with less idle time.
68
84
69
85
Due to this asynchronous nature of Deep Learning frameworks, we need to be careful when we want to take time and energy measurements of GPU execution.
70
-
We want *only and all of* the computations dispatched between `begin_window` and `end_window` to be captured by our time and energy measurement.
71
-
That's what the `sync_execution_with`paramter in [`ZeusMonitor`][zeus.monitor.ZeusMonitor] and `sync_execution` paramter in [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window] are for.
86
+
We want *only and all* the computations dispatched between `begin_window` and `end_window` to be captured by our time and energy measurement.
87
+
That's what the `sync_execution_with`parameter in [`ZeusMonitor`][zeus.monitor.ZeusMonitor] and `sync_execution` paramter in [`begin_window`][zeus.monitor.ZeusMonitor.begin_window] and [`end_window`][zeus.monitor.ZeusMonitor.end_window] are for.
72
88
Depending on the Deep Learning framework you're using (currently PyTorch and JAX are supported), [`ZeusMonitor`][zeus.monitor.ZeusMonitor] will automatically synchronize CPU and GPU execution to make sure all and only the computations dispatched between the window are captured.
0 commit comments