Commit b2b38de
Collect GPU usage metrics with prometheus
We already use the [Prometheus node exporter](https://github.com/prometheus/node_exporter),
deployed as part of our Prometheus chart, to collect CPU and memory
usage metrics.
This commit deploys NVIDIA's [dcgm-exporter](https://github.com/NVIDIA/dcgm-exporter),
which collects GPU usage metrics.
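As a rough illustration of what this data makes possible, here is a minimal sketch that asks Prometheus's HTTP query API (`/api/v1/query`) for average GPU utilization per GPU over the last hour. It assumes dcgm-exporter's default `DCGM_FI_DEV_GPU_UTIL` counter (utilization in percent) is enabled and that each series carries a `UUID` label. The Prometheus URL and the helper names (`query_url`, `parse_vector`, `fetch_gpu_utilization`) are hypothetical and not part of this commit.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: dcgm-exporter exposes DCGM_FI_DEV_GPU_UTIL (0-100 %) with a UUID label.
GPU_UTIL_QUERY = "avg by (UUID) (avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]))"


def query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response: dict) -> dict[str, float]:
    """Map each series' UUID label to its value from an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")
    result: dict[str, float] = {}
    for sample in data["result"]:
        uuid = sample["metric"].get("UUID", "unknown")
        _timestamp, value = sample["value"]  # Prometheus encodes sample values as strings
        result[uuid] = float(value)
    return result


def fetch_gpu_utilization(prometheus_base: str, timeout: float = 10.0) -> dict[str, float]:
    """Query a (hypothetical) Prometheus server and return per-GPU average utilization."""
    with urlopen(query_url(prometheus_base, GPU_UTIL_QUERY), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `fetch_gpu_utilization("http://prometheus-server:9090")` would return something like `{"GPU-...": 37.2}`; GPUs sitting near zero for long periods are the ones worth raising with users when looking at cost.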
As we work towards better cost and usage monitoring,
collecting this information should help users get more
bang for the buck from their GPUs. Because metrics are only collected
from the moment an exporter is deployed, deploying it now starts
accumulating historical data, even though nothing is directly visible to end users yet.
Works towards https://2i2c.productboard.com/entity-detail/features/30046512,
initially requested as part of https://2i2c.freshdesk.com/a/tickets/2545.

1 parent b68995e, commit b2b38de
3 files changed in helm-charts/support (+23, -0)
Added lines: 62 to 67 (6 lines) in the first file, 45 to 47 (3 lines) in the second, and 497 to 510 (14 lines) in the third.