| 1 | +--- |
| 2 | +title: "Built-in UI for monitoring basic GPU metrics" |
| 3 | +date: 2025-04-03 |
| 4 | +description: "TBA" |
| 5 | +slug: metrics-ui |
| 6 | +image: https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-v2-min.png?raw=true |
| 7 | +categories: |
| 8 | + - Monitoring |
| 9 | + - AMD |
| 10 | + - NVIDIA |
| 11 | +--- |
| 12 | + |
| 13 | +# Built-in UI for monitoring basic GPU metrics |
| 14 | + |
| 15 | +AI workloads generate large volumes of metrics, so efficient monitoring tools are essential. Our recent |
| 16 | +update added the ability to export available metrics to Prometheus for maximum flexibility, but sometimes |
| 17 | +users need quick access to essential metrics without switching to an external tool. |
| 18 | + |
| 19 | +<img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-v2-min.png?raw=true" width="630"/> |
| 20 | + |
| 21 | +Previously, we introduced a [CLI command](dstack-metrics.md) for viewing basic GPU metrics on both NVIDIA |
| 22 | +and AMD hardware. With this update, we're adding a built-in dashboard to |
| 23 | +the `dstack` control plane. |
| 24 | + |
| 25 | +<!-- more --> |
| 26 | + |
| 27 | +The dashboard provides an easy-to-use interface for tracking the most essential GPU metrics |
| 28 | +directly from the control plane, so you can monitor workloads in real time without any additional tools. |
| 29 | + |
| 30 | +<img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-dashboard.png?raw=true" width="800"> |
| 31 | + |
| 32 | +We've also renamed the `dstack stats` CLI command to `dstack metrics` for consistency. |
| 33 | + |
| 34 | +<div class="termy"> |
| 35 | + |
| 36 | +```shell |
| 37 | +$ dstack metrics nccl-tests -w |
| 38 | + NAME CPU MEMORY GPU |
| 39 | + nccl-tests 81% 2754MB/1638400MB #0 100740MB/144384MB 100% Util |
| 40 | + #1 100740MB/144384MB 100% Util |
| 41 | + #2 100740MB/144384MB 99% Util |
| 42 | + #3 100740MB/144384MB 99% Util |
| 43 | + #4 100740MB/144384MB 99% Util |
| 44 | + #5 100740MB/144384MB 99% Util |
| 45 | + #6 100740MB/144384MB 99% Util |
| 46 | + #7 100740MB/144384MB 100% Util |
| 47 | +``` |
| 48 | + |
| 49 | +</div> |
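
For quick scripted checks, the text printed by `dstack metrics` can be post-processed. The following is a minimal sketch, not part of `dstack`: it assumes each GPU row has the shape `#<index> <used>MB/<total>MB <percent>% Util`, as in the sample output above, and that column spacing may vary. The helper names `parse_gpu_rows` and `underutilized` are hypothetical, and the sample is shortened to three GPU rows.

```python
import re

# Sample based on the `dstack metrics` output above (first three GPU rows only;
# exact column spacing is an assumption).
SAMPLE = """\
 NAME        CPU  MEMORY            GPU
 nccl-tests  81%  2754MB/1638400MB  #0 100740MB/144384MB 100% Util
                                    #1 100740MB/144384MB 100% Util
                                    #2 100740MB/144384MB 99% Util
"""

# Assumed row shape: "#<index> <used>MB/<total>MB <percent>% Util".
GPU_ROW = re.compile(r"#(\d+)\s+(\d+)MB/(\d+)MB\s+(\d+)%\s+Util")


def parse_gpu_rows(text):
    """Return one dict per GPU row found in the CLI output (hypothetical helper)."""
    rows = []
    for match in GPU_ROW.finditer(text):
        index, used, total, util = map(int, match.groups())
        rows.append(
            {"gpu": index, "used_mb": used, "total_mb": total, "util_pct": util}
        )
    return rows


def underutilized(rows, threshold_pct=100):
    """Return indices of GPUs whose utilization is below the threshold."""
    return [row["gpu"] for row in rows if row["util_pct"] < threshold_pct]


rows = parse_gpu_rows(SAMPLE)
print(underutilized(rows))
```

Because the sketch depends on the printed layout, it may break if the CLI output format changes; for anything long-lived, exporting metrics to Prometheus is the more robust route.
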
| 50 | + |
| 51 | +By default, both the control plane and CLI show metrics from the last hour, which is particularly useful for debugging |
| 52 | +workloads. |
| 53 | + |
| 54 | +For persistent storage and long-term access to metrics, we still recommend setting up Prometheus to fetch |
| 55 | +metrics from `dstack`. |
| 56 | + |
| 57 | +!!! info "What's next?" |
| 58 | + 1. See the [Monitoring](../../docs/guides/monitoring.md) guide |
| 59 | + 2. Check [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md) |
| 60 | + 3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"} |