[Blog] Built-in UI for monitoring basic GPU metrics (#2470)

peterschmidt85 · web-flow · commit 0e50bcf0e226 · 2025-04-03T16:26:35.000+02:00
diff --git a/docs/blog/posts/dstack-metrics.md b/docs/blog/posts/dstack-metrics.md
@@ -1,16 +1,16 @@
 ---
-title: "Monitoring basic GPU metrics via dstack stats"
+title: "Monitoring basic GPU metrics via CLI"
 date: 2024-10-22
 description: "dstack introduces a new CLI command (and API) for monitoring container metrics, incl. GPU usage for NVIDIA, AMD, and other accelerators."  
-slug: dstack-stats
+slug: dstack-metrics
 image: https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-stats-v2.png?raw=true
 categories:
   - AMD
   - NVIDIA
   - Monitoring
 ---
 
-# Monitoring basic GPU metrics via dstack stats
+# Monitoring basic GPU metrics via CLI
 
 ## How it works { style="display:none"}
 
@@ -22,6 +22,8 @@ for monitoring container metrics, including GPU usage for `NVIDIA`, `AMD`, and o
 
 <!-- more -->
 
+> Note, the `dstack stats` command has been renamed to `dstack metrics`. The old name is still supported but deprecated.
+
 The command is similar to `kubectl top` (in terms of semantics) and `docker stats` (in terms of the CLI interface). The key
 difference is that `dstack stats` includes GPU VRAM usage and GPU utilization percentage. 
 
diff --git a/docs/blog/posts/metrics-ui.md b/docs/blog/posts/metrics-ui.md
@@ -0,0 +1,60 @@
+---
+title: "Built-in UI for monitoring basic GPU metrics"
+date: 2025-04-03
+description: "TBA"
+slug: metrics-ui
+image: https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-v2-min.png?raw=true
+categories:
+  - Monitoring
+  - AMD
+  - NVIDIA
+---
+
+# Built-in UI for monitoring basic GPU metrics
+
+AI workloads generate vast amounts of metrics, so efficient monitoring tools are essential. While our recent
+update introduced the ability to export available metrics to Prometheus for maximum flexibility, there are times when
+users need quick access to essential metrics without switching to an external tool.
+
+<img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-v2-min.png?raw=true" width="630"/>
+
+Previously, we introduced a [CLI command](dstack-metrics.md) that allows users to view basic GPU metrics for both NVIDIA
+and AMD hardware. Now, with this latest update, we’re excited to announce the addition of a built-in dashboard within
+the `dstack` control plane.
+
+<!-- more -->
+
+The new feature provides an easy-to-use interface for tracking the most essential GPU metrics
+directly from the control plane, so real-time monitoring requires no additional tools.
+
+<img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-metrics-ui-dashboard.png?raw=true" width="800">
+
+Additionally, we’ve renamed the CLI command previously known as `dstack stats` to `dstack metrics` for consistency.
+
+<div class="termy">
+
+```shell
+$ dstack metrics nccl-tests -w
+ NAME        CPU  MEMORY            GPU
+ nccl-tests  81%  2754MB/1638400MB  #0 100740MB/144384MB 100% Util
+                                    #1 100740MB/144384MB 100% Util
+                                    #2 100740MB/144384MB 99% Util
+                                    #3 100740MB/144384MB 99% Util
+                                    #4 100740MB/144384MB 99% Util
+                                    #5 100740MB/144384MB 99% Util
+                                    #6 100740MB/144384MB 99% Util
+                                    #7 100740MB/144384MB 100% Util
+```
+
+</div>
+
+By default, both the control plane and CLI show metrics from the last hour, which is particularly useful for debugging
+workloads. 
+
+For persistent storage and long-term access to metrics, we still recommend setting up Prometheus to fetch
+metrics from `dstack`.
+
+!!! info "What's next?"
+    1. See the [Monitoring](../../docs/guides/monitoring.md) guide
+    2. Check [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)
+    3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}
diff --git a/docs/blog/posts/prometheus.md b/docs/blog/posts/prometheus.md
@@ -17,7 +17,7 @@ Effective AI infrastructure management requires full visibility into compute per
 detailed insights into container- and GPU-level performance, while managers rely on cost metrics to track resource usage
 across projects.
 
-While `dstack` provides key metrics through its UI and [`dstack metrics`](dstack-stats.md) CLI, teams often need more granular data and prefer
+While `dstack` provides key metrics through its UI and [`dstack metrics`](dstack-metrics.md) CLI, teams often need more granular data and prefer
 using their own monitoring tools. To support this, we’ve introduced a new endpoint that allows real-time exporting all collected
 metrics—covering fleets and runs—directly to Prometheus.
 
@@ -57,7 +57,7 @@ For a full list of available metrics and labels, check out the [Monitoring](../.
 
 ??? info "AMD"
     AMD device metrics are not yet collected for any backends. This support will be available soon. For now, AMD metrics are
-    only accessible through the UI and the [`dstack metrics`](dstack-stats.md) CLI.
+    only accessible through the UI and the [`dstack metrics`](dstack-metrics.md) CLI.
 
 !!! info "What's next?"
     1. See the [Monitoring](../../docs/guides/monitoring.md) guide
diff --git a/docs/docs/guides/protips.md b/docs/docs/guides/protips.md
@@ -312,7 +312,7 @@ The GPU vendor is indicated by one of the following case-insensitive values:
 
 While `dstack` allows the use of any third-party monitoring tools (e.g., Weights and Biases), you can also
 monitor container metrics such as CPU, memory, and GPU usage using the [built-in
-`dstack metrics` CLI command](../../blog/posts/dstack-stats.md) or the corresponding API.
+`dstack metrics` CLI command](../../blog/posts/dstack-metrics.md) or the corresponding API.
 
 ## Service quotas
 
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -127,10 +127,11 @@ plugins:
         'backends.md': 'partners.md'
         'developers.md': 'community.md'
         'blog/ambassador-program.md': 'blog/archive/ambassador-program.md'
-        'blog/monitoring-gpu-usage.md': 'blog/posts/dstack-stats.md'
+        'blog/monitoring-gpu-usage.md': 'blog/posts/dstack-metrics.md'
         'blog/inactive-dev-environments-auto-shutdown.md': 'blog/posts/inactivity-duration.md'
         'blog/data-centers-and-private-clouds.md': 'blog/posts/gpu-blocks-and-proxy-jump.md'
         'blog/distributed-training-with-aws-efa.md': 'blog/posts/efa.md'
+        'blog/dstack-stats.md': 'blog/posts/dstack-metrics.md'
   - typeset
   - gen-files:
       scripts:  # always relative to mkdocs.yml