Commit fb2b53c

[Docs] Described CLI and UI in Metrics; updated Protips.
1 parent 653d746 commit fb2b53c

2 files changed: +39 -5 lines changed

docs/docs/guides/metrics.md

Lines changed: 38 additions & 2 deletions
@@ -1,11 +1,47 @@
 # Metrics
 
+`dstack` automatically tracks essential metrics, which you can access via the CLI and UI.
+You can also configure the `dstack` server to export metrics to Prometheus—this is required to access advanced metrics such as those from DCGM.
+
+## UI
+
+To access metrics via the UI, open the page of the corresponding run or job and switch to the `Metrics` tab:
+
+![](https://dstack.ai/static-assets/static-assets/images/dstack-newsletter-metrics.png){ width=800 }
+
+This tab displays key CPU, memory, and GPU metrics collected during the last hour of the run or job.
+
+## CLI
+
+As an alternative to the UI, you can track real-time essential metrics via the CLI.
+The `dstack metrics` command displays the most recently tracked CPU, memory, and GPU metrics.
+
+<div class="termy">
+
+```shell
+dstack metrics gentle-mayfly-1
+
+NAME             STATUS  CPU  MEMORY          GPU
+gentle-mayfly-1  done    0%   16.27GB/2000GB  gpu=0 mem=72.48GB/80GB util=0%
+                                              gpu=1 mem=64.99GB/80GB util=0%
+                                              gpu=2 mem=580MB/80GB util=0%
+                                              gpu=3 mem=4MB/80GB util=0%
+                                              gpu=4 mem=4MB/80GB util=0%
+                                              gpu=5 mem=4MB/80GB util=0%
+                                              gpu=6 mem=4MB/80GB util=0%
+                                              gpu=7 mem=292MB/80GB util=0%
+```
+
+</div>
+
 ## Prometheus
 
-To collect and export fleet and run metrics to Prometheus, enable the
-`DSTACK_ENABLE_PROMETHEUS_METRICS` environment variable and configure Prometheus to fetch metrics from
+To enable exporting metrics to Prometheus, set the
+`DSTACK_ENABLE_PROMETHEUS_METRICS` environment variable and configure Prometheus to scrape metrics from
 `<dstack server URL>/metrics`.
 
+In addition to the essential metrics available via the CLI and UI, `dstack` exports additional metrics to Prometheus, including data on fleets, runs, jobs, and DCGM metrics.
+
 ??? info "NVIDIA DCGM"
     NVIDIA DCGM metrics are automatically collected for `aws`, `azure`, `gcp`, and `oci` backends,
     as well as for [SSH fleets](../concepts/fleets.md#ssh).
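
Note on the Prometheus section above: one way to sanity-check the `<dstack server URL>/metrics` endpoint before pointing Prometheus at it is to parse its response by hand. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. The metric names and the `SAMPLE` payload are hypothetical and were invented for illustration; they are not taken from `dstack`. The parser is also simplified and does not handle label values that contain `}`.

```python
import re

# One sample line of the Prometheus text exposition format looks like:
#   metric_name{label="value",...} 1.23
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical payload: these metric names are NOT real dstack metrics.
SAMPLE = """\
# HELP example_gpu_util_ratio Hypothetical gauge.
# TYPE example_gpu_util_ratio gauge
example_gpu_util_ratio{run="gentle-mayfly-1",gpu="0"} 0.0
example_runs_total 3
"""

if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels, value)
```

In practice the text passed to `parse_samples` would come from an HTTP GET against `<dstack server URL>/metrics` (for example via `urllib.request.urlopen`) on a server started with `DSTACK_ENABLE_PROMETHEUS_METRICS` set.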

docs/docs/guides/protips.md

Lines changed: 1 addition & 3 deletions
@@ -422,9 +422,7 @@ Getting offers...
 
 ## Metrics
 
-While `dstack` allows the use of any third-party monitoring tools (e.g., Weights and Biases), you can also
-monitor container metrics such as CPU, memory, and GPU usage using the [built-in
-`dstack metrics` CLI command](../../blog/posts/dstack-metrics.md) or the corresponding API.
+`dstack` tracks essential metrics accessible via the CLI and UI. To access advanced metrics like DCGM, configure the server to export metrics to Prometheus. See [Metrics](metrics.md) for details.
 
 ## Service quotas
 

0 commit comments