Release 0.19.1 · dstackai/dstack

Metrics

With this update, we've added more metrics that you can export to Prometheus. The new metrics let you track job CPU and system memory utilization, user and project usage statistics, run success and error rates, and more.
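
Prometheus collects these metrics in its text exposition format, where each sample is a line of the form `name{labels} value`. The following minimal Python sketch parses that text into `(name, labels, value)` tuples so you can inspect a scrape by hand. The sample text is hypothetical: the metric names come from the tables in these notes, but the label names `project_name` and `run_name` are assumptions rather than documented dstack labels.

```python
import re

# Hypothetical scrape output. Metric names are from these release notes;
# the label names (project_name, run_name) are assumptions.
SAMPLE = """\
# HELP dstack_run_count_total The total number of runs
# TYPE dstack_run_count_total counter
dstack_run_count_total{project_name="main"} 537
dstack_job_cpu_count{project_name="main",run_name="train"} 32.0
"""

# Metric name, optional {labels} block, then the value (timestamps are ignored).
# Minimal sketch: label values containing "}" are not supported.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{([^}]*)\})?\s+(\S+)')
# key="value" pairs; escape sequences inside values are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse Prometheus text exposition format into (name, labels, value) samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # ignore lines this minimal parser does not understand
        name, _, raw_labels, raw_value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() accepts "NaN", "+Inf"
    return samples


for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```

A real Prometheus server does this parsing for you. The sketch only shows what the exported data looks like on the wire.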

Runs

| Name | Type | Description | Example |
|------|------|-------------|---------|
| `dstack_run_count_total` | counter | The total number of runs | `537` |
| `dstack_run_count_terminated_total` | counter | The number of terminated runs | `118` |
| `dstack_run_count_failed_total` | counter | The number of failed runs | `27` |
| `dstack_run_count_done_total` | counter | The number of successful runs | `218` |
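
The run counters are cumulative, so ratios between them give outcome rates over the counters' lifetime. The minimal Python sketch below computes each outcome's share of all runs, plus a success rate. It assumes you have already read the four counter values from a scrape. The helper names are hypothetical. Excluding terminated runs from the success-rate denominator is also an assumption, since these notes don't define how success/error rate is computed.

```python
def run_outcome_shares(total: float, done: float, failed: float,
                       terminated: float) -> dict[str, float]:
    """Share of all runs that ended done, failed, or terminated."""
    if total <= 0:
        return {"done": 0.0, "failed": 0.0, "terminated": 0.0}
    return {
        "done": done / total,
        "failed": failed / total,
        "terminated": terminated / total,
    }


def success_rate(done: float, failed: float) -> float:
    """Successful runs as a share of runs that either succeeded or failed.

    Assumption: terminated runs are excluded, i.e. counted as neither
    a success nor an error.
    """
    finished = done + failed
    return done / finished if finished > 0 else 0.0


# Example counter values from the table above, used only as inputs
shares = run_outcome_shares(total=537, done=218, failed=27, terminated=118)
print({k: round(v, 3) for k, v in shares.items()})
print(round(success_rate(done=218, failed=27), 3))
```

To get rates over a time window instead of the whole lifetime, apply the same helpers to the differences between two readings of each counter. This is the quantity PromQL's `increase()` function computes.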

Run jobs

| Name | Type | Description | Example |
|------|------|-------------|---------|
| `dstack_job_cpu_count` | gauge | Job CPU count | `32.0` |
| `dstack_job_cpu_time_seconds_total` | counter | Total CPU time consumed by the job (seconds) | `11.727975` |
| `dstack_job_memory_total_bytes` | gauge | Total memory allocated for the job (bytes) | `4009754624.0` |
| `dstack_job_memory_usage_bytes` | gauge | Memory used by the job, including cache (bytes) | `339017728.0` |
| `dstack_job_memory_working_set_bytes` | gauge | Memory used by the job, excluding cache (bytes) | `147251200.0` |
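
`dstack_job_cpu_time_seconds_total` is a cumulative counter. CPU utilization therefore comes from its increase between two readings, divided by the elapsed time multiplied by `dstack_job_cpu_count`. Memory utilization can be taken as `dstack_job_memory_working_set_bytes` over `dstack_job_memory_total_bytes`. The helpers below are a minimal, hypothetical Python sketch of that arithmetic. The second CPU-time reading in the usage lines is made up for illustration, and the counter-reset handling follows the same convention as Prometheus's `rate()`.

```python
def cpu_utilization(cpu_time_start: float, cpu_time_end: float,
                    interval_seconds: float, cpu_count: float) -> float:
    """Average fraction of the job's CPUs that were busy between two readings.

    cpu_time_* are readings of dstack_job_cpu_time_seconds_total;
    cpu_count is dstack_job_cpu_count.
    """
    if interval_seconds <= 0 or cpu_count <= 0:
        raise ValueError("interval_seconds and cpu_count must be positive")
    delta = cpu_time_end - cpu_time_start
    if delta < 0:
        # Counter reset (e.g. the job restarted): count the new value as the increase
        delta = cpu_time_end
    return delta / (interval_seconds * cpu_count)


def memory_utilization(working_set_bytes: float, total_bytes: float) -> float:
    """Fraction of the job's allocated memory in use, excluding cache."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return working_set_bytes / total_bytes


u = cpu_utilization(11.727975, 71.727975, 15.0, 32.0)  # hypothetical readings 15 s apart
m = memory_utilization(147251200.0, 4009754624.0)       # example values from the table
print(f"CPU {u:.1%}, memory {m:.1%}")  # → CPU 12.5%, memory 3.7%
```

Using the working set rather than `dstack_job_memory_usage_bytes` avoids counting page cache, which the kernel can reclaim.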

For more details, see the Metrics documentation.

Major bugfixes

Fixed a bug introduced in 0.19.0 that set the container's default working directory to `/` instead of `/workflow`.

What's changed

Fix trying fleet instance offers by @jvstme in #2443
Add job system metrics, run metrics by @un-def in #2445
Fix default working dir in containers by @jvstme in #2449
[Examples] Update nccl-tests by @un-def in #2451

Full changelog: 0.19.0...0.19.1
