0.19.1

@un-def un-def released this 26 Mar 12:26
· 714 commits to master since this release
9d0b83f

Metrics

With this update, we've added more metrics that you can export to Prometheus. The new metrics allow tracking job CPU and system memory utilization, user and project usage stats, success/error rate, and more.

Runs

| Name | Type | Description | Examples |
|------|------|-------------|----------|
| dstack_run_count_total | counter | The total number of runs | 537 |
| dstack_run_count_terminated_total | counter | The number of terminated runs | 118 |
| dstack_run_count_failed_total | counter | The number of failed runs | 27 |
| dstack_run_count_done_total | counter | The number of successful runs | 218 |
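
As a rough illustration of how the run counters could feed a success/error rate, here is a minimal Python sketch. It assumes the counters have already been scraped in the Prometheus text exposition format; the `SAMPLE` text, the `parse_samples` and `run_rates` helpers, and the choice to sum values across label sets are all hypothetical and not part of dstack. The values are the example values listed for the run metrics.

```python
import re

# Hypothetical scrape output in the Prometheus text exposition format.
# Values are the example values from the release notes; a real scrape
# may attach labels to each sample.
SAMPLE = """\
# TYPE dstack_run_count_total counter
dstack_run_count_total 537
dstack_run_count_terminated_total 118
dstack_run_count_failed_total 27
dstack_run_count_done_total 218
"""

# metric name, optional {labels}, then the sample value.
# Simplification: label values containing "}" are not handled.
LINE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text):
    """Return {metric_name: value}, summing samples across label sets."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def run_rates(totals):
    """Return failed and done runs as fractions of all runs."""
    total = totals.get("dstack_run_count_total", 0.0)
    if total == 0:
        return {"failed": 0.0, "done": 0.0}
    return {
        "failed": totals.get("dstack_run_count_failed_total", 0.0) / total,
        "done": totals.get("dstack_run_count_done_total", 0.0) / total,
    }


rates = run_rates(parse_samples(SAMPLE))
print(f"failed: {rates['failed']:.1%}, done: {rates['done']:.1%}")
```

In a real setup the same ratios would more likely be computed in Prometheus itself over a time window; the sketch only shows the arithmetic on a single scrape.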

Run jobs

| Name | Type | Description | Examples |
|------|------|-------------|----------|
| dstack_job_cpu_count | gauge | Job CPU count | 32.0 |
| dstack_job_cpu_time_seconds_total | counter | Total CPU time consumed by the job, seconds | 11.727975 |
| dstack_job_memory_total_bytes | gauge | Total memory allocated for the job, bytes | 4009754624.0 |
| dstack_job_memory_usage_bytes | gauge | Memory used by the job (including cache), bytes | 339017728.0 |
| dstack_job_memory_working_set_bytes | gauge | Memory used by the job (not including cache), bytes | 147251200.0 |
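
The job metrics can be combined into utilization figures. The sketch below is one way to do that, under stated assumptions: CPU utilization is derived from two readings of the cumulative `dstack_job_cpu_time_seconds_total` counter divided by the elapsed wall-clock time and `dstack_job_cpu_count`, and memory utilization is the working set divided by the total. The function names, the 10-second interval, and the second CPU-time reading are hypothetical; only the metric names and the other sample values come from the list above.

```python
def cpu_utilization(cpu_time_start, cpu_time_end, interval_seconds, cpu_count):
    """Fraction of available CPU time used between two counter readings.

    cpu_time_start / cpu_time_end: two readings of
    dstack_job_cpu_time_seconds_total taken interval_seconds apart.
    cpu_count: the value of dstack_job_cpu_count.
    """
    if interval_seconds <= 0 or cpu_count <= 0:
        raise ValueError("interval_seconds and cpu_count must be positive")
    used = cpu_time_end - cpu_time_start
    if used < 0:
        # Assumption: a decreasing counter means it was reset, so only
        # the time accumulated since the reset is counted.
        used = cpu_time_end
    return used / (interval_seconds * cpu_count)


def memory_utilization(working_set_bytes, total_bytes):
    """Working set (cache excluded) as a fraction of allocated memory."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return working_set_bytes / total_bytes


# First reading and cpu_count are the example values from the release notes;
# the second reading and the interval are made up for illustration.
cpu = cpu_utilization(11.727975, 27.727975, interval_seconds=10.0, cpu_count=32.0)
mem = memory_utilization(147251200.0, 4009754624.0)
print(f"cpu: {cpu:.1%}, memory: {mem:.1%}")
```

Using the working-set gauge rather than `dstack_job_memory_usage_bytes` keeps page cache out of the ratio, which is usually closer to what a job actually needs; either choice is a judgment call.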

For more details on metrics, see the Metrics documentation.

Major bugfixes

Fixed a bug introduced in 0.19.0 where the container's default working directory was incorrectly set to / instead of /workflow.

What's changed

Full changelog: 0.19.0...0.19.1