Consider an optional flag to collect and dump GPU metrics #265

@jsign

It might be useful to add a flag, e.g. --collect-metrics, that dumps hardware metrics to one (or more) files during the proving process.

The most obvious candidates are GPU metrics. Probably the easiest way to collect them is to leverage nvidia-smi, which already supports looping (watch) flags, output formats, etc.

Example:

$ nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu --format=csv -l 1 > gpu_log.csv

Tailing the resulting .csv file shows:

2026/01/06 15:13:03.389, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 49
2026/01/06 15:13:03.389, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 48
2026/01/06 15:13:04.389, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 41
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 47
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 48
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 49
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 47
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 47
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 49
2026/01/06 15:13:04.390, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 48
2026/01/06 15:13:05.391, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 41
2026/01/06 15:13:05.391, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 47
2026/01/06 15:13:05.391, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 48
2026/01/06 15:13:05.391, NVIDIA GeForce RTX 5090, 0 %, 0 %, 2 MiB, 32607 MiB, 48
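A minimal sketch of what --collect-metrics could do under the hood, assuming nvidia-smi is on PATH: start the same query in the background, run the prover, then stop the sampler once proving exits. The function name run_with_gpu_metrics is hypothetical and not an existing ere API.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not ere's actual implementation: sample GPU metrics
# with nvidia-smi while a proving command runs, then stop the sampler.

GPU_QUERY="timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu"

# Usage: run_with_gpu_metrics <csv-file> <command> [args...]
run_with_gpu_metrics() {
  local out_file="$1"
  shift

  # Start the sampler in the background (one sample per second, CSV output).
  nvidia-smi --query-gpu="$GPU_QUERY" --format=csv -l 1 > "$out_file" &
  local smi_pid=$!

  # Run the proving command and remember its exit status.
  local rc=0
  "$@" || rc=$?

  # Stop the sampler and reap it; its own exit status is irrelevant here.
  kill "$smi_pid" 2>/dev/null || true
  wait "$smi_pid" 2>/dev/null || true

  # Propagate the prover's exit status to the caller.
  return "$rc"
}
```

For example, run_with_gpu_metrics gpu_log.csv followed by the proving command. Tying the sampler's lifetime to the prover's means the log covers exactly the proving window; an in-process implementation would presumably do the same by spawning nvidia-smi as a child process.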

Some considerations:

  • We could name the file after the zkVM and the start and end timestamps of the run.
  • Decide which --query-gpu fields make sense to collect, and maybe support an ERE_GPU_METRICS environment variable so users can customize them.
  • I'm not sure about CSV as the default format, but it sounds reasonable.
  • Apparently there is also an -lms flag to set the sampling interval in milliseconds, if that's useful; not sure.

All points above are thinking out loud.
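The naming and customization considerations above could look roughly like the helpers below. The file-name layout, the helper names, and treating ERE_GPU_METRICS as a comma-separated override of the --query-gpu fields are all assumptions for illustration; sp1 in the test is just an example zkVM name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names, file layout, and ERE_GPU_METRICS semantics
# are illustrative assumptions, not an existing ere interface.

DEFAULT_GPU_QUERY="timestamp,name,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu"

# Fields for --query-gpu: ERE_GPU_METRICS if set and non-empty, else defaults.
gpu_metrics_query() {
  printf '%s\n' "${ERE_GPU_METRICS:-$DEFAULT_GPU_QUERY}"
}

# Compact UTC timestamp that is safe to embed in a file name.
now_utc() {
  date -u +%Y%m%dT%H%M%SZ
}

# Usage: gpu_metrics_filename <zkvm> <start> <end>
gpu_metrics_filename() {
  printf 'gpu_metrics_%s_%s_%s.csv\n' "$1" "$2" "$3"
}

# The end timestamp is only known after proving, so log to a temporary
# file first and rename it here; prints the final file name.
# Usage: finalize_gpu_log <tmp-file> <zkvm> <start>
finalize_gpu_log() {
  local final
  final="$(gpu_metrics_filename "$2" "$3" "$(now_utc)")"
  mv "$1" "$final" && printf '%s\n' "$final"
}
```

The sampler would then be started with --query-gpu="$(gpu_metrics_query)"; swapping -l 1 for -lms with a millisecond value would give sub-second sampling if that turns out to be useful.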

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
