The plan is to implement a benchmarking tool which automatically runs a suite of "Zarr workloads" across a range of compute platforms, storage media, chunk sizes, and Zarr implementations.
What would we like to measure for each workload?
Existing benchmarking tools only measure the runtime of each workload. That doesn't feel sufficient for Zarr because one of our main questions during benchmarking is whether the Zarr implementation is able to saturate the IO subsystem, and how much CPU and RAM is required to saturate the IO.
I'd propose measuring these parameters each time a workload is run (see the sketch after this list):
- Total execution time of the workload
- Total bytes read / written for disk / network
- Total IO operations
- Total bytes in final numpy array
- Average CPU utilization (per CPU)
- Max RAM usage during the execution of the workload
- CPU cache hit ratio
(Each run would also capture a bunch of metadata about the environment such as the compute environment, storage media, chunk sizes, Zarr implementation name and version, etc.)
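To make the proposal above a bit more concrete, here's a minimal sketch of how most of these totals could be captured with psutil around a single workload run. The function name `run_workload_with_metrics` is hypothetical, and it assumes the workload is a callable returning a numpy array. Note that `psutil.disk_io_counters()` and `psutil.net_io_counters()` are system-wide (fine on a dedicated benchmark machine, noisy otherwise), that peak RAM usage would need sampling rather than a before/after diff, and that CPU cache hit ratio isn't available from psutil at all (it would need something like `perf stat` on Linux), so those two are only stubbed out here:

```python
import time

import numpy as np
import psutil


def run_workload_with_metrics(workload, *args, **kwargs):
    """Run one workload and return (result, metrics dict).

    Hypothetical sketch: assumes `workload` is a callable returning a
    numpy array. Disk/network counters are system-wide. CPU cache hit
    ratio is not measurable via psutil and is omitted.
    """
    proc = psutil.Process()
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    psutil.cpu_percent(percpu=True)  # prime per-CPU counters
    rss_before = proc.memory_info().rss

    t0 = time.perf_counter()
    result = workload(*args, **kwargs)
    elapsed = time.perf_counter() - t0

    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()
    metrics = {
        "total_secs": elapsed,
        "disk_read_bytes": disk_after.read_bytes - disk_before.read_bytes,
        "disk_write_bytes": disk_after.write_bytes - disk_before.write_bytes,
        "disk_io_ops": (
            (disk_after.read_count + disk_after.write_count)
            - (disk_before.read_count + disk_before.write_count)
        ),
        "net_recv_bytes": net_after.bytes_recv - net_before.bytes_recv,
        "net_sent_bytes": net_after.bytes_sent - net_before.bytes_sent,
        "result_nbytes": result.nbytes if isinstance(result, np.ndarray) else None,
        # Average utilization per CPU over the run (since the priming call).
        "avg_cpu_percent_per_cpu": psutil.cpu_percent(percpu=True),
        # Max RAM usage during the run would need sampling (see the trace
        # sketch further down); this only captures RSS growth across the run.
        "rss_growth_bytes": proc.memory_info().rss - rss_before,
    }
    return result, metrics
```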
I had previously gotten over-excited and started thinking about capturing a full "trace" during the execution of each workload, e.g. capturing a timeseries of the IO utilization every 100 milliseconds. This might be useful, but it makes the benchmarking code rather more complex, and maybe doesn't tell us much more than the per-workload totals do. Some benchmark workloads might also run for less than 100 ms. And psutil's documentation states that some of its counters aren't reliable when polled more frequently than 10 times a second.
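For comparison, a trace would probably mean something like the background sampler below. This is only a sketch to illustrate the extra moving parts; the `ResourceTrace` name is hypothetical, the polling interval is clamped to psutil's suggested 0.1 s minimum, and a workload shorter than the interval would produce one sample or none, which is part of the argument against tracing:

```python
import threading
import time

import psutil


class ResourceTrace(threading.Thread):
    """Background sampler for a per-workload resource trace (sketch).

    Polls psutil every `interval` seconds (kept >= 0.1 s, per psutil's
    caution about fast polling) and records a timeseries of per-CPU
    utilization, process RSS, and cumulative system-wide disk IO.
    """

    def __init__(self, interval: float = 0.1):
        super().__init__(daemon=True)
        self.interval = max(interval, 0.1)
        self.samples = []
        self._stop_event = threading.Event()
        self._proc = psutil.Process()

    def run(self):
        while not self._stop_event.is_set():
            disk = psutil.disk_io_counters()
            self.samples.append({
                "t": time.monotonic(),
                "cpu_percent_per_cpu": psutil.cpu_percent(percpu=True),
                "rss_bytes": self._proc.memory_info().rss,
                "disk_read_bytes": disk.read_bytes,
                "disk_write_bytes": disk.write_bytes,
            })
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
        self.join()


# Usage around a (hypothetical) workload callable:
#   tracer = ResourceTrace()
#   tracer.start()
#   result = workload()
#   tracer.stop()
#   # tracer.samples now holds the timeseries for this run
```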
What do you folks think? Do we need to record a full "trace" during each workload? Or is it sufficient to just capture totals per workload? Are there any changes you'd make to the list of parameters I proposed above?