The plan is to implement a benchmarking tool which automatically runs a suite of "Zarr workloads" across a range of compute platforms, storage media, chunk sizes, and Zarr implementations.
What would we like to measure for each workload?
Existing benchmarking tools only measure the runtime of each workload. That doesn't feel sufficient for Zarr because one of our main questions during benchmarking is whether the Zarr implementation is able to saturate the IO subsystem, and how much CPU and RAM is required to saturate the IO.
I'd propose measuring these parameters each time a workload is run (see the sketch after this list):
- Total execution time of the workload
- Total bytes read / written for disk / network
- Total IO operations
- Total bytes in final numpy array
- Average CPU utilization (per CPU)
- Max RAM usage during the execution of the workload
- CPU cache hit ratio
(Each run would also capture a bunch of metadata about the environment such as the compute environment, storage media, chunk sizes, Zarr implementation name and version, etc.)
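To make the proposal above a bit more concrete, here's a minimal sketch of how most of these totals could be captured with psutil around a single workload run. The function name `run_workload_with_metrics` is hypothetical, and it assumes the workload is a callable returning a numpy array. Note that `psutil.disk_io_counters()` and `psutil.net_io_counters()` are system-wide (fine on a dedicated benchmark machine, noisy otherwise), that peak RAM usage would need sampling rather than a before/after diff, and that CPU cache hit ratio isn't available from psutil at all (it would need something like `perf stat` on Linux), so those two are only stubbed out here:

```python
import time

import numpy as np
import psutil


def run_workload_with_metrics(workload, *args, **kwargs):
    """Run one workload and return (result, metrics dict).

    Hypothetical sketch: assumes `workload` is a callable returning a
    numpy array. Disk/network counters are system-wide. CPU cache hit
    ratio is not measurable via psutil and is omitted.
    """
    proc = psutil.Process()
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    psutil.cpu_percent(percpu=True)  # prime per-CPU counters
    rss_before = proc.memory_info().rss

    t0 = time.perf_counter()
    result = workload(*args, **kwargs)
    elapsed = time.perf_counter() - t0

    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()
    metrics = {
        "total_secs": elapsed,
        "disk_read_bytes": disk_after.read_bytes - disk_before.read_bytes,
        "disk_write_bytes": disk_after.write_bytes - disk_before.write_bytes,
        "disk_io_ops": (
            (disk_after.read_count + disk_after.write_count)
            - (disk_before.read_count + disk_before.write_count)
        ),
        "net_recv_bytes": net_after.bytes_recv - net_before.bytes_recv,
        "net_sent_bytes": net_after.bytes_sent - net_before.bytes_sent,
        "result_nbytes": result.nbytes if isinstance(result, np.ndarray) else None,
        # Average utilization per CPU over the run (since the priming call).
        "avg_cpu_percent_per_cpu": psutil.cpu_percent(percpu=True),
        # Max RAM usage during the run would need sampling (see the trace
        # sketch further down); this only captures RSS growth across the run.
        "rss_growth_bytes": proc.memory_info().rss - rss_before,
    }
    return result, metrics
```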
I had previously gotten over-excited and started thinking about capturing a full "trace" during the execution of each workload, e.g. capturing a timeseries of the IO utilization every 100 milliseconds. This might be useful, but it makes the benchmarking code rather more complex, and maybe doesn't tell us much more than the per-workload totals do. Some benchmark workloads might also run for less than 100 ms. And psutil's documentation states that some of its counters aren't reliable when polled more frequently than 10 times a second.
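For comparison, a trace would probably mean something like the background sampler below. This is only a sketch to illustrate the extra moving parts; the `ResourceTrace` name is hypothetical, the polling interval is clamped to psutil's suggested 0.1 s minimum, and a workload shorter than the interval would produce one sample or none, which is part of the argument against tracing:

```python
import threading
import time

import psutil


class ResourceTrace(threading.Thread):
    """Background sampler for a per-workload resource trace (sketch).

    Polls psutil every `interval` seconds (kept >= 0.1 s, per psutil's
    caution about fast polling) and records a timeseries of per-CPU
    utilization, process RSS, and cumulative system-wide disk IO.
    """

    def __init__(self, interval: float = 0.1):
        super().__init__(daemon=True)
        self.interval = max(interval, 0.1)
        self.samples = []
        self._stop_event = threading.Event()
        self._proc = psutil.Process()

    def run(self):
        while not self._stop_event.is_set():
            disk = psutil.disk_io_counters()
            self.samples.append({
                "t": time.monotonic(),
                "cpu_percent_per_cpu": psutil.cpu_percent(percpu=True),
                "rss_bytes": self._proc.memory_info().rss,
                "disk_read_bytes": disk.read_bytes,
                "disk_write_bytes": disk.write_bytes,
            })
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
        self.join()


# Usage around a (hypothetical) workload callable:
#   tracer = ResourceTrace()
#   tracer.start()
#   result = workload()
#   tracer.stop()
#   # tracer.samples now holds the timeseries for this run
```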
What do you folks think? Do we need to record a full "trace" during each workload? Or is it sufficient to just capture totals per workload? Are there any changes you'd make to the list of parameters I proposed above?