Dev: ASV Benchmarks
ASV is a benchmarking tool used to benchmark and compare the performance of the library over time.
Example users include NumPy, Arrow, and SciPy.
The benchmarks get run automatically in the following cases:
- nightly on the master branch - this updates the performance graphs
- on push to any branch with an open PR - this benchmarks the PR branch against the master branch, and if there is a regression of more than 15% the benchmarks fail
Normally, ASV keeps track of the results in JSON files, but we transform them into DataFrames and store them in an ArcticDB database. There is a special script that helps with this.
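As a rough illustration of the idea only (this is not the actual script; the paths, library names and the flattening logic are invented, and the real ASV result schema needs more massaging than shown):

```python
# Rough sketch only: paths, names and the flattening step are illustrative,
# not what the real transformation script does.
import json

import pandas as pd
from arcticdb import Arctic

results_file = ".asv/results/example-machine/example-run.json"  # hypothetical path
ac = Arctic("lmdb:///tmp/asv_results_example")  # the real setup stores to a shared ArcticDB backend
lib = ac.get_library("asv_results", create_if_missing=True)

with open(results_file) as f:
    raw = json.load(f)

# Flatten the nested JSON into a DataFrame; real result files need more work
# to become properly tabular than this single call.
df = pd.json_normalize(raw)
lib.write("benchmark_run_example", df)
```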
All of the code for the actual benchmarks is located in the benchmarks folder. Any new benchmarks should be added there, either in one of the existing classes/files or in a new one.
Currently, we have the following major groups of benchmarks:
- Basic functions - for benchmarking operations such as read/write/append/update, their batch variants, etc. against a local storage
- List functions - for benchmarking operations such as listing symbols, versions, etc. against a local storage
- Local query builder - for benchmarking QB against a local storage (e.g. LMDB)
- Persistent Query Builder - for benchmarking QB against a persistent storage (e.g. AWS S3) so we can read bigger data sizes
- Resample - benchmarking resampling functionality. Parametrized over input rows, output rows, column types, and supported aggregators
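For orientation, new benchmarks follow the standard ASV class conventions. The sketch below is illustrative only - the class, library and symbol names are invented, and the real benchmarks are parametrised more heavily:

```python
# Illustrative ASV benchmark class; names and sizes are invented.
import pandas as pd
from arcticdb import Arctic


class ExampleBasicFunctions:
    # Each value in params produces a separate benchmark variant
    params = [100_000, 1_000_000]
    param_names = ["num_rows"]

    def setup(self, num_rows):
        # setup() runs before the benchmark methods in each ASV process
        self.ac = Arctic("lmdb:///tmp/asv_example_db")
        self.lib = self.ac.get_library("example_lib", create_if_missing=True)
        self.lib.write("example_sym", pd.DataFrame({"a": range(num_rows)}))

    def time_read(self, num_rows):
        self.lib.read("example_sym")

    def peakmem_read(self, num_rows):
        self.lib.read("example_sym")
```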
If you have made any changes to the benchmarks, you need to update and push the updated benchmarks.json file to GitHub.
To do this, simply run python python/utils/asv_checks.py from the project root directory. The activated venv must have a version of arcticdb installed that the benchmarks can run against. At the time of writing, this requires the following environment variables to be set. The values do not need to work, but some are subject to regex matching, so they must look plausible:
export ARCTICDB_PERSISTENT_STORAGE_TESTS=1
export ARCTICDB_PERSISTENT_STORAGE_STRATEGY_BRANCH=blah
export ARCTICDB_PERSISTENT_STORAGE_SHARED_PATH_PREFIX=blah
export ARCTICDB_PERSISTENT_STORAGE_UNIQUE_PATH_PREFIX=blah
export ARCTICDB_REAL_S3_BUCKET=blah
export ARCTICDB_REAL_S3_ENDPOINT=https://s3.eu-west-1.amazonaws.com/
export ARCTICDB_REAL_S3_REGION=eu-west-1
export ARCTICDB_REAL_S3_CLEAR=1
export ARCTICDB_REAL_S3_ACCESS_KEY=blah
export ARCTICDB_REAL_S3_SECRET_KEY=blah
This file is very important for ASV; without it the benchmark results will not be generated properly.
Although ASV has documentation, reading it might not be enough to plan and write your tests properly for ArcticDB. We are trying to benchmark library functions that depend on the current internal state of the symbol/library. That is especially true for functions like append, when we want to append a time-indexed dataframe. finalize_staged_data() also poses a problem because it can be executed exactly once on staged data.
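For example, to benchmark append on a time-indexed symbol, the symbol usually has to be re-created in setup() and the repetition attributes tuned so that each measured call appends a fresh, non-overlapping chunk. A hedged sketch (symbol names, sizes and attribute values are invented):

```python
# Illustrative sketch: append needs fresh symbol state for every measurement,
# so number=1 ensures setup() runs before each measured append.
import numpy as np
import pandas as pd
from arcticdb import Arctic


class ExampleAppend:
    number = 1       # one append per timing sample
    repeat = 5
    rounds = 1
    warmup_time = 0  # warmup calls would also mutate the symbol, so disable them

    def setup(self):
        self.ac = Arctic("lmdb:///tmp/asv_append_example")
        self.lib = self.ac.get_library("append_lib", create_if_missing=True)
        base = pd.DataFrame(
            {"a": np.arange(1000)},
            index=pd.date_range("2024-01-01", periods=1000, freq="min"),
        )
        self.lib.write("ts_sym", base)  # reset to a known base before every measurement
        self.next_chunk = pd.DataFrame(
            {"a": np.arange(1000)},
            index=pd.date_range("2024-01-02", periods=1000, freq="min"),
        )

    def time_append(self):
        self.lib.append("ts_sym", self.next_chunk)
```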
To plan your tests you must know the following specifics of ASV:
- asv runs tests in separate processes, so it is important to know how many processes you have to "synchronize" over your scenario
- asv cannot be debugged simply with print() statements - print output is only visible in the console when it comes from setup_cache(); output from the other methods, which run in separate processes, is not kept. The only solution is to append logs to a file or use a logging library (see the sketch after this list)
- there are several properties that influence the repetitions of your timing tests. Note that they are also related to the number of processes that asv will create:
  - rounds
  - number
  - repeat
  - params
- the time that your code takes - if it is small, asv will decide on its own and will not use your suggestions
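As mentioned in the debugging note above, a simple way to get output from benchmark methods is to append to a file; a minimal sketch (the helper name and log path are just an example):

```python
# Illustrative debugging helper: print() output from benchmark methods is not
# shown by ASV, so append messages to a file instead (the path is arbitrary).
import datetime


def debug_log(message, path="/tmp/asv_debug.log"):
    with open(path, "a") as f:
        f.write(f"{datetime.datetime.now().isoformat()} {message}\n")
```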
See the ASV documentation on its timing logic and timing attributes.
warmup_time makes ASV run your benchmark repeatedly before it starts recording results. You can skip the warmup by setting warmup_time=0.
The logic of the repetition parameters is described here.
To run repeatedly in a single process you could use:
rounds = 1
number = 1
repeat = X # you can tune that
min_run_count = 1
warmup_time = 0
params = [a1...an]

ASV uses benchmarks.json to store metadata for each test, including a hash of its code. Thus each change to a test triggers a new version number in benchmarks.json. To sanity check your ASV changes you can run asv_checks.py. Our CI also runs this script.
There is a workflow that automatically benchmarks the latest master commit every night. If you need to run it manually, you can issue a manual build from here and click on the Run workflow menu. This will start a build that will benchmark only the latest version.
If you have made changes to the benchmarks, you might need to regenerate all of the benchmarks. You will need to start a new build manually on master and select the run_all_benchmarks option.
To run ASV locally, you first need to make sure that you have some prerequisites installed:
- asv
- virtualenv
Some ASV benchmarks use files stored in git lfs. In order to be able to run all benchmarks you also need to install git-lfs, either via sudo apt-get install git-lfs or by following the instructions here.
After git-lfs is installed you must pull the files stored in lfs.
cd <arcticdb-root>
git lfs pull

You also need to change the asv.conf.json file to point to your branch instead of master (e.g. "branches": ["some_branch"], ).
If you have introduced any new hard dependencies, you need to add them to the matrix of dependencies that will be installed.

After that you can simply run:
python -m asv run -v --show-stderr HEAD^!
This benchmarks only the latest commit. To run a subset of benchmarks, use --bench <regex>.
After running this once, if you are just changing the benchmarks, and not ArcticDB code itself, you can run the updated benchmarks without committing and rebuilding with:
python3 -m asv run --python=python/.asv/env/<some hash>/bin/python -v --show-stderr
where the path should be obvious from the first ASV run from HEAD^!.
During development you might want to run only some tests and not always compile from the HEAD of the branch. To do that you can use this line:
asv run --python=same -v --show-stderr --bench .*myTest.*
This will run the benchmark in the same venv you are running from.
If you want to benchmark more than one commit (e.g. if you have added new benchmarks), it might be better to run them on a GH Runner instead of locally. You will again need to change the asv.conf.json file to point to your branch instead of master (e.g. "branches": ["some_branch"], ), and if you have introduced any new hard dependencies, you need to add them to the matrix of dependencies that will be installed.

Then push your changes and start a manual build from here. Make sure to select your branch and whether or not you want to run the benchmarks against all commits. Options:
- LMDB / REAL / ALL - when LMDB is chosen, the primary set of tests will be executed on the test runner machine with an LMDB database. REAL will run Amazon S3 and any other tests that use shared storage.
- "Specify regular expression for specific tests to be executed" - if specified, this overrides the previous selection and runs only the test(s) that match the regular expression passed there.

According to the ASV documentation: "The peak memory benchmark also counts memory usage during the setup routine, which may confound the benchmark results. One way to avoid this is to use setup_cache instead."
That means that the more heavily we use the setup() method - the more things we do and add as self variables - the more we affect the final number of the benchmark, and the less information we have about the actual memory efficiency of the process we measure. Hence we might end up making wrong decisions based on wrong assumptions.
Because most of our tests need to do work in setup(), we cannot fall back on the ASV team's advice to use setup_cache() for all our tests.
A way to solve this problem is to establish the baseline memory, make it evident, and add it to the results and the final graph so that its value is clear. To do that, we can add one more parameter to all tests, "measurement_type", with two values - "baseline" and "test_process". If we do that, our tests would look like this and the results will start to look like this:
```python
def peakmem_some_test(self, measurement_type):
    if measurement_type == "baseline":
        return
    do_test()
```

```
============================ =======
       measurement_type
---------------------------- -------
         baseline              2.66G
       test_process            3.17G
============================ =======
```
Having this table makes it easier to see that the actual peak memory of the measured process is the difference between the two numbers. Having this on the result graphs will then make it clear what the peak memory of the measured process is.
The good thing about this approach is that it is easy to do and does not change anything in our results. In other words, it can be added ASAP with no side effect other than a small increase in run time, which is negligible compared with the run time of the actual test.
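In ASV terms, the extra parameter would just be declared like any other; a minimal sketch (the class name and attribute values are illustrative):

```python
# Illustrative: declaring the extra "measurement_type" parameter on a benchmark class.
class ExamplePeakmemWithBaseline:
    param_names = ["measurement_type"]
    params = ["baseline", "test_process"]

    def setup(self, measurement_type):
        ...  # the same heavy setup runs for both parameter values

    def peakmem_some_test(self, measurement_type):
        # Body as in the example above: return early for "baseline",
        # otherwise run the measured code.
        ...
```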
Note that with the pytest-memray library this can be avoided, as the setup can be moved to a dedicated fixture and only the code that needs to be measured is left in the test.
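For comparison, a rough pytest sketch of that idea (fixture and test names are invented, and this assumes the pytest-memray plugin is installed and the tests are run with its --memray option):

```python
# Illustrative pytest sketch: the setup lives in a fixture, so a memory profiler
# such as pytest-memray attributes only the body of the test to the test itself.
import pandas as pd
import pytest
from arcticdb import Arctic


@pytest.fixture
def prepared_library(tmp_path):
    ac = Arctic(f"lmdb://{tmp_path}")
    lib = ac.get_library("mem_lib", create_if_missing=True)
    lib.write("sym", pd.DataFrame({"a": range(1_000_000)}))
    return lib


def test_read_memory(prepared_library):
    # Only this call is measured as part of the test
    prepared_library.read("sym")
```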
It's important that ASV benchmarks are not flaky and not too slow. This section describes how to investigate these problems.
We want to repeat ASV benchmarks to check that they are stable:
- Create an m6i.4xlarge EC2 runner. This is what the CI uses.
- Log in to it.
- Install deps:
sudo apt update
sudo apt-get install build-essential gcc-11 cmake gdb
sudo apt-get install zip pkg-config flex bison libkrb5-dev libsasl2-dev libcurl4-openssl-dev
- Clone ArcticDB and run git submodule update --init --recursive
- Install Python and install ASV (docs https://github.com/man-group/ArcticDB/wiki/Dev:-Building#setup-for-linux-build-using-wsl)
- Run some benchmarks:
python -m asv run --bench "resample.Resample.time_resample" -v --show-stderr HEAD^!
It will log the environment that ASV created; you can use that env in the future to skip the build step:
python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
We need to get ASV to save results to its database or it won't report regressions. We can use --set-commit-hash to do this. These need to be real hashes in the history.
So that leaves us with commands like:
python -m asv run --set-commit-hash $(git rev-parse HEAD~42) --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
When developing, remember the -q option to run benchmarks without repeats.
You can then check the comparison across a few benchmark runs to check for any large differences.
You can then run benchmarks repeatedly:
#!/bin/bash
for i in {1..3}; do
commit=$(git rev-parse HEAD~$i)
echo "Running benchmark and storing results under $commit"
/root/miniforge3/bin/python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr --set-commit-hash $commit
done
and then compare them:
#!/bin/bash
for i in {2..3}; do
/root/miniforge3/bin/python -m asv compare -s $(git rev-parse HEAD~1 HEAD~$i) > comparison_$i.txt
done
And then look for comparisons with a large ratio between the repeated runs of the same benchmark. For example, this will look for ones with a ratio less than 0.95 or greater than 1.05:
awk -F'|' 'gsub(/[[:space:]]/,"",$5) && ($5 < 0.95 || $5 > 1.05) && $5 != "Ratio" && $5 != ""' comparison*.txt | sort -t'|' -k5 -n
You can tune any suspicious benchmarks, then repeat this analysis to see whether they appear to be more stable.
transform_asv_results.py includes --mode analyze to check where time is spent on a saved ASV run. See that file for docs. It also runs at the end of the benchmarking CI step so you can check its printout there.