Dev: ASV Benchmarks
ASV is a benchmarking tool used to benchmark and compare the performance of the library over time.
Example users include NumPy, Arrow, and SciPy.
The benchmarks get run automatically in the following cases:
- nightly on the master branch - this updates the performance graphs
- on push to any branch with an open PR - this benchmarks the PR branch against the master branch, and if there is a regression of more than 15% the benchmarks fail
Normally, ASV keeps track of the results in JSON files, but we transform them into DataFrames and store them in an ArcticDB database. There is a special script that helps with this.
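As a rough illustration of the idea only (this is not the actual script; the paths, library names and the flattening logic are invented, and the real ASV result schema needs more massaging than shown):

```python
# Rough sketch only: paths, names and the flattening step are illustrative,
# not what the real transformation script does.
import json

import pandas as pd
from arcticdb import Arctic

results_file = ".asv/results/example-machine/example-run.json"  # hypothetical path
ac = Arctic("lmdb:///tmp/asv_results_example")  # the real setup stores to a shared ArcticDB backend
lib = ac.get_library("asv_results", create_if_missing=True)

with open(results_file) as f:
    raw = json.load(f)

# Flatten the nested JSON into a DataFrame; real result files need more work
# to become properly tabular than this single call.
df = pd.json_normalize(raw)
lib.write("benchmark_run_example", df)
```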
All of the code for the actual benchmarks is located in the benchmarks folder. Any new benchmarks should be added there, either in one of the existing classes/files or in a new one.
Currently, we have the following major groups of benchmarks:
- Basic functions - for benchmarking operations such as read/write/append/update, their batch variants, etc. against a local storage
- List functions - for benchmarking operations such as listing symbols, versions, etc. against a local storage
- Local query builder - for benchmarking QB against a local storage (e.g. LMDB)
- Persistent Query Builder - for benchmarking QB against a persistent storage (e.g. AWS S3) so we can read bigger data sizes
- Resample - benchmarking resampling functionality. Parametrized over input rows, output rows, column types, and supported aggregators
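For orientation, new benchmarks follow the standard ASV class conventions. The sketch below is illustrative only - the class, library and symbol names are invented, and the real benchmarks are parametrised more heavily:

```python
# Illustrative ASV benchmark class; names and sizes are invented.
import pandas as pd
from arcticdb import Arctic


class ExampleBasicFunctions:
    # Each value in params produces a separate benchmark variant
    params = [100_000, 1_000_000]
    param_names = ["num_rows"]

    def setup(self, num_rows):
        # setup() runs before the benchmark methods in each ASV process
        self.ac = Arctic("lmdb:///tmp/asv_example_db")
        self.lib = self.ac.get_library("example_lib", create_if_missing=True)
        self.lib.write("example_sym", pd.DataFrame({"a": range(num_rows)}))

    def time_read(self, num_rows):
        self.lib.read("example_sym")

    def peakmem_read(self, num_rows):
        self.lib.read("example_sym")
```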
If you have made any changes to the benchmarks, you need to update and push the updated benchmarks.json file to GitHub.
To do this, simply run python python/utils/asv_checks.py from the project root directory. The activated venv must have a version of arcticdb installed that the benchmarks can run against. At the time of writing, this requires the following environment variables to be set. The values do not need to work, but some are subject to regex matching, so they must look plausible:
export ARCTICDB_PERSISTENT_STORAGE_TESTS=1
export ARCTICDB_PERSISTENT_STORAGE_STRATEGY_BRANCH=blah
export ARCTICDB_PERSISTENT_STORAGE_SHARED_PATH_PREFIX=blah
export ARCTICDB_PERSISTENT_STORAGE_UNIQUE_PATH_PREFIX=blah
export ARCTICDB_REAL_S3_BUCKET=blah
export ARCTICDB_REAL_S3_ENDPOINT=https://s3.eu-west-1.amazonaws.com/
export ARCTICDB_REAL_S3_REGION=eu-west-1
export ARCTICDB_REAL_S3_CLEAR=1
export ARCTICDB_REAL_S3_ACCESS_KEY=blah
export ARCTICDB_REAL_S3_SECRET_KEY=blah
This file is very important for ASV; without it the benchmark results will not be generated properly.
Although ASV has documentation, reading it might not be enough to plan and write your tests properly for ArcticDB. We are trying to benchmark library functions that depend on the current internal state of the symbol/library. That is especially true for functions like append, when we want to append a time-indexed dataframe. finalize_staged_data() also poses a problem because it can be executed exactly once on staged data.
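For example, to benchmark append on a time-indexed symbol, the symbol usually has to be re-created in setup() and the repetition attributes tuned so that each measured call appends a fresh, non-overlapping chunk. A hedged sketch (symbol names, sizes and attribute values are invented):

```python
# Illustrative sketch: append needs fresh symbol state for every measurement,
# so number=1 ensures setup() runs before each measured append.
import numpy as np
import pandas as pd
from arcticdb import Arctic


class ExampleAppend:
    number = 1       # one append per timing sample
    repeat = 5
    rounds = 1
    warmup_time = 0  # warmup calls would also mutate the symbol, so disable them

    def setup(self):
        self.ac = Arctic("lmdb:///tmp/asv_append_example")
        self.lib = self.ac.get_library("append_lib", create_if_missing=True)
        base = pd.DataFrame(
            {"a": np.arange(1000)},
            index=pd.date_range("2024-01-01", periods=1000, freq="min"),
        )
        self.lib.write("ts_sym", base)  # reset to a known base before every measurement
        self.next_chunk = pd.DataFrame(
            {"a": np.arange(1000)},
            index=pd.date_range("2024-01-02", periods=1000, freq="min"),
        )

    def time_append(self):
        self.lib.append("ts_sym", self.next_chunk)
```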
To plan your tests you must know the following specifics of ASV:
- asv runs tests in separate processes, so it is important to know how many processes you have to "synchronize" over your scenario
- asv cannot be debugged simply with print() statements - print output is only visible in the console when it comes from setup_cache(); output from the other methods, which run in separate processes, is not kept. The only solution is to append logs to a file or use a logging library (see the sketch after this list)
- there are several properties that influence the repetitions of your timing tests. Note that they are also related to the number of processes that asv will create:
  - rounds
  - number
  - repeat
  - params
- the time that your code takes - if it is small, asv will decide on its own and will not use your suggestions
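As mentioned in the debugging note above, a simple way to get output from benchmark methods is to append to a file; a minimal sketch (the helper name and log path are just an example):

```python
# Illustrative debugging helper: print() output from benchmark methods is not
# shown by ASV, so append messages to a file instead (the path is arbitrary).
import datetime


def debug_log(message, path="/tmp/asv_debug.log"):
    with open(path, "a") as f:
        f.write(f"{datetime.datetime.now().isoformat()} {message}\n")
```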
See the ASV documentation on its timing logic and timing attributes.
warmup_time makes ASV run your benchmark repeatedly before it starts recording results. You can skip the warmup by setting warmup_time=0.
The logic of the repetition parameters is described here.
To run repeatedly in a single process you could use:
rounds = 1
number = 1
repeat = X # you can tune that
min_run_count = 1
warmup_time = 0
params = [a1...an]

ASV uses benchmarks.json to store metadata for each test, including a hash of its code. Thus each change to a test triggers a new version number in benchmarks.json. To sanity check your ASV changes you can run asv_checks.py. Our CI also runs this script.
There is a workflow that automatically benchmarks the latest master commit every night. If you need to run it manually, you can issue a manual build from here and click on the Run workflow menu. This will start a build that will benchmark only the latest version.
If you have made changes to the benchmarks, you might need to regenerate all of the benchmarks. You will need to start a new build manually on master and select the run_all_benchmarks option.
To run ASV locally, you first need to make sure that you have some prerequisites installed:
- asv
- virtualenv
Some ASV benchmarks use files stored in git lfs. In order to be able to run all benchmarks you also need to install git-lfs, either via sudo apt-get install git-lfs or by following the instructions here.
After git-lfs is installed you must pull the files stored in lfs.
cd <arcticdb-root>
git lfs pull

You also need to change the asv.conf.json file to point to your branch instead of master (e.g. "branches": ["some_branch"], ).
If you have introduced any new hard dependencies, you need to add them to the matrix of dependencies that will be installed.

After that you can simply run:
python -m asv run -v --show-stderr HEAD^!
This benchmarks only the latest commit. To run a subset of benchmarks, use --bench <regex>.
After running this once, if you are just changing the benchmarks, and not ArcticDB code itself, you can run the updated benchmarks without committing and rebuilding with:
python3 -m asv run --python=python/.asv/env/<some hash>/bin/python -v --show-stderr
where the path should be obvious from the first ASV run from HEAD^!.
During development you might want to run only some tests and not always compile from the HEAD of the branch. To do that you can use this line:
asv run --python=same -v --show-stderr --bench .*myTest.*
This will run the benchmark in the same venv you are running from.
If you want to benchmark more than one commit (e.g. if you have added new benchmarks), it might be better to run them on a GH Runner instead of locally. You will again need to change the asv.conf.json file to point to your branch instead of master (e.g. "branches": ["some_branch"], ), and if you have introduced any new hard dependencies, you need to add them to the matrix of dependencies that will be installed.

Then push your changes and start a manual build from here. Make sure to select your branch and whether or not you want to run the benchmarks against all commits. Options:
- LMDB / REAL / ALL - when LMDB is chosen, the primary set of tests will be executed on the test runner machine with an LMDB database. REAL will run Amazon S3 and any other tests that use shared storage.
- "Specify regular expression for specific tests to be executed" - if specified, this overrides the previous selection and runs only the test(s) that match the regular expression passed there.

According to the ASV documentation: "The peak memory benchmark also counts memory usage during the setup routine, which may confound the benchmark results. One way to avoid this is to use setup_cache instead."
That means that the more heavily we use the setup() method - the more things we do and add as self variables - the more we affect the final number of the benchmark, and the less information we have about the actual memory efficiency of the process we measure. Hence we might end up making wrong decisions based on wrong assumptions.
Because most of our tests need to do work in setup(), we cannot fall back on the ASV team's advice to use setup_cache() for all our tests.
A way to solve this problem is to establish the baseline memory, make it evident, and add it to the results and the final graph so that its value is clear. To do that, we can add one more parameter to all tests, "measurement_type", with two values - "baseline" and "test_process". If we do that, our tests would look like this and the results will start to look like this:
```python
def peakmem_some_test(self, measurement_type):
    if measurement_type == "baseline":
        return
    do_test()
```

```
============================ =======
       measurement_type
---------------------------- -------
         baseline              2.66G
       test_process            3.17G
============================ =======
```
Having this table makes it easier to see that the actual peak memory of the measured process is the difference between the two numbers. Having this on the result graphs will then make it clear what the peak memory of the measured process is.
The good thing about this approach is that it is easy to do and does not change anything in our results. In other words, it can be added ASAP with no side effect other than a small increase in run time, which is negligible compared with the run time of the actual test.
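In ASV terms, the extra parameter would just be declared like any other; a minimal sketch (the class name and attribute values are illustrative):

```python
# Illustrative: declaring the extra "measurement_type" parameter on a benchmark class.
class ExamplePeakmemWithBaseline:
    param_names = ["measurement_type"]
    params = ["baseline", "test_process"]

    def setup(self, measurement_type):
        ...  # the same heavy setup runs for both parameter values

    def peakmem_some_test(self, measurement_type):
        # Body as in the example above: return early for "baseline",
        # otherwise run the measured code.
        ...
```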
Note that with the pytest-memray library this can be avoided, as the setup can be moved to a dedicated fixture and only the code that needs to be measured is left in the test.
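For comparison, a rough pytest sketch of that idea (fixture and test names are invented, and this assumes the pytest-memray plugin is installed and the tests are run with its --memray option):

```python
# Illustrative pytest sketch: the setup lives in a fixture, so a memory profiler
# such as pytest-memray attributes only the body of the test to the test itself.
import pandas as pd
import pytest
from arcticdb import Arctic


@pytest.fixture
def prepared_library(tmp_path):
    ac = Arctic(f"lmdb://{tmp_path}")
    lib = ac.get_library("mem_lib", create_if_missing=True)
    lib.write("sym", pd.DataFrame({"a": range(1_000_000)}))
    return lib


def test_read_memory(prepared_library):
    # Only this call is measured as part of the test
    prepared_library.read("sym")
```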
It's important that ASV benchmarks are not flaky and not too slow. This section describes how to investigate these problems.
We want to repeat ASV benchmarks to check that they are stable:
- Create an m6i.4xlarge EC2 runner. This is what the CI uses.
- Log in to it.
- Install deps:
sudo apt update
sudo apt-get install build-essential gcc-11 cmake gdb
sudo apt-get install zip pkg-config flex bison libkrb5-dev libsasl2-dev libcurl4-openssl-dev
- Clone ArcticDB and run git submodule update --init --recursive
- Install Python and install ASV (docs https://github.com/man-group/ArcticDB/wiki/Dev:-Building#setup-for-linux-build-using-wsl)
- Run some benchmarks:
python -m asv run --bench "resample.Resample.time_resample" -v --show-stderr HEAD^!
It will log the environment that ASV created; you can use that env in the future to skip the build step:
python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
We need to get ASV to save results to its database or it won't report regressions. We can use --set-commit-hash to do this. These need to be real hashes in the history.
So that leaves us with commands like:
python -m asv run --set-commit-hash $(git rev-parse HEAD~42) --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
When developing, remember the -q option to run benchmarks without repeats.
You can then check the comparison across a few benchmark runs to check for any large differences.
You can then run benchmarks repeatedly:
#!/bin/bash
for i in {1..3}; do
commit=$(git rev-parse HEAD~$i)
echo "Running benchmark and storing results under $commit"
/root/miniforge3/bin/python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr --set-commit-hash $commit
done
and then compare them:
#!/bin/bash
for i in {2..3}; do
/root/miniforge3/bin/python -m asv compare -s $(git rev-parse HEAD~1 HEAD~$i) > comparison_$i.txt
done
And then look for comparisons with a large ratio between the repeated runs of the same benchmark. For example, this will look for ones with a ratio less than 0.95 or greater than 1.05:
awk -F'|' 'gsub(/[[:space:]]/,"",$5) && ($5 < 0.95 || $5 > 1.05) && $5 != "Ratio" && $5 != ""' comparison*.txt | sort -t'|' -k5 -n
You can tune any suspicious benchmarks, then repeat this analysis to see whether they appear to be more stable.
transform_asv_results.py includes --mode analyze to check where time is spent on a saved ASV run. See that file for docs. It also runs at the end of the benchmarking CI step so you can check its printout there.