Performance Testing

Tom Augspurger edited this page Mar 12, 2014 · 7 revisions

pandas uses vbench to monitor performance across revisions.

vbench

vbench is a tool for benchmarking your code over time, so that performance improvements and regressions show up across revisions.

WARNING: vbench is not yet compatible with Python 3.

New Dependencies

Note that you need a working `sqlite3` module in your Python installation.

Writing a good vbench


A set of related benchmarks goes together in a module (a `.py` file).
See `vb_suite/indexing.py` for an example.

There's typically some boilerplate common to all the tests, which can
be placed in a string `common_setup`.

Now we can write our specific benchmark.

There are up to three items in a single benchmark:

* setup specific to that benchmark (typically a string concatenated to `common_setup`)
* a statement to be executed, which is passed as the first argument to `vbench.Benchmark`
* the instantiation of the `vbench.Benchmark` class itself
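
Put together, the three items might look like the following minimal sketch. The import path `vbench.api`, the benchmark name, and the contents of the setup strings are assumptions made for illustration; none of them is taken from this page. The statement is passed to `Benchmark` as the first positional argument, followed by the setup. The import is guarded because vbench may not be installed, and it does not run on Python 3.

```python
# Boilerplate shared by every benchmark in the module (contents are illustrative).
common_setup = """import numpy as np
from pandas import Series
"""

# 1. Setup specific to this benchmark, concatenated to common_setup.
setup = common_setup + """
s = Series(np.random.rand(1000))
"""

# 2. The statement to be timed: short, and exercising only one thing.
statement = "s[500]"

# 3. Instantiate the Benchmark. The import is guarded so that this sketch
#    still loads where vbench is unavailable.
try:
    from vbench.api import Benchmark
except Exception:  # ImportError, or a SyntaxError under Python 3
    Benchmark = None

if Benchmark is not None:
    series_getitem_scalar = Benchmark(statement, setup,
                                      name="series_getitem_scalar")
```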

It's important to separate the setup from the statement we're interested in profiling.
The statement ought to be concise and should profile only one thing.
If you mix setup in with the statement to be profiled, then changes affecting the performance of the setup (which might even take place outside your library) will pollute the test.
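
The effect is easy to reproduce with the standard library's `timeit`, which uses the same split between setup and statement. This only illustrates the principle and is not vbench itself; the list size, the statements, and the repeat counts are arbitrary choices.

```python
import timeit

# Setup builds the data once per timing run and is excluded from the measurement.
setup = "data = list(range(10000))"

# Clean benchmark: only the operation of interest is timed.
clean = timeit.Timer("sorted(data)", setup=setup)

# Polluted benchmark: the data construction is timed on every call as well.
polluted = timeit.Timer("data = list(range(10000)); sorted(data)")

number = 200
clean_time = min(clean.repeat(repeat=3, number=number))
polluted_time = min(polluted.repeat(repeat=3, number=number))

print("statement only:    %.4f s" % clean_time)
print("setup + statement: %.4f s" % polluted_time)
```

Any change in the cost of building `data` shifts the second number even when `sorted` itself is unchanged, which is exactly the pollution described above.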

Each module must be listed in the `modules` list in the `suite.py` file.
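
One plausible reason the listing matters is that a suite can only run the benchmarks it can find. The sketch below assumes (this page does not say so) that the suite imports each listed name and collects the `Benchmark` instances defined in that module. `Benchmark` here is a stand-in class, and the module names `vb_listed` and `vb_unlisted` are hypothetical.

```python
import importlib
import sys
import types


class Benchmark(object):
    """Stand-in for vbench's Benchmark, reduced to what this sketch needs."""

    def __init__(self, code, setup, name=None):
        self.code = code
        self.setup = setup
        self.name = name


def make_module(name, benchmarks):
    """Register a fake benchmark module so it can be imported by name."""
    module = types.ModuleType(name)
    for bench in benchmarks:
        setattr(module, bench.name, bench)
    sys.modules[name] = module


# Two hypothetical benchmark modules; only one of them is listed below.
make_module("vb_listed",
            [Benchmark("s[500]", "s = list(range(1000))", name="getitem")])
make_module("vb_unlisted",
            [Benchmark("len(s)", "s = list(range(1000))", name="length")])

modules = ["vb_listed"]  # plays the role of the `modules` list

# Import each listed module and keep every Benchmark instance it defines.
benchmarks = []
for modname in modules:
    module = importlib.import_module(modname)
    benchmarks.extend(v for v in vars(module).values()
                      if isinstance(v, Benchmark))

# The benchmark from the unlisted module is never collected.
print([b.name for b in benchmarks])
```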

Not all benchmarks can be run against the entire history of the project, since some features did not exist in older revisions.
For newer features, each `Benchmark` object takes an optional `start_date` parameter.
For example:

```python
start_date=datetime(2012, 1, 1)
```

If no `start_date` is given for a specific benchmark, the global setting from `suite.py` is used.

Another reason that a benchmark can't be run against the project's entire history is that APIs sometimes have to change in ways that are not backwards compatible.
For these cases, the easiest way to compare performance before and after the API change is probably the try-except idiom:

```python
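try:
    # Newer revisions: date_range exists (N is assumed to be defined in the setup).
    rng = date_range('1/1/2000', periods=N, freq='min')
except NameError:
    # Older revisions: date_range is undefined, so fall back to DateRange
    # and alias it so the rest of the benchmark can use one name.
    rng = DateRange('1/1/2000', periods=N, offset=datetools.Minute())
    date_range = DateRange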
```

Pre-PR
-------

Most contributors don't need to worry about writing a vbench or running the full suite against the project's entire history.
If you are ever asked to run a vbench, change your directory to `pandas/vb_suite` and run

```
./test_perf.sh -b master -t HEAD
```

You can optionally restrict the run to certain files with the `-r` parameter.