|
| 1 | +## Purpose |
| 2 | + |
| 3 | +This directory contains a Python re-implementation of the Haskell Criterion methodology for benchmarking executables (rather than Haskell functions, as Criterion normally does). |
| 4 | +One could call it "benchrunner-runner" because the purpose is to run `benchrunner` many times and calculate the appropriate run time statistics. |
| 5 | + |
| 6 | +We take as input some program `prog` with the following interface: |
| 7 | + |
| 8 | +- `prog` takes `iters` as a command-line argument, |
| 9 | +- `prog` measures the run time of a function of interest in a tight loop that repeats `iters` times, and finally |
| 10 | +- `prog` prints to stdout the `batchtime` (total loop time) and `selftimed` (total loop time divided by `iters`). |
| 11 | + |
| 12 | +The ultimate goal is then to sweep `iters` and perform a linear regression of `batchtime` against `iters`. |
| 13 | +The slope is the mean time per iteration, and the y-intercept represents some notion of shared overhead, insensitive to `iters`. |
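
The regression described above can be sketched in plain Python as follows. This is only an illustration, not the code in `criterionmethodology.py`: the helper `fit_line` and the sample timings are hypothetical, and the data is made up so that the expected slope and intercept are known in advance.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: 2 ms per iteration plus 5 ms of fixed overhead.
iters = [1, 2, 4, 8, 16]
batchtime = [0.005 + 0.002 * n for n in iters]

slope, intercept = fit_line(iters, batchtime)
print(f"mean time per iteration: {slope:.6f} s")
print(f"shared overhead:         {intercept:.6f} s")
```

On this synthetic data the fit recovers the per-iteration time as the slope and the fixed cost as the intercept; real measurements would be noisy and would only approximate a line.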
| 14 | + |
| 15 | +## Run |
| 16 | + |
| 17 | +This package contains two scripts: |
| 18 | + |
| 19 | +- `sweep_seq.py` (top level) |
| 20 | +- `criterionmethodology.py` (called by `sweep_seq.py`) |
| 21 | + |
| 22 | +Both can be run directly, e.g.: |
| 23 | + |
| 24 | +```shellsession |
| 25 | +criterionmethodology benchrunner Quicksort Seq 2000 |
| 26 | +``` |
| 27 | + |
| 28 | +will call `benchrunner iters Quicksort Seq 2000` for various `iters`. |
| 29 | + |
| 30 | +`sweep_seq.py` performs a logarithmic sweep over different array sizes, invoking `criterionmethodology.py` at each point. |
| 31 | + |
| 32 | +## Arithmetic vs geometric mean |
| 33 | + |
| 34 | +Since performance data is non-negative and judged multiplicatively (twice as good means the numbers are halved, twice as bad means the numbers are doubled; these are all *factors*), the geomean and geometric standard deviation may make more sense theoretically. |
| 35 | +However, from some testing, the geomean seems to vary wildly for programs with very short execution times, even between repeated runs with the same parameters. |
| 36 | + |
| 37 | +In particular, to compute the geomean, we: |
| 38 | + |
| 39 | +- take the logarithm of all the `x` and `y` values, |
| 40 | +- compute linear regression over that, then |
| 41 | +- exponentiate the y-intercept. |
| 42 | + |
| 43 | +The other fitted parameter, the slope `m`, becomes an exponent (the fitted model is `y = e^b x^m`, where `b` is the y-intercept in log space). It represents *geometric overhead*, i.e. how much overhead is being added per iteration. |
| 44 | +This may model slowdowns well, e.g. ones arising from pre-allocating arrays. |
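
The log-space procedure above can be sketched as follows. This is a hypothetical illustration rather than the repository's implementation: the name `fit_power_law` and the sample data are assumptions, chosen so that the fitted values are known in advance.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = e^b * x^m by least squares on (log x, log y).

    Returns (e^b, m): the exponentiated intercept and the exponent.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    sxy = sum((lx - mean_lx) * (ly - mean_ly)
              for lx, ly in zip(log_x, log_y))
    m = sxy / sxx
    b = mean_ly - m * mean_lx
    return math.exp(b), m


# Made-up data following y = 0.002 * x^1.1 exactly.
iters = [1, 2, 4, 8, 16]
batchtime = [0.002 * n ** 1.1 for n in iters]

geomean, power = fit_power_law(iters, batchtime)
print(f"exponentiated intercept (e^b): {geomean:.6f} s")
print(f"exponent (m):                  {power:.4f}")
```

An exponent of exactly 1 would mean the batch time scales linearly with `iters`; in this synthetic example the exponent above 1 stands in for overhead that grows with each iteration.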