Commit a5af979

Docs on how to add a new benchmark
1 parent 40b3587 commit a5af979

1 file changed

+54
-0
lines changed
benchmarks/README.md

Lines changed: 54 additions & 0 deletions
@@ -275,6 +275,60 @@ FLAGS:
...
```

# Writing a new benchmark

## Creating or downloading data outside of the benchmark

If you want to create or download the data with Rust as part of running the benchmark, see the next
section on adding a benchmark subcommand and add code to create or download data as part of its
`run` function.

If you want to create or download the data with shell commands, define a new function named
`data_[your benchmark name]` in `benchmarks/bench.sh`. Call that function from a new subcommand
case, named for your benchmark, inside the `data` command case. Also call the new function in the
`data all` case.
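
The shell route above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual contents of `bench.sh`: the benchmark name `mybench`, the file `input.csv`, and the `data_command` wrapper (a simplified stand-in for the script's `data` command case) are all hypothetical.

```shell
# DATA_DIR is assumed to be set by bench.sh; the default only lets this sketch run alone.
DATA_DIR="${DATA_DIR:-./data}"

# Hypothetical benchmark "mybench": create its input data with shell commands.
data_mybench() {
    local out="${DATA_DIR}/mybench"
    mkdir -p "${out}"
    if [ -f "${out}/input.csv" ]; then
        # Skip the work when the data is already present.
        echo "mybench data already exists at ${out}"
    else
        printf 'id,value\n1,10\n2,20\n' > "${out}/input.csv"
        echo "created ${out}/input.csv"
    fi
}

# Simplified stand-in for the `data` command case in bench.sh.
data_command() {
    case "$1" in
        all)
            # ... calls to the existing data_* functions would go here ...
            data_mybench
            ;;
        mybench)
            data_mybench
            ;;
        *)
            echo "unknown benchmark: $1" >&2
            return 1
            ;;
    esac
}
```

With a layout like this, `data_command mybench` creates the data once and reports that it already exists on later calls, so repeated runs should be cheap.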

## Adding the benchmark subcommand

In `benchmarks/bench.sh`, define a new function named `run_[your benchmark name]` following the
example of existing `run_*` functions. Call that function in the `run` command case as a subcommand
case named for your benchmark. Also call the new function in the `run all` case. Add documentation
for your benchmark to the text in the `usage` function.
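
As a rough sketch of the `bench.sh` side, assuming a hypothetical benchmark named `mybench` whose `dfbench` subcommand accepts `--path` and `--output`: the `run_command` wrapper below is a simplified stand-in for the script's `run` command case, and `CARGO_COMMAND` defaults to an `echo` so the sketch prints the command line instead of invoking cargo.

```shell
# Assumed to be set by bench.sh; the defaults only let this sketch run alone.
DATA_DIR="${DATA_DIR:-./data}"
RESULTS_DIR="${RESULTS_DIR:-./results}"
# Stand-in default: print the cargo invocation rather than executing it.
CARGO_COMMAND="${CARGO_COMMAND:-echo cargo run --release}"

# Hypothetical benchmark "mybench": run it through the dfbench binary.
run_mybench() {
    RESULTS_FILE="${RESULTS_DIR}/mybench.json"
    echo "RESULTS_FILE: ${RESULTS_FILE}"
    echo "Running mybench benchmark..."
    # CARGO_COMMAND is intentionally unquoted so it splits into a command plus arguments.
    $CARGO_COMMAND --bin dfbench -- mybench --path "${DATA_DIR}/mybench" --output "${RESULTS_FILE}"
}

# Simplified stand-in for the `run` command case in bench.sh.
run_command() {
    case "$1" in
        all)
            # ... calls to the existing run_* functions would go here ...
            run_mybench
            ;;
        mybench)
            run_mybench
            ;;
        *)
            echo "unknown benchmark: $1" >&2
            return 1
            ;;
    esac
}
```

The function passes `${DATA_DIR}` to `--path` and `"${RESULTS_FILE}"` to `--output`, matching the two structopt fields the `RunOpt` struct is expected to define.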

In `benchmarks/src/bin/dfbench.rs`, add a `dfbench` subcommand for your benchmark by:

- Adding a new variant to the `Options` enum
- Adding corresponding code to handle the new variant in the `main` function, similar to the other
  variants
- Adding a module to the `use datafusion_benchmarks::{}` statement

In `benchmarks/src/lib.rs`, declare the new module you imported in `dfbench.rs` and create the
corresponding file(s) for the module's code.

In the module, following the pattern of other existing benchmarks, define a `RunOpt` struct with:

- A doc comment that will become the `--help` output for the subcommand
- A `run` method that the `dfbench` `main` function will call.
- A `--path` structopt field that the `bench.sh` script should use with `${DATA_DIR}` to define
  where the input data should be stored.
- An `--output` structopt field that the `bench.sh` script should use with `"${RESULTS_FILE}"` to
  define where the benchmark's results should be stored.

### Creating or downloading data as part of the benchmark

Use the `--path` structopt field defined on the `RunOpt` struct to determine where to store or look
for the data. Generate the data using whatever Rust code you'd like, before the code that measures
the operation being benchmarked.

### Collecting data

Your benchmark should create and use an instance of `BenchmarkRun` defined in
`benchmarks/src/util/run.rs` as follows:

- Call its `start_new_case` method with a string that will appear in the "Query" column of the
  compare output.
- Use `write_iter` to record elapsed times for the behavior you're benchmarking.
- When all cases are done, call the `BenchmarkRun`'s `maybe_write_json` method, giving it the value
  of the `--output` structopt field on `RunOpt`.

# Benchmarks

The output of `dfbench` help includes a description of each benchmark, which is reproduced here for convenience