@@ -275,6 +275,60 @@ FLAGS:
275275...
276276```
277277
278+ # Writing a new benchmark
279+
280+ ## Creating or downloading data outside of the benchmark
281+
282+ If you want to create or download the data with Rust as part of running the benchmark, see the next
283+ section on adding a benchmark subcommand and add code to create or download data as part of its
284+ ` run ` function.
285+
286+ If you want to create or download the data with shell commands, in ` benchmarks/bench.sh ` , define a
287+ new function named ` data_[your benchmark name] ` and call that function in the ` data ` command case
288+ as a subcommand case named for your benchmark. Also call the new function in the ` data all ` case.
289+
290+ ## Adding the benchmark subcommand
291+
292+ In ` benchmarks/bench.sh ` , define a new function named ` run_[your benchmark name] ` following the
293+ example of existing ` run_* ` functions. Call that function in the ` run ` command case as a subcommand
294+ case named for your benchmark. Also call the new function in the
295+ ` run all ` case. Add documentation for your benchmark to the text in the ` usage ` function.
296+
297+ In ` benchmarks/src/bin/dfbench.rs ` , add a ` dfbench ` subcommand for your benchmark by:
298+
299+ - Adding a new variant to the ` Options ` enum
300+ - Adding corresponding code to handle the new variant in the ` main ` function, similar to the other
301+ variants
302+ - Adding a module to the ` use datafusion_benchmarks::{} ` statement
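
The three steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual contents of `dfbench.rs`: the module names `my_bench` and `existing_bench`, the `iterations` field, the `dispatch` function, and the `Result<String, String>` return type are all hypothetical, and the structopt derive that the real `Options` enum uses is left out so the sketch builds on its own.

```rust
// Sketch only: the real `dfbench.rs` parses its CLI with structopt and its
// `main` differs in detail. Every name below is a placeholder.

/// Hypothetical stand-in for an already existing benchmark module.
mod existing_bench {
    #[derive(Debug, Default)]
    pub struct RunOpt;

    impl RunOpt {
        pub fn run(&self) -> Result<String, String> {
            Ok("existing_bench ran".to_string())
        }
    }
}

/// Hypothetical stand-in for the module you add for your benchmark.
mod my_bench {
    #[derive(Debug, Default)]
    pub struct RunOpt {
        pub iterations: usize,
    }

    impl RunOpt {
        /// Assumed signature, chosen so the sketch can be checked easily.
        pub fn run(&self) -> Result<String, String> {
            Ok(format!("my_bench ran {} iteration(s)", self.iterations))
        }
    }
}

/// One variant per subcommand; `MyBench` is the newly added variant.
enum Options {
    Existing(existing_bench::RunOpt),
    MyBench(my_bench::RunOpt),
}

/// Mirrors the kind of `match` that `main` performs: each variant
/// forwards to the `run` method of its own `RunOpt`.
fn dispatch(opts: Options) -> Result<String, String> {
    match opts {
        Options::Existing(opt) => opt.run(),
        Options::MyBench(opt) => opt.run(),
    }
}
```

Keeping the dispatch as a plain `match` means the compiler flags any variant that is added to the enum but not yet handled.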
303+
304+ In ` benchmarks/src/lib.rs ` , declare the new module you imported in ` dfbench.rs ` and create the
305+ corresponding file(s) for the module's code.
306+
307+ In the module, following the pattern of other existing benchmarks, define a ` RunOpt ` struct with:
308+
309+ - A doc comment that will become the ` --help ` output for the subcommand
310+ - A ` run ` method that the ` dfbench ` ` main ` function will call.
311+ - A ` --path ` structopt field that the ` bench.sh ` script should use with ` ${DATA_DIR} ` to define
312+ where the input data should be stored.
313+ - An ` --output ` structopt field that the ` bench.sh ` script should use with ` "${RESULTS_FILE}" ` to
314+ define where the benchmark's results should be stored.
315+
316+ ### Creating or downloading data as part of the benchmark
317+
318+ Use the ` --path ` structopt field defined on the ` RunOpt ` struct to know where to store or look for
319+ the data. Generate the data with whatever Rust code you like, but do so before the code being
320+ measured, so that data setup is not included in the recorded timings.
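
As a minimal sketch of that ordering, assuming the data is a small CSV file: the helper names `ensure_data` and `timed_row_count` and the file name `sketch_data.csv` are made up for this example and do not come from the benchmark code. The point is only that the timer starts after the data exists.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Writes `rows` lines of synthetic data under `dir`, unless the file is
/// already there. `dir` plays the role of the `--path` value.
fn ensure_data(dir: &Path, rows: usize) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let file = dir.join("sketch_data.csv"); // hypothetical file name
    if !file.exists() {
        let mut contents = String::from("id,value\n");
        for i in 0..rows {
            contents.push_str(&format!("{},{}\n", i, i * 2));
        }
        fs::write(&file, contents)?;
    }
    Ok(file)
}

/// Times only the measured operation (here, counting the data rows).
/// Data generation has already happened, so it is not part of `elapsed`.
fn timed_row_count(file: &Path) -> io::Result<(Duration, usize)> {
    let start = Instant::now();
    let rows = fs::read_to_string(file)?.lines().skip(1).count();
    Ok((start.elapsed(), rows))
}
```

Because `ensure_data` skips generation when the file exists, repeated runs against the same directory reuse the data instead of rebuilding it.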
321+
322+ ### Collecting data
323+
324+ Your benchmark should create and use an instance of ` BenchmarkRun ` defined in ` benchmarks/src/util/run.rs ` as follows:
325+
326+ - Call its ` start_new_case ` method with a string that will appear in the "Query" column of the
327+ compare output.
328+ - Use ` write_iter ` to record elapsed times for the behavior you're benchmarking.
329+ - When all cases are done, call the ` maybe_write_json ` method of ` BenchmarkRun ` , giving it the value
330+ of the ` --output ` structopt field on ` RunOpt ` .
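
The call sequence above might look like the sketch below. `SketchRun` is a stand-in written for this example, not the real `BenchmarkRun` from `benchmarks/src/util/run.rs`: only the three method names are taken from the text, while the signatures, the stored fields, and the JSON layout are assumptions. Check `run.rs` for the actual API before copying anything.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Stand-in for `BenchmarkRun` (assumed shape, see the note above).
#[derive(Default)]
struct SketchRun {
    /// One entry per case: its name plus (elapsed, row count) per iteration.
    cases: Vec<(String, Vec<(Duration, usize)>)>,
}

impl SketchRun {
    /// Starts a case; the name is what would show up in the "Query" column.
    fn start_new_case(&mut self, id: &str) {
        self.cases.push((id.to_string(), Vec::new()));
    }

    /// Records one timed iteration for the current case.
    fn write_iter(&mut self, elapsed: Duration, row_count: usize) {
        if let Some((_, iters)) = self.cases.last_mut() {
            iters.push((elapsed, row_count));
        }
    }

    /// Writes a tiny hand-built JSON summary only if a path was given.
    fn maybe_write_json(&self, maybe_path: Option<&Path>) -> io::Result<()> {
        if let Some(path) = maybe_path {
            let entries: Vec<String> = self
                .cases
                .iter()
                .map(|(id, iters)| {
                    format!("{{\"query\":\"{}\",\"iterations\":{}}}", id, iters.len())
                })
                .collect();
            fs::write(path, format!("[{}]", entries.join(",")))?;
        }
        Ok(())
    }
}

/// Runs two toy cases, timing only the summation inside each iteration.
fn run_cases(run: &mut SketchRun, iterations: usize) {
    for (name, size) in [("sum_1k", 1_000u64), ("sum_10k", 10_000u64)] {
        run.start_new_case(name);
        for _ in 0..iterations {
            let start = Instant::now();
            let total: u64 = (0..size).sum();
            let elapsed = start.elapsed();
            // Keep the result alive so the work is not optimized away.
            std::hint::black_box(total);
            run.write_iter(elapsed, size as usize);
        }
    }
}
```

A benchmark's `run` method would then call something like `run_cases` and finish with `maybe_write_json`, passing along the `--output` value so that no file is written when the flag is absent.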
331+
278332# Benchmarks
279333
280334The output of ` dfbench ` help includes a description of each benchmark, which is reproduced here for convenience.