This repository contains utility scripts and example functions for running jet
reconstruction benchmarks with the `JetReconstruction.jl` Julia package, and
for comparing with the Fastjet and Python implementations.
Main directory of source files and utilities for benchmarking:
- `benchmark.jl` — run timing tests for different backends, allowing switching of algorithm, strategy, etc. (will run over multiple event input files)
- `generate-benchmarks-{pp,ee,antikt}.sh` — example files showing how to generate a set of benchmark files for various parameters
- `merge-results.jl` — merges per-run-parameters output CSV files into one large results file
Some other utility scripts:
- `hepmc32summary.jl` — write CSV files summarising the content of HepMC3 test input files (average cluster density)
- `fastjet2json.jl` — convert fastjet outputs to JSON files for event validation tests with Julia
- `findmin-test.jl` — tests the speed of finding minima in vectors, using Julia's builtin `findmin` and a turbo version that uses `LoopVectorization` (as is used in `JetReconstruction.jl`); see the sketch below
- `benchmark-substructure.jl` — runs timing tests for substructure utilities in `JetReconstruction.jl`
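For illustration, here is a minimal sketch of the turbo approach, assuming the `LoopVectorization` package (this is in the spirit of, not a copy of, the code in `findmin-test.jl`):

```julia
using LoopVectorization

# Branchless min+index tracking that LoopVectorization can vectorise;
# the ternaries compile to vector selects rather than branches.
function turbo_findmin(v::Vector{Float64})
    best = 1
    vmin = v[1]
    @turbo for i in 2:length(v)
        newmin = v[i] < vmin
        best = newmin ? i : best
        vmin = newmin ? v[i] : vmin
    end
    vmin, best
end

v = rand(10_000)
@assert turbo_findmin(v) == findmin(v)
```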
Example C++ application that runs fastjet, for comparison with the other implementations. Built with CMake.
Use the script `fastjet2json.jl` to convert fastjet outputs to JSON files for
comparison tests with Julia.
Submodule with the Python jet finding code for AntiKt.
Pythia8 application(s) for producing input HepMC3 files.
TODO: Proper CMake setup, but the source files will take you most of the way.
Sorted HepMC3 data files used as reconstruction inputs (compressed).
Note that the `benchmark.jl` script will trigger unpacking of the compressed
data files when running with the fastjet or Python codes (which do not directly
read compressed inputs).
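For illustration, decompression of one input could be done in Julia along these lines (a sketch assuming the `CodecZlib` package; `benchmark.jl` may do this differently):

```julia
using CodecZlib

# Expand a compressed HepMC3 input so codes that cannot read gzip can use it.
stream = GzipDecompressorStream(open("data/events-pp-13TeV-20GeV.hepmc3.gz"))
write("data/events-pp-13TeV-20GeV.hepmc3", read(stream))
close(stream)
```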
Archive of results.
The recommended structure is to store the results from a particular machine in
its own subdirectory, e.g., `M2Pro_OSX_15.3` or `AMD5700_Linux_Alma9.4`.
This is an older set of results (2023 and 2024), where the metadata used in the benchmarking outputs was rather sparse, with few fields and an awkward convention of storing additional metadata in the output filenames. This was fragile and not very extensible, so it has been archived.
Here we describe how to measure events/s using `benchmark.jl` with different
versions of the sequential jet algorithm codes, for a single set of parameters,
usually over a range of input files.
This is the native Julia version of sequential jet reconstruction. It is a
package dependency of this benchmarking package, so the version used will be
whatever is set in the `Manifest.toml`. In particular, if you pick a development
version of the code, you will be able to benchmark different development
branches, etc.
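For example, switching to a development checkout follows the usual `Pkg` workflow (a sketch; adjust to your local checkout or branch):

```julia
using Pkg

Pkg.activate(".")                  # this benchmarking project
Pkg.develop("JetReconstruction")   # or Pkg.develop(path="/path/to/checkout")
Pkg.status("JetReconstruction")    # confirm which version is in Manifest.toml
```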
The Julia version used is the one which runs `benchmark.jl`, so it is
trivial to switch Julia versions in the usual `juliaup` way, i.e.,

```sh
julia +1.9 --project src/benchmark.jl ...
```

would run `JetReconstruction` with Julia 1.9.
There is a small fastjet program in the `fastjet` directory. Use CMake to
compile this to `fastjet/build/fastjet-finder`, which is where `benchmark.jl`
expects to find it, e.g.,

```sh
cd fastjet
cmake -S . -B build
cmake --build build
```

Evidently you can use whatever compiler and flags you like.
The repository used for the native Python versions of jet reconstruction is
`antikt-python`. This is a submodule of this repository, so `benchmark.jl`
expects to find the correct Python scripts in `antikt-python/src`.
Note that the Python environment, with dependencies, needs to be set up manually.
Use the `environment.yml` file provided, with `conda` or `mamba` (you may wish to
tune the main Python version to your needs).
The Python repository contains two versions of the reconstruction algorithms: one that is pure Python and one that uses NumPy and Numba for acceleration.
Note that the Python codes are limited: only the AntiKt algorithm is supported
and only the `N2Plain` and `N2Tiled` strategies can be used. (This is, however,
sufficient to get a feel for the relative speed of these Python codes.)
The `benchmark.jl` script has quite a few options for detailed control of
the jobs it runs. The most important ones are:
```sh
julia --project benchmark.jl --code CODE --algorithm ALG --strategy STRAT --nsamples TRIALS INPUT_FILE(S)
```

The `CODE` option selects the code implementation:
| `CODE` | Code Used |
|---|---|
| `JetReconstruction` | Julia `JetReconstruction.jl` |
| `Fastjet` | C++ Fastjet |
| `AkTPython` | AntiKt jet finder in pure Python |
| `AkTNumPy` | AntiKt jet finder using NumPy and Numba in Python |
`ALG` is the algorithm, in the `JetReconstruction.jl` parlance: `AntiKt`, `CA`,
`Kt`, `GenKt` (for pp events); `Durham`, `EEKt` (for e+e- events). For `GenKt`
and `EEKt` the additional argument `--power p` is required to fully specify the
reconstruction.
`STRAT` is the strategy, which is either `N2Plain` or `N2Tiled` (`Best` is also
an option, which delegates the strategy choice to the application).
`TRIALS` is the number of times to run over the event sample (N.B. the result
reported will be the minimum value obtained, which is the most stable
benchmark).
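The minimum is preferred because timing noise (OS scheduling, garbage collection, etc.) only ever adds time; schematically, the reported number behaves like this sketch (not the script's actual code):

```julia
# Time a workload `trials` times and keep the fastest repeat.
best_time(f, trials) = minimum(@elapsed(f()) for _ in 1:trials)

best_time(() -> sum(rand(10^6)), 16)   # 16 repeats, cf. --nsamples 16
```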
`INPUT_FILE` is either:

- a list of specific input files to run over (e.g., `data/events-pp-13TeV-20GeV.hepmc3.gz`), or
- a CSV file containing a list of input files, with filenames and mean particle densities (e.g., `data/events-summary-ee-pp.csv`)

In general the former is used for testing and the latter for full benchmark runs.
Other options of note:

- `--radius R`: for algorithms which have a variable radius parameter, this is set to `R` (default 0.4)
- `--power p`: for algorithms which have a variable power, this is set to `p`; for algorithms with a fixed power this must be set consistently or the run will be aborted
- `--ptmin PTMIN`: for these benchmarking runs an inclusive jet selection will be done with this `ptmin` cut (this has little influence on the results)
To conduct a benchmarking run there are a few other parameters that should be specified:
- `--results OUTPUT`: write the final CSV data out to `OUTPUT`; if `OUTPUT` is a directory then `benchmark.jl` will use a near-unique filename determined from the parameters of the run, which should be sufficient for a systematic series of benchmark runs
- `--code-version`: write the version of the code used into the output; there is no good way for `benchmark.jl` to detect this, so it is a user-specified string (e.g., `3.4.3` for Fastjet 3.4.3; `0.4.6` for a recent version of `JetReconstruction.jl`)
- `--backend`: the backend compiler/interpreter, e.g., `julia` or `gcc`; for the Python and Julia codes this can be determined automatically; for Fastjet it will be set to `C++`, but should really be the compiler that was used to build the application
- `--backend-version`: the version of the backend compiler/interpreter, e.g., `1.11.5` for Julia 1.11.5; for Python and Julia this can be determined automatically, but for the C++ compiler used for Fastjet it cannot and must be specified
When the benchmark runs, all important parameters are written into the final CSV output file.
It is recommended to process these files as DataFrames.
Here is an example:
```sh
julia --project=. src/benchmark.jl --code JetReconstruction --algorithm CA --strategy N2Tiled -m 16 -R 1.0 --code-version 0.4.6 --results test.csv data/events-summary-ee-pp.csv
```
```text
12×13 DataFrame
 Row │ File                             mean_particles  File_path                          n_samples  time_per_event  code               code_version  algorithm  strategy  R        p      backend  backend_version
     │ String31                         Float64         String                             Int64      Float64         Code               String        Algorithm  Strategy  Float64  Int64  String   String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ events-ee-Z.hepmc3.gz                     43.05  data/events-ee-Z.hepmc3.gz                16         29.5892  JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   2 │ events-ee-H.hepmc3.gz                     64.97  data/events-ee-H.hepmc3.gz                16         46.5575  JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   3 │ events-pp-0.5TeV-5GeV.hepmc3.gz          112.62  data/events-pp-0.5TeV-5GeV.hepmc…         16         69.0592  JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   4 │ events-pp-1TeV-5GeV.hepmc3.gz            160.36  data/events-pp-1TeV-5GeV.hepmc3.…         16        108.462   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   5 │ events-pp-2TeV-5Gev.hepmc3.gz            188.21  data/events-pp-2TeV-5Gev.hepmc3.…         16        133.251   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   6 │ events-pp-2TeV.hepmc3.gz                 226.98  data/events-pp-2TeV.hepmc3.gz             16        171.804   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   7 │ events-pp-5TeV-10GeV.hepmc3.gz           284.15  data/events-pp-5TeV-10GeV.hepmc3…         16        220.633   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   8 │ events-pp-8TeV-20GeV.hepmc3.gz           354.18  data/events-pp-8TeV-20GeV.hepmc3…         16        298.599   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
   9 │ events-pp-13TeV-20GeV.hepmc3.gz          431.18  data/events-pp-13TeV-20GeV.hepmc…         16        379.04    JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
  10 │ events-pp-20TeV-20GeV.hepmc3.gz          524.59  data/events-pp-20TeV-20GeV.hepmc…         16        486.832   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
  11 │ events-pp-20TeV-50GeV.hepmc3.gz          553.64  data/events-pp-20TeV-50GeV.hepmc…         16        536.417   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
  12 │ events-pp-30TeV-50GeV.hepmc3.gz          632.29  data/events-pp-30TeV-50GeV.hepmc…         16        632.413   JetReconstruction  0.4.6         CA         N2Tiled       1.0      0  Julia    1.11.5
```

The CSV file is generated from the DataFrame in the standard way, but the fields are easier to read in the DataFrame output above.
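To process the output programmatically, the CSV can be read straight back into a DataFrame (a minimal sketch, assuming the `CSV` and `DataFrames` packages):

```julia
using CSV, DataFrames

# Read the benchmark output from the run above; every run parameter is a column.
df = CSV.read("test.csv", DataFrame)
first(df, 5)
```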
To run a sweep over parameters of interest there are some shell scripts that can
drive the process; see the files `src/generate-benchmarks*.sh`.
The scripts take one parameter, which is the output directory for all the results, e.g.,

```sh
./src/generate-benchmarks-ee.sh results/M2Pro_OSX_15.3.2
```

N.B. These example scripts do not set `code-version`, `backend` or
`backend-version`.
After a systematic sweep script has run there are many output files in the output
directory. To merge these files use the `merge-results.jl` script, e.g.,

```sh
cd results/test_run
julia --project=../.. ../../src/merge-results.jl *.csv
```

The default is to output the merged CSV file to `all-results.csv`; this can be
adjusted with the `--output` option.
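Conceptually, the merge is just a concatenation of the per-run tables; a rough `DataFrames` equivalent (not `merge-results.jl` itself) would be:

```julia
using CSV, DataFrames

# Concatenate every per-run CSV in the current directory into one table.
files = filter(endswith(".csv"), readdir())
merged = reduce(vcat, (CSV.read(f, DataFrame) for f in files))
CSV.write("all-results.csv", merged)
```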
When a final merged output file has been generated, the analysis usually proceeds by using a Pluto notebook that ingests the data, makes appropriate selections, and plots the results.
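As a sketch of such a notebook cell (assuming `CSV`, `DataFrames` and, here, the `Plots` package; the column names are those shown in the example output above):

```julia
using CSV, DataFrames, Plots

df = CSV.read("all-results.csv", DataFrame)
# Select one algorithm/strategy combination and order by event density.
sel = sort(subset(df, :algorithm => ByRow(==("CA")),
                      :strategy => ByRow(==("N2Tiled"))),
           :mean_particles)
plot(sel.mean_particles, sel.time_per_event;
     marker = :circle, xlabel = "mean particles per event",
     ylabel = "time per event", label = "JetReconstruction, CA, N2Tiled")
```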
Currently examples for the new-style metadata are a work in progress, but the
older-style notebooks `results/*-nb.jl` can mostly be recycled for this purpose.