Commit 1e0f608

Benchmarks re-org (#5676)
This PR makes two big changes to our benchmarking setup:

1. It splits the duckdb, datafusion and lance benchmarks into their own crates, so they don't depend on each other. This minimizes friction when bumping dependencies like Arrow or DataFusion.
2. It introduces a new benchmarking tool, `vx-bench`, which compares runs, persists results, and takes care of building and running the individual binaries. `vx-bench` is written in Python using `typer`.

I've tried not to touch the CI setup too much, but it can be simplified and improved significantly to fit this structure.

## Notes for reviewers

1. DF/Arrow results are worse because the data is chunked differently: we use a `MemTable`, which is just a `Vec<RecordBatch>`, and that vector is now longer.
2. ~I'm looking into the differences in Vortex results, not sure what's going on there.~ Turns out the `vortex-compact` results are slower because they actually run on data compressed with the compact compressor! Some of the tables are about half the size, but they are slower to decompress/process.
3. ~Seems like we've been dropping some compression benchmark results, so they don't have a baseline (or I changed the name) I'm trying to figure out what's going on there.~ I've changed the names of some compression benchmarks; the ratios keep the same names.

## `vx-bench` demo

![render1765993255912](https://github.com/user-attachments/assets/0e0d3c9d-e7f9-47b6-bdff-e41a34e46348)

---------

Signed-off-by: Adam Gutglick <[email protected]>
1 parent 155ca5b commit 1e0f608

274 files changed (+8618, -6844 lines)

.github/workflows/bench-pr.yml

Lines changed: 5 additions & 5 deletions
@@ -44,9 +44,9 @@ jobs:
     strategy:
       matrix:
         benchmark:
-          - id: random_access
+          - id: random-access-bench
             name: Random Access
-          - id: compress
+          - id: compress-bench
             name: Compression
     if: ${{ contains(github.event.head_commit.message, '[benchmark]') || github.event.label.name == 'benchmark' && github.event_name == 'pull_request' }}
     steps:
@@ -72,7 +72,7 @@ jobs:
         env:
           RUSTFLAGS: "-C target-cpu=native -C force-frame-pointers=yes"
         run: |
-          cargo build --bin ${{ matrix.benchmark.id }} --package bench-vortex --profile release_debug
+          cargo build --package ${{ matrix.benchmark.id }} --profile release_debug
 
       - name: Setup Polar Signals
         if: github.event.pull_request.head.repo.fork == false
@@ -89,7 +89,7 @@ jobs:
         env:
           RUST_BACKTRACE: full
         run: |
-          target/release_debug/${{ matrix.benchmark.id }} -d gh-json -o ${{ matrix.benchmark.id }}.json
+          target/release_debug/${{ matrix.benchmark.id }} -d gh-json -o results.json
 
       - name: Setup AWS CLI
         if: github.event.pull_request.head.repo.fork == false
@@ -123,7 +123,7 @@ jobs:
 
           echo '# Benchmarks: ${{ matrix.benchmark.name }}' > comment.md
           echo '' >> comment.md
-          uv run --no-project scripts/compare-benchmark-jsons.py base.json ${{ matrix.benchmark.id }}.json "${{ matrix.benchmark.name }}" \
+          uv run --no-project scripts/compare-benchmark-jsons.py base.json results.json "${{ matrix.benchmark.name }}" \
             >> comment.md
 
       - name: Comment PR
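
In this workflow the matrix `id` now doubles as the Cargo package name and as the name of the binary under `target/release_debug/`, while the output file is always `results.json`. The sketch below restates that convention in Python. The helper names `build_command` and `run_command` are hypothetical and are not part of the repository; the flags are copied from the diff above.

```python
import shlex


def build_command(benchmark_id: str, profile: str = "release_debug") -> list[str]:
    """Build step: the matrix id is passed as the Cargo package name."""
    return ["cargo", "build", "--package", benchmark_id, "--profile", profile]


def run_command(
    benchmark_id: str,
    profile: str = "release_debug",
    output: str = "results.json",
) -> list[str]:
    """Run step: assumes the binary is named after the package (as the workflow does)."""
    return [f"target/{profile}/{benchmark_id}", "-d", "gh-json", "-o", output]


if __name__ == "__main__":
    for bench_id in ("random-access-bench", "compress-bench"):
        print(shlex.join(build_command(bench_id)))
        print(shlex.join(run_command(bench_id)))
```

Because the output name no longer depends on the id, later steps such as the comparison script can refer to `results.json` without interpolating the matrix value.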

.github/workflows/bench.yml

Lines changed: 21 additions & 22 deletions
@@ -68,7 +68,7 @@ jobs:
           RUSTFLAGS: "-C target-cpu=native -C force-frame-pointers=yes"
         # The main difference between this and `bench-pr.yml` is that we add the `lance` feature.
         run: |
-          cargo build --bin ${{ matrix.benchmark.id }} --package bench-vortex --profile release_debug --features lance
+          cargo build --bin ${{ matrix.benchmark.id }} --package vortex-bench --profile release_debug --features lance
 
       - name: Setup Polar Signals
         uses: polarsignals/[email protected]
@@ -84,7 +84,7 @@ jobs:
         env:
           RUST_BACKTRACE: full
         run: |
-          target/release_debug/${{ matrix.benchmark.id }} -d gh-json -o ${{ matrix.benchmark.id }}.json --formats parquet,lance,vortex
+          target/release_debug/${{ matrix.benchmark.id }} --formats parquet,lance,vortex -o results.json
 
       - name: Setup AWS CLI
         uses: aws-actions/configure-aws-credentials@v5
@@ -95,7 +95,8 @@ jobs:
       - name: Upload Benchmark Results
         shell: bash
         run: |
-          bash scripts/cat-s3.sh vortex-benchmark-results-database data.json.gz ${{ matrix.benchmark.id }}.json
+          bash scripts/cat-s3.sh vortex-benchmark-results-database data.json.gz results.json
+
   sql:
     uses: ./.github/workflows/sql-benchmarks.yml
     secrets: inherit
@@ -108,73 +109,71 @@ jobs:
             "subcommand": "clickbench",
             "name": "Clickbench on NVME",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact,duckdb:duckdb",
-            "build_args": "--features lance"
+            "build_lance": true
           },
           {
             "id": "tpch-nvme",
             "subcommand": "tpch",
             "name": "TPC-H SF=1 on NVME",
             "targets": "datafusion:arrow,datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact,duckdb:duckdb",
-            "scale_factor": "--scale-factor 1.0",
-            "build_args": "--features lance"
+            "scale_factor": "1.0",
+            "build_lance": true
           },
           {
             "id": "tpch-s3",
             "subcommand": "tpch",
             "name": "TPC-H SF=1 on S3",
-            "local_dir": "bench-vortex/data/tpch/1.0",
+            "local_dir": "vortex-bench/data/tpch/1.0",
             "remote_storage": "s3://vortex-bench-dev-eu/${{github.ref_name}}/${{github.run_id}}/tpch/1.0/",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact",
-            "scale_factor": "--scale-factor 1.0",
-            "build_args": "--features lance"
+            "scale_factor": "1.0",
+            "build_lance": true
           },
           {
             "id": "tpch-nvme-10",
             "subcommand": "tpch",
             "name": "TPC-H SF=10 on NVME",
             "targets": "datafusion:arrow,datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact,duckdb:duckdb",
-            "scale_factor": "--scale-factor 10.0",
-            "build_args": "--features lance"
+            "scale_factor": "10.0",
+            "build_lance": true
           },
           {
             "id": "tpch-s3-10",
             "subcommand": "tpch",
             "name": "TPC-H SF=10 on S3",
-            "local_dir": "bench-vortex/data/tpch/10.0",
+            "local_dir": "vortex-bench/data/tpch/10.0",
             "remote_storage": "s3://vortex-bench-dev-eu/${{github.ref_name}}/${{github.run_id}}/tpch/10.0/",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact",
-            "scale_factor": "--scale-factor 10.0",
-            "build_args": "--features lance"
+            "scale_factor": "10.0",
+            "build_lance": true
           },
           {
             "id": "tpcds-nvme",
             "subcommand": "tpcds",
             "name": "TPC-DS SF=1 on NVME",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact,duckdb:duckdb",
-            "scale_factor": "--scale-factor 1.0"
+            "scale_factor": "1.0"
           },
           {
             "id": "statpopgen",
             "subcommand": "statpopgen",
             "name": "Statistical and Population Genetics",
-            "local_dir": "bench-vortex/data/statpopgen",
+            "local_dir": "vortex-bench/data/statpopgen",
             "targets": "duckdb:parquet,duckdb:vortex,duckdb:vortex-compact",
-            "scale_factor": "--scale-factor 100"
+            "scale_factor": "100"
           },
           {
             "id": "fineweb",
             "subcommand": "fineweb",
             "name": "FineWeb NVMe",
-            "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact",
-            "scale_factor": "--scale-factor 100"
+            "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact"
           },
           {
             "id": "fineweb-s3",
             "subcommand": "fineweb",
             "name": "FineWeb S3",
-            "local_dir": "bench-vortex/data/fineweb",
+            "local_dir": "vortex-bench/data/fineweb",
             "remote_storage": "s3://vortex-bench-dev-eu/${{github.ref_name}}/${{github.run_id}}/fineweb/",
-            "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact",
-            "scale_factor": "--scale-factor 100"
+            "targets": "datafusion:parquet,datafusion:vortex,datafusion:vortex-compact,duckdb:parquet,duckdb:vortex,duckdb:vortex-compact"
           },
         ]
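
The matrix changes in this file follow three mechanical rules: `scale_factor` now carries only the value rather than a pre-formatted `--scale-factor ...` flag, `"build_args": "--features lance"` becomes the boolean `"build_lance": true`, and `local_dir` paths move from `bench-vortex/` to `vortex-bench/`. A minimal sketch of those rules, assuming entries are plain dictionaries; `migrate_entry` is a hypothetical helper written for illustration, not something the PR ships.

```python
def migrate_entry(entry: dict) -> dict:
    """Return a copy of an old-style matrix entry rewritten in the new style."""
    new = dict(entry)  # shallow copy so the caller's entry is left untouched

    # "--scale-factor 1.0" -> "1.0": keep only the value, not the CLI flag.
    scale = new.get("scale_factor")
    if isinstance(scale, str):
        parts = scale.split()
        if len(parts) == 2 and parts[0] == "--scale-factor":
            new["scale_factor"] = parts[1]

    # "--features lance" -> build_lance: true. Any other build args are
    # rejected because this sketch only knows the one case seen in the diff.
    build_args = new.pop("build_args", None)
    if build_args is not None:
        if build_args.strip() != "--features lance":
            raise ValueError(f"unexpected build_args: {build_args!r}")
        new["build_lance"] = True

    # bench-vortex/... -> vortex-bench/... for local data directories.
    old_prefix, new_prefix = "bench-vortex/", "vortex-bench/"
    local_dir = new.get("local_dir")
    if isinstance(local_dir, str) and local_dir.startswith(old_prefix):
        new["local_dir"] = new_prefix + local_dir[len(old_prefix):]

    return new
```

Note that the diff also drops `scale_factor` entirely from the two FineWeb entries; this sketch does not model that and would convert the value instead.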

.github/workflows/ci.yml

Lines changed: 7 additions & 5 deletions
@@ -399,7 +399,8 @@ jobs:
         if: ${{ matrix.suite == 'tpc-h' }}
         # We use i2 to ensure that restarting the duckdb connection succeeds
         run: |
-          cargo run --bin query_bench -- tpch -i2 --targets "datafusion:vortex,datafusion:vortex-compact,duckdb:vortex,duckdb:vortex-compact" --scale-factor 0.1
+          cargo run --bin datafusion-bench -- tpch -i2 --formats "vortex,vortex-compact" --opt scale-factor=0.1
+          cargo run --bin duckdb-bench -- tpch -i2 --formats "vortex,vortex-compact" --opt scale-factor=0.1
       - name: Run FFI Example
         if: ${{ matrix.suite == 'ffi' }}
         run: |
@@ -411,12 +412,12 @@ jobs:
         run: |
           grcov . --binary-path target/debug/ -s . -t lcov --llvm --ignore-not-existing \
             --threads $(nproc) \
-            --ignore '../*' --ignore '/*' --ignore 'fuzz/*' --ignore 'bench-vortex/*' \
+            --ignore '../*' --ignore '/*' --ignore 'fuzz/*' --ignore 'vortex-bench/*' \
             --ignore 'home/*' --ignore 'xtask/*' --ignore 'target/*' --ignore 'vortex-error/*' \
             --ignore 'vortex-python/*' --ignore 'vortex-jni/*' --ignore 'vortex-flatbuffers/*' \
             --ignore 'vortex-proto/*' --ignore 'vortex-tui/*' --ignore 'vortex-datafusion/examples/*' \
             --ignore 'vortex-ffi/examples/*' --ignore '*/arbitrary/*' --ignore '*/arbitrary.rs' --ignore 'vortex-cxx/*' \
-            --ignore 'vortex-gpu/*' \
+            --ignore 'vortex-gpu/*' --ignore benchmarks/* \
             -o ${{ env.GRCOV_OUTPUT_FILE }}
       - name: Codecov
         uses: codecov/codecov-action@v5
@@ -528,10 +529,11 @@ jobs:
           tool: nextest
       - name: Rust Tests (Windows)
         if: matrix.os == 'windows-x64'
-        run: cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude bench-vortex --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz
+        run: |
+          cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench
       - name: Rust Tests (Other)
         if: matrix.os != 'windows-x64'
-        run: cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude bench-vortex --exclude vortex-duckdb
+        run: cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude vortex-bench --exclude vortex-duckdb
 
   build-java:
     name: "Java"
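
The TPC-H smoke test shows how the single `query_bench` invocation with `engine:format` targets maps onto one invocation per engine binary, each taking a plain `--formats` list and `--opt key=value` options. The sketch below illustrates that mapping. `split_targets` and `per_engine_commands` are hypothetical helpers, and the `<engine>-bench` naming rule is inferred from the two binaries that appear in the diff (`datafusion-bench`, `duckdb-bench`), so treat it as an assumption.

```python
def split_targets(targets: str) -> dict[str, list[str]]:
    """Group "engine:format" pairs by engine, preserving first-seen order."""
    by_engine: dict[str, list[str]] = {}
    for target in targets.split(","):
        engine, fmt = target.strip().split(":", 1)
        by_engine.setdefault(engine, []).append(fmt)
    return by_engine


def per_engine_commands(
    subcommand: str,
    targets: str,
    iterations: int,
    options: dict[str, str],
) -> list[list[str]]:
    """Produce one `cargo run` argument list per engine found in `targets`."""
    commands = []
    for engine, formats in split_targets(targets).items():
        cmd = [
            "cargo", "run", "--bin", f"{engine}-bench", "--",
            subcommand, f"-i{iterations}",
            "--formats", ",".join(formats),
        ]
        for key, value in options.items():
            cmd += ["--opt", f"{key}={value}"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    old_targets = (
        "datafusion:vortex,datafusion:vortex-compact,"
        "duckdb:vortex,duckdb:vortex-compact"
    )
    for cmd in per_engine_commands("tpch", old_targets, 2, {"scale-factor": "0.1"}):
        print(" ".join(cmd))
```

Fed the old `--targets` string, this yields the same two command lines that the updated step runs, without the shell quoting around the formats list.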

.github/workflows/nightly-bench.yml

Lines changed: 6 additions & 6 deletions
@@ -38,34 +38,34 @@ jobs:
             "subcommand": "tpch",
             "name": "TPC-H on NVME",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:lance,duckdb:parquet,duckdb:vortex,duckdb:duckdb",
-            "scale_factor": "--scale-factor 10.0",
+            "scale_factor": "10.0",
             "build_args": "--features lance"
           },
           {
             "id": "tpch-s3",
             "subcommand": "tpch",
             "name": "TPC-H on S3",
-            "local_dir": "bench-vortex/data/tpch/10.0",
+            "local_dir": "vortex-bench/data/tpch/10.0",
             "remote_storage": "s3://vortex-bench-dev-eu/${{github.ref_name}}/${{github.run_id}}/tpch/10.0/",
             "targets": "datafusion:parquet,datafusion:vortex,datafusion:lance,duckdb:parquet,duckdb:vortex",
-            "scale_factor": "--scale-factor 10.0",
+            "scale_factor": "10.0",
             "build_args": "--features lance"
           },
           {
             "id": "tpch-nvme",
             "subcommand": "tpch",
             "name": "TPC-H on NVME",
             "targets": "datafusion:parquet,duckdb:parquet,duckdb:vortex",
-            "scale_factor": "--scale-factor 100"
+            "scale_factor": "100"
           },
           {
             "id": "tpch-s3",
             "subcommand": "tpch",
             "name": "TPC-H on S3",
-            "local_dir": "bench-vortex/data/tpch/100.0",
+            "local_dir": "vortex-bench/data/tpch/100.0",
             "remote_storage": "s3://vortex-bench-dev-eu/${{github.ref_name}}/${{github.run_id}}/tpch/100.0/",
             "targets": "datafusion:parquet,duckdb:parquet,duckdb:vortex",
-            "scale_factor": "--scale-factor 100.0"
+            "scale_factor": "100.0"
           },
         ]
     strategy:
