Benchmark restructuring #5674
AdamGS
started this conversation in
Feature Requests
Replies: 3 comments
SGTM
It would be great to keep the ratios between DataFusion and DuckDB available locally.
People are welcome to follow along here
As of this writing, we're stuck in dependency hell: we can't upgrade our DataFusion/Arrow dependencies because we pull in Lance to benchmark against, and Lance is still on an older version because of its own dependencies.
Our benchmarks also suffer from long build times, because every build includes (at least) DataFusion and DuckDB, which are both pretty big, so iteration times are not great.
I propose the following structure:
- `bench-vortex` becomes `vortex-bench` and will only have the minimal shared code required for the benchmarks: the JSON formatting, data generation, query and workload enumeration, potentially some shared CLI arguments, and ideally Vortex-specific code behind a feature.
- A `benches` directory that has a bunch of dedicated benchmarking crates, one for each big engine. This way, we can have a different DataFusion version than Lance.
- An entry point (for example an `xtask`) that can invoke those dedicated runners, so the overall experience stays as streamlined as possible.

Some of the non-SQL benchmarks currently depend on post-processing to get ratios between Vortex, Parquet, and Lance, but we can shift those to the website.
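To make the runner idea a bit more concrete, here is a minimal sketch of how such a dispatcher could map an engine name to a `cargo run` invocation of a dedicated crate. Everything named here is an assumption for illustration only: the `Engine` enum, the package names `datafusion-bench`, `duckdb-bench`, and `lance-bench`, and the helper functions are hypothetical and not part of the proposal or the existing repository.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Engines that would each get a dedicated benchmark crate.
/// (Hypothetical set, chosen only for this sketch.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Engine {
    DataFusion,
    DuckDb,
    Lance,
}

impl Engine {
    /// Parse an engine name given on the command line (case-insensitive).
    fn parse(name: &str) -> Option<Engine> {
        match name.to_ascii_lowercase().as_str() {
            "datafusion" | "df" => Some(Engine::DataFusion),
            "duckdb" => Some(Engine::DuckDb),
            "lance" => Some(Engine::Lance),
            _ => None,
        }
    }

    /// Package name of the dedicated crate. These names are assumptions.
    fn package(self) -> &'static str {
        match self {
            Engine::DataFusion => "datafusion-bench",
            Engine::DuckDb => "duckdb-bench",
            Engine::Lance => "lance-bench",
        }
    }
}

/// Build the argument list for `cargo`, forwarding any extra
/// benchmark arguments after `--` to the selected crate's binary.
fn cargo_args(engine: Engine, bench_args: &[String]) -> Vec<String> {
    let mut args: Vec<String> = ["run", "--release", "-p", engine.package(), "--"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.extend(bench_args.iter().cloned());
    args
}

/// Spawn `cargo` for the selected engine and wait for it to finish.
/// Only the selected crate (and its dependencies) gets built.
fn run_engine(engine: Engine, bench_args: &[String]) -> io::Result<ExitStatus> {
    Command::new("cargo")
        .args(cargo_args(engine, bench_args))
        .status()
}
```

A `main` in the runner would parse the first command-line argument with `Engine::parse`, pass the remaining arguments to `run_engine`, and exit with the child's status code. Because only the requested package is built, a change to one engine's crate would not force a rebuild of the others.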