Conversation

Collaborator

@robtandy robtandy commented Aug 8, 2025

Edit:

  • Moved validation out of the benchmarks and into @jayshrivastava's TPCH validation tests
  • Incorporated improvements suggested in the review comments
  • Ignored TPCH query 22 until support for NestedLoopJoinExec is added

This PR fixes a logical bug in execution and also adds a --validate flag to the benchmarks to confirm that we calculate the correct result vs the single-node case.

To run the TPCH benchmark in a distributed fashion and validate it against single-node execution, follow the README in the benchmarks crate.

Note that the particular approach to distributed execution that this library takes requires all joins to be partitioned joins; in particular, we do not support CollectLeft. So, the following modifications to the session config before planning are required for correct operation:

// Force all hash joins to be planned as partitioned joins rather than CollectLeft
config.options_mut().optimizer.hash_join_single_partition_threshold = 0;
config.options_mut().optimizer.hash_join_single_partition_threshold_rows = 0;

// Prefer hash joins over sort-merge joins during planning
config.options_mut().optimizer.prefer_hash_join = true;

At the moment this is set in the benchmark crate, but we really need to make this easy for the user and prevent them from misconfiguring these values. I'm not sure how to do this yet; I think we can refactor in a follow-up to this PR.
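One possible shape for that follow-up, sketched here purely as an idea (the function name distributed_session_config is made up and not part of this PR), is a small constructor that hands users a SessionConfig with the required settings already baked in:

use datafusion::prelude::SessionConfig;

// Hypothetical helper, not part of this PR: returns a SessionConfig with the
// settings this library requires for correct distributed execution.
pub fn distributed_session_config() -> SessionConfig {
    let mut config = SessionConfig::new();
    // Force partitioned hash joins; CollectLeft is not supported.
    config.options_mut().optimizer.hash_join_single_partition_threshold = 0;
    config.options_mut().optimizer.hash_join_single_partition_threshold_rows = 0;
    // Prefer hash joins over sort-merge joins during planning.
    config.options_mut().optimizer.prefer_hash_join = true;
    config
}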

Regarding support for other hash join modes: I think we can add that, and then use the benchmark to compare and evaluate the options for execution speed.

Collaborator

@gabotechs gabotechs left a comment


This looks really good! Left just some minor comments, but it looks pretty much there.

Comment on lines 106 to 108

#[structopt(long = "validate")]
validate: bool,
Collaborator


What do you think about moving forward with @jayshrivastava's changes in #83 for validating TPCH correctness instead of this? It might be slightly better to ensure validation there because:

  • It would be nice to touch this code as little as possible, as this is pretty much vendored code from upstream DataFusion, and if we decide to move this project there or upstream DataFusion decides to make their benchmarks crate public, it would be difficult to port because of conflicts
  • We want to ensure TPCH correctness in the CI, so it might be more suitable to do it as a mandatory test suite using Cargo test tools rather than an optional step during the benchmarks

Collaborator Author


Yep, I thought the same thing, and I moved it out of the benchmarks and aligned it with @jayshrivastava's PR.

Comment on lines 71 to 76
/*println!(
"{} Task {:?} executing partition {}",
stage.name(),
task.partition_group,
partition
);*/
Collaborator


remove this maybe?

Comment on lines +105 to +106
let (state, stage) = once_stage
.get_or_try_init(|| async {
Collaborator

@gabotechs gabotechs Aug 11, 2025


This will hold the once_stage RefMut across the initialization, keeping a shard of the self.stages dashmap locked across an asynchronous gap, which might be too much locking.

Fortunately, it's very easy to prevent this (a fuller sketch of the call site follows the list):

  • we can make the OnceCell a shared reference:

    pub(super) stages: DashMap<StageKey, Arc<OnceCell<(SessionState, Arc<ExecutionStage>)>>>,

  • and then immediately drop the reference to the dashmap entry:

    let once_stage = {
        let entry = self.stages.entry(key).or_default();
        Arc::clone(&entry)
        // <- the dashmap RefMut gets dropped here, releasing the lock for the current shard
    };
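For illustration, a rough sketch of what the call site would then look like once the Arc<OnceCell> has been cloned out of the map (build_stage is a hypothetical helper name, not the actual function):

// The dashmap shard lock has already been released above, so awaiting inside
// get_or_try_init no longer blocks other entries in the same shard.
let (state, stage) = once_stage
    .get_or_try_init(|| async {
        // hypothetical helper that builds the SessionState and ExecutionStage
        build_stage(&key).await
    })
    .await?;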

Collaborator Author

@robtandy robtandy Aug 12, 2025


A good improvement. Added.

Collaborator


This file contained some tests that covered the behavior of sharing a single execution node across multiple callers via a dashmap. It's a shame to delete them; I would have expected them to still be valid here.

If you see no path forward in keeping those tests it's fine, we can build new ones eventually.

Collaborator Author


I've added it back, but I'm not sure of its function or necessity, as it's not referenced anywhere.

))?;
stream_from_stage_task(ticket.clone(), &url, schema.clone(), &channel_manager).await
let futs = child_stage_tasks.iter().enumerate().map(|(i, task)| {
    let i_capture = i;
Collaborator


🤔 i_capture? It should not be necessary to capture any variables that implement the Copy trait, right?

Collaborator Author


Good eye and yes, not needed!
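For reference, a minimal standalone illustration of the point (not the project's code): usize is Copy, so a move closure or async block copies i by value and no intermediate binding is needed.

fn example() {
    // `usize` is Copy, so the async move block copies `i` directly;
    // no `let i_capture = i;` binding is required.
    let _futs: Vec<_> = (0..3usize)
        .map(|i| async move { i * 2 })
        .collect();
}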

Comment on lines 49 to 50
assert_snapshot!(physical_distributed_str,
/*assert_snapshot!(physical_distributed_str,
Collaborator


Does this need to be commented out? I think it should be fine to leave this uncommented; it should work.

FYI, this is using https://github.com/mitsuhiko/insta, which means that if the test fails, you can just do:

cargo insta review

And you will be prompted to accept the changes

You can install cargo insta with:

curl -LsSf https://insta.rs/install.sh | sh
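For context, a minimal insta snapshot test looks roughly like this (illustrative only, not the test from this PR):

use insta::assert_snapshot;

#[test]
fn physical_plan_snapshot() {
    // insta records the value in a .snap file on the first run; if the output
    // changes later, `cargo insta review` prompts you to accept or reject it.
    let physical_plan_str = "ProjectionExec: expr=[a@0 as a]";
    assert_snapshot!(physical_plan_str);
}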

Collaborator Author


Wow, insta is awesome! 💯 Done.

jayshrivastava added a commit that referenced this pull request Aug 13, 2025
This change adds a DashMap-like struct which has a background task to clean up entries that have outlived a configurable TTL.

This struct is similar to https://github.com/moka-rs/moka, which also uses time wheels. Having our own module avoids introducing a large dependency, which keeps this project closer to vanilla DataFusion.

This change is meant to be useful for #89, where it's possible for `ExecutionStages` to be orphaned in `ArrowFlightEndpoint`. We need an async task to clean up old entries.

Informs: #90
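For intuition, a rough sketch of such a TTL map (illustrative only; the actual struct is DashMap-backed and uses time wheels rather than a plain sweep loop):

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Sketch of the idea: a map whose entries expire after a TTL, swept by a
// background task. Names and structure are illustrative, not the real module.
struct TtlMap<K, V> {
    inner: Arc<Mutex<HashMap<K, (Instant, V)>>>,
}

impl<K, V> TtlMap<K, V>
where
    K: std::hash::Hash + Eq + Send + 'static,
    V: Send + 'static,
{
    fn new(ttl: Duration) -> Self {
        let inner = Arc::new(Mutex::new(HashMap::new()));
        let sweep = Arc::clone(&inner);
        // Background task that periodically drops entries older than the TTL.
        tokio::spawn(async move {
            loop {
                tokio::time::sleep(ttl).await;
                let now = Instant::now();
                sweep.lock().unwrap().retain(|_, entry| now.duration_since(entry.0) < ttl);
            }
        });
        Self { inner }
    }

    fn insert(&self, key: K, value: V) {
        self.inner.lock().unwrap().insert(key, (Instant::now(), value));
    }
}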
Collaborator

@gabotechs gabotechs left a comment


This is awesome! 💯 In it goes.

@gabotechs gabotechs merged commit a2a8163 into main Aug 13, 2025
3 checks passed
@gabotechs gabotechs deleted the robtandy/fix_execution_bug branch August 13, 2025 09:27
jayshrivastava added a commit that referenced this pull request Aug 17, 2025