Skip to content

Conversation

@askalt
Copy link
Contributor

@askalt askalt commented Jan 14, 2026

Which issue does this PR close?

What changes are included in this PR?

This patch adds a benchmark which is intended to measure overhead of actions, required to perform making an independent instance of the execution plan to re-execute it, avoiding re-planning stage. There are several typical plans that are tested, covering projection, aggregation, filtration, re-partition.

Are there any user-facing changes?

The function reset_plan_states(...) is publically exported.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jan 14, 2026
@askalt askalt force-pushed the askalt/reuse_plan_bench branch from 0ee1d01 to daadb0c Compare January 14, 2026 10:31
@askalt askalt force-pushed the askalt/reuse_plan_bench branch from daadb0c to e28f292 Compare January 14, 2026 14:34
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @askalt

I think it is great, though I have a suggestion about how to make it more concise and easy to maintain.

Also, this is a benchmark for reset_plan_states what do you think about renaming the file datafusion/physical-plan/benches/reset_plan_states.rs so the benchmark is easier to find?

/// Returns a typical plan for the query like:
///
/// ```sql
/// SELECT aggr1(col1) as aggr1, aggr2(col2) as aggr2 FROM t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more future proof if you actually crated the plan from SQL rather than building up the plan manually.

That way if the way SQL is planned is changed, then the benchmark will automatically update too and I think the benchmark would be substantially shorter and easier to understand

Copy link
Contributor Author

@askalt askalt Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it.

Copy link
Contributor Author

@askalt askalt Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Benchmark is moved to core to be able to use physical planner.

@askalt askalt force-pushed the askalt/reuse_plan_bench branch from e28f292 to 2506a7b Compare January 15, 2026 08:54
@github-actions github-actions bot added the core Core DataFusion crate label Jan 15, 2026
This patch adds a benchmark which is intended to measure overhead of actions,
required to perform  making an independent instance of the execution plan to
re-execute it, avoiding re-planning stage. There are several typical plans
that are tested, covering projection, aggregation, filtration, re-partition.

Also, the function `reset_plan_states(...)` is publically exported.
@askalt askalt force-pushed the askalt/reuse_plan_bench branch from 2506a7b to 0e56507 Compare January 15, 2026 08:55
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I profiled it with samply and indeed a huge amount of time is spent recomputing the properties

Image

@alamb alamb changed the title physical plan: add plan re-use benchmark physical plan: add reset_plan_states , plan re-use benchmark Jan 15, 2026
@alamb alamb added this pull request to the merge queue Jan 15, 2026
@alamb
Copy link
Contributor

alamb commented Jan 15, 2026

Thank you @askalt

Merged via the queue into apache:main with commit 094e7ee Jan 15, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants