Skip to content

fix: DH-21732: Compromise for stateful partition filter reordering.#7697

Merged
cpwright merged 14 commits intodeephaven:mainfrom
cpwright:cpw/partition-reordering
Feb 23, 2026
Merged

fix: DH-21732: Compromise for stateful partition filter reordering.#7697
cpwright merged 14 commits intodeephaven:mainfrom
cpwright:cpw/partition-reordering

Conversation

@cpwright
Copy link
Contributor

No description provided.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 19, 2026

Deploying docs previews for a52b8fe (available for 14 days)

Python
Groovy

@cpwright cpwright requested review from lbooker42 and rcaudy February 19, 2026 21:58
lbooker42
lbooker42 previously approved these changes Feb 19, 2026
@cpwright cpwright added the ReleaseNotesNeeded Release notes are needed label Feb 19, 2026
rcaudy
rcaudy previously approved these changes Feb 20, 2026
@cpwright cpwright marked this pull request as draft February 20, 2026 16:55

#### Stateful Partition Filters

If Deephaven is configured to use stateless filters by default (the default setting in 41.0 and later), then a _partition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Deephaven is configured to use stateless filters by default (the default setting in 41.0 and later), then a partition filter (a filter that only accesses partitioning columns of the data) is only stateful if the application developer explicitly marks it as such. When Deephaven is configured to use stateless filters by default, stateful partition filters are not reordered and must be evaluated on all rows of the table.

When Deephaven is configured to use stateful filters by default, the engine is permitted to treat stateful partition filters as if they were stateless for pragmatic, performance-oriented reasons.
In particular, the ordering constraints for filters on partitioning columns may be relaxed, and rather than
evaluating the filter on every row in the table, it may only be evaluated per location. This allows common partition filters to be reordered ahead of other filters and to avoid repeated evaluation against the same value. For example, the formula filter Date=today() is stateful if filters are stateful by default, but in nearly every case, users would prefer this to be evaluated early, location-by-location.

To ensure that partition filters are not reordered and are evaluated row-by-row, you can coalesce the table prior to applying the filter. When possible, specify filters in the order they should be applied rather than relying on the engine to reorder them.

rcaudy
rcaudy previously approved these changes Feb 20, 2026
@rcaudy rcaudy added this to the 42.0 milestone Feb 20, 2026
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels incomplete.

/**
* Return true if this filter is a serial filter.
*/
default boolean isSerial() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels insufficient right now.

  • Right now, most of the code that uses filters only cares about permitParallelization(), not isSerial(). Shouldn't we have had to change something in the code that adds implicit barriers?
  • I think we need to clarify the distinction between permitParallelization() and isSerial().
  • Shouldn't there be more overrides? Condition filters, for example? io.deephaven.engine.table.impl.select.ConditionFilter#permitParallelization pays attention to QueryTable.STATELESS_FILTERS_BY_DEFAULT. I would want you to look at every WhereFilter impl that overrides permitParallelization(), and consider whether we need to be instead/also serial.

Copy link
Contributor Author

@cpwright cpwright Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serial implicit barriers are only a select thing; not a filter thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did look at the permitParallelization overrides.

Aside from the condition filter they can be divided into two main categories:

  • Delegation, which deserves the same treatment (and exists).
  • Turning off parallelism for filters that can't handle it (e.g. an incremental release filter).

The only particularly relevant one that is special is ConditionFilter. The point of the change is that even though a ConditionFilter might not permit parallelization (because it is stateful); we should not consider it serial unless the user explicitly said so and permit the partition aware source table to reorder it. This requires no override for a condition filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand upon the bit about implicit barriers; any filter that does not permit parallelism is treated as a stateful filter in AbstractFilterExecution so it is effectively serial and executed on it's own.

The fundamental difference is that in Filters, we try to evaluate F_a, F_b, F_c generally in some sequence of a; b; and c such that we use the restriction of the inputs from one filter to prevent us from evaluating unnecessary rows in the next filter. In select, if there are no explicit column dependencies between F_a, F_b and F_c we may evaluate all three columns simultaneously; because there would be no savings by evaluating one before the other.

@cpwright cpwright marked this pull request as ready for review February 23, 2026 19:48
@cpwright cpwright merged commit 2c00339 into deephaven:main Feb 23, 2026
24 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants