feat: DH-21522: allow column region optimizations in Predicate Pushdown filtering #7666

Open
lbooker42 wants to merge 26 commits into deephaven:main from lbooker42:nightly/DH-21522-parquettablelocation

@lbooker42 lbooker42 commented Feb 10, 2026

This PR refactors predicate pushdown filtering for regioned column sources to support both table-location and per-column-region optimizations, enabling more granular (region-level) pushdown planning and execution.

Changes:

  • Introduces a new RegionedPushdownAction model (Location vs Region actions) and new regioned pushdown filter context types.
  • Updates regioned pushdown execution to run per-region and merge results, enabling column-region pushdown participation.
  • Refactors ParquetTableLocation pushdown logic from an internal enum to the new action-based API and updates the engine interfaces accordingly.
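As a rough illustration of the action model described in the list above, the sketch below orders location-level and region-level actions by estimated cost and merges per-region match results. Every name in it (`PushdownActionSketch`, `SketchAction`, `Scope`, `planByCost`, `mergeRegionMatches`) is hypothetical and chosen for illustration; none of them reflect the actual classes or signatures in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the PR's RegionedPushdownAction API.
final class PushdownActionSketch {
    enum Scope { LOCATION, REGION }

    // An action that can participate in pushdown, with an estimated cost.
    static final class SketchAction {
        final String name;
        final Scope scope;
        final long cost;

        SketchAction(String name, Scope scope, long cost) {
            this.name = name;
            this.scope = scope;
            this.cost = cost;
        }
    }

    // Combine location- and region-level actions and order them cheapest first.
    static List<SketchAction> planByCost(List<SketchAction> locationActions,
                                         List<SketchAction> regionActions) {
        List<SketchAction> all = new ArrayList<>(locationActions);
        all.addAll(regionActions);
        all.sort(Comparator.comparingLong((SketchAction a) -> a.cost));
        return all;
    }

    // Merge the matching row keys reported by each region into one sorted list.
    static List<Long> mergeRegionMatches(List<List<Long>> perRegion) {
        List<Long> merged = new ArrayList<>();
        for (List<Long> regionMatches : perRegion) {
            merged.addAll(regionMatches);
        }
        Collections.sort(merged);
        return merged;
    }
}
```

The point of the sketch is only that a cheap region-level action (for example, a single-value check) can sort ahead of a more expensive location-level one once both kinds share a cost scale.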

Code Coverage Summary:

  • ImmutableConstant[Type]Source
    • makePushdownFilterContext / estimatePushdownFilterCost at 100%
    • pushdownFilter effectively at 100% (empty-selection detection is not tested and might not be reachable without a dedicated test)

@lbooker42 lbooker42 marked this pull request as draft February 10, 2026 18:02

github-actions bot commented Feb 10, 2026

No docs changes detected for 62c13dc

@lbooker42 lbooker42 changed the title from "Nightly/dh 21522 parquettablelocation" to "DH-21522: allow column region optimizations in Predicate Pushdown filtering" Feb 10, 2026
@lbooker42 lbooker42 requested a review from Copilot February 10, 2026 18:03
@lbooker42 lbooker42 self-assigned this Feb 10, 2026

Copilot AI left a comment

Pull request overview

This PR refactors predicate pushdown filtering for regioned column sources to support both table-location and per-column-region optimizations, enabling more granular (region-level) pushdown planning and execution.

Changes:

  • Introduces a new RegionedPushdownAction model (Location vs Region actions) and new regioned pushdown filter context types.
  • Updates regioned pushdown execution to run per-region and merge results, enabling column-region pushdown participation.
  • Refactors ParquetTableLocation pushdown logic from an internal enum to the new action-based API and updates the engine interfaces accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| extensions/parquet/table/src/test/java/io/deephaven/parquet/table/location/ParquetTableLocationTest.java | Removes the prior unit test that validated pushdown mode cost ordering. |
| extensions/parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetTableLocation.java | Implements location-level supported actions and action contexts for parquet pushdown planning/execution. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedPushdownHelper.java | Adds shared utilities for region-thread context and combining per-region pushdown results. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedPushdownFilterMatcher.java | Introduces the regioned action-based pushdown matcher API. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedPushdownFilterContext.java | Adds a regioned pushdown context carrying column definitions + rename mappings. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedPushdownFilterLocationContext.java | Extends the regioned context with access to the current table location. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedPushdownAction.java | Defines the new pushdown action abstraction (Location/Region) and related context interfaces. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedColumnSourceManager.java | Refactors manager-level pushdown scheduling/merging and exposes internals needed for region pushdown. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/RegionedColumnSourceBase.java | Refactors regioned column source pushdown to operate directly on regions with location-aware contexts. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/GenericColumnRegionBase.java | Adds default region-level pushdown orchestration combining region + location actions by cost. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/regioned/ColumnRegion.java | Makes column regions pushdown-capable via RegionedPushdownFilterMatcher. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/sources/UnionSourceManager.java | Updates to use the new BasePushdownFilterContext#filter() accessor. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/locations/impl/AbstractTableLocation.java | Adds default action-based pushdown planning/execution for table locations. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/locations/TableLocation.java | Switches TableLocation to the new RegionedPushdownFilterMatcher API. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/PushdownResult.java | Adds a new cost constant for region-level single-value optimizations. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/PushdownFilterMatcher.java | Provides default implementations for pushdown matcher methods. |
| engine/table/src/main/java/io/deephaven/engine/table/impl/BasePushdownFilterContext.java | Makes the base context abstract and introduces filter() accessor (encapsulation change). |

@lbooker42 lbooker42 changed the title from "DH-21522: allow column region optimizations in Predicate Pushdown filtering" to "feat: DH-21522: allow column region optimizations in Predicate Pushdown filtering" Feb 24, 2026
@lbooker42 lbooker42 marked this pull request as ready for review February 24, 2026 23:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 64 out of 64 changed files in this pull request and generated 11 comments.

@cpwright cpwright left a comment

Is the new code fully covered by the tests?

import io.deephaven.chunk.attributes.Any;
import io.deephaven.chunk.util.pools.MultiChunkPool;

import io.deephaven.function.ArraySort;
Contributor

You've added imports without any actual code changes here and in WritableBooleanChunk.

Contributor Author

These were auto-added when I ran the replication code. I'll look deeper and try to understand how and why.

}

@Override
@MustBeInvokedByOverriders
Contributor

Why the @MustBeInvokedByOverriders annotation?

Contributor Author

Here is my logic; maybe it's wrong:

It might be the wrong annotation, but if a subclass overrides this method, I think it should include the parent's supported actions in its own list (since they would also be supported). That way a Null or Constant Char region would still be optimized.
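A minimal sketch of the intent described in this reply, assuming a base class that reports its supported actions as a list and a subclass that extends it. The class names and action strings are hypothetical, and the annotation itself is left out so the example compiles without extra dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base region; the PR's region classes and action types differ.
class BaseRegionSketch {
    // The method under review carries @MustBeInvokedByOverriders; it is omitted
    // here so that the sketch has no dependency on an annotations library.
    List<String> supportedActions() {
        List<String> actions = new ArrayList<>();
        actions.add("NULL_REGION");
        actions.add("CONSTANT_REGION");
        return actions;
    }
}

// Hypothetical subclass that adds its own action on top of the parent's.
class CharRegionSketch extends BaseRegionSketch {
    @Override
    List<String> supportedActions() {
        // Start from the parent's list so its optimizations remain available.
        List<String> actions = super.supportedActions();
        actions.add("CHAR_SPECIFIC");
        return actions;
    }
}
```

If the override skipped the `super.supportedActions()` call, the null and constant optimizations in this sketch would silently disappear for the subclass, which is the situation the annotation is meant to flag.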

if (localColumnName.equals(filterColumnName)) {
continue;
/**
* Get (or create) a map from column source to column name.
Contributor

Bad copy/paste of the Javadoc.

Is there a reason we've decided to defer this?

Contributor Author

This and columnSourceToName are only used in filtering. We were deferring the creation of columnSourceToName but not columnNameToDefinition. I would like these two to be consistent (either defer both or defer neither).
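To make the "defer both" option concrete, here is a small sketch in which two lookup maps are built only on first use. It assumes double-checked locking on volatile fields; the class `LazyLookupSketch` and its simplified map types are hypothetical stand-ins for columnNameToDefinition and columnSourceToName, not the PR's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: both lookup maps are created lazily, on first use.
final class LazyLookupSketch {
    private final List<String> columnNames;
    private volatile Map<String, Integer> nameToIndex; // stands in for columnNameToDefinition
    private volatile Map<Integer, String> indexToName; // stands in for columnSourceToName

    LazyLookupSketch(List<String> columnNames) {
        this.columnNames = columnNames;
    }

    // Get (or create) the map from column name to column index.
    Map<String, Integer> nameToIndex() {
        Map<String, Integer> local = nameToIndex;
        if (local == null) {
            synchronized (this) {
                local = nameToIndex;
                if (local == null) {
                    local = new HashMap<>();
                    for (int i = 0; i < columnNames.size(); i++) {
                        local.put(columnNames.get(i), i);
                    }
                    nameToIndex = local;
                }
            }
        }
        return local;
    }

    // Get (or create) the map from column index to column name.
    Map<Integer, String> indexToName() {
        Map<Integer, String> local = indexToName;
        if (local == null) {
            synchronized (this) {
                local = indexToName;
                if (local == null) {
                    local = new HashMap<>();
                    for (int i = 0; i < columnNames.size(); i++) {
                        local.put(i, columnNames.get(i));
                    }
                    indexToName = local;
                }
            }
        }
        return local;
    }

    // Number of maps built so far; only used to observe the deferral.
    int builtCount() {
        return (nameToIndex != null ? 1 : 0) + (indexToName != null ? 1 : 0);
    }
}
```

Tables that are never filtered pay nothing for either map in this arrangement; the trade-off is a small amount of synchronization on first access.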

Copilot AI left a comment

Pull request overview

Copilot reviewed 74 out of 74 changed files in this pull request and generated 5 comments.

Comment on lines +145 to +163
try {
    ctx.shiftedRowSet = tle.subsetAndShiftIntoLocationSpace(selection);
    getRegion(regionIndex).estimatePushdownFilterCost(
            filter,
            ctx.shiftedRowSet,
            usePrev,
            newCtx,
            jobScheduler,
            regionCost -> {
                minCost.updateAndGet(old -> Math.min(old, regionCost));
                resume.run();
                newCtx.close();
            },
            nec);
} catch (final Exception e) {
    // In the case of an exception, clean up the temporary context.
    newCtx.close();
    throw e;
}

Copilot AI Feb 27, 2026

Potential resource management issue: newCtx.close() is called both in the success callback (line 156) and in the catch block (line 161). If an exception occurs after the callback is invoked but before it completes, the context could be closed twice. Consider using try-with-resources or a more explicit cleanup pattern to avoid potential double-close scenarios.

Contributor Author

We can't use try-with-resources here because of the thread switch. This is the best we can do: release on error (although this context currently holds no resources).
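For readers following the double-close concern, one possible pattern (not what this PR does) is to make the context's close() idempotent, so that closing from both the completion callback and the catch block is harmless. The sketch below assumes a hypothetical `SketchContext` and a hypothetical `AsyncEstimate` operation that may complete on another thread or throw synchronously.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongConsumer;

// Hypothetical sketch of callback-based cleanup with an idempotent close().
final class AsyncCleanupSketch {
    static final class SketchContext implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean();
        final AtomicInteger releases = new AtomicInteger(); // counts real releases

        @Override
        public void close() {
            // Only the first caller releases; later calls are no-ops.
            if (closed.compareAndSet(false, true)) {
                releases.incrementAndGet();
            }
        }
    }

    // Hypothetical async operation: may call onComplete on any thread, or throw.
    interface AsyncEstimate {
        void run(SketchContext ctx, LongConsumer onComplete);
    }

    static void estimate(AsyncEstimate op, SketchContext ctx, LongConsumer onCost) {
        try {
            op.run(ctx, cost -> {
                try {
                    onCost.accept(cost);
                } finally {
                    ctx.close(); // normal path: close when the callback finishes
                }
            });
        } catch (RuntimeException e) {
            ctx.close(); // error path: close if scheduling the work failed
            throw e;
        }
    }
}
```

Try-with-resources would close the context when the scheduling call returns, possibly before the callback has run on another thread, which is why the close has to live in the callback and the catch block instead.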
