dangotbanned
Member

@dangotbanned dangotbanned commented Oct 18, 2025

Related issues

Notes

  • Take advantage of
    • pc.unique preserving order
    • structs/lists
  • Questions
    • How much can be performed without collection?

Tasks

@dangotbanned dangotbanned mentioned this pull request Oct 18, 2025
71 tasks
`expected` is now taken from testing the same selector on `main`
@dangotbanned dangotbanned added enhancement (New feature or request) and fix labels Oct 19, 2025
- Already works, but I wanna add some optimizations for the single partition case
- `pc.unique` can be used directly on a lot of `ChunkedArray` types, but `filter` drops nulls by default, so null keys need some care if present
Avoids the need for a temporary composite key column, by using `dictionary_encode` and generating boolean masks based on index position
Left a comment in `selectors` about this issue earlier
Comment on lines +198 to +201
for idx in range(len(arr_dict.dictionary)):
    # NOTE: Acero filter doesn't support `null_selection_behavior="emit_null"`
    # Is there any reasonable way to do this in Acero?
    yield native.filter(pc.equal(pa.scalar(idx), indices))
Member

is this for use in over(partition_by=...)?

if so, just as a heads up, we won't be able to accept a solution which involves looping over partitions in python

Labels

enhancement (New feature or request), fix, internal

2 participants