GH-47269: [C++][Acero] Support for multi threaded input: ScalarAggregateNode, GroupByNode and SortedMergeNode #47270

Closed
X-Lemon-X wants to merge 20 commits into apache:main from X-Lemon-X:aggregate_sequencer

@X-Lemon-X X-Lemon-X commented Aug 7, 2025

Rationale for this change

  • Nodes such as ScalarAggregateNode, GroupByNode and SortedMergeNode did not support multi-threaded input: batches from an ordered source could arrive out of order. Until now this was worked around by disabling multi-threading whenever these nodes were used.
  • ScalarAggregateNode and GroupByNode did not expose ordering information on their output, even when the input was ordered.
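
To illustrate the first point, here is a minimal sketch in plain Python (not Acero code) of how an order-dependent aggregate can change its result when batches arrive out of order. The `Batch` tuple and the `first_value` helper are hypothetical stand-ins introduced only for this example.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for an exec batch: (sequence index, values).
Batch = Tuple[int, List[int]]


def first_value(batches: Iterable[Batch]) -> Optional[int]:
    """Order-dependent aggregate: return the first value in arrival order."""
    for _index, values in batches:
        if values:
            return values[0]
    return None


in_order: List[Batch] = [(0, [10, 11]), (1, [20, 21]), (2, [30, 31])]
# Simulate a multi-threaded source that delivers batch 1 before batch 0.
arrival: List[Batch] = [in_order[1], in_order[0], in_order[2]]

print(first_value(in_order))  # 10
print(first_value(arrival))   # 20
```

The two calls see the same data but return different values, which is the kind of discrepancy that previously forced multi-threading to be disabled.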

What changes are included in this PR?

  • Added a sequencer to the input of ScalarAggregateNode and GroupByNode (it enables itself only when the input is ordered).
  • Added a sequencer to all inputs of SortedMergeNode.
  • Added support for propagating ordering through ScalarAggregateNode and GroupByNode.
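
The idea behind a sequencer can be sketched as follows. This is an illustrative Python model, not the C++ implementation in this PR: the `Sequencer` class, its `insert` method and the `emit` callback are hypothetical names, and the sketch assumes every batch carries a unique, gap-free index starting at 0.

```python
import threading
from typing import Any, Callable, Dict, List, Tuple


class Sequencer:
    """Buffers out-of-order batches and releases them in index order."""

    def __init__(self, emit: Callable[[int, Any], None]) -> None:
        self._emit = emit                      # called once per batch, in order
        self._next_index = 0                   # index of the next batch to release
        self._pending: Dict[int, Any] = {}     # batches that arrived too early
        self._lock = threading.Lock()          # insert() may be called from many threads

    def insert(self, index: int, batch: Any) -> None:
        with self._lock:
            self._pending[index] = batch
            # Release every batch that is now contiguous with what was emitted.
            while self._next_index in self._pending:
                ready = self._pending.pop(self._next_index)
                self._emit(self._next_index, ready)
                self._next_index += 1


released: List[Tuple[int, str]] = []
seq = Sequencer(lambda index, batch: released.append((index, batch)))
for index, batch in [(2, "c"), (0, "a"), (1, "b"), (3, "d")]:
    seq.insert(index, batch)

print(released)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```

Emitting while holding the lock keeps the sketch short and guarantees downstream order; a real node would likely want to avoid doing heavy work under a lock.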

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. Users of these nodes must assert ordering on the input where appropriate.

Patryk Dudziński added 5 commits August 4, 2025 15:00
…calarAggregateNode and GroupByNode, to allow support of parallel execution
…der to ScalarAggregateNode and GroupByNode
…ermining of ordering should be enabled, added implicit ordering in scalar_aggregate_node if keys are present before segmented keys.
@X-Lemon-X X-Lemon-X requested a review from westonpace as a code owner August 7, 2025 10:24

github-actions bot commented Aug 7, 2025

⚠️ GitHub issue #47269 has been automatically assigned in GitHub to PR creator.

Patryk Dudziński added 6 commits August 7, 2025 14:42
…allel variable, added jitter to test to simulate sequence issues. Moved ordering check to MakeAggregateNodeArgs
…st cases when incoming batches are not in order.
…n GroupByNode and ScalarAggregateNode moved inputs from NodeArgs to Make function to establish ordering
Member

raulcd commented Aug 18, 2025

This seems related to the discussion on:

* [Future plans for Acero #47331](https://github.com/apache/arrow/discussions/47331)

Comment on lines +2986 to +2987
# with pytest.raises(NotImplementedError):
# table.group_by("b").aggregate([("a", "first")])
Member

Is this commented out for any reason?

Author

I forgot to remove it.

@X-Lemon-X
Author

> This seems related to the discussion on:
>
> * [Future plans for Acero #47331](https://github.com/apache/arrow/discussions/47331)

yes

@raulcd raulcd requested a review from zanmato1984 August 19, 2025 08:10
Member

raulcd commented Aug 19, 2025

@github-actions crossbow submit -g cpp

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Aug 19, 2025
@github-actions

Revision: 61f86f2

Submitted crossbow builds: ursacomputing/crossbow @ actions-e373717988

| Task | Status |
| --- | --- |
| example-cpp-minimal-build-static | GitHub Actions |
| example-cpp-minimal-build-static-system-dependency | GitHub Actions |
| example-cpp-tutorial | GitHub Actions |
| test-build-cpp-fuzz | GitHub Actions |
| test-conda-cpp | GitHub Actions |
| test-conda-cpp-valgrind | GitHub Actions |
| test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 | GitHub Actions |
| test-debian-12-cpp-amd64 | GitHub Actions |
| test-debian-12-cpp-i386 | GitHub Actions |
| test-fedora-42-cpp | GitHub Actions |
| test-ubuntu-22.04-cpp | GitHub Actions |
| test-ubuntu-22.04-cpp-20 | GitHub Actions |
| test-ubuntu-22.04-cpp-bundled | GitHub Actions |
| test-ubuntu-22.04-cpp-emscripten | GitHub Actions |
| test-ubuntu-22.04-cpp-no-threading | GitHub Actions |
| test-ubuntu-24.04-cpp | GitHub Actions |
| test-ubuntu-24.04-cpp-bundled-offline | GitHub Actions |
| test-ubuntu-24.04-cpp-gcc-13-bundled | GitHub Actions |
| test-ubuntu-24.04-cpp-gcc-14 | GitHub Actions |
| test-ubuntu-24.04-cpp-minimal-with-formats | GitHub Actions |
| test-ubuntu-24.04-cpp-thread-sanitizer | GitHub Actions |

@X-Lemon-X X-Lemon-X marked this pull request as draft August 21, 2025 09:53
@X-Lemon-X X-Lemon-X closed this Aug 25, 2025