antiguru

Changes the merge batcher to use a container builder to produce its output. This introduces some interesting behavior and some unfortunate code changes:

  • The merge batcher becomes simpler by delegating chain formation to a container builder. Instead of replicating the logic from capacity container builders, it can absorb whatever builder we give it. The column builder can happily build 2MiB chunks, while the vector/timely stack builders still aim for 8k allocations.
  • To avoid a regression, push_and_add, which pushes two items and adds their diffs, needs to be defined on the container builder. This is arguably a smell, but I don't know of an alternative, because we need to call copy_destructured for timely stacks :(
  • !! The merge batcher doesn't clear recycled allocations. This is because in columnar, a clear moves a column from any variant to typed, potentially discarding its allocation. Avoiding the clear call might leave some memory sitting around for a bit longer, but the container builders are used to clearing on re-use, so no problems expected on that side.
  • I took the liberty of addressing some of the to-dos in the columnar example.
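
To make the first two points concrete, here is a minimal sketch of a merge that delegates chunk formation to a builder. It is not the code in this PR: the `MergeBuilder` trait, the `CapacityBuilder` type, the `merge` function, and all signatures are hypothetical, and only the name `push_and_add` is borrowed from the description above. It assumes each input is sorted by key with distinct keys, and uses `i64` diffs.

```rust
use std::cmp::Ordering;

/// Hypothetical builder interface (an assumption, not the real API).
/// The merge only pushes; the builder decides when a chunk is complete.
pub trait MergeBuilder<K> {
    /// Append a single update.
    fn push(&mut self, key: K, diff: i64);
    /// Append the sum of two updates for the same key, dropping it if the diffs cancel.
    fn push_and_add(&mut self, key: K, diff1: i64, diff2: i64) {
        let sum = diff1 + diff2;
        if sum != 0 {
            self.push(key, sum);
        }
    }
    /// Flush any partial chunk and return all chunks formed so far.
    fn finish(&mut self) -> Vec<Vec<(K, i64)>>;
}

/// Hypothetical vector-backed builder that seals a chunk after `capacity` items.
pub struct CapacityBuilder<K> {
    capacity: usize,
    current: Vec<(K, i64)>,
    done: Vec<Vec<(K, i64)>>,
}

impl<K> CapacityBuilder<K> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, current: Vec::with_capacity(capacity), done: Vec::new() }
    }
}

impl<K> MergeBuilder<K> for CapacityBuilder<K> {
    fn push(&mut self, key: K, diff: i64) {
        self.current.push((key, diff));
        if self.current.len() == self.capacity {
            // Seal the full chunk and start a fresh one.
            let full = std::mem::replace(&mut self.current, Vec::with_capacity(self.capacity));
            self.done.push(full);
        }
    }
    fn finish(&mut self) -> Vec<Vec<(K, i64)>> {
        if !self.current.is_empty() {
            self.done.push(std::mem::take(&mut self.current));
        }
        std::mem::take(&mut self.done)
    }
}

/// Merge two key-sorted inputs into `builder`, adding diffs of equal keys.
pub fn merge<K: Ord, B: MergeBuilder<K>>(a: Vec<(K, i64)>, b: Vec<(K, i64)>, builder: &mut B) {
    let mut a = a.into_iter().peekable();
    let mut b = b.into_iter().peekable();
    loop {
        // Decide which side to advance; an exhausted side always loses.
        let order = match (a.peek(), b.peek()) {
            (Some(x), Some(y)) => x.0.cmp(&y.0),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => break,
        };
        match order {
            Ordering::Less => {
                let (k, d) = a.next().unwrap();
                builder.push(k, d);
            }
            Ordering::Greater => {
                let (k, d) = b.next().unwrap();
                builder.push(k, d);
            }
            Ordering::Equal => {
                let (k, d1) = a.next().unwrap();
                let (_, d2) = b.next().unwrap();
                builder.push_and_add(k, d1, d2);
            }
        }
    }
}
```

In this sketch the merge loop never inspects chunk sizes, so swapping in a builder with a different chunking policy would not change `merge` itself. The default `push_and_add` only works because the items are plain owned tuples; a builder over a region-allocated container would presumably need its own implementation, which is the smell mentioned above.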

The PR is in the wrong order: the merge batcher changes should go into master first, and then the columnar builder should pick up the new interface. But this is how I started, so I'm leaving it as-is until we talk about it.

@antiguru antiguru force-pushed the merge_batcher_cb2 branch 3 times, most recently from 5ab4ee7 to 9957bd5 Compare June 18, 2025 15:22
@frankmcsherry frankmcsherry force-pushed the columnar_builder branch 2 times, most recently from 5bb9784 to cecae53 Compare June 20, 2025 23:30
@antiguru antiguru force-pushed the merge_batcher_cb2 branch from 9957bd5 to 5c61b40 Compare June 24, 2025 09:25
frankmcsherry and others added 2 commits June 24, 2025 07:48
* Demonstrate Columnar batch builder

* Correct example, improve chunking

* Simplify constraints

* Form key batches in columnar

Signed-off-by: Moritz Hoffmann <[email protected]>

---------

Signed-off-by: Moritz Hoffmann <[email protected]>
Co-authored-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>

# Conflicts:
#	Cargo.toml
#	differential-dataflow/Cargo.toml
#	differential-dataflow/examples/columnar.rs

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru force-pushed the merge_batcher_cb2 branch from d8ff8b3 to e2970f8 Compare June 24, 2025 12:10
@antiguru
Author

Superseded by TimelyDataflow#613

@antiguru antiguru closed this Jun 24, 2025