@jayshrivastava commented Jul 27, 2025

This change adds a new abstraction called a Grouper in distribution_strategy.rs. It is responsible for grouping partitions into partition groups. The new distribution_strategy module can be used to store other splitters and, in the future, distribution strategies such as round robin.

A "partition group" is now a half-open range [starting_partition, ending_partition) instead of a Vec, which improves space efficiency. In the proto layer, the DDTask proto still uses a Vec, so we simply expand the range into a Vec when creating tasks in planning.rs::assign_to_workers.
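
As a rough illustration of the range-based grouping described above, here is a minimal sketch. It is not the code from this PR: `PartitionGroup`, `group_partitions`, and `to_proto_vec` are hypothetical names, and the chunking policy (fixed-size contiguous groups, with a shorter last group) is an assumption.

```rust
/// Hypothetical stand-in for a partition group: a half-open range
/// [start, end) of partition indices. Two integers are stored no matter
/// how many partitions the group covers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionGroup {
    start: usize,
    end: usize,
}

impl PartitionGroup {
    pub fn start(&self) -> usize {
        self.start
    }

    pub fn end(&self) -> usize {
        self.end
    }

    /// Expand the range into a Vec<u64>, as a proto field that still
    /// stores a list of partition indices would need.
    pub fn to_proto_vec(&self) -> Vec<u64> {
        (self.start as u64..self.end as u64).collect()
    }
}

/// Assumed grouping policy: split partitions 0..partition_count into
/// contiguous groups of at most `partitions_per_group` partitions each.
pub fn group_partitions(
    partition_count: usize,
    partitions_per_group: usize,
) -> Vec<PartitionGroup> {
    assert!(partitions_per_group > 0, "group size must be positive");
    (0..partition_count)
        .step_by(partitions_per_group)
        .map(|start| PartitionGroup {
            start,
            // Clamp the last group so it never runs past the final partition.
            end: (start + partitions_per_group).min(partition_count),
        })
        .collect()
}
```

With 10 partitions and a group size of 4, this sketch yields the ranges [0, 4), [4, 8), and [8, 10); only the last step, `to_proto_vec`, materializes the individual indices.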

This also fixes a bug in the isolator where it may not have reported the right number of partitions.

Testing

  • unit tests in isolator.rs
  • unit tests for build_replacement in planning.rs
  • unit tests for the grouper in distribution_strategy.rs

@jayshrivastava force-pushed the js/distribution-tests branch from bf7b224 to 506a7fe on July 27, 2025 18:12
src/planning.rs Outdated
  stage_id: stage.stage_id,
  plan_bytes,
- partition_group: partition_group.to_vec(),
+ partition_group: (partition_group.start() as u64..partition_group.end() as u64).collect(),
Collaborator Author

Please let me know if I can change the DDTask proto to use a range as well.

src/planning.rs Outdated
  let new_child = Arc::new(PartitionIsolatorExec::new(
  child.clone(),
- partitions_per_worker.unwrap(), // we know it is a Some, here.
+ partition_count,
Collaborator Author

I think this was a bug before. The isolator seems to want the total number of partitions here.

@jayshrivastava force-pushed the js/distribution-tests branch 2 times, most recently from 25fe243 to 0fa3702, on July 27, 2025 18:45
@jayshrivastava marked this pull request as ready for review on July 27, 2025 18:47
@jayshrivastava force-pushed the js/distribution-tests branch from 0fa3702 to 0ec389d on July 27, 2025 18:50
Collaborator

@NGA-TRAN left a comment

Very impressive first PR, @jayshrivastava.
I have a few minor comments suggesting clearer names and more unit tests.

@jayshrivastava force-pushed the js/distribution-tests branch from 05d3d2a to 6c86f29 on July 29, 2025 19:03
@jayshrivastava merged commit 4b9a48c into main on Jul 29, 2025
3 checks passed
@jayshrivastava deleted the js/distribution-tests branch on July 29, 2025 19:09