[Data] Support strict=False mode for StreamingRepartition #60295
machichima wants to merge 26 commits into ray-project:master
Conversation
Code Review
This pull request introduces a strict parameter to StreamingRepartition, allowing for a non-strict mode. In non-strict mode, repartitioning doesn't stitch blocks, which enables more operator fusion opportunities. The changes are well-implemented across the logical planning, fusion rules, and physical planning layers. The default for repartition is now non-strict, which is a good choice for performance. The added tests are comprehensive and cover both the new non-strict behavior and the fusion logic. My main feedback is to add documentation for the new strict parameter in the user-facing Dataset.repartition method to ensure users understand how to use it.
num_blocks: Optional[int] = None,
target_num_rows_per_block: Optional[int] = None,
*,
strict: bool = False,
The new strict parameter should be documented in the repartition method's docstring. Explaining the difference between strict=True (the old behavior) and strict=False (the new default) is important for users to understand its impact on block sizes and fusion.
You could add something like this to the Args section:
strict: If ``True``, ``repartition`` guarantees that all output blocks, except for the last one, will have ``target_num_rows_per_block`` rows. If ``False``, ``repartition`` is more relaxed and may produce blocks smaller than ``target_num_rows_per_block`` without stitching them. This is only used with ``target_num_rows_per_block``. Defaults to ``False``.
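A minimal usage sketch of the new parameter, assuming it lands as proposed in this PR (the block sizes are purely illustrative):

import ray

ds = ray.data.range(100)

# Non-strict (new default): blocks may be smaller than the target, but no
# stitching happens, so the repartition can fuse with upstream map operators.
relaxed = ds.repartition(target_num_rows_per_block=30)

# Strict (previous behavior): every output block except the last has exactly
# 30 rows, at the cost of stitching and fewer fusion opportunities.
exact = ds.repartition(target_num_rows_per_block=30, strict=True)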
@owenowenisme PTAL. Thank you!
owenowenisme left a comment
test_operator_fusion is failing, could you please take a look?
input_physical_dag,
data_context,
name=op.name,
compute_strategy=compute,
I think we need min_rows_per_bundle = op.target_num_rows_per_block here if strict=False?
Seems like when we set min_rows_per_bundle here, the BlockRefBundler will try to stitch the output (see ray/python/ray/data/_internal/execution/operators/map_operator.py, lines 828 to 835 in 68d01c4). Therefore, I think we should keep it as None here to prevent stitching.
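A tiny sketch (not the actual BlockRefBundler) of why a minimum rows-per-bundle setting leads to stitching, for illustration only:

# Illustrative: small blocks are held back and grouped until the minimum
# row threshold is reached, i.e. they get "stitched" into one bundle.
def bundle_blocks(block_sizes, min_rows_per_bundle):
    bundles, current, rows = [], [], 0
    for size in block_sizes:
        current.append(size)
        rows += size
        if rows >= min_rows_per_bundle:
            bundles.append(current)
            current, rows = [], 0
    if current:
        bundles.append(current)
    return bundles

# Blocks of 10 rows get grouped into bundles of >= 30 rows.
assert bundle_blocks([10, 10, 10, 10], 30) == [[10, 10, 10], [10]]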
strict: If ``True``, ``repartition`` guarantees that all output blocks,
    except for the last one, will have exactly ``target_num_rows_per_block`` rows.
    If ``False``, ``repartition`` is more relaxed and may produce blocks smaller
    than ``target_num_rows_per_block`` without stitching them together.
    This parameter is only used with ``target_num_rows_per_block``.
    Defaults to ``False``.
Might be better to say that it will only produce at most one block that is < target_num_rows_per_block per input block when strict is False.
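As a sanity check on that wording, here's a small sketch (not the actual implementation) of the per-input-block splitting that non-strict mode implies:

# Illustrative only: each input block yields full-size chunks plus at most one
# smaller remainder, and remainders are never stitched across input blocks.
def split_block_non_strict(num_rows: int, target: int) -> list[int]:
    full, rest = divmod(num_rows, target)
    return [target] * full + ([rest] if rest else [])

assert split_block_non_strict(70, 30) == [30, 30, 10]  # one block < target
assert split_block_non_strict(60, 30) == [30, 30]      # no small block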
@pytest.mark.parametrize("batch_size", [30, 35, 45])
def test_streaming_repartition_fusion_non_strict(
I think the fusion test should be in python/ray/data/tests/test_operator_fusion.py.
There are existing fusion and streaming-repartition related tests in this file, so I think we can put this here as it aligns with the existing tests. WDYT?
ref_bundler = StreamingRepartitionRefBundler(batch_size)
# No further fusion because StreamingRepartitionRefBundler is stateful
# and maintains internal buffering state across bundles.
supports_fusion = False
Will this prevent fusion when batch_size == target_num_rows_per_block ?
Yes, but I think it's intended, as the original code (strict mode) hard-coded supports_fusion=False to prevent further fusion:

# For now, we don't want to over-fuse StreamingRepartition with other map operators,
# so the result operator does not support further fusion.
supports_fusion=False,

strict: If True, guarantees that all output blocks, except for the last one,
    will have exactly target_num_rows_per_block rows. If False, is more relaxed
    and may produce blocks smaller than target_num_rows_per_block without
    stitching them together. Defaults to False.
Ditto with the comment in dataset.py
if not (
    isinstance(up_logical_op, MapBatches)
    and up_logical_op._batch_size is not None
    and down_logical_op.target_num_rows_per_block is not None
    and down_logical_op.target_num_rows_per_block > 0
    # When the batch_size is a multiple of target_num_rows_per_block, fusing would still produce exactly identical sequence of blocks.
    # See `_fuse_streaming_repartition_operators_in_dag` docstring for details.
    # TODO: when the StreamingRepartition supports none_strict_mode, we can fuse
    # `MapBatches -> StreamingRepartition` no matter what the `batch_size` and `target_num_rows` are.
    # https://anyscale1.atlassian.net/browse/DATA-1731
    and up_logical_op._batch_size
    % down_logical_op.target_num_rows_per_block
):
    return False
I don't think this logic is correct -- if _batch_size is None, we'd still allow fusing StreamingRepartition.
Hi @alexeykudinkin,
I was following the original logic here, which also returns False when _batch_size is None (see ray/python/ray/data/_internal/logical/rules/operator_fusion.py, lines 280 to 294 in 8e2e0aa).
Also, since we use StreamingRepartitionRefBundler(batch_size), based on the class definition the batch_size cannot be None (see ray/python/ray/data/_internal/streaming_repartition.py, lines 34 to 37 in 68d01c4).
Therefore, I think we should keep this here?
Well, you're relaxing this, right?
There should now be 2 modes:
- StreamingRepartition(strict=True): batch_size needs to be an exact multiple of target_num_rows_per_block to produce correct results.
- StreamingRepartition(strict=False): batch_size could be anything (even null).
Makes sense! Thank you for pointing this out. Updated in c77787c.
Can we simplify the logic here like this?

if (
    not isinstance(up_logical_op, MapBatches)
    or not down_logical_op.target_num_rows_per_block
):
    return False

And add a check at the dataset API that raises an error when target_num_rows_per_block is not None but is negative.
(Maybe move this: ray/python/ray/data/_internal/streaming_repartition.py, lines 35 to 37 in f5a53c4)
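For reference, a hedged sketch of what the relaxed fusibility check could look like once both modes are accounted for (the function and parameter names below mirror the discussion, not the final code; the real check lives in the operator fusion rule):

def can_fuse_map_batches_into_streaming_repartition(
    batch_size, target_num_rows_per_block, strict
):
    if not target_num_rows_per_block:
        return False
    if strict:
        # Strict mode: output blocks stay exact only if the upstream batch
        # size is a whole multiple of the target.
        return batch_size is not None and batch_size % target_num_rows_per_block == 0
    # Non-strict mode: any batch size (even None) works, since blocks are
    # only ever split, never stitched.
    return True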
ref_bundler = StreamingRepartitionRefBundler(batch_size)
# No further fusion because StreamingRepartitionRefBundler is stateful
# and maintains internal buffering state across bundles.
supports_fusion = False
We shouldn't be blocking any subsequent fusion like that.
Let's add a test that we're able to fuse multiple ops like this:
- Map > Map > SR
- Map > SR > SR
While the comment is on line 338 (supports_fusion=False), I want to make sure: do we want to support fusion for strict mode, or just add a test for non-strict mode? I think it's the latter?
The Map > SR > SR case cannot work here because after the first Map > SR fusion, the logical operator becomes AbstractUDFMap rather than MapBatches. The current implementation only allows MapBatches > SR fusion (see ray/python/ray/data/_internal/logical/rules/operator_fusion.py, lines 355 to 369 in f3d444a).
To support Map > SR > SR fusion, we will need more changes, which I think is a bit out of scope for this PR.
Let's keep it MapBatches then. Map > SR > SR needs to work
I looked into it more, and it seems like Map > SR > SR already works, but it's CombineShuffles._combine() combining the two SRs into one, so the result will just be Map > SR.
Updated the test in 8552ff9.
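A rough sketch of what such a test could look like (the asserted operator-name substring follows the naming used elsewhere in this PR and is an assumption, as is checking it via the stats string):

import ray

def test_map_batches_then_double_repartition_fuses():
    ds = (
        ray.data.range(90)
        .map_batches(lambda batch: batch, batch_size=45)
        .repartition(target_num_rows_per_block=30)
        .repartition(target_num_rows_per_block=30)
    )
    # CombineShuffles collapses the two StreamingRepartition ops into one,
    # and non-strict mode then lets it fuse into the upstream MapBatches.
    stats = ds.materialize().stats()
    assert "MapBatches(<lambda>)->StreamingRepartition" in stats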
assert (
-   f"MapBatches(<lambda>)->StreamingRepartition[num_rows_per_block={target_rows}]"
+   f"MapBatches(<lambda>)->StreamingRepartition[num_rows_per_block={target_rows},strict=False]"
Test name misleading after non-strict mode behavior change
Low Severity
The test function test_streaming_repartition_no_further_fuse has a name and docstring that describe strict mode behavior ("doesn't fuse further"), but the test uses default strict=False (line 811). In non-strict mode, the fused operator has supports_fusion=True which DOES allow further fusion. The test assertions still pass because they check for substring matches, but the test name no longer accurately describes what it tests.


Description
Currently, the StreamingRepartition operator is essentially strict=True. We want to relax this to allow a non-strict mode with the following guarantees:
- Every output block has at most target_num_rows rows.
- At most one block smaller than target_num_rows is produced per input block (i.e. it wouldn't do any stitching).
This mode will be the default and would allow StreamingRepartition to be fused into the previous operator.
Related issues
Closes #60026
Additional information
- Added strict: bool = False parameter to repartition()
- In _get_fused_streaming_repartition_operator() and plan_streaming_repartition_op():
  - Strict: uses ref_bundler=StreamingRepartitionRefBundler
  - Non-strict: uses ref_bundler=None (default BlockRefBundler)
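A condensed sketch of the planner-side branch described above (hypothetical shape; the actual change lives in plan_streaming_repartition_op() and may differ, and op.strict is the attribute this PR is assumed to add):

# Strict mode keeps the dedicated re-slicing bundler; non-strict mode falls
# back to the default BlockRefBundler and sets no min_rows_per_bundle so
# that no stitching is introduced.
if op.strict:
    ref_bundler = StreamingRepartitionRefBundler(op.target_num_rows_per_block)
else:
    ref_bundler = None  # default BlockRefBundler, no stitching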