[data] Possible bug / regression in nightly with autoscaling #54548

@praateekmahajan

Description

What happened + What you expected to happen

The following code is ~3x slower on 3.0.0dev (877feaf1f372a3309345566949a91caa43cdc6b9) than on 2.47.1.

Versions / Dependencies

2.47.1 vs 3.0.0.dev

Reproduction script

import ray

ray.init(address="local", ignore_reinit_error=True, num_cpus=8)

class SimpleMap:
    def __call__(self, x): 
        return x

class FlatMap:
    def __call__(self, x):
        return {
            "items" : [f"{item}{suffix}" for item in x["items"] for suffix in "abcdefghijklm"]
        }

ds = (
    ray.data.from_items([1])
    .map_batches(lambda x: {"items" : list(range(75))})
    .repartition(target_num_rows_per_block=1)
    .map_batches(lambda x : x, num_cpus=1.1) # this needs to be 1.1 in 2.47.1 to avoid fusion
    .map_batches(SimpleMap, concurrency=(1, 4))
    .map_batches(FlatMap, concurrency=(1, 4))
    .repartition(target_num_rows_per_block=1)
    .map_batches(SimpleMap, concurrency=(1, 4), num_cpus=1.1)  # this needs to be 1.1 in 2.47.1 to avoid fusion
    .map_batches(lambda x : x)
)

output = ds.take_all()
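
To put a number on the slowdown, the reproduction can be wrapped in a function and timed the same way in both environments. The following is a minimal sketch using only the standard library; `time_runs`, `summarize`, and `run_pipeline` are hypothetical names that are not part of the original report, and the choice of three repeats is arbitrary.

```python
import statistics
import time


def time_runs(build_and_run, repeats=3):
    """Call build_and_run() `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        build_and_run()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max for side-by-side comparison."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Hypothetical usage, run once per installed Ray version:
#
#   def run_pipeline():
#       ds = ...  # build the dataset exactly as in the reproduction script
#       return ds.take_all()
#
#   print(ray.__version__, summarize(time_runs(run_pipeline)))
```

Rebuilding the dataset inside `run_pipeline` keeps each run independent, so the comparison does not depend on results cached from a previous execution. Printing `ds.stats()` after `take_all()` may also help show which operator accounts for the extra time.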

Issue Severity

High: It blocks me from completing my task.

Metadata

Labels

@external-author-action-required: Alternate tag for PRs where the author doesn't have labeling permission.
P1: Issue that should be fixed within a few weeks
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
performance
regression
