into_batches + distinct produce incorrect results #6161

@ben-freist

Description

Describe the bug

When I run the following script, which deduplicates a list of strings, the output is `4 11`:

import daft


vals = [
    "jjtzwafmzk",
    "fjinpogsnd",
    "advcnkwdgr",
    "lkloaeeuvg",
    "qdmljqvqxv",
    "bknitbecis",
    "fgqrpbilay",
    "advcnkwdgr",
    "shsjofbzml",
    "zyjbwskjyk",
    "utbqewwxoc",
    "qdmljqvqxv",
    "ezocoaxmsd",
    "qdmljqvqxv",
]

daft.set_runner_ray()

df = daft.from_pydict({"val": vals})
ddf_vals = df.into_batches(4).distinct("val")
unique_vals = ddf_vals.count_rows()
print(unique_vals, len(set(vals)))

The result changes when I change the batch size passed to into_batches, which is surprising to me.
With into_partitions I get correct results. Does count_rows mean different things in these two contexts?
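For reference, deduplication should be invariant to how the input is split into batches. A minimal pure-Python sketch (not using daft; `batched_distinct` is a hypothetical helper for illustration) shows that deduplicating each batch and then merging always yields the same 11 unique values, regardless of batch size:

```python
vals = [
    "jjtzwafmzk", "fjinpogsnd", "advcnkwdgr", "lkloaeeuvg",
    "qdmljqvqxv", "bknitbecis", "fgqrpbilay", "advcnkwdgr",
    "shsjofbzml", "zyjbwskjyk", "utbqewwxoc", "qdmljqvqxv",
    "ezocoaxmsd", "qdmljqvqxv",
]

def batched_distinct(items, batch_size):
    # Split into batches, deduplicate each batch locally, then merge
    # and deduplicate globally -- the semantics one would expect from
    # into_batches(...) followed by distinct(...).
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    merged = [v for batch in batches for v in set(batch)]
    return set(merged)

# The count is the same for every batch size.
for n in (1, 2, 4, 7, len(vals)):
    assert len(batched_distinct(vals, n)) == len(set(vals)) == 11
```

So whatever batch size is chosen, the distinct count should come out as 11; only the daft pipeline above deviates.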

To Reproduce

I'm using daft version 0.7.2 and ray version 2.53.0.

Expected behavior

The output should be `11 11`.

Component(s)

Distributed Runner (flotilla)

Additional context

No response

Metadata

Labels

bug (Something isn't working), p1 (Important to tackle soon, but preemptable by p0)