fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects by joshua-gould · Pull Request #384 · dask/dask-image

joshua-gould · 2024-07-18T12:22:42Z

No description provided.

m-albert · 2024-07-23T18:13:14Z

@joshua-gould Thanks for your PR!

I was trying to reproduce the error you're fixing and found that it relates to #335.

The following code

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute()

fails in the presence of pyarrow in the environment and runs through in its absence.

Unsure how to proceed, there might be an error to reproduce upstream in dask.dataframe.
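Since the behaviour reported above depends on whether pyarrow is installed, it can help to record the environment alongside a failing run. The following is only a minimal sketch: `has_module` and `describe_environment` are hypothetical helper names, and the sketch assumes that reading the `dataframe.convert-string` setting through `dask.config.get` is a useful thing to log. It does not reproduce the error itself.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` is importable."""
    return importlib.util.find_spec(name) is not None


def describe_environment() -> dict:
    """Collect the facts that the discussion suggests are relevant."""
    info = {"pyarrow": has_module("pyarrow"), "dask": has_module("dask")}
    if info["dask"]:
        import dask

        info["dask_version"] = dask.__version__
        # None is used here as a fallback if the key is not configured.
        info["convert_string"] = dask.config.get("dataframe.convert-string", None)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into a bug report makes it easier to compare environments where the example fails with ones where it runs through.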

jmuhlich · 2025-03-18T21:28:54Z

I ran into this issue with find_objects too, and setting dataframe.convert-string to False does not fully fix it for me. (dask=2025.2.0, dask-image=2024.5.3, also tested with dask-image latest from github today) I found that triggering the issue depends on the specific layout of the image and which chunks are all zero. Generally it seems that empty chunks "earlier" (left or above?) are problematic. A fully empty label image that has more than one chunk in each dimension always errors. The only thing that fixes it for me is reverting to dask=2024.12.1 without dask-expr.
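To compare label layouts as described above, one option is to list which chunks of a label image contain only background. This is a rough sketch using plain NumPy rather than dask; `empty_chunks` is a hypothetical helper, and the array shape and chunk size are simply the ones used elsewhere in this thread.

```python
import itertools

import numpy as np


def empty_chunks(labels, chunk_shape):
    """Return the grid indices of chunks that contain only zeros."""
    # Start offsets of every chunk along each axis.
    starts_per_axis = [
        range(0, size, step) for size, step in zip(labels.shape, chunk_shape)
    ]
    empty = []
    for starts in itertools.product(*starts_per_axis):
        block = labels[tuple(slice(s, s + c) for s, c in zip(starts, chunk_shape))]
        if not block.any():
            empty.append(tuple(s // c for s, c in zip(starts, chunk_shape)))
    return empty


labels = np.zeros((10, 10), dtype=int)
labels[0, 0] = 1
result = empty_chunks(labels, (3, 3))
print(len(result), "empty chunks, first few:", result[:3])
```

Running this for a layout that fails and one that works would show whether the position of the empty chunks differs between them.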

m-albert · 2025-03-20T13:22:50Z

Thanks @jmuhlich for reporting this here!

It seems that the following example (which is also included in the tests added in this PR):

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute(scheduler='single-threaded')

fails on main, fails on Fix CI test failures #393 (which sets dataframe.convert-string to False), and runs through on this PR.

In this PR, @joshua-gould works around problems that occur when merging dask dataframes.

Also, here there's a mention of a pandas bug when merging dataframes. The error here might be related to that.

I haven't had time yet to find out what's going wrong in the merge. I think it'd be good to report the results of this upstream.

Independently of what happens upstream, I think we should incorporate this workaround here.

> I found that triggering the issue depends on the specific layout of the image and which chunks are all zero

@jmuhlich Does the code in this PR fix the problems you mention?

jakirkham · 2025-03-22T05:15:22Z

Fixed up some conflicts introduced by a recent PR fixing CI: #393

Hope that is ok

Please feel free to tweak further as needed

m-albert · 2025-04-12T19:04:31Z

Quick summary

This PR fixes find_objects for a straightforward use case and adds a test for it.

Problem on main

The following fails

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute(scheduler='single-threaded')

The fix in this PR

The following fails with empty df1 and df2 in some cases:

    ddf = dd.merge(df1, df2, how="outer", left_index=True, right_index=True)

The following workaround in this PR fixes it:

    if len(df1) > 0 and len(df2) > 0:
        ddf = dd.merge(
            df1, df2,
            how="outer", left_index=True, right_index=True)
    elif len(df1) > 0:
        ddf = df1
    elif len(df2) > 0:
        ddf = df2
    else:
        ddf = pd.DataFrame()
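To illustrate the guard in isolation, here is a small sketch using plain pandas instead of dask dataframes (a substitution made only so that the example is self-contained). `merge_nonempty` is a hypothetical name, and the frames are made-up stand-ins for the per-chunk results. The last step shows one way in which the suffixed column names from the KeyError in the title can be absent after a merge with an empty frame; this is an assumption about the mechanism, not something confirmed in this thread.

```python
import pandas as pd


def merge_nonempty(df1, df2):
    """Outer-merge on the index, skipping the merge if either frame is empty."""
    if len(df1) > 0 and len(df2) > 0:
        return pd.merge(df1, df2, how="outer", left_index=True, right_index=True)
    if len(df1) > 0:
        return df1
    if len(df2) > 0:
        return df2
    return pd.DataFrame()


# Made-up frames: integer column names, indexed by label.
left = pd.DataFrame({0: [10], 1: [20]}, index=[1])
right = pd.DataFrame({0: [30], 1: [40]}, index=[2])
no_rows = pd.DataFrame()

# Overlapping column names receive the default "_x" / "_y" suffixes.
merged = merge_nonempty(left, right)
print(list(merged.columns))

# With an empty partner the guard returns the non-empty frame unchanged.
passthrough = merge_nonempty(left, no_rows)
print(list(passthrough.columns))

# An unguarded merge with a column-less empty frame adds no suffixes,
# so selecting the suffixed names afterwards raises a KeyError.
raw = pd.merge(left, no_rows, how="outer", left_index=True, right_index=True)
try:
    raw[["0_x", "1_x", "0_y", "1_y"]]
except KeyError as err:
    print("KeyError:", err)
```

Note that the guarded result can have either suffixed or unsuffixed columns, so any code that consumes it has to handle both layouts.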

Conclusion

I suspect there's a problem with dd.merge when working on empty frames; however, I haven't found a minimal reproducer yet.

Since the workaround here works and fixes a problem that came up for different people, I'd propose to merge this PR as is and report a potential problem upstream once we identify it.

What do you think @jakirkham? 🙏

m-albert · 2025-05-16T09:39:17Z

This came up again in a new issue (#403), so I'll go ahead and merge this.

If further discussion / changes are required I suggest we open a new issue! Thanks @joshua-gould for this fix and everyone for contributing 🙏

fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" Added test

81bf7c6

joshua-gould changed the title ~~fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_object~~ fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects Jul 18, 2024

m-albert mentioned this pull request Mar 20, 2025

Fix CI test failures #393

Merged

jakirkham added 2 commits March 21, 2025 22:13

Merge branch 'main' into find_objects_empty

413098f

Fix-up conflict resolution

253d943

jakirkham and others added 3 commits March 21, 2025 22:41

Merge branch 'main' into find_objects_empty

948d704

Merge branch 'main' into find_objects_empty

bbec72c

Fix flake8 error

931367b

m-albert merged commit bab4261 into dask:main May 16, 2025
17 checks passed

m-albert mentioned this pull request May 16, 2025

find_objects fails when empty chunks exist inside dask label array. #403

Closed

m-albert merged 6 commits into dask:main from joshua-gould:find_objects_empty

4 participants
