
fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects#384

Merged
m-albert merged 6 commits into dask:main from joshua-gould:find_objects_empty
May 16, 2025

@joshua-gould
Contributor

No description provided.

@joshua-gould joshua-gould changed the title fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_object fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects Jul 18, 2024
@m-albert
Collaborator

m-albert commented Jul 23, 2024

@joshua-gould Thanks for your PR!

I was trying to reproduce the error you're fixing and found that it relates to #335.

The following code

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute()

fails in the presence of pyarrow in the environment and runs through in its absence.

I'm unsure how to proceed; there might be an upstream bug in dask.dataframe that should be reproduced and reported there.

@jmuhlich

I ran into this issue with find_objects too, and setting dataframe.convert-string to False does not fully fix it for me (dask=2025.2.0, dask-image=2024.5.3; also tested with the latest dask-image from GitHub today). I found that triggering the issue depends on the specific layout of the image and on which chunks are all zero. Generally, empty chunks that come "earlier" (to the left or above?) seem to be problematic. A fully empty label image that has more than one chunk in each dimension always errors. The only thing that fixes it for me is reverting to dask=2024.12.1 without dask-expr.
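To see how much of the reproducer's chunk grid is empty, the chunks can be enumerated directly. This is a plain-Python sketch (no dask), and `chunk_bounds` / `empty_chunks` are hypothetical helpers, not part of dask-image. It assumes a 2-D label image given as nested lists and a fixed chunk shape, with a shorter final chunk along each axis the way dask splits 10 into 3, 3, 3, 1.

```python
def chunk_bounds(size, step):
    """Return (start, stop) pairs covering range(size) in steps of `step`.

    The last chunk may be shorter, as when dask splits a dimension of
    length 10 with chunk size 3 into 3, 3, 3, 1.
    """
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def empty_chunks(image, chunks):
    """Hypothetical helper: (row, col) indices of chunks whose labels are all zero."""
    row_bounds = chunk_bounds(len(image), chunks[0])
    col_bounds = chunk_bounds(len(image[0]), chunks[1])
    result = []
    for i, (r0, r1) in enumerate(row_bounds):
        for j, (c0, c1) in enumerate(col_bounds):
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            if not any(block):  # every label in this chunk is 0
                result.append((i, j))
    return result


# The reproducer from this thread: 10x10 image, 3x3 chunks, one labelled pixel.
labels = [[0] * 10 for _ in range(10)]
labels[0][0] = 1
empty = empty_chunks(labels, (3, 3))
n_chunks = len(chunk_bounds(10, 3)) ** 2
print(f"{len(empty)} of {n_chunks} chunks are empty")  # 15 of 16 chunks are empty
```

With a single labelled pixel, only chunk (0, 0) has any labels, so presumably most of the per-chunk results that find_objects later merges as df1 and df2 are empty frames.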

@m-albert
Copy link
Collaborator

m-albert commented Mar 20, 2025

Thanks @jmuhlich for reporting this here!

It seems that the following example (which is also included in the tests added in this PR):

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute(scheduler='single-threaded')
  1. fails on main
  2. fails on Fix CI test failures #393 (which sets dataframe.convert-string to False)
  3. runs through on this PR

In this PR, @joshua-gould works around problems that occur when merging dask dataframes.

Also, here there's a mention of a pandas bug when merging dataframes; the error here might be related to that.

I haven't had time yet to find out what's going wrong in the merge. I think it'd be good to report the results of this upstream.

Independently of upstream, I think we should incorporate this workaround here.

"I found that triggering the issue depends on the specific layout of the image and which chunks are all zero"

@jmuhlich Does the code in this PR fix the problems you mention?

@m-albert m-albert mentioned this pull request Mar 20, 2025
@jakirkham
Member

Fixed up some conflicts introduced by a recent PR fixing CI: #393

Hope that is ok

Please feel free to tweak further as needed

@m-albert
Collaborator

Quick summary

This PR fixes find_objects for a straightforward use case and adds a test for it.

Problem on main

The following fails

    import dask.array as da
    import dask_image.ndmeasure

    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute(scheduler='single-threaded')

The fix in this PR

The following fails with empty df1 and df2 in some cases:

    ddf = dd.merge(df1, df2, how="outer", left_index=True, right_index=True)

The following workaround in this PR fixes it:

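    import dask.dataframe as dd
    import pandas as pd

    # len() on a dask DataFrame computes its row count.
    if len(df1) > 0 and len(df2) > 0:
        # Both sides have rows: do the outer merge on the index.
        ddf = dd.merge(
            df1, df2,
            how="outer", left_index=True, right_index=True)
    elif len(df1) > 0:
        # Only df1 has rows: use it unchanged.
        ddf = df1
    elif len(df2) > 0:
        # Only df2 has rows: use it unchanged.
        ddf = df2
    else:
        # Neither side has rows.
        ddf = pd.DataFrame()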
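The guard above calls the outer merge only when both sides have rows. The same control flow can be sketched without dask or pandas, assuming each chunk's result is a dict mapping a label to one (start, stop) pair per dimension, and that two entries for the same label are combined into their enclosing box. `combine` and `merge_boxes` are hypothetical names for illustration, not the dask-image implementation. Labels present on only one side are passed through unchanged, which is what an outer join does.

```python
def combine(box_a, box_b):
    """Enclosing box of two boxes given as tuples of (start, stop) per dimension."""
    return tuple(
        (min(a0, b0), max(a1, b1)) for (a0, a1), (b0, b1) in zip(box_a, box_b)
    )


def merge_boxes(left, right):
    """Hypothetical outer merge of two {label: box} mappings with an empty-input guard."""
    if left and right:
        merged = {}
        # Outer join: every label from either side appears in the result.
        for label in left.keys() | right.keys():
            if label in left and label in right:
                merged[label] = combine(left[label], right[label])
            else:
                merged[label] = left.get(label, right.get(label))
        return merged
    elif left:
        return dict(left)
    elif right:
        return dict(right)
    else:
        return {}


chunk_a = {1: ((0, 3), (0, 3))}
chunk_b = {1: ((0, 3), (3, 5)), 2: ((1, 2), (4, 6))}
print(merge_boxes(chunk_a, chunk_b))
print(merge_boxes({}, chunk_b) == chunk_b)  # True
print(merge_boxes({}, {}))  # {}
```

In plain Python the general branch would handle empty inputs correctly on its own; the explicit `elif` branches are kept only to mirror the structure of the workaround, where the point is to avoid handing empty frames to dd.merge.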

Conclusion

I suspect there's a problem with dd.merge when working on empty frames; however, I haven't found a minimal reproducer yet.

Since the workaround here works and fixes a problem that came up for different people, I'd propose to merge this PR as is and report a potential problem upstream once we identify it.

What do you think @jakirkham? 🙏

@m-albert
Collaborator

This came up again in a new issue (#403), so I'll go ahead and merge this.

If further discussion / changes are required I suggest we open a new issue! Thanks @joshua-gould for this fix and everyone for contributing 🙏

@m-albert m-albert merged commit bab4261 into dask:main May 16, 2025
17 checks passed