@crusaderky (Contributor) commented Jan 23, 2025

Crude implementation of sort and argsort for dask.array. It is functionally correct but can be extremely memory- and network-intensive.

A better solution would be to implement these two functions in dask.array itself, on top of the shuffle subsystem which is already used for dask.dataframe.DataFrame.sort_values.
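For illustration only, here is a minimal sketch of how a crude sort of this kind might work. It assumes the approach is to merge all chunks along the sort axis into a single chunk and then sort block by block; that is an assumption, not a description of the code in this PR. `crude_sort` is a hypothetical name. The rechunk step is where the memory and network cost would come from, since every output block needs the full extent of the sort axis.

```python
import dask.array as da
import numpy as np


def crude_sort(x: da.Array, axis: int = -1) -> da.Array:
    """Sort a dask array along ``axis`` (hypothetical helper, not the PR's code)."""
    axis = axis % x.ndim  # normalise negative axes
    original_chunks = x.chunks
    # Merge every chunk along the sort axis into one. This is the expensive
    # step: each block must hold the full extent of the axis in memory.
    x = x.rechunk({axis: -1})
    # Each block now contains complete lanes along the axis, so a block-wise
    # NumPy sort gives the globally sorted result.
    out = da.map_blocks(np.sort, x, axis=axis, dtype=x.dtype)
    # Restore the caller's original chunking.
    return out.rechunk(original_chunks)


data = np.random.default_rng(0).integers(0, 100, size=(6, 20))
x = da.from_array(data, chunks=(3, 5))
result = crude_sort(x, axis=1)
```

A shuffle-based implementation would avoid the single-chunk requirement, which is why it is suggested above as the better long-term solution.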

FYI @fjetter @phofl @hendrikmakait

@crusaderky force-pushed the dask_sort branch 4 times, most recently from 1816c16 to 1a7316f, January 23, 2025 10:20

@crusaderky (Contributor, Author) commented:

FYI @lucascolley @lithomas1

@crusaderky force-pushed the dask_sort branch 3 times, most recently from c0f8617 to 8500867, January 23, 2025 11:21

@crusaderky (Contributor, Author) commented:

@ev-br @lucascolley ready for review and merge.

@crusaderky mentioned this pull request Jan 23, 2025

@lucascolley (Member) left a comment:

Would you like to get feedback from a Dask expert before we merge this? Or are you confident that it is at least good enough for now?

@crusaderky (Contributor, Author) commented:

I'm confident that this is tolerable at least as a temporary crutch. It will work for some geometries and will go OOM for others, which IMHO is vastly better than not having anything at all.

@lucascolley merged commit fa558f2 into data-apis:main Jan 26, 2025
42 checks passed
@crusaderky deleted the dask_sort branch January 26, 2025 19:11