-
Notifications
You must be signed in to change notification settings - Fork 97
Closes #5228: remove type ignore from factorize in extension module #5229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #5228: remove type ignore from factorize in extension module #5229
Conversation
74ee5f0 to
937d368
Compare
1RyanK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
937d368 to
bfdfb5a
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5229 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 4
Lines ? 63
Branches ? 0
========================================
Hits ? 63
Misses ? 0
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
jaketrookman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
drculhane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly seems to do what's advertised. Nice work.
PR: Align ExtensionArray factorize and argsort with pandas expectations
Summary
Align pandas
ExtensionArraybehavior with pandas expectations by returning NumPy arrays(not Arkouda arrays) for
factorizecodes andargsortindices, while keeping all groupingand sorting computation server-side in Arkouda.
This improves pandas compatibility, simplifies downstream pandas internals
(e.g.
groupby,take,iloc), and clarifies API semantics.Key changes
ArkoudaExtensionArray.factorizecodesare now returned as a NumPy array of dtypenp.intp, as expected by pandas.uniquesare returned as anExtensionArrayof the same type asself.sortargument:NaNas missing.use_na_sentinelcontrols whether missing values map to-1orlen(uniques).ArkoudaExtensionArray.argsortpdarrayto NumPyndarray[np.intp].na_positionis now accepted via**kwargsfor pandas compatibility.ExtensionArraycontract more precisely.Tests
factorizecodesargsortindicessort-parameter test cases and adjusted expectations to first-appearance semantics.numpy.testing.assert_equal) for clarity and correctness.Motivation
Pandas internals expect:
factorize→ NumPy integer codesargsort→ NumPy permutation indicesReturning Arkouda arrays in these paths caused unnecessary friction and divergence from the
pandas
ExtensionArraycontract. This PR preserves Arkouda’s distributed execution modelwhile presenting pandas-native results at the API boundary.
Closes #5228: remove type ignore from factorize in extension module