Closes #5228: remove type ignore from factorize in extension module #5229
PR: Align ExtensionArray factorize and argsort with pandas expectations
Summary
Align pandas `ExtensionArray` behavior with pandas expectations by returning NumPy arrays (not Arkouda arrays) for `factorize` codes and `argsort` indices, while keeping all grouping and sorting computation server-side in Arkouda.
This improves pandas compatibility, simplifies downstream pandas internals (e.g. `groupby`, `take`, `iloc`), and clarifies API semantics.

Key changes
`ArkoudaExtensionArray.factorize`
- `codes` are now returned as a NumPy array of dtype `np.intp`, as expected by pandas.
- `uniques` are returned as an `ExtensionArray` of the same type as `self`.
- `sort` argument: `NaN` as missing.
- `use_na_sentinel` controls whether missing values map to `-1` or `len(uniques)`.

`ArkoudaExtensionArray.argsort`
- Return type changed from `pdarray` to NumPy `ndarray[np.intp]`.
- `na_position` is now accepted via `**kwargs` for pandas compatibility.
- `ExtensionArray` contract more precisely.

Tests
- `factorize` codes
- `argsort` indices
- `sort`-parameter test cases and adjusted expectations to first-appearance semantics.
- `numpy.testing.assert_equal` for clarity and correctness.

Motivation
Pandas internals expect:
- `factorize` → NumPy integer codes
- `argsort` → NumPy permutation indices

Returning Arkouda arrays in these paths caused unnecessary friction and divergence from the pandas `ExtensionArray` contract. This PR preserves Arkouda's distributed execution model while presenting pandas-native results at the API boundary.