-
Notifications
You must be signed in to change notification settings - Fork 97
Closes #5230: ArkoudaExtensionArray arithmetic #5231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Closes #5230: ArkoudaExtensionArray arithmetic #5231
Conversation
dd8d9a8 to
7357416
Compare
1RyanK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| return NotImplemented | ||
|
|
||
| result = op(self._data, other) | ||
| return type(self)(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use _from_data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Fixed.
|
|
||
|
|
||
| # Self-type for correct return typing | ||
| EA = TypeVar("EA", bound="ExtensionArray") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used? Or just for future use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
| implementation of ``op``. | ||
| """ | ||
| if isinstance(other, ExtensionArray) and hasattr(other, "_data"): | ||
| other = other._data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be concerned that other's _data could be something other than a pdarray or ndarray? ExtensionArray is rather broad... But I guess if you want to use our extension arrays with someone else, compatibility is at your own risk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I added some additional type handling.
| """ | ||
| if isinstance(other, ExtensionArray) and hasattr(other, "_data"): | ||
| other = other._data | ||
| elif np.isscalar(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, pandas has things that numpy doesn't consider to be scalars, like pd.NA. But maybe some of those would be fine here? I'm not saying this is something that has to be addressed in this PR but maybe make an issue for the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I created a ticket: #5311
Right now, if you add pd.NA to an ArkoudaArray, _arith_method will return NotImplemented, and pandas will resort to a fall back method, typically converting the array to NumPy.
I think we need to figure out our own NA handling before we can fully address this.
| (operator.mul, 2, np.array([2, 4, 6])), | ||
| ], | ||
| ) | ||
| def test_arith_method_with_scalar_operand(self, op, scalar, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we test adding float scalars? See how promotion works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I added some float examples.
7357416 to
774ac07
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5231 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 4
Lines ? 63
Branches ? 0
========================================
Hits ? 63
Misses ? 0
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
drculhane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as if this allows all numerical types (I tested it manually with ak.float64, ak.int64, ak.uint64 and ak.bool_). Typically we parametrize the tests to include all types. Is that not needed here?
jaketrookman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Enable pandas arithmetic dispatch for Arkouda ExtensionArray
Summary
This PR implements the pandas
ExtensionArrayarithmetic hook (_arith_method) forArkoudaExtensionArray, enabling elementwise arithmetic operations (e.g.+,-,*)between Arkouda-backed arrays and with scalars.
Motivation
Pandas does not automatically dispatch Python operators (
__add__, etc.) forExtensionArrays. Instead, arithmetic is routed through
_arith_method. Without thishook, expressions like:
raise
TypeError.Implementing
_arith_methodis required for:Series/DataFramearithmeticWhat’s in this PR
Core functionality
_arith_methodtoArkoudaExtensionArrayNotImplementedfor unsupported operand types_from_dataconstructor helper for mypy-safe instance creation_dataattribute for static typingTyping & correctness
typing_extensions.Selffor precise self-type returnsNotImplementedType(not the value) in return annotationsTests
NotImplementedpropagationTypeErrorbehavior for unsupported operandsargsort/ NaN placement tests remain unchanged and passingDesign notes
before calling into the ExtensionArray.
operations.
operator overloading.
Example
Reviewer notes
_from_datahelper is intentionally minimal and centralizes EA construction.hasattr(other, "_data")) is used instead of concrete EA imports toavoid circular dependencies.
modified.
Closes #5230: ArkoudaExtensionArray arithmetic