Contributor

@ajpotts ajpotts commented Jan 6, 2026

Enable pandas arithmetic dispatch for Arkouda ExtensionArray

Summary

This PR implements the pandas ExtensionArray arithmetic hook (_arith_method) for
ArkoudaExtensionArray, enabling elementwise arithmetic (e.g. +, -, *) between two
Arkouda-backed arrays and between an Arkouda-backed array and a scalar.


Motivation

Pandas does not automatically dispatch Python operators (__add__, etc.) for
ExtensionArrays. Instead, arithmetic is routed through _arith_method. Without this
hook, expressions like:

pd.array([1, 2, 3], dtype="ak_int64") + pd.array([4, 5, 6], dtype="ak_int64")

raise TypeError.

Implementing _arith_method is required for:

  • correct pandas operator dispatch
  • future Series / DataFrame arithmetic
  • consistency with pandas ExtensionArray contracts
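The dispatch described above can be sketched without pandas or Arkouda. `ToyOpsMixin` and `ToyArray` below are hypothetical names used only for illustration, not classes from either library. The sketch shows operator dunders forwarding to a single `_arith_method` hook, and Python raising `TypeError` once the hook returns `NotImplemented` and no reflected method is available.

```python
import operator


class ToyOpsMixin:
    """Hypothetical mixin: forwards Python operators to one hook."""

    def _arith_method(self, other, op):
        # Subclasses override this; the default declines every operation.
        return NotImplemented

    def __add__(self, other):
        return self._arith_method(other, operator.add)

    def __sub__(self, other):
        return self._arith_method(other, operator.sub)

    def __mul__(self, other):
        return self._arith_method(other, operator.mul)


class ToyArray(ToyOpsMixin):
    """Hypothetical list-backed array used only for illustration."""

    def __init__(self, values):
        self._data = list(values)

    def _arith_method(self, other, op):
        if isinstance(other, ToyArray):
            # array-array: apply op pairwise (equal lengths assumed)
            return ToyArray(op(a, b) for a, b in zip(self._data, other._data))
        if isinstance(other, (int, float)):
            # array-scalar: broadcast the scalar over every element
            return ToyArray(op(a, other) for a in self._data)
        return NotImplemented


x = ToyArray([1, 2, 3])
y = ToyArray([4, 5, 6])
summed = (x + y)._data

# An unsupported operand makes the hook return NotImplemented; Python then
# looks for a reflected method on the other operand and raises TypeError.
try:
    x + "a"
    raised = False
except TypeError:
    raised = True
```

Without the hook override, every operator in this toy would end in `TypeError`, which is the failure mode the motivation describes.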

What’s in this PR

Core functionality

  • Adds _arith_method to ArkoudaExtensionArray
    • Supports EA–EA and EA–scalar operations
    • Returns NotImplemented for unsupported operand types
    • Preserves the concrete EA type on return
  • Adds _from_data constructor helper for mypy-safe instance creation
  • Annotates internal _data attribute for static typing
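A minimal sketch of the `_from_data` idea, assuming the helper is a classmethod that wraps already-computed backing data without going through `__init__`. The class names and the list-based backing store are hypothetical, and the PR's actual helper may differ.

```python
import operator


class ToyBaseArray:
    """Hypothetical base class; a plain list stands in for the Arkouda data."""

    def __init__(self, values):
        # The public constructor converts arbitrary iterables.
        self._data = list(values)

    @classmethod
    def _from_data(cls, data):
        # Wrap existing backing data directly; `cls` keeps the concrete subclass.
        obj = cls.__new__(cls)
        obj._data = data
        return obj

    def _arith_method(self, other, op):
        if isinstance(other, ToyBaseArray):
            other_values = other._data
        elif isinstance(other, (int, float)):
            other_values = [other] * len(self._data)
        else:
            return NotImplemented
        result = [op(a, b) for a, b in zip(self._data, other_values)]
        return type(self)._from_data(result)


class ToyIntArray(ToyBaseArray):
    """Hypothetical concrete subclass."""


x = ToyIntArray([1, 2, 3])
y = ToyIntArray([10, 20, 30])
total = x._arith_method(y, operator.add)
declined = x._arith_method(object(), operator.add)
```

Routing all construction through one helper means a subclass result is a `ToyIntArray` here, not a `ToyBaseArray`, and there is a single place to adjust if the constructor signature changes.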

Typing & correctness

  • Uses typing_extensions.Self for precise self-type returns
  • Uses NotImplementedType (not the value) in return annotations
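A sketch of the two annotations on a hypothetical stand-in class. `NotImplemented` is a value, so a return annotation needs its type, `types.NotImplementedType` (available from Python 3.10). The typing-only imports sit under `TYPE_CHECKING` and the return annotation is quoted, so the snippet runs even where `typing_extensions` is not installed.

```python
import operator
from typing import TYPE_CHECKING, Any, Callable

if TYPE_CHECKING:
    # Only needed by the type checker, never imported at runtime.
    from types import NotImplementedType

    from typing_extensions import Self


class ToyTypedArray:
    """Hypothetical array used only to illustrate the annotations."""

    _data: list

    def __init__(self, values) -> None:
        self._data = list(values)

    def _arith_method(
        self, other: Any, op: Callable[[Any, Any], Any]
    ) -> "Self | NotImplementedType":
        if not isinstance(other, (int, float)):
            # Returning the value; the annotation names its type.
            return NotImplemented
        return type(self)([op(a, other) for a in self._data])


class ToyChild(ToyTypedArray):
    """Hypothetical subclass; Self resolves to ToyChild for the checker."""


doubled = ToyChild([1, 2, 3])._arith_method(2, operator.mul)
```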

Tests

  • Adds unit tests covering:
    • EA–EA arithmetic
    • EA–scalar arithmetic
    • NotImplemented propagation
    • User-visible TypeError behavior for unsupported operands
  • Existing argsort / NaN placement tests remain unchanged and passing

Design notes

  • Index alignment is intentionally not handled here; pandas performs alignment
    before calling into the ExtensionArray.
  • Type coercion and promotion semantics are delegated to the underlying Arkouda
    operations.
  • The implementation follows pandas’ recommended EA patterns rather than Python
    operator overloading.

Example

import pandas as pd

x = pd.array([1, 2, 3], dtype="ak_int64")
y = pd.array([10, 20, 30], dtype="ak_int64")

x + y
# ArkoudaArray([11 22 33])

Reviewer notes

  • The _from_data helper is intentionally minimal and centralizes EA construction.
  • Duck-typing (hasattr(other, "_data")) is used instead of concrete EA imports to
    avoid circular dependencies.
  • All changes are localized to the ExtensionArray layer; no pandas behavior is
    modified.

Closes #5230: ArkoudaExtensionArray arithmetic

@ajpotts ajpotts force-pushed the 5230_ArkoudaExtensionArray_arithmetic branch from dd8d9a8 to 7357416 on January 6, 2026 12:23
@ajpotts ajpotts marked this pull request as ready for review January 6, 2026 15:41
Collaborator

@1RyanK 1RyanK left a comment

Looks good!

return NotImplemented

result = op(self._data, other)
return type(self)(result)
Collaborator

Should this use _from_data?

Contributor Author

Good idea. Fixed.



# Self-type for correct return typing
EA = TypeVar("EA", bound="ExtensionArray")
Collaborator

Is this used? Or just for future use?

Contributor Author

I removed it.

implementation of ``op``.
"""
if isinstance(other, ExtensionArray) and hasattr(other, "_data"):
    other = other._data
Collaborator

Should we be concerned that other's _data could be something other than a pdarray or ndarray? ExtensionArray is rather broad... But I guess if you want to use our extension arrays with someone else, compatibility is at your own risk?

Contributor Author

Great catch. I added some additional type handling.

"""
if isinstance(other, ExtensionArray) and hasattr(other, "_data"):
    other = other._data
elif np.isscalar(other):
Collaborator

So, pandas has things that numpy doesn't consider to be scalars, like pd.NA. But maybe some of those would be fine here? I'm not saying this is something that has to be addressed in this PR but maybe make an issue for the future?

Contributor Author

OK, I created a ticket: #5311

Right now, if you add pd.NA to an ArkoudaArray, _arith_method returns NotImplemented and pandas falls back to another code path, typically converting the array to NumPy.

I think we need to figure out our own NA handling before we can fully address this.
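To illustrate why a missing-value sentinel ends up on the `NotImplemented` path, here is a pure-Python sketch. `ToyNAType` is a hypothetical stand-in for something like pd.NA, and `is_numeric_scalar` is a simplified stand-in for a scalar check that only recognizes numeric types; neither reproduces the real NumPy or pandas logic.

```python
import numbers
import operator


class ToyNAType:
    """Hypothetical missing-value sentinel (not pandas' NA type)."""

    def __repr__(self):
        return "<NA>"


TOY_NA = ToyNAType()


def is_numeric_scalar(value):
    # Simplified stand-in for a scalar check that only knows numeric types.
    return isinstance(value, numbers.Number)


def toy_arith(data, other, op):
    """Hypothetical hook body: handles numeric scalars, declines the rest."""
    if is_numeric_scalar(other):
        return [op(a, other) for a in data]
    # The sentinel is not a number, so the hook declines and the caller
    # is left to pick its own fallback.
    return NotImplemented


handled = toy_arith([1, 2, 3], 2, operator.mul)
declined = toy_arith([1, 2, 3], TOY_NA, operator.add)
```

Supporting the sentinel would mean adding an explicit branch ahead of the scalar check, which in turn requires deciding how missing values are represented in the backing data.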

(operator.mul, 2, np.array([2, 4, 6])),
],
)
def test_arith_method_with_scalar_operand(self, op, scalar, expected):
Collaborator

Should we test adding float scalars? See how promotion works?

Contributor Author

Sure, I added some float examples.
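The float cases can be sketched as a table-driven check in the style of the parametrized test, without pytest or Arkouda. This uses plain Python numbers, where combining an `int` with a `float` yields a `float`; Arkouda's own promotion rules are not verified here. `apply_scalar` is a hypothetical helper.

```python
import operator


def apply_scalar(values, scalar, op):
    """Hypothetical helper: apply `op` between each element and a scalar."""
    return [op(v, scalar) for v in values]


# (op, scalar, expected) rows, in the style of a parametrized test
cases = [
    (operator.add, 1, [2, 3, 4]),
    (operator.mul, 2, [2, 4, 6]),
    (operator.add, 0.5, [1.5, 2.5, 3.5]),
    (operator.mul, 1.5, [1.5, 3.0, 4.5]),
]

results = []
for op, scalar, expected in cases:
    got = apply_scalar([1, 2, 3], scalar, op)
    assert got == expected
    # Integer elements combined with a float scalar come back as floats.
    if isinstance(scalar, float):
        assert all(isinstance(v, float) for v in got)
    results.append(got)
```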

@ajpotts ajpotts force-pushed the 5230_ArkoudaExtensionArray_arithmetic branch from 7357416 to 774ac07 on January 15, 2026 09:47

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@983fc7a).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5231   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         4           
  Lines           ?        63           
  Branches        ?         0           
========================================
  Hits            ?        63           
  Misses          ?         0           
  Partials        ?         0           

Contributor

@drculhane drculhane left a comment

It looks as if this allows all numerical types (I tested it manually with ak.float64, ak.int64, ak.uint64 and ak.bool_). Typically we parametrize the tests to include all types. Is that not needed here?

@ajpotts ajpotts enabled auto-merge January 15, 2026 19:21
Contributor

@jaketrookman jaketrookman left a comment

Looks good
