-
Notifications
You must be signed in to change notification settings - Fork 168
feat: Add {Expr,Series}.is_close
#2962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
😂😂😂 |
Only a tiny dent into the 600, but ... #2962 (comment)
def is_close( | ||
self, | ||
other: Self | NumericLiteral, | ||
*, | ||
abs_tol: float, | ||
rel_tol: float, | ||
nans_equal: bool, | ||
) -> Self: | ||
return self._reuse_series( | ||
"is_close", | ||
other=other, | ||
abs_tol=abs_tol, | ||
rel_tol=rel_tol, | ||
nans_equal=nans_equal, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be needed 🤔
narwhals/_utils.py
Outdated
return deep_attrgetter(name_1, *nested)(obj) | ||
|
||
|
||
def _is_close_impl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, we could use this implementation also for polars pre 1.32.0, however polars has a couple of native methods which play a bit nicer (e.g. .sign
, .is_infinite
, .not_
). If we were to introduce those, then we can have a single implementation function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not_
and __invert__
are identical btw
""" | ||
if abs_tol < 0: | ||
msg = f"`abs_tol` must be non-negative but got {abs_tol}" | ||
raise ComputeError(msg) | ||
|
||
if not (0 <= rel_tol < 1): | ||
msg = f"`rel_tol` must be in the range [0, 1) but got {rel_tol}" | ||
raise ComputeError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For lazy backends, we are not raising for non-numeric dtypes
@dangotbanned I am quite happy with the state of this. I left a couple of comments. Main blocker is typing. In a way I feel like mypy it's not picking up the obvious, but clearly I am not doing something as it's expected to be done 😂 |
I've got 3 different ideas for fixing the typing, each with varying trade-offs I'll try to write them up when I get a chance later today/tomorrow |
@dangotbanned look at 22e2203 🙏🏼 |
Bad news: that didn't fix the typing 😭 Good news: I still have 3 other ideas 😄 |
Idea 1 - Use a constrained
|
def is_close( | |
self, | |
other: Self | NumericLiteral, | |
*, | |
abs_tol: float, | |
rel_tol: float, | |
nans_equal: bool, | |
) -> Self: | |
return _is_close_impl( | |
self, other, abs_tol=abs_tol, rel_tol=rel_tol, nans_equal=nans_equal | |
) |
might need to change to this, since the function would now return CompliantSeriesAny
from typing import cast
result = _is_close_impl(
self, other, abs_tol=abs_tol, rel_tol=rel_tol, nans_equal=nans_equal
)
return cast("Self", result)
Idea 2 - Move utils._is_close_impl
into nw.Expr
and dispatch nw.Series
to nw.Expr
If we're already inside Expr
, we have a bounded Self
to use for self
, other
- rather than handling CompliantExprT | CompliantSeriesT
.
I can explain the benefits more if needed, but overall seems quite clean to me
Downsides
I guess the only wart is polars
, but that has it's own version now anyway.
nw.Series
has Series.implementation
but not too sure how branching could work for nw.Expr
when evaluating for polars
🤔
Idea 3 - Add a Protocol
for IsClose
, shared by Compliant{Expr,Series}
Both mypy
and pyright
are accepting this
You could add it as base for each Compliant*
and remove their definitions of is_close
+ the _utils
function
class IsClose(Protocol):
"""Every member defined is a dependency of `is_close`."""
def __and__(self, other: Any) -> Self: ...
def __or__(self, other: Any) -> Self: ...
def __invert__(self) -> Self: ...
def __sub__(self, other: Any) -> Self: ...
def __mul__(self, other: Any) -> Self: ...
def __eq__(self, other: Self | Any) -> Self: ... # type: ignore[override]
def __gt__(self, other: Any) -> Self: ...
def __ge__(self, other: Any) -> Self: ...
def __lt__(self, other: Any) -> Self: ...
def __le__(self, other: Any) -> Self: ...
def abs(self) -> Self: ...
def is_nan(self) -> Self: ...
def is_finite(self) -> Self: ...
def clip(
self,
lower_bound: Self | NumericLiteral | TemporalLiteral | None,
upper_bound: Self | NumericLiteral | TemporalLiteral | None,
) -> Self: ...
def is_close(
self,
other: Self | NumericLiteral,
*,
abs_tol: float,
rel_tol: float,
nans_equal: bool,
) -> Self:
from decimal import Decimal
other_abs: Self | NumericLiteral
other_is_nan: Self | bool
other_is_inf: Self | bool
other_is_not_inf: Self | bool
if isinstance(other, (float, int, Decimal)):
from math import isinf, isnan
other_abs = other.__abs__()
other_is_nan = isnan(other)
other_is_inf = isinf(other)
other_is_not_inf = not other_is_inf
else:
other_abs, other_is_nan = other.abs(), other.is_nan()
other_is_not_inf = other.is_finite() | other_is_nan
other_is_inf = ~other_is_not_inf
rel_threshold = self.abs().clip(lower_bound=other_abs, upper_bound=None) * rel_tol
tolerance = rel_threshold.clip(lower_bound=abs_tol, upper_bound=None)
self_is_nan = self.is_nan()
self_is_not_inf = self.is_finite() | self_is_nan
# Values are close if abs_diff <= tolerance, and both finite
is_close = (
((self - other).abs() <= tolerance) & self_is_not_inf & other_is_not_inf
)
# Handle infinity cases: infinities are "close" only if they have the same sign
self_sign, other_sign = self > 0, other > 0
is_same_inf = (~self_is_not_inf) & other_is_inf & (self_sign == other_sign)
# Handle nan cases:
# * nans_equals = True => if both values are NaN, then True
# * nans_equals = False => if any value is NaN, then False
either_nan = self_is_nan | other_is_nan
result = (is_close | is_same_inf) & ~either_nan
if nans_equal:
both_nan = self_is_nan & other_is_nan
result = result | both_nan
return result
Hey @dangotbanned thanks for the amazing writeup in #2962 (comment) It's kind of re-assuring that I somehow thought at a 2/3 of those ideas.
This was my first attempt, and yes you are correct that it leads to the issues with the return type not being
For better or for worse, I retained from using cast, and preferred adding one
This was indeed working, yet personally I am not too keen in triggering an extra ...which bring us to
which I couldn't dream of 🤯 It works smoothly (I assume), maybe a little verbose, yet possibly it's the preferred solution? What's you opinion? |
aha thanks and I agree great to see we're on the same page 😄
Yeah that would be my preference! The verbosity is only temporary though, as it gave me an idea for a follow-up. We could move quite a lot into a shared So the end result would be shrinking both protocols, removing But we can save that part for another PR (provided it sounds good to you) |
Co-authored-by: Dan Redding <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good, thanks @FBruzzesi
Only a few notes/suggestions from me 🙂
>>> s.is_close(1.4, abs_tol=0.1).to_native() # doctest:+ELLIPSIS | ||
<pyarrow.lib.ChunkedArray object at ...> | ||
[ | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a sneaky solution to (#2776 (comment)) 😂?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂 that's the only way to use it in doctest, maybe it's an idea?
Thanks for the additional simplification @dangotbanned I addressed the
in e4b7a1e and figured out how katex wants |
I'll look now! I'm pretty sure the rest was good, but will do a thrice over 😄 |
![]() No luck for me in VSCode ![]() but it seems to be the same story with https://narwhals-dev.github.io/narwhals/api-reference/expr/#narwhals.Expr.ewm_mean 😂 ![]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expr|Series.is_close
{Expr,Series}.is_close
{Expr,Series}.is_close
{Expr,Series}.is_close
thanks both, great feature! |
Now that #2962 has merged, this part of the plan is possible (#2962 (comment))
What type of PR is this? (check all applicable)
Related issues
is_close
#2911Checklist
If you have comments or can explain your changes, please do so below
Reason for "help wanted" label:
_is_close_impl
is almost possible to avoid any code duplication. All we need is already in narwhals, and no backend has this natively implemented (I think). However.... typing issues (which I didn't spend too much time on as of now but will give it a try)test_is_close_expr_with_scalar
with other being a finite scalar andnans_equal=True
still returning True when the column value is nan.It's been a long day and I couldn't point down the reason. It could be that logical operators are not supposed to be used with scalar booleans. If that's the case, the implementation might need to be split up between eager and lazy.Logic is now fixed 😇Also I keep submitting PRs which as 600 line changes with one functionality implemented 😩