@FBruzzesi
Member

@FBruzzesi FBruzzesi commented Aug 7, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Reason for "help wanted" label:

  1. By implementing _is_close_impl, it's almost possible to avoid any code duplication. Everything we need is already in narwhals, and (I think) no backend implements this natively. However... typing issues (which I haven't spent much time on yet, but I'll give it a try)
  2. One of the tests is failing for the SQL-like backends: test_is_close_expr_with_scalar, where other is a finite scalar and nans_equal=True, still returns True when the column value is NaN. It's been a long day and I couldn't pin down the reason. It could be that logical operators are not supposed to be used with scalar booleans; if that's the case, the implementation might need to be split between eager and lazy. Update: the logic is now fixed 😇
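For reference, the element-wise result the list above is aiming for can be pinned down with a pure-Python scalar sketch. It assumes the tolerance rule used later in this thread, |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol), which is the same rule math.isclose applies to non-NaN inputs (equal infinities are close; an infinity is never close to a finite value). is_close_scalar is a hypothetical name, not part of narwhals.

```python
import math


def is_close_scalar(
    a: float, b: float, *, abs_tol: float, rel_tol: float, nans_equal: bool
) -> bool:
    """Hypothetical scalar reference for the element-wise is_close semantics."""
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan or b_nan:
        # Any NaN gives False, unless nans_equal=True and *both* values are NaN.
        return nans_equal and a_nan and b_nan
    # |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol); same-sign infinities are
    # close, while inf vs -inf (or inf vs a finite value) is not.
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


nan = float("nan")
# The failing case from item 2: finite scalar `other`, NaN value, nans_equal=True.
print(is_close_scalar(nan, 1.0, abs_tol=0.0, rel_tol=1e-9, nans_equal=True))  # False
```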

Also, I keep submitting PRs which have 600-line changes with only one feature implemented 😩

@FBruzzesi FBruzzesi added enhancement New feature or request help wanted Extra attention is needed labels Aug 7, 2025
@dangotbanned
Member

Also, I keep submitting PRs which have 600-line changes with only one feature implemented 😩

😂😂😂

Comment on lines +907 to +921
    def is_close(
        self,
        other: Self | NumericLiteral,
        *,
        abs_tol: float,
        rel_tol: float,
        nans_equal: bool,
    ) -> Self:
        return self._reuse_series(
            "is_close",
            other=other,
            abs_tol=abs_tol,
            rel_tol=rel_tol,
            nans_equal=nans_equal,
        )
Member Author

This might not be needed 🤔

return deep_attrgetter(name_1, *nested)(obj)


def _is_close_impl(
Member Author

In principle, we could also use this implementation for polars pre 1.32.0; however, polars has a couple of native methods that play a bit nicer (e.g. .sign, .is_infinite, .not_). If we were to introduce those, we could have a single implementation function.

Member

not_ and __invert__ are identical btw

Comment on lines +2412 to +2419
"""
if abs_tol < 0:
    msg = f"`abs_tol` must be non-negative but got {abs_tol}"
    raise ComputeError(msg)

if not (0 <= rel_tol < 1):
    msg = f"`rel_tol` must be in the range [0, 1) but got {rel_tol}"
    raise ComputeError(msg)
Member Author

For lazy backends, we are not raising for non-numeric dtypes

@FBruzzesi
Member Author

@dangotbanned I am quite happy with the state of this. I left a couple of comments. The main blocker is typing. In a way, I feel like mypy is not picking up the obvious, but clearly I am not doing something the way it's expected to be done 😂

@dangotbanned
Member

I've got 3 different ideas for fixing the typing, each with varying trade-offs

I'll try to write them up when I get a chance later today/tomorrow

@FBruzzesi
Member Author

I've got 3 different ideas for fixing the typing, each with varying trade-offs

I'll try to write them up when I get a chance later today/tomorrow

@dangotbanned look at 22e2203 🙏🏼

@dangotbanned
Member

dangotbanned commented Aug 8, 2025

I've got 3 different ideas for fixing the typing, each with varying trade-offs
I'll try to write them up when I get a chance later today/tomorrow

@dangotbanned look at 22e2203 🙏🏼

Bad news: that didn't fix the typing 😭
(there's still 2x type ignore(s), probably more if not doing 3x assignments on a single line)

Good news: I still have 3 other ideas 😄
(tomorrow!)

@dangotbanned dangotbanned self-requested a review August 9, 2025 12:12
@dangotbanned
Member

dangotbanned commented Aug 9, 2025

I've got 3 different ideas for fixing the typing, each with varying trade-offs
I'll try to write them up when I get a chance later today/tomorrow
Good news: I still have 3 other ideas 😄 (tomorrow!)

Idea 1 - Use a constrained TypeVar

Some related threads

(22e2203) changes things a bit from when I first thought of this, but it would've looked like:

CompliantSeriesOrExprT = TypeVar("CompliantSeriesOrExprT", CompliantSeriesAny, CompliantExprAny)

Downsides

Even if that solves most of the issues, I still expect some cast(s) will be required.

My guess is here:

CompliantSeries.is_close

    def is_close(
        self,
        other: Self | NumericLiteral,
        *,
        abs_tol: float,
        rel_tol: float,
        nans_equal: bool,
    ) -> Self:
        return _is_close_impl(
            self, other, abs_tol=abs_tol, rel_tol=rel_tol, nans_equal=nans_equal
        )

might need to change to this, since the function would now return CompliantSeriesAny

        from typing import cast

        result = _is_close_impl(
            self, other, abs_tol=abs_tol, rel_tol=rel_tol, nans_equal=nans_equal
        )
        return cast("Self", result)
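A self-contained toy version of Idea 1, to show where the cast comes from. ToySeries, ToyExpr and _negate_impl are hypothetical stand-ins for CompliantSeries, CompliantExpr and _is_close_impl. A type checker solves a constrained TypeVar to one of its constraints exactly, so inside a method _negate_impl(self) is inferred as ToySeries rather than Self (the same shape as the [return-value] errors quoted later in the thread); cast only changes the static type and is a no-op at runtime.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, TypeVar, cast

if TYPE_CHECKING:
    from typing_extensions import Self


class ToySeries:
    """Hypothetical stand-in for CompliantSeries."""

    def __init__(self, values: list[float]) -> None:
        self.values = values

    def negate(self) -> Self:
        # A type checker infers _negate_impl(self) as ToySeries (the matching
        # constraint), not Self; cast fixes the static type, no runtime effect.
        return cast("Self", _negate_impl(self))


class ToyExpr:
    """Hypothetical stand-in for CompliantExpr."""

    def __init__(self, values: list[float]) -> None:
        self.values = values

    def negate(self) -> Self:
        return cast("Self", _negate_impl(self))


# Constrained (not bound) TypeVar, as proposed in Idea 1.
SeriesOrExprT = TypeVar("SeriesOrExprT", ToySeries, ToyExpr)


def _negate_impl(obj: SeriesOrExprT) -> SeriesOrExprT:
    """Hypothetical shared implementation, in the role of _is_close_impl."""
    # type(obj) preserves the runtime class, including subclasses.
    return type(obj)([-v for v in obj.values])


class MySeries(ToySeries):
    pass


print(type(MySeries([1.0, -2.0]).negate()).__name__)  # MySeries
```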

Idea 2 - Move utils._is_close_impl into nw.Expr

and dispatch nw.Series to nw.Expr

If we're already inside Expr, we have a bounded Self to use for self, other - rather than handling CompliantExprT | CompliantSeriesT.

I can explain the benefits more if needed, but overall seems quite clean to me

Downsides

I guess the only wart is polars, but that has its own version now anyway.

nw.Series has Series.implementation, but I'm not too sure how branching could work for nw.Expr when evaluating for polars 🤔

Idea 3 - Add a Protocol for IsClose, shared by Compliant{Expr,Series}

Both mypy and pyright are accepting this

You could add it as a base for each Compliant* and remove their definitions of is_close, plus the _utils function

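from __future__ import annotations

from typing import TYPE_CHECKING, Any, Protocol

if TYPE_CHECKING:
    from typing_extensions import Self

    from narwhals.typing import NumericLiteral, TemporalLiteral


class IsClose(Protocol):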
    """Every member defined is a dependency of `is_close`."""

    def __and__(self, other: Any) -> Self: ...
    def __or__(self, other: Any) -> Self: ...
    def __invert__(self) -> Self: ...
    def __sub__(self, other: Any) -> Self: ...
    def __mul__(self, other: Any) -> Self: ...
    def __eq__(self, other: Self | Any) -> Self: ...  # type: ignore[override]
    def __gt__(self, other: Any) -> Self: ...
    def __ge__(self, other: Any) -> Self: ...
    def __lt__(self, other: Any) -> Self: ...
    def __le__(self, other: Any) -> Self: ...
    def abs(self) -> Self: ...
    def is_nan(self) -> Self: ...
    def is_finite(self) -> Self: ...
    def clip(
        self,
        lower_bound: Self | NumericLiteral | TemporalLiteral | None,
        upper_bound: Self | NumericLiteral | TemporalLiteral | None,
    ) -> Self: ...
    def is_close(
        self,
        other: Self | NumericLiteral,
        *,
        abs_tol: float,
        rel_tol: float,
        nans_equal: bool,
    ) -> Self:
        from decimal import Decimal

        other_abs: Self | NumericLiteral
        other_is_nan: Self | bool
        other_is_inf: Self | bool
        other_is_not_inf: Self | bool

        if isinstance(other, (float, int, Decimal)):
            from math import isinf, isnan

            other_abs = other.__abs__()
            other_is_nan = isnan(other)
            other_is_inf = isinf(other)
            other_is_not_inf = not other_is_inf

        else:
            other_abs, other_is_nan = other.abs(), other.is_nan()
            other_is_not_inf = other.is_finite() | other_is_nan
            other_is_inf = ~other_is_not_inf

        rel_threshold = self.abs().clip(lower_bound=other_abs, upper_bound=None) * rel_tol
        tolerance = rel_threshold.clip(lower_bound=abs_tol, upper_bound=None)

        self_is_nan = self.is_nan()
        self_is_not_inf = self.is_finite() | self_is_nan

        # Values are close if abs_diff <= tolerance, and both finite
        is_close = (
            ((self - other).abs() <= tolerance) & self_is_not_inf & other_is_not_inf
        )

        # Handle infinity cases: infinities are "close" only if they have the same sign
        self_sign, other_sign = self > 0, other > 0
        is_same_inf = (~self_is_not_inf) & other_is_inf & (self_sign == other_sign)

        # Handle nan cases:
        #   * nans_equals = True => if both values are NaN, then True
        #   * nans_equals = False => if any value is NaN, then False
        either_nan = self_is_nan | other_is_nan
        result = (is_close | is_same_inf) & ~either_nan

        if nans_equal:
            both_nan = self_is_nan & other_is_nan
            result = result | both_nan

        return result
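Idea 3 relies on a Protocol being able to carry a default method: a class that explicitly subclasses the protocol inherits that method, and Self resolves to the subclass. A minimal sketch of just that mechanism, with a hypothetical two-method protocol and two toy classes standing in for the eager (Series-like) and lazy (Expr-like) sides; none of these names exist in narwhals.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Callable, Protocol

if TYPE_CHECKING:
    from typing_extensions import Self


class AbsDiff(Protocol):
    """Hypothetical mini-protocol: two dependencies plus one shared method."""

    def __sub__(self, other: Any) -> Self: ...
    def abs(self) -> Self: ...

    def abs_diff(self, other: Any) -> Self:
        # Written once against the dependencies; inherited by explicit subclasses.
        return (self - other).abs()


class EagerColumn(AbsDiff):
    """Series-like stand-in: holds concrete values."""

    def __init__(self, values: list[float]) -> None:
        self.values = values

    def __sub__(self, other: Any) -> Self:
        rhs = other.values if isinstance(other, EagerColumn) else [other] * len(self.values)
        return type(self)([a - b for a, b in zip(self.values, rhs)])

    def abs(self) -> Self:
        return type(self)([abs(v) for v in self.values])


class LazyColumn(AbsDiff):
    """Expr-like stand-in: a deferred function of a {name: values} mapping."""

    def __init__(self, fn: Callable[[dict[str, list[float]]], list[float]]) -> None:
        self._fn = fn

    def __sub__(self, other: Any) -> Self:
        def fn(df: dict[str, list[float]]) -> list[float]:
            lhs = self._fn(df)
            rhs = other._fn(df) if isinstance(other, LazyColumn) else [other] * len(lhs)
            return [a - b for a, b in zip(lhs, rhs)]

        return type(self)(fn)

    def abs(self) -> Self:
        return type(self)(lambda df: [abs(v) for v in self._fn(df)])

    def evaluate(self, df: dict[str, list[float]]) -> list[float]:
        return self._fn(df)


print(EagerColumn([1.0, 5.0]).abs_diff(2.0).values)  # [1.0, 3.0]
col_a = LazyColumn(lambda df: df["a"])
print(col_a.abs_diff(2.0).evaluate({"a": [1.0, 5.0]}))  # [1.0, 3.0]
```

The shared method never branches on eager vs lazy; each class only supplies the primitive operations, which is the same division of labour the IsClose protocol proposes.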

@FBruzzesi
Member Author

FBruzzesi commented Aug 9, 2025

Hey @dangotbanned thanks for the amazing writeup in #2962 (comment)

It's kind of reassuring that I had somehow thought of 2/3 of those ideas.

Idea 1 - Use a constrained TypeVar

This was my first attempt, and yes you are correct that it leads to the issues with the return type not being Self:

narwhals/_compliant/series.py:296: error: Incompatible return value type (got "CompliantSeries[Any]", expected "Self")  [return-value]
            return _is_close_impl(
                   ^
narwhals/_compliant/expr.py:252: error: Incompatible return value type (got "CompliantExpr[Any, Any]", expected "Self")  [return-value]
            return _is_close_impl(
                   ^
Found 2 errors in 2 files (checked 153 source files)

For better or for worse, I refrained from using cast and preferred adding one type: ignore in clip (the one for abs is solved by using .__abs__(), thanks!)

Idea 2 - Move utils._is_close_impl into nw.Expr

This was indeed working, yet personally I am not too keen on triggering an extra to_frame + get_column if there is no need.

...which brings us to

Idea 3 - Add a Protocol for IsClose, shared by Compliant{Expr,Series}

which I couldn't have dreamed of 🤯 It works smoothly (I assume); maybe a little verbose, but possibly the preferred solution? What's your opinion?

@dangotbanned
Member

dangotbanned commented Aug 9, 2025

#2962 (comment)

Hey @dangotbanned thanks for the amazing writeup in #2962 (comment)
It's kind of reassuring that I had somehow thought of 2/3 of those ideas.

aha thanks and I agree great to see we're on the same page 😄

Idea 3 - Add a Protocol for IsClose, shared by Compliant{Expr,Series}

which I couldn't have dreamed of 🤯 It works smoothly (I assume); maybe a little verbose, but possibly the preferred solution? What's your opinion?

Yeah that would be my preference!

The verbosity is only temporary though, as it gave me an idea for a follow-up.

We could move quite a lot into a shared Protocol for Compliant{Expr,Series} (maybe named CompliantColumn or CompliantColumnar)
Essentially, anything they both have that doesn't aggregate/return a scalar could just be defined in one place 😏

So the end result would be shrinking both protocols, removing IsClose and just defining it there 🥳

But we can save that part for another PR (provided it sounds good to you)

@FBruzzesi FBruzzesi marked this pull request as ready for review August 9, 2025 16:34
Co-authored-by: Dan Redding <[email protected]>
@FBruzzesi FBruzzesi removed the help wanted Extra attention is needed label Aug 10, 2025
dangotbanned added a commit that referenced this pull request Aug 10, 2025
@dangotbanned dangotbanned mentioned this pull request Aug 10, 2025
10 tasks
@dangotbanned dangotbanned self-requested a review August 11, 2025 19:40
Member

@dangotbanned dangotbanned left a comment

Looking pretty good, thanks @FBruzzesi

Only a few notes/suggestions from me 🙂

Comment on lines +2804 to +2807
>>> s.is_close(1.4, abs_tol=0.1).to_native() # doctest:+ELLIPSIS
<pyarrow.lib.ChunkedArray object at ...>
[
[
Member

Is this a sneaky solution to (#2776 (comment)) 😂?

Member Author

😂 that's the only way to use it in doctest, maybe it's an idea?
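The trick in that doctest is the ELLIPSIS option flag: with it enabled, "..." in the expected output matches any substring, so the memory address in a repr like <pyarrow.lib.ChunkedArray object at ...> no longer has to match exactly. A stdlib-only sketch of the same idea, where make_sentinel is a hypothetical helper:

```python
import doctest


def make_sentinel() -> object:
    """Return a fresh object whose repr embeds a memory address.

    >>> make_sentinel()  # doctest: +ELLIPSIS
    <object object at 0x...>
    """
    return object()


if __name__ == "__main__":
    # Prints nothing when every example passes.
    doctest.testmod()
```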

@FBruzzesi
Member Author

Thanks for the additional simplification @dangotbanned

I addressed the

Only a few notes/suggestions from me 🙂

in e4b7a1e, and figured out how KaTeX wants \text{...} in d61c5c7. I checked the latter in the locally served docs; not sure how it renders in the editor, though.
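For readers following along, the rule implemented by the Protocol code earlier in this thread can be written in KaTeX-friendly LaTeX, using \text{...} for parameter names (with underscores escaped). This is derived from that implementation, not copied from the final docstring, so treat it as a sketch:

```latex
\operatorname{is\_close}(a, b) =
\begin{cases}
\text{nans\_equal} & \text{if } a \text{ and } b \text{ are both NaN} \\
\text{False} & \text{if exactly one of } a, b \text{ is NaN} \\
a = b & \text{if } a \text{ or } b \text{ is infinite} \\
|a - b| \le \max\bigl(\text{rel\_tol} \cdot \max(|a|, |b|),\ \text{abs\_tol}\bigr) & \text{otherwise}
\end{cases}
```

The third case covers both "infinities are close only with the same sign" and "an infinity is never close to a finite value".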

@dangotbanned
Member

(#2962 (comment))

I checked the latter in the docs served locally, not sure how it renders in the editor though

I'll look now!

I'm pretty sure the rest was good, but will do a thrice over 😄

@dangotbanned
Member

mkdocs seems happy now

No luck for me in VSCode

but it seems to be the same story with ewm_mean

https://narwhals-dev.github.io/narwhals/api-reference/expr/#narwhals.Expr.ewm_mean

😂

Member

@dangotbanned dangotbanned left a comment

very shaky narwhal

@dangotbanned dangotbanned changed the title feat: Add support for Expr|Series.is_close feat: Add support for {Expr,Series}.is_close Aug 12, 2025
@dangotbanned dangotbanned changed the title feat: Add support for {Expr,Series}.is_close feat: Add {Expr,Series}.is_close Aug 12, 2025
@FBruzzesi FBruzzesi merged commit 11fe33f into main Aug 13, 2025
33 checks passed
@FBruzzesi FBruzzesi deleted the feat/is-close branch August 13, 2025 06:30
@MarcoGorelli
Member

thanks both, great feature!

dangotbanned added a commit that referenced this pull request Aug 13, 2025
Now that #2962 has merged, this part of the plan is possible (#2962 (comment))

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[Enh]: Add is_close

3 participants