@ym-pett
Contributor

@ym-pett ym-pett commented Aug 1, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Comment on lines 20 to 23
# not sure how to initialise the class, I want it to have all the methods etc of SQLExprT and add its own operator
# methods in addition to those
def __init__(self, other: [Int | Float]) -> None:
    super().__init__(other)
Member

this class will only be used for type-checking purposes, so there's no need to fill in the implementations for any methods. In particular, you don't need to define `__init__`

for the others, you can just return ...
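For illustration, a minimal sketch of such a typing-only class, assuming it is written as a `typing.Protocol` (the class name and members below are hypothetical, not the PR's final code). Every body is just `...` and there is no `__init__`: Python refuses to instantiate a protocol class, so an `__init__` would never run anyway.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Protocol

if TYPE_CHECKING:
    # Only the type checker needs this import.
    from typing_extensions import Self


class NativeSQLExprSketch(Protocol):
    """Hypothetical typing-only protocol: no __init__, every body is `...`."""

    def __gt__(self, value: Any, /) -> Self: ...
    def __lt__(self, value: Any, /) -> Self: ...
    def __add__(self, value: Any, /) -> Self: ...


try:
    NativeSQLExprSketch()  # type checkers reject this call too
    instantiated = True
except TypeError:  # raised by `typing`: protocols cannot be instantiated
    instantiated = False

print(instantiated)
```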

@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from 2bd2b9b to f8e853c August 1, 2025 16:13

class NativeSQLExpr(NativeExpr):
    # TODO @mp: fix input type for all these!
    def __gt__(self, value: float) -> Boolean: ...
Member

the return type should be Self

In [11]: import duckdb

In [12]: duckdb.ColumnExpression('a') == 3
Out[12]: (a = 3)

In [13]: type(duckdb.ColumnExpression('a') == 3)
Out[13]: duckdb.duckdb.Expression

when you do arithmetic / comparisons between these expressions, the output is still an expression
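A small sketch of why `Self` matters here. `FakeExpr` is a hypothetical stand-in for `duckdb.Expression` (it only builds SQL strings), and `SupportsSQLOps` is an illustrative protocol, not narwhals' `NativeSQLExpr`. Because each operator is annotated as returning `Self`, a type checker infers that `(expr > 1) & (expr < 5)` is still an `ExprT`, matching the runtime behaviour in the duckdb session above; with `-> bool`, the whole expression would be inferred as `bool`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Protocol, TypeVar

if TYPE_CHECKING:
    from typing_extensions import Self


class SupportsSQLOps(Protocol):
    # Returning Self (not bool) tells the checker that `expr > 1` is still an expression.
    def __gt__(self, value: Any, /) -> Self: ...
    def __lt__(self, value: Any, /) -> Self: ...
    def __and__(self, value: Any, /) -> Self: ...


ExprT = TypeVar("ExprT", bound=SupportsSQLOps)


class FakeExpr:
    """Hypothetical stand-in for duckdb.Expression: operators build new expressions."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __gt__(self, value: Any, /) -> FakeExpr:
        return FakeExpr(f"({self.sql} > {value})")

    def __lt__(self, value: Any, /) -> FakeExpr:
        return FakeExpr(f"({self.sql} < {value})")

    def __and__(self, value: Any, /) -> FakeExpr:
        other = value.sql if isinstance(value, FakeExpr) else value
        return FakeExpr(f"({self.sql} AND {other})")


def in_range(expr: ExprT) -> ExprT:
    # Each step stays an ExprT for the checker because the protocol returns Self.
    return (expr > 1) & (expr < 5)


result = in_range(FakeExpr("a"))
print(type(result).__name__, result.sql)
```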

Contributor Author

ah thanks for that, now I understand why Self!

@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from d3af7d3 to a4de0bb Compare August 3, 2025 17:04
@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from 9377eb7 to d372db6 Compare August 4, 2025 08:08
@ym-pett
Contributor Author

ym-pett commented Aug 4, 2025

pytest seems to suggest something is going on with the arguments to NativeSQLExprT. I think I do need to define the arguments by making a NativeSQLExprTAny; looking into this!

but NativeSQLExpr doesn't take any arguments, so unlikely to be the solution..

@MarcoGorelli
Member

MarcoGorelli commented Aug 4, 2025

I think I do need to define the arguments by making a NativeSQLExprTAny; looking into this!

yup, that's right!

but NativeSQLExpr doesn't take any arguments, so unlikely to be the solution..

NativeSQLExpr doesn't, but NativeSQLExprT does

EDIT: we discussed how to resolve this over a call

@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from efcbd68 to 3a97dcb Compare August 4, 2025 09:55
@ym-pett
Contributor Author

ym-pett commented Aug 4, 2025

ah just seen this, thanks!

@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from f540cd8 to 9a544e8 Compare August 4, 2025 10:20
@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from c0e19c5 to 266cab9 Compare August 4, 2025 12:02
@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from 660b4e3 to 4aae069 Compare August 4, 2025 12:23
@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from 247ad11 to 7ad112a Compare August 4, 2025 13:45
@ym-pett ym-pett force-pushed the fix_sql_operator_problem branch from 23eb022 to 937e2a1 Compare August 4, 2025 14:07
@ym-pett
Contributor Author

ym-pett commented Aug 4, 2025

hmm, I can see why the typing errors happen, but I'm not sure how to fix them. I'll add comments to the files where the errors are flagged.

@ym-pett
Contributor Author

ym-pett commented Aug 5, 2025

2 ibis type errors remain. I don't know if my technique is right; is there a better way?

@MarcoGorelli
Member

I'm seriously tempted to say that we should just switch

class IbisExpr(SQLExpr["IbisLazyFrame", "ir.Value"])

to

class IbisExpr(SQLExpr["IbisLazyFrame", Any])

After all, Ibis doesn't export its type annotations, nor do they even check them, so I don't think it's worth bending over backwards for their non-public types. We're having to introduce cast / type: ignore in every other ibis function anyway... 😩

cc @dangotbanned in case you have thoughts

@ym-pett
Contributor Author

ym-pett commented Aug 5, 2025

re going for an 'Any' type: let me know if/when I can implement it, or if I should try to find another fix

@dangotbanned
Member

Thanks for the ping @MarcoGorelli

Let me have a think on it today

@dangotbanned
Member

dangotbanned commented Aug 5, 2025

Afterall, Ibis doesn't export its type annotations, nor do they even check them, so I don't think it's worth bending backwards over their non-public types anyway.
We're having to introduce cast / type: ignore in every other ibis function anyway... 😩

Let me have a think on it today

My preference would be to open issue(s) upstream first.
Back in (#2000 (comment)), I noticed that a lot of our ibis typing woes would be fixed if some ibis methods used Self as their return type.

These operator issues would also fall into that category, as we want the methods defined in https://github.com/ibis-project/ibis/blob/db1a727b3c4c75e8e4af0f7ef3d0101d26d4450b/ibis/expr/types/numeric.py rather than those on ir.Value itself.

From the looks of (ibis-project/ibis#11511), I'm getting the feeling that the ibis maintainers might be open to improvements if we propose them 🙏

We do have the benefit of an entire sub-package documenting where we've struggled 😂


If we don't have any success there, another option is isolating the problematic paths and wrapping them with our own annotations.
I've done this a lot in (https://github.com/narwhals-dev/narwhals/blob/a8b89fd268f2031b52dfe0548e728fcacbc90a04/narwhals/_arrow/utils.py) and (https://github.com/narwhals-dev/narwhals/blob/a8b89fd268f2031b52dfe0548e728fcacbc90a04/narwhals/_arrow/typing.py), but there's stuff like this all over narwhals:


# NOTE: Use this to avoid annotating inline
def iter_dtype_backends(
    dtypes: Iterable[Any], implementation: Implementation
) -> Iterator[DTypeBackend]:
    """Yield a `DTypeBackend` per-dtype.

    Matches pandas' `dtype_backend` argument in `convert_dtypes`.
    """
    return (get_dtype_backend(dtype, implementation) for dtype in dtypes)

def import_array_module(implementation: Implementation, /) -> ModuleType:

class _NativeConcat(Protocol[NativeDataFrameT, NativeSeriesT]):
    @overload
    def __call__(
        self,
        objs: Iterable[NativeDataFrameT],
        *,
        axis: _Vertical,
        copy: bool | None = ...,
    ) -> NativeDataFrameT: ...
    @overload
    def __call__(
        self, objs: Iterable[NativeSeriesT], *, axis: _Vertical, copy: bool | None = ...
    ) -> NativeSeriesT: ...
    @overload
    def __call__(
        self,
        objs: Iterable[NativeDataFrameT | NativeSeriesT],
        *,
        axis: _Horizontal,
        copy: bool | None = ...,
    ) -> NativeDataFrameT: ...
    @overload
    def __call__(
        self,
        objs: Iterable[NativeDataFrameT | NativeSeriesT],
        *,
        axis: Axis,
        copy: bool | None = ...,
    ) -> NativeDataFrameT | NativeSeriesT: ...
    def __call__(
        self,
        objs: Iterable[NativeDataFrameT | NativeSeriesT],
        *,
        axis: Axis,
        copy: bool | None = None,
    ) -> NativeDataFrameT | NativeSeriesT: ...

class _BasePandasLike(Sized, Protocol):
    index: Any
    """`mypy` doesn't like the asymmetric `property` setter in `pandas`."""

    def __getitem__(self, key: Any, /) -> Any: ...
    def __mul__(self, other: float | Collection[float] | Self) -> Self: ...
    def __floordiv__(self, other: float | Collection[float] | Self) -> Self: ...
    @property
    def loc(self) -> Any: ...
    @property
    def shape(self) -> tuple[int, ...]: ...
    def set_axis(self, labels: Any, *, axis: Any = ..., copy: bool = ...) -> Self: ...
    def copy(self, deep: bool = ...) -> Self: ...  # noqa: FBT001
    def rename(self, *args: Any, inplace: Literal[False], **kwds: Any) -> Self:
        """`inplace=False` is required to avoid (incorrect?) default overloads."""
        ...

class _BasePandasLikeFrame(NativeFrame, _BasePandasLike, Protocol): ...

class _BasePandasLikeSeries(NativeSeries, _BasePandasLike, Protocol):
    def where(self, cond: Any, other: Any = ..., **kwds: Any) -> Any: ...

class _NativeDask(Protocol):
    _partition_type: type[pd.DataFrame]

class _CuDFDataFrame(_BasePandasLikeFrame, Protocol):
    def to_pylibcudf(self, *args: Any, **kwds: Any) -> Any: ...

class _CuDFSeries(_BasePandasLikeSeries, Protocol):
    def to_pylibcudf(self, *args: Any, **kwds: Any) -> Any: ...

class _NativeIbis(Protocol):
    def sql(self, *args: Any, **kwds: Any) -> Any: ...
    def __pyarrow_result__(self, *args: Any, **kwds: Any) -> Any: ...
    def __pandas_result__(self, *args: Any, **kwds: Any) -> Any: ...
    def __polars_result__(self, *args: Any, **kwds: Any) -> Any: ...

class _ModinDataFrame(_BasePandasLikeFrame, Protocol):
    _pandas_class: type[pd.DataFrame]

class _ModinSeries(_BasePandasLikeSeries, Protocol):
    _pandas_class: type[pd.Series[Any]]

_NativePolars: TypeAlias = "pl.DataFrame | pl.LazyFrame | pl.Series"
_NativeArrow: TypeAlias = "pa.Table | pa.ChunkedArray[Any]"
_NativeDuckDB: TypeAlias = "duckdb.DuckDBPyRelation"
_NativePandas: TypeAlias = "pd.DataFrame | pd.Series[Any]"
_NativeModin: TypeAlias = "_ModinDataFrame | _ModinSeries"
_NativeCuDF: TypeAlias = "_CuDFDataFrame | _CuDFSeries"

def _F(self):  # type: ignore[no-untyped-def]  # noqa: ANN202, N802
    if TYPE_CHECKING:
        from sqlframe.base import functions

        return functions
    return import_functions(self._implementation)
@property

While this is an option, I wouldn't suggest it until we know that the only alternative is typing everything as Any
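For comparison, the "wrap it with our own annotations" route can be as small as one helper whose signature we write ourselves, with a single `cast` at the boundary. A minimal sketch: `_untyped_upstream_gt` and `Col` are hypothetical stand-ins, not ibis APIs, and the `cast` is only as trustworthy as our knowledge of what the upstream call really returns.

```python
from __future__ import annotations

import operator
from typing import Any, TypeVar, cast

ExprT = TypeVar("ExprT")


def _untyped_upstream_gt(lhs, rhs):  # type: ignore[no-untyped-def]
    # Hypothetical stand-in for an upstream call whose return type is too loose.
    return operator.gt(lhs, rhs)


def gt(lhs: ExprT, rhs: Any, /) -> ExprT:
    """Our own annotation: comparing an expression yields the same expression type."""
    # The cast lives in this one helper instead of at every call site.
    return cast(ExprT, _untyped_upstream_gt(lhs, rhs))


class Col:
    """Hypothetical expression type whose `>` builds a new expression."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, other: Any) -> Col:
        return Col(f"({self.name} > {other})")


result = gt(Col("a"), 1)
print(result.name)
```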

@MarcoGorelli
Member

All I can think to suggest is, in narwhals/_ibis/typing.py, to introduce

class Expression(ir.Value, Protocol):
    def __ge__(self, other: Any, /) -> Expression: ...
    def __le__(self, other: Any, /) -> Expression: ...
    # etc.

and fill in the missing bits

Alternatively, use Any when defining the classes, but keep ir.Column / ir.NumericColumn in some methods

perhaps I'll give this a go later

@dangotbanned
Member

All I can think to suggest is, in narwhals/_ibis/typing.py, to introduce

class Expression(ir.Value, Protocol):

Sadly this won't work IIRC, as a Protocol can't inherit from a non-Protocol class 😔
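This restriction is enforced by `typing` itself when the class is created, not only by type checkers. A minimal sketch, with `Value` as a hypothetical concrete class standing in for `ir.Value`:

```python
from typing import Any, Optional, Protocol


class Value:
    """Hypothetical concrete (non-Protocol) class standing in for ir.Value."""


error: Optional[str] = None
try:

    class Expression(Value, Protocol):  # mirrors the suggestion quoted above
        def __ge__(self, other: Any, /) -> "Expression": ...

except TypeError as exc:
    # typing rejects the mixed bases at class-creation time.
    error = str(exc)

print(error)
```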

@MarcoGorelli
Member

🤔 I'm wondering if Deferred would be a better candidate than ir.Value, as that's closer to Expression / Column

In [1]: import ibis

In [2]: ibis._['a']
Out[2]: _['a']

In [3]: type(_)
Out[3]: ibis.common.deferred.Deferred

@dangotbanned dangotbanned mentioned this pull request Aug 7, 2025
Replaces them with calls to `operator` functions, which use `Any`
@dangotbanned
Member

@MarcoGorelli, @ym-pett

How do you feel about this?

I've tried to keep the terseness of using operators through aliasing.

This works because most of these functions are typed more permissively
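A minimal sketch of the aliasing idea, assuming module-level aliases of `operator` functions (the alias names and `FakeExpr` are hypothetical; the PR's exact code isn't quoted here). The call builds the same result as writing `(expr + 1) > 3` directly, but the type checker now sees calls to `operator.add` / `operator.gt`, which, per the commit message above, use `Any`.

```python
from __future__ import annotations

import operator
from typing import Any

# Hypothetical module-level aliases that keep call sites terse.
add = operator.add
gt = operator.gt


class FakeExpr:
    """Hypothetical stand-in for a native SQL expression."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __add__(self, other: Any) -> FakeExpr:
        return FakeExpr(f"({self.sql} + {other})")

    def __gt__(self, other: Any) -> FakeExpr:
        return FakeExpr(f"({self.sql} > {other})")


# Same runtime result as `(FakeExpr("a") + 1) > 3`, but the checker only
# sees calls to operator.add / operator.gt, which are typed permissively.
expr = gt(add(FakeExpr("a"), 1), 3)
print(expr.sql)
```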

Comment on lines +67 to +76
# NOTE: None of these are annotated for `dx.Series`, but are added imperatively
# Probably better to define a sub-protocol for `NativeSQLExpr`
# - match `dx.Series` to `NativeExpr`
# - match the others to `NativeSQLExpr`
def __gt__(self, value: Any, /) -> Self: ...
def __lt__(self, value: Any, /) -> Self: ...
def __ge__(self, value: Any, /) -> Self: ...
def __le__(self, value: Any, /) -> Self: ...
def __eq__(self, value: Any, /) -> Self: ... # type: ignore[override]
def __ne__(self, value: Any, /) -> Self: ... # type: ignore[override]
Member

oh my 🤦

Member

@MarcoGorelli MarcoGorelli left a comment

looks good to me, thanks @dangotbanned

So, the idea is to put the comparison operators in NativeExpr, and for the arithmetic ones we just use operator? And we just ignore Dask, which implements these in a hacky way anyway

@ym-pett if you clean up the temporary comments, then I think we can ship this

Comment on lines +67 to +76
Member

looks good, thanks. If anything, I prefer matching on these over matching on between and isin (which, for example, Daft doesn't have)

Member

This is the important bit

class NativeExpr(Protocol):
    """An `Expr`-like object from a package with [Lazy-only support](https://narwhals-dev.github.io/narwhals/extending/#levels-of-support).

    Protocol members are chosen *purely* for matching statically - as they
    are common to all currently supported packages.
    """

which, for example, Daft doesn't have

I had a similar issue for ibis.Table in (#2944)

IntoLazyFrame: TypeAlias = Union["NativeLazyFrame", "_NativeIbis"]

It's okay to have multiple protocols/sub-protocols/aliases if we need that now

I started with 1 because we had a common denominator, but we grow 😄

Member

had a common denominator

right, so if we add these comparison operators, we can remove isin and between?

Member

Keep between if possible

Kind of like how we have filter for NativeSeries - it's something to avoid false-positives

Waaaaay more than just the types we're interested in will have comparison dunders, but fewer will also have between as well
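A sketch of the false-positive point, using two illustrative protocols (not the actual narwhals definitions) and a hypothetical `FakeSQLExpr`. `runtime_checkable` is used only so the structural match can be observed at runtime; narwhals matches these protocols statically, where `int` and `str` likewise declare `__gt__` / `__lt__`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ComparisonsOnly(Protocol):
    def __gt__(self, value: Any, /) -> Any: ...
    def __lt__(self, value: Any, /) -> Any: ...


@runtime_checkable
class ComparisonsAndBetween(Protocol):
    def __gt__(self, value: Any, /) -> Any: ...
    def __lt__(self, value: Any, /) -> Any: ...
    def between(self, lower: Any, upper: Any, /) -> Any: ...


class FakeSQLExpr:
    """Hypothetical expression type that also has `between`."""

    def __gt__(self, value: Any, /) -> Any:
        return self

    def __lt__(self, value: Any, /) -> Any:
        return self

    def between(self, lower: Any, upper: Any, /) -> Any:
        return self


for candidate in (3, "x", FakeSQLExpr()):
    # Print whether each object matches the narrow and the stricter protocol.
    print(
        type(candidate).__name__,
        isinstance(candidate, ComparisonsOnly),
        isinstance(candidate, ComparisonsAndBetween),
    )
```

Both `int` and `str` satisfy the comparisons-only protocol, while only the expression-like object also has `between`.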

@ym-pett ym-pett marked this pull request as ready for review August 8, 2025 09:27
@MarcoGorelli MarcoGorelli changed the title from Fix sql operator problem to chore: fixup operator type ignores for SQLExpr Aug 8, 2025
Member

@MarcoGorelli MarcoGorelli left a comment

thanks both

merging then so we can make progress on extensions

@MarcoGorelli MarcoGorelli merged commit 787114d into narwhals-dev:main Aug 8, 2025
32 of 34 checks passed
@ym-pett ym-pett deleted the fix_sql_operator_problem branch August 14, 2025 13:28
3 participants