Skip to content

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Oct 9, 2025

Will close #3188

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

If you have comments or can explain your changes, please do so below

Note

Typing is "fixed"
Still need to open some issues/PRs upstream in duckdb (#3189 (comment))

@dangotbanned dangotbanned linked an issue Oct 9, 2025 that may be closed by this pull request
2 tasks
@dangotbanned dangotbanned added fix typing duckdb Issue is related to duckdb backend labels Oct 9, 2025
@dangotbanned dangotbanned added fix typing blocked duckdb Issue is related to duckdb backend labels Oct 9, 2025
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dangotbanned - I personally have mixed feeling for all the patching that is happening in the typing module. It's roughly 100 lines to take care of 6 issues that arguably duckdb should deal with?! Isn't there also the risk to lose track of potential future updates/improvement they might do just because we are taking care of all these details? I am no expert but I will give a try to simplifying it a bit

I am 1000% onboarded on all the remaining changes

lit = duckdb.ConstantExpression
"""Alias for `duckdb.ConstantExpression`."""

# TODO @dangotbanned: Raise an issue upstream on `Expression | str` too narrow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I notice that as well! It's worse than having Any 😒

Copy link
Member Author

@dangotbanned dangotbanned Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@dangotbanned dangotbanned Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FBruzzesi!!!! I wasn't expecting it to support all of this, but here's the proof πŸ˜‚

from __future__ import annotations

import datetime as dt  # noqa: F811
import decimal
import uuid
from typing import Any, Callable, cast  # noqa: F811

import duckdb  # noqa: F811
import numpy as np  # noqa: F811
import pandas as pd

lit = cast("Callable[[Any], duckdb.Expression]", duckdb.ConstantExpression)


def try_lit(*inputs: Any) -> None:
    for value in inputs:
        print(f"{value}: {lit(value)}")


try_lit(
    None,
    pd.NaT,
    pd.NA,
    np.ma.masked,
    True,
    False,
    0,
    -999_991,
    100_000_000_000_000,
    1e98,
    decimal.Decimal("2933.957546"),
    uuid.uuid5(uuid.NAMESPACE_URL, "hello"),
    dt.datetime(2034, 1, 23, 4, 2, 56),
    dt.time(1, 2, 3, 4),
    dt.date(1023, 3, 1),
    dt.timedelta(50),
    "i am a string",
    bytearray(range(4)),
    memoryview(b"i was a bytestring"),
    b"i am a bytestring",
    ["list", "of", "str"],
    ("tuple", "of", "str"),
    {"dict": "str"},
    np.arange(10),
    np.datetime64(dt.date(2000, 3, 1)),
    duckdb.UnsignedIntegerValue(50),
)
None: NULL
NaT: NULL
<NA>: NULL
--: NULL
True: true
False: false
0: 0
-999991: -999991
100000000000000: 100000000000000
1e+98: 1e+98
2933.957546: 2933.957546
074171de-bc84-5ea4-b636-1135477620e1: '074171de-bc84-5ea4-b636-1135477620e1'::UUID
2034-01-23 04:02:56: '2034-01-23 04:02:56'::TIMESTAMP
01:02:03.000004: '01:02:03.000004'::TIME
1023-03-01: '1023-03-01'::DATE
50 days, 0:00:00: '50 days'::INTERVAL
i am a string: 'i am a string'
bytearray(b'\x00\x01\x02\x03'): '\x00\x01\x02\x03'::BLOB
<memory at 0x00000154FFEA3100>: 'i was a bytestring'::BLOB
b'i am a bytestring': 'i am a bytestring'::BLOB
['list', 'of', 'str']: ['list', 'of', 'str']
('tuple', 'of', 'str'): ['tuple', 'of', 'str']
{'dict': 'str'}: {'dict': 'str'}
[0 1 2 3 4 5 6 7 8 9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2000-03-01: '2000-03-01'::DATE
50: 50

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned
Copy link
Member Author

I personally have mixed feeling for all the patching that is happening in the typing module. It's roughly 100 lines to take care of 6 issues that arguably duckdb should deal with?! Isn't there also the risk to lose track of potential future updates/improvement they might do just because we are taking care of all these details?

@FBruzzesi ahh I thought I was safe in draft πŸ˜‚

I vaguely mentioned in a comment somewhere (that I can't seem to find πŸ€¦β€β™‚οΈ) about needing to upstream some stuff to duckdb.

All of (https://github.com/narwhals-dev/narwhals/pull/3189/files#diff-3e3b75f6b9584445d40b9b7564a703fb4399c24094039fef6d8ab041d43954c7) is essentially experimenting with what we need for most of typing to be "fixed" - so I can refer to it in an issue πŸ™‚

I still need to work out the right signature for these:

  • LambdaExpression
  • ConstantExpression

I thought (once I've fixed them on our side) they might be an easier first issue to raise over there πŸ™

Comment on lines +29 to +38
def test_expr_is_in_iterable(
constructor: Constructor, request: pytest.FixtureRequest
) -> None:
if any(x in str(constructor) for x in ("sqlframe", "polars")):
request.applymarker(pytest.mark.xfail)
df = nw.from_native(constructor(data))
sequence = 4, 2
result = df.select(nw.col("a").is_in(iter(sequence)))
expected = {"a": [False, True, True, False]}
assert_equal_data(result, expected)
Copy link
Member Author

@dangotbanned dangotbanned Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(#3189 (comment))
I'm probably gonna need to open a new PR to fix this for the other backends.

Although Marco may still have reservations since:

here, the issue is the same thing.
Expr.is_in documents and allows Iterable at runtime:

Show Expr.is_in

narwhals/narwhals/expr.py

Lines 971 to 1000 in ebb2a40

def is_in(self, other: Any) -> Self:
"""Check if elements of this expression are present in the other iterable.
Arguments:
other: iterable
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2, 9, 10]})
>>> df = nw.from_native(df_native)
>>> df.with_columns(b=nw.col("a").is_in([1, 2]))
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
|Narwhals DataFrame|
|------------------|
| a b |
| 0 1 True |
| 1 2 True |
| 2 9 False |
| 3 10 False |
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
"""
if isinstance(other, Iterable) and not isinstance(other, (str, bytes)):
return self._with_elementwise(
lambda plx: self._to_compliant_expr(plx).is_in(
to_native(other, pass_through=True)
)
)
msg = "Narwhals `is_in` doesn't accept expressions as an argument, as opposed to Polars. You should provide an iterable instead."
raise NotImplementedError(msg)

But our tests cover only list[int] https://github.com/narwhals-dev/narwhals/blob/ebb2a40bab2f18b485512707ffb8cd440bb17172/tests/expr_and_series/is_in_test.py

Copy link
Member Author

@dangotbanned dangotbanned Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only discovered this because of the new typing.

Here DuckDBExpr.is_in accepts Sequence[Any]:

def is_in(self, other: Sequence[Any]) -> Self:

That itself is wrong.
It is passed Iterable[Any], but slips through on pass_through=True

narwhals/narwhals/expr.py

Lines 995 to 996 in ebb2a40

lambda plx: self._to_compliant_expr(plx).is_in(
to_native(other, pass_through=True)

But I now also know that where this ends up, will call lit = duckdb.ConstantExpression:

def is_in(self, other: Sequence[Any]) -> Self:
return self._with_elementwise(lambda expr: F("contains", lit(other), expr))

Which accept lots of types, but they are all concrete types:

IntoDuckDBLiteral: TypeAlias = """
PythonLiteral
| dict[Any, Any]
| uuid.UUID
| bytearray
| memoryview
| Into1DArray
| pd.api.typing.NaTType
| pd.api.typing.NAType
| np.ma.MaskedArray
| duckdb.Value
"""

narwhals/narwhals/typing.py

Lines 363 to 368 in ebb2a40

NumericLiteral: TypeAlias = "int | float | Decimal"
TemporalLiteral: TypeAlias = "dt.date | dt.datetime | dt.time | dt.timedelta"
NonNestedLiteral: TypeAlias = (
"NumericLiteral | TemporalLiteral | str | bool | bytes | None"
)
PythonLiteral: TypeAlias = "NonNestedLiteral | list[Any] | tuple[Any, ...]"

So an Iterable like Iterator will fail

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your pr

tbh i feel like this is too complex

once duckdb and ibis make a compatible release i'll make a pr showing a different idea

@dangotbanned
Copy link
Member Author

dangotbanned commented Oct 11, 2025

#3189 (review)

Thanks for the review @MarcoGorelli, but I don't want to merge most of this to narwhals (#3189 (comment))

I'm just trying to gather things we currently would need to get the typing working - but I actually want to fix this upstream.
Those fixes might not be what is in here, but my hope is that this PR can serve to demonstrate that we're in need of some help from our upstream friends πŸ˜…

TL;DR: the new stubs don't match the implementation

@dangotbanned dangotbanned changed the title fix: duckdb>=1.4.1 typing & warnings fix(DONT MERGE): duckdb>=1.4.1 typing & warnings Oct 11, 2025
@dangotbanned

This comment was marked as resolved.

@dangotbanned dangotbanned removed the ibis Issue is related to ibis backend label Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocked duckdb Issue is related to duckdb backend fix typing upstream issue