-
Notifications
You must be signed in to change notification settings - Fork 168
fix(DONT MERGE): duckdb>=1.4.1
typing & warnings
#3189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Will close #3188
It has children, but we don't use them https://github.com/narwhals-dev/narwhals/actions/runs/18380855813/job/52366638686?pr=3189
It has children, but we don't use them https://github.com/narwhals-dev/narwhals/actions/runs/18380855813/job/52366638686?pr=3189
β¦-dev/narwhals into fix-duckdb-1-4-1-typing
Merge conflict fixup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dangotbanned - I personally have mixed feeling for all the patching that is happening in the typing module. It's roughly 100 lines to take care of 6 issues that arguably duckdb should deal with?! Isn't there also the risk to lose track of potential future updates/improvement they might do just because we are taking care of all these details? I am no expert but I will give a try to simplifying it a bit
I am 1000% onboarded on all the remaining changes
lit = duckdb.ConstantExpression | ||
"""Alias for `duckdb.ConstantExpression`.""" | ||
|
||
# TODO @dangotbanned: Raise an issue upstream on `Expression | str` too narrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I notice that as well! It's worse than having Any
π’
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welp I found where this ends up (most recent last)
- https://github.com/duckdb/duckdb-python/blob/df7789cbd31b2d2b8d03d012f14331bc3297fb2d/src/duckdb_py/pyexpression.cpp#L348-L359
- https://github.com/duckdb/duckdb-python/blob/df7789cbd31b2d2b8d03d012f14331bc3297fb2d/src/duckdb_py/native/python_conversion.cpp#L1075-L1079
- https://github.com/duckdb/duckdb-python/blob/df7789cbd31b2d2b8d03d012f14331bc3297fb2d/src/duckdb_py/native/python_conversion.cpp#L916-L1069
Now I just need to work through TransformPythonObjectInternal
to see what is allowed.
But it is definitely not Expression | str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FBruzzesi!!!! I wasn't expecting it to support all of this, but here's the proof π
from __future__ import annotations
import datetime as dt # noqa: F811
import decimal
import uuid
from typing import Any, Callable, cast # noqa: F811
import duckdb # noqa: F811
import numpy as np # noqa: F811
import pandas as pd
lit = cast("Callable[[Any], duckdb.Expression]", duckdb.ConstantExpression)
def try_lit(*inputs: Any) -> None:
for value in inputs:
print(f"{value}: {lit(value)}")
try_lit(
None,
pd.NaT,
pd.NA,
np.ma.masked,
True,
False,
0,
-999_991,
100_000_000_000_000,
1e98,
decimal.Decimal("2933.957546"),
uuid.uuid5(uuid.NAMESPACE_URL, "hello"),
dt.datetime(2034, 1, 23, 4, 2, 56),
dt.time(1, 2, 3, 4),
dt.date(1023, 3, 1),
dt.timedelta(50),
"i am a string",
bytearray(range(4)),
memoryview(b"i was a bytestring"),
b"i am a bytestring",
["list", "of", "str"],
("tuple", "of", "str"),
{"dict": "str"},
np.arange(10),
np.datetime64(dt.date(2000, 3, 1)),
duckdb.UnsignedIntegerValue(50),
)
None: NULL
NaT: NULL
<NA>: NULL
--: NULL
True: true
False: false
0: 0
-999991: -999991
100000000000000: 100000000000000
1e+98: 1e+98
2933.957546: 2933.957546
074171de-bc84-5ea4-b636-1135477620e1: '074171de-bc84-5ea4-b636-1135477620e1'::UUID
2034-01-23 04:02:56: '2034-01-23 04:02:56'::TIMESTAMP
01:02:03.000004: '01:02:03.000004'::TIME
1023-03-01: '1023-03-01'::DATE
50 days, 0:00:00: '50 days'::INTERVAL
i am a string: 'i am a string'
bytearray(b'\x00\x01\x02\x03'): '\x00\x01\x02\x03'::BLOB
<memory at 0x00000154FFEA3100>: 'i was a bytestring'::BLOB
b'i am a bytestring': 'i am a bytestring'::BLOB
['list', 'of', 'str']: ['list', 'of', 'str']
('tuple', 'of', 'str'): ['tuple', 'of', 'str']
{'dict': 'str'}: {'dict': 'str'}
[0 1 2 3 4 5 6 7 8 9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2000-03-01: '2000-03-01'::DATE
50: 50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FBruzzesi ahh I thought I was safe in draft π I vaguely mentioned in a comment somewhere (that I can't seem to find π€¦ββοΈ) about needing to upstream some stuff to All of (https://github.com/narwhals-dev/narwhals/pull/3189/files#diff-3e3b75f6b9584445d40b9b7564a703fb4399c24094039fef6d8ab041d43954c7) is essentially experimenting with what we need for most of typing to be "fixed" - so I can refer to it in an issue π I still need to work out the right signature for these:
I thought (once I've fixed them on our side) they might be an easier first issue to raise over there π |
def test_expr_is_in_iterable( | ||
constructor: Constructor, request: pytest.FixtureRequest | ||
) -> None: | ||
if any(x in str(constructor) for x in ("sqlframe", "polars")): | ||
request.applymarker(pytest.mark.xfail) | ||
df = nw.from_native(constructor(data)) | ||
sequence = 4, 2 | ||
result = df.select(nw.col("a").is_in(iter(sequence))) | ||
expected = {"a": [False, True, True, False]} | ||
assert_equal_data(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(#3189 (comment))
I'm probably gonna need to open a new PR to fix this for the other backends.
Although Marco may still have reservations since:
here, the issue is the same thing.
Expr.is_in
documents and allows Iterable
at runtime:
Show Expr.is_in
Lines 971 to 1000 in ebb2a40
def is_in(self, other: Any) -> Self: | |
"""Check if elements of this expression are present in the other iterable. | |
Arguments: | |
other: iterable | |
Examples: | |
>>> import pandas as pd | |
>>> import narwhals as nw | |
>>> df_native = pd.DataFrame({"a": [1, 2, 9, 10]}) | |
>>> df = nw.from_native(df_native) | |
>>> df.with_columns(b=nw.col("a").is_in([1, 2])) | |
ββββββββββββββββββββ | |
|Narwhals DataFrame| | |
|------------------| | |
| a b | | |
| 0 1 True | | |
| 1 2 True | | |
| 2 9 False | | |
| 3 10 False | | |
ββββββββββββββββββββ | |
""" | |
if isinstance(other, Iterable) and not isinstance(other, (str, bytes)): | |
return self._with_elementwise( | |
lambda plx: self._to_compliant_expr(plx).is_in( | |
to_native(other, pass_through=True) | |
) | |
) | |
msg = "Narwhals `is_in` doesn't accept expressions as an argument, as opposed to Polars. You should provide an iterable instead." | |
raise NotImplementedError(msg) |
But our tests cover only list[int]
https://github.com/narwhals-dev/narwhals/blob/ebb2a40bab2f18b485512707ffb8cd440bb17172/tests/expr_and_series/is_in_test.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only discovered this because of the new typing.
Here DuckDBExpr.is_in
accepts Sequence[Any]
:
narwhals/narwhals/_duckdb/expr.py
Line 260 in ebb2a40
def is_in(self, other: Sequence[Any]) -> Self: |
That itself is wrong.
It is passed Iterable[Any]
, but slips through on pass_through=True
Lines 995 to 996 in ebb2a40
lambda plx: self._to_compliant_expr(plx).is_in( | |
to_native(other, pass_through=True) |
But I now also know that where this ends up, will call lit = duckdb.ConstantExpression
:
narwhals/narwhals/_duckdb/expr.py
Lines 260 to 261 in ebb2a40
def is_in(self, other: Sequence[Any]) -> Self: | |
return self._with_elementwise(lambda expr: F("contains", lit(other), expr)) |
Which accept lots of types, but they are all concrete types:
narwhals/narwhals/_duckdb/typing.py
Lines 30 to 41 in ed6f6b2
IntoDuckDBLiteral: TypeAlias = """ | |
PythonLiteral | |
| dict[Any, Any] | |
| uuid.UUID | |
| bytearray | |
| memoryview | |
| Into1DArray | |
| pd.api.typing.NaTType | |
| pd.api.typing.NAType | |
| np.ma.MaskedArray | |
| duckdb.Value | |
""" |
Lines 363 to 368 in ebb2a40
NumericLiteral: TypeAlias = "int | float | Decimal" | |
TemporalLiteral: TypeAlias = "dt.date | dt.datetime | dt.time | dt.timedelta" | |
NonNestedLiteral: TypeAlias = ( | |
"NumericLiteral | TemporalLiteral | str | bool | bytes | None" | |
) | |
PythonLiteral: TypeAlias = "NonNestedLiteral | list[Any] | tuple[Any, ...]" |
So an Iterable
like Iterator
will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your pr
tbh i feel like this is too complex
once duckdb and ibis make a compatible release i'll make a pr showing a different idea
Thanks for the review @MarcoGorelli, but I don't want to merge most of this to I'm just trying to gather things we currently would need to get the typing working - but I actually want to fix this upstream. TL;DR: the new stubs don't match the implementation |
duckdb>=1.4.1
typing & warningsduckdb>=1.4.1
typing & warnings
Will close #3188
What type of PR is this? (check all applicable)
Related issues
duckdb==1.4.1
Β #3188duckdb.typing
is deprecated and will be removed in a future version. Please useduckdb.sqltypes
instead.Β eakmanrq/sqlframe#531narwhals
-supported datatypes handled?Β akmalsoliev/Validoopsie#30If you have comments or can explain your changes, please do so below
Note
Typing is "fixed"
Still need to open some issues/PRs upstream in
duckdb
(#3189 (comment))