Description
What happens?
First of all, nice feature with the lazy DataFrame. I have a problem with this new feature in duckdb v1.4.1 (also tested with 1.4.0) and polars 1.34.0 (earlier versions were also tested).
The initial loading and filtering with pl(lazy = True) mostly works, but operations such as joins with other tables fail with this error:
ComputeError: caught exception during execution of a Python source, exception: InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result.
Full Trace:
File ~/.venv/lib/python3.9/site-packages/polars/_utils/deprecation.py:97, in deprecate_streaming_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
93 kwargs["engine"] = "in-memory"
95 del kwargs["streaming"]
---> 97 return function(*args, **kwargs)
File ~/.venv/lib/python3.9/site-packages/polars/lazyframe/opt_flags.py:328, in forward_old_opt_flags.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
325 optflags = cb(optflags, kwargs.pop(key)) # type: ignore[no-untyped-call,unused-ignore]
327 kwargs["optimizations"] = optflags
--> 328 return function(*args, **kwargs)
File ~/.venv/lib/python3.9/site-packages/polars/lazyframe/frame.py:2415, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, engine, background, optimizations, **_kwargs)
2413 # Only for testing purposes
2414 callback = _kwargs.get("post_opt_callback", callback)
-> 2415 return wrap_df(ldf.collect(engine, callback))
To Reproduce
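import duckdb

# db.db_path is the path to the DuckDB database file (data not shareable)
con = duckdb.connect(db.db_path, read_only=True)
df_lab = con.sql("SELECT * FROM data1").pl(lazy=True)
df_main = con.sql("SELECT * FROM data2").pl(lazy=True)
# Joining two lazy frames from the same connection raises the ComputeError above
df_lab.join(df_main, on="account_id").collect()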
This is the resulting LazyFrame plan:
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)
INNER JOIN:
LEFT PLAN ON: [col("account_id")]
PYTHON SCAN []
PROJECT */10 COLUMNS
RIGHT PLAN ON: [col("account_id")]
PYTHON SCAN []
PROJECT */70 COLUMNS
END INNER JOIN
OS:
linux
DuckDB Version:
1.4.1
DuckDB Client:
Python
Hardware:
No response
Full Name:
Maximilian Zeidler
Affiliation:
Helios
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have