This package is a work in progress; it ultimately aims to provide a polars-like API on top of the DuckDB Python API, as well as full support for DuckDB's functions.
Although pql aims to be as close as possible to polars, some differences exist.
Sometimes they are due to hard limitations of duckdb (e.g. Categorical datatypes), sometimes they are deliberate design choices (e.g. the cross join strategy).
Some of those are listed here; for a more comprehensive list, see the API coverage report.
- `DataFrame` doesn't exist. Only `LazyFrame` is implemented, as it is the only one that can be implemented with duckdb. To convert to polars, you can do:

```python
import pql

lf_pql = pql.LazyFrame({...})
lf_polars = lf_pql.lazy()  # equivalent to DuckDBPyRelation.pl(lazy=True)
df_polars = lf_pql.collect()  # equivalent to DuckDBPyRelation.pl(lazy=False)
```

- `LazyFrame.join()` doesn't have a `"cross"` strategy. Instead, call `LazyFrame.join_cross()`. This is a deliberate choice, because:
  - duckdb natively has different methods for join/cross join
  - The internal implementation is simpler and cleaner if we don't have to handle the cross join as a special case of the regular join
  - The public API is clearer: the `on`, `left_on` and `right_on` parameters don't make sense for a cross join, and it is better to leave them out of the method's signature than to throw runtime errors when they are used with a cross join strategy.
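Conceptually, a cross join is just the Cartesian product of the two frames' rows, which is why join keys have no meaning for it. A minimal sketch in plain Python with the standard library (a conceptual illustration, not pql code):

```python
from itertools import product

# Two tiny "tables" represented as lists of rows.
left = [{"id": 1}, {"id": 2}]
right = [{"tag": "a"}, {"tag": "b"}]

# A cross join pairs every left row with every right row --
# no join keys are involved, unlike a regular join.
crossed = [{**l, **r} for l, r in product(left, right)]
print(len(crossed))  # 4 rows: 2 x 2
print(crossed[0])    # {'id': 1, 'tag': 'a'}
```

Since there is no key-matching step at all, a dedicated `join_cross()` method mirrors the operation's true signature instead of smuggling it through `join()`.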
- `Categorical` datatypes are not supported (they are not representable in duckdb).
- Full support of the `GEOMETRY` datatypes and functions, as they are natively supported in duckdb
- `LazyFrame.group_by_all()` method -> see more here
- `columns`/`schema`, and other methods/properties that return plain Python `Iterable`s, return pyochain objects instead. This allows you to use all the methods of those objects, whilst keeping the same method-chaining style as with `Expression`/`LazyFrame`. For example, you can do:
>>> data = {"price": [1, 2, 3], "name": ["x", "y", "z"]}
>>> lf = pql.LazyFrame(data)
# get the columns as a pyochain object
>>> cols = lf.columns.iter().filter(lambda col: col.startswith("p"))
>>> lf.select(cols).columns
Vec("price",)

Compared to narwhals, pql aims to support more functionality from polars AND all of duckdb's, as pql is not limited by multiple-backend compatibility.
Furthermore, narwhals is primarily designed for library developers who want integration with multiple dataframe libraries, not for end users.
Narwhals supports a subset of the polars API, hence necessarily a subset of the DuckDB API, while pql aims to support the full API of both.
SQLFrame is fundamentally a PySpark-oriented library, API-wise.
Ibis has a different syntax from polars. It can be close for some operations, but totally different for others. The goal also isn't the same: Ibis focuses on providing a high-level API for multiple backends, while pql focuses on providing a polars-like API for DuckDB.
pql's two main public classes are `LazyFrame` and `Expr`.
Expressions are the base building blocks of the API.
An `Expr` is a wrapper around an internal `SqlExpr` class.
This separation of responsibilities keeps metadata handling (mainly column-name resolution, the Selectors implementation, etc.) apart from the internal implementation of custom expressions (e.g. `Expr.str.titlecase()`).
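This two-layer design can be illustrated with a minimal sketch (hypothetical class names and methods, not pql's actual internals): the public wrapper owns the metadata (the output column name), and delegates SQL construction to the inner class.

```python
class SqlExpr:
    """Inner layer: knows only how to build SQL fragments."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def call(self, fn: str) -> "SqlExpr":
        # Wrap the current fragment in a function call.
        return SqlExpr(f"{fn}({self.sql})")


class Expr:
    """Public layer: tracks the output column name (metadata)
    and delegates the actual SQL construction to SqlExpr."""

    def __init__(self, inner: SqlExpr, name: str) -> None:
        self._inner = inner
        self._name = name

    def upper(self) -> "Expr":
        # Metadata (the name) stays here; SQL building is delegated.
        return Expr(self._inner.call("upper"), self._name)

    def alias(self, name: str) -> "Expr":
        # Pure metadata operation: the inner SQL is untouched.
        return Expr(self._inner, name)

    def to_sql(self) -> str:
        return f'{self._inner.sql} AS "{self._name}"'


e = Expr(SqlExpr('"name"'), "name").upper().alias("name_upper")
print(e.to_sql())  # upper("name") AS "name_upper"
```

Note how `alias()` never touches the inner layer at all: renaming is purely a metadata concern, which is exactly the kind of logic the split keeps out of the SQL-building code.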
`SqlExpr` in turn wraps a `sqlglot.Expression` object, which is the AST used to generate the final SQL query.
When needed, the `sqlglot.Expression` is converted to a native `duckdb.Expression` object, which is the one used to execute the query.
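As a rough analogy (a toy AST, not sqlglot's or duckdb's actual classes): the query is first held as a tree of nodes that stays cheap to build and inspect, and is only rendered to its executable form at the last moment.

```python
from dataclasses import dataclass


@dataclass
class Col:
    """Leaf node: a column reference."""
    name: str


@dataclass
class Func:
    """Inner node: a function applied to a sub-expression."""
    fn: str
    arg: "Col | Func"


def render(node: "Col | Func") -> str:
    """Late conversion: walk the tree and emit SQL only when needed."""
    if isinstance(node, Col):
        return f'"{node.name}"'
    return f"{node.fn}({render(node.arg)})"


# Build the tree first, render at execution time.
tree = Func("lower", Func("trim", Col("name")))
print(render(tree))  # lower(trim("name"))
```

Keeping the AST form until execution is what lets the library rewrite, combine, and name-resolve expressions before any SQL string exists.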
This class wraps a `duckdb.DuckDBPyRelation` object and is the main entry point for users.
It provides methods that give context to `Expr` objects, and also handles the final SQL query generation and execution.
Scripts are used for code generation and API comparison at dev time. They are not meant to be used by end users, and are not part of the public API.
More info with the following command:

```shell
uv run -m scripts --help
```

The `compare` command will create the coverage report comparing the pql, polars, and narwhals APIs.
The `gen-{fns,themes}` commands will respectively generate Python code for:
- The functions from the `table_functions` DuckDB table
- A `Literal` for SQL display theming (see the `Theme` type)
Note that if you have never generated the `table_functions` code, you first need to run `fns-to_parquet` once to get the parquet file with the data cast and updated, and then `gen-fns` to generate the code.