-
Notifications
You must be signed in to change notification settings - Fork 3
Add a session
keyword argument to to_sql
and to_df
#427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -453,6 +453,7 @@ The `to_sql` API takes in PyDough code and transforms it into SQL query text wit | |||||||||||||||
- `metadata`: the PyDough knowledge graph to use for the conversion (if omitted, `pydough.active_session.metadata` is used instead). | ||||||||||||||||
- `config`: the PyDough configuration settings to use for the conversion (if omitted, `pydough.active_session.config` is used instead). | ||||||||||||||||
- `database`: the database context to use for the conversion (if omitted, `pydough.active_session.database` is used instead). The database context matters because it controls which SQL dialect is used for the translation. | ||||||||||||||||
- `session`: a PyDough session object which, if provided, is used instead of `pydough.active_session` or the `metadata` / `config` / `database` arguments. Note: this argument cannot be used alongside those arguments. | ||||||||||||||||
|
||||||||||||||||
Below is an example of using `pydough.to_sql` and the output (the SQL output may be outdated if PyDough's SQL conversion process has been updated): | ||||||||||||||||
|
||||||||||||||||
|
@@ -464,34 +465,22 @@ pydough.to_sql(result, columns=["name", "n_custs"]) | |||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
```sql | ||||||||||||||||
SELECT name, COALESCE(agg_0, 0) AS n_custs | ||||||||||||||||
FROM ( | ||||||||||||||||
SELECT name, agg_0 | ||||||||||||||||
FROM ( | ||||||||||||||||
SELECT name, key | ||||||||||||||||
FROM ( | ||||||||||||||||
SELECT _table_alias_0.name AS name, _table_alias_0.key AS key, _table_alias_1.name AS name_3 | ||||||||||||||||
FROM ( | ||||||||||||||||
SELECT n_name AS name, n_nationkey AS key, n_regionkey AS region_key FROM main.NATION | ||||||||||||||||
) AS _table_alias_0 | ||||||||||||||||
LEFT JOIN ( | ||||||||||||||||
SELECT r_name AS name, r_regionkey AS key | ||||||||||||||||
FROM main.REGION | ||||||||||||||||
) AS _table_alias_1 | ||||||||||||||||
ON region_key = _table_alias_1.key | ||||||||||||||||
) | ||||||||||||||||
WHERE name_3 = 'EUROPE' | ||||||||||||||||
) | ||||||||||||||||
LEFT JOIN ( | ||||||||||||||||
SELECT nation_key, COUNT(*) AS agg_0 | ||||||||||||||||
FROM ( | ||||||||||||||||
SELECT c_nationkey AS nation_key | ||||||||||||||||
FROM main.CUSTOMER | ||||||||||||||||
) | ||||||||||||||||
GROUP BY nation_key | ||||||||||||||||
) | ||||||||||||||||
ON key = nation_key | ||||||||||||||||
WITH _s3 AS ( | ||||||||||||||||
SELECT | ||||||||||||||||
c_nationkey, | ||||||||||||||||
COUNT(*) AS n_rows | ||||||||||||||||
FROM tpch.customer | ||||||||||||||||
GROUP BY | ||||||||||||||||
1 | ||||||||||||||||
) | ||||||||||||||||
SELECT | ||||||||||||||||
nation.n_name AS name, | ||||||||||||||||
_s3.n_rows AS n_custs | ||||||||||||||||
FROM tpch.nation AS nation | ||||||||||||||||
JOIN tpch.region AS region | ||||||||||||||||
ON nation.n_regionkey = region.r_regionkey AND region.r_name = 'EUROPE' | ||||||||||||||||
JOIN _s3 AS _s3 | ||||||||||||||||
ON _s3.c_nationkey = nation.n_nationkey | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
See the [demo notebooks](../demos/README.md) for more instances of how to use the `to_sql` API. | ||||||||||||||||
|
@@ -506,6 +495,7 @@ The `to_df` API does all the same steps as the [`to_sql` API](#pydoughto_sql), b | |||||||||||||||
- `metadata`: the PyDough knowledge graph to use for the conversion (if omitted, `pydough.active_session.metadata` is used instead). | ||||||||||||||||
- `config`: the PyDough configuration settings to use for the conversion (if omitted, `pydough.active_session.config` is used instead). | ||||||||||||||||
- `database`: the database context to use for the conversion (if omitted, `pydough.active_session.database` is used instead). The database context matters because it controls which SQL dialect is used for the translation. | ||||||||||||||||
- `session`: a PyDough session object which, if provided, is used instead of `pydough.active_session` or the `metadata` / `config` / `database` arguments. Note: this argument cannot be used alongside those arguments. | ||||||||||||||||
- `display_sql`: displays the sql before executing in a logger. | ||||||||||||||||
|
||||||||||||||||
Below is an example of using `pydough.to_df` and the output, attached to a sqlite database containing data for the TPC-H schema: | ||||||||||||||||
|
@@ -644,41 +634,35 @@ The value of `sql` is the following SQL query text as a Python string: | |||||||||||||||
```sql | ||||||||||||||||
WITH _s7 AS ( | ||||||||||||||||
SELECT | ||||||||||||||||
ROUND( | ||||||||||||||||
COALESCE( | ||||||||||||||||
SUM( | ||||||||||||||||
lineitem.l_extendedprice * ( | ||||||||||||||||
1 - lineitem.l_discount | ||||||||||||||||
) * ( | ||||||||||||||||
1 - lineitem.l_tax | ||||||||||||||||
) - lineitem.l_quantity * partsupp.ps_supplycost | ||||||||||||||||
), | ||||||||||||||||
0 | ||||||||||||||||
), | ||||||||||||||||
2 | ||||||||||||||||
) AS revenue_year, | ||||||||||||||||
partsupp.ps_suppkey | ||||||||||||||||
partsupp.ps_suppkey, | ||||||||||||||||
SUM( | ||||||||||||||||
lineitem.l_extendedprice * ( | ||||||||||||||||
1 - lineitem.l_discount | ||||||||||||||||
) * ( | ||||||||||||||||
1 - lineitem.l_tax | ||||||||||||||||
) - lineitem.l_quantity * partsupp.ps_supplycost | ||||||||||||||||
) AS sum_rev | ||||||||||||||||
FROM main.partsupp AS partsupp | ||||||||||||||||
JOIN main.part AS part | ||||||||||||||||
ON part.p_name LIKE 'coral%' AND part.p_partkey = partsupp.ps_partkey | ||||||||||||||||
JOIN main.lineitem AS lineitem | ||||||||||||||||
ON CAST(STRFTIME('%Y', lineitem.l_shipdate) AS INTEGER) = 1996 | ||||||||||||||||
ON EXTRACT(YEAR FROM CAST(lineitem.l_shipdate AS DATETIME)) = 1996 | ||||||||||||||||
AND lineitem.l_partkey = partsupp.ps_partkey | ||||||||||||||||
AND lineitem.l_shipmode = 'TRUCK' | ||||||||||||||||
AND lineitem.l_suppkey = partsupp.ps_suppkey | ||||||||||||||||
GROUP BY | ||||||||||||||||
partsupp.ps_suppkey | ||||||||||||||||
1 | ||||||||||||||||
) | ||||||||||||||||
SELECT | ||||||||||||||||
supplier.s_name AS name, | ||||||||||||||||
_s7.revenue_year | ||||||||||||||||
ROUND(COALESCE(_s7.sum_rev, 0), 2) AS revenue_year | ||||||||||||||||
FROM main.supplier AS supplier | ||||||||||||||||
JOIN main.nation AS nation | ||||||||||||||||
ON nation.n_name = 'JAPAN' AND nation.n_nationkey = supplier.s_nationkey | ||||||||||||||||
JOIN _s7 AS _s7 | ||||||||||||||||
ON _s7.ps_suppkey = supplier.s_suppkey | ||||||||||||||||
ORDER BY | ||||||||||||||||
revenue_year DESC | ||||||||||||||||
2 DESC | ||||||||||||||||
LIMIT 5 | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
|
@@ -716,27 +700,27 @@ The value of `sql` is the following SQL query text as a Python string: | |||||||||||||||
```sql | ||||||||||||||||
WITH _s1 AS ( | ||||||||||||||||
SELECT | ||||||||||||||||
COALESCE(SUM(o_totalprice), 0) AS total, | ||||||||||||||||
o_custkey, | ||||||||||||||||
COUNT(*) AS n_rows, | ||||||||||||||||
o_custkey | ||||||||||||||||
SUM(o_totalprice) AS sum_o_totalprice | ||||||||||||||||
FROM main.orders | ||||||||||||||||
WHERE | ||||||||||||||||
o_orderdate < '1997-01-01' | ||||||||||||||||
AND o_orderdate >= '1996-01-01' | ||||||||||||||||
o_orderdate < CAST('1997-01-01' AS DATE) | ||||||||||||||||
AND o_orderdate >= CAST('1996-01-01' AS DATE) | ||||||||||||||||
AND o_orderpriority = '1-URGENT' | ||||||||||||||||
AND o_totalprice > 100000 | ||||||||||||||||
GROUP BY | ||||||||||||||||
o_custkey | ||||||||||||||||
1 | ||||||||||||||||
) | ||||||||||||||||
SELECT | ||||||||||||||||
customer.c_name AS name, | ||||||||||||||||
_s1.n_rows AS n_orders, | ||||||||||||||||
_s1.total | ||||||||||||||||
_s1.sum_o_totalprice AS total | ||||||||||||||||
FROM main.customer AS customer | ||||||||||||||||
JOIN _s1 AS _s1 | ||||||||||||||||
ON _s1.o_custkey = customer.c_custkey | ||||||||||||||||
ORDER BY | ||||||||||||||||
total DESC | ||||||||||||||||
3 DESC | ||||||||||||||||
``` | ||||||||||||||||
|
||||||||||||||||
<!-- TOC --><a name="exploration-apis"></a> | ||||||||||||||||
|
@@ -804,7 +788,7 @@ The `explain` API is a more generic explanation interface that can be called on | |||||||||||||||
- A specific property within a specific collection within a metadata graph object (can be accessed as `graph["collection_name"]["property_name"]`) | ||||||||||||||||
- The PyDough code for a collection that could have `to_sql` or `to_df` called on it. | ||||||||||||||||
|
||||||||||||||||
The `explain` API also has an optional `verbose` argument (default=False) that enables displaying additional information. | ||||||||||||||||
The `explain` API also has an optional `verbose` argument (default=False) that enables displaying additional information. It also has an optional `session` argument to specify what configs etc. to use when explaining certain terms (if not provided, uses `pydough.active_session`). | ||||||||||||||||
|
||||||||||||||||
Below are examples of each of these behaviors, using a knowledge graph for the TPCH schema. | ||||||||||||||||
|
||||||||||||||||
|
@@ -1022,7 +1006,8 @@ The `explain` API is limited in that it can only be called on complete PyDough c | |||||||||||||||
|
||||||||||||||||
To handle cases where you need to learn about a term within a collection, you can use the `explain_term` API. The first argument to `explain_term` is PyDough code for a collection, which can have `explain` called on it, and the second is PyDough code for a term that can be evaluated within the context of that collection (e.g. a scalar term of the collection, or one of its sub-collections). | ||||||||||||||||
|
||||||||||||||||
The `explain_term` API also has a `verbose` keyword argument (default False) to specify whether to include a more detailed explanation, as opposed to a more compact summary. | ||||||||||||||||
The `explain_term` API also has a `verbose` keyword argument (default False) to specify whether to include a more detailed explanation, as opposed to a more compact summary. The `explain_term` API also has an optional `verbose` argument (default=False) that enables displaying additional information. It also has an optional `session` argument to specify what configs etc. to use when explaining certain terms (if not provided, uses `pydough.active_session`). | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lets rephrase this so we don't use
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree to rephrase, I'll iterate on this since I'm not a fan of "also has some optional arguments" either. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
Below are examples of using `explain_term`, using a knowledge graph for the TPCH schema. For each of these examples, `european_countries` is the "context" collection, which could have `to_sql` or `to_df` called on it, and `term` is the term being explained with regards to `european_countries`. | ||||||||||||||||
|
||||||||||||||||
|
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these queries change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was updating the documentation, I also updated the example generated SQL used in the documentation to reflect the most recent version of PyDough (since these are not automatically generated)