Changes from 3 commits
93 changes: 39 additions & 54 deletions documentation/usage.md
@@ -425,6 +425,7 @@ The `to_sql` API takes in PyDough code and transforms it into SQL query text wit
- `metadata`: the PyDough knowledge graph to use for the conversion (if omitted, `pydough.active_session.metadata` is used instead).
- `config`: the PyDough configuration settings to use for the conversion (if omitted, `pydough.active_session.config` is used instead).
- `database`: the database context to use for the conversion (if omitted, `pydough.active_session.database` is used instead). The database context matters because it controls which SQL dialect is used for the translation.
- `session`: a PyDough session object which, if provided, is used instead of `pydough.active_session` or the `metadata` / `config` / `database` arguments. Note: this argument cannot be used alongside those arguments.

Below is an example of using `pydough.to_sql` and the output (the SQL output may be outdated if PyDough's SQL conversion process has been updated):

@@ -436,34 +437,22 @@
pydough.to_sql(result, columns=["name", "n_custs"])
```

```sql
-SELECT name, COALESCE(agg_0, 0) AS n_custs
-FROM (
-  SELECT name, agg_0
-  FROM (
-    SELECT name, key
-    FROM (
-      SELECT _table_alias_0.name AS name, _table_alias_0.key AS key, _table_alias_1.name AS name_3
-      FROM (
-        SELECT n_name AS name, n_nationkey AS key, n_regionkey AS region_key FROM main.NATION
-      ) AS _table_alias_0
-      LEFT JOIN (
-        SELECT r_name AS name, r_regionkey AS key
-        FROM main.REGION
-      ) AS _table_alias_1
-      ON region_key = _table_alias_1.key
-    )
-    WHERE name_3 = 'EUROPE'
-  )
-  LEFT JOIN (
-    SELECT nation_key, COUNT(*) AS agg_0
-    FROM (
-      SELECT c_nationkey AS nation_key
-      FROM main.CUSTOMER
-    )
-    GROUP BY nation_key
-  )
-  ON key = nation_key
+WITH _s3 AS (
+  SELECT
+    c_nationkey,
+    COUNT(*) AS n_rows
+  FROM tpch.customer
+  GROUP BY
+    1
+)
+SELECT
+  nation.n_name AS name,
+  _s3.n_rows AS n_custs
+FROM tpch.nation AS nation
+JOIN tpch.region AS region
+  ON nation.n_regionkey = region.r_regionkey AND region.r_name = 'EUROPE'
+JOIN _s3 AS _s3
+  ON _s3.c_nationkey = nation.n_nationkey
```
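As a quick sanity check, a query shaped like the new CTE-based output above can be run against a tiny in-memory SQLite database (the table and column names mirror TPC-H, the sample rows are invented, and the `tpch.` schema prefix is dropped since this throwaway database has no schemas):

```python
import sqlite3

# Build a miniature TPC-H-like database in memory (invented sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nation(n_nationkey INT, n_name TEXT, n_regionkey INT);
CREATE TABLE region(r_regionkey INT, r_name TEXT);
CREATE TABLE customer(c_custkey INT, c_nationkey INT);
INSERT INTO nation VALUES (1, 'FRANCE', 10), (2, 'BRAZIL', 20);
INSERT INTO region VALUES (10, 'EUROPE'), (20, 'AMERICA');
INSERT INTO customer VALUES (100, 1), (101, 1), (102, 2);
""")

# Same shape as the generated SQL: pre-aggregate customers per nation in a
# CTE, then join nations restricted to the EUROPE region against it.
rows = conn.execute("""
WITH _s3 AS (
  SELECT c_nationkey, COUNT(*) AS n_rows
  FROM customer
  GROUP BY 1
)
SELECT nation.n_name AS name, _s3.n_rows AS n_custs
FROM nation
JOIN region
  ON nation.n_regionkey = region.r_regionkey AND region.r_name = 'EUROPE'
JOIN _s3 AS _s3
  ON _s3.c_nationkey = nation.n_nationkey
""").fetchall()
print(rows)  # [('FRANCE', 2)]
```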

See the [demo notebooks](../demos/README.md) for more instances of how to use the `to_sql` API.
@@ -478,6 +467,7 @@ The `to_df` API does all the same steps as the [`to_sql` API](#pydoughto_sql), b
- `metadata`: the PyDough knowledge graph to use for the conversion (if omitted, `pydough.active_session.metadata` is used instead).
- `config`: the PyDough configuration settings to use for the conversion (if omitted, `pydough.active_session.config` is used instead).
- `database`: the database context to use for the conversion (if omitted, `pydough.active_session.database` is used instead). The database context matters because it controls which SQL dialect is used for the translation.
- `session`: a PyDough session object which, if provided, is used instead of `pydough.active_session` or the `metadata` / `config` / `database` arguments. Note: this argument cannot be used alongside those arguments.
- `display_sql`: if True, logs the generated SQL query before it is executed.

Below is an example of using `pydough.to_df` and the output, attached to a sqlite database containing data for the TPC-H schema:
@@ -616,41 +606,35 @@ The value of `sql` is the following SQL query text as a Python string:
```sql
WITH _s7 AS (
SELECT
-    ROUND(
-      COALESCE(
-        SUM(
-          lineitem.l_extendedprice * (
-            1 - lineitem.l_discount
-          ) * (
-            1 - lineitem.l_tax
-          ) - lineitem.l_quantity * partsupp.ps_supplycost
-        ),
-        0
-      ),
-      2
-    ) AS revenue_year,
-    partsupp.ps_suppkey
+    partsupp.ps_suppkey,
+    SUM(
+      lineitem.l_extendedprice * (
+        1 - lineitem.l_discount
+      ) * (
+        1 - lineitem.l_tax
+      ) - lineitem.l_quantity * partsupp.ps_supplycost
+    ) AS sum_rev
   FROM main.partsupp AS partsupp
   JOIN main.part AS part
     ON part.p_name LIKE 'coral%' AND part.p_partkey = partsupp.ps_partkey
   JOIN main.lineitem AS lineitem
-    ON CAST(STRFTIME('%Y', lineitem.l_shipdate) AS INTEGER) = 1996
+    ON EXTRACT(YEAR FROM CAST(lineitem.l_shipdate AS DATETIME)) = 1996
     AND lineitem.l_partkey = partsupp.ps_partkey
     AND lineitem.l_shipmode = 'TRUCK'
     AND lineitem.l_suppkey = partsupp.ps_suppkey
   GROUP BY
-    partsupp.ps_suppkey
+    1
 )
 SELECT
   supplier.s_name AS name,
-  _s7.revenue_year
+  ROUND(COALESCE(_s7.sum_rev, 0), 2) AS revenue_year
 FROM main.supplier AS supplier
 JOIN main.nation AS nation
   ON nation.n_name = 'JAPAN' AND nation.n_nationkey = supplier.s_nationkey
 JOIN _s7 AS _s7
   ON _s7.ps_suppkey = supplier.s_suppkey
 ORDER BY
-  revenue_year DESC
+  2 DESC
LIMIT 5
```
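For intuition, the expression being summed in `_s7` is the standard TPC-H-style profit formula; a plain-Python rendering (the function name is ours, not PyDough's):

```python
# Revenue contribution of one line item, mirroring the SUM's operand above:
# extended price net of discount and tax, minus the cost of the supplied parts.
def line_revenue(extended_price, discount, tax, quantity, supply_cost):
    return extended_price * (1 - discount) * (1 - tax) - quantity * supply_cost

print(round(line_revenue(1000.0, 0.10, 0.05, 5, 20.0), 2))  # 755.0
```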

@@ -688,27 +672,27 @@ The value of `sql` is the following SQL query text as a Python string:
```sql
WITH _s1 AS (
SELECT
-  COALESCE(SUM(o_totalprice), 0) AS total,
+  o_custkey,
   COUNT(*) AS n_rows,
-  o_custkey
+  SUM(o_totalprice) AS sum_o_totalprice
 FROM main.orders
 WHERE
-  o_orderdate < '1997-01-01'
-  AND o_orderdate >= '1996-01-01'
+  o_orderdate < CAST('1997-01-01' AS DATE)
+  AND o_orderdate >= CAST('1996-01-01' AS DATE)
   AND o_orderpriority = '1-URGENT'
   AND o_totalprice > 100000
 GROUP BY
-  o_custkey
+  1
 )
 SELECT
   customer.c_name AS name,
   _s1.n_rows AS n_orders,
-  _s1.total
+  _s1.sum_o_totalprice AS total
 FROM main.customer AS customer
 JOIN _s1 AS _s1
   ON _s1.o_custkey = customer.c_custkey
 ORDER BY
-  total DESC
+  3 DESC
```

**@john-sanchez31** (Contributor, Sep 17, 2025): Why did these queries change?

**Author:** While I was updating the documentation, I also updated the example generated SQL used in the documentation to reflect the most recent version of PyDough (since these are not automatically generated).
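The `_s1` CTE above can be mirrored in plain Python to see what it computes per customer (the rows below are invented sample data, not TPC-H output):

```python
from collections import defaultdict

# (o_custkey, o_orderdate, o_orderpriority, o_totalprice)
orders = [
    (7, "1996-03-01", "1-URGENT", 150000.0),
    (7, "1996-06-15", "1-URGENT", 120000.0),
    (8, "1995-12-31", "1-URGENT", 200000.0),  # outside the 1996 date window
    (8, "1996-02-02", "2-HIGH", 180000.0),    # wrong priority
]

# custkey -> [n_rows, sum_o_totalprice], matching the CTE's two aggregates.
stats = defaultdict(lambda: [0, 0.0])
for custkey, date, priority, total in orders:
    if "1996-01-01" <= date < "1997-01-01" and priority == "1-URGENT" and total > 100000:
        stats[custkey][0] += 1
        stats[custkey][1] += total

print(dict(stats))  # {7: [2, 270000.0]}
```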

<!-- TOC --><a name="exploration-apis"></a>
@@ -776,7 +760,7 @@ The `explain` API is a more generic explanation interface that can be called on
- A specific property within a specific collection within a metadata graph object (can be accessed as `graph["collection_name"]["property_name"]`)
- The PyDough code for a collection that could have `to_sql` or `to_df` called on it.

The `explain` API also has an optional `verbose` argument (default=False) that enables displaying additional information.
The `explain` API also has an optional `verbose` argument (default=False) that enables displaying additional information, as well as an optional `session` argument specifying which session (and therefore which metadata, configs, and database) to use when explaining terms; if not provided, `pydough.active_session` is used.

Below are examples of each of these behaviors, using a knowledge graph for the TPCH schema.

@@ -994,7 +978,8 @@ The `explain` API is limited in that it can only be called on complete PyDough c

To handle cases where you need to learn about a term within a collection, you can use the `explain_term` API. The first argument to `explain_term` is PyDough code for a collection, which can have `explain` called on it, and the second is PyDough code for a term that can be evaluated within the context of that collection (e.g. a scalar term of the collection, or one of its sub-collections).

The `explain_term` API also has a `verbose` keyword argument (default False) to specify whether to include a more detailed explanation, as opposed to a more compact summary.
The `explain_term` API also has a `verbose` keyword argument (default False) to specify whether to include a more detailed explanation, as opposed to a more compact summary. The `explain_term` API also has an optional `verbose` argument (default=False) that enables displaying additional information. It also has an optional `session` argument to specify what configs etc. to use when explaining certain terms (if not provided, uses `pydough.active_session`).
**Contributor:** Let's rephrase this so we don't use "also has an optional argument" twice.

Suggested change:

> The `explain_term` API also has a `verbose` keyword argument (default False) to specify whether to include a more detailed explanation, as opposed to a more compact summary. The `explain_term` API also has some optional arguments. The `verbose` argument (default=False) that enables displaying additional information and the `session` argument to specify what configs etc. to use when explaining certain terms (if not provided, uses `pydough.active_session`).

**Author:** Agree to rephrase, I'll iterate on this since I'm not a fan of "also has some optional arguments" either.

**Contributor:** Suggested change:

> The `explain_term` API has two optional arguments:
> - `verbose` (default=False): if True, returns a detailed explanation; otherwise, returns a compact summary.
> - `session`: specifies what configs etc. to use when explaining certain terms (if not provided, uses `pydough.active_session`).
Below are examples of using `explain_term`, using a knowledge graph for the TPCH schema. For each of these examples, `european_countries` is the "context" collection, which could have `to_sql` or `to_df` called on it, and `term` is the term being explained with regards to `european_countries`.

22 changes: 12 additions & 10 deletions pydough/conversion/agg_split.py
@@ -7,7 +7,7 @@


import pydough.pydough_operators as pydop
from pydough.configs import PyDoughConfigs
from pydough.configs import PyDoughSession
from pydough.relational import (
Aggregate,
CallExpression,
@@ -51,15 +51,15 @@
"""


def decompose_aggregations(node: Aggregate, config: PyDoughConfigs) -> RelationalNode:
def decompose_aggregations(node: Aggregate, session: PyDoughSession) -> RelationalNode:
"""
Splits up an aggregate node into an aggregate followed by a projection when
the aggregate contains 1+ calls to functions that can be split into 1+
calls to partial aggregates, e.g. how AVG(X) = SUM(X)/COUNT(X).

Args:
`node`: the aggregate node to be decomposed.
`config`: the current configuration settings.
`session`: the PyDough session used during the transformation.

Returns:
The projection node on top of the new aggregate, overall containing the
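The decomposition this function performs — e.g. AVG(X) becoming SUM(X) / COUNT(X), with a projection recombining the partials — can be sketched standalone in plain Python (an illustration of the idea, not PyDough's relational representation):

```python
# Partial aggregates computed by the rewritten Aggregate node.
def partial_aggregates(values):
    return sum(values), len(values)

# The projection placed on top recombines them; avg_default_zero mirrors the
# config flag deciding what an empty group yields (0 instead of NULL).
def combine_avg(total, count, avg_default_zero=False):
    if count == 0:
        return 0 if avg_default_zero else None
    return total / count

s, c = partial_aggregates([2.0, 4.0, 9.0])
print(combine_avg(s, c))                         # 5.0
print(combine_avg(0, 0, avg_default_zero=True))  # 0
```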
@@ -110,7 +110,7 @@ def decompose_aggregations(node: Aggregate, config: PyDoughConfigs) -> Relationa
)
# If the config specifies that the default value for AVG should be
# zero, wrap the division in a DEFAULT_TO call.
if config.avg_default_zero:
if session.config.avg_default_zero:
avg_call = CallExpression(
pydop.DEFAULT_TO,
agg.data_type,
@@ -277,7 +277,7 @@ def transpose_aggregate_join(


def attempt_join_aggregate_transpose(
node: Aggregate, join: Join, config: PyDoughConfigs
node: Aggregate, join: Join, session: PyDoughSession
) -> tuple[RelationalNode, bool]:
"""
Determine whether the aggregate join transpose operation can occur, and if
@@ -396,7 +396,7 @@ def attempt_join_aggregate_transpose(
for col in node.aggregations.values():
if col.op in decomposable_aggfuncs:
return split_partial_aggregates(
decompose_aggregations(node, config), config
decompose_aggregations(node, session), session
), False

# Keep a dictionary for the projection columns that will be used to post-process
@@ -464,7 +464,7 @@


def split_partial_aggregates(
node: RelationalNode, config: PyDoughConfigs
node: RelationalNode, session: PyDoughSession
) -> RelationalNode:
"""
Splits partial aggregates above joins into two aggregates, one above the
@@ -473,19 +473,21 @@

Args:
`node`: the root node of the relational plan to be transformed.
`config`: the current configuration settings.
`session`: the PyDough session used during the transformation.

Returns:
The transformed node. The transformation is also done-in-place.
"""
# If the aggregate+join pattern is detected, attempt to do the transpose.
handle_inputs: bool = True
if isinstance(node, Aggregate) and isinstance(node.input, Join):
node, handle_inputs = attempt_join_aggregate_transpose(node, node.input, config)
node, handle_inputs = attempt_join_aggregate_transpose(
node, node.input, session
)

# If needed, recursively invoke the procedure on all inputs to the node.
if handle_inputs:
node = node.copy(
inputs=[split_partial_aggregates(input, config) for input in node.inputs]
inputs=[split_partial_aggregates(input, session) for input in node.inputs]
)
return node
10 changes: 5 additions & 5 deletions pydough/conversion/filter_pushdown.py
@@ -6,7 +6,7 @@


import pydough.pydough_operators as pydop
from pydough.configs import PyDoughConfigs
from pydough.configs import PyDoughSession
from pydough.relational import (
Aggregate,
CallExpression,
@@ -66,7 +66,7 @@ class FilterPushdownShuttle(RelationalShuttle):
cannot be pushed further.
"""

def __init__(self, configs: PyDoughConfigs):
def __init__(self, session: PyDoughSession):
# The set of filters that are currently being pushed down. When
# visit_xxx is called, it is presumed that the set of conditions in
# self.filters are the conditions that can be pushed down as far as the
@@ -76,7 +76,7 @@ def __init__(self, configs: PyDoughConfigs):
# simplification logic to aid in advanced filter predicate inference,
# such as determining that a left join is redundant because if the RHS
# column is null then the filter will always be false.
self.simplifier: SimplificationShuttle = SimplificationShuttle(configs)
self.simplifier: SimplificationShuttle = SimplificationShuttle(session)

def reset(self):
self.filters = set()
@@ -300,7 +300,7 @@ def visit_empty_singleton(self, empty_singleton: EmptySingleton) -> RelationalNo
return self.flush_remaining_filters(empty_singleton, self.filters, set())


def push_filters(node: RelationalNode, configs: PyDoughConfigs) -> RelationalNode:
def push_filters(node: RelationalNode, session: PyDoughSession) -> RelationalNode:
"""
Transpose filter conditions down as far as possible.

@@ -314,5 +314,5 @@ def push_filters(node: RelationalNode, configs: PyDoughConfigs) -> RelationalNod
the node or into one of its inputs, or possibly both if there are
multiple filters.
"""
pusher: FilterPushdownShuttle = FilterPushdownShuttle(configs)
pusher: FilterPushdownShuttle = FilterPushdownShuttle(session)
return node.accept_shuttle(pusher)
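A toy model of the transformation `push_filters` performs (the dict-based plan encoding here is invented for illustration; PyDough uses `RelationalNode` trees and a shuttle, not dicts):

```python
# Toy filter pushdown: a Filter above a Project that only renames columns can
# be swapped so the filter runs closer to the scan, using pre-rename names.
def push_filter(plan):
    if plan["op"] == "filter" and plan["input"]["op"] == "project":
        project = plan["input"]
        renames = project["columns"]  # output name -> input name
        pushed = {
            "op": "filter",
            "pred": (renames[plan["pred"][0]], plan["pred"][1], plan["pred"][2]),
            "input": project["input"],
        }
        return {"op": "project", "columns": renames, "input": pushed}
    return plan

plan = {
    "op": "filter",
    "pred": ("name", "=", "EUROPE"),
    "input": {
        "op": "project",
        "columns": {"name": "r_name"},
        "input": {"op": "scan", "table": "region"},
    },
}
new_plan = push_filter(plan)
print(new_plan["op"])             # project
print(new_plan["input"]["pred"])  # ('r_name', '=', 'EUROPE')
```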
12 changes: 6 additions & 6 deletions pydough/conversion/hybrid_translator.py
@@ -7,7 +7,7 @@
from collections.abc import Iterable

import pydough.pydough_operators as pydop
from pydough.configs import PyDoughConfigs
from pydough.configs import PyDoughSession
from pydough.database_connectors import DatabaseDialect
from pydough.errors import PyDoughSQLException
from pydough.metadata import (
@@ -80,8 +80,8 @@ class HybridTranslator:
Class used to translate PyDough QDAG nodes into the HybridTree structure.
"""

def __init__(self, configs: PyDoughConfigs, dialect: DatabaseDialect):
self.configs = configs
def __init__(self, session: PyDoughSession):
self.session = session
# An index used for creating fake column names for aliases
self.alias_counter: int = 0
# A stack where each element is a hybrid tree being derived
@@ -91,7 +91,7 @@ def __init__(self, configs: PyDoughConfigs, dialect: DatabaseDialect):
# If True, rewrites MEDIAN calls into an average of the 1-2 median rows
# or rewrites QUANTILE calls to select the first qualifying row,
# both derived from window functions, otherwise leaves as-is.
self.rewrite_median_quantile: bool = dialect not in {
self.rewrite_median_quantile: bool = session.database.dialect not in {
DatabaseDialect.ANSI,
DatabaseDialect.SNOWFLAKE,
}
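The MEDIAN rewrite mentioned in the comment above — averaging the one or two middle rows — can be expressed standalone as (illustrative Python, not the window-function SQL PyDough emits):

```python
# Median as the mean of the middle row(s): one row when the count is odd,
# two rows when it is even — the quantity the window-function rewrite computes.
def median_via_middle_rows(values):
    ordered = sorted(values)
    n = len(ordered)
    middle = ordered[(n - 1) // 2 : n // 2 + 1]  # 1 row if n is odd, 2 if even
    return sum(middle) / len(middle)

print(median_via_middle_rows([3, 1, 2]))     # 2.0
print(median_via_middle_rows([4, 1, 3, 2]))  # 2.5
```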
@@ -481,8 +481,8 @@ def postprocess_agg_output(
# COUNT/NDISTINCT for left joins since the semantics of those functions
# never allow returning NULL.
if (
(agg_call.operator == pydop.SUM and self.configs.sum_default_zero)
or (agg_call.operator == pydop.AVG and self.configs.avg_default_zero)
(agg_call.operator == pydop.SUM and self.session.config.sum_default_zero)
or (agg_call.operator == pydop.AVG and self.session.config.avg_default_zero)
or (
agg_call.operator in (pydop.COUNT, pydop.NDISTINCT)
and joins_can_nullify
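The postprocessing rule in the last hunk can be paraphrased as: when a config flag (or the never-NULL semantics of COUNT/NDISTINCT under nullable joins) demands it, wrap the aggregate output in a default-to-zero. A hedged standalone sketch (names invented for illustration):

```python
# Mimics the branch above: NULL aggregate outputs become 0 only when the
# relevant config flag (or COUNT/NDISTINCT join-nullability rule) applies.
def postprocess(value, op, sum_default_zero=True, avg_default_zero=False,
                joins_can_nullify=True):
    if value is None:
        if op == "SUM" and sum_default_zero:
            return 0
        if op == "AVG" and avg_default_zero:
            return 0
        if op in ("COUNT", "NDISTINCT") and joins_can_nullify:
            return 0
    return value

print(postprocess(None, "SUM"))    # 0
print(postprocess(None, "AVG"))    # None
print(postprocess(None, "COUNT"))  # 0
```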