
@mccanne
Collaborator

@mccanne mccanne commented Oct 10, 2025

This commit adds support for referring to tables in a SQL expression, resulting in a record representing the table row (as in duckdb and somewhat similarly in postgres). We also added support for referencing "this" in a SQL expression, which refers to the input relation in SELECT and WHERE expressions, the output relation in HAVING clauses, and the input relation for arguments (and where clauses) of agg functions in HAVING clauses.
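The rules for which relation `this` denotes can be illustrated with the minimal Go sketch below. The `Clause` and `Relation` types and the `thisRelation` function are hypothetical names chosen for illustration. They are not taken from the super codebase, and the sketch only encodes the three cases listed in the paragraph above.

```go
package main

import "fmt"

// Clause identifies where in a SQL query an expression appears.
// Hypothetical names for illustration; not from the super codebase.
type Clause int

const (
	Select         Clause = iota
	Where                 // WHERE clause of the query
	Having                // HAVING clause (after grouping)
	AggArgInHaving        // argument (or WHERE filter) of an agg call inside HAVING
)

// Relation names the relation that "this" denotes.
type Relation string

const (
	Input  Relation = "input"
	Output Relation = "output"
)

// thisRelation returns which relation "this" refers to in clause c,
// following the rules described in the PR text.
func thisRelation(c Clause) Relation {
	switch c {
	case Select, Where, AggArgInHaving:
		// These expressions are evaluated against the input rows.
		return Input
	case Having:
		// HAVING is evaluated against the grouped (output) rows.
		return Output
	default:
		panic(fmt.Sprintf("unknown clause %d", c))
	}
}

func main() {
	for _, c := range []Clause{Select, Where, Having, AggArgInHaving} {
		fmt.Println(c, thisRelation(c))
	}
}
```

The aggregate-argument case is listed separately because an aggregate inside HAVING still consumes individual input rows, even though the surrounding HAVING expression sees the grouped output.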

This new logic causes an error for table references of dynamic schemas. This is to avoid a situation where "select T from T" refers to the table when T has a dynamic schema, but later, once a schema shows up for that same data, the query compiles differently, as a reference to field T inside T (following postgres and duckdb scope precedence). To refer to the table of a dynamic source, use the special value "this" instead. In general, query semantics should be identical whether types/schemas are known or unknown; if this isn't the case anywhere here, then it's a design bug.

We also added an escape valve for referring to a SQL column named "this", which is simply denoted with double quotes.

Finally, these changes exposed a problem in the as-name inference algorithm, where the internal DAG paths would show up, so we updated the inference code to strictly use the AST instead of a mix of the AST and sem tree.
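The minimal Go sketch below shows why inferring output names strictly from the AST helps. The function only sees the expression as the user wrote it, so names of internal DAG paths produced during translation cannot leak into column names. The AST types and `inferName` are hypothetical. Naming a function call after its function follows PostgreSQL's convention and is an assumption, not necessarily super's behavior.

```go
package main

import "fmt"

// Expr is a hypothetical, minimal AST for SELECT-list expressions.
type Expr interface{ isExpr() }

// Ident is a bare identifier such as col1.
type Ident struct{ Name string }

// Path is a dotted reference such as tab1.col1.
type Path struct{ Elems []string }

// Call is a function call such as count(x).
type Call struct {
	Func string
	Args []Expr
}

// Literal stands in for any expression with no natural name.
type Literal struct{ Text string }

func (Ident) isExpr()   {}
func (Path) isExpr()    {}
func (Call) isExpr()    {}
func (Literal) isExpr() {}

// inferName derives an output column name from the AST alone.
// It returns ok=false when no name can be inferred.
func inferName(e Expr) (string, bool) {
	switch e := e.(type) {
	case Ident:
		return e.Name, true
	case Path:
		// tab1.col1 is named after its last element, col1.
		if len(e.Elems) == 0 {
			return "", false
		}
		return e.Elems[len(e.Elems)-1], true
	case Call:
		// Assumption: name a call after its function, as PostgreSQL does.
		return e.Func, true
	default:
		return "", false
	}
}

func main() {
	exprs := []Expr{Ident{"col1"}, Path{[]string{"tab1", "col1"}}, Call{Func: "count"}, Literal{"1+1"}}
	for _, e := range exprs {
		name, ok := inferName(e)
		fmt.Println(name, ok)
	}
}
```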

Fixes #6241

Where: t.semExprNullable(call.Where),
Node: n,
Name: name,
Expr: e,

Nit:

Suggested change
- Expr: e,
+ Expr: t.semExprNullable(arg),

@philrz
Contributor

philrz commented Oct 10, 2025

Here's a query that fails on the branch but runs OK on the tip of main. Unpack the attached data.tgz, then run:

$ super -version
Version: 321566076

$ super -c "SELECT tab1.col1 FROM tab0 (FORMAT parquet), tab2 (FORMAT parquet) AS cor0 CROSS JOIN tab1, tab2 AS cor1;"
"tab1": ambiguous column reference at line 1, column 8:
SELECT tab1.col1 FROM tab0 (FORMAT parquet), tab2 (FORMAT parquet) AS cor0 CROSS JOIN tab1, tab2 AS cor1;
       ~~~~~~~~~

@philrz
Contributor

philrz commented Oct 10, 2025

Here's a query that runs on the branch but produces only error values. It uses the same test data as in #6292 (comment).

$ super -version
Version: 321566076

$ super -c "SELECT tab2.col1 AS col2 FROM tab1 (FORMAT parquet), tab1 (FORMAT parquet) AS cor0 CROSS JOIN tab2;"
{col2:error("missing")}
{col2:error("missing")}
{col2:error("missing")}
{col2:error("missing")}
...

Whereas on the tip of main:

$ super -c "SELECT tab2.col1 AS col2 FROM tab1 (FORMAT parquet), tab1 (FORMAT parquet) AS cor0 CROSS JOIN tab2;"
{col2:31}
{col2:31}
{col2:31}
{col2:17}
...

@philrz
Contributor

philrz commented Oct 10, 2025

@mccanne and @nwt spotted that the failures in my two comments above can both be explained by the missing (FORMAT parquet) on some of the table references, which resulted in dynamic rather than static schemas. That those queries worked on main appears to have been a happy accident. I'm therefore updating my sqllogictest scripts to add (FORMAT parquet) in those corner cases, and will then retest in full to sniff out any failures with other root causes that may be hiding behind these. In the meantime, no reason to hold off on getting this PR merged. 👍

@mccanne mccanne merged commit 7c4d960 into main Oct 10, 2025
3 checks passed
@mccanne mccanne deleted the sql-table-ref branch October 10, 2025 21:00
@philrz
Contributor

philrz commented Oct 12, 2025

Following up on the prior comment: after adding the missing (FORMAT parquet) where needed (brimdata/sqllogic-ztests@d836ae9), the 1+ million sqllogictest queries that had run successfully at super commit 0dd0e87 (before this PR merged) also ran successfully at commit cad1342 (after it merged). 👍



Development

Successfully merging this pull request may close these issues.

What happens when a user tries to use "this" in relational scoping

4 participants