Skip to content

Regression with compound field access and join schema #15549

@alexwilcoxson-rel

Description

@alexwilcoxson-rel

Describe the bug

I am seeing a regression in 46 with regards to queries that join two tables with similar columns and have compound field access. I did see some changes here recently related to sqlparser version update (#14255)

For context we see this using deltalake and merging tables that have structs .

To Reproduce

45

DataFusion CLI v45.0.0
> create table u as values({r: 'a', c: 1}), ({r: 'b', c: 2.3});
0 row(s) fetched. 
Elapsed 0.017 seconds.

> create table t as values({r: 'a', c: 1}), ({r: 'b', c: 2.3});
0 row(s) fetched. 
Elapsed 0.006 seconds.

> select * from t;
+----------------+
| column1        |
+----------------+
| {r: a, c: 1.0} |
| {r: b, c: 2.3} |
+----------------+
2 row(s) fetched. 
Elapsed 0.003 seconds.

> select * from t tee join u you on tee.column1['r'] = you.column1['r'];
+----------------+----------------+
| column1        | column1        |
+----------------+----------------+
| {r: a, c: 1.0} | {r: a, c: 1.0} |
| {r: b, c: 2.3} | {r: b, c: 2.3} |
+----------------+----------------+
2 row(s) fetched. 
Elapsed 0.009 seconds.

> select * from t join u on t.column1['r'] = u.column1['r'];
+----------------+----------------+
| column1        | column1        |
+----------------+----------------+
| {r: a, c: 1.0} | {r: a, c: 1.0} |
| {r: b, c: 2.3} | {r: b, c: 2.3} |
+----------------+----------------+
2 row(s) fetched. 
Elapsed 0.005 seconds.

46

DataFusion CLI v46.0.1
> create table u as values({r: 'a', c: 1}), ({r: 'b', c: 2.3});
0 row(s) fetched. 
Elapsed 0.009 seconds.

> create table t as values({r: 'a', c: 1}), ({r: 'b', c: 2.3});
0 row(s) fetched. 
Elapsed 0.005 seconds.

> select * from t;
+----------------+
| column1        |
+----------------+
| {r: a, c: 1.0} |
| {r: b, c: 2.3} |
+----------------+
2 row(s) fetched. 
Elapsed 0.003 seconds.

> select * from t tee join u you on tee.column1['r'] = you.column1['r'];
Schema error: No field named tee. Valid fields are tee.column1, you.column1.
> select * from t join u on t.column1['r'] = u.column1['r'];
Schema error: No field named t. Valid fields are t.column1, u.column1.

Expected behavior

regression is fixed

Additional context

Some other queries that still do work on 46:

using field access for struct column

> select * from t where t.column1['r'] is not null;
+----------------+
| column1        |
+----------------+
| {r: a, c: 1.0} |
| {r: b, c: 2.3} |
+----------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.

> select * from t where t.column1['r'] = 'a';
+----------------+
| column1        |
+----------------+
| {r: a, c: 1.0} |
+----------------+
1 row(s) fetched. 
Elapsed 0.004 seconds.

simple joins

> create table x (a int) as values (1);
0 row(s) fetched. 
Elapsed 0.008 seconds.

> create table y (a int) as values (1);
0 row(s) fetched. 
Elapsed 0.007 seconds.

> select * from x;
+---+
| a |
+---+
| 1 |
+---+
1 row(s) fetched. 
Elapsed 0.003 seconds.

> select * from x join y on x.a = y.a;
+---+---+
| a | a |
+---+---+
| 1 | 1 |
+---+---+
1 row(s) fetched. 
Elapsed 0.007 seconds.

> select * from x ex join y why on ex.a = why.a;
+---+---+
| a | a |
+---+---+
| 1 | 1 |
+---+---+
1 row(s) fetched. 
Elapsed 0.005 seconds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions