Skip to content

SQL: Remaining "no corresponding grouping element for non-aggregate" errors #6410

@philrz

Description

@philrz

The fix in #6388 addressed most of the ~176k incidents of the "no corresponding grouping element for non-aggregate" errors from sqllogictests originally tracked in #5973. However, now that a full re-run of all the sqllogictests has been completed with the benefit of that fix there's still 979 queries that produce that error. It's unclear at first glance if these are just corner cases that can be addressed with the same approach used in #6388 or if they present a unique challenge.

Details

Repro is with super commit a814e1b.

The attached bundle groupby_slt_good_8_repro.tgz contains 75 such queries from the 8th "random groupby" sqllogictest set along with test data. As with all the "random" queries, these are "fuzzy" in nature and therefore look unrealistic in some ways. Just run the following and it will show you each query producing the error:

$ super -version
Version: a814e1b66

$ sh repro.sh 
no corresponding grouping element for non-aggregate "+tab0.col2" at line 1, column 8:
SELECT + tab0.col2, tab0.col2 FROM tab0 GROUP BY col2;
       ~~~~~~~~~~~
no corresponding grouping element for non-aggregate "col2_1" at line 1, column 12:
SELECT ALL col2, col2 AS col2 FROM tab2 GROUP BY col2;
           ~~~~
...

There's plenty of repetition in the style of these queries so I won't enumerate them all. But to my eyes, these first two actually seem representative of what may be two buckets of queries:

Simple math putting + / - before column names

That first one shown above is an example of this, and I'm optimistic that it may indeed just be a variation of the kind of queries shown in the original #5973 repro. For example, DuckDB happens to give the +-led column a unique name in the output.

$ duckdb -c "SELECT + tab0.col2, tab0.col2 FROM tab0.parquet GROUP BY col2;"
┌──────────────┬───────┐
│ +(tab0.col2) │ col2  │
│    int64     │ int64 │
├──────────────┼───────┤
│           24 │    24 │
│           38 │    38 │
│           79 │    79 │
└──────────────┴───────┘

What Would be Duplicate Columns

The second one shown above seems like it falls into this bucket, evidenced by the trailing _1 in the error message. Indeed, DuckDB uses the same column name twice in the output, and I know this already presents a challenge for us (#5977). I suspect that if we let the _1 column appear in the output that would be ok (at least for now) but it would be an improvement if we didn't reject the query.

$ duckdb -c "SELECT ALL col2, col2 AS col2 FROM tab2.parquet GROUP BY col2;"
┌───────┬───────┐
│ col2  │ col2  │
│ int64 │ int64 │
├───────┼───────┤
│    58 │    58 │
│    79 │    79 │
│    87 │    87 │
└───────┴───────┘

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions