Skip to content

Conversation

samuelcolvin
Copy link
Contributor

I'm not sure this is exactly the right solution, but it's definitely closer to correct for PostgreSQL.

@coveralls
Copy link

coveralls commented Aug 22, 2024

Pull Request Test Coverage Report for Build 10510947525

Details

  • 37 of 37 (100.0%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 89.151%

Totals Coverage Status
Change from base Build 10418247467: 0.01%
Covered Lines: 28326
Relevant Lines: 31773

💛 - Coveralls

Copy link
Contributor Author

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, datafusion has lots of hacky logic dedicated to working around this issue.

We could probably remove all that hacky logic if we fixed this properly in sqlparser-rs.

pub struct PostgreSqlDialect {}

// matches <https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-PRECEDENCE>
const INTERVAL_COLON_PREC: u8 = 150;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the point of this PR — interval binds more tightly than anything else in postgres:

select pg_typeof(INTERVAL '1'::text)
--> text

E.g. postgres evaluates that as (interval '1')::text, not as interval ('1'::text) as was the in sqlparser-rs until this PR.

@samuelcolvin samuelcolvin marked this pull request as draft August 22, 2024 17:22
@samuelcolvin
Copy link
Contributor Author

samuelcolvin commented Aug 22, 2024

Okay, I've done some more research, and basically there are three groups of databases with pretty distinct ways of parsing interval expressions:

1. PostgreSQL-like

Expect INTERVAL expressions to be in the form

SELECT INTERVAL '1 second'

These bind interval to the literal tightly.

There's pretty good support for parsing these interval expressions in arrow-rs (added in apache/arrow-rs#6211).

Databases: Postgres, DuckDB, Redshift, Snowflake

2. MySQL-like

Expect INTERVAL expressions to be in the form

SELECT INTERVAL 1 SECOND

they also support INTERVAL 1 + 1 SECOND, so I presume the logic is "keep parsing literals until you meet one of the following units", there's a good list of these units here.

Databases: MySQL, BigQuery, Databricks, Hive, ClickHouse

Sqlite and MsSql

Don't seem to support interval syntax at all.


Critically, from what I can tell, type 1 and type 2 don't support the other syntax at all. IF YOU'RE AWARE OF AN EXCEPTION TO THIS RULE, PLEASE LET ME KNOW.

Currently sqlparser-rs seems to partially support both syntaxes, with a preference for the MySQL style. Datafusion then tries to hack around the problems with this.

I would therefore suggest we change the logic to have two separate functions for the two INTERVAL approaches.

@alamb if you're happy with that I can try to implement this.

@samuelcolvin
Copy link
Contributor Author

Okay, this is wrong, since syntax like

SELECT INTERVAL '1' DAY;

Will be parsed incorrectly. I'm also somewhat uncomfortable that tests aren't currently failing.

Having looked through this for about an hour, I think the current implementation is pretty messy, but the solution definitely isn't as simple as I had hoped.

I think the only solution is to rip lots of the current hacks out, and reimplement the interval parsing from scratch in a saner way.

@samuelcolvin
Copy link
Contributor Author

replaced by #1398.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants