-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(snowflake)!: support transpilation of try_to_date from snowflake to duckdb #6731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(snowflake)!: support transpilation of try_to_date from snowflake to duckdb #6731
Conversation
SQLGlot Integration Test ResultsComparing:
By Dialect
Overallmain: 5697 total, 5465 passed (pass rate: 95.9%), sqlglot version: sqlglot:transpile_try_to_date_snowflake_duckdb: 5697 total, 5465 passed (pass rate: 95.9%), sqlglot version: Difference: No change |
|
@fivetran-felixhuang the example you mentioned isn't fixed in this branch: |
@georgesittas yeah I am not sure how to fix it...somehow we would have to validate that the input conform to the format |
|
I'm a little confused, could you clarify what this PR is trying to solve? Is it supporting the transpilation of |
My goal was to support transpilation of TRY_TO_DATE for the string inputs (integer inputs are quite tricky). But when I was testing, I found the example of discrepancy I mentioned above (not validating input strings against default formats). I tried to check if there are ways to validate string inputs if using the default format. But I found more issues with TRY_TO_DATE/TO_DATE...It seems that Snowflake supports many date/time formats by default (https://docs.snowflake.com/en/sql-reference/date-time-input-output#label-date-time-input-output-supported-formats-for-auto-detection) For example, TRY_TO_DATE('17-DEC-1980'), TRY_TO_DATE('12/17/1980') and TRY_TO_DATE('Thu, 21 Dec 2000 16:01:07 +0200') all return date values in Snowflake, while TRY_CAST('17-DEC-1980' AS DATE), TRY_CAST('12/17/1980' AS DATE) and TRY_CAST('Thu, 21 Dec 2000 16:01:07 +0200' AS DATE) return NULL, since DuckDB only supports the YYYY-MM-DD format I am not sure what the next step is...if we want to validate the input strings against default formats by using duckDB's STRPTIME, we will need to support all of them |
|
It should also be noted that there are some formats that snowflake accepts but DuckDB doesn't.. All of these return dates in snowflake, while their transpiled versions return NULL, as DuckDB doesn't recognize them |
|
Is there a strftime counterpart for these format model elements in Snowflake? Seems like the unsupported ones are:
We can ignore these for the time being, if not. |
perhaps Snowflake's TO_TIMESTAMP does something similar to DuckDB's strftime |
14dea1c to
2d6ce49
Compare
|
source query: transpiled query: both return the same results |
2d6ce49 to
70f27ce
Compare
| "%:z": "%z", # In DuckDB %z can represent ±HH:MM, ±HHMM, or ±HH. | ||
| "%-z": "%z", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests that demonstrate DuckDB's %z is preserved when roundtripping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test
| INVERSE_TIME_MAPPING = { | ||
| "T": "T", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right, why are we mapping T to itself? This should happen automatically given how the trie works when working with time formats, right? What trigerred the need to add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that TRY_TO_DATE('2024-01-31', 'AUTO') will be mapped to TRY_TO_DATE('2024-01-31', 'AU"T"O') without this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's add a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test
sqlglot/dialects/snowflake.py
Outdated
| "FF0": "%n", # %n is the internal representation of nanoseconds | ||
| "ff0": "%n", | ||
| "FF1": "%n", | ||
| "ff1": "%n", | ||
| "FF2": "%n", | ||
| "ff2": "%n", | ||
| "FF3": "%n", | ||
| "ff3": "%n", | ||
| "FF4": "%n", | ||
| "ff4": "%n", | ||
| "FF5": "%n", | ||
| "ff5": "%n", | ||
| "FF6": "%n", | ||
| "ff6": "%n", | ||
| "FF7": "%n", | ||
| "ff7": "%n", | ||
| "FF8": "%n", | ||
| "ff8": "%n", | ||
| "FF9": "%n", | ||
| "ff9": "%n", | ||
| "FF": "%n", | ||
| "ff": "%n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These don't look correct either. Doesn't the digit after FF designate precision? For example, isn't FF3 meant to be used for milliseconds?
If you map these to %n, then you break roundtrip because, e.g. FF3 maps back to ff which is the last key to be mapped to %n.
Were these tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ff works for ff[0-9] in Snowflake, should we preserve the precision digits (i.e FF1, FF5)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we should preserve digits, because Snowflake's docs only indicate equivalency between FF and FF9:
Fractional seconds with precision 0 (seconds) to 9 (nanoseconds), e.g. FF, FF0, FF3, FF9. Specifying FF is equivalent to FF9 (nanoseconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| self.validate_identity( | ||
| """SELECT TO_TIMESTAMP('2025-01-16T14:45:30.123+0500', 'yyyy-mm-DD"T"hh24:mi:ss.ff3TZHTZM')""" | ||
| """SELECT TO_TIMESTAMP('2025-01-16T14:45:30.123+0500', 'yyyy-mm-DDThh24:mi:ss.fftzhtzm')""" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this test change?
tests/dialects/test_snowflake.py
Outdated
| "": "SELECT STR_TO_TIME('2025-01-16 14:45:30.123', '%Y-%m-%d %H:%M:%S.%f')", | ||
| "snowflake": "SELECT TO_TIMESTAMP('2025-01-16 14:45:30.123', 'yyyy-mm-DD hh24:mi:ss.ff6')", | ||
| "": "SELECT STR_TO_TIME('2025-01-16 14:45:30.123', '%Y-%m-%d %H:%M:%S.%n')", | ||
| "snowflake": "SELECT TO_TIMESTAMP('2025-01-16 14:45:30.123', 'yyyy-mm-DD hh24:mi:ss.ff')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's double-check whether FFN can be safely converted into ff in all cases.
| }, | ||
| ) | ||
| self.validate_all( | ||
| "TRY_TO_DATE('01-01-2000', 'MM-DD-YYYY')", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this test change?
tests/dialects/test_snowflake.py
Outdated
| ) | ||
|
|
||
| self.validate_all( | ||
| """TRY_TO_DATE('2013-04-28 20:57 +0700', 'YYYY-MM-DD HH24:MI TZHTZM')""", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's not use triple quotes for these, it's unnecessary. Simple " quotes suffice for this and other tests.
70f27ce to
14d84eb
Compare
https://docs.snowflake.com/en/sql-reference/functions/try_to_date
We have the same problem as transpiling TO_DATE (https://docs.snowflake.com/en/sql-reference/functions/to_date). Thus if the input is an integer expression like '31536000000', this implementation can't handle it.
For inputs of string types (formatted date strings), we can use a combination of TRY_CAST and TRY_STRPTIME through setting the 'safe' argument.
Testing
source query:
transpiled query:
both queries produce the same results.
Discrepancy found
source query:
TRY_TO_DATE('2024-05-10', 'YYYY-MM-DD HH24:MI:SS')
transpiled query:
TRY_CAST('2024-05-10' AS DATE)
The source query returns NULL, while the transpiled query returns 2024-05-10 . Since the format is the default snowflake format (Snowflake.TIME_FORMAT), in tsordstodate_sql it won't be handled by the branch "if time_format and time_format not in (self.dialect.TIME_FORMAT, self.dialect.DATE_FORMAT)", and instead will go straight to casting without checking whether the input string matches the format string (in this case the HH24:MI:SS part is missing in the input).