-
Notifications
You must be signed in to change notification settings - Fork 1k
Feature: support transpilation of GREATEST from BigQuery to DuckDB #6361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e9cfb7c to
f58c999
Compare
|
Can we also add tests for this ? |
@geooo109 added some tests, please let me know if they are sufficient |
4ea1569 to
05a38f6
Compare
|
is greatest null / nan handling preserved for bigquery -> duckdb? perhaps least should be supported in a follow-up pr |
|
So, @fivetran-felixhuang the |
For GREATEST, the NaN handling is the same for both the BigQuery and DuckDB, but NULL handling is not. This PR is to address the NULL handling issue. I will work on the NaN handling issue for LEAST in a different PR |
georgesittas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one last comment, I don't think we need _build_greatest for DuckDB.
|
|
||
| FUNCTIONS = { | ||
| **parser.Parser.FUNCTIONS, | ||
| "GREATEST": _build_greatest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this right? It should be parsed just fine out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do. When I removed, the arguments were not parsed correctly for some reason....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's due to is_var_len_args. Without doing what I did in parser.py, we'd break the AST by spilling args[2:] to null_if_any_null, which is not the right representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
|
@fivetran-felixhuang let's not worry about nan for now, I've reached out to the duckdb community to see if this is a bug instead of a feature. We can revisit later. |
For the GREATEST function
In BigQuery:
- if any argument is NULL, the result is NULL
In DuckDB:
- NULLs are ignored, returns greatest non-NULL value.
The workaround here is to make sure that the transpiled DuckDB query returns NULL is any argument is NULL
BigQuery: GREATEST(1, 2, NULL, 3) -> DuckDB: CASE WHEN 1 IS NULL OR 2 IS NULL OR NULL IS NULL OR 3 IS NULL THEN NULL ELSE GREATEST(1, 2, NULL, 3) END