SNOW-1764178: Adding multiline/tabbing to generated SQL statements#3378
sfc-gh-daviwang merged 21 commits into main
All contributors have signed the CLA ✍️ ✅
🎉 Snyk checks have passed. No issues have been found so far.
✅ security/snyk check is complete. No issues have been found.
✅ license/snyk check is complete. No issues have been found.
I have read the CLA Document and I hereby sign the CLA
sfc-gh-joshi left a comment
Who's the intended audience for this change? I'd prefer if this could be guarded by some kind of configuration flag, as I'm worried this could make logging noisier than it already is, especially in Snowpark pandas, where queries can get very large very fast. It already takes browsers a very long time to load the result page of a CI run, and having more newlines would likely exacerbate that.
> We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers.
Why exactly is this a limitation? Nested indentation would help readability a lot.
Some clients want this change so their queries are more readable. Also, this is part of a series of changes where we want to attach errors to specific lines of SQL, which isn't possible if the entire SQL query is on one line. Adding a configuration flag makes sense; I put up a PR to parameter-protect this feature and will adjust the code accordingly.
The main issue is that there isn't an easy way to differentiate between a "\n" character that a customer entered and one that we inserted for formatting purposes. Without an easy way to tell these apart, our indentation can break current behavior (e.g., test_drop_columns_special_names in test_dataframe.py). I've already discussed this at length offline with @sfc-gh-aalam, and we came to the conclusion that this approach makes the most sense. Feel free to reach out on Slack if you would like to discuss.
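To illustrate the limitation described above, here is a minimal sketch of how a naive nested-indentation pass could corrupt an identifier. It is not the code from this PR: `indent_child_naively` and `quoted_identifiers` are hypothetical helpers, and the values of `NEW_LINE` and `TAB` are assumptions chosen for the example.

```python
import re

# Assumed values for illustration; the real constants live in analyzer_utils.py.
NEW_LINE = "\n"
TAB = "    "


def indent_child_naively(child_sql: str) -> str:
    """Hypothetical helper: indent a child query by one level.

    It prefixes the first line with TAB and inserts TAB after every newline.
    It cannot tell a formatting newline from one inside a quoted identifier.
    """
    return TAB + child_sql.replace(NEW_LINE, NEW_LINE + TAB)


def quoted_identifiers(sql: str) -> list:
    """Hypothetical helper: return the contents of double-quoted identifiers."""
    return re.findall(r'"([^"]*)"', sql)


# A child query whose column name legitimately contains a newline.
child = "SELECT" + NEW_LINE + TAB + '"col\nname"' + NEW_LINE + "FROM t"

# Wrapping it as a subquery and indenting it changes the identifier itself.
nested = "SELECT * FROM (" + NEW_LINE + indent_child_naively(child) + NEW_LINE + ")"

before = quoted_identifiers(child)   # the identifier as the user wrote it
after = quoted_identifiers(nested)   # the identifier after naive indentation
```

In this sketch `before` and `after` differ because the indentation pass also inserted `TAB` after the newline inside the quotes, so the nested query would refer to a different column name than the one the user supplied.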
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1764178
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Added NEW_LINE and TAB characters to analyzer_utils.py so that generated SQL includes newlines and indentation. We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers. We only use tabbing to indent column names, group bys, and order bys. This setting is protected by the parameter PYTHON_SNOWPARK_GENERATE_MULTILINE_QUERIES in https://github.com/snowflakedb/snowflake/pull/294205 and is enabled by default because this is a low-risk change. This behavior is disabled for Snowpark pandas tests. Also added unit tests to verify the formatting.
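As a rough sketch of the idea (not the actual analyzer_utils.py code), a statement builder could switch between single-line and multiline output like this. The function `select_sketch`, its `multiline` argument, and the constant values are hypothetical and exist only for illustration; the real feature is controlled by the server-side parameter named above.

```python
# Assumed values for illustration; the real constants live in analyzer_utils.py.
NEW_LINE = "\n"
TAB = "    "


def select_sketch(columns, table, multiline=True):
    """Hypothetical builder for a simple SELECT statement.

    With multiline=False the statement stays on one line. With multiline=True
    each column goes on its own indented line, and the FROM clause starts on a
    new, unindented line (only the column list is indented).
    """
    if not multiline:
        return "SELECT " + ", ".join(columns) + " FROM " + table
    column_block = ("," + NEW_LINE + TAB).join(columns)
    return "SELECT" + NEW_LINE + TAB + column_block + NEW_LINE + "FROM " + table


single_line = select_sketch(["a", "b"], "t", multiline=False)
multi_line = select_sketch(["a", "b"], "t", multiline=True)
```

Because every clause starts on its own line in the multiline form, an error can later be reported against a specific line of the statement, which is the follow-up goal mentioned in the discussion.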
An example of the new behavior:
Old behavior (all one line):