Skip to content

SNOW-1764178: Adding multiline/tabbing to generated SQL statements#3378

Merged
sfc-gh-daviwang merged 21 commits intomainfrom
daviwang-SNOW-1764178-multiline-queries
May 30, 2025
Merged

SNOW-1764178: Adding multiline/tabbing to generated SQL statements#3378
sfc-gh-daviwang merged 21 commits intomainfrom
daviwang-SNOW-1764178-multiline-queries

Conversation

@sfc-gh-daviwang
Copy link
Contributor

@sfc-gh-daviwang sfc-gh-daviwang commented May 19, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1764178

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

Added NEW_LINE and TAB character to analyzer_utils.py so generated SQL includes newlines and SQL formatting. We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers. We only use tabbing to indent column names, group bys, and order bys. This setting is protected by the parameter PYTHON_SNOWPARK_GENERATE_MULTILINE_QUERIES in https://github.com/snowflakedb/snowflake/pull/294205, and is automatically enabled due to this being a low risk change. This behavior is disabled for snowpark pandas tests. Also added additional unit tests to verify formatting.

An example of the new behavior:

SELECT  * 
 FROM (
( SELECT 
    "ID" AS "l_0002_ID", 
    "NAME" AS "NAME"
 FROM (
 SELECT $1 AS "ID", $2 AS "NAME" FROM  VALUES (1 :: INT, 'A' :: STRING), (2 :: INT, 'B' :: STRING)
)) AS SNOWPARK_LEFT 
INNER JOIN 
( SELECT 
    "ID" AS "r_0003_ID", 
    "VALUE" AS "VALUE"
 FROM (
 SELECT $1 AS "ID", $2 AS "VALUE" FROM  VALUES (1 :: INT, 10 :: INT), (2 :: INT, 20 :: INT)
)) AS SNOWPARK_RIGHT
 ON ("l_0002_ID" = "r_0003_ID")
)

Old behavior (all one line):

SELECT  *  FROM (( SELECT "ID" AS "l_0000_ID", "NAME" AS "NAME" FROM ( SELECT "ID", "NAME" FROM ( SELECT $1 AS "ID", $2 AS "NAME" FROM  VALUES (1 :: INT, 'A' :: STRING), (2 :: INT, 'B' :: STRING)))) AS SNOWPARK_LEFT INNER JOIN ( SELECT "ID" AS "r_0001_ID", "VALUE" AS "VALUE" FROM ( SELECT "ID", "VALUE" FROM ( SELECT $1 AS "ID", $2 AS "VALUE" FROM  VALUES (1 :: INT, 10 :: INT), (2 :: INT, 20 :: INT)))) AS SNOWPARK_RIGHT ON ("l_0000_ID" = "r_0001_ID"))```


@github-actions
Copy link

github-actions bot commented May 19, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@sfc-gh-snowflakedb-snyk-sa
Copy link

sfc-gh-snowflakedb-snyk-sa commented May 19, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@sfc-gh-daviwang
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@sfc-gh-daviwang sfc-gh-daviwang added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label May 19, 2025
@sfc-gh-daviwang sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from d3003f6 to 06361b3 Compare May 21, 2025 18:47
@github-actions github-actions bot added local testing Local Testing issues/PRs snowpark-pandas labels May 21, 2025
@sfc-gh-daviwang sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from c2c05b3 to 283c897 Compare May 21, 2025 23:52
@sfc-gh-daviwang sfc-gh-daviwang removed local testing Local Testing issues/PRs snowpark-pandas labels May 21, 2025
@sfc-gh-daviwang sfc-gh-daviwang added NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs and removed snowpark-pandas labels May 22, 2025
@sfc-gh-daviwang sfc-gh-daviwang marked this pull request as ready for review May 22, 2025 20:24
@sfc-gh-daviwang sfc-gh-daviwang requested review from a team as code owners May 22, 2025 20:24
@sfc-gh-daviwang sfc-gh-daviwang self-assigned this May 22, 2025
Copy link
Contributor

@sfc-gh-joshi sfc-gh-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who's the intended audience for this change? I'd prefer is this could be guarded by some kind of configuration flag, as I'm worried this could make logging noisier than it already is, especially in Snowpark pandas, where queries can get very large very fast. It already takes browsers a very long time to load the result page of a CI run, and having more newlines would likely exacerbate that.

We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers.

Why exactly is this a limitation? Nested indentation would help readability a lot.

@sfc-gh-daviwang
Copy link
Contributor Author

sfc-gh-daviwang commented May 22, 2025

Who's the intended audience for this change? I'd prefer is this could be guarded by some kind of configuration flag, as I'm worried this could make logging noisier than it already is, especially in Snowpark pandas, where queries can get very large very fast. It already takes browsers a very long time to load the result page of a CI run, and having more newlines would likely exacerbate that.

Some clients want this change so their queries are more readable. Also this is part of a series of changes where we want to attach errors to specific lines of SQL, which isn't possible if the entire SQL query is on one line. Adding a configuration flag makes sense, I put a PR to parameter protect this feature, and will adjust the code accordingly.

Why exactly is this a limitation? Nested indentation would help readability a lot.

The main issue is that there isn't an easy way to differentiate between a "\n" character that a customer input as opposed to one we created for formatting purposes. Without an easy way to differentiate these, our indentation can break current behavior (eg. test_drop_columns_special_names in test_dataframe.py). I've discussed this a bunch offline already with @sfc-gh-aalam and we came to the conclusion that this approach makes the most sense. Feel free to reach out on slack if you would like to discuss.

@sfc-gh-daviwang sfc-gh-daviwang changed the title Adding multiline/tabbing to generated SQL statements SNOW-1764178: Adding multiline/tabbing to generated SQL statements May 23, 2025
@sfc-gh-daviwang sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch 2 times, most recently from 9785011 to fbc391e Compare May 27, 2025 17:43
@sfc-gh-daviwang sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from bbd5ef1 to 630773a Compare May 30, 2025 17:03
@sfc-gh-daviwang sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from 630773a to ad1c4cc Compare May 30, 2025 17:04
@sfc-gh-daviwang sfc-gh-daviwang merged commit ed0a2ac into main May 30, 2025
40 checks passed
@sfc-gh-daviwang sfc-gh-daviwang deleted the daviwang-SNOW-1764178-multiline-queries branch May 30, 2025 23:15
@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants