SNOW-1764178: Adding multiline/tabbing to generated SQL statements by sfc-gh-daviwang · Pull Request #3378 · snowflakedb/snowpark-python

sfc-gh-daviwang · 2025-05-19T23:04:29Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-1764178
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
- If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
Please describe how your code solves the related issue.

Added NEW_LINE and TAB character to analyzer_utils.py so generated SQL includes newlines and SQL formatting. We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers. We only use tabbing to indent column names, group bys, and order bys. This setting is protected by the parameter PYTHON_SNOWPARK_GENERATE_MULTILINE_QUERIES in https://github.com/snowflakedb/snowflake/pull/294205, and is automatically enabled due to this being a low risk change. This behavior is disabled for snowpark pandas tests. Also added additional unit tests to verify formatting.

An example of the new behavior:

SELECT  * 
 FROM (
( SELECT 
    "ID" AS "l_0002_ID", 
    "NAME" AS "NAME"
 FROM (
 SELECT $1 AS "ID", $2 AS "NAME" FROM  VALUES (1 :: INT, 'A' :: STRING), (2 :: INT, 'B' :: STRING)
)) AS SNOWPARK_LEFT 
INNER JOIN 
( SELECT 
    "ID" AS "r_0003_ID", 
    "VALUE" AS "VALUE"
 FROM (
 SELECT $1 AS "ID", $2 AS "VALUE" FROM  VALUES (1 :: INT, 10 :: INT), (2 :: INT, 20 :: INT)
)) AS SNOWPARK_RIGHT
 ON ("l_0002_ID" = "r_0003_ID")
)

Old behavior (all one line):

SELECT  *  FROM (( SELECT "ID" AS "l_0000_ID", "NAME" AS "NAME" FROM ( SELECT "ID", "NAME" FROM ( SELECT $1 AS "ID", $2 AS "NAME" FROM  VALUES (1 :: INT, 'A' :: STRING), (2 :: INT, 'B' :: STRING)))) AS SNOWPARK_LEFT INNER JOIN ( SELECT "ID" AS "r_0001_ID", "VALUE" AS "VALUE" FROM ( SELECT "ID", "VALUE" FROM ( SELECT $1 AS "ID", $2 AS "VALUE" FROM  VALUES (1 :: INT, 10 :: INT), (2 :: INT, 20 :: INT)))) AS SNOWPARK_RIGHT ON ("l_0000_ID" = "r_0001_ID"))```

github-actions · 2025-05-19T23:04:41Z

All contributors have signed the CLA ✍️ ✅
_{Posted by the CLA Assistant Lite bot.}

sfc-gh-snowflakedb-snyk-sa · 2025-05-19T23:04:42Z

🎉 Snyk checks have passed. No issues have been found so far.

✅ security/snyk check is complete. No issues have been found. (View Details)

✅ license/snyk check is complete. No issues have been found. (View Details)

sfc-gh-daviwang · 2025-05-19T23:05:58Z

I have read the CLA Document and I hereby sign the CLA

sfc-gh-joshi

Who's the intended audience for this change? I'd prefer is this could be guarded by some kind of configuration flag, as I'm worried this could make logging noisier than it already is, especially in Snowpark pandas, where queries can get very large very fast. It already takes browsers a very long time to load the result page of a CI run, and having more newlines would likely exacerbate that.

We do not have nested tabbing for child queries b/c doing so would make us unable to include "\n" in identifiers.

Why exactly is this a limitation? Nested indentation would help readability a lot.

sfc-gh-daviwang · 2025-05-22T22:38:02Z

Who's the intended audience for this change? I'd prefer is this could be guarded by some kind of configuration flag, as I'm worried this could make logging noisier than it already is, especially in Snowpark pandas, where queries can get very large very fast. It already takes browsers a very long time to load the result page of a CI run, and having more newlines would likely exacerbate that.

Some clients want this change so their queries are more readable. Also this is part of a series of changes where we want to attach errors to specific lines of SQL, which isn't possible if the entire SQL query is on one line. Adding a configuration flag makes sense, I put a PR to parameter protect this feature, and will adjust the code accordingly.

Why exactly is this a limitation? Nested indentation would help readability a lot.

The main issue is that there isn't an easy way to differentiate between a "\n" character that a customer input as opposed to one we created for formatting purposes. Without an easy way to differentiate these, our indentation can break current behavior (eg. test_drop_columns_special_names in test_dataframe.py). I've discussed this a bunch offline already with @sfc-gh-aalam and we came to the conclusion that this approach makes the most sense. Feel free to reach out on slack if you would like to discuss.

src/snowflake/snowpark/_internal/analyzer/analyzer_utils.py

src/snowflake/snowpark/_internal/analyzer/select_statement.py

tests/unit/test_analyzer_util_suite.py

tests/integ/test_simplifier_suite.py

src/snowflake/snowpark/session.py

tests/integ/modin/conftest.py

tests/integ/scala/test_datatype_suite.py

tests/integ/utils/sql_counter.py

src/snowflake/snowpark/_internal/utils.py

tests/integ/scala/test_snowflake_plan_suite.py

tests/integ/utils/sql_counter.py

src/snowflake/snowpark/session.py

sfc-gh-daviwang requested review from sfc-gh-aalam, sfc-gh-jrose and sfc-gh-yuwang May 19, 2025 23:04

sfc-gh-daviwang added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label May 19, 2025

sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from d3003f6 to 06361b3 Compare May 21, 2025 18:47

github-actions bot added local testing Local Testing issues/PRs snowpark-pandas labels May 21, 2025

sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from c2c05b3 to 283c897 Compare May 21, 2025 23:52

sfc-gh-daviwang removed local testing Local Testing issues/PRs snowpark-pandas labels May 21, 2025

github-actions bot added the snowpark-pandas label May 22, 2025

sfc-gh-daviwang added NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs and removed snowpark-pandas labels May 22, 2025

github-actions bot added the snowpark-pandas label May 22, 2025

sfc-gh-daviwang marked this pull request as ready for review May 22, 2025 20:24

sfc-gh-daviwang requested review from a team as code owners May 22, 2025 20:24

sfc-gh-daviwang requested review from sfc-gh-mvashishtha and sfc-gh-vrpatel May 22, 2025 20:24

sfc-gh-daviwang self-assigned this May 22, 2025

sfc-gh-joshi requested changes May 22, 2025

View reviewed changes

sfc-gh-aalam reviewed May 23, 2025

View reviewed changes

sfc-gh-daviwang changed the title ~~Adding multiline/tabbing to generated SQL statements~~ SNOW-1764178: Adding multiline/tabbing to generated SQL statements May 23, 2025

sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch 2 times, most recently from 9785011 to fbc391e Compare May 27, 2025 17:43

sfc-gh-aalam reviewed May 28, 2025

View reviewed changes

sfc-gh-aalam approved these changes May 28, 2025

View reviewed changes

sfc-gh-aalam reviewed May 28, 2025

View reviewed changes

src/snowflake/snowpark/session.py Outdated Show resolved Hide resolved

sfc-gh-jrose approved these changes May 29, 2025

View reviewed changes

sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from bbd5ef1 to 630773a Compare May 30, 2025 17:03

sfc-gh-daviwang added 21 commits May 30, 2025 10:04

Adding multiline/tabbing to generated SQL statements

31cd67b

Added additional formatting for columns and fixing tests

b35631b

Fixing tests

34f4d52

Changing SQL counter to normalize queries with tabs and newlines

414d0d1

Adjusting unit tests and union counter

a4f5172

Fixing unit tests that are broken by linter

6aabf56

Removing tabbing for child queries

ab5a02d

Fixing small tests

f632032

Fixing one more test

6a1c99f

Parametrizing generating multiline queries

7b55ebf

Fixing review comments

80ea110

fixing small test

a0dafe4

Rebasing and fixing test

2346cf8

Modifying integration tests to not depend on newlines/tabs

9055c9a

Fixing test

07056dc

Fixing review comments

7df2743

Fixing review comments

1b17cf1

Adding regex normalizer

5352791

Fixing sql counter

1a2718a

fixing failing modin tests

a11bde5

nit

ad1c4cc

sfc-gh-daviwang force-pushed the daviwang-SNOW-1764178-multiline-queries branch from 630773a to ad1c4cc Compare May 30, 2025 17:04

sfc-gh-daviwang merged commit ed0a2ac into main May 30, 2025
40 checks passed

sfc-gh-daviwang deleted the daviwang-SNOW-1764178-multiline-queries branch May 30, 2025 23:15

github-actions bot locked and limited conversation to collaborators May 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SNOW-1764178: Adding multiline/tabbing to generated SQL statements#3378

SNOW-1764178: Adding multiline/tabbing to generated SQL statements#3378
sfc-gh-daviwang merged 21 commits intomainfrom
daviwang-SNOW-1764178-multiline-queries

sfc-gh-daviwang commented May 19, 2025 •

edited

Loading

Uh oh!

github-actions bot commented May 19, 2025 •

edited

Loading

Uh oh!

sfc-gh-snowflakedb-snyk-sa commented May 19, 2025 •

edited

Loading

Uh oh!

sfc-gh-daviwang commented May 19, 2025

Uh oh!

sfc-gh-joshi left a comment

Uh oh!

sfc-gh-daviwang commented May 22, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

sfc-gh-daviwang commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sfc-gh-snowflakedb-snyk-sa commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎉 Snyk checks have passed. No issues have been found so far.

Uh oh!

sfc-gh-daviwang commented May 19, 2025

Uh oh!

sfc-gh-joshi left a comment

Choose a reason for hiding this comment

Uh oh!

sfc-gh-daviwang commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sfc-gh-daviwang commented May 19, 2025 •

edited

Loading

github-actions bot commented May 19, 2025 •

edited

Loading

sfc-gh-snowflakedb-snyk-sa commented May 19, 2025 •

edited

Loading

sfc-gh-daviwang commented May 22, 2025 •

edited

Loading