Skip to content

SNOW-2012666: Lazy import of Pandas in snowpark-python#3348

Closed
sfc-gh-nqi wants to merge 2443 commits intomainfrom
nqi_SNOW_2012666_lazy_load
Closed

SNOW-2012666: Lazy import of Pandas in snowpark-python#3348
sfc-gh-nqi wants to merge 2443 commits intomainfrom
nqi_SNOW_2012666_lazy_load

Conversation

@sfc-gh-nqi
Copy link
Copy Markdown

@sfc-gh-nqi sfc-gh-nqi commented May 8, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2012666

  • Added thread safe lazy loading utilities in lazy_import_utils.py
  • Reduced unnecessary imports and optimize memory usage by loading dependencies only when needed
  • Fixed type checking with TYPE_CHECKING to avoid runtime imports
  1. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  2. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

sfc-gh-aling and others added 30 commits March 25, 2025 18:18
…3195)

Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…itHub action to validate it (#3173)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   [SNOW-1923653](https://snowflakecomputing.atlassian.net/browse/SNOW-1923653)

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
      - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development)
   - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: [AST Support Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#ast-abstract-syntax-tree-support-in-snowpark)

3. Please describe how your code solves the related issue.

   This PR creates an AST acknowledgement checkbox in our repository's PR template along with a GitHub action to verify its usage. This is in the same vein as the thread safety acknowledgement checkbox and GitHub Action.

[SNOW-1923653]: https://snowflakecomputing.atlassian.net/browse/SNOW-1923653?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…s for stateless batches, and DataframeWriter,DataframeReader new APIs (#3180)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   This PR includes all changes required to complete the following tickets.

   - [SNOW-1882075](https://snowflakecomputing.atlassian.net/browse/SNOW-1882075): AST for DataFrameWriter.insertInto/insert_into
   - [SNOW-1990301](https://snowflakecomputing.atlassian.net/browse/SNOW-1990301): AST for DataframerWriter.{format,save}
   - [SNOW-1990302](https://snowflakecomputing.atlassian.net/browse/SNOW-1990302): AST for DataframeReader.{format,load}
   - [SNOW-1821290](https://snowflakecomputing.atlassian.net/browse/SNOW-1821290): Finalize the AST schema v1

   It also includes changes in support of [SNOW-1997136](https://snowflakecomputing.atlassian.net/browse/SNOW-1997136): Ensure AST batch is stateless.

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
      - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development)
   - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support.

3. Please describe how your code solves the related issue.

   AST generation for `DataframeReader` APIs which return the same `DataframeReader` instance have been updated to follow the same pattern as `DataframeWriter`. Since all such methods are setters, it is only necessary to have the last set attributes in the AST sent to the serveer. This is in support of the new `format` API in both Snowpark classes.

    AST generation was added for the new APIs `DataframeReader.{format,load}`, and `DataframeWriter.{format,read,insert_into}` along with expectation test updates.

    Changes were also made in support of AST batch statelessness in the IR, by changing fields typed `VarId` to be `Expr` typed. This allows more consistent usage of `_set_ast_ref` as a helper method across all Snowpark APIs.

    This PR is accompanied by a [server-side PR](snowflakedb/snowflake#270350).

[SNOW-1882075]: https://snowflakecomputing.atlassian.net/browse/SNOW-1882075?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[SNOW-1990301]: https://snowflakecomputing.atlassian.net/browse/SNOW-1990301?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[SNOW-1990302]: https://snowflakecomputing.atlassian.net/browse/SNOW-1990302?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[SNOW-1821290]: https://snowflakecomputing.atlassian.net/browse/SNOW-1821290?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

[SNOW-1997136]: https://snowflakecomputing.atlassian.net/browse/SNOW-1997136?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…arId to BindId (#3183)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   [SNOW-1821290](https://snowflakecomputing.atlassian.net/browse/SNOW-1821290): Finalize AST schema v1 by refactoring `Assign` to `Bind`, `VarId` to `BindId`

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
      - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development)
   - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support.

3. Please describe how your code solves the related issue.

   To reflect the new semantics, `AstBatch.assign` was refactored to `AstBatch.bind` and its usage was updated across the client. Usage of `var_id` fields was replaced with `bind_id` fields as well.

   `AstBatch._init_batch` now provides the UUID to the newly created `Request` message  in its binary 16 byte representation provided by `UUID.bytes` under the `Request.request_id` field.

   `AstBatch.bind` will also take the current `AstBatch._request_id` and fill `Bind.first_request_id` with the same binary 16 byte representation from `UUID.bytes` to keep track of the first `Request` message which this statement will be flushed to the server in.

   This PR is accompanied by a [server-side PR](snowflakedb/snowflake#270978).


[SNOW-1821290]: https://snowflakecomputing.atlassian.net/browse/SNOW-1821290?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…rame/Series.create_or_replace_dynamic_table (#3216)
sfc-gh-jkew and others added 25 commits June 20, 2025 11:04
…#3474)

Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
Co-authored-by: Jianzhun Du <68252326+sfc-gh-jdu@users.noreply.github.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
Co-authored-by: Jamison <jamison.rose@snowflake.com>
Co-authored-by: David Wang <da.wang@snowflake.com>
Co-authored-by: Yijun Xie <yijun.xie@snowflake.com>
…3484)

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Replace importlib.import_module in lazy_load to handle nested module paths

Fix the use of lazy import helper functions and remove thread locking in lazy_import

Complete the cleanup of importing pandas

Update the version number to 1.31.2

Update the version number in meta.yaml for conda build issue troubleshoot

Revert the version change in version.py and meta.yaml

Clean up the hacky parts from the past implementation

Clean up the hacky parts from the past implementation

fix the failed type_checking merge gate

fix the failed linting merge gate

 fix the failed linting merge gate 2

 fix the failed linting merge gate 2

Add license header via pre-commit

fix the precommit black

Update utils.py for Test py-macos-latest-3.9-aws

Updat _options.py for Test py-macos-latest-3.9-aws

fix the local testing error related to PandasDataFrame

Fixing local test failed in PandasDataFrame
- Fix PandasSeries type conversion error by implementing caching mechanism
  in types.py and enhancing eval context in type_utils.py
- Enable write_pandas mocking support by adding lazy wrapper and __getattr__
  in session.py while preserving lazy import behavior
- Correct pandas_tools import statements in lazy_import_utils.py
…ed lazy import in mock modules

This reverts commit 48b6a69321013644ac67ab10ac03189e2cbb7d4a.
Fixing the runtimeEror:Local testing requires pandas as dependency

Fixing the runtimeEror:Local testing requires pandas as dependency

Update ../CHANGELOG.md
- isinstance() arg 2 must be a type, a tuple of types, or a union in test_udf_utils.py::test_copy_grant_for_udf_or_sp_registration
- tests/mock/test_functions.py::test_concat_ws_indexing - TypeError: '>' not supported between instances of 'float' and 'str'
Fix tests/integ/test_pandas_to_df.py::test_create_from_pandas_extension_types:
- Ensures pandas types are fully loaded before isinstance()
Refactor: Make pandas_util imports lazy in session.py
- Pandas will not loaded when importing session, only be loaded when actually creating DataFrames from pandas
- Fix pandas timestamp type detection in regular mode
Fix pd.api.types. to pd.core.dtypes.common in _snowflake_data_type.py
… fixing to test failures) and modify create_python_udf_or_sp in udf_utils.py
@sfc-gh-nqi sfc-gh-nqi force-pushed the nqi_SNOW_2012666_lazy_load branch from 84ea7de to 4e4c73f Compare June 27, 2025 00:26
@sfc-gh-jkew
Copy link
Copy Markdown
Contributor

How much memory does this save, and under what scenarios?

Ok. I found a slack thread discussing this: https://snowflake.slack.com/archives/C055DUP8G5N/p1683910177158739

It's kind of interesting, but the performance is hard to evaluate because almost everything ultimately depends on pandas and pyarrow. We would have to change the connector as well and then determining what the user impact would be hard.

@sfc-gh-aalam sfc-gh-aalam force-pushed the nqi_SNOW_2012666_lazy_load branch from 4e4c73f to 4f72760 Compare November 12, 2025 23:20
@github-actions github-actions bot locked and limited conversation to collaborators Nov 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

local testing Local Testing issues/PRs snowpark-pandas

Projects

None yet

Development

Successfully merging this pull request may close these issues.