SNOW-2012666: Lazy import of Pandas in snowpark-python#3348
Closed
sfc-gh-nqi wants to merge 2443 commits intomainfrom
Closed
SNOW-2012666: Lazy import of Pandas in snowpark-python#3348sfc-gh-nqi wants to merge 2443 commits intomainfrom
sfc-gh-nqi wants to merge 2443 commits intomainfrom
Conversation
…3195) Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…itHub action to validate it (#3173) 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. [SNOW-1923653](https://snowflakecomputing.atlassian.net/browse/SNOW-1923653) 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development) - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: [AST Support Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#ast-abstract-syntax-tree-support-in-snowpark) 3. Please describe how your code solves the related issue. This PR creates an AST acknowledgement checkbox in our repository's PR template along with a GitHub action to verify its usage. This is in the same vein as the thread safety acknowledgement checkbox and GitHub Action. [SNOW-1923653]: https://snowflakecomputing.atlassian.net/browse/SNOW-1923653?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…r_replace_view (#3198)
…s for stateless batches, and DataframeWriter,DataframeReader new APIs (#3180) 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. This PR includes all changes required to complete the following tickets. - [SNOW-1882075](https://snowflakecomputing.atlassian.net/browse/SNOW-1882075): AST for DataFrameWriter.insertInto/insert_into - [SNOW-1990301](https://snowflakecomputing.atlassian.net/browse/SNOW-1990301): AST for DataframerWriter.{format,save} - [SNOW-1990302](https://snowflakecomputing.atlassian.net/browse/SNOW-1990302): AST for DataframeReader.{format,load} - [SNOW-1821290](https://snowflakecomputing.atlassian.net/browse/SNOW-1821290): Finalize the AST schema v1 It also includes changes in support of [SNOW-1997136](https://snowflakecomputing.atlassian.net/browse/SNOW-1997136): Ensure AST batch is stateless. 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development) - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. 3. Please describe how your code solves the related issue. AST generation for `DataframeReader` APIs which return the same `DataframeReader` instance have been updated to follow the same pattern as `DataframeWriter`. Since all such methods are setters, it is only necessary to have the last set attributes in the AST sent to the serveer. This is in support of the new `format` API in both Snowpark classes. AST generation was added for the new APIs `DataframeReader.{format,load}`, and `DataframeWriter.{format,read,insert_into}` along with expectation test updates. Changes were also made in support of AST batch statelessness in the IR, by changing fields typed `VarId` to be `Expr` typed. This allows more consistent usage of `_set_ast_ref` as a helper method across all Snowpark APIs. This PR is accompanied by a [server-side PR](snowflakedb/snowflake#270350). [SNOW-1882075]: https://snowflakecomputing.atlassian.net/browse/SNOW-1882075?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SNOW-1990301]: https://snowflakecomputing.atlassian.net/browse/SNOW-1990301?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SNOW-1990302]: https://snowflakecomputing.atlassian.net/browse/SNOW-1990302?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SNOW-1821290]: https://snowflakecomputing.atlassian.net/browse/SNOW-1821290?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SNOW-1997136]: https://snowflakecomputing.atlassian.net/browse/SNOW-1997136?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
…arId to BindId (#3183) 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. [SNOW-1821290](https://snowflakecomputing.atlassian.net/browse/SNOW-1821290): Finalize AST schema v1 by refactoring `Assign` to `Bind`, `VarId` to `BindId` 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development) - [x] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. 3. Please describe how your code solves the related issue. To reflect the new semantics, `AstBatch.assign` was refactored to `AstBatch.bind` and its usage was updated across the client. Usage of `var_id` fields was replaced with `bind_id` fields as well. `AstBatch._init_batch` now provides the UUID to the newly created `Request` message in its binary 16 byte representation provided by `UUID.bytes` under the `Request.request_id` field. `AstBatch.bind` will also take the current `AstBatch._request_id` and fill `Bind.first_request_id` with the same binary 16 byte representation from `UUID.bytes` to keep track of the first `Request` message which this statement will be flushed to the server in. This PR is accompanied by a [server-side PR](snowflakedb/snowflake#270978). [SNOW-1821290]: https://snowflakecomputing.atlassian.net/browse/SNOW-1821290?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…_replace_dynamic_table (#3209)
Co-authored-by: Mahesh Vashishtha <mahesh.vashishtha@snowflake.com>
…rame/Series.create_or_replace_dynamic_table (#3216)
…rame/Series.create_or_replace_view (#3215)
…param after a bad merge (#3236)
…#3474) Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com>
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Co-authored-by: Hazem Elmeleegy <hazem.elmeleegy@snowflake.com> Co-authored-by: Jianzhun Du <68252326+sfc-gh-jdu@users.noreply.github.com> Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com> Co-authored-by: Jamison <jamison.rose@snowflake.com> Co-authored-by: David Wang <da.wang@snowflake.com> Co-authored-by: Yijun Xie <yijun.xie@snowflake.com>
…3484) Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
Replace importlib.import_module in lazy_load to handle nested module paths Fix the use of lazy import helper functions and remove thread locking in lazy_import Complete the cleanup of importing pandas Update the version number to 1.31.2 Update the version number in meta.yaml for conda build issue troubleshoot Revert the version change in version.py and meta.yaml Clean up the hacky parts from the past implementation Clean up the hacky parts from the past implementation fix the failed type_checking merge gate fix the failed linting merge gate fix the failed linting merge gate 2 fix the failed linting merge gate 2 Add license header via pre-commit fix the precommit black Update utils.py for Test py-macos-latest-3.9-aws Updat _options.py for Test py-macos-latest-3.9-aws fix the local testing error related to PandasDataFrame Fixing local test failed in PandasDataFrame
…t of Pandas related
- Fix PandasSeries type conversion error by implementing caching mechanism in types.py and enhancing eval context in type_utils.py - Enable write_pandas mocking support by adding lazy wrapper and __getattr__ in session.py while preserving lazy import behavior - Correct pandas_tools import statements in lazy_import_utils.py
…ed lazy import in mock modules This reverts commit 48b6a69321013644ac67ab10ac03189e2cbb7d4a.
…e lazy imports in mock modules
Fixing the runtimeEror:Local testing requires pandas as dependency Fixing the runtimeEror:Local testing requires pandas as dependency Update ../CHANGELOG.md
- isinstance() arg 2 must be a type, a tuple of types, or a union in test_udf_utils.py::test_copy_grant_for_udf_or_sp_registration - tests/mock/test_functions.py::test_concat_ws_indexing - TypeError: '>' not supported between instances of 'float' and 'str' Fix tests/integ/test_pandas_to_df.py::test_create_from_pandas_extension_types: - Ensures pandas types are fully loaded before isinstance() Refactor: Make pandas_util imports lazy in session.py - Pandas will not loaded when importing session, only be loaded when actually creating DataFrames from pandas - Fix pandas timestamp type detection in regular mode Fix pd.api.types. to pd.core.dtypes.common in _snowflake_data_type.py
…to test failures)
… fixing to test failures) and modify create_python_udf_or_sp in udf_utils.py
84ea7de to
4e4c73f
Compare
Contributor
Ok. I found a slack thread discussing this: https://snowflake.slack.com/archives/C055DUP8G5N/p1683910177158739 It's kind of interesting, but the performance is hard to evaluate because almost everything ultimately depends on pandas and pyarrow. We would have to change the connector as well and then determining what the user impact would be hard. |
4e4c73f to
4f72760
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2012666
lazy_import_utils.pyTYPE_CHECKINGto avoid runtime importsFill out the following pre-review checklist:
Please describe how your code solves the related issue.
Please write a short description of how your code change solves the related issue.