SNOW-1856438: Directly support Snowflake Cortex functions Summarize and Sentiment with apply #2943
Merged: sfc-gh-lmukhopadhyay merged 34 commits into main from lmukhopadhyay-SNOW-1856438-cortex-funcs-apply on Feb 7, 2025
Commits
a6fd8a0
SNOW-1856438: support cortex classify_text with apply
sfc-gh-lmukhopadhyay 1c84edd
resolve conf
sfc-gh-lmukhopadhyay 80fa2a5
use snowflake cortex function directly and remove snowpark functions
sfc-gh-lmukhopadhyay 582623d
remove native snowpark cortex func tests
sfc-gh-lmukhopadhyay 90acd1e
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay c8b3d10
add snowflake.cortex to setup
sfc-gh-lmukhopadhyay be3f0fb
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 2534b22
change to snowflake-ml-python pkg
sfc-gh-lmukhopadhyay e54bfe6
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 9d4c878
fix apply utils and add new test file
sfc-gh-lmukhopadhyay afbf924
neg test and fix is_supported_snowflake_cortex_function
sfc-gh-lmukhopadhyay 0b3a458
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 11e1031
update changelog
sfc-gh-lmukhopadhyay 8b46fd3
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 47b689c
update neg text to remove match
sfc-gh-lmukhopadhyay a7d0041
add support for apply cortex to df
sfc-gh-lmukhopadhyay d0abc8f
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay a3f4be4
review changes
sfc-gh-lmukhopadhyay 35a53ad
resolve conflics
sfc-gh-lmukhopadhyay 74852a8
update neg test and qc apply methods
sfc-gh-lmukhopadhyay 8974567
rev changes and support both args and kwargs in apply
sfc-gh-lmukhopadhyay c9890a4
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 1173b66
generate cortex funcs and rev changes
sfc-gh-lmukhopadhyay 48b1e97
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay 4035093
removing ClassifyText support
sfc-gh-lmukhopadhyay fdf3013
fix changelog
sfc-gh-lmukhopadhyay dd613d4
fix args unsupported test
sfc-gh-lmukhopadhyay 29c548c
add back snowpark python functions and deprecation warning
sfc-gh-lmukhopadhyay 6fb0f03
update deprecate warning and changelog
sfc-gh-lmukhopadhyay c4923fc
updating dependency setup for snowflake-ml-python
sfc-gh-lmukhopadhyay f7475f2
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay ef21755
cleanup comment
sfc-gh-lmukhopadhyay 26dcef3
address comments
sfc-gh-lmukhopadhyay 993c7dc
resolve conf
sfc-gh-lmukhopadhyay
149 changes: 149 additions & 0 deletions
tests/integ/modin/test_apply_snowflake_cortex_functions.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| # | ||
| # Copyright (c) 2012-2025 Snowflake Computing Inc. All rights reserved. | ||
| # | ||
| | ||
| import modin.pandas as pd | ||
| import pytest | ||
| from pytest import param | ||
| | ||
| | ||
| from tests.integ.utils.sql_counter import SqlCounter, sql_count_checker | ||
| from tests.utils import running_on_jenkins | ||
| from snowflake.cortex import Sentiment, Summarize, Translate | ||
| | ||
| | ||
| @pytest.mark.skipif( | ||
| running_on_jenkins(), | ||
| reason="TODO: SNOW-1859087 snowflake.cortex.summarize SSL error", | ||
| ) | ||
| def test_apply_snowflake_cortex_summarize(session): | ||
| | ||
| # TODO: SNOW-1758914 snowflake.cortex.summarize error on GCP | ||
| with SqlCounter(query_count=0): | ||
| if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com": | ||
| return | ||
| | ||
| with SqlCounter(query_count=1): | ||
| content = """pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in | ||
| Snowflake. Just by changing the import statement and a few lines of code, you can get the familiar pandas experience | ||
| you know and love with the scalability and security benefits of Snowflake. With pandas on Snowflake, you can work | ||
| with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data | ||
| frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through | ||
| transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security | ||
| benefits of Snowflake. pandas on Snowflake is delivered through the Snowpark pandas API as part of the Snowpark | ||
| Python library, which enables scalable data processing of Python code within the Snowflake platform. | ||
| """ | ||
| s = pd.Series([content]) | ||
| summary = s.apply(Summarize).iloc[0] | ||
| # this length check is to get around the fact that this function may not be deterministic | ||
| assert 0 < len(summary) < len(content) | ||
| | ||
| | ||
| @pytest.mark.skipif( | ||
| running_on_jenkins(), | ||
| reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error", | ||
| ) | ||
| def test_apply_snowflake_cortex_sentiment_series(session): | ||
| | ||
| # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP | ||
| with SqlCounter(query_count=0): | ||
| if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com": | ||
| return | ||
| | ||
| with SqlCounter(query_count=1): | ||
| content = "A very very bad review!" | ||
| s = pd.Series([content]) | ||
| sentiment = s.apply(Sentiment).iloc[0] | ||
| assert -1 <= sentiment <= 0 | ||
| | ||
| | ||
| def test_apply_snowflake_cortex_sentiment_df(session): | ||
| | ||
| # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP | ||
| with SqlCounter(query_count=0): | ||
| if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com": | ||
| return | ||
| text_list = [ | ||
| "A first row of text.", | ||
| "This is a very bad test.", | ||
| "This is the best test ever.", | ||
| ] | ||
| | ||
| content_frame = pd.DataFrame(text_list, columns=["content"]) | ||
| with SqlCounter(query_count=4): | ||
| res = content_frame.apply(Sentiment) | ||
| sent_row_2 = res["content"][1] | ||
| sent_row_3 = res["content"][2] | ||
| assert -1 <= sent_row_2 <= 0 | ||
| assert 0 <= sent_row_3 <= 1 | ||
| | ||
| | ||
| @pytest.mark.skipif( | ||
| running_on_jenkins(), | ||
| reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error", | ||
| ) | ||
| @sql_count_checker(query_count=0) | ||
| @pytest.mark.parametrize( | ||
| "is_series, operation", | ||
| [ | ||
| param( | ||
| True, | ||
| (lambda s: s.apply(Translate, source_language="en", target_language="de")), | ||
| id="series_cortex_unsupported_function_translate", | ||
| ), | ||
| param( | ||
| False, | ||
| ( | ||
| lambda df: df.apply( | ||
| Translate, source_language="en", target_language="de" | ||
| ) | ||
| ), | ||
| id="df_cortex_unsupported_function_translate", | ||
| ), | ||
| param( | ||
| True, | ||
| (lambda s: s.apply(Sentiment, args=("hello",))), | ||
| id="series_cortex_unsupported_args", | ||
| ), | ||
| param( | ||
| False, | ||
| (lambda df: df.apply(Sentiment, args=("hello",))), | ||
| id="df_cortex_unsupported_args", | ||
| ), | ||
| param( | ||
| True, | ||
| (lambda s: s.apply(Sentiment, extra="hello")), | ||
| id="series_cortex_unsupported_kwargs", | ||
| ), | ||
| param( | ||
| False, | ||
| (lambda df: df.apply(Sentiment, extra="hello")), | ||
| id="df_cortex_unsupported_kwargs", | ||
| ), | ||
| param( | ||
| True, | ||
| (lambda s: s.apply(Sentiment, na_action="ignore")), | ||
| id="series_cortex_unsupported_na_action", | ||
| ), | ||
| param( | ||
| False, | ||
| (lambda df: df.apply(Sentiment, raw=True)), | ||
| id="df_cortex_unsupported_raw", | ||
| ), | ||
| param( | ||
| False, | ||
| (lambda df: df.apply(Sentiment, axis=1)), | ||
| id="df_cortex_unsupported_axis_1", | ||
| ), | ||
| ], | ||
| ) | ||
| def test_apply_snowflake_cortex_negative(session, is_series, operation): | ||
| | ||
| # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP | ||
| if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com": | ||
| return | ||
| | ||
| content = "One day I will see the world." | ||
| modin_input = (pd.Series if is_series else pd.DataFrame)([content]) | ||
| with pytest.raises(NotImplementedError): | ||
| operation(modin_input) | ||
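
The negative test above only shows which `apply` calls are expected to raise `NotImplementedError`; the plugin code that performs the check is not part of this diff. As a rough illustration of the rules the tests imply, the sketch below validates an `apply` call before mapping a function over a list of strings. Everything in it is an assumption made for illustration: `Summarize`, `Sentiment`, and `Translate` are local stand-ins rather than the real `snowflake.cortex` functions, and `validate_cortex_apply` and `apply_cortex` are hypothetical names, not Snowpark pandas APIs.

```python
from typing import Any, Callable, List, Mapping, Optional, Sequence


# Local stand-ins so the sketch runs without a Snowflake connection.
# They are NOT the real snowflake.cortex functions.
def Summarize(text: str) -> str:
    return text[: max(1, len(text) // 2)]


def Sentiment(text: str) -> float:
    return -1.0 if "bad" in text.lower() else 1.0


def Translate(text: str, source_language: str, target_language: str) -> str:
    return text


# Hypothetical allow-list: the tests accept only these two functions.
SUPPORTED_CORTEX_FUNCTIONS = (Summarize, Sentiment)


def validate_cortex_apply(
    func: Callable[..., Any],
    args: Sequence[Any] = (),
    kwargs: Optional[Mapping[str, Any]] = None,
    na_action: Optional[str] = None,
    raw: bool = False,
    axis: int = 0,
) -> None:
    """Raise NotImplementedError for the combinations the negative test rejects."""
    if func not in SUPPORTED_CORTEX_FUNCTIONS:
        raise NotImplementedError(f"{func.__name__} is not supported in apply")
    if args or kwargs:
        raise NotImplementedError("extra args/kwargs are not supported")
    if na_action is not None:
        raise NotImplementedError("na_action is not supported")
    if raw:
        raise NotImplementedError("raw=True is not supported")
    if axis != 0:
        raise NotImplementedError("only axis=0 is supported")


def apply_cortex(
    values: Sequence[str], func: Callable[..., Any], **options: Any
) -> List[Any]:
    """Validate the call, then apply func to each value (a plain-Python stand-in)."""
    validate_cortex_apply(func, **options)
    return [func(value) for value in values]
```

With these stand-ins, `apply_cortex(["A very very bad review!"], Sentiment)` succeeds, while passing `Translate`, `args=("hello",)`, `kwargs={"extra": "hello"}`, `na_action="ignore"`, `raw=True`, or `axis=1` raises `NotImplementedError`, matching the parameter sets in `test_apply_snowflake_cortex_negative`.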