34 commits
a6fd8a0
SNOW-1856438: support cortex classify_text with apply
sfc-gh-lmukhopadhyay Dec 13, 2024
1c84edd
resolve conf
sfc-gh-lmukhopadhyay Jan 21, 2025
80fa2a5
use snowflake cortex function directly and remove snowpark functions
sfc-gh-lmukhopadhyay Jan 27, 2025
582623d
remove native snowpark cortex func tests
sfc-gh-lmukhopadhyay Jan 27, 2025
90acd1e
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 27, 2025
c8b3d10
add snowflake.cortex to setup
sfc-gh-lmukhopadhyay Jan 28, 2025
be3f0fb
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 28, 2025
2534b22
change to snowflake-ml-python pkg
sfc-gh-lmukhopadhyay Jan 28, 2025
e54bfe6
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 28, 2025
9d4c878
fix apply utils and add new test file
sfc-gh-lmukhopadhyay Jan 28, 2025
afbf924
neg test and fix is_supported_snowflake_cortex_function
sfc-gh-lmukhopadhyay Jan 29, 2025
0b3a458
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 29, 2025
11e1031
update changelog
sfc-gh-lmukhopadhyay Jan 29, 2025
8b46fd3
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 30, 2025
47b689c
update neg text to remove match
sfc-gh-lmukhopadhyay Jan 30, 2025
a7d0041
add support for apply cortex to df
sfc-gh-lmukhopadhyay Jan 31, 2025
d0abc8f
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Jan 31, 2025
a3f4be4
review changes
sfc-gh-lmukhopadhyay Feb 4, 2025
35a53ad
resolve conflics
sfc-gh-lmukhopadhyay Feb 4, 2025
74852a8
update neg test and qc apply methods
sfc-gh-lmukhopadhyay Feb 4, 2025
8974567
rev changes and support both args and kwargs in apply
sfc-gh-lmukhopadhyay Feb 5, 2025
c9890a4
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Feb 5, 2025
1173b66
generate cortex funcs and rev changes
sfc-gh-lmukhopadhyay Feb 5, 2025
48b1e97
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Feb 5, 2025
4035093
removing ClassifyText support
sfc-gh-lmukhopadhyay Feb 5, 2025
fdf3013
fix changelog
sfc-gh-lmukhopadhyay Feb 5, 2025
dd613d4
fix args unsupported test
sfc-gh-lmukhopadhyay Feb 6, 2025
29c548c
add back snowpark python functions and deprecation warning
sfc-gh-lmukhopadhyay Feb 6, 2025
6fb0f03
update deprecate warning and changelog
sfc-gh-lmukhopadhyay Feb 6, 2025
c4923fc
updating dependency setup for snowflake-ml-python
sfc-gh-lmukhopadhyay Feb 6, 2025
f7475f2
Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply
sfc-gh-lmukhopadhyay Feb 6, 2025
ef21755
cleanup comment
sfc-gh-lmukhopadhyay Feb 6, 2025
26dcef3
address comments
sfc-gh-lmukhopadhyay Feb 7, 2025
993c7dc
resolve conf
sfc-gh-lmukhopadhyay Feb 7, 2025
11 changes: 11 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,23 @@

### Snowpark Python API Updates

#### Deprecations:

- Deprecated Snowpark Python function `snowflake_cortex_summarize`. Users can install `snowflake-ml-python` and use the `snowflake.cortex.summarize` function instead.
- Deprecated Snowpark Python function `snowflake_cortex_sentiment`. Users can install `snowflake-ml-python` and use the `snowflake.cortex.sentiment` function instead.

#### New Features

- Added support for the following functions in `functions.py`
  - `normal`
  - `randn`

### Snowpark pandas API Updates

#### New Features

- Added support for applying Snowflake Cortex functions `Summarize` and `Sentiment`.

## 1.27.0 (2025-02-03)

### Snowpark Python API Updates
Expand Down
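For readers migrating off the deprecated helpers, here is a minimal usage sketch modeled on the tests in this PR. It assumes an active Snowpark session, that `snowflake-ml-python` is installed, and that Snowpark pandas is enabled by importing `snowflake.snowpark.modin.plugin`. The wrapper name `score_sentiment` is hypothetical. The imports sit inside the function so that defining it does not require those packages.

```python
def score_sentiment(texts):
    """Apply the Cortex Sentiment function to a list of strings.

    Assumption: an active Snowpark session exists and snowflake-ml-python
    is installed. Nothing here runs offline; calling this issues a query.
    """
    import modin.pandas as pd
    import snowflake.snowpark.modin.plugin  # noqa: F401  (enables Snowpark pandas)
    from snowflake.cortex import Sentiment

    # Per this PR, only the default call shape is supported for Cortex
    # functions: axis=0, raw=False, and no extra args or kwargs.
    return pd.Series(texts).apply(Sentiment)
```

According to the negative tests in this PR, passing extra `args`, keyword arguments, `raw=True`, or `axis=1` raises `NotImplementedError`.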
1 change: 1 addition & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ def run(self):
            "scikit-learn",  # Snowpark pandas 3rd party library testing
            # plotly version restricted due to foreseen change in query counts in version 6.0.0+
            "plotly<6.0.0",  # Snowpark pandas 3rd party library testing
            "snowflake-ml-python",
        ],
        "localtest": [
            "pandas",
Expand Down
14 changes: 11 additions & 3 deletions src/snowflake/snowpark/functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,7 @@
    publicapi,
    validate_object_name,
    check_create_map_parameter,
    deprecated,
)
from snowflake.snowpark.column import (
    CaseExpr,
Expand Down Expand Up @@ -10779,13 +10780,16 @@ def make_interval(
return res


@deprecated(
    version="1.28.0",
    extra_warning_text="Please consider installing snowflake-ml-python and using `snowflake.cortex.summarize` instead.",
    extra_doc_string="Use :meth:`snowflake.cortex.summarize` instead.",
)
def snowflake_cortex_summarize(text: ColumnOrLiteralStr):
    """
    Summarizes the given English-language input text.

    Args:
        text: A string containing the English text from which a summary should be generated.

    Returns:
        A string containing a summary of the original text.
    """
Expand All @@ -10794,10 +10798,14 @@ def snowflake_cortex_summarize(text: ColumnOrLiteralStr):
    return builtin(sql_func_name)(text_col)


@deprecated(
    version="1.28.0",
    extra_warning_text="Please consider installing snowflake-ml-python and using `snowflake.cortex.sentiment` instead.",
    extra_doc_string="Use :meth:`snowflake.cortex.sentiment` instead.",
)
def snowflake_cortex_sentiment(text: ColumnOrLiteralStr):
    """
    Returns a sentiment score for the given English-language input text.

    Args:
        text: A string containing the text for which a sentiment score should be calculated.
    Returns:
Expand Down
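The `@deprecated(...)` decorator used above is imported from Snowpark's internal utilities, and its implementation is not shown in this diff. As a rough illustration only, a decorator with the same three keyword arguments could be built on the standard `warnings` module as follows. The names `deprecated_sketch` and `old_summarize` are hypothetical, and the real helper may behave differently.

```python
import functools
import warnings


def deprecated_sketch(*, version, extra_warning_text="", extra_doc_string=""):
    """Hypothetical stand-in for a deprecation decorator (not Snowpark's own)."""

    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Emit the warning on every call; stacklevel=2 points at the caller.
            warnings.warn(
                f"{func.__name__}() is deprecated since {version}. {extra_warning_text}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        # Append the extra note to the wrapped function's docstring.
        inner.__doc__ = f"{func.__doc__ or ''}\n\n{extra_doc_string}"
        return inner

    return wrap


@deprecated_sketch(
    version="1.28.0",
    extra_warning_text="Use the replacement function instead.",
    extra_doc_string="Deprecated: use the replacement function.",
)
def old_summarize(text):
    """Toy summary: return the text before the first period."""
    return text.split(".")[0]
```

Calling `old_summarize` still returns its normal result. The only observable change is the `DeprecationWarning` and the extended docstring.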
17 changes: 15 additions & 2 deletions src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -97,10 +97,23 @@
    sp_func.floor,
    sp_func.trunc,
    sp_func.sqrt,
    sp_func.snowflake_cortex_summarize,
    sp_func.snowflake_cortex_sentiment,
}

try:
    import snowflake.cortex

    SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY = {
        snowflake.cortex.Summarize,
        snowflake.cortex.Sentiment,
    }

    ALL_SNOWFLAKE_CORTEX_FUNCTIONS = tuple(
        i[1] for i in inspect.getmembers(snowflake.cortex)
    )
except ImportError:
    SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY = set()
    ALL_SNOWFLAKE_CORTEX_FUNCTIONS = tuple()


class GroupbyApplySortMethod(Enum):
"""
Expand Down
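The `try`/`except ImportError` block above makes `snowflake.cortex` an optional dependency: when the package is missing, both collections are empty and the Cortex branches in `apply` never match. The same pattern can be sketched with standard-library modules as stand-ins. The helper name `load_optional_functions` is hypothetical. Unlike the PR, which collects every module member, this sketch passes `callable` as the `inspect.getmembers` predicate so that only callables are kept. That is a variation, not what the PR does.

```python
import importlib
import inspect
import math


def load_optional_functions(module_name, supported_names):
    """Return (supported set, all callables) for a module, or empties if absent."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is not installed: nothing is supported.
        return set(), ()
    # Keep only callable members (assumption: non-callables are never valid funcs).
    all_callables = tuple(obj for _, obj in inspect.getmembers(module, callable))
    supported = {getattr(module, n) for n in supported_names if hasattr(module, n)}
    return supported, all_callables


# "math" stands in for the optional package; the second name does not exist.
supported, everything = load_optional_functions("math", ["sqrt", "floor"])
missing_supported, missing_all = load_optional_functions(
    "package_that_is_not_installed", ["anything"]
)
```

Filtering by `callable` keeps module attributes such as `__name__` or constants out of the "known but unsupported" check.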
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,8 @@
    is_supported_snowpark_python_function,
    sort_apply_udtf_result_columns_by_pandas_positions,
    make_series_map_snowpark_function,
    SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY,
    ALL_SNOWFLAKE_CORTEX_FUNCTIONS,
)
from collections import defaultdict
from snowflake.snowpark.modin.plugin._internal.binary_op_utils import (
Expand Down Expand Up @@ -8471,6 +8473,29 @@ def apply(
                )
            return self._apply_snowpark_python_function_to_columns(func, kwargs)

        if func in SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY:
            if axis != 0:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with axis = {axis}."
                )
            if raw is not False:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with raw = {raw}."
                )
            if args:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with args == '{args}'"
                )
            if kwargs:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with kwargs == '{kwargs}'"
                )
            return self._apply_snowflake_cortex_function_to_columns(func)
        elif func in ALL_SNOWFLAKE_CORTEX_FUNCTIONS:
            ErrorMessage.not_implemented(
                f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}`"
            )

        sf_func = NUMPY_UNIVERSAL_FUNCTION_TO_SNOWFLAKE_FUNCTION.get(func)
        if sf_func is not None:
            return self._apply_snowpark_python_function_to_columns(sf_func, kwargs)
Expand Down Expand Up @@ -8786,6 +8811,22 @@ def sf_function(col: SnowparkColumn) -> SnowparkColumn:
            self._modin_frame.apply_snowpark_function_to_columns(sf_function)
        )

    def _apply_snowflake_cortex_function_to_columns(
        self,
        snowflake_function: Callable,
    ) -> "SnowflakeQueryCompiler":
        """Apply Snowflake Cortex function to columns."""

        def sf_function(col: SnowparkColumn) -> SnowparkColumn:
            # The Cortex function receives the column as its only positional argument.
            return snowflake_function(col)

        return SnowflakeQueryCompiler(
            self._modin_frame.apply_snowpark_function_to_columns(sf_function)
        )

    def applymap(
        self,
        func: AggFuncType,
Expand Down Expand Up @@ -8819,6 +8860,25 @@ def applymap(
                )
            return self._apply_snowpark_python_function_to_columns(func, kwargs)

        if func in SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY:
            if na_action:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with na_action == '{na_action}'"
                )
            if args:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with args == '{args}'"
                )
            if kwargs:
                ErrorMessage.not_implemented(
                    f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with kwargs == '{kwargs}'"
                )
            return self._apply_snowflake_cortex_function_to_columns(func)
        elif func in ALL_SNOWFLAKE_CORTEX_FUNCTIONS:
            ErrorMessage.not_implemented(
                f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}`"
            )

        # Check if the function is a known numpy function that can be translated
        # to Snowflake function.
        sf_func = NUMPY_UNIVERSAL_FUNCTION_TO_SNOWFLAKE_FUNCTION.get(func)
Expand Down
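The dispatch added to `apply` and `applymap` has three outcomes: a supported Cortex function is applied after its arguments are validated, a known but unsupported Cortex function raises `NotImplementedError`, and anything else falls through to the existing paths. A self-contained sketch of that control flow, using plain Python lists and toy functions in place of Snowpark objects, might look like this. Every name here (`toy_sentiment`, `toy_translate`, `apply_sketch`) is hypothetical.

```python
def toy_sentiment(text):
    """Toy scorer: +1.0 if the text contains 'good', otherwise -1.0."""
    return 1.0 if "good" in text else -1.0


def toy_translate(text, target_language="de"):
    """Toy stand-in for a known but unsupported function."""
    return text


SUPPORTED = {toy_sentiment}
ALL_KNOWN = (toy_sentiment, toy_translate)


def apply_sketch(values, func, axis=0, raw=False, args=(), **kwargs):
    if func in SUPPORTED:
        # Supported functions accept only the default call shape.
        checks = {
            "axis": axis != 0,
            "raw": raw is not False,
            "args": bool(args),
            "kwargs": bool(kwargs),
        }
        for name, is_unsupported in checks.items():
            if is_unsupported:
                raise NotImplementedError(
                    f"`{func.__name__}` is not supported with non-default {name}"
                )
        return [func(v) for v in values]
    if func in ALL_KNOWN:
        # Known function, but not enabled for apply yet.
        raise NotImplementedError(f"`{func.__name__}` is not supported")
    # Fall through: any other callable is applied element-wise.
    return [func(v, *args, **kwargs) for v in values]
```

Checking the supported set first and the full set second means an unsupported function fails fast with a clear message instead of reaching a slower generic fallback.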
149 changes: 149 additions & 0 deletions tests/integ/modin/test_apply_snowflake_cortex_functions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
#
# Copyright (c) 2012-2025 Snowflake Computing Inc. All rights reserved.
#

import modin.pandas as pd
import pytest
from pytest import param


from tests.integ.utils.sql_counter import SqlCounter, sql_count_checker
from tests.utils import running_on_jenkins
from snowflake.cortex import Sentiment, Summarize, Translate


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.summarize SSL error",
)
def test_apply_snowflake_cortex_summarize(session):

    # TODO: SNOW-1758914 snowflake.cortex.summarize error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return

    with SqlCounter(query_count=1):
        content = """pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in
Snowflake. Just by changing the import statement and a few lines of code, you can get the familiar pandas experience
you know and love with the scalability and security benefits of Snowflake. With pandas on Snowflake, you can work
with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data
frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through
transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security
benefits of Snowflake. pandas on Snowflake is delivered through the Snowpark pandas API as part of the Snowpark
Python library, which enables scalable data processing of Python code within the Snowflake platform.
"""
        s = pd.Series([content])
        summary = s.apply(Summarize).iloc[0]
        # this length check is to get around the fact that this function may not be deterministic
        assert 0 < len(summary) < len(content)


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error",
)
def test_apply_snowflake_cortex_sentiment_series(session):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return

    with SqlCounter(query_count=1):
        content = "A very very bad review!"
        s = pd.Series([content])
        sentiment = s.apply(Sentiment).iloc[0]
        assert -1 <= sentiment <= 0


def test_apply_snowflake_cortex_sentiment_df(session):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return
    text_list = [
        "A first row of text.",
        "This is a very bad test.",
        "This is the best test ever.",
    ]

    content_frame = pd.DataFrame(text_list, columns=["content"])
    with SqlCounter(query_count=4):
        res = content_frame.apply(Sentiment)
        sent_row_2 = res["content"][1]
        sent_row_3 = res["content"][2]
        assert -1 <= sent_row_2 <= 0
        assert 0 <= sent_row_3 <= 1


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error",
)
@sql_count_checker(query_count=0)
@pytest.mark.parametrize(
    "is_series, operation",
    [
        param(
            True,
            (lambda s: s.apply(Translate, source_language="en", target_language="de")),
            id="series_cortex_unsupported_function_translate",
        ),
        param(
            False,
            (
                lambda df: df.apply(
                    Translate, source_language="en", target_language="de"
                )
            ),
            id="df_cortex_unsupported_function_translate",
        ),
        param(
            True,
            (lambda s: s.apply(Sentiment, args=("hello",))),
            id="series_cortex_unsupported_args",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, args=("hello",))),
            id="df_cortex_unsupported_args",
        ),
        param(
            True,
            (lambda s: s.apply(Sentiment, extra="hello")),
            id="series_cortex_unsupported_kwargs",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, extra="hello")),
            id="df_cortex_unsupported_kwargs",
        ),
        param(
            True,
            (lambda s: s.apply(Sentiment, na_action="ignore")),
            id="series_cortex_unsupported_na_action",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, raw=True)),
            id="df_cortex_unsupported_raw",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, axis=1)),
            id="df_cortex_unsupported_axis_1",
        ),
    ],
)
def test_apply_snowflake_cortex_negative(session, is_series, operation):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
        return

    content = "One day I will see the world."
    modin_input = (pd.Series if is_series else pd.DataFrame)([content])
    with pytest.raises(NotImplementedError):
        operation(modin_input)
50 changes: 1 addition & 49 deletions tests/integ/modin/test_apply_snowpark_python_functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,7 @@
import pytest

from tests.integ.modin.utils import assert_frame_equal, assert_series_equal
from tests.integ.utils.sql_counter import sql_count_checker, SqlCounter
from tests.utils import running_on_jenkins
from tests.integ.utils.sql_counter import sql_count_checker


@sql_count_checker(query_count=4)
Expand Down Expand Up @@ -70,50 +69,3 @@ def test_apply_snowpark_python_function_not_implemented():
pd.DataFrame({"a": [1, 2, 3]}).apply(asc, axis=1)
with pytest.raises(NotImplementedError):
pd.DataFrame({"a": [1, 2, 3]}).apply(asc, args=(1, 2))


@pytest.mark.skipif(
running_on_jenkins(),
reason="TODO: SNOW-1859087 snowflake.cortex.summarize SSL error",
)
def test_apply_snowflake_cortex_summarize(session):
from snowflake.snowpark.functions import snowflake_cortex_summarize

# TODO: SNOW-1758914 snowflake.cortex.summarize error on GCP
with SqlCounter(query_count=0):
if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
return

with SqlCounter(query_count=1):
content = """pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in
Snowflake. Just by changing the import statement and a few lines of code, you can get the familiar pandas experience
you know and love with the scalability and security benefits of Snowflake. With pandas on Snowflake, you can work
with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data
frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through
transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security
benefits of Snowflake. pandas on Snowflake is delivered through the Snowpark pandas API as part of the Snowpark
Python library, which enables scalable data processing of Python code within the Snowflake platform.
"""
s = pd.Series([content])
summary = s.apply(snowflake_cortex_summarize).iloc[0]
# this length check is to get around the fact that this function may not be deterministic
assert 0 < len(summary) < len(content)


@pytest.mark.skipif(
running_on_jenkins(),
reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error",
)
def test_apply_snowflake_cortex_sentiment(session):
from snowflake.snowpark.functions import snowflake_cortex_sentiment

# TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
with SqlCounter(query_count=0):
if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
return

with SqlCounter(query_count=1):
content = "A very very bad review!"
s = pd.Series([content])
sentiment = s.apply(snowflake_cortex_sentiment).iloc[0]
assert -1 <= sentiment <= 0