Merged
Changes from 26 commits
34 commits
a6fd8a0  SNOW-1856438: support cortex classify_text with apply  (sfc-gh-lmukhopadhyay, Dec 13, 2024)
1c84edd  resolve conf  (sfc-gh-lmukhopadhyay, Jan 21, 2025)
80fa2a5  use snowflake cortex function directly and remove snowpark functions  (sfc-gh-lmukhopadhyay, Jan 27, 2025)
582623d  remove native snowpark cortex func tests  (sfc-gh-lmukhopadhyay, Jan 27, 2025)
90acd1e  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 27, 2025)
c8b3d10  add snowflake.cortex to setup  (sfc-gh-lmukhopadhyay, Jan 28, 2025)
be3f0fb  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 28, 2025)
2534b22  change to snowflake-ml-python pkg  (sfc-gh-lmukhopadhyay, Jan 28, 2025)
e54bfe6  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 28, 2025)
9d4c878  fix apply utils and add new test file  (sfc-gh-lmukhopadhyay, Jan 28, 2025)
afbf924  neg test and fix is_supported_snowflake_cortex_function  (sfc-gh-lmukhopadhyay, Jan 29, 2025)
0b3a458  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 29, 2025)
11e1031  update changelog  (sfc-gh-lmukhopadhyay, Jan 29, 2025)
8b46fd3  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 30, 2025)
47b689c  update neg text to remove match  (sfc-gh-lmukhopadhyay, Jan 30, 2025)
a7d0041  add support for apply cortex to df  (sfc-gh-lmukhopadhyay, Jan 31, 2025)
d0abc8f  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Jan 31, 2025)
a3f4be4  review changes  (sfc-gh-lmukhopadhyay, Feb 4, 2025)
35a53ad  resolve conflics  (sfc-gh-lmukhopadhyay, Feb 4, 2025)
74852a8  update neg test and qc apply methods  (sfc-gh-lmukhopadhyay, Feb 4, 2025)
8974567  rev changes and support both args and kwargs in apply  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
c9890a4  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
1173b66  generate cortex funcs and rev changes  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
48b1e97  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
4035093  removing ClassifyText support  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
fdf3013  fix changelog  (sfc-gh-lmukhopadhyay, Feb 5, 2025)
dd613d4  fix args unsupported test  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
29c548c  add back snowpark python functions and deprecation warning  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
6fb0f03  update deprecate warning and changelog  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
c4923fc  updating dependency setup for snowflake-ml-python  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
f7475f2  Merge branch 'main' into lmukhopadhyay-SNOW-1856438-cortex-funcs-apply  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
ef21755  cleanup comment  (sfc-gh-lmukhopadhyay, Feb 6, 2025)
26dcef3  address comments  (sfc-gh-lmukhopadhyay, Feb 7, 2025)
993c7dc  resolve conf  (sfc-gh-lmukhopadhyay, Feb 7, 2025)
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,24 @@
# Release History

## 1.28.0 (TBD)

### Snowpark Python API Updates

#### Removed Features

- Removed support for Snowpark Python function `snowflake_cortex_summarize`. Users can use the `Summarize` function from `snowflake.cortex` instead.
- Removed support for Snowpark Python function `snowflake_cortex_sentiment`. Users can use the `Sentiment` function from `snowflake.cortex` instead.

### Snowpark pandas API Updates

#### Dependency Updates

- Added a test dependency for `snowflake-ml-python`.

Review comment (Contributor): I don't think we need to call out test dependency

#### New Features

- Added support for applying Snowflake Cortex functions `Summarize` and `Sentiment`.

## 1.27.0 (2025-02-03)

### Snowpark Python API Updates
2 changes: 0 additions & 2 deletions docs/source/snowpark/functions.rst
Expand Up @@ -308,8 +308,6 @@ Functions
sinh
size
skew
snowflake_cortex_sentiment
snowflake_cortex_summarize
sort_array
soundex
split
1 change: 1 addition & 0 deletions setup.py
Expand Up @@ -202,6 +202,7 @@ def run(self):
"scikit-learn", # Snowpark pandas 3rd party library testing
# plotly version restricted due to foreseen change in query counts in version 6.0.0+
"plotly<6.0.0", # Snowpark pandas 3rd party library testing
"snowflake-ml-python",
],
"localtest": [
"pandas",
30 changes: 0 additions & 30 deletions src/snowflake/snowpark/functions.py
Expand Up @@ -10741,36 +10741,6 @@ def make_interval(
return res


def snowflake_cortex_summarize(text: ColumnOrLiteralStr):
    """
    Summarizes the given English-language input text.

    Args:
        text: A string containing the English text from which a summary should be generated.

    Returns:
        A string containing a summary of the original text.
    """
    sql_func_name = "snowflake.cortex.summarize"
    text_col = _to_col_if_lit(text, sql_func_name)
    return builtin(sql_func_name)(text_col)


def snowflake_cortex_sentiment(text: ColumnOrLiteralStr):
    """
    A string containing the text for which a sentiment score should be calculated.

    Args:
        text: A string containing the English text from which a summary should be generated.
    Returns:
        A floating-point number from -1 to 1 (inclusive) indicating the level of negative or positive sentiment in the
        text. Values around 0 indicate neutral sentiment.
    """
    sql_func_name = "snowflake.cortex.sentiment"
    text_col = _to_col_if_lit(text, sql_func_name)
    return builtin(sql_func_name)(text_col)


@publicapi
def acosh(e: ColumnOrName, _emit_ast: bool = True) -> Column:
"""
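The commit list mentions adding the Snowpark Python functions back with a deprecation warning (29c548c, 6fb0f03), while the diff shown here covers only the removal. As a rough sketch of what such a shim could look like, assuming a plain `DeprecationWarning` is used: the `deprecated` decorator and the stand-in function body below are hypothetical and are not taken from the PR.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Hypothetical decorator: warn on every call, then delegate to the wrapped function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not the shim
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("snowflake.cortex.Summarize")
def snowflake_cortex_summarize(text):
    # Stand-in body so the sketch runs without Snowflake; the real function
    # builds a SQL call to snowflake.cortex.summarize.
    return text


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = snowflake_cortex_summarize("some text")
```

With this shape, existing callers keep working and see a warning that points at the replacement.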
12 changes: 10 additions & 2 deletions src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py
Expand Up @@ -14,6 +14,7 @@
import cloudpickle
import numpy as np
import pandas as native_pd
import snowflake.cortex
from pandas._typing import AggFuncType
from pandas.api.types import is_scalar

Expand Down Expand Up @@ -97,10 +98,17 @@
    sp_func.floor,
    sp_func.trunc,
    sp_func.sqrt,
    sp_func.snowflake_cortex_summarize,
    sp_func.snowflake_cortex_sentiment,
}

SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY = {
    snowflake.cortex.Summarize,
    snowflake.cortex.Sentiment,
}

ALL_SNOWFLAKE_CORTEX_FUNCTIONS = tuple(
    i[1] for i in inspect.getmembers(snowflake.cortex)
)


class GroupbyApplySortMethod(Enum):
"""
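`ALL_SNOWFLAKE_CORTEX_FUNCTIONS` is built from `inspect.getmembers`, which returns every attribute of the module as `(name, value)` pairs, not only functions. The sketch below illustrates this on a throwaway module; `fake_cortex` and its three functions are hypothetical stand-ins for `snowflake.cortex`, and the `inspect.isfunction` filter is shown only as one possible way to narrow the result, not as what the PR does.

```python
import inspect
import types

# Hypothetical stand-in for a module such as snowflake.cortex.
fake_cortex = types.ModuleType("fake_cortex")


def Summarize(text):
    return text[:10]


def Sentiment(text):
    return 0.0


def Translate(text, source_language, target_language):
    return text


for func in (Summarize, Sentiment, Translate):
    setattr(fake_cortex, func.__name__, func)

# Same expression shape as in the diff: keep the value of every member.
all_members = tuple(value for _, value in inspect.getmembers(fake_cortex))

# Passing a predicate keeps only plain Python functions, sorted by name.
functions_only = tuple(
    value for _, value in inspect.getmembers(fake_cortex, inspect.isfunction)
)
```

Here `all_members` also contains module metadata such as the module name string, while `functions_only` holds just the three functions. For a membership test like `func in ALL_SNOWFLAKE_CORTEX_FUNCTIONS` the extra entries are usually harmless, but they are worth knowing about.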
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,8 @@
    is_supported_snowpark_python_function,
    sort_apply_udtf_result_columns_by_pandas_positions,
    make_series_map_snowpark_function,
    SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY,
    ALL_SNOWFLAKE_CORTEX_FUNCTIONS,
)
from collections import defaultdict
from snowflake.snowpark.modin.plugin._internal.binary_op_utils import (
Expand Down Expand Up @@ -8471,6 +8473,29 @@ def apply(
)
return self._apply_snowpark_python_function_to_columns(func, kwargs)

if func in SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY:
    if axis != 0:
        ErrorMessage.not_implemented(
            f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with axis = {axis}."
        )
    if raw is not False:
        ErrorMessage.not_implemented(
            f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with raw = {raw}."
        )
    if args:
        ErrorMessage.not_implemented(
            f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with args == '{args}'"
        )
    if kwargs:
        ErrorMessage.not_implemented(
            f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}` with kwargs == '{kwargs}'"
        )
    return self._apply_snowflake_cortex_function_to_columns(func)
elif func in ALL_SNOWFLAKE_CORTEX_FUNCTIONS:
    ErrorMessage.not_implemented(
        f"Snowpark pandas apply API doesn't yet support Snowflake Cortex function `{func.__name__}`"
    )

sf_func = NUMPY_UNIVERSAL_FUNCTION_TO_SNOWFLAKE_FUNCTION.get(func)
if sf_func is not None:
    return self._apply_snowpark_python_function_to_columns(sf_func, kwargs)
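The dispatch added to `apply` has three outcomes: a supported Cortex function with default options takes the new path, a supported function with extra options or any other known Cortex function raises `NotImplementedError`, and everything else falls through to the existing logic. A minimal standalone sketch of that decision, assuming hypothetical stand-in functions (`summarize`, `sentiment`, `translate`) and a hypothetical helper `classify_apply_call` in place of the query compiler method:

```python
from typing import Any, Callable, Optional


# Hypothetical stand-ins for snowflake.cortex functions, so the sketch runs
# without a Snowflake connection.
def summarize(text: str) -> str:
    return text[:20]


def sentiment(text: str) -> float:
    return 0.0


def translate(text: str, source_language: str, target_language: str) -> str:
    return text


SUPPORTED = {summarize, sentiment}
ALL_KNOWN = (summarize, sentiment, translate)


def classify_apply_call(
    func: Callable[..., Any],
    axis: int = 0,
    raw: bool = False,
    args: tuple = (),
    kwargs: Optional[dict] = None,
) -> str:
    """Return "cortex" for an accepted call and "other" for non-Cortex functions.

    Raises NotImplementedError for a known but unsupported Cortex function,
    or for a supported one called with options this path does not handle.
    """
    if func in SUPPORTED:
        if axis != 0:
            raise NotImplementedError(f"{func.__name__} with axis = {axis}")
        if raw is not False:
            raise NotImplementedError(f"{func.__name__} with raw = {raw}")
        if args:
            raise NotImplementedError(f"{func.__name__} with args == {args!r}")
        if kwargs:
            raise NotImplementedError(f"{func.__name__} with kwargs == {kwargs!r}")
        return "cortex"
    if func in ALL_KNOWN:
        raise NotImplementedError(f"{func.__name__} is not supported yet")
    return "other"
```

For example, `classify_apply_call(sentiment)` is accepted, `classify_apply_call(sentiment, axis=1)` and `classify_apply_call(translate)` raise, and `classify_apply_call(len)` reports a non-Cortex function.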
Expand Down Expand Up @@ -8786,6 +8811,22 @@ def sf_function(col: SnowparkColumn) -> SnowparkColumn:
self._modin_frame.apply_snowpark_function_to_columns(sf_function)
)

def _apply_snowflake_cortex_function_to_columns(
    self,
    snowflake_function: Callable,
) -> "SnowflakeQueryCompiler":
    """Apply Snowflake Cortex function to columns."""

    def sf_function(col: SnowparkColumn) -> SnowparkColumn:
        resolved_positional = []
        resolved_positional.append(col)

        return snowflake_function(*resolved_positional)

    return SnowflakeQueryCompiler(
        self._modin_frame.apply_snowpark_function_to_columns(sf_function)
    )
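The helper above wraps the Cortex function in a one-argument closure and hands that closure to the frame, which calls it once per column. The sketch below mimics only that shape in plain Python, under loud simplifications: a "column" is a list of strings rather than a Snowpark `Column`, and `upper_all`, `make_column_function`, and `apply_to_columns` are hypothetical names that do not exist in Snowpark.

```python
from typing import Callable, Dict, List

Column = List[str]  # hypothetical stand-in for a Snowpark Column


def upper_all(col: Column) -> Column:
    """Hypothetical column-level function standing in for a Cortex call."""
    return [value.upper() for value in col]


def make_column_function(func: Callable[[Column], Column]) -> Callable[[Column], Column]:
    """Build the one-argument closure that is handed to the frame."""

    def sf_function(col: Column) -> Column:
        # Equivalent to collecting [col] and unpacking it, since the column
        # is the only positional argument.
        return func(col)

    return sf_function


def apply_to_columns(
    frame: Dict[str, Column], column_function: Callable[[Column], Column]
) -> Dict[str, Column]:
    """Call the column function once per column and build a new frame."""
    return {name: column_function(col) for name, col in frame.items()}


frame = {"content": ["a first row", "a second row"]}
result = apply_to_columns(frame, make_column_function(upper_all))
```

The input frame is left unchanged and a new one is returned, which matches the way the method in the diff returns a new `SnowflakeQueryCompiler`.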

def applymap(
self,
func: AggFuncType,
Expand Down Expand Up @@ -8819,6 +8860,25 @@ def applymap(
)
return self._apply_snowpark_python_function_to_columns(func, kwargs)

if func in SUPPORTED_SNOWFLAKE_CORTEX_FUNCTIONS_IN_APPLY:
    if na_action:
        ErrorMessage.not_implemented(
            f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with na_action == '{na_action}'"
        )
    if args:
        ErrorMessage.not_implemented(
            f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with args == '{args}'"
        )
    if kwargs:
        ErrorMessage.not_implemented(
            f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}` with kwargs == '{kwargs}'"
        )
    return self._apply_snowflake_cortex_function_to_columns(func)
elif func in ALL_SNOWFLAKE_CORTEX_FUNCTIONS:
    ErrorMessage.not_implemented(
        f"Snowpark pandas applymap API doesn't yet support Snowflake Cortex function `{func.__name__}`"
    )

# Check if the function is a known numpy function that can be translated
# to Snowflake function.
sf_func = NUMPY_UNIVERSAL_FUNCTION_TO_SNOWFLAKE_FUNCTION.get(func)
149 changes: 149 additions & 0 deletions tests/integ/modin/test_apply_snowflake_cortex_functions.py
@@ -0,0 +1,149 @@
#
# Copyright (c) 2012-2025 Snowflake Computing Inc. All rights reserved.
#

import modin.pandas as pd
import pytest
from pytest import param


from tests.integ.utils.sql_counter import SqlCounter, sql_count_checker
from tests.utils import running_on_jenkins
from snowflake.cortex import Sentiment, Summarize, Translate


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.summarize SSL error",
)
def test_apply_snowflake_cortex_summarize(session):

    # TODO: SNOW-1758914 snowflake.cortex.summarize error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return

    with SqlCounter(query_count=1):
        content = """pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in
Snowflake. Just by changing the import statement and a few lines of code, you can get the familiar pandas experience
you know and love with the scalability and security benefits of Snowflake. With pandas on Snowflake, you can work
with much larger datasets and avoid the time and expense of porting your pandas pipelines to other big data
frameworks or provisioning large and expensive machines. It runs workloads natively in Snowflake through
transpilation to SQL, enabling it to take advantage of parallelization and the data governance and security
benefits of Snowflake. pandas on Snowflake is delivered through the Snowpark pandas API as part of the Snowpark
Python library, which enables scalable data processing of Python code within the Snowflake platform.
"""
        s = pd.Series([content])
        summary = s.apply(Summarize).iloc[0]
        # this length check is to get around the fact that this function may not be deterministic
        assert 0 < len(summary) < len(content)


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error",
)
def test_apply_snowflake_cortex_sentiment_series(session):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return

    with SqlCounter(query_count=1):
        content = "A very very bad review!"
        s = pd.Series([content])
        sentiment = s.apply(Sentiment).iloc[0]
        assert -1 <= sentiment <= 0


def test_apply_snowflake_cortex_sentiment_df(session):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    with SqlCounter(query_count=0):
        if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
            return
    text_list = [
        "A first row of text.",
        "This is a very bad test.",
        "This is the best test ever.",
    ]

    content_frame = pd.DataFrame(text_list, columns=["content"])
    with SqlCounter(query_count=4):
        res = content_frame.apply(Sentiment)
        sent_row_2 = res["content"][1]
        sent_row_3 = res["content"][2]
        assert -1 <= sent_row_2 <= 0
        assert 0 <= sent_row_3 <= 1


@pytest.mark.skipif(
    running_on_jenkins(),
    reason="TODO: SNOW-1859087 snowflake.cortex.sentiment SSL error",
)
@sql_count_checker(query_count=0)
@pytest.mark.parametrize(
    "is_series, operation",
    [
        param(
            True,
            (lambda s: s.apply(Translate, source_language="en", target_language="de")),
            id="series_cortex_unsupported_function_translate",
        ),
        param(
            False,
            (
                lambda df: df.apply(
                    Translate, source_language="en", target_language="de"
                )
            ),
            id="df_cortex_unsupported_function_translate",
        ),
        param(
            True,
            # args must be a tuple; the trailing comma makes a one-element tuple
            (lambda s: s.apply(Sentiment, args=("hello",))),
            id="series_cortex_unsupported_args",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, args=("hello",))),
            id="df_cortex_unsupported_args",
        ),
        param(
            True,
            (lambda s: s.apply(Sentiment, extra="hello")),
            id="series_cortex_unsupported_kwargs",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, extra="hello")),
            id="df_cortex_unsupported_kwargs",
        ),
        param(
            True,
            (lambda s: s.apply(Sentiment, na_action="ignore")),
            id="series_cortex_unsupported_na_action",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, raw=True)),
            id="df_cortex_unsupported_raw",
        ),
        param(
            False,
            (lambda df: df.apply(Sentiment, axis=1)),
            id="df_cortex_unsupported_axis_1",
        ),
    ],
)
def test_apply_snowflake_cortex_negative(session, is_series, operation):

    # TODO: SNOW-1758914 snowflake.cortex.sentiment error on GCP
    if session.connection.host == "sfctest0.us-central1.gcp.snowflakecomputing.com":
        return

    content = "One day I will see the world."
    modin_input = (pd.Series if is_series else pd.DataFrame)([content])
    with pytest.raises(NotImplementedError):
        operation(modin_input)
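pandas-style `apply` takes extra positional arguments through `args` as a tuple. A short illustration of why a one-element tuple needs a trailing comma; `count_extra_args` is a hypothetical helper used only to show how each value unpacks.

```python
single_string = ("hello")   # parentheses only group; this is still a str
single_tuple = ("hello",)   # the trailing comma makes a one-element tuple


def count_extra_args(*args):
    """Hypothetical helper: report how many positional arguments arrive."""
    return len(args)


# Unpacking a str yields one argument per character.
from_string = count_extra_args(*single_string)
# Unpacking the tuple yields the single intended argument.
from_tuple = count_extra_args(*single_tuple)
```

Both values are truthy, so a check such as `if args:` rejects either one, but only the tuple form passes a single `"hello"` argument when the function is actually called.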