Merged

Changes from all commits (26 commits):
be0ae7b
SNOW-1722641: Add support for Series.between (#2775)
sfc-gh-joshi Dec 19, 2024
4ae4c4d
SNOW-1794369 Catalog API (#2655)
sfc-gh-aalam Dec 20, 2024
c388f00
SNOW-1866086: Fix test_dt_accessor.py::test_strftime in Windows (#2798)
sfc-gh-helmeleegy Dec 20, 2024
80ef534
SNOW-1866100 Update DataFrame.unpivot AST encoding to include `includ…
sfc-gh-vbudati Dec 20, 2024
20cfeed
SNOW-1865997: Use docstrings folder for DatetimeIndex methods and pro…
sfc-gh-helmeleegy Dec 20, 2024
637905b
SNOW-1805851: Add scikit-learn interoperability tests. (#2796)
sfc-gh-mvashishtha Dec 20, 2024
69c41a2
SNOW-1865595: Add currently supported structured type functions (#2790)
sfc-gh-jrose Dec 20, 2024
ba31301
SNOW-981562: Expose is_temp_table_for_cleanup to Session.table (#2784)
sfc-gh-aalam Dec 27, 2024
e75b506
SNOW-1869362: Plan plotter improvements (#2813)
sfc-gh-aalam Jan 3, 2025
10c612e
SNOW-1869388 add memoization to to_selectable (#2815)
sfc-gh-aalam Jan 3, 2025
a79fe9f
SNOW-1865904 fix query gen when nested cte node is partitioned (#2816)
sfc-gh-aalam Jan 3, 2025
888cec5
SNOW-1829870: Allow structured types to be enabled by default (#2727)
sfc-gh-jrose Jan 4, 2025
2247ae5
SNOW-1853347: Add mechanism to allow changing type strs when printing…
sfc-gh-jrose Jan 6, 2025
b0d1659
Merge branch 'main' into vbudati/SNOW-1794510-merge-decoder
sfc-gh-batur Jan 6, 2025
3a8612f
Unify Snowflake object name handling in the Snowpark AST (#2789)
sfc-gh-oplaton Jan 6, 2025
31b5c8f
Fail local-testing check rather than add nag comment (#2823)
sfc-gh-jrose Jan 7, 2025
f90ba59
SNOW-1867961: Fix from_json not working the TimestampType that contai…
sfc-gh-jrose Jan 7, 2025
7494363
SNOW-1865926: Infer schema for StructType columns from nested Rows (#…
sfc-gh-jrose Jan 7, 2025
b64e724
SNOW-1869802: Fix local testing pivot returning None (#2824)
sfc-gh-jrose Jan 7, 2025
d92dee9
SNOW-1739034: Unskip tests requiring pandas 2.2.3 in anaconda. (#2829)
sfc-gh-mvashishtha Jan 8, 2025
7c1ed3e
SNOW-1852925: Add type inference for Series.apply/map and Dataframe.m…
sfc-gh-nkumar Jan 8, 2025
98330fa
SNOW-1843881: Change StructType columns to return Row objects (#2820)
sfc-gh-jrose Jan 8, 2025
acde75d
Merge branch 'main' into merging_from_main
sfc-gh-batur Jan 10, 2025
ea1911f
moar changes
sfc-gh-batur Jan 10, 2025
8135708
moar changes
sfc-gh-batur Jan 10, 2025
799efee
fixing up decoder to handle name changes
sfc-gh-batur Jan 10, 2025
10 changes: 6 additions & 4 deletions .github/workflows/enforce_localtest.yml
@@ -2,7 +2,9 @@ name: Request Local Testing approval if necessary

 on:
   pull_request:
-    branches: '**'
+    types: [review_requested, review_request_removed, opened, synchronize]
+    branches:
+      - main
 
 jobs:
   request_review:
@@ -16,10 +18,10 @@ jobs:
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
 
-      - name: Request Local Testing review if PR contains local_testing_mode
+      - name: Check for local-testing changes
         id: check-diff
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           url: ${{ github.event.pull_request.html_url }}
         run: |
-          (gh pr diff "$url" | grep "^+" | grep "local_testing_mode" && gh pr comment "$url" --body "Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing") || echo "PR does not seem to contain Local Testing changes"
+          if gh pr diff "$url" | grep "^+" | grep "local_testing_mode"; then echo "Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing"; exit 1; else echo "PR does not seem to contain Local Testing changes"; fi
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -7,14 +7,24 @@
#### New Features

- Added support for the following functions in `functions.py`
- `array_reverse`
- `divnull`
- `map_cat`
- `map_contains_key`
- `map_keys`
- `nullifzero`
- `snowflake_cortex_sentiment`
- Added `Catalog` class to manage snowflake objects. It can be accessed via `Session.catalog`.
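
A short, hedged sketch of two of the new functions (it assumes `session` is an existing, connected Snowpark `Session`; `nullifzero` follows the semantics of the Snowflake SQL function of the same name):

```python
from snowflake.snowpark.functions import array_reverse, lit, nullifzero

# `session` is assumed to be an existing, connected Snowpark Session.
df = session.create_dataframe([[0], [5]], schema=["a"])
df.select(
    nullifzero(df["a"]),            # NULL where "a" = 0, otherwise "a"
    array_reverse(lit([1, 2, 3])),  # -> [3, 2, 1]
).show()
```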

#### Improvements

- Updated README.md to include instructions on how to verify package signatures using `cosign`.

#### Bug Fixes

- Fixed a bug in local testing mode that caused a column to contain None when it should contain 0
- Fixed a bug in StructField.from_json that prevented TimestampTypes with tzinfo from being parsed correctly.

### Snowpark pandas API Updates

#### New Features
@@ -38,6 +48,7 @@
- %j: Day of the year as a zero-padded decimal number.
- %X: Locale’s appropriate time representation.
- %%: A literal '%' character.
- Added support for `Series.between`.

#### Bug Fixes

@@ -48,6 +59,8 @@
- Updated integration testing for `session.lineage.trace` to exclude deleted objects
- Added documentation for `DataFrame.map`.
- Improve performance of `DataFrame.apply` by mapping numpy functions to snowpark functions if possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn
- Infer return type of functions in `Series.map`, `Series.apply` and `DataFrame.map` if type-hint is not provided.
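
The return-type inference entry can be illustrated with a minimal sketch (assumes an active Snowpark session and the `modin` plugin):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

s = pd.Series([1, 2, 3])
# Previously a type hint on the callable was needed for a non-default
# return type; here the float result type is inferred from the lambda.
halved = s.map(lambda x: x / 2)
```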

## 1.26.0 (2024-12-05)

105 changes: 100 additions & 5 deletions docs/source/modin/interoperability.rst
@@ -1,5 +1,6 @@
+===========================================
 Interoperability with third party libraries
-=============================================
+===========================================
 
 Many third party libraries are interoperable with pandas, for example by accepting pandas dataframe objects as function
 inputs. Here we have a non-exhaustive list of third party library use cases with pandas and note whether each method
@@ -8,15 +9,17 @@ works in Snowpark pandas as well.
 Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_, which
 some libraries use to interoperate with Snowpark pandas to the same level of support as pandas.
 
-The following table is structured as follows: The first column contains a method name.
+plotly.express
+==============
+
+The following table is structured as follows: The first column contains the name of a method in the ``plotly.express`` module.
 The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these
-methods, we validate that passing in a Snowpark pandas dataframe as the dataframe input parameter behaves equivalently
-to passing in a pandas dataframe.
+operations, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently
+to passing in pandas dataframes or series.
 
 .. note::
    ``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.
 
-Plotly.express module methods
 
 .. note::
    Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.
@@ -56,3 +59,95 @@ Plotly.express module methods
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``imshow`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+

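A minimal sketch of the validated pattern (assumes an active Snowpark session and the ``modin`` plugin; column names are illustrative):

```python
import modin.pandas as pd
import plotly.express as px
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 1, 2]})
# A Snowpark pandas dataframe is accepted wherever plotly.express expects
# a pandas dataframe, via the dataframe interchange protocol.
fig = px.scatter(df, x="x", y="y")
fig.show()
```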

scikit-learn
============

We break down scikit-learn interoperability by categories of scikit-learn
operations.

For each category, we provide a table of interoperability with the following
structure: The first column describes a scikit-learn operation that may include
multiple method calls. The second column is a flag for whether or not
interoperability is guaranteed with Snowpark pandas. For each of these methods,
we validate that passing in Snowpark pandas objects behaves equivalently to
passing in pandas objects.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

.. note::
While some scikit-learn methods accept Snowpark pandas inputs, their
performance with Snowpark pandas inputs is often much worse than their
performance with native pandas inputs. Generally we recommend converting
Snowpark pandas inputs to pandas with ``to_pandas()`` before passing them
to scikit-learn.


Classification
--------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LinearDiscriminantAnalysis`` | Y | |
| classifier with the ``fit()`` method and | | |
| classifying data with the ``predict()`` | | |
| method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+

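A hedged sketch of the classification row above (assumes an active Snowpark session; the data and feature names are illustrative):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": [0.5, 1.5, 2.5, 3.5]})
y = pd.Series([0, 0, 1, 1])

# scikit-learn consumes the Snowpark pandas objects as it would native
# pandas ones, though converting with to_pandas() first is usually faster.
clf = LinearDiscriminantAnalysis().fit(X, y)
predictions = clf.predict(X)
```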

Regression
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LogisticRegression`` model | Y | |
| with the ``fit()`` method and predicting | | |
| results with the ``predict()`` method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+

Clustering
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Clustering method | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| ``KMeans.fit()`` | Y | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Dimensionality reduction
------------------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Getting the principal components of a | Y | |
| numerical dataset with ``PCA.fit()``. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Model selection
---------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Choosing parameters for a | Y | ``RandomizedSearchCV`` causes Snowpark pandas |
| ``LogisticRegression`` model with | | to issue many queries. We strongly recommend |
| ``RandomizedSearchCV.fit()``. | | converting Snowpark pandas inputs to pandas |
| | | before using ``RandomizedSearchCV`` |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+

Preprocessing
-------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Scaling training data with | Y | |
| ``MaxAbsScaler.fit_transform()``. | | |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
2 changes: 1 addition & 1 deletion docs/source/modin/supported/series_supported.rst
@@ -116,7 +116,7 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``backfill``                | P                               |                                  | ``N`` if param ``downcast`` is set.                |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``between``                 | N                               |                                  |                                                    |
+| ``between``                 | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``between_time``            | N                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
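
`Series.between` is newly supported, per the changelog entry above. A minimal sketch (assumes an active Snowpark session and the `modin` plugin):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

s = pd.Series([2, 0, 4, 8])
# Matches pandas semantics: both boundaries are inclusive by default.
mask = s.between(1, 4)  # -> [True, False, True, False]
```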
67 changes: 67 additions & 0 deletions docs/source/snowpark/catalog.rst
@@ -0,0 +1,67 @@
=============
Catalog
=============
Catalog module for Snowpark.

.. currentmodule:: snowflake.snowpark.catalog

.. rubric:: Catalog

.. autosummary::
:toctree: api/

Catalog.databaseExists
Catalog.database_exists
Catalog.dropDatabase
Catalog.dropSchema
Catalog.dropTable
Catalog.dropView
Catalog.drop_database
Catalog.drop_schema
Catalog.drop_table
Catalog.drop_view
Catalog.getCurrentDatabase
Catalog.getCurrentSchema
Catalog.getDatabase
Catalog.getProcedure
Catalog.getSchema
Catalog.getTable
Catalog.getUserDefinedFunction
Catalog.getView
Catalog.get_current_database
Catalog.get_current_schema
Catalog.get_database
Catalog.get_procedure
Catalog.get_schema
Catalog.get_table
Catalog.get_user_defined_function
Catalog.get_view
Catalog.listColumns
Catalog.listDatabases
Catalog.listProcedures
Catalog.listSchemas
Catalog.listTables
Catalog.listUserDefinedFunctions
Catalog.listViews
Catalog.list_columns
Catalog.list_databases
Catalog.list_procedures
Catalog.list_schemas
Catalog.list_tables
Catalog.list_user_defined_functions
Catalog.list_views
Catalog.procedureExists
Catalog.procedure_exists
Catalog.schemaExists
Catalog.schema_exists
Catalog.setCurrentDatabase
Catalog.setCurrentSchema
Catalog.set_current_database
Catalog.set_current_schema
Catalog.tableExists
Catalog.table_exists
Catalog.userDefinedFunctionExists
Catalog.user_defined_function_exists
Catalog.viewExists
Catalog.view_exists

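A hedged sketch of the new surface (method names are taken from the summary above; the argument shapes are assumptions, not confirmed signatures):

```python
from snowflake.snowpark import Session

# Placeholder credentials; fill in your own.
connection_parameters = {"account": "...", "user": "...", "password": "..."}
session = Session.builder.configs(connection_parameters).create()

catalog = session.catalog
tables = catalog.list_tables()        # snake_case names have camelCase aliases
if catalog.table_exists("MY_TABLE"):  # table name is illustrative
    session.table("MY_TABLE").show()
```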
9 changes: 9 additions & 0 deletions docs/source/snowpark/functions.rst
@@ -35,21 +35,26 @@ Functions
array_construct_compact
array_contains
array_distinct
array_except
array_flatten
array_generate_range
array_insert
array_intersection
array_join
array_max
array_min
array_position
array_prepend
array_remove
array_reverse
array_size
array_slice
array_sort
array_to_string
array_union
array_unique_agg
arrays_overlap
arrays_zip
as_array
as_binary
as_char
@@ -205,6 +210,10 @@ Functions
lpad
ltrim
make_interval
map_cat
map_concat
map_contains_key
map_keys
max
md5
mean
7 changes: 4 additions & 3 deletions docs/source/snowpark/index.rst
@@ -9,9 +9,9 @@ Snowpark APIs
    column
    types
    row
-   functions
-   window
-   grouping
+   functions
+   window
+   grouping
    table_function
    table
    async_job
@@ -21,6 +21,7 @@
udtf
observability
files
catalog
lineage
context
exceptions
1 change: 1 addition & 0 deletions docs/source/snowpark/session.rst
@@ -38,6 +38,7 @@ Snowpark Session
Session.append_query_tag
Session.call
Session.cancel_all
Session.catalog
Session.clear_imports
Session.clear_packages
Session.close
1 change: 1 addition & 0 deletions recipe/meta.yaml
@@ -43,6 +43,7 @@ requirements:
- protobuf >=3.20,<6
- python-dateutil
- tzlocal
- snowflake.core >=1.0.0,<2

test:
imports:
3 changes: 2 additions & 1 deletion setup.py
@@ -29,6 +29,7 @@
"protobuf>=3.20, <6", # Snowpark IR
"python-dateutil", # Snowpark IR
"tzlocal", # Snowpark IR
"snowflake.core>=1.0.0, <2", # Catalog
]
REQUIRED_PYTHON_VERSION = ">=3.8, <3.12"

@@ -199,7 +200,7 @@ def run(self):
         *DEVELOPMENT_REQUIREMENTS,
         "scipy", # Snowpark pandas 3rd party library testing
         "statsmodels", # Snowpark pandas 3rd party library testing
-        "scikit-learn==1.5.2", # Snowpark pandas scikit-learn tests
+        "scikit-learn", # Snowpark pandas 3rd party library testing
         # plotly version restricted due to foreseen change in query counts in version 6.0.0+
         "plotly<6.0.0", # Snowpark pandas 3rd party library testing
     ],
13 changes: 11 additions & 2 deletions src/snowflake/snowpark/_internal/analyzer/datatype_mapper.py
@@ -202,10 +202,16 @@ def to_sql(
return f"'{binascii.hexlify(bytes(value)).decode()}' :: BINARY"

if isinstance(value, (list, tuple, array)) and isinstance(datatype, ArrayType):
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: ARRAY"
type_str = "ARRAY"
if datatype.structured:
type_str = convert_sp_to_sf_type(datatype)
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: {type_str}"

if isinstance(value, dict) and isinstance(datatype, MapType):
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: OBJECT"
type_str = "OBJECT"
if datatype.structured:
type_str = convert_sp_to_sf_type(datatype)
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: {type_str}"

if isinstance(datatype, VariantType):
# PARSE_JSON returns VARIANT, so no need to append :: VARIANT here explicitly.
@@ -260,11 +266,14 @@ def schema_expression(data_type: DataType, is_nullable: bool) -> str:
return "to_timestamp('2020-09-16 06:30:00')"
if isinstance(data_type, ArrayType):
if data_type.structured:
assert isinstance(data_type.element_type, DataType)
element = schema_expression(data_type.element_type, is_nullable)
return f"to_array({element}) :: {convert_sp_to_sf_type(data_type)}"
return "to_array(0)"
if isinstance(data_type, MapType):
if data_type.structured:
assert isinstance(data_type.key_type, DataType)
assert isinstance(data_type.value_type, DataType)
key = schema_expression(data_type.key_type, is_nullable)
value = schema_expression(data_type.value_type, is_nullable)
return f"object_construct_keep_null({key}, {value}) :: {convert_sp_to_sf_type(data_type)}"