Merged

Changes from all commits (26 commits):
be0ae7b
SNOW-1722641: Add support for Series.between (#2775)
sfc-gh-joshi Dec 19, 2024
4ae4c4d
SNOW-1794369 Catalog API (#2655)
sfc-gh-aalam Dec 20, 2024
c388f00
SNOW-1866086: Fix test_dt_accessor.py::test_strftime in Windows (#2798)
sfc-gh-helmeleegy Dec 20, 2024
80ef534
SNOW-1866100 Update DataFrame.unpivot AST encoding to include `includ…
sfc-gh-vbudati Dec 20, 2024
20cfeed
SNOW-1865997: Use docstrings folder for DatetimeIndex methods and pro…
sfc-gh-helmeleegy Dec 20, 2024
637905b
SNOW-1805851: Add scikit-learn interoperability tests. (#2796)
sfc-gh-mvashishtha Dec 20, 2024
69c41a2
SNOW-1865595: Add currently supported structured type functions (#2790)
sfc-gh-jrose Dec 20, 2024
ba31301
SNOW-981562: Expose is_temp_table_for_cleanup to Session.table (#2784)
sfc-gh-aalam Dec 27, 2024
e75b506
SNOW-1869362: Plan plotter improvements (#2813)
sfc-gh-aalam Jan 3, 2025
10c612e
SNOW-1869388 add memoization to to_selectable (#2815)
sfc-gh-aalam Jan 3, 2025
a79fe9f
SNOW-1865904 fix query gen when nested cte node is partitioned (#2816)
sfc-gh-aalam Jan 3, 2025
888cec5
SNOW-1829870: Allow structured types to be enabled by default (#2727)
sfc-gh-jrose Jan 4, 2025
2247ae5
SNOW-1853347: Add mechanism to allow changing type strs when printing…
sfc-gh-jrose Jan 6, 2025
b0d1659
Merge branch 'main' into vbudati/SNOW-1794510-merge-decoder
sfc-gh-batur Jan 6, 2025
3a8612f
Unify Snowflake object name handling in the Snowpark AST (#2789)
sfc-gh-oplaton Jan 6, 2025
31b5c8f
Fail local-testing check rather than add nag comment (#2823)
sfc-gh-jrose Jan 7, 2025
f90ba59
SNOW-1867961: Fix from_json not working the TimestampType that contai…
sfc-gh-jrose Jan 7, 2025
7494363
SNOW-1865926: Infer schema for StructType columns from nested Rows (#…
sfc-gh-jrose Jan 7, 2025
b64e724
SNOW-1869802: Fix local testing pivot returning None (#2824)
sfc-gh-jrose Jan 7, 2025
d92dee9
SNOW-1739034: Unskip tests requiring pandas 2.2.3 in anaconda. (#2829)
sfc-gh-mvashishtha Jan 8, 2025
7c1ed3e
SNOW-1852925: Add type inference for Series.apply/map and Dataframe.m…
sfc-gh-nkumar Jan 8, 2025
98330fa
SNOW-1843881: Change StructType columns to return Row objects (#2820)
sfc-gh-jrose Jan 8, 2025
acde75d
Merge branch 'main' into merging_from_main
sfc-gh-batur Jan 10, 2025
ea1911f
moar changes
sfc-gh-batur Jan 10, 2025
8135708
moar changes
sfc-gh-batur Jan 10, 2025
799efee
fixing up decoder to handle name changes
sfc-gh-batur Jan 10, 2025
10 changes: 6 additions & 4 deletions .github/workflows/enforce_localtest.yml
@@ -2,7 +2,9 @@ name: Request Local Testing approval if necessary

 on:
   pull_request:
-    branches: '**'
+    types: [review_requested, review_request_removed, opened, synchronize]
+    branches:
+      - main
 
 jobs:
   request_review:
@@ -16,10 +18,10 @@ jobs:
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
 
-      - name: Request Local Testing review if PR contains local_testing_mode
+      - name: Check for local-testing changes
         id: check-diff
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           url: ${{ github.event.pull_request.html_url }}
         run: |
-          (gh pr diff "$url" | grep "^+" | grep "local_testing_mode" && gh pr comment "$url" --body "Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing") || echo "PR does not seem to contain Local Testing changes"
+          if gh pr diff "$url" | grep "^+" | grep "local_testing_mode"; then echo "Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing"; exit 1; else echo "PR does not seem to contain Local Testing changes"; fi
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -7,14 +7,24 @@
#### New Features

- Added support for the following functions in `functions.py`
- `array_reverse`
- `divnull`
- `map_cat`
- `map_contains_key`
- `map_keys`
- `nullifzero`
- `snowflake_cortex_sentiment`
- Added `Catalog` class to manage snowflake objects. It can be accessed via `Session.catalog`.
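
A short, hedged sketch of two of the new functions (it assumes `session` is an existing, connected Snowpark `Session`; `nullifzero` follows the semantics of the Snowflake SQL function of the same name):

```python
from snowflake.snowpark.functions import array_reverse, lit, nullifzero

# `session` is assumed to be an existing, connected Snowpark Session.
df = session.create_dataframe([[0], [5]], schema=["a"])
df.select(
    nullifzero(df["a"]),            # NULL where "a" = 0, otherwise "a"
    array_reverse(lit([1, 2, 3])),  # -> [3, 2, 1]
).show()
```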

#### Improvements

- Updated README.md to include instructions on how to verify package signatures using `cosign`.

#### Bug Fixes

- Fixed a bug in local testing mode that caused a column to contain None when it should contain 0
- Fixed a bug in StructField.from_json that prevented TimestampTypes with tzinfo from being parsed correctly.

### Snowpark pandas API Updates

#### New Features
@@ -38,6 +48,7 @@
- %j: Day of the year as a zero-padded decimal number.
- %X: Locale’s appropriate time representation.
- %%: A literal '%' character.
- Added support for `Series.between`.

#### Bug Fixes

@@ -48,6 +59,8 @@
- Updated integration testing for `session.lineage.trace` to exclude deleted objects
- Added documentation for `DataFrame.map`.
- Improve performance of `DataFrame.apply` by mapping numpy functions to snowpark functions if possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn
- Infer return type of functions in `Series.map`, `Series.apply` and `DataFrame.map` if type-hint is not provided.
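
The return-type inference entry can be illustrated with a minimal sketch (assumes an active Snowpark session and the `modin` plugin):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

s = pd.Series([1, 2, 3])
# Previously a type hint on the callable was needed for a non-default
# return type; here the float result type is inferred from the lambda.
halved = s.map(lambda x: x / 2)
```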

## 1.26.0 (2024-12-05)

105 changes: 100 additions & 5 deletions docs/source/modin/interoperability.rst
@@ -1,5 +1,6 @@
+===========================================
 Interoperability with third party libraries
-=============================================
+===========================================
 
 Many third party libraries are interoperable with pandas, for example by accepting pandas dataframe objects as function
 inputs. Here we have a non-exhaustive list of third party library use cases with pandas and note whether each method
@@ -8,15 +9,17 @@ works in Snowpark pandas as well.
 Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_, which
 some libraries use to interoperate with Snowpark pandas to the same level of support as pandas.
 
-The following table is structured as follows: The first column contains a method name.
+plotly.express
+==============
+
+The following table is structured as follows: The first column contains the name of a method in the ``plotly.express`` module.
 The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these
-methods, we validate that passing in a Snowpark pandas dataframe as the dataframe input parameter behaves equivalently
-to passing in a pandas dataframe.
+operations, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently
+to passing in pandas dataframes or series.
 
 .. note::
    ``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.
 
-Plotly.express module methods
 
 .. note::
    Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.
@@ -56,3 +59,95 @@ Plotly.express module methods
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``imshow`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+

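A minimal sketch of the validated pattern (assumes an active Snowpark session and the ``modin`` plugin; column names are illustrative):

```python
import modin.pandas as pd
import plotly.express as px
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 1, 2]})
# A Snowpark pandas dataframe is accepted wherever plotly.express expects
# a pandas dataframe, via the dataframe interchange protocol.
fig = px.scatter(df, x="x", y="y")
fig.show()
```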

scikit-learn
============

We break down scikit-learn interoperability by categories of scikit-learn
operations.

For each category, we provide a table of interoperability with the following
structure: The first column describes a scikit-learn operation that may include
multiple method calls. The second column is a flag for whether or not
interoperability is guaranteed with Snowpark pandas. For each of these methods,
we validate that passing in Snowpark pandas objects behaves equivalently to
passing in pandas objects.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

.. note::
While some scikit-learn methods accept Snowpark pandas inputs, their
performance with Snowpark pandas inputs is often much worse than their
performance with native pandas inputs. Generally we recommend converting
Snowpark pandas inputs to pandas with ``to_pandas()`` before passing them
to scikit-learn.


Classification
--------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LinearDiscriminantAnalysis`` | Y | |
| classifier with the ``fit()`` method and | | |
| classifying data with the ``predict()`` | | |
| method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+

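A hedged sketch of the classification row above (assumes an active Snowpark session; the data and feature names are illustrative):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": [0.5, 1.5, 2.5, 3.5]})
y = pd.Series([0, 0, 1, 1])

# scikit-learn consumes the Snowpark pandas objects as it would native
# pandas ones, though converting with to_pandas() first is usually faster.
clf = LinearDiscriminantAnalysis().fit(X, y)
predictions = clf.predict(X)
```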

Regression
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LogisticRegression`` model | Y | |
| with the ``fit()`` method and predicting | | |
| results with the ``predict()`` method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+

Clustering
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Clustering method | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| ``KMeans.fit()`` | Y | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Dimensionality reduction
------------------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Getting the principal components of a | Y | |
| numerical dataset with ``PCA.fit()``. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Model selection
---------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Choosing parameters for a | Y | ``RandomizedSearchCV`` causes Snowpark pandas |
| ``LogisticRegression`` model with | | to issue many queries. We strongly recommend |
| ``RandomizedSearchCV.fit()``. | | converting Snowpark pandas inputs to pandas |
| | | before using ``RandomizedSearchCV`` |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+

Preprocessing
-------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Scaling training data with | Y | |
| ``MaxAbsScaler.fit_transform()``. | | |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
2 changes: 1 addition & 1 deletion docs/source/modin/supported/series_supported.rst
@@ -116,7 +116,7 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``backfill``                | P                               |                                  | ``N`` if param ``downcast`` is set.                |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``between``                 | N                               |                                  |                                                    |
+| ``between``                 | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``between_time``            | N                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
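
`Series.between` is newly supported, per the changelog entry above. A minimal sketch (assumes an active Snowpark session and the `modin` plugin):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers Snowpark pandas

s = pd.Series([2, 0, 4, 8])
# Matches pandas semantics: both boundaries are inclusive by default.
mask = s.between(1, 4)  # -> [True, False, True, False]
```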
67 changes: 67 additions & 0 deletions docs/source/snowpark/catalog.rst
@@ -0,0 +1,67 @@
=============
Catalog
=============
Catalog module for Snowpark.

.. currentmodule:: snowflake.snowpark.catalog

.. rubric:: Catalog

.. autosummary::
:toctree: api/

Catalog.databaseExists
Catalog.database_exists
Catalog.dropDatabase
Catalog.dropSchema
Catalog.dropTable
Catalog.dropView
Catalog.drop_database
Catalog.drop_schema
Catalog.drop_table
Catalog.drop_view
Catalog.getCurrentDatabase
Catalog.getCurrentSchema
Catalog.getDatabase
Catalog.getProcedure
Catalog.getSchema
Catalog.getTable
Catalog.getUserDefinedFunction
Catalog.getView
Catalog.get_current_database
Catalog.get_current_schema
Catalog.get_database
Catalog.get_procedure
Catalog.get_schema
Catalog.get_table
Catalog.get_user_defined_function
Catalog.get_view
Catalog.listColumns
Catalog.listDatabases
Catalog.listProcedures
Catalog.listSchemas
Catalog.listTables
Catalog.listUserDefinedFunctions
Catalog.listViews
Catalog.list_columns
Catalog.list_databases
Catalog.list_procedures
Catalog.list_schemas
Catalog.list_tables
Catalog.list_user_defined_functions
Catalog.list_views
Catalog.procedureExists
Catalog.procedure_exists
Catalog.schemaExists
Catalog.schema_exists
Catalog.setCurrentDatabase
Catalog.setCurrentSchema
Catalog.set_current_database
Catalog.set_current_schema
Catalog.tableExists
Catalog.table_exists
Catalog.userDefinedFunctionExists
Catalog.user_defined_function_exists
Catalog.viewExists
Catalog.view_exists

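A hedged sketch of the new surface (method names are taken from the summary above; the argument shapes are assumptions, not confirmed signatures):

```python
from snowflake.snowpark import Session

# Placeholder credentials; fill in your own.
connection_parameters = {"account": "...", "user": "...", "password": "..."}
session = Session.builder.configs(connection_parameters).create()

catalog = session.catalog
tables = catalog.list_tables()        # snake_case names have camelCase aliases
if catalog.table_exists("MY_TABLE"):  # table name is illustrative
    session.table("MY_TABLE").show()
```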
9 changes: 9 additions & 0 deletions docs/source/snowpark/functions.rst
@@ -35,21 +35,26 @@ Functions
array_construct_compact
array_contains
array_distinct
array_except
array_flatten
array_generate_range
array_insert
array_intersection
array_join
array_max
array_min
array_position
array_prepend
array_remove
array_reverse
array_size
array_slice
array_sort
array_to_string
array_union
array_unique_agg
arrays_overlap
arrays_zip
as_array
as_binary
as_char
@@ -205,6 +210,10 @@ Functions
lpad
ltrim
make_interval
map_cat
map_concat
map_contains_key
map_keys
max
md5
mean
7 changes: 4 additions & 3 deletions docs/source/snowpark/index.rst
@@ -9,9 +9,9 @@ Snowpark APIs
    column
    types
    row
-   functions
-   window
-   grouping
+   functions
+   window
+   grouping
    table_function
    table
    async_job
@@ -21,6 +21,7 @@
udtf
observability
files
catalog
lineage
context
exceptions
1 change: 1 addition & 0 deletions docs/source/snowpark/session.rst
@@ -38,6 +38,7 @@ Snowpark Session
Session.append_query_tag
Session.call
Session.cancel_all
Session.catalog
Session.clear_imports
Session.clear_packages
Session.close
1 change: 1 addition & 0 deletions recipe/meta.yaml
@@ -43,6 +43,7 @@ requirements:
- protobuf >=3.20,<6
- python-dateutil
- tzlocal
- snowflake.core >=1.0.0,<2

test:
imports:
3 changes: 2 additions & 1 deletion setup.py
@@ -29,6 +29,7 @@
"protobuf>=3.20, <6", # Snowpark IR
"python-dateutil", # Snowpark IR
"tzlocal", # Snowpark IR
"snowflake.core>=1.0.0, <2", # Catalog
]
REQUIRED_PYTHON_VERSION = ">=3.8, <3.12"

@@ -199,7 +200,7 @@ def run(self):
         *DEVELOPMENT_REQUIREMENTS,
         "scipy", # Snowpark pandas 3rd party library testing
         "statsmodels", # Snowpark pandas 3rd party library testing
-        "scikit-learn==1.5.2", # Snowpark pandas scikit-learn tests
+        "scikit-learn", # Snowpark pandas 3rd party library testing
         # plotly version restricted due to foreseen change in query counts in version 6.0.0+
         "plotly<6.0.0", # Snowpark pandas 3rd party library testing
     ],
13 changes: 11 additions & 2 deletions src/snowflake/snowpark/_internal/analyzer/datatype_mapper.py
@@ -202,10 +202,16 @@ def to_sql(
return f"'{binascii.hexlify(bytes(value)).decode()}' :: BINARY"

if isinstance(value, (list, tuple, array)) and isinstance(datatype, ArrayType):
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: ARRAY"
type_str = "ARRAY"
if datatype.structured:
type_str = convert_sp_to_sf_type(datatype)
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: {type_str}"

if isinstance(value, dict) and isinstance(datatype, MapType):
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: OBJECT"
type_str = "OBJECT"
if datatype.structured:
type_str = convert_sp_to_sf_type(datatype)
return f"PARSE_JSON({str_to_sql(json.dumps(value, cls=PythonObjJSONEncoder))}) :: {type_str}"

if isinstance(datatype, VariantType):
# PARSE_JSON returns VARIANT, so no need to append :: VARIANT here explicitly.
@@ -260,11 +266,14 @@ def schema_expression(data_type: DataType, is_nullable: bool) -> str:
return "to_timestamp('2020-09-16 06:30:00')"
if isinstance(data_type, ArrayType):
if data_type.structured:
assert isinstance(data_type.element_type, DataType)
element = schema_expression(data_type.element_type, is_nullable)
return f"to_array({element}) :: {convert_sp_to_sf_type(data_type)}"
return "to_array(0)"
if isinstance(data_type, MapType):
if data_type.structured:
assert isinstance(data_type.key_type, DataType)
assert isinstance(data_type.value_type, DataType)
key = schema_expression(data_type.key_type, is_nullable)
value = schema_expression(data_type.value_type, is_nullable)
return f"object_construct_keep_null({key}, {value}) :: {convert_sp_to_sf_type(data_type)}"