
Commit aae870a

Merge branch 'main' into not-implemented-args
2 parents: 7c92e5a + 33a8ab6 (commit aae870a)

Some file contents are hidden by default for this large commit; only a subset of the changed files is shown below.

60 files changed: +1644, -294 lines

CHANGELOG.md

Lines changed: 32 additions & 1 deletion
@@ -58,14 +58,20 @@
 - `st_geometryfromwkt`
 - `try_to_geography`
 - `try_to_geometry`
-
+- Added a parameter to enable and disable automatic column name aliasing for `interval_day_time_from_parts` and `interval_year_month_from_parts` functions.

 #### Bug Fixes

 - Fixed a bug that `DataFrameReader.xml` fails to parse XML files with undeclared namespaces when `ignoreNamespace` is `True`.
 - Added a fix for floating point precision discrepancies in `interval_day_time_from_parts`.
 - Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with `to_snowflake` would raise `KeyError`.
 - Fixed a bug that `DataFrameReader.dbapi` (PuPr) is not compatible with oracledb 3.4.0.
+- Fixed a bug where `modin` would unintentionally be imported during session initialization in some scenarios.
+- Fixed a bug where `session.udf|udtf|udaf|sproc.register` failed when an extra session argument was passed. These methods do not expect a session argument; please remove it if provided.
+
+#### Improvements
+
+- The default maximum length for inferred StringType columns during schema inference in `DataFrameReader.dbapi` is now increased from 16MB to 128MB in parquet file based ingestion.

 #### Dependency Updates

@@ -74,7 +80,10 @@
 ### Snowpark pandas API Updates

 #### New Features
+
 - Added support for the `dtypes` parameter of `pd.get_dummies`
+- Added support for `nunique` in `df.pivot_table`, `df.agg` and other places where aggregate functions can be used.
+- Added support for `DataFrame.interpolate` and `Series.interpolate` with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL `INTERPOLATE_LINEAR`, `INTERPOLATE_FFILL`, and `INTERPOLATE_BFILL` functions (PuPr).

 #### Improvements

@@ -132,6 +141,28 @@
 - `drop`
 - `invert`
 - `duplicated`
+- `iloc`
+- `head`
+- `columns` (e.g., df.columns = ["A", "B"])
+- `agg`
+- `min`
+- `max`
+- `count`
+- `sum`
+- `mean`
+- `median`
+- `std`
+- `var`
+- `groupby.agg`
+- `groupby.min`
+- `groupby.max`
+- `groupby.count`
+- `groupby.sum`
+- `groupby.mean`
+- `groupby.median`
+- `groupby.std`
+- `groupby.var`
+- `drop_duplicates`
 - Reuse row count from the relaxed query compiler in `get_axis_len`.

 #### Bug Fixes
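
To illustrate the `session.udf|udtf|udaf|sproc.register` fix called out above, here is a minimal hedged sketch of the expected call pattern; the `plus_one` function, the UDF name, and `connection_parameters` are made up for illustration.

    from snowflake.snowpark import Session
    from snowflake.snowpark.types import IntegerType

    # `connection_parameters` is assumed to hold your account credentials.
    session = Session.builder.configs(connection_parameters).create()

    def plus_one(x: int) -> int:
        return x + 1

    # register() already runs against `session`, so no extra session argument is passed.
    plus_one_udf = session.udf.register(
        plus_one,
        return_type=IntegerType(),
        input_types=[IntegerType()],
        name="PLUS_ONE_UDF",  # hypothetical name
        replace=True,
    )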

docs/source/modin/supported/agg_supp.rst

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ methods ``pd.pivot_table``, ``DataFrame.pivot_table``, and ``pd.crosstab``.
 | ``median`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``Y`` |
 | | ``N`` for ``axis=1``. | | | | |
 +-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+
+| ``nunique`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``Y`` |
+| | ``N`` for ``axis=1``. | | | | |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+
 | ``size`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``N`` |
 | | ``N`` for ``axis=1``. | | | | |
 +-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+
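
As a hedged illustration of the new ``nunique`` row, a minimal Snowpark pandas sketch; the sample data and `connection_parameters` are made up, and the imports follow the Snowpark pandas documentation.

    import modin.pandas as pd
    import snowflake.snowpark.modin.plugin  # noqa: F401 -- enables the Snowflake backend
    from snowflake.snowpark import Session

    # `connection_parameters` is assumed to hold your account credentials.
    Session.builder.configs(connection_parameters).create()

    df = pd.DataFrame({"grp": ["a", "a", "b"], "val": [1, 1, 2]})
    df.agg("nunique")                                             # per-column distinct counts (axis=0)
    df.pivot_table(index="grp", values="val", aggfunc="nunique")  # nunique as a pivot aggregation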

docs/source/modin/supported/dataframe_supported.rst

Lines changed: 5 additions & 1 deletion
@@ -227,7 +227,11 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``insert`` | Y | | |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``interpolate`` | N | | |
+| ``interpolate`` | P | | ``N`` if ``axis == 1``, ``limit`` is set, |
+| | | | ``limit_area`` is "outside", or ``method`` is not |
+| | | | "linear", "bfill", "backfill", "ffill", or "pad". |
+| | | | ``limit_area="inside"`` is supported only when |
+| | | | ``method`` is ``linear``. |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``isetitem`` | N | | |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+

docs/source/modin/supported/series_supported.rst

Lines changed: 5 additions & 1 deletion
@@ -243,7 +243,11 @@ Methods
 | ``info`` | D | | Different Index types are used in pandas but not |
 | | | | in Snowpark pandas |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``interpolate`` | N | | |
+| ``interpolate`` | P | | ``N`` if ``limit`` is set, |
+| | | | ``limit_area`` is "outside", or ``method`` is not |
+| | | | "linear", "bfill", "backfill", "ffill", or "pad". |
+| | | | ``limit_area="inside"`` is supported only when |
+| | | | ``method`` is ``linear``. |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``isin`` | Y | | Snowpark pandas deviates with respect to handling |
 | | | | NA values |
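
A similar hedged sketch for the now partially supported ``interpolate`` (same assumptions as the ``nunique`` sketch above; the data is made up).

    import modin.pandas as pd
    import snowflake.snowpark.modin.plugin  # noqa: F401 -- enables the Snowflake backend

    # Assumes an active Snowpark session, as in the `nunique` sketch above.
    s = pd.Series([1.0, None, 3.0, None])
    s.interpolate(method="linear")   # supported; backed by INTERPOLATE_LINEAR
    s.interpolate(method="ffill")    # supported; backed by INTERPOLATE_FFILL
    # s.interpolate(method="polynomial")  # outside the supported methods, per the table above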

src/snowflake/snowpark/_internal/data_source/drivers/base_driver.py

Lines changed: 26 additions & 6 deletions
@@ -11,6 +11,7 @@
     Connection,
     Cursor,
 )
+from snowflake.snowpark._internal.server_connection import MAX_STRING_SIZE
 from snowflake.snowpark._internal.utils import (
     get_sorted_key_for_version,
     measure_time,
@@ -27,6 +28,7 @@
     BinaryType,
     DateType,
     BooleanType,
+    StringType,
 )
 import snowflake.snowpark
 import logging
@@ -103,7 +105,16 @@ def infer_schema_from_description(
         query_input_alias: str,
     ) -> StructType:
         self.get_raw_schema(table_or_query, cursor, is_query, query_input_alias)
-        return self.to_snow_type(self.raw_schema)
+        generated_schema = self.to_snow_type(self.raw_schema)
+        # snowflake will default string length to 128MB in the bundle which will be enabled in 2026-01
+        # https://docs.snowflake.com/en/release-notes/bcr-bundles/2025_07_bundle
+        # here we prematurely make the change to default string to
+        # 1. align the string length with UDTF based ingestion
+        # 2. avoid the BCR impact to dbapi feature
+        for field in generated_schema.fields:
+            if isinstance(field.datatype, StringType) and field.datatype.length is None:
+                field.datatype.length = MAX_STRING_SIZE
+        return generated_schema

     def infer_schema_from_description_with_error_control(
         self, table_or_query: str, is_query: bool, query_input_alias: str
@@ -177,13 +188,17 @@ def udtf_ingestion(
             packages=packages or UDTF_PACKAGE_MAP.get(self.dbms_type),
             imports=imports,
             statement_params=statement_params,
+            _emit_ast=_emit_ast,  # internal function call, _emit_ast will be set to False by the caller
         )
         logger.debug(f"register ingestion udtf takes: {udtf_register_time()} seconds")
         call_udtf_sql = f"""
         select * from {partition_table}, table({udtf_name}({PARTITION_TABLE_COLUMN_NAME}))
         """
         res = session.sql(call_udtf_sql, _emit_ast=_emit_ast)
-        return self.to_result_snowpark_df_udtf(res, schema, _emit_ast=_emit_ast)
+        return BaseDriver.keep_nullable_attributes(
+            self.to_result_snowpark_df_udtf(res, schema, _emit_ast=_emit_ast),
+            schema,
+        )

     def udtf_class_builder(
         self,
@@ -283,6 +298,14 @@ def to_result_snowpark_df(
     ) -> "DataFrame":
         return session.table(table_name, _emit_ast=_emit_ast)

+    @staticmethod
+    def keep_nullable_attributes(
+        selected_df: "DataFrame", schema: StructType
+    ) -> "DataFrame":
+        for attr, source_field in zip(selected_df._plan.attributes, schema.fields):
+            attr.nullable = source_field.nullable
+        return selected_df
+
     @staticmethod
     def to_result_snowpark_df_udtf(
         res_df: "DataFrame",
@@ -293,10 +316,7 @@ def to_result_snowpark_df_udtf(
             res_df[field.name].cast(field.datatype).alias(field.name)
             for field in schema.fields
         ]
-        selected_df = res_df.select(cols, _emit_ast=_emit_ast)
-        for attr, source_field in zip(selected_df._plan.attributes, schema.fields):
-            attr.nullable = source_field.nullable
-        return selected_df
+        return res_df.select(cols, _emit_ast=_emit_ast)

     def get_server_cursor_if_supported(self, conn: "Connection") -> "Cursor":
         """

src/snowflake/snowpark/_internal/server_connection.py

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@
 PARAM_INTERNAL_APPLICATION_NAME = "internal_application_name"
 PARAM_INTERNAL_APPLICATION_VERSION = "internal_application_version"
 DEFAULT_STRING_SIZE = 16777216
+MAX_STRING_SIZE = 134217728


 def _build_target_path(stage_location: str, dest_prefix: str = "") -> str:
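
For reference, the two constants line up with the 16MB-to-128MB change noted in the CHANGELOG:

    # Plain arithmetic check of the constants above.
    assert 16 * 1024 * 1024 == 16777216     # DEFAULT_STRING_SIZE, i.e. 16MB
    assert 128 * 1024 * 1024 == 134217728   # MAX_STRING_SIZE, i.e. 128MB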

src/snowflake/snowpark/_internal/udf_utils.py

Lines changed: 3 additions & 7 deletions
@@ -1134,7 +1134,7 @@ def resolve_imports_and_packages(
     skip_upload_on_content_match: bool = False,
     is_permanent: bool = False,
     force_inline_code: bool = False,
-    **kwargs,
+    _suppress_local_package_warnings: bool = False,
 ) -> Tuple[
     Optional[str],
     Optional[str],
@@ -1168,9 +1168,7 @@
             packages,
             include_pandas=is_pandas_udf,
             statement_params=statement_params,
-            _suppress_local_package_warnings=kwargs.get(
-                "_suppress_local_package_warnings", False
-            ),
+            _suppress_local_package_warnings=_suppress_local_package_warnings,
         )
         if packages is not None
         else session._resolve_packages(
@@ -1179,9 +1177,7 @@
             validate_package=False,
             include_pandas=is_pandas_udf,
             statement_params=statement_params,
-            _suppress_local_package_warnings=kwargs.get(
-                "_suppress_local_package_warnings", False
-            ),
+            _suppress_local_package_warnings=_suppress_local_package_warnings,
         )
     )
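
The move from ``**kwargs`` to an explicit keyword parameter is a general Python pattern rather than anything Snowpark-specific; a generic sketch of the difference (``resolve`` is a made-up stand-in):

    # With an explicit keyword-only parameter, a misspelled or unexpected option
    # fails loudly instead of being silently swallowed by a **kwargs catch-all.
    def resolve(*, _suppress_local_package_warnings: bool = False) -> bool:
        return _suppress_local_package_warnings

    resolve(_suppress_local_package_warnings=True)   # OK
    # resolve(_supress_local_package_warnings=True)  # TypeError: unexpected keyword argument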

src/snowflake/snowpark/async_job.py

Lines changed: 27 additions & 3 deletions
@@ -284,10 +284,34 @@ def cancel(self) -> None:
                 "ENABLE_ASYNC_QUERY_IN_PYTHON_STORED_PROCS", False
             )
         ):
-            cancel_resp = self._session._conn._conn.cancel_query(self.query_id)
-            if not cancel_resp.get("success", False):
+            import _snowflake
+            import json
+            import uuid
+
+            try:
+                uuid.UUID(self.query_id)
+            except ValueError:
+                raise ValueError(f"Invalid UUID: '{self.query_id}'")
+
+            raw_cancel_resp = _snowflake.cancel_query(self.query_id)
+
+            # Set failure_response when
+            # - success != True in the response or
+            # - cannot parse the response at all.
+            failure_response = None
+            try:
+                parsed_cancel_resp = json.loads(raw_cancel_resp)
+                if not parsed_cancel_resp.get("success", False):
+                    failure_response = parsed_cancel_resp
+            except (TypeError, json.JSONDecodeError) as e:
+                failure_response = {
+                    "success": False,
+                    "error": f"Error parsing response: {e}",
+                }
+
+            if failure_response:
                 raise DatabaseError(
-                    f"Failed to cancel query. Returned response: {cancel_resp}"
+                    f"Failed to cancel query. Returned response: {failure_response}"
                 )
         else:
             self._cursor.execute(f"select SYSTEM$CANCEL_QUERY('{self.query_id}')")
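
A hedged usage sketch of the ``cancel`` path being changed; ``collect_nowait`` and ``cancel`` are public Snowpark APIs, while the query and ``connection_parameters`` are made up.

    from snowflake.snowpark import Session

    # `connection_parameters` is assumed to hold your account credentials.
    session = Session.builder.configs(connection_parameters).create()

    # Start a long-running query asynchronously, then cancel it.
    job = session.sql(
        "select seq4() from table(generator(rowcount => 1000000000))"
    ).collect_nowait()
    job.cancel()  # raises DatabaseError if the cancellation response does not report success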

src/snowflake/snowpark/dataframe_reader.py

Lines changed: 18 additions & 12 deletions
@@ -1707,18 +1707,24 @@ def dbapi(
         Reads data from a database table or query into a DataFrame using a DBAPI connection,
         with support for optional partitioning, parallel processing, and query customization.

-        There are multiple methods to partition data and accelerate ingestion.
-        These methods can be combined to achieve optimal performance:
-
-        1.Use column, lower_bound, upper_bound and num_partitions at the same time when you need to split large tables into smaller partitions for parallel processing.
-        These must all be specified together, otherwise error will be raised.
-        2.Set max_workers to a proper positive integer.
-        This defines the maximum number of processes and threads used for parallel execution.
-        3.Adjusting fetch_size can optimize performance by reducing the number of round trips to the database.
-        4.Use predicates to defining WHERE conditions for partitions,
-        predicates will be ignored if column is specified to generate partition.
-        5.Set custom_schema to avoid snowpark infer schema, custom_schema must have a matched
-        column name with table in external data source.
+        Usage Notes:
+            - Ingestion performance tuning:
+                - **Partitioning**: Use ``column``, ``lower_bound``, ``upper_bound``, and ``num_partitions``
+                  together to split large tables into smaller partitions for parallel processing.
+                  All four parameters must be specified together, otherwise an error will be raised.
+                - **Parallel execution**: Set ``max_workers`` to control the maximum number of processes
+                  and threads used for parallel execution.
+                - **Fetch optimization**: Adjust ``fetch_size`` to optimize performance by reducing
+                  the number of round trips to the database.
+                - **Partition filtering**: Use ``predicates`` to define WHERE conditions for partitions.
+                  Note that ``predicates`` will be ignored if ``column`` is specified for partitioning.
+                - **Schema specification**: Set ``custom_schema`` to skip schema inference. The custom schema
+                  must have matching column names with the table in the external data source.
+            - Execution timing and error handling:
+                - **UDTF Ingestion**: Uses lazy evaluation. Errors are reported as ``SnowparkSQLException``
+                  during DataFrame actions (e.g., ``DataFrame.collect()``).
+                - **Local Ingestion**: Uses eager execution. Errors are reported immediately as
+                  ``SnowparkDataFrameReaderException`` when this method is called.

         Args:
             create_connection: A callable that returns a DB-API compatible database connection.
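
A hedged example of the tuning knobs described in the rewritten docstring. The partitioning, worker, and fetch parameter names come from the notes above; ``table=`` is assumed from the method description, and ``create_oracle_connection``, the table name, the DSN, and ``connection_parameters`` are made up.

    import oracledb
    from snowflake.snowpark import Session

    # `connection_parameters` is assumed to hold your account credentials.
    session = Session.builder.configs(connection_parameters).create()

    def create_oracle_connection():
        # Any DB-API compatible connection factory works here; the DSN is made up.
        return oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")

    df = session.read.dbapi(
        create_oracle_connection,
        table="ORDERS",          # hypothetical source table
        column="ORDER_ID",       # partitioning column ...
        lower_bound=0,           # ... with bounds and a partition count,
        upper_bound=1_000_000,   # all four specified together
        num_partitions=8,
        max_workers=4,           # parallel processes/threads
        fetch_size=10_000,       # rows per database round trip
    )
    df.collect()  # with UDTF ingestion, errors surface lazily here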
