Commit 45abcf2

Merge branch 'main' into SNOW-2364943
2 parents fc35b5e + 30fa5f3 commit 45abcf2

84 files changed (+4345, -548 lines changed)

2.54 KB
Binary file not shown.
2.54 KB
Binary file not shown.
2.53 KB
Binary file not shown.

CHANGELOG.md

Lines changed: 63 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -49,21 +49,43 @@
4949
- `st_y`
5050
- `st_ymax`
5151
- `st_ymin`
52-
52+
- `st_geogfromgeohash`
53+
- `st_geogpointfromgeohash`
54+
- `st_geographyfromwkb`
55+
- `st_geographyfromwkt`
56+
- `st_geometryfromwkb`
57+
- `st_geometryfromwkt`
58+
- `try_to_geography`
59+
- `try_to_geometry`
60+
- Added a parameter to enable and disable automatic column name aliasing for `interval_day_time_from_parts` and `interval_year_month_from_parts` functions.
5361

5462
#### Bug Fixes
5563

5664
- Fixed a bug where `DataFrameReader.xml` failed to parse XML files with undeclared namespaces when `ignoreNamespace` is `True`.
5765
- Added a fix for floating point precision discrepancies in `interval_day_time_from_parts`.
5866
- Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with `to_snowflake` would raise `KeyError`.
5967
- Fixed a bug where `DataFrameReader.dbapi` (PuPr) was not compatible with oracledb 3.4.0.
68+
- Fixed a bug where `modin` would unintentionally be imported during session initialization in some scenarios.
69+
- Fixed a bug where `session.udf|udtf|udaf|sproc.register` failed when an extra session argument was passed. These methods do not expect a session argument; please remove it if provided.
70+
- Fixed a bug in `DataFrameGroupBy.agg` where `func` is a list of tuples used to set the names of the output columns.
71+
72+
#### Improvements
73+
74+
- The default maximum length for inferred StringType columns during schema inference in `DataFrameReader.dbapi` has been increased from 16MB to 128MB in Parquet file-based ingestion.
6075

6176
#### Dependency Updates
6277

6378
- Updated dependency of `snowflake-connector-python>=3.17,<5.0.0`.
6479

6580
### Snowpark pandas API Updates
6681

82+
#### New Features
83+
84+
- Added support for the `dtypes` parameter of `pd.get_dummies`.
85+
- Added support for `nunique` in `df.pivot_table`, `df.agg` and other places where aggregate functions can be used.
86+
- Added support for `DataFrame.interpolate` and `Series.interpolate` with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL `INTERPOLATE_LINEAR`, `INTERPOLATE_FFILL`, and `INTERPOLATE_BFILL` functions (PuPr).
87+
- Added support for `DataFrame.groupby.rolling()`.
88+
6789
#### Improvements
6890

6991
- Improved performance of `Series.to_snowflake` and `pd.to_snowflake(series)` for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
@@ -73,6 +95,16 @@
7395
- `skew()` with `axis=1` or `numeric_only=False` parameters
7496
- `round()` with `decimals` parameter as a Series
7597
- `corr()` with `method!=pearson` parameter
98+
- `shift()` with `suffix` or non-integer `periods` parameters
99+
- `sort_index()` with `axis=1` or `key` parameters
100+
- `sort_values()` with `axis=1`
101+
- `melt()` with `col_level` parameter
102+
- `apply()` with `result_type` parameter for DataFrame
103+
- `pivot_table()` with `sort=True`, non-string `index` list, non-string `columns` list, non-string `values` list, or `aggfunc` dict with non-string values
104+
- `fillna()` with `downcast` parameter or using `limit` together with `value`
105+
- `dropna()` with `axis=1`
106+
107+
76108
- Set `cte_optimization_enabled` to True for all Snowpark pandas sessions.
77109
- Add support for the following in faster pandas:
78110
- `isin`
@@ -105,7 +137,37 @@
105137
- `dt.days_in_month`
106138
- `dt.daysinmonth`
107139
- `sort_values`
140+
- `loc` (setting columns)
108141
- `to_datetime`
142+
- `rename`
143+
- `drop`
144+
- `invert`
145+
- `duplicated`
146+
- `iloc`
147+
- `head`
148+
- `columns` (e.g., df.columns = ["A", "B"])
149+
- `agg`
150+
- `min`
151+
- `max`
152+
- `count`
153+
- `sum`
154+
- `mean`
155+
- `median`
156+
- `std`
157+
- `var`
158+
- `groupby.agg`
159+
- `groupby.min`
160+
- `groupby.max`
161+
- `groupby.count`
162+
- `groupby.sum`
163+
- `groupby.mean`
164+
- `groupby.median`
165+
- `groupby.std`
166+
- `groupby.var`
167+
- `groupby.nunique`
168+
- `groupby.size`
169+
- `groupby.apply`
170+
- `drop_duplicates`
109171
- Reuse row count from the relaxed query compiler in `get_axis_len`.
110172

111173
#### Bug Fixes

docs/source/modin/hybrid_execution.rst

Lines changed: 12 additions & 4 deletions
@@ -1,5 +1,5 @@
11
===========================================
2-
Hybrid Execution (Public Preview)
2+
Hybrid Execution
33
===========================================
44

55
Snowpark pandas supports workloads on mixed underlying execution engines and will automatically
@@ -37,8 +37,8 @@ read_snowflake, value_counts, tail, var, std, sum, sem, max, min, mean, agg, agg
3737
Examples
3838
========
3939

40-
Enabling Hybrid Execution
41-
~~~~~~~~~~~~~~~~~~~~~~~~~
40+
Disabling or Enabling Hybrid Execution
41+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4242

4343
.. code-block:: python
4444
@@ -140,4 +140,12 @@ Debugging Hybrid Execution
140140

141141
`pd.explain_switch()` provides information on how execution engine decisions
142142
are made. This method prints a simplified version of the command unless `simple=False` is
143-
passed as an argument.
143+
passed as an argument.
144+
145+
Performance Considerations
146+
~~~~~~~~~~~~~~~~~~~~~~~~~~
147+
Hybrid mode will generally perform well with small datasets and traditional notebook
148+
workloads, but merge-heavy workloads using a star schema can result in moving data too
149+
often, particularly when tables in the star schema straddle the transfer-cost boundary.
150+
Since the Snowflake Warehouse is designed for these SQL-like workloads, turning off hybrid
151+
mode may be desirable.
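
The star-schema caveat above can be illustrated with a toy model. This is a sketch only: the row threshold, the table sizes, and the rule that a merge "moves data" whenever its two inputs sit on different backends are all hypothetical simplifications. None of it reflects how Snowpark pandas actually estimates transfer cost, and `backend_for` and `count_transfers` are illustrative names, not library functions.

```python
# Toy model only: the threshold, the table sizes, and the placement rule are
# hypothetical and are not Snowpark pandas internals.
TRANSFER_THRESHOLD_ROWS = 1_000_000  # hypothetical transfer-cost boundary


def backend_for(rows: int) -> str:
    """Place small tables locally and large tables in the warehouse."""
    return "local" if rows < TRANSFER_THRESHOLD_ROWS else "warehouse"


def count_transfers(fact_rows: int, dimension_rows: list[int]) -> int:
    """Count merges whose two inputs start on different backends."""
    fact_backend = backend_for(fact_rows)
    return sum(1 for rows in dimension_rows if backend_for(rows) != fact_backend)


# A large fact table merged with four small dimension tables:
# every merge crosses the boundary.
straddling = count_transfers(50_000_000, [200, 5_000, 80_000, 900_000])

# The same shape with every table above the boundary: no merge crosses it.
all_remote = count_transfers(
    50_000_000, [2_000_000, 5_000_000, 3_000_000, 1_500_000]
)

print(straddling, all_remote)
```

In this model the schema that straddles the boundary pays for a transfer on every merge, while the schema that stays on one side pays for none. That is the situation in which keeping all work in the warehouse by turning off hybrid mode may be preferable.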

docs/source/modin/supported/agg_supp.rst

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ methods ``pd.pivot_table``, ``DataFrame.pivot_table``, and ``pd.crosstab``.
3838
| ``median`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``Y`` |
3939
| | ``N`` for ``axis=1``. | | | | |
4040
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+
41+
| ``nunique`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``Y`` |
42+
| | ``N`` for ``axis=1``. | | | | |
43+
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+
4144
| ``size`` | ``Y`` for ``axis=0``. | ``Y`` | ``Y`` | ``Y`` | ``N`` |
4245
| | ``N`` for ``axis=1``. | | | | |
4346
+-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+-----------------------------------------+

docs/source/modin/supported/dataframe_supported.rst

Lines changed: 5 additions & 1 deletion
@@ -227,7 +227,11 @@ Methods
227227
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
228228
| ``insert`` | Y | | |
229229
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
230-
| ``interpolate`` | N | | |
230+
| ``interpolate`` | P | | ``N`` if ``axis == 1``, ``limit`` is set, |
231+
| | | | ``limit_area`` is "outside", or ``method`` is not |
232+
| | | | "linear", "bfill", "backfill", "ffill", or "pad". |
233+
| | | | ``limit_area="inside"`` is supported only when |
234+
| | | | ``method`` is ``linear``. |
231235
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
232236
| ``isetitem`` | N | | |
233237
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
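
As a rough illustration of the fill methods named in the ``interpolate`` row, the sketch below implements them in plain Python on a list that uses ``None`` for missing values. It is not the Snowpark pandas implementation and does not call the Snowflake SQL functions. The helper names are hypothetical, and the linear variant fills only interior gaps, which corresponds to ``limit_area="inside"``.

```python
from typing import Optional

Values = list[Optional[float]]


def interpolate_linear_inside(values: Values) -> Values:
    """Linearly fill runs of None that have a known value on both sides."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


def forward_fill(values: Values) -> Values:
    """Replace each None with the most recent known value, if any."""
    result, last = [], None
    for v in values:
        if v is not None:
            last = v
        result.append(last)
    return result


def backward_fill(values: Values) -> Values:
    """Replace each None with the next known value, if any."""
    return forward_fill(values[::-1])[::-1]


data = [None, 1.0, None, None, 4.0, None]
print(interpolate_linear_inside(data))  # [None, 1.0, 2.0, 3.0, 4.0, None]
print(forward_fill(data))               # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
print(backward_fill(data))              # [1.0, 1.0, 4.0, 4.0, 4.0, None]
```

Leading and trailing gaps stay unfilled in the linear helper because they have a known value on only one side.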

docs/source/modin/supported/general_supported.rst

Lines changed: 2 additions & 2 deletions
@@ -32,8 +32,8 @@ Data manipulations
3232
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
3333
| ``from_dummies`` | N | | |
3434
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
35-
| ``get_dummies`` | P | ``sparse`` is ignored | ``Y`` if params ``dummy_na``, ``drop_first`` |
36-
| | | | and ``dtype`` are default, otherwise ``N`` |
35+
| ``get_dummies`` | P | ``sparse`` is ignored | ``Y`` if params ``dummy_na`` and ``drop_first`` |
36+
| | | | are default, otherwise ``N`` |
3737
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
3838
| ``json_normalize`` | Y | | |
3939
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+

docs/source/modin/supported/groupby_supported.rst

Lines changed: 4 additions & 1 deletion
@@ -153,7 +153,10 @@ Computations/descriptive stats
153153
| | | will be lost. ``rule`` frequencies 's', 'min', |
154154
| | | 'h', and 'D' are supported. |
155155
+-----------------------------+---------------------------------+----------------------------------------------------+
156-
| ``rolling`` | N | |
156+
| ``rolling``                 | P                               | Implemented for DataFrameGroupBy objects. ``N`` for|
157+
| | | ``on``, non-integer ``window``, ``axis = 1``, |
158+
| | | ``method`` != ``single``, ``min_periods = 0``, or |
159+
| | | ``closed`` != ``None``. |
157160
+-----------------------------+---------------------------------+----------------------------------------------------+
158161
| ``sample`` | N | |
159162
+-----------------------------+---------------------------------+----------------------------------------------------+
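
To make the supported ``rolling`` case concrete (an integer ``window`` applied within each group), here is a minimal plain-Python sketch of a grouped rolling sum. It is an illustration of the semantics under simplifying assumptions, not the Snowpark pandas code path: ``grouped_rolling_sum`` is a hypothetical helper, the input contains no missing values, and ``min_periods`` defaults to ``window`` as it does for integer windows in pandas.

```python
from collections import defaultdict


def grouped_rolling_sum(keys, values, window, min_periods=None):
    """Return {group: rolling sums}, with None where the window is too short."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    if min_periods is None:
        min_periods = window

    # Split the values by group key, preserving their original order.
    per_group = defaultdict(list)
    for key, value in zip(keys, values):
        per_group[key].append(value)

    result = {}
    for key, series in per_group.items():
        sums = []
        for end in range(1, len(series) + 1):
            frame = series[max(0, end - window):end]
            sums.append(sum(frame) if len(frame) >= min_periods else None)
        result[key] = sums
    return result


keys = ["a", "a", "b", "a", "b"]
values = [1, 2, 10, 3, 20]
print(grouped_rolling_sum(keys, values, window=2))
# {'a': [None, 3, 5], 'b': [None, 30]}
```

Each group is windowed independently, so the first row of every group has no full window and yields ``None`` here (pandas reports ``NaN`` in the same position).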

docs/source/modin/supported/series_supported.rst

Lines changed: 5 additions & 1 deletion
@@ -243,7 +243,11 @@ Methods
243243
| ``info`` | D | | Different Index types are used in pandas but not |
244244
| | | | in Snowpark pandas |
245245
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
246-
| ``interpolate`` | N | | |
246+
| ``interpolate`` | P | | ``N`` if ``limit`` is set, |
247+
| | | | ``limit_area`` is "outside", or ``method`` is not |
248+
| | | | "linear", "bfill", "backfill", "ffill", or "pad". |
249+
| | | | ``limit_area="inside"`` is supported only when |
250+
| | | | ``method`` is ``linear``. |
247251
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
248252
| ``isin`` | Y | | Snowpark pandas deviates with respect to handling |
249253
| | | | NA values |
