
Commit b45a4f2

Merge branch 'main' into yuwang-custom-data-source
2 parents 74e663b + d7caffe commit b45a4f2

278 files changed: +12,592 additions, −1,869 deletions

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 #!/usr/bin/env bash

-gpg --quiet --batch --yes --decrypt --passphrase="$PARAMETER_PASSWORD" --output tests/parameters.py .github/workflows/parameters/parameters_${CLOUD_PROVIDER}.py.gpg
+gpg --quiet --batch --yes --decrypt --passphrase="$PARAMETER_PASSWORD" .github/workflows/parameters/rsa_keys/rsa_key_${CLOUD_PROVIDER}.p8.gpg >> tests/rsa_key_${CLOUD_PROVIDER}.p8
+gpg --quiet --batch --yes --decrypt --passphrase="$PARAMETER_PASSWORD" .github/workflows/parameters/parameters_${CLOUD_PROVIDER}.py.gpg >> tests/parameters.py
 gpg --quiet --batch --yes --decrypt --passphrase="$PARAMETER_PASSWORD" .github/workflows/parameters/parameters_dbapi.py.gpg >> tests/parameters.py

.github/workflows/create-test-branch-from-release.yml

Lines changed: 2 additions & 23 deletions
@@ -1,22 +1,11 @@
 # This workflow automatically creates a test branch from a release tag
 # For example, when release v1.40.0 is published, it creates test-v1.40.0 branch
-# Can also be triggered manually to create a test branch from any existing tag

 name: Create Test Branch from Release

 on:
   release:
     types: [published]
-  workflow_dispatch:
-    inputs:
-      tag_name:
-        description: 'Tag name to create test branch from (e.g., v1.40.0)'
-        required: true
-        type: string
-      test_branch_name:
-        description: 'Test branch name (optional, defaults to test-<tag_name>)'
-        required: false
-        type: string

 permissions:
   contents: write
@@ -29,18 +18,8 @@ jobs:
       - name: Extract tag name
         id: extract_tag
         run: |
-          # Determine tag name based on trigger type
-          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
-            TAG_NAME=${{ inputs.tag_name }}
-            if [ -n "${{ inputs.test_branch_name }}" ]; then
-              TEST_BRANCH_NAME=${{ inputs.test_branch_name }}
-            else
-              TEST_BRANCH_NAME="test-${TAG_NAME}"
-            fi
-          else
-            TAG_NAME=${{ github.event.release.tag_name }}
-            TEST_BRANCH_NAME="test-${TAG_NAME}"
-          fi
+          TAG_NAME=${{ github.event.release.tag_name }}
+          TEST_BRANCH_NAME="test-${TAG_NAME}"

           echo "tag_name=${TAG_NAME}" >> $GITHUB_OUTPUT
           echo "test_branch_name=${TEST_BRANCH_NAME}" >> $GITHUB_OUTPUT
44 Bytes — Binary file not shown.
50 Bytes — Binary file not shown.

.github/workflows/parameters/parameters_gcp.py.gpg

Lines changed: 4 additions & 3 deletions
(Diff of GPG-encrypted content omitted; the ciphertext is not human-readable.)
2.54 KB — Binary file not shown.
2.54 KB — Binary file not shown.
2.53 KB — Binary file not shown.

CHANGELOG.md

Lines changed: 206 additions & 1 deletion
@@ -1,6 +1,161 @@
 # Release History

-## 1.41.0 (YYYY-MM-DD)
+## 1.43.0 (YYYY-MM-DD)
+
+### Snowpark Python API Updates
+
+#### New Features
+
+- Added support for `Session.client_telemetry`.
+- Added support for `Session.udf_profiler`.
+- Added support for `functions.ai_translate`.
+- Added support for the following functions in `functions.py`:
+  - String and Binary functions:
+    - `base64_decode_binary`
+    - `compress`
+    - `decompress_binary`
+    - `decompress_string`
+    - `md5_binary`
+    - `md5_number_lower64`
+    - `md5_number_upper64`
+    - `sha1_binary`
+    - `sha2_binary`
+    - `soundex_p123`
+    - `strtok`
+    - `try_base64_decode_binary`
+    - `try_base64_decode_string`
+    - `try_hex_decode_binary`
+    - `try_hex_decode_string`
+    - `unicode`
+    - `uuid_string`
+  - Conditional expressions:
+    - `booland_agg`
+    - `boolxor_agg`
+    - `regr_valy`
+    - `zeroifnull`
+  - Numeric expressions:
+    - `cot`
+    - `mod`
+    - `pi`
+    - `square`
+    - `width_bucket`
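The helpers above presumably map to the Snowflake SQL functions of the same name. A minimal sketch of calling a few of them, assuming they are exported from `snowflake.snowpark.functions` as the entry implies and that `session` is an already-created `Session`:

```python
# Sketch only: uuid_string, square, and pi are taken from the list above;
# `session` is assumed to be an existing, configured snowflake.snowpark.Session.
from snowflake.snowpark.functions import col, pi, square, uuid_string

df = session.create_dataframe([(3.0,), (4.0,)], schema=["x"])
df.select(
    uuid_string().alias("row_id"),    # random UUID per row
    square(col("x")).alias("x_sq"),   # 9.0, 16.0
    pi().alias("pi"),                 # 3.14159...
).show()
```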
+
+#### Improvements
+
+- Enhanced `DataFrame.sort()` to support `ORDER BY ALL` when no columns are specified.
+- The Catalog API now uses SQL commands instead of SnowAPI calls, which makes the new implementation more reliable.
+
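A hedged sketch of the `DataFrame.sort()` change noted above: with no columns, the generated query is described as ordering by every column (`ORDER BY ALL`). An existing `session` is again assumed:

```python
# Sketch: per the changelog entry, sort() with no arguments now orders by all columns.
df = session.create_dataframe([(2, "b"), (1, "a")], schema=["id", "val"])
df.sort().show()                        # behaves like ORDER BY ALL, i.e. sorted by (id, val)
df.sort("id", ascending=False).show()   # the explicit form is unchanged
```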
+#### Dependency Updates
+
+- The Catalog API no longer uses types declared in `snowflake.core`, so that dependency has been removed.
+
+### Snowpark pandas API Updates
+
+#### New Features
+
+- Added support for `DataFrame.groupby.rolling()`.
+- Added support for mapping `np.percentile` with DataFrame and Series inputs to `Series.quantile`.
+- Added support for setting the `random_state` parameter to an integer when calling `DataFrame.sample` or `Series.sample`.
+
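A short sketch of two of the Snowpark pandas additions above (integer `random_state` and `np.percentile` dispatch). The imports follow the usual Snowpark pandas setup, and an active Snowpark session is assumed:

```python
# Sketch: seeded sampling and np.percentile routing, per the entries above.
import numpy as np
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- registers the Snowflake backend

df = pd.DataFrame({"a": range(10)})
print(df.sample(n=3, random_state=42))  # reproducible sample
print(np.percentile(df["a"], 95))       # routed to Series.quantile under the hood
```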
+#### Improvements
+
+- Enhanced autoswitching functionality from Snowflake to native pandas for methods with unsupported argument combinations:
+  - `shift()` with `suffix` or non-integer `periods` parameters
+  - `sort_index()` with `axis=1` or `key` parameters
+  - `sort_values()` with `axis=1`
+  - `melt()` with `col_level` parameter
+  - `apply()` with `result_type` parameter for DataFrame
+  - `pivot_table()` with `sort=True`, non-string `index` list, non-string `columns` list, non-string `values` list, or `aggfunc` dict with non-string values
+  - `fillna()` with `downcast` parameter or using `limit` together with `value`
+  - `dropna()` with `axis=1`
+  - `asfreq()` with `how` parameter, `fill_value` parameter, `normalize=True`, or `freq` parameter being week, month, quarter, or year
+  - `groupby()` with `axis=1`, `by!=None and level!=None`, or `by` containing any labels that are not pandas-hashable
+  - `groupby_fillna()` with `downcast` parameter
+  - `groupby_first()` with `min_count>1`
+  - `groupby_last()` with `min_count>1`
+  - `groupby_shift()` with `freq` parameter
+- Slightly improved the performance of `agg`, `nunique`, `describe`, and related methods on 1-column DataFrame and Series objects.
+
+#### Bug Fixes
+
+- Fixed a bug in `DataFrameGroupBy.agg` when `func` is a list of tuples used to set the names of the output columns.
+- Fixed a bug where converting a modin datetime index with a timezone to a numpy array with `np.asarray` would cause a `TypeError`.
+- Fixed a bug where `Series.isin` with a Series argument matched index labels instead of row positions.
+
+#### Improvements
+
+- Added support for the following in faster pandas:
+  - `groupby.apply`
+  - `groupby.nunique`
+  - `groupby.size`
+  - `concat`
+  - `copy`
+  - `str.isdigit`
+  - `str.islower`
+  - `str.isupper`
+  - `str.istitle`
+  - `str.lower`
+  - `str.upper`
+  - `str.title`
+  - `str.match`
+  - `str.capitalize`
+  - `str.__getitem__`
+  - `str.center`
+  - `str.count`
+  - `str.get`
+  - `str.pad`
+  - `str.len`
+  - `str.ljust`
+  - `str.rjust`
+  - `str.split`
+  - `str.replace`
+  - `str.strip`
+  - `str.lstrip`
+  - `str.rstrip`
+  - `str.translate`
+  - `dt.tz_localize`
+  - `dt.tz_convert`
+  - `dt.ceil`
+  - `dt.round`
+  - `dt.floor`
+  - `dt.normalize`
+  - `dt.month_name`
+  - `dt.day_name`
+  - `dt.strftime`
+  - `rolling.min`
+  - `rolling.max`
+  - `rolling.count`
+  - `rolling.sum`
+  - `rolling.mean`
+  - `rolling.std`
+  - `rolling.var`
+  - `rolling.sem`
+  - `rolling.corr`
+  - `expanding.min`
+  - `expanding.max`
+  - `expanding.count`
+  - `expanding.sum`
+  - `expanding.mean`
+  - `expanding.std`
+  - `expanding.var`
+  - `expanding.sem`
+  - `cumsum`
+  - `cummin`
+  - `cummax`
+- Made faster pandas disabled by default (opt-in instead of opt-out).
+- Improved performance of `drop_duplicates` by avoiding joins when `keep!=False` in faster pandas.
+
+## 1.42.0 (2025-10-28)
+
+### Snowpark Python API Updates
+
+#### New Features
+
+- The Snowpark Python DB-API reader is now generally available. Access this feature with `DataFrameReader.dbapi()` to read data from a database table or query into a DataFrame using a DBAPI connection.
+
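A minimal sketch of the now-GA DB-API reader. The SQLite file, table name, and `connection_parameters` are placeholders, and `dbapi()` is assumed to take a zero-argument connection factory plus a `table` (or `query`) to read:

```python
# Sketch: read an external table into a Snowpark DataFrame over a DB-API connection.
import sqlite3
from snowflake.snowpark import Session

def create_connection():
    return sqlite3.connect("source.db")  # any DB-API 2.0 connection object

session = Session.builder.configs(connection_parameters).create()  # placeholder config dict
df = session.read.dbapi(create_connection, table="SOURCE_TABLE")   # or query="SELECT ..."
df.show()
```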
+## 1.41.0 (2025-10-23)

 ### Snowpark Python API Updates

@@ -49,21 +204,44 @@
   - `st_y`
   - `st_ymax`
   - `st_ymin`
+  - `st_geogfromgeohash`
+  - `st_geogpointfromgeohash`
+  - `st_geographyfromwkb`
+  - `st_geographyfromwkt`
+  - `st_geometryfromwkb`
+  - `st_geometryfromwkt`
+  - `try_to_geography`
+  - `try_to_geometry`
+
+#### Improvements
+
+- Added a parameter to enable or disable automatic column name aliasing for the `interval_day_time_from_parts` and `interval_year_month_from_parts` functions.

 #### Bug Fixes

 - Fixed a bug where `DataFrameReader.xml` failed to parse XML files with undeclared namespaces when `ignoreNamespace` is `True`.
 - Added a fix for floating point precision discrepancies in `interval_day_time_from_parts`.
 - Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with `to_snowflake` would raise `KeyError`.
 - Fixed a bug where `DataFrameReader.dbapi` (PuPr) was not compatible with oracledb 3.4.0.
+- Fixed a bug where `modin` would unintentionally be imported during session initialization in some scenarios.
+- Fixed a bug where `session.udf|udtf|udaf|sproc.register` failed when an extra session argument was passed. These methods do not expect a session argument; remove it if provided.
+
+#### Improvements
+
+- The default maximum length for inferred StringType columns during schema inference in `DataFrameReader.dbapi` has been increased from 16 MB to 128 MB for parquet-file-based ingestion.

 #### Dependency Updates

 - Updated dependency of `snowflake-connector-python>=3.17,<5.0.0`.

 ### Snowpark pandas API Updates

+#### New Features
+
+- Added support for the `dtypes` parameter of `pd.get_dummies`.
+- Added support for `nunique` in `df.pivot_table`, `df.agg`, and other places where aggregate functions can be used.
+- Added support for `DataFrame.interpolate` and `Series.interpolate` with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL `INTERPOLATE_LINEAR`, `INTERPOLATE_FFILL`, and `INTERPOLATE_BFILL` functions (PuPr).
+
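A small sketch of the interpolate support just listed, using the method names from that entry (an active Snowpark session is assumed):

```python
# Sketch: the three method families named in the entry above.
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401

s = pd.Series([1.0, None, None, 4.0])
print(s.interpolate(method="linear"))  # 1.0, 2.0, 3.0, 4.0 (INTERPOLATE_LINEAR)
print(s.interpolate(method="ffill"))   # forward fill (INTERPOLATE_FFILL)
print(s.interpolate(method="bfill"))   # backward fill (INTERPOLATE_BFILL)
```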
 #### Improvements

 - Improved performance of `Series.to_snowflake` and `pd.to_snowflake(series)` for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
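The parquet switch-over point mentioned above is an ordinary modin config value; a hedged sketch of lowering it (the 8 MiB figure is arbitrary):

```python
# Sketch: tune when Series.to_snowflake / pd.to_snowflake switch to a parquet upload.
import modin.config as cfg

cfg.PandasToSnowflakeParquetThresholdBytes.put(8 * 1024 * 1024)  # switch at ~8 MiB
```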
@@ -105,7 +283,34 @@
   - `dt.days_in_month`
   - `dt.daysinmonth`
   - `sort_values`
+  - `loc` (setting columns)
   - `to_datetime`
+  - `rename`
+  - `drop`
+  - `invert`
+  - `duplicated`
+  - `iloc`
+  - `head`
+  - `columns` (e.g., `df.columns = ["A", "B"]`)
+  - `agg`
+  - `min`
+  - `max`
+  - `count`
+  - `sum`
+  - `mean`
+  - `median`
+  - `std`
+  - `var`
+  - `groupby.agg`
+  - `groupby.min`
+  - `groupby.max`
+  - `groupby.count`
+  - `groupby.sum`
+  - `groupby.mean`
+  - `groupby.median`
+  - `groupby.std`
+  - `groupby.var`
+  - `drop_duplicates`
 - Reuse row count from the relaxed query compiler in `get_axis_len`.

 #### Bug Fixes

docs/source/modin/hybrid_execution.rst

Lines changed: 12 additions & 4 deletions
@@ -1,5 +1,5 @@
 ===========================================
-Hybrid Execution (Public Preview)
+Hybrid Execution
 ===========================================

 Snowpark pandas supports workloads on mixed underlying execution engines and will automatically
@@ -37,8 +37,8 @@ read_snowflake, value_counts, tail, var, std, sum, sem, max, min, mean, agg, agg
 Examples
 ========

-Enabling Hybrid Execution
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Disabling or Enabling Hybrid Execution
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

@@ -140,4 +140,12 @@ Debugging Hybrid Execution

 `pd.explain_switch()` provides information on how execution engine decisions
 are made. This method prints a simplified version of the command unless `simple=False` is
-passed as an argument.
+passed as an argument.
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Hybrid mode will generally perform well with small datasets and traditional notebook
+workloads, but merge-heavy workloads using a star schema can result in moving data too
+often, particularly when tables in the star schema straddle the transfer-cost boundary.
+Since the Snowflake Warehouse is designed for these SQL-like workloads, turning off hybrid
+mode may be desirable.
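A hedged sketch of turning hybrid execution off for such merge-heavy workloads; `AutoSwitchBackend` is assumed to be the relevant modin config toggle (verify the exact switch in your version's documentation):

```python
# Sketch: pin execution to the Snowflake backend and inspect past switch decisions.
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401
from modin.config import AutoSwitchBackend

AutoSwitchBackend.disable()  # assumption: stops automatic engine switching
# ... run the star-schema joins entirely in Snowflake ...
pd.explain_switch()          # shows how engine decisions were made (see above)
```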
