- Added support for the following functions in `functions.py`:
  - `array_reverse`
  - `divnull`
  - `map_cat`
  - `map_contains_key`
  - `map_keys`
  - `nullifzero`
  - `snowflake_cortex_sentiment`
- Added `Catalog` class to manage Snowflake objects. It can be accessed via `Session.catalog`.
- Added new methods in class `DataFrame`:
  - `col_regex`: Select columns that match the provided regex.
  - `map` and its alias `foreach`: Apply a user function to each row with a one-to-one mapping.
  - `flat_map`: Apply a user function to each row with a one-to-many mapping.
  - `toJSON` and its alias `to_json`: Convert each row of the DataFrame into a JSON string.
  - `transform`: Chain multiple transformations on a DataFrame.
- Updated README.md to include instructions on how to verify package signatures using `cosign`.
- Added support for `Series.str.ljust` and `Series.str.rjust`.
- Added support for `Series.str.center`.
- Added support for `Series.str.pad`.
- Added support for applying Snowpark Python function `snowflake_cortex_sentiment`.
- Added support for `DataFrame.map`.
- Added support for `DataFrame.from_dict` and `DataFrame.from_records`.
- Added support for mixed case field names in struct type columns.
- Added support for `SeriesGroupBy.unique`.
- Added support for `Series.dt.strftime` with the following directives:
  - %d: Day of the month as a zero-padded decimal number.
  - %m: Month as a zero-padded decimal number.
  - %Y: Year with century as a decimal number.
  - %H: Hour (24-hour clock) as a zero-padded decimal number.
  - %M: Minute as a zero-padded decimal number.
  - %S: Second as a zero-padded decimal number.
  - %f: Microsecond as a decimal number, zero-padded to 6 digits.
  - %j: Day of the year as a zero-padded decimal number.
  - %X: Locale’s appropriate time representation.
  - %%: A literal '%' character.
- Added support for `Series.between`.
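
The `strftime` directives listed above follow the standard Python format codes. As a rough illustration of what each one produces, here is a minimal sketch using the standard library's `datetime` rather than Snowpark pandas; the timestamp value is made up for the example, and `%X` is left out because its output depends on the locale.

```python
from datetime import datetime

# Hypothetical timestamp chosen so that zero-padding is visible.
ts = datetime(2024, 3, 7, 9, 5, 2, 123)

# %Y, %m, %d, %H, %M, %S and %f combined into one pattern.
formatted = ts.strftime("%Y-%m-%d %H:%M:%S.%f")

# %j is the zero-padded day of the year; %% is a literal percent sign.
day_of_year = ts.strftime("%j")
literal_percent = ts.strftime("100%%")

print(formatted)        # 2024-03-07 09:05:02.000123
print(day_of_year)      # 067
print(literal_percent)  # 100%
```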
- Fixed a bug where system functions called through `session.call` had incorrect type conversion.
- Improved performance of `DataFrame.map`, `Series.apply` and `Series.map` methods by mapping numpy functions to Snowpark functions where possible.
- Updated integration testing for `session.lineage.trace` to exclude deleted objects.
- Added documentation for `DataFrame.map`.
- Improved performance of `DataFrame.apply` by mapping numpy functions to Snowpark functions where possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn.
- Added support for property `version` and class method `get_active_session` for `Session` class.
- Added new methods and variables to enhance data type handling and JSON serialization/deserialization:
  - To `DataType`, its derived classes, and `StructField`:
    - `type_name`: Returns the type name of the data.
    - `simple_string`: Provides a simple string representation of the data.
    - `json_value`: Returns the data as a JSON-compatible value.
    - `json`: Converts the data to a JSON string.
  - To `ArrayType`, `MapType`, `StructField`, `PandasSeriesType`, `PandasDataFrameType` and `StructType`:
    - `from_json`: Enables these types to be created from JSON data.
  - To `MapType`:
    - `keyType`: keys of the map
    - `valueType`: values of the map
- Added support for method `appName` in `SessionBuilder`.
- Added support for `include_nulls` argument in `DataFrame.unpivot`.
- Added support for the following functions in `functions.py`:
  - `size` to get the size of array, object, or map columns.
  - `collect_list`, an alias of `array_agg`.
  - `substring` makes the `len` argument optional.
- Added parameter `ast_enabled` to session for internal usage (default: `False`).
- Added support for specifying the following to `DataFrame.create_or_replace_dynamic_table`:
  - `iceberg_config`: A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for nested data types to `DataFrame.print_schema`.
- Added support for `level` parameter to `DataFrame.print_schema`.
- Improved flexibility of `DataFrameReader` and `DataFrameWriter` API by adding support for the following:
  - Added `format` method to `DataFrameReader` and `DataFrameWriter` to specify file format when loading or unloading results.
  - Added `load` method to `DataFrameReader` to work in conjunction with `format`.
  - Added `save` method to `DataFrameWriter` to work in conjunction with `format`.
  - Added support to read keyword arguments to `options` method for `DataFrameReader` and `DataFrameWriter`.
- Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, `cloudpickle==2.2.1` remains the only supported version.
- Removed warnings that dynamic pivot features were in private preview, because dynamic pivot is now generally available.
- Fixed a bug in `session.read.options` where `False` Boolean values were incorrectly parsed as `True` in the generated file format.
- Added a runtime dependency on `python-dateutil`.
- Added partial support for `Series.map` when `arg` is a pandas `Series` or a `collections.abc.Mapping`. No support for instances of `dict` that implement `__missing__` but are not instances of `collections.defaultdict`.
- Added support for `DataFrame.align` and `Series.align` for `axis=1` and `axis=None`.
- Added support for `pd.json_normalize`.
- Added support for `GroupBy.pct_change` with `axis=0`, `freq=None`, and `limit=None`.
- Added support for `DataFrameGroupBy.__iter__` and `SeriesGroupBy.__iter__`.
- Added support for `np.sqrt`, `np.trunc`, `np.floor`, numpy trig functions, `np.exp`, `np.abs`, `np.positive` and `np.negative`.
- Added partial support for the dataframe interchange protocol method `DataFrame.__dataframe__()`.
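
The `Series.map` entry above excludes one specific kind of mapping: a `dict` subclass that defines `__missing__` without being a `collections.defaultdict`. The sketch below uses only the standard library to show which objects fall into that category. `FallbackDict` and `is_unsupported_arg` are hypothetical names introduced for illustration and are not part of Snowpark pandas.

```python
from collections import defaultdict


class FallbackDict(dict):
    """A dict subclass that supplies a value for missing keys."""

    def __missing__(self, key):
        return "unknown"


def is_unsupported_arg(arg):
    # Hypothetical check mirroring the wording of the release note:
    # a dict whose type defines __missing__ but is not a defaultdict.
    return (
        isinstance(arg, dict)
        and hasattr(type(arg), "__missing__")
        and not isinstance(arg, defaultdict)
    )


plain = {"a": 1}
fallback = FallbackDict(a=1)
default = defaultdict(lambda: "unknown", a=1)

print(is_unsupported_arg(plain))     # False
print(is_unsupported_arg(fallback))  # True
print(is_unsupported_arg(default))   # False
print(fallback["z"])                 # unknown
```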
- Fixed a bug in `df.loc` where setting a single column from a series results in unexpected `None` values.
- Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.
- Improved documentation for pd.read_excel.
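
Several entries above concern keeping null values during unpivot (`include_nulls` in `DataFrame.unpivot`, and `UNPIVOT INCLUDE NULLS` for pandas). The following is a minimal pure-Python sketch of that semantics, assuming rows are plain dictionaries. The `unpivot` helper and its output column names `name` and `value` are hypothetical and do not reflect the Snowpark implementation.

```python
def unpivot(rows, id_col, value_cols, include_nulls=False):
    """Turn each value column into its own (id, name, value) row."""
    out = []
    for row in rows:
        for col in value_cols:
            value = row[col]
            # Without include_nulls, rows whose value is None are dropped.
            if value is None and not include_nulls:
                continue
            out.append({id_col: row[id_col], "name": col, "value": value})
    return out


rows = [
    {"id": 1, "jan": 10, "feb": None},
    {"id": 2, "jan": None, "feb": 20},
]

without_nulls = unpivot(rows, "id", ["jan", "feb"])
with_nulls = unpivot(rows, "id", ["jan", "feb"], include_nulls=True)

print(len(without_nulls))  # 2
print(len(with_nulls))     # 4
```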
- Added the following new functions in `snowflake.snowpark.dataframe`:
  - `map`
- Added support for passing parameter `include_error` to `Session.query_history` to record queries that have errors during execution.
- When target stage is not set in profiler, a default stage from `Session.get_session_stage` is used instead of raising `SnowparkSQLException`.
- Allowed lower case or mixed case input when calling `Session.stored_procedure_profiler.set_active_profiler`.
- Added distributed tracing using open telemetry APIs for action function in `DataFrame`:
  - `cache_result`
- Removed opentelemetry warning from logging.
- Fixed the pre-action and post-action query propagation when `In` expressions were used in selects.
- Fixed a bug that raised `AttributeError` while calling `Session.stored_procedure_profiler.get_output` when `Session.stored_procedure_profiler` is disabled.
- Added a dependency on `protobuf>=5.28` and `tzlocal` at runtime.
- Added a dependency on `protoc-wheel-0` for the development profile.
- Require `snowflake-connector-python>=3.12.0, <4.0.0` (was `>=3.10.0`).
- Updated `modin` from 0.28.1 to 0.30.1.
- Added support for all `pandas` 2.2.x versions.
- Added support for `Index.to_numpy`.
- Added support for `DataFrame.align` and `Series.align` for `axis=0`.
- Added support for `size` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for `snowflake.snowpark.functions.window`.
- Added support for `pd.read_pickle` (Uses native pandas for processing).
- Added support for `pd.read_html` (Uses native pandas for processing).
- Added support for `pd.read_xml` (Uses native pandas for processing).
- Added support for aggregation functions `"size"` and `len` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for list values in `Series.str.len`.
- Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. `pd.DataFrame([0]).agg(np.mean)`) would fail to transpose the result.
- Fixed bugs where `DataFrame.dropna()` would:
  - Treat an empty `subset` (e.g. `[]`) as if it specified all columns instead of no columns.
  - Raise a `TypeError` for a scalar `subset` instead of filtering on just that column.
  - Raise a `ValueError` for a `subset` of type `pandas.Index` instead of filtering on the columns in the index.
- Disabled creation of scoped read-only tables to mitigate `TableNotFoundError` when using dynamic pivot in notebook environments.
- Fixed a bug when concatenating DataFrame or Series objects that come from the same DataFrame when `axis = 1`.
- Improve np.where with scalar x value by eliminating unnecessary join and temp table creation.
- Improve get_dummies performance by flattening the pivot with join.
- Improve align performance when aligning on row position column by removing unnecessary window functions.
- Added support for patching functions that are unavailable in the `snowflake.snowpark.functions` module.
- Added support for `snowflake.snowpark.functions.any_value`.
- Fixed a bug where `Table.update` could not handle `VariantType`, `MapType`, and `ArrayType` data types.
- Fixed a bug where column aliases were incorrectly resolved in `DataFrame.join`, causing errors when selecting columns from a joined DataFrame.
- Fixed a bug where `Table.update` and `Table.merge` could fail if the target table's index was not the default `RangeIndex`.
- Updated `Session` class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same `Session` object.
  - The feature is disabled by default and can be enabled by setting `FEATURE_THREAD_SAFE_PYTHON_SESSION` to `True` for the account.
  - Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
  - When enabled, some internally created temporary table names returned from the `DataFrame.queries` API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
- Added support for 'Service' domain to `session.lineage.trace` API.
- Added support for `copy_grants` parameter when registering UDxF and stored procedures.
- Added support for the following methods in `DataFrameWriter` to support daisy-chaining:
  - `option`
  - `options`
  - `partition_by`
- Added support for `snowflake_cortex_summarize`.
- Improved `snowflake.snowpark.functions.array_remove`, which can now be used in Python.
- Disabled SQL simplification when sort is performed after limit.
  - Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query with sort in front of limit. Now, `df.limit().sort()` will generate a query that reads `df.limit().sort()`.
  - Improved performance of the generated query for `df.limit().sort()`, because limit stops table scanning as soon as the number of records is satisfied.
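
The sort and limit change above matters because the two orderings are not equivalent. A small sketch with a plain Python list makes the difference visible; list slicing stands in for `limit` here, which is an assumption for illustration only, since a SQL `LIMIT` without an ordering does not guarantee which rows are returned.

```python
rows = [5, 1, 4, 2, 3]


def sort_then_limit(values, n):
    # Corresponds to df.sort().limit(n): order everything, then keep n rows.
    return sorted(values)[:n]


def limit_then_sort(values, n):
    # Corresponds to df.limit(n).sort(): keep n rows, then order only those.
    return sorted(values[:n])


print(sort_then_limit(rows, 2))  # [1, 2]
print(limit_then_sort(rows, 2))  # [1, 5]
```

The second form only has to read `n` rows before sorting, which is the source of the performance improvement described in the entry.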
- Added a client side error message for when an invalid stage location is passed to DataFrame read functions.
- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in `DataFrame.analytics.time_series_agg` function to handle multiple data points in same sliding interval.
- Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.
- Deprecation warnings are triggered when using snowpark-python with Python 3.8. For more details, refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
- Added support for `np.subtract`, `np.multiply`, `np.divide`, and `np.true_divide`.
- Added support for tracking usages of `__array_ufunc__`.
- Added numpy compatibility support for `np.float_power`, `np.mod`, `np.remainder`, `np.greater`, `np.greater_equal`, `np.less`, `np.less_equal`, `np.not_equal`, and `np.equal`.
- Added numpy compatibility support for `np.log`, `np.log2`, and `np.log10`.
- Added support for `DataFrameGroupBy.bfill`, `SeriesGroupBy.bfill`, `DataFrameGroupBy.ffill`, and `SeriesGroupBy.ffill`.
- Added support for `on` parameter with `Resampler`.
- Added support for timedelta inputs in `value_counts()`.
- Added support for applying Snowpark Python function `snowflake_cortex_summarize`.
- Added support for `DataFrame.attrs` and `Series.attrs`.
- Added support for `DataFrame.style`.
- Added numpy compatibility support for `np.full_like`.
- Improved generated SQL query for `head` and `iloc` when the row key is a slice.
- Improved error message when passing an unknown timezone to `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex`.
- Improved documentation for `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex` to specify the supported timezone formats.
- Added additional kwargs support for `df.apply` and `series.apply` (as well as `map` and `applymap`) when using Snowpark functions. This allows for some position-independent compatibility between apply and functions where the first argument is not a pandas object.
- Improved generated SQL query for `iloc` and `iat` when the row key is a scalar.
- Removed all joins in `iterrows`.
- Improved documentation for `Series.map` to reflect the unsupported features.
- Added support for `np.may_share_memory` which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.
- Fixed a bug where `DataFrame` and `Series` `pct_change()` would raise `TypeError` when input contained timedelta columns.
- Fixed a bug where `replace()` would sometimes propagate `Timedelta` types incorrectly through `replace()`. Instead raise `NotImplementedError` for `replace()` on `Timedelta`.
- Fixed a bug where `DataFrame` and `Series` `round()` would raise `AssertionError` for `Timedelta` columns. Instead raise `NotImplementedError` for `round()` on `Timedelta`.
- Fixed a bug where `reindex` fails when the new index is a Series with non-overlapping types from the original index.
- Fixed a bug where calling `__getitem__` on a DataFrameGroupBy object always returned a DataFrameGroupBy object if `as_index=False`.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising `NotImplementedError`.
- Fixed a bug where `DataFrame.shift()` on axis=0 and axis=1 would fail to propagate timedelta types.
- `DataFrame.abs()`, `DataFrame.__neg__()`, `DataFrame.stack()`, and `DataFrame.unstack()` now raise `NotImplementedError` for timedelta inputs instead of failing to propagate timedelta types.
- Fixed a bug where `DataFrame.alias` raises `KeyError` for input column name.
- Fixed a bug where `to_csv` on Snowflake stage fails when data contains empty strings.
- Added the following new functions in `snowflake.snowpark.functions`:
  - `make_interval`
- Added support for using Snowflake Interval constants with `Window.range_between()` when the order by column is TIMESTAMP or DATE type.
- Added support for file writes. This feature is currently in private preview.
- Added `thread_id` to `QueryRecord` to track the thread id submitting the query history.
- Added support for `Session.stored_procedure_profiler`.
- Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning `'NoneType' has no len() when trying to read default values from function`.
- Added support for `TimedeltaIndex.mean` method.
- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
- Added support for `by`, `left_by`, `right_by`, `left_index`, and `right_index` for `pd.merge_asof`.
- Added support for passing parameter `include_describe` to `Session.query_history`.
- Added support for `DatetimeIndex.mean` and `DatetimeIndex.std` methods.
- Added support for `Resampler.asfreq`, `Resampler.indices`, `Resampler.nunique`, and `Resampler.quantile`.
- Added support for `resample` frequency `W`, `ME`, `YE` with `closed = "left"`.
- Added support for `DataFrame.rolling.corr` and `Series.rolling.corr` for `pairwise = False` and int `window`.
- Added support for string time-based `window` and `min_periods = None` for `Rolling`.
- Added support for `DataFrameGroupBy.fillna` and `SeriesGroupBy.fillna`.
- Added support for constructing `Series` and `DataFrame` objects with the lazy `Index` object as `data`, `index`, and `columns` arguments.
- Added support for constructing `Series` and `DataFrame` objects with `index` and `column` values not present in `DataFrame`/`Series` `data`.
- Added support for `pd.read_sas` (Uses native pandas for processing).
- Added support for applying `rolling().count()` and `expanding().count()` to `Timedelta` series and columns.
- Added support for `tz` in both `pd.date_range` and `pd.bdate_range`.
- Added support for `Series.items`.
- Added support for `errors="ignore"` in `pd.to_datetime`.
- Added support for `DataFrame.tz_localize` and `Series.tz_localize`.
- Added support for `DataFrame.tz_convert` and `Series.tz_convert`.
- Added support for applying Snowpark Python functions (e.g., `sin`) in `Series.map`, `Series.apply`, `DataFrame.apply` and `DataFrame.applymap`.
- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.
- Improved `dtype` results for TIMESTAMP_TZ type to show correct timezone offset.
- Improved `dtype` results for TIMESTAMP_LTZ type to show correct timezone.
- Improved error message when passing non-bool value to `numeric_only` for groupby aggregations.
- Removed unnecessary warning about sort algorithm in `sort_values`.
- Use SCOPED object for internal create temp tables. SCOPED objects are stored procedure scoped if created within a stored procedure, otherwise session scoped, and are automatically cleaned up at the end of the scope.
- Improved warning messages for operations that lead to materialization with inadvertent slowness.
- Removed unnecessary warning message about `convert_dtype` in `Series.apply`.
- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name after an inplace update has been applied to the original `Series`/`DataFrame`.
- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.
- Fixed `inplace` argument for `Series` objects derived from other `Series` objects.
- Fixed a bug where `Series.sort_values` failed if series name overlapped with index column name.
- Fixed a bug where transposing a dataframe would map `Timedelta` index levels to integer column levels.
- Fixed a bug where `Resampler` methods on timedelta columns would produce integer results.
- Fixed a bug where `pd.to_numeric()` would leave `Timedelta` inputs as `Timedelta` instead of converting them to integers.
- Fixed `loc` set when setting a single row, or multiple rows, of a DataFrame with a Series value.
- Fixed a bug where nullable columns were annotated wrongly.
- Fixed a bug where the `date_add` and `date_sub` functions failed for `NULL` values.
- Fixed a bug where `equal_null` could fail inside a merge statement.
- Fixed a bug where `row_number` could fail inside a Window function.
- Fixed a bug where updates could fail when the source is the result of a join.
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
- Added the following new functions in `snowflake.snowpark.functions`:
  - `array_remove`
  - `ln`
- Improved documentation for `Session.write_pandas` by making `use_logical_type` option more explicit.
- Added support for specifying the following to `DataFrameWriter.save_as_table`:
  - `enable_schema_evolution`
  - `data_retention_time`
  - `max_data_extension_time`
  - `change_tracking`
  - `copy_grants`
  - `iceberg_config`: A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following to `DataFrameWriter.copy_into_table`:
  - `iceberg_config`: A dictionary that can hold the following iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following parameters to `DataFrame.create_or_replace_dynamic_table`:
  - `mode`
  - `refresh_mode`
  - `initialize`
  - `clustering_keys`
  - `is_transient`
  - `data_retention_time`
  - `max_data_extension_time`
- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.
- Added support for type coercion when passing columns as input to UDF calls.
- Added support for `Index.identical`.
- Fixed a bug where the truncate mode in `DataFrameWriter.save_as_table` incorrectly handled DataFrames containing only a subset of columns from the existing table.
- Fixed a bug where function `to_timestamp` does not set the default timezone of the column datatype.
- Added limited support for the `Timedelta` type, including the following features. Snowpark pandas will raise `NotImplementedError` for unsupported `Timedelta` use cases.
  - supporting tracking the Timedelta type through `copy`, `cache_result`, `shift`, `sort_index`, `assign`, `bfill`, `ffill`, `fillna`, `compare`, `diff`, `drop`, `dropna`, `duplicated`, `empty`, `equals`, `insert`, `isin`, `isna`, `items`, `iterrows`, `join`, `len`, `mask`, `melt`, `merge`, `nlargest`, `nsmallest`, `to_pandas`.
  - converting non-timedelta to timedelta via `astype`.
  - `NotImplementedError` will be raised for the rest of methods that do not support `Timedelta`.
  - support for subtracting two timestamps to get a Timedelta.
  - support indexing with Timedelta data columns.
  - support for adding or subtracting timestamps and `Timedelta`.
  - support for binary arithmetic between two `Timedelta` values.
  - support for binary arithmetic and comparisons between `Timedelta` values and numeric values.
  - support for lazy `TimedeltaIndex`.
  - support for `pd.to_timedelta`.
  - support for `GroupBy` aggregations `min`, `max`, `mean`, `idxmax`, `idxmin`, `std`, `sum`, `median`, `count`, `any`, `all`, `size`, `nunique`, `head`, `tail`, `aggregate`.
  - support for `GroupBy` filtrations `first` and `last`.
  - support for `TimedeltaIndex` attributes: `days`, `seconds`, `microseconds` and `nanoseconds`.
  - support for `diff` with timestamp columns on `axis=0` and `axis=1`.
  - support for `TimedeltaIndex` methods: `ceil`, `floor` and `round`.
  - support for `TimedeltaIndex.total_seconds` method.
- Added support for index's arithmetic and comparison operators.
- Added support for `Series.dt.round`.
- Added documentation pages for `DatetimeIndex`.
- Added support for `Index.name`, `Index.names`, `Index.rename`, and `Index.set_names`.
- Added support for `Index.__repr__`.
- Added support for `DatetimeIndex.month_name` and `DatetimeIndex.day_name`.
- Added support for `Series.dt.weekday`, `Series.dt.time`, and `DatetimeIndex.time`.
- Added support for `Index.min` and `Index.max`.
- Added support for `pd.merge_asof`.
- Added support for `Series.dt.normalize` and `DatetimeIndex.normalize`.
- Added support for `Index.is_boolean`, `Index.is_integer`, `Index.is_floating`, `Index.is_numeric`, and `Index.is_object`.
- Added support for `DatetimeIndex.round`, `DatetimeIndex.floor` and `DatetimeIndex.ceil`.
- Added support for `Series.dt.days_in_month` and `Series.dt.daysinmonth`.
- Added support for `DataFrameGroupBy.value_counts` and `SeriesGroupBy.value_counts`.
- Added support for `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`.
- Added support for `Index.is_monotonic_increasing` and `Index.is_monotonic_decreasing`.
- Added support for `pd.crosstab`.
- Added support for `pd.bdate_range` and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both `pd.date_range` and `pd.bdate_range`.
- Added support for lazy `Index` objects as `labels` in `DataFrame.reindex` and `Series.reindex`.
- Added support for `Series.dt.days`, `Series.dt.seconds`, `Series.dt.microseconds`, and `Series.dt.nanoseconds`.
- Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
- Added support for string indexing with `Timedelta` objects.
- Added support for `Series.dt.total_seconds` method.
- Added support for `DataFrame.apply(axis=0)`.
- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.
- Added support for `DatetimeIndex.tz_convert` and `DatetimeIndex.tz_localize`.
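
Many of the `Timedelta` entries above (subtracting timestamps, adding a duration back, the `days` and `seconds` attributes, `total_seconds`) follow the same arithmetic as the standard library's `datetime.timedelta`. The sketch below uses the standard library as a stand-in, not Snowpark pandas, and the timestamps are invented for the example.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 8, 0, 0)
end = datetime(2024, 1, 3, 9, 30, 15)

# Subtracting two timestamps yields a duration.
elapsed = end - start

# The duration is stored as days plus leftover seconds and microseconds.
print(elapsed.days)             # 2
print(elapsed.seconds)          # 5415
print(elapsed.total_seconds())  # 178215.0

# Adding the duration back to the first timestamp recovers the second.
print(start + elapsed == end)   # True

# Binary arithmetic between a duration and a number.
doubled = elapsed * 2
print(doubled == timedelta(days=4, seconds=10830))  # True
```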
- Improve concat, join performance when operations are performed on series coming from the same dataframe by avoiding unnecessary joins.
- Refactored `quoted_identifier_to_snowflake_type` to avoid making metadata queries if the types have been cached locally.
- Improved `pd.to_datetime` to handle all local input cases.
- Create a lazy index from another lazy index without pulling data to client.
- Raised `NotImplementedError` for Index bitwise operators.
- Display a clearer error message when `Index.names` is set to a non-list-like object.
- Raise a warning whenever MultiIndex values are pulled in locally.
- Improve warning message for `pd.read_snowflake` to include the creation reason when temp table creation is triggered.
- Improve performance for `DataFrame.set_index`, or setting `DataFrame.index` or `Series.index`, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
- Properly raise `NotImplementedError` when `ambiguous`/`nonexistent` are non-string in `ceil`/`floor`/`round`.
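
The `set_index` entry above describes what happens when the provided index and the data differ in length. A minimal pure-Python sketch of that rule follows. `fit_index` is a hypothetical helper written only to illustrate the described behavior and is not how Snowpark pandas implements it.

```python
import math


def fit_index(new_index, n_rows):
    """Return index labels for n_rows rows, following the rule described above."""
    if n_rows > len(new_index):
        # Data is longer than the index: pad the "extra" rows with NaN labels.
        return list(new_index) + [math.nan] * (n_rows - len(new_index))
    # Index is longer than (or equal to) the data: extra labels are ignored.
    return list(new_index[:n_rows])


padded = fit_index(["a", "b"], 3)
truncated = fit_index(["a", "b", "c"], 2)

print(padded)     # ['a', 'b', nan]
print(truncated)  # ['a', 'b']
```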
- Stopped ignoring nanoseconds in `pd.Timedelta` scalars.
- Fixed AssertionError in tree of binary operations.
- Fixed bug in `Series.dt.isocalendar` using a named Series.
- Fixed `inplace` argument for Series objects derived from DataFrame columns.
- Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
- Fixed a bug where `Series.take` did not error when `axis=1` was specified.
- Fixed a bug where using `to_pandas_batches` with async jobs caused an error due to improper handling of waiting for asynchronous query completion.
- Added support for `snowflake.snowpark.testing.assert_dataframe_equal`, a utility function to check the equality of two Snowpark DataFrames.
- Added support for server-side string size limitations.
- Added support to create and invoke stored procedures, UDFs and UDTFs with optional arguments.
- Added support for column lineage in the DataFrame.lineage.trace API.
- Added support for passing `INFER_SCHEMA` options to `DataFrameReader` via `INFER_SCHEMA_OPTIONS`.
- Added support for passing `parameters` parameter to `Column.rlike` and `Column.regexp`.
- Added support for automatically cleaning up temporary tables created by `df.cache_result()` in the current session, when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting `session.auto_clean_up_temp_table_enabled` to `True`.
- Added support for string literals to the `fmt` parameter of `snowflake.snowpark.functions.to_date`.
- Added support for system$reference function.
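
The temporary table cleanup entry above ties cleanup to garbage collection of the DataFrame. One common way to express "run this when the object is no longer referenced" in Python is `weakref.finalize`. The sketch below shows that general pattern only. `CachedResult` and `dropped_tables` are hypothetical, and nothing here is taken from the Snowpark implementation.

```python
import gc
import weakref

# Stand-in for issuing DROP TABLE statements; records which tables were cleaned up.
dropped_tables = []


class CachedResult:
    """Hypothetical object that owns a temporary table."""

    def __init__(self, table_name):
        self.table_name = table_name
        # The callback receives the name, not self, so it does not keep
        # the object alive and prevent collection.
        weakref.finalize(self, dropped_tables.append, table_name)


result = CachedResult("TEMP_TABLE_1")
print(dropped_tables)  # []

# Dropping the last reference lets the finalizer run.
del result
gc.collect()
print(dropped_tables)  # ['TEMP_TABLE_1']
```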
- Fixed a bug where SQL generated for selecting `*` column has an incorrect subquery.
- Fixed a bug in `DataFrame.to_pandas_batches` where the iterator could throw an error if certain transformation is made to the pandas dataframe due to wrong isolation level.
- Fixed a bug in `DataFrame.lineage.trace` to split the quoted feature view's name and version correctly.
- Fixed a bug in `Column.isin` that caused invalid sql generation when passed an empty list.
- Fixed a bug that fails to raise NotImplementedError while setting cell with list like item.
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - `rank`
    - `dense_rank`
    - `percent_rank`
    - `cume_dist`
    - `ntile`
    - `datediff`
    - `array_agg`
  - snowflake.snowpark.column.Column.within_group
- Added support for parsing flags in regex statements for mocked plans. This maintains parity with the `rlike` and `regexp` changes above.
- Fixed a bug where Window Functions LEAD and LAG do not handle option `ignore_nulls` properly.
- Fixed a bug where values were not populated into the result DataFrame during the insertion of table merge operation.
- Fix pandas FutureWarning about integer indexing.
- Added support for `DataFrame.backfill`, `DataFrame.bfill`, `Series.backfill`, and `Series.bfill`.
- Added support for `DataFrame.compare` and `Series.compare` with default parameters.
- Added support for `Series.dt.microsecond` and `Series.dt.nanosecond`.
- Added support for `Index.is_unique` and `Index.has_duplicates`.
- Added support for `Index.equals`.
- Added support for `Index.value_counts`.
- Added support for `Series.dt.day_name` and `Series.dt.month_name`.
- Added support for indexing on Index, e.g., `df.index[:10]`.
- Added support for `DataFrame.unstack` and `Series.unstack`.
- Added support for `DataFrame.asfreq` and `Series.asfreq`.
- Added support for `Series.dt.is_month_start` and `Series.dt.is_month_end`.
- Added support for `Index.all` and `Index.any`.
- Added support for `Series.dt.is_year_start` and `Series.dt.is_year_end`.
- Added support for `Series.dt.is_quarter_start` and `Series.dt.is_quarter_end`.
- Added support for lazy `DatetimeIndex`.
- Added support for `Series.argmax` and `Series.argmin`.
- Added support for `Series.dt.is_leap_year`.
- Added support for `DataFrame.items`.
- Added support for `Series.dt.floor` and `Series.dt.ceil`.
- Added support for `Index.reindex`.
- Added support for `DatetimeIndex` properties: `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`, `nanosecond`, `date`, `dayofyear`, `day_of_year`, `dayofweek`, `day_of_week`, `weekday`, `quarter`, `is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end` and `is_leap_year`.
- Added support for `Resampler.fillna` and `Resampler.bfill`.
- Added limited support for the `Timedelta` type, including creating `Timedelta` columns and `to_pandas`.
- Added support for `Index.argmax` and `Index.argmin`.
- Removed the public preview warning message when importing Snowpark pandas.
- Removed unnecessary count query from `SnowflakeQueryCompiler.is_series_like` method.
- `DataFrame.columns` now returns native pandas Index object instead of Snowpark Index object.
- Refactor and introduce `query_compiler` argument in `Index` constructor to create `Index` from query compiler.
- `pd.to_datetime` now returns a DatetimeIndex object instead of a Series object.
- `pd.date_range` now returns a DatetimeIndex object instead of a Series object.
- Made passing an unsupported aggregation function to `pivot_table` raise `NotImplementedError` instead of `KeyError`.
- Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
- Fixed AssertionError in `Series.drop_duplicates` and `DataFrame.drop_duplicates` when called after `sort_values`.
- Fixed a bug in `Index.to_frame` where the result frame's column name may be wrong where name is unspecified.
- Fixed a bug where some Index docstrings are ignored.
- Fixed a bug in `Series.reset_index(drop=True)` where the result name may be wrong.
- Fixed a bug in `Groupby.first/last` ordering by the correct columns in the underlying window expression.
- Added distributed tracing using open telemetry APIs for table stored procedure function in `DataFrame`:
  - `_execute_and_get_query_id`
- Added support for the `arrays_zip` function.
- Improved performance for binary column expression and `df._in` by avoiding unnecessary cast for numeric values. You can enable this optimization by setting `session.eliminate_numeric_sql_value_cast_enabled = True`.
- Improved error message for `write_pandas` when the target table does not exist and `auto_create_table=False`.
- Added open telemetry tracing on UDxF functions in Snowpark.
- Added open telemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called `format_json` to the `Session.SessionBuilder.app_name` function that sets the app name in the `Session.query_tag` in JSON format. By default, this parameter is set to `False`.
- Fixed a bug where SQL generated for `lag(x, 0)` was incorrect and failed with error message `argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'`.
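
For context on the `lag(x, 0)` fix above: a lag with offset 0 is a valid call that simply returns each row's own value. The following pure-Python sketch of window lag semantics over a single ordered partition illustrates this. The `lag` helper is written for illustration and is unrelated to the SQL that Snowpark generates.

```python
def lag(values, offset=1, default=None):
    """Return, for each position, the value `offset` rows earlier."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]


x = [10, 20, 30]

print(lag(x, 0))  # [10, 20, 30]
print(lag(x, 1))  # [None, 10, 20]
print(lag(x, 2, default=0))  # [0, 0, 10]
```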
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - random
- Added new parameters to `patch` function when registering a mocked function:
  - `distinct` allows an alternate function to be specified for when a sql function should be distinct.
  - `pass_column_index` passes a named parameter `column_index` to the mocked function that contains the pandas.Index for the input data.
  - `pass_row_index` passes a named parameter `row_index` to the mocked function that is the 0 indexed row number the function is currently operating on.
  - `pass_input_data` passes a named parameter `input_data` to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the `column_order` parameter to method `DataFrameWriter.save_as_table`.
- Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.
- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (Uses local pandas for processing)
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added partial support for `Series.str.translate` where the values in the `table` are single-codepoint strings.
- Added support for `DataFrame.corr`.
- Allow `df.plot()` and `series.plot()` to be called, materializing the data into the local client
- Added support for `DataFrameGroupBy` and `SeriesGroupBy` aggregations `first` and `last`
- Added support for `DataFrameGroupBy.get_group`.
- Added support for `limit` parameter when `method` parameter is used in `fillna`.
- Added support for `DataFrame.equals` and `Series.equals`.
- Added support for `DataFrame.reindex` and `Series.reindex`.
- Added support for `Index.astype`.
- Added support for `Index.unique` and `Index.nunique`.
- Added support for `Index.sort_values`.
- Fixed an issue when using np.where and df.where when the scalar 'other' is the literal 0.
- Fixed a bug regarding precision loss when converting to Snowpark pandas `DataFrame` or `Series` with `dtype=np.uint64`.
- Fixed bug where `values` is set to `index` when `index` and `columns` contain all columns in DataFrame during `pivot_table`.
- Added support for `Index.copy()`
- Added support for Index APIs: `dtype`, `values`, `item()`, `tolist()`, `to_series()` and `to_frame()`
- Expand support for DataFrames with no rows in `pd.pivot_table` and `DataFrame.pivot_table`.
- Added support for `inplace` parameter in `DataFrame.sort_index` and `Series.sort_index`.
- Added support for `to_boolean` function.
- Added documentation pages for Index and its APIs.
- Fixed a bug where python stored procedure with table return type fails when run in a task.
- Fixed a bug where df.dropna fails due to `RecursionError: maximum recursion depth exceeded` when the DataFrame has more than 500 columns.
- Fixed a bug where `AsyncJob.result("no_result")` doesn't wait for the query to finish execution.
- Added support for the `strict` parameter when registering UDFs and Stored Procedures.
- Fixed a bug in convert_timezone that made setting the source_timezone parameter return an error.
- Fixed a bug where creating DataFrame with empty data of type `DateType` raises `AttributeError`.
- Fixed a bug that table merge fails when update clause exists but no update takes place.
- Fixed a bug in mock implementation of `to_char` that raises `IndexError` when incoming column has nonconsecutive row index.
- Fixed a bug in handling of `CaseExpr` expressions that raises `IndexError` when incoming column has nonconsecutive row index.
- Fixed a bug in implementation of `Column.like` that raises `IndexError` when incoming column has nonconsecutive row index.
- Added support for type coercion in the implementation of DataFrame.replace, DataFrame.dropna and the mock function `iff`.
- Added partial support for `DataFrame.pct_change` and `Series.pct_change` without the `freq` and `limit` parameters.
- Added support for `Series.str.get`.
- Added support for `Series.dt.dayofweek`, `Series.dt.day_of_week`, `Series.dt.dayofyear`, and `Series.dt.day_of_year`.
- Added support for `Series.str.__getitem__` (`Series.str[...]`).
- Added support for `Series.str.lstrip` and `Series.str.rstrip`.
- Added support for `DataFrameGroupBy.size` and `SeriesGroupBy.size`.
- Added support for `DataFrame.expanding` and `Series.expanding` for aggregations `count`, `sum`, `min`, `max`, `mean`, `std`, `var`, and `sem` with `axis=0`.
- Added support for `DataFrame.rolling` and `Series.rolling` for aggregation `count` with `axis=0`.
- Added support for `Series.str.match`.
- Added support for `DataFrame.resample` and `Series.resample` for aggregations `size`, `first`, and `last`.
- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (Uses local pandas for processing)
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added support for `Index.T`.
- Fixed a bug that causes output of GroupBy.aggregate's columns to be ordered incorrectly.
- Fixed a bug where `DataFrame.describe` on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
- Fixed a bug in `DataFrame.rolling` and `Series.rolling` so `window=0` now throws `NotImplementedError` instead of `ValueError`
- Added support for named aggregations in `DataFrame.aggregate` and `Series.aggregate` with `axis=0`.
- `pd.read_csv` reads using the native pandas CSV parser, then uploads data to Snowflake using parquet. This enables most of the parameters supported by `read_csv` including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a `pd.Index` directly in Snowpark pandas. Support for `pd.Index` as a first-class component of Snowpark pandas is coming soon.
  - Added a lazy index constructor and support for `len`, `shape`, `size`, `empty`, `to_pandas()` and `names`. For `df.index`, Snowpark pandas creates a lazy index object.
  - For `df.columns`, Snowpark pandas supports a non-lazy version of an `Index` since the data is already stored locally.
- Improved error message to remind users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.
- Improved error handling for `Session.create_dataframe` when called with more than 512 rows and using `format` or `pyformat` `paramstyle`.
- Added `DataFrame.cache_result` and `Series.cache_result` methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.
- Added partial support for `DataFrame.pivot_table` with no `index` parameter, as well as for `margins` parameter.
- Updated the signature of `DataFrame.shift`/`Series.shift`/`DataFrameGroupBy.shift`/`SeriesGroupBy.shift` to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added `suffix` argument, or sequence values of `periods`.
- Re-added support for `Series.str.split`.
- Fixed how we support mixed columns for string methods (`Series.str.*`).
- Added support for the following DataFrameReader read options to file formats `csv` and `json`:
  - PURGE
  - PATTERN
  - INFER_SCHEMA with value being `False`
  - ENCODING with value being `UTF8`
- Added support for `DataFrame.analytics.moving_agg` and `DataFrame.analytics.cumulative_agg`.
- Added support for `if_not_exists` parameter during UDF and stored procedure registration.
- Fixed a bug where the fractional second part was not handled properly when processing time formats.
- Fixed a bug that caused function calls on `*` to fail.
- Fixed a bug that prevented creation of map and struct type objects.
- Fixed a bug where function `date_add` was unable to handle some numeric types.
- Fixed a bug where `TimestampType` casting resulted in incorrect data.
- Fixed a bug that caused `DecimalType` data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raises a confusing `IndexError`.
- Fixed a bug where mocked function `to_timestamp_ntz` cannot handle None data.
- Fixed a bug where mocked UDFs handle output data of None improperly.
- Fixed a bug where `DataFrame.with_column_renamed` ignores attributes from parent DataFrames after join operations.
- Fixed a bug where integer precision of large values gets lost when converted to a pandas DataFrame.
- Fixed a bug where the schema of a datetime object is wrong when creating a DataFrame from a pandas DataFrame.
- Fixed a bug in the implementation of `Column.equal_nan` where null data is handled incorrectly.
- Fixed a bug where `DataFrame.drop` ignores attributes from parent DataFrames after join operations.
- Fixed a bug in mocked function `date_part` where Column type is set wrong.
- Fixed a bug where `DataFrameWriter.save_as_table` does not raise exceptions when inserting null data into non-nullable columns.
- Fixed a bug in the implementation of `DataFrameWriter.save_as_table` where
  - Append or Truncate fails when incoming data has different schema than existing table.
  - Truncate fails when incoming data does not specify columns that are nullable.
- Removed dependency check for `pyarrow` as it is not used.
- Improved target type coverage of `Column.cast`, adding support for casting to boolean and all integral types.
- Aligned error experience when calling UDFs and stored procedures.
- Added appropriate error messages for `is_permanent` and `anonymous` options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.
- File read operation with unsupported options and values now raises `NotImplementedError` instead of warnings and unclear error information.
- Added support to add a comment on tables and views using the functions listed below:
  - `DataFrameWriter.save_as_table`
  - `DataFrame.create_or_replace_view`
  - `DataFrame.create_or_replace_temp_view`
  - `DataFrame.create_or_replace_dynamic_table`
- Improved error message to remind users to set `{"infer_schema": True}` when reading CSV file without specifying its schema.
- Start of Public Preview of Snowpark pandas API. Refer to the Snowpark pandas API Docs for more details.
- Added support for NumericType and VariantType data conversion in the mocked function `to_timestamp_ltz`, `to_timestamp_ntz`, `to_timestamp_tz` and `to_timestamp`.
- Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function `to_char`.
- Added support for the following APIs:
  - snowflake.snowpark.functions:
    - to_varchar
  - snowflake.snowpark.DataFrame:
    - pivot
  - snowflake.snowpark.Session:
    - cancel_all
- Introduced a new exception class `snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException`.
- Added support for casting to FloatType
- Fixed a bug that stored procedure and UDF should not remove imports already in the `sys.path` during the clean-up step.
- Fixed a bug that when processing datetime format, the fractional second part is not handled properly.
- Fixed a bug on the Windows platform where file operations were unable to properly handle the file separator in directory names.
- Fixed a bug on the Windows platform where, when reading a pandas dataframe, an IntervalType column with integer data could not be processed.
- Fixed a bug that prevented users from being able to select multiple columns with the same alias.
- Fixed a bug that `Session.get_current_[schema|database|role|user|account|warehouse]` returns upper-cased identifiers when identifiers are quoted.
- Fixed a bug that function `substr` and `substring` can not handle 0-based `start_expr`.
- Standardized the error experience by raising `SnowparkLocalTestingException` in error cases which is on par with `SnowparkSQLException` raised in non-local execution.
- Improved error experience of the `Session.write_pandas` method: `NotImplementedError` is now raised when it is called.
- Aligned error experience with reusing a closed session in non-local execution.
- Added support for registering stored procedures with packages given as Python modules.
- Added `snowflake.snowpark.Session.lineage.trace` to explore data lineage of Snowflake objects.
- Added support for structured type schema parsing.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_date`.
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - get
    - concat
    - concat_ws
- Fixed a bug that caused `NaT` and `NaN` values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug that when there is `None` value in an arithmetic calculation, the output should remain `None` instead of `math.nan`.
- Fixed a bug in function `sum` and `covar_pop` that when there is `math.nan` in the data, the output should also be `math.nan`.
- Fixed a bug that stage operation can not handle directories.
- Fixed a bug that `DataFrame.to_pandas` should take Snowflake numeric types with precision 38 as `int64`.
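The `None` versus `math.nan` fixes above follow the usual SQL distinction between NULL (missing) and NaN (a float value). The sketch below models that distinction in plain Python; both helper names are hypothetical, and the rule that a sum skips `None` and returns `None` when no values remain is an assumption borrowed from standard SQL aggregate behavior.

```python
import math
from typing import Iterable, Optional


def null_safe_add(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Arithmetic on a missing operand stays missing (None), not NaN."""
    if left is None or right is None:
        return None
    return left + right


def nan_aware_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum that skips None but lets NaN propagate into the result."""
    total = 0.0
    seen_value = False
    for value in values:
        if value is None:
            continue  # NULL-like entries are ignored by the aggregate
        seen_value = True
        total += value  # adding NaN makes the running total NaN
    return total if seen_value else None


missing_result = null_safe_add(1.0, None)
nan_result = nan_aware_sum([1.0, math.nan, None])
plain_result = nan_aware_sum([1.0, None, 2.0])
```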
- Added `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
  - `DataFrame.write.json`
  - `DataFrame.write.csv`
  - `DataFrame.write.parquet`
- Added distributed tracing using open telemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
  - snowflake.snowpark.DataFrame:
    - collect
    - collect_nowait
    - to_pandas
    - count
    - show
  - snowflake.snowpark.DataFrameWriter:
    - save_as_table
- Added support for snow:// URLs to `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`
- Added support to register stored procedures and UDxFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature not enabled by default, and can be enabled by setting `session.cte_optimization_enabled` to `True`.
- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug that an internal type hint in `utils.py` might raise AttributeError in case the underlying module can not be found.
- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - snowflake.snowpark.Session:
    - file.put
    - file.put_stream
    - file.get
    - file.get_stream
    - read.json
    - add_import
    - remove_import
    - get_imports
    - clear_imports
    - add_packages
    - add_requirements
    - clear_packages
    - remove_package
    - udf.register
    - udf.register_from_file
    - sproc.register
    - sproc.register_from_file
  - snowflake.snowpark.functions
    - current_database
    - current_session
    - date_trunc
    - object_construct
    - object_construct_keep_null
    - pow
    - sqrt
    - udf
    - sproc
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_time`.
- Fixed a bug that filled columns with nulls for constant functions.
- Fixed the implementation of to_object, to_array and to_binary to better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.
- Added support for creating vectorized UDTFs with `process` method.
- Added support for dataframe functions:
  - to_timestamp_ltz
  - to_timestamp_ntz
  - to_timestamp_tz
  - locate
- Added support for ASOF JOIN type.
- Added support for the following local testing APIs:
  - snowflake.snowpark.functions:
    - to_double
    - to_timestamp
    - to_timestamp_ltz
    - to_timestamp_ntz
    - to_timestamp_tz
    - greatest
    - least
    - convert_timezone
    - dateadd
    - date_part
  - snowflake.snowpark.Session:
    - get_current_account
    - get_current_warehouse
    - get_current_role
    - use_schema
    - use_warehouse
    - use_database
    - use_role
- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` does not correctly filter columns whose names start with '$' followed by a number.
- Fixed a bug that statement parameters may have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins drop rows with null values.
  - DataFrameReader.csv incorrectly parses data when the optional parameter `field_optionally_enclosed_by` is specified.
  - Column.regexp only considers the first entry when `pattern` is a `Column`.
  - Table.update raises `KeyError` when updating null values in the rows.
  - VARIANT columns raise errors at `DataFrame.collect`.
  - `count_distinct` does not work correctly when counting.
  - Null values in integer columns raise `TypeError`.
- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise `FileNotFound` error when reading a path that does not exist or when there are no files under the path.
- Added support for an optional `date_part` argument in function `last_day`.
- `SessionBuilder.app_name` will set the query_tag after the session is created.
- Added support for the following local testing functions:
  - current_timestamp
  - current_date
  - current_time
  - strip_null_value
  - upper
  - lower
  - length
  - initcap
- Added cleanup logic at interpreter shutdown to close all active sessions.
- Closing sessions within stored procedures now is a no-op logging a warning instead of raising an error.
- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level. For details, please see #945.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug that `Session.range` returns empty result when the range is large.
- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables `PyArrow` to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating on a DataFrame with an `IntegerType` column with null values.
- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`.
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`. Allows an additional tag to be added to the current query tag by appending it as a comma separated value.
- Added a new method `Session.update_query_tag`. Allows updates to a JSON encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
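The trade-off behind the `chunk_size` and `whole_file_hash` arguments of `Session.add_import` can be sketched with the standard library. This is an illustrative model only: the helper name `hash_import`, the SHA-256 algorithm, and the 8192-byte default are assumptions, not details taken from Snowpark.

```python
import hashlib
import io
from typing import BinaryIO


def hash_import(stream: BinaryIO, chunk_size: int = 8192, whole_file_hash: bool = False) -> str:
    """Hash only the first chunk by default, or every chunk when asked."""
    digest = hashlib.sha256()  # algorithm chosen for illustration only
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        if not whole_file_hash:
            break  # fast path: stop after the first chunk
    return digest.hexdigest()


data = b"a" * 10 + b"b" * 10
first_chunk_hash = hash_import(io.BytesIO(data), chunk_size=10)
full_hash = hash_import(io.BytesIO(data), chunk_size=10, whole_file_hash=True)
```

Hashing only the first chunk is cheaper for large files, but it cannot detect changes that occur after that chunk, which is the case `whole_file_hash=True` covers.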
- Added the following functions to `DataFrame.analytics`:
  - Added the `moving_agg` function in `DataFrame.analytics` to enable moving aggregations like sums and averages with multiple window sizes.
  - Added the `cumulative_agg` function in `DataFrame.analytics` to enable cumulative aggregations like sums and averages on multiple columns.
  - Added the `compute_lag` and `compute_lead` functions in `DataFrame.analytics` for enabling lead and lag calculations on multiple columns.
  - Added the `time_series_agg` function in `DataFrame.analytics` to enable time series aggregations like sums and averages with multiple time windows.
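As a rough picture of what a moving aggregation with multiple window sizes computes, here is a plain-Python sketch over an ordered list. It does not use the Snowpark API; the function signature and the choice of a trailing window that includes the current row are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence


def moving_agg(
    values: Sequence[float],
    window_sizes: Sequence[int],
    agg: Callable[[Sequence[float]], float],
) -> Dict[int, List[float]]:
    """Apply `agg` over a trailing window of each requested size."""
    result: Dict[int, List[float]] = {}
    for size in window_sizes:
        result[size] = [
            # The window covers up to `size` rows ending at the current row.
            agg(values[max(0, i - size + 1): i + 1])
            for i in range(len(values))
        ]
    return result


sales = [10, 20, 30, 40]
moving_sums = moving_agg(sales, window_sizes=[2, 3], agg=sum)
```

Each window size yields one output series, which mirrors the idea of producing one result column per window size.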
- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
  - Earlier timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`, but will now be correctly maintained as timestamp values and be inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information but will now be correctly inferred as `TimestampType(TimestampTimeZone.LTZ)` and timezone information is retained correctly.
  - Set session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert back to old behavior. It is recommended that you update your code to align with correct behavior because the parameter will be removed in the future.
- Fixed a bug that `DataFrame.to_pandas` gets decimal type when scale is not 0, and creates an object dtype in `pandas`. Instead, we cast the value to a float64 type.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - a window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance,

  ```python
  df = df.select(col("a").alias("b"))
  df = copy(df)
  df.select(col("b").alias("c"))  # threw an error. Now it's fixed.
  ```

- Fixed a bug in `Session.create_dataframe` that the non-nullable field in a schema is not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.
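The timestamp inference rule described in the `Session.create_dataframe` fix can be summarized as: values without a timezone map to NTZ, values with a timezone map to LTZ. The sketch below expresses that rule on standard-library `datetime` values; the helper name is hypothetical and the returned strings merely stand in for `TimestampTimeZone.NTZ` and `TimestampTimeZone.LTZ`.

```python
from datetime import datetime, timezone


def infer_timestamp_variant(value: datetime) -> str:
    """Return "NTZ" for naive datetimes and "LTZ" for timezone-aware ones."""
    # A datetime with no tzinfo carries no timezone information at all.
    return "NTZ" if value.tzinfo is None else "LTZ"


naive_variant = infer_timestamp_variant(datetime(2024, 1, 1, 12, 0))
aware_variant = infer_timestamp_variant(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
```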
- When parsing data types during a `to_pandas` operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned behavior for `Session.call` in case of table stored procedures where running `Session.call` would not trigger stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` will now automatically add `snowflake-snowpark-python` as a package dependency. The added dependency will be on the client's local version of the library and an error is thrown if the server cannot support that version.
- Fixed a bug that numpy should not be imported at the top level of mock module.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `from_utc_timestamp`
  - `to_utc_timestamp`
- Added the `conn_error` attribute to `SnowflakeSQLException` that stores the whole underlying exception from `snowflake-connector-python`.
- Added support for `RelationalGroupedDataframe.pivot()` to access `pivot` in the following pattern `Dataframe.group_by(...).pivot(...)`.
- Added experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
- Added support for the new function `arrays_to_object` in `snowflake.snowpark.functions`.
- Added support for the vector data type.
- Bumped cloudpickle dependency to work with `cloudpickle==2.2.1`
- Updated `snowflake-connector-python` to `3.4.0`.
- DataFrame column names quoting check now supports newline characters.
- Fixed a bug where a DataFrame generated by `session.read.with_metadata` creates inconsistent table when doing `df.write.save_as_table`.
- Added support for managing case sensitivity in `DataFrame.to_local_iterator()`.
- Added support for specifying vectorized UDTF's input column names by using the optional parameter `input_names` in `UDTFRegistration.register/register_file` and `functions.pandas_udtf`. By default, `RelationalGroupedDataFrame.applyInPandas` will infer the column names from current dataframe schema.
- Added `sql_error_code` and `raw_message` attributes to `SnowflakeSQLException` when it is caused by a SQL exception.
- Fixed a bug in `DataFrame.to_pandas()` where converting snowpark dataframes to pandas dataframes was losing precision on integers with more than 19 digits.
- Fixed a bug that `session.add_packages` can not handle requirement specifier that contains project name with underscore and version.
- Fixed a bug in `DataFrame.limit()` when `offset` is used and the parent `DataFrame` uses `limit`. Now the `offset` won't impact the parent DataFrame's `limit`.
- Fixed a bug in `DataFrame.write.save_as_table` where dataframes created from read api could not save data into snowflake because of invalid column name `$1`.
- Changed the behavior of `date_format`:
  - The `format` argument changed from optional to required.
  - The returned result changed from a date object to a date-formatted string.
- When a window function, or a sequence-dependent data generator (`normal`, `zipf`, `uniform`, `seq1`, `seq2`, `seq4`, `seq8`) function is used, the sort and filter operation will no longer be flattened when generating the query.
- Added support for the Python 3.11 runtime environment.
- Added back the dependency of `typing-extensions`.
- Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
- Revert back to using CTAS (create table as select) statement for `Dataframe.writer.save_as_table` which does not need insert permission for writing tables.
- Support `PythonObjJSONEncoder` json-serializable objects for `ARRAY` and `OBJECT` literals.
- Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
- Added support for specifying clustering keys when saving dataframes using `DataFrame.save_as_table`.
- Accept `Iterable` objects input for `schema` when creating dataframes using `Session.create_dataframe`.
- Added the property `DataFrame.session` to return a `Session` object.
- Added the property `Session.session_id` to return an integer that represents session ID.
- Added the property `Session.connection` to return a `SnowflakeConnection` object.
- Added support for creating a Snowpark session from a configuration file or environment variables.
- Updated `snowflake-connector-python` to 3.2.0.
- Fixed a bug where automatic package upload would raise `ValueError` even when compatible package versions were added in `session.add_packages`.
- Fixed a bug where table stored procedures were not registered correctly when using `register_from_file`.
- Fixed a bug where dataframe joins failed with `invalid_identifier` error.
- Fixed a bug where `DataFrame.copy` disables SQL simplifier for the returned copy.
- Fixed a bug where `session.sql().select()` would fail if any parameters are specified to `session.sql()`
- Added parameters `external_access_integrations` and `secrets` when creating a UDF, UDTF or Stored Procedure from Snowpark Python to allow integration with external access.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_flatten`
  - `flatten`
- Added support for `apply_in_pandas` in `snowflake.snowpark.relational_grouped_dataframe`.
- Added support for replicating your local Python environment on Snowflake via `Session.replicate_local_environment`.
- Fixed a bug where `session.create_dataframe` fails to properly set nullable columns where nullability was affected by order or data was given.
- Fixed a bug where `DataFrame.select` could not identify and alias columns in presence of table functions when output columns of table function overlapped with columns in dataframe.
- Creating stored procedures, UDFs, UDTFs, or UDAFs with parameter `is_permanent=False` now creates temporary objects even when `stage_name` is provided. The default value of `is_permanent` is `False`, so users who do not explicitly set it to `True` for permanent objects will notice a change in behavior.
- `types.StructField` now enquotes column identifier by default.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_sort`
  - `sort_array`
  - `array_min`
  - `array_max`
  - `explode_outer`
- Added support for pure Python packages specified via `Session.add_requirements` or `Session.add_packages`. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
  - Added Session parameter `custom_packages_upload_enabled` and `custom_packages_force_upload_enabled` to enable the support for pure Python packages feature mentioned above. Both parameters default to `False`.
- Added support for specifying package requirements by passing a Conda environment yaml file to `Session.add_requirements`.
- Added support for asynchronous execution of multi-query dataframes that contain binding variables.
- Added support for renaming multiple columns in `DataFrame.rename`.
- Added support for Geometry datatypes.
- Added support for `params` in `session.sql()` in stored procedures.
- Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.
- Added support for vectorized UDTFs (user-defined table functions). This feature is currently in public preview.
- Added support for Snowflake Timestamp variants (i.e., `TIMESTAMP_NTZ`, `TIMESTAMP_LTZ`, `TIMESTAMP_TZ`)
  - Added `TimestampTimezone` as an argument in `TimestampType` constructor.
  - Added type hints `NTZ`, `LTZ`, `TZ` and `Timestamp` to annotate functions when registering UDFs.
- Removed redundant dependency `typing-extensions`.
- `DataFrame.cache_result` now creates temp table fully qualified names under current database and current schema.
- Fixed a bug where type check happens on pandas before it is imported.
- Fixed a bug when creating a UDF from `numpy.ufunc`.
- Fixed a bug where `DataFrame.union` was not generating the correct `Selectable.schema_query` when SQL simplifier is enabled.
- `DataFrameWriter.save_as_table` now respects the `nullable` field of the schema provided by the user or the inferred schema based on data from user input.
- Updated `snowflake-connector-python` to 3.0.4.
- Added support for the Python 3.10 runtime environment.
- Aggregation results, from functions such as `DataFrame.agg` and `DataFrame.describe`, no longer strip away non-printing characters from column names.
- Added support for the Python 3.9 runtime environment.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_generate_range`
  - `array_unique_agg`
  - `collect_set`
  - `sequence`
- Added support for registering and calling stored procedures with `TABLE` return type.
- Added support for parameter `length` in `StringType()` to specify the maximum number of characters that can be stored by the column.
- Added the alias `functions.element_at()` for `functions.get()`.
- Added the alias `Column.contains` for `functions.contains`.
- Added experimental feature `DataFrame.alias`.
- Added support for querying metadata columns from stage when creating `DataFrame` using `DataFrameReader`.
- Added support for `StructType.add` to append more fields to existing `StructType` objects.
- Added support for parameter `execute_as` in `StoredProcedureRegistration.register_from_file()` to specify stored procedure caller rights.
- Fixed a bug where the
Dataframe.join_table_functiondid not run all of the necessary queries to set up the join table function when SQL simplifier was enabled. - Fixed type hint declaration for custom types -
ColumnOrName,ColumnOrLiteralStr,ColumnOrSqlExpr,LiteralTypeandColumnOrLiteralthat were breakingmypychecks. - Fixed a bug where
DataFrameWriter.save_as_tableandDataFrame.copy_into_tablefailed to parse fully qualified table names.
- Added support for `session.getOrCreate`.
- Added support for alias `Column.getField`.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `date_add` and `date_sub` to make add and subtract operations easier.
  - `daydiff`
  - `explode`
  - `array_distinct`
  - `regexp_extract`
  - `struct`
  - `format_number`
  - `bround`
  - `substring_index`
- Added parameter `skip_upload_on_content_match` when creating UDFs, UDTFs and stored procedures using `register_from_file` to skip uploading files to a stage if the same version of the files is already on the stage.
- Added support for `DataFrameWriter.save_as_table` method to take table names that contain dots.
- Flattened generated SQL when `DataFrame.filter()` or `DataFrame.order_by()` is followed by a projection statement (e.g. `DataFrame.select()`, `DataFrame.with_column()`).
- Added support for creating dynamic tables (in private preview) using `DataFrame.create_or_replace_dynamic_table`.
- Added an optional argument `params` in `session.sql()` to support binding variables. Note that this is not supported in stored procedures yet.
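
For readers unfamiliar with binding variables, the idea behind the `params` argument can be sketched without a Snowflake account. The example below uses the standard-library `sqlite3` module purely as a stand-in, because it also accepts qmark-style (`?`) placeholders; the table and values are made up for illustration. With Snowpark the analogous call would presumably look like `session.sql("... WHERE amount > ?", params=[20.0])`, but check the API reference for the exact behaviour.

```python
import sqlite3

# Stand-in database: sqlite3 is NOT Snowflake; it is used here only because
# it supports the same qmark ("?") placeholder style for bound values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 40.0)],
)

# The value is passed separately from the SQL text instead of being
# formatted into the string, so the driver handles quoting and typing.
min_amount = 20.0
rows = conn.execute(
    "SELECT id FROM orders WHERE amount > ? ORDER BY id",
    (min_amount,),
).fetchall()
print(rows)  # [(2,), (3,)]
conn.close()
```

Keeping values out of the SQL string avoids manual escaping and makes the same statement reusable with different inputs.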
- Fixed a bug in `strtok_to_array` where an exception was thrown when a delimiter was passed in.
- Fixed a bug in `session.add_import` where the module had the same namespace as other dependencies.
- Added support for `delimiters` parameter in `functions.initcap()`.
- Added support for `functions.hash()` to accept a variable number of input expressions.
- Added API `Session.RuntimeConfig` for getting/setting/checking the mutability of any runtime configuration.
- Added API `Session.conf` for getting, setting or checking the mutability of any runtime configuration.
- Added support for managing case sensitivity in `Row` results from `DataFrame.collect` using `case_sensitive` parameter.
- Added indexer support for `snowflake.snowpark.types.StructType`.
- Added a keyword argument `log_on_exception` to `DataFrame.collect` and `DataFrame.collect_nowait` to optionally disable error logging for SQL exceptions.
- Fixed a bug where a DataFrame set operation (`DataFrame.subtract`, `DataFrame.union`, etc.) called after another DataFrame set operation and `DataFrame.select` or `DataFrame.with_column` threw an exception.
- Fixed a bug where chained sort statements were overwritten by the SQL simplifier.
- Simplified JOIN queries to use constant subquery aliases (`SNOWPARK_LEFT`, `SNOWPARK_RIGHT`) by default. Users can disable this at runtime with `session.conf.set('use_constant_subquery_alias', False)` to use randomly generated alias names instead.
- Allowed specifying statement parameters in `session.call()`.
- Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
- Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; turn it off by specifying `source_code_display=False` at registration.
- Added a parameter `if_not_exists` when creating a UDF, UDTF or Stored Procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.
- Accept integers when calling `snowflake.snowpark.functions.get` to extract value from array.
- Added `functions.reverse` in functions to open access to Snowflake built-in function reverse.
- Added parameter `require_scoped_url` in `snowflake.snowpark.files.SnowflakeFile.open()` (in private preview) to replace `is_owner_file`, which is marked for deprecation.
- Fixed a bug that overwrote `paramstyle` to `qmark` when creating a Snowpark session.
- Fixed a bug where `df.join(..., how="cross")` fails with `SnowparkJoinException: (1112): Unsupported using join type 'Cross'`.
- Fixed a bug where querying a `DataFrame` column created from chained function calls used a wrong column name.
- Added `asc`, `asc_nulls_first`, `asc_nulls_last`, `desc`, `desc_nulls_first`, `desc_nulls_last`, `date_part` and `unix_timestamp` in functions.
- Added the property `DataFrame.dtypes` to return a list of column name and data type pairs.
- Added the following aliases:
  - `functions.expr()` for `functions.sql_expr()`.
  - `functions.date_format()` for `functions.to_date()`.
  - `functions.monotonically_increasing_id()` for `functions.seq8()`.
  - `functions.from_unixtime()` for `functions.to_timestamp()`.
- Fixed a bug in SQL simplifier that didn’t handle Column alias and join well in some cases. See #658 for details.
- Fixed a bug in SQL simplifier that generated wrong column names for function calls, NaN and INF.
- The session parameter `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` is `True` after Snowflake 7.3 was released. In snowpark-python, `session.sql_simplifier_enabled` reads the value of `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` in Snowflake to `False` or run `session.sql_simplifier_enabled = False` from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.
- Added `Session.generator()` to create a new `DataFrame` using the Generator table function.
- Added a parameter `secure` to the functions that create a secure UDF or UDTF.
- Added new APIs for async job:
  - `Session.create_async_job()` to create an `AsyncJob` instance from a query id.
  - `AsyncJob.result()` now accepts argument `result_type` to return the results in different formats.
  - `AsyncJob.to_df()` returns a `DataFrame` built from the result of this asynchronous job.
  - `AsyncJob.query()` returns the SQL text of the executed query.
- `DataFrame.agg()` and `RelationalGroupedDataFrame.agg()` now accept variable-length arguments.
- Added parameters `lsuffix` and `rsuffix` to `DataFrame.join()` and `DataFrame.cross_join()` to conveniently rename overlapping columns.
- Added `Table.drop_table()` so you can drop the temp table after `DataFrame.cache_result()`. `Table` is also a context manager so you can use the `with` statement to drop the cache temp table after use.
- Added `Session.use_secondary_roles()`.
- Added functions `first_value()` and `last_value()`. (contributed by @chasleslr)
- Added `on` as an alias for `using_columns` and `how` as an alias for `join_type` in `DataFrame.join()`.
- Fixed a bug in `Session.create_dataframe()` that raised an error when `schema` names had special characters.
- Fixed a bug in which options set in `Session.read.option()` were not passed to `DataFrame.copy_into_table()` as default values.
- Fixed a bug in which `DataFrame.copy_into_table()` raises an error when a copy option has single quotes in the value.
- `Session.add_packages()` now raises `ValueError` when the version of a package cannot be found in Snowflake Anaconda channel. Previously, `Session.add_packages()` succeeded, and a `SnowparkSQLException` exception was raised later in the UDF/SP registration step.
- Added method `FileOperation.get_stream()` to support downloading stage files as stream.
- Added support in `functions.ntile()` to accept int argument.
- Added the following aliases:
  - `functions.call_function()` for `functions.call_builtin()`.
  - `functions.function()` for `functions.builtin()`.
  - `DataFrame.order_by()` for `DataFrame.sort()`.
  - `DataFrame.orderBy()` for `DataFrame.sort()`.
- Improved `DataFrame.cache_result()` to return a more accurate `Table` class instead of a `DataFrame` class.
- Added support to allow `session` as the first argument when calling `StoredProcedure`.
- Improved nested query generation by flattening queries when applicable.
  - This improvement could be enabled by setting `Session.sql_simplifier_enabled = True`.
  - `DataFrame.select()`, `DataFrame.with_column()`, `DataFrame.drop()` and other select-related APIs have more flattened SQLs.
  - `DataFrame.union()`, `DataFrame.union_all()`, `DataFrame.except_()`, `DataFrame.intersect()`, `DataFrame.union_by_name()` have flattened SQLs generated when multiple set operators are chained.
- Improved type annotations for async job APIs.
- Fixed a bug in which `Table.update()`, `Table.delete()`, `Table.merge()` try to reference a temp table that does not exist.
- Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries:
  - Added keyword argument `block` to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
    - `DataFrame.collect()`, `DataFrame.to_local_iterator()`, `DataFrame.to_pandas()`, `DataFrame.to_pandas_batches()`, `DataFrame.count()`, `DataFrame.first()`.
    - `DataFrameWriter.save_as_table()`, `DataFrameWriter.copy_into_location()`.
    - `Table.delete()`, `Table.update()`, `Table.merge()`.
  - Added method `DataFrame.collect_nowait()` to allow asynchronous evaluations.
  - Added class `AsyncJob` to retrieve results from asynchronously executed queries and check their status.
- Added support for `table_type` in `Session.write_pandas()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
- Added support for using Python structured data (`list`, `tuple` and `dict`) as literal values in Snowpark.
- Added keyword argument `execute_as` to `functions.sproc()` and `session.sproc.register()` to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
- Added support for displaying details of a Snowpark session.
- Fixed a bug in which `DataFrame.copy_into_table()` and `DataFrameWriter.save_as_table()` mistakenly created a new table if the table name is fully qualified, and the table already exists.
- Deprecated keyword argument `create_temp_table` in `Session.write_pandas()`.
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
- Updated `snowflake-connector-python` to 2.7.12.
- Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass the new keyword argument `source_code_display` as `False` when calling `register()` or `@udf()`.
- Added support for calling table functions from `DataFrame.select()`, `DataFrame.with_column()` and `DataFrame.with_columns()` which now take parameters of type `table_function.TableFunctionCall` for columns.
- Added keyword argument `overwrite` to `session.write_pandas()` to allow overwriting contents of a Snowflake table with that of a pandas DataFrame.
- Added keyword argument `column_order` to `df.write.save_as_table()` to specify the matching rules when inserting data into table in append mode.
- Added method `FileOperation.put_stream()` to upload local files to a stage via file stream.
- Added methods `TableFunctionCall.alias()` and `TableFunctionCall.as_()` to allow aliasing the names of columns that come from the output of table function joins.
- Added function `get_active_session()` in module `snowflake.snowpark.context` to get the current active Snowpark session.
- Fixed a bug in which batch insert raised an error when `statement_params` was not passed to the function.
- Fixed a bug in which column names were not quoted when `session.create_dataframe()` was called with dicts and a given schema.
- Fixed a bug in which table creation was not skipped when the table already existed and `df.write.save_as_table()` was called in append mode.
- Fixed a bug in which third-party packages with underscores cannot be added when registering UDFs.
- Improved function `functions.uniform()` to infer the types of inputs `max_` and `min_` and cast the limits to `IntegerType` or `FloatType` correspondingly.
- Added keyword only argument `statement_params` to the following methods to allow for specifying statement level parameters:
  - `collect`, `to_local_iterator`, `to_pandas`, `to_pandas_batches`, `count`, `copy_into_table`, `show`, `create_or_replace_view`, `create_or_replace_temp_view`, `first`, `cache_result` and `random_split` on class `snowflake.snowpark.DataFrame`.
  - `update`, `delete` and `merge` on class `snowflake.snowpark.Table`.
  - `save_as_table` and `copy_into_location` on class `snowflake.snowpark.DataFrameWriter`.
  - `approx_quantile`, `statement_params`, `cov` and `crosstab` on class `snowflake.snowpark.DataFrameStatFunctions`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udf.UDFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udtf.UDTFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.stored_procedure.StoredProcedureRegistration`.
  - `udf`, `udtf` and `sproc` in `snowflake.snowpark.functions`.
- Added support for `Column` as an input argument to `session.call()`.
- Added support for `table_type` in `df.write.save_as_table()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
- Added validation of object name in `session.use_*` methods.
- Updated the query tag in SQL to escape it when it has special characters.
- Added a check to see if Anaconda terms are acknowledged when adding missing packages.
- Fixed the limited length of the string column in `session.create_dataframe()`.
- Fixed a bug in which `session.create_dataframe()` mistakenly converted 0 and `False` to `None` when the input data was only a list.
- Fixed a bug in which calling `session.create_dataframe()` using a large local dataset sometimes created a temp table twice.
- Aligned the definition of `functions.trim()` with the SQL function definition.
- Fixed an issue where snowpark-python would hang when using the Python system-defined (built-in function) `sum` vs. the Snowpark `functions.sum()`.
- Deprecated keyword argument `create_temp_table` in `df.write.save_as_table()`.
- Added support for user-defined table functions (UDTFs).
  - Use function `snowflake.snowpark.functions.udtf()` to register a UDTF, or use it as a decorator to register the UDTF.
    - You can also use `Session.udtf.register()` to register a UDTF.
  - Use `Session.udtf.register_from_file()` to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
  - Use function `snowflake.snowpark.functions.table_function()` to create a callable representing a table function and use it to call the table function in a query.
  - Alternatively, use function `snowflake.snowpark.functions.call_table_function()` to call a table function.
  - Added support for `over` clause that specifies `partition by` and `order by` when lateral joining a table function.
  - Updated `Session.table_function()` and `DataFrame.join_table_function()` to accept `TableFunctionCall` instances.
- When creating a function with `functions.udf()` and `functions.sproc()`, you can now specify an empty list for the `imports` or `packages` argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the `__repr__` implementation of data types in `types.py`. The unused `type_name` property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous `ProgrammingError` from the Python connector.
- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.
- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling `DataFrame.to_pandas()`.
- Fixed a bug in which `DataFrameReader.parquet()` failed to read a parquet file when its column contained spaces.
- Fixed a bug in which `DataFrame.copy_into_table()` failed when the dataframe is created by reading a file with inferred schemas.
Session.flatten() and DataFrame.flatten().
- Restricted the version of `cloudpickle` to `<=2.0.0`.
- Added support for vectorized UDFs with the input as a pandas DataFrame or pandas Series and the output as a pandas Series. This improves the performance of UDFs in Snowpark.
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added functions `current_session()`, `current_statement()`, `current_user()`, `current_version()`, `current_warehouse()`, `date_from_parts()`, `date_trunc()`, `dayname()`, `dayofmonth()`, `dayofweek()`, `dayofyear()`, `grouping()`, `grouping_id()`, `hour()`, `last_day()`, `minute()`, `next_day()`, `previous_day()`, `second()`, `month()`, `monthname()`, `quarter()`, `year()`, `current_database()`, `current_role()`, `current_schema()`, `current_schemas()`, `current_region()`, `current_available_roles()`, `add_months()`, `any_value()`, `bitnot()`, `bitshiftleft()`, `bitshiftright()`, `convert_timezone()`, `uniform()`, `strtok_to_array()`, `sysdate()`, `time_from_parts()`, `timestamp_from_parts()`, `timestamp_ltz_from_parts()`, `timestamp_ntz_from_parts()`, `timestamp_tz_from_parts()`, `weekofyear()`, `percentile_cont()` to `snowflake.snowpark.functions`.
- Expired deprecations:
  - Removed the following APIs that were deprecated in 0.4.0: `DataFrame.groupByGroupingSets()`, `DataFrame.naturalJoin()`, `DataFrame.joinTableFunction()`, `DataFrame.withColumns()`, `Session.getImports()`, `Session.addImport()`, `Session.removeImport()`, `Session.clearImports()`, `Session.getSessionStage()`, `Session.getDefaultDatabase()`, `Session.getDefaultSchema()`, `Session.getCurrentDatabase()`, `Session.getCurrentSchema()`, `Session.getFullyQualifiedCurrentSchema()`.
- Added support for creating an empty `DataFrame` with a specific schema using the `Session.create_dataframe()` method.
- Changed the logging level from `INFO` to `DEBUG` for several logs (e.g., the executed query) when evaluating a dataframe.
- Improved the error message when failing to create a UDF due to pickle errors.
- Removed pandas hard dependencies in the `Session.create_dataframe()` method.
- Added `typing-extensions` as a new dependency with the version >=4.1.0.
- Added stored procedures API.
  - Added `Session.sproc` property and `sproc()` to `snowflake.snowpark.functions`, so you can register stored procedures.
  - Added `Session.call` to call stored procedures by name.
- Added `UDFRegistration.register_from_file()` to allow registering UDFs from Python source files or zip files directly.
- Added `UDFRegistration.describe()` to describe a UDF.
- Added `DataFrame.random_split()` to provide a way to randomly split a dataframe.
- Added functions `md5()`, `sha1()`, `sha2()`, `ascii()`, `initcap()`, `length()`, `lower()`, `lpad()`, `ltrim()`, `rpad()`, `rtrim()`, `repeat()`, `soundex()`, `regexp_count()`, `replace()`, `charindex()`, `collate()`, `collation()`, `insert()`, `left()`, `right()`, `endswith()` to `snowflake.snowpark.functions`.
- Allowed `call_udf()` to accept literal values.
- Provided a `distinct` keyword in `array_agg()`.
- Fixed an issue that caused `DataFrame.to_pandas()` to have a string column if `Column.cast(IntegerType())` was used.
- Fixed a bug in `DataFrame.describe()` when there is more than one string column.
- You can now specify which Anaconda packages to use when defining UDFs.
  - Added `add_packages()`, `get_packages()`, `clear_packages()`, and `remove_package()` to class `Session`.
  - Added `add_requirements()` to `Session` so you can use a requirements file to specify which packages this session will use.
  - Added parameter `packages` to function `snowflake.snowpark.functions.udf()` and method `UserDefinedFunction.register()` to indicate UDF-level Anaconda package dependencies when creating a UDF.
  - Added parameter `imports` to `snowflake.snowpark.functions.udf()` and `UserDefinedFunction.register()` to specify UDF-level code imports.
- Added a parameter `session` to function `udf()` and `UserDefinedFunction.register()` so you can specify which session to use to create a UDF if you have multiple sessions.
- Added types `Geography` and `Variant` to `snowflake.snowpark.types` to be used as type hints for Geography and Variant data when defining a UDF.
- Added support for Geography geoJSON data.
- Added `Table`, a subclass of `DataFrame` for table operations:
  - Methods `update` and `delete` update and delete rows of a table in Snowflake.
  - Method `merge` merges data from a `DataFrame` to a `Table`.
  - Override method `DataFrame.sample()` with an additional parameter `seed`, which works on tables but not on views and sub-queries.
- Added `DataFrame.to_local_iterator()` and `DataFrame.to_pandas_batches()` to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
- Added `DataFrame.cache_result()` for caching the operations performed on a `DataFrame` in a temporary table. Subsequent operations on the original `DataFrame` have no effect on the cached result `DataFrame`.
- Added property `DataFrame.queries` to get SQL queries that will be executed to evaluate the `DataFrame`.
- Added `Session.query_history()` as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate `DataFrame`s created from a session. Both query ID and query text are recorded.
- You can now create a `Session` instance from an existing established `snowflake.connector.SnowflakeConnection`. Use parameter `connection` in `Session.builder.configs()`.
- Added `use_database()`, `use_schema()`, `use_warehouse()`, and `use_role()` to class `Session` to switch database/schema/warehouse/role after a session is created.
- Added `DataFrameWriter.copy_into_location()` to unload a `DataFrame` to stage files.
- Added `DataFrame.unpivot()`.
- Added `Column.within_group()` for sorting the rows by columns with some aggregation functions.
- Added functions `listagg()`, `mode()`, `div0()`, `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cosh()`, `sin()`, `sinh()`, `tan()`, `tanh()`, `degrees()`, `radians()`, `round()`, `trunc()`, and `factorial()` to `snowflake.snowpark.functions`.
- Added an optional argument `ignore_nulls` in function `lead()` and `lag()`.
- The `condition` parameter of function `when()` and `iff()` now accepts SQL expressions.
- All function and method names have been renamed to use the snake case naming style, which is more Pythonic. For convenience, some camel case names are kept as aliases to the snake case APIs. It is recommended to use the snake case APIs.
  - Deprecated these methods on class `Session` and replaced them with their snake case equivalents: `getImports()`, `addImport()`, `removeImport()`, `clearImports()`, `getSessionStage()`, `getDefaultDatabase()`, `getDefaultSchema()`, `getCurrentDatabase()`, `getFullyQualifiedCurrentSchema()`.
  - Deprecated these methods on class `DataFrame` and replaced them with their snake case equivalents: `groupByGroupingSets()`, `naturalJoin()`, `withColumns()`, `joinTableFunction()`.
- Property `DataFrame.columns` is now consistent with `DataFrame.schema.names` and the Snowflake database Identifier Requirements.
- `Column.__bool__()` now raises a `TypeError`. This bans the use of the logical operators `and`, `or`, `not` on `Column` objects; for instance, `col("a") > 1 and col("b") > 2` will raise the `TypeError`. Use `(col("a") > 1) & (col("b") > 2)` instead.
- Changed `PutResult` and `GetResult` to subclass `NamedTuple`.
- Fixed a bug which raised an error when the local path or stage location has a space or other special characters.
- Changed `DataFrame.describe()` so that non-numeric and non-string columns are ignored instead of raising an exception.
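
The `Column.__bool__()` change can be illustrated without a Snowflake connection. The sketch below uses a hypothetical `FakeColumn` class and `col()` helper (stand-ins written for this example, not the real `snowflake.snowpark.Column`) to show why `and` fails while `&` works: Python's `and` calls `bool()` on its left operand, whereas `&` dispatches to `__and__`, which can build a combined expression.

```python
class FakeColumn:
    """Hypothetical stand-in for a Snowpark Column; not the real class."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __gt__(self, other) -> "FakeColumn":
        # Comparison builds a new expression instead of returning a bool.
        return FakeColumn(f"({self.expr} > {other!r})")

    def __and__(self, other: "FakeColumn") -> "FakeColumn":
        # `&` combines two expressions into one.
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __bool__(self) -> bool:
        # Python's `and`, `or` and `not` call bool() on their operands.
        raise TypeError("Cannot convert a column expression to bool; use & | ~")


def col(name: str) -> FakeColumn:
    """Hypothetical helper mirroring the spelling used in the note above."""
    return FakeColumn(name)


combined = (col("a") > 1) & (col("b") > 2)
print(combined.expr)  # ((a > 1) AND (b > 2))

try:
    col("a") > 1 and col("b") > 2  # `and` triggers __bool__ on the left side
except TypeError as exc:
    print("TypeError:", exc)
```

The parentheses around each comparison matter because `&` binds more tightly than `>` in Python.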
- Updated `snowflake-connector-python` to 2.7.4.
- Added `Column.isin()`, with an alias `Column.in_()`.
- Added `Column.try_cast()`, which is a special version of `cast()`. It tries to cast a string expression to other types and returns `null` if the cast is not possible.
- Added `Column.startswith()` and `Column.substr()` to process string columns.
- `Column.cast()` now also accepts a `str` value to indicate the cast type in addition to a `DataType` instance.
- Added `DataFrame.describe()` to summarize stats of a `DataFrame`.
- Added `DataFrame.explain()` to print the query plan of a `DataFrame`.
- `DataFrame.filter()` and `DataFrame.select_expr()` now accept a SQL expression.
- Added a new `bool` parameter `create_temp_table` to methods `DataFrame.saveAsTable()` and `Session.write_pandas()` to optionally create a temp table.
- Added `DataFrame.minus()` and `DataFrame.subtract()` as aliases to `DataFrame.except_()`.
- Added `regexp_replace()`, `concat()`, `concat_ws()`, `to_char()`, `current_timestamp()`, `current_date()`, `current_time()`, `months_between()`, `cast()`, `try_cast()`, `greatest()`, `least()`, and `hash()` to module `snowflake.snowpark.functions`.
- Fixed an issue where `Session.createDataFrame(pandas_df)` and `Session.write_pandas(pandas_df)` raise an exception when the `pandas DataFrame` has spaces in the column name.
- `DataFrame.copy_into_table()` sometimes printed an `error` level log entry while it actually worked. It's fixed now.
- Fixed an API docs issue where some `DataFrame` APIs are missing from the docs.
- Updated `snowflake-connector-python` to 2.7.2, which upgrades the `pyarrow` dependency to 6.0.x. Refer to the python connector 2.7.2 release notes for more details.
- Updated the `Session.createDataFrame()` method for creating a `DataFrame` from a pandas DataFrame.
- Added the `Session.write_pandas()` method for writing a `pandas DataFrame` to a table in Snowflake and getting a `Snowpark DataFrame` object back.
- Added new classes and methods for calling window functions.
- Added the new functions `cume_dist()`, to find the cumulative distribution of a value with regard to other values within a window partition, and `row_number()`, which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the `DataFrameStatFunctions` class.
- Added functions for handling missing values in a DataFrame in the `DataFrameNaFunctions` class.
- Added new methods `rollup()`, `cube()`, and `pivot()` to the `DataFrame` class.
- Added the `GroupingSets` class, which you can use with the DataFrame groupByGroupingSets method to perform a SQL GROUP BY GROUPING SETS.
- Added the new `FileOperation(session)` class that you can use to upload and download files to and from a stage.
- Added the `DataFrame.copy_into_table()` method for loading data from files in a stage into a table.
- In CASE expressions, the functions `when()` and `otherwise()` now accept Python types in addition to `Column` objects.
- When you register a UDF you can now optionally set the `replace` parameter to `True` to overwrite an existing UDF with the same name.
- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it will be uploaded as in-line code instead of uploaded to a stage.
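
The two notes above describe a compress-then-choose upload path. A rough sketch of that decision follows; it is an illustration only, not Snowpark's implementation. The release note does not say which compression format is used or whether the 8196-byte limit applies before or after compression, so `zlib` and the "after compression" check are assumptions, and `plan_upload` is a hypothetical helper name.

```python
import zlib
from typing import Tuple

# Threshold quoted in the release note above.
INLINE_LIMIT_BYTES = 8196


def plan_upload(serialized_udf: bytes) -> Tuple[str, bytes]:
    """Compress a serialized UDF and choose how it would be shipped.

    Assumptions (not confirmed by the changelog): zlib stands in for the
    real compression, and the limit is compared to the compressed size.
    """
    compressed = zlib.compress(serialized_udf)
    mode = "inline" if len(compressed) < INLINE_LIMIT_BYTES else "stage"
    return mode, compressed


source = b"def add_one(x):\n    return x + 1\n"
mode, blob = plan_upload(source)
print(mode)                            # inline
print(zlib.decompress(blob) == source)  # True
```

A small function body stays well under the limit, while a payload that bundles a large model file would fall through to the stage upload path.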
- Fixed an issue where the statement `df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)]` raised an exception.
- Fixed an issue where `df.toPandas()` raised an exception when a DataFrame was created from large local data.
Start of Private Preview