Releases: snowflakedb/snowpark-python
1.12.1 (2024-02-08)
Improvements
- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables `PyArrow` to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
Bug Fixes
- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating on a DataFrame with an `IntegerType` column with null values.
1.12.0 (2024-01-30)
New Features
- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`:
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to `True`, each uploaded file is fully hashed instead.
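The idea behind `chunk_size` and `whole_file_hash` can be sketched in plain Python (a hypothetical standalone illustration with `hashlib`, not Snowpark's actual implementation):

```python
import hashlib

def file_digest(path, chunk_size=8192, whole_file_hash=False):
    """Hash only the first chunk of a file (fast) or the entire file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if whole_file_hash:
            # Read and hash the whole file, one chunk at a time.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        else:
            # Hash the first chunk only, trading accuracy for speed.
            h.update(f.read(chunk_size))
    return h.hexdigest()
```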
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`. Allows an additional tag to be added to the current query tag by appending it as a comma-separated value.
- Added a new method `Session.update_query_tag`. Allows updates to a JSON-encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
  - Added the `moving_agg` function in `DataFrame.analytics` to enable moving aggregations like sums and averages with multiple window sizes.
  - Added the `cumulative_agg` function in `DataFrame.analytics` to enable cumulative aggregations like sums and averages.
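As a rough analogy, the kinds of results these helpers produce correspond to pandas' rolling and cumulative operations (illustrative only; the Snowpark API, window definitions, and output column naming differ):

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 20, 30, 40]})
# Moving sum over a window of 2 rows, like moving_agg with window size 2.
df["sales_sum_2"] = df["sales"].rolling(window=2).sum()
# Running total, like a cumulative sum aggregation.
df["sales_cumsum"] = df["sales"].cumsum()
```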
Bug Fixes
- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
  - Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`, but will now be correctly maintained as timestamp values and be inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier, timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information, but will now be correctly inferred as `TimestampType(TimestampTimeZone.LTZ)` and timezone information is retained correctly.
  - Set session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert to the old behavior. It is recommended that you update your code to align with the correct behavior because the parameter will be removed in the future.
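On the pandas side, the distinction Snowpark now preserves is between timezone-naive and timezone-aware timestamps (illustrative sketch, not Snowpark code):

```python
import pandas as pd

# Naive timestamp: no timezone attached, maps to TIMESTAMP_NTZ semantics.
naive = pd.Timestamp("2024-01-30 12:00:00")
# Timezone-aware timestamp: maps to TIMESTAMP_LTZ semantics, tz is retained.
aware = pd.Timestamp("2024-01-30 12:00:00", tz="UTC")
```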
- Fixed a bug that `DataFrame.to_pandas` gets decimal type when scale is not 0, and creates an object dtype in `pandas`. Instead, we cast the value to a float64 type.
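The dtype issue can be reproduced in pandas alone (a sketch): `Decimal` values land in an `object` column unless cast to `float64`, which is what the fix now does for nonzero-scale decimals:

```python
from decimal import Decimal

import pandas as pd

# Decimal values are stored as Python objects, giving an object dtype.
s_obj = pd.Series([Decimal("1.5"), Decimal("2.25")])
# Casting to float64 yields a proper numeric column.
s_f64 = s_obj.astype("float64")
```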
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column. For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance,

  ```python
  df = df.select(col("a").alias("b"))
  df = copy(df)
  df.select(col("b").alias("c"))  # threw an error. Now it's fixed.
  ```
- Fixed a bug in `Session.create_dataframe` that the non-nullable field in a schema is not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.
Behavior Changes (API Compatible)
- When parsing data types during a `to_pandas` operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned behavior for `Session.call` in case of table stored procedures, where running `Session.call` would not trigger the stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` will now automatically add `snowflake-snowpark-python` as a package dependency. The added dependency will be on the client's local version of the library, and an error is thrown if the server cannot support that version.
1.11.1 (2023-12-07)
Bug Fixes
- Fixed a bug where `numpy` was imported at the top level of the mock module instead of lazily.
1.11.0 (2023-12-05)
New Features
- Added the `conn_error` attribute to `SnowflakeSQLException` that stores the whole underlying exception from `snowflake-connector-python`.
- Added support for `RelationalGroupedDataframe.pivot()` to access `pivot` in the following pattern: `Dataframe.group_by(...).pivot(...)`.
- Added experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
- Added support for the new function `arrays_to_object` in `snowflake.snowpark.functions`.
- Added support for the vector data type.
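A pandas analogy of what the `group_by(...).pivot(...)` pattern expresses (illustrative only; the Snowpark call compiles to SQL and runs on Snowflake):

```python
import pandas as pd

df = pd.DataFrame(
    {"emp": [1, 1, 2], "month": ["JAN", "FEB", "JAN"], "amount": [10.0, 20.0, 30.0]}
)
# Group by employee, then pivot months into columns, summing amounts.
pivoted = df.pivot_table(index="emp", columns="month", values="amount", aggfunc="sum")
```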
Dependency Updates
- Bumped cloudpickle dependency to work with `cloudpickle==2.2.1`.
- Updated `snowflake-connector-python` to `3.4.0`.
Bug Fixes
- DataFrame column names quoting check now supports newline characters.
- Fixed a bug where a DataFrame generated by `session.read.with_metadata` creates an inconsistent table when doing `df.write.save_as_table`.
1.10.0 (2023-11-03)
New Features
- Added support for managing case sensitivity in `DataFrame.to_local_iterator()`.
- Added support for specifying vectorized UDTF's input column names by using the optional parameter `input_names` in `UDTFRegistration.register/register_file` and `functions.pandas_udtf`. By default, `RelationalGroupedDataFrame.applyInPandas` will infer the column names from the current dataframe schema.
- Added `sql_error_code` and `raw_message` attributes to `SnowflakeSQLException` when it is caused by a SQL exception.
Bug Fixes
- Fixed a bug in `DataFrame.to_pandas()` where converting Snowpark dataframes to pandas dataframes was losing precision on integers with more than 19 digits.
- Fixed a bug that `session.add_packages` cannot handle requirement specifiers that contain a project name with an underscore and a version.
- Fixed a bug in `DataFrame.limit()` when `offset` is used and the parent `DataFrame` uses `limit`. Now the `offset` won't impact the parent DataFrame's `limit`.
- Fixed a bug in `DataFrame.write.save_as_table` where dataframes created from the read API could not save data into Snowflake because of an invalid column name `$1`.
Behavior Changes
- Changed the behavior of `date_format`:
  - The `format` argument changed from optional to required.
  - The returned result changed from a date object to a date-formatted string.
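In Python terms, the change is analogous to returning `strftime` output instead of a date object (a sketch; Snowflake's format tokens such as `YYYY/MM/DD` differ from `strftime` codes):

```python
from datetime import date

d = date(2023, 11, 3)
# The result is a formatted string, not a date object.
formatted = d.strftime("%Y/%m/%d")
```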
- When a window function, or a sequence-dependent data generator function (`normal`, `zipf`, `uniform`, `seq1`, `seq2`, `seq4`, `seq8`), is used, the sort and filter operation will no longer be flattened when generating the query.
1.9.0 (2023-10-13)
New Features
- Added support for the Python 3.11 runtime environment.
Dependency updates
- Added back the dependency of `typing-extensions`.
Bug Fixes
- Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
- Reverted back to using a CTAS (create table as select) statement for `Dataframe.writer.save_as_table`, which does not need insert permission for writing tables.
New Features
- Support `PythonObjJSONEncoder` json-serializable objects for `ARRAY` and `OBJECT` literals.
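The mechanism resembles the standard-library pattern of extending `json.JSONEncoder` (a minimal stdlib sketch of the general pattern, not the Snowpark class itself):

```python
import json
from datetime import date

class DateEncoder(json.JSONEncoder):
    """Serialize date objects that json.dumps cannot handle natively."""
    def default(self, obj):
        if isinstance(obj, date):
            return obj.isoformat()
        return super().default(obj)

encoded = json.dumps({"d": date(2023, 10, 13)}, cls=DateEncoder)
```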
1.8.0 (2023-09-14)
New Features
- Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
- Added support for specifying clustering keys when saving dataframes using `DataFrame.save_as_table`.
- Accept `Iterable` objects as input for `schema` when creating dataframes using `Session.create_dataframe`.
- Added the property `DataFrame.session` to return a `Session` object.
- Added the property `Session.session_id` to return an integer that represents the session ID.
- Added the property `Session.connection` to return a `SnowflakeConnection` object.
- Added support for creating a Snowpark session from a configuration file or environment variables.
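A sketch of sourcing connection parameters from environment variables (the variable names here are hypothetical; with real credentials present, `Session.builder.configs(...).create()` would create the session):

```python
import os

# Hypothetical environment-variable names, for illustration only.
connection_parameters = {
    "account": os.environ.get("SNOWFLAKE_ACCOUNT", "<account>"),
    "user": os.environ.get("SNOWFLAKE_USER", "<user>"),
    "password": os.environ.get("SNOWFLAKE_PASSWORD", "<password>"),
}
# With credentials in place, a session could then be created with:
# Session.builder.configs(connection_parameters).create()
```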
Dependency updates
- Updated `snowflake-connector-python` to 3.2.0.
Bug Fixes
- Fixed a bug where automatic package upload would raise `ValueError` even when compatible package versions were added in `session.add_packages`.
- Fixed a bug where table stored procedures were not registered correctly when using `register_from_file`.
- Fixed a bug where dataframe joins failed with `invalid_identifier` error.
- Fixed a bug where `DataFrame.copy` disables the SQL simplifier for the returned copy.
- Fixed a bug where `session.sql().select()` would fail if any parameters are specified to `session.sql()`.
1.7.0 (2023-08-28)
New Features
- Added parameters `external_access_integrations` and `secrets` when creating a UDF, UDTF or Stored Procedure from Snowpark Python to allow integration with external access.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_flatten`
  - `flatten`
- Added support for `apply_in_pandas` in `snowflake.snowpark.relational_grouped_dataframe`.
- Added support for replicating your local Python environment on Snowflake via `Session.replicate_local_environment`.
Bug Fixes
- Fixed a bug where `session.create_dataframe` fails to properly set nullable columns where nullability was affected by the order in which data was given.
- Fixed a bug where `DataFrame.select` could not identify and alias columns in presence of table functions when output columns of the table function overlapped with columns in the dataframe.
Behavior Changes
- When creating stored procedures, UDFs, UDTFs, UDAFs with parameter `is_permanent=False`, temporary objects will now be created even when `stage_name` is provided. The default value of `is_permanent` is `False`, so users who do not explicitly set this value to `True` for permanent objects will notice a change in behavior.
- `types.StructField` now enquotes column identifiers by default.
1.6.1 (2023-08-02)
New Features
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_sort`
  - `sort_array`
  - `array_min`
  - `array_max`
  - `explode_outer`
- Added support for pure Python packages specified via `Session.add_requirements` or `Session.add_packages`. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
  - Added Session parameters `custom_packages_upload_enabled` and `custom_packages_force_upload_enabled` to enable the support for the pure Python packages feature mentioned above. Both parameters default to `False`.
- Added support for specifying package requirements by passing a Conda environment yaml file to `Session.add_requirements`.
- Added support for asynchronous execution of multi-query dataframes that contain binding variables.
- Added support for renaming multiple columns in `DataFrame.rename`.
- Added support for Geometry datatypes.
- Added support for `params` in `session.sql()` in stored procedures.
- Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.
- Added support for vectorized UDTFs (user-defined table functions). This feature is currently in public preview.
- Added support for Snowflake Timestamp variants (i.e., `TIMESTAMP_NTZ`, `TIMESTAMP_LTZ`, `TIMESTAMP_TZ`):
  - Added `TimestampTimezone` as an argument in the `TimestampType` constructor.
  - Added type hints `NTZ`, `LTZ`, `TZ` and `Timestamp` to annotate functions when registering UDFs.
Improvements
- Removed redundant dependency `typing-extensions`.
- `DataFrame.cache_result` now creates temp tables with fully qualified names under the current database and current schema.
Bug Fixes
- Fixed a bug where type check happens on pandas before it is imported.
- Fixed a bug when creating a UDF from `numpy.ufunc`.
- Fixed a bug where `DataFrame.union` was not generating the correct `Selectable.schema_query` when the SQL simplifier is enabled.
Behavior Changes
- `DataFrameWriter.save_as_table` now respects the `nullable` field of the schema provided by the user or the inferred schema based on data from user input.
Dependency updates
- Updated `snowflake-connector-python` to 3.0.4.
1.5.1 (2023-06-20)
New Features
- Added support for the Python 3.10 runtime environment.