Releases: snowflakedb/snowpark-python
Release
1.5.0 (2023-06-09)
Behavior Changes
- Aggregation results, from functions such as DataFrame.agg and DataFrame.describe, no longer strip away non-printing characters from column names.
New Features
- Added support for the Python 3.9 runtime environment.
- Added support for new functions in snowflake.snowpark.functions:
  - array_generate_range
  - array_unique_agg
  - collect_set
  - sequence
- Added support for registering and calling stored procedures with TABLE return type.
- Added support for parameter length in StringType() to specify the maximum number of characters that can be stored by the column.
- Added the alias functions.element_at() for functions.get().
- Added the alias Column.contains for functions.contains.
- Added experimental feature DataFrame.alias.
- Added support for querying metadata columns from stage when creating DataFrame using DataFrameReader.
- Added support for StructType.add to append more fields to existing StructType objects.
- Added support for parameter execute_as in StoredProcedureRegistration.register_from_file() to specify stored procedure caller rights.
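The StringType length parameter and StructType.add can be combined to build a schema step by step. The following is a minimal sketch, not the library's reference usage: the field names are hypothetical, and the import is deferred into the function only so the snippet can be loaded without Snowpark installed.

```python
def build_customer_schema():
    """Build a schema incrementally with StructType.add (Snowpark 1.5.0 or later)."""
    # Deferred import: requires the snowflake-snowpark-python package.
    from snowflake.snowpark.types import IntegerType, StringType, StructType

    schema = StructType([])
    # Hypothetical field names, chosen only for illustration.
    schema.add("id", IntegerType())
    # StringType(16) declares a maximum length of 16 characters for the column.
    schema.add("name", StringType(16))
    return schema
```

The returned schema can then be passed wherever a StructType is accepted, for example as the schema argument of Session.create_dataframe().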
Bug Fixes
- Fixed a bug where DataFrame.join_table_function did not run all of the necessary queries to set up the join table function when SQL simplifier was enabled.
- Fixed type hint declaration for custom types - ColumnOrName, ColumnOrLiteralStr, ColumnOrSqlExpr, LiteralType and ColumnOrLiteral that were breaking mypy checks.
- Fixed a bug where DataFrameWriter.save_as_table and DataFrame.copy_into_table failed to parse fully qualified table names.
Release
1.4.0 (2023-04-24)
New Features
- Added support for session.getOrCreate.
- Added support for alias Column.getField.
- Added support for new functions in snowflake.snowpark.functions:
  - date_add and date_sub to make add and subtract operations easier.
  - daydiff
  - explode
  - array_distinct
  - regexp_extract
  - struct
  - format_number
  - bround
  - substring_index
- Added parameter skip_upload_on_content_match when creating UDFs, UDTFs and stored procedures using register_from_file to skip uploading files to a stage if the same version of the files is already on the stage.
- Added support for DataFrame.save_as_table method to take table names that contain dots.
- Flattened generated SQL when DataFrame.filter() or DataFrame.order_by() is followed by a projection statement (e.g. DataFrame.select(), DataFrame.with_column()).
- Added support for creating dynamic tables (in private preview) using DataFrame.create_or_replace_dynamic_table.
- Added an optional argument params in session.sql() to support binding variables. Note that this is not supported in stored procedures yet.
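To illustrate the params argument of session.sql(), here is a minimal sketch that binds two values to ? placeholders instead of formatting them into the SQL text. The orders table, its columns, and the helper name are hypothetical; session is assumed to be an existing Snowpark Session running outside a stored procedure.

```python
def orders_above(session, min_amount, region):
    """Query a hypothetical ORDERS table using bound variables."""
    query = "SELECT order_id, amount FROM orders WHERE amount > ? AND region = ?"
    # The values travel separately from the SQL text, one per placeholder,
    # in the order the placeholders appear.
    return session.sql(query, params=[min_amount, region]).collect()
```

Calling orders_above(session, 100, "EMEA") would run the query with 100 and "EMEA" bound to the first and second placeholder respectively.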
Bug Fixes
- Fixed a bug in strtok_to_array where an exception was thrown when a delimiter was passed in.
- Fixed a bug in session.add_import where the module had the same namespace as other dependencies.
Release
1.3.0 (2023-03-28)
New Features
- Added support for delimiters parameter in functions.initcap().
- Added support for functions.hash() to accept a variable number of input expressions.
- Added API Session.conf for getting, setting or checking the mutability of any runtime configuration.
- Added support for managing case sensitivity in Row results from DataFrame.collect using case_sensitive parameter.
- Added indexer support for snowflake.snowpark.types.StructType.
- Added a keyword argument log_on_exception to DataFrame.collect and DataFrame.collect_nowait to optionally disable error logging for SQL exceptions.
Bug Fixes
- Fixed a bug where a DataFrame set operation (DataFrame.subtract, DataFrame.union, etc.) being called after another DataFrame set operation and DataFrame.select or DataFrame.with_column throws an exception.
- Fixed a bug where chained sort statements are overwritten by the SQL simplifier.
Improvements
- Simplified JOIN queries to use constant subquery aliases (SNOWPARK_LEFT, SNOWPARK_RIGHT) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False) to use randomly generated alias names instead.
- Allowed specifying statement parameters in session.call().
- Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
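The Session.conf API and the use_constant_subquery_alias setting mentioned in this release can be used together. A minimal sketch, assuming session is an existing Snowpark Session (the helper name is hypothetical):

```python
def use_random_join_aliases(session):
    """Switch JOIN subquery aliases back to randomly generated names."""
    key = "use_constant_subquery_alias"
    # Session.conf can report whether a setting may be changed at runtime.
    if session.conf.is_mutable(key):
        session.conf.set(key, False)
    # Return the effective value so the caller can confirm the change.
    return session.conf.get(key)
```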
Release
1.2.0 (2023-03-02)
New Features
- Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; turn it off by specifying source_code_display=False at registration.
- Added a parameter if_not_exists when creating a UDF, UDTF or Stored Procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.
- Accept integers when calling snowflake.snowpark.functions.get to extract value from array.
- Added functions.reverse in functions to open access to Snowflake built-in function reverse.
- Added parameter require_scoped_url in snowflake.snowpark.files.SnowflakeFile.open() (in Private Preview) to replace is_owner_file, which is marked for deprecation.
Bug Fixes
- Fixed a bug that overwrote paramstyle to qmark when creating a Snowpark session.
- Fixed a bug where df.join(..., how="cross") fails with SnowparkJoinException: (1112): Unsupported using join type 'Cross'.
- Fixed a bug where querying a DataFrame column created from chained function calls used a wrong column name.
1.1.0
1.1.0 (2023-01-26)
New Features:
- Added asc, asc_nulls_first, asc_nulls_last, desc, desc_nulls_first, desc_nulls_last, date_part and unix_timestamp in functions.
- Added the property DataFrame.dtypes to return a list of column name and data type pairs.
- Added the following aliases:
  - functions.expr() for functions.sql_expr().
  - functions.date_format() for functions.to_date().
  - functions.monotonically_increasing_id() for functions.seq8().
  - functions.from_unixtime() for functions.to_timestamp().
Bug Fixes:
- Fixed a bug in SQL simplifier that didn’t handle Column alias and join well in some cases. See #658 for details.
- Fixed a bug in SQL simplifier that generated wrong column names for function calls, NaN and INF.
Improvements
- The session parameter PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER is True after Snowflake 7.3 was released. In snowpark-python, session.sql_simplifier_enabled reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER in Snowflake to False or run session.sql_simplifier_enabled = False from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.
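When the simplifier needs to be switched off only for part of a program (for example, to compare the generated SQL), the session.sql_simplifier_enabled attribute described above can be toggled and then restored. The context manager below is a hypothetical helper written for illustration, not part of Snowpark; it assumes session is an existing Snowpark Session.

```python
from contextlib import contextmanager


@contextmanager
def sql_simplifier(session, enabled):
    """Temporarily set session.sql_simplifier_enabled, then restore it."""
    previous = session.sql_simplifier_enabled
    session.sql_simplifier_enabled = enabled
    try:
        yield session
    finally:
        # Restore the earlier value even if the body raises.
        session.sql_simplifier_enabled = previous
```

Inside a block such as "with sql_simplifier(session, False):", DataFrames built from that session would generate SQL without the simplifier; the previous setting returns when the block exits.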
1.0.0
1.0.0 (2022-11-01)
New Features
- Added Session.generator() to create a new DataFrame using the Generator table function.
- Added a parameter secure to the functions that create a secure UDF or UDTF.
v0.12.0
0.12.0 (2022-10-14)
New Features
- Added new APIs for async job:
  - Session.create_async_job() to create an AsyncJob instance from a query id.
  - AsyncJob.result() now accepts argument result_type to return the results in different formats.
  - AsyncJob.to_df() returns a DataFrame built from the result of this asynchronous job.
  - AsyncJob.query() returns the SQL text of the executed query.
- DataFrame.agg() and RelationalGroupedDataFrame.agg() now accept variable-length arguments.
- Added parameters lsuffix and rsuffix to DataFrame.join() and DataFrame.cross_join() to conveniently rename overlapping columns.
- Added Table.drop_table() so you can drop the temp table after DataFrame.cache_result(). Table is also a context manager so you can use the with statement to drop the cache temp table after use.
- Added Session.use_secondary_roles().
- Added functions first_value() and last_value(). (contributed by @chasleslr)
- Added on as an alias for using_columns and how as an alias for join_type in DataFrame.join().
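The context-manager behavior of Table described above can be sketched as follows. This assumes df is an existing Snowpark DataFrame; the helper name is hypothetical.

```python
def cached_counts(df):
    """Run two actions against a cached copy of df, then drop the temp table."""
    # cache_result() materializes df into a temporary table and returns a Table.
    with df.cache_result() as cached:
        total = cached.count()
        distinct = cached.distinct().count()
    # Leaving the with block drops the cache temp table.
    return total, distinct
```

Without the with statement, the same cleanup can be requested explicitly through Table.drop_table().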
Bug Fixes
- Fixed a bug in Session.create_dataframe() that raised an error when schema names had special characters.
- Fixed a bug in which options set in Session.read.option() were not passed to DataFrame.copy_into_table() as default values.
- Fixed a bug in which DataFrame.copy_into_table() raises an error when a copy option has single quotes in the value.
v0.11.0
0.11.0 (2022-09-28)
Behavior Changes:
- Session.add_packages() now raises ValueError when the version of a package cannot be found in Snowflake Anaconda channel. Previously, Session.add_packages() succeeded, and a SnowparkSQLException exception was raised later in the UDF/SP registration step.
New Features:
- Added method FileOperation.get_stream() to support downloading stage files as stream.
- Added support in functions.ntile() to accept int argument.
- Added the following aliases:
  - functions.call_function() for functions.call_builtin().
  - functions.function() for functions.builtin().
  - DataFrame.order_by() for DataFrame.sort().
  - DataFrame.orderBy() for DataFrame.sort().
- Improved DataFrame.cache_result() to return a more accurate Table class instead of a DataFrame class.
- Added support to allow session as the first argument when calling StoredProcedure.
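As an illustration of FileOperation.get_stream(), the sketch below reads a staged file into a string without writing it to local disk. It assumes session is an existing Snowpark Session, that the file is stored uncompressed, and that the stage path passed in (for example "@my_stage/data.csv") is a hypothetical location.

```python
def read_stage_text(session, stage_path, encoding="utf-8"):
    """Download a staged file as a stream and decode it to text."""
    # session.file is the FileOperation object bound to this session.
    stream = session.file.get_stream(stage_path)
    try:
        return stream.read().decode(encoding)
    finally:
        # Always release the underlying stream.
        stream.close()
```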
Improvements:
- Improved nested query generation by flattening queries when applicable.
  - This improvement could be enabled by setting Session.sql_simplifier_enabled = True.
  - DataFrame.select(), DataFrame.with_column(), DataFrame.drop() and other select-related APIs have more flattened SQLs.
  - DataFrame.union(), DataFrame.union_all(), DataFrame.except_(), DataFrame.intersect(), DataFrame.union_by_name() have flattened SQLs generated when multiple set operators are chained.
- Improved type annotations for async job APIs.
Bug Fixes:
- Fixed a bug in which Table.update(), Table.delete(), Table.merge() try to reference a temp table that does not exist.
v0.10.0
0.10.0 (2022-09-16)
New Features:
- Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries:
  - Added keyword argument block to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
    - DataFrame.collect(), DataFrame.to_local_iterator(), DataFrame.to_pandas(), DataFrame.to_pandas_batches(), DataFrame.count(), DataFrame.first().
    - DataFrameWriter.save_as_table(), DataFrameWriter.copy_into_location().
    - Table.delete(), Table.update(), Table.merge().
  - Added method DataFrame.collect_nowait() to allow asynchronous evaluations.
  - Added class AsyncJob to retrieve results from asynchronously executed queries and check their status.
- Added support for table_type in Session.write_pandas(). You can now choose from these table_type options: "temporary", "temp", and "transient".
- Added support for using Python structured data (list, tuple and dict) as literal values in Snowpark.
- Added keyword argument execute_as to functions.sproc() and session.sproc.register() to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
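The experimental asynchronous APIs can be sketched as follows: start the query, do unrelated work while it runs, then wait for the rows. Here df is assumed to be an existing Snowpark DataFrame, and do_other_work is a hypothetical callable supplied by the caller.

```python
def collect_in_background(df, do_other_work):
    """Start a query without blocking, run other work, then fetch the rows."""
    # collect_nowait() submits the query and returns an AsyncJob immediately.
    job = df.collect_nowait()
    side_result = do_other_work()
    # result() waits for the query to finish and returns its rows.
    rows = job.result()
    return rows, side_result
```

The same pattern applies to the other action APIs listed above by passing block=False, which also returns an AsyncJob.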
Improvements:
- Added support for displaying details of a Snowpark session.
Bug Fixes:
- Fixed a bug in which DataFrame.copy_into_table() and DataFrameWriter.save_as_table() mistakenly created a new table if the table name is fully qualified, and the table already exists.
Deprecations:
- Deprecated keyword argument create_temp_table in Session.write_pandas().
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
Dependency updates
- Updated snowflake-connector-python to 2.7.12.
v0.9.0
0.9.0 (2022-08-30)
New Features:
- Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass the new keyword argument source_code_display as False when calling register() or @udf().
- Added support for calling table functions from DataFrame.select(), DataFrame.with_column() and DataFrame.with_columns() which now take parameters of type table_function.TableFunctionCall for columns.
- Added keyword argument overwrite to session.write_pandas() to allow overwriting contents of a Snowflake table with that of a Pandas DataFrame.
- Added keyword argument column_order to df.write.save_as_table() to specify the matching rules when inserting data into table in append mode.
- Added method FileOperation.put_stream() to upload local files to a stage via file stream.
- Added methods TableFunctionCall.alias() and TableFunctionCall.as_() to allow aliasing the names of columns that come from the output of table function joins.
- Added function get_active_session() in module snowflake.snowpark.context to get the current active Snowpark session.
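FileOperation.put_stream() also accepts in-memory streams, which avoids creating a local file first. A minimal sketch, assuming session is an existing Snowpark Session; the helper name and the stage location (which must include the target file name, for example "@my_stage/notes/hello.txt") are hypothetical.

```python
import io


def upload_text(session, text, stage_location):
    """Upload a string to a stage through an in-memory byte stream."""
    stream = io.BytesIO(text.encode("utf-8"))
    # overwrite=True replaces any existing file at the same stage location.
    return session.file.put_stream(stream, stage_location, overwrite=True)
```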
Bug Fixes:
- Fixed a bug in which batch insert should not raise an error when statement_params is not passed to the function.
- Fixed a bug in which column names should be quoted when session.create_dataframe() is called with dicts and a given schema.
- Fixed a bug in which creation of table should be skipped if the table already exists and is in append mode when calling df.write.save_as_table().
- Fixed a bug in which third-party packages with underscores cannot be added when registering UDFs.
Improvements:
- Improved function functions.uniform() to infer the types of inputs max_ and min_ and cast the limits to IntegerType or FloatType correspondingly.