Releases: snowflakedb/snowpark-python
v0.8.0
0.8.0 (2022-07-22)
New Features:
- Added keyword-only argument statement_params to the following methods to allow for specifying statement-level parameters:
  - collect, to_local_iterator, to_pandas, to_pandas_batches, count, copy_into_table, show, create_or_replace_view, create_or_replace_temp_view, first, cache_result and random_split on class snowflake.snowpark.DataFrame.
  - update, delete and merge on class snowflake.snowpark.Table.
  - save_as_table and copy_into_location on class snowflake.snowpark.DataFrameWriter.
  - approx_quantile, statement_params, cov and crosstab on class snowflake.snowpark.DataFrameStatFunctions.
  - register and register_from_file on class snowflake.snowpark.udf.UDFRegistration.
  - register and register_from_file on class snowflake.snowpark.udtf.UDTFRegistration.
  - register and register_from_file on class snowflake.snowpark.stored_procedure.StoredProcedureRegistration.
  - udf, udtf and sproc in snowflake.snowpark.functions.
- Added support for Column as an input argument to session.call().
- Added support for table_type in df.write.save_as_table(). You can now choose from these table_type options: "temporary", "temp", and "transient".
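As a rough illustration of the statement_params and table_type items, the sketch below passes a statement-level parameter to collect() and selects a table type in save_as_table(). The helper names tagged_collect and save_transient are hypothetical, QUERY_TAG is used only as an example parameter, and df is assumed to be a Snowpark DataFrame created from an existing session.

```python
from typing import Any, List


def tagged_collect(df: Any, tag: str) -> List[Any]:
    """Run the DataFrame's query with a statement-level QUERY_TAG.

    statement_params is keyword-only, so it must be passed by name.
    """
    return df.collect(statement_params={"QUERY_TAG": tag})


def save_transient(df: Any, table_name: str) -> None:
    """Write the DataFrame to a transient table (hypothetical helper)."""
    df.write.save_as_table(table_name, table_type="transient")
```

Because the parameter is scoped to a single call, other queries issued from the same session are not affected by it.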
Improvements:
- Added validation of object name in session.use_* methods.
- Updated the query tag in SQL to escape it when it has special characters.
- Added a check to see if Anaconda terms are acknowledged when adding missing packages.
Bug Fixes:
- Fixed the limited length of the string column in session.create_dataframe().
- Fixed a bug in which session.create_dataframe() mistakenly converted 0 and False to None when the input data was only a list.
- Fixed a bug in which calling session.create_dataframe() using a large local dataset sometimes created a temp table twice.
- Aligned the definition of function.trim() with the SQL function definition.
- Fixed an issue where snowpark-python would hang when using the Python system-defined (built-in function) sum vs. the Snowpark function.sum().
v0.7.0
0.7.0
New Features:
- Added support for user-defined table functions (UDTFs).
  - Use function snowflake.snowpark.functions.udtf() to register a UDTF, or use it as a decorator to register the UDTF.
    - You can also use Session.udtf.register() to register a UDTF.
  - Use Session.udtf.register_from_file() to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
  - Use function snowflake.snowpark.functions.table_function() to create a callable representing a table function and use it to call the table function in a query.
  - Alternatively, use function snowflake.snowpark.functions.call_table_function() to call a table function.
  - Added support for over clause that specifies partition by and order by when lateral joining a table function.
  - Updated Session.table_function() and DataFrame.join_table_function() to accept TableFunctionCall instances.
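A UDTF is backed by a handler class whose process() method is called once per input row and yields zero or more output rows as tuples. The sketch below shows such a handler in plain Python and exercises it locally. The class name WordSplitter and the sample input are illustrative assumptions, and the registration step, which needs a live session, is only outlined in a comment.

```python
from typing import Iterable, Tuple


class WordSplitter:
    """Hypothetical UDTF handler: emits one output row per word."""

    def process(self, text: str) -> Iterable[Tuple[str, int]]:
        # Each yielded tuple is one output row: (word, position).
        for position, word in enumerate(text.split()):
            yield (word, position)


# Registration would happen against a live session, roughly via
# snowflake.snowpark.functions.udtf() or Session.udtf.register(),
# together with an output schema; that step is not shown here.

# Local check of the handler logic, no Snowflake connection needed.
rows = list(WordSplitter().process("hello snowpark udtf"))
print(rows)  # [('hello', 0), ('snowpark', 1), ('udtf', 2)]
```

Keeping the handler free of session state makes it easy to test in isolation before it is registered.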
Breaking Changes:
- When creating a function with functions.udf() and functions.sproc(), you can now specify an empty list for the imports or packages argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the __repr__ implementation of data types in types.py. The unused type_name property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous ProgrammingError from the Python connector.
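The change to empty imports and packages lists comes down to distinguishing "not specified" from "explicitly empty". A minimal sketch of that rule follows, using a hypothetical resolve_packages helper rather than Snowpark's actual internals:

```python
from typing import List, Optional


def resolve_packages(
    udf_packages: Optional[List[str]],
    session_packages: List[str],
) -> List[str]:
    """Pick the packages for a UDF (illustrative helper, not Snowpark code).

    None -> argument omitted, fall back to session-level packages.
    []   -> explicitly no packages, even if the session has some.
    """
    if udf_packages is None:
        return list(session_packages)
    return list(udf_packages)


session_level = ["numpy", "pandas"]
print(resolve_packages(None, session_level))       # ['numpy', 'pandas']
print(resolve_packages([], session_level))         # []
print(resolve_packages(["scipy"], session_level))  # ['scipy']
```

Code that previously passed an empty list and relied on the session-level fallback would need to pass None, or omit the argument, to keep the old behavior.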
Improvements:
- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.
Bug Fixes:
- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling DataFrame.to_pandas().
- Fixed a bug in which DataFrameReader.parquet() failed to read a parquet file when its column contained spaces.
- Fixed a bug in which DataFrame.copy_into_table() failed when the dataframe was created by reading a file with inferred schemas.
Deprecations:
- Deprecated Session.flatten() and DataFrame.flatten().
Dependency Updates:
- Restricted the version of cloudpickle to <=2.0.0.
v0.6.0
0.6.0
New Features:
- Added support for vectorized UDFs with the input as a Pandas DataFrame or Pandas Series and the output as a Pandas Series. This improves the performance of UDFs in Snowpark.
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added functions current_session(), current_statement(), current_user(), current_version(), current_warehouse(), date_from_parts(), date_trunc(), dayname(), dayofmonth(), dayofweek(), dayofyear(), grouping(), grouping_id(), hour(), last_day(), minute(), next_day(), previous_day(), second(), month(), monthname(), quarter(), year(), current_database(), current_role(), current_schema(), current_schemas(), current_region(), current_available_roles(), add_months(), any_value(), bitnot(), bitshiftleft(), bitshiftright(), convert_timezone(), uniform(), strtok_to_array(), sysdate(), time_from_parts(), timestamp_from_parts(), timestamp_ltz_from_parts(), timestamp_ntz_from_parts(), timestamp_tz_from_parts(), weekofyear(), and percentile_cont() to snowflake.snowpark.functions.
Breaking Changes:
- Expired deprecations:
  - Removed the following APIs that were deprecated in 0.4.0: DataFrame.groupByGroupingSets(), DataFrame.naturalJoin(), DataFrame.joinTableFunction(), DataFrame.withColumns(), Session.getImports(), Session.addImport(), Session.removeImport(), Session.clearImports(), Session.getSessionStage(), Session.getDefaultDatabase(), Session.getDefaultSchema(), Session.getCurrentDatabase(), Session.getCurrentSchema(), Session.getFullyQualifiedCurrentSchema().
Improvements:
- Added support for creating an empty DataFrame with a specific schema using the Session.create_dataframe() method.
- Changed the logging level from INFO to DEBUG for several logs (e.g., the executed query) when evaluating a dataframe.
- Improved the error message when failing to create a UDF due to pickle errors.
Bug Fixes:
- Removed pandas hard dependencies in the Session.create_dataframe() method.
Dependency Updates:
- Added typing-extensions as a new dependency with the version >=4.1.0.
v0.5.0
New Features
- Added stored procedures API.
  - Added Session.sproc property and sproc() to snowflake.snowpark.functions, so you can register stored procedures.
  - Added Session.call to call stored procedures by name.
- Added UDFRegistration.register_from_file() to allow registering UDFs from Python source files or zip files directly.
- Added UDFRegistration.describe() to describe a UDF.
- Added DataFrame.random_split() to provide a way to randomly split a dataframe.
- Added functions md5(), sha1(), sha2(), ascii(), initcap(), length(), lower(), lpad(), ltrim(), rpad(), rtrim(), repeat(), soundex(), regexp_count(), replace(), charindex(), collate(), collation(), insert(), left(), right(), and endswith() to snowflake.snowpark.functions.
- Allowed call_udf() to accept literal values.
- Provided a distinct keyword in array_agg().
Bug Fixes:
- Fixed an issue that caused DataFrame.to_pandas() to have a string column if Column.cast(IntegerType()) was used.
- Fixed a bug in DataFrame.describe() when there is more than one string column.
v0.4.1
0.4.1 (2022-02-25)
Bug Fixes
- Fixed a bug in DataFrame.describe() that raised an error when the DataFrame has more than one string column.
v0.4.0
0.4.0 (2022-02-15)
New Features
- You can now specify which Anaconda packages to use when defining UDFs.
  - Added add_packages(), get_packages(), clear_packages(), and remove_package() to class Session.
  - Added add_requirements() to Session so you can use a requirements file to specify which packages this session will use.
  - Added parameter packages to function snowflake.snowpark.functions.udf() and method UserDefinedFunction.register() to indicate UDF-level Anaconda package dependencies when creating a UDF.
  - Added parameter imports to snowflake.snowpark.functions.udf() and UserDefinedFunction.register() to specify UDF-level code imports.
- Added a parameter session to function udf() and UserDefinedFunction.register() so you can specify which session to use to create a UDF if you have multiple sessions.
- Added types Geography and Variant to snowflake.snowpark.types to be used as type hints for Geography and Variant data when defining a UDF.
- Added support for Geography geoJSON data.
- Added Table, a subclass of DataFrame for table operations:
  - Methods update and delete update and delete rows of a table in Snowflake.
  - Method merge merges data from a DataFrame to a Table.
  - Override method DataFrame.sample() with an additional parameter seed, which works on tables but not on views and sub-queries.
- Added DataFrame.to_local_iterator() and DataFrame.to_pandas_batches() to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
- Added DataFrame.cache_result() for caching the operations performed on a DataFrame in a temporary table. Subsequent operations on the original DataFrame have no effect on the cached result DataFrame.
- Added property DataFrame.queries to get SQL queries that will be executed to evaluate the DataFrame.
- Added Session.query_history() as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate DataFrames created from a session. Both query ID and query text are recorded.
- You can now create a Session instance from an existing established snowflake.connector.SnowflakeConnection. Use parameter connection in Session.builder.configs().
- Added use_database(), use_schema(), use_warehouse(), and use_role() to class Session to switch database/schema/warehouse/role after a session is created.
- Added DataFrameWriter.copy_into_location() to unload a DataFrame to stage files.
- Added DataFrame.unpivot().
- Added Column.within_group() for sorting the rows by columns with some aggregation functions.
- Added functions listagg(), mode(), div0(), acos(), asin(), atan(), atan2(), cos(), cosh(), sin(), sinh(), tan(), tanh(), degrees(), radians(), round(), trunc(), and factorial() to snowflake.snowpark.functions.
- Added an optional argument ignore_nulls in functions lead() and lag().
- The condition parameter of functions when() and iff() now accepts SQL expressions.
Improvements
- All function and method names have been renamed to use the snake case naming style, which is more Pythonic. For convenience, some camel case names are kept as aliases to the snake case APIs. It is recommended to use the snake case APIs.
  - Deprecated these methods on class Session and replaced them with their snake case equivalents: getImports(), addImport(), removeImport(), clearImports(), getSessionStage(), getDefaultDatabase(), getDefaultSchema(), getCurrentDatabase(), getFullyQualifiedCurrentSchema().
  - Deprecated these methods on class DataFrame and replaced them with their snake case equivalents: groupByGroupingSets(), naturalJoin(), withColumns(), joinTableFunction().
- Property DataFrame.columns is now consistent with DataFrame.schema.names and the Snowflake database Identifier Requirements.
- Column.__bool__() now raises a TypeError. This bans the use of the logical operators and, or, and not on Column objects. For instance, col("a") > 1 and col("b") > 2 raises the TypeError; use (col("a") > 1) & (col("b") > 2) instead.
- Changed PutResult and GetResult to subclass NamedTuple.
- Fixed a bug which raised an error when the local path or stage location has a space or other special characters.
- Changed DataFrame.describe() so that non-numeric and non-string columns are ignored instead of raising an exception.
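The Column.__bool__() change exists because Python's and, or, and not cannot be overloaded: they call __bool__() on their operands, which would silently discard one side of a column condition. The stand-in class below (Expr is hypothetical, not Snowpark's Column) sketches the mechanism.

```python
class Expr:
    """Tiny stand-in for a column expression that builds SQL text."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __gt__(self, other: object) -> "Expr":
        return Expr(f"({self.sql} > {other})")

    def __and__(self, other: "Expr") -> "Expr":
        # `&` can be overloaded, so both operands are preserved.
        return Expr(f"({self.sql} AND {other.sql})")

    def __bool__(self) -> bool:
        # `and` / `or` / `not` call __bool__; refusing makes misuse loud.
        raise TypeError("Use &, | or ~ instead of and, or, not")


a, b = Expr("a"), Expr("b")
combined = (a > 1) & (b > 2)
print(combined.sql)  # ((a > 1) AND (b > 2))

try:
    a > 1 and b > 2  # evaluates bool(a > 1) first
except TypeError as exc:
    print("TypeError:", exc)
```

The parentheses around each comparison matter because & binds more tightly than > in Python.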
Dependency updates
- Updated snowflake-connector-python to 2.7.4.
v0.3.0
0.3.0 (2022-01-09)
New Features
- Added Column.isin(), with an alias Column.in_().
- Added Column.try_cast(), which is a special version of cast(). It tries to cast a string expression to other types and returns null if the cast is not possible.
- Added Column.startswith() and Column.substr() to process string columns.
- Column.cast() now also accepts a str value to indicate the cast type in addition to a DataType instance.
- Added DataFrame.describe() to summarize stats of a DataFrame.
- Added DataFrame.explain() to print the query plan of a DataFrame.
- DataFrame.filter() and DataFrame.select_expr() now accept a SQL expression.
- Added a new bool parameter create_temp_table to methods DataFrame.saveAsTable() and Session.write_pandas() to optionally create a temp table.
- Added DataFrame.minus() and DataFrame.subtract() as aliases to DataFrame.except_().
- Added regexp_replace(), concat(), concat_ws(), to_char(), current_timestamp(), current_date(), current_time(), months_between(), cast(), try_cast(), greatest(), least(), and hash() to module snowflake.snowpark.functions.
Bug Fixes
- Fixed an issue where Session.createDataFrame(pandas_df) and Session.write_pandas(pandas_df) raised an exception when the Pandas DataFrame had spaces in the column name.
- Fixed an issue where DataFrame.copy_into_table() sometimes printed an error-level log entry even though the operation succeeded.
- Fixed an API docs issue where some DataFrame APIs were missing from the docs.
Dependency updates
- Updated snowflake-connector-python to 2.7.2, which upgrades the pyarrow dependency to 6.0.x. Refer to the Python connector 2.7.2 release notes for more details.
v0.2.0
0.2.0 (2021-12-02)
New Features
- Updated the Session.createDataFrame() method for creating a DataFrame from a Pandas DataFrame.
- Added the Session.write_pandas() method for writing a Pandas DataFrame to a table in Snowflake and getting a Snowpark DataFrame object back.
- Added new classes and methods for calling window functions.
- Added the new functions cume_dist(), to find the cumulative distribution of a value with regard to other values within a window partition, and row_number(), which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the DataFrameStatFunctions class.
- Added functions for handling missing values in a DataFrame in the DataFrameNaFunctions class.
- Added new methods rollup(), cube(), and pivot() to the DataFrame class.
- Added the GroupingSets class, which you can use with the DataFrame groupByGroupingSets method to perform a SQL GROUP BY GROUPING SETS.
- Added the new FileOperation(session) class that you can use to upload and download files to and from a stage.
- Added the DataFrame.copy_into_table() method for loading data from files in a stage into a table.
- In CASE expressions, the functions when() and otherwise() now accept Python types in addition to Column objects.
- When you register a UDF you can now optionally set the replace parameter to True to overwrite an existing UDF with the same name.
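To make cume_dist() and row_number() concrete, here is a plain-Python sketch of what they compute over a single partition ordered by value. It follows the standard SQL definitions (row_number is the 1-based position; cume_dist is the fraction of rows with a value less than or equal to the current one) and is not Snowpark code. The function name window_stats and the sample values are made up.

```python
from typing import List, Tuple


def window_stats(values: List[int]) -> List[Tuple[int, int, float]]:
    """Return (value, row_number, cume_dist) for one partition ordered by value."""
    ordered = sorted(values)
    total = len(ordered)
    result = []
    for index, value in enumerate(ordered):
        row_number = index + 1  # unique, 1-based position in the ordering
        # Rows whose value is <= the current value, divided by partition size.
        at_or_below = sum(1 for v in ordered if v <= value)
        result.append((value, row_number, at_or_below / total))
    return result


for row in window_stats([10, 20, 20, 40]):
    print(row)
# (10, 1, 0.25)
# (20, 2, 0.75)
# (20, 3, 0.75)
# (40, 4, 1.0)
```

Note how the two tied rows share a cume_dist of 0.75 while still receiving distinct row numbers.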
Improvements
- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it will be uploaded as in-line code instead of uploaded to a stage.
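These two improvements can be pictured as "serialize, compress, then choose an upload path by size". The sketch below uses pickle and zlib from the standard library purely for illustration. Snowpark's actual serialization and compression format is not described in these notes, so treat the function name plan_upload, the compression choice, and the way the 8196-byte limit is applied as assumptions.

```python
import os
import pickle
import zlib

INLINE_LIMIT_BYTES = 8196  # threshold quoted in the release note


def plan_upload(obj: object) -> tuple:
    """Return (strategy, compressed_size, raw_size) for a serialized object."""
    raw = pickle.dumps(obj)
    compressed = zlib.compress(raw)
    # Small payloads go in-line; larger ones would be uploaded to a stage.
    strategy = "inline" if len(compressed) < INLINE_LIMIT_BYTES else "stage"
    return strategy, len(compressed), len(raw)


small = {"weights": [0.1, 0.2, 0.3]}
large = os.urandom(50_000)  # stands in for an incompressible model blob

print(plan_upload(small)[0])  # inline
print(plan_upload(large)[0])  # stage
```

How much compression helps depends on the payload: repetitive data shrinks a lot, while already-compressed or random bytes barely shrink at all.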
Bug Fixes
- Fixed an issue where the statement df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)] raised an exception.
- Fixed an issue where df.toPandas() raised an exception when a DataFrame was created from large local data.
Private Preview Release
Initial private preview release of snowflake-snowpark-python