Releases: MTSWebServices/onetl
0.9.5 (2023-10-10)
Features
- Add `XML` file format support. (#163)
- Tested compatibility with Spark 3.5.0. `MongoDB` and `Excel` are not supported yet, but other packages are. (#159)
Improvements
- Add check to all DB and FileDF connections that Spark session is alive. (#164)
Bug Fixes
- Fix `Hive.check()` behavior when Hive Metastore is not available. (#164)
0.9.4 (2023-09-26)
Features
- Add `Excel` file format support. (#148)
- Add `Samba` file connection. It is now possible to download and upload files to Samba shared folders using `FileDownloader`/`FileUploader`. (#150)
- Add `if_exists="ignore"` and `error` to `Hive.WriteOptions`. (#143)
- Add `if_exists="ignore"` and `error` to `JDBC.WriteOptions`. (#144)
- Add `if_exists="ignore"` and `error` to `MongoDB.WriteOptions`. (#145)
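The new `if_exists` values can be illustrated with a minimal sketch (plain Python, not the actual onetl implementation): `"ignore"` silently skips the write when the target already exists, while `"error"` raises instead.

```python
# Sketch (not onetl code) of the if_exists="ignore"/"error" semantics
# added to the WriteOptions classes above.
def write(target_exists: bool, if_exists: str) -> str:
    if target_exists and if_exists == "error":
        # "error" mode refuses to touch an existing target
        raise ValueError("target already exists")
    if target_exists and if_exists == "ignore":
        # "ignore" mode skips the write without raising
        return "skipped"
    return "written"
```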
Improvements
- Add documentation about different ways of passing packages to Spark session. (#151)
- Drastically improve `Greenplum` documentation:
  - Added information about network ports, grants, `pg_hba.conf` and so on.
  - Added interaction schemas for reading, writing and executing statements in Greenplum.
  - Added recommendations about reading data from views and `JOIN` results from Greenplum. (#154)
- Make `.fetch` and `.execute` methods of DB connections thread-safe. Each thread works with its own connection. (#156)
- Call `.close()` on FileConnection when it is removed by garbage collector. (#156)
Bug Fixes
- Fix issue when stopping the Python interpreter calls `JDBCMixin.close()` and prints exceptions to log. (#156)
0.9.3 (2023-09-06)
Bug Fixes
- Fix documentation build
0.9.2 (2023-09-06)
Features
- Add `if_exists="ignore"` and `error` to `Greenplum.WriteOptions`. (#142)
Improvements
- Improve validation messages while writing dataframe to Kafka. (#131)
- Improve documentation:
- Add notes about reading and writing to database connections documentation
- Add notes about executing statements in JDBC and Greenplum connections
Bug Fixes
- Fixed validation of `headers` column being written to Kafka with default `Kafka.WriteOptions()` - default value was `False`, but instead of raising an exception, the column value was just ignored. (#131)
- Fix reading data from Oracle with `partitioningMode="range"` without explicitly set `lowerBound`/`upperBound`. (#133)
- Update Kafka documentation with `SSLProtocol` usage. (#136)
- Raise exception if someone tries to read data from a Kafka topic which does not exist. (#138)
- Allow passing Kafka topics with names like `some.topic.name` to DBReader. Same for MongoDB collections. (#139)
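A minimal sketch (plain Python, not the actual onetl validation code) of a name check that accepts such dotted names while still rejecting names with whitespace or path separators:

```python
import re

# Hypothetical character set for illustration only - the real validation
# rules live inside onetl.
TOPIC_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")

def is_valid_topic(name: str) -> bool:
    # Dots are allowed, so "some.topic.name" passes
    return bool(TOPIC_NAME.match(name))
```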
0.9.1 (2023-08-17)
Bug Fixes
- Fixed bug where the number of threads created by `FileDownloader`/`FileUploader`/`FileMover` was not `min(workers, len(files))` but `max(workers, len(files))`, leading to creating too many workers on large file lists.
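The fix boils down to one expression: the thread count should never exceed the number of files to process. A sketch of the corrected logic:

```python
# Number of worker threads should be min(workers, len(files)),
# not max(...), so a short file list never spawns excess threads.
def thread_count(workers: int, files: list) -> int:
    return min(workers, len(files))
```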
0.9.0 (2023-08-17)
Breaking Changes
- Rename methods:
  - `DBConnection.read_df` → `DBConnection.read_source_as_df`
  - `DBConnection.write_df` → `DBConnection.write_df_to_target` (#66)
- Rename classes:
  - `HDFS.slots` → `HDFS.Slots`
  - `Hive.slots` → `Hive.Slots`

  Old names are left intact, but will be removed in v1.0.0 (#103)
- Rename options to make them self-explanatory:
  - `Hive.WriteOptions(mode="append")` → `Hive.WriteOptions(if_exists="append")`
  - `Hive.WriteOptions(mode="overwrite_table")` → `Hive.WriteOptions(if_exists="replace_entire_table")`
  - `Hive.WriteOptions(mode="overwrite_partitions")` → `Hive.WriteOptions(if_exists="replace_overlapping_partitions")`
  - `JDBC.WriteOptions(mode="append")` → `JDBC.WriteOptions(if_exists="append")`
  - `JDBC.WriteOptions(mode="overwrite")` → `JDBC.WriteOptions(if_exists="replace_entire_table")`
  - `Greenplum.WriteOptions(mode="append")` → `Greenplum.WriteOptions(if_exists="append")`
  - `Greenplum.WriteOptions(mode="overwrite")` → `Greenplum.WriteOptions(if_exists="replace_entire_table")`
  - `MongoDB.WriteOptions(mode="append")` → `MongoDB.WriteOptions(if_exists="append")`
  - `MongoDB.WriteOptions(mode="overwrite")` → `MongoDB.WriteOptions(if_exists="replace_entire_collection")`
  - `FileDownloader.Options(mode="error")` → `FileDownloader.Options(if_exists="error")`
  - `FileDownloader.Options(mode="ignore")` → `FileDownloader.Options(if_exists="ignore")`
  - `FileDownloader.Options(mode="overwrite")` → `FileDownloader.Options(if_exists="replace_file")`
  - `FileDownloader.Options(mode="delete_all")` → `FileDownloader.Options(if_exists="replace_entire_directory")`
  - `FileUploader.Options(mode="error")` → `FileUploader.Options(if_exists="error")`
  - `FileUploader.Options(mode="ignore")` → `FileUploader.Options(if_exists="ignore")`
  - `FileUploader.Options(mode="overwrite")` → `FileUploader.Options(if_exists="replace_file")`
  - `FileUploader.Options(mode="delete_all")` → `FileUploader.Options(if_exists="replace_entire_directory")`
  - `FileMover.Options(mode="error")` → `FileMover.Options(if_exists="error")`
  - `FileMover.Options(mode="ignore")` → `FileMover.Options(if_exists="ignore")`
  - `FileMover.Options(mode="overwrite")` → `FileMover.Options(if_exists="replace_file")`
  - `FileMover.Options(mode="delete_all")` → `FileMover.Options(if_exists="replace_entire_directory")`

  Old names are left intact, but will be removed in v1.0.0 (#108)
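For migration scripts, the Hive renames above can be expressed as a plain lookup table, a sketch of the mapping, not onetl code:

```python
# Mapping from deprecated Hive.WriteOptions mode=... values to the new
# if_exists=... values listed above.
HIVE_MODE_TO_IF_EXISTS = {
    "append": "append",
    "overwrite_table": "replace_entire_table",
    "overwrite_partitions": "replace_overlapping_partitions",
}
```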
- Rename `onetl.log.disable_clients_logging()` to `onetl.log.setup_clients_logging()`. (#120)
Features
- Add new methods returning Maven packages for specific connection class:
  - `Clickhouse.get_packages()`
  - `MySQL.get_packages()`
  - `Postgres.get_packages()`
  - `Teradata.get_packages()`
  - `MSSQL.get_packages(java_version="8")`
  - `Oracle.get_packages(java_version="8")`
  - `Greenplum.get_packages(scala_version="2.12")`
  - `MongoDB.get_packages(scala_version="2.12")`
  - `Kafka.get_packages(spark_version="3.4.1", scala_version="2.12")`

  Deprecate old syntax:
  - `Clickhouse.package`
  - `MySQL.package`
  - `Postgres.package`
  - `Teradata.package`
  - `MSSQL.package`
  - `Oracle.package`
  - `Greenplum.package_spark_2_3`
  - `Greenplum.package_spark_2_4`
  - `Greenplum.package_spark_3_2`
  - `MongoDB.package_spark_3_2`
  - `MongoDB.package_spark_3_3`
  - `MongoDB.package_spark_3_4` (#87)
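Lists of Maven coordinates like those returned by the `get_packages()` methods are typically joined into Spark's `spark.jars.packages` config value. A sketch with hypothetical coordinates:

```python
# Hypothetical Maven coordinates standing in for get_packages() output.
packages = [
    "org.example:connector-a:1.0.0",  # hypothetical coordinate
    "org.example:connector-b:2.0.0",  # hypothetical coordinate
]

# spark.jars.packages expects a comma-separated string of coordinates.
jars_packages = ",".join(packages)
```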
- Allow setting client modules log level in `onetl.log.setup_clients_logging()`. Allow enabling underlying client modules logging in `onetl.log.setup_logging()` by providing additional argument `enable_clients=True`. This is useful for debugging. (#120)
- Added support for reading and writing data to Kafka topics. For these operations, new classes were added:
  - `Kafka` (#54, #60, #72, #84, #87, #89, #93, #96, #102, #104)
  - `Kafka.PlaintextProtocol` (#79)
  - `Kafka.SSLProtocol` (#118)
  - `Kafka.BasicAuth` (#63, #77)
  - `Kafka.KerberosAuth` (#63, #77, #110)
  - `Kafka.ScramAuth` (#115)
  - `Kafka.Slots` (#109)
  - `Kafka.ReadOptions` (#68)
  - `Kafka.WriteOptions` (#68)

  Currently, Kafka does not support incremental read strategies; this will be implemented in future releases.
- Added support for reading files as Spark DataFrame and saving DataFrame as files. For these operations, new classes were added:
  - FileDF connections:
  - High-level classes:
  - File formats:
Improvements
0.8.1 (2023-07-10)
Features
- Add `@slot` decorator to public methods of: `DBConnection`, `FileConnection`, `DBReader`, `DBWriter`, `FileDownloader`, `FileUploader`, `FileMover`. (#49)
- Add `workers` field to `FileDownloader`/`FileUploader`/`FileMover.Options` classes. This allows speeding up all file operations using parallel threads. (#57)
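A `workers`-style option usually maps onto a thread pool. The sketch below (plain Python, not onetl code; `process_file` is a hypothetical stand-in for a per-file download/upload operation) shows the general shape:

```python
from concurrent.futures import ThreadPoolExecutor

def process_files(files, workers=2):
    def process_file(path):  # hypothetical per-file operation
        return path.upper()

    # Cap the pool size by the number of files; fall back to 1 thread
    # for an empty list, since ThreadPoolExecutor requires max_workers >= 1.
    max_workers = min(workers, len(files)) or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order
        return list(pool.map(process_file, files))
```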
Improvements
- Add documentation for HWM store `.get` and `.save` methods. (#49)
- Improve Readme:
  - Move `Quick start` section from documentation
  - Add `Non-goals` section
  - Fix code blocks indentation (#50)
- Improve Contributing guide:
  - Move `Develop` section from Readme
  - Move `docs/changelog/README.rst` content
  - Add `Limitations` section
  - Add instructions for creating a fork and building documentation (#50)
- Remove duplicated checks for source file existence in `FileDownloader`/`FileMover`. (#57)
- Update default logging format to include thread name. (#57)
Bug Fixes
- Fix `S3.list_dir('/')` returning empty list on latest Minio version. (#58)
0.8.0 (2023-05-31)
Breaking Changes
- Rename methods of `FileConnection` classes:
  - `get_directory` → `resolve_dir`
  - `get_file` → `resolve_file`
  - `listdir` → `list_dir`
  - `mkdir` → `create_dir`
  - `rmdir` → `remove_dir`

  New naming should be more consistent. They were undocumented in previous versions, but someone could use these methods, so this is a breaking change. (#36)
- Deprecate `onetl.core.FileFilter` class, replace it with new classes:
  - `onetl.file.filter.Glob`
  - `onetl.file.filter.Regexp`
  - `onetl.file.filter.ExcludeDir`

  Old class will be removed in v1.0.0. (#43)
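The idea behind composable filters can be sketched in plain Python (these are illustrative helpers, not the actual `Glob`/`ExcludeDir` classes): a file is kept only if every filter matches.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def glob_filter(pattern):
    # Match the file name part against a glob pattern
    return lambda path: fnmatch(PurePosixPath(path).name, pattern)

def exclude_dir(directory):
    # Reject any path located under the given directory
    prefix = str(directory).rstrip("/") + "/"
    return lambda path: not str(path).startswith(prefix)

def match_all(path, filters):
    # A path passes only if every filter accepts it
    return all(f(path) for f in filters)
```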
- Deprecate `onetl.core.FileLimit` class, replace it with new class `onetl.file.limit.MaxFilesCount`. Old class will be removed in v1.0.0. (#44)
- Change behavior of `BaseFileLimit.reset` method. This method should now return `self` instead of `None`. The return value could be the same limit object or a copy; this is an implementation detail. (#44)
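The new `reset` contract can be sketched with an illustrative class (not the real `MaxFilesCount` implementation), showing how returning the object enables reuse and chaining:

```python
class FileCountLimitSketch:
    """Illustrative stand-in for a BaseFileLimit implementation."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counter = 0

    def stops_at(self, path) -> bool:
        # Count each seen file; signal a stop once the limit is reached
        self.counter += 1
        return self.counter >= self.limit

    def reset(self):
        self.counter = 0
        return self  # new behavior: return self (or a copy) instead of None
```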
- Replaced `FileDownloader.filter` and `.limit` with new options `.filters` and `.limits`.

  Before:

  ```python
  FileDownloader(
      ...,
      filter=FileFilter(glob="*.txt", exclude_dir="/path"),
      limit=FileLimit(count_limit=10),
  )
  ```

  After:

  ```python
  FileDownloader(
      ...,
      filters=[Glob("*.txt"), ExcludeDir("/path")],
      limits=[MaxFilesCount(10)],
  )
  ```

  This allows developers to implement their own filter and limit classes, and combine them with existing ones. Old behavior is still supported, but it will be removed in v1.0.0. (#45)
- Removed default value for `FileDownloader.limits`; user should pass the limits list explicitly. (#45)
- Move classes from module `onetl.core`:

  ```python
  from onetl.core import DBReader
  from onetl.core import DBWriter
  from onetl.core import FileDownloader
  from onetl.core import FileUploader
  from onetl.core import FileResult
  from onetl.core import FileSet
  ```

  to new modules `onetl.db` and `onetl.file`:

  ```python
  from onetl.db import DBReader
  from onetl.db import DBWriter
  from onetl.file import FileDownloader
  from onetl.file import FileUploader

  # not a public interface
  from onetl.file.file_result import FileResult
  from onetl.file.file_set import FileSet
  ```

  Imports from the old module `onetl.core` can still be used, but are marked as deprecated. The module will be removed in v1.0.0. (#46)
Features
- Add `rename_dir` method.

  Method was added to the following connections:
  - `FTP`
  - `FTPS`
  - `HDFS`
  - `SFTP`
  - `WebDAV`

  It allows renaming/moving a directory to a new path with all its content. `S3` does not have directories, so there is no such method in that class. (#40)
- Add `onetl.file.FileMover` class. It allows moving files between directories of a remote file system. Signature is almost the same as in `FileDownloader`, but without HWM support. (#42)
Improvements
- Document all public methods in `FileConnection` classes: `download_file`, `resolve_dir`, `resolve_file`, `get_stat`, `is_dir`, `is_file`, `list_dir`, `create_dir`, `path_exists`, `remove_file`, `rename_file`, `remove_dir`, `upload_file`, `walk`. (#39)
- Update documentation of `check` method of all connections - add usage example and document result type. (#39)
- Add new exception type `FileSizeMismatchError`. Methods `connection.download_file` and `connection.upload_file` now raise the new exception type instead of `RuntimeError` if the target file after download/upload has a different size than the source. (#39)
- Add new exception type `DirectoryExistsError` - it is raised if the target directory already exists. (#40)
- Improved `FileDownloader`/`FileUploader` exception logging. If `DEBUG` logging is enabled, print exception with stacktrace instead of printing only the exception message. (#42)
- Updated documentation of `FileUploader`:
  - Class does not support read strategies, added note to documentation.
  - Added examples of using `run` method with explicit files list passing, both absolute and relative paths.
  - Fix outdated imports and class names in examples. (#42)
- Updated documentation of `DownloadResult` class - fix outdated imports and class names. (#42)
- Improved file filters documentation section. Document interface class `onetl.base.BaseFileFilter` and function `match_all_filters`. (#43)
- Improved file limits documentation section. Document interface class `onetl.base.BaseFileLimit` and functions `limits_stop_at`/`limits_reached`/`reset_limits`. (#44)
- Added changelog. Changelog is generated from separate news files using towncrier. (#47)
Misc
- Improved CI workflow for tests:
  - If a developer hasn't changed source code of a specific connector or its dependencies, run tests only against maximum supported versions of Spark, Python, Java and db/file server.
  - If a developer made some changes in a specific connector, in core classes, or in dependencies, run tests for both minimal and maximum versions.
  - Once a week, run all tests against both minimal and latest versions to detect breaking changes in dependencies.
  - Minimal tested Spark version is 2.3.1 instead of 2.4.8. (#32)
Full Changelog: 0.7.2...0.8.0
0.7.2 (2023-05-24)
Dependencies
- Limited `typing-extensions` version.

  The `typing-extensions==4.6.0` release contains some breaking changes, causing errors like:

  ```
  Traceback (most recent call last):
    File "/Users/project/lib/python3.9/typing.py", line 852, in __subclasscheck__
      return issubclass(cls, self.__origin__)
  TypeError: issubclass() arg 1 must be a class
  ```

  `typing-extensions==4.6.1` was causing another error:

  ```
  Traceback (most recent call last):
    File "/home/maxim/Repo/typing_extensions/1.py", line 33, in <module>
      isinstance(file, ContainsException)
    File "/home/maxim/Repo/typing_extensions/src/typing_extensions.py", line 599, in __instancecheck__
      if super().__instancecheck__(instance):
    File "/home/maxim/.pyenv/versions/3.7.8/lib/python3.7/abc.py", line 139, in __instancecheck__
      return _abc_instancecheck(cls, instance)
    File "/home/maxim/Repo/typing_extensions/src/typing_extensions.py", line 583, in __subclasscheck__
      return super().__subclasscheck__(other)
    File "/home/maxim/.pyenv/versions/3.7.8/lib/python3.7/abc.py", line 143, in __subclasscheck__
      return _abc_subclasscheck(cls, subclass)
    File "/home/maxim/Repo/typing_extensions/src/typing_extensions.py", line 661, in _proto_hook
      and other._is_protocol
  AttributeError: type object 'PathWithFailure' has no attribute '_is_protocol'
  ```

  We updated requirements with `typing-extensions<4.6` until compatibility issues are fixed.
Full Changelog: 0.7.1...0.7.2
0.7.1 (2023-05-23)
Bug Fixes
- Fixed `setup_logging` function.

  In onETL==0.7.0 calling `onetl.log.setup_logging()` broke the logging:

  ```
  Traceback (most recent call last):
    File "/opt/anaconda/envs/py39/lib/python3.9/logging/__init__.py", line 434, in format
      return self._format(record)
    File "/opt/anaconda/envs/py39/lib/python3.9/logging/__init__.py", line 430, in _format
      return self._fmt % record.__dict__
  KeyError: 'levelname:8s'
  ```
- Fixed installation examples.

  In onETL==0.7.0 there are examples of installing onETL with extras:

  ```
  pip install onetl[files, kerberos, spark]
  ```

  But pip fails to install such a package:

  ```
  ERROR: Invalid requirement: 'onetl[files,'
  ```

  This is because of spaces in the extras clause. Fixed:

  ```
  pip install onetl[files,kerberos,spark]
  ```
Full Changelog: 0.7.0...0.7.1