0.12.5 (2024-12-03)
Improvements
- Use `sipHash64` instead of `md5` in Clickhouse for reading data with `{"partitioning_mode": "hash"}`, as it is 5 times faster.
- Use `hashtext` instead of `md5` in Postgres for reading data with `{"partitioning_mode": "hash"}`, as it is 3-5 times faster.
- Use `BINARY_CHECKSUM` instead of `HASHBYTES` in MSSQL for reading data with `{"partitioning_mode": "hash"}`, as it is 5 times faster.
Bug fixes
- In JDBC sources, wrap `MOD(partitionColumn, numPartitions)` with `ABS(...)` to make all returned values non-negative. This prevents data skew.
- Fix reading table data from MSSQL using `{"partitioning_mode": "hash"}` with `partitionColumn` of integer type.
- Fix data skew when reading table data from Postgres using `{"partitioning_mode": "hash"}` (all the data was read into one Spark partition).
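To illustrate why wrapping `MOD(...)` in `ABS(...)` matters, here is a minimal Python sketch. It is not the library's code: `sql_mod` and `partition_counts` are hypothetical helpers, and the model assumes that SQL `MOD` returns a result with the sign of the dividend and that range-based JDBC partitioning sends every value below the lower bound to the first partition.

```python
from collections import Counter


def sql_mod(value: int, divisor: int) -> int:
    """Emulate SQL-style MOD (truncated division): result has the sign of `value`."""
    remainder = abs(value) % abs(divisor)
    return -remainder if value < 0 else remainder


def partition_counts(values, num_partitions: int, wrap_abs: bool) -> list:
    """Count rows per partition in a simplified model of range-based partitioning.

    Assumption: results outside [0, num_partitions - 1] are clamped to the
    first or last partition, as open-ended range predicates would do.
    """
    counts = Counter()
    for value in values:
        bucket = sql_mod(value, num_partitions)
        if wrap_abs:
            bucket = abs(bucket)  # ABS(MOD(...)) keeps the bucket in [0, num_partitions)
        bucket = min(max(bucket, 0), num_partitions - 1)
        counts[bucket] += 1
    return [counts[i] for i in range(num_partitions)]


# Signed values stand in for hash results, which can be negative.
values = list(range(-50, 50))

print("MOD only:     ", partition_counts(values, 4, wrap_abs=False))
print("ABS(MOD(...)):", partition_counts(values, 4, wrap_abs=True))
```

In this model, negative remainders all pile up in the first partition when `ABS` is omitted, while the wrapped expression spreads rows almost evenly across partitions.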