7 files changed: +15 -5 lines changed

  Bug Fixes
  ---------

- - Allow passing table names in format ``schema."table.with.dots"`` to ``DBReader(name=...)`` and ``DBWriter(name=...)``.
+ - Allow passing table names in format ``schema."table.with.dots"`` to ``DBReader(source=...)`` and ``DBWriter(target=...)``.

+ 0.12.4 (2024-11-27)
+ ===================
+
+ Bug Fixes
+ ---------
+
+ - Fix ``DBReader(conn=oracle, options={"partitioning_mode": "hash"})`` leading to data skew in the last partition due to wrong ``ora_hash`` usage. (:github:pull:`319`)

      :caption: Changelog

      DRAFT
+     0.12.4
      0.12.3
      0.12.2
      0.12.1

- 0.12.3
+ 0.12.4


  class ClickhouseDialect(JDBCDialect):
      def get_partition_column_hash(self, partition_column: str, num_partitions: int) -> str:
-         return f"modulo(halfMD5({partition_column}), {num_partitions})"
+         return f"halfMD5({partition_column}) % {num_partitions}"

      def get_partition_column_mod(self, partition_column: str, num_partitions: int) -> str:
          return f"{partition_column} % {num_partitions}"

  class MSSQLDialect(JDBCDialect):
      # https://docs.microsoft.com/ru-ru/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver16
      def get_partition_column_hash(self, partition_column: str, num_partitions: int) -> str:
-         return f"CONVERT(BIGINT, HASHBYTES ( 'SHA' , {partition_column})) % {num_partitions}"
+         return f"CONVERT(BIGINT, HASHBYTES ('SHA', {partition_column})) % {num_partitions}"

      def get_partition_column_mod(self, partition_column: str, num_partitions: int) -> str:
          return f"{partition_column} % {num_partitions}"

@@ -43,7 +43,9 @@ def get_sql_query(
          )

      def get_partition_column_hash(self, partition_column: str, num_partitions: int) -> str:
-         return f"ora_hash({partition_column}, {num_partitions})"
+         # ora_hash returns values from 0 to N including N.
+         # Balancing N+1 splits to N partitions leads to data skew in last partition.
+         return f"ora_hash({partition_column}, {num_partitions - 1})"

      def get_partition_column_mod(self, partition_column: str, num_partitions: int) -> str:
          return f"MOD({partition_column}, {num_partitions})"
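
The off-by-one behind the Oracle fix can be illustrated with a small simulation. This is only a sketch under stated assumptions: `simulated_ora_hash` is a hypothetical stand-in that mimics the documented range of `ora_hash(expr, max_bucket)` (buckets `0..max_bucket` inclusive) using MD5, and folding the extra bucket into the last partition is an assumed model of how N+1 buckets end up in N partitions, not the library's or Spark's actual code.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def simulated_ora_hash(value: str, max_bucket: int) -> int:
    """Hypothetical stand-in for ora_hash: returns a bucket in [0, max_bucket], inclusive."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (max_bucket + 1)


def partition_sizes(values: Iterable[str], num_partitions: int, max_bucket: int) -> List[int]:
    """Count rows per partition.

    Assumption: any bucket beyond the last partition index is folded
    into the last partition.
    """
    counts = Counter(
        min(simulated_ora_hash(v, max_bucket), num_partitions - 1) for v in values
    )
    return [counts.get(i, 0) for i in range(num_partitions)]


values = [f"row-{i}" for i in range(10_000)]
num_partitions = 4

# Before the fix: max_bucket = N, so N + 1 buckets are squeezed into N partitions.
before = partition_sizes(values, num_partitions, max_bucket=num_partitions)
# After the fix: max_bucket = N - 1, so there is exactly one bucket per partition.
after = partition_sizes(values, num_partitions, max_bucket=num_partitions - 1)

print("before fix:", before)
print("after fix: ", after)
```

With the old argument the last partition receives roughly twice as many rows as the others, while with `num_partitions - 1` the rows spread about evenly.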