Commit e9b35e5

perfect_hash_join_min_key_density=0.2
1 parent b847020

2 files changed, 3 additions, 3 deletions

datafusion/common/src/config.rs

Lines changed: 1 addition & 1 deletion
@@ -486,7 +486,7 @@ config_namespace! {
 ///
 /// Currently only supports cases where build_side.num_rows() < u32::MAX.
 /// Support for build_side.num_rows() >= u32::MAX will be added in the future.
-pub perfect_hash_join_min_key_density: f64, default = 0.99
+pub perfect_hash_join_min_key_density: f64, default = 0.2

 /// When set to true, record batches will be examined between each operator and
 /// small batches will be coalesced into larger batches. This is helpful when there
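To make the effect of the new default concrete, here is a minimal Rust sketch of the density test as the option's documentation describes it: density is `(number of rows) / (max_key - min_key + 1)`, and a perfect hash join may be used only if that density exceeds `perfect_hash_join_min_key_density`. The function name `key_density` and the example build side (1,000 rows with keys spread over 1..=2,000) are hypothetical illustrations, not DataFusion's `HashJoinExec` code.

```rust
/// Hypothetical helper: build-side key density, following the documented formula
/// (number of rows) / (max_key - min_key + 1).
fn key_density(num_rows: u64, min_key: i64, max_key: i64) -> f64 {
    // Widen to i128 so the span cannot overflow for extreme i64 keys.
    let span = (max_key as i128 - min_key as i128 + 1) as f64;
    num_rows as f64 / span
}

fn main() {
    // Hypothetical build side: 1_000 rows whose keys fall in 1..=2_000.
    let density = key_density(1_000, 1, 2_000);
    // Compare against the old (0.99) and new (0.2) defaults.
    for threshold in [0.99, 0.2] {
        // The option is a strict lower bound: density must exceed it.
        let eligible = density > threshold;
        println!("min_key_density = {threshold}: density {density:.2}, eligible = {eligible}");
    }
}
```

Under the old default of 0.99, only nearly gap-free key ranges passed the density check; at 0.2, a build side whose keys fill half of their range (density 0.5) now qualifies.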

datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 2 additions & 2 deletions
@@ -260,7 +260,7 @@ datafusion.execution.parquet.statistics_enabled page
 datafusion.execution.parquet.statistics_truncate_length 64
 datafusion.execution.parquet.write_batch_size 1024
 datafusion.execution.parquet.writer_version 1.0
-datafusion.execution.perfect_hash_join_min_key_density 0.99
+datafusion.execution.perfect_hash_join_min_key_density 0.2
 datafusion.execution.perfect_hash_join_small_build_threshold 1024
 datafusion.execution.planning_concurrency 13
 datafusion.execution.skip_partial_aggregation_probe_ratio_threshold 0.8
@@ -397,7 +397,7 @@ datafusion.execution.parquet.statistics_enabled page (writing) Sets if statistic
 datafusion.execution.parquet.statistics_truncate_length 64 (writing) Sets statistics truncate length. If NULL, uses default parquet writer setting
 datafusion.execution.parquet.write_batch_size 1024 (writing) Sets write_batch_size in bytes
 datafusion.execution.parquet.writer_version 1.0 (writing) Sets parquet writer version valid values are "1.0" and "2.0"
-datafusion.execution.perfect_hash_join_min_key_density 0.99 The minimum required density of join keys on the build side to consider a perfect hash join (see `HashJoinExec` for more details). Density is calculated as: `(number of rows) / (max_key - min_key + 1)`. A perfect hash join may be used if the actual key density > this value. Currently only supports cases where build_side.num_rows() < u32::MAX. Support for build_side.num_rows() >= u32::MAX will be added in the future.
+datafusion.execution.perfect_hash_join_min_key_density 0.2 The minimum required density of join keys on the build side to consider a perfect hash join (see `HashJoinExec` for more details). Density is calculated as: `(number of rows) / (max_key - min_key + 1)`. A perfect hash join may be used if the actual key density > this value. Currently only supports cases where build_side.num_rows() < u32::MAX. Support for build_side.num_rows() >= u32::MAX will be added in the future.
 datafusion.execution.perfect_hash_join_small_build_threshold 1024 A perfect hash join (see `HashJoinExec` for more details) will be considered if the range of keys (max - min) on the build side is < this threshold. This provides a fast path for joins with very small key ranges, bypassing the density check. Currently only supports cases where build_side.num_rows() < u32::MAX. Support for build_side.num_rows() >= u32::MAX will be added in the future.
 datafusion.execution.planning_concurrency 13 Fan-out during initial physical planning. This is mostly use to plan `UNION` children in parallel. Defaults to the number of CPU cores on the system
 datafusion.execution.skip_partial_aggregation_probe_ratio_threshold 0.8 Aggregation ratio (number of distinct groups / number of input rows) threshold for skipping partial aggregation. If the value is greater then partial aggregation will skip aggregation for further input
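Read together, the descriptions of `perfect_hash_join_min_key_density` and `perfect_hash_join_small_build_threshold` define when a perfect hash join is considered. The Rust sketch below encodes only those documented rules: the build side must have fewer than `u32::MAX` rows; a key range `(max - min)` below the small-build threshold takes the fast path and skips the density check; otherwise the density must exceed the minimum. The `PerfectHashJoinConfig` struct and `should_consider_perfect_hash_join` function are hypothetical names, and the sketch assumes `min_key <= max_key`; it is not DataFusion's actual planner logic.

```rust
/// Hypothetical settings mirroring the two documented options.
struct PerfectHashJoinConfig {
    min_key_density: f64,       // datafusion.execution.perfect_hash_join_min_key_density
    small_build_threshold: u64, // datafusion.execution.perfect_hash_join_small_build_threshold
}

/// Sketch of the documented eligibility rules (assumes min_key <= max_key).
fn should_consider_perfect_hash_join(
    cfg: &PerfectHashJoinConfig,
    num_rows: u64,
    min_key: i64,
    max_key: i64,
) -> bool {
    debug_assert!(min_key <= max_key);
    // Only build sides with fewer than u32::MAX rows are currently supported.
    if num_rows >= u32::MAX as u64 {
        return false;
    }
    // Key range (max - min), widened to i128 to avoid i64 overflow.
    let range = max_key as i128 - min_key as i128;
    // Fast path: very small key ranges bypass the density check.
    if range < cfg.small_build_threshold as i128 {
        return true;
    }
    // Otherwise density = rows / (max - min + 1) must strictly exceed the minimum.
    let density = num_rows as f64 / (range + 1) as f64;
    density > cfg.min_key_density
}

fn main() {
    let cfg = PerfectHashJoinConfig { min_key_density: 0.2, small_build_threshold: 1024 };
    // Range 999 < 1024: fast path, density is not checked.
    println!("{}", should_consider_perfect_hash_join(&cfg, 10, 0, 999));
    // 300 rows over 0..=999_999: density 0.0003, rejected.
    println!("{}", should_consider_perfect_hash_join(&cfg, 300, 0, 999_999));
    // 250_000 rows over 0..=999_999: density 0.25 > 0.2, accepted.
    println!("{}", should_consider_perfect_hash_join(&cfg, 250_000, 0, 999_999));
}
```

The third case shows what this commit changes: the same build side (density 0.25) would have been rejected under the previous default of 0.99.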
