
Commit ca7de4f: add an example (parent 661a2c2)

1 file changed (+27, -0 lines)

datafusion/datasource-parquet/src/row_group_filter.rs

@@ -132,6 +132,33 @@ impl RowGroupAccessPlanFilter {
 /// | +-----------------------------------+-----------------------------+ |
 /// +-----------------------------------------------------------------------+
 ///
+/// # Example with Statistics Truncation and NOT Inversion
+///
+/// When statistics are truncated to 6 characters (for example, with `statistics_truncate_length = 6`),
+/// the min/max values become:
+///
+/// ```text
+/// Row group 3: species_min="Alpine", species_max="Alpine" (truncated from "Alpine Ibex"/"Alpine Sheep")
+///              s_min=76, s_max=101
+/// ```
+///
+/// To identify this row group as fully matched, the filter uses NOT inversion:
+/// 1. Original predicate: `species LIKE 'Alpine%' AND s >= 50`
+/// 2. Inverted predicate: `NOT (species LIKE 'Alpine%' AND s >= 50)`,
+///    which simplifies (by De Morgan's law) to `species NOT LIKE 'Alpine%' OR s < 50`
+/// 3. Generated pruning predicate:
+///    `(species_min NOT LIKE 'Alpine%' OR species_max NOT LIKE 'Alpine%') OR s_min < 50`
+///
+/// For row group 3 with truncated stats:
+/// - Evaluating `species_min NOT LIKE 'Alpine%'`: `"Alpine" NOT LIKE 'Alpine%'` = `false`
+/// - Evaluating `species_max NOT LIKE 'Alpine%'`: `"Alpine" NOT LIKE 'Alpine%'` = `false`
+/// - Evaluating `s_min < 50`: `76 < 50` = `false`
+/// - Final result: `(false OR false) OR false` = `false`
+///
+/// Because the pruning predicate for the inverted expression evaluates to `false`, this row group
+/// would be pruned for the inverted predicate: no row in it can satisfy
+/// `species NOT LIKE 'Alpine%' OR s < 50`. Therefore every row in the group satisfies the
+/// original predicate, and the row group is fully matched.
+///
 ///
 /// Without limit pruning: Scan Partition 2 → Partition 3 → Partition 4 (until limit reached)
 /// With limit pruning: If Partition 3 contains enough rows to satisfy the limit,
 /// skip Partitions 2 and 4 entirely and go directly to Partition 3.
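As a rough sketch of the check the new doc comment describes, the Rust below evaluates the inverted pruning predicate against row group 3's truncated statistics. `RowGroupStats`, `truncate`, `like_prefix`, `inverted_may_match`, and `is_fully_matched` are hypothetical helpers, not DataFusion APIs. `LIKE 'Alpine%'` is modeled as a plain prefix test, and truncation is modeled as keeping the first N characters of both bounds, as in the example; the sketch does not model how a particular Parquet writer adjusts a truncated max value.

```rust
/// Hypothetical per-row-group statistics (not a DataFusion type).
/// `s_max` is carried for completeness; the inverted predicate only needs `s_min`.
#[allow(dead_code)]
struct RowGroupStats {
    species_min: String,
    species_max: String,
    s_min: i64,
    s_max: i64,
}

/// Assumed truncation model: keep at most `len` characters of the value.
fn truncate(value: &str, len: usize) -> String {
    value.chars().take(len).collect()
}

/// `value LIKE 'prefix%'` for a pattern that is a literal prefix followed by `%`.
fn like_prefix(value: &str, prefix: &str) -> bool {
    value.starts_with(prefix)
}

/// Pruning predicate for `NOT (species LIKE 'prefix%' AND s >= bound)`:
/// returns true if the row group *might* contain a row satisfying the inverted predicate.
fn inverted_may_match(stats: &RowGroupStats, prefix: &str, bound: i64) -> bool {
    // species NOT LIKE 'prefix%' may hold unless both bounds share the prefix.
    let species_may_not_like =
        !like_prefix(&stats.species_min, prefix) || !like_prefix(&stats.species_max, prefix);
    // s < bound may hold only if the minimum is below the bound.
    let s_may_be_below = stats.s_min < bound;
    species_may_not_like || s_may_be_below
}

/// A row group is fully matched when no row can satisfy the inverted predicate.
fn is_fully_matched(stats: &RowGroupStats, prefix: &str, bound: i64) -> bool {
    !inverted_may_match(stats, prefix, bound)
}

fn main() {
    let rg3 = RowGroupStats {
        species_min: truncate("Alpine Ibex", 6),
        species_max: truncate("Alpine Sheep", 6),
        s_min: 76,
        s_max: 101,
    };
    assert_eq!(rg3.species_min, "Alpine");
    assert_eq!(rg3.species_max, "Alpine");

    let fully = is_fully_matched(&rg3, "Alpine", 50);
    println!("row group 3 fully matched: {fully}"); // prints "row group 3 fully matched: true"
}
```

Note that the conclusion rests on both truncated bounds still starting with the full pattern prefix; if `statistics_truncate_length` were shorter than `"Alpine"`, the prefix test on the bounds would fail and the group could no longer be proven fully matched.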

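The limit-pruning behavior in the context lines can be illustrated with a similar sketch. The `Candidate` type, `plan_with_limit`, and the row counts below are hypothetical, and the sketch assumes (as a generalization of the single-partition case described) that several fully matched row groups may be combined to reach the limit; when they cannot, it falls back to scanning every candidate.

```rust
/// Hypothetical summary of a row group that survived min/max pruning.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    index: usize,
    num_rows: usize,
    fully_matched: bool,
}

/// Choose which row groups to scan for a query with `LIMIT limit`.
fn plan_with_limit(candidates: &[Candidate], limit: usize) -> Vec<usize> {
    let mut chosen = Vec::new();
    let mut guaranteed_rows = 0;
    // Every row in a fully matched group satisfies the predicate,
    // so its row count is a guaranteed contribution toward the limit.
    for c in candidates.iter().filter(|c| c.fully_matched) {
        chosen.push(c.index);
        guaranteed_rows += c.num_rows;
        if guaranteed_rows >= limit {
            return chosen;
        }
    }
    // Fully matched groups cannot guarantee `limit` rows: scan every candidate.
    candidates.iter().map(|c| c.index).collect()
}

fn main() {
    let candidates = [
        Candidate { index: 2, num_rows: 100, fully_matched: false },
        Candidate { index: 3, num_rows: 100, fully_matched: true },
        Candidate { index: 4, num_rows: 100, fully_matched: false },
    ];
    println!("{:?}", plan_with_limit(&candidates, 10)); // prints "[3]"
    println!("{:?}", plan_with_limit(&candidates, 500)); // prints "[2, 3, 4]"
}
```

With `LIMIT 10`, only group 3 is scanned, matching the "go directly to Partition 3" case; with `LIMIT 500`, the fully matched group alone is not enough, so all three candidates are scanned.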