
Commit a788644

Improve documentation for ScalarUDFImpl::preimage
1 parent f8a22a5 commit a788644

2 files changed (+23, -9 lines)

2 files changed

+23
-9
lines changed

datafusion/expr/src/udf.rs

Lines changed: 23 additions & 8 deletions
```diff
@@ -709,19 +709,34 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send + Sync {
         Ok(ExprSimplifyResult::Original(args))
     }
 
-    /// Returns the [preimage] for this function and the specified scalar value, if any.
+    /// Returns the preimage for this function and the specified scalar
+    /// expression, if any.
     ///
-    /// A preimage is a single contiguous [`Interval`] of values where the function
-    /// will always return `lit_value`
+    /// # Return Value
     ///
     /// Implementations should return intervals with an inclusive lower bound and
     /// exclusive upper bound.
     ///
-    /// This rewrite is described in the [ClickHouse Paper] and is particularly
-    /// useful for simplifying expressions `date_part` or equivalent functions. The
-    /// idea is that if you have an expression like `date_part(YEAR, k) = 2024` and you
-    /// can find a [preimage] for `date_part(YEAR, k)`, which is the range of dates
-    /// covering the entire year of 2024. Thus, you can rewrite the expression to `k
+    /// # Background
+    ///
+    /// A [preimage] is a single contiguous [`Interval`] of the functions
+    /// argument where the function will return a single literal (constant)
+    /// value. This can also be thought of as form of interval containment.
+    ///
+    /// Using a preimage to rewrite predicates is described in the [ClickHouse
+    /// Paper]:
+    ///
+    /// > some functions can compute the preimage of a given function result.
+    /// > This is used to replace comparisons of constants with function calls
+    /// > on the key columns by comparing the key column value with the preimage.
+    /// > For example, `toYear(k) = 2024` can be replaced by
+    /// > `k >= 2024-01-01 && k < 2025-01-01`
+    ///
+    /// As mentioned above, this rewrite is particularly useful for simplifying
+    /// expressions such as `date_part` or equivalent functions. The idea is for
+    /// an an expression like `date_part(YEAR, k) = 2024`, if there is a
+    /// [preimage] for `date_part(YEAR, k)`, which is the range of dates
+    /// covering the entire year of 2024, you can rewrite the expression to `k
     /// >= '2024-01-01' AND k < '2025-01-01' which is often more optimizable.
     ///
     /// [ClickHouse Paper]: https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf
```
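As a rough illustration of the `date_part(YEAR, k) = 2024` example in the new doc comment, the sketch below computes the preimage of "the year of `k` equals `year`" as a half-open interval `[lower, upper)` of days since the Unix epoch. This follows the inclusive-lower/exclusive-upper convention the documentation asks implementations to use. It is not DataFusion's implementation. `DayInterval`, `days_from_civil`, `year_of`, and `year_preimage` are hypothetical helpers written for this example, the date arithmetic follows Howard Hinnant's civil-calendar algorithms, and dates are plain `i64` day counts rather than Arrow values.

```rust
/// A half-open interval `[lower, upper)` of days since 1970-01-01.
/// Hypothetical stand-in for DataFusion's `Interval`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DayInterval {
    pub lower: i64, // inclusive
    pub upper: i64, // exclusive
}

impl DayInterval {
    /// True if `day` lies in `[lower, upper)`.
    pub fn contains(&self, day: i64) -> bool {
        self.lower <= day && day < self.upper
    }
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (after Howard Hinnant's `days_from_civil`).
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Count years from March so the leap day falls at the end of the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Calendar year of a day count (the year part of Hinnant's `civil_from_days`).
pub fn year_of(days: i64) -> i64 {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the next civil year in the March-based count.
    let carry = if month <= 2 { 1 } else { 0 };
    yoe + era * 400 + carry
}

/// Preimage of `year_of(k) = year`: every day from Jan 1 of `year` up to,
/// but not including, Jan 1 of `year + 1`.
pub fn year_preimage(year: i64) -> DayInterval {
    DayInterval {
        lower: days_from_civil(year, 1, 1),
        upper: days_from_civil(year + 1, 1, 1),
    }
}
```

With this preimage, `year_of(k) == 2024` holds exactly when `year_preimage(2024).contains(k)`, which is the `k >= '2024-01-01' AND k < '2025-01-01'` form from the comment. For 2024 the bounds are days 19723 (inclusive) and 20089 (exclusive). The exclusive upper bound avoids having to compute "the last instant of 2024".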

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

Lines changed: 0 additions & 1 deletion
```diff
@@ -26,7 +26,6 @@ use datafusion_expr_common::interval_arithmetic::Interval;
 /// range for which it is valid) and `x` is not `NULL`
 ///
 /// For details see [`datafusion_expr::ScalarUDFImpl::preimage`]
-///
 pub(super) fn rewrite_with_preimage(
     preimage_interval: Interval,
     op: Operator,
```
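The hunk above only touches the doc comment on `rewrite_with_preimage`, which documents a precondition that `x` is not `NULL`. The sketch below illustrates the underlying idea, rewriting `f(x) = c` and `f(x) != c` into range predicates on `x`, given a preimage `[lower, upper)` that is assumed to be exactly the set of `x` for which `f(x) = c`. It is not DataFusion's code. The real function's visible parameters include `preimage_interval: Interval` and `op: Operator`, while `CmpOp`, `RangePredicate`, and the signature used here are hypothetical. Only equality and inequality are handled, and NULLs are ignored per the documented precondition.

```rust
/// Comparison operators handled by this sketch (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CmpOp {
    Eq,
    NotEq,
}

/// A predicate on the function argument `x`, expressed with range bounds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RangePredicate {
    /// `x >= lower AND x < upper`
    Inside { lower: i64, upper: i64 },
    /// `x < lower OR x >= upper`
    Outside { lower: i64, upper: i64 },
}

impl RangePredicate {
    /// Evaluate the rewritten predicate for a non-NULL `x`.
    pub fn eval(&self, x: i64) -> bool {
        match *self {
            RangePredicate::Inside { lower, upper } => x >= lower && x < upper,
            RangePredicate::Outside { lower, upper } => x < lower || x >= upper,
        }
    }
}

/// Rewrite `f(x) <op> c` given the preimage `[lower, upper)` of `c` under `f`.
/// Assumes `f(x) = c` holds exactly when `lower <= x < upper`, and that `x`
/// is not NULL (otherwise SQL three-valued logic would need handling).
pub fn rewrite_with_preimage(lower: i64, upper: i64, op: CmpOp) -> RangePredicate {
    match op {
        // f(x) = c  <=>  x is inside the preimage
        CmpOp::Eq => RangePredicate::Inside { lower, upper },
        // f(x) != c <=>  x is outside the preimage
        CmpOp::NotEq => RangePredicate::Outside { lower, upper },
    }
}
```

For example, with `f(x) = x.div_euclid(10)` the preimage of `3` is `[30, 40)`, so `f(x) = 3` becomes `x >= 30 AND x < 40` and `f(x) != 3` becomes `x < 30 OR x >= 40`. Both forms compare `x` directly, which is what lets range-based pruning on `x` apply.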
