Skip to content

Commit e80694e

Browse files
AdamGSalamb
andauthored
Remove recursive const check in simplify_const_expr (apache#20234)
## Which issue does this PR close? - Closes apache#20134 . ## Rationale for this change The check for simplifying const expressions was recursive and expensive, repeatedly checking the expression's children in a recursive way. I've tried other approached like pre-computing the result for all expressions outside of the loop and using that cache during the traversal, but I've found that it only yielded between 5-8% improvement while adding complexity, while this approach simplifies the code and seems to be more performant in my benchmarks (change is compared to current main branch): ``` tpc-ds/q76/cs/16 time: [27.112 µs 27.159 µs 27.214 µs] change: [−13.533% −13.167% −12.801%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe tpc-ds/q76/ws/16 time: [26.175 µs 26.280 µs 26.394 µs] change: [−14.312% −13.833% −13.346%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild tpc-ds/q76/cs/128 time: [195.79 µs 196.17 µs 196.56 µs] change: [−14.362% −14.080% −13.816%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild tpc-ds/q76/ws/128 time: [197.08 µs 197.61 µs 198.23 µs] change: [−13.531% −13.142% −12.737%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high mild ``` ## What changes are included in this PR? 1. `simplify_const_expr` now only checks itself and whether all of its children are literals, because it assumes the order of simplification is bottoms-up. 2. Removes some code from the public API, see the last section for the full details. ## Are these changes tested? Existing test suite ## Are there any user-facing changes? I suggest removing some of the physical expression simplification code from the public API, which I believe reduces the maintenance burden here. These changes also helps removing code like the distinct `simplify_const_expr` and `simplify_const_expr_with_dummy`. 1. Makes all `datafusion-physical-expr::simplifier` sub-modules (`not` and `const_evaluator`) private, including their key functions. They are not used externally, and being able to change their behavior seems more valuable long term. The simplifier is also not currently an extension point as far as I can tell, so there's no value in providing atomic building blocks like them for now. 2. Removes `has_column_references` completely, its trivial to re-implement and isn't used anywhere in the codebase. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent fdd36d0 commit e80694e

File tree

3 files changed

+92
-23
lines changed

3 files changed

+92
-23
lines changed

datafusion/physical-expr/src/simplifier/const_evaluator.rs

Lines changed: 84 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -39,18 +39,80 @@ use crate::expressions::{Column, Literal};
3939
/// - `1 + 2` -> `3`
4040
/// - `(1 + 2) * 3` -> `9` (with bottom-up traversal)
4141
/// - `'hello' || ' world'` -> `'hello world'`
42+
#[deprecated(
43+
since = "53.0.0",
44+
note = "This function will be removed in a future release in favor of a private implementation that depends on other implementation details. Please open an issue if you have a use case for keeping it."
45+
)]
4246
pub fn simplify_const_expr(
4347
expr: Arc<dyn PhysicalExpr>,
4448
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
45-
simplify_const_expr_with_dummy(expr, &create_dummy_batch()?)
49+
let batch = create_dummy_batch()?;
50+
// If expr is already a const literal or can't be evaluated into one.
51+
if expr.as_any().is::<Literal>() || (!can_evaluate_as_constant(&expr)) {
52+
return Ok(Transformed::no(expr));
53+
}
54+
55+
// Evaluate the expression
56+
match expr.evaluate(&batch) {
57+
Ok(ColumnarValue::Scalar(scalar)) => {
58+
Ok(Transformed::yes(Arc::new(Literal::new(scalar))))
59+
}
60+
Ok(ColumnarValue::Array(arr)) if arr.len() == 1 => {
61+
// Some operations return an array even for scalar inputs
62+
let scalar = ScalarValue::try_from_array(&arr, 0)?;
63+
Ok(Transformed::yes(Arc::new(Literal::new(scalar))))
64+
}
65+
Ok(_) => {
66+
// Unexpected result - keep original expression
67+
Ok(Transformed::no(expr))
68+
}
69+
Err(_) => {
70+
// On error, keep original expression
71+
// The expression might succeed at runtime due to short-circuit evaluation
72+
// or other runtime conditions
73+
Ok(Transformed::no(expr))
74+
}
75+
}
4676
}
4777

48-
pub(crate) fn simplify_const_expr_with_dummy(
78+
/// Simplify expressions whose immediate children are all literals.
79+
///
80+
/// This function only checks the direct children of the expression,
81+
/// not the entire subtree. It is designed to be used with bottom-up tree
82+
/// traversal, where children are simplified before parents.
83+
///
84+
/// # Example transformations
85+
/// - `1 + 2` -> `3`
86+
/// - `(1 + 2) * 3` -> `9` (with bottom-up traversal, inner expr simplified first)
87+
/// - `'hello' || ' world'` -> `'hello world'`
88+
pub(crate) fn simplify_const_expr_immediate(
4989
expr: Arc<dyn PhysicalExpr>,
5090
batch: &RecordBatch,
5191
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
52-
// If expr is already a const literal or can't be evaluated into one.
53-
if expr.as_any().is::<Literal>() || (!can_evaluate_as_constant(&expr)) {
92+
// Already a literal - nothing to do
93+
if expr.as_any().is::<Literal>() {
94+
return Ok(Transformed::no(expr));
95+
}
96+
97+
// Column references cannot be evaluated at plan time
98+
if expr.as_any().is::<Column>() {
99+
return Ok(Transformed::no(expr));
100+
}
101+
102+
// Volatile nodes cannot be evaluated at plan time
103+
if expr.is_volatile_node() {
104+
return Ok(Transformed::no(expr));
105+
}
106+
107+
// Since transform visits bottom-up, children have already been simplified.
108+
// If all children are now Literals, this node can be const-evaluated.
109+
// This is O(k) where k = number of children, instead of O(subtree).
110+
let all_children_literal = expr
111+
.children()
112+
.iter()
113+
.all(|child| child.as_any().is::<Literal>());
114+
115+
if !all_children_literal {
54116
return Ok(Transformed::no(expr));
55117
}
56118

@@ -77,6 +139,20 @@ pub(crate) fn simplify_const_expr_with_dummy(
77139
}
78140
}
79141

142+
/// Create a 1-row dummy RecordBatch for evaluating constant expressions.
143+
///
144+
/// The batch is never actually accessed for data - it's just needed because
145+
/// the PhysicalExpr::evaluate API requires a RecordBatch. For expressions
146+
/// that only contain literals, the batch content is irrelevant.
147+
///
148+
/// This is the same approach used in the logical expression `ConstEvaluator`.
149+
pub(crate) fn create_dummy_batch() -> Result<RecordBatch> {
150+
// RecordBatch requires at least one column
151+
let dummy_schema = Arc::new(Schema::new(vec![Field::new("_", DataType::Null, true)]));
152+
let col = new_null_array(&DataType::Null, 1);
153+
Ok(RecordBatch::try_new(dummy_schema, vec![col])?)
154+
}
155+
80156
fn can_evaluate_as_constant(expr: &Arc<dyn PhysicalExpr>) -> bool {
81157
let mut can_evaluate = true;
82158

@@ -93,21 +169,11 @@ fn can_evaluate_as_constant(expr: &Arc<dyn PhysicalExpr>) -> bool {
93169
can_evaluate
94170
}
95171

96-
/// Create a 1-row dummy RecordBatch for evaluating constant expressions.
97-
///
98-
/// The batch is never actually accessed for data - it's just needed because
99-
/// the PhysicalExpr::evaluate API requires a RecordBatch. For expressions
100-
/// that only contain literals, the batch content is irrelevant.
101-
///
102-
/// This is the same approach used in the logical expression `ConstEvaluator`.
103-
pub(crate) fn create_dummy_batch() -> Result<RecordBatch> {
104-
// RecordBatch requires at least one column
105-
let dummy_schema = Arc::new(Schema::new(vec![Field::new("_", DataType::Null, true)]));
106-
let col = new_null_array(&DataType::Null, 1);
107-
Ok(RecordBatch::try_new(dummy_schema, vec![col])?)
108-
}
109-
110172
/// Check if this expression has any column references.
173+
#[deprecated(
174+
since = "53.0.0",
175+
note = "This function isn't used internally and is trivial to implement, therefore it will be removed in a future release."
176+
)]
111177
pub fn has_column_references(expr: &Arc<dyn PhysicalExpr>) -> bool {
112178
let mut has_columns = false;
113179
expr.apply(|expr| {

datafusion/physical-expr/src/simplifier/mod.rs

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -24,9 +24,7 @@ use std::sync::Arc;
2424
use crate::{
2525
PhysicalExpr,
2626
simplifier::{
27-
const_evaluator::{create_dummy_batch, simplify_const_expr_with_dummy},
28-
not::simplify_not_expr,
29-
unwrap_cast::unwrap_cast_in_comparison,
27+
const_evaluator::create_dummy_batch, unwrap_cast::unwrap_cast_in_comparison,
3028
},
3129
};
3230

@@ -67,10 +65,11 @@ impl<'a> PhysicalExprSimplifier<'a> {
6765

6866
// Apply NOT expression simplification first, then unwrap cast optimization,
6967
// then constant expression evaluation
70-
let rewritten = simplify_not_expr(node, schema)?
68+
#[expect(deprecated, reason = "`simplify_not_expr` is marked as deprecated until it's made private.")]
69+
let rewritten = not::simplify_not_expr(node, schema)?
7170
.transform_data(|node| unwrap_cast_in_comparison(node, schema))?
7271
.transform_data(|node| {
73-
simplify_const_expr_with_dummy(node, &batch)
72+
const_evaluator::simplify_const_expr_immediate(node, &batch)
7473
})?;
7574

7675
#[cfg(debug_assertions)]

datafusion/physical-expr/src/simplifier/not.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,10 @@ use crate::expressions::{BinaryExpr, InListExpr, Literal, NotExpr, in_list, lit}
4343
/// This function applies a single simplification rule and returns. When used with
4444
/// TreeNodeRewriter, multiple passes will automatically be applied until no more
4545
/// transformations are possible.
46+
#[deprecated(
47+
since = "53.0.0",
48+
note = "This function will be made private in a future release, please file an issue if you have a reason for keeping it public."
49+
)]
4650
pub fn simplify_not_expr(
4751
expr: Arc<dyn PhysicalExpr>,
4852
schema: &Schema,

0 commit comments

Comments
 (0)