perf: Optimize regexp match and not match for .*foo.* cases #20610

Open
petern48 wants to merge 3 commits into apache:main from petern48:regexp_simplify_optim

petern48 (Contributor) commented Feb 27, 2026

Which issue does this PR close?

The issue "Expr. simplification / rewrite: regex .*foo.*" listed under Development.

Rationale for this change

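Improve query performance by rewriting regular-expression matches of the form '.*foo.*' in the logical plan into cheaper expressions, such as a substring contains check, so the regex does not have to be evaluated.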

What changes are included in this PR?

Added simplification rules that perform the following rewrites:

  • s ~ '.*foo.*' -> contains(s, 'foo')
  • s !~ '.*foo.*' -> not(contains(s, 'foo'))
  • s ~ '.*.*' -> is_not_null(s)
  • s !~ '.*.*' -> false
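The rewrite table above can be modeled roughly as follows. This is a minimal, hypothetical sketch that works on the raw pattern string; `Rewrite`, `simplify_regex`, and `REGEX_META` are illustrative names, not DataFusion's API, and the PR's actual rule operates on logical `Expr` nodes. The rewrite is sound because `~` performs an unanchored search and `.*` can match zero characters, so '.*foo.*' matches exactly when 'foo' occurs somewhere in s. The sketch assumes, conservatively, that the middle part is only rewritten when it contains no regex metacharacters; otherwise it would not be a plain substring.

```rust
/// Hypothetical result of simplifying `s ~ pattern` or `s !~ pattern`.
/// Illustrative model only, not DataFusion's `Expr` type.
#[derive(Debug, PartialEq)]
enum Rewrite {
    /// `contains(s, needle)`, wrapped in `not(...)` when `negated` is true.
    Contains { needle: String, negated: bool },
    /// `is_not_null(s)`
    IsNotNull,
    /// The literal `false`
    False,
}

/// Characters with special meaning in regex syntax (assumed conservative list).
/// If the middle part contains any of them, it is not a plain literal.
const REGEX_META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$',
];

/// Returns a cheaper equivalent for patterns shaped like `.*<literal>.*`,
/// or `None` when the pattern should be left as a regex.
fn simplify_regex(pattern: &str, negated: bool) -> Option<Rewrite> {
    // Only handle the exact shape `.*<middle>.*`.
    let middle = pattern.strip_prefix(".*")?.strip_suffix(".*")?;
    if middle.is_empty() {
        // `.*.*` matches every non-NULL string.
        return Some(if negated { Rewrite::False } else { Rewrite::IsNotNull });
    }
    if middle.contains(REGEX_META) {
        // Not a plain literal: keep the regex.
        return None;
    }
    Some(Rewrite::Contains {
        needle: middle.to_string(),
        negated,
    })
}
```

For example, `simplify_regex(".*foo.*", true)` yields `Contains { needle: "foo", negated: true }`, while `".*a.b.*"` is left alone because `.` is a metacharacter. Note that for a NULL s, `s !~ '.*.*'` evaluates to NULL rather than false; the two values differ, but a filter drops the row in both cases.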

Additionally, I found that the existing optimization for s !~ '.*' incorrectly rewrote the condition to s = '', which evaluates to True for rows where s is the empty string (''). I confirmed that this differs from the default, non-optimized query, which returns no rows, even when some values are the empty string or NULL.

The condition should never be True: .* matches every string, including the empty string, so the negated match is False for every non-NULL value. NULL rows are not returned either, because NULL !~ '.*' evaluates to NULL, not True. A filter on this condition therefore never returns any rows.
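To make the three-valued logic concrete, here is a small model, assuming SQL NULL is represented as `None`. The names `SqlBool`, `not_match_dot_star`, `old_rewrite_eq_empty`, and `apply_where` are hypothetical and only illustrate the semantics; they are not DataFusion code.

```rust
/// SQL three-valued boolean: `None` models SQL NULL.
type SqlBool = Option<bool>;

/// `s !~ '.*'`: `.*` matches every string (including ""), so any
/// non-NULL input gives false, and NULL input gives NULL.
fn not_match_dot_star(s: Option<&str>) -> SqlBool {
    s.map(|_| false)
}

/// The old, incorrect rewrite `s = ''`: true for the empty string,
/// false for other strings, NULL for NULL.
fn old_rewrite_eq_empty(s: Option<&str>) -> SqlBool {
    s.map(|v| v.is_empty())
}

/// A WHERE clause keeps a row only when the predicate is exactly true;
/// both false and NULL filter the row out.
fn apply_where<'a>(
    rows: &[Option<&'a str>],
    pred: fn(Option<&str>) -> SqlBool,
) -> Vec<Option<&'a str>> {
    rows.iter().copied().filter(|r| pred(*r) == Some(true)).collect()
}
```

Applying `apply_where` to the rows '', 'abc', and NULL keeps nothing for the correct `!~ '.*'` predicate, but keeps the empty-string row under the old `s = ''` rewrite.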

Are these changes tested?

Yes. Added new tests, and updated existing tests whose expected results changed after fixing the bug in the pre-existing optimization.

Are there any user-facing changes?

This is a slight behavior change that comes from fixing a bug in the pre-existing optimization. Previously, s !~ '.*' returned rows where s was the empty string. With this PR, no rows are returned, which matches the behavior when no optimizations are applied.

github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Feb 27, 2026
petern48 force-pushed the regexp_simplify_optim branch from 7b367d2 to 053dc9b March 1, 2026 21:32
petern48 marked this pull request as ready for review March 2, 2026 00:13
petern48 (Contributor, Author) commented Mar 2, 2026

Built on top of #20581, so please wait for that PR to merge first.


Development

Successfully merging this pull request may close these issues.

Expr. simplification / rewrite: regex .*foo.*