Skip to content

enhance: type-aware bidirectional in/== or expression rewriting#48544

Open
zhengbuqian wants to merge 1 commit intomilvus-io:masterfrom
zhengbuqian:feat/optimize-in-vs-eqor-threshold
Open

enhance: type-aware bidirectional in/== or expression rewriting#48544
zhengbuqian wants to merge 1 commit intomilvus-io:masterfrom
zhengbuqian:feat/optimize-in-vs-eqor-threshold

Conversation

@zhengbuqian
Copy link
Copy Markdown
Collaborator

Replace the fixed threshold (150) for merging == or into in[] with type-specific thresholds based on benchmark data:

  • INT types: use in[] when N >= 10 (== or is faster below due to simpler execution path)
  • FLOAT types: use in[] when N >= 15 (float hash is more expensive)
  • Other types (varchar, bool): use in[] when N >= 3

The rewriting is now bidirectional:

  • == or → in[]: when shouldUseInExpr returns true (existing direction)
  • in[] → == or: when shouldUseInExpr returns false (new direction, via visitTermExpr)
  • not in → != and: when shouldUseInExpr returns false (new, via visitUnaryExpr)

Both directions use the same shouldUseInExpr function to ensure consistency.

issue: #45525

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added approved size/L Denotes a PR that changes 100-499 lines. labels Mar 26, 2026
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Mar 26, 2026
@sre-ci-robot
Copy link
Copy Markdown
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-ciloop // for ci-v2/ciloop (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, AMD)
  • /ci-rerun-gosdk-arm // for ci-v2/go-sdk-arm (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 97.36842% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.76%. Comparing base (9ee6fce) to head (11c395c).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
internal/parser/planparserv2/rewriter/util.go 84.61% 2 Missing ⚠️
internal/parser/planparserv2/rewriter/term_in.go 98.61% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #48544      +/-   ##
==========================================
- Coverage   77.82%   77.76%   -0.07%     
==========================================
  Files        2112     2112              
  Lines      354195   354294      +99     
==========================================
- Hits       275665   275517     -148     
- Misses      70202    70415     +213     
- Partials     8328     8362      +34     
Components Coverage Δ
Client 79.25% <ø> (ø)
Core 84.30% <100.00%> (-0.03%) ⬇️
Go 75.80% <96.90%> (-0.09%) ⬇️
Files with missing lines Coverage Δ
...rnal/core/unittest/test_expr_materialized_view.cpp 97.94% <100.00%> (+0.05%) ⬆️
internal/parser/planparserv2/rewriter/entry.go 88.26% <100.00%> (+0.84%) ⬆️
internal/parser/planparserv2/rewriter/term_in.go 93.68% <98.61%> (+2.44%) ⬆️
internal/parser/planparserv2/rewriter/util.go 68.75% <84.61%> (-0.70%) ⬇️

... and 52 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 26, 2026
@buqian-zilliz buqian-zilliz force-pushed the feat/optimize-in-vs-eqor-threshold branch from 2a361a6 to 9a7587d Compare March 27, 2026 16:40
@zhengbuqian
Copy link
Copy Markdown
Collaborator Author

/ci-rerun-ut-go

@zhengbuqian
Copy link
Copy Markdown
Collaborator Author

/ci-rerun-e2e-default

@buqian-zilliz buqian-zilliz force-pushed the feat/optimize-in-vs-eqor-threshold branch from 9a7587d to 04ba149 Compare March 28, 2026 06:50
…with PK optimization

Introduce smart expression rewriting that chooses between TermExpr (IN)
and UnaryRangeExpr (== or) based on data type characteristics:
- Integer types: use IN when count >= 10
- Floating types: use IN when count >= 15
- String/other types: use IN when count >= 3
- Primary key fields: always use IN (PK is sorted, IN uses efficient
  two-pointer binary search, 60-117x faster than == or for 100 values)

The rewriting is bidirectional:
- combineOrEqualsToIn: merges == or chains into IN above threshold
- visitTermExpr: splits IN into == or below threshold
- Both directions respect the same type-aware thresholds and PK override

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@buqian-zilliz buqian-zilliz force-pushed the feat/optimize-in-vs-eqor-threshold branch from 04ba149 to 11c395c Compare March 28, 2026 07:25
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 28, 2026
@mergify mergify bot added the ci-passed label Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement size/L Denotes a PR that changes 100-499 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants