Add evalMv for MV predicate filtering (incl. REGEXP_LIKE) #17659

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:codex/evalmv-arraytest

xiangfu0 (Contributor) commented Feb 7, 2026

What does this PR change?

  • Adds evalMv as both:
      • a transform function (EvalMvTransformFunction) for block-based execution.
      • a scalar function (EvalMvScalarFunction) for scalar execution paths.
  • Registers evalMv in TransformFunctionType and TransformFunctionFactory.
  • Adds MV predicate parsing/evaluation in EvalMvPredicateEvaluator.
  • Supports predicate operators: =, !=, >, >=, <, <=, IN, NOT IN, BETWEEN, LIKE, REGEXP_LIKE, plus AND / OR / NOT.
  • Adds/extends tests:
      • EvalMvTransformFunctionTest unit coverage for REGEXP_LIKE.
      • ArrayTest integration coverage for REGEXP_LIKE with useBothQueryEngines.
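
To make the operator list above concrete, here is a minimal Python sketch of how per-element predicates might compose under AND / OR / NOT. It only models the semantics described in this PR; it is not the Java code in EvalMvPredicateEvaluator, and every helper name (eq, between, is_in, and_, or_, not_, eval_mv) is hypothetical.

```python
from typing import Callable, Iterable, List

# A predicate takes one MV element and says whether to keep it.
Predicate = Callable[[object], bool]


def eq(target) -> Predicate:
    """Model of `col = target`."""
    return lambda v: v == target


def between(low, high) -> Predicate:
    """Model of `col BETWEEN low AND high` (inclusive on both ends)."""
    return lambda v: low <= v <= high


def is_in(options: Iterable) -> Predicate:
    """Model of `col IN (...)`."""
    allowed = frozenset(options)
    return lambda v: v in allowed


def and_(*preds: Predicate) -> Predicate:
    return lambda v: all(p(v) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    return lambda v: any(p(v) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda v: not pred(v)


def eval_mv(values: Iterable, predicate: Predicate) -> List:
    """Keep only the elements of one multi-value cell that satisfy the predicate."""
    return [v for v in values if predicate(v)]


# Model of: col BETWEEN 5 AND 15 AND NOT col = 10
kept = eval_mv([1, 5, 10, 15, 20], and_(between(5, 15), not_(eq(10))))
print(kept)  # [5, 15]
```

NOT IN and != follow from not_(is_in(...)) and not_(eq(...)) in this model.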

evalMv(mv_col, 'predicate') evaluates each MV element against the predicate and returns a filtered MV array.

Why?

This closes a gap in MV filtering semantics, most visible when the multi-stage engine (MSE) queries MV columns: a row-level filter can keep every MV element of a row once any single element matches. evalMv provides explicit per-element filtering without requiring CROSS JOIN UNNEST.
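
The difference between the two behaviours can be sketched in Python as follows. This is an illustrative model only, with hypothetical function names and sample rows; it does not reproduce either query engine.

```python
from typing import Callable, List

Row = List[str]


def row_level_filter(rows: List[Row], predicate: Callable[[str], bool]) -> List[Row]:
    """Keep a whole row, with all of its elements, if any element matches."""
    return [row for row in rows if any(predicate(v) for v in row)]


def per_element_filter(rows: List[Row], predicate: Callable[[str], bool]) -> List[Row]:
    """Keep only the matching elements of each row."""
    return [[v for v in row if predicate(v)] for row in rows]


def contains_api(value: str) -> bool:
    return "api" in value


# Hypothetical sample data: one MV cell per row.
rows = [["/api/v1", "/home"], ["/metrics"]]

row_level = row_level_filter(rows, contains_api)
per_element = per_element_filter(rows, contains_api)

print(row_level)    # [['/api/v1', '/home']]  ('/home' survives with its row)
print(per_element)  # [['/api/v1'], []]       (only matching elements remain)
```

In this model a row with no matching element yields an empty list; how evalMv represents that case in Pinot is not stated in this PR.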

Query examples

1) MSE: filter MV values by LIKE

SELECT DISTINCT arrayToMv(evalMv(norm_url_list_combined, 'norm_url_list_combined LIKE ''%api%''')) AS sf_node_name
FROM rum_otel_data_p
WHERE product IN ('WEB')
  AND org_id = 'AAAAAAAAAAA'
  AND start_time_micros >= 1769715945540000
  AND norm_url_list_combined IS NOT NULL
ORDER BY sf_node_name ASC
LIMIT 50
OPTION(useMultistageEngine=true)
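
For readers unfamiliar with LIKE wildcards, the per-element effect of '%api%' in the query above can be approximated in Python by translating the pattern to a regular expression. This is a sketch under assumptions: matching is case-sensitive, no escape character is handled, and the sample URLs are hypothetical. The PR does not describe how Pinot evaluates LIKE internally.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape everything else so regex metacharacters are taken literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_like(values: List[str], pattern: str) -> List[str]:
    """Keep the elements that match the LIKE pattern as a whole string."""
    regex = like_to_regex(pattern)
    return [v for v in values if regex.fullmatch(v)]


# Hypothetical MV cell.
urls = ["/api/users", "/home", "/v2/api", "/apple"]
matched = filter_like(urls, "%api%")
print(matched)  # ['/api/users', '/v2/api']
```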

2) Both engines: filter MV values by REGEXP_LIKE

SELECT evalMv(stringArrayCol, 'REGEXP_LIKE(stringArrayCol, ''^/api/.*'')')
FROM ArrayTest
LIMIT 1

Expected first-row output: ['/api/v1', '/api/v2'] for source ['/api/v1', '/home', '/api/v2', '/metrics'].

3) Both engines: use filtered MV result in row predicate

SELECT COUNT(*)
FROM ArrayTest
WHERE arrayLength(evalMv(stringArrayCol, 'REGEXP_LIKE(stringArrayCol, ''^/api/.*'')')) = 2
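
Examples 2 and 3 can be checked by hand with a small Python model. The first row below is the source array quoted in example 2; the other two rows are hypothetical, as are the function names. The sketch uses search-style matching; because the pattern is anchored with ^ and ends in .*, full-string matching would give the same result here.

```python
import re
from typing import List

API_PATH = re.compile(r"^/api/.*")


def eval_mv_regexp(values: List[str]) -> List[str]:
    """Model of evalMv(col, 'REGEXP_LIKE(col, ''^/api/.*'')') for one MV cell."""
    return [v for v in values if API_PATH.search(v)]


def count_rows_with_n_matches(rows: List[List[str]], n: int) -> int:
    """Model of COUNT(*) ... WHERE arrayLength(evalMv(...)) = n."""
    return sum(1 for row in rows if len(eval_mv_regexp(row)) == n)


rows = [
    ["/api/v1", "/home", "/api/v2", "/metrics"],  # source array from example 2
    ["/home"],                                    # hypothetical row
    ["/api/v3"],                                  # hypothetical row
]

first_row = eval_mv_regexp(rows[0])
matching_rows = count_rows_with_n_matches(rows, 2)

print(first_row)      # ['/api/v1', '/api/v2']
print(matching_rows)  # 1
```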

Validation

  • ./mvnw -pl pinot-core -am -Dtest=EvalMvTransformFunctionTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-integration-tests -am -Dtest=ArrayTest -Dsurefire.failIfNoSpecifiedTests=false test

xiangfu0 changed the title from "Add evalMv MV filtering support" to "Add evalMv to filter MV elements by predicate" on Feb 7, 2026
xiangfu0 force-pushed the codex/evalmv-arraytest branch from 78d121f to 44db7aa on February 7, 2026 20:45
xiangfu0 changed the title from "Add evalMv to filter MV elements by predicate" to "Add evalMv for MV predicate filtering (incl. REGEXP_LIKE)" on Feb 7, 2026
codecov-commenter commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 22.02729% with 400 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.12%. Comparing base (7305eec) to head (44db7aa).
⚠️ Report is 21 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...r/transform/function/EvalMvPredicateEvaluator.java    22.76%    167 Missing and 6 partials ⚠️
...or/transform/function/EvalMvTransformFunction.java    25.43%    118 Missing and 11 partials ⚠️
...not/core/function/scalar/EvalMvScalarFunction.java    14.03%    98 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17659      +/-   ##
============================================
- Coverage     63.16%   63.12%   -0.05%     
- Complexity     1479     1500      +21     
============================================
  Files          3173     3177       +4     
  Lines        189917   190836     +919     
  Branches      29064    29216     +152     
============================================
+ Hits         119970   120464     +494     
- Misses        60621    61024     +403     
- Partials       9326     9348      +22     

Flag                   Coverage Δ
custom-integration1    100.00% <ø> (ø)
integration            100.00% <ø> (ø)
integration1           100.00% <ø> (ø)
integration2           0.00% <ø> (ø)
java-11                63.09% <22.02%> (-0.05%) ⬇️
java-21                63.08% <22.02%> (-0.04%) ⬇️
temurin                63.12% <22.02%> (-0.05%) ⬇️
unittests              63.12% <22.02%> (-0.05%) ⬇️
unittests1             55.48% <22.02%> (-0.05%) ⬇️
unittests2             33.98% <3.31%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.
