
General UDAF pushdown as scripts#5064

Open
songkant-aws wants to merge 16 commits intoopensearch-project:mainfrom
songkant-aws:udaf-script-pushdown

Conversation

@songkant-aws
Collaborator

@songkant-aws songkant-aws commented Jan 22, 2026

Description

Push down any UDAF as scripts so that partial aggregation results can be evaluated in parallel per shard and then reduced into a final aggregation result. We expect this to speed up complex commands such as patterns, as well as future UDAFs. Benchmark tests are pending.

Related Issues

Resolves #4354

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Contributor

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Optional pushdown of pattern aggregation to data nodes for distributed processing and improved performance
    • Option to show numbered tokens in pattern aggregation results
    • New cluster setting to enable/disable pattern aggregation pushdown
  • Documentation

    • Expanded docs explaining pushdown behavior, how to enable it, and memory/circuit-breaker considerations
  • Tests

    • Added integration and explain-plan tests covering pushdown and numbered-token scenarios


Walkthrough

Adds scripted-metric UDAF pushdown for pattern aggregation: introduces PatternAggregationHelpers, scripted-metric UDAF framework (UDAF interface, registry, per-phase factories, ScriptedMetricDataContext), Calcite/UDF integration, OpenSearch scripted-metric wiring, a runtime setting to toggle pushdown, tests, and docs/plan updates.

Changes

Cohort / File(s) Summary
Pattern Aggregation Helpers
common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java, core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java
New shared Map-based accumulator helpers for init/add/combine/result; LogPatternAggFunction refactored to delegate state updates and result production to helpers.
Settings & Runtime Flag
common/src/main/java/org/opensearch/sql/common/setting/Settings.java, opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java
Added CALCITE_UDAF_PUSHDOWN_ENABLED key and a dynamic node-scoped OpenSearch setting to enable/disable UDAF pushdown.
Calcite UDFs & Function Wiring
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java, core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java, core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java
New builtin function names for pattern phases and adapters wrapping PatternAggregationHelpers static methods into Calcite UDF operators; helper to adapt static methods as UDFs.
Pattern Parsing & Projection
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java
Added flattenParsedPattern and buildEvalAggSamplesCall to project parsed pattern/tokens/sample_logs; new evalAggSamples path for aggregation-mode parsing including optional numbered tokens.
Scripted-metric UDAF Framework
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricUDAF.java, .../ScriptedMetricUDAFRegistry.java, .../ScriptedMetricDataContext.java, .../udaf/PatternScriptedMetricUDAF.java
New public interface for scripted-metric UDAFs (lifecycle methods), registry for UDAFs, DataContext implementations for phases, and PatternScriptedMetricUDAF that builds RexNode-based init/map/combine/reduce scripts.
Script Factories (per-phase)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricInitScriptFactory.java, .../CalciteScriptedMetricMapScriptFactory.java, .../CalciteScriptedMetricCombineScriptFactory.java, .../CalciteScriptedMetricReduceScriptFactory.java
New factories that wrap compiled RexNode functions into ScriptedMetricAggContexts for init, map, combine, and reduce phases, binding DataContext/params/state/states.
Pushdown Integration & Request Analysis
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
AggregateAnalyzer gains udafPushdownEnabled guard and delegates INTERNAL_PATTERN pushdown to ScriptedMetricUDAFRegistry; CalciteLogicalIndexScan propagates the pushdown flag to the analyzer helper.
Script Engine & Parameter Handling
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java
Added scripted-metric script contexts to CalciteScriptEngine; ScriptParameterHelper gains addSpecialVariable to register special-variable params for scripted-metric bindings.
Scripted-metric Result Parsing
opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java
New MetricParser implementation to parse scripted-metric aggregation results into List/Map structure expected by the engine.
OpenSearch value parsing
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
Enhanced array parsing flow with unified parseArray pre-check and improved handling of empty/mixed arrays and single-element arrays.
Tests & Expectations
integ-test/src/test/java/.../CalcitePPLPatternsIT.java, integ-test/src/test/java/.../ExplainIT.java, ppl/src/test/java/.../CalcitePPLPatternsTest.java, test resources under integ-test/src/test/resources/expectedOutput/..., opensearch/src/test/java/.../AggregateAnalyzerTest.java
Added integration tests toggling pushdown, updated explain-plan expected outputs for pushdown/no-pushdown plans, and adjusted unit tests for constructor signature changes.
Documentation
docs/user/ppl/cmd/patterns.md
Documented UDAF pushdown option, how to enable it, and operational cautions (memory/circuit-breaker).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Coord as Coordinator
    participant DN1 as DataNode1
    participant DN2 as DataNode2
    participant DN3 as DataNode3
    participant Reducer as Reducer

    Client->>Coord: Submit pattern aggregation query (udaf pushdown enabled)
    Coord->>DN1: InitScript (state init)
    DN1->>DN1: MapScript (per-doc add -> buffer/partial merge)
    DN1->>DN1: CombineScript (emit shard state)
    Coord->>DN2: InitScript
    DN2->>DN2: MapScript
    DN2->>DN2: CombineScript
    Coord->>DN3: InitScript
    DN3->>DN3: MapScript
    DN3->>DN3: CombineScript
    DN1-->>Reducer: Shard state 1 (map)
    DN2-->>Reducer: Shard state 2
    DN3-->>Reducer: Shard state 3
    Reducer->>Reducer: ReduceScript (merge shard states -> PatternAggregationHelpers.produce)
    Reducer-->>Client: Final pattern results (List<Map<String,Object>>)
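The init/map/combine/reduce phases above correspond to OpenSearch's built-in scripted_metric aggregation. For readers unfamiliar with its shape, a minimal illustrative request follows (hand-written Painless, not the compiled RexNode scripts this PR generates; the field name message.keyword is an assumption):

```json
{
  "size": 0,
  "aggs": {
    "patterns": {
      "scripted_metric": {
        "init_script": "state.logs = []",
        "map_script": "state.logs.add(doc['message.keyword'].value)",
        "combine_script": "return state.logs",
        "reduce_script": "def all = []; for (s in states) { all.addAll(s) } return all"
      }
    }
  }
}
```

Each shard runs init/map/combine; the coordinating node runs reduce over the per-shard `states`, which is exactly the split the diagram shows.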

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • #4868: Fixes LogPatternAggFunction buffer and return behavior — closely related to the refactor delegating accumulation/result to PatternAggregationHelpers.
  • #4914: Modifies ScriptParameterHelper and script pushdown infra — overlaps with added special-variable handling and scripted-metric integration.

Suggested labels

PPL, calcite

Suggested reviewers

  • LantaoJin
  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • anirudha
  • GumpacG
  • Swiddis
  • mengweieric
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 37.30%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check ✅ Passed: The title 'General UDAF pushdown as scripts' clearly and concisely describes the main objective of this pull request: implementing pushdown of UDAFs as scripted metric aggregations.
  • Description check ✅ Passed: The description is directly related to the changeset, explaining the motivation for UDAF pushdown, linking to the corresponding issue, and documenting the testing and documentation status.
  • Linked Issues check ✅ Passed: The PR implementation meets the core requirements from issue #4354: enabling UDAF pushdown via scripted metric aggregations (init/map/combine/reduce pipeline), with pattern aggregation as a primary use case and supporting infrastructure.
  • Out of Scope Changes check ✅ Passed: All code changes are in scope with the linked issue's objectives: new UDAF scripting infrastructure, pattern aggregation helpers, registry/factory implementations, and integration tests directly support the scripted metric pushdown feature.



Signed-off-by: Songkan Tang <songkant@amazon.com>
yuancu previously approved these changes Jan 23, 2026
Collaborator

@yuancu yuancu left a comment


LGTM

Comment on lines +175 to +181
public static final Setting<?> CALCITE_UDAF_PUSHDOWN_ENABLED_SETTING =
Setting.boolSetting(
Key.CALCITE_UDAF_PUSHDOWN_ENABLED.getKeyValue(),
false,
Setting.Property.NodeScope,
Setting.Property.Dynamic);

Collaborator


Is this a feature flag (cluster setting) that gates UDAF pushdown on/off at runtime? Will this flag be deprecated without notice once pushdown is stable?

Collaborator Author


Yes, it's a feature flag that gates the pushdown feature at runtime.

But I'm still cautious about deprecation. There is a comment left in OpenSearch ScriptedMetricAggregator: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/metrics/ScriptedMetricAggregator.java#L65-L73. It bypasses the circuit breaker, so it would be prudent to keep a kill switch.
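For reference, since the setting is declared node-scoped and dynamic, the kill switch can be flipped at runtime without a restart. An illustrative request (assuming the setting key quoted below):

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.calcite.udaf_pushdown.enabled": true
  }
}
```

Setting it back to false (or null to reset to the default) immediately disables pushdown cluster-wide.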

"UDAF pushdown is disabled. Enable it via cluster setting"
+ " 'plugins.calcite.udaf_pushdown.enabled'");
}
yield ScriptedMetricUDAFRegistry.INSTANCE
Collaborator


Since this PR needs to be backported to 2.x, which uses JDK 11, can we avoid the yield keyword?

Collaborator Author


I heard we will no longer backport, so could we keep it here?

Comment on lines +630 to +634
.orElseThrow(
() ->
new AggregateAnalyzerException(
String.format(
"No scripted metric UDAF registered for %s", functionName)));
Collaborator


when will it happen? is it a bug in our system?

Collaborator Author


For now, it's not possible, because the registry is hardcoded with only one supported UDAF (patterns) in this PR. It's a defensive check that provides a more meaningful error message in case it ever happens.
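A simplified, hypothetical sketch of this defensive-lookup pattern (the class and registry contents are illustrative stand-ins, not the PR's actual ScriptedMetricUDAFRegistry):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the defensive registry lookup discussed above.
public class UdafRegistrySketch {
    // Illustrative registry: only "patterns" is registered, mirroring this PR.
    private static final Map<String, String> REGISTRY =
        Map.of("patterns", "PatternScriptedMetricUDAF");

    static String resolve(String functionName) {
        // orElseThrow turns a silent null into a descriptive planning-time failure
        return Optional.ofNullable(REGISTRY.get(functionName))
            .orElseThrow(
                () ->
                    new IllegalStateException(
                        String.format(
                            "No scripted metric UDAF registered for %s", functionName)));
    }

    public static void main(String[] args) {
        System.out.println(resolve("patterns"));
    }
}
```

The point of the check is that a future registry entry mismatch fails loudly with the function name, rather than surfacing as a NullPointerException deep in the analyzer.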

Comment on lines +170 to +177
ScriptParameterHelper initParamHelper =
new ScriptParameterHelper(fieldList, fieldTypes, rexBuilder);
ScriptParameterHelper mapParamHelper =
new ScriptParameterHelper(fieldList, fieldTypes, rexBuilder);
ScriptParameterHelper combineParamHelper =
new ScriptParameterHelper(fieldList, fieldTypes, rexBuilder);
ScriptParameterHelper reduceParamHelper =
new ScriptParameterHelper(fieldList, fieldTypes, rexBuilder);
Collaborator


These 4 parameters seem to be exactly the same. Do we really need all of them?

Collaborator


Oh, I get it. There is actually state in each param helper. We would have to clear that state before reusing a single helper, otherwise it would get corrupted.

It looks fine then.

Collaborator Author


Actually, the different phases of the UDAF script use different RexNodes when serializing the script. So yes, they will hold different state (sources, digests) in ScriptParameterHelper.

new ScriptContext(rexBuilder, combineParamHelper, cluster, rowType, fieldTypes);
ScriptContext reduceContext =
new ScriptContext(rexBuilder, reduceParamHelper, cluster, rowType, fieldTypes);

Collaborator


Ditto. Since the param helpers are the same, the ScriptContexts should be the same as well.

Comment on lines +77 to +83
mapArgs.add(getArgOrDefault(args, 1, makeIntLiteral(rexBuilder, DEFAULT_MAX_SAMPLE_COUNT)));
mapArgs.add(getArgOrDefault(args, 2, makeIntLiteral(rexBuilder, DEFAULT_BUFFER_LIMIT)));
mapArgs.add(
getArgOrDefault(args, 5, makeIntLiteral(rexBuilder, DEFAULT_VARIABLE_COUNT_THRESHOLD)));
mapArgs.add(
getArgOrDefault(args, 4, makeDoubleLiteral(rexBuilder, DEFAULT_THRESHOLD_PERCENTAGE)));

Collaborator


Shall these change to dynamic params? Otherwise, the script won't be reusable if these values differ.

Collaborator Author


In this part, we only define the RexNode. The assignment of dynamic params for literals is performed at serialization time by RexStandardizer.standardizeRexNodeExpression.

Comment on lines +102 to +106
reduceArgs.add(getArgOrDefault(args, 1, makeIntLiteral(rexBuilder, DEFAULT_MAX_SAMPLE_COUNT)));

// Determine variableCountThreshold and thresholdPercentage
RexNode variableCountThreshold = makeIntLiteral(rexBuilder, DEFAULT_VARIABLE_COUNT_THRESHOLD);
RexNode thresholdPercentage = makeDoubleLiteral(rexBuilder, DEFAULT_THRESHOLD_PERCENTAGE);
Collaborator


Ditto

Collaborator Author


Same as above response.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@songkant-aws
Collaborator Author

songkant-aws commented Mar 3, 2026

Updated the PR with a simple performance comparison of queries with pushdown disabled versus pushdown enabled.

The test query was executed on a three-shard index containing approximately 2.7 million rows of log messages, with profiling enabled:

curl -s -X POST http://localhost:9200/_plugins/_ppl -H 'Content-Type: application/json' -d '{"query":"source=logs-181998 | patterns request method=brain mode=aggregation","profile":true}'

Baseline (pushdown disabled)

{
  "profile": {
    "summary": {
      "total_time_ms": 43977.34
    },
    "phases": {
      "analyze": {
        "time_ms": 4.17
      },
      "optimize": {
        "time_ms": 25.94
      },
      "execute": {
        "time_ms": 43947.1
      },
      "format": {
        "time_ms": 0.09
      }
    },
    "plan": {
      "node": "EnumerableLimit",
      "time_ms": 2.08,
      "rows": 29,
      "children": [
        {
          "node": "EnumerableCalc",
          "time_ms": 1.11,
          "rows": 29,
          "children": [
            {
              "node": "EnumerableCorrelate",
              "time_ms": 0.85,
              "rows": 29,
              "children": [
                {
                  "node": "EnumerableAggregate",
                  "time_ms": 0.0,
                  "rows": 1,
                  "children": [
                    {
                      "node": "EnumerableCalc",
                      "time_ms": 4304.11,
                      "rows": 2708746,
                      "children": [
                        {
                          "node": "CalciteEnumerableIndexScan",
                          "time_ms": 4150.36,
                          "rows": 2708746
                        }
                      ]
                    }
                  ]
                },
                {
                  "node": "EnumerableUncollect",
                  "time_ms": 0.22,
                  "rows": 29,
                  "children": [
                    {
                      "node": "EnumerableCalc",
                      "time_ms": 0.21,
                      "rows": 1,
                      "children": [
                        {
                          "node": "EnumerableValues",
                          "time_ms": 0.01,
                          "rows": 1
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  },
  "schema": [
    {
      "name": "patterns_field",
      "type": "string"
    },
    {
      "name": "pattern_count",
      "type": "bigint"
    },
    {
      "name": "sample_logs",
      "type": "array"
    }
  ],
  "datarows": [
    [
      "GET <*> HTTP/<*>",
      2699196,
      [
        "GET /english/images/nav_tickets_off.gif HTTP/1.1",
        "GET /english/images/comp_bg2_hm.jpg HTTP/1.0",
        "GET /english/nav_inet.html HTTP/1.1",
        "GET /robots.txt HTTP/1.0",
        "GET /french/individuals/player2913.htm HTTP/1.0",
        "GET /english/splash_inet.html HTTP/1.1",
        "GET /english/images/comp_bu_stage1n.gif HTTP/1.0",
        "GET /images/arw_lk.gif HTTP/1.0",
        "GET /english/images/space.gif HTTP/1.0",
        "GET /images/s102381.gif HTTP/1.0"
      ]
    ],
    [
      "POST <*> HTTP/<*>",
      7764,
      [
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0"
      ]
    ],
    [
      "HEAD <*> HTTP/<*>",
      1537,
      [
        "HEAD /french/tickets/ang.zip HTTP/1.0",
        "HEAD /english/venues/cities/images/paris/paris_i.gif HTTP/1.0",
        "HEAD /images/home_bg_stars.gif HTTP/1.0",
        "HEAD /images/cal_mont.gif HTTP/1.0",
        "HEAD /images/s102336.gif HTTP/1.0",
        "HEAD /english/index.html HTTP/1.0",
        "HEAD /images/s140875.gif HTTP/1.0",
        "HEAD /english/news/3004bres.htm HTTP/1.0",
        "HEAD /english/history/history_of/images/cup/515txt.gif HTTP/1.0",
        "HEAD /english/venues/images/Venue_map_top_off.gif HTTP/1.0"
      ]
    ],
    [
      "GET <*> HTTP/X.X",
      182,
      [
        "GET /english/playing/download/media/screen/win_95/dance.zip HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/intro.zip HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/dance.zip HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/intro.zip HTTP/X.X",
        "GET /cnfl/ HTTP/X.X",
        "GET /english/news/><HR><H3>Transfer/ HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/dance.zip HTTP/X.X",
        "GET /english/news/><HR><H3>Transfer/ HTTP/X.X",
        "GET /english/member/body.html HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/intro.zip HTTP/X.X"
      ]
    ]
   ...

UDAF Pushdown enabled

{
  "profile": {
    "summary": {
      "total_time_ms": 16231.11
    },
    "phases": {
      "analyze": {
        "time_ms": 3.29
      },
      "optimize": {
        "time_ms": 18.57
      },
      "execute": {
        "time_ms": 16209.09
      },
      "format": {
        "time_ms": 0.1
      }
    },
    "plan": {
      "node": "EnumerableLimit",
      "time_ms": 16207.37,
      "rows": 34,
      "children": [
        {
          "node": "EnumerableCalc",
          "time_ms": 16206.31,
          "rows": 34,
          "children": [
            {
              "node": "EnumerableCorrelate",
              "time_ms": 16206.04,
              "rows": 34,
              "children": [
                {
                  "node": "CalciteEnumerableIndexScan",
                  "time_ms": 16204.15,
                  "rows": 1
                },
                {
                  "node": "EnumerableUncollect",
                  "time_ms": 0.29,
                  "rows": 34,
                  "children": [
                    {
                      "node": "EnumerableCalc",
                      "time_ms": 0.27,
                      "rows": 1,
                      "children": [
                        {
                          "node": "EnumerableValues",
                          "time_ms": 0.01,
                          "rows": 1
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  },
  "schema": [
    {
      "name": "patterns_field",
      "type": "string"
    },
    {
      "name": "pattern_count",
      "type": "bigint"
    },
    {
      "name": "sample_logs",
      "type": "array"
    }
  ],
  "datarows": [
    [
      "GET <*> HTTP/<*>",
      2699196,
      [
        "GET /english/images/nav_tickets_off.gif HTTP/1.1",
        "GET /robots.txt HTTP/1.0",
        "GET /english/images/comp_bu_stage1n.gif HTTP/1.0",
        "GET /images/s102381.gif HTTP/1.0",
        "GET /images/s102381.gif HTTP/1.0",
        "GET /images/s102324.gif HTTP/1.1",
        "GET /images/hm_brdl.gif HTTP/1.1",
        "GET /english/tickets/images/ticket_hm_header.gif HTTP/1.0",
        "GET /images/space.gif HTTP/1.0",
        "GET /french/news/l3304.htm HTTP/1.0"
      ]
    ],
    [
      "POST <*> HTTP/<*>",
      7764,
      [
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0",
        "POST /cgi-bin/trivia/Trivia.pl HTTP/1.0"
      ]
    ],
    [
      "HEAD <*> HTTP/<*>",
      1537,
      [
        "HEAD /images/s140875.gif HTTP/1.0",
        "HEAD /english/venues/images/Venue_map_top_off.gif HTTP/1.0",
        "HEAD /english/venues/cities/images/lens/21discov.gif HTTP/1.0",
        "HEAD /english/history/images/history_hm_nav.gif HTTP/1.0",
        "HEAD /french/index.html HTTP/1.0",
        "HEAD /english/playing/images/france98b.GIF HTTP/1.0",
        "HEAD /french/images/fpnewstop.gif HTTP/1.0",
        "HEAD / HTTP/1.0",
        "HEAD /english/images/nav_sitemap_off.gif HTTP/1.0",
        "HEAD /french/news/2304tix.htm HTTP/1.0"
      ]
    ],
    [
      "GET <*> HTTP/X.X",
      176,
      [
        "GET /english/playing/download/media/screen/win_95/dance.zip HTTP/X.X",
        "GET /cnfl/ HTTP/X.X",
        "GET /english/playing/download/media/screen/win_95/dance.zip HTTP/X.X",
        "GET /english/news/><HR><H3>Transfer/ HTTP/X.X",
        "GET / HTTP/X.X",
        "GET /images/calendarvf.swf HTTP/X.X",
        "GET /english/index.html/screen/ HTTP/X.X",
        "GET /images/hobutt.gif HTTP/X.X",
        "GET /images/hobutt.gif HTTP/X.X",
        "GET /images/hobutt.gif HTTP/X.X"
      ]
    ],
    ...

Results show that the query with pushdown enabled is 2.7 times faster than without pushdown when running across three shards. The performance improvement scales with the number of shards available for parallel processing.

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

PR Reviewer Guide 🔍

(Review updated until commit 16da4de)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Refactor pattern aggregation logic into shared helpers and update Calcite plan

Relevant files:

  • common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java
  • core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLPatternsTest.java

Sub-PR theme: Implement scripted metric UDAF pushdown infrastructure and pattern UDAF

Relevant files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricUDAF.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricDataContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java

Sub-PR theme: Fix array parsing in ExprValueFactory and add UDAF pushdown integration tests

Relevant files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java

⚡ Recommended focus areas for review

Thread Safety

The addLogToPattern method modifies the accumulator map in-place (e.g., logMessages.add(logMessage), logMessages.clear(), acc.put(...)) without any synchronization. If multiple threads share the same accumulator (possible in certain Calcite execution models), this could lead to race conditions and data corruption.

public static Map<String, Object> addLogToPattern(
    Map<String, Object> acc,
    String logMessage,
    int maxSampleCount,
    int bufferLimit,
    int variableCountThreshold,
    double thresholdPercentage) {

  if (logMessage == null) {
    return acc;
  }

  List<String> logMessages = (List<String>) acc.get("logMessages");
  logMessages.add(logMessage);

  // Trigger partial merge when buffer reaches limit
  if (bufferLimit > 0 && logMessages.size() >= bufferLimit) {
    Map<String, Map<String, Object>> patternGroupMap =
        (Map<String, Map<String, Object>>) acc.get("patternGroupMap");

    BrainLogParser parser =
        new BrainLogParser(variableCountThreshold, (float) thresholdPercentage);
    Map<String, Map<String, Object>> partialPatterns =
        parser.parseAllLogPatterns(logMessages, maxSampleCount);

    patternGroupMap =
        PatternUtils.mergePatternGroups(patternGroupMap, partialPatterns, maxSampleCount);

    acc.put("patternGroupMap", patternGroupMap);
    logMessages.clear();
  }

  return acc;
}
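If shared-accumulator access ever becomes possible in practice, one straightforward mitigation is to serialize mutation on the accumulator itself. A hypothetical sketch, not the PR's code (single-threaded callers would pay only an uncontended lock):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: guarding in-place accumulator mutation with a lock
// on the accumulator map, in case it were shared across threads.
public class SafeAccumulatorSketch {
    @SuppressWarnings("unchecked")
    public static Map<String, Object> addLog(Map<String, Object> acc, String logMessage) {
        if (logMessage == null) {
            return acc;
        }
        // All reads and writes of the accumulator happen under the same monitor,
        // so concurrent callers cannot interleave add/clear operations.
        synchronized (acc) {
            List<String> logMessages = (List<String>) acc.get("logMessages");
            logMessages.add(logMessage);
        }
        return acc;
    }

    public static void main(String[] args) {
        Map<String, Object> acc = new HashMap<>();
        acc.put("logMessages", new ArrayList<String>());
        addLog(acc, "GET / HTTP/1.0");
        System.out.println(((List<?>) acc.get("logMessages")).size());
    }
}
```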
Stale Doc Values

The dataContext is created once in the constructor using getDoc() and lookup.getLeafSearchLookup(leafContext).source(). The comment claims OpenSearch updates doc values internally before each execute() call, but the MapContext stores a reference to the doc map at construction time. If OpenSearch replaces the doc map object (rather than updating it in-place) between documents, the cached reference would be stale. This should be verified against OpenSearch's scripted metric internals.

  public CalciteScriptedMetricMapScript(
      Function1<DataContext, Object[]> function,
      Map<String, Object> params,
      Map<String, Object> state,
      SearchLookup lookup,
      LeafReaderContext leafContext) {
    super(params, state, lookup, leafContext);
    this.function = function;
    // Create DataContext once and reuse for all documents in this segment.
    // OpenSearch updates doc values and source lookup internally before each execute().
    this.dataContext =
        new ScriptedMetricDataContext.MapContext(
            params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
  }

  @Override
  @SuppressWarnings("unchecked")
  public void execute() {
    // Execute the compiled RexNode expression (reusing the same DataContext)
    Object[] result = function.apply(dataContext);
    ScriptedMetricDataContext.mergeResultIntoState(result, (Map<String, Object>) getState());
  }
}
Behavior Change

The new early array-check block (lines 202-209) runs before the existing field-type-undefined handling. This changes the parsing order for fields that are arrays but have no type mapping. Previously such fields would fall through to the "parse based on value" logic; now they are always parsed as arrays. This could break existing behavior for fields that happen to be arrays but were previously handled differently (e.g., GeoPoint exclusion may not cover all edge cases like nested objects stored as arrays).

if (content.isArray()
    && (fieldType.isEmpty() || supportArrays)
    && !fieldType
        .map(t -> t.equals(OpenSearchDataType.of(OpenSearchDataType.MappingType.GeoPoint)))
        .orElse(false)) {
  ExprType type = fieldType.orElse(ARRAY);
  return parseArray(content, field, type, supportArrays);
}
Fragile Arg Ordering

The resolveBrainParams method relies on alphabetical ordering of optional brain parameters (frequency_threshold_percentage before variable_count_threshold) to distinguish them when only one is present. This implicit contract is fragile — if the caller's ordering changes or a new parameter is added, the type-based disambiguation could silently produce wrong results. Consider using named parameters or a more explicit mapping.

private static RexNode[] resolveBrainParams(List<RexNode> args, RexBuilder rexBuilder) {
  RexNode variableCountThreshold = makeIntLiteral(rexBuilder, DEFAULT_VARIABLE_COUNT_THRESHOLD);
  RexNode thresholdPercentage = makeDoubleLiteral(rexBuilder, DEFAULT_THRESHOLD_PERCENTAGE);

  if (args.size() > 5) {
    // Both present: alphabetical order means args[4]=frequency_threshold_percentage,
    // args[5]=variable_count_threshold
    thresholdPercentage = args.get(4);
    variableCountThreshold = args.get(5);
  } else if (args.size() > 4) {
    // Only one present: determine by type
    RexNode arg4 = args.get(4);
    SqlTypeName arg4Type = arg4.getType().getSqlTypeName();
    if (arg4Type == SqlTypeName.DOUBLE
        || arg4Type == SqlTypeName.DECIMAL
        || arg4Type == SqlTypeName.FLOAT) {
      thresholdPercentage = arg4;
    } else {
      variableCountThreshold = arg4;
    }
  }

  return new RexNode[] {variableCountThreshold, thresholdPercentage};
}
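One way to remove the positional and type-based guessing is to carry the optional parameters by name rather than by slot, so adding a new parameter cannot silently shift meanings. A hypothetical refactor, not code from this PR (the default constants below are placeholders):

```java
import java.util.Map;

public class Main {
    static final int DEFAULT_VARIABLE_COUNT_THRESHOLD = 5;      // placeholder default
    static final double DEFAULT_THRESHOLD_PERCENTAGE = 0.3;     // placeholder default

    // Each optional argument arrives under an explicit key instead of a position
    static Object[] resolve(Map<String, Object> named) {
        int variableCountThreshold = (Integer) named.getOrDefault(
            "variable_count_threshold", DEFAULT_VARIABLE_COUNT_THRESHOLD);
        double thresholdPercentage = (Double) named.getOrDefault(
            "frequency_threshold_percentage", DEFAULT_THRESHOLD_PERCENTAGE);
        return new Object[] {variableCountThreshold, thresholdPercentage};
    }

    public static void main(String[] args) {
        // Only one optional present: no type-based disambiguation needed
        Object[] r = resolve(Map.of("variable_count_threshold", (Object) 10));
        System.out.println("variableCountThreshold=" + r[0]
            + ", thresholdPercentage=" + r[1]);
    }
}
```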
TODO Left In

Multiple `TODO: Add proper operand type checking` comments are left in production code for PATTERN_ADD_UDF, PATTERN_FLUSH_UDF, and PATTERN_RESULT_UDF. Without operand type checking, invalid arguments are not caught at planning time, leading to runtime errors that are harder to diagnose.

public static final SqlOperator PATTERN_ADD_UDF =
    UserDefinedFunctionUtils.adaptStaticMethodToUDF(
            PatternAggregationHelpers.class,
            "addLogToPattern",
            ReturnTypes.explicit(SqlTypeName.ANY), // Returns Map<String, Object>
            NullPolicy.ANY,
            null) // TODO: Add proper operand type checking
        .toUDF("PATTERN_ADD_UDF");

public static final SqlOperator PATTERN_FLUSH_UDF =
    UserDefinedFunctionUtils.adaptStaticMethodToUDF(
            PatternAggregationHelpers.class,
            "flushPatternAccumulator",
            ReturnTypes.explicit(SqlTypeName.ANY), // Returns Map<String, Object>
            NullPolicy.ANY,
            null) // TODO: Add proper operand type checking
        .toUDF("PATTERN_FLUSH_UDF");

public static final SqlOperator PATTERN_RESULT_UDF =
    UserDefinedFunctionUtils.adaptStaticMethodToUDF(
            PatternAggregationHelpers.class,
            "producePatternResultFromStates",
            opBinding -> {
              // Returns List<Map<String, Object>> - represented as ARRAY<ANY>
              RelDataTypeFactory typeFactory = opBinding.getTypeFactory();
              RelDataType anyType = typeFactory.createSqlType(SqlTypeName.ANY);
              return SqlTypeUtil.createArrayType(typeFactory, anyType, true);
            },
            NullPolicy.ANY,
            null) // TODO: Add proper operand type checking
        .toUDF("PATTERN_RESULT_UDF");
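The cost of the missing checks is the usual early-versus-late validation trade-off: with no operand type checker, a call with wrong argument types is only rejected deep inside the helper at runtime. A generic plain-Java sketch of planning-time rejection (a hypothetical checker for illustration, not Calcite's actual SqlOperandTypeChecker API):

```java
import java.util.List;

public class Main {
    // Planning-time check: validate argument classes before any execution happens
    static void checkOperands(List<Class<?>> expected, List<Object> args) {
        if (args.size() != expected.size()) {
            throw new IllegalArgumentException(
                "expected " + expected.size() + " operands, got " + args.size());
        }
        for (int i = 0; i < args.size(); i++) {
            if (!expected.get(i).isInstance(args.get(i))) {
                throw new IllegalArgumentException(
                    "operand " + i + " must be " + expected.get(i).getSimpleName());
            }
        }
    }

    public static void main(String[] args) {
        try {
            // Second operand should be an Integer; rejected before execution
            checkOperands(List.of(String.class, Integer.class),
                          List.of("log line", "oops"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at planning time: " + e.getMessage());
        }
    }
}
```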

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Persistent review updated to latest commit a4a6e4e

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

PR Code Suggestions ✨

Latest suggestions up to 16da4de

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Prevent unintended array parsing when arrays are unsupported

The new early array-detection block uses fieldType.isEmpty() || supportArrays as the
condition, but the original code only entered parseArray when supportArrays was true
or the type was ARRAY. This change may cause arrays to be parsed when supportArrays
is false and a field type is defined, potentially breaking existing behavior for
non-aggregation paths where arrays should be treated as single values.

opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java [202-209]

 if (content.isArray()
-    && (fieldType.isEmpty() || supportArrays)
+    && supportArrays
     && !fieldType
         .map(t -> t.equals(OpenSearchDataType.of(OpenSearchDataType.MappingType.GeoPoint)))
         .orElse(false)) {
   ExprType type = fieldType.orElse(ARRAY);
   return parseArray(content, field, type, supportArrays);
 }
Suggestion importance[1-10]: 7

__

Why: The new early array-detection block uses fieldType.isEmpty() || supportArrays which could cause arrays to be parsed when supportArrays is false and a field type is defined, potentially breaking existing behavior. The suggestion to use only supportArrays as the condition is more conservative and safer, though it may not handle the case of undefined field types with arrays in aggregation results that this PR is trying to fix.

Medium
Verify doc values reference is live for reuse

The dataContext is created once in the constructor with getDoc() at construction
time, but OpenSearch updates doc values internally before each execute() call. The
getDoc() reference may be a live map that OpenSearch updates in-place, but if it
returns a snapshot, the dataContext will use stale doc values for all documents
after the first. Verify that getDoc() returns a live reference; if not, the
DataContext must be recreated per document.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [75-78]

+// Note: getDoc() must return a live reference that OpenSearch updates in-place.
+// If getDoc() returns a snapshot, dataContext must be recreated per execute() call.
 this.dataContext =
     new ScriptedMetricDataContext.MapContext(
         params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
Suggestion importance[1-10]: 6

__

Why: This is a valid concern about whether getDoc() returns a live reference. The improved code only adds a comment without changing behavior, so it's more of a documentation/verification suggestion. The risk is real if getDoc() returns a snapshot, but the improved code doesn't actually fix the issue.

Low
Avoid mutating original state during reduction

The first state (states.get(0)) is used directly as the initial combined value and
then passed to combinePatternAccumulators, which may mutate it. This could corrupt
the original state object. Create a defensive copy or start combining from index 0
explicitly to avoid unintended side effects.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [433-437]

-Map<String, Object> combined = (Map<String, Object>) states.get(0);
+Map<String, Object> combined = new HashMap<>((Map<String, Object>) states.get(0));
 for (int i = 1; i < states.size(); i++) {
   Map<String, Object> state = (Map<String, Object>) states.get(i);
   combined = combinePatternAccumulators(combined, state, maxSampleCount);
 }
Suggestion importance[1-10]: 5

__

Why: The first state is used directly as combined and passed to combinePatternAccumulators, which creates a new HashMap result rather than mutating in-place. Looking at combinePatternAccumulators, it creates a new result map, so the original state is not mutated. The suggestion is valid as a defensive practice but the actual risk is low given the current implementation.

Low
General
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" without specifying which ones
are bypassed. This could leave users without enough information to make an informed
decision. Consider clarifying which specific circuit breakers are bypassed (e.g.,
the parent/field data circuit breakers) so users understand the exact risk.

docs/user/ppl/cmd/patterns.md [102]

-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. Scripted metric aggregations bypass the parent and field data memory circuit breakers, which may cause out-of-memory errors on data nodes when processing very large datasets. Use with caution and monitor cluster memory usage closely.
Suggestion importance[1-10]: 4

__

Why: The suggestion improves documentation clarity by specifying which circuit breakers are bypassed, but it's a minor documentation enhancement that doesn't affect functionality. The current text is still informative enough for users to understand the risk.

Low
Improve type disambiguation for optional parameters

The type-based disambiguation for a single optional brain parameter at args[4] is
fragile. If the user passes variable_count_threshold as a value that happens to be
typed as DECIMAL (e.g., due to literal coercion), it will be incorrectly assigned to
thresholdPercentage. Consider using a more robust mechanism such as named parameters
or positional convention documented clearly, and add a validation check.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java [145-156]

 } else if (args.size() > 4) {
   // Only one present: determine by type
   RexNode arg4 = args.get(4);
   SqlTypeName arg4Type = arg4.getType().getSqlTypeName();
   if (arg4Type == SqlTypeName.DOUBLE
-      || arg4Type == SqlTypeName.DECIMAL
       || arg4Type == SqlTypeName.FLOAT) {
+    // Floating-point types unambiguously indicate frequency_threshold_percentage
+    thresholdPercentage = arg4;
+  } else if (arg4Type == SqlTypeName.INTEGER || arg4Type == SqlTypeName.BIGINT) {
+    // Integer types unambiguously indicate variable_count_threshold
+    variableCountThreshold = arg4;
+  } else if (arg4Type == SqlTypeName.DECIMAL) {
+    // DECIMAL could be either; treat as thresholdPercentage (fractional values expected)
     thresholdPercentage = arg4;
   } else {
     variableCountThreshold = arg4;
   }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion adds explicit handling for INTEGER and BIGINT types but the behavior for DECIMAL remains the same as the original. The improvement is marginal since the original code already handles DOUBLE, DECIMAL, and FLOAT as thresholdPercentage, and the new code doesn't fundamentally change the disambiguation logic.

Low
Fix incorrect code block language hint

The code block uses bash ignore as the language hint, but this is a REST API call,
not a bash command. Other examples in the same file use json or similar. Using bash
ignore may prevent syntax highlighting and confuse readers about how to execute the
command.

docs/user/ppl/cmd/patterns.md [93-100]

-```bash ignore
+```json
 PUT _cluster/settings
 {
   "persistent": {
     "plugins.calcite.udaf_pushdown.enabled": true
   }
 }
Suggestion importance[1-10]: 3

__

Why: The `bash ignore` language hint is indeed inconsistent with the REST API call format, but looking at the existing file, the earlier example at line 83 also uses `bash ignore` for the same type of `PUT _cluster/settings` call, suggesting this is an intentional convention in this file. The improvement is minor and cosmetic.

Low

Previous suggestions

Suggestions up to commit bf583fa

Category | Suggestion | Impact
Possible issue
Avoid mutating original state during reduction

The first state (states.get(0)) is used directly as the initial combined value and then passed to combinePatternAccumulators, which may mutate it. This can corrupt the original state object. Create a defensive copy or start combining from index 0 explicitly to avoid unintended side effects.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [433-437]

-Map<String, Object> combined = (Map<String, Object>) states.get(0);
+Map<String, Object> combined = new HashMap<>((Map<String, Object>) states.get(0));
 for (int i = 1; i < states.size(); i++) {
   Map<String, Object> state = (Map<String, Object>) states.get(i);
   combined = combinePatternAccumulators(combined, state, maxSampleCount);
 }
Suggestion importance[1-10]: 5

__

Why: The first state is used directly as combined and then passed to combinePatternAccumulators, which creates a new HashMap result, so the original state is not mutated. The suggestion is valid as a defensive practice but the actual risk is low given the current implementation of combinePatternAccumulators.

Low
General
Ensure source lookup instance is correctly reused per document

The DataContext is created once in the constructor using getDoc(), which returns the
doc map reference at construction time. However, OpenSearch updates doc values per
document by updating the contents of the map object, so the reference should remain
valid. But lookup.getLeafSearchLookup(leafContext).source() creates a new
SourceLookup per call — verify that the source lookup returned here is the same
instance that gets updated per document, otherwise source-based field access will
always return the first document's data.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [75-78]

+var leafSearchLookup = lookup.getLeafSearchLookup(leafContext);
 this.dataContext =
     new ScriptedMetricDataContext.MapContext(
-        params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
+        params, state, getDoc(), leafSearchLookup.source());
Suggestion importance[1-10]: 4

__

Why: The improved_code is functionally equivalent to the existing code — it just stores the leafSearchLookup in a local variable before calling .source(). The actual concern about whether the SourceLookup instance is updated per document is valid but the suggested fix doesn't actually address it.

Low
Reuse DataContext instance to avoid redundant allocations

The combine script creates a new DataContext on every execute() call, which includes
re-parsing SOURCES and DIGESTS from params each time. Since params don't change
between calls, the DataContext should be created once (similar to MapScript) and
reused to avoid redundant object allocation and parsing overhead.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricCombineScriptFactory.java [45-56]

+private final DataContext dataContext;
+
+public CalciteScriptedMetricCombineScript(
+    Function1<DataContext, Object[]> function,
+    Map<String, Object> params,
+    Map<String, Object> state) {
+  super(params, state);
+  this.function = function;
+  this.dataContext = new ScriptedMetricDataContext.CombineContext(params, state);
+}
+
 @Override
 public Object execute() {
-  // Create data context for combine script
-  @SuppressWarnings("unchecked")
-  Map<String, Object> state = (Map<String, Object>) getState();
-  DataContext dataContext = new ScriptedMetricDataContext.CombineContext(getParams(), state);
-
-  // Execute the compiled RexNode expression
   Object[] result = function.apply(dataContext);
-
-  // Return the combined result
   return (result != null && result.length > 0) ? result[0] : getState();
 }
Suggestion importance[1-10]: 4

__

Why: The combine script's execute() is typically called once per shard, so the performance benefit of caching the DataContext is minimal. The suggestion is valid for consistency with MapScript but has low practical impact.

Low
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" without specifying which ones
are bypassed. This could be misleading to users who need to understand the exact
risk. Consider clarifying which specific circuit breakers are bypassed (e.g., the
request circuit breaker or the field data circuit breaker) to help users make an
informed decision.

docs/user/ppl/cmd/patterns.md [102]

-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. Scripted metric aggregations bypass the request and field data memory circuit breakers, which may cause out-of-memory errors on data nodes when processing very large datasets. Use with caution and monitor cluster memory usage closely.
Suggestion importance[1-10]: 4

__

Why: The suggestion improves documentation clarity by specifying which circuit breakers are bypassed, but it's a minor documentation enhancement. The current text is not incorrect, just less specific.

Low
Validate list element types to prevent downstream ClassCastException

When result is an empty List, the code correctly wraps it. However, if the reduce
script returns a non-List, non-null value (e.g., a Map or scalar), the exception
message says "Expected List<Map<String, Object>>" but the actual contract should
also handle the case where result is a Map (e.g., a single result). More
importantly, the unchecked cast to List<Map<String, Object>> is implicit and could
cause a ClassCastException downstream — consider validating the list element types.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java [43-49]

-if (result instanceof List) {
+if (result instanceof List<?> list) {
+  // Validate that list contains maps (if non-empty)
+  if (!list.isEmpty() && !(list.get(0) instanceof Map)) {
+    throw new IllegalArgumentException(
+        String.format(
+            "Expected List<Map<String, Object>> from scripted metric but list contains %s",
+            list.get(0).getClass().getSimpleName()));
+  }
   return List.of(Map.of(name, result));
 }
 throw new IllegalArgumentException(
     String.format(
         "Expected List<Map<String, Object>> from scripted metric but got %s",
         result.getClass().getSimpleName()));
Suggestion importance[1-10]: 3

__

Why: The validation adds a runtime check for list element types, which is a defensive measure. However, since the reduce script is controlled code that always returns List<Map<String, Object>>, the risk of a ClassCastException is low in practice.

Low
Fix incorrect code block language hint

The code block uses bash ignore as the language hint, but the content is a REST API
call (JSON body with HTTP method), not a bash script. This is inconsistent with the
earlier cluster settings example in the same file which uses json. Using json would
provide proper syntax highlighting for the JSON body.

docs/user/ppl/cmd/patterns.md [93-100]

-```bash ignore
+```json
 PUT _cluster/settings
 {
   "persistent": {
     "plugins.calcite.udaf_pushdown.enabled": true
   }
 }
Suggestion importance[1-10]: 3

__

Why: The `bash ignore` language hint is inconsistent with the REST API content, but this is a minor style issue. The earlier example in the same file uses `json`, so consistency would be improved by changing to `json`.

Low
Suggestions up to commit d815a87

Category | Suggestion | Impact
Possible issue
Restrict early array detection to supportArrays context

The new early array-detection block uses (fieldType.isEmpty() || supportArrays) as its condition, but the original code path only entered parseArray when supportArrays was true or the type was an OpenSearch ARRAY type. This change may cause array parsing to be triggered for fields with a known non-array type (e.g., a keyword field that happens to contain a JSON array due to multi-value), potentially breaking existing behavior for those fields.

opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java [202-209]

 if (content.isArray()
-    && (fieldType.isEmpty() || supportArrays)
+    && supportArrays
     && !fieldType
         .map(t -> t.equals(OpenSearchDataType.of(OpenSearchDataType.MappingType.GeoPoint)))
         .orElse(false)) {
   ExprType type = fieldType.orElse(ARRAY);
   return parseArray(content, field, type, supportArrays);
 }
Suggestion importance[1-10]: 7

__

Why: The new early array-detection block with (fieldType.isEmpty() || supportArrays) could trigger array parsing for fields with known non-array types, potentially breaking existing behavior. The suggestion to restrict to supportArrays only is more conservative and safer, though it may not handle the fieldType.isEmpty() case that was intentionally added for unmapped fields in aggregation results.

Medium
Add null checks for accumulator arguments

If either acc1 or acc2 is null, the method will throw a NullPointerException. Add
null checks for both accumulators at the start of the method to handle cases where a
shard returns no data.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [258-264]

 public static Map<String, Object> combinePatternAccumulators(
     Map<String, Object> acc1, Map<String, Object> acc2, int maxSampleCount) {
+  if (acc1 == null) return acc2 != null ? acc2 : new HashMap<>();
+  if (acc2 == null) return acc1;
 
   Map<String, Map<String, Object>> patterns1 =
       (Map<String, Map<String, Object>>) acc1.get("patternGroupMap");
   Map<String, Map<String, Object>> patterns2 =
       (Map<String, Map<String, Object>>) acc2.get("patternGroupMap");
Suggestion importance[1-10]: 6

__

Why: The combinePatternAccumulators method could throw a NullPointerException if either acc1 or acc2 is null, which is a valid scenario when a shard returns no data. Adding null guards is a reasonable defensive improvement.

Low
Verify cached DataContext doc reference safety

The dataContext is created once in the constructor with getDoc(), but getDoc()
returns the doc map reference at construction time. If OpenSearch replaces the doc
map reference (rather than updating it in-place) between documents, the cached
dataContext will hold a stale reference. Verify that OpenSearch always updates the
same map instance in-place; if not, the DataContext must be recreated per document.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [75-78]

+// If OpenSearch updates doc values in-place (same map reference), this is safe.
+// Otherwise, override execute() to create a new MapContext per document.
 this.dataContext =
     new ScriptedMetricDataContext.MapContext(
         params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about whether getDoc() returns the same map instance updated in-place between documents. However, the improved code only adds a comment without actually fixing the potential issue, making it a verification suggestion rather than a concrete fix.

Low
Fix fragile type-based argument disambiguation

The type-based heuristic to distinguish thresholdPercentage from
variableCountThreshold when args.size() == 5 is fragile. If a user passes an
integer-typed threshold percentage or a decimal-typed variable count threshold, the
arguments will be silently swapped, producing incorrect results. The argument
positions should be fixed and documented rather than inferred from types.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java [126-139]

+// args[4] = thresholdPercentage, args[5] = variableCountThreshold (fixed positions)
 if (args.size() > 5) {
   thresholdPercentage = args.get(4);
   variableCountThreshold = args.get(5);
 } else if (args.size() > 4) {
-  RexNode arg4 = args.get(4);
-  SqlTypeName arg4Type = arg4.getType().getSqlTypeName();
-  if (arg4Type == SqlTypeName.DOUBLE
-      || arg4Type == SqlTypeName.DECIMAL
-      || arg4Type == SqlTypeName.FLOAT) {
-    thresholdPercentage = arg4;
-  } else {
-    variableCountThreshold = arg4;
-  }
+  thresholdPercentage = args.get(4);
 }
Suggestion importance[1-10]: 5

__

Why: The type-based heuristic to distinguish thresholdPercentage from variableCountThreshold is indeed fragile and could silently produce incorrect results. However, the improved code changes the semantics when args.size() == 5 (only sets thresholdPercentage, not variableCountThreshold), which may not be the intended fix and could break the existing logic.

Low
General
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" but doesn't specify which
ones are bypassed. Users need to understand the specific risks to make an informed
decision. Consider clarifying which circuit breakers are bypassed (e.g., the
parent/request circuit breakers) and whether there are any recommended safeguards or
limits to set before enabling this feature.

docs/user/ppl/cmd/patterns.md [102]

-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. Scripted metric aggregations bypass the request and parent memory circuit breakers, which may cause out-of-memory errors on data nodes when processing very large datasets. Consider setting appropriate JVM heap sizes and monitoring node memory usage before enabling this feature in production. Use with caution.
Suggestion importance[1-10]: 4

__

Why: The suggestion improves documentation clarity by specifying which circuit breakers are bypassed, but this is a documentation enhancement rather than a critical fix. The existing warning is already informative enough for most users.

Low
Fix inconsistent code block language tag

The code block uses bash ignore as the language hint, which is inconsistent with the
existing cluster settings example earlier in the document that uses just bash or
json. Using ignore may prevent syntax highlighting in some renderers. Consider using
a consistent language tag like json since the content is a JSON body for a REST API
call.

docs/user/ppl/cmd/patterns.md [93-100]

-```bash ignore
+```json
 PUT _cluster/settings
 {
   "persistent": {
     "plugins.calcite.udaf_pushdown.enabled": true
   }
 }
Suggestion importance[1-10]: 3

__

Why: The `bash ignore` language tag is inconsistent with other code blocks in the document and may affect syntax highlighting. However, looking at the existing code block at line 83 which uses `bash ignore` as well, this appears to be an intentional pattern in this document, reducing the importance of this suggestion.

Low
Suggestions up to commit 0dece50

Category | Suggestion | Impact
Possible issue
Fix overly broad early array-handling condition

The new early array-check condition uses fieldType.isEmpty() || supportArrays, which means it will intercept arrays even when supportArrays is false and fieldType is present (non-empty). This changes existing behavior for known-typed fields when supportArrays is false, potentially bypassing the original single-value extraction path. The condition should be fieldType.isEmpty() && !supportArrays or restructured to only apply when fieldType is absent.

opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java [202-209]

+// Only apply early array handling when field type is unknown (not in mapping)
 if (content.isArray()
-    && (fieldType.isEmpty() || supportArrays)
+    && fieldType.isEmpty()
     && !fieldType
         .map(t -> t.equals(OpenSearchDataType.of(OpenSearchDataType.MappingType.GeoPoint)))
         .orElse(false)) {
-  ExprType type = fieldType.orElse(ARRAY);
+  ExprType type = ARRAY;
   return parseArray(content, field, type, supportArrays);
 }
Suggestion importance[1-10]: 7

__

Why: The condition fieldType.isEmpty() || supportArrays could intercept arrays for known-typed fields when supportArrays is false, potentially changing existing behavior. This is a legitimate concern about the correctness of the new early array-check path, though the suggested fix may be too restrictive by removing the supportArrays branch entirely.

Medium
Guard against null pattern maps before merging

If either patterns1 or patterns2 is null (e.g., when acc1 or acc2 was never
populated with a patternGroupMap), passing null to PatternUtils.mergePatternGroups
may cause a NullPointerException. Add null-checks or default to empty maps before
merging.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [262-268]

```diff
 public static Map<String, Object> combinePatternAccumulators(
     Map<String, Object> acc1, Map<String, Object> acc2, int maxSampleCount) {

   Map<String, Map<String, Object>> patterns1 =
-      (Map<String, Map<String, Object>>) acc1.get("patternGroupMap");
+      acc1.get("patternGroupMap") != null
+          ? (Map<String, Map<String, Object>>) acc1.get("patternGroupMap")
+          : new HashMap<>();
   Map<String, Map<String, Object>> patterns2 =
-      (Map<String, Map<String, Object>>) acc2.get("patternGroupMap");
+      acc2.get("patternGroupMap") != null
+          ? (Map<String, Map<String, Object>>) acc2.get("patternGroupMap")
+          : new HashMap<>();

   Map<String, Map<String, Object>> merged =
       PatternUtils.mergePatternGroups(patterns1, patterns2, maxSampleCount);
```
Suggestion importance[1-10]: 6

__

Why: If acc1 or acc2 was never populated with a patternGroupMap, passing null to PatternUtils.mergePatternGroups could cause a NullPointerException. The fix adds null-safety, though in practice the accumulators are always initialized via initPatternAccumulator() which sets an empty map.

Low
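A more compact way to express the null-defaulting recommended above is `Optional.ofNullable(...).orElseGet(...)`. This is a generic sketch, not the PR's actual helper; it matches the ternary's semantics even when the key is present but mapped to null:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NullSafeGet {
    @SuppressWarnings("unchecked")
    static Map<String, Map<String, Object>> patternsOrEmpty(Map<String, Object> acc) {
        // Equivalent to: acc.get(...) != null ? (cast) acc.get(...) : new HashMap<>()
        return Optional.ofNullable(
                        (Map<String, Map<String, Object>>) acc.get("patternGroupMap"))
                .orElseGet(HashMap::new);
    }

    public static void main(String[] args) {
        Map<String, Object> acc = new HashMap<>();
        acc.put("patternGroupMap", null); // key present but mapped to null
        System.out.println(patternsOrEmpty(acc).isEmpty());
        System.out.println(patternsOrEmpty(new HashMap<>()).isEmpty());
    }
}
```

Unlike `Map.getOrDefault`, this form also yields the empty map when the key exists but holds null, which is exactly what the suggested ternary guards against.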
Verify doc values reference is live across documents

The dataContext is created once in the constructor using getDoc(), but OpenSearch
updates the doc values reference per document. If getDoc() returns a snapshot rather
than a live reference, the dataContext will read stale doc values for subsequent
documents. Verify that getDoc() returns a live/mutable reference that OpenSearch
updates in-place; if not, the DataContext must be recreated per execute() call.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [75-78]

```diff
+// If getDoc() does not return a live reference updated by OpenSearch per document,
+// create the DataContext per execute() call instead:
 this.dataContext =
     new ScriptedMetricDataContext.MapContext(
         params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
+// NOTE: Confirm getDoc() is a live reference; otherwise move DataContext creation to execute()
```
Suggestion importance[1-10]: 5

__

Why: The concern about getDoc() returning a stale snapshot is valid, but the suggestion only asks to verify behavior rather than providing a concrete fix. The improved_code is essentially the same as existing_code with a comment added.

Low
General
Use fixed argument positions instead of type-based disambiguation

The type-based disambiguation for arg4 when args.size() == 5 is fragile: if a user
passes an integer literal for thresholdPercentage or a decimal for
variableCountThreshold, the wrong parameter will be assigned. The argument positions
should be fixed and documented rather than inferred from runtime types, matching the
positional convention used in buildMapScript and buildCombineScript
(args[4]=thresholdPercentage, args[5]=variableCountThreshold).

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java [127-140]

```diff
+// Use fixed positional convention: args[4]=thresholdPercentage, args[5]=variableCountThreshold
 if (args.size() > 5) {
   thresholdPercentage = args.get(4);
   variableCountThreshold = args.get(5);
 } else if (args.size() > 4) {
-  RexNode arg4 = args.get(4);
-  SqlTypeName arg4Type = arg4.getType().getSqlTypeName();
-  if (arg4Type == SqlTypeName.DOUBLE
-      || arg4Type == SqlTypeName.DECIMAL
-      || arg4Type == SqlTypeName.FLOAT) {
-    thresholdPercentage = arg4;
-  } else {
-    variableCountThreshold = arg4;
-  }
+  thresholdPercentage = args.get(4);
 }
```
Suggestion importance[1-10]: 6

__

Why: The type-based disambiguation for arg4 is fragile and inconsistent with the positional convention used in buildMapScript and buildCombineScript. Using fixed positions is more robust and maintainable, though the improved code drops variableCountThreshold assignment when only 5 args are present.

Low
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" without specifying which ones
are bypassed. This could leave users without enough information to assess the risk.
Consider clarifying which specific circuit breakers are bypassed (e.g., the
parent/field data circuit breakers) so users can make an informed decision.

docs/user/ppl/cmd/patterns.md [102]

```diff
-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. Scripted metric aggregations bypass the parent and field data memory circuit breakers, which may cause out-of-memory errors on data nodes when processing very large datasets. Use with caution and monitor cluster memory usage closely.
```
Suggestion importance[1-10]: 4

__

Why: The suggestion improves documentation clarity by specifying which circuit breakers are bypassed, helping users make more informed decisions. However, it's a documentation improvement with moderate impact, and the specific circuit breakers mentioned (parent and field data) may not be fully accurate without verification.

Low
Suggestions up to commit a4a6e4e
Category | Suggestion | Impact
General
Handle null aggregation result gracefully

When result is null, the instanceof List check returns false and the code falls
through to throw an exception with a null-safe message. However, an empty result
(null) from a scripted metric aggregation on an empty index is a valid scenario and
should return an empty list rather than throwing an exception. Handle the null case
explicitly to avoid unexpected failures on empty indices.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java [40-47]

```diff
+if (result == null) {
+  return List.of();
+}
 if (result instanceof List) {
   return List.of(Map.of(name, result));
 }
 throw new IllegalArgumentException(
     String.format(
         "Expected List<Map<String, Object>> from scripted metric but got %s",
-        result == null ? "null" : result.getClass().getSimpleName()));
+        result.getClass().getSimpleName()));
```
Suggestion importance[1-10]: 6

__

Why: A null result from a scripted metric aggregation on an empty index is a realistic scenario, and throwing an exception in that case would cause unexpected failures. Returning an empty list for null results is a more robust approach and prevents runtime errors on empty indices.

Low
Replace magic number with enum constant reference

The magic number 3 is used directly instead of referencing the
Source.SPECIAL_VARIABLE.getValue() enum constant. This creates a maintenance risk —
if the enum values change, this code will silently break. Use the enum value
directly to keep the code consistent and safe.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java [108-113]

```diff
 public int addSpecialVariable(String variableName) {
   int index = sources.size();
-  sources.add(3); // SPECIAL_VARIABLE = 3
+  sources.add(CalciteScriptEngine.Source.SPECIAL_VARIABLE.getValue());
   digests.add(variableName);
   return index;
 }
```
Suggestion importance[1-10]: 5

__

Why: Using the magic number 3 instead of CalciteScriptEngine.Source.SPECIAL_VARIABLE.getValue() creates a maintenance risk if enum values change. This is a valid maintainability concern, though the risk is low since the enum is defined in the same codebase.

Low
Avoid mutating input state during reduce phase

The first state (states.get(0)) is directly cast and used as the initial combined
accumulator without copying. Since combinePatternAccumulators creates a new map for
subsequent merges, the first state's logMessages and patternGroupMap are passed into
the first merge but the original state object is not mutated. However, if
states.get(0) is the same object reference used elsewhere (e.g., by OpenSearch),
mutating it could cause issues. More critically, the buffered logMessages in
states.get(0) are included in the combine but never processed through the parser
before the final producePatternResult call, which handles remaining logs correctly —
this is fine. The logic is safe as written, but consider using
initPatternAccumulator() as the starting point and iterating all states for clarity
and safety.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [460-465]

```diff
-// Combine all states into a single accumulator
-Map<String, Object> combined = (Map<String, Object>) states.get(0);
-for (int i = 1; i < states.size(); i++) {
-  Map<String, Object> state = (Map<String, Object>) states.get(i);
+// Combine all states into a single accumulator starting from an empty one
+Map<String, Object> combined = initPatternAccumulator();
+for (Object stateObj : states) {
+  Map<String, Object> state = (Map<String, Object>) stateObj;
   combined = combinePatternAccumulators(combined, state, maxSampleCount);
 }
```
Suggestion importance[1-10]: 4

__

Why: Starting from an empty accumulator and iterating all states is cleaner and avoids any risk of aliasing the first state object. However, since combinePatternAccumulators creates a new map for the result, the original states.get(0) is not mutated, making this primarily a readability/safety improvement rather than a bug fix.

Low
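The fold-from-identity pattern argued for above can be illustrated with a generic sketch; the `combine` helper here is hypothetical, standing in for `combinePatternAccumulators`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FoldDemo {
    // Hypothetical combine: merges counts from two accumulators into a fresh map
    static Map<String, Integer> combine(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new HashMap<>(a);
        b.forEach((k, v) -> out.merge(k, v, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> states = List.of(
                new HashMap<>(Map.of("err", 2)),
                new HashMap<>(Map.of("err", 1, "warn", 3)));

        // Fold from an empty identity accumulator: no input state is ever aliased
        Map<String, Integer> combined = new HashMap<>();
        for (Map<String, Integer> s : states) {
            combined = combine(combined, s);
        }
        System.out.println(combined.get("err") + "," + combined.get("warn"));
        // The first input state is untouched (still holds only its original entry)
        System.out.println(states.get(0).size());
    }
}
```

Starting from an identity value means the loop treats every input uniformly and never holds a reference to a caller-owned object, which is the safety property the suggestion is after.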
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" without specifying which ones
are bypassed. This could leave users without enough information to make an informed
decision. Consider clarifying which specific circuit breakers are bypassed (e.g.,
the parent/request circuit breakers) so users understand the exact risk.

docs/user/ppl/cmd/patterns.md [102]

```diff
-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses the request and parent memory circuit breakers (which normally protect against excessive memory usage during query execution) and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
```
Suggestion importance[1-10]: 4

__

Why: The suggestion improves clarity by specifying which circuit breakers are bypassed, but this is a minor documentation enhancement. The exact circuit breakers bypassed may depend on implementation details that the PR author would need to verify.

Low
Fix inconsistent code block language hint

The code block uses bash ignore as the language hint, which is inconsistent with the
other cluster settings examples in the same file that use bash. Using ignore will
prevent syntax highlighting and may confuse readers or tooling. It should be changed
to match the existing style.

docs/user/ppl/cmd/patterns.md [93-100]

````diff
-```bash ignore
+```bash
 PUT _cluster/settings
 {
   "persistent": {
     "plugins.calcite.udaf_pushdown.enabled": true
   }
 }
````
Suggestion importance[1-10]: 3

__

Why: The `bash ignore` language hint is inconsistent with other code blocks in the file that use `bash`. This is a minor style/consistency issue that affects syntax highlighting but not functionality.

Low

Possible issue

Ensure doc values are fresh per document execution

___

The `MapContext` is created once in the constructor using `getDoc()`, which returns the doc map at construction time. However, OpenSearch updates the doc values internally before each `execute()` call by updating the same map reference. If `getDoc()` returns a snapshot rather than the live reference, doc values will be stale for all documents after the first. Verify that `getDoc()` returns the same mutable map reference that OpenSearch updates per document, otherwise the DataContext must be recreated per `execute()` call.

[opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [73-78]](https://github.com/opensearch-project/sql/pull/5064/files#diff-71e9a88d7cdaf170b68b083346df8ec904a1c21ea50643c66d8a03f405e7e0d1R73-R78)

```diff
-this.dataContext =
-    new ScriptedMetricDataContext.MapContext(
-        params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
+@Override
+@SuppressWarnings("unchecked")
+public void execute() {
+  // Recreate DataContext per document to ensure fresh doc values and source lookup
+  DataContext ctx = new ScriptedMetricDataContext.MapContext(
+      getParams(), (Map<String, Object>) getState(), getDoc(),
+      lookup.getLeafSearchLookup(leafContext).source());
+  Object[] result = function.apply(ctx);
+  ScriptedMetricDataContext.mergeResultIntoState(result, (Map<String, Object>) getState());
+}
```

Suggestion importance[1-10]: 5

__

Why: The concern about stale doc values is valid in principle, but the code comment explicitly explains that doc and sourceLookup are updated internally by OpenSearch before each execute() call, meaning the same map reference is mutated in-place. The suggestion to recreate the context per document would add overhead without necessarily fixing a real bug, and the existing approach is a deliberate optimization.

Low
Suggestions up to commit a4a6e4e
Category | Suggestion | Impact
Possible issue
Flush buffered messages before combining accumulators

The combinePatternAccumulators method does not process the buffered logMessages from
either accumulator before merging patternGroupMap. If either accumulator has
unprocessed messages in its buffer, those messages will be silently dropped during
the combine phase, leading to incorrect pattern counts and missing data in the final
result.
Before merging the pattern group maps, the buffered log messages from both
accumulators should be parsed and merged into their respective patternGroupMap
entries first.

common/src/main/java/org/opensearch/sql/common/patterns/PatternAggregationHelpers.java [260-269]

```diff
+@SuppressWarnings("unchecked")
 public static Map<String, Object> combinePatternAccumulators(
     Map<String, Object> acc1, Map<String, Object> acc2, int maxSampleCount) {

+  // Flush buffered messages from acc1 before combining
+  List<String> logMessages1 = (List<String>) acc1.get("logMessages");
   Map<String, Map<String, Object>> patterns1 =
       (Map<String, Map<String, Object>>) acc1.get("patternGroupMap");
+  if (logMessages1 != null && !logMessages1.isEmpty()) {
+    BrainLogParser parser = new BrainLogParser(
+        BrainLogParser.DEFAULT_VARIABLE_COUNT_THRESHOLD,
+        (float) BrainLogParser.DEFAULT_FREQUENCY_THRESHOLD_PERCENTAGE);
+    patterns1 = PatternUtils.mergePatternGroups(
+        patterns1, parser.parseAllLogPatterns(logMessages1, maxSampleCount), maxSampleCount);
+    logMessages1.clear();
+  }
+
+  // Flush buffered messages from acc2 before combining
+  List<String> logMessages2 = (List<String>) acc2.get("logMessages");
   Map<String, Map<String, Object>> patterns2 =
       (Map<String, Map<String, Object>>) acc2.get("patternGroupMap");
+  if (logMessages2 != null && !logMessages2.isEmpty()) {
+    BrainLogParser parser = new BrainLogParser(
+        BrainLogParser.DEFAULT_VARIABLE_COUNT_THRESHOLD,
+        (float) BrainLogParser.DEFAULT_FREQUENCY_THRESHOLD_PERCENTAGE);
+    patterns2 = PatternUtils.mergePatternGroups(
+        patterns2, parser.parseAllLogPatterns(logMessages2, maxSampleCount), maxSampleCount);
+    logMessages2.clear();
+  }

   Map<String, Map<String, Object>> merged =
       PatternUtils.mergePatternGroups(patterns1, patterns2, maxSampleCount);
```
Suggestion importance[1-10]: 7

__

Why: The combinePatternAccumulators method merges patternGroupMap entries but ignores buffered logMessages in both accumulators. However, looking at the code more carefully, the combine phase in producePatternResultFromStates calls combinePatternAccumulators which merges the logMessages lists (lines 272-280), and then producePatternResult flushes remaining logMessages. So the buffered messages are not dropped but carried forward. The suggestion has some merit but the current design intentionally defers flushing to the reduce phase.

Medium
Handle null scripted metric result gracefully

When result is null (e.g., no documents matched or all shards returned null), the
code throws an IllegalArgumentException instead of returning an empty result. A null
result from a scripted metric aggregation is a valid scenario (e.g., empty index or
no matching documents) and should be handled gracefully by returning an empty list.

opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/ScriptedMetricParser.java [40-46]

```diff
 if (result instanceof List) {
     return List.of(Map.of(name, result));
+  }
+  if (result == null) {
+    return List.of();
   }
   throw new IllegalArgumentException(
       String.format(
           "Expected List<Map<String, Object>> from scripted metric but got %s",
-          result == null ? "null" : result.getClass().getSimpleName()));
+          result.getClass().getSimpleName()));
```
Suggestion importance[1-10]: 6

__

Why: A null result from a scripted metric aggregation (e.g., empty index) would cause an IllegalArgumentException instead of returning an empty list. This is a valid edge case that should be handled gracefully to avoid unexpected errors in production.

Low
Ensure per-document source lookup is correctly advanced

The DataContext is created once in the constructor using getDoc() and
lookup.getLeafSearchLookup(leafContext).source(). However, getDoc() returns a
reference to the doc map that OpenSearch updates in-place before each execute()
call, but lookup.getLeafSearchLookup(leafContext).source() returns a SourceLookup
instance that may need to be advanced per document. If the SourceLookup is not the
same instance that OpenSearch advances, source field access will return stale data
for all documents after the first.
Verify that the SourceLookup obtained via
lookup.getLeafSearchLookup(leafContext).source() is the same instance that
OpenSearch advances per document, or obtain it lazily per execute() call to ensure
correctness.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/CalciteScriptedMetricMapScriptFactory.java [65-78]

```diff
 public CalciteScriptedMetricMapScript(
     Function1<DataContext, Object[]> function,
     Map<String, Object> params,
     Map<String, Object> state,
     SearchLookup lookup,
     LeafReaderContext leafContext) {
   super(params, state, lookup, leafContext);
   this.function = function;
-  // Create DataContext once and reuse for all documents in this segment.
-  // OpenSearch updates doc values and source lookup internally before each execute().
+  // Use the SourceLookup from the leaf search lookup - OpenSearch advances this per document.
+  // getDoc() returns the same map reference that OpenSearch updates in-place per document.
   this.dataContext =
       new ScriptedMetricDataContext.MapContext(
-          params, state, getDoc(), lookup.getLeafSearchLookup(leafContext).source());
+          params, state, getDoc(), getLeafLookup().source());
 }
```
Suggestion importance[1-10]: 5

__

Why: The concern about SourceLookup staleness is valid in principle, but the comment in the code already acknowledges that OpenSearch updates doc values and source lookup internally. The suggestion to use getLeafLookup().source() may not be a valid API call on ScriptedMetricAggContexts.MapScript. The issue needs verification but the improved code may not compile correctly.

Low
Fix inconsistent argument index mapping between script phases

The argument index mapping in buildReduceScript is inconsistent with buildMapScript.
In buildMapScript, args[4] is thresholdPercentage and args[5] is
variableCountThreshold, but in buildReduceScript when args.size() > 5, args.get(4)
is assigned to thresholdPercentage and args.get(5) to variableCountThreshold. The
type-based disambiguation for args.size() == 5 is fragile and error-prone. The
argument ordering should be consistent between map and reduce scripts.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java [108-121]

```diff
 if (args.size() > 5) {
   thresholdPercentage = args.get(4);
   variableCountThreshold = args.get(5);
 } else if (args.size() > 4) {
-  RexNode arg4 = args.get(4);
-  SqlTypeName arg4Type = arg4.getType().getSqlTypeName();
-  if (arg4Type == SqlTypeName.DOUBLE
-      || arg4Type == SqlTypeName.DECIMAL
-      || arg4Type == SqlTypeName.FLOAT) {
-    thresholdPercentage = arg4;
-  } else {
-    variableCountThreshold = arg4;
-  }
+  // args[4] = thresholdPercentage (consistent with buildMapScript ordering)
+  thresholdPercentage = args.get(4);
 }
```
Suggestion importance[1-10]: 4

__

Why: The type-based disambiguation for args.size() == 5 is fragile, but the improved code removes the variableCountThreshold assignment for the 5-argument case entirely, which changes behavior. The existing code at least attempts to handle both cases, while the suggestion silently drops variableCountThreshold when only 5 args are present.

Low
General
Clarify which circuit breakers are bypassed

The warning mentions "certain memory circuit breakers" without specifying which ones
are bypassed. This is important security/operational information that users need to
make an informed decision. Consider clarifying which specific circuit breakers are
bypassed (e.g., the parent/field data circuit breakers) so users understand the
exact risk.

docs/user/ppl/cmd/patterns.md [102]

```diff
-> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. This bypasses certain memory circuit breakers and may cause out-of-memory errors on nodes when processing very large datasets. Use with caution and monitor cluster resource usage.
+> **Warning**: Enabling UDAF pushdown executes user-defined aggregation functions as scripted metric aggregations on OpenSearch data nodes. Scripted metric aggregations bypass the parent and field data memory circuit breakers, which may cause out-of-memory errors on data nodes when processing very large datasets. Use with caution and monitor cluster heap usage closely.
```
Suggestion importance[1-10]: 4

__

Why: The suggestion improves documentation clarity by specifying which circuit breakers are bypassed, but this is a minor documentation enhancement rather than a critical fix. The existing warning is already informative enough for most users.

Low
Fix incorrect code block language tag…

…reduce

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Persistent review updated to latest commit 0dece50

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d815a87.

- Path: integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_brain_agg_push.yaml, line 22, severity: medium. Base64-encoded payloads with Java serialization magic bytes (rO0ABX prefix, indicating Java object serialization) are embedded in OpenSearch scripted metric scripts. These serialized Calcite RexNode expressions are sent to and deserialized on data nodes. Java deserialization of remotely-supplied payloads is a known RCE vector (CVE pattern). While the serialization appears to be from server-generated query plans, the lack of explicit deserialization safeguards in CalciteScriptEngine warrants investigation.
- Path: opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/ScriptedMetricDataContext.java, line 107, severity: medium. The mergeResultIntoState() method performs an unchecked cast of arbitrary execution results into the shared state map (state.putAll((Map) result[0])). Combined with the UDAF pushdown explicitly bypassing OpenSearch memory circuit breakers (documented in the PR), a malicious or malformed payload could overwrite state entries or trigger unbounded memory growth on data nodes without circuit breaker protection. The bypass of safety mechanisms with no alternative bounds checking is a concern.
- Path: opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/scriptedmetric/udaf/PatternScriptedMetricUDAF.java, line 93, severity: low. PATTERN_FLUSH_UDF is referenced in buildCombineScript() and defined in PPLBuiltinOperators.java, but has no corresponding entry in BuiltinFunctionName.java (only PATTERN_INIT_UDF, PATTERN_ADD_UDF, PATTERN_COMBINE_UDF, and PATTERN_RESULT_UDF are registered). This registry inconsistency could indicate an incomplete or hidden registration path for a script executed on remote data nodes.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 2 | Low: 1
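One standard mitigation for the deserialization finding above is a JDK `ObjectInputFilter` allowlist (JEP 290, available since Java 9). The sketch below is generic and not the PR's CalciteScriptEngine code; the filter pattern shown admits only `java.lang.String` and rejects every other class in the stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class DeserializationFilterDemo {
    public static void main(String[] args) throws Exception {
        // Serialize a benign String payload
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject("hello");
        }

        // Allowlist filter: accept java.lang.String, reject everything else ("!*")
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter("java.lang.String;!*");

        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            // Classes outside the allowlist cause the read to fail with InvalidClassException
            in.setObjectInputFilter(filter);
            System.out.println("deserialized: " + in.readObject());
        }
    }
}
```

For the PR's case, a production allowlist would have to enumerate the Calcite and plugin classes the serialized RexNode payloads legitimately contain; the point is that the filter bounds what a remotely supplied stream can instantiate.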


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Persistent review updated to latest commit d815a87

Signed-off-by: Songkan Tang <songkant@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Persistent review updated to latest commit bf583fa

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@ahkcs
Collaborator

ahkcs commented Mar 17, 2026

Hi @songkant-aws, any updates on this PR?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

@songkant-aws songkant-aws requested a review from ahkcs as a code owner April 2, 2026 03:24
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Persistent review updated to latest commit 16da4de


Labels

enhancement New feature or request pushdown pushdown related issues stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Pushdown any UDAF by scripted metric aggregations

5 participants