Skip to content

Conversation

@joegallo
Copy link
Contributor

@joegallo joegallo commented Nov 5, 2025

These work, and probably have worked forever, but I don't think we document it anywhere. I think it would be useful to have test coverage for these behaviors since they're somewhat obscure.


First, conditionals are commonly seen to have expressions. But they can also have statements. This can be useful as an optimization. For example, this expression:

"if": "ctx.foo?.bar?.baz != null && ctx.foo.bar.baz != 'reject' && ctx.foo.bar.baz != 'failure'"

Can be rewritten as these statements:

"if": "def baz = ctx.foo?.bar?.baz; return baz != null && baz != 'reject' && baz != 'failure';"

The latter approach might be faster than the former (note: you should profile your code, not trust random github PR descriptions) because it descends the ctx object tree via chained Map#get invocations just once, rather than up to three times.


Second, did you know that "if" doesn't have to be a String? It can be a map, too! And the map can have params. An especially useful application of this trick would be a conditional that's checking a candidate value against a large group of other values, like what one might choose to do here.

To my understanding, as written, that example currently compiles down to bytecode that creates an empty ArrayList, then adds twelve strings to it (note: the ArrayList is resized from the default capacity of 10 to a capacity of 15 in order to accommodate the 11th string, with the ensuing array copy), and then a linear scan of the ArrayList is performed in the .contains(...). And that's all performed for every document that goes through the thus-defined processor.

By way of contrast, this would work similarly in terms of effect, but be much faster (note: assuming flow-style inline maps are supported by whatever is handling the yaml there, otherwise the syntax would be slightly different):

- set:
    tag: set_event_kind_23ecd1e1
    field: event.kind
    value: alert
    if: 
     source: 'params.containsKey(ctx.event?.code)'
     params: { "13001":true, "13002":true, "13004":true, "13005":true, "13006":true, "13009":true, "13012":true, "13014":true, "14001":true, "14002":true, "15001":true, "15002":true }

This version is admittedly a bit uglier, but since it generates a single static params map of all the values just once at script definition time and then does a fast containsKey check on that map, it's quite a bit faster (and allocation-free).

The most ideal version of that might not rely on the set semantics of the map keys, but in the absence of support for constant set collections in painless, this is probably the optimal mix of readable and fast for the state of the current technology.

@joegallo joegallo requested a review from masseyke November 5, 2025 20:48
@joegallo joegallo added >test Issues or PRs that are addressing/adding tests :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v9.3.0 v8.19.8 auto-backport Automatically create backport pull requests when merged labels Nov 5, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Nov 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo joegallo merged commit 58a986c into elastic:main Nov 6, 2025
34 checks passed
@joegallo joegallo deleted the ingest-script-conditional-edge-cases branch November 6, 2025 02:41
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137645

afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 6, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 6, 2025
…-json

* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 6, 2025
…-json

* upstream/main:
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
  Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
  [LTR] Fix feature display order when using explain. (elastic#137671)
  Remove extra RemoteClusterService instances in unit test (elastic#137647)
  Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
  Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
  Update bundled JDK to 25.0.1 (elastic#137640)
  resolve indices for prefixed _all expressions (elastic#137330)
  ESQL: Add TopN support for exponential histograms (elastic#137313)
  allows field caps to be cross project (elastic#137530)
  ESQL: Add exponential histogram percentile function (elastic#137553)
  Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
  Tighten on when THROTTLE decision can be returned (elastic#136794)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
  Add a test for two little known conditional processor paths (elastic#137645)
  Extract a common ORIGIN constant (elastic#137612)
  Remove early phase failure in batched (elastic#136889)
  Returning correct index mode from get data streams api (elastic#137646)
  [ML] Manage AD results indices (elastic#136065)
@joegallo
Copy link
Contributor Author

joegallo commented Nov 6, 2025

#137692 is up for the 8.19 backport, so I'm removing the backport pending label.

Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team >test Issues or PRs that are addressing/adding tests v8.19.8 v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants