ESQL/Ingest pipeline: Multiple grok patterns can leak into each other #136750

@flash1293

Elasticsearch Version

9.3

Installed Plugins

No response

Java Version

bundled

OS Version

doesn't matter

Problem Description

As described in #136541 (comment), multiple grok patterns are combined into (<pattern 1>)|(<pattern 2>), and the result is compiled and validated as a single pattern. This is efficient, but because the pattern strings are not validated individually, syntax in one pattern can interfere with the parentheses and alternation used to combine them.

Steps to Reproduce

POST /_ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "Test)x"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "foo",
          "patterns": [
            "%{WORD:word}\\",
            "x)"
          ]
        }
      }
    ]
  }
}

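This should throw an error because neither pattern is valid on its own: the first ends in a dangling backslash and the second contains an unmatched closing parenthesis. However, they get turned into (%{WORD:word}\\)|(x)) (written here with the JSON escaping from the request), which is a valid grok expression. The trailing backslash escapes the parenthesis that was meant to close the first group, and the stray parenthesis from the second pattern closes that group instead.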
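The leak can be reproduced outside Elasticsearch with a rough sketch. It uses Python's `re` module as a stand-in for the regex engine grok actually uses, and `WORD` is a hypothetical hand-written expansion of `%{WORD:word}`, so treat it as an illustration of the mechanism rather than the real implementation.

```python
import re

# Hypothetical stand-in for the grok expansion of %{WORD:word}.
WORD = r"(?P<word>\b\w+\b)"

# The two patterns from the request after JSON unescaping:
# the first ends in a single backslash, the second has a stray ")".
patterns = [WORD + "\\", "x)"]


def compiles(regex: str) -> bool:
    """Return True if the regex compiles on its own."""
    try:
        re.compile(regex)
        return True
    except re.error:
        return False


def combine(parts: list[str]) -> str:
    """Wrap each part in parentheses and join with '|', as described above."""
    return "|".join("(" + p + ")" for p in parts)


individually_valid = [compiles(p) for p in patterns]  # both fail alone
combined = combine(patterns)                          # but this compiles
match = re.search(combined, "Test)x")
```

In this sketch neither fragment compiles alone, yet the combined string does, and it matches the `Test)` prefix of the sample document with `word` captured as `Test`.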

This is a low-priority issue that has existed for a long time, but for proper error messages and consistency it would be great to validate the patterns individually.
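One possible shape for individual validation is sketched below, again using Python's `re` as a stand-in engine. The function names `validate_individually` and `combine_checked` are hypothetical and do not correspond to anything in the Elasticsearch code base; the point is only that each pattern is compiled alone first, so the error can name the offending entry.

```python
import re


def validate_individually(patterns):
    """Compile each pattern alone; return (index, message) for every failure."""
    errors = []
    for i, pattern in enumerate(patterns):
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append((i, str(exc)))
    return errors


def combine_checked(patterns):
    """Combine patterns into one alternation only if each is valid alone."""
    errors = validate_individually(patterns)
    if errors:
        details = "; ".join(f"patterns[{i}]: {msg}" for i, msg in errors)
        raise ValueError("invalid grok pattern(s): " + details)
    return "|".join(f"({p})" for p in patterns)
```

This costs one extra compilation per pattern up front; the combined pattern can still be compiled once afterwards, so matching stays as efficient as before.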

Logs (if relevant)

No response
