@flash1293 (Contributor) commented Oct 14, 2025

Closes #132486

This PR adds the ability to specify multiple grok patterns in a single GROK command. Consistent with the grok processor in ingest pipelines, the patterns are tried in order, and the first one that matches is applied:

POST _query
{
  "query": """
    ROW col1="123 This is a test" | GROK col1 "%{UUID:def}", "%{WORD:xxx}"
  """
}

returns

       col1       |      def      |      xxx
------------------+---------------+---------------
123 This is a test|null           |123

Using the same semantic name with different types in different patterns is not allowed:

POST _query
{
  "query": """
    ROW col1="123 This is a test" | GROK col1 "%{UUID:def}", "%{INT:def}"
  """
}

returns

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "line 1:33: Invalid GROK pattern [(?:%{UUID:def})|(?:%{INT:def})]: the attribute [def] is defined multiple times with different types"
      }
    ],
    "type": "parsing_exception",
    "reason": "line 1:33: Invalid GROK pattern [(?:%{UUID:def})|(?:%{INT:def})]: the attribute [def] is defined multiple times with different types"
  },
  "status": 400
}
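The rule above can be illustrated with a small Java sketch. It assumes, hypothetically, that each pattern's captures have already been resolved into a name-to-type map (how GROK derives the types is not modeled here); `mergeCaptureTypes` is an illustrative helper, not the actual parser code. A name reused with the same type is accepted, and a name reused with a different type is rejected with the wording of the error message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GrokTypeConflictSketch {

    /**
     * Hypothetical helper: merges the per-pattern capture maps (name -> type)
     * and throws if the same name appears with two different types.
     */
    static Map<String, String> mergeCaptureTypes(List<Map<String, String>> perPattern) {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> captures : perPattern) {
            for (Map.Entry<String, String> e : captures.entrySet()) {
                // putIfAbsent returns the type already recorded for this name, if any.
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalArgumentException(
                        "the attribute [" + e.getKey() + "] is defined multiple times with different types");
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Same name, same type in both patterns: accepted.
        System.out.println(mergeCaptureTypes(List.of(Map.of("def", "keyword"), Map.of("def", "keyword"))));
        // Same name, different types: rejected.
        try {
            mergeCaptureTypes(List.of(Map.of("def", "keyword"), Map.of("def", "integer")));
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```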

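This can be considered syntactic sugar for a single, manually written alternation pattern; as the error message shows, the patterns are combined as (?:pattern1)|(?:pattern2).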
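A minimal sketch of that desugaring in Java. It wraps each pattern in a non-capturing group and joins the groups with `|`, matching the shape in the error message; `combine` and `firstMatch` are hypothetical helpers, and `java.util.regex` with hand-written named groups stands in for Grok's own `%{...}` expansion and regex engine. Because alternatives are tried left to right, the earlier pattern wins when both could match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedPatternSketch {

    /** Wraps each pattern in a non-capturing group and joins the groups with '|'. */
    static String combine(List<String> patterns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < patterns.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("(?:").append(patterns.get(i)).append(')');
        }
        return sb.toString();
    }

    /** Runs an unanchored search and reports each named group (null if it did not participate). */
    static String firstMatch(List<String> patterns, String input, String... groupNames) {
        Matcher m = Pattern.compile(combine(patterns)).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (String name : groupNames) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(name).append('=').append(m.group(name));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same shape as the combined pattern in the error message.
        System.out.println(combine(List.of("%{UUID:def}", "%{INT:def}")));
        // → (?:%{UUID:def})|(?:%{INT:def})

        // Rough regex stand-ins for %{UUID:def} and %{WORD:xxx}.
        String uuid = "(?<def>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})";
        String word = "(?<xxx>\\w+)";
        System.out.println(firstMatch(List.of(uuid, word), "123 This is a test", "def", "xxx"));
        // → def=null, xxx=123

        // Order matters when both alternatives match at the same position.
        System.out.println(firstMatch(List.of("(?<num>\\d+)", "(?<w>\\w+)"), "123 abc", "num", "w"));
        // → num=123, w=null
        System.out.println(firstMatch(List.of("(?<w>\\w+)", "(?<num>\\d+)"), "123 abc", "num", "w"));
        // → num=null, w=123
    }
}
```

With an unanchored `find()`, `java.util.regex` reports the match that starts leftmost in the input; the order of the alternatives decides only among matches starting at that same position.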

@elasticsearchmachine added the v9.3.0 and external-contributor (pull request authored by a developer outside the Elasticsearch team) labels on Oct 14, 2025
@flash1293 marked this pull request as ready for review on October 16, 2025 14:10
@elasticsearchmachine added the needs:triage (requires assignment of a team area label) label on Oct 16, 2025
@flash1293 (Contributor Author)

Hmm, this fails BWC tests with a mixed cluster, which is expected since this is introducing a new syntax. I'm not sure how we handle this case, could you advise?

@luigidellaquila (Contributor) left a comment

Thanks @flash1293, LGTM

I left just a couple of minor observations.

@benwtrent added the :Analytics/ES|QL (AKA ESQL) label and removed the needs:triage label on Oct 16, 2025
@elasticsearchmachine added the Team:Analytics (meta label for the analytical engine team: ESQL/Aggs/Geo) label on Oct 16, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

@luigidellaquila (Contributor)

> Hmm, this fails BWC tests with a mixed cluster, which is expected since this is introducing a new syntax. I'm not sure how we handle this case, could you advise?

Sorry, I missed this comment.
You'll have to add a new capability in EsqlCapabilities and then add required_capability: your_new_capability (lowercase) to the CSV test.
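Why a capability helps in a mixed cluster can be sketched as follows. This is a simplification under stated assumptions: each node advertises a set of lowercase capability names, and a test declaring `required_capability` only runs when every node supports it. The enum, its constant names (such as `GROK_MULTIPLE_PATTERNS`), and `shouldRun` are hypothetical illustrations, not the actual `EsqlCapabilities` code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CapabilitySketch {

    /** Hypothetical capability enum; the constant names are illustrative only. */
    enum Cap {
        SOME_EXISTING_CAPABILITY,
        GROK_MULTIPLE_PATTERNS;

        /** The lowercase name a test would reference via required_capability. */
        String capabilityName() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** Returns true only if every node advertises all required capabilities. */
    static boolean shouldRun(List<String> required, List<Set<String>> capabilitiesPerNode) {
        for (Set<String> nodeCaps : capabilitiesPerNode) {
            if (!nodeCaps.containsAll(required)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> oldNode = Set.of(Cap.SOME_EXISTING_CAPABILITY.capabilityName());
        Set<String> newNode = Set.of(
            Cap.SOME_EXISTING_CAPABILITY.capabilityName(),
            Cap.GROK_MULTIPLE_PATTERNS.capabilityName());
        List<String> required = List.of(Cap.GROK_MULTIPLE_PATTERNS.capabilityName());

        System.out.println(required.get(0));                                // → grok_multiple_patterns
        System.out.println(shouldRun(required, List.of(newNode, newNode))); // → true
        System.out.println(shouldRun(required, List.of(oldNode, newNode))); // → false (mixed cluster: test skipped)
    }
}
```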

@elasticsearchmachine (Collaborator)

Hi @flash1293, I've created a changelog YAML for you.

if (patterns.size() > 1) {
    combinedPattern = "";
    for (int i = 0; i < patterns.size(); i++) {
        String pattern = patterns.get(i);
Contributor

A bit of an edge case, but what would happen if some pattern is invalid? For example:

  1. %{WORD:word}\
  2. a)
  3. %{WORD:word}

The final result would be: (?:%{WORD:word}\)|(?:a))|(?:%{WORD:word})

This leads to unexpected patterns.

These patterns are user-provided, so I don't think this is very problematic, but it looks like a grok injection, and the method combinePatterns() would effectively be lying here.

So, possible solutions:

  • Can we verify and warn on wrong patterns? Is that something we do in some other case?
  • Can we sanitize or remove invalid patterns?
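The injection described above can be reproduced with a small Java sketch. `combine` mirrors the (?:p1)|(?:p2) shape from the review comment (it is a hypothetical helper, not `combinePatterns()` itself), and `compiles` uses `java.util.regex.Pattern` as a stand-in for validating each pattern individually. Grok's `%{...}` references are not expanded here, so the validity check runs on plain-regex analogues of the three patterns.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternInjectionSketch {

    /** Naive combination: wrap each pattern in (?:...) and join with '|'. */
    static String combine(List<String> patterns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < patterns.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("(?:").append(patterns.get(i)).append(')');
        }
        return sb.toString();
    }

    /** Returns true if the string compiles as a java.util.regex pattern. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Grok patterns from the review comment, as Java string literals.
        List<String> grok = List.of("%{WORD:word}\\", "a)", "%{WORD:word}");
        System.out.println(combine(grok));
        // → (?:%{WORD:word}\)|(?:a))|(?:%{WORD:word})

        // Plain-regex analogue: the first two patterns are invalid on their own...
        List<String> regexes = List.of("\\w+\\", "a)", "\\w+");
        for (String r : regexes) {
            System.out.println(r + " compiles: " + compiles(r));
        }
        // ...but the combined string compiles, because "\)" became an escaped literal ')'.
        String combined = combine(regexes);
        System.out.println(combined + " compiles: " + compiles(combined));
    }
}
```

Compiling each pattern on its own would catch both invalid inputs, at the cost of one extra compilation per pattern.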

Contributor

Whatever the resolution, we'll need tests for this, both in Grok and in ESQL, depending on what we do.

@flash1293 (Contributor Author), Oct 17, 2025

That's a fun one: I tested it with ingest pipelines, and this indeed works there:

POST /_ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "Test)x"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "foo",
          "patterns": [
            "%{WORD:word}\\",
            "x)"
          ]
        }
      }
    ]
  }
}

This should fail, because %{WORD:word}\\ and x) are not valid patterns, but it passes without any warning.
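Why the request passes can be seen with a `java.util.regex` analogue, assuming `%{WORD:word}` is approximated by the named group `(?<word>\w+)`: once the patterns are combined, the trailing backslash of the first one escapes the closing parenthesis of its group, so the alternation matches the literal text "Test)". This is only an illustration; it does not run the actual Grok library, and `wordCapture` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimulateAnalogueSketch {

    /** Returns the "word" capture of the first match, or null if nothing matches. */
    static String wordCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        // Combined form of "%{WORD:word}\" and "x)", with %{WORD:word} approximated as (?<word>\w+).
        String combined = "(?:(?<word>\\w+)\\)|(?:x))";
        System.out.println(wordCapture(combined, "Test)x")); // → Test
    }
}
```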

I don't feel strongly about this case. We could pass the individual patterns further down and validate them in the Grok lib, but:

  • It seems like a lot of work for an edge case
  • It's existing behavior in ingest pipelines
  • We would need to compile each regex individually, which would likely add a fair amount of computation that we currently avoid

I would lean towards leaving the current behavior, wdyt?

Contributor

As the code was centralized, I guess the pipeline uses (and already used) the same logic, right?

Being pragmatic: if validating multiple patterns should (probably?) produce an error, then validating the equivalent single pattern should too. So technically this is existing behavior, and fixing it would be a breaking change (or a bug fix).
So yeah, let's just open an issue explaining this case (for ESQL at least) and continue with this.

Contributor Author

> As the code was centralized, I guess the pipeline uses (and already used) the same logic right?

Exactly!

> So technically this is existing behavior, and fixing would be a breaking change

I never get tired of quoting Hyrum's Law 🙂

Contributor Author

Filed #136750

@flash1293 merged commit 96396b4 into elastic:main on Oct 17, 2025 (34 checks passed)

Labels: :Analytics/ES|QL, >enhancement, external-contributor, Team:Analytics, v9.3.0

Development: successfully merging this pull request may close the issue "ESQL: Multiple patterns for grok command".

5 participants