-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Multiple patterns for grok command #136541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Multiple patterns for grok command #136541
Conversation
…93/elasticsearch into flash1293/grok-multiple-patterns
Hmm, this fails BWC tests with a mixed cluster, which is expected since this is introducing a new syntax. I'm not sure how we handle this case, could you advise? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @flash1293, LGTM
I left just a couple of minor observations
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java
Outdated
Show resolved
Hide resolved
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Sorry, I missed this comment. |
Hi @flash1293, I've created a changelog YAML for you. |
if (patterns.size() > 1) { | ||
combinedPattern = ""; | ||
for (int i = 0; i < patterns.size(); i++) { | ||
String pattern = patterns.get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of an edge case, but what would happen if some pattern is invalid? For example:
- %{WORD:word}\
- a)
- %{WORD:word}
The final result would be: (?:%{WORD:word}\)|(?:a))|(?:%{WORD:word})
Which lead to unexpected patterns.
This is user-made, so I don't think this is very problematic, but this looks like a grok injection and the method combinePatterns()
would actually be lying here.
So, solutions:
- Can we verify and warn on wrong patterns? Is that something we do in some other case?
- Can we sanitize or remove invalid patterns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever the resolution, we'll need tests for this, both in Grok and in ESQL, depending on what we do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fun one - tested with ingest pipelines and this indeed works here:
POST /_ingest/pipeline/_simulate
{
"docs": [
{
"_source": {
"foo": "Test)x"
}
}
],
"pipeline": {
"processors": [
{
"grok": {
"field": "foo",
"patterns": [
"%{WORD:word}\\",
"x)"
]
}
}
]
}
}
should fail because %{WORD:word}\\
and x)
are not valid patterns, but it does pass, no warnings or similar.
I don't feel strongly about this case, we could pass the individual patterns down further and then validate this in the Grok lib, but:
- It seems like a lot of work for an edge case
- It's existing behavior in ingest pipelines
- We would need to compile each regex individually, which sounds like it would add a bunch of additional computation we are avoiding now
I would lean towards leaving the current behavior, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the code was centralized, I guess the pipeline uses (and already used) the same logic right?
Being pragmatic, validating multiple patterns should error (probably?), which means that validating a single pattern should error too. So technically this is existing behavior, and fixing would be a breaking change (Or a bug fix).
So yeah, let's just make an issue explaining this case (For ESQL at least), and continue with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the code was centralized, I guess the pipeline uses (and already used) the same logic right?
Exactly!
So technically this is existing behavior, and fixing would be a breaking change
I never get tired of quoting Hyrums Law 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #136750
Closes #132486
This PR adds the ability to specify multiple grok patterns as part of a single grok command. Consistent with the grok processor for ingest pipelines, they are tried in order, the first matching one is actually applied:
returns
It's not allowed to have different types for the same semantic names in different patterns:
returns
This can be considered syntactic sugar over a more complex manual pattern.