-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Support flexible field access pattern in ingest pipelines #133270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support flexible field access pattern in ingest pipelines #133270
Conversation
The structure of this new access pattern algorithm is very similar across all methods (get, set, hasField, remove), but is clunky to extract into a reuseable bit of code without making the result worse to read than duplicating the structure in every method. This is because it would need to return multiple values, and each method needs to perform slightly different logic in the conditional branches. Added tests, of which several will require updates when we add support for arrays.
Pinging @elastic/es-data-management (Team:Data Management) |
server/src/test/java/org/elasticsearch/ingest/IngestDocumentTests.java
Outdated
Show resolved
Hide resolved
I haven't read the code yet, but I do intend to -- I'll also spend some time early next week on benchmarking this just to see if anything interesting pops out. |
switch (getCurrentAccessPatternSafe()) { | ||
case CLASSIC -> context = map.get(pathElement); | ||
case FLEXIBLE -> { | ||
Object object = map.getOrDefault(pathElement, NOT_FOUND); | ||
if (object != NOT_FOUND) { | ||
context = object; | ||
} else if (i == lastContainerIndex) { | ||
// This is the last path element, update the leaf key to use this path element as a dotted prefix. | ||
// Leave the context as it is. | ||
leafKey = pathElement + "." + leafKey; | ||
} else { | ||
// Iterate through the remaining path elements, joining them with dots, until we get a hit | ||
String combinedPath = pathElement; | ||
for (int j = i + 1; j <= lastContainerIndex; j++) { | ||
combinedPath = combinedPath + "." + fieldPath.pathElements[j]; | ||
object = map.getOrDefault(combinedPath, NOT_FOUND); // getOrDefault is faster than containsKey + get | ||
if (object != NOT_FOUND) { | ||
// Found one, update the outer loop index to skip past the elements we've used | ||
context = object; | ||
i = j; | ||
break; | ||
} | ||
} | ||
if (object == NOT_FOUND) { | ||
// Made it to the last path element without finding the field. | ||
// Update the leaf key to use the visited combined path elements as a dotted prefix. | ||
leafKey = combinedPath + "." + leafKey; | ||
// Update outer loop index to skip past the elements we've used | ||
i = lastContainerIndex; | ||
} | ||
} | ||
} | ||
} | ||
} else if (context instanceof Map<?, ?> map) { | ||
context = map.get(pathElement); | ||
switch (getCurrentAccessPatternSafe()) { | ||
case CLASSIC -> context = map.get(pathElement); | ||
case FLEXIBLE -> { | ||
@SuppressWarnings("unchecked") | ||
Map<String, Object> typedMap = (Map<String, Object>) context; | ||
Object object = typedMap.getOrDefault(pathElement, NOT_FOUND); | ||
if (object != NOT_FOUND) { | ||
context = object; | ||
} else if (i == lastContainerIndex) { | ||
// This is the last path element, update the leaf key to use this path element as a dotted prefix. | ||
// Leave the context as it is. | ||
leafKey = pathElement + "." + leafKey; | ||
} else { | ||
// Iterate through the remaining path elements, joining them with dots, until we get a hit | ||
String combinedPath = pathElement; | ||
for (int j = i + 1; j <= lastContainerIndex; j++) { | ||
combinedPath = combinedPath + "." + fieldPath.pathElements[j]; | ||
object = typedMap.getOrDefault(combinedPath, NOT_FOUND); // getOrDefault is faster than containsKey + get | ||
if (object != NOT_FOUND) { | ||
// Found one, update the outer loop index to skip past the elements we've used | ||
context = object; | ||
i = j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these two blocks identical? It didn't seem like a big deal to have them separate when they were 1-liners, but it seems kind of risky to have this much copy/pasted code. Or maybe there's some difference I'm missing? They're really both just operating on Map<String, Object>, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we need to just make the code different between the different paths? I generated code coverage when i ran the tests, and certain lines in certain branches I just can't hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are both indeed just operating on Map<String,Object>
- I'm not sure I understand the optimization work that has gone into the splitting of these code paths, maybe it's avoiding polymorphic access overhead in favor of the monomorphic usage of the types. I think @joegallo might have more insight into the reasoning. I'm just following his lead.
A lot of these algorithms are nearly identical to each other, with some small differences in the last operations they perform (some return true/false for if a field exists, others need to return a context and how much it couldn't resolve, etc). Additionally, splitting these into their own methods will require multiple return values in the form of updating the outer loop index i
, the context
object that has been selected, and in some cases the building out of the leafKey
variable. I could make these algorithms return all of that packaged in a record class and unpack it at the call site, and inject the different logic via some functions, but there's just enough different between all of them that I worry the result will not be that much easier to follow or maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's monomorphic callsites -- in a resolution of foo.bar.baz
the foo
lookup will be against an IngestCtxMap
but the bar
and baz
lookups will be against HashMap
s.
And that's the usual pattern -- always the first lookup against an IngestCtxMap
, and then all subsequent lookups against something else (mostly HashMap
).
server/src/main/java/org/elasticsearch/ingest/IngestDocument.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. The tests are really thorough and helpful. I left a few questions about code reuse / unreachable code. It's also worth seeing how this impacts performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds flexible field access pattern support to ingest pipelines, allowing progressive scanning for dotted field names during document processing. This enables pipelines to handle documents with various field naming patterns.
Key Changes:
- Added new field access pattern functionality to ingest document operations
- Enhanced field resolution to support dotted field name combinations
- Added comprehensive test coverage for flexible access pattern features
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
server/src/main/java/org/elasticsearch/rest/action/ingest/RestPutPipelineAction.java | Adds capability flag for flexible field access pattern support |
server/src/main/java/org/elasticsearch/ingest/IngestService.java | Updates cluster state usage for feature detection |
server/src/main/java/org/elasticsearch/ingest/IngestDocument.java | Implements flexible field access pattern logic for get, set, has, and remove operations |
modules/ingest-common/src/yamlRestTest/resources/rest-api-spec/test/ingest/340_flexible_access_pattern.yml | Adds comprehensive test cases for flexible access pattern functionality |
modules/ingest-common/build.gradle | Adds capabilities API to test dependencies |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
server/src/main/java/org/elasticsearch/ingest/IngestDocument.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/ingest/IngestDocument.java
Outdated
Show resolved
Hide resolved
…3270) Adds support for the flexible field access pattern within ingest pipelines, which provides the ability to progressively scan for fields on an ingest document in a way that includes dotted field names --------- Co-authored-by: elasticsearchmachine <[email protected]>
Adds support for the
flexible
field access pattern within ingest pipelines, which provides the ability to progressively scan for fields on an ingest document in a way that includes dotted field names:Setting
field_access_pattern
toclassic
supports reading the fielda.b.c.d
from the following document:Setting
field_access_pattern
toflexible
supports reading the fielda.b.c.d
from the following documents:When writing and removing fields with flexible mode, as much of the path is resolved as it can be following the abilities of detailed above. When the field path is exhausted, any unresolved fields will be concatenated together to write the field to the document. Note: This does not account for field names that resolve to non-object values.
This results in the following outputs when setting
a.b.c.d.e.f
to"new_value"
{"foo": "bar"}
{"foo": "bar", "a.b.c.d.e.f": "new_value"}
{"a": {"foo": "bar"}}
{"a": {"foo": "bar", "b.c.d.e.f": "new_value"}}
{"a": {"b": {"foo": "bar"}}}
{"a": {"b": {"foo": "bar", "c.d.e.f": "new_value"}}}
{"a": {"b": {"c": {"foo": "bar"}}}}
{"a": {"b": {"c": {"foo": "bar", "d.e.f": "new_value"}}}}
{"a": {"b": {"c": {"d": {"foo": "bar"}}}}}
{"a": {"b": {"c": {"d": {"foo": "bar", "e.f": "new_value"}}}}}
{"a.b": {"foo": "bar"}}
{"a.b": {"foo": "bar", "b.c.d.e.f": "new_value"}}
{"a.b": {"c": {"foo": "bar"}}}
{"a.b": {"c": {"foo": "bar", "d.e.f": "new_value"}}}
{"a.b": {"c": {"d": {"foo": "bar"}}}}
{"a.b": {"c": {"d": {"foo": "bar", "e.f": "new_value"}}}}
{"a": {"b.c": {"d": {"foo": "bar"}}}}
{"a": {"b.c": {"d": {"foo": "bar", "e.f": "new_value"}}}}
{"a": {"b": {"c.d": {"foo": "bar"}}}}
{"a": {"b": {"c.d": {"foo": "bar", "e.f": "new_value"}}}}
{"a.b": {"c": {"d": "value"}}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a": {"b.c": {"d": "value"}}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a": {"b": {"c.d": "value"}}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a.b": {"c.d": "value"}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a.b.c": {"d": "value"}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a": {"b.c.d": "value"}}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]
{"a.b.c.d": "value"}
cannot set [e.f] with parent object of type [java.lang.String] as part of path [a.b.c.d.e.f]