ECS Namespacing Processor #124735

eyalkoren · 2025-03-13T12:13:12Z

Adding an ECS namespacing processor requires operations on ingested documents that support dotted field names. All existing IngestDocument APIs assume that dots in a field path always denote object nesting and can never be part of a field name.
One option to bypass this limitation is to manipulate the source map directly, without relying on the IngestDocument APIs.

However, this PR currently proposes to extend IngestDocument with the following APIs:

APIs to apply direct operations on fields at any level of the document with any field name, for example:
- getDirectChildFieldValue()
- setDirectChildFieldValue()
- hasDirectChildField()
Most of these rely on code extracted from the existing dot-notation-oriented APIs. The exception is getDirectChildFieldValue(), where I specifically wanted to avoid redundant allocations and exception throwing. The reason is that I want to use it in algorithms that need to be especially efficient, as they may be used for general field value searches rather than for looking up the value of one specific field.

APIs to apply operations at the root level with any field name, including names that contain dots, for example:
- getTopLevelFieldValue()
- setTopLevelFieldValue()
- hasTopLevelField()

getAllFieldValues(): collects all values of a given path from all document levels, assuming that any dot in the path may be either a notation for object nesting or part of the field name.

normalizeField():
- collects all values of a given path from all document levels, assuming that any dot in the path may be either a notation for object nesting or part of the field name
- removes them from their original locations
- sets all values as a list mapped to a top-level field with the exact given path as its name
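
To make the intended semantics of getAllFieldValues() and normalizeField() concrete, here is a minimal sketch that works on a plain `Map<String, Object>` rather than on IngestDocument. The class `DottedFieldSketch` and its methods `collectAll` and `normalize` are hypothetical names for illustration only, not the code in this PR. The sketch assumes nested objects are maps, does not descend into lists, and leaves emptied parent objects in place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the IngestDocument implementation from this PR.
class DottedFieldSketch {

    /** Collects every value reachable via path, where each dot is either a nesting separator or part of a key. */
    static List<Object> collectAll(Map<String, Object> source, String path) {
        List<Object> out = new ArrayList<>();
        visit(source, path, out, false);
        return out;
    }

    /** Collects all values, removes them from their original locations, and stores them as a list under the literal path. */
    static void normalize(Map<String, Object> source, String path) {
        List<Object> values = new ArrayList<>();
        visit(source, path, values, true);
        if (!values.isEmpty()) {
            source.put(path, values);
        }
    }

    @SuppressWarnings("unchecked")
    private static void visit(Map<String, Object> node, String path, List<Object> out, boolean remove) {
        // Interpretation 1: the whole remaining path is a literal key at this level.
        if (node.containsKey(path)) {
            out.add(remove ? node.remove(path) : node.get(path));
        }
        // Interpretation 2: a prefix ending at some dot names a nested object; recurse with the remainder.
        for (int i = path.indexOf('.'); i >= 0; i = path.indexOf('.', i + 1)) {
            Object child = node.get(path.substring(0, i));
            if (child instanceof Map) {
                visit((Map<String, Object>) child, path.substring(i + 1), out, remove);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("b.c", 2);                       // { "a": { "b.c": 2 } }
        Map<String, Object> source = new HashMap<>();
        source.put("a.b.c", 1);                // { "a.b.c": 1 }
        source.put("a", a);

        System.out.println(collectAll(source, "a.b.c"));
        normalize(source, "a.b.c");
        System.out.println(source);
    }
}
```

In this sketch, the literal key at the current level is checked before any nested interpretation, so values closer to the root come first in the resulting list. Whether the real APIs guarantee any ordering is not stated here.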

eyalkoren · 2025-03-18T04:22:17Z

@joegallo @rjernst I guess these new IngestDocument APIs will not be required if/when we fully flatten all ingested documents. However, since they are completely additive (not affecting existing IngestDocument APIs) - I hope you would allow these two approaches to coexist. Maybe these new APIs can even cover what we wanted to achieve with the full flattening of ingested documents?

The reason I am proposing to add them now is that they can already unblock us on multiple fronts, in addition to what we need for ECS namespacing, for example:

- what I tried to address with Support Fields API in conditional ingest processors #121914
- supporting dotted field access in the field option of multiple processors through a new syntax

WDYT?

joegallo · 2025-03-19T17:08:18Z

Generally speaking, I'm hugely in favor of reviewing and merging small PRs (I do it all the time, and I appreciate when somebody else does the same). The problem with it is that there can be an element of being led down the garden path: maybe pull request 4 changes my understanding of pull request 1 and results in an objection (but pull request 1 has already been merged and released, so now we're in a pickle).

All that to say, can you show me the rest of the WIP? I want to see where this is going.

eyalkoren · 2025-03-20T06:00:04Z

It's not going much further than what there is now with regard to API additions (and core changes in general). I think I'll only make the getDirectXXX APIs public, as they complete what you can do with ingested documents, and maybe add a minor utility or two (like getAllTopLevelFields), but the main logic and complexity I am proposing is already there. I can't promise that I won't find additional things that I am missing in the future though 🙂, but this answers most of what I know we need right now.

The main reason this is still WIP is that this PR is about the addition of a processor, for which there are still things we are figuring out. Those shouldn't affect the core too much, only make use of it. Either way, I'll push what I have by the end of today, so you can see how it becomes useful for our purposes.

joegallo · 2025-03-25T14:28:11Z

build-tools-internal/src/main/resources/minimumGradleVersion

I don't think changes to this file are relevant to this PR. Can you tidy this up, please?

eyalkoren · 2025-03-25T16:21:18Z

@joegallo @dakrone we are revisiting our own requirements right now, reconsidering whether we must provide implicit access to dotted fields as proposed in this PR (which means we must compute and search values in all possible path permutations, where each dot can be either a path separator or part of the field name), or whether it is enough to provide only explicit access, where the user specifies how the path needs to be resolved (for example, using a dedicated syntax, as proposed in #125566).
If it is the latter, it greatly simplifies what we need to do in IngestDocument. Therefore, please wait a bit longer with the review.
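
To illustrate why implicit access is costly, here is a small sketch that enumerates every way a dotted path can be split when each dot may be either a separator or part of a field name. A path with n dots has 2^n such interpretations, while explicit access would resolve exactly one of them. `PathInterpretations` and `interpretations` are hypothetical names for illustration, not part of this PR, and the sketch assumes the path has no leading, trailing, or consecutive dots.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the search space behind implicit dotted-field access.
class PathInterpretations {

    /** Returns every segmentation of path, one per choice of which dots act as separators. */
    static List<List<String>> interpretations(String path) {
        String[] parts = path.split("\\.", -1);
        int dots = parts.length - 1;
        List<List<String>> result = new ArrayList<>();
        // Bit i of mask decides whether dot i is a separator (1) or part of the field name (0).
        for (int mask = 0; mask < (1 << dots); mask++) {
            List<String> segments = new ArrayList<>();
            StringBuilder current = new StringBuilder(parts[0]);
            for (int i = 0; i < dots; i++) {
                if ((mask & (1 << i)) != 0) {
                    segments.add(current.toString());
                    current = new StringBuilder(parts[i + 1]);
                } else {
                    current.append('.').append(parts[i + 1]);
                }
            }
            segments.add(current.toString());
            result.add(segments);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two dots give four interpretations, from one literal key to three nested keys.
        for (List<String> segments : interpretations("a.b.c")) {
            System.out.println(segments);
        }
    }
}
```

A recursive search like the one needed for implicit access does not have to materialize all of these lists, but it still has to consider each branch wherever a matching nested object exists.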

I will open a different PR that applies ECS namespacing as if these APIs don't exist, either through direct manipulation of the source map, or with minor proposals for convenience APIs in IngestDocument.

eyalkoren added 5 commits March 12, 2025 19:27

Adding APIs for dotted fields to IngestDocument

d00c6e9

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

7550fa9

Merging and completing IngestDocument APIs

4a9de22

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

fb7318b

Removing EcsNamespacingProcessor for now

e6200db

eyalkoren requested a review from dakrone March 13, 2025 12:13

eyalkoren self-assigned this Mar 13, 2025

elasticsearchmachine added the v9.1.0 label Mar 13, 2025

Small javadoc clarification

1a5186b

joegallo self-requested a review March 13, 2025 17:45

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

08830ee

eyalkoren added 7 commits March 20, 2025 18:55

Adding initial ECS namespacing processor

011139e

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

c36a109

Merge completion

5d5d26e

Restoring minimum Gradle version

6b5bb05

Restoring minimumGradleVersion

f6b3e4e

Test

e4f26c3

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

a689661

joegallo reviewed Mar 25, 2025

eyalkoren added 3 commits March 25, 2025 18:46

Initial microbenchmark test

4eaa17c

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-processor

43b69c7

Merge remote-tracking branch 'eyalkoren/ECS-namespacing-processor' into ECS-namespacing-processor

d2bcd85

eyalkoren closed this Mar 26, 2025

eyalkoren deleted the ECS-namespacing-processor branch March 26, 2025 12:58
