Skip to content

Conversation

@eyalkoren
Copy link
Contributor

@eyalkoren eyalkoren commented Mar 13, 2025

Adding an ECS namespacing processor requires operations on ingested documents that support dotted field names. All existing IngestDocument's APIs assume that dots in field paths can only represent notations for nested objects/fields and cannot be part of field names.
One option to bypass this limitation is to apply direct manipulations on the source map, without relying on the IngestDocument APIs.

However, this PR currently proposes to extend IngestDocument with the following APIs:

  • APIs to apply direct operations on fields in any level of the document with any field name, for example

    • getDirectChildFieldValue()
    • setDirectChildFieldValue()
    • hasDirectChildField()

    Most of these rely on existing code that is extracted from the existing dot-notation-oriented APIs, with the exception of getDirectChildFieldValue(), where I specifically wanted to avoid redundant allocations and avoid throwing all kinds of exceptions. The reason is that I want to use it in algorithms that need to be especially efficient as they may be used for general field values searches rather than looking for a value of a very specific field.

  • APIs to apply operations on the root level with any field name, including such that contains dots, for example:

    • getTopLevelFieldValue()
    • setTopLevelFieldValue()
    • hasTopLevelField()
  • getAllFieldValues() - collects all values of a given path from all document levels, assuming that any dot in the path may be either a notation for object nesting or part of the field name.

  • normalizeField():

    • collects all values of a given path from all document levels, assuming that any dot in the path may be either a notation for object nesting or part of the field name
    • removes them from their original location
    • sets all values as a list of values mapped to a top level field with the exact given path as its name

@eyalkoren eyalkoren requested a review from dakrone March 13, 2025 12:13
@eyalkoren eyalkoren self-assigned this Mar 13, 2025
@joegallo joegallo self-requested a review March 13, 2025 17:45
@eyalkoren
Copy link
Contributor Author

@joegallo @rjernst I guess these new IngestDocument APIs will not be required if/when we fully flatten all ingested documents. However, since they are completely additive (not affecting existing IngestDocument APIs) - I hope you would allow these two approaches to coexist. Maybe these new APIs can even cover what we wanted to achieve with the full flattening of ingested documents?

The reason I am proposing to add them now is that they can unblock us already in multiple fronts, in addition to what we need for ECS namespacing, for example:

WDYT?

@joegallo
Copy link
Contributor

Generally speaking, I'm hugely in favor of reviewing and merging small PRs (I do it all the time, and I appreciate when somebody else does the same). The problem with it is that there can be an element of being lead down the garden path -- maybe pull request 4 changes my understanding of pull request 1 and results in an objection (but pull request 1 has already been merged and released, so now we're in a pickle).

All that to say, can you show me the rest of the WIP? I want to see where this is going.

@eyalkoren
Copy link
Contributor Author

It's not going much further than what there is now with regards to APIs additions (and core changes in general). I think I'll only make the getDirectXXX APIs public, as they complete what you can do with ingested documents and maybe add a minor utility or two (like getAllTopLevelFields), but the main logic and complexity I am proposing is already there. I can't promise that I won't find additional things that I am missing in the future though 🙂 , this answers most of what I know we need right now.

The main reason why this is still WIP is that this PR is about the addition of a processor, for which there are still things we are figuring out, but shouldn't affect the core too much, only make use of it. Either way, I'll push what I have by the end of today, so you see how it becomes useful for our purposes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it seems relevant to this PR to have changes to this file. Can you tidy this up, please?

@eyalkoren
Copy link
Contributor Author

@joegallo @dakrone we are revisiting our own requirements right now, reconsidering whether we must provide implicit access to dotted fields such as proposed in this PR (thus we must compute and search values in all possible path permutations where each dot can be either a path-separator or part of the field name), or it is enough to provide only explicit access, where the user specifies how the path needs to be resolved (for example - using a dedicated syntax, like proposed in #125566).
If it is the latter, it greatly simplifies what we need to do in IngestDocument. Therefore, please wait a bit longer with the review.

I will open a different PR that applies ECS namespacing as if these APIs don't exist, either through direct manipulation of the source map, or with minor proposals for convenience APIs in IngestDocument.

@eyalkoren eyalkoren closed this Mar 26, 2025
@eyalkoren eyalkoren deleted the ECS-namespacing-processor branch March 26, 2025 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants