Adding `NormalizeForStreamProcessor` #125699

eyalkoren · 2025-03-26T17:14:03Z

Namespacing algorithm [EDIT 1]:

Start by checking whether the document is OTel. A document is considered OTel if all of the following hold:
- resource exists as a key and its value is a map
- resource either doesn't contain an attributes field, or contains an attributes field of type map
- scope is either missing or a map
- attributes is either missing or a map
- body is either missing or a map
- body either doesn't contain a text field, or contains a text field of type String
- body either doesn't contain a structured field, or contains a structured field that is not of type String

If it is OTel, return it as is.

If it is not OTel:
- create a new attributes map
- create a new resource map with a single entry whose key is attributes and whose value is a new map
- move the following top-level fields (if they exist) to the new attributes map: attributes, resource, span_id, body, severity_text and trace_id
- add the new attributes and resource maps as top-level fields
- rename special keys (e.g. span.id, log.level) to OTel-compliant names: for each, look for a value first in the nested form and, if not found, look for a top-level dotted field. The first value that is found is used for the renamed field.
- move all remaining top-level fields, other than @timestamp, trace_id, span_id, severity_text, body, attributes, resource and scope, to the new attributes map
- flatten all fields in attributes that are not arrays
- move specific attributes that describe resources from attributes to resource.attributes
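
The OTel check in the first half of the algorithm can be sketched roughly as follows. This is only an illustration of the checklist above, not the processor's actual code: the class name `OTelCheckSketch` and both method names are hypothetical, and treating "missing" as "key absent" (rather than "value is null") is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "is this document OTel?" checklist.
final class OTelCheckSketch {

    // True when the key is absent, or present with a map value.
    // Assumption: a key that is present with a null value does not count as "missing".
    static boolean missingOrMap(Map<?, ?> map, String key) {
        return !map.containsKey(key) || map.get(key) instanceof Map;
    }

    static boolean isOTelDocument(Map<String, Object> source) {
        // resource must exist and be a map
        Object resource = source.get("resource");
        if (!(resource instanceof Map)) {
            return false;
        }
        // resource.attributes is either absent or a map
        if (!missingOrMap((Map<?, ?>) resource, "attributes")) {
            return false;
        }
        // scope, attributes and body are each either absent or a map
        if (!missingOrMap(source, "scope") || !missingOrMap(source, "attributes") || !missingOrMap(source, "body")) {
            return false;
        }
        Object body = source.get("body");
        if (body instanceof Map) {
            Map<?, ?> bodyMap = (Map<?, ?>) body;
            // body.text, if present, must be a String
            if (bodyMap.containsKey("text") && !(bodyMap.get("text") instanceof String)) {
                return false;
            }
            // body.structured, if present, must not be a String
            if (bodyMap.get("structured") instanceof String) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> otel = new HashMap<>();
        otel.put("resource", new HashMap<String, Object>());
        System.out.println(isOTelDocument(otel)); // prints "true"

        Map<String, Object> plain = new HashMap<>();
        plain.put("log.level", "1234");
        System.out.println(isOTelDocument(plain)); // prints "false" (no resource map)
    }
}
```

Note that a document containing only an empty resource map passes the check, since every other condition allows the field to be missing.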


elasticsearchmachine · 2025-03-27T16:07:03Z

Hi @eyalkoren, I've created a changelog YAML for you.

elasticsearchmachine · 2025-03-27T16:07:27Z

Pinging @elastic/es-data-management (Team:Data Management)

dakrone

Thanks Eyal, I left some initial comments. I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do? For example this doc:

{
  "attributes": {
    "a": "b",
    "c": [1, 2, 3]
  },
  "log.level": "1234"
}

Becomes, after processing:

{
  "resource": {
    "attributes": {}
  },
  "severity_text": "1234",
  "attributes": {
    "attributes.a": "b",
    "attributes.c": [1, 2, 3]
  }
}

Is that the desired behavior for an existing attributes field?
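
To make the transformation in this example concrete, here is a minimal sketch that reproduces it. It is not the processor's implementation: the class `NamespaceSketch` and its methods are hypothetical, only the `log.level` to `severity_text` rename seen in the example is handled, and the other steps of the algorithm (`@timestamp` handling, the remaining special keys, moving resource attributes) are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: namespaces a non-OTel document the way the example above shows.
final class NamespaceSketch {

    // Flattens nested maps into dotted keys; lists (arrays) and scalars are copied unchanged.
    static void flatten(String prefix, Map<?, ?> in, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : in.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flatten(key, (Map<?, ?>) e.getValue(), out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }

    static Map<String, Object> namespace(Map<String, Object> source) {
        Map<String, Object> remaining = new LinkedHashMap<>(source);
        Map<String, Object> result = new LinkedHashMap<>();

        // New resource map with an empty attributes map inside it.
        Map<String, Object> resource = new LinkedHashMap<>();
        resource.put("attributes", new LinkedHashMap<String, Object>());
        result.put("resource", resource);

        // Rename the one special key used in the example.
        Object level = remaining.remove("log.level");
        if (level != null) {
            result.put("severity_text", level);
        }

        // Everything else, including a pre-existing "attributes" field,
        // is namespaced under the new attributes map and flattened.
        Map<String, Object> attributes = new LinkedHashMap<>();
        flatten("", remaining, attributes);
        result.put("attributes", attributes);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("a", "b");
        existing.put("c", List.of(1, 2, 3));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("attributes", existing);
        doc.put("log.level", "1234");
        System.out.println(namespace(doc));
    }
}
```

Because the original attributes field is treated like any other non-OTel field, its entries end up as attributes.a and attributes.c inside the new attributes map, which is the behavior being asked about.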

modules/ingest-ecs/src/main/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessor.java

modules/ingest-ecs/src/test/java/org/elasticsearch/ingest/ecs/EcsNamespacingProcessorTests.java

eyalkoren · 2025-03-30T03:22:21Z

Answering the general question:

> I had a question about the way that we nest a document with an existing attributes field into the OTel attributes. Is this something we want to do?

We started off by trying to be more "clever" about this and merge OTel with non-OTel. Then we had to handle lots of corner cases, like:

- if attributes exists and is not a map, it needs to go into a new attributes map
- if resource exists and is not a map, it needs to go into a new attributes map
- if there was an attributes map that already included a resource entry, and the top-level resource is not a map, we need to make sure that the new attributes.resource entry's value becomes an array that includes both values
- same for resource.attributes
- if both span.id and span_id exist, we need to do something about it, for example: make the new span_id an array with two values

And so forth.

So last week we decided to change the way we think about it: a document is either sent by an OTel-compliant shipper, or not. If not, there is no reason to treat the original fields as if they have OTel semantics. So even if it has a field with an OTel name, we can consider that to be by chance and namespace it. If that's so, there is no reason to complicate things for the unlikely event where non-OTel documents contain fields with intended OTel semantics.


joegallo

I opened a ticket for us to track adding the periodic ci to run the otel-semver-crawler test bits, but I'm fine with that staying disabled for now on this PR.

I see also that there's a perhaps unfinished conversation about the wording of the documentation. I'm okay if that's either resolved here and now, or if we fuss with the wording of things in a follow up PR.

✅

joegallo · 2025-05-30T21:07:09Z

I believe it's the case that this PR is intended to be backported to 8.19.0, so I'm going to add that label, and also add the label for attempting to automatically backport it. Feel free to remove those labels if this is not actually intended for 8.19.0, though.

dakrone · 2025-05-31T05:10:56Z

This should go to 8.19

eyalkoren · 2025-06-03T05:08:32Z

CI keeps failing on random things that I don't think are related to this PR's contents.
Once we get a green build, are we good to merge?

Regarding the followup task of running the test nightly - do you want me to open a GH issue with summary of what I already discussed with the delivery team?

joegallo · 2025-06-03T17:09:55Z

> Once we get a green build, are we good to merge?

For my part, yes. 👍 (And quick, it's green!)

joegallo · 2025-06-03T17:10:24Z

> Regarding the followup task of running the test nightly - do you want me to open a GH issue with summary of what I already discussed with the delivery team?

I'll reach out to you offline.

…or` (#134524)

Fixes field querying and writing logic for NormalizeForStreamProcessor so that it can function on both `classic` and `flexible` ingest pipeline access patterns.

NormalizeForStreamProcessor was added in #125699 with support for the default ingest node field access logic (now known as `classic` mode). We have since added support for the `flexible` access pattern in ingest pipelines, which allows for querying dotted field names and writing dotted field names when parent path elements are missing.

The NormalizeForStreamProcessor was written with the classic access pattern in mind. The processor was designed to look for singular field names and to rely on the classic field writing logic which creates intermediate parent objects when setting a value that is nested in the document. When flexible mode was enabled, the logic did not anticipate dotted field names that could be inconsistently accessible from the source map at certain points in the path notation. Further, the flexible access pattern does not create intermediate parent objects like before. A secondary renaming method was added to take these changes into account.
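
As background for why dotted field names matter here, the lookup order given in the PR description for special keys (nested form first, then a top-level dotted field) can be sketched as below. This is a hedged illustration only: `DottedLookupSketch` and its methods are hypothetical names, and the sketch models neither the `classic` nor the `flexible` access pattern of the ingest framework, nor the secondary renaming method mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "nested form first, then top-level dotted field".
final class DottedLookupSketch {

    // Walks the path one segment at a time through nested maps.
    static Optional<Object> nested(Map<?, ?> source, String path) {
        Object current = source;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return Optional.empty();
            }
            Map<?, ?> map = (Map<?, ?>) current;
            if (!map.containsKey(part)) {
                return Optional.empty();
            }
            current = map.get(part);
        }
        return Optional.ofNullable(current);
    }

    // The nested value wins; otherwise fall back to a literal dotted key at the top level.
    static Optional<Object> find(Map<String, Object> source, String path) {
        Optional<Object> fromNested = nested(source, path);
        return fromNested.isPresent() ? fromNested : Optional.ofNullable(source.get(path));
    }

    public static void main(String[] args) {
        Map<String, Object> span = new HashMap<>();
        span.put("id", "from-nested");
        Map<String, Object> both = new HashMap<>();
        both.put("span", span);
        both.put("span.id", "from-dotted");
        System.out.println(find(both, "span.id").get()); // prints "from-nested"

        Map<String, Object> dottedOnly = new HashMap<>();
        dottedOnly.put("span.id", "from-dotted");
        System.out.println(find(dottedOnly, "span.id").get()); // prints "from-dotted"
    }
}
```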


Adding EcsNamespacingProcessor

30b0ed3

elasticsearchmachine added the v9.1.0 label Mar 26, 2025

eyalkoren added 8 commits March 27, 2025 07:49

Adding module-info

b96cc0c

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

469df61

…cessor

Exposing and testing the processor

2bd819f

Add test and some algorithm fixes

1c2a670

Making scope non-mandatory

904c19c

Minimize dependencies

bd75b06

Extending REST tests

50f3c4d

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

cb0dcee

…cessor

eyalkoren self-assigned this Mar 27, 2025

eyalkoren added >feature :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Mar 27, 2025

eyalkoren marked this pull request as ready for review March 27, 2025 16:06

Update docs/changelog/125699.yaml

f68cf93

elasticsearchmachine added the Team:Data Management Meta label for data/management team label Mar 27, 2025

github-actions bot deployed to docs-preview March 27, 2025 16:07

joegallo requested review from dakrone and joegallo March 27, 2025 18:37

dakrone requested changes Mar 27, 2025

eyalkoren and others added 2 commits March 30, 2025 09:53

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

9dbe94b

…cessor

instanceOf with pattern matching

79bf683

Co-authored-by: Lee Hinman <[email protected]>

github-actions bot had a problem deploying to docs-preview March 30, 2025 07:05 Failure

eyalkoren and others added 4 commits March 30, 2025 10:06

instanceOf with pattern matching

dbc4d4a

Co-authored-by: Lee Hinman <[email protected]>

revert constants usage

d160fe6

Co-authored-by: Lee Hinman <[email protected]>

Merge remote-tracking branch 'eyalkoren/ECS-namespacing-processor' in…

dfe33fc

…to ECS-namespacing-processor

Complete review change proposals

2b77e3c

eyalkoren added 7 commits May 22, 2025 11:20

Disabling the tests temporarily with @ignore

a259ce6

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

9d77fcd

…cessor

Adding to toc

45f02b6

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

675ef90

…cessor

Remove GitHub-API-based crawler

2b8441d

Merge remote-tracking branch 'upstream/main' into ECS-namespacing-pro…

bf906e2

…cessor

Refactor: renaming to normalize_for_stream

f279b7a

eyalkoren changed the title ~~Adding EcsNamespaceProcessor~~ Adding NormalizeToStreamProcessor May 29, 2025

Merge branch 'main' into ECS-namespacing-processor

44ac64b

dakrone changed the title ~~Adding NormalizeToStreamProcessor~~ Adding NormalizeForStreamProcessor May 29, 2025

Merge branch 'main' into ECS-namespacing-processor

55885a8

joegallo approved these changes May 30, 2025

joegallo added auto-backport Automatically create backport pull requests when merged v8.19.0 labels May 30, 2025

joegallo and others added 2 commits June 2, 2025 16:21

Merge branch 'main' into ECS-namespacing-processor

978b163

Merge branch 'main' into ECS-namespacing-processor

bc4a015

joegallo merged commit d3d2d9b into elastic:main Jun 3, 2025
17 checks passed

joegallo mentioned this pull request Jun 6, 2025

[8.19] Adding NormalizeForStreamProcessor #129092

Merged

joegallo added the backport pending label Jun 6, 2025

flash1293 mentioned this pull request Jul 18, 2025

Add observed_timestamp to list of special fields in normalize_for_stream processor #131524

Closed

jbaiera mentioned this pull request Sep 11, 2025

Add support for flexible access pattern to NormalizeForStreamProcessor #134524

Merged

masseyke mentioned this pull request Sep 12, 2025

Removing the logs stream feature flag #134649

Merged

5 participants
