- 
                Notifications
    
You must be signed in to change notification settings  - Fork 25.6k
 
[apm-data] Set event.dataset if empty for logs #129074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
011910c    to
    71b8459      
    Compare
  
    | 
           Pinging @elastic/es-data-management (Team:Data Management)  | 
    
| 
           Hi @carsonip, I've created a changelog YAML for you.  | 
    
          Testing notesTested with adding the processor to default pipeline. Output: {
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
        "_id": "_WCQVZcBbmWL1cSKlTSi",
        "_score": 1,
        "_source": {
          "@timestamp": 1,
          "data_stream": {
            "namespace": "bar",
            "type": "logs",
            "dataset": "apm.app.foo"
          },
          "event": {
            "ingested": "2025-06-09T16:40:28Z",
            "dataset": "apm.app.foo"
          }
        }
      },
      {
        "_index": ".ds-logs-apm.app.foo-bar-2025.06.09-000001",
        "_id": "_mCQVZcBbmWL1cSKoTRm",
        "_score": 1,
        "_source": {
          "@timestamp": 1,
          "data_stream": {
            "namespace": "bar",
            "type": "logs",
            "dataset": "apm.app.foo"
          },
          "event": {
            "ingested": "2025-06-09T16:40:31Z",
            "dataset": "ds"
          }
        }
      }
    ]
  }
} | 
    
3a8135a    to
    7add32d      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a question about sending docs to logs-apm.* without explicitly setting data_stream.dataset. Not sure if that's a thing we need to cater for.
| - set: | ||
| if: "ctx.data_stream?.dataset != null" | ||
| field: event.dataset | ||
| value: "{{{data_stream.dataset}}}" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we use copy_from and ignore_empty_value, and remove the if and value?
(See https://www.elastic.co/docs/reference/enrich-processor/set-processor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source? If that's the case, I'm not sure they will be in ctx.*, and we may need to have a fallback.
Can you add test cases for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source?
No, at least not visible to the log ingest pipeline.
If that's the case, I'm not sure they will be in ctx.*
Correct, they will not be in ctx.
we may need to have a fallback.
Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue. Unless the user explicitly remove the DS fields in an intermediate ingest pipeline, DS fields should always be present in docs from apm-server (even if reroute processor is used).
In any case, I've added test in 957d9ef to confirm this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset is missing? I'm inclined to not perform any special case handling to make things up because it sounds more like a misconfiguration issue.
I'm OK with this. I don't think it's a real thing we need to handle, even if it's technically possible.
| 
           Is there a reason why you switched away from  Can this be considered a breaking change in behavior?  | 
    
          
 The new   | 
    
| 
           Ahh okay, that makes sense. I missed the call out to the original   | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, and sorry for the delay
          💔 Backport failed
 You can use sqren/backport to manually backport by running   | 
    
          💚 All backports created successfully
 Questions ?Please refer to the Backport tool documentation  | 
    
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream. (cherry picked from commit 466afba) # Conflicts: # test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
| 
           Tested manually with steps outlined in #129074 (comment) With version 9.0.3With version 9.1.0By inspecting the outputs above, version   | 
    
For APM logs, set
event.datasettodata_stream.datasetifevent.datasetis empty, to satisfy Anomaly Detection's requirement to haveevent.datasetin everylogs-*data stream.Fixes elastic/apm-server#16397