Merged
Changes from 10 commits
5 changes: 5 additions & 0 deletions docs/changelog/129074.yaml
@@ -0,0 +1,5 @@
pr: 129074
summary: "[apm-data] Set `event.dataset` if empty for logs"
area: Data streams
type: bug
issues: []
@@ -2362,7 +2362,7 @@ protected static boolean isXPackIngestPipeline(String id) {
}
return switch (id) {
case "logs-default-pipeline", "logs@default-pipeline", "logs@json-message", "logs@json-pipeline" -> true;
case "apm@pipeline", "traces-apm@pipeline", "metrics-apm@pipeline" -> true;
case "apm@pipeline", "traces-apm@pipeline", "metrics-apm@pipeline", "logs-apm@pipeline" -> true;
case "behavioral_analytics-events-final_pipeline", "ent-search-generic-ingestion", "search-default-ingestion" -> true;
case "reindex-data-stream-pipeline" -> true;
default -> false;
@@ -24,4 +24,4 @@ template:
settings:
index:
default_pipeline: logs-apm.app@default-pipeline
final_pipeline: apm@pipeline
final_pipeline: logs-apm@pipeline
@@ -31,4 +31,4 @@ template:
settings:
index:
default_pipeline: logs-apm.error@default-pipeline
final_pipeline: apm@pipeline
final_pipeline: logs-apm@pipeline
@@ -0,0 +1,14 @@
---
version: ${xpack.apmdata.template.version}
_meta:
managed: true
description: Built-in ingest pipeline for logs-apm.*-* data streams
processors:
# Set event.dataset if unset to meet Anomaly Detection requirements
- set:
if: "ctx.data_stream?.dataset != null"
field: event.dataset
value: "{{{data_stream.dataset}}}"
Member:

Nit: should we use copy_from and ignore_empty_value, and remove the if and value?
(See https://www.elastic.co/docs/reference/enrich-processor/set-processor)
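For reference, the suggested variant might look like the sketch below (hypothetical, based on the set processor's documented copy_from and ignore_empty_value options; this is not the code in this PR):

```yaml
processors:
  # Hypothetical alternative: copy data_stream.dataset into event.dataset.
  # ignore_empty_value makes the processor a no-op when the source field is
  # missing or empty, replacing the explicit `if` condition, while
  # override: false still preserves any event.dataset already in the document.
  - set:
      field: event.dataset
      copy_from: data_stream.dataset
      ignore_empty_value: true
      override: false
```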

Member:

Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source? If that's the case, I'm not sure they will be in ctx.*, and we may need to have a fallback.

Can you add test cases for that?

Member Author — @carsonip, Jun 12, 2025:

> Will data_stream.dataset be automatically set for the logs-apm.* data stream documents, if they're not specified in _source?

No, at least not visibly to the log ingest pipeline.

> If that's the case, I'm not sure they will be in ctx.*

Correct, they will not be in ctx.

> we may need to have a fallback.

Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset are missing? I'm inclined not to perform any special-case handling to make things up, because it sounds more like a misconfiguration issue. Unless the user explicitly removes the DS fields in an intermediate ingest pipeline, DS fields should always be present in docs from apm-server (even if the reroute processor is used).

In any case, I've added a test in 957d9ef to confirm this behavior.

Member:

> Do you mean a fallback to a non-null event.dataset in the case where both data_stream.dataset and event.dataset are missing? I'm inclined not to perform any special-case handling to make things up, because it sounds more like a misconfiguration issue.

I'm OK with this. I don't think it's a real thing we need to handle, even if it's technically possible.

override: false
- pipeline:
name: apm@pipeline
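Assuming the pipelines are installed, this behavior can also be spot-checked with the ingest simulate API; the sketch below follows the same YAML REST test conventions as the tests in this PR, with a made-up document (not part of the PR's test suite):

```yaml
# Hypothetical check: simulate logs-apm@pipeline against a document that
# has data_stream.dataset set but no event.dataset.
- do:
    ingest.simulate:
      id: logs-apm@pipeline
      body:
        docs:
          - _source:
              data_stream:
                type: logs
                dataset: apm.error
                namespace: default
# event.dataset is expected to be copied from data_stream.dataset
# ("apm.error") in the simulated output.
```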
5 changes: 4 additions & 1 deletion x-pack/plugin/apm-data/src/main/resources/resources.yaml
@@ -1,7 +1,7 @@
# "version" holds the version of the templates and ingest pipelines installed
# by xpack-plugin apm-data. This must be increased whenever an existing template or
# pipeline is changed, in order for it to be updated on Elasticsearch upgrade.
version: 14
version: 15

component-templates:
# Data lifecycle.
@@ -97,6 +97,9 @@ ingest-pipelines:
- metrics-apm@pipeline:
dependencies:
- apm@pipeline
- logs-apm@pipeline:
dependencies:
- apm@pipeline

lifecycle-policies:
- logs-apm.app_logs-default_policy
@@ -0,0 +1,52 @@
---
setup:
- do:
cluster.health:
wait_for_events: languid
---
"Test logs-apm.error-* event.dataset field":
- do:
bulk:
index: logs-apm.error-eventdataset
refresh: true
body:
- create: {}
- '{"@timestamp": "2017-06-22", "data_stream": {"type": "logs", "dataset": "apm.error", "namespace": "eventdataset"}, "log": {"level": "error"}, "error": {"log": {"message": "loglevel"}, "exception": [{"message": "exception_used"}]}}'

- create: {}
- '{"@timestamp": "2017-06-22", "data_stream": {"type": "logs", "dataset": "apm.error", "namespace": "eventdataset"}, "event": {"dataset": "foo"}, "log": {"level": "error"}, "error": {"log": {"message": "loglevel"}, "exception": [{"message": "exception_used"}]}}'

- is_false: errors

- do:
search:
index: logs-apm.error-eventdataset
body:
fields: ["event.dataset"]
- length: { hits.hits: 2 }
- match: { hits.hits.0.fields: { "event.dataset": ["apm.error"] } }
- match: { hits.hits.1.fields: { "event.dataset": ["foo"] } }
---
"Test logs-apm.app.*-* event.dataset field":
- do:
bulk:
index: logs-apm.app.foo-eventdataset
refresh: true
body:
- create: {}
- '{"@timestamp": "2017-06-22", "data_stream": {"type": "logs", "dataset": "apm.app.foo", "namespace": "eventdataset"}, "message": "foo"}'

- create: {}
- '{"@timestamp": "2017-06-22", "data_stream": {"type": "logs", "dataset": "apm.app.foo", "namespace": "eventdataset"}, "event": {"dataset": "foo"}, "message": "foo"}'

- is_false: errors

- do:
search:
index: logs-apm.app.foo-eventdataset
body:
fields: ["event.dataset"]
- length: { hits.hits: 2 }
- match: { hits.hits.0.fields: { "event.dataset": ["apm.app.foo"] } }
- match: { hits.hits.1.fields: { "event.dataset": ["foo"] } }
