[storage] Add Elasticsearch data stream support #7768

Open
SoumyaRaikwar wants to merge 14 commits into jaegertracing:main from SoumyaRaikwar:feature/es-datastream-support

@SoumyaRaikwar (Contributor) commented Dec 25, 2025

Description

This PR implements Elasticsearch Data Stream support specifically for Spans, following the design in ADR-004.

Part of #4708

Key Changes

Data Stream Implementation

  • Span Writer: Updates SpanWriter to support Data Stream naming conventions and usage (e.g., jaeger-span-ds).
  • OpType Support: Updates IndexService.Add() interface to Add(opType string). This allows the writer to explicitly set op_type=create, which is required for Data Streams.
    • Note: This interface change required cascading updates to client.go, mocks, samplingstore, and depstore.
  • Mappings: Updates jaeger-span-8.json template to conditionally exclude settings incompatible with Data Streams (e.g., rollover_alias, index.requests.cache). Adds jaeger-span-8-ds.json fixture for testing.
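
To illustrate why the writer needs an explicit op type: data streams are append-only, so Bulk API requests that target them must use the `create` action, while regular indices also accept `index`. The sketch below renders the action metadata line of a bulk request. `bulkActionLine` is a hypothetical helper written for this illustration; it is not the PR's `IndexService.Add(opType)` implementation, and the `jaeger-service` target name is only an example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkActionLine is a hypothetical helper that renders the action metadata
// line of an Elasticsearch Bulk API request. An empty opType falls back to
// "index", which regular indices accept; data streams require "create".
func bulkActionLine(opType, target, id string) (string, error) {
	if opType == "" {
		opType = "index"
	}
	meta := map[string]string{"_index": target}
	if id != "" {
		meta["_id"] = id
	}
	line, err := json.Marshal(map[string]map[string]string{opType: meta})
	if err != nil {
		return "", err
	}
	return string(line), nil
}

func main() {
	// Span written to a data stream: op type must be "create".
	spanLine, err := bulkActionLine("create", "jaeger-span-ds", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(spanLine)

	// Service document with an explicit ID: default "index" op type.
	serviceLine, err := bulkActionLine("", "jaeger-service", "svc1")
	if err != nil {
		panic(err)
	}
	fmt.Println(serviceLine)
}
```

Passing an empty op type keeps the existing overwrite-by-ID behaviour, which appears to be what the sampling and dependency stores rely on after the interface change.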

Configuration & Cleanup

  • Config: Adds UseDataStream flag with validation logic.
  • Refactor: Consolidates index naming logic using config.IndexWithDate and removes unused code in depstore.

Deferred to Follow-up PRs

  • Data Stream support for Sampling & Dependencies
  • Service mappings changes
  • UseISM / EnableLogsDB configs

Verification

  • Unit Tests: Verified spanstore, config, and mappings tests pass.
  • Manual Verification: Validated that Spans are correctly written to Data Streams with op_type=create when UseDataStream=true.

@codecov bot commented Dec 25, 2025

Codecov Report

❌ Patch coverage is 77.19298% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.29%. Comparing base (92ac7cd) to head (511b1c6).

Files with missing lines                                  Patch %   Lines
...ernal/storage/v1/elasticsearch/spanstore/reader.go     65.11%    13 Missing and 2 partials ⚠️
internal/storage/elasticsearch/config/config.go           50.00%    3 Missing and 2 partials ⚠️
internal/storage/elasticsearch/textTemplate.go            20.00%    3 Missing and 1 partial ⚠️
...ernal/storage/v1/elasticsearch/spanstore/writer.go     90.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7768      +/-   ##
==========================================
- Coverage   95.48%   95.29%   -0.19%     
==========================================
  Files         316      316              
  Lines       16756    16833      +77     
==========================================
+ Hits        15999    16041      +42     
- Misses        592      622      +30     
- Partials      165      170       +5     
Flag Coverage Δ
badger_v1 9.12% <5.26%> (-0.02%) ⬇️
badger_v2 1.39% <5.26%> (+0.04%) ⬆️
cassandra-4.x-v1-manual 13.27% <5.26%> (-0.05%) ⬇️
cassandra-4.x-v2-auto 1.38% <5.26%> (+0.04%) ⬆️
cassandra-4.x-v2-manual 1.38% <5.26%> (+0.04%) ⬆️
cassandra-5.x-v1-manual 13.27% <5.26%> (-0.05%) ⬇️
cassandra-5.x-v2-auto 1.38% <5.26%> (+0.04%) ⬆️
cassandra-5.x-v2-manual 1.38% <5.26%> (+0.04%) ⬆️
clickhouse 1.47% <5.26%> (+0.04%) ⬆️
elasticsearch-6.x-v1 ?
elasticsearch-7.x-v1 17.11% <56.14%> (+0.17%) ⬆️
elasticsearch-8.x-v1 ?
elasticsearch-8.x-v2 ?
elasticsearch-9.x-v2 ?
grpc_v1 8.12% <5.26%> (-0.01%) ⬇️
grpc_v2 1.39% <5.26%> (+0.04%) ⬆️
kafka-3.x-v2 1.39% <5.26%> (+0.04%) ⬆️
memory_v2 1.39% <5.26%> (+0.04%) ⬆️
opensearch-1.x-v1 17.15% <56.14%> (+0.17%) ⬆️
opensearch-2.x-v1 17.15% <56.14%> (+0.17%) ⬆️
opensearch-2.x-v2 1.39% <5.26%> (+0.04%) ⬆️
opensearch-3.x-v2 1.39% <5.26%> (+0.04%) ⬆️
query 1.39% <5.26%> (+0.04%) ⬆️
tailsampling-processor 0.60% <5.26%> (+0.05%) ⬆️
unittests 94.02% <71.92%> (-0.14%) ⬇️

@SoumyaRaikwar force-pushed the feature/es-datastream-support branch from c5f1ae9 to 1b79e3a on December 25, 2025 11:06

@SoumyaRaikwar (Contributor Author)

@yurishkuro @Manik2708

PR implements data stream templates aligned with @Manik2708's design doc:

  • Index pattern: jaeger-span-ds*
  • Ingestion pipeline reference for @timestamp
  • ILM policy integration in settings

Templates are ready for review.

@Manik2708 (Contributor)

Thanks @SoumyaRaikwar! I would suggest pausing for a bit, though: the design doc is not ready for implementation yet. It currently lacks evidence, so we may have to update the templates manually and observe how they behave. We also need to check whether we are missing anything regarding backward compatibility. So far the doc reflects only my research from the documentation; I still have to test it and write up the steps for reviewers. If you can help with this, I would be very grateful.

@SoumyaRaikwar (Contributor Author) commented Dec 25, 2025

@Manik2708, I've reviewed your proposal and the current template implementation. I'm keeping my PR as draft for now.

I can help with the manual testing you mentioned. I'll set up a local ES cluster, create the ingest pipelines, apply the data stream templates, and verify the behavior. I'll document the steps and results here as evidence for the design doc. Let me know if there's a specific scenario you'd like me to prioritize.

@SoumyaRaikwar marked this pull request as draft on December 25, 2025 18:04

@SoumyaRaikwar (Contributor Author)

@Manik2708

I've completed initial testing and evidence gathering. Here's what I verified:

Design Alignment Confirmed

  • Templates updated with jaeger-span-ds pattern
  • data_stream: {} mapping added
  • index.default_pipeline linked to ingest pipelines
  • Following TsengSR's approach: no ILM customization in Jaeger code - users can override via ES jaeger-span-custom component templates

Ingest Pipeline Tested

Created and tested jaeger-span-ds-timestamp pipeline on ES 8.11:

  • Successfully copies startTime → @timestamp
  • Handles epoch_millis format correctly
  • Falls back to _ingest.timestamp for docs without startTime
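
The pipeline behaviour listed above can be mimicked in Go to sanity-check the expected `@timestamp` values. This is only an illustration of the logic, not the ingest pipeline definition itself; `timestampFor` and `sampleIngestTime` are hypothetical names, and the sample ingest time is an arbitrary value chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// sampleIngestTime stands in for Elasticsearch's _ingest.timestamp in this
// sketch; the value is arbitrary.
var sampleIngestTime = time.Date(2025, 12, 26, 10, 30, 0, 0, time.UTC)

// timestampFor derives @timestamp from startTime (epoch milliseconds) and
// falls back to the ingest time when the document has no startTime.
func timestampFor(startTimeMillis *int64, ingestTime time.Time) string {
	const layout = "2006-01-02T15:04:05.000Z"
	if startTimeMillis != nil {
		return time.UnixMilli(*startTimeMillis).UTC().Format(layout)
	}
	return ingestTime.UTC().Format(layout)
}

func main() {
	start := int64(1672531200000)
	fmt.Println(timestampFor(&start, sampleIngestTime)) // 2023-01-01T00:00:00.000Z
	fmt.Println(timestampFor(nil, sampleIngestTime))    // 2025-12-26T10:30:00.000Z
}
```

The first call uses the `startTime` from the test document in this comment and reproduces the `@timestamp` reported for it.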

Manual Verification Evidence

Posted test document to data stream:

  • Input: {"traceID": "test-1", "startTime": 1672531200000}
  • Result: Correctly stored in .ds-jaeger-span-ds-2025.12.26-000001 with @timestamp: "2023-01-01T00:00:00.000Z"

Full evidence report: https://gist.github.com/SoumyaRaikwar/519a98bcc81dc2df04308ae4a66b702b

Next Steps

Ready to work on:

  1. Backward compatibility testing (dual read from old + new indices)
  2. Documentation showing ES custom template override approach (per TsengSR's suggestion)
  3. All 4 index types (span, service, dependencies, sampling)
  4. Integration test updates

Which should I prioritize?

@SoumyaRaikwar marked this pull request as ready for review on December 28, 2025 12:13

@SoumyaRaikwar (Contributor Author)

@yurishkuro

I've updated the design document and the Gist with the proposed Index Lifecycle Management (ILM) policy for data streams.

Updates include:

  • ILM Policy Proposal (jaeger-ilm-policy):
    • Hot Phase: Rollover at 50GB or 1 day. Priority 100.
    • Warm Phase: Transition immediately after rollover. Priority 50.
    • Delete Phase: Delete indices after 7 days.
  • Technical Verification:
    • Updated jaeger-span-8.json template to reference the ILM policy.
    • Verified that all mapping tests pass with the new configuration.
    • Created a sample jaeger-ilm-policy.json definition.

Please review the updated proposal. Once approved, I can proceed with applying the ILM policy to the templates.

@github-actions bot commented Jan 8, 2026

Metrics Comparison Summary

ERROR: No summary files were generated. Expected at least 8 diff files from CI.

This indicates a failure in the E2E test execution or metrics collection process.

➡️ View full metrics file

@SoumyaRaikwar (Contributor Author)

@yurishkuro I have updated the PR to reflect the changes in the design doc. Could you review?

@Manik2708 (Contributor)

@SoumyaRaikwar you have bundled a lot of features into a single PR. I would suggest supporting only data streams in this PR, and only for spans; we can work on the other features once data streams are enabled without any backward compatibility issues.

@SoumyaRaikwar force-pushed the feature/es-datastream-support branch from 0cefcaa to 7a26f99 on January 14, 2026 13:24
SoumyaRaikwar and others added 6 commits February 16, 2026 16:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Soumya Raikwar <164396577+SoumyaRaikwar@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.


Copilot AI review requested due to automatic review settings February 19, 2026 12:54
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (5)

internal/storage/elasticsearch/config/config.go:808

  • The IndexWithDate function has inconsistent behavior when the indexName already ends with a dash. On line 804-805, if the index name ends with "-", it appends the date directly (e.g., "jaeger-span-" + "2025-01-01" = "jaeger-span-2025-01-01"). However, on line 807, if it doesn't end with a dash, it adds one before the date (e.g., "jaeger-span" + "-" + "2025-01-01" = "jaeger-span-2025-01-01"). While both produce the same result for typical usage, this creates an inconsistency where the function behaves differently based on whether the caller includes a trailing dash. Consider standardizing this by always trimming trailing dashes and then adding one, or documenting this behavior clearly.
func IndexWithDate(indexName, dateLayout string, date time.Time) string {
	if indexName == "" {
		return date.UTC().Format(dateLayout)
	}
	if strings.HasSuffix(indexName, "-") {
		return indexName + date.UTC().Format(dateLayout)
	}
	return indexName + "-" + date.UTC().Format(dateLayout)
}
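
One way to act on the suggestion above is to normalize the prefix before appending the date. The following is a minimal sketch, not the PR's code: `indexWithDateNormalized` and `newYear2025` are hypothetical names. Note one behavioural difference from the original: a prefix consisting only of dashes is treated like an empty prefix.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// newYear2025 is a sample date used by the usage example below.
var newYear2025 = time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)

// indexWithDateNormalized is a hypothetical variant of IndexWithDate that
// trims any trailing dashes from the prefix and then always inserts exactly
// one dash before the formatted date.
func indexWithDateNormalized(indexName, dateLayout string, date time.Time) string {
	formatted := date.UTC().Format(dateLayout)
	prefix := strings.TrimRight(indexName, "-")
	if prefix == "" {
		return formatted
	}
	return prefix + "-" + formatted
}

func main() {
	const layout = "2006-01-02"
	fmt.Println(indexWithDateNormalized("jaeger-span", layout, newYear2025))  // jaeger-span-2025-01-01
	fmt.Println(indexWithDateNormalized("jaeger-span-", layout, newYear2025)) // jaeger-span-2025-01-01
	fmt.Println(indexWithDateNormalized("", layout, newYear2025))             // 2025-01-01
}
```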

internal/storage/v1/elasticsearch/spanstore/writer.go:101

  • This code comment block (lines 92-99) discusses uncertainty about client readiness and version checking, with multiple questions that appear to be stream-of-consciousness thinking. This should either be removed or converted into a concise explanation of the actual design decision. The final comment on line 100-101 explains the actual behavior, making the preceding comments unnecessary.
	// We can't check the version here because the client might not be ready.
	// However, SpanWriter is lazy, so we can check it when we need it?
	// Actually, NewSpanWriter is not lazy about creating ServiceOperationStorage.
	// But p.Client is a factory function.
	// Let's assume we can get a client instance here to check version?
	// p.Client() creates a NEW client or returns existing?
	// Looking at factory.go: f.getClient returns the stored client.

	// We rely on factory to populate p.UseDataStream based on config or version detection.
	useDataStream := p.UseDataStream

internal/storage/elasticsearch/config/config.go:794

  • The validation logic prevents UseDataStream from being used with explicit span aliases, but doesn't validate against UseReadWriteAliases. According to the code in writer.go line 138, UseDataStream and UseReadWriteAliases are mutually exclusive for spans (the condition is p.UseReadWriteAliases && !p.UseDataStream). Consider adding validation to prevent confusion: if UseDataStream is true, UseReadWriteAliases should probably also be validated or at least documented as incompatible.
	// Data streams are used for spans, so explicit span aliases are incompatible
	// with UseDataStream. Service aliases remain valid since services don't use data streams.
	if c.UseDataStream && hasSpanAliases {
		return errors.New("UseDataStream cannot be enabled together with explicit span aliases (span_read_alias, span_write_alias)")
	}
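
If the maintainers decide the two options should be rejected together rather than just documented as incompatible, the extra check could look roughly like the sketch below. `dataStreamConfig` is a hypothetical, trimmed-down stand-in for the real configuration struct, and the second error message is invented for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// dataStreamConfig is a hypothetical stand-in holding only the fields
// relevant to this validation sketch.
type dataStreamConfig struct {
	UseDataStream       bool
	UseReadWriteAliases bool
	SpanReadAlias       string
	SpanWriteAlias      string
}

// validate rejects option combinations that conflict with data streams.
func (c dataStreamConfig) validate() error {
	if !c.UseDataStream {
		return nil
	}
	if c.SpanReadAlias != "" || c.SpanWriteAlias != "" {
		return errors.New("UseDataStream cannot be enabled together with explicit span aliases (span_read_alias, span_write_alias)")
	}
	// Assumed additional rule: read/write aliases are ignored for spans when
	// data streams are on, so fail fast instead of silently ignoring them.
	if c.UseReadWriteAliases {
		return errors.New("UseDataStream cannot be enabled together with UseReadWriteAliases")
	}
	return nil
}

func main() {
	cfg := dataStreamConfig{UseDataStream: true, UseReadWriteAliases: true}
	fmt.Println(cfg.validate())
}
```

Since service indices do not use data streams in this PR, rejecting `UseReadWriteAliases` outright may be too strict; logging a warning or documenting the precedence would be the softer alternative.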

internal/storage/v1/elasticsearch/spanstore/service_operation.go:68

  • The opType variable is set to an empty string and then immediately passed to Add(). This appears to be intentional to maintain backward compatibility for service writes (which use upsert behavior with explicit IDs rather than create-only behavior). However, this would be clearer with a comment explaining why services don't use "create" opType like spans do, or using a named constant like opTypeUpsert = "" to make the intent explicit.
		il := s.client().Index().Index(indexName).Type(serviceType).BodyJson(service)
		opType := ""
		il.Id(cacheKey)
		il.Add(opType)

test_output.txt:364

  • These test output files appear to be accidentally committed and should not be part of the codebase. They show test failures and are likely temporary debugging artifacts. Please remove test_output.txt, test_output_v2.txt, and test_mappings_full.txt from the PR.
=== RUN   TestMappingBuilderGetMapping
=== RUN   TestMappingBuilderGetMapping/jaeger-span
    mapping_test.go:89: 
        	Error Trace:	/media/soumya/DATA/open-source/jaeger/internal/storage/v1/elasticsearch/mappings/mapping_test.go:89
        	Error:      	Not equal: 
        	            	expected: "{\n    \"index_patterns\": [\n        \"test-jaeger-span*\"\n    ],\n    \"priority\": 500,\n    \"template\": {\n        \"settings\": {\n            \"index\": {\n                \"number_of_shards\": 3,\n                \"number_of_replicas\": 3,\n                \"mapping\": {\n                    \"total_fields\": {\n                        \"limit\": 3000\n                    },\n                    \"nested_objects\": {\n                        \"limit\": 50\n                    }\n                },\n                \"default_pipeline\": \"jaeger-trace-time-to-timestamp\",\n                \"lifecycle\": {\n                    \"name\": \"jaeger-test-policy\",\n                    \"rollover_alias\": \"test-jaeger-span-write\"\n                }\n            }\n        },\n        \"mappings\": {\n            \"dynamic_templates\": [\n                {\n                    \"span_tags_map\": {\n                        \"path_match\": \"tag.*\",\n                        \"mapping\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        }\n                    }\n                },\n                {\n                    \"process_tags_map\": {\n                        \"path_match\": \"process.tag.*\",\n                        \"mapping\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        }\n                    }\n                }\n            ],\n            \"properties\": {\n                \"traceID\": {\n                    \"type\": \"keyword\",\n                    \"ignore_above\": 256\n                },\n                \"spanID\": {\n                    \"type\": \"keyword\",\n                    \"ignore_above\": 256\n                },\n                \"operationName\": {\n                    \"type\": \"keyword\",\n                    \"ignore_above\": 256\n 
               },\n                \"parentSpanID\": {\n                    \"type\": \"keyword\",\n                    \"ignore_above\": 256\n                },\n                \"startTime\": {\n                    \"type\": \"date\",\n                    \"format\": \"epoch_millis\"\n                },\n                \"startTimeMillis\": {\n                    \"type\": \"date\",\n                    \"format\": \"epoch_millis\"\n                },\n                \"duration\": {\n                    \"type\": \"long\"\n                },\n                \"flags\": {\n                    \"type\": \"integer\"\n                },\n                \"logs\": {\n                    \"type\": \"nested\",\n                    \"dynamic\": false,\n                    \"properties\": {\n                        \"timestamp\": {\n                            \"type\": \"date\",\n                            \"format\": \"epoch_millis\"\n                        },\n                        \"fields\": {\n                            \"type\": \"nested\",\n                            \"dynamic\": false,\n                            \"properties\": {\n                                \"key\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                },\n                                \"value\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                },\n                                \"type\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                }\n                            }\n                        }\n                    }\n                },\n                \"process\": {\n                    \"properties\": {\n                        \"serviceName\": {\n                         
   \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        },\n                        \"tag\": {\n                            \"type\": \"object\"\n                        },\n                        \"tags\": {\n                            \"type\": \"nested\",\n                            \"dynamic\": false,\n                            \"properties\": {\n                                \"key\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                },\n                                \"value\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                },\n                                \"type\": {\n                                    \"type\": \"keyword\",\n                                    \"ignore_above\": 256\n                                }\n                            }\n                        }\n                    }\n                },\n                \"references\": {\n                    \"type\": \"nested\",\n                    \"dynamic\": false,\n                    \"properties\": {\n                        \"refType\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        },\n                        \"traceID\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        },\n                        \"spanID\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        }\n                    }\n                },\n                \"tag\": {\n                    \"type\": \"object\"\n                },\n                \"tags\": {\n                    \"type\": \"nested\",\n                    
\"dynamic\": false,\n                    \"properties\": {\n                        \"key\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        },\n                        \"value\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        },\n                        \"type\": {\n                            \"type\": \"keyword\",\n                            \"ignore_above\": 256\n                        }\n                    }\n                }\n            }\n        }\n    }\n}"
        	            	actual  : "{\n  \"index_patterns\": [\"test-jaeger-span*\"],\n  \"priority\": 500,\n  \"template\": {\n    \"settings\": {\n      \"index\": {\n        \"number_of_shards\": 3,\n        \"number_of_replicas\": 3,\n        \"mapping\": {\n          \"total_fields\": {\n            \"limit\": 3000\n          },\n          \"nested_objects\": {\n            \"limit\": 50\n          }\n        },\n        \"default_pipeline\": \"jaeger-trace-time-to-timestamp\",\n        \"lifecycle\": {\n          \"name\": \"jaeger-test-policy\",\n          \"rollover_alias\": \"test-jaeger-span-write\"\n        }\n      }\n    },\n    \"mappings\": {\n      \"dynamic_templates\": [\n        {\n          \"span_tags_map\": {\n            \"path_match\": \"tag.*\",\n            \"mapping\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            }\n          }\n        },\n        {\n          \"process_tags_map\": {\n            \"path_match\": \"process.tag.*\",\n            \"mapping\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            }\n          }\n        }\n      ],\n      \"properties\": {\n        \"traceID\": {\n          \"type\": \"keyword\",\n          \"ignore_above\": 256\n        },\n        \"spanID\": {\n          \"type\": \"keyword\",\n          \"ignore_above\": 256\n        },\n        \"operationName\": {\n          \"type\": \"keyword\",\n          \"ignore_above\": 256\n        },\n        \"parentSpanID\": {\n          \"type\": \"keyword\",\n          \"ignore_above\": 256\n        },\n        \"startTime\": {\n          \"type\": \"date\",\n          \"format\": \"epoch_millis\"\n        },\n        \"startTimeMillis\": {\n          \"type\": \"date\",\n          \"format\": \"epoch_millis\"\n        },\n        \"duration\": {\n          \"type\": \"long\"\n        },\n        \"flags\": {\n          \"type\": \"integer\"\n        },\n        \"logs\": {\n      
    \"type\": \"nested\",\n          \"dynamic\": false,\n          \"properties\": {\n            \"timestamp\": {\n              \"type\": \"date\",\n              \"format\": \"epoch_millis\"\n            },\n            \"fields\": {\n              \"type\": \"nested\",\n              \"dynamic\": false,\n              \"properties\": {\n                \"key\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                },\n                \"value\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                },\n                \"type\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                }\n              }\n            }\n          }\n        },\n        \"process\": {\n          \"properties\": {\n            \"serviceName\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            },\n            \"tag\": {\n              \"type\": \"object\"\n            },\n            \"tags\": {\n              \"type\": \"nested\",\n              \"dynamic\": false,\n              \"properties\": {\n                \"key\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                },\n                \"value\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                },\n                \"type\": {\n                  \"type\": \"keyword\",\n                  \"ignore_above\": 256\n                }\n              }\n            }\n          }\n        },\n        \"references\": {\n          \"type\": \"nested\",\n          \"dynamic\": false,\n          \"properties\": {\n            \"refType\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            },\n            \"traceID\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            },\n            
\"spanID\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            }\n          }\n        },\n        \"tag\": {\n          \"type\": \"object\"\n        },\n        \"tags\": {\n          \"type\": \"nested\",\n          \"dynamic\": false,\n          \"properties\": {\n            \"key\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            },\n            \"value\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            },\n            \"type\": {\n              \"type\": \"keyword\",\n              \"ignore_above\": 256\n            }\n          }\n        }\n      }\n    }\n  }\n}\n"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,177 +1,176 @@
        	            	 {
        	            	-    "index_patterns": [
        	            	-        "test-jaeger-span*"
        	            	-    ],
        	            	-    "priority": 500,
        	            	-    "template": {
        	            	-        "settings": {
        	            	-            "index": {
        	            	-                "number_of_shards": 3,
        	            	-                "number_of_replicas": 3,
        	            	-                "mapping": {
        	            	-                    "total_fields": {
        	            	-                        "limit": 3000
        	            	-                    },
        	            	-                    "nested_objects": {
        	            	-                        "limit": 50
        	            	-                    }
        	            	+  "index_patterns": ["test-jaeger-span*"],
        	            	+  "priority": 500,
        	            	+  "template": {
        	            	+    "settings": {
        	            	+      "index": {
        	            	+        "number_of_shards": 3,
        	            	+        "number_of_replicas": 3,
        	            	+        "mapping": {
        	            	+          "total_fields": {
        	            	+            "limit": 3000
        	            	+          },
        	            	+          "nested_objects": {
        	            	+            "limit": 50
        	            	+          }
        	            	+        },
        	            	+        "default_pipeline": "jaeger-trace-time-to-timestamp",
        	            	+        "lifecycle": {
        	            	+          "name": "jaeger-test-policy",
        	            	+          "rollover_alias": "test-jaeger-span-write"
        	            	+        }
        	            	+      }
        	            	+    },
        	            	+    "mappings": {
        	            	+      "dynamic_templates": [
        	            	+        {
        	            	+          "span_tags_map": {
        	            	+            "path_match": "tag.*",
        	            	+            "mapping": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            }
        	            	+          }
        	            	+        },
        	            	+        {
        	            	+          "process_tags_map": {
        	            	+            "path_match": "process.tag.*",
        	            	+            "mapping": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            }
        	            	+          }
        	            	+        }
        	            	+      ],
        	            	+      "properties": {
        	            	+        "traceID": {
        	            	+          "type": "keyword",
        	            	+          "ignore_above": 256
        	            	+        },
        	            	+        "spanID": {
        	            	+          "type": "keyword",
        	            	+          "ignore_above": 256
        	            	+        },
        	            	+        "operationName": {
        	            	+          "type": "keyword",
        	            	+          "ignore_above": 256
        	            	+        },
        	            	+        "parentSpanID": {
        	            	+          "type": "keyword",
        	            	+          "ignore_above": 256
        	            	+        },
        	            	+        "startTime": {
        	            	+          "type": "date",
        	            	+          "format": "epoch_millis"
        	            	+        },
        	            	+        "startTimeMillis": {
        	            	+          "type": "date",
        	            	+          "format": "epoch_millis"
        	            	+        },
        	            	+        "duration": {
        	            	+          "type": "long"
        	            	+        },
        	            	+        "flags": {
        	            	+          "type": "integer"
        	            	+        },
        	            	+        "logs": {
        	            	+          "type": "nested",
        	            	+          "dynamic": false,
        	            	+          "properties": {
        	            	+            "timestamp": {
        	            	+              "type": "date",
        	            	+              "format": "epoch_millis"
        	            	+            },
        	            	+            "fields": {
        	            	+              "type": "nested",
        	            	+              "dynamic": false,
        	            	+              "properties": {
        	            	+                "key": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	                 },
        	            	-                "default_pipeline": "jaeger-trace-time-to-timestamp",
        	            	-                "lifecycle": {
        	            	-                    "name": "jaeger-test-policy",
        	            	-                    "rollover_alias": "test-jaeger-span-write"
        	            	+                "value": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	+                },
        	            	+                "type": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	                 }
        	            	+              }
        	            	             }
        	            	+          }
        	            	         },
        	            	-        "mappings": {
        	            	-            "dynamic_templates": [
        	            	-                {
        	            	-                    "span_tags_map": {
        	            	-                        "path_match": "tag.*",
        	            	-                        "mapping": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        }
        	            	-                    }
        	            	+        "process": {
        	            	+          "properties": {
        	            	+            "serviceName": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            },
        	            	+            "tag": {
        	            	+              "type": "object"
        	            	+            },
        	            	+            "tags": {
        	            	+              "type": "nested",
        	            	+              "dynamic": false,
        	            	+              "properties": {
        	            	+                "key": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	                 },
        	            	-                {
        	            	-                    "process_tags_map": {
        	            	-                        "path_match": "process.tag.*",
        	            	-                        "mapping": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        }
        	            	-                    }
        	            	+                "value": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	+                },
        	            	+                "type": {
        	            	+                  "type": "keyword",
        	            	+                  "ignore_above": 256
        	            	                 }
        	            	-            ],
        	            	-            "properties": {
        	            	-                "traceID": {
        	            	-                    "type": "keyword",
        	            	-                    "ignore_above": 256
        	            	-                },
        	            	-                "spanID": {
        	            	-                    "type": "keyword",
        	            	-                    "ignore_above": 256
        	            	-                },
        	            	-                "operationName": {
        	            	-                    "type": "keyword",
        	            	-                    "ignore_above": 256
        	            	-                },
        	            	-                "parentSpanID": {
        	            	-                    "type": "keyword",
        	            	-                    "ignore_above": 256
        	            	-                },
        	            	-                "startTime": {
        	            	-                    "type": "date",
        	            	-                    "format": "epoch_millis"
        	            	-                },
        	            	-                "startTimeMillis": {
        	            	-                    "type": "date",
        	            	-                    "format": "epoch_millis"
        	            	-                },
        	            	-                "duration": {
        	            	-                    "type": "long"
        	            	-                },
        	            	-                "flags": {
        	            	-                    "type": "integer"
        	            	-                },
        	            	-                "logs": {
        	            	-                    "type": "nested",
        	            	-                    "dynamic": false,
        	            	-                    "properties": {
        	            	-                        "timestamp": {
        	            	-                            "type": "date",
        	            	-                            "format": "epoch_millis"
        	            	-                        },
        	            	-                        "fields": {
        	            	-                            "type": "nested",
        	            	-                            "dynamic": false,
        	            	-                            "properties": {
        	            	-                                "key": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                },
        	            	-                                "value": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                },
        	            	-                                "type": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                }
        	            	-                            }
        	            	-                        }
        	            	-                    }
        	            	-                },
        	            	-                "process": {
        	            	-                    "properties": {
        	            	-                        "serviceName": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        },
        	            	-                        "tag": {
        	            	-                            "type": "object"
        	            	-                        },
        	            	-                        "tags": {
        	            	-                            "type": "nested",
        	            	-                            "dynamic": false,
        	            	-                            "properties": {
        	            	-                                "key": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                },
        	            	-                                "value": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                },
        	            	-                                "type": {
        	            	-                                    "type": "keyword",
        	            	-                                    "ignore_above": 256
        	            	-                                }
        	            	-                            }
        	            	-                        }
        	            	-                    }
        	            	-                },
        	            	-                "references": {
        	            	-                    "type": "nested",
        	            	-                    "dynamic": false,
        	            	-                    "properties": {
        	            	-                        "refType": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        },
        	            	-                        "traceID": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        },
        	            	-                        "spanID": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        }
        	            	-                    }
        	            	-                },
        	            	-                "tag": {
        	            	-                    "type": "object"
        	            	-                },
        	            	-                "tags": {
        	            	-                    "type": "nested",
        	            	-                    "dynamic": false,
        	            	-                    "properties": {
        	            	-                        "key": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        },
        	            	-                        "value": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        },
        	            	-                        "type": {
        	            	-                            "type": "keyword",
        	            	-                            "ignore_above": 256
        	            	-                        }
        	            	-                    }
        	            	-                }
        	            	+              }
        	            	             }
        	            	+          }
        	            	+        },
        	            	+        "references": {
        	            	+          "type": "nested",
        	            	+          "dynamic": false,
        	            	+          "properties": {
        	            	+            "refType": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            },
        	            	+            "traceID": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            },
        	            	+            "spanID": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            }
        	            	+          }
        	            	+        },
        	            	+        "tag": {
        	            	+          "type": "object"
        	            	+        },
        	            	+        "tags": {
        	            	+          "type": "nested",
        	            	+          "dynamic": false,
        	            	+          "properties": {
        	            	+            "key": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            },
        	            	+            "value": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            },
        	            	+            "type": {
        	            	+              "type": "keyword",
        	            	+              "ignore_above": 256
        	            	+            }
        	            	+          }
        	            	         }
        	            	+      }
        	            	     }
        	            	+  }
        	            	 }
        	            	+
        	Test:       	TestMappingBuilderGetMapping/jaeger-span
=== RUN   TestMappingBuilderGetMapping/jaeger-span#01
=== RUN   TestMappingBuilderGetMapping/jaeger-span#02
--- FAIL: TestMappingBuilderGetMapping (0.00s)
    --- FAIL: TestMappingBuilderGetMapping/jaeger-span (0.00s)
    --- PASS: TestMappingBuilderGetMapping/jaeger-span#01 (0.00s)
    --- PASS: TestMappingBuilderGetMapping/jaeger-span#02 (0.00s)
FAIL
FAIL	github.com/jaegertracing/jaeger/internal/storage/v1/elasticsearch/mappings	0.008s
FAIL


- Populate IngestPipelineName in factory.go

- Update TestSpanReaderIndices expectation for default data stream behavior

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
Copilot AI review requested due to automatic review settings February 19, 2026 13:21
Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
@SoumyaRaikwar SoumyaRaikwar force-pushed the feature/es-datastream-support branch from f4402fa to b2105f2 Compare February 19, 2026 13:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (4)

internal/storage/v1/elasticsearch/factory.go:190

  • The default ingest pipeline name here ("jaeger-trace-to-timestamp") doesn’t match the name used in the mapping fixtures/tests ("jaeger-trace-time-to-timestamp") and the PR description. If the template renders default_pipeline with this value, indexing will fail unless a pipeline with the exact name exists. Consider aligning the default pipeline name (and any docs/scripts) to a single canonical value.
const (
	defaultILMPolicyName      = "jaeger-ilm-policy"
	defaultIngestPipelineName = "jaeger-trace-to-timestamp"
)

internal/storage/elasticsearch/config/config.go:795

  • UseDataStream is documented as requiring ES 7.9+/OpenSearch 2.0+ (and the PR description mentions validation), but Validate() currently only checks alias incompatibility. Consider adding a validation that rejects UseDataStream=true when an explicit Version is configured below the minimum supported version (and/or when the detected backend is known not to support data streams) to fail fast on unsupported setups.
	// Data streams are used for spans, so explicit span aliases are incompatible
	// with UseDataStream. Service aliases remain valid since services don't use data streams.
	if c.UseDataStream && hasSpanAliases {
		return errors.New("UseDataStream cannot be enabled together with explicit span aliases (span_read_alias, span_write_alias)")
	}
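
The fail-fast check suggested above could look roughly like the following sketch. It is not the PR's code: `dataStreamSettings`, its fields, and `minDataStreamMajor` are hypothetical names, and it assumes the explicitly configured version is a major version number only (with 0 meaning "auto-detect"), so it can reject majors below 7 but cannot tell 7.8 from 7.9.

```go
package main

import (
	"errors"
	"fmt"
)

// minDataStreamMajor is an assumed lower bound: the review comment cites
// ES 7.9+ as the requirement, so any major version below 7 is rejected.
const minDataStreamMajor = 7

// dataStreamSettings is a hypothetical, trimmed-down stand-in for the real
// configuration struct; it holds only the fields this check needs.
type dataStreamSettings struct {
	UseDataStream  bool
	Version        uint // explicitly configured major version; 0 means auto-detect (assumption)
	HasSpanAliases bool
}

// validate rejects combinations that cannot work with data streams.
func (s dataStreamSettings) validate() error {
	if !s.UseDataStream {
		return nil
	}
	if s.HasSpanAliases {
		return errors.New("UseDataStream cannot be enabled together with explicit span aliases")
	}
	if s.Version != 0 && s.Version < minDataStreamMajor {
		return fmt.Errorf("UseDataStream requires Elasticsearch %d.9+, but version %d is configured",
			minDataStreamMajor, s.Version)
	}
	return nil
}

func main() {
	fmt.Println(dataStreamSettings{UseDataStream: true, Version: 6}.validate())
	fmt.Println(dataStreamSettings{UseDataStream: true, Version: 8}.validate())
}
```

A check like this only covers explicitly configured versions; a minor-version or OpenSearch check would presumably have to run after the backend version is detected.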

internal/storage/v1/elasticsearch/spanstore/writer.go:101

  • The block comment here reads like in-line development notes/questions (e.g., speculation about whether p.Client() returns a new client) rather than stable code documentation. This makes the constructor harder to read and can become stale quickly. Consider replacing it with a concise explanation of the intended version/config responsibility (or removing it if it’s no longer needed).
	// We can't check the version here because the client might not be ready.
	// However, SpanWriter is lazy, so we can check it when we need it?
	// Actually, NewSpanWriter is not lazy about creating ServiceOperationStorage.
	// But p.Client is a factory function.
	// Let's assume we can get a client instance here to check version?
	// p.Client() creates a NEW client or returns existing?
	// Looking at factory.go: f.getClient returns the stored client.

	// We rely on factory to populate p.UseDataStream based on config or version detection.
	useDataStream := p.UseDataStream

docs/adr/004-elasticsearch-data-streams.md:90

  • This ADR’s ingest pipeline example copies startTime into @timestamp, but Jaeger’s ES span model uses startTime as microseconds and startTimeMillis as the millis/date field. Copying startTime directly would yield incorrect @timestamp values unless the pipeline converts units. Consider updating the example to copy from startTimeMillis (or to include a script/convert processor) to reflect the actual stored fields.
### Handling @timestamp

Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:

```json
{
  "description": "Copy startTime to @timestamp for Data Stream compatibility",
  "processors": [
    { "set": { "field": "@timestamp", "copy_from": "startTime" } }
  ]
}
```




- Fix service index logic in data stream mode to support aliases.

- Update index templates to use long for timestamps to match Jaeger model.

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
Changed the startTime mapping type to long in the ES8 mapping templates. Fixed flaky anonymizer tests by using a robust directory trigger instead of chmod. Verified backward compatibility for legacy index trace lookups.

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
Copilot AI review requested due to automatic review settings February 20, 2026 12:36

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@SoumyaRaikwar

@yurishkuro Regarding your concerns about backward compatibility, here is a detailed breakdown of how the Data Stream implementation is designed to be completely backward compatible:

  1. Opt-In Only: Data Stream support is strictly opt-in. Users must explicitly set es.use_data_stream: true (or the equivalent environment variable) to enable this feature. If this configuration is not provided, Jaeger behaves exactly as it did before, using the traditional time-based indices with manual rollover aliases (jaeger-span-write, jaeger-span-read). This ensures zero disruption for users who choose not to migrate.
  2. Stable Field Types: In the Elasticsearch mappings (templates for versions 6, 7, and 8), the field types for timestamps have not changed in a way that breaks existing queries. startTime is mapped as long (for microsecond precision), and startTimeMillis is mapped as date (millisecond precision, required by ES for range queries across different indices/data streams).
  3. Dual Lookups (Read Path): When es.use_data_stream: true is enabled, the SpanReader is designed to be fully backward compatible. It automatically queries both the new Data Stream (jaeger-span-ds) and the legacy indices (e.g., jaeger-span-*). This means users can enable Data Streams without needing to re-index their old data. Legacy indices will simply age out naturally according to their retention settings, while all new data flows into the Data Stream.
  4. Range Queries Compatibility: The SpanReader continues to use the startTimeMillis field for its range queries. Because this field is consistently mapped as date across both legacy indices and the new Data Stream components, Elasticsearch can natively and efficiently query across both storage formats simultaneously without issue.
  5. No Changes to Services/Dependencies: As outlined in the ADR, the hybrid model ensures that Services and Dependencies are still stored in standard indices because they require document updates (for deduplication), which Data Streams do not support (append-only). This preserves the existing logic for these entities.
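
The dual-lookup behaviour in point 3 can be pictured with the small sketch below. It is not the SpanReader code from this PR: `readTargets` and `searchExpression` are hypothetical helpers, and the names `jaeger-span-ds` and `jaeger-span-*` are taken from the description above rather than from the implementation, which may compute dated index names or honour an index prefix.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	spanDataStream    = "jaeger-span-ds" // data stream name quoted in the comment above
	legacySpanIndices = "jaeger-span-*"  // legacy index pattern quoted in the comment above
)

// readTargets lists what a reader would search. With data streams enabled it
// returns both the data stream and the legacy pattern, so old data stays visible.
func readTargets(useDataStream bool) []string {
	if !useDataStream {
		return []string{legacySpanIndices}
	}
	return []string{spanDataStream, legacySpanIndices}
}

// searchExpression joins the targets into the comma-separated form that
// Elasticsearch accepts as a multi-target search expression.
func searchExpression(useDataStream bool) string {
	return strings.Join(readTargets(useDataStream), ",")
}

func main() {
	fmt.Println(searchExpression(false))
	fmt.Println(searchExpression(true))
}
```

Keeping the legacy pattern in the list is what lets old indices age out on their own schedule without a re-index.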

In summary, for users who do not enable the feature, there is no change. For users who do enable it, the transition is seamless as Jaeger reads from both old and new data sources.

(Note: I've also reverted the unrelated cmd/anonymizer test changes I mistakenly included in this PR to keep it focused.)

@SoumyaRaikwar SoumyaRaikwar force-pushed the feature/es-datastream-support branch from 511b1c6 to ef062c9 Compare February 20, 2026 15:35
Copilot AI review requested due to automatic review settings February 20, 2026 15:37
@SoumyaRaikwar SoumyaRaikwar force-pushed the feature/es-datastream-support branch from ef062c9 to 9303539 Compare February 20, 2026 15:37

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Labels

area/storage, changelog:new-feature, enhancement, storage/elasticsearch

4 participants