docs: Add ADR for Elasticsearch Data Streams #7974

SoumyaRaikwar wants to merge 5 commits into jaegertracing:main
Conversation
Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

| Coverage Diff | main | #7974 | +/- |
|---|---|---|---|
| Coverage | 95.47% | 95.50% | +0.03% |
| Files | 316 | 316 | |
| Lines | 16756 | 16756 | |
| Hits | 15997 | 16003 | +6 |
| Misses | 593 | 589 | -4 |
| Partials | 166 | 164 | -2 |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Pull request overview
This PR adds an Architecture Decision Record (ADR-004) documenting the design for implementing Elasticsearch Data Streams support in Jaeger's storage backend. The ADR proposes a hybrid model where spans use Data Streams for efficient time-series storage, while services and dependencies remain in standard indices that support updates.
Changes:
- Added ADR-004 documenting the decision to use Elasticsearch Data Streams for span storage
- Updated ADR README to include the new ADR-004 entry with proper numbering and linking
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/adr/README.md | Added ADR-004 entry to the index of architectural decisions |
| docs/adr/004-elasticsearch-data-streams.md | Complete ADR documenting Data Streams design including context, decision rationale, configuration, consequences, and implementation phases |
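As a rough illustration of the kind of configuration the ADR describes, the span side of the hybrid model would typically be enabled through a composable index template that declares a data stream. The sketch below is for orientation only; the template pattern, ILM policy name (`jaeger-ilm-policy`), pipeline name (`jaeger-span-timestamp`), priority, and abbreviated mappings are illustrative assumptions, not the ADR's exact values. It would be installed with `PUT _index_template/jaeger-span-ds`:

```json
{
  "index_patterns": ["jaeger-span-ds*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.lifecycle.name": "jaeger-ilm-policy",
      "index.default_pipeline": "jaeger-span-timestamp"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "traceID": { "type": "keyword" },
        "spanID": { "type": "keyword" },
        "startTimeMillis": { "type": "date" }
      }
    }
  }
}
```

Once such a template is in place, the first write to `jaeger-span-ds` typically creates the stream and its first backing index, while services and dependencies would keep using standard index templates without the `data_stream` object.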
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Soumya Raikwar <164396577+SoumyaRaikwar@users.noreply.github.com>

- Change jaeger-ds-span to jaeger-span-ds for consistency with existing Jaeger naming patterns

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
@jkowall and @yurishkuro I have addressed all the reviews from Copilot.
@yurishkuro could you please review this ADR I have added for #7768?
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
> Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:
>
> ```json
> {
>   "description": "Copy startTime to @timestamp for Data Stream compatibility",
>   "processors": [
>     { "set": { "field": "@timestamp", "copy_from": "startTime" } }
> ```
The ingest pipeline example copies startTime into @timestamp, but in Jaeger’s ES span documents startTime is stored as microseconds (long), while startTimeMillis is the epoch-millis value intended for date fields. Copying startTime directly would produce an incorrect @timestamp (orders of magnitude too large) unless you also convert units. Consider copying from startTimeMillis or using a script/convert processor to map micros → millis/date.
Suggested change:

> Data Streams require a `@timestamp` field. An ingest pipeline copies `startTimeMillis` to `@timestamp`:
>
> ```json
> {
>   "description": "Copy startTimeMillis to @timestamp for Data Stream compatibility",
>   "processors": [
>     { "set": { "field": "@timestamp", "copy_from": "startTimeMillis" } }
> ```
> - **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
> - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
>
> Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
The write-path example (POST /<data-stream>/_doc) is a bit misleading for data streams: writes must be op_type=create (and Jaeger typically writes via Bulk). It would be clearer to mention op_type=create (or _create) and/or the Bulk API create action to avoid readers trying index operations that data streams reject.
Suggested change:

> - **Simplified Writes**: Single endpoint for all writes (for example, `POST /<data-stream>/_doc?op_type=create`, `POST /<data-stream>/_create`, or Bulk API `create` actions)
> - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
>
> Data Streams only support `create` operations (append-only). Indexing requests must use `op_type=create` (or the Bulk API `create` action); standard `index` operations are rejected. Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
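To make the write path concrete, here is a hedged sketch of what a bulk write against the data stream could look like, sent to `POST /jaeger-span-ds/_bulk` as newline-delimited JSON. The stream name follows the naming adopted in this PR; the span fields and values are abbreviated and illustrative:

```json
{ "create": {} }
{ "traceID": "0af7651916cd43dd8448eb211c80319c", "spanID": "b7ad6b7169203331", "operationName": "HTTP GET /api", "startTime": 1700000000000000, "startTimeMillis": 1700000000000 }
```

The action line must be `create`; an `{ "index": {} }` action targeting the stream name would be rejected, which is exactly the behavior the comment above asks the ADR to call out.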
> Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
This sentence implies documents in data streams can never be deleted/updated by ID. More precisely, Elasticsearch/OpenSearch disallow update/delete requests targeting the data stream name; such operations require targeting the backing index directly (and may still be possible, though discouraged). Clarifying this nuance will prevent readers from assuming deletion is impossible in all cases.
Suggested change:

> When targeting the data stream name, Data Stream APIs only support `create` operations (append-only). While documents in the underlying backing indices could technically be updated or deleted by ID, Jaeger treats span data as immutable and does not perform such operations, which makes Data Streams ideal for trace data.
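As a concrete illustration of that nuance: a direct `DELETE /jaeger-span-ds/_doc/<id>` against the stream name is rejected, while deletes that resolve to the backing indices remain possible, for example via the delete-by-query API. A sketch (the trace ID value is illustrative) of a body sent to `POST /jaeger-span-ds/_delete_by_query`:

```json
{
  "query": {
    "term": { "traceID": "0af7651916cd43dd8448eb211c80319c" }
  }
}
```

In practice, retention is expected to be handled by lifecycle policies deleting whole backing indices rather than per-document deletes, so this path matters mostly for ad-hoc cleanup.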
I prefer we iterate on the Google doc; it's too early for a Markdown doc, which is harder to comment on and debate.
Addresses the design documentation request from #7768
This ADR documents the decision to implement Elasticsearch Data Streams for span storage using a hybrid model:
- Spans → Data Streams (`jaeger-ds-span`): append-only time-series data
- Services and dependencies → standard indices that support updates

Key decisions documented:
References: