docs: Add ADR for Elasticsearch Data Streams #7974

Open
SoumyaRaikwar wants to merge 5 commits into jaegertracing:main from SoumyaRaikwar:docs/adr-elasticsearch-data-streams

@SoumyaRaikwar
Contributor

Addresses the design documentation request from #7768

This ADR documents the decision to implement Elasticsearch Data Streams for span storage using a hybrid model:

  • Spans → Data Streams (jaeger-ds-span) - append-only time-series data
  • Services/Dependencies → Standard indices - require updates/deduplication

Key decisions documented:

  • Minimum ES 7.9+ / OpenSearch 2.0+ requirement
  • Auto-detection of ES vs OpenSearch for ILM/ISM policies
  • Backward compatibility via dual-lookup (no re-indexing required)
  • Phased implementation approach
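
The version requirement and the ILM/ISM auto-detection listed above can be illustrated with a minimal sketch. It assumes the backend distribution and version string are already known (for example, from the cluster's root endpoint). The names `MIN_VERSIONS`, `supports_data_streams`, and `lifecycle_kind` are hypothetical and are not taken from the ADR or from Jaeger's code.

```python
from typing import Tuple

# Minimum versions as stated in the PR description (ES 7.9+, OpenSearch 2.0+).
# The dictionary and function names are illustrative assumptions.
MIN_VERSIONS = {
    "elasticsearch": (7, 9),
    "opensearch": (2, 0),
}


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '7.10.2'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def supports_data_streams(distribution: str, version: str) -> bool:
    """Check the backend against the minimum version for Data Streams."""
    minimum = MIN_VERSIONS.get(distribution.lower())
    if minimum is None:
        return False
    # Tuple comparison avoids the string-comparison trap where "7.10" < "7.9".
    return parse_version(version) >= minimum


def lifecycle_kind(distribution: str) -> str:
    """Elasticsearch manages lifecycles with ILM, OpenSearch with ISM."""
    name = distribution.lower()
    if name == "elasticsearch":
        return "ILM"
    if name == "opensearch":
        return "ISM"
    raise ValueError(f"unknown distribution: {distribution}")


if __name__ == "__main__":
    for dist, ver in [("elasticsearch", "7.10.2"), ("opensearch", "1.3.0")]:
        print(dist, ver, supports_data_streams(dist, ver), lifecycle_kind(dist))
```

Comparing parsed tuples rather than raw strings matters here because `"7.10.2"` sorts before `"7.9.0"` lexicographically even though it is the newer release.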

References:

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
@codecov
codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.50%. Comparing base (c51c3d9) to head (ce65f58).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7974      +/-   ##
==========================================
+ Coverage   95.47%   95.50%   +0.03%     
==========================================
  Files         316      316              
  Lines       16756    16756              
==========================================
+ Hits        15997    16003       +6     
+ Misses        593      589       -4     
+ Partials      166      164       -2     
| Flag | Coverage Δ |
|---|---|
| badger_v1 | 9.13% <ø> (ø) |
| badger_v2 | 1.34% <ø> (ø) |
| cassandra-4.x-v1-manual | 13.32% <ø> (ø) |
| cassandra-4.x-v2-auto | 1.33% <ø> (ø) |
| cassandra-4.x-v2-manual | 1.33% <ø> (ø) |
| cassandra-5.x-v1-manual | 13.32% <ø> (ø) |
| cassandra-5.x-v2-auto | 1.33% <ø> (ø) |
| cassandra-5.x-v2-manual | 1.33% <ø> (ø) |
| clickhouse | 1.42% <ø> (ø) |
| elasticsearch-6.x-v1 | 16.90% <ø> (ø) |
| elasticsearch-7.x-v1 | 16.93% <ø> (ø) |
| elasticsearch-8.x-v1 | 17.08% <ø> (ø) |
| elasticsearch-8.x-v2 | 1.34% <ø> (ø) |
| elasticsearch-9.x-v2 | 1.34% <ø> (-0.05%) ⬇️ |
| grpc_v1 | 8.12% <ø> (ø) |
| grpc_v2 | 1.34% <ø> (ø) |
| kafka-3.x-v2 | 1.34% <ø> (ø) |
| memory_v2 | 1.34% <ø> (ø) |
| opensearch-1.x-v1 | 16.97% <ø> (ø) |
| opensearch-2.x-v1 | 16.97% <ø> (ø) |
| opensearch-2.x-v2 | 1.34% <ø> (ø) |
| opensearch-3.x-v2 | 1.34% <ø> (ø) |
| query | 1.34% <ø> (ø) |
| tailsampling-processor | 0.54% <ø> (ø) |
| unittests | 94.19% <ø> (+0.03%) ⬆️ |

Contributor

Copilot AI left a comment

Pull request overview

This PR adds Architecture Decision Record (ADR-004) documenting the design for implementing Elasticsearch Data Streams support in Jaeger's storage backend. The ADR proposes a hybrid model where spans use Data Streams for efficient time-series storage while services and dependencies remain in standard indices that support updates.

Changes:

  • Added ADR-004 documenting the decision to use Elasticsearch Data Streams for span storage
  • Updated ADR README to include the new ADR-004 entry with proper numbering and linking

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| docs/adr/README.md | Added ADR-004 entry to the index of architectural decisions |
| docs/adr/004-elasticsearch-data-streams.md | Complete ADR documenting Data Streams design including context, decision rationale, configuration, consequences, and implementation phases |

SoumyaRaikwar and others added 3 commits February 4, 2026 01:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Soumya Raikwar <164396577+SoumyaRaikwar@users.noreply.github.com>
- Change jaeger-ds-span to jaeger-span-ds for consistency with existing Jaeger naming patterns

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
@SoumyaRaikwar
Contributor Author

@jkowall and @yurishkuro, I have addressed all of the review comments from Copilot.

@SoumyaRaikwar
Contributor Author

@yurishkuro, could you please review this ADR I have added for #7768?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


Comment on lines +81 to +87
Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:

```json
{
  "description": "Copy startTime to @timestamp for Data Stream compatibility",
  "processors": [
    { "set": { "field": "@timestamp", "copy_from": "startTime" } }
```

Copilot AI Feb 10, 2026

The ingest pipeline example copies startTime into @timestamp, but in Jaeger’s ES span documents startTime is stored as microseconds (long), while startTimeMillis is the epoch-millis value intended for date fields. Copying startTime directly would produce an incorrect @timestamp (orders of magnitude too large) unless you also convert units. Consider copying from startTimeMillis or using a script/convert processor to map micros → millis/date.

Suggested change
- Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:
- ```json
- {
- "description": "Copy startTime to @timestamp for Data Stream compatibility",
- "processors": [
- { "set": { "field": "@timestamp", "copy_from": "startTime" } }
+ Data Streams require a `@timestamp` field. An ingest pipeline copies `startTimeMillis` to `@timestamp`:
+ ```json
+ {
+ "description": "Copy startTimeMillis to @timestamp for Data Stream compatibility",
+ "processors": [
+ { "set": { "field": "@timestamp", "copy_from": "startTimeMillis" } }
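
The unit mismatch raised in this comment can be checked with a few lines of arithmetic. The sketch below assumes, as the comment states, that `startTime` holds epoch microseconds and `startTimeMillis` holds epoch milliseconds. The sample value and the helper names are illustrative only.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def micros_to_millis(micros: int) -> int:
    """Convert epoch microseconds to epoch milliseconds."""
    return micros // 1000


def approx_year_if_read_as_millis(value: int) -> float:
    """Approximate calendar year when `value` is interpreted as epoch millis."""
    return 1970 + (value / 1000) / SECONDS_PER_YEAR


# Illustrative span start time in epoch microseconds (early 2026).
start_time_micros = 1_770_000_000_000_000
start_time_millis = micros_to_millis(start_time_micros)

# Read correctly as milliseconds, the value is a plausible timestamp.
correct = datetime.fromtimestamp(start_time_millis / 1000, tz=timezone.utc)

if __name__ == "__main__":
    print("startTimeMillis as date:", correct.isoformat())
    # Copying the microsecond value into a millisecond date field lands
    # tens of thousands of years in the future.
    print("startTime misread as millis, approx year:",
          round(approx_year_if_read_as_millis(start_time_micros)))
```

The misread value is off by a factor of 1000, which is why the suggestion copies from `startTimeMillis` instead of converting inside the pipeline.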

Comment on lines +30 to +33
- **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
- **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates

Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

Copilot AI Feb 10, 2026

The write-path example (POST /<data-stream>/_doc) is a bit misleading for data streams: writes must be op_type=create (and Jaeger typically writes via Bulk). It would be clearer to mention op_type=create (or _create) and/or the Bulk API create action to avoid readers trying index operations that data streams reject.

Suggested change
- - **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
- - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
- Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
+ - **Simplified Writes**: Single endpoint for all writes (for example, `POST /<data-stream>/_doc?op_type=create`, `POST /<data-stream>/_create`, or Bulk API `create` actions)
+ - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
+ Data Streams only support `create` operations (append-only). Indexing requests must use `op_type=create` (or the Bulk API `create` action); standard `index` operations are rejected. Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
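
To make the Bulk API `create` action concrete, here is a minimal sketch that builds a newline-delimited request body of the kind that could be sent to `POST /<data-stream>/_bulk`. The helper name `build_bulk_create_body` and the sample span fields are illustrative assumptions; this is not Jaeger's actual writer.

```python
import json
from typing import Iterable, Mapping


def build_bulk_create_body(spans: Iterable[Mapping[str, object]]) -> str:
    """Build an NDJSON bulk body that uses `create` actions only.

    Each document is preceded by a `{"create": {}}` action line. An `index`
    action in the same position would be rejected when the request targets
    a data stream by name.
    """
    lines = []
    for span in spans:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(span))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative span documents; real Jaeger spans carry more fields.
    sample = [
        {"traceID": "a1", "spanID": "b1", "startTimeMillis": 1770000000000},
        {"traceID": "a1", "spanID": "b2", "startTimeMillis": 1770000000005},
    ]
    print(build_bulk_create_body(sample), end="")
```

Because the target data stream is given in the request path, the action lines can stay empty rather than repeating an `_index` for every document.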

- **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
- **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates

Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

Copilot AI Feb 10, 2026

This sentence implies documents in data streams can never be deleted/updated by ID. More precisely, Elasticsearch/OpenSearch disallow update/delete requests targeting the data stream name; such operations require targeting the backing index directly (and may still be possible, though discouraged). Clarifying this nuance will prevent readers from assuming deletion is impossible in all cases.

Suggested change
- Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
+ When targeting the data stream name, Data Stream APIs only support `create` operations (append-only). While documents in the underlying backing indices could technically be updated or deleted by ID, Jaeger treats span data as immutable and does not perform such operations, which makes Data Streams ideal for trace data.
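
The distinction between the data stream name and its backing indices can be sketched as follows. A search hit's `_index` field names the backing index that holds the document, so a by-ID delete would have to be addressed there rather than to the stream. The function name and the backing index name in the example are hypothetical, and the ADR itself does not propose doing this.

```python
from typing import Mapping
from urllib.parse import quote


def delete_path_for_hit(hit: Mapping[str, str]) -> str:
    """Return the `DELETE` path for a document found via a search hit.

    The path targets the backing index reported in `_index`, not the data
    stream name, because by-ID deletes addressed to the stream are rejected.
    """
    index = quote(hit["_index"], safe="")
    doc_id = quote(hit["_id"], safe="")
    return f"/{index}/_doc/{doc_id}"


if __name__ == "__main__":
    # Hypothetical hit returned by a search against the data stream.
    hit = {"_index": ".ds-jaeger-span-ds-2026.02.10-000001", "_id": "abc123"}
    print("DELETE", delete_path_for_hit(hit))
```

This is shown only to clarify the nuance in the comment. Jaeger treats spans as immutable, so retention would normally be left to the lifecycle policy.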

@yurishkuro
Member

I prefer we iterate on the Google Doc; it's too early for a Markdown doc, which is harder to comment on and debate.

@github-actions github-actions bot added the waiting-for-author label (PR is waiting for author to respond to maintainer's comments) Feb 20, 2026

Labels

changelog:documentation, documentation, storage/elasticsearch, waiting-for-author (PR is waiting for author to respond to maintainer's comments)

3 participants