docs: Add ADR for Elasticsearch Data Streams #7974

SoumyaRaikwar wants to merge 5 commits into jaegertracing:main
Conversation
Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

| Coverage Diff | main | #7974 | +/- |
|---|---|---|---|
| Coverage | 95.47% | 95.50% | +0.03% |
| Files | 316 | 316 | |
| Lines | 16756 | 16756 | |
| Hits | 15997 | 16003 | +6 |
| Misses | 593 | 589 | -4 |
| Partials | 166 | 164 | -2 |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Pull request overview
This PR adds an Architecture Decision Record (ADR-004) documenting the design for implementing Elasticsearch Data Streams support in Jaeger's storage backend. The ADR proposes a hybrid model where spans use Data Streams for efficient time-series storage, while services and dependencies remain in standard indices that support updates.
Changes:
- Added ADR-004 documenting the decision to use Elasticsearch Data Streams for span storage
- Updated ADR README to include the new ADR-004 entry with proper numbering and linking
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/adr/README.md | Added ADR-004 entry to the index of architectural decisions |
| docs/adr/004-elasticsearch-data-streams.md | Complete ADR documenting Data Streams design including context, decision rationale, configuration, consequences, and implementation phases |
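As a rough illustration of the kind of configuration the ADR describes, the span side of the hybrid model would typically be enabled through a composable index template that declares a data stream. The sketch below is for orientation only; the template pattern, ILM policy name (`jaeger-ilm-policy`), pipeline name (`jaeger-span-timestamp`), priority, and abbreviated mappings are illustrative assumptions, not the ADR's exact values. It would be installed with `PUT _index_template/jaeger-span-ds`:

```json
{
  "index_patterns": ["jaeger-span-ds*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.lifecycle.name": "jaeger-ilm-policy",
      "index.default_pipeline": "jaeger-span-timestamp"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "traceID": { "type": "keyword" },
        "spanID": { "type": "keyword" },
        "startTimeMillis": { "type": "date" }
      }
    }
  }
}
```

Once such a template is in place, the first write to `jaeger-span-ds` typically creates the stream and its first backing index, while services and dependencies would keep using standard index templates without the `data_stream` object.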
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Soumya Raikwar <164396577+SoumyaRaikwar@users.noreply.github.com>

- Change jaeger-ds-span to jaeger-span-ds for consistency with existing Jaeger naming patterns

Signed-off-by: SoumyaRaikwar <somuraik@gmail.com>
@jkowall and @yurishkuro I have addressed all the reviews from Copilot.
@yurishkuro could you please review this ADR I have added for #7768?
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
> Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:
>
> ```json
> {
>   "description": "Copy startTime to @timestamp for Data Stream compatibility",
>   "processors": [
>     { "set": { "field": "@timestamp", "copy_from": "startTime" } }
> ```
The ingest pipeline example copies startTime into @timestamp, but in Jaeger’s ES span documents startTime is stored as microseconds (long), while startTimeMillis is the epoch-millis value intended for date fields. Copying startTime directly would produce an incorrect @timestamp (orders of magnitude too large) unless you also convert units. Consider copying from startTimeMillis or using a script/convert processor to map micros → millis/date.
Suggested change:

> Data Streams require a `@timestamp` field. An ingest pipeline copies `startTimeMillis` to `@timestamp`:
>
> ```json
> {
>   "description": "Copy startTimeMillis to @timestamp for Data Stream compatibility",
>   "processors": [
>     { "set": { "field": "@timestamp", "copy_from": "startTimeMillis" } }
> ```
> - **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
> - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
>
> Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
The write-path example (POST /<data-stream>/_doc) is a bit misleading for data streams: writes must be op_type=create (and Jaeger typically writes via Bulk). It would be clearer to mention op_type=create (or _create) and/or the Bulk API create action to avoid readers trying index operations that data streams reject.
Suggested change:

> - **Simplified Writes**: Single endpoint for all writes (for example, `POST /<data-stream>/_doc?op_type=create`, `POST /<data-stream>/_create`, or Bulk API `create` actions)
> - **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates
>
> Data Streams only support `create` operations (append-only). Indexing requests must use `op_type=create` (or the Bulk API `create` action); standard `index` operations are rejected. Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
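To make the write path concrete, here is a hedged sketch of what a bulk write against the data stream could look like, sent to `POST /jaeger-span-ds/_bulk` as newline-delimited JSON. The stream name follows the naming adopted in this PR; the span fields and values are abbreviated and illustrative:

```json
{ "create": {} }
{ "traceID": "0af7651916cd43dd8448eb211c80319c", "spanID": "b7ad6b7169203331", "operationName": "HTTP GET /api", "startTime": 1700000000000000, "startTimeMillis": 1700000000000 }
```

The action line must be `create`; an `{ "index": {} }` action targeting the stream name would be rejected, which is exactly the behavior the comment above asks the ADR to call out.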
> Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.
This sentence implies documents in data streams can never be deleted/updated by ID. More precisely, Elasticsearch/OpenSearch disallow update/delete requests targeting the data stream name; such operations require targeting the backing index directly (and may still be possible, though discouraged). Clarifying this nuance will prevent readers from assuming deletion is impossible in all cases.
Suggested change:

> When targeting the data stream name, Data Stream APIs only support `create` operations (append-only). While documents in the underlying backing indices could technically be updated or deleted by ID, Jaeger treats span data as immutable and does not perform such operations, which makes Data Streams ideal for trace data.
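As a concrete illustration of that nuance: a direct `DELETE /jaeger-span-ds/_doc/<id>` against the stream name is rejected, while deletes that resolve to the backing indices remain possible, for example via the delete-by-query API. A sketch (the trace ID value is illustrative) of a body sent to `POST /jaeger-span-ds/_delete_by_query`:

```json
{
  "query": {
    "term": { "traceID": "0af7651916cd43dd8448eb211c80319c" }
  }
}
```

In practice, retention is expected to be handled by lifecycle policies deleting whole backing indices rather than per-document deletes, so this path matters mostly for ad-hoc cleanup.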
I prefer we iterate on the Google doc; it's too early for a Markdown doc, which is harder to comment on and debate.
Addresses the design documentation request from #7768
This ADR documents the decision to implement Elasticsearch Data Streams for span storage using a hybrid model:
- Spans → Data Streams (`jaeger-ds-span`): append-only time-series data
- Services and dependencies → standard indices that support updates

Key decisions documented:
References: