docs: Add ADR for Elasticsearch Data Streams #7974
@@ -0,0 +1,146 @@
# Elasticsearch Data Streams for Span Storage

## Status

Proposed

## Context

Jaeger's Elasticsearch storage backend currently uses time-based indices with manual rollover aliases (`jaeger-span-write`, `jaeger-span-read`) for span storage. While functional, this approach has operational challenges.

### Current Behavior

The existing implementation in [`internal/storage/v2/elasticsearch/`](../../internal/storage/v2/elasticsearch/) manages span indices through:

1. **Manual Rollover Aliases**: Requires explicit alias configuration and rollover triggers
2. **Explicit Index Naming**: Index names follow `jaeger-span-YYYY-MM-DD` pattern
3. **Separate ILM Configuration**: Users must configure Index Lifecycle Management policies independently

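The explicit naming scheme in item 2 can be illustrated with a minimal sketch. It assumes the index date is derived from the span start time in UTC and that the start time is given in epoch microseconds; the function name `daily_span_index` and the default prefix argument are hypothetical and are not taken from the Jaeger codebase.

```python
from datetime import datetime, timezone


def daily_span_index(start_time_micros: int, prefix: str = "jaeger-span") -> str:
    """Return a daily index name of the form <prefix>-YYYY-MM-DD.

    Assumption: the date is taken from the span start time in UTC.
    """
    # Integer division avoids float rounding on large microsecond values.
    seconds = start_time_micros // 1_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{prefix}-{day:%Y-%m-%d}"


# A span that started at 2026-02-10T00:00:00Z lands in that day's index.
index_name = daily_span_index(1_770_681_600_000_000)
```

Because the writer has to compute this name for every span, the index layout is part of the client's responsibilities, which is the coupling that Data Streams are meant to remove.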
### Problems

1. **Operational Overhead**: Managing rollover aliases and ILM policies requires significant operational knowledge.
2. **Configuration Complexity**: Multiple interdependent flags (`UseILM`, `CreateAliases`, `IndexRolloverFrequencySpans`) create potential for misconfiguration.
3. **Modern ES Features Unused**: Elasticsearch 7.9+ and OpenSearch 2.x+ natively support Data Streams, which simplify time-series data management.

### Data Streams Overview

[Elasticsearch Data Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html) are the native solution for append-only time-series data:

- **Automatic Rollover**: Built-in index lifecycle management
- **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
- **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates

Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

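As a rough illustration of the append-only write path, the sketch below builds a Bulk API request body that uses only `create` actions. It is not Jaeger's writer: the function name `bulk_create_body` and the sample span fields are illustrative assumptions, and the body would be sent to `POST /<data-stream>/_bulk` by whatever HTTP client is in use.

```python
import json


def bulk_create_body(spans: list) -> str:
    """Build an NDJSON Bulk API body containing only `create` actions."""
    lines = []
    for span in spans:
        # Action line: `create` without an `_id`, so the server generates one.
        lines.append(json.dumps({"create": {}}))
        # Source line: the span document itself.
        lines.append(json.dumps(span))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative document; the field values are made up for the example.
body = bulk_create_body([
    {"traceID": "abc", "spanID": "1", "@timestamp": 1770681600000},
])
```

Omitting `_id` from the action line lets the server assign document IDs, which fits the immutable nature of span data.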
Comment on lines +30 to +33

Original lines:

- **Simplified Writes**: Single endpoint for all writes (`POST /<data-stream>/_doc`)
- **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates

Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

Suggested change:

- **Simplified Writes**: Single endpoint for all writes (for example, `POST /<data-stream>/_doc?op_type=create`, `POST /<data-stream>/_create`, or Bulk API `create` actions)
- **Integrated ILM/ISM**: Lifecycle policies referenced directly in index templates

Data Streams only support `create` operations (append-only). Indexing requests must use `op_type=create` (or the Bulk API `create` action); standard `index` operations are rejected. Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

Copilot AI, Feb 10, 2026
This sentence implies documents in data streams can never be deleted/updated by ID. More precisely, Elasticsearch/OpenSearch disallow update/delete requests targeting the data stream name; such operations require targeting the backing index directly (and may still be possible, though discouraged). Clarifying this nuance will prevent readers from assuming deletion is impossible in all cases.
Original line:

Data Streams only support `create` operations (append-only). Documents cannot be updated or deleted by ID, which makes them ideal for immutable trace data.

Suggested change:

When targeting the data stream name, Data Stream APIs only support `create` operations (append-only). While documents in the underlying backing indices could technically be updated or deleted by ID, Jaeger treats span data as immutable and does not perform such operations, which makes Data Streams ideal for trace data.

SoumyaRaikwar marked this conversation as resolved.
Copilot AI, Feb 10, 2026
The ingest pipeline example copies startTime into @timestamp, but in Jaeger’s ES span documents startTime is stored as microseconds (long), while startTimeMillis is the epoch-millis value intended for date fields. Copying startTime directly would produce an incorrect @timestamp (orders of magnitude too large) unless you also convert units. Consider copying from startTimeMillis or using a script/convert processor to map micros → millis/date.
Original lines:

Data Streams require a `@timestamp` field. An ingest pipeline copies `startTime` to `@timestamp`:

```json
{
  "description": "Copy startTime to @timestamp for Data Stream compatibility",
  "processors": [
    { "set": { "field": "@timestamp", "copy_from": "startTime" } }
```

Suggested change:

Data Streams require a `@timestamp` field. An ingest pipeline copies `startTimeMillis` to `@timestamp`:

```json
{
  "description": "Copy startTimeMillis to @timestamp for Data Stream compatibility",
  "processors": [
    { "set": { "field": "@timestamp", "copy_from": "startTimeMillis" } }
```
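
The unit mismatch described in this comment can be checked with a small sketch. It assumes, as the comment states, that `startTime` holds epoch microseconds and `startTimeMillis` holds epoch milliseconds; the helper names `micros_to_millis`, `as_utc`, and `approx_year` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365.2425 * 86_400


def micros_to_millis(start_time_micros: int) -> int:
    """Convert epoch microseconds to epoch milliseconds (truncating)."""
    return start_time_micros // 1000


def as_utc(epoch_millis: int) -> datetime:
    """Interpret a value as epoch milliseconds and return a UTC datetime."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)


def approx_year(epoch_millis: int) -> float:
    """Rough calendar year implied by reading a value as epoch milliseconds."""
    return 1970 + (epoch_millis / 1000) / SECONDS_PER_YEAR


start_time = 1_770_681_600_000_000  # microseconds, as in `startTime`

# Correct: convert to milliseconds before treating the value as a date.
correct = as_utc(micros_to_millis(start_time))

# Incorrect: reading the raw microsecond value as milliseconds gives a year
# tens of thousands of years in the future.
wrong_year = approx_year(start_time)
```

Copying `startTimeMillis` avoids the conversion step entirely, which is why the suggestion above targets that field rather than adding a script processor.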