Edit time series docs for clarity #3222
---
navigation_title: "Advanced topics"
applies_to:
  stack: ga
  serverless: ga
products:
  - id: elasticsearch
---

# Advanced topics for working with time series data streams

This section contains information about advanced concepts and operations for [time series data streams](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md):

- [](/manage-data/data-store/data-streams/time-bound-tsds.md)
- [](/manage-data/data-store/data-streams/reindex-tsds.md)
---
navigation_title: "Reindex a TSDS"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds-reindex.html
applies_to:
products:
  - id: elasticsearch
---

# Reindex a time series data stream [tsds-reindex]
> **Reviewer comment:** @kkrik-es if I understand correctly, this reindexing manual is suggesting to reindex the data of one data stream into a single backing index of another data stream. Right? If this is true, then I think we need to add a disclaimer before a user gets unpleasantly surprised. We could also mention the reindex data stream API that was added for upgrades; I will check if it works.
>
> **Reply:** Yeah, not ideal. It's orthogonal to this PR though; let's file an issue to provide a better path (I thought we had one).
>
> **Reply:** We should at least mention that the result will be a single index, like a note box or something. @marciw, what do you think? The rest is indeed orthogonal to this PR.
Reindexing allows you to copy documents from an old [time series data stream (TSDS)](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md) to a new one. All data streams support reindexing, but time series data streams require special handling due to their time-bound backing indices and strict timestamp acceptance windows.

To reindex, follow the steps on this page.

:::{note}
This process only applies to time series data streams without a [downsampling](/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md) configuration. To reindex a downsampled data stream, reindex the backing indices individually, then add them to a new, empty data stream.
:::
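The acceptance window behavior can be illustrated with a short sketch in plain Python (hypothetical window values; this is an illustration of the concept, not Elasticsearch code). A backing index only accepts documents whose `@timestamp` falls within `[index.time_series.start_time, index.time_series.end_time)`:

```python
from datetime import datetime

# Toy model of a TSDS backing index's timestamp acceptance window:
# start_time is inclusive, end_time is exclusive.
def accepts(doc_timestamp: str, start_time: str, end_time: str) -> bool:
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    ts = parse(doc_timestamp)
    return parse(start_time) <= ts < parse(end_time)

# A hypothetical backing index covering a four-hour window:
start, end = "2023-09-01T06:00:00Z", "2023-09-01T10:00:00Z"

print(accepts("2023-09-01T08:30:00Z", start, end))  # inside the window -> True
print(accepts("2023-09-01T10:00:00Z", start, end))  # end_time is exclusive -> False
```

A plain `POST /_reindex` into a TSDS fails for any document whose timestamp falls outside the current window, which is why the process below widens the window first.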

## Overview

These high-level steps summarize the process of reindexing a time series data stream. Each step is detailed in later sections.

1. Create an index template for the destination data stream.
2. Update the template with temporary settings for reindexing.
3. Run the reindex operation.
4. Revert the temporary index settings.
5. Perform a manual rollover to create a new backing index for incoming data.
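As a quick reference, the steps above can be sketched as the sequence of REST calls involved (using the example template and data stream names from this page):

```python
# Each step of the reindexing process, paired with the REST call it uses.
steps = [
    ("create destination template", "PUT _index_template/new-tsds-template"),
    ("add temporary reindex settings", "PUT _index_template/new-tsds-template"),
    ("copy the documents", "POST /_reindex"),
    ("revert temporary settings", "PUT _index_template/new-tsds-template"),
    ("manual rollover for new data", "POST new-tsds/_rollover/"),
]

for i, (step, call) in enumerate(steps, start=1):
    print(f"{i}. {step}: {call}")
```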

The examples on this page use Dev Tools [Console](/explore-analyze/query-filter/tools/console.md) syntax.

## Create the destination index template [tsds-reindex-create-template]

Create an index template for the new TSDS, using your preferred mappings and settings:

```console
PUT _index_template/new-tsds-template
{
  "index_patterns": ["new-tsds*"],
  "priority": 100,
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["dimension_field"]
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "dimension_field": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "metric_field": {
          "type": "double",
          "time_series_metric": "gauge"
        }
      }
    }
  }
}
```

> **Reviewer comment** (on `index.routing_path`): This setting gets auto-populated, let's remove it.

## Update the template for reindexing

To support the reindexing process, you need to temporarily modify the template:

1. Set the `index.time_series.start_time` and `index.time_series.end_time` index settings to match the lowest and highest `@timestamp` values in the old data stream. This way, the initial backing index can accept all data contained in the source data stream.
2. Set `index.number_of_shards` to the sum of the primary shards of all backing indices of the old data stream. This helps maintain the same level of search parallelism, because each shard is processed in a separate thread (or more).
3. Clear the `index.lifecycle.name` index setting (if any), to prevent ILM from modifying the destination data stream during reindexing.
4. (Optional) Set `index.number_of_replicas` to zero, to speed up reindexing. Because the data gets copied in the reindexing process, you don't need replicas to protect against data loss.
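A small sketch of how the temporary values can be derived (shard counts and timestamps below are invented; in practice you would read them from the source data stream's `_settings` and a min/max aggregation on `@timestamp`):

```python
# Hypothetical per-backing-index metadata for the source data stream:
source_backing_indices = [
    {"primary_shards": 2, "min_ts": "2023-01-07T12:00:00Z", "max_ts": "2023-09-01T00:00:00Z"},
    {"primary_shards": 4, "min_ts": "2023-09-01T00:00:00Z", "max_ts": "2024-11-30T08:00:00Z"},
]

temporary_settings = {
    # Window wide enough for every document in the old data stream.
    # end_time is exclusive, so pick a value safely after the highest @timestamp.
    "index.time_series.start_time": min(i["min_ts"] for i in source_backing_indices),
    "index.time_series.end_time": "2025-01-01T00:00:00Z",
    # Sum of primary shards across all source backing indices:
    "index.number_of_shards": sum(i["primary_shards"] for i in source_backing_indices),
    # Skip replicas during the copy, and pause ILM:
    "index.number_of_replicas": 0,
    "index.lifecycle.name": None,
}

print(temporary_settings["index.number_of_shards"])  # 6
```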

```console
PUT _index_template/new-tsds-template
{
  "index_patterns": ["new-tsds*"],
  "priority": 100,
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["dimension_field"],
      "index.time_series.start_time": "2023-01-01T00:00:00Z", <1>
      "index.time_series.end_time": "2025-01-01T00:00:00Z", <2>
      "index.number_of_shards": 6, <3>
      "index.number_of_replicas": 0, <4>
      "index.lifecycle.name": null <5>
    },
    "mappings": {
      ...
    }
  }
}
```

1. Lowest timestamp value in the old data stream
2. Highest timestamp value in the old data stream
3. Sum of the primary shards from all source backing indices
4. Speed up reindexing
5. Pause ILM

## Create the destination data stream and reindex [tsds-reindex-op]

Run the reindex operation:

```console
POST /_reindex
{
  "source": {
    "index": "old-tsds"
  },
  "dest": {
    "index": "new-tsds",
    "op_type": "create"
  }
}
```
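A toy model of what `"op_type": "create"` means for this operation (plain Python, not the reindex implementation): a document is only written when its `_id` does not already exist in the destination, so a retried reindex does not overwrite previously copied documents.

```python
# Simulates reindexing with create semantics: existing ids are skipped.
def reindex(source: dict, dest: dict) -> int:
    copied = 0
    for doc_id, doc in source.items():
        if doc_id not in dest:  # "op_type": "create" skips existing ids
            dest[doc_id] = doc
            copied += 1
    return copied

old_tsds = {"1": {"@timestamp": "2023-09-01T08:00:00Z"},
            "2": {"@timestamp": "2023-09-01T09:00:00Z"}}
new_tsds = {}

print(reindex(old_tsds, new_tsds))  # 2 documents copied
print(reindex(old_tsds, new_tsds))  # 0 on a re-run; existing ids are skipped
```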
## Restore the destination index template [tsds-reindex-restore]

After reindexing completes, update the index template again to remove the temporary settings:

* Remove the overrides for `index.time_series.start_time` and `index.time_series.end_time`.
* Restore the values of `index.number_of_shards`, `index.number_of_replicas`, and `index.lifecycle.name` (as applicable).

```console
PUT _index_template/new-tsds-template
{
  "index_patterns": ["new-tsds*"],
  "priority": 100,
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["dimension_field"],
      "index.number_of_replicas": 1, <1>
      "index.lifecycle.name": "my-ilm-policy" <2>
    },
    "mappings": {
      ...
    }
  }
}
```

1. Restore replicas
2. Re-enable ILM

## Roll over for new data

Create a new backing index with a manual rollover request:

```console
POST new-tsds/_rollover/
```

This creates a new backing index with the updated index settings. The destination data stream is now ready to accept new documents.

:::{note}
The initial backing index can still accept documents within the range of timestamps derived from the source data stream. If this is not desired, mark the index as [read-only](elasticsearch://reference/elasticsearch/index-settings/index-block.md#index-blocks-read-only) explicitly.
:::

## Related resources

- [Time series data streams overview](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md)
- [Reindex API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex)