From 35a621a97fe0aa721b116ae7f494bc479fa57fb1 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Thu, 24 Jul 2025 17:25:21 -0400 Subject: [PATCH 01/20] Edit and restructure, part 1 --- .../data-streams/downsampling-concepts.md | 105 +++++++++++++ .../downsampling-time-series-data-stream.md | 147 ++---------------- .../data-streams/run-downsampling-manually.md | 5 +- ...ownsampling-using-data-stream-lifecycle.md | 5 +- .../data-streams/run-downsampling-with-ilm.md | 5 +- .../data-streams/run-downsampling.md | 65 ++++++++ .../data-store/data-streams/set-up-tsds.md | 4 +- .../time-series-data-stream-tsds.md | 6 +- manage-data/toc.yml | 9 +- 9 files changed, 208 insertions(+), 143 deletions(-) create mode 100644 manage-data/data-store/data-streams/downsampling-concepts.md create mode 100644 manage-data/data-store/data-streams/run-downsampling.md diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md new file mode 100644 index 0000000000..a6e9f4d32d --- /dev/null +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -0,0 +1,105 @@ +--- +applies_to: + stack: ga + serverless: ga +products: + - id: elasticsearch +--- + +# Downsampling concepts [how-downsampling-works] + +:::{warning} +🚧 Work in progress 🚧 +::: + +A [time series](time-series-data-stream-tsds.md#time-series) is a sequence of observations taken over time for a specific entity. The observed samples can be represented as a continuous function, where the time series dimensions remain constant and the time series metrics change over time. + +:::{image} /manage-data/images/elasticsearch-reference-time-series-function.png +:alt: time series function +::: + +In an Elasticsearch index, a single document is created for each timestamp. The document contains the immutable time series dimensions, together with metric names and values. Several time series dimensions and metrics can be stored for a single timestamp. + +:::{image} /manage-data/images/elasticsearch-reference-time-series-metric-anatomy.png +:alt: time series metric anatomy +::: + +For your most current and relevant data, the metrics series typically has a low sampling time interval, so it's optimized for queries that require a high data resolution. + +:::{image} /manage-data/images/elasticsearch-reference-time-series-original.png +:alt: time series original +:title: Original metrics series +::: + +Downsampling reduces the footprint of older, less frequently accessed data by replacing the original time series with a data stream of a higher sampling interval, plus statistical representations of the data. For example, if the original metrics samples were taken every 10 seconds, as the data ages you might choose to reduce the sample granularity to hourly or daily. Or you might choose to reduce the granularity of `cold` archival data to monthly or less. + +:::{image} /manage-data/images/elasticsearch-reference-time-series-downsampled.png +:alt: time series downsampled +:title: Downsampled metrics series +::: + + +### The downsampling process [downsample-api-process] + +The downsampling operation traverses the source TSDS index and performs the following steps: + +1. Creates a new document for each value of the `_tsid` field and each `@timestamp` value, rounded to the `fixed_interval` defined in the downsampling configuration. +2. 
For each new document, copies all [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) from the source index to the target index. Dimensions in a TSDS are constant, so this step happens only once per bucket. +3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. The set of pre-aggregated results differs by metric field type: + + * `gauge` field type: + * `min`, `max`, `sum`, and `value_count` are stored + * `value_count` is stored as type `aggregate_metric_double` + * `counter field type: + * `last_value` is stored. + +4. For all other fields, the most recent value is copied to the target index. + +% TODO ^^ consider mini table in step 3; refactor generally + +### Source and target index field mappings [downsample-api-mappings] + +Fields in the target downsampled index are created based on fields in the original source index, as follows: + +1. **Dimensions:** Fields mapped with the `time-series-dimension` parameter are created in the target downsampled index with the same mapping as in the source index. +2. **Metrics:** Fields mapped with the `time_series_metric` parameter are created in the target downsampled index with the same mapping as in the source index, with one exception: `time_series_metric: gauge` fields are changed to `aggregate_metric_double`. +3. **Labels:** Label fields (fields that are neither dimensions nor metrics) are created in the target downsampled index with the same mapping as in the source index. + +% TODO ^^ make this more concise + +## Querying downsampled indices [querying-downsampled-indices] + +You can use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints to query a downsampled index. Multiple raw data and downsampled indices can be queried in a single request, and a single request can include downsampled indices at different granularities (different bucket timespan). That is, you can query data streams that contain downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). + +The result of a time based histogram aggregation is in a uniform bucket size and each downsampled index returns data ignoring the downsampling time interval. For example, if you run a `date_histogram` aggregation with `"fixed_interval": "1m"` on a downsampled index that has been downsampled at an hourly resolution (`"fixed_interval": "1h"`), the query returns one bucket with all of the data at minute 0, then 59 empty buckets, and then a bucket with data again for the next hour. + + +### Notes on downsample queries [querying-downsampled-indices-notes] + +There are a few things to note about querying downsampled indices: + +* When you run queries in {{kib}} and through Elastic solutions, a normal response is returned without notification that some of the queried indices are downsampled. +* For [date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md), only `fixed_intervals` (and not calendar-aware intervals) are supported. +* Timezone support comes with caveats: + + * Date histograms at intervals that are multiples of an hour are based on values generated at UTC. This works well for timezones that are on the hour, e.g. +5:00 or -3:00, but requires offsetting the reported time buckets, e.g. 
`2020-03-07T10:30:00.000` instead of `2020-03-07T10:00:00.000` for timezone +5:30 (India), if downsampling aggregates values per hour. In this case, the results include the field `downsampled_results_offset: true` to indicate that the time buckets are shifted. This can be avoided by using a downsampling interval of 15 minutes, which allows hourly values to be calculated correctly for the shifted buckets.
+ + diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 234c8f41dd..ea575dfb89 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -1,4 +1,5 @@ --- +navigation_title: "Downsample a TSDS" mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling.html applies_to: @@ -8,145 +9,29 @@ products: - id: elasticsearch --- -# Downsampling a time series data stream [downsampling] +# Downsample a time series data stream [downsampling] -Downsampling provides a method to reduce the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. - -Metrics solutions collect large amounts of time series data that grow over time. As that data ages, it becomes less relevant to the current state of the system. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the `min`, `max`, `sum` and `value_count` for each metric. Data stream [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) are stored unchanged. - -Downsampling, in effect, lets you to trade data resolution and precision for storage size. You can include it in an [{{ilm}} ({{ilm-init}})](../../lifecycle/index-lifecycle-management.md) policy to automatically manage the volume and associated cost of your metrics data at it ages. - -Check the following sections to learn more: - -* [How it works](#how-downsampling-works) -* [Running downsampling on time series data](#running-downsampling) -* [Querying downsampled indices](#querying-downsampled-indices) -* [Restrictions and limitations](#downsampling-restrictions) -* [Try it out](#try-out-downsampling) - - -## How it works [how-downsampling-works] - -A [time series](time-series-data-stream-tsds.md#time-series) is a sequence of observations taken over time for a specific entity. The observed samples can be represented as a continuous function, where the time series dimensions remain constant and the time series metrics change over time. - -:::{image} /manage-data/images/elasticsearch-reference-time-series-function.png -:alt: time series function -::: - -In an Elasticsearch index, a single document is created for each timestamp, containing the immutable time series dimensions, together with the metrics names and the changing metrics values. For a single timestamp, several time series dimensions and metrics may be stored. - -:::{image} /manage-data/images/elasticsearch-reference-time-series-metric-anatomy.png -:alt: time series metric anatomy +:::{warning} +🚧 Work in progress 🚧 ::: -For your most current and relevant data, the metrics series typically has a low sampling time interval, so it’s optimized for queries that require a high data resolution. +Downsampling reduces the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. -:::{image} /manage-data/images/elasticsearch-reference-time-series-original.png -:alt: time series original -:title: Original metrics series -::: +Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. 
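For example, because each new downsampling interval must be a multiple of the previous one, an index that was downsampled at a one-hour interval can later be downsampled to a daily interval. The following is a minimal sketch, assuming a read-only source index that was previously downsampled with `"fixed_interval": "1h"` (the index names are illustrative):

```console
POST /my-index-1h/_downsample/my-index-1d
{
  "fixed_interval": "1d"
}
```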
_Downsampling_ lets you reduce the resolution and precision of older data in exchange for a smaller storage footprint.
- - -## Running downsampling on time series data [running-downsampling] - -To downsample a time series index, use the [Downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) and set `fixed_interval` to the level of granularity that you’d like: - -```console -POST /my-time-series-index/_downsample/my-downsampled-time-series-index -{ - "fixed_interval": "1d" -} -``` - -To downsample time series data as part of ILM, include a [Downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy and set `fixed_interval` to the level of granularity that you’d like: - -```console -PUT _ilm/policy/my_policy -{ - "policy": { - "phases": { - "warm": { - "actions": { - "downsample" : { - "fixed_interval": "1h" - } - } - } - } - } -} -``` - - -## Querying downsampled indices [querying-downsampled-indices] - -You can use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints to query a downsampled index. Multiple raw data and downsampled indices can be queried in a single request, and a single request can include downsampled indices at different granularities (different bucket timespan). That is, you can query data streams that contain downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). - -The result of a time based histogram aggregation is in a uniform bucket size and each downsampled index returns data ignoring the downsampling time interval. For example, if you run a `date_histogram` aggregation with `"fixed_interval": "1m"` on a downsampled index that has been downsampled at an hourly resolution (`"fixed_interval": "1h"`), the query returns one bucket with all of the data at minute 0, then 59 empty buckets, and then a bucket with data again for the next hour. - - -### Notes on downsample queries [querying-downsampled-indices-notes] - -There are a few things to note about querying downsampled indices: - -* When you run queries in {{kib}} and through Elastic solutions, a normal response is returned without notification that some of the queried indices are downsampled. -* For [date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md), only `fixed_intervals` (and not calendar-aware intervals) are supported. -* Timezone support comes with caveats: - - * Date histograms at intervals that are multiples of an hour are based on values generated at UTC. This works well for timezones that are on the hour, e.g. +5:00 or -3:00, but requires offsetting the reported time buckets, e.g. `2020-01-01T10:30:00.000` instead of `2020-03-07T10:00:00.000` for timezone +5:30 (India), if downsampling aggregates values per hour. In this case, the results include the field `downsampled_results_offset: true`, to indicate that the time buckets are shifted. This can be avoided if a downsampling interval of 15 minutes is used, as it allows properly calculating hourly values for the shifted buckets. - * Date histograms at intervals that are multiples of a day are similarly affected, in case downsampling aggregates values per day. In this case, the beginning of each day is always calculated at UTC when generated the downsampled values, so the time buckets need to be shifted, e.g. 
reported as `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for timezone `America/New_York`. The field `downsampled_results_offset: true` is added in this case too. - * Daylight savings and similar peculiarities around timezones affect reported results, as [documented](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone) for date histogram aggregation. Besides, downsampling at daily interval hinders tracking any information related to daylight savings changes. - - - -## Restrictions and limitations [downsampling-restrictions] - -The following restrictions and limitations apply for downsampling: - -* Only indices in a [time series data stream](time-series-data-stream-tsds.md) are supported. -* Data is downsampled based on the time dimension only. All other dimensions are copied to the new index without any modification. -* Within a data stream, a downsampled index replaces the original index and the original index is deleted. Only one index can exist for a given time period. -* A source index must be in read-only mode for the downsampling process to succeed. Check the [Run downsampling manually](./run-downsampling-manually.md) example for details. -* Downsampling data for the same period many times (downsampling of a downsampled index) is supported. The downsampling interval must be a multiple of the interval of the downsampled index. -* Downsampling is provided as an ILM action. See [Downsample](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md). -* The new, downsampled index is created on the data tier of the original index and it inherits its settings (for example, the number of shards and replicas). -* The numeric `gauge` and `counter` [metric types](elasticsearch://reference/elasticsearch/mapping-reference/mapping-field-meta.md) are supported. -* The downsampling configuration is extracted from the time series data stream [index mapping](./set-up-tsds.md#create-tsds-index-template). The only additional required setting is the downsampling `fixed_interval`. - - -## Try it out [try-out-downsampling] - -To take downsampling for a test run, try our example of [running downsampling manually](./run-downsampling-manually.md). +% TODO add subsection links and conceptual links after restructuring -Downsampling can easily be added to your ILM policy. To learn how, try our [Run downsampling with ILM](./run-downsampling-with-ilm.md) example. +## Next steps +% TODO confirm patterns +* Run downsampling +* Downsampling concepts +* Time series data streams overview \ No newline at end of file diff --git a/manage-data/data-store/data-streams/run-downsampling-manually.md b/manage-data/data-store/data-streams/run-downsampling-manually.md index e689bb3ca0..4364c2f43d 100644 --- a/manage-data/data-store/data-streams/run-downsampling-manually.md +++ b/manage-data/data-store/data-streams/run-downsampling-manually.md @@ -9,10 +9,11 @@ products: - id: elasticsearch --- - - # Run downsampling manually [downsampling-manual] +:::{warning} +🚧 Work in progress 🚧 +::: The recommended way to [downsample](./downsampling-time-series-data-stream.md) a [time-series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md) is [through index lifecycle management (ILM)](run-downsampling-with-ilm.md). However, if you’re not using ILM, you can downsample a TSDS manually. This guide shows you how, using typical Kubernetes cluster monitoring data. 
diff --git a/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md b/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md index c32ae1e919..454425b413 100644 --- a/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md +++ b/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md @@ -9,10 +9,11 @@ products: - id: elasticsearch --- - - # Run downsampling using data stream lifecycle [downsampling-dsl] +:::{warning} +🚧 Work in progress 🚧 +::: This is a simplified example that allows you to see quickly how [downsampling](./downsampling-time-series-data-stream.md) works as part of a datastream lifecycle to reduce the storage size of a sampled set of metrics. The example uses typical Kubernetes cluster monitoring data. To test out downsampling with data stream lifecycle, follow these steps: diff --git a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md b/manage-data/data-store/data-streams/run-downsampling-with-ilm.md index 14e87ee04e..9a48ee18cf 100644 --- a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md +++ b/manage-data/data-store/data-streams/run-downsampling-with-ilm.md @@ -9,10 +9,11 @@ products: - id: elasticsearch --- - - # Run downsampling with ILM [downsampling-ilm] +:::{warning} +🚧 Work in progress 🚧 +::: This is a simplified example that allows you to see quickly how [downsampling](./downsampling-time-series-data-stream.md) works as part of an ILM policy to reduce the storage size of a sampled set of metrics. The example uses typical Kubernetes cluster monitoring data. To test out downsampling with ILM, follow these steps: diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md new file mode 100644 index 0000000000..e42a1db748 --- /dev/null +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -0,0 +1,65 @@ +--- +applies_to: + stack: ga + serverless: ga +products: + - id: elasticsearch +--- + +# Run downsampling on time series data [running-downsampling] + +:::{warning} +🚧 Work in progress 🚧 +::: + +% TODO consider retitling to "Downsample time series data" + +To downsample a time series index, you can use the `downsample API`, index lifecycle management (ILM), or a data stream lifecycle. 
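+At its core, manual downsampling is three API calls: roll over the data stream, make the old backing index read-only, and then downsample that index. Here is a minimal sketch of the sequence, using an illustrative backing index name:
+
+```console
+POST /my-data-stream/_rollover/
+
+PUT /.ds-my-data-stream-2023.07.26-000001/_block/write
+
+POST /.ds-my-data-stream-2023.07.26-000001/_downsample/.ds-my-data-stream-2023.07.26-000001-downsample
+{
+  "fixed_interval": "1h"
+}
+```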
+ + +::::{tab-set} +:::{tab-item} Downsample API + +## Use the downsample API + +Issue a [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) request, setting `fixed_interval` to your preferred level of granularity: + +```console +POST /my-time-series-index/_downsample/my-downsampled-time-series-index +{ + "fixed_interval": "1d" +} +``` +::: + +:::{tab-item} Index lifecycle + +## Downsample with index lifecycle management + +To downsample time series data as part of index lifecycle management (ILM), include a [downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy, setting `fixed_interval` to your preferred level of granularity: + +```console +PUT _ilm/policy/my_policy +{ +"policy": { + "phases": { + "warm": { + "actions": { + "downsample" : { + "fixed_interval": "1h" + } + } + } + } +} +} +``` +::: + +:::{tab-item} Data stream lifecycle + +Move tutorial here + +::: + +:::: diff --git a/manage-data/data-store/data-streams/set-up-tsds.md b/manage-data/data-store/data-streams/set-up-tsds.md index c3f0c41cbe..6068e057c5 100644 --- a/manage-data/data-store/data-streams/set-up-tsds.md +++ b/manage-data/data-store/data-streams/set-up-tsds.md @@ -11,10 +11,10 @@ products: -# Set up a TSDS [set-up-tsds] +# Set up a time series data stream [set-up-tsds] -To set up a [time series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md), follow these steps: +To set up a [time series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md), complete these steps: 1. Check the [prerequisites](#tsds-prereqs). 2. [Create an index lifecycle policy](#tsds-ilm-policy). diff --git a/manage-data/data-store/data-streams/time-series-data-stream-tsds.md b/manage-data/data-store/data-streams/time-series-data-stream-tsds.md index 702f1df002..4a3fcf52da 100644 --- a/manage-data/data-store/data-streams/time-series-data-stream-tsds.md +++ b/manage-data/data-store/data-streams/time-series-data-stream-tsds.md @@ -8,7 +8,11 @@ products: - id: elasticsearch --- -# Time series data stream (TSDS) [tsds] +# Time series data streams (TSDS) [tsds] + +:::{warning} +🚧 Work in progress 🚧 +::: A time series data stream (TSDS) models timestamped metrics data as one or more time series. 
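Whichever method you choose, downsampling runs against a single read-only backing index, not against the data stream as a whole. If you're not sure which backing indices a stream has, you can list them first (this example assumes a data stream named `my-data-stream`):

```console
GET /_data_stream/my-data-stream
```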
diff --git a/manage-data/toc.yml b/manage-data/toc.yml index bbca9ac4a0..17e608b59b 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -15,9 +15,12 @@ toc: children: - file: data-store/data-streams/set-up-tsds.md - file: data-store/data-streams/downsampling-time-series-data-stream.md - - file: data-store/data-streams/run-downsampling-with-ilm.md - - file: data-store/data-streams/run-downsampling-manually.md - - file: data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md + children: + - file: data-store/data-streams/run-downsampling.md + - file: data-store/data-streams/run-downsampling-with-ilm.md + - file: data-store/data-streams/run-downsampling-manually.md + - file: data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md + - file: data-store/data-streams/downsampling-concepts.md - file: data-store/data-streams/reindex-tsds.md - file: data-store/data-streams/logs-data-stream.md - file: data-store/data-streams/failure-store.md From a9368d57e9757e8282815bd798b4f52161667af8 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Thu, 24 Jul 2025 17:30:34 -0400 Subject: [PATCH 02/20] Breadcrumbs --- manage-data/data-store/data-streams/downsampling-concepts.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index a6e9f4d32d..5588d18b11 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -67,6 +67,9 @@ Fields in the target downsampled index are created based on fields in the origin % TODO ^^ make this more concise +% first pass edits up to here +% TODO resume editing from this line down + ## Querying downsampled indices [querying-downsampled-indices] You can use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints to query a downsampled index. Multiple raw data and downsampled indices can be queried in a single request, and a single request can include downsampled indices at different granularities (different bucket timespan). That is, you can query data streams that contain downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). From f5e7ca5d9d4f52c487f5da2aa0c1adf438f54f5b Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Thu, 24 Jul 2025 17:33:19 -0400 Subject: [PATCH 03/20] Fix anchors --- .../data-store/data-streams/run-downsampling-with-ilm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md b/manage-data/data-store/data-streams/run-downsampling-with-ilm.md index 9a48ee18cf..12a018467d 100644 --- a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md +++ b/manage-data/data-store/data-streams/run-downsampling-with-ilm.md @@ -352,7 +352,7 @@ After the ILM policy has taken effect, the original `.ds-datastream-2022.08.26-0 ... ``` -Run a search query on the datastream (note that when querying downsampled indices there are [a few nuances to be aware of](./downsampling-time-series-data-stream.md#querying-downsampled-indices-notes)). 
+Run a search query on the data stream (note that when querying downsampled indices, there are [a few nuances to be aware of](./downsampling-concepts.md#querying-downsampled-indices-notes)).
```console GET datastream/_search From 3a0f5155197f2943affc034969adef6feee02bde Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Thu, 24 Jul 2025 17:35:57 -0400 Subject: [PATCH 05/20] wip banners --- manage-data/data-store/data-streams/reindex-tsds.md | 6 +++--- manage-data/data-store/data-streams/set-up-tsds.md | 5 +++-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/manage-data/data-store/data-streams/reindex-tsds.md b/manage-data/data-store/data-streams/reindex-tsds.md index a3a95b4cfa..3b420618c9 100644 --- a/manage-data/data-store/data-streams/reindex-tsds.md +++ b/manage-data/data-store/data-streams/reindex-tsds.md @@ -9,11 +9,11 @@ products: - id: elasticsearch --- - - # Reindex a TSDS [tsds-reindex] - +:::{warning} +🚧 Work in progress 🚧 +::: ## Introduction [tsds-reindex-intro] diff --git a/manage-data/data-store/data-streams/set-up-tsds.md b/manage-data/data-store/data-streams/set-up-tsds.md index 6068e057c5..3b37deea9a 100644 --- a/manage-data/data-store/data-streams/set-up-tsds.md +++ b/manage-data/data-store/data-streams/set-up-tsds.md @@ -9,10 +9,11 @@ products: - id: elasticsearch --- - - # Set up a time series data stream [set-up-tsds] +:::{warning} +🚧 Work in progress 🚧 +::: To set up a [time series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md), complete these steps: From 4cb3d4fb4f0ae05da9d1adb49594063808e823b1 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 12 Aug 2025 19:14:14 -0400 Subject: [PATCH 06/20] Consolidate further; remove tutorial content --- .../data-streams/downsampling-concepts.md | 4 +- .../downsampling-time-series-data-stream.md | 9 +- .../data-streams/run-downsampling-manually.md | 568 ------------------ ...ownsampling-using-data-stream-lifecycle.md | 498 --------------- .../data-streams/run-downsampling-with-ilm.md | 473 --------------- .../data-streams/run-downsampling.md | 110 +++- manage-data/toc.yml | 3 - 7 files changed, 108 insertions(+), 1557 deletions(-) delete mode 100644 manage-data/data-store/data-streams/run-downsampling-manually.md delete mode 100644 manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md delete mode 100644 manage-data/data-store/data-streams/run-downsampling-with-ilm.md diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index 5588d18b11..a5a065db6c 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -98,7 +98,9 @@ The following restrictions and limitations apply for downsampling: * Only indices in a [time series data stream](time-series-data-stream-tsds.md) are supported. * Data is downsampled based on the time dimension only. All other dimensions are copied to the new index without any modification. * Within a data stream, a downsampled index replaces the original index and the original index is deleted. Only one index can exist for a given time period. -* A source index must be in read-only mode for the downsampling process to succeed. Check the [Run downsampling manually](./run-downsampling-manually.md) example for details. +* A source index must be in read-only mode for the downsampling process to succeed. Check the Run downsampling manually example for details. +* Downsampling data for the same period many times (downsampling of a downsampled index) is supported. 
The new downsampling interval must be a multiple of the interval of the already-downsampled index.
For a TSDS write index, this means it needs to be rolled over and made read-only first. -* Downsampling uses UTC timestamps. -* Downsampling needs at least one metric field to exist in the time series index. - - -## Create a time series data stream [downsampling-manual-create-index] - -First, you’ll create a TSDS. For simplicity, in the time series mapping all `time_series_metric` parameters are set to type `gauge`, but [other values](time-series-data-stream-tsds.md#time-series-metric) such as `counter` and `histogram` may also be used. The `time_series_metric` values determine the kind of statistical representations that are used during downsampling. - -The index template includes a set of static [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension): `host`, `namespace`, `node`, and `pod`. The time series dimensions are not changed by the downsampling process. - -```console -PUT _index_template/my-data-stream-template -{ - "index_patterns": [ - "my-data-stream*" - ], - "data_stream": {}, - "template": { - "settings": { - "index": { - "mode": "time_series", - "routing_path": [ - "kubernetes.namespace", - "kubernetes.host", - "kubernetes.node", - "kubernetes.pod" - ], - "number_of_replicas": 0, - "number_of_shards": 2 - } - }, - "mappings": { - "properties": { - "@timestamp": { - "type": "date" - }, - "kubernetes": { - "properties": { - "container": { - "properties": { - "cpu": { - "properties": { - "usage": { - "properties": { - "core": { - "properties": { - "ns": { - "type": "long" - } - } - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "nanocores": { - "type": "long", - "time_series_metric": "gauge" - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - } - } - }, - "memory": { - "properties": { - "available": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "majorpagefaults": { - "type": "long" - }, - "pagefaults": { - "type": "long", - "time_series_metric": "gauge" - }, - "rss": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "usage": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - }, - "workingset": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - } - } - }, - "name": { - "type": "keyword" - }, - "start_time": { - "type": "date" - } - } - }, - "host": { - "type": "keyword", - "time_series_dimension": true - }, - "namespace": { - "type": "keyword", - "time_series_dimension": true - }, - "node": { - "type": "keyword", - "time_series_dimension": true - }, - "pod": { - "type": "keyword", - "time_series_dimension": true - } - } - } - } - } - } -} -``` - - -## Ingest time series data [downsampling-manual-ingest-data] - -Because time series data streams have been designed to [only accept recent data](time-series-data-stream-tsds.md#tsds-accepted-time-range), in this example, you’ll use an ingest pipeline to time-shift the data as it gets indexed. As a result, the indexed data will have an `@timestamp` from the last 15 minutes. 
- -Create the pipeline with this request: - -```console -PUT _ingest/pipeline/my-timestamp-pipeline -{ - "description": "Shifts the @timestamp to the last 15 minutes", - "processors": [ - { - "set": { - "field": "ingest_time", - "value": "{{_ingest.timestamp}}" - } - }, - { - "script": { - "lang": "painless", - "source": """ - def delta = ChronoUnit.SECONDS.between( - ZonedDateTime.parse("2022-06-21T15:49:00Z"), - ZonedDateTime.parse(ctx["ingest_time"]) - ); - ctx["@timestamp"] = ZonedDateTime.parse(ctx["@timestamp"]).plus(delta,ChronoUnit.SECONDS).toString(); - """ - } - } - ] -} -``` - -Next, use a bulk API request to automatically create your TSDS and index a set of ten documents: - -```console -PUT /my-data-stream/_bulk?refresh&pipeline=my-timestamp-pipeline -{"create": {}} -{"@timestamp":"2022-06-21T15:49:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":91153,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":463314616},"usage":{"bytes":307007078,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":585236},"rss":{"bytes":102728},"pagefaults":120901,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:45:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":124501,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":982546514},"usage":{"bytes":360035574,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1339884},"rss":{"bytes":381174},"pagefaults":178473,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":38907,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":862723768},"usage":{"bytes":379572388,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":431227},"rss":{"bytes":386580},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":86706,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":103266017,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1724908},"rss":{"bytes":105431},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} 
-{"@timestamp":"2022-06-21T15:44:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":150069,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":639054643},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1786511},"rss":{"bytes":189235},"pagefaults":138172,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:42:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":82260,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":854735585},"usage":{"bytes":309798052,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":924058},"rss":{"bytes":110838},"pagefaults":259073,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:42:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":153404,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":279586406},"usage":{"bytes":214904955,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1047265},"rss":{"bytes":91914},"pagefaults":302252,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:20Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":125613,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":822782853},"usage":{"bytes":100475044,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2109932},"rss":{"bytes":278446},"pagefaults":74843,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":100046,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":362826547,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1986724},"rss":{"bytes":402801},"pagefaults":296495,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:38:30Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":40018,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":1062428344},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2294743},"rss":{"bytes":340623},"pagefaults":224530,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -``` - 
-You can use the search API to check if the documents have been indexed correctly: - -```console -GET /my-data-stream/_search -``` - -Run the following aggregation on the data to calculate some interesting statistics: - -```console -GET /my-data-stream/_search -{ - "size": 0, - "aggs": { - "tsid": { - "terms": { - "field": "_tsid" - }, - "aggs": { - "over_time": { - "date_histogram": { - "field": "@timestamp", - "fixed_interval": "1d" - }, - "aggs": { - "min": { - "min": { - "field": "kubernetes.container.memory.usage.bytes" - } - }, - "max": { - "max": { - "field": "kubernetes.container.memory.usage.bytes" - } - }, - "avg": { - "avg": { - "field": "kubernetes.container.memory.usage.bytes" - } - } - } - } - } - } - } -} -``` - - -## Downsample the TSDS [downsampling-manual-run] - -A TSDS can’t be downsampled directly. You need to downsample its backing indices instead. You can see the backing index for your data stream by running: - -```console -GET /_data_stream/my-data-stream -``` - -This returns: - -```console-result -{ - "data_streams": [ - { - "name": "my-data-stream", - "timestamp_field": { - "name": "@timestamp" - }, - "indices": [ - { - "index_name": ".ds-my-data-stream-2023.07.26-000001", <1> - "index_uuid": "ltOJGmqgTVm4T-Buoe7Acg", - "prefer_ilm": true, - "managed_by": "Unmanaged" - } - ], - "generation": 1, - "status": "GREEN", - "next_generation_managed_by": "Unmanaged", - "prefer_ilm": true, - "template": "my-data-stream-template", - "hidden": false, - "system": false, - "allow_custom_routing": false, - "replicated": false, - "rollover_on_write": false, - "time_series": { - "temporal_ranges": [ - { - "start": "2023-07-26T09:26:42.000Z", - "end": "2023-07-26T13:26:42.000Z" - } - ] - } - } - ] -} -``` - -1. The backing index for this data stream. - - -Before a backing index can be downsampled, the TSDS needs to be rolled over and the old index needs to be made read-only. - -Roll over the TSDS using the [rollover API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover): - -```console -POST /my-data-stream/_rollover/ -``` - -Copy the name of the `old_index` from the response. In the following steps, replace the index name with that of your `old_index`. - -The old index needs to be set to read-only mode. Run the following request: - -```console -PUT /.ds-my-data-stream-2023.07.26-000001/_block/write -``` - -Next, use the [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) to downsample the index, setting the time series interval to one hour: - -```console -POST /.ds-my-data-stream-2023.07.26-000001/_downsample/.ds-my-data-stream-2023.07.26-000001-downsample -{ - "fixed_interval": "1h" -} -``` - -Now you can [modify the data stream](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-modify-data-stream), and replace the original index with the downsampled one: - -```console -POST _data_stream/_modify -{ - "actions": [ - { - "remove_backing_index": { - "data_stream": "my-data-stream", - "index": ".ds-my-data-stream-2023.07.26-000001" - } - }, - { - "add_backing_index": { - "data_stream": "my-data-stream", - "index": ".ds-my-data-stream-2023.07.26-000001-downsample" - } - } - ] -} -``` - -You can now delete the old backing index. But be aware this will delete the original data. Don’t delete the index if you may need the original data in the future. 
- - -## View the results [downsampling-manual-view-results] - -Re-run the earlier search query (note that when querying downsampled indices there are [a few nuances to be aware of](./downsampling-concepts.md#querying-downsampled-indices-notes)): - -```console -GET /my-data-stream/_search -``` - -The TSDS with the new downsampled backing index contains just one document. For counters, this document would only have the last value. For gauges, the field type is now `aggregate_metric_double`. You see the `min`, `max`, `sum`, and `value_count` statistics based off of the original sampled metrics: - -```console-result -{ - "took": 2, - "timed_out": false, - "_shards": { - "total": 4, - "successful": 4, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": ".ds-my-data-stream-2023.07.26-000001-downsample", - "_id": "0eL0wC_4-45SnTNFAAABiZHbD4A", - "_score": 1, - "_source": { - "@timestamp": "2023-07-26T11:00:00.000Z", - "_doc_count": 10, - "ingest_time": "2023-07-26T11:26:42.715Z", - "kubernetes": { - "container": { - "cpu": { - "usage": { - "core": { - "ns": 12828317850 - }, - "limit": { - "pct": 0.0000277905 - }, - "nanocores": { - "min": 38907, - "max": 153404, - "sum": 992677, - "value_count": 10 - }, - "node": { - "pct": 0.0000277905 - } - } - }, - "memory": { - "available": { - "bytes": { - "min": 279586406, - "max": 1062428344, - "sum": 7101494721, - "value_count": 10 - } - }, - "majorpagefaults": 0, - "pagefaults": { - "min": 74843, - "max": 302252, - "sum": 2061071, - "value_count": 10 - }, - "rss": { - "bytes": { - "min": 91914, - "max": 402801, - "sum": 2389770, - "value_count": 10 - } - }, - "usage": { - "bytes": { - "min": 100475044, - "max": 379572388, - "sum": 2668170609, - "value_count": 10 - }, - "limit": { - "pct": 0.00009923134 - }, - "node": { - "pct": 0.017700378 - } - }, - "workingset": { - "bytes": { - "min": 431227, - "max": 2294743, - "sum": 14230488, - "value_count": 10 - } - } - }, - "name": "container-name-44", - "start_time": "2021-03-30T07:59:06.000Z" - }, - "host": "gke-apps-0", - "namespace": "namespace26", - "node": "gke-apps-0-0", - "pod": "gke-apps-0-0-0" - } - } - } - ] - } -} -``` - -Re-run the earlier aggregation. Even though the aggregation runs on the downsampled TSDS that only contains 1 document, it returns the same results as the earlier aggregation on the original TSDS. - -```console -GET /my-data-stream/_search -{ - "size": 0, - "aggs": { - "tsid": { - "terms": { - "field": "_tsid" - }, - "aggs": { - "over_time": { - "date_histogram": { - "field": "@timestamp", - "fixed_interval": "1d" - }, - "aggs": { - "min": { - "min": { - "field": "kubernetes.container.memory.usage.bytes" - } - }, - "max": { - "max": { - "field": "kubernetes.container.memory.usage.bytes" - } - }, - "avg": { - "avg": { - "field": "kubernetes.container.memory.usage.bytes" - } - } - } - } - } - } - } -} -``` - -This example demonstrates how downsampling can dramatically reduce the number of documents stored for time series data, within whatever time boundaries you choose. It’s also possible to perform downsampling on already downsampled data, to further reduce storage and associated costs, as the time series data ages and the data resolution becomes less critical. - -The recommended way to downsample a TSDS is with ILM. To learn more, try the [Run downsampling with ILM](./run-downsampling-with-ilm.md) example. 
- diff --git a/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md b/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md deleted file mode 100644 index 21aca001ef..0000000000 --- a/manage-data/data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md +++ /dev/null @@ -1,498 +0,0 @@ ---- -navigation_title: Run downsampling using data stream lifecycle -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-dsl.html -applies_to: - stack: ga - serverless: ga -products: - - id: elasticsearch ---- - -# Run downsampling using data stream lifecycle [downsampling-dsl] - -:::{warning} -🚧 Work in progress 🚧 -::: - -This is a simplified example that allows you to see quickly how [downsampling](./downsampling-time-series-data-stream.md) works as part of a datastream lifecycle to reduce the storage size of a sampled set of metrics. The example uses typical Kubernetes cluster monitoring data. To test out downsampling with data stream lifecycle, follow these steps: - -1. Check the [prerequisites](#downsampling-dsl-prereqs). -2. [Create an index template with data stream lifecycle](#downsampling-dsl-create-index-template). -3. [Ingest time series data](#downsampling-dsl-ingest-data). -4. [View current state of data stream](#downsampling-dsl-view-data-stream-state). -5. [Roll over the data stream](#downsampling-dsl-rollover). -6. [View downsampling results](#downsampling-dsl-view-results). - - -## Prerequisites [downsampling-dsl-prereqs] - -Refer to [time series data stream prerequisites](./set-up-tsds.md#tsds-prereqs). - - -## Create an index template with data stream lifecycle [downsampling-dsl-create-index-template] - -This creates an index template for a basic data stream. The available parameters for an index template are described in detail in [Set up a time series data stream](set-up-data-stream.md). - -For simplicity, in the time series mapping all `time_series_metric` parameters are set to type `gauge`, but the `counter` metric type may also be used. The `time_series_metric` values determine the kind of statistical representations that are used during downsampling. - -The index template includes a set of static [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension): `host`, `namespace`, `node`, and `pod`. The time series dimensions are not changed by the downsampling process. - -To enable downsampling, this template includes a `lifecycle` section with [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) object. `fixed_interval` parameter sets downsampling interval at which you want to aggregate the original time series data. `after` parameter specifies how much time after index was rolled over should pass before downsampling is performed. 
- -```console -PUT _index_template/datastream_template -{ - "index_patterns": [ - "datastream*" - ], - "data_stream": {}, - "template": { - "lifecycle": { - "downsampling": [ - { - "after": "1m", - "fixed_interval": "1h" - } - ] - }, - "settings": { - "index": { - "mode": "time_series" - } - }, - "mappings": { - "properties": { - "@timestamp": { - "type": "date" - }, - "kubernetes": { - "properties": { - "container": { - "properties": { - "cpu": { - "properties": { - "usage": { - "properties": { - "core": { - "properties": { - "ns": { - "type": "long" - } - } - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "nanocores": { - "type": "long", - "time_series_metric": "gauge" - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - } - } - }, - "memory": { - "properties": { - "available": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "majorpagefaults": { - "type": "long" - }, - "pagefaults": { - "type": "long", - "time_series_metric": "gauge" - }, - "rss": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "usage": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - }, - "workingset": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - } - } - }, - "name": { - "type": "keyword" - }, - "start_time": { - "type": "date" - } - } - }, - "host": { - "type": "keyword", - "time_series_dimension": true - }, - "namespace": { - "type": "keyword", - "time_series_dimension": true - }, - "node": { - "type": "keyword", - "time_series_dimension": true - }, - "pod": { - "type": "keyword", - "time_series_dimension": true - } - } - } - } - } - } -} -``` - - -## Ingest time series data [downsampling-dsl-ingest-data] - -Use a bulk API request to automatically create your TSDS and index a set of ten documents. - -**Important:** Before running this bulk request you need to update the timestamps to within three to five hours after your current time. That is, search `2022-06-21T15` and replace with your present date, and adjust the hour to your current time plus three hours. 
- -```console -PUT /datastream/_bulk?refresh -{"create": {}} -{"@timestamp":"2022-06-21T15:49:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":91153,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":463314616},"usage":{"bytes":307007078,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":585236},"rss":{"bytes":102728},"pagefaults":120901,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:45:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":124501,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":982546514},"usage":{"bytes":360035574,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1339884},"rss":{"bytes":381174},"pagefaults":178473,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":38907,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":862723768},"usage":{"bytes":379572388,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":431227},"rss":{"bytes":386580},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":86706,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":103266017,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1724908},"rss":{"bytes":105431},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":150069,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":639054643},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1786511},"rss":{"bytes":189235},"pagefaults":138172,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} 
-{"@timestamp":"2022-06-21T15:42:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":82260,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":854735585},"usage":{"bytes":309798052,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":924058},"rss":{"bytes":110838},"pagefaults":259073,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:42:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":153404,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":279586406},"usage":{"bytes":214904955,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1047265},"rss":{"bytes":91914},"pagefaults":302252,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:20Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":125613,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":822782853},"usage":{"bytes":100475044,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2109932},"rss":{"bytes":278446},"pagefaults":74843,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":100046,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":362826547,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1986724},"rss":{"bytes":402801},"pagefaults":296495,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:38:30Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":40018,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":1062428344},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2294743},"rss":{"bytes":340623},"pagefaults":224530,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -``` - - -## View current state of data stream [downsampling-dsl-view-data-stream-state] - -Now that you’ve created and added documents to the data stream, check to confirm the current state of the new index. - -```console -GET _data_stream -``` - -If the data stream lifecycle policy has not yet been applied, your results will be like the following. Note the original `index_name`: `.ds-datastream-2024.04.29-000001`. 
- -```console-result -{ - "data_streams": [ - { - "name": "datastream", - "timestamp_field": { - "name": "@timestamp" - }, - "indices": [ - { - "index_name": ".ds-datastream-2024.04.29-000001", - "index_uuid": "vUMNtCyXQhGdlo1BD-cGRw", - "managed_by": "Data stream lifecycle" - } - ], - "generation": 1, - "status": "GREEN", - "template": "datastream_template", - "lifecycle": { - "enabled": true, - "downsampling": [ - { - "after": "1m", - "fixed_interval": "1h" - } - ] - }, - "next_generation_managed_by": "Data stream lifecycle", - "hidden": false, - "system": false, - "allow_custom_routing": false, - "replicated": false, - "rollover_on_write": false, - "time_series": { - "temporal_ranges": [ - { - "start": "2024-04-29T15:55:46.000Z", - "end": "2024-04-29T18:25:46.000Z" - } - ] - } - } - ] -} -``` - -Next, run a search query: - -```console -GET datastream/_search -``` - -The query returns your ten newly added documents. - -```console-result -{ - "took": 23, - "timed_out": false, - "_shards": { - "total": 1, - "successful": 1, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 10, - "relation": "eq" - }, -... -``` - - -## Roll over the data stream [downsampling-dsl-rollover] - -Data stream lifecycle will automatically roll over data stream and perform downsampling. This step is only needed in order to see downsampling results in scope of this tutorial. - -Roll over the data stream using the [rollover API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover): - -```console -POST /datastream/_rollover/ -``` - - -## View downsampling results [downsampling-dsl-view-results] - -By default, data stream lifecycle actions are executed every five minutes. Downsampling takes place after the index is rolled over and the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has lapsed as the source index is still expected to receive major writes until then. Index is now rolled over after previous step but its time series range end is likely still in the future. Once index time series range is in the past, re-run the `GET _data_stream` request. - -```console -GET _data_stream -``` - -After the data stream lifecycle action was executed, original `.ds-datastream-2024.04.29-000001` index is replaced with a new, downsampled index, in this case `downsample-1h-.ds-datastream-2024.04.29-000001`. - -```console-result -{ - "data_streams": [ - { - "name": "datastream", - "timestamp_field": { - "name": "@timestamp" - }, - "indices": [ - { - "index_name": "downsample-1h-.ds-datastream-2024.04.29-000001", - "index_uuid": "VqXuShP4T8ODAOnWFcqitg", - "managed_by": "Data stream lifecycle" - }, - { - "index_name": ".ds-datastream-2024.04.29-000002", - "index_uuid": "8gCeSdjUSWG-o-PeEAJ0jA", - "managed_by": "Data stream lifecycle" - } - ], -... -``` - -Run a search query on the datastream (note that when querying downsampled indices there are [a few nuances to be aware of](./downsampling-concepts.md#querying-downsampled-indices-notes)). - -```console -GET datastream/_search -``` - -The new downsampled index contains just one document that includes the `min`, `max`, `sum`, and `value_count` statistics based off of the original sampled metrics. 
- -```console-result -{ - "took": 26, - "timed_out": false, - "_shards": { - "total": 2, - "successful": 2, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "downsample-1h-.ds-datastream-2024.04.29-000001", - "_id": "0eL0wMf38sl_s5JnAAABjyrMjoA", - "_score": 1, - "_source": { - "@timestamp": "2024-04-29T17:00:00.000Z", - "_doc_count": 10, - "kubernetes": { - "container": { - "cpu": { - "usage": { - "core": { - "ns": 12828317850 - }, - "limit": { - "pct": 0.0000277905 - }, - "nanocores": { - "min": 38907, - "max": 153404, - "sum": 992677, - "value_count": 10 - }, - "node": { - "pct": 0.0000277905 - } - } - }, - "memory": { - "available": { - "bytes": { - "min": 279586406, - "max": 1062428344, - "sum": 7101494721, - "value_count": 10 - } - }, - "majorpagefaults": 0, - "pagefaults": { - "min": 74843, - "max": 302252, - "sum": 2061071, - "value_count": 10 - }, - "rss": { - "bytes": { - "min": 91914, - "max": 402801, - "sum": 2389770, - "value_count": 10 - } - }, - "usage": { - "bytes": { - "min": 100475044, - "max": 379572388, - "sum": 2668170609, - "value_count": 10 - }, - "limit": { - "pct": 0.00009923134 - }, - "node": { - "pct": 0.017700378 - } - }, - "workingset": { - "bytes": { - "min": 431227, - "max": 2294743, - "sum": 14230488, - "value_count": 10 - } - } - }, - "name": "container-name-44", - "start_time": "2021-03-30T07:59:06.000Z" - }, - "host": "gke-apps-0", - "namespace": "namespace26", - "node": "gke-apps-0-0", - "pod": "gke-apps-0-0-0" - } - } - } - ] - } -} -``` - -Use the [data stream stats API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-data-streams-stats-1) to get statistics for the data stream, including the storage size. - -```console -GET /_data_stream/datastream/_stats?human=true -``` - -```console-result -{ - "_shards": { - "total": 4, - "successful": 4, - "failed": 0 - }, - "data_stream_count": 1, - "backing_indices": 2, - "total_store_size": "37.3kb", - "total_store_size_bytes": 38230, - "data_streams": [ - { - "data_stream": "datastream", - "backing_indices": 2, - "store_size": "37.3kb", - "store_size_bytes": 38230, - "maximum_timestamp": 1714410000000 - } - ] -} -``` - -This example demonstrates how downsampling works as part of a data stream lifecycle to reduce the storage size of metrics data as it becomes less current and less frequently queried. diff --git a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md b/manage-data/data-store/data-streams/run-downsampling-with-ilm.md deleted file mode 100644 index 12a018467d..0000000000 --- a/manage-data/data-store/data-streams/run-downsampling-with-ilm.md +++ /dev/null @@ -1,473 +0,0 @@ ---- -navigation_title: Run downsampling with ILM -mapped_pages: - - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-ilm.html -applies_to: - stack: ga - serverless: ga -products: - - id: elasticsearch ---- - -# Run downsampling with ILM [downsampling-ilm] - -:::{warning} -🚧 Work in progress 🚧 -::: - -This is a simplified example that allows you to see quickly how [downsampling](./downsampling-time-series-data-stream.md) works as part of an ILM policy to reduce the storage size of a sampled set of metrics. The example uses typical Kubernetes cluster monitoring data. To test out downsampling with ILM, follow these steps: - -1. Check the [prerequisites](#downsampling-ilm-prereqs). -2. [Create an index lifecycle policy](#downsampling-ilm-policy). -3. 
[Create an index template](#downsampling-ilm-create-index-template). -4. [Ingest time series data](#downsampling-ilm-ingest-data). -5. [View the results](#downsampling-ilm-view-results). - - -## Prerequisites [downsampling-ilm-prereqs] - -Refer to [time series data stream prerequisites](./set-up-tsds.md#tsds-prereqs). - -Before running this example you may want to try the [Run downsampling manually](./run-downsampling-manually.md) example. - - -## Create an index lifecycle policy [downsampling-ilm-policy] - -Create an ILM policy for your time series data. While not required, an ILM policy is recommended to automate the management of your time series data stream indices. - -To enable downsampling, add a [Downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) and set [`fixed_interval`](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md#ilm-downsample-options) to the downsampling interval at which you want to aggregate the original time series data. - -In this example, an ILM policy is configured for the `hot` phase. The downsample takes place after the index is rolled over and the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has lapsed as the source index is still expected to receive major writes until then. {{ilm-cap}} will not proceed with any action that expects the index to not receive writes anymore until the [index’s end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed. The {{ilm-cap}} actions that wait on the end time before proceeding are: - [Delete](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-delete.md) - [Downsample](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) - [Force merge](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-forcemerge.md) - [Read only](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-readonly.md) - [Searchable snapshot](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot.md) - [Shrink](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-shrink.md) - -```console -PUT _ilm/policy/datastream_policy -{ - "policy": { - "phases": { - "hot": { - "actions": { - "rollover" : { - "max_age": "5m" - }, - "downsample": { - "fixed_interval": "1h" - } - } - } - } - } -} -``` - - -## Create an index template [downsampling-ilm-create-index-template] - -This creates an index template for a basic data stream. The available parameters for an index template are described in detail in [Set up a time series data stream](set-up-data-stream.md). - -For simplicity, in the time series mapping all `time_series_metric` parameters are set to type `gauge`, but the `counter` metric type may also be used. The `time_series_metric` values determine the kind of statistical representations that are used during downsampling. - -The index template includes a set of static [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension): `host`, `namespace`, `node`, and `pod`. The time series dimensions are not changed by the downsampling process. 
- -```console -PUT _index_template/datastream_template -{ - "index_patterns": [ - "datastream*" - ], - "data_stream": {}, - "template": { - "settings": { - "index": { - "mode": "time_series", - "number_of_replicas": 0, - "number_of_shards": 2 - }, - "index.lifecycle.name": "datastream_policy" - }, - "mappings": { - "properties": { - "@timestamp": { - "type": "date" - }, - "kubernetes": { - "properties": { - "container": { - "properties": { - "cpu": { - "properties": { - "usage": { - "properties": { - "core": { - "properties": { - "ns": { - "type": "long" - } - } - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "nanocores": { - "type": "long", - "time_series_metric": "gauge" - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - } - } - }, - "memory": { - "properties": { - "available": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "majorpagefaults": { - "type": "long" - }, - "pagefaults": { - "type": "long", - "time_series_metric": "gauge" - }, - "rss": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - }, - "usage": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - }, - "limit": { - "properties": { - "pct": { - "type": "float" - } - } - }, - "node": { - "properties": { - "pct": { - "type": "float" - } - } - } - } - }, - "workingset": { - "properties": { - "bytes": { - "type": "long", - "time_series_metric": "gauge" - } - } - } - } - }, - "name": { - "type": "keyword" - }, - "start_time": { - "type": "date" - } - } - }, - "host": { - "type": "keyword", - "time_series_dimension": true - }, - "namespace": { - "type": "keyword", - "time_series_dimension": true - }, - "node": { - "type": "keyword", - "time_series_dimension": true - }, - "pod": { - "type": "keyword", - "time_series_dimension": true - } - } - } - } - } - } -} -``` - - -## Ingest time series data [downsampling-ilm-ingest-data] - -Use a bulk API request to automatically create your TSDS and index a set of ten documents. - -**Important:** Before running this bulk request you need to update the timestamps to within three to five hours after your current time. That is, search `2022-06-21T15` and replace with your present date, and adjust the hour to your current time plus three hours. 
- -```console -PUT /datastream/_bulk?refresh -{"create": {}} -{"@timestamp":"2022-06-21T15:49:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":91153,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":463314616},"usage":{"bytes":307007078,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":585236},"rss":{"bytes":102728},"pagefaults":120901,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:45:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":124501,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":982546514},"usage":{"bytes":360035574,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1339884},"rss":{"bytes":381174},"pagefaults":178473,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:50Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":38907,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":862723768},"usage":{"bytes":379572388,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":431227},"rss":{"bytes":386580},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":86706,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":103266017,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1724908},"rss":{"bytes":105431},"pagefaults":233166,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:44:00Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":150069,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":639054643},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1786511},"rss":{"bytes":189235},"pagefaults":138172,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} 
-{"@timestamp":"2022-06-21T15:42:40Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":82260,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":854735585},"usage":{"bytes":309798052,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":924058},"rss":{"bytes":110838},"pagefaults":259073,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:42:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":153404,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":279586406},"usage":{"bytes":214904955,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1047265},"rss":{"bytes":91914},"pagefaults":302252,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:20Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":125613,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":822782853},"usage":{"bytes":100475044,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2109932},"rss":{"bytes":278446},"pagefaults":74843,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:40:10Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":100046,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":567160996},"usage":{"bytes":362826547,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":1986724},"rss":{"bytes":402801},"pagefaults":296495,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -{"create": {}} -{"@timestamp":"2022-06-21T15:38:30Z","kubernetes":{"host":"gke-apps-0","node":"gke-apps-0-0","pod":"gke-apps-0-0-0","container":{"cpu":{"usage":{"nanocores":40018,"core":{"ns":12828317850},"node":{"pct":2.77905e-05},"limit":{"pct":2.77905e-05}}},"memory":{"available":{"bytes":1062428344},"usage":{"bytes":265142477,"node":{"pct":0.01770037710617187},"limit":{"pct":9.923134671484496e-05}},"workingset":{"bytes":2294743},"rss":{"bytes":340623},"pagefaults":224530,"majorpagefaults":0},"start_time":"2021-03-30T07:59:06Z","name":"container-name-44"},"namespace":"namespace26"}} -``` - - -## View the results [downsampling-ilm-view-results] - -Now that you’ve created and added documents to the data stream, check to confirm the current state of the new index. - -```console -GET _data_stream -``` - -If the ILM policy has not yet been applied, your results will be like the following. Note the original `index_name`: `.ds-datastream--000001`. 
- -```console-result -{ - "data_streams": [ - { - "name": "datastream", - "timestamp_field": { - "name": "@timestamp" - }, - "indices": [ - { - "index_name": ".ds-datastream-2022.08.26-000001", - "index_uuid": "5g-3HrfETga-5EFKBM6R-w" - }, - { - "index_name": ".ds-datastream-2022.08.26-000002", - "index_uuid": "o0yRTdhWSo2pY8XMvfwy7Q" - } - ], - "generation": 2, - "status": "GREEN", - "template": "datastream_template", - "ilm_policy": "datastream_policy", - "hidden": false, - "system": false, - "allow_custom_routing": false, - "replicated": false, - "rollover_on_write": false, - "time_series": { - "temporal_ranges": [ - { - "start": "2022-08-26T13:29:07.000Z", - "end": "2022-08-26T19:29:07.000Z" - } - ] - } - } - ] -} -``` - -Next, run a search query: - -```console -GET datastream/_search -``` - -The query returns your ten newly added documents. - -```console-result -{ - "took": 17, - "timed_out": false, - "_shards": { - "total": 4, - "successful": 4, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 10, - "relation": "eq" - }, -... -``` - -By default, index lifecycle management checks every ten minutes for indices that meet policy criteria. Wait for about ten minutes (maybe brew up a quick coffee or tea β˜• ) and then re-run the `GET _data_stream` request. - -```console -GET _data_stream -``` - -After the ILM policy has taken effect, the original `.ds-datastream-2022.08.26-000001` index is replaced with a new, downsampled index, in this case `downsample-6tkn-.ds-datastream-2022.08.26-000001`. - -```console-result -{ - "data_streams": [ - { - "name": "datastream", - "timestamp_field": { - "name": "@timestamp" - }, - "indices": [ - { - "index_name": "downsample-6tkn-.ds-datastream-2022.08.26-000001", - "index_uuid": "qRane1fQQDCNgKQhXmTIvg" - }, - { - "index_name": ".ds-datastream-2022.08.26-000002", - "index_uuid": "o0yRTdhWSo2pY8XMvfwy7Q" - } - ], -... -``` - -Run a search query on the datastream (note that when querying downsampled indices there are [a few nuances to be aware of](./downsampling-concepts.md#querying-downsampled-indices-notes)). - -```console -GET datastream/_search -``` - -The new downsampled index contains just one document that includes the `min`, `max`, `sum`, and `value_count` statistics based off of the original sampled metrics. 
- -```console-result -{ - "took": 6, - "timed_out": false, - "_shards": { - "total": 4, - "successful": 4, - "skipped": 0, - "failed": 0 - }, - "hits": { - "total": { - "value": 1, - "relation": "eq" - }, - "max_score": 1, - "hits": [ - { - "_index": "downsample-6tkn-.ds-datastream-2022.08.26-000001", - "_id": "0eL0wC_4-45SnTNFAAABgtpz0wA", - "_score": 1, - "_source": { - "@timestamp": "2022-08-26T14:00:00.000Z", - "_doc_count": 10, - "kubernetes.host": "gke-apps-0", - "kubernetes.namespace": "namespace26", - "kubernetes.node": "gke-apps-0-0", - "kubernetes.pod": "gke-apps-0-0-0", - "kubernetes.container.cpu.usage.nanocores": { - "min": 38907, - "max": 153404, - "sum": 992677, - "value_count": 10 - }, - "kubernetes.container.memory.available.bytes": { - "min": 279586406, - "max": 1062428344, - "sum": 7101494721, - "value_count": 10 - }, - "kubernetes.container.memory.pagefaults": { - "min": 74843, - "max": 302252, - "sum": 2061071, - "value_count": 10 - }, - "kubernetes.container.memory.rss.bytes": { - "min": 91914, - "max": 402801, - "sum": 2389770, - "value_count": 10 - }, - "kubernetes.container.memory.usage.bytes": { - "min": 100475044, - "max": 379572388, - "sum": 2668170609, - "value_count": 10 - }, - "kubernetes.container.memory.workingset.bytes": { - "min": 431227, - "max": 2294743, - "sum": 14230488, - "value_count": 10 - }, - "kubernetes.container.cpu.usage.core.ns": 12828317850, - "kubernetes.container.cpu.usage.limit.pct": 0.000027790500098490156, - "kubernetes.container.cpu.usage.node.pct": 0.000027790500098490156, - "kubernetes.container.memory.majorpagefaults": 0, - "kubernetes.container.memory.usage.limit.pct": 0.00009923134348355234, - "kubernetes.container.memory.usage.node.pct": 0.017700377851724625, - "kubernetes.container.name": "container-name-44", - "kubernetes.container.start_time": "2021-03-30T07:59:06.000Z" - } - } - ] - } -} -``` - -Use the [data stream stats API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-data-streams-stats-1) to get statistics for the data stream, including the storage size. - -```console -GET /_data_stream/datastream/_stats?human=true -``` - -```console-result -{ - "_shards": { - "total": 4, - "successful": 4, - "failed": 0 - }, - "data_stream_count": 1, - "backing_indices": 2, - "total_store_size": "16.6kb", - "total_store_size_bytes": 17059, - "data_streams": [ - { - "data_stream": "datastream", - "backing_indices": 2, - "store_size": "16.6kb", - "store_size_bytes": 17059, - "maximum_timestamp": 1661522400000 - } - ] -} -``` - -This example demonstrates how downsampling works as part of an ILM policy to reduce the storage size of metrics data as it becomes less current and less frequently queried. - -You can also try our [Run downsampling manually](./run-downsampling-manually.md) example to learn how downsampling can work outside of an ILM policy. 
diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md index e42a1db748..d2ba107fc0 100644 --- a/manage-data/data-store/data-streams/run-downsampling.md +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -2,27 +2,43 @@ applies_to: stack: ga serverless: ga +navigation_title: "Run downsampling" +mapped_pages: + - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-manual.html + - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-ilm.html products: - id: elasticsearch --- # Run downsampling on time series data [running-downsampling] -:::{warning} -🚧 Work in progress 🚧 +:::{admonition} Page status +🟒 Ready for review ::: -% TODO consider retitling to "Downsample time series data" +% TODO consider retitling (cf. overview) -To downsample a time series index, you can use the `downsample API`, index lifecycle management (ILM), or a data stream lifecycle. +To downsample a time series data stream backing index, you can use the `downsample API`, index lifecycle management (ILM), or a data stream lifecycle. +:::{note} +Downsampling runs on the data stream backing index, not the data stream itself. +::: + +## Prerequisites + +Before you start, make sure your index is a candidate for downsampling: + +* The index must be **read-only**. You can roll over a write index and make it read-only. +* The index must have at least one metric field. + +For more details about the downsampling process, refer to [](downsampling-concepts.md). ::::{tab-set} :::{tab-item} Downsample API -## Use the downsample API +## Downsampling with the API -Issue a [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) request, setting `fixed_interval` to your preferred level of granularity: +Make a [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) request: ```console POST /my-time-series-index/_downsample/my-downsampled-time-series-index @@ -30,13 +46,16 @@ POST /my-time-series-index/_downsample/my-downsampled-time-series-index "fixed_interval": "1d" } ``` + +Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. + ::: :::{tab-item} Index lifecycle -## Downsample with index lifecycle management +## Downsampling with index lifecycle management -To downsample time series data as part of index lifecycle management (ILM), include a [downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy, setting `fixed_interval` to your preferred level of granularity: +To downsample time series data as part of index lifecycle management (ILM), include a [downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy: ```console PUT _ilm/policy/my_policy @@ -54,12 +73,85 @@ PUT _ilm/policy/my_policy } } ``` +Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. + +% TODO consider restoring removed tutorial-esque content + +In this example, an ILM policy is configured for the `hot` phase. The downsample action runs after the index is rolled over and the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed. 
+ +```console +PUT _ilm/policy/datastream_policy +{ + "policy": { + "phases": { + "hot": { + "actions": { + "rollover" : { + "max_age": "5m" + }, + "downsample": { + "fixed_interval": "1h" + } + } + } + } + } +} +``` + + ::: :::{tab-item} Data stream lifecycle -Move tutorial here +## Downsampling with data stream lifecycle management + +To downsample time series data as part of data lifecycle management, create an index template that includes a `lifecycle` section with a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) object. + +* Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. +* Set `after` to the minimum time to wait after an index rollover, before running downsampling. + +```console +PUT _index_template/datastream_template +{ + "index_patterns": [ + "datastream*" + ], + "data_stream": {}, + "template": { + "lifecycle": { + "downsampling": [ + { + "after": "1m", + "fixed_interval": "1h" + } + ] + }, + "settings": { + "index": { + "mode": "time_series" + } + }, + "mappings": { + "properties": { + "@timestamp": { + "type": "date" + }, + [...] + } + } + } +} +``` + + +For more details about index templates for time series data streams, refer to [](set-up-tsds.md). ::: :::: + +## Additional resources + +* [](downsampling-concepts.md) +* [](time-series-data-stream-tsds.md) diff --git a/manage-data/toc.yml b/manage-data/toc.yml index a87c9ab237..73bb6059a2 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -17,9 +17,6 @@ toc: - file: data-store/data-streams/downsampling-time-series-data-stream.md children: - file: data-store/data-streams/run-downsampling.md - - file: data-store/data-streams/run-downsampling-with-ilm.md - - file: data-store/data-streams/run-downsampling-manually.md - - file: data-store/data-streams/run-downsampling-using-data-stream-lifecycle.md - file: data-store/data-streams/downsampling-concepts.md - file: data-store/data-streams/reindex-tsds.md - file: data-store/data-streams/logs-data-stream.md From 4e15f58b4df76ad4c4b45b8d07b7079986d3a076 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Wed, 27 Aug 2025 14:30:25 -0400 Subject: [PATCH 07/20] More edits --- .../data-streams/downsampling-concepts.md | 69 ++++++++----------- .../downsampling-time-series-data-stream.md | 4 +- .../data-store/data-streams/set-up-tsds.md | 2 +- .../time-series-data-stream-tsds.md | 5 +- manage-data/toc.yml | 2 +- 5 files changed, 36 insertions(+), 46 deletions(-) diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index a5a065db6c..4f3688e9b4 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -1,4 +1,5 @@ --- +navigation_title: "Concepts" applies_to: stack: ga serverless: ga @@ -8,8 +9,8 @@ products: # Downsampling concepts [how-downsampling-works] -:::{warning} -🚧 Work in progress 🚧 +:::{admonition} Page status +🟒 Ready for review ::: A [time series](time-series-data-stream-tsds.md#time-series) is a sequence of observations taken over time for a specific entity. The observed samples can be represented as a continuous function, where the time series dimensions remain constant and the time series metrics change over time. 
@@ -18,20 +19,20 @@ A [time series](time-series-data-stream-tsds.md#time-series) is a sequence of ob :alt: time series function ::: -In an Elasticsearch index, a single document is created for each timestamp. The document contains the immutable time series dimensions, together with metric names and values. Several time series dimensions and metrics can be stored for a single timestamp. +In an {{es}} index, a single document is created for each timestamp. The document contains the immutable time series dimensions, plus metric names and values. Several time series dimensions and metrics can be stored for a single timestamp. :::{image} /manage-data/images/elasticsearch-reference-time-series-metric-anatomy.png :alt: time series metric anatomy ::: -For your most current and relevant data, the metrics series typically has a low sampling time interval, so it's optimized for queries that require a high data resolution. +For the most current data, the metrics series typically has a low sampling time interval, to optimize for queries that require a high data resolution. :::{image} /manage-data/images/elasticsearch-reference-time-series-original.png :alt: time series original :title: Original metrics series ::: -Downsampling reduces the footprint of older, less frequently accessed data by replacing the original time series with a data stream of a higher sampling interval, plus statistical representations of the data. For example, if the original metrics samples were taken every 10 seconds, as the data ages you might choose to reduce the sample granularity to hourly or daily. Or you might choose to reduce the granularity of `cold` archival data to monthly or less. +Downsampling reduces the footprint of older, less frequently accessed data by replacing the original time series with a data stream of a higher sampling interval, plus statistical representations of the data. For example, if the original metrics samples were taken every 10 seconds, you might choose to reduce the sample granularity to hourly as the data ages. Or you might choose to reduce the granularity of `cold` archival data to monthly or less. :::{image} /manage-data/images/elasticsearch-reference-time-series-downsampled.png :alt: time series downsampled @@ -39,21 +40,28 @@ Downsampling reduces the footprint of older, less frequently accessed data by re ::: -### The downsampling process [downsample-api-process] +## How downsampling works [downsample-api-process] The downsampling operation traverses the source TSDS index and performs the following steps: 1. Creates a new document for each value of the `_tsid` field and each `@timestamp` value, rounded to the `fixed_interval` defined in the downsampling configuration. 2. For each new document, copies all [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) from the source index to the target index. Dimensions in a TSDS are constant, so this step happens only once per bucket. -3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. The set of pre-aggregated results differs by metric field type: +3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. * `gauge` field type: * `min`, `max`, `sum`, and `value_count` are stored * `value_count` is stored as type `aggregate_metric_double` - * `counter field type: + * `counter` field type: * `last_value` is stored. 4. 
For all other fields, the most recent value is copied to the target index. +5. The original index is deleted and replaced by the downsampled index. Within a data stream, only one index can exist for a time period. + +The new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas. + +:::{tip} +You can downsample a downsampled index. The subsequent downsampling interval must be a multiple of the interval used in the preceding downsampling operation. +::: % TODO ^^ consider mini table in step 3; refactor generally @@ -65,46 +73,27 @@ Fields in the target downsampled index are created based on fields in the origin 2. **Metrics:** Fields mapped with the `time_series_metric` parameter are created in the target downsampled index with the same mapping as in the source index, with one exception: `time_series_metric: gauge` fields are changed to `aggregate_metric_double`. 3. **Labels:** Label fields (fields that are neither dimensions nor metrics) are created in the target downsampled index with the same mapping as in the source index. -% TODO ^^ make this more concise - -% first pass edits up to here -% TODO resume editing from this line down +% TODO ^^ make this more concise / a table? ## Querying downsampled indices [querying-downsampled-indices] -You can use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints to query a downsampled index. Multiple raw data and downsampled indices can be queried in a single request, and a single request can include downsampled indices at different granularities (different bucket timespan). That is, you can query data streams that contain downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). - -The result of a time based histogram aggregation is in a uniform bucket size and each downsampled index returns data ignoring the downsampling time interval. For example, if you run a `date_histogram` aggregation with `"fixed_interval": "1m"` on a downsampled index that has been downsampled at an hourly resolution (`"fixed_interval": "1h"`), the query returns one bucket with all of the data at minute 0, then 59 empty buckets, and then a bucket with data again for the next hour. - - -### Notes on downsample queries [querying-downsampled-indices-notes] - -There are a few things to note about querying downsampled indices: - -* When you run queries in {{kib}} and through Elastic solutions, a normal response is returned without notification that some of the queried indices are downsampled. -* For [date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md), only `fixed_intervals` (and not calendar-aware intervals) are supported. -* Timezone support comes with caveats: +To query a downsampled index, use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints. - * Date histograms at intervals that are multiples of an hour are based on values generated at UTC. This works well for timezones that are on the hour, e.g. +5:00 or -3:00, but requires offsetting the reported time buckets, e.g. 
`2020-01-01T10:30:00.000` instead of `2020-03-07T10:00:00.000` for timezone +5:30 (India), if downsampling aggregates values per hour. In this case, the results include the field `downsampled_results_offset: true`, to indicate that the time buckets are shifted. This can be avoided if a downsampling interval of 15 minutes is used, as it allows properly calculating hourly values for the shifted buckets.
-  * Date histograms at intervals that are multiples of a day are similarly affected, in case downsampling aggregates values per day. In this case, the beginning of each day is always calculated at UTC when generated the downsampled values, so the time buckets need to be shifted, e.g. reported as `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for timezone `America/New_York`. The field `downsampled_results_offset: true` is added in this case too.
-  * Daylight savings and similar peculiarities around timezones affect reported results, as [documented](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone) for date histogram aggregation. Besides, downsampling at daily interval hinders tracking any information related to daylight savings changes.

+* You can query multiple raw data and downsampled indices in a single request, and a single request can include downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`).
+* When you run queries in {{kib}} and through Elastic solutions, a standard response is returned, with no indication that some of the queried indices are downsampled.
+* [Date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) support `fixed_interval` only (not calendar-aware intervals).
+* Time-based histogram aggregations use a uniform bucket size, without regard to the downsampling time interval specified in the request.

+### Time zone offsets

+Date histograms are based on UTC values. Some time zone situations require offsetting (shifting the time buckets) when downsampling:
+
+* For time zone `+5:30` (India), offset by 30 minutes -- for example, `2020-03-07T10:30:00.000` instead of `2020-03-07T10:00:00.000`. Or use a downsampling interval of 15 minutes instead of offsetting.
+* For intervals based on days rather than hours, adjust the buckets to the appropriate time zone -- for example, `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for `America/New_York`.

+When offsetting is applied, responses include the field `downsampled_results_offset: true`.

+For more details, refer to [Date histogram aggregation: Time zone](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone).


-## Restrictions and limitations [downsampling-restrictions]

-The following restrictions and limitations apply for downsampling:

-* Only indices in a [time series data stream](time-series-data-stream-tsds.md) are supported.
-* Data is downsampled based on the time dimension only. All other dimensions are copied to the new index without any modification.
-* Within a data stream, a downsampled index replaces the original index and the original index is deleted. Only one index can exist for a given time period.
-* A source index must be in read-only mode for the downsampling process to succeed. Check the Run downsampling manually example for details.
-* Downsampling data for the same period many times (downsampling of a downsampled index) is supported. The downsampling interval must be a multiple of the interval of the downsampled index. -* A source index must be in read-only mode for the downsampling process to succeed. Check the Run downsampling manually example for details. -* Downsampling data for the same period many times (downsampling of a downsampled index) is supported. The downsampling interval must be a multiple of the interval of the downsampled index. -* Downsampling is provided as an ILM action. See [Downsample](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md). -* The new, downsampled index is created on the data tier of the original index and it inherits its settings (for example, the number of shards and replicas). -* The numeric `gauge` and `counter` [metric types](elasticsearch://reference/elasticsearch/mapping-reference/mapping-field-meta.md) are supported. -* The downsampling configuration is extracted from the time series data stream [index mapping](./set-up-tsds.md#create-tsds-index-template). The only additional required setting is the downsampling `fixed_interval`. diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 074ac8b88c..67a131130f 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -32,5 +32,5 @@ This section explains the available downsampling options and helps you understan ## Next steps % TODO confirm patterns -* [](run-downsampling.md) -* [](downsampling-concepts.md) \ No newline at end of file +* [](downsampling-concepts.md) +* [](run-downsampling.md) \ No newline at end of file diff --git a/manage-data/data-store/data-streams/set-up-tsds.md b/manage-data/data-store/data-streams/set-up-tsds.md index 3b37deea9a..58b5a76d13 100644 --- a/manage-data/data-store/data-streams/set-up-tsds.md +++ b/manage-data/data-store/data-streams/set-up-tsds.md @@ -12,7 +12,7 @@ products: # Set up a time series data stream [set-up-tsds] :::{warning} -🚧 Work in progress 🚧 +🚧 Work in progress, not ready for review 🚧 ::: To set up a [time series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md), complete these steps: diff --git a/manage-data/data-store/data-streams/time-series-data-stream-tsds.md b/manage-data/data-store/data-streams/time-series-data-stream-tsds.md index 4a3fcf52da..105269671a 100644 --- a/manage-data/data-store/data-streams/time-series-data-stream-tsds.md +++ b/manage-data/data-store/data-streams/time-series-data-stream-tsds.md @@ -1,6 +1,7 @@ --- mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html +navigation_title: "Time series data streams" applies_to: stack: ga serverless: ga @@ -11,7 +12,7 @@ products: # Time series data streams (TSDS) [tsds] :::{warning} -🚧 Work in progress 🚧 +🚧 Work in progress, not ready for review 🚧 ::: A time series data stream (TSDS) models timestamped metrics data as one or more time series. @@ -23,7 +24,7 @@ You can use a TSDS to store metrics data more efficiently. In our benchmarks, me Both a [regular data stream](../data-streams.md) and a TSDS can store timestamped metrics data. Only use a TSDS if you typically add metrics data to {{es}} in near real-time and `@timestamp` order. -A TSDS is only intended for metrics data. 
For other timestamped data, such as logs or traces, use a [logs data stream](logs-data-stream.md) or regular data stream. +Use a time series data stream for metrics data only. For other timestamped data, such as logs or traces, use a [logs data stream](logs-data-stream.md) or regular data stream. ## Differences from a regular data stream [differences-from-regular-data-stream] diff --git a/manage-data/toc.yml b/manage-data/toc.yml index 13c7a30177..ed835fa829 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -16,8 +16,8 @@ toc: - file: data-store/data-streams/set-up-tsds.md - file: data-store/data-streams/downsampling-time-series-data-stream.md children: - - file: data-store/data-streams/run-downsampling.md - file: data-store/data-streams/downsampling-concepts.md + - file: data-store/data-streams/run-downsampling.md - file: data-store/data-streams/reindex-tsds.md - file: data-store/data-streams/logs-data-stream.md - file: data-store/data-streams/failure-store.md From 8f4410011a10a4994bf5f6bf63a671adedc832c4 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Wed, 27 Aug 2025 14:42:47 -0400 Subject: [PATCH 08/20] more --- manage-data/data-store/data-streams/downsampling-concepts.md | 4 ++-- manage-data/data-store/data-streams/reindex-tsds.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index 4f3688e9b4..d78772effc 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -54,8 +54,8 @@ The downsampling operation traverses the source TSDS index and performs the foll * `counter` field type: * `last_value` is stored. -4. For all other fields, the most recent value is copied to the target index. -5. The original index is deleted and replaced by the downsampled index. Within a data stream, only one index can exist for a time period. +4. For all other fields, copies the most recent value to the target index. +5. Deletes the original index and replaces it with the downsampled index. Within a data stream, only one index can exist for a time period. The new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas. 
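+
+For example, to see the effect of these steps on field mappings, you can inspect the target index after downsampling (a sketch; the index name is illustrative):
+
+```console
+GET /my-downsampled-index/_mapping
+```
+
+In the response, `gauge` fields appear as `aggregate_metric_double` with `min`, `max`, `sum`, and `value_count`, while `counter` fields keep their original mapping.
+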
diff --git a/manage-data/data-store/data-streams/reindex-tsds.md b/manage-data/data-store/data-streams/reindex-tsds.md index 3b420618c9..3ddd7c0c2f 100644 --- a/manage-data/data-store/data-streams/reindex-tsds.md +++ b/manage-data/data-store/data-streams/reindex-tsds.md @@ -12,7 +12,7 @@ products: # Reindex a TSDS [tsds-reindex] :::{warning} -🚧 Work in progress 🚧 +🚧 Work in progress, not ready for review 🚧 ::: ## Introduction [tsds-reindex-intro] From 0dc96f94fdf69a7ee8dd3da253195a7ac9a1c5f7 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Mon, 8 Sep 2025 21:02:50 -0400 Subject: [PATCH 09/20] Apply suggestions from review Co-authored-by: Yannis Roussos --- .../data-streams/downsampling-time-series-data-stream.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 67a131130f..b159e21a8b 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -17,7 +17,7 @@ products: Downsampling reduces the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. -Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for increased storage space. +Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for decreased storage space. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the `min`, `max`, `sum`, and `value_count` for each metric. Data stream [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) are stored as is, with no changes. From 4a099278728474adfb6734b19e68ed2e078827e1 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Mon, 8 Sep 2025 21:03:09 -0400 Subject: [PATCH 10/20] Apply suggestions from review Co-authored-by: Yannis Roussos --- .../data-streams/downsampling-time-series-data-stream.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index b159e21a8b..17bbf52b79 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -22,7 +22,7 @@ Metrics tools and solutions collect large amounts of time series data over time. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the `min`, `max`, `sum`, and `value_count` for each metric. Data stream [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) are stored as is, with no changes. 
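+
+As a sketch of the output (the dimension and metric names here are hypothetical), one hour of 10-second gauge samples might be rolled up into a single summary document along these lines:
+
+```json
+{
+  "@timestamp": "2023-07-01T10:00:00.000Z",
+  "host": "node-1",
+  "cpu_usage": {
+    "min": 0.2,
+    "max": 0.9,
+    "sum": 120.5,
+    "value_count": 360
+  }
+}
+```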
:::{tip} -You can include downsampling in an [{{ilm}} ({{ilm-init}})](../../lifecycle/index-lifecycle-management.md) policy to automatically manage the volume and associated cost of your metrics data at it ages. +You can include downsampling in an [{{ilm}} ({{ilm-init}})](../../lifecycle/index-lifecycle-management.md) policy to automatically manage the volume and associated cost of your metrics data as it ages. ::: This section explains the available downsampling options and helps you understand the process. From 5d478f37534dd3f716f97e26efe09ea051945ee4 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Sun, 14 Sep 2025 21:33:23 -0400 Subject: [PATCH 11/20] Apply suggestions from review --- .../data-streams/downsampling-concepts.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index d78772effc..2a491a9697 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -9,17 +9,19 @@ products: # Downsampling concepts [how-downsampling-works] -:::{admonition} Page status -🟒 Ready for review +This page explains core downsampling concepts. + +:::{important} +Downsampling works with [time series data streams](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md) only. ::: -A [time series](time-series-data-stream-tsds.md#time-series) is a sequence of observations taken over time for a specific entity. The observed samples can be represented as a continuous function, where the time series dimensions remain constant and the time series metrics change over time. +A [time series](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md#time-series) is a sequence of observations taken over time for a specific entity. The observed samples can be represented as a continuous function, where the time series dimensions remain constant and the time series metrics change over time. :::{image} /manage-data/images/elasticsearch-reference-time-series-function.png :alt: time series function ::: -In an {{es}} index, a single document is created for each timestamp. The document contains the immutable time series dimensions, plus metric names and values. Several time series dimensions and metrics can be stored for a single timestamp. +In a time series data stream, a single document is created for each timestamp. The document contains the immutable time series dimensions, plus metric names and values. Several time series dimensions and metrics can be stored for a single timestamp. :::{image} /manage-data/images/elasticsearch-reference-time-series-metric-anatomy.png :alt: time series metric anatomy @@ -32,7 +34,7 @@ For the most current data, the metrics series typically has a low sampling time :title: Original metrics series ::: -Downsampling reduces the footprint of older, less frequently accessed data by replacing the original time series with a data stream of a higher sampling interval, plus statistical representations of the data. For example, if the original metrics samples were taken every 10 seconds, you might choose to reduce the sample granularity to hourly as the data ages. Or you might choose to reduce the granularity of `cold` archival data to monthly or less. 
+_Downsampling_ reduces the footprint of older, less frequently accessed data by replacing the original time series with a data stream of a higher sampling interval, plus statistical representations of the data. For example, if the original metrics samples were taken every 10 seconds, you might choose to reduce the sample granularity to hourly as the data ages. Or you might choose to reduce the granularity of `cold` archival data to monthly or less. :::{image} /manage-data/images/elasticsearch-reference-time-series-downsampled.png :alt: time series downsampled @@ -42,9 +44,12 @@ Downsampling reduces the footprint of older, less frequently accessed data by re ## How downsampling works [downsample-api-process] -The downsampling operation traverses the source TSDS index and performs the following steps: +Downsampling is applied to the individual backing indices of the TSDS. The downsampling operation traverses the source time series index and performs the following steps: + +1. Creates a new document for each group of documents with matching `_tsid` values (time series dimension fields), grouped into buckets that correspond to timestamps in a specific interval. + + For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within aa given hour interval are summarized and stored as a single document in the downsampled index. -1. Creates a new document for each value of the `_tsid` field and each `@timestamp` value, rounded to the `fixed_interval` defined in the downsampling configuration. 2. For each new document, copies all [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) from the source index to the target index. Dimensions in a TSDS are constant, so this step happens only once per bucket. 3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. From c68529cd465b83f604da660fb379a98ba7ef0de0 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Sun, 14 Sep 2025 22:52:47 -0400 Subject: [PATCH 12/20] Apply suggestions from review --- .../data-streams/downsampling-concepts.md | 30 +--- .../downsampling-time-series-data-stream.md | 14 +- .../data-streams/query-downsampled-data.md | 28 ++++ .../data-streams/run-downsampling.md | 151 +++++++----------- manage-data/toc.yml | 1 + 5 files changed, 93 insertions(+), 131 deletions(-) create mode 100644 manage-data/data-store/data-streams/query-downsampled-data.md diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index 2a491a9697..9555d61da3 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -54,13 +54,12 @@ Downsampling is applied to the individual backing indices of the TSDS. The downs 3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. * `gauge` field type: - * `min`, `max`, `sum`, and `value_count` are stored - * `value_count` is stored as type `aggregate_metric_double` + * `min`, `max`, `sum`, and `value_count` are stored as type `aggregate_metric_double` * `counter` field type: * `last_value` is stored. 4. For all other fields, copies the most recent value to the target index. -5. Deletes the original index and replaces it with the downsampled index. 
Within a data stream, only one index can exist for a time period. +5. Replaces the original index with the downsampled index, then deletes the original index. The new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas. @@ -72,33 +71,10 @@ You can downsample a downsampled index. The subsequent downsampling interval mus ### Source and target index field mappings [downsample-api-mappings] -Fields in the target downsampled index are created based on fields in the original source index, as follows: +Fields in the target downsampled index are created with the same mapping as in the source index, with one exception: `time_series_metric: gauge` fields are changed to `aggregate_metric_double`. -1. **Dimensions:** Fields mapped with the `time-series-dimension` parameter are created in the target downsampled index with the same mapping as in the source index. -2. **Metrics:** Fields mapped with the `time_series_metric` parameter are created in the target downsampled index with the same mapping as in the source index, with one exception: `time_series_metric: gauge` fields are changed to `aggregate_metric_double`. -3. **Labels:** Label fields (fields that are neither dimensions nor metrics) are created in the target downsampled index with the same mapping as in the source index. -% TODO ^^ make this more concise / a table? -## Querying downsampled indices [querying-downsampled-indices] - -To query a downsampled index, use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints. - -* You can query multiple raw data and downsampled indices in a single request, and a single request can include downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). -* When you run queries in {{kib}} and through Elastic solutions, a standard response is returned, with no indication that some of the queried indices are downsampled. -* [Date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) support `fixed_intervals` only (not calendar-aware intervals). -* Time-based histogram aggregations use a uniform bucket size, without regard to the downsampling time interval specified in the request. - -### Time zone offsets - -Date histograms are based on UTC values. Some time zone situations require offsetting (shifting the time buckets) when downsampling: - -* For time zone `+5:30` (India), offset by 30 minutes -- for example, `2020-01-01T10:30:00.000` instead of `2020-03-07T10:00:00.000`. Or use a downsampling interval of 15 minutes instead of offsetting. -* For intervals based on days rather than hours, adjust the buckets to the appropriate time zone -- for example, `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for `America/New_York`. - -When offsetting is applied, responses include the field `downsampled_results_offset: true`. - -For more details, refer to [Date histogram aggregation: Time zone](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone). 
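+
+To illustrate the mapping exception above with a hypothetical field name: a gauge mapped in the source index as `"cpu_usage": { "type": "double", "time_series_metric": "gauge" }` might be mapped in the downsampled index along these lines:
+
+```json
+"cpu_usage": {
+  "type": "aggregate_metric_double",
+  "metrics": [ "min", "max", "sum", "value_count" ],
+  "default_metric": "max"
+}
+```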
diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 17bbf52b79..736ab98557 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -1,5 +1,5 @@ --- -navigation_title: "Downsample a TSDS" +navigation_title: "Downsampling" mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling.html applies_to: @@ -9,22 +9,12 @@ products: - id: elasticsearch --- -# Downsample a time series data stream [downsampling] - -:::{admonition} Page status -🟒 Ready for review -::: +# Downsampling a time series data stream [downsampling] Downsampling reduces the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for decreased storage space. -The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the `min`, `max`, `sum`, and `value_count` for each metric. Data stream [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) are stored as is, with no changes. - -:::{tip} -You can include downsampling in an [{{ilm}} ({{ilm-init}})](../../lifecycle/index-lifecycle-management.md) policy to automatically manage the volume and associated cost of your metrics data as it ages. -::: - This section explains the available downsampling options and helps you understand the process. % TODO add subsection links and conceptual links after restructuring diff --git a/manage-data/data-store/data-streams/query-downsampled-data.md b/manage-data/data-store/data-streams/query-downsampled-data.md new file mode 100644 index 0000000000..106b2dc482 --- /dev/null +++ b/manage-data/data-store/data-streams/query-downsampled-data.md @@ -0,0 +1,28 @@ +--- +applies_to: + stack: ga + serverless: ga +navigation_title: "Query downsampled data" +products: + - id: elasticsearch +--- + +# Querying downsampled data [querying-downsampled-indices] + +To query a downsampled index, use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints. + +* You can query multiple raw data and downsampled indices in a single request, and a single request can include downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`). +* When you run queries in {{kib}} and through Elastic solutions, a standard response is returned, with no indication that some of the queried indices are downsampled. +* [Date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) support `fixed_intervals` only (not calendar-aware intervals). +* Time-based histogram aggregations use a uniform bucket size, without regard to the downsampling time interval specified in the request. + +## Time zone offsets + +Date histograms are based on UTC values. 
Some time zone situations require offsetting (shifting the time buckets) when downsampling:
+
+* For time zone `+5:30` (India), offset by 30 minutes -- for example, `2020-01-01T10:30:00.000` instead of `2020-01-01T10:00:00.000`. Or use a downsampling interval of 15 minutes instead of offsetting.
+* For intervals based on days rather than hours, adjust the buckets to the appropriate time zone -- for example, `2020-03-07T19:00:00.000` instead of `2020-03-07T00:00:00.000` for `America/New_York`.
+
+When offsetting is applied, responses include the field `downsampled_results_offset: true`.
+
+For more details, refer to [Date histogram aggregation: Time zone](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md#datehistogram-aggregation-time-zone).
\ No newline at end of file
diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md
index d2ba107fc0..9fae0dce33 100644
--- a/manage-data/data-store/data-streams/run-downsampling.md
+++ b/manage-data/data-store/data-streams/run-downsampling.md
@@ -2,7 +2,7 @@
 applies_to:
   stack: ga
   serverless: ga
-navigation_title: "Run downsampling"
+navigation_title: "Downsample data"
 mapped_pages:
   - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-manual.html
   - https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling-ilm.html
@@ -10,74 +10,63 @@ products:
   - id: elasticsearch
 ---
 
-# Run downsampling on time series data [running-downsampling]
+# Downsample time series data [running-downsampling]
 
-:::{admonition} Page status
-🟒 Ready for review
-:::
-
-% TODO consider retitling (cf. overview)
+To downsample a time series data stream (TSDS), you can use index lifecycle management (ILM) or a data stream lifecycle. (You can also use the [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) with an individual time series index, but most users don't need to use the API.)
 
-To downsample a time series data stream backing index, you can use the `downsample API`, index lifecycle management (ILM), or a data stream lifecycle.
+Before you begin, review the [](downsampling-concepts.md).
 
-:::{note}
-Downsampling runs on the data stream backing index, not the data stream itself.
+:::{important}
+Downsampling requires **read-only** data.
 :::
 
-## Prerequisites
+In most cases, you can choose the data stream lifecycle option. If you're using [data tiers](/manage-data/lifecycle/data-tiers.md) in {{stack}}, choose the index lifecycle option.
 
-Before you start, make sure your index is a candidate for downsampling:
+::::{tab-set}
 
-* The index must be **read-only**. You can roll over a write index and make it read-only.
-* The index must have at least one metric field.
 
-For more details about the downsampling process, refer to [](downsampling-concepts.md).
+:::{tab-item} Data stream lifecycle
 
-::::{tab-set}
-:::{tab-item} Downsample API
 
+## Downsample with a data stream lifecycle
+```{applies_to}
+stack: ga
+serverless: ga
+```
 
-## Downsampling with the API
+To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle.
 
-Make a [downsample API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-downsample) request:
+* Set `fixed_interval` to your preferred level of granularity. 
The original time series data will be aggregated at this interval. +* Set `after` to the minimum time to wait after an index rollover, before running downsampling. ```console -POST /my-time-series-index/_downsample/my-downsampled-time-series-index +PUT _data_stream/my-data-stream/_lifecycle { - "fixed_interval": "1d" + "data_retention": "7d", + "downsampling": [ + { + "after": "1m", + "fixed_interval": "10m" + }, + { + "after": "1d", + "fixed_interval": "1h" + } + ] } ``` - -Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. - ::: :::{tab-item} Index lifecycle ## Downsampling with index lifecycle management - -To downsample time series data as part of index lifecycle management (ILM), include a [downsample action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy: - -```console -PUT _ilm/policy/my_policy -{ -"policy": { - "phases": { - "warm": { - "actions": { - "downsample" : { - "fixed_interval": "1h" - } - } - } - } -} -} +```{applies_to} +stack: ga +serverless: unavailable ``` -Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. -% TODO consider restoring removed tutorial-esque content +To downsample time series data as part of index lifecycle management (ILM), include [downsample actions](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-downsample.md) in your ILM policy. You can configure multiple downsampling actions across different phases to progressively reduce data granularity over time. -In this example, an ILM policy is configured for the `hot` phase. The downsample action runs after the index is rolled over and the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed. +This example shows a policy with rollover and two downsampling actions: one in the hot phase for initial aggregation at 5-minute intervals, and another in the warm phase for further aggregation at 1-hour intervals: ```console PUT _ilm/policy/datastream_policy @@ -90,68 +79,46 @@ PUT _ilm/policy/datastream_policy "max_age": "5m" }, "downsample": { - "fixed_interval": "1h" + "fixed_interval": "5m" } } + }, + "warm": { + "actions": { + "downsample": { + "fixed_interval": "1h" + } + } } } } } ``` +Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. The downsample action runs after the index is rolled over and the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed. ::: +:::: -:::{tab-item} Data stream lifecycle - -## Downsampling with data stream lifecycle management - -To downsample time series data as part of data lifecycle management, create an index template that includes a `lifecycle` section with a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) object. - -* Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. -* Set `after` to the minimum time to wait after an index rollover, before running downsampling. 
+## Additional resources -```console -PUT _index_template/datastream_template -{ - "index_patterns": [ - "datastream*" - ], - "data_stream": {}, - "template": { - "lifecycle": { - "downsampling": [ - { - "after": "1m", - "fixed_interval": "1h" - } - ] - }, - "settings": { - "index": { - "mode": "time_series" - } - }, - "mappings": { - "properties": { - "@timestamp": { - "type": "date" - }, - [...] - } - } - } -} -``` +* [](downsampling-concepts.md) +* [](time-series-data-stream-tsds.md) +* [](set-up-tsds.md) +% :::{tab-item} Downsample API -For more details about index templates for time series data streams, refer to [](set-up-tsds.md). +% ## Downsampling with the API -::: +% Make a [downsample API] request: -:::: +% ```console +% POST /my-time-series-index/_downsample/my-downsampled-time-series-index +% { +% "fixed_interval": "1d" +% } +% ``` -## Additional resources +% Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. -* [](downsampling-concepts.md) -* [](time-series-data-stream-tsds.md) +% ::: diff --git a/manage-data/toc.yml b/manage-data/toc.yml index 91f08609b7..634324d64a 100644 --- a/manage-data/toc.yml +++ b/manage-data/toc.yml @@ -19,6 +19,7 @@ toc: children: - file: data-store/data-streams/downsampling-concepts.md - file: data-store/data-streams/run-downsampling.md + - file: data-store/data-streams/query-downsampled-data.md - file: data-store/data-streams/reindex-tsds.md - file: data-store/data-streams/logs-data-stream.md - file: data-store/data-streams/failure-store.md From 464947cff7b8f007cf3a62d1ec7255cf77baad2d Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Mon, 15 Sep 2025 18:56:30 -0400 Subject: [PATCH 13/20] Apply suggestions from review Co-authored-by: Mary Gouseti --- manage-data/data-store/data-streams/downsampling-concepts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/downsampling-concepts.md b/manage-data/data-store/data-streams/downsampling-concepts.md index 9555d61da3..e599f41d7e 100644 --- a/manage-data/data-store/data-streams/downsampling-concepts.md +++ b/manage-data/data-store/data-streams/downsampling-concepts.md @@ -48,7 +48,7 @@ Downsampling is applied to the individual backing indices of the TSDS. The downs 1. Creates a new document for each group of documents with matching `_tsid` values (time series dimension fields), grouped into buckets that correspond to timestamps in a specific interval. - For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within aa given hour interval are summarized and stored as a single document in the downsampled index. + For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within a given hour interval are summarized and stored as a single document in the downsampled index. 2. For each new document, copies all [time series dimensions](time-series-data-stream-tsds.md#time-series-dimension) from the source index to the target index. Dimensions in a TSDS are constant, so this step happens only once per bucket. 3. For each [time series metric](time-series-data-stream-tsds.md#time-series-metric) field, computes aggregations for all documents in the bucket. 
From d5af0d309ff414c26eca7260bb52f5cd04aa1211 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Mon, 15 Sep 2025 19:03:15 -0400 Subject: [PATCH 14/20] Note end time is respected --- manage-data/data-store/data-streams/run-downsampling.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md index 9fae0dce33..699da2e5ed 100644 --- a/manage-data/data-store/data-streams/run-downsampling.md +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -54,6 +54,8 @@ PUT _data_stream/my-data-stream/_lifecycle ] } ``` + +The downsampling action runs after the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed. ::: :::{tab-item} Index lifecycle From 5ee6333b4177e187c681e7197626cef0033aa4ac Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Mon, 15 Sep 2025 19:11:53 -0400 Subject: [PATCH 15/20] Suggestion from review --- manage-data/data-store/data-streams/run-downsampling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md index 699da2e5ed..59c550e2a9 100644 --- a/manage-data/data-store/data-streams/run-downsampling.md +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -33,7 +33,7 @@ stack: ga serverless: ga ``` -To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle. +To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams). * Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. * Set `after` to the minimum time to wait after an index rollover, before running downsampling. From bad2def7150b4895b086d2cb94dce351d1e949cb Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 16 Sep 2025 09:11:34 -0400 Subject: [PATCH 16/20] remove review status indicators --- manage-data/data-store/data-streams/reindex-tsds.md | 4 ---- manage-data/data-store/data-streams/set-up-tsds.md | 4 ---- 2 files changed, 8 deletions(-) diff --git a/manage-data/data-store/data-streams/reindex-tsds.md b/manage-data/data-store/data-streams/reindex-tsds.md index 3ddd7c0c2f..773eb1a40d 100644 --- a/manage-data/data-store/data-streams/reindex-tsds.md +++ b/manage-data/data-store/data-streams/reindex-tsds.md @@ -11,10 +11,6 @@ products: # Reindex a TSDS [tsds-reindex] -:::{warning} -🚧 Work in progress, not ready for review 🚧 -::: - ## Introduction [tsds-reindex-intro] With reindexing, you can copy documents from an old [time-series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md) to a new one. Data streams support reindexing in general, with a few [restrictions](use-data-stream.md#reindex-with-a-data-stream). Still, time-series data streams introduce additional challenges due to tight control on the accepted timestamp range for each backing index they contain. 
Direct use of the reindex API would likely error out due to attempting to insert documents with timestamps that are outside the current acceptance window. diff --git a/manage-data/data-store/data-streams/set-up-tsds.md b/manage-data/data-store/data-streams/set-up-tsds.md index b1b8b71a26..1fa7aca55b 100644 --- a/manage-data/data-store/data-streams/set-up-tsds.md +++ b/manage-data/data-store/data-streams/set-up-tsds.md @@ -11,10 +11,6 @@ products: # Set up a time series data stream [set-up-tsds] -:::{warning} -🚧 Work in progress, not ready for review 🚧 -::: - To set up a [time series data stream (TSDS)](../data-streams/time-series-data-stream-tsds.md), complete these steps: 1. Check the [prerequisites](#tsds-prereqs). From b997a50b6f33c6c4d9978d1faf5fd862e896ed48 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 16 Sep 2025 09:17:45 -0400 Subject: [PATCH 17/20] revert earlier change and clarify --- .../data-streams/downsampling-time-series-data-stream.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 736ab98557..666220b558 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -13,7 +13,7 @@ products: Downsampling reduces the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. -Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for decreased storage space. +Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. You can _downsample_ older data, reducing resolution and precision in exchange for increased storage space. This section explains the available downsampling options and helps you understand the process. From 67f07cde6c879507e998cc42a5f3fb764d3f6ae5 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 16 Sep 2025 09:24:58 -0400 Subject: [PATCH 18/20] what i meant was --- .../data-streams/downsampling-time-series-data-stream.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 666220b558..8d28000d6c 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -13,7 +13,7 @@ products: Downsampling reduces the footprint of your [time series data](time-series-data-stream-tsds.md) by storing it at reduced granularity. -Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. You can _downsample_ older data, reducing resolution and precision in exchange for increased storage space. +Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. 
You can _downsample_ older data to reduce its resolution and precision, freeing up storage space. This section explains the available downsampling options and helps you understand the process. From 3810e30877b81b36d57b3406aad8d2d53468f7e9 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 16 Sep 2025 10:03:40 -0400 Subject: [PATCH 19/20] slightly better? --- .../data-streams/downsampling-time-series-data-stream.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md index 8d28000d6c..6a12f625c5 100644 --- a/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md +++ b/manage-data/data-store/data-streams/downsampling-time-series-data-stream.md @@ -8,6 +8,7 @@ applies_to: products: - id: elasticsearch --- +% TODO flesh out after the rest of the section has been restructured # Downsampling a time series data stream [downsampling] @@ -17,10 +18,5 @@ Metrics tools and solutions collect large amounts of time series data over time. This section explains the available downsampling options and helps you understand the process. -% TODO add subsection links and conceptual links after restructuring - -## Next steps -% TODO confirm patterns - * [](downsampling-concepts.md) * [](run-downsampling.md) \ No newline at end of file From 8532499b453778ebfced8d46a3d53fb071040613 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Tue, 16 Sep 2025 12:11:05 -0400 Subject: [PATCH 20/20] Apply suggestion from review Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com> --- manage-data/data-store/data-streams/run-downsampling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/manage-data/data-store/data-streams/run-downsampling.md b/manage-data/data-store/data-streams/run-downsampling.md index 59c550e2a9..ddf07341c7 100644 --- a/manage-data/data-store/data-streams/run-downsampling.md +++ b/manage-data/data-store/data-streams/run-downsampling.md @@ -33,7 +33,7 @@ stack: ga serverless: ga ``` -To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams). +To downsample a time series via a [data stream lifecycle](/manage-data/lifecycle/data-stream.md), add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the data stream lifecycle (for existing data streams) or the index template (for new data streams). * Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. * Set `after` to the minimum time to wait after an index rollover, before running downsampling.