Skip to content
Open
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions manage-data/data-store/data-streams/run-downsampling.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,34 @@ Set `fixed_interval` to your preferred level of granularity. The original time s
:::
::::

## Best practices

This section provides some best practices for downsampling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a note about rollover? To avoid creating backing indices that are too big..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been going back and forth for this. For ILM it's easy because it's part of the policy, for data stream lifecycle, I would suggest that if we really think that it should be less maybe we should set it to something less. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, update the default? We can do that at a later point, but what about older versions, or ILM configurations with existing rollover overrides? It could still help to suggest a best practice here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could update the default, that would apply on all version unless the user chose to overwrite it. I restructure it a bit so we can have ILM focused recommendations. But if we think it should be reduced, we should consider updating the default for DLM as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file a tracking issue for this, so that we don't forget.

### Choose an optimal downsampling interval

When choosing the downsampling interval, make sure to consider the original sampling rate of your measurements. Use an interval that reduces the number of documents by a significant percentage. For example, if a sensor sends data every 10 seconds, downsampling to 1 minute would reduce the number of documents by 83%. Downsampling to 5 minutes instead would reduce the number by 96%.

The same applies when downsampling already downsampled data.

### Understand downsampling phases (ILM only)

When using [index lifecycle management](/manage-data/lifecycle/index-lifecycle-management.md) (ILM), you can define at most one downsampling round in each of the following phases:

- `hot` phase: Runs after the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) passes
- `warm` phase: Runs after the `min_age` time (starting to count after the rollover and respecting the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this read well @marciw ? Considering that we want to stress that rollover marks "the start of time"?

- `cold` phase: Runs after the `min_age` time (starting to count after the rollover and respecting the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time)

Phases don't require matching tiers. If a matching tier exists for the phase, ILM automatically migrates the data to the respective tier. To prevent this, add a [migrate action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-migrate.md#ilm-migrate-options) and specify `enabled: false`.

If you leave the default migrate action enabled, downsampling runs on the tier of the source index, which typically has more resources. The smaller, downsampled data is then migrated to the next tier.

### Reduce the index size (ILM only)

When configuring an ILM policy with downsampling, use the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) in the `hot` phase to control index size. Using smaller indices helps to minimize the impact of downsampling on a cluster's performance.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important here to say use and not define. When writing an ILM policy a user needs to define a rollover action no matter what. However, if they are using downsampling, they can consider using this to reduce the size of their index. I want us to be careful to not imply that a user needs to define it only if they are trying to reduce the size. @marciw does this make sense?


Because the downsampling operation processes an entire index at once, it can increase the load on the cluster. Smaller indices improve task distribution. To reduce the index size, limit the number of primary shards or use [`max_primary_shard_docs`](https://www.elastic.co/docs/reference/elasticsearch/index-lifecycle-actions/ilm-rollover#ilm-rollover-options) to cap documents per shard. Specify a lower value than the default of 200 million, to help prevent load spikes due to downsampling.

## Additional resources

* [](downsampling-concepts.md)
Expand Down
Loading