Add practical tips for downsampling. #3340
Open
gmarouli wants to merge 7 commits into main from downsampling-practical-tips
+32
−0
Changes from 5 commits
Commits
ccceb1e Add practical tips for downsampling (gmarouli)
542ef9a Add ILM specific section (gmarouli)
1e32d3c Fix link to migrate action (gmarouli)
69f7ddf Apply review comments (gmarouli)
4dcb367 Merge branch 'main' into downsampling-practical-tips (gmarouli)
2c7aac1 Apply suggestions from code review (gmarouli)
566abe3 Rearrange ILM tips (gmarouli)
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -102,6 +102,38 @@ Set `fixed_interval` to your preferred level of granularity. The original time s | |||||
::: | ||||||
:::: | ||||||
|
||||||
## Practical tips | ||||||
|
||||||
Downsampling requires reading and indexing the contents of a backing index. The following guidelines can help you get the most out of it. | ||||||
|
||||||
### Choosing the downsampling interval | ||||||
|
||||||
When choosing the downsampling interval, consider the original sampling rate of your measurements. Ideally, the interval should reduce the number of documents significantly. For example, if a sensor sends data every 10 seconds, downsampling to 1 minute reduces the number of documents by about 83%, while downsampling to 5 minutes reduces it by about 97%. | ||||||
|
||||||
|
||||||
The same applies when downsampling already downsampled data. | ||||||
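
The arithmetic behind these percentages can be sketched as follows. This is a minimal illustration, assuming a single time series that reports at a steady rate and one downsampled document per interval; the function name `document_reduction` is hypothetical and not part of any Elasticsearch API.

```python
def document_reduction(sampling_interval_s: float, downsampling_interval_s: float) -> float:
    """Return the fraction of documents removed by downsampling.

    Assumes one raw document per sampling interval and one downsampled
    document per downsampling interval, for a single time series.
    """
    if sampling_interval_s <= 0 or downsampling_interval_s < sampling_interval_s:
        raise ValueError("downsampling interval must be >= a positive sampling interval")
    return 1 - sampling_interval_s / downsampling_interval_s


# A sensor reporting every 10 seconds, downsampled to 1 minute and to 5 minutes.
for interval_s in (60, 300):
    reduction = document_reduction(10, interval_s)
    print(f"10s -> {interval_s}s: {reduction:.1%} fewer documents")
```

The same function covers a second downsampling round: pass the first round's interval as the sampling interval, for example `document_reduction(60, 3600)` for 1-minute data downsampled to 1 hour.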
|
||||||
### Downsampling with Index Lifecycle Management | ||||||
|
||||||
The following tips apply to data streams downsampled by index lifecycle management (ILM). | ||||||
|
||||||
#### Reducing index size | ||||||
|
||||||
When configuring an ILM policy with downsampling, you must define the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) in the `hot` phase. The rollover action specifies the conditions that trigger a rollover, so it determines the size of an index and its shards. The size of an index can influence the impact that downsampling has on a cluster's performance. | ||||||
|
||||||
The downsampling operation runs over a whole index, so in certain cases downsampling can increase the load on a cluster. One way to reduce that load is to reduce the size of the index, which results in smaller downsampling tasks that are distributed more evenly. You can achieve that either by reducing the number of primary shards or by using the [`max_primary_shard_docs`](https://www.elastic.co/docs/reference/elasticsearch/index-lifecycle-actions/ilm-rollover#ilm-rollover-options) setting to reduce the number of documents in a single shard. Using a value lower than the default of 200 million is expected to help smooth out load spikes caused by downsampling. | ||||||
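
As a rough sketch of this tip, the snippet below builds an ILM policy body whose `hot` phase rolls over at a lower document count per primary shard and then downsamples. The helper name `hot_phase_policy`, the 50 million threshold, and the `1h` interval are illustrative assumptions, not recommended values; the printed JSON is the kind of body you would send to `PUT _ilm/policy/<policy-name>`.

```python
import json


def hot_phase_policy(max_primary_shard_docs: int, fixed_interval: str) -> dict:
    """Build an ILM policy body that rolls over early and downsamples in the hot phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over once a primary shard holds this many documents
                        # (hypothetical value, below the 200 million default).
                        "rollover": {"max_primary_shard_docs": max_primary_shard_docs},
                        # Downsample the rolled-over index to this granularity.
                        "downsample": {"fixed_interval": fixed_interval},
                    }
                }
            }
        }
    }


policy = hot_phase_policy(max_primary_shard_docs=50_000_000, fixed_interval="1h")
print(json.dumps(policy, indent=2))
```

A suitable threshold depends on your ingest rate and cluster resources, so treat the value above as a starting point for experimentation only.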
|
||||||
#### Phases and tiers | ||||||
|
||||||
|
||||||
When using ILM, you can define at most one downsampling round in each of the following phases: | ||||||
|
||||||
- `hot` phase: downsampling runs after the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time) has passed | ||||||
- `warm` phase: downsampling runs once `min_age` has elapsed since the rollover (respecting the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time)) | ||||||
- `cold` phase: downsampling runs once `min_age` has elapsed since the rollover (respecting the [index time series end time](elasticsearch://reference/elasticsearch/index-settings/time-series.md#index-time-series-end-time)) | ||||||
|
||||||
The phases do not require the respective tiers to exist. However, when a cluster has tiers, ILM automatically migrates the data processed in the phase to the respective tier. This can be disabled by adding the [migrate action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-migrate.md#ilm-migrate-options) with `enabled: false`. | ||||||
|
||||||
The migrate action is enabled implicitly, so unless you disable it explicitly, the downsampled data moves to the respective tier. Downsampling runs on the same tier as the source index, and only the downsampled data is then migrated. This design lets downsampling use the better resources of the "hotter" tier and moves less data to the next tier. | ||||||
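
To make the phase and tier behavior concrete, here is a minimal sketch of a policy with one downsampling round in `warm` and one in `cold`, where the `warm` phase opts out of tier migration. The helper name `downsample_phase` and all `min_age`, `max_age`, and interval values are hypothetical and chosen only for illustration.

```python
import json


def downsample_phase(min_age: str, fixed_interval: str, migrate: bool = True) -> dict:
    """Build one ILM phase that downsamples, optionally disabling tier migration."""
    actions = {"downsample": {"fixed_interval": fixed_interval}}
    if not migrate:
        # The migrate action is implicit; it only needs to appear when disabling it.
        actions["migrate"] = {"enabled": False}
    return {"min_age": min_age, "actions": actions}


warm = downsample_phase("1d", "1h", migrate=False)  # data stays on its current tier
cold = downsample_phase("7d", "1d")                 # implicit migrate to the cold tier

policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "1d"}}},
            "warm": warm,
            "cold": cold,
        }
    }
}
print(json.dumps(policy, indent=2))
```

In this sketch the `cold` interval (`1d`) is a multiple of the `warm` interval (`1h`), since a later round works on data that has already been downsampled.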
|
||||||
|
||||||
## Additional resources | ||||||
|
||||||
* [](downsampling-concepts.md) | ||||||
|
Do we need a note about rollover? To avoid creating backing indices that are too big..
I have been going back and forth on this. For ILM it's easy because it's part of the policy; for data stream lifecycle, I would suggest that if we really think it should be lower, maybe we should set it to something lower. Right?
You mean, update the default? We can do that at a later point, but what about older versions, or ILM configurations with existing rollover overrides? It could still help to suggest a best practice here.
Yes, we could update the default; that would apply to all versions unless the user chose to override it. I restructured it a bit so we can have ILM-focused recommendations. But if we think it should be reduced, we should consider updating the default for DLM as well.
Let's file a tracking issue for this, so that we don't forget.