Add practical tips for downsampling. #3340
Can we wait for Marci to submit her update, then port these tips to the new structure?
## Practical tips

Downsampling requires reading and indexing the contents of a backing index. The following guidelines can help you get the most out of it.
Do we need a note about rollover, to avoid creating backing indices that are too big?
I have been going back and forth on this. For ILM it's easy because rollover is part of the policy. For data stream lifecycle, if we really think the threshold should be lower, maybe we should set the default to something lower. Right?
You mean, update the default? We can do that at a later point, but what about older versions, or ILM configurations with existing rollover overrides? It could still help to suggest a best practice here.
Yes, we could update the default; that would apply to all versions unless the user chose to override it. I restructured it a bit so we can have ILM-focused recommendations. But if we think it should be reduced, we should consider updating the default for DLM as well.
Let's file a tracking issue for this, so that we don't forget.
It is created on the updated downsampling page, right?
Co-authored-by: Kostas Krikellas <[email protected]>
The downsampling operation runs over a whole index, so in certain cases it can increase the load on a cluster. One way to reduce that load is to reduce the size of the index, so that downsampling tasks are smaller and better distributed. You can achieve that either by reducing the number of primary shards or by using the `max_primary_shard_docs` rollover condition to reduce the number of documents in a single shard. Elasticsearch already enforces an upper limit of 200 million for `max_primary_shard_docs`, but reducing it to 180 or 150 million could be beneficial. Experiment and monitor the effect of such changes, since their impact varies depending on the use case.
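
As a rough illustration of the `max_primary_shard_docs` tip, the sketch below builds an ILM policy body whose hot phase rolls over at 150 million documents per primary shard. The helper name `build_rollover_policy` and the chosen value are assumptions for illustration only; the right threshold depends on your workload, and the body would still need to be sent to the cluster (for example with `PUT _ilm/policy/<policy-name>`).

```python
import json

# Limit quoted in the paragraph above; values beyond it would not shrink shards.
ENFORCED_MAX_DOCS = 200_000_000

# Hypothetical starting point; the text suggests trying 180 or 150 million.
SUGGESTED_MAX_DOCS = 150_000_000


def build_rollover_policy(max_primary_shard_docs: int) -> dict:
    """Return an ILM policy body that rolls over on primary shard doc count."""
    if not 0 < max_primary_shard_docs <= ENFORCED_MAX_DOCS:
        raise ValueError(
            f"max_primary_shard_docs must be between 1 and {ENFORCED_MAX_DOCS}"
        )
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over once any primary shard reaches this many docs.
                        "rollover": {
                            "max_primary_shard_docs": max_primary_shard_docs
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body so it can be pasted into a request.
    print(json.dumps(build_rollover_policy(SUGGESTED_MAX_DOCS), indent=2))
```

This only covers the rollover condition. A real policy would typically also define the downsample action and any other phases you rely on.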
#### Phases and tiers
Nit: I'd move this above reducing index size, it's more important iiuc.
Co-authored-by: Kostas Krikellas <[email protected]>
In this PR, we propose adding practical tips for downsampling. For now, this includes a guideline on how to choose the downsampling interval and, specifically for ILM, an explanation of how downsampling relates to tiers. After elastic/elasticsearch#135834, we should also add the option to disable force merge here.