Updating the Rollover page #2589
Conversation
As per internal feedback, we're updating this page. Fixes #1563
🔍 Preview links for changed docs
Thanks for taking a look at this, I left quite a few comments.
Thank you so much for reviewing, @dakrone! I've addressed your comments (hopefully correctly) -- that's been very useful. One thing I struggle to understand, and maybe you can help with, is when, or under what conditions, a user would prefer to rotate their indices using aliases as opposed to data streams. Would that be attributable mostly to legacy reasons, or is there a specific problem they would be solving that way? I'm wondering how to provide more context around when and why one would do that. The text previously said:
And as you've pointed out, documents can be updated or deleted, using the API, in the backing index that contains them. So this is not a relevant reason for choosing the aliases approach.
Certainly, the specific reason that users use aliases and ILM is that they can then avoid rollover. For example, if a user does time-based indices, they can have an alias called …

The main reason that users wish to avoid rollover is to solve the problem of duplicate handling with rollover. It's something we plan to address, but it's a real issue for some users.
@@ -9,50 +9,96 @@ products:

# Rollover [index-rollover]

When indexing time series data like logs or metrics, you can’t write to a single index indefinitely. To meet your indexing and search performance requirements and manage resource usage, you write to an index until some threshold is met and then create a new index and start writing to it instead. Using rolling indices enables you to:
In {{es}}, the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) replaces your active write index with a new one whenever your index grows too large, too old, or stores too many documents. |
In {{es}}, the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) replaces your active write index with a new one whenever your index grows too large, too old, or stores too many documents.
In {{es}}, the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) replaces your active write index with a new one whenever your index grows beyond a specified size, age, or number of documents.
Just a thought. I think someone might interpret "too large", "too old", etc. as meaning that Elasticsearch struggles to manage the index in these cases. Maybe the phrasing can be more positive, suggesting just that rollover gives the user better control over these things (as a way to optimize for performance, and retain data according to requirements, as you have in the next sentence).
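For concreteness (not part of the proposed page text), here is a minimal sketch of an ILM policy whose hot phase rolls the write index over on size, age, or document count; the policy name and threshold values are illustrative assumptions:

```console
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d",
            "max_docs": 100000000
          }
        }
      }
    }
  }
}
```

Rollover then happens as soon as any one of the configured conditions is met.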
We recommend using [data streams](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create-data-stream) to manage time series data. Data streams automatically track the write index while keeping configuration to a minimum.
The rollover feature is an important part of how [index lifecycle](../index-lifecycle-management/index-lifecycle.md) (ILM) and [data stream lifecycles](../data-stream.md) (DLM) work to keep your indices fast and manageable. By switching the write target of an index, the rollover action provides the following benefits: |
@dakrone I think this and line 19 are based off of your comment. Do you think it's okay for us to introduce "DLM" here as an acronym for "data stream lifecycle management"? I'm a little hesitant since I gather DLM is already in popular use as "data lifecycle management" (e.g. IBM, HP), and our use here specifically to denote "data stream lifecycle" doesn't quite match. That is, to my mind "ILM" and "data stream lifecycle" are two different approaches to DLM.
What would you think of our sticking with:
- index lifecycle management (acronym: "ILM")
- data stream lifecycle (no official acronym)
If we do introduce an acronym for "data stream lifecycle" we should also update this main page about it.
I don't think we can stick with ILM to refer to the data stream's lifecycle, since they are different (ILM has policies, is attached at the index level, has separate APIs, is not available in Serverless, etc).
The challenge that we've run into with "data stream lifecycle" is that absent an acronym, folks have been calling it "DSL", which is a MUCH more confusing acronym than "DLM", so we've been substituting DLM for the name to differentiate it from ILM, but still indicate it is the "Data (in the data stream) Lifecycle Management".
Roger that. Thanks for the clarification! DLM it is. :-)
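For context on the ILM vs. data stream lifecycle (DLM) distinction discussed here, a rough sketch of how each is attached; the index, policy, and data stream names are made up:

```console
# ILM: attach a named policy to an index (policies are also commonly set via index templates)
PUT my-index-000001/_settings
{
  "index.lifecycle.name": "my-rollover-policy"
}

# Data stream lifecycle (DLM): configure the lifecycle directly on the data stream
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "30d"
}
```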
::::{note}
When an index is rolled over, the previous index’s age is updated to reflect the rollover time. This date, rather than the index’s `creation_date`, is used in {{ilm}} `min_age` phase calculations. [Learn more](../../../troubleshoot/elasticsearch/index-lifecycle-management-errors.md#min-age-calculation).
* **Size** - an index will rollover when its shards reach a set size, for example 50 GB.
* **Age** - an index will rollover when an index reaches a certain age, for example 7 days. |
* **Age** - an index will rollover when an index reaches a certain age, for example 7 days.
* **Age** - an index will rollover when it reaches a certain age, for example 7 days.
::::

After rollover, indices move to other index lifecycle phases like warm, cold, frozen, and delete. Rollover creates a new write index while the old one continues through the lifecycle phases. |
After rollover, indices move to other index lifecycle phases like warm, cold, frozen, and delete. Rollover creates a new write index while the old one continues through the lifecycle phases.
After rollover, indices move through other configured index lifecycle phases: warm, cold, frozen, and/or delete. Rollover creates a new write index while the old one continues through the lifecycle phases.
Would this be better? I'd probably avoid "like", since that seems to me to imply that "warm", "cold", etc. are examples, rather than an exhaustive list.
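Extending the earlier policy sketch, a hedged example of what "continues through the lifecycle phases" can look like: after rollover, the previous write index moves into warm and eventually delete, with `min_age` measured from the rollover time (see the note above). The phase timings are illustrative:

```console
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "7d" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```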
## Automatic rollover [ilm-automatic-rollover]
* Rollover for an empty write index is skipped even if they have an associated `max_age` that would otherwise result in a roll over occurring. A policy can override this behavior if you set `min_docs: 0` in the rollover conditions. This can also be disabled on a cluster-wide basis if you set `indices.lifecycle.rollover.only_if_has_documents` to `false`. |
* Rollover for an empty write index is skipped even if they have an associated `max_age` that would otherwise result in a roll over occurring. A policy can override this behavior if you set `min_docs: 0` in the rollover conditions. This can also be disabled on a cluster-wide basis if you set `indices.lifecycle.rollover.only_if_has_documents` to `false`.
* Rollover for an empty write index is skipped even if it has an associated `max_age` that would otherwise result in a rollover occurring. A policy can override this behavior if you set `min_docs: 0` in the rollover conditions. This can also be disabled on a cluster-wide basis if you set `indices.lifecycle.rollover.only_if_has_documents` to `false`.
::::{important}
Empty indices will not be rolled over, even if they have an associated `max_age` that would otherwise result in a roll over occurring. A policy can override this behavior, and explicitly opt in to rolling over empty indices, by adding a `"min_docs": 0` condition. This can also be disabled on a cluster-wide basis by setting `indices.lifecycle.rollover.only_if_has_documents` to `false`.
::::
^1^ Rollover is handled automatically in Serverless projects, therefore configuring rollover timing is abstracted from the user. {applies_to}`serverless: ga` |
^1^ Rollover is handled automatically in Serverless projects, therefore configuring rollover timing is abstracted from the user. {applies_to}`serverless: ga`
^1^ Rollover is handled automatically in {{es-serverless}} projects, therefore configuring rollover timing is abstracted from the user. {applies_to}`serverless: ga`
I think I'd also remove "therefore configuring rollover timing is abstracted from the user", since we usually avoid speaking about "the user" in favour of addressing users directly.
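For reference, a sketch of the two opt-in mechanisms mentioned above for rolling over empty indices; the policy name is hypothetical:

```console
# Policy-level: explicitly allow an empty write index to roll over on age
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "min_docs": 0
          }
        }
      }
    }
  }
}

# Cluster-wide: disable the "only roll over if the index has documents" safeguard
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.rollover.only_if_has_documents": false
  }
}
```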
### Rotating your indices with data streams [rollover-data-stream]
We recommend using [data streams](../../data-store/data-streams.md) to manage time series data. When set up to use ILM policies that include rollover, data streams automatically manage the rotation of your indices. This ensures you can write to the data stream without additional configuration. |
We recommend using [data streams](../../data-store/data-streams.md) to manage time series data. When set up to use ILM policies that include rollover, data streams automatically manage the rotation of your indices. This ensures you can write to the data stream without additional configuration.
We recommend using [data streams](../../data-store/data-streams.md) to manage time series data. When set up to use an ILM policy that includes rollover, a data stream automatically manages the rotation of your indices. This ensures you can write to the data stream without additional configuration.
Maybe use singular for simplicity and to match the next sentence at line 65.
When targeting a data stream, the new backing index becomes the data stream's writing index. The generation of new backing indices is incremented automatically when it reaches a specified age or size. |
When targeting a data stream, the new backing index becomes the data stream's writing index. The generation of new backing indices is incremented automatically when it reaches a specified age or size.
When you use a data stream, each time the current write index reaches a specified age or size, a new backing index is generated, with an incremented number and timestamp, and it becomes the data stream's writing index.
Sorry to belabour this one. Would this phrasing maybe be clearer? Feel free to ignore. :-)
```console
.ds-<DATA-STREAM-NAME>-<yyyy.MM.dd>-<GENERATION>
```
For more information about data stream naming patterns, refer to the [Generation](../../data-store/data-streams.md#data-streams-generation) section of the Data streams page. |
For more information about data stream naming patterns, refer to the [Generation](../../data-store/data-streams.md#data-streams-generation) section of the Data streams page.
For more information about the data stream naming pattern, refer to the [Generation](../../data-store/data-streams.md#data-streams-generation) section of the Data streams page.
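To illustrate how generations increment, a brief sketch; the data stream name is made up, and in practice ILM or the data stream lifecycle triggers the rollover automatically:

```console
# Manually roll over the data stream; a new backing index such as
# .ds-my-data-stream-2025.01.01-000002 is created and becomes the write index
POST my-data-stream/_rollover

# Inspect the backing indices and the current generation
GET _data_stream/my-data-stream
```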
:::{important}
The use of aliases for rollover requires meeting certain conditions. Review these considerations before applying this approach:

* The index name must match the pattern `^.-\d+$*`, for example `my-index-000001`. |
Is it really `^.-\d+$*`? I would think `^.-\d+$`. That is, can there be characters in the name following the end of line delimiter (`$`)?

We could also say something more easily readable like "must match the pattern `<INDEX_NAME>-<INDEX_NUMBER>`, for example `my-index-000001`".
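For context on the alias-based approach under discussion, a minimal sketch (the index and alias names are made up): the bootstrap index name ends in a number so that rollover can increment it, and the alias marks it as the write index.

```console
# Bootstrap index whose name ends in a number, with a write alias
PUT my-index-000001
{
  "aliases": {
    "my-alias": { "is_write_index": true }
  }
}

# Roll over when any condition is met; this creates my-index-000002
# and switches the write alias to it
POST my-alias/_rollover
{
  "conditions": {
    "max_primary_shard_size": "50gb",
    "max_age": "7d"
  }
}
```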
Very nice work on this @yetanothertw!
Per the feedback received and collected in #1563 about the Rollover page, we're updating this page to improve the quality and relevance of the information.
The content has been restructured into the following sections:
Fixes #1563