Updating the Rollover page #2589
@@ -9,50 +9,96 @@ products: | |
|
||
# Rollover [index-rollover] | ||
|
||
When indexing time series data like logs or metrics, you can’t write to a single index indefinitely. To meet your indexing and search performance requirements and manage resource usage, you write to an index until some threshold is met and then create a new index and start writing to it instead. Using rolling indices enables you to: | ||
In {{es}}, the [rollover action](elasticsearch://reference/elasticsearch/index-lifecycle-actions/ilm-rollover.md) replaces your active write index with a new one whenever your index grows too large, too old, or stores too many documents. | ||
This is particularly useful for time series data, such as logs or metrics, where index growth is continuous and you need to meet performance and retention requirements. | ||
|
||
* Optimize the active index for high ingest rates on high-performance *hot* nodes. | ||
* Optimize for search performance on *warm* nodes. | ||
* Shift older, less frequently accessed data to less expensive *cold* nodes. | ||
* Delete data according to your retention policies by removing entire indices. | ||
Without rollover, a single index would continue to grow, causing search performance to drop and increasing the administrative burden on the cluster. | ||
|
||
We recommend using [data streams](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create-data-stream) to manage time series data. Data streams automatically track the write index while keeping configuration to a minimum. | ||
The rollover feature is an important part of how [index lifecycle management](../index-lifecycle-management/index-lifecycle.md) (ILM) and the [data stream lifecycle](../data-stream.md) (DLM) work to keep your indices fast and manageable. By switching writes to a new index, the rollover action provides the following benefits: | ||
> **Comment:** @dakrone I think this and line 19 are based off of your comment. Do you think it's okay for us to introduce "DLM" here as an acronym for "data stream lifecycle management"? I'm a little hesitant since I gather DLM is already in popular use as "data lifecycle management" (e.g. IBM, HP), and our use here specifically to denote "data stream lifecycle" doesn't quite match. That is, to my mind "ILM" and "data stream lifecycle" are two different approaches to DLM. What would you think of our sticking with:
>
> If we do introduce an acronym for "data stream lifecycle" we should also update this main page about it.

> **Reply:** I don't think we can stick with ILM to refer to the data stream's lifecycle, since they are different (ILM has policies, is attached at the index level, has separate APIs, is not available in Serverless, etc.). The challenge that we've run into with "data stream lifecycle" is that absent an acronym, folks have been calling it "DSL", which is a MUCH more confusing acronym than "DLM", so we've been substituting DLM for the name to differentiate it from ILM, but still indicate it is the "Data (in the data stream) Lifecycle Management".

> **Reply:** Roger that. Thanks for the clarification! DLM it is. :-)
||
|
||
Each data stream requires an [index template](../../data-store/templates.md) that contains: | ||
* **Lifecycle** - works with lifecycle management (ILM or DLM) to transition the index through its lifecycle actions and allows for granular control over retention cycles | ||
* **Optimized performance** - keeps shard sizes within recommended limits (10-50 GB) | ||
* **Faster queries** - improves search performance by preventing indices from growing too large | ||
|
||
* A name or wildcard (`*`) pattern for the data stream. | ||
* The data stream’s timestamp field. This field must be mapped as a [`date`](elasticsearch://reference/elasticsearch/mapping-reference/date.md) or [`date_nanos`](elasticsearch://reference/elasticsearch/mapping-reference/date_nanos.md) field data type and must be included in every document indexed to the data stream. | ||
* The mappings and settings applied to each backing index when it’s created. | ||
Rollover can be triggered via the [API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover), ILM, or DLM. | ||
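For illustration, a manual rollover request might look like the following sketch; `my-data-stream` is a placeholder target, and the `conditions` body is optional (omit it to roll over unconditionally):

```console
POST my-data-stream/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_primary_shard_size": "50gb"
  }
}
```

When conditions are supplied, the API only performs the rollover if at least one of them is met.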
|
||
Data streams are designed for append-only data, where the data stream name can be used as the operations (read, write, rollover, shrink etc.) target. If your use case requires data to be updated in place, you can instead manage your time series data using [index aliases](../../data-store/aliases.md). However, there are a few more configuration steps and concepts: | ||
## How rollover works in ILM | ||
|
||
* An *index template* that specifies the settings for each new index in the series. You optimize this configuration for ingestion, typically using as many shards as you have hot nodes. | ||
* An *index alias* that references the entire set of indices. | ||
* A single index designated as the *write index*. This is the active index that handles all write requests. On each rollover, the new index becomes the write index. | ||
You define a rollover action in the hot phase of an index lifecycle policy. It will run when any of the configured thresholds are met and the write index contains at least one document. | ||
You can configure the following rollover conditions: | ||
|
||
::::{note} | ||
When an index is rolled over, the previous index’s age is updated to reflect the rollover time. This date, rather than the index’s `creation_date`, is used in {{ilm}} `min_age` phase calculations. [Learn more](../../../troubleshoot/elasticsearch/index-lifecycle-management-errors.md#min-age-calculation). | ||
* **Size** - an index will roll over when its shards reach a set size, for example 50 GB. | ||
* **Age** - an index will roll over when it reaches a certain age, for example 7 days. | ||
|
||
* **Document count** - an index will roll over when a shard contains a certain number of documents, for example 2 million. | ||
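As a minimal sketch of how these thresholds translate into an ILM policy (the policy name and threshold values below are illustrative, not recommendations):

```console
PUT _ilm/policy/my-timeseries-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d",
            "max_primary_shard_docs": 2000000
          }
        }
      }
    }
  }
}
```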
|
||
::::{tip} | ||
Rolling over to a new index based on size, document count, or age is preferable to time-based rollovers. Rolling over at an arbitrary time often results in many small indices, which can have a negative impact on performance and resource usage. | ||
:::: | ||
|
||
After rollover, indices move to other index lifecycle phases like warm, cold, frozen, and delete. Rollover creates a new write index while the old one continues through the lifecycle phases. | ||
|
||
|
||
**Special rules:** | ||
|
||
## Automatic rollover [ilm-automatic-rollover] | ||
* Rollover is skipped for an empty write index, even if it has an associated `max_age` that would otherwise trigger a rollover. A policy can override this behavior if you set `min_docs: 0` in the rollover conditions. This can also be disabled on a cluster-wide basis if you set `indices.lifecycle.rollover.only_if_has_documents` to `false`. | ||
|
||
* Forced rollover occurs if any shard reaches 200 million documents. Usually, a shard will reach 50 GB long before it reaches 200 million documents, but this isn’t the case for space-efficient data sets. | ||
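For example, assuming the cluster-wide setting mentioned above is applied dynamically, disabling the empty-index check could look like this sketch:

```console
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.rollover.only_if_has_documents": false
  }
}
```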
|
||
{{ilm-init}} and the data stream lifecycle enable you to automatically roll over to a new index based on conditions like the index size, document count, or age. When a rollover is triggered, a new index is created, the write alias is updated to point to the new index, and all subsequent updates are written to the new index. | ||
## Recommended approaches | ||
|
||
::::{tip} | ||
Rolling over to a new index based on size, document count, or age is preferable to time-based rollovers. Rolling over at an arbitrary time often results in many small indices, which can have a negative impact on performance and resource usage. | ||
:::: | ||
Decide your approach to index rotation based on your use case and requirements. | ||
|
||
| Use case | Recommended approach | Setup benefits and limitations | | ||
| ---------------------- | --------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | | ||
| Logs, metrics | [Data streams](rollover.md#rollover-data-stream) | Configure rollover with lifecycle management, *minimal setup*, control over rollover timing ^1^ | | ||
| Legacy indexing setup | [Alias-based rollover](rollover.md#rollover-with-aliases) | Configure rollover with lifecycle management, *advanced setup*, control over rollover timing | | ||
| Small, static datasets | No rollover | Simpler management | | ||
|
||
::::{important} | ||
Empty indices will not be rolled over, even if they have an associated `max_age` that would otherwise result in a roll over occurring. A policy can override this behavior, and explicitly opt in to rolling over empty indices, by adding a `"min_docs": 0` condition. This can also be disabled on a cluster-wide basis by setting `indices.lifecycle.rollover.only_if_has_documents` to `false`. | ||
:::: | ||
^1^ Rollover is handled automatically in Serverless projects, so rollover timing is not configurable by the user. {applies_to}`serverless: ga` | ||
|
||
|
||
:::{tip} | ||
For new projects, use data streams. They're simple to manage with lifecycle policies where you define phases and actions that handle rollover automatically. | ||
::: | ||
|
||
::::{important} | ||
The rollover action implicitly always rolls over a data stream or alias if one or more shards contain 200,000,000 (200 million) or more documents. Normally a shard will reach 50 GB long before it reaches 200 million documents, but this isn’t the case for space-efficient data sets. Search performance will very likely suffer if a shard contains more than 200 million documents, which is the reason for the built-in limit. | ||
:::: | ||
|
||
### Rotating your indices with data streams [rollover-data-stream] | ||
|
||
We recommend using [data streams](../../data-store/data-streams.md) to manage time series data. When set up to use ILM policies that include rollover, data streams automatically manage the rotation of your indices. This ensures you can write to the data stream without additional configuration. | ||
|
||
When targeting a data stream, the new backing index becomes the data stream's write index. A new backing index, with an incremented generation number, is created automatically when the current write index reaches the specified age or size. | ||
|
||
|
||
Each data stream requires an [index template](../../data-store/templates.md) that contains the following: | ||
|
||
* A name or wildcard (`*`) pattern for the data stream. | ||
* Optional: The mappings and settings applied to each backing index when it’s created. | ||
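A minimal index template satisfying these requirements might look like the following sketch; the template, pattern, and policy names are placeholders:

```console
PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "my-timeseries-policy"
    }
  }
}
```

The empty `data_stream` object tells {{es}} to create a data stream, rather than a regular index, for matching names, and the lifecycle setting attaches the ILM policy that handles rollover.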
|
||
For more information about this approach, refer to the [Manage time series data with data streams](../index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-with-data-streams) tutorial. | ||
|
||
:::{tip} | ||
Data streams are designed for append-only data, where the data stream name can be used as the target for operations such as read, write, rollover, and shrink. If your use case requires data to be updated in place, you can perform [update or delete operations directly on the backing indices](../../data-store/data-streams/use-data-stream.md#update-delete-docs-in-a-backing-index). | ||
::: | ||
|
||
**Data stream naming pattern**<br> | ||
{{es}} uses a structured naming convention for the backing indices of data streams, following this pattern: | ||
|
||
```console | ||
.ds-<DATA-STREAM-NAME>-<yyyy.MM.dd>-<GENERATION> | ||
``` | ||
For more information about data stream naming patterns, refer to the [Generation](../../data-store/data-streams.md#data-streams-generation) section of the Data streams page. | ||
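To inspect a data stream's current generation and backing indices (the name below is a placeholder), you can query it:

```console
GET _data_stream/my-data-stream
```

The response includes the `generation` counter and the list of backing indices.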
|
||
|
||
### Rotating your indices with aliases [rollover-with-aliases] | ||
|
||
Rotating indices with aliases requires additional configuration steps, including bootstrapping the initial index. For more details about this approach, refer to the [Manage time series data without data streams](../index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) tutorial. | ||
|
||
:::{important} | ||
The use of aliases for rollover requires meeting certain conditions. Review these considerations before applying this approach: | ||
|
||
* The index name must match the pattern `^.*-\d+$`, for example `my-index-000001`. | ||
|
||
* The `index.lifecycle.rollover_alias` index setting must specify the alias to roll over. | ||
* The index must be the [write index](../../data-store/aliases.md#write-index) for the alias. | ||
::: | ||
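As a sketch of a bootstrap index that satisfies these conditions (index, alias, and policy names are placeholders):

```console
PUT my-index-000001
{
  "settings": {
    "index.lifecycle.name": "my-timeseries-policy",
    "index.lifecycle.rollover_alias": "my-index"
  },
  "aliases": {
    "my-index": {
      "is_write_index": true
    }
  }
}
```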
|
||
::::{note} | ||
When an alias or data stream is rolled over, the previous write index’s rollover time is stored. This date, rather than the index’s `creation_date`, is used in {{ilm}} `min_age` phase calculations. [Learn more](../../../troubleshoot/elasticsearch/index-lifecycle-management-errors.md#min-age-calculation). | ||
|
||
:::: | ||
|
||
**Alias-based naming pattern**<br> | ||
When configured correctly, the newly created write index has a name similar to the one that was rolled over, but with the six-digit, zero-padded suffix incremented. For example, if the write index was called `my-index-000001` before rollover, the newly created index is named `my-index-000002` and becomes the new write index. The alias typically shares the base name, which in this example is `my-index`. |
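For example, rolling over the alias manually (a hypothetical `my-index` alias set up as above) creates the next index in the series:

```console
POST my-index/_rollover
```

The response reports the old and new index names, for example `my-index-000001` and `my-index-000002`.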