-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[Failure store] Introduce dedicated failure store lifecycle configuration #127314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…with null enabled.
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @gmarouli, I've created a changelog YAML for you. |
jbaiera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left a couple of non binding questions, but otherwise this LGTM!
| // We don't issue any warnings if all data streams are internal data streams | ||
| dataStreamOptions.failureStore() | ||
| .lifecycle() | ||
| .addWarningHeaderIfDataRetentionNotEffective(globalRetentionSettings.get(), onlyInternalDataStreams); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't have the default failure store retention setting yet does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, it defaults to the global default right now. In the follow up PR it will only use the failures default. Do you think we should handle it differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, follow up is perfectly fine!
| for (DataStream dataStream : project.dataStreams().values()) { | ||
| clearErrorStoreForUnmanagedIndices(project, dataStream); | ||
| if (dataStream.getDataLifecycle() == null) { | ||
| var dataLifecycleEnabled = dataStream.getDataLifecycle() != null && dataStream.getDataLifecycle().enabled(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we weren't checking if the lifecycle was enabled, I'm assuming there isn't any important logic that follows this that needs executing if the data lifecycle configuration is present but disabled? Also, is present-but-disabled an invalid configuration state for the normal lifecycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good questions.
- A lifecycle can be present by disabled, either for maintenance or if a user wants it disable for some reason.
- If a lifecycle is not present or if it's disabled, then the lifecycle service should not perform any operations for this data stream, so I think it makes sense to skip it if both the data and failures lifecycle are not enabled.
- I will double check the code to ensure the data lifecycle being null is handled properly, because this is something that could not happen before.
- We could keep just the null check here of course, it doesn't cost that much, but it creates this doubt that null and disabled behave differently, that's why I would prefer to include it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed, all actions check DataStream::isIndexManagedByDataStreamLifecycle() before executing any operation and this method handled a null lifecycle.
...treams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java
Outdated
Show resolved
Hide resolved
| ResettableValue.create(new DataStreamFailureStore.Template(ResettableValue.create(true))) | ||
| ) | ||
| ) | ||
| new DataStreamOptions.Template(DataStreamFailureStore.builder().enabled(true).buildTemplate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so happy 😌
| */ | ||
|
|
||
| package org.elasticsearch.datastreams.options.action; | ||
| package org.elasticsearch.action.datastreams; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was looking for where this was now used that it needed to be moved and I think I missed it. Where is this used now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for serverless.
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…tion (elastic#127314) The failure store is a set of data stream indices that are used to store certain type of ingestion failures. Until this moment they were sharing the configuration of the backing indices. We understand that the two data sets have different lifecycle needs. We believe that typically the failures will need to be retained much less than the data. Considering this we believe the lifecycle needs of the failures also more limited and they fit better the simplicity of the data stream lifecycle feature. This allows the user to only set the desired retention and we will perform the rollover and other maintenance tasks without the user having to think about them. Furthermore, having only one lifecycle management feature allows us to ensure that these data is managed by default. This PR introduces the following: Configuration We extend the failure store configuration to allow lifecycle configuration too, this configuration reflects the user's configuration only as shown below: PUT _data_stream/*/options { "failure_store": { "lifecycle": { "data_retention": "5d" } } } GET _data_stream/*/options { "data_streams": [ { "name": "my-ds", "options": { "failure_store": { "lifecycle": { "data_retention": "5d" } } } } ] } To retrieve the effective configuration you need to use the GET data streams API, see elastic#126668 Functionality The data stream lifecycle (DLM) will manage the failure indices regardless if the failure store is enabled or not. This will ensure that if the failure store gets disabled we will not have stagnant data. The data stream options APIs reflect only the user's configuration. The GET data stream API should be used to check the current state of the effective failure store configuration. Telemetry We extend the data stream failure store telemetry to also include the lifecycle telemetry. { "data_streams": { "available": true, "enabled": true, "data_streams": 10, "indices_count": 50, "failure_store": { "explicitly_enabled_count": 1, "effectively_enabled_count": 15, "failure_indices_count": 30 "lifecycle": { "explicitly_enabled_count": 5, "effectively_enabled_count": 20, "data_retention": { "configured_data_streams": 5, "minimum_millis": X, "maximum_millis": Y, "average_millis": Z, }, "effective_retention": { "retained_data_streams": 20, "minimum_millis": X, "maximum_millis": Y, "average_millis": Z }, "global_retention": { "max": { "defined": false }, "default": { "defined": true, <------ this is the default value applicable for the failure store "millis": X } } } } } Implementation details We ensure that partially reset failure store will create valid failure store configuration. We ensure that when a node communicates with a note with a previous version it will ensure it will not send an invalid failure store configuration enabled: null. (cherry picked from commit 03d7781) # Conflicts: # modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamsPlugin.java # modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java # modules/data-streams/src/test/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceTests.java # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/main/java/org/elasticsearch/cluster/metadata/MetadataDataStreamsService.java # server/src/test/java/org/elasticsearch/cluster/metadata/DataStreamTests.java
In this PR we claim the backport to 8.19 version for the introduction of the failure store lifecycle (#127314).
In elastic#127623 we backported elastic#127299 and added a backport transport version for it - `ESQL_AGGREGATE_METRIC_DOUBLE_BLOCK_8_19` aka `8_841_0_24`. This brings that version forwards to `main` and adds support for parsing streams with that version. In elastic#127639 we backported elastic#126401 and added a backport transport version for it - `PINNED_RETRIEVER_8_19` aka `8_841_0_23`. This brings that version forwards to `main` and adds support for parsing streams with that versions. In elastic#127633 we a claimed a backport transport version to backport elastic#127314 - `INTRODUCE_FAILURES_LIFECYCLE_BACKPORT_8_19` aka `8_841_0_23`. That's the same versions as `PINNED_RETRIEVER_8_19`. It's just that this one is in `main` and `PINNED_RETRIEVER_8_19` is in `8.19`. To allow me to bring `PINNED_RETRIEVER_8_19` for wards I've had to revert elastic#127633. Closes elastic#127667
In #127623 we backported #127299 and added a backport transport version for it - `ESQL_AGGREGATE_METRIC_DOUBLE_BLOCK_8_19` aka `8_841_0_24`. This brings that version forwards to `main` and adds support for parsing streams with that version. In #127639 we backported #126401 and added a backport transport version for it - `PINNED_RETRIEVER_8_19` aka `8_841_0_23`. This brings that version forwards to `main` and adds support for parsing streams with that versions. In #127633 we a claimed a backport transport version to backport #127314 - `INTRODUCE_FAILURES_LIFECYCLE_BACKPORT_8_19` aka `8_841_0_23`. That's the same versions as `PINNED_RETRIEVER_8_19`. It's just that this one is in `main` and `PINNED_RETRIEVER_8_19` is in `8.19`. To allow me to bring `PINNED_RETRIEVER_8_19` for wards I've had to revert #127633. Closes #127667
…nfiguration (#127314) (#127577) * [Failure store] Introduce dedicated failure store lifecycle configuration (#127314) The failure store is a set of data stream indices that are used to store certain type of ingestion failures. Until this moment they were sharing the configuration of the backing indices. We understand that the two data sets have different lifecycle needs. We believe that typically the failures will need to be retained much less than the data. Considering this we believe the lifecycle needs of the failures also more limited and they fit better the simplicity of the data stream lifecycle feature. This allows the user to only set the desired retention and we will perform the rollover and other maintenance tasks without the user having to think about them. Furthermore, having only one lifecycle management feature allows us to ensure that these data is managed by default. This PR introduces the following: Configuration We extend the failure store configuration to allow lifecycle configuration too, this configuration reflects the user's configuration only as shown below: PUT _data_stream/*/options { "failure_store": { "lifecycle": { "data_retention": "5d" } } } GET _data_stream/*/options { "data_streams": [ { "name": "my-ds", "options": { "failure_store": { "lifecycle": { "data_retention": "5d" } } } } ] } To retrieve the effective configuration you need to use the GET data streams API, see #126668 Functionality The data stream lifecycle (DLM) will manage the failure indices regardless if the failure store is enabled or not. This will ensure that if the failure store gets disabled we will not have stagnant data. The data stream options APIs reflect only the user's configuration. The GET data stream API should be used to check the current state of the effective failure store configuration. Telemetry We extend the data stream failure store telemetry to also include the lifecycle telemetry. { "data_streams": { "available": true, "enabled": true, "data_streams": 10, "indices_count": 50, "failure_store": { "explicitly_enabled_count": 1, "effectively_enabled_count": 15, "failure_indices_count": 30 "lifecycle": { "explicitly_enabled_count": 5, "effectively_enabled_count": 20, "data_retention": { "configured_data_streams": 5, "minimum_millis": X, "maximum_millis": Y, "average_millis": Z, }, "effective_retention": { "retained_data_streams": 20, "minimum_millis": X, "maximum_millis": Y, "average_millis": Z }, "global_retention": { "max": { "defined": false }, "default": { "defined": true, <------ this is the default value applicable for the failure store "millis": X } } } } } Implementation details We ensure that partially reset failure store will create valid failure store configuration. We ensure that when a node communicates with a note with a previous version it will ensure it will not send an invalid failure store configuration enabled: null. (cherry picked from commit 03d7781) # Conflicts: # modules/data-streams/src/main/java/org/elasticsearch/datastreams/DataStreamsPlugin.java # modules/data-streams/src/main/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleService.java # modules/data-streams/src/test/java/org/elasticsearch/datastreams/lifecycle/DataStreamLifecycleServiceTests.java # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/main/java/org/elasticsearch/cluster/metadata/MetadataDataStreamsService.java # server/src/test/java/org/elasticsearch/cluster/metadata/DataStreamTests.java * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>
…27633) In this PR we claim the backport to 8.19 version for the introduction of the failure store lifecycle (elastic#127314).
In elastic#127623 we backported elastic#127299 and added a backport transport version for it - `ESQL_AGGREGATE_METRIC_DOUBLE_BLOCK_8_19` aka `8_841_0_24`. This brings that version forwards to `main` and adds support for parsing streams with that version. In elastic#127639 we backported elastic#126401 and added a backport transport version for it - `PINNED_RETRIEVER_8_19` aka `8_841_0_23`. This brings that version forwards to `main` and adds support for parsing streams with that versions. In elastic#127633 we a claimed a backport transport version to backport elastic#127314 - `INTRODUCE_FAILURES_LIFECYCLE_BACKPORT_8_19` aka `8_841_0_23`. That's the same versions as `PINNED_RETRIEVER_8_19`. It's just that this one is in `main` and `PINNED_RETRIEVER_8_19` is in `8.19`. To allow me to bring `PINNED_RETRIEVER_8_19` for wards I've had to revert elastic#127633. Closes elastic#127667
The failure store is a set of data stream indices that are used to store certain type of ingestion failures. Until this moment they were sharing the configuration of the backing indices. We understand that the two data sets have different lifecycle needs.
We believe that typically the failures will need to be retained much less than the data. Considering this we believe the lifecycle needs of the failures also more limited and they fit better the simplicity of the data stream lifecycle feature.
This allows the user to only set the desired retention and we will perform the rollover and other maintenance tasks without the user having to think about them. Furthermore, having only one lifecycle management feature allows us to ensure that these data is managed by default.
This PR introduces the following:
Configuration
We extend the failure store configuration to allow lifecycle configuration too, this configuration reflects the user's configuration only as shown below:
To retrieve the effective configuration you need to use the
GETdata streams API, see #126668Functionality
GETdata stream API should be used to check the current state of the effective failure store configuration.Telemetry
We extend the data stream failure store telemetry to also include the lifecycle telemetry.
Implementation details
enabled: null.