Add node feature for failure store, refactor capability names #126885
Require it on every node before redirecting data to maintain a consistent response until the cluster is fully updated.
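The "require it on every node" gate can be sketched as follows. This is a minimal illustration with hypothetical names (`FailureStoreGate`, `Node`, and the feature id string are all stand-ins, not Elasticsearch's actual types): redirection is allowed only when every node in the cluster advertises the feature, so a mixed-version cluster keeps the old behavior.

```java
import java.util.List;
import java.util.Set;

// Sketch (hypothetical names) of gating failure-store redirection on a node
// feature: redirect only when every node in the cluster reports the feature.
public class FailureStoreGate {
    record Node(String id, Set<String> features) {}

    // True only if all nodes in this simplified "cluster state" have the feature.
    static boolean clusterHasFeature(List<Node> nodes, String feature) {
        return nodes.stream().allMatch(n -> n.features().contains(feature));
    }

    public static void main(String[] args) {
        String failureStore = "failure_store"; // assumed feature id
        List<Node> mixed = List.of(
            new Node("a", Set.of(failureStore)),
            new Node("b", Set.of()));          // not-yet-upgraded node
        List<Node> upgraded = List.of(
            new Node("a", Set.of(failureStore)),
            new Node("b", Set.of(failureStore)));

        System.out.println(clusterHasFeature(mixed, failureStore));    // false
        System.out.println(clusterHasFeature(upgraded, failureStore)); // true
    }
}
```

Once the last old node leaves the cluster, the check flips to true and redirection turns on cluster-wide at once, which is what keeps responses consistent mid-upgrade.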
Pinging @elastic/es-data-management (Team:Data Management)
Moved the FeatureService creation point earlier in NodeConstruction so that IngestService can use the service. We'd like to use the FeatureService to ensure that downstream ingestion logic is present and consistent on all nodes before applying failure store logic in the IngestService. cc @elastic/es-core-infra as code owners
server/src/main/java/org/elasticsearch/ingest/IngestService.java
    AtomicArray<BulkItemResponse> responses
) {
    // Determine if we have the feature enabled once for entire bulk operation
    final boolean clusterSupportsFailureStore = featureService.clusterHasFeature(
I assume even for a very large cluster, this check is cheap enough relatively to a bulk request that it's fine to run on every bulk? I think it just loops through all the nodes in the cluster state.
It is indeed a linear check over the list of nodes in the cluster. Unfortunately there's not a great way to hoist this up further than a request-by-request basis without introducing some kind of timer element or observer interface. We need to be responsive to changes in the cluster in a timely manner, and refactoring things to build off a cluster state listener seems like more complexity than it's worth at the moment. I'd be hard pressed to optimize this further without any indication that it's in a bad place.
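The trade-off discussed above can be illustrated with a small sketch (all names here are hypothetical stand-ins for the real `FeatureService`/cluster-state types): the linear scan over the nodes runs once per bulk request, and the result is reused for every item in the bulk rather than recomputed per document.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (hypothetical names) of hoisting the O(#nodes) feature check to
// once per bulk request instead of once per document.
public class PerBulkFeatureCheck {
    static final AtomicInteger scans = new AtomicInteger();

    // Stand-in for FeatureService.clusterHasFeature: a linear pass over nodes.
    static boolean clusterHasFeature(List<List<String>> nodeFeatures, String feature) {
        scans.incrementAndGet();
        return nodeFeatures.stream().allMatch(f -> f.contains(feature));
    }

    static int processBulk(List<String> docs, List<List<String>> nodeFeatures) {
        // Hoisted: one scan for the whole bulk, not one per document.
        boolean supportsFailureStore = clusterHasFeature(nodeFeatures, "failure_store");
        int redirected = 0;
        for (String doc : docs) {
            if (supportsFailureStore) {
                redirected++; // here a failed doc would be redirected
            }
        }
        return redirected;
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(List.of("failure_store"), List.of("failure_store"));
        int redirected = processBulk(List.of("d1", "d2", "d3"), nodes);
        System.out.println(redirected + " redirected, " + scans.get() + " cluster scan(s)");
    }
}
```

Because the check is re-evaluated on each request, a node joining or leaving is picked up on the very next bulk without needing a cluster state listener.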
new FeatureService(List.of()) {
    @Override
    public boolean clusterHasFeature(ClusterState state, NodeFeature feature) {
        return DataStream.DATA_STREAM_FAILURE_STORE_FEATURE.equals(feature);
    }
}
It stands out as odd that this block of code is repeated over and over again. I don't have a great suggestion though -- maybe a static test helper FeatureService that just says yes to everything? That would avoid someone having to change this in 13 places if they add a new feature that they want used in all the tests.
Yeah, FeatureService really feels like it wants a test implementation. I thought we had one, but it seems to be unrelated. These were all just to get the tests happy; only one or two of them have any differentiating logic.
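The helper being suggested might look something like the sketch below. The types here (`FeatureService`, `NodeFeature`) are simplified stand-ins for the Elasticsearch classes, not the real signatures: a single shared stub that reports every feature as present, so tests stop repeating the anonymous override.

```java
// Sketch (hypothetical, simplified types) of the "says yes to everything"
// test helper proposed in the review thread.
public class TestFeatureServiceSketch {
    record NodeFeature(String id) {}

    static class FeatureService {
        boolean clusterHasFeature(Object clusterState, NodeFeature feature) {
            return false; // the real service would scan the cluster state's nodes
        }
    }

    // One shared allow-all stub instead of 13 copies of an anonymous override.
    static final FeatureService ALLOW_ALL = new FeatureService() {
        @Override
        boolean clusterHasFeature(Object clusterState, NodeFeature feature) {
            return true;
        }
    };

    public static void main(String[] args) {
        NodeFeature failureStore = new NodeFeature("failure_store");
        System.out.println(ALLOW_ALL.clusterHasFeature(null, failureStore)); // true
    }
}
```

A new feature used across many tests would then work without touching any test setup, since the stub accepts all features by construction.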
I left a few minor questions, but LGTM.
NodeConstruction changes look fine.
@elasticmachine update branch

@elasticmachine update branch

@elasticmachine update branch
💔 Backport failed
You can use sqren/backport to manually backport by running
…c#126885) Adds a node feature that is conditionally added to the cluster state if the failure store feature flag is enabled. Requires all nodes in the cluster to have the node feature present in order to redirect failed documents to the failure store from the ingest node or from shard level bulk failures.
…126885) (#127091) * Add node feature for failure store, refactor capability names (#126885) Adds a node feature that is conditionally added to the cluster state if the failure store feature flag is enabled. Requires all nodes in the cluster to have the node feature present in order to redirect failed documents to the failure store from the ingest node or from shard level bulk failures. * Fix backporting issues
Adds a node feature that is conditionally added to the cluster state if the failure store feature flag is enabled. Requires all nodes in the cluster to have the node feature present in order to redirect failed documents to the failure store from the ingest node or from shard level bulk failures. (cherry picked from commit d928d1a)
Adds a node feature that is conditionally added to the cluster state if the failure store feature flag is enabled. Requires all nodes in the cluster to have the node feature present in order to redirect failed documents to the failure store from the ingest node or from shard level bulk failures.
Additionally, updates the names of some of the capabilities returned from the APIs:

- `failure_store_in_template` becomes `data_stream_options.failure_store`
- `index_expression_selectors` added to search api
- `lazy-rollover-failure-store` replaced by `index_expression_selectors`
- `index-expression-selectors` replaced by `index_expression_selectors` to match search api
- `data_stream_failure_store_cluster_setting` replaced by the new node feature