-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add settings for health indicator shard_capacity thresholds #136141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add settings for health indicator shard_capacity thresholds #136141
Conversation
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces configurable thresholds for the shard capacity health indicator, allowing administrators to customize when the indicator reports YELLOW and RED statuses. Previously, these thresholds were hardcoded to 10 and 5 remaining shards respectively.
- Adds two new dynamic cluster settings for configuring YELLOW and RED thresholds
- Updates the ShardsCapacityHealthIndicatorService to use configurable thresholds instead of hardcoded values
- Includes comprehensive test coverage and documentation for the new settings
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
ShardsCapacityHealthIndicatorService.java | Adds configurable threshold settings with validation and dynamic updates |
ShardsCapacityHealthIndicatorServiceTests.java | Updates all tests to use the new constructor and adds threshold validation tests |
NodeConstruction.java | Updates service instantiation to pass settings parameter |
ClusterSettings.java | Registers the new settings with cluster configuration |
HealthFeatures.java | New feature specification for the settings |
module-info.java | Exports the new HealthFeatures class |
META-INF/services/org.elasticsearch.features.FeatureSpecification | Registers the HealthFeatures service |
health-diagnostic-settings.md | Documents the new configuration settings |
30_feature.yml | Adds integration tests for the new settings functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
server/src/main/java/org/elasticsearch/health/node/ShardsCapacityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
...r/src/test/java/org/elasticsearch/health/node/ShardsCapacityHealthIndicatorServiceTests.java
Outdated
Show resolved
Hide resolved
…ityHealthIndicatorService.java Co-authored-by: Copilot <[email protected]>
Hi @samxbr, I've created a changelog YAML for you. |
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
health_report: | ||
feature: shards_capacity | ||
- is_true: cluster_name | ||
- match: { indicators.shards_capacity.status: "green" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do - match: { indicators.shards_capacity.status: "green" }
here. Check out my PR and the related Serverless test failure issue: #109808.
So, I think we need to convert this to a Java REST test, which allows us to wait for the green
indicator health. Maybe we could get away with changing this line from match
to exist
(like I did in that PR), but I'm worried the match: yellow
below might still be flaky, and it'll reduce the value of the test slightly as we don't have a guaranteed starting point. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch! I wonder why HealthMetadata needs to be cluster state metadata, they seem to just be a copy of these cluster settings. If we read the settings directly from cluster state, then it should eliminate the flakiness, but there are probably reason I don't understand yet. I posted in the team thread to discuss about this.
* <p> | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formatting looks a bit off with an empty line in the middle of a JavaDoc. Could you revert the <p>
line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the <p>
tag make a difference in your IDE? The rendering looks the same in my Intellij.
"frozen" | ||
); | ||
|
||
public static final NodeFeature SHARD_CAPACITY_UNHEALTHY_THRESHOLD_SETTINGS = new NodeFeature( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we include _FEATURE
somewhere in the field name to avoid confusion between this field and the actual settings?
{ | ||
builder.startObject("settings"); | ||
builder.field(SHARD_CAPACITY_UNHEALTHY_THRESHOLD_YELLOW.getKey(), unhealthyThresholdYellow); | ||
builder.field(SHARD_CAPACITY_UNHEALTHY_THRESHOLD_RED.getKey(), unhealthyThresholdRed); | ||
builder.endObject(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason you're including the settings in the indicator output? I don't immediately see the value tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it's more for convenience of the health report. Since these thresholds are now dynamic settings, I thought it'd be good to report them as part of the health report. If someone's looking at previous health report logs they know what the thresholds were and don't need to guess if these settings have been changed.
Add dynamic settings to allow user to configure the unhealthy thresholds (yellow/red) of shard capacity health indicator. They replace the current hard-coded values:
health.shard_capacity.unhealthy_threshold.yellow
(default 10)health.shard_capacity.unhealthy_threshold.red
(default 5)Behavior:
RED
threshold needs to be smaller thanYELLOW
threshold, so the health indicator follows the order fromGREEN
->YELLOW
->RED
Closes #116697