Skip to content

Conversation

guluo2016
Copy link
Contributor

@guluo2016 guluo2016 commented Sep 16, 2025

Adjusts the cluster health computation to treat indices with no routing table as unassigned.

Relates #33888
Closes #134806

Copy link

cla-checker-service bot commented Sep 16, 2025

💚 CLA has been signed

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 16, 2025
@guluo2016
Copy link
Contributor Author

I have already signed the CLA. If I missed anything, please let me know.

Please review, thanks a lot!

@szybia szybia added :Data Management/Health and removed needs:triage Requires assignment of a team area label labels Sep 22, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

this.activeShardsPercent = 100.0;
} else {
this.activeShardsPercent = (((double) this.activeShards) / totalShardCount) * 100;
this.activeShardsPercent = totalShardCount == 0 ? 0.0 : (((double) this.activeShards) / totalShardCount) * 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that perhaps we should represent this as 100%, rather than 0%, since the cluster would be totally healthy if there were no shards. 0% would indicate otherwise to me.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this. If there are indices in the metadata but no routing table then the index is totally unassigned. The only way we can get here is if we skip over those indices here:

if (indexRoutingTable == null) {
continue;
}

That probably made sense prior to #33888 but these days it doesn't, the only time you can get here is in a freshly-started cluster, and for that we should be saying the health is red and the assigned percentage is 0.

I think we should drop that null check and instead make the constructor of org.elasticsearch.cluster.health.ClusterIndexHealth handle a null routing table by treating it as if all the shards are unassigned.

@guluo2016
Copy link
Contributor Author

Thanks for the review @dakrone @DaveCTurner
I’ll update the code according to the review comments.

@guluo2016
Copy link
Contributor Author

Please review, thanks!

DaveCTurner
DaveCTurner previously approved these changes Oct 3, 2025
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for the change in approach @guluo2016

@DaveCTurner DaveCTurner self-assigned this Oct 3, 2025
@DaveCTurner
Copy link
Contributor

@elasticmachine test this please

@DaveCTurner DaveCTurner changed the title Fix active_shards_percent_as_number NaN when total shard count is zero Fix active_shards_percent_as_number NaN when state not recovered Oct 3, 2025
@DaveCTurner
Copy link
Contributor

This doesn't pass ./gradlew :server:test - could you take another look @guluo2016 ?

@DaveCTurner DaveCTurner dismissed their stale review October 3, 2025 11:21

Failing tests

@guluo2016
Copy link
Contributor Author

The update has been made.
I manually ran the test cases, and everything passed successfully.

> Task :server:test
WARNING: module-info.class ignored in patch: ...
...
BUILD SUCCESSFUL in 7m
91 actionable tasks: 2 executed, 89 up-to-date
Consider enabling configuration cache to speed up this build: https://docs.gradle.org/9.1.0/userguide/configuration_cache_enabling.html

Please take a look when you have time. Thanks! @DaveCTurner

@guluo2016 guluo2016 requested a review from DaveCTurner October 11, 2025 05:42
@DaveCTurner
Copy link
Contributor

@elasticmachine test this please

@DaveCTurner
Copy link
Contributor

@elasticmachine test this please

@DaveCTurner
Copy link
Contributor

@elasticmachine test this please

@DaveCTurner DaveCTurner enabled auto-merge (squash) October 13, 2025 16:38
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DaveCTurner DaveCTurner merged commit 714c077 into elastic:main Oct 13, 2025
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Health external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue Team:Data Management Meta label for data/management team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Cluster Health API] active_shards_percent_as_number calculation should not allow denominator to be zero

5 participants