Correctly record SLM state during master failover #134018
Pull Request Overview
Updates SLM to correctly record snapshot statistics during master failover by retrieving and evaluating actual snapshot status rather than assuming failures. This ensures accurate tracking of snapshot success/failure states when the master node is shut down during SLM execution.
Key changes:
- Enhanced WriteJobStatus to accept and process snapshot information for completed registered snapshots
- Added logic to fetch snapshot details for registered snapshots that are no longer running
- Updated test coverage to verify correct stat recording during master failover scenarios
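The idea behind the key changes can be sketched as follows. This is a minimal illustration, not the code in the pull request: `FinalState`, `Stats`, and `record` are hypothetical names, and counting a partial or missing snapshot as a failure is an assumption made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SlmFailoverStatsSketch {
    // Hypothetical stand-in for a snapshot's final state; not the Elasticsearch enum.
    public enum FinalState { SUCCESS, PARTIAL, FAILED }

    // Hypothetical per-policy counters.
    public static final class Stats {
        public int succeeded;
        public int failed;
    }

    /**
     * For every registered snapshot that is no longer running, record a result
     * based on the state that was actually looked up, instead of assuming failure.
     */
    public static Stats record(List<String> registered,
                               Set<String> running,
                               Map<String, FinalState> lookedUp) {
        Stats stats = new Stats();
        for (String snapshot : registered) {
            if (running.contains(snapshot)) {
                continue; // still in progress, nothing to record yet
            }
            FinalState state = lookedUp.get(snapshot);
            if (state == FinalState.SUCCESS) {
                stats.succeeded++;
            } else {
                // Assumption: PARTIAL, FAILED, and "not found" (null) all count as failures.
                stats.failed++;
            }
        }
        return stats;
    }
}
```

With this shape, a snapshot that completed successfully while the master was failing over is counted as a success once its state has been fetched, rather than being written off as failed.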
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
SnapshotLifecycleTask.java | Core logic changes to fetch and process snapshot info for accurate stat recording |
SnapshotLifecycleTaskTests.java | Updated test cases to reflect new WriteJobStatus method signatures and behavior |
SnapshotLifecycleService.java | Added utility method to extract policy ID from job ID |
SLMSnapshotBlockingIntegTests.java | Added integration tests for master failover scenarios |
RegisteredPolicySnapshots.java | Updated documentation to reflect new behavior |
Comments suppressed due to low confidence (1)
x-pack/plugin/slm/src/main/java/org/elasticsearch/xpack/slm/SnapshotLifecycleTask.java:1
- This assertion could cause the application to crash in production if an invalid job ID is passed. Consider throwing a proper exception with descriptive error message instead of using an assertion.
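One way to follow this suggestion is to validate the job ID explicitly and throw a descriptive exception. The sketch below is illustrative only: the class and method names are hypothetical, and the `<policyId>-<version>` job ID format is an assumption, not something confirmed by this page.

```java
final class JobIdParsingSketch {
    /**
     * Extracts the policy ID from a job ID.
     * Assumption: job IDs look like "<policyId>-<version>" with a numeric version.
     */
    public static String policyIdFromJobId(String jobId) {
        if (jobId == null) {
            throw new IllegalArgumentException("job ID must not be null");
        }
        int sep = jobId.lastIndexOf('-');
        boolean hasNumericVersion = sep > 0
            && sep < jobId.length() - 1
            && jobId.substring(sep + 1).chars().allMatch(Character::isDigit);
        if (!hasNumericVersion) {
            // Unlike an assert, this check runs whether or not the JVM was started with -ea.
            throw new IllegalArgumentException(
                "invalid job ID [" + jobId + "], expected <policyId>-<version>");
        }
        return jobId.substring(0, sep);
    }
}
```

The explicit check behaves the same way in tests and in production, and the message names the offending value.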
    );

    final SnapshotInfo snapInfo = createSnapshotResponse.getSnapshotInfo();
    assert snapInfo != null : "completed snapshot info is null";
Copilot AI, Sep 4, 2025
This assertion assumes that CreateSnapshotResponse always contains snapshot info for completed snapshots, but this may not hold true in all edge cases. Consider using a null check with appropriate error handling instead of an assertion.
Suggested change:

    - assert snapInfo != null : "completed snapshot info is null";
    + if (snapInfo == null) {
    +     logger.warn("CreateSnapshotResponse did not contain snapshot info for completed snapshot [{}]", request.snapshot());
    +     onFailure(new ElasticsearchException("Snapshot info is null for completed snapshot [" + request.snapshot() + "]"));
    +     return;
    + }
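The difference matters because Java `assert` statements are skipped unless the JVM is started with `-ea`, so a null value would flow onward unchecked in a default configuration. The pattern in the suggestion can be sketched in isolation as below. Everything here is a simplified stand-in: `handleResponse`, the `String` payload, and the `Consumer` callbacks are hypothetical and replace the real response and listener types.

```java
import java.util.function.Consumer;

final class NullInfoHandlingSketch {
    /**
     * Routes a missing snapshot info to the failure callback instead of
     * relying on an assert. All parameter types are simplified stand-ins.
     */
    public static void handleResponse(String snapshotInfo,
                                      String snapshotName,
                                      Consumer<String> onSuccess,
                                      Consumer<Exception> onFailure) {
        if (snapshotInfo == null) {
            onFailure.accept(new IllegalStateException(
                "Snapshot info is null for completed snapshot [" + snapshotName + "]"));
            return; // do not continue into the success path
        }
        onSuccess.accept(snapshotInfo);
    }
}
```

The early `return` is the important part: exactly one of the two callbacks runs for each response.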