Avoid restarting data stream reindex when cluster is upgraded #125587
Conversation
Pinging @elastic/es-data-management (Team:Data Management)
```java
    ReindexDataStreamTaskParams params,
    PersistentTaskState persistentTaskState
) {
    if (isComplete(persistentTaskState)) {
```
Don't you have to schedule the completeTask for removal here, so the task is cleaned up after 24h?
Ah, good point -- the scheduling of the removal was on the threadpool on the old node, which won't carry over to this node. But I think I can just do task.markAsCompleted(); here. The user has already upgraded, meaning they know that it completed. And the information is really out of date at this point anyway (because the data stream is "old" for this new cluster).
If a node is terminated and the task moves to a new node (after completion), we wouldn't remove the task metadata otherwise.
Also note, the ScheduledCancellable returned by below line keeps running on the old node in this case, making the behavior a bit indeterministic. As far as I remember completeAndNotifyIfNeeded might throw if completed multiple times in some cases.
```java
threadPool.schedule(completeTask, timeToLive, threadPool.generic())
```
I would guess that the task would only be moved if the node it was on came down altogether, threadpool included, right? Regardless though, the exception will be harmless, won't it?
I'll change it to schedule the removal in the future so that it continues to work once users intentionally run this on 9.x (I don't think they will want to until we're approaching 10.0).
> The user has already upgraded, meaning they know that it completed. And the information is really out of date at this point anyway (because the data stream is "old" for this new cluster).
Actually, rethinking this, the task might also be moved for other reasons (a node might crash any time), we can't assume the upgrade has happened just because the task was moved. Not sure how important it is to keep the old state around for 24h, but it might lead to surprising results this way.
Yeah this change definitely doesn't assume the task was moved just because it's on a new node -- it is explicitly checking that the completion time is non-null, meaning it was completed on some other node.
It only keeps the task for 24h beyond whenever it completes. If a node crashes before it completes, it will not have been scheduled for removal.
Yes, this is only ever a problem if a node crashes after completion within those 24h
👍 looks like that's already addressed
...plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/task/ReindexDataStreamTask.java
…m:masseyke/elasticsearch into data-stream-reindex-no-restart-on-upgrade
LGTM 🎉
Looks good :-) 👍🏻
…m:masseyke/elasticsearch into data-stream-reindex-no-restart-on-upgrade
```java
private Long getCompletionTime(PersistentTaskState persistentTaskState) {
    if (persistentTaskState == null) {
        return null;
    } else {
        if (persistentTaskState instanceof ReindexDataStreamPersistentTaskState state) {
            return state.completionTime();
        } else {
            return null;
        }
    }
}
```
Nit: I don't think you need a null check prior to calling instanceof
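The nit holds because `instanceof` is defined to evaluate to false when its operand is null, so the explicit null check adds nothing. A standalone sketch (using a stand-in record rather than the real `ReindexDataStreamPersistentTaskState` class) of the simplified form:

```java
// Standalone sketch, not the Elasticsearch classes: `instanceof` already
// yields false for null, so the explicit null check is redundant.
public class InstanceofNullDemo {
    // Stand-in for ReindexDataStreamPersistentTaskState, holding a completion time.
    record TaskState(Long completionTime) {}

    // Simplified form of getCompletionTime: the pattern match covers the null case.
    static Long getCompletionTime(Object persistentTaskState) {
        if (persistentTaskState instanceof TaskState state) {
            return state.completionTime();
        }
        return null;
    }

    public static void main(String[] args) {
        assert getCompletionTime(null) == null;           // null never matches instanceof
        assert getCompletionTime("not a state") == null;  // wrong type also returns null
        assert getCompletionTime(new TaskState(42L)) == 42L;
        System.out.println("ok");
    }
}
```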
```java
private TimeValue getTimeToLive(long completionTimeInMillis) {
    return TimeValue.timeValueMillis(
        TASK_KEEP_ALIVE_TIME.millis() - Math.max(
```
Should this be Math.min?
Yep, thanks!
LGTM!
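The snippet in the PR is truncated, so the full expression here is an assumption, but it illustrates why `Math.min` is the right call: capping the elapsed time at the keep-alive window keeps the remaining TTL non-negative, whereas `Math.max(keepAlive, elapsed)` is always at least the keep-alive, making `keepAlive - max(...)` zero or negative for every input. All names below are simplified stand-ins:

```java
// Hedged sketch of the TTL computation under discussion; the real method
// returns a TimeValue, this stand-in works in raw milliseconds.
public class TtlSketch {
    static final long TASK_KEEP_ALIVE_MILLIS = 24L * 60 * 60 * 1000; // 24h, per the PR description

    // Remaining time-to-live for a completed task, given "now" and the completion time.
    static long timeToLiveMillis(long nowMillis, long completionTimeMillis) {
        long elapsed = nowMillis - completionTimeMillis;
        // Math.min caps elapsed at the keep-alive window, so the result is in [0, 24h].
        // With Math.max, max(24h, elapsed) >= 24h, so the subtraction could never be positive.
        return TASK_KEEP_ALIVE_MILLIS - Math.min(TASK_KEEP_ALIVE_MILLIS, elapsed);
    }

    public static void main(String[] args) {
        assert timeToLiveMillis(1_000L, 0L) == TASK_KEEP_ALIVE_MILLIS - 1_000L; // 1s elapsed
        assert timeToLiveMillis(2 * TASK_KEEP_ALIVE_MILLIS, 0L) == 0L;          // long past: clamps to 0
        System.out.println("ok");
    }
}
```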
💔 Backport failed
You can use sqren/backport to manually backport by running
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
This prevents a node from restarting a completed data stream reindex after the cluster has been upgraded from 8.x.
The persistent task for a data stream reindex is kept alive for 24 hours so that the user has time to check its status. When a cluster is upgraded, the persistent task moves to a new node. It currently does not check whether it was previously completed. So the persistent task running on (for example) 9.0 sees that the data stream contains nothing but "old" (8.18) indices, and unnecessarily begins reindexing all of those so that they are 9.0 indices.
This creates unnecessary work on the cluster. Worse, the second reindexing gives the backing indices names that do not match the naming pattern expected for system data stream backing indices, leading to a variety of problems when querying data.
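The shape of the fix described above can be sketched as a guard at the start of the task's node operation: if the persisted state already carries a completion time, the reindex is not restarted; only the cleanup is rescheduled for whatever remains of the 24h window. Everything below is a simplified stand-in, not the Elasticsearch persistent-task API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified stand-ins for the persistent-task machinery; names are hypothetical.
public class ReindexGuardSketch {
    static final long TASK_KEEP_ALIVE_MILLIS = TimeUnit.HOURS.toMillis(24);

    record TaskState(Long completionTime) {} // completionTime == null while still running

    interface Task { void markAsCompleted(); }

    static final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // On reassignment (e.g. after a rolling upgrade), a completed task is not restarted:
    // only its removal is rescheduled for the remainder of the keep-alive window.
    static boolean nodeOperation(Task task, TaskState state, long nowMillis, Runnable startReindex) {
        if (state != null && state.completionTime() != null) {
            long elapsed = nowMillis - state.completionTime();
            long ttl = TASK_KEEP_ALIVE_MILLIS - Math.min(TASK_KEEP_ALIVE_MILLIS, elapsed);
            scheduler.schedule(task::markAsCompleted, ttl, TimeUnit.MILLISECONDS);
            return false; // reindex not restarted
        }
        startReindex.run();
        return true; // no recorded completion: (re)start the reindex
    }

    public static void main(String[] args) {
        Task noop = () -> {};
        // Completed task: not restarted, cleanup scheduled instead.
        assert !nodeOperation(noop, new TaskState(0L), 1L, () -> {});
        // No state yet: the reindex starts.
        assert nodeOperation(noop, null, 1L, () -> {});
        scheduler.shutdownNow();
        System.out.println("ok");
    }
}
```

The boolean return is only for illustration; the point is that the completion check happens before any reindex work is kicked off on the new node.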