Conversation

@masseyke (Member) commented on Mar 25, 2025:

This prevents a node from restarting a completed data stream reindex after the cluster has been upgraded from 8.x.
The persistent task for a data stream reindex is kept alive for 24 hours so that the user has time to check its status. When a cluster is upgraded, the persistent task moves to a new node. It currently does not check whether it was previously completed. So the persistent task running on (for example) 9.0 sees that the data stream contains nothing but "old" (8.18) indices, and unnecessarily begins reindexing all of those so that they are 9.0 indices.
This creates unnecessary work on the cluster. Worse, this second reindexing of the data stream's indices gives the backing indices names that are incompatible with the naming pattern expected for system data stream backing indices, leading to a variety of problems when querying data.
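
For orientation, here is a rough sketch of the guard this change adds on task startup. The type and method names follow the diff quoted in the review below; the fragment is illustrative, not the exact merged code.

```java
// Hypothetical fragment of the persistent task executor's nodeOperation:
// if the state handed to the newly assigned node already records a completion,
// the reindex finished on another node before reassignment, so don't start it again.
@Override
protected void nodeOperation(
    AllocatedPersistentTask task,
    ReindexDataStreamTaskParams params,
    PersistentTaskState persistentTaskState
) {
    if (isComplete(persistentTaskState)) {
        // Already finished elsewhere; the review thread below discusses how the
        // 24h cleanup of the task metadata is rescheduled on this node.
        return;
    }
    // ... otherwise run the data stream reindex as before ...
}
```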

@elasticsearchmachine added the Team:Data Management (Meta label for data/management team) label on Mar 25, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-data-management (Team:Data Management)

    ReindexDataStreamTaskParams params,
    PersistentTaskState persistentTaskState
) {
    if (isComplete(persistentTaskState)) {
Contributor:

Don't you have to schedule the completeTask for removal here, so the task is cleaned up after 24h?

Member Author (@masseyke):

Ah, good point -- the scheduling of the removal was on the threadpool on the old node, which won't carry over to this node. But I think I can just do task.markAsCompleted(); here. The user has already upgraded, meaning they know that it completed. And the information is really out of date at this point anyway (because the data stream is "old" for this new cluster).

Contributor:

If a node is terminated and the task moves to a new node (after completion), we wouldn't remove the task metadata otherwise.

Also note, the ScheduledCancellable returned by the line below keeps running on the old node in this case, making the behavior a bit nondeterministic. As far as I remember, completeAndNotifyIfNeeded might throw if the task is completed multiple times in some cases.

threadPool.schedule(completeTask, timeToLive, threadPool.generic())

Member Author (@masseyke):

I would guess that the task would only be moved if the node it was on came down altogether, threadpool included, right? Regardless, though, the exception will be harmless, won't it?
I'll change it to schedule the removal in the future so that it continues to work once users intentionally run this on 9.x (I don't think they will want to until we're approaching 10.0).
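
A rough sketch of what rescheduling the removal on the newly assigned node could look like, reusing the threadPool.schedule call quoted above and the getCompletionTime/getTimeToLive helpers from the diff further down; this is illustrative, not the exact merged change.

```java
// On the node the task was reassigned to: if the state already carries a
// completion time, reschedule the cleanup for whatever remains of the 24h
// keep-alive window instead of re-running the reindex.
Long completionTime = getCompletionTime(persistentTaskState);
if (completionTime != null) {
    threadPool.schedule(completeTask, getTimeToLive(completionTime), threadPool.generic());
}
```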

Contributor:

> The user has already upgraded, meaning they know that it completed. And the information is really out of date at this point anyway (because the data stream is "old" for this new cluster).

Actually, rethinking this: the task might also be moved for other reasons (a node might crash at any time), so we can't assume the upgrade has happened just because the task was moved. I'm not sure how important it is to keep the old state around for 24h, but it might lead to surprising results this way.

Member Author (@masseyke):

Yeah, this change definitely doesn't assume anything just because the task is on a new node -- it explicitly checks that the completion time is non-null, meaning the reindex was completed on some other node.

Member Author (@masseyke):

It only keeps the task for 24h beyond the point at which it completes. If a node crashes before the task completes, it will not have been scheduled for removal.

Contributor:

Yes, this is only ever a problem if a node crashes after completion, within those 24h.

Contributor:

👍 looks like that's already addressed

@masseyke requested a review from mosche on March 25, 2025 at 15:31
@mosche (Contributor) left a comment:

LGTM 🎉

@masseyke requested reviews from dakrone and lukewhiting on March 25, 2025 at 15:35
@masseyke added the auto-backport (Automatically create backport pull requests when merged) label on Mar 25, 2025
@lukewhiting (Contributor) left a comment:

Looks good :-) 👍🏻

Comment on lines 324 to 335
private Long getCompletionTime(PersistentTaskState persistentTaskState) {
    if (persistentTaskState == null) {
        return null;
    } else {
        if (persistentTaskState instanceof ReindexDataStreamPersistentTaskState state) {
            return state.completionTime();
        } else {
            return null;
        }
    }
}

Contributor:

Nit: I don't think you need a null check prior to calling instanceof
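
A sketch of that suggestion: a pattern-matching instanceof already evaluates to false for a null reference, so the explicit null check can be dropped and the method collapses to a single expression (illustrative, not necessarily what was merged).

```java
private Long getCompletionTime(PersistentTaskState persistentTaskState) {
    // instanceof handles null: a null state simply doesn't match the pattern
    return persistentTaskState instanceof ReindexDataStreamPersistentTaskState state
        ? state.completionTime()
        : null;
}
```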


private TimeValue getTimeToLive(long completionTimeInMillis) {
    return TimeValue.timeValueMillis(
        TASK_KEEP_ALIVE_TIME.millis() - Math.max(
Contributor:

Should this be Math.min?

Member Author (@masseyke):

Yep, thanks!
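
For clarity, a sketch of the corrected computation with Math.min, assuming a threadPool field for the current time and the TASK_KEEP_ALIVE_TIME constant from the snippet above (illustrative only): the elapsed time since completion is capped at the keep-alive, so the remaining time-to-live shrinks toward zero and never goes negative.

```java
private TimeValue getTimeToLive(long completionTimeInMillis) {
    // time since the task completed on the previous node
    long elapsedMillis = threadPool.absoluteTimeInMillis() - completionTimeInMillis;
    // remaining keep-alive, clamped to [0, TASK_KEEP_ALIVE_TIME]
    return TimeValue.timeValueMillis(
        TASK_KEEP_ALIVE_TIME.millis() - Math.min(TASK_KEEP_ALIVE_TIME.millis(), elapsedMillis)
    );
}
```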

@JVerwolf (Contributor) left a comment:

LGTM!

@masseyke added and then removed the auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) label on Mar 25, 2025
@masseyke enabled auto-merge (squash) on March 25, 2025 at 22:15
@masseyke merged commit 6a74aba into elastic:main on Mar 25, 2025 (17 checks passed)
@elasticsearchmachine (Collaborator) commented:

💔 Backport failed

Branch 9.0: commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125587

Member Author (@masseyke):

💚 All backports created successfully

Branch 9.0: backport created successfully

Questions? Please refer to the Backport tool documentation.

masseyke added a commit to masseyke/elasticsearch that referenced this pull request Mar 25, 2025
…c#125587)

(cherry picked from commit 6a74aba)

# Conflicts:
#	x-pack/plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/task/ReindexDataStreamTask.java
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Labels

auto-backport (Automatically create backport pull requests when merged), backport pending, :Data Management/Data streams (Data streams and their lifecycles), >non-issue, Team:Data Management (Meta label for data/management team), v9.0.0, v9.1.0



5 participants