Make the datastream reindexing APIs multi-project aware #130035

joegallo · 2025-06-25T19:03:15Z

Updates POST /_migration/reindex and related APIs to operate on a project state (or project metadata) throughout.

Most of this is pretty mechanical; the only part that was a bit creative was tracking the ProjectId in the ReindexDataStreamTask (so make sure that the approach I took there seems sane).

masseyke · 2025-06-25T19:59:22Z

> the only part that was a bit creative was tracking the ProjectId in the ReindexDataStreamTask (so make sure that the approach I took there seems sane).

It looks sane to me, assuming that we get a distinct NodeConstruction for each project, and each NodeClient is tied to a specific project (I assume we do, but I don't know a lot about multiproject).

masseyke · 2025-06-25T20:07:30Z

> It looks sane to me, assuming that we get a distinct NodeConstruction for each project, and each NodeClient is tied to a specific project (I assume we do, but I don't know a lot about multiproject).

On second thought, I'm not so sure. It sounds like the projectResolver returns the project id of the current request, and maybe at node construction time that's just the default project? In that case, we probably need to write the project id to the task params, or change the persistent task framework to pass a projectResolver in to createTask.
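The first alternative masseyke raises, carrying the project id explicitly in the task params, could look roughly like the sketch below. This is a hypothetical illustration only: `ProjectId`, `ReindexTaskParams`, and `ReindexTask` are simplified stand-ins, not the real Elasticsearch classes, and the PR did not take this route (it relies on the persistent task framework's thread context instead).

```java
import java.util.Objects;

// Hypothetical sketch: these records/classes are illustrative stand-ins, not Elasticsearch types.
public class TaskParamsSketch {

    /** Stand-in for a project identifier. */
    record ProjectId(String id) {
        ProjectId {
            Objects.requireNonNull(id, "id");
        }
    }

    /** Hypothetical persistent-task params that carry the project id explicitly. */
    record ReindexTaskParams(String sourceDataStream, ProjectId projectId) {
        ReindexTaskParams {
            Objects.requireNonNull(sourceDataStream, "sourceDataStream");
            // Rejecting null here means every task is always bound to a concrete project.
            Objects.requireNonNull(projectId, "projectId");
        }
    }

    /** Hypothetical task that reads its project from the params, not from ambient state. */
    static final class ReindexTask {
        private final ReindexTaskParams params;

        ReindexTask(ReindexTaskParams params) {
            this.params = params;
        }

        ProjectId projectId() {
            return params.projectId();
        }
    }

    public static void main(String[] args) {
        var params = new ReindexTaskParams("logs-app", new ProjectId("project-a"));
        var task = new ReindexTask(params);
        System.out.println(task.projectId().id()); // prints "project-a"
    }
}
```

The trade-off is that the project id becomes part of the persisted params (and their serialization), whereas the thread-context approach leaves the params unchanged.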

.../main/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamIndexTransportAction.java

nielsbauman · 2025-06-25T19:57:23Z

...plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/task/ReindexDataStreamTask.java

        int totalIndicesToBeUpgraded = initialTotalIndicesToBeUpgraded;
        PersistentTasksCustomMetadata.PersistentTask<?> persistentTask = PersistentTasksCustomMetadata.getTaskWithId(
-            clusterService.state(),
+            clusterService.state().projectState(projectId).metadata(),


There is no need to convert to ProjectState here (we could call clusterService.state().metadata().getProject(projectId)).

I see we have a null-check below for persistentTask. While we should handle project deletions (even soft-deletes) gracefully and stop persistent tasks first, I think it doesn't hurt to have some defense against deleted projects here. We could do something like:
    var project = clusterService.state().metadata().projects().get(projectId);
    PersistentTasksCustomMetadata.PersistentTask<?> persistentTask = project == null
        ? null
        : PersistentTasksCustomMetadata.getTaskWithId(project, getPersistentTaskId());
But other variations of that are fine too of course.

The same goes for similar changes in this and other files, and for obtaining sourceIndex in ReindexDataStreamIndexTransportAction.java. Generally, if a block of code assumes that something exists (e.g. a task/custom/index), I don't do an extra null-check for the project, but if the code defends against missing objects, I check for a missing project to maintain the level of defense.

14ac833 addresses your first point, but I'll have to grind a bit on the second one. (Which I'll do tomorrow.)

Okay, I think I've handled your second point via ef53eb2.

...n/java/org/elasticsearch/xpack/migrate/action/CopyLifecycleIndexMetadataTransportAction.java

...e/src/main/java/org/elasticsearch/xpack/migrate/action/ReindexDataStreamTransportAction.java

nielsbauman · 2025-06-25T20:16:45Z

@masseyke, before the persistent task framework starts a (project-scoped) persistent task, it puts the project ID in the thread context:

elasticsearch/server/src/main/java/org/elasticsearch/persistent/PersistentTasksNodeService.java

Lines 195 to 203 in 528bd9c

    
    if (projectId != null) {
        @FixForMultiProject(
            description = "Replace with ProjectResolver#executeOnProject once "
                + "DefaultProjectResolver can ensure the header in threadContext"
        )
        final String projectIdString = projectId.id();
        threadPool.getThreadContext().putHeader(Task.X_ELASTIC_PROJECT_ID_HTTP_HEADER, projectIdString);
    }
    doStartTask(taskInProgress, executor, request);

The ProjectResolver in ReindexDataStreamPersistentTaskExecutor then picks up that project ID from the thread context.
Does that answer your concern, or were you talking about something else?
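The mechanism described here (the framework puts the project ID into the thread context, and the resolver reads it when the task runs) can be modeled in plain Java as below. `ThreadContextModel`, `HeaderProjectResolver`, `startTask`, and the header string are hypothetical stand-ins for `ThreadContext`, `ProjectResolver`, `doStartTask`, and `Task.X_ELASTIC_PROJECT_ID_HTTP_HEADER`; throwing when the header is absent is a simplification, not the real resolver's behavior. The point it illustrates addresses masseyke's concern: a resolver created once (e.g. at node construction) can still return the right project, because it reads the header at call time on the task's thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of header propagation; all names here are hypothetical stand-ins.
public class ThreadContextSketch {

    // Placeholder for the real header constant (Task.X_ELASTIC_PROJECT_ID_HTTP_HEADER).
    static final String PROJECT_ID_HEADER = "project-id-header";

    /** Minimal per-thread header store. */
    static final class ThreadContextModel {
        private final ThreadLocal<Map<String, String>> headers = ThreadLocal.withInitial(HashMap::new);

        void putHeader(String key, String value) {
            headers.get().put(key, value);
        }

        String getHeader(String key) {
            return headers.get().get(key);
        }
    }

    /** Reads the project id lazily, at call time, from the current thread's headers. */
    static final class HeaderProjectResolver {
        private final ThreadContextModel context;

        HeaderProjectResolver(ThreadContextModel context) {
            this.context = context;
        }

        String getProjectId() {
            String id = context.getHeader(PROJECT_ID_HEADER);
            if (id == null) {
                throw new IllegalStateException("no project id in thread context");
            }
            return id;
        }
    }

    /** Mimics the framework: put the project header on the task's thread, then run the task. */
    static void startTask(ThreadContextModel context, String projectId, Runnable task) {
        Thread worker = new Thread(() -> {
            if (projectId != null) {
                context.putHeader(PROJECT_ID_HEADER, projectId);
            }
            task.run();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        var context = new ThreadContextModel();
        // Created once, before any project is known (analogous to node construction time).
        var resolver = new HeaderProjectResolver(context);
        var seen = new AtomicReference<String>();
        startTask(context, "project-a", () -> seen.set(resolver.getProjectId()));
        System.out.println(seen.get()); // prints "project-a"
    }
}
```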

masseyke · 2025-06-25T20:25:26Z

> The ProjectResolver in ReindexDataStreamPersistentTaskExecutor then picks up that project ID from the thread context.
> Does that answer your concern, or were you talking about something else?

OK, that makes sense. Looks good then!

joegallo · 2025-06-25T21:19:03Z

@nielsbauman, for the scope of this kind of work, does it make sense for me to ignore the implied compilation changes that should happen in the tests too? For example, there are calls to Metadata#getProject() in CopyLifecycleIndexMetadataTransportActionIT, but I just ignored files like that.

elasticsearchmachine · 2025-06-26T17:22:33Z

Pinging @elastic/es-data-management (Team:Data Management)

nielsbauman

Thanks for addressing my comments. I added one more comment with one suggestion and one optional suggestion. After that we should be good to go.

nielsbauman · 2025-06-26T17:38:35Z

...plugin/migrate/src/main/java/org/elasticsearch/xpack/migrate/task/ReindexDataStreamTask.java

-            clusterService.state(),
-            getPersistentTaskId()
-        );
+        final var projectMetadata = clusterService.state().metadata().getProject(projectId);


I have good news and I have bad news. The bad news is that I said something wrong before in #130035 (comment). I said

> The same goes for similar changes in this and other files

which isn't actually true. In all the other cases in this PR, we use the project resolver to obtain the project metadata (or project state). That throws an exception if the project doesn't exist, so a null-check is redundant. The "good" news is that the null-checks aren't doing much harm either, so I'm also fine with leaving them in. Sorry about that.

The places where we do need a null-check are the line I'm commenting on here, the lines below it in this file, and the task executor in CopyLifecycleIndexMetadataTransportAction, because those places run asynchronously from requests and could therefore (theoretically) run after a project has been deleted. You currently use getProject(projectId) in both places, but that throws an exception if the project doesn't exist (I know, that's not super intuitive from the method name; maybe we should change that). We'll need to use projects().get(projectId), as in my previous suggestion.
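The difference between the two accessors, and why only the asynchronous paths need the null-check, can be illustrated with a small model. `Metadata`, `ProjectMetadata`, and `findTask` below are simplified hypothetical stand-ins (String ids, a map of task ids to descriptions, and an `IllegalArgumentException` chosen arbitrarily for the strict accessor); they only mirror the shape of the `getProject(projectId)` and `projects().get(projectId)` calls discussed, not the real Elasticsearch classes.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for cluster metadata; not the real Elasticsearch classes.
public class ProjectLookupSketch {

    record ProjectMetadata(String id, Map<String, String> persistentTasks) {}

    record Metadata(Map<String, ProjectMetadata> projects) {

        /** Strict accessor: assumes the project exists and throws otherwise. */
        ProjectMetadata getProject(String projectId) {
            ProjectMetadata project = projects.get(projectId);
            if (project == null) {
                throw new IllegalArgumentException("project [" + projectId + "] not found");
            }
            return project;
        }
    }

    /**
     * Defensive lookup for code that runs asynchronously and may outlive its project:
     * a missing project is treated the same way as a missing task.
     */
    static Optional<String> findTask(Metadata metadata, String projectId, String taskId) {
        ProjectMetadata project = metadata.projects().get(projectId);
        return project == null ? Optional.empty() : Optional.ofNullable(project.persistentTasks().get(taskId));
    }

    public static void main(String[] args) {
        var metadata = new Metadata(Map.of("project-a", new ProjectMetadata("project-a", Map.of("task-1", "reindex"))));
        System.out.println(findTask(metadata, "project-a", "task-1")); // prints "Optional[reindex]"
        System.out.println(findTask(metadata, "deleted", "task-1"));   // prints "Optional.empty"
    }
}
```

Request-scoped code can keep using the strict accessor (a missing project is an error there), while the background paths use the nullable lookup so a deleted project degrades to "task not found" instead of an exception.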

Okay, that makes sense. I reverted the extraneous null checking on the results of getProjectMetadata in 08c6012.

nielsbauman

LGTM, thanks a lot for the iterations!

joegallo added 7 commits June 25, 2025 14:55

Use the utility method for fetching the task by id

139bcd9

Rewrite this in terms of project metadata manipulation

b799cb7

Make migrate/10_reindex/* pass

62c4540

Make migrate/20_reindex_status/* pass

59eaa78

Make migrate/30_create_from/* pass

040ea98

Avoid some deprecated methods

4055394

Tidy up an unused logger

e85a4f7

joegallo requested review from masseyke and nielsbauman June 25, 2025 19:03

joegallo added >non-issue :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team v9.2.0 labels Jun 25, 2025

nielsbauman reviewed Jun 25, 2025


joegallo added 3 commits June 25, 2025 17:03

Use the project-specific version of start and remove

bb81d9b

Skip through to the projectMetadata directly

14ac833

Avoid the intermediate projectState

34e9808

joegallo added 5 commits June 25, 2025 17:42

Capture the projectId in UpdateIndexMetadataTask

7d17f7a

Fuss about with assignments

5e96402

Use the projectMetadata directly

dc4a442

Add null checks before calling getTaskWithId, etc

ef53eb2

Merge branch 'main' into datastream-reindex-mp

afec341

joegallo requested a review from nielsbauman June 26, 2025 14:42

joegallo marked this pull request as ready for review June 26, 2025 17:22

nielsbauman reviewed Jun 26, 2025


joegallo added 2 commits June 30, 2025 12:12

Revert the null checks around getProjectMetadata

08c6012

Merge branch 'main' into datastream-reindex-mp

4817216

nielsbauman approved these changes Jun 30, 2025


masseyke approved these changes Jun 30, 2025


joegallo merged commit d69a282 into elastic:main Jun 30, 2025
32 checks passed

joegallo deleted the datastream-reindex-mp branch June 30, 2025 18:04

mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025

Make the datastream reindexing APIs multi-project aware (elastic#130035)

0a702bf

4 participants
