@ywangd
Member

@ywangd ywangd commented Mar 21, 2025

This PR migrates RepositoriesMetadata from Metadata#ClusterCustom to Metadata#ProjectCustom and handles wire BWC.

Resolves: ES-10477

@ywangd ywangd added the >non-issue and :Distributed Coordination/Snapshot/Restore labels Mar 23, 2025
@ywangd ywangd marked this pull request as ready for review March 23, 2025 22:24
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination label Mar 23, 2025
@ywangd ywangd requested review from a team, DaveCTurner and pxsalehi March 23, 2025 22:25
@elasticsearchmachine elasticsearchmachine added the serverless-linked label Mar 23, 2025
  .metadata(
      Metadata.builder(currentState.getMetadata())
-         .putCustom(
+         .putDefaultProjectCustom(
Member Author

This makes the repository explicitly work with the default project. Otherwise, it throws MultiProjectPendingException in some multi-project enabled tests. It worked previously because repositories are cluster custom which does not check the number of projects. It is a temporary workaround (the method is marked as deprecated for removal) and will be removed once repository and snapshot fully support multi-project.
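
A minimal sketch of why explicitly targeting the default project avoids the failure described above. Everything here is a hypothetical stand-in, not the real `Metadata.Builder`: the class, the `"default"` project id, and `putSingleProjectCustom` are invented for illustration, and the real code throws `MultiProjectPendingException` rather than `IllegalStateException`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata builder that holds per-project customs.
final class MetadataBuilderSketch {
    static final String DEFAULT_PROJECT_ID = "default"; // assumed id, for illustration only

    private final Map<String, Map<String, Object>> projectCustoms = new HashMap<>();

    MetadataBuilderSketch addProject(String projectId) {
        projectCustoms.putIfAbsent(projectId, new HashMap<>());
        return this;
    }

    // Assumes exactly one project exists; with several projects the target is ambiguous.
    MetadataBuilderSketch putSingleProjectCustom(String name, Object value) {
        if (projectCustoms.size() != 1) {
            throw new IllegalStateException("expected exactly one project but found " + projectCustoms.size());
        }
        projectCustoms.values().iterator().next().put(name, value);
        return this;
    }

    // Always writes to the default project, however many projects exist.
    MetadataBuilderSketch putDefaultProjectCustom(String name, Object value) {
        projectCustoms.computeIfAbsent(DEFAULT_PROJECT_ID, id -> new HashMap<>()).put(name, value);
        return this;
    }

    Object getCustom(String projectId, String name) {
        Map<String, Object> customs = projectCustoms.get(projectId);
        return customs == null ? null : customs.get(name);
    }
}
```

In a builder with two projects, the single-project variant fails while the default-project variant succeeds, which mirrors the behaviour seen in the multi-project enabled tests.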

LicensesMetadata licensesMetadata = new LicensesMetadata(license, TrialLicenseVersion.CURRENT);
RepositoryMetadata repositoryMetadata = new RepositoryMetadata("repo", "fs", Settings.EMPTY);
RepositoriesMetadata repositoriesMetadata = new RepositoriesMetadata(Collections.singletonList(repositoryMetadata));
NodesShutdownMetadata nodesShutdownMetadata = new NodesShutdownMetadata(Map.of());
Member Author

This test just needs a different Metadata#ClusterCustom that is not licenses.

Contributor

@DaveCTurner DaveCTurner left a comment

I'm pretty concerned with the complexity of the surgery being introduced into the (already complex) diff-serialization process. Did we explore any alternatives? For instance, could we keep this info in cluster-wide customs until org.elasticsearch.cluster.ClusterState#getMinTransportVersion advances far enough that we won't need the BwC complexity?

@ywangd
Member Author

ywangd commented Mar 24, 2025

It would require v10 to completely advance this change since v9.0 still stores it in cluster customs. I'd rather pay for the complexity once, similar to how we did it for the persistent tasks split, so that what's left is removing it once we hit v10; part of it can even be dropped once serverless advances in a few weeks' time. And it will be in the "final" state. I am sure v10 will come with its own pile of tasks, so overall I think it's better to fix it now.

@DaveCTurner
Contributor

I don't think that's a very strong argument for introducing this level of complexity right now. We will never need to send a diff from a v9.0 node (using cluster-level repositories) to any v10.x node using project-level ones, and we will need to keep the code for fixing this up when loading the Metadata from disk all the way through the 10.x series either way to support a full-cluster-restart upgrade between these versions.

@DaveCTurner
Contributor

> part of it can even be dropped once serverless advances in a few weeks' time

Can you expand on this? What exactly will be dropped? Sorry if I've missed something obvious but I couldn't work out which bit you meant here.

Comment on lines +1090 to +1091
assert out.getTransportVersion().onOrAfter(TransportVersions.MULTI_PROJECT)
&& out.getTransportVersion().before(TransportVersions.REPOSITORIES_METADATA_AS_PROJECT_CUSTOM) : out.getTransportVersion();
Member Author

@DaveCTurner I meant this method which is most of the new complexity. Versions between MULTI_PROJECT and REPOSITORIES_METADATA_AS_PROJECT_CUSTOM are only released in serverless. Once serverless rolls forward, we can remove it.
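
A rough sketch of the version window discussed here, assuming transport versions can be compared by an integer id. `SketchVersion`, the numeric ids, and the `Format` names are hypothetical; only the two constant names come from the snippet above.

```java
// Hypothetical stand-in for a transport version; the real class and its ids differ.
final class SketchVersion {
    final int id;

    SketchVersion(int id) {
        this.id = id;
    }

    boolean onOrAfter(SketchVersion other) {
        return id >= other.id;
    }

    boolean before(SketchVersion other) {
        return id < other.id;
    }
}

final class RepositoriesWireFormatSketch {
    // Made-up ids; only their relative order matters for this sketch.
    static final SketchVersion MULTI_PROJECT = new SketchVersion(100);
    static final SketchVersion REPOSITORIES_METADATA_AS_PROJECT_CUSTOM = new SketchVersion(200);

    enum Format {
        CLUSTER_CUSTOM_SINGLE_PROJECT, // before MULTI_PROJECT
        CLUSTER_CUSTOM_MULTI_PROJECT,  // the serverless-only window that needs the extra diff handling
        PROJECT_CUSTOM                 // on or after the migration
    }

    static Format formatFor(SketchVersion remote) {
        if (remote.before(MULTI_PROJECT)) {
            return Format.CLUSTER_CUSTOM_SINGLE_PROJECT;
        }
        if (remote.before(REPOSITORIES_METADATA_AS_PROJECT_CUSTOM)) {
            return Format.CLUSTER_CUSTOM_MULTI_PROJECT;
        }
        return Format.PROJECT_CUSTOM;
    }
}
```

Once no node can report a version in the middle window, the `CLUSTER_CUSTOM_MULTI_PROJECT` branch is unreachable, which is why the corresponding method can be deleted after serverless rolls forward.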

@ywangd
Member Author

ywangd commented Mar 24, 2025

Diffs are not needed for full cluster restart, only xcontent which has rather minimal BWC changes.

@DaveCTurner
Contributor

I see, ok, I had missed that if this lands in 9.1 then that becomes dead code. I see there is already prior art for this diff-surgery approach too. I didn't know that, and FTR my discomfort with this PR extends to that prior art too, but I guess we have to roll forwards with it for now.

@ywangd
Member Author

ywangd commented Mar 25, 2025

Yeah let's roll forward 😄

@ywangd ywangd requested a review from DaveCTurner March 25, 2025 03:25
Member

@pxsalehi pxsalehi left a comment

To my eyes, these changes look ok. But this is quite tricky to review. So I'd rather leave the final review/LGTM to David.

@ywangd
Member Author

ywangd commented Mar 26, 2025

Thanks for taking a look, Pooya. Let me provide a high-level explanation and see whether it helps.

The issue is that we need to split an entry from Metadata#customs and move it into ProjectMetadata#customs. We did a similar split before merging back to main (TransportVersions.MULTI_PROJECT, hereafter referred to as MP), and this is part of the prior art. For this new split, we can reuse that logic if the old version is before MP. The complexity is mostly in communicating with versions on or after MP.

BWC reading and writing for the entire Metadata are not bad at all. The challenge comes from the Diffs, due to the MapDiff manipulations and how deeply nested they can be. If we look past the MapDiff manipulation, the logic is conceptually rather straightforward. For example, the steps for writing a Diff are as follows:

  1. Iterate through the map of ProjectMetadataDiff and separate RepositoriesMetadata (or its diff) from the project level customs. Note that we throw if anything is found for a non-default project.
  2. Combine RepositoriesMetadata (or its diff) separated in step 1 with the cluster level customs.
  3. Send the combined cluster level customs and the processed map of ProjectMetadataDiff from step 1.

The above logic is encapsulated in the method writeDiffWithRepositoriesMetadataAsClusterCustom, which accounts for most of the new complexity (reading from old versions is a lot simpler because merging into a MapDiff is easier). Fortunately, as commented earlier, this method will become unused once Serverless deployments roll forward, so the complexity does not stay with us for very long. This BWC logic is tested with both unit tests and the serverless rolling upgrade test in the linked PR. Overall I think it is safe to proceed.
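
The three steps above can be sketched with plain maps. This is a simplified illustration, not the actual method: it models customs (or their diffs) as opaque `Object` values, skips the MapDiff handling entirely, and the class name, the `"default"` project id, and the `"repositories"` key are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class RepositoriesSplitSketch {
    static final String DEFAULT_PROJECT = "default";   // assumed id, for illustration only
    static final String REPOSITORIES = "repositories"; // assumed custom key, for illustration only

    static final class Result {
        final Map<String, Object> clusterCustoms;
        final Map<String, Map<String, Object>> projectCustoms;

        Result(Map<String, Object> clusterCustoms, Map<String, Map<String, Object>> projectCustoms) {
            this.clusterCustoms = clusterCustoms;
            this.projectCustoms = projectCustoms;
        }
    }

    static Result moveRepositoriesToClusterLevel(
        Map<String, Object> clusterCustoms,
        Map<String, Map<String, Object>> projectCustoms
    ) {
        Map<String, Object> combined = new HashMap<>(clusterCustoms);
        Map<String, Map<String, Object>> processed = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> project : projectCustoms.entrySet()) {
            // Step 1: separate the repositories entry from the project level customs.
            Map<String, Object> remaining = new HashMap<>(project.getValue());
            Object repositories = remaining.remove(REPOSITORIES);
            if (repositories != null) {
                if (!DEFAULT_PROJECT.equals(project.getKey())) {
                    throw new IllegalStateException("repositories found for non-default project " + project.getKey());
                }
                // Step 2: combine it with the cluster level customs.
                combined.put(REPOSITORIES, repositories);
            }
            processed.put(project.getKey(), remaining);
        }
        // Step 3: the caller sends both the combined customs and the processed project map.
        return new Result(combined, processed);
    }
}
```

The inputs are copied rather than mutated, so the in-memory state keeps its project-level layout while only the wire representation is rearranged for the older receiver.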

@tvernum @nielsbauman Though it is labelled with snapshot/restore, the changes are more about namespacing and are tied to the previous Metadata changes. I'd appreciate it if you could also take a look at this PR since you are familiar with the context. Thank you!

Contributor

@nielsbauman nielsbauman left a comment

This is definitely not the most confident approval I've ever given. I agree with David and Pooya (and probably you do too) that this serialization logic is becoming extremely complex and unreadable. I'm happy some of this can be dropped in a few weeks already. I don't immediately have any better suggestions.

throwForVersionBeforeRepositoriesMetadataMigration(out);
}
// RepositoriesMetadata found for the default project as an upsert, package it as MapDiff and merge into Metadata#customs
combineClustersCustoms.set(
Contributor

I'm probably missing something here, but just to double check: is it guaranteed that only one of these lambdas reaches the end? In other words, are we sure we won't encounter non-empty repositories for the default project in both upserts and diffs? I just want to make sure combineClusterCustoms.set() is only called once.

Member Author

Yes, upserts and diffs are mutually exclusive, see

    if (previousValue == null) {
        upserts.add(entry);
        inserts++;
    } else if (entry.getValue().equals(previousValue) == false) {
        if (valueSerializer.supportsDiffableValues()) {
            diffs.add(
                new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), valueSerializer.diff(entry.getValue(), previousValue))
            );
        } else {
            upserts.add(entry);
        }
    }
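
To see the mutual exclusion in isolation, here is a reduced sketch of the same branching. It is not the real MapDiff: `MapDiffSketch` is a hypothetical class that records only keys, uses `String` values, and assumes values are always diffable.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reduction of a map diff: tracks only which keys are upserted and which are diffed.
final class MapDiffSketch {
    final Set<String> upserts = new TreeSet<>();
    final Set<String> diffs = new TreeSet<>();

    MapDiffSketch(Map<String, String> before, Map<String, String> after) {
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String previousValue = before.get(entry.getKey());
            if (previousValue == null) {
                // New key: sent in full as an upsert.
                upserts.add(entry.getKey());
            } else if (!entry.getValue().equals(previousValue)) {
                // Existing key with a changed value: sent as a diff (values assumed diffable).
                diffs.add(entry.getKey());
            }
            // Unchanged keys land in neither set.
        }
    }

    boolean upsertsAndDiffsAreDisjoint() {
        return Collections.disjoint(upserts, diffs);
    }
}
```

Because each key goes through a single if/else-if chain, it can be added to at most one of the two sets, so a given custom such as the repositories entry shows up either as an upsert or as a diff, never both.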

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM for a very loose definition of "good" which includes:

  • this is at least well-covered by existing tests
  • there are no obvious mistakes here
  • I cannot think of a better way of doing this given where we are today

@ywangd
Member Author

ywangd commented Mar 28, 2025

@elasticmachine update branch

@ywangd ywangd added the auto-merge-without-approval label Mar 28, 2025
@ywangd
Member Author

ywangd commented Mar 28, 2025

Thanks all for the reviews! 🙏

@elasticsearchmachine elasticsearchmachine merged commit 3568ab8 into elastic:main Mar 28, 2025
17 checks passed
@ywangd ywangd deleted the mp/repositories-metadata-migration branch March 28, 2025 06:53
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
This PR migrates RepositoriesMetadata from Metadata#ClusterCustom to
Metadata#ProjectCustom and handles wire BWC.

Resolves: ES-10477
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 19, 2025
When migrating RepositoriesMetadata from cluster custom to project
custom, we needed temporary BWC handling for clusters running on a
version that is before this change but after the initial MP change. Such
a cluster can only exist in the serverless environment which has
progressed way past any applicable versions. Therefore we no longer need
the BWC handling and this PR removes it.

Relates: elastic#125398
elasticsearchmachine pushed a commit that referenced this pull request Jun 20, 2025
When migrating RepositoriesMetadata from cluster custom to project
custom (#125398), we needed temporary BWC handling for clusters running
on a version that is before this change but after the initial MP change.
Such a cluster can only exist in the serverless environment which has
progressed way past any applicable versions. Therefore we no longer need
the BWC handling and this PR removes it.

Relates: #125398
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025

Labels

auto-merge-without-approval, :Distributed Coordination/Snapshot/Restore, >non-issue, serverless-linked, Team:Distributed Coordination, v9.1.0

6 participants