Conversation

@ywangd (Member) commented Apr 10, 2025

The intention of this draft PR is to get agreement on the approach for managing per-project repository clients. The current changes are for S3 only, but a similar change should be possible for Azure and GCS as well, and I plan to have a PoC for at least one of them. I am raising the initial PR for early feedback.

A few things worth noting about the per-project clients:

  • Clients are created, deleted and rebuilt by listening to project secrets changes in the cluster state (a rough sketch follows this list).
  • They are separate from, and do not go through, the reloadable plugin framework, which is still used in single-default-project mode and for the cluster-level client in a Serverless MP cluster.
  • Each project is managed separately so that an update to one project does not impact another. Within the same project, updating or closing one client leads to a rebuild of all of that project's clients, similar to how we rebuild all clients on an update from the reloadable plugin interface.
  • Per-project clients do not support overriding client settings with repository settings, which is still supported in single-default-project mode.
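As a rough illustration of the first bullet, here is a minimal, hypothetical sketch of a manager that reacts to project secrets changes in cluster state updates; PerProjectClientManager, ProjectSecrets and ClientsHolder are illustrative names, not the actual classes in this PR:

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class PerProjectClientManager {
    private final ConcurrentHashMap<String, ClientsHolder> holders = new ConcurrentHashMap<>();

    // Called on cluster state updates that carry the current per-project secrets.
    void onProjectSecretsChanged(Map<String, ProjectSecrets> secretsByProject) {
        // Drop clients for projects that no longer exist.
        holders.keySet().removeIf(projectId -> secretsByProject.containsKey(projectId) == false);
        // Create or rebuild the clients of projects whose secrets changed; other projects are untouched.
        secretsByProject.forEach((projectId, secrets) -> holders.compute(projectId, (id, existing) -> {
            if (existing != null && Objects.equals(existing.secrets, secrets)) {
                return existing; // unchanged: keep the current clients
            }
            if (existing != null) {
                existing.close(); // in the real change, closing should be forked off the applier thread
            }
            return new ClientsHolder(secrets);
        }));
    }

    record ProjectSecrets(Map<String, String> credentials) {}

    static final class ClientsHolder {
        final ProjectSecrets secrets;

        ClientsHolder(ProjectSecrets secrets) {
            this.secrets = secrets;
        }

        void close() {
            // release this project's clients here
        }
    }
}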

Relates: ES-11383

Comment on lines 170 to 188
    /**
     * Delegates to {@link #client(RepositoryMetadata)} when
     * 1. per-project client is disabled
     * 2. or when the blobstore is cluster level (projectId = null)
     * Otherwise, attempts to retrieve a per-project client by the project-id and repository metadata from the
     * per-project client manager. Throws if the project-id or the client does not exist. The client may be initialized lazily.
     */
    public AmazonS3Reference client(@Nullable ProjectId projectId, RepositoryMetadata repositoryMetadata) {
        if (perProjectClientManager == null) {
            // Multi-Project is disabled and we have a single default project
            assert ProjectId.DEFAULT.equals(projectId) : projectId;
            return client(repositoryMetadata);
        } else if (projectId == null) {
            // Multi-Project is enabled and we are retrieving a client for the cluster level blobstore
            return client(repositoryMetadata);
        } else {
            return perProjectClientManager.client(projectId, repositoryMetadata);
        }
    }
@ywangd (Member, Author) commented:

This is the main interface for accessing a repository client at the different scopes. Note the use of a null projectId for the cluster-level blobstore, similar to how cluster-level persistent tasks are handled.
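For illustration, hypothetical call sites for the three scopes (s3Service, repositoryMetadata and someProjectId are placeholder names):

// MP cluster, cluster-level blobstore: a null project-id falls back to client(RepositoryMetadata)
AmazonS3Reference clusterClient = s3Service.client(null, repositoryMetadata);

// MP cluster, per-project blobstore: routed through the per-project client manager
AmazonS3Reference projectClient = s3Service.client(someProjectId, repositoryMetadata);

// Single-default-project mode: callers pass ProjectId.DEFAULT and the existing client(RepositoryMetadata) path is used
AmazonS3Reference defaultClient = s3Service.client(ProjectId.DEFAULT, repositoryMetadata);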

@ywangd added the >non-issue and :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs) labels on Apr 10, 2025
@ywangd (Member, Author) commented Apr 11, 2025

@DaveCTurner @henningandersen I updated the PR with changes to the GCS repository clients as well, and tightened up clientReference management for S3.

Overall the S3 changes have more complexity due to refcounting for the clients. I ended up using synchronized blocks similar to the existing code. The synchronization is per-project and is needed only for project creation/deletion and client settings updates, so contention should be rather low.

The GCS repository does not allow overriding client settings with repository settings, so it might be feasible to merge the per-project client manager with the existing clientsCache. For now, I am keeping them separate since it makes the changes easier to reason about. It might also make sense to keep them separate long term, because the clients are refreshed via different mechanisms: the reloadable plugin versus a cluster state listener.

Since Azure does not have a clients cache, I think the changes there should be relatively straightforward. It also does not allow overriding client settings with repository settings.

Given the draft status of the PR, I am mostly seeking consensus on the overall approach, e.g. keeping per-project and existing client management separate, how the per-project manager is hooked up, and how the project-id is passed around and used. Thanks!

@DaveCTurner (Contributor) commented:

I ended up using synchronized blocks similar to the existing code.

I'm not a huge fan of creating/closing the clients while holding a mutex; I think we might need something more refined here, especially since with this change we might now block the cluster applier thread on this work. The CHM-based approach in GCS is better, but still blocking. Creating a client should be cheap enough that, if we cannot find a cached client, we can optimistically create a fresh one before trying to put it into the map, accepting that if multiple threads do this then we'd want to discard the clients created by the threads that lost the race.
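For concreteness, a minimal sketch of that optimistic pattern, using a plain ConcurrentHashMap and a generic AutoCloseable client type; the names here are illustrative, not the actual Elasticsearch classes:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class OptimisticClientCache<K, C extends AutoCloseable> {
    private final ConcurrentHashMap<K, C> clients = new ConcurrentHashMap<>();

    // Look up a cached client; if absent, build a fresh one without holding any lock,
    // then try to install it. The losing thread's client is discarded (closed).
    C getOrCreate(K key, Function<K, C> factory) throws Exception {
        C existing = clients.get(key);
        if (existing != null) {
            return existing;
        }
        C fresh = factory.apply(key); // optimistic creation outside any mutex
        C winner = clients.putIfAbsent(key, fresh);
        if (winner == null) {
            return fresh; // we won the race; our client is now cached
        }
        fresh.close(); // another thread won; discard the client we built
        return winner;
    }

    // Remove and close a client, e.g. when the project's settings change.
    void invalidate(K key) throws Exception {
        C removed = clients.remove(key);
        if (removed != null) {
            removed.close();
        }
    }
}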

Note that with SDKv2 we will own the lifecycle of the HTTP client independently of the S3 SDK client, so we could potentially share the HTTP client across multiple projects.

For my education, are we going to be accepting credentials via file-based settings and putting them in the cluster state, or do those still use the keystore?

@ywangd (Member, Author) commented Apr 11, 2025

Thanks for the feedback, David.

For the CHM usages, there are two levels. The first is the Map<ProjectId, ClientsHolder> which tracks the live projects based on cluster state updates. I think this usage is unavoidable, but it should have no contention since all updates happen on the applier thread. The second-level CHM usage is inside GCS's ClientsHolder, i.e. clientCache. This one is never called on the applier thread, so it should have no issue.

I agree that the synchronized usage is not desirable. I tried to avoid it initially, but it proved difficult while maintaining the current semantics. It might be possible if we are OK with creating the same client more than once, as you suggested. But would you consider it an acceptable solution to simply fork client closing to a different thread? Closing the clients is the only case where the applier thread runs into the synchronization. Since there is no requirement to close the clients synchronously, I feel this could be a simple solution while maintaining all existing semantics. In fact, I think we want to move closing the clients to a different thread regardless of the synchronization, since it might itself be expensive (it closes the underlying S3 SDK client). What do you think?
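As a rough sketch of what forking client closing to a different thread could look like (using a generic ExecutorService; the real change would use the node's thread pool and the actual client types, so treat the names as assumptions):

import java.util.List;
import java.util.concurrent.ExecutorService;

final class AsyncClientCloser {
    private final ExecutorService executor;

    AsyncClientCloser(ExecutorService executor) {
        this.executor = executor;
    }

    // The applier thread only swaps map entries and hands the old clients off here;
    // the potentially expensive close of the underlying SDK clients runs elsewhere.
    void closeAsync(List<? extends AutoCloseable> oldClients) {
        executor.execute(() -> {
            for (AutoCloseable client : oldClients) {
                try {
                    client.close();
                } catch (Exception e) {
                    // log and continue; a failed close must not affect the applier's state
                }
            }
        });
    }
}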

are we going to be accepting credentials via file-based settings and putting them in the cluster state, or do those still use the keystore?

Per-project credentials will be provided via per-project file-based secrets (see ReservedProjectSecretsAction in the serverless repo). Cluster-level credentials will still be provisioned as they are today, i.e. via the reloadable plugin mechanism that originated with the keystore.

@DaveCTurner (Contributor) commented:

In fact, I think we want to move closing the clients to a different thread regardless of the synchronization, since it might itself be expensive (it closes the underlying S3 SDK client). What do you think?

Yes, I agree. We should probably also block the doClose() of the S3Service until all the clients have finished closing. That should be enough to keep the blocking away from the cluster applier.
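One possible way to do that, sketched with a Phaser tracking outstanding asynchronous closes; the class name and the exact lifecycle hook are assumptions, not the actual S3Service code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Phaser;

final class ClosingTracker {
    private final ExecutorService executor;
    // One party for the service itself; each pending close registers another party.
    private final Phaser pendingCloses = new Phaser(1);

    ClosingTracker(ExecutorService executor) {
        this.executor = executor;
    }

    void closeClientAsync(AutoCloseable client) {
        pendingCloses.register();
        executor.execute(() -> {
            try {
                client.close();
            } catch (Exception e) {
                // log and continue
            } finally {
                pendingCloses.arriveAndDeregister();
            }
        });
    }

    // Called from the service's doClose(): blocks until every async close above has finished.
    void awaitAllClosed() {
        pendingCloses.arriveAndAwaitAdvance();
    }
}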

@ywangd (Member, Author) commented Apr 29, 2025

I assume we are OK with the overall approach, so I will go ahead and raise the formal PR for S3 first. Thanks for the discussion!

elasticsearchmachine pushed a commit that referenced this pull request May 30, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: #126584
Resolves: ES-11713
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: elastic#126584
Resolves: ES-11713
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: elastic#126584
Resolves: ES-11713
@ywangd (Member, Author) commented Aug 14, 2025

This is now implemented by #127631, #131899 and #132701

@ywangd ywangd closed this Aug 14, 2025
Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v9.2.0
