Conversation

@ywangd (Member) commented Apr 10, 2025

The intention of this draft PR is to get agreement on the approach for managing per-project repository clients. The current changes are for S3 only, but a similar change should be possible for Azure and GCS as well, and I plan to have a PoC for at least one of them. I am raising the initial PR for early feedback.

A few things worth noting about the per-project clients:

  • Clients are created, deleted and rebuilt by listening to project secrets changes in the cluster state (a rough sketch follows this list).
  • They are separate from, and do not go through, the reloadable plugin framework, which is still used in single-default-project mode and for the cluster-level client in a Serverless MP cluster.
  • Each project is managed separately so that an update to one project does not impact another. Within the same project, updating or closing one client leads to a rebuild of all of that project's clients, similar to how we rebuild all clients on an update from the reloadable plugin interface.
  • Per-project clients do not support overriding client settings with repository settings, which is still supported in single-default-project mode.
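As a rough illustration of the first bullet, here is a minimal, hypothetical sketch of a manager that reacts to project secrets changes in cluster state updates; PerProjectClientManager, ProjectSecrets and ClientsHolder are illustrative names, not the actual classes in this PR:

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class PerProjectClientManager {
    private final ConcurrentHashMap<String, ClientsHolder> holders = new ConcurrentHashMap<>();

    // Called on cluster state updates that carry the current per-project secrets.
    void onProjectSecretsChanged(Map<String, ProjectSecrets> secretsByProject) {
        // Drop clients for projects that no longer exist.
        holders.keySet().removeIf(projectId -> secretsByProject.containsKey(projectId) == false);
        // Create or rebuild the clients of projects whose secrets changed; other projects are untouched.
        secretsByProject.forEach((projectId, secrets) -> holders.compute(projectId, (id, existing) -> {
            if (existing != null && Objects.equals(existing.secrets, secrets)) {
                return existing; // unchanged: keep the current clients
            }
            if (existing != null) {
                existing.close(); // in the real change, closing should be forked off the applier thread
            }
            return new ClientsHolder(secrets);
        }));
    }

    record ProjectSecrets(Map<String, String> credentials) {}

    static final class ClientsHolder {
        final ProjectSecrets secrets;

        ClientsHolder(ProjectSecrets secrets) {
            this.secrets = secrets;
        }

        void close() {
            // release this project's clients here
        }
    }
}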

Relates: ES-11383

Comment on lines 170 to 188
    /**
     * Delegates to {@link #client(RepositoryMetadata)} when
     * 1. per-project client is disabled
     * 2. or when the blobstore is cluster level (projectId = null)
     * Otherwise, attempts to retrieve a per-project client by the project-id and repository metadata from the
     * per-project client manager. Throws if the project-id or the client does not exist. The client may be initialized lazily.
     */
    public AmazonS3Reference client(@Nullable ProjectId projectId, RepositoryMetadata repositoryMetadata) {
        if (perProjectClientManager == null) {
            // Multi-Project is disabled and we have a single default project
            assert ProjectId.DEFAULT.equals(projectId) : projectId;
            return client(repositoryMetadata);
        } else if (projectId == null) {
            // Multi-Project is enabled and we are retrieving a client for the cluster level blobstore
            return client(repositoryMetadata);
        } else {
            return perProjectClientManager.client(projectId, repositoryMetadata);
        }
    }
@ywangd (Member, Author) commented:

This is the main interface for accessing a repository client at the different scopes. Note the use of a null projectId for the cluster-level blobstore, similar to how cluster-level persistent tasks are handled.
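For illustration, hypothetical call sites for the three scopes (s3Service, repositoryMetadata and someProjectId are placeholder names):

// MP cluster, cluster-level blobstore: a null project-id falls back to client(RepositoryMetadata)
AmazonS3Reference clusterClient = s3Service.client(null, repositoryMetadata);

// MP cluster, per-project blobstore: routed through the per-project client manager
AmazonS3Reference projectClient = s3Service.client(someProjectId, repositoryMetadata);

// Single-default-project mode: callers pass ProjectId.DEFAULT and the existing client(RepositoryMetadata) path is used
AmazonS3Reference defaultClient = s3Service.client(ProjectId.DEFAULT, repositoryMetadata);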

@ywangd added the >non-issue and :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs) labels on Apr 10, 2025
@ywangd (Member, Author) commented Apr 11, 2025

@DaveCTurner @henningandersen I updated the PR with changes to the GCS repository clients as well, and tightened up clientReference management for S3.

Overall the S3 changes have more complexity due to refcounting for the clients. I ended up using synchronized blocks similar to the existing code. The synchronization is per-project and is needed only for project creation/deletion and client settings updates, so contention should be rather low.

The GCS repository does not allow overriding client settings with repository settings, so it might be feasible to merge the per-project client manager with the existing clientsCache. For now, I am keeping them separate since it makes the changes easier to reason about. It might also make sense to keep them separate long term, because the clients are refreshed via different mechanisms: the reloadable plugin versus a cluster state listener.

Since Azure does not have a clients cache, I think the changes there should be relatively straightforward. It also does not allow overriding client settings with repository settings.

Given the draft status of the PR, I am mostly seeking consensus on the overall approach, e.g. keeping per-project and existing client management separate, how the per-project manager is hooked up, and how the project-id is passed around and used. Thanks!

@DaveCTurner (Contributor) commented:

I ended up using synchronized blocks similar to the existing code.

I'm not a huge fan of creating/closing the clients while holding a mutex; I think we might need something more refined here, especially since with this change we might now block the cluster applier thread on this work. The CHM-based approach in GCS is better, but still blocking. Creating a client should be cheap enough that, if we cannot find a cached client, we can optimistically create a fresh one before trying to put it into the map, accepting that if multiple threads do this then we'd want to discard the clients created by the threads that lost the race.
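For concreteness, a minimal sketch of that optimistic pattern, using a plain ConcurrentHashMap and a generic AutoCloseable client type; the names here are illustrative, not the actual Elasticsearch classes:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class OptimisticClientCache<K, C extends AutoCloseable> {
    private final ConcurrentHashMap<K, C> clients = new ConcurrentHashMap<>();

    // Look up a cached client; if absent, build a fresh one without holding any lock,
    // then try to install it. The losing thread's client is discarded (closed).
    C getOrCreate(K key, Function<K, C> factory) throws Exception {
        C existing = clients.get(key);
        if (existing != null) {
            return existing;
        }
        C fresh = factory.apply(key); // optimistic creation outside any mutex
        C winner = clients.putIfAbsent(key, fresh);
        if (winner == null) {
            return fresh; // we won the race; our client is now cached
        }
        fresh.close(); // another thread won; discard the client we built
        return winner;
    }

    // Remove and close a client, e.g. when the project's settings change.
    void invalidate(K key) throws Exception {
        C removed = clients.remove(key);
        if (removed != null) {
            removed.close();
        }
    }
}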

Note that with SDKv2 we will own the lifecycle of the HTTP client independently of the S3 SDK client, so we could potentially share the HTTP client across multiple projects.

For my education, are we going to be accepting credentials via file-based settings and putting them in the cluster state, or do those still use the keystore?

@ywangd (Member, Author) commented Apr 11, 2025

Thanks for the feedback, David.

For the CHM usages, there are two levels. The first is the Map<ProjectId, ClientsHolder> which tracks the live projects based on cluster state updates. I think this usage is unavoidable, but it should have no contention since all updates happen on the applier thread. The second-level CHM usage is inside GCS's ClientsHolder, i.e. clientCache. This one is never called on the applier thread, so it should have no issue.

I agree that the synchronized usage is not desirable. I tried to avoid it initially, but it proved difficult while maintaining the current semantics. It might be possible if we are OK with creating the same client more than once, as you suggested. But would you consider it an acceptable solution to simply fork client closing to a different thread? Closing the clients is the only case where the applier thread runs into the synchronization. Since there is no requirement to close the clients synchronously, I feel this could be a simple solution while maintaining all existing semantics. In fact, I think we want to move closing the clients to a different thread regardless of the synchronization, since it might itself be expensive (it closes the underlying S3 SDK client). What do you think?
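As a rough sketch of what forking client closing to a different thread could look like (using a generic ExecutorService; the real change would use the node's thread pool and the actual client types, so treat the names as assumptions):

import java.util.List;
import java.util.concurrent.ExecutorService;

final class AsyncClientCloser {
    private final ExecutorService executor;

    AsyncClientCloser(ExecutorService executor) {
        this.executor = executor;
    }

    // The applier thread only swaps map entries and hands the old clients off here;
    // the potentially expensive close of the underlying SDK clients runs elsewhere.
    void closeAsync(List<? extends AutoCloseable> oldClients) {
        executor.execute(() -> {
            for (AutoCloseable client : oldClients) {
                try {
                    client.close();
                } catch (Exception e) {
                    // log and continue; a failed close must not affect the applier's state
                }
            }
        });
    }
}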

are we going to be accepting credentials via file-based settings and putting them in the cluster state, or do those still use the keystore?

Per-project credentials will be provided via per-project file-based secrets (see ReservedProjectSecretsAction in the serverless repo). Cluster-level credentials will still be provisioned as they are today, i.e. via the reloadable plugin mechanism that originated with the keystore.

@DaveCTurner (Contributor) commented:

In fact, I think we want to move closing the clients to a different thread regardless of the synchronization, since it might itself be expensive (it closes the underlying S3 SDK client). What do you think?

Yes, I agree. We should probably also block the doClose() of the S3Service until all the clients have finished closing. That should be enough to keep the blocking away from the cluster applier.
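One possible way to do that, sketched with a Phaser tracking outstanding asynchronous closes; the class name and the exact lifecycle hook are assumptions, not the actual S3Service code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Phaser;

final class ClosingTracker {
    private final ExecutorService executor;
    // One party for the service itself; each pending close registers another party.
    private final Phaser pendingCloses = new Phaser(1);

    ClosingTracker(ExecutorService executor) {
        this.executor = executor;
    }

    void closeClientAsync(AutoCloseable client) {
        pendingCloses.register();
        executor.execute(() -> {
            try {
                client.close();
            } catch (Exception e) {
                // log and continue
            } finally {
                pendingCloses.arriveAndDeregister();
            }
        });
    }

    // Called from the service's doClose(): blocks until every async close above has finished.
    void awaitAllClosed() {
        pendingCloses.arriveAndAwaitAdvance();
    }
}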

@ywangd (Member, Author) commented Apr 29, 2025

I assume we are OK with the overall approach, so I will go ahead and raise the formal PR for S3 first. Thanks for the discussion!

elasticsearchmachine pushed a commit that referenced this pull request May 30, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: #126584
Resolves: ES-11713
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: elastic#126584
Resolves: ES-11713
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
With project secrets in place, we can now create and manage per-project
repository clients in addition to the cluster-level repository clients.
This PR adds a manager class for that. Note that the logic is not yet
fully wired because it needs the per-project repository/object_store which
will be added in separate PRs (as part of MP snapshots and MP
objectStoreService). As such the tests are currently unit tests.

Relates: elastic#126584
Resolves: ES-11713
@ywangd (Member, Author) commented Aug 14, 2025

This is now implemented by #127631, #131899 and #132701

@ywangd ywangd closed this Aug 14, 2025
Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v9.2.0
