[ML] Adding bulk create functionality to ModelRegistry #136569

jonathan-buttner · 2025-10-14T19:59:24Z

This PR adds functionality to the ModelRegistry to store multiple inference endpoints at once using a single bulk index operation. This will be useful when the master handles polling EIS for the authorized preconfigured endpoints, so that it can create the new ones in one operation.

Nothing in this PR uses the ability to create multiple endpoints yet (beyond storeModel using it internally). It will be used in a follow-up PR.

jonathan-buttner · 2025-10-14T20:08:33Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

+                    new ElasticsearchStatusException(
+                        "Inference endpoint [{}] already exists",
+                        RestStatus.BAD_REQUEST,
+                        failureItem.failureCause,


I changed this from a ResourceAlreadyExistsException so we could include the cause, but maybe we don't want to include it 🤷‍♂️

jonathan-buttner · 2025-10-14T20:39:59Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

+
+    private void updateClusterState(List<Model> models, ActionListener<AcknowledgedResponse> listener, TimeValue timeout) {
+        var inferenceIdsSet = models.stream().map(Model::getInferenceEntityId).collect(Collectors.toSet());
+        var storeListener = listener.delegateResponse((delegate, exc) -> {


I switched to this instead of creating an anonymous class.
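
For readers unfamiliar with the pattern, here is a minimal sketch of what a `delegateResponse`-style wrapper does compared with writing an anonymous class by hand. `Listener`, `of`, and `DelegateResponseSketch` are simplified stand-ins invented for this illustration, not the Elasticsearch `ActionListener` API; the sketch only assumes that responses pass through untouched while failures are routed to a handler.

```java
import java.util.function.BiConsumer;
import java.util.function.Consumer;

final class DelegateResponseSketch {
    /** Simplified stand-in for a response/failure listener (hypothetical, not the Elasticsearch interface). */
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Exception e);

        /**
         * Returns a listener that forwards responses to this listener unchanged
         * and routes failures through the supplied handler instead.
         */
        default Listener<T> delegateResponse(BiConsumer<Listener<T>, Exception> failureHandler) {
            Listener<T> delegate = this;
            return new Listener<T>() {
                @Override
                public void onResponse(T response) {
                    delegate.onResponse(response);
                }

                @Override
                public void onFailure(Exception e) {
                    failureHandler.accept(delegate, e);
                }
            };
        }
    }

    /** Builds a listener from two callbacks (hypothetical helper). */
    static <T> Listener<T> of(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                responseHandler.accept(response);
            }

            @Override
            public void onFailure(Exception e) {
                failureHandler.accept(e);
            }
        };
    }

    public static void main(String[] args) {
        Listener<String> caller = of(
            r -> System.out.println("response: " + r),
            e -> System.out.println("failure: " + e.getMessage())
        );
        // Only the failure path needs custom logic, so only that path is written out.
        Listener<String> storeListener = caller.delegateResponse(
            (delegate, e) -> delegate.onFailure(new IllegalStateException("Failed to store endpoints", e))
        );
        storeListener.onResponse("acknowledged");
        storeListener.onFailure(new RuntimeException("bulk request rejected"));
    }
}
```

The wrapper keeps the success path implicit, which is the main readability gain over a hand-written anonymous class.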

elasticsearchmachine · 2025-10-17T12:52:21Z

Pinging @elastic/ml-core (Team:ML)

DonalEvans

Not strictly related to this PR, but it seems like a lot of the tests in ModelRegistryTests could/should be moved to ModelRegistryIT since they're integration tests rather than unit tests (at least, as far as I understand those terms to be defined).

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

DonalEvans · 2025-10-17T17:32:22Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

-        };
+
+            var secretsItem = bulkItems[i + 1];
+            var secretsStoreResponse = createModelStoreResponse(secretsItem, docIdToInferenceId);


Is it worth doing some kind of check that the inference ID for the secrets item matches the inference ID for the configuration item? Or is it not possible to lose items from part way through the bulk response, only at the end?

We probably don't need the check for not getting an even number of responses. I was mostly trying to create a nicer error message in the very very unlikely event that it happened. I'll add an assertion that the inference IDs are the same.
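
The ID check discussed here could look roughly like the following. This is a minimal sketch with stand-in types: `BulkItem`, `PairedResult`, `pairItems`, and the index labels are hypothetical names rather than the PR's code, and it assumes the bulk response lists each endpoint's configuration item immediately followed by its secrets item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BulkPairingSketch {
    /** Hypothetical stand-in for one item of a bulk response. */
    record BulkItem(String index, String docId, boolean failed) {}

    /** Hypothetical per-endpoint outcome: stored only if both documents succeeded. */
    record PairedResult(String inferenceId, boolean stored) {}

    static List<PairedResult> pairItems(List<BulkItem> items, Map<String, String> docIdToInferenceId) {
        if (items.size() % 2 != 0) {
            throw new IllegalStateException("Expected an even number of bulk items but received " + items.size());
        }
        var results = new ArrayList<PairedResult>();
        // Items are assumed to arrive as (configuration, secrets) pairs.
        for (int i = 0; i < items.size(); i += 2) {
            var configItem = items.get(i);
            var secretsItem = items.get(i + 1);
            var configId = docIdToInferenceId.get(configItem.docId());
            var secretsId = docIdToInferenceId.get(secretsItem.docId());
            // Guard against the pair drifting out of step if an item were ever missing mid-response.
            if (configId == null || configId.equals(secretsId) == false) {
                throw new IllegalStateException(
                    "Mismatched inference ids for configuration [" + configId + "] and secrets [" + secretsId + "]"
                );
            }
            results.add(new PairedResult(configId, configItem.failed() == false && secretsItem.failed() == false));
        }
        return results;
    }

    public static void main(String[] args) {
        var ids = Map.of("model_a", "a");
        var items = List.of(new BulkItem("config-index", "model_a", false), new BulkItem("secrets-index", "model_a", false));
        System.out.println(pairItems(items, ids));
    }
}
```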

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

DonalEvans · 2025-10-17T18:07:31Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

+                return;
+            }
+
+            var failureItem = firstFailureResponse.get();


When creating multiple inference endpoints in one go, it's possible that there may be multiple failures with multiple reasons. Would it be better to report all of the failures in the exception? It could be misleading if we only report the first endpoint that failed to be created when in fact none of them were created. Also, what would be the correct way to handle the case where one of the endpoints wasn't created because it already exists (400 error), but another wasn't created due to some other issue not caused by VersionConflictEngineException (500 error)?

This logic is within the storeModel() method, which only allows one endpoint to be created. Before this PR, storeModel() already retrieved only the first failure (there will be at most 2 failures, one for each of the bulk items).

Your point about combining the errors is a good idea, though. I'd rather create an issue and address that later, if that's ok.

storeModels() does report each failure/success, but it does so through the listener's onResponse(), returning a list of responses that each include the REST status and an exception if one occurred.

Also, what would be the correct way to handle the case where one of the endpoints wasn't created because it already exists (400 error), but another wasn't created due to some other issue not caused by VersionConflictEngineException (500 error)?

In this situation, VersionConflictEngineException represents the case where the endpoint already exists. The change I made here was to return a raw ElasticsearchStatusException instead of a ResourceAlreadyExistsException. I chose that so we could return the actual cause, but we don't necessarily need to do that. I figured that might be more informative, but maybe it's unnecessary information for the user.

I believe VersionConflictEngineException could also occur if we were trying to do an update and the sequence number we're using is incorrect (some other request occurred before ours). I don't think we need to handle that scenario in this case though; that'd be in the update flow.

Oh, my mistake, I missed that this was the case where there was only one endpoint. Combining the errors doesn't need to be done as part of this PR.
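
As a note for the deferred follow-up, one way to combine several failures into a single exception is to keep the first as the cause and attach the rest as suppressed exceptions. The sketch below is only an illustration of that idea: `ItemFailure` and `combine` are hypothetical names, a plain `RuntimeException` stands in for whatever status-carrying exception the registry would actually use, and how to pick a single REST status for mixed failures is left open, as it was in the discussion.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class FailureAggregationSketch {
    /** Hypothetical record of one failed bulk item. */
    record ItemFailure(String inferenceId, Exception cause) {}

    /** Combines all failures into one exception; empty if nothing failed. */
    static Optional<RuntimeException> combine(List<ItemFailure> failures) {
        if (failures.isEmpty()) {
            return Optional.empty();
        }
        var ids = failures.stream().map(ItemFailure::inferenceId).distinct().collect(Collectors.joining(", "));
        // The first failure becomes the cause; the remaining ones stay visible as suppressed exceptions.
        var combined = new RuntimeException("Failed to store inference endpoints [" + ids + "]", failures.get(0).cause());
        failures.stream().skip(1).map(ItemFailure::cause).forEach(combined::addSuppressed);
        return Optional.of(combined);
    }

    public static void main(String[] args) {
        var failures = List.of(
            new ItemFailure("a", new IllegalStateException("version conflict")),
            new ItemFailure("b", new RuntimeException("shard unavailable"))
        );
        combine(failures).ifPresent(Throwable::printStackTrace);
    }
}
```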

...n/inference/src/test/java/org/elasticsearch/xpack/inference/registry/ModelRegistryTests.java

jonathan-buttner · 2025-10-20T14:16:33Z

Not strictly related to this PR, but it seems like a lot of the tests in ModelRegistryTests could/should be moved to ModelRegistryIT since they're integration tests rather than unit tests (at least, as far as I understand those terms to be defined).

I took a stab at moving the tests over to ModelRegistryIT. I agree that most of them should live in that file now. I left a few that didn't seem to rely on Elasticsearch and needed package-private access to ModelRegistry.

.../internalClusterTest/java/org/elasticsearch/xpack/inference/integration/ModelRegistryIT.java

davidkyle · 2025-10-21T09:12:43Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

-                    format("Storing inference endpoint [%s] failed, no items were received from the bulk response", inferenceEntityId)
-                );
+                var inferenceEntityIds = String.join(", ", models.stream().map(Model::getInferenceEntityId).toList());
+                logger.warn("Storing inference endpoints [{}] failed, no items were received from the bulk response", inferenceEntityIds);


In the storeModels function, check whether the bulk request or models list is empty and trivially return success if so. Otherwise an empty request would return a 500 error code.

Good call 👍
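
A minimal sketch of the suggested guard, using stand-in types rather than the registry's real ones: `StoreResult`, `BulkIndexer`, and `storeAll` are hypothetical, and the callback plays the role of the listener's success path. The point is only that an empty input completes successfully with an empty result and never reaches the bulk call.

```java
import java.util.List;
import java.util.function.Consumer;

final class EmptyBulkGuardSketch {
    /** Hypothetical per-endpoint result. */
    record StoreResult(String inferenceId, boolean stored) {}

    /** Hypothetical stand-in for the code that issues the bulk request. */
    interface BulkIndexer {
        List<StoreResult> index(List<String> inferenceIds);
    }

    static void storeAll(List<String> inferenceIds, BulkIndexer indexer, Consumer<List<StoreResult>> onResponse) {
        if (inferenceIds.isEmpty()) {
            // Nothing to store: report success with no results instead of sending an empty bulk request.
            onResponse.accept(List.of());
            return;
        }
        onResponse.accept(indexer.index(inferenceIds));
    }

    public static void main(String[] args) {
        BulkIndexer indexer = batch -> batch.stream().map(id -> new StoreResult(id, true)).toList();
        storeAll(List.of(), indexer, results -> System.out.println("empty input: " + results));
        storeAll(List.of("a", "b"), indexer, results -> System.out.println("two endpoints: " + results));
    }
}
```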

davidkyle · 2025-10-21T09:16:07Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

-            var inferenceEntityId = model.getInferenceEntityId();
+            var docIdToInferenceId = models.stream()
+                .collect(Collectors.toMap(m -> Model.documentId(m.getInferenceEntityId()), Model::getInferenceEntityId, (id1, id2) -> {
+                    logger.warn("Encountered duplicate inference ids when storing endpoints: [{}]", id1);


To avoid one config overwriting another (or throwing a version conflict exception), the check for duplicate IDs should be performed before indexing. The storeModels function is called automatically by internal code and we want it to be resilient, so maybe filter out the duplicates if the ID and model config are exactly the same.
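
One possible shape for that pre-indexing check is sketched below. The types and the method are hypothetical (`EndpointConfig`, `Deduplicated`, `deduplicate`), and the choice to drop every entry for an ID whose configs disagree, reporting the ID separately, is an assumption made for this illustration; the comment above only asks that exact duplicates be filtered.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DuplicateFilterSketch {
    /** Hypothetical, simplified endpoint configuration. */
    record EndpointConfig(String inferenceId, String service, Map<String, Object> settings) {}

    /** Endpoints that are safe to index, plus IDs that appeared with differing configs. */
    record Deduplicated(List<EndpointConfig> toStore, Set<String> conflictingIds) {}

    static Deduplicated deduplicate(List<EndpointConfig> models) {
        var firstSeen = new LinkedHashMap<String, EndpointConfig>();
        var conflicting = new LinkedHashSet<String>();
        for (var model : models) {
            var existing = firstSeen.putIfAbsent(model.inferenceId(), model);
            // An exact duplicate is silently collapsed; a differing config marks the ID as conflicting.
            if (existing != null && existing.equals(model) == false) {
                conflicting.add(model.inferenceId());
            }
        }
        // Assumption: never index an ID whose configs disagree, so neither version wins silently.
        conflicting.forEach(firstSeen::remove);
        return new Deduplicated(List.copyOf(firstSeen.values()), Set.copyOf(conflicting));
    }

    public static void main(String[] args) {
        var a = new EndpointConfig("a", "example-service", Map.of("dims", 384));
        var aAgain = new EndpointConfig("a", "example-service", Map.of("dims", 384));
        var b = new EndpointConfig("b", "example-service", Map.of("dims", 384));
        var bDifferent = new EndpointConfig("b", "example-service", Map.of("dims", 768));
        System.out.println(deduplicate(List.of(a, aAgain, b, bDifferent)));
    }
}
```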

davidkyle · 2025-10-22T11:52:27Z

...plugin/inference/src/main/java/org/elasticsearch/xpack/inference/registry/ModelRegistry.java

+
+            var storageResponses = responses.stream().map(StoreResponseWithIndexInfo::modelStoreResponse).toList();
+
+            deleteModels(inferenceIdsToBeRemoved, ActionListener.running(() -> delegate.onResponse(storageResponses)));


Is it possible for inferenceIdsToBeRemoved to be an empty set? If so, can deleteModels be skipped in that case?

Good point, I'll have deleteModels return early if the set is empty.

davidkyle

LGTM

* Adding bulk storage of multiple models
* Adding tests
* Adding log for duplicate ids
* [CI] Auto commit changes from spotless
* Removing unused code
* Removing constructor
* Adding more tests
* Adding in logic to delete models when a failure occurs
* revert rename changes
* formatting
* Starting on feedback
* Improving tests
* Moving most tests to ModelRegistryIT
* [CI] Auto commit changes from spotless
* Fixing test
* Removing duplicate tests
* Handling empty list and duplicates
* Fixing empty delete

---------

Co-authored-by: elasticsearchmachine <[email protected]>

jonathan-buttner added 3 commits October 10, 2025 17:26

Adding bulk storage of multiple models

bef0079

Adding tests

feb96ba

Adding log for duplicate ids

5f36d45

jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.3.0 labels Oct 14, 2025

elasticsearchmachine and others added 3 commits October 14, 2025 20:06

[CI] Auto commit changes from spotless

ebb6476

Removing unused code

c30df2b

Merge branch 'ml-eis-auth-master' of github.com:jonathan-buttner/elas…

5d4382e

…ticsearch into ml-eis-auth-master

jonathan-buttner commented Oct 14, 2025

jonathan-buttner and others added 4 commits October 14, 2025 16:50

Removing constructor

4ea87d1

Merge branch 'main' into ml-eis-auth-master

651876d

Adding more tests

155c366

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

44ddd92

…uth-master

jonathan-buttner mentioned this pull request Oct 16, 2025

[ML] Transition EIS auth polling to persistent task on a single node #136713

Merged

jonathan-buttner added 4 commits October 16, 2025 15:38

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

dbe11ce

…uth-master

Adding in logic to delete models when a failure occurs

b7f41f8

revert rename changes

f4d9f2c

formatting

cd8c832

jonathan-buttner requested a review from DonalEvans October 17, 2025 12:51

jonathan-buttner marked this pull request as ready for review October 17, 2025 12:51

jonathan-buttner requested a review from davidkyle October 17, 2025 12:52

DonalEvans reviewed Oct 17, 2025

jonathan-buttner added 4 commits October 17, 2025 17:05

Starting on feedback

b6cebf5

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

1130e82

…uth-master

Improving tests

5441da4

Moving most tests to ModelRegistryIT

58f9a75

jonathan-buttner requested a review from DonalEvans October 20, 2025 14:15

elasticsearchmachine and others added 3 commits October 20, 2025 14:20

[CI] Auto commit changes from spotless

1854e50

Fixing test

a8db6cf

Merge branch 'ml-eis-auth-master' of github.com:jonathan-buttner/elas…

9640639

…ticsearch into ml-eis-auth-master

DonalEvans reviewed Oct 20, 2025

jonathan-buttner added 2 commits October 20, 2025 15:30

Removing duplicate tests

426fcbe

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

4e9a320

…uth-master

jonathan-buttner requested a review from DonalEvans October 20, 2025 19:31

DonalEvans approved these changes Oct 20, 2025

davidkyle reviewed Oct 21, 2025

jonathan-buttner added 2 commits October 21, 2025 10:58

Handling empty list and duplicates

65e008f

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

d477b34

…uth-master

jonathan-buttner requested a review from davidkyle October 21, 2025 15:00

davidkyle reviewed Oct 22, 2025

jonathan-buttner and others added 4 commits October 22, 2025 10:57

Fixing empty delete

7cea6c1

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

96dbd62

…uth-master

Merge branch 'main' of github.com:elastic/elasticsearch into ml-eis-a…

da24021

…uth-master

Merge branch 'main' into ml-eis-auth-master

221dc34

davidkyle approved these changes Oct 29, 2025

Merge branch 'main' into ml-eis-auth-master

a994357

jonathan-buttner enabled auto-merge (squash) October 29, 2025 12:51

jonathan-buttner merged commit 22fe5be into elastic:main Oct 29, 2025
34 checks passed

