Conversation

@jonathan-buttner (Contributor) commented Oct 8, 2025

This PR modifies the ModelRegistry to leverage the EIS v1 authorization endpoint when determining which preconfigured inference endpoints are authorized and should be returned.

Notable changes:

  • The EIS preconfigured inference endpoints are no longer stored in the inference index, nor in the ModelRegistry's in-memory concurrent hash map.
  • The ModelRegistry methods getModel, getModelWithSecrets, getModelsByTaskType, and getAllModels now reach out to EIS when the inference ID is known to belong to an EIS preconfigured endpoint, or when the ID can be found in neither the inference index nor the in-memory map (see the sketch after this list).
  • The authorization polling logic, which previously ran on every ES node, was removed. A follow-up PR will reintroduce it so that it runs only on the master node.
  • ModelRegistry::getMinimalServiceSettings relies on a hardcoded map to determine whether the passed-in inference ID is an EIS one.
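
A rough sketch of the resulting lookup order (the helper names isEisPreconfiguredEndpoint, fetchFromEis, and getModelFromIndexOrMemory are illustrative, not the actual methods in this PR):

// Hypothetical sketch of the fallback described above, not the PR's exact code.
public void getModel(String inferenceEntityId, ActionListener<UnparsedModel> listener) {
    // IDs known to be EIS preconfigured endpoints bypass local storage entirely.
    if (isEisPreconfiguredEndpoint(inferenceEntityId)) {
        fetchFromEis(inferenceEntityId, listener);
        return;
    }
    // Otherwise consult the inference index and the in-memory map first,
    // and fall back to EIS only when neither local source has the ID.
    getModelFromIndexOrMemory(
        inferenceEntityId,
        ActionListener.wrap(listener::onResponse, e -> fetchFromEis(inferenceEntityId, listener))
    );
}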

Testing

Setup

Start the EIS gateway (from the eis-gateway repo):

make TLS_VERIFY_CLIENT_CERTS=false run

Start ES

run-es -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none
Test Get Services API
GET _inference/_services
    {
        "service": "elastic",
        "name": "Elastic",
        "task_types": [
            "sparse_embedding",
            "chat_completion"
        ],
        "configurations": {
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "chat_completion"
                ]
            },
            "max_input_tokens": {
                "description": "Allows you to specify the maximum number of tokens per input.",
                "label": "Maximum Input Tokens",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding"
                ]
            }
        }
    },
Test Get all endpoints
GET _inference/_all
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        },
        {
            "inference_id": ".elser-2-elasticsearch",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        },
        {
            "inference_id": ".multilingual-e5-small-elasticsearch",
            "task_type": "text_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".multilingual-e5-small",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        },
        {
            "inference_id": ".rainbow-sprinkles-elastic",
            "task_type": "chat_completion",
            "service": "elastic",
            "service_settings": {
                "model_id": "rainbow-sprinkles"
            }
        },
        {
            "inference_id": ".rerank-v1-elasticsearch",
            "task_type": "rerank",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".rerank-v1",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "task_settings": {
                "return_documents": true
            }
        }
    ]
}
Test get sparse embedding endpoints
GET _inference/sparse_embedding/_all
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        },
        {
            "inference_id": ".elser-2-elasticsearch",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        }
    ]
}
Test get single endpoint
GET _inference/sparse_embedding/.elser-2-elastic
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        }
    ]
}
Test retrieving unauthorized EIS preconfigured inference endpoint
GET _inference/rerank/.rerank-v1-elastic
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Unable to retrieve the preconfigured inference endpoint [.rerank-v1-elastic] from the Elastic Inference Service"
            }
        ],
        "type": "status_exception",
        "reason": "Unable to retrieve the preconfigured inference endpoint [.rerank-v1-elastic] from the Elastic Inference Service",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No Elastic Inference Service preconfigured endpoint found for inference ID [.rerank-v1-elastic]. Either it does not exist, or you are not authorized to access it."
        }
    },
    "status": 400
}

@jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML, and v9.3.0 labels on Oct 8, 2025
@@ -1,359 +0,0 @@
/*
jonathan-buttner (Contributor Author) commented:

This PR removes the polling logic against the EIS authorization endpoint, so authorization can no longer be revoked at runtime. EIS is treated as the source of truth for EIS preconfigured inference endpoint information.

We no longer need revocation because this PR stops storing the EIS preconfigured inference endpoints in the inference index and removes them from the model registry.

) {
SubscribableListener.<ElasticInferenceServiceAuthorizationModel>newForked(authModelListener -> {
// Executing on a separate thread because there's a chance the authorization call needs to do some initialization for the Sender
threadPool.executor(UTILITY_THREAD_POOL_NAME).execute(() -> getEisAuthorization(authModelListener, eisSender));
jonathan-buttner (Contributor Author) commented:

We no longer need to initialize the Sender synchronously, so we don't need to jump on a separate thread.
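
For illustration, without the executor hop the forked block can collapse to a direct call (a sketch, not the exact code in this PR):

// Sketch only: with no synchronous Sender initialization, the UTILITY
// thread-pool hop above becomes unnecessary.
SubscribableListener.<ElasticInferenceServiceAuthorizationModel>newForked(
    authModelListener -> getEisAuthorization(authModelListener, eisSender)
);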


var eisConfig = ElasticInferenceServiceMinimalSettings.getWithInferenceId(inferenceEntityId);
if (eisConfig != null) {
return eisConfig.minimalSettings();
jonathan-buttner (Contributor Author) commented:

This method is used when an index mapping is created that contains a semantic_text field. Without these changes, semantic_text would always log a warning that the inference endpoint ID used may not exist.

The semantic_text field uses this method to retrieve some configuration settings, such as the task type. If it can't get them here, it retrieves them during the first document ingestion.

This is a temporary solution until the model registry has the polling logic to retrieve the preconfigured inference endpoints from EIS.

Typically we'd just make a call to EIS here to determine the settings, but we can't make an asynchronous call in this context.

We could leave this functionality out, but then a warning would be logged every time.
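
For context, a plausible shape for that hardcoded lookup follows. The class name matches the PR, but the record types, fields, and which model entries appear are assumptions based on the endpoints listed in the Testing section:

import java.util.Map;

// Hypothetical sketch only; the real ElasticInferenceServiceMinimalSettings may differ.
public final class ElasticInferenceServiceMinimalSettings {

    // Stand-in for the real minimal-settings type, which carries more than a task type.
    public record MinimalSettings(String taskType) {}

    public record Entry(String inferenceId, MinimalSettings minimalSettings) {}

    // IDs taken from the EIS endpoints shown above.
    private static final Map<String, Entry> PRECONFIGURED = Map.of(
        ".elser-2-elastic", new Entry(".elser-2-elastic", new MinimalSettings("sparse_embedding")),
        ".rainbow-sprinkles-elastic", new Entry(".rainbow-sprinkles-elastic", new MinimalSettings("chat_completion"))
    );

    // Returns null when the ID is not a known EIS preconfigured endpoint,
    // mirroring the null check in the diff above.
    public static Entry getWithInferenceId(String inferenceId) {
        return PRECONFIGURED.get(inferenceId);
    }
}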

return defaultConfigIds.containsKey(inferenceEntityId);
}

/**
jonathan-buttner (Contributor Author) commented:

No longer referenced.

if (Strings.isNullOrEmpty(baseUrl)) {
logger.debug("The base URL for the authorization service is not valid, rejecting authorization.");
listener.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService());
listener.onFailure(new IllegalStateException("The Elastic Inference Service URL is not configured."));
jonathan-buttner (Contributor Author) commented Oct 9, 2025:

Expose that EIS isn't configured. In reality this won't change the upstream logic much; the places that use this method should only be debug-logging the IllegalStateException.
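
A sketch of how a caller might downgrade that failure to debug logging (getAuthorization and handleAuthorization are hypothetical caller code, not part of this PR):

// Hypothetical upstream handling of the IllegalStateException added above.
getAuthorization(ActionListener.wrap(
    authModel -> handleAuthorization(authModel),
    e -> logger.debug("Skipping EIS authorization, the service is not configured", e)
));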

// This mirrors the memory constraints observed with sparse embeddings
private static final Integer DENSE_TEXT_EMBEDDINGS_MAX_BATCH_SIZE = 16;

// rainbow-sprinkles
jonathan-buttner (Contributor Author) commented:

Moved to ElasticInferenceServiceMinimalSettings

}

@Override
public List<DefaultConfigId> defaultConfigIds() {
jonathan-buttner (Contributor Author) commented:

We go directly to the EIS authorization service for this information now (or temporarily get it from the hardcoded logic we have for the default inference endpoints).

}
}

public void testSupportedStreamingTasks_ReturnsEmpty_WhenAuthRespondsWithoutChatCompletion() throws Exception {
jonathan-buttner (Contributor Author) commented:

We no longer have authorization logic for supportedStreamingTasks, so I'm removing this test.
