Conversation

@jonathan-buttner (Contributor) commented Oct 8, 2025

This PR modifies the ModelRegistry to leverage the EIS v1 authorization endpoint when determining which preconfigured inference endpoints are authorized and should be returned.

Notable changes:

  • The EIS preconfigured inference endpoints are no longer stored in the inference index, nor in the ModelRegistry's in-memory concurrent hash map.
  • The ModelRegistry methods getModel, getModelWithSecrets, getModelsByTaskType, and getAllModels now reach out to EIS when the inference ID is known to belong to an EIS preconfigured endpoint, or when the ID can be found in neither the inference index nor the in-memory map (see the sketch after this list).
  • The authorization polling logic, which previously ran on every ES node, was removed. A follow-up PR will reintroduce it so that it runs only on the master node.
  • ModelRegistry::getMinimalServiceSettings relies on a hardcoded map to determine whether the passed-in inference ID is an EIS one.
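
A rough sketch of the resulting lookup order (the helper names isEisPreconfiguredEndpoint, fetchFromEis, and getModelFromIndexOrMemory are illustrative, not the actual methods in this PR):

// Hypothetical sketch of the fallback described above, not the PR's exact code.
public void getModel(String inferenceEntityId, ActionListener<UnparsedModel> listener) {
    // IDs known to be EIS preconfigured endpoints bypass local storage entirely.
    if (isEisPreconfiguredEndpoint(inferenceEntityId)) {
        fetchFromEis(inferenceEntityId, listener);
        return;
    }
    // Otherwise consult the inference index and the in-memory map first,
    // and fall back to EIS only when neither local source has the ID.
    getModelFromIndexOrMemory(
        inferenceEntityId,
        ActionListener.wrap(listener::onResponse, e -> fetchFromEis(inferenceEntityId, listener))
    );
}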

Testing

Setup

Start the EIS gateway (from the eis-gateway repo):

make TLS_VERIFY_CLIENT_CERTS=false run

Start ES

run-es -Dtests.es.xpack.inference.elastic.url=https://localhost:8443 -Dtests.es.xpack.inference.elastic.http.ssl.verification_mode=none
Test Get Services API
GET _inference/_services
    {
        "service": "elastic",
        "name": "Elastic",
        "task_types": [
            "sparse_embedding",
            "chat_completion"
        ],
        "configurations": {
            "model_id": {
                "description": "The name of the model to use for the inference task.",
                "label": "Model ID",
                "required": true,
                "sensitive": false,
                "updatable": false,
                "type": "str",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding",
                    "rerank",
                    "chat_completion"
                ]
            },
            "max_input_tokens": {
                "description": "Allows you to specify the maximum number of tokens per input.",
                "label": "Maximum Input Tokens",
                "required": false,
                "sensitive": false,
                "updatable": false,
                "type": "int",
                "supported_task_types": [
                    "text_embedding",
                    "sparse_embedding"
                ]
            }
        }
    },
Test Get all endpoints
GET _inference/_all
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        },
        {
            "inference_id": ".elser-2-elasticsearch",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        },
        {
            "inference_id": ".multilingual-e5-small-elasticsearch",
            "task_type": "text_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".multilingual-e5-small",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        },
        {
            "inference_id": ".rainbow-sprinkles-elastic",
            "task_type": "chat_completion",
            "service": "elastic",
            "service_settings": {
                "model_id": "rainbow-sprinkles"
            }
        },
        {
            "inference_id": ".rerank-v1-elasticsearch",
            "task_type": "rerank",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".rerank-v1",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "task_settings": {
                "return_documents": true
            }
        }
    ]
}
Test get sparse embedding endpoints
GET _inference/sparse_embedding/_all
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        },
        {
            "inference_id": ".elser-2-elasticsearch",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 0,
                    "max_number_of_allocations": 32
                }
            },
            "chunking_settings": {
                "strategy": "sentence",
                "max_chunk_size": 250,
                "sentence_overlap": 1
            }
        }
    ]
}
Test get single endpoint
GET _inference/sparse_embedding/.elser-2-elastic
{
    "endpoints": [
        {
            "inference_id": ".elser-2-elastic",
            "task_type": "sparse_embedding",
            "service": "elastic",
            "service_settings": {
                "model_id": "elser_model_2"
            },
            "chunking_settings": {
                "strategy": "word",
                "max_chunk_size": 250,
                "overlap": 100
            }
        }
    ]
}
Test retrieving unauthorized EIS preconfigured inference endpoint
GET _inference/rerank/.rerank-v1-elastic
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Unable to retrieve the preconfigured inference endpoint [.rerank-v1-elastic] from the Elastic Inference Service"
            }
        ],
        "type": "status_exception",
        "reason": "Unable to retrieve the preconfigured inference endpoint [.rerank-v1-elastic] from the Elastic Inference Service",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No Elastic Inference Service preconfigured endpoint found for inference ID [.rerank-v1-elastic]. Either it does not exist, or you are not authorized to access it."
        }
    },
    "status": 400
}

@jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML, and v9.3.0 labels on Oct 8, 2025
@@ -1,359 +0,0 @@
/*
jonathan-buttner (Contributor Author) commented:

This PR removes the polling logic against the EIS authorization endpoint, so authorization can no longer be revoked at runtime. EIS is treated as the source of truth for EIS preconfigured inference endpoint information.

We no longer need revocation because this PR stops storing the EIS preconfigured inference endpoints in the inference index and removes them from the model registry.

) {
SubscribableListener.<ElasticInferenceServiceAuthorizationModel>newForked(authModelListener -> {
// Executing on a separate thread because there's a chance the authorization call needs to do some initialization for the Sender
threadPool.executor(UTILITY_THREAD_POOL_NAME).execute(() -> getEisAuthorization(authModelListener, eisSender));
jonathan-buttner (Contributor Author) commented:

We no longer need to initialize the Sender synchronously, so we don't need to jump on a separate thread.
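
For illustration, without the executor hop the forked block can collapse to a direct call (a sketch, not the exact code in this PR):

// Sketch only: with no synchronous Sender initialization, the UTILITY
// thread-pool hop above becomes unnecessary.
SubscribableListener.<ElasticInferenceServiceAuthorizationModel>newForked(
    authModelListener -> getEisAuthorization(authModelListener, eisSender)
);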


var eisConfig = ElasticInferenceServiceMinimalSettings.getWithInferenceId(inferenceEntityId);
if (eisConfig != null) {
return eisConfig.minimalSettings();
jonathan-buttner (Contributor Author) commented:

This method is used when an index mapping is created that contains a semantic_text field. Without these changes, semantic_text would always log a warning that the inference endpoint ID used may not exist.

The semantic_text field uses this method to retrieve some configuration settings, such as the task type. If it can't get them here, it retrieves them during the first document ingestion.

This is a temporary solution until the model registry has the polling logic to retrieve the preconfigured inference endpoints from EIS.

Typically we'd just make a call to EIS here to determine the settings, but we can't make an asynchronous call in this context.

We could leave this functionality out, but then a warning would be logged every time.
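
For context, a plausible shape for that hardcoded lookup follows. The class name matches the PR, but the record types, fields, and which model entries appear are assumptions based on the endpoints listed in the Testing section:

import java.util.Map;

// Hypothetical sketch only; the real ElasticInferenceServiceMinimalSettings may differ.
public final class ElasticInferenceServiceMinimalSettings {

    // Stand-in for the real minimal-settings type, which carries more than a task type.
    public record MinimalSettings(String taskType) {}

    public record Entry(String inferenceId, MinimalSettings minimalSettings) {}

    // IDs taken from the EIS endpoints shown above.
    private static final Map<String, Entry> PRECONFIGURED = Map.of(
        ".elser-2-elastic", new Entry(".elser-2-elastic", new MinimalSettings("sparse_embedding")),
        ".rainbow-sprinkles-elastic", new Entry(".rainbow-sprinkles-elastic", new MinimalSettings("chat_completion"))
    );

    // Returns null when the ID is not a known EIS preconfigured endpoint,
    // mirroring the null check in the diff above.
    public static Entry getWithInferenceId(String inferenceId) {
        return PRECONFIGURED.get(inferenceId);
    }
}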

return defaultConfigIds.containsKey(inferenceEntityId);
}

/**
jonathan-buttner (Contributor Author) commented:

No longer referenced.

if (Strings.isNullOrEmpty(baseUrl)) {
logger.debug("The base URL for the authorization service is not valid, rejecting authorization.");
listener.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService());
listener.onFailure(new IllegalStateException("The Elastic Inference Service URL is not configured."));
jonathan-buttner (Contributor Author) commented Oct 9, 2025:

Expose that EIS isn't configured. In reality this won't change the upstream logic much; the places that use this method should only be debug-logging the IllegalStateException.
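
A sketch of how a caller might downgrade that failure to debug logging (getAuthorization and handleAuthorization are hypothetical caller code, not part of this PR):

// Hypothetical upstream handling of the IllegalStateException added above.
getAuthorization(ActionListener.wrap(
    authModel -> handleAuthorization(authModel),
    e -> logger.debug("Skipping EIS authorization, the service is not configured", e)
));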

// This mirrors the memory constraints observed with sparse embeddings
private static final Integer DENSE_TEXT_EMBEDDINGS_MAX_BATCH_SIZE = 16;

// rainbow-sprinkles
jonathan-buttner (Contributor Author) commented:

Moved to ElasticInferenceServiceMinimalSettings

}

@Override
public List<DefaultConfigId> defaultConfigIds() {
jonathan-buttner (Contributor Author) commented:

We go directly to the EIS authorization service for this information now (or temporarily get it from the hardcoded logic we have for the default inference endpoints).

}
}

public void testSupportedStreamingTasks_ReturnsEmpty_WhenAuthRespondsWithoutChatCompletion() throws Exception {
jonathan-buttner (Contributor Author) commented:

We no longer have authorization logic for supportedStreamingTasks, so I'm removing this test.
