[ML] Integrate calls to EIS for preconfigured inference endpoints for ModelRegistry functionality #136192
Conversation
@@ -1,359 +0,0 @@
/*
This PR removes the polling logic against the EIS authorization endpoint, so we can no longer revoke authorization. EIS is treated as the source of truth for EIS preconfigured inference endpoint information.
We no longer need revocation because this PR removes the functionality to store the EIS preconfigured inference endpoints in the inference index and stops storing them in the model registry.
) {
    SubscribableListener.<ElasticInferenceServiceAuthorizationModel>newForked(authModelListener -> {
        // Executing on a separate thread because there's a chance the authorization call needs to do some initialization for the Sender
        threadPool.executor(UTILITY_THREAD_POOL_NAME).execute(() -> getEisAuthorization(authModelListener, eisSender));
We no longer need to initialize the Sender synchronously, so we don't need to jump to a separate thread.
var eisConfig = ElasticInferenceServiceMinimalSettings.getWithInferenceId(inferenceEntityId);
if (eisConfig != null) {
    return eisConfig.minimalSettings();
This method is used when an index mapping is created that contains a semantic text field. Without these changes, semantic text would always log a warning that the inference endpoint ID used may not exist.
The semantic text field uses this method to retrieve some configuration settings, like the task type. If it can't get them here, it retrieves them during the first document ingestion.
This is a temporary solution until the model registry has the polling logic to retrieve the preconfigured inference endpoints from EIS.
Typically we'd just make a call to EIS here to determine the settings, but we can't make an asynchronous call in this context.
We could leave this functionality out, but then a warning would be logged every time.
return defaultConfigIds.containsKey(inferenceEntityId);
}
/**
No longer referenced.
if (Strings.isNullOrEmpty(baseUrl)) {
    logger.debug("The base URL for the authorization service is not valid, rejecting authorization.");
    listener.onResponse(ElasticInferenceServiceAuthorizationModel.newDisabledService());
    listener.onFailure(new IllegalStateException("The Elastic Inference Service URL is not configured."));
Expose that EIS isn't configured. In reality this won't change the upstream logic much; the places that use this method should only be debug logging the IllegalStateException.
// This mirrors the memory constraints observed with sparse embeddings
private static final Integer DENSE_TEXT_EMBEDDINGS_MAX_BATCH_SIZE = 16;
// rainbow-sprinkles |
Moved to ElasticInferenceServiceMinimalSettings.
}
@Override
public List<DefaultConfigId> defaultConfigIds() {
We now go directly to the EIS authorization service for this information (or, temporarily, get it from the hardcoded logic we have for the default inference endpoints).
}
}
public void testSupportedStreamingTasks_ReturnsEmpty_WhenAuthRespondsWithoutChatCompletion() throws Exception { |
We no longer have authorization logic for supportedStreamingTasks, so I'm removing the test.
This PR modifies the `ModelRegistry` to leverage the EIS v1 authorization endpoint to determine which preconfigured inference endpoints are authorized and should be returned by the `ModelRegistry`.

Notable changes:
- EIS preconfigured inference endpoints are no longer stored in the inference index or in the `ModelRegistry`'s in-memory concurrent hash map.
- The `ModelRegistry` methods `getModel`, `getModelWithSecrets`, `getModelsByTaskType`, and `getAllModels` now reach out to EIS if we know the inference ID is for an EIS preconfigured endpoint, or if we can't find it in the inference index or the in-memory hash map.
- `ModelRegistry::getMinimalServiceSettings` relies on a hardcoded map to determine whether the passed-in inference ID is an EIS one.
Testing
Setup
Start the EIS gateway from the `eis-gateway` repo.
Start ES.
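A minimal setup sketch, assuming the gateway listens on localhost:8080 and that the EIS URL is wired through the `xpack.inference.elastic.url` setting (both are assumptions; check the `eis-gateway` repo for the actual run command and port):

```sh
# Assumption: the eis-gateway is already running locally on port 8080.
# Start Elasticsearch from the elasticsearch repo, pointing EIS at the local gateway.
./gradlew run -Dtests.es.xpack.inference.elastic.url=http://localhost:8080
```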
Test Get Services API
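For example, using the Get inference services API (assuming a local node on port 9200 with security disabled; the EIS service should be listed):

```sh
# List all available inference services and verify the EIS service appears
curl -s "http://localhost:9200/_inference/_services?pretty"
```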
Test Get all endpoints
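Something like the following; the EIS preconfigured endpoints should appear even though they are no longer stored in the inference index:

```sh
# Retrieve every inference endpoint, including the EIS preconfigured ones
curl -s "http://localhost:9200/_inference/_all?pretty"
```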
Test get sparse embedding endpoints
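For example, filtering by task type:

```sh
# Only endpoints with the sparse_embedding task type (e.g. the EIS ELSER endpoint)
curl -s "http://localhost:9200/_inference/sparse_embedding/_all?pretty"
```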
Get single endpoint
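For example (`.rainbow-sprinkles-elastic` is an assumed example ID based on the rainbow-sprinkles comment above; substitute an endpoint ID the gateway actually authorizes):

```sh
# Fetch one EIS preconfigured endpoint by its inference ID
curl -s "http://localhost:9200/_inference/.rainbow-sprinkles-elastic?pretty"
```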
Test retrieving unauthorized EIS preconfigured inference endpoint
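For example (the endpoint ID is a placeholder; my assumption is that an endpoint the gateway does not authorize should come back as a resource-not-found error rather than being served from the index):

```sh
# Request an EIS preconfigured endpoint the gateway has not authorized; expect a 404
curl -s "http://localhost:9200/_inference/.some-unauthorized-eis-endpoint?pretty"
```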