-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[Inference API] Introduce Update API to change some aspects of existing inference endpoints #114457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/ml-core (Team:ML) |
Hi @maxhniebergall, I've created a changelog YAML for you. |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
merge conflict between base and head |
…h into inferenceUpgradeApi
…eUpgradeApi # Conflicts: # server/src/main/java/org/elasticsearch/inference/TaskSettings.java # x-pack/plugin/inference/src/internalClusterTest/java/org/elasticsearch/xpack/inference/integration/ModelRegistryIT.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/completion/AlibabaCloudSearchCompletionTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/sparse/AlibabaCloudSearchSparseTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/amazonbedrock/completion/AmazonBedrockChatCompletionTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/completion/AzureAiStudioChatCompletionTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureopenai/completion/AzureOpenAiCompletionTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingsTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/CustomElandRerankTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/rerank/GoogleVertexAiRerankTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/completion/OpenAiChatCompletionTaskSettingsTests.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/embeddings/OpenAiEmbeddingsTaskSettingsTests.java
@elasticmachine merge upstream |
} | ||
|
||
/** | ||
* Only for non-in-cluster services. Combines the existing model with the new settings to create a new model using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the only for non-in-cluster
services part outdated, or how does it reconcile with the below comment
// In cluster services can only have their num_allocations updated, so this is a special case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right, that comment is outdated. I'll update it. Thanks!
modelRegistry.getModelWithSecrets(inferenceEntityId, ActionListener.wrap((model) -> { | ||
if (model == null) { | ||
listener.onFailure( | ||
ExceptionsHelper.badRequestException(Messages.INFERENCE_ENTITY_NON_EXISTANT_NO_UPDATE, inferenceEntityId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw a 404 for these two? Though I see that doesn't exist in ExceptionsHelper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think that would be more correct. I'll update it.
); | ||
|
||
client.prepareBulk().add(configRequest).setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).execute(subListener); | ||
logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this call the below attached listener if we call subListener.onFailure
? If yes, I think we'll log two errors for the same issue. If no, I think we need to preventDeletionLock.remove(inferenceEntityId);
Might be easier to always call subListener
in each block, then replace that last block with:
.addListener(finalListener.delegateResponse(subListener, e) -> {
preventDeletionLock.remove(inferenceEntityId);
finalListener.onFailure(
new ElasticsearchStatusException(
format(
"Failed to rollback while handling failure to update inference endpoint [%s]. "
+ "Endpoint may be in an inconsistent state due to [%s]",
inferenceEntityId
),
RestStatus.INTERNAL_SERVER_ERROR,
e
)
);
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subscribable listener documentation says "A failure of any step will bypass the remaining steps and ultimately fail finalListener.". I do wonder if these lines after 379 will ever actually be executed... Actually, I think they need to be added into the listener passed on 379.
I think that the reason that the final block is set up like this is because the bulk API doesn't throw exceptions, it catches them into this BulkResponse configResponse
object, and we need to turn that into an exception for the listener.
Similarly, I am calling onResponse or onFailure for the finalListener instead of the subListener because I need to call those functions with the correct object types (boolean). The last block isn't really for the whole set of calls, its only for catching the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it also occurs to me that if the bulk API throws an exception for some reason, we won't properly remove the lock, so I need to add a new action listener
); | ||
return; | ||
} else { | ||
preventDeletionLock.add(inferenceEntityId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
format("Failed to update inference endpoint [%s] due to [%s]", inferenceEntityId, configResponse.buildFailureMessage()) | ||
); | ||
// Since none of our updates succeeded at this point, we can simply return. | ||
finalListener.onFailure( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to unlock here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think that's correct. We probably should be unlocking whenever we call the finalListener or the the final sub listener. Do you think you could add this tomorrow? If not, I can probably do it at some point before ff.
@elasticmachine update branch |
💔 Backport failedThe backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running |
…ng inference endpoints (elastic#114457)
…ng inference endpoints (elastic#114457) (cherry picked from commit 6b714e2)
…ng inference endpoints (elastic#114457) (cherry picked from commit 6b714e2)
…ng inference endpoints (elastic#114457) (cherry picked from commit 6b714e2)
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…ng inference endpoints (elastic#114457)
Overview
The following fields can be updated
Examples:
Put ELSER:
response:
ELSER
Update ELSER:
request:
localhost:9200/_inference/sparse_embedding/elser_endpoint1/_update
response:
GET Trained Models:
response:
GET endpoints:
response
Cohere Rerank
Put endpoint
request:
response:
Update endpoint:
request:
localhost:9200/_inference/rerank/testss/_update
response:
GET endpoints:
response