[ML] Inference API disable partial search results #132362
Merged
jonathan-buttner
merged 9 commits into
elastic:main
from
jonathan-buttner:inference-api-disable-partial-results
Aug 5, 2025
Commits
c40aab2 Working on tests (jonathan-buttner)
756882b Update docs/changelog/132362.yaml (jonathan-buttner)
796d94e Adding integration test (jonathan-buttner)
926c4fe Wrapping exception (jonathan-buttner)
d467eab Fixing flaky tests (jonathan-buttner)
f0023df Removing assert (jonathan-buttner)
592716c Merge branch 'main' into inference-api-disable-partial-results (jonathan-buttner)
7c2db2f Refactoring testing functions (jonathan-buttner)
d389fab Merge branch 'inference-api-disable-partial-results' of github.com:jo… (jonathan-buttner)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 132362 | ||
| summary: Inference API disable partial search results | ||
| area: Machine Learning | ||
| type: bug | ||
| issues: [] |
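The changelog entry is terse, so here is a minimal plain-Java sketch of what disabling partial search results changes. It is a simplified model, not Elasticsearch code: `PartialResultsSketch`, `ShardResult`, and `search` are hypothetical names, and `IllegalStateException` stands in for the search failure. The assumed motivation is that when the shard holding an endpoint's configuration is unavailable, a search that tolerates partial results can come back with zero hits, which a caller cannot tell apart from "endpoint not found".

```java
import java.util.ArrayList;
import java.util.List;

class PartialResultsSketch {
    /** Outcome of querying one shard: either hits or a failure reason (hypothetical type). */
    record ShardResult(List<String> hits, String failure) {
        static ShardResult ok(String... hits) {
            return new ShardResult(List.of(hits), null);
        }

        static ShardResult failed(String reason) {
            return new ShardResult(List.of(), reason);
        }

        boolean isFailure() {
            return failure != null;
        }
    }

    /** Merges shard results; when allowPartial is false, any shard failure fails the whole search. */
    static List<String> search(List<ShardResult> shards, boolean allowPartial) {
        List<String> merged = new ArrayList<>();
        for (ShardResult shard : shards) {
            if (shard.isFailure()) {
                if (!allowPartial) {
                    throw new IllegalStateException("shard failure: " + shard.failure());
                }
                continue; // partial mode silently skips the failed shard
            }
            merged.addAll(shard.hits());
        }
        return merged;
    }

    public static void main(String[] args) {
        // The only shard holding the endpoint configuration is unavailable.
        var shards = List.of(ShardResult.failed("config node stopped"));

        // With partial results allowed, the failure looks like an empty result.
        System.out.println(search(shards, true)); // prints []

        // With partial results disabled, the failure surfaces to the caller.
        try {
            search(shards, false);
        } catch (IllegalStateException e) {
            System.out.println("search failed: " + e.getMessage());
        }
    }
}
```

In this model the second call is the behavior the integration test below exercises: the lookup fails loudly instead of returning an empty result.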
252 changes: 252 additions & 0 deletions
...nalClusterTest/java/org/elasticsearch/xpack/inference/integration/InferenceIndicesIT.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.inference.integration; | ||
|
|
||
| import org.elasticsearch.ElasticsearchException; | ||
| import org.elasticsearch.action.ActionFuture; | ||
| import org.elasticsearch.action.search.SearchPhaseExecutionException; | ||
| import org.elasticsearch.common.bytes.BytesReference; | ||
| import org.elasticsearch.common.settings.Settings; | ||
| import org.elasticsearch.core.TimeValue; | ||
| import org.elasticsearch.inference.InferenceServiceExtension; | ||
| import org.elasticsearch.inference.TaskType; | ||
| import org.elasticsearch.license.LicenseSettings; | ||
| import org.elasticsearch.license.XPackLicenseState; | ||
| import org.elasticsearch.plugins.Plugin; | ||
| import org.elasticsearch.test.ESIntegTestCase; | ||
| import org.elasticsearch.test.ESTestCase; | ||
| import org.elasticsearch.xcontent.XContentBuilder; | ||
| import org.elasticsearch.xcontent.XContentFactory; | ||
| import org.elasticsearch.xcontent.XContentType; | ||
| import org.elasticsearch.xpack.core.LocalStateCompositeXPackPlugin; | ||
| import org.elasticsearch.xpack.core.inference.InferenceContext; | ||
| import org.elasticsearch.xpack.core.inference.action.GetInferenceModelAction; | ||
| import org.elasticsearch.xpack.core.inference.action.InferenceAction; | ||
| import org.elasticsearch.xpack.core.inference.action.InferenceActionProxy; | ||
| import org.elasticsearch.xpack.core.inference.action.PutInferenceModelAction; | ||
| import org.elasticsearch.xpack.core.ssl.SSLService; | ||
| import org.elasticsearch.xpack.inference.InferenceIndex; | ||
| import org.elasticsearch.xpack.inference.InferencePlugin; | ||
| import org.elasticsearch.xpack.inference.InferenceSecretsIndex; | ||
| import org.elasticsearch.xpack.inference.mock.TestDenseInferenceServiceExtension; | ||
| import org.elasticsearch.xpack.inference.mock.TestInferenceServicePlugin; | ||
| import org.elasticsearch.xpack.inference.mock.TestSparseInferenceServiceExtension; | ||
|
|
||
| import java.io.IOException; | ||
| import java.nio.file.Path; | ||
| import java.util.Collection; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
||
| import static org.hamcrest.CoreMatchers.containsString; | ||
| import static org.hamcrest.CoreMatchers.equalTo; | ||
| import static org.hamcrest.Matchers.instanceOf; | ||
|
|
||
| @ESTestCase.WithoutEntitlements // due to dependency issue ES-12435 | ||
| public class InferenceIndicesIT extends ESIntegTestCase { | ||
|
|
||
| private static final String INDEX_ROUTER_ATTRIBUTE = "node.attr.index_router"; | ||
| private static final String CONFIG_ROUTER = "config"; | ||
| private static final String SECRETS_ROUTER = "secrets"; | ||
|
|
||
| private static final Map<String, Object> TEST_SERVICE_SETTINGS = Map.of( | ||
| "model", | ||
| "my_model", | ||
| "dimensions", | ||
| 256, | ||
| "similarity", | ||
| "cosine", | ||
| "api_key", | ||
| "my_api_key" | ||
| ); | ||
|
|
||
| public static class LocalStateIndexSettingsInferencePlugin extends LocalStateCompositeXPackPlugin { | ||
| private final InferencePlugin inferencePlugin; | ||
|
|
||
| public LocalStateIndexSettingsInferencePlugin(final Settings settings, final Path configPath) throws Exception { | ||
| super(settings, configPath); | ||
| var thisVar = this; | ||
| this.inferencePlugin = new InferencePlugin(settings) { | ||
| @Override | ||
| protected SSLService getSslService() { | ||
| return thisVar.getSslService(); | ||
| } | ||
|
|
||
| @Override | ||
| protected XPackLicenseState getLicenseState() { | ||
| return thisVar.getLicenseState(); | ||
| } | ||
|
|
||
| @Override | ||
| public List<InferenceServiceExtension.Factory> getInferenceServiceFactories() { | ||
| return List.of( | ||
| TestSparseInferenceServiceExtension.TestInferenceService::new, | ||
| TestDenseInferenceServiceExtension.TestInferenceService::new | ||
| ); | ||
| } | ||
|
|
||
| @Override | ||
| public Settings getIndexSettings() { | ||
| return InferenceIndex.builder() | ||
| .put(Settings.builder().put("index.routing.allocation.require.index_router", "config").build()) | ||
| .build(); | ||
| } | ||
|
|
||
| @Override | ||
| public Settings getSecretsIndexSettings() { | ||
| return InferenceSecretsIndex.builder() | ||
| .put(Settings.builder().put("index.routing.allocation.require.index_router", "secrets").build()) | ||
| .build(); | ||
| } | ||
| }; | ||
| plugins.add(inferencePlugin); | ||
| } | ||
|
|
||
| } | ||
|
|
||
| @Override | ||
| protected Settings nodeSettings(int nodeOrdinal, Settings otherSettings) { | ||
| return Settings.builder().put(LicenseSettings.SELF_GENERATED_LICENSE_TYPE.getKey(), "trial").build(); | ||
| } | ||
|
|
||
| @Override | ||
| protected Collection<Class<? extends Plugin>> nodePlugins() { | ||
| return List.of(LocalStateIndexSettingsInferencePlugin.class, TestInferenceServicePlugin.class); | ||
| } | ||
|
|
||
| public void testRetrievingInferenceEndpoint_ThrowsException_WhenIndexNodeIsNotAvailable() throws Exception { | ||
| final var configIndexNodeAttributes = Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, CONFIG_ROUTER).build(); | ||
|
|
||
| internalCluster().startMasterOnlyNode(configIndexNodeAttributes); | ||
| final var configIndexDataNodes = internalCluster().startDataOnlyNode(configIndexNodeAttributes); | ||
|
|
||
| internalCluster().startDataOnlyNode(Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, SECRETS_ROUTER).build()); | ||
|
|
||
| final var inferenceId = "test-index-id"; | ||
| createInferenceEndpoint(TaskType.TEXT_EMBEDDING, inferenceId, TEST_SERVICE_SETTINGS); | ||
|
|
||
| // Ensure the inference indices are created and we can retrieve the inference endpoint | ||
| var getInferenceEndpointRequest = new GetInferenceModelAction.Request(inferenceId, TaskType.TEXT_EMBEDDING, true); | ||
| var responseFuture = client().execute(GetInferenceModelAction.INSTANCE, getInferenceEndpointRequest); | ||
| assertThat(responseFuture.actionGet(TEST_REQUEST_TIMEOUT).getEndpoints().get(0).getInferenceEntityId(), equalTo(inferenceId)); | ||
|
|
||
| // stop the node that holds the inference index | ||
| internalCluster().stopNode(configIndexDataNodes); | ||
|
|
||
| var responseFailureFuture = client().execute(GetInferenceModelAction.INSTANCE, getInferenceEndpointRequest); | ||
| var exception = expectThrows(ElasticsearchException.class, () -> responseFailureFuture.actionGet(TEST_REQUEST_TIMEOUT)); | ||
| assertThat(exception.toString(), containsString("Failed to load inference endpoint [test-index-id]")); | ||
|
|
||
| var causeException = exception.getCause(); | ||
| assertThat(causeException, instanceOf(SearchPhaseExecutionException.class)); | ||
|
I tried adding more assertThat calls that check for specific text in the search phase execution exception, but the exact wording changed between test runs and made the test flaky. |
||
| } | ||
|
|
||
| public void testRetrievingInferenceEndpoint_ThrowsException_WhenIndexNodeIsNotAvailable_ForInferenceAction() throws Exception { | ||
| final var configIndexNodeAttributes = Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, CONFIG_ROUTER).build(); | ||
|
|
||
| internalCluster().startMasterOnlyNode(configIndexNodeAttributes); | ||
| final var configIndexDataNodes = internalCluster().startDataOnlyNode(configIndexNodeAttributes); | ||
|
|
||
| internalCluster().startDataOnlyNode(Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, SECRETS_ROUTER).build()); | ||
|
|
||
| final var inferenceId = "test-index-id-2"; | ||
| createInferenceEndpoint(TaskType.TEXT_EMBEDDING, inferenceId, TEST_SERVICE_SETTINGS); | ||
|
|
||
| // Ensure the inference indices are created and we can retrieve the inference endpoint | ||
| var getInferenceEndpointRequest = new GetInferenceModelAction.Request(inferenceId, TaskType.TEXT_EMBEDDING, true); | ||
| var responseFuture = client().execute(GetInferenceModelAction.INSTANCE, getInferenceEndpointRequest); | ||
| assertThat(responseFuture.actionGet(TEST_REQUEST_TIMEOUT).getEndpoints().get(0).getInferenceEntityId(), equalTo(inferenceId)); | ||
|
|
||
| // stop the node that holds the inference index | ||
| internalCluster().stopNode(configIndexDataNodes); | ||
|
|
||
| var proxyResponse = sendInferenceProxyRequest(inferenceId); | ||
| var exception = expectThrows(ElasticsearchException.class, () -> proxyResponse.actionGet(TEST_REQUEST_TIMEOUT)); | ||
| assertThat(exception.toString(), containsString("Failed to load inference endpoint with secrets [test-index-id-2]")); | ||
|
|
||
| var causeException = exception.getCause(); | ||
| assertThat(causeException, instanceOf(SearchPhaseExecutionException.class)); | ||
| } | ||
|
|
||
| public void testRetrievingInferenceEndpoint_ThrowsException_WhenSecretsIndexNodeIsNotAvailable() throws Exception { | ||
| final var configIndexNodeAttributes = Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, CONFIG_ROUTER).build(); | ||
| internalCluster().startMasterOnlyNode(configIndexNodeAttributes); | ||
| internalCluster().startDataOnlyNode(configIndexNodeAttributes); | ||
|
|
||
| var secretIndexDataNodes = internalCluster().startDataOnlyNode( | ||
| Settings.builder().put(INDEX_ROUTER_ATTRIBUTE, SECRETS_ROUTER).build() | ||
| ); | ||
|
|
||
| final var inferenceId = "test-secrets-index-id"; | ||
| createInferenceEndpoint(TaskType.TEXT_EMBEDDING, inferenceId, TEST_SERVICE_SETTINGS); | ||
|
|
||
| // Ensure the inference indices are created and we can retrieve the inference endpoint | ||
| var getInferenceEndpointRequest = new GetInferenceModelAction.Request(inferenceId, TaskType.TEXT_EMBEDDING, true); | ||
| var responseFuture = client().execute(GetInferenceModelAction.INSTANCE, getInferenceEndpointRequest); | ||
| assertThat(responseFuture.actionGet(TEST_REQUEST_TIMEOUT).getEndpoints().get(0).getInferenceEntityId(), equalTo(inferenceId)); | ||
|
|
||
| // stop the node that holds the inference secrets index | ||
| internalCluster().stopNode(secretIndexDataNodes); | ||
|
|
||
| var proxyResponse = sendInferenceProxyRequest(inferenceId); | ||
|
|
||
| var exception = expectThrows(ElasticsearchException.class, () -> proxyResponse.actionGet(TEST_REQUEST_TIMEOUT)); | ||
| assertThat(exception.toString(), containsString("Failed to load inference endpoint with secrets [test-secrets-index-id]")); | ||
|
|
||
| var causeException = exception.getCause(); | ||
|
|
||
| assertThat(causeException, instanceOf(SearchPhaseExecutionException.class)); | ||
| } | ||
|
|
||
| private ActionFuture<InferenceAction.Response> sendInferenceProxyRequest(String inferenceId) throws IOException { | ||
| final BytesReference content; | ||
| try (XContentBuilder builder = XContentFactory.jsonBuilder()) { | ||
| builder.startObject(); | ||
| builder.field("input", List.of("test input")); | ||
| builder.endObject(); | ||
|
|
||
| content = BytesReference.bytes(builder); | ||
| } | ||
|
|
||
| var inferenceRequest = new InferenceActionProxy.Request( | ||
| TaskType.TEXT_EMBEDDING, | ||
| inferenceId, | ||
| content, | ||
| XContentType.JSON, | ||
| TimeValue.THIRTY_SECONDS, | ||
| false, | ||
| InferenceContext.EMPTY_INSTANCE | ||
| ); | ||
|
|
||
| return client().execute(InferenceActionProxy.INSTANCE, inferenceRequest); | ||
| } | ||
|
|
||
| private void createInferenceEndpoint(TaskType taskType, String inferenceId, Map<String, Object> serviceSettings) throws IOException { | ||
| var responseFuture = createInferenceEndpointAsync(taskType, inferenceId, serviceSettings); | ||
| assertThat(responseFuture.actionGet(TEST_REQUEST_TIMEOUT).getModel().getInferenceEntityId(), equalTo(inferenceId)); | ||
| } | ||
|
|
||
| private ActionFuture<PutInferenceModelAction.Response> createInferenceEndpointAsync( | ||
| TaskType taskType, | ||
| String inferenceId, | ||
| Map<String, Object> serviceSettings | ||
| ) throws IOException { | ||
| final BytesReference content; | ||
| try (XContentBuilder builder = XContentFactory.jsonBuilder()) { | ||
| builder.startObject(); | ||
| builder.field("service", TestDenseInferenceServiceExtension.TestInferenceService.NAME); | ||
| builder.field("service_settings", serviceSettings); | ||
| builder.endObject(); | ||
|
|
||
| content = BytesReference.bytes(builder); | ||
| } | ||
|
|
||
| var request = new PutInferenceModelAction.Request(taskType, inferenceId, content, XContentType.JSON, TEST_REQUEST_TIMEOUT); | ||
| return client().execute(PutInferenceModelAction.INSTANCE, request); | ||
| } | ||
| } | ||
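The review comment inside the first test explains why the tests assert only on the wrapper message and the type of the cause. A minimal plain-Java sketch of that pattern follows, under stated assumptions: `CauseTypeAssertionSketch`, `ShardUnavailableException`, `wrap`, and `matches` are hypothetical stand-ins, and the two cause messages are invented to imitate wording that varies between runs.

```java
class CauseTypeAssertionSketch {
    /** Hypothetical stand-in for a low-level failure whose message varies between runs. */
    static class ShardUnavailableException extends RuntimeException {
        ShardUnavailableException(String message) {
            super(message);
        }
    }

    /** Wraps the low-level failure in an exception with a stable, caller-facing message. */
    static RuntimeException wrap(String inferenceId, RuntimeException cause) {
        return new RuntimeException("Failed to load inference endpoint [" + inferenceId + "]", cause);
    }

    /** True when the wrapper message and the cause type match; the cause's own message is ignored. */
    static boolean matches(Throwable e, String expectedFragment, Class<? extends Throwable> causeType) {
        return e.getMessage() != null
            && e.getMessage().contains(expectedFragment)
            && causeType.isInstance(e.getCause());
    }

    public static void main(String[] args) {
        String expected = "Failed to load inference endpoint [test-index-id]";

        // Two runs whose underlying failure is worded differently (invented messages).
        var firstRun = wrap("test-index-id", new ShardUnavailableException("all shards failed"));
        var secondRun = wrap("test-index-id", new ShardUnavailableException("no shard available"));

        System.out.println(matches(firstRun, expected, ShardUnavailableException.class));  // prints true
        System.out.println(matches(secondRun, expected, ShardUnavailableException.class)); // prints true
    }
}
```

Checking the stable wrapper text plus the cause type keeps the assertion meaningful without depending on wording the test does not control.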
If you can think of a better way to direct the inference configuration index and the secrets index to separate nodes, let me know. I'm injecting some settings that won't be present in production, but they let us direct the documents to specific nodes for easier testing.
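The idea behind those injected settings can be sketched in plain Java. This is a deliberately simplified model of attribute-based allocation filtering, not Elasticsearch's allocation logic: `AllocationFilterSketch`, `Node`, and `eligibleNodes` are hypothetical names, each attribute carries a single value, and only data nodes are modeled. It pairs a node attribute such as `node.attr.index_router` with an index-level `index.routing.allocation.require.index_router` requirement, as the test does.

```java
import java.util.List;
import java.util.Map;

class AllocationFilterSketch {
    /** A data node and its custom attributes (hypothetical type). */
    record Node(String name, Map<String, String> attributes) {}

    /** Returns the names of nodes whose attributes satisfy every required attribute/value pair. */
    static List<String> eligibleNodes(List<Node> nodes, Map<String, String> required) {
        return nodes.stream()
            .filter(node -> required.entrySet().stream()
                .allMatch(req -> req.getValue().equals(node.attributes().get(req.getKey()))))
            .map(Node::name)
            .toList();
    }

    public static void main(String[] args) {
        var configNode = new Node("config-data", Map.of("index_router", "config"));
        var secretsNode = new Node("secrets-data", Map.of("index_router", "secrets"));

        // The configuration index requires index_router=config.
        var configIndexRequirement = Map.of("index_router", "config");

        // Both nodes running: only the config node may hold the configuration index.
        System.out.println(eligibleNodes(List.of(configNode, secretsNode), configIndexRequirement)); // prints [config-data]

        // After stopping the config node, no remaining node qualifies.
        System.out.println(eligibleNodes(List.of(secretsNode), configIndexRequirement)); // prints []
    }
}
```

Because each index has exactly one eligible data node in this setup, stopping that node makes the index's shards unavailable, which is what lets each test target the configuration index or the secrets index separately.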