[ML] Inference API removing _unified and using _stream instead #121804
…search into ml-proxy-action
…search into ml-proxy-action
Pinging @elastic/ml-core (Team:ML)
LGTM
Left a comment about the internal action name; everything else is great.
...n/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java
 */
public class InferenceActionProxy extends ActionType<InferenceAction.Response> {
    public static final InferenceActionProxy INSTANCE = new InferenceActionProxy();
    public static final String NAME = "cluster:monitor/xpack/inference";
I'm not sure it is safe to reuse the old InferenceAction name here when the action has different request and response classes. I'm thinking about a mixed cluster. To be safe, please give it a new name.
Get, Put and Delete are `cluster:monitor/xpack/inference/[get|put|delete]`, so this could be `cluster:monitor/xpack/inference/post` or `cluster:monitor/xpack/inference/inference`.
Ah ok, I'll switch to post 👍
…inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]>
public class InferenceAction extends ActionType<InferenceAction.Response> {

    public static final InferenceAction INSTANCE = new InferenceAction();
    public static final String NAME = "cluster:monitor/xpack/inference";
@davidkyle just wanted to confirm that this is what we want to do here right? Changing it to internal?
++ yes internal is good
private void sendUnifiedCompletionRequest(InferenceActionProxy.Request request, ActionListener<InferenceAction.Response> listener) {
    // format any validation exceptions from the rest -> transport path as UnifiedChatCompletionException
    var unifiedErrorFormatListener = listener.delegateResponse((l, e) -> l.onFailure(UnifiedChatCompletionException.fromThrowable(e)));
nice
💔 Backport failed
You can use sqren/backport to manually backport by running
…ic#121804) * Adding proxy action * [CI] Auto commit changes from spotless * Incrementing reference count for body content and fixing tests * [CI] Auto commit changes from spotless * Refactoring * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]> * Addressing feedback --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]> (cherry picked from commit ab48235)
…ic#121804) * Adding proxy action * [CI] Auto commit changes from spotless * Incrementing reference count for body content and fixing tests * [CI] Auto commit changes from spotless * Refactoring * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]> * Addressing feedback --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]> (cherry picked from commit ab48235) # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rest/BaseInferenceActionTests.java
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
…) (#122042) * Adding proxy action * [CI] Auto commit changes from spotless * Incrementing reference count for body content and fixing tests * [CI] Auto commit changes from spotless * Refactoring * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]> * Addressing feedback --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]> (cherry picked from commit ab48235) # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rest/BaseInferenceActionTests.java
…) (#122040) * Adding proxy action * [CI] Auto commit changes from spotless * Incrementing reference count for body content and fixing tests * [CI] Auto commit changes from spotless * Refactoring * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]> * Addressing feedback --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]> (cherry picked from commit ab48235)
…) (#122045) * Adding proxy action * [CI] Auto commit changes from spotless * Incrementing reference count for body content and fixing tests * [CI] Auto commit changes from spotless * Refactoring * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceActionProxy.java Co-authored-by: David Kyle <[email protected]> * Addressing feedback --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]> (cherry picked from commit ab48235) # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/rest/BaseInferenceActionTests.java
This PR refactors the unified schema approach. Prior to this PR, to interact with the unified schema, a request was made to
`_inference/completion/<inference endpoint id>/_unified`
This PR refactors the code such that we no longer have a `_unified` endpoint. Instead, we'll use the previous way with `_stream`.

To do this I created a new proxy action that is registered for the inference REST APIs. We need a proxy action because we need to know whether the request uses the new unified schema or the input task_settings schema. The proxy action determines this by checking whether the task type was included in the URL. If it wasn't, we retrieve the inference entity from storage and get the task type that way. Once we have the task type, we create a new request and route it to `InferenceAction` or `UnifiedInferenceAction`.

Other Notes
The unified action does not support non-streaming so if a request is made like:
An error will be returned
Error
The stats/metrics aren't recorded until the internal action is handled.
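As a rough illustration of the routing and the non-streaming restriction described above, here is a minimal, self-contained Java sketch. It is not the actual Elasticsearch code: `ProxyRoutingSketch`, its enums, the in-memory `STORED_ENDPOINTS` map, the endpoint ids and the error messages are all hypothetical stand-ins. Treating `CHAT_COMPLETION` as the task type that selects the unified schema is also an assumption made for the example.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; none of these names are the real Elasticsearch classes.
final class ProxyRoutingSketch {

    // Hypothetical subset of task types.
    enum TaskType { COMPLETION, CHAT_COMPLETION, TEXT_EMBEDDING }

    // The two internal actions the proxy can forward a request to.
    enum Target { INFERENCE_ACTION, UNIFIED_INFERENCE_ACTION }

    // Stand-in for the stored inference endpoint configurations (endpoint id -> task type).
    static final Map<String, TaskType> STORED_ENDPOINTS = Map.of(
        "my-chat-endpoint", TaskType.CHAT_COMPLETION,
        "my-embedding-endpoint", TaskType.TEXT_EMBEDDING
    );

    // Use the task type from the URL when present; otherwise look it up from storage.
    static TaskType resolveTaskType(Optional<TaskType> taskTypeFromUrl, String endpointId) {
        if (taskTypeFromUrl.isPresent()) {
            return taskTypeFromUrl.get();
        }
        TaskType stored = STORED_ENDPOINTS.get(endpointId);
        if (stored == null) {
            throw new IllegalArgumentException("Unknown inference endpoint [" + endpointId + "]");
        }
        return stored;
    }

    // Decide which internal action should handle the request.
    static Target route(Optional<TaskType> taskTypeFromUrl, String endpointId, boolean stream) {
        TaskType taskType = resolveTaskType(taskTypeFromUrl, endpointId);
        // Assumption: CHAT_COMPLETION marks a unified-schema request.
        if (taskType == TaskType.CHAT_COMPLETION) {
            if (stream == false) {
                // The unified action does not support non-streaming requests.
                throw new UnsupportedOperationException("The unified schema requires a streaming (_stream) request");
            }
            return Target.UNIFIED_INFERENCE_ACTION;
        }
        return Target.INFERENCE_ACTION;
    }

    public static void main(String[] args) {
        // Task type omitted from the URL: resolved from the stored endpoint.
        System.out.println(route(Optional.empty(), "my-chat-endpoint", true));
        // Task type supplied in the URL: no storage lookup needed.
        System.out.println(route(Optional.of(TaskType.TEXT_EMBEDDING), "my-embedding-endpoint", false));
        try {
            route(Optional.empty(), "my-chat-endpoint", false);
        } catch (UnsupportedOperationException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the non-streaming check lives in the routing step purely for brevity; where the real validation happens is not something the sketch tries to reproduce.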
Testing
Creating the endpoint
Making a unified request
The response structure will be the same as it was before