[ML] SageMaker Elastic Payload #129413
Conversation
Send the Elastic API Payload to a SageMaker endpoint, and parse the response as if it were an Elastic API response.
- SageMaker now supports all task types in the Elastic API format.
- Streaming is supported using the SageMaker client/server RPC, rather than SSE. Payloads must be complete and valid JSON structures.
- Task Settings can be used for additional passthrough settings, but they will not be saved alongside the model. Elastic cannot make guarantees about the structure or contents of this payload, so Elastic will treat it like the other input payloads and only allow them during inference.
Hi @prwhelan, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
```java
public void testGetServicesWithoutTaskType() throws IOException {
    List<Object> services = getAllServices();
    assertThat(services.size(), equalTo(24));
```
We don't need to check the list size anymore; containsInAnyOrder does that and will print out any missing elements.
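For what it's worth, the reason the matcher subsumes the size check can be sketched without Hamcrest. The stand-in below mimics containsInAnyOrder semantics; the service names are illustrative, not taken from the actual test:

```java
import java.util.ArrayList;
import java.util.List;

public class ContainsInAnyOrderSketch {
    // Minimal stand-in for Hamcrest's containsInAnyOrder: passes only when
    // actual and expected hold the same elements, in any order, same count.
    static boolean containsInAnyOrder(List<String> actual, String... expected) {
        List<String> remaining = new ArrayList<>(actual);
        for (String e : expected) {
            if (!remaining.remove(e)) return false; // a missing element fails
        }
        return remaining.isEmpty(); // an extra element also fails, so size is implied
    }

    public static void main(String[] args) {
        List<String> services = List.of("openai", "cohere", "sagemaker");
        if (!containsInAnyOrder(services, "sagemaker", "openai", "cohere")) throw new AssertionError();
        // A separate size assertion would be redundant: a missing or extra
        // service already fails the matcher.
        if (containsInAnyOrder(services, "openai", "cohere")) throw new AssertionError();
        System.out.println("ok");
    }
}
```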
jonathan-buttner left a comment
Looks good, I left a few suggestions. Just curious, why didn't we use SSE?
```java
var deque = new ArrayDeque<StreamingChatCompletionResults.Result>();
XContentParser.Token token;
while ((token = p.nextToken()) != XContentParser.Token.END_ARRAY) {
```
Would XContentParserUtils.parseList help here?
No =( because it parses a List
I'll refactor away from Deque one of these days. It's been more annoying than it's worth.
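For context, the END_ARRAY-driven loop above amounts to draining a token stream into a Deque until a sentinel is reached. A minimal stand-in, using a plain iterator instead of XContentParser (all names here are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class DequeParseSketch {
    // Drain tokens into a Deque until the end-of-array sentinel, preserving
    // arrival order — the shape of the streaming parse loop above.
    static Deque<String> drainUntilEnd(Iterator<String> tokens, String endArray) {
        Deque<String> deque = new ArrayDeque<>();
        while (tokens.hasNext()) {
            String token = tokens.next();
            if (token.equals(endArray)) break;
            deque.offer(token);
        }
        return deque;
    }

    public static void main(String[] args) {
        Iterator<String> tokens = List.of("r1", "r2", "END_ARRAY", "ignored").iterator();
        Deque<String> results = drainUntilEnd(tokens, "END_ARRAY");
        if (!String.join(",", results).equals("r1,r2")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This also shows why parseList doesn't quite fit: it materializes a List in one shot, while the Deque here is consumed incrementally as results stream in.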
```java
if (request.input().size() > 1) {
    builder.field(INPUT.getPreferredName(), request.input());
} else {
    builder.field(INPUT.getPreferredName(), request.input().get(0));
```
Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔
Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?
Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔
Yeah it is handled before it gets to the services: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceAction.java#L246
Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?
I figured why not? Since we read it as a string? Idk, I'm happy to force it to always be an array.
Ah ok, it's fine the way it is 👍
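The branching discussed above boils down to: a single input is serialized as a bare string, multiple inputs as a JSON array. A simplified sketch (the "input" field name and hand-built JSON are for illustration only; the real code goes through an XContentBuilder):

```java
import java.util.List;
import java.util.stream.Collectors;

public class InputSerializationSketch {
    // Mirrors the branch above: one input -> JSON string, many -> JSON array.
    // An empty list is assumed to have been rejected by earlier validation,
    // as discussed in the thread.
    static String serializeInput(List<String> input) {
        if (input.size() > 1) {
            return "{\"input\":["
                + input.stream().map(s -> "\"" + s + "\"").collect(Collectors.joining(","))
                + "]}";
        }
        return "{\"input\":\"" + input.get(0) + "\"}";
    }

    public static void main(String[] args) {
        if (!serializeInput(List.of("hello")).equals("{\"input\":\"hello\"}")) throw new AssertionError();
        if (!serializeInput(List.of("a", "b")).equals("{\"input\":[\"a\",\"b\"]}")) throw new AssertionError();
        System.out.println("ok");
    }
}
```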
```java
ApiServiceSettings(StreamInput in) throws IOException {
    this(
        in.readOptionalInt(),
```
How about we use in.readOptionalVInt()?
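The point of the suggestion is wire size: a vint spends 7 data bits per byte and uses the high bit as a continuation flag, so small values take one byte instead of four. A generic sketch of that encoding (Elasticsearch's actual StreamOutput format may differ in details):

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    // Variable-length int encoding: emit 7 bits at a time, low group first,
    // setting the high bit on every byte except the last.
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80); // more bytes follow
            i >>>= 7;
        }
        out.write(i); // final byte, high bit clear
        return out.toByteArray();
    }

    static int readVInt(byte[] bytes) {
        int value = 0, shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break; // high bit clear -> last byte
        }
        return value;
    }

    public static void main(String[] args) {
        if (writeVInt(5).length != 1) throw new AssertionError();   // fits in one byte
        if (writeVInt(300).length != 2) throw new AssertionError(); // needs two bytes
        if (readVInt(writeVInt(123456)) != 123456) throw new AssertionError();
        System.out.println("ok");
    }
}
```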
SSE is a transport protocol that delimits the start/end of a payload, but AWS already has its own transport protocol, so we don't need another one on top of it. We could use it to give users the appearance that the payload will be transported as-is? But we won't be doing that for cross-node streaming (if/when we do that). Up to you, it's not a time-consuming change.
Ah ok. I was just curious about your reasoning. Looks good without 👍
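To make the framing point above concrete: an SSE layer would wrap each payload in "data:" lines terminated by a blank line, purely to mark event boundaries. A minimal, simplified parser for that framing (a sketch, not the full EventSource spec, which the AWS RPC makes redundant here):

```java
import java.util.ArrayList;
import java.util.List;

public class SseFramingSketch {
    // Each SSE event is one or more "data:" lines followed by a blank line.
    // This is the start/end delimiting that SSE would add on top of the
    // framing AWS already provides.
    static List<String> parseSse(String stream) {
        List<String> events = new ArrayList<>();
        for (String block : stream.split("\n\n")) {
            StringBuilder data = new StringBuilder();
            for (String line : block.split("\n")) {
                if (line.startsWith("data:")) {
                    if (data.length() > 0) data.append('\n');
                    data.append(line.substring(5).trim());
                }
            }
            if (data.length() > 0) events.add(data.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> events = parseSse("data: {\"a\":1}\n\ndata: {\"b\":2}\n\n");
        if (events.size() != 2) throw new AssertionError();
        if (!events.get(0).equals("{\"a\":1}")) throw new AssertionError();
        System.out.println("ok");
    }
}
```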