Conversation

@prwhelan
Member

Send the Elastic API payload to a SageMaker endpoint, and parse the response as if it were an Elastic API response.

  • SageMaker now supports all task types in the Elastic API format.
  • Streaming is supported using the SageMaker client/server RPC rather than SSE. Each payload must be a complete, valid JSON structure.
  • Task Settings can be used for additional passthrough settings, but they are not saved alongside the model. Elastic cannot guarantee the structure or contents of this payload, so it is treated like the other input payloads and is only accepted during inference.

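For orientation, here is a minimal sketch of registering such an endpoint through the inference API with the low-level Java REST client. The service and field names below (sagemaker, api: elastic, access_key, and so on) are assumptions for illustration, not taken from this PR, and should be checked against the inference documentation.

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class SageMakerElasticApiSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Hypothetical registration of a SageMaker endpoint that speaks the Elastic API format;
            // the JSON field names are illustrative assumptions.
            Request put = new Request("PUT", "/_inference/completion/my-sagemaker-endpoint");
            put.setJsonEntity("""
                {
                  "service": "sagemaker",
                  "service_settings": {
                    "access_key": "<aws-access-key>",
                    "secret_key": "<aws-secret-key>",
                    "region": "us-east-1",
                    "endpoint_name": "my-sagemaker-endpoint",
                    "api": "elastic"
                  }
                }""");
            Response response = client.performRequest(put);
            System.out.println(response.getStatusLine());
        }
    }
}

Task settings, when used, would be supplied on the inference call itself rather than stored with this configuration, matching the passthrough behaviour described above.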
@prwhelan prwhelan added the >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.19.0, and v9.1.0 labels on Jun 13, 2025
@elasticsearchmachine
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

@prwhelan prwhelan marked this pull request as ready for review June 13, 2025 15:48
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)


public void testGetServicesWithoutTaskType() throws IOException {
    List<Object> services = getAllServices();
    assertThat(services.size(), equalTo(24));
Member Author

We don't need to check the list size anymore, containsInAnyOrder does that and will print out the missing element
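For reference, a minimal sketch of the Hamcrest pattern being described; the service names here are placeholders, not the real list from the test.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;

import java.util.List;
import org.junit.Test;

public class ServicesMatcherSketch {
    @Test
    public void matchesServicesWithoutASeparateSizeCheck() {
        // Placeholder data; the real test collects the registered inference services.
        List<String> services = List.of("sagemaker", "openai", "elser");
        // containsInAnyOrder fails if an element is missing or unexpected and names the
        // offending item in its failure message, so an explicit size assertion adds nothing.
        assertThat(services, containsInAnyOrder("sagemaker", "openai", "elser"));
    }
}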

@jonathan-buttner jonathan-buttner left a comment
Contributor

Looks good, I left a few suggestions. Just curious, why didn't we use SSE?


var deque = new ArrayDeque<StreamingChatCompletionResults.Result>();
XContentParser.Token token;
while ((token = p.nextToken()) != XContentParser.Token.END_ARRAY) {
Contributor

Would XContentParserUtils.parseList help here?

Member Author

No =( because it parses a List

I'll refactor away from Deque one of these days. It's been more annoying than it's worth.
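For context, a sketch of a parseList-style helper adapted to collect into a Deque instead of a List, mirroring the loop in the excerpt above; the class and package locations are assumptions and may differ between Elasticsearch versions.

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.elasticsearch.core.CheckedFunction;
import org.elasticsearch.xcontent.XContentParser;

public final class DequeParsing {
    // Like XContentParserUtils.parseList, but offers each parsed element onto a Deque;
    // the caller is expected to have already consumed the START_ARRAY token.
    static <T> Deque<T> parseDeque(XContentParser parser,
                                   CheckedFunction<XContentParser, T, IOException> valueParser) throws IOException {
        var results = new ArrayDeque<T>();
        while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
            results.offer(valueParser.apply(parser));
        }
        return results;
    }
}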

if (request.input().size() > 1) {
    builder.field(INPUT.getPreferredName(), request.input());
} else {
    builder.field(INPUT.getPreferredName(), request.input().get(0));
Contributor

Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔

Contributor

Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?

Member Author

> Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔

Yeah it is handled before it gets to the services: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceAction.java#L246

> Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?

I figured why not? Since we read it as a string? Idk, I'm happy to force it to always be an array.

Contributor

Ah ok, it's fine the way it is 👍
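For readers skimming the thread, a sketch of the agreed serialization: a single input is written as a plain JSON string, several inputs as an array. The wrapper class and field name are illustrative, and package locations may differ between versions.

import java.io.IOException;
import java.util.List;
import org.elasticsearch.common.Strings;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentFactory;

public final class InputFieldSketch {
    static String toJson(List<String> input) throws IOException {
        try (XContentBuilder builder = XContentFactory.jsonBuilder()) {
            builder.startObject();
            if (input.size() > 1) {
                builder.field("input", input);          // serialized as ["a", "b"]
            } else {
                builder.field("input", input.get(0));   // serialized as "a"
            }
            builder.endObject();
            return Strings.toString(builder);
        }
    }
}

An empty input list never reaches this code because the InferenceAction request validation rejects it earlier, as noted above.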


ApiServiceSettings(StreamInput in) throws IOException {
    this(
        in.readOptionalInt(),
Contributor

How about we use in.readOptionalVInt()?
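For context, a minimal sketch of the pairing that suggestion implies; readOptionalVInt uses a variable-length encoding that is more compact for small non-negative values, and it must be matched by writeOptionalVInt on the write side. The record and its field are hypothetical.

import java.io.IOException;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

// Hypothetical settings fragment: the read constructor and writeTo must use the same
// optional vint encoding, otherwise the wire format breaks between nodes.
record OptionalVIntSketch(Integer dimensions) implements Writeable {
    OptionalVIntSketch(StreamInput in) throws IOException {
        this(in.readOptionalVInt());
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeOptionalVInt(dimensions);
    }
}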

@prwhelan
Member Author

> Looks good, I left a few suggestions. Just curious, why didn't we use SSE?

SSE is a transport protocol that delimits the start/end of a payload, but AWS already has its own transport protocol, so we don't need another one on top of it. We could use SSE to give users the appearance that the payload will be transported as-is, but we won't be doing that for cross-node streaming (if/when we do that). Up to you, it's not a time-consuming change.
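For readers unfamiliar with the AWS side, a rough sketch of what that client/server streaming looks like with the AWS SDK v2 async client; the method and event names here are assumptions from memory, and the endpoint name and payload are placeholders.

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.sagemakerruntime.SageMakerRuntimeAsyncClient;
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointWithResponseStreamRequest;
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointWithResponseStreamResponseHandler;

public final class SageMakerStreamingSketch {
    public static void main(String[] args) {
        try (SageMakerRuntimeAsyncClient client = SageMakerRuntimeAsyncClient.create()) {
            var request = InvokeEndpointWithResponseStreamRequest.builder()
                .endpointName("my-sagemaker-endpoint")                  // placeholder
                .contentType("application/json")
                .body(SdkBytes.fromUtf8String("{\"input\":\"hello\"}")) // placeholder payload
                .build();

            // AWS delivers each chunk as a PayloadPart event on its own event stream, which is the
            // start/end framing mentioned above; each part is expected to be complete, valid JSON.
            var handler = InvokeEndpointWithResponseStreamResponseHandler.builder()
                .subscriber(InvokeEndpointWithResponseStreamResponseHandler.Visitor.builder()
                    .onPayloadPart(part -> System.out.println(part.bytes().asUtf8String()))
                    .build())
                .build();

            client.invokeEndpointWithResponseStream(request, handler).join();
        }
    }
}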

@jonathan-buttner
Contributor

> Looks good, I left a few suggestions. Just curious, why didn't we use SSE?
>
> SSE is a transport protocol that delimits the start/end of a payload, but AWS already has its own transport protocol, so we don't need another one on top of it. We could use SSE to give users the appearance that the payload will be transported as-is, but we won't be doing that for cross-node streaming (if/when we do that). Up to you, it's not a time-consuming change.

Ah ok. I was just curious about your reasoning. Looks good without 👍

@prwhelan prwhelan enabled auto-merge (squash) June 23, 2025 20:37
@prwhelan prwhelan merged commit aeb3718 into elastic:main Jun 23, 2025
32 checks passed
prwhelan added a commit to prwhelan/elasticsearch that referenced this pull request Jun 23, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jun 23, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025