Conversation

@prwhelan
Member

Send the Elastic API payload to a SageMaker endpoint, and parse the response as if it were an Elastic API response.

  • SageMaker now supports all task types in the Elastic API format.
  • Streaming is supported using the SageMaker client/server RPC rather than SSE. Each payload must be a complete, valid JSON structure.
  • Task Settings can be used for additional passthrough settings, but they are not saved alongside the model. Elastic cannot guarantee the structure or contents of this payload, so it is treated like the other input payloads and is only accepted during inference.

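For orientation, here is a minimal sketch of registering such an endpoint through the inference API with the low-level Java REST client. The service and field names below (sagemaker, api: elastic, access_key, and so on) are assumptions for illustration, not taken from this PR, and should be checked against the inference documentation.

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class SageMakerElasticApiSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Hypothetical registration of a SageMaker endpoint that speaks the Elastic API format;
            // the JSON field names are illustrative assumptions.
            Request put = new Request("PUT", "/_inference/completion/my-sagemaker-endpoint");
            put.setJsonEntity("""
                {
                  "service": "sagemaker",
                  "service_settings": {
                    "access_key": "<aws-access-key>",
                    "secret_key": "<aws-secret-key>",
                    "region": "us-east-1",
                    "endpoint_name": "my-sagemaker-endpoint",
                    "api": "elastic"
                  }
                }""");
            Response response = client.performRequest(put);
            System.out.println(response.getStatusLine());
        }
    }
}

Task settings, when used, would be supplied on the inference call itself rather than stored with this configuration, matching the passthrough behaviour described above.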
@prwhelan prwhelan added the >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.19.0, and v9.1.0 labels on Jun 13, 2025
@elasticsearchmachine
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

@prwhelan prwhelan marked this pull request as ready for review June 13, 2025 15:48
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)


public void testGetServicesWithoutTaskType() throws IOException {
    List<Object> services = getAllServices();
    assertThat(services.size(), equalTo(24));
Member Author

We don't need to check the list size anymore, containsInAnyOrder does that and will print out the missing element
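For reference, a minimal sketch of the Hamcrest pattern being described; the service names here are placeholders, not the real list from the test.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;

import java.util.List;
import org.junit.Test;

public class ServicesMatcherSketch {
    @Test
    public void matchesServicesWithoutASeparateSizeCheck() {
        // Placeholder data; the real test collects the registered inference services.
        List<String> services = List.of("sagemaker", "openai", "elser");
        // containsInAnyOrder fails if an element is missing or unexpected and names the
        // offending item in its failure message, so an explicit size assertion adds nothing.
        assertThat(services, containsInAnyOrder("sagemaker", "openai", "elser"));
    }
}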

@jonathan-buttner jonathan-buttner left a comment
Contributor

Looks good, I left a few suggestions. Just curious, why didn't we use SSE?


var deque = new ArrayDeque<StreamingChatCompletionResults.Result>();
XContentParser.Token token;
while ((token = p.nextToken()) != XContentParser.Token.END_ARRAY) {
Contributor

Would XContentParserUtils.parseList help here?

Member Author

No =( because it parses a List

I'll refactor away from Deque one of these days. It's been more annoying than it's worth.
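For context, a sketch of a parseList-style helper adapted to collect into a Deque instead of a List, mirroring the loop in the excerpt above; the class and package locations are assumptions and may differ between Elasticsearch versions.

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import org.elasticsearch.core.CheckedFunction;
import org.elasticsearch.xcontent.XContentParser;

public final class DequeParsing {
    // Like XContentParserUtils.parseList, but offers each parsed element onto a Deque;
    // the caller is expected to have already consumed the START_ARRAY token.
    static <T> Deque<T> parseDeque(XContentParser parser,
                                   CheckedFunction<XContentParser, T, IOException> valueParser) throws IOException {
        var results = new ArrayDeque<T>();
        while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
            results.offer(valueParser.apply(parser));
        }
        return results;
    }
}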

if (request.input().size() > 1) {
    builder.field(INPUT.getPreferredName(), request.input());
} else {
    builder.field(INPUT.getPreferredName(), request.input().get(0));
Contributor

Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔

Contributor

Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?

Member Author

> Do we need to handle the situation where input is empty? Maybe we already handle that before the inference call gets to the services 🤔

Yeah it is handled before it gets to the services: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/action/InferenceAction.java#L246

> Do we need to support serializing the request to AWS as a string? Can we always send an array even if it has a single item in the array?

I figured why not? Since we read it as a string? Idk, I'm happy to force it to always be an array.

Contributor

Ah ok, it's fine the way it is 👍
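For readers skimming the thread, a sketch of the agreed serialization: a single input is written as a plain JSON string, several inputs as an array. The wrapper class and field name are illustrative, and package locations may differ between versions.

import java.io.IOException;
import java.util.List;
import org.elasticsearch.common.Strings;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentFactory;

public final class InputFieldSketch {
    static String toJson(List<String> input) throws IOException {
        try (XContentBuilder builder = XContentFactory.jsonBuilder()) {
            builder.startObject();
            if (input.size() > 1) {
                builder.field("input", input);          // serialized as ["a", "b"]
            } else {
                builder.field("input", input.get(0));   // serialized as "a"
            }
            builder.endObject();
            return Strings.toString(builder);
        }
    }
}

An empty input list never reaches this code because the InferenceAction request validation rejects it earlier, as noted above.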


ApiServiceSettings(StreamInput in) throws IOException {
    this(
        in.readOptionalInt(),
Contributor

How about we use in.readOptionalVInt()?
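For context, a minimal sketch of the pairing that suggestion implies; readOptionalVInt uses a variable-length encoding that is more compact for small non-negative values, and it must be matched by writeOptionalVInt on the write side. The record and its field are hypothetical.

import java.io.IOException;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

// Hypothetical settings fragment: the read constructor and writeTo must use the same
// optional vint encoding, otherwise the wire format breaks between nodes.
record OptionalVIntSketch(Integer dimensions) implements Writeable {
    OptionalVIntSketch(StreamInput in) throws IOException {
        this(in.readOptionalVInt());
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeOptionalVInt(dimensions);
    }
}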

@prwhelan
Member Author

> Looks good, I left a few suggestions. Just curious, why didn't we use SSE?

SSE is a transport protocol that delimits the start/end of a payload, but AWS already has its own transport protocol, so we don't need another one on top of it. We could use SSE to give users the appearance that the payload will be transported as-is, but we won't be doing that for cross-node streaming (if/when we do that). Up to you, it's not a time-consuming change.
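For readers unfamiliar with the AWS side, a rough sketch of what that client/server streaming looks like with the AWS SDK v2 async client; the method and event names here are assumptions from memory, and the endpoint name and payload are placeholders.

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.sagemakerruntime.SageMakerRuntimeAsyncClient;
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointWithResponseStreamRequest;
import software.amazon.awssdk.services.sagemakerruntime.model.InvokeEndpointWithResponseStreamResponseHandler;

public final class SageMakerStreamingSketch {
    public static void main(String[] args) {
        try (SageMakerRuntimeAsyncClient client = SageMakerRuntimeAsyncClient.create()) {
            var request = InvokeEndpointWithResponseStreamRequest.builder()
                .endpointName("my-sagemaker-endpoint")                  // placeholder
                .contentType("application/json")
                .body(SdkBytes.fromUtf8String("{\"input\":\"hello\"}")) // placeholder payload
                .build();

            // AWS delivers each chunk as a PayloadPart event on its own event stream, which is the
            // start/end framing mentioned above; each part is expected to be complete, valid JSON.
            var handler = InvokeEndpointWithResponseStreamResponseHandler.builder()
                .subscriber(InvokeEndpointWithResponseStreamResponseHandler.Visitor.builder()
                    .onPayloadPart(part -> System.out.println(part.bytes().asUtf8String()))
                    .build())
                .build();

            client.invokeEndpointWithResponseStream(request, handler).join();
        }
    }
}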

@jonathan-buttner
Contributor

> Looks good, I left a few suggestions. Just curious, why didn't we use SSE?
>
> SSE is a transport protocol that delimits the start/end of a payload, but AWS already has its own transport protocol, so we don't need another one on top of it. We could use SSE to give users the appearance that the payload will be transported as-is, but we won't be doing that for cross-node streaming (if/when we do that). Up to you, it's not a time-consuming change.

Ah ok. I was just curious about your reasoning. Looks good without 👍

@prwhelan prwhelan enabled auto-merge (squash) June 23, 2025 20:37
@prwhelan prwhelan merged commit aeb3718 into elastic:main Jun 23, 2025
32 checks passed
prwhelan added a commit to prwhelan/elasticsearch that referenced this pull request Jun 23, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jun 23, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025