[ML] Integrate OpenAi Chat Completion in SageMaker #127767
Conversation
SageMaker now supports Completion and Chat Completion using the OpenAI interfaces. Additionally:
- Fixed a bug related to timeouts being nullable; default to a 30s timeout
- Exposed the existing OpenAi request/response parsing logic for reuse
Hi @prwhelan, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
Left a couple questions
} catch (IOException e) {
    throw new ElasticsearchStatusException(
        "Failed to parse event from inference provider: {}",
        RestStatus.INTERNAL_SERVER_ERROR,
We've talked about switching to use 502s, do you think that'd be appropriate here?
I don't think so? Because this IOException is an error with our parsing logic, which may or may not mean there is something wrong with their response. It could be that we're out of date.
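For context, a minimal sketch of the split being discussed; parseEvent and ServerSentEvent are illustrative stand-ins, not names from this PR:

// Sketch: a parse failure on our side maps to 500; an error reported by the
// provider itself would be the stronger case for RestStatus.BAD_GATEWAY (502).
static ServerSentEvent parseOrThrow(byte[] payload) {
    try {
        return parseEvent(payload);
    } catch (IOException e) {
        // Our deserialization failed; the provider may still be healthy.
        throw new ElasticsearchStatusException(
            "Failed to parse event from inference provider: {}",
            RestStatus.INTERNAL_SERVER_ERROR,
            e,
            e.getMessage()
        );
    }
}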
  Map<String, Object> taskSettings,
  InputType inputType,
- TimeValue timeout,
+ @Nullable TimeValue timeout,
Hmm I thought the timeout is defaulted in the InferenceAction
Line 182 in a4a2714
this.inferenceTimeout = DEFAULT_TIMEOUT;
Can it be null here?
Yeah I believe I was hitting an issue when I was using curl, I think it can be null through this path:
Line 49 in a4a2714
var inferTimeout = parseTimeout(restRequest);
We should be defaulting it there too I think:
static TimeValue parseTimeout(RestRequest restRequest) {
    return restRequest.paramAsTime(InferenceAction.Request.TIMEOUT.getPreferredName(), InferenceAction.Request.DEFAULT_TIMEOUT);
}
I think we should consider it a bug if it's null once it gets to the infer() calls. We should make sure it's defaulted prior to those calls.
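A minimal sketch of that defaulting, assuming the fallback from the PR description sits just before the service call; resolveTimeout is an illustrative name, while InferenceAction.Request.DEFAULT_TIMEOUT is the constant referenced above:

static TimeValue resolveTimeout(@Nullable TimeValue requestTimeout) {
    // Fall back to the default rather than letting null reach the infer() calls.
    return requestTimeout != null ? requestTimeout : InferenceAction.Request.DEFAULT_TIMEOUT;
}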
@Override
public TransportVersion getMinimalSupportedVersion() {
    return TransportVersions.ML_INFERENCE_SAGEMAKER;
I think we need to create a new transport version right?
I don't think so, but I did anyway. In theory, since the name and parsing logic hadn't changed, both node versions should be able to parse the input/output. But in practice, I couldn't create a multi-node cluster with the same version (9.1.0) and different Docker hashes, so I have no way to verify this assumption.
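For reference, a hedged sketch of the mixed-cluster concern; the constant ML_INFERENCE_SAGEMAKER_CHAT_COMPLETION and the field names below are hypothetical, not taken from this PR:

@Override
public void writeTo(StreamOutput out) throws IOException {
    out.writeString(endpointName); // field every supported node version understands (illustrative)
    if (out.getTransportVersion().onOrAfter(TransportVersions.ML_INFERENCE_SAGEMAKER_CHAT_COMPLETION)) {
        // Anything new is gated so an older node never receives bytes it cannot read.
        out.writeOptionalString(api); // hypothetical new field
    }
}

Since the serialized fields themselves did not change here, such a gate would be redundant, which matches the reasoning above.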
        }
    ]
}
""".replaceAll("\\s+", "").replaceAll("\\n+", "") + "\n\n";
Would XContentHelper.stripWhitespace() work here?
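A hedged sketch of that suggestion; the JSON body is illustrative, not the actual test fixture:

static String expectedBody() throws IOException {
    // stripWhitespace parses and re-serializes the JSON, so only formatting whitespace is removed,
    // whereas the regex approach also strips whitespace inside string values.
    return XContentHelper.stripWhitespace("""
        { "choices": [ { "delta": { "content": "hello" } } ] }
        """) + "\n\n"; // the trailing blank line terminates the server-sent event
}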