-
Notifications
You must be signed in to change notification settings - Fork 1
Vertexai chatcompletion #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| return messageRoleLowered; | ||
| } | ||
|
|
||
| // TODO: Here is OK to throw an IOException? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better as an ElasticsearchStatusException with RestStatus.BAD_REQUEST since it is an unsupported configuration that the user has to take action on. Preferably, this is validated within GoogleVertexAiService but I'm okay with it being this late in the call chain as well
| builder.field(ROLE, messageRoleToGoogleVertexAiSupportedRole(message.role())); | ||
| builder.startArray(PARTS); | ||
| builder.startObject(); | ||
| builder.field(TEXT, message.content().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the toString() will work, message.content() can return one of a few different objects, example: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/unified/UnifiedChatCompletionRequestEntity.java#L61
…d tools choice with tests
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiServiceTests.java
Implemented basic unit testing. Will improve in the next commit. As of now, we want to find a way to mock certain parts of the initialization of the Google VertexAI service that trigger the authorization decorator, without using tools like powermock or changing too much the code.
Implemented a test case for persisted config with secrets.
…sticsearch into vertexai-chatcompletion
| public InferenceServiceResults parseResult(Request request, Flow.Publisher<HttpResult> flow) { | ||
| assert request.isStreaming() : "GoogleVertexAiUnifiedChatCompletionResponseHandler only supports streaming requests"; | ||
|
|
||
| var serverSentEventProcessor = new JsonArrayPartsEventProcessor(new JsonArrayPartsEventParser()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we send the request with the alt=sse query param, the API will respond in SSE, and then we can reuse the existing ServerSentEventProcessor: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googleaistudio/GoogleAiStudioResponseHandler.java#L90
That is at least what happens when I test the API with curl. We'd then have less code to maintain. JsonArrayPartsEventParser is cleverly written though well done.
| ActionListener<InferenceServiceResults> listener | ||
| ) { | ||
|
|
||
| var chatInputs = (UnifiedChatInput) inferenceInputs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| var chatInputs = (UnifiedChatInput) inferenceInputs; | |
| var chatInputs = inferenceInputs.castTo(UnifiedChatInput.class); |
If the types are somehow wrong, this will throw a decorated IllegalArgumentException rather than the ClassCastException
| private static final String FUNCTION_TYPE = "function"; | ||
|
|
||
| private final BiFunction<String, Exception, Exception> errorParser; | ||
| private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can delete the buffer code in this class - StreamingUnifiedChatCompletionResults now has a buffer internally (so we don't have to copy/paste the buffer code everywhere): elastic@b108e39
| } | ||
| } | ||
|
|
||
| public void testUnifiedCompletionInfer_WithGoogleVertexAiModel() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually go in GoogleVertexAiServiceTests
|
|
||
| @Override | ||
| public String getWriteableName() { | ||
| return NAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be registered in InferenceNamedWriteablesProvider. We don't have any tests to verify this explicitly, so it's hard to know/verify, but it'll come up in multi-node clusters if one node calls another node to call Vertex AI.
We just added a test case to help verify this, if you want, that you can extend:
|
@prwhelan thanks for all the feedback! We squashed all commits into a single one and made another PR here: elastic#128105 . Will work on your comments on that PR |
|
Closing, this feature has been merged here elastic#128105 |
Keeping this PR so I can track the progress and see the code. It's still in early development so a lot of things can change.
Once it's finished it will be squashed in one commit and submitted as a PR to the main elasticsearch repo