

@lhoet-google (Collaborator) commented Apr 30, 2025

Keeping this PR so I can track the progress and see the code. It's still in early development, so a lot of things can change.

Once it's finished, it will be squashed into one commit and submitted as a PR to the main elasticsearch repo.

return messageRoleLowered;
}

// TODO: Is it OK to throw an IOException here?


Might be better as an ElasticsearchStatusException with RestStatus.BAD_REQUEST since it is an unsupported configuration that the user has to take action on. Preferably, this is validated within GoogleVertexAiService but I'm okay with it being this late in the call chain as well
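To make the suggestion concrete, here is a minimal sketch of what that could look like, assuming the mapping lives in a helper like messageRoleToGoogleVertexAiSupportedRole; the class wrapper and the supported-role set below are illustrative, not the PR's actual code:

import org.elasticsearch.ElasticsearchStatusException;
import org.elasticsearch.rest.RestStatus;

import java.util.Locale;

class RoleMappingSketch {
    // Illustrative: reject unsupported roles with a 400 instead of an IOException.
    static String messageRoleToGoogleVertexAiSupportedRole(String messageRole) {
        var messageRoleLowered = messageRole.toLowerCase(Locale.ROOT);
        return switch (messageRoleLowered) {
            // Assumed supported roles; the real mapping in the PR may differ.
            case "user", "model" -> messageRoleLowered;
            case "assistant" -> "model";
            default -> throw new ElasticsearchStatusException(
                "Unsupported message role [{}] for Google Vertex AI chat completion",
                RestStatus.BAD_REQUEST,
                messageRole
            );
        };
    }
}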

builder.field(ROLE, messageRoleToGoogleVertexAiSupportedRole(message.role()));
builder.startArray(PARTS);
builder.startObject();
builder.field(TEXT, message.content().toString());


lhoet-google and others added 14 commits May 14, 2025 12:22
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiServiceTests.java
Implemented basic unit testing.
Will improve in the next commit.

As of now, we want to find a way to mock certain parts of the initialization of the Google VertexAI service that trigger the authorization decorator, without using tools like PowerMock or changing the code too much.
Implemented a test case for persisted config with secrets.
public InferenceServiceResults parseResult(Request request, Flow.Publisher<HttpResult> flow) {
assert request.isStreaming() : "GoogleVertexAiUnifiedChatCompletionResponseHandler only supports streaming requests";

var serverSentEventProcessor = new JsonArrayPartsEventProcessor(new JsonArrayPartsEventParser());


I think if we send the request with the alt=sse query param, the API will respond in SSE, and then we can reuse the existing ServerSentEventProcessor: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googleaistudio/GoogleAiStudioResponseHandler.java#L90

That is at least what happens when I test the API with curl, and we'd then have less code to maintain. JsonArrayPartsEventParser is cleverly written though, well done.
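To illustrate the alt=sse idea, here is a rough sketch; the endpoint path and the use of Apache's URIBuilder are assumptions about how the request URI gets built, not the PR's actual request code:

import org.apache.http.client.utils.URIBuilder;

import java.net.URI;
import java.net.URISyntaxException;

class VertexAiStreamingUriSketch {
    // Sketch: with alt=sse, streamGenerateContent responds as Server-Sent Events,
    // so the existing ServerSentEventProcessor could be reused for parsing.
    static URI buildStreamingUri(String location, String projectId, String modelId) throws URISyntaxException {
        return new URIBuilder().setScheme("https")
            .setHost(location + "-aiplatform.googleapis.com")
            .setPath(
                "/v1/projects/" + projectId + "/locations/" + location + "/publishers/google/models/" + modelId + ":streamGenerateContent"
            )
            .addParameter("alt", "sse")
            .build();
    }
}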

ActionListener<InferenceServiceResults> listener
) {

var chatInputs = (UnifiedChatInput) inferenceInputs;


Suggested change
var chatInputs = (UnifiedChatInput) inferenceInputs;
var chatInputs = inferenceInputs.castTo(UnifiedChatInput.class);

If the types are somehow wrong, this will throw a decorated IllegalArgumentException rather than a bare ClassCastException.
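For context, the kind of helper being suggested looks roughly like this (a sketch of the pattern, not the actual InferenceInputs implementation):

class InferenceInputsSketch {
    // Sketch: a type-checked cast that fails with a descriptive IllegalArgumentException
    // naming both types, instead of letting a bare ClassCastException surface.
    <T> T castTo(Class<T> clazz) {
        if (clazz.isInstance(this) == false) {
            throw new IllegalArgumentException(
                "Unable to convert inference inputs of type [" + getClass().getSimpleName() + "] to [" + clazz.getSimpleName() + "]"
            );
        }
        return clazz.cast(this);
    }
}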

private static final String FUNCTION_TYPE = "function";

private final BiFunction<String, Exception, Exception> errorParser;
private final Deque<StreamingUnifiedChatCompletionResults.ChatCompletionChunk> buffer = new LinkedBlockingDeque<>();


We can delete the buffer code in this class - StreamingUnifiedChatCompletionResults now has a buffer internally (so we don't have to copy/paste the buffer code everywhere): elastic@b108e39

}
}

public void testUnifiedCompletionInfer_WithGoogleVertexAiModel() throws IOException {


This should actually go in GoogleVertexAiServiceTests


@Override
public String getWriteableName() {
return NAME;


This should be registered in InferenceNamedWriteablesProvider. We don't have any tests to verify this explicitly, so it's hard to know/verify, but it'll come up in multi-node clusters if one node calls another node to call Vertex AI.

We just added a test case to help verify this that you can extend if you want:
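For reference, registering it in InferenceNamedWriteablesProvider typically amounts to adding an entry like the one below; the settings class and the category class shown are placeholders for whichever class defines this getWriteableName:

import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
import org.elasticsearch.inference.ServiceSettings;

import java.util.ArrayList;
import java.util.List;

class NamedWriteablesRegistrationSketch {
    // Sketch: register a reader under the same NAME returned by getWriteableName so
    // other nodes in a multi-node cluster can deserialize the object over the wire.
    static List<NamedWriteableRegistry.Entry> entries() {
        var namedWriteables = new ArrayList<NamedWriteableRegistry.Entry>();
        namedWriteables.add(
            new NamedWriteableRegistry.Entry(
                ServiceSettings.class,                             // assumed category class
                GoogleVertexAiChatCompletionServiceSettings.NAME,  // placeholder class name
                GoogleVertexAiChatCompletionServiceSettings::new   // reader from StreamInput
            )
        );
        return namedWriteables;
    }
}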

@lhoet-google (Collaborator, Author) commented May 19, 2025

@prwhelan thanks for all the feedback! We squashed all commits into a single one and made another PR here: elastic#128105. Will work on your comments on that PR.

@lhoet-google (Collaborator, Author)

Closing; this feature has been merged in elastic#128105.
