
[ML] Add configurable batch size to GoogleVertexAI to avoid hitting token limit during chunked inference #137288

@dan-rubinstein

Description

The EmbeddingRequestChunker batches chunks into a single request to a downstream service, up to a specified per-service batch size (see batching code). GoogleVertexAI imposes two limits on request size: one on the number of inputs (previously 5, now 250) and one on the total number of tokens across all inputs (20k tokens); these limits are documented by Google. In 8.19/9.1 we were asked to increase the batch size for GoogleVertexAI from 5 to 250 to reflect their new limits (see relevant PR). This change caused one user to start seeing token limit exceptions when ingesting large documents: where they previously sent at most 5 chunks of 250 words (~1330 tokens, assuming 1 token = 0.75 words), they now send at most 250 chunks of 250 words (~66500 tokens), which exceeds the 20k token per-request limit.
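For illustration only, here is a minimal Java sketch of how grouping chunks purely by input count can overshoot a total-token limit. This is not the actual EmbeddingRequestChunker code; the class name, helper methods, and the word-to-token ratio are assumptions taken from the estimate above.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchingSketch {

    // Rough estimate used above: 1 token ~= 0.75 words, so tokens ~= words / 0.75.
    private static final double WORDS_PER_TOKEN = 0.75;

    /** Split chunks into batches of at most maxBatchSize inputs, ignoring token counts. */
    static List<List<String>> batch(List<String> chunks, int maxBatchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i += maxBatchSize) {
            batches.add(chunks.subList(i, Math.min(i + maxBatchSize, chunks.size())));
        }
        return batches;
    }

    /** Estimate the token count of a batch from its total word count. */
    static long estimatedTokens(List<String> batch) {
        long words = batch.stream().mapToLong(c -> c.split("\\s+").length).sum();
        return Math.round(words / WORDS_PER_TOKEN);
    }

    public static void main(String[] args) {
        // 300 chunks of ~250 words each, as chunking a large document might produce.
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            chunks.add("word ".repeat(250).trim());
        }
        // With a batch size of 250, the first batch carries ~62,500 words,
        // far above a 20k tokens-per-request limit under this estimate.
        for (List<String> batch : batch(chunks, 250)) {
            System.out.println("inputs=" + batch.size() + " estTokens=" + estimatedTokens(batch));
        }
    }
}
```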

The purpose of this issue is to add a way for users to configure the batch size through a setting on the inference endpoint (we will need to decide whether this is a service setting or a task setting). This would let users unblock their calls if they hit the token limit when ingesting a sufficiently large document into a semantic_text field.
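As a rough sketch of what resolving such a setting could look like, the snippet below reads an optional batch size from the endpoint's settings map and validates it against the service maximum. The setting name "batch_size", the defaults, and the validation bounds are all assumptions for discussion, not a final design.

```java
import java.util.Map;

final class ConfigurableBatchSizeSketch {

    // Assumed values: 250 is the current GoogleVertexAI maximum inputs per request.
    static final int DEFAULT_BATCH_SIZE = 250;
    static final int MAX_BATCH_SIZE = 250;

    /** Read an optional "batch_size" from the endpoint's (service or task) settings. */
    static int resolveBatchSize(Map<String, Object> settings) {
        Object value = settings.get("batch_size");
        if (value == null) {
            return DEFAULT_BATCH_SIZE;
        }
        int batchSize = ((Number) value).intValue();
        if (batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException(
                "batch_size must be between 1 and " + MAX_BATCH_SIZE + ", got " + batchSize);
        }
        return batchSize;
    }

    public static void main(String[] args) {
        // A user hitting the 20k token limit could lower the batch size on their endpoint.
        System.out.println(resolveBatchSize(Map.of("batch_size", 20))); // 20
        System.out.println(resolveBatchSize(Map.of()));                 // 250 (default)
    }
}
```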
