diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc
index 6e7e0e66978..76386905294 100644
--- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc
+++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs.adoc
@@ -91,6 +91,72 @@
 It will not be initialized for you by default.
 You must opt-in, by passing a `boolean` for the appropriate constructor argument or, if using Spring Boot, setting the appropriate `initialize-schema` property to `true` in `application.properties` or `application.yml`.
 Check the documentation for the vector store you are using for the specific property name.
 
+== Batching Strategy
+
+When working with vector stores, it is often necessary to embed large numbers of documents.
+While it might seem straightforward to make a single call to embed all documents at once, this approach can lead to issues.
+Embedding models process text as tokens and have a maximum token limit, often referred to as the context window size.
+This limit restricts the amount of text that can be processed in a single embedding request.
+Attempting to embed too many tokens in one call can result in errors or truncated embeddings.
+
+To address this token limit, Spring AI implements a batching strategy.
+This approach breaks down large sets of documents into smaller batches that fit within the embedding model's maximum context window.
+Batching not only solves the token limit issue but can also lead to improved performance and more efficient use of API rate limits.
+
+Spring AI provides this functionality through the `BatchingStrategy` interface, which allows for processing documents in sub-batches based on their token counts.
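Conceptually, token-based sub-batching works like the following standalone sketch. It is an illustration only, not Spring AI's implementation: `Doc` is a stand-in for Spring AI's `Document`, and the rough 4-characters-per-token estimate replaces a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of token-based sub-batching. "Doc" and the
// ~4-characters-per-token heuristic are assumptions for this example,
// not Spring AI types or its real token-counting logic.
public class BatchingSketch {

    record Doc(String text) {}

    // Crude token estimate: roughly 4 characters per token.
    static int estimateTokens(Doc doc) {
        return Math.max(1, doc.text().length() / 4);
    }

    static List<List<Doc>> batch(List<Doc> docs, int maxTokensPerBatch) {
        List<List<Doc>> batches = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        int currentTokens = 0;
        for (Doc doc : docs) {
            int tokens = estimateTokens(doc);
            // A single document larger than the budget cannot be batched at all.
            if (tokens > maxTokensPerBatch) {
                throw new IllegalArgumentException("Single document exceeds the token limit");
            }
            // Close the current batch before it would overflow the budget.
            if (currentTokens + tokens > maxTokensPerBatch && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentTokens = 0;
            }
            current.add(doc);
            currentTokens += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("a".repeat(400)),  // ~100 tokens
                new Doc("b".repeat(400)),  // ~100 tokens
                new Doc("c".repeat(400))); // ~100 tokens
        // With a 150-token budget, each document lands in its own batch.
        System.out.println(batch(docs, 150).size()); // prints 3
    }
}
```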
+
+The core `BatchingStrategy` interface is defined as follows:
+
+[source,java]
+----
+public interface BatchingStrategy {
+    List<List<Document>> batch(List<Document> documents);
+}
+----
+
+This interface defines a single method, `batch`, which takes a list of documents and returns a list of document batches.
+
+=== Default Implementation: TokenCountBatchingStrategy
+
+Spring AI provides a default implementation called `TokenCountBatchingStrategy`.
+This strategy batches documents based on their token counts, ensuring that each batch does not exceed a calculated maximum input token count.
+
+Key features of `TokenCountBatchingStrategy`:
+
+1. Uses https://platform.openai.com/docs/guides/embeddings/embedding-models[OpenAI's max input token count] (8191) as the default upper limit.
+2. Incorporates a reserve percentage (default 10%) to provide a buffer for potential overhead.
+3. Calculates the actual max input token count as: `actualMaxInputTokenCount = originalMaxInputTokenCount * (1 - RESERVE_PERCENTAGE)`
+
+The strategy estimates the token count for each document, groups documents into batches without exceeding the maximum input token count, and throws an exception if a single document exceeds this limit.
+
+=== Using the BatchingStrategy
+
+The `BatchingStrategy` is used internally by `EmbeddingModel` implementations to optimize the embedding process.
+It automatically batches documents when calculating embeddings, which can lead to significant performance benefits, especially when dealing with large numbers of documents or APIs with token limitations.
+
+=== Customizing Batching Strategy
+
+While `TokenCountBatchingStrategy` provides a robust default implementation, you can customize the batching strategy to fit your specific needs.
+This can be done through Spring Boot's auto-configuration.
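Before wiring anything into Spring, a custom strategy class itself might look like the following fixed-size example. This is a standalone sketch under simplifying assumptions: `Doc` stands in for Spring AI's `Document`, and a real class would implement the `BatchingStrategy` interface rather than defining its own `batch` method.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of a custom strategy that groups a fixed number of
// documents per batch, ignoring token counts entirely. "Doc" is a
// hypothetical stand-in for Spring AI's Document type.
public class FixedSizeBatchingSketch {

    record Doc(String text) {}

    private final int batchSize;

    public FixedSizeBatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    public List<List<Doc>> batch(List<Doc> documents) {
        List<List<Doc>> batches = new ArrayList<>();
        for (int i = 0; i < documents.size(); i += batchSize) {
            // subList's end index is clamped so the final batch may be smaller.
            batches.add(documents.subList(i, Math.min(i + batchSize, documents.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        FixedSizeBatchingSketch strategy = new FixedSizeBatchingSketch(2);
        List<Doc> docs = List.of(new Doc("a"), new Doc("b"), new Doc("c"));
        // Three documents with a batch size of 2 yield two batches: [a, b] and [c].
        System.out.println(strategy.batch(docs).size()); // prints 2
    }
}
```

A fixed document count is simpler than token counting but offers no protection against oversized batches, which is why the token-based default is generally the safer choice.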
+
+To customize the batching strategy, define a `BatchingStrategy` bean in your Spring Boot application:
+
+[source,java]
+----
+@Configuration
+public class EmbeddingConfig {
+
+    @Bean
+    public BatchingStrategy customBatchingStrategy() {
+        return new CustomBatchingStrategy();
+    }
+}
+----
+
+This custom `BatchingStrategy` will then be automatically used by the `EmbeddingModel` implementations in your application.
+
+NOTE: Vector stores supported by Spring AI are configured to use the default `TokenCountBatchingStrategy`.
+SAP Hana vector store is not currently configured for batching.
+
 == Available Implementations
 These are the available implementations of the `VectorStore` interface: