Update VectorStore docs to use inner builder class

markpollack · leijendary · commit 37a20da526cf · 2025-01-24T01:11:17.000+01:00
Updated the manual configuration examples in the following docs to show
the correct usage of the inner builder class:

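- apache-cassandra.adoc: Show builder(embeddingModel) with session, schema, and connection options
- azure-cosmos-db.adoc: Show builder(cosmosClient, embeddingModel) with container and throughput options
- azure.adoc: Show builder(searchIndexClient, embeddingModel) with all available options
- chroma.adoc: Show builder(chromaApi, embeddingModel) with collection config
- oracle.adoc: Show builder(jdbcTemplate, embeddingModel) with database options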

The examples now reflect the current implementation, where the static
builder(...) method takes its required collaborators (the embedding model
and, for most stores, the client) as arguments instead of builder setters.
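
For reviewers unfamiliar with the pattern, here is a minimal sketch of an
inner builder class whose static builder(...) method takes the required
collaborators. Every name in it (BuilderSketch, SketchVectorStore,
SketchClient, SketchEmbeddingModel, the default collection name) is
hypothetical and chosen for illustration only; it is not the Spring AI
implementation.

```java
import java.util.Objects;

// Hypothetical sketch; none of these types are Spring AI classes.
final class BuilderSketch {

    // Stand-ins for a store client and an embedding model.
    interface SketchClient {}
    interface SketchEmbeddingModel {}

    static final class SketchVectorStore {
        private final SketchClient client;
        private final SketchEmbeddingModel embeddingModel;
        private final String collectionName;
        private final boolean initializeSchema;

        private SketchVectorStore(Builder builder) {
            this.client = builder.client;
            this.embeddingModel = builder.embeddingModel;
            this.collectionName = builder.collectionName;
            this.initializeSchema = builder.initializeSchema;
        }

        // Required collaborators are arguments, so they cannot be forgotten.
        static Builder builder(SketchClient client, SketchEmbeddingModel embeddingModel) {
            return new Builder(client, embeddingModel);
        }

        String collectionName() {
            return collectionName;
        }

        boolean initializeSchema() {
            return initializeSchema;
        }

        // Inner builder: required values are final, optional ones have defaults.
        static final class Builder {
            private final SketchClient client;
            private final SketchEmbeddingModel embeddingModel;
            private String collectionName = "DefaultCollection";
            private boolean initializeSchema = false;

            private Builder(SketchClient client, SketchEmbeddingModel embeddingModel) {
                this.client = Objects.requireNonNull(client, "client");
                this.embeddingModel = Objects.requireNonNull(embeddingModel, "embeddingModel");
            }

            Builder collectionName(String collectionName) {
                this.collectionName = Objects.requireNonNull(collectionName, "collectionName");
                return this;
            }

            Builder initializeSchema(boolean initializeSchema) {
                this.initializeSchema = initializeSchema;
                return this;
            }

            SketchVectorStore build() {
                return new SketchVectorStore(this);
            }
        }
    }

    public static void main(String[] args) {
        SketchVectorStore store = SketchVectorStore
            .builder(new SketchClient() {}, new SketchEmbeddingModel() {})
            .collectionName("TestCollection")
            .initializeSchema(true)
            .build();
        System.out.println(store.collectionName() + " " + store.initializeSchema());
    }
}
```

In this sketch a missing client or embedding model fails at compile time or
at the builder(...) call, rather than surfacing later in build().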

Signed-off-by: leijendary <jonathanleijendekker@gmail.com>
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/apache-cassandra.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/apache-cassandra.adoc
@@ -97,90 +97,109 @@ You can use the following properties in your Spring Boot configuration to custom
 
 === Basic Usage
 
-Create a CassandraVectorStore instance using the builder pattern:
+Create a CassandraVectorStore instance as a Spring Bean:
 
 [source,java]
 ----
 @Bean
 public VectorStore vectorStore(CqlSession session, EmbeddingModel embeddingModel) {
-    return CassandraVectorStore.builder()
+    return CassandraVectorStore.builder(embeddingModel)
         .session(session)
-        .embeddingModel(embeddingModel)
         .keyspace("my_keyspace")
         .table("my_vectors")
         .build();
 }
 ----
 
-[NOTE]
-====
-The default configuration connects to Cassandra at `localhost:9042` and will automatically create a default schema in keyspace `springframework`, table `ai_vector_store`.
-====
-
-[NOTE]
-====
-The Cassandra Java Driver is easiest configured via an `application.conf` file on the classpath. More info https://github.com/apache/cassandra-java-driver/tree/4.x/manual/core/configuration[here].
-====
-
-Then in your main code, create and add some documents:
-
-[source,java]
-----
-List<Document> documents = List.of(
-    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", 
-        Map.of("country", "UK", "year", 2020)),
-    new Document("The World is Big and Salvation Lurks Around the Corner", 
-        Map.of()),
-    new Document("You walk forward facing the past and you turn back toward the future.", 
-        Map.of("country", "NL", "year", 2023)));
-
-vectorStore.add(documents);
-----
-
-And retrieve documents similar to a query:
+Once you have the vector store instance, you can add documents and perform searches:
 
 [source,java]
 ----
-List<Document> results = vectorStore.similaritySearch(
-    SearchRequest.builder().query("Spring").topK(5).build());
-----
+// Add documents
+vectorStore.add(List.of(
+    new Document("1", "content1", Map.of("key1", "value1")),
+    new Document("2", "content2", Map.of("key2", "value2"))
+));
 
-You can also limit results based on a similarity threshold:
-
-[source,java]
-----
+// Search with filters
 List<Document> results = vectorStore.similaritySearch(
-    SearchRequest.builder().query("Spring")
-        .topK(5)
-        .similarityThreshold(0.5d).build());
+    SearchRequest.builder().query("search text")
+        .topK(5)
+        .similarityThreshold(0.7d)
+        .filterExpression("metadata.key1 == 'value1'").build()
+);
 ----
 
 === Advanced Configuration
 
-For more complex scenarios, the builder pattern offers extensive configuration options:
+For more complex use cases, you can configure additional settings in your Spring Bean:
 
 [source,java]
 ----
 @Bean
 public VectorStore vectorStore(CqlSession session, EmbeddingModel embeddingModel) {
     return CassandraVectorStore.builder(embeddingModel)
+        .session(session)
         .keyspace("my_keyspace")
         .table("my_vectors")
-        .partitionKeys(List.of(new SchemaColumn("id", DataTypes.TEXT)))
-        .clusteringKeys(List.of(new SchemaColumn("timestamp", DataTypes.TIMESTAMP)))
+        // Configure primary keys
+        .partitionKeys(List.of(
+            new SchemaColumn("id", DataTypes.TEXT),
+            new SchemaColumn("category", DataTypes.TEXT)
+        ))
+        .clusteringKeys(List.of(
+            new SchemaColumn("timestamp", DataTypes.TIMESTAMP)
+        ))
+        // Add metadata columns with optional indexing
         .addMetadataColumns(
             new SchemaColumn("category", DataTypes.TEXT, SchemaColumnTags.INDEXED),
             new SchemaColumn("score", DataTypes.DOUBLE)
         )
+        // Customize column names
         .contentColumnName("text")
         .embeddingColumnName("vector")
+        // Performance tuning
         .fixedThreadPoolExecutorSize(32)
+        // Schema management
         .disallowSchemaChanges(false)
+        // Custom batching strategy
         .batchingStrategy(new TokenCountBatchingStrategy())
         .build();
 }
 ----
 
+=== Connection Configuration
+
+There are two ways to configure the connection to Cassandra:
+
+* Using an injected CqlSession (recommended):
+
+[source,java]
+----
+@Bean
+public VectorStore vectorStore(CqlSession session, EmbeddingModel embeddingModel) {
+    return CassandraVectorStore.builder(embeddingModel)
+        .session(session)
+        .keyspace("my_keyspace")
+        .table("my_vectors")
+        .build();
+}
+----
+
+* Using connection details directly in the builder:
+
+[source,java]
+----
+@Bean
+public VectorStore vectorStore(EmbeddingModel embeddingModel) {
+    return CassandraVectorStore.builder(embeddingModel)
+        .contactPoint(new InetSocketAddress("localhost", 9042))
+        .localDatacenter("datacenter1")
+        .keyspace("my_keyspace")
+        .build();
+}
+----
+
 === Metadata Filtering
 
 You can leverage the generic, portable metadata filters with the CassandraVectorStore. For metadata columns to be searchable they must be either primary keys or SAI indexed. To make non-primary-key columns indexed, configure the metadata column with the `SchemaColumnTags.INDEXED`.
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/azure-cosmos-db.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/azure-cosmos-db.adoc
@@ -148,84 +148,51 @@ List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().qu
 
 The following code demonstrates how to set up the `CosmosDBVectorStore` without relying on auto-configuration:
 
-```java
-package com.example.demo;
-
-import com.azure.cosmos.CosmosAsyncClient;
-import com.azure.cosmos.CosmosClientBuilder;
-import io.micrometer.observation.ObservationRegistry;
-import org.springframework.ai.document.Document;
-import org.springframework.ai.embedding.EmbeddingModel;
-import org.springframework.ai.transformers.TransformersEmbeddingModel;
-import org.springframework.ai.vectorstore.cosmosdb.CosmosDBVectorStore;
-import org.springframework.ai.vectorstore.CosmosDBVectorStoreConfig;
-import org.springframework.ai.vectorstore.VectorStore;
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.boot.CommandLineRunner;
-import org.springframework.boot.SpringApplication;
-import org.springframework.boot.autoconfigure.SpringBootApplication;
-import org.springframework.context.annotation.Bean;
-import org.springframework.context.annotation.Lazy;
-
-import java.util.List;
-import java.util.Map;
-import java.util.UUID;
-
-@SpringBootApplication
-public class DemoApplication implements CommandLineRunner {
-
-    @Lazy
-    @Autowired
-    private VectorStore vectorStore;
-
-    @Lazy
-    @Autowired
-    private EmbeddingModel embeddingModel;
-
-    public static void main(String[] args) {
-        SpringApplication.run(DemoApplication.class, args);
-    }
-
-    @Override
-    public void run(String... args) throws Exception {
-        Document document1 = new Document(UUID.randomUUID().toString(), "Sample content1", Map.of("key1", "value1"));
-        Document document2 = new Document(UUID.randomUUID().toString(), "Sample content2", Map.of("key2", "value2"));
-		this.vectorStore.add(List.of(document1, document2));
-
-        List<Document> results = this.vectorStore.similaritySearch(SearchRequest.builder().query("Sample content").topK(1).build());
-        log.info("Search results: {}", results);
-    }
+[source,java]
+----
+@Bean
+public VectorStore vectorStore(ObservationRegistry observationRegistry, EmbeddingModel embeddingModel) {
+    // Create the Cosmos DB client
+    CosmosAsyncClient cosmosClient = new CosmosClientBuilder()
+            .endpoint(System.getenv("COSMOSDB_AI_ENDPOINT"))
+            .key(System.getenv("COSMOSDB_AI_KEY"))
+            .userAgentSuffix("SpringAI-CDBNoSQL-VectorStore")
+            .gatewayMode()
+            .buildAsyncClient();
+
+    // Create and configure the vector store
+    return CosmosDBVectorStore.builder(cosmosClient, embeddingModel)
+            .databaseName("test-database")
+            .containerName("test-container")
+            // Configure metadata fields for filtering
+            .metadataFields(List.of("country", "year", "city"))
+            // Set the partition key path (optional)
+            .partitionKeyPath("/id")
+            // Configure performance settings
+            .vectorStoreThroughput(1000)
+            .vectorDimensions(1536)  // Match your embedding model's dimensions
+            // Add custom batching strategy (optional)
+            .batchingStrategy(new TokenCountBatchingStrategy())
+            // Add observation registry for metrics
+            .observationRegistry(observationRegistry)
+            .build();
+}
 
-    @Bean
-    public ObservationRegistry observationRegistry() {
-        return ObservationRegistry.create();
-    }
+@Bean
+public EmbeddingModel embeddingModel() {
+    return new TransformersEmbeddingModel();
+}
+----
 
-    @Bean
-    public VectorStore vectorStore(ObservationRegistry observationRegistry) {
-
-        CosmosAsyncClient cosmosClient = new CosmosClientBuilder()
-                .endpoint(System.getenv("COSMOSDB_AI_ENDPOINT"))
-                .userAgentSuffix("SpringAI-CDBNoSQL-VectorStore")
-                .key(System.getenv("COSMOSDB_AI_KEY"))
-                .gatewayMode()
-                .buildAsyncClient();
-
-		return CosmosDBVectorStore.builder(cosmosClient, this.embeddingModel)
-				.databaseName("test-database")
-				.containerName("test-container")
-				.metadataFields(List.of("country", "year", "city"))
-				.vectorStoreThroughput(1000)
-                .observationRegistry(observationRegistry)
-				.build();
-    }
+The main builder options used in this configuration are:
 
-    @Bean
-    public EmbeddingModel embeddingModel() {
-        return new TransformersEmbeddingModel();
-    }
-}
-```
+* `databaseName`: The name of your Cosmos DB database
+* `containerName`: The name of your container within the database
+* `partitionKeyPath`: The path for the partition key (e.g., "/id")
+* `metadataFields`: List of metadata fields that will be used for filtering
+* `vectorStoreThroughput`: The throughput (RU/s) for the vector store container
+* `vectorDimensions`: The number of dimensions for your vectors (should match your embedding model)
+* `batchingStrategy`: Strategy for batching document operations (optional)
 
 == Manual Dependency Setup
 
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/azure.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/azure.adoc
@@ -138,6 +138,9 @@ public VectorStore vectorStore(SearchIndexClient searchIndexClient, EmbeddingMod
     // in the similarity search filters.
     .filterMetadataFields(List.of(MetadataField.text("country"), MetadataField.int64("year"),
             MetadataField.date("activationDate")))
+    .defaultTopK(5)
+    .defaultSimilarityThreshold(0.7)
+    .indexName("spring-ai-document-index")
     .build();
 }
 ----
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/chroma.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/chroma.adoc
@@ -232,9 +232,9 @@ Integrate with OpenAI's embeddings by adding the Spring Boot OpenAI starter to y
 @Bean
 public VectorStore chromaVectorStore(EmbeddingModel embeddingModel, ChromaApi chromaApi) {
  return ChromaVectorStore.builder(chromaApi, embeddingModel)
-				.collectionName("TestCollection")
-				.initializeSchema(true)
-				.build();
+    .collectionName("TestCollection")
+    .initializeSchema(true)
+    .build();
 }
 ----
 
@@ -272,4 +272,3 @@ docker run -it --rm --name chroma -p 8000:8000 ghcr.io/chroma-core/chroma:0.5.20
 ```
 
 Starts a chroma store at <http://localhost:8000/api/v1>
-
diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/oracle.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/oracle.adoc
@@ -184,7 +184,14 @@ To configure the `OracleVectorStore` in your application, you can use the follow
 ----
 @Bean
 public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
-	return new OracleVectorStore(jdbcTemplate, embeddingModel, true);
+    return OracleVectorStore.builder(jdbcTemplate, embeddingModel)
+        .tableName("my_vectors")
+        .indexType(OracleVectorStoreIndexType.IVF)
+        .distanceType(OracleVectorStoreDistanceType.COSINE)
+        .dimensions(1536)
+        .searchAccuracy(95)
+        .initializeSchema(true)
+        .build();
 }
 ----
 
@@ -199,5 +206,3 @@ You can then connect to the database using:
 ----
 sql mlops/mlops@localhost/freepdb1
 ----
-
-
diff --git a/vector-stores/spring-ai-typesense-store/src/main/java/org/springframework/ai/vectorstore/typesense/TypesenseVectorStore.java b/vector-stores/spring-ai-typesense-store/src/main/java/org/springframework/ai/vectorstore/typesense/TypesenseVectorStore.java