| 1 | +# SQL Server Database Embedding Store |
| 2 | + |
| 3 | +This module implements `EmbeddingStore` using SQL Server Database. |
| 4 | + |
| 5 | +## Requirements |
| 6 | +- SQL Server 2025 or newer |
| 7 | + |
| 8 | +### Supported Java Versions |
| 9 | + |
| 10 | +Java >= 17 |
| 11 | + |
| 12 | +## Maven Dependency |
| 13 | + |
| 14 | +```xml |
| 15 | +<dependency> |
| 16 | + <groupId>dev.langchain4j</groupId> |
| 17 | + <artifactId>langchain4j-community-sqlserver</artifactId> |
| 18 | + <version>1.9.0-beta16-SNAPSHOT</version> |
| 19 | +</dependency> |
| 20 | +``` |
| 21 | + |
| 22 | +## APIs |
| 23 | + |
| 24 | +- `SQLServerEmbeddingStore` |
| 25 | + |
| 26 | +## Usage |
| 27 | + |
| 28 | +Instances of this store can be created by configuring a builder. The builder |
| 29 | +requires that a DataSource and an embedding table be provided. |
| 30 | + |
| 31 | +It is recommended to configure a DataSource which pools connections, such as the |
| 32 | +Universal Connection Pool or Hikari. A connection pool will avoid the latency of |
| 33 | +repeatedly creating new database connections. |
| 34 | + |
| 35 | +### Examples of Embedding Store Configuration |
| 36 | + |
| 37 | +If an embedding table already exists in your database, provide the table configuration: |
| 38 | + |
| 39 | +```java |
| 40 | +EmbeddingStore<TextSegment> embeddingStore = SQLServerEmbeddingStore.dataSourceBuilder() |
| 41 | + .dataSource(myDataSource) |
| 42 | + .embeddingTable(EmbeddingTable.builder() |
| 43 | + .name("my_embedding_table") |
| 44 | + .dimension(384) // Must specify dimension |
| 45 | + .build()) |
| 46 | + .build(); |
| 47 | +``` |
| 48 | + |
| 49 | +If the table does not already exist, it can be created by setting the create option: |
| 50 | + |
| 51 | +```java |
| 52 | +EmbeddingStore<TextSegment> embeddingStore = SQLServerEmbeddingStore.dataSourceBuilder() |
| 53 | + .dataSource(myDataSource) |
| 54 | + .embeddingTable(EmbeddingTable.builder() |
| 55 | + .name("my_embedding_table") |
| 56 | + .createOption(CreateOption.CREATE) // Use CreateOption.CREATE_OR_REPLACE to replace the existing table |
| 57 | + .dimension(384) // Must specify dimension |
| 58 | + .build()) |
| 59 | + .build(); |
| 60 | +``` |
| 61 | + |
| 62 | +If the columns of your existing table do not match the predefined column names |
| 63 | +or you would like to use different column names, you can customize the table configuration: |
| 64 | + |
| 65 | +```java |
| 66 | +SQLServerEmbeddingStore embeddingStore = |
| 67 | +SQLServerEmbeddingStore.dataSourceBuilder() |
| 68 | + .dataSource(myDataSource) |
| 69 | + .embeddingTable(EmbeddingTable.builder() |
| 70 | + .createOption(CreateOption.CREATE_OR_REPLACE) |
| 71 | + .name("my_embedding_table") |
| 72 | + .idColumn("id_column_name") |
| 73 | + .embeddingColumn("embedding_column_name") |
| 74 | + .textColumn("text_column_name") |
| 75 | + .metadataColumn("metadata_column_name") |
| 76 | + .dimension(1024) |
| 77 | + .build()) |
| 78 | + .build(); |
| 79 | +``` |
| 80 | + |
| 81 | +You can also configure the SQL Server connection directly without providing a DataSource: |
| 82 | + |
| 83 | +```java |
| 84 | +SQLServerEmbeddingStore embeddingStore = |
| 85 | +SQLServerEmbeddingStore.connectionBuilder() |
| 86 | + .host("localhost") |
| 87 | + .port(1433) |
| 88 | + .database("MyDatabase") |
| 89 | + .userName("myuser") |
| 90 | + .password("mypassword") |
| 91 | + .embeddingTable(EmbeddingTable.builder() |
| 92 | + .name("embeddings") |
| 93 | + .createOption(CreateOption.CREATE_OR_REPLACE) |
| 94 | + .dimension(384) |
| 95 | + .build()) |
| 96 | + .build(); |
| 97 | +``` |
| 98 | + |
| 99 | +### Embeddings table schema |
| 100 | + |
| 101 | +By default, the embedding table will have the following columns: |
| 102 | + |
| 103 | +| Name | Type | Description | |
| 104 | +| ---- |-------------------| ----------- | |
| 105 | +| id | NVARCHAR(36) | Primary key. Stores the UUID strings generated by the embedding store | |
| 106 | +| embedding | VECTOR(dimension) | Stores the embedding using SQL Server 2025 native vector type | |
| 107 | +| text | NVARCHAR(MAX) | Stores the text segment | |
| 108 | +| metadata | JSON | Stores the metadata using SQL Server 2025 native JSON data type | |
| 109 | + |
| 110 | + |
| 111 | +## Important Notes |
| 112 | + |
| 113 | +### Numeric Types |
| 114 | +All numeric values are written as JSON strings in the metadata field to avoid overflow issues with large numbers such as `Long.MAX_VALUE`. |
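The following is a minimal sketch of why string encoding preserves large values. It assumes, as an illustration only, that a JSON reader may treat numbers as IEEE 754 doubles; the README does not say this is the exact failure the module guards against. The class and method names are hypothetical and are not part of the module.

```java
public class MetadataNumberSketch {

    // Hypothetical: simulates a reader that treats every JSON number as a double.
    static long roundTripThroughDouble(long value) {
        return (long) (double) value;
    }

    // Hypothetical: simulates writing the number as a JSON string and parsing it back.
    static long roundTripThroughString(long value) {
        String stored = Long.toString(value);
        return Long.parseLong(stored);
    }

    public static void main(String[] args) {
        // 2^53 + 1 is the first positive long that a double cannot represent exactly.
        long value = 9_007_199_254_740_993L;

        System.out.println(roundTripThroughDouble(value) == value); // false
        System.out.println(roundTripThroughString(value) == value); // true
        System.out.println(roundTripThroughString(Long.MAX_VALUE) == Long.MAX_VALUE); // true
    }
}
```

The string round trip is exact for every `long`, at the cost of requiring a conversion back to a numeric type when the metadata is read or compared.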
| 115 | + |
| 116 | +### Vector Storage and Similarity |
| 117 | +SQL Server 2025+ supports a native VECTOR data type, and this module uses the [VECTOR_DISTANCE](https://learn.microsoft.com/en-us/sql/t-sql/functions/vector-distance-transact-sql?view=sql-server-ver17) function to compute similarity. |
| 118 | +This module supports the following metrics for the `VECTOR_DISTANCE` function: |
| 119 | + |
| 120 | +- **COSINE**: Cosine similarity (default) |
| 121 | +- **EUCLIDEAN**: Euclidean distance. This metric requires an additional calculation to convert the distance into a score. |
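As a rough illustration of how a distance can be turned into a relevance score in the range 0 to 1, consider the sketch below. It assumes cosine distance lies in `[0, 2]` and maps it linearly; the Euclidean mapping `1 / (1 + d)` is a common convention chosen here purely as an assumption, and it may differ from what the module actually computes. All names are hypothetical.

```java
public class DistanceScoreSketch {

    // Cosine distance is 1 - cosine similarity, so it lies in [0, 2].
    // A linear mapping sends 0 (identical direction) to 1.0 and 2 (opposite) to 0.0.
    static double scoreFromCosineDistance(double distance) {
        return 1.0 - distance / 2.0;
    }

    // Assumed mapping for illustration only: Euclidean distance is unbounded,
    // so 1 / (1 + d) squeezes [0, infinity) into (0, 1].
    static double scoreFromEuclideanDistance(double distance) {
        return 1.0 / (1.0 + distance);
    }

    // Plain Euclidean distance between two vectors of equal length.
    static double euclideanDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] query = {0f, 0f};
        float[] stored = {3f, 4f};
        double d = euclideanDistance(query, stored); // 5.0
        System.out.println(scoreFromEuclideanDistance(d));
        System.out.println(scoreFromCosineDistance(0.0)); // 1.0
    }
}
```

Both mappings are monotonically decreasing in the distance, so ranking by score gives the same order as ranking by ascending distance.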
| 122 | + |
| 123 | +### JSON Metadata Support |
| 124 | + |
| 125 | +SQL Server 2025 provides native JSON data type support and JSON indexing capabilities. The module |
| 126 | +uses the native JSON data type for metadata storage and supports creating JSON indexes for |
| 127 | +optimized metadata filtering using the [JSON_VALUE](https://learn.microsoft.com/en-us/sql/t-sql/functions/json-value-transact-sql?view=sql-server-ver17) function. |
| 128 | + |
| 129 | +You can configure JSON index creation for specific metadata keys, optionally indicating the order of the keys: |
| 130 | + |
| 131 | +```java |
| 132 | +EmbeddingTable embeddingTable = EmbeddingTable.builder() |
| 133 | + .name("test_table") |
| 134 | + .createOption(CreateOption.CREATE_OR_REPLACE) |
| 135 | + .dimension(4) |
| 136 | + .build(); |
| 137 | + |
| 138 | +SQLServerEmbeddingStore embeddingStore = |
| 139 | + SQLServerEmbeddingStore.dataSourceBuilder() |
| 140 | + .dataSource(myDataSource) |
| 141 | + .embeddingTable(embeddingTable) |
| 142 | + .addIndex(Index.jsonIndexBuilder() |
| 143 | + .createOption(CreateOption.CREATE_OR_REPLACE) |
| 144 | + .key("author", String.class, JSONIndexBuilder.Order.ASC) |
| 145 | + .key("year", Integer.class) |
| 146 | + .build() |
| 147 | + ) |
| 148 | + .build(); |
| 149 | +``` |
| 150 | + |
| 151 | +## Limitations |
| 152 | + |
| 153 | +- Vector indexing performance depends on data size and distribution |
| 154 | +- DiskANN indexes on the vector column are not supported |
| 155 | +- The database must use a case-sensitive collation for string comparisons on metadata to be case-sensitive |
| 156 | +- The DOT distance metric is not supported |