
Commit 3a7c72b

Add SQLServer 2025 embedding store integration (#432)
* Add SQLServer 2025 embedding store integration
* Add SQLServer 2025 embedding store integration
  Fixed SQLServerEmbeddingStoreRemovalIT
* Add SQLServer 2025 embedding store integration
  Make format fixes
* Add SQLServer 2025 embedding store integration
  renamed EmbeddingStoreUtil and moved to have a package scope
* Avoided SQL injection with EmbeddingTable properties
* Created new SQLServerLangChain4jException extending LangChain4jException
1 parent 5b5b542 commit 3a7c72b

22 files changed, 2776 additions and 0 deletions
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# SQL Server Database Embedding Store

This module implements `EmbeddingStore` using SQL Server Database.

## Requirements

- SQL Server 2025 or newer

### Supported Java Versions

Java >= 17

## Maven Dependency

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-sqlserver</artifactId>
    <version>1.9.0-beta16-SNAPSHOT</version>
</dependency>
```

## APIs

- `SQLServerEmbeddingStore`

## Usage

Instances of this store can be created by configuring a builder. The builder
requires that a `DataSource` and an embedding table be provided.

It is recommended to configure a `DataSource` which pools connections, such as the
Universal Connection Pool or Hikari. A connection pool will avoid the latency of
repeatedly creating new database connections.

### Examples of Embedding Store Configuration

If an embedding table already exists in your database, provide the table configuration:

```java
EmbeddingStore<TextSegment> embeddingStore = SQLServerEmbeddingStore.dataSourceBuilder()
        .dataSource(myDataSource)
        .embeddingTable(EmbeddingTable.builder()
                .name("my_embedding_table")
                .dimension(384) // Must specify dimension
                .build())
        .build();
```
48+
49+
If the table does not already exist, it can be created by setting the create option:
50+
51+
```java
52+
EmbeddingStore<TextSegment> embeddingStore = SQLServerEmbeddingStore.dataSourceBuilder()
53+
.dataSource(myDataSource)
54+
.embeddingTable(EmbeddingTable.builder()
55+
.name("my_embedding_table")
56+
.createOption(CreateOption.CREATE) // Use CreateOption.CREATE_OR_REPLACE to replace the existing table
57+
.dimension(384) // Must specify dimension
58+
.build())
59+
.build();
60+
```
61+
62+
If the columns of your existing table do not match the predefined column names
63+
or you would like to use different column names, you can customize the table configuration:
64+
65+
```java
66+
SQLServerEmbeddingStore embeddingStore =
67+
SQLServerEmbeddingStore.dataSourceBuilder()
68+
.dataSource(myDataSource)
69+
.embeddingTable(EmbeddingTable.builder()
70+
.createOption(CreateOption.CREATE_OR_REPLACE)
71+
.name("my_embedding_table")
72+
.idColumn("id_column_name")
73+
.embeddingColumn("embedding_column_name")
74+
.textColumn("text_column_name")
75+
.metadataColumn("metadata_column_name")
76+
.dimension(1024)
77+
.build())
78+
.build();
79+
```
80+
81+
You can also configure the SQL Server connection directly without providing a DataSource:
82+
83+
```java
84+
SQLServerEmbeddingStore embeddingStore =
85+
SQLServerEmbeddingStore.connectionBuilder()
86+
.host("localhost")
87+
.port(1433)
88+
.database("MyDatabase")
89+
.userName("myuser")
90+
.password("mypassword")
91+
.embeddingTable(EmbeddingTable.builder()
92+
.name("embeddings")
93+
.createOption(CreateOption.CREATE_OR_REPLACE)
94+
.dimension(384)
95+
.build())
96+
.build();
97+
```
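
For orientation, the host, port, and database values in the connection example map naturally onto a Microsoft JDBC driver connection URL. The following is a minimal sketch of that URL format using only the JDK. `JdbcUrlSketch` and `jdbcUrl` are hypothetical names, the `encrypt=true` property is an assumption, and this commit does not show whether `connectionBuilder()` assembles its connection this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the module. */
final class JdbcUrlSketch {

    /** Builds a URL of the form jdbc:sqlserver://host:port;key=value;... */
    static String jdbcUrl(String host, int port, String database) {
        // Connection properties are appended as ;key=value pairs (assumed set of properties).
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("databaseName", database);
        properties.put("encrypt", "true");

        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
                .append(host)
                .append(':')
                .append(port);
        properties.forEach((key, value) -> url.append(';').append(key).append('=').append(value));
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", 1433, "MyDatabase"));
    }
}
```

A URL like this could be handed to any pooled `DataSource` implementation and then passed to `dataSourceBuilder()` instead.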
98+
99+
### Embeddings table schema
100+
101+
By default, the embedding table will have the following columns:
102+
103+
| Name | Type | Description |
104+
| ---- |-------------------| ----------- |
105+
| id | NVARCHAR(36) | Primary key. Used to store UUID strings which are generated when the embedding store |
106+
| embedding | VECTOR(dimension) | Stores the embedding using SQL Server 2025 native vector type |
107+
| text | NVARCHAR(MAX) | Stores the text segment |
108+
| metadata | JSON | Stores the metadata using SQL Server 2025 native JSON data type |
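
To make the default layout concrete, here is a rough sketch of T-SQL that would produce a table with these columns, and of how the three `CreateOption` values could translate into statements. It is an assumption for illustration only: `EmbeddingTableDdlSketch`, its nested `CreateOption` stand-in, and the exact DDL text are hypothetical, and the statements the module really issues may differ.

```java
/** Illustrative only: the DDL the module actually issues may differ. */
final class EmbeddingTableDdlSketch {

    /** Local stand-in mirroring the module's CreateOption enum. */
    enum CreateOption { CREATE_NONE, CREATE, CREATE_OR_REPLACE }

    /** Wraps an identifier in brackets, doubling any closing bracket (T-SQL quoting). */
    static String quote(String identifier) {
        return "[" + identifier.replace("]", "]]") + "]";
    }

    /** Returns T-SQL for the default column layout, or an empty string for CREATE_NONE. */
    static String ddl(String table, int dimension, CreateOption option) {
        if (option == CreateOption.CREATE_NONE) {
            return ""; // the table is expected to exist already
        }
        String create = "CREATE TABLE " + quote(table) + " ("
                + "[id] NVARCHAR(36) NOT NULL PRIMARY KEY, "
                + "[embedding] VECTOR(" + dimension + "), "
                + "[text] NVARCHAR(MAX), "
                + "[metadata] JSON)";
        if (option == CreateOption.CREATE_OR_REPLACE) {
            // Drop any existing table first, then create it again.
            return "DROP TABLE IF EXISTS " + quote(table) + "; " + create;
        }
        return create;
    }

    public static void main(String[] args) {
        System.out.println(ddl("my_embedding_table", 384, CreateOption.CREATE_OR_REPLACE));
    }
}
```

Quoting the table name before concatenating it into the statement is one simple way to keep a configured name from being interpreted as SQL.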
109+
110+
111+
## Important Notes
112+
113+
### Numeric Types
114+
All number values are written as JSON Strings in the metadata fields to avoid overflow issues with numbers as `Long.MAX_VALUE`.
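
As a small illustration of why large numbers are fragile when handled as JSON numbers, the sketch below round-trips a `long` through a `double` (which is how many JSON readers represent numbers) and through its decimal string. This is only one plausible way such a problem can arise, not a description of SQL Server's own JSON handling. `NumericMetadataSketch` and its methods are hypothetical names.

```java
/** Illustrative only: shows precision loss for large longs, not the module's code. */
final class NumericMetadataSketch {

    /** Round-trips a long through a double, as a reader that parses JSON numbers as doubles would. */
    static long viaDouble(long value) {
        return (long) (double) value;
    }

    /** Round-trips a long through its decimal string form. */
    static long viaString(long value) {
        return Long.parseLong(Long.toString(value));
    }

    public static void main(String[] args) {
        long value = Long.MAX_VALUE - 1;
        System.out.println(viaDouble(value) == value); // prints false: the double cannot hold every digit
        System.out.println(viaString(value) == value); // prints true: the string keeps every digit
    }
}
```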
115+
116+
### Vector Storage and Similarity
117+
SQL Server 2025+ supports native VECTOR data types and this module uses the [VECTOR_DISTANCE](https://learn.microsoft.com/en-us/sql/t-sql/functions/vector-distance-transact-sql?view=sql-server-ver17) similarity function.
118+
This module supports the following metrics for the `VECTOR_DISTANCE` function:
119+
120+
- **COSINE**: Cosine similarity (default)
121+
- **EUCLIDEAN**: Euclidean distance. The euclidean metric needs to perform some additional calculations to get the score from the distance.
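
The two metrics can be reproduced in plain Java to see how a distance turns into a score. The sketch below computes cosine distance as one minus cosine similarity and Euclidean distance as the L2 norm of the difference, then applies the conversion formulas used by `DistanceMetric.distanceToScore` in this commit. `VectorDistanceSketch` and its method names are hypothetical, and the database computes the distances itself; this is only a model of the arithmetic.

```java
/** Illustrative only: models the arithmetic, the database does the real work. */
final class VectorDistanceSketch {

    /** Cosine distance: 1 - cosine similarity, in [0, 2] for non-zero vectors. */
    static double cosineDistance(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclideanDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Same formula as the COSINE branch of distanceToScore. */
    static double cosineScore(double distance) {
        return 2.0 - distance;
    }

    /** Same formula as the EUCLIDEAN branch of distanceToScore. */
    static double euclideanScore(double distance) {
        return 2.0 * Math.exp(-distance);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] orthogonal = {0f, 1f};
        System.out.println(cosineScore(cosineDistance(query, query)));
        System.out.println(cosineScore(cosineDistance(query, orthogonal)));
        System.out.println(euclideanScore(euclideanDistance(query, orthogonal)));
    }
}
```

Identical vectors get the maximum score of 2 under both metrics. The Euclidean score approaches 0 as the distance grows but never reaches it.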
122+
123+
### JSON Metadata Support
124+
125+
SQL Server 2025 provides native JSON data type support and JSON indexing capabilities. The module
126+
uses the native JSON data type for metadata storage and supports creating JSON indexes for
127+
optimized metadata filtering using [JSON_VALUE](https://learn.microsoft.com/es-es/sql/t-sql/functions/json-value-transact-sql?view=sql-server-ver17) function.
128+
129+
You can configure JSON index creation for specific metadata keys, optionally indicating the order of the keys:
130+
131+
```java
132+
EmbeddingTable embeddingTable = EmbeddingTable.builder()
133+
.name("test_table")
134+
.createOption(CreateOption.CREATE_OR_REPLACE)
135+
.dimension(4)
136+
.build();
137+
138+
SQLServerEmbeddingStore embeddingStore =
139+
SQLServerEmbeddingStore.dataSourceBuilder()
140+
.dataSource(myDataSource)
141+
.embeddingTable(embeddingTable)
142+
.addIndex(Index.jsonIndexBuilder()
143+
.createOption(CreateOption.CREATE_OR_REPLACE)
144+
.key("author", String.class, JSONIndexBuilder.Order.ASC)
145+
.key("year", Integer.class)
146+
.build()
147+
)
148+
.build();
149+
```
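
To show the kind of predicate such an index is meant to speed up, the sketch below builds a parameterized `JSON_VALUE` equality filter for a metadata key. It is a hedged illustration: `JsonFilterSketch` and `equalsPredicate` are hypothetical names, the key whitelist is an assumption chosen to keep the example safe, and the SQL the module generates for its filters is not shown in this commit.

```java
/** Illustrative only: not the module's filter translation. */
final class JsonFilterSketch {

    /** Builds "JSON_VALUE([column], '$.key') = ?" for a simple metadata key. */
    static String equalsPredicate(String metadataColumn, String key) {
        // Only plain keys are accepted here, so the key cannot break out of the JSON path literal.
        if (!key.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unsupported metadata key: " + key);
        }
        // Bracket-quote the column name, doubling any closing bracket.
        String column = "[" + metadataColumn.replace("]", "]]") + "]";
        // The compared value is left as a JDBC parameter placeholder.
        return "JSON_VALUE(" + column + ", '$." + key + "') = ?";
    }

    public static void main(String[] args) {
        System.out.println(equalsPredicate("metadata", "author"));
    }
}
```

The value to compare against would be bound through a `PreparedStatement` parameter rather than concatenated into the SQL text.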
150+
151+
## Limitations
152+
153+
- Vector indexing performance depends on data size and distribution
154+
- DiskANN indexes on the vector column are not supported
155+
- The database collation should be set to a case-sensitive collation for metadata case-sensitive string comparisons
156+
- Distance DOT metric is not supported
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-community</artifactId>
        <version>1.9.0-beta16-SNAPSHOT</version>
        <relativePath>../../pom.xml</relativePath>
    </parent>

    <artifactId>langchain4j-community-sqlserver</artifactId>
    <name>LangChain4j :: Community :: Integration :: SQL Server</name>
    <description>SQL Server Database Embedding Store</description>

    <properties>
        <mssql.jdbc.version>13.2.1.jre11</mssql.jdbc.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
            <version>${langchain4j.core.version}</version>
        </dependency>
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
            <version>${mssql.jdbc.version}</version>
        </dependency>

        <!-- Tests extend the integration tests from the core module -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
            <version>${langchain4j.core.version}</version>
            <classifier>tests</classifier>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-engine</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-params</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- Tests use TestContainers to create a SQL Server Database -->
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>junit-jupiter</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>mssqlserver</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j2-impl</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
            <scope>test</scope>
        </dependency>

    </dependencies>

</project>
@@ -0,0 +1,14 @@
package dev.langchain4j.store.embedding.sqlserver;

/**
 * Options which configure the creation of database schema objects, such as tables and indexes.
 */
public enum CreateOption {

    /** No attempt is made to create the schema object. */
    CREATE_NONE,
    /** A new schema object is created. */
    CREATE,
    /** An existing schema object is dropped and replaced with a new one. */
    CREATE_OR_REPLACE
}
@@ -0,0 +1,53 @@
package dev.langchain4j.store.embedding.sqlserver;

/**
 * Enum representing different distance metrics that can be used for
 * similarity and relevance computations.
 */
public enum DistanceMetric {

    /**
     * Cosine distance metric.
     */
    COSINE("cosine"),
    /**
     * Euclidean distance metric.
     */
    EUCLIDEAN("euclidean");

    private final String metric;

    private DistanceMetric(String name) {
        this.metric = name;
    }

    /**
     * Returns the name of the metric.
     * @return the name of the metric
     */
    public String getMetric() {
        return metric;
    }

    /**
     * Converts the distance returned by the SQL Server VECTOR_DISTANCE function to a relevance score.
     * Each metric uses its own conversion formula to map the distance to the [0, 2] range:
     * - COSINE: relevance = 2 - distance (distance is between 0 and 2)
     * - EUCLIDEAN: uses exponential decay to map [0, +∞) to (0, 2]
     *
     * @param distance the distance value returned by VECTOR_DISTANCE
     * @return the relevance score in the [0, 2] range, using the metric-specific conversion
     */
    public double distanceToScore(double distance) {
        return switch (this) {
            case COSINE ->
                // For cosine: distance is in [0, 2], simply return 2 - distance
                2.0 - distance;
            case EUCLIDEAN ->
                // For euclidean: distance is in [0, +∞), use exponential decay
                // Formula: 2 * e^(-distance) maps [0, +∞) to (0, 2]
                2.0 * Math.exp(-distance);
            default -> throw new UnsupportedOperationException("Unsupported distance metric: " + this.metric);
        };
    }
}
