[FEATURE] Support OceanBase Embedding Store #258
Open
SimonChou12138
wants to merge
9
commits into
langchain4j:main
from
SimonChou12138:main
640641b
[FEATURE] Support OceanBase Embedding Store
SimonChou12138 bbba7de
feat: make format
Martin7-1 c7a3c41
Merge branch 'langchain4j:main' into main
SimonChou12138 d617d32
[FEATURE] Adjust some suggestions in the ocean embedding store review
SimonChou12138 8885bd0
[FEATURE] Adjust the default distance converter of OceanBaseEmbedding…
SimonChou12138 e2e77a5
Merge branch 'langchain4j:main' into main
SimonChou12138 871a033
[FEATURE] Dealing with issues related to license compliance checks
SimonChou12138 24e964f
Merge branch 'langchain4j:main' into main
SimonChou12138 e67f10c
[FEATURE] Adjust test cases and use EmbeddingStoreIT
SimonChou12138
208 changes: 208 additions & 0 deletions
embedding-stores/langchain4j-community-oceanbase/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,208 @@ | ||
| # OceanBase Vector Store for LangChain4j | ||
|
|
||
| This module implements an `EmbeddingStore` backed by an OceanBase database. | ||
|
|
||
| - [Product Documentation](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000002826816) | ||
|
|
||
| The **OceanBase for LangChain4j** package connects the LangChain4j ecosystem to OceanBase instances and offers the following benefits: | ||
|
|
||
| - **Simplified Vector Storage**: Utilize OceanBase's vector data types and indexing capabilities for efficient similarity searches. | ||
| - **Improved Metadata Handling**: Store metadata in JSON columns instead of strings, which allows filtering on individual keys and can improve query performance. | ||
| - **Clear Separation**: Keep table creation separate from store usage (see `CreateOption`), allowing for distinct permissions and streamlined workflows. | ||
| - **Better Integration with OceanBase**: Built-in methods to take advantage of OceanBase's advanced indexing and scalability capabilities. | ||
|
|
||
| ## Quick Start | ||
|
|
||
| Before using this library, complete the following steps: | ||
|
|
||
| 1. [Install OceanBase Database](https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000002012734) | ||
| 2. Create a database and user | ||
| 3. Configure vector memory limits (if needed) | ||
|
|
||
| ### Maven Dependency | ||
|
|
||
| ```xml | ||
| <dependency> | ||
| <groupId>dev.langchain4j</groupId> | ||
| <artifactId>langchain4j-community-oceanbase</artifactId> | ||
| <version>1.2.0-beta8-SNAPSHOT</version> | ||
| </dependency> | ||
| ``` | ||
|
|
||
| ### Supported Java Versions | ||
|
|
||
| Java >= 17 | ||
|
|
||
| ## OceanBaseEmbeddingStore Usage | ||
|
|
||
| `OceanBaseEmbeddingStore` stores text embeddings and performs vector search. Create an instance by configuring the provided `Builder`, which takes: | ||
|
|
||
| - A `DataSource` instance (connected to an OceanBase database) | ||
| - Table name | ||
| - Table configuration (optional, uses standard configuration by default) | ||
| - Exact search option (optional, uses approximate search by default) | ||
|
|
||
| Example usage: | ||
|
|
||
| ```java | ||
| import dev.langchain4j.community.store.embedding.oceanbase.OceanBaseEmbeddingStore; | ||
| import dev.langchain4j.community.store.embedding.oceanbase.EmbeddingTable; | ||
| import dev.langchain4j.community.store.embedding.oceanbase.CreateOption; | ||
| import dev.langchain4j.data.document.Metadata; | ||
| import dev.langchain4j.data.embedding.Embedding; | ||
| import dev.langchain4j.data.segment.TextSegment; | ||
| import dev.langchain4j.store.embedding.EmbeddingMatch; | ||
| import dev.langchain4j.store.embedding.EmbeddingSearchRequest; | ||
|
|
||
| import javax.sql.DataSource; | ||
| import java.util.*; | ||
|
|
||
| // Create a data source | ||
| DataSource dataSource = createDataSource(); // You need to implement this method | ||
|
|
||
| // Create a vector store | ||
| OceanBaseEmbeddingStore store = OceanBaseEmbeddingStore.builder(dataSource) | ||
| .embeddingTable( | ||
| EmbeddingTable.builder("my_embeddings") | ||
| .vectorDimension(384) // Set vector dimension | ||
| .createOption(CreateOption.CREATE_IF_NOT_EXISTS) | ||
| .build()) | ||
| .build(); | ||
|
|
||
| // Add embeddings | ||
| List<String> testTexts = Arrays.asList("apple", "banana", "car", "truck"); | ||
| List<Embedding> embeddings = new ArrayList<>(); | ||
| List<TextSegment> textSegments = new ArrayList<>(); | ||
|
|
||
| for (String text : testTexts) { | ||
| Map<String, String> metaMap = new HashMap<>(); | ||
| metaMap.put("category", List.of("apple", "banana").contains(text) ? "fruit" : "vehicle"); | ||
| Metadata metadata = Metadata.from(metaMap); | ||
| textSegments.add(TextSegment.from(text, metadata)); | ||
| embeddings.add(myEmbeddingModel.embed(text).content()); // Use your embedding model | ||
| } | ||
|
|
||
| // Batch add embeddings and text segments | ||
| List<String> ids = store.addAll(embeddings, textSegments); | ||
|
|
||
| // Search for similar vectors | ||
| EmbeddingSearchRequest request = EmbeddingSearchRequest.builder() | ||
| .queryEmbedding(embeddings.get(0)) // Search for content similar to "apple" | ||
| .maxResults(10) | ||
| .minScore(0.7) | ||
| .build(); | ||
|
|
||
| List<EmbeddingMatch<TextSegment>> results = store.search(request).matches(); | ||
|
|
||
| // Use metadata filtering (in a real source file, this import belongs with the others at the top) | ||
| import dev.langchain4j.store.embedding.filter.MetadataFilterBuilder; | ||
|
|
||
| // Search only in the "fruit" category | ||
| EmbeddingSearchRequest filteredRequest = EmbeddingSearchRequest.builder() | ||
| .queryEmbedding(embeddings.get(0)) | ||
| .maxResults(10) | ||
| .filter(MetadataFilterBuilder.metadataKey("category").isEqualTo("fruit")) | ||
| .build(); | ||
|
|
||
| List<EmbeddingMatch<TextSegment>> filteredResults = store.search(filteredRequest).matches(); | ||
|
|
||
| // Remove embeddings | ||
| store.remove(ids.get(0)); // Remove a single vector | ||
| store.removeAll(Arrays.asList(ids.get(1), ids.get(2))); // Remove multiple vectors | ||
| store.removeAll(MetadataFilterBuilder.metadataKey("category").isEqualTo("fruit")); // Remove by metadata | ||
| store.removeAll(); // Remove all vectors | ||
| ``` | ||
|
|
||
| ## EmbeddingTable Configuration | ||
|
|
||
| The `EmbeddingTable` class is used to configure the structure and creation options of the vector table: | ||
|
|
||
| ```java | ||
| import dev.langchain4j.community.store.embedding.oceanbase.EmbeddingTable; | ||
| import dev.langchain4j.community.store.embedding.oceanbase.CreateOption; | ||
|
|
||
| // Basic configuration | ||
| EmbeddingTable table = EmbeddingTable.builder("my_embeddings") | ||
| .vectorDimension(384) // Set vector dimension | ||
| .createOption(CreateOption.CREATE_IF_NOT_EXISTS) | ||
| .build(); | ||
|
|
||
| // Advanced configuration | ||
| EmbeddingTable advancedTable = EmbeddingTable.builder("advanced_embeddings") | ||
| .idColumn("custom_id") // Custom ID column name | ||
| .embeddingColumn("vector_data") // Custom vector column name | ||
| .textColumn("content") // Custom text column name | ||
| .metadataColumn("meta_info") // Custom metadata column name | ||
| .vectorDimension(768) // Set vector dimension | ||
| .vectorIndexName("idx_vector_search") // Custom index name | ||
| .distanceMetric("L2") // Set distance metric (L2, IP, COSINE) | ||
| .indexType("hnsw") // Set index type (hnsw, flat) | ||
| .createOption(CreateOption.CREATE_OR_REPLACE) // Table creation option | ||
| .build(); | ||
| ``` | ||
|
|
||
| ### Table Creation Options | ||
|
|
||
| The `CreateOption` enum provides the following options: | ||
|
|
||
| - `CREATE_NONE`: Do not create a table; assume it already exists | ||
| - `CREATE_IF_NOT_EXISTS`: Create the table if it does not exist (default) | ||
| - `CREATE_OR_REPLACE`: Create the table, replacing it if it exists | ||
|
|
||
| ## Search Options | ||
|
|
||
| `OceanBaseEmbeddingStore` supports two search modes: | ||
|
|
||
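| - **Approximate Search** (default): Uses the vector index; faster, but may miss some of the true nearest neighbors | ||
| - **Exact Search**: Compares the query against every stored vector; slower, but always returns the true nearest neighbors | ||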
|
|
||
| ```java | ||
| // Use exact search | ||
| OceanBaseEmbeddingStore exactStore = OceanBaseEmbeddingStore.builder(dataSource) | ||
| .embeddingTable("my_embeddings") | ||
| .exactSearch(true) // Enable exact search | ||
| .build(); | ||
| ``` | ||
|
|
||
| ## Metadata Filtering | ||
|
|
||
| Complex metadata filtering conditions can be built using `MetadataFilterBuilder`: | ||
|
|
||
| ```java | ||
| import dev.langchain4j.store.embedding.filter.MetadataFilterBuilder; | ||
|
|
||
| // Basic filtering | ||
| EmbeddingSearchRequest request = EmbeddingSearchRequest.builder() | ||
| .queryEmbedding(queryEmbedding) | ||
| .filter(MetadataFilterBuilder.metadataKey("category").isEqualTo("fruit")) | ||
| .build(); | ||
|
|
||
| // Combined filtering | ||
| EmbeddingSearchRequest complexRequest = EmbeddingSearchRequest.builder() | ||
| .queryEmbedding(queryEmbedding) | ||
| .filter( | ||
| MetadataFilterBuilder.metadataKey("category").isEqualTo("fruit") | ||
| .and(MetadataFilterBuilder.metadataKey("color").isEqualTo("red")) | ||
| ) | ||
| .build(); | ||
| ``` | ||
|
|
||
| ## Performance Optimization | ||
|
|
||
| For optimal performance, consider the following recommendations: | ||
|
|
||
| 1. Adjust OceanBase's vector memory limit appropriately for large vector collections: | ||
| ```sql | ||
| ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30; | ||
| ``` | ||
|
|
||
| 2. Choose the index type that suits your use case: | ||
| - `hnsw`: Suitable for most scenarios; offers a good balance between search speed and accuracy | ||
| - `flat`: Use when highest accuracy is required | ||
|
|
||
| 3. Add embeddings in batches to improve performance: | ||
| ```java | ||
| store.addAll(embeddings, textSegments); | ||
| ``` | ||
|
|
||
| 4. For frequent similarity searches, consider using approximate search mode (default). |
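To make the exact-search mode described in the README concrete, here is a minimal brute-force sketch in plain Java. It is illustrative only and does not reproduce `OceanBaseEmbeddingStore` internals: the class `ExactSearchSketch`, its `Match` holder, and the `search` helper are hypothetical names. The score mapping `(cos + 1) / 2` is the one LangChain4j's `RelevanceScore.fromCosineSimilarity` applies; that this store uses it for the cosine metric is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a brute-force ("exact") nearest-neighbour search.
class ExactSearchSketch {

    // Hypothetical holder for one hit: position of the stored vector and its score.
    static final class Match {
        final int index;
        final double score;

        Match(int index, double score) {
            this.index = index;
            this.score = score;
        }

        @Override
        public String toString() {
            return "Match{index=" + index + ", score=" + score + "}";
        }
    }

    // Cosine similarity in [-1, 1]; both vectors are assumed non-zero and of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine similarity to a relevance score in [0, 1] (assumed conversion).
    static double score(double cosine) {
        return (cosine + 1) / 2;
    }

    // Scores every stored vector, keeps those at or above minScore, best first.
    static List<Match> search(List<float[]> stored, float[] query, int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            double s = score(cosine(stored.get(i), query));
            if (s >= minScore) {
                matches.add(new Match(i, s));
            }
        }
        matches.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(maxResults, matches.size())));
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
                new float[] {1f, 0f}, new float[] {0f, 1f}, new float[] {-1f, 0f});
        System.out.println(search(stored, new float[] {1f, 0f}, 10, 0.7));
    }
}
```

Because every stored vector is scored, the cost grows linearly with the table size. That is the trade-off against an index-backed approximate search, which skips most vectors and may therefore miss a true neighbor.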
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| <?xml version="1.0" encoding="UTF-8"?> | ||
| <project xmlns="http://maven.apache.org/POM/4.0.0" | ||
| xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
| xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
| <modelVersion>4.0.0</modelVersion> | ||
| <parent> | ||
| <groupId>dev.langchain4j</groupId> | ||
| <artifactId>langchain4j-community</artifactId> | ||
| <version>1.2.0-beta8-SNAPSHOT</version> | ||
| <relativePath>../../pom.xml</relativePath> | ||
| </parent> | ||
|
|
||
| <artifactId>langchain4j-community-oceanbase</artifactId> | ||
| <name>LangChain4j :: Community :: Integration :: OceanBase</name> | ||
| <description>Community Integration with OceanBase Vector Storage</description> | ||
|
|
||
| <dependencies> | ||
| <dependency> | ||
| <groupId>dev.langchain4j</groupId> | ||
| <artifactId>langchain4j-core</artifactId> | ||
| <version>${langchain4j.core.version}</version> | ||
| </dependency> | ||
|
|
||
| <!-- JDBC Driver for OceanBase --> | ||
| <dependency> | ||
| <groupId>com.oceanbase</groupId> | ||
| <artifactId>oceanbase-client</artifactId> | ||
| <version>2.4.5</version> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.slf4j</groupId> | ||
| <artifactId>slf4j-api</artifactId> | ||
| </dependency> | ||
|
|
||
| <!-- For JSON handling --> | ||
| <dependency> | ||
| <groupId>com.fasterxml.jackson.core</groupId> | ||
| <artifactId>jackson-databind</artifactId> | ||
| </dependency> | ||
|
|
||
| <!-- Test dependencies --> | ||
| <dependency> | ||
| <groupId>dev.langchain4j</groupId> | ||
| <artifactId>langchain4j-core</artifactId> | ||
| <version>${langchain4j.core.version}</version> | ||
| <classifier>tests</classifier> | ||
| <type>test-jar</type> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.junit.jupiter</groupId> | ||
| <artifactId>junit-jupiter</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.assertj</groupId> | ||
| <artifactId>assertj-core</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.testcontainers</groupId> | ||
| <artifactId>testcontainers</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>org.testcontainers</groupId> | ||
| <artifactId>junit-jupiter</artifactId> | ||
| <scope>test</scope> | ||
| </dependency> | ||
|
|
||
| <dependency> | ||
| <groupId>ch.qos.logback</groupId> | ||
| <artifactId>logback-classic</artifactId> | ||
|
||
| <scope>test</scope> | ||
| </dependency> | ||
| </dependencies> | ||
| </project> | ||
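The pom declares `oceanbase-client` as the JDBC driver, while the README leaves `createDataSource()` to the reader. The following is a minimal, non-pooled sketch of such a `DataSource` using only the JDK. The class name `SimpleOceanBaseDataSource`, the `jdbc:oceanbase://host:port/database` URL layout, and port `2881` are assumptions to check against the driver documentation; a production setup would more likely use a connection pool.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Hypothetical, non-pooled DataSource: every getConnection() opens a new connection.
class SimpleOceanBaseDataSource implements DataSource {
    private final String url;
    private final String user;
    private final String password;
    private int loginTimeoutSeconds; // stored for the API contract, not applied by this sketch
    private PrintWriter logWriter;

    SimpleOceanBaseDataSource(String host, int port, String database, String user, String password) {
        this.url = jdbcUrl(host, port, database);
        this.user = user;
        this.password = password;
    }

    // Assumed URL layout for the oceanbase-client driver.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:oceanbase://" + host + ":" + port + "/" + database;
    }

    @Override
    public Connection getConnection() throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    @Override
    public Connection getConnection(String username, String pwd) throws SQLException {
        return DriverManager.getConnection(url, username, pwd);
    }

    @Override
    public PrintWriter getLogWriter() {
        return logWriter;
    }

    @Override
    public void setLogWriter(PrintWriter out) {
        this.logWriter = out;
    }

    @Override
    public void setLoginTimeout(int seconds) {
        this.loginTimeoutSeconds = seconds;
    }

    @Override
    public int getLoginTimeout() {
        return loginTimeoutSeconds;
    }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("java.util.logging is not used");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }

    public static void main(String[] args) {
        // Prints the URL only; no connection is opened here.
        System.out.println(jdbcUrl("127.0.0.1", 2881, "test"));
    }
}
```

With this class on the classpath, the README's `createDataSource()` could simply return `new SimpleOceanBaseDataSource(host, port, database, user, password)`; the credentials and tenant naming depend on your OceanBase deployment.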
21 changes: 21 additions & 0 deletions
...nbase/src/main/java/dev/langchain4j/community/store/embedding/oceanbase/CreateOption.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| package dev.langchain4j.community.store.embedding.oceanbase; | ||
|
|
||
| /** | ||
| * Option for creating a table. | ||
| */ | ||
| public enum CreateOption { | ||
| /** | ||
| * Do not create a table. This assumes that the table already exists. | ||
| */ | ||
| CREATE_NONE, | ||
|
|
||
| /** | ||
| * Create the table if it does not exist. | ||
| */ | ||
| CREATE_IF_NOT_EXISTS, | ||
|
|
||
| /** | ||
| * Create the table, replacing an existing table if it exists. | ||
| */ | ||
| CREATE_OR_REPLACE | ||
| } |
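As a rough illustration of what the three `CreateOption` values imply, the sketch below maps each one to the DDL a store might issue before first use. This is not the PR's implementation: the helper `ddlFor`, the wrapper class `CreateOptionSketch`, and the column layout in `main` are hypothetical, and the enum is copied locally only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class CreateOptionSketch {

    // Local copy of the enum from the diff above.
    enum CreateOption { CREATE_NONE, CREATE_IF_NOT_EXISTS, CREATE_OR_REPLACE }

    // Hypothetical helper: returns the statements to run, in order, for the given option.
    // The table name and column list are concatenated as-is, so they must come from trusted input.
    static List<String> ddlFor(CreateOption option, String table, String columns) {
        List<String> statements = new ArrayList<>();
        switch (option) {
            case CREATE_NONE:
                // Nothing to do: the table is expected to exist already.
                break;
            case CREATE_IF_NOT_EXISTS:
                statements.add("CREATE TABLE IF NOT EXISTS " + table + " (" + columns + ")");
                break;
            case CREATE_OR_REPLACE:
                // Replacing drops all existing rows along with the table.
                statements.add("DROP TABLE IF EXISTS " + table);
                statements.add("CREATE TABLE " + table + " (" + columns + ")");
                break;
        }
        return statements;
    }

    public static void main(String[] args) {
        // Assumed column layout, for illustration only.
        String columns = "id VARCHAR(36) PRIMARY KEY, embedding VECTOR(384), text TEXT, metadata JSON";
        for (CreateOption option : CreateOption.values()) {
            System.out.println(option + " -> " + ddlFor(option, "my_embeddings", columns));
        }
    }
}
```

The point to take from the sketch is that `CREATE_OR_REPLACE` is destructive, so it suits tests and throwaway tables, whereas `CREATE_NONE` fits deployments where the application account lacks DDL permissions.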
got it~