Add support for Elasticsearch vector store #234
Just finished up the Elasticsearch Vector Store support. Let me know your thoughts.
This is a great contribution. I believe @tzolov has started to review. I'll put it into 0.8.0 as a stretch goal for now.
@JM-Lab, thank you for the contribution.
I started with a quick code check and have a few questions about the chosen ES client library and, consequently, the custom versus library-provided query classes/records.
    <version>${parent.version}</version>
</dependency>

<dependency>
Shouldn't we be using the elasticsearch-java client instead?
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>xxx</version>
</dependency>
I chose to use the Java Low Level REST Client (https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/8.12/java-rest-low.html) with minimal dependencies instead of the elasticsearch-java library's High Level Rest Client. This decision was based on the High Level Rest Client's heavy reliance on dependencies and its sensitivity to Elasticsearch server versions, requiring exact matching with versions 7, 8, and even minor releases. By implementing only the essential data classes for Vectorstore, I successfully tested compatibility with both version 7 and 8.
public static final String COSINE_SIMILARITY_FUNCTION =
        "(cosineSimilarity(params.query_vector, 'embedding') + 1.0) / 2";

public record ElasticsearchBulkIndexId(
Do we need all those custom records? Wouldn't the co.elastic client already provide them?
I focused on simplicity by implementing only the necessary custom classes/records for Vectorstore, minimizing dependencies and Elasticsearch version sensitivity as mentioned earlier.
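For readers following the thread, the COSINE_SIMILARITY_FUNCTION script quoted above shifts a cosine similarity from [-1, 1] into [0, 1], which keeps the score non-negative as Elasticsearch script scoring requires. Below is a minimal client-side sketch of that same mapping. The class and method names are hypothetical and are not part of this PR.

```java
import java.util.List;

/** Hypothetical helper, not from the PR: mirrors the script's score mapping on the client side. */
final class CosineScoreSketch {

    /** Plain cosine similarity of two equal-length, non-zero vectors; the result lies in [-1, 1]. */
    static double cosine(List<Double> a, List<Double> b) {
        if (a.isEmpty() || a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Same arithmetic as the script: (cosine + 1.0) / 2, so the result lies in [0, 1]. */
    static double normalizedScore(List<Double> queryVector, List<Double> embedding) {
        return (cosine(queryVector, embedding) + 1.0) / 2;
    }

    public static void main(String[] args) {
        List<Double> query = List.of(1.0, 0.0);
        System.out.println(normalizedScore(query, List.of(1.0, 0.0)));  // same direction: 1.0
        System.out.println(normalizedScore(query, List.of(0.0, 1.0)));  // orthogonal: 0.5
        System.out.println(normalizedScore(query, List.of(-1.0, 0.0))); // opposite: 0.0
    }
}
```

With this mapping, a similarity threshold of 0.5 corresponds to a raw cosine similarity of 0.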
        @JsonProperty("query")
        Query query
) {
    public record Query(
For example, isn't this already covered by co.elastic.clients.elasticsearch._types.query_dsl.Query?
I focused on simplicity by implementing only the necessary custom classes/records for Vectorstore, minimizing dependencies and Elasticsearch version sensitivity as mentioned earlier.
private ElasticsearchScriptScoreQuery getElasticsearchSimilarityQuery(List<Double> embedding, int topK,
        float similarityThreshold, Filter.Expression filterExpression) {
    return new ElasticsearchScriptScoreQuery(topK, new Query(
Again, wouldn't co.elastic.clients.elasticsearch._types.query_dsl.ScriptScoreQuery help here in place of the custom records?
I focused on simplicity by implementing only the necessary custom classes/records for Vectorstore, minimizing dependencies and Elasticsearch version sensitivity as mentioned earlier.
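To make the trade-off concrete, here is a rough sketch of the kind of `script_score` search body that hand-written records like the ones above would serialize to. It uses only the JDK, with no Elasticsearch client. The class name, the `match_all` placeholder standing in for the filter expression, and the use of the top-level `min_score` for the similarity threshold are assumptions for illustration, not the PR's actual implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch, not from the PR: builds a script_score search body as a JSON string. */
final class ScriptScoreBodySketch {

    // Script source as quoted in the diff above; it contains no double quotes, so no JSON escaping is needed.
    static final String COSINE_SIMILARITY_FUNCTION =
            "(cosineSimilarity(params.query_vector, 'embedding') + 1.0) / 2";

    /** Assumes all vector components and the threshold are finite, so they print as valid JSON numbers. */
    static String searchBody(List<Double> embedding, int topK, float similarityThreshold) {
        // Render the embedding as a JSON array, e.g. [0.1, 0.2].
        String vector = embedding.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", ", "[", "]"));
        String template = String.join("\n",
                "{",
                "  \"size\": %d,",
                "  \"min_score\": %s,",
                "  \"query\": {",
                "    \"script_score\": {",
                "      \"query\": { \"match_all\": {} },", // placeholder for the converted filter expression
                "      \"script\": {",
                "        \"source\": \"%s\",",
                "        \"params\": { \"query_vector\": %s }",
                "      }",
                "    }",
                "  }",
                "}");
        return String.format(template, topK, String.valueOf(similarityThreshold),
                COSINE_SIMILARITY_FUNCTION, vector);
    }

    public static void main(String[] args) {
        System.out.println(searchBody(List.of(0.1, 0.2), 5, 0.75f));
    }
}
```

The typed builders in elasticsearch-java produce an equivalent request without the string handling, at the cost of tying the module to a client version.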
If we use elasticsearch-java, we might need to either split Vectorstore for Elasticsearch 7 and 8, or let users add the elasticsearch-java version matching their Elasticsearch server version. What's your preference? I'd love to hear your thoughts on this.
I think moving to the latest version, focusing on Elasticsearch 8 and building on elasticsearch-java, will be the best use of everyone's time. Sorry for the delay in responding, but we are still very interested in this contribution and appreciate your interest/help.
@markpollack I understand, and I'll update to support the latest version, Elasticsearch 8, and commit soon.
@markpollack @tzolov Please review the code, which now supports Elasticsearch 8 using the elasticsearch-java library as discussed.
…tability VertexAiGeminiAutoConfiguration creates its own FunctionCallbackContext instance with schema type set to SchemaType.OPEN_API_SCHEMA. This ensures that no other (non OPEN_API_SCHEMA) FunctionCallbackContext bean overrides it.
- Make it use the new OpenAiAudioApi.
- Remove transcription code from spring-ai-core. Too early to generalize. Move all related code under the spring-ai-openai project.
- Fix missing licenses and javadocs.
- Add 'Audio' prefix for Transcription classes and packages.
- Add missing auto-configuration and tests.
- Move the OpenAiAudioTranscriptionClient and OpenAiAudioTranscriptionOptions under the org.springframework.ai.openai package.
- Rename the AudioTranscriptionRequest into AudioTranscriptionPrompt.
- Minor imports cleaning.
* root project pom now sets <dependencyManagement>remove</dependencyManagement>
* spring-ai-bom pom flatten plugin added with <dependencyManagement>keep</dependencyManagement>
* Add test code
Unrelated to this change, the Neo4j test version increased to be current.
Hi @JM-Lab, after this the filter expression test passes, but the IT fails for half of the tests. Please have a look; hopefully you will be able to resolve those.
- Rename /api/clients/ into /api/chat
- Move the image from /api/clients to /api
- Fix the layout inside the chat and embeddings docs. Moving the runtime options and sample controllers to the top level.
- Adjust all affected links.
- Add VectorSearchAggregation, used to actually perform the search on a given collection with embeddings.
- Add MongoDBVectorStore
- Add MongoDBVectorStoreIT. Integration test runs fine given...
- You have a mongo atlas cluster to connect to (local or remote)
- You have the search index "spring_ai_vector_search" set up correctly
- Need to explore getting around this
- Need to filter results using threshold
- Add postfilter for threshold values
- While a post filter is not ideal, it gets the job done. The mongo team seems to be working on making it available as a prefilter option, in which case this implementation can be updated to use it later.
- Implement filtering threshold
- Fix a few sonar issues
- Formatting
- Use higher default num_candidates
- Use builder for configuration
- Add documentation and some refactoring
- Use consistent property in integration test
- Finish implementing filter support
- Add documentation to filter converter
- Add vector search index auto creation
- Add to BOM.
- Fix version to 1.0.0-SN.
- Move expression converter from core to models/mongodb.
- Fix style and license headers
- Implemented Bedrock Jurassic ChatClient
- Added documentation reference
- Implement auto-configuration and boot starter
- Disable the BedrockAi21Jurassic2ChatClientIT.emojiPenaltyWhenTrueByDefaultApplyPenaltyTest() test as it fails when run in combination with the other tests.
`Reuse Container` is an experimental Testcontainers feature. It requires `testcontainers.reuse.enable=true` in `~/.testcontainers.properties` to take effect, but to avoid surprises, this commit removes it. See https://java.testcontainers.org/features/reuse/
…b/spring-ai into elasticsearch-vector-store
…arent pom and Bom pom
…es the use of epoch milliseconds
Hi @tzolov, I've reviewed the PR. Please review the changes. Looking forward to your feedback!
Thanks @JM-Lab ,
Thanks @JM-Lab, I will have a look
This PR adds support for Elasticsearch vector search.
To achieve compatibility with both Elasticsearch versions 7 and 8, I implemented it using Elasticsearch's Java Low Level REST Client.
To-do: