Post Similarity search using vector databases #2261
🙈 The PR is closed and the preview is expired.
Force-pushed from ea893ff to 5008cb9.
geoand left a comment:
Very neat! I've added some comments, but I would like @jmartisk to also review.
Three inline comments on _posts/2025-03-18-movie-similarity-search-using-vector-databases.adoc (outdated, resolved).
Force-pushed from b99b1de to 6a4a667.
@geoand applied feedback.
gsmet left a comment:
Nice article, I spotted a few typos here and there, HTH.
> With LLMs becoming increasingly popular we often see them being used even for tasks that are not directly related to text generation.
> Such case is using LLMs for recommendation systems. In this post we'll see how you can build such a system using https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j]
> but without using LLMs. More specifically we'll create a simple movie similarity search system using a vector database. The role
> of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this store is to abstract the underlying vector database through the `EmbeddingStore` interface.
Wasn't sure what you wanted to write but store looked odd?
Suggested change:
- of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this store is to abstract the underlying vector database through the `EmbeddingStore` interface.
+ of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this story is to abstract the underlying vector database through the `EmbeddingStore` interface.
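For readers unfamiliar with what an `EmbeddingStore` abstracts away, here is a minimal, framework-free sketch of a similarity search: store vectors, score each one against a query with cosine similarity (a common choice for text embeddings), and keep the top k. It is an illustration under assumptions, not LangChain4j's API or the sample's code; the names InMemoryVectorStore, add and findNearest are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a vector store; NOT the LangChain4j EmbeddingStore API.
class InMemoryVectorStore {

    record Match(String id, double score) {}

    private record Entry(String id, float[] vector) {}

    private final List<Entry> entries = new ArrayList<>();

    // Store a copy of the vector under the given id.
    void add(String id, float[] vector) {
        entries.add(new Entry(id, vector.clone()));
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force scan: score every stored vector and keep the k most similar.
    List<Match> findNearest(float[] query, int k) {
        return entries.stream()
                .map(e -> new Match(e.id(), cosine(query, e.vector())))
                .sorted(Comparator.comparingDouble(Match::score).reversed())
                .limit(k)
                .toList();
    }
}
```

The brute-force scan is only meant to show the idea; a pgvector-backed store computes the distances inside PostgreSQL and can use approximate indexes (IVFFlat, HNSW) instead of scanning every row.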
> but without using LLMs. More specifically we'll create a simple movie similarity search system using a vector database. The role
> of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this store is to abstract the underlying vector database through the `EmbeddingStore` interface.
>
> A relevant sample has been recently added to the https://github.com/quarkiverse/quarkus-langchain4j/tree/main/samples/[Quarkus Langchain4j samples].
Could you fix LangChain4j case everywhere?
> </dependency>
> ----
>
> To be able to use these dependencies without needing to specify versions, the bom can be add imported to the `dependencyManagement` of the project:
Suggested change:
- To be able to use these dependencies without needing to specify versions, the bom can be add imported to the `dependencyManagement` of the project:
+ To be able to use these dependencies without needing to specify versions, the bom can be added to the `dependencyManagement` of the project:
> </dependency>
> ----
>
> To properly use the in process embedding model we need to configure it in the `application.properties` file.
Suggested change:
- To properly use the in process embedding model we need to configure it in the `application.properties` file.
+ To properly use the in-process embedding model we need to configure it in the `application.properties` file.
> ----
>
> To properly use the in process embedding model we need to configure it in the `application.properties` file.
> We also need to configure the pgvector dimension an ensure it's aligned with the dimension of the embedding model.
Suggested change:
- We also need to configure the pgvector dimension an ensure it's aligned with the dimension of the embedding model.
+ We also need to configure the pgvector dimension and ensure it's aligned with the dimension of the embedding model.
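Why the two dimensions must agree: a pgvector column is declared with a fixed size, and PostgreSQL rejects any vector with a different number of components. A small fail-fast check makes such a mismatch obvious before anything reaches the database. This is a hypothetical helper (DimensionCheck and requireDimension are illustrative names, not part of Quarkus LangChain4j), and the value 384 assumes the in-process model is all-MiniLM-L6-v2; adjust it to whichever model is actually configured.

```java
// Hypothetical fail-fast guard; not part of Quarkus LangChain4j.
class DimensionCheck {

    // Assumption: the in-process model is all-MiniLM-L6-v2, which emits 384-dimensional embeddings.
    static final int EXPECTED_DIMENSION = 384;

    // Returns the embedding unchanged if its length matches the expected dimension, otherwise throws.
    static float[] requireDimension(float[] embedding, int expected) {
        if (embedding.length != expected) {
            throw new IllegalArgumentException("Embedding has " + embedding.length
                    + " dimensions, but the vector column expects " + expected);
        }
        return embedding;
    }
}
```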
> }
> ----
>
> To use the CSV mapper, we'll need to `jackson` csv dataformat:
Suggested change:
- To use the CSV mapper, we'll need to `jackson` csv dataformat:
+ To use the CSV mapper, we'll need to add Jackson's CSV dataformat dependency:
> ==== Bringing it all together ====
> The only thing that's left is to create a REST endpoint that will allow us to search for similar movies. We could also use a simple UI.
> Let's start with the REST endpoint. It's pretty straight forward. We need to methods one for movie searching and one for searching similar movies.
Suggested change:
- Let's start with the REST endpoint. It's pretty straight forward. We need to methods one for movie searching and one for searching similar movies.
+ Let's start with the REST endpoint. It's pretty straightforward. We need two methods, one for searching movies and one for searching similar movies.
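The two operations behind that endpoint can be sketched without any framework: a plain title search, and a similarity search that uses the selected movie's own embedding as the query. Everything here is hypothetical (MovieSearch, Movie, searchMovies, similarMovies and the in-memory list are illustrative names); in the sample these would presumably be REST resource methods delegating to the `EmbeddingStore`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical service behind the two REST methods; names and data model are illustrative only.
class MovieSearch {

    record Movie(String title, float[] embedding) {}

    private final List<Movie> movies;

    MovieSearch(List<Movie> movies) {
        this.movies = List.copyOf(movies);
    }

    // Method 1: case-insensitive substring match on the title; no vectors involved.
    List<String> searchMovies(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        return movies.stream()
                .map(Movie::title)
                .filter(title -> title.toLowerCase(Locale.ROOT).contains(needle))
                .toList();
    }

    // Method 2: use the chosen movie's embedding as the query and rank the other movies by cosine similarity.
    List<String> similarMovies(String title, int limit) {
        Optional<Movie> source = movies.stream()
                .filter(m -> m.title().equalsIgnoreCase(title))
                .findFirst();
        if (source.isEmpty()) {
            return List.of(); // unknown title: nothing to compare against
        }
        Movie selected = source.get();
        return movies.stream()
                .filter(m -> m != selected) // exclude the movie itself
                .sorted(Comparator.comparingDouble((Movie m) -> cosine(selected.embedding(), m.embedding())).reversed())
                .limit(limit)
                .map(Movie::title)
                .toList();
    }

    // Cosine similarity between two equal-length vectors.
    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Keeping the two methods separate matches the split described in the post: the first resolves which movie the user means, the second finds what lies close to it in embedding space.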
> The key elements of that page are:
>
> * movie-box: a text filed for entering the movie title
Suggested change:
- * movie-box: a text filed for entering the movie title
+ * movie-box: a text field for entering the movie title
> * movie-poster: an image for displaying the movie poster
> * similar-results: an additional unordered list for displaying the similar movies
>
> It's important to remember that the `Movie` entity is using `jackson` to map the CSV columns to the entity fields.
Suggested change:
- It's important to remember that the `Movie` entity is using `jackson` to map the CSV columns to the entity fields.
+ It's important to remember that the `Movie` entity is using Jackson to map the CSV columns to the entity fields.
> </html>
> ----
>
> I won't go into much detail about the hmtl code as it's outside the scope of this post.
Suggested change:
- I won't go into much detail about the hmtl code as it's outside the scope of this post.
+ I won't go into much detail about the HTML code as it's outside the scope of this post.
Force-pushed from 6a4a667 to 137c925.
This is a post on how to use Quarkus with vector databases to implement a similarity search example.