iocanel (Contributor) commented Mar 18, 2025

This is a post on how to use Quarkus with vector databases to implement a similarity search example.

github-actions bot commented Mar 18, 2025

🙈 The PR is closed and the preview is expired.

iocanel force-pushed the similarity-search-using-vector-databases branch 2 times, most recently from ea893ff to 5008cb9 (March 18, 2025 14:15)
iocanel requested a review from geoand (March 19, 2025 09:36)
geoand (Contributor) left a comment

Very neat! I've added some comments but I would like @jmartisk to also review

iocanel force-pushed the similarity-search-using-vector-databases branch from b99b1de to 6a4a667 (March 19, 2025 11:58)

iocanel (Contributor, Author) commented Mar 19, 2025

@geoand applied feedback.

gsmet (Member) left a comment

Nice article, I spotted a few typos here and there, HTH.

With LLMs becoming increasingly popular we often see them being used even for tasks that are not directly related to text generation.
Such case is using LLMs for recommendation systems. In this post we'll see how you can build such a system using https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j]
but without using LLMs. More specifically we'll create a simple movie similarity search system using a vector database. The role
of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this store is to abstract the underlying vector database through the `EmbeddingStore` interface.
Review comment (Member):

Wasn't sure what you wanted to write but store looked odd?

Suggested change:
- of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this store is to abstract the underlying vector database through the `EmbeddingStore` interface.
+ of https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus Langchain4j] in this story is to abstract the underlying vector database through the `EmbeddingStore` interface.

A relevant sample has been recently added to the https://github.com/quarkiverse/quarkus-langchain4j/tree/main/samples/[Quarkus Langchain4j samples].
Review comment (Member):

Could you fix LangChain4j case everywhere?

</dependency>
----

To be able to use these dependencies without needing to specify versions, the bom can be add imported to the `dependencyManagement` of the project:
Review comment (Member):

Suggested change:
- To be able to use these dependencies without needing to specify versions, the bom can be add imported to the `dependencyManagement` of the project:
+ To be able to use these dependencies without needing to specify versions, the bom can be added to the `dependencyManagement` of the project:

</dependency>
----

To properly use the in process embedding model we need to configure it in the `application.properties` file.
Review comment (Member):

Suggested change:
- To properly use the in process embedding model we need to configure it in the `application.properties` file.
+ To properly use the in-process embedding model we need to configure it in the `application.properties` file.

----

To properly use the in process embedding model we need to configure it in the `application.properties` file.
We also need to configure the pgvector dimension an ensure it's aligned with the dimension of the embedding model.
Review comment (Member):

Suggested change:
- We also need to configure the pgvector dimension an ensure it's aligned with the dimension of the embedding model.
+ We also need to configure the pgvector dimension and ensure it's aligned with the dimension of the embedding model.

}
----

To use the CSV mapper, we'll need to `jackson` csv dataformat:
Review comment (Member):

Suggested change:
- To use the CSV mapper, we'll need to `jackson` csv dataformat:
+ To use the CSV mapper, we'll need to add Jackson's CSV dataformat dependency:


==== Bringing it all together ====
The only thing that's left is to create a REST endpoint that will allow us to search for similar movies. We could also use a simple UI.
Let's start with the REST endpoint. It's pretty straight forward. We need to methods one for movie searching and one for searching similar movies.
Review comment (Member):

Suggested change:
- Let's start with the REST endpoint. It's pretty straight forward. We need to methods one for movie searching and one for searching similar movies.
+ Let's start with the REST endpoint. It's pretty straightforward. We need two methods, one for searching movies and one for searching similar movies.


The key elements of that page are:

* movie-box: a text filed for entering the movie title
Review comment (Member):

Suggested change:
- * movie-box: a text filed for entering the movie title
+ * movie-box: a text field for entering the movie title

* movie-poster: an image for displaying the movie poster
* similar-results: an additional unordered list for displaying the similar movies

It's important to remember that the `Movie` entity is using `jackson` to map the CSV columns to the entity fields.
Review comment (Member):

Suggested change:
- It's important to remember that the `Movie` entity is using `jackson` to map the CSV columns to the entity fields.
+ It's important to remember that the `Movie` entity is using Jackson to map the CSV columns to the entity fields.

</html>
----

I won't go into much detail about the hmtl code as it's outside the scope of this post.
Review comment (Member):

Suggested change:
- I won't go into much detail about the hmtl code as it's outside the scope of this post.
+ I won't go into much detail about the HTML code as it's outside the scope of this post.

iocanel force-pushed the similarity-search-using-vector-databases branch from 6a4a667 to 137c925 (March 22, 2025 16:59)

iocanel (Contributor, Author) commented Mar 26, 2025

@gsmet @geoand: Forgot to mention that I've applied the feedback.

iocanel merged commit 55f2c32 into quarkusio:main Mar 27, 2025
1 check passed