Do you have an estimated release date for this feature?
Feature request
The `VectorStore` class interface should have a `similarity_search_with_score_by_vector()` method that can be implemented by all child classes. No one should have to implement `similarity_search_by_vector()` in their `VectorStore` child implementation, because you can just drop the scores from the results of `similarity_search_with_score_by_vector()`. This is really simple to implement, and would give developers much more room to optimize retrieval processes.
Motivation
When implementing retrievers, it is really important to have access to a `similarity_search_with_score_by_vector()` function. Searching by vector avoids generating the same query embedding multiple times: when building a retriever, you might want to run several searches with the same query and embedding, e.g. using different filters. But when doing a similarity search, having access to the score of each match is absolutely mandatory, so the by-vector search needs to return scores too.
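A minimal sketch of the embed-once pattern, assuming a hypothetical in-memory `ToyStore` and a hypothetical `CountingEmbeddings` class (neither is a LangChain API; `embed_query` only borrows the method name from LangChain's `Embeddings` interface, and `metadata_filter` is an illustrative parameter name). `Document` is a small stand-in for `langchain_core.documents.Document`. The sketch counts embedding calls for two filtered searches, first through a text-based scored search and then by reusing one vector with a scored by-vector search.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class CountingEmbeddings:
    """Hypothetical toy embedder that counts how many times it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_query(self, text: str) -> list[float]:
        self.calls += 1
        # Toy 2-d vector (letter counts); a real model call is far more expensive.
        return [float(text.count("a")), float(text.count("b"))]


class ToyStore:
    """Hypothetical in-memory store with precomputed document vectors."""

    def __init__(
        self, embeddings: CountingEmbeddings, items: list[tuple[list[float], Document]]
    ) -> None:
        self.embeddings = embeddings
        self.items = items

    def similarity_search_with_score_by_vector(
        self, embedding: list[float], k: int = 4, metadata_filter: dict[str, Any] | None = None
    ) -> list[tuple[Document, float]]:
        # Keep documents matching the filter, score by dot product, best first.
        hits = [
            (doc, sum(a * b for a, b in zip(embedding, vector)))
            for vector, doc in self.items
            if not metadata_filter
            or all(doc.metadata.get(key) == value for key, value in metadata_filter.items())
        ]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:k]

    def similarity_search_with_score(
        self, query: str, k: int = 4, metadata_filter: dict[str, Any] | None = None
    ) -> list[tuple[Document, float]]:
        # Text-based search: embeds the query again on every call.
        vector = self.embeddings.embed_query(query)
        return self.similarity_search_with_score_by_vector(vector, k=k, metadata_filter=metadata_filter)


embeddings = CountingEmbeddings()
store = ToyStore(
    embeddings,
    [
        ([1.0, 0.0], Document("apple", {"lang": "en"})),
        ([0.0, 1.0], Document("Brot", {"lang": "de"})),
    ],
)
languages = ("en", "de")

# Text-based API: one embedding call per filtered search.
for lang in languages:
    store.similarity_search_with_score("a query", metadata_filter={"lang": lang})
text_calls = embeddings.calls  # 2

# Scored by-vector API: embed once, reuse the vector for every filter.
embeddings.calls = 0
query_vector = embeddings.embed_query("a query")
results = {
    lang: store.similarity_search_with_score_by_vector(query_vector, metadata_filter={"lang": lang})
    for lang in languages
}
vector_calls = embeddings.calls  # 1
```

With a real embedding model, each avoided `embed_query` call is typically a network round trip or a model forward pass, and the savings grow with the number of filtered searches per query.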
Unfortunately, the current `VectorStore` interface only has:

- `similarity_search` (no score)
- `similarity_search_with_score`
- `similarity_search_with_relevance_scores` (score normalized between 0 and 1)
- `similarity_search_by_vector` (no score)

So the maintainers are aware that getting a score with the similarity search is important! But `similarity_search_with_score_by_vector()` is painfully missing 😭

And to be honest, if we had to choose one, the whole `similarity_search_by_vector` function is unnecessary, because anyone can easily drop the scores when they are provided. It's quite elementary: a similarity search method should always provide scores, and people can use them or ignore them. It's easy to discard something that is present, but you cannot recover something that was never returned. Especially since the scores are already computed: all we need is to not throw them away!
Note that the previous version of the Qdrant vector store (now deprecated in favor of a newer one) did have `similarity_search_with_score_by_vector()` at some point: https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/vectorstores.py#L536. It is also implemented in many `langchain_community` vector stores (e.g. Cassandra, Couchbase, FAISS...). I really don't see why it couldn't be part of the `VectorStore` interface: it would make the system clearer while giving developers who build complex retrieval systems much more control over performance. Generating the same vector multiple times is really bad for runtime performance, customer satisfaction, cost, energy consumption, and the whole planet overall!

Proposal (If applicable)
No one should have to implement `similarity_search_by_vector` in their `VectorStore` child implementation, because you can just drop the scores from the results of `similarity_search_with_score_by_vector()`. Concretely:

- Add a `similarity_search_with_score_by_vector()` method to the main `VectorStore` class interface that raises `NotImplementedError` by default and returns a `list[tuple[Document, float]]` when implemented.
- Implement `similarity_search_by_vector()` in the main `VectorStore` class interface: call `self.similarity_search_with_score_by_vector()` and remove the scores.

I can implement it for you in `langchain_core` and `QdrantVectorStore` and open a PR if you want; it could then be added incrementally to other `VectorStore` implementations (it's not a breaking change, just a cheap but highly empowering one). Let me know!