Vector Store similaritySearchWithScore (NodeJS) vs similarity_search_with_relevance_scores (Python) #4894
Replies: 2 comments
-
The discrepancy in the average scores returned by the similarity search functions in Python and NodeJS, when running the same query against a PGVectorStore, could stem from several factors: differences in how the two similarity search functions are implemented, the handling of floating-point arithmetic, or the configuration of the PGVectorStore (e.g., the distance strategy). Without the Python implementation for comparison, pinpointing the exact cause is difficult, so it would help to share the Python code you are running.
-
I was actually looking at this today and noticed that PGVector and some other vector stores in the JS implementation return the distance instead of the similarity score, so a lower value means a higher similarity. When you use the retriever on its own this doesn't really matter (the top documents are still returned), but it makes it impossible to use something like the ScoreThresholdRetriever, which assumes the returned score is a similarity score and can only filter documents by a minimum score (not a maximum score, which is what distance values would require). I just got started working with Langchain, so I don't know if there is a reason for returning the distance for some vector stores rather than standardizing on the similarity score. Since the score is exposed and many people are probably using it, switching from distance to similarity would probably be a breaking change for some people.
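As a workaround until the scores are standardized, the distances can be converted on the caller's side before applying a minimum-score filter. This is only a minimal sketch: it assumes the store is configured for cosine distance (so similarity = 1 - distance), and `ScoredResult`, `distanceToSimilarity`, and `filterByMinSimilarity` are hypothetical helpers written for this example, not part of LangChain.

```typescript
// Hypothetical shape of one search hit: [document, score].
// ASSUMPTION: the score is a cosine distance (lower = more similar).
type ScoredResult<T> = [T, number];

// Convert cosine distances to similarities (similarity = 1 - distance).
function distanceToSimilarity<T>(results: ScoredResult<T>[]): ScoredResult<T>[] {
  return results.map(([doc, distance]): ScoredResult<T> => [doc, 1 - distance]);
}

// Keep only hits whose similarity meets the minimum, best match first.
function filterByMinSimilarity<T>(
  results: ScoredResult<T>[],
  minSimilarity: number,
): ScoredResult<T>[] {
  return distanceToSimilarity(results)
    .filter(([, similarity]) => similarity >= minSimilarity)
    .sort((a, b) => b[1] - a[1]);
}

// Made-up distances in the range reported in the question.
const rawResults: ScoredResult<string>[] = [
  ["doc A", 0.12],
  ["doc B", 0.15],
  ["doc C", 0.6],
];

const kept = filterByMinSimilarity(rawResults, 0.8);
console.log(kept.map(([doc]) => doc));
```

With other distance strategies (Euclidean, inner product) the conversion is different, so check which one the store uses before relying on `1 - distance`.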
-
Example Code
Python code:
Description
I am playing around with the similarity search functions in both Python and NodeJS.
I have a PGVectorStore and am doing a simple similarity search with score.
I noticed that the average score returned in the Python script is easily above 0.82 👍
But the same functionality in the NodeJS script returns around 0.12 to 0.15 👎
Both experiments use the same query, which is a simple string question.
I am not sure whether similarity_search_with_relevance_scores (Python) is equivalent to similaritySearchWithScore() in NodeJS. I am just looking for the same functionality in NodeJS.
🤔 The thought that the score in NodeJS is subtracted from 1 did cross my mind, but I'm not sure if that is right.
Any help will be appreciated 🙏
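To illustrate the "subtracted from 1" idea: for cosine distance, the distance and the similarity of the same pair of vectors always add up to 1, which would be consistent with seeing roughly 0.85 in one script and roughly 0.15 in the other. The sketch below uses made-up two-dimensional vectors and hypothetical helper names; it does not confirm which distance strategy either library actually uses here.

```typescript
// Cosine similarity of two equal-length vectors.
// Note: a zero vector gives a zero norm, so the result would be NaN.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance is defined as 1 minus the cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Made-up embeddings for illustration only.
const queryVector = [1, 0];
const docVector = [3, 4];

const similarity = cosineSimilarity(queryVector, docVector);
const distance = cosineDistance(queryVector, docVector);
console.log({ similarity, distance, sum: similarity + distance });
```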
System Info
NodeJS specs:
"@langchain/openai": "^0.0.21",
"langchain": "^0.1.29",
node version: 18
Python specs: