Asking for similarity scores when using a FAISS vector store seems to return values that are not in the -1, 0, 1 range. Is this intended? #3133
-
🤖 Hello @garethcthomasdev, I'm here to assist you with your query. I'm an AI bot designed to help with bug resolution, answer questions, and guide you in contributing to the project while you're waiting for a human maintainer.
Regarding your question about the 'similaritySearchVectorWithScore' function in the FAISS vector store of the LangChainJS framework: this function does not normalize the distance scores into the -1, 0, 1 range. It returns the distances as computed by the FAISS library, which are not necessarily within this range.
The 'similaritySearchVectorWithScore' function returns an array of tuples, each containing a Document and its corresponding score. The score is the distance between the query vector and the document's vector in the FAISS index, and its range depends on the distance metric the index uses. In this case the index is an IndexFlatL2, which calculates the L2 (Euclidean) distance. The L2 distance is always non-negative, so the scores fall in the range [0, Infinity). A lower score indicates a closer match to the query vector.
Here is the relevant code:
    async similaritySearchVectorWithScore(query: number[], k: number) {
      const d = this.index.getDimension();
      if (query.length !== d) {
        throw new Error(
          `Query vector must have the same length as the number of dimensions (${d})`
        );
      }
      if (k > this.index.ntotal()) {
        const total = this.index.ntotal();
        console.warn(
          `k (${k}) is greater than the number of elements in the index (${total}), setting k to ${total}`
        );
        // eslint-disable-next-line no-param-reassign
        k = total;
      }
      const result = this.index.search(query, k);
      return result.labels.map((id, index) => {
        const uuid = this._mapping[id];
        return [this.docstore.search(uuid), result.distances[index]] as [
          Document,
          number
        ];
      });
    }
You can find this in the langchain/src/vectorstores/faiss.ts file.
I hope this helps clarify the behavior you're seeing. If you have any more questions, feel free to ask!
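If the stored embeddings happen to be unit-length, a score from an L2 index can be mapped back to a cosine similarity. The sketch below rests on two assumptions the thread does not confirm: that the vectors are L2-normalized, and that the reported score is the squared Euclidean distance (which is what FAISS reports for IndexFlatL2). The helper name `squaredL2ToCosineSimilarity` is hypothetical and is not part of LangChainJS or faiss-node.

```typescript
// Hypothetical helper; not part of LangChainJS or faiss-node.
// Assumes both vectors are unit-length and `squaredL2` is a squared Euclidean distance.
// For unit vectors: |a - b|^2 = |a|^2 + |b|^2 - 2 (a . b) = 2 - 2 cos(theta),
// so cos(theta) = 1 - |a - b|^2 / 2.
function squaredL2ToCosineSimilarity(squaredL2: number): number {
  const cosine = 1 - squaredL2 / 2;
  // Clamp to [-1, 1] to absorb small floating-point drift.
  return Math.max(-1, Math.min(1, cosine));
}

const identical = squaredL2ToCosineSimilarity(0); // same direction
const orthogonal = squaredL2ToCosineSimilarity(2); // perpendicular
const opposite = squaredL2ToCosineSimilarity(4); // opposite direction
console.log(identical, orthogonal, opposite);
```

If the vectors are not normalized, this conversion does not hold and the raw scores can only be compared with each other, lower meaning closer.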
-
Hi,
I'm trying to run a similarity search using a FAISS vector store, but I am receiving distance scores with values that are higher than I expected.
This seems similar to the issue in the Python version of the library here:
langchain-ai/langchain#12535
If I look in the FAISS Node 0.3.0 package, there is an example search whose result object returns a distances array of [ 0, 1, 4, 9 ].
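Values like these are what squared Euclidean distances look like. As a minimal sketch, the vectors below are assumed for illustration (they are chosen to reproduce [ 0, 1, 4, 9 ] and are not quoted from this post), and `squaredL2Distance` is a hypothetical helper, not a faiss-node function.

```typescript
// Hypothetical helper: squared Euclidean (L2) distance between two vectors.
function squaredL2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff; // no square root is taken
  }
  return sum;
}

// Assumed example data, picked so the output matches the array quoted above.
const query = [1, 0];
const stored = [
  [1, 0],
  [1, 1],
  [1, 2],
  [1, 3],
];
const distances = stored.map((v) => squaredL2Distance(query, v));
console.log(distances);
```

Because no square root or normalization is applied, these values grow without bound as vectors move apart, which is why they do not stay within a fixed range.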
I think I was expecting the distances to be cosine distances, i.e. a measure of the dissimilarity of the vectors based on the cosine of the angle between them.
Cosine Similarity: Measures the cosine of the angle between vectors, indicating their similarity. Higher values mean greater similarity.
Cosine Distance: Measures the dissimilarity between vectors as the complement of the cosine similarity. Higher values mean greater dissimilarity.
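The two definitions above can be written out directly. This is a minimal sketch with hypothetical helper names (`cosineSimilarity`, `cosineDistance`); it is independent of FAISS and LangChainJS and assumes non-zero vectors of equal length.

```typescript
// Hypothetical helper: cosine of the angle between two vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: complement of the cosine similarity, in [0, 2].
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

const sameDirection = cosineSimilarity([1, 0], [2, 0]);
const perpendicular = cosineSimilarity([1, 0], [0, 3]);
const oppositeDistance = cosineDistance([1, 0], [-1, 0]);
console.log(sameDirection, perpendicular, oppositeDistance);
```

Unlike the L2 scores, both quantities ignore vector length: the similarity stays within [-1, 1] and the distance within [0, 2].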