🤖 Hello,

Thank you for reaching out and providing a detailed description of your issue. I understand that you're having trouble retrieving more than four similar documents with LangChain's OpenSearchVectorSearch vector store, even when you set the 'k' parameter to a value greater than four.

This issue seems to be related to the default settings of OpenSearchVectorSearch. Specifically, the 'k' value in the '_default_approximate_search_query' function, which builds the search query sent to the OpenSearch client, defaults to 4. This 'k' value also sets the 'size' field of the query, which is the number of results to return. Here is the relevant code:

```python
from typing import Dict, List


def _default_approximate_search_query(
    query_vector: List[float],
    k: int = 4,
    vector_field: str = "vector_field",
) -> Dict:
    """For Approximate k-NN Search, this is the default query."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }
```

Even if you set 'k' to a value greater than four in 'similarity_search', it will not affect the number of results returned if that value never reaches '_default_approximate_search_query', which then falls back to its default of 4.

To fix this, ensure that the 'k' you pass to 'similarity_search' is the same 'k' used by '_default_approximate_search_query'. Passing 'k' through as an argument to that function also updates the 'size' field of the query accordingly.

Additionally, I found a similar issue in the LangChain repository: "OpenSearch VectorStore cannot return more than 4 retrieved result". The suggested solution in this issue is to set another parameter.

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
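A minimal sketch of the fix the reply proposes, assuming the query body has the same shape as the '_default_approximate_search_query' snippet quoted in the reply. In OpenSearch's k-NN query, 'k' is the number of neighbors each graph search returns, while the top-level 'size' is the number of hits the response contains, so both need to be at least the number of documents you want. The helper 'build_knn_query' and its optional 'size' argument are hypothetical names for illustration, not part of LangChain's API.

```python
from typing import Dict, List, Optional


def build_knn_query(
    query_vector: List[float],
    k: int = 4,
    vector_field: str = "vector_field",
    size: Optional[int] = None,
) -> Dict:
    """Build an approximate k-NN query body for OpenSearch.

    Hypothetical helper: `size` (hits returned) falls back to `k`
    (neighbors requested), so raising k also raises the result count.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = k if size is None else size
    return {
        "size": hits,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


# Requesting 6 neighbors yields a body whose "size" is also 6.
body = build_knn_query([0.1] * 384, k=6)
print(body["size"], body["query"]["knn"]["vector_field"]["k"])  # 6 6
```

Tying 'size' to 'k' by default means a caller only has to change one number; an explicit 'size' is still accepted for the rare case where the two should differ.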
Hello GitHub Community,
I hope you're doing well. I'm currently facing an issue with OpenSearch's similarity search using LangChain's OpenSearchVectorSearch class. Specifically, the similarity search never returns more than four similar documents. I have thoroughly reviewed my code and data ingestion process, but I'm still encountering this limitation. Here are some details about my setup and the issue:
Code Overview:
I have a Python script that ingests data, converts it into embeddings, and indexes it into OpenSearch. I'm using LangChain's OpenSearchVectorSearch to perform similarity search. Data ingestion appears to work correctly, and I can confirm that all documents are being indexed successfully.
Problem Description:
When I run a similarity search using LangChain's OpenSearchVectorSearch, I get at most four similar documents in the results, even though I expect more documents to match the query.

```python
docs = _vector_db.similarity_search(req.q, k=6)
```

When I set k=3, it returns 3 similar documents, but when I set k to anything above 4, it still returns only 4.
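To pin down where the hit count stops growing, a small harness can call the search with several values of k and compare the number of hits with the number requested. This is a sketch; 'find_result_cap' and 'fake_search' are hypothetical names, and in practice you would pass something like 'lambda k: _vector_db.similarity_search(req.q, k=k)' instead of the fake. Note that a plateau can also simply mean the index holds fewer matching documents than k, so it is worth checking the document count as well.

```python
from typing import Callable, List, Optional, Sequence


def find_result_cap(
    search: Callable[[int], Sequence[object]],
    ks: List[int],
) -> Optional[int]:
    """Call `search` with each k and report whether the hit count stops growing.

    Returns the largest number of hits seen if any k returned a different
    number of hits than requested, or None if every k returned exactly k hits.
    """
    counts = {k: len(search(k)) for k in ks}
    for k, n in counts.items():
        print(f"k={k}: {n} result(s)")
    if all(n == k for k, n in counts.items()):
        return None
    return max(counts.values())


def fake_search(k: int) -> List[str]:
    # Hypothetical stand-in mimicking the reported symptom: never more than 4 hits.
    return [f"doc{i}" for i in range(min(k, 4))]


cap = find_result_cap(fake_search, [3, 4, 5, 6])
print("cap:", cap)  # cap: 4
```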
Troubleshooting Steps Taken:
I verified that the embeddings are generated correctly, and they have the expected dimensionality (384).
I reviewed the query parameters to ensure that I'm not limiting the number of results to four inadvertently.
I ran sample similarity search queries with different parameters directly against OpenSearch to confirm the limitation exists outside of my code.
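For the direct check against OpenSearch, the request body can be built explicitly so that nothing silently limits the hit count. This sketch assumes the default 'vector_field' name from the snippet in the reply, the 384-dimensional embeddings mentioned above, and a hypothetical index called 'my-index'; the helper name 'knn_search_request' is also hypothetical. It only builds the request, so it runs without a cluster; sending it (for example with curl or an OpenSearch client) is left out.

```python
import json
from typing import Dict, List


def knn_search_request(
    index: str,
    query_vector: List[float],
    k: int,
    vector_field: str = "vector_field",
) -> Dict:
    """Describe a raw OpenSearch k-NN search request (method, path, JSON body).

    `size` is set explicitly to k; if it were omitted, the search API would
    use its default page size instead of returning k hits.
    """
    body = {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }
    return {"method": "POST", "path": f"/{index}/_search", "body": json.dumps(body)}


request = knn_search_request("my-index", [0.0] * 384, k=6)
print(request["method"], request["path"])  # POST /my-index/_search
```

With 'size' and 'k' both set to 6, a response that still contains only four hits would suggest looking at the index contents rather than at how LangChain builds the query.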
Request for Assistance:
I'm seeking guidance on how to diagnose and resolve this issue when using LangChain's OpenSearchVectorSearch for similarity search. If you have any insight into why OpenSearch returns a limited number of results in this case, or if you've encountered a similar problem before, I would greatly appreciate your help.