Replies: 1 comment 1 reply
The inconsistency in the retrieved document chunks when using different LLMs with the same retriever setup in LangChain can be attributed to several factors: differences in chunking relative to each model's context window, differences in relevance scoring, and differences in how the query is rephrased or interpreted before retrieval.
To diagnose and resolve these inconsistencies, keep the chunking strategy, the relevance scoring, and the query handling identical across models, and de-duplicate the results extracted from the retrieved chunks.
Here is an example of how to set up a consistent chunking strategy and de-duplicate the extracted results:

```python
from langchain_text_splitters import TokenTextSplitter

# Define a consistent chunking strategy
text_splitter = TokenTextSplitter(
    chunk_size=2000,  # Ensure this is within the context window of all LLMs used
    chunk_overlap=20,
)
# `document` and `extractor` are assumed to come from your own pipeline
texts = text_splitter.split_text(document.page_content)

# Extract information using the same retriever setup
extractions = extractor.batch(
    [{"text": text} for text in texts],
    {"max_concurrency": 5},
)

# Merge and de-duplicate results
key_developments = []
seen_evidence = set()
for extraction in extractions:
    for key_development in extraction.key_developments:
        if key_development.evidence not in seen_evidence:
            key_developments.append(key_development)
            seen_evidence.add(key_development.evidence)

key_developments[:10]
```

Additionally, consider using the `MultiQueryRetriever`, which generates several rephrasings of the user question and retrieves documents for each of them:

```python
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.language_models import BaseLanguageModel
from langchain_core.retrievers import BaseRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever

# Define the prompt template for generating multiple queries
DEFAULT_QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is
    to generate 3 different versions of the given user
    question to retrieve relevant documents from a vector database.
    By generating multiple perspectives on the user question,
    your goal is to help the user overcome some of the limitations
    of distance-based similarity search. Provide these alternative
    questions separated by newlines. Original question: {question}""",
)

# Initialize MultiQueryRetriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=your_retriever,
    llm=your_llm,
    prompt=DEFAULT_QUERY_PROMPT,
    include_original=True,
)
```

By ensuring consistency in chunking, relevance scoring, and query rephrasing, and by using tools like `MultiQueryRetriever`, you should get more consistent chunks across the different LLMs.
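For completeness, here is a quick usage sketch of the retriever configured above (assuming `your_retriever` and `your_llm` have been replaced with your actual retriever and chat model):

```python
# Retrieve chunks for the original question plus the LLM-generated rephrasings;
# duplicate documents across the rephrased queries are removed by the retriever.
docs = multi_query_retriever.invoke("wie betaalt de pgb voor de wmo?")
print(f"Retrieved {len(docs)} unique chunks")
```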
Checked other resources
Commit to Help
Example Code
xxx
Description
I have created a RAG application with Weaviate as the vector store and the `BAAI/bge-base-en-v1.5` embedding model. I have split some Dutch text using `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, add_start_index=True)`.

```python
retriever = vector_store.as_retriever(
    search_kwargs={"k": 5, "score_threshold": 0.6},
    search_type="similarity_score_threshold",
)
```
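For reference, a minimal sketch of how the pieces described above could be wired together (the document loading, the local Weaviate client, and the variable name `docs` are assumptions, not taken from this post):

```python
import weaviate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_weaviate import WeaviateVectorStore

# Split the Dutch documents as described above
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200, add_start_index=True)
chunks = splitter.split_documents(docs)  # `docs` = the loaded Dutch documents (assumed)

# Embed with BAAI/bge-base-en-v1.5 and index the chunks into Weaviate
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
client = weaviate.connect_to_local()  # assumed: a locally running Weaviate instance
vector_store = WeaviateVectorStore.from_documents(chunks, embeddings, client=client)
```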
When I test with OpenAI gpt-4o-mini, gemini-1.5-flash, or gemini-1.5-pro and ask the following question: “wie betaalt de pgb voor de wmo?” (“who pays the pgb for the Wmo?”), I get the following answer: “Ik weet niet wie de pgb voor de Wmo betaalt.” (“I don't know who pays the pgb for the Wmo.”), which is not correct! If I ask the same question using gpt-4o, then I get the correct answer: “De gemeente betaalt het persoonsgebonden budget (pgb) niet direct aan u, maar aan de Sociale Verzekeringsbank (SVB). De SVB beheert het geld en betaalt uw zorgaanbieder achteraf, aan het eind van de maand, nadat u de facturen heeft gecontroleerd en opdracht heeft gegeven om te betalen.” (“The municipality does not pay the personal budget (pgb) directly to you, but to the Sociale Verzekeringsbank (SVB). The SVB manages the money and pays your care provider afterwards, at the end of the month, after you have checked the invoices and given the instruction to pay.”)
If I print the response using `response = retriever.invoke(input="wie betaalt de pgb voor de wmo?")`, then the 5 retrieved document chunks do contain the information about the pgb and which organization pays it (the answer should be the SVB).
I don’t understand the inconsistency, and why the retrieved chunks are NOT used when I am not using gpt-4o… See the attached results of the retrieved chunks when using gpt-4o-mini (no info about the SVB), while the info is available when using gpt-4o. How is this possible?
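For reference, a small sketch of how the retrieved chunks can be inspected directly with the `retriever` defined above; since the retriever uses the embedding model rather than the LLM, its output should not depend on which model generates the final answer:

```python
# Print each retrieved chunk and check whether the SVB passage is present
docs = retriever.invoke("wie betaalt de pgb voor de wmo?")
for i, d in enumerate(docs, start=1):
    print(f"--- chunk {i} (start_index={d.metadata.get('start_index')}) ---")
    print(d.page_content)
    print("mentions SVB:", "SVB" in d.page_content)
```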
Versions of libraries used:
rag_results.docx
System Info
python 3.11