Performance Issue in RAG Pipeline with LangChain, ChromaDB, and GPT-4o #31014
Adarsh-AMT asked this question in Q&A
Description
We are currently working on a Retrieval-Augmented Generation (RAG) pipeline using LangChain, ChromaDB, and GPT-4o. The pipeline is functionally correct, but we are experiencing performance issues, particularly during the similarity search step.
Pipeline Details:
- Similarity Search (ChromaDB): average ~2.94 s (range 1.5 s – 5.0 s)
- Model Inference (GPT-4o via LangChain): average ~3.39 s (range 3.1 s – 4.0 s)
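Per-stage numbers like the ones above can be collected with a small timing helper. This is a minimal sketch: `similarity_search` and `model_inference` here are hypothetical stand-ins for the real calls (in the actual pipeline they would be something like `vectorstore.similarity_search(query)` and the GPT-4o invocation via LangChain).

```python
import time
from typing import Callable, List, Tuple


def timed(label: str, fn: Callable, *args, **kwargs) -> Tuple[object, float]:
    """Run fn, print the stage timing, and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} seconds")
    return result, elapsed


# Hypothetical stand-ins for the real pipeline stages.
def similarity_search(query: str) -> List[str]:
    return ["doc1", "doc2"]


def model_inference(prompt: str) -> str:
    return "answer"


docs, t_search = timed("Similarity Search Time", similarity_search, "question")
answer, t_model = timed("Model Response Time", model_inference, "prompt")
```

Wrapping each stage separately makes it clear which step dominates the latency, rather than only measuring end-to-end time.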
Sample Execution Timings

| Question | Similarity Search (s) | Model Response (s) | Total (s) |
|----------|----------------------:|-------------------:|----------:|
| One      | 5.02 | 3.13 | 9.02 |
| Two      | 2.24 | 3.20 | 6.78 |
| Three    | 3.00 | 3.20 | 6.47 |
| Four     | 1.50 | 4.03 | 6.00 |
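Note that in every run the total exceeds the sum of the two listed stages, so there is additional per-request overhead (prompt construction, network round trips, etc.) beyond search and inference. The averages and the unaccounted overhead can be checked directly from the reported numbers:

```python
# Reported timings in seconds: (similarity_search, model_response, total)
timings = {
    "Q1": (5.02, 3.13, 9.02),
    "Q2": (2.24, 3.20, 6.78),
    "Q3": (3.00, 3.20, 6.47),
    "Q4": (1.50, 4.03, 6.00),
}

# Overhead = total minus the two measured stages.
overheads = {
    q: round(total - (search + model), 2)
    for q, (search, model, total) in timings.items()
}

avg_search = sum(t[0] for t in timings.values()) / len(timings)
avg_model = sum(t[1] for t in timings.values()) / len(timings)

print(f"avg search = {avg_search:.2f}s, avg model = {avg_model:.2f}s")
for q, overhead in overheads.items():
    print(f"{q}: unaccounted overhead = {overhead:.2f}s")
```

This reproduces the stated averages (2.94 s search, 3.39 s model) and shows roughly 0.3–1.3 s of overhead per request that is worth profiling separately.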
We are looking for ways to reduce this latency and improve overall response time. Suggestions related to retrieval optimization, ChromaDB configuration, caching strategies, or model handling are welcome.
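On the caching side, one simple option is to memoize the retrieval step so that repeated or trivially-rephrased questions skip the multi-second ChromaDB search entirely. A minimal sketch, assuming a hypothetical `run_similarity_search(query)` wrapper around the real vector-store call:

```python
from functools import lru_cache
from typing import Tuple


def run_similarity_search(query: str) -> Tuple[str, ...]:
    # Placeholder for the real call, e.g. vectorstore.similarity_search(query).
    return ("doc_a", "doc_b")


@lru_cache(maxsize=1024)
def cached_search(normalized_query: str) -> Tuple[str, ...]:
    # lru_cache requires hashable arguments and return values,
    # hence tuples rather than lists.
    return run_similarity_search(normalized_query)


def retrieve(query: str) -> Tuple[str, ...]:
    # Light normalization so minor whitespace/case variants share a cache entry.
    return cached_search(query.strip().lower())
```

This only helps when queries repeat; for semantically similar but non-identical queries, a semantic cache (keyed on embedding similarity) would be needed instead.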
System Info
.