This project implements a multi-agent Retrieval-Augmented Generation (RAG) system that provides expert advice on cell culture techniques. A web crawler automatically finds relevant websites, then scrapes and stores the information in a vector database; the AI agents decide how to use this knowledge base to generate context-aware, accurate responses to user queries.
```mermaid
flowchart LR
    subgraph Data_Ingestion
        Crawler["parallel_crawler.py (AsyncWebCrawler)"] -->|markdown| Chunker["chunker.py"]
        Chunker -->|"titles / summaries / embeddings"| Supabase["documents table"]
    end
    subgraph MultiAgent_QA
        User["User question"] --> UI["Streamlit UI"]
        UI --> Agent["cell_culture_agent"]
        Agent --> Retrieval["Retrieval Agent"]
        Retrieval --> Reasoning["Reasoning Agent"]
        Reasoning --> Calculation["Calculation Agent"]
        Calculation --> Planning["Planning Agent"]
        Planning --> Answer["Final Answer"]
        Answer --> UI
    end
    Supabase --- Retrieval
```
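The ingestion side of the diagram (crawl, then chunk, then embed and store) can be sketched in plain Python. The function below is a hedged approximation of what `chunker.py` might do before embedding: split crawled markdown into fixed-size, overlapping chunks. The function name and the `chunk_size`/`overlap` defaults are illustrative assumptions, not the project's actual code.

```python
def chunk_markdown(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split crawled markdown into overlapping chunks ready for embedding.

    A minimal stand-in for the langchain splitters the project uses;
    the chunk_size/overlap values are illustrative defaults.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

Overlapping chunks trade a little storage for better retrieval: a sentence that straddles a chunk boundary still appears whole in at least one chunk.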
There are four AI agents that cooperate to produce the final answer:

- **Retrieval Agent**: Queries the vector database to fetch the documents most relevant to the user's input. The agent is configured to work with multiple vector databases.
- **Reasoning Agent**: Consumes the passages retrieved by the Retrieval Agent, along with the user's question, to generate intermediate explanations.
- **Calculation Agent**: Handles quantitative tasks such as unit conversions or statistical operations that support the reasoning process.
- **Planning Agent**: Orchestrates the overall workflow by synthesizing information from each agent and composing their outputs into the final answer.
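The hand-off between the four agents can be sketched as a simple pipeline. This is a dependency-free schematic of the data flow only (the real project wires agents together with pydantic-ai tools); the `AgentContext` fields and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """State passed along the agent chain; fields are illustrative."""
    question: str
    documents: list[str] = field(default_factory=list)
    reasoning: str = ""
    calculations: str = ""
    answer: str = ""


def run_pipeline(question, retrieve, reason, calculate, plan) -> AgentContext:
    """Run the four-agent chain; each injected callable plays one agent's role."""
    ctx = AgentContext(question=question)
    ctx.documents = retrieve(ctx.question)               # Retrieval Agent
    ctx.reasoning = reason(ctx.question, ctx.documents)  # Reasoning Agent
    ctx.calculations = calculate(ctx.reasoning)          # Calculation Agent
    ctx.answer = plan(ctx)                               # Planning Agent composes the answer
    return ctx
```

Injecting the agents as callables keeps the orchestration testable: each stage can be stubbed out independently of the LLM.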
Here are some sample questions! Click through to see videos of the high-quality responses from the Streamlit demo.

- Design a dose–response experiment for a new anti-cancer compound in cell culture (demo_1.mp4)
- Troubleshoot a repeated contamination scenario (demo_2.mp4)
- How would you calculate the volume of cell suspension needed per well and the total number of cells required? (demo_3.mp4)
- Media preparation math (demo_4.mp4)
- Create an experimental plan for creating a stable cell line expressing gene X (demo_5.mp4)
- Compare two culture models: traditional 2D monolayer vs. 3D organoid culture (demo_6.mp4)
To set up the required Python environment using Conda, follow these steps:

- Make sure you have Anaconda (or Miniconda) installed and added to your PATH.
- Clone this repository and navigate to the project directory.
- Run the following command from your terminal to create the environment from the included `environment.yml` file:

  ```bash
  conda env create --file environment.yml
  ```

- Activate the environment using:

  ```bash
  conda activate cellRAG
  ```

Now your environment should be ready to use!
Create a `.env` file with the following:

```bash
# OpenAI API credentials
# Will need to create an OpenAI account to generate an API key
OPENAI_API_KEY=
LLM_MODEL="gpt-4o-mini" # Or your choice of model
EMBEDDING_MODEL="text-embedding-3-small" # Or your choice of model

# Supabase credentials
# Log in to Supabase online and create a project.
# The project URL and API keys are available from your project dashboard.
SUPABASE_URL=
SUPABASE_API_KEY=
SUPABASE_SERVICE_KEY=
```

The project relies on the following key dependencies:

- `crawl4ai`: Web crawling and data scraping
- `langchain`: Helpful functions for splitting documents and processing chunks
- `supabase`: Store document chunks, metadata, and vector embeddings in a curated knowledge base
- `openai`: Provide a strong pre-trained, instruction-tuned base Large Language Model (LLM)
- `pydantic-ai`: Build an agentic RAG system with defined dependencies and tools
- `streamlit`: Create a beautiful chat UI to interact with the RAG agent
To run the data collection, processing, and storage pipeline:

- Navigate to the `src/data_collection` directory.
- Run the following command from your terminal:

  ```bash
  python parallel_crawler.py
  ```

Please make sure that you have already created a Supabase account and a New Project. From your Project dashboard, go to the SQL Editor tab and paste in the SQL commands from `documents.sql`.

This will set up the `documents` table with vector storage capabilities and Row-Level Security (RLS). The SQL script also defines the `match_docs` function, which is used to query the database for relevant documents in the RAG pipeline.
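To illustrate what `match_docs` does conceptually, here is a dependency-free Python sketch of vector similarity search over stored chunks. The real function runs inside Postgres via pgvector; the function name `match_docs_local` and the assumed row shape (dicts with `embedding` and `content` keys) are illustrative, not the actual schema.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_docs_local(query_embedding, documents, match_count=3):
    """Rank stored rows by cosine similarity to the query embedding.

    Mirrors, in Python, what the SQL match_docs function does inside
    Postgres/pgvector. 'documents' is a list of dicts with 'embedding'
    and 'content' keys (an assumed shape, for illustration only).
    """
    scored = [
        (cosine_similarity(query_embedding, d["embedding"]), d["content"])
        for d in documents
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [content for _, content in scored[:match_count]]
```

In production the ranking happens in the database so only the top `match_count` chunks cross the network, rather than every stored embedding.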
Run the Streamlit app using the following command:

```bash
streamlit run chatbot_ui.py
```

The app will run locally at http://localhost:8501.
Planned improvements to the retrieval pipeline:

- Query rewriting
- Relevance feedback
- Contextual compression
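Of these, query rewriting is the most self-contained to prototype: before retrieval, an LLM rephrases the user's question into a form closer to the vocabulary of the stored documents. A minimal hedged sketch, where the `llm` callable is a placeholder for whatever completion function the project ends up using:

```python
REWRITE_PROMPT = (
    "Rewrite the following cell-culture question as a concise search query "
    "using precise technical vocabulary. Return only the query.\n\n"
    "Question: {question}"
)


def rewrite_query(question: str, llm) -> str:
    """Rewrite the user's question via an injected LLM callable (prompt -> str)."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    # Fall back to the original question if the model returns nothing useful.
    return rewritten or question
```

The rewritten query would then be embedded and passed to the Retrieval Agent in place of the raw question.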
```text
**********************************************************************************************
*     ,---,                              ,-.                   ,---,                         *
*    '  .' \                         ,--/ /|                  '  .' \                        *
*   /  ;    '.                     ,--. :/ |                 /  ;    '.          __  ,-.     *
*  :  :       \    .--.--.         :  : ' /                 :  :       \       ,----._,.   ,' ,'/ /|  *
*  :  |   /\   \  /  /    '        |  '  /                  :  |   /\   \     /   /  ' /   /  ,--.--.   '  | |' |  *
*  |  :  ' ;.   :|  :  /`./        '  |  :                  |  :  ' ;.   :   |   :     |  /       \  |  |   ,'  *
*  |  |  ;/  \   \:  ;_            |  |   \                 |  |  ;/  \   \  |   | .\  . .--.  .-. | '  :  /    *
*  '  :  | \  \ ,'\  \    `.       '  : |. \                '  :  | \  \ ,'  .   ; ';  |  \__\/: . . |  | '     *
*  |  |  '  '--'   `----.   \      |  | ' \ \               |  |  '  '--'    '   .   . |  ," .--.; | ;  : |     *
*  |  :  :        /  /`--'  /      '  : |--'                |  :  :           `---`-'| |  /  /  ,.  | |  , ;    *
*  |  | ,'       '--'.     /       ;  |,'                   |  | ,'           .'__/\_: | ;  :   .'   \ ---'     *
*  `--''           `--'---'        '--'                     `--''             |   :    : |  ,     .-./          *
*                                                                              \   \  /   `--`---'              *
*                                                                               `--`-'                          *
**********************************************************************************************
```
