This project implements a multi-agent Retrieval-Augmented Generation (RAG) system that provides expert advice on cell culture techniques. A web crawler automatically finds relevant websites, then scrapes and stores the information in a vector database; the AI agents decide how to use this knowledge base to generate context-aware, accurate responses to user queries.
```mermaid
flowchart LR
    subgraph Data_Ingestion
        Crawler["parallel_crawler.py (AsyncWebCrawler)"] -->|markdown| Chunker["chunker.py"]
        Chunker -->|"titles / summaries / embeddings"| Supabase["documents table"]
    end
    subgraph MultiAgent_QA
        User["User question"] --> UI["Streamlit UI"]
        UI --> Agent["cell_culture_agent"]
        Agent --> Retrieval["Retrieval Agent"]
        Retrieval --> Reasoning["Reasoning Agent"]
        Reasoning --> Calculation["Calculation Agent"]
        Calculation --> Planning["Planning Agent"]
        Planning --> Answer["Final Answer"]
        Answer --> UI
    end
    Supabase --- Retrieval
```
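The ingestion side of the diagram (crawl, then chunk, then embed and store) can be sketched in plain Python. The function below is a hedged approximation of what `chunker.py` might do before embedding: split crawled markdown into fixed-size, overlapping chunks. The function name and the `chunk_size`/`overlap` defaults are illustrative assumptions, not the project's actual code.

```python
def chunk_markdown(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split crawled markdown into overlapping chunks ready for embedding.

    A minimal stand-in for the langchain splitters the project uses;
    the chunk_size/overlap values are illustrative defaults.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

Overlapping chunks trade a little storage for better retrieval: a sentence that straddles a chunk boundary still appears whole in at least one chunk.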
There are four AI agents that cooperate to produce the final answer:

- **Retrieval Agent**: Queries the vector database to fetch the documents most relevant to the user's input. The agent is configured to work with multiple vector databases.
- **Reasoning Agent**: Consumes the passages retrieved by the Retrieval Agent, along with the user's question, to generate intermediate explanations.
- **Calculation Agent**: Handles quantitative tasks such as unit conversions or statistical operations that support the reasoning process.
- **Planning Agent**: Orchestrates the overall workflow by synthesizing information from each agent and composing their outputs into the final answer.
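The hand-off between the four agents can be sketched as a simple pipeline. This is a dependency-free schematic of the data flow only (the real project wires agents together with pydantic-ai tools); the `AgentContext` fields and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """State passed along the agent chain; fields are illustrative."""
    question: str
    documents: list[str] = field(default_factory=list)
    reasoning: str = ""
    calculations: str = ""
    answer: str = ""


def run_pipeline(question, retrieve, reason, calculate, plan) -> AgentContext:
    """Run the four-agent chain; each injected callable plays one agent's role."""
    ctx = AgentContext(question=question)
    ctx.documents = retrieve(ctx.question)               # Retrieval Agent
    ctx.reasoning = reason(ctx.question, ctx.documents)  # Reasoning Agent
    ctx.calculations = calculate(ctx.reasoning)          # Calculation Agent
    ctx.answer = plan(ctx)                               # Planning Agent composes the answer
    return ctx
```

Injecting the agents as callables keeps the orchestration testable: each stage can be stubbed out independently of the LLM.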
Here are some sample questions! Click through to see videos of the high-quality responses from the Streamlit demo.

- Design a dose–response experiment for a new anti-cancer compound in cell culture (demo_1.mp4)
- Troubleshoot a repeated contamination scenario (demo_2.mp4)
- How would you calculate the volume of cell suspension needed per well and the total number of cells required? (demo_3.mp4)
- Media preparation math (demo_4.mp4)
- Create an experimental plan for creating a stable cell line expressing gene X (demo_5.mp4)
- Compare two culture models: traditional 2D monolayer vs. 3D organoid culture (demo_6.mp4)
To set up the required Python environment using Conda, follow these steps:

- Make sure you have Anaconda (or Miniconda) installed and added to your PATH.
- Clone this repository and navigate to the project directory.
- Run the following command from your terminal to create the environment from the included `environment.yml` file:

  ```bash
  conda env create --file environment.yml
  ```

- Activate the environment using:

  ```bash
  conda activate cellRAG
  ```

Now your environment should be ready to use!
Create a `.env` file with the following:

```bash
# OpenAI API credentials
# Will need to create an OpenAI account to generate an API key
OPENAI_API_KEY=
LLM_MODEL="gpt-4o-mini" # Or your choice of model
EMBEDDING_MODEL="text-embedding-3-small" # Or your choice of model

# Supabase credentials
# Log in to Supabase online and create a project.
# The project URL and API keys are available from your project dashboard.
SUPABASE_URL=
SUPABASE_API_KEY=
SUPABASE_SERVICE_KEY=
```

The project relies on the following key dependencies:

- `crawl4ai`: Web crawling and data scraping
- `langchain`: Helpful functions for splitting documents and processing chunks
- `supabase`: Store document chunks, metadata, and vector embeddings in a curated knowledge base
- `openai`: Provide a strong pre-trained, instruction-tuned base Large Language Model (LLM)
- `pydantic-ai`: Build an agentic RAG system with defined dependencies and tools
- `streamlit`: Create a beautiful chat UI to interact with the RAG agent
To run the data collection, processing, and storage pipeline:

- Navigate to the `src/data_collection` directory.
- Run the following command from your terminal:

  ```bash
  python parallel_crawler.py
  ```

Please make sure that you have already created a Supabase account and a New Project. From your Project dashboard, go to the SQL Editor tab and paste in the SQL commands from `documents.sql`.

This will set up the `documents` table with vector storage capabilities and Row-Level Security (RLS). The SQL script also defines the `match_docs` function, which is used to query the database for relevant documents in the RAG pipeline.
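To illustrate what `match_docs` does conceptually, here is a dependency-free Python sketch of vector similarity search over stored chunks. The real function runs inside Postgres via pgvector; the function name `match_docs_local` and the assumed row shape (dicts with `embedding` and `content` keys) are illustrative, not the actual schema.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_docs_local(query_embedding, documents, match_count=3):
    """Rank stored rows by cosine similarity to the query embedding.

    Mirrors, in Python, what the SQL match_docs function does inside
    Postgres/pgvector. 'documents' is a list of dicts with 'embedding'
    and 'content' keys (an assumed shape, for illustration only).
    """
    scored = [
        (cosine_similarity(query_embedding, d["embedding"]), d["content"])
        for d in documents
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [content for _, content in scored[:match_count]]
```

In production the ranking happens in the database so only the top `match_count` chunks cross the network, rather than every stored embedding.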
Run the Streamlit app using the following command:

```bash
streamlit run chatbot_ui.py
```

The app will run locally at http://localhost:8501.
Planned improvements to the retrieval pipeline:

- Query rewriting
- Relevance feedback
- Contextual compression
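Of these, query rewriting is the most self-contained to prototype: before retrieval, an LLM rephrases the user's question into a form closer to the vocabulary of the stored documents. A minimal hedged sketch, where the `llm` callable is a placeholder for whatever completion function the project ends up using:

```python
REWRITE_PROMPT = (
    "Rewrite the following cell-culture question as a concise search query "
    "using precise technical vocabulary. Return only the query.\n\n"
    "Question: {question}"
)


def rewrite_query(question: str, llm) -> str:
    """Rewrite the user's question via an injected LLM callable (prompt -> str)."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    # Fall back to the original question if the model returns nothing useful.
    return rewritten or question
```

The rewritten query would then be embedded and passed to the Retrieval Agent in place of the raw question.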
```text
**********************************************************************************************
*     ,---,                              ,-.                   ,---,                         *
*    '  .' \                         ,--/ /|                  '  .' \                        *
*   /  ;    '.                     ,--. :/ |                 /  ;    '.          __  ,-.     *
*  :  :       \    .--.--.         :  : ' /                 :  :       \       ,----._,.   ,' ,'/ /|  *
*  :  |   /\   \  /  /    '        |  '  /                  :  |   /\   \     /   /  ' /   /  ,--.--.   '  | |' |  *
*  |  :  ' ;.   :|  :  /`./        '  |  :                  |  :  ' ;.   :   |   :     |  /       \  |  |   ,'  *
*  |  |  ;/  \   \:  ;_            |  |   \                 |  |  ;/  \   \  |   | .\  . .--.  .-. | '  :  /    *
*  '  :  | \  \ ,'\  \    `.       '  : |. \                '  :  | \  \ ,'  .   ; ';  |  \__\/: . . |  | '     *
*  |  |  '  '--'   `----.   \      |  | ' \ \               |  |  '  '--'    '   .   . |  ," .--.; | ;  : |     *
*  |  :  :        /  /`--'  /      '  : |--'                |  :  :           `---`-'| |  /  /  ,.  | |  , ;    *
*  |  | ,'       '--'.     /       ;  |,'                   |  | ,'           .'__/\_: | ;  :   .'   \ ---'     *
*  `--''           `--'---'        '--'                     `--''             |   :    : |  ,     .-./          *
*                                                                              \   \  /   `--`---'              *
*                                                                               `--`-'                          *
**********************************************************************************************
```
