A custom Agentic Retrieval-Augmented Generation (RAG) model that is an expert in cell culture techniques and knowledge.

dna-witch/cell-culture-Agentic-RAG

🧪 Ask Agar: An Agentic RAG Pipeline for Cell Culture Protocols

This project implements a multi-agent Retrieval-Augmented Generation (RAG) system that provides expert advice on cell culture techniques. A web crawler finds relevant websites, scrapes them, and stores the content in a vector database. The AI agents then decide how to use that knowledge to generate accurate, context-aware responses to user queries.

flowchart LR
  subgraph Data_Ingestion
    Crawler["parallel_crawler.py (AsyncWebCrawler)"] -->|markdown| Chunker["chunker.py"]
    Chunker -->|"title/summaries embeddings"| Supabase["documents table"]
  end

  subgraph MultiAgent_QA
    User["User question"] --> UI["Streamlit UI"]
    UI --> Agent["cell_culture_agent"]
    Agent --> Retrieval["Retrieval Agent"]
    Retrieval --> Reasoning["Reasoning Agent"]
    Reasoning --> Calculation["Calculation Agent"]
    Calculation --> Planning["Planning Agent"]
    Planning --> Answer["Final Answer"]
    Answer --> UI
  end

  Supabase --- Retrieval

There are four AI agents that cooperate to produce the final answer:

  1. Retrieval Agent: This agent queries the vector database to fetch the most relevant documents based on the user's input. The agent is configured to work with multiple vector databases.

  2. Reasoning Agent: This agent consumes retrieved passages from the Retrieval Agent along with the user's question to generate intermediate explanations.

  3. Calculation Agent: This agent handles quantitative tasks such as unit conversions or statistical operations that support the reasoning process.

  4. Planning Agent: This agent orchestrates the overall workflow by synthesizing information from each agent and composing their outputs into the final answer.
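The hand-off between the four agents can be pictured as a chain of functions that each update a shared state object. The sketch below is illustrative only: `PipelineState`, `run_pipeline`, and the stage functions are hypothetical names, the keyword match stands in for a real vector search, and none of it is taken from the repository's actual agent code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Shared state handed from one agent stage to the next (hypothetical)."""
    question: str
    passages: list = field(default_factory=list)
    reasoning: str = ""
    calculations: dict = field(default_factory=dict)
    answer: str = ""


def _tokens(text):
    # Lowercased alphanumeric words, used for the toy keyword match below.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieval_agent(state, knowledge_base):
    # Stand-in for vector search: keep passages sharing a word with the question.
    query = _tokens(state.question)
    state.passages = [p for p in knowledge_base if query & _tokens(p)]
    return state


def reasoning_agent(state):
    # Produce an intermediate explanation from the retrieved passages.
    state.reasoning = f"{len(state.passages)} passage(s) bear on the question."
    return state


def calculation_agent(state):
    # Example quantitative step: convert a volume given in mL to litres.
    match = re.search(r"(\d+(?:\.\d+)?)\s*mL", state.question)
    if match:
        state.calculations["volume_l"] = float(match.group(1)) / 1000
    return state


def planning_agent(state):
    # Compose the outputs of the earlier stages into one final answer.
    parts = [state.reasoning] + state.passages
    parts += [f"{name} = {value}" for name, value in state.calculations.items()]
    state.answer = "\n".join(parts)
    return state


def run_pipeline(question, knowledge_base):
    state = PipelineState(question=question)
    state = retrieval_agent(state, knowledge_base)
    for stage in (reasoning_agent, calculation_agent, planning_agent):
        state = stage(state)
    return state
```

Calling `run_pipeline("How do I prepare 500 mL of media?", passages)` runs the stages in the same order as the flowchart. In the real system each stage would call an LLM or the vector database rather than these toy rules.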

Demo Videos

Here are some sample questions! Click through to see videos of the high-quality responses from the Streamlit demo.

Design a dose–response experiment for a new anti-cancer compound in cell culture.
demo_1.mp4
Troubleshoot a repeated contamination scenario.
demo_2.mp4
How would you calculate the volume of cell suspension needed per well and the total number of cells required?
demo_3.mp4
Media preparation math
demo_4.mp4
Create an experimental plan for generating a stable cell line expressing gene X
demo_5.mp4
Compare two culture models – traditional 2D monolayer vs 3D organoid culture
demo_6.mp4
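The cell suspension question above comes down to simple arithmetic of the kind the Calculation Agent is meant to support. As a rough sketch (the function name `seeding_plan`, the 10% overage default, and the example numbers are assumptions for illustration, not values from the demo):

```python
def seeding_plan(cells_per_well, stock_cells_per_ml, n_wells, overage=0.10):
    """Return per-well volume and totals for seeding a plate.

    overage is the extra fraction prepared to cover pipetting loss.
    """
    if stock_cells_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    volume_per_well_ml = cells_per_well / stock_cells_per_ml
    total_cells = cells_per_well * n_wells * (1 + overage)
    total_volume_ml = total_cells / stock_cells_per_ml
    return {
        "volume_per_well_ml": volume_per_well_ml,
        "total_cells": total_cells,
        "total_volume_ml": total_volume_ml,
    }


plan = seeding_plan(cells_per_well=10_000, stock_cells_per_ml=200_000, n_wells=96)
```

For these inputs the plan works out to 0.05 mL (50 µL) of suspension per well and about 1.06 million cells in roughly 5.3 mL, including the overage.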

Environment Setup

To set up the required Python environment using Conda, follow these steps:

  1. Make sure Anaconda (or Miniconda) is installed and on your PATH.

  2. Clone this repository and navigate to the project directory.

  3. Run the following command from your terminal to create the environment from the included environment.yml file:

conda env create --file environment.yml

  4. Activate the environment:

conda activate cellRAG

Now your environment should be ready to use!

Set Up the .env File

Create a .env file with the following:

# OpenAI API credentials
# You will need an OpenAI account to generate an API key
OPENAI_API_KEY=
LLM_MODEL="gpt-4o-mini"  # Or your choice of model
EMBEDDING_MODEL="text-embedding-3-small"  # Or your choice of model

# Supabase credentials
# Log in to Supabase online and create a project. 
# The project URL and API keys are available from your project dashboard.
SUPABASE_URL=
SUPABASE_API_KEY=
SUPABASE_SERVICE_KEY=
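
A missing key is easier to debug if it is caught at startup rather than on the first API call. The following is a minimal sketch of such a check, assuming the variables above have already been loaded into the process environment (for example by a loader such as python-dotenv). `load_settings` is a hypothetical helper, and treating all four credentials as required is an assumption, not something taken from the project's code.

```python
import os

# Assumption: every credential listed in the .env template is required.
REQUIRED = ("OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_API_KEY", "SUPABASE_SERVICE_KEY")
# Defaults mirror the values suggested in the .env template.
DEFAULTS = {"LLM_MODEL": "gpt-4o-mini", "EMBEDDING_MODEL": "text-embedding-3-small"}


def load_settings(env=None):
    """Collect settings from a mapping (os.environ by default) and fail early."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED}
    for key, default in DEFAULTS.items():
        settings[key] = env.get(key) or default
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to test without touching real credentials.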

Important Packages

  • crawl4ai : Web crawling and data scraping
  • langchain : Helpful functions for splitting documents and processing chunks
  • supabase : Store document chunks, metadata, and vector embeddings in a curated knowledge base
  • openai : Provide a strong pre-trained, instruction-tuned base Large Language Model (LLM) to use
  • pydantic-ai : Build an Agentic RAG system with defined dependencies and tools
  • streamlit : Create a beautiful chat UI to interact with the RAG agent

Data Collection and Processing

To run the data collection, processing, and storage pipeline:

  1. Navigate to the src/data_collection directory.
  2. Run the following command from your terminal:
python parallel_crawler.py
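
Between crawling and storage, the scraped markdown has to be split into chunks small enough to embed. The sketch below shows one common approach, a fixed-size sliding window with overlap. It is not the logic in chunker.py, which may split differently (for instance with langchain's text splitters); `chunk_text` and its default sizes are assumptions for illustration.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so neighbouring chunks share some context.
        start = end - overlap
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.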

[Figure: knowledge base diagram]

Database Setup

Before you begin, create a Supabase account and a new project. From your project dashboard, open the SQL Editor tab and paste in the SQL commands from documents.sql.

This will set up the documents table with vector storage capabilities and Row-Level Security (RLS). The SQL script also defines the match_docs function, which will be used to query the database for relevant documents in the RAG pipeline.
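
Conceptually, a matching function of this kind ranks stored chunks by how close their embeddings are to the query embedding. The plain-Python sketch below illustrates the idea with cosine similarity. The real match_docs runs inside the database, and its signature and distance metric are defined in documents.sql, so the names `cosine_similarity` and `top_k` and the row layout here are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, rows, k=3):
    """Return the k rows most similar to the query, best match first."""
    scored = [(cosine_similarity(query_embedding, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"id": row["id"], "similarity": score} for score, row in scored[:k]]
```

A database-side implementation does the same ranking but can use a vector index, which matters once the documents table grows beyond what a full scan handles comfortably.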

Run the Agentic RAG Pipeline

Run the Streamlit app using the following command:

streamlit run chatbot_ui.py

The app will run locally at http://localhost:8501.

Further Testing and Optimization

  • query rewriting
  • relevance feedback
  • contextual compression
**********************************************************************************************
*    ,---,                        ,-.           ,---,                                        *
*   '  .' \                   ,--/ /|          '  .' \                                       *
*  /  ;    '.               ,--. :/ |         /  ;    '.                             __  ,-. *
* :  :       \    .--.--.   :  : ' /         :  :       \     ,----._,.            ,' ,'/ /| *
* :  |   /\   \  /  /    '  |  '  /          :  |   /\   \   /   /  ' /   ,--.--.  '  | |' | *
* |  :  ' ;.   :|  :  /`./  '  |  :          |  :  ' ;.   : |   :     |  /       \ |  |   ,' *
* |  |  ;/  \   \  :  ;_    |  |   \         |  |  ;/  \   \|   | .\  . .--.  .-. |'  :  /   *
* '  :  | \  \ ,'\  \    `. '  : |. \        '  :  | \  \ ,'.   ; ';  |  \__\/: . .|  | '    *
* |  |  '  '--'   `----.   \|  | ' \ \       |  |  '  '--'  '   .   . |  ," .--.; |;  : |    *
* |  :  :        /  /`--'  /'  : |--'        |  :  :         `---`-'| | /  /  ,.  ||  , ;    *
* |  | ,'       '--'.     / ;  |,'           |  | ,'         .'__/\_: |;  :   .'   \---'     *
* `--''           `--'---'  '--'             `--''           |   :    :|  ,     .-./         *
*                                                             \   \  /  `--`---'             *
*                                                              `--`-'                        *
**********************************************************************************************
